Wednesday, May 7, 2025

Summary

On March 25, 2025, a misconfigured .htaccess redirect rule caused an outage affecting all SiteNow v2/v3 websites. The redirect was intended to support a single newly launched campus website. Instead, it sent traffic for every SiteNow v2/v3 website to the new website and caused an infinite redirect loop on the new website itself.

 

Root Cause

The redirect rule was improperly scoped in an .htaccess file shared by all SiteNow v2/v3 websites, so it applied to every website rather than only the intended one. This resulted from an oversight in our review process: the rule's potential impact on unrelated websites was not fully tested. Additionally, because the rule used a 301 (permanent) redirect, some users' browsers cached the redirect without an expiration and continued to follow it after the rule was removed.

 

Impact

All SiteNow v2/v3 websites were intermittently unavailable or redirected incorrectly for approximately 60 minutes. Some users experienced lingering issues due to cached redirects in their browsers. 

 

Resolution

The faulty redirect rule was removed, and the updated .htaccess file was deployed across the SiteNow v2/v3 infrastructure. 

Website and Varnish caches were purged for all affected websites. 

A public notice with initial findings was published on the SiteNow website. It included a link to an ITS support article for users whose browsers were still following the cached redirect. The notice was promoted through the UI Web Listserv and the SiteNow Announcements Dispatch lists.

 

Timeline of Events

4:30 p.m. – A code update including a new .htaccess rule was deployed to support a website launch. 

4:39 p.m. – A team member noticed an infinite redirect loop on the new website. 

4:45 p.m. – Monitoring tools alerted us to failures for high-profile SiteNow websites.

4:45 p.m. – Because the ITS website was down, notice of the outage was posted in the ITS Support Microsoft Teams channel.

4:46 p.m. – The issue was traced to the recently deployed code update, specifically to the new .htaccess redirect rule, which was not properly scoped.

4:55 p.m. – A code update removing the redirect rule was merged into the code repository.

5:20 p.m. – Our code build process completed, and a deployment was triggered.

5:49 p.m. – Work began to purge the Varnish cache for each website as deployments progressed. 

~6:00 p.m. – Reports from University staff confirmed that high-traffic websites had recovered for most visitors. 

 

Preventive Actions

  • Expand validation testing to include websites beyond the intended target when modifying shared files. 

  • Flag the .htaccess file and other global files in internal documentation as requiring heightened code review scrutiny. 

  • Where possible, validate new redirect rules in non-production environments before deployment. 

  • Consider using 302 (temporary) redirects initially to confirm that a new redirect behaves as intended before converting it to a 301 (permanent) redirect.

  • Review and update the team’s outage response plan to improve communication, coordination, and recovery workflows.
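To illustrate the first preventive action, here is a minimal sketch of a post-deployment smoke check in Python. It requests a sample of website URLs without following redirects, follows each redirect chain by hand, and reports any website that loops or lands on a different host. These are the two failure modes seen in this outage. This is a hypothetical illustration, not our current tooling: the function names, the use of HEAD requests, the 10-hop limit, and any hostnames are assumptions.

```python
# Hypothetical smoke-check sketch. Function names, HEAD requests, and the
# hop limit are illustrative assumptions, not existing SiteNow tooling.
import http.client
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# A fetcher performs ONE request and returns (status, Location header or None).
# It must not follow redirects itself.
Fetcher = Callable[[str], Tuple[int, Optional[str]]]


def fetch_no_follow(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send a single HEAD request; http.client never follows redirects."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url: str, fetch: Fetcher,
                    max_hops: int = 10) -> Tuple[List[str], bool]:
    """Follow redirects by hand. Returns (chain of URLs, looped?)."""
    chain = [url]
    seen = {url}
    current = url
    for _ in range(max_hops):
        status, location = fetch(current)
        if not 300 <= status < 400 or location is None:
            return chain, False           # final, non-redirect response
        nxt = urljoin(current, location)  # Location may be relative
        chain.append(nxt)
        if nxt in seen:
            return chain, True            # revisited a URL: redirect loop
        seen.add(nxt)
        current = nxt
    return chain, True                    # too many hops: treat as a loop


def check_sites(urls: List[str],
                fetch: Fetcher = fetch_no_follow) -> Dict[str, str]:
    """Return {url: problem} for sites that loop or end up on another host."""
    problems: Dict[str, str] = {}
    for url in urls:
        chain, looped = trace_redirects(url, fetch)
        if looped:
            problems[url] = "redirect loop"
        elif urlsplit(chain[-1]).netloc != urlsplit(url).netloc:
            problems[url] = "redirected off-site to " + chain[-1]
    return problems
```

For example, `check_sites(["https://site-a.example.edu/", "https://new-site.example.edu/"])` (placeholder hostnames standing in for a sample of SiteNow v2/v3 websites plus the newly launched one) returns an empty dictionary when no website loops or is sent to another host. Passing a fake fetcher instead of `fetch_no_follow` makes the same logic testable without network access. One limitation: a legitimate canonical-host redirect, such as a bare domain redirecting to its `www` form, would also be reported as off-site, so a real check would need an allowlist.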