Or should we say, "Who said lightning couldn't NOT strike twice?"
Charles Babcock, Editor at Large for InformationWeek, wrote today about the recent Microsoft Azure outage. In his article, Babcock gives an update on what Microsoft reported as the cause of the "glitch" in Azure cloud services that led to outages on Wednesday. The cause was "apparently" related to leap day.
Bill Laing, Microsoft's corporate VP for server and cloud, reported that "While final root cause analysis is in progress, this issue appears to be due to a time calculation that was incorrect for the leap year...However, some sub-regions and customers are still experiencing issues and as a result of these issues they may be experiencing a loss of application functionality. We are actively working to address these remaining issues."
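Laing's statement does not say what the faulty time calculation actually was. Purely as an illustration of how this class of bug can arise, and not as a description of Microsoft's code, here is a minimal Python sketch of a "same date next year" calculation that works every day except February 29. The function names are hypothetical.

```python
from datetime import date


def add_one_year_naive(d: date) -> date:
    # Keeps the month and day and bumps the year.
    # Raises ValueError for Feb 29, because the following year has no such date.
    return d.replace(year=d.year + 1)


def add_one_year_safe(d: date) -> date:
    # Same calculation, but falls back to Feb 28 when the target year
    # has no Feb 29. (Choosing Feb 28 over Mar 1 is an assumption here.)
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)
```

Called with date(2012, 2, 29), the naive version raises an error while the safe version returns February 28 of the following year. A calculation like the naive one can pass testing for years and then fail on a single day.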
Reviewing Laing's statements, Babcock asks which customers are affected, how they are affected, and what the nature of the ongoing outage is. Instead of addressing any of these points in a transparent way, Laing's sharp focus has faded to fuzzy gray, with the thrice-cited "issues" serving as a substitute for saying anything concrete about the remaining problems.
Babcock reminds us of the "lightning strike" of last August 7 that affected both Amazon and Microsoft facilities. Three days later, it was acknowledged that there had been no lightning strike at all. This latest incident is a reminder that the best practices of cloud computing operations are still a work in progress, not an established science. And while prevention is better than cure, infrastructure-as-a-service operators may not yet know everything they need to know about these large-scale environments.
So, lightning does "not" strike twice. Until cloud solutions are put on highly scalable, secure mainframe platforms, you will continue to have these problems.