RIM explains BlackBerry outage
Published: 20 Apr 2007 09:39 BST
Research in Motion finally gave some details late on Thursday about what caused a severe outage of its popular BlackBerry email service that began on Tuesday evening and lasted until the early hours of Wednesday morning.
The company said in a statement that it had ruled out security and capacity issues as a cause of the outage that left millions of so-called "CrackBerry" addicts without access to their email for several hours. The company also said the incident was not caused by any hardware failure or core software issue.
Ruling out those causes, the company has "determined that the incident was triggered by the introduction of a new, non-critical system routine that was designed to provide better optimisation of the system's cache." In computing terms, a cache is a temporary storage area that allows data to be served up quickly.
RIM said the "system routine" was not expected to impact the regular operations of the BlackBerry servers and infrastructure. But despite previous testing, the new system routine produced an unexpected impact that set off a chain reaction triggering a series of interaction errors between the system's operational database and the cache.
After RIM isolated the database problem and tried unsuccessfully to fix the issue, it began its failover process to a backup system. But that also failed.
"Although the backup system and failover process had been repeatedly and successfully tested previously, the failover process did not fully perform to RIM's expectations in this situation and therefore caused further delay in restoring service and processing the resulting message queue," the company said in the statement.
RIM also said it has already identified several aspects of its testing, monitoring and recovery processes that it plans to enhance as a result of the incident.
Since the outage began around 5pm PDT on Tuesday, the company had been quiet about its cause. But experts said they were convinced the issue had to do with RIM's network since subscribers were still able to make phone calls and send and receive SMS text messages.
RIM's service is centralised and works by routing all BlackBerry emails through one of two main Network Operations Centers, which are essentially large data centres. One NOC is located in Canada and primarily services the Western Hemisphere as well as parts of Asia. The other data centre, located in the UK, handles email traffic in Europe, Africa and the Middle East. Analysts had speculated that, since most of the users affected by the outage were based in North America, the problem was likely to have occurred in the NOC located in Waterloo, Canada.
By Wednesday morning RIM said that email had begun trickling into inboxes across North America. The service was operating normally on Thursday, the company said.
RIM has built a strong reputation as a reliable service provider, attracting bankers, lawyers and even congressional lawmakers as subscribers. The company has recently been trying to broaden its appeal to consumers with new products, such as the BlackBerry Pearl and the BlackBerry 8800.
The new strategy has helped the company rapidly grow its subscribers. In the company's latest quarter, it reported it had added 1.02 million new subscribers, taking its total to eight million. This is a huge increase from the two million subscribers the company reported a year ago when it settled its patent infringement case with NTP. The company expects to add between 1.125 million and 1.15 million subscribers during the current quarter.









