Facebook struggles to fix a massive outage




A prolonged global outage of Facebook’s apps has prompted the company’s engineers to scramble to fix the issue at one of its data centers in California, according to two people familiar with the situation.

The outage, which began around 11:40 a.m. ET on Monday, took down all Facebook apps, including Instagram and WhatsApp, around the world, affecting billions of users and millions of advertisers. Inside Facebook, the outage also broke nearly every internal system that employees use to communicate and work. As of 6 p.m. ET, most services appear to be back online.

Several employees told The Verge they resorted to communicating through their work-provided Outlook email accounts, as Facebook mainly runs on an internal version of the social network, which was not accessible. While employees could send email to each other, they could not send email to or receive email from external addresses.

Since Facebook requires employees to sign in with their work accounts to access tools like Google Docs and Zoom, those services were also not working, which led some employees to use alternative services like Apple’s FaceTime and Discord. Employees who were already authenticated with non-Facebook tools like Google Docs before the outage started still had access.

Facebook engineers were sent to one of its major US data centers in California to restore service, meaning the fix couldn’t be done remotely. To further complicate matters, the outage temporarily prevented some employees from accessing company buildings and conference rooms with their badges, according to The New York Times, which first reported that engineers had been dispatched to the data center.

In an email to employees sent shortly after the service was restored, CTO Mike Schroepfer said the issue “was affecting our backbone network which connects all of our data centers to each other.”

“If you are not actively working on recovery, be patient and do not rush to reload everything to avoid slowing down the establishment” of the network, he warned in the memo, which was seen by The Verge.

Facebook did not provide a detailed explanation for the outage, although outside experts say it was due to an issue with BGP, or Border Gateway Protocol, a networking technology.

On Monday evening, Facebook vice president of infrastructure Santosh Janardhan published a company blog post saying the outage was the result of a “faulty configuration change,” adding that the company had “no evidence that user data was compromised as a result of this downtime.”

“Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted that communication,” Janardhan wrote. “This disruption in network traffic has had a cascading effect on the way our data centers communicate, shutting down our services.”

Update October 4 at 6:33 p.m. ET: Noted that the outage is ending as Facebook and its other services come back online.

Update October 4 at 8:05 p.m. ET: Added more information about the outage that was shared with Facebook employees.

Update October 4 at 9:06 p.m. ET: Added confirmed report that an angle grinder was used to access server cages.

Correction October 4 at 9:25 p.m. ET: A previous version of this story included a confirmed report that Facebook was using an angle grinder to access server cages. The reporter has since withdrawn the confirmation, and we have removed the claim from this story.

Update October 4 at 10:29 p.m. ET: Added more details from Facebook about the outage.


