CrowdStrike Holdings Inc.’s stock was in freefall Friday after a calamitous outage rippled across North America, Europe, Asia, Australia and Africa, wiping billions of dollars from the global economy. IT leaders offered advice, and words of caution, on how to reduce the odds of repeating a digital breakdown that many are calling the real Y2K.
“Following [Friday’s] outage, we expect various IT processes and applications could see lingering problems for days, creating a ‘blast radius’ unlike we’ve seen in recent years,” Chuck Herrin, field chief technology officer for API Security at F5, said.
“Outages are not a problem we’re going to completely solve. Cloud environments are only growing more complex and interconnected,” Spencer Kimball, chief executive and co-founder of database startup Cockroach Labs, said in an email message. “This complexity at scale will continue to increase risk, particularly for businesses that are still in the initial stages of cloud adoption. Continuous monitoring and alerting are essential to detect and address issues before they escalate.”
To minimize the risk of future incidents, particularly in the deployment of software across the cloud, Vaibhav Malik, security solutions architect at Cloudflare, offered a laundry list of suggestions for companies and government agencies. He recommends that organizations diversify critical systems across multiple vendors to avoid single points of failure; implement robust testing procedures before deployment; enhance monitoring by investing in tools that quickly detect and alert on system-wide issues; develop and regularly test business-continuity plans for when key systems fail; and strengthen vendor management by closely evaluating the security practices and update procedures of critical vendors.
The advice comes as little solace for millions of people who encountered the dreaded blue screen of death that affected everything from airlines, banks and media companies to hospitals, online shopping sites and billboards.
Lee Kair, principal and head of transportation and innovation at The Chertoff Group, said the airlines’ woes (33,000 flights were delayed worldwide) highlight the industry’s reliance on customer-facing services and the interconnected nature of its ecosystem. “There was a cataclysmic, cascading effect,” Kair said in an interview. He noted that more than 7,000 flights were delayed in the U.S. in the worst day for air travel here since 9/11.
The outage has also drawn heavy scrutiny to CrowdStrike, a well-regarded company whose name recognition soared for all the wrong reasons Friday. The company is facing widespread criticism, the possibility of legal action and the loss of customers. Tesla Inc. CEO Elon Musk said he has stopped using CrowdStrike software after the botched update.
“Trust in CrowdStrike may decrease, leading customers to question the resilience of security products,” Itzik Alvas, CEO and co-founder of Entro Security, said in an email message. “In the long run, new startups may emerge, focusing on emergency handling and resilience.”
William Blair analyst Jonathan Ho was more sanguine in a research note Friday. “We would be buyers of the stock on weakness today,” he wrote. CrowdStrike shares tumbled 11% in regular trading.
The red alert prompted a post on X in which CrowdStrike CEO George Kurtz said the outages were caused by a defect found in a single content update of its Falcon software on Microsoft Windows operating systems. CrowdStrike says Falcon is designed to protect files saved in the cloud.
Said Ouissal, CEO of Zededa, a cloud-native edge management and operations company, said the outage starkly illustrates “exactly why decentralizing IT needs to be a top priority for organizations managing critical or industrial infrastructure.”
“Currently, the only way to fix today’s issue is to manually reset each IT endpoint, which is costing airlines, healthcare orgs, and others way too much time,” Ouissal said.
While automated updates offer convenience, they can also expose organizations to vulnerabilities within their supply chain, added TensorWave co-founder Piotr Tomasik. To mitigate these risks, he said, it is essential to implement a phased deployment approach, starting with smaller groups of systems to identify and address potential issues before they escalate.
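The phased approach Tomasik describes can be sketched in a few lines of Python. This is a minimal illustration, not CrowdStrike’s or any other vendor’s actual process: the function names (`plan_phases`, `phased_rollout`), the `apply_update` and `is_healthy` callbacks, and the 1%, 10% and 100% stage sizes are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class RolloutResult:
    completed: bool        # True only if every phase passed its health check
    updated: List[str]     # hosts that received the update, in order
    halted_at_phase: int   # index of the failing phase, or -1 if none failed


def plan_phases(hosts: Sequence[str], cumulative_percent: Sequence[int]) -> List[List[str]]:
    """Split hosts into successive groups; percentages are cumulative (hypothetical defaults)."""
    phases: List[List[str]] = []
    start = 0
    for percent in cumulative_percent:
        if start >= len(hosts):
            break
        # Always advance by at least one host so small fleets still get a first test group.
        end = min(len(hosts), max(start + 1, len(hosts) * percent // 100))
        phases.append(list(hosts[start:end]))
        start = end
    if start < len(hosts):
        # Any hosts not covered by the listed percentages form a final phase.
        phases.append(list(hosts[start:]))
    return phases


def phased_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    cumulative_percent: Sequence[int] = (1, 10, 100),
) -> RolloutResult:
    """Update one group at a time and stop as soon as a group fails its health check."""
    updated: List[str] = []
    for index, phase in enumerate(plan_phases(hosts, cumulative_percent)):
        for host in phase:
            apply_update(host)
            updated.append(host)
        if not all(is_healthy(host) for host in phase):
            # Halt here: later, larger groups never receive the faulty update.
            return RolloutResult(False, updated, index)
    return RolloutResult(True, updated, -1)
```

In this sketch, an update that fails the health check on the first small group stops there, leaving the rest of the fleet untouched. A real deployment would also need a waiting period between phases and a rollback path, which the example omits.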
Neatsun Ziv, CEO and co-founder of OX Security, suggested that agentless updates, rather than automatic updates to agents on endpoint servers, could “help alleviate issues.”
Cloudflare’s Malik offers simple advice for real-time crisis management for such a cataclysmic event: Activate your incident response team immediately; rapidly assess the scope and impact; communicate clearly and frequently; coordinate with vendors and partners; document everything; begin recovery efforts; and conduct a thorough post-incident review.
Ultimately, Wedbush Securities analyst Dan Ives insisted in a research note that CrowdStrike “remains the gold standard and we believe this historical incident will only be a dark chapter for the company and not impact the long-term bull story for the name.”
Still, what happened to CrowdStrike and untold millions of people and companies is likely to happen again as essential services such as banking, health care and transportation, along with everyday activities such as work, communication, education and accessing information, increasingly rely on interconnected technologies.
“A global IT outage like this one serves as a stark reminder of how deeply intertwined our lives are with digital connectivity and the urgent need to reinforce our IT systems against such vulnerabilities,” Matt Tuson, general manager of EMEA, said in an email message.
“The scale of today’s global IT outage is unparalleled in recent history,” Catchpoint CEO Mehdi Daoudi said. “It serves as a stark reminder that our entire world is powered by digital experiences and that the internet is neither magically infallible nor inherently resilient. This is a reminder you need to manage and control change: Don’t blindly update software or change configuration.”