Starbucks Corp.’s glitchy mobile-ordering app experienced gremlins again this week — creating an IT nightmare for two days, angering customers, and calling into question how the company can avoid further disruptions during the critical holiday season.
Problems with the app, which initially malfunctioned Thursday when the coffee giant debuted its popular holiday menu, continued into Friday, triggering thousands of complaints nationwide.
Downdetector, which tracks app and website outages, received more than 2,500 customer complaints about the app Friday after collecting more than 3,600 on Thursday. The popular pre-ordering app, which also had problems in late July 2024 and a few times in 2022, forced latte and tea drinkers to wait in line or skip their orders altogether.
Starbucks’ issues may have been exacerbated by a surge in traffic on the first day of its holiday menu and increased searches for Red Cup Day 2024. Both events are wildly popular within the Starbucks Universe. This year’s holiday menu features returning limited-time favorites such as Peppermint Mocha and Caramel Brulée Latte, as well as new drinks like Cran-Merry Orange Refresher and Cran-Merry Orange Lemonade Refresher.
Red Cup Day is the annual event at which Starbucks offers a free reusable red cup for the holiday season.
The economic hit, albeit temporary, is sure to sting: An increasingly significant share of Starbucks orders comes through the mobile app, as more customers bypass waiting in line either in-store or via drive-through. A series of outages might not just cost Starbucks sales in the short term but drive some of its customers to competitors with more dependable pre-order apps, according to tech marketing experts.
What is most troubling for Starbucks and other retailers is that the problem has deepened over the past few years as more companies lean on software developers.
“Engineering orgs are just not well managed. The move to DevOps, which places more responsibility on developers to write code and manage operations, is leading to a decline in quality assurance and more failed rollouts like this,” Eric Wegner-Tamo, founder and CEO at Velos Mobil, a professional services company that builds mobile apps for small- to large-sized businesses, said in an interview. “The C-suites of these companies are very demanding while doing heavy layoffs that create structural issues. C-suite puts a lot of pressure on DevOps teams. It ends up eroding some of the checks and balances built into these systems.”
Spencer Kimball, co-founder and CEO of Cockroach Labs, said the outage “once again starkly illustrates a reality that many business leaders acknowledge but few address: traditional disaster recovery strategies are insufficient in our always-on retail environment.” He added that research indicates 95% of executives recognize operational vulnerabilities, but nearly half admit their organizations fall short in enhancing resilience.
Concurrently, the rise of smartphones and ultrafast Wi-Fi networks has created a situation in which companies are increasingly defined by and dependent on their web apps. Throw in miscommunication between the marketing and engineering teams, and the result can be website crashes and lost revenue.
“Your customers can’t drink their coffee with (an app), but they can do a whole lot else that is relevant to you — most importantly, interact with you,” Guy Currier, an analyst at The Futurum Group, said in an email. “Dev and operations teams work hard and thoughtfully to maintain app availability. But the systems are so complex and interdependent in production that even reasonably rigorous testing and controls won’t always uncover critical stresses or dependencies after deployment at scale. Even a mere data update — which theoretically at least, is all Starbucks’ holiday menu should be — can introduce a catastrophic problem.”
IT professionals say there are several lessons, many of them cautionary, that can be learned to avoid a mobile app meltdown.
Nora Jones, senior director of product management at PagerDuty Inc., a digital operations management platform, offered several potential solutions. She said setting up additional on-call structures with extra team members or secondary rotations will help handle increased demand and the increased challenges of high-traffic days. Reviewing past on-call setups and assessing recent changes in business or traffic projections can also help identify potential adjustments. Running “pre-mortem” exercises to pinpoint potential failure points enables companies to set up fallbacks that prevent revenue loss — for instance, having a backup page ready if a promotional page fails to load can help keep users engaged. Additionally, organizing team game days to test systems, setting up dedicated communication channels, and implementing notifications (like mobile and push alerts) can ensure teams are prepared to respond quickly and effectively to any issues.
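The backup-page fallback Jones describes can be sketched in a few lines. This is a minimal illustration only, assuming a hypothetical `render_with_fallback` helper and a hypothetical static `BACKUP_PAGE`; it is not drawn from PagerDuty or Starbucks code.

```python
from typing import Callable

# Hypothetical static page kept ready in case the promotional page fails.
BACKUP_PAGE = "<h1>Holiday menu</h1><p>Order in store or at the drive-thru.</p>"


def render_with_fallback(render_promo: Callable[[], str],
                         backup: str = BACKUP_PAGE) -> str:
    """Try the promotional page; serve the backup page if rendering fails."""
    try:
        return render_promo()
    except Exception:
        # A production system would also log the error and alert on-call staff.
        return backup
```

The point of the design is that a failure in the promotional page degrades to a simpler page rather than an error screen, so users still have somewhere to go.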
Many retailers now institute what is called a “holiday change freeze” that restricts or blocks changes to their IT systems and applications. Those policies often start in early November, ahead of the heavy holiday season when business picks up significantly, to avoid any missteps.
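In practice, a freeze like this is often enforced as a date check in the deployment pipeline. The sketch below is one possible illustration, assuming a hypothetical freeze window of Nov. 1 through Jan. 2 and a hypothetical `is_deploy_allowed` helper with an emergency override; the dates and names are assumptions, not any retailer's actual policy.

```python
from datetime import date

# Hypothetical freeze window; real retailers choose their own dates.
FREEZE_START = (11, 1)  # Nov. 1
FREEZE_END = (1, 2)     # Jan. 2 of the following year


def in_freeze(day: date) -> bool:
    """Return True if `day` falls inside the year-spanning freeze window."""
    month_day = (day.month, day.day)
    # The window wraps past Dec. 31, so a date is frozen if it is on or
    # after the start OR on or before the end.
    return month_day >= FREEZE_START or month_day <= FREEZE_END


def is_deploy_allowed(day: date, emergency: bool = False) -> bool:
    """Block routine changes during the freeze; allow approved emergencies."""
    return emergency or not in_freeze(day)
```

Under these assumed dates, a routine release on Nov. 8 would be blocked, while one flagged as an approved emergency fix would still go through.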
With its app down, all Starbucks could do was provide updates on its customer service account earlier Friday. The company said it was “currently experiencing a temporary outage of the order-ahead-and-pay feature in our app. We continue to welcome and serve customers in our drive-thru and in-store.”
By mid-morning Friday, Starbucks told CBS that functionality within the app had been restored. It did not, however, indicate what went wrong. Shares of Starbucks were up slightly in early trading Friday despite the two-day outage.
The last time the Starbucks app was down for an extended time was in July, during the global Microsoft Corp.-CrowdStrike Inc. software update outage that roiled airlines, banks, media outlets, retail stores, hospitals and other industries. The resulting imbroglio has led to a series of lawsuits involving CrowdStrike and Delta Air Lines. The outage is estimated to have cost companies worldwide at least $5 billion in lost revenue.
CrowdStrike’s stock plummeted 25% in the days following the digital disruption, wiping out about $22 billion in market value.
One expert likened a flawed IT strategy to forgoing insurance to save a few bucks. “Show me an IT outage and I’ll show you a team that decided they were too smart to need the tools available to prevent that outage,” David Nicholson, chief technology advisor for The Futurum Group, said in an email.
“Outages like this are inevitable, and all you want to do is invest enough in tools, talent, and processes to reduce the frequency of outages to an acceptable level,” Currier said. “I suppose my main advice would be to stay steady, and be brutally open and honest internally, without blowback, about root causes and the cost of remediation and prevention. This conversation happens across all the teams, business and IT/dev alike, so that everyone knows what to expect and there are no demoralizing blame games or promises of perfection.”