Carissa Blossom

Member of Production Engineering team for Eats & Delivery @Uber

Carissa Blossom is a member of the Production Engineering team for Eats & Delivery at Uber. She is also an Incident Commander for Ring0, Uber Engineering’s primary task force for critical outage mitigation. In her four years at Uber, she has served as Production Engineer for Eats, Software Networking Edge and Marketplace.


User Simulation for Rapid Outage Mitigation

Uber operates in over 10,000 cities across the world, with different offerings and features in each market. People all over the world rely on Uber for critical aspects of their daily lives. Uber can be their source of income, their commuting strategy, or their ride to the hospital. In COVID times, our role is all the more critical, with Uber Eats acting as the central source of income for an increasing number of restaurants and couriers. With so much at stake, we don’t have the luxury of waiting for aggregate production tracking metrics to show that some discernible percentage of users is experiencing an issue. Changes to our architecture of 4,000+ microservices are rolled out by both engineering and operations teams, using multiple tools at the city, zone, region and global levels, at a frequency that is impossible to coordinate. In this environment, how does Uber maintain a high level of reliability and prevent outages before they are felt by users?

In this talk, I walk through the independent, external monitoring service that Uber developed to identify issues in production at the individual city level all across the globe, and how we leveraged composable integration tests simulating thousands of diverse test users to cut our time to mitigation in half. Attendees will also learn how Uber surfaces and predicts its most dire outages and how the combination of machine learning and tracing enables us to reliably narrow down the root cause of an outage.


Wednesday Nov 4 / 11:40AM PST (40 minutes)

TRACK Architecting for Confidence: Building Resilient Systems

Operational Excellence Panel

Being on call for a production system can be stressful whether it is your first time or you have been carrying a pager for years. When that alert goes off, will you be prepared? Will your system reliability mechanisms behave as intended? If not, are you able to debug and understand what’s going on? This roundtable pulls together software engineers and site reliability engineers with experience operating complex systems in production.

Topics are likely to include designing for operability, mitigation techniques, testing strategies, and lessons learned. As an audience member, you will also have the chance to ask the panel questions.


Wednesday Nov 4 / 12:30PM PST (40 minutes)

TRACK Architecting for Confidence: Building Resilient Systems
