You are viewing content from a past conference: QCon Plus, November 2020.

PANEL DISCUSSION

Operational Excellence Panel

Being on call for a production system can be stressful whether it is your first time or you have been carrying a pager for years. When that alert goes off, will you be prepared? Will your system reliability mechanisms behave as intended? If not, are you able to debug and understand what’s going on? This roundtable pulls together software engineers and site reliability engineers with experience operating complex systems in production.

Topics are likely to include designing for operability, mitigation techniques, testing strategies, and lessons learned. As an audience member, you will also have the chance to ask the panel questions.


Speaker

Carissa Blossom

Member of Production Engineering team for Eats & Delivery @Uber

Carissa Blossom is a member of the Production Engineering team for Eats & Delivery at Uber. She is also an Incident Commander for Ring0, Uber Engineering’s primary task force for critical outage mitigation. In her four years at Uber, she has served as Production Engineer for Eats,...

Speaker

Tammy Bryant Butow

Principal Site Reliability Engineer @Gremlin

Tammy Butow is the principal SRE at Gremlin, where she works on Chaos Engineering, the facilitation of controlled experiments to identify systemic weaknesses. Gremlin helps engineers build resilient systems using its control plane and API. Tammy previously led SRE teams at Dropbox...

Speaker

Suudhan Rangarajan

Senior Software Engineer @Netflix

Suudhan Rangarajan works on the Playback API team at Netflix, responsible for ensuring that customers receive the best possible playback experience every time they click play. A few dozen playback microservices fill a key role in enabling Netflix to stream amazing content to 125M+ members on...

From the same track

Session

User Simulation for Rapid Outage Mitigation

Wednesday Nov 4 / 02:40PM EST

Uber operates in over 10,000 cities across the world, with different offerings and features in each market. People all over the world rely on Uber for critical aspects of their daily lives. Uber can be their source of income, their commuting strategy, or their ride to the hospital. In COVID...

Carissa Blossom

Member of Production Engineering team for Eats & Delivery @Uber

Session

Less Mess, Less Stress: The Reliability Benefits of Custom Tools

Wednesday Nov 4 / 01:00PM EST

Tooling is an often overlooked, yet critical, component of infrastructure for most engineering teams. Adopting new infrastructure is easier than ever thanks to the cloud-native movement, open-source community, and the proliferation of Infrastructure-as-a-Service (IaaS) providers. However, each...

Daniel Hochman

Platform Engineer @Lyft

Session

A Sticky Situation: How Netflix Gains Confidence in Changes

Wednesday Nov 4 / 01:50PM EST

How do you know whether a change will affect end users in a negative way? As interactions in distributed systems grow increasingly complex, it can be challenging to get an answer to this question. One approach is to use a canary in which we introduce a new service into the environment, users...

Haley Tucker

Senior Software Engineer, Resilience Team @Netflix
