You are viewing content from a past QCon (November 2020).

Architecting for Confidence: Building Resilient Systems

For any complex system, there is a wide array of activities that can increase system reliability and operator confidence. Each activity contributes in a different way.

If your system is safety-sensitive, you may invest heavily in pre-production testing strategies. If you want to holistically understand the effect of a change on individual users, you may use a sticky canary. If you don’t know your resource limits or bottlenecks, a load test could be useful. To validate design decisions around reliability mechanisms that don’t get exercised regularly, you may run chaos experiments. All these activities converge to build a stronger system that holds up to the pressures of production, but eventually your operators will have to engage to triage outages. When they do, it’s important they are comfortable doing so.

In this track, we will delve into each of these areas to provide attendees with the tools they need to build resilient systems and empower operators.


Haley Tucker

Senior Software Engineer, Resilience Team @Netflix
Haley Tucker is a member of the Resilience Engineering team at Netflix where she is responsible for improving the reliability of the Netflix ecosystem by supporting developers and building trustable and safe tooling. Prior to that, she worked on the Playback Features team where her services...

Wednesday Nov 4 / 12:00PM EST



