
Session

Production Readiness: Fighting Fires or Building Better Systems?

In 2018 Tanya Reilly gave a talk called ‘The History of Fire Escapes’ in which she argues that we need to ‘focus on better software, not better incident response’. When I was recently asked how much time SREs should spend firefighting, that talk came to mind. The ideal amount of time to spend responding to incidents is 0 (which we never achieve, but that’s the goal). A healthy SRE team spends most of its time improving the systems it is responsible for.

Reilly also argues that ‘software needs a fire code’: a set of best practices to prevent or mitigate failures, just as building regulations have evolved over time to reduce the incidence and impact of building fires.

We don’t have fire codes for software, but many organisations use Production Readiness Reviews (PRRs) as a model for improving the reliability of software systems. This talk will discuss why we don’t have a fire code for software; what PRRs can and cannot achieve in terms of reliability; the difference between PRRs run by a team that is onboarding a service and PRRs run by consulting teams; and what to do when all your team does is fight fires.


Speaker

Laura Nolan

Site Reliability Engineer @Slack, Contributor to Seeking SRE, & SRECon Steering Committee

Laura Nolan's background is in Site Reliability Engineering, software engineering, distributed systems, and computer science. She wrote the 'Managing Critical State' chapter in the O'Reilly 'Site Reliability Engineering' book, as well as contributing to the more recent...


Date

Wednesday Nov 10 / 08:10AM PST (40 minutes)

Track

Production Readiness

Topics

Production Readiness, DevOps, Infrastructure, SRE, Distributed Programming


From the same track

Session Production Readiness

Prod Lessons - Deployment Validation and Graceful Degradation

Wednesday Nov 10 / 09:10AM PST

Key to Site Reliability Engineering is building frameworks and “guardrails” that enable the product to be developed safely. If patterns can be identified in outages and bugs, preventing those problems systematically gives SRE unparalleled leverage to improve stability. During...

Anika Mukherji

Software Engineer @Pinterest

Session Production Readiness

Incident Response

Wednesday Nov 10 / 10:10AM PST

The presentation details are currently being prepared.

Nora Jones

Founder and CEO @jeli_io

PANEL DISCUSSION Production Readiness

Production Readiness Panel

Wednesday Nov 10 / 11:10AM PST

What does it mean for an app to truly be ready for production? Join Ines Sombra (Senior Director of Engineering at Fastly) as she discusses production readiness. She will be joined by engineers from incident response, SRE, and chaos engineering for a lively discussion about production.

Kolton Andrus

Founder and CEO of @GremlinInc


Deep-dive with world-class software leaders at QCon Plus (Nov 1-12, 2021).
