The Eternal Sunshine of the Toil-Less Prod

One of the most important decisions in building an SRE practice is what kind of work should be assigned to the SRE team, and in what proportions. At Red Hat, we ship OpenShift both as a product and as a service, which can make it especially difficult to draw the line between feature development and toil automation work. In addition, we face the usual SRE struggle between striving to minimize toil and unintentionally devaluing Ops-type work. In this talk, we will discuss the evolution from shipping products to running services, and what we've learned while trying different approaches.


Speaker

Sasha Rosenbaum

Director of the Cloud Services Black Belt Team @RedHat

Sasha is a Director of the Cloud Services Black Belt team at Red Hat, where she helps enterprise customers migrate successfully to Managed OpenShift on the public cloud of their choice.

In her career, Sasha has worked in development, operations, consulting, and cloud architecture. Sasha is an organizer of DevOpsDays Chicago, a chair of DeliveryConf, and a published author.

Date

Tuesday Dec 6 / 12:30PM PST (50 minutes)

Topics

SRE

From the same track

Session SRE

Did the Chaos Test Pass?

Tuesday Dec 6 / 10:10AM PST

People used to ask me all the time how to figure out if their chaos test has “passed,” and I’d always say “well, that’s a loaded question.” To confirm that a chaos test “passed,” we need to do verification of hypotheses - sometimes you’re trying to prove some system behavior occurred in response

Christina Yakomin

Senior Site Reliability Engineering Specialist @Vanguard_Group

Session SRE

The Endgame of SRE

Tuesday Dec 6 / 11:20AM PST

The containers are deployed and the builds are green. YAML flows through the system, linted, reviewed, tested, and shipped with ease and regularity. Our intrepid SRE finds themself at a crossroads. The infrastructure is great but teams still struggle to maintain error budgets.

Amy Tobey

Senior Principal Engineer and SRE Practice Leader @Equinix

Session SRE

Rethinking Reliability: What You Can (and Can't) Learn From Incidents

Tuesday Dec 6 / 09:00AM PST

This talk presents research collected from the VOID—an open database of public incident reports. Containing over 2,000 reports for almost 700 organizations, the database allows for more structured review and research about software-related incident reporting.

Courtney Nash

Internet Incident Librarian & Senior Research Analyst @Verica