Tammy Bryant Butow

Principal Site Reliability Engineer @Gremlin

Tammy Butow is the principal SRE at Gremlin, where she works on Chaos Engineering, the facilitation of controlled experiments to identify systemic weaknesses. Gremlin helps engineers build resilient systems using its control plane and API. Tammy previously led SRE teams at Dropbox responsible for databases and storage systems used by over 500 million customers. Prior to this, Tammy worked at DigitalOcean and at one of Australia’s largest banks in security engineering, product engineering, and infrastructure engineering. Tammy is the co-founder of Girl Geek Academy, a movement to teach 1 million girls technical skills by 2025.


Operational Excellence Panel

Being on call for a production system can be stressful whether it is your first time or you have been carrying a pager for years. When that alert goes off, will you be prepared? Will your system reliability mechanisms behave as intended? If not, are you able to debug and understand what’s going on? This roundtable pulls together software engineers and site reliability engineers with experience operating complex systems in production.

Topics are likely to include designing for operability, mitigation techniques, testing strategies, and lessons learned. As an audience member, you will also have the chance to ask the panel questions.


Wednesday Nov 4 / 12:30PM PST (40 minutes)


Architecting for Confidence: Building Resilient Systems




Observing and Understanding Failures: SRE Apprentices

In this session, Tammy will share how Padawans and Jedis can inspire and teach us how to help people of a wide variety of backgrounds, ages, and experience levels to observe and understand failures in production. Tammy will share how she and a colleague created an SRE Apprentice program to hire and train new SREs who wanted a career change. Tammy will cover practical lessons learned and things she'd change, and she'll also share how you can create and roll out a program for SRE Apprentices within your organization. Tammy will also share feedback from the SRE Apprentices themselves. Is it difficult to observe and understand failures? Why is training from someone more experienced helpful? What are the hardest and easiest things to learn about observing and understanding failures as an SRE for 500 million+ users?


Tuesday May 18 / 08:00AM PDT (40 minutes)


Observability and Understandability in Production


Observability, DevOps, Incident Management



