You are viewing content from a past/completed QCon

Less Mess, Less Stress: The Reliability Benefits of Custom Tools

What You'll Learn

1. How an overreliance on vendor tooling leads to worse reliability outcomes.

2. How Lyft lowered MTTR for its most common alerts using custom tooling.

3. How Clutch can extend to your organization's operational use cases.

Tooling is an often overlooked, yet critical, component of infrastructure for most engineering teams. Adopting new infrastructure is easier than ever thanks to the cloud-native movement, the open-source community, and the proliferation of Infrastructure-as-a-Service (IaaS) providers. However, each new infrastructure component comes with its own set of configuration, tooling, logs, and metrics, resulting in increased cognitive load. Additionally, initial tooling strategies rarely scale to more complicated architectures or more stringent SLAs.

Clutch is an open-source platform from Lyft that empowers organizations to take control of their operator experience. While augmenting or replacing existing tools requires investment, it comes with significant benefits for developers and end-users. Clutch aims to streamline key actions in order to reduce the mean time to resolve incidents (MTTR), lower onboarding costs for new engineers, lower overall cognitive load when interacting with infrastructure, improve developer productivity, and even prevent incidents by eliminating the chance of accidents during normal maintenance.


Daniel Hochman

Platform Engineer @Lyft

Daniel Hochman is the tech lead of the platform tools team at Lyft and the creator of Clutch, the open-source platform for infrastructure management. As an early engineer at Lyft, Daniel successfully guided the platform through the explosion of product and organizational growth. He wrote one of the first microservices at Lyft, storing and indexing location data at millions of queries per second. In addition, he implemented several key platform elements to make it easy for engineering teams to build scalable systems while also allowing them to deliver new features quickly.


Wednesday Nov 4 / 10:00AM PST (40 minutes)

TRACK Architecting for Confidence: Building Resilient Systems
