Operating Microservices

Building and operating distributed systems is hard, and microservices are no different. Learn strategies for not just building services but operating them at scale.


Michelle Brush

Engineering Manager, SRE @Google

Michelle Brush is a math geek turned computer geek with 20 years of software development experience. She has developed algorithms and data structures for pathfinding, search, compression, and data mining in embedded as well as distributed systems. In her current role as an SRE Manager for Google, she leads the teams of SREs that ensure GCP's APIs are reliable. Previously, she served as the Director of HealtheIntent Architecture for Cerner Corporation, responsible for the data processing platform for Cerner’s Population Health solutions. Prior to her time at Cerner, she was the lead engineer for Garmin's automotive routing algorithm.

Tuesday Nov 10 / 09:00AM PST


From this track


Solving Mysteries Faster with Observability

Tuesday Nov 10 / 11:40AM PST

Everyone loves a good mystery, but not when it involves operating our services. Investigating production issues in a microservice architecture can make you feel like a detective, combing through evidence and gathering clues to reconstruct the scene of a crime, all while the clock is ticking. You hop from log store to dashboard, digging for details as you strive to unravel what really happened. All this time spent investigating is expensive, for engineers as well as customers -- and even then, finding an issue is not the same as resolving it!  

Edgar is a tool used and built by Netflix engineers to quickly investigate and solve production issues. Edgar starts with distributed tracing, which shows a request’s path through a complex system. But the request’s path is only a small part of the data available about a request. Dozens of dashboards hold their own insights on what happened, and it takes time for engineers to jump between individual dashboards. Edgar strives to get all this data in one place, supplementing traces with additional context like log correlation, metadata about services, and intelligent analysis. Not only does this help Netflix engineers investigate more efficiently, it empowers our customer service operations to access the same information. Our engineers and customer service operations rely on Edgar so they can quickly get our members back to enjoying their favorite movies and shows. 
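As a rough illustration of the idea, and not Edgar's actual implementation, supplementing a trace with logs and service metadata can be sketched as a join on a shared trace ID. Every name below (`Span`, `LogLine`, `enrich_trace`, the sample services and owners) is hypothetical and chosen only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    """One hop of a request through a service (hypothetical shape)."""
    trace_id: str
    service: str
    duration_ms: float


@dataclass
class LogLine:
    """A log record that carries the trace ID of the request it belongs to."""
    trace_id: str
    service: str
    message: str


def enrich_trace(trace_id, spans, logs, service_metadata):
    """Return one record per span of the trace, with matching logs and metadata attached."""
    # Group the log messages for this trace by the service that emitted them.
    logs_by_service = defaultdict(list)
    for log in logs:
        if log.trace_id == trace_id:
            logs_by_service[log.service].append(log.message)

    # Attach logs and static service metadata to each span of the trace.
    return [
        {
            "service": span.service,
            "duration_ms": span.duration_ms,
            "logs": logs_by_service.get(span.service, []),
            "metadata": service_metadata.get(span.service, {}),
        }
        for span in spans
        if span.trace_id == trace_id
    ]


# Made-up sample data for a single request.
spans = [
    Span("t1", "api-gateway", 12.0),
    Span("t1", "playback", 840.0),
    Span("t2", "api-gateway", 9.0),
]
logs = [
    LogLine("t1", "playback", "timeout calling license service"),
    LogLine("t2", "api-gateway", "request ok"),
]
service_metadata = {"playback": {"owner": "playback-team"}}

enriched = enrich_trace("t1", spans, logs, service_metadata)
for record in enriched:
    print(record)
```

The point of the sketch is that once logs carry the same trace ID as the spans, the slow `playback` hop, its error log, and its owning team show up in a single record instead of three separate tools.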

You’ll leave this talk with an understanding of how we enhance distributed tracing with additional context like logs and metadata to resolve issues. You’ll see examples of how Edgar has paid off and hear about challenges we faced. Finally, we’ll inspire you to leverage the data you already have to help you and your team solve mysteries faster.

Elizabeth Carretto Senior Software Engineer @Netflix
