You are viewing content from a past/completed QCon - November 2020

Designing Better ML Systems: Learnings From Netflix

What You'll Learn

1. Hear about some of the lessons Netflix learned while building its machine learning infrastructure.

2. Learn about some of the trade-offs to consider when designing or buying a machine learning system.

Data science at Netflix goes well beyond our eponymous recommendation systems. It touches almost all aspects of our business - from optimizing content delivery to making our infrastructure more resilient to failures, and beyond. Our unique culture of freedom & responsibility affords our data scientists extraordinary freedom of choice in ML tools and libraries. The result is an ever-expanding set of ML approaches to tackle interesting problems, and a diverse set of ML tools and systems.

In this talk, we will share some lessons learned in our multi-year journey building and operating ML systems across a wide range of scales - from ad-hoc experimentation on a laptop to large-scale model training & serving systems. We will focus on best practices for managing the lifecycle of various ML systems and on a design philosophy geared towards making data scientists more productive.

What are you working on these days?

I am part of the ML Infrastructure team at Netflix. My team provides infrastructure support for all ML efforts that Netflix is involved in, all the way from when content is pitched to us to when it is played by a subscriber. More specifically, I have been building Metaflow, our open-source ML platform, for the past couple of years.

What are the goals for your talk?

I have been building ML systems at companies, big and small, for close to a decade now. In this talk, I am going to focus on some of the architectural patterns that have worked well for us across different dimensions of scale, as well as design decisions that have allowed our end-users to be highly productive. In December 2019, we open-sourced our ML platform, Metaflow, which incorporates a lot of these learnings. I expect that members of the audience will find value in our experiences thus far.

What do you want people to leave the talk with?

The ML landscape (especially MLOps) is changing very rapidly, and it is non-trivial to design productivity-oriented ML infrastructure for data scientists. I hope some of the trade-offs and design decisions we made will resonate with people, and that they will incorporate them when designing their own ML platforms or evaluating an existing solution.


Savin Goyal

Engineer on the ML Infrastructure team @Netflix
Savin is an engineer on the ML Infrastructure team at Netflix. He focuses on building generalizable infrastructure to accelerate the impact of data science at Netflix.

Tuesday Nov 10 / 01:00PM EST (40 minutes)

Track: Machine Learning for the Software Engineer
