You are viewing content from a past/completed QCon Plus - November 2020


Designing Better ML Systems: Learnings From Netflix

Data Science usage at Netflix goes far beyond our eponymous recommendation systems. It touches almost all aspects of our business, from optimizing content delivery to making our infrastructure more resilient to failures. Our unique culture of freedom & responsibility affords our data scientists extraordinary freedom of choice in ML tools and libraries, which results in an ever-expanding set of ML approaches to interesting problems and a diverse set of ML tools and systems.

In this talk, we will share some lessons learned in our multi-year journey building and operating ML systems across a wide range of scale, from ad-hoc experimentation on a laptop to large-scale model training & serving systems. We will focus on best practices for managing the lifecycle of various ML systems and a design philosophy geared towards making data scientists more productive.

Main Takeaways

1 Hear about some of the lessons learned by Netflix building their machine learning infrastructure.

2 Learn about some of the tradeoffs to consider when designing or buying a machine learning system.

What are you working on these days?

I am part of the ML Infrastructure team at Netflix. My team provides infrastructure support for all ML efforts at Netflix, all the way from when content is pitched to us to when it is played by a subscriber. More specifically, I have been building Metaflow, our open-source ML platform, for the past couple of years.

What are the goals for your talk?

I have been building ML systems at companies, big and small, for close to a decade now. In this talk, I am going to focus on some of the architectural patterns that have worked well for us across different dimensions of scale, as well as design decisions that have allowed our end users to be highly productive. In December 2019, we open-sourced our ML platform, Metaflow, which incorporates a lot of these learnings. My expectation is that members of the audience will find value in our experiences thus far.

What do you want people to leave the talk with?

The ML landscape (especially MLOps) is changing very rapidly, and it is non-trivial to design productivity-oriented ML infrastructure for data scientists. I hope some of the trade-offs and design decisions we made will resonate with people, and that they will incorporate them when designing their own ML platforms or evaluating an existing solution.


Savin Goyal

Engineer on the ML Infrastructure team @Netflix

Savin is an engineer on the ML Infrastructure team at Netflix. He focuses on building generalizable infrastructure to accelerate the impact of data science at Netflix.


From the same track


Data-Driven Development in the Automotive Field

Tuesday Nov 10 / 01:50PM EST

In the new era of mobility, solving many challenging tasks of autonomous driving is not possible with classical software development. This is because we cannot write every rule by hand, and thus we would like to learn it from a huge amount of data recorded by different driving...

Toshika Srivastava

AI expert @Audi


Scaling & Optimising the Training of Predictive Models

Tuesday Nov 10 / 02:40PM EST

Modern Machine Learning has brought with it countless advances, both algorithmically and with respect to tooling; there is relentless growth on all fronts. Nevertheless, we are faced with a multitude of challenges when trying to pull all these threads of progress together in a meaningful way,...

Nicholas Mitchell

Machine Learning Engineer at @argoai


Panel: The Purpose of Machine Learning

Tuesday Nov 10 / 03:30PM EST

In the machine learning world, there is no shortage of buzzwords. Each trend follows the next in rapid succession. During the panel, we will discuss how and when our panelists decide to pick up on a trend and when to focus on proven technology. We delve into these decisions based on their...

Thomas van Heyningen

Data Science Consultant at NAVARA

Nicholas Mitchell

Machine Learning Engineer at @argoai

Jendrik Jördening

CTO @Nooxit

Diana Hu

Startup Advisor and former CTO for Escher Reality
