Jun He

Sr. Software Engineer in the Big Data Orchestration Team @Netflix

Jun He is a Sr. Software Engineer in the Big Data Orchestration team at Netflix, where he is responsible for building the big data workflow scheduler that manages and automates ML and data pipelines at Netflix. Prior to Netflix, he spent a few years building distributed services and search infrastructure at Airbnb, where he was the main contributor to the design and implementation of the message bus and search pipeline. He has spent the majority of his career in different areas related to distributed systems and infrastructure.


Robust Foundation for Data Pipelines at Scale - Lessons From Netflix

At Netflix, Data/ML pipelines are widely used and have become central to the business. They serve a wide range of use cases that go beyond recommendations, predictions and data transformations. As big data and ML gain presence and become more impactful, the scalability and stability of the ecosystem have become increasingly important for our data scientists and the company.

Over the past years, we have developed a robust foundation composed of multiple cloud services and libraries that gives users a consistent way to define, execute and monitor units of work. In the big data and ML space, our foundation is responsible for reliably executing a large number of Data/ML workflows containing tens of thousands of parallel jobs, in addition to event-driven triggers and conditional branches.

In this talk, we will share our experiences of building and operating the orchestration platform for Netflix’s big data ecosystem. We will talk about the challenges we faced in managing hundreds of thousands of pipelines and the lessons we learned in automating them over the past years, covering fair resource allocation, scaling problems, and security concerns. We will also share best practices for workflow lifecycle management and our design philosophy for workflow automation, including the patterns we developed and the approaches we took.


Thursday May 27 / 10:10AM EDT (40 minutes)

Track: Modern Data Pipelines and Data Mesh

Topics: Architecture, Data Pipeline, Machine Learning, Database

Build your learning journey and level-up on the skills most in-demand in 2021. Attend QCon Plus (May 17-28, 2021).
