You are viewing content from a past/completed QCon Plus - May 2021

Session

Beating the Speed of Light with Intelligent Request Routing

Network request latency is crucial for many Internet applications. For Netflix it matters even outside video streaming - lower latencies to our AWS cloud endpoints mean a smoother browsing experience for hundreds of millions of members. The catch - the Netflix service is used on hundreds of millions of devices all around the world, connecting to our data centers over the open Internet - an ever-changing global network with many possible paths, distributed ownership and no centralized control.

This talk is about both API acceleration technology and a data-driven approach to building globally distributed systems that are safe to deploy and easy to maintain. While the talk follows Netflix’s journey, its main principles and techniques can be applied by any owner of an Internet-based service.

From this talk you’ll learn:

  • how to build the Internet latency map for your customers;
  • how to leverage the knowledge of network protocols and edge infrastructure to do the impossible - beat the speed of light; 
  • how to use a data-driven approach to evolve your client-server interactions;
  • how to do that with a small team, on a tight schedule and minimal risk to your users.
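
As a rough illustration of the first point, a latency map can be thought of as a table of typical round-trip times from each client population to each candidate endpoint. The sketch below is only an assumption about what such a map might look like, not the method presented in the talk: the region names, endpoint names and sample values are made up, and the median is just one reasonable choice of summary statistic.

```python
from collections import defaultdict
from statistics import median


def build_latency_map(samples):
    """Group RTT samples (in ms) by (client_region, endpoint) and take the median."""
    grouped = defaultdict(list)
    for region, endpoint, rtt_ms in samples:
        grouped[(region, endpoint)].append(rtt_ms)
    return {key: median(values) for key, values in grouped.items()}


def best_endpoint(latency_map, region):
    """Return the endpoint with the lowest median RTT for a region, or None."""
    candidates = {ep: rtt for (reg, ep), rtt in latency_map.items() if reg == region}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Hypothetical measurements: (client region, endpoint, RTT in ms).
samples = [
    ("sao-paulo", "us-east", 140.0),
    ("sao-paulo", "us-east", 160.0),
    ("sao-paulo", "us-west", 210.0),
    ("berlin", "eu-west", 30.0),
    ("berlin", "us-east", 110.0),
]
latency_map = build_latency_map(samples)
```

With a map like this, `best_endpoint(latency_map, "berlin")` picks the endpoint with the lowest median latency for that region. A real system would also have to deal with sample counts, staleness and measurement bias.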

Main Takeaways

1 Find out how to leverage the knowledge of network protocols to improve device-to-server interactions.

2 Learn how to use a data-driven approach to guide optimizations of your network requests.


What is the work that you're doing today?

I'm a Director of Engineering working as part of the content delivery team at Netflix. This is the group that builds and runs the Open Connect CDN, which is deployed at thousands of locations worldwide and delivers all of the Netflix streaming traffic. My main focus today is on finding ways to leverage this distributed CDN infrastructure to optimize device-cloud requests that power personalized Netflix UIs. 

As part of my role, I drive projects in the traffic management space and work with partners across Netflix engineering, from device to cloud platform, to find opportunities to improve the quality of the Netflix experience, identify risks or blind spots, and help inform the evolution of Netflix's end-to-end architecture.

Is that mainly CDN related or are we also talking about other data flows or networks?

This is where my role gets very interesting. Historically, the primary goal of the Content Delivery team was optimizing the delivery of static content, primarily video. We excelled at that by building a heavily distributed and very well-connected global infrastructure, and I was fortunate to spend over 8 years helping with that.

Now I look for opportunities to optimize network interactions beyond static content. Netflix UIs are heavily personalized with data generated by hundreds of microservices running in the AWS cloud. This problem set is different from static content delivery, and it’s been a fun challenge to come up with solutions to address that with smart request routing or network protocol optimizations.

What sort of impact do new protocols on the client have for you, like QUIC, HTTP/2, or anything else that they're using for Netflix?

By now we've run dozens of network experiments, testing protocols like HTTP/2 and TLS 1.3 or playing with TCP configurations. Features like 0-RTT with TLS 1.3 or removing head-of-line blocking with HTTP/2 help performance, often significantly.

Yet when operating at a global scale, with services running on thousands of devices, from the latest iPhone to a 10-year-old smart TV, one needs a more thorough understanding. HTTP/2 might be great for concurrency, but less so if an application already bundles a bunch of requests. Some of the network benefits might be negated by additional compute overhead on the device or server. Looking at application metrics often doesn’t provide a clear signal, so we’ve invested in building an advanced network measurement system that is deployed on devices running Netflix applications and allows us to run controlled experiments to estimate potential tradeoffs.
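
To make the idea of estimating tradeoffs per device concrete, here is a minimal sketch of how results from a controlled experiment could be broken down by device class. It is an illustration under stated assumptions, not a description of Netflix's measurement system: the device classes, variant names and latency values are invented, and comparing medians is only one possible analysis.

```python
from collections import defaultdict
from statistics import median


def median_by_cell(measurements):
    """Median latency (ms) for each (device_class, variant) pair."""
    grouped = defaultdict(list)
    for device_class, variant, latency_ms in measurements:
        grouped[(device_class, variant)].append(latency_ms)
    return {key: median(values) for key, values in grouped.items()}


def relative_change(medians, device_class, control, treatment):
    """Fractional change in median latency of treatment versus control."""
    base = medians[(device_class, control)]
    new = medians[(device_class, treatment)]
    return (new - base) / base


# Hypothetical samples: (device class, protocol variant, request latency in ms).
measurements = [
    ("new-phone", "http1", 100.0), ("new-phone", "http1", 120.0),
    ("new-phone", "http2", 80.0), ("new-phone", "http2", 96.0),
    ("old-tv", "http1", 300.0), ("old-tv", "http1", 320.0),
    ("old-tv", "http2", 310.0), ("old-tv", "http2", 341.0),
]
medians = median_by_cell(measurements)
phone_change = relative_change(medians, "new-phone", "http1", "http2")
tv_change = relative_change(medians, "old-tv", "http1", "http2")
```

In this made-up data the treatment lowers median latency on the newer device but raises it on the older one, which is the kind of per-device difference an aggregate application metric could hide.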

What are your goals for the talk?

I have observed a lot of focus on device-side or client-side technologies, especially at general software engineering conferences. As a general pattern, I’ve seen many engineers treat the network as a black box. One of the primary goals of my talk is to demystify what happens to requests between devices and servers and to share how one should look at their network interactions. I also want to shed some light on the performance impact one can achieve by applying basic knowledge of networking protocols and combining it with data to drive traffic management decisions.

I’ll be talking about a few problems we’ve solved at Netflix, but my main point is to demonstrate a development process that we have found extremely powerful. Our small team of 3 engineers followed it to route requests better, validate the benefits of various transport and application protocol options, and balance traffic across cloud regions. We’ve reaped many engineering productivity benefits by investing in a quick experimentation workflow that I’ll demonstrate and that I think will be very beneficial for others to adopt.

I want the attendees to walk away from my talk empowered to take control of their network requests, ready to try some of the many network optimization options available to them.


Speaker

Sergey Fedorov

Director of Engineering @Netflix

Sergey is a hands-on engineering leader working for the Content Delivery team at Netflix. An early member of the team that built the Open Connect CDN, which delivers 13% of the world's Internet traffic, he spent years building monitoring and data analysis systems for Netflix video streaming. As part of...


Date

Tuesday May 25 / 07:10AM PDT (40 minutes)

Track

Accelerating APIs and Edge Computing

Topics

Architecture, Networking, Traffic Management, AWS, Cloud Computing


From the same track

Session Edge Computing

How Do You Distribute Your Database Over Hundreds of Edge Locations?

Tuesday May 25 / 06:10AM PDT

Writing a database is hard enough as is. Writing a database that seamlessly replicates writes to multiple continents is extremely hard. But AWS, Azure and smaller players like FaunaDB have great solutions in this space. But what if you want to distribute your database over a few hundred locations...

Erwin van der Koogh

Product Manager @Cloudflare

Session Networking

How Facebook Is Bringing QUIC to Billions

Tuesday May 25 / 08:10AM PDT

It took Facebook less than a year to implement IETF QUIC and HTTP/3. It took us another 2 years to enable QUIC and HTTP/3 for billions of people. In this talk we will discuss the unexpected technical challenges we faced, from edge load balancer to mobile clients, and from application tweaking to...

Matt Joras

Software Engineer @Facebook

Yang Chi

Software Engineer @Facebook

PANEL DISCUSSION API

Panel: Living on the Edge

Tuesday May 25 / 09:10AM PDT

When do you need to care about edge optimizations, and when may it not be worth the effort? What does the development and operations workflow look like when operating on hundreds or thousands of edge locations? What are the big challenges today and what’s coming next? Join the discussion...

Jose Nino

Staff Software Engineer @Lyft

Rita Kozlov

Product Manager @CloudflareDev

Ivan Ivanov

Engineering Manager on the CDN Reliability team @Netflix
