Stress Free Change Validation at Netflix

How do you gain confidence that a system modification does what it’s supposed to do? A refactoring should not cause a functional change, whereas a feature modification should cause a specific kind of change. Tests are great for validating assertions one has already thought of, but for sufficiently complex systems it is infeasible or cost-prohibitive to verify the long tail of possible outcomes through testing. Without high confidence in our code changes, it is difficult to move quickly without breaking things.

I built a framework at Netflix that allows our engineers to define basic characteristics of a service API, and then replays pairs of identical requests against two versions of the service, looking for differences in the responses. When differences are found, the framework filters out noise using a variety of techniques that effectively bring it down to zero. I will share examples of how this has helped accelerate development by unlocking stress-free refactoring in my team’s most critical service, and how we’ve used it to perform large technical debt migrations in record time.
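To make the replay-and-diff idea concrete, here is a minimal Python sketch. It is not the framework from the talk, whose internals are not described here. It assumes one common noise-filtering approach: send the same request to the baseline version twice, treat any field that differs between those two baseline responses as noise, and report only the remaining differences against the candidate version. All names (`replay_diff`, `flatten`, `differing_paths`, the `bitrate` and `trace_id` fields, and the toy services) are hypothetical.

```python
import itertools
from typing import Any, Callable, Dict, FrozenSet, Set, Tuple

Response = Dict[str, Any]
Service = Callable[[Dict[str, Any]], Response]

_MISSING = object()  # sentinel so a missing field differs from a field set to None


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into {"a.b[0]": leaf} paths."""
    if isinstance(value, dict):
        items = [(f"{prefix}.{k}" if prefix else str(k), v) for k, v in value.items()]
    elif isinstance(value, list):
        items = [(f"{prefix}[{i}]", v) for i, v in enumerate(value)]
    else:
        return {prefix: value}
    flat: Dict[str, Any] = {}
    for path, child in items:
        flat.update(flatten(child, path))
    return flat


def differing_paths(left: Response, right: Response) -> Set[str]:
    """Return the field paths whose values differ between two responses."""
    a, b = flatten(left), flatten(right)
    return {p for p in a.keys() | b.keys() if a.get(p, _MISSING) != b.get(p, _MISSING)}


def replay_diff(
    request: Dict[str, Any],
    baseline: Service,
    candidate: Service,
    ignored: FrozenSet[str] = frozenset(),
) -> Dict[str, Tuple[Any, Any]]:
    """Replay one request and return {path: (baseline_value, candidate_value)}.

    Fields that differ between two calls to the same baseline version are
    assumed to be noise (timestamps, trace IDs, and so on) and are dropped.
    """
    first = baseline(request)
    second = baseline(request)
    noise = differing_paths(first, second) | set(ignored)

    candidate_response = candidate(request)
    a, b = flatten(first), flatten(candidate_response)
    real = differing_paths(first, candidate_response) - noise
    return {p: (a.get(p), b.get(p)) for p in real}


# Toy services (hypothetical): trace_id changes on every call, bitrate is a real change.
_trace_ids = itertools.count()


def baseline_service(req: Dict[str, Any]) -> Response:
    return {"title": req["title"], "bitrate": 4000, "trace_id": next(_trace_ids)}


def candidate_service(req: Dict[str, Any]) -> Response:
    return {"title": req["title"], "bitrate": 6000, "trace_id": next(_trace_ids)}


result = replay_diff({"title": "demo"}, baseline_service, candidate_service)
print(result)
```

In this toy run the ever-changing `trace_id` is classified as noise, so only the `bitrate` change is reported. A real system would also have to control for non-idempotent dependencies and side effects, which this sketch does not attempt.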

Key Takeaways

  1. Why we need a high-confidence change process for maintainable code bases
  2. How zero-noise diffs help close the confidence gap left by tests and canaries
  3. Recommended practices for building a diff system that controls for non-idempotent dependencies and (some) side effects

Speaker

Javier Fernandez-Ivern

Senior Software Engineer @Netflix

Javier Fernandez-Ivern is a member of the Playback Experience team at Netflix, where he is responsible for ensuring that customers always enjoy their favorite shows with the best video, audio, text, and other features available. His services fill a key role in enabling Netflix to stream amazing content to more than 220M members on thousands of devices worldwide. Prior to Netflix, Javier spent a few years working at a competitive programming startup before moving into a consulting role where he built web applications for a variety of clients. After trying out management at Capital One, he returned to his software engineering roots and joined Netflix. Javier enjoys developing and operating highly available services, and the scale at Netflix has been a unique and exciting challenge. Javier received an MS in Computer Science from Eastern Washington University.

Date

Thursday Dec 1 / 11:20AM PST (50 minutes)

Topics

Architecture

From the same track

Session Architecture

Adopting Continuous Delivery at Lyft

Thursday Dec 1 / 09:00AM PST

All organizations, regardless of size, need to be able to make rapid changes and improvements in their constantly growing systems. How can we handle all this change while maintaining a reliable product? 

Tom Wanielista

Senior Staff Software Engineer @Lyft

Session Architecture

Dark Side of DevOps

Thursday Dec 1 / 10:10AM PST

Topics like “you build it, you run it” and “shifting testing/security/data governance left” are popular: moving things to the earlier stages of software development, empowering engineers, and shifting control definitely sound good.

Mykyta Protsenko

Senior Software Engineer @Netflix

Session Architecture

Log4Shell Response Patterns & Learnings From Them

Thursday Dec 1 / 12:30PM PST

In early December 2021, rumors about a remote code execution (RCE) vulnerability in Log4j, dubbed Log4Shell, began circulating on social media. Over the next three days, those rumors were confirmed and the immense scope of the vulnerability became clear.

Tapabrata Pal

Vice President of Architecture @Fidelity