Distributed Systems for Developers

Computer science in practice. An applied track that fuses the human side of computer science with the technical choices made along the way.


Cindy Sridharan

Cindy Sridharan is a distributed systems engineer. She's the author of a book on Distributed Systems Observability with O'Reilly and the co-author of an upcoming book on distributed systems engineering in the cloud. She runs the Prometheus user group in San Francisco, has reviewed several technical books, and has served on the program committees of leading industry conferences on systems engineering. She lives in San Francisco. In her spare time she enjoys hiking the gorgeous outdoors of the Bay Area, reading way too many papers, and occasionally blogging about building resilient and maintainable systems.

Wednesday Nov 11 / 09:00AM PST


From this track


Change Data Capture for Distributed Databases @Netflix

Wednesday Nov 11 / 10:40AM PST

At Netflix, we have hundreds of microservices that rely on hybrid backends ranging from RDS or NoSQL to Elasticsearch or Iceberg. Distributed, high-throughput applications need hybrid backends because no single database can handle all the access patterns. A classic example of this pattern is Apache Cassandra for robustness and resilience paired with Elasticsearch for search capabilities. Keeping data in sync among these data stores is a challenging problem, usually solved by individual teams building their own sync processes and audits, which adds complexity and operational overhead. Change Data Capture (CDC) addresses this by providing a stream of all changes seen on a database.

Capturing CDC events from NoSQL databases with Active-Active setups, like Apache Cassandra, poses unique challenges due to data partitioning and replication. Current CDC solutions for this rely on running within the database cluster and provide a stream with duplicated events. Our solution takes a more efficient approach by de-duping the stream in a stream processing framework. This involves keeping a distributed copy of the source DB in a stream processing system like Apache Flink, which enables better handling of the CDC stream since we can have before and after images of row changes.
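
The de-duping idea above can be illustrated with a minimal sketch in plain Python. It is not the speakers' implementation: the `ChangeEvent` fields, the `dedupe` function, and the assumption that every replica reports the same write with an identical key, column, and write timestamp are all hypothetical, and an in-memory set stands in for state that a stream processor would hold.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class ChangeEvent:
    # Hypothetical event shape, chosen for illustration only.
    key: str        # row key of the changed row
    column: str     # column that was written
    value: str      # new value of that column
    write_ts: int   # writer-assigned timestamp, assumed identical on every replica
    replica: str    # replica that reported the event


def dedupe(events: Iterable[ChangeEvent]) -> Iterator[ChangeEvent]:
    """Yield each logical write once, however many replicas report it."""
    seen = set()  # stand-in for keyed state in a stream processor
    for ev in events:
        ident = (ev.key, ev.column, ev.write_ts)
        if ident in seen:
            continue  # another replica already delivered this write
        seen.add(ident)
        yield ev


# One logical write reported by three replicas, plus a later write.
stream = [
    ChangeEvent("user:1", "city", "Oslo", 100, "replica-a"),
    ChangeEvent("user:1", "city", "Oslo", 100, "replica-b"),
    ChangeEvent("user:1", "city", "Oslo", 100, "replica-c"),
    ChangeEvent("user:1", "city", "Bergen", 200, "replica-b"),
]
unique = list(dedupe(stream))
```

In this sketch `unique` holds two events, one per logical write. A real pipeline would also need to expire old identities so the state does not grow without bound.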

In this talk, we will cover the challenges associated with capturing CDC events from Cassandra (C*) and how we efficiently provide a data stream of change events while maintaining minimal load on the C* cluster. We will discuss the Flink ecosystem and how the backing store of RocksDB is used to make sure the data stream has full row changes instead of the differences between rows that C* CDC provides.
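
To make the difference between row deltas and full row changes concrete, here is a minimal sketch, again in plain Python and not the talk's actual code. The `apply_change` helper and the dictionary `state` are hypothetical; the dictionary stands in for a per-key backing store such as the RocksDB-backed state mentioned above.

```python
from typing import Dict, Tuple

Row = Dict[str, str]


def apply_change(state: Dict[str, Row], key: str, delta: Row) -> Tuple[Row, Row]:
    """Merge a partial update into the stored row and return (before, after) images."""
    before = dict(state.get(key, {}))  # copy, so later writes cannot alter it
    after = {**before, **delta}        # columns in the delta overwrite older values
    state[key] = after                 # keep the latest full row for the next change
    return before, after


state: Dict[str, Row] = {}  # stand-in for the stream processor's keyed state

# The source emits only the columns that changed.
first = apply_change(state, "user:1", {"name": "Ann"})
second = apply_change(state, "user:1", {"city": "Oslo"})
```

Here `second` carries the before image `{"name": "Ann"}` and the after image `{"name": "Ann", "city": "Oslo"}`, even though the incoming delta contained only the `city` column. Deletes and out-of-order events are deliberately left out of this sketch.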

Raghuram Onti Srinivasan, Senior Software Engineer @Netflix

Tharanga Gamaethige, Senior Software Engineer - Core Data Platform Team @Netflix
