At DoorDash, real-time events are an important data source for gaining insight into our business, but building a system capable of handling billions of real-time events is challenging. Events are generated by our services and user devices and need to be processed and transported to different destinations to help us make data-driven decisions on the platform.
Historically, DoorDash has had a few data pipelines that pull data from our legacy monolithic web application and ingest it into Snowflake. These pipelines incurred high data latency, significant cost, and operational overhead. Two years ago, we began building a real-time event processing system named Iguazu to replace the legacy data pipelines and address the following event processing needs we anticipated as data volume grows with the business:
- Data ingestion from heterogeneous data sources, including mobile and web
- Easy access for streaming data consumers through higher-level abstractions
- High data quality with end-to-end schema enforcement
- Scalable, fault tolerant, and easy for a small team to operate
To meet these objectives, we decided to create a new event processing system with a strategy of leveraging open source frameworks that could be customized and integrated with DoorDash’s infrastructure. Stream processing platforms like Apache Kafka and Apache Flink have matured in the past few years and become easy to adopt. By leveraging these excellent building blocks and applying best engineering practices and a platform mindset, we built the system from the ground up and scaled it to process hundreds of billions of events per day with a 99.99% delivery rate and only minutes of latency to Snowflake and other OLAP data warehouses.
This talk will discuss the design of the system in detail, including its major components: event producing, event processing with Flink and streaming SQL, event format and schema validation, Snowflake integration, and the self-serve platform. It will also showcase how we adapted and customized open source frameworks to address our needs and solved challenges such as schema updates and service/resource orchestration in an infrastructure-as-code environment.
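To make the ingestion pattern concrete, the sketch below shows a minimal Flink job that consumes events from a Kafka topic and drops events that fail a schema check before they would reach a warehouse sink. This is an illustration only, not Iguazu's actual code: the topic name, consumer group, and isValid check are hypothetical, and a real pipeline would write to a Snowflake-bound sink rather than print.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class EventIngestSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Consume raw events from a Kafka topic (topic and group names are hypothetical).
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("raw-events")
                .setGroupId("event-ingest-sketch")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
           // Filter out events that fail a placeholder schema check.
           .filter(EventIngestSketch::isValid)
           // A real pipeline would forward valid events to a warehouse-bound sink; print for illustration.
           .print();

        env.execute("event-ingest-sketch");
    }

    // Placeholder for validation against a registered schema.
    private static boolean isValid(String event) {
        return event != null && !event.isEmpty();
    }
}
```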
Speaker
Allen Wang
Software Engineer @DoorDash
Allen Wang is currently a tech lead on the data platform team at DoorDash. He is the architect of the Iguazu event processing system and a founding member of the real-time streaming platform team.
Prior to joining DoorDash, he was a lead on the real-time data infrastructure team at Netflix, where he created the Kafka infrastructure for Netflix’s Keystone data pipeline and was a main contributor in shaping the Kafka ecosystem at Netflix.
He is a contributor to Apache Kafka and NetflixOSS, and a two-time QCon speaker.