From Zero to A Hundred Billion: Building Scalable Real Time Event Processing At DoorDash

At DoorDash, real-time events are an important data source for gaining insight into our business, but building a system capable of handling billions of real-time events is challenging. Events are generated by our services and user devices and need to be processed and transported to different destinations to support data-driven decisions across the platform.

Historically, DoorDash had a few data pipelines that pulled data from our legacy monolithic web application and ingested it into Snowflake. These pipelines incurred high data latency, significant cost, and operational overhead. Two years ago, we began building a real-time event processing system named Iguazu to replace the legacy data pipelines and address the following event processing needs we anticipated as data volume grew with the business:

  • Data ingestion from heterogeneous data sources, including mobile and web
  • Easy access for streaming data consumers through higher-level abstractions
  • High data quality with end-to-end schema enforcement (see the sketch after this list)
  • Scalable, fault-tolerant, and easy for a small team to operate
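
As a rough, hypothetical sketch of what produce-side schema enforcement can look like (the talk covers DoorDash's actual approach), the example below rejects events that fail validation before they are published. SchemaRegistry, Schema, and ValidatingProducer are illustrative stand-ins, not a specific library's API; a real system might back the registry with Confluent Schema Registry or an in-house service.

```java
import java.util.Map;

// Hypothetical registry interface for looking up a topic's schema.
interface SchemaRegistry {
    Schema latestFor(String topic);
}

// Hypothetical schema abstraction with a structural conformance check.
interface Schema {
    boolean conforms(Map<String, Object> event);
}

final class ValidatingProducer {
    private final SchemaRegistry registry;

    ValidatingProducer(SchemaRegistry registry) {
        this.registry = registry;
    }

    /** Rejects events that do not match the topic's registered schema,
     *  so malformed data never reaches downstream consumers. */
    void send(String topic, Map<String, Object> event) {
        Schema schema = registry.latestFor(topic);
        if (!schema.conforms(event)) {
            throw new IllegalArgumentException(
                "event does not match schema for topic " + topic);
        }
        // ... hand off to the actual Kafka producer here ...
    }
}
```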

To meet those objectives, we decided to create a new event processing system with a strategy of leveraging open source frameworks that could be customized and better integrated with DoorDash’s infrastructure. Stream processing platforms like Apache Kafka and Apache Flink have matured in the past few years and become easy to adopt. By leveraging those excellent building blocks, applying engineering best practices, and taking a platform mindset, we built the system from the ground up and scaled it to process hundreds of billions of events per day with a 99.99% delivery rate and only minutes of latency to Snowflake and other OLAP data warehouses.
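
To give a flavor of the Kafka-plus-Flink building blocks the abstract mentions, a minimal Flink job consuming events from Kafka might look like the sketch below. This is not Iguazu's implementation: the broker address, topic, and consumer group are hypothetical placeholders, and the real pipeline adds schema validation, transformation, and warehouse sinks.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class EventConsumerSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Kafka source: broker, topic, and group id are placeholders.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("app-events")
                .setGroupId("event-processor")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
           // Drop empty payloads; a real pipeline would validate schemas,
           // transform events, and write to warehouse sinks instead.
           .filter(event -> !event.isEmpty())
           .print();

        env.execute("event-consumer-sketch");
    }
}
```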

The talk will discuss the design of the system in detail, including its major components: event producing, event processing with Flink and streaming SQL, event format and schema validation, Snowflake integration, and the self-serve platform. It will also show how we adapted and customized open source frameworks to address our needs, and how we solved challenges such as schema updates and service/resource orchestration in an infrastructure-as-code environment.
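
To make the "streaming SQL" component concrete, the sketch below shows the general shape of a continuous query in Flink SQL via the Table API. The table name, columns, and connector options are assumptions for illustration, not Iguazu's actual event schema.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class StreamingSqlSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Hypothetical Kafka-backed table; names and options are illustrative.
        tEnv.executeSql(
                "CREATE TABLE order_events (" +
                "  order_id STRING," +
                "  status STRING," +
                "  ts TIMESTAMP(3)," +
                "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND" +
                ") WITH (" +
                "  'connector' = 'kafka'," +
                "  'topic' = 'order-events'," +
                "  'properties.bootstrap.servers' = 'kafka:9092'," +
                "  'scan.startup.mode' = 'latest-offset'," +
                "  'format' = 'json'" +
                ")");

        // A continuous windowed aggregation expressed as SQL instead of
        // hand-written stream operators; runs until cancelled.
        tEnv.executeSql(
                "SELECT status, COUNT(*) AS cnt " +
                "FROM order_events " +
                "GROUP BY status, TUMBLE(ts, INTERVAL '1' MINUTE)")
            .print();
    }
}
```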


Speaker

Allen Wang

Software Engineer @DoorDash

Allen Wang is currently a tech lead on the data platform team at DoorDash. He is the architect of the Iguazu event processing system and a founding member of the real-time streaming platform team.

Prior to joining DoorDash, he was a lead on the real-time data infrastructure team at Netflix, where he created the Kafka infrastructure for Netflix’s Keystone data pipeline and was a main contributor in shaping the Kafka ecosystem at Netflix.

He is a contributor to Apache Kafka and NetflixOSS, and a two-time QCon speaker.


Date

Wednesday Nov 30 / 12:50PM PST (50 minutes)

Topics

Architecture, Scalability, Event Processing, Pipelines, Streaming SQL, Self-Serve Platform, Frameworks, Schema, Orchestration, Infrastructure


From the same track

Session: Architecture

Azure Cosmos DB: Low Latency and High Availability at Planet Scale

Wednesday Nov 30 / 11:40AM PST

Azure Cosmos DB is a fully managed, multi-tenant, distributed, shared-nothing, horizontally scalable database that provides planet-scale capabilities and multi-model APIs for Apache Cassandra, MongoDB, Gremlin, Tables, and the Core (SQL) APIs.

Mei-Chin Tsai

Partner Director of Software Engineering @Microsoft

Vinod Sridharan

Principal Software Engineering Architect @Microsoft

Session: Architecture

Honeycomb: How We Used Serverless to Speed Up Our Servers

Wednesday Nov 30 / 10:30AM PST

Honeycomb is the state of the art in observability: customers send us lots of data and then compose complex, ad hoc queries. Most are simple, some are not. Some are REALLY not; this load is complex, spontaneous, and urgent.

Jessica Kerr

Principal Developer Evangelist @honeycombio

Session: Architecture

Magic Pocket: Dropbox’s Exabyte-Scale Blob Storage System

Wednesday Nov 30 / 02:00PM PST

Magic Pocket is used to store all of Dropbox’s data.

Facundo Agriel

Software Engineer / Tech Lead @Dropbox, previously @Amazon

Session: Architecture

Amazon DynamoDB: Evolution of a Hyper-Scale Cloud Database Service

Wednesday Nov 30 / 09:20AM PST

Amazon DynamoDB is a cloud database service that provides consistent performance at any scale. Hundreds of thousands of customers rely on DynamoDB for its fundamental properties: consistent performance, availability, durability, and a fully managed serverless experience.

Akshat Vig

Principal Engineer NoSQL databases @awscloud