From Zero to A Hundred Billion: Building Scalable Real Time Event Processing At DoorDash

At DoorDash, real-time events are an important data source for gaining insight into our business, but building a system capable of handling billions of real-time events is challenging. Events are generated by our services and user devices and need to be processed and transported to different destinations to support data-driven decisions across the platform.

Historically, DoorDash had a few data pipelines that pulled data from our legacy monolithic web application and ingested it into Snowflake. These pipelines incurred high data latency, significant cost, and operational overhead. Two years ago, we began building a real-time event processing system named Iguazu to replace the legacy data pipelines and address the following event processing needs we anticipated as data volume grew with the business:

  • Data ingestion from heterogeneous data sources, including mobile and web
  • Easy access for streaming data consumers through higher-level abstractions
  • High data quality with end-to-end schema enforcement (see the sketch after this list)
  • Scalable, fault-tolerant, and easy for a small team to operate
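
As a rough, hypothetical sketch of what produce-side schema enforcement can look like (the talk covers DoorDash's actual approach), the example below rejects events that fail validation before they are published. SchemaRegistry, Schema, and ValidatingProducer are illustrative stand-ins, not a specific library's API; a real system might back the registry with Confluent Schema Registry or an in-house service.

```java
import java.util.Map;

// Hypothetical registry interface for looking up a topic's schema.
interface SchemaRegistry {
    Schema latestFor(String topic);
}

// Hypothetical schema abstraction with a structural conformance check.
interface Schema {
    boolean conforms(Map<String, Object> event);
}

final class ValidatingProducer {
    private final SchemaRegistry registry;

    ValidatingProducer(SchemaRegistry registry) {
        this.registry = registry;
    }

    /** Rejects events that do not match the topic's registered schema,
     *  so malformed data never reaches downstream consumers. */
    void send(String topic, Map<String, Object> event) {
        Schema schema = registry.latestFor(topic);
        if (!schema.conforms(event)) {
            throw new IllegalArgumentException(
                "event does not match schema for topic " + topic);
        }
        // ... hand off to the actual Kafka producer here ...
    }
}
```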

To meet those objectives, we decided to create a new event processing system with a strategy of leveraging open source frameworks that could be customized and better integrated with DoorDash’s infrastructure. Stream processing platforms like Apache Kafka and Apache Flink have matured in the past few years and become easy to adopt. By leveraging those excellent building blocks, applying engineering best practices, and taking a platform mindset, we built the system from the ground up and scaled it to process hundreds of billions of events per day with a 99.99% delivery rate and only minutes of latency to Snowflake and other OLAP data warehouses.
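
To give a flavor of the Kafka-plus-Flink building blocks the abstract mentions, a minimal Flink job consuming events from Kafka might look like the sketch below. This is not Iguazu's implementation: the broker address, topic, and consumer group are hypothetical placeholders, and the real pipeline adds schema validation, transformation, and warehouse sinks.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class EventConsumerSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Kafka source: broker, topic, and group id are placeholders.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("app-events")
                .setGroupId("event-processor")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
           // Drop empty payloads; a real pipeline would validate schemas,
           // transform events, and write to warehouse sinks instead.
           .filter(event -> !event.isEmpty())
           .print();

        env.execute("event-consumer-sketch");
    }
}
```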

The talk will discuss the design of the system in detail, including its major components: event producing, event processing with Flink and streaming SQL, event format and schema validation, Snowflake integration, and the self-serve platform. It will also show how we adapted and customized open source frameworks to address our needs, and how we solved challenges such as schema updates and service/resource orchestration in an infrastructure-as-code environment.
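
To make the "streaming SQL" component concrete, the sketch below shows the general shape of a continuous query in Flink SQL via the Table API. The table name, columns, and connector options are assumptions for illustration, not Iguazu's actual event schema.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class StreamingSqlSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Hypothetical Kafka-backed table; names and options are illustrative.
        tEnv.executeSql(
                "CREATE TABLE order_events (" +
                "  order_id STRING," +
                "  status STRING," +
                "  ts TIMESTAMP(3)," +
                "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND" +
                ") WITH (" +
                "  'connector' = 'kafka'," +
                "  'topic' = 'order-events'," +
                "  'properties.bootstrap.servers' = 'kafka:9092'," +
                "  'scan.startup.mode' = 'latest-offset'," +
                "  'format' = 'json'" +
                ")");

        // A continuous windowed aggregation expressed as SQL instead of
        // hand-written stream operators; runs until cancelled.
        tEnv.executeSql(
                "SELECT status, COUNT(*) AS cnt " +
                "FROM order_events " +
                "GROUP BY status, TUMBLE(ts, INTERVAL '1' MINUTE)")
            .print();
    }
}
```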


Speaker

Allen Wang

Software Engineer @DoorDash

Allen Wang is currently a tech lead on the data platform team at DoorDash. He is the architect of the Iguazu event processing system and a founding member of the real-time streaming platform team.

Prior to joining DoorDash, he was a lead on the real-time data infrastructure team at Netflix, where he created the Kafka infrastructure for Netflix’s Keystone data pipeline and was a main contributor in shaping the Kafka ecosystem at Netflix.

He is a contributor to Apache Kafka and NetflixOSS, and a two-time QCon speaker.


Date

Wednesday Nov 30 / 12:50PM PST (50 minutes)

Topics

Architecture, Scalability, Event Processing, Pipelines, Streaming SQL, Self-Serve Platform, Frameworks, Schema, Orchestration, Infrastructure


From the same track

Session: Architecture

Azure Cosmos DB: Low Latency and High Availability at Planet Scale

Wednesday Nov 30 / 11:40AM PST

Azure Cosmos DB is a fully managed, multi-tenant, distributed, shared-nothing, horizontally scalable database that provides planet-scale capabilities and multi-model APIs for Apache Cassandra, MongoDB, Gremlin, Tables, and the Core (SQL) APIs.

Mei-Chin Tsai

Partner Director of Software Engineering @Microsoft

Vinod Sridharan

Principal Software Engineering Architect @Microsoft

Session: Architecture

Honeycomb: How We Used Serverless to Speed Up Our Servers

Wednesday Nov 30 / 10:30AM PST

Honeycomb is the state of the art in observability: customers send us lots of data and then compose complex, ad hoc queries. Most are simple, some are not. Some are REALLY not; this load is complex, spontaneous, and urgent.

Jessica Kerr

Principal Developer Evangelist @honeycombio

Session: Architecture

Magic Pocket: Dropbox’s Exabyte-Scale Blob Storage System

Wednesday Nov 30 / 02:00PM PST

Magic Pocket is used to store all of Dropbox’s data.

Facundo Agriel

Software Engineer / Tech Lead @Dropbox, previously @Amazon

Session: Architecture

Amazon DynamoDB: Evolution of a Hyper-Scale Cloud Database Service

Wednesday Nov 30 / 09:20AM PST

Amazon DynamoDB is a cloud database service that provides consistent performance at any scale. Hundreds of thousands of customers rely on DynamoDB for its fundamental properties: consistent performance, availability, durability, and a fully managed serverless experience.

Akshat Vig

Principal Engineer NoSQL databases @awscloud