Magic Pocket: Dropbox’s Exabyte-Scale Block Storage System

Magic Pocket is used to store all of Dropbox’s data. It is a horizontally scalable, exabyte-scale block storage system that operates out of multiple regions, maintains 99.99% availability, and provides extremely high durability guarantees, all while being more cost-efficient than operating in the cloud.

The system is able to accommodate new drive technologies, handle millions of queries per second, and automatically identify and repair hundreds of hardware failures per day. We are constantly innovating in this space and work closely with hard drive vendors to adopt the latest drive technology (https://techcrunch.com/2020/10/26/dropbox-begins-shift-to-high-efficiency-western-digital-shingled-magnetic-recording-disks/). Each storage device contains 100+ drives and holds multiple petabytes of data. Given the blast radius of a single device failure, it is critical that our erasure codes and traffic placement are built with this in mind.
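
As a rough sketch of that blast-radius idea (illustrative only: the fragment counts and the one-fragment-per-device placement rule below are assumptions, not Dropbox's actual erasure-coding scheme), spreading a block's erasure-coded fragments across distinct storage devices means the loss of a single multi-petabyte device can cost at most one fragment of any given block:

```python
import random

# Hypothetical parameters for illustration only; the real scheme is not
# described in this abstract.
N_DATA = 8      # data fragments per block
N_PARITY = 4    # parity fragments per block (block survives up to 4 lost fragments)

def place_fragments(devices, n_fragments=N_DATA + N_PARITY):
    """Pick one distinct device per fragment so a single device failure
    destroys at most one fragment of the block (limiting the blast radius)."""
    if len(devices) < n_fragments:
        raise ValueError("not enough devices for fragment-per-device placement")
    return random.sample(devices, n_fragments)

def block_survives(placement, failed_devices, n_parity=N_PARITY):
    """A block remains recoverable if it has lost no more fragments than it has parity."""
    lost = sum(1 for device in placement if device in failed_devices)
    return lost <= n_parity

if __name__ == "__main__":
    devices = [f"device-{i}" for i in range(100)]
    placement = place_fragments(devices)
    # Losing one device costs at most one fragment of this block.
    print(block_survives(placement, failed_devices={placement[0]}))  # True
```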

In this talk we will deep dive into the architecture of Magic Pocket, some early key design patterns that we still live by to this day, and the challenges of operating such a system at this scale in order to be cost efficient and support many critical requirements.

The key takeaways from this talk are:

  • An overview of the architecture of Magic Pocket, including key services, databases, multi-region replication, repairs, and the storage devices themselves.
  • The key architectural lessons that had the most impact on Magic Pocket.
  • How we operate a system of this scale while remaining extremely cost-efficient.

Our system is much cheaper to run than operating in the cloud, but it must meet a high bar. We discuss these challenges in more detail, and what the trade-offs look like, for others considering a similar transition.


Speaker

Facundo Agriel

Software Engineer / Tech Lead @Dropbox

Facundo is currently the tech lead for Dropbox's exabyte-scale block storage system. His team manages everything from customized storage machines with many petabytes of capacity to the client APIs other teams use internally. Prior to Dropbox, Facundo worked at Amazon on a variety of scheduling problems for its last-mile delivery team.


From the same track

Session

Azure Cosmos DB: Low Latency and High Availability at Planet Scale

Azure Cosmos DB is a fully managed, multi-tenant, distributed, shared-nothing, horizontally scalable database that provides planet-scale capabilities and multi-model APIs for Apache Cassandra, MongoDB, Gremlin, Tables, and Core (SQL).

Mei-Chin Tsai

Partner Director of Software Engineering @Microsoft

Vinod Sridharan

Principal Software Engineering Architect @Microsoft

Session

Honeycomb: How We Used Serverless to Speed Up Our Servers

Honeycomb is the state of the art in observability: customers send us lots of data and then compose complex, ad hoc queries. Most are simple, some are not. Some are REALLY not; this load is complex, spontaneous, and urgent.

Jessica Kerr

Principal Developer Evangelist @honeycombio

Session

From Zero to A Hundred Billion: Building Scalable Real Time Event Processing At DoorDash

At DoorDash, real-time events are an important data source for gaining insight into our business, but building a system capable of handling billions of real-time events is challenging.

Allen Wang

Software Engineer @DoorDash

Session

DynamoDB: Evolution of a Hyper-Scale Cloud Database Service

Amazon DynamoDB is a cloud database service that provides consistent performance at any scale. Hundreds of thousands of customers rely on DynamoDB for its fundamental properties: consistent performance, availability, durability, and a fully managed serverless experience.

Akshat Vig

Principal Engineer, NoSQL Databases @awscloud