Magic Pocket stores all of Dropbox’s data. It is a horizontally scalable, exabyte-scale blob storage system that operates out of multiple regions, maintains 99.99% availability, and offers extremely high durability guarantees, while being more cost efficient than operating in the cloud.
The system accommodates new drive technology, handles millions of queries per second, and automatically identifies and repairs hundreds of hardware failures per day. We are constantly innovating in this space and work closely with hard drive vendors to adopt the latest drive technology (https://techcrunch.com/2020/10/26/dropbox-begins-shift-to-high-efficiency-western-digital-shingled-magnetic-recording-disks/). Each storage device contains 100+ drives and holds multiple petabytes. Because a single device failure has such a large blast radius, it is critical that our erasure codes and traffic are both designed with this in mind.
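To make the blast-radius point concrete, here is a minimal sketch of the arithmetic involved. The 20 TB drive size, the 6+3 erasure-code layout, and all function names are hypothetical values chosen for illustration; they are not Magic Pocket's actual parameters or code.

```python
def device_capacity_pb(drives_per_device: int, drive_tb: float) -> float:
    """Raw capacity of one storage device in petabytes (1 PB = 1000 TB)."""
    return drives_per_device * drive_tb / 1000


def storage_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Bytes stored per byte of user data for a k+m erasure code."""
    return (data_fragments + parity_fragments) / data_fragments


def survives_device_loss(fragments_on_one_device: int, parity_fragments: int) -> bool:
    """A k+m stripe can be rebuilt after losing at most m fragments.

    If one device holds more than m fragments of the same stripe,
    losing that single device makes the stripe unrecoverable.
    """
    return fragments_on_one_device <= parity_fragments


# Hypothetical numbers: 100 drives of 20 TB each, 6 data + 3 parity fragments.
capacity = device_capacity_pb(100, 20.0)
overhead = storage_overhead(6, 3)
spread_out = survives_device_loss(1, 3)  # one fragment per device
co_located = survives_device_loss(4, 3)  # four fragments on one device
print(capacity, overhead, spread_out, co_located)
```

Under these assumed numbers, one device failure takes a couple of petabytes offline at once, which is why fragments of the same stripe should be spread across devices rather than only across drives.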
In this talk we will take a deep dive into the architecture of Magic Pocket, some early key design patterns that we still live by to this day, and the challenges of operating a system at this scale while staying cost efficient and supporting many critical requirements.
The key takeaways for this talk are:
- An overview of the architecture of Magic Pocket, including key services, databases, how multi-region replication works, repairs, and a discussion of the storage devices.
- The key architecture lessons that had the most impact on Magic Pocket.
- How we are able to operate such a system while being extremely cost efficient.
Our system is much cheaper than operating in the cloud, but it has to meet a high operational bar. We discuss these challenges and trade-offs in more detail for others looking to make this transition.
Interview:
What is the focus of your work these days?
I'm currently leading the team that manages Magic Pocket. Magic Pocket is Dropbox's exabyte-scale blob storage system. It’s the storage system for Dropbox and internal use cases. Our focus right now is scaling Magic Pocket for continued growth, working with hardware vendors to adopt the latest technology that will drive efficiencies, and improving the reliability of the system. Due to its size there's always something interesting going on.
What's the motivation behind your talk?
The motivation of my talk is to provide an overview of the architecture behind Magic Pocket and how we're able to efficiently manage such a system. I also want to share the lessons we learned along the way as the system has gone through several iterations. For example, we've built a cold storage tier, we've adopted SMR technology, performed very large migrations, and so on. I hope that people find all of these different projects as interesting as our team has.
How would you describe the persona and level of the target audience for your session?
I would say anyone who is interested in large distributed systems or storage at scale, or anyone curious about how to run such a system. Perhaps your team is thinking about building something like this and you're trying to assess the tradeoffs, or you're just looking for some inspiration for building an on-premises storage solution like this.
What would you like this persona to walk away with after watching your presentation?
I hope that folks out there walk away with some inspiration after learning more about Magic Pocket. I also hope that the lessons learned and the tradeoffs we have made along the way resonate with projects people are working on at any scale, including projects outside of storage systems.
Speaker
Facundo Agriel
Software Engineer / Tech Lead @Dropbox, previously @Amazon
Facundo is currently the tech lead for Dropbox's exabyte-scale blob storage system. His team manages everything from customized storage machines with many petabytes of capacity to the client APIs other teams use internally. Prior to working at Dropbox, Facundo worked at Amazon on a variety of scheduling problems for Amazon's last-mile delivery team.