Session

Designing IoT Data Pipelines for Deep Observability

What You'll Learn

1. Hear how Tesla deals with large data ingestion and processing.

2. Find out what some of the challenges of collecting and processing IoT data are, and how to deal with them.


Millions of IoT devices emitting trillions of events per day enable us to track the health of the Tesla fleet. From a data engineering perspective, it's a challenging scale, but what makes it unique is how naturally fragile the data pipeline is. The physical world is full of chaos: what if network partitions lasted days, not seconds? Ever suffered data loss due to sub-zero temperatures? As these faults propagate, they threaten the quality of data-driven insights.

In this talk, we will share recipes that immunize IoT data pipelines against these faults from the get-go. We have built these recipes around a toolchain called dataflow. While designed for IoT, dataflow concepts generalize well. Learn how to apply these techniques to draw insights such as a drop in coverage, anomalous patterns & end-to-end processing latencies.

What's the work you do today?

I work in the core data platform team at Tesla. The mission of the Data Platforms team is to support and enhance Tesla products with data science, services, and platforms, by:

  • Ingesting and serving high-volume time-series sensor data
  • Providing analytical tools and products
  • Empowering teams to build scalable pipelines 

What are your goals for the talk?

I have two goals for my talk. The first is to share techniques that we have developed to guarantee fault tolerance in our pipelines and to make them very ops-friendly. We are an extremely small, ops-focused team that owns all mission-critical data systems, and we can do that by keeping a really strong hold on how we operate these pipelines. I will also delve into a toolchain we have built to serve this purpose, which has cut down operational toil & allows us to move fast without compromising on quality.

What are the takeaways?

I have different groups in mind. I hope engineers come away with an interest in bringing order to chaotic data pipelines. Data scientists might get interested in self-serve data systems that let them not only work with data but also verify data quality and more. And product managers might get excited about being able to connect IoT data to end-user experiences.

In IoT there are new failure modes, such as complete disconnection or data backing up for a long time. Are there existing patterns in CS for this, or is this a totally new field?

I don't think this is a totally new field. What is different, not necessarily new, is the limit to which these things are getting pushed. I believe IoT has finally arrived. The buzz has been there for a while, and there has been some innovation. But limits are being tested now. If we think data is big now, it's going to get bigger. We need paranoia-driven development. In IoT, you can trust nothing; all assumptions are tested. We need better tools for privacy, governance, access control, and auditing. We also need specialized data formats & compression.


SPEAKER

Shrijeet Paliwal

Sr. Staff Software Engineer @Tesla

Shrijeet is a founding member of the data platform team at Tesla, tasked with building a scalable, self-serve, cost-efficient data platform that caters to everything data at Tesla. His work ranges from improving server provisioning & automation to writing multi-petabyte time series storage. Prior to Tesla, Shrijeet contributed to data infrastructure at Pinterest and Rocket Fuel.

DATE

Tuesday Nov 17 / 10:50AM PST (40 minutes)

TRACK

Modern Data Engineering
