You are viewing content from a past/completed QCon Plus - November 2020

Session

Designing IoT Data Pipelines for Deep Observability

Millions of IoT devices emitting trillions of events per day enable us to track the health of the Tesla fleet. From a data engineering perspective, it's a challenging scale, but what makes it unique is how naturally fragile the data pipeline is. The physical world is full of chaos: what if network partitions lasted days, not seconds? Ever suffered data loss due to sub-zero temperatures? As these faults propagate, they threaten the quality of data-driven insights.

In this talk, we will share recipes that immunize IoT data pipelines against these faults from the get-go. We have built these recipes around a toolchain called dataflow. While designed for IoT, dataflow concepts generalize well. Learn how to apply these techniques to draw insights such as a drop in coverage, anomalous patterns, and end-to-end processing latencies.

Main Takeaways

1 Hear how Tesla deals with large data ingestion and processing.

2 Find out some of the challenges of collecting and processing IoT data, and how to deal with them.


What's the work you do today?

I work in the core data platform team at Tesla. The mission of the Data Platforms team is to support and enhance Tesla products with data science, services, and platforms, by:

  • Ingesting and serving high-volume time-series sensor data
  • Providing analytical tools and products
  • Empowering teams to build scalable pipelines

What are your goals for the talk?

I have two goals for my talk. The first is to share techniques that we have developed to guarantee fault tolerance in our pipelines and also make them very ops-friendly. We are an extremely small, ops-focused team that owns all mission-critical data systems, and we can do that by having a really strong hold on how we operate these pipelines. I will delve into a toolchain we have built to serve this purpose, which has cut down operational toil and allows us to move fast without compromising on quality.

What are the takeaways?

I have different groups in mind. I hope engineers come away with an interest in bringing order to chaotic data pipelines. Data scientists might get interested in having self-serve data systems that let them not only work with data but also verify data quality and more. And product managers might get excited about being able to connect IoT data to end user experiences.

In IoT there are new failure modes: complete disconnection, or data backing up for a long time. Are there existing patterns in CS for this, or is this a totally new field?

I don't think this is a totally new field. What is different, not necessarily new, is the limit to which these things are getting pushed. I believe IoT has finally arrived. The buzz has been there for a while, and there has been some innovation. But limits are being tested now. If we think data is big now, it's going to get bigger. We need paranoia-driven development. In IoT, you can trust nothing; all assumptions are tested. We need better tools for privacy, governance, access control, and auditing. We also need specialized data formats and compression.


Speaker

Shrijeet Paliwal

Sr. Staff Software Engineer @Tesla

Shrijeet is a founding member of the data platform team at Tesla, tasked with building a scalable, self-serve, cost-efficient data platform that caters to everything data at Tesla. He has his hands in everything from improving server provisioning & automation to writing multi-petabyte time series storage....

Date

Tuesday Nov 17 / 01:50PM EST (40 minutes)

Track

Modern Data Engineering

From the same track

Session

Building Latency Sensitive User Facing Analytics via Apache Pinot

Tuesday Nov 17 / 02:40PM EST

Real-time analytics has become the need of the hour for modern Internet companies. The ability to derive internal insights around business metrics, user growth & adoption as well as security incidents from all the raw logs is crucial for day to day operation. Even more critical is enabling...

Chinmay Soman

PMC Member/Committer @SamzaStream

PANEL DISCUSSION

Modern Data Engineering Panel

Tuesday Nov 17 / 03:30PM EST

Data Engineering is a vast field that concerns itself with efficient access to data based on the needs of a business. Though data is the prized entity from which a company extracts insights, data doesn't exist in a void. It first needs to be stored somewhere and then an API needs to be...

Shrijeet Paliwal

Sr. Staff Software Engineer @Tesla

Chinmay Soman

PMC Member/Committer @SamzaStream

Chris Riccomini

Distinguished Engineer @WePay

Session

Serverless Search for My Blog With Java, Quarkus, & AWS Lambda

Tuesday Nov 17 / 01:00PM EST

A Serverless app? With Java?! Absolutely! We’ll discuss when Serverless is a great fit (and when it isn’t!) and why you don’t need to leave the Java platform when going Serverless. Based on the real-world example of a Serverless blog search, you’ll learn how Quarkus and...

Gunnar Morling

Open Source Software Engineer @RedHat
