
Building a Large-scale Transactional Data Lake at Uber Using Apache Hudi

June 9, 2020 / Global
Figure 1. Apache Hudi ingests change logs, events and incremental streams to serve different use cases by exposing different views on the table.
Figure 2. Hudi’s copy-on-write feature enables us to perform file-level updates, drastically improving data freshness.
Figure 3. The Apache Hudi team at Uber developed a data compaction strategy for merge-on-read tables that frequently converts recent partitions to a columnar format, thereby limiting query-side compute cost.
Figure 4. Apache Hudi use cases include data analytics and infrastructure health monitoring.
Nishith Agarwal

Nishith Agarwal currently leads the Hudi project at Uber and works largely on data ingestion. His interests lie in large-scale distributed systems. Nishith was one of the first engineers on Uber’s data team and helped scale Uber’s data platform to over 100 petabytes while reducing data latency from hours to minutes.

Posted by Nishith Agarwal
