Hudi: Uber Engineering’s Incremental Processing Framework on Apache Hadoop

12 March 2017 / Global
Figure 1: Lambda architecture requires double compute and double serving.
Figure 2: Kappa architecture simplifies computing by unifying processing, but serving complexity still exists.
Figure 3: Hudi simplifies serving for workloads tolerating minute-level latency.
Figure 4: The above diagram demonstrates the distribution of use-cases across different latencies and completeness levels at Uber.

Figure 5: Hudi Storage Internals. The diagram depicts commit times in YYYYMMDDHHMISS format, simplified here as HH:SS.
Figure 6: Hudi datasets filter the latest versions and merge them with the log before serving records.
Figure 7: Hudi enables chaining computations so modeled tables can be served in Hadoop.
Vinoth Chandar

Vinoth Chandar is a staff engineer on Uber's Mobile Engineering team.

Posted by Prasanna Rajaperumal, Vinoth Chandar
