Engineering, Data / ML

Balancing HDFS DataNodes in the Uber DataLake

14 March / Global
Figure 2: One of our biggest clusters, comprising thousands of DataNodes with hundreds of PBs of capacity, has skewed DataNodes.
Figure 3: HDFS Balancer Architecture.
Figure 4: Existing Algorithm.
Figure 5: New Algorithm.
Figure 6: New Hadoop Configuration for Defining Percentile.
Figure 7: New Hadoop Configuration for Defining Aggressive Balancing.
Figure 8: Old Algorithm – Pairs Formed.
Figure 9: Old Algorithm – New Over-Utilized Nodes Came Up.
Figure 10: New Algorithm – Preferred Optimization.
Figure 11: Snapshots of our Metrics Dashboard.
Figure 12: DataNodes in our biggest cluster at a similar utilization level, below 85%, after the algorithm change.
Figure 13: Panels showing that the DataNode skew is reduced.
Figure 14: Before balancer algorithm changes – 50.8% of DataNodes have usage above 90%.
Figure 15: After balancer algorithm changes – DataNodes with usage above 90% fall to 0%.
Figure 16: One of our clusters with low utilization, around 65%.
Figure 17: Cluster utilization increased to around 83% for the same cluster.
Figure 18: Throughput increased by more than 3x due to the algorithm changes.
Atul Kaushik

Atul Kaushik is a Software Engineer II on the Data storage team at Uber. He has been working on optimizations related to DataNode balancing and developing HDFS Quota solutions for customers at Uber.

Yangjun Zhang

Yangjun is a Staff Software Engineer on the Data storage team at Uber. He has been working on reliability, efficiency, and modernization improvements for the HDFS data plane.

Posted by Atul Kaushik, Yangjun Zhang