How We Saved 70K Cores Across 30 Mission-Critical Services (Large-Scale, Semi-Automated Go GC Tuning @Uber)

December 22, 2021 / Global
Figure 1: GC CPU cost of Example Service #1
Figure 2: GC CPU cost of Example Service #1
Figure 3: Example heap with default configuration.
Figure 4: Normal operation. Default configuration on the left, manually tuned on the right.
Figure 5: Double the load. Default configuration on the left, manually tuned on the right.
Figure 6: Double the load, but using the tuner. Default configuration on the left, GOGCTuner tuned on the right.
Figure 7: Graph for intervals between GCs.
Figure 8: Graph for p99 GC CPU cost.
Figure 9: Graph for estimated p99 live dataset.
Figure 10: Graph for min, p50, p99 GOGC value assigned to the application by the tuner.
Figure 11: Example code for GC triggered events.
Figure 12: An observability service that runs on thousands of compute cores and has a high standard deviation for live_dataset (the maximum value was 10X the lowest) showed a ~65% reduction in p99 CPU utilization.
Figure 13: A mission-critical Uber Eats service that runs on thousands of compute cores showed a ~30% reduction in p99 CPU utilization.
Cristian Velazquez

Cristian Velazquez is a Staff Software Engineer on the Maps Production Engineering team at Uber. He works on efficiency initiatives across multiple organizations and has carried out tuning work across multiple services and multiple Java versions.

Posted by Cristian Velazquez
