How We Saved 70K Cores Across 30 Mission-Critical Services (Large-Scale, Semi-Automated Go GC Tuning @Uber)
December 22, 2021

Introduction
As part of Uber Engineering's wider efforts to reach profitability, our team recently focused on reducing the cost of compute capacity by improving efficiency. Some of the most impactful work was around GOGC optimization. In this blog post we want to share our experience with a highly effective, low-risk, large-scale, semi-automated Go GC tuning mechanism.
Uber’s tech stack is composed of thousands of microservices, backed by a cloud-native, scheduler-based infrastructure. Most of these services are written in Go. Our team, Maps Production Engineering, has previously played an instrumental role in significantly improving the efficiency of multiple Java services by tuning GC. At the beginning of 2021, we explored the possibilities of having a similar impact on Go-based services. We ran several CPU profiles to assess the current state of affairs and we found that GC was the top CPU consumer for a vast majority of mission-critical services. Here is a representation of some CPU profiles where GC (identified by the runtime.scanobject method) is consuming a significant portion of allocated compute resources.
[Figure: CPU profiles for Service #1 and Service #2, with runtime.scanobject consuming a significant share of CPU]
Emboldened by this finding, we set out to tune GC for the relevant services. To our delight, Go's GC implementation and the simplicity of tuning allowed us to automate the bulk of the detection and tuning mechanism. We detail our approach and its impact in the following sections.
GOGC Tuner
The Go runtime invokes a concurrent garbage collector at periodic intervals unless a triggering event occurs first. The triggering events are based on memory back pressure. Because of this, GC-impacted Go services benefit from more memory, since it reduces how often GC has to run. In addition, we realized that our host-level CPU-to-memory ratio is 1:5 (1 core : 5 GB RAM), while most Go services were configured with a ratio between 1:1 and 1:2. Therefore, we were confident that we could trade additional memory for a lower GC CPU impact. This is a service-agnostic mechanism that can yield a large impact when applied judiciously.
Delving deep into Go's garbage collection is beyond the scope of this article, but here are the relevant bits for this work: garbage collection in Go is concurrent and involves analyzing all objects to identify which ones are still reachable. We call the set of reachable objects the "live dataset." Go offers only one knob, GOGC, expressed as a percentage of the live dataset, to control garbage collection. The GOGC value acts as a multiplier for the dataset: its default value of 100% means the Go runtime will reserve the same amount of memory for new allocations as the size of the live dataset. For instance:
hard_target = live_dataset + live_dataset * (GOGC / 100).
The pacer is then in charge of predicting the best time to trigger GC in order to avoid hitting the hard target; it works toward a soft target below it.
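To make the formula concrete, here is a minimal illustrative sketch (the 100 MiB live dataset is a made-up number, not one of our services) showing how the hard target follows GOGC, and how GOGC can be changed at runtime through debug.SetGCPercent:

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// With a hypothetical 100 MiB live dataset and the default GOGC of 100,
	// the hard target is 100 + 100*(100/100) = 200 MiB: the runtime lets the
	// heap roughly double before the next collection.
	live, gogc := 100.0, 100.0
	fmt.Printf("hard target: %.0f MiB\n", live+live*(gogc/100))

	// GOGC can also be changed at runtime; SetGCPercent returns the previous
	// value. GOGC=300 would move the target for the same live dataset to ~400 MiB.
	prev := debug.SetGCPercent(300)
	fmt.Println("previous GOGC:", prev)
}
```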
Dynamic and Diverse: One Size Does Not Fit All
We identified that tuning based on a fixed GOGC value is not suitable for services at Uber. Some of the challenges are:
- It is not aware of the maximum memory assigned to the container and can cause out of memory issues.
- Our microservices have significantly diverse memory utilization profiles. For example, a sharded system can have very different live datasets. We experienced this in one of our services where the p99 utilization was 1 GB but the p1 was 100 MB; as a result, the 100 MB instances were taking a huge GC hit, as the sketch below illustrates.
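To make that concrete, here is a small worked sketch (the 2.5 GiB container limit and the GOGC values are hypothetical) showing why no single fixed GOGC serves both the 1 GB and the 100 MB instances well:

```go
package main

import "fmt"

// target returns the GC hard target for a given live dataset and GOGC value,
// following hard_target = live_dataset + live_dataset * (GOGC / 100).
func target(liveMB, gogc float64) float64 {
	return liveMB + liveMB*(gogc/100)
}

func main() {
	const containerMB = 2560.0 // hypothetical 2.5 GiB memory limit per instance

	// GOGC=100 keeps the p99 (1 GB live dataset) instances comfortably under the
	// limit, but the p1 (100 MB) instances trigger GC after only ~100 MB of new
	// allocations and burn CPU collecting far more often than they need to.
	fmt.Println(target(1024, 100) < containerMB, target(100, 100)) // true 200

	// Raising GOGC to 1000 gives the small instances headroom (~1.1 GB target),
	// but the large instances would now aim for ~11 GB and get OOM-killed.
	fmt.Println(target(1024, 1000) < containerMB, target(100, 1000)) // false 1100
}
```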
A Case for Automation
The pain points previously presented are the reason for the conception of GOGCTuner. GOGCTuner is a library which simplifies the process of tuning garbage collection for service owners and adds a reliability layer on top of it.
GOGCTuner dynamically computes the correct GOGC value in accordance with the container's memory limit (or the upper limit provided by the service owner) and sets it using Go's runtime API; a rough sketch of this computation follows the feature list below. These are the specifics of the GOGCTuner library's features:
- Simplified configuration for easier reasoning and deterministic calculations. GOGC at 100% is not clear to beginner Go developers, and it is not deterministic because it still depends on the size of the live dataset. A 70% limit, on the other hand, ensures that the service is always going to use 70% of the heap space.
- Protection against OOMs (Out Of Memory): the library reads the memory limit from the cgroup and uses a default hard limit of 70%, a safe value from our experience.
  - It is important to note that there is a limit to this protection. The tuner can only adjust the buffer allocation, so if your service's live objects exceed the limit, the tuner sets a default lower limit of 1.25x your live-object utilization.
- Allow higher GOGC values for corner cases like the following:
  - As we mentioned above, manual GOGC is not deterministic; we are still relying on the size of the live dataset. What if live_dataset doubles our last peak value? GOGCTuner would enforce the same memory limit at the cost of more CPU, whereas manual tuning could instead cause OOMs. Therefore, service owners used to give plenty of buffer for these types of scenarios.
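The library itself is internal, but under the constraints described above, a minimal sketch of the core GOGC computation might look roughly like this (memLimitBytes, hardLimitRatio, and tuneGOGC are hypothetical names, and HeapAlloc is only a crude stand-in for the live dataset; the real GOGCTuner reads the limit from the cgroup and re-evaluates the value dynamically):

```go
package main

import (
	"runtime"
	"runtime/debug"
)

// Hypothetical configuration for illustration; the real library reads the
// container memory limit from the cgroup.
const (
	memLimitBytes  = 5 << 30 // assume a 5 GiB container limit
	hardLimitRatio = 0.70    // default hard limit: 70% of the container memory
)

// tuneGOGC picks a GOGC value so that the GC hard target
// (live + live*GOGC/100) stays below hardLimitRatio*memLimitBytes.
func tuneGOGC() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	// HeapAlloc is used here as a rough proxy for the live dataset.
	live := float64(m.HeapAlloc)
	limit := hardLimitRatio * float64(memLimitBytes)

	gogc := (limit/live - 1) * 100
	if gogc < 25 {
		// Floor corresponding to the 1.25x live-dataset lower limit
		// mentioned above: never set GOGC below 25.
		gogc = 25
	}
	debug.SetGCPercent(int(gogc))
}

func main() {
	tuneGOGC() // in practice this would be re-run as the live dataset changes
	// ... application workload ...
}
```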