Engineering, Data / ML

Introduction to Kafka Tiered Storage at Uber

1 July 2024 / Global
Figure 1: Uber’s Data Pipeline.
Figure 2: End-to-end interaction of a Kafka broker with tiered storage.
Figure 3: Local log offsets and remote log offsets.
Figure 4: High-level architecture of Kafka tiered storage components.
Figure 5: A topic partition’s log segments with their respective start offsets. Before tiered storage is enabled, there are no segments in remote storage.
Figure 6: Once tiered storage is enabled for the topic, eligible segments begin copying to remote storage.
Figure 7: Some segments in local storage have been deleted based on the local retention configuration. Segments before offset 300 were deleted locally, but they remain available in remote storage.
Figure 8: Remote log segments are cleaned up based on the complete log retention configuration. Here, segments before offset 200 were deleted.
Figure 9: Remote fetch path.
Satish Duggana

Satish Duggana is a Sr. Staff Software Engineer on Uber's Data Team. He leads real-time streaming teams in Bangalore that build scalable, reliable, and efficient infrastructure. He is a Committer and PMC Member for Apache Kafka and Apache Storm.

Kamal Chandraprakash

Kamal Chandraprakash is a Senior Software Engineer on Uber's Data Team. He works on building scalable, reliable, and performant streaming systems.

Abhijeet Kumar

Abhijeet Kumar is a Staff Software Engineer/TLM and leads Uber's Kafka Team in Bangalore. He works on building scalable, reliable, and performant streaming systems.

Posted by Satish Duggana, Kamal Chandraprakash, Abhijeet Kumar