Engineering, Data / ML, Uber AI

Scaling AI/ML Infrastructure at Uber

March 28 / Global
Fig 1: Unified federation layer for ML workload allocation.
Fig 2: Training efficiency improvements through network link capacity upgrades.
Fig 3: Deep learning training and serving performance-price evaluation.
Fig 4: Deep learning serving latency with and without TensorRT optimizations.
Fig 5: LLM serving latency comparison by framework (H100).
Fig 6: LLM serving throughput comparison by framework using the same latency budget and minimum number of GPUs required (H100).
Fig 7: Design framework for memory offload experimentation.
Fig 8: Training efficiency improvements from implementing DeepSpeed memory offload optimization.
Nav Kankani

Nav is a Platform Architect on Uber's infrastructure team. He has worked in AI/ML, hyperscale cloud platforms, storage systems, and the semiconductor industry for the past 21 years. He earned a Master's degree in electrical and computer engineering from the University of Arizona and an MBA from Hamline University. He is also a named inventor on more than 21 US patents.

Rush Tehrani

Rush is an Engineering Manager on the AI Platform team at Uber. He supports the teams responsible for deployment and serving of classical, deep learning, and generative AI models; machine learning on mobile; and the generative AI API gateway. Before joining Uber, he was the founder of Onepanel, an open-source, Kubernetes-native computer vision platform.

Anant Vyas

Anant Vyas is the tech lead of AI Infrastructure at Uber, where he focuses on maximizing the performance and reliability of Uber's extensive computing resources. Prior to this role, he contributed to the Compute Platform team, specializing in the development of resource scheduling systems.

Posted by Nav Kankani, Rush Tehrani, Anant Vyas