NVIDIA: Accelerating Deep Learning with Uber’s Horovod
November 14, 2018 / Global
NVIDIA, inventor of the GPU, creates solutions for building and training AI-enabled systems. In addition to providing hardware and software for much of the industry’s AI research, NVIDIA is building an AI computing platform for developers of self-driving vehicles. With over 370 automotive companies among their self-driving partners, NVIDIA has established itself as the industry leader in creating systems for sensing, perceiving, mapping, and driving this next generation of driverless transportation.
The AI perception models need to be trained under intense conditions, across eight Volta GPUs inside DGX-1 servers, to ensure that the vehicles using them can reliably assess and safely react to the world around them. To determine the performance capacities of their GPUs, Tim Zaman, deep learning software engineer at NVIDIA, and his team leverage machine learning software that enables each new generation of GPU to work faster and more efficiently, both individually and as part of a distributed system.
“Researchers just want something that works, is fast, has a straightforward API, and is simple to use,” said Tim. “At the end of the day, they don’t want to have to worry about the software so that they can just focus on their research.”
Horovod, Uber’s open source distributed deep learning system, was a clear choice for NVIDIA. With only a few lines of code, Horovod allowed them to scale from one to eight GPUs, optimizing model training for their self-driving sensing and perception technologies, leading to faster, safer systems.
Large-scale GPU training
NVIDIA assessed a variety of options when it came to selecting a framework that could meet these needs. At first, they could only train non-parallel workloads on a single device, making distributed training for autonomous technologies extremely difficult.
To ensure that their GPUs are battle-tested for high-performance training and can adapt to the ever-evolving nature of deep learning, NVIDIA needed an API that was easy to use, was quick to iterate on, and could be distributed across entire workloads. Horovod presented the ultimate solution.
In fact, to develop Horovod, the Uber team leveraged some of NVIDIA’s open source software and hardware, including NCCL, an open source low-level API used to communicate between GPUs. Horovod’s seamless integration with NVIDIA’s GPUs was another testament to the natural partnership between the two AI-focused companies.
“Working with the NCCL team and the rest of NVIDIA was a true pleasure,” said Alex Sergeev, Horovod project lead. “We launched this collaboration over a year ago when NCCL 2 was entering early access phase and, through this collaboration, we were able to quickly build a solution that improved on both usability and performance aspects of distributed deep learning. Anytime we had an issue or suggestion, the NCCL team was there to make the product better for end users.”
Enter Horovod
According to Tim, Horovod far outperformed any other high-level library the team had previously tried. For him, usability and speed were Horovod’s key differentiating factors.
“It’s actually quite remarkable that Horovod scored so well on these two metrics because usually you make a tradeoff where it’s very usable but a lot slower, or vice versa,” said Tim. “Horovod brings together a lot of pieces into one package that was easy to use and generated great performance for our team.”
As part of their research, Tim’s team optimized their GPUs in 2017 to work with TensorFlow, a popular and widely used deep learning framework that supports distributed training, and the setup has been stable ever since. However, a frequent complaint from their users was that TensorFlow code, when parallelized, is prone to user error and hard to reason about. Horovod filled a big gap in this process by making TensorFlow easy to work with, particularly when it came to distributed training. According to Tim, Horovod’s ease of use and simplicity drove changes made by the TensorFlow team themselves to ensure more user-friendly multi-device distribution.
Using Horovod
As NVIDIA continues training on its GPUs, Horovod becomes ever more important to the robust development of its autonomous solutions. NVIDIA leverages Horovod for training perception models processed by its DGX Systems. Building such systems demands an infrastructure capable of training on thousands of hours of data and millions of images via deep learning and AI.
At NVIDIA, Horovod training jobs are run on their DGX SATURNV cluster. There, they run in Docker containers built from pre-made Docker images (hosted on NGC) that include deep learning frameworks configured to be highly optimized. To train their self-driving systems, they use TensorFlow images that come with Horovod pre-installed alongside CUDA, cuDNN, and NCCL. With Horovod, researchers experience a scaling factor greater than seven times on an eight-GPU system, with hundreds of multi-GPU jobs launched per day per perception model (e.g., lane detector, road signs). They automate the process of launching jobs and finding optimized parameters using MagLev, NVIDIA’s AI training and inference infrastructure.
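As a rough worked example of what that scaling factor implies, the snippet below computes scaling efficiency as speedup divided by GPU count. The function name and the use of exactly 7.0 as a lower bound are illustrative assumptions; the article states only that the factor is greater than seven on eight GPUs.

```python
def scaling_efficiency(speedup: float, num_gpus: int) -> float:
    """Return the fraction of ideal linear scaling achieved (hypothetical helper)."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return speedup / num_gpus


# Assumption: 7.0 is used as a lower bound for "greater than seven times".
lower_bound = scaling_efficiency(7.0, 8)
print(f"Scaling efficiency is above {lower_bound:.1%}")
```

Under that assumption, a speedup above seven on eight GPUs means each added GPU contributes more than 87.5 percent of its ideal share.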
Specifically, Horovod exposes a few low- and high-level primitives that are easy for most deep learning practitioners to use. One low-level example, Tim notes, is allreduce, which takes a tensor from each of the distributed tasks that are running and returns its reduction across them (in this case, the mean). Horovod allows users to return the value averaged across all nodes using one line of code. A high-level example is the optimizer object, which takes care of the training in TensorFlow; Horovod offers a one-line optimizer wrapper that enables developers to train across distributed nodes, affording greater speed and resource optimization.
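To make the two primitives concrete, here is a minimal pure-Python sketch of the idea, not Horovod's implementation. In Horovod's TensorFlow API these roles are played by `hvd.allreduce` and `hvd.DistributedOptimizer`; the names `allreduce_average` and `ToyDistributedSGD` below are hypothetical, and the workers are simulated as plain lists in a single process rather than real GPUs communicating over NCCL.

```python
from typing import List, Sequence


def allreduce_average(worker_tensors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Simulate an averaging allreduce (hypothetical helper).

    Each inner sequence is one worker's tensor. Every worker receives
    the same element-wise mean across all workers.
    """
    num_workers = len(worker_tensors)
    if num_workers == 0:
        raise ValueError("at least one worker is required")
    length = len(worker_tensors[0])
    if any(len(t) != length for t in worker_tensors):
        raise ValueError("all workers must contribute tensors of the same shape")
    mean = [sum(t[i] for t in worker_tensors) / num_workers for i in range(length)]
    # Every worker gets its own copy of the reduced result.
    return [list(mean) for _ in range(num_workers)]


class ToyDistributedSGD:
    """Toy stand-in for a wrapped optimizer: average gradients, then take an SGD step."""

    def __init__(self, learning_rate: float) -> None:
        self.learning_rate = learning_rate

    def step(self, weights: Sequence[float],
             per_worker_grads: Sequence[Sequence[float]]) -> List[float]:
        # All workers hold the same averaged gradient, so any copy will do.
        averaged = allreduce_average(per_worker_grads)[0]
        return [w - self.learning_rate * g for w, g in zip(weights, averaged)]


# Two simulated workers, each holding gradients from its own shard of a batch.
grads = [[1.0, 2.0], [3.0, 4.0]]
print(allreduce_average(grads))                       # [[2.0, 3.0], [2.0, 3.0]]
print(ToyDistributedSGD(0.5).step([1.0, 1.0], grads))  # [0.0, -0.5]
```

The point of the sketch is the division of labor: the low-level call only averages values across workers, while the optimizer wrapper applies that averaging to gradients before each update, which is why existing training code needs so few changes.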
Scaling AI with NVIDIA and Horovod
Once Horovod was implemented in their self-driving perception specs, NVIDIA was able to iterate quickly, receiving nearly immediate assistance from Uber’s Horovod team when issues arose. Over time, support for Keras and PyTorch was added to Horovod, offering even more expansive opportunities for NVIDIA’s deep learning training.
“It’s very important that an open source project is maintained, and with Horovod, we have no doubt that our questions will be answered as soon as possible,” Tim said. “We really enjoy working with this team and seeing where Horovod can take us.”
As NVIDIA continues to develop self-driving systems for production deployment, the team looks forward to leveraging Horovod to build GPU and software technologies that power safer, smarter autonomous vehicles.
Learn more about Horovod and other Uber Open Source projects!
Interested in working on Horovod? Apply for a role on our Seattle-based team!
Molly Vorwerck
Molly Vorwerck is the Eng Blog Lead and a senior program manager on Uber's Tech Brand Team, responsible for overseeing the company's technical narratives and content production. In a previous life, Molly worked in journalism and public relations. In her spare time, she enjoys scouring record stores for Elvis Presley records, reading and writing fiction, and watching The Great British Baking Show.