Engineering, Uber AI

Faster Neural Networks Straight from JPEG

11 December 2018 / Global
Figure 1. The JPEG encoding process consists of several stages, here shown right to left.
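The block-DCT stage of the encoding pipeline in Figure 1 can be sketched in a few lines of numpy. This is a minimal illustration only (color conversion, subsampling, quantization, and entropy coding are omitted), and the function names `dct_matrix` and `block_dct` are our own, not from the post:

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis: D[k, i] = alpha(k) * cos(pi * (2i + 1) * k / (2n))
    D = np.zeros((n, n))
    for k in range(n):
        alpha = np.sqrt(1.0 / n) if k == 0 else np.sqrt(2.0 / n)
        for i in range(n):
            D[k, i] = alpha * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    return D

def block_dct(channel):
    # Apply an 8x8 block DCT to one image channel (H and W divisible by 8),
    # as JPEG does for the Y, Cb, and Cr planes.
    D = dct_matrix(8)
    h, w = channel.shape
    coeffs = np.zeros((h, w))
    for y in range(0, h, 8):
        for x in range(0, w, 8):
            block = channel[y:y+8, x:x+8] - 128.0  # JPEG level shift
            coeffs[y:y+8, x:x+8] = D @ block @ D.T
    return coeffs
```

For a constant 8x8 block, all energy lands in the DC coefficient and the 63 AC coefficients are zero, which is why this representation compresses smooth regions so well.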
Figure 2. In a typical training procedure, a JPEG image is decompressed to RGB and then fed into a neural network.
Figure 3. Our proposed process: decompress JPEG images only to the DCT representation stage, then feed this representation directly into a neural network. As we’ll see, the frequency representation allows us to skip the first portion of the network, saving computation, and, compared to using raw pixels, results in networks with higher accuracy!
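To make the comparison in Figures 2 and 3 concrete, here is a shape-level sketch of what each pipeline hands the network. The 224x224 input size is an assumption (typical for ImageNet models), not stated in the captions:

```python
import numpy as np

H, W = 224, 224  # assumed input size, typical for ImageNet training

# Conventional pipeline: fully decode the JPEG to RGB pixels.
rgb = np.zeros((H, W, 3))

# Proposed pipeline: stop at the DCT coefficients. Each 8x8 block of a
# channel becomes 64 coefficients at one spatial position. With 4:2:0
# chroma subsampling, the Cb/Cr grids are half the luma resolution.
y_dct  = np.zeros((H // 8,  W // 8,  64))   # 28 x 28 x 64
cb_dct = np.zeros((H // 16, W // 16, 64))   # 14 x 14 x 64
cr_dct = np.zeros((H // 16, W // 16, 64))   # 14 x 14 x 64
```

Note that the DCT tensors arrive already spatially downsampled, which is roughly the reduction the first convolution and pooling stages of a ResNet would otherwise have to compute from pixels.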
Figure 4. In this graph, we map the relationship between top-five error and speed in images per second for two networks trained from pixels.
Figure 5. Different trade-off curves are available when making ResNet-50 shorter or thinner but still using RGB input.
Figure 6. In the general form used for networks taking DCT input, T1 and T2 may be arbitrary learned or non-learned transforms.
Figure 7. Using a DCT representation and simply merging data streams as early as possible—with a single layer—results in a shifted Pareto front with both faster and more accurate networks.
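The "merge as early as possible" strategy of Figure 7 requires reconciling the mismatched spatial resolutions of the luma and chroma coefficient grids before concatenation. A minimal sketch, assuming nearest-neighbor upsampling of the chroma streams (the actual T1/T2 transforms in the paper may be learned):

```python
import numpy as np

def merge_dct_streams(y, cb, cr):
    """Naive early merge: upsample the half-resolution chroma coefficient
    maps to the luma grid, then concatenate along the channel axis.
    Illustrative only; a learned transform could replace the upsampling."""
    cb_up = cb.repeat(2, axis=0).repeat(2, axis=1)  # 2x nearest-neighbor
    cr_up = cr.repeat(2, axis=0).repeat(2, axis=1)
    return np.concatenate([y, cb_up, cr_up], axis=-1)

# Example with the shapes produced by a 224x224 JPEG (4:2:0 subsampling):
y  = np.zeros((28, 28, 64))
cb = np.zeros((14, 14, 64))
cr = np.zeros((14, 14, 64))
merged = merge_dct_streams(y, cb, cr)  # shape (28, 28, 192)
```

The merged 28x28x192 tensor can then enter a single convolutional layer, replacing the early stride-heavy layers of a pixel-input network.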
Figure 8. Late-Concat models push the Pareto front forward again. By allowing significantly deeper T1 than T2—more computation along the Y path than Cb/Cr paths—we obtain the best speed/accuracy tradeoffs. Late-Concat-RFA-Thinner is 1.77x faster than a vanilla ResNet-50 at about the same accuracy.
Figure 9. First-layer features learned by ResNet-50 with RGB pixel input. Many edge detectors are primarily black and white, operating in luminance space. Many color features are either constant over space or lower frequency, and may serve only to pass rough color information to higher layers where it is needed. Should we have expected all along that color would not be needed until later in the network?
Lionel Gueguen


Lionel Gueguen is a senior software engineer with Uber ATG.

Rosanne Liu


Rosanne is a senior research scientist and a founding member of Uber AI. She obtained her PhD in Computer Science at Northwestern University, where she used neural networks to help discover novel materials. She currently works on the many fronts where machine learning and neural networks remain mysterious. She attempts to write in her spare time.

Alex Sergeev


Alex Sergeev is a deep learning engineer on the Machine Learning Platform team.

Jason Yosinski


Jason Yosinski is a former founding member of Uber AI Labs and formerly led the Deep Collective research group.

Posted by Lionel Gueguen, Rosanne Liu, Alex Sergeev, Jason Yosinski