
An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution

10 July 2018 / Global
Figure 1. (a) The Supervised Rendering task requires a network to paint a square given its (i, j) location. (b) Example data points and (c) a visualization of the train vs. test sets for uniform and quadrant splits.
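The caption pins down the data completely: each example pairs an (i, j) coordinate with a target image containing a square painted at that location. Here is a minimal sketch of how one such pair's target can be built; the 64×64 canvas and 9×9 square sizes are assumptions carried over from the accompanying paper, and the function name is ours:

```python
import numpy as np

def render_target(i, j, canvas=64, square=9):
    """Supervised Rendering target: a square of ones centered at (i, j)
    on an otherwise blank canvas. Canvas and square sizes are assumed."""
    img = np.zeros((canvas, canvas), dtype=np.float32)
    half = square // 2
    # Paint the square, clipping at the canvas border
    img[max(0, i - half):i + half + 1, max(0, j - half):j + half + 1] = 1.0
    return img
```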
Figure 2. (a) Train vs. test IOU for the supervised rendering task on both the uniform and quadrant split. No models reach an IOU of 1.0. (b) Training one of the better models takes 90 minutes to attain an IOU of 0.8.
Figure 3. (a) The Supervised Coordinate Classification task requires a network to paint a single pixel given its (i, j) location. (b) Example data points and (c) a visualization of train vs. test splits.
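The classification variant changes only the target: a single hot pixel at (i, j) instead of a square, typically trained with a softmax over all pixel positions. A sketch under the same assumed canvas size:

```python
def classification_target(i, j, canvas=64):
    """Supervised Coordinate Classification target: a one-hot canvas
    with a single pixel set at (i, j)."""
    img = np.zeros((canvas, canvas), dtype=np.float32)
    img[i, j] = 1.0
    return img
```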
Figure 4. (a) Train vs. test accuracy for the supervised coordinate classification task on both the uniform and quadrant split. Though some models memorize the training set, none attains higher than 86 percent accuracy on the test set, even on the easier uniform split. This means convolution fails to generalize even one pixel away. (b) Training to this lackluster 86 percent accuracy takes over an hour.
Figure 5. Model predictions on a few neighboring pixels. The network overfits—train accuracy is perfect whereas test is 86 percent—which is all the more surprising as most test pixels are nearly completely surrounded by train pixels. Further, the network visibly struggles even to fit the train set, with significant probability leaking outside the target pixel.
Figure 6. Comparison of a convolutional layer and a CoordConv layer. The CoordConv layer takes as input additional channels filled with coordinate information, here, the i and j coordinates.
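Figure 6 describes the layer completely: build two coordinate channels, concatenate them to the input, and convolve as usual. Below is a minimal PyTorch sketch of that recipe; the post itself ships no code, so the framework choice is ours, while the scaling of coordinates to [-1, 1] follows the paper:

```python
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    """Conv2d with two extra input channels holding the i (row) and
    j (column) coordinates, scaled to [-1, 1]."""

    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        # +2 input channels for the concatenated coordinate maps
        self.conv = nn.Conv2d(in_channels + 2, out_channels, kernel_size, **kwargs)

    def forward(self, x):
        b, _, h, w = x.shape
        # i channel varies down rows; j channel varies across columns
        i = torch.linspace(-1, 1, h, device=x.device, dtype=x.dtype)
        j = torch.linspace(-1, 1, w, device=x.device, dtype=x.dtype)
        i = i.view(1, 1, h, 1).expand(b, 1, h, w)
        j = j.view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, i, j], dim=1))

# Usage: a drop-in replacement for nn.Conv2d
layer = CoordConv2d(in_channels=8, out_channels=32, kernel_size=1)
out = layer(torch.randn(4, 8, 64, 64))  # -> (4, 32, 64, 64)
```

Because the coordinate channels are constructed on the fly from the input's spatial size, the layer costs only two extra input channels' worth of convolution weights.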
Figure 7. CoordConv quickly attains perfect performance on both splits of the Supervised Coordinate Classification task.
Figure 8. Many CoordConv models quickly attain perfect performance on both splits of the Supervised Rendering task.
Figure 9. As seen before, deconvolution struggles on the supervised coordinate classification task, whereas CoordConv attains 100 percent training and test accuracy. The solution is visibly simpler.
Figure 10. Convolution struggles to model the supervised regression task, whereas CoordConv models it well.
Figure 11. A walk in the latent space with an ordinary convolutional GAN (left) and a CoordConv GAN (right). With the ordinary GAN, we observe visual artifacts tied to the canvas and bits of objects fading in and out. With the CoordConv GAN, objects are coherent and motion is smoother.
Figure 12. Another walk in the latent space with an ordinary convolutional VAE (left) and a CoordConv VAE (right). With the ordinary VAE, objects fade in and out, whereas with the CoordConv VAE they move around smoothly.
Figure 13. A third walk in the latent space with an ordinary convolutional GAN (left) and a CoordConv GAN (right), trained on the LSUN bedroom dataset. With convolution we again observe frozen objects fading in and out. With CoordConv, we instead see smooth geometric transformations, including translation and deformation.
Figure 14. Results using A2C to train on Atari games. Of the 9 games, CoordConv (a) improves over convolution on 6, (b) performs similarly on 2, and (c) is slightly worse on 1.
Rosanne Liu

Rosanne is a senior research scientist and a founding member of Uber AI. She obtained her PhD in Computer Science at Northwestern University, where she used neural networks to help discover novel materials. She is currently working on the many fronts where machine learning and neural networks remain mysterious. She attempts to write in her spare time.

Joel Lehman

Joel Lehman was previously an assistant professor at the IT University of Copenhagen, and researches neural networks, evolutionary algorithms, and reinforcement learning.

Piero Molino

Piero is a Staff Research Scientist in the Hazy research group at Stanford University. He is a former founding member of Uber AI, where he created Ludwig, worked on applied projects (COTA, Graph Learning for Uber Eats, Uber's Dialogue System), and published research on NLP, Dialogue, Visualization, Graph Learning, Reinforcement Learning, and Computer Vision.

Felipe Petroski Such

Felipe Petroski Such is a research scientist focusing on deep neuroevolution, reinforcement learning, and HPC. Prior to joining Uber AI Labs, he obtained a BS/MS from RIT, where he developed deep learning architectures for graph applications and ICR, as well as hardware acceleration using FPGAs.

Eric Frank

Before joining Uber AI Labs as a researcher, Eric invented AI-oriented toys for Kite and Rocket Research. He was also a research assistant at the University of Rochester and makes art in his free time.

Alex Sergeev

Alex Sergeev is a deep learning engineer on the Machine Learning Platform team.

Jason Yosinski

Jason Yosinski is a former founding member of Uber AI Labs and formerly led the Deep Collective research group.

Posted by Rosanne Liu, Joel Lehman, Piero Molino, Felipe Petroski Such, Eric Frank, Alex Sergeev, Jason Yosinski