
Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions

May 6, 2020 / Global
Figure 1. POET continually produces new and increasingly complex environments, along with paired agents that learn to solve them.
Figure 2. A sample CPPN (left) and its generated landscape (right). The CPPN produces a y coordinate for each x coordinate, and the resulting heights are rendered into a bipedal walker environment modified from the bipedal walker environment in OpenAI Gym (see the code sketch below the figure captions).
Figure 3. The emergence of bumps in the landscape induces a different ranking of agents with different walking gaits. For example, an agent that walks with one leg raised is not energy-efficient on flat ground and thus ranks last there, but that same gait enables it to step over high bumps, so the agent ranks at the top in a more rugged environment.
Figures 4a and 4b. A comparison of (a) sample environments from the simple, hand-designed encoding in the Original POET and (b) CPPN-encoded environments created and solved by Enhanced POET.

Figure 5. POET builds an ever-expanding tree of diverse environments in each run. Each node represents a unique environmental challenge that POET invented. Within each node, the overall shape of each 2D obstacle course is depicted by a (very small) line plot. For a high-resolution version, refer to Figure 10 of our accompanying paper. In the animation, the tree grows from its root in the order in which environments were added over the course of an Enhanced POET run.
Figure 6. Comparing the ANNECS metric (accumulated number of novel environments created and solved) for Original POET and Enhanced POET. The Original POET gradually loses the ability to create meaningfully new challenges and thus plateaus after around 20,000 iterations. By contrast, Enhanced POET maintains its ability to innovate, as evidenced by its ANNECS score consistently increasing. Plotted is the median across five runs (solid lines) and 95 percent bootstrapped confidence intervals of the median (shaded regions).
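
To make the CPPN encoding in Figure 2 more concrete, here is a minimal, illustrative sketch in Python (using NumPy) of a CPPN-style mapping from x coordinates to terrain heights y. The hand-wired composition of activation functions below is only a stand-in: in Enhanced POET the CPPN's topology and weights are evolved (via NEAT) rather than fixed as they are here.

```python
import numpy as np

def cppn_terrain(x, w1=3.0, w2=0.8, bias=4.0):
    """Hypothetical, hand-wired CPPN-style function: maps each x to a height y.

    A real POET CPPN has an evolved topology; this fixed composition of a
    periodic and a Gaussian activation only illustrates the kind of regular,
    structured landscapes such encodings tend to produce.
    """
    h = np.sin(w1 * x) + np.exp(-((x - bias) ** 2) / (2 * w2 ** 2))
    return 2.0 * np.tanh(h)  # squash to a bounded terrain height

# Query the function at evenly spaced x coordinates to obtain the y coordinates
# that would be rendered as the ground of a bipedal walker obstacle course.
xs = np.linspace(0.0, 10.0, num=200)
ys = cppn_terrain(xs)
```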
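
For readers curious about the shaded regions in Figure 6, the sketch below shows one common way (not necessarily our exact analysis code) to compute a 95 percent bootstrapped confidence interval of the median across a handful of runs; the run scores used here are made up purely for illustration.

```python
import numpy as np

def bootstrap_median_ci(scores_per_run, n_boot=10_000, alpha=0.05, seed=0):
    """Bootstrapped confidence interval of the median across independent runs."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores_per_run, dtype=float)
    # Resample the runs with replacement and record the median of each resample.
    medians = np.array([
        np.median(rng.choice(scores, size=scores.size, replace=True))
        for _ in range(n_boot)
    ])
    lower, upper = np.quantile(medians, [alpha / 2, 1 - alpha / 2])
    return np.median(scores), lower, upper

# Hypothetical ANNECS values from five runs at a single iteration:
median, lo, hi = bootstrap_median_ci([31.0, 28.5, 33.2, 30.1, 29.7])
print(f"median={median:.1f}, 95% CI=({lo:.1f}, {hi:.1f})")
```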
Rui Wang

Rui Wang is a senior research scientist with Uber AI. He is passionate about advancing the state of the art of machine learning and AI, and connecting cutting-edge advances to the broader business and products at Uber. His recent work at Uber was published at leading international conferences in machine learning and AI (ICML, IJCAI, GECCO, etc.), won a Best Paper Award at GECCO 2019, and was covered by technology media such as Science, Wired, VentureBeat, and Quanta Magazine.

Joel Lehman

Joel Lehman was previously an assistant professor at the IT University of Copenhagen; he researches neural networks, evolutionary algorithms, and reinforcement learning.

Aditya Rawal

Aditya Rawal is a research scientist at Uber AI Labs. His interests lie at the convergence of two research fields: neuroevolution and deep learning. He believes that evolutionary search can replace human ingenuity in creating the next generation of deep networks. Previously, Aditya received his MS/PhD in Computer Science from the University of Texas at Austin, advised by Prof. Risto Miikkulainen. During his PhD, he developed neuroevolution algorithms to evolve recurrent architectures for sequence-prediction problems and to construct multi-agent systems that cooperate, compete, and communicate.

Jiale Zhi

Jiale Zhi is a senior software engineer with Uber AI. His areas of interest include distributed computing, big data, scientific computation, evolutionary computing, and reinforcement learning. He is also interested in real-world applications of machine learning in traditional software engineering. He is the creator of the Fiber project, a scalable, distributed framework for large-scale parallel computation. Before Uber AI, he was a tech lead on Uber's edge team, which manages Uber's global mobile network traffic and routing.

Yulun Li

Yulun Li previously worked as a software engineer with Uber AI.

Jeff Clune

Jeff Clune is the former Loy and Edith Harris Associate Professor in Computer Science at the University of Wyoming, a Senior Research Manager and founding member of Uber AI Labs, and currently a Research Team Leader at OpenAI. Jeff focuses on robotics and training neural networks via deep learning and deep reinforcement learning. He has also researched open questions in evolutionary biology using computational models of evolution, including studying the evolutionary origins of modularity, hierarchy, and evolvability. Prior to becoming a professor, he was a Research Scientist at Cornell University, received a PhD in computer science and an MA in philosophy from Michigan State University, and received a BA in philosophy from the University of Michigan. More about Jeff’s research can be found at JeffClune.com.

Kenneth O. Stanley

Before joining Uber AI Labs full time, Ken was an associate professor of computer science at the University of Central Florida (he is currently on leave). He is a leader in neuroevolution (combining neural networks with evolutionary techniques), where he helped invent prominent algorithms such as NEAT, CPPNs, HyperNEAT, and novelty search. His ideas have also reached a broader audience through the recent popular science book, Why Greatness Cannot Be Planned: The Myth of the Objective.

Posted by Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, Kenneth O. Stanley
