
Control What You Can: Intrinsically Motivated Task-Planning Agent


This project studies how a hierarchical, intrinsically motivated RL agent can gain control over object-manipulation environments without external supervision.

What makes exploration challenging in this type of environment are the sparse object-object and object-agent interactions. For instance, in the accompanying video, the agent can only manipulate the heavy anvil once it is in possession of the forklift. The likelihood of picking up the forklift, bringing it to the anvil, and eventually transporting the anvil to its target location is, however, very low under purely random exploration.

In nature, humans and many other animals solve complex, multi-stage tasks by breaking them down into smaller, easier-to-accomplish subtasks that can be solved one after another. Each subtask can be solved by a much simpler subroutine than the monolithic routine that would be needed to solve the original task in one go.

For instance, if a monkey wants to eat a nut, it has to backtrack the steps necessary to solve the task: it needs a tool that can crack the nut open, it needs to bring the tool to the nut, and before that it has to find the tool somewhere in the environment.
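To make the backtracking idea concrete, here is a minimal Python sketch of backward chaining over a hand-written precondition graph. The goal names and the `plan` helper are purely illustrative assumptions; CWYC learns such task dependencies from experience rather than being given them.

```python
# Preconditions: to achieve a goal, its prerequisites must hold first.
# (Hypothetical toy graph for the monkey-and-nut example.)
PRECONDITIONS = {
    "crack nut": ["tool at nut"],
    "tool at nut": ["have tool"],
    "have tool": [],  # the tool can be found by direct exploration
}

def plan(goal, steps=None):
    """Backtrack from a goal to an ordered list of solvable subtasks."""
    if steps is None:
        steps = []
    for prerequisite in PRECONDITIONS[goal]:
        plan(prerequisite, steps)
    if goal not in steps:
        steps.append(goal)
    return steps

print(plan("crack nut"))  # ['have tool', 'tool at nut', 'crack nut']
```

Each entry of the returned plan is a subtask that is easy in isolation, which is exactly what makes the decomposition cheaper than solving the whole task with a single policy.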

It turns out that many real-world tasks have this compositional structure, with clear task boundaries that an agent can exploit to solve them much more efficiently.

Figure 1: Architecture of the CWYC agent

Figure 1 shows the architecture of the Control What You Can (CWYC) agent. We impose certain inductive biases on the architecture of the agent that facilitate the learning of useful subroutines in the environment.

An intrinsic motivation (IM) module guides the agent's exploration during a free-play phase in which the agent can explore the environment without external reward. Different types of intrinsic motivation have been proposed in the literature. Competence-based IM is about control: for instance, whether an agent is able to reach self-imposed goals, or the repertoire of skills the agent has acquired over time. Knowledge-based IM is about the mismatch between the agent's predictions and its actual experience in the environment. The CWYC agent is primarily driven by its desire to maximize its overall learning progress. Where this signal is sparse, especially at the beginning of training, the agent uses the prediction error of its internal model of the world's dynamics as an additional signal to bootstrap from.
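As a rough illustration of this bootstrapping (not the exact formulation from the paper), learning progress can be measured as the gap between a slow and a fast running average of the forward model's prediction error, falling back to the raw error while that gap is still flat. The class name and smoothing constants below are assumptions:

```python
class IntrinsicReward:
    """Toy bootstrapped intrinsic reward: learning progress where it is
    measurable, forward-model prediction error otherwise."""

    def __init__(self, fast=0.2, slow=0.05):
        self.fast = fast          # update rate of the short-horizon average
        self.slow = slow          # update rate of the long-horizon average
        self.fast_error = None
        self.slow_error = None

    def __call__(self, prediction_error):
        if self.fast_error is None:
            self.fast_error = self.slow_error = prediction_error
        self.fast_error += self.fast * (prediction_error - self.fast_error)
        self.slow_error += self.slow * (prediction_error - self.slow_error)
        # Learning progress: how far the recent error has dropped below
        # the long-run error.
        progress = max(self.slow_error - self.fast_error, 0.0)
        # Bootstrap: while progress is flat, fall back to surprise itself.
        return progress if progress > 1e-6 else prediction_error
```

Early in training the two averages coincide, so the reward is dominated by prediction error (surprise); once the model starts improving on some interaction, the progress term takes over and focuses exploration there.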

One important inductive bias is imposed on the structure of the agent's internal state space. The agent has an object-centric state representation that factorizes over the different entities in the environment (e.g., the agent, forklift, anvil, drone, and cone). However, although the state space factorizes over the objects, the agent has no knowledge of the semantic meaning of the individual factors (e.g., which state components belong to which particular object in the environment).
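The sketch below shows what such a factorized state could look like; the entity count, feature layout, and the names in the comments are assumptions for illustration, since the agent itself only ever sees anonymous slots:

```python
import numpy as np

NUM_ENTITIES = 5          # e.g. agent, forklift, anvil, drone, cone
FEATURES_PER_ENTITY = 4   # e.g. position (x, y) and velocity (vx, vy)

# One observation: one row (slot) per entity. The rows carry no semantic
# labels, so the agent must discover which slots it can control.
state = np.random.randn(NUM_ENTITIES, FEATURES_PER_ENTITY)

slot = state[2]  # features of the third entity, whatever object that is
```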

Bibtex:

@inproceedings{NEURIPS2019_b6f97e6f,
author = {Blaes, Sebastian and Vlastelica Pogan\v{c}i\'{c}, Marin and Zhu, Jiajie and Martius, Georg},
booktitle = {Advances in Neural Information Processing Systems},
editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. Garnett},
publisher = {Curran Associates, Inc.},
title = {Control What You Can: Intrinsically Motivated Task-Planning Agent},
url = {https://proceedings.neurips.cc/paper/2019/file/b6f97e6f0fd175613910d613d574d0cb-Paper.pdf},
volume = {32},
year = {2019}
}