
Sebastian Blaes

Ph.D. candidate in the Autonomous Learning Group at the Max Planck Institute for Intelligent Systems, Tuebingen.

Cycling, running, trekking, mountaineering enthusiast.


  • We introduce a simple but effective method for managing risk in zero-order trajectory optimization that involves probabilistic safety constraints and balances optimism in the face of epistemic uncertainty against pessimism in the face of aleatoric uncertainty, both estimated by an ensemble of stochastic neural networks. Various experiments indicate that separating the two kinds of uncertainty is essential for data-driven model predictive control (MPC) approaches to perform well in uncertain and safety-critical control environments.

  • Solving high-dimensional, continuous robotic tasks is a challenging optimization problem. Model-based methods that rely on zero-order optimizers like the cross-entropy method (CEM) have so far shown strong performance and are considered state-of-the-art in the model-based reinforcement learning community. However, this success comes at the cost of high computational complexity, which makes these methods unsuitable for real-time control. In this paper, we propose a technique to jointly optimize the trajectory and distill a policy, which is essential for fast execution in real robotic systems. Our method builds upon standard approaches, like guidance cost and dataset aggregation, and introduces a novel adaptive factor that prevents the optimizer from collapsing to the learner’s behavior at the beginning of training. The extracted policies reach unprecedented performance on challenging tasks such as making a humanoid stand up and opening a door without reward shaping.

  • Trajectory optimizers for model-based reinforcement learning, such as the cross-entropy method (CEM), can yield compelling results even in high-dimensional control tasks and sparse-reward environments. However, their sampling inefficiency prevents them from being used for real-time planning and control. We propose an improved version of the CEM algorithm for fast planning, with novel additions, including temporally correlated actions and memory, that require 2.7-22x fewer samples and yield a 1.2-10x performance increase in high-dimensional control problems.

  • We present a novel intrinsically motivated agent that learns how to control the environment as quickly as possible by optimizing learning progress. It learns what can be controlled, how to allocate time and attention, and the relations between objects using surprise-based motivation. The effectiveness of our method is demonstrated in both a synthetic and a robotic manipulation environment, yielding considerably improved performance and lower sample complexity. In a nutshell, our work combines several task-level planning agent structures (backtracking search on a task graph, probabilistic road-maps, allocation of search efforts) with intrinsic motivation to achieve learning from scratch.

  • Training a deep convolutional neural network (CNN) to succeed in visual object classification usually requires a large number of examples. Here, starting from such a pre-learned CNN, we study the task of extending the network to classify additional categories on the basis of only a few examples (“few-shot learning”). We find that a simple and fast prototype-based learning procedure in the global feature layers (“Global Prototype Learning”, GPL) leads to some remarkably good classification results for a large portion of the new classes. It needs at most ten examples per new class to reach a performance plateau. To understand this few-shot learning performance resulting from GPL as well as the performance of the original network, we use the t-SNE method (Maaten and Hinton, 2008) to visualize clusters of object category examples. This reveals the strong connection between classification performance and data distribution and explains why some new categories need only a few examples for learning while others resist good classification results even when trained with many more examples.

  • Deep convolutional networks are extended with oscillatory phase dynamics and recurrent couplings that are based on convolution and deconvolution. Moreover, a top-down modulation is included that enforces the dynamical selection and grouping of features of the recognized object into assemblies based on temporal coherence. With respect to image processing, it is demonstrated how the combination of these mechanisms allows for the segmentation of those parts of an object that are relevant for its classification.

  • An implementation of attentional bias is presented for a network model that couples excitatory and inhibitory oscillatory units in a manner that is inspired by the mechanisms that generate cortical gamma oscillations. Attentional biases are implemented as oscillatory coherences between excitatory units that encode the spatial location or features of the target and the pool of inhibitory units. This form of attentional bias is motivated by neurophysiological findings that relate selective attention to spike field coherence. By also including pattern recognition mechanisms, we demonstrate how this implementation of attentional bias leads to the selection of an attentional target while distracters are suppressed, for both spatial and feature-based attention. With respect to neurophysiological observations, we argue that the recently found positive correlation between high firing rates and strong gamma locking with attention (Vinck, Womelsdorf, Buffalo, Desimone, & Fries, 2013) may point to an essential mechanism of the brain’s attentional selection and suppression processes.
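
The uncertainty separation in the abstract on managing risk in zero-order trajectory optimization can be illustrated with a minimal Python sketch. It assumes each ensemble member outputs a Gaussian mean and variance for a trajectory's cost (a simplification; in practice the members predict states and the cost is computed from them) and applies the law of total variance: aleatoric uncertainty is the average predicted variance, and epistemic uncertainty is the spread of the predicted means. The function names and the weights `beta_optimism` and `beta_pessimism` are hypothetical, the exact combination used in the paper may differ, and the probabilistic safety constraints are not modeled here.

```python
import math
from statistics import fmean


def decompose_uncertainty(means, variances):
    """Split an ensemble's predictive variance for one scalar output.

    means[k], variances[k]: Gaussian mean and variance predicted by member k
    (hypothetical input format). Law of total variance:
      aleatoric = average predicted variance (noise every member expects)
      epistemic = variance of the predicted means (disagreement between members)
    """
    aleatoric = fmean(variances)
    mu = fmean(means)
    epistemic = fmean((m - mu) ** 2 for m in means)
    return aleatoric, epistemic


def risk_aware_cost(means, variances, beta_optimism=1.0, beta_pessimism=1.0):
    """Hypothetical score for a candidate trajectory (lower is better).

    Optimism under epistemic uncertainty: subtract its standard deviation,
    so trajectories the model knows little about look attractive to explore.
    Pessimism under aleatoric uncertainty: add its standard deviation,
    so inherently noisy outcomes are penalized. Weights are illustrative.
    """
    aleatoric, epistemic = decompose_uncertainty(means, variances)
    return (fmean(means)
            - beta_optimism * math.sqrt(epistemic)
            + beta_pessimism * math.sqrt(aleatoric))
```

With two members predicting means 1.0 and 3.0 and variances 0.5 and 1.5, both uncertainties come out as 1.0, so the optimism bonus and the pessimism penalty cancel and the score equals the average prediction, 2.0.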
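
The two additions named in the sample-efficient CEM abstract, temporally correlated actions and memory, can be sketched in plain Python as below. This is an illustration under stated assumptions, not the published algorithm: correlation is produced by a first-order autoregressive (AR(1)) filter as a simple stand-in for the paper's noise model, memory is approximated by carrying a fraction of the elite action sequences into the next iteration, and all names and defaults (`rho`, `keep_elites`, `alpha`) are hypothetical.

```python
import math
import random


def correlated_noise(horizon, rho, rng):
    """Unit-variance AR(1) noise; larger rho in [0, 1) gives smoother sequences.

    Stand-in for temporally correlated sampling (assumption, not the paper's noise model).
    """
    eps = [rng.gauss(0.0, 1.0)]
    scale = math.sqrt(1.0 - rho ** 2)  # keeps the marginal variance at 1
    for _ in range(horizon - 1):
        eps.append(rho * eps[-1] + scale * rng.gauss(0.0, 1.0))
    return eps


def cem_plan(cost, horizon, n_samples=64, n_elites=8, n_iters=10,
             rho=0.8, keep_elites=0.3, alpha=0.1, seed=0):
    """Cross-entropy method over a sequence of scalar actions.

    cost maps a list of `horizon` actions to a float (lower is better).
    keep_elites is the fraction of elites re-inserted into the next
    iteration's samples (a simple form of memory).
    """
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [1.0] * horizon
    kept = []  # elites remembered from the previous iteration
    best = None
    for _ in range(n_iters):
        samples = [
            [m + s * e for m, s, e in zip(mean, std, correlated_noise(horizon, rho, rng))]
            for _ in range(n_samples)
        ]
        samples.extend(kept)
        samples.sort(key=cost)  # lowest cost first
        elites = samples[:n_elites]
        if best is None or cost(elites[0]) < cost(best):
            best = elites[0]
        kept = elites[:max(1, int(keep_elites * n_elites))]
        # Refit the per-timestep Gaussian to the elites, smoothed with the old values.
        n = len(elites)
        for t in range(horizon):
            col = [a[t] for a in elites]
            mu = sum(col) / n
            var = sum((x - mu) ** 2 for x in col) / n
            mean[t] = alpha * mean[t] + (1 - alpha) * mu
            std[t] = alpha * std[t] + (1 - alpha) * math.sqrt(var)
    return best
```

A larger `rho` produces smoother action sequences, which is one plausible reason correlated sampling helps on physical systems; re-inserting elites keeps good solutions found early from being lost when the sampling distribution is refit.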
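
The intrinsic-motivation abstract speaks of allocating time and attention by optimizing learning progress. One common way to make that concrete, shown here as a hypothetical sketch rather than the agent described in the paper, is to track each task's success rate over two consecutive windows and practice the task whose success rate changed the most, with occasional random choices for exploration. The class name and the parameters `window` and `epsilon` are assumptions.

```python
import random


class LearningProgressAllocator:
    """Pick which task to practice next by estimated learning progress.

    Learning progress = |success rate in the latest window minus success
    rate in the window before it|. Window size and epsilon are illustrative.
    """

    def __init__(self, tasks, window=10, epsilon=0.1, seed=0):
        self.history = {t: [] for t in tasks}
        self.window = window
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def record(self, task, success):
        self.history[task].append(1.0 if success else 0.0)

    def progress(self, task):
        h, w = self.history[task], self.window
        if len(h) < 2 * w:
            return float("inf")  # too little data: treat the task as maximally interesting
        recent = sum(h[-w:]) / w
        older = sum(h[-2 * w:-w]) / w
        return abs(recent - older)

    def choose(self):
        # With probability epsilon explore uniformly, otherwise exploit learning progress.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.history))
        return max(self.history, key=self.progress)
```

Using the absolute change lets the allocator also revisit a task whose performance is dropping, for example because a related skill changed.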
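
A prototype classifier in the spirit of the few-shot learning abstract averages the feature vectors of each class and assigns a new input to the class with the nearest mean. The sketch below assumes the features have already been extracted from a global feature layer of a pretrained CNN and uses Euclidean distance; the distance measure and the helper names `fit_prototypes` and `classify` are illustrative assumptions rather than the exact GPL procedure.

```python
import math
from collections import defaultdict


def fit_prototypes(features, labels):
    """Average the feature vectors of each class into a single prototype."""
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in grouped.items()}


def classify(x, prototypes):
    """Assign x to the class whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda y: math.dist(x, prototypes[y]))


# Usage: prototypes for existing classes, then a new class from one example.
prototypes = fit_prototypes([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]],
                            ["a", "a", "b", "b"])
prototypes.update(fit_prototypes([[0.0, 20.0]], ["c"]))
print(classify([1.0, 19.0], prototypes))  # c
```

Because each class is summarized by one vector, adding a category only requires averaging its few examples and adding the result to the dictionary, without retraining the network.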

© 2021 Sebastian Blaes. All rights reserved.