Trajectory optimizers for model-based reinforcement learning, such as the Cross-Entropy Method (CEM), can yield compelling results even in high-dimensional control tasks and sparse-reward environments. However, their sampling inefficiency prevents them from being used for real-time planning and control. We propose an improved version of the CEM algorithm for fast planning, with novel additions including temporally correlated actions and memory, requiring 2.7-22x fewer samples and yielding a performance increase of 1.2-10x in high-dimensional control problems.
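As a rough illustration of the two additions named in this abstract, the following minimal sketch runs CEM on a toy 1-D point-mass task. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy cost function `rollout_cost`, the use of AR(1) noise as the source of temporal correlation, the rule of carrying a few elites into the next iteration as "memory", and all parameter names and values.

```python
import random
import statistics


def rollout_cost(actions, target=1.0, dt=0.1):
    """Toy 1-D point mass (hypothetical task): drive the position to `target`."""
    pos, vel, cost = 0.0, 0.0, 0.0
    for a in actions:
        a = max(-1.0, min(1.0, a))  # clip the action to [-1, 1]
        vel += a * dt
        pos += vel * dt
        cost += (pos - target) ** 2 + 1e-3 * a * a
    return cost


def correlated_noise(horizon, beta, rng):
    """AR(1) noise with unit stationary variance; beta = 0 gives white noise."""
    eps = [rng.gauss(0.0, 1.0)]
    scale = (1.0 - beta * beta) ** 0.5
    for _ in range(horizon - 1):
        eps.append(beta * eps[-1] + scale * rng.gauss(0.0, 1.0))
    return eps


def cem_plan(horizon=20, pop=32, n_elite=4, iters=5, beta=0.8, keep=2, seed=0):
    """Return (best_cost, best_action_sequence) found by the sketch."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [0.5] * horizon
    memory = [list(mean)]  # assumption: memory starts with the initial mean
    best = (float("inf"), list(mean))
    for _ in range(iters):
        # Memory: remembered sequences are re-evaluated alongside new samples.
        samples = [list(m) for m in memory]
        while len(samples) < pop:
            eps = correlated_noise(horizon, beta, rng)
            samples.append([m + s * e for m, s, e in zip(mean, std, eps)])
        scored = sorted(((rollout_cost(s), s) for s in samples),
                        key=lambda pair: pair[0])
        elites = [s for _, s in scored[:n_elite]]
        if scored[0][0] < best[0]:
            best = scored[0]
        # Refit the per-time-step sampling distribution to the elites.
        mean = [statistics.fmean(col) for col in zip(*elites)]
        std = [statistics.pstdev(col) + 1e-3 for col in zip(*elites)]
        memory = elites[:keep]
    return best


cost, plan = cem_plan()
```

Because the best elites are carried over, the best cost in the population cannot get worse from one iteration to the next. Setting `beta=0` and `keep=0` would reduce the sketch to plain CEM with white noise, apart from the initial mean being evaluated once.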
We present a novel intrinsically motivated agent that learns how to control the environment in the fastest possible manner by optimizing learning progress. It learns what can be controlled, how to allocate time and attention, and the relations between objects using surprise-based motivation. The effectiveness of our method is demonstrated in a synthetic as well as a robotic manipulation environment, yielding considerably improved performance and lower sample complexity. In a nutshell, our work combines several task-level planning agent structures (backtracking search on a task graph, probabilistic roadmaps, allocation of search efforts) with intrinsic motivation to achieve learning from scratch.
Training a deep convolutional neural network (CNN) to succeed in visual object classification usually requires a great number of examples. Here, starting from such a pre-learned CNN, we study the task of extending the network to classify additional categories on the basis of only a few examples (“few-shot learning”). We find that a simple and fast prototype-based learning procedure in the global feature layers (“Global Prototype Learning”, GPL) leads to remarkably good classification results for a large portion of the new classes. It requires at most ten examples per new class to reach a plateau in performance. To understand this few-shot learning performance resulting from GPL, as well as the performance of the original network, we use the t-SNE method (Maaten and Hinton, 2008) to visualize clusters of object category examples. This reveals the strong connection between classification performance and data distribution and explains why some new categories need only a few examples for learning while others are classified poorly even when trained with many more examples.
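The general idea of prototype-based few-shot classification can be sketched as follows. This is a generic nearest-prototype classifier, not the GPL procedure itself, whose details the abstract does not give. It assumes that feature vectors from a pre-trained network's global feature layer are already available as plain lists of floats. The cosine-similarity rule, the function names, and the toy feature values are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def build_prototypes(support):
    """support maps each new class label to its few example feature vectors."""
    prototypes = {}
    for label, feats in support.items():
        normed = [l2_normalize(f) for f in feats]
        mean = [sum(col) / len(normed) for col in zip(*normed)]
        prototypes[label] = l2_normalize(mean)  # one prototype per class
    return prototypes


def classify(feature, prototypes):
    """Return the label whose prototype has the highest cosine similarity."""
    query = l2_normalize(feature)
    return max(
        prototypes,
        key=lambda label: sum(a * b for a, b in zip(query, prototypes[label])),
    )


# Toy 3-D "features" standing in for global-layer activations (made up).
support = {
    "cat": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
    "dog": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}
prototypes = build_prototypes(support)
label = classify([0.8, 0.2, 0.0], prototypes)
```

Adding a class in this scheme only requires averaging a handful of feature vectors, with no gradient updates to the network. That makes it fast, but also dependent on how well the new class clusters in the existing feature space.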
Deep convolutional networks are extended with oscillatory phase dynamics and recurrent couplings that are based on convolution and deconvolution. Moreover, top-down modulation is included that enforces the dynamical selection and grouping of features of the recognized object into assemblies based on temporal coherence. With respect to image processing, it is demonstrated how the combination of these mechanisms allows for the segmentation of the parts of an object that are relevant for its classification.
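To make "grouping into assemblies based on temporal coherence" concrete, the sketch below uses a plain Kuramoto-style phase model as a stand-in. It is not the convolutional coupling scheme of the abstract. The assumptions are that units belonging to the same assembly attract each other's phases, that different assemblies are uncoupled and have different natural frequencies, and that all names and parameter values are chosen purely for illustration.

```python
import cmath
import math
import random


def coherence(phases):
    """Kuramoto order parameter: 1.0 means fully phase-locked, near 0 incoherent."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)


def simulate(groups, freqs, coupling=2.0, dt=0.05, steps=2000, seed=0):
    """Euler-integrate phase oscillators; groups[i] is the assembly of unit i.

    Only units in the same assembly are coupled (an assumption of this sketch);
    freqs[k] is the natural frequency shared by all units of assembly k.
    """
    rng = random.Random(seed)
    n = len(groups)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    for _ in range(steps):
        updated = []
        for i in range(n):
            peers = [j for j in range(n) if groups[j] == groups[i]]
            pull = sum(math.sin(phases[j] - phases[i]) for j in peers) / len(peers)
            updated.append(phases[i] + dt * (freqs[groups[i]] + coupling * pull))
        phases = updated
    return phases


groups = [0, 0, 0, 0, 1, 1, 1, 1]
phases = simulate(groups, freqs=[1.0, 1.7])
within = [
    coherence([p for p, g in zip(phases, groups) if g == k]) for k in (0, 1)
]
```

In this toy setting, the units of each assembly lock to a common phase, while the two assemblies drift relative to each other because their frequencies differ. Reading out which units share a phase then recovers the grouping.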
An implementation of attentional bias is presented for a network model that couples excitatory and inhibitory oscillatory units in a manner that is inspired by the mechanisms that generate cortical gamma oscillations. Attentional biases are implemented as oscillatory coherences between excitatory units that encode the spatial location or features of the target and the pool of inhibitory units. This form of attentional bias is motivated by neurophysiological findings that relate selective attention to spike-field coherence. By additionally including pattern recognition mechanisms, we demonstrate how this implementation of attentional bias leads to selection of an attentional target while suppressing distractors for cases of spatial and feature-based attention. With respect to neurophysiological observations, we argue that the recently found positive correlation between high firing rates and strong gamma locking with attention (Vinck, Womelsdorf, Buffalo, Desimone, & Fries, 2013) may point to an essential mechanism of the brain’s attentional selection and suppression processes.