Markus Wulfmeier


Not data, not compute, but people are the limiting resource for progress in machine learning and robotics. I am particularly interested in methods that make it more efficient to provide supervision to our systems; my current work focuses on imitation, adaptation and transfer learning. (More information is available on Google Scholar or in my CV.)


'Effortless Supervision - How to Increase the Efficiency of Optimising Behaviours.'

June 2017, July 2017

Stanford, Uber, Facebook AI Research, Google Brain, Zoox, Volkswagen ERL

'Efficient Supervision for Robot Learning via Imitation, Adaptation, and Simulation.'

February 2018

DeepMind, AIMS Center for Doctoral Training (Oxford University)

'Efficient Supervision for Robot Learning - Why Robots are Even Harder to Supervise than PhD Students' (General Audience)

February 2018

London Machine Learning Meetup


TACO: Learning Task Decomposition via Temporal Alignment for Control
Kyriacos Shiarlis, Markus Wulfmeier, Sasha Salter, Shimon Whiteson, Ingmar Posner

Many advanced Learning from Demonstration (LfD) methods consider the decomposition of complex, real-world tasks into simpler sub-tasks. By reusing the corresponding sub-policies within and between tasks, they provide training data for each policy from different high-level tasks and compose them to perform novel ones. Existing approaches to modular LfD focus either on learning a single high-level task or depend on domain knowledge and temporal segmentation. In contrast, we propose a weakly supervised, domain-agnostic approach based on task sketches, which include only the sequence of sub-tasks performed in each demonstration. Our approach simultaneously aligns the sketches with the observed demonstrations and learns the required sub-policies. This improves generalisation in comparison to separate optimisation procedures. We evaluate the approach on multiple domains, including a simulated 3D robot arm control task using purely image-based observations. The results show that our approach performs commensurately with fully supervised approaches, while requiring significantly less annotation effort.

Incremental Adversarial Domain Adaptation for Continually Changing Environments
Markus Wulfmeier, Alex Bewley, Ingmar Posner
IEEE International Conference on Robotics and Automation (ICRA) 2018

Continuous appearance shifts such as changes in weather and lighting conditions can impact the performance of deployed machine learning models. While unsupervised domain adaptation aims to address this challenge, current approaches do not utilise the continuity of the occurring shifts. In particular, many robotics applications exhibit these conditions and thus offer the potential to incrementally adapt a learnt model over minor shifts which accumulate into massive differences over time. Our work presents an adversarial approach for lifelong, incremental domain adaptation which benefits from unsupervised alignment to a series of intermediate domains which successively diverge from the labelled source domain. We empirically demonstrate that our incremental approach improves handling of large appearance changes, e.g. day to night, on a traversable-path segmentation task compared with a direct, single alignment step approach. Furthermore, by approximating the feature distribution for the source domain with a generative adversarial network, the deployment module can be rendered fully independent of retaining potentially large amounts of the related source training data for only a minor reduction in performance.

Mutual Alignment Transfer Learning
Markus Wulfmeier, Ingmar Posner, Pieter Abbeel
Conference on Robot Learning (CoRL) 2017

Training robots for operation in the real world is a complex, time-consuming and potentially expensive task. Despite significant success of reinforcement learning in games and simulations, research in real robot applications has not been able to match similar progress. While sample complexity can be reduced by training policies in simulation, these can perform sub-optimally on the real platform given imperfect calibration of model dynamics. We present an approach, supplemental to fine-tuning on the real robot, to further benefit from parallel access to a simulator during training. The developed approach harnesses auxiliary rewards to guide the exploration for the real-world agent based on the proficiency of the agent in simulation and vice versa. In this context, we demonstrate empirically that the reciprocal alignment for both agents provides further benefit as the agent in simulation can adjust to optimise its behaviour for states commonly visited by the real-world agent.

Reverse Curriculum Generation for Reinforcement Learning
Carlos Florensa, David Held, Markus Wulfmeier, Pieter Abbeel
Conference on Robot Learning (CoRL) 2017

Many relevant tasks require an agent to reach a certain state, or to manipulate objects into a desired configuration. For example, we might want a robot to align and assemble a gear onto an axle or insert and turn a key in a lock. These tasks present considerable difficulties for reinforcement learning approaches, since the natural reward function for such goal-oriented tasks is sparse and prohibitive amounts of exploration are required to reach the goal and receive a learning signal. Past approaches tackle these problems by manually designing a task-specific reward shaping function to help guide the learning. Instead, we propose a method to learn these tasks without requiring any prior task knowledge other than obtaining a single state in which the task is achieved. The robot is trained in "reverse", gradually learning to reach the goal from a set of starting positions increasingly far from the goal. Our method automatically generates a curriculum of starting positions that adapts to the agent's performance, leading to efficient training on such tasks. We demonstrate our approach on difficult simulated fine-grained manipulation problems, not solvable by state-of-the-art reinforcement learning methods.
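The curriculum idea in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that start states are kept when their measured success rate falls inside a band and that new candidates are proposed by short random walks from the kept starts. The function names, the thresholds `r_min` and `r_max`, and the toy one-dimensional chain are all hypothetical and not taken from the paper.

```python
import random


def select_good_starts(success_rates, r_min=0.1, r_max=0.9):
    """Keep starts that are neither already mastered nor currently hopeless.

    success_rates maps a start state to the agent's empirical success rate.
    The band [r_min, r_max] is an illustrative assumption.
    """
    return [s for s, rate in success_rates.items() if r_min <= rate <= r_max]


def expand_starts(good_starts, step, n_new, rng, n_steps=5):
    """Propose new starts by short random walks from the good starts."""
    new_starts = []
    for _ in range(n_new):
        state = rng.choice(good_starts)
        for _ in range(n_steps):
            state = step(state, rng)
        new_starts.append(state)
    return new_starts


def chain_step(state, rng, low=0, high=10):
    """Toy dynamics: move one cell left or right on a line, clipped to [low, high]."""
    return min(high, max(low, state + rng.choice((-1, 1))))


# Toy usage: the goal is state 10; the success rates below are made up.
rng = random.Random(0)
rates = {10: 1.0, 9: 0.8, 8: 0.5, 3: 0.0}
good = select_good_starts(rates)
proposals = expand_starts(good, chain_step, n_new=20, rng=rng)
```

In a full training loop the policy would be trained from `proposals`, success rates re-estimated, and the two steps repeated, so that the start distribution gradually moves away from the goal as the agent improves.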

Large-Scale Cost Function Learning for Path Planning using Deep Inverse Reinforcement Learning
Markus Wulfmeier, Dushyant Rao, Dominic Zeng Wang, Peter Ondruska, Ingmar Posner
The International Journal of Robotics Research (IJRR) 2017

We present an approach for learning spatial traversability maps for driving in complex, urban environments based on an extensive dataset demonstrating the driving behaviour of human experts. The direct end-to-end mapping from raw input data to cost bypasses the effort of manually designing parts of the pipeline, exploits a large number of data samples, and can additionally be framed to refine handcrafted cost maps built on hand-engineered features. To achieve this, we introduce a maximum-entropy-based, non-linear inverse reinforcement learning (IRL) framework which exploits the capacity of fully convolutional neural networks (FCNs) to represent the cost model underlying driving behaviours. The application of a high-capacity, deep, parametric approach successfully scales to more complex environments and driving behaviours, while at deployment being run-time independent of training dataset size. After benchmarking against state-of-the-art IRL approaches, we focus on demonstrating scalability and performance on an ambitious dataset collected over the course of 1 year including more than 25,000 demonstration trajectories extracted from over 120 km of urban driving. We evaluate the resulting cost representations by showing the advantages over a carefully, manually designed cost map and furthermore demonstrate its robustness towards systematic errors by learning accurate representations even in the presence of calibration perturbations. Importantly, we demonstrate that a manually designed cost map can be refined to more accurately handle corner cases that are scarcely seen in the environment, such as stairs, slopes and underpasses, by further incorporating human priors into the training framework.

Addressing Appearance Change in Outdoor Robotics with Adversarial Domain Adaptation
Markus Wulfmeier, Alex Bewley, Ingmar Posner
International Conference on Intelligent Robots and Systems (IROS) 2017

Appearance changes due to weather and seasonal conditions represent a strong impediment to the robust implementation of machine learning systems in outdoor robotics. While the model is optimised for the training domain, it will deliver degraded performance in application domains that are subject to distributional shifts caused by these changes. Traditionally, this problem has been addressed via the collection of labelled data in multiple domains or by imposing priors on the type of shift between both domains. We frame the problem in the context of unsupervised domain adaptation and apply an adversarial framework to train a deep neural network with the additional objective to align features across domains. This approach benefits from adding unlabelled data and is generally applicable to many state-of-the-art architectures. Moreover, as adversarial training is notoriously hard to stabilise, we first perform an extensive ablation study on a surrogate classification task subject to the same appearance change and then apply the distilled insights to the problem of free-space segmentation for motion planning.

Watch This: Scalable Cost-Function Learning for Path Planning in Urban Environments
Markus Wulfmeier, Dominic Zeng Wang, Ingmar Posner
International Conference on Intelligent Robots and Systems (IROS) 2016, Best Student Paper Award

In this work, we present an approach to learn cost maps for driving in complex urban environments from a very large number of demonstrations of driving behaviour by human experts. The learned cost maps are constructed directly from raw sensor measurements, bypassing the effort of manually designing cost maps as well as features. When deploying the learned cost maps, the trajectories generated not only replicate human-like driving behaviour but are also demonstrably robust against systematic errors in putative robot configuration. To achieve this we deploy a Maximum Entropy-based, non-linear IRL framework which uses Fully Convolutional Neural Networks (FCNs) to represent the cost model underlying expert driving behaviour. Using a deep, parametric approach enables us to scale efficiently to large datasets and complex behaviours by being run-time independent of dataset extent during deployment. We demonstrate the scalability and the performance of the proposed approach on an ambitious dataset collected over the course of one year including more than 25,000 demonstration trajectories extracted from over 120 km of driving around pedestrianised areas in the city of Milton Keynes, UK. We evaluate the resulting cost representations by showing the advantages over a carefully, manually designed cost map and, in addition, demonstrate its robustness to systematic errors by learning precise cost maps even in the presence of system calibration perturbations.

Incorporating Human Domain Knowledge into Large Scale Cost Function Learning
Markus Wulfmeier, Dushyant Rao, Ingmar Posner
Neural Information Processing Systems 2016, Deep Reinforcement Learning Workshop

Recent advances have shown the capability of Fully Convolutional Neural Networks (FCN) to model cost functions for motion planning in the context of learning driving preferences purely based on demonstration data from human drivers. While pure learning from demonstrations in the framework of Inverse Reinforcement Learning (IRL) is a promising approach, we can benefit from well informed human priors and incorporate them into the learning process. Our work achieves this by pretraining a model to regress to a manual cost function and refining it based on Maximum Entropy Deep Inverse Reinforcement Learning. When injecting prior knowledge as pretraining for the network, we achieve higher robustness, more visually distinct obstacle boundaries, and the ability to capture instances of obstacles that elude models that purely learn from demonstration data. Furthermore, by exploiting these human priors, the resulting model can more accurately handle corner cases that are scarcely seen in the demonstration data, such as stairs, slopes, and underpasses.

Maximum Entropy Deep Inverse Reinforcement Learning
Markus Wulfmeier, Peter Ondruska, Ingmar Posner
Neural Information Processing Systems 2015, Deep Reinforcement Learning Workshop

This paper presents a general framework for exploiting the representational capacity of neural networks to approximate complex, nonlinear reward functions in the context of solving the inverse reinforcement learning (IRL) problem. We show in this context that the Maximum Entropy paradigm for IRL lends itself naturally to the efficient training of deep architectures. At test time, the approach leads to a computational complexity independent of the number of demonstrations, which makes it especially well-suited for applications in life-long learning scenarios. Our approach achieves performance commensurate with the state of the art on existing benchmarks while exceeding it on an alternative benchmark based on highly varying reward structures. Finally, we extend the basic architecture, which is equivalent to a simplified subclass of Fully Convolutional Neural Networks (FCNNs) with width one, to include larger convolutions in order to eliminate dependency on precomputed spatial features and work on raw input representations.
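One way to see why the Maximum Entropy formulation suits deep architectures is the form of its gradient, sketched below under standard Maximum Entropy IRL assumptions. The notation is chosen here for illustration: $L_D$ is the log-likelihood of the demonstrations, $r_\theta$ the reward produced by a network with parameters $\theta$, $\mu_D$ the state visitation frequencies observed in the demonstrations, and $\mathbb{E}[\mu]$ the visitation frequencies expected under the policy induced by the current reward.

```latex
\frac{\partial L_D}{\partial \theta}
  = \frac{\partial L_D}{\partial r_\theta} \cdot \frac{\partial r_\theta}{\partial \theta}
  = \left( \mu_D - \mathbb{E}[\mu] \right) \cdot \frac{\partial r_\theta}{\partial \theta}
```

The first factor acts as an error signal on the network output, and the second factor is obtained by ordinary backpropagation, so the reward network can be trained with standard gradient-based optimisers.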

Voronoi-Based Heuristic for Nonholonomic Search-Based Path Planning
Qi Wang, Markus Wulfmeier, Bernardo Wagner
Intelligent Autonomous Systems 13

This paper proposes the use of a Voronoi-based heuristic to significantly speed up search-based nonholonomic path planning. Using generalized Voronoi diagrams (GVD) and in this manner exploiting geometric information about the obstacles, the presented approach is able to considerably reduce computation time while satisfying differential constraints using motion primitives for exploration. A key advantage compared to the common use of Euclidean heuristics is the inherent ability to avoid local minima of the cost function, which can be caused by, e.g., concave obstacles. Therefore, the application of the Voronoi-based heuristic is particularly beneficial in densely cluttered environments.

Design and implementation of a particle image velocimetry method for analysis of running gear–soil interaction
Carmine Senatore, Markus Wulfmeier, Ivan Vlahinić, Jose Andrade, Karl Iagnemma
Journal of Terramechanics 2013, Vol. 50, p. 311-326

Experimental analysis of running gear–soil interaction traditionally focuses on the measurement of forces and torques developed by the running gear. This type of measurement provides useful information about running gear performance but it does not allow for explicit investigation of soil failure behavior. This paper describes a methodology based on particle image velocimetry for analyzing soil motion from a sequence of images. A procedure for systematically identifying experimental and processing settings is presented. Soil motion is analyzed for a rigid wheel traveling on a Mars regolith simulant while operating against a glass wall, thereby imposing plane strain boundary conditions. An off-the-shelf high speed camera is used to collect images of the soil flow. Experimental results show that it is possible to accurately compute soil deformation characteristics without the need for markers. Measured soil velocity fields are used to calculate strain fields.
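As a rough illustration of the last step, the sketch below derives in-plane strain-rate components from a gridded velocity field using finite differences. This is an assumed, generic procedure and not necessarily the one used in the paper; the function name, the grid layout (rows index y, columns index x) and the spacing parameters `dx` and `dy` are hypothetical.

```python
def strain_rate_from_velocity(u, v, dx, dy):
    """Return (exx, eyy, exy) strain-rate fields from velocity fields u, v.

    u[i][j] and v[i][j] are the x- and y-velocities at row i (y) and column j (x)
    of a regular grid with at least two rows and two columns. Central differences
    are used in the interior and one-sided differences at the borders.
    """
    rows, cols = len(u), len(u[0])

    def ddx(f, i, j):
        j0, j1 = max(j - 1, 0), min(j + 1, cols - 1)
        return (f[i][j1] - f[i][j0]) / ((j1 - j0) * dx)

    def ddy(f, i, j):
        i0, i1 = max(i - 1, 0), min(i + 1, rows - 1)
        return (f[i1][j] - f[i0][j]) / ((i1 - i0) * dy)

    cells = [(i, j) for i in range(rows) for j in range(cols)]
    exx = [[0.0] * cols for _ in range(rows)]
    eyy = [[0.0] * cols for _ in range(rows)]
    exy = [[0.0] * cols for _ in range(rows)]
    for i, j in cells:
        exx[i][j] = ddx(u, i, j)                       # normal strain rate in x
        eyy[i][j] = ddy(v, i, j)                       # normal strain rate in y
        exy[i][j] = 0.5 * (ddy(u, i, j) + ddx(v, i, j))  # shear strain rate
    return exx, eyy, exy


# Toy usage: simple shear, u = 2 * y and v = 0, on a 3 x 3 grid with unit spacing.
u_field = [[2.0 * i for _ in range(3)] for i in range(3)]
v_field = [[0.0 for _ in range(3)] for _ in range(3)]
exx, eyy, exy = strain_rate_from_velocity(u_field, v_field, dx=1.0, dy=1.0)
```

For the simple-shear field in the usage example, the normal components vanish and the shear component is uniform; accumulating such rates over the image sequence would give strain, which is again an assumption about the processing rather than a statement of the paper's method.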

Investigation of stress and failure in granular soils for lightweight robotic vehicle applications
Carmine Senatore, Markus Wulfmeier, Jamie MacLennan, Paramsothy Jayakumar, Karl Iagnemma
Proceedings of the Ground Vehicle Systems Engineering and Technology Symposium, 2012, Best Paper Award

This paper describes novel experimental methods aimed at understanding the fundamental phenomena governing the motion of lightweight vehicles on dry, granular soils. A single-wheel test rig is used to empirically investigate wheel motion under controlled wheel slip and loading conditions on sandy, dry soil. Test conditions can be designed to replicate typical field scenarios for lightweight robots, while key operational parameters such as drawbar force, torque, and sinkage are measured. This test rig enables imposition of velocities, or application of loads, to interchangeable running gears within a confined soil bin of dimensions 1.5 m long, 0.7 m wide, and 0.4 m deep. This allows testing of small-scale wheels, tracks, and cone or plate penetrators. Aside from standard wheel experiments (i.e., measurements of drawbar force, applied torque, and sinkage during controlled slip runs) two additional experimental methodologies have been developed. The first relies on high-speed imaging of the wheel-soil interface and the use of particle image velocimetry (PIV) to measure micro-scale terrain kinematics. The second experimental methodology consists of a custom force sensor array located at the wheel-terrain interface. The sensors allow explicit measurement of normal and shear forces (and, therefore, estimation of normal and shear stresses) at numerous discrete points along the wheel-soil interface. Experimental measurements gathered by these test methodologies are to be compared against well-established semi-empirical models, to validate and understand limitations of the models and propose improvements.