Instructions: From the left column, select the step in the indicated move of the MICE Model that best corresponds to each group of sentences in this introduction. The feedback is shown at the bottom.

Adapted under a Creative Commons license from Jens Lundell, "Dynamic movement primitives and reinforcement learning for adapting a learned skill," Master's thesis, Dept. of Elect. Eng., Aalto Univ., Espoo, Finland, 2016.

1 Introduction (steps within moves)

Move 1: Current situation (Step 1 / Step 2 / Step 3)
1 Robots are used in human environments today more than ever before.
Move 1: Current situation (Step 1 / Step 2 / Step 3)
2 Examples of such robots include simple vacuum cleaning or lawn mowing robots. 3 Although robots have become increasingly popular, they are still mainly used in industrial settings as welders and assemblers. 4 The reason for robots excelling in industrial settings is that the environment is for the most part static and that the task at hand is known a priori. 5 For this reason, the robot can be preprogrammed to precisely follow specific trajectories.
Move 2: Problem (Step 1 / Step 2)
6 However, in a constantly changing environment, such as in our homes, or with complex tasks, such as playing ping pong [61], the robot cannot be preprogrammed as before.
Move 1: Current situation (Step 1 / Step 2 / Step 3)
7 Thus, for a robot to learn in these situations, one possible and frequently used solution is to learn from a human demonstrator. 8 The concept of having a human teach the robot by interacting with it is known as Learning from Demonstration (LfD), as well as Programming by Demonstration (PbD) or imitation learning. 9 LfD has enabled robots to carry out complex tasks, such as playing the ball-in-a-cup game [50], playing ping pong [61], as well as many other tasks [10,63]. 10 For robots to acquire a movement, they first and foremost must record the demonstrated movement by using either their internal sensors or an external monitoring system. 11 Then, the responsibility is shifted over to the robot, which in turn is expected to learn a reasonable representation of the movement from the recorded information. 12 This learning is achieved through the guidance of a learning algorithm, which must be able to produce a representation of the movement as well as cope with perturbations without risking failure.
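In rough pseudo-Python, the acquisition pipeline described in sentences 10–12 (record a demonstration, then learn a compact representation of it) might look something like the sketch below. The sensor interface, the Gaussian basis functions, and the least-squares fit are illustrative assumptions, not the method used in the thesis.

```python
import numpy as np

def record_demonstration(n_steps=200, n_joints=7):
    """Stand-in for reading joint positions from the robot's internal sensors
    or an external monitoring system (here: synthetic data)."""
    t = np.linspace(0.0, 1.0, n_steps)
    return np.stack([np.sin(np.pi * t * (j + 1)) for j in range(n_joints)], axis=1)

def learn_representation(trajectory, n_basis=20):
    """Fit a compact representation of the movement: a least-squares fit of
    Gaussian basis functions per joint, standing in for a learning algorithm
    such as a DMP."""
    n_steps, _ = trajectory.shape
    phase = np.linspace(0.0, 1.0, n_steps)
    centers = np.linspace(0.0, 1.0, n_basis)
    basis = np.exp(-0.5 * ((phase[:, None] - centers[None, :]) / 0.05) ** 2)
    weights, *_ = np.linalg.lstsq(basis, trajectory, rcond=None)
    return {"centers": centers, "weights": weights}

demo = record_demonstration()
model = learn_representation(demo)
print(model["weights"].shape)  # one weight vector per joint: (n_basis, n_joints)
```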
Move 1: Current situation (Step 1 / Step 2 / Step 3)
13 LfD has attracted significant attention in recent years, with the number of papers written on the subject serving as a clear indicator [5,60,64]. 14 Several reasons have been identified for this increased interest [3,53]. 15 The first reason is the reduced learning time compared to traditional approaches that require a human to program the robot off-line. 16 Another reason is that by demonstrating the movement to the robot, the user can also predict the behaviour in advance before the robot executes the learned skill, as this should follow the demonstrated movement quite accurately. 17 This, in turn, increases safety and is exceptionally important when integrating robots into human environments. 18 Finally, because the acquired movement is in line with the human workspace, critical parts of the movement can be modelled very accurately.
Move 2: Problem (Step 1 / Step 2)
19 Despite all the advantages of LfD, problems still remain concerning how to improve and generalize the learned skill. 20 For example, learning from a demonstration has been shown not to be enough to master complex tasks, such as flipping a pancake [52] or playing the ball-in-a-cup game [50]. 21 Hence, truly mastering a learned skill requires subsequent practice.
Move 1: Current situation (Step 1 / Step 2 / Step 3)
22 A method for improving a skill through practice has already been implemented in robotics and is known as Reinforcement Learning (RL), in which the robot tries to improve the learned movement by interacting with the environment and receiving responses in the form of rewards [48]. 23 Based on these rewards, the learned movement is adapted to increase future rewards. 24 This resembles the way that humans acquire a new skill: first, we imitate the skill performed by another person, and then we practise the skill to fine-tune the movement until we can master it.
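A minimal sketch of the reward-driven improvement loop in sentences 22–23 is shown below; the toy reward function and the simple hill-climbing update are assumptions made for illustration, not the RL algorithm developed in the thesis.

```python
import numpy as np

def reward(weights):
    """Stand-in for executing the movement on the robot and scoring the outcome."""
    return -np.sum((weights - 1.0) ** 2)

weights = np.zeros(10)            # parameters of the learned movement
best_reward = reward(weights)

for episode in range(500):
    candidate = weights + np.random.normal(scale=0.1, size=weights.shape)
    r = reward(candidate)         # response from the environment
    if r > best_reward:           # adapt the movement to increase future rewards
        weights, best_reward = candidate, r

print(best_reward)
```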
Move 2: Problem (Step 1 / Step 2)
25 Although reinforcement learning has become the standard approach for improving a taught skill, one problem still remains: how to safely explore new trajectories.
Move 1: Current situation (Step 1 / Step 2 / Step 3)
26 To explore new trajectories, noise is typically added to the learned representation of the skill, resulting in a slightly different trajectory.
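As an illustration of sentence 26, exploration can be sketched as perturbing the parameters of the learned representation; the DMP-style weight matrix and the zero-mean, uncorrelated Gaussian noise below are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
learned_weights = rng.normal(size=(20, 7))   # e.g. forcing-term weights of a learned DMP

# Add zero-mean Gaussian noise to every parameter independently (uncorrelated noise);
# the perturbed parameters produce a slightly different trajectory when executed.
exploration_noise = rng.normal(scale=0.05, size=learned_weights.shape)
explored_weights = learned_weights + exploration_noise

# How large the noise should be, and whether it should instead be correlated
# across parameters or time, is the open question raised in sentence 27.
print(np.abs(exploration_noise).max())
```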
Move 2: Problem (Step 1 / Step 2)
27 However, generating the noise is not a trivial task, and questions such as the magnitude of the generated noise and whether the noise should be correlated remain open.
Move 1: Current situation (Step 1 / Step 2 / Step 3)
28 Consequently, various solutions have been proposed for generating noise. 29 For example, in [49] and [50], the generated noise is dependent on the learned model.
Move 2: Problem (Step 1 / Step 2)
30 Intuitively, this makes sense; however, it does not take into consideration the constraints of the executing system, which in both cases is a robotic arm.
Move 2: Problem (Step 1 / Step 2)
31 Although recent studies [44] have generated noise focusing on the constraints of the actual system, no studies have yet attempted to apply this to improving a learned model represented as a Dynamic Movement Primitive (DMP).
Move 4: Your solution (Move 1 / Move 2 / Move 3 / Move 4 / Move 5)
32 Hence, this thesis will develop a method to safely explore new trajectories for a skill modelled as a DMP.
Move 4: Your solution (Move 1 / Move 2 / Move 3 / Move 4 / Move 5)
33 To accomplish this objective, the thesis must carry out the following tasks:
  • model the skill as a DMP,
  • select an RL algorithm for improving the modelled skill,
  • evaluate the safety of the exploration method.
Move 4: Your solution (Move 1 / Move 2 / Move 3 / Move 4 / Move 5)
34 To test the effectiveness of the new approach for generating safe trajectories, a sufficiently challenging real-world task is needed for experimentation. 35 For this purpose, the thesis will use the ball-in-a-cup game, as this game can be seen as a benchmark problem in robotics [50]. 36 The game consists of a ball attached to a string, which is then attached to the bottom of a cup. 37 The objective of the game is to get the ball into the cup. 38 This is achieved by inducing a movement on the ball: the cup is quickly moved back and forth, then pulled up and moved under the ball. 39 Although the game is simple by nature, it is not that easy to learn, as it requires fast, precise movement, and even small changes in the movement can have a drastic impact on the trajectory of the ball. 40 The ball-in-a-cup game is of interest, as a robot cannot get the ball into the cup by merely executing the movement produced by the initially learned model, thus requiring subsequent RL to successfully master the skill.
Move 4: Your solution (Move 1 / Move 2 / Move 3 / Move 4 / Move 5)
41 The rest of this thesis is organized as follows. 42 Chapter 2 introduces and compares several state-of-the-art LfD approaches. 43 Based on the results, DMP is chosen to model movements. 44 Chapter 3, in turn, is devoted to explaining DMP in more detail. 45 As reinforcement learning is used to improve the learned movement, Chapter 4 is dedicated to explaining RL. 46 The chapter first explains the theory behind RL and then introduces state-of-the-art algorithms. 47 Chapter 5 introduces the testbed and hardware, as well as the existing and newly developed software supporting the experimental part of this thesis. 48 The experiments and results are presented in Chapter 6. 49 These are also critically discussed in the same chapter. 50 Finally, conclusions and suggestions for future work are presented in Chapter 7.