Instructions: From the left column, select the step in the indicated move of the MICE Model that best corresponds to each group of sentences in this introduction. The feedback is shown at the bottom.
Adapted from Jens Lundell, "Dynamic movement primitives and reinforcement learning for adapting a learned skill," Master's thesis, Dept. of Elect. Eng., Aalto Univ., Espoo, Finland, 2016.
1 Introduction (steps within moves)
Move 1: Current situation
1 Robots are used in human environments today more than ever before.
2 Examples of such robots include simple vacuum-cleaning or lawn-mowing robots.
3 Although robots have become increasingly popular, they are still mainly used in industrial settings as welders and assemblers.
4 The reason for robots excelling in industrial settings is that the environment is for the most part static and that the task at hand is known a priori.
5 For this reason, the robot can be preprogrammed to precisely follow specific trajectories.
6 However, in a constantly changing environment, such as in our homes, or with complex tasks, such as playing ping pong [61], the robot cannot be preprogrammed as before.
7 Thus, for a robot to learn in these situations, one possible and frequently used solution is to learn from a human demonstrator.
8 The concept of having a human teach the robot by interacting with it is known as Learning from Demonstration (LfD), also called Programming by Demonstration (PbD) or imitation learning.
9 LfD has enabled robots to carry out complex tasks, such as playing the ball-in-a-cup game [50] and playing ping pong [61], as well as many other tasks [10,63].
10 For robots to acquire a movement, they must first record the demonstrated movement using either their internal sensors or an external monitoring system.
11 Then, the responsibility shifts to the robot, which is expected to learn a reasonable representation of the movement from the recorded information.
12 This learning is achieved through a learning algorithm, which must be able to produce a representation of the movement and cope with perturbations without risking failure.
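To make this record-then-learn pipeline concrete, here is a minimal sketch that fits a compact radial-basis-function representation to one recorded joint trajectory. It assumes NumPy and a simulated demonstration; the function and variable names (fit_rbf_weights, demo_t, demo_q) are hypothetical rather than taken from the thesis.

    import numpy as np

    def fit_rbf_weights(t, y, n_basis=20, width=40.0):
        # Least-squares fit of radial-basis-function weights to a demonstrated
        # trajectory y(t), giving a compact representation the robot can reproduce.
        t = (t - t[0]) / (t[-1] - t[0])               # normalise time to [0, 1]
        centres = np.linspace(0.0, 1.0, n_basis)
        phi = np.exp(-width * (t[:, None] - centres[None, :]) ** 2)
        phi /= phi.sum(axis=1, keepdims=True)         # normalised basis activations
        w, *_ = np.linalg.lstsq(phi, y, rcond=None)   # one weight vector per joint
        return centres, w

    # Hypothetical usage: demo_t and demo_q would come from the robot's
    # internal joint sensors while a human guides the arm.
    demo_t = np.linspace(0.0, 2.0, 200)
    demo_q = np.sin(np.pi * demo_t)[:, None]          # stand-in for recorded joint angles
    centres, weights = fit_rbf_weights(demo_t, demo_q)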
13 LfD has attracted significant attention in recent years, with the growing number of papers written on the subject serving as a clear indicator [5,60,64].
14 Several reasons have been identified for this increased interest [3,53].
15 The first is the reduced learning time compared to traditional approaches that require a human to program the robot off-line.
16 Another reason is that, by demonstrating the movement to the robot, the user can predict the robot's behaviour before it executes the learned skill, as this should follow the demonstrated movement quite accurately.
17 This, in turn, increases safety, which is exceptionally important when integrating robots into human environments.
18 Finally, because the acquired movement is in line with the human workspace, critical parts of the movement can be modelled very accurately.
19 Despite all the advantages of LfD, problems remain concerning how to improve and generalize the learned skill.
20 For example, learning from a demonstration alone has been shown to be insufficient for mastering complex tasks, such as flipping a pancake [52] or playing the ball-in-a-cup game [50].
21 Hence, truly mastering a learned skill requires subsequent practice.
22 A method for improving a skill through practice has already been implemented in robotics and is known as Reinforcement Learning (RL), in which the robot tries to improve the learned movement by interacting with the environment and receiving responses in the form of rewards [48].
23 Based on these rewards, the learned movement is adapted to increase future rewards.
24 This resembles the way humans acquire a new skill: first, we imitate the skill performed by another person, and then we practise it to fine-tune the movement until we master it.
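The reward-driven adaptation in sentences 22-23 can be sketched as an episodic policy-search loop that averages sampled parameter perturbations by their rewards. This is a simplified illustration, not the RL algorithm used in the thesis; improve, rollout_reward, and theta are hypothetical names.

    import numpy as np

    def improve(theta, rollout_reward, n_rollouts=10, sigma=0.05, n_iters=50):
        # Episodic policy search: perturb the movement parameters, execute
        # rollouts, and move the parameters towards high-reward perturbations.
        rng = np.random.default_rng(0)
        for _ in range(n_iters):
            eps = sigma * rng.standard_normal((n_rollouts, theta.size))
            rewards = np.array([rollout_reward(theta + e) for e in eps])
            weights = np.exp(rewards - rewards.max())       # favour high-reward rollouts
            theta = theta + weights @ eps / weights.sum()   # reward-weighted update
        return theta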
25 Although reinforcement learning has become the standard approach for improving a taught skill, one problem still remains: how to safely explore new trajectories.
26 To explore new trajectories, noise is typically added to the learned representation of the skill, resulting in a slightly different trajectory.
27 However, generating this noise is not a trivial task, and issues such as the magnitude of the noise and whether it should be correlated remain open questions.
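A minimal sketch of the two noise choices mentioned in sentence 27, assuming the skill is summarised by a parameter vector: uncorrelated noise perturbs each parameter independently, while correlated noise is drawn from a full covariance matrix. The parameter vector, magnitude, and covariance here are purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    theta = np.zeros(25)              # placeholder for the learned skill parameters

    sigma = 0.05                      # noise magnitude: itself an open question
    uncorrelated = theta + sigma * rng.standard_normal(theta.size)

    # Correlated noise: neighbouring parameters (e.g. adjacent basis functions)
    # are perturbed together, which tends to give smoother exploratory trajectories.
    idx = np.arange(theta.size)
    cov = sigma**2 * np.exp(-0.5 * (idx[:, None] - idx[None, :]) ** 2)
    correlated = theta + rng.multivariate_normal(np.zeros(theta.size), cov)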
28 Consequently, various solutions have been proposed for generating noise.
29 For example, in [49] and [50], the generated noise is dependent on the learned model.
30 Intuitively, this makes sense; however, it does not take into consideration the constraints of the executing system, which in both cases is a robotic arm.
31 Although recent studies [44] have generated noise focusing on the constraints of the actual system, no studies have yet attempted to apply this to improving a learned model represented as a Dynamic Movement Primitive (DMP).
Move 4: Your solution
32 Hence, this thesis will develop a method to safely explore new trajectories for a skill modelled as a DMP.
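For readers unfamiliar with DMPs, the following sketch integrates a standard discrete DMP for one degree of freedom: a damped spring-like system pulled towards a goal g, shaped by a learned forcing term. The gains, basis parameters, and zero weights are illustrative defaults, not the formulation or values developed later in the thesis.

    import numpy as np

    def dmp_rollout(y0, g, w, centres, width=40.0, tau=1.0, dt=0.01,
                    alpha_z=25.0, beta_z=6.25, alpha_x=8.0):
        # Integrate one degree of freedom of a discrete DMP from y0 towards g.
        y, z, x = y0, 0.0, 1.0
        path = []
        for _ in range(int(tau / dt)):
            psi = np.exp(-width * (x - centres) ** 2)        # basis activations over the phase x
            f = (psi @ w) / psi.sum() * x * (g - y0)         # learned forcing term
            z += dt / tau * (alpha_z * (beta_z * (g - y) - z) + f)
            y += dt / tau * z
            x += dt / tau * (-alpha_x * x)                   # canonical system (phase decay)
            path.append(y)
        return np.array(path)

    # Hypothetical usage with weights that would normally be learned from a demonstration:
    centres = np.linspace(0.0, 1.0, 20)
    trajectory = dmp_rollout(y0=0.0, g=1.0, w=np.zeros(20), centres=centres)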
33 To accomplish this objective, the thesis must carry out the following tasks:
- model the skill as a DMP,
- select an RL algorithm for improving the modelled skill,
- evaluate the safety of the exploration method.
34 To test whether the new approach for generating safe trajectories works, a sufficiently challenging real-world task is needed for experimentation.
35 For this purpose, the thesis will use the ball-in-a-cup game, as this game can be seen as a benchmark problem in robotics [50].
36 The game consists of a ball attached to a string, which is in turn attached to the bottom of a cup.
37 The objective of the game is to get the ball into the cup.
38 This is achieved by inducing a movement in the ball by quickly moving the cup back and forth and then pulling it up and moving the cup under the ball.
39 Although the game is simple by nature, it is not that easy to learn, as it requires fast, precise movement, and even small changes in the movement can have a drastic impact on the trajectory of the ball.
40 The ball-in-a-cup game is of interest because a robot cannot get the ball into the cup by merely executing the movement produced by the initially learned model, thus requiring subsequent RL to successfully master the skill.
41 The rest of this thesis is organized as follows.
42 Chapter 2 introduces and compares several state-of-the-art LfD approaches.
43 Based on the results, DMP is chosen to model movements.
44 Chapter 3, in turn, is devoted to explaining DMP in more detail.
45 As reinforcement learning is used to improve the learned movement, Chapter 4 is dedicated to explaining RL.
46 The chapter first explains the theory behind RL and then introduces state-of-the-art algorithms.
47 Chapter 5 introduces the testbed, the hardware, and the already implemented and newly developed software supporting the experimental part of this thesis.
48 The experiments and results are presented in Chapter 6.
49 These are also critically discussed in the same chapter.
50 Finally, conclusions and suggestions for future work are presented in Chapter 7.