Faculty of Informatics

2009/2010 Summer Session Research Scholarship Project

Supervisor: Dr Jo Abrantes

Title: Distributed Reinforcement Learning for the Control of Modular Robots

Project Description:

The aim of this project is to develop distributed reinforcement learning algorithms that will enable our modular snake robot to learn particular behaviours.

Recently there has been a significant research effort in the design of efficient controllers for modular robots. These robots, which can be thought of as a team of agents mechanically linked together, add an extra layer of complexity to the design of robot control algorithms because decision-making processes must take into account both the individual agents’ and the team’s points of view.

A promising approach to this problem is to use distributed reinforcement learning (RL) algorithms and have the modular robots learn their behaviours instead of trying to design hand-coded decision making algorithms capable of dealing with the complexities of the robots and their interaction with dynamic environments. In this project we are particularly interested in the development, implementation and testing of a model-free distributed Q-learning algorithm.
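As an illustration of what a model-free distributed Q-learning controller might look like, the following C++ sketch gives each module its own tabular Q-learner with epsilon-greedy action selection. It is a minimal sketch under stated assumptions, not the project's algorithm: the class name ModuleQLearner, the discretised state and action indices, and the idea that every module learns independently from a shared reward are all hypothetical choices made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical per-module learner (assumption: one instance per module, each
// updated from that module's local state index and a shared team reward).
class ModuleQLearner {
public:
    ModuleQLearner(std::size_t numStates, std::size_t numActions,
                   double alpha, double gamma, double epsilon)
        : numActions_(numActions),
          alpha_(alpha), gamma_(gamma), epsilon_(epsilon),
          q_(numStates * numActions, 0.0) {
        assert(numStates > 0 && numActions > 0);
    }

    // Current estimate Q(s, a).
    double value(std::size_t s, std::size_t a) const { return q_[index(s, a)]; }

    // Action with the highest estimated value in state s (first one on ties).
    std::size_t greedyAction(std::size_t s) const {
        auto first = q_.begin() + static_cast<std::ptrdiff_t>(index(s, 0));
        auto last = first + static_cast<std::ptrdiff_t>(numActions_);
        return static_cast<std::size_t>(std::max_element(first, last) - first);
    }

    // Epsilon-greedy selection: explore with probability epsilon, else exploit.
    template <class Rng>
    std::size_t selectAction(std::size_t s, Rng& rng) const {
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        if (coin(rng) < epsilon_) {
            std::uniform_int_distribution<std::size_t> pick(0, numActions_ - 1);
            return pick(rng);
        }
        return greedyAction(s);
    }

    // Standard Q-learning update:
    // Q(s,a) += alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a)).
    void update(std::size_t s, std::size_t a, double reward, std::size_t sNext) {
        const double target = reward + gamma_ * value(sNext, greedyAction(sNext));
        q_[index(s, a)] += alpha_ * (target - q_[index(s, a)]);
    }

private:
    // Row-major position of (s, a) in the flat table.
    std::size_t index(std::size_t s, std::size_t a) const {
        assert(a < numActions_ && s * numActions_ + a < q_.size());
        return s * numActions_ + a;
    }

    std::size_t numActions_;
    double alpha_;    // learning rate
    double gamma_;    // discount factor
    double epsilon_;  // exploration probability
    std::vector<double> q_;
};
```

In such a scheme a controller would hold one ModuleQLearner per module and, on each control step, call selectAction for every module, apply the chosen actions, then call update with the observed reward. Independent learners are only one of several distributed RL designs that the literature review might favour.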

The project will involve investigating the use of distributed RL for the design of controllers for modular robots. To this end, a literature review will be carried out to identify and select the most promising distributed RL control approach and algorithms to be implemented in the snake’s controller. The initial algorithms to be developed will enable the snake robot to learn to move efficiently in a predetermined direction. These algorithms will subsequently be refined to allow the robot to learn to negotiate obstacles found in the path to its goal destination.
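Learning to move in a predetermined direction requires a reward signal that measures progress along that direction. One plausible choice, sketched below in C++, is the robot's displacement over a control step projected onto the target direction. The names Vec2 and forwardProgressReward are hypothetical, and the sketch assumes the robot's planar position can be measured somehow (for example by external tracking), which the project description does not specify.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 2D position or direction in the plane the robot moves on.
struct Vec2 {
    double x;
    double y;
};

// Reward for one control step: the displacement from `before` to `after`,
// projected onto the target direction. Positive when the robot advances
// along the target direction, negative when it moves backwards.
inline double forwardProgressReward(Vec2 before, Vec2 after, Vec2 targetDir) {
    const double norm = std::hypot(targetDir.x, targetDir.y);
    assert(norm > 0.0);  // the target direction must be a non-zero vector
    const double dx = after.x - before.x;
    const double dy = after.y - before.y;
    return (dx * targetDir.x + dy * targetDir.y) / norm;
}
```

Sideways motion contributes nothing to this reward, so a learner that maximises it is pushed towards gaits that advance along the chosen direction. Handling obstacles would need additional terms that are not sketched here.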

Additional Information:

Practical results of the use of a distributed RL strategy can be seen in the video of a modular robot in action available at: http://modular.mmmi.sdu.dk/wiki/USD_Modular_Robotics_Research_Lab.

“Reinforcement learning is learning what to do --how to map situations to actions-- so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards. These two characteristics -- trial-and-error search and delayed reward-- are the two most important distinguishing features of reinforcement learning.” (Extracted from Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto, MIT Press, Cambridge, MA)

Software:

The programming languages used in the development of the algorithms and their implementation in the robot controller are C++ and the Lua scripting language.

Hardware:

The snake robot consists of eight individual modules linked together in a snake-like configuration, as shown in Figure 1. Each module’s motion is driven by an independently controlled servomotor. The direction, amplitude and sequence of the motions of each individual module determine the way the robot moves as a whole.
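To make the role of direction, amplitude and sequence concrete, the C++ sketch below generates servo targets for the eight modules as a sine wave travelling along the body. This is a common gait pattern for snake-like robots and is used here only as an assumption; the project description does not say which gait representation the robot uses. GaitParams, servoTargetsDeg and the choice of degrees as the unit are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr std::size_t kNumModules = 8;  // eight modules, per the hardware description

// Hypothetical gait parameters (assumed representation, not the robot's API).
struct GaitParams {
    double amplitudeDeg;  // peak servo deflection of each module
    double frequencyHz;   // oscillation frequency shared by all modules
    double phaseLagRad;   // phase offset between neighbouring modules; its sign
                          // sets the direction the wave travels along the body
};

// Target angle for every servo at time tSeconds:
// angle_i(t) = A * sin(2*pi*f*t + i*phaseLag).
inline std::array<double, kNumModules> servoTargetsDeg(const GaitParams& g,
                                                       double tSeconds) {
    assert(g.amplitudeDeg >= 0.0);
    const double twoPi = 2.0 * std::acos(-1.0);
    std::array<double, kNumModules> targets{};
    for (std::size_t i = 0; i < kNumModules; ++i) {
        const double phase = twoPi * g.frequencyHz * tSeconds +
                             static_cast<double>(i) * g.phaseLagRad;
        targets[i] = g.amplitudeDeg * std::sin(phase);
    }
    return targets;
}
```

Under this assumed representation, a learning algorithm could treat small changes to the amplitude or phase lag as its actions rather than commanding each servo angle directly, which keeps the action space small.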

Expected Outcomes:

The effectiveness of the algorithms developed will be demonstrated at the end of the ten weeks by having the snake robot learn to move in predetermined directions and to negotiate obstacles put in its path.

The expected educational outcomes of this project are:

    1. The student will be aware of the most commonly used distributed reinforcement learning techniques for robot control and will learn how to develop and implement these methods to control a modular robot.

    2. The student will be aware of several aspects of academic research, including literature review, designing and carrying out of experiments, and oral and written research reporting.

    3. The student will be acquainted with the planning and managing of a small research project.

Additionally, an expected ‘side-effect’ of this summer project is the development of a practical setting for the teaching of reinforcement learning in undergraduate and postgraduate subjects such as CSCI444 and MCS9444.

Selection Criteria:

    1. Academic performance at distinction level or higher.

    2. Knowledge of and experience with the C++ programming language (knowledge of the Lua scripting language and basic electronics will be an advantage).

    3. Interest in pursuing research in robotics


Last reviewed: 23 July, 2009
