Reinforcement Learning: Maze Solving

Keywords: Xamarin.Forms, Device.StartTimer, OrderByDescending, Reinforcement Learning, Maze Solving, Q-Learning, Sarsa, ε-greedy strategy, Boltzmann exploration


I developed a Xamarin.Forms app that solves simple mazes with reinforcement learning algorithms.  The source code is here.

The screenshot just after launching the app.


MainPage.xaml
The screen layout is a Grid with 1 column and 5 rows.  From the top row down: the app title, Buttons that control the reinforcement learning, the main area that displays the maze, and two data display areas for debugging.

MazeData.cs
Defines the "Route" class, which models mazes.  The "SetStateGrid" method creates and displays a maze by placing BoxViews in a Grid layout.  It also sets the start cell of the maze, sets the Grid cell colors, and places some Labels.

MainPage.xaml.cs
Defines the "State" and "Action" classes for the states and actions of reinforcement learning.  The "Learning" and "IntervalLearningWithActionBlink" methods basically iterate actions and Q-value updates.  They look a little complicated because they also include the process that displays, every five seconds, the maze-solving trial and error of the reinforcement learning.
  The "Learning" method executes the "IntervalLearningWithActionBlink" method every five seconds, and iterates the learning "Iterations" (= 1000) times.
  The "IntervalLearningWithActionBlink" method displays one maze-solving run from start to goal by blinking the grid cells along the route, and iterates the learning "Interval" times.
  The "IntervalTuning" method tunes the learning speed in the "IntervalLearningWithActionBlink" method according to the Q-value convergence.  In other words, the larger the Q-value changes are, the fewer learning iterations run in the "IntervalLearningWithActionBlink" method.
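
The relationship between Q-value change and iteration count can be sketched roughly as below.  This is a minimal Python illustration, not the app's C# code: the function name "tune_interval", the bounds, and the inverse-proportional rule are all assumptions.  The only behavior taken from the description above is that larger Q-value changes give fewer iterations per interval.

```python
def tune_interval(max_q_delta, min_interval=1, max_interval=100, scale=1.0):
    """Return how many learning iterations to run before the next display.

    Hypothetical rule (an assumption, not the app's formula): the count
    shrinks as the largest recent Q-value change grows, so unstable early
    learning is shown in small steps and near-converged learning is
    skipped through quickly.
    """
    if max_q_delta < 0:
        raise ValueError("max_q_delta must be non-negative")
    # Inverse relationship: a larger delta gives a smaller interval.
    raw = scale / max_q_delta if max_q_delta > 0 else max_interval
    # Clamp into [min_interval, max_interval] and truncate to an int.
    return int(max(min_interval, min(max_interval, raw)))
```

With these assumed defaults, a Q-value change of 0.5 gives 2 iterations per interval, while a change close to zero gives the maximum of 100.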

ReinforcementLearning.cs
The "QLearning" and "Sarsa" methods update the Q-values with the Q-Learning and Sarsa algorithms.  The "QLearningAction" and "SarsaAction" methods each take one action and update the Q-value with their respective algorithm.  The "QLearningLoop" and "SarsaLoop" methods iterate the reinforcement learning "N" times.  "GetAction" and a few other methods implement the action strategies, such as the ε-greedy strategy and Boltzmann exploration.
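
For readers unfamiliar with these algorithms, here is a minimal Python sketch of the standard textbook forms of the two update rules and the two action strategies.  It is not a translation of the app's C# methods: the function names, the dict-of-dicts Q-table layout, and the default values of "alpha", "gamma", "epsilon", and "temperature" are assumptions made for illustration.

```python
import math
import random

# Assumed layout: q[state][action] -> Q-value; a terminal state maps to {}.

def q_learning_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = max(q[s_next].values()) if q[s_next] else 0.0
    q[s][a] += alpha * (reward + gamma * best_next - q[s][a])

def sarsa_update(q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually chosen next."""
    next_q = q[s_next].get(a_next, 0.0)  # 0.0 when s_next is terminal
    q[s][a] += alpha * (reward + gamma * next_q - q[s][a])

def epsilon_greedy(q_row, epsilon=0.1, rng=random):
    """With probability epsilon pick a random action, else the greedy one."""
    actions = list(q_row)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=q_row.get)

def boltzmann(q_row, temperature=1.0, rng=random):
    """Sample an action with probability proportional to exp(Q / T)."""
    actions = list(q_row)
    top = max(q_row.values())  # subtract the max for numerical stability
    weights = [math.exp((q_row[a] - top) / temperature) for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

The only difference between the two updates is the bootstrap term: Q-Learning uses the maximum Q-value of the next state, while Sarsa uses the Q-value of the action that the strategy actually selected there.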




The screenshot after learning.  The number at the upper left of each Grid cell is the number of times the reinforcement learning algorithm visited that cell.  The intensity of each cell's red color changes according to that number.
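
One simple way to turn visit counts into a red intensity is a linear mapping, sketched below in Python.  The post does not say how the app computes the color, so the function name "visit_color", the white-to-red scale, and the normalization by the maximum visit count are all assumptions.

```python
def visit_color(visits, max_visits):
    """Map a visit count to an (r, g, b) tuple: white for 0, pure red at max.

    Assumed linear scale; the app's actual color formula is not known.
    """
    if max_visits <= 0:
        return (255, 255, 255)
    # Fraction of the maximum visit count, clamped to [0, 1].
    ratio = min(max(visits / max_visits, 0.0), 1.0)
    # Keep red at full strength and fade green/blue as visits increase.
    fade = round(255 * (1.0 - ratio))
    return (255, fade, fade)
```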


A movie showing this app running.  You can get the original movie file from here (in the "iOS Simulator movie" folder).

