Everyone fancies a retro video game, such as the classics from Nintendo and Atari. Both happen to be major trends in the AI world today: these game environments are simple enough to remain computationally tractable, yet they still demand highly intricate strategies.
Using artificial intelligence and machine learning algorithms to play games has been widely discussed and investigated. In our case, we use Reinforcement Learning (RL) to construct a Mario controller agent that can beat Super Mario Bros levels.
This project relies on two main resources. The first is OpenAI Gym, an open-source toolkit for developing and comparing reinforcement learning algorithms across different environments. The second is Philip Paquette’s Gym Super Mario, which serves as an interface between OpenAI Gym and the FCEUX emulator.
The goal of our agent Mario is relatively simple: begin at the left edge of a level, navigate through the map, and reach the flagpole without dying, all while trying to maximize the score.
To succeed, Mario generally must move right; however, simply holding right is not enough. The agent must therefore learn about the environment in order to navigate the map. To reach an optimal level of gameplay, the agent learns how to avoid or kill enemies, jump over pipes and gaps, and adjust its speed to improve the score.
To formulate Super Mario Bros as a Reinforcement Learning problem, we must specify three components: the state space, the action space, and the reward function.
State Representation – Mario’s state is represented as a 13×16 grid of tiles, each taking a value from 0 to 3. 0 represents empty space, 1 represents an object such as a coin, the ground, or a pipe, 2 represents an enemy, and 3 represents Mario.
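To make this concrete, here is a minimal sketch of how such a grid could be built and inspected in Python (NumPy and the helper find_mario are our own illustrative choices, not part of the environment):

    import numpy as np

    # 13x16 tile grid: 0 = empty, 1 = object (coin/ground/pipe), 2 = enemy, 3 = Mario.
    # The values below are illustrative, not taken from a real frame.
    state = np.zeros((13, 16), dtype=np.int8)
    state[12, :] = 1    # ground along the bottom row
    state[11, 3] = 3    # Mario standing on the ground
    state[11, 7] = 2    # an enemy a few tiles ahead

    def find_mario(state):
        # Return Mario's (row, col) in the grid, or None if he is not on screen.
        rows, cols = np.where(state == 3)
        return (int(rows[0]), int(cols[0])) if rows.size else None

    print(find_mario(state))    # -> (11, 3)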
Action Space – the representation of the action space depends on the FCEUX emulator, i.e. the Nintendo controller. There are 6 buttons that can each be either pressed (1) or not pressed (0) in the game environment, which leads to 2^6 = 64 possible actions. However, only 9 of these make logical sense and have an impact on the game. For example, pressing left and right at the same time is a valid action on the emulator but has no effect within the environment.
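As an illustration, assuming the 6-button layout [up, left, down, right, A, B] used by Gym Super Mario, one plausible set of the nine meaningful combinations might look like this (the exact set chosen here is our assumption):

    # Button order assumed to be [up, left, down, right, A, B], as in Gym Super Mario.
    # The particular nine combinations below are an illustrative assumption.
    ACTIONS = {
        'noop':               [0, 0, 0, 0, 0, 0],
        'left':               [0, 1, 0, 0, 0, 0],
        'right':              [0, 0, 0, 1, 0, 0],
        'jump':               [0, 0, 0, 0, 1, 0],
        'left + jump':        [0, 1, 0, 0, 1, 0],
        'right + jump':       [0, 0, 0, 1, 1, 0],
        'left + run':         [0, 1, 0, 0, 0, 1],
        'right + run':        [0, 0, 0, 1, 0, 1],
        'right + jump + run': [0, 0, 0, 1, 1, 1],
    }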
Reward Function – the reward is the amount of feedback the agent receives for its previous action, and it is what shapes Mario’s behavior. The Gym Super Mario environment by Philip Paquette provides a default reward function that changes with respect to Mario’s distance along the level: Mario gets a +1 reward if he moves right, -1 if he moves left, and 0 if he stays in place. That isn’t all, however: Mario also receives rewards for killing enemies, jumping over pipes, and avoiding obstacles.
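A minimal sketch of the distance-based part of such a reward, with the bonus terms for kills and jumps omitted and the helper name our own:

    def distance_reward(prev_distance, curr_distance):
        # +1 for net rightward movement, -1 for leftward, 0 for standing still.
        # Distances are the horizontal positions reported on consecutive steps.
        if curr_distance > prev_distance:
            return 1
        if curr_distance < prev_distance:
            return -1
        return 0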
The whole idea is an implementation of the classic “agent-environment loop”: on each iteration, the agent chooses an action, and the environment returns an observation and a reward that depend on that action.
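In code, the loop might look like the following sketch (the environment id and module name are taken from Philip Paquette’s repo, and the step signature assumes the classic Gym API of that era):

    import gym
    import ppaquette_gym_super_mario  # registers the ppaquette/* environments

    env = gym.make('ppaquette/SuperMarioBros-1-1-v0')  # requires FCEUX to be installed

    observation = env.reset()   # initial observation: the 13x16 tile grid
    done = False
    total_reward = 0

    while not done:
        action = env.action_space.sample()   # a random agent, standing in for a learned policy
        observation, reward, done, info = env.step(action)
        total_reward += reward

    print('Episode finished, total reward:', total_reward)
    env.close()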
OpenAI Gym – https://gym.openai.com/
Philip Paquette repo – https://github.com/ppaquette/gym-super-mario
Reinforcement Learning – http://cs229.stanford.edu/notes/cs229-notes12.pdf
FCEUX Emulator – http://www.fceux.com/web/home.html