
Neural Network and Q-Learning Based Solution for Pacman

Pacman, a popular arcade game, poses a challenge that is easy to understand yet takes practice and strategy to master. I did this project with three classmates as part of an Artificial Intelligence course. Here we present the design and implementation of an automated Pacman agent built on a unique learning system, which allows the agent to maneuver through the maze, gather as many reward points as possible, and reach higher levels of performance. We achieve this by developing a neural network and training it with a reinforcement learning technique called Q-learning. The benefit of this approach is that it validates, for a gaming agent such as Pacman, a reinforcement learning methodology that has traditionally been used in robotics.

 

Q-learning is a reinforcement learning technique that learns an action-value function giving the expected utility of taking a given action in a given state and following a fixed policy thereafter. The basic implementation uses a Q-table to store these values.
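
For illustration, here is a minimal sketch of the tabular Q-learning update in Python; the names (ALPHA, GAMMA, q_table) and the epsilon-greedy policy are illustrative conventions, not taken from our project code:

```python
import random
from collections import defaultdict

ACTIONS = ["left", "right", "up", "down"]
ALPHA = 0.1   # learning rate (assumed value for illustration)
GAMMA = 0.9   # discount factor (assumed value for illustration)

q_table = defaultdict(float)  # maps (state, action) -> expected utility

def update(state, action, reward, next_state):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (target - q_table[(state, action)])

def choose_action(state, epsilon=0.1):
    """Epsilon-greedy: mostly exploit the current estimates, sometimes explore."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])
```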

 

Since there are four directions, one neural network was created per direction, all structured identically with eight inputs, three layers, and one output. The key to building the learning agent is to avoid using the entire Pacman map as the stimulus. Doing so would let small changes in the map produce drastically different results: Pacman can be in the same fundamental state, yet because of minor differences elsewhere in the map it takes different actions, unable to recognize that it has experienced the state before. The figure below shows Pacman stuck in a corner and surrounded by two ghosts; the agent would still treat the states as different simply because Pacman is in a different corner. This makes learning difficult, because small changes produce completely different states, which in turn produce different actions.

 
The eight input stimuli to the neural networks (a feature-extraction sketch follows the list) are:

  1. ghost to the immediate left

  2. ghost to the immediate right

  3. ghost immediately above

  4. ghost immediately below

  5. pellet to the immediate left

  6. pellet to the immediate right

  7. pellet immediately above

  8. pellet immediately below

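As a hypothetical sketch, these eight binary stimuli can be computed with simple membership tests on the maze state; the function name extract_inputs, the set-based ghost/pellet positions, and the coordinate convention (y increasing downward) are assumptions for illustration:

```python
def extract_inputs(pos, ghosts, pellets):
    """Return the 8-element input vector fed to the direction networks."""
    x, y = pos
    # Neighboring cells in the order: left, right, above, below
    neighbors = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    ghost_bits = [1.0 if n in ghosts else 0.0 for n in neighbors]
    pellet_bits = [1.0 if n in pellets else 0.0 for n in neighbors]
    return ghost_bits + pellet_bits

# Example: ghost to the immediate left, pellet immediately below.
print(extract_inputs((5, 5), ghosts={(4, 5)}, pellets={(5, 6)}))
# -> [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
```
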
The idea behind using neural networks within Q-learning is that they take the place of impractically large lookup tables: each network acts as a function approximator for the maximum-utility value Q.
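
Below is a minimal sketch of this idea, assuming NumPy, one tiny network per direction, and a simplified single-hidden-layer architecture trained by a squared-error gradient step toward the Bellman target; the layer sizes, learning rate, and class names are illustrative rather than the project's actual implementation:

```python
import numpy as np

class DirectionNet:
    """Tiny MLP: 8 inputs -> hidden layer -> 1 output (the Q estimate)."""
    def __init__(self, n_in=8, n_hidden=3, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.5, (n_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    def forward(self, x):
        """Forward pass; caches activations for the gradient step."""
        self.x = x
        self.h = np.tanh(x @ self.w1 + self.b1)
        return (self.h @ self.w2 + self.b2)[0]

    def train(self, x, target):
        """One gradient step pulling Q(x) toward the Q-learning target."""
        q = self.forward(x)
        err = q - target                              # d(squared error)/d(output)
        grad_h = (err * self.w2[:, 0]) * (1 - self.h ** 2)
        self.w2 -= self.lr * np.outer(self.h, err)
        self.b2 -= self.lr * err
        self.w1 -= self.lr * np.outer(x, grad_h)
        self.b1 -= self.lr * grad_h

# One network per direction, as described above.
nets = {d: DirectionNet(seed=i)
        for i, d in enumerate(["left", "right", "up", "down"])}

def q_target(reward, next_x, gamma=0.9):
    """Bellman target: r + gamma * max over the four direction networks."""
    return reward + gamma * max(n.forward(next_x) for n in nets.values())
```

During play, the agent would evaluate all four networks on the current stimuli and move in the direction whose network predicts the highest Q value.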

To check out the project paper, click here.
