I'm trying to implement a reinforcement learning agent for the game Lights Out, which is specified here:
Programming Problems and Competitions :: HackerRank

The code so far: GitHub - BugsBuggy/RL-LightsOut (run the code in agent.py)

The problem is that the agent does not converge towards a policy; it does not seem to learn anything. Does anyone have an idea why?

The algorithm I tried to use, which estimates values with a neural network as a non-linear function approximator, is taken from Sutton & Barto, page 198:
http://incompleteideas.net/book/bookdraft2017nov5.pdf


I'm sure you're learning something too if you look into the project.

Thanks. PS: I'm new to reinforcement learning.

What I have tried:

I implemented a neural network as a function approximator that estimates the value of every action that could be taken.
A few more details:
Actions are selected epsilon-greedily: with probability epsilon a random action is chosen, and with probability 1-epsilon the action with the highest estimated value (the argmax over all possible actions) is chosen.
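For reference, the action selection described above can be sketched roughly as follows. This is not the code from the repository; the function name epsilon_greedy, the q_values container, and the valid_actions list are names made up for this illustration.

```python
import random

def epsilon_greedy(q_values, valid_actions, epsilon, rng=random):
    """Hypothetical helper: choose an action among the valid ones.

    q_values      -- indexable by action (list or dict) with the estimated values
    valid_actions -- non-empty list of the actions that are currently possible
    epsilon       -- exploration probability in [0, 1]
    """
    # Explore: with probability epsilon pick a uniformly random valid action.
    if rng.random() < epsilon:
        return rng.choice(valid_actions)
    # Exploit: otherwise take the argmax restricted to the valid actions.
    return max(valid_actions, key=lambda a: q_values[a])
```

Restricting the argmax to valid_actions matches the statement that the agent only looks at possible moves.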
The reward structure is defined as follows:
The agent gets a reward of -1 for any valid move that does not lead to a terminal state, and a reward of 100 for a move that leads to a terminal state. An invalid move would get a reward of -300, but this cannot currently happen because the agent only considers possible moves. The penalty could be used later to let the agent learn by itself which moves are possible.
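Written out as code, the reward scheme looks roughly like this. It is only a sketch of the rules stated above; the constant names and the reward function signature are assumptions for illustration and are not taken from agent.py.

```python
# Reward values as described in the post; the names are made up for this sketch.
REWARD_STEP = -1        # valid move that does not end the game
REWARD_TERMINAL = 100   # valid move that leads to a terminal state
REWARD_INVALID = -300   # invalid move (currently unreachable, see above)

def reward(is_valid, leads_to_terminal):
    """Return the reward for one move, given two boolean flags."""
    if not is_valid:
        return REWARD_INVALID
    return REWARD_TERMINAL if leads_to_terminal else REWARD_STEP
```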
The agent tries to learn by playing against itself, simulating the opponent with random moves and treating the result as feedback from the environment. To evaluate performance, after every 500 of these training games the agent plays 100 games against an opponent that makes random moves.
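The training and evaluation schedule is roughly the following. The two callables are placeholders for this sketch (play_training_game runs one training game, play_eval_game runs one evaluation game and returns True if the agent wins); neither name comes from the repository.

```python
def run_schedule(play_training_game, play_eval_game,
                 total_games, train_block=500, eval_games=100):
    """Alternate blocks of training games with evaluation rounds.

    Returns a list with one win rate per evaluation round.
    """
    win_rates = []
    for game in range(1, total_games + 1):
        play_training_game()
        # After every train_block training games, evaluate against random moves.
        if game % train_block == 0:
            wins = sum(1 for _ in range(eval_games) if play_eval_game())
            win_rates.append(wins / eval_games)
    return win_rates
```

If the win rates from the evaluation rounds stay flat over many blocks, that is the "not learning" symptom described above.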
Posted
Updated 19-Apr-18 7:01am
v3

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


