Lights out reinforcement learning agent

Question

I'm trying to implement a reinforcement learning agent for the game Lights Out, which is specified here:
Programming Problems and Competitions :: HackerRank

The code so far: GitHub - BugsBuggy/RL-LightsOut: Run the code in agent.py

The problem is that the agent does not converge towards a policy; it does not seem to learn anything. Does anyone have an idea why?

The algorithm I tried to use, which estimates the value of a state with a neural network as a non-linear function approximator, is taken from:
Sutton & Barto, page 198
http://incompleteideas.net/book/bookdraft2017nov5.pdf
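
For readers who do not have the book open, here is a minimal sketch of a one-step semi-gradient update for action values. It assumes the referenced page describes an update of this Sarsa-style kind, which this post does not confirm. It uses a linear approximator instead of a neural network to stay short. The names `q_value` and `semi_gradient_update`, and the default `alpha` and `gamma`, are hypothetical and are not taken from the linked repository.

```python
from typing import List, Optional


def q_value(weights: List[float], features: List[float]) -> float:
    """Linear action-value estimate: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def semi_gradient_update(
    weights: List[float],
    features: List[float],
    reward: float,
    next_features: Optional[List[float]],
    alpha: float = 0.1,   # step size (assumed value)
    gamma: float = 1.0,   # discount factor (assumed value)
) -> List[float]:
    """Return new weights after one semi-gradient step.

    `features` encodes the (state, action) pair just taken.
    `next_features` encodes the next (state, action) pair, or None if the
    move ended the episode, in which case the target is the reward alone.
    """
    target = reward
    if next_features is not None:
        # Bootstrapped target; treated as a constant (hence "semi"-gradient).
        target += gamma * q_value(weights, next_features)
    td_error = target - q_value(weights, features)
    # For a linear approximator the gradient w.r.t. the weights is `features`.
    return [w + alpha * td_error * x for w, x in zip(weights, features)]
```

One thing worth checking against a sketch like this is whether the target for a terminal transition omits the bootstrapped term, since including it there is a common source of non-convergence.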

I'm sure you're learning something too if you look into the project.

Thanks. PS: I'm new to reinforcement learning.

What I have tried:

I tried to implement a neural network as a function approximator that estimates the value of every action that could possibly be taken.
A few further details:
The action is chosen randomly with probability epsilon; with probability 1-epsilon, the action with the highest estimated value (the argmax over all possible actions) is chosen.
The reward structure is defined as follows:
The agent gets a reward of -1 if it makes any valid move that does not lead to a terminal state. If the agent makes a move that leads to a terminal state, it gets a reward of 100. An invalid move would get a reward of -300, but this currently cannot happen because the agent only considers possible moves. This penalty could be used later to let the agent learn by itself which moves are possible.
The agent tries to learn by playing against itself: it simulates the opponent with random moves and treats the result as feedback from the environment. To evaluate performance, the agent plays 100 games against a random-move opponent after every 500 games of self-play.
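
The action selection and reward rules described above can be sketched roughly as follows. This is an illustration only, not the code from the repository: the function names `reward_for` and `epsilon_greedy`, the constant names, and the dictionary of action values are all hypothetical. The reward numbers are the ones stated in this post.

```python
import random
from typing import Dict, Hashable

# Reward values as described in the question.
STEP_REWARD = -1        # valid move that does not end the game
TERMINAL_REWARD = 100   # valid move that leads to a terminal state
INVALID_REWARD = -300   # invalid move (currently unreachable)


def reward_for(move_is_valid: bool, reaches_terminal: bool) -> int:
    """Map the outcome of a single move to its reward."""
    if not move_is_valid:
        return INVALID_REWARD
    return TERMINAL_REWARD if reaches_terminal else STEP_REWARD


def epsilon_greedy(
    action_values: Dict[Hashable, float],
    epsilon: float,
    rng=random,
) -> Hashable:
    """Pick an action from the valid ones.

    `action_values` maps each valid action to its estimated value
    (hypothetically, the network output restricted to possible moves).
    With probability epsilon a uniformly random action is returned,
    otherwise the action with the highest estimated value.
    """
    actions = list(action_values)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=action_values.get)
```

A call such as `epsilon_greedy({(0, 0): 0.2, (1, 2): 0.7}, epsilon=0.0)` always returns the greedy action `(1, 2)`. Passing a seeded `random.Random` instance as `rng` makes exploration reproducible while debugging.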

Posted 18-Apr-18 22:34pm

Member 13787458

Updated 19-Apr-18 7:01am

Patrice T

v3

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)