I'm trying to implement a reinforcement learning agent for the game Lights Out, which is specified here:
Programming Problems and Competitions :: HackerRank
The code so far:
GitHub - BugsBuggy/RL-LightsOut: Run the code in agent.py
The problem is that the agent is not converging towards a policy; it does not seem to learn anything. Does anyone have an idea why?
The algorithm I tried to use for estimating the value of a state, with a neural network as a non-linear function approximator, is taken from Sutton & Barto, page 198:
http://incompleteideas.net/book/bookdraft2017nov5.pdf
I'm sure you're learning something too if you look into the project.
Thanks and PS: I'm new to Reinforcement Learning
What I have tried:
I tried to implement a neural network as a function approximator that estimates the value of every action that could possibly be taken.
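For reference, here is a minimal sketch of what one update step for such an action-value estimator could look like. This is an assumption on my side, not the code from the repository: it uses a one-step semi-gradient Sarsa update, and it swaps the neural network for a linear approximator q(s, a) = W[a] · s so the gradient is easy to check by hand. The names `semi_gradient_sarsa_step`, `W`, `alpha` and `gamma` are hypothetical.

```python
import numpy as np

def semi_gradient_sarsa_step(W, state, action, reward, next_state, next_action,
                             done, alpha=0.01, gamma=1.0):
    """One semi-gradient Sarsa update for a linear q(s, a) = W[a] @ s.

    W is a float array of shape (n_actions, n_features) and is updated in place.
    A linear model stands in for the neural network here (assumption).
    """
    q_sa = W[action] @ state
    if done:
        # No bootstrapping from a terminal state.
        target = reward
    else:
        target = reward + gamma * (W[next_action] @ next_state)
    td_error = target - q_sa
    # For a linear model the gradient of q(s, a) w.r.t. W[action] is the state vector.
    W[action] += alpha * td_error * state
    return td_error
```

With a neural network the only change in principle is that the gradient comes from backpropagation, while the target is still treated as a constant.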
A few further details:
The action is chosen epsilon-greedily: with probability epsilon a random action is chosen, and with probability 1-epsilon the action with the highest estimated value among all possible actions is chosen.
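A minimal sketch of that selection rule, restricted to the moves that are possible in the current state. The function name `epsilon_greedy` and the way the estimates are passed in (`q_values` indexed by action) are assumptions for illustration, not the interface used in agent.py.

```python
import random

def epsilon_greedy(q_values, valid_actions, epsilon, rng=random):
    """Return a random valid action with probability epsilon, else the greedy one.

    q_values: sequence or mapping from action to estimated value (hypothetical format).
    valid_actions: non-empty list of actions possible in the current state.
    """
    if not valid_actions:
        raise ValueError("no valid actions available")
    if rng.random() < epsilon:
        # Exploration: uniform choice among the valid actions.
        return rng.choice(valid_actions)
    # Exploitation: argmax over the valid actions only.
    return max(valid_actions, key=lambda a: q_values[a])
```

For example, `epsilon_greedy([0.1, 0.9, 0.5], [0, 2], epsilon=0.0)` picks action 2, because action 1 has the highest estimate but is not among the valid moves.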
The reward structure is defined as follows:
The agent gets a reward of -1 if it makes a valid move that does not lead to a terminal state. If the agent makes a move that leads to a terminal state, it gets a reward of 100. If it makes an invalid move, it gets a reward of -300. This cannot happen at the moment, because the agent only considers possible moves, but it could be enabled later to let the agent learn by itself which moves are possible.
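Written out as code, the reward scheme looks roughly like this. The constant names and the `reward` function with its two boolean flags are hypothetical; only the three values come from the description above.

```python
STEP_REWARD = -1        # valid move that does not end the game
TERMINAL_REWARD = 100   # valid move that leads to a terminal state
INVALID_REWARD = -300   # invalid move (currently unreachable, see above)

def reward(is_valid, is_terminal):
    """Map the outcome of a single move to its reward."""
    if not is_valid:
        return INVALID_REWARD
    return TERMINAL_REWARD if is_terminal else STEP_REWARD
```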
The agent tries to learn through self-play: it simulates the opponent with random moves and treats the result as feedback from the environment. To evaluate the performance, the agent plays 100 games against an opponent with random moves after every 500 games it plays against itself.
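The train/evaluate schedule can be sketched as below. The two callables `play_training_game` and `play_eval_game` are placeholders for whatever agent.py actually does per game (assumption); `play_eval_game` is assumed to return True when the agent wins.

```python
def train_with_periodic_eval(play_training_game, play_eval_game, total_games,
                             eval_every=500, eval_games=100):
    """Run training games and record the win rate after every `eval_every` games.

    Returns a list of (games_played, win_rate) tuples.
    """
    win_rates = []
    for game in range(1, total_games + 1):
        play_training_game()
        if game % eval_every == 0:
            # Evaluation games against the random opponent; no learning here.
            wins = sum(1 for _ in range(eval_games) if play_eval_game())
            win_rates.append((game, wins / eval_games))
    return win_rates
```

If the recorded win rate stays flat over many evaluation rounds, that matches the "not learning anything" symptom described above.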