How to go from an episodic task to a continuing one


I have implemented a Q-learning algorithm for an episodic, undiscounted task (i.e., discount factor = 1). The task is to escape from a predator. The way I have implemented it now is to set a maximum number of timesteps, after which the agent is considered to have "escaped" and receives a positive reward. If the agent is caught before the end of the episode, it receives a negative reward. At every other time step, the reward is zero.
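
For reference, here is a minimal sketch of my current episodic setup (the `env` object, state/action counts, step sizes, and reward values are simplified stand-ins for my actual predator-prey simulation):

```python
import numpy as np

# Simplified stand-ins for my actual predator-prey environment.
N_STATES, N_ACTIONS = 100, 4
MAX_STEPS = 200           # episode cap: surviving this long counts as "escaped"
ALPHA, GAMMA = 0.1, 1.0   # undiscounted, as described above

Q = np.zeros((N_STATES, N_ACTIONS))

def run_episode(env, epsilon):
    s = env.reset()
    for t in range(MAX_STEPS):
        # epsilon-greedy action selection
        if np.random.rand() < epsilon:
            a = np.random.randint(N_ACTIONS)
        else:
            a = int(np.argmax(Q[s]))
        s_next, caught = env.step(a)
        if caught:
            r, done = -1.0, True      # captured before the cap: negative reward
        elif t == MAX_STEPS - 1:
            r, done = +1.0, True      # reached the cap: "escaped", positive reward
        else:
            r, done = 0.0, False      # zero reward on all intermediate steps
        target = r if done else r + GAMMA * np.max(Q[s_next])
        Q[s, a] += ALPHA * (target - Q[s, a])
        if done:
            break
        s = s_next
```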

I would like to change from this episodic formulation to a continuing formulation of the same problem, where there is no maximum number of timesteps. However, the "episode" can still end at some point if the agent is captured by the predator.

In Sutton & Barto's book I found some relevant material, but I am a little confused: in the first few chapters the book only says that you have to use a discounted formulation (i.e., set a discount factor less than one) to deal with a continuing interaction, but it doesn't go into detail. Section 10.3, on the other hand, discusses the "average reward" approach for undiscounted continuing tasks.
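
If it helps, this is my rough understanding of the tabular average-reward ("differential") Q-learning update from that section, reusing `Q` and `ALPHA` from the sketch above; the step size and the exact rule for updating the reward-rate estimate are my assumptions, since variants like R-learning differ on that detail:

```python
# Sketch of a tabular average-reward ("differential") Q-learning update.
# avg_reward is a learned estimate of the long-run reward rate
# (rho in Sutton & Barto's notation).
BETA = 0.01        # step size for the reward-rate estimate (my guess at a value)
avg_reward = 0.0

def differential_update(s, a, r, s_next):
    global avg_reward
    # The TD error uses r - avg_reward instead of discounting future values.
    delta = r - avg_reward + np.max(Q[s_next]) - Q[s, a]
    Q[s, a] += ALPHA * delta
    # Some variants (e.g., R-learning) only update the rate estimate after
    # greedy actions; here I update it on every step for simplicity.
    avg_reward += BETA * delta
```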

Do you have any suggestions on how to do this correctly? Do you know of any articles or books that discuss in detail how to handle reinforcement learning for a discounted or undiscounted continuing task?

Also: I am familiar with using a linearly decreasing exploration rate (epsilon) in an episodic task, where the decay is typically scheduled over episodes. In a continuing task there are no episodes to count, so how should the epsilon decay be scheduled?
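
The only idea I have come up with so far is to anneal epsilon per environment step instead of per episode, down to a floor; the horizon and floor values here are arbitrary placeholders:

```python
EPS_START, EPS_MIN = 1.0, 0.05
DECAY_STEPS = 100_000     # placeholder: total steps over which to anneal

def epsilon_at(step):
    # Linear anneal from EPS_START to EPS_MIN over DECAY_STEPS, then hold.
    frac = min(step / DECAY_STEPS, 1.0)
    return EPS_START + frac * (EPS_MIN - EPS_START)
```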
