As far as I know, for a specific policy \pi, temporal difference learning lets us compute the expected return when following that policy \pi, but what is the point of knowing the value of a specific policy?
Shouldn't we be trying to find the optimal policy for a given environment? What is the point of evaluating a specific \pi with temporal difference learning at all?
As you said, finding only the value function for a given policy is not very useful in the general case, where the goal is to find an optimal policy. However, several classical algorithms, such as SARSA or Q-learning, can be viewed as special cases of generalized policy iteration, where the most difficult part is finding the value function of a policy. Once you know the value function, it is easy to find a better policy, then find the value function of the newly computed policy again, and so on. Under some conditions, this process converges to the optimal policy.

In summary, temporal difference learning is a key step in other algorithms that allow us to find an optimal policy.
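To make the generalized policy iteration idea concrete, here is a minimal SARSA sketch in Python. It uses a small, purely hypothetical chain environment (states 0..4, reward 1 at the goal) that I made up for illustration; the point is only to show how the TD evaluation step (updating Q toward the bootstrapped target) is interleaved with policy improvement (acting epsilon-greedily with respect to the current Q), so the policy keeps improving toward the optimal one.

```python
import random

# Hypothetical 1-D chain environment for illustration:
# states 0..4, actions: 0 = left, 1 = right; reaching state 4 gives reward +1.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(state, action):
    """Move left or right; the episode ends with reward 1 at the goal state."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def epsilon_greedy(Q, state, epsilon):
    """Policy improvement step: act (mostly) greedily w.r.t. the current estimates."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

def sarsa(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # action-value estimates
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(Q, state, epsilon)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            next_action = epsilon_greedy(Q, next_state, epsilon)
            # TD policy-evaluation step: move Q(s, a) toward the bootstrapped target.
            target = reward + (0.0 if done else gamma * Q[next_state][next_action])
            Q[state][action] += alpha * (target - Q[state][action])
            state, action = next_state, next_action
    return Q

if __name__ == "__main__":
    Q = sarsa()
    greedy = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]
    print("Greedy action per state:", greedy)  # should prefer action 1 (right)
```

After enough episodes, the greedy policy extracted from Q moves right in every state, i.e. the evaluation/improvement loop has converged to the optimal policy for this toy environment.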