I am trying to train a neural network to navigate a physical robot through a maze. I have no training data, so I have to use reinforcement learning, and I am using a deep Q-network (DQN). However, I am running into problems when trying to generate the training data for the experience replay buffer.

As far as I understand it, at each step of deep Q-learning the Q-network has to predict Q-values for all possible actions. In the simulation I am using for training, the robot can rotate a full 360 degrees and then move forward as far as it wants every time it takes an action. The number of possible actions is too large to reasonably compute Q-values for all of them. Also, because the Q-network has to predict the Q-values of every possible action at every step, I can't just do this calculation once to populate the experience replay and be done.

I have heard that a neural network can sometimes make this problem easier, but that you need an output neuron for every possible action. With the number of possible actions I have, this doesn't seem doable either.

Is there any way to train a deep Q-network without calculating Q-values for every possible action? Thank you in advance, and sorry for the long post.
1 Answer
Well, you do want to calculate the Q-values for every possible action; otherwise you do not know which action yields the highest expected reward. So the real fix is to limit the number of actions. You could do that by not exposing all 360 rotations directly, but instead outputting movements: two actions, "turn left 1 degree" and "turn right 1 degree", can still reach all 360 orientations of the robot, shrinking this part of the action space from 360 to 2. Of course, you could also do this a little more coarsely, rotating by 45 or 90 degrees per action, depending on your task.
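Here is a minimal sketch of that idea, assuming PyTorch; the action names, rotation step, and network sizes are illustrative, not taken from your setup. The key point is that with a small discrete action set, one output neuron per action lets a single forward pass produce Q(s, a) for every action at once.

```python
import torch
import torch.nn as nn

# A small, fixed action set instead of "any rotation + any distance" (values are hypothetical).
ACTIONS = [
    ("turn_left", 45),    # rotate 45 degrees counter-clockwise
    ("turn_right", 45),   # rotate 45 degrees clockwise
    ("forward", 0.1),     # move 0.1 m straight ahead
]

class QNetwork(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int = len(ACTIONS)):
        super().__init__()
        # One output neuron per discrete action: a single forward pass
        # yields Q-values for all actions at once.
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Greedy action selection: argmax over the action dimension.
q_net = QNetwork(obs_dim=8)            # obs_dim is a placeholder
obs = torch.randn(1, 8)                # placeholder observation
action_idx = q_net(obs).argmax(dim=1)  # index into ACTIONS
```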
A DQN is a good choice when the observation space is large, since a tabular Q-function would be infeasible there. Depending on your task, you might also want to take a look at other algorithms, such as PPO, which can work with continuous action spaces directly instead of requiring you to discretize them. Check https://stable-baselines3.readthedocs.io/en/master/ for easy application and training of reinforcement learning models.
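As a rough sketch of how little code that takes: the snippet below trains PPO from stable-baselines3 on a standard Gymnasium environment as a stand-in. Your maze would need to be wrapped as a custom environment implementing the same `gymnasium.Env` interface (`reset`, `step`, `observation_space`, `action_space`); the environment name and timestep count here are just placeholders.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Stand-in environment; replace with your own maze env implementing gymnasium.Env.
env = gym.make("CartPole-v1")

# Train a PPO agent with a default MLP policy.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

# Roll out the trained policy.
obs, _ = env.reset()
for _ in range(100):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
```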