In dqn/agent.py, line 59:

if terminal:
  screen, reward, action, terminal = self.env.new_random_game()

When a new game is started because the previous one reached a terminal state, why don't we need to reset self.history? Not resetting it affects the next iteration:
# 1. predict
action = self.predict(self.history.get())
# 2. act
screen, reward, terminal = self.env.act(action, is_training=True)
# 3. observe
self.observe(screen, reward, action, terminal)
The action predicted from self.history.get() does not depend on the current game's screens; instead, it is predicted from the screens of the previous game, which has already ended.
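For reference, one way to avoid stale frames would be to refill the history with the first screen of the new game. This is only a minimal sketch with hypothetical names (a stand-in History class, not the repository's dqn/history.py):

```python
import numpy as np

class History:
    """Minimal stand-in for a frame-history buffer that keeps the
    last `length` screens stacked for the network input."""

    def __init__(self, length=4, screen_shape=(84, 84)):
        self.frames = np.zeros((length,) + screen_shape, dtype=np.float32)

    def add(self, screen):
        # Shift the oldest frame out and append the newest screen.
        self.frames[:-1] = self.frames[1:]
        self.frames[-1] = screen

    def reset(self, screen):
        # Fill every slot with the first screen of the new episode,
        # so the next prediction no longer sees the finished game.
        self.frames[...] = screen

    def get(self):
        return self.frames
```

With such a reset, the first predict after new_random_game() would see four copies of the new game's opening screen rather than the tail of the previous episode.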
Am I missing anything?
Thank you very much.