OpenAI Gym: when is reset required?

2025/7/13 2:51:35


Question:

Although I can get the examples and my own code to run, I am more curious about the actual semantics and expectations behind the OpenAI Gym API, in particular `Env.reset()`.


When is reset expected/required? At the end of each episode? Or only after creating an environment?


I suspect it makes sense to reset before each episode, but I have not found that stated explicitly anywhere!


Answer:

You typically call `reset()` after an entire episode has finished. That is either after you reach a terminal state of the MDP, or after you reach the maximum number of time steps (which you choose yourself). I also call it once at the very start of training, before the first step, because the environment has no current state until it has been reset.
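A minimal sketch of this rule, assuming the newer Gymnasium-style API in which `step()` returns `(obs, reward, terminated, truncated, info)`. `RequireReset` is a hypothetical wrapper, not part of Gym: it refuses to step an environment that was just created or whose episode has ended. Gymnasium ships a related wrapper, `OrderEnforcing`, that raises an error when `step()` is called before the first `reset()`; this sketch is stricter and also blocks stepping after an episode has ended.

```python
class RequireReset:
    """Hypothetical wrapper: step() is only allowed after reset(), both for a
    freshly created env and after every episode end (terminal or time limit).
    Assumes the wrapped env uses the Gymnasium-style 5-tuple step() API."""

    def __init__(self, env):
        self.env = env
        self._needs_reset = True  # a freshly created env has no current state

    def reset(self, **kwargs):
        self._needs_reset = False
        return self.env.reset(**kwargs)

    def step(self, action):
        if self._needs_reset:
            raise RuntimeError("call reset() before step()")
        obs, reward, terminated, truncated, info = self.env.step(action)
        # terminated: a terminal MDP state was reached
        # truncated: the episode hit its step limit
        # Either one ends the episode, so the next step() needs a reset().
        if terminated or truncated:
            self._needs_reset = True
        return obs, reward, terminated, truncated, info
```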


So if you start in state 'A' and want to reach state 'Z', you run your time steps from 'A' -> 'B' -> 'C' -> ...; when you reach the terminal state 'Z', you call `reset()` to start a new episode, which takes you back to 'A'.
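The 'A' to 'Z' walk can be sketched as a toy environment. `ChainEnv` is hypothetical (not a real Gym environment): it uses a Gymnasium-style `reset()`/`step()` signature, ignores the action, and adds a user-chosen `max_steps` limit so the time-limit case is covered as well.

```python
import string


class ChainEnv:
    """Hypothetical toy MDP: states 'A'..'Z', each step advances one letter,
    and 'Z' is terminal. Gymnasium-style API (obs, info) / 5-tuple step."""

    STATES = string.ascii_uppercase  # "ABC...Z"

    def __init__(self, max_steps=100):
        self.max_steps = max_steps  # time limit chosen by the user
        self.pos = None             # no current state until reset() is called
        self.t = 0

    def reset(self):
        self.pos, self.t = 0, 0
        return self.STATES[self.pos], {}  # back to the start state 'A'

    def step(self, action=None):
        # The action is ignored: the chain always advances by one state.
        # Calling step() before reset() fails here, since pos is still None.
        self.pos += 1
        self.t += 1
        state = self.STATES[self.pos]
        terminated = state == "Z"             # reached the terminal state
        truncated = self.t >= self.max_steps  # hit the step limit
        reward = 1.0 if terminated else 0.0
        return state, reward, terminated, truncated, {}


env = ChainEnv()
state, _ = env.reset()
path = [state]
while True:
    state, reward, terminated, truncated, _ = env.step()
    path.append(state)
    if terminated or truncated:
        break
print("".join(path))   # ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(env.reset()[0])  # A: reset() starts a new episode from the start state
```

With a shorter limit such as `ChainEnv(max_steps=5)`, the episode is truncated at 'F' instead, and the next episode again needs a `reset()`.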


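```python
import gym  # classic OpenAI Gym API (gym < 0.26)
env = gym.make("CartPole-v1")  # placeholder environment, any env works
iterations = 10  # number of episodes (placeholder value)
def take_action(state):  # placeholder policy: pick a random action
    return env.action_space.sample()

for episode in range(iterations):
    state = env.reset()  # first state of the episode
    for time_step in range(1000):  # max number of steps per episode
        action = take_action(state)
        state, reward, done, _ = env.step(action)
        if done:
            break  # go to the next episode, where the environment is reset
```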
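The loop above uses the classic Gym API. In gym 0.26 and later, and in its successor Gymnasium, `reset()` returns `(obs, info)` and `step()` returns `(obs, reward, terminated, truncated, info)`, which splits the old `done` flag into the two cases described in the answer: a terminal MDP state versus a time limit. A sketch of the same loop under that API; `run_episodes` and `policy` are illustrative names, not library functions.

```python
def run_episodes(env, policy, n_episodes, max_steps=1000):
    """Run n_episodes on an env with the Gymnasium-style API and return the
    total reward of each episode. `policy` maps an observation to an action."""
    returns = []
    for _ in range(n_episodes):
        obs, info = env.reset()  # every episode starts with a reset
        total = 0.0
        for _ in range(max_steps):  # our own cap on episode length
            action = policy(obs)
            obs, reward, terminated, truncated, info = env.step(action)
            total += reward
            # terminated: a terminal MDP state was reached
            # truncated: a time limit (e.g. a TimeLimit wrapper) cut it short
            if terminated or truncated:
                break  # the next iteration resets the environment
        returns.append(total)
    return returns


# Example (requires `pip install gymnasium`):
# import gymnasium as gym
# env = gym.make("CartPole-v1")
# print(run_episodes(env, lambda obs: env.action_space.sample(), n_episodes=3))
```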