Question: How do I implement a timer for an action inside step() in OpenAI-gym?
Background:
One of the actions I want the agent to perform needs a delay between consecutive uses. For context, in pygame I have the following code for shooting a bullet:
if keys[pygame.K_SPACE]:
    current_time = pygame.time.get_ticks()
    # ready to fire when 600 ms have passed.
    if current_time - previous_time > 600:
        previous_time = current_time
        bullets.append([x + 25, y + 24])
I've set a timer to prevent bullet spamming. How would I construct this to work with the step() method? My other actions are moving up, down, left, and right.
This is my first project with OpenAI-gym, so I'm not sure what the toolkit's capabilities are. Any help would be greatly appreciated.
Solution:
You can use whatever method of tracking time you like (other than pygame.time.get_ticks(), I suppose) and take a similar approach to the one in that pygame code. You'd want to store previous_time as a member of the environment instead of a local variable, because it needs to persist across function calls.
It's not easy to actually prevent your Reinforcement Learning agent (assuming you're using gym for RL) from selecting the fire action altogether, but you can simply implement the step() function so that nothing happens when the agent selects the fire action too soon after its previous shot.
As for measuring time, you could measure wall-clock time, but then the speed of your CPU would influence how often your agent is allowed to shoot: it might be able to fire a new bullet every step on very old hardware, but only one bullet every 100 steps on powerful hardware. That's probably a bad idea. Instead, I'd recommend measuring "time" simply by counting the step() calls. For example, using only the code from your question above, the step() function could look like:
def step(self, action):
    self.step_counter += 1
    # other step() code here
    if action == FIRE:
        # fire only when more than 10 steps have passed since the last shot
        if self.step_counter - self.previous_time > 10:  # or any other number
            self.previous_time = self.step_counter
            bullets.append([x + 25, y + 24])
    # other step() code here
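The threshold of 10 in the step() code above is arbitrary. If you want it to roughly match the 600 ms cooldown from the pygame version, one option is to convert that duration into a number of steps. The sketch below assumes the environment advances at a fixed rate of 60 steps per simulated second; that rate, the helper name cooldown_in_steps, and the constant COOLDOWN_STEPS are illustrative assumptions, not something taken from the question.

```python
import math


def cooldown_in_steps(cooldown_ms, steps_per_second):
    """Convert a cooldown in milliseconds into a whole number of step() calls.

    Rounds up so the cooldown is never shorter than requested.
    """
    return math.ceil(cooldown_ms * steps_per_second / 1000)


# Assumption: one step() call corresponds to 1/60 of a simulated second.
COOLDOWN_STEPS = cooldown_in_steps(600, 60)
print(COOLDOWN_STEPS)  # 36
```

The resulting value could then replace the hard-coded 10, which keeps the cooldown independent of how fast the hardware runs the simulation.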
Don't forget to also reset your newly added member variables in reset():
def reset(self):
    self.step_counter = 0
    self.previous_time = -100  # some negative number such that your agent can fire at start
    # other reset() code here
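Putting the two pieces together, here is a minimal, self-contained sketch of how the counter-based cooldown behaves. It is a plain Python class standing in for a gym.Env subclass so that it runs without gym installed. The class name ShooterEnv, the action index FIRE = 4, the "fired" entry in the info dict, and storing x, y, and bullets on self are all assumptions made for illustration; the return value follows the classic four-value gym convention of (observation, reward, done, info).

```python
FIRE = 4             # hypothetical action index; 0-3 assumed to be the moves
COOLDOWN_STEPS = 10  # same arbitrary threshold as in the answer


class ShooterEnv:
    """Plain-Python stand-in for a gym.Env subclass (illustrative only)."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.step_counter = 0
        # Far enough in the past that the agent can fire on its first step.
        self.previous_time = -COOLDOWN_STEPS - 1
        self.x, self.y = 0, 0
        self.bullets = []
        return (self.x, self.y)

    def step(self, action):
        self.step_counter += 1
        fired = False
        # A FIRE action during the cooldown is silently ignored.
        if action == FIRE and self.step_counter - self.previous_time > COOLDOWN_STEPS:
            self.previous_time = self.step_counter
            self.bullets.append([self.x + 25, self.y + 24])
            fired = True
        observation = (self.x, self.y)
        reward, done = 0.0, False
        return observation, reward, done, {"fired": fired}


env = ShooterEnv()
env.reset()
# Spam FIRE for 30 steps and record the steps on which a bullet was created.
fired_at = [t for t in range(1, 31) if env.step(FIRE)[3]["fired"]]
print(fired_at)  # [1, 12, 23]
```

Even though the agent selects FIRE on every step, a bullet is only created once the cooldown has elapsed, which is the behaviour described above.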