Topic: Python data binning for OpenAI Gym
Problem background:
I am attempting to create a custom environment for reinforcement learning with OpenAI Gym. I need to represent all possible values that the environment will see in a variable called `observation_space`. There are 3 possible actions for the agent to use, called `action_space`.
To be more specific, the `observation_space` is a temperature sensor which will see possible values ranging from 50 to 150 degrees, and I think I can represent all of this by:
EDIT: I had the `action_space` NumPy array wrong.
import numpy as np
action_space = np.array([0, 1, 2])         # 3 discrete actions
observation_space = np.arange(50, 150, 1)  # one state per degree, 50 through 149
Is there a better method that I could use for the `observation_space` where I could bin the data? I.e., make 20 bins: 50-55, 55-60, 60-65, etc.
I think what I have will work, but it seems sort of cumbersome... And I am sure there is a better practice, as there is not a lot of wisdom on my end on this subject. This will print out a Q table:
action_size = action_space.shape[0]           # 3 actions
state_size = observation_space.shape[0]       # 100 states
qtable = np.zeros((state_size, action_size))  # 100 x 3 table of zeros
print(qtable)
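For example, I could imagine binning the readings with `np.digitize`, something like this (just a rough sketch; the helper name is mine):

import numpy as np

bin_edges = np.arange(55, 150, 5)  # 19 interior edges -> 20 bins: 50-55, 55-60, ..., 145-150

def reading_to_state(temperature):
    # np.digitize returns the index of the bin the reading falls into
    return int(np.digitize(temperature, bin_edges))

qtable = np.zeros((len(bin_edges) + 1, 3))  # 20 states x 3 actions
print(reading_to_state(52))    # 0, the 50-55 bin
print(reading_to_state(149))   # 19, the 145-150 bin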
Solution:
This is not really related to programming, so you may get better answers on stats.stackexchange. Anyway, it just depends on how much accuracy you want. I guess you want to change the temperature (increase, decrease, don't change) according to the sensor readings. Is there much difference (in terms of the optimal action) between 50 and 51 degrees? If not, then you can discretize the state space every 2 degrees, and so on.
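For instance, a 2-degree discretization gives 50 states instead of 100, and each sensor reading maps to a row of the Q-table (a minimal sketch; the names are illustrative):

import numpy as np

edges = np.arange(52, 150, 2)            # 2-degree bins over 50-150 -> 50 states
qtable = np.zeros((len(edges) + 1, 3))   # 50 states x 3 actions

def state_of(reading):
    return int(np.digitize(reading, edges))

# greedy action for a reading of 63.4 degrees
action = int(np.argmax(qtable[state_of(63.4)]))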
More generally, by doing so you are using what in RL are called "features". A discretization over intervals of the state space is called tile coding and usually works well.
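As a rough illustration, tile coding uses several overlapping tilings, each shifted by a small offset, and represents a reading by the set of tiles it activates (a minimal sketch, with made-up tiling sizes):

import numpy as np

n_tilings = 4
tile_width = 5.0
offsets = np.linspace(0, tile_width, n_tilings, endpoint=False)  # shift each tiling
tiles_per_tiling = int(np.ceil(100 / tile_width)) + 1            # enough tiles to cover 50-150

def active_tiles(temperature):
    # one active tile per tiling; indices are offset so tilings don't collide
    x = temperature - 50.0
    return [int((x + off) // tile_width) + t * tiles_per_tiling
            for t, off in enumerate(offsets)]

# linear Q-values: one weight per (tile, action) pair
weights = np.zeros((n_tilings * tiles_per_tiling, 3))

def q_values(temperature):
    return weights[active_tiles(temperature)].sum(axis=0)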
If you are new to RL, I really advise reading this book, or at least Chapters 1, 3, and 4, which are related to what you are doing.