ML-Agents Cooperative Push Block not returning rewards

I'm using the Cooperative Push Block environment (https://github.com/Unity-Technologi...nvironment-Examples.md#cooperative-push-block), exported so that I can use it through the Python API, with the latest stable version. The problem is that I never get a reward, positive or negative: it is always 0. If I export the single-agent Push Block environment instead, I receive the rewards correctly. Below is the code I'm using, taken from the Colab example in https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Python-API.md

decision_steps, terminal_steps = env.get_steps(behavior_name)
if tracked_agent in decision_steps:
    episode_rewards += decision_steps[tracked_agent].reward

print('REWARD', decision_steps.reward) # Always 0
# Each decision_steps[tracked_agent].reward also returns 0

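According to the documentation, I should receive either a small negative penalty (-0.0001) or a positive reward of +1, +2, or +3. Even when the agents happen to push a block in at random, I still get 0 as the reward.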

The documentation says the rewards are given as a "group reward". I don't know whether that means the code above needs to change.

I received this answer in the Unity ml-agents GitHub issues section:

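DecisionStep also has a group_reward field, which is separate from the reward field. The group rewards given to the Cooperative Pushblock agents should be there. Sorry that the Colab doesn't point this out explicitly; I will update it.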

https://github.com/Unity-Technologies/ml-agents/issues/5567
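
Following that answer, here is a minimal sketch of how the tracking code could add group_reward on top of reward. The helper names total_reward and tracked_agent_reward, and the FakeStep stand-in, are made up for illustration and are not part of mlagents_envs. The sketch assumes each per-agent step exposes reward and group_reward as described in the answer, and that the terminal steps carry a group reward in the same way.

```python
from collections import namedtuple


def total_reward(step):
    """Return the individual reward plus the group reward of one agent's step."""
    # getattr falls back to 0.0 if the step object has no group_reward field
    # (assumption: older mlagents_envs releases may not provide it).
    return step.reward + getattr(step, "group_reward", 0.0)


def tracked_agent_reward(decision_steps, terminal_steps, tracked_agent):
    """Sum what tracked_agent earned this step, group reward included."""
    reward = 0.0
    if tracked_agent in decision_steps:
        reward += total_reward(decision_steps[tracked_agent])
    if tracked_agent in terminal_steps:
        reward += total_reward(terminal_steps[tracked_agent])
    return reward


# Hypothetical stand-in for a per-agent step, so the sketch runs without Unity.
FakeStep = namedtuple("FakeStep", ["reward", "group_reward"])

decision_steps = {0: FakeStep(reward=0.0, group_reward=-0.0001)}
terminal_steps = {}
print("REWARD", tracked_agent_reward(decision_steps, terminal_steps, 0))
```

With a real environment, the same call would take the two values returned by env.get_steps(behavior_name), and the result would be added to episode_rewards in place of decision_steps[tracked_agent].reward.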