RLlib tunes PPOTrainer but not A2CTrainer
I am comparing the two algorithms on the CartPole environment. The imports are:
import ray
from ray import tune
from ray.rllib import agents
ray.init()  # Skip this, or pass ignore_reinit_error=True, if Ray is already initialized
Running this works fine:
experiment = tune.run(
    agents.ppo.PPOTrainer,
    config={
        "env": "CartPole-v1",
        "num_gpus": 1,
        "num_workers": 0,
        "num_envs_per_worker": 50,
        "rollout_fragment_length": 100,
        "train_batch_size": 5000,
        "sgd_minibatch_size": 500,
        "num_sgd_iter": 10,
        "entropy_coeff": 0.01,
        "lr_schedule": [
            [0, 0.0005],
            [10000000, 0.000000000001],
        ],
        "lambda": 0.95,
        "kl_coeff": 0.5,
        "clip_param": 0.1,
        "vf_share_layers": False,
    },
    metric="episode_reward_mean",
    mode="max",
    stop={"training_iteration": 100},
    checkpoint_at_end=True,
)
But when I do the same with the A2C agent:
experiment = tune.run(
    agents.a3c.A2CTrainer,
    config={
        "env": "CartPole-v1",
        "num_gpus": 1,
        "num_workers": 0,
        "num_envs_per_worker": 50,
        "rollout_fragment_length": 100,
        "train_batch_size": 5000,
        "sgd_minibatch_size": 500,
        "num_sgd_iter": 10,
        "entropy_coeff": 0.01,
        "lr_schedule": [
            [0, 0.0005],
            [10000000, 0.000000000001],
        ],
        "lambda": 0.95,
        "kl_coeff": 0.5,
        "clip_param": 0.1,
        "vf_share_layers": False,
    },
    metric="episode_reward_mean",
    mode="max",
    stop={"training_iteration": 100},
    checkpoint_at_end=True,
)
it raises this exception:
---------------------------------------------------------------------------
TuneError Traceback (most recent call last)
<ipython-input-9-6680e67f9343> in <module>()
23 mode="max",
24 stop={"training_iteration": 100},
---> 25 checkpoint_at_end=True,
26 )
/usr/local/lib/python3.6/dist-packages/ray/tune/tune.py in run(run_or_experiment, name, metric, mode, stop, time_budget_s, config, resources_per_trial, num_samples, local_dir, search_alg, scheduler, keep_checkpoints_num, checkpoint_score_attr, checkpoint_freq, checkpoint_at_end, verbose, progress_reporter, loggers, log_to_file, trial_name_creator, trial_dirname_creator, sync_config, export_formats, max_failures, fail_fast, restore, server_port, resume, queue_trials, reuse_actors, trial_executor, raise_on_failed_trial, callbacks, ray_auto_init, run_errored_only, global_checkpoint_period, with_server, upload_dir, sync_to_cloud, sync_to_driver, sync_on_checkpoint)
432 if incomplete_trials:
433 if raise_on_failed_trial:
--> 434 raise TuneError("Trials did not complete", incomplete_trials)
435 else:
436 logger.error("Trials did not complete: %s", incomplete_trials)
TuneError: ('Trials did not complete', [A2C_CartPole-v1_6acda_00000])
Can anyone tell me what is going on here? I don't know whether it is related to the library versions I am using or whether I have made a coding mistake. Is this a common problem?
The A2C run fails because of the config you copied over from the PPO trial: "sgd_minibatch_size", "kl_coeff", and several others are PPO-specific settings, and they are rejected when running A2C.
The "error.txt" file in the trial's log directory explains the actual error.
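A minimal sketch of the same A2C trial with the PPO-only keys stripped out (an assumption based on the RLlib version shown in the traceback; the remaining keys all exist in the A3C/A2C default config):

experiment = tune.run(
    agents.a3c.A2CTrainer,
    config={
        "env": "CartPole-v1",
        "num_gpus": 1,
        "num_workers": 0,
        "num_envs_per_worker": 50,
        "rollout_fragment_length": 100,
        "train_batch_size": 5000,
        "entropy_coeff": 0.01,
        "lr_schedule": [
            [0, 0.0005],
            [10000000, 0.000000000001],
        ],
        "lambda": 0.95,
        # PPO-only keys removed: sgd_minibatch_size, num_sgd_iter,
        # kl_coeff, clip_param, vf_share_layers
    },
    metric="episode_reward_mean",
    mode="max",
    stop={"training_iteration": 100},
    checkpoint_at_end=True,
)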