How to export/import data (rlberry data, tensorboard data, …)?

How to extract data from the WriterData?

rlberry provides tools that display information about the training of an agent. Some of them are presented on the Visualization page of the User Guide.

You may, however, prefer to use your own favorite tool. rlberry lets you export the training data as a pandas DataFrame, so that it can be used with other tools.

As an example, the following code trains PPO from Stable-Baselines3 on the CartPole environment through rlberry:

from rlberry.envs import gym_make
from stable_baselines3 import PPO
from rlberry.agents.stable_baselines import StableBaselinesAgent
from rlberry.manager import ExperimentManager, plot_writer_data, read_writer_data
import matplotlib.pyplot as plt

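# Environment constructor and its keyword arguments
env_ctor, env_kwargs = gym_make, dict(id="CartPole-v1")

# Train 3 independent PPO agents (n_fit=3), each with a fit budget of 5e4 timesteps
manager = ExperimentManager(
    StableBaselinesAgent,
    (env_ctor, env_kwargs),
    agent_name="PPO",
    fit_budget=5e4,
    init_kwargs={"algo_cls": PPO, "policy": "MlpPolicy", "verbose": 0},
    n_fit=3,
)

# Run the training; progress is logged to the console
manager.fit()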
[INFO] 15:58: Running ExperimentManager fit() for PPO with n_fit = 3 and max_workers = None.
[INFO] 15:58:                                 agent_name  worker  time/iterations  max_global_step
                                                 PPO        1           1               2048
[INFO] 15:58:                                 agent_name  worker  time/iterations  max_global_step
                                                 PPO        2           1               2048
[INFO] 15:58:                                 agent_name  worker  time/iterations  max_global_step
                                                 PPO        0           1               2048
[INFO] 15:58: [PPO[worker: 1]] | max_global_step = 4096 | time/iterations = 1 | rollout/ep_rew_mean = 23.569767441860463 | rollout/ep_len_mean = 23.569767441860463 | time/fps = 591 | time/time_elapsed = 3 | time/total_timesteps = 2048 | train/learning_rate = 0.0003 |
[INFO] 15:58: [PPO[worker: 2]] | max_global_step = 4096 | time/iterations = 1 | rollout/ep_rew_mean = 21.903225806451612 | rollout/ep_len_mean = 21.903225806451612 | time/fps = 567 | time/time_elapsed = 3 | time/total_timesteps = 2048 | train/learning_rate = 0.0003 |
[INFO] 15:58: [PPO[worker: 0]] | max_global_step = 4096 | time/iterations = 1 | rollout/ep_rew_mean = 23.49425287356322 | rollout/ep_len_mean = 23.49425287356322 | time/fps = 557 | time/time_elapsed = 3 | time/total_timesteps = 2048 | train/learning_rate = 0.0003 |
[INFO] 15:59: [PPO[worker: 1]] | max_global_step = 6144 | time/iterations = 2 | rollout/ep_rew_mean = 26.94 | rollout/ep_len_mean = 26.94 | time/fps = 437 | time/time_elapsed = 9 | time/total_timesteps = 4096 | train/learning_rate = 0.0003 | train/entropy_loss = -0.6862980721518397 | train/policy_gradient_loss = -0.016145382329705173 | train/value_loss = 57.95402302145958 | train/approx_kl = 0.009136519394814968 | train/clip_fraction = 0.1068359375 | train/loss = 6.268213748931885 | train/explained_variance = 0.00011879205703735352 | train/n_updates = 10 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 2]] | max_global_step = 6144 | time/iterations = 2 | rollout/ep_rew_mean = 26.84 | rollout/ep_len_mean = 26.84 | time/fps = 429 | time/time_elapsed = 9 | time/total_timesteps = 4096 | train/learning_rate = 0.0003 | train/entropy_loss = -0.6861314654350281 | train/policy_gradient_loss = -0.016842093877494337 | train/value_loss = 50.17323541939258 | train/approx_kl = 0.007978597655892372 | train/clip_fraction = 0.1025390625 | train/loss = 5.8147406578063965 | train/explained_variance = 0.0003063678741455078 | train/n_updates = 10 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 0]] | max_global_step = 6144 | time/iterations = 2 | rollout/ep_rew_mean = 28.75 | rollout/ep_len_mean = 28.75 | time/fps = 426 | time/time_elapsed = 9 | time/total_timesteps = 4096 | train/learning_rate = 0.0003 | train/entropy_loss = -0.6855484075844288 | train/policy_gradient_loss = -0.015410382760455832 | train/value_loss = 61.32087602615356 | train/approx_kl = 0.008056383579969406 | train/clip_fraction = 0.105224609375 | train/loss = 10.251166343688965 | train/explained_variance = 0.012730419635772705 | train/n_updates = 10 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 1]] | max_global_step = 8192 | time/iterations = 3 | rollout/ep_rew_mean = 36.85 | rollout/ep_len_mean = 36.85 | time/fps = 409 | time/time_elapsed = 14 | time/total_timesteps = 6144 | train/learning_rate = 0.0003 | train/entropy_loss = -0.6685062969103456 | train/policy_gradient_loss = -0.014946110408345703 | train/value_loss = 39.33342697024345 | train/approx_kl = 0.008881180547177792 | train/clip_fraction = 0.060693359375 | train/loss = 11.630510330200195 | train/explained_variance = 0.1108359694480896 | train/n_updates = 20 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 2]] | max_global_step = 8192 | time/iterations = 3 | rollout/ep_rew_mean = 37.18 | rollout/ep_len_mean = 37.18 | time/fps = 403 | time/time_elapsed = 15 | time/total_timesteps = 6144 | train/learning_rate = 0.0003 | train/entropy_loss = -0.6661150485277176 | train/policy_gradient_loss = -0.013149463082663715 | train/value_loss = 38.683698976039885 | train/approx_kl = 0.007977155968546867 | train/clip_fraction = 0.043798828125 | train/loss = 13.9081449508667 | train/explained_variance = 0.05941134691238403 | train/n_updates = 20 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 0]] | max_global_step = 8192 | time/iterations = 3 | rollout/ep_rew_mean = 37.65 | rollout/ep_len_mean = 37.65 | time/fps = 402 | time/time_elapsed = 15 | time/total_timesteps = 6144 | train/learning_rate = 0.0003 | train/entropy_loss = -0.6675648905336857 | train/policy_gradient_loss = -0.01585175626023556 | train/value_loss = 39.83039126396179 | train/approx_kl = 0.008422331884503365 | train/clip_fraction = 0.05068359375 | train/loss = 18.283363342285156 | train/explained_variance = 0.06431382894515991 | train/n_updates = 20 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 1]] | max_global_step = 10240 | time/iterations = 4 | rollout/ep_rew_mean = 45.9 | rollout/ep_len_mean = 45.9 | time/fps = 397 | time/time_elapsed = 20 | time/total_timesteps = 8192 | train/learning_rate = 0.0003 | train/entropy_loss = -0.6372709095478057 | train/policy_gradient_loss = -0.021793167035502846 | train/value_loss = 56.082052528858185 | train/approx_kl = 0.008312474004924297 | train/clip_fraction = 0.09052734375 | train/loss = 21.487403869628906 | train/explained_variance = 0.29079967737197876 | train/n_updates = 30 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 2]] | max_global_step = 10240 | time/iterations = 4 | rollout/ep_rew_mean = 48.35 | rollout/ep_len_mean = 48.35 | time/fps = 392 | time/time_elapsed = 20 | time/total_timesteps = 8192 | train/learning_rate = 0.0003 | train/entropy_loss = -0.6271074561402201 | train/policy_gradient_loss = -0.021605250079301187 | train/value_loss = 53.17835917472839 | train/approx_kl = 0.01045585609972477 | train/clip_fraction = 0.107275390625 | train/loss = 20.300893783569336 | train/explained_variance = 0.24486440420150757 | train/n_updates = 30 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 0]] | max_global_step = 10240 | time/iterations = 4 | rollout/ep_rew_mean = 49.44 | rollout/ep_len_mean = 49.44 | time/fps = 389 | time/time_elapsed = 21 | time/total_timesteps = 8192 | train/learning_rate = 0.0003 | train/entropy_loss = -0.641490114107728 | train/policy_gradient_loss = -0.01604906824504724 | train/value_loss = 56.91851507425308 | train/approx_kl = 0.007528345100581646 | train/clip_fraction = 0.0734375 | train/loss = 23.153453826904297 | train/explained_variance = 0.22841238975524902 | train/n_updates = 30 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 1]] | max_global_step = 12288 | time/iterations = 5 | rollout/ep_rew_mean = 61.62 | rollout/ep_len_mean = 61.62 | time/fps = 374 | time/time_elapsed = 27 | time/total_timesteps = 10240 | train/learning_rate = 0.0003 | train/entropy_loss = -0.6044564859941601 | train/policy_gradient_loss = -0.016754490803577937 | train/value_loss = 69.31612868309021 | train/approx_kl = 0.009068363346159458 | train/clip_fraction = 0.078857421875 | train/loss = 30.16673469543457 | train/explained_variance = 0.30177778005599976 | train/n_updates = 40 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 2]] | max_global_step = 12288 | time/iterations = 5 | rollout/ep_rew_mean = 63.61 | rollout/ep_len_mean = 63.61 | time/fps = 371 | time/time_elapsed = 27 | time/total_timesteps = 10240 | train/learning_rate = 0.0003 | train/entropy_loss = -0.6121436970308423 | train/policy_gradient_loss = -0.014887585233373102 | train/value_loss = 62.94282633662224 | train/approx_kl = 0.005902732722461224 | train/clip_fraction = 0.049267578125 | train/loss = 24.8435115814209 | train/explained_variance = 0.21425354480743408 | train/n_updates = 40 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 0]] | max_global_step = 12288 | time/iterations = 5 | rollout/ep_rew_mean = 62.2 | rollout/ep_len_mean = 62.2 | time/fps = 368 | time/time_elapsed = 27 | time/total_timesteps = 10240 | train/learning_rate = 0.0003 | train/entropy_loss = -0.621853212080896 | train/policy_gradient_loss = -0.01637536641501356 | train/value_loss = 62.13811606168747 | train/approx_kl = 0.008492568507790565 | train/clip_fraction = 0.06396484375 | train/loss = 25.353282928466797 | train/explained_variance = 0.31684231758117676 | train/n_updates = 40 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 1]] | max_global_step = 14336 | time/iterations = 6 | rollout/ep_rew_mean = 76.07 | rollout/ep_len_mean = 76.07 | time/fps = 367 | time/time_elapsed = 33 | time/total_timesteps = 12288 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5713022822514177 | train/policy_gradient_loss = -0.01559052456432255 | train/value_loss = 63.737575674057005 | train/approx_kl = 0.00888746790587902 | train/clip_fraction = 0.071826171875 | train/loss = 23.2188663482666 | train/explained_variance = 0.43151962757110596 | train/n_updates = 50 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 2]] | max_global_step = 14336 | time/iterations = 6 | rollout/ep_rew_mean = 78.83 | rollout/ep_len_mean = 78.83 | time/fps = 365 | time/time_elapsed = 33 | time/total_timesteps = 12288 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5959413398057223 | train/policy_gradient_loss = -0.01293433145910967 | train/value_loss = 63.95801417827606 | train/approx_kl = 0.007563581224530935 | train/clip_fraction = 0.06982421875 | train/loss = 20.49068832397461 | train/explained_variance = 0.40706634521484375 | train/n_updates = 50 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 0]] | max_global_step = 14336 | time/iterations = 6 | rollout/ep_rew_mean = 79.43 | rollout/ep_len_mean = 79.43 | time/fps = 362 | time/time_elapsed = 33 | time/total_timesteps = 12288 | train/learning_rate = 0.0003 | train/entropy_loss = -0.6087406625971198 | train/policy_gradient_loss = -0.011938219325384126 | train/value_loss = 66.20582329630852 | train/approx_kl = 0.005129554774612188 | train/clip_fraction = 0.04287109375 | train/loss = 26.536352157592773 | train/explained_variance = 0.3696613907814026 | train/n_updates = 50 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 1]] | max_global_step = 16384 | time/iterations = 7 | rollout/ep_rew_mean = 94.75 | rollout/ep_len_mean = 94.75 | time/fps = 363 | time/time_elapsed = 39 | time/total_timesteps = 14336 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5794724302366376 | train/policy_gradient_loss = -0.004287737552658655 | train/value_loss = 40.43672263324261 | train/approx_kl = 0.0037438003346323967 | train/clip_fraction = 0.014404296875 | train/loss = 5.200799465179443 | train/explained_variance = 0.6620278060436249 | train/n_updates = 60 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 2]] | max_global_step = 16384 | time/iterations = 7 | rollout/ep_rew_mean = 93.41 | rollout/ep_len_mean = 93.41 | time/fps = 360 | time/time_elapsed = 39 | time/total_timesteps = 14336 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5922138599678874 | train/policy_gradient_loss = -0.012010189255670411 | train/value_loss = 57.09716731309891 | train/approx_kl = 0.007144401781260967 | train/clip_fraction = 0.075146484375 | train/loss = 14.848328590393066 | train/explained_variance = 0.5530484616756439 | train/n_updates = 60 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 0]] | max_global_step = 16384 | time/iterations = 7 | rollout/ep_rew_mean = 93.76 | rollout/ep_len_mean = 93.76 | time/fps = 357 | time/time_elapsed = 40 | time/total_timesteps = 14336 | train/learning_rate = 0.0003 | train/entropy_loss = -0.6040949983522296 | train/policy_gradient_loss = -0.009169524490425828 | train/value_loss = 40.84913797974586 | train/approx_kl = 0.007860680110752583 | train/clip_fraction = 0.072705078125 | train/loss = 9.374231338500977 | train/explained_variance = 0.7407508194446564 | train/n_updates = 60 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 1]] | max_global_step = 18432 | time/iterations = 8 | rollout/ep_rew_mean = 111.61 | rollout/ep_len_mean = 111.61 | time/fps = 360 | time/time_elapsed = 45 | time/total_timesteps = 16384 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5607249280437827 | train/policy_gradient_loss = -0.002986471042095218 | train/value_loss = 33.21346059292555 | train/approx_kl = 0.003013045061379671 | train/clip_fraction = 0.011279296875 | train/loss = 8.299112319946289 | train/explained_variance = 0.8296276032924652 | train/n_updates = 70 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 2]] | max_global_step = 18432 | time/iterations = 8 | rollout/ep_rew_mean = 112.21 | rollout/ep_len_mean = 112.21 | time/fps = 358 | time/time_elapsed = 45 | time/total_timesteps = 16384 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5921528477221727 | train/policy_gradient_loss = -0.01051775121013634 | train/value_loss = 39.23670785278082 | train/approx_kl = 0.005722516216337681 | train/clip_fraction = 0.06689453125 | train/loss = 4.937105655670166 | train/explained_variance = 0.7620555758476257 | train/n_updates = 70 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 0]] | max_global_step = 18432 | time/iterations = 8 | rollout/ep_rew_mean = 109.93 | rollout/ep_len_mean = 109.93 | time/fps = 355 | time/time_elapsed = 46 | time/total_timesteps = 16384 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5929792949929833 | train/policy_gradient_loss = -0.005616791581269353 | train/value_loss = 63.9369278550148 | train/approx_kl = 0.0032515935599803925 | train/clip_fraction = 0.02109375 | train/loss = 17.186660766601562 | train/explained_variance = 0.6016848087310791 | train/n_updates = 70 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 1]] | max_global_step = 20480 | time/iterations = 9 | rollout/ep_rew_mean = 125.22 | rollout/ep_len_mean = 125.22 | time/fps = 355 | time/time_elapsed = 51 | time/total_timesteps = 18432 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5653722988441586 | train/policy_gradient_loss = -0.008493624679249478 | train/value_loss = 38.953543305397034 | train/approx_kl = 0.005177437327802181 | train/clip_fraction = 0.07109375 | train/loss = 14.79820442199707 | train/explained_variance = 0.7749437093734741 | train/n_updates = 80 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 2]] | max_global_step = 20480 | time/iterations = 9 | rollout/ep_rew_mean = 130.25 | rollout/ep_len_mean = 130.25 | time/fps = 353 | time/time_elapsed = 52 | time/total_timesteps = 18432 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5738503985106945 | train/policy_gradient_loss = -0.005740263756888453 | train/value_loss = 72.06800128221512 | train/approx_kl = 0.006686339154839516 | train/clip_fraction = 0.03720703125 | train/loss = 7.82557487487793 | train/explained_variance = 0.33640867471694946 | train/n_updates = 80 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 0]] | max_global_step = 20480 | time/iterations = 9 | rollout/ep_rew_mean = 128.55 | rollout/ep_len_mean = 128.55 | time/fps = 351 | time/time_elapsed = 52 | time/total_timesteps = 18432 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5821122424677014 | train/policy_gradient_loss = -0.0035777818571659735 | train/value_loss = 53.145361164212225 | train/approx_kl = 0.004387532360851765 | train/clip_fraction = 0.018701171875 | train/loss = 13.294953346252441 | train/explained_variance = 0.6190232038497925 | train/n_updates = 80 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 1]] | max_global_step = 22528 | time/iterations = 10 | rollout/ep_rew_mean = 141.68 | rollout/ep_len_mean = 141.68 | time/fps = 354 | time/time_elapsed = 57 | time/total_timesteps = 20480 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5748784447088837 | train/policy_gradient_loss = -0.008402446379477624 | train/value_loss = 19.196025171130895 | train/approx_kl = 0.005493971519172192 | train/clip_fraction = 0.05244140625 | train/loss = 3.0958304405212402 | train/explained_variance = 0.9052915200591087 | train/n_updates = 90 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 2]] | max_global_step = 22528 | time/iterations = 10 | rollout/ep_rew_mean = 146.91 | rollout/ep_len_mean = 146.91 | time/fps = 352 | time/time_elapsed = 58 | time/total_timesteps = 20480 | train/learning_rate = 0.0003 | train/entropy_loss = -0.55838915547356 | train/policy_gradient_loss = -0.008732947133103153 | train/value_loss = 48.26576453149319 | train/approx_kl = 0.005845913663506508 | train/clip_fraction = 0.065673828125 | train/loss = 11.673324584960938 | train/explained_variance = 0.7672396898269653 | train/n_updates = 90 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 0]] | max_global_step = 22528 | time/iterations = 10 | rollout/ep_rew_mean = 143.68 | rollout/ep_len_mean = 143.68 | time/fps = 351 | time/time_elapsed = 58 | time/total_timesteps = 20480 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5832941999658943 | train/policy_gradient_loss = -0.010998867846501526 | train/value_loss = 21.56470604687929 | train/approx_kl = 0.006126352585852146 | train/clip_fraction = 0.073388671875 | train/loss = 3.2158937454223633 | train/explained_variance = 0.8782470673322678 | train/n_updates = 90 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 1]] | max_global_step = 24576 | time/iterations = 11 | rollout/ep_rew_mean = 157.38 | rollout/ep_len_mean = 157.38 | time/fps = 354 | time/time_elapsed = 63 | time/total_timesteps = 22528 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5647322304546833 | train/policy_gradient_loss = -0.007764048119133804 | train/value_loss = 52.263426271080974 | train/approx_kl = 0.007562276907265186 | train/clip_fraction = 0.090771484375 | train/loss = 12.511579513549805 | train/explained_variance = 0.777026578783989 | train/n_updates = 100 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 2]] | max_global_step = 24576 | time/iterations = 11 | rollout/ep_rew_mean = 164.8 | rollout/ep_len_mean = 164.8 | time/fps = 352 | time/time_elapsed = 63 | time/total_timesteps = 22528 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5593959849327803 | train/policy_gradient_loss = -0.0112069135720958 | train/value_loss = 45.52521513402462 | train/approx_kl = 0.012146038934588432 | train/clip_fraction = 0.162939453125 | train/loss = 35.99325180053711 | train/explained_variance = 0.779657244682312 | train/n_updates = 100 | train/clip_range = 0.2 |
[INFO] 15:59: [PPO[worker: 0]] | max_global_step = 24576 | time/iterations = 11 | rollout/ep_rew_mean = 163.31 | rollout/ep_len_mean = 163.31 | time/fps = 351 | time/time_elapsed = 64 | time/total_timesteps = 22528 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5535886317491532 | train/policy_gradient_loss = -0.003764605871401727 | train/value_loss = 76.87041089832783 | train/approx_kl = 0.007615496404469013 | train/clip_fraction = 0.03154296875 | train/loss = 45.99373245239258 | train/explained_variance = 0.35959136486053467 | train/n_updates = 100 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 1]] | max_global_step = 26624 | time/iterations = 12 | rollout/ep_rew_mean = 175.06 | rollout/ep_len_mean = 175.06 | time/fps = 353 | time/time_elapsed = 69 | time/total_timesteps = 24576 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5552531754598021 | train/policy_gradient_loss = -0.005408551605069078 | train/value_loss = 53.160164260864256 | train/approx_kl = 0.005178069695830345 | train/clip_fraction = 0.025634765625 | train/loss = 37.7170295715332 | train/explained_variance = 0.7826626151800156 | train/n_updates = 110 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 2]] | max_global_step = 26624 | time/iterations = 12 | rollout/ep_rew_mean = 182.79 | rollout/ep_len_mean = 182.79 | time/fps = 351 | time/time_elapsed = 69 | time/total_timesteps = 24576 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5358171337284148 | train/policy_gradient_loss = -0.00488179410531302 | train/value_loss = 8.18989806524478 | train/approx_kl = 0.0034808891359716654 | train/clip_fraction = 0.06181640625 | train/loss = 0.12967732548713684 | train/explained_variance = 0.16228169202804565 | train/n_updates = 110 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 0]] | max_global_step = 26624 | time/iterations = 12 | rollout/ep_rew_mean = 178.05 | rollout/ep_len_mean = 178.05 | time/fps = 350 | time/time_elapsed = 70 | time/total_timesteps = 24576 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5562285710126161 | train/policy_gradient_loss = -0.004001504971529357 | train/value_loss = 32.59976389706135 | train/approx_kl = 0.002194597851485014 | train/clip_fraction = 0.021923828125 | train/loss = 4.274383068084717 | train/explained_variance = 0.8627262711524963 | train/n_updates = 110 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 1]] | max_global_step = 28672 | time/iterations = 13 | rollout/ep_rew_mean = 192.64 | rollout/ep_len_mean = 192.64 | time/fps = 353 | time/time_elapsed = 75 | time/total_timesteps = 26624 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5598000731319189 | train/policy_gradient_loss = -0.004672619019402191 | train/value_loss = 27.96599825024605 | train/approx_kl = 0.0037293194327503443 | train/clip_fraction = 0.034814453125 | train/loss = 11.138860702514648 | train/explained_variance = 0.9212513640522957 | train/n_updates = 120 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 2]] | max_global_step = 28672 | time/iterations = 13 | rollout/ep_rew_mean = 201.19 | rollout/ep_len_mean = 201.19 | time/fps = 351 | time/time_elapsed = 75 | time/total_timesteps = 26624 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5327114884741604 | train/policy_gradient_loss = -0.002263979368581204 | train/value_loss = 62.354254606366155 | train/approx_kl = 0.0018954614643007517 | train/clip_fraction = 0.00458984375 | train/loss = 49.1828498840332 | train/explained_variance = 0.01690804958343506 | train/n_updates = 120 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 0]] | max_global_step = 28672 | time/iterations = 13 | rollout/ep_rew_mean = 194.91 | rollout/ep_len_mean = 194.91 | time/fps = 350 | time/time_elapsed = 76 | time/total_timesteps = 26624 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5350399187766015 | train/policy_gradient_loss = -0.010988622946024406 | train/value_loss = 32.65103582441807 | train/approx_kl = 0.012016495689749718 | train/clip_fraction = 0.100732421875 | train/loss = 5.403335094451904 | train/explained_variance = 0.8912321701645851 | train/n_updates = 120 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 1]] | max_global_step = 30720 | time/iterations = 14 | rollout/ep_rew_mean = 208.91 | rollout/ep_len_mean = 208.91 | time/fps = 350 | time/time_elapsed = 81 | time/total_timesteps = 28672 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5623490344733 | train/policy_gradient_loss = -0.007931908047612523 | train/value_loss = 25.936047033965586 | train/approx_kl = 0.004620042629539967 | train/clip_fraction = 0.0498046875 | train/loss = 2.1860785484313965 | train/explained_variance = 0.8344163149595261 | train/n_updates = 130 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 2]] | max_global_step = 30720 | time/iterations = 14 | rollout/ep_rew_mean = 218.48 | rollout/ep_len_mean = 218.48 | time/fps = 349 | time/time_elapsed = 82 | time/total_timesteps = 28672 | train/learning_rate = 0.0003 | train/entropy_loss = -0.514452669210732 | train/policy_gradient_loss = -0.0014171435825119261 | train/value_loss = 4.242323934612796 | train/approx_kl = 0.006443873047828674 | train/clip_fraction = 0.032568359375 | train/loss = 0.14281976222991943 | train/explained_variance = -0.007233858108520508 | train/n_updates = 130 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 0]] | max_global_step = 30720 | time/iterations = 14 | rollout/ep_rew_mean = 211.74 | rollout/ep_len_mean = 211.74 | time/fps = 348 | time/time_elapsed = 82 | time/total_timesteps = 28672 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5367143749259412 | train/policy_gradient_loss = -0.01454816997575108 | train/value_loss = 10.89514188542962 | train/approx_kl = 0.009338829666376114 | train/clip_fraction = 0.122314453125 | train/loss = 5.663129806518555 | train/explained_variance = 0.9458933025598526 | train/n_updates = 130 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 1]] | max_global_step = 32768 | time/iterations = 15 | rollout/ep_rew_mean = 225.59 | rollout/ep_len_mean = 225.59 | time/fps = 349 | time/time_elapsed = 87 | time/total_timesteps = 30720 | train/learning_rate = 0.0003 | train/entropy_loss = -0.539276737626642 | train/policy_gradient_loss = -0.0037407161165901926 | train/value_loss = 26.433760127052665 | train/approx_kl = 0.013978826813399792 | train/clip_fraction = 0.064990234375 | train/loss = 0.37698429822921753 | train/explained_variance = 0.035490989685058594 | train/n_updates = 140 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 2]] | max_global_step = 32768 | time/iterations = 15 | rollout/ep_rew_mean = 235.66 | rollout/ep_len_mean = 235.66 | time/fps = 348 | time/time_elapsed = 88 | time/total_timesteps = 30720 | train/learning_rate = 0.0003 | train/entropy_loss = -0.4973093102686107 | train/policy_gradient_loss = -0.012424326899053994 | train/value_loss = 2.345036637177691 | train/approx_kl = 0.008750807493925095 | train/clip_fraction = 0.106884765625 | train/loss = 0.18800251185894012 | train/explained_variance = 0.7767911404371262 | train/n_updates = 140 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 0]] | max_global_step = 32768 | time/iterations = 15 | rollout/ep_rew_mean = 230.19 | rollout/ep_len_mean = 230.19 | time/fps = 346 | time/time_elapsed = 88 | time/total_timesteps = 30720 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5259521684609354 | train/policy_gradient_loss = -0.02139304491574876 | train/value_loss = 4.581413919106126 | train/approx_kl = 0.012810716405510902 | train/clip_fraction = 0.201708984375 | train/loss = 0.7710778713226318 | train/explained_variance = 0.8799830973148346 | train/n_updates = 140 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 1]] | max_global_step = 34816 | time/iterations = 16 | rollout/ep_rew_mean = 243.47 | rollout/ep_len_mean = 243.47 | time/fps = 348 | time/time_elapsed = 93 | time/total_timesteps = 32768 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5016440353356302 | train/policy_gradient_loss = -0.005388573392338003 | train/value_loss = 1.6560175356687978 | train/approx_kl = 0.0062754955142736435 | train/clip_fraction = 0.066552734375 | train/loss = 0.09270089864730835 | train/explained_variance = 0.12096387147903442 | train/n_updates = 150 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 2]] | max_global_step = 34816 | time/iterations = 16 | rollout/ep_rew_mean = 252.56 | rollout/ep_len_mean = 252.56 | time/fps = 347 | time/time_elapsed = 94 | time/total_timesteps = 32768 | train/learning_rate = 0.0003 | train/entropy_loss = -0.4738074015825987 | train/policy_gradient_loss = -0.0019494367443257943 | train/value_loss = 1.4576879689877387 | train/approx_kl = 0.005790143273770809 | train/clip_fraction = 0.037353515625 | train/loss = 0.18722578883171082 | train/explained_variance = 0.35352087020874023 | train/n_updates = 150 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 0]] | max_global_step = 34816 | time/iterations = 16 | rollout/ep_rew_mean = 246.37 | rollout/ep_len_mean = 246.37 | time/fps = 345 | time/time_elapsed = 94 | time/total_timesteps = 32768 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5140718438662588 | train/policy_gradient_loss = -0.0004109115216124337 | train/value_loss = 1.4875038336322177 | train/approx_kl = 0.0043577756732702255 | train/clip_fraction = 0.02763671875 | train/loss = 0.17618514597415924 | train/explained_variance = -0.08825933933258057 | train/n_updates = 150 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 1]] | max_global_step = 36864 | time/iterations = 17 | rollout/ep_rew_mean = 257.95 | rollout/ep_len_mean = 257.95 | time/fps = 345 | time/time_elapsed = 100 | time/total_timesteps = 34816 | train/learning_rate = 0.0003 | train/entropy_loss = -0.505179504211992 | train/policy_gradient_loss = -0.0033724807828548363 | train/value_loss = 1.052925960079301 | train/approx_kl = 0.01005391776561737 | train/clip_fraction = 0.107958984375 | train/loss = 0.09074155241250992 | train/explained_variance = -0.022495508193969727 | train/n_updates = 160 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 2]] | max_global_step = 36864 | time/iterations = 17 | rollout/ep_rew_mean = 269.16 | rollout/ep_len_mean = 269.16 | time/fps = 343 | time/time_elapsed = 101 | time/total_timesteps = 34816 | train/learning_rate = 0.0003 | train/entropy_loss = -0.48579485388472676 | train/policy_gradient_loss = 4.9661558296065775e-05 | train/value_loss = 0.9125513993494678 | train/approx_kl = 0.005249223671853542 | train/clip_fraction = 0.029833984375 | train/loss = 0.011615638621151447 | train/explained_variance = 0.20920252799987793 | train/n_updates = 160 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 0]] | max_global_step = 36864 | time/iterations = 17 | rollout/ep_rew_mean = 262.82 | rollout/ep_len_mean = 262.82 | time/fps = 342 | time/time_elapsed = 101 | time/total_timesteps = 34816 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5235147284343838 | train/policy_gradient_loss = -0.003425118201994337 | train/value_loss = 1.1361884556215955 | train/approx_kl = 0.005556339398026466 | train/clip_fraction = 0.04345703125 | train/loss = 0.04526910558342934 | train/explained_variance = 0.0790131688117981 | train/n_updates = 160 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 1]] | max_global_step = 38912 | time/iterations = 18 | rollout/ep_rew_mean = 279.36 | rollout/ep_len_mean = 279.36 | time/fps = 342 | time/time_elapsed = 107 | time/total_timesteps = 36864 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5124423679895699 | train/policy_gradient_loss = -9.320563549408689e-05 | train/value_loss = 0.6837369541579392 | train/approx_kl = 0.0015420113923028111 | train/clip_fraction = 0.011376953125 | train/loss = 0.048248302191495895 | train/explained_variance = 0.026345491409301758 | train/n_updates = 170 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 2]] | max_global_step = 38912 | time/iterations = 18 | rollout/ep_rew_mean = 291.19 | rollout/ep_len_mean = 291.19 | time/fps = 341 | time/time_elapsed = 107 | time/total_timesteps = 36864 | train/learning_rate = 0.0003 | train/entropy_loss = -0.49892428508028386 | train/policy_gradient_loss = -0.0013376812363276257 | train/value_loss = 0.5619548875140026 | train/approx_kl = 0.005291177425533533 | train/clip_fraction = 0.031787109375 | train/loss = 0.08444305509328842 | train/explained_variance = -0.06384599208831787 | train/n_updates = 170 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 0]] | max_global_step = 38912 | time/iterations = 18 | rollout/ep_rew_mean = 280.12 | rollout/ep_len_mean = 280.12 | time/fps = 339 | time/time_elapsed = 108 | time/total_timesteps = 36864 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5188016330823302 | train/policy_gradient_loss = -0.0005724920614738948 | train/value_loss = 0.6982153896708041 | train/approx_kl = 0.0033194604329764843 | train/clip_fraction = 0.013623046875 | train/loss = 0.05807049572467804 | train/explained_variance = 0.02944713830947876 | train/n_updates = 170 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 1]] | max_global_step = 40960 | time/iterations = 19 | rollout/ep_rew_mean = 294.87 | rollout/ep_len_mean = 294.87 | time/fps = 340 | time/time_elapsed = 114 | time/total_timesteps = 38912 | train/learning_rate = 0.0003 | train/entropy_loss = -0.4957636919803917 | train/policy_gradient_loss = -0.004073993970087031 | train/value_loss = 0.4760114259843249 | train/approx_kl = 0.008029351010918617 | train/clip_fraction = 0.062060546875 | train/loss = 0.03772534430027008 | train/explained_variance = 0.0035400986671447754 | train/n_updates = 180 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 2]] | max_global_step = 40960 | time/iterations = 19 | rollout/ep_rew_mean = 307.4 | rollout/ep_len_mean = 307.4 | time/fps = 338 | time/time_elapsed = 114 | time/total_timesteps = 38912 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5157848816365004 | train/policy_gradient_loss = -0.0030665539947221988 | train/value_loss = 0.336146240857488 | train/approx_kl = 0.006352574564516544 | train/clip_fraction = 0.05068359375 | train/loss = 0.04088807851076126 | train/explained_variance = 0.8884187638759613 | train/n_updates = 180 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 0]] | max_global_step = 40960 | time/iterations = 19 | rollout/ep_rew_mean = 294.72 | rollout/ep_len_mean = 294.72 | time/fps = 337 | time/time_elapsed = 115 | time/total_timesteps = 38912 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5111317873932422 | train/policy_gradient_loss = -0.0015660247969208284 | train/value_loss = 0.431194728880655 | train/approx_kl = 0.0047972844913601875 | train/clip_fraction = 0.030078125 | train/loss = 0.02516128309071064 | train/explained_variance = -0.002133488655090332 | train/n_updates = 180 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 1]] | max_global_step = 43008 | time/iterations = 20 | rollout/ep_rew_mean = 309.13 | rollout/ep_len_mean = 309.13 | time/fps = 338 | time/time_elapsed = 121 | time/total_timesteps = 40960 | train/learning_rate = 0.0003 | train/entropy_loss = -0.4922599596902728 | train/policy_gradient_loss = -0.00019939174962928518 | train/value_loss = 0.27829485264082904 | train/approx_kl = 0.0020252331160008907 | train/clip_fraction = 0.00849609375 | train/loss = 0.003118633758276701 | train/explained_variance = 0.016220271587371826 | train/n_updates = 190 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 2]] | max_global_step = 43008 | time/iterations = 20 | rollout/ep_rew_mean = 325.01 | rollout/ep_len_mean = 325.01 | time/fps = 336 | time/time_elapsed = 121 | time/total_timesteps = 40960 | train/learning_rate = 0.0003 | train/entropy_loss = -0.48774116234853865 | train/policy_gradient_loss = -0.0037827152031240986 | train/value_loss = 0.19911157262977214 | train/approx_kl = 0.0032185050658881664 | train/clip_fraction = 0.030908203125 | train/loss = -0.013709803111851215 | train/explained_variance = 0.26044702529907227 | train/n_updates = 190 | train/clip_range = 0.2 |
[INFO] 16:00: [PPO[worker: 0]] | max_global_step = 43008 | time/iterations = 20 | rollout/ep_rew_mean = 311.67 | rollout/ep_len_mean = 311.67 | time/fps = 335 | time/time_elapsed = 122 | time/total_timesteps = 40960 | train/learning_rate = 0.0003 | train/entropy_loss = -0.4999147373251617 | train/policy_gradient_loss = -0.0014124810899375007 | train/value_loss = 0.2843351167524816 | train/approx_kl = 0.005678324960172176 | train/clip_fraction = 0.02919921875 | train/loss = 0.020313650369644165 | train/explained_variance = 0.055005550384521484 | train/n_updates = 190 | train/clip_range = 0.2 |
[INFO] 16:01: [PPO[worker: 1]] | max_global_step = 45056 | time/iterations = 21 | rollout/ep_rew_mean = 321.96 | rollout/ep_len_mean = 321.96 | time/fps = 336 | time/time_elapsed = 127 | time/total_timesteps = 43008 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5042375044897198 | train/policy_gradient_loss = -0.0011692596512148158 | train/value_loss = 0.16990109027537983 | train/approx_kl = 0.0032151443883776665 | train/clip_fraction = 0.0251953125 | train/loss = 0.04340684413909912 | train/explained_variance = -0.01492154598236084 | train/n_updates = 200 | train/clip_range = 0.2 |
[INFO] 16:01: [PPO[worker: 2]] | max_global_step = 45056 | time/iterations = 21 | rollout/ep_rew_mean = 338.59 | rollout/ep_len_mean = 338.59 | time/fps = 334 | time/time_elapsed = 128 | time/total_timesteps = 43008 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5080808162689209 | train/policy_gradient_loss = -0.003549698476854246 | train/value_loss = 0.1296787588755251 | train/approx_kl = 0.00621542613953352 | train/clip_fraction = 0.05302734375 | train/loss = 0.04627562314271927 | train/explained_variance = 0.0931699275970459 | train/n_updates = 200 | train/clip_range = 0.2 |
[INFO] 16:01: [PPO[worker: 0]] | max_global_step = 45056 | time/iterations = 21 | rollout/ep_rew_mean = 327.08 | rollout/ep_len_mean = 327.08 | time/fps = 333 | time/time_elapsed = 129 | time/total_timesteps = 43008 | train/learning_rate = 0.0003 | train/entropy_loss = -0.49875728664919733 | train/policy_gradient_loss = -0.0030554209108231587 | train/value_loss = 0.17959667765753692 | train/approx_kl = 0.006197799928486347 | train/clip_fraction = 0.040576171875 | train/loss = 0.017518581822514534 | train/explained_variance = 0.0012366771697998047 | train/n_updates = 200 | train/clip_range = 0.2 |
[INFO] 16:01: [PPO[worker: 1]] | max_global_step = 47104 | time/iterations = 22 | rollout/ep_rew_mean = 336.75 | rollout/ep_len_mean = 336.75 | time/fps = 332 | time/time_elapsed = 135 | time/total_timesteps = 45056 | train/learning_rate = 0.0003 | train/entropy_loss = -0.47473918814212085 | train/policy_gradient_loss = -0.000753292843728559 | train/value_loss = 0.10499831844717847 | train/approx_kl = 0.0022293308284133673 | train/clip_fraction = 0.022412109375 | train/loss = 0.002592694014310837 | train/explained_variance = -0.007587909698486328 | train/n_updates = 210 | train/clip_range = 0.2 |
[INFO] 16:01: [PPO[worker: 2]] | max_global_step = 47104 | time/iterations = 22 | rollout/ep_rew_mean = 354.33 | rollout/ep_len_mean = 354.33 | time/fps = 331 | time/time_elapsed = 135 | time/total_timesteps = 45056 | train/learning_rate = 0.0003 | train/entropy_loss = -0.495706963352859 | train/policy_gradient_loss = -0.000426401813456323 | train/value_loss = 0.08446975928891334 | train/approx_kl = 0.0015652569709345698 | train/clip_fraction = 0.008056640625 | train/loss = 0.010549742728471756 | train/explained_variance = 0.0012684464454650879 | train/n_updates = 210 | train/clip_range = 0.2 |
[INFO] 16:01: [PPO[worker: 0]] | max_global_step = 47104 | time/iterations = 22 | rollout/ep_rew_mean = 340.34 | rollout/ep_len_mean = 340.34 | time/fps = 330 | time/time_elapsed = 136 | time/total_timesteps = 45056 | train/learning_rate = 0.0003 | train/entropy_loss = -0.4880263367667794 | train/policy_gradient_loss = -0.0010256466252030806 | train/value_loss = 0.11710972856089938 | train/approx_kl = 0.0032306264620274305 | train/clip_fraction = 0.0166015625 | train/loss = -6.178580224514008e-05 | train/explained_variance = -0.04952669143676758 | train/n_updates = 210 | train/clip_range = 0.2 |
[INFO] 16:01: [PPO[worker: 1]] | max_global_step = 49152 | time/iterations = 23 | rollout/ep_rew_mean = 351.07 | rollout/ep_len_mean = 351.07 | time/fps = 328 | time/time_elapsed = 143 | time/total_timesteps = 47104 | train/learning_rate = 0.0003 | train/entropy_loss = -0.46606638189405203 | train/policy_gradient_loss = -0.0012806903061573394 | train/value_loss = 0.06755283990169118 | train/approx_kl = 0.00338670052587986 | train/clip_fraction = 0.015380859375 | train/loss = 0.011496410705149174 | train/explained_variance = 0.011872351169586182 | train/n_updates = 220 | train/clip_range = 0.2 |
[INFO] 16:01: [PPO[worker: 2]] | max_global_step = 49152 | time/iterations = 23 | rollout/ep_rew_mean = 369.72 | rollout/ep_len_mean = 369.72 | time/fps = 327 | time/time_elapsed = 143 | time/total_timesteps = 47104 | train/learning_rate = 0.0003 | train/entropy_loss = -0.4813957496546209 | train/policy_gradient_loss = -0.0005197725258767605 | train/value_loss = 0.05070137128532224 | train/approx_kl = 0.0011682924814522266 | train/clip_fraction = 0.011328125 | train/loss = 0.014267145656049252 | train/explained_variance = 0.02635061740875244 | train/n_updates = 220 | train/clip_range = 0.2 |
[INFO] 16:01: [PPO[worker: 0]] | max_global_step = 49152 | time/iterations = 23 | rollout/ep_rew_mean = 357.6 | rollout/ep_len_mean = 357.6 | time/fps = 325 | time/time_elapsed = 144 | time/total_timesteps = 47104 | train/learning_rate = 0.0003 | train/entropy_loss = -0.5085351384244859 | train/policy_gradient_loss = -0.0008456365059828386 | train/value_loss = 0.07298006257788074 | train/approx_kl = 0.0026025455445051193 | train/clip_fraction = 0.022119140625 | train/loss = 0.021276511251926422 | train/explained_variance = -0.025399088859558105 | train/n_updates = 220 | train/clip_range = 0.2 |
[INFO] 16:01: [PPO[worker: 1]] | max_global_step = 51200 | time/iterations = 24 | rollout/ep_rew_mean = 361.78 | rollout/ep_len_mean = 361.78 | time/fps = 325 | time/time_elapsed = 151 | time/total_timesteps = 49152 | train/learning_rate = 0.0003 | train/entropy_loss = -0.4632605144754052 | train/policy_gradient_loss = -0.0030253833654569464 | train/value_loss = 0.04603174216354091 | train/approx_kl = 0.005220792256295681 | train/clip_fraction = 0.06181640625 | train/loss = 0.01565437763929367 | train/explained_variance = -0.0232236385345459 | train/n_updates = 230 | train/clip_range = 0.2 |
[INFO] 16:01: [PPO[worker: 2]] | max_global_step = 51200 | time/iterations = 24 | rollout/ep_rew_mean = 383.75 | rollout/ep_len_mean = 383.75 | time/fps = 323 | time/time_elapsed = 151 | time/total_timesteps = 49152 | train/learning_rate = 0.0003 | train/entropy_loss = -0.49069994343444706 | train/policy_gradient_loss = -0.0018766895205772015 | train/value_loss = 0.03171468693126371 | train/approx_kl = 0.005815165117383003 | train/clip_fraction = 0.061181640625 | train/loss = 0.0051743886433541775 | train/explained_variance = 0.12434303760528564 | train/n_updates = 230 | train/clip_range = 0.2 |
[INFO] 16:01: [PPO[worker: 0]] | max_global_step = 51200 | time/iterations = 24 | rollout/ep_rew_mean = 370.78 | rollout/ep_len_mean = 370.78 | time/fps = 322 | time/time_elapsed = 152 | time/total_timesteps = 49152 | train/learning_rate = 0.0003 | train/entropy_loss = -0.49728800179436805 | train/policy_gradient_loss = -0.002779247868602397 | train/value_loss = 0.04587990254440229 | train/approx_kl = 0.005585251376032829 | train/clip_fraction = 0.0578125 | train/loss = 0.01021644752472639 | train/explained_variance = 0.028099477291107178 | train/n_updates = 230 | train/clip_range = 0.2 |
[INFO] 16:01: ... trained!
[INFO] 16:01: Saved ExperimentManager(PPO) using pickle.
[INFO] 16:01: The ExperimentManager was saved in : 'rlberry_data/temp/manager_data/PPO_2024-06-28_15-58-48_4fc693bc/manager_obj.pickle'
Backend tkagg is interactive backend. Turning interactive mode on.

The easiest way to display this information is rlberry.manager.plot_writer_data, as shown on the visualization page. Alternatively, the rlberry.manager.read_writer_data function extracts the information into a pandas dataframe:

df = read_writer_data([manager])

Then, you can use your own tools to display whatever you like.

To illustrate this, we use matplotlib to plot the rollout reward mean of PPO for each of the 3 fits.

figure, ax = plt.subplots(1, 1)

for n_simu in df["n_simu"].unique():
    to_plot_df = df.loc[
        (df["tag"] == "rollout/ep_rew_mean") & (df["n_simu"] == n_simu),
        ["global_step", "value"],
    ]
    ax.plot(to_plot_df["global_step"], to_plot_df["value"])

ax.set_xlabel("steps")
ax.set_ylabel("rewards")
plt.show()

[Figure: rollout reward mean ("rewards") against global step ("steps"), one curve per fit.]

In the previous example, the data_source given to rlberry.manager.read_writer_data contained a single ExperimentManager. The function also accepts a list of several ExperimentManager objects (if you need data from more than one experiment), or a path (string) to a directory containing the pickle files of an ExperimentManager.

Default writer

Of course, the information contained in the writer (the result of the rlberry.manager.read_writer_data function) depends on how it has been configured and what the agent has recorded in it.

In the default writer you have the following information :

  • name : Name of the agent

  • tag : The type/name of the information (depending on the agent's logging policy; in our previous example it was rollout/ep_rew_mean from the Stable-Baselines3 PPO agent)

  • value : The value of the information

  • dw_time_elapsed : Time elapsed since writer initialization

  • global_step : Step at which the value was added.

  • n_simu : Added by rlberry.manager.read_writer_data, n_simu is an integer identifying the agent (with n_fit > 1, the writer data contains information on more than one agent).
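
As a sketch of what these columns allow, the snippet below builds a small hand-made dataframe with the same column names as the default writer and averages one tag over the fits with pandas. The values are made up for illustration (they are not taken from the run above), and the helper name summarize_tag is hypothetical, not part of rlberry.

```python
import pandas as pd

# Hand-made stand-in for the output of read_writer_data (illustrative values only).
df = pd.DataFrame(
    {
        "name": ["PPO"] * 6,
        "tag": ["rollout/ep_rew_mean"] * 4 + ["time/fps"] * 2,
        "value": [20.0, 40.0, 30.0, 60.0, 590.0, 600.0],
        "dw_time_elapsed": [3.0, 7.0, 3.1, 7.2, 3.0, 3.1],
        "global_step": [2048, 4096, 2048, 4096, 2048, 2048],
        "n_simu": [0, 0, 1, 1, 0, 1],
    }
)


def summarize_tag(data, tag):
    """Mean and standard deviation of one tag across fits, per agent and step."""
    selected = data[data["tag"] == tag]
    grouped = selected.groupby(["name", "global_step"])["value"]
    return grouped.agg(["mean", "std"]).reset_index()


summary = summarize_tag(df, "rollout/ep_rew_mean")
print(summary)
```

With a real dataframe returned by read_writer_data, the same call should apply as long as the writer exposes the columns listed above.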

How to import data from tensorboard?

Maybe you want to train your agents with other tools, but still use rlberry for visualisation and/or statistical tests. If your training can log to tensorboard, you can load the data into pandas dataframes and use them in rlberry with the tool tensorboard_to_dataframe. It accepts the tensorboard data in two input formats:

Option 1: via a directory

Be careful about these two things:

  • The folder containing the tensorboard results must respect the following tree structure: <tensorboard_log_folder/algo_name/n_simu/events.out.tfevents.xxxxx>

  • Each leaf folder (n_simu) must contain only one file (events.out.tfevents.xxxxx); if there are several, only the first one is imported!
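
Before converting a folder, it can help to check these two points. The sketch below uses only the Python standard library and is not part of rlberry: the helper names find_event_files and leaves_with_several_files are hypothetical, and the only assumption is the tree structure stated above. The demonstration runs on a temporary folder filled with empty placeholder files.

```python
import tempfile
from pathlib import Path


def find_event_files(log_folder):
    """Map (algo_name, n_simu) to the sorted event files found in that leaf folder."""
    found = {}
    for event_file in sorted(Path(log_folder).glob("*/*/events.out.tfevents.*")):
        key = (event_file.parent.parent.name, event_file.parent.name)
        found.setdefault(key, []).append(event_file)
    return found


def leaves_with_several_files(log_folder):
    """Return the leaf folders that hold more than one event file."""
    return [key for key, files in find_event_files(log_folder).items() if len(files) > 1]


# Demonstration on a throwaway folder with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for relative in [
        "PPO_cartpole/ppo_1/events.out.tfevents.0",
        "A2C_cartpole/A2C_1/events.out.tfevents.0",
        "A2C_cartpole/A2C_1/events.out.tfevents.1",
    ]:
        path = Path(tmp, relative)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    layout = find_event_files(tmp)
    problems = leaves_with_several_files(tmp)

print(problems)
```

An empty list means every leaf folder found holds exactly one event file; any entry in the list is a leaf folder to clean up before the import.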

For instance, suppose you run the following training with Stable-Baselines3 and log it with tensorboard:

from stable_baselines3 import A2C, PPO

# One tensorboard folder per algorithm, all under the same root folder
log_path = "./log"
path_ppo = log_path + "/PPO_cartpole/"
path_a2c = log_path + "/A2C_cartpole/"

# One PPO run, and two A2C runs that log into the same A2C folder
model = PPO("MlpPolicy", "CartPole-v1", tensorboard_log=path_ppo)
model2 = A2C("MlpPolicy", "CartPole-v1", tensorboard_log=path_a2c)
model2_seed2 = A2C("MlpPolicy", "CartPole-v1", tensorboard_log=path_a2c)

# tb_log_name sets the name of the run sub-folder (ppo_1, A2C_1, A2C_2)
model.learn(total_timesteps=5_000, tb_log_name="ppo")
model2.learn(total_timesteps=5_000, tb_log_name="A2C")
model2_seed2.learn(total_timesteps=5_000, tb_log_name="A2C")

Then, to convert these logs into pandas dataframes, you can use the tool tensorboard_to_dataframe. It gives you a Dict with all the scalar data from the tensorboard folder.

  • The keys will be the “tag” (the name of the measure)

  • The values will be dataframes with 4 columns: [“name”, “n_simu”, “x”, “y”] (respectively “name of the algorithm”, “seed number”, “step number”, and “measure value”)

from rlberry.manager import tensorboard_to_dataframe

data_in_dataframe = tensorboard_to_dataframe(log_path)

print(data_in_dataframe.keys())
print("-----------")
print(data_in_dataframe)
dict_keys(['rollout/ep_len_mean', 'rollout/ep_rew_mean', 'time/fps', 'train/approx_kl', 'train/clip_fraction', 'train/clip_range', 'train/entropy_loss', 'train/explained_variance', 'train/learning_rate', 'train/loss', 'train/policy_gradient_loss', 'train/value_loss', 'train/policy_loss'])
-----------
{'rollout/ep_len_mean':             name n_simu     x          y
0   PPO_cartpole  ppo_1  2048  22.898876
1   PPO_cartpole  ppo_1  4096  26.700001
2   PPO_cartpole  ppo_1  6144  36.810001
3   A2C_cartpole  A2C_1   500  40.090908
4   A2C_cartpole  A2C_1  1000  45.900002
5   A2C_cartpole  A2C_1  1500  50.724136
6   A2C_cartpole  A2C_1  2000  53.567566
7   A2C_cartpole  A2C_1  2500  55.266666
8   A2C_cartpole  A2C_1  3000  58.666668
9   A2C_cartpole  A2C_1  3500  61.018520
10  A2C_cartpole  A2C_1  4000  68.589287
11  A2C_cartpole  A2C_1  4500  73.114754
12  A2C_cartpole  A2C_1  5000  74.424240
13  A2C_cartpole  A2C_2   500  23.619047
14  A2C_cartpole  A2C_2  1000  23.951220
15  A2C_cartpole  A2C_2  1500  27.865385
16  A2C_cartpole  A2C_2  2000  33.000000
17  A2C_cartpole  A2C_2  2500  38.140625
18  A2C_cartpole  A2C_2  3000  43.405796
19  A2C_cartpole  A2C_2  3500  45.890411
20  A2C_cartpole  A2C_2  4000  49.720001
21  A2C_cartpole  A2C_2  4500  56.139240
22  A2C_cartpole  A2C_2  5000  60.402439,

 'rollout/ep_rew_mean':             name n_simu     x          y
0   PPO_cartpole  ppo_1  2048  22.898876
1   PPO_cartpole  ppo_1  4096  26.700001
2   PPO_cartpole  ppo_1  6144  36.810001
3   A2C_cartpole  A2C_1   500  40.090908
4   A2C_cartpole  A2C_1  1000  45.900002
5   A2C_cartpole  A2C_1  1500  50.724136
6   A2C_cartpole  A2C_1  2000  53.567566
7   A2C_cartpole  A2C_1  2500  55.266666
8   A2C_cartpole  A2C_1  3000  58.666668
9   A2C_cartpole  A2C_1  3500  61.018520
10  A2C_cartpole  A2C_1  4000  68.589287
11  A2C_cartpole  A2C_1  4500  73.114754
12  A2C_cartpole  A2C_1  5000  74.424240
13  A2C_cartpole  A2C_2   500  23.619047
14  A2C_cartpole  A2C_2  1000  23.951220
15  A2C_cartpole  A2C_2  1500  27.865385
16  A2C_cartpole  A2C_2  2000  33.000000
17  A2C_cartpole  A2C_2  2500  38.140625
18  A2C_cartpole  A2C_2  3000  43.405796
19  A2C_cartpole  A2C_2  3500  45.890411
20  A2C_cartpole  A2C_2  4000  49.720001
21  A2C_cartpole  A2C_2  4500  56.139240
22  A2C_cartpole  A2C_2  5000  60.402439,

 'time/fps':             name n_simu     x       y
0   PPO_cartpole  ppo_1  2048  3431.0
1   PPO_cartpole  ppo_1  4096  2396.0
2   PPO_cartpole  ppo_1  6144  2156.0
3   A2C_cartpole  A2C_1   500  1595.0
4   A2C_cartpole  A2C_1  1000  1614.0
5   A2C_cartpole  A2C_1  1500  1568.0
6   A2C_cartpole  A2C_1  2000  1553.0
7   A2C_cartpole  A2C_1  2500  1547.0
8   A2C_cartpole  A2C_1  3000  1530.0
9   A2C_cartpole  A2C_1  3500  1548.0
10  A2C_cartpole  A2C_1  4000  1558.0
11  A2C_cartpole  A2C_1  4500  1551.0
12  A2C_cartpole  A2C_1  5000  1556.0
13  A2C_cartpole  A2C_2   500  1628.0
14  A2C_cartpole  A2C_2  1000  1644.0
15  A2C_cartpole  A2C_2  1500  1561.0
16  A2C_cartpole  A2C_2  2000  1539.0
17  A2C_cartpole  A2C_2  2500  1547.0
18  A2C_cartpole  A2C_2  3000  1562.0
19  A2C_cartpole  A2C_2  3500  1572.0
20  A2C_cartpole  A2C_2  4000  1576.0
21  A2C_cartpole  A2C_2  4500  1586.0
22  A2C_cartpole  A2C_2  5000  1594.0,

 'train/approx_kl':            name n_simu     x         y
0  PPO_cartpole  ppo_1  4096  0.009280
1  PPO_cartpole  ppo_1  6144  0.009204,

 'train/clip_fraction':            name n_simu     x         y
0  PPO_cartpole  ppo_1  4096  0.128174
1  PPO_cartpole  ppo_1  6144  0.057813,

 'train/clip_range':            name n_simu     x    y
0  PPO_cartpole  ppo_1  4096  0.2
1  PPO_cartpole  ppo_1  6144  0.2,

 'train/entropy_loss':             name n_simu     x         y
0   PPO_cartpole  ppo_1  4096 -0.685331
1   PPO_cartpole  ppo_1  6144 -0.659614
2   A2C_cartpole  A2C_1   500 -0.615525
3   A2C_cartpole  A2C_1  1000 -0.484166
4   A2C_cartpole  A2C_1  1500 -0.565144
5   A2C_cartpole  A2C_1  2000 -0.511171
6   A2C_cartpole  A2C_1  2500 -0.551776
7   A2C_cartpole  A2C_1  3000 -0.503026
8   A2C_cartpole  A2C_1  3500 -0.617282
9   A2C_cartpole  A2C_1  4000 -0.261234
10  A2C_cartpole  A2C_1  4500 -0.417461
11  A2C_cartpole  A2C_1  5000 -0.633000
12  A2C_cartpole  A2C_2   500 -0.692809
13  A2C_cartpole  A2C_2  1000 -0.684999
14  A2C_cartpole  A2C_2  1500 -0.649449
15  A2C_cartpole  A2C_2  2000 -0.642278
16  A2C_cartpole  A2C_2  2500 -0.592125
17  A2C_cartpole  A2C_2  3000 -0.301056
18  A2C_cartpole  A2C_2  3500 -0.640023
19  A2C_cartpole  A2C_2  4000 -0.512887
20  A2C_cartpole  A2C_2  4500 -0.432308
21  A2C_cartpole  A2C_2  5000 -0.492796,

 'train/explained_variance':             name n_simu     x         y
0   PPO_cartpole  ppo_1  4096 -0.005981
1   PPO_cartpole  ppo_1  6144  0.095037
2   A2C_cartpole  A2C_1   500 -0.060004
3   A2C_cartpole  A2C_1  1000 -0.009993
4   A2C_cartpole  A2C_1  1500 -0.021823
5   A2C_cartpole  A2C_1  2000  0.001556
6   A2C_cartpole  A2C_1  2500 -0.003476
7   A2C_cartpole  A2C_1  3000  0.006280
8   A2C_cartpole  A2C_1  3500  0.001778
9   A2C_cartpole  A2C_1  4000  0.005313
10  A2C_cartpole  A2C_1  4500  0.002912
11  A2C_cartpole  A2C_1  5000  0.001874
12  A2C_cartpole  A2C_2   500  0.111738
13  A2C_cartpole  A2C_2  1000  0.078319
14  A2C_cartpole  A2C_2  1500  0.000760
15  A2C_cartpole  A2C_2  2000  0.009839
16  A2C_cartpole  A2C_2  2500  0.008209
17  A2C_cartpole  A2C_2  3000 -0.000845
18  A2C_cartpole  A2C_2  3500 -0.000841
19  A2C_cartpole  A2C_2  4000  0.000686
20  A2C_cartpole  A2C_2  4500  0.001162
21  A2C_cartpole  A2C_2  5000  0.000076,

 'train/learning_rate':             name n_simu     x       y
0   PPO_cartpole  ppo_1  4096  0.0003
1   PPO_cartpole  ppo_1  6144  0.0003
2   A2C_cartpole  A2C_1   500  0.0007
3   A2C_cartpole  A2C_1  1000  0.0007
4   A2C_cartpole  A2C_1  1500  0.0007
5   A2C_cartpole  A2C_1  2000  0.0007
6   A2C_cartpole  A2C_1  2500  0.0007
7   A2C_cartpole  A2C_1  3000  0.0007
8   A2C_cartpole  A2C_1  3500  0.0007
9   A2C_cartpole  A2C_1  4000  0.0007
10  A2C_cartpole  A2C_1  4500  0.0007
11  A2C_cartpole  A2C_1  5000  0.0007
12  A2C_cartpole  A2C_2   500  0.0007
13  A2C_cartpole  A2C_2  1000  0.0007
14  A2C_cartpole  A2C_2  1500  0.0007
15  A2C_cartpole  A2C_2  2000  0.0007
16  A2C_cartpole  A2C_2  2500  0.0007
17  A2C_cartpole  A2C_2  3000  0.0007
18  A2C_cartpole  A2C_2  3500  0.0007
19  A2C_cartpole  A2C_2  4000  0.0007
20  A2C_cartpole  A2C_2  4500  0.0007
21  A2C_cartpole  A2C_2  5000  0.0007,

 'train/loss':            name n_simu     x          y
0  PPO_cartpole  ppo_1  4096   6.982748
1  PPO_cartpole  ppo_1  6144  13.480467,

 'train/policy_gradient_loss':            name n_simu     x         y
0  PPO_cartpole  ppo_1  4096 -0.022298
1  PPO_cartpole  ppo_1  6144 -0.016617,

 'train/value_loss':             name n_simu     x            y
0   PPO_cartpole  ppo_1  4096    54.930149
1   PPO_cartpole  ppo_1  6144    32.751965
2   A2C_cartpole  A2C_1   500     9.222057
3   A2C_cartpole  A2C_1  1000     7.639998
4   A2C_cartpole  A2C_1  1500     6.368935
5   A2C_cartpole  A2C_1  2000     5.560571
6   A2C_cartpole  A2C_1  2500     5.007382
7   A2C_cartpole  A2C_1  3000   469.051453
8   A2C_cartpole  A2C_1  3500     3.818318
9   A2C_cartpole  A2C_1  4000     3.285388
10  A2C_cartpole  A2C_1  4500     2.823058
11  A2C_cartpole  A2C_1  5000     2.386893
12  A2C_cartpole  A2C_2   500     8.672586
13  A2C_cartpole  A2C_2  1000     6.938823
14  A2C_cartpole  A2C_2  1500     6.459139
15  A2C_cartpole  A2C_2  2000     5.905715
16  A2C_cartpole  A2C_2  2500     5.079061
17  A2C_cartpole  A2C_2  3000  1009.296082
18  A2C_cartpole  A2C_2  3500     3.968157
19  A2C_cartpole  A2C_2  4000     3.429344
20  A2C_cartpole  A2C_2  4500     2.945411
21  A2C_cartpole  A2C_2  5000     2.487410,

 'train/policy_loss':             name n_simu     x          y
0   A2C_cartpole  A2C_1   500   1.682467
1   A2C_cartpole  A2C_1  1000   1.788085
2   A2C_cartpole  A2C_1  1500   0.925050
3   A2C_cartpole  A2C_1  2000   0.615906
4   A2C_cartpole  A2C_1  2500   0.801314
5   A2C_cartpole  A2C_1  3000  -2.096942
6   A2C_cartpole  A2C_1  3500   1.006535
7   A2C_cartpole  A2C_1  4000   1.268059
8   A2C_cartpole  A2C_1  4500   0.521781
9   A2C_cartpole  A2C_1  5000   0.593369
10  A2C_cartpole  A2C_2   500   1.878575
11  A2C_cartpole  A2C_2  1000   1.407964
12  A2C_cartpole  A2C_2  1500   1.321871
13  A2C_cartpole  A2C_2  2000   1.198855
14  A2C_cartpole  A2C_2  2500   0.724112
15  A2C_cartpole  A2C_2  3000 -24.444633
16  A2C_cartpole  A2C_2  3500   0.851452
17  A2C_cartpole  A2C_2  4000   1.169502
18  A2C_cartpole  A2C_2  4500   1.198329
19  A2C_cartpole  A2C_2  5000   0.700427}
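
If a single table is more convenient than one dataframe per tag, the Dict can be stacked with pandas. The sketch below works on a hand-made, shortened stand-in for the Dict shown above (illustrative values only). Renaming "x" and "y" to "global_step" and "value" is a choice made here to mirror the writer-data column names; it is not something tensorboard_to_dataframe does itself.

```python
import pandas as pd

# Hand-made stand-in for the Dict returned by tensorboard_to_dataframe
# (same keys and columns as above, shortened, with illustrative values).
data_in_dataframe = {
    "rollout/ep_rew_mean": pd.DataFrame(
        {
            "name": ["PPO_cartpole", "A2C_cartpole", "A2C_cartpole"],
            "n_simu": ["ppo_1", "A2C_1", "A2C_2"],
            "x": [2048, 500, 500],
            "y": [22.9, 40.1, 23.6],
        }
    ),
    "time/fps": pd.DataFrame(
        {
            "name": ["PPO_cartpole"],
            "n_simu": ["ppo_1"],
            "x": [2048],
            "y": [3431.0],
        }
    ),
}

# Stack all tags into one long table; the Dict keys become a "tag" column.
long_df = (
    pd.concat(data_in_dataframe, names=["tag", "row"])
    .reset_index(level="tag")
    .reset_index(drop=True)
    .rename(columns={"x": "global_step", "y": "value"})
)

print(long_df)
```

The resulting table has one row per logged scalar, which makes it easy to filter by tag, name or n_simu with the usual pandas operations.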

Option 2: via a Dict

tensorboard_to_dataframe also accepts a Dict as input. Its keys must be the algo_name, and its values a list of paths (each one pointing to an events.out.tfevents file). The position of a path in the list is used as its n_simu.

# creating the dict
import os

folder_ppo_1 = str(path_ppo + "ppo_1/")
folder_A2C_1 = str(path_a2c + "A2C_1/")
folder_A2C_2 = str(path_a2c + "A2C_2/")

path_event_ppo_1 = str(folder_ppo_1 + os.listdir(folder_ppo_1)[0])
path_event_A2C_1 = str(folder_A2C_1 + os.listdir(folder_A2C_1)[0])
path_event_A2C_2 = str(folder_A2C_2 + os.listdir(folder_A2C_2)[0])

input_dict = {
    "ppo_cartpole_tensorboard": [path_event_ppo_1],
    "a2c_cartpole_tensorboard": [path_event_A2C_1, path_event_A2C_2],
}


# same function
data_in_dataframe2 = tensorboard_to_dataframe(input_dict)

# same results
print(data_in_dataframe2.keys())
print("-----------")
print(data_in_dataframe2)

dict_keys(['rollout/ep_len_mean', 'rollout/ep_rew_mean', 'time/fps', 'train/approx_kl', 'train/clip_fraction', 'train/clip_range', 'train/entropy_loss', 'train/explained_variance', 'train/learning_rate', 'train/loss', 'train/policy_gradient_loss', 'train/value_loss', 'train/policy_loss'])
-----------
{'rollout/ep_len_mean':                         name  n_simu     x          y
0   ppo_cartpole_tensorboard       0  2048  22.898876
1   ppo_cartpole_tensorboard       0  4096  26.700001
2   ppo_cartpole_tensorboard       0  6144  36.810001
3   a2c_cartpole_tensorboard       0   500  40.090908
4   a2c_cartpole_tensorboard       0  1000  45.900002
5   a2c_cartpole_tensorboard       0  1500  50.724136
6   a2c_cartpole_tensorboard       0  2000  53.567566
7   a2c_cartpole_tensorboard       0  2500  55.266666
8   a2c_cartpole_tensorboard       0  3000  58.666668
9   a2c_cartpole_tensorboard       0  3500  61.018520
10  a2c_cartpole_tensorboard       0  4000  68.589287
11  a2c_cartpole_tensorboard       0  4500  73.114754
12  a2c_cartpole_tensorboard       0  5000  74.424240
13  a2c_cartpole_tensorboard       1   500  23.619047
14  a2c_cartpole_tensorboard       1  1000  23.951220
15  a2c_cartpole_tensorboard       1  1500  27.865385
16  a2c_cartpole_tensorboard       1  2000  33.000000
17  a2c_cartpole_tensorboard       1  2500  38.140625
18  a2c_cartpole_tensorboard       1  3000  43.405796
19  a2c_cartpole_tensorboard       1  3500  45.890411
20  a2c_cartpole_tensorboard       1  4000  49.720001
21  a2c_cartpole_tensorboard       1  4500  56.139240
22  a2c_cartpole_tensorboard       1  5000  60.402439,

 'rollout/ep_rew_mean':                         name  n_simu     x          y
0   ppo_cartpole_tensorboard       0  2048  22.898876
1   ppo_cartpole_tensorboard       0  4096  26.700001
2   ppo_cartpole_tensorboard       0  6144  36.810001
3   a2c_cartpole_tensorboard       0   500  40.090908
4   a2c_cartpole_tensorboard       0  1000  45.900002
5   a2c_cartpole_tensorboard       0  1500  50.724136
6   a2c_cartpole_tensorboard       0  2000  53.567566
7   a2c_cartpole_tensorboard       0  2500  55.266666
8   a2c_cartpole_tensorboard       0  3000  58.666668
9   a2c_cartpole_tensorboard       0  3500  61.018520
10  a2c_cartpole_tensorboard       0  4000  68.589287
11  a2c_cartpole_tensorboard       0  4500  73.114754
12  a2c_cartpole_tensorboard       0  5000  74.424240
13  a2c_cartpole_tensorboard       1   500  23.619047
14  a2c_cartpole_tensorboard       1  1000  23.951220
15  a2c_cartpole_tensorboard       1  1500  27.865385
16  a2c_cartpole_tensorboard       1  2000  33.000000
17  a2c_cartpole_tensorboard       1  2500  38.140625
18  a2c_cartpole_tensorboard       1  3000  43.405796
19  a2c_cartpole_tensorboard       1  3500  45.890411
20  a2c_cartpole_tensorboard       1  4000  49.720001
21  a2c_cartpole_tensorboard       1  4500  56.139240
22  a2c_cartpole_tensorboard       1  5000  60.402439,

 'time/fps':                         name  n_simu     x       y
0   ppo_cartpole_tensorboard       0  2048  3431.0
1   ppo_cartpole_tensorboard       0  4096  2396.0
2   ppo_cartpole_tensorboard       0  6144  2156.0
3   a2c_cartpole_tensorboard       0   500  1595.0
4   a2c_cartpole_tensorboard       0  1000  1614.0
5   a2c_cartpole_tensorboard       0  1500  1568.0
6   a2c_cartpole_tensorboard       0  2000  1553.0
7   a2c_cartpole_tensorboard       0  2500  1547.0
8   a2c_cartpole_tensorboard       0  3000  1530.0
9   a2c_cartpole_tensorboard       0  3500  1548.0
10  a2c_cartpole_tensorboard       0  4000  1558.0
11  a2c_cartpole_tensorboard       0  4500  1551.0
12  a2c_cartpole_tensorboard       0  5000  1556.0
13  a2c_cartpole_tensorboard       1   500  1628.0
14  a2c_cartpole_tensorboard       1  1000  1644.0
15  a2c_cartpole_tensorboard       1  1500  1561.0
16  a2c_cartpole_tensorboard       1  2000  1539.0
17  a2c_cartpole_tensorboard       1  2500  1547.0
18  a2c_cartpole_tensorboard       1  3000  1562.0
19  a2c_cartpole_tensorboard       1  3500  1572.0
20  a2c_cartpole_tensorboard       1  4000  1576.0
21  a2c_cartpole_tensorboard       1  4500  1586.0
22  a2c_cartpole_tensorboard       1  5000  1594.0,

 'train/approx_kl':                        name  n_simu     x         y
0  ppo_cartpole_tensorboard       0  4096  0.009280
1  ppo_cartpole_tensorboard       0  6144  0.009204,

 'train/clip_fraction':                        name  n_simu     x         y
0  ppo_cartpole_tensorboard       0  4096  0.128174
1  ppo_cartpole_tensorboard       0  6144  0.057813,

 'train/clip_range':                        name  n_simu     x    y
0  ppo_cartpole_tensorboard       0  4096  0.2
1  ppo_cartpole_tensorboard       0  6144  0.2,

 'train/entropy_loss':                         name  n_simu     x         y
0   ppo_cartpole_tensorboard       0  4096 -0.685331
1   ppo_cartpole_tensorboard       0  6144 -0.659614
2   a2c_cartpole_tensorboard       0   500 -0.615525
3   a2c_cartpole_tensorboard       0  1000 -0.484166
4   a2c_cartpole_tensorboard       0  1500 -0.565144
5   a2c_cartpole_tensorboard       0  2000 -0.511171
6   a2c_cartpole_tensorboard       0  2500 -0.551776
7   a2c_cartpole_tensorboard       0  3000 -0.503026
8   a2c_cartpole_tensorboard       0  3500 -0.617282
9   a2c_cartpole_tensorboard       0  4000 -0.261234
10  a2c_cartpole_tensorboard       0  4500 -0.417461
11  a2c_cartpole_tensorboard       0  5000 -0.633000
12  a2c_cartpole_tensorboard       1   500 -0.692809
13  a2c_cartpole_tensorboard       1  1000 -0.684999
14  a2c_cartpole_tensorboard       1  1500 -0.649449
15  a2c_cartpole_tensorboard       1  2000 -0.642278
16  a2c_cartpole_tensorboard       1  2500 -0.592125
17  a2c_cartpole_tensorboard       1  3000 -0.301056
18  a2c_cartpole_tensorboard       1  3500 -0.640023
19  a2c_cartpole_tensorboard       1  4000 -0.512887
20  a2c_cartpole_tensorboard       1  4500 -0.432308
21  a2c_cartpole_tensorboard       1  5000 -0.492796,

 'train/explained_variance':                         name  n_simu     x         y
0   ppo_cartpole_tensorboard       0  4096 -0.005981
1   ppo_cartpole_tensorboard       0  6144  0.095037
2   a2c_cartpole_tensorboard       0   500 -0.060004
3   a2c_cartpole_tensorboard       0  1000 -0.009993
4   a2c_cartpole_tensorboard       0  1500 -0.021823
5   a2c_cartpole_tensorboard       0  2000  0.001556
6   a2c_cartpole_tensorboard       0  2500 -0.003476
7   a2c_cartpole_tensorboard       0  3000  0.006280
8   a2c_cartpole_tensorboard       0  3500  0.001778
9   a2c_cartpole_tensorboard       0  4000  0.005313
10  a2c_cartpole_tensorboard       0  4500  0.002912
11  a2c_cartpole_tensorboard       0  5000  0.001874
12  a2c_cartpole_tensorboard       1   500  0.111738
13  a2c_cartpole_tensorboard       1  1000  0.078319
14  a2c_cartpole_tensorboard       1  1500  0.000760
15  a2c_cartpole_tensorboard       1  2000  0.009839
16  a2c_cartpole_tensorboard       1  2500  0.008209
17  a2c_cartpole_tensorboard       1  3000 -0.000845
18  a2c_cartpole_tensorboard       1  3500 -0.000841
19  a2c_cartpole_tensorboard       1  4000  0.000686
20  a2c_cartpole_tensorboard       1  4500  0.001162
21  a2c_cartpole_tensorboard       1  5000  0.000076,

 'train/learning_rate':                         name  n_simu     x       y
0   ppo_cartpole_tensorboard       0  4096  0.0003
1   ppo_cartpole_tensorboard       0  6144  0.0003
2   a2c_cartpole_tensorboard       0   500  0.0007
3   a2c_cartpole_tensorboard       0  1000  0.0007
4   a2c_cartpole_tensorboard       0  1500  0.0007
5   a2c_cartpole_tensorboard       0  2000  0.0007
6   a2c_cartpole_tensorboard       0  2500  0.0007
7   a2c_cartpole_tensorboard       0  3000  0.0007
8   a2c_cartpole_tensorboard       0  3500  0.0007
9   a2c_cartpole_tensorboard       0  4000  0.0007
10  a2c_cartpole_tensorboard       0  4500  0.0007
11  a2c_cartpole_tensorboard       0  5000  0.0007
12  a2c_cartpole_tensorboard       1   500  0.0007
13  a2c_cartpole_tensorboard       1  1000  0.0007
14  a2c_cartpole_tensorboard       1  1500  0.0007
15  a2c_cartpole_tensorboard       1  2000  0.0007
16  a2c_cartpole_tensorboard       1  2500  0.0007
17  a2c_cartpole_tensorboard       1  3000  0.0007
18  a2c_cartpole_tensorboard       1  3500  0.0007
19  a2c_cartpole_tensorboard       1  4000  0.0007
20  a2c_cartpole_tensorboard       1  4500  0.0007
21  a2c_cartpole_tensorboard       1  5000  0.0007,

 'train/loss':                        name  n_simu     x          y
0  ppo_cartpole_tensorboard       0  4096   6.982748
1  ppo_cartpole_tensorboard       0  6144  13.480467,

 'train/policy_gradient_loss':         name  n_simu     x         y
0  ppo_cartpole_tensorboard       0  4096 -0.022298
1  ppo_cartpole_tensorboard       0  6144 -0.016617,

 'train/value_loss':                   name  n_simu     x            y
0   ppo_cartpole_tensorboard       0  4096    54.930149
1   ppo_cartpole_tensorboard       0  6144    32.751965
2   a2c_cartpole_tensorboard       0   500     9.222057
3   a2c_cartpole_tensorboard       0  1000     7.639998
4   a2c_cartpole_tensorboard       0  1500     6.368935
5   a2c_cartpole_tensorboard       0  2000     5.560571
6   a2c_cartpole_tensorboard       0  2500     5.007382
7   a2c_cartpole_tensorboard       0  3000   469.051453
8   a2c_cartpole_tensorboard       0  3500     3.818318
9   a2c_cartpole_tensorboard       0  4000     3.285388
10  a2c_cartpole_tensorboard       0  4500     2.823058
11  a2c_cartpole_tensorboard       0  5000     2.386893
12  a2c_cartpole_tensorboard       1   500     8.672586
13  a2c_cartpole_tensorboard       1  1000     6.938823
14  a2c_cartpole_tensorboard       1  1500     6.459139
15  a2c_cartpole_tensorboard       1  2000     5.905715
16  a2c_cartpole_tensorboard       1  2500     5.079061
17  a2c_cartpole_tensorboard       1  3000  1009.296082
18  a2c_cartpole_tensorboard       1  3500     3.968157
19  a2c_cartpole_tensorboard       1  4000     3.429344
20  a2c_cartpole_tensorboard       1  4500     2.945411
21  a2c_cartpole_tensorboard       1  5000     2.487410,

 'train/policy_loss':                         name  n_simu     x          y
0   a2c_cartpole_tensorboard       0   500   1.682467
1   a2c_cartpole_tensorboard       0  1000   1.788085
2   a2c_cartpole_tensorboard       0  1500   0.925050
3   a2c_cartpole_tensorboard       0  2000   0.615906
4   a2c_cartpole_tensorboard       0  2500   0.801314
5   a2c_cartpole_tensorboard       0  3000  -2.096942
6   a2c_cartpole_tensorboard       0  3500   1.006535
7   a2c_cartpole_tensorboard       0  4000   1.268059
8   a2c_cartpole_tensorboard       0  4500   0.521781
9   a2c_cartpole_tensorboard       0  5000   0.593369
10  a2c_cartpole_tensorboard       1   500   1.878575
11  a2c_cartpole_tensorboard       1  1000   1.407964
12  a2c_cartpole_tensorboard       1  1500   1.321871
13  a2c_cartpole_tensorboard       1  2000   1.198855
14  a2c_cartpole_tensorboard       1  2500   0.724112
15  a2c_cartpole_tensorboard       1  3000 -24.444633
16  a2c_cartpole_tensorboard       1  3500   0.851452
17  a2c_cartpole_tensorboard       1  4000   1.169502
18  a2c_cartpole_tensorboard       1  4500   1.198329
19  a2c_cartpole_tensorboard       1  5000   0.700427}
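
Each entry of the dictionary printed above is a regular pandas DataFrame with the columns name, n_simu, x and y, so it can be filtered, aggregated or exported like any other DataFrame. Note that some tags only hold rows for one algorithm ('train/loss' only for PPO, 'train/policy_loss' only for A2C). The following is a minimal sketch of such post-processing, not part of rlberry itself: the variable name `tb_frames` is hypothetical, and the dictionary is rebuilt by hand from a few of the 'train/value_loss' rows shown above so that the example runs on its own.

```python
import pandas as pd

# Hand-built stand-in for the dictionary printed above (assumption: in practice
# you would use the dictionary returned by rlberry instead of rebuilding it).
# The values are copied from the 'train/value_loss' rows of the output.
tb_frames = {
    "train/value_loss": pd.DataFrame(
        {
            "name": ["ppo_cartpole_tensorboard"] * 2
            + ["a2c_cartpole_tensorboard"] * 4,
            "n_simu": [0, 0, 0, 0, 1, 1],
            "x": [4096, 6144, 500, 1000, 500, 1000],
            "y": [54.930149, 32.751965, 9.222057, 7.639998, 8.672586, 6.938823],
        }
    )
}

# Pick one tag, then keep only the rows of one experiment.
value_loss = tb_frames["train/value_loss"]
a2c = value_loss[value_loss["name"] == "a2c_cartpole_tensorboard"]

# Average over the runs (n_simu) at each training step x.
mean_per_step = a2c.groupby("x")["y"].mean()
print(mean_per_step)

# Export the full table as CSV text, e.g. to feed another plotting tool.
csv_text = value_loss.to_csv(index=False)
```

Grouping by x works here because both A2C runs log at the same steps. If the runs logged at different steps, the values would need to be interpolated onto a common grid before averaging.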