A demo of the PPO algorithm in the PBall2D environment

Illustration of how to set up a PPO agent in rlberry. The environment chosen here is PBall2D.
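PPO updates its policy by maximizing a clipped surrogate objective, the mean over samples of min(r * A, clip(r, 1 - eps, 1 + eps) * A). Here r is the probability ratio between the new and old policies and A is the advantage estimate. The sketch below illustrates that standard objective without any deep learning framework. It is not rlberry's internal implementation. The function name `ppo_clip_loss` and the default `clip_eps=0.2` are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Hypothetical helper: PPO clipped surrogate loss (negated so it can be minimized)."""
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if len(advantages) == 0:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)

    # The objective is maximized; return its negative mean as a loss.
    return -total / len(advantages)


if __name__ == "__main__":
    # Ratio 2 with a positive advantage: the gain is capped at 1 + clip_eps = 1.2.
    print(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]))
```

Clipping removes the incentive to push the ratio far from 1 in a single update. With a positive advantage, the objective stops improving once r exceeds 1 + eps. With a negative advantage, it stops improving once r drops below 1 - eps.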

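from rlberry_research.agents.torch import PPOAgent
from rlberry_research.envs.benchmarks.ball_exploration import PBall2D

# Create the environment and set the training budget (number of environment steps).
env = PBall2D()
n_steps = int(3e3)

# Create the PPO agent and train it; without this call the policy stays untrained.
agent = PPOAgent(env)
agent.fit(budget=n_steps)

# Turn on rendering so that frames are recorded during the evaluation rollout.
env.enable_rendering()
observation, info = env.reset()
for tt in range(200):
    action = agent.policy(observation)
    observation, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    if done:
        break

# Save the recorded frames as an MP4 video.
video = env.save_video("_video/video_plot_ppo.mp4")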
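The evaluation loop in the script steps the environment using the trained policy. A reusable variant can return the collected rewards and stop as soon as an episode terminates or is truncated. A minimal sketch follows, assuming only the Gymnasium-style `reset()` and `step()` return values that the script already relies on. `rollout` and the toy `CountdownEnv` are hypothetical names introduced for illustration and are not part of rlberry.

```python
from typing import Any, Callable, List


class CountdownEnv:
    """Hypothetical toy environment: the episode terminates after `length` steps."""

    def __init__(self, length: int = 5) -> None:
        self.length = length
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        reward = float(action)
        terminated = self.t >= self.length
        truncated = False
        return self.t, reward, terminated, truncated, {}


def rollout(env: Any, policy: Callable[[Any], Any], max_steps: int = 200) -> List[float]:
    """Hypothetical helper: run one episode with `policy` and return its rewards."""
    observation, info = env.reset()
    rewards: List[float] = []
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        rewards.append(reward)
        # Stop at the end of the episode instead of stepping a finished environment.
        if terminated or truncated:
            break
    return rewards


if __name__ == "__main__":
    print(rollout(CountdownEnv(length=5), policy=lambda obs: 1))
```

With the objects from the script, the helper would be called as `rollout(env, agent.policy)`. The `max_steps` cap keeps the rollout bounded for environments that never signal termination.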
