Gallery of examples
Compare PPO and A2C on Acrobot with AdaStop
Record reward during training and then plot it
Using multiple virtual environments with rlberry
Illustration of plotting tools on Bandits
Illustration of rlberry environments
A demo of MountainCar environment
A demo of AppleGold environment
A demo of Acrobot environment with RSUCBVIAgent
A demo of Gridworld environment with ValueIterationAgent
A demo of TwinRooms environment
A demo of OldGymCompatibilityWrapper with old_Acrobot environment
A demo of SpringCartPole environment with DQNAgent
A demo of ATARI Freeway environment with DQNAgent
A demo of ATARI Atlantis environment with vectorized PPOAgent
A demo of ATARI Breakout environment with vectorized PPOAgent
Illustration of rlberry agents
A demo of PPO algorithm in PBall2D environment
A demo of ValueIteration algorithm in Chain environment
A demo of RSUCBVI algorithm in MountainCar environment
A demo of A2C algorithm in PBall2D environment
A demo of MBQVI algorithm in Gridworld environment
A demo of RSKernelUCBVIAgent algorithm in Acrobot environment
A demo of DQN algorithm in CartPole environment
A demo of M-DQN algorithm in CartPole environment
Illustration of bandits in rlberry
Comparison of Thompson sampling and UCB on Bernoulli and Gaussian bandits
Comparison subplots of various index-based bandit algorithms
A demo of Bandit BAI on a real dataset to select mirrors