Gallery of examples
Compare PPO and A2C on Acrobot with AdaStop
Record reward during training and then plot it
Illustration of plotting tools on Bandits
Illustration of rlberry environments
A demo of MountainCar environment
A demo of AppleGold environment
A demo of Acrobot environment with RSUCBVIAgent
A demo of Gridworld environment with ValueIterationAgent
A demo of TwinRooms environment
A demo of OldGymCompatibilityWrapper with old_Acrobot environment
A demo of SpringCartPole environment with DQNAgent
A demo of ATARI Freeway environment with DQNAgent
A demo of ATARI Atlantis environment with vectorized PPOAgent
A demo of ATARI Breakout environment with vectorized PPOAgent
Illustration of rlberry agents
A demo of PPO algorithm in PBall2D environment
A demo of ValueIteration algorithm in Chain environment
A demo of RSUCBVI algorithm in MountainCar environment
A demo of A2C algorithm in PBall2D environment
A demo of MBQVI algorithm in Gridworld environment
A demo of RSKernelUCBVIAgent algorithm in Acrobot environment
A demo of DQN algorithm in CartPole environment
A demo of M-DQN algorithm in CartPole environment
Illustration of bandits in rlberry
Comparison of Thompson sampling and UCB on Bernoulli and Gaussian bandits
A demo of Bandit BAI on a real dataset to select mirrors
Comparison subplots of various index-based bandit algorithms