Gallery of examples

Plot kernel functions

Compare PPO and A2C on Acrobot with AdaStop

Record reward during training and then plot it

Compare Bandit Algorithms

Using multiple virtual environments with rlberry

Illustration of plotting tools on Bandits

A demo of Experiment Manager

Checkpointing

Illustration of rlberry environments

A demo of Chain environment

A demo of MountainCar environment

A demo of AppleGold environment

A demo of Acrobot environment with RSUCBVIAgent

A demo of Gridworld environment with ValueIterationAgent

A demo of twinrooms environment

A demo of OldGymCompatibilityWrapper with old_Acrobot environment

A demo of rooms environment

A demo of PBall2D environment

A demo of SpringCartPole environment with DQNAgent

A demo of ATARI Freeway environment with DQNAgent

A demo of ATARI Atlantis environment with vectorized PPOAgent

A demo of ATARI Breakout environment with vectorized PPOAgent

Illustration of rlberry agents

A demo of PPO algorithm in PBall2D environment

A demo of ValueIteration algorithm in Chain environment

A demo of RSUCBVI algorithm in MountainCar environment

A demo of A2C algorithm in PBall2D environment

SAC Soft Actor-Critic

A demo of MBQVI algorithm in Gridworld environment

A demo of RSKernelUCBVIAgent algorithm in Acrobot environment

A demo of DQN algorithm in CartPole environment

A demo of M-DQN algorithm in CartPole environment

Illustration of bandits in rlberry

UCB Bandit cumulative regret

EXP3 Bandit cumulative regret

Comparison of Thompson sampling and UCB on Bernoulli and Gaussian bandits

Comparison subplots of various index-based bandit algorithms

A demo of Bandit BAI on a real dataset to select mirrors

Gallery generated by Sphinx-Gallery