Changelog

Dev version

Nothing yet.

Version 0.7.3

PR #454

  • Remove unused libraries

PR #451

  • Moving UCBVI to rlberry_scool

PR #438

  • Move long tests to rlberry-research

PR #436 #444 #445 #447 #448 #455 #456

  • Update user guide

  • Add tests on the user guide examples

  • Remove rlberry_research references as much as possible (docs and code)

Version 0.7.1

PR #411

  • Moving “rendering” to rlberry

PR #405 #406 #408

  • fix plots

PR #404

  • add AdaStop

Version 0.7.0

PR #397

  • Automatic save after fit() in ExperimentManager

PR #396

  • Improve coverage and fix version workflow

PR #385 to #390

  • Switch from Read the Docs to GitHub Pages

PR #382

  • Switch to Poetry

PR #379

  • rlberry: everything for RL that is not an agent or an environment, e.g. experiment management, parallelization, statistical tools, plotting…

  • rlberry-scool: repository for teaching materials, e.g. simplified algorithms for teaching and tutorial notebooks for learning RL…

  • rlberry-research: repository of agents and environments used inside the Inria Scool team

PR #376

  • New plot_writer_data function that does not depend on seaborn and that can plot smoothed curves and confidence bands if scikit-fda is installed.
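
  • A minimal usage sketch (the agent class, its import path, the environment id and the "episode_rewards" tag are illustrative assumptions, not part of this entry):

      from rlberry.envs import gym_make
      from rlberry.manager import ExperimentManager, plot_writer_data
      from rlberry_scool.agents import UCBVIAgent  # assumed location after the package split

      # Fit a small experiment, then plot a writer tag without seaborn.
      xp = ExperimentManager(
          UCBVIAgent,
          (gym_make, dict(id="FrozenLake-v1")),
          fit_budget=100,
          n_fit=2,
      )
      xp.fit()
      # Smoothed curves and confidence bands are drawn only if scikit-fda is installed.
      plot_writer_data(xp, tag="episode_rewards")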

Version 0.6.0

PR #276

  • Non-adaptive multiple tests for agent comparison.

PR #365

  • Fix Sphinx version to <7.

PR #350

  • Rename AgentManager to ExperimentManager.

PR #326

  • Moved SAC from experimental to torch agents. Tested and benchmarked.

PR #335

  • Upgrade from Python 3.9 to Python 3.10

Version 0.5.0

PR #281, #323

  • Merge gymnasium branch into main, make gymnasium the default library for environments in rlberry.

Version 0.4.1

PR #318

  • Update to allow training on a computer with a GPU, saving the agents, then loading them on a computer without a GPU.

PR #308

  • Update make_atari_env and PPO to be compatible with each other and to use vectorized environments (PPO manages the vectorization)

PR #298

  • Move old scripts (jax agents, attention networks, old examples…) that we won’t maintain from the main branch to an archive branch.

PR #277

  • Add and update code to use “Atari games” environments

PR #281

  • New branch for code compatible with Gymnasium

Version 0.4.0

PR #273

  • Change the default behavior of plot_writer_data so that, if the installed seaborn version is >= 0.12.0, a 90% percentile interval is used instead of the standard deviation.

PR #269

PR #262

  • PPO can now handle continuous actions.

PR #261, #264

  • Implementation of Munchausen DQN in rlberry.agents.torch.MDQNAgent.

  • Comparison of MDQN with DQN agent in the long tests.

PR #244, #250, #253

  • Compress the pickles used to save the trained agents.

PR #235

  • Implementation of rlberry.envs.SpringCartPole environment, an RL environment featuring two cartpoles linked by a spring.

PR #226, #227

  • Improve logging; the logging level can now be changed with rlberry.utils.logging.set_level() (see the sketch below).

  • Introduce smoothing in the curves produced by plot_writer_data when only one seed is used.
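
  • A minimal sketch of the new logging control (the "DEBUG" level string is an assumption about the accepted values):

      from rlberry.utils.logging import set_level

      # Raise rlberry's logging verbosity; pass a standard logging level name.
      set_level("DEBUG")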

PR #223

  • Moved PPO from experimental to torch agents. Tested and benchmarked.

Version 0.3.0

PR #206

  • Creation of a Deep RL tutorial in the user guide.

PR #132

  • New tracker class rlberry.agents.bandit.tools.BanditTracker to track statistics used in bandit algorithms.

PR #191

  • Possibility to generate a profile with rlberry.agents.manager.ExperimentManager.

PR #148, #161, #180

PR #119

  • Improving documentation for agents.torch.utils

  • New replay buffer rlberry.agents.utils.replay.ReplayBuffer, aiming to replace code in utils/memories.py

  • New DQN implementation, aiming to fix reproducibility and compatibility issues.

  • Implements Q(lambda) in DQN Agent.

Feb 22, 2022 (PR #126)

  • Setup rlberry.__version__ (currently 0.3.0dev0)

  • Record the rlberry version in an ExperimentManager attribute, used when checking equality of ExperimentManagers.

  • Override __eq__ method of the ExperimentManager class.

Feb 14-15, 2022 (PR #97, #118)

  • (feat) Add basic bandit environments and agents. See IndexAgent and Bandit.

  • Thompson Sampling bandit algorithm with Gaussian or Beta prior.

  • Base class for bandit algorithms with custom save & load functions (called BanditWithSimplePolicy)

Feb 11, 2022 (#83, #95)

  • (fix) Fixed bug in FiniteMDP.sample(): terminal state was being checked with self.state instead of the given state

  • (feat) Option to use ‘fork’ or ‘spawn’ in ExperimentManager

  • (feat) ExperimentManager output_dir now has a timestamp and a short ID by default.

  • (feat) Gridworld can be constructed from a string layout (see the sketch below)

  • (feat) max_workers argument for ExperimentManager to control the maximum number of processes/threads created by the fit() method.
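
  • A minimal sketch of the string-layout constructor; the from_layout name and the symbol legend are assumptions to check against the GridWorld docstring, and GridWorld has since moved to rlberry-research:

      from rlberry.envs import GridWorld

      # Assumed legend: '#' walls, 'I' initial state, 'r' rewarding state, 'O'/space empty cells.
      layout = "IOOOO # OOOOO\nOOOOO # OOOOO\nOOOOO O OOOOr"
      env = GridWorld.from_layout(layout)
      env.reset()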

Feb 04, 2022

Version 0.2.1

  • Agent and ExperimentManager both have a unique_id attribute (useful for creating unique output files/directories).

  • DefaultWriter is now initialized in base class Agent and (optionally) wraps a tensorboard SummaryWriter.

  • ExperimentManager has an option enable_tensorboard that activates tensorboard logging in each of its Agents (with their writer attribute). The log_dirs of tensorboard are automatically assigned by ExperimentManager.

  • RemoteExperimentManager receives the tensorboard data created on the server when the method get_writer_data() is called. This is done via a zip file transfer through the network module.

  • BaseWrapper and gym_make now have an option wrap_spaces. If set to True, this option converts gym.spaces to rlberry.spaces, which provides classes with better seeding (using numpy’s default_rng instead of RandomState). See the sketch at the end of this list.

  • ExperimentManager: new method get_agent_instances() that returns trained instances

  • plot_writer_data: possibility to set xtag (tag used for x-axis)

  • Fixed agent initialization bug in AgentHandler (eval_env missing in kwargs for agent_class).
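
  • A minimal sketch of the wrap_spaces option (usage assumed from the description above):

      from rlberry.envs import gym_make

      # Ask gym_make to convert gym.spaces into rlberry.spaces at creation time.
      env = gym_make("CartPole-v1", wrap_spaces=True)
      # The wrapped spaces are seeded with numpy's default_rng instead of RandomState.
      print(type(env.observation_space), type(env.action_space))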

Version 0.2

  • AgentStats renamed to ExperimentManager.

  • ExperimentManager can handle agents that cannot be pickled.

  • Agent interface requires an eval() method instead of policy() to handle more general agents (e.g. reward-free agents, POMDPs, etc.); see the sketch at the end of this list.

  • Multi-processing and multi-threading are now done with ProcessPoolExecutor and ThreadPoolExecutor (allowing nested processes for example). Processes are created with spawn (jax does not work with fork, see #51).

  • JAX implementation of DQN and replay buffer using reverb (experimental).

  • network: server and client interfaces to exchange messages via sockets (experimental).

  • RemoteExperimentManager to train agents in a remote server and gather the results locally (experimental).

  • Fix rendering bug with OpenGL
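
  • A minimal sketch of the eval()-based interface (RandomAgent is a hypothetical example written against the current Gymnasium-style step API, not an rlberry class):

      from rlberry.agents import Agent
      from rlberry.envs import gym_make


      class RandomAgent(Agent):
          name = "RandomAgent"

          def fit(self, budget, **kwargs):
              pass  # a random policy has nothing to learn

          def eval(self, n_simulations=5, horizon=200, **kwargs):
              # Mean cumulative reward over a few random rollouts.
              total = 0.0
              for _ in range(n_simulations):
                  observation, info = self.env.reset()
                  for _ in range(horizon):
                      action = self.env.action_space.sample()
                      observation, reward, terminated, truncated, info = self.env.step(action)
                      total += reward
                      if terminated or truncated:
                          break
              return total / n_simulations


      agent = RandomAgent(gym_make("CartPole-v1"))
      agent.fit(budget=0)
      print(agent.eval())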