(external_lib_page)=

# How to use the external libraries

[rlberry](https://github.com/rlberry-py/rlberry) is fully compatible with the following external RL libraries, and we provide a quick introduction on how to incorporate them.

## Using rlberry and Gymnasium

If you want to use [Gymnasium](https://gymnasium.farama.org/) environments with [rlberry](https://github.com/rlberry-py/rlberry), simply do the following:

```python
from rlberry.envs import gym_make  # wraps gym.make

# for example, let's take CartPole
env = gym_make("CartPole-v1")
```

This way, `env` **behaves exactly the same as the gym environment**; we simply replace the seeding function with `env.reseed()`, which ensures unified seeding and reproducibility when using rlberry.

## Using rlberry and Stable Baselines

[Stable Baselines](https://github.com/DLR-RM/stable-baselines3) provides implementations of several Deep RL agents. [rlberry](https://github.com/rlberry-py/rlberry) provides a wrapper class for [Stable Baselines](https://github.com/DLR-RM/stable-baselines3) algorithms, which makes it easy to train several agents in parallel, optimize hyperparameters, visualize the results, etc.

The example below shows how to quickly train a Stable Baselines3 A2C agent in just a few lines:

```python
from rlberry.envs import gym_make
from stable_baselines3 import A2C
from rlberry.agents.stable_baselines import StableBaselinesAgent

env_ctor, env_kwargs = gym_make, dict(id="CartPole-v1")
env = env_ctor(**env_kwargs)
agent = StableBaselinesAgent(env, A2C, "MlpPolicy", verbose=1)
agent.fit(budget=1000)
```

There are two important implementation details to note:

1. Logging is configured with the same options as [Stable Baselines](https://github.com/DLR-RM/stable-baselines3). Under the hood, the rlberry agent's writer is added as an output of the [Stable Baselines](https://github.com/DLR-RM/stable-baselines3) Logger. This means that all the metrics collected during training are automatically passed to rlberry.
2. Saving and loading involve two files: the agent itself and the [Stable Baselines](https://github.com/DLR-RM/stable-baselines3) model (a minimal sketch follows this list).
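As a quick illustration of the second point, here is a minimal sketch of saving and reloading the agent trained above. The output path is hypothetical and the exact `save`/`load` keyword arguments are assumptions; check the rlberry API reference for the precise signatures.

```python
# Sketch only: saving the wrapped agent is expected to produce two files,
# the rlberry agent itself and the underlying Stable Baselines model.
saved_path = agent.save("dev/sb3_a2c_agent")  # hypothetical output path

# Reloading restores the wrapper together with the Stable Baselines model;
# the constructor keyword arguments below (env, algo_cls) are assumptions.
loaded_agent = StableBaselinesAgent.load(saved_path, env=env, algo_cls=A2C)
```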
The [Stable Baselines](https://github.com/DLR-RM/stable-baselines3) algorithm class is a **required** parameter of the agent. In order to use it with ExperimentManagers, it must be included in the `init_kwargs` parameter. For example, below we use rlberry to train several instances of the A2C implementation of [Stable Baselines](https://github.com/DLR-RM/stable-baselines3) and evaluate two hyperparameter configurations.

```python
class A2CAgent(StableBaselinesAgent):
    """A2C with hyperparameter optimization."""

    name = "A2C"

    def __init__(self, env, **kwargs):
        super(A2CAgent, self).__init__(env, algo_cls=A2C, **kwargs)

    @classmethod
    def sample_parameters(cls, trial):
        learning_rate = trial.suggest_float("learning_rate", 1e-5, 1, log=True)
        ent_coef = trial.suggest_float("ent_coef", 0.00000001, 0.1, log=True)
        vf_coef = trial.suggest_float("vf_coef", 0, 1)
        normalize_advantage = trial.suggest_categorical(
            "normalize_advantage", [False, True]
        )
        return dict(
            learning_rate=learning_rate,
            ent_coef=ent_coef,
            vf_coef=vf_coef,
            normalize_advantage=normalize_advantage,
        )


# Training several agents and comparing different hyperparams
from rlberry.manager import ExperimentManager, MultipleManagers, evaluate_agents

# Pass the wrapper directly with init_kwargs
stats = ExperimentManager(
    StableBaselinesAgent,
    (env_ctor, env_kwargs),
    agent_name="A2C baseline",
    init_kwargs=dict(algo_cls=A2C, policy="MlpPolicy", verbose=1),
    fit_kwargs=dict(log_interval=1000),
    fit_budget=2500,
    eval_kwargs=dict(eval_horizon=400),
    n_fit=4,
    parallelization="process",
    output_dir="dev/stable_baselines",
    seed=123,
)

# Pass a subclass for hyperparameter optimization
stats_alternative = ExperimentManager(
    A2CAgent,
    (env_ctor, env_kwargs),
    agent_name="A2C optimized",
    init_kwargs=dict(policy="MlpPolicy", verbose=1),
    fit_kwargs=dict(log_interval=1000),
    fit_budget=2500,
    eval_kwargs=dict(eval_horizon=400),
    n_fit=4,
    parallelization="process",
    output_dir="dev/stable_baselines",
    seed=456,
)

# Optimize hyperparams (600 seconds)
stats_alternative.optimize_hyperparams(
    timeout=600,
    n_optuna_workers=2,
    n_fit=2,
    optuna_parallelization="process",
    fit_fraction=1.0,
)

# Fit everything in parallel
multimanagers = MultipleManagers()
multimanagers.append(stats)
multimanagers.append(stats_alternative)
multimanagers.run()
```
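Once both managers have finished training, the `evaluate_agents` helper imported above can be used to compare the two configurations. This is only a sketch: `n_simulations` (the number of evaluation rollouts per agent) and the exact return value are assumptions to verify against the rlberry API.

```python
# Compare the baseline and the optimized configuration after training;
# evaluate_agents runs policy evaluations and (with show=True) plots the results.
evaluation = evaluate_agents([stats, stats_alternative], n_simulations=10, show=True)
print(evaluation)
```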