(save_load_page)=

# How to save/load an experiment

For this example, we'll use the same code as [ExperimentManager](ExperimentManager_page) (from the User Guide), together with its save and load functions.

## How to save an experiment?

To save your experiment, you have to train it first (with `fit()`); then you just have to use the `save()` function.

Train the agent:

```python
from rlberry.envs import gym_make
from rlberry_scool.agents.tabular_rl import QLAgent
from rlberry.manager import ExperimentManager
from rlberry.seeding import Seeder

seeder = Seeder(123)  # seeder initialization

env_id = "FrozenLake-v1"  # id of the environment
env_ctor = gym_make  # constructor for the env
env_kwargs = dict(
    id=env_id, is_slippery=False
)  # give the id of the env inside the kwargs

experiment_to_save = ExperimentManager(
    QLAgent,  # agent class
    (env_ctor, env_kwargs),  # environment as Tuple(constructor, kwargs)
    init_kwargs=dict(
        gamma=0.95, alpha=0.8, exploration_type="epsilon", exploration_rate=0.25
    ),  # agent args
    fit_budget=int(300000),  # budget used to call our agent's "fit()"
    n_fit=1,  # number of agent instances to fit
    seed=seeder,  # to be reproducible
    agent_name="QL" + env_id,  # name of the agent
    output_dir="./results/",  # where to store the outputs
)

experiment_to_save.fit()
print(experiment_to_save.get_agent_instances()[0].Q)  # print the content of the Q-table
```

```none
[INFO] 11:11: Running ExperimentManager fit() for QLFrozenLake-v1 with n_fit = 1 and max_workers = None.
[INFO] 11:11:        agent_name  worker  episode_rewards  max_global_step
              QLFrozenLake-v1         0              0.0           178711
[INFO] 11:11: ... trained!
[[0.73509189 0.77378094 0.77378094 0.73509189]
 [0.73509189 0.         0.81450625 0.77378094]
 [0.77378094 0.857375   0.77378094 0.81450625]
 [0.81450625 0.         0.77377103 0.77378092]
 [0.77378094 0.81450625 0.         0.73509189]
 [0.         0.         0.         0.        ]
 [0.         0.9025     0.         0.81450625]
 [0.         0.         0.         0.        ]
 [0.81450625 0.         0.857375   0.77378094]
 [0.81450625 0.9025     0.9025     0.        ]
 [0.857375   0.95       0.         0.857375  ]
 [0.         0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.         0.9025     0.95       0.857375  ]
 [0.9025     0.95       1.         0.9025    ]
 [0.         0.         0.         0.        ]]
[INFO] 11:11: Saved ExperimentManager(QLFrozenLake-v1) using pickle.
```

After this run, you can see the printed Q-table. At the end of the fit, the data of this experiment is saved automatically, according to the `output_dir` parameter (here `./results/`). If you don't specify the `output_dir` parameter, the experiment is saved by default inside the `rlberry_data/temp/` folder. (Or you can use a temporary folder by importing the [tempfile](https://docs.python.org/3/library/tempfile.html) library and using `with tempfile.TemporaryDirectory() as tmpdir:`, as in the sketch below.)

In this folder, you should find:
- `manager_obj.pickle` and the `agent_handler` folder: the save of your experiment and your agent;
- `data.csv`: the episode results collected during the training process.
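If you prefer not to write to a persistent folder (for tests, for example), here is a minimal sketch of the `tempfile` option mentioned above; everything written inside the `with` block is deleted when the block exits. It reuses the experiment from the snippet above, with a small budget for a quick check:

```python
import tempfile

from rlberry.envs import gym_make
from rlberry.manager import ExperimentManager
from rlberry_scool.agents.tabular_rl import QLAgent

# minimal sketch: same experiment as above, but saved in a temporary folder
with tempfile.TemporaryDirectory() as tmpdir:
    experiment = ExperimentManager(
        QLAgent,
        (gym_make, dict(id="FrozenLake-v1", is_slippery=False)),
        init_kwargs=dict(gamma=0.95, alpha=0.8),
        fit_budget=1000,  # small budget, just to produce some output
        n_fit=1,
        output_dir=tmpdir,  # outputs land in the temporary folder
    )
    experiment.fit()  # saved automatically under tmpdir at the end of fit()
# tmpdir (and the saved files) no longer exist here
```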
## How to load a previous experiment?

In this example, you will load the experiment saved in the previous section.

To load a previously saved experiment, you need to:
- locate the file you want to load (you can use the tool function `get_single_path_of_most_recently_trained_experiment_manager_obj_from_path` to get the most recently saved `manager_obj.pickle` from a folder);
- use the `load()` function from the class [ExperimentManager](rlberry.manager.ExperimentManager.load).

```python
from rlberry.envs import gym_make
from rlberry.manager.experiment_manager import ExperimentManager
from rlberry.utils import loading_tools

path_to_load = loading_tools.get_single_path_of_most_recently_trained_experiment_manager_obj_from_path(
    "results"
)  # find the path to the "manager_obj.pickle"

loaded_experiment_manager = ExperimentManager.load(path_to_load)  # load the experiment

print(
    loaded_experiment_manager.get_agent_instances()[0].Q
)  # print the content of the Q-table
```

If you want to test the agent from the loaded experiment, you can add:

```python
env_id = "FrozenLake-v1"  # id of the environment
env_ctor = gym_make  # constructor for the env
env_kwargs = dict(
    id=env_id, is_slippery=False
)  # give the id of the env inside the kwargs

test_env = env_ctor(**env_kwargs)  # create the environment

# test the agent of the experiment on the test environment
observation, info = test_env.reset()
for tt in range(50):
    action = loaded_experiment_manager.get_agent_instances()[0].policy(observation)
    next_observation, reward, terminated, truncated, info = test_env.step(action)
    done = terminated or truncated
    if done:
        if reward == 1:
            print("Success!")
            break
        else:
            print("Fail! Retry!")
            next_observation, info = test_env.reset()
    observation = next_observation
```

```none
[[0.73509189 0.77378094 0.77378094 0.73509189]
 [0.73509189 0.         0.81450625 0.77378094]
 [0.77378094 0.857375   0.77378094 0.81450625]
 [0.81450625 0.         0.77377103 0.77378092]
 [0.77378094 0.81450625 0.         0.73509189]
 [0.         0.         0.         0.        ]
 [0.         0.9025     0.         0.81450625]
 [0.         0.         0.         0.        ]
 [0.81450625 0.         0.857375   0.77378094]
 [0.81450625 0.9025     0.9025     0.        ]
 [0.857375   0.95       0.         0.857375  ]
 [0.         0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.         0.9025     0.95       0.857375  ]
 [0.9025     0.95       1.         0.9025    ]
 [0.         0.         0.         0.        ]]
Success!
```

As you can see, we haven't re-fit the experiment: the Q-table is the same as the one previously saved, and the agent can solve the environment.

## Other information

`save` and `load` can be useful when:
- you want to train your agent on one computer and test/use it on others;
- you have a long training run and want to make some checkpoints;
- you want to split the training across several runs (only if your agent satisfies "`fit(x)` then `fit(y)` is the same as `fit(x+y)`"); a sketch of this follows the list.
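For the checkpoint and multi-run use cases, here is a minimal sketch of resuming training from a saved experiment. It assumes the incremental-fit property above, and it assumes that your rlberry version lets you pass an extra budget to `ExperimentManager.fit()` (check the API of your version):

```python
from rlberry.manager import ExperimentManager
from rlberry.utils import loading_tools

# locate the latest checkpoint (most recent "manager_obj.pickle" under "results")
path_to_load = loading_tools.get_single_path_of_most_recently_trained_experiment_manager_obj_from_path(
    "results"
)
experiment = ExperimentManager.load(path_to_load)

# resume training on the already-fitted agent instances; this only makes sense
# if the agent's fit() is incremental (fit(x) then fit(y) == fit(x + y))
experiment.fit(budget=100_000)  # assumption: fit() accepts a budget for this session

# as above, the experiment is saved again automatically at the end of fit(),
# producing a new checkpoint in the same output_dir
```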
## How to save/load an agent only? (advanced users)

We highly recommend using save/load with the `ExperimentManager`, so that all the information is kept (as above). But if you need to save only the agent for a specific use case, you can do it as follows.

### Save the agent

To save an agent, you just have to call the `save("output_dir_path")` method of your trained agent. Be careful: only the agent is saved (not the training environment)!

```python
from rlberry.envs import gym_make
from rlberry_scool.agents.tabular_rl import QLAgent
from rlberry.seeding import Seeder

seeder = Seeder(500)  # seeder initialization
env_seed_max_value = 500

env_id = "FrozenLake-v1"  # id of the environment
env = gym_make(env_id)
env.seed(int(seeder.rng.integers(env_seed_max_value)))

agent_to_train_and_save = QLAgent(
    env,
    gamma=0.95,
    alpha=0.8,
    exploration_type="epsilon",
    exploration_rate=0.25,
    seeder=seeder,
)
agent_to_train_and_save.fit(300000)  # agent's training
print(agent_to_train_and_save.Q)  # print the content of the Q-table

agent_to_train_and_save.save("./results/")  # save the agent
```

```none
[INFO] 11:28:  agent_name  worker  episode_rewards  max_global_step
                       QL      -1              0.0           195540
[[0.1830874  0.15802259 0.12087594 0.16358512]
 [0.         0.         0.         0.16674384]
 [0.10049071 0.09517673 0.11326436 0.07236883]
 [0.10552007 0.06660356 0.07020302 0.1104349 ]
 [0.23065463 0.         0.19028937 0.20689438]
 [0.         0.         0.         0.        ]
 [0.08408004 0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.         0.17382279 0.         0.2417443 ]
 [0.         0.29498867 0.         0.        ]
 [0.46487572 0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.         0.52043878 0.56986596 0.19259904]
 [0.57831479 0.6858159  0.22998936 0.39350426]
 [0.         0.         0.         0.        ]]
```

### Load the agent

To load an agent, you should use its `load()` function. But be careful: you have to provide the parameters that were not saved (in this case, the environment). These parameters should be given through a `dict`.

```python
# create a seeded env
env_for_loader = gym_make(env_id)
env_for_loader.seed(int(seeder.rng.integers(env_seed_max_value)))

# create the 'not-saved parameters' dict
params_for_loader = dict(env=env_for_loader)

# load the agent
loaded_agent = QLAgent.load("./results/", **params_for_loader)
print(loaded_agent.Q)  # print the content of the Q-table

# create a seeded test env
test_env = gym_make(env_id)
test_env.seed(int(seeder.rng.integers(env_seed_max_value)))

observation, info = test_env.reset()
for tt in range(50):
    action = loaded_agent.policy(observation)
    next_observation, reward, terminated, truncated, info = test_env.step(action)
    done = terminated or truncated
    if done:
        if reward == 1:
            print("Success!")
            break
        else:
            print("Fail! Retry!")
            next_observation, info = test_env.reset()
    observation = next_observation
```

```none
[[0.1830874  0.15802259 0.12087594 0.16358512]
 [0.         0.         0.         0.16674384]
 [0.10049071 0.09517673 0.11326436 0.07236883]
 [0.10552007 0.06660356 0.07020302 0.1104349 ]
 [0.23065463 0.         0.19028937 0.20689438]
 [0.         0.         0.         0.        ]
 [0.08408004 0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.         0.17382279 0.         0.2417443 ]
 [0.         0.29498867 0.         0.        ]
 [0.46487572 0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.         0.52043878 0.56986596 0.19259904]
 [0.57831479 0.6858159  0.22998936 0.39350426]
 [0.         0.         0.         0.        ]]
Success!
```

This code shows that the loaded agent contains all the components needed to reuse it. (As you can see, we haven't re-fit the agent, and the Q-table is the same as the one previously saved.)
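If you want to check the round trip programmatically rather than by eye, you can compare the two Q-tables directly. A minimal sketch with numpy, assuming `agent_to_train_and_save` and `loaded_agent` from the snippets above are still in scope:

```python
import numpy as np

# the loaded agent should carry exactly the same Q-table as the saved one
assert np.array_equal(agent_to_train_and_save.Q, loaded_agent.Q), "Q-tables differ!"
print("Round trip OK: the loaded Q-table matches the saved one.")
```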