How to save/load an experiment

For this example, we’ll reuse the ExperimentManager code from the User Guide, and use its save and load functions.

How to save an experiment?

To save your experiment, you have to train it first (with fit()); then you just have to use the save() function.

Train the Agent:

from rlberry.envs import gym_make
from rlberry_scool.agents.tabular_rl import QLAgent
from rlberry.manager import ExperimentManager

from rlberry.seeding import Seeder

seeder = Seeder(123)  # seeder initialization

env_id = "FrozenLake-v1"  # Id of the environment
env_ctor = gym_make  # constructor for the env
env_kwargs = dict(
    id=env_id, is_slippery=False
)  # give the id of the env inside the kwargs


experiment_to_save = ExperimentManager(
    QLAgent,  # Agent Class
    (env_ctor, env_kwargs),  # Environment as Tuple(constructor,kwargs)
    init_kwargs=dict(
        gamma=0.95, alpha=0.8, exploration_type="epsilon", exploration_rate=0.25
    ),  # agent args
    fit_budget=int(300000),  # Budget used to call our agent "fit()"
    n_fit=1,  # Number of agent instances to fit.
    seed=seeder,  # to be reproducible
    agent_name="QL" + env_id,  # Name of the agent
    output_dir="./results/",  # where to store the outputs
)

experiment_to_save.fit()
print(experiment_to_save.get_agent_instances()[0].Q)  # print the content of the Q-table
[INFO] 11:11: Running ExperimentManager fit() for QLFrozenLake-v1 with n_fit = 1 and max_workers = None.
[INFO] 11:11:                                    agent_name     worker  episode_rewards  max_global_step
                                              QLFrozenLake-v1    0          0.0             178711
[INFO] 11:11: ... trained!
[[0.73509189 0.77378094 0.77378094 0.73509189]
 [0.73509189 0.         0.81450625 0.77378094]
 [0.77378094 0.857375   0.77378094 0.81450625]
 [0.81450625 0.         0.77377103 0.77378092]
 [0.77378094 0.81450625 0.         0.73509189]
 [0.         0.         0.         0.        ]
 [0.         0.9025     0.         0.81450625]
 [0.         0.         0.         0.        ]
 [0.81450625 0.         0.857375   0.77378094]
 [0.81450625 0.9025     0.9025     0.        ]
 [0.857375   0.95       0.         0.857375  ]
 [0.         0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.         0.9025     0.95       0.857375  ]
 [0.9025     0.95       1.         0.9025    ]
 [0.         0.         0.         0.        ]]
[INFO] 11:11: Saved ExperimentManager(QLFrozenLake-v1) using pickle.

After this run, you can see the printed Q-table. At the end of the fit, the data of this experiment are saved automatically, in the location given by the output_dir parameter (here ./results/). If you don’t specify the output_dir parameter, the experiment will be saved by default inside the rlberry_data/temp/ folder. (Or you can use a temporary folder by importing the tempfile library and using with tempfile.TemporaryDirectory() as tmpdir:)
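
For example, here is a minimal sketch of the temporary-folder variant. It reuses QLAgent, env_ctor and env_kwargs from the snippet above; temp_experiment is just an illustrative name, and the small fit_budget is for illustration only.

import tempfile

from rlberry.manager import ExperimentManager

# reuse QLAgent, env_ctor and env_kwargs defined above; everything written to
# tmpdir is deleted automatically when the 'with' block exits
with tempfile.TemporaryDirectory() as tmpdir:
    temp_experiment = ExperimentManager(
        QLAgent,
        (env_ctor, env_kwargs),
        init_kwargs=dict(
            gamma=0.95, alpha=0.8, exploration_type="epsilon", exploration_rate=0.25
        ),
        fit_budget=1000,  # small budget, for illustration only
        n_fit=1,
        output_dir=tmpdir,  # outputs go to the temporary folder
    )
    temp_experiment.fit()
    # copy or use anything you need before leaving the 'with' block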

In the output folder (./results/ here), you should find:

  • manager_obj.pickle and the agent_handler folder: the saved experiment and agent.

  • data.csv: the episode results collected during the training process.
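
If you want to inspect the saved training data, here is a small sketch using pandas (which rlberry already depends on) and the ./results/ folder from above; the exact location of data.csv inside the folder may vary, so it searches recursively.

from pathlib import Path

import pandas as pd

# locate every data.csv written under the output folder and preview it
for csv_path in Path("./results/").rglob("data.csv"):
    print(csv_path)
    print(pd.read_csv(csv_path).head())  # episode results logged during training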

How to load a previous experiment?

In this example, you will load the experiment saved in the previous part.

To load a previously saved experiment, you need to:

  • Locate the file you want to load (you can use the helper function get_single_path_of_most_recently_trained_experiment_manager_obj_from_path to get the path of the most recently saved manager_obj.pickle in a folder).

  • Use the load() function of the ExperimentManager class.

from rlberry.envs import gym_make
from rlberry.manager.experiment_manager import ExperimentManager
from rlberry.utils import loading_tools


path_to_load = loading_tools.get_single_path_of_most_recently_trained_experiment_manager_obj_from_path(
    "results"
)  # find the path to the "manager_obj.pickle"

loaded_experiment_manager = ExperimentManager.load(path_to_load)  # load the experiment

print(
    loaded_experiment_manager.get_agent_instances()[0].Q
)  # print the content of the Q-table

If you want to test the agent from the loaded experiment, you can add:

env_id = "FrozenLake-v1"  # Id of the environment
env_ctor = gym_make  # constructor for the env
env_kwargs = dict(
    id=env_id, is_slippery=False
)  # give the id of the env inside the kwargs
test_env = env_ctor(**env_kwargs)  # create the Environment

# test the agent of the experiment on the test environment
observation, info = test_env.reset()
for tt in range(50):
    action = loaded_experiment_manager.get_agent_instances()[0].policy(observation)
    next_observation, reward, terminated, truncated, info = test_env.step(action)
    done = terminated or truncated
    if done:
        if reward == 1:
            print("Success!")
            break
        else:
            print("Fail! Retry!")
            next_observation, info = test_env.reset()
    observation = next_observation
[[0.73509189 0.77378094 0.77378094 0.73509189]
 [0.73509189 0.         0.81450625 0.77378094]
 [0.77378094 0.857375   0.77378094 0.81450625]
 [0.81450625 0.         0.77377103 0.77378092]
 [0.77378094 0.81450625 0.         0.73509189]
 [0.         0.         0.         0.        ]
 [0.         0.9025     0.         0.81450625]
 [0.         0.         0.         0.        ]
 [0.81450625 0.         0.857375   0.77378094]
 [0.81450625 0.9025     0.9025     0.        ]
 [0.857375   0.95       0.         0.857375  ]
 [0.         0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.         0.9025     0.95       0.857375  ]
 [0.9025     0.95       1.         0.9025    ]
 [0.         0.         0.         0.        ]]
Success!

As you can see, we haven’t re-fit the experiment: the Q-table is the same as the one previously saved, and the agent can solve the environment.
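
If both the original and the loaded managers are still in memory, you can verify this programmatically; here is a quick sketch with numpy, reusing experiment_to_save and loaded_experiment_manager from the snippets above.

import numpy as np

# the loaded Q-table should match the one from the experiment trained above
q_trained = experiment_to_save.get_agent_instances()[0].Q
q_loaded = loaded_experiment_manager.get_agent_instances()[0].Q
assert np.allclose(q_trained, q_loaded)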

Other information

Saving and loading can be useful when:

  • you want to train your agent on one computer, and test/use it on others.

  • you have a long training run, and you want to make some ‘checkpoints’.

  • you want to split the training into several runs (only if, for your agent, “fit(x) then fit(y)” is the same as “fit(x+y)”); see the sketch after this list.
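
For the last point, here is a hedged sketch of a resume-from-checkpoint workflow. It reuses the loading code above and assumes that ExperimentManager.fit() accepts an extra budget argument.

from rlberry.manager.experiment_manager import ExperimentManager
from rlberry.utils import loading_tools

# find the latest checkpoint, continue training, then write a new checkpoint
path_to_load = loading_tools.get_single_path_of_most_recently_trained_experiment_manager_obj_from_path(
    "results"
)
resumed_experiment = ExperimentManager.load(path_to_load)
resumed_experiment.fit(budget=100000)  # extra budget (assumption: fit() accepts a budget)
resumed_experiment.save()  # save the new state of the experiment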

How to save/load an agent only? (advanced users)

We highly recommend using save/load through the ExperimentManager so that all the information is kept (as above). But if you need to save only the agent for a specific use case, you can do it as follows:

Save the agent

To save an agent, you just have to call the save("output_dir_path") method of your trained agent. Be careful: only the agent is saved (not the training environment)!

from rlberry.envs import gym_make
from rlberry_scool.agents.tabular_rl import QLAgent
from rlberry.seeding import Seeder

seeder = Seeder(500)  # seeder initialization
env_seed_max_value = 500

env_id = "FrozenLake-v1"  # Id of the environment

env = gym_make(env_id)
env.seed(int(seeder.rng.integers(env_seed_max_value)))
agent_to_train_and_save = QLAgent(
    env,
    gamma=0.95,
    alpha=0.8,
    exploration_type="epsilon",
    exploration_rate=0.25,
    seeder=seeder,
)
agent_to_train_and_save.fit(300000)  # Agent's training
print(agent_to_train_and_save.Q)  # print the content of the Q-table

agent_to_train_and_save.save("./results/")  # save the agent
[INFO] 11:28:                                 agent_name  worker  episode_rewards  max_global_step
                                                  QL        -1         0.0             195540
[[0.1830874  0.15802259 0.12087594 0.16358512]
 [0.         0.         0.         0.16674384]
 [0.10049071 0.09517673 0.11326436 0.07236883]
 [0.10552007 0.06660356 0.07020302 0.1104349 ]
 [0.23065463 0.         0.19028937 0.20689438]
 [0.         0.         0.         0.        ]
 [0.08408004 0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.         0.17382279 0.         0.2417443 ]
 [0.         0.29498867 0.         0.        ]
 [0.46487572 0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.         0.52043878 0.56986596 0.19259904]
 [0.57831479 0.6858159  0.22998936 0.39350426]
 [0.         0.         0.         0.        ]]

Load the agent

To load an agent, you should use its load() function. But be careful: you have to provide the parameters that were not saved (in this case, the environment). These parameters should be given through a dict.

# create a seeded env
env_for_loader = gym_make(env_id)
env_for_loader.seed(int(seeder.rng.integers(env_seed_max_value)))

# create the 'not-saved parameters' dict
params_for_loader = dict(env=env_for_loader)

# load the agent
loaded_agent = QLAgent.load("./results/", **params_for_loader)
print(loaded_agent.Q)  # print the content of the Q-table

# create a seeded test env
test_env = gym_make(env_id)
test_env.seed(int(seeder.rng.integers(env_seed_max_value)))

observation, info = test_env.reset()
for tt in range(50):
    action = loaded_agent.policy(observation)
    next_observation, reward, terminated, truncated, info = test_env.step(action)
    done = terminated or truncated
    if done:
        if reward == 1:
            print("Success!")
            break
        else:
            print("Fail! Retry!")
            next_observation, info = test_env.reset()
    observation = next_observation
[[0.1830874  0.15802259 0.12087594 0.16358512]
 [0.         0.         0.         0.16674384]
 [0.10049071 0.09517673 0.11326436 0.07236883]
 [0.10552007 0.06660356 0.07020302 0.1104349 ]
 [0.23065463 0.         0.19028937 0.20689438]
 [0.         0.         0.         0.        ]
 [0.08408004 0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.         0.17382279 0.         0.2417443 ]
 [0.         0.29498867 0.         0.        ]
 [0.46487572 0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.         0.52043878 0.56986596 0.19259904]
 [0.57831479 0.6858159  0.22998936 0.39350426]
 [0.         0.         0.         0.        ]]

Success!

This code shows that the loaded agent contains all the components needed to reuse it. (As you can see, we haven’t re-fit the agent, and the Q-table is the same as the one previously saved.)