How to seed your experiment¶
Why use seeding?¶
First, maybe you don’t know what seeding is, or why you should use it.
Seeding is the process of initializing a random number generator with a specific value (the seed) to ensure that the sequence of random numbers it produces is repeatable.
Here are three reasons why seeding is important:
Reproducibility
Seeding ensures that the sequence of random numbers (or any other stochastic process) generated by your algorithm is repeatable. This is essential for debugging, testing, and validating your algorithm. When testing an algorithm that uses random sampling, if you encounter an issue, you need to reproduce the exact sequence of operations that led to the issue. By using a fixed seed, you can guarantee that the same random numbers are generated each time, allowing you to trace and fix the problem consistently.
Another use of this reproducibility is to let a third party (e.g. a reviewer at a conference) independently reproduce your work and verify that it indeed works as advertised.
Predictability
Predictability is important in scenarios where the behavior of your algorithm needs to be understood or communicated to stakeholders. If you’re developing a machine learning model that relies on randomized initialization, setting a seed ensures that every time you initialize the model, it starts with the same weights. This predictability helps in consistently evaluating model performance across different runs.
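For instance, here is a small illustration of this point with PyTorch (outside rlberry; the layer shape and seed value are arbitrary): with the same seed, two freshly initialized layers start from exactly the same weights.

import torch
import torch.nn as nn

torch.manual_seed(0)
layer_a = nn.Linear(4, 2)  # initialized with the global torch RNG

torch.manual_seed(0)  # reset the global RNG to the same seed
layer_b = nn.Linear(4, 2)

# both layers start from exactly the same random weights
print(torch.equal(layer_a.weight, layer_b.weight))  # True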
Fairness and Comparability
In research and competitive environments, it’s essential to ensure that all comparisons are fair. By using the same seed, different algorithms or models can be compared under identical conditions.
Example: When comparing the performance of different machine learning models, using a fixed seed ensures that each model is trained and tested on the same data splits, leading to fair and comparable performance metrics.
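As a minimal sketch with scikit-learn (not rlberry; the data and split size are arbitrary), a fixed random_state gives every model the very same train/test split:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# same random_state -> same split, so every model is compared on identical data
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(X, y, test_size=0.3, random_state=0)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(X, y, test_size=0.3, random_state=0)

print(np.array_equal(X_train_a, X_train_b))  # True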
One good practice is to always document the seed value used in your experiments, tests, or production code. This helps in future debugging and verification.
Basics¶
Rlberry has a class Seeder that conveniently wraps a NumPy SeedSequence, and allows you to create independent random number generators for different objects and threads, using a single Seeder instance. It works as follows:
Suppose you want to generate 5 random integers between 0 and 9.
If you run the following code several times, you should get different outputs.
from rlberry.seeding import Seeder
seeder = Seeder()
result_list = []
for _ in range(5):
    result_list.append(seeder.rng.integers(10))
print(result_list)
run 1:
[9, 3, 4, 8, 4]
run 2:
[2, 0, 6, 3, 9]
run 3:
[7, 3, 8, 1, 1]
But if you fix the seed as follows and run the code many times, you should get the same ‘random’ numbers every time:
from rlberry.seeding import Seeder
seeder = Seeder(123)
result_list = []
for _ in range(5):
    result_list.append(seeder.rng.integers(10))
print(result_list)
run 1:
[9, 1, 0, 7, 4]
run 2:
[9, 1, 0, 7, 4]
run 3:
[9, 1, 0, 7, 4]
In rlberry¶
Classic usage¶
Each Seeder instance has a random number generator (rng); see here to check the methods available on rng.
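For instance, rng exposes the usual NumPy Generator methods (the seed value below is arbitrary):

from rlberry.seeding import Seeder

seeder = Seeder(456)
print(seeder.rng.integers(10))  # random integer in [0, 10)
print(seeder.rng.normal())  # sample from a standard normal distribution
print(seeder.rng.uniform(0.0, 1.0))  # sample from U(0, 1)
print(seeder.rng.choice([1, 2, 3]))  # pick one element from a list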
The reseed(seeder) functions of Agent and Environment should use seeder.spawn(), which creates new, independent child generators from the same seeder. It is therefore good practice to use a single seeder to reseed both the Agent and the Environment: each of them will then have its own seeder and rng.
When writing your own agents and inheriting from the Agent class, you should use the agent's rng whenever you need to generate random numbers; the same applies to your environments. This is necessary to ensure reproducibility (a sketch of such an agent is given after the example below).
from rlberry.seeding import Seeder
seeder = Seeder(123) # seeder initialization
from rlberry.envs import gym_make
from rlberry_scool.agents import UCBVIAgent
env = gym_make("MountainCar-v0")
env.reseed(seeder) # seeder first use
agent = UCBVIAgent(env)
agent.reseed(seeder) # same seeder
# check that the generated numbers are different
print("env seeder: ", env.seeder)
print("random sample 1 from env rng: ", env.rng.normal())
print("random sample 2 from env rng: ", env.rng.normal())
print("agent seeder: ", agent.seeder)
print("random sample 1 from agent rng: ", agent.rng.normal())
print("random sample 2 from agent rng: ", agent.rng.normal())
env seeder: Seeder object with: SeedSequence(
entropy=123,
spawn_key=(0, 0),
)
random sample 1 from env rng: -1.567498838741829
random sample 2 from env rng: 0.6356604305460527
agent seeder: Seeder object with: SeedSequence(
entropy=123,
spawn_key=(0, 1),
n_children_spawned=2,
)
random sample 1 from agent rng: 1.2466559261185188
random sample 2 from agent rng: 0.8402527193117317
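As mentioned above, your own agents should draw all their randomness from their rng. Here is a rough sketch (the class name and what fit/eval do are purely illustrative; check the Agent documentation for the exact interface to implement):

from rlberry.agents import Agent


class NoisyAgent(Agent):
    """Toy agent: all randomness goes through self.rng (sketch only)."""

    name = "NoisyAgent"

    def fit(self, budget, **kwargs):
        # draw from self.rng (set by reseed), never from numpy's global state
        self.noise = [self.rng.normal() for _ in range(budget)]

    def eval(self, **kwargs):
        # use self.rng here too, so that evaluation is reproducible as well
        return float(self.rng.normal())


# usage sketch:
# agent = NoisyAgent(env)
# agent.reseed(seeder)  # agent.rng is now seeded deterministically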
With ExperimentManager¶
For this part, we will use the same code as in the ExperimentManager part. Here are 3 runs without the seeder:
from rlberry.envs import gym_make
from rlberry.agents.stable_baselines import StableBaselinesAgent
from stable_baselines3 import PPO
from rlberry.manager import ExperimentManager, evaluate_agents
env_id = "CartPole-v1" # Id of the environment
env_ctor = gym_make # constructor for the env
env_kwargs = dict(id=env_id) # give the id of the env inside the kwargs
first_experiment = ExperimentManager(
    StableBaselinesAgent,  # Agent Class
    (env_ctor, env_kwargs),  # Environment as Tuple(constructor,kwargs)
    fit_budget=int(100),  # Budget used to call our agent "fit()"
    init_kwargs=dict(algo_cls=PPO),  # Init value for StableBaselinesAgent
    eval_kwargs=dict(
        eval_horizon=1000
    ),  # Arguments required to call rlberry.agents.agent.Agent.eval().
    n_fit=1,  # Number of agent instances to fit.
    agent_name="PPO_first_experiment" + env_id,  # Name of the agent
)
first_experiment.fit()
output = evaluate_agents(
    [first_experiment], n_simulations=5, plot=False
)  # evaluate the experiment on 5 simulations
print(output)
Run 1:
[INFO] 14:47: Running ExperimentManager fit() for PPO_first_experimentCartPole-v1 with n_fit = 1 and max_workers = None.
[INFO] 14:47: ... trained!
[INFO] 14:47: Evaluating PPO_first_experimentCartPole-v1...
[INFO] Evaluation:..... Evaluation finished
PPO_first_experimentCartPole-v1
0 20.8
1 20.8
2 21.4
3 24.3
4 28.8
Run 2:
[INFO] 14:47: Running ExperimentManager fit() for PPO_first_experimentCartPole-v1 with n_fit = 1 and max_workers = None.
[INFO] 14:47: ... trained!
[INFO] 14:47: Evaluating PPO_first_experimentCartPole-v1...
[INFO] Evaluation:..... Evaluation finished
PPO_first_experimentCartPole-v1
0 25.0
1 19.3
2 28.5
3 26.1
4 19.0
Run 3:
[INFO] 14:47: Running ExperimentManager fit() for PPO_first_experimentCartPole-v1 with n_fit = 1 and max_workers = None.
[INFO] 14:47: ... trained!
[INFO] 14:47: Evaluating PPO_first_experimentCartPole-v1...
[INFO] Evaluation:..... Evaluation finished
PPO_first_experimentCartPole-v1
0 23.6
1 19.2
2 20.5
3 19.8
4 16.5
Without the seeder, the outputs are different (non-reproducible).
3 runs with the seeder:
from rlberry.envs import gym_make
from rlberry.agents.stable_baselines import StableBaselinesAgent
from stable_baselines3 import PPO
from rlberry.manager import ExperimentManager, evaluate_agents
from rlberry.seeding import Seeder
seeder = Seeder(42)
env_id = "CartPole-v1" # Id of the environment
env_ctor = gym_make # constructor for the env
env_kwargs = dict(id=env_id) # give the id of the env inside the kwargs
first_experiment = ExperimentManager(
    StableBaselinesAgent,  # Agent Class
    (env_ctor, env_kwargs),  # Environment as Tuple(constructor,kwargs)
    fit_budget=int(100),  # Budget used to call our agent "fit()"
    init_kwargs=dict(algo_cls=PPO),  # Init value for StableBaselinesAgent
    eval_kwargs=dict(
        eval_horizon=1000
    ),  # Arguments required to call rlberry.agents.agent.Agent.eval().
    n_fit=1,  # Number of agent instances to fit.
    agent_name="PPO_first_experiment" + env_id,  # Name of the agent
    seed=seeder,
)
first_experiment.fit()
output = evaluate_agents(
    [first_experiment], n_simulations=5, plot=False
)  # evaluate the experiment on 5 simulations
print(output)
Run 1:
[INFO] 14:46: Running ExperimentManager fit() for PPO_first_experimentCartPole-v1 with n_fit = 1 and max_workers = None.
[INFO] 14:46: ... trained!
[INFO] 14:46: Evaluating PPO_first_experimentCartPole-v1...
[INFO] Evaluation:..... Evaluation finished
PPO_first_experimentCartPole-v1
0 23.3
1 19.7
2 23.0
3 18.8
4 19.7
Run 2:
[INFO] 14:46: Running ExperimentManager fit() for PPO_first_experimentCartPole-v1 with n_fit = 1 and max_workers = None.
[INFO] 14:46: ... trained!
[INFO] 14:46: Evaluating PPO_first_experimentCartPole-v1...
[INFO] Evaluation:..... Evaluation finished
PPO_first_experimentCartPole-v1
0 23.3
1 19.7
2 23.0
3 18.8
4 19.7
Run 3:
[INFO] 14:46: Running ExperimentManager fit() for PPO_first_experimentCartPole-v1 with n_fit = 1 and max_workers = None.
[INFO] 14:46: ... trained!
[INFO] 14:46: Evaluating PPO_first_experimentCartPole-v1...
[INFO] Evaluation:..... Evaluation finished
PPO_first_experimentCartPole-v1
0 23.3
1 19.7
2 23.0
3 18.8
4 19.7
With the seeder, the outputs are the same (reproducible).
Multi-threading¶
If you want to use multi-threading, a seeder can spawn other seeders that are independent from it. This is useful to seed two different threads: use seeder1 in the first thread and seeder2 in the second thread.
from rlberry.seeding import Seeder
seeder = Seeder(123)
seeder1, seeder2 = seeder.spawn(2)
print("random sample 1 from seeder1 rng: ", seeder1.rng.normal())
print("random sample 2 from seeder1 rng: ", seeder1.rng.normal())
print("-----")
print("random sample 1 from seeder2 rng: ", seeder2.rng.normal())
print("random sample 2 from seeder2 rng: ", seeder2.rng.normal())
random sample 1 from seeder1 rng: -0.4732958445958833
random sample 2 from seeder1 rng: 0.5863995575997462
-----
random sample 1 from seeder2 rng: -0.1722486099076424
random sample 2 from seeder2 rng: -0.1930990650226178
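To make the two-thread case concrete, here is a rough sketch using the standard concurrent.futures module (the worker function and the number of draws are just illustrative); each worker only ever touches its own spawned seeder:

from concurrent.futures import ThreadPoolExecutor

from rlberry.seeding import Seeder


def draw_samples(child_seeder, n=3):
    # each worker uses only its own seeder/rng, never a shared one
    return [child_seeder.rng.normal() for _ in range(n)]


seeder = Seeder(123)
seeder1, seeder2 = seeder.spawn(2)

with ThreadPoolExecutor(max_workers=2) as executor:
    future1 = executor.submit(draw_samples, seeder1)
    future2 = executor.submit(draw_samples, seeder2)

print("samples from thread 1: ", future1.result())
print("samples from thread 2: ", future2.result())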
External libraries¶
You can also seed some external libraries with the method set_external_seed (currently, only torch is supported). This is useful if you want reproducibility when using those libraries. In this example, we use torch to generate random numbers.
If you run this code several times, you should get different outputs.
import torch
result_list = []
for i in range(5):
    result_list.append(torch.randint(2**32, (1,))[0].item())
print(result_list)
run 1:
[3817148928, 671396126, 2950680447, 791815335, 3335786391]
run 2:
[82990446, 2463687945, 1829003305, 647811387, 3543380778]
run 3:
[3887070615, 363268341, 3607514851, 3881090947, 1018754931]
If you add a Seeder to this code, use the set_external_seed method, and re-run it, you should get the same ‘random’ numbers every time.
import torch
from rlberry.seeding import set_external_seed
from rlberry.seeding import Seeder
seeder = Seeder(123)
set_external_seed(seeder)
result_list = []
for i in range(5):
    result_list.append(torch.randint(2**32, (1,))[0].item())
print(result_list)
run 1:
[693246422, 3606543353, 433394544, 2194426398, 3928404622]
run 2:
[693246422, 3606543353, 433394544, 2194426398, 3928404622]
run 3:
[693246422, 3606543353, 433394544, 2194426398, 3928404622]
⚠ Warning: if you fit an ExperimentManager with a torch agent, you don’t need to call set_external_seed yourself; rlberry does it for you. ⚠