How to seed your experiment¶

Why use seeding?¶

First, you may not know what seeding is, or why you should use it.

Seeding is the process of initializing a random number generator with a specific value (the seed) to ensure that the sequence of random numbers it produces is repeatable.

Here are 3 reasons why seeding is important:

Reproducibility

Seeding ensures that the sequence of random numbers (or any other stochastic process) generated by your algorithm is repeatable. This is essential for debugging, testing, and validating your algorithm. When testing an algorithm that uses random sampling, if you encounter an issue, you need to reproduce the exact sequence of operations that led to the issue. By using a fixed seed, you can guarantee that the same random numbers are generated each time, allowing you to trace and fix the problem consistently.

Reproducibility also lets a third party (e.g. a reviewer at a conference) independently rerun your work and check that it behaves as advertised.

Predictability

Predictability is important in scenarios where the behavior of your algorithm needs to be understood or communicated to stakeholders. If you’re developing a machine learning model that relies on randomized initialization, setting a seed ensures that every time you initialize the model, it starts with the same weights. This predictability helps in consistently evaluating model performance across different runs.

Fairness and Comparability

In research and competitive environments, it’s essential to ensure that all comparisons are fair. By using the same seed, different algorithms or models can be compared under identical conditions.

Example: When comparing the performance of different machine learning models, using a fixed seed ensures that each model is trained and tested on the same data splits, leading to fair and comparable performance metrics.
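As a minimal sketch of the data-split example above, the snippet below uses NumPy only (not rlberry). The helper name train_test_indices and the values 100, 0.2 and 42 are hypothetical choices made for illustration.

```python
import numpy as np


def train_test_indices(n_samples, test_fraction, seed):
    """Return (train_idx, test_idx) from a seeded shuffle of range(n_samples)."""
    rng = np.random.default_rng(seed)  # fresh generator built from the seed
    shuffled = rng.permutation(n_samples)
    n_test = int(n_samples * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Two models evaluated with the same seed see exactly the same split.
train_a, test_a = train_test_indices(100, 0.2, seed=42)
train_b, test_b = train_test_indices(100, 0.2, seed=42)
same_split = np.array_equal(train_a, train_b) and np.array_equal(test_a, test_b)
print(same_split)  # True
```

Because the generator is rebuilt from the seed on every call, the split depends only on the arguments and not on any earlier random draws in the program.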

One good practice is to always document the seed value used in your experiments, tests, or production code. This helps in future debugging and verification.
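One possible way to document the seed is to store it next to the results it produced. The sketch below uses only the Python standard library; the record layout and the seed value 123 are assumptions made for illustration, not an rlberry convention.

```python
import json
import random

SEED = 123  # hypothetical value; any integer works

rng = random.Random(SEED)
result = [rng.randint(0, 9) for _ in range(5)]

# Store the seed next to the result so the run can be replayed later.
record = json.dumps({"seed": SEED, "result": result})

# Replaying from the stored record reproduces the same numbers.
loaded = json.loads(record)
replayed_rng = random.Random(loaded["seed"])
replayed = [replayed_rng.randint(0, 9) for _ in range(5)]
print(replayed == loaded["result"])  # True
```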

Basics¶

Rlberry has a class Seeder that conveniently wraps a NumPy SeedSequence, and allows you to create independent random number generators for different objects and threads, using a single Seeder instance. It works as follows:

Suppose you want to generate 5 random integers between 0 and 9.

If you run this code several times, you should get different outputs each time.

from rlberry.seeding import Seeder

seeder = Seeder()

result_list = []
for _ in range(5):
    result_list.append(seeder.rng.integers(10))
print(result_list)

run 1 :

[9, 3, 4, 8, 4]

run 2 :

[2, 0, 6, 3, 9]

run 3 :

[7, 3, 8, 1, 1]

But if you fix the seed as follows and run the code several times, you should get the same ‘random’ numbers every time:

from rlberry.seeding import Seeder

seeder = Seeder(123)

result_list = []
for _ in range(5):
    result_list.append(seeder.rng.integers(10))
print(result_list)

run 1 :

[9, 1, 0, 7, 4]

run 2 :

[9, 1, 0, 7, 4]

run 3 :

[9, 1, 0, 7, 4]
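Since Seeder wraps a NumPy SeedSequence, the idea of deriving independent generators from one seed can be sketched with NumPy alone. This illustrates the underlying NumPy mechanism under that assumption; it is not rlberry's actual implementation.

```python
import numpy as np

# One root SeedSequence, playing the role of the single Seeder instance.
root = np.random.SeedSequence(123)

# spawn() returns child sequences designed to give independent streams.
child_a, child_b = root.spawn(2)
rng_a = np.random.default_rng(child_a)
rng_b = np.random.default_rng(child_b)

draws_a = rng_a.integers(10, size=5)
draws_b = rng_b.integers(10, size=5)

# Rebuilding everything from the same entropy reproduces the same draws.
again_a, again_b = np.random.SeedSequence(123).spawn(2)
repeat_a = np.random.default_rng(again_a).integers(10, size=5)
print(np.array_equal(draws_a, repeat_a))  # True
```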

In rlberry¶

Classic usage¶

Each Seeder instance has a random number generator, rng; see the NumPy Generator documentation for the methods available on rng.

The reseed(seeder) methods of Agent and Environment should call seeder.spawn(), which creates new independent child seeders from the same seeder. It is therefore good practice to use a single seeder to reseed both the Agent and the Environment; each will then have its own seeder and rng.

When writing your own agents and inheriting from the Agent class, you should use agent.rng whenever you need to generate random numbers; the same applies to your environments. This is necessary to ensure reproducibility.

from rlberry.envs import gym_make
from rlberry.seeding import Seeder
from rlberry_scool.agents import UCBVIAgent

seeder = Seeder(123)  # seeder initialization


env = gym_make("MountainCar-v0")
env.reseed(seeder)  # seeder first use

agent = UCBVIAgent(env)
agent.reseed(seeder)  # same seeder

# check that the generated numbers are different
print("env seeder: ", env.seeder)
print("random sample 1 from env rng: ", env.rng.normal())
print("random sample 2 from env rng: ", env.rng.normal())
print("agent seeder: ", agent.seeder)
print("random sample 1 from agent rng: ", agent.rng.normal())
print("random sample 2 from agent rng: ", agent.rng.normal())

env seeder:  Seeder object with: SeedSequence(
    entropy=123,
    spawn_key=(0, 0),
)
random sample 1 from env rng:  -1.567498838741829
random sample 2 from env rng:  0.6356604305460527
agent seeder:  Seeder object with: SeedSequence(
    entropy=123,
    spawn_key=(0, 1),
    n_children_spawned=2,
)
random sample 1 from agent rng:  1.2466559261185188
random sample 2 from agent rng:  0.8402527193117317
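The advice about drawing every random number from the object's own rng can be sketched without rlberry. ToyAgent below is a hypothetical stand-in written with NumPy only; it is not the rlberry Agent class, and its reseed method only mimics the spawn-a-child pattern described above.

```python
import numpy as np


class ToyAgent:
    """Hypothetical stand-in for an agent; not the rlberry Agent class."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.rng = np.random.default_rng()  # unseeded until reseed() is called

    def reseed(self, seed_sequence):
        # Take one child of the shared sequence so this object gets its own stream.
        (child,) = seed_sequence.spawn(1)
        self.rng = np.random.default_rng(child)

    def policy(self):
        # All randomness goes through self.rng, never the global np.random state.
        return int(self.rng.integers(self.n_actions))


def run(seed):
    shared = np.random.SeedSequence(seed)
    first, second = ToyAgent(3), ToyAgent(3)
    first.reseed(shared)
    second.reseed(shared)  # same shared sequence, different child
    return [first.policy() for _ in range(10)], [second.policy() for _ in range(10)]


print(run(123) == run(123))  # True: same seed, same actions for both agents
```

If policy() called np.random.randint directly instead, its output would depend on global state that reseed() never touches, and the two runs could differ.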

With ExperimentManager¶

This part reuses the code from the ExperimentManager part. Here are 3 runs without the seeder:

from rlberry.envs import gym_make
from rlberry.agents.stable_baselines import StableBaselinesAgent
from stable_baselines3 import PPO
from rlberry.manager import ExperimentManager, evaluate_agents


env_id = "CartPole-v1"  # Id of the environment

env_ctor = gym_make  # constructor for the env
env_kwargs = dict(id=env_id)  # give the id of the env inside the kwargs


first_experiment = ExperimentManager(
    StableBaselinesAgent,  # Agent Class
    (env_ctor, env_kwargs),  # Environment as Tuple(constructor,kwargs)
    fit_budget=int(100),  # Budget used to call our agent "fit()"
    init_kwargs=dict(algo_cls=PPO),  # Init value for StableBaselinesAgent
    eval_kwargs=dict(
        eval_horizon=1000
    ),  # Arguments required to call rlberry.agents.agent.Agent.eval().
    n_fit=1,  # Number of agent instances to fit.
    agent_name="PPO_first_experiment" + env_id,  # Name of the agent
)

first_experiment.fit()

output = evaluate_agents(
    [first_experiment], n_simulations=5, plot=False
)  # evaluate the experiment on 5 simulations
print(output)

Run 1:

[INFO] 14:47: Running ExperimentManager fit() for PPO_first_experimentCartPole-v1 with n_fit = 1 and max_workers = None.
[INFO] 14:47: ... trained!
[INFO] 14:47: Evaluating PPO_first_experimentCartPole-v1...
[INFO] Evaluation:.....  Evaluation finished
   PPO_first_experimentCartPole-v1
0                             20.8
1                             20.8
2                             21.4
3                             24.3
4                             28.8

Run 2 :

[INFO] 14:47: Running ExperimentManager fit() for PPO_first_experimentCartPole-v1 with n_fit = 1 and max_workers = None.
[INFO] 14:47: ... trained!
[INFO] 14:47: Evaluating PPO_first_experimentCartPole-v1...
[INFO] Evaluation:.....  Evaluation finished
   PPO_first_experimentCartPole-v1
0                             25.0
1                             19.3
2                             28.5
3                             26.1
4                             19.0

Run 3 :

[INFO] 14:47: Running ExperimentManager fit() for PPO_first_experimentCartPole-v1 with n_fit = 1 and max_workers = None.
[INFO] 14:47: ... trained!
[INFO] 14:47: Evaluating PPO_first_experimentCartPole-v1...
[INFO] Evaluation:.....  Evaluation finished
   PPO_first_experimentCartPole-v1
0                             23.6
1                             19.2
2                             20.5
3                             19.8
4                             16.5

Without the seeder, the outputs are different (non-reproducible).

3 runs with the seeder :

from rlberry.envs import gym_make
from rlberry.agents.stable_baselines import StableBaselinesAgent
from stable_baselines3 import PPO
from rlberry.manager import ExperimentManager, evaluate_agents

from rlberry.seeding import Seeder

seeder = Seeder(42)

env_id = "CartPole-v1"  # Id of the environment

env_ctor = gym_make  # constructor for the env
env_kwargs = dict(id=env_id)  # give the id of the env inside the kwargs


first_experiment = ExperimentManager(
    StableBaselinesAgent,  # Agent Class
    (env_ctor, env_kwargs),  # Environment as Tuple(constructor,kwargs)
    fit_budget=int(100),  # Budget used to call our agent "fit()"
    init_kwargs=dict(algo_cls=PPO),  # Init value for StableBaselinesAgent
    eval_kwargs=dict(
        eval_horizon=1000
    ),  # Arguments required to call rlberry.agents.agent.Agent.eval().
    n_fit=1,  # Number of agent instances to fit.
    agent_name="PPO_first_experiment" + env_id,  # Name of the agent
    seed=seeder,
)

first_experiment.fit()

output = evaluate_agents(
    [first_experiment], n_simulations=5, plot=False
)  # evaluate the experiment on 5 simulations
print(output)

Run 1:

[INFO] 14:46: Running ExperimentManager fit() for PPO_first_experimentCartPole-v1 with n_fit = 1 and max_workers = None.
[INFO] 14:46: ... trained!
[INFO] 14:46: Evaluating PPO_first_experimentCartPole-v1...
[INFO] Evaluation:.....  Evaluation finished
   PPO_first_experimentCartPole-v1
0                             23.3
1                             19.7
2                             23.0
3                             18.8
4                             19.7

Run 2 :

[INFO] 14:46: Running ExperimentManager fit() for PPO_first_experimentCartPole-v1 with n_fit = 1 and max_workers = None.
[INFO] 14:46: ... trained!
[INFO] 14:46: Evaluating PPO_first_experimentCartPole-v1...
[INFO] Evaluation:.....  Evaluation finished
   PPO_first_experimentCartPole-v1
0                             23.3
1                             19.7
2                             23.0
3                             18.8
4                             19.7

Run 3 :

[INFO] 14:46: Running ExperimentManager fit() for PPO_first_experimentCartPole-v1 with n_fit = 1 and max_workers = None.
[INFO] 14:46: ... trained!
[INFO] 14:46: Evaluating PPO_first_experimentCartPole-v1...
[INFO] Evaluation:.....  Evaluation finished
   PPO_first_experimentCartPole-v1
0                             23.3
1                             19.7
2                             23.0
3                             18.8
4                             19.7

With the seeder, the outputs are the same (reproducible).

Multi-threading¶

If you want to use multi-threading, a seeder can spawn other seeders that are independent of it. This is useful for seeding two different threads, with seeder1 used in the first thread and seeder2 in the second.

from rlberry.seeding import Seeder

seeder = Seeder(123)
seeder1, seeder2 = seeder.spawn(2)

print("random sample 1 from seeder1 rng: ", seeder1.rng.normal())
print("random sample 2 from seeder1 rng: ", seeder1.rng.normal())
print("-----")
print("random sample 1 from seeder2 rng: ", seeder2.rng.normal())
print("random sample 2 from seeder2 rng: ", seeder2.rng.normal())

random sample 1 from seeder1 rng:  -0.4732958445958833
random sample 2 from seeder1 rng:  0.5863995575997462
-----
random sample 1 from seeder2 rng:  -0.1722486099076424
random sample 2 from seeder2 rng:  -0.1930990650226178
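To show the spawned seeds actually being used from threads, here is a sketch that relies on NumPy's SeedSequence and the standard library's ThreadPoolExecutor rather than on rlberry. The worker and run_workers names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def worker(seed_sequence):
    # Each thread builds its own generator from its own child sequence.
    rng = np.random.default_rng(seed_sequence)
    return rng.normal(size=3).tolist()


def run_workers(seed, n_workers=2):
    children = np.random.SeedSequence(seed).spawn(n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in input order, whatever the thread scheduling.
        return list(pool.map(worker, children))


first_run = run_workers(123)
second_run = run_workers(123)
print(first_run == second_run)  # True
```

Giving each thread its own generator, instead of sharing one across threads, keeps the result independent of the order in which the threads happen to run.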

External libraries¶

You can also seed some external libraries with the set_external_seed function (currently only torch is supported).

This is useful if you want reproducibility with external libraries. In this example, we use torch to generate random numbers.

If you run this code several times, you should get different outputs.

import torch

result_list = []
for i in range(5):
    result_list.append(torch.randint(2**32, (1,))[0].item())

print(result_list)

run 1 :

[3817148928, 671396126, 2950680447, 791815335, 3335786391]

run 2 :

[82990446, 2463687945, 1829003305, 647811387, 3543380778]

run 3 :

[3887070615, 363268341, 3607514851, 3881090947, 1018754931]

If you add a Seeder to this code, call set_external_seed, and re-run it, you should get the same ‘random’ numbers every time.

import torch
from rlberry.seeding import set_external_seed
from rlberry.seeding import Seeder

seeder = Seeder(123)

set_external_seed(seeder)
result_list = []
for i in range(5):
    result_list.append(torch.randint(2**32, (1,))[0].item())

print(result_list)

run 1 :

[693246422, 3606543353, 433394544, 2194426398, 3928404622]

run 2 :

[693246422, 3606543353, 433394544, 2194426398, 3928404622]

run 3 :

[693246422, 3606543353, 433394544, 2194426398, 3928404622]

⚠ Warning: if you fit an ExperimentManager with a torch agent, you don’t need to call set_external_seed; rlberry does it for you.