rlberry.manager.AdastopComparator

class rlberry.manager.AdastopComparator(n=5, K=5, B=10000, comparisons=None, alpha=0.01, beta=0, seed=None)

Bases: MultipleAgentsComparator

Sequentially compare agents, with possible early stopping. At most n times K fits are performed.

See the adastop library for more details: https://github.com/TimotheeMathieu/adastop

Parameters:
n: int, or array of ints of size self.n_agents, default=5

If int, number of fits before each early stopping check. If array of int, a different number of fits is used for each agent.

K: int, default=5

Number of checks.

B: int, default=10000

Number of random permutations used to approximate the permutation distribution.

comparisons: list of tuple of indices or None

If None, all pairwise comparisons are done. If, for instance, comparisons = [(0,1), (0,2)], then only 0 vs 1 and 0 vs 2 are compared.

alpha: float, default=0.01

Significance level of the test.

beta: float, default=0

Power spent in early acceptance.

seed: int or None, default = None
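
To illustrate what B and seed control, here is a minimal sketch of a generic two-sample permutation test in which the permutation distribution is approximated by B random shuffles. It is not the AdaStop procedure (which is group-sequential and handles multiple comparisons); the function name permutation_p_value and the score lists are hypothetical.

```python
import random


def permutation_p_value(x, y, B=10000, seed=None):
    """Approximate a two-sided permutation p-value for a difference in means.

    Generic illustration only; this is not the AdaStop procedure itself.
    """
    rng = random.Random(seed)  # a fixed seed makes the shuffles reproducible
    n_x, n_y = len(x), len(y)
    observed = abs(sum(x) / n_x - sum(y) / n_y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(B):
        rng.shuffle(pooled)  # one random relabelling of the pooled scores
        stat = abs(sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / n_y)
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (B + 1)


# Hypothetical evaluation scores for two agents.
scores_a = [1.0, 1.2, 0.9, 1.1, 1.0]
scores_b = [5.0, 5.3, 4.8, 5.1, 4.9]
p_value = permutation_p_value(scores_a, scores_b, B=2000, seed=0)
print(p_value)
```

A larger B gives a finer approximation of the permutation distribution at a proportionally higher cost.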
Attributes:
agent_names: list of str

list of the agents’ names.

managers_paths: dictionary

managers_paths[agent_name] is a list of the paths to the trained experiment managers. Can be loaded with ExperimentManager.load.

decision: dict

decision of the tests for each comparison, keys are the comparisons and values are in {“equal”, “larger”, “smaller”}.

n_iters: dict

number of iterations (i.e. number of fits) used for each agent. Keys are the agents’ names and values are ints.
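
The interplay of n, K and comparisons can be sketched in plain Python. This assumes that the "n times K" bound applies to each agent separately and that "all pairwise comparisons" means every unordered pair of agent indices; both helper functions are hypothetical and not part of rlberry.

```python
from itertools import combinations


def max_fits_per_agent(n, K, n_agents):
    """Upper bound on fits per agent: n fits per check, K checks.

    Assumes the bound is per agent; n may be an int or one int per agent.
    """
    per_check = [n] * n_agents if isinstance(n, int) else list(n)
    if len(per_check) != n_agents:
        raise ValueError("n must be an int or have one entry per agent")
    return [n_i * K for n_i in per_check]


def default_comparisons(n_agents):
    """All unordered pairs of agent indices (assumed meaning of comparisons=None)."""
    return list(combinations(range(n_agents), 2))


print(max_fits_per_agent(5, 5, n_agents=3))          # [25, 25, 25]
print(max_fits_per_agent([5, 3, 4], 5, n_agents=3))  # [25, 15, 20]
print(default_comparisons(3))                        # [(0, 1), (0, 2), (1, 2)]
```

Under these assumptions, each value in n_iters would never exceed the corresponding bound.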

Methods

compare(manager_list[, n_evaluations, verbose])

Run Adastop on the managers from manager_list

compute_mean_diffs(k, Z)

Compute the absolute value of the sum of differences.

get_results()

Returns a dataframe with the results of the tests.

partial_compare(eval_values[, verbose])

Do the test of the k^th interim.

plot_results([agent_names, axes])

visual representation of results.

plot_results_sota([agent_names, axes])

visual representation of results when the first agent is compared to all the others.

print_results()

Print the results of the test.

compare(manager_list, n_evaluations=50, verbose=True)

Run Adastop on the managers from manager_list

Parameters:
manager_list: list of ExperimentManager kwargs

List of managers containing the agents we want to compare.

n_evaluations: int, default = 50

number of evaluations used to estimate the score used for AdaStop.

verbose: bool

Whether to print the steps.

Returns:
decisions: dictionary with comparisons as index and with values str in {“equal”, “larger”, “smaller”, “continue”}

Decision of the test at this step.
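
As a sketch of how the returned dictionary might be consumed, the helper below separates settled comparisons from those still marked "continue". The helper name and the example dictionary are hypothetical, and the key format (tuples of agent indices) is an assumption; the documentation only states that keys are the comparisons.

```python
def split_decisions(decisions):
    """Separate settled comparisons from those still marked "continue"."""
    settled = {comp: d for comp, d in decisions.items() if d != "continue"}
    pending = [comp for comp, d in decisions.items() if d == "continue"]
    return settled, pending


# Hypothetical decisions; keys as tuples of agent indices is an assumption.
example_decisions = {(0, 1): "larger", (0, 2): "continue", (1, 2): "equal"}
settled, pending = split_decisions(example_decisions)
print(settled)  # {(0, 1): 'larger', (1, 2): 'equal'}
print(pending)  # [(0, 2)]
```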

compute_mean_diffs(k, Z)

Compute the absolute value of the sum of differences.

get_results()

Returns a dataframe with the results of the tests.

partial_compare(eval_values, verbose=True)

Do the test of the k^th interim.

Parameters:
eval_values: dict of agents and evaluations

Keys are agent names and values are the concatenation of evaluations up to interim k, e.g. {“PPO”: [1,1,1,1,1], “SAC”: [42,42,42,42,42]}

verbose: bool

Whether to print the steps.

Returns:
decisions: dictionary with comparisons as index and with values str in {“equal”, “larger”, “smaller”, “continue”}

Decision of the test at this step.

id_finished: bool

Whether the test is finished or not.

T: float

Test statistic.

bk: float

Thresholds of the tests.
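
The eval_values argument described above can be assembled from per-interim batches, as in this minimal sketch. The helper name, the scores, and the assumption that batches are concatenated in chronological order are all illustrative; the resulting dictionary has the shape that the parameter description gives for partial_compare.

```python
def concatenate_evaluations(batches_per_agent):
    """Flatten each agent's per-interim batches into one list of evaluations."""
    return {
        name: [value for batch in batches for value in batch]
        for name, batches in batches_per_agent.items()
    }


# Hypothetical scores: two interims of 5 evaluations for each agent.
batches = {
    "PPO": [[1, 1, 1, 1, 1], [2, 2, 2, 2, 2]],
    "SAC": [[42, 42, 42, 42, 42], [40, 40, 40, 40, 40]],
}
eval_values = concatenate_evaluations(batches)
print(eval_values["PPO"])  # [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
```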

plot_results(agent_names=None, axes=None)

visual representation of results.

Parameters:
agent_names: list of str or None
axes: tuple of two matplotlib axes or None

if None, use the following: fig, (ax1, ax2) = plt.subplots(2, 1, gridspec_kw={“height_ratios”: [1, 2]}, figsize=(6,5))

plot_results_sota(agent_names=None, axes=None)

visual representation of results when the first agent is compared to all the others.

Parameters:
agent_names: list of str or None
axes: tuple of two matplotlib axes or None

if None, use the following: fig, (ax1, ax2) = plt.subplots(2, 1, gridspec_kw={“height_ratios”: [1, 2]}, figsize=(6,5))

print_results()

Print the results of the test.

Examples using rlberry.manager.AdastopComparator

Compare PPO and A2C on Acrobot with AdaStop