rlberry.agents.utils.replay.ReplayBuffer
- class rlberry.agents.utils.replay.ReplayBuffer(max_replay_size, rng, max_episode_steps=None, enable_prioritized=False, alpha=0.5, beta=0.5)[source]
Bases: object
Replay buffer that allows sampling data with shape (batch_size, time_size, …).
- Parameters:
- max_replay_size: int
Maximum number of transitions that can be stored
- rng: numpy.random.Generator
Numpy random number generator. See https://numpy.org/doc/stable/reference/random/generator.html
- max_episode_steps: int, optional
Maximum length of an episode
- enable_prioritized: bool, default = False
If True, enable sampling with prioritized experience replay, by setting sampling_mode="prioritized" in the sample() method.
- alpha: float, default = 0.5
How much prioritization is used, if enable_prioritized=True (0 - no prioritization, 1 - full prioritization).
- beta: float, default = 0.5
To what degree to use importance weights, if enable_prioritized=True (0 - no corrections, 1 - full correction).
- Attributes:
data
Dict containing all stored data.
dtypes
Dict containing the data types for each tag.
max_episode_steps
Maximum length of an episode.
tags
Tags identifying the entries in the replay buffer.
Notes
For prioritized experience replay, code was adapted from https://github.com/openai/baselines/blob/master/baselines/deepq/replay_buffer.py
Examples
>>> import numpy as np
>>> from rlberry.agents.utils import replay
>>> from rlberry.envs import gym_make
>>>
>>> rng = np.random.default_rng()
>>> buffer = replay.ReplayBuffer(100_000, rng)
>>> buffer.setup_entry("observations", np.float32)
>>> buffer.setup_entry("actions", np.uint32)
>>> buffer.setup_entry("rewards", np.float32)
>>>
>>> # Store data in the replay
>>> env = gym_make("CartPole-v1")
>>> for _ in range(500):
>>>     done = False
>>>     obs, info = env.reset()
>>>     while not done:
>>>         action = env.action_space.sample()
>>>         next_observation, reward, terminated, truncated, info = env.step(action)
>>>         done = terminated or truncated
>>>         buffer.append(
>>>             {
>>>                 "observations": obs,
>>>                 "actions": action,
>>>                 "rewards": reward
>>>             }
>>>         )
>>>         obs = next_observation
>>>         if done:
>>>             buffer.end_episode()
>>> # Sample a batch of 32 sub-trajectories of length 100.
>>> # Note: a sub-trajectory may include transitions from more than one episode!
>>> batch = buffer.sample(batch_size=32, chunk_size=100)
>>> for tag in buffer.tags:
>>>     print(tag, batch.data[tag].shape)
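The prioritized variant below is a hedged sketch rather than part of the example above: it reuses the same data-collection loop, only changing the constructor flags and the sampling mode, and the priority values passed to update_priorities (absolute rewards) are purely illustrative.
>>> # Sketch: prioritized experience replay (assumes the same collection loop as above).
>>> prio_buffer = replay.ReplayBuffer(
>>>     100_000, rng, enable_prioritized=True, alpha=0.5, beta=0.5
>>> )
>>> prio_buffer.setup_entry("observations", np.float32)
>>> prio_buffer.setup_entry("actions", np.uint32)
>>> prio_buffer.setup_entry("rewards", np.float32)
>>> # ... fill prio_buffer with environment transitions as in the loop above ...
>>> batch = prio_buffer.sample(batch_size=32, chunk_size=100, sampling_mode="prioritized")
>>> weights = batch.info["weights"]                 # importance sampling weights
>>> new_priorities = np.abs(batch.data["rewards"])  # illustrative priority signal only
>>> prio_buffer.update_priorities(batch.info["indices"], new_priorities)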
Methods
append(data)
Store data.
clear()
Clear data in replay.
end_episode()
Call this method to indicate the end of an episode.
sample(batch_size, chunk_size[, sampling_mode])
Sample a batch.
setup_entry(tag, dtype)
Configure replay buffer to store data.
update_priorities(indices, new_priorities)
Update priorities in the replay buffer.
- append(data)[source]
Store data.
- Parameters:
- data: dict
Dictionary containing scalar values, whose keys must be in self.tags.
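For instance (a minimal sketch, assuming the tags registered via setup_entry in the Examples section and the transition variables obs, action, reward from that loop):
>>> buffer.append({"observations": obs, "actions": action, "rewards": reward})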
- property data
Dict containing all stored data.
- property dtypes
Dict containing the data types for each tag.
- property max_episode_steps
Maximum length of an episode.
- sample(batch_size, chunk_size, sampling_mode='uniform')[source]
Sample a batch.
Data have shape (B, T, …), where B = batch_size and T = chunk_size, representing a batch of sub-trajectories.
- Parameters:
- batch_size: int
Number of sub-trajectories to sample.
- chunk_size: int
Length of each sub-trajectory. A sub-trajectory may include transitions from more than one episode.
- sampling_mode: {"uniform", "prioritized"}, default = "uniform"
“uniform”: sample batch uniformly at random; “prioritized”: use prioritized experience replay (requires enable_prioritized=True in the constructor).
- Returns:
- If the number of stored transitions is smaller than chunk_size, returns None.
- Otherwise, returns a NamedTuple batch where:
batch.data is a dict such that batch.data[tag] is a numpy array containing data stored for a given tag.
batch.info is a dict where batch.info["indices"] contains the indices of the sampled transitions in the buffer, and batch.info["weights"] contains the importance sampling weights associated to the prioritized experience replay.
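A brief usage sketch of the return value, assuming the buffer from the Examples section; the None check covers the case described above:
>>> batch = buffer.sample(batch_size=32, chunk_size=100)
>>> if batch is not None:
>>>     observations = batch.data["observations"]  # numpy array of shape (32, 100, ...)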
- property tags
Tags identifying the entries in the replay buffer.
- update_priorities(indices, new_priorities)[source]
Update priorities in the replay buffer.
- Parameters:
- indices: array of shape (batch, time)
Numpy array containing indices of transitions to be updated. From a sampled batch, you can set it to batch.info["indices"].
- new_priorities: array of shape (batch, time)
Numpy array containing the new priorities. Must have the same shape as the indices array.
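A hedged sketch of a typical priority update, assuming a buffer created with enable_prioritized=True; compute_td_errors is a hypothetical user-defined function, not part of this class:
>>> batch = buffer.sample(batch_size=32, chunk_size=100, sampling_mode="prioritized")
>>> indices = batch.info["indices"]        # shape (batch_size, chunk_size)
>>> td_errors = compute_td_errors(batch)   # hypothetical user-defined function
>>> buffer.update_priorities(indices, np.abs(td_errors))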