A demo of the MBQVI algorithm in the GridWorld environment

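This example shows how to set up and run an MBQVI (Model-Based Q-Value Iteration) agent in rlberry, using the GridWorld environment. The agent is fit on a stochastic GridWorld (success_probability=0.6), and the resulting policy is then rolled out in a deterministic copy of the same grid (success_probability=1.0).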

from rlberry_scool.agents.mbqvi import MBQVIAgent
from rlberry_research.envs.finite import GridWorld

params = {}
params["n_samples"] = 100  # samples per state-action pair
params["gamma"] = 0.99
params["horizon"] = None

env = GridWorld(7, 10, walls=((2, 2), (3, 3)), success_probability=0.6)
agent = MBQVIAgent(env, **params)
info = agent.fit()
print(info)

# evaluate policy in a deterministic version of the environment
env_eval = GridWorld(7, 10, walls=((2, 2), (3, 3)), success_probability=1.0)
env_eval.enable_rendering()
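state, _ = env_eval.reset()  # discard reset info so fit()'s `info` is not overwritten
for tt in range(50):
    # act greedily with respect to the Q-values estimated by MBQVI
    action = agent.policy(state)
    # only the next state is needed for this rollout
    state, _, _, _, _ = env_eval.step(action)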
video = env_eval.save_video("_video/video_plot_mbqvi.mp4")
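
To make clearer what the agent does during fit(), here is a minimal NumPy sketch of the general MBQVI idea: draw n_samples next states per state-action pair from a generative model, build an empirical transition model, then run Q-value iteration on it. This is an illustration under stated assumptions, not rlberry's implementation. The function name mbqvi_sketch, the 2-state toy MDP, the n_iter parameter, and the choice to treat rewards as known are all hypothetical simplifications.

```python
import numpy as np


def mbqvi_sketch(P_true, R, n_samples, gamma, n_iter=500, seed=0):
    """Hypothetical sketch of model-based Q-value iteration.

    P_true: (S, A, S) true transition probabilities, used only to sample.
    R:      (S, A) rewards, assumed known here for simplicity.
    """
    rng = np.random.default_rng(seed)
    S, A, _ = P_true.shape

    # 1) Empirical model: sample n_samples next states per (state, action).
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            next_states = rng.choice(S, size=n_samples, p=P_true[s, a])
            P_hat[s, a] = np.bincount(next_states, minlength=S) / n_samples

    # 2) Q-value iteration on the estimated model (discounted setting).
    Q = np.zeros((S, A))
    for _ in range(n_iter):
        V = Q.max(axis=1)            # greedy state values, shape (S,)
        Q = R + gamma * P_hat @ V    # Bellman optimality backup, shape (S, A)
    return Q, P_hat


# Toy MDP (an assumption for illustration): action 0 tries to stay,
# action 1 tries to switch state; each succeeds with probability 0.9.
P_true = np.array([
    [[0.9, 0.1], [0.1, 0.9]],  # from state 0
    [[0.1, 0.9], [0.9, 0.1]],  # from state 1
])
R = np.array([[0.0, 0.0], [1.0, 1.0]])  # reward 1 for acting in state 1

Q, P_hat = mbqvi_sketch(P_true, R, n_samples=100, gamma=0.99)
policy = Q.argmax(axis=1)  # greedy policy with respect to the estimated Q
print(policy)
```

In this sketch, n_samples and gamma play the same roles as the entries of params above: more samples per state-action pair give a more accurate empirical model, and gamma is the discount factor used in the backup.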
