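A demo of the MBQVI algorithm in the GridWorld environment
This example illustrates how to set up the MBQVI (Model-Based Q-Value Iteration) agent in rlberry and train it on the GridWorld environment. The agent builds an empirical model of the environment from `n_samples` sampled transitions per state-action pair and then runs value iteration on that model.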
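To make the two stages concrete, here is a minimal NumPy sketch of the model-based Q-value iteration idea on a hypothetical two-state MDP. It is not rlberry's implementation: the function names `estimate_transitions` and `q_value_iteration` are made up for illustration, the true transition array `P` stands in for a generative model that is only sampled from, and for simplicity the rewards `R` are assumed known so that only transitions are estimated.

```python
import numpy as np


def estimate_transitions(P, n_samples, rng):
    """Empirical transition model from n_samples draws per (s, a).

    P (shape S x A x S) stands in for a generative model: it is only
    used to sample next states, never read directly by the planner.
    """
    S, A, _ = P.shape
    P_hat = np.zeros_like(P)
    for s in range(S):
        for a in range(A):
            next_states = rng.choice(S, size=n_samples, p=P[s, a])
            # empirical frequency of each next state
            P_hat[s, a] = np.bincount(next_states, minlength=S) / n_samples
    return P_hat


def q_value_iteration(P, R, gamma, tol=1e-8, max_iter=10_000):
    """Discounted Q-value iteration on a (possibly estimated) model."""
    Q = np.zeros_like(R)
    for _ in range(max_iter):
        V = Q.max(axis=1)  # V(s') = max_a' Q(s', a')
        Q_new = R + gamma * (P @ V)  # (S, A, S) @ (S,) -> (S, A)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q


rng = np.random.default_rng(0)

# Hypothetical toy MDP: state 1 is absorbing and pays reward 1 per step;
# in state 0, action 1 reaches state 1 with probability 0.6, action 0 stays.
P = np.zeros((2, 2, 2))
P[0, 0] = [1.0, 0.0]
P[0, 1] = [0.4, 0.6]
P[1, :] = [0.0, 1.0]
R = np.array([[0.0, 0.0], [1.0, 1.0]])

P_hat = estimate_transitions(P, n_samples=100, rng=rng)
Q = q_value_iteration(P_hat, R, gamma=0.99)
policy = Q.argmax(axis=1)  # greedy policy w.r.t. the planned Q-values
print(policy)
```

Planning happens entirely on `P_hat`, so larger `n_samples` gives a more accurate model at the cost of more environment queries; this is the trade-off controlled by the `n_samples` parameter in the rlberry example.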
```python
from rlberry_scool.agents.mbqvi import MBQVIAgent
from rlberry_research.envs.finite import GridWorld

# MBQVI hyperparameters
params = {}
params["n_samples"] = 100  # samples per state-action pair
params["gamma"] = 0.99  # discount factor
params["horizon"] = None  # no finite horizon (discounted setting)

# 7 x 10 grid with two wall cells; each action succeeds with probability 0.6
env = GridWorld(7, 10, walls=((2, 2), (3, 3)), success_probability=0.6)
agent = MBQVIAgent(env, **params)
info = agent.fit()
print(info)

# evaluate the policy in a deterministic version of the environment
env_eval = GridWorld(7, 10, walls=((2, 2), (3, 3)), success_probability=1.0)
env_eval.enable_rendering()
state, info = env_eval.reset()
for _ in range(50):
    action = agent.policy(state)
    state, _, _, _, _ = env_eval.step(action)

# write the rendered trajectory to an MP4 file
video = env_eval.save_video("_video/video_plot_mbqvi.mp4")
```
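The evaluation loop above discards the reward and termination flags returned by `env_eval.step`, which is enough for recording a video. When a numeric score is wanted, a variant can accumulate reward and stop at the end of the episode. The sketch below assumes only the Gymnasium-style API the loop already relies on (`reset()` returning `(state, info)` and `step()` returning a five-tuple); `evaluate_policy` and the toy `ChainEnv` are hypothetical names used for illustration. With rlberry, one could pass `env_eval` and `agent.policy` instead.

```python
def evaluate_policy(env, policy, n_steps=50):
    """Roll out `policy` for up to n_steps and return the total reward.

    Assumes a Gymnasium-style env: reset() -> (obs, info) and
    step(a) -> (obs, reward, terminated, truncated, info).
    """
    state, _ = env.reset()
    total_reward = 0.0
    for _ in range(n_steps):
        action = policy(state)
        state, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        if terminated or truncated:  # stop once the episode ends
            break
    return total_reward


class ChainEnv:
    """Hypothetical 1-D chain: states 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self, seed=None):
        self.state = 0
        return self.state, {}

    def step(self, action):
        move = 1 if action == 1 else -1  # action 1 = right, otherwise left
        self.state = min(max(self.state + move, 0), self.length - 1)
        terminated = self.state == self.length - 1
        reward = 1.0 if terminated else 0.0
        return self.state, reward, terminated, False, {}


score = evaluate_policy(ChainEnv(), policy=lambda s: 1)
print(score)
```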