.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_agent_manager.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_agent_manager.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_agent_manager.py:

============================
A demo of ExperimentManager
============================

In this example, we use the ExperimentManager class.

First, we initialize a grid world environment with a finite state space and a
finite action space. A grid world is a simple environment on which we can test
simple algorithms: its reward function can be accessed through
``env.R[state, action]`` and its transition probabilities through
``env.P[state, action, next_state]``.

Then, we implement a value iteration algorithm for the action values:

.. math::

    Q(s, a) \leftarrow \sum_{s'} p(s' \mid s, a)\left( R(s, a) + \gamma \max_{a'} Q(s', a') \right).

Finally, we compare it with a baseline given by a random policy, using the
ExperimentManager class, which trains, evaluates and gathers statistics about
the two agents.

.. GENERATED FROM PYTHON SOURCE LINES 19-129

.. image-sg:: /auto_examples/images/sphx_glr_plot_agent_manager_001.png
   :alt: plot agent manager
   :srcset: /auto_examples/images/sphx_glr_plot_agent_manager_001.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [INFO] 13:52: ... trained!
    [INFO] 13:52: Saved ExperimentManager(ValueIterationAgent) using pickle.
    [INFO] 13:52: The ExperimentManager was saved in : 'rlberry_data/temp/manager_data/ValueIterationAgent_2025-03-07_13-52-16_d9059838/manager_obj.pickle'
    [INFO] 13:52: Running ExperimentManager fit() for RandomAgent with n_fit = 1 and max_workers = None.
    [INFO] 13:52: ... trained!
    [INFO] 13:52: Saved ExperimentManager(RandomAgent) using pickle.
    [INFO] 13:52: The ExperimentManager was saved in : 'rlberry_data/temp/manager_data/RandomAgent_2025-03-07_13-52-17_4771091c/manager_obj.pickle'
    [INFO] 13:52: Evaluating ValueIterationAgent...
    [INFO] 13:52: Computing 10 evaluations.
    [INFO] Evaluation:.......... Evaluation finished
    [INFO] 13:52: Evaluating RandomAgent...
    [INFO] 13:52: Computing 10 evaluations.
    [INFO] Evaluation:.......... Evaluation finished

|

.. code-block:: python3


    from rlberry_research.envs import GridWorld

    # Create a grid world environment and an agent with a value iteration policy
    env_ctor = GridWorld
    env_kwargs = dict(
        nrows=3,
        ncols=10,
        reward_at={(1, 1): 0.1, (2, 9): 1.0},
        walls=((1, 4), (2, 4), (1, 5)),
        success_probability=0.9,
    )

    env = env_ctor(**env_kwargs)

    import numpy as np

    from rlberry.agents import AgentWithSimplePolicy


    class ValueIterationAgent(AgentWithSimplePolicy):
        name = "ValueIterationAgent"

        def __init__(
            self, env, gamma=0.99, epsilon=1e-5, **kwargs
        ):  # it's important to put **kwargs to ensure compatibility with the base class
            """
            gamma: discount factor
            epsilon: precision of value iteration
            """
            AgentWithSimplePolicy.__init__(
                self, env, **kwargs
            )  # self.env is initialized in the base class

            self.gamma = gamma
            self.epsilon = epsilon
            self.Q = None  # Q function to be computed in fit()

        def fit(self, budget=None, **kwargs):
            """
            Run value iteration.
            """
            env = self.env  # use the agent's own environment, set by the base class
            S, A = env.observation_space.n, env.action_space.n
            Q = np.zeros((S, A))
            V = np.zeros(S)

            while True:
                TQ = np.zeros((S, A))
                for ss in range(S):
                    for aa in range(A):
                        TQ[ss, aa] = env.R[ss, aa] + self.gamma * env.P[ss, aa, :].dot(V)
                V = TQ.max(axis=1)

                if np.abs(TQ - Q).max() < self.epsilon:
                    break
                Q = TQ
            self.Q = Q

        def policy(self, observation):
            return self.Q[observation, :].argmax()

        @classmethod
        def sample_parameters(cls, trial):
            """
            Sample hyperparameters for hyperparameter optimization using
            Optuna (https://optuna.org/)
            """
            gamma = trial.suggest_categorical("gamma", [0.1, 0.25, 0.5, 0.75, 0.99])
            return {"gamma": gamma}


    # Create a random agent as a baseline
    class RandomAgent(AgentWithSimplePolicy):
        name = "RandomAgent"

        def __init__(self, env, **kwargs):
            AgentWithSimplePolicy.__init__(self, env, **kwargs)

        def fit(self, budget=None, **kwargs):
            pass

        def policy(self, observation):
            return self.env.action_space.sample()


    from rlberry.manager import ExperimentManager, evaluate_agents

    # Define parameters
    vi_params = {"gamma": 0.1, "epsilon": 1e-3}

    # Create ExperimentManager to fit 4 instances of the agent using 1 job
    vi_stats = ExperimentManager(
        ValueIterationAgent,
        (env_ctor, env_kwargs),
        fit_budget=0,
        eval_kwargs=dict(eval_horizon=20),
        init_kwargs=vi_params,
        n_fit=4,
    )
    vi_stats.fit()

    # Create ExperimentManager for the random baseline
    baseline_stats = ExperimentManager(
        RandomAgent,
        (env_ctor, env_kwargs),
        fit_budget=0,
        eval_kwargs=dict(eval_horizon=20),
        n_fit=1,
    )
    baseline_stats.fit()

    # Compare the policies using 10 Monte Carlo simulations
    output = evaluate_agents([vi_stats, baseline_stats], n_simulations=10)

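The figure above is produced by ``evaluate_agents``. As a minimal sketch (not
part of the generated script, and assuming that ``evaluate_agents`` returns a
pandas DataFrame with one column of evaluation values per agent, as in recent
rlberry versions), the result stored in ``output`` can also be inspected or
re-plotted by hand:

.. code-block:: python3

    # Sketch only: inspect and re-plot the evaluations, assuming `output`
    # is a pandas DataFrame with one column per agent.
    import matplotlib.pyplot as plt

    print(output.describe())  # summary statistics of the 10 evaluations per agent

    output.boxplot()  # distribution of cumulative rewards per agent
    plt.ylabel("evaluation (cumulative reward over eval_horizon=20 steps)")
    plt.show()
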
""" S, A = env.observation_space.n, env.action_space.n Q = np.zeros((S, A)) V = np.zeros(S) while True: TQ = np.zeros((S, A)) for ss in range(S): for aa in range(A): TQ[ss, aa] = env.R[ss, aa] + self.gamma * env.P[ss, aa, :].dot(V) V = TQ.max(axis=1) if np.abs(TQ - Q).max() < self.epsilon: break Q = TQ self.Q = Q def policy(self, observation): return self.Q[observation, :].argmax() @classmethod def sample_parameters(cls, trial): """ Sample hyperparameters for hyperparam optimization using Optuna (https://optuna.org/) """ gamma = trial.suggest_categorical("gamma", [0.1, 0.25, 0.5, 0.75, 0.99]) return {"gamma": gamma} # Create random agent as a baseline class RandomAgent(AgentWithSimplePolicy): name = "RandomAgent" def __init__(self, env, **kwargs): AgentWithSimplePolicy.__init__(self, env, **kwargs) def fit(self, budget=None, **kwargs): pass def policy(self, observation): return self.env.action_space.sample() from rlberry.manager import ExperimentManager, evaluate_agents # Define parameters vi_params = {"gamma": 0.1, "epsilon": 1e-3} # Create ExperimentManager to fit 4 agents using 1 job vi_stats = ExperimentManager( ValueIterationAgent, (env_ctor, env_kwargs), fit_budget=0, eval_kwargs=dict(eval_horizon=20), init_kwargs=vi_params, n_fit=4, ) vi_stats.fit() # Create ExperimentManager for baseline baseline_stats = ExperimentManager( RandomAgent, (env_ctor, env_kwargs), fit_budget=0, eval_kwargs=dict(eval_horizon=20), n_fit=1, ) baseline_stats.fit() # Compare policies using 10 Monte Carlo simulations output = evaluate_agents([vi_stats, baseline_stats], n_simulations=10) .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 1.229 seconds) .. _sphx_glr_download_auto_examples_plot_agent_manager.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_agent_manager.py ` .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_agent_manager.ipynb ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_