.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/demo_bandits/plot_ucb_bandit.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_demo_bandits_plot_ucb_bandit.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_demo_bandits_plot_ucb_bandit.py:


=============================
UCB Bandit cumulative regret
=============================

This script shows how to define a bandit environment and a UCB index-based algorithm, then plots the algorithm's cumulative pseudo-regret.
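
As a rough illustration of what an index-based policy does, the sketch below implements a textbook UCB rule for sigma-subgaussian rewards in plain NumPy, without rlberry. The helper names ``ucb_index`` and ``run_ucb`` are hypothetical, and the exploration bonus ``sigma * sqrt(2 * log(t) / n)`` is an assumption chosen for illustration; it may differ from the index that ``makeSubgaussianUCBIndex`` actually builds.

.. code-block:: python3

    import numpy as np


    def ucb_index(sum_rewards, n_pulls, t, sigma=1.0):
        """Empirical mean plus an exploration bonus (hypothetical helper)."""
        mean = sum_rewards / n_pulls
        # Assumed textbook bonus; not necessarily the one used by rlberry.
        bonus = sigma * np.sqrt(2.0 * np.log(t) / n_pulls)
        return mean + bonus


    def run_ucb(means, stds, horizon, sigma=1.0, seed=0):
        """Play `horizon` rounds on a Gaussian bandit and return the pulled arms."""
        rng = np.random.default_rng(seed)
        n_arms = len(means)
        sum_rewards = np.zeros(n_arms)
        n_pulls = np.zeros(n_arms)
        actions = np.empty(horizon, dtype=int)
        for t in range(horizon):
            if t < n_arms:
                arm = t  # pull each arm once so every index is well defined
            else:
                arm = int(np.argmax(ucb_index(sum_rewards, n_pulls, t + 1, sigma)))
            reward = rng.normal(means[arm], stds[arm])
            sum_rewards[arm] += reward
            n_pulls[arm] += 1
            actions[t] = arm
        return actions


    # Same arm means, standard deviations, and horizon as the example on this page.
    actions = run_ucb(np.array([0.0, 0.9, 1.0]), 2 * np.ones(3), horizon=3000, sigma=2.0)
    pull_counts = np.bincount(actions, minlength=3)

Here ``pull_counts`` holds the number of times each arm was played in a single run. The example on this page averages over several such runs instead of relying on one.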

.. GENERATED FROM PYTHON SOURCE LINES 8-70


.. image-sg:: /auto_examples/demo_bandits/images/sphx_glr_plot_ucb_bandit_001.png
   :alt: Cumulative Pseudo-Regret
   :srcset: /auto_examples/demo_bandits/images/sphx_glr_plot_ucb_bandit_001.png
   :class: sphx-glr-single-img


.. code-block:: python3


    import numpy as np
    from rlberry_research.envs.bandits import NormalBandit
    from rlberry_research.agents.bandits import IndexAgent, makeSubgaussianUCBIndex
    from rlberry.manager import ExperimentManager, plot_writer_data
    import matplotlib.pyplot as plt


    # Agent definition


    class UCBAgent(IndexAgent):
        """UCB agent for sigma-subgaussian bandits"""

        name = "UCB Agent"

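        def __init__(self, env, sigma=1, **kwargs):
            index, _ = makeSubgaussianUCBIndex(sigma)
            # writer_extra="action" logs the chosen actions, which are used
            # below to compute the pseudo-regret.
            IndexAgent.__init__(self, env, index, writer_extra="action", **kwargs)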


    # Parameters of the problem
    means = np.array([0, 0.9, 1])  # means of the arms
    T = 3000  # horizon (fit budget of each run)
    M = 20  # number of independent Monte Carlo runs

    # Construction of the experiment

    env_ctor = NormalBandit
    env_kwargs = {"means": means, "stds": 2 * np.ones(len(means))}

    xp_manager = ExperimentManager(
        UCBAgent,
        (env_ctor, env_kwargs),
        fit_budget=T,
        init_kwargs={"sigma": 2},
        n_fit=M,
        # Process-based parallelism with the "fork" start method, which should
        # allow parallel runs even in notebooks ("fork" is unavailable on Windows).
        parallelization="process",
        mp_context="fork",
    )


    # Agent training

    xp_manager.fit()


    # Compute and plot (pseudo-)regret
    def compute_pseudo_regret(actions):
        """Cumulative gap between the best arm's mean and the pulled arms' means."""
        return np.cumsum(np.max(means) - means[actions.astype(int)])


    fig = plt.figure(1, figsize=(5, 3))
    ax = plt.gca()
    output = plot_writer_data(
        [xp_manager],
        tag="action",
        preprocess_func=compute_pseudo_regret,
        title="Cumulative Pseudo-Regret",
        ax=ax,
    )
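
To make the pseudo-regret computation concrete, here is a small worked example using the same formula as ``compute_pseudo_regret``. The short ``actions`` array is made up for illustration; it is stored as floats on the assumption that logged actions may come back as floats, which is why the function casts them with ``astype(int)``.

.. code-block:: python3

    import numpy as np

    means = np.array([0, 0.9, 1])  # same arm means as in the example
    actions = np.array([0.0, 1.0, 2.0, 2.0, 1.0])  # made-up sequence of pulled arms

    # Per-round gap between the best mean and the mean of the pulled arm.
    gaps = np.max(means) - means[actions.astype(int)]
    # Cumulative pseudo-regret: approximately [1.0, 1.1, 1.1, 1.1, 1.2]
    pseudo_regret = np.cumsum(gaps)

Pulling the best arm (index 2) adds nothing to the pseudo-regret, so the curve flattens as the agent settles on the best arm.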


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 4.882 seconds)


.. _sphx_glr_download_auto_examples_demo_bandits_plot_ucb_bandit.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_ucb_bandit.py <plot_ucb_bandit.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_ucb_bandit.ipynb <plot_ucb_bandit.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_