[RLlib; docs] Docs do-over (new API stack): ConnectorV2 documentation (part II). #54313

Open
wants to merge 35 commits into
base: master
c5dcc11
wip
sven1977 Aug 13, 2024
bf7a10e
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Aug 22, 2024
32b33ec
wip
sven1977 Aug 22, 2024
e7a6f24
wip
sven1977 Aug 23, 2024
30272c1
wip
sven1977 Aug 26, 2024
f097d1a
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Aug 26, 2024
f2a2359
wip
sven1977 Aug 26, 2024
bba9fbb
wip
sven1977 Aug 26, 2024
328e0c1
wip
sven1977 Aug 26, 2024
345ee78
wip
sven1977 Aug 26, 2024
ebdd51b
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Jan 19, 2025
5a822cb
wip
sven1977 Jan 19, 2025
e7468dd
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Jun 11, 2025
2550c03
wip
sven1977 Jun 11, 2025
bc9fb66
LINT
sven1977 Jun 11, 2025
b870d4f
LINT
sven1977 Jun 12, 2025
9df3bdf
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Jun 13, 2025
d513f61
wip
sven1977 Jun 13, 2025
f273b72
wip
sven1977 Jun 17, 2025
7bfb131
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Jun 17, 2025
b96eaa0
wip
sven1977 Jun 17, 2025
8b9a981
wip
sven1977 Jun 18, 2025
bf6a086
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Jun 18, 2025
68f48a6
wip
sven1977 Jun 23, 2025
30bde82
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Jun 25, 2025
f7df27d
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Jun 30, 2025
99da912
wip
sven1977 Jun 30, 2025
b10dbcb
wip
sven1977 Jun 30, 2025
f527e5d
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Jul 2, 2025
b70a520
wip
sven1977 Jul 2, 2025
e65940a
wip
sven1977 Jul 2, 2025
63b577d
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Jul 3, 2025
3e2d792
wip
sven1977 Jul 3, 2025
232b69c
wip
sven1977 Jul 3, 2025
63270c5
catch up with 01 PR.
sven1977 Jul 3, 2025
202 changes: 202 additions & 0 deletions doc/source/rllib/connector-v2.rst
@@ -0,0 +1,202 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. _connector-v2-docs:

ConnectorV2 and ConnectorV2 pipelines
=====================================

.. toctree::
    :hidden:

    env-to-module-connector
    module-to-env-connector
    learner-connector

.. include:: /_includes/rllib/new_api_stack.rst

.. grid:: 1 2 3 4
    :gutter: 1
    :class-container: container pb-3

    .. grid-item-card::
        :img-top: /rllib/images/connector_v2/connector_generic.svg
        :class-img-top: pt-2 w-75 d-block mx-auto fixed-height-img

        .. button-ref:: connector-v2-docs

            ConnectorV2 overview (this page)

    .. grid-item-card::
        :img-top: /rllib/images/connector_v2/env_to_module_connector.svg
        :class-img-top: pt-2 w-75 d-block mx-auto fixed-height-img

        .. button-ref:: env-to-module-pipeline-docs

            Env-to-module pipelines

    .. grid-item-card::
        :img-top: /rllib/images/connector_v2/module_to_env_connector.svg
        :class-img-top: pt-2 w-75 d-block mx-auto fixed-height-img

        .. button-ref:: module-to-env-pipeline-docs

            Module-to-env pipelines

    .. grid-item-card::
        :img-top: /rllib/images/connector_v2/learner_connector.svg
        :class-img-top: pt-2 w-75 d-block mx-auto fixed-height-img

        .. button-ref:: learner-pipeline-docs

            Learner connector pipelines


RLlib stores and transports all trajectory data in the form of :py:class:`~ray.rllib.env.single_agent_episode.SingleAgentEpisode`
or :py:class:`~ray.rllib.env.multi_agent_episode.MultiAgentEpisode` objects.
**Connector pipelines** are the components that translate this episode data into tensor batches
readable by neural network models right before the model forward pass.

.. figure:: images/connector_v2/generic_connector_pipeline.svg
    :width: 1000
    :align: left

    **Generic ConnectorV2 Pipeline**: All pipelines consist of one or more :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` pieces.
    When calling the pipeline, you pass in a list of Episodes, the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` instance,
    and a batch, which initially might be an empty dict.
    Each :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` piece in the pipeline takes its predecessor's output,
    starting on the left side with the batch, performs some transformations on the episodes, the batch, or both, and passes everything
    on to the next piece. Thereby, all :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` pieces can read from and write to the
    provided episodes, add any data from these episodes to the batch, or change the data that's already in the batch.
    The pipeline then returns the output batch of the last piece.

Note that the batch output of the pipeline lives only as long as the succeeding
:py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` forward pass or ``Env.step()`` call. RLlib discards the data afterwards.
The list of episodes, however, may persist longer. For example, if an env-to-module pipeline reads an observation from an episode,
mutates that observation, and then writes it back into the episode, the subsequent module-to-env pipeline is able to see the changed observation.
Also, the Learner pipeline operates on the same episodes that have already passed through both the env-to-module and module-to-env pipelines
and thus might have undergone changes.
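
The following plain-Python sketch illustrates this flow and the different lifetimes of batch and episodes.
It doesn't use any RLlib classes: the dict-based episodes and the functions ``scale_last_obs``,
``add_last_obs_to_batch``, and ``run_pipeline`` are hypothetical stand-ins chosen for illustration only.

.. code-block:: python

    from typing import Any, Callable, Dict, List

    # Hypothetical stand-ins, not RLlib classes: an "episode" is a plain dict
    # that holds a list of observations.
    Episode = Dict[str, List[float]]
    Batch = Dict[str, Any]
    Piece = Callable[..., Batch]


    def scale_last_obs(*, rl_module, batch: Batch, episodes: List[Episode]) -> Batch:
        """Mutates the most recent observation and writes it back into the episode."""
        for episode in episodes:
            episode["obs"][-1] = episode["obs"][-1] * 0.5
        return batch


    def add_last_obs_to_batch(*, rl_module, batch: Batch, episodes: List[Episode]) -> Batch:
        """Copies the most recent observation of each episode into the batch."""
        batch["obs"] = [episode["obs"][-1] for episode in episodes]
        return batch


    def run_pipeline(pieces: List[Piece], *, rl_module, batch: Batch, episodes: List[Episode]) -> Batch:
        """Calls each piece in order; each piece receives its predecessor's batch."""
        for piece in pieces:
            batch = piece(rl_module=rl_module, batch=batch, episodes=episodes)
        return batch


    episodes = [{"obs": [1.0, 2.0]}, {"obs": [3.0, 4.0]}]
    batch = run_pipeline(
        [scale_last_obs, add_last_obs_to_batch],
        rl_module=None,  # Unused in this toy example.
        batch={},  # The batch starts out empty.
        episodes=episodes,
    )

After the call, ``batch`` holds the scaled observations and can be thrown away once consumed, whereas the
change made by ``scale_last_obs`` remains visible in ``episodes`` to whatever processes them next.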


Three ConnectorV2 pipeline types
--------------------------------

There are three different types of connector pipelines in RLlib:

1) :ref:`Env-to-module pipeline <env-to-module-pipeline-docs>`, which creates tensor batches for action computing forward passes.
2) :ref:`Module-to-env pipeline <module-to-env-pipeline-docs>`, which translates a model's output into RL environment actions.
3) :ref:`Learner connector pipeline <learner-pipeline-docs>`, which creates the train batch for a model update.

The :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` API is a powerful tool for
customizing your RLlib experiments and algorithms. It gives you full control over accessing, changing, and re-assembling
the episode data collected from your RL environments or your offline RL input files, and over the exact
nature and shape of the tensor batches that RLlib feeds into your models for computing actions or losses.

.. figure:: images/connector_v2/location_of_connector_pipelines_in_rllib.svg
    :width: 900
    :align: left

    **ConnectorV2 Pipelines**: Connector pipelines convert episodes into batched data, which a neural network can process
    (env-to-module and Learner), or convert your model's output into actions, which your RL environment needs for stepping (module-to-env).
    The env-to-module pipeline, located on an :py:class:`~ray.rllib.env.env_runner.EnvRunner`, takes a list of
    episodes as input and outputs a batch for an :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` forward pass
    that computes the next action. The module-to-env pipeline on the same :py:class:`~ray.rllib.env.env_runner.EnvRunner`
    takes the output of that :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` and converts it into actions
    for the next call to your RL environment's ``step()`` method.
    Lastly, a Learner connector pipeline, located on a :py:class:`~ray.rllib.core.learner.learner.Learner`
    worker, converts a list of episodes into a train batch for the next :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` update.
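
To make the location of the three pipelines more concrete, here is a toy sampling and training loop in plain Python.
Everything in it is a hypothetical stand-in: ``toy_env_step`` replaces a real RL environment, ``toy_forward`` replaces the
:py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` forward pass, and the three pipeline functions only mimic
the role of the real pipelines.

.. code-block:: python

    def toy_env_step(obs, action):
        """Hypothetical 1-D environment: the action moves the observation by +/-1."""
        next_obs = obs + (1.0 if action == 1 else -1.0)
        reward = -abs(next_obs)
        return next_obs, reward


    def toy_forward(batch):
        """Stand-in for a forward pass: returns two logits per observation."""
        return {"logits": [[obs, -obs] for obs in batch["obs"]]}


    def env_to_module(episodes):
        """Builds the forward batch from the most recent observation per episode."""
        return {"obs": [ep["obs"][-1] for ep in episodes]}


    def module_to_env(module_out):
        """Converts model outputs into actions (greedy: index of the larger logit)."""
        return [0 if logits[0] >= logits[1] else 1 for logits in module_out["logits"]]


    def learner_connector(episodes):
        """Builds a flat train batch from all completed timesteps of all episodes."""
        return {
            "obs": [o for ep in episodes for o in ep["obs"][:-1]],
            "actions": [a for ep in episodes for a in ep["actions"]],
            "rewards": [r for ep in episodes for r in ep["rewards"]],
        }


    episodes = [
        {"obs": [2.0], "actions": [], "rewards": []},
        {"obs": [-3.0], "actions": [], "rewards": []},
    ]

    # Sampling: env-to-module -> forward pass -> module-to-env -> env step.
    for _ in range(3):
        actions = module_to_env(toy_forward(env_to_module(episodes)))
        for ep, action in zip(episodes, actions):
            next_obs, reward = toy_env_step(ep["obs"][-1], action)
            ep["obs"].append(next_obs)
            ep["actions"].append(action)
            ep["rewards"].append(reward)

    # Training: the learner pipeline turns the same episodes into a train batch.
    train_batch = learner_connector(episodes)

In this sketch, the first two functions run once per environment step, whereas ``learner_connector`` runs once per update on
the episodes that the sampling loop collected.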

The succeeding pages discuss the three pipeline types in more detail. However, all three share the following properties:

* All connector pipelines are sequences of one or more :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` pieces. You can nest these as well, meaning some of the pieces may be connector pipelines themselves.
* All connector pieces and pipelines are Python callables, overriding the :py:meth:`~ray.rllib.connectors.connector_v2.ConnectorV2.__call__` method.
* The call signatures are uniform across the different pipeline types. The main arguments are the list of episodes, the batch to be built, and the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` instance. See the :py:meth:`~ray.rllib.connectors.connector_v2.ConnectorV2.__call__` method for more details.
* All connector pipelines can read from and write to the provided list of episodes as well as the batch and thereby perform data transforms as required.
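
The following sketch shows why a uniform call signature makes nesting straightforward. The classes ``ToyPiece``,
``ToyPipeline``, ``AddLastValue``, and ``ClipColumn`` are hypothetical and written in plain Python. They only mimic the idea of
callable pieces that accept the episodes, the batch, and the module, and aren't the actual
:py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` implementation.

.. code-block:: python

    class ToyPiece:
        """Hypothetical base class for a callable connector piece."""

        def __call__(self, *, rl_module, batch, episodes, **kwargs):
            raise NotImplementedError


    class AddLastValue(ToyPiece):
        """Adds the most recent value of one episode column to the batch."""

        def __init__(self, column):
            self.column = column

        def __call__(self, *, rl_module, batch, episodes, **kwargs):
            batch[self.column] = [ep[self.column][-1] for ep in episodes]
            return batch


    class ClipColumn(ToyPiece):
        """Clips all values already stored in one batch column."""

        def __init__(self, column, low, high):
            self.column, self.low, self.high = column, low, high

        def __call__(self, *, rl_module, batch, episodes, **kwargs):
            batch[self.column] = [
                min(max(value, self.low), self.high) for value in batch[self.column]
            ]
            return batch


    class ToyPipeline(ToyPiece):
        """A pipeline is itself a piece, so pipelines can contain pipelines."""

        def __init__(self, pieces):
            self.pieces = list(pieces)

        def __call__(self, *, rl_module, batch, episodes, **kwargs):
            for piece in self.pieces:
                batch = piece(rl_module=rl_module, batch=batch, episodes=episodes, **kwargs)
            return batch


    inner = ToyPipeline([AddLastValue("obs"), ClipColumn("obs", -1.0, 1.0)])
    outer = ToyPipeline([inner, AddLastValue("rewards")])

    episodes = [
        {"obs": [0.2, 5.0], "rewards": [1.0, 0.0]},
        {"obs": [-7.0], "rewards": [2.0]},
    ]
    batch = outer(rl_module=None, batch={}, episodes=episodes)

Because ``ToyPipeline`` exposes the same keyword arguments as any other piece, ``outer`` doesn't need to know
whether one of its elements is a single piece or another pipeline.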


Batch construction phases and formats
-------------------------------------

When you push a list of input episodes through a connector pipeline, the batch that the pipeline constructs, which is always a Python dictionary,
passes through different phases and formats on its way through the individual pieces of the pipeline. The following applies to all
:ref:`env-to-module <env-to-module-pipeline-docs>` and :ref:`learner connector <learner-pipeline-docs>` pipelines.

.. figure:: images/connector_v2/pipeline_batch_phases_single_agent.svg
    :width: 1000
    :align: left

    **Batch construction phases and formats**: In the standard single-agent case, where only one ModuleID (``DEFAULT_MODULE_ID``) exists,
    the batch starts as an empty dictionary (left) and then undergoes a "collect data" phase, in which connector pieces add individual items
    to the batch by storing them under a) the column name, for example ``obs`` or ``rewards``, and b) the ID of the episode from which they
    extracted the item.
    In most cases, your custom connector pieces operate during this phase. Once all custom pieces have performed their data insertions and transforms,
    the :py:class:`~ray.rllib.connectors.common.agent_to_module_mapping.AgentToModuleMapping` default piece performs a
    "reorganize by ModuleID" operation, during which the batch's dictionary hierarchy changes to having the ModuleID (``DEFAULT_MODULE_ID``) at
    the top level and the column names thereunder. On the lowest level in the batch, data items still reside in Python lists (middle).
    Finally, the :py:class:`~ray.rllib.connectors.common.batch_individual_items.BatchIndividualItems` default piece creates NumPy arrays
    out of the Python lists, thereby batching all data (right).
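
The three phases can be sketched with nested Python dictionaries. This is a simplified illustration under several
assumptions: the functions ``collect_last_obs``, ``reorganize_by_module_id``, and ``batch_individual_items`` are hypothetical,
the episode ID keys are plain strings, the value of ``DEFAULT_MODULE_ID`` is a placeholder, and a tuple stands in for the NumPy array
of the final phase to keep the example free of third-party dependencies.

.. code-block:: python

    DEFAULT_MODULE_ID = "default_module"  # Placeholder value for this sketch.

    # Hypothetical dict-based episodes, not RLlib episode objects.
    episodes = [
        {"id": "ep_a", "obs": [0.1, 0.2]},
        {"id": "ep_b", "obs": [0.5]},
    ]


    def collect_last_obs(batch, episodes):
        """Phase 1: column name -> episode ID -> list of individual items."""
        for ep in episodes:
            batch.setdefault("obs", {}).setdefault(ep["id"], []).append(ep["obs"][-1])
        return batch


    def reorganize_by_module_id(batch):
        """Phase 2: module ID -> column name -> flat list of individual items."""
        by_module = {DEFAULT_MODULE_ID: {}}
        for column, per_episode in batch.items():
            by_module[DEFAULT_MODULE_ID][column] = [
                item for items in per_episode.values() for item in items
            ]
        return by_module


    def batch_individual_items(batch):
        """Phase 3: turn each list into one batched container (tuple as array stand-in)."""
        return {
            module_id: {column: tuple(items) for column, items in columns.items()}
            for module_id, columns in batch.items()
        }


    phase_1 = collect_last_obs({}, episodes)
    phase_2 = reorganize_by_module_id(phase_1)
    phase_3 = batch_individual_items(phase_2)

Here, ``phase_1`` corresponds to the "collect data" format, ``phase_2`` to the format after the "reorganize by ModuleID" step, and
``phase_3`` to the final batched format.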


For multi-agent setups, where there is more than one ModuleID, note that the
:py:class:`~ray.rllib.connectors.common.agent_to_module_mapping.AgentToModuleMapping` default connector piece makes sure that
the constructed output batch maps module IDs to the respective module's forward batch:

.. figure:: images/connector_v2/pipeline_batch_phases_multi_agent.svg
    :width: 1000
    :align: left

This way, RLlib's :py:class:`~ray.rllib.core.rl_module.multi_rl_module.MultiRLModule` can split up the forward passes into
individual submodules' forward passes using the individual batches under the respective ``ModuleIDs``.
See :ref:`here for how to write your own multi-module or multi-agent forward logic <implementing-custom-multi-rl-modules>`
and override this default behavior of :py:class:`~ray.rllib.core.rl_module.multi_rl_module.MultiRLModule`.
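
As a rough illustration of this default behavior, the following plain-Python sketch first groups per-agent data by module ID and
then runs each submodule only on its own batch. The agent IDs, module IDs, forward functions, and helper names are all hypothetical
and don't correspond to RLlib APIs.

.. code-block:: python

    # Hypothetical mapping from agents to the modules that act for them.
    agent_to_module = {"agent_0": "policy_a", "agent_1": "policy_b", "agent_2": "policy_a"}

    # Observations collected per agent (already extracted from the episodes).
    per_agent_obs = {"agent_0": [1.0], "agent_1": [3.0], "agent_2": [2.0]}


    def map_agents_to_modules(per_agent_obs, agent_to_module):
        """Builds a batch that maps each module ID to that module's forward batch."""
        multi_batch = {}
        for agent_id, obs_items in per_agent_obs.items():
            module_id = agent_to_module[agent_id]
            multi_batch.setdefault(module_id, {"obs": []})["obs"].extend(obs_items)
        return multi_batch


    # Hypothetical submodules: each maps its forward batch to an output dict.
    submodules = {
        "policy_a": lambda batch: {"actions": [obs * 2 for obs in batch["obs"]]},
        "policy_b": lambda batch: {"actions": [-obs for obs in batch["obs"]]},
    }


    def multi_module_forward(multi_batch):
        """Runs each submodule only on the batch stored under its own module ID."""
        return {
            module_id: submodules[module_id](batch)
            for module_id, batch in multi_batch.items()
        }


    multi_batch = map_agents_to_modules(per_agent_obs, agent_to_module)
    outputs = multi_module_forward(multi_batch)

Both ``agent_0`` and ``agent_2`` end up in the forward batch of ``policy_a``, whereas ``policy_b`` only sees the data of ``agent_1``.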


In case you have a stateful :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`, for example an LSTM, RLlib adds two additional
default connector pieces to the pipeline, :py:class:`~ray.rllib.connectors.common.add_time_dim_to_batch_and_zero_pad.AddTimeDimToBatchAndZeroPad`
and :py:class:`~ray.rllib.connectors.common.add_states_from_episodes_to_batch.AddStatesFromEpisodesToBatch`:


.. figure:: images/connector_v2/pipeline_batch_phases_single_agent_w_states.svg
    :width: 1000
    :align: left

    **Batch construction phases and formats for stateful models**: For stateful :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` instances,
    RLlib automatically adds two additional default connector pieces to the pipeline. The
    :py:class:`~ray.rllib.connectors.common.add_time_dim_to_batch_and_zero_pad.AddTimeDimToBatchAndZeroPad` piece converts all lists of individual data
    items on the lowest batch level into sequences of a fixed length (``max_seq_len``, see note below for how to set this) and automatically zero-pads
    these if it encounters an episode end.
    The :py:class:`~ray.rllib.connectors.common.add_states_from_episodes_to_batch.AddStatesFromEpisodesToBatch` piece adds the previously generated
    ``state_out`` values of your :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` under the ``state_in`` column name to the batch. Note that
    RLlib only adds the ``state_in`` values for the first timestep in each sequence and therefore also doesn't add a time dimension to the data in the
    ``state_in`` column.
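
The following sketch illustrates the two ideas with plain Python lists. It assumes right-side zero-padding and a single
episode of five timesteps. The helpers ``split_and_zero_pad`` and ``state_in_per_sequence`` are hypothetical and only
approximate what the two default pieces do.

.. code-block:: python

    def split_and_zero_pad(items, max_seq_len):
        """Chunks a list into rows of length max_seq_len and zero-pads the last row."""
        rows = []
        for start in range(0, len(items), max_seq_len):
            row = items[start:start + max_seq_len]
            rows.append(row + [0.0] * (max_seq_len - len(row)))
        return rows


    def state_in_per_sequence(states, max_seq_len):
        """Keeps only the state that belongs to the first timestep of each sequence."""
        return states[::max_seq_len]


    obs = [1.0, 2.0, 3.0, 4.0, 5.0]  # One episode with five timesteps.
    states = ["s0", "s1", "s2", "s3", "s4"]  # State fed into the model at each timestep.

    padded_obs = split_and_zero_pad(obs, max_seq_len=3)
    state_in = state_in_per_sequence(states, max_seq_len=3)

With ``max_seq_len=3``, the five observations become two rows of length three, the second one padded with a zero, and only
one state per row remains, which is why the ``state_in`` data has no time dimension.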


.. note::

    To change the zero-padded sequence length for the :py:class:`~ray.rllib.connectors.common.add_time_dim_to_batch_and_zero_pad.AddTimeDimToBatchAndZeroPad`
    connector, set ``max_seq_len`` in your config. For custom models:

    .. code-block:: python

        config.rl_module(model_config={"max_seq_len": ...})

    And for RLlib's default models:

    .. code-block:: python

        from ray.rllib.core.rl_module.default_model_config import DefaultModelConfig

        config.rl_module(model_config=DefaultModelConfig(max_seq_len=...))


.. Debugging ConnectorV2 Pipelines
.. ===============================

.. TODO (sven): Move the following to the "how to contribute to RLlib" page and rename that page "how to develop, debug and contribute to RLlib?"

.. You can debug your custom ConnectorV2 pipelines (and any RLlib component in general) through the following simple steps:

.. Run without any remote :py:class:`~ray.rllib.env.env_runner.EnvRunner` workers. After defining your :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig` object, do: `config.env_runners(num_env_runners=0)`.
.. Run without any remote :py:class:`~ray.rllib.core.learner.learner.Learner` workers. After defining your :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig` object, do: `config.learners(num_learners=0)`.
.. Switch off Ray Tune, if applicable. After defining your :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig` object, do: `algo = config.build()`, then `while True: algo.train()`.
.. Set a breakpoint in the ConnectorV2 piece (or any other RLlib component) you would like to debug and start the experiment script in your favorite IDE in debugging mode.

.. .. figure:: images/debugging_rllib_in_ide.png