diff --git a/doc/source/rllib/connector-v2.rst b/doc/source/rllib/connector-v2.rst index d41459a8dda55..d22ba2d551b19 100644 --- a/doc/source/rllib/connector-v2.rst +++ b/doc/source/rllib/connector-v2.rst @@ -97,7 +97,86 @@ The succeeding pages discuss the three pipeline types in more detail, however, a * All connector pipelines can read from and write to the provided list of episodes as well as the batch and thereby perform data transforms as required. +Batch construction phases and formats +------------------------------------- +When you push a list of input episodes through a connector pipeline, the pipeline constructs a batch from the given data. +This batch always starts as an empty Python dictionary and changes format in several phases as it passes through the different +pieces of the pipeline. + +The following applies to all :ref:`env-to-module ` and learner connector pipelines (documentation in progress). + +.. figure:: images/connector_v2/pipeline_batch_phases_single_agent.svg + :width: 1000 + :align: left + + **Batch construction phases and formats**: In the standard single-agent case, where only one ModuleID (``DEFAULT_MODULE_ID``) exists, + the batch starts as an empty dictionary (left) and then undergoes a "collect data" phase, in which connector pieces add individual items + to the batch by storing them under a) the column name, for example ``obs`` or ``rewards``, and b) the ID of the episode from which they extracted + the item. + In most cases, your custom connector pieces operate during this phase. Once all custom pieces have performed their data insertions and transforms, + the :py:class:`~ray.rllib.connectors.common.agent_to_module_mapping.AgentToModuleMapping` default piece performs a + "reorganize by ModuleID" operation (center), during which the batch's dictionary hierarchy changes to having the ModuleID (``DEFAULT_MODULE_ID``) at + the top level and the column names thereunder.
On the lowest level in the batch, data items still reside in Python lists. + Finally, the :py:class:`~ray.rllib.connectors.common.batch_individual_items.BatchIndividualItems` default piece creates NumPy arrays + out of the Python lists, thereby batching all data (right). + + +For multi-agent setups, where there is more than one ModuleID, the +:py:class:`~ray.rllib.connectors.common.agent_to_module_mapping.AgentToModuleMapping` default connector piece makes sure that +the constructed output batch maps module IDs to the respective module's forward batch: + +.. figure:: images/connector_v2/pipeline_batch_phases_multi_agent.svg + :width: 1100 + :align: left + + **Batch construction for multi-agent**: In a multi-agent setup, the default :py:class:`~ray.rllib.connectors.common.agent_to_module_mapping.AgentToModuleMapping` + connector piece reorganizes the batch by ``ModuleID``, then column names, such that a + :py:class:`~ray.rllib.core.rl_module.multi_rl_module.MultiRLModule` can loop through its sub-modules and provide each with a batch + for the forward pass. + +RLlib's :py:class:`~ray.rllib.core.rl_module.multi_rl_module.MultiRLModule` can split up the forward passes into +individual submodules' forward passes using the individual batches under the respective ``ModuleIDs``. +See :ref:`here for how to write your own multi-module or multi-agent forward logic ` +and override this default behavior of :py:class:`~ray.rllib.core.rl_module.multi_rl_module.MultiRLModule`. + + +Finally, if you have a stateful :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`, for example an LSTM, RLlib adds two additional +default connector pieces to the pipeline, :py:class:`~ray.rllib.connectors.common.add_time_dim_to_batch_and_zero_pad.AddTimeDimToBatchAndZeroPad` +and :py:class:`~ray.rllib.connectors.common.add_states_from_episodes_to_batch.AddStatesFromEpisodesToBatch`: + + +..
figure:: images/connector_v2/pipeline_batch_phases_single_agent_w_states.svg + :width: 900 + :align: left + + **Batch construction for stateful models**: For stateful :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` instances, + RLlib automatically adds two additional default connector pieces to the pipeline. The + :py:class:`~ray.rllib.connectors.common.add_time_dim_to_batch_and_zero_pad.AddTimeDimToBatchAndZeroPad` piece converts all lists of individual data + items on the lowest batch level into sequences of a fixed length (``max_seq_len``, see note below for how to set this) and automatically zero-pads + these if it encounters an episode end. + The :py:class:`~ray.rllib.connectors.common.add_states_from_episodes_to_batch.AddStatesFromEpisodesToBatch` piece adds the previously generated + ``state_out`` values of your :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` under the ``state_in`` column name to the batch. Note that + RLlib only adds the ``state_in`` values for the first timestep in each sequence and therefore also doesn't add a time dimension to the data in the + ``state_in`` column. + + +.. note:: + + To change the zero-padded sequence length for the :py:class:`~ray.rllib.connectors.common.add_time_dim_to_batch_and_zero_pad.AddTimeDimToBatchAndZeroPad` + connector, set the following in your config for custom models: + + .. code-block:: python + + config.rl_module(model_config={"max_seq_len": ...}) + + And for RLlib's default models: + + .. code-block:: python + + from ray.rllib.core.rl_module.default_model_config import DefaultModelConfig + + config.rl_module(model_config=DefaultModelConfig(max_seq_len=...)) .. Debugging ConnectorV2 Pipelines diff --git a/doc/source/rllib/env-to-module-connector.rst b/doc/source/rllib/env-to-module-connector.rst index f804716c4ec08..19d8f0d3ca341 100644 --- a/doc/source/rllib/env-to-module-connector.rst +++ b/doc/source/rllib/env-to-module-connector.rst @@ -1,13 +1,33 @@ ..
include:: /_includes/rllib/we_are_hiring.rst -.. include:: /_includes/rllib/new_api_stack.rst - - .. _env-to-module-pipeline-docs: Env-to-module pipelines ======================= +.. include:: /_includes/rllib/new_api_stack.rst + +.. grid:: 1 2 3 4 + :gutter: 1 + :class-container: container pb-3 + + .. grid-item-card:: + :img-top: /rllib/images/connector_v2/connector_generic.svg + :class-img-top: pt-2 w-75 d-block mx-auto fixed-height-img + + .. button-ref:: connector-v2-docs + + ConnectorV2 overview + + .. grid-item-card:: + :img-top: /rllib/images/connector_v2/env_to_module_connector.svg + :class-img-top: pt-2 w-75 d-block mx-auto fixed-height-img + + .. button-ref:: env-to-module-pipeline-docs + + Env-to-module pipelines (this page) + + One env-to-module pipeline resides on each :py:class:`~ray.rllib.env.env_runner.EnvRunner` and is responsible for handling the data flow from the `gymnasium.Env `__ to the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`. @@ -84,10 +104,21 @@ use the following code snippet as a starting point: Alternatively, in case there is no ``env`` object available, you should pass in the ``spaces`` argument instead. -RLlib requires these pieces of information to compute the correct output observation space, so that the -:py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` can receive the correct space for its own setup procedure. +RLlib requires either of these pieces of information to compute the correct output observation space of the pipeline, so that the +:py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` can receive the correct input space for its own setup procedure. +The structure of the `spaces` argument should ideally be: -:ref:`See here for the expected format of the spaces arg `. +.. code-block:: python + + spaces = { + "__env__": ([env observation space], [env action space]), # <- may be vectorized + "__env_single__": ([env observation space], [env action space]), # <- never vectorized! + "[module ID, e.g. 
'default_policy']": ([module observation space], [module action space]), + ... # <- more modules in multi-agent case + } + +However, for single-agent cases, it may be enough to provide the non-vectorized, single observation- +and action spaces only: .. testcode:: @@ -125,7 +156,7 @@ for stateless- and stateful :py:class:`~ray.rllib.core.rl_module.rl_module.RLMod action = 0 obs, _, _, _, _ = env.step(action) episode1.add_env_step(observation=obs, action=action, reward=1.0) - # - episode 2 (just do one timestep) + # - episode 2 (just one timestep) obs, _ = env.reset() episode2.add_env_reset(observation=obs) @@ -179,7 +210,7 @@ for stateless- and stateful :py:class:`~ray.rllib.core.rl_module.rl_module.RLMod You can see that the pipeline extracted the current observations from the two running episodes and placed them under the ``obs`` column into the forward batch. -The batch has a size of 2, because we had 2 episodes, and should look similar to this: +The batch has a size of two, because we had two episodes, and should look similar to this: .. code-block:: text @@ -226,19 +257,6 @@ RLlib prepends the provided :py:class:`~ray.rllib.connectors.connector_v2.Connec unless you set `add_default_connectors_to_env_to_module_pipeline=False` in your config, in which case RLlib exclusively uses the provided :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` pieces without any automatically added default behavior. -.. _env-to-module-connectors-structure-of-spaces-arg: - -Note that RLlib expects the structure of the `spaces` argument to be: - -.. code-block:: python - - spaces = { - "__env__": ([env observation space], [env action space]), # <- may be vectorized - "__env_single__": ([env observation space], [env action space]), # <- never vectorized! - "[module ID, e.g. 'default_policy']": ([module observation space], [module action space]), - ... 
# <- more modules in multi-agent case - } - For example, to prepend a custom ConnectorV2 piece to the env-to-module pipeline, you can do this in your config: .. testcode:: diff --git a/doc/source/rllib/images/connector_v2/pipeline_batch_phases_multi_agent.svg b/doc/source/rllib/images/connector_v2/pipeline_batch_phases_multi_agent.svg new file mode 100644 index 0000000000000..75c2fb7bd692c --- /dev/null +++ b/doc/source/rllib/images/connector_v2/pipeline_batch_phases_multi_agent.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/doc/source/rllib/images/connector_v2/pipeline_batch_phases_single_agent.svg b/doc/source/rllib/images/connector_v2/pipeline_batch_phases_single_agent.svg new file mode 100644 index 0000000000000..935a962fa1c3b --- /dev/null +++ b/doc/source/rllib/images/connector_v2/pipeline_batch_phases_single_agent.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/doc/source/rllib/images/connector_v2/pipeline_batch_phases_single_agent_w_states.svg b/doc/source/rllib/images/connector_v2/pipeline_batch_phases_single_agent_w_states.svg new file mode 100644 index 0000000000000..ad86f4bd5d625 --- /dev/null +++ b/doc/source/rllib/images/connector_v2/pipeline_batch_phases_single_agent_w_states.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/doc/source/rllib/rl-modules.rst b/doc/source/rllib/rl-modules.rst index fa1863da514e4..c8e27ee7c21f6 100644 --- a/doc/source/rllib/rl-modules.rst +++ b/doc/source/rllib/rl-modules.rst @@ -791,6 +791,8 @@ You implement the main action sampling logic in the ``_forward_...()`` methods: +.. _implementing-custom-multi-rl-modules: + Implementing custom MultiRLModules ---------------------------------- diff --git a/doc/source/rllib/user-guides.rst b/doc/source/rllib/user-guides.rst index baa34d5b9496b..a3fef19df8c2a 100644 --- a/doc/source/rllib/user-guides.rst +++ b/doc/source/rllib/user-guides.rst @@ -80,7 +80,7 @@ RLlib Feature Guides .. 
button-ref:: connector-v2 - How To Use Connectors and Connector pipelines (new API stack)? + How To Use Connectors and Connector pipelines? .. grid-item-card:: :img-top: /rllib/images/rllib-logo.svg