tensorflow
diff --git a/‎official/vision/docs/customize_input_pipeline.md
Lines changed: 370 additions & 0 deletions b/‎official/vision/docs/customize_input_pipeline.md
Lines changed: 370 additions & 0 deletions
@@ -0,0 +1,370 @@
+# Customize Input Pipeline
+
+
+
+
+
+
+## Overview
+
+
+A task is a class that encapsulates the logic of loading data, building models,
+performing one-step training and validation, etc. It connects all components
+together and is called by the base
+[Trainer](https://github.com/tensorflow/models/blob/master/official/core/base_trainer.py).
+You can create your own task by inheriting from base
+[Task](https://github.com/tensorflow/models/blob/master/official/core/base_task.py),
+or from one of the
+[tasks](https://github.com/tensorflow/models/tree/master/official/vision/tasks)
+we already defined, if most of the operations can be reused. An `ExampleTask`
+inheriting from
+[ImageClassificationTask](https://github.com/tensorflow/models/blob/master/official/vision/tasks/image_classification.py#L31)
+can be found
+[here](https://github.com/tensorflow/models/blob/master/official/vision/examples/starter/example_task.py).
+
+
+In a task class, the `build_inputs` method is responsible for building the input
+pipeline for training and evaluation. Specifically, it will instantiate a
+Decoder object and a Parser object, which are used to create an `InputReader`
+that will generate a `tf.data.Dataset` object.
+
+
+Here's an example code snippet that demonstrates how to create a custom
+`build_inputs` method:
+
+
+```python
+def build_inputs(
+    self,
+    params: exp_cfg.DataConfig,
+    input_context: Optional[tf.distribute.InputContext] = None
+) -> tf.data.Dataset:
+  ....
+
+
+  decoder = sample_input.Decoder()
+  parser = sample_input.Parser(
+      output_size=..., num_classes=...)
+  reader = input_reader_factory.input_reader_generator(
+      params,
+      dataset_fn=dataset_fn.pick_dataset_fn(params.file_type),
+      decoder_fn=decoder.decode,
+      parser_fn=parser.parse_fn(params.is_training))
+      ....
+
+
+  dataset = reader.read(input_context=input_context)
+  return dataset
+```
+
+
+The class being responsible for building the input pipeline is
+[InputReader](https://github.com/tensorflow/models/blob/b1a7752c5137822a32bd0dd70a0cb96e807ea411/official/core/input_reader.py#L214)
+with interface
+
+```python
+class InputReader:
+  """Input reader that returns a tf.data.Dataset instance."""
+
+  def __init__(
+      self,
+      params: cfg.DataConfig,
+      dataset_fn=tf.data.TFRecordDataset,
+      decoder_fn: Optional[Callable[..., Any]] = None,
+      combine_fn: Optional[Callable[..., Any]] = None,
+      sample_fn: Optional[Callable[..., Any]] = None,
+      parser_fn: Optional[Callable[..., Any]] = None,
+      filter_fn: Optional[Callable[..., tf.Tensor]] = None,
+      transform_and_batch_fn: Optional[
+          Callable[
+              [tf.data.Dataset, Optional[tf.distribute.InputContext]],
+              tf.data.Dataset,
+          ]
+      ] = None,
+      postprocess_fn: Optional[Callable[..., Any]] = None,
+  ):
+  ....
+
+  def read(self,
+            input_context: Optional[tf.distribute.InputContext] = None,
+            dataset: Optional[tf.data.Dataset] = None) -> tf.data.Dataset:
+      """Generates a tf.data.Dataset object."""
+      if dataset is None:
+        dataset = self._read_data_source(self._matched_files, self._dataset_fn,
+                                        input_context)
+      dataset = self._decode_and_parse_dataset(dataset, self._global_batch_size,
+                                              input_context)
+      dataset = _maybe_map_fn(dataset, self._postprocess_fn)
+      if not (self._enable_shared_tf_data_service_between_parallel_trainers and
+              self._apply_tf_data_service_before_batching):
+        dataset = self._maybe_apply_data_service(dataset, input_context)
+
+      if self._deterministic is not None:
+        options = tf.data.Options()
+        options.deterministic = self._deterministic
+        dataset = dataset.with_options(options)
+      if self._autotune_algorithm:
+        options = tf.data.Options()
+        options.autotune.autotune_algorithm = (
+            tf.data.experimental.AutotuneAlgorithm[self._autotune_algorithm])
+        dataset = dataset.with_options(options)
+      return dataset.prefetch(self._prefetch_buffer_size)
+```
+
+Therefore, customizing the input pipeline is equivalent to having customized
+versions of `dataset_fn`, `decoder_fn`, etc. The execution order is generally
+as:
+
+```
+dataset_fn -> decoder_fn -> combine_fn -> parser_fn -> filter_fn ->
+transform_and_batch_fn -> postprocess_fn
+```
+
+The `transform_and_batch_fn` is an optional function that merges multiple
+examples into a batch and its default behavior to `dataset.batch` if not
+specified. In this workflow, the functions before `transform_and_batch_fn`, e.g.
+`dataset_fn`, `decoder_fn`, consume tensors without the batch dimension, while
+`postprocess_fn` will consume tensors with the batch dimension.
+
+We have essentially covered
+[decoder_fn](https://github.com/tensorflow/models/blob/master/official/vision/docs/read_custom_datasets.md#decoder),
+and `parser_fn` is another very important one that takes the decoded raw tensors
+dict and parses them into a dictionary of tensors that can be consumed by the
+model. It will be executed after decoder_fn.
+
+It is also worth noting that optimizing of the input pipeline through
+batching, shuffling and prefetching is also implemented in this class.
+
+## Parser
+
+A custom data loader can also be useful if you want to take advantage of
+features such as data augmentation.
+
+Customizing preprocessing is useful because it allows the user to tailor the
+preprocessing steps to suit the specific requirements of the task. While there
+are standard preprocessing techniques that are commonly used, different
+applications may require different preprocessing steps. Additionally, custom
+preprocessing can also improve the efficiency and accuracy of the model by
+removing unnecessary steps, reducing computational resources or adding steps
+that are important to the specific task being addressed.
+
+For example, tasks such as object detection or segmentation may require
+additional preprocessing steps such as resizing, cropping, or data augmentation
+to improve the robustness of the model. Below are some essential steps to
+customize a parser.
+
+### Instructions
+
+*   **Create a Subclass**
+    <br>
+
+<dd><dl>
+
+ Like Decoder, create `class Parser(parser.Parser)` in the same file.The
+`Parser` class should be a childclass of the
+[generic parser interface](https://github.com/tensorflow/models/blob/master/official/vision/dataloaders/parser.py)
+and must implement all the abstract methods. It should have the implementation
+of abstract methods `_parse_train_data` and `_parse_eval_data`, to generate
+images and labels for model training and evaluation respectively. The below example
+takes only two arguments but one can freely add as many arguments as needed.
+
+```python
+class Parser(parser.Parser):
+
+ def __init__(self, output_size: List[int], num_classes: float):
+
+   self._output_size = output_size
+   self._num_classes = num_classes
+   self._dtype = tf.float32
+
+    ....
+```
+
+<br>
+
+Refer to the data parser and processing [class](https://github.com/tensorflow/models/blob/master/official/vision/dataloaders/maskrcnn_input.py) for Mask R-CNN for more complex cases. The class has multiple parameters related to data augmentation, masking, anchor boxes, data type of output image and more.
+
+</dd></dl>
+
+<br>
+
+*   **Complete Abstract Methods**<br>
+
+<dd><dl> 
+
+To define your own Parser, the user should override abstract functions
+[_parse_train_data](https://github.com/tensorflow/models/blob/b1a7752c5137822a32bd0dd70a0cb96e807ea411/official/vision/dataloaders/parser.py#L26)
+and
+[_parse_eval_data](https://github.com/tensorflow/models/blob/b1a7752c5137822a32bd0dd70a0cb96e807ea411/official/vision/dataloaders/parser.py#L39)
+of the
+[parser](https://github.com/tensorflow/models/blob/master/official/vision/dataloaders/parser.py)
+interface in the subclass, where decoded tensors are parsed with pre-processing
+steps for training and evaluation respectively. The output from the two
+functions can be any structure like a tuple, list or dictionary.
+
+```python
+  @abc.abstractmethod
+  def _parse_train_data(self, decoded_tensors):
+    """Generates images and labels that are usable for model training.
+
+    Args:
+      decoded_tensors: a dict of Tensors produced by the decoder.
+
+    Returns:
+      images: the image tensor.
+      labels: a dict of Tensors that contains labels.
+    """
+    pass
+
+  @abc.abstractmethod
+  def _parse_eval_data(self, decoded_tensors):
+    """Generates images and labels that are usable for model evaluation.
+
+    Args:
+      decoded_tensors: a dict of Tensors produced by the decoder.
+
+    Returns:
+      images: the image tensor.
+      labels: a dict of Tensors that contains labels.
+    """
+    pass
+
+```
+
+The input of `_parse_train_data` and `_parse_eval_data` is a dict of Tensors
+produced by the decoder; the output of these two functions is typically a tuple
+of (processe_image, processed_label). The user may perform any processing steps
+in these two functions as long as the interface is aligned. Note that the
+processing steps in `_parse_train_data` and `_parse_eval_data` are typically
+different since data augmentation is usually only applied to training. For
+Example, refer to the
+[Data parser](https://github.com/tensorflow/models/blob/b1a7752c5137822a32bd0dd70a0cb96e807ea411/official/vision/dataloaders/classification_input.py#L166)
+and processing steps for classification. We can observe that
+
+<dd><dl>
+
+-For `_parse_train_data`, the following steps are performed</dd></dl>
+
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- &nbsp;&nbsp;&nbsp;Image decoding<br>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- &nbsp;&nbsp;&nbsp;Random cropping<br>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- &nbsp;&nbsp;&nbsp;Random flipping<br>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- &nbsp;&nbsp;&nbsp;Color jittering<br>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- &nbsp;&nbsp;&nbsp;Image resizing<br>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- &nbsp;&nbsp;&nbsp;Auto-augmentation with autoaug, randaug etc.<br>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- &nbsp;&nbsp;&nbsp;Image normalization<br>
+
+<dd><dl><dd><dl>
+
+-For `_parse_eval_data`, the following steps are performed</dd></dl></dd></dl>
+
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- &nbsp;&nbsp;&nbsp;
+Image decoding<br>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- &nbsp;&nbsp;&nbsp;Center cropping<br>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- &nbsp;&nbsp;&nbsp;Image resizing<br>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- &nbsp;&nbsp;&nbsp;Image normalization<br>
+
+</dd></dl>
+
+**Additional Methods**
+
+The subclass (say sample_input.py) must include implementations for all of the
+abstract methods defined in the Interface
+[Decoder](https://github.com/tensorflow/models/blob/master/official/vision/dataloaders/decoder.py)
+and
+[Parser](https://github.com/tensorflow/models/blob/master/official/vision/dataloaders/parser.py)
+, as well as any additional methods that are necessary for the subclass's
+functionality.
+
+For Example, In
+[object detection](https://github.com/tensorflow/models/blob/b1a7752c5137822a32bd0dd70a0cb96e807ea411/official/vision/dataloaders/tf_example_decoder.py#L72),
+the decoder will take the serialized example and output a dictionary of tensors
+with multiple fields that process and analyze to detect objects and determine
+their location and orientation in the image. Separate methods for each of the
+above fields can make the code easier to read and maintain, especially when the
+class contains a large number of methods.
+
+Refer
+[Data parser](https://github.com/tensorflow/models/blob/master/official/vision/dataloaders/retinanet_input.py)
+for Object Detection here.
+
+### Example
+
+Creating a Parser is an optional step and it varies with the use case. Below are
+some use cases where we have included the Decoder and Parser based on the
+requirements.
+
+Use case |    Decoder/Parser |
+-------------------------------------------------------------------------------------------------------------------------------------------------------- | ----
+[Classification](https://github.com/tensorflow/models/blob/master/official/vision/dataloaders/classification_input.py) |  Both Decoder and Parser
+[Segmentation](https://github.com/tensorflow/models/blob/master/official/vision/dataloaders/retinanet_input.py) |  Only Parser
+
+## Input Pipeline
+
+Decoder and Parser discussed previously define how to decode and parse per data
+point e.g. an image. However a complete input pipeline would need to handle
+reading data from files in a distributed system, applying random perturbations,
+batching etc. You may find more details about these concepts
+[here](https://www.tensorflow.org/guide/data_performance#optimize_performance).
+
+We have established a well tuned input pipeline as defined in the [InputReader](https://github.com/tensorflow/models/blob/b1a7752c5137822a32bd0dd70a0cb96e807ea411/official/core/input_reader.py#L214) class, such that the user won’t need to modify it in most cases. The input pipeline roughly follows<br>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- &nbsp;&nbsp;&nbsp;Shuffling the files<br>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- &nbsp;&nbsp;&nbsp;Decoding<br>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- &nbsp;&nbsp;&nbsp;Parsing<br>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- &nbsp;&nbsp;&nbsp;Caching<br>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- &nbsp;&nbsp;&nbsp;If training: repeat and shuffle<br>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- &nbsp;&nbsp;&nbsp;Batching<br>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- &nbsp;&nbsp;&nbsp;Prefetching<br>
+
+For the rest of this section, we will discuss one particular use case that
+requires the modification of the typical pipeline by maybe creating a subclass
+of the
+[InputReader](https://github.com/tensorflow/models/blob/b1a7752c5137822a32bd0dd70a0cb96e807ea411/official/core/input_reader.py#L214).
+
+### Combines multiple datasets
+
+Create a custom InputReader by subclassing
+[InputReader](https://github.com/tensorflow/models/blob/b1a7752c5137822a32bd0dd70a0cb96e807ea411/official/core/input_reader.py#L214)
+interface. Custom InputReader class allows the user to combine multiple
+datasets, helps in mixing a labeled and pseudo-labeled dataset etc. The business
+logic is implemented in the `read()` method which finally generates a
+`tf.data.Dataset` object.
+
+The exact implementation of an InputReader can vary depending on the specific
+requirements of your task and the type of input data you're working with, data
+format, and preprocessing requirements.
+
+Here is an example of how to create a custom InputReader by subclassing
+[InputReader](https://github.com/tensorflow/models/blob/b1a7752c5137822a32bd0dd70a0cb96e807ea411/official/core/input_reader.py#L214)
+interface:
+
+```python
+class CustomInputReader(input_reader.InputReader):
+
+ def __init__(self,
+              params: cfg.DataConfig,
+              dataset_fn=tf.data.TFRecordDataset,
+              pseudo_label_dataset_fn=tf.data.TFRecordDataset,
+                ....):
+
+ def read(
+     self,
+     input_context: Optional[tf.distribute.InputContext] = None
+ ) -> tf.data.Dataset:
+
+
+   labeled_dataset =   ....
+   pseudo_labeled_dataset =   ....
+   dataset_concat = tf.data.Dataset.zip(
+       (labeled_dataset, pseudo_labeled_dataset))
+  ....
+
+   return dataset_concat.prefetch(tf.data.experimental.AUTOTUNE)
+
+```
+
+### Example
+
+Refer to the
+[InputReader](https://github.com/tensorflow/models/blob/b1a7752c5137822a32bd0dd70a0cb96e807ea411/official/vision/dataloaders/input_reader.py#L124)
+for vision in TFM. The `CombinationDatasetInputReader` class mixes a labeled and
+pseudo-labeled dataset and returns a `tf.data.Dataset` instance.