To leverage multiple GPUs when processing multiple documents, refer to the [multiprocessing backend][edsnlp.processing.multiprocessing.execute_multiprocessing_backend] description below.
## Streams
When processing multiple documents, we can optimize the inference by parallelizing the computation on a single core, on multiple cores and GPUs, or even on multiple machines.
These optimizations are enabled by performing *lazy inference*: the operations (e.g., reading a document, converting it to a Doc, running the different pipes of a model or writing the result somewhere) are not executed immediately but are instead scheduled in a [Stream][edsnlp.core.stream.Stream] object. The stream can then be executed by calling its `execute` method, iterating over it or calling a writing method (e.g., `to_pandas`). In fact, data connectors like `edsnlp.data.read_json` return a stream, as does the `nlp.pipe` method.
A stream contains:
- a `reader`: the source of the data (e.g., a file, a database, a list of strings, etc.)
- the list of operations to perform (`stream.ops`), each containing the function / pipe, keyword arguments and context of the operation
- an optional `writer`: the destination of the data (e.g., a file, a database, a list of strings, etc.)
- the execution `config`, containing the backend to use and its configuration such as the number of workers, the batch size, etc.
All methods (`map()`, `map_batches()`, `map_gpu()`, `map_pipeline()`, `set_processing()`) of the stream are chainable, meaning that they return a new stream object (no in-place modification).
For instance, the following code will load a model, read a folder of JSON files, apply the model to each document and write the result in a Parquet folder, using 4 CPUs and 2 GPUs.
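A minimal sketch of such a pipeline, assuming a hypothetical model name, placeholder paths and the OMOP converter, and assuming `nlp` contains deep-learning components that benefit from GPU workers:

```python
import edsnlp

nlp = edsnlp.load("my_model")  # placeholder: any model with deep-learning pipes

# Read a folder of JSON files (lazy: nothing is executed yet)
data = edsnlp.data.read_json("path/to/json/folder", converter="omop")
# Schedule every pipe of the model on the stream
data = data.map_pipeline(nlp)
# Distribute the work over 4 CPU workers and 2 GPU workers
data = data.set_processing(num_cpu_workers=4, num_gpu_workers=2)
# Writing is a terminal operation: it triggers the execution of the stream
data.write_parquet("path/to/parquet/folder", converter="omop")
```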
Streams support a variety of operations, such as applying a function to each element of the stream, batching the elements, applying a model to the elements, etc. In each case, the operations are not executed immediately but are scheduled to run when iterating over the stream, or when calling the `execute()`, `to_*()` or `write_*()` methods.
### `map()` {: #edsnlp.core.stream.Stream.map }
::: edsnlp.core.stream.Stream.map
    options:
        sections: ['text', 'parameters']
        header: false
        show_source: false
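As an illustration, a short sketch of `map()` (the input texts and function are made up):

```python
import edsnlp

stream = edsnlp.data.from_iterable(["Le patient est sorti.", "Pas de fièvre."])
# Schedule a function on each element; nothing runs until the stream is consumed
stream = stream.map(str.lower)
print(list(stream))  # iterating executes the scheduled operations
```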
### `map_batches()` {: #edsnlp.core.stream.Stream.map_batches }

To apply an operation to a stream in batches, you can use the `map_batches()` method. It takes a callable as input, an optional dictionary of keyword arguments and batching arguments. The callable receives each batch as a list of elements and should return a list of results, which are concatenated at the end.
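For instance, a sketch of a batch-level function (the batching arguments shown are illustrative):

```python
# The callable receives a list of elements and must return a list of results
stream = stream.map_batches(
    lambda batch: [text.upper() for text in batch],
    batch_size=32,
)
```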
### `map_pipeline()` {: #edsnlp.core.stream.Stream.map_pipeline }

To apply a model, you can use the `map_pipeline()` method. It takes a model as input and adds every pipe of the model to the scheduled operations.
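For instance (the model name and texts are placeholders):

```python
import edsnlp

nlp = edsnlp.load("my_model")  # placeholder model name
docs = edsnlp.data.from_iterable(["text one", "text two"]).map_pipeline(nlp)
```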
### `map_gpu()` {: #edsnlp.core.stream.Stream.map_gpu }

To run a specific function on a GPU (for advanced users; otherwise `map_pipeline()` should accommodate most use cases), you can use the `map_gpu()` method. It takes two or three callables as input: the first one (`prepare_batches`) takes a batch of inputs and should return tensors that will be sent to the GPU and passed to the second callable (`forward`), which applies the deep learning ops and returns the results. The optional third callable (`postprocess`) gets the batch of inputs as well as the `forward` results and should return the final results (for instance, the input documents annotated with the predictions).
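A sketch of the three callables, assuming `prepare_batches` receives the batch and the target device, and assuming results are attached to the documents through a custom extension:

```python
import torch
from spacy.tokens import Doc

if not Doc.has_extension("n_words"):
    Doc.set_extension("n_words", default=None)  # hypothetical extension

def prepare_batches(batch, device):
    # Build tensors from the documents and move them to the GPU
    return {"lengths": torch.as_tensor([len(doc) for doc in batch], device=device)}

def forward(inputs):
    # Deep-learning ops running on the GPU (here, a trivial stand-in)
    return {"n_words": inputs["lengths"].clone()}

def postprocess(batch, outputs):
    # Annotate the input documents with the predictions
    for doc, n in zip(batch, outputs["n_words"].tolist()):
        doc._.n_words = n
    return batch

stream = stream.map_gpu(prepare_batches, forward, postprocess)
```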
### `loop()` {: #edsnlp.core.stream.Stream.loop }

::: edsnlp.core.stream.Stream.loop
    options:
        heading_level: 3
        sections: ['text', 'parameters']
        header: false
        show_source: false
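A sketch, assuming `loop()` simply makes the stream restart when exhausted (as in training loops):

```python
import edsnlp

stream = edsnlp.data.from_iterable(["doc one", "doc two"]).loop()
for i, text in enumerate(stream):
    # A looped stream is infinite, so stop manually
    if i >= 10:
        break
```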
### `shuffle()` {: #edsnlp.core.stream.Stream.shuffle }

::: edsnlp.core.stream.Stream.shuffle
    options:
        heading_level: 3
        sections: ['text', 'parameters']
        header: false
        show_source: false
### Configure the execution with `set_processing()` {: #edsnlp.core.stream.Stream.set_processing }
You can configure how the operations performed in the stream are executed by calling its `set_processing(...)` method, which controls, among other options, the backend and its configuration (number of workers, batch size, etc.).
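For instance, a sketch (the option names follow the stream `config` described above and should be checked against the API reference):

```python
stream = stream.set_processing(
    backend="multiprocessing",  # or "simple", among others
    num_cpu_workers=4,
    show_progress=True,
)
```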
Many operations rely on batching, either to be more efficient or because they require a fixed-size input. The `batch_size` and `batch_by` arguments of the `map_batches()` method let you specify the size of the batches and which function to use to compute the size of the batches, as sketched after the conditions below.
Note that these batch functions are only available under specific conditions:
- either `backend="simple"` or `deterministic=True` (default) if `backend="multiprocessing"`, otherwise elements might be processed out of order
- if every op before was elementwise (e.g., `map()`, `map_gpu()`, `map_pipeline()`, and no generator function), or if `sentinel_mode` was explicitly set to `"split"` in `map_batches()`; otherwise the sentinels are dropped by default when the user requires batching.
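As an illustration, a sketch of size-based batching (the `"words"` batching function and the batch-level callable are assumptions):

```python
def annotate_batch(batch):
    # Hypothetical batch-level function: receives and returns a list of docs
    return batch

# Batch by cumulative word count rather than by number of documents
stream = stream.map_batches(annotate_batch, batch_size=1000, batch_by="words")
```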