speechmatics · TudorCRL · Jul 11, 2025
diff --git a/docs/speech-to-text/real-time/latency.mdx b/docs/speech-to-text/real-time/latency.mdx
@@ -1,89 +1,124 @@
 ---
-description: 'Learn about latency in the Speechmatics Real-Time server'
+description: 'Control response time in Realtime transcription'
 keywords: [speechmatics, real-time, rt, latency, low latency, fast, transcription, speech recognition, asr]
-sidebar_label: Real-time Latency
+sidebar_label: Latency
 ---
 
-# Real-Time Latency
+# Latency settings
 
-When transcribing in real-time, you can control the maximum time to wait for the final transcript. This could be as fast as 0.7 seconds, though allowing a longer time will give a slight accuracy improvement. 
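+Balance speed and accuracy in Realtime transcription by adjusting latency settings. Final transcripts can arrive in as little as 0.7 seconds, and allowing a longer delay gives a slight accuracy improvement.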
 
-For even faster output, use [Partial transcripts](#partial-transcripts) to receive transcription output before higher-accuracy final transcripts are returned.
+## Configuration options
 
-## Configuration Example
-The following example shows a typical configuration for low latency applications. Include this in the [StartRecognition](/api-ref/realtime-transcription-websocket#startrecognition) message.
+Configure Realtime latency with the following `transcription_config` parameters:
+
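+- `max_delay`: Maximum delay, in seconds, between the end of a spoken word and the delivery of its final transcript. Allowed values are 0.7 to 4.0; the default is 4.0. Sending the transcript to the client adds a very small amount of extra latency.
+- `max_delay_mode`: `flexible` (default) or `fixed`. In `flexible` mode the server may take extra time beyond `max_delay` to complete [numeral formatting](#numeral-formatting); `fixed` mode does not wait.
+- `enable_partials`: Boolean, default `false`. Set to `true` to receive [partial transcripts](#partial-transcripts) for faster feedback.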
+
+Add these parameters to your [StartRecognition](/api-ref/realtime-transcription-websocket#startrecognition) message:
 
 ```json
 {
-  "type": "transcription",
   "transcription_config": {
     // highlight-start
     "max_delay": 0.7,
     "max_delay_mode": "flexible",
-    // highlight-end
     "enable_partials": true,
+    // highlight-end
     "language": "en",
-    "operating_point": "enhanced",
+    "operating_point": "enhanced"
   }
 }
 ```
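As a rough sketch, the Python below builds the same `transcription_config` and checks the documented ranges before the message is sent. It assumes the message type is given as `"message": "StartRecognition"`, as in the [API reference](/api-ref/realtime-transcription-websocket#startrecognition), and it omits the other fields a real StartRecognition message needs (such as the audio format). The helper name `build_start_recognition` is hypothetical.

```python
import json

# Documented limits for max_delay, in seconds.
MIN_MAX_DELAY = 0.7
MAX_MAX_DELAY = 4.0
VALID_MODES = {"fixed", "flexible"}


def build_start_recognition(
    max_delay: float = 4.0,
    max_delay_mode: str = "flexible",
    enable_partials: bool = False,
    language: str = "en",
    operating_point: str = "enhanced",
) -> dict:
    """Hypothetical helper: return a StartRecognition payload with latency settings.

    Other required fields (for example the audio format) are intentionally omitted.
    """
    if not MIN_MAX_DELAY <= max_delay <= MAX_MAX_DELAY:
        raise ValueError(
            f"max_delay must be between {MIN_MAX_DELAY} and {MAX_MAX_DELAY} seconds"
        )
    if max_delay_mode not in VALID_MODES:
        raise ValueError(f"max_delay_mode must be one of {sorted(VALID_MODES)}")
    return {
        "message": "StartRecognition",
        "transcription_config": {
            "max_delay": max_delay,
            "max_delay_mode": max_delay_mode,
            "enable_partials": enable_partials,
            "language": language,
            "operating_point": operating_point,
        },
    }


if __name__ == "__main__":
    # Low-latency settings matching the JSON example above.
    payload = build_start_recognition(max_delay=0.7, enable_partials=True)
    print(json.dumps(payload, indent=2))
```

Validating on the client side surfaces configuration mistakes before the WebSocket session starts, rather than as a server error.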
 
-- `max_delay` (Number): Optional. Allowed between 0.7 and 4 seconds. Default is 4 seconds. This is the delay in seconds between the end of a spoken word and returning the Final transcript results. Note that there is a very small amount of additional latency while the server is sending the transcript to the client.
-- `max_delay_mode` (String): Optional. Allowed values are `fixed` and `flexible`. Default is `flexible`. This allows some additional time for [Numeral Formatting](#numeral-formatting).
-- `enable_partials` (Boolean): Default is false. Whether or not to receive [Partial transcripts](#partial-transcripts) before the Final transcripts are received.
 
-## Accuracy/Latency trade-offs
+## Speed vs. accuracy trade-offs
+
+Choose the right `max_delay` setting for your use case:
 
-We recommend experimenting with different settings for the `max_delay` to find the right trade-off between accuracy and latency for your application. Based on our own testing and experience, we can offer a few guidelines to get you started.
+| `max_delay` | Relative accuracy loss vs. Batch | Recommended use cases |
+|-------------|----------------------------------|-----------------------|
+| 0.7-1.5 s | Less than 5% | Ultra-fast responses, such as conversational AI and voice assistants |
+| 2.0 s | Around 1% | Most use cases (recommended), such as live captioning and broadcast media |
+| 4.0 s | None (equivalent to Batch) | Best accuracy; combine with partial transcripts for early feedback |
 
-Setting `max_delay` to between 0.7 and 1.5 gives an accuracy degradation of less than 5% relative when compared to the Batch transcription service. This tradeoff is worthwhile for use cases that need ultra-fast responses such as real-time conversational AI.
+:::warning
+These figures come from our own testing and are starting points only. Lower latency settings trade some accuracy for speed, so experiment with your own audio to find the right setting.
+:::
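One way to apply the table is to pick the largest `max_delay` that your end-to-end latency budget allows, since longer delays give better accuracy. The Python sketch below does that. The `network_overhead` allowance for delivering the transcript to the client is an assumed figure, not a documented one, so measure it for your own deployment; the function name is hypothetical.

```python
MIN_MAX_DELAY = 0.7  # seconds, lowest documented value
MAX_MAX_DELAY = 4.0  # seconds, highest documented value


def max_delay_for_budget(budget: float, network_overhead: float = 0.2) -> float:
    """Hypothetical helper: largest max_delay that fits within `budget` seconds.

    `network_overhead` is an assumed allowance for sending the transcript
    to the client; it is not a Speechmatics-provided value.
    """
    available = budget - network_overhead
    if available < MIN_MAX_DELAY:
        raise ValueError(
            f"budget of {budget}s leaves less than the minimum max_delay of {MIN_MAX_DELAY}s"
        )
    # Longer delays are more accurate, so use as much of the budget as allowed,
    # but never more than the documented maximum.
    return min(available, MAX_MAX_DELAY)


if __name__ == "__main__":
    print(max_delay_for_budget(1.5))   # tight budget, e.g. a voice assistant
    print(max_delay_for_budget(10.0))  # capped at 4.0
```

Note that with `max_delay_mode` set to `flexible`, finals containing numbers or dates can still arrive later than `max_delay`, so a strict budget may also call for `fixed` mode.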
 
-At 2 seconds `max_delay`, there is around 1% relative accuracy degradation when compared to the Batch transcription service. This is the recommended setting for most use cases, such as broadcast captioning.
+## Partial transcripts
 
-For the best accuracy, we recommend using a `max_delay` of 4 seconds which is equivalent to our Batch transcription service. This can be combined with Partial transcripts, to give users early feedback of the recognised text.
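+Get preliminary results faster while waiting for the more accurate [final transcripts](/api-ref/realtime-transcription-websocket#addtranscript). Partials arrive as [AddPartialTranscript](/api-ref/realtime-transcription-websocket#addpartialtranscript) messages.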
 
-## Partial Transcripts
+### How partial transcripts work
 
-Partial transcripts allow you to receive preliminary transcription and update as more context is available until the higher-accuracy [Finals](/api-ref/realtime-transcription-websocket#addtranscript) are returned. Typically Partials are returned in less than 500 milliseconds. [Partial transcripts](/api-ref/realtime-transcription-websocket.mdx#addpartialtranscript) are enabled using the `enable_partials` config option. 
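+- Typically delivered in under 500 ms (final transcripts arrive according to your `max_delay` setting)
+- Updated continuously as more speech context becomes available
+- Each final transcript is immediately followed by a partial transcript containing any words not yet finalized
+- Enabled with `enable_partials: true` in your configuration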
 
-On each Final transcript you will immediately receive a Partial transcript with any remaining words which have not been finalised.
+### Limitations
 
-Note that Partial transcripts have some limitations:
-- Accuracy is usually 10-25% lower than the Final transcript. This includes punctuation and capitalisation of words.
-- The `confidence` field for Partial transcripts has no meaning and should not be relied on.
+- Accuracy is typically 10-25% lower than final transcripts
+- Punctuation and capitalization are also less accurate
+- The `confidence` field has no meaning for partials and should not be relied on
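A common client pattern is to show the finalized text followed by the latest partial, replacing the partial each time a new one arrives. The Python sketch below assumes each message is a decoded JSON object whose `message` field is `AddPartialTranscript` or `AddTranscript` and whose text is in `metadata.transcript`; check these field names against the API reference for your version. The `TranscriptView` class is hypothetical, and it ignores the `confidence` field of partials, as recommended above.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptView:
    """Hypothetical client-side view combining finals with the latest partial."""

    finals: list[str] = field(default_factory=list)
    partial: str = ""

    def handle(self, msg: dict) -> None:
        text = msg.get("metadata", {}).get("transcript", "").strip()
        kind = msg.get("message")
        if kind == "AddTranscript":
            # Finals are not revised: keep them and drop the pending partial,
            # because a new partial follows with any words not yet finalized.
            if text:
                self.finals.append(text)
            self.partial = ""
        elif kind == "AddPartialTranscript":
            # Each partial replaces the previous one; it is never appended.
            self.partial = text

    def final_text(self) -> str:
        return " ".join(self.finals)

    def display_text(self) -> str:
        return " ".join(self.finals + ([self.partial] if self.partial else []))


if __name__ == "__main__":
    view = TranscriptView()
    # Messages from the flexible-mode example on this page.
    for kind, text in [
        ("AddPartialTranscript", "I"),
        ("AddPartialTranscript", "I am"),
        ("AddPartialTranscript", "I am third"),
        ("AddPartialTranscript", "I am 30"),
        ("AddTranscript", "I am 35."),
    ]:
        view.handle({"message": kind, "metadata": {"transcript": text}})
        print(view.display_text())
```

Keeping finals and the partial separate means only the partial tail of the display is ever rewritten, which avoids flicker in already-confirmed text.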
 
-## Numeral Formatting
+## Numeral formatting
 
-[Numeral Formatting](/speech-to-text/output-enhancements/numeral-formatting) ensures readability of your transcripts by formatting numbers, dates, currencies and other important _entities_ into their written form.
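+[Numeral formatting](/speech-to-text/output-enhancements/numeral-formatting) improves transcript readability by formatting numbers, dates, currencies, and other entities, for example writing "thirty five" as "35".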
 
-When the `max_delay_mode` is set to `flexible`, and an entity is being spoken, the Final transcript would be delayed until the entity is fully spoken to enable proper formatting. This option should be used in most use-cases for improved accuracy and readability for numbers, currencies, and dates. 
+### Flexible mode
 
-If you have strict latency requirements, and prefer not to wait for entity formatting to complete, set `max_delay_mode` to `fixed`. Note that in this mode, there will be some reduction in accuracy and readability for numbers, currencies, and dates.
+When using `max_delay_mode: "flexible"` (default, recommended for most use cases):
+- If an entity (number, date, currency) is being spoken, the final transcript is delayed until the entity is complete
+- This ensures entities are formatted correctly
+- Latency increases only while an entity is being spoken, so those finals can arrive later than `max_delay`
 
-## Example Outputs (Partials and Finals)
-With only `Finals` and default `max_delay_mode`, messages received could look like the following:
+### Fixed mode
 
-- **(Final)**: I am 35.
+For applications with strict latency requirements:
+- Set `max_delay_mode: "fixed"` to keep final transcripts within `max_delay`
+- The server returns results without waiting for entities to be fully spoken
 
-**Final output**: I am 35.
+:::warning
+Fixed mode reduces accuracy and readability of numbers, currencies, and dates.
+:::
 
-With `Partials` enabled and default `max_delay_mode`, messages received could look like the following:
+## Example output comparison
 
-- **(Partial)**: I
-- **(Partial)**: I am
-- **(partial)**: I am third
-- **(Partial)**: I am 30
-- **(Final)**: I am 35.
+### Finals only (default)
+
+With only final transcripts (default configuration):
+
+```
+(Final): I am 35.
+```
 
-**Final output**: I am 35.
+### Partials with flexible mode
 
-With `Partials` enabled and `max_delay_mode` as `fixed`, messages received could look like the following:
+With `enable_partials: true` and `max_delay_mode: "flexible"`:
 
-- **(Partial)**: I
-- **(Final)**: I am
-- **(partial)**: third
-- **(Final)**: 30
-- **(Partial)**: five
-- **(Final)**: five.
+```
+(Partial): I
+(Partial): I am
+(Partial): I am third
+(Partial): I am 30
+(Final): I am 35.
+```
+
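+The partials revise "third" to "30" as more audio arrives. Because flexible mode holds the final until the whole number has been spoken, the final transcript correctly reads "35".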
+
+### Partials with fixed mode
+
+With `enable_partials: true` and `max_delay_mode: "fixed"`:
+
+```
+(Partial): I
+(Final): I am
+(Partial): third
+(Final): 30
+(Partial): five
+(Final): five.
+```
 
-**Final output**: I am 30 five.
+Final output: "I am 30 five." Because fixed mode finalized "30" before "five" was spoken, the number is split across two final transcripts and is not formatted as "35".