ADR-50: update fast ingest details #387

MauriceVanVeen · 2025-10-09T19:27:21Z

Updates details based on nats-io/nats-server#7412

Signed-off-by: Maurice van Veen <[email protected]>

ripienaar · 2025-10-11T05:00:01Z

adr/ADR-50.md

 * Each stream can only have 50 batches in flight at any time
 * Each server can only have 1000 batches in flight at any time
- * A batch that has not had traffic for 10 seconds will be abandoned
+ * A batch that has not had traffic for 10 seconds since the first message will be abandoned


Hmm, can you elaborate on this? Shouldn't it be the last message?

This is about atomic batch, not fast batch. Currently the server starts a 10s timer on the first message and abandons the batch when that timer fires. This was not specified in the ADR yet, so I added it here as well.

Not sure if we'd like/need to change that for 2.12.2+? Given atomic batch has a limited size, I'm not sure if this would be needed.

I think for atomic too the intention is "I have not seen the client for 10 seconds", so it makes more sense expressed as 10 seconds since the last message.

ripienaar · 2025-10-11T05:00:44Z

adr/ADR-50.md

-* Should we only support `eob` style commits?
+* Should we only support `eob` style commits? (server supports both currently)
 * Clients might stall if they lost all the acks involved in their max pending, we might then have to just timeout or perhaps add a way to probe the server to send a `BatchFlowAck` as a liveness check. We will though experiment first before doing this.
+* How to handle errors based on per-message header checks? We could return a PubAck with up to what point of the batch was persisted, and the error of the message that came after that. But would mean a PubAck+error response.


Since they won't always abandon the batch, we have to be selective: something like a last seq check would kill the batch, but others might not?

To keep this easier to implement, I think we should only PubAck on the last message, so perhaps we should extend the FC ack with some additional fields to communicate mid-flow errors that aren't terminal.

All expected header checks are terminal, except for a duplicate with Nats-Msg-Id. In that case the message would be omitted from the batch, but the batch would continue.

Let's say you're using a Nats-Expected-Last-Subject-Sequence and that fails. The question here is if we want just a PubAck with error. Or if the PubAck would also contain the last persisted message sequence before the error.

Not sure all header handling should be terminal. If gaps are OK, then a single message with a last seq mismatch is just a gap that we can continue over, right?

Reporting that gap would be annoying with the current design, as you say, since this can happen a lot. But it seems to me we first need to agree on what happens with headers.
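As an illustration of the position stated above (expected-header checks are terminal, a Nats-Msg-Id duplicate is skipped and the batch continues), the following Go sketch walks a list of per-message check results and produces a combined "persisted so far plus error" outcome. All names here (`checkResult`, `batchOutcome`, `applyBatch`, `errExpectedHeader`) are hypothetical, and the outcome shape is only one way the proposed PubAck+error response could look. It does not model the alternative view that a last seq mismatch could be treated as a gap.

```go
package main

import "errors"

// checkResult is the outcome of the header checks for one message.
type checkResult int

const (
	checkOK        checkResult = iota // message is stored
	checkDuplicate                    // Nats-Msg-Id already seen: skip, batch continues
	checkFailed                       // an expected-header check failed: terminal
)

// errExpectedHeader stands in for whatever error the server would report.
var errExpectedHeader = errors.New("expected header check failed")

// batchOutcome sketches a "PubAck + error" style response.
type batchOutcome struct {
	Persisted int   // messages stored before any terminal error
	Skipped   int   // duplicates omitted from the batch
	Err       error // nil if no terminal check failed
}

// applyBatch processes results in order and stops at the first terminal failure.
func applyBatch(results []checkResult) batchOutcome {
	var out batchOutcome
	for _, r := range results {
		switch r {
		case checkOK:
			out.Persisted++
		case checkDuplicate:
			out.Skipped++
		case checkFailed:
			out.Err = errExpectedHeader
			return out
		}
	}
	return out
}
```

Messages that arrive after the failing one are not counted at all in this sketch, which matches the "up to what point of the batch was persisted" wording in the diff.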

ripienaar · 2025-10-11T05:02:05Z

adr/ADR-50.md

+* Should we only support `eob` style commits? (server supports both currently)
 * Clients might stall if they lost all the acks involved in their max pending, we might then have to just timeout or perhaps add a way to probe the server to send a `BatchFlowAck` as a liveness check. We will though experiment first before doing this.
+* How to handle errors based on per-message header checks? We could return a PubAck with up to what point of the batch was persisted, and the error of the message that came after that. But would mean a PubAck+error response.
+* Since every message contains a reply, we could easily spam errors to the client. These errors would also be sent earlier than the final ack. Should we only send an error once, and rely on explicit "probes" to retry getting these errors if they were lost?


We spoke about this in the context of old-style inboxes and clients dropping interest. I worry about single errors going missing.

Been thinking about the reply thing: it would be great not to have a reply on every message, but that's more a bandwidth thing and won't resolve this, since you would still send errors to the control channel (the reply from the first message).

ripienaar · 2025-10-11T05:02:34Z

adr/ADR-50.md

 * Clients might stall if they lost all the acks involved in their max pending, we might then have to just timeout or perhaps add a way to probe the server to send a `BatchFlowAck` as a liveness check. We will though experiment first before doing this.
+* How to handle errors based on per-message header checks? We could return a PubAck with up to what point of the batch was persisted, and the error of the message that came after that. But would mean a PubAck+error response.
+* Since every message contains a reply, we could easily spam errors to the client. These errors would also be sent earlier than the final ack. Should we only send an error once, and rely on explicit "probes" to retry getting these errors if they were lost?
+* How to handle flow control messages on duplicate messages? Duplicates are omitted, so how do we do flow control in that case since these messages will be immediately dropped.


We have to treat them as eligible for FC, because it's about data that crossed the line.

Yeah, mostly a server-side complexity (but also influences the client throughput speed). Because duplicate messages are not stored, the acks will come in faster if there's a bunch of duplicates.
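A small Go sketch of the approach suggested above, where duplicates still count toward flow control even though they are not stored. The `flowCounter` type and the `ackEvery` parameter are hypothetical names for illustration; the real server accounting may differ.

```go
package main

// flowCounter counts every message that reached the server, stored or not,
// and signals when a flow control ack is due.
type flowCounter struct {
	ackEvery int // send a flow ack after this many received messages
	received int // all messages seen, including duplicates
	stored   int // messages actually persisted
}

// onMessage records one message and reports whether a flow ack should be sent.
// Duplicates are dropped from storage but still advance the flow control count.
func (f *flowCounter) onMessage(duplicate bool) bool {
	f.received++
	if !duplicate {
		f.stored++
	}
	if f.ackEvery <= 0 {
		return false
	}
	return f.received%f.ackEvery == 0
}
```

Because duplicates skip the storage step, a run of them reaches the ack threshold with less work per message, which is why acks can come back faster in that case.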

ripienaar · 2025-10-11T05:03:33Z

adr/ADR-50.md

+* How to handle errors based on per-message header checks? We could return a PubAck with up to what point of the batch was persisted, and the error of the message that came after that. But would mean a PubAck+error response.
+* Since every message contains a reply, we could easily spam errors to the client. These errors would also be sent earlier than the final ack. Should we only send an error once, and rely on explicit "probes" to retry getting these errors if they were lost?
+* How to handle flow control messages on duplicate messages? Duplicates are omitted, so how do we do flow control in that case since these messages will be immediately dropped.
+* Should flow control only support acks per N messages? There will always be an average message size, so having both seems redundant. More importantly though, the server might count bytes differently than the client would, this could result in the client silently and strangely stalling and slowing down. Relying purely on counting messages makes this simpler to implement on both sides, and prevent desync.


It would be one or the other, not both at the same time. I think it's important to have a byte dimension for bigger packets over slower lines? We should park that and test to confirm, though, so for now let's focus on per message.

Just a thought: if you are sending large messages then batching will not get you much of a speedup (if any), so per-byte FC is probably not useful in "batching for speed" use cases. (However, if this is to be used as a replacement for doing async JS publish, then it is probably useful, maybe even more so than per message.) Agreed we can start first with just per message.
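For the per-message option agreed on above, a client-side window could be sketched in Go as follows. The `window` type, the `maxPending` field, and the idea that a flow ack carries a cumulative message count (`upTo`) are assumptions made for this example; the actual `BatchFlowAck` fields are not described in this thread.

```go
package main

// window tracks how many messages the client may still publish
// before it has to wait for a flow control ack.
type window struct {
	maxPending int // maximum unacknowledged messages in flight
	sent       int // messages published so far
	acked      int // highest cumulative count confirmed by a flow ack
}

// canSend reports whether another message fits in the window.
func (w *window) canSend() bool {
	return w.sent-w.acked < w.maxPending
}

// onSend records one published message.
func (w *window) onSend() {
	w.sent++
}

// onFlowAck applies a cumulative ack; stale or reordered acks are ignored.
func (w *window) onFlowAck(upTo int) {
	if upTo > w.acked {
		w.acked = upTo
	}
}
```

Both sides only count messages here, so there is no byte accounting that client and server could disagree on. If every ack covering the current window is lost, `canSend` stays false, which is the stall case the ADR mentions.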

jnmoyne

The last message will get the be updated to have the Nats-Batch-Commit:1 header set by the server before the batch is saved.

Should be "The last message will be updated" (delete the extra "get the").

ADR-50: update fast ingest details

d9d739c

Signed-off-by: Maurice van Veen <[email protected]>

MauriceVanVeen mentioned this pull request Oct 9, 2025

(2.14) Fast batch: initial fast ingest support nats-io/nats-server#7412

Draft

ripienaar reviewed Oct 11, 2025


jnmoyne reviewed Oct 12, 2025
