Skip to content

Conversation

MauriceVanVeen
Copy link
Member

@MauriceVanVeen MauriceVanVeen commented Oct 9, 2025

Updates details based on nats-io/nats-server#7412

Signed-off-by: Maurice van Veen <[email protected]>
* Each stream can only have 50 batches in flight at any time
* Each server can only have 1000 batches in flight at any time
* A batch that has not had traffic for 10 seconds will be abandoned
* A batch that has not had traffic for 10 seconds since the first message will be abandoned
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, can you elaborate on this? Shouldn't it be the last message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is about atomic batch, not fast batch. So currently this is using the first message and then sets a timer for 10s to abandon the batch. This was not specified in the ADR yet, so added in here now as well.

Not sure if we'd like/need to change that for 2.12.2+? Given atomic batch has a limited size, I'm not sure if this would be needed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think also for atomic the intention is if I have not seen a client for 10 seconds, so it makes more sense as expressed as 10 seconds since the last message

* Should we only support `eob` style commits?
* Should we only support `eob` style commits? (server supports both currently)
* Clients might stall if they lost all the acks involved in their max pending, we might then have to just timeout or perhaps add a way to probe the server to send a `BatchFlowAck` as a liveness check. We will though experiment first before doing this.
* How to handle errors based on per-message header checks? We could return a PubAck with up to what point of the batch was persisted, and the error of the message that came after that. But would mean a PubAck+error response.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As they wont always abandon the batch we have to be selective, if its like last seq check that would kill the batch but others might not?

I think for being easier to implement we should only puback on last so perhaps we should extend the fc ack with some additional fields to communicate mid-flow errors that isnt terminal

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All expected header checks are terminal, except for a duplicate with Nats-Msg-Id and in that case it would be omitted from the batch but it would continue.

Let's say you're using a Nats-Expected-Last-Subject-Sequence and that fails. The question here is if we want just a PubAck with error. Or if the PubAck would also contain the last persisted message sequence before the error.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure all header handling should be terminal, if gaps are ok then a single message with a last seq mismatch is just a gap that we can continue over right?

Reporting that gap would be annoying with current design as you say - since this can happen a lot - but seems to me we first need to agree on what happens with headers

* Should we only support `eob` style commits? (server supports both currently)
* Clients might stall if they lost all the acks involved in their max pending, we might then have to just timeout or perhaps add a way to probe the server to send a `BatchFlowAck` as a liveness check. We will though experiment first before doing this.
* How to handle errors based on per-message header checks? We could return a PubAck with up to what point of the batch was persisted, and the error of the message that came after that. But would mean a PubAck+error response.
* Since every message contains a reply, we could easily spam errors to the client. These errors would also be sent earlier than the final ack. Should we only send an error once, and rely on explicit "probes" to retry getting these errors if they were lost?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This we spoke about the old style inboxes and clients dropping interest, I worry about single errors going missing

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Been thinking about the reply thing - it would be great not to have a reply on every message but thats more a bandwidth thing and wont resolve this since you would still send errors to the control channel (reply from first message)

* Clients might stall if they lost all the acks involved in their max pending, we might then have to just timeout or perhaps add a way to probe the server to send a `BatchFlowAck` as a liveness check. We will though experiment first before doing this.
* How to handle errors based on per-message header checks? We could return a PubAck with up to what point of the batch was persisted, and the error of the message that came after that. But would mean a PubAck+error response.
* Since every message contains a reply, we could easily spam errors to the client. These errors would also be sent earlier than the final ack. Should we only send an error once, and rely on explicit "probes" to retry getting these errors if they were lost?
* How to handle flow control messages on duplicate messages? Duplicates are omitted, so how do we do flow control in that case since these messages will be immediately dropped.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have to treat them as eligible for fc cos its about data thet crossed the line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, mostly a server-side complexity (but also influences the client throughput speed). Because duplicate messages are not stored, the acks will come in faster if there's a bunch of duplicates.

* How to handle errors based on per-message header checks? We could return a PubAck with up to what point of the batch was persisted, and the error of the message that came after that. But would mean a PubAck+error response.
* Since every message contains a reply, we could easily spam errors to the client. These errors would also be sent earlier than the final ack. Should we only send an error once, and rely on explicit "probes" to retry getting these errors if they were lost?
* How to handle flow control messages on duplicate messages? Duplicates are omitted, so how do we do flow control in that case since these messages will be immediately dropped.
* Should flow control only support acks per N messages? There will always be an average message size, so having both seems redundant. More importantly though, the server might count bytes differently than the client would, this could result in the client silently and strangely stalling and slowing down. Relying purely on counting messages makes this simpler to implement on both sides, and prevent desync.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be one or the other not both at the same time, think its important to have a byte dimension for bigger packets over slower lines? We should though park that and test to confirm so for now lets focus on per message

Copy link
Contributor

@jnmoyne jnmoyne Oct 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought: if you are sending large messages then batching will not get you much of a speedup (if any) so per byte FC probably not useful in 'batching for speed' use cases. (however if this is to be used as a replacement for doing async JS publish then probably useful. maybe even more so than per message). Agreed we can start first with just per message.

Copy link
Contributor

@jnmoyne jnmoyne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last message will get the be updated to have the Nats-Batch-Commit:1 header set by the server before the batch is saved.

Should be "The last message will be updated" (delete the extra "the")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants