Skip to content

Conversation

tadjik1
Copy link
Contributor

@tadjik1 tadjik1 commented Aug 27, 2025

Description

Change Streams include a feature that allows the stream to resume after certain failures (for example, when a host is unreachable or a cursor is not found). The full list of resumable errors is defined in the spec).

This feature works by having ChangeStream cache the resume token returned by the server on successful operations, and then pass its value in either the startAfter or resumeAfter field when attempting to resume. The server then knows from which point in the oplog to return changes to the client.

The caching mechanism is handled internally in both the event-listener form and the iterator form, and it updates the stream’s resumeToken property. For example:

const changeStream = collection.watch([]);
while (true) {
  await changeStream.next();
  console.log(changeStream.resumeToken);  // resumeToken updates on each change
}
// this can be rewritten into for..of form which uses .next() internally
// for await (const _ of changeStream) {

However, not all exposed methods cache the token. In particular, .tryNext() ignores it, so the following code logs the same resumeToken for each change:

const changeStream = collection.watch([]);
while (true) {
  const change = await changeStream.tryNext();
  if (change) {
    console.log(changeStream.resumeToken);  // resumeToken does not update
  }
  await scheduler.wait(1000);  // add delay since tryNext() does not wait for changes
}

If the stream encounters a resumable error in the snippets above, it will attempt to resume. Without the latest resume token (.tryNext() case), the server might return changes that were already consumed, resulting in duplicates on the application side. This issue does not occur with .next() or with the 'change' event listener, both of which update the resumeToken correctly.

The fix

Handle changes returned by .tryNext() the same way as in .next() by using the private ._processChange(), except when the change is null. Because .tryNext() does not wait for server changes, a null indicates “no changes,” whereas .next() would treat null as a signal to close the stream.

Testing

Resume functionality is covered by a combination of unit and integration tests that verify token caching and the use of startAfter/resumeAfter during resume.

Additionally, a test case was added for this specific user scenario:

  • Create a change stream
  • Get the first change
  • Attempt to get the second change; fail on getMore
  • Resume with the token
  • Get the second change (no duplicates)

Release Highlight

ChangeStream .tryNext() now updates resumeToken to prevent duplicates after resume

When .tryNext() returns a change document, the driver now caches its resumeToken, aligning its behavior with .next() and the 'change' event. If .tryNext() returns null (no new changes), nothing is cached, which is unchanged from previous behavior.

Previously, .tryNext() did not update the resumeToken, so a resumable error could cause a resume from an older token and re-deliver already processed changes. With this release, resumes continue from the latest token observed via .tryNext(), preventing duplicates.

const changeStream = collection.watch([]);
while (true) {
  const change = await changeStream.tryNext(); // prior versions could return duplicates
  await scheduler.wait(1000);  // delay since tryNext() does not wait for changes
}

Applications that poll change streams with .tryNext() in non-blocking loops benefit directly. There are no API changes; if you previously tracked and passed resumeAfter or startAfter manually, you can now rely on the driver’s built-in token caching.

Huge thanks to @rkistner for bringing this bug to our attention and for sharing code to reproduce it. Huge thanks as well to @Omnicpie for investigating and implementing a fix.

Double check the following

  • Ran npm run check:lint script
  • Self-review completed using the steps outlined here
  • PR title follows the correct format: type(NODE-xxxx)[!]: description
  • Changes are covered by tests
  • New TODOs have a related JIRA ticket

@tadjik1 tadjik1 changed the title test(NODE-4763): add tests for resumeToken caching mechanism fix(NODE-4763): cache resumeToken in #tryNext() of the changeStream Aug 27, 2025
@tadjik1 tadjik1 force-pushed the NODE-4763 branch 5 times, most recently from 249547b to 9ed6270 Compare August 29, 2025 08:51
@tadjik1 tadjik1 marked this pull request as ready for review August 29, 2025 11:02
@tadjik1 tadjik1 requested a review from a team as a code owner August 29, 2025 11:02
@tadjik1 tadjik1 changed the title fix(NODE-4763): cache resumeToken in #tryNext() of the changeStream fix(NODE-4763): cache resumeToken in ChangeStream.tryNext() Aug 29, 2025
@baileympearson baileympearson self-assigned this Aug 29, 2025
@baileympearson baileympearson added the Primary Review In Review with primary reviewer, not yet ready for team's eyes label Aug 29, 2025
@tadjik1 tadjik1 force-pushed the NODE-4763 branch 2 times, most recently from 93138f0 to a4f9f9d Compare September 3, 2025 14:40
Copy link
Contributor

@baileympearson baileympearson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment on test title phrasing, otherwise looks good!

@dariakp dariakp added Team Review Needs review from team and removed Primary Review In Review with primary reviewer, not yet ready for team's eyes labels Sep 5, 2025
@baileympearson baileympearson merged commit 8331a93 into main Sep 8, 2025
29 of 31 checks passed
@baileympearson baileympearson deleted the NODE-4763 branch September 8, 2025 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team Review Needs review from team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants