eshitachandwani
Member

@eshitachandwani eshitachandwani commented Aug 1, 2025

Fixes: #8474

The race is in the ReportLoad function of clientImpl. The implementation was recently changed as part of the xDS client migration.

The comment says that lrsclient.LRSClient should be initialized only at creation time, but that was not the case. It was being initialized when ReportLoad was called.
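
The two initialization strategies described above can be illustrated with a minimal, self-contained Go sketch. Every name in it (`lrsClient`, `lazyClient`, `eagerClient`, `newEagerClient`) is a hypothetical stand-in, not the actual grpc-go code, and the real `ReportLoad` takes a server config and returns more than a pointer. The sketch only shows why writing a field on first use is racy while setting it once in the constructor is not.

```go
package main

import (
	"fmt"
	"sync"
)

// lrsClient is a hypothetical placeholder for lrsclient.LRSClient.
type lrsClient struct{ name string }

// lazyClient mirrors the racy shape: the field is written on first use
// without synchronization, so concurrent ReportLoad calls race on c.lrs.
type lazyClient struct {
	lrs *lrsClient
}

func (c *lazyClient) ReportLoad() *lrsClient {
	if c.lrs == nil { // unsynchronized read
		c.lrs = &lrsClient{name: "lazy"} // unsynchronized write
	}
	return c.lrs
}

// eagerClient sets the field once in its constructor and never writes it
// again, so concurrent reads from ReportLoad are safe.
type eagerClient struct {
	lrs *lrsClient
}

// newEagerClient creates the LRS client up front. A creation failure would
// be returned here instead of surfacing later inside ReportLoad.
func newEagerClient() (*eagerClient, error) {
	return &eagerClient{lrs: &lrsClient{name: "eager"}}, nil
}

func (c *eagerClient) ReportLoad() *lrsClient { return c.lrs }

func main() {
	c, err := newEagerClient()
	if err != nil {
		panic(err)
	}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = c.ReportLoad() // read-only access, no race
		}()
	}
	wg.Wait()
	fmt.Println("client used by all goroutines:", c.ReportLoad().name)
}
```

Calling `lazyClient.ReportLoad` from several goroutines instead would be flagged when the program is run with the race detector enabled (`go run -race`).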

RELEASE NOTES:

  • lrsclient:
    • Fix a race condition caused by initializing the LRSClient in the ReportLoad function instead of at creation time.
    • Creating an LRSClient no longer requires a node ID.

@eshitachandwani eshitachandwani added this to the 1.75 Release milestone Aug 1, 2025
@eshitachandwani eshitachandwani added the Type: Bug and Area: xDS (Includes everything xDS related, including LB policies used with xDS.) labels Aug 1, 2025

codecov bot commented Aug 1, 2025

Codecov Report

❌ Patch coverage is 75.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.04%. Comparing base (fa0d658) to head (c8c3ee4).
⚠️ Report is 6 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/xds/xdsclient/clientimpl.go | 80.00% | 2 Missing and 1 partial ⚠️ |
| internal/xds/clients/lrsclient/lrsclient.go | 0.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8483      +/-   ##
==========================================
+ Coverage   81.86%   82.04%   +0.17%     
==========================================
  Files         412      412              
  Lines       40518    40465      -53     
==========================================
+ Hits        33172    33200      +28     
+ Misses       5953     5887      -66     
+ Partials     1393     1378      -15     
| Files with missing lines | Coverage Δ |
|---|---|
| internal/xds/xdsclient/clientimpl_loadreport.go | 76.92% <ø> (+4.19%) ⬆️ |
| internal/xds/clients/lrsclient/lrsclient.go | 73.11% <0.00%> (+0.48%) ⬆️ |
| internal/xds/xdsclient/clientimpl.go | 82.05% <80.00%> (-0.48%) ⬇️ |

... and 26 files with indirect coverage changes

@arjan-bal arjan-bal modified the milestones: 1.75 Release, 1.74 Release Aug 1, 2025
@eshitachandwani eshitachandwani changed the title create lrs at creation time xds/xdsclient: create LRSClient at time of initialisation Aug 1, 2025
		TransportBuilder: gConfig.TransportBuilder,
	})
	if err != nil {
		return nil, err
Contributor

@purnesh42H purnesh42H Aug 1, 2025

Now that we have moved it out of ReportLoad, we might have to think a bit more here. Should an error in LRS client creation be fatal? Not everyone is going to use the internal xdsclient for load reporting.

Contributor

@arjan-bal arjan-bal Aug 4, 2025

In my opinion, we should keep the same behaviour as before the xDS client migration changes. If there are certain users who need to ignore LRS client creation failures, we can create a new issue to discuss whether the behaviour change makes sense.

Contributor

I think it makes sense to keep LRS client creation failures fatal. But we might want to change one minor thing in lrsclient.New. Currently it fails if the node ID is empty in the configuration. We recently removed that check for xDS client creation. I'm guessing other languages might not treat this as fatal for LRS creation.

@eshitachandwani : Could you please check what the other languages do and, if required, remove the check for an empty node ID in lrsclient.New? Thanks.

Member Author

If I understand correctly, Java checks that it is not null here

Contributor

That check is for the whole node proto or struct, not just the node ID field.

Member Author

Contributor

That is a private constructor. It's being called from the builder below which sets the id to an empty string by default. In Java empty and null strings are different.

Member Author

Ahh! Okay! Got it!
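
The outcome of this thread, accepting an empty node ID at creation time while keeping creation failures fatal, can be sketched as below. `Node`, `Config` and `validate` are simplified, hypothetical stand-ins for the real lrsclient types, and the nil check on the transport builder is an assumption added purely to show a validation that stays fatal.

```go
package main

import (
	"errors"
	"fmt"
)

// Node and Config are hypothetical, simplified stand-ins for the real
// configuration types.
type Node struct{ ID string }

type Config struct {
	Node             Node
	TransportBuilder any // placeholder for the real transport builder type
}

// validate no longer rejects an empty node ID. The remaining check is an
// assumed example of a validation that still fails creation.
func validate(cfg Config) error {
	if cfg.TransportBuilder == nil {
		return errors.New("transport builder is nil")
	}
	return nil
}

func main() {
	// Empty node ID with a builder set: accepted.
	fmt.Println(validate(Config{TransportBuilder: struct{}{}}))
	// Node ID set but no builder: rejected.
	fmt.Println(validate(Config{Node: Node{ID: "n1"}}))
}
```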

	for i := 0; i < numGoroutines; i++ {
		go func() {
			defer wg.Done()
			_, cancelStore := client.ReportLoad(serverConfig)
Contributor

Would it make sense to have a loop here as well?

Member Author

I am not sure I understand. A loop for what?

Contributor

A loop inside the goroutine that starts reporting load and then cancels it. What I'm asking for is:

	for i := 0; i < numGoroutines; i++ {
		go func() {
			defer wg.Done()
			for j := 0; j < 100; j++ {
				_, cancelStore := client.ReportLoad(serverConfig)
				cancelStore(ctx)
			}
		}()
	}

Member Author

You mean several ReportLoad() calls from one goroutine? Is this to better reproduce the real-life case, or to increase the chances of catching the race?
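
A self-contained sketch of the looped pattern suggested above, for readers who want to try it outside the grpc-go test suite. `fakeClient` and `stress` are hypothetical. The real `ReportLoad` takes a server config, and its cancel function takes a context, both of which are left out here. Repeating the start/cancel cycle inside each goroutine produces many more interleavings per run, which gives the race detector more chances to observe a conflicting access.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeClient is a hypothetical stand-in for the xDS client. It tracks load
// reporting sessions under a mutex so that the sketch itself is race free.
type fakeClient struct {
	mu     sync.Mutex
	active int // sessions started but not yet cancelled
	total  int // sessions started overall
}

// ReportLoad starts a session and returns a function that cancels it.
// The cancel function is safe to call more than once.
func (c *fakeClient) ReportLoad() (cancel func()) {
	c.mu.Lock()
	c.active++
	c.total++
	c.mu.Unlock()

	var once sync.Once
	return func() {
		once.Do(func() {
			c.mu.Lock()
			c.active--
			c.mu.Unlock()
		})
	}
}

// stress runs numGoroutines goroutines, each performing the given number
// of start/cancel cycles, and waits for all of them to finish.
func stress(c *fakeClient, numGoroutines, iterations int) {
	var wg sync.WaitGroup
	for i := 0; i < numGoroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < iterations; j++ {
				cancel := c.ReportLoad()
				cancel()
			}
		}()
	}
	wg.Wait()
}

func main() {
	c := &fakeClient{}
	stress(c, 10, 100)
	fmt.Println("sessions started:", c.total, "still active:", c.active)
}
```

A pattern like this is only useful for catching races when run with the race detector enabled, for example `go run -race` or `go test -race`.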

@easwars easwars assigned eshitachandwani and unassigned easwars and arjan-bal Aug 4, 2025
@eshitachandwani eshitachandwani requested a review from easwars August 5, 2025 12:39
@easwars easwars assigned eshitachandwani and unassigned easwars and arjan-bal Aug 6, 2025
Comment on lines 65 to 66
	case config.Node.ID == "":
		return nil, errors.New("lrsclient: node ID in node is empty")
Contributor

I believe reverting this behaviour change should also be mentioned in the release notes.

Contributor

The release notes should also mention the bug that is fixed by this PR.

Contributor

@arjan-bal arjan-bal left a comment

LGTM

@easwars easwars changed the title xds/xdsclient: create LRSClient at time of initialisation xdsclient: create LRSClient at time of initialisation Aug 25, 2025
Contributor

@easwars easwars left a comment

LGTM, modulo minor nits

@easwars easwars removed their assignment Aug 25, 2025
@eshitachandwani eshitachandwani merged commit 3074bcd into grpc:master Aug 26, 2025
15 checks passed
eshitachandwani added a commit to eshitachandwani/grpc-go that referenced this pull request Aug 26, 2025
Fixes: grpc#8474

The race is in
[ReportLoad](https://github.com/grpc/grpc-go/blob/9186ebd774370e3b3232d1b202914ff8fc2c56d6/xds/internal/xdsclient/clientimpl_loadreport.go#L35C2-L44C21)
function of clientImpl. The implementation was recently changed as the
part of [xds client
migration](grpc@082a927).

The
[comment](https://github.com/grpc/grpc-go/blob/85240a5b02defe7b653ccba66866b4370c982b6a/xds/internal/xdsclient/clientimpl.go#L86C2-L87C16)
says that `lrsclient.LRSClient` should be initialized only at creation
time but that was not the case. It was being initialized at the time of
calling `ReportLoad` function.

RELEASE NOTES:

- lrsclient:
	- Fix a race condition where the `LRSClient` was not initialized at
	  creation time but it was being initialized at the time of calling the
	  `ReportLoad` function.
	- Creating an `LRSClient` no longer requires a node ID.
eshitachandwani added a commit to eshitachandwani/grpc-go that referenced this pull request Aug 26, 2025
eshitachandwani added a commit that referenced this pull request Aug 26, 2025
Original PRs : #8476 ,
#8483
Related issues : #8473 ,
#8474

RELEASE NOTES:

- xds: Revert to allowing empty node ID in xDS bootstrap configuration
- lrsclient:
	- Fix a race condition where the LRSClient was not initialized at
	  creation time but it was being initialized at the time of calling the
	  ReportLoad function.
	- Creating an LRSClient no longer requires a node ID.

---------

Co-authored-by: Sotiris Nanopoulos <[email protected]>
eshitachandwani added a commit to eshitachandwani/grpc-go that referenced this pull request Aug 27, 2025
eshitachandwani added a commit that referenced this pull request Aug 28, 2025
Original PR: #8483 
Related issue: #8474 

RELEASE NOTES:
- lrsclient:
	- Fix a race condition where the `LRSClient` was not initialized at
	  creation time but it was being initialized at the time of calling the
	  `ReportLoad` function.
	- Creating an `LRSClient` no longer requires a node ID.
eshitachandwani added a commit to eshitachandwani/grpc-go that referenced this pull request Aug 29, 2025
Labels
Area: xDS Includes everything xDS related, including LB policies used with xDS. Type: Bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Race condition in xds package
4 participants