[core] EC2 Autoscaler race condition

### What happened + What you expected to happen

FYI, previously reported this here https://github.com/ray-project/ray/issues/51861
It was closed, but the issue is still there, here's an example of a stack trace from ray 2.47.0:

```
2025-07-09 19:42:52,988 ERROR autoscaler.py:373 -- StandardAutoscaler: Error during autoscaling.
2025-07-09 19:42:53.109
Traceback (most recent call last):
2025-07-09 19:42:53.109
  File "/opt/ml/lib/python3.12/site-packages/ray/autoscaler/_private/autoscaler.py", line 370, in update
2025-07-09 19:42:53.109
    self._update()
2025-07-09 19:42:53.109
  File "/opt/ml/lib/python3.12/site-packages/ray/autoscaler/_private/autoscaler.py", line 420, in _update
2025-07-09 19:42:53.109
    self.process_completed_updates()
2025-07-09 19:42:53.109
  File "/opt/ml/lib/python3.12/site-packages/ray/autoscaler/_private/autoscaler.py", line 768, in process_completed_updates
2025-07-09 19:42:53.109
    self.load_metrics.mark_active(self.provider.internal_ip(node_id))
2025-07-09 19:42:53.109
  File "/opt/ml/lib/python3.12/site-packages/ray/autoscaler/_private/load_metrics.py", line 131, in mark_active
2025-07-09 19:42:53.109
    assert ip is not None, "IP should be known at this time"
2025-07-09 19:42:53.109
           ^^^^^^^^^^^^^^ 
```

PS> Also there appears to be a larger regression in 2.47.0 which causes clusters to get stuck and not able to scale up or down. I don't have a smoking gun, but we had to roll back our upgrade. Currently investigating which version introduced the regression. If you have any other reports for this issue or fixes, please let me know. 


### Versions / Dependencies

Ray version 2.47.0

### Reproduction script

Run a ray job hundres/thousands of tasks and let cluster scale up quickly to hundreds or thousands of nodes. We observe this happening with clusters as small as ~150 nodes.

### Issue Severity

High: It blocks me from completing my task.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[core] EC2 Autoscaler race condition #54920

What happened + What you expected to happen

Versions / Dependencies

Reproduction script

Issue Severity

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[core] EC2 Autoscaler race condition #54920

Description

What happened + What you expected to happen

Versions / Dependencies

Reproduction script

Issue Severity

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions