Conversation

@finbarrtimbers commented Aug 7, 2025

New & improved version of #807.

Now, we process individual prompts through the queues and can do in-flight weight updates (configurable with the inflight_updates flag, which defaults to False).
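
A hypothetical sketch of how such a flag might be declared alongside the other args (the field name follows the runs below; the docstring is illustrative and not taken from the diff):

    inflight_updates: bool = False
    """Whether to update model weights while generation requests are still in flight."""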

Runs with inflight_updates=False (which should recreate existing behaviour):

  1. Single GPU run: Beaker
  2. Single GPU run with tools: Beaker
  3. Multi-GPU run: Beaker

With inflight_updates=True:

  1. Single GPU run: Beaker
  2. Multi-GPU run: Beaker

This is 40% more efficient (tokens per second go from 349.22 -> 500.42 in the benchmark).

Benchmark results for main at HEAD:

Average results over 4 main benchmark batches:
Average tokens/second: 349.22
Average MFU: 4.76%
Average generation time per batch: 2422.32s
Average new tokens per sample: 13217.39 tokens
Wasted compute % (variable response length): 35.31%

Benchmark results in this PR:

Average results over 4 main benchmark batches:
Average tokens/second: 500.42
Average MFU: 5.43%
Average generation time per batch: 1691.32s
Average new tokens per sample: 13224.54 tokens

@hamishivi left a comment

Generally I think this is pretty good to merge, mostly just minor nits left, assuming the slowdown stuff is worked out!

vllm_top_p: float = 1.0
"""vLLM top p for nucleus sampling"""
inference_batch_size: Optional[int] = None
"""Number of inference requests to batch together for vLLM processing"""

maybe we could be more specific, something like "number of unique prompts sent to a single vLLM engine at once"?
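
A sketch of what that wording would look like in place (same field as above, only the docstring changes):

    inference_batch_size: Optional[int] = None
    """Number of unique prompts sent to a single vLLM engine at once."""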


finish_reasons += [finish_reasons[i] for i in sampled_indices]

print(

can we keep this logging?

raise ValueError(f"Unknown tool: {tool}")

actor_manager = vllm_utils3.ActorManager.remote()
actor_manager = ray.remote(vllm_utils3.ActorManager).remote()

Is this change necessary since ActorManager is already decorated with ray.remote?
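
For context, a minimal sketch of the two Ray patterns being compared (generic class names, not the actual module layout): if the class already carries the @ray.remote decorator, calling .remote() on it directly is enough, while wrapping with ray.remote(...) is what you need for an undecorated class.

    import ray

    @ray.remote
    class DecoratedManager:
        """Already an actor class: instantiate it with .remote() directly."""
        pass

    manager = DecoratedManager.remote()

    class PlainManager:
        """Plain class: wrap with ray.remote(...) before instantiating."""
        pass

    plain_manager = ray.remote(PlainManager).remote()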

if has_beaker_job:
logger.info(f"is_beaker_job: BEAKER_JOB_ID value: {os.environ.get('BEAKER_JOB_ID')}")
return has_beaker_job
return "BEAKER_JOB_ID" in os.environ

nice!
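
For reference, a self-contained version of the one-line check shown in the diff (the function name is_beaker_job is inferred from the log message and may differ):

    import os

    def is_beaker_job() -> bool:
        # Beaker sets BEAKER_JOB_ID for every job, so checking for the variable suffices.
        return "BEAKER_JOB_ID" in os.environ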

claude bot commented Aug 29, 2025

Code Review for PR #859: Change LLMRayActor to continually process individual prompts

Overview

This PR implements a significant architectural change from batch-based to individual prompt processing, introducing an inflight_updates flag and achieving a reported 40% performance improvement (349.22 → 500.42 tokens/second).

✅ Strengths

Performance Improvements:

  • Excellent 40% performance gain with clear benchmarking data
  • Smarter queue sizing based on actual workload: queue_size = args.num_unique_prompts_rollout * num_batches (see the sketch after this list)
  • Better memory management with comprehensive logging functions
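
A minimal sketch of that queue sizing, assuming a Ray queue and substituting concrete placeholder values for args.num_unique_prompts_rollout and num_batches:

    from ray.util import queue as ray_queue

    num_unique_prompts_rollout = 64   # placeholder for args.num_unique_prompts_rollout
    num_batches = 4                   # placeholder for the number of pending batches

    # Size the prompt queue so every pending batch fits without blocking producers.
    queue_size = num_unique_prompts_rollout * num_batches
    prompt_queue = ray_queue.Queue(maxsize=queue_size)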

Code Quality:

  • Good separation of concerns between submission and generation threads
  • Proper error handling and cleanup in finally blocks
  • Clear progress reporting and logging improvements

Architecture:

  • Clean transition from batch-based to individual prompt processing
  • Backwards compatibility with inflight_updates=False flag
  • Well-structured thread management with proper synchronization

⚠️ Issues and Concerns

Critical Bug - Division by Zero:
In grpo_fast.py line 442: self.inference_batch_size = self.num_unique_prompts_rollout // self.vllm_num_engines
When num_unique_prompts_rollout < vllm_num_engines, this becomes 0, which will cause issues downstream.
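
A minimal sketch of the suggested guard; the dataclass layout here is illustrative, and only the max(1, ...) clamp is the actual recommendation:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Args:
        num_unique_prompts_rollout: int
        vllm_num_engines: int
        inference_batch_size: Optional[int] = None

        def __post_init__(self):
            if self.inference_batch_size is None:
                # Clamp to at least 1 so that having fewer prompts than engines
                # does not produce a zero batch size.
                self.inference_batch_size = max(
                    1, self.num_unique_prompts_rollout // self.vllm_num_engines
                )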

Potential Race Conditions:
The dataset_idx calculation in benchmark_generators.py could lead to out-of-bounds access if dataset size assumptions are violated.

Memory Management:

  • New log_memory_usage() function is comprehensive but may impact performance in production due to frequent gc.get_objects() calls
  • Consider making memory logging optional or less frequent (see the throttling sketch below)
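
A throttling sketch, under the assumption that the expensive work lives in the PR's log_memory_usage() function (the wrapper and its name are hypothetical):

    import gc
    import logging
    import time

    logger = logging.getLogger(__name__)
    _LAST_MEMORY_LOG = 0.0

    def log_memory_usage() -> None:
        # Stand-in for the PR's logger; gc.get_objects() is the costly call.
        logger.info("Live Python objects: %d", len(gc.get_objects()))

    def maybe_log_memory_usage(min_interval_s: float = 60.0) -> None:
        """Run the expensive memory logging at most once per interval."""
        global _LAST_MEMORY_LOG
        now = time.monotonic()
        if now - _LAST_MEMORY_LOG >= min_interval_s:
            log_memory_usage()
            _LAST_MEMORY_LOG = now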

Thread Safety:

  • Multiple threads accessing shared queues - ensure proper synchronization
  • Thread pool size increased to len(vllm_engines) + 1; verify this doesn't cause resource contention

🔧 Recommended Fixes

  1. Fix Division by Zero: Use max(1, self.num_unique_prompts_rollout // self.vllm_num_engines)
  2. Add bounds checking for dataset access
  3. Optimize memory logging for production use

🧪 Testing Concerns

Limited Test Coverage:

  • Only basic unit tests for device name mapping
  • No integration tests for the new individual prompt processing logic
  • Missing tests for edge cases (empty queues, timeout scenarios)
  • No tests for the new inflight_updates functionality

🛡️ Security Assessment

Low Risk Areas:

  • No obvious security vulnerabilities
  • Proper input validation for queue operations
  • Safe memory management practices

📋 Summary

This is a solid performance-focused PR with a well-architected solution. The 40% performance gain is impressive and the code is generally well-structured. However, there are a few critical bugs that need addressing before merge:

  1. Fix the division by zero bug in inference batch size calculation
  2. Add bounds checking for dataset access
  3. Optimize memory logging for production use
  4. Expand test coverage for the new functionality

The architectural change is sound and the backwards compatibility approach is thoughtful. Once the critical issues are addressed, this will be a valuable improvement to the codebase.

🤖 Generated with Claude Code
