feat: add rate limiter logic to dynamo's openai api compatible http service (v1) #1949
Conversation
👋 Hi jorgeantonio21! Thank you for contributing to ai-dynamo/dynamo.
Walkthrough

This update introduces a comprehensive, configurable rate limiting system to the HTTP service, based on time-weighted exponential moving averages (EMA) of time to first token (TTFT) and inter-token latency (ITL). The rate limiter is integrated into service configuration, Python bindings, metrics, and OpenAI-compatible endpoints, with extensive documentation, benchmarks, and tests for correctness, performance, and recovery.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant HTTP_Service
    participant RateLimiter
    participant Metrics
    participant Engine
    Client->>HTTP_Service: Send request (e.g., completions)
    HTTP_Service->>RateLimiter: should_reject(model, endpoint, type)
    RateLimiter-->>HTTP_Service: Allow/Reject decision + EMA metrics
    alt Rejected
        HTTP_Service-->>Client: HTTP 429 Rate Limit Exceeded
        HTTP_Service->>Metrics: Record rejection + EMA
    else Allowed
        HTTP_Service->>Engine: Process request
        Engine-->>HTTP_Service: Response (stream/unary)
        HTTP_Service->>RateLimiter: record_ttft/itl
        HTTP_Service->>Metrics: Record success + latency
        HTTP_Service-->>Client: Return response
    end
```
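The allow/reject decision in the diagram can be sketched as a small standalone model. This is illustrative only: the names (`Decision`, `should_reject`) and the simple threshold comparison are assumptions simplified from the PR, which tracks time-weighted EMAs per model or globally.

```rust
// Simplified model of the admission decision shown in the sequence diagram.
// All names are hypothetical; the real RateLimiter compares time-weighted
// EMAs of TTFT and ITL against configured thresholds.
#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Reject, // surfaced to the client as HTTP 429
}

struct RateLimiter {
    ttft_threshold_secs: f64,
    itl_threshold_secs: f64,
}

impl RateLimiter {
    // Reject when either tracked average exceeds its threshold.
    fn should_reject(&self, ttft_ema: f64, itl_ema: f64) -> Decision {
        if ttft_ema > self.ttft_threshold_secs || itl_ema > self.itl_threshold_secs {
            Decision::Reject
        } else {
            Decision::Allow
        }
    }
}

fn main() {
    let rl = RateLimiter { ttft_threshold_secs: 1.0, itl_threshold_secs: 0.03 };
    // Healthy latencies: admit the request.
    assert_eq!(rl.should_reject(0.4, 0.01), Decision::Allow);
    // Degraded TTFT: reject early, before the request reaches the engine.
    assert_eq!(rl.should_reject(1.5, 0.01), Decision::Reject);
}
```

Rejecting before the request reaches the engine is what keeps the limiter cheap: the only per-request cost on the hot path is a comparison against the cached averages.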
Actionable comments posted: 5
🔭 Outside diff range comments (1)
lib/llm/tests/http-service.rs (1)

`166-180`: Update `compare_counters` to include the new `Rejected` status.

The function only checks `Success` and `Error` statuses but not the newly added `Rejected` status. This creates a gap in test coverage for rate limiting metrics.

```diff
 fn compare_counters(metrics: &Metrics, model: &str, expected: &[u64; 8]) {
     for endpoint in &[Endpoint::Completions, Endpoint::ChatCompletions] {
         for request_type in &[RequestType::Unary, RequestType::Stream] {
-            for status in &[Status::Success, Status::Error] {
+            for status in &[Status::Success, Status::Error, Status::Rejected] {
                 let index = compute_index(endpoint, request_type, status);
                 compare_counter(
                     metrics,
                     model,
                     endpoint,
                     request_type,
                     status,
                     expected[index],
                 );
             }
         }
     }
 }
```

Note: You'll also need to update the expected array size from 8 to 12 to accommodate the additional Rejected status combinations (2 endpoints × 2 request types × 3 statuses = 12).
🧹 Nitpick comments (2)
lib/llm/src/http/service/rate_limiter.rs (1)

`103-187`: Mathematically correct and efficient EMA implementation.

The recursive formula implementation is correct and provides O(1) update complexity. The decay calculation properly models system recovery during idle periods.

One minor optimization opportunity: consider caching `Instant::now()` in `get_decayed_time_weighted_average` if it's called multiple times in quick succession.

lib/llm/benches/rate_limiter.rs (1)

`42-42`: Consider removing the throughput setting for a more accurate benchmark comparison.

Setting throughput to 1 element makes it harder to compare efficiency across different sample sizes. Either remove this line to let Criterion use iteration time, or set it to reflect the sample size for better comparison.

```diff
- group.throughput(Throughput::Elements(1)); // One calculation per iteration
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)

- `Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (13)

- components/http/src/main.rs (4 hunks)
- docs/guides/rate_limiting.md (1 hunks)
- lib/bindings/python/rust/http.rs (2 hunks)
- lib/bindings/python/rust/lib.rs (1 hunks)
- lib/bindings/python/src/dynamo/_core.pyi (2 hunks)
- lib/llm/Cargo.toml (2 hunks)
- lib/llm/benches/rate_limiter.rs (1 hunks)
- lib/llm/src/http/service.rs (1 hunks)
- lib/llm/src/http/service/metrics.rs (11 hunks)
- lib/llm/src/http/service/openai.rs (9 hunks)
- lib/llm/src/http/service/rate_limiter.rs (1 hunks)
- lib/llm/src/http/service/service_v2.rs (5 hunks)
- lib/llm/tests/http-service.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (4)
lib/llm/src/http/service.rs (1)
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
lib/llm/src/http/service/openai.rs (1)
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:32:05.022Z
Learning: In async-nats, the "no responders" error is represented as async_nats::client::RequestErrorKind::NoResponders, not async_nats::Error::NoResponders. Use err.downcast_ref::<async_nats::client::RequestError>() and then check request_err.kind() against RequestErrorKind::NoResponders.
lib/llm/tests/http-service.rs (1)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.898Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from `Send + Sync + Debug` to `Send + Debug` because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
lib/llm/src/http/service/rate_limiter.rs (2)
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1429
File: lib/runtime/src/utils/leader_worker_barrier.rs:69-72
Timestamp: 2025-06-08T03:12:03.985Z
Learning: In the leader-worker barrier implementation in lib/runtime/src/utils/leader_worker_barrier.rs, the `wait_for_key_count` function correctly uses exact equality (`==`) instead of greater-than-or-equal (`>=`) because worker IDs must be unique (enforced by etcd create-only operations), ensuring exactly the expected number of workers can register.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1285
File: lib/llm/src/kv_router/scoring.rs:58-63
Timestamp: 2025-05-30T06:38:09.630Z
Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.
🧬 Code Graph Analysis (1)

lib/llm/src/http/service.rs (1)

- lib/llm/src/http/service/service_v2.rs: `rate_limiter` (48-50)
🪛 LanguageTool
docs/guides/rate_limiting.md
[uncategorized] ~1-~1: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: # Rate Limiting Guide ## Overview The Dynamo LLM serv...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
🔇 Additional comments (35)
lib/llm/Cargo.toml (2)
`39-42`: LGTM! Benchmark configuration follows standard patterns.

The benchmark section is correctly configured with `harness = false` for a custom benchmark harness, which is appropriate for performance testing of the rate limiter implementation.

`116-116`: Appropriate dependency addition for concurrent data structures.

The dashmap 6.1.0 dependency is well suited for the rate limiter implementation, providing concurrent hash maps that are essential for thread-safe rate limiting across multiple requests.
lib/llm/src/http/service.rs (1)

`26-26`: LGTM! Module declaration follows existing patterns.

The public module declaration is correctly placed and follows the same pattern as other modules in the service. This appropriately exposes the rate limiter functionality to the rest of the codebase.

lib/bindings/python/rust/lib.rs (1)

`101-101`: LGTM! Correct Python class registration.

The RateLimiterConfig class registration follows the established pattern for exposing Rust structs to Python. The placement within the class registration section is appropriate.
lib/bindings/python/src/dynamo/_core.pyi (2)

`810-814`: LGTM! Updated constructor signature matches the Rust implementation.

The HttpService constructor is correctly updated to accept optional parameters for port and rate_limiter_config, maintaining backward compatibility while enabling the new rate limiting functionality.

`823-834`: LGTM! Well-designed RateLimiterConfig class.

The RateLimiterConfig class provides appropriate parameters for configuring rate limiting:
- TTFT and ITL thresholds in seconds (float precision)
- Time constant for EMA calculation
- Optional per-model limits flag with sensible default

The parameter types and naming conventions are consistent with the implementation.
components/http/src/main.rs (4)

`7-7`: LGTM! Appropriate import addition.

The RateLimiterConfig import is correctly added to support the new rate limiting functionality.

`33-67`: Well-designed CLI arguments for rate limiting configuration.

The CLI arguments are comprehensive and well documented:
- Clear help text for each parameter
- Sensible default values (1000 ms TTFT, 30 ms ITL, 15 s time constant)
- Appropriate data types (f64 for thresholds, bool for flags)
- Consistent naming conventions

The default values appear reasonable for typical LLM serving scenarios.

`81-95`: Correct conditional rate limiter integration.

The conditional logic properly:
- Converts milliseconds to seconds for the configuration
- Uses the builder pattern appropriately
- Only applies rate limiting when enabled

The unit conversion from milliseconds to seconds is handled correctly.

`123-139`: Appropriate validation for rate limiting parameters.

The validation function correctly checks that all rate limiting parameters are positive values, which is essential for proper rate limiter operation. The error messages are clear and descriptive.
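The millisecond-to-second conversion and positivity validation described in these comments can be sketched roughly as follows. The function names and error format are hypothetical, not the PR's actual code; only the behavior (reject non-positive values, divide by 1000) is taken from the review.

```rust
// Hypothetical sketch of the CLI value handling described above.
fn validate_positive(name: &str, value: f64) -> Result<(), String> {
    if value > 0.0 {
        Ok(())
    } else {
        Err(format!("{name} must be positive, got {value}"))
    }
}

// CLI flags take milliseconds; the RateLimiterConfig takes seconds.
fn ms_to_secs(ms: f64) -> f64 {
    ms / 1000.0
}

fn main() {
    // Defaults mentioned in the review: 1000 ms TTFT, 30 ms ITL.
    assert!(validate_positive("ttft-threshold-ms", 1000.0).is_ok());
    assert!(validate_positive("itl-threshold-ms", -5.0).is_err());
    assert_eq!(ms_to_secs(1000.0), 1.0);
    assert_eq!(ms_to_secs(30.0), 0.03);
}
```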
lib/bindings/python/rust/http.rs (2)
`40-49`: LGTM! Clean integration of rate limiter configuration.

The optional `RateLimiterConfig` parameter maintains backward compatibility while enabling rate limiting when needed. The builder pattern is properly utilized.

`196-221`: Well-structured Python bindings for rate limiter configuration.

The `RateLimiterConfig` class properly wraps the Rust implementation with appropriate error handling and parameter validation.

docs/guides/rate_limiting.md (1)

`1-165`: Excellent documentation coverage for the rate limiting feature.

The guide provides clear explanations of the time-weighted EMA algorithm, comprehensive configuration examples, monitoring guidance, and practical tuning recommendations. The mathematical formulas are correctly presented.
lib/llm/src/http/service/openai.rs (3)
`101-110`: Correct implementation of the rate limit error response.

The method properly returns HTTP 429 (Too Many Requests), which is the standard status code for rate limiting.

`197-199`: Consistent integration of the rate limiter with response collectors.

All response collectors are properly updated to accept the rate limiter, enabling metric collection across endpoints.

Also applies to: 385-387, 554-556

`598-634`: Well-structured rate limiting enforcement logic.

The function properly:
- Checks if rate limiting is enabled before processing
- Records both TTFT and ITL metrics for monitoring
- Increments rejection counters for observability
- Provides clear error messages including the model name
lib/llm/src/http/service/service_v2.rs (3)
`23-23`: Clean integration of the rate limiter into service state.

The rate limiter is properly integrated following the existing patterns for shared state management with Arc, and the accessor methods are consistent with the codebase conventions.

Also applies to: 27-32, 48-54

`99-100`: Proper builder pattern extension for rate limiter configuration.

The optional configuration field and builder method follow the established patterns and maintain the fluent interface.

Also applies to: 151-155

`161-162`: Correct instantiation and integration of the rate limiter.

The rate limiter is properly instantiated with optional configuration and wrapped in Arc for thread-safe sharing.
lib/llm/src/http/service/rate_limiter.rs (4)
`1-44`: Excellent module documentation with clear mathematical explanations.

The documentation provides a thorough understanding of the time-weighted EMA algorithm and the design philosophy prioritizing "good-put" over raw throughput.
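One plausible formulation of such a time-weighted EMA with idle decay, written out for reference (this is an assumption consistent with the recovery behavior described in the review; the module docs define the exact formula):

```latex
\alpha_i = 1 - e^{-\Delta t_i / \tau}, \qquad
\bar{x}_i = \alpha_i x_i + (1 - \alpha_i)\,\bar{x}_{i-1}, \qquad
\bar{x}(t) = \bar{x}_i\, e^{-(t - t_i)/\tau}
```

where $\tau$ is the configured time constant, $\Delta t_i$ is the gap between samples $x_{i-1}$ and $x_i$, and the last expression decays the average during idle periods after the most recent sample at time $t_i$.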
`46-101`: Well-designed configuration with proper validation.

The `RateLimiterConfig` properly validates inputs to ensure positive thresholds and a minimum time constant. The default values (1 s TTFT, 100 ms ITL, 30 s time constant) are reasonable starting points.
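To make the EMA mechanics concrete, here is a minimal sketch of a tracker with idle decay. The struct name, formula, and API are assumptions (the PR's `TimeWeightedAverageTracker` may differ in detail); the sketch takes elapsed time as an explicit argument rather than reading the clock, so its behavior is deterministic.

```rust
// Minimal time-weighted EMA sketch with idle decay (assumed formulation).
struct EmaTracker {
    tau_secs: f64, // time constant: larger tau = slower reaction and recovery
    value: f64,    // current exponential moving average
}

impl EmaTracker {
    fn new(tau_secs: f64) -> Self {
        Self { tau_secs, value: 0.0 }
    }

    // Blend in a new latency sample, weighted by time since the last sample:
    // a long gap makes the new sample count for more.
    fn record(&mut self, sample: f64, dt_secs: f64) {
        let alpha = 1.0 - (-dt_secs / self.tau_secs).exp();
        self.value = alpha * sample + (1.0 - alpha) * self.value;
    }

    // Decay the average toward zero during idle periods, modeling recovery
    // of an overloaded system that stops receiving traffic.
    fn decayed(&self, idle_secs: f64) -> f64 {
        self.value * (-idle_secs / self.tau_secs).exp()
    }
}

fn main() {
    let mut t = EmaTracker::new(15.0);
    t.record(2.0, 1.0);
    let before = t.value;
    // The average moves toward the sample but stays below it.
    assert!(before > 0.0 && before < 2.0);
    // After a long idle period the average has decayed toward zero.
    assert!(t.decayed(60.0) < before);
}
```

Both `record` and `decayed` are O(1), which matches the review's note that no sample history needs to be stored.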
`249-401`: Robust rate limiter implementation with good observability.

The implementation properly handles:
- Thread-safe concurrent access with DashMap
- Per-model vs global limiting based on configuration
- Informative logging at the 90% threshold for early warning
- Safe unwrap usage in `get_metrics` due to a prior existence check
`409-1171`: Exceptional test coverage ensuring correctness and robustness.

The test suite comprehensively covers:
- Mathematical correctness with quantitative verification
- Thread safety under concurrent access
- Numerical stability with extreme values and long time series
- Edge cases including single samples and rapid updates
- Per-model vs global limiting behavior

The tests provide strong confidence in the implementation's correctness.
lib/llm/benches/rate_limiter.rs (9)
`1-342`: Excellent comprehensive benchmark suite!

The benchmarks provide thorough coverage of the rate limiter functionality, including:
- Performance characteristics under different configurations
- Concurrent access patterns with varying thread counts
- Memory allocation and bounded growth behavior
- Edge cases with extreme values

This will be valuable for performance regression testing and optimization.

`1-12`: Well-structured benchmark setup with comprehensive test parameters.

The imports are appropriate, and the constants provide good coverage across different scales and scenarios for thorough performance testing.

`15-35`: Solid benchmark implementation for value recording performance.

The benchmark correctly measures sequential recording performance with proper throughput measurement and clean state initialization.

`68-93`: Well-designed benchmark for time constant impact analysis.

The benchmark effectively measures how different time constants affect performance with a clean and systematic approach.

`96-136`: Comprehensive rate limiter decision benchmarking.

The benchmark effectively measures all key rate limiter operations with realistic pre-populated data and proper performance isolation.

`139-181`: Excellent concurrent access benchmark with proper thread safety.

The benchmark effectively tests scalability across different thread counts with appropriate Arc usage and comprehensive throughput measurement.

`184-226`: Effective memory pattern benchmarking for performance under memory stress.

The benchmark appropriately tests performance characteristics under memory-intensive scenarios and per-model isolation patterns.

`284-327`: Comprehensive configuration comparison benchmark.

The benchmark effectively compares different rate limiter configurations with realistic usage patterns, providing valuable performance insights across different tuning profiles.

`329-341`: Proper Criterion framework setup.

The benchmark group and main function are correctly configured with all benchmark functions included.
lib/llm/tests/http-service.rs (2)
`476-561`: Well-designed mock engines for testing rate limiting scenarios.

The `SlowTTFTEngine` and `SlowITLEngine` implementations effectively simulate the specific performance characteristics needed to trigger rate limiting. The controlled delays allow for deterministic testing of the rate limiter behavior.

`562-1095`: Comprehensive test coverage for rate limiting functionality.

The test suite thoroughly covers:
- Rate limiting trigger conditions with configurable thresholds
- HTTP integration with 429 status codes
- Per-model vs global rate limiting behavior
- Recovery dynamics with EMA decay

Excellent use of async testing patterns and metrics verification.
lib/llm/src/http/service/metrics.rs (1)
`13-14`: Excellent integration of rate limiting metrics!

The implementation properly extends the existing metrics infrastructure with:
- Clear separation between regular and rate-limit-specific metrics
- Consistent naming conventions following the established pattern
- Proper documentation updates
- Clean integration with ResponseMetricCollector to record both standard and rate limiting metrics

The metrics will provide valuable observability into the rate limiter's behavior.

Also applies to: 23-25, 33-33, 41-42, 83-92, 97-98, 103-104, 131-148, 224-260, 264-328, 365-374, 404-424, 515-515, 521-524, 555-555, 576-576
```rust
// Rate limit check
// TODO: handle streaming, currently just unary
should_reject_request(
    &state,
    &request.inner.model,
    &Endpoint::Responses,
    &RequestType::Unary,
)?;
```
Address the TODO for streaming support in the responses endpoint.

The rate limit check currently hardcodes `RequestType::Unary`, which means streaming responses won't be properly categorized in metrics. This should be fixed to properly detect streaming mode.

Would you like me to help implement proper streaming detection for the responses endpoint?
🤖 Prompt for AI Agents
In lib/llm/src/http/service/openai.rs around lines 487 to 494, the rate limit
check uses a hardcoded RequestType::Unary, which does not correctly handle
streaming responses. Modify the code to detect if the request is for streaming
or unary dynamically, and pass the appropriate RequestType variant to
should_reject_request. This will ensure streaming responses are properly
categorized in metrics.
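The fix suggested in the prompt amounts to deriving the request type from the request's stream flag instead of hardcoding it. A hedged sketch, where the `stream: Option<bool>` field is an assumption based on the OpenAI request schema and the struct name is hypothetical:

```rust
// Hypothetical sketch: pick the RequestType from the request's `stream` flag
// rather than hardcoding Unary.
#[derive(Debug, PartialEq)]
enum RequestType {
    Unary,
    Stream,
}

struct ResponsesRequest {
    stream: Option<bool>, // assumed field, per the OpenAI-style schema
}

fn request_type(req: &ResponsesRequest) -> RequestType {
    // Absent or false means a unary (non-streaming) request.
    if req.stream.unwrap_or(false) {
        RequestType::Stream
    } else {
        RequestType::Unary
    }
}

fn main() {
    assert_eq!(request_type(&ResponsesRequest { stream: Some(true) }), RequestType::Stream);
    assert_eq!(request_type(&ResponsesRequest { stream: None }), RequestType::Unary);
}
```

The result would then be passed to `should_reject_request` in place of the hardcoded `&RequestType::Unary`.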
…r flagging rate limit settings, integrate rate limiter with dynamo launch logic
FYI, details on the cargo check failure:

I ran into a similar issue in the past, which I fixed by running
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

This PR has been closed due to inactivity. If you believe this PR is still relevant, please feel free to reopen it with additional context or information.
Overview:
This PR implements adaptive rate limiting for the HTTP service to prevent system overload during periods of degraded model performance. The rate limiter monitors Time-to-First-Token (TTFT) and Inter-Token-Latency (ITL) metrics using exponential moving averages and automatically rejects new requests when performance thresholds are exceeded.
Details:
Core Rate Limiting Logic:
Integration Points:
- OpenAI-compatible endpoints (`/v1/chat/completions`, `/v1/completions`) with early request rejection (429 status)

Why This Approach is Robust:
Key Files Added/Modified:
- `lib/llm/src/http/service/rate_limiter.rs` - Core rate limiting logic with TimeWeightedAverageTracker
- `lib/llm/src/http/service/service_v2.rs` - HTTP service integration and configuration
- `lib/llm/src/http/service/openai.rs` - Request rejection logic in endpoint handlers
- `lib/llm/src/http/service/metrics.rs` - Prometheus metrics for rate limiting
- `components/http/src/main.rs` - CLI argument support
- `lib/bindings/python/` - Python bindings for rate limiter configuration

Where should the reviewer start?
- `lib/llm/src/http/service/rate_limiter.rs` - Review the TimeWeightedAverageTracker EMA implementation and rate limiting decision logic
- `lib/llm/src/http/service/openai.rs` - Check the `should_reject_request()` integration in endpoint handlers
- `lib/llm/src/http/service/service_v2.rs` - Verify rate limiter configuration and state management
- `lib/llm/tests/http-service.rs` - Review integration tests covering rate limiting scenarios and recovery

Related Issues:
Summary by CodeRabbit
New Features
Documentation
Tests
Bug Fixes