
Commit de69216

feat: add batches API with OpenAI compatibility (#3088)
Add complete batches API implementation with protocol, providers, and tests.

Core Infrastructure:
- Add batches API protocol using OpenAI Batch types directly
- Add Api.batches enum value and protocol mapping in resolver
- Add OpenAI "batch" file purpose support
- Include proper error handling (ConflictError, ResourceNotFoundError)

Reference Provider:
- Add ReferenceBatchesImpl with full CRUD operations (create, retrieve, cancel, list)
- Implement background batch processing with configurable concurrency
- Add SQLite KVStore backend for persistence
- Support /v1/chat/completions endpoint with request validation

Comprehensive Test Suite:
- Add unit tests for provider implementation with validation
- Add integration tests for end-to-end batch processing workflows
- Add error handling tests for validation, malformed inputs, and edge cases

Configuration:
- Add max_concurrent_batches and max_concurrent_requests_per_batch options
- Add provider documentation with sample configurations

Test with:

```shell
$ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run &
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK
```

Addresses #3066
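As context for the workflow above, a batch's input file is a JSONL document in the OpenAI batch input format: one JSON object per line, each describing a single request against the batch's endpoint. A minimal sketch of building and validating such a file (the model name and `custom_id` values are placeholders, not from this commit):

```python
import json

# Hypothetical batch input: one JSON object per line, each a single
# /v1/chat/completions request in the OpenAI batch input format.
requests = [
    {
        "custom_id": f"request-{i}",  # caller-chosen ID used to correlate results
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "YOUR_MODEL",  # placeholder; pick your inference model
            "messages": [{"role": "user", "content": f"Say hello #{i}"}],
        },
    }
    for i in range(3)
]

# Serialize to the JSONL payload that would be uploaded with purpose="batch".
jsonl = "\n".join(json.dumps(r) for r in requests)

# Basic per-line validation, mirroring the kind of request validation the
# reference provider performs on batch input.
for line in jsonl.splitlines():
    req = json.loads(line)
    assert req["method"] == "POST"
    assert req["url"] == "/v1/chat/completions"
    assert "custom_id" in req and "body" in req
```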
1 parent 46ff302 commit de69216

26 files changed: +2707 −2 lines changed


docs/_static/llama-stack-spec.html

Lines changed: 4 additions & 2 deletions
```diff
@@ -14767,7 +14767,8 @@
       "OpenAIFilePurpose": {
         "type": "string",
         "enum": [
-          "assistants"
+          "assistants",
+          "batch"
         ],
         "title": "OpenAIFilePurpose",
         "description": "Valid purpose values for OpenAI Files API."
@@ -14844,7 +14845,8 @@
       "purpose": {
         "type": "string",
         "enum": [
-          "assistants"
+          "assistants",
+          "batch"
         ],
         "description": "The intended purpose of the file"
       }
```

docs/_static/llama-stack-spec.yaml

Lines changed: 2 additions & 0 deletions
```diff
@@ -10951,6 +10951,7 @@ components:
       type: string
       enum:
       - assistants
+      - batch
       title: OpenAIFilePurpose
       description: >-
         Valid purpose values for OpenAI Files API.
@@ -11019,6 +11020,7 @@ components:
       type: string
       enum:
       - assistants
+      - batch
       description: The intended purpose of the file
     additionalProperties: false
     required:
```

docs/source/concepts/apis.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -18,3 +18,4 @@ We are working on adding a few more APIs to complete the application lifecycle.
 - **Batch Inference**: run inference on a dataset of inputs
 - **Batch Agents**: run agents on a dataset of inputs
 - **Synthetic Data Generation**: generate synthetic data for model development
+- **Batches**: OpenAI-compatible batch management for inference
```

docs/source/providers/agents/index.md

Lines changed: 9 additions & 0 deletions
```diff
@@ -2,6 +2,15 @@

 ## Overview

+Agents API for creating and interacting with agentic systems.
+
+Main functionalities provided by this API:
+- Create agents with specific instructions and ability to use tools.
+- Interactions with agents are grouped into sessions ("threads"), and each interaction is called a "turn".
+- Agents can be provided with various tools (see the ToolGroups and ToolRuntime APIs for more details).
+- Agents can be provided with various shields (see the Safety API for more details).
+- Agents can also use Memory to retrieve information from knowledge bases. See the RAG Tool and Vector IO APIs for more details.
+
 This section contains documentation for all available providers for the **agents** API.

 ## Providers
```
Lines changed: 21 additions & 0 deletions
New file (21 lines added):

````markdown
# Batches

## Overview

Protocol for batch processing API operations.

The Batches API enables efficient processing of multiple requests in a single operation,
particularly useful for processing large datasets, batch evaluation workflows, and
cost-effective inference at scale.

Note: This API is currently under active development and may undergo changes.

This section contains documentation for all available providers for the **batches** API.

## Providers

```{toctree}
:maxdepth: 1

inline_reference
```
````
Lines changed: 23 additions & 0 deletions
New file (23 lines added):

````markdown
# inline::reference

## Description

Reference implementation of batches API with KVStore persistence.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Configuration for the key-value store backend. |
| `max_concurrent_batches` | `<class 'int'>` | No | 1 | Maximum number of concurrent batches to process simultaneously. |
| `max_concurrent_requests_per_batch` | `<class 'int'>` | No | 10 | Maximum number of concurrent requests to process per batch. |

## Sample Configuration

```yaml
kvstore:
  type: sqlite
  db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/batches.db
```
````
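The two concurrency options above can be illustrated with asyncio semaphores. This is a minimal sketch of the intended semantics (a batch-level gate plus a per-batch request gate), assuming the provider bounds concurrency this way; function and variable names are hypothetical, not the provider's actual code:

```python
import asyncio

# Sketch of the two concurrency knobs: max_concurrent_batches gates how many
# batches run at once, max_concurrent_requests_per_batch gates fan-out inside
# each batch. Values mirror the defaults in the config table above.
MAX_CONCURRENT_BATCHES = 1
MAX_CONCURRENT_REQUESTS_PER_BATCH = 10


async def process_batch(batch_id: str, requests: list[str], batch_slots: asyncio.Semaphore) -> list[str]:
    async with batch_slots:  # at most MAX_CONCURRENT_BATCHES batches at a time
        request_slots = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS_PER_BATCH)

        async def run_one(req: str) -> str:
            async with request_slots:  # bounded concurrency within the batch
                await asyncio.sleep(0)  # stand-in for the actual inference call
                return f"{batch_id}:{req}:done"

        return list(await asyncio.gather(*(run_one(r) for r in requests)))


async def main() -> list[str]:
    batch_slots = asyncio.Semaphore(MAX_CONCURRENT_BATCHES)
    per_batch = await asyncio.gather(
        process_batch("batch_a", [f"r{i}" for i in range(4)], batch_slots),
        process_batch("batch_b", [f"r{i}" for i in range(2)], batch_slots),
    )
    return [item for batch in per_batch for item in batch]


results = asyncio.run(main())
```

With `MAX_CONCURRENT_BATCHES = 1`, the second batch waits for the first to release its slot, while up to ten of a batch's requests run concurrently.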

docs/source/providers/eval/index.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -2,6 +2,8 @@

 ## Overview

+Llama Stack Evaluation API for running evaluations on model and agent candidates.
+
 This section contains documentation for all available providers for the **eval** API.

 ## Providers
```

docs/source/providers/inference/index.md

Lines changed: 6 additions & 0 deletions
```diff
@@ -2,6 +2,12 @@

 ## Overview

+Llama Stack Inference API for generating completions, chat completions, and embeddings.
+
+This API provides the raw interface to the underlying models. Two kinds of models are supported:
+- LLM models: these models generate "raw" and "chat" (conversational) completions.
+- Embedding models: these models generate embeddings to be used for semantic search.
+
 This section contains documentation for all available providers for the **inference** API.

 ## Providers
```
Lines changed: 9 additions & 0 deletions
New file (9 lines added):

```python
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.

from .batches import Batches, BatchObject, ListBatchesResponse

__all__ = ["Batches", "BatchObject", "ListBatchesResponse"]
```
Lines changed: 89 additions & 0 deletions
New file (89 lines added):

```python
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.

from typing import Literal, Protocol, runtime_checkable

from pydantic import BaseModel, Field

from llama_stack.schema_utils import json_schema_type, webmethod

try:
    from openai.types import Batch as BatchObject
except ImportError as e:
    raise ImportError("OpenAI package is required for batches API. Please install it with: pip install openai") from e


@json_schema_type
class ListBatchesResponse(BaseModel):
    """Response containing a list of batch objects."""

    object: Literal["list"] = "list"
    data: list[BatchObject] = Field(..., description="List of batch objects")
    first_id: str | None = Field(default=None, description="ID of the first batch in the list")
    last_id: str | None = Field(default=None, description="ID of the last batch in the list")
    has_more: bool = Field(default=False, description="Whether there are more batches available")


@runtime_checkable
class Batches(Protocol):
    """Protocol for batch processing API operations.

    The Batches API enables efficient processing of multiple requests in a single operation,
    particularly useful for processing large datasets, batch evaluation workflows, and
    cost-effective inference at scale.

    Note: This API is currently under active development and may undergo changes.
    """

    @webmethod(route="/openai/v1/batches", method="POST")
    async def create_batch(
        self,
        input_file_id: str,
        endpoint: str,
        completion_window: Literal["24h"],
        metadata: dict[str, str] | None = None,
    ) -> BatchObject:
        """Create a new batch for processing multiple API requests.

        :param input_file_id: The ID of an uploaded file containing requests for the batch.
        :param endpoint: The endpoint to be used for all requests in the batch.
        :param completion_window: The time window within which the batch should be processed.
        :param metadata: Optional metadata for the batch.
        :returns: The created batch object.
        """
        ...

    @webmethod(route="/openai/v1/batches/{batch_id}", method="GET")
    async def retrieve_batch(self, batch_id: str) -> BatchObject:
        """Retrieve information about a specific batch.

        :param batch_id: The ID of the batch to retrieve.
        :returns: The batch object.
        """
        ...

    @webmethod(route="/openai/v1/batches/{batch_id}/cancel", method="POST")
    async def cancel_batch(self, batch_id: str) -> BatchObject:
        """Cancel a batch that is in progress.

        :param batch_id: The ID of the batch to cancel.
        :returns: The updated batch object.
        """
        ...

    @webmethod(route="/openai/v1/batches", method="GET")
    async def list_batches(
        self,
        after: str | None = None,
        limit: int = 20,
    ) -> ListBatchesResponse:
        """List all batches for the current user.

        :param after: A cursor for pagination; returns batches after this batch ID.
        :param limit: Number of batches to return (default 20, max 100).
        :returns: A list of batch objects.
        """
        ...
```
