
Commit de69216

feat: add batches API with OpenAI compatibility (#3088)
Add complete batches API implementation with protocol, providers, and tests.

Core Infrastructure:
- Add batches API protocol using OpenAI Batch types directly
- Add Api.batches enum value and protocol mapping in resolver
- Add OpenAI "batch" file purpose support
- Include proper error handling (ConflictError, ResourceNotFoundError)

Reference Provider:
- Add ReferenceBatchesImpl with full CRUD operations (create, retrieve, cancel, list)
- Implement background batch processing with configurable concurrency
- Add SQLite KVStore backend for persistence
- Support /v1/chat/completions endpoint with request validation

Comprehensive Test Suite:
- Add unit tests for provider implementation with validation
- Add integration tests for end-to-end batch processing workflows
- Add error handling tests for validation, malformed inputs, and edge cases

Configuration:
- Add max_concurrent_batches and max_concurrent_requests_per_batch options
- Add provider documentation with sample configurations

Test with:

```shell
$ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run &
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK
```

Addresses #3066
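As context for the workflow above, a batch's input file is a JSONL document in the OpenAI batch input format: one JSON object per line, each describing a single request against the batch's endpoint. A minimal sketch of building and validating such a file (the model name and `custom_id` values are placeholders, not from this commit):

```python
import json

# Hypothetical batch input: one JSON object per line, each a single
# /v1/chat/completions request in the OpenAI batch input format.
requests = [
    {
        "custom_id": f"request-{i}",  # caller-chosen ID used to correlate results
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "YOUR_MODEL",  # placeholder; pick your inference model
            "messages": [{"role": "user", "content": f"Say hello #{i}"}],
        },
    }
    for i in range(3)
]

# Serialize to the JSONL payload that would be uploaded with purpose="batch".
jsonl = "\n".join(json.dumps(r) for r in requests)

# Basic per-line validation, mirroring the kind of request validation the
# reference provider performs on batch input.
for line in jsonl.splitlines():
    req = json.loads(line)
    assert req["method"] == "POST"
    assert req["url"] == "/v1/chat/completions"
    assert "custom_id" in req and "body" in req
```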
1 parent 46ff302 commit de69216

26 files changed: +2707 −2 lines changed


docs/_static/llama-stack-spec.html

Lines changed: 4 additions & 2 deletions
```diff
@@ -14767,7 +14767,8 @@
       "OpenAIFilePurpose": {
         "type": "string",
         "enum": [
-          "assistants"
+          "assistants",
+          "batch"
         ],
         "title": "OpenAIFilePurpose",
         "description": "Valid purpose values for OpenAI Files API."
@@ -14844,7 +14845,8 @@
       "purpose": {
         "type": "string",
         "enum": [
-          "assistants"
+          "assistants",
+          "batch"
         ],
         "description": "The intended purpose of the file"
       }
```

docs/_static/llama-stack-spec.yaml

Lines changed: 2 additions & 0 deletions
```diff
@@ -10951,6 +10951,7 @@ components:
       type: string
       enum:
       - assistants
+      - batch
       title: OpenAIFilePurpose
       description: >-
         Valid purpose values for OpenAI Files API.
@@ -11019,6 +11020,7 @@ components:
       type: string
       enum:
       - assistants
+      - batch
       description: The intended purpose of the file
     additionalProperties: false
     required:
```

docs/source/concepts/apis.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -18,3 +18,4 @@ We are working on adding a few more APIs to complete the application lifecycle.
 - **Batch Inference**: run inference on a dataset of inputs
 - **Batch Agents**: run agents on a dataset of inputs
 - **Synthetic Data Generation**: generate synthetic data for model development
+- **Batches**: OpenAI-compatible batch management for inference
```

docs/source/providers/agents/index.md

Lines changed: 9 additions & 0 deletions
```diff
@@ -2,6 +2,15 @@

 ## Overview

+Agents API for creating and interacting with agentic systems.
+
+Main functionalities provided by this API:
+- Create agents with specific instructions and ability to use tools.
+- Interactions with agents are grouped into sessions ("threads"), and each interaction is called a "turn".
+- Agents can be provided with various tools (see the ToolGroups and ToolRuntime APIs for more details).
+- Agents can be provided with various shields (see the Safety API for more details).
+- Agents can also use Memory to retrieve information from knowledge bases. See the RAG Tool and Vector IO APIs for more details.
+
 This section contains documentation for all available providers for the **agents** API.

 ## Providers
```
Lines changed: 21 additions & 0 deletions
New file (21 lines added):

````markdown
# Batches

## Overview

Protocol for batch processing API operations.

The Batches API enables efficient processing of multiple requests in a single operation,
particularly useful for processing large datasets, batch evaluation workflows, and
cost-effective inference at scale.

Note: This API is currently under active development and may undergo changes.

This section contains documentation for all available providers for the **batches** API.

## Providers

```{toctree}
:maxdepth: 1

inline_reference
```
````
Lines changed: 23 additions & 0 deletions
New file (23 lines added):

````markdown
# inline::reference

## Description

Reference implementation of batches API with KVStore persistence.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Configuration for the key-value store backend. |
| `max_concurrent_batches` | `<class 'int'>` | No | 1 | Maximum number of concurrent batches to process simultaneously. |
| `max_concurrent_requests_per_batch` | `<class 'int'>` | No | 10 | Maximum number of concurrent requests to process per batch. |

## Sample Configuration

```yaml
kvstore:
  type: sqlite
  db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/batches.db
```
````
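The two concurrency options above can be illustrated with asyncio semaphores. This is a minimal sketch of the intended semantics (a batch-level gate plus a per-batch request gate), assuming the provider bounds concurrency this way; function and variable names are hypothetical, not the provider's actual code:

```python
import asyncio

# Sketch of the two concurrency knobs: max_concurrent_batches gates how many
# batches run at once, max_concurrent_requests_per_batch gates fan-out inside
# each batch. Values mirror the defaults in the config table above.
MAX_CONCURRENT_BATCHES = 1
MAX_CONCURRENT_REQUESTS_PER_BATCH = 10


async def process_batch(batch_id: str, requests: list[str], batch_slots: asyncio.Semaphore) -> list[str]:
    async with batch_slots:  # at most MAX_CONCURRENT_BATCHES batches at a time
        request_slots = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS_PER_BATCH)

        async def run_one(req: str) -> str:
            async with request_slots:  # bounded concurrency within the batch
                await asyncio.sleep(0)  # stand-in for the actual inference call
                return f"{batch_id}:{req}:done"

        return list(await asyncio.gather(*(run_one(r) for r in requests)))


async def main() -> list[str]:
    batch_slots = asyncio.Semaphore(MAX_CONCURRENT_BATCHES)
    per_batch = await asyncio.gather(
        process_batch("batch_a", [f"r{i}" for i in range(4)], batch_slots),
        process_batch("batch_b", [f"r{i}" for i in range(2)], batch_slots),
    )
    return [item for batch in per_batch for item in batch]


results = asyncio.run(main())
```

With `MAX_CONCURRENT_BATCHES = 1`, the second batch waits for the first to release its slot, while up to ten of a batch's requests run concurrently.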

docs/source/providers/eval/index.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -2,6 +2,8 @@

 ## Overview

+Llama Stack Evaluation API for running evaluations on model and agent candidates.
+
 This section contains documentation for all available providers for the **eval** API.

 ## Providers
```

docs/source/providers/inference/index.md

Lines changed: 6 additions & 0 deletions
```diff
@@ -2,6 +2,12 @@

 ## Overview

+Llama Stack Inference API for generating completions, chat completions, and embeddings.
+
+This API provides the raw interface to the underlying models. Two kinds of models are supported:
+- LLM models: these models generate "raw" and "chat" (conversational) completions.
+- Embedding models: these models generate embeddings to be used for semantic search.
+
 This section contains documentation for all available providers for the **inference** API.

 ## Providers
```
Lines changed: 9 additions & 0 deletions
New file (9 lines added):

```python
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.

from .batches import Batches, BatchObject, ListBatchesResponse

__all__ = ["Batches", "BatchObject", "ListBatchesResponse"]
```
Lines changed: 89 additions & 0 deletions
New file (89 lines added):

```python
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.

from typing import Literal, Protocol, runtime_checkable

from pydantic import BaseModel, Field

from llama_stack.schema_utils import json_schema_type, webmethod

try:
    from openai.types import Batch as BatchObject
except ImportError as e:
    raise ImportError("OpenAI package is required for batches API. Please install it with: pip install openai") from e


@json_schema_type
class ListBatchesResponse(BaseModel):
    """Response containing a list of batch objects."""

    object: Literal["list"] = "list"
    data: list[BatchObject] = Field(..., description="List of batch objects")
    first_id: str | None = Field(default=None, description="ID of the first batch in the list")
    last_id: str | None = Field(default=None, description="ID of the last batch in the list")
    has_more: bool = Field(default=False, description="Whether there are more batches available")


@runtime_checkable
class Batches(Protocol):
    """Protocol for batch processing API operations.

    The Batches API enables efficient processing of multiple requests in a single operation,
    particularly useful for processing large datasets, batch evaluation workflows, and
    cost-effective inference at scale.

    Note: This API is currently under active development and may undergo changes.
    """

    @webmethod(route="/openai/v1/batches", method="POST")
    async def create_batch(
        self,
        input_file_id: str,
        endpoint: str,
        completion_window: Literal["24h"],
        metadata: dict[str, str] | None = None,
    ) -> BatchObject:
        """Create a new batch for processing multiple API requests.

        :param input_file_id: The ID of an uploaded file containing requests for the batch.
        :param endpoint: The endpoint to be used for all requests in the batch.
        :param completion_window: The time window within which the batch should be processed.
        :param metadata: Optional metadata for the batch.
        :returns: The created batch object.
        """
        ...

    @webmethod(route="/openai/v1/batches/{batch_id}", method="GET")
    async def retrieve_batch(self, batch_id: str) -> BatchObject:
        """Retrieve information about a specific batch.

        :param batch_id: The ID of the batch to retrieve.
        :returns: The batch object.
        """
        ...

    @webmethod(route="/openai/v1/batches/{batch_id}/cancel", method="POST")
    async def cancel_batch(self, batch_id: str) -> BatchObject:
        """Cancel a batch that is in progress.

        :param batch_id: The ID of the batch to cancel.
        :returns: The updated batch object.
        """
        ...

    @webmethod(route="/openai/v1/batches", method="GET")
    async def list_batches(
        self,
        after: str | None = None,
        limit: int = 20,
    ) -> ListBatchesResponse:
        """List all batches for the current user.

        :param after: A cursor for pagination; returns batches after this batch ID.
        :param limit: Number of batches to return (default 20, max 100).
        :returns: A list of batch objects.
        """
        ...
```
