Commit f520e24
feat: Add S3 Files Provider (#3202)
Implements a complete S3-based file storage provider for Llama Stack.

Core Implementation:
- `S3FilesImpl` class with full OpenAI Files API compatibility
- Support for file upload, download, listing, and deletion operations
- SQLite-based metadata storage for fast queries and API compliance
- Configurable S3 endpoints (AWS, MinIO, LocalStack support)

Key Features:
- Automatic S3 bucket creation and management
- Metadata persistence
- Proper error handling for S3 connectivity and permissions

Dependencies:
- Adds `boto3` for AWS S3 integration
- Adds `moto[s3]` for testing infrastructure

Testing:

Unit: `./scripts/unit-tests.sh tests/unit/files tests/unit/providers/files`

Integration:

Start MinIO: `podman run --rm -it -p 9000:9000 minio/minio server /data`

Start the stack with the S3 provider: `S3_ENDPOINT_URL=http://localhost:9000 AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin S3_BUCKET_NAME=llama-stack-files uv run llama stack build --image-type venv --providers files=remote::s3 --run`

Run the integration tests: `./scripts/integration-tests.sh --stack-config http://localhost:8321 --provider ollama --test-subdirs files`
1 parent c5e2e26 commit f520e24

File tree

11 files changed: +982 −2 lines

docs/source/providers/files/index.md

Lines changed: 1 addition & 0 deletions

````diff
@@ -10,4 +10,5 @@ This section contains documentation for all available providers for the **files**
 :maxdepth: 1
 
 inline_localfs
+remote_s3
 ```
````
Lines changed: 33 additions & 0 deletions

# remote::s3

## Description

AWS S3-based file storage provider for scalable cloud file management with metadata persistence.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `bucket_name` | `str` | Yes | | S3 bucket name to store files |
| `region` | `str` | No | us-east-1 | AWS region where the bucket is located |
| `aws_access_key_id` | `str \| None` | No | | AWS access key ID (optional if using IAM roles) |
| `aws_secret_access_key` | `str \| None` | No | | AWS secret access key (optional if using IAM roles) |
| `endpoint_url` | `str \| None` | No | | Custom S3 endpoint URL (for MinIO, LocalStack, etc.) |
| `auto_create_bucket` | `bool` | No | False | Automatically create the S3 bucket if it doesn't exist |
| `metadata_store` | `SqliteSqlStoreConfig \| PostgresSqlStoreConfig` | No | sqlite | SQL store configuration for file metadata |

## Sample Configuration

```yaml
bucket_name: ${env.S3_BUCKET_NAME}
region: ${env.AWS_REGION:=us-east-1}
aws_access_key_id: ${env.AWS_ACCESS_KEY_ID:=}
aws_secret_access_key: ${env.AWS_SECRET_ACCESS_KEY:=}
endpoint_url: ${env.S3_ENDPOINT_URL:=}
auto_create_bucket: ${env.S3_AUTO_CREATE_BUCKET:=false}
metadata_store:
  type: sqlite
  db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/s3_files_metadata.db
```

llama_stack/providers/registry/files.py

Lines changed: 12 additions & 0 deletions

```diff
@@ -5,9 +5,11 @@
 # the root directory of this source tree.
 
 from llama_stack.providers.datatypes import (
+    AdapterSpec,
     Api,
     InlineProviderSpec,
     ProviderSpec,
+    remote_provider_spec,
 )
 from llama_stack.providers.utils.sqlstore.sqlstore import sql_store_pip_packages
 
@@ -23,4 +25,14 @@ def available_providers() -> list[ProviderSpec]:
             config_class="llama_stack.providers.inline.files.localfs.config.LocalfsFilesImplConfig",
             description="Local filesystem-based file storage provider for managing files and documents locally.",
         ),
+        remote_provider_spec(
+            api=Api.files,
+            adapter=AdapterSpec(
+                adapter_type="s3",
+                pip_packages=["boto3"] + sql_store_pip_packages,
+                module="llama_stack.providers.remote.files.s3",
+                config_class="llama_stack.providers.remote.files.s3.config.S3FilesImplConfig",
+                description="AWS S3-based file storage provider for scalable cloud file management with metadata persistence.",
+            ),
+        ),
     ]
```
Lines changed: 237 additions & 0 deletions

# S3 Files Provider

A remote S3-based implementation of the Llama Stack Files API that provides scalable cloud file storage with metadata persistence.

## Features

- **AWS S3 Storage**: Store files in AWS S3 buckets for scalable, durable storage
- **Metadata Management**: Uses a SQL database for efficient file metadata queries
- **OpenAI API Compatibility**: Full compatibility with the OpenAI Files API endpoints (see the sketch after this list)
- **Flexible Authentication**: Support for IAM roles and access keys
- **Custom S3 Endpoints**: Support for MinIO and other S3-compatible services
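Because the provider implements the OpenAI Files API, the standard `openai` Python client can exercise it once a stack is running. The following is a minimal sketch, not part of this commit; the base URL, API key, and local file name are assumptions (the exact OpenAI-compatible base path can vary by Llama Stack version):

```python
from openai import OpenAI

# Illustrative: point the standard OpenAI client at a running Llama Stack
# server. Adjust base_url to your deployment's OpenAI-compatible route.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="not-needed")

# Upload a file; the bytes land in the configured S3 bucket.
with open("report.pdf", "rb") as f:
    uploaded = client.files.create(file=f, purpose="assistants")
print(uploaded.id, uploaded.filename, uploaded.bytes)

# Listing, download, and deletion behave as they do against api.openai.com.
for item in client.files.list():
    print(item.id, item.purpose)

content = client.files.content(uploaded.id)  # bytes served back from S3
client.files.delete(uploaded.id)             # removes object and metadata row
```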
## Configuration

### Basic Configuration

```yaml
api: files
provider_type: remote::s3
config:
  bucket_name: my-llama-stack-files
  region: us-east-1
  metadata_store:
    type: sqlite
    db_path: ./s3_files_metadata.db
```

### Advanced Configuration

```yaml
api: files
provider_type: remote::s3
config:
  bucket_name: my-llama-stack-files
  region: us-east-1
  aws_access_key_id: YOUR_ACCESS_KEY
  aws_secret_access_key: YOUR_SECRET_KEY
  endpoint_url: https://s3.amazonaws.com  # Optional for custom endpoints
  metadata_store:
    type: sqlite
    db_path: ./s3_files_metadata.db
```

### Environment Variables

The configuration supports environment variable substitution:

```yaml
config:
  bucket_name: "${env.S3_BUCKET_NAME}"
  region: "${env.AWS_REGION:=us-east-1}"
  aws_access_key_id: "${env.AWS_ACCESS_KEY_ID:=}"
  aws_secret_access_key: "${env.AWS_SECRET_ACCESS_KEY:=}"
  endpoint_url: "${env.S3_ENDPOINT_URL:=}"
```

Note: `S3_BUCKET_NAME` has no default value since S3 bucket names must be globally unique.
## Authentication

### IAM Roles (Recommended)

For production deployments, use IAM roles:

```yaml
config:
  bucket_name: my-bucket
  region: us-east-1
  # No credentials needed - will use IAM role
```

### Access Keys

For development or specific use cases:

```yaml
config:
  bucket_name: my-bucket
  region: us-east-1
  aws_access_key_id: AKIAIOSFODNN7EXAMPLE
  aws_secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```
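The way these two modes coexist is standard boto3 behavior: explicit keys win, and when they are omitted boto3 falls back to its default credential chain (environment variables, shared AWS config, or an attached IAM role). A sketch of that resolution, using the real `S3FilesImplConfig` field names but with the function itself being illustrative rather than the provider's actual code:

```python
import boto3

def make_s3_client(cfg):
    """Illustrative: build an S3 client from an S3FilesImplConfig-like object."""
    kwargs = {"region_name": cfg.region}
    if cfg.endpoint_url:                      # MinIO / LocalStack endpoints
        kwargs["endpoint_url"] = cfg.endpoint_url
    if cfg.aws_access_key_id and cfg.aws_secret_access_key:
        kwargs["aws_access_key_id"] = cfg.aws_access_key_id
        kwargs["aws_secret_access_key"] = cfg.aws_secret_access_key
    # With no explicit keys, boto3 uses its default credential chain
    # (env vars, ~/.aws config, or the instance's IAM role).
    return boto3.client("s3", **kwargs)
```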
## S3 Bucket Setup

### Required Permissions

The S3 provider requires the following permissions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
```
### Automatic Bucket Creation

By default, the S3 provider expects the bucket to already exist. If you want the provider to automatically create the bucket when it doesn't exist, set `auto_create_bucket: true` in your configuration:

```yaml
config:
  bucket_name: my-bucket
  auto_create_bucket: true  # Will create bucket if it doesn't exist
  region: us-east-1
```

**Note**: When `auto_create_bucket` is enabled, the provider will need additional permissions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:CreateBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
```
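The usual boto3 pattern for this check-then-create step looks like the sketch below (illustrative, not necessarily the provider's exact code). Note that regions other than `us-east-1` require a `CreateBucketConfiguration`:

```python
import boto3
from botocore.exceptions import ClientError

def ensure_bucket(s3, bucket: str, region: str) -> None:
    """Create `bucket` if it does not already exist (check-then-create)."""
    try:
        s3.head_bucket(Bucket=bucket)  # cheap existence/permission probe
        return
    except ClientError as e:
        if e.response["Error"]["Code"] not in ("404", "NoSuchBucket"):
            raise  # a 403 etc. is an access problem, not a missing bucket
    if region == "us-east-1":
        s3.create_bucket(Bucket=bucket)  # us-east-1 forbids LocationConstraint
    else:
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
```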
### Bucket Policy (Optional)

For additional security, you can add a bucket policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "LlamaStackAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::YOUR-ACCOUNT:role/LlamaStackRole"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    },
    {
      "Sid": "LlamaStackBucketAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::YOUR-ACCOUNT:role/LlamaStackRole"
      },
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": "arn:aws:s3:::your-bucket-name"
    }
  ]
}
```
## Implementation Details

### Metadata Persistence

File metadata is stored in a SQL database for fast queries and OpenAI API compatibility. The metadata includes:

- File ID
- Original filename
- Purpose (assistants, batch, etc.)
- File size in bytes
- Created and expiration timestamps

### TTL and Cleanup

Files currently have a fixed, long expiration time (100 years).
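These fields surface directly on the OpenAI file object, so a retrieval round-trip shows what the metadata store holds. A small sketch reusing `client` and `uploaded` from the earlier example (field names follow the OpenAI Files API; `expires_at` availability may vary by SDK version):

```python
f = client.files.retrieve(uploaded.id)
print(f.id)          # file ID (the metadata store's primary key)
print(f.filename)    # original filename
print(f.purpose)     # "assistants", "batch", ...
print(f.bytes)       # file size in bytes
print(f.created_at)  # creation timestamp (Unix seconds)
# expiration timestamp; currently ~100 years out (see "TTL and Cleanup")
print(getattr(f, "expires_at", None))
```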
## Development and Testing

### Using MinIO

For self-hosted S3-compatible storage:

```yaml
config:
  bucket_name: test-bucket
  region: us-east-1
  endpoint_url: http://localhost:9000
  aws_access_key_id: minioadmin
  aws_secret_access_key: minioadmin
```

## Monitoring and Logging

The provider logs important operations and errors. For production deployments, consider:

- CloudWatch monitoring for S3 operations
- Custom metrics for file upload/download rates
- Error rate monitoring
- Performance metrics tracking
## Error Handling

The provider handles various error scenarios:

- S3 connectivity issues
- Bucket access permissions
- File not found errors
- Metadata consistency checks
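A typical shape for this handling, sketched with boto3 (illustrative; the provider's actual error types and messages may differ):

```python
from botocore.exceptions import ClientError, EndpointConnectionError

def download(s3, bucket: str, key: str) -> bytes:
    """Illustrative: translate common S3 failures into clearer errors."""
    try:
        return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    except EndpointConnectionError as e:
        raise RuntimeError(f"cannot reach S3 endpoint: {e}") from e
    except ClientError as e:
        code = e.response["Error"]["Code"]
        if code == "NoSuchKey":
            raise FileNotFoundError(f"{key} not found in {bucket}") from e
        if code in ("AccessDenied", "403"):
            raise PermissionError(f"access denied for bucket {bucket}") from e
        raise
```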
## Known Limitations

- Fixed, long TTL (100 years) instead of configurable expiration
- No server-side encryption enabled by default
- No support for AWS session tokens
- No S3 key prefix organization support
- No multipart upload support (all files uploaded as single objects)
Lines changed: 20 additions & 0 deletions

```python
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.

from typing import Any

from llama_stack.core.datatypes import Api

from .config import S3FilesImplConfig


async def get_adapter_impl(config: S3FilesImplConfig, deps: dict[Api, Any]):
    from .files import S3FilesImpl

    # TODO: authorization policies and user separation
    impl = S3FilesImpl(config)
    await impl.initialize()
    return impl
```
Lines changed: 42 additions & 0 deletions

```python
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.

from typing import Any

from pydantic import BaseModel, Field

from llama_stack.providers.utils.sqlstore.sqlstore import SqliteSqlStoreConfig, SqlStoreConfig


class S3FilesImplConfig(BaseModel):
    """Configuration for S3-based files provider."""

    bucket_name: str = Field(description="S3 bucket name to store files")
    region: str = Field(default="us-east-1", description="AWS region where the bucket is located")
    aws_access_key_id: str | None = Field(default=None, description="AWS access key ID (optional if using IAM roles)")
    aws_secret_access_key: str | None = Field(
        default=None, description="AWS secret access key (optional if using IAM roles)"
    )
    endpoint_url: str | None = Field(default=None, description="Custom S3 endpoint URL (for MinIO, LocalStack, etc.)")
    auto_create_bucket: bool = Field(
        default=False, description="Automatically create the S3 bucket if it doesn't exist"
    )
    metadata_store: SqlStoreConfig = Field(description="SQL store configuration for file metadata")

    @classmethod
    def sample_run_config(cls, __distro_dir__: str) -> dict[str, Any]:
        return {
            "bucket_name": "${env.S3_BUCKET_NAME}",  # no default, buckets must be globally unique
            "region": "${env.AWS_REGION:=us-east-1}",
            "aws_access_key_id": "${env.AWS_ACCESS_KEY_ID:=}",
            "aws_secret_access_key": "${env.AWS_SECRET_ACCESS_KEY:=}",
            "endpoint_url": "${env.S3_ENDPOINT_URL:=}",
            "auto_create_bucket": "${env.S3_AUTO_CREATE_BUCKET:=false}",
            "metadata_store": SqliteSqlStoreConfig.sample_run_config(
                __distro_dir__=__distro_dir__,
                db_name="s3_files_metadata.db",
            ),
        }
```
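For local testing against MinIO, this config can also be built directly in Python. A minimal sketch, not part of the commit; the `db_path` field name on `SqliteSqlStoreConfig` is inferred from the YAML samples above:

```python
from llama_stack.providers.remote.files.s3.config import S3FilesImplConfig
from llama_stack.providers.utils.sqlstore.sqlstore import SqliteSqlStoreConfig

config = S3FilesImplConfig(
    bucket_name="llama-stack-files",
    region="us-east-1",
    endpoint_url="http://localhost:9000",  # MinIO from the README example
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
    auto_create_bucket=True,               # fine for throwaway test buckets
    metadata_store=SqliteSqlStoreConfig(db_path="./s3_files_metadata.db"),
)
# get_adapter_impl(config, deps={}) would then build and initialize S3FilesImpl.
```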
