|
| 1 | +# S3 Files Provider |
| 2 | + |
| 3 | +A remote S3-based implementation of the Llama Stack Files API that provides scalable cloud file storage with metadata persistence. |
| 4 | + |
| 5 | +## Features |
| 6 | + |
| 7 | +- **AWS S3 Storage**: Store files in AWS S3 buckets for scalable, durable storage |
| 8 | +- **Metadata Management**: Uses a SQL database for efficient file metadata queries |
| 9 | +- **OpenAI API Compatibility**: Full compatibility with OpenAI Files API endpoints |
| 10 | +- **Flexible Authentication**: Support for IAM roles and access keys |
| 11 | +- **Custom S3 Endpoints**: Support for MinIO and other S3-compatible services |
| 12 | + |
| 13 | +## Configuration |
| 14 | + |
| 15 | +### Basic Configuration |
| 16 | + |
| 17 | +```yaml |
| 18 | +api: files |
| 19 | +provider_type: remote::s3 |
| 20 | +config: |
| 21 | +  bucket_name: my-llama-stack-files |
| 22 | +  region: us-east-1 |
| 23 | +  metadata_store: |
| 24 | +    type: sqlite |
| 25 | +    db_path: ./s3_files_metadata.db |
| 26 | +``` |
| 27 | + |
| 28 | +### Advanced Configuration |
| 29 | + |
| 30 | +```yaml |
| 31 | +api: files |
| 32 | +provider_type: remote::s3 |
| 33 | +config: |
| 34 | +  bucket_name: my-llama-stack-files |
| 35 | +  region: us-east-1 |
| 36 | +  aws_access_key_id: YOUR_ACCESS_KEY |
| 37 | +  aws_secret_access_key: YOUR_SECRET_KEY |
| 38 | +  endpoint_url: https://s3.amazonaws.com  # Optional; set only for custom or S3-compatible endpoints |
| 39 | +  metadata_store: |
| 40 | +    type: sqlite |
| 41 | +    db_path: ./s3_files_metadata.db |
| 42 | +``` |
| 43 | + |
| 44 | +### Environment Variables |
| 45 | + |
| 46 | +The configuration supports environment variable substitution, with an optional default value after `:=`: |
| 47 | + |
| 48 | +```yaml |
| 49 | +config: |
| 50 | +  bucket_name: "${env.S3_BUCKET_NAME}" |
| 51 | +  region: "${env.AWS_REGION:=us-east-1}" |
| 52 | +  aws_access_key_id: "${env.AWS_ACCESS_KEY_ID:=}" |
| 53 | +  aws_secret_access_key: "${env.AWS_SECRET_ACCESS_KEY:=}" |
| 54 | +  endpoint_url: "${env.S3_ENDPOINT_URL:=}" |
| 55 | +``` |
| 56 | + |
| 57 | +Note: `S3_BUCKET_NAME` has no default value since S3 bucket names must be globally unique. |
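
To make the placeholder behaviour concrete, here is a minimal Python sketch of how `${env.NAME}` and `${env.NAME:=default}` could be resolved. It is not the Llama Stack implementation: the function name `resolve_env`, the regular expression, and the choice to raise `KeyError` for a missing variable without a default are all assumptions made for illustration.

```python
import re

# Matches ${env.NAME} and ${env.NAME:=default}. The grammar actually accepted
# by Llama Stack may be broader; this pattern is an illustrative assumption.
_ENV_PATTERN = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)(?::=([^}]*))?\}")


def resolve_env(value: str, environ: dict) -> str:
    """Replace each ${env.NAME[:=default]} placeholder found in `value`."""

    def _substitute(match):
        name, default = match.group(1), match.group(2)
        if name in environ:
            return environ[name]
        if default is not None:
            # The default may be the empty string, as in "${env.S3_ENDPOINT_URL:=}".
            return default
        # Hypothetical behaviour: a placeholder without a default is required.
        raise KeyError(f"environment variable {name} is required but not set")

    return _ENV_PATTERN.sub(_substitute, value)
```

Under these assumptions, `resolve_env("${env.AWS_REGION:=us-east-1}", {})` falls back to `us-east-1`, while `resolve_env("${env.S3_BUCKET_NAME}", {})` raises because no default is given.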
| 58 | + |
| 59 | +## Authentication |
| 60 | + |
| 61 | +### IAM Roles (Recommended) |
| 62 | + |
| 63 | +For production deployments, use IAM roles: |
| 64 | + |
| 65 | +```yaml |
| 66 | +config: |
| 67 | +  bucket_name: my-bucket |
| 68 | +  region: us-east-1 |
| 69 | +  # No credentials needed - will use IAM role |
| 70 | +``` |
| 71 | + |
| 72 | +### Access Keys |
| 73 | + |
| 74 | +For development or specific use cases: |
| 75 | + |
| 76 | +```yaml |
| 77 | +config: |
| 78 | +  bucket_name: my-bucket |
| 79 | +  region: us-east-1 |
| 80 | +  aws_access_key_id: AKIAIOSFODNN7EXAMPLE |
| 81 | +  aws_secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY |
| 82 | +``` |
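
The two authentication modes differ only in whether explicit credentials are passed to the S3 client. The sketch below shows one way the config keys could be mapped to keyword arguments for `boto3.client("s3", ...)`. The helper name `s3_client_kwargs` and the `us-east-1` fallback are assumptions, and the provider's real code may differ. Empty or missing values are dropped so that boto3 falls back to its default credential chain, which is what picks up an IAM role.

```python
def s3_client_kwargs(config: dict) -> dict:
    """Build keyword arguments for boto3.client("s3", **kwargs) from provider config.

    Hypothetical helper for illustration; not part of the provider's API.
    """
    # region_name, endpoint_url, aws_access_key_id and aws_secret_access_key
    # are keyword arguments that boto3.client accepts.
    kwargs = {"region_name": config.get("region", "us-east-1")}
    for key in ("endpoint_url", "aws_access_key_id", "aws_secret_access_key"):
        value = config.get(key)
        if value:  # skip None and "" so boto3 uses its default credential chain
            kwargs[key] = value
    return kwargs
```

With boto3 installed, a client could then be created with `boto3.client("s3", **s3_client_kwargs(config))`.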
| 83 | + |
| 84 | +## S3 Bucket Setup |
| 85 | + |
| 86 | +### Required Permissions |
| 87 | + |
| 88 | +The S3 provider requires the following permissions: |
| 89 | + |
| 90 | +```json |
| 91 | +{ |
| 92 | + "Version": "2012-10-17", |
| 93 | + "Statement": [ |
| 94 | + { |
| 95 | + "Effect": "Allow", |
| 96 | + "Action": [ |
| 97 | + "s3:GetObject", |
| 98 | + "s3:PutObject", |
| 99 | + "s3:DeleteObject", |
| 100 | + "s3:ListBucket" |
| 101 | + ], |
| 102 | + "Resource": [ |
| 103 | + "arn:aws:s3:::your-bucket-name", |
| 104 | + "arn:aws:s3:::your-bucket-name/*" |
| 105 | + ] |
| 106 | + } |
| 107 | + ] |
| 108 | +} |
| 109 | +``` |
| 110 | + |
| 111 | +### Automatic Bucket Creation |
| 112 | + |
| 113 | +By default, the S3 provider expects the bucket to already exist. If you want the provider to automatically create the bucket when it doesn't exist, set `auto_create_bucket: true` in your configuration: |
| 114 | + |
| 115 | +```yaml |
| 116 | +config: |
| 117 | +  bucket_name: my-bucket |
| 118 | +  auto_create_bucket: true  # Will create bucket if it doesn't exist |
| 119 | +  region: us-east-1 |
| 120 | +``` |
| 121 | + |
| 122 | +**Note**: When `auto_create_bucket` is enabled, the provider also needs the `s3:CreateBucket` permission: |
| 123 | + |
| 124 | +```json |
| 125 | +{ |
| 126 | + "Version": "2012-10-17", |
| 127 | + "Statement": [ |
| 128 | + { |
| 129 | + "Effect": "Allow", |
| 130 | + "Action": [ |
| 131 | + "s3:GetObject", |
| 132 | + "s3:PutObject", |
| 133 | + "s3:DeleteObject", |
| 134 | + "s3:ListBucket", |
| 135 | + "s3:CreateBucket" |
| 136 | + ], |
| 137 | + "Resource": [ |
| 138 | + "arn:aws:s3:::your-bucket-name", |
| 139 | + "arn:aws:s3:::your-bucket-name/*" |
| 140 | + ] |
| 141 | + } |
| 142 | + ] |
| 143 | +} |
| 144 | +``` |
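
The two IAM policies above differ only in the `s3:CreateBucket` action. As a convenience, a small Python sketch can generate either variant for a given bucket. The function name `build_policy` and its parameters are hypothetical and not part of the provider.

```python
import json

# Actions listed under "Required Permissions" above.
BASE_ACTIONS = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"]


def build_policy(bucket_name: str, auto_create_bucket: bool = False) -> dict:
    """Return an IAM policy document matching the examples in this README."""
    actions = list(BASE_ACTIONS)
    if auto_create_bucket:
        actions.append("s3:CreateBucket")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": actions,
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # bucket-level actions
                    f"arn:aws:s3:::{bucket_name}/*",  # object-level actions
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy("your-bucket-name", auto_create_bucket=True), indent=2))
```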
| 145 | + |
| 146 | +### Bucket Policy (Optional) |
| 147 | + |
| 148 | +For additional security, you can add a bucket policy: |
| 149 | + |
| 150 | +```json |
| 151 | +{ |
| 152 | + "Version": "2012-10-17", |
| 153 | + "Statement": [ |
| 154 | + { |
| 155 | + "Sid": "LlamaStackAccess", |
| 156 | + "Effect": "Allow", |
| 157 | + "Principal": { |
| 158 | + "AWS": "arn:aws:iam::YOUR-ACCOUNT:role/LlamaStackRole" |
| 159 | + }, |
| 160 | + "Action": [ |
| 161 | + "s3:GetObject", |
| 162 | + "s3:PutObject", |
| 163 | + "s3:DeleteObject" |
| 164 | + ], |
| 165 | + "Resource": "arn:aws:s3:::your-bucket-name/*" |
| 166 | + }, |
| 167 | + { |
| 168 | + "Sid": "LlamaStackBucketAccess", |
| 169 | + "Effect": "Allow", |
| 170 | + "Principal": { |
| 171 | + "AWS": "arn:aws:iam::YOUR-ACCOUNT:role/LlamaStackRole" |
| 172 | + }, |
| 173 | + "Action": [ |
| 174 | + "s3:ListBucket" |
| 175 | + ], |
| 176 | + "Resource": "arn:aws:s3:::your-bucket-name" |
| 177 | + } |
| 178 | + ] |
| 179 | +} |
| 180 | +``` |
| 181 | + |
| 182 | +## Implementation Notes |
| 183 | + |
| 184 | +### Metadata Persistence |
| 185 | + |
| 186 | +File metadata is stored in a SQL database for fast queries and OpenAI API compatibility. The metadata includes: |
| 187 | + |
| 188 | +- File ID |
| 189 | +- Original filename |
| 190 | +- Purpose (assistants, batch, etc.) |
| 191 | +- File size in bytes |
| 192 | +- Created and expiration timestamps |
| 193 | + |
| 194 | +### TTL and Cleanup |
| 195 | + |
| 196 | +Files currently get a fixed, very long expiration time (100 years); the TTL is not yet configurable. |
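
The metadata fields and the fixed expiration described above could be modelled roughly as follows. This is a sketch only: the table name, column names, ID format, and the 365-day year used to approximate 100 years are assumptions, not the provider's actual schema.

```python
import sqlite3
import time
import uuid

# Approximation of the fixed 100-year TTL; ignores leap days (assumption).
HUNDRED_YEARS_SECONDS = 100 * 365 * 24 * 60 * 60

# Hypothetical schema covering the metadata fields listed above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    id TEXT PRIMARY KEY,
    filename TEXT NOT NULL,
    purpose TEXT NOT NULL,
    size_bytes INTEGER NOT NULL,
    created_at INTEGER NOT NULL,
    expires_at INTEGER NOT NULL
)
"""


def record_file(conn, filename, purpose, size_bytes, now=None):
    """Insert one metadata row and return the generated file ID."""
    created_at = int(time.time()) if now is None else now
    file_id = f"file-{uuid.uuid4().hex}"  # illustrative ID format
    conn.execute(
        "INSERT INTO files VALUES (?, ?, ?, ?, ?, ?)",
        (file_id, filename, purpose, size_bytes, created_at,
         created_at + HUNDRED_YEARS_SECONDS),
    )
    conn.commit()
    return file_id
```

For a quick experiment, open an in-memory database with `sqlite3.connect(":memory:")`, run `conn.execute(SCHEMA)`, and call `record_file(conn, "notes.txt", "assistants", 42)`.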
| 197 | + |
| 198 | +## Development and Testing |
| 199 | + |
| 200 | +### Using MinIO |
| 201 | + |
| 202 | +For self-hosted S3-compatible storage: |
| 203 | + |
| 204 | +```yaml |
| 205 | +config: |
| 206 | +  bucket_name: test-bucket |
| 207 | +  region: us-east-1 |
| 208 | +  endpoint_url: http://localhost:9000 |
| 209 | +  aws_access_key_id: minioadmin |
| 210 | +  aws_secret_access_key: minioadmin |
| 211 | +``` |
| 212 | + |
| 213 | +## Monitoring and Logging |
| 214 | + |
| 215 | +The provider logs important operations and errors. For production deployments, consider: |
| 216 | + |
| 217 | +- CloudWatch monitoring for S3 operations |
| 218 | +- Custom metrics for file upload/download rates |
| 219 | +- Error rate monitoring |
| 220 | +- Performance metrics tracking |
| 221 | + |
| 222 | +## Error Handling |
| 223 | + |
| 224 | +The provider handles various error scenarios: |
| 225 | + |
| 226 | +- S3 connectivity issues |
| 227 | +- Bucket access permissions |
| 228 | +- File not found errors |
| 229 | +- Metadata consistency checks |
| 230 | + |
| 231 | +## Known Limitations |
| 232 | + |
| 233 | +- Fixed long TTL (100 years) instead of configurable expiration |
| 234 | +- No server-side encryption enabled by default |
| 235 | +- No support for AWS session tokens |
| 236 | +- No S3 key prefix organization support |
| 237 | +- No multipart upload support (all files uploaded as single objects) |