An API service providing RESTful access to CSV and tabular data converted by Hydra: it exposes the PostgreSQL tables holding the converted data over HTTP, with querying, pagination, and data-streaming capabilities.

This service is mainly used, developed, and maintained by data.gouv.fr, the French open data platform.

The production API is deployed on data.gouv.fr infrastructure at https://tabular-api.data.gouv.fr/api. See the product documentation (in French) for usage details and the technical documentation for the API reference.
Prerequisites:

- Python >= 3.11, < 3.14
- uv for dependency management
- Docker & Docker Compose
- **Start the infrastructure**

  Start this project via docker compose:

  ```bash
  docker compose up
  ```

  This starts a PostgREST container and a PostgreSQL container filled with fake test data. You can access the raw PostgREST API at http://localhost:8080.
- **Launch the main API proxy**

  Install dependencies and start the proxy services:

  ```bash
  uv sync
  uv run adev runserver -p8005 api_tabular/app.py      # API for CSV files apified by udata-hydra
  uv run adev runserver -p8006 api_tabular/metrics.py  # API for udata's metrics
  ```

  The main API provides a controlled layer over PostgREST: exposing PostgREST directly would be too permissive, so this adds a security and access-control layer.
- **Test the API**

  Query the API using a `resource_id` (see the example below). Several test resources are available in the fake database:

  - `aaaaaaaa-1111-bbbb-2222-cccccccccccc` - Main test resource with 1000 rows
  - `aaaaaaaa-5555-bbbb-6666-cccccccccccc` - Resource with database indexes
  - `dddddddd-7777-eeee-8888-ffffffffffff` - Resource allowed for aggregation
  - `aaaaaaaa-9999-bbbb-1010-cccccccccccc` - Resource with indexes and aggregation allowed
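For a quick smoke test, assuming the proxy from the previous step is running on port 8005:

```bash
curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/
```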
To use the API with a real database served by Hydra instead of the fake test database:
- **Configure the PostgREST endpoint** to point to your Hydra database:

  ```bash
  export PGREST_ENDPOINT="http://your-hydra-postgrest:8080"
  ```

  Or create a `config.toml` file:

  ```toml
  PGREST_ENDPOINT = "http://your-hydra-postgrest:8080"
  ```
- **Start only the API services** (skip the fake database):

  ```bash
  uv sync
  uv run adev runserver -p8005 api_tabular/app.py
  uv run adev runserver -p8006 api_tabular/metrics.py
  ```
- **Use real resource IDs** from your Hydra database instead of the test IDs.
Note: Make sure your Hydra PostgREST instance is accessible and that the database schema matches the expected structure (tables in the `csvapi` schema).
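One way to sanity-check connectivity is to query the PostgREST root, which serves an OpenAPI description of the exposed tables (a quick check, assuming `PGREST_ENDPOINT` is exported as above):

```bash
curl "$PGREST_ENDPOINT/" | head
```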
**`GET /api/resources/{resource_id}/`**

Returns basic information about the resource, including creation date, URL, and available endpoints.

Example:

```bash
curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/
```

Response:

```json
{
"created_at": "2023-04-21T22:54:22.043492+00:00",
"url": "https://data.gouv.fr/datasets/example/resources/fake.csv",
"links": [
{
"href": "/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/profile/",
"type": "GET",
"rel": "profile"
},
{
"href": "/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/",
"type": "GET",
"rel": "data"
},
{
"href": "/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/swagger/",
"type": "GET",
"rel": "swagger"
}
]
}
```

**`GET /api/resources/{resource_id}/profile/`**

Returns the CSV profile information (column types, headers, etc.) generated by csv-detective.
Example:
```bash
curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/profile/
```

Response:

```json
{
"profile": {
"header": [
"id",
"score",
"decompte",
"is_true",
"birth",
"liste"
]
},
"...": "..."
}
```

**`GET /api/resources/{resource_id}/data/`**

Returns the actual data, with support for filtering, sorting, and pagination.
Example:
```bash
curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/
```

Response:

```json
{
"data": [
{
"__id": 1,
"id": " 8c7a6452-9295-4db2-b692-34104574fded",
"score": 0.708,
"decompte": 90,
"is_true": false,
"birth": "1949-07-16",
"liste": "[0]"
},
...
],
"links": {
"profile": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/profile/",
"swagger": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/swagger/",
"next": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?page=2&page_size=20",
"prev": null
},
"meta": {
"page": 1,
"page_size": 20,
"total": 1000
}
}
```

**`GET /api/resources/{resource_id}/data/csv/`**

Streams the data directly as a CSV file for download.
**`GET /api/resources/{resource_id}/data/json/`**

Streams the data directly as a JSON file for download.
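For example, to download a resource as local files (assuming the local setup above):

```bash
curl -o fake.csv  http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/csv/
curl -o fake.json http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/json/
```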
**`GET /api/resources/{resource_id}/swagger/`**

Returns OpenAPI/Swagger documentation specific to this resource.
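Since the `data` endpoint returns `links.next` and `links.prev`, a client can walk through a whole resource page by page. A minimal sketch using `curl` and `jq`, assuming the local setup above:

```bash
# Follow "next" links until the last page is reached
url="http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?page_size=50"
while [ "$url" != "null" ]; do
  page=$(curl -s "$url")
  echo "$page" | jq '.data[]'                # process the current page's rows
  url=$(echo "$page" | jq -r '.links.next')  # null on the last page
done
```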
The data endpoint can be queried with the following operators as query string parameters (replace `column_name` with the name of an actual column), if the column type allows it (see the swagger for each column's allowed parameters):

```
# exact value
column_name__exact=value
# differs
column_name__differs=value
# contains
column_name__contains=value
# notcontains (value does not contain)
column_name__notcontains=value
# in (value in list)
column_name__in=value1,value2,value3
# notin (value not in list)
column_name__notin=value1,value2,value3
# less
column_name__less=value
# greater
column_name__greater=value
# strictly less
column_name__strictly_less=value
# strictly greater
column_name__strictly_greater=value
# sort by column
column_name__sort=asc
column_name__sort=desc
```
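Operators can be combined in a single query; for instance, filtering and sorting together (an illustrative query against the test resource above):

```bash
curl "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?score__greater=0.9&birth__sort=asc"
```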
⚠️ WARNING: Aggregation requests are only available for resources that are listed in the `ALLOW_AGGREGATION` list of the config file, which can be seen at the `/api/aggregation-exceptions/` endpoint.
```
# group by values
column_name__groupby

# count values
column_name__count

# mean / average
column_name__avg

# minimum
column_name__min

# maximum
column_name__max

# sum
column_name__sum
```
Note: Passing an aggregation operator (`count`, `avg`, `min`, `max`, `sum`) returns a column named `<column_name>__<operator>` (for instance: `?birth__groupby&score__sum` will return a list of dicts with the keys `birth` and `score__sum`).
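As a concrete sketch, assuming a resource that is listed in `ALLOW_AGGREGATION` and has `birth` and `score` columns:

```bash
curl "http://localhost:8005/api/resources/dddddddd-7777-eeee-8888-ffffffffffff/data/?birth__groupby&score__sum"
```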
⚠️ WARNING: columns that contain JSON objects (see the `profile` to know which ones do) support neither filtering nor aggregation for now.
Other parameters:

```
page=1                   # Page number (default: 1)
page_size=20             # Items per page (default: 20, max: 50)
columns=col1,col2,col3   # Select specific columns only
```
Example with filters:

```bash
curl "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?score__greater=0.9&decompte__exact=13"
```

Returns:

```json
{
"data": [
{
"__id": 52,
"id": " 5174f26d-d62b-4adb-a43a-c3b6288fa2f6",
"score": 0.985,
"decompte": 13,
"is_true": false,
"birth": "1980-03-23",
"liste": "[0]"
},
{
"__id": 543,
"id": " 8705df7c-8a6a-49e2-9514-cf2fb532525e",
"score": 0.955,
"decompte": 13,
"is_true": true,
"birth": "1965-02-06",
"liste": "[0, 1, 2]"
}
],
"links": {
"profile": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/profile/",
"swagger": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/swagger/",
"next": null,
"prev": null
},
"meta": {
"page": 1,
"page_size": 20,
"total": 2
}
}
```

With filters and aggregators (filtering is always applied before aggregation, regardless of parameter order):
```bash
curl "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?decompte__groupby&birth__less=1996&score__avg"
```

i.e. `decompte` and the average of `score` for all rows where `birth <= "1996"`, grouped by `decompte`. Returns:

```json
{
"data": [
{
"decompte": 55,
"score__avg": 0.7123333333333334
},
{
"decompte": 27,
"score__avg": 0.6068888888888889
},
{
"decompte": 23,
"score__avg": 0.4603333333333334
},
...
]
}
```

With pagination:

```bash
curl "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?page=2&page_size=30"
```

With specific columns:

```bash
curl "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?columns=id,score,birth"
```

The metrics service provides similar functionality for system metrics:
```bash
# Get metrics data
curl http://localhost:8006/api/{model}/data/

# Get metrics as CSV
curl http://localhost:8006/api/{model}/data/csv/
```

Health checks:

```bash
# Main API health
curl http://localhost:8005/health/

# Metrics API health
curl http://localhost:8006/health/
```

Configuration is handled through TOML files and environment variables. The default configuration is in `api_tabular/config_default.toml`.
| Option | Default | Description |
|---|---|---|
| `PGREST_ENDPOINT` | `http://localhost:8080` | PostgREST server URL |
| `SERVER_NAME` | `localhost:8005` | Server name for URL generation |
| `SCHEME` | `http` | URL scheme (http/https) |
| `SENTRY_DSN` | `None` | Sentry DSN for error reporting (optional) |
| `PAGE_SIZE_DEFAULT` | `20` | Default page size |
| `PAGE_SIZE_MAX` | `50` | Maximum allowed page size |
| `BATCH_SIZE` | `50000` | Batch size for streaming |
| `DOC_PATH` | `/api/doc` | Swagger documentation path |
| `ALLOW_AGGREGATION` | `["dddddddd-7777-eeee-8888-ffffffffffff", "aaaaaaaa-9999-bbbb-1010-cccccccccccc"]` | List of resource IDs allowed for aggregation |
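A minimal `config.toml` overriding some of these defaults might look like this (values here are illustrative):

```toml
PGREST_ENDPOINT = "http://my-postgrest:8080"
SERVER_NAME = "tabular.example.org"
SCHEME = "https"
PAGE_SIZE_DEFAULT = 50
```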
You can override any configuration value using environment variables:

```bash
export PGREST_ENDPOINT="http://my-postgrest:8080"
export PAGE_SIZE_DEFAULT=50
export SENTRY_DSN="https://your-sentry-dsn"
```

Create a `config.toml` file in the project root or set the `CSVAPI_SETTINGS` environment variable:
```bash
export CSVAPI_SETTINGS="/path/to/your/config.toml"
```

This project uses pytest for testing, with async support and mocking capabilities. The two test containers must be running for the tests to pass.
```bash
# Run all tests
uv run pytest

# Run specific test file
uv run pytest tests/test_api.py

# Run tests with verbose output
uv run pytest -v

# Run tests and show print statements
uv run pytest -s
```

- `tests/test_api.py` - API endpoint tests (actually pings the running API)
- `tests/test_config.py` - Configuration loading tests
- `tests/test_query.py` - Query building and processing tests
- `tests/test_swagger.py` - Swagger documentation tests (actually pings the running API)
- `tests/test_utils.py` - Utility function tests
- `tests/conftest.py` - Test fixtures and configuration
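A typical local test run therefore looks like this (a sketch, assuming the compose file above provides the two test containers):

```bash
docker compose up -d   # start the PostgreSQL and PostgREST test containers
uv run pytest          # run the suite against them
```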
Tests are automatically run in CI/CD. See .circleci/config.yml for the complete CI/CD configuration.
This project follows PEP 8 style guidelines, enforced with Ruff for linting and formatting. Before submitting contributions, either run the commands below manually or install the pre-commit hook.
```bash
# Lint, sort imports, and format code
uv run ruff check --select I --fix && uv run ruff format
```

This repository uses a pre-commit hook that lints and formats code before each commit. Installing the pre-commit hook is required for contributions.
Install pre-commit hooks:
```bash
uv run pre-commit install
```

The pre-commit hook automatically:
- Checks YAML syntax
- Fixes end-of-file issues
- Removes trailing whitespace
- Checks for large files
- Runs Ruff linting and formatting
Pull requests cannot be merged unless all CI/CD tests pass.
Tests are automatically run on every pull request and push to the main branch. See .circleci/config.yml for the complete CI/CD configuration, and the testing section above for detailed testing commands.
The release process uses the tag_version.sh script to create git tags and GitHub releases and to update CHANGELOG.md automatically. Package version numbers are automatically derived from git tags using setuptools_scm, so no manual version updates are needed in pyproject.toml.
Prerequisites: GitHub CLI must be installed and authenticated, and you must be on the main branch with a clean working directory.
```bash
# Create a new release
./tag_version.sh <version>

# Example
./tag_version.sh 2.5.0

# Dry run to see what would happen
./tag_version.sh 2.5.0 --dry-run
```

The script automatically:
- Extracts commits since the last tag and formats them for CHANGELOG.md
- Identifies breaking changes (commits with `!:` in the subject)
- Creates a git tag and pushes it to the remote repository
- Creates a GitHub release with the changelog content
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: GitHub Issues
- Discussion: Use the discussion section at the end of the production API page
- Contact Form: Support form
- Production API: https://tabular-api.data.gouv.fr/api
- Product Documentation: API tabulaire data.gouv.fr (beta) (in French)
- Technical Documentation: Swagger/OpenAPI docs
