Commit e93bf1e

Get collections search fields (#465)
**Related Issue(s):**

- #461

**Description:**

- GET `/collections` collection search fields extension ex. `/collections?fields=id,title`. [#465](#465)
- Improved error messages for sorting on unsortable fields in collection search, including guidance on how to make fields sortable. [#465](#465)
- Added field alias for `temporal` to enable easier sorting by temporal extent, alongside `extent.temporal.interval`. [#465](#465)

**PR Checklist:**

- [x] Code is formatted and linted (run `pre-commit run --all-files`)
- [x] Tests pass (run `make test`)
- [x] Documentation has been updated to reflect changes, if applicable
- [x] Changes are added to the changelog
1 parent 7d6b741 commit e93bf1e

9 files changed: +160 -6 lines changed

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -11,6 +11,9 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

 - Added `USE_DATETIME` environment variable to configure datetime search behavior in SFEOS. [#452](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/452)
 - GET `/collections` collection search sort extension ex. `/collections?sortby=+id`. [#456](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/456)
+- GET `/collections` collection search fields extension ex. `/collections?fields=id,title`. [#465](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/465)
+- Improved error messages for sorting on unsortable fields in collection search, including guidance on how to make fields sortable. [#465](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/465)
+- Added field alias for `temporal` to enable easier sorting by temporal extent, alongside `extent.temporal.interval`. [#465](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/465)

 ### Changed

README.md

Lines changed: 26 additions & 2 deletions
@@ -36,11 +36,10 @@ SFEOS (stac-fastapi-elasticsearch-opensearch) is a high-performance, scalable AP
 - **Scale to millions of geospatial assets** with fast search performance through optimized spatial indexing and query capabilities
 - **Support OGC-compliant filtering** including spatial operations (intersects, contains, etc.) and temporal queries
 - **Perform geospatial aggregations** to analyze data distribution across space and time
+- **Enhanced collection search capabilities** with support for sorting and field selection

 This implementation builds on the STAC-FastAPI framework, providing a production-ready solution specifically optimized for Elasticsearch and OpenSearch databases. It's ideal for organizations managing large geospatial data catalogs who need efficient discovery and access capabilities through standardized APIs.

-
-
 ## Common Deployment Patterns

 stac-fastapi-elasticsearch-opensearch can be deployed in several ways depending on your needs:
@@ -72,6 +71,7 @@ This project is built on the following technologies: STAC, stac-fastapi, FastAPI
 - [Common Deployment Patterns](#common-deployment-patterns)
 - [Technologies](#technologies)
 - [Table of Contents](#table-of-contents)
+- [Collection Search Extensions](#collection-search-extensions)
 - [Documentation \& Resources](#documentation--resources)
 - [Package Structure](#package-structure)
 - [Examples](#examples)
@@ -113,6 +113,30 @@ This project is built on the following technologies: STAC, stac-fastapi, FastAPI
 - [Gitter Chat](https://app.gitter.im/#/room/#stac-fastapi-elasticsearch_community:gitter.im) - For real-time discussions
 - [GitHub Discussions](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/discussions) - For longer-form questions and answers

+## Collection Search Extensions
+
+SFEOS implements extended capabilities for the `/collections` endpoint, allowing for more powerful collection discovery:
+
+- **Sorting**: Sort collections by sortable fields using the `sortby` parameter
+  - Example: `/collections?sortby=+id` (ascending sort by ID)
+  - Example: `/collections?sortby=-id` (descending sort by ID)
+  - Example: `/collections?sortby=-temporal` (descending sort by temporal extent)
+
+- **Field Selection**: Request only specific fields to be returned using the `fields` parameter
+  - Example: `/collections?fields=id,title,description`
+  - This helps reduce payload size when only certain fields are needed
+
+These extensions make it easier to build user interfaces that display and navigate through collections efficiently.
+
+> **Note**: Sorting is only available on fields that are indexed for sorting in Elasticsearch/OpenSearch. With the default mappings, you can sort on:
+> - `id` (keyword field)
+> - `extent.temporal.interval` (date field)
+> - `temporal` (alias to extent.temporal.interval)
+>
+> Text fields like `title` and `description` are not sortable by default as they use text analysis for better search capabilities. Attempting to sort on these fields will result in a user-friendly error message explaining which fields are sortable and how to make additional fields sortable by updating the mappings.
+>
+> **Important**: Adding keyword fields to make text fields sortable can significantly increase the index size, especially for large text fields. Consider the storage implications when deciding which fields to make sortable.
+
 ## Package Structure

 This project is organized into several packages, each with a specific purpose:
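
A practical detail behind the `sortby=+id` example in the README hunk: in a URL query string a literal `+` is decoded as a space, so a client that wants to send an explicit ascending prefix generally has to percent-encode it as `%2B`. The sketch below uses only Python's standard library to show the round trip. The parameter values come from the README examples; nothing here calls SFEOS itself.

```python
from urllib.parse import parse_qs, urlencode

# A raw "+" in a query string is decoded as a space by standard parsers.
raw = parse_qs("sortby=+id")
print(raw)  # {'sortby': [' id']}

# urlencode percent-encodes "+" so the sign survives the round trip.
encoded = urlencode({"sortby": "+id"})
print(encoded)  # sortby=%2Bid
round_trip = parse_qs(encoded)
print(round_trip)  # {'sortby': ['+id']}

# Descending sorts use "-", which needs no escaping.
descending = urlencode({"sortby": "-temporal"})
print(descending)  # sortby=-temporal
```

This is presumably why the `fields` parsing added to `core.py` in this commit treats a leading space the same way as a leading `+`.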

stac_fastapi/core/stac_fastapi/core/core.py

Lines changed: 22 additions & 2 deletions
@@ -225,11 +225,13 @@ async def landing_page(self, **kwargs) -> stac_types.LandingPage:
         return landing_page

     async def all_collections(
-        self, sortby: Optional[str] = None, **kwargs
+        self, fields: Optional[List[str]] = None, sortby: Optional[str] = None, **kwargs
     ) -> stac_types.Collections:
         """Read all collections from the database.

         Args:
+            fields (Optional[List[str]]): Fields to include or exclude from the results.
+            sortby (Optional[str]): Sorting options for the results.
             **kwargs: Keyword arguments from the request.

         Returns:
@@ -240,6 +242,15 @@ async def all_collections(
         limit = int(request.query_params.get("limit", os.getenv("STAC_ITEM_LIMIT", 10)))
         token = request.query_params.get("token")

+        # Process fields parameter for filtering collection properties
+        includes, excludes = set(), set()
+        if fields and self.extension_is_enabled("FieldsExtension"):
+            for field in fields:
+                if field[0] == "-":
+                    excludes.add(field[1:])
+                else:
+                    includes.add(field[1:] if field[0] in "+ " else field)
+
         sort = None
         if sortby:
             parsed_sort = []
@@ -259,6 +270,15 @@ async def all_collections(
             token=token, limit=limit, request=request, sort=sort
         )

+        # Apply field filtering if fields parameter was provided
+        if fields and self.extension_is_enabled("FieldsExtension"):
+            filtered_collections = [
+                filter_fields(collection, includes, excludes)
+                for collection in collections
+            ]
+        else:
+            filtered_collections = collections
+
         links = [
             {"rel": Relations.root.value, "type": MimeTypes.json, "href": base_url},
             {"rel": Relations.parent.value, "type": MimeTypes.json, "href": base_url},
@@ -273,7 +293,7 @@ async def all_collections(
             next_link = PagingLinks(next=next_token, request=request).link_next()
             links.append(next_link)

-        return stac_types.Collections(collections=collections, links=links)
+        return stac_types.Collections(collections=filtered_collections, links=links)

     async def get_collection(
         self, collection_id: str, **kwargs
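
The include/exclude loop added to `all_collections` can be read in isolation. The sketch below is a minimal standalone version under stated assumptions: `split_fields` is a hypothetical helper name that does not exist in the codebase, and the empty-string guard is an addition, since `field[0]` in the committed loop would raise `IndexError` on an empty value.

```python
from typing import Iterable, Set, Tuple


def split_fields(fields: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Split raw `fields` values into (includes, excludes).

    Hypothetical helper mirroring the loop in `all_collections`;
    unlike that loop, it skips empty strings instead of indexing them.
    """
    includes: Set[str] = set()
    excludes: Set[str] = set()
    for field in fields:
        if not field:
            continue  # guard: "" would raise IndexError on field[0]
        if field[0] == "-":
            excludes.add(field[1:])  # "-description" -> exclude "description"
        elif field[0] in "+ ":
            includes.add(field[1:])  # "+title" or " title" -> include "title"
        else:
            includes.add(field)  # bare names are includes
    return includes, excludes


includes, excludes = split_fields(["id", "+title", " links", "-description", ""])
print(sorted(includes), sorted(excludes))
```

The two sets are what the commit then passes to `filter_fields` for each collection.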

stac_fastapi/elasticsearch/stac_fastapi/elasticsearch/app.py

Lines changed: 1 addition & 1 deletion
@@ -120,7 +120,7 @@
 collection_search_extensions = [
     # QueryExtension(conformance_classes=[QueryConformanceClasses.COLLECTIONS]),
     SortExtension(conformance_classes=[SortConformanceClasses.COLLECTIONS]),
-    # FieldsExtension(conformance_classes=[FieldsConformanceClasses.COLLECTIONS]),
+    FieldsExtension(conformance_classes=[FieldsConformanceClasses.COLLECTIONS]),
     # CollectionSearchFilterExtension(
     # conformance_classes=[FilterConformanceClasses.COLLECTIONS]
     # ),

stac_fastapi/elasticsearch/stac_fastapi/elasticsearch/database_logic.py

Lines changed: 15 additions & 0 deletions
@@ -186,13 +186,28 @@ async def get_all_collections(

         Returns:
             A tuple of (collections, next pagination token if any).
+
+        Raises:
+            HTTPException: If sorting is requested on a field that is not sortable.
         """
+        # Define sortable fields based on the ES_COLLECTIONS_MAPPINGS
+        sortable_fields = ["id", "extent.temporal.interval", "temporal"]
+
+        # Format the sort parameter
         formatted_sort = []
         if sort:
             for item in sort:
                 field = item.get("field")
                 direction = item.get("direction", "asc")
                 if field:
+                    # Validate that the field is sortable
+                    if field not in sortable_fields:
+                        raise HTTPException(
+                            status_code=400,
+                            detail=f"Field '{field}' is not sortable. Sortable fields are: {', '.join(sortable_fields)}. "
+                            + "Text fields are not sortable by default in Elasticsearch. "
+                            + "To make a field sortable, update the mapping to use 'keyword' type or add a '.keyword' subfield. ",
+                        )
                     formatted_sort.append({field: {"order": direction}})
             # Always include id as a secondary sort to ensure consistent pagination
             if not any("id" in item for item in formatted_sort):
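
The validation added to `get_all_collections` amounts to an allow-list check before the sort clause is built. Below is a minimal sketch of that flow outside the database class. Several things are assumptions: `build_sort` is a hypothetical name, `ValueError` stands in for the `HTTPException(status_code=400, ...)` raised in the commit, and the shape of the `id` tie-breaker clause is a guess, because the hunk ends before showing what is appended.

```python
from typing import Dict, List, Optional

# Same allow-list as the commit.
SORTABLE_FIELDS = ["id", "extent.temporal.interval", "temporal"]


def build_sort(sort: Optional[List[Dict[str, str]]]) -> List[Dict[str, Dict[str, str]]]:
    """Validate sort items and convert them to search-engine sort clauses."""
    formatted: List[Dict[str, Dict[str, str]]] = []
    for item in sort or []:
        field = item.get("field")
        direction = item.get("direction", "asc")
        if not field:
            continue
        if field not in SORTABLE_FIELDS:
            # The commit raises HTTPException with status 400 here.
            raise ValueError(
                f"Field '{field}' is not sortable. "
                f"Sortable fields are: {', '.join(SORTABLE_FIELDS)}."
            )
        formatted.append({field: {"order": direction}})
    # Assumed tie-breaker: add an ascending id sort when id is not already present.
    if not any("id" in clause for clause in formatted):
        formatted.append({"id": {"order": "asc"}})
    return formatted


print(build_sort([{"field": "temporal", "direction": "desc"}]))
```

Because the check runs before any query is sent, a request such as `sortby=title` fails fast with a readable message instead of surfacing a backend mapping error.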

stac_fastapi/opensearch/stac_fastapi/opensearch/app.py

Lines changed: 1 addition & 1 deletion
@@ -120,7 +120,7 @@
 collection_search_extensions = [
     # QueryExtension(conformance_classes=[QueryConformanceClasses.COLLECTIONS]),
     SortExtension(conformance_classes=[SortConformanceClasses.COLLECTIONS]),
-    # FieldsExtension(conformance_classes=[FieldsConformanceClasses.COLLECTIONS]),
+    FieldsExtension(conformance_classes=[FieldsConformanceClasses.COLLECTIONS]),
     # CollectionSearchFilterExtension(
     # conformance_classes=[FilterConformanceClasses.COLLECTIONS]
     # ),

stac_fastapi/opensearch/stac_fastapi/opensearch/database_logic.py

Lines changed: 15 additions & 0 deletions
@@ -170,13 +170,28 @@ async def get_all_collections(

         Returns:
             A tuple of (collections, next pagination token if any).
+
+        Raises:
+            HTTPException: If sorting is requested on a field that is not sortable.
         """
+        # Define sortable fields based on the ES_COLLECTIONS_MAPPINGS
+        sortable_fields = ["id", "extent.temporal.interval", "temporal"]
+
+        # Format the sort parameter
         formatted_sort = []
         if sort:
             for item in sort:
                 field = item.get("field")
                 direction = item.get("direction", "asc")
                 if field:
+                    # Validate that the field is sortable
+                    if field not in sortable_fields:
+                        raise HTTPException(
+                            status_code=400,
+                            detail=f"Field '{field}' is not sortable. Sortable fields are: {', '.join(sortable_fields)}. "
+                            + "Text fields are not sortable by default in OpenSearch. "
+                            + "To make a field sortable, update the mapping to use 'keyword' type or add a '.keyword' subfield. ",
+                        )
                     formatted_sort.append({field: {"order": direction}})
             # Always include id as a secondary sort to ensure consistent pagination
             if not any("id" in item for item in formatted_sort):

stac_fastapi/sfeos_helpers/stac_fastapi/sfeos_helpers/mappings.py

Lines changed: 2 additions & 0 deletions
@@ -165,6 +165,8 @@ class Geometry(Protocol): # noqa
         "providers": {"type": "object", "enabled": False},
         "links": {"type": "object", "enabled": False},
         "item_assets": {"type": "object", "enabled": get_bool_env("STAC_INDEX_ASSETS")},
+        # Field alias to allow sorting on 'temporal' (points to extent.temporal.interval)
+        "temporal": {"type": "alias", "path": "extent.temporal.interval"},
     },
 }
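
The error message added in this commit suggests adding a `.keyword` subfield to make a text field sortable. As an illustration only, the fragment below shows what such a mapping could look like next to the new alias. The `title` entry is hypothetical and not part of this commit; it uses the standard Elasticsearch/OpenSearch multi-field syntax, and `ignore_above: 256` is an arbitrary choice.

```python
# Sketch of a "properties" fragment; only the "temporal" alias comes from the diff.
properties = {
    # From the diff: lets `sortby=temporal` resolve to the interval field.
    "temporal": {"type": "alias", "path": "extent.temporal.interval"},
    # Assumption: keep full-text search on title and add an exact-value subfield.
    "title": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
}

# A sort on the subfield would then target "title.keyword", not "title".
sort_clause = [{"title.keyword": {"order": "asc"}}]
print(sort_clause)
```

With the code in this commit, the `sortable_fields` allow-list in both `database_logic.py` modules would also need the new field name, and existing indices would need reindexing before the subfield is populated.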

stac_fastapi/tests/api/test_api_search_collections.py

Lines changed: 75 additions & 0 deletions
@@ -77,3 +77,78 @@ async def test_collections_sort_id_desc(app_client, txn_client, load_test_data):
     assert len(test_collections) == len(collection_ids)
     for i, expected_id in enumerate(sorted_ids):
         assert test_collections[i]["id"] == expected_id
+
+
+@pytest.mark.asyncio
+async def test_collections_fields(app_client, txn_client, load_test_data):
+    """Verify GET /collections honors the fields parameter."""
+    # Create multiple collections with different ids
+    base_collection = load_test_data("test_collection.json")
+
+    # Create collections with ids in a specific order to test fields
+    # Use unique prefixes to avoid conflicts between tests
+    test_prefix = f"fields-{uuid.uuid4().hex[:8]}"
+    collection_ids = [f"{test_prefix}-a", f"{test_prefix}-b", f"{test_prefix}-c"]
+
+    for i, coll_id in enumerate(collection_ids):
+        test_collection = base_collection.copy()
+        test_collection["id"] = coll_id
+        test_collection["title"] = f"Test Collection {i}"
+        test_collection["description"] = f"Description for collection {i}"
+        await create_collection(txn_client, test_collection)
+
+    # Test include fields parameter
+    resp = await app_client.get(
+        "/collections",
+        params=[("fields", "id"), ("fields", "title")],
+    )
+    assert resp.status_code == 200
+    resp_json = resp.json()
+
+    # Check if collections exist in the response
+    assert "collections" in resp_json, "No collections in response"
+
+    # Filter collections to only include the ones we created for this test
+    test_collections = []
+    for c in resp_json["collections"]:
+        if "id" in c and c["id"].startswith(test_prefix):
+            test_collections.append(c)
+
+    # Filter collections to only include the ones we created for this test
+    test_collections = []
+    for c in resp_json["collections"]:
+        if "id" in c and c["id"].startswith(test_prefix):
+            test_collections.append(c)
+
+    # Collections should only have id and title fields
+    for collection in test_collections:
+        assert "id" in collection
+        assert "title" in collection
+        assert "description" not in collection
+        assert "links" in collection # links are always included
+
+    # Test exclude fields parameter
+    resp = await app_client.get(
+        "/collections",
+        params=[("fields", "-description")],
+    )
+    assert resp.status_code == 200
+    resp_json = resp.json()
+
+    # Check if collections exist in the response
+    assert (
+        "collections" in resp_json
+    ), "No collections in response for exclude fields test"
+
+    # Filter collections to only include the ones we created for this test
+    test_collections = []
+    for c in resp_json["collections"]:
+        if "id" in c and c["id"].startswith(test_prefix):
+            test_collections.append(c)
+
+    # Collections should have all fields except description
+    for collection in test_collections:
+        assert "id" in collection
+        assert "title" in collection
+        assert "description" not in collection
+        assert "links" in collection
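
The test passes `fields` as a list of tuples, which sends one query parameter per field, while the README example uses a single comma-separated value. The sketch below shows how the two forms differ on the wire, using the standard library as a stand-in for the test client's own serialization (an assumption). How SFEOS normalizes the comma-separated form is not visible in this diff, so only the repeated form is known to be exercised by this test.

```python
from urllib.parse import parse_qs, urlencode

# Form used in the test: one "fields" parameter per value.
repeated = urlencode([("fields", "id"), ("fields", "title")])
print(repeated)  # fields=id&fields=title
repeated_parsed = parse_qs(repeated)
print(repeated_parsed)  # {'fields': ['id', 'title']}

# Form shown in the README: a single comma-separated value.
comma = urlencode({"fields": "id,title"})
print(comma)  # fields=id%2Ctitle
comma_parsed = parse_qs(comma)
print(comma_parsed)  # {'fields': ['id,title']}
```

A generic parser yields a two-element list for the first form and a single string for the second, so the server has to split on commas somewhere for both to behave the same.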
