Commit 9e8e9dc

Merge branch 'main' into CAT-1382
2 parents 2a8dd32 + 0988448 commit 9e8e9dc

10 files changed: +543 -47 lines changed

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -9,8 +9,13 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
99

1010
### Added
1111

12+
- GET `/collections` collection search free text extension ex. `/collections?q=sentinel`. [#470](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/470)
1213
- Added `USE_DATETIME` environment variable to configure datetime search behavior in SFEOS. [#452](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/452)
1314
- GET `/collections` collection search sort extension ex. `/collections?sortby=+id`. [#456](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/456)
15+
- GET `/collections` collection search fields extension ex. `/collections?fields=id,title`. [#465](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/465)
16+
- Improved error messages for sorting on unsortable fields in collection search, including guidance on how to make fields sortable. [#465](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/465)
17+
- Added field alias for `temporal` to enable easier sorting by temporal extent, alongside `extent.temporal.interval`. [#465](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/465)
18+
- Added `ENABLE_COLLECTIONS_SEARCH` environment variable to make collection search extensions optional (defaults to enabled). [#465](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/465)
1419

1520
### Changed
1621

README.md

Lines changed: 38 additions & 2 deletions
@@ -36,11 +36,10 @@ SFEOS (stac-fastapi-elasticsearch-opensearch) is a high-performance, scalable AP
3636
- **Scale to millions of geospatial assets** with fast search performance through optimized spatial indexing and query capabilities
3737
- **Support OGC-compliant filtering** including spatial operations (intersects, contains, etc.) and temporal queries
3838
- **Perform geospatial aggregations** to analyze data distribution across space and time
39+
- **Search collections flexibly** with support for sorting, field selection, and free text search
3940

4041
This implementation builds on the STAC-FastAPI framework, providing a production-ready solution specifically optimized for Elasticsearch and OpenSearch databases. It's ideal for organizations managing large geospatial data catalogs who need efficient discovery and access capabilities through standardized APIs.
4142

42-
43-
4443
## Common Deployment Patterns
4544

4645
stac-fastapi-elasticsearch-opensearch can be deployed in several ways depending on your needs:
@@ -72,6 +71,7 @@ This project is built on the following technologies: STAC, stac-fastapi, FastAPI
7271
- [Common Deployment Patterns](#common-deployment-patterns)
7372
- [Technologies](#technologies)
7473
- [Table of Contents](#table-of-contents)
74+
- [Collection Search Extensions](#collection-search-extensions)
7575
- [Documentation \& Resources](#documentation--resources)
7676
- [Package Structure](#package-structure)
7777
- [Examples](#examples)
@@ -113,6 +113,37 @@ This project is built on the following technologies: STAC, stac-fastapi, FastAPI
113113
- [Gitter Chat](https://app.gitter.im/#/room/#stac-fastapi-elasticsearch_community:gitter.im) - For real-time discussions
114114
- [GitHub Discussions](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/discussions) - For longer-form questions and answers
115115

116+
## Collection Search Extensions
117+
118+
SFEOS implements extended capabilities for the `/collections` endpoint, allowing for more powerful collection discovery:
119+
120+
- **Sorting**: Sort collections by sortable fields using the `sortby` parameter
121+
- Example: `/collections?sortby=+id` (ascending sort by ID)
122+
- Example: `/collections?sortby=-id` (descending sort by ID)
123+
- Example: `/collections?sortby=-temporal` (descending sort by temporal extent)
124+
125+
- **Field Selection**: Request only specific fields to be returned using the `fields` parameter
126+
- Example: `/collections?fields=id,title,description`
127+
- This helps reduce payload size when only certain fields are needed
128+
129+
- **Free Text Search**: Search across collection text fields using the `q` parameter
130+
- Example: `/collections?q=landsat`
131+
- Searches across multiple text fields including title, description, and keywords
132+
- Supports partial word matching and relevance-based sorting
133+
134+
These extensions make it easier to build user interfaces that display and navigate through collections efficiently.
135+
136+
> **Configuration**: Collection search extensions can be disabled by setting the `ENABLE_COLLECTIONS_SEARCH` environment variable to `false`. By default, these extensions are enabled.
137+
138+
> **Note**: Sorting is only available on fields that are indexed for sorting in Elasticsearch/OpenSearch. With the default mappings, you can sort on:
139+
> - `id` (keyword field)
140+
> - `extent.temporal.interval` (date field)
141+
> - `temporal` (alias to extent.temporal.interval)
142+
>
143+
> Text fields like `title` and `description` are not sortable by default as they use text analysis for better search capabilities. Attempting to sort on these fields will result in a user-friendly error message explaining which fields are sortable and how to make additional fields sortable by updating the mappings.
144+
>
145+
> **Important**: Adding keyword fields to make text fields sortable can significantly increase the index size, especially for large text fields. Consider the storage implications when deciding which fields to make sortable.
146+
116147
## Package Structure
117148

118149
This project is organized into several packages, each with a specific purpose:
@@ -243,6 +274,7 @@ You can customize additional settings in your `.env` file:
243274
| `ENABLE_DIRECT_RESPONSE` | Enable direct response for maximum performance (disables all FastAPI dependencies, including authentication, custom status codes, and validation) | `false` | Optional |
244275
| `RAISE_ON_BULK_ERROR` | Controls whether bulk insert operations raise exceptions on errors. If set to `true`, the operation will stop and raise an exception when an error occurs. If set to `false`, errors will be logged, and the operation will continue. **Note:** STAC Item and ItemCollection validation errors will always raise, regardless of this flag. | `false` | Optional |
245276
| `DATABASE_REFRESH` | Controls whether database operations refresh the index immediately after changes. If set to `true`, changes will be immediately searchable. If set to `false`, changes may not be immediately visible but can improve performance for bulk operations. If set to `wait_for`, changes will wait for the next refresh cycle to become visible. | `false` | Optional |
277+
| `ENABLE_COLLECTIONS_SEARCH` | Enable collection search extensions (sort, fields, free text). | `true` | Optional |
246278
| `ENABLE_TRANSACTIONS_EXTENSIONS` | Enables or disables the Transactions and Bulk Transactions API extensions. If set to `false`, the POST `/collections` route and related transaction endpoints (including bulk transaction operations) will be unavailable in the API. This is useful for deployments where mutating the catalog via the API should be prevented. | `true` | Optional |
247279
| `STAC_ITEM_LIMIT` | Sets the environment variable for result limiting to SFEOS for the number of returned items and STAC collections. | `10` | Optional |
248280
| `STAC_INDEX_ASSETS` | Controls if Assets are indexed when added to Elasticsearch/Opensearch. This allows asset fields to be included in search queries. | `false` | Optional |
@@ -389,6 +421,10 @@ The system uses a precise naming convention:
389421
- **Root Path Configuration**: The application root path is the base URL by default.
390422
- For AWS Lambda with Gateway API: Set `STAC_FASTAPI_ROOT_PATH` to match the Gateway API stage name (e.g., `/v1`)
391423

424+
- **Feature Configuration**: Control which features are enabled:
425+
- `ENABLE_COLLECTIONS_SEARCH`: Set to `true` (default) to enable collection search extensions (sort, fields, free text). Set to `false` to disable.
426+
- `ENABLE_TRANSACTIONS_EXTENSIONS`: Set to `true` (default) to enable transaction extensions. Set to `false` to disable.
427+
392428

393429
## Collection Pagination
394430

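The `sortby`, `fields`, and `q` parameters documented in the README changes above can be exercised by building request URLs, as in the minimal sketch below. The base URL and the `collections_url` helper are assumptions made for illustration and are not part of SFEOS. The sketch percent-encodes `+` as `%2B`, because a raw `+` in a query string is decoded as a space by form-style parsers.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace it with the address of your own deployment.
BASE_URL = "http://localhost:8080"


def collections_url(sortby=None, fields=None, q=None, base_url=BASE_URL):
    """Build a GET /collections URL from the sortby, fields and q parameters.

    Illustrative helper only; it is not provided by SFEOS.
    """
    params = {}
    if sortby:
        params["sortby"] = sortby  # e.g. "+id", "-id", "-temporal"
    if fields:
        params["fields"] = ",".join(fields)  # e.g. ["id", "title"]
    if q:
        params["q"] = q  # free text term, e.g. "landsat"
    query = urlencode(params)  # percent-encodes "+" and ","
    return f"{base_url}/collections" + (f"?{query}" if query else "")


if __name__ == "__main__":
    print(collections_url(sortby="-temporal"))
    print(collections_url(fields=["id", "title", "description"]))
    print(collections_url(q="landsat"))
```

The resulting URL can be passed to any HTTP client. Combining parameters in one request, for example `sortby` together with `fields`, follows the same pattern.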
stac_fastapi/core/stac_fastapi/core/core.py

Lines changed: 33 additions & 3 deletions
@@ -230,11 +230,18 @@ async def landing_page(self, **kwargs) -> stac_types.LandingPage:
230230
return landing_page
231231

232232
async def all_collections(
233-
self, sortby: Optional[str] = None, **kwargs
233+
self,
234+
fields: Optional[List[str]] = None,
235+
sortby: Optional[str] = None,
236+
q: Optional[Union[str, List[str]]] = None,
237+
**kwargs,
234238
) -> stac_types.Collections:
235239
"""Read all collections from the database.
236240
237241
Args:
242+
fields (Optional[List[str]]): Fields to include or exclude from the results.
243+
sortby (Optional[str]): Sorting options for the results.
244+
q (Optional[Union[str, List[str]]]): Free text search term or list of terms.
238245
**kwargs: Keyword arguments from the request.
239246
240247
Returns:
@@ -245,6 +252,15 @@ async def all_collections(
245252
limit = int(request.query_params.get("limit", os.getenv("STAC_ITEM_LIMIT", 10)))
246253
token = request.query_params.get("token")
247254

255+
# Process fields parameter for filtering collection properties
256+
includes, excludes = set(), set()
257+
if fields and self.extension_is_enabled("FieldsExtension"):
258+
for field in fields:
259+
if field[0] == "-":
260+
excludes.add(field[1:])
261+
else:
262+
includes.add(field[1:] if field[0] in "+ " else field)
263+
248264
sort = None
249265
if sortby:
250266
parsed_sort = []
@@ -267,10 +283,24 @@ async def all_collections(
267283
except Exception:
268284
redis = None
269285

286+
# Convert q to a list if it's a string
287+
q_list = None
288+
if q is not None:
289+
q_list = [q] if isinstance(q, str) else q
290+
270291
collections, next_token = await self.database.get_all_collections(
271-
token=token, limit=limit, request=request, sort=sort
292+
token=token, limit=limit, request=request, sort=sort, q=q_list
272293
)
273294

295+
# Apply field filtering if fields parameter was provided
296+
if fields and self.extension_is_enabled("FieldsExtension"):
297+
filtered_collections = [
298+
filter_fields(collection, includes, excludes)
299+
for collection in collections
300+
]
301+
else:
302+
filtered_collections = collections
303+
274304
links = [
275305
{"rel": Relations.root.value, "type": MimeTypes.json, "href": base_url},
276306
{"rel": Relations.parent.value, "type": MimeTypes.json, "href": base_url},
@@ -301,7 +331,7 @@ async def all_collections(
301331
next_link = PagingLinks(next=next_token, request=request).link_next()
302332
links.append(next_link)
303333

304-
return stac_types.Collections(collections=collections, links=links)
334+
return stac_types.Collections(collections=filtered_collections, links=links)
305335

306336
async def get_collection(
307337
self, collection_id: str, **kwargs

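The `fields` and `q` handling added to `all_collections` above can be isolated into two small functions, sketched below. The names `split_fields` and `normalize_q` are hypothetical and do not appear in the commit. The empty-string guard is an addition of this sketch: the code in the diff indexes `field[0]` directly, which would raise `IndexError` for an empty entry.

```python
from typing import Iterable, List, Optional, Set, Tuple, Union


def split_fields(fields: Optional[Iterable[str]]) -> Tuple[Set[str], Set[str]]:
    """Split a fields list into (includes, excludes), mirroring the diff's loop.

    A leading "-" marks an exclusion. A leading "+" or space (a "+" that was
    URL-decoded into a space) marks an inclusion and is stripped.
    """
    includes: Set[str] = set()
    excludes: Set[str] = set()
    for field in fields or []:
        if not field:
            continue  # guard added in this sketch; not present in the diff
        if field[0] == "-":
            excludes.add(field[1:])
        else:
            includes.add(field[1:] if field[0] in "+ " else field)
    return includes, excludes


def normalize_q(q: Optional[Union[str, List[str]]]) -> Optional[List[str]]:
    """Wrap a single search term in a list; pass lists through unchanged."""
    if q is None:
        return None
    return [q] if isinstance(q, str) else q


if __name__ == "__main__":
    print(split_fields(["id", "-description", "+title"]))
    print(normalize_q("sentinel"))
```

Keeping the parsing separate from the request handler makes it straightforward to unit test without a running database.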
stac_fastapi/elasticsearch/stac_fastapi/elasticsearch/app.py

Lines changed: 26 additions & 21 deletions
@@ -45,8 +45,7 @@
4545
)
4646
from stac_fastapi.extensions.core.fields import FieldsConformanceClasses
4747
from stac_fastapi.extensions.core.filter import FilterConformanceClasses
48-
49-
# from stac_fastapi.extensions.core.free_text import FreeTextConformanceClasses
48+
from stac_fastapi.extensions.core.free_text import FreeTextConformanceClasses
5049
from stac_fastapi.extensions.core.query import QueryConformanceClasses
5150
from stac_fastapi.extensions.core.sort import SortConformanceClasses
5251
from stac_fastapi.extensions.third_party import BulkTransactionExtension
@@ -57,7 +56,9 @@
5756
logger = logging.getLogger(__name__)
5857

5958
TRANSACTIONS_EXTENSIONS = get_bool_env("ENABLE_TRANSACTIONS_EXTENSIONS", default=True)
59+
ENABLE_COLLECTIONS_SEARCH = get_bool_env("ENABLE_COLLECTIONS_SEARCH", default=True)
6060
logger.info("TRANSACTIONS_EXTENSIONS is set to %s", TRANSACTIONS_EXTENSIONS)
61+
logger.info("ENABLE_COLLECTIONS_SEARCH is set to %s", ENABLE_COLLECTIONS_SEARCH)
6162

6263
settings = ElasticsearchSettings()
6364
session = Session.create_from_settings(settings)
@@ -115,25 +116,26 @@
115116

116117
extensions = [aggregation_extension] + search_extensions
117118

118-
# Create collection search extensions
119-
# Only sort extension is enabled for now
120-
collection_search_extensions = [
121-
# QueryExtension(conformance_classes=[QueryConformanceClasses.COLLECTIONS]),
122-
SortExtension(conformance_classes=[SortConformanceClasses.COLLECTIONS]),
123-
# FieldsExtension(conformance_classes=[FieldsConformanceClasses.COLLECTIONS]),
124-
# CollectionSearchFilterExtension(
125-
# conformance_classes=[FilterConformanceClasses.COLLECTIONS]
126-
# ),
127-
# FreeTextExtension(conformance_classes=[FreeTextConformanceClasses.COLLECTIONS]),
128-
]
129-
130-
# Initialize collection search with its extensions
131-
collection_search_ext = CollectionSearchExtension.from_extensions(
132-
collection_search_extensions
133-
)
134-
collections_get_request_model = collection_search_ext.GET
119+
# Create collection search extensions if enabled
120+
if ENABLE_COLLECTIONS_SEARCH:
121+
# Create collection search extensions
122+
collection_search_extensions = [
123+
# QueryExtension(conformance_classes=[QueryConformanceClasses.COLLECTIONS]),
124+
SortExtension(conformance_classes=[SortConformanceClasses.COLLECTIONS]),
125+
FieldsExtension(conformance_classes=[FieldsConformanceClasses.COLLECTIONS]),
126+
# CollectionSearchFilterExtension(
127+
# conformance_classes=[FilterConformanceClasses.COLLECTIONS]
128+
# ),
129+
FreeTextExtension(conformance_classes=[FreeTextConformanceClasses.COLLECTIONS]),
130+
]
131+
132+
# Initialize collection search with its extensions
133+
collection_search_ext = CollectionSearchExtension.from_extensions(
134+
collection_search_extensions
135+
)
136+
collections_get_request_model = collection_search_ext.GET
135137

136-
extensions.append(collection_search_ext)
138+
extensions.append(collection_search_ext)
137139

138140
database_logic.extensions = [type(ext).__name__ for ext in extensions]
139141

@@ -170,10 +172,13 @@
170172
"search_get_request_model": create_get_request_model(search_extensions),
171173
"search_post_request_model": post_request_model,
172174
"items_get_request_model": items_get_request_model,
173-
"collections_get_request_model": collections_get_request_model,
174175
"route_dependencies": get_route_dependencies(),
175176
}
176177

178+
# Add collections_get_request_model if collection search is enabled
179+
if ENABLE_COLLECTIONS_SEARCH:
180+
app_config["collections_get_request_model"] = collections_get_request_model
181+
177182
api = StacApi(**app_config)
178183

179184

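The conditional wiring in `app.py` above follows a common feature-flag pattern: read a boolean from the environment, then add the optional key to the config only when the flag is on. The sketch below illustrates that pattern under stated assumptions. `read_bool_env` is a hypothetical stand-in for the project's `get_bool_env` helper, whose accepted spellings may differ, and `build_app_config` is likewise invented for illustration.

```python
import os
from typing import Any, Dict


def read_bool_env(name: str, default: bool = False) -> bool:
    """Hypothetical stand-in for get_bool_env; the real helper may differ."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"true", "1", "yes", "y"}


def build_app_config(
    base_config: Dict[str, Any], collections_model: Any, enabled: bool
) -> Dict[str, Any]:
    """Return a copy of base_config, adding the collections model only if enabled."""
    config = dict(base_config)
    if enabled:
        config["collections_get_request_model"] = collections_model
    return config


if __name__ == "__main__":
    enabled = read_bool_env("ENABLE_COLLECTIONS_SEARCH", default=True)
    # "MODEL" is a placeholder for the request model built from the extensions.
    print(build_app_config({"title": "demo"}, "MODEL", enabled))
```

Omitting the key entirely when the flag is off, rather than setting it to `None`, leaves the receiving constructor free to apply its own default.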
stac_fastapi/elasticsearch/stac_fastapi/elasticsearch/database_logic.py

Lines changed: 49 additions & 0 deletions
@@ -175,6 +175,7 @@ async def get_all_collections(
175175
limit: int,
176176
request: Request,
177177
sort: Optional[List[Dict[str, Any]]] = None,
178+
q: Optional[List[str]] = None,
178179
) -> Tuple[List[Dict[str, Any]], Optional[str]]:
179180
"""Retrieve a list of collections from Elasticsearch, supporting pagination.
180181
@@ -183,16 +184,32 @@ async def get_all_collections(
183184
limit (int): The number of results to return.
184185
request (Request): The FastAPI request object.
185186
sort (Optional[List[Dict[str, Any]]]): Optional sort parameter from the request.
187+
q (Optional[List[str]]): Free text search terms.
186188
187189
Returns:
188190
A tuple of (collections, next pagination token if any).
191+
192+
Raises:
193+
HTTPException: If sorting is requested on a field that is not sortable.
189194
"""
195+
# Define sortable fields based on the ES_COLLECTIONS_MAPPINGS
196+
sortable_fields = ["id", "extent.temporal.interval", "temporal"]
197+
198+
# Format the sort parameter
190199
formatted_sort = []
191200
if sort:
192201
for item in sort:
193202
field = item.get("field")
194203
direction = item.get("direction", "asc")
195204
if field:
205+
# Validate that the field is sortable
206+
if field not in sortable_fields:
207+
raise HTTPException(
208+
status_code=400,
209+
detail=f"Field '{field}' is not sortable. Sortable fields are: {', '.join(sortable_fields)}. "
210+
+ "Text fields are not sortable by default in Elasticsearch. "
211+
+ "To make a field sortable, update the mapping to use 'keyword' type or add a '.keyword' subfield. ",
212+
)
196213
formatted_sort.append({field: {"order": direction}})
197214
# Always include id as a secondary sort to ensure consistent pagination
198215
if not any("id" in item for item in formatted_sort):
@@ -208,6 +225,38 @@ async def get_all_collections(
208225
if token:
209226
body["search_after"] = [token]
210227

228+
# Apply free text query if provided
229+
if q:
230+
# For collections, we want to search across all relevant fields
231+
should_clauses = []
232+
233+
# For each search term
234+
for term in q:
235+
# Create a case-insensitive wildcard query on each field for each term
236+
for field in [
237+
"id",
238+
"title",
239+
"description",
240+
"keywords",
241+
"summaries.platform",
242+
"summaries.constellation",
243+
"providers.name",
244+
"providers.url",
245+
]:
246+
should_clauses.append(
247+
{
248+
"wildcard": {
249+
field: {"value": f"*{term}*", "case_insensitive": True}
250+
}
251+
}
252+
)
253+
254+
# Add the query to the body using bool query with should clauses
255+
body["query"] = {
256+
"bool": {"should": should_clauses, "minimum_should_match": 1}
257+
}
258+
259+
# Execute the search
211260
response = await self.client.search(
212261
index=COLLECTIONS_INDEX,
213262
body=body,

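The sort validation and free text query construction added to `get_all_collections` above can be sketched as two pure functions that produce the search body fragments. The function names are hypothetical. The sketch raises `ValueError` to stay dependency-free, where the diff raises an `HTTPException` with status 400, and it leaves out the `id` tiebreaker because that part of the hunk is not fully shown.

```python
from typing import Any, Dict, List, Optional

# Field lists copied from the diff above.
SORTABLE_FIELDS = ["id", "extent.temporal.interval", "temporal"]
FREE_TEXT_FIELDS = [
    "id",
    "title",
    "description",
    "keywords",
    "summaries.platform",
    "summaries.constellation",
    "providers.name",
    "providers.url",
]


def format_sort(sort: Optional[List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Validate sort fields and convert them to Elasticsearch sort clauses."""
    formatted = []
    for item in sort or []:
        field = item.get("field")
        if not field:
            continue
        if field not in SORTABLE_FIELDS:
            # The diff raises HTTPException(status_code=400) at this point.
            raise ValueError(
                f"Field '{field}' is not sortable. "
                f"Sortable fields are: {', '.join(SORTABLE_FIELDS)}."
            )
        formatted.append({field: {"order": item.get("direction", "asc")}})
    return formatted


def build_free_text_query(terms: List[str]) -> Dict[str, Any]:
    """Build a bool query with one wildcard clause per (term, field) pair."""
    should = [
        {"wildcard": {field: {"value": f"*{term}*", "case_insensitive": True}}}
        for term in terms
        for field in FREE_TEXT_FIELDS
    ]
    return {"bool": {"should": should, "minimum_should_match": 1}}


if __name__ == "__main__":
    print(format_sort([{"field": "temporal", "direction": "desc"}]))
    print(build_free_text_query(["landsat"])["bool"]["should"][0])
```

Each term yields one clause per field, so a query with several terms grows linearly in the number of clauses. Leading-wildcard patterns such as `*term*` can be expensive on large indices, which may be worth considering for catalogs with many collections.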