Summary
After migrating to Metacat 3.0 and reindexing the entire corpus with the new DataONE indexer, we observed that resource map links are missing in the Solr index for at least 22 public and 3 private datasets. The resourceMap field in the indexed metadata objects is empty even though valid resource maps exist and are properly indexed. This issue was not seen prior to the 3.0 migration and reindex.
Example
- Dataset: https://data.ess-dive.lbl.gov/view/doi:10.15485/1898912
- Resource map PID (first broken version): ess-dive-46903e0c9c3de02-20221117T224253475
- Metadata PID (first broken version): ess-dive-8e6738870d87db3-20221117T224253485
- The first version where the resourceMap stopped appearing is provided above. All subsequent versions in the obsolescence chain are also missing resource map links.
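The empty field can be checked per PID through the Member Node's Solr query endpoint. The following is a minimal sketch, not a verified procedure: the base URL `https://data.ess-dive.lbl.gov/catalog/d1/mn/v2` is an assumption inferred from the dataset URL and the `catalog` context seen in the logs, and the helper names are hypothetical. Private datasets would also need an authentication token, which is omitted here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: public Member Node base URL for this deployment.
MN_BASE_URL = "https://data.ess-dive.lbl.gov/catalog/d1/mn/v2"


def build_query_url(pid, base_url=MN_BASE_URL):
    """Build a Solr query URL that asks for the resourceMap field of one PID."""
    params = urlencode({
        "q": f'id:"{pid}"',
        "fl": "id,seriesId,resourceMap",
        "wt": "json",
    })
    return f"{base_url}/query/solr/?{params}"


def resource_maps_from_response(body):
    """Return the resourceMap values from a Solr JSON response.

    An empty list means the document was not found or has no resourceMap field.
    """
    docs = json.loads(body)["response"]["docs"]
    if not docs:
        return []
    return docs[0].get("resourceMap", [])


def fetch_resource_maps(pid):
    """Query the Member Node and return the resourceMap values for a PID."""
    with urlopen(build_query_url(pid), timeout=30) as resp:
        return resource_maps_from_response(resp.read().decode("utf-8"))
```

For the example above, `fetch_resource_maps("ess-dive-8e6738870d87db3-20221117T224253485")` would be expected to return an empty list while the problem persists.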
Obsolescence Chain (Representative example)
This is the obsolescence chain for a representative example, listed newest first. Every resource map in the chain exists and is indexed. An X in the "missing" column marks the versions whose metadata record has no link to its resource map.
| metadata_id | seriesId | resourceMap_id | missing |
|---|---|---|---|
| ess-dive-9be051bcebbb9e6-20250428T213324618 | doi:10.15485/1898912 | ess-dive-c1729924d2c59eb-20250428T213324615 | X |
| ess-dive-9be051bcebbb9e6-20250428T213257023 | doi:10.15485/1898912 | ess-dive-c581eebb5b7b50a-20250428T213257013 | X |
| ess-dive-9be051bcebbb9e6-20250424T205756642 | doi:10.15485/1898912 | ess-dive-841f3051d9a2708-20250424T205756612 | X |
| ess-dive-9be051bcebbb9e6-20240819T204345623 | doi:10.15485/1898912 | ess-dive-ce2dc192547db4e-20240819T204345615 | X |
| ess-dive-9be051bcebbb9e6-20240314T204040227 | doi:10.15485/1898912 | ess-dive-30182ebce4568fe-20240314T204040214 | X |
| ess-dive-9be051bcebbb9e6-20240108T174838968 | doi:10.15485/1898912 | ess-dive-b6e3586c57f24e8-20240108T174838956 | X |
| ess-dive-9be051bcebbb9e6-20231108T174335343 | doi:10.15485/1898912 | ess-dive-7095e0ccee0ba15-20231108T174335337 | X |
| ess-dive-9be051bcebbb9e6-20231108T162220676 | doi:10.15485/1898912 | ess-dive-a4c4b6d97257110-20231108T162220664 | X |
| ess-dive-9be051bcebbb9e6-20231107T003557346 | doi:10.15485/1898912 | ess-dive-2cb51a443565303-20231107T003557339 | X |
| ess-dive-9be051bcebbb9e6-20231107T000616818 | doi:10.15485/1898912 | ess-dive-e4959e48bc0d436-20231107T000616813 | X |
| ess-dive-9be051bcebbb9e6-20231106T235858719 | doi:10.15485/1898912 | ess-dive-c017ba5643cb046-20231106T235858715 | X |
| ess-dive-9be051bcebbb9e6-20231106T235820611 | doi:10.15485/1898912 | ess-dive-89c84faea686d5f-20231106T235820604 | X |
| ess-dive-9be051bcebbb9e6-20231106T235015592 | doi:10.15485/1898912 | ess-dive-677c91786303c4b-20231106T235015584 | X |
| ess-dive-9be051bcebbb9e6-20231106T231212959 | doi:10.15485/1898912 | ess-dive-7fda0c47b8e1f2f-20231106T231212952 | X |
| ess-dive-9be051bcebbb9e6-20231106T231106515 | doi:10.15485/1898912 | ess-dive-0ee949f8376453a-20231106T231106507 | X |
| ess-dive-9be051bcebbb9e6-20231102T191507382 | doi:10.15485/1898912 | ess-dive-e1a0ec7bfee78f1-20231102T191507376 | X |
| ess-dive-9be051bcebbb9e6-20231024T165344106 | doi:10.15485/1898912 | ess-dive-22443e0250facf3-20231024T165344094 | X |
| ess-dive-9be051bcebbb9e6-20231020T171513129 | doi:10.15485/1898912 | ess-dive-da805827db18fb5-20231020T171513122 | X |
| ess-dive-9be051bcebbb9e6-20231020T151718937 | doi:10.15485/1898912 | ess-dive-e5fa24875975767-20231020T151718932 | X |
| ess-dive-9be051bcebbb9e6-20231020T151546870 | doi:10.15485/1898912 | ess-dive-8e968f0b58ae787-20231020T151546863 | X |
| ess-dive-9be051bcebbb9e6-20231020T151029878 | doi:10.15485/1898912 | ess-dive-288593f8d05d0a8-20231020T151029850 | X |
| ess-dive-9be051bcebbb9e6-20230509T160352692 | doi:10.15485/1898912 | ess-dive-4d8dbc35b605239-20230509T160352683 | X |
| ess-dive-e77eecf1104cc78-20230504T212124108475 | doi:10.15485/1898912 | ess-dive-23d73b4b474f11e-20230504T212129266471 | X |
| ess-dive-a5ddb655e7c69bc-20230407T152426839335 | doi:10.15485/1898912 | ess-dive-140808a28a9d9fc-20230407T152434860390 | X |
| ess-dive-792a06a368570f4-20230406T135257441502 | doi:10.15485/1898912 | ess-dive-c28f0962d002526-20230406T135302307466 | X |
| ess-dive-7ccb579e570dbd8-20230406T122652349829 | doi:10.15485/1898912 | ess-dive-8b19a68c5d29e91-20230406T122700866201 | X |
| ess-dive-9be051bcebbb9e6-20221205T175525485 | doi:10.15485/1898912 | ess-dive-b6c52d4c0d5a817-20221205T175525476 | X |
| ess-dive-8e6738870d87db3-20221117T224253485 | doi:10.15485/1898912 | ess-dive-46903e0c9c3de02-20221117T224253475 | X |
| ess-dive-20f26dd7286247a-20221111T172542084 | | ess-dive-d89fb6fad7353a8-20221111T172542074 | |
| ess-dive-ae18366438d5362-20221108T220702423 | | ess-dive-4aa268acb569f9d-20221108T220702416 | |
| ess-dive-a2314047b1cfdb2-20221108T214218585 | | ess-dive-0b825c5aa441e47-20221108T214218568 | |
| ess-dive-c0639102a0bd655-20221108T164958962 | | ess-dive-f7c9cc166222f0f-20221108T164958956 | |
| ess-dive-4c48f26f7f8afa0-20221108T000408556 | | ess-dive-5d6e27e22134ecc-20221108T000408551 | |
| ess-dive-8faddcacbf35bb7-20220920T175050062 | | ess-dive-251ae38a2ebcfd2-20220920T175050053 | |
| ess-dive-0824643ddb85a53-20220920T174739806142 | | ess-dive-dc06abe31a531ef-20220920T174741711311 | |
Log Output (excerpt from dataone-indexer during reindexing)
dataone-indexer 20250610-15:44:47: [INFO]: IndexWorker.consumer.indexObject by multiple thread? true, with the thread id 44 - Received the index task from the index queue with the identifier: ess-dive-46903e0c9c3de02-20221117T224253475 , the index type: create, the priority: 3 [org.dataone.cn.indexer.IndexWorker:indexObject:454]
[... additional log lines omitted for brevity ...]
dataone-indexer 20250610-15:44:50: [INFO]: SeriesIdResolver.getPid - get the system metadata from the mn base url http://metacat-hl:8080/catalog/d1/mn for the object doi:10.15485/1898912 [org.dataone.cn.indexer.parser.utility.SeriesIdResolver:getPid:53]
dataone-indexer 20250610-15:44:50: [WARN]: SeriesIdResolver.getPid - can't get the system metadata from the mn http://metacat-hl:8080/catalog/d1/mn for the object doi:10.15485/1898912 since class org.dataone.client.exception.ClientSideException: /metacat-hl: Name or service not known. We will try to get it from cn. [org.dataone.cn.indexer.parser.utility.SeriesIdResolver:getPid:61]
dataone-indexer 20250610-15:44:50: [INFO]: SeriesIdResolver.getPid - get the system metadata for the object doi:10.15485/1898912 from the cn since the current node is cn or the systemmetadata is not available on a mn with baseurl http://metacat-hl:8080/catalog/d1/mn [org.dataone.cn.indexer.parser.utility.SeriesIdResolver:getPid:67]
dataone-indexer 20250610-15:44:50: [INFO]: response httpCode: 200 [org.dataone.service.util.ExceptionHandler:filterErrors:93]
dataone-indexer 20250610-15:44:50: [INFO]: ||||||||||||||||||| the head version is urn:uuid:da404cb4-1b7e-47ec-9e29-a552f5bb04cf for sid doi:10.15485/1898912 [org.dataone.cn.indexer.resourcemap.ForesiteResourceMap:isHeadVersion:214]
2025-06-10T15:44:50.236318555Z dataone-indexer 20250610-15:44:50: [INFO]: ||||||||||||||||||| the pid ess-dive-8e6738870d87db3-20221117T224253485 is NOT the head version for sid doi:10.15485/1898912 [org.dataone.cn.indexer.resourcemap.ForesiteResourceMap:isHeadVersion:221]
dataone-indexer 20250610-15:44:50: [INFO]: The id org.dataone.service.types.v1.Identifier@c6ea9113 is not the head of the serial id doi:10.15485/1898912 So, skip merge this one!!!!!!!!!!!!!!!!!!!!!!ess-dive-8e6738870d87db3-20221117T224253485 [
[... additional log lines omitted for brevity ...]
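To see how many objects are affected, the indexer log can be scanned for the two head-version messages shown above. This is a rough sketch that assumes the message wording stays exactly as in this excerpt. The function name `summarize` is hypothetical.

```python
import re
from collections import defaultdict

# Patterns copied from the wording of the log excerpt above.
HEAD_RE = re.compile(r"the head version is (\S+) for sid (\S+)")
NOT_HEAD_RE = re.compile(r"the pid (\S+) is NOT the head version for sid (\S+)")


def summarize(lines):
    """Collect the resolved head per seriesId and the PIDs reported as not head.

    Returns (heads, skipped): heads maps sid -> head PID, and skipped maps
    sid -> list of PIDs that the indexer reported as NOT the head version.
    """
    heads = {}
    skipped = defaultdict(list)
    for line in lines:
        match = NOT_HEAD_RE.search(line)
        if match:
            skipped[match.group(2)].append(match.group(1))
            continue
        match = HEAD_RE.search(line)
        if match:
            heads[match.group(2)] = match.group(1)
    return heads, dict(skipped)
```

Passing the lines of a saved log file to `summarize` gives, for each seriesId, the head the indexer resolved and the PIDs it treated as not head, which can then be compared against the obsolescence chain.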
Additional Details
- At least 22 public and 3 private datasets are confirmed affected. We have not checked every dataset, so the actual number may be higher.
- The problem appears to have started immediately after the Metacat 3.0 migration and reindex.
- Reindexing resource maps completes without error but does not restore the missing links.
- Impact: Affected datasets have broken file-to-metadata links in Solr and the user-facing catalog.
- Important: The log output above shows the indexer is attempting to connect to the Member Node at metacat-hl:8080, but this service does not exist in our deployment. The correct service name is metacat:8080. This service name mismatch may be causing or contributing to the issue.
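The service name mismatch can be confirmed from inside the indexer container with a small DNS check. This sketch uses only the Python standard library. The helper names are hypothetical, and the `resolver` parameter exists only so the lookup can be swapped out when testing.

```python
import socket
from urllib.parse import urlsplit


def host_of(base_url):
    """Extract the hostname from a Member Node base URL."""
    return urlsplit(base_url).hostname


def can_resolve(host, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves from where this code runs."""
    try:
        resolver(host, None)
        return True
    except socket.gaierror:
        # Same class of failure as "Name or service not known" in the log.
        return False
```

Run where the indexer runs, `can_resolve(host_of("http://metacat-hl:8080/catalog/d1/mn"))` should return `False` if the log message is accurate, and `can_resolve("metacat")` should return `True` if `metacat` is the right service name.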
Configuration/Deployment Note
How can we override or configure the indexer (or DataONE client libraries) to use the correct Member Node service name (metacat:8080) instead of the non-existent metacat-hl:8080? Please provide guidance on where this is set (environment variable, config file, or code) and the recommended approach to ensure proper resolution.
Hypothesis
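This may be related to changes in how the indexer processes head versions and resource maps after the Metacat 3.0 upgrade. In the log above, the lookup against the Member Node fails, the indexer falls back to the CN, and the head version reported for doi:10.15485/1898912 is urn:uuid:da404cb4-1b7e-47ec-9e29-a552f5bb04cf, an identifier that does not appear in the obsolescence chain listed above. The indexer then determines that the metadata PID is not the head version for its seriesId and skips merging the resource map reference, leaving the resourceMap field empty in Solr.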
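The behavior described in this hypothesis can be illustrated with a toy model. This is not the indexer's code. It only encodes the rule suggested by the log messages (skip the merge when the PID is not the resolved head of its seriesId), plus the assumption that objects without a seriesId are not subject to that check. All names are hypothetical.

```python
from typing import Dict, NamedTuple, Optional


class Version(NamedTuple):
    pid: str
    series_id: Optional[str]


def gets_resource_map_link(version: Version, resolved_heads: Dict[str, str]) -> bool:
    """Toy rule: link the resource map only if the PID is the resolved head.

    resolved_heads maps a seriesId to the head PID the indexer resolved for it.
    """
    if version.series_id is None:
        # Assumption: no seriesId means there is no head check to fail.
        return True
    return version.pid == resolved_heads.get(version.series_id)
```

Under this model, if the resolved head is an identifier outside the chain, every version that carries the seriesId is skipped, including the newest one, while older versions without a seriesId keep their links. That matches the pattern in the table above, but it remains a hypothesis until checked against the indexer source.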
Request
Please investigate and advise on a fix or workaround. Let us know if additional details, logs, or dataset examples are needed.