Resource map links missing in Solr index after Metacat 3.0 migration and reindex #242

@vchendrix

Summary

After migrating to Metacat 3.0 and reindexing the entire corpus with the new DataONE indexer, we observed that resource map links are missing in the Solr index for at least 22 public and 3 private datasets. The resourceMap field in the indexed metadata objects is empty even though valid resource maps exist and are properly indexed. This issue was not seen prior to the 3.0 migration and reindex.

Example

  • Dataset: https://data.ess-dive.lbl.gov/view/doi:10.15485/1898912
  • Resource map PID (first broken version): ess-dive-46903e0c9c3de02-20221117T224253475
  • Metadata PID (first broken version): ess-dive-8e6738870d87db3-20221117T224253485
  • The PIDs above belong to the first version in which the resourceMap link is missing. All later versions in the obsolescence chain are also missing resource map links.
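
One way to confirm the symptom for a single object is to ask Solr for just the link fields of the metadata PID. The sketch below only builds the query URL. The base URL, the `/v2/query/solr/` path, and the `id` field name are assumptions about this deployment, not taken from the report; `seriesId` and `resourceMap` are the field names used in this issue.

```python
from urllib.parse import urlencode

# Assumption: public Member Node base URL; adjust for the actual deployment.
MN_BASE_URL = "https://data.ess-dive.lbl.gov/catalog/d1/mn"


def solr_query_url(pid: str, base_url: str = MN_BASE_URL) -> str:
    """Build a Solr query URL that returns the link fields for one PID."""
    params = {
        "q": f'id:"{pid}"',               # exact match on the object identifier
        "fl": "id,seriesId,resourceMap",  # only the fields relevant to this issue
        "wt": "json",
    }
    return f"{base_url}/v2/query/solr/?{urlencode(params)}"


# Metadata PID of the first broken version from the example above.
print(solr_query_url("ess-dive-8e6738870d87db3-20221117T224253485"))
```

If the response document for an affected PID has no `resourceMap` value, that matches the behavior described here.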

Obsolescence Chain (Representative example)

The table below lists the full obsolescence chain for the example dataset, newest version first. NOTE: We located a resource map for every version. An X in the "missing" column marks versions whose resource map was indexed but not linked to the metadata.

| metadata_id | seriesId | resourceMap_id | missing |
| --- | --- | --- | --- |
| ess-dive-9be051bcebbb9e6-20250428T213324618 | doi:10.15485/1898912 | ess-dive-c1729924d2c59eb-20250428T213324615 | X |
| ess-dive-9be051bcebbb9e6-20250428T213257023 | doi:10.15485/1898912 | ess-dive-c581eebb5b7b50a-20250428T213257013 | X |
| ess-dive-9be051bcebbb9e6-20250424T205756642 | doi:10.15485/1898912 | ess-dive-841f3051d9a2708-20250424T205756612 | X |
| ess-dive-9be051bcebbb9e6-20240819T204345623 | doi:10.15485/1898912 | ess-dive-ce2dc192547db4e-20240819T204345615 | X |
| ess-dive-9be051bcebbb9e6-20240314T204040227 | doi:10.15485/1898912 | ess-dive-30182ebce4568fe-20240314T204040214 | X |
| ess-dive-9be051bcebbb9e6-20240108T174838968 | doi:10.15485/1898912 | ess-dive-b6e3586c57f24e8-20240108T174838956 | X |
| ess-dive-9be051bcebbb9e6-20231108T174335343 | doi:10.15485/1898912 | ess-dive-7095e0ccee0ba15-20231108T174335337 | X |
| ess-dive-9be051bcebbb9e6-20231108T162220676 | doi:10.15485/1898912 | ess-dive-a4c4b6d97257110-20231108T162220664 | X |
| ess-dive-9be051bcebbb9e6-20231107T003557346 | doi:10.15485/1898912 | ess-dive-2cb51a443565303-20231107T003557339 | X |
| ess-dive-9be051bcebbb9e6-20231107T000616818 | doi:10.15485/1898912 | ess-dive-e4959e48bc0d436-20231107T000616813 | X |
| ess-dive-9be051bcebbb9e6-20231106T235858719 | doi:10.15485/1898912 | ess-dive-c017ba5643cb046-20231106T235858715 | X |
| ess-dive-9be051bcebbb9e6-20231106T235820611 | doi:10.15485/1898912 | ess-dive-89c84faea686d5f-20231106T235820604 | X |
| ess-dive-9be051bcebbb9e6-20231106T235015592 | doi:10.15485/1898912 | ess-dive-677c91786303c4b-20231106T235015584 | X |
| ess-dive-9be051bcebbb9e6-20231106T231212959 | doi:10.15485/1898912 | ess-dive-7fda0c47b8e1f2f-20231106T231212952 | X |
| ess-dive-9be051bcebbb9e6-20231106T231106515 | doi:10.15485/1898912 | ess-dive-0ee949f8376453a-20231106T231106507 | X |
| ess-dive-9be051bcebbb9e6-20231102T191507382 | doi:10.15485/1898912 | ess-dive-e1a0ec7bfee78f1-20231102T191507376 | X |
| ess-dive-9be051bcebbb9e6-20231024T165344106 | doi:10.15485/1898912 | ess-dive-22443e0250facf3-20231024T165344094 | X |
| ess-dive-9be051bcebbb9e6-20231020T171513129 | doi:10.15485/1898912 | ess-dive-da805827db18fb5-20231020T171513122 | X |
| ess-dive-9be051bcebbb9e6-20231020T151718937 | doi:10.15485/1898912 | ess-dive-e5fa24875975767-20231020T151718932 | X |
| ess-dive-9be051bcebbb9e6-20231020T151546870 | doi:10.15485/1898912 | ess-dive-8e968f0b58ae787-20231020T151546863 | X |
| ess-dive-9be051bcebbb9e6-20231020T151029878 | doi:10.15485/1898912 | ess-dive-288593f8d05d0a8-20231020T151029850 | X |
| ess-dive-9be051bcebbb9e6-20230509T160352692 | doi:10.15485/1898912 | ess-dive-4d8dbc35b605239-20230509T160352683 | X |
| ess-dive-e77eecf1104cc78-20230504T212124108475 | doi:10.15485/1898912 | ess-dive-23d73b4b474f11e-20230504T212129266471 | X |
| ess-dive-a5ddb655e7c69bc-20230407T152426839335 | doi:10.15485/1898912 | ess-dive-140808a28a9d9fc-20230407T152434860390 | X |
| ess-dive-792a06a368570f4-20230406T135257441502 | doi:10.15485/1898912 | ess-dive-c28f0962d002526-20230406T135302307466 | X |
| ess-dive-7ccb579e570dbd8-20230406T122652349829 | doi:10.15485/1898912 | ess-dive-8b19a68c5d29e91-20230406T122700866201 | X |
| ess-dive-9be051bcebbb9e6-20221205T175525485 | doi:10.15485/1898912 | ess-dive-b6c52d4c0d5a817-20221205T175525476 | X |
| ess-dive-8e6738870d87db3-20221117T224253485 | doi:10.15485/1898912 | ess-dive-46903e0c9c3de02-20221117T224253475 | X |
| ess-dive-20f26dd7286247a-20221111T172542084 | | ess-dive-d89fb6fad7353a8-20221111T172542074 | |
| ess-dive-ae18366438d5362-20221108T220702423 | | ess-dive-4aa268acb569f9d-20221108T220702416 | |
| ess-dive-a2314047b1cfdb2-20221108T214218585 | | ess-dive-0b825c5aa441e47-20221108T214218568 | |
| ess-dive-c0639102a0bd655-20221108T164958962 | | ess-dive-f7c9cc166222f0f-20221108T164958956 | |
| ess-dive-4c48f26f7f8afa0-20221108T000408556 | | ess-dive-5d6e27e22134ecc-20221108T000408551 | |
| ess-dive-8faddcacbf35bb7-20220920T175050062 | | ess-dive-251ae38a2ebcfd2-20220920T175050053 | |
| ess-dive-0824643ddb85a53-20220920T174739806142 | | ess-dive-dc06abe31a531ef-20220920T174741711311 | |

Log Output (excerpt from dataone-indexer during reindexing)

dataone-indexer 20250610-15:44:47: [INFO]: IndexWorker.consumer.indexObject by multiple thread? true, with the thread id 44 - Received the index task from the index queue with the identifier: ess-dive-46903e0c9c3de02-20221117T224253475 , the index type: create, the priority: 3 [org.dataone.cn.indexer.IndexWorker:indexObject:454]
[... additional log lines omitted for brevity ...]
dataone-indexer 20250610-15:44:50: [INFO]: SeriesIdResolver.getPid - get the system metadata from the mn base url http://metacat-hl:8080/catalog/d1/mn for the object doi:10.15485/1898912 [org.dataone.cn.indexer.parser.utility.SeriesIdResolver:getPid:53]
dataone-indexer 20250610-15:44:50: [WARN]: SeriesIdResolver.getPid - can't get the system metadata from the mn http://metacat-hl:8080/catalog/d1/mn for the object doi:10.15485/1898912 since class org.dataone.client.exception.ClientSideException: /metacat-hl: Name or service not known. We will try to get it from cn. [org.dataone.cn.indexer.parser.utility.SeriesIdResolver:getPid:61]
dataone-indexer 20250610-15:44:50: [INFO]: SeriesIdResolver.getPid - get the system metadata for the object doi:10.15485/1898912 from the cn since the current node is cn or the systemmetadata is not available on a mn with baseurl http://metacat-hl:8080/catalog/d1/mn [org.dataone.cn.indexer.parser.utility.SeriesIdResolver:getPid:67]
dataone-indexer 20250610-15:44:50: [INFO]: response httpCode: 200 [org.dataone.service.util.ExceptionHandler:filterErrors:93]
dataone-indexer 20250610-15:44:50: [INFO]: ||||||||||||||||||| the head version is urn:uuid:da404cb4-1b7e-47ec-9e29-a552f5bb04cf for sid doi:10.15485/1898912 [org.dataone.cn.indexer.resourcemap.ForesiteResourceMap:isHeadVersion:214]
2025-06-10T15:44:50.236318555Z dataone-indexer 20250610-15:44:50: [INFO]: ||||||||||||||||||| the pid ess-dive-8e6738870d87db3-20221117T224253485 is NOT the head version for sid doi:10.15485/1898912 [org.dataone.cn.indexer.resourcemap.ForesiteResourceMap:isHeadVersion:221]
dataone-indexer 20250610-15:44:50: [INFO]: The id org.dataone.service.types.v1.Identifier@c6ea9113 is not the head of the serial id doi:10.15485/1898912 So, skip merge this one!!!!!!!!!!!!!!!!!!!!!!ess-dive-8e6738870d87db3-20221117T224253485 [
[... additional log lines omitted for brevity ...]

Additional Details

  • In total, at least 22 public and 3 private datasets are confirmed affected. We have not checked every dataset, so the actual number may be higher.
  • The problem appears to have started immediately after the Metacat 3.0 migration and reindex.
  • Reindexing resource maps completes without error but does not restore the missing links.
  • Impact: Affected datasets have broken file-to-metadata links in Solr and the user-facing catalog.
  • Important: The log output above shows the indexer is attempting to connect to the Member Node at metacat-hl:8080, but this service does not exist in our deployment. The correct service name is metacat:8080. This service name mismatch may be causing or contributing to the issue.
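
To confirm the name-resolution part of this from inside the indexer's environment, a small check like the following may help. It is a minimal sketch: the helper name `resolves` and the injectable `resolver` parameter are illustrative, and port 8080 is taken from the log output above.

```python
import socket


def resolves(host: str, port: int = 8080, resolver=socket.getaddrinfo) -> bool:
    """Return True if the host name resolves to at least one address."""
    try:
        return len(resolver(host, port)) > 0
    except socket.gaierror:
        # Raised when the name cannot be resolved,
        # e.g. "Name or service not known" as in the WARN log line.
        return False
```

Calling `resolves("metacat-hl")` and `resolves("metacat")` from the indexer container should show which of the two service names is actually reachable by name. This only tests DNS resolution, not whether the Member Node API answers on that host.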

Configuration/Deployment Note

How can we override or configure the indexer (or DataONE client libraries) to use the correct Member Node service name (metacat:8080) instead of the non-existent metacat-hl:8080? Please provide guidance on where this is set (environment variable, config file, or code) and the recommended approach to ensure proper resolution.

Hypothesis

This may be related to changes in how head versions and resource maps are processed in the indexer logic after the Metacat 3.0 upgrade. The indexer determines that the metadata PID is not the head version for its seriesId and therefore skips merging the resource map reference, leaving the resourceMap field empty in Solr.
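
The hypothesis can be modeled in a few lines. This is not the indexer's actual code, only a sketch inferred from the log messages: `should_merge_resource_map` and `resolve_head` are hypothetical names, and the rule that objects without a seriesId bypass the head check is an assumption based on the table above, where versions without a seriesId are not marked missing.

```python
from typing import Callable, Optional


def should_merge_resource_map(
    pid: str,
    series_id: Optional[str],
    resolve_head: Callable[[str], str],
) -> bool:
    """Hypothetical model of the skip decision seen in the log."""
    if series_id is None:
        # Assumption: no series means no head-version check applies.
        return True
    # Merge only when this PID is the head version of its series.
    return resolve_head(series_id) == pid


# Head PID reported for the series in the log excerpt (resolved via the CN).
CN_HEAD = "urn:uuid:da404cb4-1b7e-47ec-9e29-a552f5bb04cf"


def resolve_from_cn(series_id: str) -> str:
    return CN_HEAD


versions = [
    ("ess-dive-9be051bcebbb9e6-20250428T213324618", "doi:10.15485/1898912"),
    ("ess-dive-8e6738870d87db3-20221117T224253485", "doi:10.15485/1898912"),
    ("ess-dive-20f26dd7286247a-20221111T172542084", None),
]
for pid, sid in versions:
    print(pid, should_merge_resource_map(pid, sid, resolve_from_cn))
```

Under this model, every version that carries the seriesId is skipped because none of the listed PIDs equals the head PID returned by the CN, while versions without a seriesId are merged. That is consistent with the "missing" column, but the real indexer logic may differ.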

Request

Please investigate and advise on a fix or workaround. Let us know if additional details, logs, or dataset examples are needed.
