Skip to content

Conversation

OriolMunoz-da
Copy link
Contributor

@OriolMunoz-da OriolMunoz-da commented Aug 20, 2025

Fixes #829

Pull Request Checklist

Cluster Testing

  • If a cluster test is required, comment /cluster_test on this PR to request it, and ping someone with access to the DA-internal system to approve it.
  • If a hard-migration test is required (from the latest release), comment /hdm_test on this PR to request it, and ping someone with access to the DA-internal system to approve it.

PR Guidelines

  • Include any change that might be observable by our partners or affect their deployment in the release notes.
  • Specify fixed issues with Fixes #n, and mention issues worked on using #n
  • Include a screenshot for frontend-related PRs - see README or use your favorite screenshot tool

Merge Guidelines

  • Make the git commit message look sensible when squash-merging on GitHub (most likely: just copy your PR description).

@OriolMunoz-da
Copy link
Contributor Author

/cluster_test

Copy link

Deploy cluster test triggered for Commit 3b845ce5b206cfe9ff5ed5af7ef95ee9acbc514e in , please contact a Contributor to approve it in CircleCI: https://app.circleci.com/pipelines/github/DACH-NY/canton-network-internal/28163

@OriolMunoz-da OriolMunoz-da changed the title Oriol/packagename stores interface fallback Switch to relying on package names instead of module names Aug 25, 2025
}
}

"prevent against ingestion of same (moduleName, entityName) with different package name - via interface fallback" in {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is the best summary for the reason of being of this PR

Comment on lines 1 to 10
-- strictly not necessary, but lets us define the constraint as mandatory afterwards,
-- and also makes index creation faster since there's no data to index
truncate table acs_store_template cascade;
truncate table user_wallet_acs_store cascade;
truncate table external_party_wallet_acs_store cascade;
truncate table validator_acs_store cascade;
truncate table sv_acs_store cascade;
truncate table dso_acs_store cascade;
truncate table scan_acs_store cascade;
truncate table splitwell_acs_store cascade;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double-check me that this is not a bad idea

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that all descriptors are bumped

* another list; that is why the contract ID by itself cannot be used by
* itself as the source of the sort key.
*/
def listAssignedContractsNotOnDomainN(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused except in a test (also deleted)

tableName: String,
storeId: AcsStoreId,
migrationId: Long,
where: SQLActionBuilder,
companion: C,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should've been the case all along, instead of having where ti_q_n = ? on every query

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, thanks for pulling through that simplification!

@@ -837,51 +837,6 @@ abstract class MultiDomainAcsStoreTest[
MultiDomainAcsStoreTest.generatedCoids.value.size should (be >= 900 and be <= 1000)
}

"read assignment-mismatched contracts in a stable order" in {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested listAssignedContractsNotOnDomainN, deleted

@OriolMunoz-da
Copy link
Contributor Author

/cluster_test

@OriolMunoz-da OriolMunoz-da marked this pull request as ready for review August 25, 2025 15:31
Copy link

Deploy cluster test triggered for Commit 1680a64b4cb0ace131dd19435f78cb0d69064cea in , please contact a Contributor to approve it in CircleCI: https://app.circleci.com/pipelines/github/DACH-NY/canton-network-internal/28663

Copy link
Contributor

@moritzkiefer-da moritzkiefer-da left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, thanks a lot!

Please run a performance test before merging so we can avoid surprises for long downtimes due to ACS backfilling.

tableName: String,
storeId: AcsStoreId,
migrationId: Long,
where: SQLActionBuilder,
companion: C,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, thanks for pulling through that simplification!

@@ -0,0 +1,362 @@
-- strictly not necessary, but lets us define the constraint as mandatory afterwards,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know what performance we get for a full ACS backfill? I'm a bit nervous about this given that the ACS queries are known to not perform well without pruning and there is no pruning atm on SV participants.

How about on CILR you take one SV (maybe best to use the SV runbook given that it resets daily anyway so if it goes wrong we can just reset again) manually force a reingestion by dropping the data from the table and last ingested offset?

If it does take too long, the easy short term option we have is to merge this into the 3.4 branch only. Afaict, there isn't an immediate need to have this be on 3.3.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about on CILR you take one SV (maybe best to use the SV runbook given that it resets daily anyway so if it goes wrong we can just reset again) manually force a reingestion by dropping the data from the table and last ingested offset?

What is an "ACS backfill"? After truncating the ACS tables and bumping the store descriptor, the new ACS store will be initialized from the ACS endpoint of the ledger API, is that what you meant? Apart from the participant being slow to deliver the ACS, we are not batching SQL inserts when consuming the ACS

override def ingestAcs() = {
    ...
    // TODO (#989): batch inserts
}

and we load the whole ACS into one in-memory structure:

private def ingestAcsAndInFlight(offset: Long) = {
    ...
    // TODO(#863): stream contracts instead of ingesting them as a single Seq

so I agree that one test on CILR is a good idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that what you meant?

yes

-- strictly not necessary, but lets us define the constraint as mandatory afterwards,
-- and also makes index creation faster since there's no data to index
truncate table store_last_ingested_offsets cascade;
truncate table acs_store_template cascade;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This table should be empty, but it doesn't hurt to truncate it.

@@ -0,0 +1,362 @@
-- strictly not necessary, but lets us define the constraint as mandatory afterwards,
-- and also makes index creation faster since there's no data to index
truncate table store_last_ingested_offsets cascade;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, you're right... I technically don't have to truncate it since I'm bumping the ACS descriptor, though then the "last ingested offset" for those would be incorrect
Since anyway I need to figure out how to safely* delete things on the SV runbook, I'll do the same here

*in the SV runbook, the descriptors don't change, so I can't leave last_ingested_offsets lingering after truncating

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/DACH-NY/canton-network-node/pull/18149

They will always be identical in practice, except when one of the stores is reset by changing the descriptor.

yeah that makes it hard for me to tell what I'm supposed to delete from the last_ingested 😢

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually nvm - I can just use the version numbers...

Copy link
Contributor Author

@OriolMunoz-da OriolMunoz-da Aug 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, given this:
image
and this:

txLogStoreDescriptor = StoreDescriptor(
        version = 1,
        name = "DbScanStore",
        party = key.dsoParty,
        participant = participantId,
        key = Map(
          "dsoParty" -> key.dsoParty.toProtoPrimitive
        ),
      )
txLogStoreDescriptor = StoreDescriptor(
        version = 2,
        name = "DbUserWalletStore",
        party = key.endUserParty,
        participant = participantId,
        key = Map(
          "endUserParty" -> key.endUserParty.toProtoPrimitive,
          "validatorParty" -> key.validatorParty.toProtoPrimitive,
          "dsoParty" -> key.dsoParty.toProtoPrimitive,
        ),
      ),

I believe I can:

delete
from store_last_ingested_offsets
where store_id in (select id
                   from store_descriptors
                   where descriptor ->> 'name' IN
                         -- these are ACS only, so we can delete everything
                         ('DbSplitwellStore', 'DbSvSvStore', 'DbValidatorStore', 'DbExternalPartyWalletStore')
                      -- these have txlog, preserve it. Version matches that of the txlog descriptor
                      OR (descriptor ->> 'name' = 'DbUserWalletStore' AND descriptor ->> 'version' != 2)
                      OR (descriptor ->> 'name' = 'DbScanStore' AND descriptor ->> 'version' != 1));

but I need an extra pair of eyes before i do anything stupid

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then again, that invalid package_name won't be used after bumping the descriptor (old data, unused, whenWeGetToIt ™️ deleted), so maybe that's not so bad?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how to handle package_name in that case

That is not a column in store_last_ingested_offsets? What I meant is you can truncate the acs tables, but not the store_last_ingested_offsets table. Then you can do whatever you want to the schema of the acs tables, and will only end up with one garbage row in store_descriptors and a few garbage rows in store_last_ingested_offsets (one per migration).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and will only end up with one garbage row in store_descriptors

it's not just garbage, but invalid, as the offset is wrong and the ACS won't be reingested if we rolled back to it - it's not clear to me that that's fine

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah you're worried that the old data in the database would be corrupt, even if that data is not used? That's only a problem if we want to roll back the change to store descriptors relatively quickly, in order to use the old data before it becomes so outdated that it's useless? And since we're truncating the acs tables, there's no point doing a rollback, we'd have to restore from backup or do another store version bump, no?

Copy link
Contributor Author

@OriolMunoz-da OriolMunoz-da Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy pasting, this should be reasonable to do:

  • truncate all ACS_store tables, don't touch any other table
  • bump descriptors
  • indexes will be quick to create cause I just truncated

-- where tablename like '%acs_store%'
-- and indexdef like '%template_id_qualified_name%';
DROP INDEX acs_store_template_sid_mid_tid_ce;
CREATE INDEX acs_store_template_sid_mid_pn_tid_ce ON acs_store_template (store_id, migration_id, package_name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot of indexes being created, did you time it on CILR? If it's more than a minute we should probably mention it in the release notes. There's also SqlIndexInitializationTrigger for creating expensive indexes asynchronously, but having the apps start without fast ACS queries is probably asking for trouble.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

took less than a minute to create all indexes for a DSO store copy, which is the biggest one according to https://grafana.cilr.global.canton.network.digitalasset.com/d/dduss3xr5or28c/splice-store-acs-size?orgId=1&from=now-1h&to=now&timezone=UTC&var-namespace=sv-1&var-store_name=$__all
(total of 300k rows across all store ids)

a lot less, if I end up managing to truncate the tables

@@ -326,12 +356,22 @@ object AcsQueries {
companionClass: ContractCompanion[C, TCId, T],
decoder: TemplateJsonDecoder,
): Contract[TCId, T] = {
// safety check: if the templates don't match,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// safety check: if the templates don't match,
// safety check: if the packages don't match,

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// safety check: if the PackageQualifiedNames don't match,

is most correct, will fix

@OriolMunoz-da
Copy link
Contributor Author

timings for ACS reingestion:

a bit more than 2m for scan, logs:
image

and SV app doesn't reingest, see #2041

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switch to relying on package names instead of module names
3 participants