-
Notifications
You must be signed in to change notification settings - Fork 30
Switch to relying on package names instead of module names #1976
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Oriol Muñoz <[email protected]>
Signed-off-by: Oriol Muñoz <[email protected]>
Signed-off-by: Oriol Muñoz <[email protected]>
Signed-off-by: Oriol Muñoz <[email protected]>
Signed-off-by: Oriol Muñoz <[email protected]>
Signed-off-by: Oriol Muñoz <[email protected]>
Signed-off-by: Oriol Muñoz <[email protected]>
Signed-off-by: Oriol Muñoz <[email protected]>
Signed-off-by: Oriol Muñoz <[email protected]>
/cluster_test |
Deploy cluster test triggered for Commit 3b845ce5b206cfe9ff5ed5af7ef95ee9acbc514e in , please contact a Contributor to approve it in CircleCI: https://app.circleci.com/pipelines/github/DACH-NY/canton-network-internal/28163 |
Signed-off-by: Oriol Muñoz <[email protected]>
Signed-off-by: Oriol Muñoz <[email protected]>
Signed-off-by: Oriol Muñoz <[email protected]>
Signed-off-by: Oriol Muñoz <[email protected]>
Signed-off-by: Oriol Muñoz <[email protected]>
} | ||
} | ||
|
||
"prevent against ingestion of same (moduleName, entityName) with different package name - via interface fallback" in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test is the best summary for the reason of being of this PR
-- strictly not necessary, but lets us define the constraint as mandatory afterwards, | ||
-- and also makes index creation faster since there's no data to index | ||
truncate table acs_store_template cascade; | ||
truncate table user_wallet_acs_store cascade; | ||
truncate table external_party_wallet_acs_store cascade; | ||
truncate table validator_acs_store cascade; | ||
truncate table sv_acs_store cascade; | ||
truncate table dso_acs_store cascade; | ||
truncate table scan_acs_store cascade; | ||
truncate table splitwell_acs_store cascade; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double-check me that this is not a bad idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that all descriptors are bumped
* another list; that is why the contract ID by itself cannot be used by | ||
* itself as the source of the sort key. | ||
*/ | ||
def listAssignedContractsNotOnDomainN( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused except in a test (also deleted)
tableName: String, | ||
storeId: AcsStoreId, | ||
migrationId: Long, | ||
where: SQLActionBuilder, | ||
companion: C, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should've been the case all along, instead of having where ti_q_n = ?
on every query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, thanks for pulling through that simplification!
@@ -837,51 +837,6 @@ abstract class MultiDomainAcsStoreTest[ | |||
MultiDomainAcsStoreTest.generatedCoids.value.size should (be >= 900 and be <= 1000) | |||
} | |||
|
|||
"read assignment-mismatched contracts in a stable order" in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tested listAssignedContractsNotOnDomainN
, deleted
Signed-off-by: Oriol Muñoz <[email protected]>
/cluster_test |
Deploy cluster test triggered for Commit 1680a64b4cb0ace131dd19435f78cb0d69064cea in , please contact a Contributor to approve it in CircleCI: https://app.circleci.com/pipelines/github/DACH-NY/canton-network-internal/28663 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, thanks a lot!
Please run a performance test before merging so we can avoid surprises for long downtimes due to ACS backfilling.
tableName: String, | ||
storeId: AcsStoreId, | ||
migrationId: Long, | ||
where: SQLActionBuilder, | ||
companion: C, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, thanks for pulling through that simplification!
@@ -0,0 +1,362 @@ | |||
-- strictly not necessary, but lets us define the constraint as mandatory afterwards, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know what performance we get for a full ACS backfill? I'm a bit nervous about this given that the ACS queries are known to not perform well without pruning and there is no pruning atm on SV participants.
How about on CILR you take one SV (maybe best to use the SV runbook given that it resets daily anyway so if it goes wrong we can just reset again) manually force a reingestion by dropping the data from the table and last ingested offset?
If it does take too long, the easy short term option we have is to merge this into the 3.4 branch only. Afaict, there isn't an immediate need to have this be on 3.3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about on CILR you take one SV (maybe best to use the SV runbook given that it resets daily anyway so if it goes wrong we can just reset again) manually force a reingestion by dropping the data from the table and last ingested offset?
What is an "ACS backfill"? After truncating the ACS tables and bumping the store descriptor, the new ACS store will be initialized from the ACS endpoint of the ledger API, is that what you meant? Apart from the participant being slow to deliver the ACS, we are not batching SQL inserts when consuming the ACS
override def ingestAcs() = {
...
// TODO (#989): batch inserts
}
and we load the whole ACS into one in-memory structure:
private def ingestAcsAndInFlight(offset: Long) = {
...
// TODO(#863): stream contracts instead of ingesting them as a single Seq
so I agree that one test on CILR is a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is that what you meant?
yes
-- strictly not necessary, but lets us define the constraint as mandatory afterwards, | ||
-- and also makes index creation faster since there's no data to index | ||
truncate table store_last_ingested_offsets cascade; | ||
truncate table acs_store_template cascade; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This table should be empty, but it doesn't hurt to truncate it.
@@ -0,0 +1,362 @@ | |||
-- strictly not necessary, but lets us define the constraint as mandatory afterwards, | |||
-- and also makes index creation faster since there's no data to index | |||
truncate table store_last_ingested_offsets cascade; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you can truncate this table, it's also used by txlog stores, see https://github.com/hyperledger-labs/splice/blob/main/apps/common/src/main/scala/org/lfdecentralizedtrust/splice/store/db/DbMultiDomainAcsStore.scala#L1080-L1086.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, you're right... I technically don't have to truncate it since I'm bumping the ACS descriptor, though then the "last ingested offset" for those would be incorrect
Since anyway I need to figure out how to safely* delete things on the SV runbook, I'll do the same here
*in the SV runbook, the descriptors don't change, so I can't leave last_ingested_offsets lingering after truncating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/DACH-NY/canton-network-node/pull/18149
They will always be identical in practice, except when one of the stores is reset by changing the descriptor.
yeah that makes it hard for me to tell what I'm supposed to delete from the last_ingested 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually nvm - I can just use the version numbers...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
txLogStoreDescriptor = StoreDescriptor(
version = 1,
name = "DbScanStore",
party = key.dsoParty,
participant = participantId,
key = Map(
"dsoParty" -> key.dsoParty.toProtoPrimitive
),
)
txLogStoreDescriptor = StoreDescriptor(
version = 2,
name = "DbUserWalletStore",
party = key.endUserParty,
participant = participantId,
key = Map(
"endUserParty" -> key.endUserParty.toProtoPrimitive,
"validatorParty" -> key.validatorParty.toProtoPrimitive,
"dsoParty" -> key.dsoParty.toProtoPrimitive,
),
),
I believe I can:
delete
from store_last_ingested_offsets
where store_id in (select id
from store_descriptors
where descriptor ->> 'name' IN
-- these are ACS only, so we can delete everything
('DbSplitwellStore', 'DbSvSvStore', 'DbValidatorStore', 'DbExternalPartyWalletStore')
-- these have txlog, preserve it. Version matches that of the txlog descriptor
OR (descriptor ->> 'name' = 'DbUserWalletStore' AND descriptor ->> 'version' != 2)
OR (descriptor ->> 'name' = 'DbScanStore' AND descriptor ->> 'version' != 1));
but I need an extra pair of eyes before i do anything stupid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then again, that invalid package_name won't be used after bumping the descriptor (old data, unused, whenWeGetToIt ™️ deleted), so maybe that's not so bad?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how to handle package_name in that case
That is not a column in store_last_ingested_offsets
? What I meant is you can truncate the acs tables, but not the store_last_ingested_offsets
table. Then you can do whatever you want to the schema of the acs tables, and will only end up with one garbage row in store_descriptors
and a few garbage rows in store_last_ingested_offsets
(one per migration).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and will only end up with one garbage row in store_descriptors
it's not just garbage, but invalid, as the offset is wrong and the ACS won't be reingested if we rolled back to it - it's not clear to me that that's fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah you're worried that the old data in the database would be corrupt, even if that data is not used? That's only a problem if we want to roll back the change to store descriptors relatively quickly, in order to use the old data before it becomes so outdated that it's useless? And since we're truncating the acs tables, there's no point doing a rollback, we'd have to restore from backup or do another store version bump, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy pasting, this should be reasonable to do:
- truncate all ACS_store tables, don't touch any other table
- bump descriptors
- indexes will be quick to create cause I just truncated
...allet/src/main/scala/org/lfdecentralizedtrust/splice/wallet/store/db/DbUserWalletStore.scala
Outdated
Show resolved
Hide resolved
Signed-off-by: Oriol Muñoz <[email protected]>
-- where tablename like '%acs_store%' | ||
-- and indexdef like '%template_id_qualified_name%'; | ||
DROP INDEX acs_store_template_sid_mid_tid_ce; | ||
CREATE INDEX acs_store_template_sid_mid_pn_tid_ce ON acs_store_template (store_id, migration_id, package_name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a lot of indexes being created, did you time it on CILR? If it's more than a minute we should probably mention it in the release notes. There's also SqlIndexInitializationTrigger for creating expensive indexes asynchronously, but having the apps start without fast ACS queries is probably asking for trouble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
took less than a minute to create all indexes for a DSO store copy, which is the biggest one according to https://grafana.cilr.global.canton.network.digitalasset.com/d/dduss3xr5or28c/splice-store-acs-size?orgId=1&from=now-1h&to=now&timezone=UTC&var-namespace=sv-1&var-store_name=$__all
(total of 300k rows across all store ids)
a lot less, if I end up managing to truncate the tables
@@ -326,12 +356,22 @@ object AcsQueries { | |||
companionClass: ContractCompanion[C, TCId, T], | |||
decoder: TemplateJsonDecoder, | |||
): Contract[TCId, T] = { | |||
// safety check: if the templates don't match, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// safety check: if the templates don't match, | |
// safety check: if the packages don't match, |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// safety check: if the PackageQualifiedNames don't match,
is most correct, will fix
Fixes #829
Pull Request Checklist
Cluster Testing
/cluster_test
on this PR to request it, and ping someone with access to the DA-internal system to approve it./hdm_test
on this PR to request it, and ping someone with access to the DA-internal system to approve it.PR Guidelines
Fixes #n
, and mention issues worked on using#n
Merge Guidelines