From fee65f529f888fd36c860961486b13d233c912f0 Mon Sep 17 00:00:00 2001 From: Martin Florian Date: Wed, 27 Aug 2025 16:02:19 +0000 Subject: [PATCH 01/13] Add docs about wasted traffic Fixes #576 [static] Signed-off-by: Martin Florian --- docs/src/deployment/observability/metrics.rst | 2 ++ docs/src/deployment/traffic.rst | 34 +++++++++++++++++++ docs/src/release_notes.rst | 3 ++ 3 files changed, 39 insertions(+) diff --git a/docs/src/deployment/observability/metrics.rst b/docs/src/deployment/observability/metrics.rst index 1720231359..3cfd02bce2 100644 --- a/docs/src/deployment/observability/metrics.rst +++ b/docs/src/deployment/observability/metrics.rst @@ -64,6 +64,8 @@ Configuring a docker compose deployment to enable metrics When using docker compose for the deployment, the metrics are enabled by default. These can be accessed at `http://validator.localhost/metrics` for the validator app and at `http://participant.localhost/metrics` for the participant. +.. _metrics_grafana_dashboards: + Grafana Dashboards ++++++++++++++++++ diff --git a/docs/src/deployment/traffic.rst b/docs/src/deployment/traffic.rst index 0443d00b5d..f458d74bdf 100644 --- a/docs/src/deployment/traffic.rst +++ b/docs/src/deployment/traffic.rst @@ -171,3 +171,37 @@ the validator app will For configuring the built-in top-up automation, please refer to the :ref:`validator deployment guide `. Configuring alternative methods for buying traffic, e.g., using third-party services, exceeds the scope of this documentation. + +.. _traffic_wasted: + +Wasted traffic +-------------- + +`Wasted traffic` is defined as synchronizer events that have been sequenced but will not be delivered to their recipients. +Wasted traffic is problematic for validators because of traffic fees: +it means that :ref:`traffic ` has been charged for a message that was ultimately not delivered. 
+Not all failed submissions result in wasted traffic: +wasted traffic only occurs when a synchronizer event is rejected after sequencing but before delivery. + +Validator perspective ++++++++++++++++++++++ + +Validator operators are encouraged to investigate failed submissions eagerly to avoid systemic causes for wasted traffic that are due +to their individual configuration and/or the specific applications using their validators. +The Splice distribution contains a :ref:`Grafana dashboard ` about `Synchronizer Fees (validator view)` that can be helpful in addition to inspecting logs; +see, for example, the `Rejected Event Traffic` panel there. + +SV perspective +++++++++++++++ + +SV operators are encouraged to monitor wasted traffic across all synchronizer members, +as reported for example by sequencer :ref:`metrics `, +to avoid cases where misconfiguration incurs excessive monetary losses for validators. +The Splice distribution contains a :ref:`Grafana dashboard ` about `Synchronizer Fees (SV view)` that can be helpful, +as well as an alert definition that focuses on validator participants. + +Note that wasted traffic is less relevant for SVs themselves as SV components have unlimited traffic. +Note also that SV mediators and sequencers waste traffic as part of their regular operation; +they frequently use aggregate submissions where all composite submission requests beyond the aggregation threshold get discarded. +All that said, should an SV component suddenly exhibit a significant increase in wasted traffic, +this likely points to an actual issue that should be investigated. diff --git a/docs/src/release_notes.rst b/docs/src/release_notes.rst index 88e662b1c1..daf17c73e3 100644 --- a/docs/src/release_notes.rst +++ b/docs/src/release_notes.rst @@ -36,6 +36,9 @@ Upcoming walletPayments 0.1.13 ================== ======= +- Docs + + - Add documentation about :ref:`Wasted traffic `. 
0.4.12 ------ From a456a50d72f41533b6722fda02f87ea9fa44f72f Mon Sep 17 00:00:00 2001 From: Martin Florian Date: Thu, 28 Aug 2025 09:07:33 +0000 Subject: [PATCH 02/13] Script and integration test [ci] Signed-off-by: Martin Florian --- .../examples/recovery/manual-identity-dump.sc | 16 +++++ ...aintextIdentitiesDumpIntegrationTest.scala | 72 +++++++++++++++++-- .../validator_disaster_recovery.rst | 15 +++- 3 files changed, 93 insertions(+), 10 deletions(-) create mode 100644 apps/app/src/pack/examples/recovery/manual-identity-dump.sc diff --git a/apps/app/src/pack/examples/recovery/manual-identity-dump.sc b/apps/app/src/pack/examples/recovery/manual-identity-dump.sc new file mode 100644 index 0000000000..2076ed3137 --- /dev/null +++ b/apps/app/src/pack/examples/recovery/manual-identity-dump.sc @@ -0,0 +1,16 @@ +import com.digitalasset.canton.topology.transaction.TopologyMapping +import com.digitalasset.canton.topology.store.TimeQuery +import java.util.Base64 + +val id = participant.id.toProtoPrimitive + +val keys = "[" + participant.keys.secret.list().filter(k => k.name.get.unwrap != "cometbft-governance-keys").map(key => s"{\"keyPair\": \"${Base64.getEncoder.encodeToString( participant.keys.secret.download(key.publicKey.fingerprint).toByteArray)}\", \"name\": \"${key.name.get.unwrap}\"}") .mkString(",") + "]" + +val authorizedStoreSnapshot = Base64.getEncoder.encodeToString(participant.topology.transactions.export_topology_snapshot(timeQuery = TimeQuery.Range(from = None, until = None), filterMappings = Seq(TopologyMapping.Code.NamespaceDelegation, TopologyMapping.Code.OwnerToKeyMapping, TopologyMapping.Code.VettedPackages), filterNamespace = participant.id.namespace.toProtoPrimitive).toByteArray) + +val combinedJson = s"""{ "id" : "$id", "keys" : $keys, "authorizedStoreSnapshot" : "$authorizedStoreSnapshot" }""" + +// Write to file +import java.nio.file.{Files, Paths} +val dumpPath = Paths.get("identity-dump.json") +Files.writeString(dumpPath, combinedJson) diff 
--git a/apps/app/src/test/scala/org/lfdecentralizedtrust/splice/integration/tests/ParticipantPlaintextIdentitiesDumpIntegrationTest.scala b/apps/app/src/test/scala/org/lfdecentralizedtrust/splice/integration/tests/ParticipantPlaintextIdentitiesDumpIntegrationTest.scala index c13b9978f8..d7d93904bb 100644 --- a/apps/app/src/test/scala/org/lfdecentralizedtrust/splice/integration/tests/ParticipantPlaintextIdentitiesDumpIntegrationTest.scala +++ b/apps/app/src/test/scala/org/lfdecentralizedtrust/splice/integration/tests/ParticipantPlaintextIdentitiesDumpIntegrationTest.scala @@ -1,10 +1,15 @@ package org.lfdecentralizedtrust.splice.integration.tests import better.files.File +import com.digitalasset.canton.ConsoleScriptRunner import com.digitalasset.canton.crypto.{CryptoKeyPair, Fingerprint} import com.digitalasset.canton.topology.ParticipantId import com.google.protobuf.ByteString -import org.lfdecentralizedtrust.splice.config.{ConfigTransforms, ParticipantBootstrapDumpConfig} +import org.lfdecentralizedtrust.splice.config.{ + ConfigTransforms, + ParticipantBootstrapDumpConfig, + SpliceConfig, +} import org.lfdecentralizedtrust.splice.config.ConfigTransforms.{ ensureNovelDamlNames, selfSignedTokenAuthSourceTransform, @@ -12,14 +17,16 @@ import org.lfdecentralizedtrust.splice.config.ConfigTransforms.{ updateAllSvAppConfigs, updateAllValidatorConfigs, } -import org.lfdecentralizedtrust.splice.config.SpliceConfig import org.lfdecentralizedtrust.splice.identities.NodeIdentitiesDump import org.lfdecentralizedtrust.splice.integration.EnvironmentDefinition -import org.lfdecentralizedtrust.splice.integration.tests.SpliceTests.IntegrationTest +import org.lfdecentralizedtrust.splice.integration.tests.SpliceTests.{ + IntegrationTest, + SpliceTestConsoleEnvironment, +} import org.lfdecentralizedtrust.splice.util.StandaloneCanton import monocle.macros.syntax.lens.* -import java.nio.file.{Path, Paths} +import java.nio.file.{Files, Path, Paths} 
@org.lfdecentralizedtrust.splice.util.scalatesttags.NoDamlCompatibilityCheck class ParticipantPlaintextIdentitiesIntegrationTest extends IntegrationTest with StandaloneCanton { @@ -97,9 +104,10 @@ class ParticipantPlaintextIdentitiesIntegrationTest extends IntegrationTest with implicit env => startAllSync(sv1Backend, sv1ScanBackend, sv1ValidatorBackend) - val svParticipantDump = clue("Getting participant identities dump from SV1") { - sv1ValidatorBackend.dumpParticipantIdentities() - } + val svParticipantDump = + clue("Getting participant identities dump from SV1 via validator API") { + sv1ValidatorBackend.dumpParticipantIdentities() + } clue("Checking exported key names for SV1") { val keyNames = svParticipantDump.keys.map(_.name.value) @@ -109,6 +117,29 @@ class ParticipantPlaintextIdentitiesIntegrationTest extends IntegrationTest with keyNames should contain(s"$prefix-encryption") } + val svParticipantDumpManual = clue( + "Getting participant identities dump from SV1 manually" + ) { + val dumpPath = Files.createTempFile("manual-participant-dump", ".json") + manuallyDumpParticipantIdentities( + "sv1Validator.participantClient", + dumpPath, + ) + NodeIdentitiesDump + .fromJsonFile( + dumpPath, + ParticipantId.tryFromProtoPrimitive, + ) + .value + } + + clue("Manually dumped identities match the ones dumped via the API") { + svParticipantDumpManual.id shouldBe svParticipantDump.id + svParticipantDumpManual.keys should contain theSameElementsAs svParticipantDump.keys + svParticipantDumpManual.authorizedStoreSnapshot shouldBe svParticipantDump.authorizedStoreSnapshot + // we don't care about the version + } + withCanton( Seq( testResourcesPath / "standalone-participant-extra.conf", @@ -143,6 +174,33 @@ class ParticipantPlaintextIdentitiesIntegrationTest extends IntegrationTest with } } + private def manuallyDumpParticipantIdentities( + participantHandle: String, + dumpPath: Path, + )(implicit env: SpliceTestConsoleEnvironment): Unit = { + val originalDumpScriptPath = 
File("apps/app/src/pack/examples/recovery/manual-identity-dump.sc") + + // the original script assumes that the participant is called `participant` + // and that the dump will be written to `identity-dump.json`; we need to adjust both + val modifiedDumpScriptPath = Files.createTempFile("modified-manual-identity-dump", ".sc") + + clue("Modifying dump script") { + val originalScript = originalDumpScriptPath.contentAsString + val modifiedScript = originalScript + .replaceAll("participant.", s"$participantHandle.") + .replaceAll("identity-dump.json", dumpPath.toAbsolutePath.toString) + File(modifiedDumpScriptPath).writeText(modifiedScript) + } + + clue("Running modified dump script") { + ConsoleScriptRunner.run( + env.environment, + File(modifiedDumpScriptPath).toJava, + logger, + ) + } + } + // TODO(tech-debt) Consider removing this method in favor of making `useSelfSignedTokensForLedgerApiAuth` take an `ignore` parameter private def useSelfSignedTokensForLongRunningLedgerApiAuth( secret: String, diff --git a/docs/src/validator_operator/validator_disaster_recovery.rst b/docs/src/validator_operator/validator_disaster_recovery.rst index a838185ba8..9008c1fc31 100644 --- a/docs/src/validator_operator/validator_disaster_recovery.rst +++ b/docs/src/validator_operator/validator_disaster_recovery.rst @@ -85,15 +85,17 @@ If you are running a docker-compose deployment, you can restore the Postgres dat .. _validator_reonboard: -Re-onboard a validator and recover balances of all users it hosts -+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +Recovery from an identity dump: Re-onboard a validator and recover balances of all users it hosts ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ In the case of a catastrophic failure of the validator node, some data owned by the validator and users it hosts can be recovered from the SVs. This data includes Canton Coin balance and CNS entries. 
This is achieved -by deploying a new validator node with control over the original validator's participant keys. +by deploying a new validator node with control over the original validator's namespace key. In order to be able to recover the data, you must have a backup of the identities of the validator, as created in the :ref:`Backup of Node Identities ` section. +In case you do not have such a backup but instead have a backup of the validator participant's database, +you can :ref:`assemble an identity dump manually `. To recover from the identities backup, we deploy a new validator with some special configuration described below. Refer to either the @@ -151,6 +153,13 @@ where ```` is the path to the file containing the nod ```` is a new identifier to be used for the new participant. It must be one never used before. Note that in subsequent restarts of the validator, you should keep providing ``-P`` with the same ````. +.. _validator_manual_dump: + +Obtaining an Identity Dump from a Participant Database Backup +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +TODO + Limitations and Troubleshooting ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ From 1cea87d10235b9e964028aaa5a8ac581a77af9f0 Mon Sep 17 00:00:00 2001 From: Martin Florian Date: Thu, 28 Aug 2025 12:28:59 +0000 Subject: [PATCH 03/13] Manual dump instructions [static] Signed-off-by: Martin Florian --- .../validator_disaster_recovery.rst | 22 ++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/docs/src/validator_operator/validator_disaster_recovery.rst b/docs/src/validator_operator/validator_disaster_recovery.rst index 9008c1fc31..bcbb347d86 100644 --- a/docs/src/validator_operator/validator_disaster_recovery.rst +++ b/docs/src/validator_operator/validator_disaster_recovery.rst @@ -158,7 +158,27 @@ Note that in subsequent restarts of the validator, you should keep providing ``- Obtaining an Identity Dump from a Participant Database Backup 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -TODO +In case you do not have a usable identities backup but instead have a backup of the validator participant's database, +you can assemble an identity dump manually. +Here is one possible way to do so: + +#. Restore the database backup into a temporary postgres instance and deploy a temporary participant against that instance. + + * See the section on :ref:`restoring a validator from backups ` for pointers that match your deployment model. + * You only need to restore and scale up the participant, i.e., you can ignore the validator app and its database. + * In case the restored participant shuts down immediately due to failures, add the following :ref:`additional configuration `: + + .. code-block:: yaml + + additionalEnvVars: + - name: ADDITIONAL_CONFIG_EXIT_ON_FATAL_FAILURES + value: canton.parameters.exit-on-fatal-failures = false + +#. Open a :ref:`Canton console ` to the temporary participant. +#. Run below commands in the opened console. This will store the identity dump into a *local* file + (relative to the local directory from which you opened the console) called ``identity-dump.json``. + + .. 
literalinclude:: ../../../apps/app/src/pack/examples/recovery/manual-identity-dump.sc Limitations and Troubleshooting ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ From 8795697e028776303c87e0b4d0093152b7111fd1 Mon Sep 17 00:00:00 2001 From: Martin Florian Date: Thu, 28 Aug 2025 13:07:35 +0000 Subject: [PATCH 04/13] stuff Signed-off-by: Martin Florian --- ...tity-dump.sc => manual-identities-dump.sc} | 2 +- ...aintextIdentitiesDumpIntegrationTest.scala | 8 +-- docs/src/common/backup_suggestion.rst | 4 +- docs/src/sv_operator/sv_restore.rst | 2 +- .../validator_disaster_recovery.rst | 50 +++++++++++-------- 5 files changed, 38 insertions(+), 28 deletions(-) rename apps/app/src/pack/examples/recovery/{manual-identity-dump.sc => manual-identities-dump.sc} (95%) diff --git a/apps/app/src/pack/examples/recovery/manual-identity-dump.sc b/apps/app/src/pack/examples/recovery/manual-identities-dump.sc similarity index 95% rename from apps/app/src/pack/examples/recovery/manual-identity-dump.sc rename to apps/app/src/pack/examples/recovery/manual-identities-dump.sc index 2076ed3137..05a70ea615 100644 --- a/apps/app/src/pack/examples/recovery/manual-identity-dump.sc +++ b/apps/app/src/pack/examples/recovery/manual-identities-dump.sc @@ -12,5 +12,5 @@ val combinedJson = s"""{ "id" : "$id", "keys" : $keys, "authorizedStoreSnapshot" // Write to file import java.nio.file.{Files, Paths} -val dumpPath = Paths.get("identity-dump.json") +val dumpPath = Paths.get("identities-dump.json") Files.writeString(dumpPath, combinedJson) diff --git a/apps/app/src/test/scala/org/lfdecentralizedtrust/splice/integration/tests/ParticipantPlaintextIdentitiesDumpIntegrationTest.scala b/apps/app/src/test/scala/org/lfdecentralizedtrust/splice/integration/tests/ParticipantPlaintextIdentitiesDumpIntegrationTest.scala index d7d93904bb..069c2d92d3 100644 --- a/apps/app/src/test/scala/org/lfdecentralizedtrust/splice/integration/tests/ParticipantPlaintextIdentitiesDumpIntegrationTest.scala +++ 
b/apps/app/src/test/scala/org/lfdecentralizedtrust/splice/integration/tests/ParticipantPlaintextIdentitiesDumpIntegrationTest.scala @@ -178,17 +178,17 @@ class ParticipantPlaintextIdentitiesIntegrationTest extends IntegrationTest with participantHandle: String, dumpPath: Path, )(implicit env: SpliceTestConsoleEnvironment): Unit = { - val originalDumpScriptPath = File("apps/app/src/pack/examples/recovery/manual-identity-dump.sc") + val originalDumpScriptPath = File("apps/app/src/pack/examples/recovery/manual-identities-dump.sc") // the original script assumes that the participant is called `participant` - // and that the dump will be written to `identity-dump.json`; we need to adjust both - val modifiedDumpScriptPath = Files.createTempFile("modified-manual-identity-dump", ".sc") + // and that the dump will be written to `identities-dump.json`; we need to adjust both + val modifiedDumpScriptPath = Files.createTempFile("modified-manual-identities-dump", ".sc") clue("Modifying dump script") { val originalScript = originalDumpScriptPath.contentAsString val modifiedScript = originalScript .replaceAll("participant.", s"$participantHandle.") - .replaceAll("identity-dump.json", dumpPath.toAbsolutePath.toString) + .replaceAll("identities-dump.json", dumpPath.toAbsolutePath.toString) File(modifiedDumpScriptPath).writeText(modifiedScript) } diff --git a/docs/src/common/backup_suggestion.rst b/docs/src/common/backup_suggestion.rst index 3c9f52c401..4f77e7e5e5 100644 --- a/docs/src/common/backup_suggestion.rst +++ b/docs/src/common/backup_suggestion.rst @@ -7,8 +7,8 @@ **If you lose your keys, you lose access to your coins**. While regular backups are not necessary to run your node, they are **strongly** recommended for recovery purposes. - You should regularly back up all databases in your deployment and ensure you always have an up-to-date identity backup. - Super Validators retain the information necessary to allow you to recover your Canton Coin from an identity backup. 
+ You should regularly back up all databases in your deployment and ensure you always have an up-to-date identities backup. + Super Validators retain the information necessary to allow you to recover your Canton Coin from an identities backup. On the other hand, Super Validators **do not** retain transaction details from applications they are not involved in. This means that if you have other applications installed, the Super Validators cannot help you recover data from those apps; you can only rely on your own backups. diff --git a/docs/src/sv_operator/sv_restore.rst b/docs/src/sv_operator/sv_restore.rst index 8e0cca6a5e..230e2a1876 100644 --- a/docs/src/sv_operator/sv_restore.rst +++ b/docs/src/sv_operator/sv_restore.rst @@ -15,7 +15,7 @@ There are three ways to recover from disasters: network is still healthy, a :ref:`Restore from backup ` is usually sufficient. -#. If a full backup is unavailable but an identity backup has been +#. If a full backup is unavailable but an identities backup has been created, the balance of the SV can be :ref:`recovered ` on a dedicated validator but the SV must be onboarded as a separate node. diff --git a/docs/src/validator_operator/validator_disaster_recovery.rst b/docs/src/validator_operator/validator_disaster_recovery.rst index bcbb347d86..2350d5691d 100644 --- a/docs/src/validator_operator/validator_disaster_recovery.rst +++ b/docs/src/validator_operator/validator_disaster_recovery.rst @@ -15,7 +15,7 @@ There are three ways to recover from disasters: network is still healthy, a :ref:`Restore from backup ` is usually sufficient. -#. If a full backup is unavailable but an identity backup has been +#. If a full backup is unavailable but an identities backup has been created, the balance of the validator can be :ref:`recovered ` on a new validator. @@ -85,27 +85,34 @@ If you are running a docker-compose deployment, you can restore the Postgres dat .. 
_validator_reonboard: -Recovery from an identity dump: Re-onboard a validator and recover balances of all users it hosts -+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +Recovery from an identities backup: Re-onboard a validator and recover balances of all users it hosts ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ In the case of a catastrophic failure of the validator node, some data owned by the validator and users it hosts can be recovered from the SVs. This data includes Canton Coin balance and CNS entries. This is achieved by deploying a new validator node with control over the original validator's namespace key. +The namespace key must be provided via an identities backup file. +It is used by the new validator for migrating the parties hosted on the original validator to the new validator. +SVs assist this process by providing information about all contracts known to them that the migrated parties are stakeholders of. -In order to be able to recover the data, you must have a backup of the identities of the -validator, as created in the :ref:`Backup of Node Identities ` section. +The following steps assume that you have a backup of the identities of the +validator, as created in the :ref:`Backup of Node Identities ` section. In case you do not have such a backup but instead have a backup of the validator participant's database, -you can :ref:`assemble an identity dump manually `. +you can :ref:`assemble an identities backup manually `. To recover from the identities backup, we deploy a new validator with some special configuration described below. Refer to either the -docker-compose deployment instructions or the kubernetes instructions +:ref:`docker-compose deployment instructions ` +or the +:ref:`kubernetes instructions ` depending on which setup you chose. 
Once the new validator is up and running, you should be able to login as the administrator and see its balance. Other users hosted on the validator would need to re-onboard, but their coin balance and CNS entries should be recovered. +In case of issues, please consult the :ref:`troubleshooting ` section below. + .. warning:: This process preserves all party IDs and all contracts shared with the DSO party. This means that you *must* keep the same validator party hint and you do not need a new @@ -113,6 +120,8 @@ coin balance and CNS entries should be recovered. new onboarding secret, double check your configuration instead of requesting a new secret. +.. _validator_reonboard_k8s: + Kubernetes Deployment ^^^^^^^^^^^^^^^^^^^^^ @@ -120,8 +129,8 @@ To re-onboard a validator in a Kubernetes deployment and recover the balances of repeat the steps described in :ref:`helm-validator-install` for installing the validator app and participant. While doing so, please note the following: -* Create a Kubernetes secret with the content of the identities dump file. - Assuming you set the environment variable ``PARTICIPANT_BOOTSTRAP_DUMP_FILE`` to a dump file path, you can create the secret with the following command: +* Create a Kubernetes secret with the content of the identities backup file. + Assuming you set the environment variable ``PARTICIPANT_BOOTSTRAP_DUMP_FILE`` to a backup file path, you can create the secret with the following command: .. code-block:: bash @@ -149,17 +158,17 @@ To re-onboard a validator in a Docker-compose deployment and recover the balance ./start.sh -s "" -o "" -p -m "" -i "" -P "" -w -where ```` is the path to the file containing the node identities dump, and +where ```` is the path to the file containing the node identities backup, and ```` is a new identifier to be used for the new participant. It must be one never used before. Note that in subsequent restarts of the validator, you should keep providing ``-P`` with the same ````. .. 
_validator_manual_dump: -Obtaining an Identity Dump from a Participant Database Backup -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Obtaining an Identities Backup from a Participant Database Backup +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ In case you do not have a usable identities backup but instead have a backup of the validator participant's database, -you can assemble an identity dump manually. +you can assemble an identities backup manually. Here is one possible way to do so: #. Restore the database backup into a temporary postgres instance and deploy a temporary participant against that instance. @@ -175,10 +184,12 @@ Here is one possible way to do so: value: canton.parameters.exit-on-fatal-failures = false #. Open a :ref:`Canton console ` to the temporary participant. -#. Run below commands in the opened console. This will store the identity dump into a *local* file - (relative to the local directory from which you opened the console) called ``identity-dump.json``. +#. Run the commands below in the opened console. This will store the backup into a *local* file + (relative to the local directory from which you opened the console) called ``identities-dump.json``. + + .. literalinclude:: ../../../apps/app/src/pack/examples/recovery/manual-identities-dump.sc - .. literalinclude:: ../../../apps/app/src/pack/examples/recovery/manual-identity-dump.sc +.. _validator_disaster_recovery_troubleshooting: Limitations and Troubleshooting ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -222,9 +233,8 @@ of at least one of the parties hosted on your node. To address this, you can usu participant.topology.party_to_participant_mappings.propose(, Seq((, )), store = syncId) -2. If your parties are still on the original node that you took identity dumps from, you can use your existing dump. - If your parties have been migrated already, take a new dump from the node. 
If your node is in a state where you cannot take a fresh dump, use the old dump but edit the ``id`` - field in your identity dump to the participant id of the new node. +2. If your parties are still on the original node that you took identities backup from, you can use your existing backup. + If your parties have been migrated already, take a new identities dump from the node. If your node is in a state where you cannot take a fresh dump, use the old dump but edit the ``id`` field to the participant id of the new node. You can now take down the broken node on which you tried to restore and try the restore procedure again with your adjusted dump on a fresh node with a different ````. .. _validator_recover_external_party: @@ -240,7 +250,7 @@ hosting it becomes unusable for whatever reason. recovery **must** be a **completely new validator**. An existing validator may brick completely due to some limitations around party migrations and there is no way to recover from that at - this point. Recovering a validator from an identity backup does not classify + this point. Recovering a validator from an identities backup does not classify as a completely new validator here. You must setup it with a completely new identity and a completely clean database. This limitation is expected to be lifted in From 4b3f771e47bd268e6fc89bf20a36630aca7d1f0d Mon Sep 17 00:00:00 2001 From: Martin Florian Date: Thu, 28 Aug 2025 15:55:34 +0000 Subject: [PATCH 05/13] more stuff Signed-off-by: Martin Florian --- .../validator_disaster_recovery.rst | 20 +++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/docs/src/validator_operator/validator_disaster_recovery.rst b/docs/src/validator_operator/validator_disaster_recovery.rst index 2350d5691d..62c62fcb2e 100644 --- a/docs/src/validator_operator/validator_disaster_recovery.rst +++ b/docs/src/validator_operator/validator_disaster_recovery.rst @@ -108,8 +108,8 @@ or the depending on which setup you chose. 
Once the new validator is up and running, you should be able to login as the administrator -and see its balance. Other users hosted on the validator would need to re-onboard, but their -coin balance and CNS entries should be recovered. +and see its balance. Other users hosted on the validator will need to re-onboard, but their +coin balance and CNS entries should be recovered and will be accessible to users that have re-onboarded. In case of issues, please consult the :ref:`troubleshooting ` section below. @@ -219,23 +219,27 @@ of at least one of the parties hosted on your node. To address this, you can usu with the old participant id or they have been migrated to the new node. You can check by opening a :ref:`Canton console ` to any participant on the network (i.e., you can also ask another validator or SV operator for this information) and running the - following query where is the part after the ``::`` in - your participant id. + following query where is the part after the ``::`` in, for example, your validator party id .. code:: val syncId = participant.synchronizers.list_connected().head.synchronizerId participant.topology.party_to_participant_mappings.list(syncId, filterNamespace = ) If all parties are on the same node, proceed with the next step. If some are on the old node and some are on the new node, migrate the ones on the old node to the new node by opening a console to the new node and running the following command + (adjust the parameters as required for your parties): .. 
code:: - participant.topology.party_to_participant_mappings.propose(, Seq((, )), store = syncId) + val participantId = participant.id // id of the new participant + participant.topology.party_to_participant_mappings.propose(, Seq((participantId, )), store = syncId) 2. If your parties are still on the original node that you took identities backup from, you can use your existing backup. - If your parties have been migrated already, take a new identities dump from the node. If your node is in a state where you cannot take a fresh dump, use the old dump but edit the ``id`` field to the participant id of the new node. - You can now take down the broken node on which you tried to restore and try the restore procedure again with your adjusted dump on a fresh node with a different ````. + If your parties have been migrated to the new node already, take a new identities dump from the new node. + If the new node is in a state where you cannot take a fresh dump, use the old dump but edit the ``id`` field to the participant id of the new node. + You can obtain the ``id`` in the correct format by, for example, running ``participant.id.toProtoPrimitive`` in a Canton console to the participant. + You can now take down the node to which you originally tried to restore and try the restore procedure again with your adjusted dump on a fresh node with a different participant ID prefix + (i.e., a different ``newParticipantIdentifier`` / ```` depending on your deployment model). .. 
_validator_recover_external_party: From bf861d10551ab788060159369e7127f56d2f8514 Mon Sep 17 00:00:00 2001 From: Martin Florian Date: Thu, 28 Aug 2025 16:15:52 +0000 Subject: [PATCH 06/13] mainly text tweaks [ci] Signed-off-by: Martin Florian --- .../validator_disaster_recovery.rst | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/src/validator_operator/validator_disaster_recovery.rst b/docs/src/validator_operator/validator_disaster_recovery.rst index 62c62fcb2e..7354303fcb 100644 --- a/docs/src/validator_operator/validator_disaster_recovery.rst +++ b/docs/src/validator_operator/validator_disaster_recovery.rst @@ -214,29 +214,29 @@ If you still observe issues, in particular you observe something has likely gone wrong while importing the active contracts of at least one of the parties hosted on your node. To address this, you can usually: -1. First make sure all parties are on a consistent node. The most +1. First make sure all parties are hosted on the same node. The most common case is that either the parties are still on the old node - with the old participant id or they have been migrated to the new + with the old participant ID or they have been migrated to the new node. You can check by opening a :ref:`Canton console ` to any participant on the network (i.e., you can also ask another validator or SV operator for this information) and running the - following query where is the part after the ``::`` in, for example, your validator party id + following query where is the part after the ``::`` in, for example, your validator party ID. .. code:: val syncId = participant.synchronizers.list_connected().head.synchronizerId participant.topology.party_to_participant_mappings.list(syncId, filterNamespace = ) - If all parties are on the same node, proceed with the next step. 
If some are on the old node and some are on the new node, migrate the ones on the old node to the new node by opening a console to the new node and running the following command + If all parties are on the same node, proceed to the next step. If some are on the old node and some are on the new node, migrate the ones on the old node to the new node by opening a console to the new node and running the following command (adjust the parameters as required for your parties): .. code:: - val participantId = participant.id // id of the new participant + val participantId = participant.id // ID of the new participant participant.topology.party_to_participant_mappings.propose(, Seq((participantId, )), store = syncId) 2. If your parties are still on the original node that you took identities backup from, you can use your existing backup. If your parties have been migrated to the new node already, take a new identities dump from the new node. - If the new node is in a state where you cannot take a fresh dump, use the old dump but edit the ``id`` field to the participant id of the new node. + If the new node is in a state where you cannot take a fresh dump, use the old dump but edit the ``id`` field to the participant ID of the new node. You can obtain the ``id`` in the correct format by, for example, running ``participant.id.toProtoPrimitive`` in a Canton console connected to the participant. You can now take down the node to which you originally tried to restore and try the restore procedure again with your adjusted dump on a fresh node with a different participant ID prefix (i.e., a different ``newParticipantIdentifier`` / ```` depending on your deployment model). @@ -273,7 +273,7 @@ it on multiple nodes, you will need to adjust this. ..
code:: - // replace YOUR_PARTY_ID by the id of your external party + // replace YOUR_PARTY_ID by the ID of your external party val partyId = PartyId.tryFromProtoPrimitive("YOUR_PARTY_ID") val participantId = participant.id val synchronizerId = participant.synchronizers.id_of("global") @@ -337,7 +337,7 @@ We can now check that the topology transaction got correctly applied and get the .. code:: - // The detailed output will slightly vary. Make sure that you see the new participant id though. + // The detailed output will slightly vary. Make sure that you see the new participant ID though. participant.topology.party_to_participant_mappings.list(synchronizerId, filterParty = partyId.filterString) res36: Seq[topology.ListPartyToParticipantResult] = Vector( ListPartyToParticipantResult( From 902ca1a4aea224127e938717b1bd5c1441a929a2 Mon Sep 17 00:00:00 2001 From: Martin Florian Date: Thu, 28 Aug 2025 16:21:29 +0000 Subject: [PATCH 07/13] release notes [ci] Signed-off-by: Martin Florian --- docs/src/release_notes.rst | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/docs/src/release_notes.rst b/docs/src/release_notes.rst index 88a67ba3d2..9cb7f60186 100644 --- a/docs/src/release_notes.rst +++ b/docs/src/release_notes.rst @@ -8,6 +8,15 @@ Release Notes ============= +Upcoming +-------- + +- Documentation + + - Various improvements to the docs on :ref:`recovering a validator from an identities backup `, + including adding a section on :ref:`obtaining an identities backup from a database backup `. 
+ + 0.4.13 ------ From bd2681f231e1acac22c009ccbec45759a6e486f1 Mon Sep 17 00:00:00 2001 From: Martin Florian Date: Thu, 28 Aug 2025 16:38:47 +0000 Subject: [PATCH 08/13] lol scalafmt [ci] Signed-off-by: Martin Florian --- .../ParticipantPlaintextIdentitiesDumpIntegrationTest.scala | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/apps/app/src/test/scala/org/lfdecentralizedtrust/splice/integration/tests/ParticipantPlaintextIdentitiesDumpIntegrationTest.scala b/apps/app/src/test/scala/org/lfdecentralizedtrust/splice/integration/tests/ParticipantPlaintextIdentitiesDumpIntegrationTest.scala index 069c2d92d3..c94b8f77f0 100644 --- a/apps/app/src/test/scala/org/lfdecentralizedtrust/splice/integration/tests/ParticipantPlaintextIdentitiesDumpIntegrationTest.scala +++ b/apps/app/src/test/scala/org/lfdecentralizedtrust/splice/integration/tests/ParticipantPlaintextIdentitiesDumpIntegrationTest.scala @@ -178,7 +178,9 @@ class ParticipantPlaintextIdentitiesIntegrationTest extends IntegrationTest with participantHandle: String, dumpPath: Path, )(implicit env: SpliceTestConsoleEnvironment): Unit = { - val originalDumpScriptPath = File("apps/app/src/pack/examples/recovery/manual-identities-dump.sc") + val originalDumpScriptPath = File( + "apps/app/src/pack/examples/recovery/manual-identities-dump.sc" + ) // the original script assumes that the participant is called `participant` // and that the dump will be written to `identities-dump.json`; we need to adjust both From b899290cd783cb95489ddd98950e9c959266073a Mon Sep 17 00:00:00 2001 From: Martin Florian Date: Thu, 28 Aug 2025 16:44:18 +0000 Subject: [PATCH 09/13] do something [ci] Signed-off-by: Martin Florian From 1d2c911fea05352d9b78b3a856cd756fa35664eb Mon Sep 17 00:00:00 2001 From: Martin Florian Date: Fri, 29 Aug 2025 08:30:24 +0000 Subject: [PATCH 10/13] review comments [ci] Signed-off-by: Martin Florian --- apps/app/src/pack/examples/recovery/manual-identities-dump.sc | 3 ++- 
.../ParticipantPlaintextIdentitiesDumpIntegrationTest.scala | 4 +--- docs/src/validator_operator/validator_disaster_recovery.rst | 4 +++- 3 files changed, 6 insertions(+), 5 deletions(-) diff --git a/apps/app/src/pack/examples/recovery/manual-identities-dump.sc b/apps/app/src/pack/examples/recovery/manual-identities-dump.sc index 05a70ea615..752dc93c39 100644 --- a/apps/app/src/pack/examples/recovery/manual-identities-dump.sc +++ b/apps/app/src/pack/examples/recovery/manual-identities-dump.sc @@ -4,7 +4,8 @@ import java.util.Base64 val id = participant.id.toProtoPrimitive -val keys = "[" + participant.keys.secret.list().filter(k => k.name.get.unwrap != "cometbft-governance-keys").map(key => s"{\"keyPair\": \"${Base64.getEncoder.encodeToString( participant.keys.secret.download(key.publicKey.fingerprint).toByteArray)}\", \"name\": \"${key.name.get.unwrap}\"}") .mkString(",") + "]" +// This line needs to be adapted if your participant stores keys in an external KMS +val keys = "[" + participant.keys.secret.list().filter(k => k.name.get.unwrap != "cometbft-governance-keys").map(key => s"{\"keyPair\": \"${Base64.getEncoder.encodeToString(participant.keys.secret.download(key.publicKey.fingerprint).toByteArray)}\", \"name\": \"${key.name.get.unwrap}\"}") .mkString(",") + "]" val authorizedStoreSnapshot = Base64.getEncoder.encodeToString(participant.topology.transactions.export_topology_snapshot(timeQuery = TimeQuery.Range(from = None, until = None), filterMappings = Seq(TopologyMapping.Code.NamespaceDelegation, TopologyMapping.Code.OwnerToKeyMapping, TopologyMapping.Code.VettedPackages), filterNamespace = participant.id.namespace.toProtoPrimitive).toByteArray) diff --git a/apps/app/src/test/scala/org/lfdecentralizedtrust/splice/integration/tests/ParticipantPlaintextIdentitiesDumpIntegrationTest.scala b/apps/app/src/test/scala/org/lfdecentralizedtrust/splice/integration/tests/ParticipantPlaintextIdentitiesDumpIntegrationTest.scala index c94b8f77f0..bd4f069b89 100644 --- 
a/apps/app/src/test/scala/org/lfdecentralizedtrust/splice/integration/tests/ParticipantPlaintextIdentitiesDumpIntegrationTest.scala +++ b/apps/app/src/test/scala/org/lfdecentralizedtrust/splice/integration/tests/ParticipantPlaintextIdentitiesDumpIntegrationTest.scala @@ -134,10 +134,8 @@ class ParticipantPlaintextIdentitiesIntegrationTest extends IntegrationTest with } clue("Manually dumped identities match the ones dumped via the API") { - svParticipantDumpManual.id shouldBe svParticipantDump.id - svParticipantDumpManual.keys should contain theSameElementsAs svParticipantDump.keys - svParticipantDumpManual.authorizedStoreSnapshot shouldBe svParticipantDump.authorizedStoreSnapshot // we don't care about the version + svParticipantDumpManual shouldBe svParticipantDump.copy(version = None) } withCanton( diff --git a/docs/src/validator_operator/validator_disaster_recovery.rst b/docs/src/validator_operator/validator_disaster_recovery.rst index 7354303fcb..a4489cc369 100644 --- a/docs/src/validator_operator/validator_disaster_recovery.rst +++ b/docs/src/validator_operator/validator_disaster_recovery.rst @@ -187,7 +187,9 @@ Here is one possible way to do so: #. Run below commands in the opened console. This will store the backup into a *local* file (relative to the local directory from which you opened the console) called ``identities-dump.json``. - .. literalinclude:: ../../../apps/app/src/pack/examples/recovery/manual-identities-dump.sc + .. literalinclude:: ../../../apps/app/src/pack/examples/recovery/manual-identities-dump.sc + + Note that the above commands need to be adapted if your participant is configured to store keys in an :ref:`external KMS `. ..
_validator_disaster_recovery_troubleshooting: From aa129f787d0e15b54f592da05528ff86ba56c9e3 Mon Sep 17 00:00:00 2001 From: Martin Florian Date: Fri, 29 Aug 2025 09:25:39 +0000 Subject: [PATCH 11/13] review comments [static] Signed-off-by: Martin Florian --- docs/src/deployment/traffic.rst | 32 ++++++++++++++++++++++---------- 1 file changed, 22 insertions(+), 10 deletions(-) diff --git a/docs/src/deployment/traffic.rst b/docs/src/deployment/traffic.rst index f458d74bdf..fd64cd0bd4 100644 --- a/docs/src/deployment/traffic.rst +++ b/docs/src/deployment/traffic.rst @@ -178,30 +178,42 @@ Wasted traffic -------------- `Wasted traffic` is defined as synchronizer events that have been sequenced but will not be delivered to their recipients. -Wasted traffic is problematic for validators because of traffic fees: -it means that :ref:`traffic ` has been charged for a message that was ultimately not delivered. +For validators, which are subject to traffic fees, +wasted traffic implies that :ref:`traffic ` has been charged for a message that was ultimately not delivered. Not all failed submissions result in wasted traffic: wasted traffic only occurs whenever a synchronizer event is rejected after sequencing but before delivery. +Some level of wasted traffic is expected and unavoidable, due to factors such as: + +- Submission request amplification. + Participants that use BFT sequencer connections duplicate submission requests after a timeout to ensure speedy delivery in the face of nonresponsive sequencers; + if processing was simply slower than usual but the sequencer was not faulty, the duplicate request counts as wasted traffic. +- Duplication of messages within the ordering layer, typically linked to transient networking issues or load spikes. +- Duplication of submissions on the participant/app side, for example when catching up after restoring from a backup or after some crashes. 
Validator perspective +++++++++++++++++++++ -Validator operators are encouraged to investigate failed submissions eagerly to avoid systemic causes for wasted traffic that are due -to their individual configuration and/or the specific applications using their validators. -The Splice distribution contains a :ref:`Grafana dashboard ` about `Synchronizer Fees (validator view)` that can be helpful in addition to inspecting logs; -see, for example, the `Rejected Event Traffic` panel there. +Validator operators are encouraged to monitor the rate of failed submissions on their validators and investigate the causes of repeatedly failing submissions eagerly. +As stated above, not all failed submissions result in wasted traffic, and some wasted traffic is unavoidable. +Attention is warranted, however, if the rate of wasted traffic increases significantly at some point in time. + +The Splice distribution contains a :ref:`Grafana dashboard ` about `Synchronizer Fees (validator view)`, +to assist in monitoring traffic-related metrics. +The `Rejected Event Traffic` panel on this dashboard is especially relevant for determining the rate of wasted traffic. +(Hover over the ⓘ symbols in panel headers for precise descriptions of the shown data.) SV perspective ++++++++++++++ SV operators are encouraged to monitor wasted traffic across all synchronizer members, as reported for example by sequencer :ref:`metrics `, -to avoid cases where misconfiguration incurs excessive monetary losses for validators. +to detect cases where wasted traffic increases significantly and/or globally. The Splice distribution contains a :ref:`Grafana dashboard ` about `Synchronizer Fees (SV view)` that can be helpful, as well as an alert definition that focuses on validator participants.
-Note also that SV mediators and sequencers waste traffic as part of their regular operation; -they frequently use aggregate submissions where all composite submission requests beyond the aggregation threshold get discarded. +Note that wasted traffic is less relevant for SVs themselves, as SV components have unlimited traffic. +Note also that SV mediators and sequencers waste traffic as part of their regular operation: +They heavily use aggregate submissions where sequencers collect messages from a group of senders and only deliver a single message per recipient once a threshold of individual submissions has been sequenced; +sequenced individual submissions beyond the aggregation threshold count as wasted traffic. All that said, should an SV component suddenly exhibit a significant increase in wasted traffic, this likely points to an actual issue that should be investigated. From 859c0fc6622590cd4d7e27042229da22533fd59b Mon Sep 17 00:00:00 2001 From: Martin Florian Date: Fri, 29 Aug 2025 09:28:40 +0000 Subject: [PATCH 12/13] [static] ? Signed-off-by: Martin Florian From 8dd3af6c190b42a4c3ee49da64c6dbd106040ab0 Mon Sep 17 00:00:00 2001 From: Martin Florian Date: Fri, 29 Aug 2025 09:57:16 +0000 Subject: [PATCH 13/13] more review [static] Signed-off-by: Martin Florian --- docs/src/deployment/traffic.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/src/deployment/traffic.rst b/docs/src/deployment/traffic.rst index fd64cd0bd4..274d2cfd9b 100644 --- a/docs/src/deployment/traffic.rst +++ b/docs/src/deployment/traffic.rst @@ -185,7 +185,7 @@ wasted traffic only occurs whenever a synchronizer event is rejected after seque Some level of wasted traffic is expected and unavoidable, due to factors such as: - Submission request amplification. 
- Participants that use BFT sequencer connections duplicate submission requests after a timeout to ensure speedy delivery in the face of nonresponsive sequencers; + Participants that use BFT sequencer connections retry submission requests after a timeout to ensure speedy delivery in the face of nonresponsive sequencers; if processing was simply slower than usual but the sequencer was not faulty, the duplicate request counts as wasted traffic. - Duplication of messages within the ordering layer, typically linked to transient networking issues or load spikes. - Duplication of submissions on the participant/app side, for example when catching up after restoring from a backup or after some crashes. @@ -193,7 +193,7 @@ Some level of wasted traffic is expected and unavoidable, due to factors such as Validator perspective +++++++++++++++++++++ -Validator operators are encouraged to monitor the rate of failed submissions on their validators and investigate the causes of repeatedly failing submissions eagerly. +Validator operators are encouraged to investigate the causes of repeatedly failing submissions. As stated above, not all failed submissions result in wasted traffic, and some wasted traffic is unavoidable. Attention is warranted, however, if the rate of wasted traffic increases significantly at some point in time.
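The aggregate-submission behavior described in the patches above can be sketched as a toy model (illustrative Python only, not Splice or Canton code; sender names, costs, and the function itself are hypothetical): a sequencer collects equivalent submissions from a group of senders and delivers a single aggregated message once a threshold is reached, while submissions sequenced beyond the threshold are discarded and their traffic cost is wasted.

```python
# Toy model (NOT Splice/Canton code) of sequencer aggregate submissions:
# submissions arrive in sequencing order as (sender, traffic_cost) pairs.
# Once `threshold` submissions have been sequenced, one aggregated message
# is delivered; any further sequenced submissions are discarded, and the
# traffic charged for them counts as wasted traffic.

def sequence_aggregate(submissions, threshold):
    """Return (delivered, wasted_cost) for one aggregation group."""
    sequenced = 0
    wasted_cost = 0
    delivered = False
    for _sender, cost in submissions:
        sequenced += 1
        if sequenced == threshold:
            delivered = True  # one aggregated delivery for the whole group
        elif sequenced > threshold:
            wasted_cost += cost  # sequenced but never delivered
    return delivered, wasted_cost

# Three hypothetical SV sequencers forward the same request with threshold 2:
# the third sequenced copy is charged but discarded.
print(sequence_aggregate([("sv1", 10), ("sv2", 10), ("sv3", 10)], threshold=2))
# → (True, 10)
```

This mirrors why a steady baseline of wasted traffic from SV mediators and sequencers is normal, while a sudden increase warrants investigation.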