Logbook 2022 H2
-
We have talked about the issue Upgrade Cardano devnet to 1.35.4 #523. The upgrade to the latest version of the Cardano node has introduced flakiness in the end-to-end test. We are currently fine-tuning the genesis block of the `devnet` to fix these hiccups. We have also talked about using a custom environment variable that will allow us to update the URL from which the Cardano node is downloaded without modifying the workflow
-
We have paired and merged the issue Refactoring Crypto test helpers #663:
- It introduces a more versatile & clear way of preparing protocol fixtures to feed our unit & integration tests 💪
- During our work we have identified other points that need refactoring:
-
We have discussed how we could remove the 'allow_non_certified_registration' feature and completely remove the uncertified part of the code. In order to do this, we need to investigate how we can avoid spoofing the Pool Ids from the signer nodes when we want to run stress tests in conditions as close as possible to `mainnet` (i.e. `3K+` SPOs and `100GB+` database). We will work on this subject shortly
-
We have paired on writing a document that prepares our work for Handling Graceful Updates on Mithril Network:
- We have raised many questions that we need to answer
- We will proceed with the writing of an ADR
- We will PoC:
- Interaction with the Cardano chain (to activate a new version): read & write transactions
- Handle backward compatibility of API messages (with protobuf, AVRO, in-house development etc.)
- Once these steps are completed, we will move forward with the implementation
-
We have continued pairing on the issue Refactoring Crypto test helpers #663 for which a PR should be ready shortly
-
We have paired and merged the issue Deactivate uncertified signer registration #621:
- We have fixed the difficulties we faced yesterday regarding the usage of Rust features when artifacts are built from the workspace. A feature flag that must be activated on only one crate cannot be used in this context: it gets activated for all crates at once. In our case, we have decided to simply not use one anymore, which led us to refactor the protocol demo tool and make it use its own types (including direct access to `mithril-stm` types in order to keep it chain agnostic)
- We have also deactivated the uncertified signers from the Mithril networks
-
We have paired on the issue Refactoring Crypto test helpers #663 and we have started implementing a PR. We will continue working on it tomorrow
-
We have also discussed how to implement the upgrade strategy that we talked about yesterday during our team session
-
Finally, we have created an issue Add context to errors #665 in which we will try to provide better debugging information by adding context to errors and by providing less technical error messages
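As an illustration of what adding context to errors could look like, here is a minimal sketch that only uses the standard library. The `ContextError` type, the `ErrorContext` trait, the `render_chain` helper and the `parse_epoch` example are all hypothetical names introduced for this sketch; they are not taken from the Mithril code base, and the actual work on #665 may well rely on an existing error-handling crate instead.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical wrapper that attaches a less technical message to a lower-level error.
#[derive(Debug)]
pub struct ContextError {
    context: String,
    source: Box<dyn Error + 'static>,
}

impl fmt::Display for ContextError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Only the friendly message is displayed; the cause stays reachable via `source()`.
        write!(f, "{}", self.context)
    }
}

impl Error for ContextError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(self.source.as_ref())
    }
}

/// Hypothetical extension trait adding `.with_context("...")` to any `Result`.
pub trait ErrorContext<T> {
    fn with_context(self, context: &str) -> Result<T, ContextError>;
}

impl<T, E: Error + 'static> ErrorContext<T> for Result<T, E> {
    fn with_context(self, context: &str) -> Result<T, ContextError> {
        self.map_err(|e| ContextError {
            context: context.to_string(),
            source: Box::new(e),
        })
    }
}

/// Renders the whole error chain, from the friendliest message down to the root cause.
pub fn render_chain(error: &dyn Error) -> String {
    let mut lines = vec![error.to_string()];
    let mut current = error.source();
    while let Some(cause) = current {
        lines.push(format!("caused by: {cause}"));
        current = cause.source();
    }
    lines.join("\n")
}

/// Example: reading an epoch number from a (hypothetical) configuration value.
pub fn parse_epoch(raw: &str) -> Result<u64, ContextError> {
    raw.trim()
        .parse::<u64>()
        .with_context("could not read the epoch from the configuration")
}
```

With this shape, a user-facing message can stay short while logs print the full chain through `render_chain`.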
-
We have closed the following issues and PRs:
- Re-genesis Mithril test networks #651: following the re-genesis of its certificate chain, the `release-preprod` network is producing new certificates, as expected, since yesterday
- Optimize Snapshot Digest Computation #510: the cache is now available on `testing-preview` and we already notice a speedup 💪
- Enforcement of API Protocol versions in Client/Signer/Aggregator #633: the nodes now embed a verification layer that enforces the usage of compatible versions of the nodes
-
We have created a new issue Refactoring Crypto test helpers #663 to refactor the cryptographic test helpers used in the tests to provide easy access to protocol ready to use signers (key registration with Cardano certification, certificate chain, ...)
-
We have also paired on an issue with the PR Decommission signer registration with declarative PoolId #653 for which tests that were broken locally were still succeeding on the CI. After investigating the caches, we verified that they were not the source of the problem. The problem is related to the usage of features in the context of a Rust workspace (and feature unification): when we build (or test) by calling the `cargo` command from the root of the workspace, the features used are different from the ones used if we run the command from the crate directory. We were actually building tests and release binaries with unwanted features. We will think about how to solve this issue in the following days, as no perfect solution seems to exist, and we will probably create an ADR to set rules on how to use features in the future to avoid this pitfall
-
During the team session, we discussed:
- How to handle upgrades of the signer as smoothly as possible when we reach `mainnet`:
- We must limit the usage of the re-genesis of the certificate chain to what is strictly necessary
- When a new version of the signer is released we need to reach the quorum at least once per epoch. This means that we can't afford to have the signers split in 2 populations that would not be able to create multi signature
- We will adopt a strategy that is close to the one used by Cardano: the idea is to deploy silently a new "big" version that gets activated once the deployment of the version is high enough (a la hard fork). This means that we need to monitor the deployment by using for example the single signatures that are regularly sent to the aggregator
- We will use a transaction on chain that will be read by the signer nodes to proceed to a synchronous upgrade
- Also, we will work in order to provide backward compatibility for "small" model updates:
- We need to version all the messages exchanged (protocol version + agent version)
- We need to provide golden tests to make sure that we can handle previous versions of the models in the newer versions
- We have decided to postpone the work on issue Add Stake Shares in Certificate #636 as we are not completely ready to move forward on this subject
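The activation rule discussed above (a new version is deployed silently and gets activated once its deployment is high enough) could be sketched as follows. This is only an illustration under assumptions: the `SignatureReport` type, the idea that a single signature advertises a node version, the stake-weighted ratio and the threshold value are all hypothetical and are not decisions recorded in this logbook.

```rust
use std::collections::HashMap;

/// Hypothetical view of a single signature received by the aggregator:
/// the signing party, its stake, and the node version it advertises.
#[derive(Debug, Clone)]
pub struct SignatureReport {
    pub party_id: String,
    pub stake: u64,
    pub node_version: u32,
}

/// Share of the reported stake that runs at least `target_version`.
/// Only the latest report of each party is taken into account.
pub fn adoption_ratio(reports: &[SignatureReport], target_version: u32) -> f64 {
    let mut latest: HashMap<&str, &SignatureReport> = HashMap::new();
    for report in reports {
        // Later reports overwrite earlier ones for the same party.
        latest.insert(report.party_id.as_str(), report);
    }
    let total: u64 = latest.values().map(|r| r.stake).sum();
    if total == 0 {
        return 0.0;
    }
    let upgraded: u64 = latest
        .values()
        .filter(|r| r.node_version >= target_version)
        .map(|r| r.stake)
        .sum();
    upgraded as f64 / total as f64
}

/// The new version is activated only when adoption reaches the threshold
/// (the threshold value itself is an assumption to be defined).
pub fn should_activate(reports: &[SignatureReport], target_version: u32, threshold: f64) -> bool {
    adoption_ratio(reports, target_version) >= threshold
}
```

Weighting by stake rather than by number of signers is a design choice of this sketch: it mirrors the fact that the quorum depends on stake, so a version adopted by many small signers would not be activated too early.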
-
We have reviewed and merged the PRs:
- Fix clippy warnings from Rust 1.66.0 #657: the new version of Rust created warnings that prevented the CI from being successful
- Update dependencies #659: the fortnightly update of the dependencies of the repository
-
We have reviewed the final adjustments to the PR Optimize snapshot digest computation #652 and talked about the robustness of the timed tests if we compile them for release (where the optimization is less obvious on small files). It should be merged very shortly
-
We have talked about some CI improvements that we need to address:
- Find a way to optimize the use of the cache as we have a hard limit of `10GB` that is reached very often and that leads to higher computation delays of the Rust jobs
- Find a way to add more tags to an existing Docker image on the registry instead of rebuilding them from scratch for Pre-Release and Release
-
We have also created a new issue Delete test lab monitor #658 to clean the code base and to avoid having some build issues for some SPOs
-
Finally, we have released a new distribution `2250.1` 💪
-
We have prepared the demo path of this iteration:
- Introduction
- Presentation of the optimization of the single signature (and why we need a re-genesis of the certificate chain)
- Showcase of the optimization of the snapshot digest computation
- Showcase of a protocol parameters transition on a `devnet` network
- Presentation of the road map
- Conclusion/Next steps
- QA
-
We have also:
- Merged the PR Upgrade instances capacity infrastructure #656 that increases the memory of the VM instances running on the `testing-preview` and `pre-release-preview` networks
- Merged the PR Extract signer registration from multi-signer in Aggregator #655
- Reviewed the PR Optimize snapshot digest computation #652 which will be ready to be merged by end of week
- Reviewed the PR check API version #641 which will be merged by end of week
-
We have prepared a pre-release for the next distribution: `2250.0-prerelease`. We have also made a re-genesis of the `pre-release-preview` network for which we should see new certificates produced tomorrow, as described in issue Re-genesis Mithril test networks #651
-
We have reviewed the following issues:
- Enforcement of API Protocol versions in Client/Signer/Aggregator #633: Some minor adjustments in progress and once done, it will be ready to be merged
- Protocol parameters transition is not working #627: It has been merged and we will proceed to a test update of the protocol parameters on `testing-preview` soon
- Optimize Snapshot Digest Computation #510: Some minor adjustments in progress and once done, it will be ready to be merged
-
Finally, we have paired on the issue Extract the signer registration from multi-signer #642. It is completed and it will be merged tomorrow. We had encountered some difficulties when working on the tests and it appears that `mithril_common::crypto_helper::tests_setup::setup_signers` could probably be refactored in order to avoid them. We will pair on this subject while working on the issue Deactivate uncertified signer registration #621 for which tests added in #642 will break after merging and rebasing
- We have started working on moving toward mainnet. We have tried to assess the subjects that need to be addressed first:
- The storage of the keys & signatures is currently done with a hex encoding in the database stores (especially in the certificate chain) and in the messages exchanged by the nodes, and also in the Genesis verification key file for tests. We should be ready to handle multiple types of encoding in order to:
- Avoid breaking changes (e.g. not being able to validate the certificate chain after a change of encoding)
- Optimize the size of the data (e.g. the size of a certificate) (this should be benchmarked)
- The solution that we have identified is to create a codec that would be able to:
- Serialize in the default (or a specific) encoding (which can evolve in the future)
- Deserialize the data by attempting to parse a list of maintained decoding formats
- Activate the Mithril nodes only when the attached Cardano node is (almost) fully synced (threshold to be determined). This will avoid unnecessary computations when they are not appropriate (e.g. compute stake distribution, snapshot digest and archive)
- Separate the objects used for communication between the nodes and the business objects they use
- We have also discussed adaptations that will be needed in order to handle new types of certified data (not final):
- Associate a type to the certificates so that they can accurately represent certified data
- Make the signer sign `2` messages for each signing round (the next stake distribution and the message associated with the signing round)
- Let the aggregator select which message it needs to aggregate first (the next stake distribution if it has not already created a certificate for the epoch, the message of the signing round otherwise). This could also be an efficient strategy in a decentralized context
- We will keep thinking on other features and we will also need to get a share of the iteration velocity dedicated to refactoring/technical debt
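The codec idea described above (serialize with a default encoding, deserialize by trying a list of maintained decoding formats) could look like the following minimal sketch. The `Codec` and `Encoding` names are hypothetical, and the second format (a textual byte array such as `[1,2,255]`) is only an assumption chosen to keep the example free of dependencies; the notes above do not say which encodings would be supported besides hex.

```rust
/// Encodings known by the hypothetical codec.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Encoding {
    /// Lowercase hexadecimal, e.g. "0102ff".
    Hex,
    /// Textual byte array, e.g. "[1,2,255]" (assumption made for this sketch).
    ByteArray,
}

pub struct Codec {
    default_encoding: Encoding,
    /// Decoding formats that are still maintained, tried in order.
    decoders: Vec<Encoding>,
}

impl Codec {
    pub fn new(default_encoding: Encoding, decoders: Vec<Encoding>) -> Self {
        Self { default_encoding, decoders }
    }

    /// Serializes with the default encoding (which can evolve in the future).
    pub fn encode(&self, bytes: &[u8]) -> String {
        Self::encode_with(self.default_encoding, bytes)
    }

    /// Serializes with a specific encoding.
    pub fn encode_with(encoding: Encoding, bytes: &[u8]) -> String {
        match encoding {
            Encoding::Hex => bytes.iter().map(|b| format!("{b:02x}")).collect(),
            Encoding::ByteArray => {
                let items: Vec<String> = bytes.iter().map(|b| b.to_string()).collect();
                format!("[{}]", items.join(","))
            }
        }
    }

    /// Tries each maintained format in order and returns the first successful decoding.
    pub fn decode(&self, text: &str) -> Option<(Encoding, Vec<u8>)> {
        self.decoders.iter().find_map(|&encoding| {
            let decoded = match encoding {
                Encoding::Hex => decode_hex(text),
                Encoding::ByteArray => decode_byte_array(text),
            };
            decoded.map(|bytes| (encoding, bytes))
        })
    }
}

fn decode_hex(text: &str) -> Option<Vec<u8>> {
    if text.len() % 2 != 0 || !text.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..text.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&text[i..i + 2], 16).ok())
        .collect()
}

fn decode_byte_array(text: &str) -> Option<Vec<u8>> {
    let inner = text.trim().strip_prefix('[')?.strip_suffix(']')?;
    if inner.trim().is_empty() {
        return Some(Vec::new());
    }
    inner.split(',').map(|item| item.trim().parse::<u8>().ok()).collect()
}
```

Since a value could in principle be valid in several formats, the order of the decoders matters; tagging the stored data with its encoding would remove that ambiguity, at the cost of a change to the stored format.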
-
We have reviewed the draft implementations of:
-
We have merged the issue Remove VerificationKey and Stake from individual signature #619. As there are some breaking changes on the encoding of the multi-signatures, we are compelled to proceed to a re-genesis of the certificate chains of the Mithril networks:
- We have defined a short-term plan (to be reproduced whenever we have a re-genesis on the test networks):
- `testing-preview` re-genesis has been done. New certificates should show up tomorrow
- `pre-release-preview` re-genesis scheduled on Wednesday with new distribution pre-release. New certificates should be up on Thursday
- `release-preprod` re-genesis scheduled on Friday with new distribution release. New certificates should be up on Sunday
- Communications will be done with SPOs on the Discord channel when we proceed to the re-genesis of `pre-release-preview` and `release-preprod`
- We have also upgraded the version of `mithril-stm` to `0.2.0`
- We have also talked about how we could handle the breaking changes in `mithril-stm` in the future:
- when working on test networks, we simply re-genesis the certificate chain
- when working on `mainnet` in `beta` version (when we have not reached a high enough adoption rate), we simply re-genesis the certificate chain
- when working on `mainnet`: no more breaking changes, which means that the library should take care of handling compatibility as in other Cardano cryptographic libraries. The idea that we had to embed multiple versions of the library is not acceptable because of the high risk of embedding security vulnerabilities
-
We have also paired on the issue Extract the signer registration from multi-signer #642. We extracted the signer registration responsibility to a `Signer Registerer` module last week, which we have wired to the HTTP server and the state machine of the Aggregator. The last step will be to clean the multi-signer
-
Our team session has mainly been dedicated to discussing the Security Indicator of the certificates:
- Maybe we just need an "Unsafe" warning to be displayed in the UX (explorer and client) when the security is not full
- We could only rely on the percentage of stakes for this (as long as the full security protocol parameters are used)
- Using the signers list of the certificate might not be enough to guarantee security by checking that one (or multiple) well-known signers are listed. We could probably embed this list in the message that is signed, but this would only be interesting while we have not reached the 90% threshold of participation rate
- An important piece of information is the adoption rate, for which we could provide an evolution graph in the explorer
- Another idea would be to have an external process (IOG hosted) that continuously checks the validity of the certificate chain produced by the aggregator, and in case of discrepancy with the actual Cardano chain, would revoke the genesis verification key used by clients to prevent them from restoring the snapshots
- We have agreed that we will add a "Security" page to the documentation website that will explain how the ramp up (aka beta) phase on the `mainnet` will work and what security will be provided. We will dedicate a team session to the writing of this page.
-
We have reviewed the code in progress and discussed the issue Optimize Snapshot Digest Computation #510:
- We have decided to use a `CacheProvider` trait that will be responsible for providing a cache of the immutable files given their `Immutable File Number`
- This will allow us to provide the following implementations:
- In memory at first, for being able to provide a minimal working implementation (for testing and that could also be used in the Client)
- In memory with state stored in the SQLite database (for Signer and Aggregator nodes that already have a store)
- In memory with state stored in a file with JSON format (that could be used in the Client)
- We still wonder how we can test the trait efficiently:
- Use a mock to test behavior of the digester
- Benchmark the time gained with/without cache
- Maybe both approaches should be implemented
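To make the discussion above more concrete, here is a minimal sketch of what such a trait and its first in-memory implementation could look like. Only the `CacheProvider` name and the immutable file number come from the notes; the method names, the `MemoryCacheProvider` type and the `digests_with_cache` helper are hypothetical, and the real trait may differ (for example by being asynchronous).

```rust
use std::collections::HashMap;

/// Number identifying an immutable file of the Cardano node database.
pub type ImmutableFileNumber = u64;

/// Hypothetical shape of the `CacheProvider` trait: it returns and stores
/// the digest of an immutable file, keyed by its immutable file number.
pub trait CacheProvider {
    fn get(&self, number: ImmutableFileNumber) -> Option<String>;
    fn store(&mut self, number: ImmutableFileNumber, digest: String);
}

/// In-memory implementation, the minimal working one mentioned above.
#[derive(Default)]
pub struct MemoryCacheProvider {
    digests: HashMap<ImmutableFileNumber, String>,
}

impl CacheProvider for MemoryCacheProvider {
    fn get(&self, number: ImmutableFileNumber) -> Option<String> {
        self.digests.get(&number).cloned()
    }

    fn store(&mut self, number: ImmutableFileNumber, digest: String) {
        self.digests.insert(number, digest);
    }
}

/// Returns the digest of each requested file, calling `compute` only on cache
/// misses. The second value is the number of digests that had to be computed,
/// which makes the behavior easy to assert in a test without timing anything.
pub fn digests_with_cache<C: CacheProvider>(
    cache: &mut C,
    numbers: &[ImmutableFileNumber],
    mut compute: impl FnMut(ImmutableFileNumber) -> String,
) -> (Vec<String>, usize) {
    let mut computed = 0;
    let mut digests = Vec::with_capacity(numbers.len());
    for &number in numbers {
        let digest = match cache.get(number) {
            Some(hit) => hit,
            None => {
                computed += 1;
                let fresh = compute(number);
                cache.store(number, fresh.clone());
                fresh
            }
        };
        digests.push(digest);
    }
    (digests, computed)
}
```

Counting computations is one way to test the behavior of the digester deterministically, which complements a benchmark of the time gained with and without the cache.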
-
We have also prepared the issue Deactivate uncertified signer registration #621 by deploying test SPOs on the `pre-release-preview` and `release-preprod` networks that will be able to sign in `2` epochs and that should thus be ready when we decommission the declarative signer registration
-
We have reviewed and merged the issue Add signature of binaries in the artifacts released #587. This was the last issue of the epic issue Implement Release process #500 that is now finalized 💪 🎉
-
We have continued pairing on the issue Extract the signer registration from multi-signer #642 and we will keep our pairing sessions on the issue Simplify the Multi Signer in Aggregator #398 next week
-
We have taken some time to debug the PR check API version #641 for which the end-to-end test is always failing
-
Finally we have started designing a consistent way of handling compatibility between the Mithril nodes:
- We want to deal as efficiently as possible with situations where:
- We are introducing breaking changes that make nodes versions incompatible (avoid them if backward compatibility is possible or provide a way to dodge them. This is critical as we will need to get a very high level of participation of SPOs in order to provide full security for the certificates and also to avoid epoch gaps in the certificate chain)
- We are introducing breaking changes that make validation of a part of the certificate chain impossible (new version of nodes would not be able to validate previously generated certificates and reciprocally)
- We will create an ADR once our design is final
- Here are some ideas that we have talked about:
- We could use multiple versions of the `mithril-stm` crate and switch to the correct version to proceed to the certificate verification depending on the version embedded in the certificate. This solution is interesting but has some caveats: it is a bit cumbersome and raises questions on how to handle security issues that would be fixed in recent versions only, for example. We will probably try to PoC this solution soon.
- We could use a shift mechanism that would activate versions later at a defined epoch transition: we would embed 2 versions (current + next) in the nodes and make an announcement to the SPOs that a new critical version must be installed before the epoch transition. This would give time to upgrade the signers and maximize our chances to avoid epoch gaps. This would also be a convenient way to prepare for new use cases that involve new types of data to certify. We will probably try to PoC this solution soon.
- We need to make some adjustments on the way we handle the detection of incompatible versions of the nodes:
- Our current `MITHRIL_API_VERSION`, which is the OpenAPI specification version, does not fully reflect incompatibility between nodes, which can occur when the content of fields of the data exchanged is modified (e.g. in Optimize Snapshot Digest Computation #510 where the way digests are computed changes, or in Remove VerificationKey and Stake from individual signature #619 where single and multi-signatures formats change)
- We could extend the "meaning" of the `MITHRIL_API_VERSION` version that would be updated when:
- OpenAPI specification is updated
- Encoding or values computation is modified
- Breaking changes in the certificate chain occur (such that a version of the node is not able to validate it completely)
- We could rely on the crates nodes versions to establish compatibility tables (e.g. this version of the aggregator is compatible with these versions of the signer node and these versions of the client node)
- We could also rely on a baked minimum version of the distribution acceptable for a given node (e.g. aggregator running `2248.1` is compatible with signer not older than `2244` distribution)
- Some drawbacks exist with all the solutions. Relying on the distribution looks interesting even though it will require more work
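The last option above (a minimum acceptable distribution version baked in a node) could be sketched as below. The `DistributionVersion` type and the `is_compatible` function are hypothetical names, and reading `2248.1` as a distribution number followed by a revision number is an assumption of this sketch.

```rust
/// Distribution version such as `2248.1`.
/// Assumption: `2248` identifies the distribution and `1` is a revision within it.
/// The derived ordering compares `distribution` first, then `revision`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct DistributionVersion {
    pub distribution: u32,
    pub revision: u32,
}

impl DistributionVersion {
    /// Parses "2248.1" or "2244" (a missing revision is read as 0).
    /// Suffixed versions such as "2248.1-prerelease" are rejected by this sketch.
    pub fn parse(text: &str) -> Option<Self> {
        let mut parts = text.trim().splitn(2, '.');
        let distribution = parts.next()?.parse::<u32>().ok()?;
        let revision = match parts.next() {
            Some(raw) => raw.parse::<u32>().ok()?,
            None => 0,
        };
        Some(Self { distribution, revision })
    }
}

/// A peer is accepted when it is not older than the minimum version baked in the node.
pub fn is_compatible(peer: DistributionVersion, minimum: DistributionVersion) -> bool {
    peer >= minimum
}
```

For instance, an aggregator with a baked minimum of `2244` would accept a signer from distribution `2248.1` and reject one from `2243.5`.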
-
We have reviewed and merged the dev blog post that describes the release process in the PR Start blog post describing release process #533
-
We have paired on the issue Simplify the Multi Signer in Aggregator #398:
- Reviewed and merged the issue Extract the Certificate creation from the multi-signer #638
- Started working on the issue Extract the signer registration from multi-signer #642 on which we will continue pairing tomorrow
-
We have made a test usage of the manually triggered workflow that has just been merged Mithril Client multi-platform test. We have agreed that we would use this manual workflow at least once when a pre-release distribution is created, and whenever it is needed by the ongoing developments (as it is possible to target a commit from any branch)
-
We have talked about how to handle the breaking changes of issue Remove VerificationKey and Stake from individual signature #619:
- The breaking changes require a re-genesis of the certificate chains of the `3` existing Mithril networks as soon as they are updated (which will not occur at the same time)
- We will establish a short term plan in order to have a minimal impact and to communicate accordingly with the Pioneer SPOs
- This will be a good opportunity to structure a deployment plan that will be re-used when a re-genesis is required
- We will also organize a dedicated session in order to work on possible solutions to avoid/limit the re-genesis in the future
-
We have published a new distribution `2248.1` and we have also published the first version of the `mithril-stm` library on `crates.io` automatically with the CI/CD 💪
-
We have also created the first ticket associated to the issue Simplify the Multi Signer in Aggregator #398: issue Extract the Certificate creation from the multi-signer #638 for which we have paired and finished a PR that will be merged shortly. We will add new sub issues in the next days and keep our efforts on this simplification.
-
In order to finalize issue Implement Release process #500:
- We have reviewed and merged the issue Create manually triggered workflow to test Client binaries of all platforms (Windows, macOS, Linux) against testing-preview network #601
- We have reviewed issue Add signature of binaries in the artifacts released #587 that will be merged shortly
- We have also updated the PR Start blog post describing release process #533
- Once all of the issues/PR are closed, we will close issue #500
-
We have also attended a presentation of the `ΔQSD` paradigm for quality and started applying it to the Mithril protocol. We will keep working on this in the next weeks
-
Finally, we have done some cleaning on the repository and deleted the stale branches
-
We have groomed the following issues for this iteration:
- Optimize Snapshot Digest Computation #510
- Enforcement of API Protocol versions in Client/Signer/Aggregator #633
- Compute Security Level in Mithril Explorer #513: needs more refinements from Product/Research
- Add Stake Shares in Certificate #636
- Protocol parameters transition is not working #627
- Deactivate uncertified signer registration #621
-
We have reviewed and merged the following PRs:
- Fix Cardano bin download URL #635: A change of the download location for the Cardano binaries that prevented the CI from working
- Update dependencies #634: An update done at the end of each iteration to use the latest versions of the dependencies of the project
-
We have created the pre-release version `2248.1-prerelease` for the `2248` distribution. It has been qualified and a final `2248.1` release has been created. It is still under deployment as the GitHub Actions are currently very slow
-
During our team session, we have made a final review of the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586. A draft PR Mithril Decentralized Network CIP #637 has been created, for which we are expecting feedback from the Cardano network team shortly. If we ask SPOs to register their signers on the Cardano chain at each epoch, it means that we need to find a way to incentivize their contribution as well
-
We have also discussed the issue Compute Security Level in Mithril Explorer #513:
- We will probably use pre-computed values for the Security Level of the multi-signatures as we are already using the full security parameters on the `testing-preview` network
- We could use only the Mithril Stake Share in order to get a reliable Security Indicator (if the full security protocol parameters are used and use 0 if not)
- We have also mentioned that displaying the Pool Ids (and/or tickers) of the SPOs that have signed a certificate could be a good way to leave the choice of trusting a certificate based on who signed it (at least during the ramp up phase on the `mainnet`)
-
We have prepared the demo path of this iteration:
- Introduction
- Presentation of the first draft of the "CIP Mithril Decentralized Network"
- Showcase of the Store Automatic Migration second milestone for Signer and Aggregator
- Video demo of benchmark bootstrap of Daedalus on mainnet with/without Mithril
- Finalization/optimizations of the release process
- Announcement of deprecation of declarative Pool ID signer registration and next steps
- Conclusion/Next steps
- QA
-
We have prepared the pre-release of the next distribution: `2248.0-prerelease`. It is currently tested and should be released tomorrow
-
We have also been working on the issue CI does not trigger for PR from forks #597. We are now able to correctly run the CI for a PR that comes from a fork. We agreed that it could be a good idea to separate the CI workflow in 2 parts and put the Docker build/push and Terraform deployment steps in a new Testing workflow
-
We have created the following issues:
- Protocol parameters transition is not working #627: A bug is preventing correct transition when updating the protocol parameters
- Deactivate uncertified signer registration #621: Decommission of the deprecated declarative signer registration mode
-
We have discussed the issue CI does not trigger for PR from forks #597, which is very tricky. We have decided to roll back the CI so that the artifacts recording, Docker registry and Terraform deployment steps are triggered only when there is a push on the main branch. In other cases, only the build and testing part will run. This means that we will have to create tags for new distributions on commits merged by collaborators of the repository. We will investigate further and try to find a better option. We have also had many difficulties with the CI being very slow for the last few days, with some delays of more than 2 hours
-
The issue Implement Mithril SPO on testing/pre-release environments #563 has been merged and some test SPOs are being set up on the `testing-preview` network
-
We have also reviewed and merged the PR make SQL entities to create their projection #625
-
We have paired on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586
- We made final adjustments to the recently written sections Abstract, Motivation, Specification/Overview, Rationale, Path to Active and Further Reading
- We had a meeting with researchers regarding the issue that we have on achieving consensus on the signer registration:
- The best option that we have at this time is to make a transaction on chain to reach the consensus (for every signer registration at each epoch)
- We could probably have a KES like evolution mechanism for the Mithril keys in order to reduce the transaction frequency to once every few epochs
- Researchers will keep on reviewing our DIP draft and trying to find other solutions
-
We have reviewed the PR Add Mithril SPO on testing/pre-release environments #589 that will be ready to merge shortly after the documentation is updated. It will allow the creation/maintenance of SPOs on the Mithril test networks
-
We have reviewed and paired on the SQL automatic migration #600 that has been merged and will be embedded in the next distribution `2248`
-
We have also reviewed and merged the Add versioning to documentation #555 issue that splits the documentation website into 2 versions (accessible via the drop-down top right menu on the website):
- Current version: that has been merged with the latest distribution
- Next version: the under construction version that will be shipped with the next distribution
-
We have paired on the issue CI does not trigger for PR from forks #597 for which we are still having some trouble with the management of the build caches. We will keep investigating this issue in the next days
-
We have continued working on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586 which is close to reaching a decent first draft status. In the next days, we will:
- Make a full review of the document
- Enhance the schema overview to make it closer to the final specifications
- Enhance the description of the handling of the several aggregators certificate chains (regarding the genesis certificate) in this decentralized setup
- Work on dedicated sessions with researchers in order to find answers and solutions to the signer registration consensus problem that we have identified
-
Regarding the publication of the `mithril-stm` crypto library to `crates.io`, we will proceed as follows:
- First publish the crate with a crates.io `API Token` from Inigo
- He will then invite other members of the team as co-owners of the crate
- Finally, a team will be created in the IOHG GitHub organization that will also be added as owner of the crate (name of the team to be confirmed, e.g. `Core`, `Crypto`, `Rust`, `Mithril`, and will depend on the strategy defined regarding grouping of the published crates)
-
We have talked about the issue CI does not trigger for PR from forks #597. We will probably have to trigger the CI only when a PR is created/updated/merged in order to avoid duplicate triggers. We need to make sure that this is not a problem when we retrieve the produced artifacts from other workflows. We will conduct some tests on that matter in the following days
-
We have paired on the issue SQL automatic migration #600 and the associated PR should be ready to merge shortly
-
We have merged the following PRs:
- STM Readme update #616: This makes the publication to crates.io ready. We just miss the `API_TOKEN` in order to create the first publication
- Deprecate uncertified signer registration #617: The stable mode of registration of signers is now the Certified Pool Id mode. We will decommission the deprecated declarative mode in a couple of weeks (see issue Deactivate uncertified signer registration #621)
- Update 'testing-preview' protocol parameters #618: The `testing-preview` environment now uses the full security parameters (which will be activated in `2` epochs)
-
Finally, we have paired on the Prepare CIP/CPS for Mithril piggybacked on Cardano network #586:
- We have reworked all the mini protocols to follow the formalism of the Shelley Networking Protocol
- We have identified a difficulty with the consensus that needs to be reached on the verification keys of the signers when we broadcast the signer registration. We will work on this subject with researchers in the next days to try to find a solution
- In the meantime, we will complete the writing of the first draft of the CIP tomorrow in a dedicated session
- We will also have to create a Mithril CIP in the near future as in CIP-0035. It will commit our team to be fully part of the CIP process
-
We have merged the PR Fix KES period verification #609 which narrows the range of KES Period verification when a signer registers. This closes the issue Signer registration fails with key certification mode #548. The next step is to deprecate the unverified signer registration as detailed in Deprecate Signer Declarative Pool Id mode #585. After a period of a few weeks, we will decommission this mode of registration
-
We have reviewed the PR Greg/600/database migration #611. There are still some modifications that need to be addressed such as using a separate version mechanism from the one used by the application itself in order to be compatible with the life cycle of the nodes versions. We will pair on this early next week
-
We have continued pairing on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586. We have completed a first version of the `Mithril Signer Registration Protocol` specification. We will continue with multiple pairing sessions in order to replicate this for the other mini protocols that need to be specified, as well as for the Motivation, Rationale and Path to Active sections
-
We have merged the following PRs:
-
Deployment to crates.io #610:
- We just need to update the final `API_TOKEN` in the GitHub secrets once we receive it
- We will wait for a cleanup of the README file of the `mithril-stm` crate (aka `mithril-core`) before activating the publication to crates.io
- When the time for publication has come, we will remove the `--dry-run` argument in the publish step of the Pre-release workflow
-
Add Daedalus/Mithril benchmark video #614 that adds the YouTube video of the benchmark we have done on the `mainnet` with/without Mithril. It is accessible on the Bootstrap a Cardano Node guide of the documentation website
-
We have paired on the issue Add nodes/libraries versions matrix in releases #599 and we have merged the PR Produce versions table in Release description #612 that will add a version table in the release description automatically
-
Finally we have paired on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586:
- We have carefully reviewed the `Mithril Signer Protocol` part and have made some refinements on it
- ⚠️ We have identified a tricky issue regarding the signer registration for which we need to find a consensus among the nodes. In order to do so, we could probably use the slot leader to certify (with its VRF keys) the list of signers registered to Mithril for an epoch
- We have also scheduled a new session tomorrow dedicated to finalizing the specifications of this mini protocol
-
We have merged a quick fix on Store migration process does not accept a newer version #603 that was blocking the CI. It simply deactivates the panic that occurs when a version mismatch is detected. The real fix will come with the issue SQL automatic migration #600
-
We have also paired on the issue Activate deployment to crates.io #588 for which:
- We have pushed the PR Deployment to crates.io #610 that should be merged shortly
- We are waiting for the API token of the `crates.io` account of IOG that will be used to deploy. In the meantime, we have kept a `dry-run` version of the publication step in the Release workflow
-
Finally, we have paired on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586:
- We made a full review of the CIP
- We have agreed that a light summary of the protocol should be added at the beginning of the CIP
- We still have to properly design the bootstrap of the certificate chain for an aggregator in this decentralized context
- In order to complete the work during this iteration and to get a first clean version:
- We will all re-read the document prior to new pairing sessions
- We will schedule 3 other pairing sessions dedicated to that CIP in the following days
- We have discussed and paired on some bugs:
- Computation of Stake Distribution is computed twice during Signer registration #596: It has been fixed and merged
- CI does not trigger for PR from forks #597: This issue is a little bit trickier than we expected as it also has security implications. We have created a dummy PR from a fork Remove 'clippy' file #605 and have made some experiments in order to prepare a plan for fixing the problem. We will continue to work on the problem in the following days
- Store migration process does not accept a newer version #603: This issue requires that we make some adjustments to the way we handle database upgrades. We will roll back to separate versions for the nodes and the database. We will concentrate our efforts on this issue as it is blocking the CI
-
We have sliced the tickets of this iteration
-
We have talked about the issue SQL automatic migration #600 for which we will need to embed the actual SQL upgrade files. For this we will probably make use of a macro such as `include_bytes`
-
We have discussed the next steps for the issue Simplify the Multi Signer in Aggregator #398 and some possible enhancements to the test setup functions of the crypto helper so that they are simpler to use in the integration tests
-
We have also talked about the bug CI does not trigger for PR from forks #597 that might be trickier than we expected. We will pair on it tomorrow in order to understand the best way to fix the problem.
-
A new bug has been created Store migration process does not accept a newer version #603 that should be fixed shortly
-
Finally, these PR have been merged:
- Enhance Mithril networks infra #584, also the environments have been migrated to handle the associated breaking change in the terraform deployment
- Update dependencies #602
-
We have prepared the demo path of this iteration:
- Introduction
- Showcase of the Store Automatic Migration first milestone for Signer and Aggregator
- Showcase of the enhancements of the Explorer
- Showcase of live release of the `2246.1` distribution
- Conclusion
- Q&A
-
Showcase path of the `Live release of the 2246.1 distribution`:
# Demo: Release distribution `2246.1`
## Open pre-release page
google-chrome https://github.com/input-output-hk/mithril/releases/tag/2246.1-prerelease
## Switch to main branch
git switch main
git fetch
git pull --rebase
## Show tag on repository
git log --oneline
## Create final tag
git tag -s 2246.1 0bff212a767399b01aef152e27782a7e7ba934f2 -m "2246.1 release"
## Show tag on repository
git log --oneline
## Push the final tag
git push origin 2246.1
## Open Pre-release workflow
google-chrome https://github.com/input-output-hk/mithril/actions/workflows/pre-release.yml
## Open release page
### Generate release notes
### Uncheck "Set as a pre-release"
### Check "Set as the latest release"
google-chrome https://github.com/input-output-hk/mithril/releases/tag/2246.1
## Open Release workflow
google-chrome https://github.com/input-output-hk/mithril/actions/workflows/release.yml
- We have also reviewed the issues of the current iteration and prepared work for the next iteration. We have created a new issue Create manually triggered workflow to test Client binaries of all platforms (Windows, macOS, Linux) against testing-preview network #601 that relates to the epic Implement Release process #500
-
We have created few tickets, some of which are bugs:
-
We have created pre-releases for a new distribution `2246`:
- `2246.0-prerelease`: This was missing the update of the versions of the modified nodes
- `2246.1-prerelease`: This pre-release is under qualification and should be released tomorrow 💪
-
We have paired and merged the PR database migration framework #571 that implements database version update detection and that closes issue Implement stores migration process #562 🎉
-
Finally, we have continued our pairing effort on the elaboration of the CIP for piggybacking Mithril nodes on the Cardano node network layer
-
We have merged the PR add API version in HTTP headers #566 that closes the issue API version #565. The next step is to enforce the compatibility of the nodes and ask for an update when an incompatibility is detected
-
We have reviewed the final modifications of the PR database migration framework #571 that should be merged shortly. Once this is done, we will work on the automatic upgrade of the stores of the nodes.
-
We also have reviewed, requested some modifications and merged this PR More refined list of pre-reqs #591 coming from the community
-
Following many comments, and some confusion that we have noticed on the discord channel regarding the configuration of the nodes for the several environments, we have merged this PR Enhance Mithril Networks documentation #593 whose goal is to provide a clear configuration section in every guide that requires it. This section is now centralized to provide up to date information efficiently. Also, we have removed all the mentions of the now decommissioned previous infrastructure that used to be accessible on the `https://aggregator.api.mithril.network/aggregator` endpoint
-
Finally, we have merged the PR Upgrade to Cardano 1.35.4 #595 that uses the latest stable version of the cardano node, as the previous `1.35.3` will no longer be working by November 16th
-
We have talked about issue Implement stores migration process #562 and reviewed the PR database migration framework #571. We have decided to align the version number of the database to the version number of the node. The auto upgrade mechanism will be:
- Check if the version of the node has changed (from previously recorded in the database state)
- If the version has changed, select the ordered list of upgrade files that need to be applied to the node
- For each of these files (associated to a version):
- Apply the upgrade file (first file)
- If upgrade went OK, check the upgraded database (second file)
- If upgrade is checked successfully, record the updated version to the database
- Once all the upgrades have been applied, record the current version of the application and the last updated date
- There are 2 special cases:
- Table creation, for which a first upgrade will be a `CREATE IF NOT EXISTS` query
- If the list of upgrades to apply includes a version lower than the currently recorded version, for which a panic and error message should happen
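The selection step of this mechanism could be sketched as follows. This is a minimal illustration under stated assumptions, not the code of the PR: the `Migration` type, the `pending_migrations` function and the plain integer versions are hypothetical (the node versions discussed above are not integers), and the downgrade case is returned as an error value rather than raised as a panic so that the caller decides how to report it.

```rust
/// One upgrade step (hypothetical type): the version it brings the database to,
/// and the SQL of the upgrade file.
#[derive(Debug, Clone, PartialEq)]
pub struct Migration {
    pub version: u32,
    pub sql: &'static str,
}

#[derive(Debug, PartialEq)]
pub enum MigrationError {
    /// The database records a version newer than the running application.
    Downgrade { recorded: u32, app_version: u32 },
}

/// Returns the upgrades to apply, ordered by ascending version.
/// `recorded` is the version stored in the database (None for a fresh database).
pub fn pending_migrations(
    recorded: Option<u32>,
    app_version: u32,
    mut known: Vec<Migration>,
) -> Result<Vec<Migration>, MigrationError> {
    if let Some(recorded) = recorded {
        if recorded > app_version {
            return Err(MigrationError::Downgrade { recorded, app_version });
        }
    }
    known.sort_by_key(|m| m.version);
    Ok(known
        .into_iter()
        // Never apply an upgrade meant for a newer application version.
        .filter(|m| m.version <= app_version)
        // Skip upgrades already applied; a fresh database gets all of them.
        .filter(|m| recorded.map_or(true, |r| m.version > r))
        .collect())
}
```

Each returned step would then be applied and checked in order, recording its version on success, before the application version and update date are finally written.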
-
We have also paired on the issue API version #565 for which we have added the `Mithril API Version` in the headers of the calls made to the Aggregator from the Signer/Client
-
Finally, we have continued pairing on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586. We will do another dedicated session with the whole team this week
-
We have reviewed the following PRs:
- Fix explorer state reload #592 has been merged and introduces new unit tests for the explorer
- Decommission legacy infra #578 has been reviewed and will be merged next week. At the same time the previous test infrastructure for Mithril will be destroyed
- database migration framework #571 has been reviewed and will be ready to merge next week
-
We have paired on the issue Simplify the Multi Signer in Aggregator #398 and almost finished the refactoring of the Certificate production out of the multi-signer. We will continue working on it in the following days
-
Finally, we have paired on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586 and continued drafting the CIP. A draft is available here. We have scheduled 2 dedicated sessions next week to continue our work.
-
We have reviewed, paired on a bug and merged the PR Enhance explorer aggregator selection #590 that closes issue Provide a 'copy' button for the aggregator URL in explorer #576 and also brings some enhancements to the Mithril Explorer, such as the display of the epoch settings and the usage of redux storage that simplifies the code
-
We have also paired on the issue Simplify the Multi Signer in Aggregator #398 and worked on a first step: removing the responsibility of producing certificates from the multi-signer (published on the branch `simplify_multi_signer`)
-
We have also been experimenting with:
- Running a test network on `mainnet` in order to evaluate the path to being live on the `mainnet`
- Producing multi-signatures with full security parameters (k=2422, m=20973, f=0.2) on a `devnet`
-
We have prepared the tickets for the current iteration:
-
We have reviewed the enhancements of the Mithril Explorer of issue Provide a 'copy' button for the aggregator URL in explorer #576. A nice feature to have would also be the ability to open the explorer on a specific Aggregator. We need to investigate whether there are security concerns regarding this feature (or whether it is a problem to make the explorer available to potentially adversarial Aggregators)
-
We will resume our work on the issue Simplify the Multi Signer in Aggregator #398 tomorrow with a dedicated session
-
During the team session, we have also started working on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586. A first draft of the CIP is available on the wiki. We will keep iterating on it this week and next week as well.
-
We have reviewed the work in progress on the Mithril explorer for issue Provide a 'copy' button for the aggregator URL in explorer #576. The associated PR should be ready to merge shortly
-
We have identified some problems on the `testing-preview` and `pre-release-preview` networks that were not producing snapshots for epoch `10`. Apparently some problems may exist in the fast bootstrap genesis tools/process. We are investigating the problem. In the meantime we have:
- Reset the `testing-preview` network with fast bootstrap genesis: new certificates are produced and no epoch gap with protocol initializers/verification keys exists in the databases of the signer and aggregator nodes. We will see if the problem occurs again in the following epochs
- Run a re-genesis of the `pre-release-preview` network (as fast genesis is not possible anymore once new signers have registered). New certificates should be produced in the next epoch
-
Following the release of Rust `1.65.0`, some `clippy` warnings occurred in the CI and were blocking the process. We have paired to apply a fix for these warnings in Update rust dependencies #583
-
The first distribution of Mithril has been released 2244.0 🚀 🎉 💪
-
We have paired and merged the PR Add Debian packaging to CI #579 that produces, in the CI, the Debian packages for the installation of the Linux binaries. We will adjust the documentation to make this the preferred installation type for Mithril nodes
-
We have worked on the demo path of this iteration:
- Introduction
- Showcase of the new release process and of the first Mithril distribution
- Presentation of single signature without Merkle path
- Conclusion
- Q&A
-
Showcase path of the `new release process and of the first Mithril distribution`:
# Demo: Bootstrap a Cardano node from a preprod Mithril snapshot with latest Client distribution
## Download binary
rm -f mithril-client
wget https://github.com/input-output-hk/mithril/releases/download/2244.0/mithril-client_0.1.0.12bb705_amd64.deb
sudo dpkg -x mithril-client_0.1.0.12bb705_amd64.deb .
sudo mv usr/bin/mithril-client ./mithril-client
## Test installation
./mithril-client
./mithril-client --version
## Get Latest Snapshot Digest
export NETWORK=preprod
export AGGREGATOR_ENDPOINT=https://aggregator.release-preprod.api.mithril.network/aggregator
export GENESIS_VERIFICATION_KEY=$(wget -q -O - https://raw.githubusercontent.com/input-output-hk/mithril/main/TEST_ONLY_genesis.vkey)
SNAPSHOT_DIGEST=$(curl -s $AGGREGATOR_ENDPOINT/snapshots | jq -r '.[0].digest')
echo $SNAPSHOT_DIGEST
## List Snapshots
./mithril-client list
## Show Latest Snapshot
./mithril-client show $SNAPSHOT_DIGEST
## Download Latest Snapshot
./mithril-client download $SNAPSHOT_DIGEST
## Restore Latest Snapshot
./mithril-client restore $SNAPSHOT_DIGEST
## Launch a Cardano Node
docker run -v $(pwd)/ipc:/ipc -v cardano-node-data:/data --mount type=bind,source="$(pwd)/data/preprod/$SNAPSHOT_DIGEST/db",target=/data/db/ -e NETWORK=preprod inputoutput/cardano-node:1.35.3-configs
## Query tip of the chain
watch -n 1 "sudo CARDANO_NODE_SOCKET_PATH=./ipc/node.socket ./cardano-cli query tip --cardano-mode --testnet-magic 1 | jq ."
-
We have paired on fixing the tests not working with the PR Single signature without merkle path #484. The PR is now merged, and the `release-preprod` environment has gone through a re-genesis accordingly (as the AVK format is no longer compatible) 🎉
-
We have merged the PR Activate new Mithril networks #577 that activates the new Mithril networks for each workflow of the new release process:
| Mithril Network | Workflow |
|---|---|
| `testing-preview` | CI |
| `pre-release-preview` | Pre-Release |
| `release-preprod` | Release |
- Tomorrow we will create the first distribution release of the repository. We have discussed this first release and the nice-to-have features to implement shortly in the distribution:
- Debian package
- GPG signature of the binaries
- Better handling of Docker artifacts re-tagging
- Manual testing of Client artifacts for macOS and Windows platforms
-
We have worked toward releasing the new `release-preprod` environment:
- ✔️ Deprecate current aggregator: it will not be updated anymore when some branches are merged on main
- ✔️ Use `release-preprod` as the new environment that is deployed when branches are merged on main (temporarily, until the new `preview` cardano testnet is re-spun)
- ❌ Merge breaking changes of mithril-core in the PR Single signature without merkle path #484. A blocking issue forced us to postpone the merge until a fix is implemented.
- ⌚ Fast re-genesis the aggregator of release-preprod (<30 min). Will be done after the #484 merge.
- ⌚ Communicate with SPOs on discord and dev blog about the new & deprecated environments. A blog post has been created and is under review in PR New environments documentation #575
-
We have paired on fixing the tests of the Aggregator of the PR Single signature without merkle path #484. We did not succeed, but we found out that there is probably an issue with the registration. We will keep on investigating this problem.
-
We have also discussed how we could test that the macOS and Windows Client builds are running correctly when connected to an Aggregator that runs on Linux. We think that a good option is to create manually triggered complementary pipelines. We will try to investigate this shortly.
-
Finally, we have reviewed the work in progress on the issue Implement stores migration process #562
-
We have reviewed and merged the following PRs:
-
We have discussed the issue Implement Release process #500:
- Some fixes/optimizations will be addressed shortly on the CI pipeline
- We will continue our efforts on deploying the new `release-preprod` environment that should be up and running by tomorrow 💪
-
We have discussed the issue Implement stores migration process #562:
- This issue is closely linked to Move stores to relational design with SQLite #476. We will start working on it once #562 is completed
- We have agreed that it would be easier to release a first version of the system that is already handling the migration steps described by sequential migration files
-
We have reviewed, paired and merged the PR Adapt ci workflow to Release Process #557 🎉:
- There is a small bug regarding the naming of artifacts
- We still need to have the CI append the commit sha in the versions of the cargo.toml files
- We need to find a way to reuse docker artifacts between pipelines
- We must make some tests on the macOS and Windows client binaries to make sure they are working properly
-
The following PRs have been also merged:
-
Finally, we have decided to spin-up the `release-preprod` environment at the EOW:
- After merge of the breaking change PR Single signature without merkle path #484 (or re-spin after this merge)
- Temporarily implement it in the CI pipeline (and then moving it to the Release pipeline when it is released)
- Communicate with the SPOs on discord and dev blog:
- Explain that the current Aggregator running on `preview` is deprecated and will be decommissioned on November 1st
- Explain that they need to move their Signer nodes to the `release-preprod` environment which will be the `stable` environment
- Encourage them to also have a Signer node running on the `pre-release-preview` environment to keep participating in the testing effort
-
We have paired on partially fixing the issue Signer registration fails with key certification mode #548:
- We will shortly merge the PR Fix KES key update #569
- Once merged, we will make sure that the certification works as expected and that some Signers (that would have been recompiled) will show a verification on a KES Period strictly greater than `0`
- Then we will reduce the range of KES Period verification to `[current_period - start_period - 1, current_period - start_period + 1]`
- Later the `KES Agent` of the Cardano node will take care of signing the Operational Certificate with the correctly evolved KES Secret Key
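The narrowed verification window can be illustrated with a small helper. This is only a sketch under stated assumptions: the function name and the `u32` period type are hypothetical, and the actual signature verification against each candidate period is left out.

```rust
/// Candidate KES periods to try when verifying a registration,
/// centred on `current_period - start_period` with a tolerance of one period.
/// Saturating arithmetic keeps the window valid near 0.
pub fn kes_period_candidates(current_period: u32, start_period: u32) -> Vec<u32> {
    let expected = current_period.saturating_sub(start_period);
    let low = expected.saturating_sub(1);
    let high = expected.saturating_add(1);
    (low..=high).collect()
}
```

A verifier would try each candidate in turn and accept the registration as soon as one of them validates the signature.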
-
We have discussed the Mithril network API version and we have stated that it should be the version of the OpenAPI specification. This will be the only version given by request/response headers of the Client/Signer/Aggregator nodes. We will continue by enforcing semver compatibility and returning, for example, an `HTTP 406` error in case of incompatible versions
-
Finally, we have continued working on the issue Implement Release process #500:
-
As the `preview` Cardano network will be re-spun next week (November 1st), we will:
- Add a deployment environment `release-preprod` temporarily on the `CI` workflow
- Communicate with the SPOs so that they run their test Signer nodes on this new environment
-
We have prepared the new tickets of the iteration:
-
The PR ADR of the release process #556 has been reviewed and merged. The ADR is available at https://mithril.network/doc/adr/3
-
We have paired on the sub issues of Implement Release process #500:
-
We have also paired on the issue API version #565 in order to add the communication protocol version to response headers of the Aggregator
-
We have fixed the issue with `cargo sort` that was crashing the CI with PR Cargo update sort and dependencies #558
-
We have also reviewed and merged the PR Add version information #553
-
The issue that we have with the Signer registration (as in issue Signer registration fails with key certification mode #548) seems to be related to the fact that KES Secret Keys evolve in memory. This explains why we can verify the signature only with a `0` value for the KES Period. In order to fix the problem some solutions exist:
- Compute the correct KES Period when doing the signature of the Mithril Verification Key (`current_period - start_period`, `current_period` given by the cardano cli and `start_period` given by the Operational Certificate). We will pair on this solution next week
- Update the cardano cli so that it computes the signature with the in memory KES Secret Key. We expect an estimate from the Cardano node team for this feature
-
We have worked on the demo path of this iteration:
- Introduction
- Presentation of the results of the SPO certification on the hosted Aggregator
- Presentation of the updated release process
- Showcase of the CI/CD Workflows: Testing -> Pre-Release -> Release
- Showcase of the bootstrapping of a deployment environment on preview
- Conclusion
- Q&A
-
Showcase path of the `bootstrapping of a deployment environment on preview`:
# Mithril Bootstrap Deployment Environment
# On preview network
---
# Setup demo
## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git
---
# Demo: Bootstrap Deployment Environment
## Change directory
cd mithril/mithril-infra
## Setup environment variables
DEPLOY_ENVIRONMENT=demo-preview
API_DOMAIN=api.mithril.network
## Setup terraform variables
cat > env.$DEPLOY_ENVIRONMENT.tfvars << EOF
environment_prefix = "demo"
environment_suffix = ""
cardano_network = "preview"
google_project = "mithril-test-365514"
google_region = "europe-west1"
google_zone = "europe-west1-b"
google_machine_type = "e2-medium"
google_service_credentials_json = "../gcp-credentials.json"
google_application_credentials_json = ""
mithril_api_domain = "$API_DOMAIN"
mithril_image_id = "latest"
mithril_genesis_verification_key_url = "https://raw.githubusercontent.com/input-output-hk/mithril/main/TEST_ONLY_genesis.vkey"
mithril_genesis_secret_key = ""
mithril_signers = {
"1" = {
pool_id = "pool15qde6mnkc0jgycm69ua0grwxmmu0tke54h5uhml0j8ndw3kcu9x",
},
"2" = {
pool_id = "pool10g0tvpyc3phkym8r6hamdulyzd6shzjldpahyvdkljl7ur2adfe",
}
}
EOF
## Create & init terraform workspace
terraform workspace new $DEPLOY_ENVIRONMENT
terraform init
## Plan terraform deployment
terraform plan --var-file=env.$DEPLOY_ENVIRONMENT.tfvars
## Apply terraform deployment
terraform apply --var-file=env.$DEPLOY_ENVIRONMENT.tfvars
## Connect to VM and list docker containers
ssh [email protected].$API_DOMAIN -- docker ps
ssh [email protected].$API_DOMAIN -- tree /home/curry/data
## Query aggregator REST API
curl -sk https://aggregator.demo-preview.$API_DOMAIN/aggregator/epoch-settings | jq .
watch -n 1 "curl -sk https://aggregator.demo-preview.$API_DOMAIN/aggregator/epoch-settings | jq ."
## Destroy terraform deployment
terraform destroy --var-file=env.$DEPLOY_ENVIRONMENT.tfvars
ssh-keygen -f "/home/jp/.ssh/known_hosts" -R "aggregator.demo-preview.$API_DOMAIN"
rm -f env.$DEPLOY_ENVIRONMENT.tfvars
rm -rf .terraform
rm -rf terraform.tfstate.d
rm -f .terraform.lock.hcl
-
We have discussed how we could implement a differential download system for immutable files:
- It would allow downloading a specific range of immutable files
- Parallelization would be easy to implement for snapshot chunks download, verify and restore
- In this setup, we would only sign the penultimate immutable file instead of the whole immutable folder
- We would also need to add a snapshot retrieve route by immutable file number
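As an illustration of how a range of immutable files could be split for parallel download, here is a minimal sketch. The function name, the `u64` file numbers and the fixed chunk size are assumptions made for the example; nothing here reflects an existing Mithril API.

```rust
use std::ops::RangeInclusive;

/// Splits the inclusive range of immutable file numbers `first..=last`
/// into consecutive chunks of at most `chunk_size` files.
/// Returns an empty list for an empty range or a zero chunk size.
pub fn immutable_chunks(first: u64, last: u64, chunk_size: u64) -> Vec<RangeInclusive<u64>> {
    if chunk_size == 0 || first > last {
        return Vec::new();
    }
    let mut chunks = Vec::new();
    let mut start = first;
    loop {
        // Last file number of this chunk, clamped to the end of the range.
        let end = start.saturating_add(chunk_size - 1).min(last);
        chunks.push(start..=end);
        if end == last {
            break;
        }
        start = end + 1;
    }
    chunks
}
```

Each chunk could then be downloaded, verified and restored by an independent worker.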
-
We have also paired on the issue Implement Release process #500:
- Conceptualizing and formalizing the case of `hotfix` for a release (added in the ADR of the release process #556)
- Implementing multi target compilation of the nodes: the Client binaries will be available for Linux, macOS and Windows and the Signer for Linux and macOS (in the PR Adapt ci workflow to Release Process #557)
- Stabilizing the deployment environments of issue Setup new hosted environments for testing-preview, pre-release-preview and release-preprod) with their terraform and GitHub environments #542
-
We have discussed and paired on the issue Implement Release process #500:
- Following our previous discussions, we have decided all the details regarding the handling of versions
- The decisions have been gathered in a new ADR, waiting for review in the PR ADR of the release process #556
- We have agreed on:
- Working on a distribution release that will package all the artifacts produced for the nodes/libraries
- Each node will have its own version
- A communication protocol version will be introduced to handle compatibility between nodes
- The CI will automatically append the hash of the commit for which the artifacts are being produced. This will allow a full artifacts promotion flow
- We will try to sign the releases, e.g. with GPG
-
We have also talked about:
- Issue Setup new hosted environments for testing-preview, pre-release-preview and release-preprod) with their terraform and GitHub environments #542: Work is in progress, should be completed shortly with full `terraform` environments
- Issue Simplify the Multi Signer in Aggregator #398: We will resume work on this issue shortly as there are no breaking changes planned on the multi signer currently
-
We had talks about issue Get/Show current version on Mithril nodes cli / APIs #541. We agreed that:
- The Client/Signer nodes should expose the version they run in the headers when requesting the Aggregator
- The Aggregator node should expose the version it runs in a header when it is called
- We could implement a version check system that returns an error message stating an update is required if the Aggregator version is not compatible with the Client/Signer version
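Such a version check could be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the Mithril nodes: the function names are hypothetical, versions are plain `MAJOR.MINOR.PATCH` strings, compatibility follows the usual semver convention (same major, or same minor while the major is `0`), a missing or malformed version is treated as incompatible, and the `406` status on mismatch is only one possible choice.

```rust
/// Parses "MAJOR.MINOR.PATCH" into numeric parts; returns None on malformed input.
pub fn parse_version(text: &str) -> Option<(u64, u64, u64)> {
    let mut parts = text.trim().split('.').map(|p| p.parse::<u64>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Semver-style rule: same major version, or same minor while major is 0.
pub fn is_compatible(aggregator: (u64, u64, u64), caller: (u64, u64, u64)) -> bool {
    if aggregator.0 == 0 {
        caller.0 == 0 && aggregator.1 == caller.1
    } else {
        aggregator.0 == caller.0
    }
}

/// HTTP status the Aggregator could answer with, given the version
/// announced by a Signer/Client in its request headers.
pub fn status_for(aggregator: (u64, u64, u64), caller_version: Option<&str>) -> u16 {
    match caller_version.and_then(parse_version) {
        Some(caller) if is_compatible(aggregator, caller) => 200,
        _ => 406,
    }
}
```

The error body returned along with the non-success status would carry the message stating that an update is required.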
-
We also discussed the issue Implement Release process #500:
- Adapt CI workflows to work with the new release process #543: In progress, has been tested in a temporary repository and new workflows will be added soon
- A first task of extracting the documentation generation in a separate workflow is in progress
- We think that we may need to handle the documentation a bit differently than the rest of the process:
- We need to produce new dev blog posts without releasing a new version
- We could use the versioning feature of `docusaurus` and publish pre-release/release versions on the same website
- This would require some manual operations on the developers' end
- There are advantages and drawbacks to this approach. We will keep on improving the design of this part of the process
-
We have investigated the case of an SPO who was unable to get the `Verified Signer` badge. It appears that their pool Id was spoofed by one of the Signer nodes running on the GCP platform. We have fixed this and merged the PR Fix mithril infra configuration #554
-
We have talked more about the release process during the team session:
- Regarding the management of the versions, we have worked on several ideas:
- We could have a hybrid version where the major+minor would be handled by the `cargo.toml` version and the patch would be handled remotely in a "directory" of all versions
- We could also handle the full version in a remote directory
- We could package the version in an external file dedicated to the versioning and that would be embedded in the GitHub package
- Another idea that looks simpler is to add a patch identifier that reflects the commit id, like `-{COMMIT_SHA1}` for example
- Regarding the documentation website:
- There will be only one version of the website that is deployed when a merge occurs on the main branch
- The website will support 2 versions: `current` and `next`
- We will create a commit post release that will update the `current` and `next` versions and also the versions in the `cargo.toml` file(s)
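The idea of a patch identifier that reflects the commit id could look like the sketch below. The function name and the choice of keeping only the first 7 characters of the hash are assumptions made for the example, not a decision recorded in this logbook.

```rust
/// Appends a short commit identifier to a base version,
/// following the `-{COMMIT_SHA1}` pattern.
/// Keeping 7 characters of the hash is an assumption of this sketch.
pub fn version_with_commit(base_version: &str, commit_sha: &str) -> String {
    let short: String = commit_sha.trim().chars().take(7).collect();
    if short.is_empty() {
        // No commit information available: keep the base version unchanged.
        base_version.to_string()
    } else {
        format!("{}-{}", base_version, short)
    }
}
```

In a CI pipeline the base version would come from the `cargo.toml` file and the hash from the commit being built.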
We had an introductory call today with Alex and the Mithril team. After some presentations, we went through the current state of Mithril and the short term roadmap, emphasizing our current target is to address the specific need of fast bootstrap of a full node.
Alex asked some questions about the roadmap:
- What do we think of distributing data using "alternative" to HTTP?
We think this is a good idea, we made room for it in the snapshot's schema, and we did not tackle it for want of time and because it seems something that can be contributed later
- What's the plan for deploying to mainnet and how much stake do you need?
We don't know exactly yet, one idea would be to grade the signatures according to the amount of stake while we ramp up. Besides, signers are known so that's also a possible source of trust
- How about speeding up client's state reconstruction process through some form of indexing (eg. think SPV + Bloom filter)?
That's something we explored briefly in the initial prototype phase. We want to make mithril "extensible" in the sense that SPO could sign various artifacts beside the node's db, which could make this feature possible
- What about the use case of a node/wallet catching up on a few months of activity?
Right now we "naively" sign and store full snapshots but obviously we want to chunk those for download and snapshot signing performance reasons
We agreed on these follow-up items:
- Answer any question Alex has on the dedicated discord channel (#moria of course)
- Alex is most welcomed to attend the bi-weekly demo/Q&A session.
If need be we are comfortable with the idea of "Mithril Office Hours" on a weekly basis should the community feels a need for it
- @Reza will be main contact point with the team when it comes to discussing features and roadmap
- Following the release of the experimental certified signers mode, we can now see some green badges next to the verified `PoolIds` in the certificates of the Explorer 🎉
- We have discussed the issue Get/Show current version on Mithril nodes cli / APIs #541:
  - The node will display its version when launched
  - We will add a `version` command on the CLIs that will output the running version
  - We will add headers with versions in the Signer and Client requests as well as in the Aggregator responses
- We have talked and paired on the issue Adapt CI workflows to work with the new release process #543. We have made a PoC of the pre-release pipeline in order to test that:
  - We catch the correct triggers ✔️
  - We can retrieve artifacts produced in a previous/different workflow run ✔️
  - We can produce GitHub releases from the workflows run ✔️
- During our discussions, we have talked about how to handle:
  - Adding new information that is part of the signed message (as Signers will probably not all upgrade at the same time). In that case, will it be possible to produce signatures under these conditions?
  - A solution could be to type certificates depending on what information is signed and to chain only the ones that embed the next stake distribution
  - We will probably have the same issue when we upgrade protocol versions that are not backward compatible
- We had talks and paired on the issue Implement Release process #500:
  - The process of artifacts promotion required some clarifications:
    - Each commit triggers a first `CI` workflow that builds artifacts and deploys to `testing` environment
    - Each git tag triggers a second `Pre-Release` workflow that promotes artifacts to `pre-release` environment and also creates the associated GitHub release (same name, in `pre-release` status)
    - When the release candidate is validated, the `pre release` status is removed from the GitHub release and that promotes artifacts to `release` environment
  - We have tried to define a process regarding when/how to update the versions of the crates:
    - One version for the workspace and one different version for `mithril-core`
    - Just after releasing a version `0.1.2`:
      - We commit a new `0.1.3-dev` version until we are happy with a release candidate (tagged `0.1.3-rcX`)
      - When we are ready to release a candidate, we update the version to `0.1.3` and we tag it as `0.1.3` (and re-test it)
      - We then release version `0.1.3` and we start this process all over again
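The version cycle described above (`0.1.2` → `0.1.3-dev` → `0.1.3-rcX` → `0.1.3`) can be sketched with a few string helpers. All function names are hypothetical and the sketch only handles plain `MAJOR.MINOR.PATCH` versions; it is an illustration of the naming scheme, not a tool we have written.

```rust
/// Parses a plain "MAJOR.MINOR.PATCH" version; returns None on malformed input.
fn parse_version(version: &str) -> Option<(u64, u64, u64)> {
    let mut parts = version.split('.');
    let major: u64 = parts.next()?.parse().ok()?;
    let minor: u64 = parts.next()?.parse().ok()?;
    let patch: u64 = parts.next()?.parse().ok()?;
    // Reject trailing components such as "0.1.2.3".
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Version committed right after a release: patch is bumped and "-dev" is appended.
fn next_dev_version(released: &str) -> Option<String> {
    let (major, minor, patch) = parse_version(released)?;
    Some(format!("{}.{}.{}-dev", major, minor, patch + 1))
}

/// Tag used for the n-th release candidate of a development version.
fn release_candidate_tag(dev_version: &str, candidate: u32) -> Option<String> {
    let base = dev_version.strip_suffix("-dev")?;
    Some(format!("{}-rc{}", base, candidate))
}

/// Final version set (and tagged) once a candidate has been validated.
fn final_version(dev_version: &str) -> Option<String> {
    dev_version.strip_suffix("-dev").map(str::to_string)
}
```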
- Following the activation of the experimental Signer certified registration, SPOs have reported troubles with their nodes:
  - Issue Unhelpful Log message #546: The error messages provided were not helpful to the users. We paired on improving them by giving detailed feedback on the bad request status code from the Aggregator
  - Issue Signer registration fails with key certification mode #548: a Signer trying to register by using the certification mode fails because the `KES Signature` can't be verified. This is still under investigation as the underlying cryptography is complex. In the meantime, we have paired and merged a temporary fix that tries all the possible `KES Period` values: the Signers are now registered. We expect them to be able to sign the snapshots in `2` epochs (a rebuild of their nodes is however mandatory)
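The temporary fix boils down to a brute-force search over the candidate periods. A minimal sketch of that idea follows; the `verify` closure stands in for the real KES signature verification, and both the function name and the upper bound parameter are assumptions of this example.

```rust
/// Returns the first KES period in `0..max_kes_evolutions` accepted by `verify`,
/// or None when no candidate period makes the signature valid.
/// `verify` is a placeholder for the actual KES signature check.
fn find_valid_kes_period<F>(max_kes_evolutions: u32, verify: F) -> Option<u32>
where
    F: Fn(u32) -> bool,
{
    (0..max_kes_evolutions).find(|period| verify(*period))
}
```

The cost is one verification per candidate period, which is acceptable for a stopgap but is another reason to compute the valid range explicitly later on.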
- We have noticed some warning messages in the CI jobs and have created the issue Update workflow github actions #550. A first PR will be merged shortly to update a first part of the GitHub actions. We will keep an eye on the other actions and update them as soon as updates are released
- We have also merged the PR remove SQL migration tool #540 which decommissions the data stores migration tools of the Aggregator and the Signer
- The PR New STM registration procedure #433 has finally been merged 🚀. A data store upgrade has been produced as well as some explanation about the process in a dev blog post. We are monitoring the GCP network and expect feedback from the community soon
- We have discussed and sliced the first tasks to be done on the new release process as in issue Implement Release process #500:
- We also had talks about:
  - The optimizations of the Docker images in issue Optimize Docker CI images #318:
    - We have recreated an ad hoc builder for the `devnet` images (and thus got rid of the legacy `libssl1.1` dependency)
    - We have aligned all the source images from `ubuntu:latest` to `ubuntu:22.04`
  - Serialization/Deserialization of the keys in the `entities` models:
    - We will try to implement automatic serialization/deserialization either with `serde` annotations or with a custom behavior
    - We may have to take care of log verbosity and implementation of custom display traits
- Key certification
We need to compute the KES range -> need to pass the KES period
- compute range for KES period from genesis parameters
- Pb: we don't know what's really useful on mainnet
- registration process requires each signer to know the key of every other signer
- write one or more CIPs for "Mithril Decentralisation"?
- Mithril networking CIP
- key registration process
- What about multi-pool runners? -> no need to take care of
- Signer deployment -> which deployment model? Mithril Deployment Model CIP? RFC?
- TODO:
- draft something for each "CIP" -> 2 pager (A3) respecting somewhat the structure of a CIP
- check with CIP process "guardians" whether or not they would fit -> Michael, Matthias
  a. if OK -> write the full thing
  b. if NOK -> turn into a GH discussion -> invite people from Community + IOG to react/comment/propose
- Use A3 format ?
- Edinburgh:
- JP -> presentation 15'
- Iñigo -> support + Q/A
- Arnaud -> test demo w/ daedalus
- Deadline EOW Slide deck for the talk
- Reza + Arnaud -> presentation 1-slide for CH keynote + Slide deck
- We had discussions about the `rename` attribute in `serde` annotations. It was used in almost all of the entities fields even though they had the same name as the JSON version. The redundant annotations have been removed in the PR Enhance serde annotations #538 that has been merged
- We have reviewed some build time analyses that have been produced in order to understand the bottleneck of the build step of the CI. We didn't find any interesting information and we think that it may be due to some cache loading issue within the CI. We will continue our investigations
- We have reviewed the modifications done to the PR New STM registration procedure #433. We are pairing on final modifications before we can merge it tomorrow
- Also, the PR use command and parameters for the client #536 has been reviewed and merged
- We have reviewed the last version of the issue Fix CLI args precedence in Client/Signer/Aggregator #511. Everything is done now, except the `digest` argument that could probably be passed without being named (as is the case until now). This will avoid breaking the implementation of current users of the client. The PR should be merged early next week
- Many comments have been received on the PR New STM registration procedure #433. They are currently being addressed and the PR should be merged early next week
- Here is the schema presented yesterday during the demo that illustrates the certification process of the Mithril verification keys:
  ![](https://user-images.githubusercontent.com/1381783/196631462-9d15d12c-7378-486a-a80a-846f86b96ca4.png)
- We have discussed the peer review that we made yesterday on the PR New STM registration procedure #433. All the points noted during the session have been fixed. The documentation part will be added today, stating that this feature is experimental. We will prepare a Dev Blog post in a separate review that will explain the next steps:
  - Testing of the new feature with volunteer SPOs for a transient period that allows both `Certified` and `Non Certified` SPOs for smooth transitioning
  - Improvement of the design so that it fits well with the SPO Cardano nodes architecture (Core/Relay/Firewall/Keys security)
  - Progressive deprecation of the `Non Certified` mode
- We have talked about the issue Fix CLI args precedence in Client/Signer/Aggregator #511 that will be merged very shortly once some issues with the test lab are fixed
- We have also worked on the demo path of this iteration:
  - Introduction
  - Showcase of the `Mithril Keys Certification` on the `devnet`
  - Presentation of the `Evolution of arguments handling in the CLI` of the nodes
  - Conclusion
  - Q&A
- Showcase path of the `Mithril Keys Certification`:
# Mithril Keys Certification
# On devnet, with evolving non-deterministic verification keys, with evolving stake distribution, with full Certificate chain, with keys certification
---
# Setup demo
## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git
## Checkout correct branch
cd mithril/
git switch mock_certification
cd mithril-client && make build && cp mithril-client ../../ && cd ..
cd ..
---
# Demo: Run devnet
## Start explorer
cd mithril/mithril-explorer
make dev &
cd ../..
google-chrome http://localhost:3000/explorer
## Change directory
cd mithril/mithril-test-lab/mithril-devnet
## Query Cardano
watch -n 1 NODES=cardano ./devnet-query.sh
## Logs Mithril
watch -n 1 NODES=mithril LINES=100 ./devnet-log.sh
## Start devnet with 5 pools
./devnet-stop.sh && NUM_POOL_NODES=5 DELEGATE_PERIOD=100 EPOCH_LENGTH=60 ./devnet-run.sh
---
# Demo: Restore a snapshot from devnet
## Prepare vars
NETWORK=devnet
AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator
GENESIS_VERIFICATION_KEY=5b33322c3235332c3138362c3230312c3137372c31312c3131372c3133352c3138372c3136372c3138312c3138382c32322c35392c3230362c3130352c3233312c3135302c3231352c33302c37382c3231322c37362c31362c3235322c3138302c37322c3133342c3133372c3234372c3136312c36385d
LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest')
echo $LATEST_DIGEST
## List snapshots
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client list
## Show snapshot details
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client show $LATEST_DIGEST
## Download snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client download $LATEST_DIGEST
## Restore snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client restore $LATEST_DIGEST
- We have also talked about:
  - The possibility to use the Mithril Signer as a process that would not be running as a daemon. It could be launched by a `cron` or the Cardano node itself at regular intervals
  - With that perspective, we could piggyback on the Cardano node which would be used to broadcast (Tx/Rx) the messages and store them in a bus. The Mithril Signer would use this bus whenever it is launched
- During the demo some interesting points were addressed:
  - The `Mithril Relay` design seems to be preferred by the SPOs as it would provide more security (the `Cardano Relay` is very likely to be subject to attack attempts)
  - We need to understand the impact of using `Operational Certificates` for `Multiple Pools` and see if this is a concern (as each server would have its own `Operational Certificate`)
- We have paired on resolving the issue that we discovered regarding `KES Period` usage in Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455:
  - Simple (but not complete) solution implemented: store the `KES Period` along with the `SignerWithStake` in the aggregator store and send it back to the Signers for them to make a valid key registration (even if the `KES Period` has expired when used)
  - Simple solution (next steps): enforce the range of `KES Periods` valid for an `Epoch`, which should be easily computable given the current `Slot` and the genesis parameters `slotsPerKESPeriod` and `maxKESEvolution` (it could be added to the Aggregator `Beacon` in the pending certificate or computed from the `Epoch` number directly on the Signer node)
  - More difficult solution: build the `KeyReg` at the same time on the Signers and Aggregator and store the `CloseReg` (and use it on the Signer when the time has come). This would require a broadcast/gossip mechanism between the nodes
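As a rough illustration of the "next steps" option, the sketch below derives KES periods from slots and checks whether a key is still within its allowed evolutions. It assumes the usual reading of the genesis parameters (a period is the integer division of the slot by `slotsPerKESPeriod`, and a key started at period `s` is usable for periods `s` to `s + maxKESEvolution - 1`); the function names are hypothetical and this reading should be confirmed against the Cardano node before relying on it.

```rust
/// KES period reached at a given slot (None if `slots_per_kes_period` is zero).
fn kes_period_at_slot(slot: u64, slots_per_kes_period: u64) -> Option<u64> {
    slot.checked_div(slots_per_kes_period)
}

/// Inclusive range of KES periods covered by the slots `first_slot..=last_slot`,
/// for example the first and last slots of an epoch.
fn kes_period_range(
    first_slot: u64,
    last_slot: u64,
    slots_per_kes_period: u64,
) -> Option<(u64, u64)> {
    let first = kes_period_at_slot(first_slot, slots_per_kes_period)?;
    let last = kes_period_at_slot(last_slot, slots_per_kes_period)?;
    Some((first, last))
}

/// True when a key whose operational certificate starts at `start_kes_period`
/// can still be used at `current_kes_period`.
fn is_kes_period_usable(
    start_kes_period: u64,
    current_kes_period: u64,
    max_kes_evolutions: u64,
) -> bool {
    current_kes_period >= start_kes_period
        && current_kes_period - start_kes_period < max_kes_evolutions
}
```

With such helpers, the Aggregator would only need to test the few periods returned by `kes_period_range` for the epoch instead of every possible value.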
- We also had some discussions on the design of the Signer (with Key Certification) given the topology of the Cardano nodes run by the SPOs:
  - We could use the `Core` node to process the signature w/ KES secret key given a message through the Cardano CLI
  - Or maybe use the `Relay` node to act as a proxy to make this operation
  - Some other discussions will take place to find the best architecture in the next weeks
- We have made a thorough peer review (with the whole team) of the PR New STM registration procedure #433 that should be merged before the end of the week 💪
- We discussed the security that should be applied to `Mithril Secret Keys` versus the `Cardano Secret Keys`:
  - The best option is to delete the keys as soon as the associated `Certificate` is produced
  - We must keep in mind that in case of an epoch gap in the `Certificate Chain`, we may need the keys for `1` more epoch
  - The storage of the keys is also a concern (maybe they should be on the Core node)
- Presentation of the team
- Discovery of the GitHub repository (Project, Wiki, ...)
- Discovery of the documentation website and of the Mithril Explorer
- Q&A session
- Plan next sessions in the following days/weeks
- We have discovered a problem with the `cargo2junit` plugin in the CI: the `test-mithril-core` job was not able to produce the test results file and failed. It did not make the CI fail completely, which was odd behavior (green in the "Actions" tab and red in the "Pull Requests" tab). We have identified that it was apparently due to a stale cache version on the CI. A fix has been merged with the PR Enhance Mithril networks documentation #534 that will avoid failure when that situation occurs
- We have also talked about the progress of the issues:
  - Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455:
    - Final review will be done this week and the work can be showcased during the iteration demo
    - Many discussions have taken place in the Discord channel regarding the security of the certification. They have been summarized in a discussion as well: How should we link the Mithril identity with Cardano identity #508
  - Fix CLI args precedence in Client/Signer/Aggregator #511: the work merged on the Aggregator can be showcased during the iteration demo and the adaptation of the Client is in progress
- We have reviewed the issue Fix CLI args precedence in Client/Signer/Aggregator #511:
  - The adaptation also needs to be done on the Client (so that the use of the `Genesis Verification Key` is mandatory only on the `restore` command)
  - The PR make parameters precedence on signer #529 has been merged
  - The PR Fix crash on startup GCP Aggregator #535 has been merged to fix the GCP infrastructure
- We have paired on implementing a `Verified Signer` on the Mithril Explorer for the Signers that have registered their SPO with the certification process as in Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455. Also the `devnet` Docker images were not working since the merge of the PR fix SQLite deadlocks #521
- We have talked about the process of Mithril Keys Certification which was challenged on the Discord channel:
  - The `Operational Certificate` does not need to be available on the Cardano chain (which means that any pool that has not produced blocks yet can register on a Mithril network)
  - The validation mechanism works this way:
    - The `Mithril Signer Verification Key` is signed by the `KES Secret Key` of the SPO
    - This signature is verified with the `KES Verification Key` stored in the `Operational Certificate`
    - The `Operational Certificate` is signed by the `Cold Secret Key`
    - This signature is verified with the `Cold Verification Key` stored in the `Operational Certificate`
    - The `PoolId` is computed as the hash of the `Cold Verification Key` stored in the `Operational Certificate`
    - This ensures that only the holder of the SPO `Cold Secret Key` is able to register its `PoolId` and `Mithril Signer Verification Key` on a Mithril network
  - We will open a GitHub discussion regarding this subject and we will as well create clear documentation for this feature
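The validation mechanism above can be sketched as three checks performed in sequence. Everything in this example is hypothetical: the struct layouts, the `Crypto` trait, the `signed_body` field (the exact bytes covered by the cold key signature are not described in these notes) and above all `ToyCrypto`, which is a stand-in used only to exercise the control flow and provides no security at all.

```rust
/// Hypothetical, simplified view of an operational certificate.
struct OperationalCertificate {
    kes_verification_key: Vec<u8>,
    cold_verification_key: Vec<u8>,
    /// Bytes covered by the cold key signature (layout left unspecified here).
    signed_body: Vec<u8>,
    cold_signature: Vec<u8>,
}

/// Hypothetical registration message sent by a Signer.
struct SignerRegistration {
    pool_id: Vec<u8>,
    mithril_verification_key: Vec<u8>,
    kes_signature: Vec<u8>,
    operational_certificate: OperationalCertificate,
}

/// Primitives are injected so the sketch does not depend on any real library.
trait Crypto {
    fn verify(&self, verification_key: &[u8], message: &[u8], signature: &[u8]) -> bool;
    fn hash(&self, data: &[u8]) -> Vec<u8>;
}

#[derive(Debug, PartialEq)]
enum RegistrationError {
    InvalidKesSignature,
    InvalidColdSignature,
    PoolIdMismatch,
}

fn verify_registration<C: Crypto>(
    crypto: &C,
    registration: &SignerRegistration,
) -> Result<(), RegistrationError> {
    let certificate = &registration.operational_certificate;
    // The Mithril verification key must be signed by the KES key of the certificate.
    if !crypto.verify(
        &certificate.kes_verification_key,
        &registration.mithril_verification_key,
        &registration.kes_signature,
    ) {
        return Err(RegistrationError::InvalidKesSignature);
    }
    // The certificate itself must be signed by the cold key it carries.
    if !crypto.verify(
        &certificate.cold_verification_key,
        &certificate.signed_body,
        &certificate.cold_signature,
    ) {
        return Err(RegistrationError::InvalidColdSignature);
    }
    // The declared pool id must be the hash of the cold verification key.
    if crypto.hash(&certificate.cold_verification_key) != registration.pool_id {
        return Err(RegistrationError::PoolIdMismatch);
    }
    Ok(())
}

/// Toy backend for illustration only, NOT cryptography: a "signature" is accepted
/// when it equals the key followed by the message, and the "hash" reverses the bytes.
struct ToyCrypto;

impl Crypto for ToyCrypto {
    fn verify(&self, verification_key: &[u8], message: &[u8], signature: &[u8]) -> bool {
        signature == [verification_key, message].concat().as_slice()
    }
    fn hash(&self, data: &[u8]) -> Vec<u8> {
        data.iter().rev().copied().collect()
    }
}
```

Injecting the primitives through a trait keeps the order of the checks visible while leaving the real KES and cold key verification to the dedicated libraries.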
- Following our work from last week, we have continued working on the setup of the new `Release Process`, as in issue Implement Release process #500:
  - ⚠️ The reset of the `preview` and `preprod` networks that will occur in the near future will require a new `Genesis Certificate` for the current `testing` environment
  - We agreed that the SPO that we host on the `testing` and `pre-release` environments will be in a `naive` setup at first (only one `Core` Cardano node, a `Relay` Cardano node will be added later). We won't apply heavy security requirements on the keys (cold/air-gap) and we will keep things simple and maintainable with automation
  - Once the artifacts of a commit are deployed on the `testing-preview` and/or `pre-release-preview` environments, we will launch automated `Smoke Tests` (to be defined) that will validate the conformity of the development (by testing the available routes and their responses, and that snapshots/certificates are produced after a deployment)
  - A `pre-release` deployment will be tested for 24 to 48 hours, depending on whether it is a minor or patch update, before being qualified as releasable
  - Some selected SPOs will be running some Signer nodes on the `pre-release-testing` environment and will provide some feedback before release
  - In case of a critical bug fix, the qualification phase will be drastically shortened and the main indicator that will be used will be `MTTR` (Mean Time To Repair)
  - We still have to find solutions on how to manage the release window length vs the merge locks that it could create
  - We will have to refine our vision of how to manage failing deployments with dedicated process/checklists
  - We will try to release a new version every 2 weeks, even if it only embeds crates update and small fixes
  - We have decided to implement a lightweight `Monitoring/Alerting/Status Page` solution: `uptime robot` that will help us closely monitor failing deployments and provide status feedback
- We have reviewed the PRs on the current issues:
  - Fix CLI args precedence in Client/Signer/Aggregator #511: We have paired on fixing the test lab that was not working properly and we have also made some optimizations concerning the default configurations handling. The PR is now completed and will be merged very shortly
  - Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455: the unit tests have been adapted so that the new key certification is properly tested along with the legacy declarative version. Some optimizations are in progress and a full pair review of the PR New STM registration procedure #433 will be done early next week (as well as documentation updates)
- We have reviewed the work in progress on the issues:
  - Fix CLI args precedence in Client/Signer/Aggregator #511: some refactoring of the commands and their arguments handling is in progress and should be completed shortly. This will fix a problem of having to use unnecessary arguments for some sub commands.
  - Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455: the adaptation of the test setups (module `test_setup` in `mithril-common`) is under development. In particular, it requires being able to generate on the fly `Operational Certificates`, `KES Key Pairs` and `Cold Key Pairs`
- We have talked about the issue Fix CLI args precedence in Client/Signer/Aggregator #511:
  - The problem is linked to the default value of the arguments passed by `clap` that is always used (even though an overriding value has been passed by an environment var or via a configuration file)
  - Some of the arguments used to set up the nodes thus work only if we use the `clap` arguments, which is not very convenient/coherent (as the vast majority of the others are set with environment vars)
  - The best solution is to not use default values for configuration that can be overridden (all except `run_mode` and `verbosity_level`)
  - We will formalize this rule in a dedicated `ADR`
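The rule can be illustrated with a small layered lookup where every source is optional and only the last step supplies a fallback. The function name and the exact ordering (CLI first, then environment, then configuration file) are assumptions made for this sketch; the real nodes rely on their own configuration machinery.

```rust
/// Resolves one setting from layered sources, highest priority first.
/// Each layer is `None` when the user did not provide a value there.
fn resolve_setting(
    cli: Option<&str>,
    env: Option<&str>,
    file: Option<&str>,
    fallback: &str,
) -> String {
    cli.or(env).or(file).unwrap_or(fallback).to_string()
}
```

If the CLI layer is always filled with a default value (that is, it is never `None`), it shadows the environment and the configuration file. This is the behavior described in the issue, and the reason why overridable arguments should not carry a default at the CLI level.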
- We have also made a deep review of the PR New STM registration procedure #433 that is linked to issue Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455:
  - The development of the first phase is close to ready and we hope to merge it soon
  - It will not include breaking changes as the Signer and Aggregator will be able to work in hybrid modes:
    - Declarative mode with a non certified `PoolId` (as already running)
    - Certified mode with a certified `PoolId` (activated only when a Signer is associated to an `Operational Certificate` and a `KES Secret Key`)
  - A second phase will involve the development of a dedicated `Mithril Certifier` that will help handle a `KES Secret Key` that will not be stored on the same Cardano node (Core) as the `Mithril Signer`, which will be running on top of the `Relay` Cardano node
- We had discussions about the issue Move stores to relational design with SQLite #476 for which we will probably proceed in multiple phases (Signer + Aggregator):
  - Use a relational data model that will be used to implement the current `Store traits`
  - Implement a data model upgrade a la `sqitch`
  - Refactor (if needed) the several `Store traits` used to access this data
  - Create ways of aggregating the relational data (with new routes to access them). We will need to dedicate a session for this
- We talked about the next steps following the setup of our first SPO on `preview`:
  - Automate with scripts to deploy easily with `terraform` in the different environments
  - Handle pool metadata hosting on the documentation website
  - Implement the Core/Relay nodes topology
  - Work on automating the rotation of the keys
  - We will dedicate a session to these next steps
- We also discussed the progress of the issue Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455 which is close to ready:
  - Test adaptation to do (vs hybrid mode of the Signer/Aggregator certification for a smooth transition with SPOs)
  - Updating documentation to reflect the changes
  - Write a blog post to explain the Certification activation road map (with `Mithril Certifier` to come)
- We have paired and merged on the issue migrate snapshot store to SQLite #518. The last store has been successfully migrated on the GCP hosted Aggregator 🎉 However, we faced some difficulties with dynamic libraries (`libssl`) that were different between the compiled binary and the running OS, which made the `migrator` binary crash. We will have to take care of these details when ramping up the new release process (issue Implement Release process #500)
- We have discussed the progress of the issue Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455
- We have also paired on setting up a SPO node from scratch on the `preview` network:
  - We have followed this guide
  - After a few attempts, we have been able to activate a pool on the `preview` network that is pool15qde6mnkc0jgycm69ua0grwxmmu0tke54h5uhml0j8ndw3kcu9x 💪
  - We will keep on working on streamlining the setup of SPOs for our `testing` and `pre-release` future environments, as well as the management tasks of a SPO to use them in the long run
- We have paired on the issue Fix database dead locks in Aggregator #517. The solution that we have implemented is the following:
  - Add a minimum version of `SQLite`: `3.35+` so that we can use `DELETE...RETURNING` statements that avoid explicit use of transactions
  - Update the CI so that it embeds this minimum version of `SQLite`
  - Add a retry mechanism to fetching data (simple but efficient with fixed sleep duration and max retry limit)
  - We have merged the PR fix SQLite deadlocks #521
  - We will keep watching if the database locks keep occurring on GCP and on the CI
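A retry mechanism with a fixed sleep duration and a maximum number of attempts can be sketched as the generic helper below. The name `retry` and its signature are assumptions of this example and do not reflect the code merged in the PR.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Runs `operation` until it succeeds or `max_attempts` is reached,
/// sleeping for a fixed `delay` between two attempts.
/// The operation is always tried at least once.
fn retry<T, E, F>(max_attempts: u32, delay: Duration, mut operation: F) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut attempt = 1;
    loop {
        match operation() {
            Ok(value) => return Ok(value),
            // Out of attempts: report the last error to the caller.
            Err(error) if attempt >= max_attempts => return Err(error),
            Err(_) => {
                attempt += 1;
                sleep(delay);
            }
        }
    }
}
```

A fixed delay keeps the behavior easy to reason about; a backoff strategy could replace it later if lock contention turns out to be bursty.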
- We had talks about the evolution of the stores that will be required by the issue Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455:
  - We will probably prepare a manual update script (as only the Aggregator is concerned with this upgrade)
  - We definitely need to work with a relational data model soon to smoothly handle this type of upgrade (that could also occur on the Signer)
  - This will be addressed in issue Move stores to relational design with SQLite #476
- We have also prepared a demo path for the first demo with the members of the Mithril Pioneer Program:
  - Introduction
  - Showcase of the `Genesis Certificate` on the `devnet`
  - Presentation of the milestone of `10` SPOs signing on our `preview` network
  - Presentation of the `Dev Blog`
  - Showcase of the SQLite migration
  - Presentation of the `Store Retention` feature
  - Presentation of the upcoming `Release Process`
  - Conclusion
  - Q&A
- Here is the showcase path for the `Genesis Certificate` on the `devnet`:
# Mithril Genesis Certificate
# On devnet, with evolving non-deterministic verification keys, with evolving stake distribution, with full Certificate chain, without keys certification
# Resources
## Github
google-chrome https://github.com/input-output-hk/mithril
## Website
google-chrome https://mithril.network/doc
## Explorer
google-chrome https://mithril.network/explorer/
---
# Setup demo
## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git
## Checkout correct commit
cd mithril/
git checkout b7069fd6281f21052f90b80d149f743471c63bbe
cd mithril-client && make build && cp mithril-client ../../ && cd ..
cd ..
---
# Demo: Run devnet
## Start explorer
cd mithril/mithril-explorer
make dev &
cd ../..
google-chrome http://localhost:3000/explorer
## Change directory
cd mithril/mithril-test-lab/mithril-devnet
## Query Cardano
watch -n 1 NODES=cardano ./devnet-query.sh
## Logs Mithril
watch -n 1 NODES=mithril LINES=100 ./devnet-log.sh
## Start devnet
./devnet-stop.sh && DELEGATE_PERIOD=100 EPOCH_LENGTH=60 ./devnet-run.sh
---
# Demo: Restore a snapshot from devnet
## Prepare vars
NETWORK=devnet
AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator
GENESIS_VERIFICATION_KEY=5b33322c3235332c3138362c3230312c3137372c31312c3131372c3133352c3138372c3136372c3138312c3138382c32322c35392c3230362c3130352c3233312c3135302c3231352c33302c37382c3231322c37362c31362c3235322c3138302c37322c3133342c3133372c3234372c3136312c36385d
LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest')
echo $LATEST_DIGEST
## List snapshots
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client list
## Show snapshot details
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client show $LATEST_DIGEST
## Download snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client download $LATEST_DIGEST
## Restore snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client restore $LATEST_DIGEST
- We have paired on the issue Fix database dead locks in Aggregator #517. After investigation, it appears that although we have implemented the store adapters behind `RwLock`, in some situations a database lock is possible:
  - If a transaction is opened by an adapter, the whole database is locked. Thus an attempt to make a query will result in an `Error 5: database is locked`, until the transaction is committed or rolled back
  - A first issue is that when an error occurred during the transaction, it was never closed, which resulted in a permanent lock of the database (until the service was restarted)
  - We are working on some improvements that will make the system more resilient and efficient (although it requires some modifications on the CI to make sure the version of `sqlite` is at least `3.35`)
  - We will continue working on this issue tomorrow as it also creates some flakiness in the CI test lab runs
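One common way to make sure a transaction is always closed, even on an early error return, is a guard that rolls back when dropped without having been committed. The sketch below shows the pattern against a stand-in connection that only records statements; `RecordingConnection`, `Transaction` and `save_record` are hypothetical names and this is not the approach that was merged (which avoids explicit transactions altogether).

```rust
use std::cell::RefCell;

/// Stand-in for a database connection: it only records the statements it receives.
struct RecordingConnection {
    statements: RefCell<Vec<String>>,
}

impl RecordingConnection {
    fn new() -> Self {
        RecordingConnection { statements: RefCell::new(Vec::new()) }
    }
    fn execute(&self, sql: &str) {
        self.statements.borrow_mut().push(sql.to_string());
    }
    /// All recorded statements, joined for easy inspection.
    fn log(&self) -> String {
        self.statements.borrow().join("; ")
    }
}

/// Guard that issues ROLLBACK when dropped unless `commit` was called.
struct Transaction<'c> {
    connection: &'c RecordingConnection,
    committed: bool,
}

impl<'c> Transaction<'c> {
    fn begin(connection: &'c RecordingConnection) -> Self {
        connection.execute("BEGIN");
        Transaction { connection, committed: false }
    }
    fn commit(mut self) {
        self.connection.execute("COMMIT");
        self.committed = true;
    }
}

impl Drop for Transaction<'_> {
    fn drop(&mut self) {
        if !self.committed {
            self.connection.execute("ROLLBACK");
        }
    }
}

/// Example operation: an early error return still closes the transaction.
fn save_record(connection: &RecordingConnection, fail: bool) -> Result<(), String> {
    let transaction = Transaction::begin(connection);
    connection.execute("INSERT");
    if fail {
        // `transaction` is dropped here, which triggers the ROLLBACK.
        return Err("insert failed".to_string());
    }
    transaction.commit();
    Ok(())
}
```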
- We had discussions about:
  - Signer Registration (see discussion How should we link the Mithril identity with Cardano identity #508): it can be trusted because of the Genesis Certificate, so there is no specific problem with it
  - Stake Distribution (new discussion to be set up to share this information with the community): understanding the portion of the stake that is required to be secure, and how to possibly ramp up Mithril on the `mainnet` in multiple phases (with the involvement of IOG stake at first until we reach the required portion of all Cardano stake)
  - Batch verification of the Certificates multi-signatures which would be provided by a batch verification function in the core library. This would involve a slightly different way of validating the Certificate Chain to take advantage of this feature
- We have reviewed and merged the issue Add auto pruning in stores #504 🎉:
- We have discovered a bug that is responsible for deadlocks on the Aggregator database and created an issue Fix database dead locks in Aggregator #517
- We have created some issues with features that we need to implement or low priority bugs we need to fix:
- We have planned the topics that will be showcased during the demo of the iteration:
  - SQLite migration
  - Genesis Certificate on the Certificate Chain
  - New Dev Blog section in the documentation website
  - 10 signing SPOs on the `preview` network milestone
  - Release process under construction
- We have talked about the issue Add auto pruning in stores #504 that is almost ready and should be merged shortly
-
We have reviewed and discussed in depth the issue Implement Release process#500:
- Issue is completed, need to flesh it out in the form of a document?
- Deployment of hosting environments is dependent on some work about deploying custom SPOs
- Have a single version for all crates?
- How to handle version for artifacts that do not change but get promoted?
- References:
- Information about the build number needs to be added (version = sha1 + build number)
-
We have decided to dedicate a future session to setting up an SPO pool as explained in this guide to better understand the way SPOs work
-
We have also discussed the issue Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455:
- We will create a poll inside the "Discussions" tab in order to get a better understanding of how SPOs host their preview and preprod pools vs mainnet (Core + Relay / with firewall rules, Core + Relay / No firewall rules, Core only)
- A first possible design to properly handle the certification is a Proxy version:

- A preferred design (that should be better suited to the SPOs) is an Async Validator version:
  - Signer creates key material to sign when crossing the epoch threshold (the protocol initializer with its associated verification key)
  - Validator calls the Signer when "ready" (on cron, or manually) and asks for key material to sign
  - Validator uses hot KES keys to sign the key material and sends it to the Signer
  - Signer can then start the registration process once it has signed material

- In the Mithril Explorer we will display the security level (or probability of an adversarial party to create a fake certificate) on each snapshot (and provide the formula used to compute it when hovering the protocol parameters displayed)
-
We had talks about the issue Add auto pruning in stores #504:
- It appears that there was a bug in the MemoryAdapter where the get_last_n_records function retrieved the last n records sorted by date of update instead of date of creation. This bug was fixed.
- However, there was a bug in the implementation and in its test. We have discussed how we could create some trait-related tests that would help us spot such a problem easily (and also help qualify that a new implementation of the traits is "correct")
- We have also talked about how to handle the configuration of the retention length on the stores: if none is specified (as is currently the case), full retention is applied; if a retention length is specified, then this length is used to prune the stores
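The retention rule discussed above can be sketched as follows in Rust. This is only an illustration under stated assumptions: the PrunableStore type, its retention_limit field and its save method are hypothetical names invented for this example (only get_last_n_records comes from the discussion), and the real store adapters may be organized differently.

```rust
use std::collections::VecDeque;

/// Hypothetical in-memory store keeping records in creation order (oldest first).
struct PrunableStore<V> {
    records: VecDeque<V>,
    /// `None` means full retention; `Some(n)` keeps only the `n` newest records.
    retention_limit: Option<usize>,
}

impl<V: Clone> PrunableStore<V> {
    fn new(retention_limit: Option<usize>) -> Self {
        Self { records: VecDeque::new(), retention_limit }
    }

    /// Appends a record, then prunes the oldest ones if a retention length is set.
    fn save(&mut self, record: V) {
        self.records.push_back(record);
        if let Some(limit) = self.retention_limit {
            while self.records.len() > limit {
                self.records.pop_front();
            }
        }
    }

    /// Returns up to `n` records, most recently created first.
    fn get_last_n_records(&self, n: usize) -> Vec<V> {
        self.records.iter().rev().take(n).cloned().collect()
    }
}

fn main() {
    let mut full = PrunableStore::new(None);
    let mut pruned = PrunableStore::new(Some(2));
    for epoch in 1..=4u64 {
        full.save(epoch);
        pruned.save(epoch);
    }
    println!("full retention: {:?}", full.get_last_n_records(10));
    println!("retention of 2: {:?}", pruned.get_last_n_records(10));
}
```

Ordering by insertion (creation) rather than by last update is the behaviour the fixed get_last_n_records is expected to have; a trait-level test could run the same assertions against every adapter.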
-
We have talked about the discussion Use CIP-22 as a way to identify SPOs when registering keys #507:
- The idea behind it is the same as the one under implementation in the PR New STM registration procedure #433:
  - Asking the owner of the pool to sign a message with its secret key in order to prove it owns this secret key
  - In CIP-22:
    - The message signed has no meaning and is randomly generated by the verifier of the ownership
    - The secret key used is the VRF Secret Key, which is a hot non-rotated key (but for which there is no Rust library available for signing/verifying)
  - In our proposal:
    - The message signed is the actual Mithril Signer Verification Key, valid for 1 epoch
    - The secret key used is the KES Secret Key, which is a hot rotated key (for which a Rust library is available, done by IOG at https://github.com/input-output-hk/kes)
- The architecture of a Cardano SPO on the mainnet implies that:
  - A Core Server hosts a Core (or Block Producing) Cardano node, which is a Full node that has access to SPO hot secret keys and is isolated from the rest of the world (except that it is allowed to communicate with one or multiple associated Relay nodes)
  - A Relay Server hosts a Relay Cardano node, which is a Full node that is accessible from other external Cardano node peers, but does not have access to the SPO secret keys
- A naive setup for running a Certified Mithril Signer (devnet or preview) requires that the Mithril Signer node has access to:
  - An Aggregator that is external to the SPO infrastructure via a REST API (to send individual signatures)
  - The database of a local Cardano Full node via file system (to compute snapshot digests and stake distribution)
  - The SPO hot secret keys (and operational certificates) via file system (to compute the signature that certifies the SPO is genuine)
- A more elaborate setup (preprod or mainnet) would probably require that we split the Mithril Signer in 2 parts:
  - A first part running on the Core server, only responsible for signing the Mithril Signer Verification Keys (when requested by the other part)
  - A second part running on the Relay server and responsible for the rest of the Mithril protocol (registering with Aggregator, sending individual signatures, ...)
- Here is a sketch of the naive setup:
- And a sketch of the real setup:
-
As expected, 2 epochs after applying the fix on the Stake Distribution computation of issue Stake distribution discrepancy #497, the Signers have been able to reliably produce individual signatures that are successfully registered on the Aggregator 💪
-
We have followed up on the merge of the issue Deploy SQLite store adapter #475. We have made some fixes on the migrators. We have helped the SPOs who had a hard time migrating some of their stores and everything looks good now 🎉
-
We have talked about a nice-to-have feature of automatically pruning the stores of the Signer/Aggregator nodes. This will be implemented shortly in the issue Add auto pruning in stores #504
-
Also, we have paired on the issue Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455. We are working on a plan to smoothly deploy the feature to the SPOs before activating it on the Aggregator, so that a transition window will be opened for SPOs to deploy the change on their Signer nodes. We will keep on pairing on this complex topic during this iteration
-
Following the merge of the issue Stake distribution discrepancy #497, the stake stores on GCP (Aggregator and Signers) are OK. We are keeping an eye on the list of signers in the Certificates from epoch 37, which should embed new Signers, and on the error rate of the individual signatures registration, which should drop
-
We have paired and merged the issue Deploy SQLite store adapter #475 that activates the new SQLite data store:
- The Aggregator and the Signers nodes running on GCP have been successfully migrated to use the new store adapter
- We encountered a few difficulties when migrating the Aggregator stores. It appears that being able to qualify the migration on a testing environment would have been very helpful
- We are expecting the SPOs to migrate their stores (as explained in this dev blog post)
-
We have continued working on the Release Process setup:
- A dedicated issue has been created, Implement Release process #500, and some tasks have been added to it
- Here is the updated definition of the process:
  - We will use a common version (semver) for all the crates of the repository and for the GitHub release
  - All the nodes should be able to display the current version they are running
  - In case of a version mismatch, the Aggregator should return an error so that the Signer/Client nodes are updated regularly
  - We will work with GitHub environments to support deployments of versions on multiple environments
  - A new version 0.1.2 will have the following life cycle:
    - A commit abc123 merged on main branch is deployed on testing environment named testing-preview
    - A commit def456 tagged with 0.1.2-prerelease1 is deployed on preprod environment named pre-release-preview
    - A GitHub release 0.1.2 is created and linked with the 0.1.2-rc1 tag and marked as pre-release
    - A tag 0.1.2-prerelease1 is qualified and selected for release or rejected (and replaced by a 0.1.2-prerelease2 tag if necessary, on a commit ghj789)
    - If the tag 0.1.2-prerelease1 is selected, a new tag is created and named 0.1.2 on the same commit def456
    - The GitHub release is linked to the 0.1.2 tag and marked as release
    - The commit def456 with tag 0.1.2 is deployed to the prod environment named release-preprod
- Some questions remain:
  - When to update cargo.toml crates version vs creation of the draft release on GitHub?
  - How to handle merge lock during qualification of a release candidate (with only main branch)? (Use of feature flag?)
  - How to handle Protocol Versions smoothly (backward compatibility of messages w/ Avro or equivalent solution)?
  - How to simplify the update process for the SPOs (with debian package for example)?
  - How to handle real SPOs on the testing-preview and pre-release-preprod environments (vs key rotations, secret keys management, ...)?
- The deployment schema is now:
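To make the life cycle above more concrete, here is a minimal Rust sketch that maps a deployment trigger to its target environment. The Trigger and Environment types and the function names are hypothetical, and the tag format is an assumption: only the `<version>-prerelease<N>` form used in the steps above is recognized, any other tag is left unmapped.

```rust
/// Deployment environments named as in the life cycle above.
#[derive(Debug, PartialEq)]
enum Environment {
    TestingPreview,    // "testing-preview"
    PreReleasePreview, // "pre-release-preview"
    ReleasePreprod,    // "release-preprod"
}

/// Hypothetical description of the event that triggers a deployment.
enum Trigger<'a> {
    MergeOnMain,
    Tag(&'a str),
}

/// Returns true if the text is a non-empty sequence of ASCII digits.
fn is_number(text: &str) -> bool {
    !text.is_empty() && text.chars().all(|c| c.is_ascii_digit())
}

/// Returns true for a plain `MAJOR.MINOR.PATCH` version such as `0.1.2`.
fn is_plain_version(text: &str) -> bool {
    let parts: Vec<&str> = text.split('.').collect();
    parts.len() == 3 && parts.iter().all(|p| is_number(p))
}

/// Maps a trigger to its target environment, or `None` for an unknown tag format.
fn target_environment(trigger: &Trigger) -> Option<Environment> {
    match trigger {
        Trigger::MergeOnMain => Some(Environment::TestingPreview),
        Trigger::Tag(tag) if is_plain_version(tag) => Some(Environment::ReleasePreprod),
        Trigger::Tag(tag) => {
            // Assumed pre-release format: `<version>-prerelease<N>`.
            let (version, suffix) = tag.split_once('-')?;
            let counter = suffix.strip_prefix("prerelease")?;
            (is_plain_version(version) && is_number(counter))
                .then_some(Environment::PreReleasePreview)
        }
    }
}

fn main() {
    let triggers = [
        Trigger::MergeOnMain,
        Trigger::Tag("0.1.2-prerelease1"),
        Trigger::Tag("0.1.2"),
    ];
    for trigger in triggers {
        println!("{:?}", target_environment(&trigger));
    }
}
```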
-
We have reviewed and merged the issue Stake distribution discrepancy #497:
- The Stake Distribution should get back to normal 2 epochs after rebuilding the Signer
- We will keep monitoring the GCP hosted Aggregator to check that the deployment goes well and does not prevent the Snapshot production.
- The SPOs should rebuild their Signer node (as explained in this dev blog post)
-
We have paired on the issue Deploy SQLite store adapter #475 and finalized the steps to follow in order to smoothly migrate the Signer/Aggregator nodes stores. The PR Use Sqlite datastore in Aggregator & Signer #477 should be merged tomorrow
-
We have also worked on defining the Release Process for the Mithril Network:
- We will use a common version (semver) for all the crates of the repository and for the GitHub release
- All the nodes should be able to display the current version they are running
- In case of a version mismatch, the Aggregator should return an error so that the Signer/Client nodes are updated regularly
- We will work with GitHub environments to support deployments of versions on multiple environments
- A new version 0.1.2 will have the following life cycle:
  - A commit abc123 merged on main branch is deployed on testing environment named testing-preview
  - A commit def456 tagged with 0.1.2-prerelease1 is deployed on preprod environment named pre-release-preview
  - A GitHub release 0.1.2 is created and linked with the 0.1.2-rc1 tag and marked as pre-release
  - A tag 0.1.2-prerelease1 is qualified and selected for release or rejected (and replaced by a 0.1.2-prerelease2 tag if necessary, on a commit ghj789)
  - If the tag 0.1.2-prerelease1 is selected, a new tag is created and named 0.1.2 on the same commit def456
  - The GitHub release is linked to the 0.1.2 tag and marked as release
  - The commit def456 with tag 0.1.2 is deployed to the prod environment named release-preprod
  - Diagram of the release process is below:
-
We have talked about the nearly ready to merge issue Deploy SQLite store adapter #475:
- How long do we keep the migration binaries available before decommissioning them? (From 2 to 4 weeks)
- How to communicate with the SPOs about that breaking change and provide them with simple yet efficient documentation? (This will be implemented inside a dedicated dev blog post)
-
We have reviewed and merged the Record 'contributing' Signers only in Certificate #495
-
We had discussions about the issue Stake distribution discrepancy #497 that makes the Stake Distribution computation non-deterministic and is the source of "A provided signature is invalid" error messages when a Signer submits individual signatures. In order to swiftly fix the problem, we have defined a plan:
- Solution 1: Add a feature that makes the Stake Store always retrieve the same Stake Values until a better solution is found (worst case scenario; this will not be necessary, as we moved to Solution 2 directly)
- Solution 2: Compute the Stake Distribution differently by gathering the Stakes from the previous epoch pool by pool (best solution for testnet; solution that is under development in the PR Fix Stake Distribution retrieval #499)
- Solution 3: Modify the cardano-cli so that it computes the stake distribution at the previous epoch (better solution for long term and mainnet; we will explore it in the future)
- Solution 4: Package a custom developed CLI in Haskell that will query the ledger state and retrieve the Stake Distribution of the correct epoch (good solution, but the drawback is that we need to package/deliver several binaries at once)
-
Other solutions have been debated such as calling Haskell functions from Rust or using a third party chain indexer
-
We have postponed the talks about the release process and we will resume them tomorrow during a dedicated session.
-
We have agreed that a relevant test case of Daedalus/Mithril would be to bootstrap a mainnet archive with/without Mithril snapshot. This will require that we run a mainnet "test" environment. This will be part of our release/environments concerns/discussions
-
Also, as we have been using the Cardano infrastructure (node/cli) quite a lot during our developments, we will organize a retrospective to give some feedback about it
-
The Genesis Certificate deployment worked as expected and new Snapshots are now available on the Mithril Explorer 🎉
-
We have reviewed and paired on the PR Use Sqlite datastore in Aggregator & Signer #477 of the issue Deploy SQLite store adapter #475 with a main focus on the migration tool that is being built in order to migrate existing JSON stores to SQLite. We are at the stage of making the tool as easy to use as possible for the SPOs that will use it. Also, we will create a How to migrate stores guide and a post on the dev blog that explains why and how to use this tool. We should be able to merge next week
-
We have also reviewed, paired and merged many fix and improvement PRs:
-
The PR Implement Real Genesis Certificate #438 has been merged and deployed successfully on the GCP Aggregator. However, we had a hard time running the genesis bootstrap command. A fix is available in the PR Update Genesis GCP infra #487. The first Genesis Certificate has been generated and saved successfully at epoch 29 of the preview network and we should see new Certificates produced as soon as the transition to epoch 30 has taken effect 🎉
-
We have also worked on the preparation of the migration from JSON to SQLite stores (which must take place on the Aggregator as well as the Signers), and have identified a few options:
- Add a specific command line in Aggregator/Signer to handle the migration
- Handle the migration with dedicated scripts, which would be cumbersome and does not look like the best option
- Add a new binary build in the cargo projects of Aggregator/Signer (that looks like the best option to take advantage of the CI and drop the code within a short time frame after release)
-
Once we have migrated our stores to SQLite, we will move on to the relational implementation of the stores. We will have to work on an upgrade mechanism that will automatically upgrade the database schema when required
-
We had discussions about the Signers displayed in the Certificates of the Explorer:
- We could display the stakes as ADA value or as percentage of the total stakes enrolled in the Mithril network
- We could also display which Signers have their individual signatures included in the certificate
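The two display options above boil down to simple arithmetic, sketched here in Rust. It is assumed, for the sake of the example, that stakes are stored in lovelace (1 ADA = 1,000,000 lovelace); the function names are hypothetical.

```rust
/// Number of lovelace in one ADA.
const LOVELACE_PER_ADA: u64 = 1_000_000;

/// Converts a stake expressed in lovelace (assumption) into an ADA value.
fn stake_in_ada(stake_lovelace: u64) -> f64 {
    stake_lovelace as f64 / LOVELACE_PER_ADA as f64
}

/// Share of a Signer's stake in the total stake enrolled in Mithril, in percent.
fn stake_share_percent(stake: u64, total_enrolled_stake: u64) -> f64 {
    if total_enrolled_stake == 0 {
        return 0.0;
    }
    100.0 * stake as f64 / total_enrolled_stake as f64
}

fn main() {
    // Made-up stakes of three Signers, in lovelace.
    let stakes = [2_500_000u64, 5_000_000, 2_500_000];
    let total: u64 = stakes.iter().sum();
    for stake in stakes {
        println!(
            "{} ADA ({}% of enrolled stake)",
            stake_in_ada(stake),
            stake_share_percent(stake, total)
        );
    }
}
```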
-
We have added a new Dev Blog on the documentation website. This will help handle communications with the SPOs regarding breaking changes, deprecated features, new versions release, ...
We need to work on the release process in order to manage correctly the evolution of the network with SPO users. We have talked about the options and questions we have, and will address them in a dedicated session:
- Rhythm of releases
- Versioning of the crates vs the GitHub tags
- Validation of the release candidates
- Trunk based or Gitflow?
- Packaging of the releases
- Automatic updates?
-
We have talked about the possible implementations of the optimization described in issue Extend API to accept signature generation without Merkle path #161 and in PR Switch blst #159
-
We also talked about the way we could create more compact certificates by avoiding duplication of the common parts of the Merkle paths stored
-
We have discussed the way we could provide a Security Level of the Chain on the Mithril Explorer, which relates to issue Include probability of success for different parameters #48. Researchers will provide a formula based on the k, m, and phi_f protocol parameters that can be used to compute the probability that an adversarial party produces a valid multi-signature
-
We have discussed the evolution of the protocol parameters, and Researchers will come back with a proposed set of parameters that fits the number of Signers involved in the network
-
Finally, we talked about the RFP regarding the understanding of the impact of the percentage of the stakes involved in the network vs the security level, as it appears that the paper assumption of 100% stakes involved is not realistic. Also some very different scenarios can occur when we think about only a share of the stakes involved: if 10% of stakes are involved in Mithril network and 10% of the stakes of the Cardano network are considered adversarial, do we consider that 100% (all the adversaries of Cardano) or 10% (the share of adversaries of Cardano) of the Mithril stakes are adversarial?
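The two readings of the question above can be worked out numerically. The Rust sketch below is only an illustration of the arithmetic, not a security analysis: the function names are hypothetical, and both inputs are fractions of the total Cardano stake.

```rust
/// Worst case: all the adversarial Cardano stake registers in Mithril.
/// Returns the adversarial fraction of the Mithril stake, capped at 100%.
fn worst_case_adversarial_share(participation: f64, adversarial: f64) -> f64 {
    assert!(participation > 0.0, "some stake must participate in Mithril");
    (adversarial / participation).min(1.0)
}

/// Proportional case: adversaries are spread evenly among Cardano stake,
/// so the Mithril stake contains the same adversarial fraction as Cardano.
fn proportional_adversarial_share(adversarial: f64) -> f64 {
    adversarial
}

fn main() {
    // 10% of the Cardano stake takes part in Mithril, 10% of it is adversarial.
    let (participation, adversarial) = (0.10, 0.10);
    println!(
        "worst case: {:.0}%",
        100.0 * worst_case_adversarial_share(participation, adversarial)
    );
    println!(
        "proportional: {:.0}%",
        100.0 * proportional_adversarial_share(adversarial)
    );
}
```

With 10% participation and 10% adversarial stake, the worst case gives 100% and the proportional case gives 10%, which are the two figures mentioned in the question.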
-
We have been reviewing and finalizing the PR Implement Real Genesis Certificate #438. It is ready and will be merged tomorrow. Here are the operational implications:
- Reset the Certificate Chain of the GCP hosted Aggregator
- Bootstrap the Genesis Certificate on the GCP Aggregator
- Requires that the SPOs recompile their Signer node (to handle faster registration), but the previous version is compatible and will continue working
-
Regarding the flakiness of the CI:
- We attribute it to the way the Stake Distribution is computed by the cardano-cli
- The expected error rate on the CI is ~4%. If this rate gets too high, we will have to deactivate the stake delegation feature of the test lab until we find a better solution
-
We have also worked on the migration of the stores of the Aggregator/Signer to SQLite as in Deploy SQLite store adapter #475. We still have a few issues to fix and we will also work on an automatic upgrade mechanism (especially on the Signer side) before merging
-
We have merged the issue Deploy mithril demo infra on 'preview' network #457 (as well as the PR Update Blake dependency #474). The Aggregator hosted on GCP is now running on the preview network and producing snapshots 💪
-
We have debriefed about the previous session and the Certification of the Mithril Signer Verification Keys and we all agreed on the next steps discussed previously
-
We have spent some time digging into the Haskell code that computes the stake distribution and we have found out that the cardano-cli provides the full precision on the stake distribution when the --out-file option is activated. An issue has been created to adapt the current implementation of the Chain Observer and take advantage of this option: Enhance Stake Distribution retrieval #480
-
⚠️ We have also tried to understand the source of flakiness on the CI and we have noticed that the computation of the stake distribution may be responsible:
- We have noticed that even though we plugged all the Mithril nodes of the test lab on the same Cardano node of the devnet, the nodes retrieved different stake distributions during the same epoch
- We have conducted another experiment with stake delegation and we have clearly found that we could actually have different results during the same epoch
- This is a problem as we are expecting:
  - The Stake Distribution to be computed for the previous epoch (and not the current epoch)
  - The Stake Distribution to be deterministically computed on all the nodes
- We will probably have to work on different implementations of the Chain Observer:
  - Propose an evolution of the cardano-cli that allows targeting a specific epoch when computing the Stake Distribution
  - Investigate other technologies that allow observing the evolution of the chain
-
We have talked about the incoming PRs that include breaking changes:
- Move GCP Aggregator to 'preview' network #470
- Update Blake dependency #474 (https://github.com/input-output-hk/mithril/pull/474)
- Use Sqlite datastore in Aggregator & Signer #477
- Implement Real Genesis Certificate #438
- We will, at least, merge #470 and #474 at the same time (scheduled for next Monday):
  - Requires that the SPOs recompile their Signer node and update the configuration (NETWORK=preview and NETWORK_MAGIC=2)
  - Involves a full reset of the Aggregator on GCP, and a manual intervention to produce new certificates
- If possible, we will also merge #477:
  - Requires that the SPOs recompile their Signer node
  - Involves a full reset of the Aggregator on GCP, and a manual intervention to produce new certificates
- When ready, we merge #438:
  - Transparent for SPOs
  - Requires a reset of the Snapshots and Certificate Chain (which will be bootstrapped with a Genesis Certificate) on the Aggregator
-
-
We have paired on the last bug that creates flakiness in the CI in the Bootstrap Certificate Chain w/ Genesis Certificate #364. It appears that a discrepancy occurs from time to time (~5%) in the computation of the Next Aggregate Verification Key between the Signers and the Aggregator. We are still investigating the issue and we should fix it shortly
-
We have also paired on the issue Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455 in order to work out the best way to implement this feature. We have agreed on:
- Implementing this feature in mithril-common in order to keep mithril-core chain agnostic
- In order to guarantee that no Mithril node can interact with the core library without being authenticated (now and in the future):
  - The mithril-core library should be directly imported only by the mithril-common crate (we should probably enforce this rule in the CI)
  - A Cardano specific ProtocolKeyRegistration will be implemented as a wrapper around the mithril_core::KeyReg and added as a sub module of the crypto_helper module
  - A Cardano specific ProtocolInitializer will be implemented as a wrapper around the mithril_core::StmInitializer and added as a sub module of the crypto_helper module
  - We will extend the entities::Signer type so that it includes the Cardano specific material required for Signer certification (Operational Certificate of the SPO, Signer Verification Key Signature signed by the KES Secret Key of the SPO). This will allow the Signer Verification Key Certifier to certify that the Signer node is the genuine holder of a poolId on the Cardano network and of a Mithril Signer Verification Key
  - Another required piece of information is the KES Period, which can be retrieved from the cardano-cli and will be retrieved through the current Chain Observer (using the field qKesCurrentKesPeriod of the command cardano-cli query kes-period-info)
  - We will add a new type dedicated to serializing/deserializing Cardano crypto material (that will also handle the cborHex conversion). This type will be able to parse a crypto file generated by the Cardano cli and convert it to bytes, and to export a json format with keys encoded in cborHex. This type will also be used for the Genesis Certificate Verification Key.
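The cborHex conversion mentioned above can be sketched with the Rust standard library only. This is an illustration under assumptions, not the planned type: it assumes the cborHex value holds a single CBOR byte string (major type 2) wrapping the raw key bytes, it skips the JSON parsing of the key file, and all function names are hypothetical.

```rust
/// Decodes a hexadecimal string into bytes; returns `None` on malformed input.
fn decode_hex(hex: &str) -> Option<Vec<u8>> {
    if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).ok())
        .collect()
}

/// Strips the header of a CBOR byte string (major type 2) and returns its payload.
/// Only definite lengths up to 65535 bytes are handled in this sketch.
fn unwrap_cbor_bytes(cbor: &[u8]) -> Option<&[u8]> {
    let (&head, rest) = cbor.split_first()?;
    let (length, payload) = match head {
        // Lengths 0 to 23 are stored directly in the header byte.
        0x40..=0x57 => (usize::from(head - 0x40), rest),
        // Header followed by a one-byte length.
        0x58 => {
            let (&length, payload) = rest.split_first()?;
            (usize::from(length), payload)
        }
        // Header followed by a two-byte big-endian length.
        0x59 if rest.len() >= 2 => (
            usize::from(u16::from_be_bytes([rest[0], rest[1]])),
            &rest[2..],
        ),
        _ => return None,
    };
    (payload.len() == length).then_some(payload)
}

/// Converts a `cborHex` value into the raw key bytes.
fn key_bytes_from_cbor_hex(cbor_hex: &str) -> Option<Vec<u8>> {
    let cbor = decode_hex(cbor_hex)?;
    unwrap_cbor_bytes(&cbor).map(|payload| payload.to_vec())
}

fn main() {
    // Made-up 32-byte key: header `5820` followed by 32 times the byte `ab`.
    let cbor_hex = format!("5820{}", "ab".repeat(32));
    let key = key_bytes_from_cbor_hex(&cbor_hex).expect("valid cborHex");
    println!("{} bytes", key.len());
}
```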
-
We had discussions about the flakiness of the CI that we are trying to fix in the Bootstrap Certificate Chain w/ Genesis Certificate #364. We have paired and prepared some fixes in the Implement Real Genesis Certificate #438. Also, a fix on the mithril-core has been merged in order to Avoid panics in 'StmInitializer' #472
-
We also had some talks about the migration of the Aggregator hosted on GCP to the preview network:
- At first, we will decommission the testnet snapshotting
- Then, it will be replaced by the preview network (target ETA is EOW)
- Later, we will work on supporting multiple networks
-
In order to work efficiently with SPOs, we will need to work with regular releases:
- We intend to create new releases every 1/2 weeks
- We will name our deployment environments the same way as the Cardano networks (devnet, preview, preprod, mainnet)
- When a commit is pushed on a working branch, the devnet is launched in the run-test-lab job of the CI
- When a commit is merged on the main branch, a terraform deployment will be triggered on the preview from the CI
- When a tag is created (maybe following a specific format), a terraform deployment will be triggered on the preprod from the CI
- The Signer, Client and Aggregator nodes will be released synchronously with the same tag version
- We will probably implement a feature where, if a Signer or a Client sends a request to the Aggregator with a different version, a 400 bad request will be returned
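The version check described in the last point could look like the following Rust sketch. How a node would announce its version (for example in a request header) is not decided in these notes, so the announced_version parameter is an assumption, as is the choice to reject requests that announce no version at all.

```rust
/// HTTP status codes used by this sketch.
const STATUS_OK: u16 = 200;
const STATUS_BAD_REQUEST: u16 = 400;

/// Returns the status the Aggregator would answer with, given its own version
/// and the version announced by the Signer or Client (if any).
fn status_for_request(aggregator_version: &str, announced_version: Option<&str>) -> u16 {
    match announced_version {
        Some(version) if version == aggregator_version => STATUS_OK,
        // Missing or different version: reject so that the node gets updated.
        _ => STATUS_BAD_REQUEST,
    }
}

fn main() {
    let aggregator_version = "0.1.2";
    for announced in [Some("0.1.2"), Some("0.1.1"), None] {
        println!(
            "{:?} -> {}",
            announced,
            status_for_request(aggregator_version, announced)
        );
    }
}
```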
-
We also had discussions about the issue Simplify the Multi Signer in Aggregator #398 and we have tried to draw up a road map to implement it:
- The strategy is to make the multi signer pure and let the state machine handle the state
- We will define a clear interface for interacting with the state
- Later, we will also try to enhance the state machine of the Aggregator, then of the Signer
- We will use an event driven state machine that gets updated given a list of (State, Event) -> ApplyTransition -> NewState by dequeuing queued events. We still need to find a way to handle the synchronous responses of the http server routes
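The (State, Event) -> ApplyTransition -> NewState idea can be sketched in Rust as a pure transition function applied to a queue of events. The states and events below are hypothetical placeholders chosen for the example (only the "Signing" state name appears elsewhere in this logbook); the real state machine may use different ones.

```rust
use std::collections::VecDeque;

/// Hypothetical states of the state machine.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Idle,
    Ready,
    Signing,
}

/// Hypothetical events queued for the state machine.
#[derive(Debug, Clone, Copy)]
enum Event {
    EpochChanged,
    NewBeacon,
    MultiSignatureCreated,
}

/// Pure transition function: (State, Event) -> NewState.
fn apply_transition(state: State, event: Event) -> State {
    match (state, event) {
        (State::Idle, Event::EpochChanged) => State::Ready,
        (State::Ready, Event::NewBeacon) => State::Signing,
        (State::Signing, Event::MultiSignatureCreated) => State::Idle,
        // An epoch change while signing aborts the current round.
        (State::Signing, Event::EpochChanged) => State::Idle,
        // Any other pair leaves the state unchanged.
        (current, _) => current,
    }
}

/// Dequeues events one by one and applies the matching transition.
fn run(initial: State, queue: &mut VecDeque<Event>) -> State {
    let mut state = initial;
    while let Some(event) = queue.pop_front() {
        state = apply_transition(state, event);
    }
    state
}

fn main() {
    let mut queue = VecDeque::from(vec![
        Event::EpochChanged,
        Event::NewBeacon,
        Event::MultiSignatureCreated,
    ]);
    println!("{:?}", run(State::Idle, &mut queue));
}
```

Because apply_transition has no side effects, every (State, Event) pair can be unit tested in isolation, which fits the goal of keeping the multi signer pure.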
-
We have reviewed the new issues that have been created:
- permission denied issue in dev-net #459: we are having a hard time reproducing the issue. Therefore, we have asked the user to provide more details about their setup. However, we have merged a PR that could fix the permission issue: Fix attempt 'Permission Denied' in devnet #467. We are waiting for feedback from the user to see if this patch fixes the problem
- Provide machine-readable output for mithril-client #464: We will start working on it shortly
-
-
We have received and reviewed a first PR from the community, DATA_STORE_DIRECTORY #465, that adds a missing configuration update on the Signer setup for a SPO
-
We also had discussions about the PRs in progress:
- Greg/444/sql store #460 has been merged as a first milestone of the PoC we are conducting on switching the stores to SQLite 💪. We will work on the enhancement of the iterator management (to avoid loading the full store in memory) and also on moving the actual stores in the Aggregator and Signer nodes in the near future
- Implement Real Genesis Certificate #438: we need to fix the panic that occurs sometimes on the Signers and we should be able to merge the PR then. Once the PR is merged, we will be able to bootstrap a brand new preprod GCP Aggregator as in issue Deploy mithril demo infra on 'preprod' network #457
-
-
We have paired on the issue Bootstrap Certificate Chain w/ Genesis Certificate #364. All the features have been implemented in the PR Implement Real Genesis Certificate #438. However, we have some flakiness issues that we need to fix prior to merging (they must have been in the previous code and they create some panics in the Signer)
-
We have reviewed and discussed the PoC for implementing a SQLite store adapter. A first version is close to being ready with an iterator that loads all the records in memory. Once this version is stabilized, we will work on optimizing the iterator
-
We have paired on the issue Bootstrap Certificate Chain w/ Genesis Certificate #364. We are close to being ready to merge the PR Implement Real Genesis Certificate #438
-
We also had discussions about:
-
We have sliced and created the tickets for the new iteration
-
We have cleaned up the stale branches of the repository
-
We have merged the PR Flaky tests #374 🥳 We now use blst as the crypto backend (with the portable feature activated in the CI). We have also reset the stores of the GCP Aggregator (as the previous keys were not compatible with blst)
-
As we will start working on the Mithril Keys Certification, we had some discussions about this feature (and about cbor encodings for the keys)
-
Also, we have paired on the PR Implement Real Genesis Certificate #438, that we will merge shortly
-
We have open sourced the repository!!! 🎉
-
We have reviewed the final version of the PR Flaky tests #374 and we have paired on optimizing the portable feature implementation
-
We also had discussions about the difficulty we face when trying to implement the SQLite store adapter. We will try a different approach by working with the underlying crate used by the crate we are trying to implement
-
We have prepared a path for the demo with the goal of Open Sourcing the GitHub repository 🥇:
- Making the GitHub repository public live 🚀
- Showcasing the final version of the documentation website (that we have already made public)
- Showcasing the restoration of a testnet Cardano Node from a Mithril Snapshot hosted on GCP (and also showcasing the Mithril Explorer)
# Mithril End to End
# On devnet, with evolving non-deterministic verification keys, with evolving stake distribution, with real Certificate chain (without genesis)
# Resources
## Github
google-chrome https://github.com/input-output-hk/mithril
## Website
google-chrome https://mithril.network/doc
## Explorer
google-chrome https://mithril.network/explorer/
---
# Setup demo
## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
# or
git clone git@github.com:input-output-hk/mithril.git
## Checkout correct commit
cd mithril/
git checkout 2c286878d070b842cd40f63ae580456cc50c00f7
cd mithril-client && make build && cp mithril-client ../../ && cd ..
cd ..
---
# Demo: Restore a snapshot from testnet
## Prepare vars
NETWORK=testnet
AGGREGATOR_ENDPOINT=https://aggregator.api.mithril.network/aggregator
LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest') && echo $LATEST_DIGEST
## List snapshots
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client list
## Show snapshot details
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client show $LATEST_DIGEST
## Download snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client download $LATEST_DIGEST
## Restore snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client restore $LATEST_DIGEST
-
We have reviewed the PR
Flaky tests#374 that corrects the CI flakiness ofmithril-core🥳 . There is still a question regarding the implementation of theportablefeature ofblastthat we need to investigate as we are using the artifacts built by the CI to created Docker images (and in the future released binaries). Also when merging this PR we will have to reset/recreate the stores on the GCP Aggregator (as the keys currently generated withzcashare not compatible with theblastkeys). We should merge at the end of the iteration. After some discussions, we have decided to use a featureportablein themithril-corelibrary and not to re-exposemithril-corefrommithril-common. This feature will be used in the CI (tests and artifacts released) at first. We still need to understand what is different between portable and not portableblast(apparently related to IAS extensions that may causetheSIGILL) and also we will work on adapting the CI and artifacts (Docker, executable) production with the idea that we must test the artifacts that we release. -
We have reviewed the latest commits of the PR
Implement Real Genesis Certificate #438. We will continue to work on it and expect to merge it shortly -
Also, we have paired on the
use SQL store #444
-
We have reviewed and merged the
Repository is missing a CONTRIBUTING document #446. We also had discussions about the final steps before open sourcing, such as setting branch protection rules before merging a PR -
We have paired on:
-
We have activated the
Require approvals feature on the repository before merging new PRs (this will be needed when open sourcing the repository)
-
We have paired on numerous bug fixes and enhancements related to the flakiness of the CI:
-
We have reviewed and merged:
- The PR
Aggregator check existing certificate #435 which closes the issue Aggregator is stuck in "Signing" state when epoch changes #431 🥳 - The PR
Move Certificate Verifier to Common #436. It prepares the work to be done in the issue Bootstrap Certificate Chain w/ Genesis Certificate #364, for which we have been talking about the steps that need to be completed - The PR
add code doc & factor service initialization #440 that relates to the issue Prepare open-sourcing of repository #92
-
We had discussions about the need to handle data structure update and to have debug tools. A way to work on these two issues is to use
SQLite and implement a store adapter on top of it. We will run a small PoC on this implementation
-
We have merged the
Add signer integration test #430 🥳 -
We have also reviewed the first PR of the issue
Aggregator is stuck in "Signing" state when epoch changes #431 that will be merged shortly. We will pair on the second part of the issue, which requires some modifications of the Snapshots store -
We also had discussions about the
Mithril Keys Certification: - We have reviewed the PR
New STM registration procedure #433 - We still need to find out how to retrieve all the information needed (
KES Key period with Cardano Cli and Cold Verification Key maybe from the Core Cardano Node) - We were wondering if the
KES Keys are renewed by overwriting the files. If this is the case, it means that we would need to reconfigure the Signer node after renewal of the keys - The Signer does 2 new things during key registration:
- Sign the Mithril Verification Key with the
KES Secret Key to produce a KES Signature - Send the
Operational Certificate, the Cold Verification Key, the KES Period and the KES Signature to the Aggregator during the registration process
- The Aggregator will verify the authenticity of the
Pool Id and the associated Mithril Verification Key during the registration of the Signer. It will allow the Aggregator to match the Pool Id with the Stake Share retrieved from the Cardano Node. We still need to check if the Operational Certificate, the Cold Verification Key, the KES Period and the KES Signature need to be stored on the Aggregator - For now, the Core library will keep computing the Merkle trees the same way and use only the Stakes from the registered Signers (and not from the whole Cardano Network)
- Before we merge this PR, we will need to have a running SPO node on GCP (that needs to be configured) so that we don't miss epochs in the Certificate Chain
-
We also had talks about the
Genesis Keys: - We will probably store the
Genesis Keys with the same codec as the other keys used in Mithril (by using serde (de)serialization and base64 encoding) in the first place - However, the
Genesis Keys used by the Cardano Node seem to be using a cbor format. We will try to handle this encoding instead - Another question that was raised is where we can find the
mainnet Genesis Verification Key?
-
We have reviewed and will merge shortly the latest modifications of the issue
Add signer integration test #430 -
We have paired on understanding and fixing a bug on the Aggregator
Aggregator is stuck in "Signing" state when epoch changes #431. Some PRs that fix the problem are in progress and will be merged shortly -
Following the occurrence of this bug, we have thought that it would be a good idea to implement a
Max Error feature for a runtime cycle: if the runtime is in error M times in a row for the same state, the Aggregator runtime would panic. This would also help us spot problems in state transitions early -
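The Max Error idea can be sketched as a small counter of consecutive failures per state. This is only an illustration: `RuntimeState`, `ErrorTracker` and `max_errors` are hypothetical names and are not taken from the Aggregator code.

```rust
/// Hypothetical runtime states, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RuntimeState {
    Idle,
    Ready,
    Signing,
}

/// Counts how many times in a row the runtime failed in the same state.
#[derive(Debug)]
pub struct ErrorTracker {
    max_errors: u32,
    last_failed_state: Option<RuntimeState>,
    consecutive_errors: u32,
}

impl ErrorTracker {
    pub fn new(max_errors: u32) -> Self {
        Self {
            max_errors,
            last_failed_state: None,
            consecutive_errors: 0,
        }
    }

    /// A successful cycle resets the error streak.
    pub fn record_success(&mut self) {
        self.last_failed_state = None;
        self.consecutive_errors = 0;
    }

    /// Records a failed cycle in `state`.
    /// Returns `true` once `max_errors` consecutive failures were seen for that state.
    pub fn record_error(&mut self, state: RuntimeState) -> bool {
        if self.last_failed_state == Some(state) {
            self.consecutive_errors += 1;
        } else {
            // A failure in a different state starts a new streak.
            self.last_failed_state = Some(state);
            self.consecutive_errors = 1;
        }
        self.consecutive_errors >= self.max_errors
    }
}
```

In such a design, the runtime loop would call `record_error` after each failed cycle and panic when it returns `true`.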
We also had discussions about the
Mithril Keys Certification: - In order to verify the SPO that is running a Mithril Signer, we will sign the
Mithril Verification Key with the Cardano Hot Secret Key aka KES.skey and we will verify it with the Cardano Hot Verification Key aka KES.vkey that is stored inside the Operational Certificate of the Cardano Node of the SPO - Every 6 epochs, the
KES Keys are rotated and a new Operational Certificate will be issued. This means that we need to retrieve the current Operational Certificate at each epoch (before the Signer registers its keys with the Aggregator) - We will try to stay on the
Cardano Relay Node and avoid if possible to work with the Block Producing Node. It means that the PoolId, which is the hash of the Cold Verification Key, should be declared by the SPO (and we should also verify that it matches the one included in the Operational Certificate) - The
Mithril Verification Key Signature must be verified on the Signer at startup and also on the Aggregator during registration - We will include the
KES.skey signing of the Operational Certificate in the core library - We will maybe use the
Cardano Cli to verify the signatures as it will require less work at first. This code should be incorporated into the core library when we go to mainnet - We also need to find a way to retrieve the
Operational Certificate from the Cardano Cli
-
We have reviewed and merged the PR
Certificate chain integration test for Aggregator #424. It should fix some bugs related to the issue Produce valid certificate chain for several epochs on Devnet #396 -
We have also reviewed and paired on the
Greg/317/signer integration test #426. It should be merged shortly -
We have also discussed the
Certificate Chain: -
Epoch Gap: We will first work on handling the Epoch Gap by using the latest "certified" stake distribution to sign the current epoch, as defined in the previous Research/Engineering session. This will be done when the devnet is working smoothly. The mechanism needs to: - Detect a gap in the
Certificate Chain in the Aggregator - Modify the
Beacon of the Pending Certificate to use the previous Epoch in the Aggregator - Make the Signers use the
Epoch from the Beacon of the Pending Certificate in order to select the Protocol Initializer and Stake Distribution to use to produce Single Signatures
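The gap detection and the choice of the Beacon epoch described above could look like the following sketch. The function names are hypothetical, and reading "previous Epoch" as the last certified epoch is an assumption, not something stated in the notes.

```rust
/// Returns the first epoch missing from the chain, if any.
/// Assumes `certified_epochs` lists the epochs of the certificates, oldest first.
pub fn find_first_gap(certified_epochs: &[u64]) -> Option<u64> {
    certified_epochs
        .windows(2)
        .find(|pair| pair[1] > pair[0] + 1)
        .map(|pair| pair[0] + 1)
}

/// Epoch to advertise in the Beacon of the Pending Certificate.
/// Assumption: when at least one epoch was skipped since the last certificate,
/// the Aggregator falls back to the last certified epoch.
pub fn pending_beacon_epoch(last_certified_epoch: u64, current_epoch: u64) -> u64 {
    let has_gap = current_epoch > last_certified_epoch.saturating_add(1);
    if has_gap {
        last_certified_epoch
    } else {
        current_epoch
    }
}
```

The Signers would then select their Protocol Initializer and Stake Distribution from the epoch returned by `pending_beacon_epoch` rather than from the epoch reported by their own Cardano node.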
-
Multiple Protocol Parameters: the Aggregator can try multiple sets of parameters (with equivalent security level) on the gathered Single Signatures in order to produce the most efficient Multi Signature. It will try the harder-to-reach parameters first. The only constraints on the parameters are: - They must share the same parameter
phi_f value that is used to create the Protocol Initializer - The Signers must use the worst case parameters (the ones with the highest number of lottery attempts
m)
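A minimal sketch of this selection, under stated assumptions: the candidate sets are given hardest first, a set is considered reached when at least `k` distinct won indices fall below its `m`, and `select_parameters` is a hypothetical helper (the real quorum check lives in the core library).

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ProtocolParameters {
    /// Quorum: number of distinct won lottery indices required.
    pub k: u64,
    /// Number of lottery attempts.
    pub m: u64,
    /// Parameter shared by all candidate sets.
    pub phi_f: f64,
}

/// Returns the first candidate (ordered hardest first) whose quorum is reached.
/// Returns `None` if no candidate is reached or if the candidates disagree on `phi_f`.
pub fn select_parameters(
    candidates: &[ProtocolParameters],
    won_indices: &[u64],
) -> Option<ProtocolParameters> {
    let first = candidates.first()?;
    if candidates.iter().any(|p| p.phi_f != first.phi_f) {
        return None;
    }
    // Signers sign with the highest `m`, so indices may exceed a candidate's `m`.
    let distinct: BTreeSet<u64> = won_indices.iter().copied().collect();
    candidates.iter().copied().find(|p| {
        let eligible = distinct.range(..p.m).count() as u64;
        eligible >= p.k
    })
}
```

The parameter values used with this helper are purely illustrative; equivalent security between candidate sets has to be established separately.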
-
Genesis Certificate: We will try to put in place a process in the testnet that is as close as possible to what we will deploy on the mainnet. The genesis mechanism would be as follows: - The Aggregator must wait until a
Genesis Certificate is available before appending any Certificate to the chain - In the meantime, the Signers will be able to proceed to the key registration
- At a manually selected epoch (preferably at the beginning of the epoch), the
Genesis Certificate Bootstrap will happen - Once the
Genesis Certificate is saved in the Aggregator store, it will be able to produce valid Certificates and to append them to the chain. This should start occurring at the next epoch. - The
Genesis Certificate Bootstrap will be done as follows: - Export the
payload/message to be signed in the Genesis Certificate from the Aggregator (via cli) and store a Proto Genesis Certificate (unsigned) - Use the
Genesis Private Key to sign this message and create a Genesis Signature (cold process, done out of Mithril Network on the mainnet, can be done via Mithril cli on the testnet and devnet) - Import the
Genesis Signature back in the Aggregator and update the Proto Genesis Certificate and convert it to a definitive Genesis Certificate (metadata will be updated and hash needs to be recomputed, done via cli)
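The three bootstrap steps can be modelled as a type transition from an unsigned to a signed certificate. This is a rough sketch: the type and method names are hypothetical, the signature is treated as opaque bytes produced out of band, and `DefaultHasher` is only a stand-in for the real certificate hash (it is not cryptographic).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Unsigned certificate stored by the Aggregator while waiting for the signature.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ProtoGenesisCertificate {
    pub message: Vec<u8>,
}

/// Definitive certificate, available once the signature has been imported.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GenesisCertificate {
    pub message: Vec<u8>,
    pub signature: Vec<u8>,
    pub hash: u64,
}

impl ProtoGenesisCertificate {
    /// Step 1: export the message that must be signed with the Genesis Private Key.
    pub fn export_message(&self) -> &[u8] {
        &self.message
    }

    /// Step 3: import the signature (created in step 2, outside of this code)
    /// and recompute the hash over the signed content.
    pub fn import_signature(self, signature: Vec<u8>) -> Result<GenesisCertificate, String> {
        if signature.is_empty() {
            return Err("empty genesis signature".to_string());
        }
        let mut hasher = DefaultHasher::new();
        self.message.hash(&mut hasher);
        signature.hash(&mut hasher);
        Ok(GenesisCertificate {
            message: self.message,
            signature,
            hash: hasher.finish(),
        })
    }
}
```

Consuming the proto certificate in `import_signature` makes it impossible to keep using the unsigned version once the definitive one exists. Signature verification is deliberately left out of this sketch.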
-
Mithril Keys Certification: This subject is still under definition, but some questions arose: - Do we need to run a Mithril Signer on the
Block Producing Node just for this certification (the one that holds the cold keys required to sign and that is closed to the outside)? Or is this operation done by the Cardano Node itself? - The Mithril Signer will be running on the
Relay Node, the one that is opened to the outside world (and does not have access to the hot keys)
-
-
This was the first meeting with the
Daedalus/Lace team. The goal was to understand each other's needs and to set up short-term goals and a working environment -
Daedalus end of life will happen soon and Lace will replace it (with an Open Source approach). Lace will also handle a light client wallet -
We showcased the restoration of a
Cardano Node on the testnet thanks to a Mithril Snapshot -
Questions discussed:
- Is it possible to restore not the full immutable database, but instead work with the range of missing files? (Answer is yes, but not on the first version as the feature is not implemented yet)
- How secure are Mithril and the downloaded snapshot? (Answer is fully secured by design, depending on the percentage of SPOs participating and the protocol parameters selected)
- Who pays for the bandwidth? (Answer is IOG for the Aggregator that it currently hosts, and each Aggregator provider when multiple are available. Also we have plans for using peer to peer networks for hosting the archives)
- What about Utxo set? (Answer is not implemented yet, but will allow Mithril to handle light wallets)
- What about the new
testnet? (Answer is we need to work on that issue, but the new testnet is not stable enough at the moment) - Do the Mithril client binaries exist for Linux, macOS and Windows? (Answer is not yet but easy to do, will be part of the work)
- How to communicate with a Mithril Client? (Answer is stdout or text file in a first version, then IPC later. It will provide a percentage of completion and error/log messages. Will work the same whether running on Daedalus or Lace)
- How to integrate Mithril snapshot restoration in the wallet? (Answer is by being a part of the
Cardano Launcher module of the wallet. Once the archive is extracted, Mithril is not used/needed anymore)
-
Next steps for the PoC:
- Set up another meeting to create technical tasks in Jira/Github Projects with engineers
- Create a dedicated private Slack channel with members from the 2 teams
-
We have reviewed the PRs about:
- Integration tests on the Signer (incoming)
-
Add Store Protocol Parameters in Aggregator #385 that is ready to be merged
-
All our efforts have paid off and we now have the GCP Aggregator working smoothly, see issue
Produce valid certificate chain for several epochs on Testnet #397. However, we will monitor it closely to be sure that there are no other snapshot producing blockers -
Also we have noticed that the refresh rate of the runtime interval of the Mithril nodes (especially the Aggregator) seems to have a high impact on the flakiness of the CI/devnet. We are still actively investigating this issue
Produce valid certificate chain for several epochs on Devnet #396, however the flakiness is now considerably mitigated -
We have also prepared the demo path:
# Mithril Certificate Chain
# On devnet, with evolving non-deterministic verification keys, with evolving stake distribution
# Resources
## Github
google-chrome https://github.com/input-output-hk/mithril
## Architecture
google-chrome https://mithril.network/doc/mithril/mithril-network/architecture
## Certificate Chain
google-chrome https://mithril.network/doc/mithril/mithril-protocol/certificates
## Explorer
google-chrome https://mithril.network/showcase/
---
# Setup demo
## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git
## Checkout correct commit
cd mithril/
git checkout 4325260ec657b4cde0d4be5c6ff2a23241f2d886
cd mithril-client && make build && cp mithril-client ../../ && cd ..
---
# Demo: Download & Restore Latest Snapshot All In One (~20 min)
NETWORK=testnet && AGGREGATOR_ENDPOINT=https://aggregator.api.mithril.network/aggregator && LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest') && echo $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client list -vvv && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client show $LATEST_DIGEST -vvv && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client download $LATEST_DIGEST -vvv && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client restore $LATEST_DIGEST -vvv
NETWORK=testnet && AGGREGATOR_ENDPOINT=https://aggregator.api.mithril.network/aggregator && LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest') && echo $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client list && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client show $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client download $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client restore $LATEST_DIGEST
---
# Demo: Launch a Mithril Network explorer
## Change directory
cd mithril-showcase
## Build website
make dev
## Open explorer
google-chrome http://localhost:3000/showcase
---
# Demo: Bootstrap and start a Mithril/Cardano devnet
## Change directory
cd mithril-test-lab/mithril-devnet
## Run devnet with 1 BFT and 2 SPO Cardano nodes
MITHRIL_IMAGE_ID=main-4325260 NUM_BFT_NODES=1 NUM_POOL_NODES=2 EPOCH_LENGTH=45 SLOT_LENGTH=1.0 DELEGATE_PERIOD=90 ./devnet-run.sh
## Watch devnet logs
watch -n 1 LINES=5 ./devnet-log.sh
## Watch devnet queries
watch -n 1 NODES=cardano ./devnet-query.sh
## Visualize devnet topology
./devnet-visualize.sh
## Stop devnet
./devnet-stop.sh
# Client
## Get Latest Snapshot Digest
NETWORK=devnet
AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator
LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest')
echo $LATEST_DIGEST
## List Snapshots
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client list
## Show Latest Snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client show $LATEST_DIGEST
## Download Latest Snapshot (Optional)
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client download $LATEST_DIGEST
## Restore Latest Snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client restore $LATEST_DIGEST
## All at once
NETWORK=devnet && AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator && LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest') && echo $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client list && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client show $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client download $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client restore $LATEST_DIGEST
-
We have reviewed and merged the issue
Add state machine runtime Signer #317 🥳 It apparently solves the problem that prevented the creation of certificates because signer registration was not done properly at each epoch -
We have also reviewed and merged the issue
Add/Use Protocol Initializer Store in Signer #362. The non deterministic verification keys have been rolled back and a bug has been fixed in the Clerk computation. With invariant Stake Distribution, the network is able to generate a valid Certificate Chain 💪 -
We still have some flakiness occurring when the stake distribution changes and we are actively investigating it
-
This was the first official meeting to synchronize Research and Engineering teams. This meeting will take place every 2 weeks
-
We have mainly discussed how to handle an
Epoch gap in the Certificate Chain (see Mithril Client fail to validate certificate chain if the previous certificate is more than one epoch older #377): - Having no epoch gap in the Certificate Chain is mandatory to guarantee the security of the protocol and avoid "long range" attacks
-
Re-genesis of the Certificate Chain is always possible and is the "nuclear" option, used if nothing else works - In case of multiple Aggregators, downloading a valid chain from another Aggregator is possible
- Also an Aggregator should be able to try different protocol parameters in order to produce the multi signature:
- They would provide the same security level
- But the first tried would produce lighter signatures (whereas the quorum would be harder to reach)
- If a multi signature is produced, no other set is tried
- If not, a different set of parameters is tried
- If an Aggregator is not able to produce a valid certificate at epoch
n, and is now at epoch n+1: - It should use the previously valid stake distribution (
next AVK) in the certificate at epoch n-1 - Instead of the stake distribution at epoch
n which is not validated - And produce a certificate for epoch
n+1
-
We have paired on the
Add state machine runtime Signer #317 and Add/Use Protocol Initializer Store in Signer #362 issues all day long. We hope to merge very shortly 💪 -
We have also had discussions on the
Add Store Protocol Parameters in Aggregator #385: this implies that the Next Protocol Parameters are broadcast in the Pending Certificate of the Aggregator
-
We have reviewed and merged all the PRs that relate to issue
Configure SSL certificate for Mithril Aggregator GCP #324. The showcase is now working correctly on the production documentation website and it will be activated in the navbar shortly 🥳 -
We have reviewed and paired on the issue
Add state machine runtime Signer #317 that is a blocker for 3 other issues, so that we can complete it asap and not jeopardize the demo of the iteration. There is still much work to do and some questions are still open (in particular regarding the epoch that should be used: from the Cardano node or the Pending Certificate). This is our main focus for the following days
-
We have reviewed some work that has been done yesterday on the
Add state machine runtime Signer #317 -
We have also created new issues (with high priority) related to fixes/optimizations that need to be implemented to:
-
Following our conversations from the previous days, we created an issue
Simplify the Multi Signer in Aggregator #398 that will conduct a study on the best strategy to enhance the Multi Signer
-
We had discussions about how we can handle missing certificates for some epochs in the
Certificate Chain. The problem is tricky and could be solved by: - Using a higher epoch offset and embedding in the signed message multiple
Next AVKs. This could work, but would be cumbersome (as the Signers would have to wait more epochs before being able to sign) - Using the Aggregator beacon to handle certificate creation for an epoch at a later epoch when the network is back up. This means that the Aggregator is in charge of broadcasting the epoch to be used by the Signers to individually sign. This solution is likely to be the simplest to deploy, but it might not cover all of the cases that would be responsible for an epoch drop in the chain (for example if the Signers were not able to gather the previous Stake Distribution on their end)
- In a multiple Aggregator network, if an Aggregator misses an epoch (due to networking or operations trouble), it should be able to recover the chain by retrieving it from any other up to date Aggregator
- A last option to cover such an epoch drop would be to re-genesis the chain (will always work, but hard to operate)
-
We have also talked about the
Multi Signer of the Aggregator and the issue Reunite Beacon Store/Provider Aggregator #363. We have decided to replace the Beacon Store dependency with a Beacon that is fed by the runtime. Also, we have agreed that this module could be simplified and we will work on that step by step. Maybe we can split the module in sub modules, and we should wait for the Certificate Chain to be fully functional before making too impactful modifications. In the meantime, we agreed to pair whenever breaking modifications are applied -
We have paired intensively on the issue
Add state machine runtime Signer #317 -
A last point we have discussed is that we should define a dedicated type for handling serialized keys from the Core library
-
We have reviewed and merged a PR
Improve aggregator dependencies management #382 regarding some optimization on the dependency management in the Aggregator -
We have discussed the issue
Add state machine runtime Signer #317 and we have stated that: - We will use the
Beacon Provider from the Aggregator in the Signer, which implies that the module will be moved to the mithril-common folder - The
Immutable Digester will be fed with a Beacon at which it will compute the digest - The Signer will not rely any more on the
Beacon retrieved from the Pending Certificate of the Aggregator - We will also pair on this issue after these adjustments have been done
-
We have reviewed the PRs that have been done last week and took some time to talk about the epoch offset used to implement the Certificate Chain
-
We have discussed several topics:
- The flakiness of the CI that was partially fixed, but sometimes another error occurs which is apparently related to a gap in the certificate chain (one epoch is not signed). We will investigate that issue and also work on the possibility of verifying AVK signed certificates up to
N previous epochs to avoid breaking the chain (currently N is 1, it could be a parameter of the Client). Also the code to verify a certificate could maybe be optimized for clarity (too many intricate match) - Implementing a
Service Builder in the Aggregator to simplify usage of dependencies - Removing the
Beacon Store (see issue Reunite Beacon Store/Provider Aggregator #363) and using only the Beacon Provider instead. This also means that we need to create a store for the States of the state machine of the Aggregator. This will allow the Aggregator to restart gracefully (and not sign the same Immutable File Number multiple times) - Improving the source of the
Immutable File Number that should be only the responsibility of the Chain Observer and use this source to feed the Immutable Digester (which should only be responsible for computation of the digest) - Also, the computation of the digest takes too long. An optimization would be to cache the digest of each immutable file and compute the digest as a root of a Merkle tree for example. This would require computing almost only the hash of the latest
Immutable File Number and would drastically reduce the time and CPU resources needed for computation - We could simplify state stores parameters by using only one
Store Directory and use it as a prefix for all the stores data path. This would greatly reduce the complexity of the setup of the nodes and would avoid impacting other resources each time a new store is added (GCP, test lab, ...) - Also in order to simplify querying and debugging of the stores we could:
- Implement a
SQLite adapter - Provide specific tools for retrieving/gathering the data from the stores
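The digest caching idea discussed above can be sketched as follows. This is a hedged illustration only: `DigestCache` is a hypothetical type, file contents are passed in as byte slices instead of being read from disk, and `DefaultHasher` stands in for a real cryptographic hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

fn hash_bytes(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

fn hash_pair(left: u64, right: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    (left, right).hash(&mut hasher);
    hasher.finish()
}

/// Caches one digest per immutable file number.
#[derive(Debug, Default)]
pub struct DigestCache {
    leaves: BTreeMap<u64, u64>,
    /// Number of file digests actually computed (cache misses).
    pub computed: usize,
}

impl DigestCache {
    /// Returns the digest of an immutable file, hashing its content only on a cache miss.
    pub fn leaf(&mut self, immutable_file_number: u64, content: &[u8]) -> u64 {
        if let Some(digest) = self.leaves.get(&immutable_file_number) {
            return *digest;
        }
        self.computed += 1;
        let digest = hash_bytes(content);
        self.leaves.insert(immutable_file_number, digest);
        digest
    }

    /// Merkle root over the cached leaves, ordered by immutable file number.
    /// An unpaired node is carried up to the next level unchanged.
    pub fn merkle_root(&self) -> Option<u64> {
        let mut level: Vec<u64> = self.leaves.values().copied().collect();
        if level.is_empty() {
            return None;
        }
        while level.len() > 1 {
            level = level
                .chunks(2)
                .map(|pair| match pair {
                    [left, right] => hash_pair(*left, *right),
                    [single] => *single,
                    _ => unreachable!("chunks(2) yields one or two items"),
                })
                .collect();
        }
        level.first().copied()
    }
}
```

With such a cache, adding a new immutable file costs one file hash plus the recomputation of the small tree of cached digests, instead of re-hashing every file.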
-
We also agreed that some efforts are still needed to stabilize the system so that
- Snapshots and certificates are produced consistently (there are many hiccups on GCP)
- The Signer seems to be mainly responsible for this, and the ongoing refactoring and improvements should fix it shortly
-
We have reviewed the latest developments for the issue
Implement certificate chain Aggregator/Signer/Client #316. The PR has been merged 🥳 -
The PR
Set indices to be represented as vectors instead of unique #351 has been merged and thus closes the issue Optimize single signature in Mithril Aggregator/Signer/Core #296 🎉 -
We have reviewed and talked about the issue
Add integration tests in Mithril Aggregator #284 which should be ready to be merged shortly -
We have also reviewed the developments in progress of the website
Showcase section of issue Showcase snapshots/certificate pending on doc website #315. The first results look very good and we are keen on seeing it live on the website! As there is not always a Pending Certificate available, we were asking ourselves if maybe we could add a /beacon route on the Aggregator API that would display the current Beacon 🤔
-
We have reviewed the showcase interface in its first version
Showcase snapshots/certificate pending on doc website #315. It is working and displays the first information retrieved from the Aggregator. Some more work needs to be done in order to complete the issue -
We have reviewed and talked about the
Implement certificate chain Aggregator/Signer/Client #316: there seems to be a problem with the stake distribution update that prevents the Aggregator from producing multi signatures. Some investigations are in progress. If the fix is not obvious, a feature flag will be activated to allow the merging of the PR -
We have discussed and contributed to the issue
Optimize single signature in Mithril Aggregator/Signer/Core #296, specifically about the deduplication of the won lottery indices. The PR should be merged shortly
-
We have reviewed and paired on the
Add integration tests in Mithril Aggregator #284. The implementation of the Happy Path is still in progress, but will be ready to merge shortly -
We have reviewed the
Implement certificate chain Aggregator/Signer/Client #316. Some enhancements will be done in the End to End Tests Runner and the PR should be merged shortly. -
We have discussed the short term fix for the issue
Signer can not sign after restart (UnregisteredVerificationKey) #361. We agreed to switch temporarily to a deterministic Verification Key generator. The fix has been merged and works as expected on GCP 🥳 The long term fix will be implemented in Add/Use Protocol Initializer Store in Signer #362 -
We also had discussions about the
Showcase snapshots/certificate pending on doc website #315 issue and listed some nice to have features: - Use it for the demo with the
devnet in a local website - Have a refresh every
30s on the first page - Implement responsive design pages
-
The tickets of the current iteration have been sliced and created in the board
-
We have reviewed and paired on the issue
Add integration tests in Mithril Aggregator #284. The AggregatorConfig struct was wrongly holding a reference to the DependencyManager, which was preventing us from using the full features of the DumbImmutableFileObserver (that will power the newly added tests). -
We have also talked about the
Showcase section of the documentation website and the type of information that would be displayed. A first version could showcase: - The
Pending Certificate if it exists, and the list of the latest Snapshots on a first page - The
Snapshots provide a link to the associated Certificate details on a new page - The
Certificate provides a link to the Previous Certificate in the chain if it exists
-
We have made a review of the PRs that have been merged during the previous iterations and of the technical debt that we have accumulated so far. We have decided to take some time to lower this debt during the current iteration
-
Here is a list of the issues that have been listed as such:
- Add and use a
Verification Key Store in the Signer - The previous issue should fix a bug that makes the Signer fail to recognize its
Verification Key in the Signers list retrieved from the Pending Certificate (and trigger an UnregisteredVerificationKey error) after a restart (due to the randomness of the Verification Keys) - Reunite the
BeaconStore and the BeaconProvider in the Aggregator (we need to check if we want to remove completely the BeaconStore) - The previous issue should fix a bug that makes the Aggregator create a new
Pending Certificate for a Beacon that already has a Certificate - A bug that makes the Aggregator disk saturate (because the temp snapshot archive file is not deleted after upload)
-
We have reviewed the PR
Add certificate chain Aggregator/Signer/Client #355 in relation with Implement certificate chain Aggregator/Signer/Client #316 and discussed some small adjustments that will be done shortly -
We have also reviewed and merged the
Enhance documentation website #356 with: - The enhanced
Glossary section of the website - The enhanced
Mithril Certificate Chain in depth page
-
We have paired on the bug of the issue
Fix test lab CI flakiness #352: - A fix to the single signer of the Mithril Signer was applied (concerning the late instantiation of the protocol initializer)
- We fine tuned the runtime intervals of the Signer and Aggregator nodes (which were running with the same cadence, which was a source of flakiness)
- We made some tests with
2 signers and an epoch offset of -1 and the execution time of the test lab is still very good (~2m 30s) - We will merge with
2 signers and an epoch offset of 0 at first (as there are still some unexplained delays in signer registration with a non 0 epoch offset) - We have also identified an optimization when producing the CI run attempts artifacts (to separate them clearly). It will be included in this PR
-
We also discussed the ongoing issues:
-
We have paired on the issue
Optimize single signature in Mithril Aggregator/Signer/Core #296, on the PR Set indices to be represented as vectors instead of unique #351 in order to find the best way to deduplicate indices of the single signatures before generating a multi signature. We will continue pairing on this tomorrow.
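One possible deduplication strategy, given here only as a sketch, is to attribute each won lottery index to the first single signature that claims it. The `SingleSignature` shape and the "first signer wins" rule are assumptions for illustration, not the strategy chosen in the PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape of a single signature: a signer and the lottery indexes it won.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SingleSignature {
    pub party_id: String,
    pub won_indexes: Vec<u64>,
}

/// Maps each lottery index to the first signer that won it,
/// so that an index is counted only once towards the quorum.
pub fn deduplicate_indexes(signatures: &[SingleSignature]) -> BTreeMap<u64, String> {
    let mut owners = BTreeMap::new();
    for signature in signatures {
        for &index in &signature.won_indexes {
            owners
                .entry(index)
                .or_insert_with(|| signature.party_id.clone());
        }
    }
    owners
}
```

The number of entries in the returned map is the number of distinct indexes available for the multi signature.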
-
We have talked about solving the flakiness of the test lab in the CI. The solution is under development and the new version of the end to end test runner along with the activation of the epoch offset should work. At the same time, the parameters of the
devnet are fine tuned in order to keep the fast test execution time. A PR Lessen test lab flakyness #350 has been pushed and will be merged shortly -
The website documentation enhancements have been reviewed in the PR
Enhance documentation website #349. It will be merged shortly and will deploy the following changes: - Enhanced
Getting Started pages - Enhanced
Developer Docs > Mithril Network pages - Reorganized
About Mithril section with clear Mithril Protocol and Mithril Network menus
-
We have reviewed the work in progress regarding the integration tests of the Aggregator runtime of this issue
Add integration tests in Mithril Aggregator#284. We had discussions about the purpose of the tests and decided to use the runtime tests as unit tests and work on a happy path scenario with the full node for the integration test. -
We have reviewed the issue
Cannot sync a cardano-node using latest snapshot on GCP#344. After investigations, it appears that the issue is linked to the1.35.0version of the Cardano node and is fixed in the1.35.1 -
We also had discussions about the use of `nightly` / `pre-release` / `release` tags (and packages & environments). We will start with the `nightly` one
-
Also, the CI is very flaky at this time (mainly because the test lab is failing due to using the same epoch for registration and signing). We have decided to activate an epoch offset of `-1` and to work on fine tuning the `devnet` to accelerate the production of immutable files and epochs. This should fix the problem and should be available shortly.
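To make the effect of the offset concrete, here is a minimal sketch of how an epoch offset can separate the registration epoch from the signing epoch. The names `Epoch`, `apply_offset` and `registration_epoch_for_signing` are hypothetical and not taken from the code base; only the `-1` value comes from the discussion above.

```rust
/// Hypothetical epoch type for this sketch.
pub type Epoch = u64;

/// Apply a signed offset to an epoch.
/// Returns `None` when the result would be negative or would overflow.
pub fn apply_offset(epoch: Epoch, offset: i64) -> Option<Epoch> {
    if offset >= 0 {
        epoch.checked_add(offset as u64)
    } else {
        epoch.checked_sub(offset.unsigned_abs())
    }
}

/// With an offset of `-1`, signatures produced during `signing_epoch` rely on
/// the registrations recorded during the previous epoch, so registration and
/// signing no longer compete within the same epoch.
pub fn registration_epoch_for_signing(signing_epoch: Epoch, offset: i64) -> Option<Epoch> {
    apply_offset(signing_epoch, offset)
}
```

With an offset of `0` both operations use the same epoch, which is the situation described above as the source of the flakiness.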
-
We have reviewed and closed the Enhance runtime state machine Aggregator #323 issue, which will prevent the Aggregator from updating the stake distribution too often
-
We have also merged some bug fixes and enhancements:
-
We have paired on the Optimize single signature in Mithril Aggregator/Signer/Core #296 that should be merged shortly
-
We have paired on getting the project one step further toward open sourcing:
- Creating a service account so that we are autonomous in managing the cloud operations (Aggregator hosting and Terraform on the CI)
- Activating the `Discussions` feature on the repository
- Finding how to correctly handle the `latest` tagging of Docker images (such as what has been done on `hydra`)
- Finding a way to add an automatically renewing SSL certificate to the Aggregator API (with `Let's encrypt`)
- Reviewing the new documentation tutorial pages (that need a second pair of eyes and beta testers to verify that they are functional and easy to use)
-
We had discussions about:
-
Upgradable protocol parameters: the Aggregator will keep on broadcasting the `Protocol Parameters` used for the current epoch and they will be stored by the Signers (along with the `Verification Keys` for easy retrieval and usage)
-
Epoch offsetting strategy: the `-1` and `-2` offsets that are used to work with the `Stake Distribution` and the `Verification Keys` are well defined constants that will probably never change (as they provide sufficient security). It is therefore better to use them as hard-coded constants provided at compilation time for the Signer and the Aggregator than as information provided at runtime by the Aggregator
-
Certificate Chain Verification Requirements: The `multi signatures` embedded in the `Certificates` must be verifiable even though the cryptographic library has evolved along the way
- The message signed needs to be switched to a `map` format where we are free to add new entries without breaking the chain validation (today only with an `immutable_digest` entry and later with others such as `utxo_set` for example)
- We could maintain a set of `verifier` functions in the core library for each earlier version (could be cumbersome)
- We could add a `verifier` function compiled in `WASM` that is stored in the certificate
- We could add a `format migration` feature to the certificate chain
- We could add `milestone genesis certificates` that would provide a complementary signature to certificates (produced with the `genesis keys` in the certificate) from time to time (e.g. every `N` epochs or as soon as a break in backward compatibility is introduced in the code)
- We could also implement such a mechanism automatically by using the Cardano chain (but that would involve posting a transaction on it)
-
Releases packaging: In order to facilitate the distribution of the nodes (particularly to the SPOs) and to have a broad adoption of the protocol, we will need to work on deploying packages for each release (`.deb`, `.rpm`, ...) with the CI
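Going back to the `map` format proposed for the signed message in the Certificate Chain Verification Requirements item above, the idea can be sketched as follows. This is only an illustration under assumptions: the `ProtocolMessage` type, its methods and the length-prefixed encoding are hypothetical, and only the `immutable_digest` and `utxo_set` entry names come from the discussion.

```rust
use std::collections::BTreeMap;

/// Hypothetical signed message made of named parts.
/// A `BTreeMap` keeps the entries sorted by key, so the encoding below does
/// not depend on the order in which the parts were added.
pub struct ProtocolMessage {
    parts: BTreeMap<String, String>,
}

impl ProtocolMessage {
    pub fn new() -> Self {
        ProtocolMessage { parts: BTreeMap::new() }
    }

    pub fn set_part(&mut self, key: &str, value: &str) {
        self.parts.insert(key.to_string(), value.to_string());
    }

    /// Deterministic byte encoding: entries in key order, each key and value
    /// prefixed with its length as a big-endian u64. A real implementation
    /// would feed these bytes to a cryptographic hash before signing.
    pub fn to_signable_bytes(&self) -> Vec<u8> {
        let mut bytes = Vec::new();
        for (key, value) in &self.parts {
            bytes.extend_from_slice(&(key.len() as u64).to_be_bytes());
            bytes.extend_from_slice(key.as_bytes());
            bytes.extend_from_slice(&(value.len() as u64).to_be_bytes());
            bytes.extend_from_slice(value.as_bytes());
        }
        bytes
    }
}
```

A message that only carries `immutable_digest` encodes the same way whether or not the code knows about later entries such as `utxo_set`, which is what keeps older certificates verifiable in this sketch.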
-
We have reviewed and merged the following PRs:
-
We have paired on updating the state machine of the Aggregator runtime so that it computes the stake distribution only once for an epoch:

- We have also paired on creating the state machine of the Signer runtime:

- During this pairing session we had many discussions about:
- The usefulness of the `Beacon` used in the certificate pending
- The forthcoming work that will be done regarding the `Certificate Chain` implementation
- And some long term implications of multiple Aggregators running and what it means for how we compute the multi signatures
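The "compute the stake distribution only once for an epoch" behaviour targeted by the Aggregator state machine update can be approximated with a guard on the last epoch seen. The sketch below is not the actual runtime state machine: `StakeDistributionCache` and its methods are hypothetical names introduced for the example.

```rust
/// Hypothetical guard that recomputes the stake distribution only when the
/// epoch changes.
pub struct StakeDistributionCache {
    last_epoch: Option<u64>,
    stakes: Vec<(String, u64)>,
    computations: usize,
}

impl StakeDistributionCache {
    pub fn new() -> Self {
        StakeDistributionCache {
            last_epoch: None,
            stakes: Vec::new(),
            computations: 0,
        }
    }

    /// Return the stake distribution for `epoch`, calling `compute` only if
    /// this epoch has not been handled yet.
    pub fn get_or_compute<F>(&mut self, epoch: u64, compute: F) -> &[(String, u64)]
    where
        F: FnOnce(u64) -> Vec<(String, u64)>,
    {
        if self.last_epoch != Some(epoch) {
            self.stakes = compute(epoch);
            self.last_epoch = Some(epoch);
            self.computations += 1;
        }
        &self.stakes
    }

    /// Number of times the distribution was actually computed.
    pub fn computations(&self) -> usize {
        self.computations
    }
}
```

In a real runtime the `compute` closure would be the (expensive) call that retrieves the stake distribution from the Cardano node.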
-
The tickets of the current iteration have been sliced and created in the board
-
We have reviewed and merged the PR Improve UI/UX documentation website #309. The vast majority of the UI/UX review comments have been taken into account. The website content is still being written and this work will continue during the iteration
-
We had a session related to the `Certificate Chain` whose goal was to:
- Specify which information to embed in the `Genesis Certificate`
- Specify which information to embed in the other certificates of the chain
- Define how to link the certificates to each other
- Define how to verify a certificate
- Some questions remain such as:
  - Is the Mithril `Epoch 0` an empty epoch (which means no other certificate than the Genesis one will be produced)?
  - What is the exhaustive list of information that we need to embed in the `Metadata(p,n)` group? (Among `Certificate Version`, `Protocol Parameters`, `Dates`, `Signers List` which included their single signature in the multi signature)
- Here is a diagram that summarizes the structure of the chain: (see on miro)
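To illustrate the "link" and "verify" goals of this session, here is a generic hash chain sketch. It does not reflect the structure that was specified: the `Certificate` fields, `compute_hash` and `verify_chain` are hypothetical, `DefaultHasher` stands in for a cryptographic hash, and the multi signature and genesis signature checks are left out entirely.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified certificate.
#[derive(Debug, Clone)]
pub struct Certificate {
    pub hash: u64,
    /// `None` for the genesis certificate.
    pub previous_hash: Option<u64>,
    pub content: String,
}

/// Stand-in hash over the link and the content (NOT cryptographically secure).
fn compute_hash(previous_hash: Option<u64>, content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    previous_hash.hash(&mut hasher);
    content.hash(&mut hasher);
    hasher.finish()
}

impl Certificate {
    /// Create a certificate linked to `previous` (or a genesis one if `None`).
    pub fn new(previous: Option<&Certificate>, content: &str) -> Self {
        let previous_hash = previous.map(|c| c.hash);
        Certificate {
            hash: compute_hash(previous_hash, content),
            previous_hash,
            content: content.to_string(),
        }
    }
}

/// Walk back from `tip` to the genesis certificate, checking each hash and
/// each link. The number of steps is bounded by the store size.
pub fn verify_chain(tip: &Certificate, store: &HashMap<u64, Certificate>) -> bool {
    let mut current = tip;
    for _ in 0..=store.len() {
        if current.hash != compute_hash(current.previous_hash, &current.content) {
            return false;
        }
        match current.previous_hash {
            None => return true, // reached genesis
            Some(previous) => match store.get(&previous) {
                Some(certificate) => current = certificate,
                None => return false, // broken link
            },
        }
    }
    false
}
```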
- We have paired and merged the last step of retrieving the real Stake Distribution from the Cardano node: Use SD from cardano-cli in Aggregator/Signer #314 🥳
