Merged
18 changes: 12 additions & 6 deletions docs-main/docs.json
@@ -2594,7 +2594,8 @@
"pages": [
"global-synchronizer/production-operations/disaster-recovery",
"global-synchronizer/production-operations/pruning",
"global-synchronizer/production-operations/security-hardening"
"global-synchronizer/production-operations/security-hardening",
"global-synchronizer/production-operations/logical-synchronizer-upgrade"
]
}
]
@@ -2742,7 +2743,8 @@
"pages": [
"global-synchronizer/production-operations/disaster-recovery",
"global-synchronizer/production-operations/pruning",
"global-synchronizer/production-operations/security-hardening"
"global-synchronizer/production-operations/security-hardening",
"global-synchronizer/production-operations/logical-synchronizer-upgrade"
]
}
]
@@ -2890,7 +2892,8 @@
"pages": [
"global-synchronizer/production-operations/disaster-recovery",
"global-synchronizer/production-operations/pruning",
"global-synchronizer/production-operations/security-hardening"
"global-synchronizer/production-operations/security-hardening",
"global-synchronizer/production-operations/logical-synchronizer-upgrade"
]
}
]
@@ -2980,7 +2983,8 @@
"group": "Exchanges",
"pages": [
"integrations/exchanges/sdk-download",
"integrations/exchanges/guidance"
"integrations/exchanges/guidance",
"integrations/exchanges/node-operations"
]
},
{
@@ -3017,7 +3021,8 @@
"group": "Exchanges",
"pages": [
"integrations/exchanges/sdk-download",
"integrations/exchanges/guidance"
"integrations/exchanges/guidance",
"integrations/exchanges/node-operations"
]
},
{
@@ -3054,7 +3059,8 @@
"group": "Exchanges",
"pages": [
"integrations/exchanges/sdk-download",
"integrations/exchanges/guidance"
"integrations/exchanges/guidance",
"integrations/exchanges/node-operations"
]
},
{
@@ -0,0 +1,168 @@
---
title: "Logical Synchronizer Upgrades"
description: "Upgrade the protocol version of a Global Synchronizer with very limited network downtime through Logical Synchronizer Upgrades (LSU)"
---

{/* COPIED_START source="splice:docs/src/sv_operator/sv_logical_synchronizer_upgrade.rst" hash="0cf8ccb8" */}

<Warning title="Pre-reviewed Content - Do Not Modify">
This section was copied from existing reviewed documentation.
**Source:** `docs/src/sv_operator/sv_logical_synchronizer_upgrade.rst`
Reviewers: Skip this section. Remove markers after final approval.
</Warning>

<Warning>
Logical Synchronizer Upgrades (LSU) are still in development. The instructions here are intended as a preview, primarily targeted at Super Validators, and will likely change in minor ways before the full release.
</Warning>


Logical synchronizer upgrades (LSUs) allow upgrading the protocol version of a synchronizer
**with very limited network downtime** and no upgrade-related operational overhead for validator operators and app developers.
Super validators still have to perform operational steps to deploy successor nodes and schedule the upgrade, but those are done asynchronously before the actual upgrade happens.

## High-Level Overview

1. A new Canton release with an updated protocol version becomes available, along with a compatible Splice release. This release supports both the old and the new protocol version.
For testing purposes, or in some disaster recovery scenarios, the version and/or protocol version can also stay the same.
2. Validators and super validators upgrade to the new release but continue running the original physical synchronizer with the old protocol version. This is a regular upgrade and can be done asynchronously,
but it must be completed before the actual upgrade time.
3. A vote is created in the SV UI to schedule an LSU through the `nextScheduledLogicalSynchronizerUpgrade` field in `DsoRulesConfig`.
The schedule includes

1. **topology freeze time**: after this time, no topology transactions can be sequenced until the upgrade time, so in particular no parties can be added and no Daml packages can be vetted
2. **upgrade time**: at this time Daml transactions on the original physical synchronizer will time out and new Daml transactions will run on the new physical synchronizer
3. **new physical synchronizer serial**: usually just the old serial incremented by 1
4. **new protocol version**: the protocol version of the successor synchronizer

4. All SVs deploy *successor* synchronizer nodes (sequencer, mediator, and optionally CometBFT if DABFT is not used) alongside their existing nodes. Note: There is no new participant, the participant is tied to a logical synchronizer so it does not change on an LSU. As part of that they also [configure](#super-validator-deployment-changes) the successor synchronizer in their SV and scan config.
This deployment should be completed before the freeze time.
5. At the scheduled **topology freeze time**, the SV app automation of each SV transfers the topology state to the successor nodes and publishes the sequencer URL for the new sequencer in the topology state (this is the only topology transaction that can be published after the freeze time).
6. Between the topology freeze time and the upgrade time, SV app
automation periodically sends special health check events on
the new physical synchronizer to verify its health. Each super
validator should use their metrics to validate that they observe at
least one event from each other super validator in the
`LSU Sequencing Test` dashboard, and that the BFT peer
connections (CometBFT or DABFT) of the successor nodes are healthy.

<div className="todo">add more details once we have added this</div>

7. At the scheduled **upgrade time**, participants automatically connect to the successor synchronizer.
The SV automation transfers traffic control state from the current sequencer to the successor.
The successor physical synchronizer may be configured with a lower initial rate limit that will be
raised by the SV app after a configurable amount of time to avoid an initial traffic surge on the new synchronizer.

<div className="todo">add more details once we have added this.</div>

8. The successor physical synchronizer is now fully usable. Super Validators update their [configuration](#super-validator-deployment-changes) to mark the original synchronizer as legacy
and the successor as the current synchronizer.
9. After 30 days, the super validators remove the old physical synchronizer node deployment and update their [configuration](#super-validator-deployment-changes) to remove the
legacy synchronizer configuration.
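
The schedule set by the vote can be pictured as a fragment of the `nextScheduledLogicalSynchronizerUpgrade` field along these lines (a sketch only; the exact field names inside the schedule are illustrative assumptions, not the authoritative `DsoRulesConfig` schema):

```
"nextScheduledLogicalSynchronizerUpgrade": {
  "topologyFreezeTime": TOPOLOGY_FREEZE_TIME,  # no topology transactions after this until the upgrade time
  "upgradeTime": UPGRADE_TIME,                 # Daml traffic moves to the successor at this time
  "newPhysicalSynchronizerSerial": NEW_SERIAL, # usually old serial + 1
  "newProtocolVersion": NEW_PROTOCOL_VERSION   # protocol version of the successor
}
```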

### LSU Cancellation

Between the topology freeze time and the upgrade time, the upgrade can be cancelled if the successor physical synchronizer is deemed unhealthy, e.g., because the health checks fail.
To do so, a threshold of super validators must send a `POST` request to the `/v0/admin/synchronizer/lsu/cancel` endpoint on the SV API.

### Disaster Recovery through Roll-Forward LSU

In case of a disaster that causes the current physical synchronizer to become unavailable, an LSU can be used as a roll-forward recovery mechanism.
The procedure is similar to a regular LSU, but because the current physical synchronizer is unusable, coordination through a vote and topology transactions is not possible; instead, validators and super validators need to manually initiate the upgrade.

This procedure can also be used for recovering from a failed LSU. There are two relevant cases:

1. The LSU was not cancelled before the upgrade time, but no Daml or topology transactions could be sequenced on the successor physical synchronizer after the upgrade time. In this case, the original successor synchronizer can be thrown away and replaced
by a new successor synchronizer with a serial incremented by 1 (so 2 compared to the original non-successor synchronizer).
2. The LSU proceeded and some transactions did get sequenced on the successor physical synchronizer, but the successor then became unusable. The procedure is the same in this case, but
the SVs should keep both the original synchronizer and the broken successor running (assuming it can still serve events, just not sequence new messages) to allow nodes to catch up first, and spin up a new successor synchronizer on the side,
so they are running 3 synchronizer nodes for some period of time. Allowing nodes to catch up as much as possible limits the potential for desynchronization, which would require manual resolution of ACS commitment mismatches.

Concretely, the procedure is as follows:

1. The old physical synchronizer is deemed broken and the last sequenced message was at record time R.
2. Super validators configure R as the maximum sequencing time on the old sequencer, to guarantee that nothing accidentally gets sequenced after that time. This is done by applying the following environment variable to the existing sequencer:

```
- name: ADDITIONAL_CONFIG_SEQUENCER_LSU_MAX_SEQUENCING_TIME
  value: |
    canton.sequencers.sequencer.parameters.lsu-repair.global-max-sequencing-time-exclusive=MAX_SEQUENCING_TIME
```

3. Super validators deploy successor nodes. Depending on the issue,
the successor nodes may be configured with older image and protocol
versions if the issue is limited to the new version. The successor
sequencer must be configured with two timestamps:
`lower-bound-sequencing-time-exclusive` and
`upgrade-time`. These correspond to the topology freeze time and
the upgrade time in a regular LSU. In particular, after
`lower-bound-sequencing-time-exclusive` sequencing test messages
can be submitted and observed in the `LSU Sequencing Test`
dashboard. After `upgrade-time`, all Daml transactions can be
submitted. The actual timestamps are chosen through coordination with all SVs.
The timestamps are applied through an environment variable on the successor sequencer:

```
- name: ADDITIONAL_CONFIG_SEQUENCER_LSU_SEQUENCING_BOUNDS
  value: |
    canton.sequencers.sequencer.parameters.lsu-repair.lsu-sequencing-bounds-override.lower-bound-sequencing-time-exclusive=LOWER_BOUND_SEQUENCING_TIME_EXCLUSIVE
    canton.sequencers.sequencer.parameters.lsu-repair.lsu-sequencing-bounds-override.upgrade-time=UPGRADE_TIME
```

4. Super validators wait until ingestion has completed.

5. Super validators configure their SV app to transfer the topology and traffic state from the old physical synchronizer to the successor nodes.
To do so, add the following helm values to the SV app:

```
rollForwardLsu:
  newPhysicalSynchronizerSerial: NEW_PHYSICAL_SYNCHRONIZER_SERIAL # Must be agreed between SVs, usually existing (broken) synchronizer serial + 1
  newPhysicalSynchronizerProtocolVersion: NEW_PHYSICAL_SYNCHRONIZER_PROTOCOL_VERSION # Must be agreed between SVs; the protocol version of the successor synchronizer
  exportTimes:
    topologyExportTime: TOPOLOGY_EXPORT_TIME # Must be agreed between SVs
    trafficExportTime: TRAFFIC_EXPORT_TIME # Must be agreed between SVs
    upgradeTime: UPGRADE_TIME # Must be agreed between SVs
```

6. Validators initiate the *procedure* on their side.

#### Recovery from a failed LSU where nothing got sequenced

For the special case where an LSU was announced and not cancelled but
failed and nothing got sequenced on the successor synchronizer, there
is a variant that avoids the need to manually check for ingestion
being completed and does not require explicit interaction from validators.

To do so, use the following steps:

1. Super validators configure the manual LSU in their scan.

```
rollForwardLsu:
  enabled: true
  upgradeTime: UPGRADE_TIME # Must be agreed between SVs. Optional; if not specified, it is taken from an existing LSU announcement, which should usually be sufficient.
```

2. Validator app automation picks up that configuration and initiates a manual roll-forward LSU to the new synchronizer.

#### Resolving ACS mismatches

Note that depending on how exactly the old synchronizer failed,
validators may desynchronize if some validators observed a
transaction before the failure while others did not. To recover from
that, follow the instructions for *validators*.

## Super Validator Deployment Changes

<div className="todo">update helm values and link them here</div>

LSU requires deployment changes for super validators. Concretely:

1. Participants are now preserved as part of LSUs. If you previously assumed that participant, sequencer, and mediator always come as one unit per migration id, you now need to move the participant out of that unit.
2. The `domain` value on the sv app helm chart should be replaced by `synchronizers`. `synchronizers.current` replaces the synchronizer previously configured through `domain`. `synchronizers.successor`
should be configured to the successor physical synchronizer when that is deployed. After the upgrade, `synchronizers.current` becomes `synchronizers.legacy` and `synchronizers.successor` becomes `synchronizers.current`. The legacy configuration should be removed together with removing the old physical synchronizer after 30 days.
The CometBFT configuration also moves under `synchronizers.(current|successor|legacy)`.
3. The `sequencerAddress` and `mediatorAddress` values in scan should be replaced by `synchronizers.current.sequencer` and `synchronizers.current.mediator`. The corresponding values under `synchronizers.successor` should be set together with
the deployment of the successor physical synchronizer. After the upgrade `successor` becomes `current` and `current` is removed.
4. When using DABFT for the successor node, further changes will be required. Most notably, the CometBFT node goes away, as DABFT runs as part of the sequencer pod. The sequencer pod and SV app will require some additional configuration. Details will be added later.
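
The lifecycle of the `synchronizers` helm values described above can be sketched as follows (sub-keys other than `current`, `successor`, and `legacy` are elided; the exact contents are deployment-specific):

```
# During the upgrade window:
synchronizers:
  current:        # replaces the old `domain` value
    ...
  successor:      # set when the successor nodes are deployed
    ...

# After the upgrade:
synchronizers:
  legacy:         # the former `current`; remove after 30 days
    ...
  current:        # the former `successor`
    ...
```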

{/* COPIED_END */}
@@ -7,17 +7,56 @@ Canton Network upgrades fall into two categories: minor upgrades that each node

## Minor Upgrades

### Validator nodes

{/* COPIED_START source="splice:docs/src/validator_operator/validator_upgrades.rst" hash="1951360a" */}

<Warning title="Pre-reviewed Content - Do Not Modify">
This section was copied from existing reviewed documentation.
**Source:** `docs/src/validator_operator/validator_upgrades.rst`
Reviewers: Skip this section. Remove markers after final approval.
</Warning>

There are two types of upgrades:

version upgrades (corresponding to an upgrade from `0.A.X` to `0.B.Y`)
and protocol upgrades (the release version can remain the same; only the protocol is upgraded, and this requires no action).

Version upgrades can be done by each node independently and only require
an upgrade of the docker-compose file or a `helm upgrade` for a
kubernetes deployment.
You must not delete or uninstall any Postgres database, or change migration IDs or secrets, for a version upgrade.
Make sure to read the [Release Notes](/global-synchronizer/release-notes/current-release) to learn
about changes you may need to make as part of the upgrade.

Note that for docker-compose you must update the full bundle including
the docker compose file and the start.sh script and adjust
`IMAGE_TAG`. Only updating `IMAGE_TAG` is insufficient as the old
docker compose files might be incompatible with the new version.
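
The bundle-update requirement can be sketched as the following sequence. This is illustrative only: the download URL and bundle name are assumptions, so use the artifact names from the official release page.

```shell
# Illustrative sketch of a docker-compose version upgrade.
# NEW_VERSION, the bundle name, and the URL are placeholders, not official paths.
NEW_VERSION="0.5.9"
BUNDLE="splice-node-${NEW_VERSION}.tar.gz"
echo "1. Fetch the FULL release bundle (compose files + start.sh), e.g.:"
echo "   curl -fsSLO https://example.com/releases/${BUNDLE} && tar xzf ${BUNDLE}"
echo "2. Set IMAGE_TAG=${NEW_VERSION} in the new bundle's environment file."
echo "3. Restart the node using the new bundle's start.sh."
```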

{/* COPIED_END */}

### Super Validator nodes

{/* COPIED_START source="splice:docs/src/sv_operator/sv_upgrades.rst" hash="5405319d" */}

<Warning title="Pre-reviewed Content - Do Not Modify">
This section was copied from existing reviewed documentation.
**Source:** `docs/src/sv_operator/sv_upgrades.rst`
Reviewers: Skip this section. Remove markers after final approval.
</Warning>

There are two types of upgrades:

version upgrades (corresponding to an upgrade from `0.A.X` to `0.B.Y`)
and protocol upgrades (the release version can remain the same; only the protocol is upgraded).

Version upgrades can be done by each node independently and only require
a `helm upgrade`. Make sure to read the [Release Notes](/global-synchronizer/release-notes/current-release) to learn
about changes you may need to make as part of the upgrade.
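
A minor version upgrade therefore reduces to a single Helm command per chart. In this sketch the release name, chart name, namespace, and version are placeholders, not authoritative values; reuse the same values file as the running deployment:

```shell
# Sketch of a minor version upgrade via Helm; all names below are placeholders.
CHART_VERSION="0.5.9"
RELEASE="sv-app"
echo helm upgrade "$RELEASE" splice/splice-sv-node \
  --namespace sv --values values.yaml --version "$CHART_VERSION"
```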

Protocol upgrades are performed through [logical synchronizer upgrades](/global-synchronizer/production-operations/logical-synchronizer-upgrade),
which allow upgrading the protocol version with very limited network downtime.

{/* COPIED_END */}

@@ -135,9 +135,9 @@ additional considerations may include the following:

For a representative example runbook covering the migration of a
specific integration use-case, see the [Rolling out Major Splice
Upgrades](/integrations/exchanges/node-operations#rolling-out-major-splice-upgrades)
section of the [Validator Node Operations](/integrations/exchanges/node-operations)
guide.

### Migration Dumps
Migration dumps contain identity and transaction data from the validator
Expand Down
4 changes: 1 addition & 3 deletions docs-main/global-synchronizer/troubleshooting.mdx
@@ -589,9 +589,7 @@ Always take snapshots/backups before upgrading. For Kubernetes: snapshot both Va
**Solution:**
1. Check current network version at [canton.foundation/sv-network-status](https://canton.foundation/sv-network-status/)
2. Upgrade directly to the current network version (don't stop at intermediate versions)
3. Follow the [validator upgrade guide](/global-synchronizer/production-operations/upgrade-procedures#validator-nodes), which applies to both Docker Compose (`docker-compose` bundle update) and Kubernetes (`helm upgrade`) deployments.
</Accordion>

<Accordion title="Disaster recovery from old node ID backup">