Just wanted to chime in and say this would be super helpful for my personal use case with PowerSync in 2 ways:
Background
Currently, when changes to Sync Rules or Sync Streams are deployed, PowerSync re-replicates all data from the source database from scratch, processing it with the new Sync Rules. Once that is ready, clients are switched over to sync from the new copy.
While there is no direct "downtime", it can take a long time on large databases, and clients have to re-sync all data even if only a small portion changed.
Status
2025-12-09: Updated plan with more specifics and implementation tasks.
2025-09-01: Original version of proposal outlined two implementation options.
Proposal
The base idea is to only reprocess bucket or Sync Stream definitions that have actually changed. This operates at the definition level: any change to any single query in a bucket definition causes the entire bucket definition to be re-processed, and all related buckets to be re-synced.
Specifically, with bucket definitions:
For Sync Streams:
In the future, Sync Streams could support more granular reprocessing depending on the changes to the query. For example, only changing a subquery could be treated the same as updating a parameter query in Sync Rules bucket definitions.
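As a rough illustration of the change-detection step (all names here are hypothetical, not the service's actual API), a deploy could diff the new Sync Rules against the currently deployed version by comparing a canonical serialization of each bucket or stream definition:

```typescript
// Hypothetical types - the actual service types may differ.
interface DefinitionSet {
  // definition name -> canonical serialized form of all its queries
  [name: string]: string;
}

type DefinitionDiff = {
  unchanged: string[];
  added: string[];
  removed: string[];
  changed: string[]; // any query change marks the whole definition as changed
};

/**
 * Classify bucket/stream definitions between the currently deployed
 * Sync Rules and a newly deployed version. Only `added` and `changed`
 * definitions would need to be re-replicated; `removed` ones are cleaned up.
 */
function diffDefinitions(current: DefinitionSet, next: DefinitionSet): DefinitionDiff {
  const diff: DefinitionDiff = { unchanged: [], added: [], removed: [], changed: [] };
  for (const name of Object.keys(next)) {
    if (!(name in current)) {
      diff.added.push(name);
    } else if (current[name] === next[name]) {
      diff.unchanged.push(name);
    } else {
      diff.changed.push(name);
    }
  }
  for (const name of Object.keys(current)) {
    if (!(name in next)) {
      diff.removed.push(name);
    }
  }
  return diff;
}
```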
Implementation
Where we currently use a separate replication stream per Sync Rules version (and associated logical replication slot in Postgres), this will change to only use a single replication stream, which processes all Sync Rules versions. When a new Sync Rules version is deployed, it re-replicates relevant data:
Unchanged definitions, new definitions, and removed definitions are handled differently: data for unchanged definitions is kept as-is, data for new definitions is re-replicated, and data for removed definitions is removed from the replication stream.
What makes this implementation particularly tricky is avoiding updates to existing bucket data when it is unchanged: if we do trigger updates for those, clients can re-sync the data twice, once on the old definitions and again on the new definitions.
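A minimal sketch of how a deploy could act on that classification, assuming hypothetical storage helpers (`snapshotDefinition`, `dropDefinitionData`) that do not exist in the service as-is:

```typescript
// Hypothetical orchestration sketch; the real replication pipeline differs.
type DefinitionDiff = {
  unchanged: string[];
  added: string[];
  changed: string[];
  removed: string[];
};

interface ReplicationStorage {
  // Assumed helpers - illustrative names only.
  snapshotDefinition(name: string): Promise<void>; // full re-replication for one definition
  dropDefinitionData(name: string): Promise<void>; // delete buckets of a removed definition
}

async function applyNewSyncRules(storage: ReplicationStorage, diff: DefinitionDiff): Promise<void> {
  // Unchanged definitions: do nothing. Existing bucket data stays untouched,
  // so clients do not re-sync data that did not change.

  // New or changed definitions: re-replicate their data from scratch.
  for (const name of [...diff.added, ...diff.changed]) {
    await storage.snapshotDefinition(name);
  }

  // Removed definitions: delete their bucket data.
  for (const name of diff.removed) {
    await storage.dropDefinitionData(name);
  }

  // From here on, the single replication stream keeps applying incremental
  // changes for all current definitions - no per-version replication slot.
}
```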
Storage changes
The relevant data we store are:
Currently, each of the above is scoped to a specific Sync Rules version. This needs to be changed to be more granular:
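As a hypothetical illustration of that more granular scoping (not the actual storage schema), bucket data could move from a single Sync-Rules-version key to a per-definition key:

```typescript
// Illustration only - not the actual storage schema.

// Today (simplified): bucket data is keyed by the Sync Rules version,
// so every deploy implies a full new copy of all data.
interface VersionScopedBucketData {
  syncRulesVersion: number;
  bucket: string;
  opId: bigint;
  data: unknown;
}

// More granular scoping: data is keyed by the definition that produced it,
// with a per-definition version that only changes when that definition changes.
interface DefinitionScopedBucketData {
  definitionName: string;    // bucket definition or Sync Stream name
  definitionVersion: number; // bumped only when this definition changes
  bucket: string;
  opId: bigint;
  data: unknown;
}
```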
Implementation progress
Other considerations
Defragmenting
Currently, the full reprocessing of data doubles as a form of "defragmenting", as described here. If we implement incremental reprocessing, we will need alternative methods for defragmenting, such as the heuristic sketched below.
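One possible alternative, sketched here with made-up names and thresholds, is to trigger explicit compaction per bucket based on a fragmentation metric:

```typescript
// Hypothetical heuristic with made-up names and thresholds.
interface BucketStats {
  totalOps: number;    // all operations retained in the bucket history
  currentRows: number; // rows a fresh client actually needs
}

// Trigger explicit compaction when the bucket holds far more operations
// than the number of rows a new client would download.
function needsCompaction(stats: BucketStats, maxFragmentation = 3): boolean {
  if (stats.currentRows === 0) {
    return stats.totalOps > 0;
  }
  return stats.totalOps / stats.currentRows > maxFragmentation;
}
```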
Config changes
Changes to the replication config affect all bucket & stream definitions, so they still require re-replicating all data. For the most part, it is very difficult to predict the effects of config changes at a more granular level.
However, if we avoid creating new operations for unchanged bucket data, we can avoid re-syncing unaffected data to clients after a config change.
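A hedged sketch of that idea, with hypothetical names: before writing a new operation during re-replication, compare a content hash of the row against what is already stored and skip the write when nothing changed.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical sketch: skip writing a new bucket operation when the row's
// content is identical to what is already stored, so a re-replication pass
// (e.g. after a config change) does not force clients to re-download it.
// Note: a real implementation would need a canonical serialization
// (stable key order) rather than plain JSON.stringify.
function contentHash(row: Record<string, unknown>): string {
  return createHash('sha256').update(JSON.stringify(row)).digest('hex');
}

interface StoredEntry {
  hash: string;
}

function shouldWriteOperation(
  existing: StoredEntry | undefined,
  row: Record<string, unknown>
): boolean {
  return existing === undefined || existing.hash !== contentHash(row);
}
```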