… for SpanByteAllocator. The AddressType change is a breaking on-disk format change: it shuffles bits around in RecordInfo to add an additional bit adjacent to the old ReadCache bit to mark an address as:
- 00: Reserved
- 11: ReadCache
- 10: InMemory portion of the main log
- 01: On-Disk
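The two-bit address type listed above can be illustrated with a minimal sketch. Only the four bit patterns come from the commit message; the 48-bit field width, the position of the type bits, and the helper names `tag_address`, `address_type_of`, and `offset_of` are assumptions made for illustration and do not describe the actual RecordInfo layout.

```python
from enum import IntEnum


class AddressType(IntEnum):
    """Two-bit address type, using the bit patterns from the commit message."""
    RESERVED = 0b00
    ON_DISK = 0b01
    IN_MEMORY = 0b10
    READ_CACHE = 0b11


# Hypothetical layout: the two type bits occupy the top of a 48-bit address field.
ADDRESS_BITS = 48
TYPE_SHIFT = ADDRESS_BITS - 2
TYPE_MASK = 0b11 << TYPE_SHIFT
OFFSET_MASK = (1 << TYPE_SHIFT) - 1


def tag_address(offset: int, address_type: AddressType) -> int:
    """Combine a log offset and an address type into one tagged integer."""
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset does not fit in the address field")
    return (int(address_type) << TYPE_SHIFT) | offset


def address_type_of(tagged: int) -> AddressType:
    """Extract the two type bits from a tagged address."""
    return AddressType((tagged & TYPE_MASK) >> TYPE_SHIFT)


def offset_of(tagged: int) -> int:
    """Extract the offset, with the type bits masked out."""
    return tagged & OFFSET_MASK
```

Because the type lives in the stored address itself, any change to which bits carry it changes how previously written records are interpreted, which is consistent with the commit describing this as a breaking on-disk format change.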
…hem "in" rather than "ref"
* wip
* wip
* wip
* Added unified store session
* Correcting generic typing
* Added MEMORY USAGE + TYPE to unified ops
* Added TTL, EXPIRETIME and EXISTS to unified ops
* implemented DEL in unified ops
* wip - expire & persist (broken)
* wip - adding expire to unified ops
* wip - expire
* add cref to server-side replication inter-node commands
* fix server-side BeginRecoverReplica
* wip
* Fix transaction key locking
* format
* Some test fixes
* Fixing tests
* reverting a couple of unnecessary changes
* Eliminating more multi-context methods from API
* Removed some unnecessary stuff
* Some more cleanup to TransactionManager
* merge tedhar/storage-v2 (ObjectAllocator serialization updates)
* Updating memory usage values
* format
* Handling wrong type ops
* Revert "Updating memory usage values" This reverts commit 88ba307.
* fix no-object-log case
* Fixes for Tsavorite UTs
* Fixes for Tsavorite UTs (mostly ReadCache, TsavoriteLog, Compaction)
* Tsavorite Iterator work and UT re-enabling
* Fixes to Object iteration, LogRecord.ToString()
* Add RecordMetadata.ETag
* Readding --no-obj config
* fix
* test fix
* Prep for Recovery
* wip
* wip
* fix
* More fixes for UT (mostly Recovery, Migration)
* Moving DELIFEXPIM to unified store
* ObjectLogTail in Recovery, and more UT fixes (Migration record serialization, ReadCache size and tailAddress verification calculations, etc.)
* More Tsavorite recovery tests
* Removing unnecessary isObject flag from record serialization
---------
Co-authored-by: Vasileios Zois <vazois@microsoft.com>
Co-authored-by: TedHartMS <15467143+TedHartMS@users.noreply.github.com>
… into tedhar/storage-v2
… correctly initialized SegmentSize for ObjectLogDevice; other fixes to UTs to work with UnifiedStore
Pull request overview
Copilot reviewed 165 out of 169 changed files in this pull request and generated 3 comments.
@copilot, investigate why ClusterMigrateWrite would be failing
…k-based replica sync (#1633)

In TryReplicateDiskbasedSync, ExecuteClusterInitiateReplicaSync was sending beginAddress.Span for both the aofBeginAddress and aofTailAddress parameters. This was introduced in commit 6fb99e5 when converting from ToByteArray() to Span-based calls.

The primary uses the replica's tail address to compute the AOF sync replay range. With both parameters being the begin address, the primary couldn't determine where the replica's AOF actually ended, causing the replica to never receive AOF records and remain stuck at offset 64 (kFirstValidAofAddress).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
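The class of bug described in this commit message, one value passed for two distinct parameters, can be shown with a small sketch. The `ReplicaSyncRequest` type, the two builder functions, and `reported_aof_length` are hypothetical names invented for illustration; they are not Garnet APIs, and the sketch does not model how the primary actually computes its replay range.

```python
from dataclasses import dataclass

# Value taken from the commit message (kFirstValidAofAddress).
K_FIRST_VALID_AOF_ADDRESS = 64


@dataclass(frozen=True)
class ReplicaSyncRequest:
    """Hypothetical payload a replica sends when initiating a sync."""
    aof_begin_address: int
    aof_tail_address: int

    def reported_aof_length(self) -> int:
        """Extent of the replica's AOF as the receiver would see it."""
        return self.aof_tail_address - self.aof_begin_address


def build_request_buggy(begin_address: int, tail_address: int) -> ReplicaSyncRequest:
    # Bug pattern: the begin address is sent for both fields,
    # so the tail address never reaches the primary.
    return ReplicaSyncRequest(begin_address, begin_address)


def build_request_fixed(begin_address: int, tail_address: int) -> ReplicaSyncRequest:
    # Keyword arguments make it harder to pass the same value twice unnoticed.
    return ReplicaSyncRequest(
        aof_begin_address=begin_address,
        aof_tail_address=tail_address,
    )
```

With the buggy builder, the request reports an AOF of length zero regardless of the real tail, which matches the commit's statement that the primary could not determine where the replica's AOF ended.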
Multi-Log Parallel Replication Feature
This PR introduces multi-log based Append-Only File (AOF) support to Garnet, enhancing write throughput and enabling optimized parallel replication replay. The feature leverages multiple physical TsavoriteLog instances to shard write operations and parallelize log scanning, shipping, and replay across multiple connections and iterators. While designed primarily for cluster mode replication, this feature can also be used in standalone mode to improve performance when AOF is enabled.
Feature Requirements
1. Sharded AOF Architecture
- … TsavoriteLog instances.
2. Flexible Parallel Replay with Tunable Task Granularity
3. Read Consistency Protocol
4. Transaction Support
5. Fast Prefix-Consistent Recovery
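The sharded-log idea from the overview can be sketched roughly as follows. This is an illustration under stated assumptions, not Garnet's implementation: routing by a CRC32 hash of the key, a single global sequence counter, and the names `ShardedLogSketch`, `sublog_index`, `append`, and `replay_in_order` are all hypothetical.

```python
import heapq
import zlib
from itertools import count


class ShardedLogSketch:
    """Toy model of writes sharded across several in-memory sublogs."""

    def __init__(self, sublog_count: int) -> None:
        if sublog_count < 1:
            raise ValueError("sublog_count must be at least 1")
        self.sublogs = [[] for _ in range(sublog_count)]
        self._sequence = count(1)  # assumed global sequence number source

    def sublog_index(self, key: bytes) -> int:
        # Assumption: a key always maps to the same sublog via a stable hash.
        return zlib.crc32(key) % len(self.sublogs)

    def append(self, key: bytes, payload: bytes) -> tuple:
        """Append a record to the key's sublog; return (sublog index, sequence number)."""
        sequence_number = next(self._sequence)
        index = self.sublog_index(key)
        self.sublogs[index].append((sequence_number, key, payload))
        return index, sequence_number

    def replay_in_order(self) -> list:
        """Merge all sublogs back into one stream ordered by sequence number."""
        # Each sublog is already sorted by sequence number, so a k-way merge suffices.
        return list(heapq.merge(*self.sublogs))
```

In this sketch, records for the same key always land in the same sublog, so per-key order is preserved within a sublog, while the sequence numbers allow a global order to be recovered when needed.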
Newly Introduced Configuration Parameters
AofPhysicalSublogCount: … TsavoriteLog instances
AofReplayTaskCount
AofRefreshPhysicalSublogTailFrequencyMs
Implementation Plan
Phase 1: Core Infrastructure
1.1 Implement AofHeader extensions to eliminate single log overhead.
- ShardedHeader for standalone operations.
- TransactionHeader for coordinated operations.
1.2 Implement GarnetLog abstraction layer.
- SingleLog wrapper for legacy single log.
- ShardedLog implementation for multi-log.
1.3 SequenceNumberGenerator class.
Phase 2: Primary Replication Stream
2.1 AofSyncDriver class.
- AofSyncDriver per attached replica.
- AofSyncTask per physical sublog.
- AdvanceTime background task per attached replica.
2.2 AofSyncTask class.
2.3 AdvanceTime background task.
Phase 3: Replica Replay Stream
3.1 ReplicaReplayDriver class.
- ReplicaReplayTask for parallel replay within a single physical sublog.
3.2 ReplicaReplayTask class.
3.3 Standalone operation replay
- … BasicContext or TransactionalContext).
3.4 Multi-exec transaction replay
3.5 Custom transaction procedure replay
Phase 4: Read Consistency Protocol
4.1 ReadConsistencyManager class
- VirtualSublogReplayState struct using sketch arrays for key freshness tracking and sequence number frontier computation.
4.2 Session based prefix consistency enforcement
- ConsistentReadGarnetApi and TransactionalConsistentReadGarnetApi to allow the JIT compiler to optimize operational calls.
- … ValidateKeySequenceNumber, UpdateKeySequenceNumber).
- ReplicaReadSessionContext struct used to maximumSessionSequenceNumber metadata (i.e. sessionVersion, lastHash, lastVirtualSublogIdx) to enforce prefix consistency when is stable or during recovery
Phase 5: Prefix consistent recovery
5.1 Commit operation
- … GarnetLog layer instead of within TsavoriteLog to control commit across sublogs.
5.2 RecoverLogDriver implementation
- … sequenceNumber < untilSequenceNumber.
- ReadConsistencyManager state at recovery to initialize SequenceNumberGenerator.
Phase 6: Testing & Validation
NOTES
Prefix Consistent Single Key Read Protocol
Prefix Consistent Batch Read
TODO