Skip to content

Conversation

@WiedersehenM
Copy link
Contributor

Migrate LFS API to New Object Storage Implementation

Summary

This PR migrates the Git LFS API from the legacy object storage implementation to the new unified object storage interface. The migration maintains backward compatibility with existing LFS objects while removing deprecated chunk upload logic.

Related Issue: #1752

Changes

Core Implementation

  • Replaced LFS storage implementation with ObjectStorageAdapter that wraps the new ObjectStorage trait
  • Removed LFS chunk upload logic - all chunk-related code has been deleted
  • Maintained object key generation rules - path format objects/{sha256[0..2]}/{sha256[2..4]}/{sha256[4..]} preserved for compatibility
  • Added S3-compatible service support - enabled force_path_style(true) for custom endpoints (MinIO, LocalStack, etc.)

Files Modified

New Files:

  • jupiter/src/lfs_storage/object_storage_adapter.rs - Adapter bridging LFS interface to new ObjectStorage
  • mono/tests/lfs_api_tests.rs - Integration tests for LFS API
  • docs/lfs-test-results.md - Comprehensive test results documentation
  • docs/lfs-testing-guide.md - Testing guide for LFS functionality

Modified Files:

  • jupiter/src/lfs_storage/mod.rs - Updated to use ObjectStorageAdapter
  • jupiter/src/object_storage/rustfs_object_storage.rs - Added path-style addressing support
  • jupiter/src/object_storage/fs_object_storage.rs - Updated LFS path transformation
  • ceres/src/lfs/handler.rs - Updated to use new storage interface, added tests
  • mono/src/api/router/lfs_router.rs - Updated error handling and tests

Removed Features

  • ❌ LFS chunk upload logic (no longer needed with new implementation)
  • ❌ Legacy chunk-related code paths

Testing

Test Results

All tests passed successfully:

  • Unit Tests: 15+ test cases (jupiter, ceres, mono)
  • Integration Tests: 8 test cases created (marked as #[ignore] for CI)
  • Manual Tests: Core LFS workflows verified

Test Coverage

Core Functionality:

  • ✅ LFS object upload (PUT) - 20MB and 100MB files tested
  • ✅ LFS object download (GET) - verified with file integrity checks
  • ✅ Object existence check (HEAD) - 200 OK for existing, 404 for missing
  • ✅ Batch API operations - upload and download requests

Compatibility:

  • ✅ Object key path format matches legacy implementation
  • ✅ Idempotent upload operations
  • ✅ Historical LFS objects remain accessible

Error Handling:

  • ✅ 404 errors for non-existent objects (Git LFS protocol compliant)
  • ✅ 422/400 errors for invalid requests
  • ✅ Proper error message formatting

Test Commands

# Run unit tests
cargo test -p jupiter
cargo test -p ceres
cargo test -p mono

# Run integration tests (requires Redis)
cargo test -p mono --test lfs_api_tests -- --ignored

Compatibility

Backward Compatibility

  • Object path format preserved - existing LFS objects remain accessible
  • API endpoints unchanged - /api/v1/lfs and /info/lfs paths work as before
  • Git LFS client compatibility - standard git lfs push/pull workflows work correctly

Verification

Manual Testing Performed

  1. ✅ Uploaded 20MB LFS file via git lfs push
  2. ✅ Downloaded LFS file via git lfs pull after clearing cache
  3. ✅ Verified object existence via HEAD requests
  4. ✅ Tested 100MB large file upload
  5. ✅ Verified object key path format in MinIO
  6. ✅ Confirmed idempotent upload behavior

Test Environment

  • Mega Server: localhost:8000
  • Object Storage: MinIO (localhost:9000)
  • Bucket: mega
  • Git LFS Client: git-lfs/3.7.1

Known Limitations

  • ⚠️ Git LFS Locking API: Client reports locking API not supported (non-blocking, optional feature)
    • Core upload/download functionality unaffected
    • Can be disabled: git config lfs.http://.../info/lfs.locksverify false

Documentation

  • Test results: docs/lfs-test-results.md
  • Testing guide: docs/lfs-testing-guide.md

Checklist

  • Code follows project style guidelines
  • All tests pass locally
  • Documentation updated
  • Commit messages signed with -s -S
  • Backward compatibility verified
  • No breaking changes introduced
  • Error handling improved
  • Legacy code removed

Additional Notes

This PR completes the migration to the new object storage interface for LFS operations. The implementation provides:

  • Better abstraction for multiple storage backends (S3, GCS, MinIO, etc.)
  • Improved error handling and retry mechanisms
  • Unified interface for reuse across other modules
  • Reduced technical debt by removing legacy chunk logic

Replace legacy LFS object storage with unified ObjectStorage interface.
Remove chunk upload logic and add comprehensive test coverage.

- Implement ObjectStorageAdapter for LFS operations
- Update LFS handlers to use new storage interface
- Add unit and integration tests
- Maintain backward compatibility with existing object paths
- Support S3-compatible services (MinIO) with path-style addressing

Related to web3infra-foundation#1752
Signed-off-by: Hongze Gao <15101764808@163.com>
Resolve checks error.

Signed-off-by: Hongze Gao <15101764808@163.com>
Resolve checks error.

Signed-off-by: Hongze Gao <15101764808@163.com>
@WiedersehenM WiedersehenM force-pushed the feature/lfs-migrate-object-storage branch from 549684e to 329aa87 Compare January 7, 2026 06:20
Resolve checks error.

Signed-off-by: Hongze Gao <15101764808@163.com>
async fn exist_object(&self, object_id: &str) -> bool {
exist_object(self.config.lfs_file_path.clone(), object_id)
async fn exist_object(&self, object_id: &str, enable_split: bool) -> bool {
if enable_split {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这段代码已经被删除,请检查代码合并流程

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

建议删除lfs_storage 目录,统一使用object storage

ObjectByteStream, ObjectKey, ObjectMeta, ObjectNamespace, ObjectStorage,
};

/// Transform LFS object SHA256 to old path format for compatibility.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不用考虑文件迁移的兼容性

&id[6..]
)
match key.namespace {
ObjectNamespace::Lfs => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同理,不考虑文件的兼容性,统一使用3级目录

/// An empty vector is always returned when querying relations so that callers can
/// gracefully degrade without changing their logic. New code should migrate
/// away from the split-file feature entirely.
pub struct LfsSplitRelation {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已经被删除,请检查代码合并流程

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

确认是否修改了文件格式

@genedna
Copy link
Member

genedna commented Jan 7, 2026

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 77a9965a30

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +123 to +127
let lfs_object_storage = ObjectStorageFactory::create(ObjectStorageConfig::from_config(
config.lfs.storage_type.clone(),
config.clone(),
))
.await?;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use configured LFS local path for LocalFs storage

The LFS object storage is now created via ObjectStorageFactory::from_config, which uses config.base_dir as the root for LocalFs. This ignores config.lfs.local.lfs_file_path (defaulting to <base_dir>/lfs). On LocalFs deployments that already have LFS data under the configured LFS path (or any custom lfs_file_path), the server will now look in <base_dir>/objects instead, so HEAD/GET/download will return 404 even though the objects exist. New uploads will also be written to the new location. Consider passing lfs.local.lfs_file_path into the ObjectStorageConfig or migrating/aliasing the old directory.

Useful? React with 👍 / 👎.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Comment on lines +19 to +32
/// Transform LFS object SHA256 to old path format for compatibility.
/// old rule: {sha1[0..2]}/{sha1[2..4]}/{sha1[4..]}
fn transform_lfs_path(sha1: &str) -> String {
if sha1.len() < 5 {
sha1.to_string()
} else {
Path::new(&sha1[0..2])
.join(&sha1[2..4])
.join(&sha1[4..sha1.len()])
.into_os_string()
.into_string()
.unwrap()
}
}
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function parameter and variable are named sha1, but LFS uses SHA-256 hashing, not SHA-1. This naming is misleading. Consider renaming the parameter to hash, oid, or sha256 to accurately reflect what it represents.

Copilot uses AI. Check for mistakes.

self.storage.put(&key, stream, meta).await
}

Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter _enable_split is prefixed with underscore to indicate it's unused, which is correct. However, the documentation for this method or trait should clarify that the split feature has been deprecated/removed, so future developers understand why this parameter is ignored. Consider adding a comment explaining that this parameter is kept for interface compatibility but is no longer used since the chunk/split feature was removed.

Suggested change
// NOTE: `_enable_split` is intentionally ignored. The LFS chunk/split feature was
// removed, but this parameter is kept for `LfsFileStorage` interface compatibility.
// When/if the trait is revised to drop split support entirely, this argument can
// be removed from the signature.

Copilot uses AI. Check for mistakes.
Comment on lines +27 to +52
async fn create_test_state() -> (MonoApiServiceState, tempfile::TempDir) {
let temp_dir = tempdir().unwrap();
let storage = test_storage(temp_dir.path()).await;

// Create a Redis connection for GitObjectCache
// Note: This requires Redis to be running at localhost:6379
// Since tests are #[ignore], this is acceptable
let redis_config = common::config::RedisConfig {
url: "redis://127.0.0.1:6379".to_string(),
};
let connection = init_connection(&redis_config).await;

let state = MonoApiServiceState {
storage: storage.clone(),
oauth_client: None,
session_store: None,
listen_addr: "http://localhost:8000".to_string(),
entity_store: saturn::entitystore::EntityStore::new(),
git_object_cache: Arc::new(ceres::api_service::cache::GitObjectCache {
connection,
prefix: "git-object-bincode".to_string(),
}),
};

(state, temp_dir)
}
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test setup function has multiple .unwrap() calls (lines 28, 37) that can panic if setup fails. While this is acceptable for test code, consider using more descriptive error messages with .expect() to make test failures easier to debug. For example: .expect("Failed to create temporary directory for test") and .expect("Failed to initialize Redis connection").

Copilot uses AI. Check for mistakes.
let res = file_storage.put_object(&meta.oid, body_bytes).await;
if res.is_err() {
if let Err(e) = res {
lfs_delete_meta(&db_storage, req_obj).await.unwrap();
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .unwrap() on line 279 can panic if deleting the metadata fails. Since this is error-path cleanup code, it would be better to log the error rather than panicking. Consider using if let Err(e) = lfs_delete_meta(...).await { tracing::error!("Failed to cleanup metadata: {}", e); } to handle this gracefully.

Suggested change
lfs_delete_meta(&db_storage, req_obj).await.unwrap();
if let Err(delete_err) = lfs_delete_meta(&db_storage, req_obj).await {
tracing::error!(
"Failed to cleanup LFS metadata for oid {} after upload failure: {}",
meta.oid,
delete_err
);
}

Copilot uses AI. Check for mistakes.
};
// because return type must be `ReceiverStream`, so we need to wrap the bytes into a stream.
let (tx, rx) = tokio::sync::mpsc::channel(1);
tx.send(Ok(bytes)).await.unwrap();
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .unwrap() on line 312 can panic if sending through the channel fails. While this is unlikely since we just created the channel, it's better to handle this error case. Consider using .expect("Channel send should not fail for newly created channel") with a descriptive message or properly propagating the error.

Suggested change
tx.send(Ok(bytes)).await.unwrap();
if let Err(e) = tx.send(Ok(bytes)).await {
tracing::error!("Failed to send LFS object into channel: {}", e);
return Err(GitLFSError::GeneralError(
"Failed to prepare object stream".to_string(),
));
}

Copilot uses AI. Check for mistakes.
use crate::storage::base_storage::{BaseStorage, StorageConnector};

#[derive(Clone, Debug)]
/// Relationship between an original (ori) LFS object and its splitted sub object.
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling error: "splitted" should be "split". The past participle of "split" is "split", not "splitted".

Suggested change
/// Relationship between an original (ori) LFS object and its splitted sub object.
/// Relationship between an original (ori) LFS object and its split sub object.

Copilot uses AI. Check for mistakes.
Comment on lines +403 to +411
let body_bytes: Vec<u8> = req
.into_body()
.into_data_stream()
.try_fold(Vec::new(), |mut acc, chunk| async move {
acc.extend_from_slice(&chunk);
Ok(acc)
})
.await
.unwrap();
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .unwrap() call on line 411 can panic if the stream collection fails. This is critical since it's handling user data uploads. Consider using proper error handling and returning an appropriate error response to the client instead.

The error should be mapped through map_lfs_error to provide a proper HTTP status code and error message.

Copilot uses AI. Check for mistakes.
Comment on lines +166 to +183
.presigned(PresigningConfig::expires_in(expires_in).unwrap())
.await
.map_err(|e| {
let detail = dump_error_chain(&e);
MegaError::Other(detail)
})?;
Ok(presigned_request.uri().to_string())
}

/// Generate a presigned URL for uploading an object.
pub async fn put_presigned_url(&self, key: &ObjectKey) -> Result<String, MegaError> {
let expires_in = Duration::from_secs(3600);
let presigned_request = self
.client
.put_object()
.bucket(&self.bucket)
.key(s3_key(key))
.presigned(PresigningConfig::expires_in(expires_in).unwrap())
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .unwrap() calls on lines 166 and 183 can panic if the PresigningConfig::expires_in() fails. While this is unlikely with a valid duration, it's better to handle the error explicitly for robustness. Consider using ? operator or map_err to convert this to a MegaError.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants