git-lfs-centric-documentation #65

@bwalsh

Git DRS User Guide

Git DRS extends Git LFS to register and retrieve large data files from DRS-enabled platforms while keeping the familiar Git workflow. Use Git LFS for file tracking, fetching, and local cache management. Use Git DRS to configure the DRS server connection and manage cloud-backed object references for your repository.

Relationship to Git LFS: Git DRS is built on top of Git LFS. It uses the same clean and smudge filter architecture, the same .gitattributes tracking patterns, and a compatible pointer file format. If you already know git lfs track and git lfs pull, you already know most of the workflow: you keep using those commands, and Git DRS adds only the DRS remote configuration and registration steps.


Prerequisites


Install Git DRS

Use the project installer after Git LFS is installed:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/calypr/git-drs/refs/heads/main/install.sh)" -- $GIT_DRS_VERSION
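
The installer expects Git LFS to be installed already. A minimal sketch of a pre-flight check, assuming a POSIX shell; missing_tools is an illustrative helper name and not part of Git DRS:

```shell
# Hypothetical helper (not part of Git DRS): print each named tool
# that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: check the prerequisites before running the installer.
# missing_tools git git-lfs
```

Empty output means every listed tool was found. After installing, the same helper can be run with git-drs added to the list to confirm the binary is on your PATH.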

Getting Started

Initialize a Repository

Initialize Git DRS once per repository. This configures hooks and prepares the repository for DRS-backed files — similar to running git lfs install at the repo level.

git drs init

Configure a DRS Remote

Add at least one DRS remote. Provide the server URL, credentials, project, and bucket:

git drs remote add gen3 production \
  --cred /path/to/credentials.json \
  --url https://calypr-public.ohsu.edu \
  --project my-project \
  --bucket my-bucket

Track Large Files with Git LFS

Use Git LFS to select which files should be stored as LFS objects. Git DRS works with the tracking patterns you configure via Git LFS:

git lfs track "*.bam"
git add .gitattributes
git commit -m "Track BAM files with Git LFS"

For more details, see the Git LFS tracking documentation.
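
With stock Git LFS, git lfs track records each pattern as a line in .gitattributes, normally of the form "*.bam filter=lfs diff=lfs merge=lfs -text". The sketch below lists the patterns routed through the lfs filter. The function name is illustrative, and it assumes Git DRS leaves the stock filter=lfs attribute in place rather than writing its own:

```shell
# Hypothetical helper: list .gitattributes patterns that use the lfs filter.
# Assumes the stock Git LFS attribute line, e.g.
#   *.bam filter=lfs diff=lfs merge=lfs -text
lfs_tracked_patterns() {
  attrs_file="${1:-.gitattributes}"
  [ -f "$attrs_file" ] || return 0
  awk '
    $1 ~ /^#/ { next }   # skip comment lines
    {
      # Print the pattern (first field) if any attribute is filter=lfs.
      for (i = 2; i <= NF; i++)
        if ($i == "filter=lfs") { print $1; break }
    }
  ' "$attrs_file"
}

# Example:
# lfs_tracked_patterns            # reads ./.gitattributes
```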

Add, Commit, and Push Data

Once files are tracked with Git LFS, use standard Git commands to add and commit. During git push, Git LFS uploads large objects to the LFS server while Git DRS automatically registers them with the configured DRS server via its pre-push hook.

git add my-file.bam
git commit -m "Add data file"
git push

What happens behind the scenes: The git push triggers Git LFS transfer hooks. Git DRS intercepts this flow to register each LFS object with your DRS server (e.g., gen3/indexd), making the file discoverable via DRS IDs. You don't need to run any extra commands. For a detailed breakdown, see How It Works.

For background on the Git LFS transfer flow, see the Git LFS overview and the Git LFS push documentation.
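
For orientation, a stock Git LFS pointer is a small text file that records the spec version, the SHA-256 of the content, and the size in bytes. The sketch below prints the pointer text Git LFS would be expected to store for a given file. It assumes Git DRS pointers follow the same layout (this guide describes them only as compatible), and lfs_pointer_for is an illustrative name:

```shell
# Hypothetical helper: print the Git LFS pointer text for a file.
# Assumes the stock Git LFS v1 pointer layout.
lfs_pointer_for() {
  file="$1"
  # Use sha256sum where available, otherwise fall back to shasum (macOS).
  if command -v sha256sum >/dev/null 2>&1; then
    oid=$(sha256sum "$file" | cut -d ' ' -f 1)
  else
    oid=$(shasum -a 256 "$file" | cut -d ' ' -f 1)
  fi
  # Arithmetic expansion strips the padding some wc implementations add.
  size=$(( $(wc -c < "$file") ))
  printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}

# Example:
# lfs_pointer_for my-file.bam
```

After git add, comparing this output with git cat-file -p :my-file.bam is one way to confirm that the staged blob is a pointer rather than the raw file content.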

Download Files

Use Git LFS to download files on demand:

git lfs pull -I "*.bam"

Refer to the Git LFS pull documentation for filters and options.

Check Status and Tracked Files

To see which files are tracked and their status, rely on Git LFS tooling:

git lfs ls-files

The Git LFS ls-files documentation explains the available flags and output format.


Cloning a Repository That Uses Git DRS

When you clone a repository that already uses Git DRS, the repo will contain small pointer files instead of full file content. You need to install Git DRS, initialize it in the clone, configure the DRS remote, and then pull file content.

Step 1 — Clone the repository

Clone as you normally would. Git LFS pointer files are checked out automatically, but large file content is not downloaded yet.

git clone https://github.com/your-org/your-data-repo.git
cd your-data-repo

Tip: If you want to skip downloading any LFS content during clone (useful for large repos), use the GIT_LFS_SKIP_SMUDGE environment variable. See git lfs install --skip-smudge for details.

GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/your-org/your-data-repo.git
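
With smudging skipped, tracked files in the working tree contain pointer text until you pull. A minimal sketch for telling the two states apart, assuming the stock Git LFS pointer header; is_lfs_pointer is an illustrative helper, not a Git DRS command:

```shell
# Hypothetical helper: succeed if the file looks like a Git LFS pointer.
is_lfs_pointer() {
  [ -f "$1" ] || return 1
  # Read only the first bytes so large binaries are not scanned in full;
  # tr drops NUL bytes that binary content may contain.
  first_line=$(head -c 64 "$1" | tr -d '\000' | head -n 1)
  [ "$first_line" = "version https://git-lfs.github.com/spec/v1" ]
}

# Example:
# is_lfs_pointer my-file.bam && echo "pointer only" || echo "full content"
```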

Step 2 — Initialize Git DRS

Run git drs init inside the cloned repo to configure the DRS hooks and filters:

git drs init

Step 3 — Configure the DRS remote

Set up the DRS server connection. Your team or project documentation should provide the server URL, credentials, project, and bucket:

git drs remote add gen3 production \
  --cred /path/to/credentials.json \
  --url https://calypr-public.ohsu.edu \
  --project my-project \
  --bucket my-bucket

Note: This step is required even if the original repository author already configured a DRS remote — remote configurations are local to each clone and are not committed to Git.
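
Because remote settings are not committed, a team might keep a small setup script next to the repository so that this step is repeatable. The sketch below wraps the git drs remote add invocation shown above and checks its inputs first. The function name and the https:// check are assumptions made for illustration; the flags are the ones documented in this guide:

```shell
# Hypothetical wrapper around `git drs remote add` with basic input checks.
add_drs_remote() {
  if [ "$#" -ne 6 ]; then
    echo "usage: add_drs_remote <name> <environment> <cred> <url> <project> <bucket>" >&2
    return 2
  fi
  name="$1"; environment="$2"; cred="$3"; url="$4"; project="$5"; bucket="$6"

  # Fail early if the credentials file is missing or unreadable.
  if [ ! -r "$cred" ]; then
    echo "credentials file not readable: $cred" >&2
    return 1
  fi
  # Assumption: the DRS server is reached over HTTPS.
  case "$url" in
    https://*) ;;
    *) echo "expected an https:// URL, got: $url" >&2; return 1 ;;
  esac

  git drs remote add "$name" "$environment" \
    --cred "$cred" --url "$url" --project "$project" --bucket "$bucket"
}

# Example:
# add_drs_remote gen3 production /path/to/credentials.json \
#   https://calypr-public.ohsu.edu my-project my-bucket
```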

Step 4 — Pull file content

Download the actual file content using Git LFS:

# Pull all LFS-tracked files
git lfs pull

# Or pull specific files by pattern
git lfs pull -I "*.bam"

Refer to the Git LFS pull documentation for filters and options.

Step 5 — Verify

Confirm that pointer files have been replaced with full content and that DRS-tracked files are recognized:

git lfs ls-files

A * next to a file indicates its content is present locally. A - means only the pointer is checked out.
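
To pick out the files that still need content, the - marker can be filtered from the listing. A sketch, assuming the default git lfs ls-files output of short OID, marker, and path on each line; pointer_only_files is an illustrative name:

```shell
# Hypothetical helper: read `git lfs ls-files` output on stdin and print
# the paths whose marker is "-" (pointer only, content not yet present).
pointer_only_files() {
  # Strip the leading "<oid> - " so paths containing spaces stay intact.
  awk '$2 == "-" { sub(/^[^ ]+ - /, ""); print }'
}

# Example:
# git lfs ls-files | pointer_only_files
```

Empty output means every tracked file has its content present locally.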

Quick reference

# Full clone workflow — copy and paste
git clone https://github.com/your-org/your-data-repo.git
cd your-data-repo
git drs init
git drs remote add gen3 production \
  --cred /path/to/credentials.json \
  --url https://calypr-public.ohsu.edu \
  --project my-project \
  --bucket my-bucket
git lfs pull
git lfs ls-files

Git DRS Commands

The following commands are specific to Git DRS. For all file tracking, downloading, and listing operations, use the corresponding Git LFS commands.

git drs init

Initialize the current repository for Git DRS. Configures hooks and filter settings.

git drs init

| What it configures | Purpose |
| --- | --- |
| Pre-push hooks | Registers LFS objects with the DRS server during git push |
| Custom transfer agent | Routes LFS transfers through Git DRS (see Git LFS custom transfer agents) |
| Clean / smudge filters | Processes pointer files on git add and git checkout, similar to Git LFS clean and Git LFS smudge |

Run this once per repository, after git lfs install.
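
To see what initialization left behind, the hook and Git configuration can be inspected with plain Git commands. This guide does not state the exact configuration keys that git drs init writes, so the sketch below matches by prefix: lfs.customtransfer.* is where Git LFS reads custom transfer agents, and filter.*.clean / filter.*.smudge hold filter drivers. show_drs_setup is an illustrative name:

```shell
# Hypothetical helper: summarize hook, transfer-agent, and filter settings.
show_drs_setup() {
  # Ask Git where the pre-push hook would live for this repository.
  hook_path=$(git rev-parse --git-path hooks/pre-push) || return 1
  if [ -x "$hook_path" ]; then
    echo "pre-push hook: present ($hook_path)"
  else
    echo "pre-push hook: missing"
  fi

  # git config --get-regexp exits non-zero when nothing matches.
  echo "custom transfer agents:"
  git config --get-regexp '^lfs\.customtransfer\.' || echo "  (none configured)"

  echo "filter drivers:"
  git config --get-regexp '^filter\..*\.(clean|smudge)$' || echo "  (none configured)"
}

# Example (run inside the repository):
# show_drs_setup
```

The check only reports that a pre-push hook exists; it does not verify that the hook was installed by Git DRS.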

git drs remote

Manage DRS server configurations for the repository.

# Add a new DRS remote
git drs remote add <name> <environment> \
  --cred <path-to-credentials> \
  --url <server-url> \
  --project <project-id> \
  --bucket <bucket-name>

# List configured remotes
git drs remote list

# Remove a remote
git drs remote remove <name>

Parameters:

| Parameter | Description |
| --- | --- |
| <name> | Identifier for this DRS remote (e.g., gen3) |
| <environment> | Environment label (e.g., production, staging) |
| --cred | Path to credentials file for authenticating with the DRS server |
| --url | Base URL of the DRS server |
| --project | Project identifier on the DRS server |
| --bucket | Storage bucket associated with the project |

git drs query

Query a DRS server by DRS ID to retrieve object metadata.

git drs query <drs-id>

Returns metadata about a DRS object, including its access URLs and checksums.

git drs version

Display the installed Git DRS version.

git drs version

Using Git DRS Alongside Git LFS

Git DRS and Git LFS work together — Git LFS handles the heavy lifting of file storage and transfer, while Git DRS adds DRS server registration on top.

| Operation | What to use | Documentation |
| --- | --- | --- |
| Track file patterns | git lfs track "*.bam" | git-lfs-track |
| List tracked files | git lfs ls-files | git-lfs-ls-files |
| Download content | git lfs pull | git-lfs-pull |
| Push content | git push (LFS transfer + DRS registration) | git-lfs-push |
| Configure DRS connection | git drs remote add ... | This guide |
| Query DRS metadata | git drs query <drs-id> | This guide |
| Initialize repo for DRS | git drs init | This guide |
| Clone a DRS-enabled repo | git clone + git drs init + git drs remote add ... + git lfs pull | This guide |

Troubleshooting

| Problem | Solution |
| --- | --- |
| git drs: command not found | Verify the install completed successfully. Ensure the git-drs binary is on your PATH. |
| git lfs commands fail | Run git lfs install first; see Git LFS install docs. |
| DRS registration fails on push | Check your DRS remote configuration with git drs remote list. Verify credentials and server URL. |
| Files not tracked | Ensure patterns are in .gitattributes via git lfs track. See Git LFS tracking docs. |
| Pointer files not restored on checkout | Run git lfs pull; see Git LFS pull docs. |
| Cloned repo shows pointer text instead of file content | Run git drs init, configure the DRS remote with git drs remote add, then run git lfs pull; see Cloning a Repository That Uses Git DRS. |
| Query returns no results | Verify the DRS ID is correct and that the object was registered during a previous git push. |

For general Git LFS troubleshooting, see the Git LFS FAQ and GitHub's Git LFS documentation.


Further Reading
