Feature Request: Deprecate Pure-Go .gitattributes Matching in Favor of git check-attr
Summary
Deprecate the existing “best-effort pure-Go matcher for .gitattributes” and standardize on authoritative attribute resolution via git check-attr for all path-based routing, filtering, and policy decisions.
The pure-Go matcher is inherently incomplete and will produce incorrect results in common, real-world Git repositories due to Git’s attribute precedence rules. These failures are subtle, hard to debug, and can lead to incorrect routing (RO vs RW), incorrect enforcement, or data integrity issues.
Motivation
Git attributes are not a simple pattern-matching file. They are resolved by Git using:
- hierarchical precedence
- multiple attribute sources
- overrides and negation
- path-relative scope
- repo configuration and info files
Re-implementing this logic outside of Git is brittle and error-prone.
Git already exposes the correct resolution mechanism via:
git check-attr
Using Git as the source of truth eliminates ambiguity and guarantees correctness.
Problem Statement
The current pure-Go matcher:
- Parses a single .gitattributes file
- Applies “last match wins” semantics locally
- Ignores Git’s full attribute resolution rules
This approach cannot faithfully replicate Git behavior and will return incorrect results in many common scenarios.
Typical Failure Scenarios
Below are non-edge-case, real-world situations where a best-effort matcher will give the wrong answer.
1. Nested .gitattributes Files
Scenario
.gitattributes
data/** drs.route=ro
data/projectA/.gitattributes
*.dat drs.route=rw
Path
data/projectA/file.dat
Correct Git behavior
drs.route = rw
Pure-Go failure
- Only reads the root .gitattributes
- Returns ro
- Routes uploads incorrectly
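Git consults the .gitattributes file in the path's own directory and in every parent directory up to the top of the worktree. When rules conflict, the file closest to the path takes precedence.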
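As an illustration of why a single-file parser falls short, the following sketch lists the in-tree .gitattributes files that apply to a given path, nearest first. The function name candidateAttributeFiles is hypothetical and not part of the project. The sketch deliberately ignores .git/info/attributes and the global and system files, which Git also consults.

```go
package main

import (
	"fmt"
	"path"
)

// candidateAttributeFiles returns the in-tree .gitattributes files that can
// affect repoPath, ordered from the closest directory to the repository root.
// repoPath is assumed to be repo-relative and to use forward slashes.
func candidateAttributeFiles(repoPath string) []string {
	var files []string
	dir := path.Dir(path.Clean(repoPath))
	for {
		// "." is the repository root; "/" only appears if an absolute
		// path was passed by mistake, and is treated as the root too.
		if dir == "." || dir == "/" {
			files = append(files, ".gitattributes")
			return files
		}
		files = append(files, path.Join(dir, ".gitattributes"))
		dir = path.Dir(dir)
	}
}

func main() {
	for _, f := range candidateAttributeFiles("data/projectA/file.dat") {
		fmt.Println(f)
	}
}
```

Even this list is only the lookup order. Merging the rules across files is the part that git check-attr already does correctly.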
2. Attribute Overrides and Unsets
Scenario
*.dat drs.route=ro
scratch/** -drs.route
Path
scratch/test.dat
Correct Git behavior
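drs.route = unset (the - prefix marks the attribute as unset, which git check-attr reports as unset; only a ! prefix returns it to unspecified)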
Pure-Go failure
- Treats -drs.route as unknown or ignores it
- Incorrectly keeps ro
Unset semantics are core to Git attributes and are difficult to model correctly.
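git check-attr reports each attribute as unspecified, unset, set, or a concrete value. A caller that collapses these into "has a value or not" loses the information this scenario depends on. The sketch below keeps the four states apart. The type AttrState and the function classify are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// AttrState models the four results git check-attr can report.
type AttrState int

const (
	Unspecified AttrState = iota // no rule matched, or the attribute was reset with "!"
	Unset                        // a rule matched with a "-" prefix
	Set                          // a rule matched with the bare attribute name
	Value                        // a rule matched with name=value
)

// classify maps the <info> field of git check-attr output to a state.
// Caveat: a rule such as attr=set is indistinguishable from a bare attr
// in this output, so it is classified as Set here.
func classify(info string) (AttrState, string) {
	switch info {
	case "unspecified":
		return Unspecified, ""
	case "unset":
		return Unset, ""
	case "set":
		return Set, ""
	default:
		return Value, info
	}
}

func main() {
	for _, info := range []string{"ro", "unset", "unspecified", "set"} {
		state, value := classify(info)
		fmt.Printf("%-12s -> state=%d value=%q\n", info, state, value)
	}
}
```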
3. info/attributes and Global Attributes
Git reads attributes from multiple sources:
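Order of precedence, highest first (simplified):
- .git/info/attributes
- .gitattributes in the same directory as the path
- .gitattributes in parent directories, up to the top of the worktree
- Global attributes (core.attributesFile) and system-wide attributes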
Scenario
.git/info/attributes
TARGET-ALL-P2/** drs.route=ro
No .gitattributes in the repo.
Correct Git behavior
drs.route = ro
Pure-Go failure
- Never looks at .git/info/attributes
- Returns unspecified
This is extremely common in controlled or managed repos.
4. Attribute Macros and Composition
Scenario
[attr]readonly drs.route=ro
data/** readonly
Correct Git behavior
drs.route = ro
Pure-Go failure
- Does not expand attribute macros
- Misses the route entirely
Macros are first-class Git features and are used heavily in larger repos.
5. Path Normalization and Platform Semantics
Git attribute matching uses:
- forward-slash normalization
- repo-relative paths
- special handling for directories vs files
Scenario
- Windows paths (\)
- symlinked worktrees
- submodules
A custom matcher will almost always diverge from Git’s behavior across platforms.
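Even when Git does the matching, the caller still has to pass it a path it understands. A minimal sketch of that step follows, assuming the caller holds the worktree root and a file path that are either both absolute or both relative. The function name toRepoPath is hypothetical. The sketch does not resolve symlinks or detect submodule boundaries.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// toRepoPath converts p into a repo-relative, forward-slash path suitable
// for passing to git check-attr. It rejects paths outside repoRoot.
func toRepoPath(repoRoot, p string) (string, error) {
	rel, err := filepath.Rel(repoRoot, p)
	if err != nil {
		return "", err
	}
	// On Windows this turns "\" separators into "/"; elsewhere it is a no-op.
	rel = filepath.ToSlash(rel)
	if rel == ".." || strings.HasPrefix(rel, "../") {
		return "", fmt.Errorf("%q is outside repository root %q", p, repoRoot)
	}
	return rel, nil
}

func main() {
	rel, err := toRepoPath("/repo", "/repo/data/projectA/file.dat")
	fmt.Println(rel, err)
}
```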
6. Renames and History-Sensitive Evaluation
Git evaluates attributes based on the current tree context, not historical assumptions.
Scenario
- File moved from scratch/ to TARGET-ALL-P2/
- Different routing rules apply
Correct Git behavior
- Attributes reflect current path
Pure-Go failure
- Cached or inferred rules from old paths
- Incorrect routing after renames
Impact
Incorrect attribute resolution can cause:
- Files routed to the wrong backend (RO vs RW)
- Uploads denied or allowed incorrectly
- Silent policy violations
- Extremely difficult debugging (“works locally but not in CI”)
Because attribute resolution happens inside Git, any divergence introduces correctness risk.
Proposed Change
Deprecate
- The “best-effort pure-Go .gitattributes matcher”
Standardize On
- Calling git check-attr for all attribute lookups
Example:
git check-attr drs.route -- path/to/file
This provides:
- Exact Git semantics
- Correct precedence handling
- Consistent behavior across platforms and environments
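One way a Go caller might wrap this command is sketched below. It uses the -z flag, which makes git check-attr emit NUL-terminated path, attribute, and info fields so that unusual file names need no unquoting. The function names checkAttr and parseCheckAttrZ are hypothetical, and the sketch assumes git is on PATH and repoDir is inside a worktree.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// parseCheckAttrZ parses `git check-attr -z` output, a flat sequence of
// NUL-terminated fields: <path> NUL <attribute> NUL <info> NUL.
// It returns path -> attribute -> info.
func parseCheckAttrZ(out []byte) (map[string]map[string]string, error) {
	result := make(map[string]map[string]string)
	out = bytes.TrimSuffix(out, []byte{0})
	if len(out) == 0 {
		return result, nil
	}
	fields := bytes.Split(out, []byte{0})
	if len(fields)%3 != 0 {
		return nil, fmt.Errorf("check-attr: expected field triples, got %d fields", len(fields))
	}
	for i := 0; i < len(fields); i += 3 {
		p, attr, info := string(fields[i]), string(fields[i+1]), string(fields[i+2])
		if result[p] == nil {
			result[p] = make(map[string]string)
		}
		result[p][attr] = info
	}
	return result, nil
}

// checkAttr runs git in repoDir and resolves one attribute for the given paths.
func checkAttr(repoDir, attr string, paths ...string) (map[string]map[string]string, error) {
	args := append([]string{"check-attr", "-z", attr, "--"}, paths...)
	cmd := exec.Command("git", args...)
	cmd.Dir = repoDir
	out, err := cmd.Output()
	if err != nil {
		return nil, fmt.Errorf("git check-attr: %w", err)
	}
	return parseCheckAttrZ(out)
}

func main() {
	res, err := checkAttr(".", "drs.route", "path/to/file")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(res["path/to/file"]["drs.route"])
}
```

Passing several paths in one invocation amortizes the process start-up cost, which matters if routing decisions are made for many files at once.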
Migration Plan
- Mark the pure-Go matcher as deprecated
- Update internal callers to use git check-attr
- Retain the pure-Go matcher only as:
  - a test helper, or
  - a last-ditch fallback with explicit warnings
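The "last-ditch fallback with explicit warnings" option could look roughly like the sketch below. Resolver, ResolverFunc, withFallback, and errNoGit are all hypothetical names. The project's real interfaces may differ, and the two resolvers in main are stand-ins rather than the actual git-backed and pure-Go implementations.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errNoGit is a placeholder error used by the stand-in resolver below.
var errNoGit = errors.New("git executable not available")

// Resolver resolves one attribute for one repo-relative path.
type Resolver interface {
	Resolve(path, attr string) (string, error)
}

// ResolverFunc adapts a plain function to the Resolver interface.
type ResolverFunc func(path, attr string) (string, error)

func (f ResolverFunc) Resolve(path, attr string) (string, error) { return f(path, attr) }

// withFallback prefers the authoritative resolver and only consults the
// deprecated one when the primary fails, logging a warning each time.
type withFallback struct {
	primary  Resolver
	fallback Resolver
}

func (w withFallback) Resolve(path, attr string) (string, error) {
	v, err := w.primary.Resolve(path, attr)
	if err == nil {
		return v, nil
	}
	log.Printf("WARNING: git check-attr failed for %q (%v); using deprecated pure-Go matcher, result may be wrong", path, err)
	return w.fallback.Resolve(path, attr)
}

func main() {
	gitUnavailable := ResolverFunc(func(path, attr string) (string, error) {
		return "", errNoGit
	})
	legacy := ResolverFunc(func(path, attr string) (string, error) {
		return "ro", nil
	})
	r := withFallback{primary: gitUnavailable, fallback: legacy}
	v, err := r.Resolve("data/file.dat", "drs.route")
	fmt.Println(v, err)
}
```

Whether a fallback should exist at all is a policy question. For decisions where a wrong answer is costly, returning the primary error may be the safer choice.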
Alternatives Considered
- Re-implement full Git attribute resolution in Go
  ❌ High complexity, high maintenance, guaranteed drift over time
- Maintain both implementations
  ❌ Ambiguous correctness, inconsistent behavior
Using Git itself is the simplest, most robust solution.
Recommendation
Deprecate and remove the pure-Go attribute matcher in favor of authoritative resolution via git check-attr.
Git already solved this problem. We should not re-implement it.
Additional Rationale: Typical Git LFS Filter Scenarios Where Best-Effort Matching Fails
Git LFS usage amplifies the risk of incorrect attribute resolution because filter decisions affect both content storage and transfer semantics. A wrong answer doesn’t just misroute metadata — it can lead to missing objects, failed pushes, or corrupted workflows.
Below are common, real-world LFS patterns where a best-effort .gitattributes matcher will fail.
1. Mixed LFS / Non-LFS Paths with Overrides
Scenario
*.dat filter=lfs diff=lfs merge=lfs -text
# Explicitly exclude scratch outputs
scratch/** -filter -diff -merge
Path
scratch/results/output.dat
Correct Git behavior
- filter = unset (NOT LFS)
- File is stored directly in Git
Best-effort failure
- Sees *.dat filter=lfs
- Ignores or mishandles -filter
- Treats file as LFS-managed
Impact
- Pointer file written where raw content was expected
- Downstream tools fail on unexpected pointer files
- Users see “why is my scratch output in LFS?”
2. Nested LFS Rules with Directory-Scoped Overrides
Scenario
.gitattributes
*.bin filter=lfs
data/raw/.gitattributes
*.bin -filter
Path
data/raw/sample.bin
Correct Git behavior
- Not tracked by LFS
Best-effort failure
- Only evaluates root .gitattributes
- Treats file as LFS-managed
Impact
- Large raw files unintentionally pushed through LFS
- Uploads fail or are routed incorrectly
- Hard to diagnose because the rule looks correct to the user
3. LFS Enablement via Attribute Macros
Scenario
[attr]lfsdata filter=lfs diff=lfs merge=lfs -text
*.bam lfsdata
*.cram lfsdata
Correct Git behavior
- .bam and .cram files are LFS-tracked
Best-effort failure
- Does not expand attribute macros
- Returns filter=unspecified
Impact
- Large genomics files committed directly into Git
- Repository bloat
- Silent failure until repo size explodes
This pattern is very common in scientific and media repositories.
4. info/attributes Used to Enforce LFS Globally
Scenario
.git/info/attributes
*.mp4 filter=lfs diff=lfs merge=lfs -text
No .gitattributes committed to the repo.
Correct Git behavior
- .mp4 files are LFS-managed
Best-effort failure
- Never reads .git/info/attributes
- Treats files as non-LFS
Impact
- CI and developer machines behave differently
- LFS rules appear to “randomly not apply”
- Violates operator expectations in managed environments
5. Conditional LFS Usage by Directory
Scenario
data/** filter=lfs
data/tmp/** -filter
Path
data/tmp/intermediate.bin
Correct Git behavior
- Not LFS-managed
Best-effort failure
- Applies first match only
- Or applies both incorrectly
- Returns filter=lfs
Impact
- Temporary/intermediate files end up as LFS pointers
- Users delete temp dirs and break LFS history
- Garbage collection and pruning become unsafe
6. Rename-Sensitive LFS Semantics
Scenario
- File initially in scratch/ (not LFS)
- Later renamed to data/ (LFS-tracked)
scratch/** -filter
data/** filter=lfs
Correct Git behavior
- LFS applies based on current path, not history
Best-effort failure
- Cached or inferred rules based on old location
- Incorrectly treats renamed file as non-LFS
Impact
- Pointer not created when expected
- Push fails with (missing) because bytes aren’t in LFS store
- Extremely confusing user experience
7. Cross-Platform Path Matching Issues
Git attribute matching:
- normalizes to /
- applies repo-relative paths
- handles directories specially
Best-effort failure modes
- Windows \ paths
- Case sensitivity mismatches
- Incorrect matching for ** patterns
Impact
- LFS works on macOS/Linux, fails on Windows
- Routing differs between developer machines and CI
Why This Matters More for LFS Than Other Attributes
For attributes like text or eol, a wrong answer is annoying.
For LFS, a wrong answer can cause:
- pointer files where raw data is expected
- raw data where pointers are required
- missing objects at push time
- irreversible repo pollution
Because LFS affects storage, transport, and history, correctness is non-negotiable.
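Once the filter attribute has been resolved by Git, the LFS decision itself is a single comparison: a path is LFS-managed only when filter resolves to exactly lfs, and unset, unspecified, or any other filter name all mean it is not. A minimal sketch follows, with hypothetical function names and hand-written sample data standing in for real git check-attr results.

```go
package main

import (
	"fmt"
	"sort"
)

// isLFS reports whether a resolved filter value selects the LFS filter.
// "unset", "unspecified", and any other filter name are all non-LFS.
func isLFS(filterInfo string) bool {
	return filterInfo == "lfs"
}

// partitionByLFS splits paths by their resolved filter value.
// The input maps repo-relative path -> filter info from git check-attr.
func partitionByLFS(filterByPath map[string]string) (lfs, plain []string) {
	for p, info := range filterByPath {
		if isLFS(info) {
			lfs = append(lfs, p)
		} else {
			plain = append(plain, p)
		}
	}
	// Map iteration order is random; sort for stable output.
	sort.Strings(lfs)
	sort.Strings(plain)
	return lfs, plain
}

func main() {
	// Sample values only; real ones would come from git check-attr.
	resolved := map[string]string{
		"data/sample.bam":            "lfs",
		"scratch/results/output.dat": "unset",
		"README.md":                  "unspecified",
	}
	lfs, plain := partitionByLFS(resolved)
	fmt.Println("LFS:", lfs)
	fmt.Println("plain:", plain)
}
```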
Conclusion (Reinforced)
Any best-effort .gitattributes matcher will inevitably diverge from Git’s behavior in common LFS use cases.
For LFS-related decisions (filter=lfs, routing, policy enforcement):
git check-attr is not just preferable; it is required for correctness.
This strengthens the case to deprecate the pure-Go matcher entirely and rely on Git as the single source of truth.