55 changes: 54 additions & 1 deletion docs/src/content/docs/getting-started/authentication.md
@@ -32,7 +32,7 @@ All token-bearing requests use HTTPS. Tokens are never sent over unencrypted con

For Azure DevOps, the only token source is `ADO_APM_PAT`.

For JFrog Artifactory, use `ARTIFACTORY_APM_TOKEN`.
For Artifactory registry proxies, use `PROXY_REGISTRY_TOKEN`. See [Registry proxy (Artifactory)](#registry-proxy-artifactory) below.

For runtime features (`GITHUB_COPILOT_PAT`), see [Agent Workflows](../../guides/agent-workflows/).

@@ -155,6 +155,59 @@ Create the PAT at `https://dev.azure.com/{org}/_usersSettings/tokens` with **Cod
| `contoso.ghe.com/org/repo` | *.ghe.com | Global env vars → credential fill | Auth-only (no public repos) |
| GHES via `GITHUB_HOST` | ghes.company.com | Global env vars → credential fill | Unauth for public repos |
| `dev.azure.com/org/proj/repo` | ADO | `ADO_APM_PAT` only | Auth-only |
| Artifactory registry proxy | custom FQDN | `PROXY_REGISTRY_TOKEN` | Error if `PROXY_REGISTRY_ONLY=1` |

## Registry proxy (Artifactory)

Air-gapped environments often route all VCS traffic through a JFrog Artifactory proxy. APM supports this setup via three env vars:

| Variable | Purpose |
|----------|---------|
| `PROXY_REGISTRY_URL` | Full proxy base URL, e.g. `https://art.example.com/artifactory/github` |
| `PROXY_REGISTRY_TOKEN` | Bearer token for the proxy |
| `PROXY_REGISTRY_ONLY` | Set to `1` to block all direct VCS access -- only proxy downloads allowed |

```bash
export PROXY_REGISTRY_URL=https://art.example.com/artifactory/github
export PROXY_REGISTRY_TOKEN=your_bearer_token
export PROXY_REGISTRY_ONLY=1 # optional -- enforces proxy-only mode

apm install
```

When `PROXY_REGISTRY_URL` is set, APM rewrites download URLs to go through the proxy and sends `PROXY_REGISTRY_TOKEN` as the `Authorization: Bearer` header instead of the GitHub PAT.
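The rewrite can be sketched as follows. This is a minimal illustration, not APM's actual implementation; `rewrite_download_url` and its URL-splitting logic are assumptions:

```python
import os

def rewrite_download_url(original: str) -> tuple:
    """Sketch: map a github.com download URL onto the configured proxy.

    Hypothetical helper -- APM's real rewriting lives in its downloader.
    """
    base = os.environ.get("PROXY_REGISTRY_URL")
    if not base:
        return original, {}  # no proxy configured: direct VCS access
    # e.g. https://github.com/owner/repo/archive/abc.tar.gz
    #   -> https://art.example.com/artifactory/github/owner/repo/archive/abc.tar.gz
    path = original.split("github.com/", 1)[1]
    headers = {"Authorization": f"Bearer {os.environ['PROXY_REGISTRY_TOKEN']}"}
    return f"{base.rstrip('/')}/{path}", headers
```

Note that the GitHub PAT is not sent at all on the proxy path; the proxy is expected to hold its own upstream credentials.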

### Lockfile and reproducibility

After a successful proxy install, `apm.lock.yaml` records the proxy host and path prefix as separate fields:

```yaml
dependencies:
- repo_url: owner/repo
host: art.example.com # pure FQDN -- no path
registry_prefix: artifactory/github # path prefix
resolved_commit: abc123def456
```

Subsequent `apm install` runs (without `--update`) read these fields to reconstruct the proxy URL and route auth to `PROXY_REGISTRY_TOKEN`, so installs stay byte-for-byte reproducible even when the original env vars are no longer set to identical values.
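That reconstruction can be sketched from the lockfile fields above; the function name and exact URL shape are assumptions, not APM's API:

```python
def proxy_url_from_lock(host: str, registry_prefix: str, repo_url: str) -> str:
    """Rebuild the proxy download base for a locked dependency.

    host is a pure FQDN and registry_prefix a path prefix, matching the
    apm.lock.yaml example; repo_url is the owner/repo slug.
    """
    return f"https://{host}/{registry_prefix}/{repo_url}"
```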

### Proxy-only enforcement

With `PROXY_REGISTRY_ONLY=1`, APM will:

1. Validate the existing `apm.lock.yaml` at startup and exit with an error if any entry is locked to a direct VCS source (no `registry_prefix`)
2. Skip the download cache for entries that have no `registry_prefix` (forcing a fresh proxy download)
3. Raise an error for any package reference that does not route through the configured proxy
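Step 1 amounts to a filter over the locked entries. A sketch, with entries modeled as dicts and `find_direct_vcs_deps` a hypothetical name:

```python
def find_direct_vcs_deps(locked_deps):
    """Return lockfile entries pinned to a direct VCS source.

    An entry without a registry_prefix was resolved straight from
    github.com / GHE / GHES, which proxy-only mode forbids.
    """
    return [d for d in locked_deps if not d.get("registry_prefix")]
```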

### Deprecated Artifactory env vars

The following env vars still work but emit a `DeprecationWarning`. Migrate to the `PROXY_REGISTRY_*` equivalents:

| Deprecated | Replacement |
|------------|-------------|
| `ARTIFACTORY_BASE_URL` | `PROXY_REGISTRY_URL` |
| `ARTIFACTORY_APM_TOKEN` | `PROXY_REGISTRY_TOKEN` |
| `ARTIFACTORY_ONLY` | `PROXY_REGISTRY_ONLY` |
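The fallback behavior can be sketched like this; `read_registry_var` is a hypothetical reader, while APM's actual resolution lives inside `RegistryConfig.from_env()`:

```python
import os
import warnings

# Canonical name -> deprecated alias
_ALIASES = {
    "PROXY_REGISTRY_URL": "ARTIFACTORY_BASE_URL",
    "PROXY_REGISTRY_TOKEN": "ARTIFACTORY_APM_TOKEN",
    "PROXY_REGISTRY_ONLY": "ARTIFACTORY_ONLY",
}

def read_registry_var(name: str):
    """Read a PROXY_REGISTRY_* var, falling back to its deprecated alias."""
    if name in os.environ:
        return os.environ[name]
    old = _ALIASES.get(name)
    if old and old in os.environ:
        warnings.warn(f"{old} is deprecated; use {name}", DeprecationWarning)
        return os.environ[old]
    return None
```

When both names are set, the canonical `PROXY_REGISTRY_*` value wins and no warning fires.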

## Troubleshooting

120 changes: 116 additions & 4 deletions src/apm_cli/commands/install.py
@@ -1326,18 +1326,70 @@ def _collect_descendants(node, visited=None):

# Collect installed packages for lockfile generation
from apm_cli.deps.lockfile import LockFile, LockedDependency, get_lockfile_path
from apm_cli.deps.installed_package import InstalledPackage
from apm_cli.deps.registry_proxy import RegistryConfig
from ..utils.content_hash import compute_package_hash as _compute_hash
installed_packages: List[tuple] = [] # List of (dep_ref, resolved_commit, depth, resolved_by, is_dev)
installed_packages: List[InstalledPackage] = []
package_deployed_files: builtins.dict = {} # dep_key → list of relative deployed paths
package_types: builtins.dict = {} # dep_key → package type string
_package_hashes: builtins.dict = {} # dep_key → sha256 hash (captured at download/verify time)

# Resolve registry proxy configuration once for this install session.
registry_config = RegistryConfig.from_env()

# Build managed_files from existing lockfile for collision detection
managed_files = builtins.set()
existing_lockfile = LockFile.read(get_lockfile_path(project_root)) if project_root else None
if existing_lockfile:
for dep in existing_lockfile.dependencies.values():
managed_files.update(dep.deployed_files)

# Conflict: registry-only mode requires all locked deps to route
# through the configured proxy. Deps locked to direct VCS sources
# (github.com, GHE Cloud, GHES) are incompatible.
if existing_lockfile and registry_config and registry_config.enforce_only:
conflicts = registry_config.validate_lockfile_deps(
list(existing_lockfile.dependencies.values())
)
if conflicts:
_rich_error(
"PROXY_REGISTRY_ONLY is set but the lockfile contains "
"dependencies locked to direct VCS sources:"
)
for dep in conflicts[:10]:
host = dep.host or "github.com"
name = dep.repo_url
if dep.virtual_path:
name = f"{name}/{dep.virtual_path}"
_rich_error(f" - {name} (host: {host})")
_rich_error(
"Re-run with 'apm install --update' to re-resolve "
"through the registry, or unset PROXY_REGISTRY_ONLY."
)
sys.exit(1)

# Supply chain warning: registry-proxy entries without a
# content_hash cannot be verified on re-install.
if existing_lockfile and registry_config and registry_config.enforce_only:
missing = registry_config.find_missing_hashes(
list(existing_lockfile.dependencies.values())
)
if missing:
diagnostics.warn(
"The following registry-proxy dependencies have no "
"content_hash in the lockfile. Run 'apm install "
"--update' to populate hashes for tamper detection.",
package="lockfile",
)
for dep in missing[:10]:
name = dep.repo_url
if dep.virtual_path:
name = f"{name}/{dep.virtual_path}"
diagnostics.warn(
f" - {name} (host: {dep.host})",
package="lockfile",
)

# Normalize path separators once for O(1) lookups in check_collision
from apm_cli.integration.base_integrator import BaseIntegrator
managed_files = BaseIntegrator.normalize_managed_files(managed_files)
@@ -1540,7 +1592,11 @@ def _collect_descendants(node, visited=None):
depth = node.depth if node else 1
resolved_by = node.parent.dependency_ref.repo_url if node and node.parent else None
_is_dev = node.is_dev if node else False
installed_packages.append((dep_ref, None, depth, resolved_by, _is_dev))
installed_packages.append(InstalledPackage(
dep_ref=dep_ref, resolved_commit=None,
depth=depth, resolved_by=resolved_by, is_dev=_is_dev,
registry_config=None, # local deps never go through registry
))
dep_key = dep_ref.get_unique_key()
if install_path.is_dir() and not dep_ref.is_local:
_package_hashes[dep_key] = _compute_hash(install_path)
@@ -1666,6 +1722,20 @@ def _collect_descendants(node, visited=None):
safe_rmtree(install_path, apm_modules_dir)
skip_download = False

# When registry-only mode is active, bypass cache if the
# cached artifact was NOT previously downloaded via the
# registry (no registry_prefix in lockfile). This handles
# the transition from direct-VCS installs to proxy installs
# for packages not yet in the lockfile.
if (
skip_download
and registry_config
and registry_config.enforce_only
and not dep_ref.is_local
):
if not _dep_locked_chk or _dep_locked_chk.registry_prefix is None:
skip_download = False

if skip_download:
display_name = (
str(dep_ref) if dep_ref.is_virtual else dep_ref.repo_url
@@ -1749,7 +1819,19 @@ def _collect_descendants(node, visited=None):
cached_commit = locked_dep.resolved_commit
if not cached_commit:
cached_commit = dep_ref.reference
installed_packages.append((dep_ref, cached_commit, depth, resolved_by, _is_dev))
# Determine if the cached package came from the registry:
# prefer the lockfile record, then the current registry config.
_cached_registry = None
if (_dep_locked_chk and _dep_locked_chk.registry_prefix) or (
registry_config and not dep_ref.is_local
):
# Reuse the session's registry config; a registry_prefix in the
# lockfile only records that the cached artifact came via the proxy.
_cached_registry = registry_config
installed_packages.append(InstalledPackage(
dep_ref=dep_ref, resolved_commit=cached_commit,
depth=depth, resolved_by=resolved_by, is_dev=_is_dev,
registry_config=_cached_registry,
))
if install_path.is_dir():
_package_hashes[dep_key] = _compute_hash(install_path)
# Track package type for lockfile
@@ -1895,10 +1977,40 @@ def _collect_descendants(node, visited=None):
depth = node.depth if node else 1
resolved_by = node.parent.dependency_ref.repo_url if node and node.parent else None
_is_dev = node.is_dev if node else False
installed_packages.append((dep_ref, resolved_commit, depth, resolved_by, _is_dev))
installed_packages.append(InstalledPackage(
dep_ref=dep_ref, resolved_commit=resolved_commit,
depth=depth, resolved_by=resolved_by, is_dev=_is_dev,
registry_config=registry_config if not dep_ref.is_local else None,
))
if install_path.is_dir():
_package_hashes[dep_ref.get_unique_key()] = _compute_hash(install_path)

# Supply chain protection: verify content hash on fresh
# downloads when the lockfile already records a hash.
# A mismatch means the downloaded content differs from
# what was previously locked — possible tampering.
if (
not update_refs
and _dep_locked_chk
and _dep_locked_chk.content_hash
and dep_ref.get_unique_key() in _package_hashes
):
_fresh_hash = _package_hashes[dep_ref.get_unique_key()]
if _fresh_hash != _dep_locked_chk.content_hash:
safe_rmtree(install_path, apm_modules_dir)
_rich_error(
f"Content hash mismatch for "
f"{dep_ref.get_unique_key()}: "
f"expected {_dep_locked_chk.content_hash}, "
f"got {_fresh_hash}. "
"The downloaded content differs from the "
"lockfile record. This may indicate a "
"supply-chain attack. Use 'apm install "
"--update' to accept new content and "
"update the lockfile."
)
sys.exit(1)

# Track package type for lockfile
if hasattr(package_info, 'package_type') and package_info.package_type:
package_types[dep_ref.get_unique_key()] = package_info.package_type.value
41 changes: 32 additions & 9 deletions src/apm_cli/deps/github_downloader.py
@@ -208,10 +208,27 @@ def _setup_git_environment(self) -> Dict[str, Any]:

return env

# --- Registry proxy support ---

@property
def registry_config(self):
"""Lazily-constructed :class:`~apm_cli.deps.registry_proxy.RegistryConfig`.

Returns ``None`` when no registry proxy is configured.
"""
if not hasattr(self, "_registry_config_cache"):
from .registry_proxy import RegistryConfig
self._registry_config_cache = RegistryConfig.from_env()
return self._registry_config_cache

# --- Artifactory VCS archive download support ---

def _get_artifactory_headers(self) -> Dict[str, str]:
"""Build HTTP headers for Artifactory requests."""
"""Build HTTP headers for registry/Artifactory requests."""
cfg = self.registry_config
if cfg is not None:
return cfg.get_headers()
# Fallback: direct artifactory_token attribute (legacy path)
headers = {}
if self.artifactory_token:
headers['Authorization'] = f'Bearer {self.artifactory_token}'
@@ -328,8 +345,13 @@ def _download_file_from_artifactory(self, host: str, prefix: str, owner: str,

@staticmethod
def _is_artifactory_only() -> bool:
"""Return True when ARTIFACTORY_ONLY is set, blocking all direct git operations."""
return os.environ.get('ARTIFACTORY_ONLY', '').strip().lower() in ('1', 'true', 'yes')
"""Return True when registry-only mode is active.

Checks the canonical ``PROXY_REGISTRY_ONLY`` env var, falling back to the
deprecated ``ARTIFACTORY_ONLY`` alias.
"""
from .registry_proxy import is_enforce_only
return is_enforce_only()

def _should_use_artifactory_proxy(self, dep_ref: 'DependencyReference') -> bool:
"""Check if a dependency should be routed through the Artifactory transparent proxy."""
@@ -1863,15 +1885,16 @@ def download_package(
art_proxy = self._parse_artifactory_base_url()
if self._is_artifactory_only() and not dep_ref.is_artifactory() and not art_proxy:
raise RuntimeError(
f"ARTIFACTORY_ONLY is set but no Artifactory proxy is configured for '{repo_ref}'. "
"Set ARTIFACTORY_BASE_URL or use explicit Artifactory FQDN syntax."
f"PROXY_REGISTRY_ONLY is set but no Artifactory proxy is configured for '{repo_ref}'. "
"Set PROXY_REGISTRY_URL or use explicit Artifactory FQDN syntax."
)
if dep_ref.is_virtual_file():
return self.download_virtual_file_package(dep_ref, target_path, progress_task_id, progress_obj)
elif dep_ref.is_virtual_collection():
return self.download_collection_package(dep_ref, target_path, progress_task_id, progress_obj)
elif dep_ref.is_virtual_subdirectory():
# When ARTIFACTORY_ONLY is set, download full archive and extract subdir
# When PROXY_REGISTRY_ONLY is set, download full archive and extract subdir
art_proxy = self._parse_artifactory_base_url()
if self._is_artifactory_only() and art_proxy:
return self._download_subdirectory_from_artifactory(
dep_ref, target_path, art_proxy, progress_task_id, progress_obj
@@ -1893,11 +1916,11 @@ def download_package(
dep_ref, target_path, art_proxy, progress_task_id, progress_obj
)

# When ARTIFACTORY_ONLY is set but no Artifactory proxy matched, block direct git
# When PROXY_REGISTRY_ONLY is set but no Artifactory proxy matched, block direct git
if self._is_artifactory_only():
raise RuntimeError(
f"ARTIFACTORY_ONLY is set but no Artifactory proxy is configured for '{dep_ref}'. "
"Set ARTIFACTORY_BASE_URL or use explicit Artifactory FQDN syntax."
f"PROXY_REGISTRY_ONLY is set but no Artifactory proxy is configured for '{dep_ref}'. "
"Set PROXY_REGISTRY_URL or use explicit Artifactory FQDN syntax."
)

# Regular package download (existing logic)
54 changes: 54 additions & 0 deletions src/apm_cli/deps/installed_package.py
@@ -0,0 +1,54 @@
"""InstalledPackage: a record of a successfully installed dependency.

Used to accumulate install results during ``apm install`` before writing
the final lockfile. Previously represented as an ad hoc positional tuple;
using a dataclass eliminates positional-index brittleness and makes each
field self-documenting.
"""

from __future__ import annotations

from dataclasses import dataclass
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
from apm_cli.deps.registry_proxy import RegistryConfig
from apm_cli.models.dependency.reference import DependencyReference


@dataclass
class InstalledPackage:
"""Record of a single successfully-installed dependency.

Accumulated by ``install_command()`` and consumed by
:meth:`~apm_cli.deps.lockfile.LockFile.from_installed_packages` to
generate the lock file.

Attributes
----------
dep_ref:
The resolved :class:`~apm_cli.models.dependency.reference.DependencyReference`
that was installed.
resolved_commit:
The exact commit SHA that was installed, or ``None`` for local / Artifactory
packages where no commit is available.
depth:
Dependency tree depth (1 = direct, 2 = transitive, ...).
resolved_by:
``repo_url`` of the parent that introduced this dependency, or ``None``
for direct dependencies.
is_dev:
``True`` when the package is a dev-only dependency.
registry_config:
The :class:`~apm_cli.deps.registry_proxy.RegistryConfig` that was active
when this package was downloaded, or ``None`` for direct VCS installs.
When present, the lockfile stores the proxy host (FQDN) and prefix so
that subsequent installs replay through the same proxy.
"""

dep_ref: "DependencyReference"
resolved_commit: Optional[str]
depth: int
resolved_by: Optional[str]
is_dev: bool = False
registry_config: "Optional[RegistryConfig]" = None