
PEP 723 parser + silent detection telemetry #1555

Merged
StellaHuang95 merged 9 commits into microsoft:main from StellaHuang95:pep723/stage-1
Jun 3, 2026


@StellaHuang95


Why we're doing this

PEP 723 ("inline script metadata") lets a single-file Python script declare its own runtime requirements in a TOML block at the top of the file:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "requests<3",
#   "rich",
# ]
# ///

import requests

Runners like uv run --script, pipx run, and hatch run already read this block to set up an environment automatically. Today, ms-python.vscode-python-envs does nothing with it.

Before we commit to building support for it, we want to answer a basic question: how many users of this extension actually have PEP 723 scripts in their workspaces, and how many edit them vs. just open them?

The follow-on stages involve UX choices that are still being worked through in design review — environment backend (venv vs uv vs cache-discovery), persistence model (sibling .venv vs content-addressed cache), the trigger surface for env creation, and so on. Sizing the population first lets us prioritize honestly instead of building speculatively.

This PR is the measurement-only first slice: a parser plus a silent telemetry observer. There is no UI, no setting, no project registration, and no environment creation. The extension behaves exactly as it does today; the only change a user could observe is two new entries in their telemetry stream.

What changed and how it works

src/common/inlineScriptMetadata.ts — the parser

Pure-function code that every later stage will build on. Exports:

| Symbol | Purpose |
| --- | --- |
| readInlineScriptMetadata(text) | Parse from in-memory script source. Returns parsed requires-python, dependencies, opaque [tool] table, and the block's character offsets, or undefined for no block, malformed block, multiple blocks (per spec MUST error), or TOML errors. |
| readInlineScriptMetadataFromFile(uri) | Same, but reads only the first 8 KiB of a file on disk via fs.open + fileHandle.read. Skips non-file: URIs. Swallows I/O errors. |
| matchesPythonVersion(specifier, version) | Tests a Python version against a PEP 440 specifier: ==, !=, >=, <=, >, <, ~=, ===; comma-separated AND; ==3.12.* wildcards. |
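To make the operator list in the last row concrete, here is a minimal sketch of a specifier matcher. It is not the PR's implementation: the name matchesPythonVersionSketch and its helpers are hypothetical, and it assumes plain numeric release versions such as 3.12.1 (no epochs, pre-releases, post-releases, or local versions).

```typescript
// Hypothetical, simplified PEP 440 matcher: numeric release segments only.
function parseRelease(text: string): number[] | undefined {
    return /^\d+(\.\d+)*$/.test(text) ? text.split('.').map(Number) : undefined;
}

// Compare two releases, padding the shorter one with zeros (3.12 == 3.12.0).
function compareReleases(a: number[], b: number[]): number {
    const length = Math.max(a.length, b.length);
    for (let i = 0; i < length; i++) {
        const diff = (a[i] ?? 0) - (b[i] ?? 0);
        if (diff !== 0) {
            return diff < 0 ? -1 : 1;
        }
    }
    return 0;
}

function hasPrefix(version: number[], prefix: number[]): boolean {
    return prefix.every((segment, i) => (version[i] ?? 0) === segment);
}

function matchesClause(clause: string, versionText: string): boolean {
    const parsed = /^(===|==|!=|>=|<=|~=|>|<)\s*(.+)$/.exec(clause);
    if (!parsed) {
        return false;
    }
    const op = parsed[1];
    const operand = parsed[2];
    if (op === '===') {
        return versionText === operand; // arbitrary equality: plain string comparison
    }
    const version = parseRelease(versionText);
    if (!version) {
        return false;
    }
    if (operand.endsWith('.*')) {
        // Wildcards are only meaningful for == and !=.
        const prefix = parseRelease(operand.slice(0, -2));
        if (!prefix || (op !== '==' && op !== '!=')) {
            return false;
        }
        return hasPrefix(version, prefix) === (op === '==');
    }
    const target = parseRelease(operand);
    if (!target) {
        return false;
    }
    const order = compareReleases(version, target);
    switch (op) {
        case '==': return order === 0;
        case '!=': return order !== 0;
        case '>=': return order >= 0;
        case '<=': return order <= 0;
        case '>': return order > 0;
        case '<': return order < 0;
        case '~=':
            // ~=X.Y.Z is shorthand for ">=X.Y.Z, ==X.Y.*"; it needs at least two segments.
            return target.length >= 2 && order >= 0 && hasPrefix(version, target.slice(0, -1));
        default: return false;
    }
}

// Comma-separated clauses are ANDed; an empty specifier matches everything.
function matchesPythonVersionSketch(specifier: string, version: string): boolean {
    return specifier
        .split(',')
        .map((clause) => clause.trim())
        .filter((clause) => clause.length > 0)
        .every((clause) => matchesClause(clause, version));
}
```

Under these assumptions, matchesPythonVersionSketch('>=3.11,<3.13', '3.12.1') is true and the same specifier against '3.13.0' is false, because 3.13.0 compares equal to 3.13.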

Handles the edges that come up in real files: leading UTF-8 BOM (Windows), CRLF and lone-CR line endings (normalized before regex), shebang lines, encoding declarations, the spec's content-extraction rule (# foo → foo, bare # → blank line; rejects #foo, ##foo, #\tfoo), multiple script blocks (returns undefined + traceWarn), and unknown top-level block types (silently ignored, per spec). Test coverage in src/test/common/inlineScriptMetadata.unit.test.ts (~30 cases).
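The extraction rules listed above can be sketched roughly as follows. This is an illustration built on the reference regular expression from the PEP 723 text, ported to JavaScript syntax; the names SKETCH_BLOCK_RE and extractScriptBlockToml are hypothetical, and the sketch stops at returning the TOML text rather than parsing it, so it is not a stand-in for readInlineScriptMetadata.

```typescript
// Port of the PEP 723 reference regex (group 1 = block type, group 2 = commented content).
const SKETCH_BLOCK_RE = /^# \/\/\/ ([a-zA-Z0-9-]+)$\s((^#(| .*)$\s)+)^# \/\/\/$/gm;

// Returns the TOML text of the single `script` block, or undefined when there is
// no block, a malformed block, or more than one `script` block.
function extractScriptBlockToml(source: string): string | undefined {
    // Drop a leading UTF-8 BOM and normalize CRLF / lone CR to LF before matching.
    const text = source.replace(/^\uFEFF/, '').replace(/\r\n?/g, '\n');

    // matchAll works on a clone of the regex, so the shared constant is never mutated.
    const scriptBlocks = Array.from(text.matchAll(SKETCH_BLOCK_RE)).filter(
        (match) => match[1] === 'script',
    );
    if (scriptBlocks.length !== 1) {
        return undefined;
    }

    // Content extraction: "# foo" becomes "foo", a bare "#" becomes an empty line.
    return scriptBlocks[0][2]
        .split('\n')
        .map((line) => (line.startsWith('# ') ? line.slice(2) : line.slice(1)))
        .join('\n');
}
```

A line such as #foo matches neither the bare # form nor the "# " form, so the block containing it fails to match and the function returns undefined. Blocks of other types (for example a hypothetical # /// pyproject) are filtered out rather than treated as errors.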

src/features/inlineScriptLazyDetector.ts — the silent observer

Subscribes to onDidOpenTextDocument, onDidSaveTextDocument, and onDidChangeTextDocument. For each .py file inside an open workspace folder, on open or save, it reads the first 8 KiB and runs the parser. When a valid block is detected, it emits anonymized telemetry — and that's it. No projects registered, no UI shown.
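For illustration, a bounded head read of the kind described here might look like the sketch below. The helper name readHead and the constant HEAD_BYTES are hypothetical; the sketch takes a plain filesystem path rather than a URI and assumes UTF-8 content.

```typescript
import { open } from 'node:fs/promises';

const HEAD_BYTES = 8 * 1024; // 8 KiB, the limit mentioned above

// Reads at most `maxBytes` from the start of the file and decodes it as UTF-8.
// Any I/O error (missing file, permission problem) yields undefined.
async function readHead(path: string, maxBytes: number = HEAD_BYTES): Promise<string | undefined> {
    try {
        const handle = await open(path, 'r');
        try {
            const buffer = Buffer.alloc(maxBytes);
            // Read from file position 0; bytesRead may be smaller than maxBytes.
            const { bytesRead } = await handle.read(buffer, 0, maxBytes, 0);
            // Note: a multi-byte character cut at the boundary decodes to U+FFFD.
            return buffer.subarray(0, bytesRead).toString('utf8');
        } finally {
            await handle.close();
        }
    } catch {
        return undefined;
    }
}
```

Reading a fixed-size prefix keeps the cost independent of file size, at the price of missing a metadata block that starts or ends beyond the limit.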

Behaviour worth flagging:

  • Per-session dedup by URI. PEP723.DETECTED fires at most once per URI per session; repeat opens/saves of the same file are silent. PEP723.EDITED only fires for URIs already counted as detected, and only on the first contentChanges-bearing event.
  • In-flight coalescing. Rapid open+save on the same URI shares one read promise via an inFlight map.
  • Activation catch-up. On activation, the detector replays every already-open .py document through the same handler. The extension's onLanguage:python activation event fires after VS Code has opened any restored editors, so the open events for those files are gone by the time we subscribe. The replay is deferred via setImmediate to avoid racing VS Code's own document registration; per-URI dedup keeps it idempotent if a live event happens to arrive too.
  • Disposal-safe. A disposed flag guards async continuations so a read that completes after dispose() does not emit telemetry on a torn-down host.
  • No setting gate. The operation is cheap (8 KiB head read, regex, TOML parse — sub-millisecond) and side-effect free. The python-envs.useInlineScriptMetadata setting reserved in plan.md becomes relevant in Stage 2+ when we start writing to disk.
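The dedup, coalescing, and disposal behaviour in the list above can be sketched without the VS Code API. Everything in this block is a hypothetical stand-in: LazyDetectorSketch, its injected hasBlock and emit callbacks, and the string URIs are assumptions for illustration, and the sketch follows the description of open and save sharing one in-flight read rather than modelling the save re-read or the activation replay.

```typescript
type Trigger = 'open' | 'save';
type HasBlock = (uri: string) => Promise<boolean>;
type Emit = (eventName: string, data: Record<string, unknown>) => void;

class LazyDetectorSketch {
    private readonly detectedAt = new Map<string, number>(); // URI -> detection timestamp
    private readonly edited = new Set<string>();
    private readonly inFlight = new Map<string, Promise<void>>();
    private disposed = false;
    private readonly hasBlock: HasBlock;
    private readonly emit: Emit;
    private readonly now: () => number;

    constructor(hasBlock: HasBlock, emit: Emit, now: () => number = Date.now) {
        this.hasBlock = hasBlock;
        this.emit = emit;
        this.now = now;
    }

    onOpenOrSave(uri: string, trigger: Trigger): Promise<void> {
        // Per-session dedup: a URI that was already counted stays silent.
        if (this.disposed || this.detectedAt.has(uri)) {
            return Promise.resolve();
        }
        // In-flight coalescing: concurrent events for one URI share a single read.
        const pending = this.inFlight.get(uri);
        if (pending) {
            return pending;
        }
        const run = this.detect(uri, trigger).finally(() => {
            this.inFlight.delete(uri);
        });
        this.inFlight.set(uri, run);
        return run;
    }

    onChange(uri: string, contentChangeCount: number): void {
        const detected = this.detectedAt.get(uri);
        if (this.disposed || detected === undefined || contentChangeCount === 0 || this.edited.has(uri)) {
            return;
        }
        this.edited.add(uri);
        this.emit('PEP723.EDITED', { duration: this.now() - detected });
    }

    dispose(): void {
        this.disposed = true;
    }

    private async detect(uri: string, trigger: Trigger): Promise<void> {
        try {
            const found = await this.hasBlock(uri);
            // Disposal guard: a read that finishes after dispose() emits nothing.
            if (this.disposed || !found || this.detectedAt.has(uri)) {
                return;
            }
            this.detectedAt.set(uri, this.now());
            this.emit('PEP723.DETECTED', { trigger });
        } catch {
            // Read or parse failures are swallowed; the observer stays silent.
        }
    }
}
```

Injecting the reader, the emitter, and the clock keeps the sketch testable without a running extension host, which is the same motivation the PR gives for the workspace.apis.ts wrappers.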

src/common/telemetry/constants.ts — the two events

PEP723.DETECTED   properties: { trigger: 'open' | 'save', hasRequiresPython: boolean }
                  measures:   { dependencyCount: number }

PEP723.EDITED     measures:   { duration: number /* ms between detection and first edit */ }

Together, the events answer two questions: how many users have PEP 723 files at all (DETECTED count), and how many of those users actually edit them rather than just opening them once (EDITED / DETECTED ratio). Full GDPR annotations land alongside the enum members in the same file.

What is not in the events: no URIs, no file paths, no file content, no workspace identifiers, no project names. The events are pure counters plus a small set of shape metadata.
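As a rough illustration of how such a payload can be derived from parsed metadata without leaking content, consider the sketch below. The interfaces and the function buildDetectedEvent are hypothetical and are not taken from constants.ts; only the event name and the field names come from the description above.

```typescript
type DetectionTrigger = 'open' | 'save';

// Hypothetical shape of the parser output that the telemetry builder consumes.
interface ParsedMetadataShape {
    requiresPython?: string;
    dependencies: string[];
}

interface DetectedEventSketch {
    name: 'PEP723.DETECTED';
    properties: { trigger: DetectionTrigger; hasRequiresPython: boolean };
    measures: { dependencyCount: number };
}

// Only shape metadata crosses this boundary: a boolean and a count,
// never the specifier string or the dependency names themselves.
function buildDetectedEvent(trigger: DetectionTrigger, metadata: ParsedMetadataShape): DetectedEventSketch {
    return {
        name: 'PEP723.DETECTED',
        properties: { trigger, hasRequiresPython: metadata.requiresPython !== undefined },
        measures: { dependencyCount: metadata.dependencies.length },
    };
}
```

Because the builder reduces the metadata to a boolean and a count before anything is sent, dependency names and version strings cannot end up in the payload by accident.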

Supporting changes

  • src/common/workspace.apis.ts — wrapper exports for onDidOpenTextDocument, onDidSaveTextDocument, onDidChangeTextDocument, and getOpenTextDocuments, matching the existing wrapper pattern for testability.
  • src/extension.ts — constructs and activates the observer alongside the existing project creators; pushes it onto context.subscriptions for disposal.
  • src/test/features/inlineScriptLazyDetector.unit.test.ts — covers open/save dispatch, activation replay, per-URI dedup, edit-event gating, in-flight coalescing, and disposal safety.

- inlineScriptMetadata: switch BLOCK_RE consumption to text.matchAll() so the global regex's lastIndex is never shared/mutated across calls.

- inlineScriptLazyDetector: fix save-event coalescing bug by re-enqueuing a fresh read when a save races with an in-flight open (the previous cache may be stale).

- inlineScriptLazyDetector: add a disposed flag and guard processOnce against post-disposal project registration.

- inlineScriptDetector (creator): use uri.toString() for URI equality to match InlineScriptLazyDetector and avoid Windows drive-letter / trailing-separator divergence.

- inlineScriptDetector (creator): replace showErrorMessage with showInformationMessage for the four "no scripts found" toasts (informational, not an error).

- tests: replace the open+save coalescing test with two tests covering open-dedup and save-re-read separately; add a dispose-during-in-flight-read test; switch detector tests to stub showInformationMessage.
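
The lastIndex hazard behind the first item in this list can be reproduced in a few lines. The regexes and function names below are hypothetical and much simpler than the real BLOCK_RE; the point is only the difference between a stateful test() or exec() call on a shared global regex and matchAll(), which iterates over an internal clone.

```typescript
// Two separate global regexes so the two consumption styles do not interfere.
const STATEFUL_RE = /# \/\/\/ script/g;
const CLONED_RE = /# \/\/\/ script/g;

// test() on a global regex resumes from lastIndex and updates it after each call.
function hasBlockWithTest(text: string): boolean {
    return STATEFUL_RE.test(text);
}

// matchAll() runs on an internal copy, so CLONED_RE.lastIndex is never changed.
function hasBlockWithMatchAll(text: string): boolean {
    return Array.from(text.matchAll(CLONED_RE)).length > 0;
}

const header = '# /// script';
const testResults = [hasBlockWithTest(header), hasBlockWithTest(header)];
const matchAllResults = [hasBlockWithMatchAll(header), hasBlockWithMatchAll(header)];
```

Here testResults is [true, false]: the second call starts searching at the lastIndex left behind by the first and finds nothing. matchAllResults is [true, true], and CLONED_RE.lastIndex stays 0.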
Strip user-facing PEP 723 inline-script-metadata wiring while keeping the lazy detector wired up as the planned telemetry ingest point.

- Remove the python-envs.useInlineScriptMetadata setting from package.json + package.nls.json.

- Delete the bulk-scan InlineScriptDetector project creator and its unit test.

- Drop the PEP 723 dependency-extraction branch from pipUtils.getProjectInstallable and its unit test.

- Remove isInlineScriptMetadataEnabled + setting constants from common/inlineScriptMetadata; keep the parser.

- Remove the inlineScriptMetadata field from PythonProjectsImpl.

- Remove the InlineScriptStrings localization namespace.

- Slim InlineScriptLazyDetector to a no-arg observer with a TODO(pep723-telemetry) marker; rewrite its unit tests for the new shape.

- Re-enable the lazy detector in extension.ts with an updated comment describing the telemetry-observer role.
Wires the existing silent inline-script-metadata detector to emit two anonymized telemetry events:

- PEP723.DETECTED: fires once per (URI, session) the first time a valid `# /// script` block is observed. Properties: `trigger` (open|save), `hasRequiresPython` (bool). Measure: `dependencyCount` (int). This is the denominator for the `how many users actually see PEP 723 files` question.

- PEP723.EDITED: fires once per (URI, session) the first time a previously-detected URI receives a non-empty content change. Measure: `duration` (ms since detection). Together with DETECTED this distinguishes viewers from editors.

No URIs, paths, dependency names, or version strings are sent.

Adds an `onDidChangeTextDocument` wrapper to `workspace.apis.ts` so the detector can subscribe through the same abstraction layer used for open/save. Extends the detector unit tests from 16 to 26 cases covering both events, per-URI dedup, empty-contentChanges no-ops, and disposal suppression.
…ge.json reformat

- src/extension.ts: update the inline comment beside the `InlineScriptLazyDetector` activation site to describe its actual behavior (emits anonymized telemetry) instead of the stale `feature is not shipped` wording from the slim-down commit.

- package.json: revert an unintended multi-line re-pretty-print of `python-envs.workspaceSearchPaths.default`. The branch now has zero drift on package.json against upstream/main.

@edvilme edvilme left a comment


One comment we can implement later, but looks good to me :)

Comment thread src/common/inlineScriptMetadata.ts
@StellaHuang95 StellaHuang95 merged commit 713cb2d into microsoft:main Jun 3, 2026
84 of 86 checks passed
@StellaHuang95 StellaHuang95 deleted the pep723/stage-1 branch June 3, 2026 23:04