refactor(sdk): introduce shared tree-sitter-bash entry point#3307
refactor(sdk): introduce shared tree-sitter-bash entry point#3307Fieldnote-Echo wants to merge 2 commits into
Conversation
Add `openhands-sdk/openhands/sdk/security/shell_parser.py` exposing `parse(command: str) -> ParseResult`. Declare `tree-sitter` and `tree-sitter-bash` in `openhands-sdk/pyproject.toml`; the existing declarations in `openhands-tools/pyproject.toml` are unchanged because both packages directly import these libraries and each should declare what it imports rather than rely on transitives. `uv.lock` is regenerated. No consumer is refactored in this PR. `openhands-tools.terminal.utils.command` keeps its inline parser construction unchanged. A follow-up will migrate it to `shell_parser.parse`, and a later PR will add the AST-aware security detector that OpenHands#2721 Phase 2b targets. Scope: 1 new module (~30 LOC), 1 test module, 1 pyproject edit, regenerated lockfile. The `ParseResult` surface is intentionally minimal (`tree`, `has_error`) so the first move is a substrate change with no API surface enlargement. Convenience iterators and a public `root_node` field are deferred per the direction comment on OpenHands#2721. Refs OpenHands#2721 Coding-Agent: claude-code Model: claude-opus-4-7 AI-assisted: drafting, adversarial audit synthesis, test generation
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟢 Good taste. Focused shared parser entry point with real coverage; dependency upload times are outside the 7-day guardrail.
Verified locally: uv run pytest tests/sdk/security/test_shell_parser.py -q and uv run pre-commit run --files openhands-sdk/openhands/sdk/security/shell_parser.py tests/sdk/security/test_shell_parser.py openhands-sdk/pyproject.toml both passed.
Risk Assessment: 🟢 LOW — additive SDK module/dependency declaration; no existing consumers or agent behavior changed.
VERDICT: ✅ Worth merging.
This review was created by an AI agent (OpenHands) on behalf of the user.
VascoSch92
left a comment
There was a problem hiding this comment.
LGTM!
Just left a couple of comments.
| shared substrate rather than an API surface enlargement. | ||
|
|
||
| The ``Parser`` is constructed per call. ``Language`` is built once at | ||
| import. This mirrors the convention established by ``#3237`` in |
There was a problem hiding this comment.
please don't refer to the issue number.
| object is safely reusable. | ||
| """ | ||
|
|
||
| from __future__ import annotations |
| _BASH_LANGUAGE = Language(tree_sitter_bash.language()) | ||
|
|
||
|
|
||
| @dataclass(frozen=True) |
| return ParseResult(tree=tree, has_error=tree.root_node.has_error) | ||
|
|
||
|
|
||
| __all__ = ["ParseResult", "parse"] |
There was a problem hiding this comment.
This should be at the top of the file
| whose ``has_error`` flag faithfully reflects ``tree.root_node.has_error``. | ||
| """ | ||
|
|
||
| from __future__ import annotations |
|
@Fieldnote-Echo — thanks for the cc. The substrate split is the right call; reviewing a ~30-line module + dep declaration in isolation is materially easier than reviewing it against either consumer. For the ATR adapter path (#3176), PR-A landing is what we want — the adapter eventually lives at the detector layer (sibling to No review blockers from my side on this PR; just watching the chain. |
- drop issue-number references from the module docstrings - remove the unneeded `from __future__ import annotations` (project is py>=3.12) - add `slots=True` to the frozen `ParseResult` - move `__all__` to the top of the module
Summary
Phase 2a foundation for #2721. Creates a shared tree-sitter-bash parse
entry point inside
openhands-sdkso the security analyzers canimport a parser without reaching across packages.
Specifically:
openhands-sdk/openhands/sdk/security/shell_parser.pywith asingle public function
parse(command: str) -> ParseResultand afrozen
ParseResultdataclass carryingtreeandhas_error.tree-sitter>=0.25andtree-sitter-bash>=0.25inopenhands-sdk/pyproject.toml. The 0.25 floor matches what theworkspace lockfile already resolves and avoids the documented ABI
mismatch between
tree-sitter==0.24.0andtree-sitter-bash==0.25.0.openhands-toolskeeps its directtree-sitterdeclarations(unchanged here, per chore(tools): replace bashlex (GPLv3) with tree-sitter-bash (MIT) #3237) because it still imports those libraries
directly via
terminal/utils/command.py. Tightening that spec is aseparate cleanup.
uv.lockto reflect the new declaration site.Out of scope (deliberate)
openhands-tools/openhands/tools/terminal/utils/command.pykeeps its inline parser construction; a follow-up PR will switch it
to
shell_parser.parseand confirmtests/tools/terminal/test_terminal_parsing.pypasses unchanged.
iter_*convenience accessors. The Phase 2c direction note onProposal: Replace regex-based shell command analysis with tree-sitter-bash #2721 floats
iter_simple_commands/iter_pipelinesetc.; thoseare easier to design once at least one detector exists to validate
them, so they land with Phase 2b or shortly after.
ERROR-node policy. Pattern and rail detectors are unchanged.Why the substrate change goes first
Following the suggestion to "divide the scope of the second part even
further": this is the smallest landable piece that unblocks Phase 2b
(security analyzers can now import a parser from the SDK without
acquiring a transitive dependency on
openhands-tools) and Phase 2aconsumer refactor (a single import line change in
command.py).Reviewing one ~30-line module plus a dependency move is meaningfully
cheaper than reviewing the substrate together with either consumer
refactor.
Phase 2 roadmap
This PR is the first link in a chain. Each later PR is small and independently mergeable.
shell_parser.pymodule + dep declaration. No consumer touched.openhands-tools/openhands/tools/terminal/utils/command.pyto consumeshell_parser.parse.defense_in_depth/pattern.pyshell-destructive detectors to AST. The xfail bypass corpus from #3304 flips to passing understrict=True.defense_in_depth/policy_rails.pyrails (fetch-to-exec, raw-disk-op, catastrophic-delete) to AST.root_nodeanditer_*convenience accessors toParseResultonce a detector validates the surface.Collaboration on lanes PR-B onward is welcome. /cc @eeee2345 in case the analyzer-side work aligns with the Agent Threat Rules direction in #3176.
Test plan
uv syncresolves cleanly;uv.lockregenerated withtree-sitter/tree-sitter-bashdeclared by both packages.tests/sdk/security/test_shell_parser.pypasses (new file, 18tests across success shape, error recovery, edge cases, immutability,
adversarial bytes, byte-offset semantics, determinism).
tests/tools/terminal/test_terminal_parsing.pypasses unchanged(no source changes in
openhands-tools).pre-commit run --all-filesclean (ruff format, ruff lint,pyright, pycodestyle).
Issue Number
Refs #2721