feat(integration-tests): Add search action and verification infra. by quinntaylormitchell · Pull Request #2235 · y-scope/clp

quinntaylormitchell · 2026-04-29T21:08:28Z

Description

This PR adds the action and verification infrastructure for testing package search.

Checklist

The PR satisfies the contribution guidelines.
This is a breaking change and that has been indicated in the PR title, OR this isn't a
breaking change.
Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Tested on development branch; all tests pass.

Summary by CodeRabbit

Tests
- Expanded integration test utilities for package search with multiple search modes (case, file-targeting, counts, time ranges).
- Added verification that compares search output to reference tool results and reports pass/fail with details.
- Test actions now include structured/typed arguments for clearer intent and reporting.
- Test dataset metadata includes explicit single-file match entries to stabilise single-match scenarios.

coderabbitai · 2026-04-29T21:08:38Z

Walkthrough

Adds a typed SearchArgs builder, ClpPackageSearchType modes, functions to construct and run CLP package search actions, grep-based verification helpers that normalize results per mode, and a new dataset metadata field plus validation for a single-match file.

Changes

Search Test Utilities

Layer / File(s)	Summary
Module header and SearchArgs builder `integration-tests/tests/package_tests/utils/search.py`	Module imports, logger, default interval constant, and `SearchArgs` with `to_cmd()` building CLI flags for dataset/file targeting, ignore-case, count modes, time bounds, raw output, and wildcard query.
Search modes and action construction `integration-tests/tests/package_tests/utils/search.py`	`ClpPackageSearchType` enum plus `search_clp_package()` and `_construct_args()` mapping dataset metadata to CLI flags and returning a `ClpAction` with typed `SearchArgs`.
Verification flow and helpers `integration-tests/tests/package_tests/utils/search.py`	`verify_search_action()` validates exit code and `SearchArgs`, builds a mode-specific `grep` command, runs grep, and compares normalized grep output to normalized CLP output. Helper functions: `_construct_grep_verification_cmd()`, `_get_grep_options_from_search_type()`, `_format_grep_result_for_search_type()`, `_format_search_result_for_search_type()`.

Dataset Metadata

Layer / File(s)	Summary
single_match_file metadata and validation `integration-tests/tests/utils/classes.py`, `integration-tests/tests/data/json_multifile/metadata.json`, `integration-tests/tests/data/text_multifile/metadata.json`	Adds `single_match_file` field to `SampleDatasetMetadata` and a `SampleDataset.__post_init__` check that raises `ValueError` if the specified file is not present in `metadata.file_names`. Corresponding dataset metadata files include the new field.

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Code
    participant Builder as SearchArgs / search_clp_package
    participant CLP as CLP subprocess
    participant Verifier as verify_search_action
    participant Grep as grep subprocess

    Test->>Builder: request ClpAction(dataset, type, query)
    Builder-->>Test: ClpAction (cmd list, SearchArgs)
    Test->>CLP: run ClpAction subprocess
    CLP-->>Test: exit code + stdout
    Test->>Verifier: verify_search_action(action, type, dataset)
    Verifier->>Verifier: check exit code & args
    Verifier->>Grep: run mode-specific grep on dataset logs or file path
    Grep-->>Verifier: grep stdout / exit code
    Verifier->>Verifier: normalize outputs and compare
    Verifier-->>Test: VerificationResult.ok() or fail(message)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: adding search action and verification infrastructure for integration tests, which aligns with the new SearchArgs class, search_clp_package function, and verify_search_action function introduced in the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration-tests/tests/package_tests/utils/search.py`:
- Line 134: When running in FILE_PATH mode, guard against empty metadata
file_names before indexing: check that dataset.metadata.file_names is non-empty
(e.g., len(dataset.metadata.file_names) > 0 or truthiness) before using
dataset.metadata.file_names[0]; if empty, raise a clear assertion or ValueError
with a useful message rather than letting IndexError bubble up. Update the
assignment to args.file_path to only occur after this guard; reference the
existing identifiers args.file_path, dataset.metadata.file_names, and FILE_PATH
mode to locate and modify the code.
- Around line 176-177: Replace the bare assert on "args = action.args" with
explicit error handling: check "isinstance(args, SearchArgs)" and if false
return a VerificationResult.fail(...) with a clear message (matching other
validations in this function). Use the same VerificationResult type used
elsewhere in this file so invalid action.args is handled consistently rather
than being stripped by Python -O; reference the variables "args" and
"action.args" and the classes "SearchArgs" and "VerificationResult".
- Around line 184-188: The check treating any non-zero grep exit code as failure
is too strict: GNU grep returns 1 when no matches are found. Update the
condition in the block that inspects grep_action.completed_proc.returncode to
only fail when the return code is neither 0 nor 1 (i.e., allow 0 and 1), and
keep the pytest.fail call that references grep_action.log_file_path for true
errors; ensure you still reference grep_action.completed_proc.returncode and
pytest.fail in the same location.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ef0fc03e-e41f-4ba2-b226-b05ab2ce3723

📥 Commits

Reviewing files that changed from the base of the PR and between 298e2e2 and 3b62e1f.

📒 Files selected for processing (2)

integration-tests/tests/package_tests/utils/search.py
integration-tests/tests/utils/classes.py

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration-tests/tests/package_tests/utils/search.py`:
- Around line 282-286: The COUNT_BY_TIME branch currently uses re.search which
only returns the first "count: <n>" match; change the logic in the case handling
for ClpPackageSearchType.COUNT_RESULTS | ClpPackageSearchType.COUNT_BY_TIME to
use re.findall(r"count: (\d+)", search_result) (or an iterator like re.finditer)
to collect all count buckets, return them joined with "\n" and a trailing
newline, and call pytest.fail if the resulting list is empty; keep the same
match formatting and error message behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 22395b96-f587-4100-b7c2-fa030589ad7f

📥 Commits

Reviewing files that changed from the base of the PR and between 3b62e1f and f9d3189.

📒 Files selected for processing (1)

integration-tests/tests/package_tests/utils/search.py

coderabbitai

♻️ Duplicate comments (2)

integration-tests/tests/package_tests/utils/search.py (2)

175-176: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace assert with explicit verification failure handling.

assert can be stripped with Python optimization flags, which can bypass this validation path. Use explicit runtime handling and return a typed verification failure instead.

Suggested fix

     args = action.args
-    assert isinstance(args, SearchArgs)
+    if not isinstance(args, SearchArgs):
+        return action.fail_verification(
+            "Search verification requires `action.args` to be a SearchArgs instance."
+        )

In Python, are assert statements removed when running with optimization flags like -O?

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@integration-tests/tests/package_tests/utils/search.py` around lines 175 -
176, The code uses an assert to check the type of action.args which can be
bypassed with Python -O; replace the assert with an explicit runtime check on
action.args (e.g., inspect isinstance(action.args, SearchArgs)) and on failure
raise or return the appropriate verification failure value used by this test
harness (do not rely on AssertionError). Update the handling around
args/action.args and SearchArgs so the function returns or raises the typed
verification failure the test framework expects instead of using assert.

269-273: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

COUNT_BY_TIME parsing only reads the first bucket.

re.search only captures one count: value, so multi-bucket count-by-time output is verified incorrectly. Aggregate all buckets before comparing.

Suggested fix

         case ClpPackageSearchType.COUNT_RESULTS | ClpPackageSearchType.COUNT_BY_TIME:
-            match = re.search(r"count: (\d+)", search_result)
-            if match:
-                return match.group(1) + "\n"
+            counts = [int(value) for value in re.findall(r"count:\s*(\d+)", search_result)]
+            if counts:
+                return f"{sum(counts)}\n"
             pytest.fail(f"The search result '{search_result}' wasn't in the correct format.")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@integration-tests/tests/package_tests/utils/search.py` around lines 269 -
273, The COUNT_BY_TIME branch currently uses re.search which only captures the
first "count: X" and misses additional buckets; update the block handling
ClpPackageSearchType.COUNT_RESULTS | ClpPackageSearchType.COUNT_BY_TIME to use
re.findall(r"count: (\d+)", search_result), convert all matches to integers, sum
them (or otherwise aggregate all bucket counts as required), and return the
aggregated value as a string with a trailing newline (e.g., str(total) + "\n");
keep the pytest.fail call if no matches are found.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@integration-tests/tests/package_tests/utils/search.py`:
- Around line 175-176: The code uses an assert to check the type of action.args
which can be bypassed with Python -O; replace the assert with an explicit
runtime check on action.args (e.g., inspect isinstance(action.args, SearchArgs))
and on failure raise or return the appropriate verification failure value used
by this test harness (do not rely on AssertionError). Update the handling around
args/action.args and SearchArgs so the function returns or raises the typed
verification failure the test framework expects instead of using assert.
- Around line 269-273: The COUNT_BY_TIME branch currently uses re.search which
only captures the first "count: X" and misses additional buckets; update the
block handling ClpPackageSearchType.COUNT_RESULTS |
ClpPackageSearchType.COUNT_BY_TIME to use re.findall(r"count: (\d+)",
search_result), convert all matches to integers, sum them (or otherwise
aggregate all bucket counts as required), and return the aggregated value as a
string with a trailing newline (e.g., str(total) + "\n"); keep the pytest.fail
call if no matches are found.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 07eaf020-0cb0-448f-80b7-6cc6a26a3283

📥 Commits

Reviewing files that changed from the base of the PR and between 89677f8 and 613925f.

📒 Files selected for processing (2)

integration-tests/tests/package_tests/utils/search.py
integration-tests/tests/utils/classes.py

Bill-hbrhbr

This PR doesn't actually include any search tests right? Let's include them instead of simply introducing a portion of the search test infra.
Also, most of my concerns are elaborated in the comments.

Bill-hbrhbr · 2026-05-29T07:13:03Z

+    )
+
+    args: SearchArgs = _construct_args(clp_package, dataset, search_type, wildcard_query)
+    return ClpAction(cmd=args.to_cmd(), args=args)


Suggested change

return ClpAction(cmd=args.to_cmd(), args=args)

return ClpAction.from_args(args)

Bill-hbrhbr · 2026-05-29T07:13:40Z

+        dataset.dataset_name,
+    )
+
+    args: SearchArgs = _construct_args(clp_package, dataset, search_type, wildcard_query)


again, feels like there's no need to create a helper for what feels like the meat of search_clp_package.

Bill-hbrhbr · 2026-05-29T07:15:10Z

+    returncode_result = action.verify_returncode()
+    if not returncode_result:
+        return returncode_result


Suggested change

returncode_result = action.verify_returncode()

if not returncode_result:

return returncode_result

result = action.verify_returncode()

if not result:

return result

Bill-hbrhbr · 2026-05-29T07:41:21Z

+    "single_match_wildcard_query": "\"detail\":\"Roll program complete, heads down attitude achieved for ascent\"",
+    "single_match_file": "sts-135-2011-07-08.jsonl"


This is starting to feel like these entries are more like test cases instead of metadata. Why don't we inline these queries and query results in test code directly?

Also each time you add something, you'd have to update the metadata format documentation.

Bill-hbrhbr · 2026-05-29T07:43:46Z

+class ClpPackageSearchType(Enum):
+    """Possible search types."""
+
+    BASIC = auto()
+    FILE_PATH = auto()
+    IGNORE_CASE = auto()
+    COUNT_RESULTS = auto()
+    COUNT_BY_TIME = auto()
+    TIME_RANGE = auto()


I don't really like different test case flavours hiding away as an Enum class inside a utils folder. Without looking at your existing infra, the clearest design to me would look something like:

@pytest.mark.parametrize( "clp_package", [CLP_TEXT_MODE, CLP_JSON_MODE], indirect=True, ids=[CLP_TEXT_MODE.mode_name, CLP_JSON_MODE.mode_name], ) class TestPackageSearch: def test_basic_search(self, clp_package: ClpPackage) -> None: ... def test_file_path_search(self, clp_package: ClpPackage) -> None: ... def test_ignore_case_search(self, clp_package: ClpPackage) -> None: ... def test_count_results_search(self, clp_package: ClpPackage) -> None: ... def test_count_by_time_search(self, clp_package: ClpPackage) -> None: ... def test_time_range_search(self, clp_package: ClpPackage) -> None: ...

However, now you already have a split file design. The split file design is sound if the package behaviors are meaningfully different and you want each file to document a specific package flavor.

But it becomes weak if the tests end up looking like:

test_clp_text.py → calls a generic helper

test_clp_json.py → calls the same generic helper

with all meaningful behavior hidden behind shared utilities.

A sound split file design with search tests would look more like:

test_clp_text.py

Uses text_multifile

Shows CLP/text specific command expectations

Does not pass --dataset

Does not pass --timestamp-key

Verifies plain text search behavior

test_clp_json.py

Uses json_multifile

Shows CLP-S/JSON specific command expectations

Passes --dataset

Passes --timestamp-key

Verifies structured search behavior and unstructured + log-converter.

There is an abundance of over-abstracting the package specific behavior that I'm seeing in these PRs.

A rule of thumb for this test suite could be:

Keep package lifecycle fixtures shared.

Keep low-level action, logging, and result helpers shared.

Keep subprocess calling shared.

However, keep package-specific command construction and expectations visible in test_clp_text.py and test_clp_json.py. That preserves the value of having separate test modules.

Bill-hbrhbr · 2026-05-29T07:50:13Z

+        if self.metadata.single_match_file not in self.metadata.file_names:
+            err_msg = (
+                f"`single_match_file` '{self.metadata.single_match_file}' is not listed in"
+                " `file_names`."
+            )
+            raise ValueError(err_msg)


Technically, there's no need to verify this, because if the single_match_file contains the wrong value, the search test will fail anyway.
This touches upon my previous comment on what should be metadata and what should merely be test case constants/inputs/outputs.

Add search infra.

2151c15

Docstrings.

3b62e1f

quinntaylormitchell changed the title ~~feat(integration-tests): Add search infra.~~ feat(integration-tests): Add search action and verification infra. Apr 29, 2026

quinntaylormitchell marked this pull request as ready for review April 29, 2026 23:41

quinntaylormitchell requested a review from a team as a code owner April 29, 2026 23:41

quinntaylormitchell requested a review from Bill-hbrhbr April 29, 2026 23:42

coderabbitai Bot reviewed Apr 30, 2026

View reviewed changes

Comment thread integration-tests/tests/package_tests/utils/search.py Outdated

Comment thread integration-tests/tests/package_tests/utils/search.py

Comment thread integration-tests/tests/package_tests/utils/search.py Outdated

Rabbit.

f9d3189

coderabbitai Bot reviewed Apr 30, 2026

View reviewed changes

Comment thread integration-tests/tests/package_tests/utils/search.py

quinntaylormitchell added 2 commits April 30, 2026 18:00

Make FILE_PATH search less flimsy.

89677f8

Merge branch 'main' into feature branch

f392d30

junhaoliao force-pushed the main branch from 4505f8c to 8977316 Compare May 13, 2026 22:48

Merge branch 'main' into feature branch

613925f

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

Bill-hbrhbr requested changes May 29, 2026

View reviewed changes

	return ClpAction(cmd=args.to_cmd(), args=args)
	return ClpAction.from_args(args)

		"single_match_wildcard_query": "\"detail\":\"Roll program complete, heads down attitude achieved for ascent\"",
		"single_match_file": "sts-135-2011-07-08.jsonl"

Conversation

quinntaylormitchell commented Apr 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Validation performed

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 warning)

Review ran into problems

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Bill-hbrhbr left a comment

Choose a reason for hiding this comment

Uh oh!

Bill-hbrhbr May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Bill-hbrhbr May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Bill-hbrhbr May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Bill-hbrhbr May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Bill-hbrhbr May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Bill-hbrhbr May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Bill-hbrhbr May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

quinntaylormitchell commented Apr 29, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading