Add Solr registry check to check_deprecated_lids #23
Each deprecated LID is now verified against both the PDS REST Search API (expects 404) and the PDS Solr registry (expects numFound == 0). Failures from each source are reported separately. Adds 14 new unit tests covering query_solr error paths and the combined check_deprecated_lids behavior.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
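The combined check can be sketched as a loop that queries both sources for each LID and keeps a separate failure list per source. This is a minimal sketch, not the PR's code: `check_lids` and the injected `query_api`/`query_solr` callables are hypothetical, and each is assumed to return a `(value, error_message)` tuple, where `value` is the HTTP status (API) or `numFound` (Solr) and `error_message` is `None` on success. The Solr side follows the `query_solr` docstring quoted in the review; the API side is an assumption.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: (value, error_message); error_message is None on success.
Query = Callable[[str], Tuple[int, Optional[str]]]
Failures = List[Tuple[str, str]]


def check_lids(lids: List[str], query_api: Query, query_solr: Query) -> Tuple[Failures, Failures]:
    """Check every LID against both sources; collect failures per source."""
    api_failures: Failures = []
    solr_failures: Failures = []
    for lid in lids:
        status, err = query_api(lid)
        if err is not None:
            api_failures.append((lid, err))  # a network error counts as a failure
        elif status != 404:
            api_failures.append((lid, f"expected 404, got {status}"))

        num_found, err = query_solr(lid)
        if err is not None:
            solr_failures.append((lid, err))
        elif num_found != 0:
            solr_failures.append((lid, f"expected numFound == 0, got {num_found}"))
    return api_failures, solr_failures


# Usage with canned results instead of live HTTP calls (the LIDs are made up).
api_results = {"urn:nasa:pds:a": (404, None), "urn:nasa:pds:b": (200, None)}
solr_results = {"urn:nasa:pds:a": (0, None), "urn:nasa:pds:b": (-1, "timeout")}
api_failures, solr_failures = check_lids(
    ["urn:nasa:pds:a", "urn:nasa:pds:b"], api_results.__getitem__, solr_results.__getitem__
)
```

Keeping two lists means a LID that fails in both places appears once in each report, so an API regression and a stale Solr document can be told apart.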
nutjob4life left a comment
See comments above/below. And feel free to override/countermand them! 😁
Also, good news on the tests:
rootdir: /Users/kelly/Documents/Clients/JPL/PDS/Development/nasa-pds/en-ops-utils
collected 21 items
test/context/test_check_deprecated_lids.py::test_load_skips_comments_and_header PASSED [ 4%]
test/context/test_check_deprecated_lids.py::test_load_skips_blank_lines PASSED [ 9%]
test/context/test_check_deprecated_lids.py::test_load_empty_file_returns_empty_list PASSED [ 14%]
test/context/test_check_deprecated_lids.py::test_query_api_returns_404 PASSED [ 19%]
test/context/test_check_deprecated_lids.py::test_query_api_returns_200 PASSED [ 23%]
test/context/test_check_deprecated_lids.py::test_query_api_timeout PASSED [ 28%]
test/context/test_check_deprecated_lids.py::test_query_api_connection_error PASSED [ 33%]
test/context/test_check_deprecated_lids.py::test_query_solr_returns_zero PASSED [ 38%]
test/context/test_check_deprecated_lids.py::test_query_solr_returns_nonzero PASSED [ 42%]
test/context/test_check_deprecated_lids.py::test_query_solr_timeout PASSED [ 47%]
test/context/test_check_deprecated_lids.py::test_query_solr_connection_error PASSED [ 52%]
test/context/test_check_deprecated_lids.py::test_query_solr_http_error PASSED [ 57%]
test/context/test_check_deprecated_lids.py::test_query_solr_parse_error PASSED [ 61%]
test/context/test_check_deprecated_lids.py::test_query_solr_missing_key PASSED [ 66%]
test/context/test_check_deprecated_lids.py::test_all_clean_no_failures PASSED [ 71%]
test/context/test_check_deprecated_lids.py::test_api_non_404_detected_as_failure PASSED [ 76%]
test/context/test_check_deprecated_lids.py::test_solr_nonzero_detected_as_failure PASSED [ 80%]
test/context/test_check_deprecated_lids.py::test_api_network_error_recorded_as_failure PASSED [ 85%]
test/context/test_check_deprecated_lids.py::test_solr_network_error_recorded_as_failure PASSED [ 90%]
test/context/test_check_deprecated_lids.py::test_both_failures_reported_independently PASSED [ 95%]
test/context/test_check_deprecated_lids.py::test_mixed_api_results PASSED [100%]
================================================== 21 passed in 0.75s ==================================================
        return -1, f"connection error: {e}"
    except requests.exceptions.HTTPError as e:
        return -1, f"HTTP error: {e}"
    except json.JSONDecodeError as e:
Depending on the underlying JSON library (for example, if simplejson is installed, combined with certain requests versions), the decoding exception may not always be json.JSONDecodeError. You can more safely catch the superclass, ValueError, instead:
except ValueError as e:
    return -1, f"Response parse error: {e}"

    Tuple of (num_found, error_message).
    num_found is -1 and error_message is set on network/parse errors.
    """
    params = {"wt": "json", "q": f'lid:"{lid}"', "rows": 0}
This works, but you can make the intent ("search everything, filter by exact LID") clearer by using fq:
params = {"wt": "json", "q": "*:*", "fq": f'lid:"{lid}"', "rows": 0}
The difference is subtle, but with just q you get scoring (even if you don't want it) and Solr builds a full result set. With fq, the scoring machinery isn't engaged and the filter results are cached separately, in Solr's filter cache.
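Combining the two review suggestions (filter with fq, catch ValueError) might look roughly like the sketch below. `solr_params` and `parse_num_found` are hypothetical helpers, not functions from the PR; the base URL comes from the PR summary, and the -1-on-error convention follows the `query_solr` docstring. Only the standard library is used, so the parsing can be exercised without a network call.

```python
import json
from urllib.parse import urlencode

# Base URL taken from the PR summary.
SOLR_SEARCH_URL = "https://pds.nasa.gov/services/search/search"


def solr_params(lid: str) -> dict:
    """Hypothetical helper: match all documents in q, restrict to one LID in fq."""
    # fq is applied as a non-scoring filter; rows=0 requests only numFound.
    return {"wt": "json", "q": "*:*", "fq": f'lid:"{lid}"', "rows": 0}


def parse_num_found(body: str):
    """Hypothetical helper returning (num_found, error_message); -1 on error."""
    try:
        return json.loads(body)["response"]["numFound"], None
    except ValueError as e:
        # ValueError is the base class of json.JSONDecodeError and of
        # simplejson's JSONDecodeError, so one clause covers either decoder.
        return -1, f"Response parse error: {e}"
    except KeyError as e:
        return -1, f"missing key in Solr response: {e}"


# Build the request URL for a made-up LID.
url = f"{SOLR_SEARCH_URL}?{urlencode(solr_params('urn:nasa:pds:example'))}"
```

An HTML error page from a proxy, for instance, fails to decode and comes back as `(-1, "Response parse error: ...")` rather than raising.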
lids = load_deprecated_lids(csv_path)
print(f"\n✅ All {len(lids)} deprecated LIDs correctly return 404!")
print(f"\n✅ All {len(lids)} deprecated LIDs correctly absent from REST API and Solr registry!")
"Correctly absent" feels just a little awkward; maybe this instead?
All {len(lids)} deprecated LIDs are absent from both the REST API and Solr registry! PARTY TIME! 🥳
Summary
- check_deprecated_lids.py now verifies each deprecated LID against both the PDS REST Search API (expects HTTP 404) and the PDS Solr registry (https://pds.nasa.gov/services/search/search?wt=json&q=lid:"<lid>"&rows=0, expects numFound == 0)
- New query_solr() with specific exception handling for Timeout, ConnectionError, HTTPError, JSONDecodeError, and KeyError

Test plan
- pytest test/context/test_check_deprecated_lids.py -v: 14 new tests covering query_solr error paths and the updated check_deprecated_lids behavior
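One way to exercise query_solr error paths without network access is to inject the HTTP call and replace it with `unittest.mock.Mock` in tests. The sketch below assumes a simplified, hypothetical `query_solr` that takes a `fetch_json` callable; the builtin `TimeoutError` and `ConnectionError` stand in for `requests.exceptions.Timeout` and `requests.exceptions.ConnectionError`. It is not the PR's actual test code.

```python
from unittest import mock


def query_solr(lid, fetch_json):
    """Simplified, hypothetical query_solr returning (num_found, error_message).

    fetch_json is an injected callable (an assumption for this sketch) that takes
    the Solr params and returns the decoded JSON body.
    """
    params = {"wt": "json", "q": "*:*", "fq": f'lid:"{lid}"', "rows": 0}
    try:
        return fetch_json(params)["response"]["numFound"], None
    # Builtin TimeoutError/ConnectionError stand in for requests.exceptions.*.
    except TimeoutError as e:
        return -1, f"timeout: {e}"
    except ConnectionError as e:
        return -1, f"connection error: {e}"
    except ValueError as e:
        return -1, f"Response parse error: {e}"
    except KeyError as e:
        return -1, f"missing key: {e}"


def test_sketch_timeout():
    fetch = mock.Mock(side_effect=TimeoutError("read timed out"))
    assert query_solr("urn:nasa:pds:example", fetch) == (-1, "timeout: read timed out")


def test_sketch_missing_key():
    fetch = mock.Mock(return_value={"responseHeader": {"status": 0}})
    num_found, err = query_solr("urn:nasa:pds:example", fetch)
    assert num_found == -1 and err.startswith("missing key")


def test_sketch_sends_fq_filter():
    fetch = mock.Mock(return_value={"response": {"numFound": 0}})
    assert query_solr("urn:nasa:pds:example", fetch) == (0, None)
    # The exact-LID restriction is sent as fq, per the review suggestion.
    assert fetch.call_args.args[0]["fq"] == 'lid:"urn:nasa:pds:example"'
```

pytest collects the `test_sketch_*` functions the same way it collects the tests listed in the run output above.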