implemented NISAR adapation and provide template for future project adaptation by pymonger · Pull Request #1 · hysds/metrics_extractor

pymonger · 2025-10-10T15:23:52Z

🚀 Pull Request: NISAR Project Adaptation and Enhanced Metrics Extraction

📋 Overview

This PR introduces a comprehensive NISAR-specific adaptation of the HySDS metrics extractor, providing enhanced hierarchical job breakdown capabilities and specialized execution time analysis. The changes organize the codebase for scalability while maintaining backward compatibility with the original functionality.

🎯 Key Features

1. NISAR-Specific Enhancements

Hierarchical Job Breakdown: Three-level breakdown of SCIFLO_RSLC jobs by:
- Primary: NISAR beam modes (e.g., L_40_DH_05_DH, L_20_QP_05_QP)
- Secondary: Coverage type (full, partial)
- Tertiary: Acquisition mode (individual, mixed)
Execution Time Analysis: Specialized wall_time metrics analysis with PCM container runtime tracking
Regex Pattern Matching: Configurable job ID parsing using NISAR beam mode patterns

2. Project Organization

Dedicated NISAR Directory: Clean separation of NISAR-specific adaptations
Project Template: Reusable template for future project adaptations
Quick Start Guide: Easy-to-follow usage examples
Comprehensive Documentation: Technical details and usage instructions

3. Code Quality Improvements

Import Path Fixes: Proper module imports for scripts in subdirectories
Code Reusability: Moved imports to enable function reuse across scripts
Security: Removed all hardcoded IP addresses and credentials
Clean History: Rewritten git history without sensitive information

📁 File Changes

File	Change Type	Description
`.gitignore`	NEW	Comprehensive ignore rules for Python, macOS, CSV files, and common artifacts
`PROJECT_TEMPLATE.md`	NEW	Template for creating new project adaptations
`QUICK_START.md`	NEW	Quick reference guide for all available scripts
`nisar/`	NEW	Dedicated directory for NISAR-specific scripts and configurations
`nisar/hysds_metrics_es_extractor_enhanced.py`	NEW	Enhanced metrics extractor with hierarchical breakdown
`nisar/job_execution_time_extractor.py`	NEW	Specialized execution time analyzer
`nisar/NISAR_MIXED_MODES_CONFIG_20200101T000000_01.json`	NEW	NISAR beam mode patterns and configurations
`nisar/README.md`	NEW	NISAR project documentation
`nisar/README_ENHANCED.md`	NEW	Detailed technical documentation
`metrics_extractor/hysds_metrics_es_extractor.py`	MODIFIED	Moved imports to top for reusability

🔧 Technical Details

Enhanced Metrics Extractor

Three-Level Hierarchical Breakdown: beam_name → coverage → acquisition_mode
Regex Pattern Matching: _(?P<coverage>full|partial)_(?P<acquisition_mode>individual|mixed)_(?P<beam_name>L_\d{2}_\w{2}_\d{2}_\w{2})_
Comprehensive Metrics: Average runtime, counts, instance types per breakdown level
CSV Export: Structured output with hierarchical columns

Execution Time Extractor

Wall Time Analysis: Extracts wall_time from job.job_info.metrics.usage_stats
Dual Metrics:
- execution_time_minutes: Lesser of two wall_time values
- pcm_container_runtime_m: Larger of two wall_time values
Statistical Analysis: Min, max, average execution times per breakdown level

Security & Best Practices

No Hardcoded Credentials: All authentication handled via runtime prompts
Placeholder URLs: Documentation uses generic placeholders instead of real IPs
Clean Git History: No sensitive information in commit history
Comprehensive .gitignore: Prevents accidental commits of generated files

🚀 Usage Examples

NISAR Enhanced Metrics

cd nisar/
python hysds_metrics_es_extractor_enhanced.py \
  -u https://your-es-instance/mozart_es/logstash-*/_search \
  -b 56 \
  --breakdown_job "job-SCIFLO_RSLC:pcm_r4.0.7_pge_r4.1.0" \
  --nisar_config NISAR_MIXED_MODES_CONFIG_20200101T000000_01.json

NISAR Execution Time Analysis

cd nisar/
python job_execution_time_extractor.py \
  -u https://your-es-instance/mozart_es/logstash-*/_search \
  -b 56 \
  --breakdown_job "job-SCIFLO_RSLC:pcm_r4.0.7_pge_r4.1.0"

📊 Output Files

Hierarchical Breakdown CSV: job_three_level_breakdown_job_SCIFLO_RSLC_*.csv
Execution Time CSV: job_execution_times_job_SCIFLO_RSLC_*.csv
Standard Metrics: Original functionality preserved

🔮 Future Adaptations

The project template (PROJECT_TEMPLATE.md) provides a clear pattern for adapting this structure to other projects:

Create Project Directory: your_project/
Copy Base Scripts: Enhanced extractor and execution time analyzer
Update Configurations: Regex patterns, job types, and documentation
Customize Analysis: Project-specific metrics and breakdowns

✅ Testing

✅ Import Paths: Scripts work correctly from nisar/ directory
✅ Authentication: Secure credential prompting implemented
✅ CSV Generation: Files created in CWD as expected
✅ Documentation: All examples use placeholder URLs
✅ Git History: Clean history without sensitive information

📈 Impact

+3,520 lines added, -538 lines removed
11 files changed
7 commits with clean, logical progression
Zero breaking changes to existing functionality
Enhanced scalability for future project adaptations

🎯 Ready for Review

This PR provides a solid foundation for NISAR-specific analysis while establishing patterns for future project adaptations. All security concerns have been addressed, documentation is comprehensive, and the codebase maintains backward compatibility.

Branch: feature/nisar-adaptation
Target: main
Author: pymonger pymonger@gmail.com

Note

Medium Risk
Moderate risk due to substantial new querying/aggregation code paths (including PIT/scroll pagination) and a script that auto-installs openpyxl at runtime, which could affect reproducibility and operational safety; existing extractor behavior appears mostly unchanged aside from refactoring/formatting.

Overview
Adds a new nisar/ toolkit that extends the metrics extractor with NISAR-focused analyses: a three-level hierarchical breakdown of SCIFLO_RSLC/SCIFLO_INSAR jobs by parsing job_id (beam/mode → coverage → acquisition mode), plus execution-time extraction based on usage_stats.wall_time and a PGE-by-data-day execution-time extractor.

Introduces shared Elasticsearch pagination (search_after+PIT with scroll fallback) to reliably fetch results beyond the 10k hit limit, and adds an Excel-based version comparison script that matches jobs via match_id and outputs a multi-sheet report.

Improves repo usability with new .gitignore, quick-start and project adaptation template docs, and minor refactoring/formatting in metrics_extractor/hysds_metrics_es_extractor.py to make imports reusable from the new scripts.

^{Written by Cursor Bugbot for commit 3225654. This will update automatically on new commits. Configure here.}

- Move json, logging, requests, sys, getpass, datetime, and urllib imports to top - Remove duplicate imports from main block - Enables functions to be imported and used by other scripts

- Move NISAR-specific scripts to dedicated nisar/ directory - Create comprehensive project organization with hierarchical breakdown - Add job execution time extractor with wall_time analysis - Create PROJECT_TEMPLATE.md for future project adaptations - Add QUICK_START.md for easy reference - Clean up obsolete debugging and testing scripts - Update all documentation to reflect new structure

- Add sys.path.append to correctly import from parent metrics_extractor directory - Fixes ModuleNotFoundError when running scripts from nisar/ directory - Both hysds_metrics_es_extractor_enhanced.py and job_execution_time_extractor.py updated

- CSV files are created in CWD when scripts are run - Empty generated_csv_files directory was not needed - Simplifies directory structure

…ctory - CSV files are created in CWD when scripts run - Remove unnecessary directory references from README.md - Clean up directory structure documentation

- Ignore macOS .DS_Store files - Ignore Python __pycache__ directories and compiled files - Ignore CSV files generated by scripts - Ignore common IDE, backup, and temporary files - Ignore Python virtual environments and build artifacts - Comprehensive coverage of commonly ignored files/directories

- Remove trailing underscore from regex pattern to handle job IDs that continue after beam name - Pattern now matches: _full_individual_L_20_QP_05_QP_0_state-config-... - Previously only matched: _full_individual_L_20_QP_05_QP_ - Fixes issue where INSAR jobs were not being processed due to strict regex - Tested with example INSAR job ID: SCIFLO_INSAR__pcm_r4.0.7_pge_r4.1.0-network_pair_006_040_012_full_individual_L_20_QP_05_QP_0_state-config-20251010T201535.245443Z

cursor · 2025-10-10T21:01:50Z

+```bash
+cp /Users/gmanipon/dev/metrics_extractor/nisar/hysds_metrics_es_extractor_enhanced.py /Users/gmanipon/dev/metrics_extractor/your_project/
+cp /Users/gmanipon/dev/metrics_extractor/nisar/job_execution_time_extractor.py /Users/gmanipon/dev/metrics_extractor/your_project/
+```


Bug: Template Contains Hardcoded Developer Paths

The PROJECT_TEMPLATE.md includes hardcoded personal development paths, like /Users/gmanipon/dev/metrics_extractor/. These paths make the template non-portable and expose specific developer environment details, which could hinder its general usability for other developers.

cursor · 2025-12-04T00:05:04Z

+        patterns = {
+            "beam_name": r"_(?P<coverage>full|partial)_(?P<acquisition_mode>individual|mixed)_(?P<beam_name>L_\d{2}_\w{2}_\d{2}_\w{2})_",  # Primary: beam_name
+            "coverage": r"_(?P<coverage>full|partial)_(?P<acquisition_mode>individual|mixed)_(?P<beam_name>L_\d{2}_\w{2}_\d{2}_\w{2})_",  # Secondary: coverage
+            "acquisition_mode": r"_(?P<coverage>full|partial)_(?P<acquisition_mode>individual|mixed)_(?P<beam_name>L_\d{2}_\w{2}_\d{2}_\w{2})_",  # Tertiary: acquisition_mode


Bug: Regex excludes S-band jobs, only matches L-band

The beam_name regex pattern uses L_\d{2}_\w{2}_\d{2}_\w{2} which only matches L-band modes (e.g., L_20_DH_05_DH), but the NISAR mission includes both L-band and S-band modes. The documentation in README_ENHANCED.md correctly states the regex should be [LS]_\d{2}_\w{2}_\d{2}_\w{2} to match both bands, and the load_nisar_config function also correctly uses this pattern. However, the parse_job_id_patterns function only matches L-band, causing all S-band job metrics (e.g., S_37_QP_00_NA) to be silently skipped. This affects both extractor scripts.

Additional Locations (1)

nisar/job_execution_time_extractor.py#L119-L122

cursor · 2026-02-24T16:20:25Z

+.Spotlight-V100
+.Trashes
+ehthumbs.db
+Thumbs.db


Duplicate entries in gitignore file

Low Severity

The .gitignore file contains duplicate entries. Lines 2-8 (.DS_Store, .DS_Store?, ._*, .Spotlight-V100, .Trashes, ehthumbs.db, Thumbs.db) are repeated verbatim at lines 126-132. Line 116 (*~) is also duplicated at line 123. These duplicates clutter the file without adding value and could cause confusion during maintenance.

cursor · 2026-02-24T16:20:25Z

 __version__ = "1.2.3"
 __url__ = "https://github.com/hysds/metrics_extractor"
-__description__ = "HySDS Metrics Extractor Tool"
+__description__ = "HySDS Metrics Extractor Tool"


No-op modification in module init file

Low Severity

The __init__.py file shows a deletion and re-addition of the same __description__ line with identical content. The diff indicates line 8 was deleted and line 9 added, but both contain __description__ = "HySDS Metrics Extractor Tool". This is a whitespace-only or no-op change that adds no value and may have been committed accidentally.

cursor · 2026-02-24T16:20:25Z

+            continue
+
+        pge_time = min(wall_times[0], wall_times[1]) / (1e9 * 60)  # nanoseconds to minutes
+        pcm_time = max(wall_times[0], wall_times[1]) / (1e9 * 60)


Unguarded array access in pge_execution_time_extractor

Medium Severity

The code checks len(wall_times) < 2 and continues if true, but then directly accesses wall_times[0] and wall_times[1] without considering that wall_times might have more than 2 elements. If there are 3 or more wall_time values, only the first two are used without documenting this behavior or warning about extra values, which could hide data quality issues.

cursor · 2026-02-24T16:20:25Z

+            continue
+
+        pge_time = min(wall_times[0], wall_times[1]) / (1e9 * 60)  # nanoseconds to minutes
+        pcm_time = max(wall_times[0], wall_times[1]) / (1e9 * 60)


Missing None value validation for wall_time

High Severity

The list comprehension filters for the presence of the wall_time key but doesn't verify the value isn't None. If wall_time exists but has a None value, it gets added to the wall_times list, then the arithmetic operations at lines 235-236 will crash with TypeError: unsupported operand type(s) for /: 'NoneType' and 'float' when attempting to divide None by 1e9 * 60.

cursor · 2026-02-24T16:20:26Z

+                match_id = row[1]
+                instance_type = row[3]
+                pge_time = float(row[4])
+                pcm_time = float(row[5])


Missing bounds check for CSV row parsing

Medium Severity

The CSV parsing code accesses row indices 0 through 5 without checking if the row has at least 6 elements. If the CSV format is slightly malformed or has missing columns, this will raise an IndexError and crash the script. Only line 70 has a length check for a different row type, but the job details parsing lacks this protection.

cursor · 2026-02-24T16:20:26Z

+    subprocess.check_call(['pip', 'install', 'openpyxl', '-q'])
+    import openpyxl
+    from openpyxl.styles import Font, PatternFill, Border, Side
+    from openpyxl.chart import BarChart, Reference


Automatic package installation without user consent

Medium Severity

The script automatically installs the openpyxl package using subprocess.check_call(['pip', 'install', 'openpyxl', '-q']) when the import fails, without asking for user permission. This modifies the user's Python environment without consent, which is a security and usability concern, especially in production or restricted environments where pip install might fail or be disallowed.

Six queries across 3 NISAR scripts used size:10000 with no truncation detection, risking silently incomplete results for wide time ranges. Replace with search_after + Point In Time pagination via a new shared es_pagination module. Also clean up redundant Kibana boilerplate (match_all, empty should/must_not) from queries. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor · 2026-02-24T17:36:07Z

+                if breakdown_value not in breakdown_groups:
+                    breakdown_groups[breakdown_value] = []
+                breakdown_groups[breakdown_value].append(job_id)
+


Breakdown extraction uses wrong capture group

Medium Severity

get_job_id_breakdown_aggregation() sets breakdown_value = match.group(1) even though the configured patterns rely on named groups (coverage, acquisition_mode, beam_name). With the current regex, group(1) corresponds to coverage, so breakdowns for other fields can be mis-grouped even when the match succeeds.

L0B execution time reports now include per-job RCID (radar config ID), diagnostic mode flag (0/1/2/cal), beam mode, and PCM/PGE version. Stats are grouped by (version, instance_type, RCID, diagnostic_mode_flag) for L0B; other PGE types are unchanged (new CSV columns present but empty). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Document the --modes_config argument, L0B-specific extraction examples, and the new CSV columns (rcid, diagnostic_mode_flag, beam_mode, version). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

es_pagination.py: Auto-detect backend by trying the Elasticsearch PIT endpoint first (POST /{index}/_pit), then falling back to the OpenSearch endpoint (POST /_search/point_in_time) with the corresponding close API. pge_execution_time_extractor.py: Replace deprecated datetime.utcnow() with datetime.now(timezone.utc). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

If both ES and OpenSearch PIT endpoints fail (e.g. older OpenSearch clusters), fall back to the universally supported scroll API for pagination beyond 10K hits. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Include the raw product ID in the detail section so we can see the actual filenames for jobs where RCID extraction fails (dtid jobs). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CRSD products have a different filename format without the RCID field. Only process RRSD products for L0B metrics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add RCID, diagnostic mode flag, and version breakdown to L0B metrics

cursor

Cursor Bugbot has reviewed your changes and found 4 potential issues.

^{Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.}

cursor · 2026-02-25T17:28:58Z

+                match_id = row[1]
+                instance_type = row[3]
+                pge_time = float(row[4])
+                pcm_time = float(row[5])


CSV reader uses wrong column indices for time values

High Severity

The read_csv function uses hardcoded indices row[4] and row[5] for pge_time and pcm_time, but according to the documented CSV format in nisar/README.md, the Individual Job Details header is job_id,match_id,data_day,instance_type,version,rcid,diagnostic_mode_flag,beam_mode,pge_execution_time_min,pcm_container_time_min. In that format, row[4] is the version string and row[5] is the rcid — the actual time columns are at indices 8 and 9. Calling float() on a version string like "pcm_r05.00.1_pge_r05.00.5.1" would raise a ValueError, crashing the tool. The header variable is captured but never used for column lookups.

Additional Locations (1)

nisar/README.md#L86-L87

cursor · 2026-02-25T17:28:58Z

+### 2. Copy Base Scripts
+```bash
+cp /Users/gmanipon/dev/metrics_extractor/nisar/hysds_metrics_es_extractor_enhanced.py /Users/gmanipon/dev/metrics_extractor/your_project/
+cp /Users/gmanipon/dev/metrics_extractor/nisar/job_execution_time_extractor.py /Users/gmanipon/dev/metrics_extractor/your_project/


Hardcoded local paths in project template documentation

Medium Severity

The project template contains hardcoded local filesystem paths (/Users/gmanipon/dev/metrics_extractor/) in several code examples and reference sections. These developer-specific paths won't work for anyone else and leak the author's local directory structure. The template is intended to be generic guidance for future projects but has absolute paths baked in.

Additional Locations (1)

PROJECT_TEMPLATE.md#L129-L130

cursor · 2026-02-25T17:28:58Z

+        logging.error(f"missing argument breakdown_job")
+        raise Exception(f"missing argument breakdown_job")
+
+    nisar_config = args["nisar_config"]


CLI argument --nisar_config is accepted but never used

Medium Severity

The --nisar_config CLI argument is accepted and passed to get_job_breakdown_metrics as nisar_config_path, but that function completely ignores it (the docstring even says "not used in this version"). The load_nisar_config function is defined but never called anywhere. This means the NISAR config file is never loaded, and users following the documented usage in QUICK_START.md and README_ENHANCED.md (which instruct passing --nisar_config) would have no idea their config has no effect on the analysis.

Additional Locations (1)

nisar/hysds_metrics_es_extractor_enhanced.py#L866-L877

cursor · 2026-02-25T17:28:58Z

+    for pattern_name, extractions in sample_extractions.items():
+        logging.info(f"  {pattern_name}: {extractions}")
+
+    return patterns, sample_extractions


Duplicated functions across two NISAR scripts

Low Severity

get_sample_job_ids and parse_job_id_patterns are fully duplicated between job_execution_time_extractor.py and hysds_metrics_es_extractor_enhanced.py. The implementations are character-for-character identical (except the enhanced version has an unused nisar_modes parameter on parse_job_id_patterns). Both files already import from shared modules via sys.path manipulation, so these functions could live in a shared location. A bug fix in one copy would need to be manually replicated to the other.

Additional Locations (1)

nisar/hysds_metrics_es_extractor_enhanced.py#L50-L186

Extends the three-level breakdown to all PGE types and adds an orchestrator that produces a complete set of deliverables (per-PGE CSVs, unified aggregated CSV, multi-sheet Excel workbook) for any CRID. Changes: - hysds_metrics_es_extractor_enhanced.py: accept GSLC, GCOV, and L3_SM in addition to RSLC and INSAR. Their job_ids carry the same _<coverage>_<acquisition_mode>_<beam_name>_ pattern, so the existing regex applies without modification. - l0b_three_level_breakdown.py (new): L0B-specific three-level breakdown (rcid -> beam_mode -> diagnostic_mode). Needed because L0B runs pre-focus and its job_id lacks beam info; breakdown is derived from the L0B product filename with rcid -> beam_mode mapped via the NISAR_MIXED_MODES_CONFIG. - combine_breakdowns.py (new): merges per-PGE breakdown CSVs into a single aggregated CSV with a unified schema. The two breakdown conventions (standard vs L0B) are projected onto generic level_1/ level_2/level_3 columns while preserving the native columns. - build_crid_workbook.py (new): produces a multi-sheet Excel workbook with a Summary sheet (per-PGE totals + bar chart), "All PGEs (unified)" sheet, and one sheet per PGE type with its native schema. - run_crid_report.sh (new): orchestrator that takes --crid, --cluster, --es_url, --netrc_os, --version, and --days_back, then chains the extractor, L0B script, combiner, and workbook builder. Reads OpenSearch credentials from the netrc-os file. - README.md: add Quickstart, new sections for L0B breakdown, combiner, workbook builder, and CRID report orchestrator. Update directory listing and section 3 to reflect broader PGE support. Usage example: ./nisar/run_crid_report.sh --crid X05013 --cluster POP1 --es_url "https://localhost:9202/logstash-*/_search" --netrc_os /path/to/netrc-os-<cluster> --version r05.01.3 --days_back 200 --output_dir ./output

…adata In r05.01.4 the SCIFLO_{RSLC,GSLC,GCOV,INSAR,L3_SM} job_id changed from embedding the long-form product name (e.g. `..._full_individual_L_22_DH_20_DH_...`) to encoding the state-config id (`SCIFLO_RSLC__release-r05.01.4-track_frame_<cycle>_<track>_<frame>_<v>_state-config-<ts>-<ts>`). The existing `_(?P<coverage>full|partial)_(?P<acquisition_mode>individual|mixed)_(?P<beam_name>L_\d{2}_\w{2}_\d{2}_\w{2})_` regex no longer matches, so pattern analysis returned `[]` for every dimension and the three-level breakdown produced no rows. Add a `--grq_url` arg. When set and the job_type is a GRQ-product PGE, take `job.job_info.metrics.products_staged[*].id` from each job_info doc, bulk-look-up `metadata.BeamName`, `metadata.FrameCoverage`, and `metadata.isJointObservation` in GRQ via an `ids` query, and group on those. `isJointObservation` true/false maps to mixed/individual. The legacy regex path is unchanged when `--grq_url` is omitted, so older releases and non-product job types (L0B etc.) keep working. Verified on POP1 X05014 over 30 days: 657/657 staged RSLC products resolved, 22 (beam x coverage x acquisition_mode) groups written.

Forgot to thread the new flag through the orchestrator after adding it to hysds_metrics_es_extractor_enhanced.py in 7d869a7. Without it the RSLC/GSLC/GCOV/INSAR/L3_SM PGEs still fall back to the legacy regex path and produce zero breakdown rows on r05.01.4+. When --grq_url is omitted, the extractor is invoked exactly as before, so pre-r05.01.4 runs are unchanged.

On modern NISAR clusters ES_CLUSTER_MODE=true and es-mozart/es-grq/ es-metrics all front the same 3-node OS cluster, so the metrics-ES endpoint already serves grq_* indices. Requiring users to set up a second tunnel and pass --grq_url for the common case was friction with no benefit. When --grq_url is omitted and the job_type is a GRQ-product PGE, derive the GRQ base URL from the --es_url scheme+netloc and log the choice. For non-clustered deployments where GRQ is a separate ES instance, --grq_url remains available as an explicit override. Verified on POP1: run with --es_url=https://es-metrics:9200/logstash-*/_search and no --grq_url resolves 702/702 RSLC products through the auto-derived https://es-metrics:9200 base.

pymonger added 7 commits October 10, 2025 08:08

Move imports to top of file for reusability

ea4faff

- Move json, logging, requests, sys, getpass, datetime, and urllib imports to top - Remove duplicate imports from main block - Enables functions to be imported and used by other scripts

update

e176b94

Remove unnecessary generated_csv_files directory from nisar

07db88f

- CSV files are created in CWD when scripts are run - Empty generated_csv_files directory was not needed - Simplifies directory structure

Update documentation to remove references to generated_csv_files dire…

cb7cb9c

…ctory - CSV files are created in CWD when scripts run - Remove unnecessary directory references from README.md - Clean up directory structure documentation

pymonger requested a review from hookhua October 10, 2025 15:25

This comment was marked as outdated.

Sign in to view

pymonger added 2 commits October 10, 2025 13:53

support INSAR

b5e4ea0

cursor Bot reviewed Oct 10, 2025

View reviewed changes

add new tools for comparing PGE execution between different versions

e08beba

cursor Bot reviewed Dec 4, 2025

View reviewed changes

remove

4fe6ded

cursor Bot reviewed Feb 24, 2026

View reviewed changes

pymonger and others added 7 commits February 24, 2026 16:39

Update README with L0B usage examples and new CSV columns

acf0260

Document the --modes_config argument, L0B-specific extraction examples, and the new CSV columns (rcid, diagnostic_mode_flag, beam_mode, version). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add scroll API fallback when PIT is not supported

889b71b

If both ES and OpenSearch PIT endpoints fail (e.g. older OpenSearch clusters), fall back to the universally supported scroll API for pagination beyond 10K hits. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add product_id to detail CSV rows for diagnostics

47462ed

Include the raw product ID in the detail section so we can see the actual filenames for jobs where RCID extraction fails (dtid jobs). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Skip L0B CRSD products (dtid jobs) in metrics extraction

643c43d

CRSD products have a different filename format without the RCID field. Only process RRSD products for L0B metrics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Merge pull request #2 from hysds/feature/l0b-rcid-version-breakdown

3225654

Add RCID, diagnostic mode flag, and version breakdown to L0B metrics

cursor Bot reviewed Feb 25, 2026

View reviewed changes

pymonger added 4 commits April 23, 2026 13:20

Conversation

pymonger commented Oct 10, 2025 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🚀 Pull Request: NISAR Project Adaptation and Enhanced Metrics Extraction

📋 Overview

🎯 Key Features

1. NISAR-Specific Enhancements

2. Project Organization

3. Code Quality Improvements

📁 File Changes

🔧 Technical Details

Enhanced Metrics Extractor

Execution Time Extractor

Security & Best Practices

🚀 Usage Examples

NISAR Enhanced Metrics

NISAR Execution Time Analysis

📊 Output Files

🔮 Future Adaptations

✅ Testing

📈 Impact

🎯 Ready for Review

Uh oh!

This comment was marked as outdated.

Uh oh!

cursor Bot Oct 10, 2025

Choose a reason for hiding this comment

Bug: Template Contains Hardcoded Developer Paths

Uh oh!

cursor Bot Dec 4, 2025

Choose a reason for hiding this comment

Bug: Regex excludes S-band jobs, only matches L-band

Uh oh!

cursor Bot Feb 24, 2026

Choose a reason for hiding this comment

Duplicate entries in gitignore file

Uh oh!

cursor Bot Feb 24, 2026

Choose a reason for hiding this comment

No-op modification in module init file

Uh oh!

cursor Bot Feb 24, 2026

Choose a reason for hiding this comment

Unguarded array access in pge_execution_time_extractor

Uh oh!

cursor Bot Feb 24, 2026

Choose a reason for hiding this comment

Missing None value validation for wall_time

Uh oh!

cursor Bot Feb 24, 2026

Choose a reason for hiding this comment

Missing bounds check for CSV row parsing

Uh oh!

cursor Bot Feb 24, 2026

Choose a reason for hiding this comment

Automatic package installation without user consent

Uh oh!

cursor Bot Feb 24, 2026

Choose a reason for hiding this comment

Breakdown extraction uses wrong capture group

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Feb 25, 2026

Choose a reason for hiding this comment

CSV reader uses wrong column indices for time values

Uh oh!

cursor Bot Feb 25, 2026

Choose a reason for hiding this comment

Hardcoded local paths in project template documentation

Uh oh!

cursor Bot Feb 25, 2026

Choose a reason for hiding this comment

CLI argument --nisar_config is accepted but never used

Uh oh!

cursor Bot Feb 25, 2026

Choose a reason for hiding this comment

Duplicated functions across two NISAR scripts

Uh oh!

pymonger commented Oct 10, 2025 •

edited by cursor Bot

Loading

CLI argument `--nisar_config` is accepted but never used