-
Notifications
You must be signed in to change notification settings - Fork 3
Add archive stats page and improve run listings #60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #60 +/- ##
==========================================
- Coverage 64.30% 60.37% -3.94%
==========================================
Files 8 8
Lines 706 752 +46
==========================================
Hits 454 454
- Misses 252 298 +46 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request adds an Archive dashboard to visualize storage usage and makes improvements to run listing pages. The Archive page features a Chart.js-powered bar chart showing archive sizes grouped by month, backed by a new /api/archive_sizes endpoint that scans filesystem directories to calculate storage totals.
Key changes:
- New
/archiveroute and template displaying storage statistics with warnings about missing or empty archives - New
/api/archive_sizesAPI endpoint that recursively scans archive directories to calculate sizes per month - Simplified platform display logic in run listings (removed conditional Illumina prefix handling)
- Updated run detail page to show full data_uri paths instead of just filenames
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 15 comments.
Show a summary per file
| File | Description |
|---|---|
| sample_registry/templates/show_run.html | Updated data file link to display full data_uri path and point to local filesystem path |
| sample_registry/templates/browse_runs.html | Simplified platform display by removing conditional Illumina prefix logic |
| sample_registry/templates/base.html | Added Archive navigation link to header menu |
| sample_registry/templates/archive.html | New template with Chart.js visualization and collapsible warnings list for archive statistics |
| sample_registry/app.py | Added archive functionality including helper functions for date parsing and directory size calculation, plus new API endpoint and archive route |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def _parsed_month(date_str: str): | ||
| for fmt in ("%Y-%m-%d", "%Y/%m/%d", "%m/%d/%y", "%m/%d/%Y"): | ||
| try: | ||
| return datetime.strptime(date_str, fmt).strftime("%Y-%m") | ||
| except ValueError: | ||
| continue | ||
| return None |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This helper function lacks a docstring explaining its purpose, parameters, and return value. Since it handles multiple date formats and returns None on failure, documentation would help future maintainers understand the expected behavior and supported date formats.
| <button class="btn btn-link p-0" type="button" data-bs-toggle="collapse" data-bs-target="#warningsCollapse" aria-expanded="true" aria-controls="warningsCollapse"> | ||
| Warnings | ||
| </button> |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The collapse button for warnings does not have proper ARIA attributes to indicate its toggle state. While it has aria-expanded="true", this value is static and won't update when the collapse state changes. Consider using Bootstrap's data attributes properly or adding JavaScript to update the aria-expanded attribute dynamically when the collapse state changes, ensuring screen reader users are aware of the current state.
| db = SQLAlchemy(model_class=Base) | ||
| db.init_app(app) | ||
|
|
||
| ARCHIVE_ROOT = Path("/mnt/isilon/microbiome/") |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded constant ARCHIVE_ROOT = Path("/mnt/isilon/microbiome/") makes the application inflexible for different deployment environments. Consider making this configurable through an environment variable or configuration file (similar to how SQLALCHEMY_DATABASE_URI is handled) to support different deployment scenarios such as development, testing, and production environments.
| ARCHIVE_ROOT = Path("/mnt/isilon/microbiome/") | |
| # Make ARCHIVE_ROOT configurable via environment variable, defaulting to the original path | |
| ARCHIVE_ROOT = Path(os.environ.get("ARCHIVE_ROOT", "/mnt/isilon/microbiome/")) |
| def _parsed_month(date_str: str): | ||
| for fmt in ("%Y-%m-%d", "%Y/%m/%d", "%m/%d/%y", "%m/%d/%Y"): | ||
| try: | ||
| return datetime.strptime(date_str, fmt).strftime("%Y-%m") | ||
| except ValueError: | ||
| continue | ||
| return None |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function name _parsed_month uses past tense which is inconsistent with Python naming conventions for functions that perform an action. Consider renaming to _parse_month to follow the convention of using imperative verbs for function names.
| def _archive_size_for_run(run, warnings): | ||
| archive_path = (ARCHIVE_ROOT / run.data_uri).parent | ||
| run_label = f"CMR{run.run_accession:06d}" | ||
|
|
||
| if not archive_path.exists(): | ||
| warnings.append(f"{run_label}: Archive path {archive_path} does not exist") | ||
| return 0 | ||
|
|
||
| if not archive_path.is_dir(): | ||
| warnings.append(f"{run_label}: Archive path {archive_path} is not a directory") | ||
| return 0 | ||
|
|
||
| total_size = 0 | ||
| for entry in archive_path.rglob("*"): | ||
| try: | ||
| if entry.is_file(): | ||
| total_size += entry.stat().st_size | ||
| except OSError as exc: | ||
| warnings.append(f"{run_label}: Error accessing {entry}: {exc}") | ||
|
|
||
| if total_size == 0: | ||
| warnings.append(f"{run_label}: Archive at {archive_path} has size 0 bytes") | ||
|
|
||
| return total_size |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function lacks a docstring explaining its purpose, parameters, and return value. Given that it performs complex filesystem operations, modifies the warnings list passed by reference, and has specific error handling behavior, documentation would greatly improve maintainability.
| @app.route("/api/archive_sizes") | ||
| def archive_sizes(): | ||
| runs = db.session.query(Run).all() | ||
| warnings = [] | ||
| totals_by_month = defaultdict(int) | ||
|
|
||
| for run in runs: | ||
| month_label = _parsed_month(run.run_date) | ||
| if not month_label: | ||
| warnings.append( | ||
| f"CMR{run.run_accession:06d}: Unable to parse run_date '{run.run_date}'" | ||
| ) | ||
|
|
||
| archive_size = _archive_size_for_run(run, warnings) | ||
|
|
||
| if month_label: | ||
| totals_by_month[month_label] += archive_size | ||
|
|
||
| by_month = [ | ||
| {"month": month, "size_bytes": totals_by_month[month]} | ||
| for month in sorted(totals_by_month.keys()) | ||
| ] | ||
|
|
||
| return jsonify({"by_month": by_month, "warnings": warnings}) |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This endpoint performs potentially expensive filesystem operations (recursive directory traversal with rglob) for all runs in the database synchronously within a single HTTP request handler. This could cause severe performance issues and request timeouts if there are many runs or large archive directories. Consider implementing this as an asynchronous background job with cached results, or adding pagination/limiting to process only a subset of runs at a time.
| .catch((error) => { | ||
| renderWarnings([`Error loading archive data: ${error}`]); | ||
| }); |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error handling converts the error object to a string and displays it to the user, but this could potentially expose sensitive information about the server's internal structure or filesystem. Consider using a generic error message for users while logging the detailed error server-side for debugging purposes.
| <li><strong>Lane:</strong> {{ run.lane }}</li> | ||
| <li><strong>Platform:</strong> {{ run.machine_type }} {{ run.machine_kit }}</li> | ||
| <li><strong>Data file:</strong> <a href="respublica.research.chop.edu:/mnt/isilon/microbiome/{{ run.data_uri }}">{{ run.data_uri.split('/')|last }}</a></li> | ||
| <li><strong>Data file:</strong> <a href="/mnt/isilon/microbiome/{{ run.data_uri }}">{{ run.data_uri }}</a></li> |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The anchor tag uses an href with a local filesystem path (/mnt/isilon/microbiome/). This will not work as a clickable link in a web browser since browsers cannot navigate to local filesystem paths on the server. Consider either making this plain text (removing the anchor tag) or implementing a file download endpoint if users need to access these files through the web interface.
| def _parsed_month(date_str: str): | ||
| for fmt in ("%Y-%m-%d", "%Y/%m/%d", "%m/%d/%y", "%m/%d/%Y"): | ||
| try: | ||
| return datetime.strptime(date_str, fmt).strftime("%Y-%m") | ||
| except ValueError: | ||
| continue | ||
| return None | ||
|
|
||
|
|
||
| def _archive_size_for_run(run, warnings): | ||
| archive_path = (ARCHIVE_ROOT / run.data_uri).parent | ||
| run_label = f"CMR{run.run_accession:06d}" | ||
|
|
||
| if not archive_path.exists(): | ||
| warnings.append(f"{run_label}: Archive path {archive_path} does not exist") | ||
| return 0 | ||
|
|
||
| if not archive_path.is_dir(): | ||
| warnings.append(f"{run_label}: Archive path {archive_path} is not a directory") | ||
| return 0 | ||
|
|
||
| total_size = 0 | ||
| for entry in archive_path.rglob("*"): | ||
| try: | ||
| if entry.is_file(): | ||
| total_size += entry.stat().st_size | ||
| except OSError as exc: | ||
| warnings.append(f"{run_label}: Error accessing {entry}: {exc}") | ||
|
|
||
| if total_size == 0: | ||
| warnings.append(f"{run_label}: Archive at {archive_path} has size 0 bytes") | ||
|
|
||
| return total_size | ||
|
|
||
|
|
||
| @app.route("/api/archive_sizes") | ||
| def archive_sizes(): | ||
| runs = db.session.query(Run).all() | ||
| warnings = [] | ||
| totals_by_month = defaultdict(int) | ||
|
|
||
| for run in runs: | ||
| month_label = _parsed_month(run.run_date) | ||
| if not month_label: | ||
| warnings.append( | ||
| f"CMR{run.run_accession:06d}: Unable to parse run_date '{run.run_date}'" | ||
| ) | ||
|
|
||
| archive_size = _archive_size_for_run(run, warnings) | ||
|
|
||
| if month_label: | ||
| totals_by_month[month_label] += archive_size | ||
|
|
||
| by_month = [ | ||
| {"month": month, "size_bytes": totals_by_month[month]} | ||
| for month in sorted(totals_by_month.keys()) | ||
| ] | ||
|
|
||
| return jsonify({"by_month": by_month, "warnings": warnings}) | ||
|
|
||
|
|
||
| @app.route("/archive") | ||
| def archive(): | ||
| return render_template("archive.html") |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new archive-related functionality lacks test coverage. There are no tests for the archive_sizes API endpoint, _parsed_month helper function, or _archive_size_for_run function. Given that the repository has comprehensive test coverage for other modules (as seen in tests/test_registrar.py, tests/test_register.py), this new functionality should have corresponding tests to verify date parsing, filesystem operations, error handling, and API response structure.
| @app.route("/api/archive_sizes") | ||
| def archive_sizes(): | ||
| runs = db.session.query(Run).all() | ||
| warnings = [] | ||
| totals_by_month = defaultdict(int) | ||
|
|
||
| for run in runs: | ||
| month_label = _parsed_month(run.run_date) | ||
| if not month_label: | ||
| warnings.append( | ||
| f"CMR{run.run_accession:06d}: Unable to parse run_date '{run.run_date}'" | ||
| ) | ||
|
|
||
| archive_size = _archive_size_for_run(run, warnings) | ||
|
|
||
| if month_label: | ||
| totals_by_month[month_label] += archive_size | ||
|
|
||
| by_month = [ | ||
| {"month": month, "size_bytes": totals_by_month[month]} | ||
| for month in sorted(totals_by_month.keys()) | ||
| ] | ||
|
|
||
| return jsonify({"by_month": by_month, "warnings": warnings}) |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API endpoint /api/archive_sizes does not handle errors gracefully. If the database query fails or if there are critical filesystem errors, the endpoint will return a 500 error without a proper JSON response. Consider adding try-except blocks around the main logic and returning appropriate error responses with meaningful error messages in JSON format (e.g., {"error": "message"}) and appropriate HTTP status codes.
Summary
Testing
Codex Task