
Conversation

@hanwen-cluster (Contributor) commented on Oct 7, 2025

Description of changes

Add two new report generators to analyze integration test metrics from DynamoDB:

  • generate_launch_time_report: Queries the ParallelCluster-IntegTest-Metadata table to analyze cluster creation time and compute node launch times. Generates statistics grouped by OS and test name, with time-windowed aggregation to produce consistent results across OS rotation. This report is generated every 5 days to avoid scanning the whole test database every day.

  • generate_performance_report: Queries the ParallelCluster-PerformanceTest-Metadata table to track performance data (OSU, StarCCM) by node count over time. This report is generated every day because that table is separate and much smaller; the run takes about 30 seconds.

Both generators output Excel reports using pandas/openpyxl for easy analysis and visualization. Reports are automatically generated when JSON reports are requested via test_runner.py.
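
For illustration, a minimal sketch of the reporting step might look like the following standalone snippet. The column names, record shapes, and function name here are assumptions for the example, not the exact schema used by these generators:

import pandas as pd

def write_launch_time_report(records, filename):
    # records: list of plain dicts already fetched from DynamoDB, e.g.
    # {"os": "alinux2023", "test_name": "test_slurm", "launch_time_sec": 180}
    df = pd.DataFrame(records)
    # Statistics grouped by OS and test name, as described above.
    stats = (
        df.groupby(["os", "test_name"])["launch_time_sec"]
        .agg(["count", "mean", "min", "max"])
        .reset_index()
    )
    # openpyxl is used as the Excel writer engine.
    stats.to_excel(filename, index=False, engine="openpyxl")
    print(f"Excel file saved: {filename}")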

Tests

We are able to generate Excel files at the end of integration tests.

Checklist

  • Make sure you are pointing to the right branch.
  • If you're creating a patch for a branch other than develop, add the branch name as a prefix in the PR title (e.g. [release-3.6]).
  • Check all commits' messages are clear, describing what and why vs how.
  • Make sure to have added unit tests or integration tests to cover the new/modified code.
  • Check if documentation is impacted by this change.

Please review the guidelines for contributing and Pull Request Instructions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


print(f"Excel file saved: {filename}")

def _get_launch_time(logs, instance_id):

Check notice (Code scanning / CodeQL): Explicit returns mixed with implicit (fall through) returns (Note, test)

Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.
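
The body of _get_launch_time is not visible in this view, but this CodeQL note is typically resolved by making every code path return explicitly. A sketch of that shape, with illustrative field names:

def _get_launch_time(logs, instance_id):
    # Return the launch time recorded for the given instance, if any.
    for log_entry in logs:
        if log_entry.get("instance_id") == instance_id:  # illustrative field name
            return log_entry.get("launch_time")
    return None  # explicit return, so no path falls through implicitly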
@hanwen-cluster changed the title from "[Draft] Retrieve test historical data" to "Add launch time and performance report generation for integration tests" on Jan 7, 2026
@hanwen-cluster marked this pull request as ready for review on January 7, 2026 20:49
@hanwen-cluster requested review from a team as code owners on January 7, 2026 20:49
@hanwen-cluster changed the title from "Add launch time and performance report generation for integration tests" to "[integ-tests-framework] Add launch time and performance report generation for integration tests" on Jan 7, 2026
if last_evaluated_key:
scan_params["ExclusiveStartKey"] = last_evaluated_key

response = dynamodb_client.scan(**scan_params)
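
For context, the snippet above appears to be part of a standard DynamoDB scan-pagination loop. A self-contained sketch of that pattern with boto3 (the table name and function name are illustrative):

import boto3

def scan_all_items(table_name, region="us-east-1"):
    dynamodb_client = boto3.client("dynamodb", region_name=region)
    items = []
    last_evaluated_key = None
    while True:
        scan_params = {"TableName": table_name}
        if last_evaluated_key:
            scan_params["ExclusiveStartKey"] = last_evaluated_key
        response = dynamodb_client.scan(**scan_params)
        items.extend(response.get("Items", []))
        # DynamoDB returns LastEvaluatedKey while more pages remain.
        last_evaluated_key = response.get("LastEvaluatedKey")
        if not last_evaluated_key:
            break
    return items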
Contributor


Scanning the DynamoDB table to get a year's worth of data every day seems like overkill, especially when we are not analyzing it every day.

I would suggest running this step of exporting the data to XLSX maybe once a week or once a month.

Contributor Author


Great suggestion! I amended the code to generate the launch time Excel report every 5 days. I will still generate the performance data Excel report every day, because the source DynamoDB table for launch times is much larger than the DynamoDB table for performance data. The performance data report only takes about 30 seconds, so we can keep it daily.
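
One possible way to gate the launch-time export to a 5-day cadence, purely as an illustration and not necessarily how this PR implements it:

from datetime import datetime, timezone

def should_generate_launch_time_report(interval_days=5):
    # Run only on days whose day-of-year is a multiple of the interval;
    # the performance report would skip this check and run daily.
    return datetime.now(timezone.utc).timetuple().tm_yday % interval_days == 0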

@hanwen-cluster force-pushed the developoct7 branch 3 times, most recently from 8a3c919 to eea7e26 on January 12, 2026 22:14
@hanwen-cluster added the skip-changelog-update label (disables the check that enforces changelog updates in PRs) on Jan 12, 2026
@hanwen-cluster enabled auto-merge (rebase) on January 13, 2026 18:26
@hanwen-cluster merged commit bc081b9 into aws:develop on Jan 13, 2026
42 of 43 checks passed

Labels

skip-changelog-update Disables the check that enforces changelog updates in PRs
