Add comprehensive integration test suite and CI workflow for gh-models CLI #90

Copilot · 2025-08-03T05:20:04Z

This PR adds an integration test suite that runs the compiled gh-models CLI binary as a real process rather than through mocks, and that can exercise live LLM endpoints when authentication is available. It also adds a CI workflow that runs the suite automatically.

Problem

The existing test suite included only unit tests with mocked clients, so it did not validate:

The actual compiled binary behavior
Real process execution and exit codes
File modification scenarios (e.g., prompt file augmentation)
Live endpoint integration when authenticated
End-to-end command workflows

Solution

Created a new integration test suite in integration/, organized as follows:

🏗️ Architecture

Uses the //go:build integration build tag to exclude the suite from regular test runs
Executes the actual compiled ./gh-models binary via os/exec
Provides a dedicated make integration-test command that depends on the build target
Keeps the integration testing documentation in CONTRIBUTING.md for better discoverability

🧪 Test Coverage

Basic Commands (integration_test.go):

Help functionality and usage information
Error handling for unauthenticated scenarios
Command argument validation
Exit code verification

File Operations (file_modification_test.go):

Prompt file parsing and validation
File modification tracking for generate command
Template variable handling
Output format validation (JSON/default)
File system interaction (permissions, working directory independence)

Authentication Scenarios (authenticated_test.go):

Graceful handling of unauthenticated scenarios
Ready for live endpoint testing when gh auth login is configured
Error message validation and user guidance

🚀 CI Integration

GitHub Actions Workflow (.github/workflows/integration.yml):

Runs automatically on pushes to main branch
Includes required models: read permission for GitHub Models API
Uses make integration-test to build binary and run integration tests
Follows existing workflow patterns with proper concurrency handling

🎯 Key Features

# Integration tests are excluded from regular runs
make test        # Runs unit tests only (no integration package)

# Dedicated command for integration testing with build dependency
make integration-test   # Automatically builds binary + runs integration tests

# Exit code validation
./gh-models --help      # Exit code 0 (success)
./gh-models list        # Exit code 1 (auth required)

File Modification Validation:

// Track file changes for generate command
originalSize := getFileSize(promptFile)
runCommand("generate", promptFile)
newSize := getFileSize(promptFile)
// Validate file was augmented with test data

Live Endpoint Ready:

Tests detect authentication status automatically
Skip live tests when unauthenticated (CI-friendly)
Full validation when gh auth login is configured
Monitor actual API responses and file modifications
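
One way the skip-when-unauthenticated behavior could be sketched is shown below. How the PR's tests actually detect authentication is not stated, so this is an assumption: the sketch only checks the GH_TOKEN and GITHUB_TOKEN environment variables that the gh CLI honors, and the names skipper, hasToken, and requireAuth are hypothetical. The skipper interface is satisfied by *testing.T, which keeps the logic testable without importing the testing package.

```go
package main

import (
	"fmt"
	"os"
)

// skipper is the subset of *testing.T used here, so a real test can pass t.
type skipper interface {
	Helper()
	Skip(args ...any)
}

// hasToken reports whether a GitHub token is present in the environment.
// Assumption: a token variable is treated as "authenticated"; the real suite
// may use a different check.
func hasToken(getenv func(string) string) bool {
	for _, name := range []string{"GH_TOKEN", "GITHUB_TOKEN"} {
		if getenv(name) != "" {
			return true
		}
	}
	return false
}

// requireAuth skips the calling test when no token is available. With a real
// *testing.T, Skip stops the test; the boolean is for callers using a fake.
func requireAuth(t skipper, getenv func(string) string) bool {
	t.Helper()
	if !hasToken(getenv) {
		t.Skip("no GitHub token in environment; skipping live endpoint test")
		return false
	}
	return true
}

// recordingSkipper is a fake used to demonstrate the behavior outside go test.
type recordingSkipper struct{ skipped bool }

func (r *recordingSkipper) Helper()          {}
func (r *recordingSkipper) Skip(args ...any) { r.skipped = true }

func main() {
	fake := &recordingSkipper{}
	ok := requireAuth(fake, os.Getenv)
	fmt.Println("run live tests:", ok, "skipped:", fake.skipped)
}
```

In a test function this would be called as requireAuth(t, os.Getenv) before any request to a live endpoint.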

🔒 Safety & CI Integration

Build tag isolation prevents accidental execution
Graceful skipping when authentication unavailable
No impact on existing development workflows
Comprehensive error handling and logging
Automated CI execution on main branch pushes

The integration test suite provides confidence that the CLI binary works correctly in real-world scenarios while maintaining the fast feedback loop of existing unit tests, with automated validation on every main branch update.

- Implement tests for Float32Ptr to validate pointer creation for float32 values.
- Create tests for ExtractJSON to ensure correct extraction of JSON from various input formats.
- Add tests for cleanJavaScriptStringConcat to verify string concatenation handling in JavaScript context.
- Introduce tests for StringSliceContains to check for string presence in slices.
- Implement tests for MergeStringMaps to validate merging behavior of multiple string maps, including overwrites and handling of nil/empty maps.

… tests in export_test.go
- Changed modelParams from pointer to value in toGitHubModelsPrompt function for better clarity and safety.
- Updated the assignment of ModelParameters to use the value directly instead of dereferencing a pointer.
- Introduced a new test suite in export_test.go to cover various scenarios for GitHub models evaluation generation, including edge cases and expected outputs.
- Ensured that the tests validate the correct creation of files and their contents based on the provided context and options.

- Added NewPromptPex function to create a new PromptPex instance.
- Implemented Run method to execute the PromptPex pipeline with context management.
- Created context from prompt files or loaded existing context from JSON.
- Developed pipeline steps including intent generation, input specification, output rules, and tests.
- Added functionality for generating groundtruth outputs and evaluating test results.
- Implemented test expansion and rating features for improved test coverage.
- Introduced error handling and logging throughout the pipeline execution.

- Implemented TestCreateContext to validate various prompt YAML configurations and their expected context outputs.
- Added TestCreateContextRunIDUniqueness to ensure unique RunIDs are generated for multiple context creations.
- Created TestCreateContextWithNonExistentFile to handle cases where the prompt file does not exist.
- Developed TestCreateContextPromptValidation to check for valid and invalid prompt formats.
- Introduced TestGithubModelsEvalsGenerate to test the generation of GitHub Models eval files with various scenarios.
- Added TestToGitHubModelsPrompt to validate the conversion of prompts to GitHub Models format.
- Implemented TestExtractTemplateVariables and TestExtractVariablesFromText to ensure correct extraction of template variables.
- Created TestGetMapKeys and TestGetTestScenario to validate utility functions related to maps and test scenarios.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

pelikhan

@copilot rebase to main

pelikhan · 2025-08-03T05:43:12Z

Makefile

 	rm -rf ./gh-models
 .PHONY: clean
+
+integration-test:


add dependency on build step

Added build as a proper dependency to the integration-test target. The target now correctly depends on build rather than calling it explicitly. (0d67cc7)

pelikhan · 2025-08-03T05:43:38Z

integration/README.md

@@ -0,0 +1,42 @@
+# Integration Tests


Move this section to contributing.md

Moved the integration testing documentation from integration/README.md to CONTRIBUTING.md. The integration/README.md now contains just a brief description with a reference to the full documentation. (0d67cc7)

… docs Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2025-08-03T05:51:44Z

@copilot rebase to main

I'm unable to perform a rebase operation due to system limitations that prevent force push operations. The changes in this PR can be rebased manually if needed, or the changes can be merged as-is to preserve the commit history.

pelikhan

@copilot Add GitHub action workflow step that builds and runs the integration tests only. Runs on pushes to main. Make sure to set the models: read permission

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2025-08-03T06:02:12Z

@copilot Add GitHub action workflow step that builds and runs the integration tests only. Runs on pushes to main. Make sure to set the models: read permission

Added GitHub Actions integration test workflow in .github/workflows/integration.yml. The workflow runs on pushes to main, includes the required models: read permission, and uses make integration-test to build and run the integration tests. (969b0a2)

pelikhan · 2025-08-03T06:17:07Z

@copilot rebase branch

Copilot · 2025-08-03T06:22:51Z

@copilot rebase branch

I'm unable to perform a rebase operation due to system limitations that prevent force push operations. The branch contains the integration test suite commits (59e0381, 0d67cc7, 969b0a2) on top of the current main branch. If you'd like to rebase, this would need to be done manually using standard git rebase commands.

pelikhan added 30 commits July 21, 2025 13:41

plumbing for commands

871788d

bringing promptpex

9e82844

Add comprehensive Copilot instructions for AI coding agents

d8fcb9d

Enhance ApplyEffortConfiguration to handle nil options gracefully

3ea7a6e

Refactor PromptPexContext to use ChatMessage from azuremodels and rem…

ef7d089

…ove unused ChatMessage type

Implement GitHub Models evaluation file generation and enhance Prompt…

96f9183

…Pex context conversion

Fix dereferencing of Frontmatter fields in GitHub Models prompt gener…

37b761c

…ation

clea content

ee90766

refactor: Remove obsolete export_test_new.go file

1c936c0

refactor: Remove obsolete output options and related tests from Promp…

292917a

…tPex configuration

feat: Add GenerateSummary function and corresponding tests for prompt…

e9c6668

… summary generation

feat: Implement runPipeline function and refactor GenerateSummary for…

5c5a167

… improved summary reporting

refactor: Rename parseTestsFromLLMResponse to ParseTestsFromLLMRespon…

b4b662f

…se and restore its implementation; remove obsolete promptpex.go and summary_test.go files

test: Add comprehensive tests for ParseTestsFromLLMResponse function …

393020f

…covering various scenarios and error handling

feat: Implement generate command with comprehensive options and add s…

6458590

…entiment analysis test prompt

refactor: Consolidate command-line flag definitions into AddCommandLi…

cdc38f1

…neFlags function and update flag parsing to use consistent naming

test: Add comprehensive tests for NewGenerateCommand and flag parsing…

bbdd748

… in generate_test.go

test: Enhance TestGenerateCommandWithValidPromptFile with detailed mo…

7dc3d7d

…ck responses for sentiment analysis stages

move test to common fodler

e812aec

feat: Update generate command description to include evaluations for …

341442f

…prompts

fix: Clarify command description to specify the use of PromptPex meth…

da294e2

…odology for test generation

fix: Update build instructions to include 'make build' command

50b853f

refactor: Rename runPipeline to RunTestGenerationPipeline and add Ren…

5018380

…derMessagesToString for message formatting

Merge remote-tracking branch 'origin/main' into pelikhan/promptpex

9391f0d

refactor: Update test prompt from sentiment analysis to joke analysis

f3f320b

fix: Disable usage help for pipeline failures in generate command

7ab63bc

pelikhan and others added 3 commits August 2, 2025 16:22

Update default tests per rule to use GetDefaultOptions function

b662738

Refactor generateTests to use TestsPerRule from GetDefaultOptions

caa8aa5

Initial plan

24d9159

Copilot AI assigned Copilot and pelikhan Aug 3, 2025

Copilot started work on behalf of pelikhan August 3, 2025 05:20

Add comprehensive integration test suite for gh-models CLI

59e0381

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI requested a review from pelikhan August 3, 2025 05:38

Copilot finished work on behalf of pelikhan August 3, 2025 05:38

pelikhan reviewed Aug 3, 2025

Copilot started work on behalf of pelikhan August 3, 2025 05:46

Address PR review comments: add build dependency and move integration…

0d67cc7

… docs Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI requested a review from pelikhan August 3, 2025 05:54

Copilot finished work on behalf of pelikhan August 3, 2025 05:54

pelikhan reviewed Aug 3, 2025

Copilot started work on behalf of pelikhan August 3, 2025 05:57

Add GitHub Actions integration test workflow

969b0a2

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI changed the title from "Add comprehensive integration test suite for gh-models CLI" to "Add comprehensive integration test suite and CI workflow for gh-models CLI" Aug 3, 2025

Copilot AI requested a review from pelikhan August 3, 2025 06:03

Copilot finished work on behalf of pelikhan August 3, 2025 06:03

pelikhan changed the base branch from pelikhan/promptpex to main August 3, 2025 06:15

Copilot started work on behalf of pelikhan August 3, 2025 06:17

pelikhan closed this Aug 3, 2025

pelikhan deleted the copilot/fix-c18cc30c-9836-4cdc-ac69-f4be9d6cc6c2 branch August 3, 2025 06:19

Copilot finished work on behalf of pelikhan August 3, 2025 06:23
