[e2e] Add nightly e2e test for submitting examples to flink standalone cluster #708

Open
matrixsparse wants to merge 1 commit into apache:main from matrixsparse:feature/e2e-test-flink-standalone

Conversation

@matrixsparse
Contributor

Purpose of change

Add automated e2e test for submitting Java/Python quickstart examples to a Flink standalone cluster, replacing the current manual verification process before each release.

Closes #642

Changes

  • e2e-test/test-scripts/test_submit_examples_to_flink.sh: Test script that installs Flink via install.sh, starts a standalone cluster, submits all 6 examples (3 Java + 3 Python), verifies submission success, and cleans up.
  • .github/workflows/nightly-e2e.yml: Nightly GitHub Actions workflow that runs the test daily at UTC 00:00, with manual trigger support.

Key design decisions

  • Uses tools/install.sh --non-interactive (from [tools]Import Wizard for Installation Setup #599) for Flink installation
  • Validates job submission success (not full execution), since examples depend on LLM APIs
  • Each example tested independently; one failure doesn't block others
  • Flink logs archived as artifacts on failure for debugging
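
The "each example tested independently" decision can be sketched roughly as follows. This is a minimal illustration rather than the PR's actual script: `run_example`, `RESULT_STATUS`, and the stand-in commands `true`/`false` are hypothetical, and only the `RESULT_NAMES` name appears in the review discussion. The idea is that a command used as an `if` condition does not trip `set -e`, so a failing submission is recorded instead of aborting the run.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical result store: parallel arrays of example names and outcomes.
RESULT_NAMES=()
RESULT_STATUS=()

# Run one submission command and record PASS/FAIL without exiting on failure.
# A command evaluated as an `if` condition is exempt from `set -e`.
run_example() {
    local name="$1"
    shift
    if "$@"; then
        RESULT_STATUS+=("PASS")
    else
        RESULT_STATUS+=("FAIL")
    fi
    RESULT_NAMES+=("$name")
}

# Stand-ins for real submissions: `false` simulates a failed submit.
run_example "example-a" true
run_example "example-b" false
run_example "example-c" true

# Report every recorded outcome, including those after the failure.
for i in "${!RESULT_NAMES[@]}"; do
    printf '%s: %s\n' "${RESULT_NAMES[$i]}" "${RESULT_STATUS[$i]}"
done
```

In a real script the stand-in commands would be replaced by the actual submission calls; the recording logic stays the same.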

@matrixsparse
Contributor Author

Hi @wenjin272, this PR implements the CI pipeline for #642 as discussed. Could you PTAL when you have time?

@matrixsparse matrixsparse force-pushed the feature/e2e-test-flink-standalone branch from 8189bc8 to 704e45c Compare May 26, 2026 17:23
@github-actions github-actions Bot added doc-label-missing The Bot applies this label either because none or multiple labels were provided. fixVersion/0.3.0 The feature or bug should be implemented/fixed in the 0.3.0 version. priority/major Default priority of the PR or issue. labels May 26, 2026
Collaborator

@weiqingy weiqingy left a comment

Thanks for taking this on — script reads cleanly. A few questions inline.

on:
  schedule:
    - cron: '0 0 * * *'
  workflow_dispatch:
Collaborator

Nightly + manual dispatch means a regression in examples/**, python/flink_agents/examples/**, or tools/install.sh can sit undetected for up to 24h. Would a path-filtered pull_request: trigger for those paths make sense here, with the cron staying as the safety net for transitive-dep changes? The Flink download + full build is non-trivial wall time per PR, so the nightly-only choice is defensible too — curious which trade-off you prefer.

failed=$((failed + 1))
fi
done
printf "\nTotal: %d Passed: %d Failed: %d\n" "$total" "$passed" "$failed"
Collaborator

If install_flink, build_project, stage_dist_jars, or start_cluster dies under set -e, no result is ever recorded, so print_summary walks an empty RESULT_NAMES and prints Total: 0 Passed: 0 Failed: 0 before cleanup propagates the original non-zero exit code. The CI job still fails on the exit code, but a person scanning the log sees a "zero failures" summary right before the red X, which is misleading when triaging a 45-minute nightly run.

One way it could read, if useful:

if (( total == 0 )); then
    log_error "Test setup failed before any example was submitted"
    return
fi

right above the existing if (( failed > 0 )) check.
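
To make the behaviour concrete, here is a self-contained sketch of a summary function with such a guard. The body of `print_summary`, the `RESULT_STATUS` array, and this `log_error` definition are assumptions for illustration; the actual script may be structured differently. As a variation on the placement suggested above, the guard here sits at the top of the function, so the zero-count line is not printed at all when setup fails.

```shell
#!/usr/bin/env bash
set -e

RESULT_NAMES=()
RESULT_STATUS=()

# Assumed logging helper; the real script's version may differ.
log_error() { printf 'ERROR: %s\n' "$*" >&2; }

print_summary() {
    local total=${#RESULT_NAMES[@]}
    local passed=0 failed=0 i

    # Guard: nothing recorded means setup died before any submission.
    if (( total == 0 )); then
        log_error "Test setup failed before any example was submitted"
        return
    fi

    for (( i = 0; i < total; i++ )); do
        if [[ "${RESULT_STATUS[$i]}" == "PASS" ]]; then
            passed=$((passed + 1))
        else
            failed=$((failed + 1))
        fi
    done
    printf "\nTotal: %d Passed: %d Failed: %d\n" "$total" "$passed" "$failed"
}

# With no recorded results, only the setup-failure message is emitted.
summary_output="$(print_summary 2>&1)"
echo "$summary_output"
```

The function itself still returns zero in the setup-failure case; it relies on cleanup to propagate the original non-zero exit code, as described above.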

@xintongsong xintongsong added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing The Bot applies this label either because none or multiple labels were provided. labels May 31, 2026
Contributor

@xintongsong xintongsong left a comment

Thanks for working on this, @matrixsparse. It's a good idea to test with the example jobs nightly.

I'm not sure about validating only job submission success. I think all example jobs can currently run with local LLMs in Ollama, so that shouldn't be an obstacle to verifying full execution. Did I miss anything?
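
If full execution were verified, the check might look roughly like the polling loop below. This is only a sketch under stated assumptions: `wait_for_job` and `get_job_state` are hypothetical names, and the stub stands in for whatever mechanism the script would use to read a job's status (for instance, the JobManager REST API). FINISHED, FAILED, and CANCELED are Flink job states; everything else here is illustrative.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical helper: stores the job's current state in JOB_STATE.
# Stubbed here to report RUNNING twice and then FINISHED; a real version
# would ask the cluster for the state of the given job id.
POLL_COUNT=0
JOB_STATE=""
get_job_state() {
    POLL_COUNT=$((POLL_COUNT + 1))
    if (( POLL_COUNT < 3 )); then
        JOB_STATE="RUNNING"
    else
        JOB_STATE="FINISHED"
    fi
}

# Poll until the job reaches a terminal state or the attempt budget runs out.
# Returns 0 only when the job ends in FINISHED.
wait_for_job() {
    local job_id="$1" max_attempts="$2" interval="$3" attempt
    for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
        get_job_state "$job_id"
        case "$JOB_STATE" in
            FINISHED) return 0 ;;
            FAILED|CANCELED) return 1 ;;
        esac
        sleep "$interval"
    done
    return 1  # budget exhausted without reaching a terminal state
}

# Calling it inside `if` keeps a failed job from aborting the whole run.
if wait_for_job "demo-job-id" 10 0; then
    final_result="PASS"
else
    final_result="FAIL"
fi
echo "$final_result"
```

The attempt budget and interval would need tuning to how long the examples take with a local model.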

log_ok "Staged: $(basename "$flink_jar")"
}

package_examples() {
Contributor

I think build_project should have already built the examples. We should not need to re-build them.

Comment on lines +328 to +336
log_section "Step 6: submit Java examples"
submit_java_example "org.apache.flink.agents.examples.ReActAgentExample"
submit_java_example "org.apache.flink.agents.examples.WorkflowSingleAgentExample"
submit_java_example "org.apache.flink.agents.examples.WorkflowMultipleAgentExample"

log_section "Step 7: submit Python examples"
submit_python_example "$ROOT_DIR/python/flink_agents/examples/quickstart/react_agent_example.py"
submit_python_example "$ROOT_DIR/python/flink_agents/examples/quickstart/workflow_single_agent_example.py"
submit_python_example "$ROOT_DIR/python/flink_agents/examples/quickstart/workflow_multiple_agent_example.py"
Contributor

Is it intended to not cover all the example jobs?

Labels

doc-not-needed Your PR changes do not impact docs fixVersion/0.3.0 The feature or bug should be implemented/fixed in the 0.3.0 version. priority/major Default priority of the PR or issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Tech Debt] Add e2e test for submitting example to flink standalone cluster.

3 participants