Add execution engine run scripts: Databricks, Spark Standalone, Dataproc by Copilot · Pull Request #1 · arshewarTarun/execution_scripts

Copilot · 2026-04-20T06:58:27Z

Adds shell scripts for submitting and managing Spark workloads across three execution engines, starting from an empty repo.

Structure

databricks/
  run_job.sh          # Trigger existing job by ID or name; poll until done
  submit_notebook.sh  # One-time notebook run on existing or ephemeral cluster
spark-standalone/
  start_cluster.sh    # Start master + N local workers (configurable cores/mem)
  stop_cluster.sh     # Gracefully stop workers then master
  submit_job.sh       # spark-submit wrapper (JAR + Python, --conf, deploy mode)
dataproc/
  create_cluster.sh   # Create cluster (machine types, auto-delete, labels, properties)
  delete_cluster.sh   # Delete cluster with optional confirmation
  submit_job.sh       # Submit spark/pyspark/hadoop/hive/pig/presto jobs
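
The "poll until done" behaviour noted for run_job.sh could look roughly like the sketch below. This is not the PR's code: get_run_state is a hypothetical stand-in that simulates a run status lookup so the loop runs without network access, and the terminal state names are assumed to follow the Jobs API 2.1 life_cycle_state values.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a run status lookup: reports RUNNING twice,
# then TERMINATED. A real script would query the API here instead.
_calls=0
RUN_STATE=""
get_run_state() {
  _calls=$(( _calls + 1 ))
  if (( _calls < 3 )); then
    RUN_STATE="RUNNING"
  else
    RUN_STATE="TERMINATED"
  fi
}

# Poll until the run reaches a terminal state or max_polls is exhausted.
# Usage: wait_for_run [interval_seconds] [max_polls]
wait_for_run() {
  local interval="${1:-1}" max_polls="${2:-10}" i
  for (( i = 1; i <= max_polls; i++ )); do
    get_run_state
    case "$RUN_STATE" in
      TERMINATED|SKIPPED|INTERNAL_ERROR)
        echo "run finished with state: $RUN_STATE"
        return 0
        ;;
    esac
    sleep "$interval"
  done
  echo "ERROR: timed out after ${max_polls} polls" >&2
  return 1
}

wait_for_run 0 10
```

Bounding the loop with max_polls keeps a stuck run from blocking the caller forever; the interval and limit used here are illustrative values only.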

Key design points

Databricks: uses Jobs API 2.1 (run-now and runs/submit); run_job.sh resolves a job name to its ID through a list query with the name URL-encoded
Dataproc: all Spark/job properties are collected into a single --properties key=val,key=val argument, because gcloud silently keeps only the last --properties flag when several are passed
All scripts: set -euo pipefail, named flags with defaults, --help, dependency checks (require_cmd), and clear error messages when required inputs are missing
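
The three design points above might be sketched as small helpers like these. Only the name require_cmd comes from the PR; its body, and the urlencode and join_properties helpers, are assumptions made for illustration, and the query path in the example is likewise only illustrative. The urlencode sketch handles ASCII input only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Dependency check: fail with a clear message if a command is missing.
# (Name taken from the PR; this body is an assumed implementation.)
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "ERROR: required command not found: $1" >&2
    return 1
  }
}

# Hypothetical helper: percent-encode an ASCII string for a query parameter.
urlencode() {
  local LC_ALL=C s="$1" out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;                       # unreserved: keep
      *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;     # everything else: %XX
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: join key=val pairs with commas so they fit in
# one --properties argument.
join_properties() {
  local IFS=,
  echo "$*"
}

require_cmd bash
echo "jobs/list?name=$(urlencode "nightly etl")"
echo "--properties $(join_properties "spark.executor.memory=4g" "spark.executor.cores=2")"
```

Collecting the pairs first and emitting the flag once avoids depending on how the CLI treats a repeated flag.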

Usage

# Databricks: trigger job by name and wait
DATABRICKS_HOST=https://adb-xxx.azuredatabricks.net DATABRICKS_TOKEN=dapi... \
./databricks/run_job.sh --job-name nightly_etl --params '{"date":"2024-01-01"}' --wait

# Spark Standalone: start cluster, submit, stop
export SPARK_HOME=/opt/spark  # exported so submit and stop see it too
./spark-standalone/start_cluster.sh --workers 2 --worker-cores 4 --worker-memory 8g
./spark-standalone/submit_job.sh --app /path/to/etl.py
./spark-standalone/stop_cluster.sh

# Dataproc: create, submit, teardown
export GCP_PROJECT=my-project GCP_REGION=us-central1  # exported so all three scripts use the same project and region
./dataproc/create_cluster.sh --cluster-name etl --num-workers 4 --max-idle 30m
./dataproc/submit_job.sh --cluster-name etl --job-type pyspark --app gs://bucket/etl.py -- --date 2024-01-01
./dataproc/delete_cluster.sh --cluster-name etl --yes
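
The flag style used above (named flags with defaults, then application arguments after a bare --) could be parsed along these lines. This is a sketch, not the PR's implementation: parse_args, the variable names, and the pyspark default are assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical parser for a submit_job.sh-style command line.
parse_args() {
  CLUSTER_NAME=""
  JOB_TYPE="pyspark"   # assumed default, for illustration only
  APP=""
  APP_ARGS=()
  while (( $# > 0 )); do
    case "$1" in
      --cluster-name) CLUSTER_NAME="${2:?--cluster-name needs a value}"; shift 2 ;;
      --job-type)     JOB_TYPE="${2:?--job-type needs a value}"; shift 2 ;;
      --app)          APP="${2:?--app needs a value}"; shift 2 ;;
      --)             shift; APP_ARGS=("$@"); break ;;   # rest goes to the app
      *)              echo "ERROR: unknown flag: $1" >&2; return 1 ;;
    esac
  done
  # Required inputs get a clear error instead of a late failure.
  [[ -n "$CLUSTER_NAME" ]] || { echo "ERROR: --cluster-name is required" >&2; return 1; }
  [[ -n "$APP" ]] || { echo "ERROR: --app is required" >&2; return 1; }
}

parse_args --cluster-name etl --job-type pyspark --app gs://bucket/etl.py -- --date 2024-01-01
echo "cluster=${CLUSTER_NAME} type=${JOB_TYPE} app=${APP} args=${APP_ARGS[*]}"
```

Keeping everything after -- in an array preserves each application argument as a separate word when it is later forwarded with "${APP_ARGS[@]}".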

Original prompt

I want to add different execution engines run scripts like databricks, Spark-Standalone, Dataproc

Copilot

Copilot wasn't able to review any files in this pull request.

Initial plan

186bbd5

Copilot AI assigned Copilot and arshewarTarun Apr 20, 2026

Copilot started work on behalf of arshewarTarun April 20, 2026 06:58

arshewarTarun marked this pull request as ready for review April 20, 2026 07:03

Copilot AI review requested due to automatic review settings April 20, 2026 07:03

Copilot AI reviewed Apr 20, 2026

Copilot AI and others added 3 commits April 20, 2026 07:04

Add execution engine scripts for Databricks, Spark-Standalone, and Dataproc

195ef92

Agent-Logs-Url: https://github.com/arshewarTarun/execution_scripts/sessions/54abfca2-741f-4e14-8812-0850828d3b58
Co-authored-by: arshewarTarun <180503309+arshewarTarun@users.noreply.github.com>

Fix URL encode helper and consolidate Dataproc properties into single flag

334ca9d

Agent-Logs-Url: https://github.com/arshewarTarun/execution_scripts/sessions/54abfca2-741f-4e14-8812-0850828d3b58
Co-authored-by: arshewarTarun <180503309+arshewarTarun@users.noreply.github.com>

Add require_cmd python3 check in run_job.sh validation section

3478690

Agent-Logs-Url: https://github.com/arshewarTarun/execution_scripts/sessions/54abfca2-741f-4e14-8812-0850828d3b58
Co-authored-by: arshewarTarun <180503309+arshewarTarun@users.noreply.github.com>

Copilot AI changed the title from "[WIP] Add different execution engines for running scripts" to "Add execution engine run scripts: Databricks, Spark Standalone, Dataproc" Apr 20, 2026

Copilot finished work on behalf of arshewarTarun April 20, 2026 07:07

Copilot AI requested a review from arshewarTarun April 20, 2026 07:07

Copilot wants to merge 4 commits into main from copilot/add-execution-engines-scripts