CHORE: AGENTS.md added #57
base: main
@@ -0,0 +1,315 @@
# caterpillar

A Go CLI that reads a YAML pipeline configuration and executes an ordered chain (or DAG) of typed data-processing tasks. Tasks receive `*record.Record` values from an upstream buffered channel, transform or route them, and emit results downstream.

The binary can operate in two modes depending on the pipeline:
- **Batch mode** (most pipelines): runs, processes all records, and exits.
- **Server mode**: when the pipeline starts with an `http_server` task, the CLI acts as a long-running HTTP server. Incoming requests are converted to `*record.Record` values and emitted downstream to the rest of the pipeline. The binary does not exit until the server shuts down (or `end_after` is configured).

Apply these instructions to `/Users/prasadlohakpure/Desktop/go_projects/src/github.com/patterninc/caterpillar`. Treat paths and commands below as relative to that location unless explicitly stated otherwise.
- Apply these instructions to `/Users/prasadlohakpure/Desktop/go_projects/src/github.com/patterninc/caterpillar`. Treat paths and commands below as relative to that location unless explicitly stated otherwise.
+ Apply these instructions from the repository root. Treat paths and commands below as relative to the repository root unless explicitly stated otherwise.
Can we instruct it to refer to the Go version in the go.mod file, so this doesn't become obsolete when someone updates the version?
Copilot (AI), Apr 6, 2026
Tech stack lists "Go 1.22+", but the repo is pinned to Go 1.24.7 (go.mod "go 1.24.7" and CI uses 1.24.7). Please update this to match the actual required Go version so new contributors don't hit toolchain mismatches.
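One way to avoid hard-coding the version, as both comments suggest, is to read it from the `go` directive in go.mod. The following is a minimal sketch of that idea, not code from the repository: `goDirective` is a hypothetical helper, and the sample go.mod contents are illustrative (the version is the one quoted in the comment above).

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// goDirective returns the version named by the `go` directive in the
// contents of a go.mod file, or "" if no such directive is present.
// Hypothetical helper for illustration only.
func goDirective(gomod string) string {
	sc := bufio.NewScanner(strings.NewReader(gomod))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// The directive has the form "go <version>"; a "toolchain" line
		// starts with a different word and is skipped.
		if len(fields) >= 2 && fields[0] == "go" {
			return fields[1]
		}
	}
	return ""
}

func main() {
	// Illustrative go.mod contents, not copied from the repository.
	sample := "module github.com/patterninc/caterpillar\n\ngo 1.24.7\n"
	fmt.Println(goDirective(sample)) // 1.24.7
}
```

The `go` command can report the same value: `go mod edit -json` prints the parsed go.mod, including a `Go` field.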
Directory structure is adopted from https://github.com/golang-standards/project-layout
Copilot (AI), Apr 6, 2026
The directory map says AWS tasks are currently parameter_store, but the registered YAML task type key is aws_parameter_store (see internal/pkg/pipeline/tasks.go). Please align the doc with the actual type: value users must put in pipeline YAML.
- | `internal/pkg/pipeline/task/aws/` | AWS-specific tasks (currently `parameter_store`) |
+ | `internal/pkg/pipeline/task/aws/` | AWS-specific tasks (currently YAML type `aws_parameter_store`) |
We need to create these folders.
Copilot (AI), Apr 6, 2026
This line claims t.GetRecord(input) "handles nil and close" (and earlier implies nil-record handling). In code, Base.GetRecord only returns (nil,false) when the channel itself is nil; if a nil record is sent on a non-nil channel it will return (nil,true). Please adjust wording to match the actual behavior (nil channel vs closed channel vs nil record value).
- - `t.GetRecord(input)` safely reads from the channel and handles nil and close. `t.SendRecord(r, output)` evaluates `context:` JQ expressions and forwards the record.
+ - `t.GetRecord(input)` safely reads from the channel; it returns `ok == false` when the input channel is `nil` or closed. If a `nil` record is sent on a non-`nil`, open channel, it returns `r == nil` with `ok == true`. `t.SendRecord(r, output)` evaluates `context:` JQ expressions and forwards the record.
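The three cases in the suggested wording (nil channel, closed channel, nil record on an open channel) can be told apart as in the sketch below. It is an illustration of the described behavior, not the body of `Base.GetRecord`: the `Record` type and the `getRecord` helper are stand-ins introduced here.

```go
package main

import "fmt"

// Record is a stand-in for the project's record.Record type.
type Record struct {
	Data []byte
}

// getRecord is a hypothetical helper that mirrors the semantics described
// in the suggestion above.
func getRecord(input <-chan *Record) (*Record, bool) {
	if input == nil {
		// Receiving from a nil channel blocks forever in Go, so the nil
		// case has to be checked explicitly.
		return nil, false
	}
	// ok is false only once the channel is closed and drained.
	r, ok := <-input
	return r, ok
}

func main() {
	ch := make(chan *Record, 2)
	ch <- &Record{Data: []byte("a")}
	ch <- nil // a nil record sent on an open channel
	close(ch)

	r, ok := getRecord(ch)
	fmt.Println(r != nil, ok) // true true: a normal record
	r, ok = getRecord(ch)
	fmt.Println(r != nil, ok) // false true: nil record, channel still yielding values
	r, ok = getRecord(ch)
	fmt.Println(r != nil, ok) // false false: channel closed and drained
	r, ok = getRecord(nil)
	fmt.Println(r != nil, ok) // false false: nil channel
}
```

A caller that loops until `ok == false` therefore still has to guard against `r == nil` inside the loop body.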
Also add linter and formatting instructions, if we have any.
Add a step 5 for testing and a step 6 for updating documentation.
Refer to this for ideas and incorporate them here: https://github.com/patterninc/caterpillar/pull/47/changes
Copilot (AI), Apr 6, 2026
In the Task Interface section, it says task.Base satisfies every method except Run, but task.Base does implement Run (default pass-through) in internal/pkg/pipeline/task/task.go. Please fix this statement to avoid misleading task authors about what they must implement.
- `task.Base` satisfies every method except `Run`. Only override a method if you need non-default behavior.
+ `task.Base` satisfies every method in this interface, including a default pass-through `Run`. Only override a method when you need behavior different from the default.
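The embedding pattern behind this suggestion can be sketched as follows. Everything here is a simplified assumption for illustration: the real interface has more methods, and the actual `Run` signature in `internal/pkg/pipeline/task/task.go` may differ. `Record`, `Task`, `Base`, `passthrough`, `upper`, and `runOne` are all hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Record is a stand-in for the project's record type.
type Record struct{ Data string }

// Task is a reduced, hypothetical version of the task interface.
type Task interface {
	Run(input <-chan *Record, output chan<- *Record) error
}

// Base supplies a default pass-through Run, as the suggestion describes.
type Base struct{}

func (Base) Run(input <-chan *Record, output chan<- *Record) error {
	for r := range input {
		output <- r
	}
	return nil
}

// passthrough embeds Base and overrides nothing; it still satisfies Task.
type passthrough struct{ Base }

// upper embeds Base and overrides only Run.
type upper struct{ Base }

func (upper) Run(input <-chan *Record, output chan<- *Record) error {
	for r := range input {
		output <- &Record{Data: strings.ToUpper(r.Data)}
	}
	return nil
}

// runOne feeds a single record through t and returns the emitted data.
func runOne(t Task, in string) string {
	input := make(chan *Record, 1)
	output := make(chan *Record, 1)
	input <- &Record{Data: in}
	close(input)
	if err := t.Run(input, output); err != nil {
		return ""
	}
	return (<-output).Data
}

func main() {
	fmt.Println(runOne(passthrough{}, "abc")) // abc
	fmt.Println(runOne(upper{}, "abc"))       // ABC
}
```

Because `passthrough` compiles without defining `Run`, a task author only needs to write the methods whose default behavior is not wanted.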
Copilot (AI), Apr 6, 2026
The "Known Broken Packages" section describes several paths/files as untracked/uncommitted (e.g., internal/pkg/pipeline/task/kinesis/, push_sqs_localstack.go, and file_success_path_test.go), but these do not exist in the repository tree. As written, this is misleading and discourages running go build ./... / go test ./... without a repo-backed reason; please either remove this section or rewrite it to only reference tracked files and current, reproducible build limitations.
### 2. `internal/pkg/pipeline/task/file/`: production code is fine; test build fails
- **Problem:** `file_success_path_test.go` (untracked) references `resolveSuccessObjectPath` and `writerSchemeFromPath`, which do not yet exist in the production package. The test was written ahead of the implementation.
- **Status:** `file.go` and `s3.go` build and run correctly. Only the test build is broken.
- **Action:** Do not run `go test ./internal/pkg/pipeline/task/file/`. The production package is safe to import and extend.
### 3. Root package: scratch files with duplicate `main()` declarations
- **Problem:** `push_sqs_localstack.go` and `push_kafka_message.go` both declare `package main` with a `func main()`, causing a duplicate-symbol error if you compile the root package.
- **Status:** Untracked (not committed). Used locally for manual testing.
- **Action:** Do not delete without asking. Do not attempt `go build .` at the repo root.
Copilot (AI), Apr 6, 2026
The SRE/Operational section says "Caterpillar is a CLI tool, not a long-running server," but earlier the doc describes a supported "Server mode" via the http_server task where the process runs until shutdown. Please reconcile these statements (e.g., clarify it's usually batch/CLI, but can run long-lived when configured as an HTTP server).
- Caterpillar is a CLI tool, not a long-running server. There is no process to restart, no service to scale, and no health endpoint to query.
- **During an incident:**
- 1. Check CI logs first: `gh run list --repo patterninc/caterpillar` then `gh run view <run-id>` to inspect a specific run.
- 2. If CI is passing and runtime behavior is wrong, check the state of the relevant AWS service (S3 bucket access, SQS queue depth, SSM parameter existence) using the AWS Console or CLI.
- 3. There is no caterpillar daemon to restart; re-running the binary with a corrected YAML is the recovery action.
+ Caterpillar is usually operated as a CLI/batch tool, not as a continuously running service. Most pipelines run, process all records, and exit, so there is typically no service to scale and no health endpoint to query. However, when the pipeline starts with an `http_server` task, the same binary runs as a long-lived HTTP server until shutdown (or until `end_after` is reached).
+ **During an incident:**
+ 1. Check CI logs first: `gh run list --repo patterninc/caterpillar` then `gh run view <run-id>` to inspect a specific run.
+ 2. If CI is passing and runtime behavior is wrong, check the state of the relevant AWS service (S3 bucket access, SQS queue depth, SSM parameter existence) using the AWS Console or CLI.
+ 3. For batch pipelines, re-running the binary with a corrected YAML is the recovery action. For `http_server` pipelines, treat it like a long-running process: inspect the running server configuration/logs and restart the process if needed.
Unit tests are a must.
Do we want to call it "Caterpillar Service mode" so that we have a common vocabulary for it?