
Add a download_manifests tool to our agent harness mcp server#751

Open
JohnBlackwell wants to merge 2 commits into main from
john/prod-4782-add-a-download_manifests-tool-to-our-agent-harness-mcp

Conversation

@JohnBlackwell
Contributor

This should:

  • Take a cluster handle + service name and fetch the manifests using our service files GraphQL API.

  • Place those files in a separate subdirectory (perhaps using the {handle}-{name} syntax to make it semantically meaningful).

  • Instruct the agent harness that the files are located there and that it can inspect them as it wishes.

This will help agents better understand both Plural's GitOps system and things like external charts, instead of mistakenly guessing via web searches.
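The {handle}-{name} directory scheme described above could be sketched as follows. This is a minimal illustration, not the PR's actual implementation; `sanitizeSegment` and `manifestsDir` are hypothetical helpers that lowercase input and replace unsafe characters so a cluster handle or service name cannot escape the manifests directory:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// sanitizeSegment maps anything outside [a-z0-9-] to '-' so the
// result is always a single safe path segment. (Hypothetical helper.)
func sanitizeSegment(s string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r >= 'a' && r <= 'z', r >= '0' && r <= '9', r == '-':
			return r
		case r >= 'A' && r <= 'Z':
			return r + ('a' - 'A') // lowercase
		default:
			return '-'
		}
	}, s)
}

// manifestsDir builds the per-service extraction directory using
// the {handle}-{name} naming scheme from the PR description.
func manifestsDir(baseDir, clusterHandle, serviceName string) string {
	return filepath.Join(baseDir, "manifests",
		sanitizeSegment(clusterHandle)+"-"+sanitizeSegment(serviceName))
}

func main() {
	fmt.Println(manifestsDir("/workspace", "prod", "My Service"))
	// → /workspace/manifests/prod-my-service
}
```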

Test Plan

Test environment: https://console.your-env.onplural.sh/

Checklist

  • I have added a meaningful title and summary to convey the impact of this PR to a user.
  • I have deployed the agent to a test environment and verified that it works as expected.
    • Agent starts successfully.
    • Service creation works without any issues when using raw manifests and Helm templates.
    • Service creation works when resources contain both CRD and CRD instances.
    • Service templating works correctly.
    • Service errors are reported properly and visible in the UI.
    • Service updates are reflected properly in the cluster.
    • Service resync triggers immediately and works as expected.
    • Sync wave annotations are respected.
    • Sync phase annotations are respected. Phases are executed in the correct order.
    • Sync hook delete policies are respected. Resources are not recreated once they reach the desired state.
    • Service deletion works and cleans up resources properly.
    • Services can be recreated after deletion.
    • Service detachment works and keeps resources unaffected.
    • Services can be recreated after detachment.
    • Service component trees are working as expected.
    • Cluster health statuses are being updated.
    • Agent logs do not contain any errors (after running for at least 30 minutes).
    • There are no visible anomalies in Datadog (after running for at least 30 minutes).
  • I have added tests to cover my changes.
  • If required, I have updated the Plural documentation accordingly.

@JohnBlackwell JohnBlackwell requested a review from a team as a code owner May 6, 2026 19:24
@linear

linear Bot commented May 6, 2026

@github-actions github-actions Bot added the size/L label May 6, 2026
@greptile-apps
Contributor

greptile-apps Bot commented May 6, 2026

Greptile Summary

This PR adds a downloadServiceManifests MCP tool to the agent harness that fetches a Plural service's rendered Kubernetes tarball, extracts it into a manifests/<cluster>-<service>/ subdirectory, and instructs the agent where to find the files. The tool is wired in across Claude, Codex, and Gemini harnesses.

  • New DownloadManifests tool (internal/mcpserver/agent/tool/downloadmanifests.go): fetches tarball via existing manifests.GetReader + Untar, writes to a sanitized per-service directory, and returns a JSON result with the path and inspection instructions.
  • Harness integration: mcp__plural__downloadServiceManifests added to allow-lists in all three agent runtimes (Claude, Codex, Gemini) and all run modes (analyze, write, babysit).
  • Client extension: new GetServiceDeploymentByHandle(cluster, name) method added to the console.Client interface with a corresponding mock.

Confidence Score: 3/5

The core tool implementation has a data race that can cause the wrong manifests to be fetched and extracted under concurrent requests, and the underlying tar extraction does not guard against path traversal; both should be addressed before merging.

The handler stores per-request fields (ClusterHandle, ServiceName) on the shared *DownloadManifests receiver, so two concurrent MCP calls will race, potentially extracting a different service's tarball into the wrong directory. Additionally, Untar does not validate that extracted paths stay within the target directory, a well-known zip-slip pattern now reachable via agent-callable tooling.

See internal/mcpserver/agent/tool/downloadmanifests.go for the concurrent-state issue, and pkg/manifests/untar.go for the missing path-containment check.

Security Review

  • Zip-slip / path traversal (pkg/manifests/untar.go): filepath.Join(dst, header.Name) is used without verifying the resolved path stays within dst. A tarball with ../-prefixed entry names could write files outside the intended extraction directory. This is pre-existing code but is now reachable via the new agent-callable MCP tool, widening exposure if the Console or its object storage is ever compromised.

Important Files Changed

  • internal/mcpserver/agent/tool/downloadmanifests.go: New MCP tool implementation; per-request fields (ClusterHandle, ServiceName) mutated on a shared receiver create a data race under concurrent calls.
  • pkg/manifests/untar.go: Pre-existing tar extraction function used by the new tool; a missing path-containment check allows zip-slip writes outside the target directory.
  • pkg/client/service.go: Adds the GetServiceDeploymentByHandle method; a straightforward wrapper around the generated console client with a nil-check.
  • pkg/agentrun-harness/tool/claude/claude.go: Adds mcp__plural__downloadServiceManifests to all three Claude run-mode allow-lists; consistent across analyze, write, and babysit modes.
  • pkg/test/mocks/Client_mock.go: Auto-generated mock for the new GetServiceDeploymentByHandle interface method; standard testify/mock pattern.

Comments Outside Diff (1)

  1. pkg/manifests/untar.go, line 49 (link)

    P1 security Zip-slip / path traversal in tar extraction

    target is computed as filepath.Join(dst, header.Name) with no subsequent check that the result still lives under dst. A tarball entry with a name like ../../etc/cron.d/evil will resolve to a path outside the intended directory. While the tarball URL is fetched from the Plural Console (a trusted server), this tool is now callable by agents at runtime via MCP, widening the attack surface if the Console or its storage layer is ever compromised.

    Adding a guard after the join — verifying that strings.HasPrefix(filepath.Clean(target), filepath.Clean(dst)+string(os.PathSeparator)) — would eliminate the class of issue. Note: this is pre-existing code but is being newly exposed through the agent-callable MCP tool.
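The suggested containment guard could look roughly like this. It is a sketch of the general zip-slip defense, not the repository's actual `Untar` code; `safeJoin` is a hypothetical helper that a tar-extraction loop would call on each `header.Name` before writing:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// safeJoin resolves a tar entry name under dst and rejects any entry
// whose cleaned path would escape the extraction directory (zip-slip).
// filepath.Join calls Clean, so "../" sequences are resolved before
// the prefix check.
func safeJoin(dst, entry string) (string, error) {
	target := filepath.Join(dst, entry)
	if target != filepath.Clean(dst) &&
		!strings.HasPrefix(target, filepath.Clean(dst)+string(os.PathSeparator)) {
		return "", fmt.Errorf("tar entry %q escapes extraction dir %q", entry, dst)
	}
	return target, nil
}

func main() {
	if p, err := safeJoin("/tmp/out", "charts/app.yaml"); err == nil {
		fmt.Println("ok:", p)
	}
	if _, err := safeJoin("/tmp/out", "../../etc/cron.d/evil"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Calling this once per header keeps the change local to the extraction loop without altering the tool's public behavior.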

Reviews (1): Last reviewed commit: "Merge branch 'main' into john/prod-4782-..."

Comment on lines +47 to +111
func (in *DownloadManifests) handler(_ context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {
	if err := in.fromRequest(request); err != nil {
		return mcp.NewToolResultError(fmt.Sprintf("could not handle download manifests request: %v", err)), nil
	}

	svc, err := in.client.GetServiceDeploymentByHandle(in.ClusterHandle, in.ServiceName)
	if err != nil {
		return mcp.NewToolResultError(fmt.Sprintf("failed to look up service %q on cluster %q: %v", in.ServiceName, in.ClusterHandle, err)), nil
	}
	if svc.Tarball == nil || *svc.Tarball == "" {
		return mcp.NewToolResultError(fmt.Sprintf("service %q on cluster %q does not yet have a rendered tarball available", in.ServiceName, in.ClusterHandle)), nil
	}

	_, token := in.client.GetCredentials()
	if token == "" {
		return mcp.NewToolResultError("Plural Console credentials are not configured for the MCP server"), nil
	}

	baseDir, err := resolveManifestsBaseDir()
	if err != nil {
		return mcp.NewToolResultError(fmt.Sprintf("failed to resolve manifests base directory: %v", err)), nil
	}

	targetDir := filepath.Join(baseDir, manifestsSubdir, sanitizeSegment(in.ClusterHandle)+"-"+sanitizeSegment(in.ServiceName))

	if err := os.RemoveAll(targetDir); err != nil {
		return mcp.NewToolResultError(fmt.Sprintf("failed to clear target directory %q: %v", targetDir, err)), nil
	}
	if err := os.MkdirAll(targetDir, 0o755); err != nil {
		return mcp.NewToolResultError(fmt.Sprintf("failed to create target directory %q: %v", targetDir, err)), nil
	}

	reader, _, err := manifests.GetReader(*svc.Tarball, token)
	if err != nil {
		return mcp.NewToolResultError(fmt.Sprintf("failed to download service tarball: %v", err)), nil
	}
	defer reader.Close()

	if err := manifests.Untar(targetDir, reader); err != nil {
		return mcp.NewToolResultError(fmt.Sprintf("failed to extract service tarball into %q: %v", targetDir, err)), nil
	}

	return mcp.NewToolResultJSON(struct {
		Success      bool   `json:"success"`
		Message      string `json:"message"`
		Cluster      string `json:"cluster"`
		Service      string `json:"service"`
		ServiceID    string `json:"serviceId"`
		Directory    string `json:"directory"`
		Instructions string `json:"instructions"`
	}{
		Success:   true,
		Message:   fmt.Sprintf("downloaded manifests for service %q on cluster %q", in.ServiceName, in.ClusterHandle),
		Cluster:   in.ClusterHandle,
		Service:   in.ServiceName,
		ServiceID: svc.ID,
		Directory: targetDir,
		Instructions: fmt.Sprintf(
			"The rendered Kubernetes manifests for the service have been written to %q. "+
				"Use Read/Glob/Grep against this directory to inspect the actual resources Plural is applying "+
				"(including resources rendered from external Helm charts) instead of guessing via web searches.",
			targetDir,
		),
	})
}

P1 Per-request state stored on shared receiver

fromRequest writes in.ClusterHandle and in.ServiceName directly onto the *DownloadManifests receiver, which is a single instance shared across all concurrent MCP tool calls. Two simultaneous invocations can interleave: Request A sets ClusterHandle = "prod", Request B overwrites it with ClusterHandle = "dev", then Request A proceeds to call GetServiceDeploymentByHandle("dev", ...) — fetching and extracting the wrong service's manifests, and potentially writing them to the wrong targetDir.

The fix is to keep clusterHandle and serviceName as function-local variables in handler instead of struct fields.
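The shape of that fix can be sketched in isolation. This is an illustration of keeping per-call state function-local rather than the PR's actual code; `request` here is a simplified map-based stand-in for the MCP SDK's CallToolRequest type:

```go
package main

import (
	"errors"
	"fmt"
)

// request is a simplified stand-in for an MCP CallToolRequest's
// argument map; the real type comes from the MCP SDK.
type request map[string]string

// parseRequest returns the per-call parameters as plain local values
// instead of mutating fields on a shared tool receiver, so two
// concurrent calls can no longer race on ClusterHandle/ServiceName.
func parseRequest(req request) (clusterHandle, serviceName string, err error) {
	clusterHandle, serviceName = req["clusterHandle"], req["serviceName"]
	if clusterHandle == "" || serviceName == "" {
		return "", "", errors.New("clusterHandle and serviceName are required")
	}
	return clusterHandle, serviceName, nil
}

func main() {
	handle, name, err := parseRequest(request{"clusterHandle": "prod", "serviceName": "api"})
	if err != nil {
		panic(err)
	}
	fmt.Println(handle, name) // → prod api
}
```

Because every invocation gets its own return values, no shared mutable state exists between concurrent handler calls.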
