CORENET-7091: Add enhancement proposal to productize ovn-kubernetes MCP tools#2002

Open
arkadeepsen wants to merge 1 commit into openshift:master from arkadeepsen:ovnk-mcp
@arkadeepsen

Member

No description provided.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 8, 2026
@openshift-ci-robot

openshift-ci-robot commented May 8, 2026

@arkadeepsen: This pull request references CORENET-7091 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


OVN-Kubernetes operators and support engineers often need Northbound and Southbound database views (`ovn-nbctl`, `ovn-sbctl`, traces, logical flows) while investigating connectivity and routing. These tools are already implemented in ovn-kubernetes-mcp, but OpenShift users benefit from consuming them via a **single MCP server** that shares authentication, tool governance, and documentation with the rest of the platform troubleshooting surface.

The primary motivation for landing these tools in upstream kubernetes-mcp-server is **productization via downstream sync into openshift-mcp-server**. By first integrating the OVN toolset upstream, OpenShift can ship and support the same upstream code through the established downstream pipeline.

Instead of productization, can we say that keeping all OpenShift-related MCP servers in a single repository is the main motivation? Or we can keep both.

Member Author

Added a line stating that the ovnk tools can be consumed from the same ocp mcp server.

kms --> Sync
```

**Downstream.** openshift-mcp-server consumes kubernetes-mcp-server changes through its normal fork sync or vendor workflow (exact mechanics follow that repository’s documented process).

Don't we want to add more implementation details, such as exactly which tools will be added and what purpose they serve?

Member Author

I have added that all the tools under the ovn and ovs packages will be added to the OCP MCP server, along with some more details about how to go about the implementation. I didn't want to add specific details of the local PoC I did, as that might not be the only way of implementing the integration.

- Add an `ovn-kubernetes` toolset to kubernetes-mcp-server that reuses the existing OVN MCP tool implementations from ovn-kubernetes-mcp, rather than re-implementing equivalent functionality.
- Enable kubernetes-mcp-server to execute OVN tool commands in-cluster using its existing pod-exec capabilities, with only minor upstream refactoring required in the imported OVN tools.
- Import the OVN and OVS layers from ovn-kubernetes-mcp incrementally (starting with core OVN/OVS troubleshooting tools), expanding coverage as dependencies and eval coverage mature.
- Make the toolset available to OpenShift users through openshift-mcp-server via downstream sync from kubernetes-mcp-server.

Is having an automated sync mechanism between ovn-mcp-server, kubernetes-mcp-server and openshift-mcp-server also a goal of this feature?

Member Author

The current plan is to import the packages from the ovn-kubernetes-mcp repo. Thus, whenever we need the latest changes in kubernetes-mcp-server, the go.mod and go.sum files can be updated to refer to the latest changes from the ovn-kubernetes-mcp repo. Regarding automation: since kubernetes-mcp-server is a separate upstream repo where we are not maintainers, I'm not sure whether adding an automatic sync process as part of this EP would be appropriate. We can figure that part out in the future, if needed. For now, we'll just bump the import as we do for k8s bumps in the different repos.

Given that must-gather is downstream specific, bringing it into the kubernetes-mcp-server would not be a problem, right?

@arkadeepsen arkadeepsen May 13, 2026

Member Author

There's already an existing downstream effort for must-gather. It differs from how it's been implemented in ovn-kubernetes-mcp repo. If we want to integrate the networking bits from the must-gather tool, we'll have to do it in the openshift-mcp-server directly, as kubernetes-mcp-server won't have must-gather related tools.

Okay, we can skip the must-gather tool from ovn-kubernetes-mcp and use the existing one. We can try to add the networking bits directly to kubernetes-mcp-server to imitate the behaviour in ovn-kubernetes-mcp. Can we consider this one of the goals?

Member Author

It will not work on kubernetes-mcp-server as the must-gather implementation is in downstream openshift-mcp-server.

@arkadeepsen arkadeepsen force-pushed the ovnk-mcp branch 2 times, most recently from aaccb39 to bbf81a5 Compare May 12, 2026 15:54

- Add an `ovn-kubernetes` toolset to kubernetes-mcp-server that reuses the existing OVN MCP tool implementations from ovn-kubernetes-mcp, rather than re-implementing equivalent functionality.
- Enable kubernetes-mcp-server to execute OVN/OVS tool commands in-cluster using its existing pod-exec capabilities, with only minor refactoring required in **ovn-kubernetes-mcp** and **kubernetes-mcp-server** to integrate that pod-exec path cleanly.
- Import the full OVN and OVS handler set from ovn-kubernetes-mcp (`pkg/ovn/mcp` and `pkg/ovs/mcp`) into the `ovn-kubernetes` toolset, while other upstream packages stay excluded per Non-Goals.

As per today's discussion, we should mention kernel and sosreport tools which are helpful to explore node's kernel resources.

Member Author

This is already changed.

- Full parity in the first iteration with every tool category shipped by the standalone ovn-kubernetes-mcp binary (for example kernel diagnostics, optional images such as pwru/tcpdump, must-gather, sosreport) where those require separate dependencies, images, or workflows.
- New Kubernetes or OpenShift APIs, CRDs, operators, or cluster-side agents solely for this feature.
- Replacing existing CLI-based troubleshooting; MCP tools are an additional interface.
- Importing ovn-kubernetes-mcp tools under `kernel` and `network-tools` packages in the first iteration, since those tools depend on a node debugging capability (for example a node-debug tool) that is not currently available in kubernetes-mcp-server.

We probably need to remove this.

Member Author

This is already changed.


### Non-Goals

- Full parity in the first iteration with every tool category shipped by the standalone ovn-kubernetes-mcp binary (for example kernel diagnostics, optional images such as pwru/tcpdump, must-gather, sosreport) where those require separate dependencies, images, or workflows.

Since ovn-kubernetes-mcp is an upstream repo, we can't expect all current and future tools to be applicable to an OpenShift environment.
Given that we plan to import the packages from ovn-kubernetes-mcp repo, how should we control access to tools that may not be supported?

Member Author

We're only going to call the handlers of the tools which are supported. The import is for the packages where these handlers are defined. Unsupported handlers should not be used.
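
A minimal Go sketch of the idea in this reply, assuming the host keeps an explicit allowlist and registers only the handlers on it. The `Handler` type, the `supported` map, the helper functions, and the tool names are all hypothetical and are not taken from ovn-kubernetes-mcp or kubernetes-mcp-server.

```go
package main

import (
	"fmt"
	"sort"
)

// Handler is a hypothetical stand-in for a tool handler exported by an
// imported package.
type Handler func(args map[string]string) (string, error)

// supported is a hypothetical host-side allowlist. The tool names are
// illustrative only.
var supported = map[string]bool{
	"ovn-show":       true,
	"ovn-lflow-list": true,
	"ovs-list-br":    true,
}

// registerSupported returns only the imported handlers that appear on the
// allowlist; everything else stays compiled in but is never exposed.
func registerSupported(imported map[string]Handler) map[string]Handler {
	out := make(map[string]Handler)
	for name, h := range imported {
		if supported[name] {
			out[name] = h
		}
	}
	return out
}

// toolNames returns the registered tool names in sorted order.
func toolNames(tools map[string]Handler) []string {
	names := make([]string, 0, len(tools))
	for name := range tools {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	noop := func(map[string]string) (string, error) { return "", nil }
	imported := map[string]Handler{
		"ovn-show":    noop,
		"ovs-list-br": noop,
		"must-gather": noop, // present in the import, not on the allowlist
	}
	fmt.Println(toolNames(registerSupported(imported)))
}
```

With this shape, a new upstream tool stays hidden until someone adds it to the allowlist, so a dependency bump alone would not expose it.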

- https://redhat.atlassian.net/browse/CORENET-7091
see-also:
- https://github.com/ovn-kubernetes/ovn-kubernetes-mcp
- https://github.com/containers/kubernetes-mcp-server

NIT: do we need kubernetes-mcp-server and openshift-mcp-server here?

Member Author

Yes. Since the implementation of the EP will impact all the repos, we need all of them to be included here.

### User Stories

- As a cluster administrator or platform engineer, I want OVN-Kubernetes MCP troubleshooting tools in the same MCP server I already use for Kubernetes resources, so that I do not have to deploy, operate, or manage authentication for a second MCP server dedicated only to OVN-Kubernetes.
- As a support engineer, I want MCP clients to expose the full ovn-kubernetes-mcp troubleshooting surface that kubernetes-mcp-server imports—NB/SB inspection and related `ovn-*` workflows (including `get`, `lflow-list`, `trace` where those tools apply), OVS bridge and OpenFlow helpers, and **`kernel`** / **`network-tools`** host and capture tooling—so that assisted troubleshooting matches how other cluster operations are automated without switching servers or credentials mid-incident.

Suggested change
- As a support engineer, I want MCP clients to expose the full ovn-kubernetes-mcp troubleshooting surface that kubernetes-mcp-server imports—NB/SB inspection and related `ovn-*` workflows (including `get`, `lflow-list`, `trace` where those tools apply), OVS bridge and OpenFlow helpers, and **`kernel`** / **`network-tools`** host and capture tooling—so that assisted troubleshooting matches how other cluster operations are automated without switching servers or credentials mid-incident.
- As a support engineer, I want MCP clients to expose the full ovn-kubernetes-mcp troubleshooting surface that kubernetes-mcp-server imports (NB/SB inspection and related `ovn-*` workflows (including `get`, `lflow-list`, `trace` where those tools apply), OVS bridge and OpenFlow helpers, and **`kernel`** / **`network-tools`** host and capture tooling) so that assisted troubleshooting matches how other cluster operations are automated without switching servers or credentials mid-incident.

Member Author

There's already a bracket between the dashes.


**Importing upstream tools into kubernetes-mcp-server.** The OVN troubleshooting MCP tools already exist in ovn-kubernetes-mcp. The integration approach for kubernetes-mcp-server is to add an `ovn-kubernetes` toolset that reuses those implementations as imported packages and exposes them through kubernetes-mcp-server’s tool registration.

**Command execution strategy.** OVN/OVS tools run commands inside OVN-Kubernetes pods via kubernetes-mcp-server’s pod exec. **`kernel`** and **`network-tools`** handlers use the node-level execution contract wired up in the same integration (for example debug pod or node-targeted exec, as the upstream packages require). Imported libraries should delegate all cluster I/O to kubernetes-mcp-server rather than opening separate Kubernetes client connections. Expect **refactoring in ovn-kubernetes-mcp and kubernetes-mcp-server** so each category uses a clear, single host-supplied execution path per invocation.

I understand that kubernetes-mcp-server is building its own node-debug method to allow host access using the kubectl/oc CLI. However, in ovn-kubernetes-mcp we use a different method for node debug for the kernel and other network tools. I wonder how we can use tools from ovn-kubernetes-mcp while using the utility from kubernetes-mcp-server, considering it's downstream of ovn-kubernetes-mcp.

Member Author

The same way we'll use pod-exec from kubernetes-mcp-server for the OVN/OVS tools. The function definition should be similar, that is, the argument list and the return type should be same in both ovn-kubernetes-mcp and kubernetes-mcp-server, for the node-debug function, which will be called by the kernel and the network-tools handlers.

Shall we mention this explicitly in the document? From what I understand, kubernetes-mcp-server does not currently have any node-debug capability. If that needs to be implemented, it is worth calling out in this section.

Member Author

The effort is ongoing. I had a discussion with Surya, and she mentioned that in the EP we'll assume that the node-debug tool exists. We can expedite the merging of the node-debug tool PR by helping with reviews, so that we can get started with the integration of the kernel/network-tools.


**Command execution strategy.** OVN/OVS tools run commands inside OVN-Kubernetes pods via kubernetes-mcp-server’s pod exec. **`kernel`** and **`network-tools`** handlers use the node-level execution contract wired up in the same integration (for example debug pod or node-targeted exec, as the upstream packages require). Imported libraries should delegate all cluster I/O to kubernetes-mcp-server rather than opening separate Kubernetes client connections. Expect **refactoring in ovn-kubernetes-mcp and kubernetes-mcp-server** so each category uses a clear, single host-supplied execution path per invocation.

**Scope.** All troubleshooting tools under ovn-kubernetes-mcp **`ovn`**, **`ovs`**, **`kernel`**, and **`network-tools`** belong to this effort (NB/SB inspection, logical flows, OVN trace, OVS bridge and OpenFlow helpers, kernel-oriented diagnostics, and **`network-tools`**-style capture where applicable). Other ovn-kubernetes-mcp surfaces—must-gather, sosreport, and similar—remain out of scope unless separately agreed; see Non-Goals.

Suggested change
**Scope.** All troubleshooting tools under ovn-kubernetes-mcp **`ovn`**, **`ovs`**, **`kernel`**, and **`network-tools`** belong to this effort (NB/SB inspection, logical flows, OVN trace, OVS bridge and OpenFlow helpers, kernel-oriented diagnostics, and **`network-tools`**-style capture where applicable). Other ovn-kubernetes-mcp surfaces—must-gather, sosreport, and similar—remain out of scope unless separately agreed; see Non-Goals.
**Scope.** All troubleshooting tools under ovn-kubernetes-mcp **`ovn`**, **`ovs`**, **`kernel`**, and **`network-tools`** belong to this effort (NB/SB inspection, logical flows, OVN trace, OVS bridge and OpenFlow helpers, kernel-oriented diagnostics, and **`network-tools`**-style capture where applicable). Other ovn-kubernetes-mcp surfaces (must-gather, sosreport, and similar) remain out of scope unless separately agreed; see Non-Goals.

Member Author

Done

@taanyas taanyas left a comment

lgtm


## Open Questions

- How to structure mcpchecker suites or task labels so OVN/OVS, **`kernel`**, and **`network-tools`** coverage stays maintainable under kubernetes-mcp-server’s pass-rate gates, given differing cluster prerequisites?

For the mcpchecker structure — since kernel and network-tools require privileged node access which may not be available in all CI environments, would it make sense to have separate suites for OVN/OVS and kernel/network-tools so their pass rates are tracked independently?

Member Author

I am more inclined towards creating a separate suite for each layer of the ovnk MCP server tools. That is, for each of OVN, OVS, kernel, and network-tools, we'll have a separate eval suite. But we can make that call when working on the evals for the tools.

@mattedallo mattedallo left a comment

lgtm

I added some "non blocking" comments.


OVN-Kubernetes operators and support engineers often need Northbound and Southbound database views (`ovn-nbctl`, `ovn-sbctl`, traces, logical flows), host-oriented diagnostics, and packet or kernel-level capture workflows while investigating connectivity and routing. These tools are already implemented in ovn-kubernetes-mcp, but OpenShift users benefit from consuming them via a **single MCP server** that shares authentication, tool governance, and documentation with the rest of the platform troubleshooting surface.

The primary motivation for landing these tools in upstream kubernetes-mcp-server is **productization via downstream sync into openshift-mcp-server**. By first integrating the OVN toolset upstream, OpenShift can ship and support the same upstream code through the established downstream pipeline. This also lets OpenShift customers consume the OVN-Kubernetes tools from the same MCP server as the rest of the platform troubleshooting surface, openshift-mcp-server, after downstream sync.

Nit: maybe we can expand a bit on the cost we save by reusing the existing openshift-mcp-server productization pipeline.
That will strengthen the motivation for integrating versus keeping it separate.

Member Author

Added some more details in the motivation section.


None. This work adds MCP tools only and does not extend the OpenShift or Kubernetes API surface.

### Topology Considerations

Minor note: the topology section seems written with the local binary deployment model in mind. It might be worth a brief mention that the same considerations apply for in-cluster deployments, or a note that the OVN-K tools inherit whatever cluster-access model kubernetes-mcp-server provides.

Member Author

The kubeconfig is mentioned specifically for HyperShift, since it has a management cluster and a hosted (guest) cluster with separate kubeconfigs. The deployment is expected to be in-cluster by default, not local.



**Split of work:** kubernetes-mcp-server decides how each capability is exposed to MCP users (tool names and parameters). ovn-kubernetes-mcp keeps handler logic that validates inputs, builds command lines, and defines execution contracts; kubernetes-mcp-server integrates by calling those libraries and supplying pod exec, node-level debugging, or other supported cluster operations against the target cluster.
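
One way to picture this split is the Go sketch below, assuming the host owns the tool definition while the imported library owns validation and command construction. `ToolDef`, `buildTraceArgv`, the tool name `ovn_trace`, and the parameter names are hypothetical; only the `ovn-trace <datapath> <microflow>` argument order reflects the real CLI.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ToolDef is a hypothetical host-side description of how a capability is
// exposed to MCP users (tool name and parameter names).
type ToolDef struct {
	Name   string
	Params []string
}

// buildTraceArgv is a hypothetical library-side helper. It validates the
// inputs and builds the ovn-trace command line, but does not execute it.
func buildTraceArgv(params map[string]string) ([]string, error) {
	dp, flow := params["datapath"], params["microflow"]
	if dp == "" || flow == "" {
		return nil, errors.New("datapath and microflow are required")
	}
	if strings.HasPrefix(dp, "-") {
		return nil, errors.New("datapath must not look like an option")
	}
	return []string{"ovn-trace", dp, flow}, nil
}

func main() {
	// Host side: decides the public name and parameters of the tool.
	def := ToolDef{Name: "ovn_trace", Params: []string{"datapath", "microflow"}}

	// Library side: turns validated parameters into a command line that the
	// host would then run through its own pod exec.
	argv, err := buildTraceArgv(map[string]string{
		"datapath":  "sw0",
		"microflow": `inport == "sw0-port1"`,
	})
	fmt.Println(def.Name, argv, err)
}
```

Keeping execution out of the library helper is what lets the host supply pod exec or node-level debugging without the library opening its own cluster connection.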

```mermaid

A few things in the diagram tripped me up:

  • The main call relationship (kubernetes-mcp-server's tool handler calling ovn-kubernetes-mcp's imported handler logic) isn't shown, and that's the core of the integration.
  • "delegated_in_cluster_execution" sits inside the ovn-kubernetes-mcp box, but the actual execution will happen in kubernetes-mcp-server's client AFAIU. ovn-kubernetes-mcp defines the contract/interface; kubernetes-mcp-server implements it.
  • The box only shows "OVN_OVS" but kernel and network-tools are also in scope, with a different execution path (node-debug vs pod-exec).
  • The two subgraphs connected by a dotted arrow could be read as two separate services communicating at runtime, when in practice ovn-kubernetes-mcp will be compiled into kubernetes-mcp-server as an imported Go package.

Would something like this be more accurate? Let me know your thoughts

flowchart TB
    subgraph kms [kubernetes-mcp-server process]
      ToolHandler["Tool handler\n(defines MCP tool name, schema)"]
      subgraph ovnkLib ["ovn-kubernetes-mcp (imported Go package)"]
        HandlerLogic["Handler logic\n(validates inputs, builds commands)"]
      end
      subgraph executor [kubernetes-mcp-server K8s client]
        PodExec["PodExec\n(OVN/OVS tools)"]
        NodeDebug["NodeDebug\n(kernel / network-tools)"]
      end
      ToolHandler -->|"calls imported package"| HandlerLogic
      HandlerLogic -->|"calls injected executor"| PodExec
      HandlerLogic -->|"calls injected executor"| NodeDebug
      PodExec -->|"exec in ovnkube pod"| Cluster["Cluster"]
      NodeDebug -->|"privileged debug pod on node"| Cluster
    end

Member Author

I have removed the mermaid diagram as it was getting messier. Hope the latest diagram helps in conveying the integration more clearly.

yes it is, thanks!


None. This work adds MCP tools only and does not extend the OpenShift or Kubernetes API surface.

### Topology Considerations

Minor: I would add a sentence under Topology Considerations to introduce what each section is going to address and to clarify that the OVN-K toolset inherits openshift-mcp-server's existing cluster-access mechanisms.
Something like:

The OVN-Kubernetes toolset uses openshift-mcp-server's existing pod-exec and node-debug capabilities and does not introduce new cluster-access mechanisms or deployment requirements. The considerations below describe topology-specific implications for those underlying capabilities, not for the OVN-K tools themselves.


#### Hypershift / Hosted Control Planes

The MCP server uses whatever cluster the kubeconfig targets. For HyperShift, that is typically the **hosted cluster** API when troubleshooting workload networking; there is no change to management-plane APIs. Operators must select the correct context (management versus guest) the same way they would for `kubectl exec`.

Nit (Optional): I found this subsection a bit hard to follow without prior HyperShift context. A brief mention of where the OVN-K components live in HyperShift (ovnkube-node on the hosted cluster, control-plane on the management cluster) would help the reader understand why the hosted cluster API is the right target. It would also be useful to note that this is inherited from openshift-mcp-server's existing cluster-targeting behavior rather than something new introduced by this feature.

Totally optional comment, just thinking about readers who aren't deeply familiar with HyperShift topology.

One possible rewording (feel free to ignore or adapt) :

The OVN-K toolset inherits openshift-mcp-server's existing cluster-targeting behavior and does not introduce any HyperShift-specific logic.

In HyperShift, ovnkube-node pods -- which contain the per-node OVN NB/SB databases, northd, and ovn-controller -- run on the hosted cluster worker nodes. All OVN-K troubleshooting targets (pod exec into ovnkube-node, node-debug for kernel/network-tools) therefore require the MCP server to reach the hosted cluster API, not the management cluster. The lightweight ovnkube-control-plane on the management cluster is not targeted by any tool in this toolset.

This is the same cluster-selection requirement that applies to any openshift-mcp-server toolset targeting workload-cluster resources. In kubeconfig mode, the operator selects the hosted cluster context; in an in-cluster deployment, the server must be deployed into (or configured to reach) the hosted cluster.


#### Standalone Clusters

Fully relevant: tools execute against pods on the same cluster the API client reaches.

Minor nit: it is unclear what "Fully relevant:" refers to. Maybe "No special considerations:" would be clearer.


@arghosh93 arghosh93 left a comment

One NIT comment. Otherwise, LGTM.


## Motivation

OVN-Kubernetes operators and support engineers often need Northbound and Southbound database views (`ovn-nbctl`, `ovn-sbctl`, traces, logical flows), host-oriented diagnostics, and packet or kernel-level capture workflows while investigating connectivity and routing. These tools are already implemented in ovn-kubernetes-mcp, but OpenShift users benefit from consuming them via a **single MCP server** that shares authentication, tool governance, and documentation with the rest of the platform troubleshooting surface.

NIT: This does not mention OVS OpenFlows. I do agree that we have mentioned this later on in the enhancement, and if you want to ignore it, that should be fine.

Member Author

Updated the motivation.

@mattedallo

lgtm

@openshift-ci

openshift-ci Bot commented May 19, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: arghosh93, mattedallo, taanyas
Once this PR has been reviewed and has the lgtm label, please assign abhat for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@arkadeepsen

Member Author

@tssurya PTAL


## Alternatives (Not Implemented)

- **Add the OVN toolset to kubernetes-mcp-server first, then rely on downstream sync into openshift-mcp-server:** Not chosen for this enhancement because OpenShift is landing the integration directly in openshift-mcp-server to ship on product cadence without gating on upstream kubernetes-mcp-server acceptance, release, and fork sync timing. The import-and-delegate pattern remains the same; a future upstream integration could still reduce long-term duplication if both codebases converge.

I'm not sure what the concern around waiting for upstream merge and sync is, as upstream is currently 100% Red Hat. In general whenever a toolset has no hard requirements on openshift APIs we prefer to land upstream. Is the issue that this toolset requires openshift specifics?

Currently the upstream is synced downstream about 1-2 times per week, and can be done more frequently when needed. cc @mattedallo

Member Author

I have mentioned the other reasons here: openshift/openshift-mcp-server#315 (comment)

Adding the same here for easy readability:

One of the reasons not to add these tools upstream is that some of the ovnk tools need the node-debug functionality, and per existing conversations, adding that tool upstream is not in current plans. Additionally, upstream k8s-mcp-server might want to stay CNI agnostic, whereas for openshift-mcp-server these tools will be very useful because most customers use ovnk as the CNI. We already have a separate upstream repo for the ovnk MCP server (https://github.com/ovn-kubernetes/ovn-kubernetes-mcp), so adding these tools to k8s-mcp-server would mean that two separate upstream projects carry the same tools, which is probably not ideal.

I think it's better if I add them in the EP itself.

Member Author

Added the same in the EP.

@tssurya tssurya Jun 22, 2026

Contributor

@Cali0707 Is upstream kubernetes-mcp-server planning to stay CNI agnostic, like Kubernetes in general is? I guess yes?
Or are there plans to allow Calico, Cilium, ovn-kubernetes, and other CNIs to add their stack troubleshooting? I guess this decision depends on the scope of the kubernetes-mcp-server project.

@tssurya tssurya left a comment

Contributor

nicely written!

some inline comments/questions


## Motivation

OVN-Kubernetes operators and support engineers often need Northbound and Southbound database views (`ovn-nbctl`, `ovn-sbctl`, traces, logical flows), OVS bridge and OpenFlow inspection (`ovs-ofctl` and related helpers), host-oriented diagnostics, and packet or kernel-level capture workflows while investigating connectivity and routing. These tools are already implemented in ovn-kubernetes-mcp, but OpenShift users benefit from consuming them via a **single MCP server** that shares authentication, tool governance, and documentation with the rest of the platform troubleshooting surface.

Contributor

nit: I think all sections call out the benefits of a single MCP server, which is great, but the main thing this enhancement addresses is "missing MCP tools in OCP for troubleshooting networking issues". Let's make that the main intent: it's to provide the existing upstream mcp-server tools to support/operators/end-users to troubleshoot networking issues. I know it's self-implied, but let's call that part out.
i.e. OCP MCP Server doesn't have core networking tools exposed, like ovn, ovs ctl, etc.


### Non-Goals

- Full parity with every tool category shipped by the standalone ovn-kubernetes-mcp binary (for example must-gather, sosreport) where those require separate dependencies, images, or product workflows outside this MCP integration.

Contributor

I think the reason we don't bring in must-gather and sosreport is that they already exist downstream, right? Not because of any dependencies?

Member Author

The initial plan is to add the tools for live-cluster debugging. The next step is to add the tools for offline debugging.

The must-gather tool is added in openshift-mcp-server, but it doesn't have the network debugging functionality that is available in the upstream ovn-kubernetes-mcp repo. The dependency here is on the availability of ovsdb-tool, since the network tools use it to get the relevant information.

So, it'll need additional considerations of how these tools can be integrated into openshift-mcp-server and is not part of the current integration effort.

- Enable openshift-mcp-server to run in-cluster troubleshooting for this toolset: OVN/OVS commands via existing pod-exec into suitable pods, and **`kernel`** / **`network-tools`** flows via whatever node-level debugging or host access path those upstream handlers require, implemented **as part of this same integration** (expect refactoring in **ovn-kubernetes-mcp** and **kubernetes-mcp-server**/**openshift-mcp-server** so execution is delegated cleanly to the host).
- Import the full handler sets from ovn-kubernetes-mcp **`ovn`**, **`ovs`**, **`kernel`**, and **`network-tools`** into openshift-mcp-server’s OVN-Kubernetes tool registration, subject only to exclusions in Non-Goals.
- Ship the toolset to OpenShift users in openshift-mcp-server product builds (versioning and packaging follow that repository’s release process).

Contributor

What happens if OCP is running a cluster where the CNI is not ovn-kubernetes? Do we have a way to turn it off in ocp-mcp-server? Is that part of the goals?
For example, our tools shouldn't be exposed if there isn't even an openshift-ovn-kubernetes namespace.

Member Author

Only the core and config toolsets are enabled by default. Other toolsets have to be explicitly enabled for usage: https://github.com/openshift/openshift-mcp-server/blob/main/docs/openshift/user-guide.md#toolsets-and-functionality
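To make the gating discussed in this thread concrete, here is a minimal Go sketch. It is not the openshift-mcp-server implementation: `ResolveToolsets` and `ShouldRegisterOVN` are hypothetical names, treating an explicit toolset list as a replacement for the defaults is an assumption, and the namespace check is only the reviewer's suggestion above, not something the proposal commits to.

```go
package main

import "fmt"

// defaultToolsets reflects the reply above: only core and config are enabled
// unless the operator asks for more.
var defaultToolsets = []string{"core", "config"}

// ResolveToolsets returns the defaults when nothing is requested; otherwise it
// returns the requested names, de-duplicated, in their original order.
// Assumption: an explicit list replaces the defaults rather than extending them.
func ResolveToolsets(requested []string) []string {
	if len(requested) == 0 {
		return append([]string(nil), defaultToolsets...)
	}
	seen := make(map[string]bool, len(requested))
	var out []string
	for _, name := range requested {
		if name != "" && !seen[name] {
			seen[name] = true
			out = append(out, name)
		}
	}
	return out
}

// ShouldRegisterOVN reports whether the ovn-kubernetes toolset should be
// exposed: it must be explicitly enabled and, in this sketch, the
// openshift-ovn-kubernetes namespace must exist. namespaceExists is a
// hypothetical callback standing in for a real API lookup.
func ShouldRegisterOVN(enabled []string, namespaceExists func(string) bool) bool {
	for _, name := range enabled {
		if name == "ovn-kubernetes" {
			return namespaceExists("openshift-ovn-kubernetes")
		}
	}
	return false
}

func main() {
	hasOVN := func(ns string) bool { return ns == "openshift-ovn-kubernetes" }
	noOVN := func(string) bool { return false }

	// Defaults only: the OVN toolset is never registered.
	fmt.Println(ShouldRegisterOVN(ResolveToolsets(nil), hasOVN))

	// Explicitly enabled: registration then depends on the namespace check.
	enabled := ResolveToolsets([]string{"core", "config", "ovn-kubernetes"})
	fmt.Println(ShouldRegisterOVN(enabled, hasOVN))
	fmt.Println(ShouldRegisterOVN(enabled, noOVN))
}
```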


### Workflow Description

1. An operator configures MCP clients (for example Cursor, other MCP hosts) to use openshift-mcp-server with a kubeconfig that can reach the target cluster and satisfies RBAC for pod read and pod exec where policies allow.

Contributor

Are there any docs around the ocp-mcp-server product usage for end users, since it's already tech preview? I'm curious to see the workflow outlined for OCP users to install the server and use it.
We might benefit from referencing that here.

Member Author

The documentation is available in the github repo: https://github.com/openshift/openshift-mcp-server/blob/main/docs/openshift/user-guide.md#deployment-and-architectural-guardrails

I didn't find any docs on docs.redhat.com. I'll cross-check with the openshift-mcp-server folks.


### Risks and Mitigations

- **RBAC and privilege:** Pod exec and node-level debugging are sensitive. Mitigation: reuse openshift-mcp-server permission models for `pods/exec`, node-scoped operations, and any debug-pod workflows; document required roles; keep tools read-only where possible.

Contributor

Out of curiosity, what's the permission model ocp-mcp-server is using? Any doc links to their design? Speaking of which, if ocp-mcp-server has a design doc, we should include it here.

Member Author

I didn't find any separate design docs. I'll check with the maintainers.


- **Logs:** API server **audit logs** may record `pods/exec` and node- or debug-related API calls according to cluster policy. **openshift-mcp-server** logs should show handler errors, including which execution path failed (pod exec versus node debug). For node-debug failures, correlate MCP server timestamps with events on the target node and any debug pod namespace the integration uses.

- **Disable:** Disable or unregister the `ovn-kubernetes` toolset in MCP deployment configuration (exact mechanism depends on openshift-mcp-server packaging); no cluster-side toggle is defined here. Disabling the whole MCP server removes all toolsets, including OVN-Kubernetes; there is no per-path cluster toggle for pod exec versus node debug in this enhancement.

Contributor

Do we have docs on how ocp-mcp-server does this?

Member Author

Toolsets are disabled by default. They have to be explicitly enabled. Once enabled, they can be disabled by removing the toolset names: https://github.com/openshift/openshift-mcp-server/blob/main/docs/openshift/user-guide.md#toolsets-and-functionality

### Dev Preview -> Tech Preview

- Imported OVN-Kubernetes MCP tools (OVN/OVS, **`kernel`**, **`network-tools`**) usable end to end against representative clusters where RBAC and cluster policy allow the required pod and node-level operations.
- Clear documentation for namespace/pod selection, node or debug-pod selection where applicable, and permissions.

Contributor

Where are the docs for ocp-mcp-server? Are we working closely with the docs team on what we plan to document as supported tools? I think we are missing a documentation section.

Member Author

For now I haven't found anything on docs.redhat.com. The user guide is available in the GitHub repo: https://github.com/openshift/openshift-mcp-server/blob/main/docs/openshift/user-guide.md


- **Disable:** Disable or unregister the `ovn-kubernetes` toolset in MCP deployment configuration (exact mechanism depends on openshift-mcp-server packaging); no cluster-side toggle is defined here. Disabling the whole MCP server removes all toolsets, including OVN-Kubernetes; there is no per-path cluster toggle for pod exec versus node debug in this enhancement.

## Infrastructure Needed [optional]

Contributor

I think we discussed this at some point: we should also consider adding open-source models for CI. But we need to check with the ocp-mcp-server team d/s on how they do this and whether we can use or reuse that for u/s as well.

Is offline debugging using must-gather/sosreport not in scope?

Member Author

The evals are configured to run using Claude, Gemini, and OpenAI. For each of them, the corresponding API token has to be provided.

- **Logs:** API server **audit logs** may record `pods/exec` and node- or debug-related API calls according to cluster policy. **openshift-mcp-server** logs should show handler errors, including which execution path failed (pod exec versus node debug). For node-debug failures, correlate MCP server timestamps with events on the target node and any debug pod namespace the integration uses.

- **Disable:** Disable or unregister the `ovn-kubernetes` toolset in MCP deployment configuration (exact mechanism depends on openshift-mcp-server packaging); no cluster-side toggle is defined here. Disabling the whole MCP server removes all toolsets, including OVN-Kubernetes; there is no per-path cluster toggle for pod exec versus node debug in this enhancement.

Contributor

Do we need some kind of perf/scale section? Even if it only adds some open-ended questions, it's still better to think about it than not have it. For example: number of tools, tool callback time evals (depends on where the model is running, I guess). I'm curious to see whether the ocp-mcp-server folks had any thoughts around this.

Member Author

Currently, the requirement for adding any tool is to have evals added for it, with the evals passing the minimum criteria. I am not aware of any perf/scale requirement for now. I'll check with the maintainers regarding this.


- **Unit tests:** Ensure imported tool implementations can be exercised without requiring a live cluster (for example by substituting test doubles for in-cluster command execution and validating command construction and output handling), including **`kernel`** and **`network-tools`** handlers where feasible.
- **Integration:** Validate the `ovn-kubernetes` toolset end to end in openshift-mcp-server: pod-exec paths for OVN/OVS, and node-level paths for **`kernel`** / **`network-tools`** as implemented for this integration.
- **Manual:** Run MCP tool calls against a cluster with OVN-Kubernetes installed, verifying OVN/OVS output for a known `ovnkube-node` pod and representative **`kernel`** / **`network-tools`** scenarios supported by the cluster.

Contributor

What are the testing scenarios we are targeting? Are we planning to induce a failure and then check that the tools are executed in the right order, following a top-down flow, etc.?

Member Author

Each eval provides a prompt, and the response to that prompt needs to pass a verification step. For now, most of the existing evals use simple scenarios so that the corresponding tools are called and the response is verified.

@openshift-ci

openshift-ci Bot commented Jun 24, 2026

Contributor

@arkadeepsen: all tests passed!

