CORENET-7091: Add enhancement proposal to productize ovn-kubernetes MCP tools #2002
arkadeepsen wants to merge 1 commit into
@arkadeepsen: This pull request references CORENET-7091, which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
| OVN-Kubernetes operators and support engineers often need Northbound and Southbound database views (`ovn-nbctl`, `ovn-sbctl`, traces, logical flows) while investigating connectivity and routing. These tools are already implemented in ovn-kubernetes-mcp, but OpenShift users benefit from consuming them via a **single MCP server** that shares authentication, tool governance, and documentation with the rest of the platform troubleshooting surface. | ||
| The primary motivation for landing these tools in upstream kubernetes-mcp-server is **productization via downstream sync into openshift-mcp-server**. By first integrating the OVN toolset upstream, OpenShift can ship and support the same upstream code through the established downstream pipeline. |
Instead of productization, can we say that keeping all OpenShift-related MCP servers in a single repository is the main motivation? Or we can keep both.
Added a line stating that the ovnk tools can be consumed from the same ocp mcp server.
| kms --> Sync | ||
| ``` | ||
| **Downstream.** openshift-mcp-server consumes kubernetes-mcp-server changes through its normal fork sync or vendor workflow (exact mechanics follow that repository’s documented process). |
Don't we want to add more implementation details, such as exactly which tools will be added and what purpose they serve?
I have added that all the tools under ovn and ovs packages will be added to the OCP MCP server. I have added some more details about how to go about the implementation. I didn't want to add specific details of the local PoC I did, as that might not be the only way of implementing the integration.
| - Add an `ovn-kubernetes` toolset to kubernetes-mcp-server that reuses the existing OVN MCP tool implementations from ovn-kubernetes-mcp, rather than re-implementing equivalent functionality. | ||
| - Enable kubernetes-mcp-server to execute OVN tool commands in-cluster using its existing pod-exec capabilities, with only minor upstream refactoring required in the imported OVN tools. | ||
| - Import the OVN and OVS layers from ovn-kubernetes-mcp incrementally (starting with core OVN/OVS troubleshooting tools), expanding coverage as dependencies and eval coverage mature. | ||
| - Make the toolset available to OpenShift users through openshift-mcp-server via downstream sync from kubernetes-mcp-server. |
Is having an automated sync mechanism between ovn-mcp-server, kubernetes-mcp-server and openshift-mcp-server also a goal of this feature?
The current plan is to import the packages from the ovn-kubernetes-mcp repo. Thus, whenever we need the latest changes in kubernetes-mcp-server, the go.mod and go.sum files can be updated to refer to the latest changes from the ovn-kubernetes-mcp repo. Regarding automation: since kubernetes-mcp-server is a separate upstream repo where we are not maintainers, I am not sure whether adding an automatic sync process as part of this EP would be appropriate. We can figure that part out, if needed, in the future. For now, we'll just bump the import as we do for the k8s bump in the different repos.
Given that must-gather is downstream specific, bringing it into the kubernetes-mcp-server would not be a problem, right?
There's already an existing downstream effort for must-gather. It differs from how it's been implemented in ovn-kubernetes-mcp repo. If we want to integrate the networking bits from the must-gather tool, we'll have to do it in the openshift-mcp-server directly, as kubernetes-mcp-server won't have must-gather related tools.
okay, we can skip using must-gather tool from ovn-kubernetes-mcp and use the existing one. We can try to directly add networking bits to kubernetes-mcp-server to imitate behaviour in ovn-kubernetes-mcp. Can we consider this one of the goals?
It will not work on kubernetes-mcp-server as the must-gather implementation is in downstream openshift-mcp-server.
aaccb39 to bbf81a5
| - Add an `ovn-kubernetes` toolset to kubernetes-mcp-server that reuses the existing OVN MCP tool implementations from ovn-kubernetes-mcp, rather than re-implementing equivalent functionality. | ||
| - Enable kubernetes-mcp-server to execute OVN/OVS tool commands in-cluster using its existing pod-exec capabilities, with only minor refactoring required in **ovn-kubernetes-mcp** and **kubernetes-mcp-server** to integrate that pod-exec path cleanly. | ||
| - Import the full OVN and OVS handler set from ovn-kubernetes-mcp (`pkg/ovn/mcp` and `pkg/ovs/mcp`) into the `ovn-kubernetes` toolset, while other upstream packages stay excluded per Non-Goals. |
As per today's discussion, we should mention kernel and sosreport tools which are helpful to explore node's kernel resources.
This is already changed.
| - Full parity in the first iteration with every tool category shipped by the standalone ovn-kubernetes-mcp binary (for example kernel diagnostics, optional images such as pwru/tcpdump, must-gather, sosreport) where those require separate dependencies, images, or workflows. | ||
| - New Kubernetes or OpenShift APIs, CRDs, operators, or cluster-side agents solely for this feature. | ||
| - Replacing existing CLI-based troubleshooting; MCP tools are an additional interface. | ||
| - Importing ovn-kubernetes-mcp tools under `kernel` and `network-tools` packages in the first iteration, since those tools depend on a node debugging capability (for example a node-debug tool) that is not currently available in kubernetes-mcp-server. |
This is already changed.
| ### Non-Goals | ||
| - Full parity in the first iteration with every tool category shipped by the standalone ovn-kubernetes-mcp binary (for example kernel diagnostics, optional images such as pwru/tcpdump, must-gather, sosreport) where those require separate dependencies, images, or workflows. |
Since ovn-kubernetes-mcp is an upstream repo, we can't expect all current and future tools to be applicable to an OpenShift environment.
Given that we plan to import the packages from ovn-kubernetes-mcp repo, how should we control access to tools that may not be supported?
We're only going to call the handlers of the tools which are supported. The import is for the packages where these handlers are defined. Unsupported handlers should not be used.
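One way to read "only call the handlers of the tools which are supported" is an explicit allowlist on the host side. The following is a minimal sketch under that assumption; `Handler`, `importedHandlers`, `RegisterSupported`, and the tool names are all hypothetical stand-ins, not the actual ovn-kubernetes-mcp or openshift-mcp-server API.

```go
package main

import (
	"fmt"
	"sort"
)

// Handler is a hypothetical shape for an imported tool handler.
type Handler func(args map[string]string) (string, error)

// importedHandlers stands in for everything the imported packages export,
// including handlers the host does not want to expose.
func importedHandlers() map[string]Handler {
	stub := func(name string) Handler {
		return func(map[string]string) (string, error) { return name + ": ok", nil }
	}
	return map[string]Handler{
		"ovn-show":          stub("ovn-show"),
		"ovs-list-br":       stub("ovs-list-br"),
		"experimental-tool": stub("experimental-tool"),
	}
}

// supported is the host-maintained allowlist (hypothetical names).
var supported = []string{"ovn-show", "ovs-list-br"}

// RegisterSupported returns only the allowlisted handlers. It fails loudly if
// an allowlisted name is missing, which would surface a rename or removal
// after an import bump.
func RegisterSupported(all map[string]Handler, allow []string) (map[string]Handler, error) {
	out := make(map[string]Handler, len(allow))
	for _, name := range allow {
		h, ok := all[name]
		if !ok {
			return nil, fmt.Errorf("supported handler %q not found in imported package", name)
		}
		out[name] = h
	}
	return out, nil
}

func main() {
	reg, err := RegisterSupported(importedHandlers(), supported)
	if err != nil {
		panic(err)
	}
	names := make([]string, 0, len(reg))
	for n := range reg {
		names = append(names, n)
	}
	sort.Strings(names)
	fmt.Println(names)
}
```

With this shape, a new upstream handler stays invisible to MCP clients until someone adds it to the allowlist.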
| - https://redhat.atlassian.net/browse/CORENET-7091 | ||
| see-also: | ||
| - https://github.com/ovn-kubernetes/ovn-kubernetes-mcp | ||
| - https://github.com/containers/kubernetes-mcp-server |
NIT: do we need kubernetes-mcp-server and openshift-mcp-server here?
Yes. Since the implementation of the EP will impact all the repos, we need all of them to be included here.
| ### User Stories | ||
| - As a cluster administrator or platform engineer, I want OVN-Kubernetes MCP troubleshooting tools in the same MCP server I already use for Kubernetes resources, so that I do not have to deploy, operate, or manage authentication for a second MCP server dedicated only to OVN-Kubernetes. | ||
| - As a support engineer, I want MCP clients to expose the full ovn-kubernetes-mcp troubleshooting surface that kubernetes-mcp-server imports—NB/SB inspection and related `ovn-*` workflows (including `get`, `lflow-list`, `trace` where those tools apply), OVS bridge and OpenFlow helpers, and **`kernel`** / **`network-tools`** host and capture tooling—so that assisted troubleshooting matches how other cluster operations are automated without switching servers or credentials mid-incident. |
| - As a support engineer, I want MCP clients to expose the full ovn-kubernetes-mcp troubleshooting surface that kubernetes-mcp-server imports—NB/SB inspection and related `ovn-*` workflows (including `get`, `lflow-list`, `trace` where those tools apply), OVS bridge and OpenFlow helpers, and **`kernel`** / **`network-tools`** host and capture tooling—so that assisted troubleshooting matches how other cluster operations are automated without switching servers or credentials mid-incident. | |
| - As a support engineer, I want MCP clients to expose the full ovn-kubernetes-mcp troubleshooting surface that kubernetes-mcp-server imports (NB/SB inspection and related `ovn-*` workflows (including `get`, `lflow-list`, `trace` where those tools apply), OVS bridge and OpenFlow helpers, and **`kernel`** / **`network-tools`** host and capture tooling) so that assisted troubleshooting matches how other cluster operations are automated without switching servers or credentials mid-incident. |
There's already a bracket between the dashes.
| **Importing upstream tools into kubernetes-mcp-server.** The OVN troubleshooting MCP tools already exist in ovn-kubernetes-mcp. The integration approach for kubernetes-mcp-server is to add an `ovn-kubernetes` toolset that reuses those implementations as imported packages and exposes them through kubernetes-mcp-server’s tool registration. | ||
| **Command execution strategy.** OVN/OVS tools run commands inside OVN-Kubernetes pods via kubernetes-mcp-server’s pod exec. **`kernel`** and **`network-tools`** handlers use the node-level execution contract wired up in the same integration (for example debug pod or node-targeted exec, as the upstream packages require). Imported libraries should delegate all cluster I/O to kubernetes-mcp-server rather than opening separate Kubernetes client connections. Expect **refactoring in ovn-kubernetes-mcp and kubernetes-mcp-server** so each category uses a clear, single host-supplied execution path per invocation. |
I understand that kubernetes-mcp-server is building its own node-debug method to allow host access using kubectl/oc CLI. However, in ovn-kubernetes-mcp we use a different method to do node debug for kernel and other network tools. I wonder how we can use tools from ovn-kubernetes-mcp while using the utility from kubernetes-mcp-server, considering it's downstream of ovn-kubernetes-mcp.
The same way we'll use pod-exec from kubernetes-mcp-server for the OVN/OVS tools. The function definition should be similar, that is, the argument list and the return type should be same in both ovn-kubernetes-mcp and kubernetes-mcp-server, for the node-debug function, which will be called by the kernel and the network-tools handlers.
Shall we mention this explicitly in the document? From what I understand, kubernetes-mcp-server does not currently have any node-debug capability, so if that needs to be implemented, it is worth calling out in this section.
The effort is ongoing. I had a discussion with Surya, and she mentioned that in the EP we'll assume that the node-debug tool exists. We can expedite the merging of the node-debug tool PR by helping with reviews, so that we can get started with the integration of the kernel/network-tools.
| **Scope.** All troubleshooting tools under ovn-kubernetes-mcp **`ovn`**, **`ovs`**, **`kernel`**, and **`network-tools`** belong to this effort (NB/SB inspection, logical flows, OVN trace, OVS bridge and OpenFlow helpers, kernel-oriented diagnostics, and **`network-tools`**-style capture where applicable). Other ovn-kubernetes-mcp surfaces—must-gather, sosreport, and similar—remain out of scope unless separately agreed; see Non-Goals. |
| **Scope.** All troubleshooting tools under ovn-kubernetes-mcp **`ovn`**, **`ovs`**, **`kernel`**, and **`network-tools`** belong to this effort (NB/SB inspection, logical flows, OVN trace, OVS bridge and OpenFlow helpers, kernel-oriented diagnostics, and **`network-tools`**-style capture where applicable). Other ovn-kubernetes-mcp surfaces—must-gather, sosreport, and similar—remain out of scope unless separately agreed; see Non-Goals. | |
| **Scope.** All troubleshooting tools under ovn-kubernetes-mcp **`ovn`**, **`ovs`**, **`kernel`**, and **`network-tools`** belong to this effort (NB/SB inspection, logical flows, OVN trace, OVS bridge and OpenFlow helpers, kernel-oriented diagnostics, and **`network-tools`**-style capture where applicable). Other ovn-kubernetes-mcp surfaces (must-gather, sosreport, and similar) remain out of scope unless separately agreed; see Non-Goals. |
| ## Open Questions | ||
| - How to structure mcpchecker suites or task labels so OVN/OVS, **`kernel`**, and **`network-tools`** coverage stays maintainable under kubernetes-mcp-server’s pass-rate gates, given differing cluster prerequisites? |
For the mcpchecker structure — since kernel and network-tools require privileged node access which may not be available in all CI environments, would it make sense to have separate suites for OVN/OVS and kernel/network-tools so their pass rates are tracked independently?
I am more inclined toward creating a separate suite for each layer of the ovnk MCP server tools. That is, each of OVN, OVS, kernel, and network-tools will have its own eval suite. But we can take a call when working on the evals for the tools.
mattedallo
left a comment
lgtm
I added some "non blocking" comments.
| OVN-Kubernetes operators and support engineers often need Northbound and Southbound database views (`ovn-nbctl`, `ovn-sbctl`, traces, logical flows), host-oriented diagnostics, and packet or kernel-level capture workflows while investigating connectivity and routing. These tools are already implemented in ovn-kubernetes-mcp, but OpenShift users benefit from consuming them via a **single MCP server** that shares authentication, tool governance, and documentation with the rest of the platform troubleshooting surface. | ||
| The primary motivation for landing these tools in upstream kubernetes-mcp-server is **productization via downstream sync into openshift-mcp-server**. By first integrating the OVN toolset upstream, OpenShift can ship and support the same upstream code through the established downstream pipeline. This also lets OpenShift customers consume the OVN-Kubernetes tools from the same MCP server as the rest of the platform troubleshooting surface, openshift-mcp-server, after downstream sync. |
Nit: maybe we can expand a bit on the cost we save by using the existing openshift-mcp-server productization pipeline.
That will strengthen the motivation for integrating versus keeping it separate.
Added some more details in the motivation section.
| None. This work adds MCP tools only and does not extend the OpenShift or Kubernetes API surface. | ||
| ### Topology Considerations |
Minor note : the topology section seems written with the local binary deployment model in mind. It might be worth a brief mention that the same considerations apply for in-cluster deployments, or a note that the OVN-K tools inherit whatever cluster-access model kubernetes-mcp-server provides.
The kubeconfig is mentioned specifically regarding HyperShift, since it has a management cluster and a hosted (guest) cluster, which have separate kubeconfigs. The deployment is by default expected to be in-cluster and not local.
| **Split of work:** kubernetes-mcp-server decides how each capability is exposed to MCP users (tool names and parameters). ovn-kubernetes-mcp keeps handler logic that validates inputs, builds command lines, and defines execution contracts; kubernetes-mcp-server integrates by calling those libraries and supplying pod exec, node-level debugging, or other supported cluster operations against the target cluster. | ||
| ```mermaid |
On the diagram, a few things tripped me up:
- The main call relationship (kubernetes-mcp-server's tool handler calling ovn-kubernetes-mcp's imported handler logic) isn't shown; that's the core of the integration.
- "delegated_in_cluster_execution" sits inside the ovn-kubernetes-mcp box, but the actual execution will happen in kubernetes-mcp-server's client AFAIU. ovn-kubernetes-mcp defines the contract/interface; kubernetes-mcp-server implements it.
- The box only shows "OVN_OVS" but kernel and network-tools are also in scope, with a different execution path (node-debug vs pod-exec).
- The two subgraphs connected by a dotted arrow could be read as two separate services communicating at runtime, when in practice ovn-kubernetes-mcp will be compiled into kubernetes-mcp-server as an imported Go package.
Would something like this be more accurate? Let me know your thoughts
```mermaid
flowchart TB
    subgraph kms [kubernetes-mcp-server process]
        ToolHandler["Tool handler\n(defines MCP tool name, schema)"]
        subgraph ovnkLib ["ovn-kubernetes-mcp (imported Go package)"]
            HandlerLogic["Handler logic\n(validates inputs, builds commands)"]
        end
        subgraph executor [kubernetes-mcp-server K8s client]
            PodExec["PodExec\n(OVN/OVS tools)"]
            NodeDebug["NodeDebug\n(kernel / network-tools)"]
        end
        ToolHandler -->|"calls imported package"| HandlerLogic
        HandlerLogic -->|"calls injected executor"| PodExec
        HandlerLogic -->|"calls injected executor"| NodeDebug
        PodExec -->|"exec in ovnkube pod"| Cluster["Cluster"]
        NodeDebug -->|"privileged debug pod on node"| Cluster
    end
```
I have removed the mermaid diagram as it was getting messier. Hope the latest diagram helps in conveying the integration more clearly.
| ### Topology Considerations |
Minor: I would add a sentence under Topology Considerations to introduce what each section is going to address and to clarify that the OVN-K toolset inherits openshift-mcp-server's existing cluster-access mechanisms.
Something like:
The OVN-Kubernetes toolset uses openshift-mcp-server's existing pod-exec and node-debug capabilities and does not introduce new cluster-access mechanisms or deployment requirements. The considerations below describe topology-specific implications for those underlying capabilities, not for the OVN-K tools themselves.
| #### Hypershift / Hosted Control Planes | ||
| The MCP server uses whatever cluster the kubeconfig targets. For HyperShift, that is typically the **hosted cluster** API when troubleshooting workload networking; there is no change to management-plane APIs. Operators must select the correct context (management versus guest) the same way they would for `kubectl exec`. |
Nit (Optional): I found this subsection a bit hard to follow without prior HyperShift context. A brief mention of where the OVN-K components live in HyperShift (ovnkube-node on the hosted cluster, control-plane on the management cluster) would help the reader understand why the hosted cluster API is the right target. It would also be useful to note that this is inherited from openshift-mcp-server's existing cluster-targeting behavior rather than something new introduced by this feature.
Totally optional comment, just thinking about readers who aren't deeply familiar with HyperShift topology.
One possible rewording (feel free to ignore or adapt) :
The OVN-K toolset inherits openshift-mcp-server's existing cluster-targeting behavior and does not introduce any HyperShift-specific logic.
In HyperShift, ovnkube-node pods -- which contain the per-node OVN NB/SB databases, northd, and ovn-controller -- run on the hosted cluster worker nodes. All OVN-K troubleshooting targets (pod exec into ovnkube-node, node-debug for kernel/network-tools) therefore require the MCP server to reach the hosted cluster API, not the management cluster. The lightweight ovnkube-control-plane on the management cluster is not targeted by any tool in this toolset.
This is the same cluster-selection requirement that applies to any openshift-mcp-server toolset targeting workload-cluster resources. In kubeconfig mode, the operator selects the hosted cluster context; in an in-cluster deployment, the server must be deployed into (or configured to reach) the hosted cluster.
| #### Standalone Clusters | ||
| Fully relevant: tools execute against pods on the same cluster the API client reaches. |
Minor nit: it is unclear what "Fully relevant:" refers to. Maybe "No special considerations:" is clearer.
| ## Motivation | ||
| OVN-Kubernetes operators and support engineers often need Northbound and Southbound database views (`ovn-nbctl`, `ovn-sbctl`, traces, logical flows), host-oriented diagnostics, and packet or kernel-level capture workflows while investigating connectivity and routing. These tools are already implemented in ovn-kubernetes-mcp, but OpenShift users benefit from consuming them via a **single MCP server** that shares authentication, tool governance, and documentation with the rest of the platform troubleshooting surface. |
NIT: This does not mention OVS OpenFlows. I do agree that we have mentioned this later on in the enhancement, and if you want to ignore it, that should be fine.
Updated the motivation.
lgtm
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: arghosh93, mattedallo, taanyas. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files: Approvers can indicate their approval by writing
@tssurya PTAL
| ## Alternatives (Not Implemented) | ||
| - **Add the OVN toolset to kubernetes-mcp-server first, then rely on downstream sync into openshift-mcp-server:** Not chosen for this enhancement because OpenShift is landing the integration directly in openshift-mcp-server to ship on product cadence without gating on upstream kubernetes-mcp-server acceptance, release, and fork sync timing. The import-and-delegate pattern remains the same; a future upstream integration could still reduce long-term duplication if both codebases converge. |
I'm not sure what the concern around waiting for upstream merge and sync is, as upstream is currently 100% Red Hat. In general whenever a toolset has no hard requirements on openshift APIs we prefer to land upstream. Is the issue that this toolset requires openshift specifics?
Currently the upstream is synced downstream about 1-2 times per week, and can be done more frequently when needed. cc @mattedallo
I have mentioned the other reasons here: openshift/openshift-mcp-server#315 (comment)
Adding the same here for easy readability:
one of the reasons not to add these tools to upstream is because some of the ovnk tools need the node-debug functionality and as per some existing conversations it seems that adding that tool upstream is not in current plans. Additionally, upstream k8s-mcp-server might want to be CNI agnostic whereas for openshift-mcp-server these tools will be very useful as most customers use ovnk as the CNI. We already have a separate upstream repo for ovnk mcp server (https://github.com/ovn-kubernetes/ovn-kubernetes-mcp) and thus adding these tools in k8s-mcp-server will mean that 2 separate upstream projects have the same tools, which probably is not ideal.
I think it's better if I add them in the EP itself.
Added the same in the EP.
@Cali0707 Is upstream kubernetes-mcp-server planning to stay CNI agnostic, like Kubernetes in general? I guess yes?
Or are there plans to allow Calico, Cilium, ovn-kubernetes, and other CNIs to add their stack troubleshooting? I guess this decision depends on the scope of the kubernetes-mcp-server project.
tssurya
left a comment
nicely written!
some inline comments/questions
| ## Motivation | ||
| OVN-Kubernetes operators and support engineers often need Northbound and Southbound database views (`ovn-nbctl`, `ovn-sbctl`, traces, logical flows), OVS bridge and OpenFlow inspection (`ovs-ofctl` and related helpers), host-oriented diagnostics, and packet or kernel-level capture workflows while investigating connectivity and routing. These tools are already implemented in ovn-kubernetes-mcp, but OpenShift users benefit from consuming them via a **single MCP server** that shares authentication, tool governance, and documentation with the rest of the platform troubleshooting surface. |
nit: I think all sections call out the benefits of a single MCP server, which is great, but the main thing this enhancement addresses is the "missing MCP tools in OCP for troubleshooting networking issues". Let's make that the main intent: it's to provide the existing upstream mcp-server tools to support/operators/end-users to troubleshoot networking issues. I know it's self-implied, but let's call that part out.
i.e. OCP MCP Server doesn't have core networking tools exposed, like ovn, ovs ctl etc.
| ### Non-Goals | ||
| - Full parity with every tool category shipped by the standalone ovn-kubernetes-mcp binary (for example must-gather, sosreport) where those require separate dependencies, images, or product workflows outside this MCP integration. |
I think the reason we don't bring in must-gather and sosreport is that they already exist downstream, right? Not because of any dependencies?
The initial plan is to add the tools for live-cluster debugging. The next step is to add the tools for offline debugging.
The must-gather tool is added in openshift-mcp-server, but it doesn't have the network debugging functionality that is available in the upstream ovn-kubernetes-mcp repo. The dependency here is on the availability of ovsdb-tool, since the network tools use it for getting the relevant information.
So, it'll need additional considerations of how these tools can be integrated into openshift-mcp-server and is not part of the current integration effort.
| - Enable openshift-mcp-server to run in-cluster troubleshooting for this toolset: OVN/OVS commands via existing pod-exec into suitable pods, and **`kernel`** / **`network-tools`** flows via whatever node-level debugging or host access path those upstream handlers require, implemented **as part of this same integration** (expect refactoring in **ovn-kubernetes-mcp** and **kubernetes-mcp-server**/**openshift-mcp-server** so execution is delegated cleanly to the host). | ||
| - Import the full handler sets from ovn-kubernetes-mcp **`ovn`**, **`ovs`**, **`kernel`**, and **`network-tools`** into openshift-mcp-server’s OVN-Kubernetes tool registration, subject only to exclusions in Non-Goals. | ||
| - Ship the toolset to OpenShift users in openshift-mcp-server product builds (versioning and packaging follow that repository’s release process). | ||
|
|
There was a problem hiding this comment.
What happens if OCP is running a cluster where the CNI is not ovn-kubernetes? Do we have a way to turn the toolset off in ocp-mcp-server? Is that part of the goals?
For example, our tools shouldn't be exposed if the openshift-ovn-kubernetes namespace doesn't even exist.
There was a problem hiding this comment.
Only the core and config toolsets are enabled by default. Other toolsets have to be explicitly enabled: https://github.com/openshift/openshift-mcp-server/blob/main/docs/openshift/user-guide.md#toolsets-and-functionality
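To make the gating discussed in this thread concrete, here is a minimal Python sketch. It assumes the defaults stated in the reply (core and config) and adds the namespace check suggested in the question, which is a reviewer idea and not documented server behavior. All function names are hypothetical and do not come from openshift-mcp-server.

```python
# Hypothetical model of toolset gating. The names here are illustrative
# and are NOT the openshift-mcp-server API.

# Defaults as stated in the reply above.
DEFAULT_TOOLSETS = frozenset({"core", "config"})


def effective_toolsets(requested=None):
    """Return the set of toolsets the server would register.

    `requested` stands in for whatever the operator lists in the server
    configuration. None means nothing was configured, so defaults apply.
    """
    if requested is None:
        return set(DEFAULT_TOOLSETS)
    return set(requested)


def ovn_tools_exposed(requested, cluster_namespaces):
    """Expose OVN tools only if enabled AND the cluster runs OVN-Kubernetes.

    The namespace check models the reviewer's suggestion; it is an
    assumption, not existing behavior.
    """
    enabled = "ovn-kubernetes" in effective_toolsets(requested)
    has_ovnk = "openshift-ovn-kubernetes" in cluster_namespaces
    return enabled and has_ovnk


if __name__ == "__main__":
    namespaces = {"default", "openshift-ovn-kubernetes"}
    print(ovn_tools_exposed(None, namespaces))
    print(ovn_tools_exposed(["core", "config", "ovn-kubernetes"], namespaces))
```

With explicit opt-in alone, a cluster on another CNI would still register the tools once an operator enables the toolset; the second check is one possible way to close that gap.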
|
|
||
| ### Workflow Description | ||
|
|
||
| 1. An operator configures MCP clients (for example Cursor, other MCP hosts) to use openshift-mcp-server with a kubeconfig that can reach the target cluster and satisfies RBAC for pod read and pod exec where policies allow. |
There was a problem hiding this comment.
Are there any docs on ocp-mcp-server product usage for end users, since it's already Tech Preview? I'm curious to see the workflow outlined for OCP users to install and use the server.
We might benefit from referencing that here.
There was a problem hiding this comment.
The documentation is available in the GitHub repo: https://github.com/openshift/openshift-mcp-server/blob/main/docs/openshift/user-guide.md#deployment-and-architectural-guardrails
I didn't find any docs on docs.redhat.com. I'll cross-check with the openshift-mcp-server folks.
|
|
||
| ### Risks and Mitigations | ||
|
|
||
| - **RBAC and privilege:** Pod exec and node-level debugging are sensitive. Mitigation: reuse openshift-mcp-server permission models for `pods/exec`, node-scoped operations, and any debug-pod workflows; document required roles; keep tools read-only where possible. |
There was a problem hiding this comment.
Out of curiosity, what permission model does ocp-mcp-server use? Are there any doc links to its design? Speaking of which, if ocp-mcp-server has a design doc, we should include it here.
There was a problem hiding this comment.
I didn't find any separate design docs. I'll check with the maintainers.
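Independent of how openshift-mcp-server models permissions, the `pods/exec` requirement from the risk item can be checked with the standard Kubernetes `SelfSubjectAccessReview` API. The sketch below only builds the request body; the helper name is hypothetical, and the namespace is the one mentioned earlier in this review, used here as an example.

```python
import json


def exec_access_review(namespace):
    """Build a SelfSubjectAccessReview asking: may I exec into pods here?

    Pod exec is authorized as the `create` verb on the `exec`
    subresource of `pods`.
    """
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SelfSubjectAccessReview",
        "spec": {
            "resourceAttributes": {
                "namespace": namespace,
                "verb": "create",
                "resource": "pods",
                "subresource": "exec",
            }
        },
    }


if __name__ == "__main__":
    # Example namespace only; adjust to wherever the OVN pods run.
    print(json.dumps(exec_access_review("openshift-ovn-kubernetes"), indent=2))
```

The equivalent one-off check from a shell is `kubectl auth can-i create pods/exec -n openshift-ovn-kubernetes`. Node-level debug paths would need their own checks, which depend on how the integration implements host access.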
|
|
||
| - **Logs:** API server **audit logs** may record `pods/exec` and node- or debug-related API calls according to cluster policy. **openshift-mcp-server** logs should show handler errors, including which execution path failed (pod exec versus node debug). For node-debug failures, correlate MCP server timestamps with events on the target node and any debug pod namespace the integration uses. | ||
|
|
||
| - **Disable:** Disable or unregister the `ovn-kubernetes` toolset in MCP deployment configuration (exact mechanism depends on openshift-mcp-server packaging); no cluster-side toggle is defined here. Disabling the whole MCP server removes all toolsets, including OVN-Kubernetes; there is no per-path cluster toggle for pod exec versus node debug in this enhancement. |
There was a problem hiding this comment.
Do we have docs on how ocp-mcp-server does this?
There was a problem hiding this comment.
Toolsets are disabled by default. They have to be explicitly enabled. Once enabled, they can be disabled by removing the toolset names: https://github.com/openshift/openshift-mcp-server/blob/main/docs/openshift/user-guide.md#toolsets-and-functionality
| ### Dev Preview -> Tech Preview | ||
|
|
||
| - Imported OVN-Kubernetes MCP tools (OVN/OVS, **`kernel`**, **`network-tools`**) usable end to end against representative clusters where RBAC and cluster policy allow the required pod and node-level operations. | ||
| - Clear documentation for namespace/pod selection, node or debug-pod selection where applicable, and permissions. |
There was a problem hiding this comment.
Where are the docs for ocp-mcp-server? Are we working closely with the docs team on what we plan to document as supported tools? I think we are missing a documentation section.
There was a problem hiding this comment.
So far I haven't found anything on docs.redhat.com. The user guide is available in the GitHub repo: https://github.com/openshift/openshift-mcp-server/blob/main/docs/openshift/user-guide.md
|
|
||
| - **Disable:** Disable or unregister the `ovn-kubernetes` toolset in MCP deployment configuration (exact mechanism depends on openshift-mcp-server packaging); no cluster-side toggle is defined here. Disabling the whole MCP server removes all toolsets, including OVN-Kubernetes; there is no per-path cluster toggle for pod exec versus node debug in this enhancement. | ||
|
|
||
| ## Infrastructure Needed [optional] |
There was a problem hiding this comment.
I think we discussed this at some point: we should also consider adding open-source models for CI, but we need to check with the downstream ocp-mcp-server team on how they do this and whether we can reuse it upstream as well.
Is offline debugging using must-gather/sosreport not in scope?
There was a problem hiding this comment.
The evals are configured to run using Claude, Gemini, and OpenAI. The corresponding API token has to be provided for each of them.
| - **Logs:** API server **audit logs** may record `pods/exec` and node- or debug-related API calls according to cluster policy. **openshift-mcp-server** logs should show handler errors, including which execution path failed (pod exec versus node debug). For node-debug failures, correlate MCP server timestamps with events on the target node and any debug pod namespace the integration uses. | ||
|
|
||
| - **Disable:** Disable or unregister the `ovn-kubernetes` toolset in MCP deployment configuration (exact mechanism depends on openshift-mcp-server packaging); no cluster-side toggle is defined here. Disabling the whole MCP server removes all toolsets, including OVN-Kubernetes; there is no per-path cluster toggle for pod exec versus node debug in this enhancement. | ||
|
|
There was a problem hiding this comment.
Do we need some kind of perf/scale section? Even if it only adds some open-ended questions, it's better to think about it than not to have it. Examples: number of tools, tool callback time in evals (which depends on where the model is running, I guess). I'm curious whether the ocp-mcp-server folks have any thoughts on this.
There was a problem hiding this comment.
Currently, the requirement for adding any tool is that evals are added for it and that they pass the minimum criteria. I am not aware of any perf/scale requirement for now. I'll check with the maintainers about this.
|
|
||
| - **Unit tests:** Ensure imported tool implementations can be exercised without requiring a live cluster (for example by substituting test doubles for in-cluster command execution and validating command construction and output handling), including **`kernel`** and **`network-tools`** handlers where feasible. | ||
| - **Integration:** Validate the `ovn-kubernetes` toolset end to end in openshift-mcp-server: pod-exec paths for OVN/OVS, and node-level paths for **`kernel`** / **`network-tools`** as implemented for this integration. | ||
| - **Manual:** Run MCP tool calls against a cluster with OVN-Kubernetes installed, verifying OVN/OVS output for a known `ovnkube-node` pod and representative **`kernel`** / **`network-tools`** scenarios supported by the cluster. |
There was a problem hiding this comment.
What testing scenarios are we targeting? Are we planning to induce a failure and then check whether the tools are executed in the right order, following a top-down flow, and so on?
There was a problem hiding this comment.
Each eval provides a prompt, and the response to that prompt must pass a verification step. For now, most of the existing evals use simple scenarios, so that the corresponding tools are called and the response is verified.
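Separately from the evals, the unit-test item in the proposal (substituting test doubles for in-cluster command execution and validating command construction and output handling) could look roughly like the Python sketch below. The executor interface, the pod name, the container name, and the sample output are all hypothetical; the actual handlers live in ovn-kubernetes-mcp and may be structured differently.

```python
class FakeExecutor:
    """Test double that records exec calls instead of reaching a cluster."""

    def __init__(self, canned_output):
        self.canned_output = canned_output
        self.calls = []

    def exec_in_pod(self, namespace, pod, container, argv):
        # Record the call so the test can validate command construction.
        self.calls.append((namespace, pod, container, list(argv)))
        return self.canned_output


def list_logical_switches(executor, namespace, pod, container):
    """Run `ovn-nbctl show` through the executor and return switch names."""
    output = executor.exec_in_pod(namespace, pod, container, ["ovn-nbctl", "show"])
    names = []
    for line in output.splitlines():
        line = line.strip()
        # Assumes switch lines of the form: switch <uuid> (<name>)
        if line.startswith("switch ") and "(" in line:
            names.append(line.split("(", 1)[1].rstrip(")"))
    return names


if __name__ == "__main__":
    # Hypothetical sample output, shaped like `ovn-nbctl show`.
    sample = "switch 1111 (node-a)\n    port p1\nswitch 2222 (join)\n"
    fake = FakeExecutor(sample)
    print(list_logical_switches(fake, "openshift-ovn-kubernetes", "ovnkube-node-abcde", "nbdb"))
    print(fake.calls)
```

Because the executor is injected, the same handler can be pointed at a real pod-exec implementation in integration tests without changing the parsing logic.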
|
| @arkadeepsen: all tests passed! |