[WIP] skills: Add monitoring skills (prometheus, monitoring-ops) #7

Open
harche wants to merge 1 commit into openshift:main from harche:skills/monitoring

@harche
Contributor

@harche harche commented Apr 21, 2026

Summary

  • Adds monitoring/prometheus/ — Prometheus query skill using promtool, ported from cluster-version-operator/lightspeed/skills/prometheus/
  • Adds monitoring/monitoring-ops/ — OpenShift monitoring stack troubleshooting skill (Prometheus, Alertmanager, Thanos), ported from cluster-monitoring-operator/lightspeed/skills/monitoring-ops/
  • Adds monitoring/OWNERS and monitoring/README.md

Reviewers

These skills relate to the Cluster Monitoring Operator domain.

Test plan

  • Verify prometheus SKILL.md allowed-tools includes Bash(promtool:*)
  • Verify monitoring-ops SKILL.md covers Prometheus, Alertmanager, and Thanos troubleshooting flows
  • Smoke-test skill loading in a Claude Code session with a live cluster
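The first check in the test plan could be scripted roughly as follows. This is a minimal sketch, not part of the PR: it assumes SKILL.md starts with a `---` delimited front matter block and that `allowed-tools` is a single-line value. The helper names `allowed_tools` and `allows_promtool` are hypothetical.

```python
def allowed_tools(skill_md_text):
    """Return the raw allowed-tools value from the front matter, or None.

    Assumption: front matter is delimited by '---' lines and
    allowed-tools is written on a single line.
    """
    lines = skill_md_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no front matter block at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter; ignore the document body
        key, sep, value = line.partition(":")
        if sep and key.strip() == "allowed-tools":
            return value.strip()
    return None


def allows_promtool(skill_md_text):
    """True if the allowed-tools value mentions Bash(promtool:*)."""
    value = allowed_tools(skill_md_text)
    return value is not None and "Bash(promtool:*)" in value


# Hypothetical usage (the path is an assumption based on the summary above):
# with open("monitoring/prometheus/SKILL.md") as f:
#     print(allows_promtool(f.read()))
```

A plain substring check is used here instead of a YAML parser to keep the sketch dependency-free; a multi-line YAML list for `allowed-tools` would need a real parser.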

Note: These are initial drafts. They will evolve as we test and refine them based on real-world usage.

🤖 Generated with Claude Code

Add prometheus skill for querying cluster metrics via promtool,
and monitoring-ops skill for troubleshooting the OpenShift
monitoring stack.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 21, 2026
@openshift-ci openshift-ci Bot requested a review from mrunalp April 21, 2026 20:30
@openshift-ci

openshift-ci Bot commented Apr 21, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: harche

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 21, 2026
@openshift-ci

openshift-ci Bot commented Apr 21, 2026

@harche: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@falox

falox commented Apr 22, 2026

CMO only seems a bit limiting. From our experience and evals, non-trivial troubleshooting skills involve multiple signals and operators. IMO, the scope of the directory should be broader. The intent is more "observability", "troubleshooting", and so on.

@harche @mrunalp @iNecas @xiormeesh let me know wdyt. Then we can decide whether to expand this one or add another (to check/eval if overlapping skills will confuse the LLM...)

@iNecas

iNecas commented Apr 22, 2026

It's not clear whether it's about troubleshooting the operator, or using monitoring for troubleshooting in general. It might be worth splitting that SKILL into two:

  1. troubleshooting the operator
  2. querying the in-cluster prometheus

And perhaps having something separate for wider troubleshooting scenarios (which @falox is likely referring to). As for content, only evals can tell.

@harche
Contributor Author

harche commented Apr 22, 2026

@falox @iNecas thanks for this input. I don't mind closing this in favor of alternatives; I just wanted to get a conversation started around skills from individual teams.

@harche
Contributor Author

harche commented Apr 22, 2026

Also, I would like to remove myself from the OWNERS file. I'm looking for GitHub handles from the observability team to own monitoring/observability related skills.

@falox

falox commented Apr 22, 2026

I just wanted to get a conversation started around skills from individual teams.

@harche You did great, and we're happily in :) If that's okay with you, I'll clarify a couple of points using this PR thread. If necessary, I will open another PR later. Ok?

@harche
Contributor Author

harche commented Apr 22, 2026

I just wanted to get a conversation started around skills from individual teams.

@harche You did great, and we're happily in :) If that's okay with you, I'll clarify a couple of points using this PR thread. If necessary, I will open another PR later. Ok?

sounds great, again, thanks for your inputs.

Comment thread monitoring/OWNERS
approvers:
- harche
- mrunalp
- wking
Member

Definitely not me here 😅 Plenty of responsibility already between the cluster-update directory and other repositories.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

4 participants