[WIP] skills: Add monitoring skills (prometheus, monitoring-ops) #7
harche wants to merge 1 commit into
Add prometheus skill for querying cluster metrics via promtool, and monitoring-ops skill for troubleshooting the OpenShift monitoring stack.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: harche

@harche: all tests passed!

Scoping this to CMO only seems a bit limiting. In our experience and evals, non-trivial troubleshooting skills involve multiple signals and operators. IMO, the scope of the directory should be broader: the intent is more "observability", "troubleshooting", and so on. @harche @mrunalp @iNecas @xiormeesh let me know wdyt. Then we can decide whether to expand this one or add another (and check/eval whether overlapping skills confuse the LLM...)

It's not clear whether it's about troubleshooting the operator, or using monitoring for troubleshooting in general. It might be worth splitting that SKILL into two:
And perhaps having something separate for wider troubleshooting scenarios (which @falox is likely referring to). As for content, only evals can tell.

Also, I would like to remove myself from the OWNERS file; I'm looking for GitHub handles from the observability team to own the monitoring/observability-related skills.

@harche You did great, and we're happily in :) If that's okay with you, I'll clarify a couple of points using this PR thread. If necessary, I will open another PR later. Ok?

Sounds great. Again, thanks for your input.

approvers:
- harche
- mrunalp
- wking

Definitely not me here 😅 Plenty of responsibility already between the cluster-update directory and other repositories.

Summary
- `monitoring/prometheus/`: Prometheus query skill using `promtool`, ported from `cluster-version-operator/lightspeed/skills/prometheus/`
- `monitoring/monitoring-ops/`: OpenShift monitoring stack troubleshooting skill (Prometheus, Alertmanager, Thanos), ported from `cluster-monitoring-operator/lightspeed/skills/monitoring-ops/`
- `monitoring/OWNERS` and `monitoring/README.md`

Reviewers
These skills relate to the Cluster Monitoring Operator domain.
Test plan
- `prometheus` SKILL.md allowed-tools includes `Bash(promtool:*)`
- `monitoring-ops` SKILL.md covers Prometheus, Alertmanager, and Thanos troubleshooting flows

🤖 Generated with Claude Code