kubenow — Kubernetes Resource Analysis & Cost Optimization

Version 0.3.3 — Deterministic resource analysis, policy-gated apply, and real-time monitoring for Kubernetes clusters.

Quickstart

# Install
go install github.com/ppiankov/kubenow/cmd/kubenow@latest
# Or download from releases: https://github.com/ppiankov/kubenow/releases/latest

# Monitor cluster problems (real-time TUI)
kubenow monitor

# Analyze over-provisioned resources
kubenow analyze requests-skew --prometheus-url http://localhost:9090

# High-resolution resource sampling for a workload
kubenow pro-monitor latch deployment/payment-api -n production

# Export recommendation as SSA patch
kubenow pro-monitor export deployment/payment-api -n production --format patch

Exit Codes

0 — Success
2 — Invalid input (bad flags, missing required args)
3 — Runtime error (cluster connection failed, query timeout)
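
Because the exit codes are stable, automation can branch on them. The following is a minimal sketch of a wrapper, written in Python for illustration; the label strings and the helper names classify_exit and run_kubenow are hypothetical and not part of kubenow.

```python
import subprocess

# Exit codes as documented above. The labels are illustrative names only.
EXIT_MEANINGS = {
    0: "success",
    2: "invalid-input",
    3: "runtime-error",
}


def classify_exit(code: int) -> str:
    """Map a kubenow exit code to a coarse label; anything else is 'unknown'."""
    return EXIT_MEANINGS.get(code, "unknown")


def run_kubenow(args: list[str]) -> str:
    """Run kubenow with the given arguments and classify how it exited."""
    result = subprocess.run(["kubenow", *args], check=False)
    return classify_exit(result.returncode)
```

A pipeline might retry on "runtime-error" (a cluster connection or query timeout can be transient) but fail immediately on "invalid-input", since retrying bad flags cannot succeed.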

What is kubenow?

A Kubernetes cluster analysis tool that combines:

Deterministic cost analysis — evidence-based resource optimization using Prometheus metrics
Pro-Monitor — policy-gated resource alignment with bounded Server-Side Apply
Real-time monitoring — attention-first TUI for cluster problems
LLM-assisted analysis — optional incident triage via any OpenAI-compatible API

What kubenow is NOT

Not an auto-scaler — presents evidence, never auto-adjusts resources without explicit consent
Not a service mesh — queries existing APIs, installs nothing into the cluster
Not an APM — no agents, no sidecars, no instrumentation
Not a predictor — reports what would have worked historically, never what will work
Not a replacement for monitoring — complements Prometheus, Grafana, and alerting

Project Status

Status: Beta · v0.3.3 · Pre-1.0

Milestone	Status
Core functionality	Complete
Test coverage >85%	Complete
Security audit	Complete
golangci-lint config	Complete
CI pipeline (test/lint/scan)	Complete
Homebrew distribution	Complete
Safety model documented	Complete
API stability guarantees	Partial
v1.0 release	Planned

Pre-1.0: CLI flags and JSON output schemas may change between minor versions. Exit codes (0/2/3) are stable.

Safety Model

kubenow is designed to be safe to run against production clusters. Every mode has structural guarantees — not just warnings.

Zero Cluster Footprint

kubenow installs nothing into your cluster. No agents, no sidecars, no CRDs, no webhooks. It reads existing APIs and exits. Uninstall means deleting the binary.

Read-Only by Default

Mode	Cluster Access	Writes?
monitor	Watch API (pods, events, nodes)	Never
analyze requests-skew	List API + Prometheus queries	Never
analyze node-footprint	List API + Prometheus queries	Never
pro-monitor latch	Metrics API (read)	Never
pro-monitor export	Read current workload	Never
pro-monitor apply	Server-Side Apply	Yes — only with policy file + confirmation

Only pro-monitor apply can mutate cluster state, and only after every guardrail in the next section passes.

Apply Guardrails (10+ Pre-Flight Checks)

Before any mutation, every condition must pass:

Admin policy file loaded and apply.enabled: true
Safety rating meets policy minimum (UNSAFE always blocked)
Namespace not denied by policy
No HPA conflict detected (unless explicitly acknowledged)
Latch data fresh (within policy max_latch_age, default 7 days)
Change deltas within policy bounds (max_request_delta_percent, max_limit_delta_percent)
Audit directory exists and is writable
Rate limit not exceeded (global and per-workload)
GitOps field manager conflict check (ArgoCD, Flux, Helm, Kustomize)
User confirmation prompt

If any check fails, apply is denied. No partial applies.
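
The all-or-nothing rule can be sketched as follows. This is an illustration in Python under stated assumptions, not kubenow's implementation: the Check type, evaluate, and rating_allowed are hypothetical names, and running every check instead of stopping at the first failure is a design choice assumed here so that a decision record could list all failures.

```python
from dataclasses import dataclass
from typing import Callable

# Ratings ordered from safest to least safe, as listed in this README.
RATING_ORDER = ["SAFE", "CAUTION", "RISKY", "UNSAFE"]


@dataclass
class Check:
    name: str
    passed: Callable[[], bool]


def rating_allowed(rating: str, policy_minimum: str) -> bool:
    """UNSAFE is always blocked; otherwise the rating must be at least as safe as the minimum."""
    if rating == "UNSAFE":
        return False
    return RATING_ORDER.index(rating) <= RATING_ORDER.index(policy_minimum)


def evaluate(checks: list[Check]) -> tuple[bool, list[str]]:
    """Run every check; allow the apply only if none failed, and report all failures."""
    failed = [c.name for c in checks if not c.passed()]
    return (not failed, failed)
```

For example, a workload rated CAUTION under min_safety_rating: SAFE fails the rating check, so evaluate returns a denial even if every other check passes.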

Immutable Audit Trail

Every apply attempt — successful or denied — creates an audit bundle:

20260221T143022Z__production__deployment__payment-api/
├── before.yaml      # workload state before apply
├── after.yaml       # workload state after apply
├── diff.patch       # unified diff of changes
└── decision.json    # full decision record (identity, evidence, guardrails, result)
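
The directory name in the example appears to follow a timestamp__namespace__kind__name pattern. Assuming that pattern holds, the sketch below (Python, with hypothetical helper names) builds and parses such names, which can be handy when scripting over an audit directory.

```python
from datetime import datetime, timezone

STAMP_FORMAT = "%Y%m%dT%H%M%SZ"  # e.g. 20260221T143022Z


def bundle_dir_name(when: datetime, namespace: str, kind: str, name: str) -> str:
    """Build '<UTC timestamp>__<namespace>__<kind>__<name>' as in the example above."""
    stamp = when.astimezone(timezone.utc).strftime(STAMP_FORMAT)
    return "__".join([stamp, namespace, kind.lower(), name])


def parse_bundle_dir_name(dirname: str) -> dict:
    """Split a bundle directory name back into its four parts."""
    stamp, namespace, kind, name = dirname.split("__", 3)
    when = datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
    return {"time": when, "namespace": namespace, "kind": kind, "name": name}
```

Splitting on the double underscore is unambiguous because Kubernetes namespace and workload names cannot contain underscores.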

Bounded Changes

Admins control maximum change magnitude via policy:

apply:
  max_request_delta_percent: 25   # no single change > 25%
  max_limit_delta_percent: 25
  allow_limit_decrease: false     # limits can only increase
  min_safety_rating: SAFE         # block CAUTION/RISKY/UNSAFE
rate_limits:
  max_applies_per_hour: 5
  max_applies_per_workload: 2
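
To make the delta bounds concrete, here is a small Python sketch of how a percentage bound could be checked. It assumes the delta is measured relative to the current value and applies symmetrically to increases and decreases; the function names are hypothetical and kubenow's exact arithmetic may differ.

```python
def delta_percent(current: float, proposed: float) -> float:
    """Signed change from current to proposed, as a percentage of current."""
    if current <= 0:
        raise ValueError("current value must be positive")
    return (proposed - current) / current * 100.0


def request_change_allowed(current: float, proposed: float, max_delta_percent: float) -> bool:
    """A request change is allowed when its magnitude stays within the bound."""
    return abs(delta_percent(current, proposed)) <= max_delta_percent


def limit_change_allowed(current: float, proposed: float,
                         max_delta_percent: float, allow_decrease: bool) -> bool:
    """A limit change must stay within the bound and, unless allowed, must not decrease."""
    if proposed < current and not allow_decrease:
        return False
    return abs(delta_percent(current, proposed)) <= max_delta_percent
```

With the policy above, lowering a 1000m CPU request to 800m (a 20% change) would pass, while lowering it to 700m (30%) would be denied.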

Reversible

Apply uses Kubernetes Server-Side Apply (SSA). Changes are standard resource patches — revert with kubectl apply using the before.yaml from the audit bundle, or let GitOps controllers reconcile back to the desired state.

Deterministic Analysis

requests-skew: Find Over-Provisioned Resources

Compares resource requests against actual Prometheus metrics over a configurable time window.

kubenow analyze requests-skew --prometheus-url http://prometheus:9090

# With namespace filtering and 7-day window
kubenow analyze requests-skew \
  --prometheus-url http://prometheus:9090 \
  --window 7d \
  --namespace-include "prod-*"

# SARIF output for CI integration
kubenow analyze requests-skew \
  --prometheus-url http://prometheus:9090 \
  --output sarif --export-file results.sarif

# Compare against saved baseline
kubenow analyze requests-skew \
  --prometheus-url http://prometheus:9090 \
  --compare-baseline baseline.json

Output:

=== Requests-Skew Analysis (Prometheus metrics only) ===

NAMESPACE  WORKLOAD         REQ CPU  P99 CPU  SKEW   SAFETY      IMPACT
prod       payment-api      4.0      3.8      8.0x   RISKY       HIGH (42.5)
prod       checkout-worker  2.0      0.5      6.7x   SAFE        MED (18.2)

Namespace Prometheus Status:
  production     368 series
  staging        142 series
  ads-fraud      no data — use pro-monitor latch for these workloads

Key features:

Safety analysis: OOMKills, restarts, CPU throttling, spike patterns
Safety ratings: SAFE, CAUTION, RISKY, UNSAFE with automatic margins
Per-namespace Prometheus diagnostics with latch suggestions
Obfuscation mode (--obfuscate) for sharing without exposing names
Baseline comparison for tracking drift over time
Output formats: table, JSON, SARIF
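
Conceptually, skew expresses how much larger a request is than what the workload was observed to use. The Python sketch below assumes a plain ratio of request to observed usage; which usage statistic kubenow divides by is not stated here, so treat the denominator and the helper names as assumptions.

```python
def skew_ratio(requested: float, observed: float) -> float:
    """How many times larger the request is than the observed usage."""
    if observed <= 0:
        return float("inf")  # no measurable usage: the request is entirely unused
    return requested / observed


def rank_by_skew(workloads: dict[str, tuple[float, float]]) -> list[str]:
    """Order workload names from most to least over-provisioned.

    `workloads` maps a name to (requested, observed) in the same unit.
    """
    return sorted(workloads, key=lambda w: skew_ratio(*workloads[w]), reverse=True)
```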

node-footprint: Historical Capacity Simulation

Bin-packing simulation to test alternative node configurations against historical data.

kubenow analyze node-footprint --prometheus-url http://prometheus:9090

Tests alternative topologies using First-Fit Decreasing algorithm with feasibility checks and headroom calculation.
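
For readers unfamiliar with First-Fit Decreasing, the Python sketch below shows the textbook algorithm on a single resource dimension. It is a simplification for illustration: the function names are hypothetical, and a real simulation would presumably consider CPU and memory together.

```python
def first_fit_decreasing(demands: list[int], capacity: int) -> list[list[int]]:
    """Place each demand, largest first, into the first node with enough room."""
    bins: list[list[int]] = []
    free: list[int] = []  # remaining capacity per node
    for demand in sorted(demands, reverse=True):
        if demand > capacity:
            # Feasibility check: this workload cannot fit on any node of this size.
            raise ValueError(f"demand {demand} exceeds node capacity {capacity}")
        for i, room in enumerate(free):
            if demand <= room:
                bins[i].append(demand)
                free[i] -= demand
                break
        else:
            # No existing node had room, so open a new one.
            bins.append([demand])
            free.append(capacity - demand)
    return bins


def headroom(bins: list[list[int]], capacity: int) -> float:
    """Fraction of total node capacity left unused after packing."""
    total = capacity * len(bins)
    used = sum(sum(b) for b in bins)
    return (total - used) / total if total else 0.0
```

Packing demands of 1, 5, 2, 4 and 3 cores onto 8-core nodes yields two nodes, [5, 3] and [4, 2, 1], with 6.25% headroom.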

Pro-Monitor

Policy-gated resource alignment: latch, recommend, export, apply.

Latch: High-Resolution Resource Sampling

Samples workload resource usage at 1-5 second intervals via the Kubernetes Metrics API, capturing sub-scrape-interval spikes that Prometheus misses.

# Sample deployment for 30 minutes at 5-second intervals
kubenow pro-monitor latch deployment/payment-api -n production --duration 30m

# Sample a CRD-managed pod directly
kubenow pro-monitor latch pod/payments-main-db-2 -n production

The TUI shows real-time progress, and after completion computes a resource alignment recommendation with safety rating and confidence level.

CRD-managed workloads (CNPG, Strimzi, RabbitMQ, Redis, Elasticsearch) are automatically detected from pod labels and displayed with their operator type:

Workload:  pod/payments-main-db (CNPG)
Namespace: production

Recommendation

After latch completes, kubenow computes per-container resource recommendations:

Safety ratings: SAFE (no signals), CAUTION (minor restarts), RISKY (OOMKills), UNSAFE (blocked)
Confidence levels: HIGH (24h+ latch + Prometheus), MEDIUM (2h+ latch), LOW
Policy bounds: admin-defined max delta percentages, minimum safety rating
Evidence: sample count, gaps, percentiles (p50/p95/p99/max)
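
The percentile evidence can be reproduced from raw samples. The sketch below uses the nearest-rank method in Python; the method kubenow uses is not specified here, so treat this as one reasonable definition and the function names as hypothetical.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples: list[float]) -> dict[str, float]:
    """Summarize samples with the percentiles listed in the evidence line above."""
    return {
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
        "max": max(samples),
    }
```

Comparing p99 with max is a quick way to see whether a recommendation is driven by sustained load or by a handful of short spikes.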

Export

Export recommendations in multiple formats:

# SSA-compatible YAML patch (pipe to kubectl apply)
kubenow pro-monitor export deployment/payment-api -n production --format patch

# Full manifest with recommended values
kubenow pro-monitor export deployment/payment-api --format manifest

# Unified diff for review
kubenow pro-monitor export deployment/payment-api --format diff

# Machine-readable JSON
kubenow pro-monitor export deployment/payment-api --format json

Apply: Bounded Server-Side Apply

Policy-gated mutation via Kubernetes Server-Side Apply. Requires an admin policy file.

Pre-flight checks before any mutation (summarized; the full list is under Apply Guardrails in the Safety Model section):

Policy loaded and apply enabled
Safety rating meets policy minimum
Namespace allowed
HPA not detected (unless acknowledged)
Latch data fresh (within policy max_latch_age)
Audit path writable
Rate limit not exceeded

GitOps conflict detection: inspects managedFields for ArgoCD, Flux, Helm, and Kustomize field managers. Reports conflict rather than overwriting.
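
Kubernetes records who owns which fields under metadata.managedFields, where each entry carries a manager name. The Python sketch below shows the general idea of scanning those names. The substrings it matches are illustrative guesses, not kubenow's actual match list, and the function name is hypothetical.

```python
# Substring -> tool. Checked in order, so more specific hints come first.
# These substrings are assumptions for illustration only.
GITOPS_MANAGER_HINTS = {
    "argocd": "ArgoCD",
    "kustomize-controller": "Flux",
    "helm-controller": "Flux",
    "helm": "Helm",
    "kustomize": "Kustomize",
}


def gitops_managers(obj: dict) -> list[str]:
    """Return the GitOps tools whose field managers appear in metadata.managedFields."""
    found: list[str] = []
    for entry in obj.get("metadata", {}).get("managedFields", []):
        manager = entry.get("manager", "").lower()
        for hint, tool in GITOPS_MANAGER_HINTS.items():
            if hint in manager:
                if tool not in found:
                    found.append(tool)
                break  # first matching hint wins for this entry
    return found
```

A non-empty result would be the signal to report a conflict and stop, since the GitOps controller is likely to revert or fight any out-of-band change.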

Exposure Map

Press l during latch to view structural traffic topology:

Services matching the workload's pod selector
Ingress routes and TLS configuration
Network policies (allowed sources)
Namespace neighbors ranked by CPU usage

Shows possible traffic paths from Kubernetes API state, not measured traffic.
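
Matching Services to a workload is a structural check on labels: a Service selects a pod when every key/value pair in its selector appears in the pod's labels. A minimal Python sketch of that rule follows; the function names and input shapes are hypothetical.

```python
def selector_matches(selector: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """True if every selector key/value pair is present in the pod's labels."""
    if not selector:
        return False  # a Service without a selector does not select pods
    return all(pod_labels.get(key) == value for key, value in selector.items())


def services_for_pod(services: dict[str, dict[str, str]],
                     pod_labels: dict[str, str]) -> list[str]:
    """Names of Services (name -> selector) that could route traffic to the pod."""
    return sorted(name for name, selector in services.items()
                  if selector_matches(selector, pod_labels))
```

This is why the map shows possible paths only: a matching selector says traffic could arrive, not that it does.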

Policy Engine

Admin-controlled guardrails via a policy file:

kubenow pro-monitor latch deployment/payment-api --policy /etc/kubenow/policy.yaml

# Validate policy without running
kubenow pro-monitor validate-policy --policy policy.yaml --check-paths

Three operating modes:

Observe Only — no policy or disabled: view metrics, no recommendations
Export Only — policy present, apply disabled: recommendations with bounds, export only
Apply Ready — policy present, apply enabled: full latch-recommend-export-apply pipeline
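
The three modes reduce to two yes/no questions: is a policy active, and does it enable apply? A small Python sketch of that decision follows; the function and its boolean inputs are hypothetical and do not reflect kubenow's internal representation.

```python
def operating_mode(policy_active: bool, apply_enabled: bool) -> str:
    """Pick the operating mode from whether a policy is active and whether it enables apply."""
    if not policy_active:
        return "Observe Only"  # no policy, or policy disabled
    if apply_enabled:
        return "Apply Ready"
    return "Export Only"
```

Note that apply_enabled is ignored when no policy is active, so a missing or disabled policy can never result in an apply.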

Audit Trail

Every apply attempt, successful or denied, creates a tamper-evident audit bundle:

before.yaml / after.yaml — workload state snapshots
diff.patch — unified diff of changes
decision.json — full decision record (identity, evidence, guardrails, result)

Rate limiting: configurable global and per-workload apply limits per time window.

Real-time Monitor

Terminal UI for cluster problems, designed like top for Kubernetes issues.

kubenow monitor
# Press 1/2/3 to sort, arrow keys to scroll, c to copy, q to quit

Attention-first: empty screen when healthy, shows only broken things
Watches for: OOMKills, CrashLoopBackOff, ImagePullBackOff, failed pods, node issues
Service mesh health: linkerd/istio control plane failures and certificate expiry
Sortable by severity, recency, or count
Press c to dump everything to terminal for copying

Use --severity critical to filter for critical issues only.

Service mesh monitoring

Automatically detects linkerd and istio control plane failures and certificate expiry. Runs regardless of --namespace filter because mesh failures affect all namespaces. Silently skips if the mesh is not installed or RBAC denies access.

What's detected:

Check	Severity	Condition
Control plane down	FATAL	Deployment in mesh namespace has 0 available replicas
Certificate expiry	WARNING	Cert expires within 7 days
Certificate expiry	CRITICAL	Cert expires within 48 hours
Certificate expiry	FATAL	Cert expires within 24 hours or already expired
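
The certificate thresholds in the table can be read as a simple cascade, checked from most to least severe. The Python sketch below illustrates this; treating each boundary as inclusive is an assumption, and the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def cert_severity(not_after: datetime, now: datetime) -> Optional[str]:
    """Classify a certificate by time remaining until expiry; None means no finding."""
    remaining = not_after - now
    if remaining <= timedelta(hours=24):
        return "FATAL"  # also covers certificates that have already expired
    if remaining <= timedelta(hours=48):
        return "CRITICAL"
    if remaining <= timedelta(days=7):
        return "WARNING"
    return None
```

Checking the tightest threshold first matters: a certificate expiring in 12 hours is also "within 7 days", but should surface as FATAL.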

Supported meshes: Linkerd (linkerd namespace), Istio (istio-system namespace)

RBAC requirements: Read access to Deployments and Secrets in the mesh namespace(s). If access is denied, monitoring is silently skipped — no errors are reported.

Disable with: --no-mesh

# Monitor without service mesh checks
kubenow monitor --no-mesh

LLM Analysis (Optional)

Feed cluster snapshots into any OpenAI-compatible API for incident triage, pod debugging, compliance checks, and chaos suggestions.

# Incident triage
kubenow incident --llm-endpoint http://localhost:11434/v1 --model mixtral

# Pod debugging with filters
kubenow pod --llm-endpoint http://localhost:11434/v1 --model mixtral \
  --include-pods "payment-*" --namespace production

# Export report
kubenow incident --llm-endpoint https://api.openai.com/v1 --model gpt-4o \
  --output incident-report.md

Works with Ollama, OpenAI, Azure OpenAI, DeepSeek, Groq, Together, OpenRouter, or any /v1/chat/completions endpoint.

Available modes: incident, pod, teamlead, compliance, chaos

Architecture

                           kubenow CLI
  ┌──────────────┬────────────────┬───────────────┬────────────┐
  │   monitor    │    analyze     │  pro-monitor  │    LLM     │
  │              │                │               │   modes    │
  │  Real-time   │ requests-skew  │ latch/export  │ incident   │
  │  problem     │ node-footprint │ apply/status  │ pod        │
  │  detection   │                │               │ teamlead   │
  │              │                │ Policy Engine │ compliance │
  │              │                │ Audit Trail   │ chaos      │
  │              │                │ Exposure Map  │            │
  └──────┬───────┴───────┬────────┴───────┬───────┴─────┬──────┘
         │               │                │             │
         ▼               ▼                ▼             ▼
   ┌───────────┐   ┌──────────┐    ┌─────────────┐   ┌─────┐
   │Kubernetes │   │Prometheus│    │Kubernetes   │   │ LLM │
   │  Watch    │   │  API     │    │Metrics API  │   │ API │
   │  API      │   │          │    │+ SSA Apply  │   │     │
   └───────────┘   └──────────┘    └─────────────┘   └─────┘

See docs/architecture.md for details.

Installation

From Source

Requires Go >= 1.25

git clone https://github.com/ppiankov/kubenow
cd kubenow
make build
sudo mv bin/kubenow /usr/local/bin/

Binary Downloads

Download from GitHub Releases: https://github.com/ppiankov/kubenow/releases/latest

Available for Linux (amd64, arm64), macOS (amd64, arm64), and Windows (amd64).

kubenow version
# kubenow version 0.3.3

Prometheus Connection

# Port-forward (recommended for local analysis)
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090
kubenow analyze requests-skew --prometheus-url http://127.0.0.1:9090

# Auto-detect in-cluster Prometheus
kubenow analyze requests-skew --auto-detect-prometheus

# Via Kubernetes service
kubenow analyze requests-skew --k8s-service prometheus-operated --k8s-namespace monitoring

For port-forward, use http://127.0.0.1:9090 rather than http://prometheus:9090: the in-cluster service name typically does not resolve from a local machine. Analysis is read-only.

Troubleshooting

"0 workloads analyzed" in requests-skew

Missing metrics: Check container_cpu_usage_seconds_total exists in Prometheus
Namespace has no Prometheus data: Use pro-monitor latch for workloads in unscraped namespaces
Time window too old: Try --window 7d
Prometheus unreachable: Test with curl http://127.0.0.1:9090/api/v1/query?query=up

Pro-Monitor issues

"No policy file found": Pass --policy path/to/policy.yaml or set env var
"Audit path not writable": Ensure the audit directory exists and is writable
"Latch data stale": Re-run latch — data expires after policy MaxLatchAge (default 7 days)
"Apply denied: HPA detected": Pass --acknowledge-hpa if HPA conflict is acceptable

CI/CD Integration

# Silent JSON output for pipelines
kubenow analyze requests-skew \
  --prometheus-url http://prometheus:9090 \
  --silent --output json --export-file results.json

# SARIF for GitHub Security tab
kubenow analyze requests-skew \
  --prometheus-url http://prometheus:9090 \
  --output sarif --export-file results.sarif

# Fail pipeline on critical issues
kubenow analyze requests-skew \
  --prometheus-url http://prometheus:9090 \
  --fail-on critical

Known Limitations

Prometheus metrics required for requests-skew (Metrics API alone is insufficient for historical data)
Pro-Monitor apply limited to Deployment, StatefulSet, DaemonSet (Pod apply blocked — managed by controllers)
CRD operator detection relies on well-known pod labels; custom operators need app.kubernetes.io/managed-by
LLM analysis quality depends on the model used

Roadmap

See CHANGELOG.md for version history. Planned:

Auto-detect Prometheus in-cluster
Cloud provider cost integration (AWS, GCP, Azure)
Historical trend tracking

Philosophy

This tool follows the principles of Attention-First Software:

The primary responsibility of software is to disappear once it works correctly.

Deterministic analysis over prescriptive recommendations
Evidence-based outputs ("this would have worked") not predictions ("you should do this")
Actions are reversible; irreversible ones require explicit consent and structural safeguards
Tools present evidence and let users decide — mirrors, not oracles

Read the full manifesto: MANIFESTO.md

Contributing

See CONTRIBUTING.md for development setup, testing guidelines, and code style.

License

MIT License — see LICENSE for details.
