feat: model serving + GPU cluster security skills by msaad00 · Pull Request #7 · msaad00/cloud-security

msaad00 · 2026-04-09T01:04:51Z

Summary

Two new security benchmark skills that fill critical gaps in AI infrastructure security. No CIS benchmark exists for either domain today.

model-serving-security (16 checks, 31 tests)

Audits model deployment infrastructure: API gateways, Kubernetes serving pods, cloud-native endpoints.

Domain	Checks	Key Controls
Auth & RBAC	3	Endpoint auth required, no hardcoded secrets, role-based access
Abuse Prevention	2	Rate limiting, input size/token limits
Data Egress	3	Output filtering, memorization guard, PII redaction in logs
Container Runtime	3	No privileged, read-only rootfs, non-root user
TLS & Network	2	TLS enforced, no public endpoints
Safety Layers	3	Prompt injection guard, content safety, model version pinning

Mapped to: MITRE ATLAS (7 techniques), NIST CSF 2.0, OWASP LLM Top 10
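
To illustrate the Container Runtime domain, the three controls could be evaluated against an already-parsed Kubernetes pod spec roughly as follows. This is a minimal sketch, not the skill's actual code: the function name `check_container_runtime`, the finding strings, and the dict-based input are assumptions made for illustration. Only the field names (`privileged`, `readOnlyRootFilesystem`, `runAsNonRoot`) come from the Kubernetes `securityContext` API.

```python
from typing import Any


def check_container_runtime(pod_spec: dict[str, Any]) -> list[str]:
    """Return findings for privileged, writable-rootfs, or possibly-root containers.

    Hypothetical helper: it only reads a parsed pod spec and makes no API calls.
    """
    findings: list[str] = []
    pod_ctx = pod_spec.get("securityContext") or {}
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext") or {}
        if ctx.get("privileged") is True:
            findings.append(f"{name}: privileged container")
        # Treat a missing value as a finding, since the default rootfs is writable.
        if ctx.get("readOnlyRootFilesystem") is not True:
            findings.append(f"{name}: root filesystem is writable")
        # runAsNonRoot may be set on the container or inherited from the pod.
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot"))
        if run_as_non_root is not True:
            findings.append(f"{name}: may run as root")
    return findings
```

A spec whose containers set `readOnlyRootFilesystem: true` and inherit `runAsNonRoot: true` from the pod-level context yields an empty list. Failing closed on missing fields is a design choice of this sketch and may be stricter than the real checks.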

gpu-cluster-security (13 checks, 31 tests)

Audits GPU compute clusters: Kubernetes, Docker, bare-metal.

Domain	Checks	Key Controls
Runtime Isolation	3	No privileged GPU pods, device plugin (not /dev mounts), no host IPC
Driver & CUDA	2	Known CVE driver check (6 versions), CUDA >= 12.2
Network	2	InfiniBand tenant segmentation, NetworkPolicy on GPU namespaces
Storage	2	/dev/shm size limits, model weight encryption at rest
Tenant Isolation	2	Namespace per tenant, GPU resource quotas
Observability	2	DCGM monitoring, GPU audit logging

Mapped to: MITRE ATT&CK (7 techniques), NIST CSF 2.0, CIS Controls v8
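
The Driver & CUDA checks come down to a list lookup and a version comparison, which could be sketched as below. The helper names and finding strings are hypothetical, and the vulnerable-driver list is passed in by the caller because this summary does not enumerate the six versions. The only value taken from the PR is the CUDA >= 12.2 threshold.

```python
MIN_CUDA = (12, 2)  # minimum CUDA version stated in the PR summary


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '12.2' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def check_driver_and_cuda(
    driver: str, cuda: str, vulnerable_drivers: set[str]
) -> list[str]:
    """Return findings for a known-vulnerable driver or an outdated CUDA version.

    `vulnerable_drivers` is an assumption: the caller supplies the exact
    version strings to flag; none are hardcoded here.
    """
    findings: list[str] = []
    if driver in vulnerable_drivers:  # exact string match only
        findings.append(f"driver {driver} is on the known-vulnerable list")
    if parse_version(cuda) < MIN_CUDA:
        findings.append(f"CUDA {cuda} is older than 12.2")
    return findings
```

Comparing integer tuples rather than strings avoids the classic mistake of ranking "12.10" below "12.2".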

Both skills include

SKILL.md with Anthropic-spec frontmatter + Mermaid diagrams
Security guardrails (read-only, no API calls, safe for production)
Human-in-the-loop policy: automated assessment, human required for remediation
JSON/console output with CI-friendly exit codes
Full test suites (62 tests total)
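
For readers unfamiliar with CI-friendly exit codes, the output stage might look something like the sketch below. Everything here is assumed for illustration: the `render_report` name, the finding fields (`id`, `title`, `status`), and the convention of exit code 1 on any failed check and 0 otherwise. The skills' real schema and codes may differ.

```python
import json


def render_report(findings: list[dict], as_json: bool) -> tuple[str, int]:
    """Return (report_text, exit_code) for a list of check results.

    Assumed convention: exit code 1 if any check has status "FAIL", else 0.
    """
    failed = [f for f in findings if f["status"] == "FAIL"]
    exit_code = 1 if failed else 0
    if as_json:
        payload = {"total": len(findings), "failed": len(failed), "findings": findings}
        text = json.dumps(payload, indent=2)
    else:
        lines = [f"[{f['status']}] {f['id']}: {f['title']}" for f in findings]
        lines.append(f"{len(failed)} of {len(findings)} checks failed")
        text = "\n".join(lines)
    return text, exit_code
```

A caller would print the text and pass the code to `sys.exit`, so a pipeline step fails whenever a check fails, without any parsing of the report.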

Also includes

CI fix: PYTHONPATH + testpaths override for skill-level pytest
CI jobs for both new test suites
README + CLAUDE.md updated

Test plan

model-serving-security: 31 tests passing
gpu-cluster-security: 31 tests passing
ruff check + format passing
CI workflow includes both new test jobs

Two new skills filling gaps in AI infrastructure security:

model-serving-security (16 checks, 31 tests):
- Auth & RBAC (3): endpoint auth, hardcoded secrets, role-based access
- Abuse prevention (2): rate limiting, input size limits
- Data egress (3): output filtering, memorization guard, PII in logs
- Runtime (3): no privileged, read-only rootfs, non-root user
- Network (2): TLS enforced, no public endpoints
- Safety (3): prompt injection guard, content safety, model versioning
Mapped to: MITRE ATLAS, NIST CSF 2.0, OWASP LLM Top 10

gpu-cluster-security (13 checks, 31 tests):
- Runtime isolation (3): no privileged GPU, device plugin, no host IPC
- Driver/CUDA (2): known CVE check, CUDA version compliance
- Network (2): InfiniBand segmentation, NetworkPolicy on GPU namespaces
- Storage (2): /dev/shm limits, model weight encryption
- Tenant (2): namespace isolation, GPU resource quotas
- Observability (2): DCGM monitoring, audit logging
Mapped to: MITRE ATT&CK, NIST CSF 2.0, CIS Controls v8

Both skills include:
- Mermaid architecture diagrams
- Security guardrails (read-only, no API calls, safe for production)
- Human-in-the-loop policy (automated assessment, human for remediation)
- Compliance framework mappings with specific control IDs
- JSON/console output with exit codes for CI/CD
- Full test suites (62 tests total, all passing)

CI updated to run both new test suites.
README + CLAUDE.md updated with new skills.

msaad00 added 2 commits April 8, 2026 21:04

fix(ci): exclude test files from hardcoded secret scan

7f9e676

msaad00 merged commit 64551fb into main Apr 9, 2026
7 checks passed

msaad00 merged 2 commits into main from feat/model-serving-gpu-security-skills
