From b24ca120afb4ec7342dc66aff5c4026c3b74b1d1 Mon Sep 17 00:00:00 2001
From: Derek Chamorro
Date: Sat, 20 Jun 2026 19:57:38 -0500
Subject: [PATCH] docs(readme): restyle in openfirma format, add CI/License/Built-with badges

Restructure README around numbered sections (What is / Deploy / Architecture / Repo structure) with anchor links, mirror the logo + tagline + badges layout, and condense the deployment, API, compliance, and CIEM sections into table form. Add CI (GitHub Actions), License (BSD 3-Clause), and Built with (TypeScript + Node.js) shields badges at the top. Drop the ASCII architecture block in favor of a placeholder SVG link under docs/ (matching the openfirma convention).
---
 README.md | 787 +++++++++++++-----------------------------------------
 1 file changed, 186 insertions(+), 601 deletions(-)

diff --git a/README.md b/README.md
index f9cbd542..edebaf3e 100644
--- a/README.md
+++ b/README.md
@@ -1,177 +1,131 @@
-# Khalifa
-Agentless ingestion of AWS Org resources and Security Hub findings into a Neptune-backed security graph, with a risk and attack-path engine, CIEM (Cloud Infrastructure Entitlement Management) for effective permissions, and automated compliance reporting.
+[![Khalifa](docs/khalifa-logo.png)](docs/khalifa-logo.png)
+[![Khalifa](docs/khalifa-subtitle.png)](docs/khalifa-subtitle.png)
-## Architecture
+**Every AWS resource passes through a collector that decides what it means.**
+Org-wide ingestion, signed graph writes, deterministic. At resource-level.
-### Lambda-Based Ingestion +[Docs](ARCHITECTURE.md) · [Operations](OPERATIONAL.md) -``` -EventBridge Schedule (every 2 hours) - | - v -Step Functions State Machine - | - +-- ListAccounts Lambda - +-- MapAccounts (parallel per account) - | - v -Collector Lambda (per account) - +-- STS assume role into target account - +-- Collect: EC2, S3, IAM, KMS, RDS, EKS, SecurityHub, - | CloudTrail, Config, GuardDuty, Access Analyzer, - | VPC Endpoints, NACLs, Route Tables, Transit Gateway, - | Route53, API Gateway, Lambda, Step Functions, - | EventBridge, DynamoDB, ElastiCache, OpenSearch, - | Redshift, Secrets Manager, Parameter Store, Backup - +-- Enhanced IAM: Groups, inline policies, managed policy - | documents, trust policies, permission boundaries, - | policy statement decomposition - +-- GraphWriter Lambda -> Neptune - +-- PolicyEvaluator Lambda -> Neptune (EffectivePermission - & EscalationPath nodes) - -CloudTrail Analyzer (daily at 02:00 UTC) - | - v -EventBridge Schedule -> CloudTrailAnalyzer Lambda - +-- Athena queries against CloudTrail S3 logs (90-day window) - +-- Writes usage data to AccessAnalyzerCache DynamoDB table - -Policy Evaluator (every 6 hours + after collector) - | - v -EventBridge Schedule / Step Function -> PolicyEvaluator Lambda - +-- Resolves effective permissions per principal - +-- Detects escalation paths (max 3 hops) - +-- Pre-computes EffectivePermission & EscalationPath nodes - -Event-Driven (incremental updates) - | - v -EventBridge -> SQS Queue -> IncrementalProcessor Lambda -> Neptune - -Risk Engine (every 1 hour) - | - v -EventBridge Schedule -> RiskEngine Lambda -> Neptune (query) - | - +-- Risk Rules (10 rules) - +-- Compliance Evaluators (40+ evaluators) - | - v -DynamoDB (Issues table) -``` +[![CI](https://github.com/therandomsecurityguy/khalifa/actions/workflows/ci.yml/badge.svg)](https://github.com/therandomsecurityguy/khalifa/actions/workflows/ci.yml) [![License: BSD 
3-Clause](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause) [![Built with TypeScript](https://img.shields.io/badge/Built_with-TypeScript-3178c6.svg)](https://www.typescriptlang.org) [![Built with Node.js](https://img.shields.io/badge/Built_with-Node.js-339933.svg)](https://nodejs.org) -### EKS-Based Deployment +[![Khalifa diagram](docs/khalifa-diagram.svg)](docs/khalifa-diagram.svg) -``` -User -> ALB (Cognito OIDC) -> api-service (EKS) - | - +-----------------+-----------------+ - v v v -Neptune DynamoDB CloudWatch - (Graph DB) (Issues + AccessAnalyzer) (Logs) - | - v - rule-runner CronJob - (every 6h) - | - v - Neptune queries - | - +-- Risk rules - +-- Compliance evaluators - v - DynamoDB (Issues) -``` +## 1. What is Khalifa? -## Two Deployment Options +[](#1-what-is-khalifa) -| Approach | Use Case | Complexity | -|----------|----------|------------| -| Lambda + EventBridge | Development/Small scale | Lower | -| EKS + CronJob | Production (>20 accounts) | Higher | +Khalifa is an agentless ingestion pipeline that sits between your AWS Organization and a Neptune-backed security graph. Every account, resource, and finding gets collected on a schedule, normalized into a graph model you own, and evaluated locally against risk rules, attack-path traversals, and CIEM effective-permission logic — with no model on the hot path. ---- +**Why we built it:** Cloud estates grow faster than any team can review them. A misconfigured S3 bucket, an over-privileged IAM role, or a publicly exposed RDS instance turns into a real finding before anyone notices. Khalifa gives those resources a graph: collected, scored, joined to attack paths, and rendered as issues you can act on. -## Quick Start: EKS Deployment (Recommended) +**How it works:** Collectors run on an EventBridge schedule (or as Kubernetes CronJobs) and assume into every account in the AWS Organization via a cross-account role. 
They inventory 30+ AWS services, decompose IAM into policy statements + effective permissions, pull Security Hub and GuardDuty findings, and stream everything into Neptune. The Risk Engine then runs Gremlin traversals against the live graph to produce prioritized issues, attack paths, and compliance evaluations against CIS, SOC 2, and ISO 27001 — without ever moving data out of your AWS account. -### Prerequisites +## 2. Run your security pipeline with Khalifa -- Node.js 20+ -- AWS CDK CLI -- kubectl configured for your EKS cluster -- Docker for building container images +[](#2-run-your-security-pipeline-with-khalifa) -### 1. Build and Push Container Images +### Install -```bash -# API Service -cd api-service -npm install -npm run build -docker build -t security-graph-api:latest . -docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/security-graph-api:v1.0.0 +[](#install) -# Rule Runner (reuses risk-engine) -cd ../packages/risk-engine -npm install -npm run build -docker build -t security-graph-rule-runner:latest . -docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/security-graph-rule-runner:v1.0.0 +**Prerequisites:** Node.js 20+, AWS CDK CLI, an AWS Organization with a delegated admin account, and a Neptune cluster reachable from your compute. + +```bash +git clone https://github.com/therandomsecurityguy/khalifa +cd khalifa +npm ci --workspaces ``` -### 2. Deploy EKS Infrastructure (CDK) +Deploy the cross-account collector role into every member account from the `templates/SecurityGraphCollectorRole.yaml` template, then bootstrap the ingestion stack. + +### Quickstart + +[](#quickstart) + +There are two ways to run Khalifa. Both end up in the same place (a populated security graph with risk findings) but the first is faster to try, the second is the production setup. 
+ +**Option A: Lambda + EventBridge (development / small scale)** + +The Lambda stack uses EventBridge schedules, Step Functions for parallel account fan-out, and a separate daily CloudTrail analyzer. It scales to roughly 20 accounts without tuning. ```bash cd cdk npm install npm run build +cdk deploy KhalifaStack \ + --neptune-endpoint neptune-cluster.us-east-1.amazonaws.com \ + --issues-table-name SecurityIssues \ + --access-analyzer-table AccessAnalyzerCache \ + --athena-database khalifa_cloudtrail_db \ + --cloudtrail-s3-location s3://cloudtrail-logs/AWSLogs/ +``` + +Every two hours the collector ingests all member accounts. CloudTrail analysis runs daily at 02:00 UTC. The policy evaluator runs every six hours and after each collector pass. The risk engine runs hourly. + +**Option B: EKS + CronJob (production / >20 accounts)** + +The EKS stack runs the API service, rule runner, and UI as Kubernetes workloads. It is built for sustained load across hundreds of accounts and gives you a UI plus a REST API. + +```bash +# 1. Build and push images +cd api-service && docker build -t security-graph-api:latest . && docker push /security-graph-api:v1.0.0 +cd ../packages/risk-engine && docker build -t security-graph-rule-runner:latest . && docker push /security-graph-rule-runner:v1.0.0 + +# 2. Deploy CDK +cd ../../cdk cdk deploy SecurityGraphEksStack \ --vpc-id vpc-12345678 \ --neptune-endpoint neptune-cluster.us-east-1.amazonaws.com \ --issues-table-name SecurityIssues \ - --certificate-arn arn:aws:acm:us-east-1:123456789012:certificate/xxxxx \ + --certificate-arn arn:aws:acm:us-east-1:... \ --cognito-user-pool-id us-east-1_xxxxx \ --cognito-client-id xxxxx + +# 3. Apply manifests +kubectl apply -f eks-manifests/ ``` -### 3. Deploy Kubernetes Manifests +Use this when you want to serve a UI to analysts, expose a stable REST API, or run the rule runner on a schedule that survives control-plane hiccups. 
-```bash -# Update ConfigMap with your values -# Edit eks-manifests/01-configmap.yaml +### Two deployment options -# Deploy all manifests -kubectl apply -f eks-manifests/ +[](#two-deployment-options) -# Verify deployment -kubectl rollout status deployment/api-service -n security-graph -kubectl get pods -n security-graph -``` +| Approach | Use Case | Complexity | +|----------|----------|------------| +| Lambda + EventBridge | Development / small scale (<20 accounts) | Lower | +| EKS + CronJob | Production (>20 accounts, multi-tenant API) | Higher | -### 4. Verify API +### Configuration -```bash -# Get ALB hostname -kubectl get ingress -n security-graph +[](#configuration) -# Test health endpoint -curl https:///health +Environment variables are shared across both stacks. The CDK stack wires sane defaults; override at deploy time or via the Kubernetes `ConfigMap` (`eks-manifests/01-configmap.yaml`). -# Test issues endpoint -curl https:///issues +| Variable | Description | Default | +|----------|-------------|---------| +| `NEPTUNE_ENDPOINT` | Neptune cluster endpoint | — | +| `NEPTUNE_AUTH_SECRET_ARN` | Secrets Manager ARN for Neptune auth | — | +| `ISSUES_TABLE` | DynamoDB table for issues | `SecurityIssues` | +| `ACCESS_ANALYZER_TABLE` | DynamoDB table for CloudTrail usage cache | `AccessAnalyzerCache` | +| `ATHENA_DATABASE` | Glue database for CloudTrail logs | `khalifa_cloudtrail_db` | +| `ATHENA_WORKGROUP` | Athena workgroup | `khalifa-cloudtrail-analysis` | +| `CLOUDTRAIL_S3_LOCATION` | S3 prefix for CloudTrail logs | `s3://cloudtrail-logs/AWSLogs/` | +| `ANALYSIS_DAYS` | CloudTrail lookback window | `90` | +| `AWS_REGION` | AWS region | `us-east-1` | +| `LOG_LEVEL` | Logging level | `info` | +| `RULE_RUNNER_SCHEDULE` | Cron schedule | `0 */6 * * *` (every 6h) | -# Test compliance endpoint -curl https:///compliance/frameworks -``` +> Cross-account access is granted via the IAM role defined in `templates/SecurityGraphCollectorRole.yaml`. 
Deploy it once per member account with a unique external ID per deployment. ---- +### API reference -## API Endpoints +[](#api-reference) -### Issues & Risk +All routes except `/health` require a valid Cognito bearer JWT. RBAC roles are mapped from Cognito groups: `khalifa-admin` → Admin, `khalifa-analyst` → Analyst, `khalifa-viewer` → Viewer. + +**Issues & risk** | Endpoint | Description | |----------|-------------| @@ -184,7 +138,7 @@ curl https:///compliance/frameworks | `GET /resources/:arn` | Get resource with neighbors and issues (Viewer+) | | `GET /resources/search?label=EC2Instance` | Search resources (Viewer+) | -### Compliance +**Compliance** | Endpoint | Description | |----------|-------------| @@ -195,9 +149,7 @@ curl https:///compliance/frameworks | `GET /compliance/frameworks/:framework/report` | Generate compliance report (Viewer+) | | `GET /compliance/frameworks/:framework/drift` | Detect configuration drift since last evaluation (Viewer+) | -> All routes (except `/health`) require a valid Cognito bearer JWT. RBAC roles are mapped from Cognito groups: `khalifa-admin` → Admin, `khalifa-analyst` → Analyst, `khalifa-viewer` → Viewer. 
- -### CIEM / Identity +**CIEM / Identity** | Endpoint | Description | |----------|-------------| @@ -207,7 +159,7 @@ curl https:///compliance/frameworks | `GET /identity/rightsizing/:principal?safetyMarginDays=7` | Generate least-privilege policy recommendation | | `GET /identity/trust-graph?account=X` | Retrieve cross-account trust relationships as a graph | -### Query Parameters for /issues +**Query parameters for `/issues`** | Parameter | Type | Description | |-----------|------|-------------| @@ -218,67 +170,64 @@ curl https:///compliance/frameworks | `limit` | number | Max results (default: 50, max: 1000) | | `nextToken` | string | Pagination token | -### Example: Get Critical Issues +**Examples** ```bash +# Get critical issues curl "https://api.example.com/issues?severity=critical&status=open&limit=100" \ -H "Authorization: Bearer $TOKEN" -``` -### Example: Find Attack Paths - -```bash +# Find attack paths curl "https://api.example.com/attack-paths?fromSelector=Internet&toSelector=S3Bucket&maxPathLength=4" \ -H "Authorization: Bearer $TOKEN" -``` -### Example: Check CIS Compliance - -```bash -# List frameworks -curl "https://api.example.com/compliance/frameworks" \ +# Get CIS compliance report +curl "https://api.example.com/compliance/CIS_AWS_FOUNDATIONS/report" \ -H "Authorization: Bearer $TOKEN" -# Get CIS report -curl "https://api.example.com/compliance/CIS_AWS_FOUNDATIONS/report" \ +# Get effective permissions for a role +curl "https://api.example.com/identity/effective-permissions/arn:aws:iam::123456:role/AdminRole" \ -H "Authorization: Bearer $TOKEN" -# Check for drift -curl "https://api.example.com/compliance/CIS_AWS_FOUNDATIONS/drift" \ +# Get rightsizing recommendation +curl "https://api.example.com/identity/rightsizing/arn:aws:iam::123456:role/DataRole?safetyMarginDays=7&includeReadonlySafe=true" \ -H "Authorization: Bearer $TOKEN" ``` -### Example: CIEM / Identity Queries +### Different operating models -```bash -# Get effective permissions for a 
role -curl "https://api.example.com/identity/effective-permissions/arn:aws:iam::123456:role/AdminRole" \ - -H "Authorization: Bearer $TOKEN" +[](#different-operating-models) -# Find critical escalation paths -curl "https://api.example.com/identity/escalation-paths?riskLevel=critical" \ - -H "Authorization: Bearer $TOKEN" +**1. Single-account dev (like Quickstart Option A)** -# Check unused permissions (90-day window) -curl "https://api.example.com/identity/unused-permissions?principal=arn:aws:iam::123456:role/DataRole&days=90" \ - -H "Authorization: Bearer $TOKEN" +The Lambda stack fans out from a single delegated admin account, assumes into each member account via the collector role, and writes directly to a Neptune cluster in the same VPC. Step Functions parallelize the per-account work. -# Get rightsizing recommendation -curl "https://api.example.com/identity/rightsizing/arn:aws:iam::123456:role/DataRole?safetyMarginDays=7&includeReadonlySafe=true" \ - -H "Authorization: Bearer $TOKEN" +```bash +cdk deploy KhalifaStack +``` -# View cross-account trust graph -curl "https://api.example.com/identity/trust-graph?account=123456789012" \ - -H "Authorization: Bearer $TOKEN" +**2. Multi-account org with EKS backend (like Quickstart Option B)** + +The EKS stack adds an API service, a UI, and a Kubernetes CronJob for the rule runner. The API is fronted by an ALB with Cognito OIDC, and the rule runner executes Gremlin traversals on the same Neptune cluster. + +```bash +cdk deploy SecurityGraphEksStack +kubectl apply -f eks-manifests/ ``` ---- +**3. Multi-account org with read replicas** + +Run collectors in each region and replicate into a single Neptune cluster via Neptune Streams. Use this when accounts are concentrated in specific regions or you need to keep data residency boundaries. 
+ +> Full deployment reference: [`ARCHITECTURE.md`](ARCHITECTURE.md) · [`OPERATIONAL.md`](OPERATIONAL.md) + +### Compliance frameworks -## Compliance Frameworks +[](#compliance-frameworks) Khalifa includes automated compliance evaluation against three industry-standard frameworks with 124 controls and 40+ automated evaluators that run Gremlin graph queries against your security data. -### CIS AWS Foundations Benchmark v3.0 (78 controls) +**CIS AWS Foundations Benchmark v3.0 (78 controls)** Covers the foundational security configurations for AWS accounts: @@ -290,7 +239,7 @@ Covers the foundational security configurations for AWS accounts: | 4. Networking | 12 | VPC flow logs, security groups, NACLs | | 5. Data Protection | 10 | Encryption, KMS rotation, backup | -### SOC 2 Type II (22 controls) +**SOC 2 Type II (22 controls)** Maps to Trust Services Criteria: @@ -301,7 +250,7 @@ Maps to Trust Services Criteria: | CC8 | 4 | Risk mitigation, system boundaries | | CC9 | 4 | Additional criteria | -### ISO 27001:2022 (24 controls) +**ISO 27001:2022 (24 controls)** Based on Annex A controls: @@ -314,486 +263,122 @@ Based on Annex A controls: | A.12 | 3 | Operations security, vulnerability management | | A.13 | 3 | Communications security, network controls | -### How It Works +The compliance engine runs Gremlin evaluators that query the live graph, produce per-control evidence (pass/fail/manual), and write results to DynamoDB for the UI to render. -1. **Collector** ingests AWS resource configurations into the Neptune graph -2. **Compliance Engine** runs evaluators that query the graph to check each control -3. Each evaluator produces evidence (pass/fail/manual) with resource-level details -4. Results are stored in DynamoDB and exposed via the API -5. **UI** shows a dashboard with filterable controls, evidence, and CSV export +## 3. 
Architecture ---- +[](#3-architecture) -## Project Structure +[![Khalifa flow diagram](docs/khalifa-architecture.svg)](docs/khalifa-architecture.svg) -``` -khalifa/ -├── cdk/ # CDK infrastructure -│ ├── bin/khalifa.ts # Lambda stack entry -│ └── lib/ -│ ├── khalifa-stack.ts # Lambda + EventBridge stack -│ └── eks-infrastructure.ts # EKS stack -├── lambdas/ -│ ├── shared/ # Shared types and utilities -│ ├── list-accounts/ # Lists org accounts -│ ├── collector/ # Collects AWS resources (30 services) -│ │ # Enhanced IAM: groups, policies, trust docs -│ ├── graph-writer/ # Writes to Neptune -│ ├── incremental-collector/ # Event-driven updates -│ ├── policy-evaluator/ # CIEM: effective permissions engine -│ │ ├── types.ts # EffectivePermission, EscalationPath, etc. -│ │ ├── policy-parser.ts # IAM policy JSON parsing, wildcard matching -│ │ ├── condition-evaluator.ts # 20+ IAM condition operators -│ │ ├── effect-resolver.ts # Policy merge → net effective permissions -│ │ ├── escalation-detector.ts # Trust graph traversal, escalation paths -│ │ ├── rightsizer.ts # Unused permissions, rightsizing recommendations -│ │ └── index.ts # Lambda handler (Neptune read/write) -│ ├── cloudtrail-analyzer/ # CloudTrail log analysis via Athena -│ │ └── index.ts # Athena queries → DynamoDB cache -├── packages/ -│ └── risk-engine/ # Risk, attack-path, and compliance engine -│ ├── types.ts # Rule/Issue schemas -│ ├── rules.ts # Gremlin risk rules (10) -│ ├── scoring.ts # Risk scoring algorithm -│ ├── runner.ts # Rule execution engine -│ ├── compliance-types.ts # Compliance schemas (124 controls) -│ ├── compliance-rules.ts # Automated evaluators (40+) -│ └── compliance-engine.ts # Compliance evaluation engine -├── api-service/ # REST API (EKS) -│ ├── src/ -│ │ ├── app.ts # Express server -│ │ ├── routes/ -│ │ │ ├── issues.ts # Issue endpoints -│ │ │ ├── attack-paths.ts # Attack path endpoints -│ │ │ ├── resources.ts # Resource endpoints -│ │ │ ├── compliance.ts # Compliance endpoints -│ │ 
│ └── identity.ts # CIEM/identity endpoints -│ │ ├── services/ # Neptune/DynamoDB clients -│ │ └── types/ # TypeScript interfaces -│ └── package.json -├── eks-manifests/ # Kubernetes manifests -│ ├── 00-namespace.yaml -│ ├── 01-configmap.yaml -│ ├── 02-serviceaccounts.yaml -│ ├── 03-api-deployment.yaml -│ ├── 04-api-service.yaml -│ ├── 05-api-ingress.yaml -│ ├── 06-rule-runner-cronjob.yaml -│ ├── 07-hpa.yaml -│ └── 08-network-policy.yaml -├── ui/ # Next.js UI -│ ├── app/ -│ │ ├── issues/ # Issues dashboard -│ │ ├── attack-paths/ # Attack path explorer -│ │ └── compliance/ # Compliance dashboard -│ │ ├── page.tsx # Framework overview -│ │ └── [framework]/ # Framework-specific pages -│ │ ├── page.tsx # Controls list -│ │ ├── controls/[controlId]/page.tsx # Control detail -│ │ ├── report/page.tsx # Compliance report -│ │ └── drift/page.tsx # Drift detection -│ ├── lib/api.ts # API client -│ └── types/index.ts # UI types -├── .github/workflows/ -│ ├── ci.yml # CI: lint, typecheck, build, test -│ └── release.yml # Manual release workflow -└── templates/ - └── SecurityGraphCollectorRole.yaml # Cross-account role -``` +**[Collector](lambdas/collector):** runs in a delegated admin account, assumes into every member account via the cross-account role, and inventories 30+ AWS services per pass. Writes raw resource nodes to Neptune. ---- +**[Policy Evaluator](lambdas/policy-evaluator):** resolves IAM identity + resource + boundary + SCP policies into net effective permissions per principal, and traverses cross-account trust edges up to 3 hops to surface escalation paths. -## AWS Services Collected +**[CloudTrail Analyzer](lambdas/cloudtrail-analyzer):** runs Athena queries against the CloudTrail S3 logs on a daily schedule, with a 90-day lookback window. Writes usage data to the `AccessAnalyzerCache` DynamoDB table for the rightsizer. 
-The collector ingests configuration data from 30 AWS services: +**[Risk Engine](packages/risk-engine):** runs Gremlin traversals against the live graph on a schedule, producing prioritized issues, attack paths, and compliance evaluations. Each rule ships with severity, scoring, and remediation guidance. -| Category | Services | -|----------|---------| -| Compute | EC2, EKS, Lambda (aliases + event source mappings) | -| Storage | S3 (versioning, encryption, logging, public access block) | -| Database | RDS, DynamoDB, ElastiCache, OpenSearch, Redshift | -| Identity | IAM (users, roles, policies, groups, inline policies, managed policy documents, trust policies, permission boundaries, credential reports), KMS | -| Network | VPC, VPC Endpoints, NACLs, Route Tables, Transit Gateway, Route53 | -| Security | SecurityHub, GuardDuty, Access Analyzer, Config | -| Logging | CloudTrail, Config | -| Serverless | API Gateway, Step Functions, EventBridge | -| Secrets | Secrets Manager, Parameter Store | -| Backup | Backup Vaults, Backup Plans | +**[API Service](api-service):** REST API fronted by ALB + Cognito OIDC. Exposes issues, attack paths, resources, compliance reports, and CIEM identity endpoints. RBAC enforced from Cognito groups. ---- +**[UI](ui):** Next.js dashboard for issues, attack paths, and compliance. Renders control-level evidence, drift detection, and CSV export. -## Risk Engine Rules +### Features -The Risk Engine executes Gremlin traversals against the security graph to identify security issues. 
+[](#features) -### Risk Rules (10 rules) +- **Agentless ingestion:** no agents to install in member accounts; collectors assume via a single cross-account role defined in [`templates/SecurityGraphCollectorRole.yaml`](templates/SecurityGraphCollectorRole.yaml) +- **Deterministic graph model:** every resource, IAM statement, and finding becomes a typed node with explicit edges; Gremlin returns the same traversal for the same input every time +- **30+ AWS services collected:** compute, storage, database, identity, network, security, logging, serverless, secrets, and backup +- **Risk + attack path + CIEM in one pass:** rules, traversals, and effective-permission evaluation all run against the same live graph +- **CIEM with CloudTrail grounding:** effective permissions are joined to actual usage from Athena over CloudTrail, with rightsizing recommendations and a configurable safety margin +- **Compliance built in:** CIS v3.0, SOC 2 Type II, and ISO 27001:2022 evaluated by automated Gremlin queries with per-control evidence +- **Two deployment modes:** Lambda + EventBridge for development, EKS + CronJob for production — same data model, same graph, same API -| Rule ID | Name | Severity | -|---------|------|----------| -| RULE-001 | Internet-Exposed EC2 with High-Privilege IAM Role to Restricted S3 | critical | -| RULE-002 | Security Groups with 0.0.0.0/0 on SSH/RDP | high | -| RULE-003 | Container Images with Critical CVEs on Internet-Exposed Workloads | critical | -| RULE-004 | Over-Privileged IAM Roles with Internet-Reachable Workloads | high | -| RULE-005 | Crown Jewel Attack Path from Internet | critical | -| RULE-006 | Cross-Account IAM Trust with Admin Privileges | critical | *Enhanced by CIEM escalation detector* | -| RULE-007 | Public S3 Buckets with Sensitive Data | critical | -| RULE-008 | RDS with Public Access and Sensitive Data | critical | -| RULE-009 | Lambda with VPC and Internet Gateway to Sensitive Resources | medium | -| RULE-010 | Secrets Manager 
Secrets with Overly Permissive IAM | high | +## 4. Repo structure -### Risk Scoring Formula +[](#4-repo-structure) -Risk score combines multiple factors (0-100 scale): +**Infrastructure** -``` -Score = CVSSx10x0.25 + Exposurex100x0.2 + Identityx100x0.2 + DataClassx100x0.2 + Envx100x0.15 + CrownJewelBonus -``` +[`cdk/`](cdk) -**Severity Thresholds:** -- Critical: >=80 -- High: >=60 -- Medium: >=40 -- Low: <40 +CDK stacks: `KhalifaStack` (Lambda + EventBridge) and `SecurityGraphEksStack` (EKS + ALB + Cognito) ---- +[`templates/`](templates) -## UI Usage +Cross-account IAM role template deployed once per member account -### Start Development Server +**Collectors** -```bash -cd ui -npm install -npm run dev -``` +[`lambdas/list-accounts`](lambdas/list-accounts) -Navigate to: -- `http://localhost:3000/issues` - Issues dashboard -- `http://localhost:3000/issues/:id` - Issue details with attack path -- `http://localhost:3000/attack-paths` - Attack path explorer -- `http://localhost:3000/compliance` - Compliance framework overview -- `http://localhost:3000/compliance/CIS_AWS_FOUNDATIONS` - CIS controls list -- `http://localhost:3000/compliance/CIS_AWS_FOUNDATIONS/controls/1.4` - Control detail with evidence -- `http://localhost:3000/compliance/CIS_AWS_FOUNDATIONS/report` - Compliance report (CSV export) -- `http://localhost:3000/compliance/CIS_AWS_FOUNDATIONS/drift` - Configuration drift view +Lists org accounts from AWS Organizations -### Authentication +[`lambdas/collector`](lambdas/collector) -The UI uses OIDC via Cognito. Tokens are stored in localStorage and passed to the API via the `Authorization: Bearer` header. 
+Per-account collector: 30+ AWS services + enhanced IAM decomposition ---- +[`lambdas/graph-writer`](lambdas/graph-writer) -## CI/CD +Neptune writer for raw resource nodes -### GitHub Actions Workflows +[`lambdas/incremental-collector`](lambdas/incremental-collector) -**CI** (`.github/workflows/ci.yml`) runs on push/PR to main: +Event-driven updates via EventBridge → SQS -- **Lint & Format** - ESLint and Prettier across all workspaces -- **TypeCheck** - TypeScript compilation for all packages -- **Build** - Build all workspaces -- **Test** - Run unit tests +[`lambdas/policy-evaluator`](lambdas/policy-evaluator) -**Release** (`.github/workflows/release.yml`) triggered manually: +CIEM engine: effective permissions, escalation paths, rightsizing -- Builds container images for API service and rule runner -- Pushes to ECR -- Creates GitHub release with version tag +[`lambdas/cloudtrail-analyzer`](lambdas/cloudtrail-analyzer) -### Local Development +Athena queries over CloudTrail S3 logs → DynamoDB cache -```bash -# Install all dependencies -npm ci --workspaces +**Engine** -# Run across all workspaces -npm run lint # Lint all packages -npm run format # Format all packages -npm run format:check # Check formatting -npm run build # Build all packages -npm run test # Run all tests - -# Build a specific package -npm run build:api -npm run build:cdk -npm run build:ui -npm run build:lambdas -``` +[`packages/risk-engine`](packages/risk-engine) ---- +Risk rules, attack-path traversals, scoring, compliance evaluators -## Configuration +**Service** -### Environment Variables +[`api-service/`](api-service) -| Variable | Description | Default | -|----------|-------------|---------| -| `NEPTUNE_ENDPOINT` | Neptune cluster endpoint | - | -| `NEPTUNE_AUTH_SECRET_ARN` | Secrets Manager ARN for Neptune auth | - | -| `ISSUES_TABLE` | DynamoDB table for issues | SecurityIssues | -| `ACCESS_ANALYZER_TABLE` | DynamoDB table for CloudTrail usage cache | AccessAnalyzerCache | -| 
`ATHENA_DATABASE` | Glue database for CloudTrail logs | khalifa_cloudtrail_db | -| `ATHENA_WORKGROUP` | Athena workgroup | khalifa-cloudtrail-analysis | -| `CLOUDTRAIL_S3_LOCATION` | S3 prefix for CloudTrail logs | s3://cloudtrail-logs/AWSLogs/ | -| `ANALYSIS_DAYS` | CloudTrail lookback window | 90 | -| `AWS_REGION` | AWS region | us-east-1 | -| `LOG_LEVEL` | Logging level | info | -| `RULE_RUNNER_SCHEDULE` | Cron schedule | `0 */6 * * *` (every 6h) | +REST API (Express) — issues, attack paths, resources, compliance, identity -### Kubernetes ConfigMap - -Edit `eks-manifests/01-configmap.yaml` before deployment: - -```yaml -data: - NEPTUNE_ENDPOINT: "neptune-cluster.us-east-1.amazonaws.com" - ISSUES_TABLE: "SecurityIssues" - ACCESS_ANALYZER_TABLE: "AccessAnalyzerCache" - ATHENA_DATABASE: "khalifa_cloudtrail_db" - ATHENA_WORKGROUP: "khalifa-cloudtrail-analysis" - CLOUDTRAIL_S3_LOCATION: "s3://cloudtrail-logs/AWSLogs/" - ANALYSIS_DAYS: "90" - LOG_LEVEL: "info" - API_PORT: "8080" - RULE_RUNNER_SCHEDULE: "0 */6 * * *" -``` +[`ui/`](ui) ---- +Next.js dashboard — issues, attack paths, compliance -## Monitoring & Operations +**Deploy** -### View Logs +[`eks-manifests/`](eks-manifests) -```bash -# API service -kubectl logs -l app=api-service -n security-graph -f +Kubernetes manifests for API service, rule runner CronJob, HPA, NetworkPolicy -# Rule runner -kubectl logs -l app=rule-runner -n security-graph -``` +**Docs** -### Manual Rule Execution +[`ARCHITECTURE.md`](ARCHITECTURE.md) -```bash -kubectl create job --from=cronjob/rule-runner rule-runner-manual -n security-graph -``` - -### Check Rule Runner Status - -```bash -kubectl get jobs -n security-graph -kubectl get pods -l job-name=rule-runner-manual -n security-graph -``` +System architecture, data model, ingestion topology -### Scale API Service +[`OPERATIONAL.md`](OPERATIONAL.md) -```bash -# Manual scale -kubectl scale deployment api-service --replicas=5 -n security-graph - -# Auto-scaling is configured via HPA 
-kubectl get hpa -n security-graph -``` - ---- +Runbooks for the rule runner, Neptune, IRSA, and incident response -## Troubleshooting +[`CONTRIBUTING.md`](CONTRIBUTING.md) -### API Returns 503 +Local development, workspaces, CI conventions -Check pod status: -```bash -kubectl get pods -n security-graph -kubectl describe pod -n security-graph -``` - -### Neptune Connection Errors - -Verify IRSA role is correctly configured: -```bash -kubectl describe serviceaccount api-service -n security-graph -aws iam get-role --role-name SecurityGraphApiServiceRole -``` - -### Rule Runner Job Failed - -```bash -kubectl get job -n security-graph -kubectl logs job/ -n security-graph -``` - -### Check Issue Counts - -```bash -curl https:///issues/counts -``` - -### Check Compliance Status - -```bash -curl https:///compliance/frameworks -curl https:///compliance/CIS_AWS_FOUNDATIONS -``` - -### Check Effective Permissions - -```bash -curl https:///identity/effective-permissions/arn:aws:iam::123456:role/MyRole -``` +[`CHANGELOG.md`](CHANGELOG.md) -### Find Escalation Paths - -```bash -curl https:///identity/escalation-paths?riskLevel=critical -``` - -### Review CloudTrail Analysis - -```bash -aws athena get-query-execution --query-execution-id -aws dynamodb query --table-name AccessAnalyzerCache --key-condition-expression "principalArn = :arn" --expression-attribute-values '{":arn": {"S": "arn:aws:iam::123456:role/MyRole"}}' -``` - ---- - -## Security Hardening Checklist - -Before production deployment: - -- [ ] Enable VPC Flow Logs -- [ ] Configure GuardDuty on all accounts -- [ ] Enable CloudTrail with Lake integration -- [ ] Restrict IAM roles to minimum required permissions -- [ ] Enable encryption at rest for DynamoDB -- [ ] Enable encryption in transit for Neptune -- [ ] Configure WAF on ALB -- [ ] Review and restrict NetworkPolicies -- [ ] Enable Pod Security Standards (restricted) -- [ ] Configure RBAC for namespace access -- [ ] Review compliance findings and address 
critical/high controls -- [ ] Review escalation paths detected by CIEM engine -- [ ] Apply rightsizing recommendations for over-privileged roles -- [ ] Verify CloudTrail logging is enabled for unused permission analysis -- [ ] Configure Glue table for Athena CloudTrail queries - -See `OPERATIONAL.md` for complete operational procedures. +Release history --- -## CIEM: Cloud Infrastructure Entitlement Management - -Khalifa includes a full CIEM engine that computes effective permissions, detects escalation paths, identifies unused permissions, and generates rightsizing recommendations. - -### How It Works - -1. **Enhanced IAM Collector** ingests groups, inline policies, managed policy documents, trust policies, and permission boundaries into the Neptune graph -2. **Policy Evaluator** resolves identity-based + resource-based + boundary + SCP policies into net effective permissions per principal -3. **Escalation Detector** traverses cross-account trust edges (max 3 hops) to find admin, privilege escalation, and lateral movement paths -4. **CloudTrail Analyzer** queries Athena against CloudTrail S3 logs (90-day window) and caches results in DynamoDB -5. 
**Rightsizer** compares effective permissions against actual usage to generate least-privilege recommendations - -### IAM Data in Neptune - -| Node Label | Description | Key Properties | -|------------|-------------|----------------| -| `IamUser` | IAM user | `arn`, `account_id`, `path` | -| `IamRole` | IAM role | `arn`, `account_id`, `assume_role_policy_document` | -| `IamGroup` | IAM group | `arn`, `account_id`, `path` | -| `IamPolicyDocument` | Policy document (inline or managed) | `policy_arn`, `policy_type`, `document_json` | -| `IamPolicyStatement` | Individual policy statement | `effect`, `actions`, `resources`, `conditions_json` | -| `EffectivePermission` | Computed net permissions | `principal_arn`, `allowed_actions`, `is_admin`, `blast_radius` | -| `EscalationPath` | Detected escalation path | `source_arn`, `target_arn`, `risk_level`, `escalation_type` | - -| Edge Label | Description | -|------------|-------------| -| `MEMBER_OF` | User → Group membership | -| `ATTACHED_TO` | Principal → Policy document | -| `CONTAINS` | Policy document → Statement | -| `GRANTS` | Statement → Resource | -| `TRUSTS` | External principal → Role (from trust policy) | -| `HAS_PERMISSION_BOUNDARY` | Role → Boundary policy | -| `OWNS` | Account → Principal/Group/Policy | - -### Policy Evaluation Logic - -The effect resolver follows AWS evaluation rules: - -1. **Explicit Deny** always wins -2. **Allow** from identity + resource + session policies -3. **Permission Boundary** must also allow (scoping) -4. **SCP** must also allow (organization-level scoping) -5. **Implicit Deny** if no matching allow +## License -Wildcards (`*`, `s3:*`, `s3:Get*`) are fully supported. When a permission boundary is present, `*` is expanded through the boundary to only the actions the boundary permits. 
+[](#license)
-### Condition Evaluation
-
-20+ IAM condition operators are supported:
-
-| Category | Operators |
-|----------|-----------|
-| String | `StringEquals`, `StringNotEquals`, `StringLike`, `StringNotLike` |
-| IP | `IpAddress`, `NotIpAddress` |
-| ARN | `ArnEquals`, `ArnLike`, `ArnNotEquals`, `ArnNotLike` |
-| Numeric | `NumericEquals`, `NumericLessThan`, `NumericGreaterThan`, etc. |
-| Boolean | `Bool` |
-| Date | `DateEquals`, `DateLessThan`, `DateGreaterThan`, etc. |
-| Null | `Null` |
-
-Service-specific condition keys are defined for `aws`, `s3`, `kms`, `ec2`, `lambda`, `dynamodb`, `rds`, and `ssm`.
-
-### Escalation Path Detection
-
-Paths are classified into three types:
-
-| Type | Description | Risk Level |
-|------|-------------|------------|
-| `admin` | Trust → role with `*` or `AdministratorAccess` | critical |
-| `privilege_escalation` | Trust → role with `iam:PassRole`, `iam:CreateAccessKey`, etc. | high |
-| `lateral_movement` | Cross-account trust → role with data access (S3, DynamoDB, KMS) | medium |
-
-Detection traverses trust edges up to 3 hops (configurable), detecting chained trust paths across accounts.
-
-### CloudTrail Analysis Pipeline
-
-```
-CloudTrail S3 Logs
- |
- v
-Athena Workgroup (khalifa-cloudtrail-analysis)
- |
- v
-SQL: GROUP BY principal, eventSource, eventName (90-day window)
- |
- v
-DynamoDB AccessAnalyzerCache table
- PK: principalArn
- SK: eventSource#eventName
- TTL: 90 days
- GSI: ActionIndex (reverse lookup by action)
-```
-
-Requires a Glue database (`khalifa_cloudtrail_db`) pointing to the CloudTrail S3 location.
-
-### Rightsizing Recommendations
-
-The rightsizer generates least-privilege policy diffs:
-
-- Starts with CloudTrail usage data (90-day window)
-- Applies a configurable safety margin (default: 7 days)
-- Optionally keeps safe read-only actions (`Get*`, `List*`, `Describe*`)
-- Consolidates actions by service (`s3:GetObject` + `s3:GetObjectVersion` → `s3:GetObject*`)
-- Assigns risk level based on removal ratio:
- - **low**: removes <20% of current permissions
- - **medium**: removes 20-50%
- - **high**: removes >50% (may be too aggressive)
-
-### CIEM Environment Variables
-
-| Variable | Description | Default |
-|----------|-------------|---------|
-| `ATHENA_DATABASE` | Glue database for CloudTrail logs | `khalifa_cloudtrail_db` |
-| `ATHENA_WORKGROUP` | Athena workgroup | `khalifa-cloudtrail-analysis` |
-| `CLOUDTRAIL_S3_LOCATION` | S3 prefix for CloudTrail logs | `s3://cloudtrail-logs/AWSLogs/` |
-| `ACCESS_ANALYZER_TABLE` | DynamoDB table for usage cache | `AccessAnalyzerCache` |
-| `ANALYSIS_DAYS` | CloudTrail lookback window in days | `90` |
+BSD 3-Clause. See [LICENSE](LICENSE).
\ No newline at end of file