Closed
Commits
78 commits
de371f4
chore: add skeleton files and requirements
Vishnu2707 Apr 25, 2026
dd24ce0
fix: remove embedded git repo
Vishnu2707 Apr 25, 2026
e872074
Core Structure Created
Vishnu2707 Apr 25, 2026
ee77377
feat: build complete core — scanner engine, 10 rules, API, playbooks,…
Vishnu2707 Apr 25, 2026
053be03
docs: replace ASCII architecture with interactive Mermaid diagram
Vishnu2707 Apr 25, 2026
b31ecb7
feat: Sentinel integration — ingest.py, 4 KQL rules, setup guide (#12)
TFT444 May 2, 2026
d545744
fix: add AZ-STOR-003 compliance mappings, correct NIST control to PR.…
Vishnu2707 May 4, 2026
6c0c58e
docs: add real-world breach scenarios for all 10 starter rules (#15)
TFT444 May 4, 2026
e4382cd
feat: add AZ-KV-002 key vault public access rule and remediation play…
parthrohit22 May 4, 2026
7593ba0
Merge branch 'main' into dev
Vishnu2707 May 4, 2026
0ec2290
Merge remote-tracking branch 'origin/main' into dev
Vishnu2707 May 4, 2026
e8fed83
docs: update README with rule count, roadmap progress and contributors
Vishnu2707 May 4, 2026
35312d4
feat: add network security rules AZ-NET-003 to AZ-NET-010 (#16)
TFT444 May 4, 2026
aee88b2
Merge remote-tracking branch 'origin/main' into dev
Vishnu2707 May 4, 2026
2badbce
Feat/az stor 003 (#21)
ritiksah141 May 5, 2026
1e7a81f
docs: add SOC 2 Type II compliance framework mapping (#33)
TFT444 May 8, 2026
f409b67
Refactor/azure client network methods (#22)
TFT444 May 9, 2026
bb47779
feat: add CI pipeline with 6 automated checks (#34)
ritiksah141 May 9, 2026
0d99e2d
Merge branch 'main' into dev
Vishnu2707 May 9, 2026
46096a6
Merge remote-tracking branch 'origin/main' into dev
Vishnu2707 May 9, 2026
9e5d355
docs: update .github/ISSUE_TEMPLATE/new_rule.md to reflect current co…
Vishnu2707 May 9, 2026
2a5655e
docs: update .github/PULL_REQUEST_TEMPLATE.md to reflect current code…
Vishnu2707 May 9, 2026
57f25a6
docs: update CONTRIBUTING.md to reflect current codebase state
Vishnu2707 May 9, 2026
309deca
docs: update README.md to reflect current codebase state
Vishnu2707 May 9, 2026
693b20c
docs: update compliance/frameworks/iso27001.json to reflect current c…
Vishnu2707 May 9, 2026
c292efc
docs: update compliance/frameworks/nist_csf.json to reflect current c…
Vishnu2707 May 9, 2026
034b9d5
docs: update docs/adding-a-rule.md to reflect current codebase state
Vishnu2707 May 9, 2026
936a7d6
docs: update docs/architecture.md to reflect current codebase state
Vishnu2707 May 9, 2026
3cd0f00
docs: update docs/az-stor-003-test-plan.md to reflect current codebas…
Vishnu2707 May 9, 2026
17c29f4
docs: update docs/azure-setup.md to reflect current codebase state
Vishnu2707 May 9, 2026
6275396
docs: update docs/ci-pipeline.md to reflect current codebase state
Vishnu2707 May 9, 2026
ab16a16
docs: update docs/sentinel-setup.md to reflect current codebase state
Vishnu2707 May 9, 2026
1cd89dd
docs: update sentinel/TEST_PLAN.md to reflect current codebase state
Vishnu2707 May 9, 2026
a2fed2e
docs: update docs/api-reference.md to reflect current codebase state
Vishnu2707 May 9, 2026
98894bc
docs: update docs/rules-reference.md to reflect current codebase state
Vishnu2707 May 9, 2026
fdae7e7
Merge remote-tracking branch 'origin/dev' into dev
Vishnu2707 May 9, 2026
85bbb7f
docs: update README.md for professional open source style
Vishnu2707 May 9, 2026
0643eaf
docs: update CONTRIBUTING.md for professional open source style
Vishnu2707 May 9, 2026
5ebcdd9
docs: update docs/adding-a-rule.md for professional open source style
Vishnu2707 May 9, 2026
eb88659
Merge branch 'main' into dev
Vishnu2707 May 9, 2026
2d230dd
docs: update deployment guide to use Render instead of Azure App Service
Vishnu2707 May 9, 2026
bac6146
Merge remote-tracking branch 'origin/dev' into dev
Vishnu2707 May 9, 2026
d4384fe
feat: add rule AZ-STOR-004 storage account diagnostic logging check (…
SHAURYAKSHARMA24 May 13, 2026
826396a
feat: add rule AZ-IDN-003 Adds scanner rule AZ-IDN-003 detecting Entr…
TFT444 May 13, 2026
cd47b68
feat: add rule AZ-CMP-002 — VM disk not protected by CMK or ADE (#47)
TFT444 May 13, 2026
1efe1f3
Feat/api deployment (#46)
ritiksah141 May 13, 2026
ba6c70c
feat: AZ-NET-011 Network Watcher not enabled in all regions (#42)
emon22-ts May 13, 2026
e7c3487
feat: add AZ-DB-003 PostgreSQL Flexible Server SSL enforcement rule a…
emon22-ts May 16, 2026
024e635
Merge branch 'main' into dev
Vishnu2707 May 16, 2026
bc146ef
[RULE] AZ-CMP-003: VM without endpoint protection installed (#57)
TFT444 May 23, 2026
923cc75
[DOCS] Add OpenShield learning and onboarding portal (#51)
parthrohit22 May 23, 2026
954505c
Merge branch 'main' into dev
Vishnu2707 May 24, 2026
4a2ef01
refactor: reuse database connection per request using Flask g (#41)
safidnadaf May 24, 2026
0e82402
docs: add security policy, issue template, and README badges (#64)
ritiksah141 May 24, 2026
1b25a74
feat: add rule AZ-KV-004 Key Vault purge protection disabled (#55)
aav-wh May 24, 2026
4a1b153
feat: add AZ-STOR-005 geo-redundant storage rule (#74)
SHAURYAKSHARMA24 May 27, 2026
cd339e1
feat: add rule AZ-DB-004 SQL Server firewall allows all Azure service…
aav-wh May 27, 2026
00dad53
docs: add 6 README badges (#79)
ritiksah141 May 28, 2026
d362cc7
feat: add AZ-KV-005 Key Vault certificate expiring within 30 days (#75)
TFT444 May 28, 2026
82efdfb
[RULE] AZ-CMP-004: VM without automatic OS patching enabled (#73)
TFT444 May 28, 2026
1757c84
Merge branch 'main' into dev
Vishnu2707 May 29, 2026
6ff2686
feat: add AI provider abstraction layer for Anthropic, Groq and Gemin…
TFT444 May 29, 2026
5dedde9
Smoke Test Alginment after the recent changes to the Repository causi…
ritiksah141 May 29, 2026
8cf18db
feat: add AZ-IDN-004 PIM not configured for admin roles rule and play…
emon22-ts May 30, 2026
4b2afb5
feat: add AI executive summary and remediation endpoint (#95)
SHAURYAKSHARMA24 May 30, 2026
3636dd7
feat(scanner): add AZ-NET-014 VNet peering gateway transit rule (#94)
aav-wh May 30, 2026
70cb686
feat: add AZ-NET-013 Azure Firewall VNet rule (#99)
SHAURYAKSHARMA24 May 31, 2026
bf82c39
Implement AI Q&A over scan findings (#98)
SHAURYAKSHARMA24 May 31, 2026
9a1f824
Merge branch 'main' into dev
Vishnu2707 May 31, 2026
c0116f8
Feat/CVE correlation (#96)
ritiksah141 Jun 1, 2026
3d17d7b
feat: add RAG powered AI insights layer with Azure security skill emb…
TFT444 Jun 1, 2026
a2263a4
feat: add AZ-NET-012 - NSG flow logs not enabled rule (#76)
safidnadaf Jun 1, 2026
808a9c6
fix: resolve CodeQL warnings in embed.py and test files
Vishnu2707 Jun 1, 2026
c9592c0
Merge branch 'main' into dev
Vishnu2707 Jun 1, 2026
931d32c
feat(frontend): build complete 7-page security dashboard (#111)
vogonPrayas Jun 3, 2026
673511e
Feat/jwt secret prod fail closed (#117)
ritiksah141 Jun 3, 2026
03cd7cb
feat: AI-004 RAG Pipeline - Document Ingestion and Vector Store (#104)
emon22-ts Jun 3, 2026
115320f
Potential fix for pull request finding 'Unused import'
Vishnu2707 Jun 4, 2026
2 changes: 1 addition & 1 deletion .github/workflows/deploy.yml
@@ -94,7 +94,7 @@ jobs:
if: steps.check_config.outputs.is_configured == 'true' || github.event_name == 'workflow_dispatch'
env:
API_URL: ${{ secrets.API_URL || 'https://openshield-api.onrender.com' }}
-JWT_SECRET: ${{ secrets.JWT_SECRET || 'change-me-in-production' }}
+JWT_SECRET: ${{ secrets.JWT_SECRET }}
AZURE_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
AZURE_CLIENT_SECRET: ${{ secrets.AZURE_CLIENT_SECRET }}
1 change: 1 addition & 0 deletions .gitignore
@@ -216,3 +216,4 @@ __marimo__/

# Streamlit
.streamlit/secrets.toml
+ai/vectorstore/
6 changes: 5 additions & 1 deletion README.md
@@ -76,7 +76,11 @@ The OpenShield API is deployed to the Render free tier and is accessible at:
> **Note:** As this is hosted on the Render free tier, the service may spin down after 15 minutes of inactivity. The first request after a spin-down can take 30-60 seconds to complete.

> [!IMPORTANT]
-> **Security Requirement:** For absolute security, any production deployment **must** override the default `JWT_SECRET` with a strong, unique value in the environment variables.
+> **Security Requirement:** Production deployments **fail at startup** if `JWT_SECRET` is missing, set to the insecure default, or shorter than 32 characters. Generate a strong secret with:
+> ```
+> python -c "import secrets; print(secrets.token_urlsafe(32))"
+> ```
+> Set `OPENSHIELD_ENV=production` (or rely on Render's automatic `RENDER=true`) to enable this enforcement. Local development runs without these signals are allowed to use the default with a warning.

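The fail-closed check described in the note above could be sketched roughly as follows. This is an illustration under assumptions, not the project's actual code: the names `validate_jwt_secret`, `is_production`, `INSECURE_DEFAULT` and `MIN_SECRET_LENGTH`, the exception type, and the message wording are hypothetical. Only the environment variables `JWT_SECRET`, `OPENSHIELD_ENV` and `RENDER`, the default `change-me-in-production`, and the 32-character minimum come from this PR. Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.

```python
import logging
import os

logger = logging.getLogger(__name__)

INSECURE_DEFAULT = "change-me-in-production"  # fallback removed from deploy.yml in this PR
MIN_SECRET_LENGTH = 32  # minimum length stated in the README note


def is_production(env):
    """Hypothetical helper: production is signalled by OPENSHIELD_ENV or RENDER."""
    return env.get("OPENSHIELD_ENV", "").lower() == "production" or env.get("RENDER") == "true"


def validate_jwt_secret(env=None):
    """Return the secret to use; raise RuntimeError in production if it is weak."""
    env = os.environ if env is None else env
    secret = env.get("JWT_SECRET", "")
    weak = not secret or secret == INSECURE_DEFAULT or len(secret) < MIN_SECRET_LENGTH
    if not weak:
        return secret
    if is_production(env):
        # Fail closed: refuse to start rather than run with a guessable secret.
        raise RuntimeError("JWT_SECRET is missing, default, or shorter than 32 characters")
    # Assumption: outside production a weak secret is tolerated with a warning.
    logger.warning("Using an insecure JWT_SECRET; acceptable for local development only")
    return secret or INSECURE_DEFAULT
```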
---

34 changes: 34 additions & 0 deletions ai/README.md
@@ -0,0 +1,34 @@
# OpenShield RAG Pipeline

Document loader and chunker for OpenShield rules and compliance frameworks.
Loads all scanner rules and CIS, NIST, ISO 27001 and SOC2 controls
into structured documents for the RAG vector store.

## Files

- `ai/loader.py` — loads OpenShield rules and compliance frameworks as structured documents
- `ai/chunker.py` — splits documents into overlapping chunks for embedding
- `ai/embed.py` — builds the ChromaDB vector store (from PR 97)
- `ai/retriever.py` — queries the vector store (from PR 97)

## Vector Store

The vector store is persisted at `ai/vectorstore/` using ChromaDB.

## How loader.py works

Reads all `scanner/rules/az_*.py` files and extracts:
- Rule ID, name, severity, category
- Description and remediation text

Also reads all four compliance framework JSON files:
- CIS Azure Benchmark
- NIST CSF
- ISO 27001
- SOC2
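
As a rough illustration of the rule extraction step described above, the sketch below loads one rule file as a module and reads its attributes into a structured document. The attribute names `RULE_ID`, `RULE_NAME`, `SEVERITY`, `CATEGORY`, `DESCRIPTION` and `REMEDIATION` match those read by `ai/embed.py` in this PR. The function name `rule_to_document` and the returned layout are assumptions, since `ai/loader.py` itself is not shown here; the `id` / `content` / `metadata` keys were chosen because `ai/chunker.py` reads those keys.

```python
import importlib.util
from pathlib import Path


def rule_to_document(path):
    """Load one rule file and return a document dict, or None if it has no RULE_ID."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the rule file's top-level code

    rule_id = getattr(module, "RULE_ID", None)
    if not rule_id:
        return None
    return {
        "id": f"rule-{rule_id}",
        "content": (
            f"{rule_id}: {getattr(module, 'RULE_NAME', '')}\n"
            f"Description: {getattr(module, 'DESCRIPTION', '')}\n"
            f"Remediation: {getattr(module, 'REMEDIATION', '')}"
        ),
        "metadata": {
            "type": "rule",
            "severity": getattr(module, "SEVERITY", ""),
            "category": getattr(module, "CATEGORY", ""),
        },
    }
```

Because `exec_module` executes the file, this approach only suits trusted rule files from the repository itself.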

## How chunker.py works

Splits documents into chunks of up to 512 characters, with a 64-character
overlap between consecutive chunks. Splits on newlines where possible to avoid breaking mid-sentence.
Each chunk inherits the metadata of its parent document.
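
The splitting strategy described above can be sketched in a few lines. This is a simplified, self-contained illustration rather than the code in `ai/chunker.py`: the function name `split_with_overlap` is hypothetical, and unlike the real module it does not strip whitespace or attach metadata.

```python
def split_with_overlap(text, chunk_size=512, overlap=64):
    """Split text into chunks of at most chunk_size characters that overlap."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Prefer the last newline in the window so lines stay intact.
            newline = text.rfind("\n", start + 1, end)
            if newline != -1:
                end = newline
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by the overlap, but never to or before the previous start,
        # otherwise a newline early in the window would stall the loop.
        next_start = end - overlap
        start = next_start if next_start > start else end
    return chunks
```

With the defaults and no newlines in the text, consecutive chunks start 448 characters apart (512 minus 64), so the last 64 characters of one chunk reappear at the start of the next.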
1 change: 1 addition & 0 deletions ai/__init__.py
@@ -0,0 +1 @@

47 changes: 47 additions & 0 deletions ai/chunker.py
@@ -0,0 +1,47 @@
"""Chunking pipeline for OpenShield documents."""
import logging

logger = logging.getLogger(__name__)

DEFAULT_CHUNK_SIZE = 512
DEFAULT_CHUNK_OVERLAP = 64


def chunk_documents(documents, chunk_size=DEFAULT_CHUNK_SIZE, chunk_overlap=DEFAULT_CHUNK_OVERLAP):
    """Split each document into overlapping chunks that inherit its metadata."""
    chunks = []
    for doc in documents:
        doc_id = doc.get("id", "unknown")
        content = doc.get("content", "")
        metadata = doc.get("metadata", {})
        doc_chunks = _split_text(content, chunk_size, chunk_overlap)
        for idx, chunk_text in enumerate(doc_chunks):
            chunks.append({
                "id": f"{doc_id}_chunk_{idx}",
                "content": chunk_text,
                "metadata": {**metadata, "parent_doc_id": doc_id, "chunk_index": idx, "total_chunks": len(doc_chunks)},
            })
    logger.info("Chunked %d documents into %d chunks", len(documents), len(chunks))
    return chunks


def _split_text(text, chunk_size, chunk_overlap):
    """Split text into chunks of at most chunk_size characters, preferring newline boundaries."""
    if len(text) <= chunk_size:
        return [text]
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        if end >= len(text):
            chunks.append(text[start:].strip())
            break
        # Prefer to break on the last newline inside the current window.
        split_pos = text.rfind("\n", start, end)
        if split_pos == -1 or split_pos <= start:
            split_pos = end
        chunk = text[start:split_pos].strip()
        if chunk:
            chunks.append(chunk)
        # Step back by the overlap, but always move forward: if the newline sits
        # within chunk_overlap characters of start, stepping back would repeat
        # the same window forever.
        next_start = split_pos - chunk_overlap
        if next_start <= start:
            next_start = split_pos
        start = next_start
    return [c for c in chunks if c]
138 changes: 138 additions & 0 deletions ai/embed.py
@@ -0,0 +1,138 @@
"""Build the OpenShield knowledge base vector store for RAG AI insights"""


import importlib.util
import json
import logging
from pathlib import Path

import chromadb

logger = logging.getLogger(__name__)

REPO_ROOT = Path(__file__).resolve().parent.parent
RULES_DIR = REPO_ROOT / "scanner" / "rules"
FRAMEWORKS_DIR = REPO_ROOT / "compliance" / "frameworks"
SKILLS_DIR = REPO_ROOT / "ai" / "knowledge" / "skills"
VECTORSTORE_DIR = REPO_ROOT / "ai" / "vectorstore"
COLLECTION_NAME = "openshield"


def _load_rule_module(path):
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def _collect_skill_documents():
    documents = []
    if not SKILLS_DIR.exists():
        logger.warning("Skills directory not found, skipping: %s", SKILLS_DIR)
        return documents
    for path in sorted(SKILLS_DIR.rglob("SKILL.md")):
        try:
            text = path.read_text(encoding="utf-8")
        except Exception as exc:
            logger.warning("Skipping %s: %s", path.name, exc)
            continue
        if not text.strip():
            continue
        skill_name = path.parent.name
        documents.append({
            "id": f"skill-{skill_name}",
            "text": text,
            "source": skill_name,
            "type": "skill",
        })
    return documents


def _collect_rule_documents():
    documents = []
    for path in sorted(RULES_DIR.glob("az_*.py")):
        try:
            module = _load_rule_module(path)
        except Exception as exc:
            logger.warning("Skipping %s: %s", path.name, exc)
            continue
        rule_id = getattr(module, "RULE_ID", None)
        if not rule_id:
            continue
        text = (
            f"OpenShield rule {rule_id}: {getattr(module, 'RULE_NAME', '')}\n"
            f"Category: {getattr(module, 'CATEGORY', '')}\n"
            f"Severity: {getattr(module, 'SEVERITY', '')}\n"
            f"Description: {getattr(module, 'DESCRIPTION', '')}\n"
            f"Remediation: {getattr(module, 'REMEDIATION', '')}"
        )
        documents.append({
            "id": f"rule-{rule_id}",
            "text": text,
            "source": rule_id,
            "type": "rule",
        })
    return documents


def _collect_compliance_documents():
    documents = []
    for path in sorted(FRAMEWORKS_DIR.glob("*.json")):
        framework = path.stem
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except Exception as exc:
            logger.warning("Skipping %s: %s", path.name, exc)
            continue
        for control_id, control in data.get("controls", {}).items():
            description = control.get("description", "")
            if not description:
                continue
            text = (
                f"{framework} control {control_id}: "
                f"{control.get('control_name', '')}\n{description}"
            )
            documents.append({
                "id": f"{framework}-{control_id}",
                "text": text,
                "source": f"{framework} {control_id}",
                "type": "control",
            })
    return documents


def build_vectorstore():
    VECTORSTORE_DIR.mkdir(parents=True, exist_ok=True)
    client = chromadb.PersistentClient(path=str(VECTORSTORE_DIR))

    try:
        client.delete_collection(COLLECTION_NAME)
    except Exception as exc:
        logger.info("Could not delete collection '%s' before rebuild: %s", COLLECTION_NAME, exc)
    collection = client.create_collection(COLLECTION_NAME)

    documents = (
        _collect_skill_documents()
        + _collect_rule_documents()
        + _collect_compliance_documents()
    )
    if not documents:
        raise RuntimeError("No documents found to embed. Check repo paths.")

    collection.add(
        ids=[d["id"] for d in documents],
        documents=[d["text"] for d in documents],
        metadatas=[
            {"source": d["source"], "type": d["type"]} for d in documents
        ],
    )
    logger.info(
        "Embedded %d documents into '%s'.", len(documents), COLLECTION_NAME
    )
    return len(documents)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    count = build_vectorstore()
    print(f"Done. Vector store built with {count} documents at {VECTORSTORE_DIR}")
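
For orientation, `_collect_compliance_documents` in `ai/embed.py` expects each framework file to hold a top-level `controls` mapping whose values carry `control_name` and `description`. The sketch below applies the same flattening to an in-memory example. The control IDs `EX-1` and `EX-2`, their names and descriptions, and the framework name `example_framework` are made-up sample data rather than content from the repository's JSON files, and `controls_to_documents` is a hypothetical name.

```python
import json


def controls_to_documents(framework, raw_json):
    """Flatten one framework's JSON text into embeddable documents."""
    data = json.loads(raw_json)
    documents = []
    for control_id, control in data.get("controls", {}).items():
        description = control.get("description", "")
        if not description:
            continue  # controls without a description are skipped, as in embed.py
        documents.append({
            "id": f"{framework}-{control_id}",
            "text": f"{framework} control {control_id}: {control.get('control_name', '')}\n{description}",
            "source": f"{framework} {control_id}",
            "type": "control",
        })
    return documents


# Made-up sample data illustrating the expected file shape.
sample = json.dumps({
    "controls": {
        "EX-1": {"control_name": "Example control", "description": "Example description."},
        "EX-2": {"control_name": "No description"},
    }
})
docs = controls_to_documents("example_framework", sample)
```

Here `docs` holds a single entry for `EX-1`; `EX-2` is dropped because it has no description.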