diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index f7053b1..6368d67 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -10,6 +10,22 @@ on: - main jobs: + lint: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Set up uv + uses: astral-sh/setup-uv@v4 + with: + version: "latest" + + - name: Install dependencies + run: uv sync --dev + + - name: Lint with ruff + run: uv run ruff check . + test: runs-on: ubuntu-latest strategy: diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index bfa25e1..b3374cb 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,49 +1,89 @@ -# Contributing +# Contributing to mAIcro -Thanks for your interest in contributing to **mAIcro**. +Thank you for your interest in contributing to mAIcro. This document sets the technical and cultural standards for contributions to ensure the project remains high-quality, reliable, and aligned with its core architecture. -## Ways to help +## Project Philosophy -- Report bugs (include steps to reproduce and expected vs actual behavior). -- Suggest features (include the problem you’re trying to solve and constraints). -- Improve docs (typos, examples, setup clarity). -- Send PRs (small, focused changes are easiest to review). +mAIcro is designed as a focused, high-performance RAG (Retrieval-Augmented Generation) service for organizations. Every contribution must respect these foundational principles: -If you’ve found a security issue, please follow `SECURITY.md` instead of opening a public issue. +* **Simple and Clear** We favor straightforward implementations over complex abstractions. The codebase should be navigable without deep knowledge of specialized frameworks. +* **Controlled Deployments** mAIcro is built for trusted environments (e.g., behind a Discord bot). It is not intended for use as a multi-tenant public API. 
+* **Performance is a Feature** Every new abstraction or feature should be evaluated for its impact on performance and operational reliability. -Start the local dev service: +## Ways to Contribute -```bash -uv sync --dev -cp .env.example .env -# Edit .env and set at least: -# LLM_PROVIDER=google -# GEMINI_API_KEY=... -# QDRANT_URL=http://localhost:6333 -``` +* **Bug Reports**: Report verified bugs via GitHub Issues with clear reproduction steps, logs, and environment details. +* **Feature Suggestions**: Propose new capabilities that align with the project's stateless philosophy. Open an issue for discussion before implementing. +* **Documentation**: Improve clarity in READMEs, docstrings, or deployment guides. +* **Pull Requests**: Submit code changes for verified bugs or approved features. -## Common commands +## Local Development -Run tests: +mAIcro uses a single **Docker Compose** configuration for local development. We provide a **Makefile** to simplify common commands. -```bash -uv run pytest -``` +### Prerequisites + +* [Docker](https://www.docker.com/) and [Docker Compose](https://docs.docker.com/compose/) +* **Google Gemini API Key**: Required for LLM generation. + +### Setup + +1. Initialize your environment: + ```bash + cp .env.example .env + ``` + +2. Set required variables in `.env`: + * `GEMINI_API_KEY`: Your Google AI Studio API key. + * `QDRANT_API_KEY`: Your Qdrant API key. + * `QDRANT_URL`: Your Qdrant URL. + * `DISCORD_BOT_TOKEN`: Your Discord bot token. + * `DISCORD_CHANNEL_ID`: Your Discord channel ID. + + +3. Launch the development stack: + ```bash + make dev + ``` -Start the API locally: +The API will be available at `http://localhost:8000`. Documentation and interactive API testing are provided at `http://localhost:8000/api/v1/docs`. + +Hot-reloading is enabled through volume mapping, so code changes will automatically update the running service. + +## Testing + +We expect all logic changes to be covered by tests. 
All tests are run within the Docker environment to ensure accurate behavior. ```bash -uv run uvicorn main:app --reload --host 0.0.0.0 --port 8000 +make test ``` -## Pull requests +## Common Commands + +* `make dev` - Start the project in the background with hot-reloading. +* `make test` - Run the test suite reliably inside the container. +* `make lint` - Check code style and common errors using `ruff`. +* `make format` - Automatically reformat code to match project standards. +* `make build` - Build the production Docker image locally. +* `make stop` - Stop all project containers. +* `make clean` - Reset environment (remove volumes, images, and orphans). +* `make logs` - Stream output from the application container. +* `make shell` - Open a terminal inside the running application container. + +## Pull Request Process + +1. **Branching**: Create a feature branch from `main` with a descriptive name (e.g., `feature/hybrid-search`) and open a pull request against `dev`. +2. **Atomic Commits**: Ensure each commit represents a single logical change. +3. **Documentation**: Update relevant documentation if the change affects usage or architecture. +4. **Review Cycle**: All PRs require at least one maintainer review. Be prepared to iterate on feedback. + + +## Security -- Keep PRs focused and small when possible. -- Include tests for behavior changes and bug fixes. -- Update `README.md` / docs for user-facing changes. -- Don’t commit secrets (keep `.env` local; update `.env.example` if you add a new setting). -- Don’t commit local state directories (for example `var/` or `local_qdrant/`). +If you discover a security vulnerability, do NOT open a public issue. Follow the process outlined in [SECURITY.md](SECURITY.md) for private disclosure. -## Reporting issues +## Issue Reporting -Use GitHub Issues for bugs and feature requests. +When opening an issue: +1. Be descriptive and objective. +2. 
For feature requests, explain why the feature belongs in the core stateless backend rather than a client-side implementation. diff --git a/Dockerfile b/Dockerfile index 148a656..d82f5a1 100644 --- a/Dockerfile +++ b/Dockerfile @@ -9,13 +9,11 @@ WORKDIR /app COPY pyproject.toml uv.lock README.md ./ -RUN --mount=type=cache,target=/root/.cache/uv \ - uv sync --frozen --no-install-project --no-dev +RUN uv sync --frozen --no-install-project --no-dev COPY src ./src -RUN --mount=type=cache,target=/root/.cache/uv \ - uv sync --frozen +RUN uv sync --frozen FROM python:3.12-slim @@ -30,4 +28,4 @@ COPY src ./src EXPOSE 8000 -CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"] +CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"] \ No newline at end of file diff --git a/Makefile b/Makefile new file mode 100644 index 0000000..b5ba518 --- /dev/null +++ b/Makefile @@ -0,0 +1,56 @@ +.PHONY: dev build prod stop clean test lint format logs shell help + +# Default target +help: + @echo "mAIcro Development CLI" + @echo "----------------------" + @echo "make dev - Start the development stack with hot-reloading" + @echo "make test - Run the test suite inside the container" + @echo "make lint - Check code style with ruff" + @echo "make format - Reformat code with ruff" + @echo "make build - Build the production Docker image" + @echo "make prod - Start the production stack" + @echo "make stop - Stop all project containers" + @echo "make clean - Reset environment (remove volumes and orphans)" + @echo "make logs - Stream logs from the application" + @echo "make shell - Open a shell inside the container" + +# Start development stack +dev: + docker compose -f docker-compose.dev.yml up --build -d + +# Build production image +build: + docker build -t maicro:latest . 
+ +# Start production stack +prod: + docker compose up --build -d + +# Stop the project +stop: + docker compose -f docker-compose.yml -f docker-compose.dev.yml down + +# Full clean reset +clean: + docker compose -f docker-compose.yml -f docker-compose.dev.yml down -v --rmi all --remove-orphans + +# Run tests +test: + docker compose exec maicro /app/.venv/bin/pytest + +# Lint code +lint: + docker compose exec maicro /app/.venv/bin/ruff check . + +# Format code +format: + docker compose exec maicro /app/.venv/bin/ruff format . + +# View logs +logs: + docker compose logs -f maicro + +# Interactive shell +shell: + docker compose exec maicro /bin/bash diff --git a/README.md b/README.md index a8bdd98..06fa889 100644 --- a/README.md +++ b/README.md @@ -5,6 +5,8 @@ # mAIcro: Open Source Knowledge Service +Built and maintained by the **Dev Department of MicroClub**, the computer science club at **USTHB** (University of Science and Technology Houari Boumediene, Algiers). + **mAIcro** is an open-source AI service designed to centralize organizational knowledge and answer questions via RAG (Retrieval-Augmented Generation). It features a stateless architecture optimized for cloud deployment, automatic Discord integration, and production-ready performance. ## Table of Contents diff --git a/SECURITY.md b/SECURITY.md new file mode 100644 index 0000000..2bdd538 --- /dev/null +++ b/SECURITY.md @@ -0,0 +1,44 @@ +# Security Policy + +## Reporting a Vulnerability + +We take the security of mAIcro seriously. If you believe you have found a security vulnerability, please report it to us as soon as possible. + +**How to Report:** +- **Confidential Reporting**: Please email us at [microclubit@gmail.com](mailto:microclubit@gmail.com). +- **GitHub Security Advisory**: Alternatively, you can use the ["Report a vulnerability"](https://github.com/MicroClub-USTHB/mAIcro/security/advisories/new) button on GitHub. + +Please include: +1. A description of the vulnerability. +2. 
Steps to reproduce the issue. +3. Potential impact. + +We will acknowledge your report within 48 hours and provide a timeline for resolution. + +--- + +## Security Best Practices for Users + +To keep your mAIcro instance secure, please follow these guidelines: + +### 1. Protect Your API Keys +mAIcro relies on several sensitive API keys: +- `GEMINI_API_KEY` +- `QDRANT_API_KEY` +- `DISCORD_BOT_TOKEN` + +**Never commit these keys to version control.** Always use the `.env` file (which is included in `.gitignore`) or use a secure secret management service (like GitHub Secrets, AWS Secrets Manager, or HashiCorp Vault). + +### 2. Least Privilege (Discord Bot) +When setting up your Discord bot, only grant the minimum required permissions: +- `View Channels` +- `Read Message History` +- `Message Content Intent` (required for RAG functionality) + +Avoid granting `Administrator` or other broad permissions unless absolutely necessary for your specific use case. + +### 3. Environment Isolation +Use separate API keys and Qdrant collections for development and production environments to prevent accidental data loss or exposure. + +### 4. Regular Updates +Regularly pull the latest Docker image (`ghcr.io/microclub-usthb/maicro:latest`) to ensure you have the latest security patches and features. 
diff --git a/docker-compose.dev.yml b/docker-compose.dev.yml index 0766500..2685f40 100644 --- a/docker-compose.dev.yml +++ b/docker-compose.dev.yml @@ -11,10 +11,11 @@ services: env_file: - .env volumes: - - ./src:/app/src # bind-mount source for hot reload + - ./src:/app/src + - ./tests:/app/tests - ./pyproject.toml:/app/pyproject.toml - ./uv.lock:/app/uv.lock - - /app/.venv # keep venv persistent inside container + - /app/.venv ports: - "8000:8000" command: > diff --git a/pyproject.toml b/pyproject.toml index c280b39..9228b93 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -47,4 +47,5 @@ pythonpath = ["src"] [dependency-groups] dev = [ "pytest>=8.3.0", + "ruff>=0.9.3", ] diff --git a/src/api/error_handlers.py b/src/api/error_handlers.py index 2f4ebb2..d5fb270 100644 --- a/src/api/error_handlers.py +++ b/src/api/error_handlers.py @@ -32,4 +32,6 @@ async def handle_value_error(_: Request, exc: ValueError) -> JSONResponse: @app.exception_handler(Exception) async def handle_unexpected_error(_: Request, exc: Exception) -> JSONResponse: logger.exception("Unhandled API exception", exc_info=exc) - return JSONResponse(status_code=500, content={"detail": f"Internal error: {exc}"}) + return JSONResponse( + status_code=500, content={"detail": f"Internal error: {exc}"} + ) diff --git a/src/core/audit.py b/src/core/audit.py index 179d207..61ea006 100644 --- a/src/core/audit.py +++ b/src/core/audit.py @@ -2,21 +2,31 @@ Audit module — handles startup reconciliation of offline edits and deletes. 
""" +import logging from core.config import settings from core.discord_fetcher import fetch_channel_messages, fetch_message_by_id -from core.state import get_last_ingested_message_id, ensure_channel_in_state, update_last_ingested_message_id -import logging +from core.state import ( + get_last_ingested_message_id, + ensure_channel_in_state, + update_last_ingested_message_id, +) logger = logging.getLogger(__name__) +def _message_id_to_int(message_id: str) -> int | None: + try: + return int(str(message_id)) + except (TypeError, ValueError): + return None + + async def run_startup_audit( channel_ids: list[str], window: int = 1000, ) -> dict: - from qdrant_client.http import models as qdrant_models from core.ingestion import ( @@ -47,7 +57,11 @@ async def run_startup_audit( "[audit] Channel %s: no cursor found or channel never ingested, skipping audit", channel_id, ) - summary[channel_id] = {"deleted": 0, "updated": 0, "skipped": "new_channel"} + summary[channel_id] = { + "deleted": 0, + "updated": 0, + "skipped": "new_channel", + } continue cursor_msg = await fetch_message_by_id( @@ -59,7 +73,8 @@ async def run_startup_audit( if cursor_msg is None: logger.info( "[audit] Channel %s: cursor message %s was deleted, searching for new cursor", - channel_id, last_message_id, + channel_id, + last_message_id, ) recent_messages: list[dict] = await fetch_channel_messages( bot_token=settings.DISCORD_BOT_TOKEN, @@ -70,7 +85,9 @@ async def run_startup_audit( new_cursor = recent_messages[0]["id"] logger.info( "[audit] Channel %s: updating cursor from %s to %s", - channel_id, last_message_id, new_cursor, + channel_id, + last_message_id, + new_cursor, ) update_last_ingested_message_id(channel_id, new_cursor) last_message_id = new_cursor @@ -80,12 +97,16 @@ async def run_startup_audit( "[audit] Channel %s: no messages found in channel at all", channel_id, ) - summary[channel_id] = {"deleted": 0, "updated": 0, "skipped": "channel_empty"} + summary[channel_id] = { + "deleted": 0, + 
"updated": 0, + "skipped": "channel_empty", + } continue - + # Check if cursor message was edited - get stored content from Qdrant cursor_id_str = str(cursor_msg["id"]) - + # Get the stored content for cursor from Qdrant cursor_stored_content = "" try: @@ -106,10 +127,12 @@ async def run_startup_audit( ) cursor_points, _ = search_result if cursor_points: - cursor_stored_content = (cursor_points[0].payload.get("page_content") or "").strip() + cursor_stored_content = ( + cursor_points[0].payload.get("page_content") or "" + ).strip() except Exception as e: logger.warning(f"[audit] Failed to get cursor content from Qdrant: {e}") - + cursor_docs = _docs_from_discord_messages([cursor_msg], channel_id) cursor_content = cursor_docs[0].page_content if cursor_docs else "" @@ -120,29 +143,45 @@ async def run_startup_audit( ) update_message_in_store(cursor_msg, channel_id) updated += 1 - + recent: list[dict] = await fetch_channel_messages( bot_token=settings.DISCORD_BOT_TOKEN, channel_id=channel_id, limit=window, before=last_message_id, ) - + discord_index: dict[str, str] = {} + recent_lookup: dict[str, dict] = {} cursor_id_str = str(cursor_msg["id"]) discord_index[cursor_id_str] = cursor_content - + for msg in recent: docs = _docs_from_discord_messages([msg], channel_id) msg_id_str = str(msg["id"]) discord_index[msg_id_str] = docs[0].page_content if docs else "" + recent_lookup[msg_id_str] = msg if not discord_index: logger.info( "[audit] Channel %s: no messages before cursor to audit", channel_id, ) - summary[channel_id] = {"deleted": 0, "updated": 0, "skipped": "no_messages_before_cursor"} + summary[channel_id] = { + "deleted": 0, + "updated": 0, + "skipped": "no_messages_before_cursor", + } continue + + cursor_id_int = _message_id_to_int(cursor_id_str) + audited_ids = [ + parsed_id + for parsed_id in ( + _message_id_to_int(msg_id) for msg_id in discord_index + ) + if parsed_id is not None + ] + audit_lower_bound = min(audited_ids) if audited_ids else cursor_id_int try: 
_bootstrap_collection() except Exception as e: @@ -162,7 +201,9 @@ async def run_startup_audit( ) logger.info( "[audit] Channel %s: checking %d Qdrant points against %d messages before cursor", - channel_id, total_count_result.count, len(discord_index), + channel_id, + total_count_result.count, + len(discord_index), ) all_points_filter = qdrant_models.Filter( must=[ @@ -191,9 +232,29 @@ async def run_startup_audit( if not msg_id: continue msg_id_str = str(msg_id) + msg_id_int = _message_id_to_int(msg_id_str) + if msg_id_int is None: + logger.debug( + "[audit] Skipping non-numeric message_id=%s", msg_id_str + ) + continue + if ( + audit_lower_bound is not None + and cursor_id_int is not None + and not audit_lower_bound <= msg_id_int <= cursor_id_int + ): + logger.debug( + "[audit] Skipping message_id=%s outside audited window [%s, %s]", + msg_id_str, + audit_lower_bound, + cursor_id_int, + ) + continue logger.debug( "[audit] Checking msg_id=%s (type=%s) - in discord_index=%s", - msg_id_str, type(msg_id), msg_id_str in discord_index, + msg_id_str, + type(msg_id), + msg_id_str in discord_index, ) if msg_id_str not in discord_index: n = delete_message_from_store(channel_id, msg_id_str) @@ -201,7 +262,8 @@ async def run_startup_audit( if n: logger.info( "[audit] deleted message_id=%s from channel %s (was deleted offline)", - msg_id_str, channel_id, + msg_id_str, + channel_id, ) else: logger.debug( @@ -218,15 +280,14 @@ async def run_startup_audit( if msg_id_str == cursor_id_str: msg_dict = cursor_msg else: - msg_dict = next( - (m for m in recent if str(m["id"]) == msg_id_str), None - ) + msg_dict = recent_lookup.get(msg_id_str) if msg_dict: update_message_in_store(msg_dict, channel_id) updated += 1 logger.info( "[audit] updated message_id=%s in channel %s (edited offline)", - msg_id_str, channel_id, + msg_id_str, + channel_id, ) if next_offset is None: @@ -235,17 +296,27 @@ async def run_startup_audit( logger.info( "[audit] Channel %s audit complete: deleted=%d, 
updated=%d", - channel_id, deleted, updated, + channel_id, + deleted, + updated, ) except Exception as exc: - logger.warning("[audit] Unexpected error for channel %s: %s", channel_id, exc) + logger.warning( + "[audit] Unexpected error for channel %s: %s", channel_id, exc + ) errors_list.append(str(exc)) - summary[channel_id] = {"deleted": deleted, "updated": updated, "errors": errors_list} + summary[channel_id] = { + "deleted": deleted, + "updated": updated, + "errors": errors_list, + } logger.info( "[audit] channel=%s deleted=%d updated=%d", - channel_id, deleted, updated, + channel_id, + deleted, + updated, ) - return summary \ No newline at end of file + return summary diff --git a/src/core/config.py b/src/core/config.py index b140515..989b59c 100644 --- a/src/core/config.py +++ b/src/core/config.py @@ -29,10 +29,9 @@ class Settings(BaseSettings): QDRANT_URL: Optional[str] = None QDRANT_API_KEY: Optional[str] = None COLLECTION_NAME: str = "microclub_knowledge" - - - HYBRID_SEARCH_ALPHA: float = 0.7 - HYBRID_SEARCH_RRF_K: int = 60 + + HYBRID_SEARCH_ALPHA: float = 0.7 + HYBRID_SEARCH_RRF_K: int = 60 model_config = SettingsConfigDict( case_sensitive=True, @@ -52,4 +51,3 @@ def discord_channel_id_list(self) -> List[str]: settings = Settings() - diff --git a/src/core/discord_listener.py b/src/core/discord_listener.py index e9ed2b8..257de4e 100644 --- a/src/core/discord_listener.py +++ b/src/core/discord_listener.py @@ -19,7 +19,6 @@ logger = logging.getLogger(__name__) - async def handle_message_create(message: dict, channel_ids: set[str]) -> None: """ Process a single MESSAGE_CREATE payload (as a plain dict). 
@@ -33,12 +32,12 @@ async def handle_message_create(message: dict, channel_ids: set[str]) -> None: if not docs: return - count = await asyncio.to_thread(ingest_documents, docs) if count: update_last_ingested_message_id(channel_id, message["id"]) logger.info("[listener] ingested %d doc(s) from channel %s", count, channel_id) + async def handle_message_delete(payload: dict, channel_ids: set[str]) -> None: """ Process a MESSAGE_DELETE payload (plain dict). @@ -58,7 +57,9 @@ async def handle_message_delete(payload: dict, channel_ids: set[str]) -> None: n = await asyncio.to_thread(delete_message_from_store, channel_id, message_id) logger.info( "[listener] deleted %d point(s) for message_id=%s channel=%s", - n, message_id, channel_id, + n, + message_id, + channel_id, ) @@ -78,7 +79,9 @@ async def handle_message_update(payload: dict, channel_ids: set[str]) -> None: if count: logger.info( "[listener] updated message_id=%s in channel %s (%d doc(s))", - payload.get("id"), channel_id, count, + payload.get("id"), + channel_id, + count, ) else: logger.debug( @@ -87,9 +90,8 @@ async def handle_message_update(payload: dict, channel_ids: set[str]) -> None: ) - def _message_to_dict(msg) -> dict: - + return { "id": str(msg.id), "channel_id": str(msg.channel.id), @@ -106,14 +108,14 @@ def _message_to_dict(msg) -> dict: } - - async def run_discord_listener(bot_token: str, channel_ids: list[str]) -> None: """ Connect to Discord via discord.py and listen for new messages forever. 
""" if not bot_token: - logger.error("[listener] DISCORD_BOT_TOKEN is not set — listener will not start") + logger.error( + "[listener] DISCORD_BOT_TOKEN is not set — listener will not start" + ) return if not channel_ids: logger.error("[listener] No channel IDs configured — listener will not start") @@ -200,7 +202,10 @@ async def on_raw_message_edit(payload: discord.RawMessageUpdateEvent) -> None: delay = min(BASE_DELAY * 2 ** (attempt - 1), MAX_DELAY) logger.warning( "[listener] Disconnected (attempt %d/%d): %s. Retrying in %.0fs…", - attempt, MAX_RETRIES, exc, delay, + attempt, + MAX_RETRIES, + exc, + delay, ) # Close the old client before reconnecting try: @@ -214,7 +219,9 @@ async def on_raw_message_edit(payload: discord.RawMessageUpdateEvent) -> None: @client.event async def on_ready() -> None: logger.info( - "[listener] READY — logged in as %s (id=%s)", client.user, client.user.id + "[listener] READY — logged in as %s (id=%s)", + client.user, + client.user.id, ) @client.event @@ -230,23 +237,31 @@ async def on_message(msg: discord.Message) -> None: await handle_message_create(_message_to_dict(msg), watched) @client.event - async def on_raw_message_delete(payload: discord.RawMessageDeleteEvent) -> None: + async def on_raw_message_delete( + payload: discord.RawMessageDeleteEvent, + ) -> None: channel_id = str(payload.channel_id) message_id = str(payload.message_id) logger.debug( - "[listener] MESSAGE_DELETE channel=%s message_id=%s", channel_id, message_id + "[listener] MESSAGE_DELETE channel=%s message_id=%s", + channel_id, + message_id, ) await handle_message_delete( {"channel_id": channel_id, "id": message_id}, watched ) @client.event - async def on_raw_message_edit(payload: discord.RawMessageUpdateEvent) -> None: + async def on_raw_message_edit( + payload: discord.RawMessageUpdateEvent, + ) -> None: data: dict = payload.data channel_id = str(payload.channel_id) message_id = str(payload.message_id) logger.debug( - "[listener] MESSAGE_UPDATE channel=%s 
message_id=%s", channel_id, message_id + "[listener] MESSAGE_UPDATE channel=%s message_id=%s", + channel_id, + message_id, ) # Build a normalised dict compatible with handle_message_update. msg_dict = { @@ -254,9 +269,13 @@ async def on_raw_message_edit(payload: discord.RawMessageUpdateEvent) -> None: "channel_id": channel_id, "content": data.get("content", ""), "author": data.get("author", {"username": "unknown"}), - "timestamp": data.get("edited_timestamp") or data.get("timestamp", ""), + "timestamp": data.get("edited_timestamp") + or data.get("timestamp", ""), "embeds": [ - {"title": e.get("title", ""), "description": e.get("description", "")} + { + "title": e.get("title", ""), + "description": e.get("description", ""), + } for e in data.get("embeds", []) ], } diff --git a/src/core/hybrid_search.py b/src/core/hybrid_search.py index 7e1e125..d5da16b 100644 --- a/src/core/hybrid_search.py +++ b/src/core/hybrid_search.py @@ -6,7 +6,6 @@ import logging -from qdrant_client import QdrantClient from qdrant_client.http import models as qdrant_models from langchain_core.documents import Document from langchain_core.retrievers import RetrieverLike @@ -14,39 +13,41 @@ from core.config import settings from core.llm_provider import get_embeddings + def _get_qdrant_client(): from core.vector_store import get_qdrant_client + return get_qdrant_client() + logger = logging.getLogger(__name__) def _reciprocal_rank_fusion( - results_by_source: dict[str, list[tuple[str, float]]], - k: int | None = None + results_by_source: dict[str, list[tuple[str, float]]], k: int | None = None ) -> list[tuple[str, float]]: """Combine ranked lists using Reciprocal Rank Fusion. 
- + RRF_score(doc) = Σ (1 / (k + rank(doc))) - + Args: results_by_source: Dict mapping source name to list of (doc_id, score) tuples k: Constant for RRF (defaults to settings.HYBRID_SEARCH_RRF_K = 60) - + Returns: List of (doc_id, rrf_score) tuples, sorted by rrf_score descending """ if k is None: - k = getattr(settings, 'HYBRID_SEARCH_RRF_K', 60) - + k = getattr(settings, "HYBRID_SEARCH_RRF_K", 60) + rrf_scores: dict[str, float] = {} - + for source, results in results_by_source.items(): for rank, (doc_id, score) in enumerate(results, start=1): if doc_id not in rrf_scores: rrf_scores[doc_id] = 0.0 rrf_scores[doc_id] += 1.0 / (k + rank) - + # Sort by RRF score descending sorted_results = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True) return sorted_results @@ -64,20 +65,20 @@ def hybrid_search( filter_condition: qdrant_models.Filter | None = None, ) -> list[Document]: """Execute hybrid search combining semantic (vector) and keyword (BM25) search. - - + + Returns: List of Documents sorted by RRF score """ if alpha is None: - alpha = getattr(settings, 'HYBRID_SEARCH_ALPHA', 0.7) - + alpha = getattr(settings, "HYBRID_SEARCH_ALPHA", 0.7) + client = _get_qdrant_client() collection_name = settings.COLLECTION_NAME embedding = get_embeddings() - + query_vector = embedding.embed_query(query) - + vector_results = client.query_points( collection_name=collection_name, query=query_vector, @@ -86,20 +87,19 @@ def hybrid_search( with_payload=True, with_vectors=False, ) - - + text_match_condition = qdrant_models.FieldCondition( key="page_content", match=qdrant_models.MatchText(text=query), ) - + if filter_condition: text_filter = qdrant_models.Filter( must=[filter_condition, text_match_condition] ) else: text_filter = qdrant_models.Filter(must=[text_match_condition]) - + text_results = client.scroll( collection_name=collection_name, scroll_filter=text_filter, @@ -107,21 +107,21 @@ def hybrid_search( with_payload=True, with_vectors=False, ) - + message_id_docs = [] if 
_is_message_id(query): message_id_condition = qdrant_models.FieldCondition( key="metadata.message_id", match=qdrant_models.MatchValue(value=query), ) - + if filter_condition: message_id_filter = qdrant_models.Filter( must=[filter_condition, message_id_condition] ) else: message_id_filter = qdrant_models.Filter(must=[message_id_condition]) - + message_id_results = client.scroll( collection_name=collection_name, scroll_filter=message_id_filter, @@ -129,43 +129,43 @@ def hybrid_search( with_payload=True, with_vectors=False, ) - + for hit in message_id_results[0]: payload = hit.payload or {} doc_id = hit.id score = 2.0 message_id_docs.append((doc_id, score, payload)) - + vector_docs = [] for hit in vector_results.points: payload = hit.payload or {} doc_id = hit.id - score = hit.score if hasattr(hit, 'score') else 1.0 + score = hit.score if hasattr(hit, "score") else 1.0 vector_docs.append((doc_id, score, payload)) - + text_docs = [] - for hit in text_results[0]: + for hit in text_results[0]: payload = hit.payload or {} doc_id = hit.id score = 1.0 text_docs.append((doc_id, score, payload)) - - rrf_k = getattr(settings, 'HYBRID_SEARCH_RRF_K', 60) - + + rrf_k = getattr(settings, "HYBRID_SEARCH_RRF_K", 60) + results_by_source = {} - + vector_ranked = [(doc_id, score) for doc_id, score, _ in vector_docs] - results_by_source['vector'] = vector_ranked - + results_by_source["vector"] = vector_ranked + text_ranked = [(doc_id, 1.0) for doc_id, _, _ in text_docs] - results_by_source['text'] = text_ranked - + results_by_source["text"] = text_ranked + if message_id_docs: message_id_ranked = [(doc_id, score) for doc_id, score, _ in message_id_docs] - results_by_source['message_id'] = message_id_ranked - + results_by_source["message_id"] = message_id_ranked + rrf_scores = _reciprocal_rank_fusion(results_by_source, k=rrf_k) - + doc_map = {} for doc_id, score, payload in vector_docs: doc_map[doc_id] = payload @@ -175,7 +175,7 @@ def hybrid_search( for doc_id, score, payload in 
message_id_docs: if doc_id not in doc_map: doc_map[doc_id] = payload - + documents = [] for doc_id, rrf_score in rrf_scores: if doc_id in doc_map: @@ -188,24 +188,24 @@ def hybrid_search( metadata=metadata, ) ) - + return documents[:k] class _HybridRetriever: """Wrapper class to make hybrid search compatible with LangChain's retriever interface.""" - + def __init__(self, alpha: float | None = None, k: int = 5): self.alpha = alpha self.k = k - + def invoke(self, query: str) -> list[Document]: return hybrid_search( query=query, alpha=self.alpha, k=self.k, ) - + def __call__(self, query: str) -> list[Document]: return self.invoke(query) @@ -215,8 +215,8 @@ def get_hybrid_retriever( k: int = 5, ) -> RetrieverLike: """Return a hybrid search retriever interface. - - + + Returns: A retriever with .invoke() method """ diff --git a/src/core/ingestion.py b/src/core/ingestion.py index 604fb81..8906024 100644 --- a/src/core/ingestion.py +++ b/src/core/ingestion.py @@ -28,7 +28,6 @@ logger = logging.getLogger(__name__) - def _is_missing_collection_error(message: str) -> bool: """Return True when an exception message indicates the Qdrant collection is missing.""" if not message: @@ -37,7 +36,9 @@ def _is_missing_collection_error(message: str) -> bool: collection = settings.COLLECTION_NAME.lower() if collection not in lowered: return False - return any(needle in lowered for needle in ("doesn't exist", "does not exist", "not found")) + return any( + needle in lowered for needle in ("doesn't exist", "does not exist", "not found") + ) def _is_missing_collection_exception(exc: Exception) -> bool: @@ -45,21 +46,23 @@ def _is_missing_collection_exception(exc: Exception) -> bool: if isinstance(exc, UnexpectedResponse): if exc.status_code != 404: return False - return _is_missing_collection_error(exc.content.decode("utf-8", errors="ignore")) + return _is_missing_collection_error( + exc.content.decode("utf-8", errors="ignore") + ) return _is_missing_collection_error(str(exc)) def 
_bootstrap_collection() -> None: """Create the Qdrant collection (if missing) and clear vector-store cache.""" client = get_qdrant_client() - + # Check if collection already exists if client.collection_exists(settings.COLLECTION_NAME): # Collection exists - just ensure indexes are created _ensure_collection_indexes(client) get_vector_store.cache_clear() return - + # Collection doesn't exist - need to get embedding size embedding = get_embeddings() vector_size = len(embedding.embed_query("collection bootstrap")) @@ -69,7 +72,7 @@ def _bootstrap_collection() -> None: def _ensure_collection_indexes(client) -> None: """Ensure payload indexes exist on the collection. - + Qdrant Cloud requires explicit payload indexes for filtered queries. """ # Ensure payload indexes exist (required by Qdrant Cloud for filtered queries) @@ -138,7 +141,6 @@ def _docs_from_discord_messages( return docs - def _ensure_collection_exists(vector_size: int) -> None: """Create collection (if missing) and ensure payload indexes exist. @@ -232,7 +234,6 @@ def ingest_documents(documents: list[Document], filter_duplicates: bool = True) return len(documents) - def _filter_by_message(channel_id: str, message_id: str) -> qdrant_models.Filter: """Build a Qdrant payload filter matching a specific (channel_id, message_id) pair.""" return qdrant_models.Filter( @@ -284,7 +285,9 @@ def _check_duplicate_message_ids(documents: list[Document]) -> set[str]: return existing_ids -def _filter_duplicate_documents(documents: list[Document]) -> tuple[list[Document], int]: +def _filter_duplicate_documents( + documents: list[Document], +) -> tuple[list[Document], int]: """ Filter out documents that already exist in the vector store based on message_id. Returns a tuple of (filtered_documents, duplicate_count). 
@@ -338,7 +341,9 @@ def delete_message_from_store(channel_id: str, message_id: str) -> int: ) logger.debug( "[ingestion] deleted %d point(s) for message_id=%s channel=%s", - n, message_id, channel_id, + n, + message_id, + channel_id, ) return n @@ -378,21 +383,20 @@ async def ingest_from_discord(limit_per_channel: int = 200) -> dict: limit=None, after=last_message_id, ) - + if not messages: summary[channel_id] = 0 continue - + docs = _docs_from_discord_messages(messages, channel_id) count = ingest_documents(docs) summary[channel_id] = count total += count - - + if messages: newest_message = messages[-1] if last_message_id else messages[0] update_last_ingested_message_id(channel_id, newest_message["id"]) - + except DiscordFetchError as exc: if exc.status_code == 403: errors[channel_id] = ( diff --git a/src/core/llm_provider.py b/src/core/llm_provider.py index 1920c65..2160d2e 100644 --- a/src/core/llm_provider.py +++ b/src/core/llm_provider.py @@ -34,7 +34,9 @@ def _resolve_model_name(*, secondary: bool) -> str: def _build_google_llm(*, secondary: bool): from langchain_google_genai import ChatGoogleGenerativeAI - api_key = settings.SECONDARY_GEMINI_API_KEY if secondary else settings.GEMINI_API_KEY + api_key = ( + settings.SECONDARY_GEMINI_API_KEY if secondary else settings.GEMINI_API_KEY + ) if not api_key: missing_key = "SECONDARY_GEMINI_API_KEY" if secondary else "GEMINI_API_KEY" raise ConfigurationError(f"{missing_key} not found in .env.") @@ -139,6 +141,5 @@ def get_embeddings(): ) return GoogleGenerativeAIEmbeddings( - model="models/gemini-embedding-001", - google_api_key=settings.GEMINI_API_KEY + model="models/gemini-embedding-001", google_api_key=settings.GEMINI_API_KEY ) diff --git a/src/core/state.py b/src/core/state.py index 4c4b6f1..9e0a93b 100644 --- a/src/core/state.py +++ b/src/core/state.py @@ -22,7 +22,7 @@ def get_last_ingested_message_id(channel_id: str) -> Optional[str]: """Get the last ingested message ID from Qdrant Cloud.""" client = 
get_qdrant_client() point_id = _get_cursor_id(channel_id) - + try: results = client.retrieve( collection_name=settings.COLLECTION_NAME, @@ -34,7 +34,7 @@ def get_last_ingested_message_id(channel_id: str) -> Optional[str]: return results[0].payload.get("message_id") except Exception as e: logger.debug(f"[state] Failed to retrieve cursor for {channel_id}: {e}") - + return None @@ -47,7 +47,7 @@ def update_last_ingested_message_id(channel_id: str, message_id: str) -> None: """Update or create the last ingested message ID in Qdrant Cloud.""" client = get_qdrant_client() point_id = _get_cursor_id(channel_id) - + # We store cursors as special points with no vectors (or zero vectors) # and a 'source=ingestion_cursor' tag to easily filter them out if needed. client.upsert( @@ -55,9 +55,9 @@ def update_last_ingested_message_id(channel_id: str, message_id: str) -> None: points=[ qdrant_models.PointStruct( id=point_id, - vector={}, # Empty dict for collections with named vectors? - # For our standard COSINE collection, we might need a dummy vector - # or just use payload. + vector={}, # Empty dict for collections with named vectors? + # For our standard COSINE collection, we might need a dummy vector + # or just use payload. payload={ "metadata": { "channel_id": channel_id, diff --git a/src/core/vector_store.py b/src/core/vector_store.py index 559d642..aaf2dc6 100644 --- a/src/core/vector_store.py +++ b/src/core/vector_store.py @@ -18,14 +18,14 @@ def _ensure_collection_with_indexes() -> None: """Ensure the Qdrant collection exists with proper indexes for filtering. - + This runs only once to avoid repeated API calls. If collection doesn't exist, creates it with proper vector config from embeddings. 
""" global _indexes_ensured if _indexes_ensured: return - + client = get_qdrant_client() collection_name = settings.COLLECTION_NAME @@ -43,7 +43,7 @@ def _ensure_collection_with_indexes() -> None: # Get the actual vector size from the embedding model test_vector = embedding.embed_query("test") vector_size = len(test_vector) - + client.create_collection( collection_name=collection_name, vectors_config=qdrant_models.VectorParams( @@ -51,7 +51,9 @@ def _ensure_collection_with_indexes() -> None: distance=qdrant_models.Distance.COSINE, ), ) - logger.info(f"Created collection {collection_name} with vector size {vector_size}") + logger.info( + f"Created collection {collection_name} with vector size {vector_size}" + ) except Exception as e: logger.warning(f"Could not create collection: {e}") # Let LangChain's QdrantVectorStore handle it @@ -69,7 +71,7 @@ def _ensure_collection_with_indexes() -> None: ) except Exception as e: logger.debug(f"Index creation for {field_name}: {e}") - + # Enable Full-Text Index on page_content for keyword/BM25 search (hybrid search) try: logger.info("Creating full-text index on page_content") @@ -86,7 +88,7 @@ def _ensure_collection_with_indexes() -> None: ) except Exception as e: logger.warning(f"Full-text index creation for page_content: {e}") - + _indexes_ensured = True @@ -109,11 +111,13 @@ def get_vector_store() -> QdrantVectorStore: """Return a vector store backed by the singleton Qdrant client.""" client = get_qdrant_client() collection_name = settings.COLLECTION_NAME - + # Debug: Get collection info to understand the vector config try: collection_info = client.get_collection(collection_name) - logger.info(f"Collection {collection_name} vectors: {collection_info.vectors_config}") + logger.info( + f"Collection {collection_name} vectors: {collection_info.vectors_config}" + ) except Exception as e: logger.warning(f"Could not get collection info: {e}") @@ -139,10 +143,9 @@ def _close_qdrant_client_on_exit() -> None: # Import hybrid search 
functions from separate module -from core.hybrid_search import ( +from core.hybrid_search import ( # noqa: E402 hybrid_search, get_hybrid_retriever, - _reciprocal_rank_fusion, ) __all__ = ["hybrid_search", "get_hybrid_retriever"] diff --git a/src/services/qa_service.py b/src/services/qa_service.py index d859aa9..76830c1 100644 --- a/src/services/qa_service.py +++ b/src/services/qa_service.py @@ -7,7 +7,7 @@ from langchain_core.documents import Document from langchain_core.output_parsers import StrOutputParser -from langchain_core.runnables import RunnableLambda, RunnablePassthrough +from langchain_core.runnables import RunnableLambda from core.config import settings from core.hybrid_search import get_hybrid_retriever @@ -46,8 +46,8 @@ class AskConfigError(AskError): ) # Maximum messages fetched for "today's updates" — keeps the prompt bounded. -_CONTEXT_CHAR_BUDGET = 6_000 -_DOC_CHAR_LIMIT = 1_200 +_CONTEXT_CHAR_BUDGET = 6_000 +_DOC_CHAR_LIMIT = 1_200 _TODAY_MESSAGES_LIMIT = 50 @@ -58,10 +58,14 @@ def _is_missing_collection_error(message: str) -> bool: collection = settings.COLLECTION_NAME.lower() if collection not in lowered: return False - return any(needle in lowered for needle in ("doesn't exist", "does not exist", "not found")) + return any( + needle in lowered for needle in ("doesn't exist", "does not exist", "not found") + ) -def _invoke_with_timeout(chain, question: str, timeout_seconds: int = _ASK_TIMEOUT_SECONDS) -> str: +def _invoke_with_timeout( + chain, question: str, timeout_seconds: int = _ASK_TIMEOUT_SECONDS +) -> str: """Run model invocation with a hard timeout to avoid long-hanging requests.""" executor = ThreadPoolExecutor(max_workers=1) future = executor.submit(chain.invoke, question) @@ -91,10 +95,13 @@ def _format_llm_error(exc: Exception) -> str: "Check your provider billing/quota, wait a bit, then retry." 
) - if "api key" in lowered or "permission denied" in lowered or "unauthorized" in lowered: + if ( + "api key" in lowered + or "permission denied" in lowered + or "unauthorized" in lowered + ): return ( - "Invalid API credentials. " - "Verify your provider API key(s) in the .env file." + "Invalid API credentials. Verify your provider API key(s) in the .env file." ) return f"Request failed: {message}" @@ -120,7 +127,9 @@ def _format_context(docs: list[Document]) -> str: if total_chars + len(chunk) > _CONTEXT_CHAR_BUDGET: logger.debug( "[context] Budget exhausted after %d/%d docs (%d chars used).", - i - 1, len(docs), total_chars, + i - 1, + len(docs), + total_chars, ) break @@ -136,6 +145,7 @@ def _normalize_question(question: str) -> str: normalized = pattern.sub(replacement, normalized) return normalized + def _doc_key(doc: Document) -> tuple[str, str, str]: """Stable deduplication key for a retrieved document.""" metadata = doc.metadata or {} @@ -144,7 +154,10 @@ def _doc_key(doc: Document) -> tuple[str, str, str]: content_fp = " ".join((doc.page_content or "").split())[:200] return (source, message_id, content_fp) -def _merge_docs(primary: list[Document], secondary: list[Document], limit: int) -> list[Document]: + +def _merge_docs( + primary: list[Document], secondary: list[Document], limit: int +) -> list[Document]: merged: list[Document] = [] seen: set[tuple[str, str, str]] = set() @@ -323,7 +336,9 @@ def _today_discord_messages(reference_date) -> list[dict]: return results -def _answer_from_latest_message_with_llm(question: str, latest_message: str, llm) -> str: +def _answer_from_latest_message_with_llm( + question: str, latest_message: str, llm +) -> str: """Use the LLM to phrase an answer grounded in the latest Discord message.""" prompt = ( "You are answering a question about the latest Discord message.\n" @@ -336,7 +351,9 @@ def _answer_from_latest_message_with_llm(question: str, latest_message: str, llm return _invoke_with_timeout(chain, prompt) -def 
_answer_today_updates_with_llm(question: str, messages: list[dict], llm, reference_date: str) -> str: +def _answer_today_updates_with_llm( + question: str, messages: list[dict], llm, reference_date: str +) -> str: rows = [] for item in messages[:8]: metadata = item.get("metadata") or {} @@ -349,8 +366,7 @@ def _answer_today_updates_with_llm(question: str, messages: list[dict], llm, ref "Use only the provided message list and do not invent details.\n" f"Reference date (UTC): {reference_date}\n\n" f"User question: {question}\n\n" - "Messages:\n" - + "\n".join(rows) + "Messages:\n" + "\n".join(rows) ) chain = RunnableLambda(lambda p: _extract_llm_content(llm.invoke(p))) return _invoke_with_timeout(chain, prompt) @@ -433,7 +449,9 @@ def _debug_retrieve(q): docs = _retrieve_with_rewrites(q, retriever, k=6) logger.info(f"[DEBUG] Retrieved {len(docs)} documents for question: {q}") for i, doc in enumerate(docs): - logger.info(f"[DEBUG] Doc {i}: {doc.page_content[:200]}... | metadata: {doc.metadata}") + logger.info( + f"[DEBUG] Doc {i}: {doc.page_content[:200]}... 
| metadata: {doc.metadata}" + ) return docs return ( diff --git a/tests/api/test_routes.py b/tests/api/test_routes.py index 4c957c1..f8c340d 100644 --- a/tests/api/test_routes.py +++ b/tests/api/test_routes.py @@ -10,7 +10,9 @@ def request(method: str, path: str, **kwargs): async def _send(): transport = httpx.ASGITransport(app=app) - async with httpx.AsyncClient(transport=transport, base_url="http://testserver") as client: + async with httpx.AsyncClient( + transport=transport, base_url="http://testserver" + ) as client: return await client.request(method, path, **kwargs) return asyncio.run(_send()) diff --git a/tests/conftest.py b/tests/conftest.py index dc7a40f..9219c34 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -1,6 +1,7 @@ import pytest from core.config import settings + @pytest.fixture(autouse=True) def mock_env_vars(monkeypatch): """Provide dummy environment variables for all tests to bypass strict Cloud-only checks.""" diff --git a/tests/unit/test_audit.py b/tests/unit/test_audit.py new file mode 100644 index 0000000..499dc38 --- /dev/null +++ b/tests/unit/test_audit.py @@ -0,0 +1,152 @@ +import asyncio +from types import SimpleNamespace + +from core import audit +from core import ingestion + + +def _message(message_id: str, content: str) -> dict: + return { + "id": message_id, + "content": content, + "author": {"username": "alice"}, + "timestamp": f"2026-03-10T08:0{message_id[-1]}:00+00:00", + "embeds": [], + } + + +def _point(message_id: str, channel_id: str, content: str): + return SimpleNamespace( + payload={ + "page_content": f"[alice] {content}", + "metadata": { + "channel_id": channel_id, + "message_id": message_id, + "source": "discord", + }, + } + ) + + +def test_startup_audit_does_not_delete_messages_older_than_audited_window(monkeypatch): + channel_id = "chan-1" + deleted_ids: list[str] = [] + + cursor_msg = _message("1005", "cursor") + recent_messages = [ + _message("1004", "recent-1"), + _message("1003", "recent-2"), + ] + 
qdrant_points = [ + _point("1002", channel_id, "older-outside-window"), + _point("1003", channel_id, "recent-2"), + _point("1004", channel_id, "recent-1"), + _point("1005", channel_id, "cursor"), + ] + + class FakeClient: + def scroll(self, **kwargs): + filter_obj = kwargs.get("scroll_filter") + conditions = getattr(filter_obj, "must", []) if filter_obj else [] + message_ids = [ + getattr(getattr(cond, "match", None), "value", None) + for cond in conditions + if getattr(cond, "key", None) == "metadata.message_id" + ] + if message_ids == ["1005"]: + return [_point("1005", channel_id, "cursor")], None + return qdrant_points, None + + def count(self, **_kwargs): + return SimpleNamespace(count=len(qdrant_points)) + + async def fake_fetch_message_by_id(**_kwargs): + return cursor_msg + + async def fake_fetch_channel_messages(**_kwargs): + return recent_messages + + monkeypatch.setattr( + audit, "get_last_ingested_message_id", lambda _channel_id: "1005" + ) + monkeypatch.setattr(audit, "fetch_message_by_id", fake_fetch_message_by_id) + monkeypatch.setattr(audit, "fetch_channel_messages", fake_fetch_channel_messages) + monkeypatch.setattr(audit, "ensure_channel_in_state", lambda _channel_id: None) + monkeypatch.setattr(ingestion, "_bootstrap_collection", lambda: None) + monkeypatch.setattr(ingestion, "get_qdrant_client", lambda: FakeClient()) + monkeypatch.setattr( + ingestion, + "delete_message_from_store", + lambda _channel_id, message_id: deleted_ids.append(message_id) or 1, + ) + monkeypatch.setattr( + ingestion, "update_message_in_store", lambda *_args, **_kwargs: 0 + ) + + summary = asyncio.run(audit.run_startup_audit([channel_id], window=2)) + + assert deleted_ids == [] + assert summary[channel_id]["deleted"] == 0 + assert summary[channel_id]["updated"] == 0 + + +def test_startup_audit_deletes_missing_message_within_audited_window(monkeypatch): + channel_id = "chan-1" + deleted_ids: list[str] = [] + + cursor_msg = _message("1005", "cursor") + recent_messages = [ + 
_message("1003", "recent-2"), + _message("1002", "recent-3"), + ] + qdrant_points = [ + _point("1002", channel_id, "recent-3"), + _point("1003", channel_id, "recent-2"), + _point("1004", channel_id, "deleted-offline"), + _point("1005", channel_id, "cursor"), + ] + + class FakeClient: + def scroll(self, **kwargs): + filter_obj = kwargs.get("scroll_filter") + conditions = getattr(filter_obj, "must", []) if filter_obj else [] + message_ids = [ + getattr(getattr(cond, "match", None), "value", None) + for cond in conditions + if getattr(cond, "key", None) == "metadata.message_id" + ] + if message_ids == ["1005"]: + return [_point("1005", channel_id, "cursor")], None + return qdrant_points, None + + def count(self, **_kwargs): + return SimpleNamespace(count=len(qdrant_points)) + + async def fake_fetch_message_by_id(**_kwargs): + return cursor_msg + + async def fake_fetch_channel_messages(**_kwargs): + return recent_messages + + monkeypatch.setattr( + audit, "get_last_ingested_message_id", lambda _channel_id: "1005" + ) + monkeypatch.setattr(audit, "fetch_message_by_id", fake_fetch_message_by_id) + monkeypatch.setattr(audit, "fetch_channel_messages", fake_fetch_channel_messages) + monkeypatch.setattr(audit, "ensure_channel_in_state", lambda _channel_id: None) + monkeypatch.setattr(ingestion, "_bootstrap_collection", lambda: None) + monkeypatch.setattr(ingestion, "get_qdrant_client", lambda: FakeClient()) + monkeypatch.setattr( + ingestion, + "delete_message_from_store", + lambda _channel_id, message_id: deleted_ids.append(message_id) or 1, + ) + monkeypatch.setattr( + ingestion, "update_message_in_store", lambda *_args, **_kwargs: 0 + ) + + summary = asyncio.run(audit.run_startup_audit([channel_id], window=2)) + + assert deleted_ids == ["1004"] + assert summary[channel_id]["deleted"] == 1 + assert summary[channel_id]["updated"] == 0 diff --git a/tests/unit/test_discord_listener.py b/tests/unit/test_discord_listener.py index 9e99c8b..12885f2 100644 --- 
a/tests/unit/test_discord_listener.py +++ b/tests/unit/test_discord_listener.py @@ -7,9 +7,8 @@ """ import asyncio -from unittest.mock import MagicMock, patch +from unittest.mock import MagicMock -import pytest from core import discord_listener @@ -34,10 +33,13 @@ # handle_message_create — channel filtering # --------------------------------------------------------------------------- + def test_handle_message_create_ignores_wrong_channel(monkeypatch): """Messages in unwatched channels must not trigger ingestion.""" called = [] - monkeypatch.setattr(discord_listener, "ingest_documents", lambda docs: called.append(docs) or 0) + monkeypatch.setattr( + discord_listener, "ingest_documents", lambda docs: called.append(docs) or 0 + ) asyncio.run( discord_listener.handle_message_create( @@ -103,6 +105,7 @@ def test_handle_message_create_skips_blank_content(monkeypatch): # _message_to_dict — shape contract # --------------------------------------------------------------------------- + def test_message_to_dict_produces_expected_shape(): """_message_to_dict must produce the same dict shape as the REST API returns.""" from datetime import datetime, timezone @@ -136,9 +139,11 @@ def test_message_to_dict_produces_expected_shape(): # run_discord_listener — guard clauses # --------------------------------------------------------------------------- + def test_run_discord_listener_returns_early_without_token(caplog): """Missing token must log an error and return without connecting.""" import logging + with caplog.at_level(logging.ERROR, logger="core.discord_listener"): asyncio.run(discord_listener.run_discord_listener("", ["111"])) assert any("DISCORD_BOT_TOKEN" in r.message for r in caplog.records) @@ -147,6 +152,7 @@ def test_run_discord_listener_returns_early_without_token(caplog): def test_run_discord_listener_returns_early_without_channels(caplog): """Missing channel list must log an error and return without connecting.""" import logging + with caplog.at_level(logging.ERROR, 
logger="core.discord_listener"): asyncio.run(discord_listener.run_discord_listener("some-token", [])) assert any("No channel IDs" in r.message for r in caplog.records) diff --git a/tests/unit/test_hybrid_search.py b/tests/unit/test_hybrid_search.py index 227a7e1..77829cb 100644 --- a/tests/unit/test_hybrid_search.py +++ b/tests/unit/test_hybrid_search.py @@ -6,14 +6,13 @@ from langchain_core.documents import Document from core.hybrid_search import hybrid_search -from core.config import settings def test_hybrid_search_returns_specific_message_as_top_hit(): """Test that searching for a specific Discord Message ID returns correct result as top hit.""" - + # Sample documents with known message IDs - docs = [ + [ Document( page_content="[user1] Hello world", metadata={ @@ -21,8 +20,8 @@ def test_hybrid_search_returns_specific_message_as_top_hit(): "channel_id": "123", "message_id": "456", "author": "user1", - "timestamp": "2026-03-18T10:00:00" - } + "timestamp": "2026-03-18T10:00:00", + }, ), Document( page_content="[user2] Another message", @@ -30,9 +29,9 @@ def test_hybrid_search_returns_specific_message_as_top_hit(): "source": "discord", "channel_id": "123", "message_id": "789", - "author": "user2", - "timestamp": "2026-03-18T10:01:00" - } + "author": "user2", + "timestamp": "2026-03-18T10:01:00", + }, ), Document( page_content="[user3] Third message with ID-123456 in content", @@ -41,11 +40,11 @@ def test_hybrid_search_returns_specific_message_as_top_hit(): "channel_id": "123", "message_id": "999", "author": "user3", - "timestamp": "2026-03-18T10:02:00" - } + "timestamp": "2026-03-18T10:02:00", + }, ), ] - + # Mock the Qdrant client to return our test documents mock_hit1 = MagicMock() mock_hit1.payload = { @@ -55,10 +54,10 @@ def test_hybrid_search_returns_specific_message_as_top_hit(): "channel_id": "123", "message_id": "456", "author": "user1", - "timestamp": "2026-03-18T10:00:00" - } + "timestamp": "2026-03-18T10:00:00", + }, } - + mock_hit2 = MagicMock() 
mock_hit2.payload = { "page_content": "[user2] Another message", @@ -67,10 +66,10 @@ def test_hybrid_search_returns_specific_message_as_top_hit(): "channel_id": "123", "message_id": "789", "author": "user2", - "timestamp": "2026-03-18T10:01:00" - } + "timestamp": "2026-03-18T10:01:00", + }, } - + mock_hit3 = MagicMock() mock_hit3.payload = { "page_content": "[user3] Third message with ID-123456 in content", @@ -79,25 +78,24 @@ def test_hybrid_search_returns_specific_message_as_top_hit(): "channel_id": "123", "message_id": "999", "author": "user3", - "timestamp": "2026-03-18T10:02:00" - } + "timestamp": "2026-03-18T10:02:00", + }, } - # Mock query_points response - returns an object with .points attribute mock_result = MagicMock() mock_result.points = [mock_hit1, mock_hit2, mock_hit3] - + mock_client = MagicMock() mock_client.query_points.return_value = mock_result - + mock_embeddings = MagicMock() mock_embeddings.embed_query.return_value = [0.1] * 3072 - - with patch('core.hybrid_search._get_qdrant_client', return_value=mock_client): - with patch('core.hybrid_search.get_embeddings', return_value=mock_embeddings): + + with patch("core.hybrid_search._get_qdrant_client", return_value=mock_client): + with patch("core.hybrid_search.get_embeddings", return_value=mock_embeddings): # Search for message with ID 456 - BM25 will search metadata.message_id results = hybrid_search(query="456", k=3) - + # Verify the top result is the document with message_id 456 assert len(results) > 0 assert results[0].metadata["message_id"] == "456" @@ -105,7 +103,7 @@ def test_hybrid_search_returns_specific_message_as_top_hit(): def test_hybrid_search_with_keyword_match(): """Test that keyword search finds message with specific text.""" - + mock_hit = MagicMock() mock_hit.payload = { "page_content": "[user3] Third message with ID-123456 in content", @@ -114,25 +112,25 @@ def test_hybrid_search_with_keyword_match(): "channel_id": "123", "message_id": "999", "author": "user3", - "timestamp": 
"2026-03-18T10:02:00" - } + "timestamp": "2026-03-18T10:02:00", + }, } - + # Mock query_points response mock_result = MagicMock() mock_result.points = [mock_hit] - + mock_client = MagicMock() mock_client.query_points.return_value = mock_result - + mock_embeddings = MagicMock() mock_embeddings.embed_query.return_value = [0.1] * 3072 - - with patch('core.hybrid_search._get_qdrant_client', return_value=mock_client): - with patch('core.hybrid_search.get_embeddings', return_value=mock_embeddings): + + with patch("core.hybrid_search._get_qdrant_client", return_value=mock_client): + with patch("core.hybrid_search.get_embeddings", return_value=mock_embeddings): # Search for specific text results = hybrid_search(query="ID-123456", k=1) - + assert len(results) > 0 assert "ID-123456" in results[0].page_content diff --git a/tests/unit/test_ingestion.py b/tests/unit/test_ingestion.py index f21e4b4..262eea8 100644 --- a/tests/unit/test_ingestion.py +++ b/tests/unit/test_ingestion.py @@ -24,7 +24,9 @@ def test_docs_from_discord_messages_skips_blank_content_and_appends_embeds(): docs = ingestion._docs_from_discord_messages(messages, channel_id="chan-1") assert len(docs) == 1 - assert docs[0].page_content == "[alice] Standup in 10 minutes\nAgenda\nSprint review" + assert ( + docs[0].page_content == "[alice] Standup in 10 minutes\nAgenda\nSprint review" + ) assert docs[0].metadata == { "source": "discord", "channel_id": "chan-1", @@ -35,7 +37,9 @@ def test_docs_from_discord_messages_skips_blank_content_and_appends_embeds(): def test_ingest_documents_bootstraps_missing_collection(monkeypatch): - documents = [Document(page_content="Release planning", metadata={"source": "json_file"})] + documents = [ + Document(page_content="Release planning", metadata={"source": "json_file"}) + ] bootstrapped = {} class FakeClient: @@ -88,12 +92,18 @@ def cache_clear(self): assert count == 1 assert bootstrapped["vector_size"] == 4 - assert getter.cache_clears == 2 # cache cleared twice: once for 
index check, once after bootstrap + assert ( + getter.cache_clears == 2 + ) # cache cleared twice: once for index check, once after bootstrap assert working_store.added == documents -def test_ingest_documents_bootstraps_missing_collection_when_qdrant_returns_404(monkeypatch): - documents = [Document(page_content="Release planning", metadata={"source": "json_file"})] +def test_ingest_documents_bootstraps_missing_collection_when_qdrant_returns_404( + monkeypatch, +): + documents = [ + Document(page_content="Release planning", metadata={"source": "json_file"}) + ] bootstrapped = {} class FakeClient: @@ -151,11 +161,15 @@ def cache_clear(self): assert count == 1 assert bootstrapped["vector_size"] == 4 - assert getter.cache_clears == 2 # cache cleared twice: once for index check, once after bootstrap + assert ( + getter.cache_clears == 2 + ) # cache cleared twice: once for index check, once after bootstrap assert working_store.added == documents -def test_ingest_documents_ignores_missing_collection_during_duplicate_check(monkeypatch): +def test_ingest_documents_ignores_missing_collection_during_duplicate_check( + monkeypatch, +): documents = [ Document( page_content="Release planning", @@ -167,7 +181,7 @@ def test_ingest_documents_ignores_missing_collection_during_duplicate_check(monk class FakeClient: def collection_exists(self, name: str) -> bool: return False # Simulate missing collection - + def count(self, **_kwargs): raise UnexpectedResponse( status_code=404, @@ -222,7 +236,9 @@ def cache_clear(self): assert count == 1 assert bootstrapped["vector_size"] == 4 - assert getter.cache_clears == 2 # cache cleared twice: once for index check, once after bootstrap + assert ( + getter.cache_clears == 2 + ) # cache cleared twice: once for index check, once after bootstrap assert working_store.added == documents @@ -231,8 +247,12 @@ def test_ingest_from_discord_collects_successes_and_permission_errors(monkeypatc monkeypatch.setattr(ingestion.settings, "DISCORD_CHANNEL_IDS", 
"denied,open") # Simulate first run: no prior cursor for any channel - monkeypatch.setattr(ingestion, "get_last_ingested_message_id", lambda channel_id: None) - monkeypatch.setattr(ingestion, "update_last_ingested_message_id", lambda channel_id, msg_id: None) + monkeypatch.setattr( + ingestion, "get_last_ingested_message_id", lambda channel_id: None + ) + monkeypatch.setattr( + ingestion, "update_last_ingested_message_id", lambda channel_id, msg_id: None + ) async def fake_fetch_channel_messages( bot_token: str, @@ -241,10 +261,12 @@ async def fake_fetch_channel_messages( after=None, ): assert bot_token == "discord-token" - assert limit is None # bootstrap always passes limit=None now - assert after is None # no cursor on first run + assert limit is None # bootstrap always passes limit=None now + assert after is None # no cursor on first run if channel_id == "denied": - raise DiscordFetchError(channel_id=channel_id, status_code=403, message="forbidden") + raise DiscordFetchError( + channel_id=channel_id, status_code=403, message="forbidden" + ) return [ { "id": "m1", @@ -254,7 +276,9 @@ async def fake_fetch_channel_messages( } ] - monkeypatch.setattr(ingestion, "fetch_channel_messages", fake_fetch_channel_messages) + monkeypatch.setattr( + ingestion, "fetch_channel_messages", fake_fetch_channel_messages + ) monkeypatch.setattr(ingestion, "ingest_documents", lambda docs: len(docs)) result = asyncio.run(ingestion.ingest_from_discord()) diff --git a/tests/unit/test_qa_service.py b/tests/unit/test_qa_service.py index 5c782f5..f3cc2c9 100644 --- a/tests/unit/test_qa_service.py +++ b/tests/unit/test_qa_service.py @@ -90,7 +90,9 @@ def invoke(self, query): return [original_doc] return [normalized_doc] - docs = qa_service._retrieve_with_rewrites("whats the we have today", FakeRetriever(), k=6) + docs = qa_service._retrieve_with_rewrites( + "whats the we have today", FakeRetriever(), k=6 + ) assert len(docs) == 2 assert docs[0].page_content == "original result" @@ -109,8 
+111,21 @@ def test_augment_temporal_question_keeps_non_temporal_query():
 
 def test_today_updates_query_uses_llm_summary(monkeypatch):
     monkeypatch.setattr(qa_service, "_is_today_updates_query", lambda _q: True)
-    monkeypatch.setattr(qa_service, "_today_discord_messages", lambda reference_date: [{"metadata": {"timestamp": "2026-03-09T12:00:00+00:00", "author": "u"}, "page_content": "Meeting at 2pm"}])
-    monkeypatch.setattr(qa_service, "_invoke_with_timeout", lambda chain, question, timeout_seconds=30: chain.invoke(question))
+    monkeypatch.setattr(
+        qa_service,
+        "_today_discord_messages",
+        lambda reference_date: [
+            {
+                "metadata": {"timestamp": "2026-03-09T12:00:00+00:00", "author": "u"},
+                "page_content": "Meeting at 2pm",
+            }
+        ],
+    )
+    monkeypatch.setattr(
+        qa_service,
+        "_invoke_with_timeout",
+        lambda chain, question, timeout_seconds=30: chain.invoke(question),
+    )
 
     class FakeLLM:
         def invoke(self, _prompt):
@@ -136,7 +151,9 @@ class FakeRetriever:
         def invoke(self, _query):
             return [shared_doc]
 
-    docs = qa_service._retrieve_with_rewrites("whats the we have today", FakeRetriever(), k=6)
+    docs = qa_service._retrieve_with_rewrites(
+        "whats the we have today", FakeRetriever(), k=6
+    )
 
     assert len(docs) == 1
     assert docs[0].page_content == "same result"
@@ -144,8 +161,14 @@ def invoke(self, _query):
 
 def test_today_updates_without_today_messages_falls_back_to_latest(monkeypatch):
     monkeypatch.setattr(qa_service, "_is_today_updates_query", lambda _q: True)
-    monkeypatch.setattr(qa_service, "_today_discord_messages", lambda reference_date: [])
-    monkeypatch.setattr(qa_service, "_latest_discord_message", lambda: "Latest Discord message:\n- content: fallback")
+    monkeypatch.setattr(
+        qa_service, "_today_discord_messages", lambda reference_date: []
+    )
+    monkeypatch.setattr(
+        qa_service,
+        "_latest_discord_message",
+        lambda: "Latest Discord message:\n- content: fallback",
+    )
     monkeypatch.setattr(
         qa_service,
         "_invoke_with_timeout",
@@ -162,7 +185,6 @@ def invoke(self, _prompt):
     assert answer == "Fallback latest summary"
 
 
-
 def test_ask_question_recency_query_returns_latest_message_when_llm_fails(monkeypatch):
     latest_message = "Latest Discord message:\n- content: fallback"
 
@@ -179,12 +201,16 @@ def test_ask_question_recency_query_returns_latest_message_when_llm_fails(monkey
 
 
 def test_ask_question_errors_when_qdrant_is_locked(monkeypatch):
-    monkeypatch.setattr(qa_service, "get_llm", lambda: RunnableLambda(lambda _prompt: "ignored"))
+    monkeypatch.setattr(
+        qa_service, "get_llm", lambda: RunnableLambda(lambda _prompt: "ignored")
+    )
     monkeypatch.setattr(
         qa_service,
         "get_hybrid_retriever",
         lambda k=6: (_ for _ in ()).throw(
-            RuntimeError("Storage already accessed by another instance of Qdrant client")
+            RuntimeError(
+                "Storage already accessed by another instance of Qdrant client"
+            )
         ),
     )
 
@@ -202,7 +228,9 @@ def invoke(self, _query):
     def fake_hybrid_retriever(k=6):
         return FakeRetriever()
 
-    monkeypatch.setattr(qa_service, "get_llm", lambda: RunnableLambda(lambda _prompt: "ignored"))
+    monkeypatch.setattr(
+        qa_service, "get_llm", lambda: RunnableLambda(lambda _prompt: "ignored")
+    )
     monkeypatch.setattr(qa_service, "get_hybrid_retriever", fake_hybrid_retriever)
     monkeypatch.setattr(
         qa_service,
diff --git a/uv.lock b/uv.lock
index 96a5dd5..50beb53 100644
--- a/uv.lock
+++ b/uv.lock
@@ -1324,6 +1324,7 @@ dependencies = [
 [package.dev-dependencies]
 dev = [
     { name = "pytest" },
+    { name = "ruff" },
 ]
 
 [package.metadata]
@@ -1343,7 +1344,10 @@ requires-dist = [
 ]
 
 [package.metadata.requires-dev]
-dev = [{ name = "pytest", specifier = ">=8.3.0" }]
+dev = [
+    { name = "pytest", specifier = ">=8.3.0" },
+    { name = "ruff", specifier = ">=0.9.3" },
+]
 
 [[package]]
 name = "marshmallow"
@@ -2327,6 +2331,31 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/64/8d/0133e4eb4beed9e425d9a98ed6e081a55d195481b7632472be1af08d2f6b/rsa-4.9.1-py3-none-any.whl", hash = "sha256:68635866661c6836b8d39430f97a996acbd61bfa49406748ea243539fe239762", size = 34696, upload-time = "2025-04-16T09:51:17.142Z" },
 ]
 
+[[package]]
+name = "ruff"
+version = "0.15.7"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/a1/22/9e4f66ee588588dc6c9af6a994e12d26e19efbe874d1a909d09a6dac7a59/ruff-0.15.7.tar.gz", hash = "sha256:04f1ae61fc20fe0b148617c324d9d009b5f63412c0b16474f3d5f1a1a665f7ac", size = 4601277, upload-time = "2026-03-19T16:26:22.605Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/41/2f/0b08ced94412af091807b6119ca03755d651d3d93a242682bf020189db94/ruff-0.15.7-py3-none-linux_armv6l.whl", hash = "sha256:a81cc5b6910fb7dfc7c32d20652e50fa05963f6e13ead3c5915c41ac5d16668e", size = 10489037, upload-time = "2026-03-19T16:26:32.47Z" },
+    { url = "https://files.pythonhosted.org/packages/91/4a/82e0fa632e5c8b1eba5ee86ecd929e8ff327bbdbfb3c6ac5d81631bef605/ruff-0.15.7-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:722d165bd52403f3bdabc0ce9e41fc47070ac56d7a91b4e0d097b516a53a3477", size = 10955433, upload-time = "2026-03-19T16:27:00.205Z" },
+    { url = "https://files.pythonhosted.org/packages/ab/10/12586735d0ff42526ad78c049bf51d7428618c8b5c467e72508c694119df/ruff-0.15.7-py3-none-macosx_11_0_arm64.whl", hash = "sha256:7fbc2448094262552146cbe1b9643a92f66559d3761f1ad0656d4991491af49e", size = 10269302, upload-time = "2026-03-19T16:26:26.183Z" },
+    { url = "https://files.pythonhosted.org/packages/eb/5d/32b5c44ccf149a26623671df49cbfbd0a0ae511ff3df9d9d2426966a8d57/ruff-0.15.7-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:6b39329b60eba44156d138275323cc726bbfbddcec3063da57caa8a8b1d50adf", size = 10607625, upload-time = "2026-03-19T16:27:03.263Z" },
+    { url = "https://files.pythonhosted.org/packages/5d/f1/f0001cabe86173aaacb6eb9bb734aa0605f9a6aa6fa7d43cb49cbc4af9c9/ruff-0.15.7-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:87768c151808505f2bfc93ae44e5f9e7c8518943e5074f76ac21558ef5627c85", size = 10324743, upload-time = "2026-03-19T16:27:09.791Z" },
+    { url = "https://files.pythonhosted.org/packages/7a/87/b8a8f3d56b8d848008559e7c9d8bf367934d5367f6d932ba779456e2f73b/ruff-0.15.7-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:fb0511670002c6c529ec66c0e30641c976c8963de26a113f3a30456b702468b0", size = 11138536, upload-time = "2026-03-19T16:27:06.101Z" },
+    { url = "https://files.pythonhosted.org/packages/e4/f2/4fd0d05aab0c5934b2e1464784f85ba2eab9d54bffc53fb5430d1ed8b829/ruff-0.15.7-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:e0d19644f801849229db8345180a71bee5407b429dd217f853ec515e968a6912", size = 11994292, upload-time = "2026-03-19T16:26:48.718Z" },
+    { url = "https://files.pythonhosted.org/packages/64/22/fc4483871e767e5e95d1622ad83dad5ebb830f762ed0420fde7dfa9d9b08/ruff-0.15.7-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:4806d8e09ef5e84eb19ba833d0442f7e300b23fe3f0981cae159a248a10f0036", size = 11398981, upload-time = "2026-03-19T16:26:54.513Z" },
+    { url = "https://files.pythonhosted.org/packages/b0/99/66f0343176d5eab02c3f7fcd2de7a8e0dd7a41f0d982bee56cd1c24db62b/ruff-0.15.7-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:dce0896488562f09a27b9c91b1f58a097457143931f3c4d519690dea54e624c5", size = 11242422, upload-time = "2026-03-19T16:26:29.277Z" },
+    { url = "https://files.pythonhosted.org/packages/5d/3a/a7060f145bfdcce4c987ea27788b30c60e2c81d6e9a65157ca8afe646328/ruff-0.15.7-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:1852ce241d2bc89e5dc823e03cff4ce73d816b5c6cdadd27dbfe7b03217d2a12", size = 11232158, upload-time = "2026-03-19T16:26:42.321Z" },
+    { url = "https://files.pythonhosted.org/packages/a7/53/90fbb9e08b29c048c403558d3cdd0adf2668b02ce9d50602452e187cd4af/ruff-0.15.7-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:5f3e4b221fb4bd293f79912fc5e93a9063ebd6d0dcbd528f91b89172a9b8436c", size = 10577861, upload-time = "2026-03-19T16:26:57.459Z" },
+    { url = "https://files.pythonhosted.org/packages/2f/aa/5f486226538fe4d0f0439e2da1716e1acf895e2a232b26f2459c55f8ddad/ruff-0.15.7-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:b15e48602c9c1d9bdc504b472e90b90c97dc7d46c7028011ae67f3861ceba7b4", size = 10327310, upload-time = "2026-03-19T16:26:35.909Z" },
+    { url = "https://files.pythonhosted.org/packages/99/9e/271afdffb81fe7bfc8c43ba079e9d96238f674380099457a74ccb3863857/ruff-0.15.7-py3-none-musllinux_1_2_i686.whl", hash = "sha256:1b4705e0e85cedc74b0a23cf6a179dbb3df184cb227761979cc76c0440b5ab0d", size = 10840752, upload-time = "2026-03-19T16:26:45.723Z" },
+    { url = "https://files.pythonhosted.org/packages/bf/29/a4ae78394f76c7759953c47884eb44de271b03a66634148d9f7d11e721bd/ruff-0.15.7-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:112c1fa316a558bb34319282c1200a8bf0495f1b735aeb78bfcb2991e6087580", size = 11336961, upload-time = "2026-03-19T16:26:39.076Z" },
+    { url = "https://files.pythonhosted.org/packages/26/6b/8786ba5736562220d588a2f6653e6c17e90c59ced34a2d7b512ef8956103/ruff-0.15.7-py3-none-win32.whl", hash = "sha256:6d39e2d3505b082323352f733599f28169d12e891f7dd407f2d4f54b4c2886de", size = 10582538, upload-time = "2026-03-19T16:26:15.992Z" },
+    { url = "https://files.pythonhosted.org/packages/2b/e9/346d4d3fffc6871125e877dae8d9a1966b254fbd92a50f8561078b88b099/ruff-0.15.7-py3-none-win_amd64.whl", hash = "sha256:4d53d712ddebcd7dace1bc395367aec12c057aacfe9adbb6d832302575f4d3a1", size = 11755839, upload-time = "2026-03-19T16:26:19.897Z" },
+    { url = "https://files.pythonhosted.org/packages/8f/e8/726643a3ea68c727da31570bde48c7a10f1aa60eddd628d94078fec586ff/ruff-0.15.7-py3-none-win_arm64.whl", hash = "sha256:18e8d73f1c3fdf27931497972250340f92e8c861722161a9caeb89a58ead6ed2", size = 11023304, upload-time = "2026-03-19T16:26:51.669Z" },
+]
+
 [[package]]
 name = "sniffio"
 version = "1.3.1"