fix(docker): resolve 8 Docker deployment bugs #1926
Open
mrveiss wants to merge 8 commits into Dev_new_gui from
Conversation
Backend connects to ChromaDB for RAG/vector operations but only had depends_on for Redis. Without this, ChromaDB startup race causes connection errors on first boot.
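The added dependency might look like the following compose fragment (a minimal sketch; the backend service name and the existing Redis entry are assumed from context, not copied from the actual file):

```yaml
services:
  autobot-backend:
    depends_on:
      - autobot-redis      # already present
      - autobot-chromadb   # new: start ChromaDB before the backend
```

Note that plain `depends_on` only orders container startup; if `autobot-chromadb` defines a healthcheck, the `condition: service_healthy` form would additionally wait for readiness rather than just container start.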
SSOT config expects AUTOBOT_OLLAMA_HOST as a bare hostname (e.g. "autobot-ollama"), but docker-compose and .env.docker set it to a full URL ("http://autobot-ollama:11434"). Code that constructs URLs from the host field then produces malformed "http://http://..." URLs. Fix: set the host to the hostname only, and add separate OLLAMA_PORT and OLLAMA_ENDPOINT vars for direct URL use.
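Concretely, the `.env.docker` change could look like this (a sketch; the exact spelling of the new port/endpoint variable names is an assumption based on the description above):

```dotenv
# Before (code that prepends a scheme produces http://http://...):
# AUTOBOT_OLLAMA_HOST=http://autobot-ollama:11434

# After: hostname only, plus explicit vars for direct URL use
AUTOBOT_OLLAMA_HOST=autobot-ollama
OLLAMA_PORT=11434
OLLAMA_ENDPOINT=http://autobot-ollama:11434
```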
AUTOBOT_LLAMAINDEX_LLM_ENDPOINT and AUTOBOT_LLAMAINDEX_EMBEDDING_ENDPOINT default to http://127.0.0.1:11434 in SSOT config. In Docker, localhost is the container itself — Ollama is on the Docker network at autobot-ollama. Without these vars, all knowledge/RAG operations fail.
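Based on the description, the added environment entries would point both endpoints at the Ollama service on the Docker network (values assembled from the hostname and port named above):

```dotenv
# Override the SSOT defaults of http://127.0.0.1:11434, which inside a
# container resolve to the container itself rather than the Ollama service
AUTOBOT_LLAMAINDEX_LLM_ENDPOINT=http://autobot-ollama:11434
AUTOBOT_LLAMAINDEX_EMBEDDING_ENDPOINT=http://autobot-ollama:11434
```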
SLM git_tracker service errors every 5 minutes because git is not installed in the Docker image. Adding git to the runtime deps package list (~50MB) eliminates the log noise.
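A sketch of the Dockerfile change, assuming a Debian-based image (the ~50MB figure is consistent with apt; an Alpine image would use `apk add --no-cache git` instead):

```dockerfile
# Add git to the runtime dependency layer so the git_tracker service
# stops failing every 5 minutes
RUN apt-get update \
 && apt-get install -y --no-install-recommends git \
 && rm -rf /var/lib/apt/lists/*
```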
Frontend Dockerfile hardcoded VITE_BACKEND_HOST=autobot-backend which is a Docker-internal hostname unreachable from browsers. The frontend already supports proxy mode (empty host = relative URLs through nginx), so clearing the build args lets nginx handle API routing.
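The frontend Dockerfile change might be as simple as the following sketch (the exact `ARG`/`ENV` arrangement is an assumption; only the variable name comes from the description):

```dockerfile
# Before: ARG VITE_BACKEND_HOST=autobot-backend  (Docker-internal, unreachable from browsers)
# After: leave the build arg empty so the built frontend emits relative URLs
# and nginx proxies API requests to the backend
ARG VITE_BACKEND_HOST=
ENV VITE_BACKEND_HOST=${VITE_BACKEND_HOST}
```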
Backend logs warnings about missing llm_models.yaml and permission_rules.yaml on Docker startup. These files live in autobot-infrastructure/shared/config/ which isn't copied into the backend image. Add COPY instructions to place them in the backend config directory.
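The added COPY instructions could look like this sketch (the destination directory `/app/config/` is an assumption; the source paths come from the description):

```dockerfile
# Place the shared config files where the backend expects them,
# silencing the missing-file warnings on startup
COPY autobot-infrastructure/shared/config/llm_models.yaml /app/config/llm_models.yaml
COPY autobot-infrastructure/shared/config/permission_rules.yaml /app/config/permission_rules.yaml
```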
docker-compose.yml had no Celery worker — async tasks (Ansible deploys, code indexing, background jobs) queued in Redis but never executed. Adds autobot-worker service using the backend image with celery worker command, listening on all defined queues (celery, deployments, provisioning, services) with concurrency=2.
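A sketch of the new worker service, assuming the backend image name and the Celery application module path (neither is stated in the description; the queue names and concurrency are):

```yaml
autobot-worker:
  # Reuses the backend image so the worker sees the same code and deps
  image: autobot-backend
  command: >
    celery -A app.celery_app worker
    --loglevel=info
    --concurrency=2
    -Q celery,deployments,provisioning,services
  depends_on:
    - autobot-redis
```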
On first boot, SLM API calls fail with 500 errors because tables don't exist yet. While the SLM lifespan already runs migrations, an explicit entrypoint ensures migrations complete before uvicorn starts accepting connections. Also increases start_period from 30s to 60s and retries from 3 to 5 to accommodate first-boot migration time.
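A minimal sketch of `docker/slm/entrypoint.sh`, assuming Alembic migrations and a uvicorn app target (both are assumptions; the PR only states that migrations run before uvicorn accepts connections):

```shell
#!/bin/sh
# Run migrations to completion before the API starts accepting connections,
# so first-boot requests never hit missing tables
set -e
alembic upgrade head
exec uvicorn slm.main:app --host 0.0.0.0 --port 8000
```

The healthcheck tuning (start_period 30s → 60s, retries 3 → 5) complements this: the container is given enough grace time for first-boot migrations before it is marked unhealthy.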
This was referenced Mar 18, 2026
Summary
Batch fix for 8 Docker deployment bugs discovered during #1809 review. Ordered from quick wins to most complex:
- Added `depends_on: autobot-chromadb` to backend (prevents RAG startup race)
- Changed `AUTOBOT_OLLAMA_HOST` from full URL to hostname-only (prevents malformed `http://http://...` URLs)
- Added `AUTOBOT_LLAMAINDEX_LLM_ENDPOINT` and `AUTOBOT_LLAMAINDEX_EMBEDDING_ENDPOINT` (RAG defaults to localhost without these)
- Added `git` package to SLM Docker image (eliminates git_tracker log spam)
- Cleared Vite build args in the frontend Dockerfile so nginx proxy mode handles API routing
- Copied `llm_models.yaml` and `permission_rules.yaml` into backend image (fixes missing config warnings)
- Added `autobot-worker` Celery service to docker-compose (async tasks now actually execute)
- Added SLM migration entrypoint so migrations complete before uvicorn starts

Files changed
- `docker-compose.yml`: ChromaDB depends_on, OLLAMA_HOST fix, LlamaIndex vars, Celery worker service
- `docker/.env.docker`: OLLAMA_HOST hostname fix, LlamaIndex endpoint vars
- `docker/backend/Dockerfile`: Copy shared config files
- `docker/frontend/Dockerfile`: Clear Vite build args for proxy mode
- `docker/slm/Dockerfile`: Add git, migration entrypoint, healthcheck tuning
- `docker/slm/entrypoint.sh`: New: runs migrations before app start

Test plan
- `docker compose config` validates without errors
- `docker compose build` succeeds for all services
- `docker compose up -d` starts all services, including the worker
- Ollama URLs are well-formed (no `http://http://` prefix)

Closes #1910, closes #1908, closes #1909, closes #1877, closes #1895, closes #1879, closes #1892, closes #1893