From a2812a8c49722bcaf0d4f9680211fb2f0cd84275 Mon Sep 17 00:00:00 2001
From: Lily Shen
Date: Wed, 6 May 2026 10:15:20 -0700
Subject: [PATCH] fix(gateway-doc-toolchain): add lmodern + pandoc PDF build-time self-test
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

OPENCLAW_INSTALL_DOC_TOOLCHAIN already pulled pandoc + texlive-xetex +
texlive-fonts-recommended, but pandoc PDF generation via xelatex needs
lmodern.sty (the Latin Modern font set), which lives in a separate
Debian package. Without it, every
`pandoc input.md -o out.pdf --pdf-engine=xelatex` call fails with
'File lmodern.sty not found', so agents trying to produce PDF
deliverables hit a runtime error.

Two changes:

1. Add `lmodern` to the apt install list. ~2 MB extra in the toolchain
   layer; no new build-args.

2. Extend the build-time self-test to actually produce a one-page PDF
   from markdown and assert it's non-empty. A future regression that
   removes a required LaTeX package fails the docker build instead of
   surfacing at agent runtime as a missing-file error.

Verified the failure mode on dev: agent producing a brief via pandoc
errored with 'File lmodern.sty not found' on image
clawdbot-gateway:0db858ec (no lmodern). DOCX/XLSX/PPTX via officecli
are unaffected because they don't go through TeX.
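
For checking an image by hand outside the docker build, the PDF self-test
can be sketched as a small shell function. This is only an illustration
under assumptions: the function name `pdf_smoke_test` and the temp-dir
layout are hypothetical and not part of this patch; the pandoc invocation
and the `-s` non-empty check mirror the ones added to the Dockerfile.

```shell
# Hypothetical helper (not part of this patch): mirrors the build-time
# self-test. Assumes pandoc and xelatex are on PATH in the image.
pdf_smoke_test() {
    workdir=$(mktemp -d) || return 1
    printf '# pandoc smoke test\n\nThis builds a real PDF.\n' > "$workdir/smoke.md"
    result=1
    # Pass only if pandoc exits 0 AND the output file exists and is non-empty.
    if pandoc "$workdir/smoke.md" -o "$workdir/smoke.pdf" --pdf-engine=xelatex \
        && [ -s "$workdir/smoke.pdf" ]; then
        result=0
    fi
    rm -rf "$workdir"
    return "$result"
}
```

Running `pdf_smoke_test && echo OK` inside a container from the image
should print OK once lmodern is installed; on an image without it, the
function is expected to return non-zero after pandoc reports the missing
lmodern.sty.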

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 Dockerfile | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/Dockerfile b/Dockerfile
index e16b2f64dc8a..86daef589bf4 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -221,7 +221,7 @@ RUN if [ -n "$OPENCLAW_INSTALL_DOC_TOOLCHAIN" ]; then \
       apt-get update && \
       DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
         ca-certificates curl \
-        pandoc texlive-xetex texlive-fonts-recommended \
+        pandoc texlive-xetex texlive-fonts-recommended lmodern \
         poppler-utils python3 python3-pip && \
       apt-get clean && \
       rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/* && \
@@ -249,11 +249,16 @@ RUN if [ -n "$OPENCLAW_INSTALL_DOC_TOOLCHAIN" ]; then \
       pip3 install --break-system-packages --no-cache-dir \
         markitdown pypdf && \
       \
-      # Self-test the toolchain so a regression (lost binary, broken pip)
-      # surfaces at build time, not at agent runtime via "command not found".
+      # Self-test the toolchain so a regression (lost binary, broken pip,
+      # missing LaTeX font package) surfaces at build time rather than at
+      # agent runtime via "command not found" or "lmodern.sty not found".
       officecli --version && \
       pandoc --version | head -1 && \
-      python3 -c "import markitdown, pypdf; print('parse-direction OK')"; \
+      python3 -c "import markitdown, pypdf; print('parse-direction OK')" && \
+      printf '# pandoc smoke test\n\nThis builds a real PDF.\n' > /tmp/pandoc-smoke.md && \
+      pandoc /tmp/pandoc-smoke.md -o /tmp/pandoc-smoke.pdf --pdf-engine=xelatex && \
+      test -s /tmp/pandoc-smoke.pdf && \
+      rm -f /tmp/pandoc-smoke.md /tmp/pandoc-smoke.pdf; \
     fi
 
 # Normalize extension paths so plugin safety checks do not reject