
feat: Migrate to UBI 9 base images for enterprise compliance and security improvements #1738

Open
lucaseduoli wants to merge 4 commits into main from fix/ubi9_images

@lucaseduoli (Collaborator) commented Jun 2, 2026

This pull request introduces major updates to the Dockerfiles, deployment scripts, and dependency management across the backend, frontend, and Langflow components to improve enterprise compatibility, security, and OpenShift readiness. The changes focus on migrating to Red Hat UBI 9 minimal images, enforcing non-root user execution, upgrading dependencies for security, and improving container build and runtime practices.

Containerization and Security Enhancements

  • Migrated all Dockerfiles (Dockerfile.backend, Dockerfile.frontend, Dockerfile.langflow) from Debian/Ubuntu-based images to Red Hat UBI 9 minimal images for better enterprise and OpenShift compatibility, and updated build steps to use microdnf instead of apt-get. [1] [2] [3]
  • Enforced non-root user execution in containers by creating and using specific users/groups, setting correct file ownership during COPY, and pre-creating writable directories with OpenShift-safe permissions. [1] [2]
  • Applied security updates and CVE remediations in both system and Node.js packages, including targeted upgrades for vulnerable dependencies (e.g., debug, glob-parent, minimatch, semver in Node.js, and system libraries like glibc, openssl). [1] [2]

Dependency and Compatibility Updates

  • Downgraded Python version requirement from 3.13 to 3.12 throughout the project for compatibility with UBI 9 and updated related tool configurations (.python-version, pyproject.toml, Dockerfiles). [1] [2] [3] [4] [5]
  • Updated Python and Node.js dependencies to address security, compatibility, and feature requirements, including new minimum versions for agentd, python-multipart, and additional packages like authlib, protobuf, idna, and urllib3. [1] [2] [3] [4]

Langflow and SQLite Improvements

  • Built and installed SQLite 3.46.0 from source in the Langflow image to meet minimum version requirements not available in UBI 9, and switched to direct installation of Langflow and its dependencies with CPU-only PyTorch wheels.
  • Enhanced the Langflow entrypoint script to ensure proper permissions for data directories, handle both root and non-root execution contexts, and set up environment variables for OpenShift and containerized environments. [1] [2] [3]
  • Updated the Langflow Kubernetes deployment to set required environment variables for correct runtime behavior and OpenShift compatibility.
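Because the source-built SQLite is installed over the system library (the Dockerfile configures it with --prefix=/usr --libdir=/usr/lib64), it may be worth confirming that Python's sqlite3 module actually loads the newer library at runtime rather than the stock UBI 9 one. A minimal sketch, assuming the 3.35.0 floor from the Dockerfile.langflow comment is the requirement that matters; the function name is hypothetical, and the check only covers the standard-library sqlite3 module, not a separately installed binding:

```python
import sqlite3

# Assumed floor, taken from the Dockerfile.langflow comment "SQLite >= 3.35.0".
MIN_SQLITE = (3, 35, 0)


def sqlite_is_new_enough(found=None, minimum=MIN_SQLITE):
    """Return True if the SQLite library linked into this interpreter meets `minimum`.

    `found` defaults to sqlite3.sqlite_version_info, i.e. the version of the
    libsqlite3 that the _sqlite3 extension loaded at runtime (not the version
    of any sqlite3 CLI that happens to be on PATH).
    """
    if found is None:
        found = sqlite3.sqlite_version_info
    return tuple(found) >= tuple(minimum)


if __name__ == "__main__":
    # Intended to be run with the image's own python3 as a smoke test.
    print(f"linked SQLite {sqlite3.sqlite_version}: ok={sqlite_is_new_enough()}")
```

Comparing version tuples avoids the string-comparison trap where "3.4" would sort after "3.35".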

Build and Ignore File Adjustments

  • Updated .dockerignore to allow package.json and package-lock.json into the build context, and improved exclusion patterns for build artifacts. [1] [2]

Entrypoint and Script Refinements

  • Replaced gosu with runuser in backend entrypoint for privilege dropping, and improved environment variable handling for non-root execution.

These changes collectively modernize the build and runtime environments for OpenRAG, increase security, and ensure compatibility with enterprise and OpenShift platforms.

Summary by CodeRabbit

  • Chores
    • Updated Python support to 3.12, aligned tooling and dependency constraints.
    • Reworked container builds, base images, runtime permissions, and startup behavior for backend, frontend, and Langflow components.
    • Pinned frontend transitive packages and updated tooling versions.
    • Adjusted Helm deployment startup args and environment settings.
  • Bug Fixes
    • Ignored build artifacts and ensured package files are respected by container tooling.
  • Tests
    • Increased integration test timeouts for longer-running document workflows.
  • Other
    • Clarified an SDK request payload construction.

…rity improvements (#1714)

* feat: migrate to UBI 9 base images for enterprise compliance

* fix: update scripts/backend-entrypoint.sh for UBI9 compliance

* fix: changed microdnf to dnf for ubi9 compatibility

* fix: Added --allowerasing flag to resolve curl-minimal conflict

* fix: updated files

* fix: mitigate backend and frontend CVEs by tightening dependency versions and removing runtime pip

* fix: remediate frontend CVE dependency overrides and harden backend runtime image

* fix: remediate dependency CVEs and reduce runtime image surface

* fix: harden backend and frontend runtime images for CVE remediation

* fix: apply comprehensive CVE fixes for backend and frontend containers

* fix: explicitly copy package-lock.json to resolve npm ci build failure

* fix: allow package.json and package-lock.json in Docker build context

* fix: resolve CVEs across backend and frontend containers

* fix: resolve CVEs

* fix: resolve OSS CVEs

* fix: resolve frontend CVEs

* fix: upgrade system packages and nodemon dependencies to resolve issues

* fix: add pysqlite3-binary to resolve SQLite version incompatibility

* fix: resolved sqlite error

* Update scripts/langflow-entrypoint.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* fix: fixed path issues

* Update Dockerfile.langflow

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* fix: resolve UBI9 mem0 permission crash, preserve entrypoint, and pin base images

* fix: UBI9 mem0 crash

* fix: resolved dotenv issue

* Fixed permission and wheel issue with langflow and backend

* style: ruff autofix (auto)

* changed langflow to 1.9.6.rc0

* fix image version

* Reduce images size

---------

Co-authored-by: Gautham N Pai <gauthamnpai@ibm.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Lucas Oliveira <lucas.edu.oli@hotmail.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

@coderabbitai (Contributor, bot) commented Jun 2, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1f4094f2-d6bb-4603-81f5-fa9e53a38d49

📥 Commits

Reviewing files that changed from the base of the PR and between c714a14 and 70f26ee.

⛔ Files ignored due to path filters (1)
  • frontend/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (1)
  • frontend/package.json

Walkthrough

Migrates the project to Python 3.12 and modernizes backend, frontend, and Langflow containers to UBI9 multi-stage builds; updates entrypoints (privilege handling), adjusts dependency constraints and tooling targets, expands container ignore rules, and updates Helm args/env for Langflow.

Changes

Python 3.12 and UBI9 Container Migration

Layer / File(s) Summary
Python 3.12 baseline and dependencies
.python-version, pyproject.toml
Python requirement changed from 3.13 → 3.12; dependency constraints updated (agentd, protobuf, python-multipart, idna, urllib3); Ruff and mypy targets aligned to py312.
Backend containerization (UBI9, multi-stage, entrypoint)
Dockerfile.backend, scripts/backend-entrypoint.sh, .dockerignore
Backend Dockerfile switched to UBI9 python-312-minimal with BuildKit/microdnf multi-stage build, pip upgrade, --chown copy of artifacts, pre-chown writable dirs, and entrypoint changed from gosu→runuser with explicit VIRTUAL_ENV/PATH. .dockerignore updated to preserve package.json/package-lock.json and ignore **/.next/.
Frontend containerization (UBI9, multi-stage)
Dockerfile.frontend, frontend/package.json
Frontend converted to deps/builder/runtime UBI9 Node.js multi-stage build: pins transitive packages, runs npm ci and npm run build, prunes dev deps, sets production env (NODE_ENV, disables Next telemetry), and copies only built artifacts with non-root runtime user. package.json devDependency (postcss) and overrides expanded/bumped.
Langflow containerization (UBI9, SQLite, multi-stage, entrypoint, Helm)
Dockerfile.langflow, scripts/langflow-entrypoint.py, kubernetes/helm/openrag/templates/langflow/deployment.yaml
Langflow rebuilt as UBI9 multi-stage image that compiles SQLite, installs pinned Langflow + CPU-only PyTorch wheels, pre-creates /app/data with OpenShift-safe permissions, bakes runtime env vars, conditionally creates uid/gid 1000, updates entrypoint to chmod/chown /app/data and create HOME if missing, and Helm Deployment switches to args with explicit env settings.
TypeScript SDK small change
sdks/typescript/src/documents.ts, sdks/typescript/tests/integration.test.ts
DocumentsClient.delete now uses bracket-notation (body["filename"], body["filter_id"]) to populate request body keys; integration test timeouts increased to 120_000 ms.

Sequence Diagram(s)

sequenceDiagram
  participant Builder
  participant Runtime
  participant Entrypoint
  Builder->>Runtime: copy built artifacts (with --chown)
  Runtime->>Entrypoint: install entrypoint, pre-create & chown dirs, set ENV
  Entrypoint->>Runtime: fix perms/chown, create HOME, drop privileges if root, exec app

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • edwinjosechittilappilly
  • mfortman11
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The PR title directly and clearly summarizes the main objective of the changeset: migrating to UBI 9 base images for enterprise compliance and security improvements, which aligns with all major changes across Dockerfiles, deployment configurations, and dependency updates.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

@github-actions (bot) added the frontend 🟨 (Issues related to the UI/UX), docker, and enhancement 🔵 (New feature or request) labels and removed the enhancement 🔵 label on Jun 2, 2026

@coderabbitai (Contributor, bot) left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Dockerfile.frontend`:
- Around line 14-18: The nodemon CVE remediation is being applied only in the
deps stage, so it does not affect the shipped runtime image. Update the
Dockerfile.frontend logic around the nodemon install/removal block to either
remove it entirely if it is just scanner noise, or move the fix into the runtime
stage where the final image is built. Make sure the remediation targets the
actual nodemon package paths present in the image and avoid masking failures
with || true so broken or no-op patches are visible.

In `@Dockerfile.langflow`:
- Around line 20-26: Replace the direct curl|tar pipeline with a safe
download-and-verify flow: curl -o the tarball (sqlite-autoconf-3460000.tar.gz)
and also fetch the corresponding SHA3-HASH file from the SQLite site, then
verify the tarball's SHA3-256 digest against the official SHA3-HASH (using a
tool that supports SHA3-256, e.g., sha3sum or openssl dgst -sha3-256) before
extracting; only run ./configure/make/install and rm -rf sqlite-autoconf-3460000
after the checksum passes, and fail the build if verification fails.

In `@kubernetes/helm/openrag/templates/langflow/deployment.yaml`:
- Around line 53-61: Replace the hardcoded /app/data values for the environment
variables HOME, XDG_CACHE_HOME, MEM0_DIR, and LANGFLOW_CONFIG_DIR in the
Langflow container spec with the Helm chart value for the data mount path (use
the existing langflow.persistence.mountPath value used elsewhere in the chart);
update the env entries (names HOME, XDG_CACHE_HOME, MEM0_DIR,
LANGFLOW_CONFIG_DIR) to reference that template variable so they follow any
overridden PVC mountPath at deploy time.

In `@scripts/langflow-entrypoint.py`:
- Around line 19-23: The code hardcodes data_dir = pathlib.Path("/app/data") and
unconditionally attempts shutil.chown and overwrites HOME/USER even when not
dropping privileges; change this to respect arbitrary UIDs by (1) using an
environment-provided writable path fallback (e.g.,
os.environ.get("APP_DATA_DIR") or keep existing default but don't force
ownership), (2) only perform shutil.chown(data_dir, ...) and chmod when running
as root (check os.geteuid() == 0) and when the target uid/gid exist, and (3)
only set os.environ["HOME"] and os.environ["USER"] after you actually drop
privileges (i.e., inside the branch where you perform the privilege change),
leaving runtime-provided HOME/USER untouched otherwise; reference the data_dir,
shutil.chown, chmod calls and the HOME/USER environment assignments to locate
changes.
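The privilege handling requested for scripts/langflow-entrypoint.py could look roughly like the sketch below. It is not the PR's code: APP_DATA_DIR, the uid/gid 1000 target, the 0o775 mode, and the function names are assumptions modeled on the review comment. Ownership fixes and the HOME/USER rewrite happen only on the root path, so an arbitrary OpenShift UID keeps its runtime-provided environment.

```python
import grp
import os
import pathlib
import pwd
import shutil

# Assumptions for this sketch: APP_DATA_DIR is a hypothetical override variable,
# and uid/gid 1000 mirrors the user the Langflow image conditionally creates.
DEFAULT_DATA_DIR = "/app/data"
TARGET_UID = 1000
TARGET_GID = 1000


def resolve_data_dir(env):
    """Prefer an environment-provided writable path, else the image default."""
    return pathlib.Path(env.get("APP_DATA_DIR", DEFAULT_DATA_DIR))


def target_ids_exist(uid, gid):
    """True only if both the target user and group are known to the system."""
    try:
        pwd.getpwuid(uid)
        grp.getgrgid(gid)
    except KeyError:
        return False
    return True


def prepare_and_drop(env, euid=None):
    """Return the environment Langflow should be exec'd with.

    Ownership is fixed and privileges are dropped only when running as root
    and the target ids exist. Under an arbitrary non-root UID nothing is
    chowned and HOME/USER are passed through unchanged.
    """
    euid = os.geteuid() if euid is None else euid
    new_env = dict(env)
    if euid != 0 or not target_ids_exist(TARGET_UID, TARGET_GID):
        return new_env

    data_dir = resolve_data_dir(env)
    data_dir.mkdir(parents=True, exist_ok=True)
    shutil.chown(data_dir, user=TARGET_UID, group=TARGET_GID)
    data_dir.chmod(0o775)  # group-writable; the exact mode is an assumption

    # Drop supplementary groups and the gid before the uid; once setuid()
    # succeeds, the process can no longer change its groups.
    os.setgroups([])
    os.setgid(TARGET_GID)
    os.setuid(TARGET_UID)

    # Only after the privilege change do HOME/USER describe the new identity.
    new_env["HOME"] = str(data_dir)
    new_env["USER"] = pwd.getpwuid(TARGET_UID).pw_name
    return new_env
```

The caller would then exec Langflow with the returned environment, for example via os.execvpe. Passing an explicit non-root euid exercises the pass-through path without touching the filesystem.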

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 25706348-1651-4417-8093-2a02ff32993e

📥 Commits

Reviewing files that changed from the base of the PR and between 17e8a19 and c28b2d0.

⛔ Files ignored due to path filters (2)
  • frontend/package-lock.json is excluded by !**/package-lock.json
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • .dockerignore
  • .python-version
  • Dockerfile.backend
  • Dockerfile.frontend
  • Dockerfile.langflow
  • frontend/package.json
  • kubernetes/helm/openrag/templates/langflow/deployment.yaml
  • pyproject.toml
  • scripts/backend-entrypoint.sh
  • scripts/langflow-entrypoint.py

Comment thread Dockerfile.frontend
Comment on lines +14 to +18
RUN cd /usr/lib/node_modules/nodemon 2>/dev/null && npm install debug@4.3.7 glob-parent@5.1.2 minimatch@10.2.3 semver@7.5.4 --save 2>/dev/null || true \
&& rm -rf /usr/lib/node_modules/nodemon/node_modules/debug-3.0.1 2>/dev/null || true \
&& rm -rf /usr/lib/node_modules/nodemon/node_modules/glob-parent-3.0.1 2>/dev/null || true \
&& rm -rf /usr/lib/node_modules/nodemon/node_modules/minimatch-3.0.1 2>/dev/null || true \
&& rm -rf /usr/lib/node_modules/nodemon/node_modules/semver-3.0.1 2>/dev/null || true

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm whether the minimal runtime base even contains nodemon, and inspect the layout the patch assumes.
fd -t f 'Dockerfile.frontend' --exec rg -n 'nodejs-20|nodemon|node_modules' {}
# Heuristic: npm uses flat dirs (node_modules/debug), not versioned (debug-3.0.1).
rg -n 'node_modules/.*-3\.0\.1' -g 'Dockerfile.frontend'

Repository: langflow-ai/openrag

Length of output: 1481


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Inspect the full Dockerfile.frontend around deps/builder/runtime + nodemon mentions
echo "=== Dockerfile.frontend (relevant excerpts) ==="
nl -ba Dockerfile.frontend | sed -n '1,120p' | cat

# 2) Find every nodemon occurrence in the repo (esp. package.json / lockfiles)
echo "=== nodemon occurrences ==="
rg -n "nodemon" -S . --glob '!**/node_modules/**' --glob '!**/.git/**' --glob '!**/dist/**'

Repository: langflow-ai/openrag

Length of output: 172


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Dockerfile.frontend (with line numbers, first 120 lines) ==="
cat -n Dockerfile.frontend | sed -n '1,120p'

echo "=== nodemon occurrences (excluding node_modules/dist/.git) ==="
rg -n "nodemon" -S . \
  --glob '!**/node_modules/**' \
  --glob '!**/dist/**' \
  --glob '!**/.git/**'

Repository: langflow-ai/openrag

Length of output: 4135


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect frontend deps for nodemon
if [ -f frontend/package.json ]; then
  echo "=== frontend/package.json (nodemon mentions) ==="
  rg -n "nodemon" frontend/package.json || true
fi

# Inspect lockfile for nodemon
for f in frontend/package-lock.json package-lock.json; do
  if [ -f "$f" ]; then
    echo "=== $f (nodemon mentions) ==="
    rg -n "nodemon" "$f" || true
  fi
done

Repository: langflow-ai/openrag

Length of output: 166


🌐 Web query:

Does the Red Hat UBI9 Node.js 20 minimal image (registry.access.redhat.com/ubi9/nodejs-20-minimal) include nodemon by default (as a global npm package under /usr/lib/node_modules)?

💡 Result:

No, the Red Hat UBI9 Node.js 20 minimal image (registry.access.redhat.com/ubi9/nodejs-20-minimal) does not include nodemon by default [1][2][3]. The UBI minimal images are designed to be extremely lightweight, production-ready runtime environments that contain only the necessary dependencies to run a Node.js application, excluding build-time tools, development dependencies, and extra global packages like nodemon [2][4]. While some Red Hat errata advisories mention nodejs-nodemon as a separate package available within the broader RHEL/UBI software ecosystem [5][6][7], it is not pre-installed in the nodejs-20-minimal container image. If you require nodemon for your development workflow, it should be installed as a devDependency in your project's package.json [8] or installed within your Dockerfile if needed, although it is standard practice to exclude such development tools from final production images [4][9].

Citations:


Fix/relocate the nodemon CVE remediation so it affects the shipped runtime image

  • The patch modifies system nodemon in the deps stage (/usr/lib/node_modules/nodemon), but the final runtime image is based on registry.access.redhat.com/ubi9/nodejs-20-minimal, which does not include global nodemon; the app also doesn’t list nodemon in frontend/package*.json, so this block won’t change the runtime image CVE posture.
  • The rm -rf /usr/lib/node_modules/nodemon/node_modules/<pkg>-3.0.1 ... || true removals target version-suffixed paths and mask failures, so even if nodemon were present they can silently do nothing.

Either remove the block if it’s only for scanner appeasement, or move the remediation into the runtime stage (and patch the actual directories/packages present there) without || true.
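The point about version-suffixed paths can be checked directly: npm installs each package into a directory named after the package (node_modules/debug), nests conflicting versions under a parent's own node_modules, and records the version in that directory's package.json. A small sketch, assuming it is run against a filesystem extracted from the built image (the function name is hypothetical), lists which versions are actually present before any remediation is written:

```python
import json
import pathlib


def find_installed_versions(root, names):
    """Map each package name to (relative dir, version) pairs found under `root`.

    rglob("node_modules/<name>/package.json") matches the top-level copy as
    well as copies nested under other packages' node_modules directories.
    """
    root = pathlib.Path(root)
    result = {name: [] for name in names}
    for name in names:
        for manifest in sorted(root.rglob(f"node_modules/{name}/package.json")):
            version = json.loads(manifest.read_text()).get("version")
            result[name].append((manifest.parent.relative_to(root).as_posix(), version))
    return result


if __name__ == "__main__":
    import sys
    # e.g. point it at <extracted-rootfs>/usr/lib/node_modules/nodemon
    for pkg, hits in find_installed_versions(sys.argv[1], sys.argv[2:]).items():
        print(pkg, hits or "not installed")
```

An empty result for every name would support dropping the block entirely; a non-empty one shows the real directories a runtime-stage fix would need to target.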

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile.frontend` around lines 14 - 18, The nodemon CVE remediation is
being applied only in the deps stage, so it does not affect the shipped runtime
image. Update the Dockerfile.frontend logic around the nodemon install/removal
block to either remove it entirely if it is just scanner noise, or move the fix
into the runtime stage where the final image is built. Make sure the remediation
targets the actual nodemon package paths present in the image and avoid masking
failures with || true so broken or no-op patches are visible.

Comment thread Dockerfile.langflow
Comment on lines +20 to +26
# Install SQLite >= 3.35.0 (UBI9 has 3.34.1) from source
RUN curl -L https://www.sqlite.org/2024/sqlite-autoconf-3460000.tar.gz | tar -xz \
&& cd sqlite-autoconf-3460000 \
&& ./configure --prefix=/usr --libdir=/usr/lib64 \
&& make -j$(nproc) install \
&& cd .. \
&& rm -rf sqlite-autoconf-3460000

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

sqlite-autoconf-3460000.tar.gz sha256 checksum

💡 Result:

The SHA256 checksum for sqlite-autoconf-3460000.tar.gz is 6f8e6a7b335273748816f9b3b62bbdc372a889de8782d7f048c653a447417a7d [1]. This checksum has been verified in third-party supply chain and package management records [1].

Citations:


🌐 Web query:

sqlite-autoconf-3460000.tar.gz SHA256 site:sqlite.org

💡 Result:

The official SQLite download page publishes SHA3-256 hashes for all distributed files, rather than SHA256 hashes [1][2]. For the file sqlite-autoconf-3460000.tar.gz specifically, an official SHA256 hash is not provided by the SQLite project, as they have standardized on SHA3-256 for integrity verification of their distribution artifacts [1][3]. If you are looking for the integrity verification value for that specific version's autoconf package, you should consult the official SQLite download page (sqlite.org/download.html) and locate the SHA3-256 hash associated with that file [1]. Be aware that SQLite occasionally updates distribution files (such as precompiled binaries) if errors are found, so always verify the hash against the current official source [4]. For automated verification, the SQLite download page contains an embedded HTML comment with a CSV table (containing headers: PRODUCT,VERSION,RELATIVE-URL,SIZE-IN-BYTES,SHA3-HASH) that can be parsed to programmatically retrieve the correct, up-to-date SHA3-256 hash for any current release [5][6].

Citations:


🌐 Web query:

sqlite-autoconf-3460000.tar.gz SHA3-256

💡 Result:

The SHA3-256 hash for the sqlite-autoconf-3460000.tar.gz file is 83d2acf79453deb7d6520338b1f4585f12e39b27cd370fb08593afa198f471fc [1][2]. This hash is publicly cited in technical documentation and build logs, such as those maintained by supply chain security services tracking the sqlite3-ruby gem, to verify the integrity of the source archive [1][2].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
ls -la
sed -n '1,120p' Dockerfile.langflow
echo '---'
nl -ba Dockerfile.langflow | sed -n '1,120p'
echo '---'
nl -ba Dockerfile.langflow | sed -n '120,240p'

Repository: langflow-ai/openrag

Length of output: 6508


🏁 Script executed:

#!/bin/bash
set -euo pipefail

url="https://sqlite.org/download.html"
tmp="/tmp/sqlite-download.html"

curl -fsSL "$url" -o "$tmp"

# Extract the embedded CSV rows (SQLite hides it in an HTML comment on the page)
# Then find the row for sqlite-autoconf-3460000.tar.gz and print VERSION + SHA3-HASH.
python3 - <<'PY'
import re, csv, sys, pathlib
html = pathlib.Path("/tmp/sqlite-download.html").read_text(errors="ignore")

# Pull the CSV table from an HTML comment if present
# Example markers we’ve seen: "<!--" ... "PRODUCT,VERSION,RELATIVE-URL,SIZE-IN-BYTES,SHA3-HASH" ... "-->"
m = re.search(r'<!--.*?PRODUCT,VERSION,RELATIVE-URL,SIZE-IN-BYTES,SHA3-HASH.*?-->',
              html, flags=re.S)
if not m:
    # Fallback: maybe CSV is not in comment on this variant of the page
    m = re.search(r'PRODUCT,VERSION,RELATIVE-URL,SIZE-IN-BYTES,SHA3-HASH.*', html, flags=re.S)

block = m.group(0)

# Extract the CSV lines starting from header until blank line (best-effort)
header = "PRODUCT,VERSION,RELATIVE-URL,SIZE-IN-BYTES,SHA3-HASH"
start = block.find(header)
if start == -1:
    print("Could not find CSV header in extracted block", file=sys.stderr)
    sys.exit(1)

csv_text = block[start:]
# Trim any trailing non-CSV garbage
csv_text = csv_text.split("-->")[0] if "-->" in csv_text else csv_text
lines = [ln for ln in csv_text.splitlines() if ln.strip()]

# Parse CSV with python's CSV reader
reader = csv.reader(lines)
rows = list(reader)

# Skip until header row
hdr_idx = None
for i,r in enumerate(rows):
    if r and r[0] == "PRODUCT":
        hdr_idx = i
        break
if hdr_idx is None:
    print("Header row not found after CSV extraction", file=sys.stderr)
    sys.exit(1)

rows = rows[hdr_idx+1:]
target = "sqlite-autoconf-3460000.tar.gz"
found = False
for r in rows:
    if len(r) < 5: 
        continue
    relative_url = r[2].strip()
    sha3 = r[4].strip()
    if relative_url.endswith("/"+target) or relative_url.endswith(target) or target in relative_url:
        # VERSION is column 1
        version = r[1].strip()
        print(f"{target} VERSION={version} SHA3-256={sha3}")
        found = True
        break
if not found:
    print(f"Did not find {target} in extracted SQLite download CSV", file=sys.stderr)
    sys.exit(2)
PY

Repository: langflow-ai/openrag

Length of output: 140


🏁 Script executed:

#!/bin/bash
set -euo pipefail

target="sqlite-autoconf-3460000.tar.gz"

curl -fsSL "https://sqlite.org/download.html" -o /tmp/sqlite-download.html

echo "== grep occurrences =="
rg -n --fixed-string "$target" /tmp/sqlite-download.html || true

echo "== show surrounding context (first hit) =="
line=$(rg -n --fixed-string "$target" /tmp/sqlite-download.html | head -n1 | cut -d: -f1 || true)
if [ -n "${line:-}" ]; then
  start=$((line-20))
  end=$((line+20))
  sed -n "${start},${end}p" /tmp/sqlite-download.html
fi

echo "== attempt SHA3 value nearby (first 200 chars after match) =="
if [ -n "${line:-}" ]; then
  # Print the line containing the match and the next 2 lines; hash may be in same row/cell
  sed -n "${line},$((line+2))p" /tmp/sqlite-download.html
fi

Repository: langflow-ai/openrag

Length of output: 1105


🏁 Script executed:

#!/bin/bash
set -euo pipefail

target="sqlite-autoconf-3460000.tar.gz"

curl -fsSL "https://sqlite.org/download.html" -o /tmp/sqlite-download.html

echo "== grep occurrences =="
rg -n -F "$target" /tmp/sqlite-download.html || true

echo "== show surrounding context (first 3 hits) =="
mapfile -t lines < <(rg -n -F "$target" /tmp/sqlite-download.html | head -n 3 | cut -d: -f1)
for line in "${lines[@]}"; do
  start=$((line-30))
  end=$((line+30))
  echo "--- around line $line ---"
  sed -n "${start},${end}p" /tmp/sqlite-download.html | cat
done

echo "== attempt extract SHA3-ish values near matches =="
if [ "${#lines[@]}" -gt 0 ]; then
  python3 - <<'PY'
import re, pathlib
html = pathlib.Path("/tmp/sqlite-download.html").read_text(errors="ignore")
target = "sqlite-autoconf-3460000.tar.gz"
for m in re.finditer(re.escape(target), html):
    i = m.start()
    snippet = html[i:i+800]
    # SHA3 hashes in the page are 64 hex chars (sha3-256)
    hashes = re.findall(r'\b[0-9a-f]{64}\b', snippet, flags=re.I)
    print("match at", i, "first_hashes", hashes[:5])
    break
PY
fi

Repository: langflow-ai/openrag

Length of output: 208


🏁 Script executed:

#!/bin/bash
set -euo pipefail

curl -fsSL "https://sqlite.org/download.html" -o /tmp/sqlite-download.html
echo "bytes: $(wc -c </tmp/sqlite-download.html)"
echo "head:"
sed -n '1,60p' /tmp/sqlite-download.html | cat

echo "---- string searches (HTML) ----"
rg -n "SHA3-HASH|PRODUCT,VERSION|sqlite-autoconf|3460000|autoconf" /tmp/sqlite-download.html || true

echo "---- show small context around 3460000 occurrences ----"
python3 - <<'PY'
import pathlib, re
p = pathlib.Path("/tmp/sqlite-download.html")
s = p.read_text(errors="ignore")
for m in re.finditer(r'3460000', s):
    i = m.start()
    print("\n--- occurrence @", i, "---")
    print(s[i-200:i+400])
    break
else:
    print("\nNo 3460000 literal found in HTML text.")
PY

Repository: langflow-ai/openrag

Length of output: 2564


🏁 Script executed:

#!/bin/bash
set -euo pipefail

for url in \
  "https://www.sqlite.org/matrix/download.html" \
  "https://sqlite.org/matrix/download.html" \
  "https://www.sqlite.org/sha3sum.html" \
  "https://sqlite.org/sha3sum.html" \
  "https://www.sqlite.org/2024/sha3sum.html" \
  "https://www.sqlite.org/2024/sha3sum" \
  "https://www.sqlite.org/2024/sha3sum.txt" \
  "https://www.sqlite.org/2024/sha3sum.csv"
do
  echo "==== $url ===="
  f="/tmp/$(echo "$url" | sed 's#https\?://##; s#[/]+#_#g').html"
  if curl -fsSL "$url" -o "$f" 2>/dev/null; then
    echo "downloaded: $(wc -c < "$f") bytes"
    rg -n "sqlite-autoconf-3460000\.tar\.gz|3460000|sqlite-autoconf-3460000" "$f" || true
    rg -n "SHA3-HASH|SHA3" "$f" | head -n 5 || true
  else
    echo "failed"
  fi
done

Repository: langflow-ai/openrag

Length of output: 508


🌐 Web query:

sqlite-autoconf-3460000.tar.gz SHA3-256 SHA3-HASH sqlite.org download

💡 Result:

I could not find an authoritative, up-to-date source on sqlite.org that lists the SHA3-256 hash for the specific file sqlite-autoconf-3460000.tar.gz. 1) The canonical file download URL for a given autoconf tarball is typically https://sqlite.org//sqlite-autoconf-.tar.gz, and mirrors are reachable elsewhere, e.g. a directory index for sqlite3-3460000 shows sqlite-autoconf-3460000.tar.gz but does not expose a SHA3-256 in the snippet I could retrieve [1]. 2) The sqlite.org download pages I found in the search results only clearly show SHA3-256 values for other autoconf tarballs (e.g., sqlite-autoconf-3530100.tar.gz, sqlite-autoconf-3410000.tar.gz) rather than sqlite-autoconf-3460000.tar.gz [2]. If you want, paste the SHA3-256 line you see on your sqlite.org download page (or the specific sqlite.org download.html/matrix page content around sqlite-autoconf-3460000), and I’ll verify it matches the referenced file name/size and show you how to check it locally with sha3sum.

Citations:


Add integrity verification for the SQLite source tarball download (SQLite publishes SHA3-256/SHA3-HASH)

Current Dockerfile pipes curl .../sqlite-autoconf-3460000.tar.gz straight into tar with no integrity check, so a tampered/corrupt archive would be compiled silently. Download the tarball to a file, verify its SHA3-256 against the official SHA3-HASH value for sqlite-autoconf-3460000.tar.gz from the SQLite download page, then extract and clean up.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile.langflow` around lines 20 - 26, Replace the direct curl|tar
pipeline with a safe download-and-verify flow: curl -o the tarball
(sqlite-autoconf-3460000.tar.gz) and also fetch the corresponding SHA3-HASH file
from the SQLite site, then verify the tarball's SHA3-256 digest against the
official SHA3-HASH (using a tool that supports SHA3-256, e.g., sha3sum or
openssl dgst -sha3-256) before extracting; only run ./configure/make/install and
rm -rf sqlite-autoconf-3460000 after the checksum passes, and fail the build if
verification fails.
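A download-and-verify step along the lines the comment asks for could use Python's hashlib.sha3_256, which should be available wherever the Langflow build already has a Python interpreter. This is a sketch, not the PR's code: the script name and function names are hypothetical, and the expected digest is deliberately a parameter because the lookups above did not confirm a value for sqlite-autoconf-3460000.tar.gz on sqlite.org; the pinned value should come from the official SHA3-256 listing.

```python
import hashlib
import hmac


def sha3_256_of(path, chunk_size=1 << 20):
    """Stream the file so large tarballs are never loaded into memory at once."""
    digest = hashlib.sha3_256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tarball(path, expected_hex):
    """Return the digest if it matches the published SHA3-256, else raise."""
    actual = sha3_256_of(path)
    # Normalize case/whitespace from copy-pasted hashes before comparing.
    if not hmac.compare_digest(actual, expected_hex.strip().lower()):
        raise ValueError(f"SHA3-256 mismatch for {path}: got {actual}, expected {expected_hex}")
    return actual


if __name__ == "__main__":
    import sys
    # usage (hypothetical): python3 verify_sha3.py <tarball> <expected-sha3-256-hex>
    print(verify_tarball(sys.argv[1], sys.argv[2]))
```

An uncaught ValueError exits non-zero, so the RUN step would fail before ./configure runs. openssl dgst -sha3-256, mentioned in the prompt above, is the equivalent shell-only option.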

Comment on lines +53 to +61
env:
  - name: HOME
    value: /app/data
  - name: XDG_CACHE_HOME
    value: /app/data/.cache
  - name: MEM0_DIR
    value: /app/data/.mem0
  - name: LANGFLOW_CONFIG_DIR
    value: /app/data

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Template env paths from langflow.persistence.mountPath instead of hardcoding /app/data.

Line 73 already makes the data mount path configurable, but lines 55-61 hardcode /app/data. If mountPath is overridden, the HOME, cache, and config paths won’t follow the PVC mount.

Suggested fix
           env:
+            - name: LANGFLOW_DATA_DIR
+              value: {{ .Values.langflow.persistence.mountPath | quote }}
             - name: HOME
-              value: /app/data
+              value: {{ .Values.langflow.persistence.mountPath | quote }}
             - name: XDG_CACHE_HOME
-              value: /app/data/.cache
+              value: {{ printf "%s/.cache" .Values.langflow.persistence.mountPath | quote }}
             - name: MEM0_DIR
-              value: /app/data/.mem0
+              value: {{ printf "%s/.mem0" .Values.langflow.persistence.mountPath | quote }}
             - name: LANGFLOW_CONFIG_DIR
-              value: /app/data
+              value: {{ .Values.langflow.persistence.mountPath | quote }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

env:
  - name: LANGFLOW_DATA_DIR
    value: {{ .Values.langflow.persistence.mountPath | quote }}
  - name: HOME
    value: {{ .Values.langflow.persistence.mountPath | quote }}
  - name: XDG_CACHE_HOME
    value: {{ printf "%s/.cache" .Values.langflow.persistence.mountPath | quote }}
  - name: MEM0_DIR
    value: {{ printf "%s/.mem0" .Values.langflow.persistence.mountPath | quote }}
  - name: LANGFLOW_CONFIG_DIR
    value: {{ .Values.langflow.persistence.mountPath | quote }}
🤖 Prompt for AI Agents
In `@kubernetes/helm/openrag/templates/langflow/deployment.yaml` around lines 53 -
61, Replace the hardcoded /app/data values for the environment variables HOME,
XDG_CACHE_HOME, MEM0_DIR, and LANGFLOW_CONFIG_DIR in the Langflow container spec
with the Helm chart value for the data mount path (use the existing
langflow.persistence.mountPath value used elsewhere in the chart); update the
env entries (names HOME, XDG_CACHE_HOME, MEM0_DIR, LANGFLOW_CONFIG_DIR) to
reference that template variable so they follow any overridden PVC mountPath at
deploy time.
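The failure mode is easy to check outside the cluster. The Python sketch below is a stand-in for rendering the chart, not Helm itself: `langflow_data_env` mirrors the suggested template for a given mountPath, and `paths_outside_mount` flags any variable that would not land on the PVC. Both function names are hypothetical.

```python
import pathlib


def langflow_data_env(mount_path: str) -> dict[str, str]:
    """Mirror the suggested template: every data-related variable derives from mountPath."""
    return {
        "LANGFLOW_DATA_DIR": mount_path,
        "HOME": mount_path,
        # Same result as Helm's printf "%s/.cache" on a string argument.
        "XDG_CACHE_HOME": "%s/.cache" % mount_path,
        "MEM0_DIR": "%s/.mem0" % mount_path,
        "LANGFLOW_CONFIG_DIR": mount_path,
    }


def paths_outside_mount(env: dict[str, str], mount_path: str) -> list[str]:
    """Return the names of variables whose path is neither the mount nor inside it."""
    root = pathlib.PurePosixPath(mount_path)
    outside = []
    for name, value in env.items():
        path = pathlib.PurePosixPath(value)
        if path != root and root not in path.parents:
            outside.append(name)
    return outside
```

With the hardcoded values, overriding mountPath to anything other than /app/data makes every entry show up as outside the mount; with the templated values the list stays empty for any mountPath.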

Comment on lines +19 to +23
data_dir = pathlib.Path("/app/data")
try:
    data_dir.chmod(0o777)
except OSError:
    shutil.chown(data_dir, user=1000, group=1000)
except (OSError, PermissionError):
Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid forcing uid-1000 home/path settings when running as an arbitrary non-root UID.

Line 19 hardcodes /app/data, and lines 56-57 always overwrite HOME and USER, even when the check on line 51 skips privilege dropping. This can break OpenShift-style runs under an arbitrary UID and override writable paths provided at runtime.

Suggested fix
-# Ensure data directory is writable by the langflow user
-data_dir = pathlib.Path("/app/data")
+# Keep data path aligned with runtime/chart configuration.
+data_dir = pathlib.Path(os.environ.get("LANGFLOW_DATA_DIR", "/app/data"))
@@
 if os.getuid() == 0:
     os.setgid(1000)
     os.setuid(1000)
-
-# Restore environment variables to reflect the unprivileged user.
-os.environ["HOME"] = home
-os.environ["USER"] = user
+    # Restore env only when we actually switched identities.
+    os.environ["HOME"] = home
+    os.environ["USER"] = user
+else:
+    os.environ.setdefault("HOME", str(data_dir))
+    os.environ.setdefault("USER", str(os.getuid()))

Also applies to: 51-57

🤖 Prompt for AI Agents
In `@scripts/langflow-entrypoint.py` around lines 19 - 23, The code hardcodes
data_dir = pathlib.Path("/app/data") and unconditionally attempts shutil.chown
and overwrites HOME/USER even when not dropping privileges; change this to
respect arbitrary UIDs by (1) reading the writable data path from the
environment (e.g., os.environ.get("LANGFLOW_DATA_DIR", "/app/data"), matching the
variable used in the suggested fix) without forcing ownership, (2) only perform
shutil.chown(data_dir, ...) and chmod
when running as root (check os.geteuid() == 0) and when the target uid/gid
exist, and (3) only set os.environ["HOME"] and os.environ["USER"] after you
actually drop privileges (i.e., inside the branch where you perform the
privilege change), leaving runtime-provided HOME/USER untouched otherwise;
reference the data_dir, shutil.chown, chmod calls and the HOME/USER environment
assignments to locate changes.
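A minimal sketch of the split this comment asks for, assuming the target uid/gid 1000 from the current script and the LANGFLOW_DATA_DIR variable from the suggested fix. The helper names (`resolve_data_dir`, `apply_identity_env`, `drop_privileges_if_root`) are hypothetical. The HOME/USER decision is kept as a pure function so it can be checked without actually switching users.

```python
import os
import pathlib

# Assumed values: uid/gid 1000 come from the current entrypoint; the default
# path and the LANGFLOW_DATA_DIR name come from the suggested fix.
DEFAULT_DATA_DIR = "/app/data"
TARGET_UID = 1000
TARGET_GID = 1000


def resolve_data_dir(env: dict[str, str]) -> pathlib.Path:
    """Prefer the chart-provided data path and fall back to the image default."""
    return pathlib.Path(env.get("LANGFLOW_DATA_DIR", DEFAULT_DATA_DIR))


def apply_identity_env(env: dict[str, str], euid: int, data_dir: pathlib.Path,
                       home: str, user: str) -> dict[str, str]:
    """Return a copy of env with HOME/USER adjusted for the identity we run as.

    Started as root: we are about to drop to TARGET_UID, so overwrite HOME/USER.
    Started as an arbitrary non-root UID (OpenShift): keep whatever the runtime
    provided and only fill in values that are missing.
    """
    result = dict(env)
    if euid == 0:
        result["HOME"] = home
        result["USER"] = user
    else:
        result.setdefault("HOME", str(data_dir))
        result.setdefault("USER", str(euid))
    return result


def drop_privileges_if_root(data_dir: pathlib.Path) -> bool:
    """Chown data_dir and switch to TARGET_UID/GID, but only when running as root."""
    if os.geteuid() != 0:
        return False  # arbitrary UID: never chown or setuid
    try:
        os.chown(data_dir, TARGET_UID, TARGET_GID)
    except OSError:
        pass  # e.g. a volume that rejects chown; keep the existing ownership
    os.setgroups([])       # drop root's supplementary groups first
    os.setgid(TARGET_GID)  # change group while we still have root
    os.setuid(TARGET_UID)
    return True
```

At startup the entrypoint would record `euid = os.geteuid()` before calling `drop_privileges_if_root`, then apply `os.environ.update(apply_identity_env(dict(os.environ), euid, data_dir, home, user))`, so the root check and the HOME/USER decision cannot drift apart.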

@github-actions github-actions Bot added enhancement 🔵 New feature or request and removed enhancement 🔵 New feature or request labels Jun 2, 2026
Collaborator

@edwinjosechittilappilly left a comment

LGTM awaiting integration test

@github-actions github-actions Bot added the lgtm label Jun 2, 2026

Labels

docker enhancement 🔵 New feature or request frontend 🟨 Issues related to the UI/UX lgtm

3 participants