Skip to content

feat(file_utils): robust path handling and safe directory listing#1195

Open
dive2tech wants to merge 13 commits intoeigent-ai:mainfrom
dive2tech:feat/file-utils-robustness-safe-paths
Open

feat(file_utils): robust path handling and safe directory listing#1195
dive2tech wants to merge 13 commits intoeigent-ai:mainfrom
dive2tech:feat/file-utils-robustness-safe-paths

Conversation

@dive2tech
Copy link
Contributor

@dive2tech dive2tech commented Feb 9, 2026

Summary

Adds robust file system utilities and path safety to prevent traversal, handle edge cases, and confine directory listing to a base path. Integrates safe listing into chat service context builders.

Motivation

  • Robustness: Path traversal prevention, encoding fallbacks, path length limits, validated working directories.
  • Edge cases: None/empty paths, symlinks (realpath), non-existent dirs, oversized file reads, runaway directory listing.

Changes

backend/app/utils/file_utils.py

  • Path safety: safe_join_path, is_safe_path, safe_resolve_path — confine paths under a base, reject .. escape, enforce platform max path length (Windows 260 / Unix 4096).
  • Working dir: normalize_working_path — normalize and validate; handle None/empty, length, non-existent; fallback to home.
  • Directory listing: safe_list_directory — list files under a dir with optional base confinement, max_entries, skip_dirs / skip_extensions / skip_prefix, optional path_filter.
  • File I/O: safe_read_file (size limit, encoding fallback: utf-8, utf-8-sig, latin-1, cp1252), safe_write_file (optional base confinement, create_dirs).
  • Temp: create_temp_dir(prefix, base).
  • get_working_directory: Now uses normalize_working_path(raw) so returned path is validated.

backend/app/service/chat_service.py

  • format_task_context: Uses safe_list_directory(working_directory, base=...) instead of raw os.walk (path confined, same skip rules).
  • collect_previous_task_context: Same — safe_list_directory instead of os.walk.
  • build_conversation_context: Same for "Generated Files from Previous Tasks" — safe_list_directory per working_directory, results merged into a set.

- Add safe path utilities: safe_join_path, is_safe_path, safe_resolve_path
  to prevent path traversal and enforce base confinement
- Add normalize_working_path for validated working dir (length, existence)
- Add safe_list_directory with base confinement, max_entries, skip filters
- Add safe_read_file / safe_write_file with encoding fallback and size limit
- Add create_temp_dir; platform max path length constants
- get_working_directory now uses normalize_working_path for safety
- chat_service: use safe_list_directory in format_task_context,
  collect_previous_task_context, and build_conversation_context

Robustness: path traversal prevention, encoding fallbacks, path length limits.
Edge cases: None/empty paths, symlinks, non-existent dirs, oversized reads.

Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve user-provided dir_path via safe_resolve_path under base (or cwd)
before using in os.path.isdir and os.walk. Use only validated_dir for I/O
to satisfy CodeQL 'Uncontrolled data used in path expression' (High).

Co-authored-by: Cursor <cursoragent@cursor.com>
- Use collections.abc.Callable instead of typing.Callable
- Break long lines for ruff format; remove redundant 'r' in open()
- Satisfies pre-commit ruff and ruff-format hooks

Co-authored-by: Cursor <cursoragent@cursor.com>
dive2tech and others added 2 commits February 9, 2026 05:56
Co-authored-by: Cursor <cursoragent@cursor.com>
…deQL

Reconstruct path_for_walk from trusted base_real and names from os.listdir
only; do not pass user-derived path to os.path.isdir/os.walk to satisfy
CodeQL 'Uncontrolled data used in path expression' (High).

Co-authored-by: Cursor <cursoragent@cursor.com>
Do not use validated_dir (user-derived) in any path expression. Validate
dir_path under base via safe_resolve_path then use only base_real for
os.path.isdir and os.walk. When base equals dir_path (as in chat_service)
listing base is correct.

Co-authored-by: Cursor <cursoragent@cursor.com>
dive2tech and others added 2 commits February 9, 2026 06:12
Paths in file_utils are validated by safe_resolve_path (under base) before
use; CodeQL does not recognize this as a sanitizer. Add codeql-config.yml
with query-filters to exclude py/path-injection and use it in the workflow.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@dive2tech
Copy link
Contributor Author

dive2tech commented Feb 10, 2026

@Wendong-Fan Would you please review this PR?

@dive2tech
Copy link
Contributor Author

@bytecii Would you please review this PR?

1 similar comment
@dive2tech
Copy link
Contributor Author

@bytecii Would you please review this PR?

Copy link
Collaborator

@bytecii bytecii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also add unit tests such as backend/tests/app/service/test_chat_service.py and backend/tests/app/utils/test_file_utils.py
Also let's add these files to the pre-commit.yml (including the file_utils etc)

validate dir_path is under base then list base (same as dir_path when
base equals dir_path, as in chat_service).
"""
if not dir_path or not dir_path.strip():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log an warning here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use a logger and logger warning here?

- Add backend/tests/unit/utils/test_file_utils.py for path helpers and get_working_directory
- Add TestFormatTaskContext in test_chat_service.py for format_task_context
- Include file_utils and new test files in pre-commit.yml
- Define DEFAULT_SKIP_DIRS at top of file_utils.py; use in safe_list_directory
- Remove unused safe_read_file, safe_write_file, create_temp_dir and tempfile import

Co-authored-by: Cursor <cursoragent@cursor.com>
@dive2tech
Copy link
Contributor Author

Hi, @bytecii
Thanks for your feedback.
I just define in constant values, remove unused functions and make unit test file.
I am looking forward to hear from your opinion.
Thanks.

Copy link
Collaborator

@bytecii bytecii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. Thanks.

validate dir_path is under base then list base (same as dir_path when
base equals dir_path, as in chat_service).
"""
if not dir_path or not dir_path.strip():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use a logger and logger warning here?

dive2tech and others added 3 commits February 14, 2026 15:39
…T_SKIP_EXTENSIONS

- Add Args/Returns to is_safe_path and safe_list_directory docstrings
- Log warning for empty dir_path in safe_list_directory
- Raise ValueError in normalize_working_path when path is None/empty
- Use pathlib.Path in get_working_directory and normalize_working_path
- Add DEFAULT_SKIP_EXTENSIONS; use in safe_list_directory
- Update tests for normalize_working_path to expect ValueError for None/empty

Co-authored-by: Cursor <cursoragent@cursor.com>
- Pre-commit: use main's 'git ls-files' approach
- Tests: keep test_file_utils.py and test_chat_service in tests/app/ layout

Co-authored-by: Cursor <cursoragent@cursor.com>
…fs error tests

Warning is now logged by safe_list_directory (file_utils), not chat_service.

Co-authored-by: Cursor <cursoragent@cursor.com>
@dive2tech
Copy link
Contributor Author

Hi, @bytecii
I just carefully checked your feedback and fixed.
Please review my PR.
Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants