feat(file_utils): robust path handling and safe directory listing#1195
feat(file_utils): robust path handling and safe directory listing#1195dive2tech wants to merge 13 commits intoeigent-ai:mainfrom
Conversation
- Add safe path utilities: safe_join_path, is_safe_path, safe_resolve_path to prevent path traversal and enforce base confinement - Add normalize_working_path for validated working dir (length, existence) - Add safe_list_directory with base confinement, max_entries, skip filters - Add safe_read_file / safe_write_file with encoding fallback and size limit - Add create_temp_dir; platform max path length constants - get_working_directory now uses normalize_working_path for safety - chat_service: use safe_list_directory in format_task_context, collect_previous_task_context, and build_conversation_context Robustness: path traversal prevention, encoding fallbacks, path length limits. Edge cases: None/empty paths, symlinks, non-existent dirs, oversized reads. Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve user-provided dir_path via safe_resolve_path under base (or cwd) before using in os.path.isdir and os.walk. Use only validated_dir for I/O to satisfy CodeQL 'Uncontrolled data used in path expression' (High). Co-authored-by: Cursor <cursoragent@cursor.com>
- Use collections.abc.Callable instead of typing.Callable - Break long lines for ruff format; remove redundant 'r' in open() - Satisfies pre-commit ruff and ruff-format hooks Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…deQL Reconstruct path_for_walk from trusted base_real and names from os.listdir only; do not pass user-derived path to os.path.isdir/os.walk to satisfy CodeQL 'Uncontrolled data used in path expression' (High). Co-authored-by: Cursor <cursoragent@cursor.com>
Do not use validated_dir (user-derived) in any path expression. Validate dir_path under base via safe_resolve_path then use only base_real for os.path.isdir and os.walk. When base equals dir_path (as in chat_service) listing base is correct. Co-authored-by: Cursor <cursoragent@cursor.com>
Paths in file_utils are validated by safe_resolve_path (under base) before use; CodeQL does not recognize this as a sanitizer. Add codeql-config.yml with query-filters to exclude py/path-injection and use it in the workflow. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
@Wendong-Fan Would you please review this PR? |
|
@bytecii Would you please review this PR? |
1 similar comment
|
@bytecii Would you please review this PR? |
bytecii
left a comment
There was a problem hiding this comment.
Let's also add unit tests such as backend/tests/app/service/test_chat_service.py and backend/tests/app/utils/test_file_utils.py
Also let's add these files to the pre-commit.yml (including the file_utils etc)
| validate dir_path is under base then list base (same as dir_path when | ||
| base equals dir_path, as in chat_service). | ||
| """ | ||
| if not dir_path or not dir_path.strip(): |
There was a problem hiding this comment.
Why not use a logger and logger warning here?
- Add backend/tests/unit/utils/test_file_utils.py for path helpers and get_working_directory - Add TestFormatTaskContext in test_chat_service.py for format_task_context - Include file_utils and new test files in pre-commit.yml - Define DEFAULT_SKIP_DIRS at top of file_utils.py; use in safe_list_directory - Remove unused safe_read_file, safe_write_file, create_temp_dir and tempfile import Co-authored-by: Cursor <cursoragent@cursor.com>
|
Hi, @bytecii |
bytecii
left a comment
There was a problem hiding this comment.
Left some comments. Thanks.
| validate dir_path is under base then list base (same as dir_path when | ||
| base equals dir_path, as in chat_service). | ||
| """ | ||
| if not dir_path or not dir_path.strip(): |
There was a problem hiding this comment.
Why not use a logger and logger warning here?
…T_SKIP_EXTENSIONS - Add Args/Returns to is_safe_path and safe_list_directory docstrings - Log warning for empty dir_path in safe_list_directory - Raise ValueError in normalize_working_path when path is None/empty - Use pathlib.Path in get_working_directory and normalize_working_path - Add DEFAULT_SKIP_EXTENSIONS; use in safe_list_directory - Update tests for normalize_working_path to expect ValueError for None/empty Co-authored-by: Cursor <cursoragent@cursor.com>
- Pre-commit: use main's 'git ls-files' approach - Tests: keep test_file_utils.py and test_chat_service in tests/app/ layout Co-authored-by: Cursor <cursoragent@cursor.com>
…fs error tests Warning is now logged by safe_list_directory (file_utils), not chat_service. Co-authored-by: Cursor <cursoragent@cursor.com>
|
Hi, @bytecii |
Summary
Adds robust file system utilities and path safety to prevent traversal, handle edge cases, and confine directory listing to a base path. Integrates safe listing into chat service context builders.
Motivation
Changes
backend/app/utils/file_utils.pysafe_join_path,is_safe_path,safe_resolve_path— confine paths under a base, reject..escape, enforce platform max path length (Windows 260 / Unix 4096).normalize_working_path— normalize and validate; handle None/empty, length, non-existent; fallback to home.safe_list_directory— list files under a dir with optionalbaseconfinement,max_entries,skip_dirs/skip_extensions/skip_prefix, optionalpath_filter.safe_read_file(size limit, encoding fallback: utf-8, utf-8-sig, latin-1, cp1252),safe_write_file(optional base confinement,create_dirs).create_temp_dir(prefix, base).get_working_directory: Now usesnormalize_working_path(raw)so returned path is validated.backend/app/service/chat_service.pyformat_task_context: Usessafe_list_directory(working_directory, base=...)instead of rawos.walk(path confined, same skip rules).collect_previous_task_context: Same —safe_list_directoryinstead ofos.walk.build_conversation_context: Same for "Generated Files from Previous Tasks" —safe_list_directoryperworking_directory, results merged into a set.