
fix: auto-detect CSV delimiter via Sniffer in import_set and detect() #639

Open
terminalchai wants to merge 1 commit into jazzband:master from terminalchai:fix/csv-sniffer-auto-detect-delimiter

@terminalchai

Fixes #622

Problem

`tablib.Dataset().load(data, format='csv')` always uses `,` as the delimiter, even when the file uses `;`, `:`, `|`, or another separator. This is because `import_set` calls `kwargs.setdefault('delimiter', cls.DEFAULT_DELIMITER)` without ever sniffing the actual content.
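A minimal stdlib-only illustration of the symptom (it does not call tablib itself; the sample data is made up): reading semicolon-separated text with the comma default leaves each row as a single field, which is why only one header comes back.

```python
import csv
import io

# Semicolon-separated sample; the header row has three columns.
data = "name;age;city\nalice;30;paris\n"

# Mimic the current behaviour: the delimiter defaults to ',' when none is given.
kwargs = {}
kwargs.setdefault("delimiter", ",")

rows = list(csv.reader(io.StringIO(data), **kwargs))
print(rows[0])  # ['name;age;city'] -> one "header" instead of three
```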

A secondary issue: `CSVFormat.detect()` passed `delimiters=delimiter or cls.DEFAULT_DELIMITER` to `csv.Sniffer().sniff()`, which restricted sniffing to commas only. As a result, non-comma CSV files were never recognised during format auto-detection.
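To see why the restriction matters: when `delimiters` contains only `,`, `csv.Sniffer().sniff()` cannot settle on any other character and raises `csv.Error`, which a detect-style check treats as "not CSV". The sketch below uses only the standard library; `sniff_ok` is a hypothetical helper, not part of tablib.

```python
import csv

sample = "name;age;city\nalice;30;paris\nbob;25;rome\n"


def sniff_ok(text, delimiters=None):
    """Hypothetical helper: True if csv.Sniffer can infer a dialect from text."""
    try:
        csv.Sniffer().sniff(text, delimiters=delimiters)
        return True
    except csv.Error:
        return False


print(sniff_ok(sample, delimiters=","))  # False: only ',' is allowed
print(sniff_ok(sample))                  # True: ';' is found
```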

Fix

`import_set`: when no `delimiter` kwarg is provided, it reads a sample of up to 2048 characters, calls `csv.Sniffer().sniff(sample)` with no delimiter restriction, and uses the detected delimiter. It then seeks the stream back to position 0 before the actual read. A guard rejects detected delimiters that are letters or digits (the Sniffer occasionally picks a letter on very short or ambiguous samples), and any `csv.Error` falls back to `DEFAULT_DELIMITER`.
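The steps above can be sketched with the standard library alone. This is not the PR's code: `sniff_delimiter`, `read_rows`, and the module-level constants are hypothetical names, and the sketch assumes a seekable text stream that starts at position 0.

```python
import csv
import io

DEFAULT_DELIMITER = ","  # assumption: mirrors CSVFormat.DEFAULT_DELIMITER
SAMPLE_SIZE = 2048       # sample length described in the PR


def sniff_delimiter(stream, default=DEFAULT_DELIMITER, sample_size=SAMPLE_SIZE):
    """Hypothetical helper: guess the delimiter from a sample, then rewind."""
    sample = stream.read(sample_size)
    stream.seek(0)  # rewind to the start (assumes the stream began at 0)
    try:
        delimiter = csv.Sniffer().sniff(sample).delimiter
    except csv.Error:
        return default  # sniffing failed: keep the default
    # Sniffer can settle on a letter or digit for short/ambiguous samples.
    if not delimiter or delimiter.isalnum():
        return default
    return delimiter


def read_rows(stream, **kwargs):
    """Hypothetical reader: sniff only when no explicit delimiter is passed."""
    if "delimiter" not in kwargs:
        kwargs["delimiter"] = sniff_delimiter(stream)
    return list(csv.reader(stream, **kwargs))


print(read_rows(io.StringIO("a:b:c\n1:2:3\n")))        # [['a', 'b', 'c'], ['1', '2', '3']]
print(read_rows(io.StringIO("x|y\n1|2\n")))            # [['x', 'y'], ['1', '2']]
print(read_rows(io.StringIO("a;b\n"), delimiter=","))  # explicit kwarg wins: [['a;b']]
```

Checking `"delimiter" not in kwargs` before sniffing is what keeps an explicit caller choice authoritative; sniffing only ever fills a gap.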

`detect()`: when no explicit delimiter is given, it uses a candidate string of common non-tab separators (`',;:|'`) for `CSVFormat`, and the format's `DEFAULT_DELIMITER` for subclasses (e.g. `TSVFormat` uses tab). This keeps tab-delimited files out of CSV auto-detection while still recognising `;`-, `:`-, and `|`-separated files.
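A rough sketch of the candidate-set idea, again stdlib-only: `looks_like_delimited` and `CSV_CANDIDATES` are hypothetical stand-ins for the `detect()` logic, which in tablib reads its sample from a stream rather than taking a string.

```python
import csv

CSV_CANDIDATES = ",;:|"  # common non-tab separators named in the PR


def looks_like_delimited(sample, delimiter=None, candidates=CSV_CANDIDATES):
    """Hypothetical detect()-style check on a text sample.

    An explicit delimiter narrows sniffing to that character; otherwise
    only the candidate characters are allowed.
    """
    try:
        csv.Sniffer().sniff(sample, delimiters=delimiter or candidates)
        return True
    except csv.Error:
        return False


tsv = "a\tb\tc\n1\t2\t3\n"
print(looks_like_delimited("a;b;c\n1;2;3\n"))      # True
print(looks_like_delimited("a|b|c\n1|2|3\n"))      # True
print(looks_like_delimited(tsv))                   # False: tab is not a CSV candidate
print(looks_like_delimited(tsv, candidates="\t"))  # True: e.g. a TSV-style default
```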

Explicit `delimiter` kwargs are fully respected and take precedence.

Tests

  • Updated `test_csv_formatter_support_kwargs`: the old assertion encoded the broken behaviour (1 header instead of 3); it now asserts correct parsing.
  • Added `test_csv_import_set_auto_detect_delimiter`: auto-detection of colon- and pipe-delimited CSV.
  • Added `test_csv_detect_non_comma_delimiter`: `detect()` correctly recognises `:`-, `|`-, and `;`-separated files.

All 155 tests pass.

@codecov

codecov Bot commented Apr 18, 2026

Codecov Report

❌ Patch coverage is 96.96970% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 93.18%. Comparing base (564619d) to head (e8c3477).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/tablib/formats/_csv.py | 93.75% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #639      +/-   ##
==========================================
+ Coverage   93.14%   93.18%   +0.03%     
==========================================
  Files          29       29              
  Lines        3226     3256      +30     
==========================================
+ Hits         3005     3034      +29     
- Misses        221      222       +1     

Development

Successfully merging this pull request may close these issues.

Simple csv file can't be parsed correctly because of Sniffer() parameters
