feat(v2): papaparse for CSV + mammoth for .docx preview#272

Open. samxu01 wants to merge 1 commit into main from feat/v2-preview-libs

samxu01 commented May 3, 2026

First of three "artifact polish" PRs queued for review (will not auto-merge).

What this changes

CSV preview: replaces the hand-rolled `split(',')` parser with papaparse. The previous parser broke on real spreadsheet exports (quoted fields containing commas, escaped quotes, multi-line cells). papaparse is RFC 4180 compliant, so it handles all three cases.
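To make the failure mode concrete, here is a minimal sketch contrasting the naive split with a quote-aware parse. The `parseCsv` function is a hypothetical stand-in written only for illustration; it is not papaparse's implementation and not the code in this PR, which delegates parsing to papaparse.

```typescript
// The approach this PR replaces: splits on every comma, even inside quotes.
function naiveParseLine(line: string): string[] {
  return line.split(',');
}

// Hypothetical minimal quote-aware parser, for illustration only.
// It covers the three cases named above: commas inside quoted fields,
// "" as an escaped quote, and line breaks inside quoted fields.
function parseCsv(text: string): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = '';
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (inQuotes) {
      if (ch === '"') {
        if (text.charAt(i + 1) === '"') {
          field += '"'; // doubled quote is an escaped literal quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += ch; // commas and line breaks are literal inside quotes
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      row.push(field);
      field = '';
    } else if (ch === '\n' || ch === '\r') {
      if (ch === '\r' && text.charAt(i + 1) === '\n') i++; // CRLF is one break
      row.push(field);
      rows.push(row);
      row = [];
      field = '';
    } else {
      field += ch;
    }
  }
  // Flush the final record when the input does not end with a line break.
  if (field !== '' || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

const sample = '"Smith, John",42';
console.log(naiveParseLine(sample)); // three pieces: the name is split mid-field
console.log(parseCsv(sample));       // one row, two fields
```

The naive version returns three fragments for a two-column row, which is the mid-field split the test plan checks for.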

DOCX preview: Word documents render inline via mammoth (`mammoth/mammoth.browser`, lazy-imported so inspectors that never open a Word file don't pay the bundle cost). Mammoth produces clean semantic HTML: headings, lists, tables, bold/italic. For a pixel-faithful render, the user clicks Open and uses Word / Office Online.

Bundle impact

  • papaparse@^5.5.3 + @types/papaparse@^5.5.2 — ~45 KB gzipped, no runtime side effects.
  • mammoth@^1.12.0 — ~200 KB, lazy-loaded via dynamic `import('mammoth/mammoth.browser')` so the cost only lands when a docx artifact is opened.
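
For reviewers, a rough sketch of the lazy-load pattern described above. The names `lazyOnce`, `renderDocxPreview`, and `DocxConverter` are assumptions made for this example and may not match the code in the diff. The only mammoth call assumed is `convertToHtml({ arrayBuffer })` resolving to an object with `value` and `messages`. The loader is injected so the sketch runs without mammoth installed.

```typescript
// Shape of the single converter call this sketch relies on.
// The interface name is hypothetical; it mirrors mammoth's convertToHtml.
interface DocxConverter {
  convertToHtml(input: { arrayBuffer: ArrayBuffer }): Promise<{
    value: string;       // generated HTML
    messages: unknown[]; // conversion warnings, unused here
  }>;
}

// Wrap a loader so it runs at most once, and only on first use.
// Later calls reuse the same promise instead of importing again.
function lazyOnce<T>(loader: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = loader();
    }
    return cached;
  };
}

// Hypothetical preview helper: load the converter on demand, then convert.
async function renderDocxPreview(
  buffer: ArrayBuffer,
  loadConverter: () => Promise<DocxConverter>,
): Promise<string> {
  const converter = await loadConverter();
  const result = await converter.convertToHtml({ arrayBuffer: buffer });
  return result.value;
}

// In the app, the loader would presumably look something like:
//   const loadMammoth = lazyOnce(() => import('mammoth/mammoth.browser'));
// so nothing is fetched until the first docx artifact is opened.
```

One tradeoff of caching the promise: a failed import stays cached, so a retry path would need to clear it. Whether that matters here depends on how the preview handles load errors.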

Skipped (deferred)

| Format | Why |
| --- | --- |
| xlsx / xls | SheetJS is ~600 KB. Demo uses CSV instead; revisit if a beta user asks. |
| ppt / pptx | No clean OSS option that's lightweight. PowerPoint inline preview needs server-side render. |
| odt / ods / odp | Same shape as their MS counterparts; covered by the same Office Online viewer follow-up. |

Test plan

  • Deploy dev: `gh workflow run deploy-dev.yml --ref feat/v2-preview-libs`
  • Upload a CSV with quoted-comma fields (e.g., `"Smith, John",42`) — confirm cells aren't split mid-field
  • Upload a real .docx with headings/lists/tables — confirm semantic HTML renders
  • Confirm mammoth doesn't load on page load (Network tab shouldn't pull mammoth.browser until a docx artifact is opened)

🤖 Generated with Claude Code

CSV: replace the hand-rolled split-on-comma with papaparse. The previous
parser broke on real spreadsheet exports — quoted fields containing
commas, escaped quotes, multi-line cells. papaparse is RFC 4180.

DOCX: render Word documents inline via mammoth (lazy-imported via
mammoth/mammoth.browser so inspectors that never open a Word file
don't pay the bundle cost). Mammoth produces clean semantic HTML —
headings, lists, tables, bold/italic. For pixel-faithful render the
user clicks Open and uses Word / Office Online.

Skipped (deferred):
- xlsx / xls — SheetJS is ~600 KB and demo-day uses CSV instead.
- ppt / pptx — no clean OSS option that's lightweight.
- odt / ods / odp — same shape as their MS counterparts; covered by
  the same Office Online viewer follow-up.

Bundle additions:
- papaparse@^5.5.3 + @types/papaparse@^5.5.2 (~45 KB gzipped)
- mammoth@^1.12.0 (~200 KB, lazy-loaded — only when a docx is opened)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>