Skip to content

Fix Samsung Notes stroke offsets#10

Merged
twangodev merged 2 commits into
twangodev:mainfrom
Kreijstal:fix-samsung-notes-stroke-offsets
May 25, 2026
Merged

Fix Samsung Notes stroke offsets#10
twangodev merged 2 commits into
twangodev:mainfrom
Kreijstal:fix-samsung-notes-stroke-offsets

Conversation

@Kreijstal

@Kreijstal Kreijstal commented May 17, 2026

Copy link
Copy Markdown
Contributor

Summary

  • correct stroke record offsets for Samsung Notes exports where start coordinates were decoded three bytes too late
  • keep record advancement aligned after moving the start point offset
  • clamp invalid/extreme rendered pen widths so corrupt width metadata does not blank SVG output

Testing

  • cargo test -q -p sdocx
  • cargo test -q -p sdocx-cli

I validated this against affected Samsung Notes exports locally; before this change they rendered as tiny collapsed diagonal strokes or blank pages, and after the change they render as readable pages.

Summary by CodeRabbit

  • New Features

    • SVG output uses declared page dimensions; images are embedded as data URIs. Text boxes support positioned/rotated rich text, underline, and highlights. Parsed notes and media are surfaced as page elements. Document info now reports background color, per-page template details, and dark-mode compatibility.
  • Bug Fixes

    • More robust stroke decoding and safer stroke-width normalization with validation and fallbacks.
  • Tests

    • Added unit tests for stroke-width normalization, empty-page SVG/background rendering, and dark-mode handling.

Review Change Stack

@coderabbitai

coderabbitai Bot commented May 17, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

parse_page now decodes page background and template, centralizes stroke decoding in parse_stroke (tries two layouts, picks the larger). Container extracts media assets and rich note text and sets dark-mode metadata. CLI renders pages to SVG using page dimensions, embedding media as base64 and using normalized stroke widths.

Changes

Parse & render feature

Layer / File(s) Summary
CLI: render pipeline and SVG element rendering
crates/sdocx-cli/Cargo.toml, crates/sdocx-cli/src/main.rs
Adds base64 dependency; extend CLI imports; change render_page_svg to use page.width/page.height, accept media assets and fallback background; render PageElement::Image as embedded base64 data URIs and PageElement::TextBox with positioned/rotated rich text, highlights, and underlines.
Container: media assets & note text extraction
crates/sdocx/src/container.rs
Parse media/ images into MediaAsset entries (MIME/type derivation, ordering), extract UTF‑16 rich note text with TLV style runs, append note text as a TextBox to first page when present, and set metadata.dark_mode_compatibility from optional note flags; add unit test.
Page parsing and stroke decoding (two-layout)
crates/sdocx/src/page.rs
Compute background_color and template early, add ParsedStroke and parse_stroke to centralize stroke decoding (bbox, n_points, payload, validations), try two StrokeLayout variants per record and choose the one with more points, and populate Page.elements via parse_page_elements.
Stroke width normalization & print_info
crates/sdocx-cli/src/main.rs
Introduce normalized_stroke_width(pen_width: f32) -> f64 with finiteness/positivity checks, clamping and fallback; replace prior pen-width division with normalized value; extend print_info to include document and per-page background/template info; add unit tests for normalization and empty-page SVG rendering.
Types: page/template/dark-mode metadata
crates/sdocx/src/types.rs
Add DocumentMetadata.dark_mode_compatibility: Option<bool>, Page.background_color: Option<Color), Page.template: Option<PageTemplate>, and new PageTemplate/PageTemplateSource types (BuiltIn, CustomPdf).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • twangodev/sdocx#9: Also modifies crates/sdocx/src/page.rs stroke parsing/offset decoding for alternate stroke layouts.

Poem

🐰 I hopped through bytes and bitty streams,
I nudged the starts where hidden headers gleam.
I bundled art in base64 delight,
I clamped the widths so strokes draw right.
Now notes unfurl as tidy SVG scenes.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'Fix Samsung Notes stroke offsets' directly addresses the main issue described in the PR objectives: correcting stroke record offsets and handling rendering edge cases, which is reflected across all file changes in the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented May 17, 2026

Copy link
Copy Markdown

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment

Thanks for integrating Codecov - We've got you covered ☂️

@twangodev

Copy link
Copy Markdown
Owner

Hey @Kreijstal , thanks for the PR. Just curious, does #9 solve the issue as well? Or is this a separate issue?

@Kreijstal

Copy link
Copy Markdown
Contributor Author

I checked #9 against the same affected exports. It does not solve this specific case for me.

With #9 checked out locally, the files still render as effectively empty SVGs because each stroke decodes to only its start point:

Bubu_220301_141059: page 0 = 401 strokes, 401 points, output SVG 167 bytes
Notes_220227_204211: page 0 = 1507 strokes, 1507 points, output SVG 167 bytes
Notes_220228_202818: page 0 = 2310 strokes, 2310 points, output SVG 167 bytes

With this PR applied, the same files decode to full stroke geometry and render readable pages, for example:

Notes_220227_204211: 1507 strokes, 202765 points
Notes_220228_202818: 2310 strokes, 274007 points

So I think the issues are related but not identical. #9 handles the newer v4.4.x record/flag/color format. This PR fixes files where the stroke start point/data offset is three bytes earlier than the current parser expects, plus the CLI width guard for bogus decoded pen widths that can blank a page.

If #9 lands first, I can rebase this on top and adapt the offset handling so both layouts are covered.

@twangodev

Copy link
Copy Markdown
Owner

Just merged #9, a rebase would be great.

Just curious about the 3 byte offset, do you have a sdocx file I can test/validate against? I'll double check tomorrow that this doesn't cause a regression on some of my other sdocx files.

@Kreijstal Kreijstal force-pushed the fix-samsung-notes-stroke-offsets branch 3 times, most recently from d5bdca8 to f69e20d Compare May 18, 2026 07:24

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/sdocx-cli/src/main.rs`:
- Around line 334-350: The function first_inserted_image currently bails out on
any entry error because of using ok()? on archive.by_index and
entry.read_to_end, so change it to skip (continue) on errors instead of
returning None: in first_inserted_image<R: Read + Seek>, replace the uses of
archive.by_index(i).ok()? and entry.read_to_end(&mut data).ok()? with
error-handling that logs or ignores the Err case and continues the loop (e.g.,
match or if let Ok(entry) = ... and if entry.read_to_end(&mut data).is_ok() then
return Some(EmbeddedImage { mime, data })), ensuring only image entries with
successful reads produce an EmbeddedImage.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b1c64fd8-7b74-4e70-8b41-219f3e2eddb1

📥 Commits

Reviewing files that changed from the base of the PR and between 1e7438b and 9b12aa8.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • crates/sdocx-cli/Cargo.toml
  • crates/sdocx-cli/src/main.rs
  • crates/sdocx/src/container.rs
  • crates/sdocx/src/page.rs
  • crates/sdocx/src/types.rs
✅ Files skipped from review due to trivial changes (1)
  • crates/sdocx-cli/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/sdocx/src/page.rs

Comment thread crates/sdocx-cli/src/main.rs Outdated
Comment on lines +334 to +350
fn first_inserted_image<R: Read + Seek>(archive: &mut zip::ZipArchive<R>) -> Option<EmbeddedImage> {
for i in 0..archive.len() {
let mut entry = archive.by_index(i).ok()?;
let name = entry.name().to_string();
let mime = if name.starts_with("media/") && name.ends_with(".jpg") {
"image/jpeg"
} else if name.starts_with("media/") && name.ends_with(".png") {
"image/png"
} else {
continue;
};
let mut data = Vec::new();
entry.read_to_end(&mut data).ok()?;
return Some(EmbeddedImage { mime, data });
}
None
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Early return on any entry error prevents finding valid images.

Using ok()? causes the function to return None if any archive entry fails to read, even for entries that aren't images. If a corrupted non-image entry exists before a valid image, the image will never be found.

Proposed fix: continue on errors instead of early return
 fn first_inserted_image<R: Read + Seek>(archive: &mut zip::ZipArchive<R>) -> Option<EmbeddedImage> {
     for i in 0..archive.len() {
-        let mut entry = archive.by_index(i).ok()?;
+        let Ok(mut entry) = archive.by_index(i) else { continue };
         let name = entry.name().to_string();
         let mime = if name.starts_with("media/") && name.ends_with(".jpg") {
             "image/jpeg"
         } else if name.starts_with("media/") && name.ends_with(".png") {
             "image/png"
         } else {
             continue;
         };
         let mut data = Vec::new();
-        entry.read_to_end(&mut data).ok()?;
+        if entry.read_to_end(&mut data).is_err() {
+            continue;
+        }
         return Some(EmbeddedImage { mime, data });
     }
     None
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn first_inserted_image<R: Read + Seek>(archive: &mut zip::ZipArchive<R>) -> Option<EmbeddedImage> {
for i in 0..archive.len() {
let mut entry = archive.by_index(i).ok()?;
let name = entry.name().to_string();
let mime = if name.starts_with("media/") && name.ends_with(".jpg") {
"image/jpeg"
} else if name.starts_with("media/") && name.ends_with(".png") {
"image/png"
} else {
continue;
};
let mut data = Vec::new();
entry.read_to_end(&mut data).ok()?;
return Some(EmbeddedImage { mime, data });
}
None
}
fn first_inserted_image<R: Read + Seek>(archive: &mut zip::ZipArchive<R>) -> Option<EmbeddedImage> {
for i in 0..archive.len() {
let Ok(mut entry) = archive.by_index(i) else { continue };
let name = entry.name().to_string();
let mime = if name.starts_with("media/") && name.ends_with(".jpg") {
"image/jpeg"
} else if name.starts_with("media/") && name.ends_with(".png") {
"image/png"
} else {
continue;
};
let mut data = Vec::new();
if entry.read_to_end(&mut data).is_err() {
continue;
}
return Some(EmbeddedImage { mime, data });
}
None
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/sdocx-cli/src/main.rs` around lines 334 - 350, The function
first_inserted_image currently bails out on any entry error because of using
ok()? on archive.by_index and entry.read_to_end, so change it to skip (continue)
on errors instead of returning None: in first_inserted_image<R: Read + Seek>,
replace the uses of archive.by_index(i).ok()? and entry.read_to_end(&mut
data).ok()? with error-handling that logs or ignores the Err case and continues
the loop (e.g., match or if let Ok(entry) = ... and if entry.read_to_end(&mut
data).is_ok() then return Some(EmbeddedImage { mime, data })), ensuring only
image entries with successful reads produce an EmbeddedImage.

@Kreijstal

Copy link
Copy Markdown
Contributor Author

samsung-notes-sdocx-render-report-2026-05-18.zip

Just merged #9, a rebase would be great.

Just curious about the 3 byte offset, do you have a sdocx file I can test/validate against? I'll double check tomorrow that this doesn't cause a regression on some of my other sdocx files.

I added more files with some exports

@Kreijstal Kreijstal marked this pull request as draft May 18, 2026 08:56
@Kreijstal Kreijstal force-pushed the fix-samsung-notes-stroke-offsets branch 4 times, most recently from e094edc to 03b4fac Compare May 18, 2026 09:30
@Kreijstal Kreijstal force-pushed the fix-samsung-notes-stroke-offsets branch from 03b4fac to 61b3b0e Compare May 18, 2026 09:32
@Kreijstal Kreijstal marked this pull request as ready for review May 18, 2026 09:54

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/sdocx/src/page.rs (1)

35-36: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard the UUID slice before indexing.

uuid_char_len comes from the file, but data[0x28..0x28 + uuid_char_len * 2] can still panic on truncated/corrupt pages. Since this API already returns Result, malformed input should produce Error::Format instead of unwinding.

Possible fix
     // Page UUID at 0x28, length at 0x26
     let uuid_char_len = u16::from_le_bytes(data[0x26..0x28].try_into().unwrap()) as usize;
-    let uuid_bytes = &data[0x28..0x28 + uuid_char_len * 2];
+    let uuid_bytes_len = uuid_char_len
+        .checked_mul(2)
+        .ok_or_else(|| Error::Format("page UUID length overflow".into()))?;
+    let uuid_end = 0x28 + uuid_bytes_len;
+    if uuid_end > data.len() {
+        return Err(Error::Format("page file too short for UUID".into()));
+    }
+    let uuid_bytes = &data[0x28..uuid_end];
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/sdocx/src/page.rs` around lines 35 - 36, The slice for uuid_bytes is
unguarded and can panic on truncated input; in the function that reads the page
(where uuid_char_len is computed), validate that data.len() >= 0x28 +
uuid_char_len * 2 before taking the slice and return an Error::Format when the
buffer is too short instead of letting the slice panic; update code paths using
uuid_bytes to handle the Result so malformed pages surface as Error::Format
rather than unwinding.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/sdocx/src/container.rs`:
- Around line 220-233: The heuristic in looks_like_note_text (in
crates/sdocx/src/container.rs) and its duplicate in page.rs rejects non-ASCII
letters because it uses is_ascii_alphanumeric; update both functions to treat
Unicode letters/digits as “common” by replacing ch.is_ascii_alphanumeric() with
ch.is_alphanumeric() (keep the existing ASCII punctuation check or extend later
if needed) so Hangul/CJK characters are counted and metadata.note_text is
populated for non‑ASCII notes.

In `@crates/sdocx/src/page.rs`:
- Around line 247-269: parse_page_elements currently resets image_count to 0 per
page, producing page-local media_index values that mismatch the renderer which
resolves media_index against document-global doc.metadata.media_assets; change
parse_page_elements to stop using a page-local counter by accepting and updating
a caller-provided/global image counter (e.g., add a parameter like
starting_image_index: &mut usize or return the number of images consumed) so
media_index values are globally unique across pages, update callers to pass a
shared/global counter, and ensure render_element still uses media_index against
doc.metadata.media_assets consistently.
- Around line 51-52: The template lookup is incorrectly gated on
background_color which causes page_template to be skipped when
page_background_color returns None; call page_template(data, base)
unconditionally and assign its result to template (remove the
background_color.and_then(...) usage) so background_color and template are
computed independently (keep variables page_background_color, page_template,
background_color, template intact).

---

Outside diff comments:
In `@crates/sdocx/src/page.rs`:
- Around line 35-36: The slice for uuid_bytes is unguarded and can panic on
truncated input; in the function that reads the page (where uuid_char_len is
computed), validate that data.len() >= 0x28 + uuid_char_len * 2 before taking
the slice and return an Error::Format when the buffer is too short instead of
letting the slice panic; update code paths using uuid_bytes to handle the Result
so malformed pages surface as Error::Format rather than unwinding.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b9a63b68-9de4-4ab2-b72d-a27b1e3f4049

📥 Commits

Reviewing files that changed from the base of the PR and between 9b12aa8 and 61b3b0e.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • crates/sdocx-cli/Cargo.toml
  • crates/sdocx-cli/src/main.rs
  • crates/sdocx/src/container.rs
  • crates/sdocx/src/page.rs
  • crates/sdocx/src/types.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/sdocx/src/types.rs

Comment thread crates/sdocx/src/container.rs
Comment thread crates/sdocx/src/page.rs
Comment thread crates/sdocx/src/page.rs
For dark-mode notes, prefer the document's dark background over the light page
template so light ink stays visible. For light-mode notes, default uncolored
strokes to dark ink (#1a1a1a) instead of white so they don't vanish on the
light background.
@twangodev twangodev merged commit 4e8243d into twangodev:main May 25, 2026
11 of 12 checks passed

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/sdocx-cli/src/main.rs (1)

100-105: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Document-global media lookup does not match the page-local image indices.

PageElement::Image.media_index is produced from a per-page counter in crates/sdocx/src/page.rs, but every page render here receives the full doc.metadata.media_assets slice. On multi-page notes with images, later pages can reuse page 0's assets or pick the wrong file.

Also applies to: 502-506, 521-525

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/sdocx-cli/src/main.rs` around lines 100 - 105, The image branch is
indexing into the document-global doc.metadata.media_assets using a per-page
PageElement::Image.media_index (produced by the per-page counter), which causes
later pages to pick wrong assets; change the lookup to resolve against the
page-local media mapping instead of the global slice — inside the page render
loop in main.rs (the PageElement::Image { bbox, media_index } match arm) use the
page's media-assets collection (e.g. page.media_assets or a per-page media_map)
to get the asset for that media_index, falling back to a safe log/skip if
missing; apply the same fix to the other similar image-handling sites in this
file that reference doc.metadata.media_assets so all per-page indices are
resolved from the page-local mapping.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/sdocx-cli/src/main.rs`:
- Around line 39-55: The text rendering still uses a hard-coded fallback ink
instead of the mode-aware color chosen for backgrounds; update render_text_box
(and the other similar branch at lines 77-84) so that when RichTextBox.color is
None it uses a mode-aware fallback ink computed from dark_mode (mirror the bg
selection logic that uses fallback_bg_color/page.background_color): derive a
text_color = rich_text_box.color.or(mode_aware_fallback_ink) where
mode_aware_fallback_ink chooses a light ink when dark_mode and a dark ink
otherwise, convert via color_hex as needed, and thread this color into
render_text_box (or change render_text_box to accept an explicit default color
parameter) so uncolored text is visible on dark pages; reference
RichTextBox.color, render_text_box, color_hex, and the existing FALLBACK_*
constants when implementing.

---

Outside diff comments:
In `@crates/sdocx-cli/src/main.rs`:
- Around line 100-105: The image branch is indexing into the document-global
doc.metadata.media_assets using a per-page PageElement::Image.media_index
(produced by the per-page counter), which causes later pages to pick wrong
assets; change the lookup to resolve against the page-local media mapping
instead of the global slice — inside the page render loop in main.rs (the
PageElement::Image { bbox, media_index } match arm) use the page's media-assets
collection (e.g. page.media_assets or a per-page media_map) to get the asset for
that media_index, falling back to a safe log/skip if missing; apply the same fix
to the other similar image-handling sites in this file that reference
doc.metadata.media_assets so all per-page indices are resolved from the
page-local mapping.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fa32b2cc-8539-4004-8147-29028e99f04d

📥 Commits

Reviewing files that changed from the base of the PR and between 61b3b0e and c9d5ed8.

📒 Files selected for processing (1)
  • crates/sdocx-cli/src/main.rs

Comment thread crates/sdocx-cli/src/main.rs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants