Fix Samsung Notes stroke offsets#10
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughparse_page now decodes page background and template, centralizes stroke decoding in parse_stroke (tries two layouts, picks the larger). Container extracts media assets and rich note text and sets dark-mode metadata. CLI renders pages to SVG using page dimensions, embedding media as base64 and using normalized stroke widths. ChangesParse & render feature
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Welcome to Codecov 🎉Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests. ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment Thanks for integrating Codecov - We've got you covered ☂️ |
|
Hey @Kreijstal , thanks for the PR. Just curious, does #9 solve the issue as well? Or is this a separate issue? |
|
I checked #9 against the same affected exports. It does not solve this specific case for me. With #9 checked out locally, the files still render as effectively empty SVGs because each stroke decodes to only its start point: With this PR applied, the same files decode to full stroke geometry and render readable pages, for example: So I think the issues are related but not identical. #9 handles the newer v4.4.x record/flag/color format. This PR fixes files where the stroke start point/data offset is three bytes earlier than the current parser expects, plus the CLI width guard for bogus decoded pen widths that can blank a page. If #9 lands first, I can rebase this on top and adapt the offset handling so both layouts are covered. |
|
Just merged #9, a rebase would be great. Just curious about the 3 byte offset, do you have a sdocx file I can test/validate against? I'll double check tomorrow that this doesn't cause a regression on some of my other sdocx files. |
d5bdca8 to
f69e20d
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/sdocx-cli/src/main.rs`:
- Around line 334-350: The function first_inserted_image currently bails out on
any entry error because of using ok()? on archive.by_index and
entry.read_to_end, so change it to skip (continue) on errors instead of
returning None: in first_inserted_image<R: Read + Seek>, replace the uses of
archive.by_index(i).ok()? and entry.read_to_end(&mut data).ok()? with
error-handling that logs or ignores the Err case and continues the loop (e.g.,
match or if let Ok(entry) = ... and if entry.read_to_end(&mut data).is_ok() then
return Some(EmbeddedImage { mime, data })), ensuring only image entries with
successful reads produce an EmbeddedImage.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b1c64fd8-7b74-4e70-8b41-219f3e2eddb1
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
crates/sdocx-cli/Cargo.tomlcrates/sdocx-cli/src/main.rscrates/sdocx/src/container.rscrates/sdocx/src/page.rscrates/sdocx/src/types.rs
✅ Files skipped from review due to trivial changes (1)
- crates/sdocx-cli/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/sdocx/src/page.rs
| fn first_inserted_image<R: Read + Seek>(archive: &mut zip::ZipArchive<R>) -> Option<EmbeddedImage> { | ||
| for i in 0..archive.len() { | ||
| let mut entry = archive.by_index(i).ok()?; | ||
| let name = entry.name().to_string(); | ||
| let mime = if name.starts_with("media/") && name.ends_with(".jpg") { | ||
| "image/jpeg" | ||
| } else if name.starts_with("media/") && name.ends_with(".png") { | ||
| "image/png" | ||
| } else { | ||
| continue; | ||
| }; | ||
| let mut data = Vec::new(); | ||
| entry.read_to_end(&mut data).ok()?; | ||
| return Some(EmbeddedImage { mime, data }); | ||
| } | ||
| None | ||
| } |
There was a problem hiding this comment.
Early return on any entry error prevents finding valid images.
Using ok()? causes the function to return None if any archive entry fails to read, even for entries that aren't images. If a corrupted non-image entry exists before a valid image, the image will never be found.
Proposed fix: continue on errors instead of early return
fn first_inserted_image<R: Read + Seek>(archive: &mut zip::ZipArchive<R>) -> Option<EmbeddedImage> {
for i in 0..archive.len() {
- let mut entry = archive.by_index(i).ok()?;
+ let Ok(mut entry) = archive.by_index(i) else { continue };
let name = entry.name().to_string();
let mime = if name.starts_with("media/") && name.ends_with(".jpg") {
"image/jpeg"
} else if name.starts_with("media/") && name.ends_with(".png") {
"image/png"
} else {
continue;
};
let mut data = Vec::new();
- entry.read_to_end(&mut data).ok()?;
+ if entry.read_to_end(&mut data).is_err() {
+ continue;
+ }
return Some(EmbeddedImage { mime, data });
}
None
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| fn first_inserted_image<R: Read + Seek>(archive: &mut zip::ZipArchive<R>) -> Option<EmbeddedImage> { | |
| for i in 0..archive.len() { | |
| let mut entry = archive.by_index(i).ok()?; | |
| let name = entry.name().to_string(); | |
| let mime = if name.starts_with("media/") && name.ends_with(".jpg") { | |
| "image/jpeg" | |
| } else if name.starts_with("media/") && name.ends_with(".png") { | |
| "image/png" | |
| } else { | |
| continue; | |
| }; | |
| let mut data = Vec::new(); | |
| entry.read_to_end(&mut data).ok()?; | |
| return Some(EmbeddedImage { mime, data }); | |
| } | |
| None | |
| } | |
| fn first_inserted_image<R: Read + Seek>(archive: &mut zip::ZipArchive<R>) -> Option<EmbeddedImage> { | |
| for i in 0..archive.len() { | |
| let Ok(mut entry) = archive.by_index(i) else { continue }; | |
| let name = entry.name().to_string(); | |
| let mime = if name.starts_with("media/") && name.ends_with(".jpg") { | |
| "image/jpeg" | |
| } else if name.starts_with("media/") && name.ends_with(".png") { | |
| "image/png" | |
| } else { | |
| continue; | |
| }; | |
| let mut data = Vec::new(); | |
| if entry.read_to_end(&mut data).is_err() { | |
| continue; | |
| } | |
| return Some(EmbeddedImage { mime, data }); | |
| } | |
| None | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/sdocx-cli/src/main.rs` around lines 334 - 350, The function
first_inserted_image currently bails out on any entry error because of using
ok()? on archive.by_index and entry.read_to_end, so change it to skip (continue)
on errors instead of returning None: in first_inserted_image<R: Read + Seek>,
replace the uses of archive.by_index(i).ok()? and entry.read_to_end(&mut
data).ok()? with error-handling that logs or ignores the Err case and continues
the loop (e.g., match or if let Ok(entry) = ... and if entry.read_to_end(&mut
data).is_ok() then return Some(EmbeddedImage { mime, data })), ensuring only
image entries with successful reads produce an EmbeddedImage.
|
samsung-notes-sdocx-render-report-2026-05-18.zip
I added more files with some exports |
e094edc to
03b4fac
Compare
03b4fac to
61b3b0e
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/sdocx/src/page.rs (1)
35-36:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard the UUID slice before indexing.
uuid_char_lencomes from the file, butdata[0x28..0x28 + uuid_char_len * 2]can still panic on truncated/corrupt pages. Since this API already returnsResult, malformed input should produceError::Formatinstead of unwinding.Possible fix
// Page UUID at 0x28, length at 0x26 let uuid_char_len = u16::from_le_bytes(data[0x26..0x28].try_into().unwrap()) as usize; - let uuid_bytes = &data[0x28..0x28 + uuid_char_len * 2]; + let uuid_bytes_len = uuid_char_len + .checked_mul(2) + .ok_or_else(|| Error::Format("page UUID length overflow".into()))?; + let uuid_end = 0x28 + uuid_bytes_len; + if uuid_end > data.len() { + return Err(Error::Format("page file too short for UUID".into())); + } + let uuid_bytes = &data[0x28..uuid_end];🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/sdocx/src/page.rs` around lines 35 - 36, The slice for uuid_bytes is unguarded and can panic on truncated input; in the function that reads the page (where uuid_char_len is computed), validate that data.len() >= 0x28 + uuid_char_len * 2 before taking the slice and return an Error::Format when the buffer is too short instead of letting the slice panic; update code paths using uuid_bytes to handle the Result so malformed pages surface as Error::Format rather than unwinding.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/sdocx/src/container.rs`:
- Around line 220-233: The heuristic in looks_like_note_text (in
crates/sdocx/src/container.rs) and its duplicate in page.rs rejects non-ASCII
letters because it uses is_ascii_alphanumeric; update both functions to treat
Unicode letters/digits as “common” by replacing ch.is_ascii_alphanumeric() with
ch.is_alphanumeric() (keep the existing ASCII punctuation check or extend later
if needed) so Hangul/CJK characters are counted and metadata.note_text is
populated for non‑ASCII notes.
In `@crates/sdocx/src/page.rs`:
- Around line 247-269: parse_page_elements currently resets image_count to 0 per
page, producing page-local media_index values that mismatch the renderer which
resolves media_index against document-global doc.metadata.media_assets; change
parse_page_elements to stop using a page-local counter by accepting and updating
a caller-provided/global image counter (e.g., add a parameter like
starting_image_index: &mut usize or return the number of images consumed) so
media_index values are globally unique across pages, update callers to pass a
shared/global counter, and ensure render_element still uses media_index against
doc.metadata.media_assets consistently.
- Around line 51-52: The template lookup is incorrectly gated on
background_color which causes page_template to be skipped when
page_background_color returns None; call page_template(data, base)
unconditionally and assign its result to template (remove the
background_color.and_then(...) usage) so background_color and template are
computed independently (keep variables page_background_color, page_template,
background_color, template intact).
---
Outside diff comments:
In `@crates/sdocx/src/page.rs`:
- Around line 35-36: The slice for uuid_bytes is unguarded and can panic on
truncated input; in the function that reads the page (where uuid_char_len is
computed), validate that data.len() >= 0x28 + uuid_char_len * 2 before taking
the slice and return an Error::Format when the buffer is too short instead of
letting the slice panic; update code paths using uuid_bytes to handle the Result
so malformed pages surface as Error::Format rather than unwinding.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b9a63b68-9de4-4ab2-b72d-a27b1e3f4049
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
crates/sdocx-cli/Cargo.tomlcrates/sdocx-cli/src/main.rscrates/sdocx/src/container.rscrates/sdocx/src/page.rscrates/sdocx/src/types.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/sdocx/src/types.rs
For dark-mode notes, prefer the document's dark background over the light page template so light ink stays visible. For light-mode notes, default uncolored strokes to dark ink (#1a1a1a) instead of white so they don't vanish on the light background.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/sdocx-cli/src/main.rs (1)
100-105:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftDocument-global media lookup does not match the page-local image indices.
PageElement::Image.media_indexis produced from a per-page counter incrates/sdocx/src/page.rs, but every page render here receives the fulldoc.metadata.media_assetsslice. On multi-page notes with images, later pages can reuse page 0's assets or pick the wrong file.Also applies to: 502-506, 521-525
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/sdocx-cli/src/main.rs` around lines 100 - 105, The image branch is indexing into the document-global doc.metadata.media_assets using a per-page PageElement::Image.media_index (produced by the per-page counter), which causes later pages to pick wrong assets; change the lookup to resolve against the page-local media mapping instead of the global slice — inside the page render loop in main.rs (the PageElement::Image { bbox, media_index } match arm) use the page's media-assets collection (e.g. page.media_assets or a per-page media_map) to get the asset for that media_index, falling back to a safe log/skip if missing; apply the same fix to the other similar image-handling sites in this file that reference doc.metadata.media_assets so all per-page indices are resolved from the page-local mapping.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/sdocx-cli/src/main.rs`:
- Around line 39-55: The text rendering still uses a hard-coded fallback ink
instead of the mode-aware color chosen for backgrounds; update render_text_box
(and the other similar branch at lines 77-84) so that when RichTextBox.color is
None it uses a mode-aware fallback ink computed from dark_mode (mirror the bg
selection logic that uses fallback_bg_color/page.background_color): derive a
text_color = rich_text_box.color.or(mode_aware_fallback_ink) where
mode_aware_fallback_ink chooses a light ink when dark_mode and a dark ink
otherwise, convert via color_hex as needed, and thread this color into
render_text_box (or change render_text_box to accept an explicit default color
parameter) so uncolored text is visible on dark pages; reference
RichTextBox.color, render_text_box, color_hex, and the existing FALLBACK_*
constants when implementing.
---
Outside diff comments:
In `@crates/sdocx-cli/src/main.rs`:
- Around line 100-105: The image branch is indexing into the document-global
doc.metadata.media_assets using a per-page PageElement::Image.media_index
(produced by the per-page counter), which causes later pages to pick wrong
assets; change the lookup to resolve against the page-local media mapping
instead of the global slice — inside the page render loop in main.rs (the
PageElement::Image { bbox, media_index } match arm) use the page's media-assets
collection (e.g. page.media_assets or a per-page media_map) to get the asset for
that media_index, falling back to a safe log/skip if missing; apply the same fix
to the other similar image-handling sites in this file that reference
doc.metadata.media_assets so all per-page indices are resolved from the
page-local mapping.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fa32b2cc-8539-4004-8147-29028e99f04d
📒 Files selected for processing (1)
crates/sdocx-cli/src/main.rs
Summary
Testing
I validated this against affected Samsung Notes exports locally; before this change they rendered as tiny collapsed diagonal strokes or blank pages, and after the change they render as readable pages.
Summary by CodeRabbit
New Features
Bug Fixes
Tests