Skip to content

Comments

Incremental Update Implementation (cleaner)#111

Open
adnsistemas wants to merge 79 commits intocantoo-scribe:masterfrom
adnsistemas:cantoo-scribe-master
Open

Incremental Update Implementation (cleaner)#111
adnsistemas wants to merge 79 commits intocantoo-scribe:masterfrom
adnsistemas:cantoo-scribe-master

Conversation

@adnsistemas
Copy link

What?

Ability to make incremental updates to PDFs

Why?

Required for proper electronic signing of documents.

How?

Making an snapshot of the original PDF and marking changed Refs that need to be saved in the "increment"

Testing?

All tests and Node app.

New Dependencies?

No

Screenshots

N/A

Suggested Reading?

Yes

Anything Else?

Had to make a couple of small changes to package.json in order to be able to build, test, and commit the changes.

Checklist

  • I read CONTRIBUTING.md.
  • I read MAINTAINERSHIP.md#pull-requests.
  • I added/updated unit tests for my changes.
  • I added/updated integration tests for my changes.
  • I ran the integration tests.
  • I tested my changes in Node, Deno, and the browser.
  • I viewed documents produced with my changes in Adobe Acrobat, Foxit Reader, Firefox, and Chrome.
  • I added/updated doc comments for any new/modified public APIs.
  • My changes work for both new and existing PDF files.
  • I ran the linter on my changes.

@SamiSaves
Copy link

@Sharcoux Hey, is this PR in a clean enough state to be merged or worked on (the previous pr was #110) ? I am also interested in this as we need it for PAdes long term validation support.

I could probably try and help with this to make it merge ready

@benjinus
Copy link

benjinus commented Jul 1, 2025

I think this is a very important feature update, everyone needs this👏

@Sharcoux
Copy link
Collaborator

Sharcoux commented Jul 1, 2025

Ok, I'll try to have a look. It's quite complex. If one of you want to have a second look at the change, that would be great.

@SamiSaves
Copy link

SamiSaves commented Jul 2, 2025

I'll see if I have time to check the pr later this week and I also should be able to test it out and validate that it works for our scenario too 👍

@SamiSaves
Copy link

Here is some reading related to this PR:

PDF spec 7.5.6 Incremental Updates (pdf page 52)
https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#page=52

ETSI has a document "Part 1: Building blocks and PAdES baseline signatures" in which part 5.4 mentions incremental updates
https://www.etsi.org/deliver/etsi_en/319100_319199/31914201/01.02.01_60/en_31914201v010201p.pdf#page=10

@SamiSaves
Copy link

SamiSaves commented Jul 7, 2025

I was able to create an incrementally updated pdf with the branch:

async function addIncrementalUpdate(signedPdfBuffer: Buffer, update: string): Promise<Buffer> {
  // Load for incremental update
  const pdfDoc = await PDFDocument.load(signedPdfBuffer, {    forIncrementalUpdate: true  })

  // Add the update string as a text annotation to the first page to demonstrate any update
  const pages = pdfDoc.getPages()
  if (pages.length > 0) {
    pages[0].drawText(update, { x: 50, y: 700, size: 12 })
  }

  // Save incrementally (creates a new revision at the end)
  const newPdf = await pdfDoc.save({ useObjectStreams: false })
  return Buffer.from(newPdf)
}

I checked that increment didn't modify any of the old pdf bytes with this test:

const testPdf = await readFile(path.join(__dirname, './test.pdf'))
const incrementedPdf = addIncrementalUpdate(testPdf, 'testing incremental update')
assert(incrementalPdf.subarray(0, testPdf.length).equals(testPdf))

And I also observed from the pdf bytes that the new increment contains a new catalog, trailer and EOF.

@adnsistemas
Copy link
Author

I added some tests in the PR, unit and app tests. One tests that the original PDF is unchanged.
There is not one that checks the increment, maybe is worth adding it.

Copy link

@SamiSaves SamiSaves left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through the PR and didn't find much to say as I am not familiar with the codebase.

To me it looks good, it doesn't seem to break existing functionality and this is confirmed by existing tests. It has test coverage for the new feature as well.

I think the only thing missing here is some documentation what the incremental update is, what does it do and how to use it.

package.json Outdated
Comment on lines 164 to 165
"prettier --check || (echo '\n⚠️ Linting issues detected. Please run \"yarn lint\" to fix them and stage the changes.' && exit 1)",
"eslint --fix || (echo '\n⚠️ Linting issues detected. Please run \"yarn lint\" to fix them and stage the changes.' && exit 1)"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should revert these changes. I wonder what here caused issues 🤔

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to make changes in order to be able to run the tests. Don't know why, but in my environment the original failed with an error. Should be "reverted" to merge, but I can't do it in my branch.

Comment on lines 32 to 34
let tw = font.widthOfTextAtSize('Electronic Signature Test #20', tamLetra);
let lw = fontBold.widthOfTextAtSize(usarLibreOffice, tamAdvertencia);
if (lw > tw) tw = lw;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd personally prefer more descriptive names for variables. As I am new to this project and the world of pdfs, it takes quite a bit of effort to understand what these are for.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. And the code seems unnecessarily hard to read. What about:

const textWidth = Math.max(
  font.widthOfTextAtSize('Electronic Signature Test #20', tamLetra),
  fontBold.widthOfTextAtSize(usarLibreOffice, tamAdvertencia)
)
const textHeight = Math.max(
  font.heightAtSize(tamLetra),
  fontBold.heightAtSize(tamAdvertencia)
)

Comment on lines 715 to 718
expect(pdfIncrementalBytes.byteLength).toBeGreaterThan(0);
expect(finalPdfBytes.byteLength).toBeLessThanOrEqual(
pdfSaveBytes.byteLength,
);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are expecting same output in both cases, would it make sense to compare that the actual bytes are equal?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, then do it, no?

package.json Outdated
"description": "Create and modify PDF files with JavaScript",
"author": "Andrew Dillon <andrew.dillon.j@gmail.com>",
"packageManager": "yarn@1.22.19+",
"packageManager": "yarn@1.22.19",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original don't run in my environment. Should be equivalent, but in my case npm complains about it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cantoo scribe master has

"packageManager": "yarn@1.22.19+",

This pr has

"packageManager": "yarn@1.22.19",

And your forks master has:

"packageManager": "yarn@1.22.22+sha512.a6b2f7906b721bba3d67d4aff083df04dad64c399707841b7acf00f6b133b7ac24255f2652fa22ae3534329dc6180534e98d17432037ff6fd140556e2bb3137e",

Have you tried with the cantoo master version? Perhaps the issue was the yarn with the full sha?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I try a lot of combinations that didn't work. But I'm not keen at all in configs.. so probably is something simple that I don't know about. Config and package changes shouldn't be necessaries, but I don't know how to exclude them from the PR.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you do somthing like this:

  1. Revert the changes (no commiting yet)
  2. Temporarily remove this from package.json
  "husky": {
    "hooks": {
      "pre-commit": "lint-staged"
    }
  },
  1. Stage and commit the reverted code (excluding the husky change above)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is done..
Also added a fix in largestObjectNumber determination, when XREF is a Stream. For that I had to fix some PDFparser tests expectations, a couple of which I'm not totally sure about, but it seems that all the expectations in those tests are "taken" from the test result, and not from the PDF actual content, so I'm guessing is "all right".

"{src,tests,apps}/**/*.{ts,js,json,html,css}": [
"prettier --check || (echo '\n⚠️ Linting issues detected. Please run \"yarn lint\" to fix them and stage the changes.' && exit 1)",
"eslint --fix || (echo '\n⚠️ Linting issues detected. Please run \"yarn lint\" to fix them and stage the changes.' && exit 1)"
"prettier --check",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why those changes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't run it without the change.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, in your forks branch cantoo-scribe-master eslint gets stuck trying to lint /dist/pdf-lib.esm.min.js. So it seems to be trying to scan files should be ignored.

It seems to be due to this change: adnsistemas@417a8a6#diff-a32a0887ed9d1d707bbb3b845b7df7fd40e673c47e7b60a3ebd896b68d3b8839R10

Could this be the reason why it didn't work earlier with these changes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't got stuck for me, it always failed with the message after the ||
I have the same problem with both projects (the one I forked from, and this). I guess is the versions of node/npm/Linux I'm running in, but I really don't know (and to be honest I don't care much about it so I didn't dig into the actual problem).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when you run yarn lint?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In @cantoo/pdf-lib master branch (node 20.17.0, npm 10.8.2):
Invalid package manager specification in package.json (yarn@1.22.19+); expected a semver version
Then, changing that line to yarn@1.22.19:
yarn run v1.22.19 $ yarn lint:prettier && yarn lint:eslint $ prettier --write "./{src,tests,apps}/**/*.{ts,js,json,html,css}" --log-level error $ eslint .
And there it hungs, after running only yarn lint:eslint --verbose, and then again yarn lint it works.
Then I changed a file and staged it. Running yarn lint-staged gives:
`yarn run v1.22.19
$ lint-staged
✔ Backed up original state in git stash (d85c0fa)
⚠ Running tasks for staged files...
❯ package.json — 1 file
❯ {src,tests,apps}/**/*.{ts,js,json,html,css} — 1 file
✖ prettier --check || (echo '
⚠️ Linting issues detected. Please run "yarn lint" to fix them and stage the changes.' && exit 1) [FAILED]
◼ eslint --fix || (echo '
⚠️ Linting issues detected. Please run "yarn lint" to fix them and stage the changes.' && exit 1)
↓ Skipped because of errors from tasks.
✔ Reverting to original state because of errors...
✔ Cleaning up temporary files...

✖ prettier --check || (echo '
⚠️ Linting issues detected. Please run "yarn lint" to fix them and stage the changes.' && exit 1):
[error] No files matching the pattern were found: "||".
[error] No files matching the pattern were found: "(echo".
[error] No files matching the pattern were found: "
[error] ⚠️ Linting issues detected. Please run "yarn lint" to fix them and stage the changes.".
[error] No files matching the pattern were found: "&&".
[error] No files matching the pattern were found: "exit".
[error] No files matching the pattern were found: "1)".
Checking formatting...
All matched files use Prettier code style!
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.`

But yarn lint finds no problems.

import PDFContext from '../PDFContext';

class PDFObject {
registerChange() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throwing an error will probably break something PDFPageLeaf, will it not? Shouldn't we just do nothing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every PDFObject should 'registerChange' (do noting is an option for specific descendants).
Otherwise you can't guarantee the increment will be correct.
If you just "forgot" to implement it, and the base implementation does nothing, will be difficult to trace why you have missing elements in the increment.

'Circular dependency: es/api/PDFPage.js -> es/api/PDFDocument.js -> es/api/form/PDFForm.js -> es/api/form/PDFOptionList.js -> es/api/PDFPage.js',
'Circular dependency: es/api/PDFPage.js -> es/api/PDFDocument.js -> es/api/form/PDFForm.js -> es/api/form/PDFRadioGroup.js -> es/api/PDFPage.js',
'Circular dependency: es/api/PDFPage.js -> es/api/PDFDocument.js -> es/api/form/PDFForm.js -> es/api/form/PDFTextField.js -> es/api/PDFPage.js',
'Circular dependency: es/core/index.js -> es/core/writers/PDFWriter.js -> es/api/snapshot/index.js -> es/api/snapshot/IncrementalDocumentSnapshot.js -> es/core/index.js',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IncrementalDocumentSnapshot should not depend on core/index.js.

'Circular dependency: es\\api\\PDFPage.js -> es\\api\\PDFDocument.js -> es\\api\\form\\PDFForm.js -> es\\api\\form\\PDFOptionList.js -> es\\api\\PDFPage.js',
'Circular dependency: es\\api\\PDFPage.js -> es\\api\\PDFDocument.js -> es\\api\\form\\PDFForm.js -> es\\api\\form\\PDFRadioGroup.js -> es\\api\\PDFPage.js',
'Circular dependency: es\\api\\PDFPage.js -> es\\api\\PDFDocument.js -> es\\api\\form\\PDFForm.js -> es\\api\\form\\PDFTextField.js -> es\\api\\PDFPage.js',
'Circular dependency: es\\core\\index.js -> es\\core\\writers\\PDFWriter.js -> es\\api\\snapshot\\index.js -> es\\api\\snapshot\\IncrementalDocumentSnapshot.js -> es\\core\\index.js',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

@@ -0,0 +1,29 @@
import { PDFObject, PDFRef } from '../../core';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to import only the type? That might solve the cyclic dependency.

@@ -0,0 +1,14 @@
import { PDFObject, PDFRef } from '../../core';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to import only the type? That might solve the cyclic dependency.

@@ -0,0 +1,59 @@
import { PDFContext, PDFObject, PDFRef } from '../../core';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to import only the type? That might solve the cyclic dependency.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those imports are "inherited" from the project I took the incremental update base functionality. Not sure about the circular dependency issue, and solution.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this solution would work with DocumentSnapshot and DefaultDocumentSnapshot, in both all imports can be switched to import type I think. But in this file we have obj instanceof PDFRef ? obj : this.context.getObjectRef(obj) and that requires a normal import from core

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it help if we import PDFRef straight from the file itself like PDFWriter? 🤔

import PDFRef from '../../core/objects/PDFRef';

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe this.context.getObjectRef(obj) could handle the case of PDFRef and return the obj in that case?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented most of the suggested changes and all seems to be working as before, no more circular dependency message, and no more '!' after getObjectRef.
Had to add a method to context, to be able to import only the type for PDFRef.
Is there a way to update this PR with the changes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer my own question: Yes, by pushing to the branch the PR comes from.
Updated.

@adnsistemas
Copy link
Author

I think the only thing missing here is some documentation what the incremental update is, what does it do and how to use it.
Didn't include it because makes the PR "messy". You can take the documentation from my project @adnsistemas/pdf-lib, the README.md has the documentation.

@@ -0,0 +1,59 @@
import { PDFContext, PDFObject, PDFRef } from '../../core';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe this.context.getObjectRef(obj) could handle the case of PDFRef and return the obj in that case?

markObjsForSave(objs: PDFObject[]): void {
this.markRefsForSave(
objs.map((obj) =>
obj instanceof PDFRef ? obj : this.context.getObjectRef(obj)!,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably not a good idea to assume this.context.getObjectRef(obj)!. It might be better to handle the case where it won't return what you expect. Does it make sense to accept filter the map? .filter((obj): result is PDFObject => !!obj)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like that '!' there, but as I didn't wrote that, I supposed it was just for TS not complaining. I like the filter a lot more, and makes more sense: if you can't get a Ref, then there is nothing to update in the increment.

adnsistemas and others added 24 commits February 18, 2026 16:28
…f deletions. Deletion of pages objects on page removal. Snapshots ref deletion registration
- Add commit() method to PDFDocument enabling chained incremental updates
- Fix font duplication: change 'modified' flag to 'alreadyEmbedded' in PDFFont
- Use PDFWriter instead of PDFStreamWriter for incremental saves
- Add comprehensive tests for commit() functionality (17 test cases)
Previously, inline objects (arrays/dicts embedded directly in other
objects, not registered as indirect objects) would not be tracked
for incremental saves. This caused changes to inline arrays like
MediaBox to be lost.

The fix enhances PDFContext.registerObjectChange() to:
1. First check if the object is directly registered (indirect object)
2. If not, search through all indirect objects to find which one
   contains the inline object, and mark that parent for save

Also fixed lint-staged config that had broken shell commands.

Added test: 'tracks inline array modifications for incremental saves'

All 673 tests pass.
…e and incremental generation. Parameter adding for incremental update saving, to force useObjectStreams preference
… fully saved (rewritted). Added test for this scenario.
@Sharcoux
Copy link
Collaborator

Sharcoux commented Feb 19, 2026

image

I have this: "This branch cannot be rebased due to conflicts". This is in GitHub. I could work around and manually apply the patch locally, but the change would lose paternity, I'd like to avoid if possible.

Maybe you can just update your branch to the top of master? That should be enough.

Ah! Maybe you didn't fetch the last version of master locally and that's why you don't see the conflict?

@adnsistemas
Copy link
Author

image I have this: "This branch cannot be rebased due to conflicts". This is in GitHub. I could work around and manually apply the patch locally, but the change would lose paternity, I'd like to avoid if possible.

Maybe you can just update your branch to the top of master? That should be enough.

Ah! Maybe you didn't fetch the last version of master locally and that's why you don't see the conflict?

Ok, I'll finish the rebase and update this branch. I'm not good at git, so no idea what that message means, as I can checkout cantoo-scribe/master and merge this branch onto it without conflicts.
I'll have to check the rebase result, so it will take me some time.

@adnsistemas
Copy link
Author

Ok, I'm not sure I did it properly. This is what I did:

  • Added this project as an additional remote origin with name 'cantoo'
  • Checked out master branch of this project as 'cantoo-master'
  • git checkout cantoo-scribe-master
  • git rebase cantoo-master
  • Fixed all the conflicts
  • Checked the resulting changes
  • Added the corrections requested and fixed a problem with XRef Streams skipping process
  • Tried to push to this pull-request branch and git complained that I needed to pull first
  • Pulled remote cantoo-scribe-master into local cantoo-scribe-master branch
  • Rechecked changes and rerun tests and apps
  • Pushed my local cantoo-scribe-master to this branch
    Everything seems ok to me..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants