feat(parity): padDiff port, compactPad, anonymizeAuthor, preAuthorize hooks#280
Conversation
… hooks
Next parity round vs the original etherpad, implemented in parallel:
createDiffHTML + compactPad:
- new lib/paddiff package: full port of padDiff.ts (clear-authorship
detection, deletion changesets re-inserting removed text with
author + removed attributes, composition, valid revision ranges)
- GET /admin/api/pads/{padId}/diffHTML?startRev=N&endRev=M renders the
diff via the existing HTML export and returns {html, authors}
- POST /admin/api/pads/{padId}/compact reuses the admin-UI revision
compaction (AdminMessageHandler.DeleteRevisions)
anonymizeAuthor (GDPR, original API 1.3.1):
- Manager.AnonymizeAuthor mirrors the original ordering: sever the
token binding first, zero name/color, null chat authorship
- POST /admin/api/author/{authorId}/anonymize endpoint
- new DataStore primitives RemoveTokenOfAuthor and ClearChatAuthorship
on all four backends; GetChatsOfPad now LEFT JOINs so anonymized
messages stay retrievable, GetAuthorIdsOfPadChats skips NULL authors
preAuthorize / preAuthzFailure plugin hooks:
- ported from the original webaccess Step 1: explicit permit skips
authentication (never on /admin-auth paths), deny returns 403 unless
a preAuthzFailure hook overrides the response, no answer defers to
the regular flow; wired into the server middleware
Bug fixed along the way:
- adminutils.CreateRevision mutated a by-value pool copy that shared
the caller's maps, corrupting the pad pool (Pad.Check then failed
with "numToAttrib length does not match nextNum" and revision
cleanup 500ed). It now works on a real deep copy (new APool.Clone);
the dead, nil-map-writing APool.clone was removed.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
Code Review by Qodo
1. DiffHTML mutates cached pool
|
PR Summary by QodoAdd pad diff/compact & author anonymization APIs; preAuthorize hooks; fix pool cloning WalkthroughsDescription• Add admin APIs for pad diffHTML rendering and revision compaction. • Implement GDPR author anonymization across managers and all datastore backends. • Add plugin preAuthorize/preAuthzFailure hooks to short-circuit web access decisions. • Fix attribute-pool copy bug that could corrupt pads during revision creation. Diagramgraph TD
req["HTTP request"] --> mw["CheckAccessWithHooks"] --> preauth{"preAuthorize\npermit/deny?"}
preauth -->|"permit/defer"| api["/admin/api routes"] --> diff["paddiff diff atext"] --> exp["Export HTML"]
api --> compact["DeleteRevisions"] --> db[("DataStore")]
api --> anon["AnonymizeAuthor"] --> db
preauth -->|"deny"| fail["preAuthzFailure"]
subgraph Legend
direction LR
_req["Request/middleware"] ~~~ _dec{"Decision"} ~~~ _db[("Database")]
end
High-Level AssessmentThe following are alternative approaches to this PR: 1. Implement compaction as a dedicated PadManager/Store service
2. Stop at first preAuthorize answer (aCallFirst semantics)
Recommendation: Current approach is reasonable for a parity round: reuse the existing DeleteRevisions implementation to minimize behavioral drift, and aggregate preAuthorize results so any explicit deny reliably blocks access. If future work grows around compaction, consider extracting the DeleteRevisions core into a PadManager-level service to remove the handler dependency. File ChangesEnhancement (16)
Bug fix (2)
Tests (5)
|
| diffAText, authors, err := paddiff.CreateDiffAText(foundPad, &foundPad.Pool, from, to) | ||
| if err != nil { | ||
| initStore.Logger.Errorf("Error creating diff atext for pad %s: %v", padId, err) | ||
| return c.Status(500).JSON(errors2.InternalServerError) | ||
| } | ||
|
|
||
| // Render the diff atext with the regular export-HTML pipeline (it | ||
| // understands the 'removed' attribute). GetPadHTML reads pad.AText when | ||
| // no revision is requested, so a shallow copy of the pad carrying the | ||
| // diff atext is passed. | ||
| padWithDiff := *foundPad | ||
| padWithDiff.AText = *diffAText | ||
|
|
||
| authorColors := buildAuthorColors(&foundPad.Pool, initStore.AuthorManager) | ||
| exporter := io2.NewExportHtml(initStore.PadManager, initStore.AuthorManager, initStore.Hooks) |
There was a problem hiding this comment.
1. Diffhtml mutates cached pool 🐞 Bug ☼ Reliability
CreateDiffHTML passes the cached pad’s Pool into paddiff.CreateDiffAText, which mutates it via PutAttrib; this makes a read-only endpoint modify shared pad state and can introduce request-time data races/corruption. Because pads are globally cached and reused across requests, the mutation persists beyond the diff request.
Agent Prompt
## Issue description
`CreateDiffHTML` currently calls `paddiff.CreateDiffAText(foundPad, &foundPad.Pool, ...)`, and paddiff mutates the provided pool (`PutAttrib`). Because `foundPad` is a cached/shared pad instance, this causes a GET endpoint to mutate shared state and can create data races under concurrent requests.
## Issue Context
Pads are returned from a global cache and reused across requests; the pad struct contains an `apool.APool` with mutable maps.
## Fix
Clone the pad’s pool in `CreateDiffHTML` and use the clone consistently:
- `poolClone := foundPad.Pool.Clone()`
- call `CreateDiffAText(foundPad, &poolClone, ...)`
- set `padWithDiff.Pool = poolClone` before rendering
- build colors from the clone (`buildAuthorColors(&poolClone, ...)`)
This keeps diff generation side-effect-free and avoids concurrent mutation of cached pad state.
## Fix Focus Areas
- lib/api/pad/compactDiff.go[121-149]
- lib/paddiff/paddiff.go[51-56]
- lib/paddiff/paddiff.go[209-226]
- lib/apool/APool.go[94-108]
- lib/pad/padManager.go[69-166]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| func (m *Manager) AnonymizeAuthor(authorId string) error { | ||
| if _, err := m.Db.GetAuthor(authorId); err != nil { | ||
| return errors.New(db.AuthorNotFoundError) | ||
| } |
There was a problem hiding this comment.
2. Db errors returned as 404 🐞 Bug ≡ Correctness
author.Manager.AnonymizeAuthor converts any error from Db.GetAuthor into db.AuthorNotFoundError, so DB/query failures are misreported as “not found” and the API returns 404 instead of 500. This hides real backend failures and breaks correct error semantics.
Agent Prompt
## Issue description
`Manager.AnonymizeAuthor()` currently does:
```go
if _, err := m.Db.GetAuthor(authorId); err != nil {
return errors.New(db.AuthorNotFoundError)
}
```
This collapses all failures (including database/query errors) into "author not found".
## Issue Context
Datastore `GetAuthor` implementations can return non-not-found errors (query/scan/driver errors). The API layer maps `db.AuthorNotFoundError` to HTTP 404.
## Fix
Return the original `GetAuthor` error, or only map to `db.AuthorNotFoundError` when the underlying error is actually the not-found case.
A minimal safe change:
- `if _, err := m.Db.GetAuthor(authorId); err != nil { return err }`
If you want to keep explicit mapping, do:
- `if err.Error() == db.AuthorNotFoundError { return err } else { return err }` (i.e., don’t replace non-not-found errors).
## Fix Focus Areas
- lib/author/authorManager.go[188-191]
- lib/db/PostgresDB.go[285-314]
- lib/api/author/init.go[236-248]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
CI Feedback 🧐A test triggered by this PR failed. Here is an AI-generated analysis of the failure:
|
Summary
Third parity round vs the original etherpad-lite — four areas implemented in parallel:
createDiffHTML (padDiff port)
lib/paddiffpackage: faithful port ofpadDiff.ts— clear-authorship detection, deletion changesets that re-insert removed text carryingauthor+removedattributes, changeset composition, revision-range validation.GET /admin/api/pads/{padId}/diffHTML?startRev=N[&endRev=M]renders via the existing HTML export and returns{html, authors}.compactPad (original API 1.3.1)
POST /admin/api/pads/{padId}/compact({keepRevisions: N}) reusing the admin UI's revision compaction (AdminMessageHandler.DeleteRevisions).anonymizeAuthor (GDPR, original API 1.3.1)
Manager.AnonymizeAuthormirrors the original's ordering (sever token binding first, zero name/color, null chat authorship; pad content stays intact; idempotent).POST /admin/api/author/{authorId}/anonymize.RemoveTokenOfAuthor+ClearChatAuthorshipon all four backends;GetChatsOfPadLEFT JOINs so anonymized messages stay retrievable.preAuthorize / preAuthzFailure plugin hooks
/admin-auth), deny → 403 unless apreAuthzFailurehook overrides the response, no answer defers to the regular flow. Wired into the server middleware.Bug fixed along the way
adminutils.CreateRevisionmutated a by-value pool copy that shared the caller's maps, corrupting the pad pool —Pad.Check()then failed withnumToAttrib length does not match nextNumand the admin revision cleanup 500ed (reproducible: create a pad without an author, runCheck()twice). Fixed with a real deep copy (newAPool.Clone); the dead, nil-map-writingAPool.clonewas removed.Companion change (separate repo)
The client side of
suggestUserName(adopting a suggested name + sendingUSERINFO_UPDATE) was implemented inetherpad-webcomponentson branchfeat/client-message.Test plan
go test -p 1 ./...— all 28 packages green (Memory/SQLite/Postgres/MySQL),go vet ./...clean🤖 Generated with Claude Code