feat: Improve progress reporting during identifier conversion#23186
feat: Improve progress reporting during identifier conversion#23186thykel wants to merge 8 commits into
Conversation
The [A-Z][A-Z0-9_]* shape that defines a semantic project identifier appears in three places that each redefine it inline: the validator's body check, the work-package semantic-id pattern composed into the route constraint, and the wiki reverse-link parser. Lift the unanchored shape to Projects::Identifier::SEMANTIC_FORMAT and have WorkPackage::SemanticIdentifier::SEMANTIC_ID_PATTERN compose from it via .source, with ID_ROUTE_CONSTRAINT in turn composing from that. The validator keeps its own anchored start/body patterns because each pattern produces a distinct error message — composing the two from SEMANTIC_FORMAT would obscure that contract. Split out of #22976 so #23186 can introduce its CLASSIC_IDENTIFIER_CHARS counterpart without conflicting with the larger feature PR.
|
Caution The provided work package version does not match the core version Details:
Please make sure that:
|
Deploying openproject with ⚡ PullPreview
|
akabiru
left a comment
There was a problem hiding this comment.
I'll do my own review in the morning, but here are some notes review agents flagged verbatim.
| def with_non_classic_identifier | ||
| where("identifier !~ ?", "^#{CLASSIC_FORMAT_CHARS.source}$") |
There was a problem hiding this comment.
🍎 with_non_classic_identifier and classic_identifier_format? disagree on all-numeric input
The new scope is built from CLASSIC_FORMAT_CHARS (no anti-all-numeric guard) while classic_identifier_format? uses CLASSIC_FORMAT (which has the guard). So "123" is classic in SQL and non-classic in Ruby — and the scope name implies it's the complement of the method.
It happens to be safe today only because RevertInstanceToClassicIdsJob#perform (app/workers/project_identifiers/revert_instance_to_classic_ids_job.rb:47) also skips numerics via the Ruby method, so the rows SQL misses are the same rows the job would skip anyway. But the scope is now a public class method on Project and the next caller won't carry that guarantee.
Verification
CLASSIC_FORMAT_CHARS = /[a-z0-9\-_]+/
CLASSIC_FORMAT = /\A(?!\d+\z)#{CLASSIC_FORMAT_CHARS}\z/
sql_anchored = /^#{CLASSIC_FORMAT_CHARS.source}$/
# input ruby_classic sql_classic divergence
# "my-project" true true
# "PROJ" false false
# "MY_PROJ_1" false false
# "123" false true DIVERGES
# "abc_42" true trueSimplest fix: drop CLASSIC_FORMAT_CHARS entirely, encode the lookahead in one SQL regex. Postgres ~/!~ supports (?!…).
| def with_non_classic_identifier | |
| where("identifier !~ ?", "^#{CLASSIC_FORMAT_CHARS.source}$") | |
| def with_non_classic_identifier | |
| where("identifier !~ ?", '^(?!\\d+$)[a-z0-9_-]+$') | |
| end |
There was a problem hiding this comment.
I think the lookahead doesn't work for SQL in this query 👀 and if nothing else, it's a bit faster without it.
There was a problem hiding this comment.
But, interestingly enough, we may want to include all-numeric IDs here for projects created before 2021.
(Though, that will require adding a little bit of extra error handling for the back-conversion, because conversion back to 123 would yield a validation error.)
| if pending_count.zero? | ||
| I18n.t("#{scope}.zero") | ||
| else | ||
| I18n.t(scope, count: pending_count) | ||
| end |
There was a problem hiding this comment.
🍎 I18n already routes count: 0 to the zero: key — drop the manual branch
Rails I18n auto-resolves zero: under a pluralized entry when count: 0 is passed. The explicit if pending_count.zero? + "#{scope}.zero" string interpolation is reimplementing existing behaviour, and it reads as if zero were a peer key rather than a CLDR plural form.
Verification
# /tmp/i18n_check.rb — store_translations with {zero, one, other}
count=0: Finalizing conversion...
count=1: 1 project remaining.
count=7: 7 projects remaining.If you add a parallel zero: to classic_conversion_status_message (see comment on config/locales/en.yml), the whole method collapses to a symmetric single call:
| if pending_count.zero? | |
| I18n.t("#{scope}.zero") | |
| else | |
| I18n.t(scope, count: pending_count) | |
| end | |
| if pending_count.zero? | |
| I18n.t("#{scope}.zero") | |
| else | |
| I18n.t(scope, count: pending_count) | |
| end |
→ becomes just I18n.t(scope, count: pending_count).
There was a problem hiding this comment.
Interesting, I thought that automatic routing doesn't actually work and 0 gets direct to other. Will re-check.
| classic_conversion_status_message: | ||
| one: Project identifiers are being converted to classic format. 1 project remaining. | ||
| other: Project identifiers are being converted to classic format. %{count} projects remaining. | ||
| workflows: | ||
| tabs: |
There was a problem hiding this comment.
🍎 Add zero: to classic_conversion_status_message for symmetry
Forward direction has a zero: finalizing message; reverse direction doesn't. The poll can render count: 0 against the other plural ("0 projects remaining") for one tick — Project.with_non_classic_identifier.count can hit zero before IdentifierAutofix.reversion_in_progress? flips. Adding the parallel key also unlocks collapsing the Ruby branch in identifier_settings_form_component.rb#in_progress_banner_message.
| classic_conversion_status_message: | |
| one: Project identifiers are being converted to classic format. 1 project remaining. | |
| other: Project identifiers are being converted to classic format. %{count} projects remaining. | |
| workflows: | |
| tabs: | |
| classic_conversion_status_message: | |
| zero: Finalizing conversion to classic format... | |
| one: Project identifiers are being converted to classic format. 1 project remaining. | |
| other: Project identifiers are being converted to classic format. %{count} projects remaining. |
| I18n.t("admin.settings.work_packages_identifier.in_progress.classic_conversion_status_message", | ||
| count: pending_count) | ||
| else | ||
| pending_count = ProjectIdentifiers::PendingProjectsFinder.project_ids.size |
There was a problem hiding this comment.
I'm not sure the proposal here on adopting a cache is valid- as some projects will take less than 10s and afaik, cache is an opt-in; not guaranteed to be present. But the indexing point is sth worth exploring?
🍊 PendingProjectsFinder.project_ids is three unindexed queries every 5s
PendingProjectsFinder.project_ids (app/services/project_identifiers/pending_projects_finder.rb:40-58 — not in this PR) materializes three Sets per call:
ProblematicIdentifiers#scope.ids— 4 OR'd regex predicates onprojects.identifier(no functional index; the file's own perf note acknowledges this).WorkPackage.unsequenced.distinct.pluck(:project_id)— scan filtered onsequence_number IS NULL.WorkPackage.non_semantic.distinct.pluck(:project_id)—JOIN projects+ per-rowwork_packages.identifier IS DISTINCT FROM projects.identifier || '-' || sequence_number::text. No index supports the concat.
This runs every 5s (poll_for_changes_interval_value: 5000 below) for every admin polling the page, against the same work_packages rows the conversion job is mutating. On a large instance that's seconds of DB time per poll while the job is contending for the same rows.
A monotonically-decreasing counter doesn't need second-level freshness — short cache window is enough:
| pending_count = ProjectIdentifiers::PendingProjectsFinder.project_ids.size | |
| pending_count = Rails.cache.fetch("wp_identifier_pending_count", expires_in: 10.seconds) do | |
| ProjectIdentifiers::PendingProjectsFinder.project_ids.size | |
| end |
Alternative: add PendingProjectsFinder.count that aggregates the three buckets in one SQL union — the caller doesn't actually need the IDs, only the size.
| context "when reversion is in progress" do | ||
| before do | ||
| allow(ProjectIdentifiers::IdentifierAutofix).to receive(:reversion_in_progress?).and_return(true) | ||
| allow(Project).to receive(:with_non_classic_identifier).and_return(double(count: 3)) |
There was a problem hiding this comment.
🍊 Stubbing with_non_classic_identifier defeats the test
allow(Project).to receive(:with_non_classic_identifier).and_return(double(count: 3)) stubs out the very SQL scope this PR introduces. If the regex breaks (see the asymmetry comment on identifier.rb), this spec still passes.
The same file uses real create_list/shared_let data elsewhere — let the scope run for real here too. Even stronger: add a dedicated spec/models/projects/identifier_spec.rb#with_non_classic_identifier covering edge cases (all-digit, uppercase, mixed-case, underscores) — the regex split is exactly the surface that wants direct coverage.
| allow(Project).to receive(:with_non_classic_identifier).and_return(double(count: 3)) | |
| create_list(:project, 3, identifier: "PROJ#{_1}") |
(And drop the allow(Project).to receive(:with_non_classic_identifier)… stub.)
The forward-direction stub on PendingProjectsFinder.project_ids (lines 58, 67, 76) is fine — it's a module-level pure function, not an AR scope.
| it "returns 200 with a turbo stream replacing the form component in change_in_progress state" do | ||
| get :status, format: :turbo_stream | ||
|
|
||
| expect(response).to have_http_status(:ok) |
There was a problem hiding this comment.
🍊 Both #status examples assert the same turbo-stream wrapper — the state difference isn't exercised
The in-progress branch (state: :change_in_progress) and the completed branch (state: :completed) render different components, but both assertions check only that some turbo-stream replaces the wrapper. A future change that returns :edit while in progress, or swaps the component class entirely, ships green.
Assert against the rendered body for the meaningful difference:
| expect(response).to have_http_status(:ok) | |
| expect(response).to have_http_status(:ok) | |
| expect(response).to have_turbo_stream(action: "replace", target: component_target) | |
| expect(response.body).to include("projects remaining").or include("Finalizing conversion") |
Mirror with Successfully updated work package identifier format. in the completed example (line 60-62).
| # Simple character set meant only for SQL matching. No anchoring or anti-all-numeric guard. | ||
| CLASSIC_FORMAT_CHARS = /[a-z0-9\-_]+/ |
There was a problem hiding this comment.
🍏 Comment reads as if the SQL is unanchored
Simple character set meant only for SQL matching. No anchoring or anti-all-numeric guard.
The constant itself isn't anchored, but the only caller (line 126) adds ^…$ inline — so the comment narrates a property of this Regexp object that's invisible at the use site. If the regex-asymmetry comment is taken and CLASSIC_FORMAT_CHARS is removed, this disappears with it; otherwise tighten to "Character class only — callers anchor."
Ticket
https://community.openproject.org/projects/stream-jira-exit/work_packages/74903/activity
What are you trying to accomplish?
Show a very basic progress indicator: Number of remaining projects to convert.
Screenshots
What approach did you choose and why?
Merge checklist