-
Notifications
You must be signed in to change notification settings - Fork 800
SOLR-17725: CoreAdmin API to upgrade an index in-place to facilitate Solr upgrade across major versions #3903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
First (half-baked) draft. WIP. |
…ion for custom update.chain
3085bea to
daaaecb
Compare
|
Looking good. Basic (manual) tests with /admin/cores?action=UPGRADECOREINDEX working fine. Will publish response with segment status before and after index upgrade. Also need to add tests. |
… has been renamed. It's being used in this feature..
|
Sample response in synchronous mode (/admin/cores?action=UPRGADECOREINDEX&core=techproducts) Sample response in async mode ("/admin/cores?action=UPRGADECOREINDEX&core=techproducts&async=request_id" and checking status with action=REQUESTSTATUS) |
|
@dsmiley @broustant @janhoy This is ready for review. Would appreciate your inputs. Thanks! |
…keep forbiddenApisTest happy
dsmiley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review.
Thanks for working on this!
@broustant you may find some of this rather familiar from index encryption
| import org.slf4j.Logger; | ||
| import org.slf4j.LoggerFactory; | ||
|
|
||
| public class UpgradeCoreIndex extends CoreAdminAPIBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please always add at least a line of javadoc to new public classes.
|
|
||
| @Schema(description = "Request ID to track this action which will be processed asynchronously.") | ||
| @JsonProperty | ||
| public String async; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @gerlowskija might review the V2 API spect.
But what I notice here is this async --- I'm suspicious as I figure it'd be handled in a more general way without each command needing to redundantly specify it, which is basically all commands.
| coreName, | ||
| requestBody.async, | ||
| "upgrade-index", | ||
| () -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a stylistic point but this lambda is way too big; please use another technique like define a method, and call it with a one-liner lambda.
| () -> { | ||
| try (SolrCore core = coreContainer.getCore(coreName)) { | ||
|
|
||
| log.info("Received UPGRADECOREINDEX request for core: {}", coreName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging at request level seems like a framework infrastructure level concern (probably already logged), not one unique to this particular command
| int numSegmentsEligibleForUpgrade = 0, numSegmentsUpgraded = 0; | ||
| try { | ||
| iwRef = core.getSolrCoreState().getIndexWriter(core); | ||
| if (iwRef != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surely this can't be null?!
| if (iwRef != null) { | ||
| IndexWriter iw = iwRef.get(); | ||
| if (iw != null && originalMergePolicy != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again; null checks needless
| if (resolvedChain != null) { | ||
| return resolvedChain; | ||
| } else { | ||
| log.error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should throw an error (and not log)
| // Try to find chain configured in /update handler | ||
| String updateChainName = null; | ||
| SolrRequestHandler reqHandler = core.getRequestHandler("/update"); | ||
|
|
||
| NamedList initArgs = ((RequestHandlerBase) reqHandler).getInitArgs(); | ||
|
|
||
| if (initArgs != null) { | ||
| // Check invariants first | ||
| Object invariants = initArgs.get("invariants"); | ||
| if (invariants instanceof NamedList) { | ||
| updateChainName = (String) ((NamedList) invariants).get(UpdateParams.UPDATE_CHAIN); | ||
| } | ||
|
|
||
| // Check defaults if not found in invariants | ||
| if (updateChainName == null) { | ||
| Object defaults = initArgs.get("defaults"); | ||
| if (defaults instanceof NamedList) { | ||
| updateChainName = (String) ((NamedList) defaults).get(UpdateParams.UPDATE_CHAIN); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're trying too hard to find the default chain. If it's actually this hard, then this code should live elsewhere and we just call it. But I'm still suspicious. If it wasn't so late I'd go look.
|
|
||
| private boolean isIndexUpgraded(SolrCore core) throws IOException { | ||
|
|
||
| try (FSDirectory dir = FSDirectory.open(Path.of(core.getIndexDir())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
woah woah... lets instead get the Directory from the factory from the core. See SolrIndexSplitter.
|
|
||
| private void doCommit(SolrCore core) throws IOException { | ||
| try (LocalSolrQueryRequest req = new LocalSolrQueryRequest(core, new ModifiableSolrParams())) { | ||
| CommitUpdateCommand cmd = new CommitUpdateCommand(req, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a little EOL comment on what "false" means
|
@dsmiley Thanks for an initial review. I am working on addressing the review comments. |
https://issues.apache.org/jira/browse/SOLR-17725
Description
Provide a /admin/cores api (action=UPGRADECOREINDEX) to upgrade older lucene segments to latest version (by targeted reindexing). Calling this REST endpoint on an index created in version X-1 (assuming you are on Solr version X) would reindex documents in any segments of the older version (X-1) so that they are re-written in version X. This would help prepare the index for when Solr is upgraded to X+1 without having to recreate the index from source (as is the requirement today).
Solution
This API helps to upgrade an index where all fields are either stored or docValues enabled or copy field target.
Here's the algorithm:
Set the merge policy as LatestVersionMergePolicy (which is a filter merge policy that delegates to whichever policy is actual defined in solrconfig or the TieredMergePolicy by default). This policy filters out any segments with minVersion=X-1 and only allows the latest version (X) segments to participate in merges. This ensures that the older version segments don't merge and "pollute" the index over the course of the reindexing.
Iterate over the segments and for any segment which is a pure X-1 version segment (result of flush; minVersion=X-1, version=X-1) or a result of merge between version X-1 and X segments (minVersion=X-1, version=X), reindex all live documents in the segment.
commit and validate that all segments in the index have been converted to version X.
Reset the merge policy back to the original merge policy.
This works because after the change in apache/lucene#14607 (and the backport to 10x) opening an index no longer cares about which version the index was originally created in. Lucene just needs all of the segments to be of version X or later in order to be able to open the index in version X+1 (even though the index could have been originally created in version X-1)
[AI Coding Tools Disclosure: Only use of AI coding tools was in the tests (UpgradeCoreIndexActionTest.java). Claude Code and OpenAI Codex were used to generate, review, iterate and fix the tests mentioned in the "Tests" section. The same were also manually reviewed thereafter for correctness. ]
Tests
seg1 - pure 9x
seg2 - pure 10x
seg3 - minVersion 9x, version 10x (result of merge between 9x and 10x segements)
seg1 and seg3 get reindexed into new 10x segment, and get deleted. seg2 remains untouched (as expected).
DocValues-only fields are preserved during reindexing (this is tested specifically since we are reading docs from segments and decorating the SolrInputDocument with docValues thereafter)
older segments get deleted by the end of the process
async mode works fine. Tested by calling /admin/cores?action=UPGRADECOREINDEX&async=request_id. And later calling action=REQUESTSTATUS&requestid=request_id
For an index where all segments belong to the latest major version, calling the UPGRADECOREINDEX API is a no-op, and returns "upgradeStatus":"NO_UPGRADE_NEEDED" in the response.
Checklist
Please review the following and check all that apply:
mainbranch../gradlew check.