Add support for re-adopting physical disks #10221
Conversation
This change implements the determinations in RFD 693. It allows re-adopting physical disks in the control plane after the control plane level disk in the `physical_disk` table is expunged. It does this by forcing manual adoption of disks by an operator, where requests are placed in the `physical_disk_adoption_request` table. A disk will now only be adopted or re-adopted by the disk adoption background task if its physical vendor/model/serial information is present in a `physical_disk_adoption_request` row. The typical flow for an operator is to list uninitialized disks and then issue an adoption request via the external API.
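For intuition, here's a minimal sketch of the gate the disk adoption background task applies, using hypothetical names (`DiskIdentity`, `disks_to_adopt`) rather than the PR's actual code:

```rust
// Hypothetical sketch of the adoption gate described above: a disk seen in
// inventory is adopted only if a live adoption request matches its
// vendor/model/serial triple. Names here are illustrative, not omicron's.
#[derive(PartialEq)]
struct DiskIdentity {
    vendor: String,
    serial: String,
    model: String,
}

fn disks_to_adopt<'a>(
    inventory_disks: &'a [DiskIdentity],
    adoption_requests: &[DiskIdentity],
) -> Vec<&'a DiskIdentity> {
    inventory_disks
        .iter()
        .filter(|disk| adoption_requests.iter().any(|req| req == *disk))
        .collect()
}
```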
Nit: the RFD is 663, not 693.

Whoops. I even have it open in a tab. Thanks!
ahl
left a comment
To the best of my understanding, we have a couple of metaphors in here. Disks that are unknown to the control plane are "uninitialized" and appear in that list. The verb we're using is "adopt", as in "this uninitialized disk is adopted by the control plane". Is the term "adopt" intended to be the opposite of "expunge"?
I'm not clear on how the operator would use this. Presumably there's supposed to be some step (e.g. of exploration or cognition) between "list uninit disks" and "approve disk(s)", but I'm not sure what it is.
We don’t want to allow automatic disk adoption due to the risk of the insertion of malicious hardware during casual physical access. This is especially problematic before we have disk attestation support, and in the case of existing sleds with empty disk bays.
Presumably I want to make sure the hardware I just put into the U.2 bay is the same as what I'm about to adopt. How do I do that?
The API changes look fine; I'd ask you to think about nomenclature ("uninitialized", "adopt").
```rust
query: Query<PaginationParams<EmptyScanParams, String>>,
) -> Result<
    HttpResponseOk<
        ResultsPage<latest::physical_disk::UninitializedPhysicalDisk>,
```
If I expunge a disk, I think that changes both the policy and state properties (I'm not sure why there are both of these--is one intent and the other status?). Does the disk immediately show up in this list?
Do disks show up in only one place or the other? Do some show up in both?
Expunge happens immediately, and is a terminal enum variant: it records the intent, i.e. the desired state. Once it occurs we can assume it will never change, and most things just look at expunge. The decommissioned state occurs later, after other steps, when the intended state has been realized, so there is some delay there.
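In other words, something like this split (a rough sketch; the real omicron enums have more variants and machinery, and the names may not match exactly):

```rust
// Sketch of the intent/status split: policy records what the operator
// wants, state records what the system has actually realized so far.

/// Intent: what we want to happen to the disk.
enum PhysicalDiskPolicy {
    InService,
    /// Terminal: once set it never changes; most code only checks this.
    Expunged,
}

/// Status: what has actually happened so far.
enum PhysicalDiskState {
    Active,
    /// Set only after the expungement steps complete, so it lags policy.
    Decommissioned,
}
```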
Is the idea that one would do this by comparing the manufacturer/model number/serial listed by the API endpoint with those physically printed on the actual disk, and also based on foreknowledge that disks have or have not been inserted in specific locations at the current point in time?
```rust
pub async fn physical_disk_adoptable_list(
    &self,
    opctx: &OpContext,
    inventory_collection_id: CollectionUuid,
) -> ListResultVec<InvPhysicalDisk> {
```
I thought about suggesting that this be paginated, but... do we expect the maximum number of rows to be limited by the physical fact that 32 sleds * 10 U.2s = 320 disks maximum?
There's not really a good way to paginate this right now AFAIK.
```rust
/// A physical disk that has not yet been adopted by the control plane
#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize, JsonSchema)]
pub struct UninitializedPhysicalDisk {
    pub sled_id: SledUuid,
```
Do we expect the client to hydrate this sled UUID into a sled's physical location? It seems like it would be desirable for a UI listing physical disks that need to be adopted to be able to say which sled they are in, as well as the slot within that sled...
Good question. I suppose we could also provide the sled cubby. That would help operators out a bit probably.
It's not guaranteed we have this, right? Uninitialized physical disks come from sled-agent inventory, and sled-agent doesn't know its own cubby number. We could try to match up the sled serial against the SP inventory contents to identify a cubby, and that will usually work, but we'd still need to be able to represent "physical disk for sled X for cubby I Dunno Ask Again Later".
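If we did surface it, the "cubby unknown" case could be carried as an option. A sketch, with the `cubby` field hypothetical (not part of this PR) and `sled_id` flattened to a plain `Uuid` for brevity:

```rust
// Sketch: representing "cubby I Dunno Ask Again Later" as an Option.
// The `cubby` field is hypothetical; the struct in the PR doesn't have it.
pub struct UninitializedPhysicalDisk {
    pub sled_id: uuid::Uuid,
    /// Cubby inferred by matching the sled serial against SP inventory;
    /// `None` when no SP inventory exists for this collection.
    pub cubby: Option<u32>,
}
```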
Yes, basically, if they actually cared. I think the larger security issue being mitigated is that any disk inserted can only be activated by an operator and not just adopted automatically for usage. Checking the serial is how they would know which disk they were validating against.
```rust
        ),
    ),
)
// Ensure that each inventory disk has a valid adoption request
```
Is there a precondition that "you cannot have an adoption request for an already in-service disk"?
(Having already-in-service disks show up here seems wrong, just confirming what prevents that. I think the answer is "yes, a non-deleted adoption request means you have no live disks here")
Yes, that's a precondition.
It's not necessarily the opposite of `expunge`.
As Eliza noted, an operator would have to look at the vendor/model/serial on the drive and compare it to what shows up in the API.
I'm open to changing this.
@smklein and I have been talking through disk replacement scenarios a bit from the fault management context, and I think the adoption requests will eventually be part of the service flow for disk replacements. I think in particular, we would really like it if the adoption requests could easily include the physical location of the disk as part of a "you just replaced the disk in sled 19 slot 3, okay yeah it's that one" kinda spot check.
I just talked about this with John a bit, and if you don't like the term …
I'm good with whichever of those you choose. I appreciate you discussing it.
As @jgallagher pointed out earlier today, we don't actually have this information in sled-agent inventory. What happens if the SP inventory doesn't exist for this collection? Then we can't list the sled as uninitialized, or we always need to carry around an option here. Would it make sense to tell a customer "Hey, actually I don't know what cubby this disk is in right now", or "This sled is not uninitialized" even though the customer just inserted it into the rack and expects to see it?
One thing I just realized is that with the current code, we will no longer automatically adopt disks when sleds are added to a rack. I confirmed with @rmustacc that this was not what he intended. Unfortunately, I'm not immediately sure how to fit this behavior in with the current one. Disks are detected asynchronously after a sled is added, and are currently adopted by the background task automatically. The new behavior ensures that a disk adoption request is made by a user to allow adoption by the background task, but the adoption itself is still done asynchronously in the background task and is separate from the sled add request.

What we would want is a state attached to the sled that says "automatically adopt disks that were present when the sled was added to the rack". But we don't really have any mechanism to discover that information. I think the best we can do with the current code base is say something like "automatically adopt disks for this sled for 5 minutes" after it has been created (see the sketch below). Given inventory delays this is also problematic: a disk that was in the sled when it was added, and that the customer expected to be adopted, may not actually get adopted. That's a terrible user experience. We could lessen the likelihood of non-adoption by increasing this window, but that now lengthens the time when casual physical attack is possible by inserting arbitrary disks. The only other thing I can think of doing is adding disk information to the
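For concreteness, the time-window idea would amount to something like this (all names hypothetical; not code from this PR):

```rust
use chrono::{DateTime, Duration, Utc};

// Sketch of the "auto-adopt for N minutes after sled add" idea discussed
// above. All names are hypothetical.
fn sled_in_auto_adopt_window(
    sled_time_created: DateTime<Utc>,
    window: Duration,
) -> bool {
    // While the window is open, inventory disks on this sled would be
    // adopted without an explicit request; afterwards a request is
    // mandatory. Note the tension: a longer window tolerates inventory
    // delays but widens the casual-physical-attack window.
    Utc::now() < sled_time_created + window
}
```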
NOT TO BE ANNOYING BUT: I really don't like "import" in this context, it feels like it is too easily misconstrued as "import the data that was on this disk", which is not what one would expect to be offered but which is a somewhat conceivable thing that might occur.
NOT ANNOYING AT ALL!!! I appreciate the feedback. I'm not a huge fan of import either. I still like `adopt`. I think to make the metaphors less mixed I'd switch to listing `unadopted` disks.
Thanks for the reviews @smklein @hawkw @jgallagher. I believe I've addressed everything. Unfortunately, I discovered an issue that should probably be resolved before merge, or explicitly decided not to implement: #10221 (comment)
There are further problems that make the implementation next to impossible to do in an ideal manner. We really seem to be restricted to a time-based setup, or to forcing manual disk adoption at all times.
@andrewjstone and I chatted about this a bit offline. Recording some of our thoughts here:
Based on discussion in the update huddle last week, we decided that to move forward we would auto-adopt disks that haven't been part of the control plane before. c9618fc makes this change. Importantly, it makes this change by inserting new disks into the `physical_disk_adoption_request` table.
This is ready for a re-review @jgallagher @smklein. I still need to test it on hardware, which I'll do after feedback. I'd like to test either Thursday afternoon or Friday. Thanks!
This was preexisting but since we're here, this is also an update
```rust
opctx.authorize(authz::Action::Modify, &authz::FLEET).await?;
```
When we have an inventory disk we pass that in to `new` and set the fields inside that method. This prevents passing strings in the wrong order. However, most calls to `PhysicalDisk::new` are in tests. Rather than forcing the creation of an intermediate type, we add a new `from_parts` method to allow more verbose construction.
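Roughly, the two construction paths look like this (signatures and field sets are illustrative, not the exact ones in the PR):

```rust
use uuid::Uuid;

// Illustrative versions of the two constructors described above.
pub struct InvPhysicalDisk {
    pub vendor: String,
    pub serial: String,
    pub model: String,
}

pub struct PhysicalDisk {
    pub sled_id: Uuid,
    pub vendor: String,
    pub serial: String,
    pub model: String,
}

impl PhysicalDisk {
    /// Production path: the identity strings come from the inventory disk
    /// itself, so callers can't swap vendor/serial/model by accident.
    pub fn new(sled_id: Uuid, disk: &InvPhysicalDisk) -> Self {
        Self::from_parts(
            sled_id,
            disk.vendor.clone(),
            disk.serial.clone(),
            disk.model.clone(),
        )
    }

    /// Test path: verbose construction from raw parts, avoiding an
    /// intermediate InvPhysicalDisk in every test.
    pub fn from_parts(
        sled_id: Uuid,
        vendor: String,
        serial: String,
        model: String,
    ) -> Self {
        Self { sled_id, vendor, serial, model }
    }
}
```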
```rust
}

Ok(())
// Idempotent case: an adoption request already exists for this disk.
```
I'm not a big fan of "doing a transaction, then immediately doing another operation" - it makes this function non-atomic.
Suppose you try to insert a request, and see the unique violation. Normally we'd read the value from the DB (non transactionally!), but what if a concurrent DELETE happened? We'd proceed to the SELECT below, but we wouldn't find a request.
Then the `physical_disk_enable_adoption` function is returning "NOT FOUND"? That's weird, right?
Couldn't we catch the unique violation and do this query within the transaction?
Thanks for the review @smklein. I agreed with you here and made this change. Unfortunately it doesn't work. Once you hit a unique constraint error, the transaction gets aborted. According to Claude, I can try to use a savepoint, then roll back to that savepoint and do the read. But this seems like it may lead to the same weirdness:
Inside the transaction there is a uniqueness error, so we rollback. Before the rollback someone deletes the adoption request that caused the rollback. We try to read it and return a not found error.
I think I prefer leaving things as is, as it's at least less complex to read. Since only one operator should be driving, there's very little likelihood of a request being added and deleted simultaneously.
Huh, I'm surprised by that. I would have figured that the `insert_into` would be failing when we hit that constraint (it's an INSERT error, right??) and so if we call `.get_result_async(&conn).await?`, we'd return that error, exit the function, and cause the transaction to abort.
However, if we handle that error instead by matching on the result, shouldn't we be able to convert the "error" case of that Result to Ok(request), using a synthetic request (basically, matching whatever we were about to insert)?
I don't think there's anything magical about "hitting an error" in a transaction which causes it to abort - since transactions in diesel are mapped to functions, we should be only aborting when we return from this closure with an error, right? So as long as we return Ok(request), it should avoid aborting, right?
EDIT: Re-reading this; we could also "do the read once we see that first error" within the transaction still, right? I'm realizing we don't actually want to create a synthetic request - but we should still be able to do the read afterwards, as long as we don't bail early with an error?
Okay, following up, I suppose you're right. The transaction aborted state exists on the postgres side!
One caveat: we could do:
- "Optional" read, store the value
- Then insert
That way: if we hit the insert on-conflict case, we already read the value prior to hitting a DB error
I thought the same thing, but it doesn't work. Cockroach sets the internal transaction state to aborted and you can't return OK after that. Here's the patch I made:

```diff
diff --git a/nexus/db-queries/src/db/datastore/physical_disk.rs b/nexus/db-queries/src/db/datastore/physical_disk.rs
index f503e0343..d6d975a0b 100644
--- a/nexus/db-queries/src/db/datastore/physical_disk.rs
+++ b/nexus/db-queries/src/db/datastore/physical_disk.rs
@@ -177,68 +177,72 @@ impl DataStore {
}
// Insert the adoption request.
- let request = diesel::insert_into(
+ let res = diesel::insert_into(
adoption_dsl::physical_disk_adoption_request,
)
.values((
adoption_dsl::id.eq(Uuid::new_v4()),
- adoption_dsl::vendor.eq(vendor),
- adoption_dsl::serial.eq(serial),
- adoption_dsl::model.eq(model),
+ adoption_dsl::vendor.eq(vendor.clone()),
+ adoption_dsl::serial.eq(serial.clone()),
+ adoption_dsl::model.eq(model.clone()),
adoption_dsl::time_created.eq(Utc::now()),
))
.returning(PhysicalDiskAdoptionRequest::as_returning())
.get_result_async(&conn)
- .await?;
-
- Ok(request)
+ .await;
+
+ let diesel_err = match res {
+ Ok(request) => return Ok(request),
+ Err(e) => e,
+ };
+
+ match diesel_err {
+ // Check for a unique index violation for an active
+ // adoption request with the same vendor/serial/model.
+ //
+ // Return the existing request in this case as the
+ // request is idempotent.
+ DieselError::DatabaseError(
+ DieselErrorKind::UniqueViolation,
+ _,
+ ) => {
+ // Idempotent case: an adoption request already exists for this disk.
+ // Query it back and return it.
+ adoption_dsl::physical_disk_adoption_request
+ .filter(
+ adoption_dsl::vendor.eq(vendor),
+ )
+ .filter(
+ adoption_dsl::serial.eq(serial),
+ )
+ .filter(
+ adoption_dsl::model.eq(model),
+ )
+ .filter(
+ adoption_dsl::time_deleted.is_null(),
+ )
+ .select(
+ PhysicalDiskAdoptionRequest::as_select(
+ ),
+ )
+ .first_async(&conn)
+ .await
+ }
+ e => Err(e),
+ }
}
})
.await;
match txn_res {
- Ok(request) => return Ok(request),
+ Ok(request) => Ok(request),
Err(e) => match err.take() {
// A called function performed its own error propagation.
- Some(txn_error) => {
- return Err(txn_error.into_public_ignore_retries());
- }
+ Some(txn_error) => Err(txn_error.into_public_ignore_retries()),
// The transaction setup/teardown itself encountered a diesel error.
- None => match e {
- // Check for a unique index violation for an active
- // adoption request with the same vendor/serial/model.
- //
- // Return the existing request in this case as the
- // request is idempotent.
- DieselError::DatabaseError(
- DieselErrorKind::UniqueViolation,
- _,
- ) => {
- // Fall through to query the existing request below.
- }
- _ => {
- return Err(public_error_from_diesel(
- e,
- ErrorHandler::Server,
- ));
- }
- },
+ None => Err(public_error_from_diesel(e, ErrorHandler::Server)),
},
}
-
- // Idempotent case: an adoption request already exists for this disk.
- // Query it back and return it.
- let conn = &*self.pool_connection_authorized(opctx).await?;
- let request = adoption_dsl::physical_disk_adoption_request
- .filter(adoption_dsl::vendor.eq(disk_id.vendor))
- .filter(adoption_dsl::serial.eq(disk_id.serial))
- .filter(adoption_dsl::model.eq(disk_id.model))
- .filter(adoption_dsl::time_deleted.is_null())
- .select(PhysicalDiskAdoptionRequest::as_select())
- .first_async(conn)
- .await
- .map_err(|e| public_error_from_diesel(e, ErrorHandler::Server))?;
- Ok(request)
}
 /// Stores a new physical disk in the database.
```

And here's the error I got back from the idempotent test:

```
thread 'db::datastore::physical_disk::test::physical_disk_adoptable_list' (2) panicked at nexus/db-queries/src/db/datastore/physical_disk.rs:1590:14:
adoption request succeeds: InternalError { internal_message: "unexpected database error: current transaction is aborted, commands ignored until end of transaction block" }
```
Or, if we still want this fast on the "no conflict" case:

```rust
self.transaction_retry_wrapper("physical_disk_enable_adoption")
.transaction(&conn, |conn| async move {
// Try the insert. If a row with the same unique key already
// exists, this returns Ok(None) instead of raising a unique
// violation — so the txn stays healthy.
let inserted = diesel::insert_into(dsl::physical_disk_adoption)
.values(&new_request)
.on_conflict(dsl::physical_disk_id) // or whatever key
.do_nothing()
.returning(PhysicalDiskAdoption::as_returning())
.get_result_async(conn)
.await
.optional()?;
// On conflict, fetch the existing row in the same txn —
// atomic w.r.t. concurrent DELETEs, no NotFound race.
let row = match inserted {
Some(r) => r,
None => {
dsl::physical_disk_adoption
.filter(dsl::physical_disk_id.eq(disk_id))
.filter(dsl::time_deleted.is_null())
.select(PhysicalDiskAdoption::as_select())
.get_result_async(conn)
.await?
}
};
Ok(row)
})
    .await
```
Thanks Sean. I'll give this strategy a try. I'm also curious what CRDB will do if we use `.optional()`.
I have now tried multiple ways to do what you suggested. Trying to use `on_conflict((adoption_dsl::vendor, adoption_dsl::model, adoption_dsl::serial))` doesn't work because the conflict target doesn't include the partial index's WHERE clause, and diesel doesn't support adding where clauses in `on_conflict`. I get the following error:
thread 'db::datastore::physical_disk::test::physical_disk_adoptable_list' (2) panicked at nexus/db-queries/src/db/datastore/physical_disk.rs:1565:14:
adoption request succeeds: InternalError { internal_message: "unexpected database error: there is no unique or exclusion constraint matching the ON CONFLICT specification" }
I then tried to use `.on_conflict(on_constraint("physical_disk_adoption_request_by_physical_id"))`, but that also doesn't work with partial indexes. I ended up with the following error:
thread 'db::datastore::physical_disk::test::physical_disk_unadopted_list' (2) panicked at nexus/db-queries/src/db/datastore/physical_disk.rs:1368:14:
adoption request succeeds: InternalError { internal_message: "unexpected database error: unique constraint \"physical_disk_adoption_request_by_physical_id\" for table \"physical_disk_adoption_request\" is partial, so it cannot be used as an arbiter via the ON CONSTRAINT syntax" }
I'm pretty much out of ideas here besides trying to break out to raw SQL, which I don't think is necessarily worth it, and I'm not sure it will actually help.
@jgallagher suggested doing the lookup first and then if it's not there doing the insert. It's slightly more expensive, but that should fix this. I'm going to give it a try.
This worked!
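For reference, a sketch of the shape of that fix, reusing names from the diffs above (retry plumbing, value cloning, and error mapping elided; not the exact code that landed):

```rust
// Sketch of the "read first, insert on miss" ordering. Because the read
// can't hit a unique violation, the transaction never enters CockroachDB's
// aborted state on the idempotent path.
self.transaction_retry_wrapper("physical_disk_enable_adoption")
    .transaction(&conn, |conn| async move {
        // 1. Look for a live adoption request for this vendor/serial/model.
        let existing = adoption_dsl::physical_disk_adoption_request
            .filter(adoption_dsl::vendor.eq(vendor.clone()))
            .filter(adoption_dsl::serial.eq(serial.clone()))
            .filter(adoption_dsl::model.eq(model.clone()))
            .filter(adoption_dsl::time_deleted.is_null())
            .select(PhysicalDiskAdoptionRequest::as_select())
            .first_async(&conn)
            .await
            .optional()?;

        // 2. Idempotent case: return the request that already exists.
        if let Some(request) = existing {
            return Ok(request);
        }

        // 3. No live request: insert one. A concurrent, conflicting
        //    insert surfaces as an error here rather than a silently
        //    wrong result.
        diesel::insert_into(adoption_dsl::physical_disk_adoption_request)
            .values((
                adoption_dsl::id.eq(Uuid::new_v4()),
                adoption_dsl::vendor.eq(vendor),
                adoption_dsl::serial.eq(serial),
                adoption_dsl::model.eq(model),
                adoption_dsl::time_created.eq(Utc::now()),
            ))
            .returning(PhysicalDiskAdoptionRequest::as_returning())
            .get_result_async(&conn)
            .await
    })
    .await
```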
smklein
left a comment
LGTM modulo one last comment about transaction atomicity expectations
Expunging and re-adding a disk succeeded.

- Show disks in omdb before expunge
- Show unadopted disks via API before expunge:
  `➜ oxide.rs git:(main) ✗ target/debug/oxide --profile recovery3 api '/v1/system/hardware/disks-unadopted'`
- Expunge first disk in the list via omdb
- Show unadopted disks after the expunge
- Request an adoption of the disk via API
- Wait for disk to be adopted and list unadopted again via API
- Show that the disk exists with a different ID in omdb
- Show that there is a deleted adoption request in the DB
- Show that there are two rows for the given physical disk, with one active

I still need to test the expunge sled path, which I will do after dinner.
I forgot to delete the zpool before adding back the disk, and as expected we see: …

Then I removed the zpool via …

And, yay, successfully reconciled!
This change implements the determinations in RFD 663. It allows re-adopting physical disks in the control plane after the control plane level disk in the `physical_disk` table is expunged.

It does this by forcing manual adoption of disks by an operator, where requests are placed in the `physical_disk_adoption_request` table.

A disk will now only be adopted or re-adopted by the disk adoption background task if its physical vendor/model/serial information is present in a `physical_disk_adoption_request` row.

The typical flow for an operator is to list uninitialized disks and then issue an adoption request via the external API.