Open
Labels
customer: For any bug reports or feature requests tied to customer requests
Milestone
Description
i tried to create an instance on dogfood after updating this week and it ended up stuck in Starting.
switch 0 on dogfood seems to be in some kind of nebulous unhappy state (issue to come; the short of it is that we accidentally filled the switch zone with a core file while copying an old one out, and everything there went sideways). separately, though, the instance-start saga for this instance seems stuck in instance_start.dpd_ensure:
root@oxz_switch1:~# /tmp/omdb-saga db sagas show 727d4812-9383-4df3-985b-0c1bce68d5ad
note: database URL not specified. Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:109::3]:32221,[fd00:1122:3344:105::3]:32221,[fd00:1122:3344:10b::3]:32221,[fd00:1122:3344:107::3]:32221,[fd00:1122:3344:108::3]:32221/omicron?sslmode=disable
WARN: found schema version 144.0.0, expected 7.0.0
It's possible the database is running a version that's different from what this
tool understands. This may result in errors or incorrect output.
id | time_created | name | state
--------------------------------------+--------------------------------+----------------+--------------------------
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.112472 UTC | instance-start | SagaCachedState(Running)
saga id | event time | node id | event type | data
------------------------------------ | ------------------------------ | ---------------------------------------- | ---------- | ---
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.120631 UTC | 10: start | started |
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.126739 UTC | 10: start | succeeded |
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.130584 UTC | 0: instance_start.generate_propolis_id | started |
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.134919 UTC | 0: instance_start.generate_propolis_id | succeeded | "b5bf8281-09fc-43e1-b12c-c91c0bb18543"
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.138944 UTC | 1: instance_start.alloc_server | started |
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.179417 UTC | 1: instance_start.alloc_server | succeeded | "b886b58a-1e3f-4be1-b9f2-0c2e66c6bc88"
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.183787 UTC | 2: instance_start.alloc_propolis_ip | started |
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.193104 UTC | 2: instance_start.alloc_propolis_ip | succeeded | "fd00:1122:3344:106::1:9b7"
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.196560 UTC | 3: instance_start.create_vmm_record | started |
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.205571 UTC | 3: instance_start.create_vmm_record | succeeded | {"id":"b5bf8281-09fc-43e1-b12c-c91c0bb18543","instance_id":"bf2e1d9e-fcb4-47fe-9cc5-c2e9a268fda4","propolis_ip":"fd00:1122:3344:106::1:9b7/128","propolis_port":12400,"runtime":{"gen":1,"state":"Creating","time_state_updated":"2025-05-23T00:24:21.200216Z"},"sled_id":"b886b58a-1e3f-4be1-b9f2-0c2e66c6bc88","time_created":"2025-05-23T00:24:21.200216Z","time_deleted":null}
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.208748 UTC | 4: instance_start.mark_as_starting | started |
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.314530 UTC | 4: instance_start.mark_as_starting | succeeded | {"auto_restart":{"cooldown":null,"policy":null},"boot_disk_id":null,"hostname":"ixi-600g-mem","identity":{"description":"beeeeeg memory (shouldn't panic a sled, probably)","id":"bf2e1d9e-fcb4-47fe-9cc5-c2e9a268fda4","name":"ixi-600g-mem","time_created":"2025-05-23T00:24:19.204040Z","time_deleted":null,"time_modified":"2025-05-23T00:24:19.204040Z"},"intended_state":"Running","memory":644245094400,"ncpus":2,"project_id":"9c4152f9-4317-4269-9018-66142964d21c","runtime_state":{"dst_propolis_id":null,"gen":3,"migration_id":null,"nexus_state":"Vmm","propolis_id":"b5bf8281-09fc-43e1-b12c-c91c0bb18543","time_last_auto_restarted":null,"time_updated":"2025-05-23T00:24:19.204040Z"},"updater_gen":1,"updater_id":null,"user_data":[]}
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.318499 UTC | 5: instance_start.dpd_ensure | started |
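The stuck node can also be picked out mechanically from the saga log. The following is a minimal sketch, not part of omdb: it assumes the pipe-separated layout printed above, and the helper name `find_unfinished_nodes` is hypothetical. It also assumes that `succeeded` and `failed` are the only event types that finish a node's forward action.

```python
def find_unfinished_nodes(lines):
    """Return node labels that logged 'started' but no terminal event.

    Assumes each row looks like:
        saga id | event time | node id | event type | data
    Header and separator rows are skipped because their fourth
    column is not a recognized event type.
    """
    started = []      # preserves the order in which nodes started
    finished = set()
    for line in lines:
        # Split at most four times so that '|' inside the data column
        # (for example in JSON) cannot shift the earlier columns.
        fields = [f.strip() for f in line.split("|", 4)]
        if len(fields) < 4:
            continue
        node, event = fields[2], fields[3]
        if event == "started":
            if node not in started:
                started.append(node)
        elif event in ("succeeded", "failed"):  # assumed terminal events
            finished.add(node)
    return [n for n in started if n not in finished]


# The last three rows of the log above, plus the header row.
sample = [
    "saga id | event time | node id | event type | data",
    "727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.208748 UTC | 4: instance_start.mark_as_starting | started |",
    "727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.314530 UTC | 4: instance_start.mark_as_starting | succeeded | {}",
    "727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.318499 UTC | 5: instance_start.dpd_ensure | started |",
]

stuck = find_unfinished_nodes(sample)
print(stuck)  # ['5: instance_start.dpd_ensure']
```

For this saga the only node with a `started` event and no matching completion is `5: instance_start.dpd_ensure`.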
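very unfortunately, enough of the instance's state had been populated that we started by looking for a Propolis issue, and came up blank for a while. the instance record looks convincing from omdb, with an active VMM and a Propolis address, even though the saga is still waiting in dpd_ensure: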
root@oxz_switch1:~# omdb db instance info bf2e1d9e-fcb4-47fe-9cc5-c2e9a268fda4
note: database URL not specified. Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:109::3]:32221,[fd00:1122:3344:105::3]:32221,[fd00:1122:3344:10b::3]:32221,[fd00:1122:3344:107::3]:32221,[fd00:1122:3344:108::3]:32221/omicron?sslmode=disable
note: database schema version matches expected (144.0.0)
== INSTANCE ====================================================================
ID: bf2e1d9e-fcb4-47fe-9cc5-c2e9a268fda4
project ID: 9c4152f9-4317-4269-9018-66142964d21c
name: ixi-600g-mem
description: beeeeeg memory (shouldn't panic a sled, probably)
created at: 2025-05-23 00:24:19.204040 UTC
last modified at: 2025-05-23 00:24:19.204040 UTC
== CONFIGURATION ===============================================================
vCPUs: 2
memory: 600 GiB
hostname: ixi-600g-mem
boot disk: None
auto-restart:
InstanceAutoRestart {
policy: None,
cooldown: None,
}
== RUNTIME STATE ===============================================================
nexus state: Vmm
(i) external API state: Starting
intended state: running
last updated at: 2025-05-23T00:24:19.204040Z (generation 3)
needs reincarnation: false
karmic status: saṃsāra (reincarnation enabled)
last reincarnated at: None
active VMM ID: Some(b5bf8281-09fc-43e1-b12c-c91c0bb18543)
target VMM ID: None
migration ID: None
updater lock: UNLOCKED at generation: 1
== ACTIVE VMM ==================================================================
ID: b5bf8281-09fc-43e1-b12c-c91c0bb18543
instance ID: bf2e1d9e-fcb4-47fe-9cc5-c2e9a268fda4
created at: 2025-05-23 00:24:21.200216 UTC
state: creating
updated at: 2025-05-23T00:24:21.200216Z (generation 1)
propolis address: fd00:1122:3344:106::1:9b7:12400
sled ID: b886b58a-1e3f-4be1-b9f2-0c2e66c6bc88
at the very least, we probably should have timed out and failed the instance start at some point?
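As an illustration of the kind of bound being asked for, here is a minimal sketch in Python rather than the project's own code. It is not how Nexus or the saga executor handles this today; `run_step_with_timeout`, `StepTimedOut`, and `hung_dpd_ensure` are hypothetical names, and the timeout value is arbitrary. It only shows a step that never returns being converted into an explicit failure that a caller could act on.

```python
import asyncio


class StepTimedOut(Exception):
    """Raised when a step does not finish within its time budget."""


async def run_step_with_timeout(name, step, timeout_secs):
    """Await step(), failing with StepTimedOut after timeout_secs."""
    try:
        return await asyncio.wait_for(step(), timeout=timeout_secs)
    except asyncio.TimeoutError:
        # wait_for cancels the inner task before raising, so the hung
        # request does not keep running in the background.
        raise StepTimedOut(
            f"{name} did not complete within {timeout_secs}s"
        ) from None


async def hung_dpd_ensure():
    # Hypothetical stand-in for a request to an unresponsive switch zone.
    await asyncio.sleep(3600)


async def demo():
    try:
        await run_step_with_timeout(
            "instance_start.dpd_ensure", hung_dpd_ensure, 0.05
        )
    except StepTimedOut as err:
        # A real caller would fail the instance start here instead of
        # leaving the instance in Starting indefinitely.
        return str(err)
    return "completed"


result = asyncio.run(demo())
print(result)
```

Whether the right place for such a bound is the individual action, the client making the request, or the saga as a whole is a design question this sketch does not try to answer.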