-
Notifications
You must be signed in to change notification settings - Fork 26
baseline.login result doesn't match job log #244
Description
This is an intermittent issue, in most cases the baseline.login test case result is correct. It should be PASS when login succeeded. However, it is sometimes reported as PASS even though the login did not succeed.
For example, this job:
https://kernelci.org/test/plan/id/5f02ef215c51b2c46485bb2d/
shows only the baseline-uefi.login test case, others are missing because the kernel didn't reach user-space.
It should have been set to FAILED since there was a kernel panic and it didn't reach login as can be seen in the full job log:
https://storage.kernelci.org/next/master/next-20200706/arm/multi_v7_defconfig/gcc-8/lab-collabora/baseline-uefi-qemu_arm-virt-gicv2.html
09:25:04.711763 <1>[ 1.813286] 8<--- cut here ---
09:25:04.711951 <1>[ 1.813418] Unhandled fault: page domain fault (0x01b) at 0x00000000
09:25:04.712059 <1>[ 1.813637] pgd = (ptrval)
09:25:04.712545 Matched prompt kernelci/kernelci-frontend#1: (Unhandled fault.*)\r\n
09:25:04.712836 Setting prompt string to ['/ #']
09:25:04.712987 end: 2.2 auto-login-action (duration 00:00:03) [common]
09:25:04.713306 start: 2.3 expect-shell-connection (timeout 00:04:57) [common]
09:25:04.713415 Setting prompt string to ['/ #']
09:25:04.713515 Forcing a shell prompt, looking for ['/ #']
09:25:04.764033 <1>[ 1.813727] [00000000] *pgd
09:25:04.764212 expect-shell-connection: Wait for prompt ['/ #'] (timeout 00:05:00)
09:25:04.764336 Waiting using forced prompt support (timeout 00:02:30)
09:25:04.764527 =00000000
09:25:04.764622 <0>[ 1.814136] Internal error: : 1b [#1] SMP ARM
09:29:05.333341 ShellCommand command timed out.: Sending # in case of corruption. Connection timeout 00:05:00, retry in 00:00:30
09:29:05.333574 pattern: ['/ #']
09:29:05.434391 #
09:29:35.519458 ShellCommand command timed out.: Sending # in case of corruption. Connection timeout 00:05:00, retry in 00:00:30
09:29:35.519706 pattern: ['/ #']
09:29:35.620520 #
09:30:01.713627 end: 2.3 expect-shell-connection (duration 00:04:57) [common]
09:30:01.714008 boot-image-retry failed: 1 of 1 attempts. 'expect-shell-connection timed out after 297 seconds'
09:30:01.714287 end: 2 boot-image-retry (duration 00:05:00) [common]
The same issue occurred with the same job run in all the labs, so it seems to be fully reproducible. It may either be an issue in LAVA itself or potentially in kernelci-backend when parsing the callback results, maybe when a kernel panic was also detected by LAVA.
Here's the original LAVA job for the example above:
https://lava.collabora.co.uk/scheduler/job/2482262
and the auto-login-action test case is marked as PASS, the job failed at the expect-shell-connection stage:
https://lava.collabora.co.uk/results/2482262/lava
It seems look like auto-login-action was wrongly set to PASS, maybe as a side-effect of detecting a kernel panic.