<dustymabe>
the pipeline finally failed so I'll grab the logs from it now to see if there is anything
<dustymabe>
oh interesting
<jlebon>
yeah, let's attach logs to the tracker issue
<dustymabe>
in this pipeline run, `coreos.boot-mirror.luks/detach-primary` failed on the first go-round, and in the rerun it was `coreos.boot-mirror/detach-primary` that failed
<dustymabe>
one message on the console that keeps scrolling by: "block device autoloading is deprecated and will be removed."
<dustymabe>
that message is also in my local logs (where the test passed)
<jlebon>
hmm, not sure what that's referring to
<dustymabe>
but in my local logs it takes upwards of 6 minutes to reboot that machine
<dustymabe>
so clearly the problem is there in my local tests too - just happens to be fast enough to not trigger the timeout
ravanelli has quit [Remote host closed the connection]
<dustymabe>
jlebon: basically it appears that (in a RAID setup) after we delete the primary block device and then try to reboot, that reboot gets hung up and can take a really long time.
<walters>
TIL centos ci is running 6 metal nodes in aws right now
ravanelli has joined #fedora-coreos
jpn has quit [Ping timeout: 244 seconds]
paragan has quit [Quit: Leaving]
<jlebon>
MichaelArmijo[m]: ok, there's a simpler approach. try rerunning the job and use 'basic' for the KOLA_TESTS parameter
<MichaelArmijo[m]>
jlebon: sounds good. I'll do that now
<jlebon>
the thing with parallelizing it is that I think we've been hitting capacity limits in AWS for aarch64
<jlebon>
dustymabe: haven't read scrollback yet. lots of meetings :)
<MichaelArmijo[m]>
jlebon: test restarted
<dustymabe>
I think the capacity limit doesn't have to do with `quota` though. it's just amazon running out of instances
jpn has joined #fedora-coreos
Betal has joined #fedora-coreos
jpn has quit [Quit: Lost terminal]
<MichaelArmijo[m]>
dustymabe: jlebon: I removed ppc64le from the build jobs, should I also remove that arch from the release job?
<dustymabe>
MichaelArmijo[m]: yes please
<MichaelArmijo[m]>
sounds good. thanks
ravanelli has quit [Remote host closed the connection]
crobinso has quit [Ping timeout: 268 seconds]
<jlebon>
dustymabe: re. quota, the comment was about whether we should change our cloud tests so that everything runs in parallel. that way, if e.g. a test in the regular kola run fails but passes in the rerun, we've still run all kola invocations, so we can choose to ignore the one failure
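[editor's sketch] The "run everything in parallel" idea could look roughly like the following. This is only an illustration in Python, not the actual pipeline code; `run_in_parallel` and the invocation names are hypothetical, and each invocation is assumed to be a callable that raises on failure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_in_parallel(invocations: Dict[str, Callable[[], None]]) -> Dict[str, bool]:
    """Run every invocation concurrently and report pass/fail for each one.

    A failure in one invocation does not stop the others, so the caller
    always sees the outcome of the full set.
    """

    def attempt(fn: Callable[[], None]) -> bool:
        try:
            fn()
            return True
        except Exception:
            return False

    with ThreadPoolExecutor() as pool:
        # Submit everything up front, then wait for all of the results.
        futures = {name: pool.submit(attempt, fn) for name, fn in invocations.items()}
        return {name: future.result() for name, future in futures.items()}
```

With results for every invocation in hand, a human (or a later step) can decide whether a single failure that passed on rerun is safe to ignore.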
<jlebon>
another approach is to remember the kola failure, but still keep going and still fail the overall job at the end
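[editor's sketch] The "remember the failure, keep going, fail at the end" approach could be sketched as below. This is an illustration in Python under assumptions, not the real job definition; `run_stages`, `exit_code`, and the stage names are hypothetical, and each stage is assumed to raise on failure.

```python
import sys
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], None]]


def run_stages(stages: List[Stage]) -> List[str]:
    """Run every stage in order, recording failures instead of aborting."""
    failed: List[str] = []
    for name, stage in stages:
        try:
            stage()
        except Exception as exc:
            # Remember the failure, but keep going with the remaining stages.
            print(f"{name} failed: {exc}", file=sys.stderr)
            failed.append(name)
    return failed


def exit_code(failed: List[str]) -> int:
    """Fail the overall job at the end if any stage failed along the way."""
    return 1 if failed else 0
```

The trade-off compared to failing fast is a longer job when something breaks early, in exchange for always having results from every stage.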
bytehackr has quit [Ping timeout: 255 seconds]
rsalveti has quit [Quit: Connection closed for inactivity]
<dustymabe>
jlebon: ahh I see what you are saying.. IOW the "followup" tests where we launch on a specific instance type (or types) would run alongside the main tests?