ravanelli has quit [Remote host closed the connection]
ravanelli has joined #fedora-coreos
plarsen has joined #fedora-coreos
<vrothberg>
dustymabe: the change to the ready.service of podman-machine considerably improved the flake rate (see github.com/containers/podman/issues/17403). But ... I am still seeing the flake. What I observe is that the first SSH attempt into the machine always fails. That happens right after starting the machine (and receiving the signal from the ready.service).
<vrothberg>
Adding a sleep etc. resolves the issue. It seems there is still a race. Do you have any other tricks in your pockets? I am tempted to just do an exponential backoff to "wait" for SSH in the machine to be ready. But that feels like patching symptoms rather than fixing the underlying issue.
cyberpear has joined #fedora-coreos
job[m] has joined #fedora-coreos
<dustymabe>
vrothberg: I don't have any other ideas without spending more time digging into the problem in depth. It all depends on where the error is happening. Is it actually reaching the sshd process or is port 22 somehow not yet bound? If it's the former we'd need to look at the source code.
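One way to tell those two cases apart from the client side is to open a plain TCP connection and see whether an SSH identification banner comes back. The following is only a rough sketch: the function name `probe_ssh`, its return labels, and the timeout values are made up for illustration and are not part of podman or Fedora CoreOS. A forwarded port may also accept connections before sshd itself is listening, so "no_banner" is a hint, not proof.

```python
import socket


def probe_ssh(host, port=22, timeout=1.0):
    """Classify how far a connection attempt to an SSH port gets.

    Returns one of (labels are illustrative):
      "refused"     - nothing is bound to the port
      "unreachable" - connect failed for another reason (timeout, no route)
      "no_banner"   - TCP connect worked, but no SSH banner arrived in time
      "reset"       - the connection broke while waiting for the banner
      "banner"      - the peer sent an "SSH-" identification string
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "unreachable"

    with sock:
        try:
            # An SSH server sends its identification string first.
            banner = sock.recv(256)
        except socket.timeout:
            return "no_banner"
        except OSError:
            return "reset"

    return "banner" if banner.startswith(b"SSH-") else "no_banner"
```

A "refused" result would point at port 22 not being bound yet, while "banner" followed by a failed login would point at something inside sshd or the session setup.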
<dustymabe>
sometimes a sleep/retry is ugly, but it's practical. and at least the delay here should be really small.
<vrothberg>
dustymabe: Thanks for checking! I agree that a client-side check is probably the best way forward at the moment.
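For reference, a client-side wait with exponential backoff could look roughly like the sketch below. It is not how podman implements it; `retry_with_backoff`, its parameters, and the default delays are hypothetical, and `check` stands in for whatever readiness test the client uses (for example, one SSH attempt that returns True on success).

```python
import time


def retry_with_backoff(check, attempts=6, base_delay=0.1, max_delay=2.0,
                       sleep=time.sleep):
    """Call check() until it returns a truthy value, doubling the delay
    between attempts (capped at max_delay).

    Returns the 1-based attempt number that succeeded, or raises
    TimeoutError once all attempts are used up.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            # Wait before the next attempt, then double the delay.
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise TimeoutError(f"check did not succeed after {attempts} attempts")
```

Since only the first attempt fails in the reported flake, the total added wait with defaults like these would normally be a single short sleep. The `sleep` parameter is there only so the delays can be observed or skipped in tests.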
jpn has joined #fedora-coreos
<dustymabe>
vrothberg: sorry I couldn't be of more help :(
<dustymabe>
as a followup we were going to try to make it more generic and maybe hoist it up into coreos-ci-lib so the upstream CI projects would have the workaround
nalind has joined #fedora-coreos
saschagrunert has quit [Remote host closed the connection]
Betal has joined #fedora-coreos
Betal has quit [Client Quit]
Betal has joined #fedora-coreos
jpn has quit [Ping timeout: 260 seconds]
<dustymabe>
marmijo[m]: I think the vexxhost issues settled since yesterday
jpn has joined #fedora-coreos
miabbott has joined #fedora-coreos
miabbott[m] has joined #fedora-coreos
<marmijo[m]>
dustymabe: That's good. I see we've had some successful kola openstack jobs.
<marmijo[m]>
bgilbert: It looks like the new release of ignition in rawhide is attempting to install `hv_utils` on ppc64le and s390x and causing the builds for those arches to fail.