dustymabe changed the topic of #fedora-coreos to: Fedora CoreOS :: Find out more at https://getfedora.org/coreos/ :: Logs at https://libera.irclog.whitequark.org/fedora-coreos
ravanelli has joined #fedora-coreos
jpn has joined #fedora-coreos
fifofonix has joined #fedora-coreos
jpn has quit [Ping timeout: 248 seconds]
fifofonix has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
jlebon has quit [Quit: leaving]
bgilbert_ is now known as bgilbert
<bgilbert> dustymabe: started testing-devel. waiting for the config-bot sync before starting next-devel.
jlebon has joined #fedora-coreos
<bgilbert> dustymabe: next-devel is running
jlebon has quit [Quit: leaving]
daMaestro has joined #fedora-coreos
ravanell_ has joined #fedora-coreos
ravanelli has quit [Ping timeout: 252 seconds]
gursewak has quit [Ping timeout: 246 seconds]
gursewak has joined #fedora-coreos
gursewak has quit [Ping timeout: 255 seconds]
ravanell_ has quit [Remote host closed the connection]
<bgilbert> looks like the failures were: an amd64 iso-offline-install SIGKILL (presumed OOM), s390x grub-users failures that I understand, and ppc64le grub-users-fix failures that I don't.
<bgilbert> the latter failed on both builds, and in both cases, the journal shows that the test actually passed (?)
<bgilbert> oh wow, looks like it twice timed out after the test succeeded but before the shutdown finished
daMaestro has quit [Quit: Leaving]
bytehackr has joined #fedora-coreos
<bgilbert> lucab: the new releases will need ^, and maybe a scratch testing-devel/next-devel build first to confirm that the tests are fixed
paragan has joined #fedora-coreos
bgilbert has quit [Ping timeout: 248 seconds]
piwu has quit [Quit: Bye!]
piwu has joined #fedora-coreos
<lucab> bgilbert: merged, and started two builds
jcajka has joined #fedora-coreos
saschagrunert has joined #fedora-coreos
saschagrunert has quit [Remote host closed the connection]
saschagrunert has joined #fedora-coreos
Betal has quit [Quit: WeeChat 3.7.1]
piwu1 has joined #fedora-coreos
piwu has quit [Read error: Connection reset by peer]
piwu1 is now known as piwu
klaas has quit [Quit: ZNC 1.8.2 - https://znc.in]
klaas has joined #fedora-coreos
jpn has joined #fedora-coreos
c4rt0 has joined #fedora-coreos
baaash[m] has quit [Quit: You have been kicked for being idle]
paragan has quit [Ping timeout: 248 seconds]
paragan has joined #fedora-coreos
Arkanterian has joined #fedora-coreos
Arkanterian has quit [*.net *.split]
piwu7 has joined #fedora-coreos
piwu has quit [Ping timeout: 252 seconds]
piwu7 is now known as piwu
klaas_ has joined #fedora-coreos
klaas has quit [Ping timeout: 252 seconds]
crobinso has joined #fedora-coreos
jpn has quit [Ping timeout: 246 seconds]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 255 seconds]
jpn has joined #fedora-coreos
nalind has joined #fedora-coreos
jpn has quit [Ping timeout: 248 seconds]
jpn has joined #fedora-coreos
plarsen has joined #fedora-coreos
<dustymabe> lucab: looks like we need to run `testing` and `stable` again from scratch :(
<dustymabe> that particular failure has been showing up in CI recently and we don't know why. It's intermittent. I asked some of the team to take a look at it earlier this week but no one has had a chance to figure it out yet. I'll open an issue for it here in an hour
piwu9 has joined #fedora-coreos
piwu has quit [Read error: Connection reset by peer]
piwu9 is now known as piwu
vgoyal has joined #fedora-coreos
<lucab> dustymabe: ah, all the artifacts were properly pushed, I was hoping we could proceed from there anyway
jpn has quit [Ping timeout: 252 seconds]
jlebon has joined #fedora-coreos
jpn has joined #fedora-coreos
mheon has joined #fedora-coreos
<lucab> dustymabe: but rerunning those, the new jobs finish immediately with "(no new build)". Should I force them?
jpn has quit [Ping timeout: 252 seconds]
<jlebon> (reading from chat logs) lucab: yeah, you'll need to force it. marmijo is looking at that failure
jpn has joined #fedora-coreos
<dustymabe> jlebon: mind a review on https://github.com/coreos/fedora-coreos-browser/pull/39 ?
<jlebon> stamped and deployed
ravanelli has joined #fedora-coreos
<dustymabe> jlebon: when you have a chance can you circle back on the locking PR: https://github.com/coreos/coreos-assembler/pull/3152 - I think it's pretty close
<jlebon> dustymabe: will do. currently reviewing 3154
<dustymabe> +1
jpn has quit [Ping timeout: 255 seconds]
ravanelli has quit [Remote host closed the connection]
gursewak has joined #fedora-coreos
fifofonix has joined #fedora-coreos
jpn has joined #fedora-coreos
fifofonix has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
fifofonix has joined #fedora-coreos
fifofonix has quit [Client Quit]
fifofonix has joined #fedora-coreos
<dustymabe> lucab: looks like s390x had failures
<dustymabe> stable just looks like a flake in one of the root reprovision tests
<dustymabe> lucab: I replayed them both - new jobs are running
saschagrunert has quit [Remote host closed the connection]
bgilbert has joined #fedora-coreos
piwu3 has joined #fedora-coreos
strigazi has quit [Quit: leaving]
strigazi has joined #fedora-coreos
nalind has quit [Ping timeout: 252 seconds]
piwu has quit [Ping timeout: 252 seconds]
piwu3 is now known as piwu
003AAJNHV has quit [Quit: You have been kicked for being idle]
nalind has joined #fedora-coreos
jcajka has quit [Remote host closed the connection]
<dustymabe> davdunc: you might know what's going on behind the scenes here: https://github.com/coreos/fedora-coreos-tracker/issues/1306#issuecomment-1300781645
<davdunc[m> I'll take a look right now.
<davdunc[m> I really hope they didn't back that out dustymabe
<dustymabe> I'm glad I wrote a test :)
paragan has quit [Quit: Leaving]
marmijo has joined #fedora-coreos
<dustymabe> aaradhak davdunc dustymabe gursewak jaimelm jbrooks jcajka jdoss jlebon jmarrero lorbus miabbott nasirhm ravanelli saqali skunkerk walters
<dustymabe> FCOS community meeting in #fedora-meeting-1
<dustymabe> If you don't want to be pinged remove your name from this file: https://github.com/coreos/fedora-coreos-tracker/blob/main/meeting-people.txt
<davdunc[m> dustymabe I won't make it to the meeting, but you can action me for the investigation on the nvme
<dustymabe> davdunc[m: :) - that's fine - i'm not too worried about it, just noting that it regressed again it seems
<dustymabe> lucab: testing s390x got past tests this time
<dustymabe> stable had another flake :( - when it rains it pours
<dustymabe> replayed it again
<lucab> dustymabe: yes, thanks. I'm watching for the flakes and keeping the tickets updated.
<dustymabe> lucab++
<zodbot> dustymabe: Karma for lucab changed to 2 (for the current release cycle): https://badges.fedoraproject.org/tags/cookie/any
poppajarv7 has joined #fedora-coreos
poppajarv has quit [Ping timeout: 248 seconds]
poppajarv7 is now known as poppajarv
mnguyen has joined #fedora-coreos
poppajarv has quit [Ping timeout: 248 seconds]
<bgilbert> the F36 container image includes openssl-libs and isn't updated yet, so a container rebuild that doesn't dnf upgrade won't have the fix
<bgilbert> I've asked in #fedora-releng about updating the image
poppajarv has joined #fedora-coreos
marmijo has quit [Quit: Client closed]
<lucab> How do we feel about thin-lto everywhere? https://github.com/coreos/zincati/pull/877#issuecomment-1300835957
<lucab> I'm personally fine with that, but I know coreos-installer is more sensitive than zincati et al. on that front.
<lucab> bgilbert jlebon ^^^
<bgilbert> after the second (or third, I don't recall) time that LTO broke us, I concluded that we shouldn't bother with it at all
<bgilbert> if it's not the Rust upstream default, it's more likely to have problems e.g. on non-x86, and LTO problems are annoying to debug
<walters> (Related PSA: be sure to enable `CARGO_INCREMENTAL=1` to have a pleasant Rust edit-compile-debug cycle)
<bgilbert> and all that is in exchange for basically no practical upside. we're not _that_ performance- or size-sensitive
<bgilbert> I'd argue pretty strongly that we have better things to do than chase linker regressions
c4rt0 has quit [Quit: Leaving]
<lucab> no doubt
c4rt0 has joined #fedora-coreos
<lucab> for me it was mostly on the size topic, as the binaries end up in the base OS content
<bgilbert> we're also shipping a lot of Go :-P
c4rt0 has quit [Client Quit]
<lucab> so let's turn LTO fully off on zincati / afterburn / ssh-key-dir? (anything else?)
<jlebon> bgilbert: agreed, though re. your point about upstream, i did see recently: https://www.reddit.com/r/rust/comments/ycmqml/the_rust_compiler_is_now_compiled_with_thin_lto/
<bgilbert> well, if upstream changes the default, I wouldn't argue for turning it back off again
<jlebon> that's not the same commitment as changing the default, but at least there's dogfooding going on even at the compiler level
<bgilbert> +1
<walters> For things where perf really matters, it's not just LTO, you also want PGO and even BOLT, neither of which we're doing (PGO just totally breaks in the koji model)
<walters> For e.g. Firefox they absolutely need cross-language LTO even to make the Rust/C++ transition not terrible
<bgilbert> ssh-key-dir has LTO on? 😱
<bgilbert> without objection, I'd like to PR Afterburn and ssh-key-dir to drop the override
<lucab> we have a blanket "LTO on" for the **release** profile almost everywhere, I'd say
<bgilbert> my grep found those + rpm-ostree and zincati
<lucab> Colin Walters: ostree-ext is currently on thin-lto, do you want to turn that off too?
<walters> thin lto works fine with incremental builds
<walters> I am only battling against `lto = true` by default
<walters> the difference between 1 second and 1 minute is palpable
<lucab> ack, let's stick to only killing "fat" LTO everywhere then
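The "fat" versus thin distinction in this exchange corresponds to the values Cargo accepts for the `lto` key of a build profile. Below is a minimal Python sketch of the kind of audit discussed, assuming each project's `[profile.release]` table has already been parsed into a dict. The project names and settings are hypothetical placeholders, not the actual state of the repositories mentioned.

```python
def classify_lto(value):
    """Map a Cargo `lto` profile value to the kind of LTO it requests."""
    if value is True or value == "fat":
        return "fat"
    if value == "thin":
        return "thin"
    if value == "off":
        return "off"
    if value is False or value is None:
        # Cargo's default when `lto` is unset or false: "thin local" LTO,
        # limited to the crate's own codegen units.
        return "thin-local"
    raise ValueError(f"unrecognized lto value: {value!r}")


def projects_with_fat_lto(release_profiles):
    """Return the sorted names of projects whose release profile asks for fat LTO."""
    return sorted(
        name
        for name, profile in release_profiles.items()
        if classify_lto(profile.get("lto")) == "fat"
    )


# Hypothetical input: project name -> parsed [profile.release] settings.
example_profiles = {
    "project-a": {"lto": True},
    "project-b": {"lto": "thin"},
    "project-c": {},
    "project-d": {"lto": "fat"},
}

fat = projects_with_fat_lto(example_profiles)
print(fat)
```

Under the conclusion reached here, only the projects reported by `projects_with_fat_lto` would need a change; those already on `"thin"` would stay as they are.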
<dustymabe> jlebon: regarding https://github.com/coreos/rpm-ostree/pull/4122 - do we need to consider silverblue and IoT here? i.e. should we try to get that fix promoted into the GA of 37 (meaning FE or Blocker bug)?
<dustymabe> lucab: that s390x stable build is getting close to being done
<walters> (I upgraded my workstation to a new i9-13900k, it absolutely plows through compile jobs; building ostree-ext from scratch takes 23 seconds on this, over 4 minutes on my older i7-8665U laptop...so I now definitely notice a full minute for LTO)
<bgilbert> lucab walters: thanks for bringing this up
<lucab> dustymabe: yay, that's the last straggler. I've pre-checked and ticked all the other boxes, we can jump to release jobs after that.
<dustymabe> lucab: I think so
<jlebon> dustymabe: i'm not sure if it's blocker-level. the policy still does get updated. it just "dirties" system state unnecessarily
<dustymabe> jlebon +1
<jlebon> i also think there's a path towards salvaging those machines so they're back on the canonical policy
poppajarv1 has joined #fedora-coreos
* dustymabe grabbing food
poppajarv has quit [Ping timeout: 272 seconds]
poppajarv1 is now known as poppajarv
bytehackr has quit [Ping timeout: 255 seconds]
<lucab> triple releases almost done, what time do we want to start the rollouts? 20:00 UTC (~2h from now)?
jpn has quit [Ping timeout: 246 seconds]
strigazi has quit [Ping timeout: 252 seconds]
strigazi has joined #fedora-coreos
aaradhak has joined #fedora-coreos
<bgilbert> lucab: sgtm
<lucab> nope, oscontainer pushing broke for testing
<dustymabe> wow SIGKILL 9 (which is usually what we see when the job runs out of memory)
<dustymabe> lucab: should be safe to re-run it (though we can see if stable goes through fine first)
<dustymabe> s/stable/next
<lucab> stable is already done, I started the jobs staggered
<lucab> ah yes, next is ongoing right now
<dustymabe> yeah feel free to go ahead and re-run - the job is pretty well idempotent
<dustymabe> at least the parts that have already run are
HappyMan has quit [Read error: Software caused connection abort]
HappyMan has joined #fedora-coreos
<dustymabe> 👀
<dustymabe> i think my mental model for how https://github.com/coreos/fedora-coreos-streams/pull/587#discussion_r1012185454 works isn't complete
<dustymabe> if I have a testing node on `36.20221014.2.0` which upgrade will I perform?
<dustymabe> cc bgilbert
<dustymabe> am I playing in two lotteries now?
<bgilbert> yes, you are
<bgilbert> so you might upgrade once or twice
<bgilbert> but I think that's the outcome we want
<bgilbert> we shouldn't jump everyone on the old rollout forward, nor should we cancel it
<dustymabe> ok
<dustymabe> my skepticism here is mostly because I don't know the capabilities of the code itself
<bgilbert> it comes down to which nodes in the version graph we deliver to clients
<bgilbert> or edges, rather
<lucab> if both edges are feasible, the client logic in Zincati will pick the target release with the highest age-index (i.e. the newest)
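The selection rule described in this exchange can be sketched in Python. This is only an illustration, not Zincati's actual code: the `Edge` type, its fields, and the target names are hypothetical, and whether an edge is "feasible" is assumed to have been decided beforehand (for example by each rollout's lottery).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Edge:
    """Hypothetical upgrade edge from the current release to a target release."""
    target_version: str
    age_index: int   # higher means a newer release
    feasible: bool   # assumed to be computed elsewhere (e.g. rollout lottery)


def pick_target(edges: Iterable[Edge]) -> Optional[Edge]:
    """Among feasible edges, pick the one whose target has the highest age-index."""
    candidates = [e for e in edges if e.feasible]
    if not candidates:
        return None  # nothing to upgrade to yet
    return max(candidates, key=lambda e: e.age_index)


# A node with two outgoing edges, one per ongoing rollout (placeholder names).
edges = [
    Edge("older-rollout-target", age_index=10, feasible=True),
    Edge("newer-rollout-target", age_index=11, feasible=True),
]

chosen = pick_target(edges)
print(chosen.target_version)
```

If only the older rollout's edge is feasible at first, the node takes that one and may upgrade again once the newer edge becomes feasible, which matches the "once or twice" outcome mentioned above.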
ravanelli has joined #fedora-coreos
ravanelli has quit [Ping timeout: 268 seconds]
jpn has joined #fedora-coreos
aaradhak has quit [Quit: Connection closed for inactivity]
dustymabe has quit [Quit: WeeChat 3.5]
dustymabe has joined #fedora-coreos
HappyMan has quit [Ping timeout: 252 seconds]
HappyMan has joined #fedora-coreos
nalind has quit [Quit: bye for now]
jpn has quit [Ping timeout: 255 seconds]
_whitelogger has joined #fedora-coreos
sayan has quit [Read error: Software caused connection abort]
plarsen has quit [Remote host closed the connection]
sayan has joined #fedora-coreos
jpn has joined #fedora-coreos
vgoyal has quit [Quit: Leaving]
jpn has quit [Ping timeout: 252 seconds]
jpn has joined #fedora-coreos
crobinso has quit [Remote host closed the connection]
dustymabe has quit [Ping timeout: 252 seconds]
dustymabe has joined #fedora-coreos
mheon has quit [Ping timeout: 248 seconds]
jpn has quit [Ping timeout: 248 seconds]
jlebon has quit [Quit: leaving]
jpn has joined #fedora-coreos
shoragan has quit [Read error: Software caused connection abort]