dustymabe changed the topic of #fedora-coreos to: Fedora CoreOS :: Find out more at https://getfedora.org/coreos/ :: Logs at https://libera.irclog.whitequark.org/fedora-coreos
nomenot has quit [Quit: Client closed]
<bgilbert> nomenot: you've said "fails" and "wouldn't let" a couple times. error messages would be very helpful.
mheon has quit [Ping timeout: 248 seconds]
daMaestro has joined #fedora-coreos
bgilbert has quit [Ping timeout: 268 seconds]
daMaestro has quit [Quit: Leaving]
travisghansen has quit [Ping timeout: 260 seconds]
travisghansen has joined #fedora-coreos
jlebon has quit [Quit: leaving]
plarsen has quit [Remote host closed the connection]
paragan has joined #fedora-coreos
Turnikov has joined #fedora-coreos
bgilbert has joined #fedora-coreos
bgilbert has quit [Ping timeout: 272 seconds]
zpytela_ has joined #fedora-coreos
zp has joined #fedora-coreos
zpytela_ has quit [Ping timeout: 260 seconds]
paragan has quit [Quit: Leaving]
zpytela_ has joined #fedora-coreos
zp has quit [Ping timeout: 260 seconds]
saschagrunert has joined #fedora-coreos
mboddu has quit [Quit: ZNC - http://znc.in]
mboddu has joined #fedora-coreos
zp has joined #fedora-coreos
zpytela_ has quit [Ping timeout: 265 seconds]
Betal has quit [Quit: WeeChat 3.8]
zpytela_ has joined #fedora-coreos
zp has quit [Ping timeout: 260 seconds]
apiaseck has joined #fedora-coreos
jpn has joined #fedora-coreos
bagasse has quit [Ping timeout: 252 seconds]
zp has joined #fedora-coreos
zpytela_ has quit [Ping timeout: 260 seconds]
<travier[m]> Hey folks, I need someone to cover for me for the two upcoming OKD meetings as I won't be available.
guesswhat[m] has joined #fedora-coreos
<guesswhat[m]> Hello, question: any particular reason why Podman allocates a larger user namespace in rootful mode than in rootless mode? See https://github.com/containers/podman/issues/16795#issuecomment-1377763863 , it's exactly +43 in terms of namespace size. Thanks
paragan has joined #fedora-coreos
jpn has quit [Ping timeout: 260 seconds]
jpn has joined #fedora-coreos
apiaseck has quit [Remote host closed the connection]
apiaseck has joined #fedora-coreos
bagasse has joined #fedora-coreos
fifofonix has joined #fedora-coreos
bagasse has quit [Read error: Connection reset by peer]
vgoyal has joined #fedora-coreos
Turnikov has quit [Ping timeout: 255 seconds]
Turnikov has joined #fedora-coreos
jpn has quit [Ping timeout: 272 seconds]
jpn has joined #fedora-coreos
Turnikov has quit [Ping timeout: 246 seconds]
jpn has quit [Ping timeout: 260 seconds]
jpn has joined #fedora-coreos
fifofonix has quit [Ping timeout: 260 seconds]
fifofonix has joined #fedora-coreos
jpn has quit [Ping timeout: 268 seconds]
jpn has joined #fedora-coreos
saschagrunert has quit [Remote host closed the connection]
jlebon has joined #fedora-coreos
mheon has joined #fedora-coreos
<dustymabe> guesswhat[m]: not sure - but that issue has the right people who would know so you're in the right place
Turnikov has joined #fedora-coreos
<davdunc[m]> thanks!
<fifofonix> fyi: i'm having to rollback next/testing updates on vsphere because many nodes are crashing post successful boot.
<fifofonix> priority rn is getting back to normal operations but a couple of different types of messages suggest it may be smb-related.
<dustymabe> fifofonix: smb - like SAMBA ?
<fifofonix> yes. a message seen also like 'kernel BUG at mm/slub.c:386!'
Turnikov has quit [Ping timeout: 252 seconds]
Turnikov has joined #fedora-coreos
<dustymabe> fifofonix: can you `sudo rpm-ostree override replace https://bodhi.fedoraproject.org/updates/FEDORA-2023-39b55235fc` and reboot and let me know if that fixes the issue?
<fifofonix> in due course. have a lot of fire fighting first. that link is a 404 for me rn.
<dustymabe> link works for me
<fifofonix> sorry my IRC didn't figure out the trailing character. got it.
<dustymabe> I think this is https://bugzilla.redhat.com/show_bug.cgi?id=2158496 in which case maybe we should pause the rollout - cc jlebon ravanelli bgilbert travier[m]
<jlebon> dustymabe: pause, fast-track the kernel and then respin testing and next?
<jlebon> let's check what else is in that new kernel
<dustymabe> that's the thought. want to get some confirmation from fifofonix that the new kernel helps first (and also an issue opened we can reference)
<dustymabe> jlebon: doesn't matter too much, it's already passed tests in testing-devel
<dustymabe> and we're not shipping directly to `stable`
<jlebon> ahh ok, wasn't sure if it had reached testing-devel yet
Turnikov has quit [Ping timeout: 252 seconds]
<dustymabe> jlebon: can you open the PR to pause the rollout
<dustymabe> I just realized I forgot to update the casc in jenkins the other day so I'm going to do it now while nothing is running
<jlebon> will do, checking something else first
<dustymabe> need one more review on ^^
<jlebon> since it's already in testing-devel, might be simpler to just repromote
<dustymabe> yeah - i guess it depends on what went in since then.
<jlebon> perhaps marmijo[m] or ravanelli can add the missing stamp?
<dustymabe> one more thing to clean up: https://github.com/coreos/fedora-coreos-config/pull/2166
* dustymabe brgb
<dustymabe> brb
nalind has joined #fedora-coreos
plarsen has joined #fedora-coreos
<marmijo[m]> jlebon: just catching up on this. let me know what I can do to help.
<dustymabe> jlebon: looks like https://github.com/coreos/fedora-coreos-tracker/issues/1373 is going to cause us some headache
<dustymabe> marmijo[m]: if you're looking for a challenge ^^
<dustymabe> fifofonix: when you let us know that new kernel solves the issue and open an issue for it we can proceed with getting a new release spun
<dustymabe> travier[m]: is next up in the ad-hoc release rotation, but by the time we get it kicked off it will probably be late for him. ravanelli is next in line
<fifofonix> dustymabe: i want to get to that but practically that may bleed into tomorrow morning.
<dustymabe> fifofonix: OK - jlebon considering ^^ what do you think is the best course of action?
<marmijo[m]> dustymabe: 👍️
<travier[m]> Sorry I'm really busy right now and will be mostly unavailable in the coming days so better to pick someone else
<travier[m]> kernel bug is bad and would be nice to have the fix / not regress indeed
<travier[m]> s/is/looks/
<marmijo[m]> I don't mind doing a release again if needed
<dustymabe> fifofonix: can you at least post up the dmesg output somewhere where the crash happens (like https://paste.centos.org/) ?
bagasse has joined #fedora-coreos
<fifofonix> all i have right now are some photos of the console unfortunately.
<jlebon> photos also work
<dustymabe> ahh - yeah. what we're trying to do is have some confidence that https://bugzilla.redhat.com/show_bug.cgi?id=2158496 is the actual problem
<jlebon> we just want to sanity-check whether the issue you're hitting is the same one reported
<fifofonix> i have several photos with different crash messages.
<fifofonix> an email would be easiest for me if you want to dm me one.
<dustymabe> jlebon: if you look at his email I'll look at your code review :)
<jlebon> dustymabe: wfm :)
<jlebon> fifofonix: sent :)
bagasse_ has joined #fedora-coreos
bagasse has quit [Ping timeout: 260 seconds]
<fifofonix> jlebon: thanks, two emails coming your way.
<jlebon> fifofonix: thanks!
ravanell_ has joined #fedora-coreos
ravanelli has quit [Read error: Connection reset by peer]
<dustymabe> jlebon: really only one comment outstanding: https://github.com/coreos/coreos-assembler/pull/3297/files#r1067182180
paragan has quit [Ping timeout: 260 seconds]
paragan has joined #fedora-coreos
paragan has quit [Remote host closed the connection]
<dustymabe> aaradhak anthr76 davdunc dustymabe gursewak jaimelm jbrooks jcajka jdoss jlebon jmarrero lorbus miabbott nasirhm ravanelli saqali skunkerk walters
<dustymabe> FCOS video community meeting happening today at https://meet.google.com/ado-zjfr-qsj
<dustymabe> If you don't want to be pinged remove your name from this file: https://github.com/coreos/fedora-coreos-tracker/blob/main/meeting-people.txt
<jlebon> fifofonix, dustymabe: looking at the photos, it doesn't appear to be the same issue
giuseppe has joined #fedora-coreos
jpn has quit [Ping timeout: 272 seconds]
<jlebon> the reported issue is about querying files on a CIFS mount. but fifofonix's issue is hit when trying to connect
<jlebon> however
dwalsh has joined #fedora-coreos
<dwalsh> dustymabe, Meeting here or different IRC?
<jlebon> the commit reported to fix the first issue fixes a buffer overrun, so presumably random issues might happen. but it's not clear if it's possible for the buggy code to be executed before a CIFS mount is successful
<jlebon> i think if we can, it'd be good to wait until fifofonix can verify if the issue is fixed with the new kernel
bgilbert has joined #fedora-coreos
<fifofonix> jlebon: so, the nodes are successfully mounting smb initially. not sure what causes a subsequent crash.
<jlebon> fifofonix: any way you could log into the node and upload the full dmesg?
<jlebon> fifofonix: indeed, the crash happens in the "reconnect" path, as if it lost connection. but in the reported issue, it happens when e.g. stat'ing a file in the mount.
<fifofonix> i reverted all the nodes and am now at the point where i can re-introduce instability. i've upgraded one node and am watching.
<jlebon> fifofonix: great!
zp has quit [Ping timeout: 268 seconds]
vgoyal has quit [Quit: Leaving]
<guesswhat[m]> How can I make changes applied via rpm-ostree override and apply-live --allow-replacement permanent? They're getting reset after reboot. Thanks
bgilbert has quit [Quit: Leaving]
vgoyal has joined #fedora-coreos
<travier[m]> Thanks Dusty for running!
<jlebon> guesswhat[m]: it should be rebooting into the new deployment. it's possible finalization is failing, which `rpm-ostree status` should normally tell you about
<jlebon> guesswhat[m]: one gotcha is if zincati staged a deployment, you'll want to `cleanup -p` first before overriding
<dustymabe> travier[m]: np :)
<jlebon> thanks dustymabe!
<jlebon> was nice to see everything we did get done since last time
bgilbert has joined #fedora-coreos
<dustymabe> yeah, I was happy to see that too
<jlebon> fifofonix: btw, instead of doing an override replace, if it's easier you can also grab the vmware image for testing-devel at https://builds.coreos.fedoraproject.org/browser
<guesswhat[m]> jlebon: i am using the example from the fedora coreos docs ( before=zincati.service ), but sometimes it's ok, sometimes not ok..
<jlebon> guesswhat[m]: ahh ok, yeah that should work. we might need the full butane config to help debug.
* jlebon goes for food
<gursewak> dustymabe, will look at it
vgoyal_ has joined #fedora-coreos
vgoyal has quit [Ping timeout: 246 seconds]
jpn has joined #fedora-coreos
vgoyal_ has quit [Ping timeout: 268 seconds]
jpn has quit [Ping timeout: 272 seconds]
ddubs has joined #fedora-coreos
bagasse_ has quit [Ping timeout: 272 seconds]
vgoyal_ has joined #fedora-coreos
jpn has joined #fedora-coreos
vgoyal_ has quit [Remote host closed the connection]
ddubs has quit [Quit: leaving]
vgoyal has joined #fedora-coreos
jpn has quit [Ping timeout: 252 seconds]
<fifofonix> jlebon: re ^^ i've had to rollback the one node i was using to test the latest testing release. i have too many concurrent changes right now so having to retrench. not sure i'll be able to validate even tomorrow.
<guesswhat[m]> jlebon: thanks
<guesswhat[m]> Any idea how to get the containers/podman upstream commit ID from https://copr.fedorainfracloud.org/coprs/rhcontainerbot/podman-next/package/podman/ ? It's unclear to me which commit a package is built from. Thanks
iwanb[m] has joined #fedora-coreos
<iwanb[m]> Hi, I'm trying to persist the SSH host keys of a bare metal CoreOS setup through re-provisioning. Is there some recommended way? I tried putting the keys on a separate /var partition and pointing sshd_config to it, but sshd somehow cannot read them (I checked the permissions, SELinux maybe?)
fifofonix has quit [Read error: Connection reset by peer]
daMaestro has joined #fedora-coreos
<bgilbert> iwanb[m]: SELinux would be my guess, yeah. the separate /var approach sounds reasonable to me
zp has joined #fedora-coreos
zpytela_ has joined #fedora-coreos
<dustymabe> jlebon: I notice in the tests list there is `iso-offline-install` but no "online" equivalent.. similarly there is `miniso-install` (which is required to be an online install)
<dustymabe> i almost wonder if we should make "online/offline" another component rather than part of the [0] component name
<walters> iwanb[m]: One thing I was looking at tangentially related to this is leveraging https://systemd.io/CREDENTIALS/ in Ignition - basically here you could encrypt the host keys as credentials, and then with some support in ignition, decrypt them and reuse them at re-provisioning time
zp has quit [Ping timeout: 252 seconds]
<walters> (the value here is that you could then consistently git-ops by committing the keys to a git repository safely alongside all the other non-secret configs)
<iwanb[m]> <bgilbert> "iwanb: SELinux would be my guess..." <- I don't know much about SELinux, any good resource to debug this? I saw a couple open issues on the coreos tracker which seem to suggest it's not really possible to control the policies at the moment
<bgilbert> walters: that doesn't help iwanb[m]'s problem today though
<bgilbert> walters: also, I'm not sure storing the host's private key off-host is a good trade
<walters> yep, correct
<bgilbert> (i.e. it provides more vectors for a compromise)
<bgilbert> iwanb[m]: you should see AVC denials in the log if that's the problem. it should(?) just be a matter of setting the correct labels on your files
<iwanb[m]> The systemd credentials would be cleaner indeed but might not help with sshd not liking an alternate location
<bgilbert> though it's possible that an autorelabel would set them back? I'm not sure whether you'd need custom policy to avoid that. dustymabe/jlebon?
<bgilbert> iwanb[m]: walters' point was that, with that feature, you wouldn't need an alternate location - you'd just inject the keys via Ignition
<iwanb[m]> Ah true
<walters> Also related, with a https://github.com/containers/bootc/pull/30 model it'd be really easy to add something like --copy-dir /etc/ssh that would preserve the referenced directory across a re-install
<walters> since we're consistently operating at the filesystem level
bagasse has joined #fedora-coreos
<dustymabe> ravanell_: I'm done with the code review on https://github.com/coreos/coreos-assembler/pull/3298 - sorry it took me so long. Let me know if you have questions.
<dustymabe> bgilbert: is he wanting to put the host keys somewhere other than the default location?
<jlebon> dustymabe: for the smb issue, we couldn't get confirmation from fifofonix, but thinking we should go ahead and respin anyway
<jlebon> we just did the releases and the fixed kernel is already in the stable repo and in testing-devel
<dustymabe> jlebon: SGTM - bgilbert might have an opinion, but barring that might as well get the ball rolling. We still need a tracker issue to reference to start the process
<dustymabe> bgilbert: yeah, you probably just need to fix up the labels on the files if SELinux is the thing that is blocking you
<dustymabe> a `restorecon` might set it back, but there shouldn't be any process that auto runs that
<jlebon> re. iso-offline-install, no strong opinion there, but to me it makes sense that `iso-install` is what `iso-offline-install` currently is since that's really what users will hit in 99% of cases
<jlebon> naming them iso-offline-install and iso-online-install doesn't really convey that
<dustymabe> if he wanted to make sure it would survive a restorecon then he would need to set up a file context equivalency: https://danwalsh.livejournal.com/27571.html
<dustymabe> but we don't have semanage on FCOS
<dustymabe> jlebon: but currently there is no `iso-install` or `iso-online-install`
<jlebon> dustymabe: are you talking about pre or post renata's PR?
<jlebon> bgilbert dustymabe: agree, nothing should be restoring labels there automatically. there might be a bug in ignition though
<dustymabe> jlebon: i'm looking at the list in her PR. I don't know if it was there before
<jlebon> it's there now, but indeed it's missing from renata's PR
<jlebon> i'll add a comment there
<dustymabe> ahh OK
<jlebon> i think it's not there because it wasn't in the default set of scenarios we run, and the MVP was to match current defaults
<dustymabe> should there be a difference between the defaults and what is allowed?
<jlebon> i think eventually yes, but i'd rather not scope that in
<dustymabe> which means someone couldn't manually run an iso online install test once her PR merges
<jlebon> i was going to argue for adding it to the default set for now
<dustymabe> ahh +1
<bgilbert> iwanb[m]: see replies ^
<bgilbert> so setting the file labels should be enough
<iwanb[m]> Thanks, I'll try it out
<bgilbert> dustymabe: I'm okay with going ahead and respinning
<dustymabe> bgilbert: +1 - jlebon do you want to open a tracker issue we can reference for the ad-hoc spin or do you want me to?
<jlebon> dustymabe: i can open the ticket and you open the streams issues? :)
<dustymabe> Deal!
<dustymabe> jlebon: did we decide on full promotion versus just doing the kernel?
daMaestro has quit [Quit: Leaving]
<jlebon> dustymabe: maybe let's look at how much went in since and if it's not much, do a full promotion?
<dustymabe> 👍
<jlebon> cool with a cherry-pick too, it's just more work :)
<dustymabe> for f-c-c not much has gone in. that new kernel and one other package
<dustymabe> let me look at cosa
<dustymabe> should be fine to just do a normal promotion
<jlebon> +1 nice
* dustymabe switches locations
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 272 seconds]
daMaestro has joined #fedora-coreos
<iwanb[m]> <bgilbert> "so setting the file labels..." <- Got it working by setting the labels indeed, FYI one of the reasons it did not work was that I used the "directory" setting in the ignition file and that resets the labels
<bgilbert> ahh, okay
heldwin has joined #fedora-coreos
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 246 seconds]
Betal has joined #fedora-coreos
jpn has joined #fedora-coreos
nalind has quit [Quit: bye for now]
jpn has quit [Ping timeout: 246 seconds]
mheon has quit [Ping timeout: 252 seconds]
plarsen has quit [Quit: NullPointerException!]
daMaestro has quit [Quit: Leaving]
dwalsh has quit [Ping timeout: 252 seconds]