<jlebon>
dustymabe: pause, fast-track the kernel and then respin testing and next?
<jlebon>
let's check what else is in that new kernel
<dustymabe>
that's the thought. want to get some confirmation from fifofonix that the new kernel helps first (and also an issue opened we can reference)
<dustymabe>
jlebon: doesn't matter too much, it's already passed tests in testing-devel
<dustymabe>
and we're not shipping directly to `stable`
<jlebon>
ahh ok, wasn't sure if it had reached testing-devel yet
Turnikov has quit [Ping timeout: 252 seconds]
<dustymabe>
jlebon: can you open the PR to pause the rollout
<dustymabe>
I just realized I forgot to update the casc in jenkins the other day so I'm going to do it now while nothing is running
<dustymabe>
marmijo[m]: if you're looking for a challenge ^^
<dustymabe>
fifofonix: when you let us know that new kernel solves the issue and open an issue for it we can proceed with getting a new release spun
<dustymabe>
travier[m]: is next up in the ad-hoc release rotation, but by the time we get it kicked off it will probably be late for him. ravanelli is next in line
<fifofonix>
dustymabe: i want to get to that but practically that may bleed into tomorrow morning.
<dustymabe>
fifofonix: OK - jlebon considering ^^ what do you think is the best course of action?
<marmijo[m]>
dustymabe: 👍️
<travier[m]>
Sorry I'm really busy right now and will be mostly unavailable in the coming days so better to pick someone else
<travier[m]>
kernel bug is bad and would be nice to have the fix / not regress indeed
<travier[m]>
s/is/looks/
<marmijo[m]>
I don't mind doing a release again if needed
<dustymabe>
fifofonix: can you at least post up the dmesg output somewhere where the crash happens (like https://paste.centos.org/) ?
bagasse has joined #fedora-coreos
<fifofonix>
all i have right now are some photos of the console unfortunately.
<jlebon>
the commit reported to fix the first issue fixes a buffer overrun, so presumably random issues might happen. but it's not clear if it's possible for the buggy code to be executed before a CIFS mount is successful
<jlebon>
i think if we can, it'd be good to wait until fifofonix can verify if the issue is fixed with the new kernel
bgilbert has joined #fedora-coreos
<fifofonix>
jlebon: so, the nodes are successfully mounting smb initially. not sure what causes a subsequent crash.
<jlebon>
fifofonix: any way you could log into the node and upload the full dmesg?
<jlebon>
fifofonix: indeed, the crash happens in the "reconnect" path, as if it lost connection. but in the reported issue, it happens when e.g. stat'ing a file in the mount.
<fifofonix>
i reverted all the nodes and am now at the point where i can re-introduce instability. i've upgraded one node and am watching.
<jlebon>
fifofonix: great!
zp has quit [Ping timeout: 268 seconds]
vgoyal has quit [Quit: Leaving]
<guesswhat[m]>
How can I make changes applied via `rpm-ostree override` and `apply-live --allow-replacement` permanent? They're getting reset after reboot. Thanks
bgilbert has quit [Quit: Leaving]
vgoyal has joined #fedora-coreos
<travier[m]>
Thanks Dusty for running!
<jlebon>
guesswhat[m]: it should be rebooting into the new deployment. it's possible finalization is failing, which `rpm-ostree status` should normally tell you about
<jlebon>
guesswhat[m]: one gotcha is if zincati staged a deployment, you'll want to `cleanup -p` first before overriding
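A rough sketch of the sequence jlebon describes, written as a shell function. `redo_override` is a made-up helper name and its arguments are placeholders for whatever override was used originally; only `rpm-ostree status`, `rpm-ostree cleanup -p` and `rpm-ostree override` come from the discussion.

```shell
# Hypothetical helper: drop any pending deployment, then redo the override.
redo_override() {
    # List deployments; a staged (pending) one is shown alongside the booted one.
    rpm-ostree status
    # Remove the pending deployment, e.g. one staged by zincati.
    sudo rpm-ostree cleanup -p
    # Re-apply the override with the caller's arguments (placeholders here).
    sudo rpm-ostree override "$@"
    # Check again: the override should now be part of the pending deployment.
    rpm-ostree status
}
```

It could be called as, say, `redo_override replace ./some-package.rpm`, followed by a reboot into the new deployment.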
<dustymabe>
travier[m]: np :)
<jlebon>
thanks dustymabe!
<jlebon>
was nice to see everything we did get done since last time
vgoyal_ has quit [Remote host closed the connection]
ddubs has quit [Quit: leaving]
vgoyal has joined #fedora-coreos
jpn has quit [Ping timeout: 252 seconds]
<fifofonix>
jlebon: re ^^ i've had to rollback the one node i was using to test the latest testing release. i have too many concurrent changes right now so having to retrench. not sure i'll be able to validate even tomorrow.
<iwanb[m]>
Hi, I'm trying to persist the SSH host keys of a bare metal CoreOS setup through re-provisioning. Is there some recommended way? I tried putting the keys on a separate /var partition and pointing sshd_config to them, but sshd somehow cannot read them (I checked the permissions, SELinux maybe?)
fifofonix has quit [Read error: Connection reset by peer]
daMaestro has joined #fedora-coreos
<bgilbert>
iwanb[m]: SELinux would be my guess, yeah. the separate /var approach sounds reasonable to me
zp has joined #fedora-coreos
zpytela_ has joined #fedora-coreos
<dustymabe>
jlebon: I notice in the tests list there is `iso-offline-install` but no "online" equivalent.. similarly there is `miniso-install` (which is required to be an online install)
<dustymabe>
i almost wonder if we should make "online/offline" another component rather than part of the [0] component name
<walters>
iwanb[m]: One thing I was looking at tangentially related to this is leveraging https://systemd.io/CREDENTIALS/ in Ignition - basically here you could encrypt the host keys as credentials, and then with some support in ignition, decrypt them and reuse them at re-provisioning time
zp has quit [Ping timeout: 252 seconds]
<walters>
(the value here is that you could then consistently git-ops by committing the keys to a git repository safely alongside all the other non-secret configs)
<iwanb[m]>
<bgilbert> "iwanb: SELinux would be my guess..." <- I don't know much about SELinux, any good resource to debug this? I saw a couple open issues on the coreos tracker which seem to suggest it's not really possible to control the policies at the moment
<bgilbert>
walters: that doesn't help iwanb[m]'s problem today though
<bgilbert>
walters: also, I'm not sure storing the host's private key off-host is a good trade
<walters>
yep, correct
<bgilbert>
(i.e. it provides more vectors for a compromise)
<bgilbert>
iwanb[m]: you should see AVC denials in the log if that's the problem. it should(?) just be a matter of setting the correct labels on your files
<iwanb[m]>
The systemd credentials would be cleaner indeed but might not help with sshd not liking an alternate location
<bgilbert>
though it's possible that an autorelabel would set them back? I'm not sure whether you'd need custom policy to avoid that. dustymabe/jlebon?
<bgilbert>
iwanb[m]: walters' point was that, with that feature, you wouldn't need an alternate location - you'd just inject the keys via Ignition
<iwanb[m]>
Ah true
<walters>
Also related, with a https://github.com/containers/bootc/pull/30 model it'd be really easy to add something like --copy-dir /etc/ssh that would preserve the referenced directory across a re-install
<walters>
since we're consistently operating at the filesystem level
<dustymabe>
bgilbert: is he wanting to put the host keys somewhere other than the default location?
<jlebon>
dustymabe: for the smb issue, we couldn't get confirmation from fifofonix, but thinking we should go ahead and respin anyway
<jlebon>
we just did the releases and the fixed kernel is already in the stable repo and in testing-devel
<dustymabe>
jlebon: SGTM - bgilbert might have an opinion, but barring that might as well get the ball rolling. We still need a tracker issue to reference to start the process
<dustymabe>
bgilbert: yeah, you probably just need to fix up the labels on the files if SELinux is the thing that is blocking you
<dustymabe>
a `restorecon` might set it back, but there shouldn't be any process that auto runs that
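One way to check the label situation discussed above, as a read-only sketch. `show_label_state` is a made-up helper name, the key directory is whatever path the keys were moved to, and it assumes AVC denials end up in the journal on the host in question.

```shell
# Hypothetical helper: inspect SELinux state for a directory of host keys.
# Nothing here changes any labels.
show_label_state() {
    key_dir="${1:?usage: show_label_state DIR}"
    # Look for AVC denials logged since boot (assumes they reach the journal).
    sudo journalctl -b --no-pager | grep -i 'avc:.*denied' \
        || echo "no AVC denials found in this boot's journal"
    # Show the labels currently set on the files.
    ls -Z "$key_dir"
    # Dry run: -n changes nothing, -v prints what restorecon would relabel.
    sudo restorecon -n -v -R "$key_dir"
}
```

If the dry run reports that `restorecon` would change the labels, a manual label fix would be undone by any later relabel of that path, which is the caveat raised above.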
<jlebon>
re. iso-offline-install, no strong opinion there, but to me it makes sense that `iso-install` is what `iso-offline-install` currently is, since that's really what users will hit in 99% of cases
<jlebon>
naming them iso-offline-install and iso-online-install doesn't really convey that
<dustymabe>
should be fine to just do a normal promotion
<jlebon>
+1 nice
* dustymabe
switches locations
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 272 seconds]
daMaestro has joined #fedora-coreos
<iwanb[m]>
<bgilbert> "so setting the file labels..." <- Got it working by setting the labels indeed. FYI, one of the reasons it did not work was that I used the "directory" setting in the Ignition file, and that resets the labels