#fedora-coreos on 2023-01-11 — irc logs at libera.irclog.whitequark.org

2022-05-11 12:42 dustymabe changed the topic of #fedora-coreos to: Fedora CoreOS :: Find out more at https://getfedora.org/coreos/ :: Logs at https://libera.irclog.whitequark.org/fedora-coreos

00:07 nomenot has quit [Quit: Client closed]

00:52 <bgilbert> nomenot: you've said "fails" and "wouldn't let" a couple times. error messages would be very helpful.

01:09 mheon has quit [Ping timeout: 248 seconds]

01:15 daMaestro has joined #fedora-coreos

01:36 bgilbert has quit [Ping timeout: 268 seconds]

01:38 daMaestro has quit [Quit: Leaving]

01:53 travisghansen has quit [Ping timeout: 260 seconds]

01:56 travisghansen has joined #fedora-coreos

02:45 jlebon has quit [Quit: leaving]

02:49 plarsen has quit [Remote host closed the connection]

03:29 paragan has joined #fedora-coreos

03:42 Turnikov has joined #fedora-coreos

04:55 bgilbert has joined #fedora-coreos

05:16 bgilbert has quit [Ping timeout: 272 seconds]

06:23 zpytela_ has joined #fedora-coreos

06:29 zp has joined #fedora-coreos

06:32 zpytela_ has quit [Ping timeout: 260 seconds]

07:31 paragan has quit [Quit: Leaving]

08:01 zpytela_ has joined #fedora-coreos

08:03 zp has quit [Ping timeout: 260 seconds]

08:06 saschagrunert has joined #fedora-coreos

08:11 mboddu has quit [Quit: ZNC - http://znc.in]

08:12 mboddu has joined #fedora-coreos

08:12 zp has joined #fedora-coreos

08:14 zpytela_ has quit [Ping timeout: 265 seconds]

08:56 Betal has quit [Quit: WeeChat 3.8]

09:02 zpytela_ has joined #fedora-coreos

09:04 zp has quit [Ping timeout: 260 seconds]

09:13 apiaseck has joined #fedora-coreos

09:25 jpn has joined #fedora-coreos

09:49 bagasse has quit [Ping timeout: 252 seconds]

09:57 zp has joined #fedora-coreos

10:00 zpytela_ has quit [Ping timeout: 260 seconds]

10:13 <travier[m]> Hey folks, I need someone to cover for me for the two upcoming OKD meetings as I won't be available.

10:15 guesswhat[m] has joined #fedora-coreos

10:22 <guesswhat[m]> Hello, question: any particular reason why Podman allocates a larger user namespace in rootful mode than in rootless mode? See https://github.com/containers/podman/issues/16795#issuecomment-1377763863 , it's exactly +43 in terms of namespace size. Thanks

11:12 paragan has joined #fedora-coreos

11:13 jpn has quit [Ping timeout: 260 seconds]

11:26 jpn has joined #fedora-coreos

11:29 apiaseck has quit [Remote host closed the connection]

11:32 apiaseck has joined #fedora-coreos

12:00 bagasse has joined #fedora-coreos

12:11 fifofonix has joined #fedora-coreos

12:22 bagasse has quit [Read error: Connection reset by peer]

12:27 vgoyal has joined #fedora-coreos

12:45 Turnikov has quit [Ping timeout: 255 seconds]

12:49 Turnikov has joined #fedora-coreos

12:50 jpn has quit [Ping timeout: 272 seconds]

12:52 jpn has joined #fedora-coreos

12:57 Turnikov has quit [Ping timeout: 246 seconds]

12:58 jpn has quit [Ping timeout: 260 seconds]

13:18 jpn has joined #fedora-coreos

13:21 fifofonix has quit [Ping timeout: 260 seconds]

13:21 fifofonix has joined #fedora-coreos

13:32 jpn has quit [Ping timeout: 268 seconds]

13:46 jpn has joined #fedora-coreos

14:05 saschagrunert has quit [Remote host closed the connection]

14:16 jlebon has joined #fedora-coreos

14:18 mheon has joined #fedora-coreos

14:22 <dustymabe> guesswhat[m]: not sure - but that issue has the right people who would know so you're in the right place

14:34 <dustymabe> fifofonix: jdoss: davdunc: FYI: https://lists.fedoraproject.org/archives/list/coreos@lists.fedoraproject.org/message/UFC25ERR5I3G5HQZ5QUUCQGZ2VNTB3BI/

14:34 Turnikov has joined #fedora-coreos

14:34 <davdunc[m> thanks!

14:35 <fifofonix> fyi: i'm having to rollback next/testing updates on vsphere because many nodes are crashing post successful boot.

14:35 <fifofonix> priority rn is getting back to normal operations but a couple of different types of messages suggesting it may be smb-connected.

14:36 <dustymabe> fifofonix: smb - like SAMBA ?

14:37 <fifofonix> yes. a message also seen, like 'kernel BUG at mm/slub.c:386!'

14:38 Turnikov has quit [Ping timeout: 252 seconds]

14:40 Turnikov has joined #fedora-coreos

14:42 <dustymabe> fifofonix: can you `sudo rpm-ostree override replace https://bodhi.fedoraproject.org/updates/FEDORA-2023-39b55235fc` and reboot and let me know if that fixes the issue?

14:43 <fifofonix> in due course. have a lot of fire fighting first. that link is a 404 for me rn.

14:44 <dustymabe> link works for me

14:44 <fifofonix> sorry my IRC didn't figure out the trailing character. got it.

14:45 <dustymabe> I think this is https://bugzilla.redhat.com/show_bug.cgi?id=2158496 in which case maybe we should pause the rollout - cc jlebon ravanelli bgilbert travier[m]

14:49 <jlebon> dustymabe: pause, fast-track the kernel and then respin testing and next?

14:50 <jlebon> let's check what else is in that new kernel

14:50 <dustymabe> that's the thought. want to get some confirmation from fifofonix that the new kernel helps first (and also an issue opened we can reference)

14:50 <dustymabe> jlebon: doesn't matter too much, it's already passed tests in testing-devel

14:51 <dustymabe> and we're not shipping directly to `stable`

14:51 <jlebon> ahh ok, wasn't sure if it had reached testing-devel yet

14:51 Turnikov has quit [Ping timeout: 252 seconds]

14:52 <dustymabe> jlebon: can you open the PR to pause the rollout

14:52 <dustymabe> I just realized I forgot to update the casc in jenkins the other day so I'm going to do it now while nothing is running

14:52 <jlebon> will do, checking something else first

14:59 <jlebon> dustymabe: https://github.com/coreos/fedora-coreos-streams/pull/635

15:01 <dustymabe> need one more review on ^^

15:02 <jlebon> since it's already in testing-devel, might be simpler to just repromote

15:03 <dustymabe> yeah - i guess it depends on what went in since then.

15:03 <jlebon> perhaps marmijo[m] or ravanelli can add the missing stamp?

15:03 <dustymabe> one more thing to clean up: https://github.com/coreos/fedora-coreos-config/pull/2166

15:05 * dustymabe brgb

15:05 <dustymabe> brb

15:13 nalind has joined #fedora-coreos

15:23 plarsen has joined #fedora-coreos

15:26 <marmijo[m]> jlebon: just catching up on this. let me know what I can do to help.

15:26 <dustymabe> jlebon: looks like https://github.com/coreos/fedora-coreos-tracker/issues/1373 is going to cause us some headache

15:27 <dustymabe> marmijo[m]: if you're looking for a challenge ^^

15:28 <dustymabe> fifofonix: when you let us know that new kernel solves the issue and open an issue for it we can proceed with getting a new release spun

15:30 <dustymabe> travier[m]: is next up in the ad-hoc release rotation, but by the time we get it kicked off it will probably be late for him. ravanelli is next in line

15:30 <fifofonix> dustymabe: i want to get to that but practically that may bleed into tomorrow morning.

15:31 <dustymabe> fifofonix: OK - jlebon considering ^^ what do you think is the best course of action?

15:34 <marmijo[m]> dustymabe: 👍️

15:35 <travier[m]> Sorry I'm really busy right now and will be mostly unavailable in the coming days so better to pick someone else

15:36 <travier[m]> kernel bug is bad and would be nice to have the fix / not regress indeed

15:36 <travier[m]> s/is/looks/

15:37 <marmijo[m]> I don't mind doing a release again if needed

15:45 <dustymabe> fifofonix: can you at least post up the dmesg output somewhere where the crash happens (like https://paste.centos.org/) ?

15:52 bagasse has joined #fedora-coreos

15:52 <fifofonix> all i have right now are some photos of the console unfortunately.

15:52 <jlebon> photos also work

15:53 <dustymabe> ahh - yeah. what we're trying to do is have some confidence that https://bugzilla.redhat.com/show_bug.cgi?id=2158496 is the actual problem

15:53 <jlebon> we just want to sanity-check whether the issue you're hitting is the same one reported

15:53 <fifofonix> i have several photos with different crash messages.

15:53 <fifofonix> an email would be easiest for me if you want to dm me one.

15:54 <dustymabe> jlebon: if you look at his email I'll look at your code review :)

15:54 <jlebon> dustymabe: wfm :)

15:55 <jlebon> fifofonix: sent :)

15:56 bagasse_ has joined #fedora-coreos

15:59 bagasse has quit [Ping timeout: 260 seconds]

16:00 <fifofonix> jlebon: thanks, two emails coming your way.

16:00 <jlebon> fifofonix: thanks!

16:03 ravanell_ has joined #fedora-coreos

16:04 ravanelli has quit [Read error: Connection reset by peer]

16:11 <dustymabe> jlebon: really only one comment outstanding: https://github.com/coreos/coreos-assembler/pull/3297/files#r1067182180

16:17 paragan has quit [Ping timeout: 260 seconds]

16:21 paragan has joined #fedora-coreos

16:22 paragan has quit [Remote host closed the connection]

16:30 <dustymabe> aaradhak anthr76 davdunc dustymabe gursewak jaimelm jbrooks jcajka jdoss jlebon jmarrero lorbus miabbott nasirhm ravanelli saqali skunkerk walters

16:30 <dustymabe> FCOS video community meeting happening today at https://meet.google.com/ado-zjfr-qsj

16:30 <dustymabe> If you don't want to be pinged remove your name from this file: https://github.com/coreos/fedora-coreos-tracker/blob/main/meeting-people.txt

16:30 <jlebon> fifofonix, dustymabe: looking at the photos, it doesn't appear to be the same issue

16:30 giuseppe has joined #fedora-coreos

16:30 jpn has quit [Ping timeout: 272 seconds]

16:31 <jlebon> the reported issue is about querying files on a CIFS mount. but fifofonix's issue is hit when trying to connect

16:31 <jlebon> however

16:32 dwalsh has joined #fedora-coreos

16:32 <dwalsh> dustymabe, Meeting here or different IRC?

16:33 <dustymabe> https://meet.google.com/ado-zjfr-qsj

16:33 <jlebon> the commit reported to fix the first issue fixes a buffer overrun, so presumably random issues might happen. but it's not clear if it's possible for the buggy code to be executed before a CIFS mount is successful

16:34 <jlebon> i think if we can, it'd be good to wait until fifofonix can verify if the issue is fixed with the new kernel

16:34 bgilbert has joined #fedora-coreos

16:35 <fifofonix> jlebon: so, the nodes are successfully mounting smb initially. not sure what causes a subsequent crash.

16:36 <jlebon> fifofonix: any way you could log into the node and upload the full dmesg?

16:37 <jlebon> fifofonix: indeed, the crash happens in the "reconnect" path, as if it lost connection. but in the reported issue, it happens when e.g. stat'ing a file in the mount.

16:37 <fifofonix> i reverted all the nodes and am now at the point where i can re-introduce instability. i've upgraded one node and am watching.

16:38 <jlebon> fifofonix: great!
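
One possible way to collect the kernel log jlebon asks for, as a minimal sketch. The `run` helper and the `collect_kernel_logs` function name are illustrative assumptions, not anything from this channel; the sketch only prints the commands unless `DRY_RUN=0` is set, and `journalctl -k -b -1` only helps if the journal from the crashed boot was persisted.

```shell
#!/bin/sh
# Sketch only: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# collect_kernel_logs: hypothetical helper name.
# With DRY_RUN=0, run as root and redirect the output to a file to share.
collect_kernel_logs() {
    run dmesg                              # kernel ring buffer of the current boot
    run journalctl -k -b -1 --no-pager     # kernel messages from the previous boot, if persisted
    run uname -r                           # kernel version, to match against the bug report
}
```

Usage, for example: `DRY_RUN=0 collect_kernel_logs > kernel-logs.txt` on the affected node.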

16:53 zp has quit [Ping timeout: 268 seconds]

17:13 vgoyal has quit [Quit: Leaving]

17:18 <guesswhat[m]> How can I make changes applied via rpm-ostree override and apply-live --allow-replacement permanent? They're getting reset after reboot. Thanks

17:31 bgilbert has quit [Quit: Leaving]

17:40 vgoyal has joined #fedora-coreos

17:49 <travier[m]> Thanks Dusty for running!

17:49 <jlebon> guesswhat[m]: it should be rebooting into the new deployment. it's possible finalization is failing, which `rpm-ostree status` should normally tell you about

17:51 <jlebon> guesswhat[m]: one gotcha is if zincati staged a deployment, you'll want to `cleanup -p` first before overriding
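
The order jlebon describes (check status, drop any staged deployment, then override) could be sketched roughly as follows. The `run` helper and the `override_after_cleanup` function name are illustrative assumptions; the sketch only prints the commands unless `DRY_RUN=0` is set, and the URL argument stands for whatever you would pass to `rpm-ostree override replace`.

```shell
#!/bin/sh
# Sketch only: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# override_after_cleanup URL: hypothetical helper name.
# With DRY_RUN=0 this must run as root and ends by rebooting the node.
override_after_cleanup() {
    run rpm-ostree status                  # look for a staged deployment or a finalization error
    run rpm-ostree cleanup -p              # drop the pending deployment, e.g. one staged by zincati
    run rpm-ostree override replace "$1"   # stage a new deployment with the replacement
    run systemctl reboot                   # boot into the new deployment
}
```

Calling `override_after_cleanup <url>` with the default dry-run setting just lists the four steps, which is a safe way to review them before running anything on a real node.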

17:52 <dustymabe> travier[m]: np :)

17:53 <jlebon> thanks dustymabe!

17:53 <jlebon> was nice to see everything we did get done since last time

17:54 bgilbert has joined #fedora-coreos

17:54 <dustymabe> yeah, I was happy to see that too

17:56 <jlebon> fifofonix: btw, instead of doing an override replace, if it's easier you can also grab the vmware image for testing-devel at https://builds.coreos.fedoraproject.org/browser

17:57 <guesswhat[m]> jlebon: i am using the example from the fedora coreos docs ( before=zincati.service ), but sometimes it's ok, sometimes not ok..

17:59 <jlebon> guesswhat[m]: ahh ok, yeah that should work. we might need the full butane config to help debug.

17:59 * jlebon goes for food

18:03 <dustymabe> gursewak: mind looking at the failure on rawhide in https://jenkins-fedora-coreos-pipeline.apps.ocp.fedoraproject.org/blue/organizations/jenkins/build/detail/build/783/pipeline ?

18:24 <gursewak> dustymabe, will look at it

18:36 vgoyal_ has joined #fedora-coreos

18:38 vgoyal has quit [Ping timeout: 246 seconds]

18:58 jpn has joined #fedora-coreos

18:59 vgoyal_ has quit [Ping timeout: 268 seconds]

19:04 jpn has quit [Ping timeout: 272 seconds]

19:05 ddubs has joined #fedora-coreos

19:05 bagasse_ has quit [Ping timeout: 272 seconds]

19:07 vgoyal_ has joined #fedora-coreos

19:10 jpn has joined #fedora-coreos

19:20 vgoyal_ has quit [Remote host closed the connection]

19:22 ddubs has quit [Quit: leaving]

19:37 vgoyal has joined #fedora-coreos

19:41 jpn has quit [Ping timeout: 252 seconds]

20:18 <fifofonix> jlebon: re ^^ i've had to rollback the one node i was using to test the latest testing release. i have too many concurrent changes right now so having to retrench. not sure i'll be able to validate even tomorrow.

20:26 <guesswhat[m]> jlebon: thanks

20:28 <guesswhat[m]> Any idea how to get the containers/podman upstream commit ID from https://copr.fedorainfracloud.org/coprs/rhcontainerbot/podman-next/package/podman/ ? It's unclear to me which commit a package is built from. Thanks

20:28 iwanb[m] has joined #fedora-coreos

20:29 <iwanb[m]> Hi, I'm trying to persist the SSH host keys of a bare metal CoreOS setup through re-provisioning. Is there some recommended way? I tried putting the keys on a separate /var partition and point sshd_config to it, but sshd somehow cannot read it (I checked the permissions, SELinux maybe?)

20:29 fifofonix has quit [Read error: Connection reset by peer]

20:35 daMaestro has joined #fedora-coreos

20:35 <bgilbert> iwanb[m]: SELinux would be my guess, yeah. the separate /var approach sounds reasonable to me

20:37 zp has joined #fedora-coreos

20:39 zpytela_ has joined #fedora-coreos

20:39 <dustymabe> jlebon: I notice in the tests list there is `iso-offline-install` but no "online" equivalent.. similarly there is `miniso-install` (which is required to be an online install)

20:40 <dustymabe> i almost wonder if we should make "online/offline" another component rather than part of the [0] component name

20:41 <walters> iwanb[m]: One thing I was looking at tangentially related to this is leveraging https://systemd.io/CREDENTIALS/ in Ignition - basically here you could encrypt the host keys as credentials, and then with some support in ignition, decrypt them and reuse them at re-provisioning time

20:41 zp has quit [Ping timeout: 252 seconds]

20:42 <walters> (the value here is that you could then consistently git-ops by committing the keys to a git repository safely alongside all the other non-secret configs)

20:43 <iwanb[m]> <bgilbert> "iwanb: SELinux would be my guess..." <- I don't know much about SELinux, any good resource to debug this? I saw a couple open issues on the coreos tracker which seem to suggest it's not really possible to control the policies at the moment

20:44 <bgilbert> walters: that doesn't help iwanb[m]'s problem today though

20:44 <bgilbert> walters: also, I'm not sure storing the host's private key off-host is a good trade

20:44 <walters> yep, correct

20:45 <bgilbert> (i.e. it provides more vectors for a compromise)

20:46 <bgilbert> iwanb[m]: you should see AVC denials in the log if that's the problem. it should(?) just be a matter of setting the correct labels on your files

20:46 <iwanb[m]> The systemd credentials would be cleaner indeed but might not help with sshd not liking an alternate location

20:46 <bgilbert> though it's possible that an autorelabel would set them back? I'm not sure whether you'd need custom policy to avoid that. dustymabe/jlebon?

20:47 <bgilbert> iwanb[m]: walters' point was that, with that feature, you wouldn't need an alternate location - you'd just inject the keys via Ignition

20:48 <iwanb[m]> Ah true

20:48 <walters> Also related, with a https://github.com/containers/bootc/pull/30 model it'd be really easy to add something like --copy-dir /etc/ssh that would preserve the referenced directory across a re-install

20:48 <walters> since we're consistently operating at the filesystem level

20:52 bagasse has joined #fedora-coreos

20:52 <dustymabe> ravanell_: I'm done with the code review on https://github.com/coreos/coreos-assembler/pull/3298 - sorry it took me so long. Let me know if you have questions.

20:52 <dustymabe> bgilbert: is he wanting to put the host keys somewhere other than the default location?

20:53 <jlebon> dustymabe: for the smb issue, we couldn't get confirmation from fifofonix, but thinking we should go ahead and respin anyway

20:54 <jlebon> we just did the releases and the fixed kernel is already in the stable repo and in testing-devel

20:54 <dustymabe> jlebon: SGTM - bgilbert might have an opinion, but barring that might as well get the ball rolling. We still need a tracker issue to reference to start the process

20:55 <dustymabe> bgilbert: yeah, you probably just need to fix up the labels on the files if SELinux is the thing that is blocking you

20:55 <dustymabe> a `restorecon` might set it back, but there shouldn't be any process that auto runs that

20:56 <jlebon> re. iso-offline-install, no strong opinion there, but to me it makes sense that `iso-install` is what `iso-offline-install` currently is since it's really what users will hit in 99% of cases

20:56 <jlebon> naming them iso-offline-install and iso-online-install doesn't really convey that

20:56 <dustymabe> if he wanted to make sure it would survive a restorecon then he would need to set up a file context equivalency: https://danwalsh.livejournal.com/27571.html

20:56 <dustymabe> but we don't have semanage on FCOS

20:57 <dustymabe> jlebon: but currently there is no `iso-install` or `iso-online-install`

20:58 <jlebon> dustymabe: are you talking about pre or post renata's PR?

20:59 <jlebon> bgilbert dustymabe: agree, nothing should be restoring labels there automatically. there might be a bug in ignition though

20:59 <dustymabe> jlebon: i'm looking at the list in her PR. I don't know if it was there before

21:00 <jlebon> it's there now, but indeed it's missing from renata's PR

21:00 <jlebon> i'll add a comment there

21:01 <dustymabe> ahh OK

21:01 <jlebon> i think it's not there because it wasn't in the default set of scenarios we run, and the MVP was to match current defaults

21:01 <dustymabe> should there be a difference in the defaults versus what is allowed?

21:01 <jlebon> i think eventually yes, but i'd rather not scope that in

21:02 <dustymabe> which means someone couldn't manually run an iso online install test once her PR merges

21:02 <jlebon> i was going to argue for adding it to the default set for now

21:02 <dustymabe> ahh +1

21:09 <bgilbert> iwanb[m]: see replies ^

21:09 <bgilbert> so setting the file labels should be enough

21:09 <iwanb[m]> Thanks, I'll try it out
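
A rough sketch of the label check and fix discussed above, under several assumptions: the `run` helper and `relabel_host_key` name are illustrative, the key path passed in is hypothetical, and `sshd_key_t` is assumed to be the type the policy gives to the private host keys under /etc/ssh (compare with `ls -Z /etc/ssh` on the node before relying on it). As dustymabe notes, a label set this way would not survive a `restorecon`. The sketch only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# relabel_host_key FILE: hypothetical helper name.
# FILE is a private host key in the alternate location on the /var partition.
relabel_host_key() {
    run journalctl -b --no-pager --grep 'avc:.*denied'   # confirm sshd is hitting AVC denials
    run ls -Z "$1"                                       # current label of the key file
    run chcon -t sshd_key_t "$1"                         # assumed type; check against /etc/ssh
    run ls -Z "$1"                                       # verify the new label
}
```

Usage, for example: `relabel_host_key /var/lib/ssh-host-keys/ssh_host_ed25519_key`, where the path is only a placeholder for wherever the keys were placed.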

21:10 <dustymabe> jlebon: finally got back to https://github.com/coreos/fedora-coreos-releng-automation/pull/165#discussion_r1065131380

21:10 <bgilbert> dustymabe: I'm okay with going ahead and respinning

21:11 <dustymabe> bgilbert: +1 - jlebon do you want to open a tracker issue we can reference for the ad-hoc spin or do you want me to?

21:11 <jlebon> dustymabe: i can open the ticket and you open the streams issues? :)

21:12 <dustymabe> Deal!

21:14 <dustymabe> jlebon: did we decide on full promotion versus just doing the kernel?

21:16 daMaestro has quit [Quit: Leaving]

21:16 <jlebon> dustymabe: maybe let's look at how much went in since and if it's not much, do a full promotion?

21:17 <dustymabe> 👍

21:17 <jlebon> cool with a cherry-pick too, it's just more work :)

21:20 <dustymabe> for f-c-c not much has gone in. that new kernel and one other package

21:21 <dustymabe> let me look at cosa

21:21 <jlebon> filed https://github.com/coreos/fedora-coreos-tracker/issues/1379

21:24 <dustymabe> should be fine to just do a normal promotion

21:27 <jlebon> +1 nice

21:36 * dustymabe switches locations

21:37 jpn has joined #fedora-coreos

21:43 jpn has quit [Ping timeout: 272 seconds]

21:43 daMaestro has joined #fedora-coreos

22:01 <iwanb[m]> <bgilbert> "so setting the file labels..." <- Got it working by setting the labels indeed, FYI one of the reasons it did not work was that I used the "directory" setting in the ignition file and that resets the labels

22:02 <bgilbert> ahh, okay

22:18 heldwin has joined #fedora-coreos

22:32 jpn has joined #fedora-coreos

22:37 jpn has quit [Ping timeout: 246 seconds]

22:56 Betal has joined #fedora-coreos

23:26 jpn has joined #fedora-coreos

23:29 nalind has quit [Quit: bye for now]

23:30 jpn has quit [Ping timeout: 246 seconds]

23:30 mheon has quit [Ping timeout: 252 seconds]

23:40 plarsen has quit [Quit: NullPointerException!]

23:42 daMaestro has quit [Quit: Leaving]

23:44 dwalsh has quit [Ping timeout: 252 seconds]