#fedora-coreos on 2023-06-28 — irc logs at libera.irclog.whitequark.org

2022-05-11 12:42 dustymabe changed the topic of #fedora-coreos to: Fedora CoreOS :: Find out more at https://getfedora.org/coreos/ :: Logs at https://libera.irclog.whitequark.org/fedora-coreos

00:39 gregshomo[m] has joined #fedora-coreos

00:41 <gregshomo[m]> hello, world ! i've been unable to get the c9s-oscore container to boot (bootc switch, as per the bootc repo). after typing in the LUKS passphrase, well, that's all i get to do. i've had no troubles with the fedora-coreos container, so expectations are high ;) has anyone else experienced this ?

00:42 <gregshomo[m]> i suspect context may be important here, so i am working from F38 Sericea with the COPR for bootc layered in.

00:43 <gregshomo[m]> apologies if this is not the right place for bootc questions. just figured i'd ask

00:46 spresti has joined #fedora-coreos

00:55 spresti has quit []

00:57 spresti has joined #fedora-coreos

01:10 spresti has quit [Remote host closed the connection]

01:14 vgoyal has quit [Quit: Leaving]

01:14 spresti has joined #fedora-coreos

01:19 spresti has quit [Ping timeout: 258 seconds]

01:39 <bgilbert> gregshomo[m]: this is the right place, I think

01:39 <bgilbert> I can't answer your question though :-P

01:42 <gregshomo[m]> bgilbert: fair enough. i'll test on a fresh install without LUKS tomorrow and see if that is the issue. in my perfect world, the fact that the fedora-coreos bits work is all i need to know -- but sometimes it's somebody else's ideal world.

01:55 Guidon has quit [Server closed connection]

01:56 Guidon has joined #fedora-coreos

01:58 gursewak_ has joined #fedora-coreos

02:00 gursewak has quit [Ping timeout: 264 seconds]

02:00 spresti has joined #fedora-coreos

02:05 spresti has quit [Ping timeout: 246 seconds]

02:12 gursewak_ has quit [Ping timeout: 246 seconds]

02:43 gursewak_ has joined #fedora-coreos

02:48 spresti has joined #fedora-coreos

02:56 spresti has quit [Ping timeout: 260 seconds]

04:13 spresti has joined #fedora-coreos

04:18 spresti has quit [Ping timeout: 260 seconds]

04:28 gursewak_ has quit [Ping timeout: 240 seconds]

04:30 eballetbo has joined #fedora-coreos

04:37 daMaestro has quit [Quit: Leaving]

05:36 jpn has joined #fedora-coreos

05:40 sentenza has quit [Remote host closed the connection]

05:45 jcajka has joined #fedora-coreos

06:02 gursewak_ has joined #fedora-coreos

06:08 jpn has quit [Ping timeout: 240 seconds]

06:13 jpn has joined #fedora-coreos

06:52 saschagrunert has joined #fedora-coreos

07:06 jpn has quit [Ping timeout: 252 seconds]

07:10 jpn has joined #fedora-coreos

07:16 jpn has quit [Ping timeout: 260 seconds]

07:21 jpn has joined #fedora-coreos

07:26 jpn has quit [Ping timeout: 240 seconds]

07:30 jpn has joined #fedora-coreos

07:42 jpn has quit [Ping timeout: 260 seconds]

07:46 jpn has joined #fedora-coreos

07:50 jpn has quit [Ping timeout: 260 seconds]

07:53 apiaseck has joined #fedora-coreos

08:00 jpn has joined #fedora-coreos

08:05 jpn has quit [Ping timeout: 260 seconds]

08:42 daMaestro has joined #fedora-coreos

08:43 Betal has quit [Quit: WeeChat 4.0.0]

08:54 jpn has joined #fedora-coreos

08:59 jpn has quit [Ping timeout: 240 seconds]

09:48 jpn has joined #fedora-coreos

09:53 jpn has quit [Ping timeout: 264 seconds]

10:51 thomasfedb_ has left #fedora-coreos [#fedora-coreos]

11:16 spresti has joined #fedora-coreos

11:17 spresti has quit [Remote host closed the connection]

11:17 spresti has joined #fedora-coreos

11:34 vgoyal has joined #fedora-coreos

11:35 <gregshomo[m]> ok, i've just done that test and got this error:

11:35 <gregshomo[m]> error: ../../grub-core/kernel/efi/sb.c:102:bad shim signature.

11:35 <gregshomo[m]> error: ../../grub-core/loader/i386/efi/linux.c:258:you need to load the kernel first.

11:35 <gregshomo[m]> i suspect LUKS was hiding this from me last night.

11:58 <travier[m]> There are known issues with c9s & secure boot

12:06 jpn has joined #fedora-coreos

12:09 <gregshomo[m]> ok, last bit of data from me about this today. i was unable to 'bootc switch' to either of the images mentioned in the bootc-demo-images repository, but am able to 'bootc switch' to quay.io/fedora/fedora-coreos:stable

12:09 <gregshomo[m]> i'll drop an issue in that repo later today when i get the chance

12:09 <gregshomo[m]> thanks, all for the pointers !

12:13 jpn has quit [Ping timeout: 246 seconds]

12:18 baude has quit []

12:21 baude has joined #fedora-coreos

12:24 sgrunert has joined #fedora-coreos

12:26 saschagrunert has quit [Ping timeout: 250 seconds]

12:28 daMaestro has quit [Quit: Leaving]

12:43 vgoyal has quit [Remote host closed the connection]

12:46 sgrunert has quit [Remote host closed the connection]

13:01 jpn has joined #fedora-coreos

13:06 jpn has quit [Ping timeout: 250 seconds]

13:21 plarsen has joined #fedora-coreos

13:26 nalind has joined #fedora-coreos

13:43 plarsen has quit [Ping timeout: 240 seconds]

13:58 plarsen has joined #fedora-coreos

14:14 <baude> heya dustymabe got your systemd brain on?

14:15 <baude> ... and fcos

14:15 <baude> heya bgilbert long time no see

14:16 <bgilbert> baude: 👋

14:16 vrothberg has joined #fedora-coreos

14:17 <bgilbert> baude: any chance of a libhvee release with those changes?

14:18 <bgilbert> Dependabot tends not to bump dependencies when go.mod is pinned to a commit

14:19 <baude> as you wish ...

14:19 <bgilbert> ty

14:25 ravanell_ has quit [Remote host closed the connection]

14:26 <dustymabe> baude: 👋

14:26 <dustymabe> systemd is a part of my brain now

14:26 <dustymabe> systemd-braind

14:26 <dustymabe> i'm engaged

14:26 ravanelli has joined #fedora-coreos

14:26 <baude> dustymabe, vrothberg and i are looking at this boot issue we have with podman machine and fcos on super fast hw like the M2 Pros

14:27 <baude> we have a systemd unit that sends an ACK over a socket/vsock/etc from the guest to host when it is "booted". we call this a ready socket

14:28 <baude> once the ACK is sent to the host, the host immediately starts interacting with the guest via ssh ... and the ssh auth fails
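A minimal host-side sketch of the ready-socket handshake described above, in Go (the language podman is written in). Assumptions: the guest writes the single line `Ready` to a Unix socket, and the socket lives at a temporary placeholder path; podman's real transport (virtio port, vsock, etc.) and payload may differ. What it illustrates is that receiving the ACK only proves the guest-side unit ran, not that sshd and PAM will accept a login yet.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// waitForReady accepts one connection on ln and returns nil once the peer
// sends the line "Ready" (an assumed payload), or an error if nothing valid
// arrives before the timeout.
func waitForReady(ln *net.UnixListener, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	if err := ln.SetDeadline(deadline); err != nil {
		return err
	}
	conn, err := ln.Accept()
	if err != nil {
		return fmt.Errorf("waiting for guest: %w", err)
	}
	defer conn.Close()
	if err := conn.SetReadDeadline(deadline); err != nil {
		return err
	}
	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return fmt.Errorf("reading ready message: %w", err)
	}
	if strings.TrimSpace(line) != "Ready" {
		return fmt.Errorf("unexpected ready payload %q", line)
	}
	return nil
}

// demoReady plays both sides locally: a goroutine stands in for the guest's
// ready unit and writes "Ready\n" to a Unix socket in a temp directory.
func demoReady() error {
	dir, err := os.MkdirTemp("", "ready")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	addr := &net.UnixAddr{Name: filepath.Join(dir, "ready.sock"), Net: "unix"}
	ln, err := net.ListenUnix("unix", addr)
	if err != nil {
		return err
	}
	defer ln.Close()
	go func() {
		conn, err := net.Dial("unix", addr.Name)
		if err != nil {
			return // waitForReady will time out
		}
		defer conn.Close()
		fmt.Fprintln(conn, "Ready")
	}()
	return waitForReady(ln, 2*time.Second)
}

func main() {
	if err := demoReady(); err != nil {
		fmt.Println("not ready:", err)
		return
	}
	// Ready only means the unit ran; SSH logins may still be refused.
	fmt.Println("guest signalled ready")
}
```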

14:29 <baude> at first we thought maybe sshd was not running, but today vrothberg found some logging which suggested PAM was at play as well

14:29 <baude> stopping here, am i making sense up to here?

14:30 <vrothberg> Yes, we see the following two logs in sshd:

14:30 <vrothberg> error: kex_exchange_identification: Connection closed by remote host

14:30 <vrothberg> fatal: Access denied for user core by PAM account configuration [preauth]

14:30 <vrothberg> 👋 dustymabe :)

14:31 <dustymabe> I mean I've definitely seen sometimes where trying to SSH in to a machine in early boot gives me a `Connection closed by remote host`. But pretty much always the next try works.

14:32 <dustymabe> Does it not ever succeed?

14:32 <baude> the error causes podman to quit

14:32 <dustymabe> podman-machine on the host?

14:33 <baude> we don't do a backoff or anything, and we could, but we want to try to diagnose and fix the problem

14:33 <baude> yes

14:33 <teuf> I'm also seeing this sometimes if I ssh too early

14:33 <teuf> kex_exchange_identification: Connection closed by remote host

14:33 <teuf> Connection closed by 127.0.0.1 port 2222

14:34 <vrothberg> teuf: yes, that looks like the same symptom.

14:34 <teuf> but yeah next ssh works, and since I'm ssh'ing manually, it's no big deal for me

14:34 <baude> it makes me think we are sending the ACK too early? and maybe a "later" systemd service needs to be selected

14:35 <dustymabe> baude: yeah - are you ordering it after sshd is up?

14:36 <baude> vrothberg, can you fpaste the unit file we generate?

14:37 <vrothberg> https://github.com/containers/podman/blob/main/pkg/machine/qemu/machine.go#L334

14:39 <dustymabe> 👀

14:49 jpn has joined #fedora-coreos

14:54 jpn has quit [Ping timeout: 240 seconds]

14:57 <dustymabe> vrothberg: baude let me see if I can come up with a better unit for you to `After=`

14:57 <baude> dustymabe, tyvm ....

14:58 <vrothberg> Thanks for your time, dustymabe. FWIW, I tried with After=network-online.target but that didn't change things.

15:03 <dustymabe> baude: vrothberg: maybe systemd-user-sessions.service

15:04 <dustymabe> honestly I would think the `After=sshd.service` that you have would work

15:04 <dustymabe> we have some code that monitors logs to evaluate when we can SSH in and it just checks to see if sshd is started

15:05 <dustymabe> https://github.com/coreos/coreos-assembler/blob/main/mantle/cmd/kola/devshell.go#L515

15:05 <dustymabe> https://github.com/coreos/coreos-assembler/blob/28bf72f49e10699a6b3cb1aba21a3b5706289abb/mantle/cmd/kola/devshell.go#L566-L568

15:06 <dustymabe> https://github.com/coreos/coreos-assembler/blob/28bf72f49e10699a6b3cb1aba21a3b5706289abb/mantle/cmd/kola/devshell.go#L208-L212

15:06 <dustymabe> https://github.com/coreos/coreos-assembler/blob/28bf72f49e10699a6b3cb1aba21a3b5706289abb/mantle/cmd/kola/devshell.go#L367-L368

15:09 <dustymabe> this is also an important bit from the systemd.unit man page about `After=`:

15:09 <dustymabe> It depends on the unit type when precisely a unit has finished starting up. Most importantly, for service units start-up is considered completed for the purpose of Before=/After= when all its configured start-up commands have been invoked and they either failed or reported start-up success. Note that this does include ExecStartPost= (or ExecStopPost= for the shutdown case).

15:10 <dustymabe> though sshd seems to be `Type=notify` so I think it should block
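For reference, `Type=notify` means the service itself reports readiness by sending `READY=1` as a datagram to the Unix socket named in `$NOTIFY_SOCKET` (the sd_notify protocol); only then does systemd consider it started for `After=` ordering. A minimal hand-rolled Go sketch of the sending side, with a temporary datagram socket standing in for systemd. Real daemons normally use libsystemd's `sd_notify()` rather than code like this.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// notifyReady sends "READY=1" to the datagram socket named by $NOTIFY_SOCKET,
// which is how a Type=notify service tells systemd that start-up finished.
// It returns false if NOTIFY_SOCKET is unset (not started by systemd).
func notifyReady() (bool, error) {
	path := os.Getenv("NOTIFY_SOCKET")
	if path == "" {
		return false, nil
	}
	// On Linux, Go maps a leading '@' to the abstract socket namespace.
	conn, err := net.DialUnix("unixgram", nil, &net.UnixAddr{Name: path, Net: "unixgram"})
	if err != nil {
		return false, err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte("READY=1")); err != nil {
		return false, err
	}
	return true, nil
}

// demoNotify stands in for systemd: it binds a datagram socket, points
// NOTIFY_SOCKET at it, and returns the message the "service" sent.
func demoNotify() (string, error) {
	dir, err := os.MkdirTemp("", "notify")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	addr := &net.UnixAddr{Name: filepath.Join(dir, "notify.sock"), Net: "unixgram"}
	sock, err := net.ListenUnixgram("unixgram", addr)
	if err != nil {
		return "", err
	}
	defer sock.Close()
	os.Setenv("NOTIFY_SOCKET", addr.Name)
	defer os.Unsetenv("NOTIFY_SOCKET")
	if _, err := notifyReady(); err != nil {
		return "", err
	}
	buf := make([]byte, 64)
	n, _, err := sock.ReadFromUnix(buf)
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

func main() {
	msg, err := demoNotify()
	fmt.Println("systemd stand-in received:", msg, "err:", err)
}
```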

15:11 <dustymabe> and like you said the error you were getting seemed to be from PAM so maybe SSHD is fine but there is more stuff happening in the background before PAM can work?

15:12 <vrothberg> Thanks, that was quite helpful! We're also wondering about the PAM issue.

15:12 <dustymabe> vrothberg: try adding `systemd-user-sessions.service` to the After and see what you end up with

15:12 <vrothberg> Maybe we do need an initial back-off to ssh into the machine.

15:12 copperi has joined #fedora-coreos

15:16 <travier[m]> You need to order after the nologin file has been removed

15:16 <travier[m]> /run/nologin

15:18 <travier[m]> systemd-user-sessions.service sounds like a good candidate

15:19 <dustymabe> travier[m]: what removes that file?

15:21 <travier[m]> your suggestion (systemd-user-sessions.service) sounds like the most plausible one

15:21 <dustymabe> 👍

15:21 <travier[m]> from man 8 systemd-user-sessions.service

15:22 <travier[m]> so After=sshd.service & After=systemd-user-sessions.service should do it

15:23 nb7 has joined #fedora-coreos

15:25 nb has quit [Ping timeout: 250 seconds]

15:25 nb7 is now known as nb

15:32 <vrothberg> Thanks! I will try this out tomorrow (currently in meeting and workday is almost over).

15:50 bgilbert has quit [Ping timeout: 246 seconds]

15:55 apiaseck has quit [Ping timeout: 250 seconds]

15:56 apiaseck has joined #fedora-coreos

16:09 <dustymabe> spresti: I've got the meeting today (I think that's what we had talked about last time)

16:16 jcajka has quit [Quit: Leaving]

16:29 <dustymabe> aaradhak anthr76 apiaseck davdunc dustymabe guidon gursewak jaimelm jbrooks jcajka jdoss jlebon jmarrero lorbus miabbott nasirhm quentin9696[m] ravanelli saqali walters

16:29 <dustymabe> FCOS community meeting in #fedora-meeting-1

16:29 <dustymabe> If you don't want to be pinged remove your name from this file: https://github.com/coreos/fedora-coreos-tracker/blob/main/meeting-people.txt

16:31 jpn has joined #fedora-coreos

16:41 jpn has quit [Ping timeout: 250 seconds]

17:38 <dustymabe> Guidon: does the explanation make sense?

17:39 <Guidon> Yeah, I'm gonna test for the fingerprint.

17:39 <Guidon> One thing is for sure though

17:39 <Guidon> Running aws-cli in podman takes at least 2-3 seconds per run

17:40 <Guidon> Compared to 0.5s natively.

17:40 <dustymabe> I'd expect that on the first run.. maybe not on subsequent runs

17:40 <Guidon> No, on all runs, that I am sure

17:40 <Guidon> I'm gonna do more tests

17:40 <dustymabe> when you run things "rootless" a lot of times podman needs to copy the contents to the filesystem and then rewrite the uid/gid on the files from the inside

17:41 <Guidon> With different images also

17:41 <dustymabe> so another test would be running the containers as root (i.e. sudo podman) and compare as well

17:41 <Guidon> Yeah, we run as root some of our processes

17:41 <dustymabe> +1

17:41 <dustymabe> lunch time for me

17:43 <Guidon> There are many factors that could impact the performance, but I consistently see runs take 2-3 s, even for aws --help, which suggests that’s the podman overhead. Disabling namespacing does not improve the situation much.

17:45 <Guidon> It’s not a huge deal, but it can make things tricky for us and makes boot time very slow. And parallelization is directly limited by the amount of RAM (a t3.nano can only run about 3 aws containers at the same time before freezing)

17:49 jpn has joined #fedora-coreos

17:58 gursewak_ has quit [Ping timeout: 264 seconds]

18:05 saschagrunert has joined #fedora-coreos

18:11 saschagrunert has quit [Remote host closed the connection]

18:14 sentenza has joined #fedora-coreos

18:18 gursewak_ has joined #fedora-coreos

18:24 GingerGeek has quit [Server closed connection]

18:25 GingerGeek has joined #fedora-coreos

18:27 <dustymabe> Guidon: maybe taking some of this to the podman team would be useful.. it may be expected (they already know about it and it's explainable) or it may be surprising and they'd be interested in more information

18:27 <Guidon> Sure, will do. Thanks

18:33 <dustymabe> Guidon++

18:33 <dustymabe> they hang out in #podman

18:43 spresti has quit [Remote host closed the connection]

18:43 spresti has joined #fedora-coreos

18:51 jpn has quit [Ping timeout: 260 seconds]

18:55 gursewak_ is now known as gursewak

19:07 copperi has quit [Quit: Konversation terminated!]

19:13 baude has quit [Quit: Leaving]

19:18 mheon has joined #fedora-coreos

19:30 <jdoss> Before I open an issue for this on rpm-ostree, I think I either found a bug or a limitation with FCOS layering when trying to use a podman manifest to represent an amd64 and aarch64 FCOS layer so I can run both archs on baremetal.

19:31 <jdoss> I am doing podman build --platform linux/amd64,linux/arm64 --build-arg FCOS_CHANNEL=stable --manifest ${FCOS_CONTAINER_NAME} . with this Containerfile:

19:31 <jdoss> https://www.irccloud.com/pastebin/kiJGxgMh/

19:32 <dustymabe> jdoss: lines 6 and 7 can be combined into a single line I think

19:32 <dustymabe> RUN rpm-ostree install iftop ipmitool minicom net-tools pam_yubico pciutils strace sysstat systemd-oomd-defaults tcpdump vim-default-editor vim-enhanced unzip --uninstall nano-default-editor

19:32 <jdoss> and then when I inspect the manifest with podman manifest inspect localhost/fcos-base:stable

19:33 <jdoss> I get this nastiness

19:33 <jdoss> https://www.irccloud.com/pastebin/qvM3lTd0/

19:34 <jdoss> dustymabe: neat! thanks for that

19:36 <jdoss> and when I check the arch inside each per-arch container, both report aarch64, which is the last arch built in my script.

19:36 <jdoss> https://www.irccloud.com/pastebin/AkcTqxyb/

19:38 <jdoss> so I assume that manifest support is not a thing in FCOS layering. walters would a GH issue be good for the above as a feature request if that is the case or is this a bug?

19:38 <dustymabe> jdoss: all the tools you've described are podman related

19:38 <dustymabe> possibly some issue with those tools?

19:39 <jdoss> I thought so too, but I built a dummy container (golang program) and its manifest works as expected.

19:40 <jdoss> err not golang based, I just used alpine and made a super simple container

19:40 <jdoss> https://www.irccloud.com/pastebin/s7VJ5SDv/

19:41 <dustymabe> maybe try with something like:

19:41 <dustymabe> FROM registry.fedoraproject.org/fedora:38

19:41 <dustymabe> RUN arch > /etc/arch

19:42 <dustymabe> ^^ will probably work

19:43 <dustymabe> it could be a problem with there being multiple container layers

19:44 <dustymabe> so maybe toolbox would be a good example of one to try

19:45 <dustymabe> FROM registry.fedoraproject.org/fedora-toolbox:38

19:45 <dustymabe> RUN arch > /etc/arch

19:46 spresti has quit [Remote host closed the connection]

19:46 spresti has joined #fedora-coreos

19:47 jorti[m] has quit [Server closed connection]

19:47 <jdoss> dustymabe yeah, manifests work as expected when using F38 too. https://www.irccloud.com/pastebin/yTBwRpgh/

19:47 <jdoss> I don't think this is a podman issue TBH.

19:48 jorti[m] has joined #fedora-coreos

19:48 <dustymabe> try with toolbox now

19:48 <jdoss> I guess I am not following. How would that be different?

19:48 <dustymabe> toolbox is a container with multiple layers

19:48 <dustymabe> alpine isn't and neither is fedora:38

19:50 <jdoss> Yep works as expected.

19:51 <jdoss> https://www.irccloud.com/pastebin/7bvn8PNf/

19:51 <dustymabe> hmm

19:51 <dustymabe> I would open an FCOS issue tracker ticket - we can transfer it if necessary

19:51 <jdoss> OK will do

19:59 gursewak has quit [Remote host closed the connection]

19:59 gursewak has joined #fedora-coreos

20:03 gursewak has quit [Ping timeout: 246 seconds]

20:07 <dustymabe> darknao: when will the code that has been merged in the docs gitlab make it to prod? it's been a few weeks but we still don't see the ppc64le options on the coreos download page

20:08 <dustymabe> https://gitlab.com/fedora/websites-apps/fedora-websites/fedora-websites-3.0/-/merge_requests/780

20:08 <jdoss> dustymabe: https://github.com/coreos/fedora-coreos-tracker/issues/1520 done and done

20:13 ravanelli has quit [Read error: Connection reset by peer]

20:14 ravanelli has joined #fedora-coreos

20:15 jpn has joined #fedora-coreos

20:20 jpn has quit [Ping timeout: 252 seconds]

20:27 spresti has quit [Remote host closed the connection]

20:27 spresti has joined #fedora-coreos

20:33 bgilbert has joined #fedora-coreos

20:57 <darknao> dustymabe: what do you mean? it's right here: https://fedoraproject.org/coreos/download/?stream=next&arch=ppc64le#download_section

21:02 <dustymabe> darknao: oh interesting..

21:02 <dustymabe> I tried in a different browser and you are right

21:02 <dustymabe> does FF cache a website for like weeks on end?

21:03 <dustymabe> in my one FF instance I still see the old behavior (no ppc64le)

21:03 <dustymabe> but it does have the new releases (from 1 day ago)

21:04 <dustymabe> ok yeah. hard refresh fixed it

21:04 <dustymabe> sorry for the noise darknao

21:15 spresti has quit [Remote host closed the connection]

21:15 spresti has joined #fedora-coreos

21:27 gursewak has joined #fedora-coreos

21:35 spresti has quit [Remote host closed the connection]

21:35 <jdoss> dustymabe: I remember now why I had the uninstall before the install line in my Containerfile for my layer

21:36 <jdoss> https://www.irccloud.com/pastebin/NOEK7V5V/

21:36 nalind has quit [Quit: bye for now]

21:36 spresti has joined #fedora-coreos

21:41 pbrobinson has quit [Server closed connection]

21:41 pbrobinson has joined #fedora-coreos

21:42 jpn has joined #fedora-coreos

21:49 jpn has quit [Ping timeout: 246 seconds]

22:17 apiaseck has quit [Quit: Konversation terminated!]

22:20 mheon has quit [Ping timeout: 240 seconds]

22:44 plarsen has quit [Remote host closed the connection]

22:55 spresti has quit [Remote host closed the connection]

22:56 gursewak has quit [Ping timeout: 260 seconds]