<gregshomo[m]>
hello, world ! i've been unable to get the c9s-oscore container to boot (bootc switch, as per the bootc repo). after typing in the LUKS passphrase, well, that's all i get to do. i've had no troubles with the fedora-coreos container, so expectations are high ;) has anyone else experienced this ?
<gregshomo[m]>
i suspect context may be important here, so i am working from F38 Sericea with the COPR for bootc layered in.
<gregshomo[m]>
apologies if this is not the right place for bootc questions. just figured i'd ask
spresti has joined #fedora-coreos
spresti has quit []
spresti has joined #fedora-coreos
spresti has quit [Remote host closed the connection]
vgoyal has quit [Quit: Leaving]
spresti has joined #fedora-coreos
spresti has quit [Ping timeout: 258 seconds]
<bgilbert>
gregshomo[m]: this is the right place, I think
<bgilbert>
I can't answer your question though :-P
<gregshomo[m]>
bgilbert: fair enough. i'll test on a fresh install without LUKS tomorrow and see if that is the issue. in my perfect world, the fact that the fedora-coreos bits work is all i need to know -- but sometimes it's somebody else's ideal-world.
Guidon has quit [Server closed connection]
Guidon has joined #fedora-coreos
gursewak_ has joined #fedora-coreos
gursewak has quit [Ping timeout: 264 seconds]
spresti has joined #fedora-coreos
spresti has quit [Ping timeout: 246 seconds]
gursewak_ has quit [Ping timeout: 246 seconds]
gursewak_ has joined #fedora-coreos
spresti has joined #fedora-coreos
spresti has quit [Ping timeout: 260 seconds]
spresti has joined #fedora-coreos
spresti has quit [Ping timeout: 260 seconds]
gursewak_ has quit [Ping timeout: 240 seconds]
eballetbo has joined #fedora-coreos
daMaestro has quit [Quit: Leaving]
jpn has joined #fedora-coreos
sentenza has quit [Remote host closed the connection]
jcajka has joined #fedora-coreos
gursewak_ has joined #fedora-coreos
jpn has quit [Ping timeout: 240 seconds]
jpn has joined #fedora-coreos
saschagrunert has joined #fedora-coreos
jpn has quit [Ping timeout: 252 seconds]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 260 seconds]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 240 seconds]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 260 seconds]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 260 seconds]
apiaseck has joined #fedora-coreos
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 260 seconds]
daMaestro has joined #fedora-coreos
Betal has quit [Quit: WeeChat 4.0.0]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 240 seconds]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 264 seconds]
thomasfedb_ has left #fedora-coreos [#fedora-coreos]
spresti has joined #fedora-coreos
spresti has quit [Remote host closed the connection]
spresti has joined #fedora-coreos
vgoyal has joined #fedora-coreos
<gregshomo[m]>
ok, i've just done that test and got this error:
<gregshomo[m]>
error: ../../grub-core/loader/i386/efi/linux.c:258:you need to load the kernel first.
<gregshomo[m]>
i suspect LUKS was hiding this from me last night.
<travier[m]>
There are known issues with c9s & secure boot
jpn has joined #fedora-coreos
<gregshomo[m]>
ok, last bit of data from me about this today. i was unable to 'bootc switch' to either of the images mentioned in the bootc-demo-images repository, but am able to 'bootc switch' to quay.io/fedora/fedora-coreos:stable
<gregshomo[m]>
i'll drop an issue in that repo later today when i get the chance
<gregshomo[m]>
thanks, all for the pointers !
jpn has quit [Ping timeout: 246 seconds]
baude has quit []
baude has joined #fedora-coreos
sgrunert has joined #fedora-coreos
saschagrunert has quit [Ping timeout: 250 seconds]
daMaestro has quit [Quit: Leaving]
vgoyal has quit [Remote host closed the connection]
sgrunert has quit [Remote host closed the connection]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 250 seconds]
plarsen has joined #fedora-coreos
nalind has joined #fedora-coreos
plarsen has quit [Ping timeout: 240 seconds]
plarsen has joined #fedora-coreos
<baude>
heya dustymabe got your systemd brain on?
<baude>
... and fcos
<baude>
heya bgilbert long time no see
<bgilbert>
baude: π
vrothberg has joined #fedora-coreos
<bgilbert>
baude: any chance of a libhvee release with those changes?
<bgilbert>
Dependabot tends not to bump dependencies when go.mod is pinned to a commit
<baude>
as you wish ...
<bgilbert>
ty
ravanell_ has quit [Remote host closed the connection]
<dustymabe>
baude: π
<dustymabe>
systemd is a part of my brain now
<dustymabe>
systemd-braind
<dustymabe>
i'm engaged
ravanelli has joined #fedora-coreos
<baude>
dustymabe, vrothberg and i are looking at this boot issue we have with podman machine and fcos on super fast hw like the m2 pros
<baude>
we have a systemd unit that sends an ACK over a socket/vsock/etc from the guest to host when it is "booted". we call this a ready socket
<baude>
once the ACK is sent to the host, the host immediately starts interacting with the guest via ssh ... and the ssh auth fails
<baude>
at first we thought maybe sshd was not running, but today vrothberg found some logging which suggested pam was at play as well
<baude>
stopping here, am i making sense up to here?
<vrothberg>
Yes, we see the following two logs in sshd:
<vrothberg>
error: kex_exchange_identification: Connection closed by remote host
<vrothberg>
fatal: Access denied for user core by PAM account configuration [preauth]
<vrothberg>
π dustymabe :)
<dustymabe>
I mean I've definitely seen sometimes where trying to SSH in to a machine in early boot gives me a `Connection closed by remote host`. But pretty much always the next try works.
<dustymabe>
Does it not ever succeed?
<baude>
the errors cause podman to quit
<dustymabe>
podman-machine on the host?
<baude>
we dont do a backoff or anything, and we could do, but we want to try to diagnose and fix the problem
<baude>
yes
<teuf>
I'm also seeing this sometimes if I ssh too early
<teuf>
kex_exchange_identification: Connection closed by remote host
<teuf>
Connection closed by 127.0.0.1 port 2222
<vrothberg>
teuf: yes, that looks like the same symptom.
<teuf>
but yeah next ssh works, and since I'm ssh'ing manually, it's no big deal for me
<baude>
it makes me think we are sending the ACK too early? and maybe a "later" systemd service needs to be selected
<dustymabe>
baude: yeah - are you ordering it after sshd is up?
<baude>
vrothberg, can you fpaste the unit file we generate?
<dustymabe>
this is also an important bit from the systemd.unit man page about `After=`:
<dustymabe>
It depends on the unit type when precisely a unit has finished starting up. Most importantly, for service units start-up is considered completed for the purpose of Before=/After= when all its configured start-up commands have been invoked and they either failed or reported start-up success. Note that this does includes ExecStartPost= (or ExecStopPost= for the shutdown case).
<dustymabe>
though sshd seems to be `Type=notify` so I think it should block
<dustymabe>
and like you said the error you were getting seemed to be from PAM so maybe SSHD is fine but there is more stuff happening in the background before PAM can work?
<vrothberg>
Thanks, that was quite helpful! We're also wondering about the PAM issue.
<dustymabe>
vrothberg: try adding `systemd-user-sessions.service` to the After and see what you end up with
<vrothberg>
Maybe we do need an initial back-off to ssh into machine.
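The back-off idea mentioned here could look roughly like the sketch below. It is only an illustration, not podman's code: `retry_with_backoff` and its parameters are hypothetical names, and the probe passed in is whatever check the caller chooses (for example an ssh attempt that returns True on success).

```python
import time
from typing import Callable


def retry_with_backoff(
    attempt: Callable[[], bool],
    max_tries: int = 6,
    initial_delay: float = 0.5,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call attempt() until it returns True or max_tries is used up.

    The wait between tries grows geometrically: initial_delay,
    initial_delay * factor, and so on. All names and defaults here are
    hypothetical, chosen only for illustration.
    """
    delay = initial_delay
    for try_number in range(1, max_tries + 1):
        if attempt():
            return True
        # Do not sleep after the final failed try.
        if try_number < max_tries:
            sleep(delay)
            delay *= factor
    return False
```

A caller might pass a function that runs `ssh -o BatchMode=yes -o ConnectTimeout=5 -p 2222 core@127.0.0.1 true` through `subprocess.run` and returns whether the exit code was 0. The port and user are taken from the log lines above; the rest is an assumption. A back-off like this only hides the race, which is why the ordering fix discussed in the channel is still worth pursuing.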
copperi has joined #fedora-coreos
<travier[m]>
You need to order after the nologin file has been removed
<travier[m]>
/run/nologin
<travier[m]>
systemd-user-sessions.service sounds like a good candidate
<dustymabe>
travier[m]: what removes that file?
<travier[m]>
your suggestion (systemd-user-sessions.service) sounds like the most plausible one
<dustymabe>
π
<travier[m]>
from man 8 systemd-user-sessions.service
<travier[m]>
so After=sshd.service & After=systemd-user-sessions.service should do it
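Since the unit file is generated, the suggested ordering could be rendered along these lines. This is a minimal sketch under stated assumptions: `render_ready_unit`, the description text, `Type=oneshot`, the `[Install]` section, and the `/usr/local/bin/send-ready-ack` path used in the usage note are all hypothetical, and the real unit podman generates was not shown in the channel. Only the two `After=` entries come from the discussion.

```python
from typing import Sequence


def render_ready_unit(exec_start: str, after: Sequence[str]) -> str:
    """Render a systemd unit that runs exec_start after the given units.

    Only the After= ordering reflects the discussion; the rest of the
    unit layout is an assumption made for illustration.
    """
    lines = [
        "[Unit]",
        "Description=Send ready ACK to the host (hypothetical example)",
    ]
    # One After= line per unit; systemd merges repeated After= entries.
    lines += [f"After={unit}" for unit in after]
    lines += [
        "",
        "[Service]",
        "Type=oneshot",
        f"ExecStart={exec_start}",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"
```

Calling `render_ready_unit("/usr/local/bin/send-ready-ack", ["sshd.service", "systemd-user-sessions.service"])` yields a unit with both ordering lines in its `[Unit]` section. Note that `After=` only orders units that are already part of the boot transaction; it does not pull them in.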
nb7 has joined #fedora-coreos
nb has quit [Ping timeout: 250 seconds]
nb7 is now known as nb
<vrothberg>
Thanks! I will try this out tomorrow (currently in meeting and workday is almost over).
bgilbert has quit [Ping timeout: 246 seconds]
apiaseck has quit [Ping timeout: 250 seconds]
apiaseck has joined #fedora-coreos
<dustymabe>
spresti: I've got the meeting today (I think that's what we had talked about last time)
<dustymabe>
Guidon: does the explanation make sense?
<Guidon>
Yeah, I'm gonna test for the fingerprint.
<Guidon>
One thing is for sure though
<Guidon>
Running aws-cli in podman takes at least 2-3 seconds per run
<Guidon>
Compared to 0.5s natively.
<dustymabe>
I'd expect that on the first run.. maybe not on subsequent runs
<Guidon>
No, on all runs, that I am sure
<Guidon>
I'm gonna do more tests
<dustymabe>
when you run things "rootless" a lot of times podman needs to copy the contents to the filesystem and then rewrite the uid/gid on the files from the inside
<Guidon>
With different images also
<dustymabe>
so another test would be running the containers as root (i.e. sudo podman) and compare as well
<Guidon>
Yeah, we run as root some of our processes
<dustymabe>
+1
<dustymabe>
lunch time for me
<Guidon>
There are many factors that could impact the performance, but I consistently see runs take 2-3s, even for aws --help, suggesting that's the podman overhead. Disabling namespacing does not improve the situation much.
<Guidon>
It's not a huge deal, but that can make things tricky for us and boot time very slow. And parallelization is directly limited by the size of the RAM (t3.nano can only run about 3 aws containers at the same time before freezing)
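To put numbers behind a comparison like this (native vs. rootless podman vs. sudo podman, first run vs. later runs), a small timing harness may help. The sketch below is an assumption-laden illustration: `time_command` and `summarize` are hypothetical helper names, and the commands to compare are left to the caller.

```python
import statistics
import subprocess
import time
from typing import Dict, List, Sequence


def time_command(argv: Sequence[str], runs: int = 5) -> List[float]:
    """Run argv `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: Sequence[float]) -> Dict[str, float]:
    """Separate the first (possibly cold) run from the typical one."""
    return {
        "first": durations[0],
        "median": statistics.median(durations),
        "min": min(durations),
    }
```

For example, `summarize(time_command(["aws", "--help"]))` could be compared with the same call on `["podman", "run", "--rm", IMAGE, "--help"]`, where `IMAGE` is whichever aws-cli container image is in use (not named in the channel). Reporting the first run separately shows whether the cost is a one-time setup or paid on every invocation.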
jpn has joined #fedora-coreos
gursewak_ has quit [Ping timeout: 264 seconds]
saschagrunert has joined #fedora-coreos
saschagrunert has quit [Remote host closed the connection]
sentenza has joined #fedora-coreos
gursewak_ has joined #fedora-coreos
GingerGeek has quit [Server closed connection]
GingerGeek has joined #fedora-coreos
<dustymabe>
Guidon: maybe taking some of this to the podman team would be useful.. it may be expected (they already know about it and it's explainable) or it may be surprising and they'd be interested in more information
<Guidon>
Sure, will do. Thanks
<dustymabe>
Guidon++
<dustymabe>
they hang out in #podman
spresti has quit [Remote host closed the connection]
spresti has joined #fedora-coreos
jpn has quit [Ping timeout: 260 seconds]
gursewak_ is now known as gursewak
copperi has quit [Quit: Konversation terminated!]
baude has quit [Quit: Leaving]
mheon has joined #fedora-coreos
<jdoss>
Before I open an issue for this on rpm-ostree, I think I either found a bug or a limitation with FCOS layering when trying to use a podman manifest to represent an amd64 and aarch64 FCOS layer so I can run both archs on baremetal.
<jdoss>
I am doing podman build --platform linux/amd64,linux/arm64 --build-arg FCOS_CHANNEL=stable --manifest ${FCOS_CONTAINER_NAME} . with this Containerfile
<jdoss>
so I assume that manifest support is not a thing in FCOS layering. walters would a GH issue be good for the above as a feature request if that is the case or is this a bug?
<dustymabe>
jdoss: all the tools you've described are podman related
<dustymabe>
possibly some issue with those tools?
<jdoss>
I thought maybe too but I built a dummy container (golang program) and manifests work as expected.
<jdoss>
err not golang based, I just used alpine and made a super simple container
<dustymabe>
I would open an FCOS issue tracker ticket - we can transfer it if necessary
<jdoss>
OK will do
gursewak has quit [Remote host closed the connection]
gursewak has joined #fedora-coreos
gursewak has quit [Ping timeout: 246 seconds]
<dustymabe>
darknao: when will the code that has been merged in the docs gitlab make it to prod? it's been a few weeks but we still don't see the ppc64le options on the coreos download page