<dustymabe>
failing in basic checks.. no new packages, so it must be something that changed in f-c-c, and your commits are the only thing that's been merged recently
<dustymabe>
could also be something in cosa changed
<lucab>
dustymabe: ack, just let me log in and take a look
<dustymabe>
lucab++
<dustymabe>
if it is something related to your commits I'd be interested to know why CI failed us in this case
<walters>
dustymabe: note that traditional rpm also has this semantic...it's because post scripts are just regular shell scripts and no one uses `set -e` there
<walters>
we could of course inject `set -e`, and if we do that probably want `-o pipefail` too but...the fallout from such a thing is hard to predict
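(Illustration: a toy %post-style scriptlet showing what injected strictness would change; the package and commands here are made up.)
```sh
#!/bin/sh
# Today's rpm behavior: a failing command in %post is simply ignored and
# the transaction is still reported as successful.
getent group myservice || groupadd -r myservice
useradd -r -g myservice myservice   # if this fails, nothing notices

# What injecting strictness would amount to, at the top of every scriptlet:
#   set -e -o pipefail
# i.e. the first failing command (or failing stage of a pipeline) aborts
# the scriptlet with a non-zero exit instead of being silently swallowed.
```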
<jlebon>
yup exactly. rpm-ostree matches rpm in that respect AFAIK.
<jlebon>
dustymabe: do we have something filed yet? last time that group lock issue came up, it turned out to be due to something else that went wrong earlier IIRC
<dustymabe>
jlebon: right - it's a failure during the compose
<dustymabe>
so the lock is left around and baked into the image
<dustymabe>
I don't think we have anything filed in the tracker repo other than the pieces luca linked to earlier
<dustymabe>
lucab: could you file something in the tracker repo?
<dustymabe>
walters: correct - we are matching rpm semantics here. I'm wondering if we could add an option to be stricter and then start working on the package set we care about to bring it into non-failing compliance
<dustymabe>
i.e. work towards a more sane future at least for ourselves
<dustymabe>
some of these failures may be expected, but some of them may be because we are using rpm-ostree and we want to know where we differ if we differ
<dustymabe>
IMO
<walters>
I am not sure if this was intentional, but I believe the reason traditional RPM does not abort on script failure is that, for its model of live, in-place updates, you can't really undo and it's not transactional, so the idea is to just stumble forward and hope for the best
<walters>
but clearly for generating new install roots, we could require strictness
<dustymabe>
walters: indeed. This is where we could be better
<dustymabe>
and I doubt package maintainers would object too hard to PRs that enhance their scriptlets
<dustymabe>
I guess first we'd need to support the enhancement in `rpm-ostree` and then we could go from there
<walters>
yeah let's track as an rpm-ostree issue? I know we have a lot but some do get fixed 😄
<jlebon>
hmm, shouldn't this be a proposed packaging guidelines change instead? it's valid to not use bash strict mode.
<jlebon>
and i wouldn't be surprised if many scriptlets are already relying on that fact
<walters>
yeah agree
<dustymabe>
jlebon: I think there are two ways to approach it
<dustymabe>
one is to add a check/allowlist on our side so we can incrementally implement the feature and file/track fixes upstream
<dustymabe>
the other is to enforce it at the distro level, trying to get all packages to update
<dustymabe>
we should probably do both
<dustymabe>
I doubt we'll get RPM to change semantics itself (as colin mentioned earlier, there's nothing rpm can do because it can't roll back)
<walters>
librpm could do the same thing in the case where it's constructing a new root
<dustymabe>
walters: correct. I don't know if it would do some sort of detection there or if the user would need to tell it
<dustymabe>
I guess that could be hoisted into librpm if we wanted to try to go that path
<lucab>
I don't know if it is related to our flake, but I noticed that systemd-sysusers seems to also be buggy on the topic of shadow locking: https://github.com/systemd/systemd/issues/23977
<dustymabe>
+1
<dustymabe>
lucab: can you open a tracker issue for us so we can track when this happens in our pipeline
<dustymabe>
at least when it does happen the tests fail so we won't leak any bad builds
<jlebon>
saqali: but note users can write anything they want there by editing the Ignition config directly
<saqali>
yep and that is not officially supported
<vgoyal>
Trying to use Fedora CoreOS for the first time. Trying to boot an image using qemu. As per the documentation, I'm preparing an Ignition file and setting a password_hash for user "core", but the hash I generate comes out different every time.
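(For reference, a minimal sketch of the config being described, assuming the Butane fcos variant; the hash value is a truncated placeholder.)
```sh
# Compile a Butane config that sets a password hash for the "core" user.
butane --pretty --strict > config.ign <<'EOF'
variant: fcos
version: 1.4.0
passwd:
  users:
    - name: core
      password_hash: $y$j9T$aBcD...   # placeholder, generate your own
EOF
```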
<justJanne>
vgoyal: yes, that’s intended, the password is salted
<vgoyal>
so how does verification work when the image boots?
<vgoyal>
When I enter the password, I assume it will generate a hash and try to match it with the one I passed in the Ignition file. And if the generated hash is different every time, how does this hash match?
<justJanne>
A salted hash works like this: you generate a random string, the salt. Then you return salt + . + hash(salt + password).
<justJanne>
When comparing passwords, you split by . into salt and the salted hash. Then you can compute hash(salt+password) and compare that with the salted hash
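(A toy shell rendition of that scheme, illustrative only; real systems use crypt(3), not plain sha256.)
```sh
# Hashing: pick a random salt, store "salt.hash(salt+password)".
password='hunter2'   # example input
salt=$(head -c 8 /dev/urandom | od -An -tx1 | tr -d ' \n')
stored="$salt.$(printf '%s%s' "$salt" "$password" | sha256sum | awk '{print $1}')"

# Verification: split the stored value on ".", re-hash, compare.
check_salt=${stored%%.*}
check_hash=${stored#*.}
candidate=$(printf '%s%s' "$check_salt" "$password" | sha256sum | awk '{print $1}')
[ "$candidate" = "$check_hash" ] && echo "password matches"
```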
<vgoyal>
justJanne: aha, thanks for the explanation. So by looking at the stored hash, you can figure out the salt and use it to hash the password again, and that time it should result in the same hash.
<justJanne>
exactly
* vgoyal
will go through the Wikipedia page as well.
<justJanne>
most password schemes use first a $, then the identifier for the algorithm (b for bcrypt, y for yescrypt, etc), then another $, the parameters that need to be used to compare the password, another $, the salt, another $, and the hashed password
<justJanne>
e.g., $y$j9T$hJdKZIuXD3SoFBbK8Prmz0$5WAcPyIsGS1JSjEogI5gtjA31OLAOob.xDwpm20S8p6 is likely y as algorithm, j9T as parameters, hJdKZIuXD3SoFBbK8Prmz0 as salt and 5WAcPyIsGS1JSjEogI5gtjA31OLAOob.xDwpm20S8p6 as salted hash
<justJanne>
the $y$j9T$ part tells the system which algorithm to use for comparing hashes
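(To produce such a hash yourself, assuming the `mkpasswd` tool from the whois package; `openssl passwd -6` is an alternative that emits sha512crypt `$6$...` hashes.)
```sh
# Prompts for a password, prints a yescrypt hash in the format above:
mkpasswd --method=yescrypt
# -> $y$j9T$<salt>$<hash>
```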
<vgoyal>
got it. So all the information needed to create same hash from password is part of generated hash
<justJanne>
exactly :)
<jlebon>
walters: i'm inclined to drop the whole "What's Changed" section. otherwise LGTM!
<dustymabe>
justJanne++
<justJanne>
So, I’ve got an issue where I feel like I’m doing something so wrong it has to be embarrassing
<justJanne>
I’m installing that with quay.io/coreos/coreos-installer:release (running this on bare metal)
<justJanne>
the only further change I make is running `rm /mnt/boot/EFI/fedora/BOOTX64.CSV`, to keep fcos from altering the default boot order (as otherwise the pxe-boot for the rescue system doesn’t work anymore)
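(Roughly the flow being described; device names are assumptions.)
```sh
# Install from the container image, as in the coreos-installer docs:
podman run --privileged --rm -v /dev:/dev -v /run/udev:/run/udev \
    -v .:/data -w /data \
    quay.io/coreos/coreos-installer:release install /dev/sda -i config.ign
# Then drop shim's fallback CSV from the ESP (partition 2 on the default
# FCOS layout) so the firmware boot order is left untouched:
mount /dev/sda2 /mnt
rm /mnt/EFI/fedora/BOOTX64.CSV
umount /mnt
```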
<justJanne>
it boots fine. Everything seems to work, in theory
<justJanne>
except, I can’t log in via ssh
<justJanne>
all I get is `op=PAM:bad_ident`
<justJanne>
ah, with that simplified setup it actually works, let’s see if it still breaks with raid1 booting or if something else was the issue
<jlebon>
walters: ok, i stamped a bunch of PRs, but didn't merge them so they get in after the release
tormath1 has quit [Quit: leaving]
<justJanne>
without boot_device: mirror: ..., fcos boots in 40 seconds from first install to everything working + ssh login possible
<justJanne>
with boot_device: mirror: ..., even after 7 minutes it’s not even listening on port 22
<justJanne>
everything else exactly identical
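(For context, roughly what the mirrored setup looks like with the `boot_device.mirror` sugar from the Butane fcos spec; device names are assumptions.)
```sh
butane --pretty --strict > config.ign <<'EOF'
variant: fcos
version: 1.4.0
boot_device:
  mirror:
    devices:   # RAID1 across both NVMe disks for boot and root
      - /dev/nvme0n1
      - /dev/nvme1n1
EOF
```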
Betal has joined #fedora-coreos
vgoyal has quit [Quit: Leaving]
<dustymabe>
justJanne: how large are your disks?
<justJanne>
2x 1TB NVMe, in raid1 as boot disks, plus 2x 4TB HDD, in raid1 as data disks.
<justJanne>
I shouldn’t be affected by the MBR/2TB issue
<dustymabe>
justJanne: anything happening on the console (serial and/or VGA) of the machine? any errors that you see?
<justJanne>
dustymabe: I have neither serial nor VGA access
<dustymabe>
well that makes things difficult :)
<justJanne>
I usually reboot back into rescue via the API and pick apart the system.journal from the ostree manually
<dustymabe>
IOW it's mostly a black box
<justJanne>
it’s a dedicated server located remotely for which I’ve got only one API call, which is basically `function reboot(rescue: boolean)`
Betal has quit [Ping timeout: 260 seconds]
<justJanne>
I can request VGA console access, but as the hosting provider has a limited number of those, it quickly tends to get pricey :)
<dustymabe>
justJanne: only thing I can recommend is trying to reproduce this with a similar hardware setup in an environment where you do have access to those basic debugging tools
<justJanne>
dustymabe: I’m running fcos in multiple places just fine, it seems like it’s an issue with the raid config, which I can’t easily replicate elsewhere
jpn has quit [Ping timeout: 260 seconds]
<dustymabe>
the problem is that if it's an issue in Ignition then there won't be any logs on the disk
<dustymabe>
i.e. Ignition fails, system goes to emergency.target, never switches to the real root, journal logs never get persisted to disk
<justJanne>
ah the entire filesystem on root is irreparably broken
<justJanne>
I don’t know why, but I guess it is
<dustymabe>
I mean, if it fails while provisioning the disk then that's not too surprising, I guess. We just don't know how it is failing without the console :(
vgoyal has joined #fedora-coreos
<justJanne>
I’ve got to be honest, the whole situation with how the ecosystem broke apart after the acquisition is really painful
<justJanne>
flatcar boots, but can’t do boot-disk raid. fcos doesn’t always provision despite the config passing all checks, but at least in theory it should work.
jpn has joined #fedora-coreos
<dustymabe>
I'm struggling to find a constructive response to that last comment
<dustymabe>
software is never going to #justwork all the time. In the cases it doesn't, you need debugging tools. The lack of those tools is going to make your life hard if there are ever any unforeseen issues.
<justJanne>
I don’t think there’s an actual bug here, I think I just made a stupid mistake, which neither I nor the linters are able to catch
jpn has quit [Ping timeout: 260 seconds]
jpn has joined #fedora-coreos
Betal has joined #fedora-coreos
<dustymabe>
also possible..
<dustymabe>
only other thing I can think of is for you to run the OS off of one of the disks and then pass the remaining disks directly through to an FCOS VM and watch the whole process on the console of the VM (via your SSH connection to the host)
<dustymabe>
i.e. Fedora or Ubuntu or FCOS or whatever host with libvirt/kvm installed then create an FCOS VM with the disks passed through to it where you run the install process (and watch Ignition run/fail)
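(A hypothetical version of that debug rig; the ISO path and device names are assumptions.)
```sh
# Add a serial console to the live ISO, then boot it in a VM with the
# physical disks passed through; the console shows up in the SSH session.
coreos-installer iso customize \
    --live-karg-append console=ttyS0,115200n8 \
    -o fcos-debug.iso fedora-coreos-live.x86_64.iso
virt-install --name fcos-debug --memory 4096 --vcpus 2 \
    --os-variant fedora-coreos-stable \
    --cdrom fcos-debug.iso \
    --disk path=/dev/nvme0n1 --disk path=/dev/nvme1n1 \
    --nographics
```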
<justJanne>
tl;dr: if you try to mount a new filesystem at a folder that doesn’t exist *yet*, the system never actually finishes booting or creates that folder. I’ll dig further to figure out more of the details
<justJanne>
mounting at /var/lib/container-data doesn’t work and breaks boot
<justJanne>
let’s see what’s the actual reason behind it, and how to work around it
<justJanne>
(for reference, this was a relatively standard setup for persistent container data in old containerlinux)
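(For comparison, how that is typically expressed in a Butane config today: `with_mount_unit: true` makes Butane emit a systemd mount unit, and systemd creates the mountpoint directory itself; the device name is an assumption.)
```sh
butane --pretty --strict > config.ign <<'EOF'
variant: fcos
version: 1.4.0
storage:
  filesystems:
    - device: /dev/md/data   # assumed RAID1 of the two data HDDs
      format: xfs
      path: /var/lib/container-data
      with_mount_unit: true  # generate the corresponding .mount unit
EOF
```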