#fedora-coreos on 2023-05-26 — irc logs at libera.irclog.whitequark.org

2022-05-11 12:42 dustymabe changed the topic of #fedora-coreos to: Fedora CoreOS :: Find out more at https://getfedora.org/coreos/ :: Logs at https://libera.irclog.whitequark.org/fedora-coreos

00:32 daMaestro has joined #fedora-coreos

01:05 chrish136 has joined #fedora-coreos

01:32 misuto has quit [Remote host closed the connection]

01:32 misuto has joined #fedora-coreos

01:35 vgoyal has quit [Quit: Leaving]

01:36 jpn has joined #fedora-coreos

01:41 jpn has quit [Ping timeout: 240 seconds]

01:51 jlebon has quit [Quit: leaving]

02:36 misuto has quit [Remote host closed the connection]

02:36 misuto has joined #fedora-coreos

02:38 misuto has quit [Remote host closed the connection]

02:38 misuto has joined #fedora-coreos

02:48 ravanelli has quit [Remote host closed the connection]

03:18 ravanelli has joined #fedora-coreos

03:24 jpn has joined #fedora-coreos

03:29 jpn has quit [Ping timeout: 240 seconds]

03:30 ravanelli has quit [Ping timeout: 240 seconds]

04:24 jpn has joined #fedora-coreos

04:28 jpn has quit [Ping timeout: 240 seconds]

04:42 ebbex has quit [Remote host closed the connection]

04:46 dustymabe has quit [Ping timeout: 268 seconds]

04:47 dustymabe has joined #fedora-coreos

05:01 jpn has joined #fedora-coreos

05:06 jpn has quit [Ping timeout: 256 seconds]

05:27 sentenza has quit [Remote host closed the connection]

06:01 jpn has joined #fedora-coreos

06:15 saschagrunert has joined #fedora-coreos

06:31 jpn has quit [Ping timeout: 240 seconds]

06:34 bgilbert has quit [Ping timeout: 240 seconds]

06:36 jpn has joined #fedora-coreos

06:42 misuto has quit [Remote host closed the connection]

06:43 misuto has joined #fedora-coreos

07:00 daMaestro has quit [Quit: Leaving]

07:07 jpn has quit [Ping timeout: 246 seconds]

07:49 jpn has joined #fedora-coreos

07:50 apiaseck has joined #fedora-coreos

07:54 jpn has quit [Ping timeout: 246 seconds]

08:02 jcajka has joined #fedora-coreos

08:16 c4rt0 has joined #fedora-coreos

08:20 apiaseck has quit [Ping timeout: 268 seconds]

08:20 c4rt0 is now known as apiaseck

08:34 Betal has quit [Quit: WeeChat 3.8]

08:47 jpn has joined #fedora-coreos

08:52 apiaseck has quit [Ping timeout: 256 seconds]

08:55 apiaseck has joined #fedora-coreos

09:02 uny[m] has quit [Remote host closed the connection]

09:10 ravanelli has joined #fedora-coreos

09:15 ravanelli has quit [Ping timeout: 240 seconds]

09:54 apiaseck has quit [Ping timeout: 240 seconds]

10:03 apiaseck has joined #fedora-coreos

10:42 misuto has quit [Remote host closed the connection]

10:42 misuto has joined #fedora-coreos

11:19 vgoyal has joined #fedora-coreos

11:59 plarsen has joined #fedora-coreos

12:12 jpn has quit [Ping timeout: 248 seconds]

12:31 nalind has joined #fedora-coreos

12:45 ravanelli has joined #fedora-coreos

12:48 jpn has joined #fedora-coreos

12:52 jpn has quit [Ping timeout: 240 seconds]

12:55 jlebon has joined #fedora-coreos

13:00 saschagrunert has quit [Remote host closed the connection]

13:05 jpn has joined #fedora-coreos

13:06 ravanelli has quit [Remote host closed the connection]

13:06 ravanelli has joined #fedora-coreos

13:10 ravanelli has quit [Remote host closed the connection]

13:56 jcajka has quit [Quit: Leaving]

13:59 <travier[m]> https://github.com/fedora-silverblue/issue-tracker/issues/470 > Might impact rollbacks on FCOS as well

14:04 <jlebon> travier[m]: i.e. user does a fresh install of FCOS 39 and explicitly deploys FCOS 38 and reboots into it? not sure we need to go out of our way to support that

14:20 ravanelli has joined #fedora-coreos

14:25 ravanelli has quit [Remote host closed the connection]

14:26 ravanelli has joined #fedora-coreos

14:26 ravanelli has quit [Remote host closed the connection]

14:28 <dustymabe> jlebon: I agree. Though it does often happen where we've switched to FN in FCOS but OKD rebases to FN-1, so this would probably come up there. I'm not saying we should support this (TBH it's almost impossible to support it), just saying it's probably going to happen and we should know how to respond to issues that get reported.

14:32 ravanelli has joined #fedora-coreos

14:41 <travier[m]> hum, indeed, it's not a rollback, it's a pure downgrade

14:44 <jlebon> dustymabe: hmm, I thought OKD referenced its own bootimages, like RHCOS?

14:44 <travier[m]> Agree that we don't want to support that. I had misunderstood that as impacting F X+1 -> F X rollbakcs

14:44 <travier[m]> OKD has it's own boot images AFAIK

14:45 <travier[m]> s/rollbakcs/rollbacks like the one we had for F38/

14:45 <travier[m]> s/it's/its/

14:45 <jlebon> travier[m]: +1

14:46 bgilbert has joined #fedora-coreos

14:54 <bgilbert> there are going to be a few PRs landing in repo-templates today, so I'll let them accumulate in the downstream repos and merge at the end

14:57 <dustymabe> bgilbert: +1

14:57 <dustymabe> travier[m]: jlebon: I'm thinking of UPI OKD

14:58 <dustymabe> but there are probably documentation steps that tell the user which version of FCOS to grab? I don't know

14:58 <dustymabe> the only reason I'm saying something is because I recently hit an issue: https://github.com/okd-project/okd/issues/1607#issuecomment-1553625380

14:59 <dustymabe> as a user of OKD I don't necessarily know which version of FCOS it targets so I just grabbed the latest FCOS as a starting point

14:59 <travier[m]> yeah, you should not do that :)

15:00 <dustymabe> and we (FCOS) don't really tell people how to grab older versions of bootimages, so I maintain it's probably a legit problem

15:00 <dustymabe> travier[m]: what should I do? not run OKD on my own hardware? only run IPI?

15:00 <travier[m]> OKD has specific boot image versions just like RHCOS/OCP

15:00 <travier[m]> You should not pick an aritrary FCOS image as boot image :)

15:01 <travier[m]> s/aritrary/arbitrary/

15:01 <dustymabe> well, it's not arbitrary :) - it's whatever the latest is, but yeah - are there docs for this?

15:02 <dustymabe> it's definitely possible things have improved in the past few years (I'm working on some old experiences with OKD)

15:03 <travier[m]> https://docs.okd.io/4.13/installing/installing_platform_agnostic/installing-platform-agnostic.html#installation-user-infra-machines-iso_installing-platform-agnostic

15:03 <travier[m]> It's the same for OCP & OKD

15:03 <travier[m]> https://docs.openshift.com/container-platform/4.13/installing/installing_platform_agnostic/installing-platform-agnostic.html#creating-machines-bare-metal_installing-platform-agnostic

15:04 <travier[m]> openshift-install coreos print-stream-json | grep '.iso[^.]'
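
The grep above pulls the ISO lines out of the stream JSON as raw text. As a rough alternative, the sketch below parses the JSON and reads the ISO URL per architecture. It assumes the output follows the CoreOS stream metadata layout (architectures, artifacts, metal, formats, iso, disk, location); the sample document and its example.invalid URLs are made up for illustration and are not real installer output.

```python
import json


def iso_locations(stream: dict) -> dict:
    """Map each architecture to the URL of its live ISO, if one is listed."""
    found = {}
    for arch, data in stream.get("architectures", {}).items():
        formats = data.get("artifacts", {}).get("metal", {}).get("formats", {})
        location = formats.get("iso", {}).get("disk", {}).get("location")
        if location:
            found[arch] = location
    return found


# Hypothetical, heavily trimmed stand-in for `print-stream-json` output.
SAMPLE = json.loads("""
{
  "stream": "stable",
  "architectures": {
    "x86_64": {
      "artifacts": {
        "metal": {
          "formats": {
            "iso": {"disk": {"location": "https://example.invalid/live.x86_64.iso"}}
          }
        }
      }
    },
    "aarch64": {
      "artifacts": {
        "metal": {
          "formats": {
            "raw.xz": {"disk": {"location": "https://example.invalid/metal.aarch64.raw.xz"}}
          }
        }
      }
    }
  }
}
""")

print(iso_locations(SAMPLE))
```

In practice the JSON would come from the installer's stdout rather than an inline string; architectures without an ISO entry are simply skipped.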

15:05 <dustymabe> travier[m]: nice

15:05 <dustymabe> I didn't know about that

15:05 <jlebon> yeah, i think the reason you can get it wrong with OKD is that FCOS builds are public. whereas with RHCOS they're not, so the only thing you can easily do is the right thing.

15:06 <travier[m]> yes, I agree that it's not obvious and somewhat hidden

15:06 <dustymabe> though I will say that we (FCOS) really don't (or haven't) consider that use case as valid (public API)

15:06 <dustymabe> the unofficial builds browser is... unofficial

15:06 <travier[m]> Do you mean downgrades?

15:06 <dustymabe> travier[m]: no, using an old version of FCOS as a starting point

15:06 <travier[m]> We do support that

15:07 <dustymabe> emmm.. no :)

15:07 <travier[m]> Why would we not support using previous Fedora CoreOS releases to setup nodes?

15:07 <dustymabe> it works, but we don't go out of our way

15:08 <travier[m]> You've written the test that basically verifies that this works

15:08 <dustymabe> i'm channeling my inner bgilbert here

15:08 <travier[m]> :)

15:08 <bgilbert> \o/

15:08 <dustymabe> travier[m]: right. what that test is trying to do is verify that if you had started your node X months ago that it can continue to upgrade

15:08 <jlebon> travier[m]: the test is simulating users who installed when those old versions were the latest

15:08 <dustymabe> this ^^

15:09 <travier[m]> I agree that beyond a certain time frame, things get more complex, but just like RHCOS in OCP, OKD does not support updating boot images

15:09 <bgilbert> travier[m]: OKD has intentionally chosen to use a flow that's not supported by FCOS

15:09 <travier[m]> That's the same thing here

15:09 <dustymabe> it's my understanding that from our "public" stance we could disallow access to old bootimages and that would be inline with our desired level of support

15:09 <bgilbert> that's their right, but that choice doesn't transform it to a supported flow

15:10 <dustymabe> correct

15:10 <dustymabe> to be clear what I'm trying to do by starting this conversation is emphasize this so it's clear at least amongst us

15:10 ravanelli has quit [Remote host closed the connection]

15:10 <dustymabe> here's an example

15:10 <travier[m]> What OKD is doing is exactly what your test is doing

15:11 <dustymabe> OKD comes to us and says we need to stay on F37 but we also need Ignition to support "new feature X" in f37

15:11 <dustymabe> our answer to that is "no", sorry

15:11 <jlebon> travier[m]: though i don't think OKD is barrier-aware, right?

15:11 <dustymabe> does that make it more clear?

15:12 <travier[m]> If we say that we don't support that then barrier releases don't matter that much indeed then and all the discussion around that goes away and we say "you must update at least once per year"

15:12 <dustymabe> travier[m]: no no no

15:12 <travier[m]> When did that happen?

15:12 <travier[m]> That's not what OKD does

15:12 <travier[m]> neither RHCOS

15:12 <travier[m]> the MCO downgrades Ignition configs on demand

15:12 <travier[m]> to match the Ignition in the boot imahe

15:13 <travier[m]> s/imahe/image/

15:13 <dustymabe> what we are trying to do with barrier releases is keep existing nodes updating, not allow new nodes deployed with old media to get up to date. It happens to be a side effect. but the first goal is the real reason

15:13 <bgilbert> +1

15:13 <dustymabe> ok that's a bad example then

15:13 <dustymabe> i'm just saying if they needed something new in F37 right now in a bootimage, they wouldn't be able to get it from us.. hence it's not really supported

15:14 <travier[m]> I don't understand how that would be different

15:14 <dustymabe> what OKD is doing works, but I'm just trying to draw the line and make it more clear

15:14 <dustymabe> travier[m]: ok here's another example

15:15 <travier[m]> That's incredibly unlikely to happen by design in OCP as we support older boot images

15:15 <travier[m]> on newer clusters

15:15 <travier[m]> * on updated clusters

15:15 <dustymabe> actually there are a few examples littered in the upgrade test itself: https://github.com/coreos/fedora-coreos-config/blob/f7aaeb3d6c6b2d7b67dfa7267c6ef308a29f70a4/tests/kola/upgrade/extended/test.sh#L159-L176

15:16 <jlebon> travier[m]: that's something we need to fix in OCP too :)

15:16 <dustymabe> so if you started on a version < 35 you wouldn't be able to update all the way to latest FCOS

15:16 <dustymabe> because of a gpgkey issue

15:16 <travier[m]> jlebon: agree!

15:17 <dustymabe> if you started on F31 we changed the cincinnati update URL - so you wouldn't get updates either
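
To make the barrier-release idea discussed here concrete, the following is a minimal sketch of how an update path could be computed when some releases must be passed through in order. The function name and the barrier numbers are placeholders chosen for illustration; they are not the actual FCOS barrier list, and this is not how Zincati or Cincinnati implement the update graph.

```python
from typing import List


def upgrade_path(current: int, latest: int, barriers: List[int]) -> List[int]:
    """Return the releases a node must boot, in order, to reach `latest`.

    A barrier is a release that every older node has to land on before it
    may continue, for example because it carries a new signing key.
    """
    if current >= latest:
        return []  # already up to date, nothing to do
    hops = [b for b in sorted(barriers) if current < b < latest]
    return hops + [latest]


# Placeholder barriers, assumed purely for this example.
BARRIERS = [32, 35]

print(upgrade_path(31, 38, BARRIERS))
print(upgrade_path(36, 38, BARRIERS))
```

A node that starts behind several barriers takes several hops, while a recently installed node goes straight to the latest release. That matches the stated goal of keeping existing nodes updating; bringing freshly installed old media up to date is a side effect.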

15:17 <travier[m]> note that all of those are pure FCOS issues that don't affect OKD

15:18 <dustymabe> right, but we are talking about supporting older bootimages and why we don't do it

15:18 <dustymabe> not why "none of those reasons matters and OKD works anyway"

15:19 <travier[m]> OK, I see the difference now

15:20 <jlebon> that said, even outside OKD, based on incoming issues, there are definitely users who do pin for a bit

15:21 <dustymabe> TL;DR it works in some cases, but if you need support for an older bootimage we're probably going to tell you to use latest

15:21 <jlebon> i think roughly, if users report upgrade issues, we should help them. most other issues would probably be "use the latest version"

15:21 <dustymabe> jlebon: correct. there is a difference between swimming in a pool with a lifeguard or swimming in a pool without one

15:22 <dustymabe> it still works without one, but if you have trouble...

15:22 <dustymabe> yeah, upgrading is something we do want to support

15:23 <dustymabe> it's definitely a subtle difference

15:24 <travier[m]> I see now why Colin says that we don't want to support barrier releases as this is the same discussion here

15:25 <travier[m]> If we decide that if your image is 2 Fedora releases old, you're not getting auto-updated / you're not guaranteed to update, then we don't need barrier releases.

15:26 <bgilbert> travier[m]: that's true for the most common reason we need barrier releases, but it's not true in general, unless we have a different way of running scripts on upgrade

15:26 <bgilbert> e.g. the pre-upgrade container idea

15:27 <bgilbert> xref the aarch64 bootloader issue we just had

15:27 <travier[m]> hum, indeed, that does not work here

15:28 <dustymabe> maybe we are talking past each other here

15:29 <dustymabe> care to talk voice ? https://meet.google.com/_meet/jhc-oxrn-cae?ijlm=1685114906743&adhoc=1&hs=187

15:55 <dustymabe> Ignition: Fetching the Ignition config via the Virtio block driver is currently experimental and subject to change.

15:55 <dustymabe> wondering if we should promote that ^^

15:56 <dustymabe> to non-experimental

15:58 <bgilbert> dustymabe: we still don't have a solution for the race condition problem

15:59 <jlebon> this is good ol' https://github.com/coreos/ignition/issues/928

15:59 <dustymabe> bgilbert: +1 - I wasn't familiar with the details, just was observing we've been using it a while (I assume without issue)

16:01 <bgilbert> dustymabe: "without issue" in the sense that we've set a five-minute timeout

16:01 <bgilbert> so anyone booting with that provider and _without_ an Ignition config always has to wait five minutes

16:01 <bgilbert> (on the first boot)
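
The cost of that timeout can be seen in a small sketch of a poll-until-deadline loop: if the device never appears, the caller always pays the full timeout before it can conclude there is no config. The device path below is an assumption used only for illustration, and this is not Ignition's actual code.

```python
import os
import time


def wait_for_path(path: str, timeout_s: float, poll_s: float = 0.5) -> bool:
    """Poll until `path` exists; return False once `timeout_s` has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False  # the absent-config case always waits the full timeout
        time.sleep(poll_s)


# Assumed device path; a short timeout stands in for the five minutes above.
found = wait_for_path("/dev/disk/by-id/virtio-ignition", timeout_s=0.2, poll_s=0.05)
print("config device present:", found)
```

Without a signal that says "no device will ever show up", the loop cannot distinguish a slow device from a missing one, which is the race described above.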

16:01 <dustymabe> interesting

16:02 <dustymabe> i guess the cases are few for that (I'm thinking openstack or another cloud platform, maybe IBMCloud, where you could get an SSH key from a metadata service)?

16:03 <dustymabe> well.. here I am thinking about ppc64le only

16:03 <dustymabe> we use it for s390x too?

16:03 <jlebon> yes

16:05 <jlebon> the ignition PR to add this was initially just to allow us to use it in CI. it kinda leaked out though and is now used by users.

16:06 <dustymabe> jlebon: can I ask you some questions about ostree autoprune real quick?

16:07 <jlebon> sure

16:07 <dustymabe> is there a case where it won't prune even though it should?

16:08 <dustymabe> this seemed to work on aarch64 yesterday but isn't on ppc64le and I'm wondering if I'm doing something wrong or not

16:08 <dustymabe> I know if it does prune it will print a message, but maybe we should print a message too if pruning was requested and considered, but not performed

16:09 <bgilbert> dustymabe: I agree that the qemu image is less likely to be used without a config, but that approach would make the ability to omit the config dependent on the platform, which is unexpected

16:09 <jlebon> it won't do anything if even with autopruning we'd hit ENOSPC

16:09 <jlebon> we could log something indeed in that case

16:09 <jlebon> but i'm not sure if that's the case you're hitting
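
The behaviour jlebon describes can be pictured as a three-way decision. The sketch below is an illustration only: the function and outcome names are invented here, the numbers are example inputs, and none of it is taken from the ostree source.

```python
MIB = 1024 * 1024


def plan_deploy(free: int, needed: int, reclaimable: int) -> str:
    """Decide what to do with /boot before writing a new kernel/initrd pair.

    free        -- bytes currently available
    needed      -- bytes the new deployment requires
    reclaimable -- bytes that pruning old deployments would free
    """
    if needed <= free:
        return "deploy"             # fits as-is, no pruning required
    if needed <= free + reclaimable:
        return "prune-then-deploy"  # pruning makes enough room
    return "enospc"                 # pruning would not help, so do nothing


# Example inputs: a 112 MiB kernel+initrd pair and one prunable old pair.
print(plan_deploy(free=100 * MIB, needed=112 * MIB, reclaimable=112 * MIB))
print(plan_deploy(free=50 * MIB, needed=112 * MIB, reclaimable=0))
```

Logging which branch was taken, as suggested in the conversation, would make it possible to tell "autoprune was never requested" apart from "autoprune was considered and skipped".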

16:10 <dustymabe> right. maybe we should even have a log message (for at least the time period that autoprune is experimental) saying autoprune was requested or something

16:11 <dustymabe> it's hard to tell right now if the env var is plumbed through correctly OR if the code decided not to prune

16:11 <dustymabe> I swear i tested it yesterday :)

16:12 <dustymabe> but it was also on the system I was developing the fix, so it's possible something didn't get back into my PR that should have

16:12 <dustymabe> #thisiswhywetest

16:13 <bgilbert> btw kola caught a legitimate regression in an Ignition dependency update: https://github.com/coreos/ignition/pull/1634

16:13 <jlebon> yup, experimental logging sounds fine to me

16:14 <dustymabe> bgilbert: nice! this is a win for sure

16:14 <jlebon> bgilbert: nice! were you the one to report it?

16:14 <bgilbert> no, it was fixed before I got to it

16:24 <dustymabe> jlebon: here's the scenario I'm in: https://paste.centos.org/view/36c1fdb2

16:25 <dustymabe> [root@cosa-devsh ~]# rpm -q ostree

16:26 <dustymabe> ostree-2023.3-1.fc38.ppc64le

16:27 <jlebon> how large are the (kernel, initrd) pairs?

16:28 <dustymabe> [root@cosa-devsh ostree]# ls -lh */

16:28 <dustymabe> fedora-coreos-28983714bf02bf4d0cade8c13e72a487398daebab5ce3059415d7d956edd2dcd/:

16:28 <dustymabe> total 112M

16:28 <dustymabe> -rw-r--r--. 1 root root 70M May 26 15:46 initramfs-6.2.15-300.fc38.ppc64le.img

16:28 <dustymabe> -rwxr-xr-x. 1 root root 43M May 26 15:46 vmlinuz-6.2.15-300.fc38.ppc64le

16:28 <dustymabe> fedora-coreos-ba044800cd32c148c49bc3c464d9260d2bf51f8461e7068a3c5aade4593a29b6/:

16:28 <dustymabe> total 112M

16:28 <dustymabe> -rw-r--r--. 1 root root 70M May 26 15:57 initramfs-6.2.15-300.fc38.ppc64le.img

16:28 <dustymabe> -rwxr-xr-x. 1 root root 43M May 26 15:57 vmlinuz-6.2.15-300.fc38.ppc64le

16:28 <dustymabe> if you're interested you can `tmux attach` into https://console-openshift-console.apps.ocp.stg.fedoraproject.org/k8s/ns/fedora-coreos-pipeline/pods/pod-d65e9757-d007-49bd-8c18-5d124d20457b-01ggr-kz3hd

16:28 * dustymabe brb - switching to home location

16:31 oo has joined #fedora-coreos

16:44 vgoyal has quit [Quit: Leaving]

16:56 <dustymabe> back

16:56 jpn has quit [Ping timeout: 268 seconds]

17:13 ravanelli has joined #fedora-coreos

17:24 jpn has joined #fedora-coreos

17:25 peko[m] has quit [Excess Flood]

17:50 Betal has joined #fedora-coreos

18:22 jpn has quit [Ping timeout: 256 seconds]

18:26 <mhayden> decided to write myself a little blog post on coreos as a "pet" instance: https://major.io/p/coreos-as-pet/

18:27 sentenza has joined #fedora-coreos

18:31 plarsen has quit [Ping timeout: 250 seconds]

18:35 plarsen has joined #fedora-coreos

18:50 jpn has joined #fedora-coreos

18:56 jpn has quit [Ping timeout: 240 seconds]

19:05 misuto has quit [Remote host closed the connection]

19:05 misuto has joined #fedora-coreos

19:14 jpn has joined #fedora-coreos

19:31 misuto has quit [Remote host closed the connection]

19:31 misuto has joined #fedora-coreos

19:39 jpn has quit [Ping timeout: 265 seconds]

19:44 plarsen has quit [Ping timeout: 250 seconds]

19:47 plarsen has joined #fedora-coreos

20:00 <dustymabe> mhayden: look at you!

20:00 <mhayden> writin' things and stuff

20:01 <dustymabe> mhayden: you could mention typhoon too https://typhoon.psdn.io/

20:01 <mhayden> whaaaaaaat? first time hearing about it. usually ended up in k3s

20:02 <dustymabe> yeah dghubble maintains it - from what I hear it's pretty solid

20:04 <mhayden> kubernetes often just ends up causing me too much frustration for my personal projects. i usually end up back with docker-compose 🙃

20:05 <dustymabe> yeah, it's a balance for sure

20:05 <dustymabe> it's not really a 2h per week thing (which is what most side projects are)

20:13 <fifofonix[m]> i think typhoon is especially appealing if you're already doing a lot via terraform. we've enjoyed it for our higher end needs when we've needed to grow beyond swarm (but we still use swarm a bunch for now).

20:15 <dustymabe> jlebon: I think this is what we had discussed: https://github.com/coreos/fedora-coreos-config/pull/2438

20:16 jpn has joined #fedora-coreos

20:16 <jlebon> thanks!

20:17 <dustymabe> though, I wonder if the "starting earlier" part could throw off some of our other tests. Maybe we should by default make the kola systemd units run after say systemd-user-sessions and then allow a tag or something to override that behavior

20:19 <jlebon> yeah, it's possible we might've unknowingly taken a dependency on the existing behaviour in other places. maybe let's keep an eye out for other fallout and then do something fancier if it's a nontrivial amount

20:21 jpn has quit [Ping timeout: 268 seconds]

20:21 <dustymabe> +1

20:22 <dustymabe> I enabled automerge on the linked PR above

20:25 <quentin9696[m]> Hey guys, I created the PR to update the doc about wireguard as discussed during the weekly meeting

20:25 <quentin9696[m]> I created 2 PRs, 1 to add it, 1 to remove it

20:39 <dustymabe> quentin9696[m]: dropped in a review

20:41 <dustymabe> quentin9696[m]: for that particular issue the wireguard maintainer is busy and the selinux maintainer might not have enough expertise to drive it forward. If you're motivated you could work with the selinux maintainer or may have to wait for some time

20:41 misuto has quit [Remote host closed the connection]

20:41 misuto has joined #fedora-coreos

20:43 misuto has quit [Remote host closed the connection]

20:43 misuto has joined #fedora-coreos

21:10 plarsen has quit [Remote host closed the connection]

21:22 nalind has quit [Quit: bye for now]

21:37 <dustymabe> mhayden: that other PR merged so now I'm unblocked to open a PR to the google guest configs RPM

21:38 <dustymabe> let me check with upstairs how much time I have - might be able to whip something up now

21:40 apiaseck has quit [Quit: Konversation terminated!]

21:42 <jlebon> dustymabe: were you planning to carry https://github.com/GoogleCloudPlatform/guest-configs/pull/51 there?

21:43 <dustymabe> yep

21:45 <jlebon> +1

21:52 <dustymabe> mhayden: https://src.fedoraproject.org/rpms/google-compute-engine-guest-configs/pull-request/4

21:52 <dustymabe> jlebon: ^^

21:53 <dustymabe> if that looks good.. we need it in f38 and f39 if possible

21:53 * dustymabe has to run upstairs now

22:04 jpn has joined #fedora-coreos

22:09 gursewak has quit [Ping timeout: 240 seconds]

22:09 jpn has quit [Ping timeout: 240 seconds]

22:56 <quentin9696[m]> <dustymabe> "quentin9696: dropped in a review" <- thanks, will make the required changes

22:56 <quentin9696[m]> <dustymabe> "quentin9696: for that particular..." <- Sure I can work with him. Where can I contact them ?

23:04 <dustymabe> quentin9696[m]: you could start by offering up help with a comment in the BZ - it doesn't always work but is one way. In the comment you can ask for advice or say that you'll be in an IRC channel XYZ if they want to talk more real time. https://bugzilla.redhat.com/show_bug.cgi?id=2188714

23:45 oo has quit [Ping timeout: 256 seconds]

23:52 jpn has joined #fedora-coreos

23:58 jpn has quit [Ping timeout: 240 seconds]