#fedora-coreos on 2022-06-21 — irc logs at libera.irclog.whitequark.org

2022-05-11 12:42 dustymabe changed the topic of #fedora-coreos to: Fedora CoreOS :: Find out more at https://getfedora.org/coreos/ :: Logs at https://libera.irclog.whitequark.org/fedora-coreos

00:27 jpn has joined #fedora-coreos

00:32 jpn has quit [Ping timeout: 276 seconds]

01:03 stephan has quit [Ping timeout: 240 seconds]

01:06 stephan has joined #fedora-coreos

01:40 gursewak has quit [Ping timeout: 244 seconds]

01:41 pwhalen has quit [Ping timeout: 244 seconds]

01:42 jdoss has quit [Ping timeout: 248 seconds]

01:58 jpn has joined #fedora-coreos

02:01 jdoss has joined #fedora-coreos

02:02 pwhalen has joined #fedora-coreos

02:02 jpn has quit [Ping timeout: 276 seconds]

02:08 jdoss has quit [Ping timeout: 276 seconds]

02:09 pwhalen has quit [Ping timeout: 258 seconds]

02:11 pwhalen has joined #fedora-coreos

02:17 jdoss has joined #fedora-coreos

02:22 ravanelli has quit [Remote host closed the connection]

02:59 gursewak has joined #fedora-coreos

03:18 gursewak has quit [Ping timeout: 248 seconds]

04:34 paragan has joined #fedora-coreos

05:34 jpn has joined #fedora-coreos

05:39 jpn has quit [Ping timeout: 248 seconds]

06:05 gursewak has joined #fedora-coreos

06:26 paragan has quit [Ping timeout: 248 seconds]

06:34 bagasse_ has quit [Quit: Leaving]

07:02 paragan has joined #fedora-coreos

07:10 tormath1 has joined #fedora-coreos

07:21 jcajka has joined #fedora-coreos

07:23 jpn has joined #fedora-coreos

07:27 jpn has quit [Ping timeout: 256 seconds]

08:43 Betal has quit [Quit: WeeChat 3.5]

08:50 stereobutter[m] has joined #fedora-coreos

09:03 * stereobutter[m] uploaded an image: (146KiB) < https://libera.ems.host/_matrix/media/r0/download/matrix.org/YxiZAEWsjTJJizJjAHHnKtzl/Bildschirmfoto%202022-06-21%20um%2010.50.47.png >

09:07 <stereobutter[m]> Hi folks :) I'm currently toying with gitops-based system upgrades for FCOS, which means I will be calling `rpm-ostree deploy <some_version>`. What I'd like to do is use the graph data provided by Cincinnati to validate that the current version is compatible with `some_version`. However, Cincinnati's update graph apparently only contains edges from each version to the most recent version.

09:07 * stereobutter[m] uploaded an image: (146KiB) < https://libera.ems.host/_matrix/media/r0/download/matrix.org/uIUTUsZInzRERJWfRqeYaWjr/Bildschirmfoto%202022-06-21%20um%2010.50.47.png >

09:09 <stereobutter[m]> Say I'm on 32.20200629.3.0 and the new version is 32.20200715.3.0. I assume this is a valid transition (but the edge is omitted from the graph because the number of edges would explode) and is implied by both versions pointing at 32.20201104.3.0?

09:11 jpn has joined #fedora-coreos

09:12 * stereobutter[m] uploaded an image: (151KiB) < https://libera.ems.host/_matrix/media/r0/download/matrix.org/KJLRPqFFmSoXtWOKXnPfmxBk/Bildschirmfoto%202022-06-21%20um%2010.50.47.png >

09:12 <lucab> stereobutter: yes, it is a valid transition

09:12 <stereobutter[m]> So the graph would look something like the above if one painted in the implicit edges

09:14 <lucab> stereobutter: yes, it's just that the graph is optimized to minimize the amount of updates/reboots
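
A minimal sketch of the check discussed above, assuming the graph has already been fetched and parsed into a dict with `nodes` (each carrying a `version`) and `edges` (pairs of node indexes). The helper names and the sample graph are hypothetical. The "implied edge" rule is the one stereobutter describes (both versions point at a common newer release); it is an illustration, not a guarantee made by Cincinnati itself.

```python
def parse_version(version):
    """Split an FCOS version such as '32.20200629.3.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_plausible_upgrade(graph, current, target):
    """Return True if current -> target is a direct or implied upgrade edge.

    Assumption: an omitted edge is treated as implied when both versions
    have an edge to a common newer release.
    """
    index = {node["version"]: i for i, node in enumerate(graph["nodes"])}
    if current not in index or target not in index:
        return False
    # Downgrades carry no guarantees, so never infer them from the graph.
    if parse_version(target) <= parse_version(current):
        return False
    successors = {}
    for src, dst in graph["edges"]:
        successors.setdefault(src, set()).add(dst)
    from_current = successors.get(index[current], set())
    from_target = successors.get(index[target], set())
    if index[target] in from_current:
        return True  # explicit edge
    return bool(from_current & from_target)  # implied via a shared successor


# Hypothetical graph built from the versions mentioned in the conversation.
SAMPLE_GRAPH = {
    "nodes": [
        {"version": "32.20200629.3.0"},
        {"version": "32.20200715.3.0"},
        {"version": "32.20201104.3.0"},
    ],
    "edges": [[0, 2], [1, 2]],
}
```

With the sample graph, `is_plausible_upgrade(SAMPLE_GRAPH, "32.20200629.3.0", "32.20200715.3.0")` is accepted, while the reverse direction is rejected because it would be a downgrade.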

09:15 jpn has quit [Ping timeout: 246 seconds]

09:16 <stereobutter[m]> And I assume for downgrading the same rules apply. I can only go back to a version *A* < *B* when *A* -> *B* is a valid upgrade transition?

09:20 jpn has joined #fedora-coreos

09:21 <lucab> stereobutter: no, downgrades in general have no guarantees. You cannot invert the direction of the graph and try to infer anything.

09:21 <stereobutter[m]> So basically the only "safe" thing to do is a rollback i.e. I came from A and installed B then B->A is okay

09:21 <stereobutter[m]> ?

09:22 <stereobutter[m]> Maybe not "safe" but "sensible"

09:23 <lucab> that usually works in practice, yes, but it has no strict guarantees either. It depends on the specific software (and exact versions) on both A and B

09:24 <stereobutter[m]> The reason BTW why I'd like to manually deploy versions is that for some workloads we use nvidia GPUs, and the whole driver thing is a mess (which we got to work) that for the foreseeable future will only work reliably with strictly pinned versions (both driver and FCOS), so we'd like to periodically try a new FCOS+driver version and then pin it

09:25 <stereobutter[m]> and roll out exactly this

09:25 <lucab> usual example is: release A contains fooware-1.0 and release B contains fooware-2.0. fooware-2.0 has up-migration logic for data generated by 1.0, but fooware-1.0 has no idea about the layout of migrated data. As soon as release B is booted and the data is migrated, rolling back to release A may cause trouble for fooware.

09:26 jpn has quit [Ping timeout: 256 seconds]

09:28 <stereobutter[m]> Yeah sure, that makes sense. Say I use `rpm-ostree` to overlay a package `Foo` where A has `FooV1` and B has `FooV2`, and assuming `Foo` does not do anything too exotic upon installation, will this work from FCOS's point of view when I roll back from B to A?

09:29 <lucab> stereobutter: it's a reasonable scenario, and zincati cannot cover all the very custom strategies. I'd say it makes sense to consume the graph on your own and drive rpm-ostree as you wish

09:31 <stereobutter[m]> (Since we haven't upgraded anything yet we have not been in the situation to perform a rollback yet)

09:31 <lucab> stereobutter: yes, the rollback part for the packages in itself should work. The main caveats are about persisted data, and data-format migrations between `FooV1` and `FooV2`.

09:31 <stereobutter[m]> Nice!

09:33 <lucab> sidenote, that's the reason why we suggest putting `fooware` in a container, so that OS upgrades/rollbacks do not force it to go back and forth between V1<->V2

09:36 <stereobutter[m]> Yeah, the Nvidia driver already runs in a container but there is still some RPM that currently has to be installed (see https://discussion.fedoraproject.org/t/can-you-run-nvidia-gpu-workloads-on-fcos/35090). I actually had a look at this together with another developer for almost a week before we managed to get everything up and running.

09:37 ravanelli has joined #fedora-coreos

09:38 <stereobutter[m]> When working on this I had the feeling that there are probably only a handful to a couple hundred people in existence that actually know how the whole thing actually works

09:38 jpn has joined #fedora-coreos

09:39 <stereobutter[m]> > putting fooware in a container, so that OS upgrades/rollbacks do not force it to go back and forth

09:39 <stereobutter[m]> only works if the container works on both versions. In the driver case the container version must match the OS version

09:40 dwalsh_ has joined #fedora-coreos

09:42 ravanelli has quit [Ping timeout: 256 seconds]

10:41 <stereobutter[m]> Would it be possible to also add a github release on https://github.com/coreos/fedora-coreos-streams for every new FCOS version? That way one could point e.g. renovatebot at the repo and trigger some action when a new FCOS version is released.

10:42 <stereobutter[m]> alternatively to github releases just plain git tags would also suffice I guess.

10:47 <stereobutter[m]> I had a look at the github actions in the repo but from looking at them I'm not sure what the actual release process is like and when this should run.

10:48 dwalsh_ has quit [Remote host closed the connection]

10:48 dwalsh_ has joined #fedora-coreos

11:49 ravanelli has joined #fedora-coreos

11:56 crobinso has joined #fedora-coreos

12:01 <lucab> stereobutter: a better trigger would be "whenever update/<yourstream.json> changes", as that is really the point when a new release starts to be rolled out

12:01 <lucab> *updates/
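
A minimal sketch of the "whenever updates/<yourstream.json> changes" trigger, assuming the file has a top-level `releases` list whose entries carry a `version` field. The function name and the sample documents are hypothetical, and fetching the two snapshots (for example from two git commits) is left out.

```python
def new_releases(old_doc, new_doc):
    """Return versions listed in new_doc but absent from old_doc, in file order.

    Assumption: both documents are parsed snapshots of the same
    updates/<stream>.json file, each with a "releases" list of objects
    that have a "version" key.
    """
    seen = {release["version"] for release in old_doc.get("releases", [])}
    return [
        release["version"]
        for release in new_doc.get("releases", [])
        if release["version"] not in seen
    ]


# Hypothetical snapshots before and after a release was added.
OLD_DOC = {
    "stream": "stable",
    "releases": [
        {"version": "32.20200629.3.0"},
        {"version": "32.20200715.3.0"},
    ],
}
NEW_DOC = {
    "stream": "stable",
    "releases": [
        {"version": "32.20200629.3.0"},
        {"version": "32.20200715.3.0"},
        {"version": "32.20201104.3.0"},
    ],
}
```

A non-empty result from `new_releases(OLD_DOC, NEW_DOC)` would be the signal to kick off whatever follow-up action is wanted.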

12:04 <lucab> (I don't know if renovatebot can do that, I haven't used it before)

12:05 <stereobutter[m]> as far as I know renovatebot can only track github releases and tags and cannot run https requests to find out if a new version exists

12:07 <lucab> if it can activate on git commits and filter by filepath, that may be enough

12:08 <lucab> we may be already pushing out an event on the fedora message bus, I don't exactly remember

12:28 <Sheogorath[m]> https://docs.renovatebot.com/modules/datasource/ <-- these are the ways renovate can consume versions and version upgrades

12:31 <ravanelli> lucab: 👋

12:31 <Sheogorath[m]> It's currently not possible to check the content of a git repository; renovate only works on metadata. Therefore no, that's not possible right now. One could consider writing a datasource integration: https://github.com/renovatebot/renovate/tree/main/lib/modules/datasource

12:31 <ravanelli> lucab: I have a question for you

12:31 <lucab> ravanelli: sure!

12:32 <ravanelli> lucab: I finished the release process late yesterday, and during the release job, I got: ostree pull --commit-metadata-only fedora:fedora/s390x/coreos/testing

12:32 <ravanelli> Seems we don't have the ostree repo for s390x

12:32 <ravanelli> Is there a process to create it?

12:33 <ravanelli> Oops, full error:

12:33 <ravanelli> 23:09:39 + ostree pull --commit-metadata-only fedora:fedora/s390x/coreos/testing

12:33 <ravanelli> 23:09:39 error: No such branch 'fedora/s390x/coreos/testing' in repository summary

12:37 <lucab> ravanelli: ah uhm, yes the branch is probably missing but I guess the pipeline should try to push/create it the first time. Do you have a link to the job/logs?

12:39 <ravanelli> lucab: https://jenkins-fedora-coreos-pipeline.apps.ocp.fedoraproject.org/job/release/385/console

12:44 <lucab> ravanelli: ok, I think that the `OSTree Import s390x: Prod Repo` before that had some troubles, or at least it didn't properly set up the branch

12:49 <lucab> I don't have access to the `coreos-ostree-importer` project on the OCP cluster to see if it says something in the logs, I'd wait for jlebon or dustymabe

12:51 <lucab> (looking at the code, it seems to have logic to deal with scratch-new branches, so it may have just hit a bug)

12:53 paragan has quit [Ping timeout: 246 seconds]

13:08 <ravanelli> lucab: Thanks for checking it ;)

13:12 mheon has joined #fedora-coreos

13:12 <jlebon> ravanelli: try rerunning those, and it should work now

13:13 <jlebon> the bit that failed is the script that waits until the branches are updated

13:13 <jlebon> but it doesn't account for new branches correctly

13:14 <jlebon> i'll see about fixing that, but in the meantime, we can sanity-check that a rerun passes since the summary should've had time to update by now
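
A minimal sketch of a wait loop that tolerates brand-new branches, assuming the caller supplies a function returning the refs currently advertised in the repository summary as a dict of ref name to commit. This is not the pipeline's actual script; `wait_for_branch`, `list_refs`, and the parameters are hypothetical names for illustration.

```python
import time


def wait_for_branch(list_refs, ref, expected_commit, timeout=600.0,
                    interval=10.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until `ref` points at `expected_commit` or the timeout expires.

    A ref that is missing from the summary is treated as "not updated yet"
    rather than as an error, so newly created branches are handled too.
    """
    deadline = clock() + timeout
    while True:
        refs = list_refs()
        # dict.get returns None for a branch that does not exist yet.
        if refs.get(ref) == expected_commit:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to exercise without real delays.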

13:17 <lucab> jlebon: was coreos-ostree-importer happy?

13:18 <lucab> ah, it's only the summary that maybe does not match after the import

13:18 <jlebon> lucab: i didn't check but from the job logs it should be. it knows to create the branch if it doesn't exist

13:19 <jlebon> yeah, and i think there's CDN lag too IIRC

13:20 <jlebon> `ostree remote summary fedora` on my FSB shows both `fedora/s390x/coreos/next` and `fedora/s390x/coreos/testing` are there yup

13:23 <dustymabe> jlebon: ravanelli: all good now?

13:27 <ravanelli> let me rerun that to see

13:27 <dustymabe> hold on one sec

13:28 <dustymabe> we shouldn't need to re-run anything for `testing`/`next` right?

13:28 plarsen has joined #fedora-coreos

13:28 <ravanelli> ok

13:28 <dustymabe> the step at the end that failed is just a check/read-only step

13:29 plarsen has quit [Remote host closed the connection]

13:31 <jlebon> yeah should be fine now. i checked locally the summary was updated.

13:31 <dustymabe> jlebon: can you check the failure in the re-run she did?

13:31 <dustymabe> https://jenkins-fedora-coreos-pipeline.apps.ocp.fedoraproject.org/blue/organizations/jenkins/release/detail/release/387/pipeline/123

13:32 <ravanelli> I stopped it btw

13:32 <jlebon> it was aborted

13:33 <jlebon> to be clear, rerunning it is totally fine too. the job is idempotent. but it's not necessary

13:33 <dustymabe> yeah, not sure if aborting in the middle can leave us in a weird state though

13:34 <dustymabe> i guess we'll find out

13:34 <dustymabe> ok the `stable` release job failed too

13:34 <dustymabe> in a different way

13:34 <ravanelli> It timed out; all the jobs ran for around 2 hours yesterday

13:38 <ravanelli> dustymabe: For the stable one is it ok to rerun again?

13:38 nalind has joined #fedora-coreos

13:38 <dustymabe> ravanelli: should be, yes

13:38 <dustymabe> the issue was that replicating to some regions wasn't working

13:39 <dustymabe> which is why the job timed out, it was retrying to replicate and never worked

13:43 <dustymabe> jlebon: are we good to merge https://github.com/coreos/coreos-assembler/pull/2935 now (well, maybe after that final release job finishes successfully)

13:43 <jlebon> dustymabe: yup should be fine

13:50 <ravanelli> stable just passed. Should we be fine at this point to run the rollout job?

13:51 <dustymabe> I think so - i'm going to do a spot check of the AMIs

13:54 <ravanelli> ok

13:54 ccha has quit [Ping timeout: 260 seconds]

14:35 azukku has joined #fedora-coreos

15:00 ccha has joined #fedora-coreos

15:01 bgilbert has joined #fedora-coreos

15:06 ccha has quit [Ping timeout: 248 seconds]

15:07 <dustymabe> lucab: jlebon: do you mind looking over https://github.com/coreos/fedora-coreos-streams/pull/519 for issues (added s390x arch for testing/next)

15:07 ccha has joined #fedora-coreos

15:17 <ravanelli> I should probably update the time, since it has already passed

15:17 <ravanelli> or is it not an issue?

15:19 plarsen has joined #fedora-coreos

15:21 plarsen has quit [Remote host closed the connection]

15:23 plarsen has joined #fedora-coreos

15:27 <lucab> ravanelli: not a real issue, the rollouts will just immediately start from something slightly larger than 0% (e.g. 0.5%) instead of starting from 0% and growing linearly to 0.5%
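
A minimal sketch of the linear growth lucab describes, assuming a rollout defined by a start timestamp and a duration. The function and parameter names are hypothetical (loosely modeled on rollout metadata), and the numbers in the usage note are illustrative only.

```python
def rollout_fraction(now, start_epoch, duration_minutes, start_percentage=0.0):
    """Return the fraction of nodes (0.0 to 1.0) eligible for the update.

    Assumption: the rollout grows linearly from start_percentage at
    start_epoch to 1.0 once duration_minutes have elapsed.
    """
    if now < start_epoch:
        return 0.0  # rollout has not started yet
    if duration_minutes <= 0:
        return 1.0  # no ramp-up window, everything is eligible at once
    elapsed_seconds = now - start_epoch
    progress = min(1.0, elapsed_seconds / (duration_minutes * 60))
    return start_percentage + (1.0 - start_percentage) * progress
```

For example, with a 2880-minute (48 hour) window, merging 864 seconds after the configured start gives 864 / 172800 = 0.005, so the rollout begins at 0.5% instead of 0%.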

15:28 plarsen has quit [Ping timeout: 255 seconds]

15:36 fifofonix has joined #fedora-coreos

15:40 <jlebon> i think ravanelli is afk, so i just merged it to not let it grow larger :)

15:43 dustymabe has quit [Quit: WeeChat 3.4]

15:43 <ravanelli> jlebon: thanks =)

15:44 dustymabe has joined #fedora-coreos

15:52 jcajka has quit [Quit: Leaving]

15:54 fifofonix has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

16:17 Betal has joined #fedora-coreos

16:32 <jlebon> travier[m], walters: should we hop on a call at some point to hash out https://github.com/coreos/coreos-assembler/pull/2934 ?

16:48 <travier[m]> I have approximately 13 min then I have the OKD WG Meeting

16:48 <travier[m]> then it's better tomorrow for me

16:49 <jlebon> travier[m]: WDYT about the last comment I added there?

16:50 <travier[m]> looking

16:51 <jlebon> if we can stop dirtying the config dir, i'm cool with it

16:51 <dustymabe> anyone know if the CI for ostree builds artifacts I can curl and test with? i.e. https://github.com/ostreedev/ostree/pull/2632

16:52 <travier[m]> jlebon: with the changes from that PR we don't need to override any links and we can directly place those symlinks in gitignore.

16:53 <travier[m]> https://github.com/openshift/os/pull/855/files#diff-bc37d034bad564583790a46f19d807abfe519c5671395fd494d8cce506c42947

16:53 <jlebon> dustymabe: not currently, no

16:53 <travier[m]> but I understand that this is still not ideal

16:53 <travier[m]> we also have the repos "dirtying" the config right now for rhcos

16:53 <travier[m]> and the content set file

16:55 <jlebon> yeah, the RHCOS setup is really awkward

16:55 <travier[m]> And I also agree that the SHA will not define anymore which version is built

16:55 <travier[m]> we would have to store the variant name / version somewhere alongside it to keep track of it

16:56 <travier[m]> it will however still correctly reflect the commit used.

16:58 <jlebon> i think we can roll with this for now, though i do wonder if all this would be cleaner if it were separate branches with shared files instead, like FCOS is set up. anyway, we can discuss this more down the line

16:59 <travier[m]> I would have like having separated branches but this would have required some CI setup that is complex with Prow in openshift org

16:59 <jlebon> (so the flip side of what walters said at the end of https://github.com/coreos/coreos-assembler/pull/2934#issuecomment-1161661690)

16:59 <travier[m]> It's already complex enough to build RHCOS to test it there

16:59 <travier[m]> Maybe we can do the reverse and move FCOS to a single branch

17:00 <travier[m]> liked*

17:00 <jlebon> mayyybe. the benefits would have to be really worth it

17:00 <travier[m]> (going to OKD WG meeting)

17:00 <jlebon> ack, ttyl!

17:00 <travier[m]> We can have a quick chat after in an hour if you'd like

17:02 cmagina has quit [Changing host]

17:02 cmagina has joined #fedora-coreos

17:07 jpn has quit [Ping timeout: 256 seconds]

17:08 <ravanelli> dustymabe: jlebon last question about the release, the graph for s390x was not generated https://builds.coreos.fedoraproject.org/graph?stream=next&basearch=s390x

17:08 <ravanelli> What should we do to create it?

17:12 <ravanelli> + we need to update the issue template to add s390x, seems I don't have to update it

17:13 <ravanelli> s/have/have access

17:19 saqali_ has quit [Ping timeout: 248 seconds]

17:21 saqali_ has joined #fedora-coreos

17:22 jpn has joined #fedora-coreos

17:27 tormath1 has quit [Quit: leaving]

17:53 azukku has quit [Quit: Leaving.]

17:55 jpn has quit [Ping timeout: 268 seconds]

17:58 jpn has joined #fedora-coreos

18:05 jpn has quit [Ping timeout: 240 seconds]

18:20 <dustymabe> ravanelli: I think you're right. looks like we need to do something for s390x

18:21 <dustymabe> https://github.com/coreos/fedora-coreos-streams/issues/514#issuecomment-1162149942

18:21 <dustymabe> cc bgilbert jlebon

18:28 jpn has joined #fedora-coreos

18:32 <jlebon> dustymabe: i think we just need https://github.com/coreos/fedora-coreos-cincinnati/pull/80, but will let lucab confirm

18:37 jpn has quit [Ping timeout: 256 seconds]

18:44 jpn has joined #fedora-coreos

18:53 jpn has quit [Ping timeout: 256 seconds]

19:03 jpn has joined #fedora-coreos

19:08 <jlebon> all, just wanted to mention a few things about the @CoreOS/continuous COPR repo: (1) i've removed el8-based chroots because most upstream projects no longer build as is for various reasons, (2) i've enabled the aarch64, s390x, and ppc64le chroots; this means it's now possible to get multi-arch git main packages! as a result, i've closed https://pagure.io/releng/issue/9801 which pursued doing this in

19:08 <jlebon> koji via packit.

19:16 <bgilbert> \o/

19:27 jpn has quit [Ping timeout: 240 seconds]

19:32 crobinso has quit [Remote host closed the connection]

19:41 guesswhat has joined #fedora-coreos

19:41 <dustymabe> jlebon: let's try to re-intro the cosa generate-hashlist parallel workflow now: https://github.com/coreos/fedora-coreos-pipeline/pull/555

19:41 <dustymabe> and one more additional parallel thingy: https://github.com/coreos/fedora-coreos-pipeline/pull/556

19:42 <guesswhat> I am trying to expose zincati socket via socat , to scape it with prometheus, but I can not get it work, any ideas? ( /usr/bin/socat tcp-listen:9988,fork,reuseaddr unix-connect:/run/zincati/public/metrics.promsock )

19:42 <guesswhat> *scrape

19:42 <guesswhat> curl 127.0.0.1:9988 returns curl: (1) Received HTTP/0.9 when not allowed

19:43 <guesswhat> lucab any ideas? i saw your exporter.. thanks

19:45 <dustymabe> guesswhat: never done it before but I highly doubt the socket is going to speak HTTP (which is what curl is trying to use)

19:47 <guesswhat> but its working for docker.sock afaik

19:49 <guesswhat> its working like this echo -e "GET / HTTP/1.0\r\n" | socat unix-connect:/run/zincati/public/metrics.promsock STDIO

19:50 <dustymabe> guesswhat: maybe i'm wrong then :)

19:51 <guesswhat> its working even with empty stdin echo "" | socat unix-connect:/run/zincati/public/metrics.promsock STDIO

19:51 <dustymabe> maybe specify different http versions to curl? --http1.0 --http1.1 etc..

19:51 <dustymabe> --http0.9

19:52 <guesswhat> yeah, it works, but not sure how to use it in prometheus :X
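
A minimal sketch of one way to make the socket scrapeable, assuming (as the socat experiments above suggest) that the socket writes plain metrics text on connect, without an HTTP status line or headers. The bridge below reads that text and wraps it in a proper HTTP response. `read_metrics`, `make_handler`, and `serve` are hypothetical helper names, and port 9988 is taken from the socat command in the conversation.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

SOCKET_PATH = "/run/zincati/public/metrics.promsock"


def read_metrics(path=SOCKET_PATH, timeout=5.0):
    """Connect to the Unix socket and read everything until the peer closes."""
    chunks = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def make_handler(path):
    """Build a request handler that serves the socket's output over HTTP."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            try:
                body = read_metrics(path)
            except OSError as err:
                self.send_error(502, f"cannot read {path}: {err}")
                return
            self.send_response(200)
            # Content type of the Prometheus text exposition format.
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return MetricsHandler


def serve(port=9988, path=SOCKET_PATH):
    """Serve metrics on 127.0.0.1:<port> until interrupted."""
    HTTPServer(("127.0.0.1", port), make_handler(path)).serve_forever()
```

Calling `serve()` on the host would then let Prometheus scrape `http://127.0.0.1:9988/` with ordinary HTTP/1.x framing.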

19:59 <dustymabe> gursewak: jlebon: now that we've shipped the releases this week we should be able to proceed with the `rd.debug` change we discussed last week

19:59 <dustymabe> OR - should we wait until we see the problem again and then enable `rd.debug` ?

20:07 <jlebon> sure, that SGTM

20:07 <jlebon> (the latter)

20:25 crobinso has joined #fedora-coreos

20:48 dwalsh_ has quit [Quit: Leaving]

21:08 saqali__ has joined #fedora-coreos

21:10 <dustymabe> crobinso: will we need to wait the full 14 days for https://bodhi.fedoraproject.org/updates/FEDORA-2022-0142d562ca to land?

21:11 <crobinso> dustymabe: pushed now :)

21:11 saqali_ has quit [Ping timeout: 264 seconds]

21:11 <dustymabe> crobinso: thanks!

21:22 crobinso has quit [Remote host closed the connection]

22:10 heldwin has quit [Quit: Teleporting ...]

22:24 nalind has quit [Quit: bye]

23:11 cyberpear has quit [Quit: Connection closed for inactivity]

23:19 mheon has quit [Ping timeout: 248 seconds]