dustymabe changed the topic of #fedora-coreos to: Fedora CoreOS :: Find out more at https://getfedora.org/coreos/ :: Logs at https://libera.irclog.whitequark.org/fedora-coreos
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 276 seconds]
stephan has quit [Ping timeout: 240 seconds]
stephan has joined #fedora-coreos
gursewak has quit [Ping timeout: 244 seconds]
pwhalen has quit [Ping timeout: 244 seconds]
jdoss has quit [Ping timeout: 248 seconds]
jpn has joined #fedora-coreos
jdoss has joined #fedora-coreos
pwhalen has joined #fedora-coreos
jpn has quit [Ping timeout: 276 seconds]
jdoss has quit [Ping timeout: 276 seconds]
pwhalen has quit [Ping timeout: 258 seconds]
pwhalen has joined #fedora-coreos
jdoss has joined #fedora-coreos
ravanelli has quit [Remote host closed the connection]
gursewak has joined #fedora-coreos
gursewak has quit [Ping timeout: 248 seconds]
paragan has joined #fedora-coreos
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 248 seconds]
gursewak has joined #fedora-coreos
paragan has quit [Ping timeout: 248 seconds]
bagasse_ has quit [Quit: Leaving]
paragan has joined #fedora-coreos
tormath1 has joined #fedora-coreos
jcajka has joined #fedora-coreos
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 256 seconds]
Betal has quit [Quit: WeeChat 3.5]
stereobutter[m] has joined #fedora-coreos
<stereobutter[m]> Hi folks :) I'm currently toying with gitops-based system upgrades for FCOS, which means I will be calling `rpm-ostree deploy <some_version>`. What I'd like to do is use the graph data provided by Cincinnati to validate that the current version is compatible with `some_version`. However, Cincinnati's update graph apparently only contains edges for going from a version to the most current version.
<stereobutter[m]> Say I'm on 32.20200629.3.0 and the new version is 32.20200715.3.0. I assume this is a valid transition (but the edge is omitted from the graph because the number of edges would explode) and is implied by both versions pointing at 32.20201104.3.0?
jpn has joined #fedora-coreos
<lucab> stereobutter: yes, it is a valid transition
<stereobutter[m]> So the graph would look something like the above if one painted in the implicit edges
<lucab> stereobutter: yes, it's just that the graph is optimized to minimize the amount of updates/reboots
jpn has quit [Ping timeout: 246 seconds]
<stereobutter[m]> And I assume for downgrading the same rules apply. I can only go back to a version *A* < *B* when *A* -> *B* is a valid upgrade transition?
jpn has joined #fedora-coreos
<lucab> stereobutter: no, downgrades in general have no guarantees. You cannot invert the direction of the graph and try to infer anything.
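The rules discussed so far can be sketched in Python. This is only an illustration, not how Zincati or Cincinnati decide anything: it assumes the graph JSON has a `nodes` list whose entries carry a `version` field and an `edges` list of `[from_index, to_index]` pairs, and it assumes versions can be ordered by comparing their dot-separated numeric parts. The function names are made up for this sketch.

```python
def parse_version(version):
    """Turn '32.20200629.3.0' into a tuple of ints for ordering (assumed scheme)."""
    return tuple(int(part) for part in version.split("."))


def direct_targets(graph, version):
    """Return the versions that `version` has an explicit edge to."""
    nodes = graph["nodes"]
    index = {node["version"]: i for i, node in enumerate(nodes)}
    if version not in index:
        return set()
    src = index[version]
    return {nodes[dst]["version"] for s, dst in graph["edges"] if s == src}


def is_implied_upgrade(graph, current, target):
    """Accept an upgrade if the edge exists or both versions share a target.

    Downgrades are always rejected: the graph direction cannot be inverted
    to infer anything about them.
    """
    if parse_version(target) <= parse_version(current):
        return False
    current_targets = direct_targets(graph, current)
    if target in current_targets:
        return True
    return bool(current_targets & direct_targets(graph, target))
```

With a graph where both 32.20200629.3.0 and 32.20200715.3.0 point at 32.20201104.3.0, `is_implied_upgrade(graph, "32.20200629.3.0", "32.20200715.3.0")` is accepted, while the reverse direction is refused.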
<stereobutter[m]> So basically the only "safe" thing to do is a rollback i.e. I came from A and installed B then B->A is okay
<stereobutter[m]> ?
<stereobutter[m]> Maybe not "safe" but "sensible"
<lucab> that usually works in practice, yes, but it has no strict guarantees either. It depends on the specific software (and exact versions) on both A and B
<stereobutter[m]> The reason BTW why I'd like to manually deploy versions is that for some workloads we use nvidia GPUs and the whole driver thing is a mess (we got it to work) but it will for the foreseeable future only work reliably with super fixed versions (both driver and FCOS), so we'd like to periodically try a new FCOS+Driver version and then pin this
<stereobutter[m]> and roll out exactly this
<lucab> usual example is: release A contains fooware-1.0 and release B contains fooware-2.0. fooware-2.0 has up-migration logic for data generated by 1.0, but fooware-1.0 has no idea about the layout of migrated data. As soon as release B is booted and the data is migrated, rolling-back to release A may result in troubles for fooware.
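The fooware example can be made concrete with a toy Python sketch. Everything here is hypothetical (the `layout` field, the `items` format, the loader names); it only shows why a one-way up-migration makes a later rollback fail.

```python
class UnsupportedLayout(Exception):
    """Raised when a loader meets an on-disk layout it does not know."""


def load_v1(data):
    """fooware-1.0: only understands layout 1."""
    if data.get("layout", 1) != 1:
        raise UnsupportedLayout(f"layout {data['layout']} is unknown to 1.0")
    return data


def load_v2(data):
    """fooware-2.0: reads layout 2 and up-migrates layout 1 on first use."""
    if data.get("layout", 1) == 1:
        # One-way migration: the comma-separated string becomes a list.
        return {"layout": 2, "items": data["items"].split(",")}
    return data
```

Once `load_v2` has rewritten the data, handing the result back to `load_v1` raises `UnsupportedLayout`, which is the rollback problem described above.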
jpn has quit [Ping timeout: 256 seconds]
<stereobutter[m]> Yeah sure, that makes sense. Say I use `rpm-ostree` to overlay a package `Foo` where A has `FooV1` and B has `FooV2`, and assuming `Foo` does not do anything too exotic upon installation, will this work from the FCOS point of view when I roll back from B to A?
<lucab> stereobutter: it's a reasonable scenario, and zincati cannot cover all the very-custom strategies. I'd say it makes sense to consume the graph on your own and drive rpm-ostree as you wish
<stereobutter[m]> (Since we haven't upgraded anything yet we have not been in the situation to perform a rollback yet)
<lucab> stereobutter: yes, the rollback part for the packages in itself should work. The main caveats are about persisted data, and data-format migrations between `FooV1` and `FooV2`.
<stereobutter[m]> Nice!
<lucab> sidenote, that's the reason why we suggest putting `fooware` in a container, so that OS upgrades/rollbacks do not force it to go back and forth between V1<->V2
<stereobutter[m]> Yeah, the Nvidia Driver already runs in a container but there is still some RPM that currently has to be installed (see https://discussion.fedoraproject.org/t/can-you-run-nvidia-gpu-workloads-on-fcos/35090). I actually had a look at this together with another developer for almost a week before we managed to get everything up and running.
ravanelli has joined #fedora-coreos
<stereobutter[m]> When working on this I had the feeling that there are probably only a handful to a couple hundred people in existence that actually know how the whole thing actually works
jpn has joined #fedora-coreos
<stereobutter[m]> > putting fooware in a container, so that OS upgrades/rollbacks do not force it to go back and forth
<stereobutter[m]> only works if the container works on both versions. In the driver case the container version must match the OS version
dwalsh_ has joined #fedora-coreos
ravanelli has quit [Ping timeout: 256 seconds]
<stereobutter[m]> Would it be possible to also add a github release on https://github.com/coreos/fedora-coreos-streams for every new FCOS version? That way one could point e.g. renovatebot at the repo and trigger some action when a new FCOS version is released.
<stereobutter[m]> alternatively to github releases just plain git tags would also suffice I guess.
<stereobutter[m]> I had a look at the github actions in the repo but from looking at them I'm not sure what the actual release process is like and when this should run.
dwalsh_ has quit [Remote host closed the connection]
dwalsh_ has joined #fedora-coreos
ravanelli has joined #fedora-coreos
crobinso has joined #fedora-coreos
<lucab> stereobutter: a better trigger would be "whenever update/<yourstream.json> changes", as that is really the point when a new release starts to be rolled out
<lucab> *updates/
<lucab> (I don't know if renovatebot can do that, I haven't used it before)
<stereobutter[m]> as far as I know renovatebot can only track github releases and tags and cannot run https requests to find out if a new version exists
<lucab> if it can activate on git commits and filter by filepath, that may be enough
<lucab> we may be already pushing out an event on the fedora message bus, I don't exactly remember
<Sheogorath[m]> https://docs.renovatebot.com/modules/datasource/ <-- these are the ways renovate can consume versions and version upgrades
<ravanelli> lucab: 👋
<Sheogorath[m]> It's currently not possible to check for content of a git repository, renovate only works on metadata. Therefore no, that's not possible right now. One could consider to write a datasource integration https://github.com/renovatebot/renovate/tree/main/lib/modules/datasource
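As a rough alternative to a Renovate datasource, a small script could compare two snapshots of the stream's updates file and act on any new release. The sketch below assumes the file is JSON with a top-level `releases` list whose entries have a `version` field; fetching the file and the follow-up action are left out, and the function names are invented here.

```python
def release_versions(updates):
    """List the versions named in one snapshot of the updates file."""
    return [release["version"] for release in updates.get("releases", [])]


def newly_added(old_updates, new_updates):
    """Return versions present in the new snapshot but not in the old one."""
    seen = set(release_versions(old_updates))
    return [v for v in release_versions(new_updates) if v not in seen]
```

A cron job or CI workflow could store the last snapshot, call `newly_added` on each run, and trigger the pinning workflow when the result is non-empty.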
<ravanelli> lucab: I have a question for you
<lucab> ravanelli: sure!
<ravanelli> lucab: I finished the release process late yesterday, and during the release job, I got: ostree pull --commit-metadata-only fedora:fedora/s390x/coreos/testing
<ravanelli> Seems we don't have the ostree repo for s390x
<ravanelli> Is there a process to create it?
<ravanelli> Oops, full error:
<ravanelli> 23:09:39 + ostree pull --commit-metadata-only fedora:fedora/s390x/coreos/testing
<ravanelli> 23:09:39 error: No such branch 'fedora/s390x/coreos/testing' in repository summary
<lucab> ravanelli: ah uhm, yes the branch is probably missing but I guess the pipeline should try to push/create it the first time. Do you have a link to the job/logs?
<lucab> ravanelli: ok, I think that the `OSTree Import s390x: Prod Repo` before that had some troubles, or at least it didn't properly set up the branch
<lucab> I don't have access to the `coreos-ostree-importer` project on the OCP cluster to see if it says something in the logs, I'd wait for jlebon or dustymabe
<lucab> (looking at the code, it seems to have logic to deal with scratch-new branches, so it may have just hit a bug)
paragan has quit [Ping timeout: 246 seconds]
<ravanelli> lucab: Thanks for checking it ;)
mheon has joined #fedora-coreos
<jlebon> ravanelli: try rerunning those, and it should work now
<jlebon> the bit that failed is the script that waits until the branches are updated
<jlebon> but it doesn't account for new branches correctly
<jlebon> i'll see about fixing that, but in the meantime, we can sanity-check that a rerun passes since the summary should've had time to update by now
<lucab> jlebon: was coreos-ostree-importer happy?
<lucab> ah, it's only the summary that maybe does not match after the import
<jlebon> lucab: i didn't check but from the job logs it should be. it knows to create the branch if it doesn't exist
<jlebon> yeah, and i think there's CDN lag too IIRC
<jlebon> `ostree remote summary fedora` on my FSB shows both `fedora/s390x/coreos/next` and `fedora/s390x/coreos/testing` are there yup
<dustymabe> jlebon: ravanelli: all good now?
<ravanelli> let me rerun that to see
<dustymabe> hold on one sec
<dustymabe> we shouldn't need to re-run anything for `testing`/`next` right?
plarsen has joined #fedora-coreos
<ravanelli> ok
<dustymabe> the step at the end that failed is just a check/read-only step
plarsen has quit [Remote host closed the connection]
<jlebon> yeah should be fine now. i checked locally the summary was updated.
<dustymabe> jlebon: can you check the failure in the re-run she did?
<ravanelli> I stopped it btw
<jlebon> it was aborted
<jlebon> to be clear, rerunning it is totally fine too. the job is idempotent. but it's not necessary
<dustymabe> yeah, not sure if aborting in the middle can leave us in a weird state though
<dustymabe> i guess we'll find out
<dustymabe> ok the `stable` release job failed too
<dustymabe> in a different way
<ravanelli> It timed out; all jobs ran for around 2 hours yesterday
<ravanelli> dustymabe: For the stable one is it ok to rerun again?
nalind has joined #fedora-coreos
<dustymabe> ravanelli: should be, yes
<dustymabe> the issue was that replicating to some regions wasn't working
<dustymabe> which is why the job timed out, it was retrying the replication and it never worked
<dustymabe> jlebon: are we good to merge https://github.com/coreos/coreos-assembler/pull/2935 now (well, maybe after that final release job finishes successfully)
<jlebon> dustymabe: yup should be fine
<ravanelli> stable just passed. Should we be fine at this point to run the rollout job?
<dustymabe> I think so - i'm going to do a spot check of the AMIs
<ravanelli> ok
ccha has quit [Ping timeout: 260 seconds]
azukku has joined #fedora-coreos
ccha has joined #fedora-coreos
bgilbert has joined #fedora-coreos
ccha has quit [Ping timeout: 248 seconds]
<dustymabe> lucab: jlebon: do you mind looking over https://github.com/coreos/fedora-coreos-streams/pull/519 for issues (added s390x arch for testing/next)
ccha has joined #fedora-coreos
<ravanelli> I should probably update the time that already passed
<ravanelli> or is it not an issue?
plarsen has joined #fedora-coreos
plarsen has quit [Remote host closed the connection]
plarsen has joined #fedora-coreos
<lucab> ravanelli: not a real issue, the rollouts will just immediately start from something slightly larger than 0% (e.g. 0.5%) instead of starting from 0% and growing linearly to 0.5%
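The effect of a start time that has already passed can be sketched with a simple linear ramp. This is an illustration only, not Zincati's or Cincinnati's actual code; the parameter names `start_epoch`, `duration_minutes` and `start_percentage` are assumptions modelled on typical rollout metadata.

```python
def rollout_fraction(now_epoch, start_epoch, duration_minutes, start_percentage=0.0):
    """Fraction of nodes (0.0 to 1.0) that should be offered the update."""
    if now_epoch < start_epoch:
        return 0.0  # rollout has not started yet
    duration = duration_minutes * 60
    if duration <= 0:
        return 1.0  # no ramp: everyone gets it immediately
    elapsed = min(now_epoch - start_epoch, duration)
    # Grow linearly from start_percentage to 100% over the duration.
    return start_percentage + (1.0 - start_percentage) * elapsed / duration
```

For instance, if the metadata is merged 30 minutes after `start_epoch` with a 6000-minute ramp, the first value served is already about 0.005 (0.5%) rather than 0.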
plarsen has quit [Ping timeout: 255 seconds]
fifofonix has joined #fedora-coreos
<jlebon> i think ravanelli is afk, so i just merged it to not let it grow larger :)
dustymabe has quit [Quit: WeeChat 3.4]
<ravanelli> jlebon: thanks =)
dustymabe has joined #fedora-coreos
jcajka has quit [Quit: Leaving]
fifofonix has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
Betal has joined #fedora-coreos
<jlebon> travier[m], walters: should we hop on a call at some point to hash out https://github.com/coreos/coreos-assembler/pull/2934 ?
<travier[m]> I have approximately 13 min then I have the OKD WG Meeting
<travier[m]> then it's better tomorrow for me
<jlebon> travier[m]: WDYT about the last comment I added there?
<travier[m]> looking
<jlebon> if we can stop dirtying the config dir, i'm cool with it
<dustymabe> anyone know if the CI for ostree builds artifacts I can curl and test with? i.e. https://github.com/ostreedev/ostree/pull/2632
<travier[m]> jlebon: with the changes from that PR we don't need to override any links and we can directly place those symlinks in gitignore.
<jlebon> dustymabe: not currently, no
<travier[m]> but I understand that this is still not ideal
<travier[m]> we also have the repos "dirtying" the config right now for rhcos
<travier[m]> and the content set file
<jlebon> yeah, the RHCOS setup is really awkward
<travier[m]> And I also agree that the SHA will not define anymore which version is built
<travier[m]> we would have to store the variant name / version somewhere alongside it to keep track of it
<travier[m]> it will however still correctly reflect the commit used.
<jlebon> i think we can roll with this for now, though i do wonder if all this would be cleaner if it were separate branches instead and shared files like FCOS is setup. anyway, we can discuss this more down the line
<travier[m]> I would have like having separated branches but this would have required some CI setup that is complex with Prow in openshift org
<jlebon> (so the flip side of what walters said at the end of https://github.com/coreos/coreos-assembler/pull/2934#issuecomment-1161661690)
<travier[m]> It's already complex enough to build RHCOS to test it there
<travier[m]> Maybe we can do the reverse and move FCOS to a single branch
<travier[m]> liked*
<jlebon> mayyybe. the benefits would have to be really worth it
<travier[m]> (going to OKD WG meeting)
<jlebon> ack, ttyl!
<travier[m]> We can have a quick chat after in an hour if you'd like
cmagina has quit [Changing host]
cmagina has joined #fedora-coreos
jpn has quit [Ping timeout: 256 seconds]
<ravanelli> dustymabe: jlebon last question about the release, the graph for s390x was not generated https://builds.coreos.fedoraproject.org/graph?stream=next&basearch=s390x
<ravanelli> What should we do to create it?
<ravanelli> + we need to update the issue template to add s390x, seems I don't have to update it
<ravanelli> s/have/have access
saqali_ has quit [Ping timeout: 248 seconds]
saqali_ has joined #fedora-coreos
jpn has joined #fedora-coreos
tormath1 has quit [Quit: leaving]
azukku has quit [Quit: Leaving.]
jpn has quit [Ping timeout: 268 seconds]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 240 seconds]
<dustymabe> ravanelli: I think you're right. looks like we need to do something for s390x
<dustymabe> cc bgilbert jlebon
jpn has joined #fedora-coreos
<jlebon> dustymabe: i think we just need https://github.com/coreos/fedora-coreos-cincinnati/pull/80, but will let lucab confirm
jpn has quit [Ping timeout: 256 seconds]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 256 seconds]
jpn has joined #fedora-coreos
<jlebon> all, just wanted to mention a few things about the @CoreOS/continuous COPR repo: (1) i've removed el8-based chroots because most upstream projects no longer build as is for various reasons, (2) i've enabled the aarch64, s390x, and ppc64le chroots; this means it's now possible to get multi-arch git main packages! as a result, i've closed https://pagure.io/releng/issue/9801 which pursued doing this in
<jlebon> koji via packit.
<bgilbert> \o/
jpn has quit [Ping timeout: 240 seconds]
crobinso has quit [Remote host closed the connection]
guesswhat has joined #fedora-coreos
<dustymabe> jlebon: let's try to re-intro the cosa generate-hashlist parallel workflow now: https://github.com/coreos/fedora-coreos-pipeline/pull/555
<dustymabe> and one more additional parallel thingy: https://github.com/coreos/fedora-coreos-pipeline/pull/556
<guesswhat> I am trying to expose the zincati socket via socat, to scape it with prometheus, but I cannot get it to work, any ideas? ( /usr/bin/socat tcp-listen:9988,fork,reuseaddr unix-connect:/run/zincati/public/metrics.promsock )
<guesswhat> *scrape
<guesswhat> curl 127.0.0.1:9988 returns curl: (1) Received HTTP/0.9 when not allowed
<guesswhat> lucab any ideas? i saw your exporter.. thanks
<dustymabe> guesswhat: never done it before but I highly doubt the socket is going to speak HTTP (which is what curl is trying to use)
<guesswhat> but it's working for docker.sock afaik
<guesswhat> it's working like this: echo -e "GET / HTTP/1.0\r\n" | socat unix-connect:/run/zincati/public/metrics.promsock STDIO
<dustymabe> guesswhat: maybe i'm wrong then :)
<guesswhat> it's working even with empty stdin: echo "" | socat unix-connect:/run/zincati/public/metrics.promsock STDIO
<dustymabe> maybe specify different http versions to curl? --http1.0 --http1.1 etc..
<dustymabe> --http0.9
<guesswhat> yeah, it works, but not sure how to use it in prometheus :X
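Judging from the behaviour above, the socket seems to write the metrics text as soon as a client connects, without HTTP headers, which would explain curl's HTTP/0.9 complaint. One possible workaround is a tiny bridge that reads the socket and wraps the payload in a regular HTTP response for Prometheus. The following is a minimal Python sketch under that assumption; the port, the handler name and the content type are choices made for this example.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

SOCKET_PATH = "/run/zincati/public/metrics.promsock"


def read_unix_socket(path):
    """Connect to a Unix stream socket and read until the peer closes."""
    chunks = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            body = read_unix_socket(SOCKET_PATH)
        except OSError as err:
            self.send_error(502, f"cannot read metrics socket: {err}")
            return
        # Wrap the raw metrics text in a proper HTTP/1.x response.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9988):
    """Serve the bridged metrics on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling `serve()` from a small systemd unit and pointing a Prometheus scrape target at 127.0.0.1:9988 would then replace the socat line.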
<dustymabe> gursewak: jlebon: now that we've shipped the releases this week we should be able to proceed with the `rd.debug` change we discussed last week
<dustymabe> OR - should we wait until we see the problem again and then enable `rd.debug` ?
<jlebon> sure, that SGTM
<jlebon> (the latter)
crobinso has joined #fedora-coreos
dwalsh_ has quit [Quit: Leaving]
saqali__ has joined #fedora-coreos
<dustymabe> crobinso: will we need to wait the full 14 days for https://bodhi.fedoraproject.org/updates/FEDORA-2022-0142d562ca to land?
<crobinso> dustymabe: pushed now :)
saqali_ has quit [Ping timeout: 264 seconds]
<dustymabe> crobinso: thanks!
crobinso has quit [Remote host closed the connection]
heldwin has quit [Quit: Teleporting ...]
nalind has quit [Quit: bye]
cyberpear has quit [Quit: Connection closed for inactivity]
mheon has quit [Ping timeout: 248 seconds]