#fedora-coreos on 2022-12-13 — irc logs at libera.irclog.whitequark.org

2022-05-11 12:42 dustymabe changed the topic of #fedora-coreos to: Fedora CoreOS :: Find out more at https://getfedora.org/coreos/ :: Logs at https://libera.irclog.whitequark.org/fedora-coreos

00:00 michele- has quit [*.net *.split]

00:00 strigazi has quit [*.net *.split]

00:00 pej0703 has quit [*.net *.split]

00:00 djinni`_ has quit [*.net *.split]

00:00 ikonia has quit [*.net *.split]

00:00 sayan has quit [*.net *.split]

00:00 strigazi has joined #fedora-coreos

00:00 pej0703 has joined #fedora-coreos

00:02 djinni` has joined #fedora-coreos

00:02 michele_ has joined #fedora-coreos

00:04 sayan has joined #fedora-coreos

00:05 ikonia has joined #fedora-coreos

00:33 jpn has quit [Ping timeout: 268 seconds]

01:00 jpn has joined #fedora-coreos

01:33 <dustymabe> spresti[m]: i merged two of them (and I see you merged the other one)

01:33 <dustymabe> so they are all merged now

01:34 <dustymabe> i'll kick off the builds so they can run overnight and hopefully a lot of the work will be done by the time we wake up

01:45 jpn has quit [Ping timeout: 265 seconds]

01:47 plarsen has quit [Quit: NullPointerException!]

01:52 ravanelli has quit [Remote host closed the connection]

02:12 jpn has joined #fedora-coreos

02:25 jpn has quit [Ping timeout: 246 seconds]

02:46 nb has quit [Quit: The Lounge - https://thelounge.chat]

02:47 nb has joined #fedora-coreos

02:51 jpn has joined #fedora-coreos

02:58 jpn has quit [Ping timeout: 268 seconds]

03:12 jpn has joined #fedora-coreos

03:15 hyperreal has joined #fedora-coreos

03:30 vgoyal has quit [Quit: Leaving]

04:06 jpn has quit [Ping timeout: 268 seconds]

04:16 paragan has joined #fedora-coreos

04:19 jpn has joined #fedora-coreos

04:21 mnguyen_ has quit [Ping timeout: 246 seconds]

04:25 jpn has quit [Ping timeout: 246 seconds]

04:39 jpn has joined #fedora-coreos

04:55 jtgreene has quit [Quit: fBNC - https://bnc4free.com]

05:03 jpn has quit [Ping timeout: 272 seconds]

05:14 jpn has joined #fedora-coreos

05:19 jpn has quit [Ping timeout: 252 seconds]

05:23 paragan has quit [Ping timeout: 268 seconds]

05:30 paragan has joined #fedora-coreos

05:30 jtgreene has joined #fedora-coreos

05:30 jtgreene has quit [Changing host]

05:32 jpn has joined #fedora-coreos

05:38 jpn has quit [Ping timeout: 260 seconds]

05:56 jpn has joined #fedora-coreos

06:15 jpn has quit [Ping timeout: 272 seconds]

06:52 jcajka has joined #fedora-coreos

06:57 jpn has joined #fedora-coreos

07:17 ravanelli has joined #fedora-coreos

07:21 ravanelli has quit [Ping timeout: 246 seconds]

07:46 saschagrunert has joined #fedora-coreos

08:29 hotbox has quit [Ping timeout: 265 seconds]

08:36 jpn has quit [Ping timeout: 272 seconds]

08:56 c4rt0 has joined #fedora-coreos

09:49 jpn has joined #fedora-coreos

11:00 ravanelli has joined #fedora-coreos

11:48 ravanelli has quit [Remote host closed the connection]

11:51 ravanelli has joined #fedora-coreos

12:15 ravanelli has quit [Remote host closed the connection]

12:17 ravanelli has joined #fedora-coreos

12:18 ravanelli has quit [Remote host closed the connection]

12:19 ravanelli has joined #fedora-coreos

12:26 vgoyal has joined #fedora-coreos

12:28 ravanelli has quit [Remote host closed the connection]

12:28 ravanelli has joined #fedora-coreos

12:39 mnguyen_ has joined #fedora-coreos

12:43 jpn has quit [Ping timeout: 272 seconds]

13:12 plundra has quit [Quit: fraggeln]

13:12 plundra has joined #fedora-coreos

13:34 <dustymabe> spresti[m]: looks like the builds from last night failed - i'll help look into the failures after I drop kids off at school

13:37 fifofonix has joined #fedora-coreos

13:49 jpn has joined #fedora-coreos

13:57 hyperreal has quit [Quit: The Lounge - https://thelounge.chat]

14:03 <spresti[m]> Ah thank you for kicking those off.

14:03 <spresti[m]> Ok checking on it now.

14:04 hyperreal has joined #fedora-coreos

14:11 admin0 has joined #fedora-coreos

14:15 <spresti[m]> Hmm, it seems like two of the builds [next, testing] exited on the same error: "Importing failed: Command '['ostree', '--repo=/mnt/koji/compose/ostree/repo', 'pull-local', '/tmp/tmppb75h0j5', '607ca27fa44a4b074c33bcc4979bf476725dd513a80c530ab1b45be332f4dfdd']"

14:23 <dustymabe> spresti[m]: looking closer now

14:24 <spresti[m]> kk

14:24 c4rt0 has quit [Remote host closed the connection]

14:25 <dustymabe> ok it appears to me that the ostree-importer might not be in working order

14:25 <dustymabe> investigating that now

14:28 c4rt0_ has joined #fedora-coreos

14:29 rsalveti has joined #fedora-coreos

14:30 <dustymabe> it looks like we're almost out of free space on that volume and the ostree importer is failing because of that

14:31 nalind has joined #fedora-coreos

14:38 <dustymabe> ok I dropped the free space percent check to 1% and will work to run a prune operation in the background

14:39 <dustymabe> for now we should be unblocked I think

14:39 <dustymabe> jlebon: around? I'd like to discuss implications of the failures if you don't mind

14:55 <spresti[m]> thank you dustymabe !

14:59 <dustymabe> spresti[m]: let's hold off on new builds for now

14:59 <dustymabe> still need to discuss some things with jlebon - and we still don't know exactly why the `stable` build failed

15:07 <spresti[m]> Ah sorry.

15:09 <jlebon> dustymabe: here now

15:10 <jlebon> ouch, the volume on which the compose repo is hosted is running out of space?

15:12 <dustymabe> jlebon: spresti[m]: for the `stable` failure: https://github.com/coreos/fedora-coreos-config/pull/2125

15:13 <dustymabe> jlebon: yeah, I never got back around to running the ostree pruner consistently so it's expected to eventually hit the problem

15:14 <dustymabe> that's probably a gap I should close this week

15:15 <dustymabe> jlebon: I think the testing and next builds are salvageable, though

15:15 <dustymabe> since the ostree import is like the last thing that runs

15:15 <dustymabe> and the release job will run it again to import into the prod repo

15:15 <jlebon> dustymabe: huh, i had thought it was up and operational already. is the repo on its own volume or shared with other koji things? is this affecting anyone else?

15:16 <jlebon> i guess at least the other ostree-based variants

15:16 <dustymabe> jlebon: yeah, theoretically the pungi composes would have this same problem

15:16 <dustymabe> I'll run a prune operation after we get this set of FCOS releases out the door

15:17 <jlebon> sounds good

15:17 <dustymabe> do you agree that the `testing` and `next` builds should be salvageable?

15:18 <dustymabe> we'll have to manually run cloud tests, but that shouldn't be too hard to do

15:19 <dustymabe> "manually run" == kick the jenkins jobs off with the right parameters

15:19 c4rt0_ has quit [Quit: Leaving]

15:20 c4rt0 has joined #fedora-coreos

15:22 <jlebon> dustymabe: restarted CI

15:22 <jlebon> let me check the jobs re. testing and next

15:23 <dustymabe> IIRC when the release job runs the ostree import will happen in the prod repo then. It isn't fatal if the commit doesn't exist in the compose ostree repo already

15:24 * dustymabe opens a PR to move the cloud tests to before the ostree import

15:27 <jlebon> dustymabe: agreed. the importer looks like it should be able to handle that fine.

15:28 <jlebon> i'll start the cloud tests. starting with kola-aws in case someone wants to do another

15:28 <dustymabe> I can handle gcp

15:31 <dustymabe> ok i'll do openstack now

15:31 <jlebon> aws started

15:32 <jlebon> i'll do azure

15:32 <jlebon> done

15:35 <spresti[m]> jlebon: Standup

15:38 paragan has quit [Quit: Leaving]

16:01 * dustymabe brb

16:01 plarsen has joined #fedora-coreos

16:02 <spresti[m]> Also brb

16:15 saschagrunert has quit [Remote host closed the connection]

16:29 <dustymabe> spresti[m]: jlebon: https://github.com/coreos/fedora-coreos-config/pull/2126

16:40 <spresti[m]> LGTM

16:46 <jlebon> travier[m], marmijo[m]: let me know your thoughts on https://github.com/coreos/fedora-coreos-pipeline/pull/787/files#r1047426745 or if you want to sync

16:49 <spresti[m]> dustymabe: Sweet! LGTM. What are the next steps once that is merged? Can I proceed with the builds?

16:50 <dustymabe> spresti[m]: we can start the stable build after https://github.com/coreos/fedora-coreos-config/pull/2126 merges

16:57 <spresti[m]> kk

17:14 plarsen has quit [Remote host closed the connection]

17:16 plarsen has joined #fedora-coreos

17:28 jpn has quit [Ping timeout: 260 seconds]

17:33 jpn has joined #fedora-coreos

17:49 admin00 has joined #fedora-coreos

17:49 ramcq1 has joined #fedora-coreos

17:51 mboddu_ has joined #fedora-coreos

17:54 x3mboy1 has joined #fedora-coreos

17:55 ramcq has quit [Ping timeout: 252 seconds]

17:55 mboddu has quit [Ping timeout: 252 seconds]

17:55 admin0 has quit [Ping timeout: 252 seconds]

17:55 DeaDSouL[m]1 has quit [Ping timeout: 252 seconds]

17:55 jaimelm has quit [Remote host closed the connection]

17:55 davdunc has quit [Ping timeout: 252 seconds]

17:55 x3mboy has quit [Ping timeout: 252 seconds]

17:55 mnaser has quit [Ping timeout: 252 seconds]

17:55 OnuralpSezerhehi has quit [Ping timeout: 252 seconds]

17:55 admin00 is now known as admin0

17:55 mnaser_ has joined #fedora-coreos

17:57 davdunc has joined #fedora-coreos

17:57 jaimelm has joined #fedora-coreos

17:58 OnuralpSezerhehi has joined #fedora-coreos

18:01 jpn has quit [Ping timeout: 256 seconds]

18:03 DeaDSouL[m]1 has joined #fedora-coreos

18:07 jpn has joined #fedora-coreos

18:11 jpn has quit [Ping timeout: 256 seconds]

18:34 MHamzahKhan[m] has joined #fedora-coreos

18:34 fifofonix has quit [Quit: Textual IRC Client: www.textualapp.com]

18:35 jpn has joined #fedora-coreos

18:41 <dustymabe> spresti[m]: https://github.com/coreos/fedora-coreos-config/pull/2126

18:41 fifofonix has joined #fedora-coreos

18:41 <dustymabe> i merged it

18:41 fifofonix has quit [Client Quit]

18:43 <spresti[m]> Sweet! thank you

18:44 <spresti[m]> K just kicked a stable build

18:45 <travier[m]> jlebon marmijo We followed it and I made an issue in cosa for your suggestion. Not sure how practical it would be but we can try

18:49 <spresti[m]> And to be clear, even though the build failed for [next && testing], it's ok to use that build and I don't need to rebuild? dustymabe

19:15 poppajarv has quit [Read error: Connection reset by peer]

19:15 poppajarv has joined #fedora-coreos

19:18 <dustymabe> spresti[m]: right. the build shows up as failed but jlebon and I determined the step was at the end and non-fatal (the release job will clean it up)

19:19 <dustymabe> so testing and next should be good - just add a comment to the streams issues noting this

19:24 <spresti[m]> Ok I will, finishing up another task and then I will jump back on it.

19:25 <walters> hooray! https://marc.info/?l=linux-xfs&m=167095228315779&w=2

19:35 <spresti[m]> dustymabe: sorry for all the questions, but is this the same for the failing aarch64 / s390x builds?

19:38 jcajka has quit [Quit: Leaving]

19:50 fifofonix has joined #fedora-coreos

20:05 <dustymabe> spresti[m]: yes

20:06 <spresti[m]> Ok thank you

20:11 jpn has quit [Quit: Lost terminal]

20:15 vgoyal has quit [Quit: Leaving]

20:34 justJanne has quit [Ping timeout: 255 seconds]

21:11 hyperreal has quit [Quit: The Lounge - https://thelounge.chat]

21:12 vgoyal has joined #fedora-coreos

21:14 hyperreal has joined #fedora-coreos

21:27 hyperreal has quit [Quit: The Lounge - https://thelounge.chat]

21:28 hyperreal has joined #fedora-coreos

21:30 vgoyal_ has joined #fedora-coreos

21:32 vgoyal has quit [Ping timeout: 252 seconds]

21:34 <dustymabe> jlebon: mind a review on these: https://github.com/coreos/fedora-coreos-releng-automation/pull/79 https://pagure.io/fedora-infra/ansible/pull-request/1278

21:43 <spresti[m]> Got release failures on [next, testing] https://jenkins-fedora-coreos-pipeline.apps.ocp.fedoraproject.org/job/release/274/console https://jenkins-fedora-coreos-pipeline.apps.ocp.fedoraproject.org/job/release/275/console

21:43 <dustymabe> yeah those two release jobs failed because: plume: couldn't publish image in ap-southeast-1

21:44 <dustymabe> let's see if the stable job succeeds

21:45 <dustymabe> might be a transient issue

21:47 <dustymabe> jlebon: it failed in `stable` too

21:48 <jlebon> fun. let me check the aws status board

21:48 <dustymabe> oh wait - I was looking at the wrong place

21:48 <dustymabe> it hasn't got there for stable yet

21:48 justJanne has joined #fedora-coreos

21:51 <jlebon> nothing on https://health.aws.amazon.com/health/status

21:53 <jlebon> "couldn't describe image: InternalError: An internal error has occurred" isn't very helpful either

21:53 <dustymabe> yeah

21:55 <jlebon> let me try reproducing it locally

21:55 <dustymabe> i think that region is hosed

21:56 <dustymabe> cc davdunc[m]

21:56 <dustymabe> in the webUI when I navigate to that region I get `internal error occured`

21:56 <jlebon> dustymabe: nice

21:56 <dustymabe> other regions work

21:56 <spresti[m]> Oooof

21:57 <jlebon> reproduced locally as well

21:57 <dustymabe> so... what "half state" are we in right now?

21:58 <jlebon> and even an `aws ec2 describe-instances` gives me InternalError

21:59 <jlebon> i think it's possible some regions weren't made public depending on ordering

21:59 <dustymabe> right

21:59 <jlebon> we need a way to skip over it and rerun the release job

21:59 <dustymabe> option A: wait until morning (what are the implications of this?)

22:00 <dustymabe> option B: skip making AMIs public altogether (some regions won't be public)

22:00 <jlebon> option C: add e.g. `plume make-amis-public --skip-region` and then replay with that ninja'ed in?

22:01 <spresti[m]> What damage is it to wait till morning?

22:01 <dustymabe> that kind of implies a COSA dev/test cycle

22:01 <dustymabe> Option C does ^^

22:02 <jlebon> systems that manually rpm-ostree upgrade will get the update, but otherwise no machine will update yet

22:02 <dustymabe> we also have GCP and containers

22:02 <jlebon> right yup

22:03 <dustymabe> i.e. GCP image family will get the latest bits

22:03 <dustymabe> and people pulling directly from containers

22:03 <jlebon> so some workflows will get new content on fresh provisioning

22:03 <dustymabe> but not the end of the world?

22:03 <jlebon> existing nodes will not upgrade until we actually rollout

22:03 <jlebon> confusing, but *should* be fine, yet

22:03 <jlebon> s/yet/yeah/

22:04 <dustymabe> ok so let's wait until the morning and try again

22:04 <walters> https://health.aws.amazon.com/health/status is updated now

22:04 <walters> Longer term I think we should be thinking of things in "kubernetes controller" style of reconciling to desired state, not "job run once"

22:04 <dustymabe> walters: +1

22:05 <jlebon> walters: yeah, the release job is centered around that philosophy, except humans are the iterators right now :)

22:05 <dustymabe> well +1 for the status update :) - haven't thought through the second statement

22:05 <dustymabe> spresti[m]: sync back up in the morning?

22:06 <spresti[m]> dustymabe: Yeah that sounds good

22:19 vgoyal_ has quit [Ping timeout: 256 seconds]

22:20 vgoyal_ has joined #fedora-coreos

22:20 <dustymabe> jlebon: one more: https://github.com/coreos/fedora-coreos-releng-automation/pull/161

22:21 <jlebon> dustymabe, spresti[m]: fwiw https://github.com/coreos/coreos-assembler/pull/3277

22:21 <jlebon> good to have even if we don't use it for this right now

22:22 <dustymabe> jlebon: instead of this WDYT about a `--best-effort` mode

22:22 <dustymabe> or even making that the default

22:22 <dustymabe> and then just bail out at the end

22:23 <jlebon> dustymabe: can you comment in the PR? and we circle back on it tmw

22:23 <dustymabe> sure

22:23 <jlebon> automation PR reviewed!

22:23 <jlebon> see y'all tmw!

22:28 <dustymabe> one more PR (cc spresti[m] in case you are around still): https://github.com/coreos/fedora-coreos-releng-automation/pull/162

22:58 nalind has quit [Quit: bye for now]

23:31 vgoyal_ has quit [Quit: Leaving]

23:40 darknao has quit [Ping timeout: 260 seconds]

23:48 <dustymabe> the dashboard says the issues in ap-southeast-1 are resolved now - let me try a release job run

23:50 hyperreal has quit [Quit: The Lounge - https://thelounge.chat]

23:51 hyperreal has joined #fedora-coreos