#fedora-coreos on 2022-08-03 — irc logs at libera.irclog.whitequark.org

2022-05-11 12:42 dustymabe changed the topic of #fedora-coreos to: Fedora CoreOS :: Find out more at https://getfedora.org/coreos/ :: Logs at https://libera.irclog.whitequark.org/fedora-coreos

00:01 crobinso has quit [Remote host closed the connection]

00:24 gursewak has quit [Ping timeout: 240 seconds]

01:53 gursewak has joined #fedora-coreos

02:22 jpn has joined #fedora-coreos

02:28 jpn has quit [Ping timeout: 268 seconds]

03:18 jpn has joined #fedora-coreos

03:32 gursewak has quit [Ping timeout: 240 seconds]

04:15 gursewak has joined #fedora-coreos

04:28 paragan has joined #fedora-coreos

04:42 paragan has quit [Quit: Leaving]

05:50 arnulfo_7 has quit [Read error: Connection reset by peer]

05:50 arnulfo_7 has joined #fedora-coreos

05:50 arnulfo_7 has quit [Changing host]

05:50 arnulfo_7 has joined #fedora-coreos

05:53 arnulfo_7 has quit [Read error: Connection reset by peer]

05:53 arnulfo_7 has joined #fedora-coreos

05:53 arnulfo_7 has quit [Changing host]

05:53 arnulfo_7 has joined #fedora-coreos

06:20 jpn has quit [Ping timeout: 252 seconds]

06:23 jpn has joined #fedora-coreos

06:27 jcajka has joined #fedora-coreos

06:35 paragan has joined #fedora-coreos

06:46 bgilbert has quit [Ping timeout: 268 seconds]

07:28 jpn has quit [Ping timeout: 268 seconds]

07:33 jpn has joined #fedora-coreos

07:42 jpn has quit [Ping timeout: 245 seconds]

08:18 jpn has joined #fedora-coreos

09:23 arnulfo_7 has quit [Read error: Connection reset by peer]

09:23 arnulfo_7 has joined #fedora-coreos

09:27 Betal has quit [Quit: WeeChat 3.6]

09:59 crobinso has joined #fedora-coreos

11:05 jpn has quit [Ping timeout: 268 seconds]

11:37 jpn has joined #fedora-coreos

11:50 jpn has quit [Ping timeout: 252 seconds]

12:00 nalind has joined #fedora-coreos

12:16 jpn has joined #fedora-coreos

12:20 ravanelli has joined #fedora-coreos

12:22 jpn has quit [Ping timeout: 240 seconds]

13:08 <dustymabe> hey jlebon - for some reason cosa pods in our main pipeline aren't coming up - the jnlp container has this error: https://paste.centos.org/view/69b6ac60

13:09 <dustymabe> the "provided port:50000 is not reachable" looks similar to what we saw in the staging pipeline the other day

13:10 <dustymabe> I was going to cycle the jenkins pod anyway (to pick up new secrets I created) so maybe that will help

13:15 aleeku_ has quit [Ping timeout: 268 seconds]

13:16 aleeku_ has joined #fedora-coreos

13:17 jpn has joined #fedora-coreos

13:30 aleeku_ has quit [Ping timeout: 245 seconds]

13:46 aleeku_ has joined #fedora-coreos

14:02 <dustymabe> I just disabled the pipeline for now and killed any jobs (that weren't completing anyway because no pods were coming up)

14:40 <dustymabe> Maybe we can fix https://github.com/coreos/fedora-coreos-pipeline/pull/583 when we do this cycle

14:44 <jlebon> dustymabe: we need to make sure the jenkins is up to date and then cycle it

14:45 <jlebon> yup, the k8s-plugin-tweaks.yaml is there. i'm not sure how, I hadn't created it yet when I originally rolled out that PR.

14:46 <jlebon> new pod coming up. let's see... hopefully the PVC won't throw it off

14:46 <jlebon> brb

14:47 <dustymabe> hmm. did we want to merge https://github.com/coreos/fedora-coreos-pipeline/pull/586 before cycling?

14:51 <dustymabe> also should we go ahead and get https://github.com/coreos/fedora-coreos-pipeline/pull/582 in or is that still WIP?

14:51 <jlebon> ahh sorry, i had already cycled it. but i've got bad news

14:52 <jlebon> the kube cloud config wasn't restored, i think because of the PVC

14:53 <jlebon> casc runs on every jenkins start, but the auto-cloud configuration happens on the first start only

14:53 <dustymabe> hmm. I feel like cycling jenkins didn't render it inoperable in the past?

14:54 <jlebon> so i think we need to nuke the PVC. there's a way to retain logs i think.

14:55 <jlebon> i think for most things, yes. but in this case, the cloud config added by the s2i run script was clobbered

14:55 <jlebon> hmm, let me check something

14:55 <dustymabe> and that was clobbered because https://github.com/coreos/fedora-coreos-pipeline/commit/fc3532704a0c3082810d34ca461d52747a53150a ?

14:56 <jlebon> yup, exactly

14:58 <dustymabe> ok, let me know how you want to proceed. we can nuke/pave if needed

15:00 <dustymabe> here's my plan once we are ready to roll out the build-cosa changes

15:00 <dustymabe> 1. merge PR https://github.com/openshift/release/pull/31015

15:00 <dustymabe> 2. merge PR https://github.com/coreos/fedora-coreos-pipeline/pull/586

15:00 <dustymabe> 3. oc delete configmap/jenkins-casc-cfg

15:00 <dustymabe> 4. oc create configmap jenkins-casc-cfg --from-file=jenkins/config

15:00 <dustymabe> 5. oc scale dc/jenkins --replicas=0

15:00 <dustymabe> 6. oc scale dc/jenkins --replicas=1

15:08 <jlebon> dustymabe: ok let's try https://github.com/coreos/fedora-coreos-pipeline/pull/587

15:11 <jlebon> to be clear, there's definitely a chance we lose build logs with this, which would be unfortunate but not a big deal either

15:13 <dustymabe> jlebon: honestly I wouldn't mind losing logs every once in a while (I think starting fresh and making sure our steps and code for fresh bringup are accurate is worth the loss)

15:15 <jlebon> dustymabe: agreed

15:17 <dustymabe> jlebon: let me know when we should proceed

15:22 <jlebon> dustymabe: sorry, just taking some time to evaluate https://github.com/openshift/release/pull/31015

15:22 <dustymabe> 👍

15:28 <jlebon> dustymabe: https://github.com/openshift/release/pull/31015#issuecomment-1204108613

15:32 <dustymabe> working on it

15:34 <dustymabe> jlebon: updated

15:35 <jlebon> 👍

15:37 <jlebon> with that, the plan above SGTM

15:38 <dustymabe> I guess we're now blocked on CI for that PR?

15:38 <jlebon> let me try to get 582 ready

15:38 <jlebon> indeed

15:39 <jlebon> in the past, changing the mirroring bits required approval from other owners, but looks like that changed recently

15:40 <jlebon> dustymabe: can you insert a step between 5 and 6 to rerun ./deploy?

15:40 <jlebon> it's needed for #587

15:43 <dustymabe> jlebon: `./deploy --official`?

15:45 <jlebon> dustymabe: yup!

15:45 <dustymabe> looks like https://github.com/openshift/release/pull/31015 passed CI - and it's been approved.. what's blocking it being merged?

15:47 <jlebon> "Only merges with author openshift-bot are currently allowed"... interesting

15:47 <jlebon> https://prow.ci.openshift.org/pr?query=is%3Apr+repo%3Aopenshift%2Frelease+author%3Adustymabe+head%3Adusty-drop-cosa-mirroring

15:47 stephan has quit [Ping timeout: 245 seconds]

15:47 <jlebon> let's ask internally about that

15:48 <dustymabe> can you tag me in the conversation?

15:51 paragan has quit [Quit: Leaving]

15:51 stephan has joined #fedora-coreos

15:52 <jlebon> done!

15:54 jcajka has quit [Quit: Leaving]

15:54 <dustymabe> anything else we can do in the meantime?

15:56 <jlebon> we could temporarily drop the githubPush() trigger and roll it out now

15:57 <jlebon> then add it back in once the openshift/release PR is merged

15:59 <dustymabe> but won't anything pushed get clobbered by the syncing done by registry.ci?

15:59 <dustymabe> oh actually - let's just set the bot permissions to "read"

15:59 <dustymabe> then we can be unblocked, right?

16:00 BobSlept has quit [Quit: You have been kicked for being idle]

16:01 <jlebon> well, we would only test it on a side branch not covered by CI. but yeah, flipping the bot perms is nicer.

16:02 <dustymabe> ok so new set of steps

16:02 <dustymabe> 1. change bot perms for openshift_ci_cosa_push to "read"

16:02 <dustymabe> 2. merge PR https://github.com/coreos/fedora-coreos-pipeline/pull/586

16:02 <dustymabe> 3. oc delete configmap/jenkins-casc-cfg

16:02 <dustymabe> 4. oc create configmap jenkins-casc-cfg --from-file=jenkins/config

16:03 <dustymabe> 5. oc scale dc/jenkins --replicas=0

16:03 <dustymabe> 6. ./deploy --official

16:03 <dustymabe> 7. oc scale dc/jenkins --replicas=1

16:03 <jlebon> LGTM
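
[editor's sketch] Steps 3 to 7 of the plan above could be scripted roughly as follows. This is only an illustration: the `run` wrapper, the `redeploy_jenkins` function name, and the `DRY_RUN` variable are hypothetical and not part of the pipeline repo; the `oc` and `./deploy` invocations are copied verbatim from the steps. It defaults to a dry run that only prints the commands, and assumes it is invoked from a checkout of fedora-coreos-pipeline with `oc` logged in to the right project.

```shell
#!/bin/sh
# Hypothetical wrapper around steps 3-7 from the log.
# DRY_RUN defaults to 1, so commands are printed but not executed.
set -eu

run() {
    # Always show the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

redeploy_jenkins() {
    # 3-4: recreate the CasC configmap from the repo's jenkins/config dir
    run oc delete configmap/jenkins-casc-cfg
    run oc create configmap jenkins-casc-cfg --from-file=jenkins/config
    # 5: stop Jenkins before touching the deployed resources
    run oc scale dc/jenkins --replicas=0
    # 6: re-apply the pipeline resources
    run ./deploy --official
    # 7: bring Jenkins back up
    run oc scale dc/jenkins --replicas=1
}

redeploy_jenkins
```

Running it with `DRY_RUN=0` would execute the commands for real. Note that this covers only the listed steps; later in the log it turns out that jenkins.yaml is not handled by deploy and needed a manual edit.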

16:06 <dustymabe> I'm at step 2 (just completed)

16:06 <dustymabe> i'll note before I execute further steps that the sync-stream-metadata job is having trouble starting pods

16:07 <dustymabe> jlebon: expected?

16:07 <dustymabe> ahh I think the answer is yes

16:07 <jlebon> yes, expected

16:07 <dustymabe> i.e. that's why I need to run deploy again

16:07 <dustymabe> ok

16:07 <dustymabe> continuing

16:10 <dustymabe> ok I completed all the steps

16:10 <dustymabe> let's see if the sync-stream-metadata pods come up now

16:11 <dustymabe> still seeing "provided port:50000 is not reachable" errors

16:11 <jlebon> hmm no, the cloud config is still missing

16:12 <jlebon> digging

16:12 <jlebon> grrr. used the wrong var name.

16:12 <dustymabe> :)

16:13 <jlebon> though actually, we do want that one too, so i'll just leave it ;)

16:13 <jlebon> working on a patch

16:13 <dustymabe> kk

16:13 <dustymabe> after this do I need to start over at step 3 or step 5?

16:14 <jlebon> no wait, i did type it correctly

16:14 <jlebon> hmm, it's like deploy didn't apply the change

16:14 <jlebon> were you on the latest git main?

16:15 <jlebon> $ oc get dc jenkins -o yaml | grep OVERRIDE_PV_CONFIG_WITH_IMAGE_CONFIG

16:15 <jlebon> $
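
[editor's sketch] The empty output above means the variable is absent from the DeploymentConfig. The same check can be turned into a pass/fail test, for example as below. The `has_override` helper is a hypothetical name introduced here; it only greps whatever YAML it receives on stdin, so the `oc get` query from the log has to be piped into it.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the DeploymentConfig YAML on stdin
# mentions OVERRIDE_PV_CONFIG_WITH_IMAGE_CONFIG, fails otherwise.
has_override() {
    grep -q 'OVERRIDE_PV_CONFIG_WITH_IMAGE_CONFIG'
}

# Usage against a live cluster (same query as in the log):
#   oc get dc jenkins -o yaml | has_override && echo "set" || echo "missing"
```

This only tells whether the name appears anywhere in the object, not what value it has, which matches what the original grep was checking.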

16:17 <dustymabe> https://paste.centos.org/view/44081de7

16:18 <jlebon> oh right of course

16:18 <jlebon> jenkins.yaml isn't handled by deploy

16:18 <dustymabe> yep

16:18 <dustymabe> hand edit?

16:19 <lucab> jlebon: were you planning to take https://github.com/coreos/fedora-coreos-config/pull/1890 too later?

16:19 <jlebon> dustymabe: let me do it

16:19 <jlebon> lucab: sure, will do

16:20 <jlebon> dustymabe: new pod coming up

16:23 <dustymabe> interesting..

16:23 <dustymabe> only the seed job remains :)

16:23 <dustymabe> expected?

16:24 <jlebon> so if i'm right

16:24 <jlebon> once we seed, all logs should magically be there

16:24 ravanelli has quit [Remote host closed the connection]

16:25 <dustymabe> shall I run or you?

16:25 <jlebon> sadly not. oh well! :)

16:25 <jlebon> ran it already :)

16:25 <dustymabe> looks like you did!

16:25 <dustymabe> ok :)

16:25 <dustymabe> i'm running the build-cosa job!

16:26 <jlebon> +1

16:28 <lucab> aaradhak davdunc dustymabe gursewak jaimelm jbrooks jcajka jdoss jlebon jmarrero lorbus miabbott nasirhm ravanelli saqali skunkerk walters

16:28 <lucab> FCOS community meeting in #fedora-meeting-1

16:28 <lucab> If you don't want to be pinged remove your name from this file: https://github.com/coreos/fedora-coreos-tracker/blob/main/meeting-people.txt

16:31 mnguyen has joined #fedora-coreos

16:36 aaradhak has joined #fedora-coreos

16:39 crobinso has quit [Remote host closed the connection]

16:41 ravanelli has joined #fedora-coreos

16:41 bgilbert has joined #fedora-coreos

16:53 <dustymabe> success! https://jenkins-fedora-coreos-pipeline.apps.ocp.fedoraproject.org/blue/organizations/jenkins/build-cosa/detail/build-cosa/1/pipeline/58

16:53 <dustymabe> (sorry for the non-public link)

17:07 <bgilbert> need a second stamp: https://github.com/coreos/fedora-coreos-streams/pull/544

17:09 mnguyen_ has joined #fedora-coreos

17:23 Betal has joined #fedora-coreos

17:28 ravanelli has quit [Remote host closed the connection]

17:44 <dustymabe> jlebon: can you help me with the webhook for COSA?

17:48 <dustymabe> actually I think I just added it - let's see if it works

17:48 <jlebon> dustymabe: it should be auto-added

17:48 <dustymabe> auto-added by what?

17:48 <jlebon> jenkins

17:49 <dustymabe> hmm - I didn't see one on https://github.com/coreos/coreos-assembler/settings/hooks so I created it

17:49 <dustymabe> the jenkins-fedora-coreos-pipeline.apps.ocp.fedoraproject.org one

17:49 <jlebon> i think it's done every X period or on some events or something

17:49 <jlebon> but you can ask it manually too

17:50 <jlebon> on the jenkins configuration page

17:50 <dustymabe> should I delete what I just created?

17:50 <jlebon> sure, and i'll tickle it

17:50 <dustymabe> ok

17:50 <dustymabe> deleted

17:50 <jlebon> ok done

17:51 <dustymabe> ok I see it now

17:51 <jlebon> hmm weird

17:51 <dustymabe> are all the other hooks in there needed?

17:51 <jlebon> i wonder why the coreos-ci one has issue_comment too

17:52 <jlebon> actually, the app.ci ones no. but let's leave them for now until we're sure we're not reverting the release PR

17:52 <dustymabe> ok i'm going to go eat lunch

17:52 <jlebon> same :)

18:25 jpn has quit [Ping timeout: 268 seconds]

18:46 aaradhak has quit [Quit: Connection closed for inactivity]

18:57 jpn has joined #fedora-coreos

19:11 ravanelli has joined #fedora-coreos

19:47 jpn has quit [Ping timeout: 268 seconds]

19:48 jpn has joined #fedora-coreos

19:53 jpn has quit [Ping timeout: 252 seconds]

19:53 jpn has joined #fedora-coreos

20:24 jpn has quit [Ping timeout: 268 seconds]

20:25 nalind has quit [Quit: bye]

20:25 jpn has joined #fedora-coreos

20:30 jpn has quit [Ping timeout: 268 seconds]

20:33 <dustymabe> jlebon: another option is that we just autotrigger builds (webhook) for `main` and require manual build for the other branches

20:37 <jlebon> dustymabe: not ideal, but that works, yeah

20:37 <jlebon> i'm confused why it's only spawning a single job. but anyway, even if it spawned for all branches, we still have the PVC problem

20:38 <jlebon> i was looking at https://plugins.jenkins.io/generic-webhook-trigger/ which looks really powerful, but needs more configuration

20:51 <dustymabe> jlebon: are you triggering the jobs manually?

20:51 <jlebon> i haven't so far. i was testing stuff by redelivering webhook events from the github UI

20:51 <dustymabe> ahh ok

20:57 jpn has joined #fedora-coreos

20:59 <dustymabe> I have to head out for now

20:59 <dustymabe> will catch back up later

21:09 samuelb has quit [Quit: ZNC 1.8.2 - https://znc.in]

21:31 gursewak has quit [Ping timeout: 240 seconds]

21:33 ravanelli has quit [Remote host closed the connection]

21:35 jpn has quit [Ping timeout: 268 seconds]

21:47 gursewak has joined #fedora-coreos

21:56 ravanelli has joined #fedora-coreos

21:59 jpn has joined #fedora-coreos

22:06 jpn has quit [Ping timeout: 252 seconds]

22:16 ravanelli has quit [Remote host closed the connection]

22:20 jpn has joined #fedora-coreos

22:32 jpn has quit [Ping timeout: 268 seconds]

22:45 jpn has joined #fedora-coreos

22:52 jpn has quit [Ping timeout: 240 seconds]

22:58 jpn has joined #fedora-coreos

23:02 jpn has quit [Ping timeout: 252 seconds]

23:17 jpn has joined #fedora-coreos

23:23 jpn has quit [Ping timeout: 252 seconds]

23:32 gursewak has quit [Remote host closed the connection]

23:32 gursewak_ has joined #fedora-coreos

23:39 mnguyen has quit [Ping timeout: 268 seconds]

23:39 mnguyen has joined #fedora-coreos

23:40 mnguyen_ has quit [Ping timeout: 268 seconds]

23:40 mnguyen_ has joined #fedora-coreos

23:54 jpn has joined #fedora-coreos

23:56 ravanelli has joined #fedora-coreos

23:58 jpn has quit [Ping timeout: 252 seconds]