dustymabe changed the topic of #fedora-coreos to: Fedora CoreOS :: Find out more at https://getfedora.org/coreos/ :: Logs at https://libera.irclog.whitequark.org/fedora-coreos
crobinso has quit [Remote host closed the connection]
gursewak has quit [Ping timeout: 240 seconds]
gursewak has joined #fedora-coreos
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 268 seconds]
jpn has joined #fedora-coreos
gursewak has quit [Ping timeout: 240 seconds]
gursewak has joined #fedora-coreos
paragan has joined #fedora-coreos
paragan has quit [Quit: Leaving]
arnulfo_7 has quit [Read error: Connection reset by peer]
arnulfo_7 has joined #fedora-coreos
arnulfo_7 has quit [Changing host]
arnulfo_7 has joined #fedora-coreos
arnulfo_7 has quit [Read error: Connection reset by peer]
arnulfo_7 has joined #fedora-coreos
arnulfo_7 has quit [Changing host]
arnulfo_7 has joined #fedora-coreos
jpn has quit [Ping timeout: 252 seconds]
jpn has joined #fedora-coreos
jcajka has joined #fedora-coreos
paragan has joined #fedora-coreos
bgilbert has quit [Ping timeout: 268 seconds]
jpn has quit [Ping timeout: 268 seconds]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 245 seconds]
jpn has joined #fedora-coreos
arnulfo_7 has quit [Read error: Connection reset by peer]
arnulfo_7 has joined #fedora-coreos
Betal has quit [Quit: WeeChat 3.6]
crobinso has joined #fedora-coreos
jpn has quit [Ping timeout: 268 seconds]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 252 seconds]
nalind has joined #fedora-coreos
jpn has joined #fedora-coreos
ravanelli has joined #fedora-coreos
jpn has quit [Ping timeout: 240 seconds]
<dustymabe> hey jlebon - for some reason cosa pods in our main pipeline aren't coming up - the jnlp container has this error: https://paste.centos.org/view/69b6ac60
<dustymabe> the "provided port:50000 is not reachable" looks similar to what we saw in the staging pipeline the other day
<dustymabe> I was going to cycle the jenkins pod anyway (to pick up new secrets I created) so maybe that will help
aleeku_ has quit [Ping timeout: 268 seconds]
aleeku_ has joined #fedora-coreos
jpn has joined #fedora-coreos
aleeku_ has quit [Ping timeout: 245 seconds]
aleeku_ has joined #fedora-coreos
<dustymabe> I just disabled the pipeline for now and killed any jobs (that weren't completing anyway because no pods were coming up)
<dustymabe> Maybe we can fix https://github.com/coreos/fedora-coreos-pipeline/pull/583 when we do this cycle
<jlebon> dustymabe: we need to make sure the jenkins is up to date and then cycle it
<jlebon> yup, the k8s-plugin-tweaks.yaml is there. i'm not sure how, I hadn't created it yet when I originally rolled out that PR.
<jlebon> new pod coming up. let's see... hopefully the PVC won't throw it off
<jlebon> brb
<dustymabe> hmm. did we want to merge https://github.com/coreos/fedora-coreos-pipeline/pull/586 before cycling?
<dustymabe> also should we go ahead and get https://github.com/coreos/fedora-coreos-pipeline/pull/582 in or is that still WIP?
<jlebon> ahh sorry, i had already cycled it. but i've got bad news
<jlebon> the kube cloud config wasn't restored, i think because of the PVC
<jlebon> casc runs on every jenkins start, but the auto-cloud configuration happens on the first start only
<dustymabe> hmm. I feel like cycling jenkins didn't render it inoperable in the past?
<jlebon> so i think we need to nuke the PVC. there's a way to retain logs i think.
<jlebon> i think for most things, yes. but in this case, the cloud config added by the s2i run script was clobbered
<jlebon> hmm, let me check something
<jlebon> yup, exactly
<dustymabe> ok, let me know how you want to proceed. we can nuke/pave if needed
<dustymabe> here's my plan once we are ready to roll out the build-cosa changes
<dustymabe> 3. oc delete configmap/jenkins-casc-cfg
<dustymabe> 4. oc create configmap jenkins-casc-cfg --from-file=jenkins/config
<dustymabe> 5. oc scale dc/jenkins --replicas=0
<dustymabe> 6. oc scale dc/jenkins --replicas=1
<jlebon> to be clear, there's definitely a chance we lose build logs with this, which would be unfortunate but not a big deal either
<dustymabe> jlebon: honestly I wouldn't mind losing logs every once in a while (I think starting fresh and making sure our steps and code for fresh bringup are accurate is worth the loss)
<jlebon> dustymabe: agreed
<dustymabe> jlebon: let me know when we should proceed
<jlebon> dustymabe: sorry, just taking some time to evaluate https://github.com/openshift/release/pull/31015
<dustymabe> 👍
<dustymabe> working on it
<dustymabe> jlebon: updated
<jlebon> 👍
<jlebon> with that, the plan above SGTM
<dustymabe> I guess we're now blocked on CI for that PR?
<jlebon> let me try to get 582 ready
<jlebon> indeed
<jlebon> in the past, changing the mirroring bits required approval from other owners, but looks like that changed recently
<jlebon> dustymabe: can you insert a step between 5 and 6 to rerun ./deploy?
<jlebon> it's needed for #587
<dustymabe> jlebon: `./deploy --official`?
<jlebon> dustymabe: yup!
<dustymabe> looks like https://github.com/openshift/release/pull/31015 passed CI - and it's been approved.. what's blocking it being merged?
<jlebon> "Only merges with author openshift-bot are currently allowed"... interesting
stephan has quit [Ping timeout: 245 seconds]
<jlebon> let's ask internally about that
<dustymabe> can you tag me in the conversation?
paragan has quit [Quit: Leaving]
stephan has joined #fedora-coreos
<jlebon> done!
jcajka has quit [Quit: Leaving]
<dustymabe> anything else we can do in the meantime?
<jlebon> we could temporarily drop the githubPush() trigger and roll it out now
<jlebon> then add it back in once the openshift/release PR is merged
<dustymabe> but won't anything pushed get clobbered by the syncing done by registry.ci?
<dustymabe> oh actually - let's just set the bot permissions to "read"
<dustymabe> then we can be unblocked, right?
BobSlept has quit [Quit: You have been kicked for being idle]
<jlebon> well, we would only test it on a side branch not covered by CI. but yeah, flipping the bot perms is nicer.
<dustymabe> ok so new set of steps
<dustymabe> 1. change bot perms for openshift_ci_cosa_push to "read"
<dustymabe> 3. oc delete configmap/jenkins-casc-cfg
<dustymabe> 4. oc create configmap jenkins-casc-cfg --from-file=jenkins/config
<dustymabe> 5. oc scale dc/jenkins --replicas=0
<dustymabe> 6. ./deploy --official
<dustymabe> 7. oc scale dc/jenkins --replicas=1
<jlebon> LGTM
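The command steps agreed above could be sketched as a small script. This is only an illustration under stated assumptions: the `oc` and `./deploy --official` invocations are copied from the chat, while the `run` and `rollout` helpers and the `DRY_RUN` variable are hypothetical names introduced here. The script prints the commands by default and only executes them when `DRY_RUN=0`. Step 1 (changing the bot permissions) is not scripted.

```shell
#!/bin/sh
# Sketch of steps 3-7 from the chat. Dry-run by default: each command is
# printed, and only executed when DRY_RUN=0 is set in the environment.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command, run it only outside dry-run mode.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Hypothetical wrapper around the steps, in the order given in the chat.
rollout() {
    run oc delete configmap/jenkins-casc-cfg                              # step 3
    run oc create configmap jenkins-casc-cfg --from-file=jenkins/config   # step 4
    run oc scale dc/jenkins --replicas=0                                  # step 5
    run ./deploy --official                                               # step 6
    run oc scale dc/jenkins --replicas=1                                  # step 7
}

rollout
```

Running it as-is prints the five commands without touching the cluster; `DRY_RUN=0` would execute them and stop at the first failure because of `set -e`.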
<dustymabe> I'm at step 2 (just completed)
<dustymabe> i'll note before I execute further steps that the sync-stream-metadata job is having trouble starting pods
<dustymabe> jlebon: expected?
<dustymabe> ahh I think the answer is yes
<jlebon> yes, expected
<dustymabe> i.e. that's why I need to run deploy again
<dustymabe> ok
<dustymabe> continuing
<dustymabe> ok I completed all the steps
<dustymabe> let's see if the sync-stream-metadata pods come up now
<dustymabe> still seeing "provided port:50000 is not reachable" errors
<jlebon> hmm no, the cloud config is still missing
<jlebon> digging
<jlebon> grrr. used the wrong var name.
<dustymabe> :)
<jlebon> though actually, we do want that one too, so i'll just leave it ;)
<jlebon> working on a patch
<dustymabe> kk
<dustymabe> after this do I need to start over at step 3 or step 5?
<jlebon> no wait, i did type it correctly
<jlebon> hmm, it's like deploy didn't apply the change
<jlebon> were you on the latest git main?
<jlebon> $ oc get dc jenkins -o yaml | grep OVERRIDE_PV_CONFIG_WITH_IMAGE_CONFIG
<jlebon> $
<jlebon> oh right of course
<jlebon> jenkins.yaml isn't handled by deploy
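The check jlebon ran above (grepping the DeploymentConfig YAML for a variable name and getting no output) could be wrapped so that the result is explicit rather than an empty prompt. A minimal sketch, assuming a hypothetical helper named `check_env_var` that reads YAML on stdin; it does a plain fixed-string search, so it would also match the name if it appeared somewhere other than the env list.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: report whether the name in $1 appears anywhere in the
# YAML read from stdin. Fixed-string match, not a real YAML parse.
check_env_var() {
    if grep -qF -- "$1"; then
        echo "present: $1"
    else
        echo "missing: $1"
    fi
}

# Example against a live cluster (needs a logged-in oc session):
#   oc get dc jenkins -o yaml | check_env_var OVERRIDE_PV_CONFIG_WITH_IMAGE_CONFIG
```

In the situation from the chat, where the grep returned nothing, this would report the variable as missing.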
<dustymabe> yep
<dustymabe> hand edit?
<lucab> jlebon: were you planning to take https://github.com/coreos/fedora-coreos-config/pull/1890 too later?
<jlebon> dustymabe: let me do it
<jlebon> lucab: sure, will do
<jlebon> dustymabe: new pod coming up
<dustymabe> interesting..
<dustymabe> only the seed job remains :)
<dustymabe> expected?
<jlebon> so if i'm right
<jlebon> once we seed, all logs should magically be there
ravanelli has quit [Remote host closed the connection]
<dustymabe> shall I run or you?
<jlebon> sadly not. oh well! :)
<jlebon> ran it already :)
<dustymabe> looks like you did!
<dustymabe> ok :)
<dustymabe> i'm running the build-cosa job!
<jlebon> +1
<lucab> aaradhak davdunc dustymabe gursewak jaimelm jbrooks jcajka jdoss jlebon jmarrero lorbus miabbott nasirhm ravanelli saqali skunkerk walters
<lucab> FCOS community meeting in #fedora-meeting-1
<lucab> If you don't want to be pinged remove your name from this file: https://github.com/coreos/fedora-coreos-tracker/blob/main/meeting-people.txt
mnguyen has joined #fedora-coreos
aaradhak has joined #fedora-coreos
crobinso has quit [Remote host closed the connection]
ravanelli has joined #fedora-coreos
bgilbert has joined #fedora-coreos
<dustymabe> (sorry for the non-public link)
mnguyen_ has joined #fedora-coreos
Betal has joined #fedora-coreos
ravanelli has quit [Remote host closed the connection]
<dustymabe> jlebon: can you help me with the webhook for COSA?
<dustymabe> actually I think I just added it - let's see if it works
<jlebon> dustymabe: it should be auto-added
<dustymabe> auto-added by what?
<jlebon> jenkins
<dustymabe> hmm - I didn't see one on https://github.com/coreos/coreos-assembler/settings/hooks so I created it
<dustymabe> the jenkins-fedora-coreos-pipeline.apps.ocp.fedoraproject.org one
<jlebon> i think it's done every X period or on some events or something
<jlebon> but you can ask it manually too
<jlebon> on the jenkins configuration page
<dustymabe> should I delete what I just created?
<jlebon> sure, and i'll tickle it
<dustymabe> ok
<dustymabe> deleted
<jlebon> ok done
<dustymabe> ok I see it now
<jlebon> hmm weird
<dustymabe> are all the other hooks in there needed?
<jlebon> i wonder why the coreos-ci one has issue_comment too
<jlebon> actually, the app.ci ones no. but let's leave them for now until we're sure we're not reverting the release PR
<dustymabe> ok i'm going to go eat lunch
<jlebon> same :)
jpn has quit [Ping timeout: 268 seconds]
aaradhak has quit [Quit: Connection closed for inactivity]
jpn has joined #fedora-coreos
ravanelli has joined #fedora-coreos
jpn has quit [Ping timeout: 268 seconds]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 252 seconds]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 268 seconds]
nalind has quit [Quit: bye]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 268 seconds]
<dustymabe> jlebon: another option is that we just autotrigger builds (webhook) for `main` and require manual build for the other branches
<jlebon> dustymabe: not ideal, but that works, yeah
<jlebon> i'm confused why it's only spawning a single job. but anyway, even if it spawned for all branches, we still have the PVC problem
<jlebon> i was looking at https://plugins.jenkins.io/generic-webhook-trigger/ which looks really powerful, but needs more configuration
<dustymabe> jlebon: are you triggering the jobs manually?
<jlebon> i haven't so far. i was testing stuff by redelivering webhook events from the github UI
<dustymabe> ahh ok
jpn has joined #fedora-coreos
<dustymabe> I have to head out for now
<dustymabe> will catch back up later
samuelb has quit [Quit: ZNC 1.8.2 - https://znc.in]
gursewak has quit [Ping timeout: 240 seconds]
ravanelli has quit [Remote host closed the connection]
jpn has quit [Ping timeout: 268 seconds]
gursewak has joined #fedora-coreos
ravanelli has joined #fedora-coreos
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 252 seconds]
ravanelli has quit [Remote host closed the connection]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 268 seconds]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 240 seconds]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 252 seconds]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 252 seconds]
gursewak has quit [Remote host closed the connection]
gursewak_ has joined #fedora-coreos
mnguyen has quit [Ping timeout: 268 seconds]
mnguyen has joined #fedora-coreos
mnguyen_ has quit [Ping timeout: 268 seconds]
mnguyen_ has joined #fedora-coreos
jpn has joined #fedora-coreos
ravanelli has joined #fedora-coreos
jpn has quit [Ping timeout: 252 seconds]