dustymabe changed the topic of #fedora-coreos to: Fedora CoreOS :: Find out more at https://getfedora.org/coreos/ :: Logs at https://libera.irclog.whitequark.org/fedora-coreos
michele- has quit [*.net *.split]
strigazi has quit [*.net *.split]
pej0703 has quit [*.net *.split]
djinni`_ has quit [*.net *.split]
ikonia has quit [*.net *.split]
sayan has quit [*.net *.split]
strigazi has joined #fedora-coreos
pej0703 has joined #fedora-coreos
djinni` has joined #fedora-coreos
michele_ has joined #fedora-coreos
sayan has joined #fedora-coreos
ikonia has joined #fedora-coreos
jpn has quit [Ping timeout: 268 seconds]
jpn has joined #fedora-coreos
<dustymabe> spresti[m]: i merged two of them (and I see you merged the other one)
<dustymabe> so they are all merged now
<dustymabe> i'll kick off the builds so they can run overnight and hopefully a lot of the work will be done by the time we wake up
jpn has quit [Ping timeout: 265 seconds]
plarsen has quit [Quit: NullPointerException!]
ravanelli has quit [Remote host closed the connection]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 246 seconds]
nb has quit [Quit: The Lounge - https://thelounge.chat]
nb has joined #fedora-coreos
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 268 seconds]
jpn has joined #fedora-coreos
hyperreal has joined #fedora-coreos
vgoyal has quit [Quit: Leaving]
jpn has quit [Ping timeout: 268 seconds]
paragan has joined #fedora-coreos
jpn has joined #fedora-coreos
mnguyen_ has quit [Ping timeout: 246 seconds]
jpn has quit [Ping timeout: 246 seconds]
jpn has joined #fedora-coreos
jtgreene has quit [Quit: fBNC - https://bnc4free.com]
jpn has quit [Ping timeout: 272 seconds]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 252 seconds]
paragan has quit [Ping timeout: 268 seconds]
paragan has joined #fedora-coreos
jtgreene has joined #fedora-coreos
jtgreene has joined #fedora-coreos
jtgreene has quit [Changing host]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 260 seconds]
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 272 seconds]
jcajka has joined #fedora-coreos
jpn has joined #fedora-coreos
ravanelli has joined #fedora-coreos
ravanelli has quit [Ping timeout: 246 seconds]
saschagrunert has joined #fedora-coreos
hotbox has quit [Ping timeout: 265 seconds]
jpn has quit [Ping timeout: 272 seconds]
c4rt0 has joined #fedora-coreos
jpn has joined #fedora-coreos
ravanelli has joined #fedora-coreos
ravanelli has quit [Remote host closed the connection]
ravanelli has joined #fedora-coreos
ravanelli has quit [Remote host closed the connection]
ravanelli has joined #fedora-coreos
ravanelli has quit [Remote host closed the connection]
ravanelli has joined #fedora-coreos
vgoyal has joined #fedora-coreos
ravanelli has quit [Remote host closed the connection]
ravanelli has joined #fedora-coreos
mnguyen_ has joined #fedora-coreos
jpn has quit [Ping timeout: 272 seconds]
plundra has quit [Quit: fraggeln]
plundra has joined #fedora-coreos
<dustymabe> spresti[m]: looks like the builds from last night failed - i'll help look into the failures after I drop kids off at school
fifofonix has joined #fedora-coreos
jpn has joined #fedora-coreos
hyperreal has quit [Quit: The Lounge - https://thelounge.chat]
<spresti[m]> Ah thank you for kicking those off.
<spresti[m]> Ok checking on it now.
hyperreal has joined #fedora-coreos
admin0 has joined #fedora-coreos
<spresti[m]> Hmm, I wonder; It seems like two of the builds exited on the same error, [next,test] "Importing failed: Command '['ostree', '--repo=/mnt/koji/compose/ostree/repo', 'pull-local', '/tmp/tmppb75h0j5', '607ca27fa44a4b074c33bcc4979bf476725dd513a80c530ab1b45be332f4dfdd']"
<dustymabe> spresti[m]: looking closer now
<spresti[m]> kk
c4rt0 has quit [Remote host closed the connection]
<dustymabe> ok it appears to me that the ostree-importer might not be in working order
<dustymabe> investigating that now
c4rt0_ has joined #fedora-coreos
rsalveti has joined #fedora-coreos
<dustymabe> it looks like we're almost out of free space on that volume and the ostree importer is failing because of that
nalind has joined #fedora-coreos
<dustymabe> ok I dropped the free space percent check to 1% and will work to run a prune operation in the background
<dustymabe> for now we should be unblocked I think
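Note: the free-space gate described above can be sketched as follows. This is only an illustration, assuming the importer compares the volume's free percentage against a configurable minimum. The names `free_percent` and `has_enough_space` are hypothetical and are not taken from the actual ostree-importer code.

```python
import shutil


def free_percent(total_bytes: int, free_bytes: int) -> float:
    """Return free space as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes


def has_enough_space(path: str, min_free_percent: float) -> bool:
    """Check whether the volume holding `path` meets the minimum free percentage.

    Hypothetical helper: lowering `min_free_percent` (e.g. to 1.0) lets an
    import proceed on a nearly full volume until a prune frees space.
    """
    usage = shutil.disk_usage(path)
    return free_percent(usage.total, usage.free) >= min_free_percent
```

Lowering the threshold only buys time; the prune is what actually restores headroom.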
<dustymabe> jlebon: around? I'd like to discuss implications of the failures if you don't mind
<spresti[m]> thank you dustymabe !
<dustymabe> spresti[m]: let's hold off on new builds for now
<dustymabe> still need to discuss some things with jlebon - and we still don't know exactly why the `stable` build failed
<spresti[m]> Ah sorry.
<jlebon> dustymabe: here now
<jlebon> ouch, the volume on which the compose repo is hosted is running out of space?
<dustymabe> jlebon: spresti[m]: for the `stable` failure: https://github.com/coreos/fedora-coreos-config/pull/2125
<dustymabe> jlebon: yeah, I never got back around to running the ostree pruner consistently so it's expected to eventually hit the problem
<dustymabe> that's probably a gap I should close this week
<dustymabe> jlebon: I think the testing and next builds are salvageable, though
<dustymabe> since the ostree import is like the last thing that runs
<dustymabe> and the release job will run it again to import into the prod repo
<jlebon> dustymabe: huh, i had thought it was up and operational already. is the repo on its own volume or shared with other koji things? is this affecting anyone else?
<jlebon> i guess at least the other ostree-based variants
<dustymabe> jlebon: yeah, theoretically the pungi composes would have this same problem
<dustymabe> I'll run a prune operation after we get this set of FCOS releases out the door
<jlebon> sounds good
<dustymabe> do you agree that the `testing` and `next` builds should be salvageable?
<dustymabe> we'll have to manually run cloud tests, but that shouldn't be too hard to do
<dustymabe> "manually run" == kick the jenkins jobs off with the right parameters
c4rt0_ has quit [Quit: Leaving]
c4rt0 has joined #fedora-coreos
<jlebon> dustymabe: restarted CI
<jlebon> let me check the jobs re. testing and next
<dustymabe> IIRC when the release job runs the ostree import will happen in the prod repo then. It isn't fatal if the commit doesn't exist in the compose ostree repo already
* dustymabe opens a PR to move the cloud tests to before the ostree import
<jlebon> dustymabe: agreed. the importer looks like it should be able to handle that fine.
<jlebon> i'll start the cloud tests. starting with kola-aws in case someone wants to do another
<dustymabe> I can handle gcp
<dustymabe> ok i'll do openstack now
<jlebon> aws started
<jlebon> i'll do azure
<jlebon> done
<spresti[m]> jlebon: Standup
paragan has quit [Quit: Leaving]
* dustymabe brb
plarsen has joined #fedora-coreos
<spresti[m]> Also brb
saschagrunert has quit [Remote host closed the connection]
<spresti[m]> LGTM
<jlebon> travier[m], marmijo[m]: let me know your thoughts on https://github.com/coreos/fedora-coreos-pipeline/pull/787/files#r1047426745 or if you want to sync
<spresti[m]> dustymabe: Sweet! LGTM. What are the next steps once that is merged? Can I proceed with the builds?
<dustymabe> spresti[m]: we can start the stable build after https://github.com/coreos/fedora-coreos-config/pull/2126 merges
<spresti[m]> kk
plarsen has quit [Remote host closed the connection]
plarsen has joined #fedora-coreos
jpn has quit [Ping timeout: 260 seconds]
jpn has joined #fedora-coreos
admin00 has joined #fedora-coreos
ramcq1 has joined #fedora-coreos
mboddu_ has joined #fedora-coreos
x3mboy1 has joined #fedora-coreos
ramcq has quit [Ping timeout: 252 seconds]
mboddu has quit [Ping timeout: 252 seconds]
admin0 has quit [Ping timeout: 252 seconds]
DeaDSouL[m]1 has quit [Ping timeout: 252 seconds]
jaimelm has quit [Remote host closed the connection]
davdunc has quit [Ping timeout: 252 seconds]
x3mboy has quit [Ping timeout: 252 seconds]
mnaser has quit [Ping timeout: 252 seconds]
OnuralpSezerhehi has quit [Ping timeout: 252 seconds]
admin00 is now known as admin0
mnaser_ has joined #fedora-coreos
davdunc has joined #fedora-coreos
jaimelm has joined #fedora-coreos
OnuralpSezerhehi has joined #fedora-coreos
jpn has quit [Ping timeout: 256 seconds]
DeaDSouL[m]1 has joined #fedora-coreos
jpn has joined #fedora-coreos
jpn has quit [Ping timeout: 256 seconds]
MHamzahKhan[m] has joined #fedora-coreos
fifofonix has quit [Quit: Textual IRC Client: www.textualapp.com]
jpn has joined #fedora-coreos
fifofonix has joined #fedora-coreos
<dustymabe> i merged it
fifofonix has quit [Client Quit]
<spresti[m]> Sweet! thank you
<spresti[m]> K just kicked a stable build
<travier[m]> jlebon marmijo We followed it and I made an issue in cosa for your suggestion. Not sure how practical it would be but we can try
<spresti[m]> And to be clear, even though the build failed for [next && testing], it's ok to use that build and I don't need to rebuild? dustymabe
poppajarv has quit [Read error: Connection reset by peer]
poppajarv has joined #fedora-coreos
<dustymabe> spresti[m]: right. the build shows up as failed but jlebon and I determined the step was at the end and non-fatal (the release job will clean it up)
<dustymabe> so testing and next should be good - just add a comment to the streams issues noting this
<spresti[m]> Ok I will, finishing up another task and then I will jump back on it.
<spresti[m]> dustymabe: sorry for all the questions, but is this the same for the failing aarch64 / s390x builds?
jcajka has quit [Quit: Leaving]
fifofonix has joined #fedora-coreos
<dustymabe> spresti[m]: yes
<spresti[m]> Ok thank you
jpn has quit [Quit: Lost terminal]
vgoyal has quit [Quit: Leaving]
justJanne has quit [Ping timeout: 255 seconds]
hyperreal has quit [Quit: The Lounge - https://thelounge.chat]
vgoyal has joined #fedora-coreos
hyperreal has joined #fedora-coreos
hyperreal has quit [Quit: The Lounge - https://thelounge.chat]
hyperreal has joined #fedora-coreos
vgoyal_ has joined #fedora-coreos
vgoyal has quit [Ping timeout: 252 seconds]
<dustymabe> yeah those two release jobs failed because: plume: couldn't publish image in ap-southeast-1
<dustymabe> let's see if the stable job succeeds
<dustymabe> might be a transient issue
<dustymabe> jlebon: it failed in `stable` too
<jlebon> fun. let me check the aws status board
<dustymabe> oh wait - I was looking at the wrong place
<dustymabe> it hasn't got there for stable yet
justJanne has joined #fedora-coreos
<jlebon> "couldn't describe image: InternalError: An internal error has occurred" isn't very helpful either
<dustymabe> yeah
<jlebon> let me try reproducing it locally
<dustymabe> i think that region is hosed
<dustymabe> cc davdunc[m]
<dustymabe> in the webUI when I navigate to that region I get `internal error occured`
<jlebon> dustymabe: nice
<dustymabe> other regions work
<spresti[m]> Oooof
<jlebon> reproduced locally as well
<dustymabe> so... what "half state" are we in right now?
<jlebon> and even an `aws ec2 describe-instances` gives me InternalError
<jlebon> i think it's possible some regions weren't made public depending on ordering
<dustymabe> right
<jlebon> we need a way to skip over it and rerun the release job
<dustymabe> option A: wait until morning (what are the implications of this?)
<dustymabe> option B: skip making AMIs public altogether (some regions won't be public)
<jlebon> option C: add e.g. `plume make-amis-public --skip-region` and then replay with that ninja'ed in?
<spresti[m]> What damage is it to wait till morning?
<dustymabe> that kind of implies a COSA dev/test cycle
<dustymabe> Option C does ^^
<jlebon> systems that manually rpm-ostree upgrade will get the update, but otherwise no machine will update yet
<dustymabe> we also have GCP and containers
<jlebon> right yup
<dustymabe> i.e. GCP image family will get the latest bits
<dustymabe> and people pulling directly from containers
<jlebon> so some workflows will get new content on fresh provisioning
<dustymabe> but not the end of the world?
<jlebon> existing nodes will not upgrade until we actually rollout
<jlebon> confusing, but *should* be fine, yeah
<dustymabe> ok so let's wait until the morning and try again
<walters> Longer term I think we should be thinking of things in "kubernetes controller" style of reconciling to desired state, not "job run once"
<dustymabe> walters: +1
<jlebon> walters: yeah, the release job is centered around that philosophy, except humans are the iterators right now :)
<dustymabe> well +1 for the status update :) - haven't thought through the second statement
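Note: the "reconcile to desired state" idea raised above can be sketched roughly like this. It is a minimal illustration under stated assumptions, not how the release job or plume is implemented. `reconcile`, `observe`, and `apply` are hypothetical names, and the per-region "AMI is public" state is modeled as a plain set.

```python
from typing import Callable, Set


def reconcile(
    desired: Set[str],
    observe: Callable[[], Set[str]],
    apply: Callable[[str], None],
) -> Set[str]:
    """Run one reconciliation pass and return the regions still missing.

    Only the difference between desired and observed state is acted on,
    so the pass is safe to repeat after a partial failure.
    """
    actual = observe()
    remaining = set()
    for region in sorted(desired - actual):
        try:
            apply(region)  # e.g. make the AMI public in this region
        except Exception:
            remaining.add(region)  # leave it for the next pass
    return remaining
```

A caller would rerun `reconcile` until it returns an empty set, which today is the part a human does by replaying the job.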
<dustymabe> spresti[m]: sync back up in the morning?
<spresti[m]> dustymabe: Yeah that sounds good
vgoyal_ has quit [Ping timeout: 256 seconds]
vgoyal_ has joined #fedora-coreos
<jlebon> dustymabe, spresti[m]: fwiw https://github.com/coreos/coreos-assembler/pull/3277
<jlebon> good to have even if we don't use it for this right now
<dustymabe> jlebon: instead of this WDYT about a `--best-effort` mode?
<dustymabe> or even making that the default
<dustymabe> and then just bail out at the end
<jlebon> dustymabe: can you comment in the PR? and we circle back on it tmw
<dustymabe> sure
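Note: the best-effort behavior proposed above (attempt every region, then bail out at the end if any failed) might look something like the sketch below. It assumes a per-region action that raises on failure. `run_best_effort` and `BestEffortError` are hypothetical names and do not correspond to an existing plume flag or function.

```python
from typing import Callable, Dict, Iterable


class BestEffortError(Exception):
    """Raised after all regions were attempted and at least one failed."""

    def __init__(self, failures: Dict[str, Exception]):
        self.failures = failures
        super().__init__("failed in: " + ", ".join(sorted(failures)))


def run_best_effort(regions: Iterable[str], action: Callable[[str], None]) -> None:
    """Attempt `action` in every region, deferring errors until the end."""
    failures: Dict[str, Exception] = {}
    for region in regions:
        try:
            action(region)
        except Exception as exc:
            failures[region] = exc  # record the failure and keep going
    if failures:
        # Still exit non-zero overall so the job is not reported as a success.
        raise BestEffortError(failures)
```

Compared with a skip-region option, this needs no advance knowledge of which region is broken, at the cost of always attempting it.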
<jlebon> automation PR reviewed!
<jlebon> see y'all tmw!
<dustymabe> one more PR (cc spresti[m] in case you are around still): https://github.com/coreos/fedora-coreos-releng-automation/pull/162
nalind has quit [Quit: bye for now]
vgoyal_ has quit [Quit: Leaving]
darknao has quit [Ping timeout: 260 seconds]
<dustymabe> the dashboard says the issues in ap-southeast-1 are resolved now - let me try a release job run
hyperreal has quit [Quit: The Lounge - https://thelounge.chat]
hyperreal has joined #fedora-coreos