mosen has quit [Read error: Connection reset by peer]
saschagrunert has joined #fedora-coreos
jcajka has joined #fedora-coreos
azukku has joined #fedora-coreos
c4rt0 has joined #fedora-coreos
mosen has joined #fedora-coreos
<lucab>
jlebon, dustymabe: ostree upstream jenkins CI started failing in a funky way, it looks like kola VMs just disappear after a short bit. Is this maybe fallout from some other CI changes?
<dustymabe>
ahh I see.. haha in that case she wasn't only changing a cosmetic name, but an actual path to a directory.
<dustymabe>
do we need to support both though? could we not just PR the upstream repos and update them?
<dustymabe>
rather than carrying the compat?
<jlebon>
we could, this just gets CI back to working faster :)
<dustymabe>
I'm cool with this but maybe let's open a followup to revert the change and link to all the open PRs that would need to merge before we can do the revert.
<dustymabe>
i'll help open the PRs against the repos if you help enumerate them and give me a recipe for the change
<jlebon>
i'm happy to revert, but was hoping repo owners would handle updating their CIs
<dustymabe>
is the change not trivial?
<dustymabe>
I don't expect anyone would update anything if it's working
<jlebon>
the change itself is trivial. just the overhead is non-trivial
<jlebon>
right, we'd notify them
<jlebon>
and also have them drop the `fcos*` aliases in favour of the new ones
<dustymabe>
yeah, that's a worthy goal
<dustymabe>
can we open an issue against each repo then and link to that from the revert PR?
<dustymabe>
in the revert PR we can also drop the fcos* symlinks
<bgilbert>
please don't externalize the cost of CI changes onto other teams :-(
<jlebon>
unless really inconvenient, we should always try to provide a path to upgrade instead of breaking configs
<jlebon>
bgilbert: there's definitely a balance. but i'd like each team to own their CIs. it's still the responsibility of the CI changer to disclose of course
<dustymabe>
bgilbert: what are you advocating for?
<bgilbert>
keeping upstream CI working is an ongoing source of friction. it's great that CI automatically does FCOS integration testing, but every time there's a bad package update or service outage or new flaky test, it causes a priority interrupt for upstream work while we sort out how to restore our ability to merge things
<bgilbert>
a lot of that is outside our control
<bgilbert>
but for the parts that aren't, I think we should do more to encourage people to own the fallout of their own changes
<jlebon>
bgilbert: i don't disagree
<bgilbert>
so yeah, teams should own the capabilities and configuration of their own CI, but let's try to minimize the amount they have to fight a holding action against outside changes
<jlebon>
in this case, the intent was definitely not to break all the upstream CIs
<bgilbert>
:-)
<jlebon>
the PR above restores functionality. but there are maintenance tasks that we expect everyone to do on their own CI in the absence of a dedicated CI team
<bgilbert>
are those documented?
<dustymabe>
I mean. I guess the pipeline team fills that role now?
<dustymabe>
once we get this pipeline migration done maybe our teams can start functioning more properly
<dustymabe>
either way.. there is going to be some turbulence over the next month as we ratchet necessary changes in
GiuDno[m] has joined #fedora-coreos
<jlebon>
bgilbert: there isn't one set of things. it's more of a coreos-status type of deal. i don't expect everyone to subscribe to coreos-ci-lib, but if there's a migration underway and we e.g. broadcast an announcement that changes will be needed, i hope that'd be sufficient
<dustymabe>
^^ to be clear - none of that happened here, this was an unintended breakage
<bgilbert>
yup, I do understand about the ratcheting :-)
<jlebon>
:)
<jlebon>
but yeah, we need to figure out the best way to communicate these things
<bgilbert>
but if cosa or fedora-coreos-config or coreos-installer or Ignition changes are expected to cause some fallout in other projects, whoever is making those changes would normally handle all of the fallout
<bgilbert>
I don't think CI is different
<jlebon>
define "fallout"
<dustymabe>
"whoever is making those changes would normally handle all of the fallout" <- that needs some clarity
<bgilbert>
required code changes
<dustymabe>
I don't think so
<dustymabe>
if it's trivial, maybe
<dustymabe>
but otherwise an announcement and time period would be sufficient, just like with any user base
<bgilbert>
um
<bgilbert>
if that's true, I've been doing waaaay more work than I need to
<bgilbert>
I don't think we want to get into a place with the subteams where we start throwing externalities over the wall onto other subteams' backlogs
<jlebon>
IMO i think it's different talking about CI vs hacking on the OS
<dustymabe>
agree
<bgilbert>
dustymabe: to jlebon or to me?
<dustymabe>
i agree with jlebon.. I think there is a lot of nuance here
<jlebon>
CI maintenance is a shared thing. it's a public good I would hope everyone feels shared ownership in.
<bgilbert>
mmm, not sure I agree. e.g. no one on firstboot knows Groovy. CoreOS CI is a service we use.
<dustymabe>
bgilbert: right, but if there was an announcement that said "you need to change your FOO variable to BAR in your CI" then you'd probably be able to handle it?
<dustymabe>
it's also something we work together on. i.e. you're not completely on your own - we help
<bgilbert>
my point isn't about capability, it's about efficiency and context switches and collaboration
<dustymabe>
FWIW i don't really know groovy either
<bgilbert>
similarly I'd love to hand off maintenance of the bootimage bot to the folks who own the bootimage bumps, but I know that isn't likely to happen
<bgilbert>
there are no volunteers, and we don't just throw things over the wall to each other, so I get to maintain it
<dustymabe>
yep. I know that feeling
<jlebon>
same here, I know where you're coming from :)
<bgilbert>
yup, everyone here knows that feeling :-)
<bgilbert>
which is why I'm so surprised by this discussion
<dustymabe>
maybe we're not communicating effectively - want to go high bandwidth?