<RP>
Is there a good example of a more complex prioritizeBuilders function anywhere?
<RP>
Yocto Project just added one to try and improve our scheduling but it just made it worse :(
<RP>
The challenge is we have builders which trigger large numbers of builders. Some builders are picky about which workers they run on. We therefore made prioritizeBuilders prefer the picky builders first.
<RP>
That works, but it means that with multiple high-level trigger builds we're not completing one high-level build before bits of another are running, so overall completion is hampered :(
<RP>
We need a "run the builders in this order but prefer anything part of an already triggered build"
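A minimal sketch of the ordering RP describes, written as a plain sort key so it can be tried outside a master. `FakeBuilder`, `PICKY_RANK`, `DEFAULT_RANK`, and the `in_progress` set are hypothetical names introduced for illustration; how a real config would discover which builders belong to an already triggered build is not covered in the discussion and is left as an assumption.

```python
from dataclasses import dataclass


@dataclass
class FakeBuilder:
    """Stand-in for a Buildbot builder; only .name is used in this sketch."""
    name: str


# Hypothetical ranking: a lower rank is allocated earlier.
PICKY_RANK = {"XXX-centos": 0}
DEFAULT_RANK = 10


def order_builders(builders, in_progress):
    """Return builders sorted so that anything belonging to an already
    triggered build comes first, then picky builders, then the rest.

    `in_progress` is a set of builder names assumed to be part of a
    top-level build that has already started.
    """
    def key(builder):
        return (
            0 if builder.name in in_progress else 1,
            PICKY_RANK.get(builder.name, DEFAULT_RANK),
            builder.name,  # deterministic tie-break
        )

    return sorted(builders, key=key)


builders = [FakeBuilder("generic-a"), FakeBuilder("XXX-centos"), FakeBuilder("generic-b")]
ordered = order_builders(builders, in_progress={"generic-b"})
print([b.name for b in ordered])
```

In a master configuration, a function with the `prioritizeBuilders(buildmaster, builders)` signature could wrap `order_builders` and return its result.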
<tardyp>
RP: why don't you just restrict that builder to the workers that are stable?
<RP>
tardyp: It isn't that simple. We have targets like XXX-centos which should run on a centos worker but many of our builders can run on any worker
<RP>
tardyp: if we made it centos specific, it wouldn't take generic work
<tardyp>
ah, so your issue is that the builders that have fewer workers are always starved because other builders take their workers?
<RP>
tardyp: yes
<tardyp>
the default sorter selects the builder with the oldest buildrequest first, so I'd say it is meant to get a worker in the next iteration
<RP>
tardyp: so are you saying we should use the default sorter but try and sort the order we trigger the build requests?
<tardyp>
I think the default sorter should just work for your usecase, without any kind of tweaking
<RP>
tardyp: it doesn't, since the generic builders can use up all the centos workers; then when it reaches the centos-specific builders, none are left
<tardyp>
shouldn't the generic builders rather prefer generic workers?
<RP>
tardyp: how do you give that preference though?
<tardyp>
builderconfig.nextWorker
<RP>
nextWorker doesn't help us in this case since we need to allocate all the picky builders first, then the generic ones
<RP>
nextWorker is for a specific builder as I understand it so we can't change the order the builders are allocated in from there?
<tardyp>
no, but you can prioritize the generic workers over specific workers
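A rough sketch of what tardyp is suggesting: a `nextWorker` callable for the generic builders that avoids scarce workers while others are available. The `SCARCE_WORKERS` set, the worker names, and the `make_worker` helper are hypothetical; the sketch also assumes each entry in `workers` exposes its name as `.worker.workername`, which should be checked against the Buildbot version in use.

```python
import random
from types import SimpleNamespace

# Hypothetical: workers that distro-specific builders depend on.
SCARCE_WORKERS = {"centos7-a"}


def prefer_common_workers(builder, workers, buildrequest):
    """nextWorker-style callable: pick a non-scarce worker if one is
    available, fall back to a scarce one, or return None if there are none."""
    common = [w for w in workers if w.worker.workername not in SCARCE_WORKERS]
    pool = common or list(workers)
    return random.choice(pool) if pool else None


def make_worker(name):
    """Build a stand-in object shaped like the assumed worker entry."""
    return SimpleNamespace(worker=SimpleNamespace(workername=name))


available = [make_worker("centos7-a"), make_worker("debian-a")]
chosen = prefer_common_workers(None, available, None)
print(chosen.worker.workername)
```

This would be attached per builder through `nextWorker` in the builder config, so it only changes which worker a generic builder takes, not the order in which builders are considered.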
<RP>
we have a pool of workers and they all have different distros (debian, ubuntu, fedora, centos), we then have a set of builders and some are distro specific, some are not. We don't have "specific" workers
<RP>
put another way, every worker does have a distro
<tardyp>
but you have some of the workers that are a bit less in number so that they starve?
<RP>
we have many more generic builds than specific ones, so the chances of starving a specific build are high
<tardyp>
my fear with your algorithm is that you will get the opposite under load: the generic builders will never run
<RP>
tardyp: we're seeing that now, we'll have to drop this new code
<tardyp>
but as the buildrequest gets older you are guaranteed to eventually run
<RP>
our builds do eventually schedule, yes. I'm just hoping we could somehow nudge the scheduler to allocate things in a better way
<RP>
tardyp: there is the added twist that the specific workloads tend to take much longer too so we really do want those builders started earlier
<RP>
tardyp: Looking at the default algorithm, I think perhaps we just need to order the build requests carefully, then the timestamp sorting might work for us
<tardyp>
maybe you can start from the default algorithm and add the estimated build time, to take that into account
<tardyp>
there is no real API for that, but you can just add a heuristic
<tardyp>
with hardcoded bonus values
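One way to picture the "hardcoded bonus" idea: keep the oldest-request-first ordering, but subtract a per-builder bonus from the request time so long-running builders look older than they are. This is only a sketch on plain data; `BONUS_SECONDS` and its values are hypothetical, and in a real `prioritizeBuilders` the oldest request times would have to be obtained from the builders (asynchronously) rather than passed in as a list.

```python
# Hypothetical bonus, in seconds, for builders whose builds take longest.
BONUS_SECONDS = {"XXX-centos": 600}


def sort_by_age_with_bonus(entries):
    """Order builder names by oldest request time minus a per-builder bonus.

    `entries` is a list of (builder_name, oldest_request_time) pairs, where
    the time is seconds since the epoch, or None if nothing is pending.
    Builders with no pending request sort last.
    """
    def key(entry):
        name, oldest = entry
        if oldest is None:
            return (1, 0, name)
        return (0, oldest - BONUS_SECONDS.get(name, 0), name)

    return [name for name, _ in sorted(entries, key=key)]


entries = [("generic-a", 1000), ("XXX-centos", 1300), ("idle", None)]
print(sort_by_age_with_bonus(entries))
```

Because the bonus is finite, a generic request that has waited longer than the bonus still sorts ahead of a fresh picky request, which addresses the starvation concern raised earlier.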
<RP>
tardyp: maybe. I think I'm going to struggle to inject that algorithm :/
<Zorry>
would canStartBuild help?
<tardyp>
well, the default sorter is indeed super old, written in callback mode
<RP>
tardyp: builderNames in a Triggerable can be sorted in an order to start them? If so, that might be good enough for what we need
<RP>
tardyp: the callback stuff always makes my head hurt! :)
<RP>
since the buildrequest timestamps will then be in order
<RP>
Zorry: I think I did look at that but I think it didn't let us prioritise certain builders
<tardyp>
I'll send a complete example of bonus soon
<RP>
tardyp: am I right in thinking if we sort the builderNames to the triggerable, we can probably solve our specific case more easily though?
<tardyp>
with luck, maybe
<tardyp>
but they will probably still be triggered in the same second, so the order is influenced more by the order of the builders in the global builder list
<RP>
tardyp: ah, timestamps are only to the nearest second? :/
<tardyp>
still that would be undefined behaviour
<RP>
yes, that isn't as great an idea as I was thinking although it would probably be better than what we're doing now. We could tweak the main builders list too...
<tardyp>
yes, they are stored as timestamp
<tardyp>
as integer
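A small illustration of why whole-second timestamps defeat the "trigger in a careful order" idea. It uses Python's own stable sort on made-up data, not Buildbot's sorter, so it only shows the tie: requests submitted within the same second compare equal, and their relative order then depends on something else (here, input order).

```python
def order_by_submitted_second(requests):
    """Sort (builder_name, submitted_at) pairs by the timestamp truncated
    to whole seconds, as if it were stored as an integer."""
    return [name for name, ts in sorted(requests, key=lambda r: int(r[1]))]


# "a" was submitted 0.8 s before "b", but both fall within second 100.
print(order_by_submitted_second([("b", 100.9), ("a", 100.1)]))
print(order_by_submitted_second([("a", 100.1), ("b", 100.9)]))
```

The two calls return the names in different orders even though the timestamps are identical, because the truncated keys tie.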
<RP>
tardyp: If you have a bonus example I will try and see if I can make that work! :)
<cognifloyd>
With bbtravis is there a way to change the docker image used for a pipeline? It looks like you specify the image for all the workers before loading .bbtravis.yml right?
<cognifloyd>
Also, is there a way to specify services to load in .bbtravis.yml?
<RP>
tardyp: sadly this isn't quite working for me: rv = [b.name for b in builders]
<RP>
builtins.AttributeError: 'DeferredList' object has no attribute 'name'
<RP>
tardyp: @defer.inlineCallbacks missing?
* RP
thinks that does fix it
<tardyp>
RP: indeed, missing an inlineCallbacks decorator on the main function