zware has quit [Quit: No Ping reply in 180 seconds.]
zware has joined #buildbot
aakashjain has joined #buildbot
<aakashjain>
For a multi-master setup, is it possible to have masters on different machines (sharing the same database over the network and loading the same buildbot config)? Are there any known issues with such a setup?
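For context, the setup being asked about can be sketched as a shared master.cfg fragment. This is only an illustration, not a tested deployment: the host names, credentials, and realm below are placeholders, and it assumes the masters coordinate through a WAMP router, which the Buildbot documentation describes as the message queue type used for multi-master.

```python
# Hypothetical master.cfg fragment loaded by every master in the cluster.
# All host names, credentials and the realm are placeholders.
c = BuildmasterConfig = {}

# Tell each master that it is not the only one writing to the database.
c['multiMaster'] = True

# Every master points at the same database server over the network.
c['db'] = {
    'db_url': 'postgresql://buildbot:CHANGEME@db.example.internal/buildbot',
}

# Masters exchange events through a shared WAMP router (e.g. Crossbar).
c['mq'] = {
    'type': 'wamp',
    'router_url': 'ws://crossbar.example.internal:8080/ws',
    'realm': 'buildbot',
}
```

Per-master differences (for example, which master serves the web UI) would still have to be handled separately in each master's configuration.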
aakashjain has quit [Remote host closed the connection]
aakashjain has joined #buildbot
<p12tic>
aakashjain: Just out of interest, why do you want to do that? Is there some resource limit that a master is hitting on a single machine?
<aakashjain>
p12tic: Yeah (most likely), one of our buildbot instances (using multi-master) is running under heavy load. Web pages load slowly, and simple build steps are sometimes visibly slow too. I was thinking of moving the webserver master to a separate VM to speed up page loading.
<p12tic>
heavy load, how much is that?
<aakashjain>
p12tic: how can I quantify that? I noticed that web pages load slowly (it varies, but sometimes takes 10-60 seconds), and simple build steps are sometimes visibly slow too
<p12tic>
I mean, how many builds are running concurrently and how large are the logs created by them
<aakashjain>
p12tic: between 200 and 300 builds concurrently, and the logs are somewhat large
<p12tic>
right, that's significant load
<p12tic>
a single buildbot master is effectively constrained to a single CPU core, so as long as there are more cores on the machine than there are buildbot masters it should not be slower than a separate machine
<p12tic>
one potential optimization would be to send logs from workers in larger chunks. this is currently hardcoded in the worker as the CHUNK_LIMIT, BUFFER_SIZE and BUFFER_TIMEOUT variables
<p12tic>
I think it may make sense to bump BUFFER_TIMEOUT to something like 30 and see what happens in your case
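To see why a longer BUFFER_TIMEOUT could reduce load on the master, here is a minimal simulation of a size-or-timeout buffer. It is a sketch of the general technique only, not the worker's actual code: the function name `count_flushes`, the 64 KiB buffer size, the 5-second baseline timeout, and the workload are all assumptions made for illustration.

```python
def count_flushes(events, buffer_size, buffer_timeout):
    """Count the messages a size-or-timeout buffer would send.

    events: list of (timestamp_seconds, nbytes), sorted by timestamp.
    A flush happens when the buffer reaches buffer_size bytes, or when
    buffer_timeout seconds have passed since the first buffered byte.
    """
    flushes = 0
    buffered = 0
    deadline = None  # time at which pending data must be sent
    for ts, nbytes in events:
        # The timer expired before this output arrived: send what we had.
        if deadline is not None and ts >= deadline:
            flushes += 1
            buffered = 0
            deadline = None
        # First data in an empty buffer starts the timer.
        if buffered == 0:
            deadline = ts + buffer_timeout
        buffered += nbytes
        # Buffer is full: send immediately.
        if buffered >= buffer_size:
            flushes += 1
            buffered = 0
            deadline = None
    if buffered:
        flushes += 1  # final flush when the step ends
    return flushes


# Hypothetical step that prints 100 bytes once per second for 5 minutes.
events = [(t, 100) for t in range(300)]
short = count_flushes(events, buffer_size=64 * 1024, buffer_timeout=5)
long = count_flushes(events, buffer_size=64 * 1024, buffer_timeout=30)
print(short, long)
```

For a slow, chatty step like this one the timeout dominates, so raising it from 5 to 30 cuts the number of messages sixfold; with 200-300 concurrent builds, that is correspondingly fewer log updates for the master to process, at the cost of logs appearing in the web UI with more delay.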
<p12tic>
I remember you were trying the buildbot profiler, did you get any useful data out of that?
_whitelogger has joined #buildbot
<aakashjain>
p12tic: I didn't get much useful data from the profiler. I sent some profiles to tardyp_, and he indicated that the master is somewhat overloaded with stdout log management. I added another master and moved a few workers to it; that helped somewhat, but not much.
<aakashjain>
Thanks for the suggestion about CHUNK_LIMIT, BUFFER_SIZE and BUFFER_TIMEOUT
<aakashjain>
How do I change this on workers? (there are a few hundred bots)
<aakashjain>
Do I have to re-compile the buildbot-worker package on each bot (or maybe directly change /Library/Python/2.7/site-packages/buildbot_worker/runprocess.py on the bots)?
<p12tic>
You could change it directly in the python file
<p12tic>
to start with, you could do this just for the workers connected to the most loaded master and then see whether it improves the situation there
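With a few hundred bots, editing the file by hand does not scale, so the change could be scripted. The helper below is a hypothetical sketch: `set_buffer_timeout` is not part of buildbot-worker, and it assumes the constant appears exactly once as a plain `BUFFER_TIMEOUT = <integer>` assignment, which should be checked against the installed runprocess.py before relying on it.

```python
import re


def set_buffer_timeout(source, seconds):
    """Return source text with the BUFFER_TIMEOUT assignment set to seconds.

    Assumes exactly one line of the form `BUFFER_TIMEOUT = <integer>`;
    raises ValueError otherwise so a worker is never silently left unpatched.
    """
    pattern = re.compile(r"^([ \t]*BUFFER_TIMEOUT[ \t]*=[ \t]*)\d+", re.MULTILINE)
    new_source, count = pattern.subn(
        lambda m: m.group(1) + str(seconds), source
    )
    if count != 1:
        raise ValueError("expected exactly one BUFFER_TIMEOUT assignment, "
                         "found %d" % count)
    return new_source


# Stand-in for the relevant part of runprocess.py (values are illustrative).
sample = (
    "class RunProcess(object):\n"
    "    BUFFER_SIZE = 65536\n"
    "    BUFFER_TIMEOUT = 5\n"
)
patched = set_buffer_timeout(sample, 30)
print(patched)
```

On a real worker, the same function would be applied to the contents of runprocess.py (read, patch, write back), followed by a worker restart; rolling it out first to the workers on the most loaded master keeps the experiment easy to revert.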
<aakashjain>
p12tic: sounds good, Thanks!
tflink_ is now known as tflink
sknebel has quit [Remote host closed the connection]
sknebel has joined #buildbot
gmcdonald has quit [Ping timeout: 246 seconds]
aakashjain has quit [Remote host closed the connection]
aakashjain has joined #buildbot
aakashjain has quit [Read error: Connection reset by peer]