#buildbot on 2021-07-07 — irc logs at libera.irclog.whitequark.org

2021-06-20 09:29 tardyp changed the topic of #buildbot to: A Software Freedom Conservancy Project | Buildbot-3.2.0 | docs: http://docs.buildbot.net/current/ | tutorial: http://docs.buildbot.net/current/tutorial | irclogs: https://libera.irclog.whitequark.org/buildbot

02:52 koobs has quit [Ping timeout: 258 seconds]

03:00 koobs has joined #buildbot

04:26 koobs has quit [Quit: koobs]

04:34 koobs has joined #buildbot

09:25 zware has quit [Quit: No Ping reply in 180 seconds.]

09:26 zware has joined #buildbot

12:30 aakashjain has joined #buildbot

12:32 <aakashjain> For a multi-master setup, is it possible to have masters on different machines (sharing the same database over the network, and loading the same buildbot config)? Any known issues with such a setup?

12:33 aakashjain has quit [Remote host closed the connection]

12:33 aakashjain has joined #buildbot

12:37 <p12tic> aakashjain: Just out of interest, why do you want to do that? Is there some resource limit that a master is hitting on a single machine?

12:40 <aakashjain> p12tic: Yeah (most likely), one of our buildbot instances (using multi-master) is running under heavy load. Web page loading is slow, and simple build steps are also visibly slow sometimes. I was thinking of moving the webserver master to a separate VM in order to speed up web page loading

12:42 <p12tic> heavy load, how much is that?

12:44 <aakashjain> p12tic: how can I quantify that? I noticed that web page loading is slow (varies, but sometimes takes 10-60 seconds), and simple build steps are also visibly slow sometimes

12:45 <p12tic> I mean, how many builds are running concurrently and how large are the logs created by them

13:25 <aakashjain> p12tic: between 200-300 builds concurrently, logs are somewhat large

13:35 <p12tic> right, that's significant load

13:37 <p12tic> a single buildbot master is effectively constrained to a single CPU core, so as long as there are more cores on the machine than there are buildbot masters it should not be slower than a separate machine

13:39 <p12tic> one potential optimization would be to send logs from workers in larger chunks. it's currently hardcoded in the worker as CHUNK_LIMIT, BUFFER_SIZE and BUFFER_TIMEOUT variables

13:40 <p12tic> I think it may make sense to bump BUFFER_TIMEOUT to something like 30 and see what happens in your case
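
To illustrate why a larger BUFFER_TIMEOUT could reduce load on the master, here is a minimal sketch of size-and-timeout batching. It is not the buildbot-worker implementation: the LogBuffer class, its parameter names, the default values, and the simulate helper are all hypothetical, and the one-line-per-second workload is an assumption chosen only to make the effect visible.

```python
import time


class LogBuffer:
    """Hypothetical batching buffer; NOT Buildbot's actual code."""

    def __init__(self, buffer_size=64 * 1024, buffer_timeout=5.0,
                 clock=time.monotonic):
        self.buffer_size = buffer_size        # flush once this many chars are pending
        self.buffer_timeout = buffer_timeout  # flush once the oldest data is this old
        self.clock = clock                    # injectable so the sketch can be simulated
        self.parts = []
        self.size = 0
        self.first_write = None

    def add(self, data):
        """Queue data; return a batched chunk if a flush is due, else None."""
        if self.first_write is None:
            self.first_write = self.clock()
        self.parts.append(data)
        self.size += len(data)
        too_big = self.size >= self.buffer_size
        too_old = self.clock() - self.first_write >= self.buffer_timeout
        return self.flush() if (too_big or too_old) else None

    def flush(self):
        """Return everything pending as one chunk and reset the buffer."""
        chunk = "".join(self.parts)
        self.parts = []
        self.size = 0
        self.first_write = None
        return chunk


def simulate(n_lines, buffer_timeout):
    """Count messages sent when one short log line arrives per second."""
    now = [0.0]
    buf = LogBuffer(buffer_timeout=buffer_timeout, clock=lambda: now[0])
    messages = 0
    for i in range(n_lines):
        now[0] = float(i)
        if buf.add("x\n") is not None:
            messages += 1
    if buf.flush():  # send whatever is left at the end of the step
        messages += 1
    return messages


if __name__ == "__main__":
    for timeout in (5.0, 30.0):
        print(timeout, simulate(60, timeout))
```

In this simulation a longer timeout means fewer, larger messages for the same log output, at the cost of logs showing up later in the web UI. Whether that trade-off helps a real master depends on where its time is actually spent.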

13:41 <p12tic> I remember you were trying the buildbot profiler, did you get any useful data out of that?

14:31 _whitelogger has joined #buildbot

15:19 <aakashjain> p12tic: I didn't get much useful data from the profiler. I sent some profiles to tardyp_, and he indicated that the master is somewhat overloaded with the stdout log management. I added another master and moved a few workers to that master; it helped somewhat, but not much.

15:19 <aakashjain> Thanks for the suggestion about CHUNK_LIMIT, BUFFER_SIZE and BUFFER_TIMEOUT

15:19 <aakashjain> I guess this is the BUFFER_TIMEOUT you are referring to: https://github.com/buildbot/buildbot/blob/master/worker/buildbot_worker/runprocess.py#L274

15:19 <aakashjain> How do I change this on workers? (there are a few hundred bots)

15:20 <aakashjain> Do I have to re-compile the buildbot-worker package on each bot (or maybe directly change the /Library/Python/2.7/site-packages/buildbot_worker/runprocess.py on the bots)?

15:29 <p12tic> You could change it directly in the python file

15:30 <p12tic> at the start you could just do this for the workers connected to the most loaded master and then you could see whether it improves the situation there
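
With a few hundred bots, editing the file by hand does not scale, so a small script could apply the change. The sketch below is one possible approach, not an official Buildbot tool: set_constant and patch_file are hypothetical helpers, and they assume the constant is written as a plain `NAME = <number>` assignment on its own line. Check the installed runprocess.py first, since the exact form may differ between versions.

```python
import re


def set_constant(source, name, value):
    """Rewrite simple 'NAME = <number>' assignments in Python source text.

    Returns (new_source, number_of_replacements). Only plain numeric
    literals are matched; expressions such as '64 * 1024' are left alone
    apart from their leading number, so use this for simple values only.
    """
    pattern = re.compile(
        r"^([ \t]*" + re.escape(name) + r"[ \t]*=[ \t]*)[0-9.]+",
        re.MULTILINE,
    )
    return pattern.subn(lambda m: m.group(1) + str(value), source)


def patch_file(path, name, value):
    """Apply set_constant to a file in place; return the replacement count."""
    with open(path, "r", encoding="utf-8") as f:
        original = f.read()
    patched, count = set_constant(original, name, value)
    if count and patched != original:
        with open(path, "w", encoding="utf-8") as f:
            f.write(patched)
    return count


if __name__ == "__main__":
    # Stand-in text for illustration; not a copy of the real runprocess.py.
    sample = "class RunProcess:\n    BUFFER_TIMEOUT = 5\n"
    print(set_constant(sample, "BUFFER_TIMEOUT", 30)[0])
```

A return value of 0 from patch_file means the pattern did not match and the file was left untouched, which is worth checking per bot. The worker process most likely needs a restart before the new value takes effect, since the module is loaded when the worker starts.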

16:17 <aakashjain> p12tic: sounds good, Thanks!

21:22 tflink_ is now known as tflink

22:40 sknebel has quit [Remote host closed the connection]

22:41 sknebel has joined #buildbot

23:05 gmcdonald has quit [Ping timeout: 246 seconds]

23:51 aakashjain has quit [Remote host closed the connection]

23:53 aakashjain has joined #buildbot

23:53 aakashjain has quit [Read error: Connection reset by peer]

23:53 aakashjain has joined #buildbot

23:58 aakashjain has quit [Ping timeout: 252 seconds]