<imaximets>
zhouhan, I'm on and off, but feel free to send a question, I'll try to reply.
<zhouhan>
imaximets: with ovn-monitor-all enabled, when a large number of ovn-controllers resync data at the same time (e.g. when the SB schema is upgraded), the SB DB memory spikes extremely high. So I am thinking about reducing the memory spike by limiting the jsonrpc send buffer usage, which means introducing some backpressure mechanism when trying to flush data to all clients, instead of flushing it all at once.
<zhouhan>
imaximets: Do you think this is reasonable? Or is this considered before?
<imaximets>
zhouhan, I think the main problem may be that data already written to the socket (send() succeeded) is still tracked against the RSS of the process. And we don't really have much control over when that data will actually be transmitted by the kernel and the memory released.
<imaximets>
There is already a backpressure mechanism for normal updates (not sure if it applies to the initial monitor reply) that starts accumulating changes on monitors when the tx socket overflows and send returns ENOBUFS or something like that. You may look into that.
<zhouhan>
imaximets: ok, let me check
<imaximets>
zhouhan, But, in general, I've seen these spikes in practice and they are not very good. If you manage to get some mitigation, it'll be a nice improvement.
<zhouhan>
imaximets: so you mean the jsonrpc->backlog is already empty when the memory spikes, because the data is already in the socket's tx buffer?
<imaximets>
zhouhan, I think so.
<imaximets>
I mean, it probably starts before that, but I'm not sure how to track the memory that is already out.
<zhouhan>
imaximets: then does it mean we only need to set a limit for the socket buffer size to avoid using too much memory in the socket?
<imaximets>
It could be. But I'm not sure whether the backpressure mechanism works on initial monitor replies. It probably doesn't; that needs checking.
<zhouhan>
imaximets: sure, I will check the initial monitor updates
<imaximets>
zhouhan, there is a test named 'ovsdb-server combines updates on backlogged connections'. You may look at it as a starting point.
<zhouhan>
imaximets: thanks a lot for the pointer!
<imaximets>
np