ChanServ changed the topic of #openvswitch to: Open vSwitch, a Linux Foundation Collaborative Project || FAQ: http://docs.openvswitch.org/en/latest/faq/ || OVN meeting Thurs 9:15 am US Pacific || Use ovs-discuss@openvswitch.org for questions if you don't get an answer here. || Channel logs can be found at https://libera.irclog.whitequark.org/openvswitch
ChmEarl has quit [Quit: Leaving]
donhw has quit [Read error: Connection reset by peer]
donhw has joined #openvswitch
otherwiseguy has quit [Ping timeout: 252 seconds]
otherwiseguy has joined #openvswitch
GNUmoon has quit [Remote host closed the connection]
GNUmoon has joined #openvswitch
kuraudo has joined #openvswitch
froyo has joined #openvswitch
kuraudo has quit [Quit: kuraudo]
kuraudo has joined #openvswitch
donhw has quit [Read error: Connection reset by peer]
donhw has joined #openvswitch
elvira has joined #openvswitch
kuraudo has quit [Remote host closed the connection]
kuraudo has joined #openvswitch
imaximets has quit [Remote host closed the connection]
imaximets has joined #openvswitch
otherwiseguy has quit [Ping timeout: 260 seconds]
otherwiseguy has joined #openvswitch
BlackDex has quit [Quit: ByeBye]
tpires has joined #openvswitch
imaximets has quit [Changing host]
imaximets has joined #openvswitch
dceara has joined #openvswitch
elvira has quit [Ping timeout: 248 seconds]
froyo has quit [Ping timeout: 244 seconds]
froyo has joined #openvswitch
otherwiseguy has quit [Read error: Connection reset by peer]
otherwiseguy has joined #openvswitch
zhouhan has joined #openvswitch
mkalcok has joined #openvswitch
mmichelson has joined #openvswitch
amusil has joined #openvswitch
mj2 has joined #openvswitch
<mj2> hi !!!
<mmichelson> Hi everybody. It's time for the weekly OVN developers' meeting.
<mkalcok> hello \o
<_lore_> hi all
<mmichelson> My update this week is pretty quick.
<mmichelson> Last Friday I branched ovn25.03 upstream. Thanks again to everyone who contributed either with code or with reviews.
<felixhuettner> o/
<mmichelson> This week, I've alternated between doing code reviews and making progress on composable services.
<mmichelson> Currently I'm reviewing _lore_'s MAC binding probe patch.
<mmichelson> I should have a review posted later this afternoon.
<mmichelson> That's all from me.
<mmichelson> Who's next?
<felixhuettner> i can continue
<mmichelson> go ahead felixhuettner
<felixhuettner> i mostly worked on the incremental support for learned routes. That was quite interesting to build
<felixhuettner> also i am working with some colleagues on some performance improvements to ovn-controller in case of large southbound updates
<felixhuettner> as we regularly see updates >400kb which is larger than the default receives of jsonrpc, so we need multiple iterations of ovn-controller for one message
<felixhuettner> i guess there might be a first version on the ML sometime next week
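[Note: a minimal sketch of the arithmetic behind the point above, that a message larger than what one iteration can receive needs several ovn-controller wakeups. The per-iteration receive budget used here (32 KiB) is a hypothetical figure chosen for illustration and is not taken from the OVS jsonrpc code; only the ~400 kB message size comes from the discussion.]

```python
import math


def iterations_needed(message_bytes: int, bytes_per_iteration: int) -> int:
    """Return how many main-loop iterations are needed to receive one message.

    bytes_per_iteration is the amount of data a single iteration can read
    from the connection; the value used below is an assumption.
    """
    if bytes_per_iteration <= 0:
        raise ValueError("bytes_per_iteration must be positive")
    # Even an empty message costs one iteration to notice it.
    return max(1, math.ceil(message_bytes / bytes_per_iteration))


# Hypothetical receive budget per iteration (NOT the real jsonrpc value).
ASSUMED_BYTES_PER_ITERATION = 32 * 1024

# Message size mentioned in the meeting: a bit over 400 kB.
message_size = 400 * 1024

print(iterations_needed(message_size, ASSUMED_BYTES_PER_ITERATION))
```

[With a larger assumed budget the count drops proportionally, which is why raising the amount read per wakeup, or accumulating data before running the engine, is the kind of change being discussed.]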
<felixhuettner> I also wanted to ask if there are any regular ovn-heater runs as we are interested in running some otherwise
<felixhuettner> especially to test performance based on the changes we observe in our environment
<mmichelson> felixhuettner, we (Red Hat) do weekly ovn-heater runs on some of our machines each weekend.
<felixhuettner> is there any fancy tooling around that, that you can share
<felixhuettner> or is it mostly just what the regular repo provides?
<dceara> felixhuettner, it's mostly what the regular repo provides, with custom deployment yaml files (to match our actual machines).
<felixhuettner> ok, thanks a lot. Then we will probably try to build something similar on our side.
<felixhuettner> Thanks a lot, thats it from me
<imaximets> felixhuettner, btw, we do idl batching in northd. And there are maybe other use cases for it to be implemented in ovn-controller.
<dceara> felixhuettner, we also run the existing https://github.com/ovn-org/ovn-heater/tree/main/test-scenarios with both ipv4 and ipv6
<felixhuettner> imaximets: thats what we found too and the current idea is to generalize it a little and then use it
<felixhuettner> dceara: thanks a lot
<imaximets> felixhuettner, ack. Another use case: https://mail.openvswitch.org/pipermail/ovs-dev/2025-February/421160.html
<felixhuettner> yep that might fit as well
<felixhuettner> we actually observed sb connection timeouts, because we have too many incoming messages to process the echo in time :)
<imaximets> Uff, OK. :)
<felixhuettner> its like 10 updates a second
<felixhuettner> and a recompute currently eats away around 25 seconds
<felixhuettner> so we have high chances to recompute again
<felixhuettner> at least on some chassis
<zhouhan> felixhuettner: for the timeout problem, usually users set the probe interval from DB server side as large as >100s
<mmichelson> 25 seconds for a recompute? Ouch.
<felixhuettner> we have 60s
<imaximets> felixhuettner, that's exactly the northd behavior batching was targeting.
<felixhuettner> but honestly i dont like that at all :)
<imaximets> ovn-controller is really slow in recomputes in my experience. :(
<felixhuettner> seems to be in logical_flow_output, but we are also looking to improve that
<imaximets> Especially with many ACLs with conjunctions.
<dceara> imaximets, well, to be pedantic ovn-northd's batching was initially designed to avoid continuous cpu usage on streams of NB changes. But it works for the cases you're discussing here too I guess.
<felixhuettner> but we also have 1.6 million flows on that chassis, so some time i guess is normal :)
<imaximets> dceara, not really.
<dceara> imaximets, ah, i was thinking of the backoff now, never mind.
<imaximets> This ^ :)
<zhouhan> imaximets: which IDL batching were you talking about? Sorry I don't recall anything
<imaximets> 703949bd8b9a ("northd: Accumulate more database updates before processing.")
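[Note: a generic sketch of the "accumulate more updates before processing" idea named in the commit title above. It is not the code from that commit; the function names, the queue, and the 0.5 s time budget are all hypothetical. The point it illustrates is that one expensive processing pass is run over a whole batch instead of once per update.]

```python
import time
from collections import deque


def accumulate_updates(pending, time_budget_s=0.5, clock=time.monotonic):
    """Drain queued updates until the queue is empty or the budget is spent.

    pending is a deque of already-received updates (hypothetical stand-in
    for whatever the database client has queued).
    """
    deadline = clock() + time_budget_s
    batch = []
    while pending and clock() < deadline:
        batch.append(pending.popleft())
    return batch


def run_once(pending, process):
    """One main-loop iteration: gather a batch, then process it in one pass."""
    batch = accumulate_updates(pending)
    if batch:
        process(batch)  # a single expensive pass covers the whole batch
    return len(batch)


if __name__ == "__main__":
    queue = deque(f"update-{i}" for i in range(5))
    handled = run_once(queue, lambda batch: print("processing", len(batch), "updates"))
    print("handled:", handled)
```

[The time budget bounds how long processing is delayed, so a steady stream of small updates cannot postpone the expensive pass indefinitely.]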
<mmichelson> felixhuettner, did you have anything else to add for your update?
<felixhuettner> nope, that's it
<mmichelson> OK thanks felixhuettner, I'm looking forward to seeing the patch(es) when you have them available.
<zhouhan> imaximets: thanks, I see. (I reviewed it :) )
<felixhuettner> thanks
<mmichelson> Before I ask for the next person, I need to note that I have a hard cutoff in ~15 minutes so if we're still going then, I'll have to leave and pass control off to someone else.
<mmichelson> (I have to take my son to the dentist)
<mmichelson> Who's going next?
<_lore_> can I go next? quite fast.
<mmichelson> _lore_, go ahead.
<_lore_> as mmichelson said, I posted a series to enable re-ARPing destinations before they expire
<_lore_> I posted v1 upstream and I added a new ovn test locally, I will post it as soon as I have some feedback on v1
<_lore_> that's all from my side
<zhouhan> imaximets: but felixhuettner's problem here was different. He says the message was larger than a single jsonrpc, and the concern was multiple iterations of ovn-controller for a single message?
<zhouhan> _lore_, sorry for interrupting, please continue
<_lore_> no worries ;)
<_lore_> I was done
<felixhuettner> zhouhan: we actually have both. Too many messages and too large messages :)
<imaximets> zhouhan, yeah, the problem is a bit different, but solution may be similar and we also have a case with a lot of very small updates reported on the list. So, maybe we can cover both issues at once.
<zhouhan> felixhuettner: ok
<felixhuettner> and we also have patches for both, so hopefully that is then gone
<mmichelson> OK, who wants to give the next update?
<mj2> i can
<zhouhan> imaximets: for message larger than a jsonrpc, I was under the impression that ovn-controller shouldn't do anything in the iteration because the inc-engine will find nothing changed in the input
<mmichelson> mj2, go ahead.
<mj2> so I'm still working on the multinode test between various microovn tasks. I noticed that it was silently failing even though it was reporting passing, so I have had to spend a while figuring out why this is and how to fix it
<mj2> I think this effort is nearing a conclusion but I will admit it's taking longer than I would like
<mj2> the multinode test being the bgp unnumbered with external bgp daemon
<mj2> that is all
<mmichelson> Thanks mj2
<mmichelson> Who's next?
<imaximets> May I?
<mmichelson> go for it imaximets
<imaximets> zhouhan, for your comment, it seems like ovn-controller does a lot of work unconditionally outside of inc-proc engine, just when it wakes up. But I didn't measure that myself, so can't add any details.
<imaximets> From my side, I released OVS 3.5 and sent a patch to move OVN to v3.5.0 submodule, which is applied now.
<imaximets> Spent some time thinking on how to make address-set processing in ovn-controller faster, but have no good ideas so far.
<imaximets> Will spend some more time on that next week.
<imaximets> That's all from me.
<mmichelson> Thanks imaximets
<mmichelson> Who wants to go next?
<dceara> I can go next if that's ok.
<mmichelson> dceara, it's perfectly ok
<zhouhan> imaximets: the work outside of inc-proc engine is not trivial but still relatively small, shouldn't be in seconds (25s mentioned by felixhuettner), which he observed was in flow-output node which is part of inc-proc engine.
<dceara> I reviewed and applied some patches. Of these, I liked that northd I-P seemed easier to implement than on previous occasions: https://patchwork.ozlabs.org/project/ovn/list/?series=444957&state=*
<mmichelson> (sorry, I have to head out)
<dceara> For that series I was actually thinking of also applying it to branch-25.03 after we accept it on main but I'll start that discussion on-list.
<dceara> Related to I-P I played a tiny bit and hacked something that would generate a graphical visualization of the I-P graphs (northd and ovn-controller). It seems to have some potential, I'll refine it and maybe post it at some point in the future.
<zhouhan> This is cool
<mkalcok> dceara: that sounds amazing.
<dceara> I also realized we do quite some unnecessary parsing for LB/NAT IPs that are advertised when dynamic routing is enabled so I started on a patch that improves that but it's not yet ready.
<dceara> That's it on my side, planning to do more reviews next week.
<imaximets> Thanks, dceara!
<dceara> imaximets, will you moderate the remainder of the meeting?
<imaximets> zhouhan, I'm not sure how the 25 seconds relate to outside-engine processing. My best guess is that there is a small amount of other changes outside of the one that got stuck, but we need to ask felixhuettner.
<imaximets> dceara, sure.
<imaximets> Who wants to go next?
<amusil> I can quickly
<imaximets> amusil, go ahead!
<felixhuettner> lets maybe continue the discussion after the others are done :)
<amusil> I have posted the ct-commit-all optimization/fix that was discussed before release
<amusil> I have also posted a fix to have the action name in the pinctrl dbg and some optimization for AS lflow processing
<amusil> That's about it, thanks
<amusil> Oh zhouhan would be nice if you could take a look at ct-commit-all patch
<zhouhan> amusil: I am following up with Alin on the HW offload test for the commit-all change. It seems the HW offload still doesn't work, and we are still debugging it.
zhouhan has quit [Quit: Client closed]
<imaximets> OK. Thanks, amusil!
zhouhan has joined #openvswitch
<zhouhan> sorry I was disconnected.
<zhouhan> amusil: did you see my last message?
<imaximets> I saw it. You're still debugging. :)
<zhouhan> That's it from me
<amusil> Yeah I saw it too, hmm strange let's see what is wrong then
<imaximets> OK. Who wants to go next?
<mkalcok> I can drop a quick update
<imaximets> mkalcok, sure.
<mkalcok> I wanna say a huge thank you to everyone that helped out with and reviewed the NAT/LB route advertisement series. dceara felixhuettner amusil
<mkalcok> This week I was catching up mostly on our downstream stuff, and I took a look at Felix's incremental processing of learned routes.
<mkalcok> Though that has been already mostly acked by dceara.
<mkalcok> I'll keep my eye out on the NAT/LB parsing improvement for review.
<mkalcok> that's it from me.
<dceara> mkalcok, the more reviews the better :)
<imaximets> Thanks, mkalcok !
<imaximets> Anyone else want to give an update?
<imaximets> If not, felixhuettner you wanted to clarify some things for zhouhan regarding ovn-controller and inc-engine?
<felixhuettner> yep, i can add some more details
<dceara> Sorry, I need to drop early, thanks everyone, bye!
<mkalcok> thanks all o/
mkalcok has quit [Quit: leaving]
<felixhuettner> so we see multiple things in combination here. On the one hand a large amount of small messages, maybe at 10/sec
<felixhuettner> additionally sometimes larger messages at a maybe 1/sec rate
<felixhuettner> in addition to that the logical_flow_output recomputes come from changes to non_vif_data
<felixhuettner> we assume that is related to bfd sessions going up and down to compute nodes
<felixhuettner> at least that correlates quite well to the logs
<felixhuettner> and in combination that gets quite ugly :)
<felixhuettner> we have seen this mostly resulting in traffic outages when live-migrating a VM
<imaximets> Ah, OK. So, you hit a recompute on non-vif-data changes. During recompute you accumulate a lot of idl updates and then it takes forever to process them in batches of 50. Is that right?
<felixhuettner> as then there was a high chance that we were stuck in a recompute and maybe after that have such a long queue of messages that we can't handle them all in one increment
<felixhuettner> yep
<imaximets> Ack, makes sense.
<imaximets> zhouhan, ^
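[Note: a back-of-the-envelope sketch of the backlog described above, using only the figures quoted in the discussion (about 10 updates per second, a recompute of about 25 seconds, updates handled in batches of 50). It is a simplification: it ignores updates that keep arriving while the backlog is being drained, and the function names are hypothetical.]

```python
import math


def backlog_after_recompute(update_rate_per_s: float, recompute_s: float) -> int:
    """Updates that pile up while the controller is busy recomputing."""
    return math.ceil(update_rate_per_s * recompute_s)


def iterations_to_drain(backlog: int, batch_size: int) -> int:
    """Main-loop iterations needed to work through the backlog."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(backlog / batch_size)


backlog = backlog_after_recompute(update_rate_per_s=10, recompute_s=25)
iterations = iterations_to_drain(backlog, batch_size=50)
print(f"backlog: {backlog} updates, iterations to drain: {iterations}")
```

[If any of those iterations triggers another full recompute, the backlog grows again before it is drained, which matches the "high chances to recompute again" remark earlier in the meeting.]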
<felixhuettner> probably making that non_vif_data change incremental would also help a lot, but we need to look into that further
<imaximets> Thanks, felixhuettner, for the explanation.
<imaximets> Yep.
<felixhuettner> thanks for all the suggestions
<imaximets> In general, it would be good for recomputes to not take so long too. :)
<felixhuettner> yep, but at least perf did not give us a nice signal this time
<felixhuettner> there is just a lot of small things happening there that take a lot of time in sum
<imaximets> Ack.
<imaximets> OK, I guess we can call it a meeting.
<felixhuettner> yep, sounds good
<imaximets> Thanks, everyone! See you next week.
<felixhuettner> thanks a lot
<imaximets> Bye!
<felixhuettner> bye
<amusil> Thanks, bye
amusil has quit [Quit: Client closed]
kuraudo has quit [Remote host closed the connection]
dceara has quit [Ping timeout: 248 seconds]
zhouhan has quit [Quit: Client closed]
mj2 has quit [Ping timeout: 268 seconds]
ChmEarl has joined #openvswitch
dceara has joined #openvswitch
dceara has quit [Quit: Leaving]
mmichelson has quit [Quit: Leaving]
froyo has quit [Ping timeout: 268 seconds]