ChanServ changed the topic of #openvswitch to: Open vSwitch, a Linux Foundation Collaborative Project || FAQ: http://docs.openvswitch.org/en/latest/faq/ || OVN meeting Thurs 9:15 am US Pacific || Use ovs-discuss@openvswitch.org for questions if you don't get an answer here. || Channel logs can be found at https://libera.irclog.whitequark.org/openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
ArtGravity has quit []
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
Flows has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
froyo has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
elvira2 has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
kuraudo has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
Flows has quit [Remote host closed the connection]
Flows has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
Flows has quit [Ping timeout: 240 seconds]
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
dmitriis4 has quit [Quit: Ping timeout (120 seconds)]
dmitriis4 has joined #openvswitch
tryauuum has quit [Ping timeout: 268 seconds]
tryauuum has joined #openvswitch
mestery8 has joined #openvswitch
mestery has quit [Ping timeout: 255 seconds]
mestery8 is now known as mestery
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
kuraudo has quit [Ping timeout: 260 seconds]
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
kuraudo has joined #openvswitch
dceara has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
elvira2 has quit [Ping timeout: 260 seconds]
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
amusil has joined #openvswitch
froyo has quit [Ping timeout: 272 seconds]
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
mmichelson has joined #openvswitch
mkalcok has joined #openvswitch
<mmichelson> Hi everyone, it's time to begin the weekly upstream OVN developers' meeting.
<imaximets> o/
<fnordahl> o/
<amusil> Hi
<mkalcok> o/
<mmichelson> I can do the first update. The only thing I have to report is having done some reviews. That's all for me. Who's next?
<mkalcok> I can give a quick one
<mkalcok> I'm back after vacation. Thanks dcaera and amusil for reviewing my v2 Direct SNAT access patch. I'm working on v3 based on dcaera's patch that removes old ct_commit_v1 action.
zhouhan has joined #openvswitch
<mkalcok> that's all from me.
<mmichelson> Thanks mkalcok. Who would like to go next?
<mkalcok> (sorry for messing up your name dceara)
<imaximets> May I?
<mmichelson> go ahead imaximets
<imaximets> On the OVS side we renamed the master branch to main, which triggered some patches for OVN-related projects and OVN itself.
<imaximets> I'm also working on a small fix for RAFT join in the case of a large database. Will likely post later today or tomorrow.
<fnordahl> yay for inclusive naming!
<hexa-> I followed https://linuxcontainers.org/incus/docs/main/howto/network_ovn_setup/#set-up-an-incus-cluster-on-ovn to set up an OVN cluster and expected all nodes to have geneve tunnels amongst each other, but I'm only seeing them on all but the first node. https://gist.github.com/mweinelt/f0738d5359ea3cb4913cfbc69da60ce7
<imaximets> And I'll be off next week, so trying to wrap up some stuff before that. Hope to review Ales' I-P change for address sets.
<imaximets> That's all from me.
amusil19 has joined #openvswitch
<mmichelson> Thanks imaximets.
<mmichelson> hexa-, hi thanks for the question. At the moment, we're having our weekly scheduled upstream development progress meeting. If you stick around until it's over, I can answer your question for you, but for now we're going to continue.
<mmichelson> Who would like to give the next update?
<fnordahl> I have a quick update
<fnordahl> Posted a small patch for an OVS test and Python 3.13 compat, thanks for review/merge imaximets. Other than that I've been doing some preparation for the next OVN A/V community meeting by inviting some folks to discuss exploring tighter integration with BGP for OVN
<fnordahl> Investigating a report of an issue with load balancer VIPs in OVN 22.03. Bisecting has so far hinted at it being fixed in 22.09, so I'm hunting for the specific patch; might post a backport once I find it
<fnordahl> That's it for me
<imaximets> fnordahl, np!
amusil has quit [Ping timeout: 250 seconds]
<imaximets> fnordahl, btw, do you have plans on reviving the debian packaging clean up patch-set for OVN?
<hexa-> mmichelson: sure, thanks!
<fnordahl> imaximets: yes, I should definitely be doing that, it just has not climbed to the top of the priority list yet. Would be good to get that cleaned up though, so I will try to set aside some time for it soon, thanks for the reminder!
<mmichelson> OK, who wants to go next?
<imaximets> fnordahl, thanks!
amusil19 is now known as amusil
<mmichelson> amusil, did you want to give an update? I saw you say "Hi" when the meeting started.
<amusil> I don't have anything special, just listening
<mmichelson> ah ok, I just didn't want to end the meeting if you had planned to give an update :)
<mmichelson> OK, I guess that's everyone then. Thanks, and have a good day.
<mkalcok> thanks, bye o/
<imaximets> Thanks! Bye.
<fnordahl> \o cheers
<mmichelson> hexa-, quick question, but is ovn-controller running on the "incus1" node?
<hexa-> yes
<mmichelson> hexa-, OK, just had to try the low-hanging fruit question :)
<hexa-> sure :)
<hexa-> restarting it did nothing fwiw
<mmichelson> A quick caveat: I am not familiar with incus at all.
<mmichelson> I'm guessing it's doing some "magic" under the hood to create the OVS ports on each node, but I'm not familiar with which incus commands translate to the creation of the OVS ports.
<hexa-> I have not yet configured it with incus
<hexa-> it should be just plain OVN at this point
<mmichelson> hexa-, oh, the link you posted talks about using incus to set up the cluster.
<mmichelson> The one thing that jumps out to me is this instruction:
<mmichelson> Create an Incus cluster by running incus admin init on all machines. On the first machine, create the cluster. Then join the other machines with tokens by running incus cluster add <machine_name> on the first machine and specifying the token when initializing Incus on the other machine.
amusil15 has joined #openvswitch
<hexa-> https://paste.lossy.network/6M looks like this, unrelated to ovn clustering
<mmichelson> Ah
<hexa-> > On the first machine, create and configure the uplink network:
<hexa-> this step I have not yet done
<hexa-> given that the ovs-vsctl show result looked off
amusil has quit [Ping timeout: 250 seconds]
<mmichelson> I was wondering if the `incus cluster add` command had something to do with it. The instructions make it sound like you run that command on the first machine, then the other machines need to have the token when initializing incus on the other machines. Is there a missing step where the first machine also needs to specify the token somewhere so that it is included in the cluster?
<hexa-> AIUI the incus clustering is entirely unrelated to ovn clustering, but incus can configure networks on an ovn cluster
<hexa-> the incus cluster add is just exchanging hosts/ports/secrets to get the intra-cluster communication of incus going
<hexa-> at this point incus has not interacted with ovn
<mmichelson> OK, got it. So on the incus2 and incus3 machines, did you ever explicitly execute an `ovs-vsctl add-port` command to add ports to br-int ?
<hexa-> no
<mmichelson> OK, I just realized that was a silly question. The only ports on br-int on those nodes are the geneve ports that would have been added by ovn-controller.
<mmichelson> Oh I just noticed something
<hexa-> oh, I see errors in the controller log on incus1
<mmichelson> On both incus1 and incus2 you have the same db-nb-cluster-local-addr and db-sb-cluster-local-addr
<mmichelson> In the OVN_CTL_OPTS
djhankb has quit [Remote host closed the connection]
<mmichelson> What errors do you see in the controller log?
djhankb has joined #openvswitch
<hexa-> sorry, that seems to be a c/p mistake
<hexa-> in the pad
<hexa-> this is what I see in the log
<mmichelson> OK, so the problem is that ovn-controller is trying to create the Encap record in the southbound database, but it can't. When it tries to insert the record, there's already an existing record in the DB that has the same values in it.
<mmichelson> I'm guessing that 82.195.93.133 is the local address for one of the nodes, but it appears this value got copied to a second node as well, maybe?
<hexa-> 133 is the one for incus1 where the tunnels are missing
<hexa-> hm, it's templated, but not unlikely
<hexa-> I can throw it away and roll it out again
<mmichelson> Interestingly, I see that incus2 and incus3 both in the `ovs-vsctl get open_vswitch . external_ids` output show the same ovn-encap-ip value. On incus2, it matches the local IP, but on incus3, it appears to have incus2's IP there instead.
<mmichelson> Oh and the hostname on incus3's output also says incus2.karo.tu.da.man-da.net, the same as incus2.
<mmichelson> And they have the same system-id.
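The root cause found here was per-node identity settings (system-id, ovn-encap-ip, hostname) copied from one node to another, which made ovn-controller's Encap insert collide with an existing southbound record. A minimal Python sketch of a pre-flight check for that kind of copy/paste mistake follows. It assumes you have already collected each node's external_ids (for example from `ovs-vsctl get open_vswitch . external_ids`) into plain dicts; the node names, IPs and IDs are made up, and `find_duplicates` is a hypothetical helper, not part of OVS or OVN.

```python
from collections import defaultdict

# Keys that the discussion found duplicated between nodes. system-id and
# ovn-encap-ip must differ per chassis; a repeated hostname is a strong hint
# that a whole config block was copied.
UNIQUE_KEYS = ("system-id", "ovn-encap-ip", "hostname")


def find_duplicates(nodes):
    """Return {key: {value: [node, ...]}} for values shared by 2+ nodes."""
    dupes = {}
    for key in UNIQUE_KEYS:
        seen = defaultdict(list)
        for node, ids in nodes.items():
            if key in ids:
                seen[ids[key]].append(node)
        shared = {value: names for value, names in seen.items() if len(names) > 1}
        if shared:
            dupes[key] = shared
    return dupes


if __name__ == "__main__":
    # Hypothetical external_ids per node; node3 reuses node2's identity.
    nodes = {
        "node1": {"system-id": "id-1", "ovn-encap-ip": "192.0.2.1", "hostname": "node1"},
        "node2": {"system-id": "id-2", "ovn-encap-ip": "192.0.2.2", "hostname": "node2"},
        "node3": {"system-id": "id-2", "ovn-encap-ip": "192.0.2.2", "hostname": "node2"},
    }
    for key, shared in find_duplicates(nodes).items():
        for value, names in shared.items():
            print(f"{key}={value} is shared by {', '.join(names)}")
```

Running a check like this before starting ovn-controller on templated hosts would flag node2 and node3 as sharing all three values.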
<hexa-> ok, rolled it out again and all tunnels are in place
<hexa-> whew
<hexa-> I think I hit a few bad copy/pastes in the pad :(
<hexa-> oh and ovn-controller seems to be crashing easily on ubuntu 22.04
<hexa-> should probably get some debug symbols
<mmichelson> Yeah, I can't see much from the crash without the debug symbols. But also, yikes!
<amusil15> Which OVN version is it? A lot of CI is running on 22.04 without crashing that early
dceara has quit [Ping timeout: 256 seconds]
<hexa-> 22.03.3-0ubuntu0.22.04.3
<hexa-> so as dated as you'd expect
<amusil15> It's LTS so not really dated
<hexa-> oh, ok
<mmichelson> Well, not the current LTS ;)
<amusil15> Right but not that old :)
<mmichelson> But yeah it's still getting critical fixes. And if there is a crash that would qualify
<amusil15> And the last CI run on 22.03 was a month ago, but actually on Ubuntu 20.04, so we might be onto something
<hexa-> this is not a production setup, we'll be going for 24.04 soon enough
<hexa-> hm, I installed the ovn-central-dbgsyms package, but the trace looks as useless as before
<mmichelson> Ah you probably need the ovn-host debug symbols since ovn-controller is part of the ovn-host package.
amusil15 is now known as amusil
<hexa-> here we go
<mmichelson> OK, I'm having a look. Just don't want you to think I disappeared :)
<hexa-> I wasn't worried, there seemed to be (morbid?) curiosity at least :)
<hexa-> can provide more data, as needed
<mmichelson> So the thing is, we call sbrec_sb_global_first() in order to try to get the first record from the southbound SB_Global table. However, that's returning NULL. So when the tunnel_add() function is called, we dereference the NULL pointer and crash with a SIGSEGV.
<mmichelson> Now the question is *why* is that returning NULL?
<hexa-> I
<mmichelson> There is clearly an inbuilt assumption that this will be non-NULL.
<amusil> This can happen when there is no connection to SB
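The crash described above is a NULL dereference: sbrec_sb_global_first() returned NULL (for example with no usable SB connection) and tunnel_add() used the result anyway. The sketch below is a hypothetical Python analogue of the defensive pattern, not ovn-controller code: `first_row` stands in for the "first record" lookup, and `plan_tunnels` skips the work and lets the caller retry when the row is missing. Modelling the row as a dict with an `ipsec` flag is an assumption, loosely mirroring SB_Global's ipsec column.

```python
from typing import List, Optional, Tuple


def first_row(table: List[dict]) -> Optional[dict]:
    """Return the first row of a table, or None if it is empty."""
    return table[0] if table else None


def plan_tunnels(sb_global_table: List[dict],
                 remote_ips: List[str]) -> Optional[List[Tuple[str, bool]]]:
    """Return (remote_ip, ipsec) pairs, or None if SB_Global is missing.

    None tells the caller to skip tunnel setup on this pass and try again
    later, instead of crashing on a missing row.
    """
    sb_global = first_row(sb_global_table)
    if sb_global is None:
        # E.g. the southbound DB is not connected or not synced yet.
        return None
    # Assumption: the row carries an "ipsec" flag that affects tunnel setup.
    ipsec = bool(sb_global.get("ipsec", False))
    return [(ip, ipsec) for ip in remote_ips]


if __name__ == "__main__":
    print(plan_tunnels([], ["192.0.2.2"]))                  # None
    print(plan_tunnels([{"ipsec": False}], ["192.0.2.2"]))  # [('192.0.2.2', False)]
```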
<hexa-> I'm basically setting up these machines in parallel, and it is always one that crashes
<hexa-> or two :)
<amusil> Is SB up and running at that point in time?
<amusil> Is there a clear connection from all of those machines, etc.?
<hexa-> this is directly after install with prepopulated OVN_CTL_OPTS
djhankb has quit [Read error: Connection reset by peer]
djhankb has joined #openvswitch
<amusil> Is there anything in the ovn-controller.log? There should be something along the lines of:
<amusil> |INFO|unix:/workspace/ovn/tests/testsuite.dir/473/hv1/db.sock: connecting...
<amusil> |INFO|unix:/workspace/ovn/tests/testsuite.dir/473/hv1/db.sock: connected
<amusil> In your case probably with SSL/TCP instead
<amusil> I mean we shouldn't crash even if we lose SB connection
mkalcok has quit [Quit: leaving]
<amusil> Interestingly, there is a successful connection:
<amusil> 2024-04-11T17:09:43.656Z|00006|reconnect|INFO|tcp:82.195.93.134:6642: connecting...
<amusil> 2024-04-11T17:09:43.656Z|00007|main|INFO|OVNSB IDL reconnected, force recompute.
<amusil> 2024-04-11T17:09:43.656Z|00008|reconnect|INFO|tcp:82.195.93.134:6642: connected
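Checking whether ovn-controller actually reached the southbound DB, as amusil suggests, comes down to finding the "connecting..." / "connected" reconnect lines for the SB remote. A small Python sketch of that check follows. It assumes the `timestamp|seq|module|LEVEL|message` layout visible in the log excerpts above; `reconnect_events` and `last_state` are hypothetical helper names.

```python
def reconnect_events(lines):
    """Yield (timestamp, remote, state) for 'reconnect' module log lines."""
    for line in lines:
        parts = line.rstrip("\n").split("|", 4)
        if len(parts) != 5 or parts[2] != "reconnect":
            continue  # not a reconnect message, e.g. the "main" module
        timestamp, _seq, _module, _level, message = parts
        # "tcp:82.195.93.134:6642: connecting..." -> remote, state
        remote, sep, state = message.rpartition(": ")
        if sep:
            yield timestamp, remote, state.rstrip(".")


def last_state(lines, remote):
    """Return the most recent reconnect state logged for `remote`, or None."""
    state = None
    for _ts, r, s in reconnect_events(lines):
        if r == remote:
            state = s
    return state


if __name__ == "__main__":
    log = [
        "2024-04-11T17:09:43.656Z|00006|reconnect|INFO|tcp:82.195.93.134:6642: connecting...",
        "2024-04-11T17:09:43.656Z|00007|main|INFO|OVNSB IDL reconnected, force recompute.",
        "2024-04-11T17:09:43.656Z|00008|reconnect|INFO|tcp:82.195.93.134:6642: connected",
    ]
    print(last_state(log, "tcp:82.195.93.134:6642"))  # connected
```

A "connected" state alone does not prove the SB_Global row had been received yet, which is consistent with the puzzle in this thread.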
<amusil> How do you start the SB server?
<mmichelson> amusil, according to the initial linked page: https://linuxcontainers.org/incus/docs/main/howto/network_ovn_setup/#set-up-an-incus-cluster-on-ovn , it's started using systemd `systemctl start ovn-central`
<amusil> That should take care of proper initialization of the DB so I'm not sure why the global table would be missing
<mmichelson> It's also weird that only one or two nodes crash. If it were a problem with the DB, I'd expect them all to crash.
<mmichelson> Or rather, I'd never expect them to crash.
<mmichelson> But if there is a crash bug related to the DB, then all ovn-controllers would hit it, I'd suspect.
<mmichelson> I need to drop for a while. I'll still be logged in here so I can check on this convo more when I get back.
<amusil> I need to drop completely I'll check the logs tomorrow
amusil has quit [Quit: Client closed]
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
dmitriis4 has quit [Remote host closed the connection]
dmitriis4 has joined #openvswitch
dmitriis4 has quit [Remote host closed the connection]
dmitriis4 has joined #openvswitch
zhouhan has quit [Quit: Client closed]
dceara has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
dceara has quit [Ping timeout: 240 seconds]
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
mmichelson has quit [Quit: Leaving]
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
kuraudo has quit [Remote host closed the connection]
ihrachys_ has quit [Ping timeout: 268 seconds]
ihrachys has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
adamcstephens has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
djhankb has quit [Remote host closed the connection]
djhankb has joined #openvswitch
dmitriis4 has quit [Remote host closed the connection]
dmitriis4 has joined #openvswitch