#glasgow on 2023-08-15 — irc logs at libera.irclog.whitequark.org

2023-07-13 14:05 Catherine[m] changed the topic of #glasgow to: digital interface explorer · code https://github.com/GlasgowEmbedded/glasgow · logs https://libera.irclog.whitequark.org/glasgow · Matrix #glasgow-interface-explorer:matrix.org · discord https://1bitsquared.com/pages/chat

01:46 redstarcomrade has joined #glasgow

01:46 redstarcomrade has quit [Changing host]

01:46 redstarcomrade has joined #glasgow

03:13 cakes_ has joined #glasgow

03:15 cakes has quit [Ping timeout: 246 seconds]

04:10 joerg has quit [Ping timeout: 248 seconds]

04:11 joerg has joined #glasgow

06:34 icb has quit [Ping timeout: 260 seconds]

06:35 icb has joined #glasgow

06:48 redstarcomrade has quit [Read error: Connection reset by peer]

07:24 Guest70 has joined #glasgow

07:25 Guest70 has quit [Client Quit]

08:19 RaYmAn has quit [Ping timeout: 246 seconds]

08:57 ewenmcneill[m] has joined #glasgow

08:57 * ewenmcneill[m] uploaded an image: (193KiB) < https://catircservices.org/_matrix/media/v3/download/catircservices.org/oykhocdtlZeJsqnLmVRWbvnS/whitequark-pcap.png >

08:57 <ewenmcneill[m]> Re my earlier (and ongoing) trouble reaching the whitequark.org IRC logs, only via IPv4, I've confirmed HTTP (TCP/80) *does* work, as does IPv6 HTTPS (TCP/443). Only IPv4 HTTPS (TCP/443) appears affected, and it appears part of the TCP stream never arrives. IPv4 HTTPS worked fine until ~7-10 days ago, so it's weird.

08:58 <ewenmcneill[m]> From a packet capture it appears TLS negotiation stalls because two TCP frames never arrive -- seq=1 (empty ACK) arrives, then seq=2897 arrives (with encrypted payload), but the 2 * 1448 data frames in between do not arrive. Then there are some dupe-ACKs, and eventually I give up. Symptoms now seem repeatable. And FTR I've actually rebooted cable modem, ISP "home gateway" and relevant laptop all in the last week (to fix other issues).
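
[Editor's note] The sequence numbers quoted above are consistent with two full-size segments going missing. A minimal sketch of the arithmetic, assuming a 1500-byte IPv4 MTU, 20-byte IPv4 and TCP headers, and a 12-byte TCP timestamp option (the option size is an assumption, not something stated in the capture; the function names are hypothetical):

```python
# Assumed header sizes; none of these values are read from the actual capture.
IPV4_HEADER = 20
TCP_HEADER = 20
TCP_TIMESTAMP_OPTION = 12  # 10-byte option padded to 12 bytes


def segment_payload(mtu, tcp_options=TCP_TIMESTAMP_OPTION):
    """Largest TCP payload that fits in one IPv4 packet of size `mtu`."""
    return mtu - IPV4_HEADER - TCP_HEADER - tcp_options


def next_seq(first_seq, segments, payload):
    """Relative sequence number that follows `segments` full-size segments."""
    return first_seq + segments * payload


payload = segment_payload(1500)
print(payload)                  # 1448
print(next_seq(1, 2, payload))  # 2897
```

Under those assumptions, the two missing 1448-byte frames cover relative sequence numbers 1 through 2896, and the next segment starts at 2897, matching what was reported.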

09:06 <Xesxen> Is your provider doing anything funky like NAT64? I've had similar issues where the MTU wasn't matching, causing connections to "time out" in a similar manner

09:08 * ewenmcneill[m] uploaded an image: (81KiB) < https://catircservices.org/_matrix/media/v3/download/catircservices.org/bPRZIslSnekXNLJQIPnYoVcI/whitequark-ping.png >

09:08 <ewenmcneill[m]> whitequark: FYI, there's a path MTU issue. whitequark.org server stops responding beyond 1496 octet packets, over IPv4. And it doesn't seem to affect the IPv6 path for some reason.

09:09 <ewenmcneill[m]> (Based on "off by 4 bytes" my gut feeling is that it's a VLAN tag overhead, or MPLS tag overhead for some reason. But it doesn't seem to be local to me, as I can get 1500 byte MTU through to other sites.)

09:10 <ewenmcneill[m]> And FTR, no, my ISP isn't doing NAT64 on this connection. I have native IPv4 and native IPv6 here, and no PPPoE (it's a cable modem presenting Ethernet).

09:16 <ewenmcneill[m]> FTR, I can make whitequark.org work again over IPv4 by using iptables to clamp MSS to ensure the TCP frames generated don't get too large.
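
[Editor's note] A rough sketch of the numbers behind this workaround, assuming the 1496 figure above is the largest IPv4 packet that gets through and that headers carry no options. The 4-byte encapsulation overhead is only the guess voiced in the chat, and all function names here are hypothetical:

```python
# Assumed fixed header sizes (no IP or TCP options).
IPV4_HEADER = 20
TCP_HEADER = 20
ICMP_HEADER = 8


def mtu_after_encapsulation(link_mtu, overhead):
    """IP MTU left once an encapsulation tag of `overhead` bytes is added."""
    return link_mtu - overhead


def clamped_mss(path_mtu):
    """Largest TCP MSS whose packets still fit within `path_mtu`."""
    return path_mtu - IPV4_HEADER - TCP_HEADER


def max_ping_payload(path_mtu):
    """Largest ICMP echo payload that fits within `path_mtu`."""
    return path_mtu - IPV4_HEADER - ICMP_HEADER


path_mtu = mtu_after_encapsulation(1500, 4)  # 4 bytes: the guessed tag size
print(path_mtu)                    # 1496
print(clamped_mss(path_mtu))       # 1456
print(max_ping_payload(path_mtu))  # 1468
```

Clamping the MSS to 1456 or lower would keep every TCP packet within a 1496-byte path MTU, which is why the connection works again once the clamp is in place.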

09:35 * ewenmcneill[m] uploaded an image: (56KiB) < https://catircservices.org/_matrix/media/v3/download/catircservices.org/KVymNPmYFFLtubNtLjLppERW/whitequark-ping-colo.png >

09:35 <ewenmcneill[m]> Interestingly, the path MTU issue over IPv4 doesn't seem to be at the whitequark.org end either: from my colo box (same city, different ISP) full 1500 octet frames make it end to end. So apparently something *near* my local connection is doing something *selectively weird* with IPv4 to that IP or /24.

09:39 ar-jan has joined #glasgow

10:01 rogandawes[m] has joined #glasgow

10:01 <rogandawes[m]> tcptraceroute with large packets, perhaps?

10:18 <ewenmcneill[m]> Thanks Rogan, that's an interesting suggestion. My attempt right now seems to be hitting MSS clamping (even after I removed that iptables rule), so it's inconclusive, but I'll try to remember to try that again later (eg, when I've had time to reboot/otherwise expire MSS clamping)

10:28 Eli2| has joined #glasgow

10:32 Eli2_ has quit [Ping timeout: 246 seconds]

13:12 <whitequark[cis]> ewenmcneill: bleh, I see

13:19 <attiegrande[m]> I'm not going to be about for the meeting this week either unfortunately

13:20 <whitequark[cis]> ewenmcneill: let me just reboot the server to begin with

13:22 _whitelogger has joined #glasgow

13:32 <rogandawes[m]> Therapeutic reboots 🙂

14:03 bvernoux has joined #glasgow

15:26 brolin has joined #glasgow

15:27 brolin has quit [Quit: leaving]

15:27 brolin has joined #glasgow

15:33 brolin has quit [Quit: leaving]

15:33 brolin has joined #glasgow

17:08 brolin has quit [Ping timeout: 245 seconds]

17:14 brolin has joined #glasgow

17:49 brolin has quit [Ping timeout: 260 seconds]

17:58 bvernoux_ has joined #glasgow

18:01 bvernoux has quit [Ping timeout: 260 seconds]

18:12 brolin has joined #glasgow

18:26 brolin has quit [Ping timeout: 260 seconds]

18:34 brolin has joined #glasgow

18:37 bvernoux_ has quit [Read error: Connection reset by peer]

18:38 brolin has quit [Ping timeout: 246 seconds]

19:07 brolin has joined #glasgow

19:12 brolin has quit [Ping timeout: 258 seconds]

19:42 brolin has joined #glasgow

19:47 brolin has quit [Ping timeout: 245 seconds]

19:53 brolin has joined #glasgow

20:09 brolin has quit [Ping timeout: 260 seconds]

20:16 brolin has joined #glasgow

20:20 brolin has quit [Ping timeout: 252 seconds]

20:50 brolin has joined #glasgow

20:55 brolin has quit [Ping timeout: 256 seconds]

21:15 brolin has joined #glasgow

21:50 duderonomy has quit [Quit: Textual IRC Client: www.textualapp.com]

22:03 brolin has quit [Ping timeout: 246 seconds]

22:26 <ewenmcneill[m]> whitequark: Thanks for trying. Rebooting your server didn't make a difference (same symptoms). I also rebooted the laptop I was testing from to clear the MSS clamping state, and tested again with tcptraceroute as suggested by Rogan. As best I can tell the MTU mismatch is in the *return* path (ie your hosting provider to me), as tcptraceroute can successfully send full size (1514 Ethernet, 1500 IP, 1460 TCP) "SYN" packets and get SYN/ACK answers back.
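
[Editor's note] The three sizes quoted for the full-size probe describe the same packet at different layers. A small sketch, assuming an untagged 14-byte Ethernet header (frame check sequence not counted) and option-free 20-byte IPv4 and TCP headers; the helper name is hypothetical:

```python
# Assumed header sizes for an untagged Ethernet frame carrying IPv4/TCP.
ETHERNET_HEADER = 14
IPV4_HEADER = 20
TCP_HEADER = 20


def layer_sizes(ip_packet_size):
    """Sizes of one packet as seen at the Ethernet, IP, and TCP payload layers."""
    return {
        "ethernet": ip_packet_size + ETHERNET_HEADER,
        "ip": ip_packet_size,
        "tcp": ip_packet_size - IPV4_HEADER - TCP_HEADER,
    }


print(layer_sizes(1500))  # {'ethernet': 1514, 'ip': 1500, 'tcp': 1460}
```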

22:28 <ewenmcneill[m]> electric_eel noted earlier that your hosting provider closed down their public looking glass/debug tool, so it's difficult to figure out the return path. I think at this point I'm concluding "I know *how* it's being caused, but not *exactly where* it's being caused" -- and it doesn't seem to be either end. And having found it's a Path MTU issue, I know how to work around that (with MSS clamping, to avoid TCP ever generating "full size" packets).

22:57 duderonomy has joined #glasgow