Catherine[m] changed the topic of #glasgow to: digital interface explorer · code https://github.com/GlasgowEmbedded/glasgow · logs https://libera.irclog.whitequark.org/glasgow · Matrix #glasgow-interface-explorer:matrix.org · discord https://1bitsquared.com/pages/chat
redstarcomrade has joined #glasgow
redstarcomrade has quit [Changing host]
redstarcomrade has joined #glasgow
cakes_ has joined #glasgow
cakes has quit [Ping timeout: 246 seconds]
joerg has quit [Ping timeout: 248 seconds]
joerg has joined #glasgow
icb has quit [Ping timeout: 260 seconds]
icb has joined #glasgow
redstarcomrade has quit [Read error: Connection reset by peer]
Guest70 has joined #glasgow
Guest70 has quit [Client Quit]
RaYmAn has quit [Ping timeout: 246 seconds]
ewenmcneill[m] has joined #glasgow
<ewenmcneill[m]> Re my earlier (and ongoing) trouble reaching the whitequark.org IRC logs, only via IPv4, I've confirmed HTTP (TCP/80) *does* work, as does IPv6 HTTPS (TCP/443). Only IPv4 HTTPS (TCP/443) appears affected, and it appears part of the TCP stream never arrives. IPv4 HTTPS worked fine until ~7-10 days ago, so it's weird.
<ewenmcneill[m]> From a packet capture it appears TLS negotiation stalls because two TCP frames never arrive -- seq=1 (empty ACK) arrives, then seq=2897 arrives (with encrypted payload), but the 2 * 1448 data frames in between do not arrive. Then there are some dupe-ACKs, and eventually I give up. Symptoms now seem repeatable. And FTR I've actually rebooted cable modem, ISP "home gateway" and relevant laptop all in the last week (to fix other issues).
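The sequence numbers quoted above can be checked with a little arithmetic. This is a minimal sketch, not taken from the capture itself: it assumes a 1500-byte IP MTU, 20-byte IPv4 and TCP headers with no other options, and that the 1448-byte payload comes from the 12-byte TCP timestamp option being in use. The function names are hypothetical.

```python
IP_HEADER = 20             # IPv4 header without options
TCP_HEADER = 20            # TCP header without options
TCP_TIMESTAMP_OPTION = 12  # assumption: 10-byte timestamp option padded to 12

def payload_per_segment(mtu, timestamps=True):
    """Largest TCP payload that fits in one IP packet of size `mtu`."""
    overhead = IP_HEADER + TCP_HEADER
    if timestamps:
        overhead += TCP_TIMESTAMP_OPTION
    return mtu - overhead

def missing_segments(first_seq, next_seen_seq, payload):
    """Inclusive (first, last) sequence ranges of full-size segments that
    should have arrived between first_seq and next_seen_seq."""
    return [(seq, seq + payload - 1)
            for seq in range(first_seq, next_seen_seq, payload)]

payload = payload_per_segment(1500)
gaps = missing_segments(1, 2897, payload)
print(payload, gaps)
```

Under those assumptions the gap between seq=1 and seq=2897 is exactly two full-size segments, which matches the "2 * 1448" in the message.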
<Xesxen> Is your provider doing anything funky like NAT64? I've had similar issues where the MTU wasn't matching causing connections to "time out" in a similar manner
<ewenmcneill[m]> whitequark: FYI, there's a path MTU issue. The whitequark.org server stops responding to packets larger than 1496 octets over IPv4, and it doesn't seem to affect the IPv6 path for some reason.
<ewenmcneill[m]> (Based on "off by 4 bytes" my gut feeling is that it's a VLAN tag overhead, or MPLS tag overhead for some reason. But it doesn't seem to be local to me, as I can get 1500 byte MTU through to other sites.)
<ewenmcneill[m]> And FTR, no, my ISP isn't doing NAT64 on this connection. I have native IPv4 and native IPv6 here, and no PPPoE (it's a cable modem presenting Ethernet).
<ewenmcneill[m]> FTR, I can make whitequark.org work again over IPv4 by using iptables to clamp MSS to ensure the TCP frames generated don't get too large.
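A rough sketch of how such a clamp could be derived from the 1496-octet limit reported earlier. The exact rule used was not given in the channel, so the chain, match options, and helper names below are assumptions; `TCPMSS --set-mss` is the standard iptables target for rewriting the MSS option on SYN packets.

```python
def clamped_mss(path_mtu, ip_header=20, tcp_header=20):
    """MSS that keeps every TCP segment within path_mtu (IPv4, no IP options)."""
    return path_mtu - ip_header - tcp_header

def tcpmss_rule(mss, chain="OUTPUT"):
    """Build an iptables command line (assumed form, not the rule actually used)
    that rewrites the MSS advertised in outgoing SYN packets."""
    return (f"iptables -t mangle -A {chain} -p tcp "
            f"--tcp-flags SYN,RST SYN -j TCPMSS --set-mss {mss}")

mss = clamped_mss(1496)
rule = tcpmss_rule(mss)
print(rule)
```

Clamping the MSS in the client's own outgoing SYN lowers what the server believes the client can receive, so the server sends smaller segments. That is why a rule on the local machine can work around a problem on the return path.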
<ewenmcneill[m]> Interestingly, the path MTU issue over IPv4 doesn't seem to be at the whitequark.org end either: from my colo box (same city, different ISP) full 1500 octet frames make it end to end. So apparently something *near* my local connection is doing something *selectively weird* with IPv4 to that IP or /24.
ar-jan has joined #glasgow
rogandawes[m] has joined #glasgow
<rogandawes[m]> tcptraceroute with large packets, perhaps?
<ewenmcneill[m]> Thanks Rogan, that's an interesting suggestion. My attempt right now seems to be hitting MSS clamping (even after I removed that iptables rule), so it's inconclusive, but I'll try to remember to try that again later (eg, when I've had time to reboot/otherwise expire MSS clamping)
Eli2| has joined #glasgow
Eli2_ has quit [Ping timeout: 246 seconds]
<whitequark[cis]> ewenmcneill: bleh, I see
<attiegrande[m]> I'm not going to be about for the meeting this week either unfortunately
<whitequark[cis]> ewenmcneill: let me just reboot the server to begin with
_whitelogger has joined #glasgow
<rogandawes[m]> Therapeutic reboots 🙂
bvernoux has joined #glasgow
brolin has joined #glasgow
brolin has quit [Quit: leaving]
brolin has joined #glasgow
brolin has quit [Quit: leaving]
brolin has joined #glasgow
brolin has quit [Ping timeout: 245 seconds]
brolin has joined #glasgow
brolin has quit [Ping timeout: 260 seconds]
bvernoux_ has joined #glasgow
bvernoux has quit [Ping timeout: 260 seconds]
brolin has joined #glasgow
brolin has quit [Ping timeout: 260 seconds]
brolin has joined #glasgow
bvernoux_ has quit [Read error: Connection reset by peer]
brolin has quit [Ping timeout: 246 seconds]
brolin has joined #glasgow
brolin has quit [Ping timeout: 258 seconds]
brolin has joined #glasgow
brolin has quit [Ping timeout: 245 seconds]
brolin has joined #glasgow
brolin has quit [Ping timeout: 260 seconds]
brolin has joined #glasgow
brolin has quit [Ping timeout: 252 seconds]
brolin has joined #glasgow
brolin has quit [Ping timeout: 256 seconds]
brolin has joined #glasgow
duderonomy has quit [Quit: Textual IRC Client: www.textualapp.com]
brolin has quit [Ping timeout: 246 seconds]
<ewenmcneill[m]> whitequark: Thanks for trying. Rebooting your server didn't make a difference (same symptoms). I also rebooted the laptop I was testing from to clear the MSS clamping state, and tested again with tcptraceroute as suggested by Rogan. As best I can tell the MTU mismatch is in the *return* path (ie your hosting provider to me), as tcptraceroute can successfully send full size (1514 ethernet, 1500 IP, 1460 TCP) "SYN" packets and get SYN/ACK answers back.
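The three sizes in that parenthetical are the same packet measured at different layers. A minimal sketch of the relationship, assuming an untagged Ethernet header of 14 bytes (no FCS counted) and option-free 20-byte IPv4 and TCP headers; `frame_sizes` is a hypothetical helper, and the 4-byte tag case only illustrates the earlier VLAN/MPLS hunch rather than confirming it.

```python
ETHERNET_HEADER = 14  # dst MAC + src MAC + EtherType, no 802.1Q tag, no FCS
IP_HEADER = 20        # IPv4 header without options
TCP_HEADER = 20       # TCP header without options

def frame_sizes(ip_mtu):
    """Sizes of one full-size packet as seen at each layer."""
    return {
        "ethernet": ip_mtu + ETHERNET_HEADER,
        "ip": ip_mtu,
        "tcp": ip_mtu - IP_HEADER - TCP_HEADER,
    }

full = frame_sizes(1500)
# If some hop spends 4 bytes of a 1500-byte budget on a tag or label,
# only 1496 bytes remain for the IP packet.
tagged = frame_sizes(1500 - 4)
print(full, tagged)
```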
<ewenmcneill[m]> Given that electric_eel noted earlier that your hosting provider closed down their public looking glass/debug tool, it's difficult to figure out the return path, so at this point I'm concluding "I know *how* it's being caused, but not *exactly where* it's being caused" -- and it doesn't seem to be either end. And having found it's a Path MTU issue, I know how to work around that (with MSS clamping, to avoid TCP ever generating "full size" packets).
duderonomy has joined #glasgow