<ewenmcneill[m]>
Re my earlier (and ongoing) trouble reaching the whitequark.org IRC logs, only via IPv4, I've confirmed HTTP (TCP/80) *does* work, as does IPv6 HTTPS (TCP/443). Only IPv4 HTTPS (TCP/443) appears affected, and it appears part of the TCP stream never arrives. IPv4 HTTPS worked fine until ~7-10 days ago, so it's weird.
<ewenmcneill[m]>
From a packet capture it appears the reason TLS negotiation stalls is because two TCP frames never arrive -- seq=1 (empty ACK) arrives, then seq=2897 arrives (with encrypted payload), but the 2 * 1448 data frames in between do not arrive. Then there's some dupe-ACKs, and eventually I give up. Symptoms now seem repeatable. And FTR I've actually rebooted cable modem, ISP "home gateway" and relevant laptop all in the last week (to fix other issues).
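The sequence-number gap described in the capture can be checked with a little arithmetic. This is only a sketch: the 12-byte TCP timestamp option is an assumption (the log gives the 1448 figure but not the options in use), and the function names are made up for illustration.

```python
# Header sizes for IPv4 and TCP without options. The 12-byte padded TCP
# timestamp option is an ASSUMPTION; the log only states the 1448 figure.
IPV4_HEADER = 20
TCP_HEADER = 20
TCP_TIMESTAMP_OPTION = 12


def tcp_payload_per_segment(mtu, tcp_options=TCP_TIMESTAMP_OPTION):
    """Payload bytes carried by one full-size TCP segment at this MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER - tcp_options


def missing_segments(first_missing_seq, next_seen_seq, segment_payload):
    """Return (whole segments, leftover bytes) in a sequence-number gap."""
    gap = next_seen_seq - first_missing_seq
    return divmod(gap, segment_payload)


payload = tcp_payload_per_segment(1500)
print(payload)                              # 1448
print(missing_segments(1, 2897, payload))   # (2, 0)
```

A gap of exactly two full-size segments with no remainder is consistent with only the largest packets being dropped.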
<Xesxen>
Is your provider doing anything funky like NAT64? I've had similar issues where the MTU wasn't matching causing connections to "time out" in a similar manner
<ewenmcneill[m]>
whitequark: FYI, there's a path MTU issue. whitequark.org server stops responding beyond 1496 octet packets, over IPv4. And it doesn't seem to affect the IPv6 path for some reason.
<ewenmcneill[m]>
(Based on "off by 4 bytes" my gut feeling is that it's a VLAN tag overhead, or MPLS tag overhead for some reason. But it doesn't seem to be local to me, as I can get 1500 byte MTU through to other sites.)
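One common way to probe for a limit like this is to send don't-fragment pings of a chosen size (for example `ping -M do -s <size>` with Linux iputils). The log does not say which tool was used, so treat that as an assumption; the sketch below only works out the ICMP payload size that corresponds to a given IP packet size.

```python
# IPv4 header (no options) and ICMP echo header sizes.
IPV4_HEADER = 20
ICMP_HEADER = 8


def icmp_payload_for_mtu(mtu):
    """ICMP echo payload size that produces an IP packet of exactly `mtu` bytes."""
    return mtu - IPV4_HEADER - ICMP_HEADER


# 1496 is the size reported as working in the log; 1500 is a normal Ethernet MTU.
for mtu in (1496, 1500):
    print(mtu, icmp_payload_for_mtu(mtu))
# 1496 1468
# 1500 1472
```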
<ewenmcneill[m]>
And FTR, no, my ISP isn't doing NAT64 on this connection. I have native IPv4 and native IPv6 here, and no PPPoE (it's a cable modem presenting Ethernet).
<ewenmcneill[m]>
FTR, I can make whitequark.org work again over IPv4 by using iptables to clamp MSS to ensure the TCP frames generated don't get too large.
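The MSS value needed for such a clamp follows from the usable path MTU. A rough sketch, assuming plain 20-byte IPv4 and TCP headers; the exact iptables rule used is not shown in the log, and `clamped_mss` is a hypothetical helper name.

```python
def clamped_mss(path_mtu, ip_header=20, tcp_header=20):
    """Largest MSS whose segments still fit in `path_mtu` bytes of IP packet."""
    return path_mtu - ip_header - tcp_header


def segment_fits(mss, path_mtu, ip_header=20, tcp_header=20):
    """True if a segment built to this MSS fits within the path MTU."""
    return mss + ip_header + tcp_header <= path_mtu


print(clamped_mss(1500))           # 1460, the usual Ethernet default
print(clamped_mss(1496))           # 1456, fits the 1496-octet limit seen here
print(segment_fits(1460, 1496))    # False: default-size segments are too big
print(segment_fits(1456, 1496))    # True
```

TCP options such as timestamps come out of the MSS budget rather than adding to the packet size, so a segment built to the clamped MSS stays within the limit either way.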
<ewenmcneill[m]>
Interestingly, the path MTU issue over IPv4 doesn't seem to be at the whitequark.org end either, as from my colo box (same city, different ISP) full 1500 octet frames make it end to end. So apparently something *near* my local connection is doing something *selectively weird* with IPv4 to that IP or /24.
ar-jan has joined #glasgow
rogandawes[m] has joined #glasgow
<rogandawes[m]>
tcptraceroute with large packets, perhaps?
<ewenmcneill[m]>
Thanks Rogan, that's an interesting suggestion. My attempt right now seems to be hitting MSS clamping (even after I removed that iptables rule), so it's inconclusive, but I'll try to remember to try that again later (eg, when I've had time to reboot/otherwise expire MSS clamping)
Eli2| has joined #glasgow
Eli2_ has quit [Ping timeout: 246 seconds]
<whitequark[cis]>
ewenmcneill: bleh, I see
<attiegrande[m]>
I'm not going to be about for the meeting this week either unfortunately
<whitequark[cis]>
ewenmcneill: let me just reboot the server to begin with
_whitelogger has joined #glasgow
<rogandawes[m]>
Therapeutic reboots 🙂
bvernoux has joined #glasgow
brolin has joined #glasgow
brolin has quit [Quit: leaving]
brolin has joined #glasgow
brolin has quit [Quit: leaving]
brolin has joined #glasgow
brolin has quit [Ping timeout: 245 seconds]
brolin has joined #glasgow
brolin has quit [Ping timeout: 260 seconds]
bvernoux_ has joined #glasgow
bvernoux has quit [Ping timeout: 260 seconds]
brolin has joined #glasgow
brolin has quit [Ping timeout: 260 seconds]
brolin has joined #glasgow
bvernoux_ has quit [Read error: Connection reset by peer]
<ewenmcneill[m]>
whitequark: Thanks for trying. Rebooting your server didn't make a difference (same symptoms). I also rebooted the laptop I was testing from to clear the MSS clamping state, and tested again with tcptraceroute suggested by Rogan. As best I can tell the MTU-mismatch is in the *return* path (ie your hosting provider to me), as tcptraceroute can successfully send full size (1514 ethernet, 1500 IP, 1460 TCP) "SYN" packets and get SYN/ACK answers back.
<ewenmcneill[m]>
Given electric_eel noted earlier that your hosting provider closed down their public looking glass/debug tool, making it difficult to figure out the return path, I think at this point I'm concluding "I know *how* it's being caused, but not *exactly where* it's being caused" -- and it doesn't seem to be either end. And having found it's a Path MTU issue, I know how to work around that (with MSS clamping, to avoid TCP ever generating "full