Discovering the Largest Supported MTU

RCL_SPD

Jan 28, 2014

CPOL

1. Weird 'connection reset' Problems with Google Sites

I recently bought a new Windows laptop for my Mum, and she immediately complained about not being able to reach either Google or GMail with any browser. Interestingly, all other sites (including competing search engines) were working fine, and I could ping both google.com and gmail.com. However, trying to open them or any other Google-related site in Firefox, Opera, Chrome or Internet Explorer resulted in 'connection reset' errors - and deleting cookies did not help.

After having 'binged' for a while, I found out that we were not the first people to experience this problem. Those findings misled me into thinking that the problem was specific to Mum's laptop and possibly caused by malware redirecting browsers to some weird address or such. This misguided conjecture was reinforced by the fact that other computers on our home LAN (particularly Mum's Linux desktop) were not affected.

I tested her laptop for malware and reset the Windows sockets and IP registry settings (netsh winsock reset and netsh int ip reset), but this didn't help. Puzzled, I decided to dive deeper into this.

2. The Cause

While tweaking the wi-fi router settings, I noticed that its MTU was set to the default value (1500, the Ethernet MTU). MTU, short for Maximum Transmission Unit, is the maximum size of a single IP datagram that can be transmitted over a given link. Normally this does not break TCP traffic: an IP datagram larger than the MTU of the next link is fragmented (split into parts) by the router that forwards it.
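To make the sizes concrete, here is a small sketch of the arithmetic that relates an MTU to the largest payload that fits into one unfragmented datagram. It assumes IPv4 and TCP headers without options (20 bytes each) and an 8-byte ICMP echo header; the function names are made up for this illustration.

```shell
#!/bin/sh
IPV4_HEADER=20   # IPv4 header without options
ICMP_HEADER=8    # ICMP echo request header
TCP_HEADER=20    # TCP header without options

# Largest ping payload (the value passed to ping -s) that fits in one datagram.
max_ping_payload()
{
    echo $(( $1 - IPV4_HEADER - ICMP_HEADER ))
}

# Largest TCP payload that fits in one datagram.
max_tcp_payload()
{
    echo $(( $1 - IPV4_HEADER - TCP_HEADER ))
}

max_ping_payload 1500   # prints 1472
max_tcp_payload 1500    # prints 1460
```

This is why a ping payload of 1472 bytes, not 1500, corresponds to a full Ethernet-sized datagram.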

However, this is not the case when Path MTU Discovery (PMTUD) is used. With PMTUD, all datagrams are sent with the Don't Fragment flag set. A router that cannot forward such a datagram without fragmenting it drops the datagram instead and sends back an ICMP Fragmentation Needed notification carrying the (smaller) MTU of the next hop, which the sender should then use to size its packets when resending.

My problems were caused by a PMTUD failure, possibly due to a paranoid Windows firewall filtering out ICMP traffic. The router's WAN uplink (a FreeBSD server of mine) has a PPPoE connection with an MTU smaller than 1500. After examining the traffic coming from the router with tcpdump, I discovered that it sent 1500-byte datagrams with the Don't Fragment flag set, which were dropped. For some reason, the Windows box either never saw the ICMP Fragmentation Needed reply with the appropriate MTU or ignored it.
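For readers who want to repeat this kind of check, the following is a rough sketch of capture filters that could be passed to tcpdump. It is not the exact command I used: the helper function names and the em0 interface are placeholders. The first filter matches ICMP Destination Unreachable messages with code 4 (Fragmentation Needed), and the second matches IPv4 datagrams that have the Don't Fragment bit set.

```shell
#!/bin/sh
# pcap filter for ICMP "Destination Unreachable / Fragmentation Needed"
# (ICMP type 3, code 4)
frag_needed_filter()
{
    echo 'icmp[icmptype] == icmp-unreach and icmp[icmpcode] == 4'
}

# pcap filter for IPv4 datagrams with the Don't Fragment bit set
# (bit 0x40 of byte 6 in the IP header)
dont_fragment_filter()
{
    echo 'ip[6] & 0x40 != 0'
}

# Example usage (needs root; em0 is a placeholder interface name):
#   tcpdump -n -i em0 "$(frag_needed_filter)"
#   tcpdump -n -i em0 "$(dont_fragment_filter)"
```

If large Don't Fragment datagrams leave the machine but no Fragmentation Needed message ever comes back, PMTUD is probably being broken somewhere along the path.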

3. The Solution

The solution for this particular problem was simple: I set the router's MTU to match the PPPoE link and things started to work. However, I wanted a tool that would tell me the largest MTU that is safe to use on a particular path. In other words, I wanted to be able to discover the path MTU manually.

I tried to Google for such a tool, but didn't find anything that would do just this. Since basic ping can send packets of a specified size with the Don't Fragment flag set, there's no practical need for a specialized tool, and most advice goes along these lines: try pinging with a 1500-byte payload and decrement by 10 bytes until you get a reply. While this works well in most cases, it is tedious and assumes a priori knowledge of how large the MTU is likely to be. (By the way, BSD systems - FreeBSD in particular - support 'size sweeping' pings, which make this approach less tedious.)
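The manual procedure can be sketched as a simple loop. In the sketch below, the probe is passed in as a function name so that the loop can be tried without touching the network; sweep_down and fake_probe are hypothetical names, and the fake probe pretends that payloads of up to 1464 bytes get through.

```shell
#!/bin/sh
# Step down from a starting size until the probe succeeds,
# then print the first size that worked.
sweep_down()
{
    probe=$1
    size=$2
    step=$3

    while [ "$size" -gt 0 ]; do
        if "$probe" "$size"; then
            echo "$size"
            return 0
        fi
        size=$(( size - step ))
    done
    return 1
}

# A real probe on Linux could look like this (TARGET is a placeholder):
#   real_probe() { ping -c 1 -M do -s "$1" "$TARGET" >/dev/null 2>&1; }

# Stand-in for a path that passes payloads of up to 1464 bytes.
fake_probe()
{
    [ "$1" -le 1464 ]
}

sweep_down fake_probe 1500 10   # prints 1460
```

Note that the answer is only as precise as the step: the sweep stops at 1460 although 1464 would have worked too.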

4. The Automated Solution

Below is mtu-discovery.sh, a shell script which I wrote to automate discovery of the largest possible MTU in the general case. It starts with a fairly low packet size and keeps doubling it until packets of that size are dropped. Then it bisects the range between the size that doesn't work and the last known good size to find the exact value.

This script should find the largest MTU faster (i.e., with fewer pings) than any size-sweeping solution, and it doesn't assume any upper bound for the MTU. I tried to make it portable: the script runs on Linux and FreeBSD, and probably on Mac OS X too, but I haven't tested it on the latter.

#!/bin/sh
# RCL'24.09.2011

ABS_PATH_PING=/bin/ping
DONT_FRAGMENT_SWITCH="-M do"
NUM_PINGS=3
PING_OVERHEAD=28 # IPv4 header (20 bytes) + ICMP echo header (8 bytes)

SIZE_KNOWN_TO_WORK=100
SIZE_NOT_WORKING=100

test_ping()
{
    local ADDR=$1
    local SIZE=$2
    # ping with Don't Fragment flag set; the switch is left unquoted on
    # purpose so that "-M do" is split into two arguments
    $ABS_PATH_PING -c $NUM_PINGS -s $SIZE $DONT_FRAGMENT_SWITCH $ADDR >/dev/null 2>&1
}

determine_range()
{
    local ADDR=$1
    SIZE_KNOWN_TO_WORK=$SIZE_NOT_WORKING
    SIZE_NOT_WORKING=$(( 2 * $SIZE_NOT_WORKING ))

    test_ping $ADDR $SIZE_NOT_WORKING
    if [ $? -eq 0 ]; then
        # keep doubling SIZE_NOT_WORKING until it actually doesn't work 
        determine_range $ADDR
    fi 
}

bisect()
{
    local ADDR=$1
    local MIN=$2 # known to work
    local MAX=$3 # known not to work
    local MID=$(( ($MIN + $MAX) / 2 ))
    local RESULT=0

    # if cannot bisect further, return the value known to work
    if [ $MIN -eq $MID ] || [ $MAX -eq $MID ]; then
        echo $MIN
        return
    fi

    test_ping $ADDR $MID

    if [ $? -eq 0 ]; then
        RESULT=`bisect $ADDR $MID $MAX`
    else
        RESULT=`bisect $ADDR $MIN $MID`
    fi

    echo $RESULT
} 

if [ $# -lt 1 ]; then
    echo "Usage: `basename $0` <addr>"
    exit 1
fi

# infer the OS from ping location and use appropriate switches 
if [ -e /bin/ping ]; then
    ABS_PATH_PING=/bin/ping
    DONT_FRAGMENT_SWITCH="-M do"
    echo "Using Linux ping."
elif [ -e /sbin/ping ]; then
    ABS_PATH_PING=/sbin/ping
    DONT_FRAGMENT_SWITCH=-D
    echo "Using BSD ping."
else
    echo "ping does not exist in either /bin or /sbin," \
         "assuming Linux and using whatever you have in PATH."
    ABS_PATH_PING=`which ping`
fi

test_ping $1 0
if [ $? -ne 0 ]; then
    echo "Site is unreachable."
    exit 2
fi

determine_range $1

echo "Bisecting $(($SIZE_KNOWN_TO_WORK + $PING_OVERHEAD)) - $(($SIZE_NOT_WORKING + $PING_OVERHEAD)) bytes range."

MTU=`bisect $1 $SIZE_KNOWN_TO_WORK $SIZE_NOT_WORKING`
echo "Largest working MTU is $(($MTU + $PING_OVERHEAD)) bytes."

exit 0

An obvious (and potentially troublesome) assumption here is that the path MTU does not change while the script is running. Since we cannot control the path, this is impossible to guarantee, which limits the usefulness of the script.
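To check the search logic without sending any packets, the doubling-then-bisecting strategy can be separated from the ping details. The sketch below is an iterative variant, not the script itself: find_max_size and fake_probe are hypothetical names, and the fake probe simulates a path with an MTU of 1492 bytes (an assumed value, typical for PPPoE).

```shell
#!/bin/sh
# probe: name of a function that returns 0 if a payload of the given
#        size gets through (a stand-in for test_ping)
# good:  starting size that is assumed to work
find_max_size()
{
    probe=$1
    good=$2
    bad=$(( good * 2 ))

    # Phase 1: keep doubling until a size fails.
    while "$probe" "$bad"; do
        good=$bad
        bad=$(( bad * 2 ))
    done

    # Phase 2: bisect. Invariant: "good" works, "bad" does not.
    while [ $(( bad - good )) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if "$probe" "$mid"; then
            good=$mid
        else
            bad=$mid
        fi
    done

    echo "$good"
}

# Simulated path with MTU 1492: payloads of up to 1492 - 28 bytes pass.
fake_probe()
{
    [ "$1" -le $(( 1492 - 28 )) ]
}

find_max_size fake_probe 100   # prints 1464, i.e. MTU 1464 + 28 = 1492
```

Like the script, this only works if the probe answers consistently for the whole run. A cheap safeguard would be to probe the final value once more before trusting it.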