An Overview of Multipath TCP

Multipath TCP (MPTCP) is an evolution of TCP, allowing it to run over multiple paths at the same time
Transparent to applications: they need not know or care

Introduction and Motivation

TCP is used by the majority of applications
Neither mobile devices nor computers with many network interfaces were an immediate design priority
TCP designers did account for the fact that network links could (and will) fail
Thus: decouple network-layer protocols (IP) from transport-layer protocols (TCP)
This way, network can re-route packets around failures, without affecting TCP connections
Largely done through dynamic routing protocols
Much easier to do since they need not know about transport-layer connections

Today's networks are multipath:
Devices often have multiple wireless interfaces, meaning traffic can take multiple paths
Datacenters have redundant paths and are multi-homed
...
However, TCP is still single-path: a connection is bound to the pair of IP addresses used when setting it up
If one address changes, the connection will fail
A TCP connection can't even be load balanced across links in a network: this causes packet re-ordering, which TCP misinterprets as congestion, so it slows down

This mismatch between network and TCP leaves performance on the table
When switching interfaces (e.g. due to failures), TCP connections stall and fail: there is no way to migrate them
Modern data centers have many redundant links, leaving unused capacity on the table, since we cannot load balance across them

MPTCP is a major change to TCP to allow multiple paths to be used at the same time by a single connection

Overview of MPTCP Operation

Design influenced by 2 main requirements

A TCP connection can be divided into three phases:

  1. Connection establishment
    1. Handshake
      1. SYN (Synchronize), includes source port and initial seq. number
      2. SYN+ACK, includes server's initial seq. number
      3. ACK, connection is now established
  2. Data transfer
    • Both endpoints can send data (in units called segments)
    • Seq. number used to tell them apart, re-order them and detect loss
    • TCP header contains a cumulative acknowledgement, tells the sender the next expected seq. number
  3. Connection release
    1. Can be closed abruptly with reset (RST)
    2. Normal way is to use finish (FIN) packets, these indicate the last byte sent
    3. Connection is terminated once both sides have acknowledged the FIN packets
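
The cumulative acknowledgement from the data transfer phase can be illustrated with a small sketch. The function name, the (seq, length) tuple layout and the byte counts are assumptions made for illustration only; overlapping segments and window limits are ignored.

```python
def cumulative_acks(arrivals, initial_seq=0):
    """Return the ACK number a receiver would send after each arriving segment.

    arrivals: list of (seq, length) tuples in arrival order.
    The ACK always names the next in-order byte the receiver expects.
    """
    next_expected = initial_seq
    buffered = {}  # seq -> length, for segments that arrived ahead of a gap
    acks = []
    for seq, length in arrivals:
        if seq >= next_expected:  # ignore data that was already delivered
            buffered[seq] = length
        # Advance over every buffered segment that is now contiguous.
        while next_expected in buffered:
            next_expected += buffered.pop(next_expected)
        acks.append(next_expected)
    return acks


in_order = cumulative_acks([(0, 100), (100, 100), (200, 100)])
reordered = cumulative_acks([(0, 100), (200, 100), (300, 100), (100, 100)])
print(in_order)   # [100, 200, 300]
print(reordered)  # [100, 100, 100, 400]
```

In the reordered case the receiver repeats ACK 100 until the gap is filled. A sender sees the same pattern of repeated ACKs when a segment is lost, which is why re-ordering tends to be mistaken for congestion.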

MPTCP allows multiple subflows to be set up
First subflow starts similar to a normal TCP session
After the first is set up, additional subflows can be added
Each looks similar to a regular TCP session, with 3-way handshake and FIN teardown
Data can be sent over any existing subflow
Subflows can be added and removed during the lifetime, without affecting data transport

Usually, subflows take different paths, with different characteristics (e.g. delay)
This can cause packets to arrive out-of-order
Regular TCP uses the seq. number in each header to re-order
MPTCP could re-use this, but this could cause issues
Some middleboxes behave strangely when they only see a subset of the seq. numbers
Instead, introduce a per-subflow seq. number (in the normal header spot) and a global seq. number (the data sequence number, DSN, carried in a TCP option)
This way, every middlebox sees consecutive seq. numbers on each subflow
Global seq. number is used for ordering
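
A minimal sketch of the two sequence spaces, assuming a hypothetical tuple layout (subflow_id, subflow_seq, dsn, payload) and byte-based numbering starting at 0. It is not the wire format, only an illustration of why ordering relies on the DSN.

```python
import heapq


def reassemble(segments, initial_dsn=0):
    """Deliver payloads in data-sequence (DSN) order.

    segments: iterable of (subflow_id, subflow_seq, dsn, payload) tuples
    in arrival order. The subflow fields are carried along only to show
    that they play no role in connection-level ordering.
    """
    next_dsn = initial_dsn
    pending = []  # min-heap keyed by DSN
    delivered = []
    for _subflow_id, _subflow_seq, dsn, payload in segments:
        if dsn < next_dsn:
            continue  # already delivered, drop the duplicate
        heapq.heappush(pending, (dsn, payload))
        # Hand over everything that is now contiguous in DSN space.
        while pending and pending[0][0] == next_dsn:
            _, data = heapq.heappop(pending)
            delivered.append(data)
            next_dsn += len(data)
    return b"".join(delivered)


# Subflow 1 (say, LTE) and subflow 2 (say, Wi-Fi) each carry consecutive
# subflow seq. numbers 0 and 4, but interleaved DSNs.
arrivals = [
    (1, 0, 0, b"AAAA"),
    (1, 4, 8, b"CCCC"),
    (2, 0, 4, b"BBBB"),
    (2, 4, 12, b"DDDD"),
]
stream = reassemble(arrivals)
print(stream)  # b'AAAABBBBCCCCDDDD'
```

Each subflow on its own looks like a gap-free TCP stream to a middlebox, while the receiver rebuilds the application byte stream from the DSNs alone.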

Example

Consider a smartphone
It uses its LTE interface to start a TCP connection
The SYN segment will include the MP_CAPABLE TCP option, indicating MPTCP support
Also contains a key, chosen by the sender
Server replies with SYN+ACK, also containing the MP_CAPABLE option
Also contains a key, chosen by the server
MPTCP connection is established, segments can be exchanged over the LTE path

Trying to use the Wi-Fi interface without setting up a subflow will almost always fail
Sending packets naively over the interface will fail, as these packets will contain the LTE IP and be dropped by the ISP
Telling the server the Wi-Fi IP and sending packets will fail, as stateful firewalls expect to see a SYN before data packets
The only option is a full 3-way handshake over the Wi-Fi interface, which sets up a new subflow
Includes MP_JOIN TCP option, using the previously shared keys to ascertain this is actually the same device
Server replies with MP_JOIN in the SYN+ACK and the subflow is established
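
How the shared keys can prove that a new subflow belongs to the same connection can be sketched as follows. The notes do not give the details; the SHA-1 token, the HMAC-SHA1 check and the nonces below are assumptions loosely modeled on RFC 6824, simplified (no truncation of the MAC, no option encoding), and all function and variable names are hypothetical.

```python
import hashlib
import hmac
import secrets


def token_from_key(key: bytes) -> bytes:
    """Short connection identifier derived from a key (assumed: first 32 bits of SHA-1)."""
    return hashlib.sha1(key).digest()[:4]


def join_mac(own_key: bytes, peer_key: bytes, own_nonce: bytes, peer_nonce: bytes) -> bytes:
    """MAC over both nonces, keyed with both keys (assumed construction)."""
    return hmac.new(own_key + peer_key, own_nonce + peer_nonce, hashlib.sha1).digest()


# Keys exchanged in MP_CAPABLE on the first (LTE) subflow.
client_key = secrets.token_bytes(8)
server_key = secrets.token_bytes(8)
server_connections = {token_from_key(server_key): (server_key, client_key)}

# SYN + MP_JOIN over Wi-Fi: carries the server's token and a fresh client nonce.
client_nonce = secrets.token_bytes(4)
syn_token = token_from_key(server_key)

# SYN+ACK + MP_JOIN: the server finds the connection by token and proves it knows both keys.
srv_key, peer_key = server_connections[syn_token]
server_nonce = secrets.token_bytes(4)
server_mac = join_mac(srv_key, peer_key, server_nonce, client_nonce)

# The client recomputes the MAC and compares in constant time.
expected = join_mac(server_key, client_key, server_nonce, client_nonce)
client_accepts = hmac.compare_digest(server_mac, expected)

# An off-path party that never saw the keys cannot produce a matching MAC.
forged_mac = join_mac(b"\x00" * 8, b"\x00" * 8, server_nonce, client_nonce)
forgery_accepted = hmac.compare_digest(forged_mac, expected)
print(client_accepts, forgery_accepted)
```

The keys themselves never travel on the new path in this sketch; only a token and MACs do, so an observer on the Wi-Fi path alone cannot hijack the connection.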

Congestion Control (CC)

CC is one of the most important parts of TCP
Allows it to adapt to changing network conditions
To do this, each TCP sender maintains a congestion window
Determines the number of packets the sender can send without waiting for an ACK (i.e. the amount of in-flight data)
Congestion window is updated dynamically
Grow linearly when no congestion
Halved when congestion occurs
Enables fairness
Each connection will eventually converge on the same average value
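
The fairness claim can be checked with a toy simulation. It assumes a single shared bottleneck with a hypothetical capacity in packets, one update per round trip, and that every flow sees the congestion signal in the same round; real TCP is considerably messier.

```python
def simulate_aimd(windows, capacity, rounds):
    """Additive-increase / multiplicative-decrease on one shared link.

    windows: initial congestion windows in packets, one per flow.
    capacity: packets the link can carry per round (hypothetical value).
    """
    windows = list(windows)
    for _ in range(rounds):
        if sum(windows) > capacity:
            # Congestion: every flow halves its window.
            windows = [w / 2 for w in windows]
        else:
            # No congestion: every flow grows by one packet.
            windows = [w + 1 for w in windows]
    return windows


small, large = simulate_aimd([1, 40], capacity=50, rounds=200)
print(abs(large - small))
```

The additive step keeps the gap between the two windows constant, while each halving cuts the gap in half, so the flows drift toward equal shares regardless of where they started.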

How do we define this in MPTCP? There are 3 requirements for MPTCP-CC:

  1. It will share links with normal TCP, and should be fair to it (not take more bandwidth than a regular TCP flow would)
  2. The performance of all subflows together should be at least that of a regular TCP flow on any of the paths in use (otherwise, why bother)
  3. MPTCP should prefer efficient paths, so send traffic on paths with less congestion

Req. 3 will make wide-area load balancing happen
If large amounts of traffic are multi-path, they will direct their traffic away from congested links, onto less-congested links, thus evening out

These req. are achieved by:
Giving each subflow its own CC mechanism
Each halves when experiencing congestion
During increase, non-congested links are allowed to increase proportionally more than congested ones
The total increase across all subflows is decided to achieve req. 1 and 2
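
The notes do not name a specific algorithm. As one possible realization, the sketch below assumes the Linked Increases Algorithm (LIA) described in RFC 6356, with windows counted in packets and a dict-based subflow record invented for illustration.

```python
def lia_alpha(subflows):
    """Aggressiveness factor shared by all subflows (assumed LIA formula).

    subflows: list of dicts with 'cwnd' (packets) and 'rtt' (seconds).
    """
    total = sum(s["cwnd"] for s in subflows)
    best = max(s["cwnd"] / s["rtt"] ** 2 for s in subflows)
    denom = sum(s["cwnd"] / s["rtt"] for s in subflows) ** 2
    return total * best / denom


def on_ack(subflows, i):
    """Grow subflow i for one ACK; return the increase applied."""
    total = sum(s["cwnd"] for s in subflows)
    coupled = lia_alpha(subflows) / total    # increase tied to the whole connection
    uncoupled = 1.0 / subflows[i]["cwnd"]    # what regular TCP would do on this path
    increase = min(coupled, uncoupled)       # never more aggressive than regular TCP
    subflows[i]["cwnd"] += increase
    return increase


def on_loss(subflows, i):
    """Each subflow halves on its own when it experiences congestion."""
    subflows[i]["cwnd"] = max(1.0, subflows[i]["cwnd"] / 2)


single = [{"cwnd": 10.0, "rtt": 0.1}]
single_inc = on_ack(single, 0)

pair = [{"cwnd": 10.0, "rtt": 0.1}, {"cwnd": 10.0, "rtt": 0.1}]
pair_inc = on_ack(pair, 0)
print(single_inc, pair_inc)
```

With a single subflow the increase reduces to the regular 1/cwnd per ACK. With two identical subflows each one grows more slowly than a standalone TCP flow would, which is how the total stays fair on a shared bottleneck.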

Implementation and Performance

A few use cases:
Mobile devices
Data centers
Multi-homed web servers
Dual-stack IPv4/IPv6 hosts

Test setup: device is connected to Wi-Fi and LTE, Wi-Fi goes down, device has to switch
Test: download file over HTTP
Scenario 1: application-layer handover, application detects interface going down, uses HTTP range header to resume download
Scenario 2: use MPTCP to maintain connection
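
For scenario 1, resuming a download boils down to asking the server for the bytes that are still missing. A minimal sketch, where the helper name, the URL and the byte count are placeholders and no request is actually sent:

```python
import urllib.request


def resume_request(url, bytes_received):
    """Build a GET request for everything after the bytes already downloaded."""
    return urllib.request.Request(
        url,
        headers={"Range": f"bytes={bytes_received}-"},  # open-ended byte range
    )


req = resume_request("http://example.com/file.bin", 1_048_576)
range_header = req.get_header("Range")
print(range_header)  # bytes=1048576-
```

The application still has to notice that the interface went down, open a new TCP connection on the other interface and check that the server answers with 206 Partial Content, which is where the downtime in this scenario comes from.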

Whereas app-layer handover shows a downtime of 3s, using MPTCP shows a smooth handover

Second test: use MPTCP to measure iperf between EC2 instances
Shocker: the more subflows you add, the more performance you get, as MPTCP is able to use multiple redundant links

Conclusion

MPTCP allows apps to achieve better performance, w/o needing to add explicit support