Tuesday, December 8, 2015

MPLS testbed on Ubuntu Linux with kernel 4.3

MPLS in the kernel

Linux 4.3 was released last month, and one of its long-awaited features was MPLS support in the kernel. There are still a few bugs to iron out, but you can get a working MPLS testbed with the current kernel source plus a single patch that fixes a showstopper.

Building the kernel

  1. Download the source of kernel 4.3 from here: https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.3.tar.xz
  2. Unpack the tarball (tar -xf linux-4.3.tar.xz)
  3. Enter the newly-created linux-4.3 directory, run make menuconfig, and enable lwtunnel support, mpls-iptunnel support, mpls-gso support, and mpls-router support.
  4. Apply the patch from http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/diff/?id=fe82b3300ec9c0dc4ba871f9a58b265aadf4e186 (this fixes a problem with sending MPLS packets)
  5. Build the kernel: make -j `getconf _NPROCESSORS_ONLN`
  6. Once this has finished, build the debian packages: make -j `getconf _NPROCESSORS_ONLN` deb-pkg LOCALVERSION=-mplsfix
  7. This will create a bunch of .deb files in the parent directory - copy both linux-image-4.3.0-mplsfix_amd64.deb and linux-headers-4.3.0-mplsfix_amd64.deb to the machine you want to install your new kernel on
  8. Install the kernel with dpkg -i [package name]
  9. Reboot, select "Advanced options for Ubuntu" in the GRUB menu, and choose your new kernel
  10. You are all ready to go!
Edit: there is an easier way to build the kernel, using a Docker container: https://github.com/samrussell/kernelbuilder
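If you would rather not click through menuconfig in step 3, the kernel tree also ships a scripts/config helper that edits .config from the command line. Below is a rough bash sketch of that approach. The function name is made up, and it assumes the options from step 3 map to the Kconfig symbols LWTUNNEL, MPLS, NET_MPLS_GSO, MPLS_ROUTING and MPLS_IPTUNNEL; double-check them in menuconfig's help screens before relying on it.

```shell
# Hypothetical helper: switch on the MPLS options from step 3 without menuconfig.
# Assumed Kconfig symbols: LWTUNNEL, MPLS, NET_MPLS_GSO, MPLS_ROUTING, MPLS_IPTUNNEL.
# Usage: enable_mpls_kconfig ~/linux-4.3
enable_mpls_kconfig() {
  local src=${1:?usage: enable_mpls_kconfig KERNEL_SRC_DIR}
  local cfg="$src/.config"

  # Seed .config from the running kernel if the tree doesn't have one yet.
  if [ ! -f "$cfg" ] && [ -f "/boot/config-$(uname -r)" ]; then
    cp "/boot/config-$(uname -r)" "$cfg"
  fi

  # Built-in (y) for the two boolean switches, modules (m) for the pieces
  # that are loaded later with modprobe (mpls_gso, mpls_router, mpls_iptunnel).
  "$src/scripts/config" --file "$cfg" \
    --enable LWTUNNEL \
    --enable MPLS \
    --module NET_MPLS_GSO \
    --module MPLS_ROUTING \
    --module MPLS_IPTUNNEL || return 1

  # Let Kconfig resolve dependencies and default any remaining new options.
  make -C "$src" olddefconfig
}
```

After running it (for example enable_mpls_kconfig ~/linux-4.3), carry on with the patch and build steps as before.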

Enabling MPLS

The MPLS modules aren't loaded by default, so you'll need to load them yourself:

modprobe mpls_router
modprobe mpls_gso
modprobe mpls_iptunnel
sysctl -w net.mpls.conf.enp0s9.input=1
sysctl -w net.mpls.conf.lo.input=1
sysctl -w net.mpls.platform_labels=1048575

You'll need to set net.mpls.conf.[interface-name].input=1 for any other interfaces that you plan to receive MPLS packets on, otherwise the MPLS route table won't accept your routes.
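Because the input sysctl is per interface, it can be convenient to wrap the commands above in a small bash function and pass the interface names as arguments. This is only a sketch: mpls_enable is a hypothetical name, and it runs exactly the modprobe and sysctl calls shown above, in the same order.

```shell
# Hypothetical wrapper around the commands above (run as root).
# Usage: mpls_enable enp0s9 lo
mpls_enable() {
  local mod ifname
  # Load the routing, GSO and IP-tunnel (encap) modules; stop on failure.
  for mod in mpls_router mpls_gso mpls_iptunnel; do
    modprobe "$mod" || return 1
  done
  # Accept MPLS packets on each interface named on the command line.
  for ifname in "$@"; do
    sysctl -w "net.mpls.conf.${ifname}.input=1" || return 1
  done
  # Size of the label table, as used in this post.
  sysctl -w net.mpls.platform_labels=1048575
}
```

For the commands above that would be mpls_enable enp0s9 lo, run as root.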

Applying MPLS routes

The latest release of iproute2 isn't quite ready, so we'll need to live life on the bleeding edge and build it from source too:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
cd iproute2
./configure
make
sudo make install

Once this is done, we can see that iproute2 has a few more options available for us - try ip route help and see what is available.

Some route examples:

Routing 10.10.10.10/32 to 192.168.1.2 with label 100: ip route add 10.10.10.10/32 encap mpls 100 via inet 192.168.1.2

Swapping label 100 for 200 and sending to 192.168.2.2: ip -f mpls route add 100 as 200 via inet 192.168.2.2

Decapsulating label 300 and delivering locally: ip -f mpls route add 300 dev lo
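If one of these commands comes back with RTNETLINK answers: Invalid argument, the label value is one thing worth checking. The sketch below assumes that the kernel refuses routes for the reserved labels 0-15 and for any label at or above net.mpls.platform_labels (which defaults to 0, so nothing is accepted until it is set). mpls_label_ok is a hypothetical name.

```shell
# Hypothetical pre-flight check for "ip -f mpls route add LABEL ...".
# Assumes the kernel rejects the reserved labels 0-15 and any label at or
# above net.mpls.platform_labels (0 until you set it).
mpls_label_ok() {
  local label=$1 limit
  if ! [[ $label =~ ^[0-9]+$ ]]; then
    echo "not a numeric label: $label" >&2
    return 1
  fi
  label=$(( 10#$label ))   # avoid octal parsing of e.g. "0100"
  # The key only exists once mpls_router is loaded; treat failure as 0.
  limit=$(sysctl -n net.mpls.platform_labels 2>/dev/null) || limit=0
  limit=${limit:-0}
  if (( label < 16 )); then
    echo "label $label is reserved (0-15)" >&2
    return 1
  fi
  if (( label >= limit )); then
    echo "label $label is not below net.mpls.platform_labels ($limit)" >&2
    return 1
  fi
}
```

For example: mpls_label_ok 100 && ip -f mpls route add 100 as 200 via inet 192.168.2.2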

Testbed setup

We're going to make use of network namespaces here to set up a couple of hosts. The plan is as follows:
  • Base machine: has veth0 (plugs into veth1) and veth2 (plugs into veth3)
  • Host1: Has veth1 (plugs into veth0)
  • Host2: Has veth3 (plugs into veth2)
We will use label 112 for traffic from host1 to host2, and label 111 for traffic from host2 to host1. We will use penultimate hop popping here (as opposed to label swapping), but feel free to play with this and get different results.

Setup (all executed as root):

ip link add veth0 type veth peer name veth1
ip link add veth2 type veth peer name veth3
sysctl -w net.mpls.conf.veth0.input=1
sysctl -w net.mpls.conf.veth2.input=1
ifconfig veth0 10.3.3.1/24 up
ifconfig veth2 10.4.4.1/24 up
ip netns add host1
ip netns add host2
ip link set veth1 netns host1
ip link set veth3 netns host2
ip netns exec host1 ifconfig lo 10.10.10.1/32 up
ip netns exec host1 ifconfig veth1 10.3.3.2/24 up
ip netns exec host2 ifconfig lo 10.10.10.2/32 up
ip netns exec host2 ifconfig veth3 10.4.4.2/24 up
ip netns exec host1 ip route add 10.10.10.2/32 encap mpls 112 via inet 10.3.3.1
ip netns exec host2 ip route add 10.10.10.1/32 encap mpls 111 via inet 10.4.4.1
ip -f mpls route add 111 via inet 10.3.3.2
ip -f mpls route add 112 via inet 10.4.4.2
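When experimenting it helps to be able to start over. The following bash sketch undoes the setup above; mpls_testbed_down is a hypothetical name, and errors are deliberately ignored because some objects may already be gone by the time they are deleted.

```shell
# Hypothetical teardown for the testbed above (run as root), so the setup
# can be repeated from a clean state. Errors are ignored on purpose:
# deleting one end of a veth pair also deletes its peer, for example.
mpls_testbed_down() {
  ip -f mpls route del 111 2>/dev/null
  ip -f mpls route del 112 2>/dev/null
  ip netns del host1 2>/dev/null
  ip netns del host2 2>/dev/null
  ip link del veth0 2>/dev/null
  ip link del veth2 2>/dev/null
  return 0
}
```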

Testing (executed as root due to netns):

ip netns exec host2 ping 10.10.10.1 -I 10.10.10.2

Results:

tcpdump -envi veth0
tcpdump: listening on veth0, link-type EN10MB (Ethernet), capture size 262144 bytes
21:14:14.687380 9a:08:f4:cf:aa:9c > 12:c7:db:9d:a5:25, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 53781, offset 0, flags [DF], proto ICMP (1), length 84)
    10.10.10.2 > 10.10.10.1: ICMP echo request, id 1359, seq 1, length 64
21:14:14.687404 12:c7:db:9d:a5:25 > 9a:08:f4:cf:aa:9c, ethertype MPLS unicast (0x8847), length 102: MPLS (label 112, exp 0, [S], ttl 64)
(tos 0x0, ttl 64, id 19009, offset 0, flags [none], proto ICMP (1), length 84)
    10.10.10.1 > 10.10.10.2: ICMP echo reply, id 1359, seq 1, length 64
21:14:15.701789 9a:08:f4:cf:aa:9c > 12:c7:db:9d:a5:25, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 53845, offset 0, flags [DF], proto ICMP (1), length 84)
    10.10.10.2 > 10.10.10.1: ICMP echo request, id 1359, seq 2, length 64
21:14:15.701810 12:c7:db:9d:a5:25 > 9a:08:f4:cf:aa:9c, ethertype MPLS unicast (0x8847), length 102: MPLS (label 112, exp 0, [S], ttl 64)
(tos 0x0, ttl 64, id 19246, offset 0, flags [none], proto ICMP (1), length 84)
    10.10.10.1 > 10.10.10.2: ICMP echo reply, id 1359, seq 2, length 64

tcpdump -envi veth2
tcpdump: listening on veth2, link-type EN10MB (Ethernet), capture size 262144 bytes
21:14:45.714220 8e:d5:9d:07:9a:5c > d6:8a:7c:5e:5b:0f, ethertype MPLS unicast (0x8847), length 102: MPLS (label 111, exp 0, [S], ttl 64)
(tos 0x0, ttl 64, id 55648, offset 0, flags [DF], proto ICMP (1), length 84)
    10.10.10.2 > 10.10.10.1: ICMP echo request, id 1363, seq 1, length 64
21:14:45.714251 d6:8a:7c:5e:5b:0f > 8e:d5:9d:07:9a:5c, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 22394, offset 0, flags [none], proto ICMP (1), length 84)
    10.10.10.1 > 10.10.10.2: ICMP echo reply, id 1363, seq 1, length 64
21:14:46.717538 8e:d5:9d:07:9a:5c > d6:8a:7c:5e:5b:0f, ethertype MPLS unicast (0x8847), length 102: MPLS (label 111, exp 0, [S], ttl 64)
(tos 0x0, ttl 64, id 55848, offset 0, flags [DF], proto ICMP (1), length 84)
    10.10.10.2 > 10.10.10.1: ICMP echo request, id 1363, seq 2, length 64
21:14:46.717570 d6:8a:7c:5e:5b:0f > 8e:d5:9d:07:9a:5c, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 22412, offset 0, flags [none], proto ICMP (1), length 84)
    10.10.10.1 > 10.10.10.2: ICMP echo reply, id 1363, seq 2, length 64

It works!
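In the captures above, tcpdump decodes each 4-byte MPLS header into label, exp, [S] (bottom of stack) and ttl. Per RFC 3032 these share one 32-bit word: a 20-bit label, 3 exp bits, 1 bottom-of-stack bit and an 8-bit TTL. This small bash sketch (the function names are made up for illustration) packs and unpacks that word, so you can check, for example, that label 112 with exp 0, [S] and ttl 64 is 0x00070140 on the wire.

```shell
# Pack one MPLS label stack entry (RFC 3032):
#   bits 31-12 label | 11-9 exp | 8 bottom-of-stack | 7-0 TTL
# Usage: mpls_lse LABEL [EXP] [BOS] [TTL]   (defaults: 0, 1, 64)
mpls_lse() {
  local label=$1 exp=${2:-0} bos=${3:-1} ttl=${4:-64}
  # Reject values that do not fit in their fields.
  if (( label < 0 || label > 0xFFFFF || exp < 0 || exp > 7 )) ||
     (( bos < 0 || bos > 1 || ttl < 0 || ttl > 255 )); then
    echo "field out of range" >&2
    return 1
  fi
  printf '0x%08x\n' $(( (label << 12) | (exp << 9) | (bos << 8) | ttl ))
}

# Unpack a 32-bit entry (e.g. 0x00070140) into the fields tcpdump prints.
mpls_lse_decode() {
  local v=$(( $1 ))
  printf 'label %d, exp %d, bos %d, ttl %d\n' \
    $(( (v >> 12) & 0xFFFFF )) $(( (v >> 9) & 7 )) $(( (v >> 8) & 1 )) $(( v & 0xFF ))
}
```

Running mpls_lse 111 gives the header host2 pushes in the veth2 capture.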

Next steps

We already have software routers such as Quagga and BIRD, which speak traditional routing protocols like OSPF and BGP. What we need now are LDP daemons and other Linux software to stand up L2VPNs and L3VPNs.

Thanks to the team on the netdev mailing list, who have been super responsive and helpful.

10 comments:

  1. hi,

    The Makefile's install target uses "DESTDIR" to install, for example, ip, route, ...
    But this parameter is not set, so everything is installed to "/sbin" instead of "/usr/sbin".
    When I check which ip is used ("which ip"), I can see that the system runs "/usr/sbin/ip", which is not the one from the iproute2 build.
    To solve this issue, just add "DESTDIR?=/usr" to /iproute2/Makefile.

  2. Hi Sam,

    I'm trying to reproduce your example, but to no avail. I can install all the routes and confirm with tcpdump that label push works, but neither label swap nor label pop does.
    This is Ubuntu 16.04 with kernel 4.4, and an iproute2 built from the current HEAD.

    I had to make the following amendments to your instructions:

    - In a netns, all interfaces (lo and all interfaces moved into it) have MPLS disabled, so I need to re-issue "echo 1| tee /proc/sys/net/mpls/conf/*/input; echo 1048575 > /proc/sys/net/mpls/platform_labels"

    - Local forwarding is off by default; enable it with "echo 1 > /proc/sys/net/ipv4/ip_forward". However, packets are still not forwarded after being received as MPLS.

    1. Were you able to solve your problem? I have the same issue. When I try to do a swap or pop, I get the error:

      RTNETLINK answers: Invalid argument

  3. What settings need to be changed when using TCP?
    I am currently using 3 physical hosts, one of which acts as the router for TCP communications between the other two. With the same configuration as above I was able to ping, but not to run TCP traffic between them.

  4. You need at least iproute2 version 4.3. Ubuntu 16.04.1 only ships with iproute2 version 4.2, so you still need to build it from git as per the instructions.

    azrdev, I'm not sure what's going wrong for you, sorry :( I'm working on an LSR at the moment, though; I've currently got it semi-peering with some MikroTik cloud routers, so that might be another way to get a testbed going.

  5. Hi Sam.
    I am trying to get my head around this MPLS setup on Ubuntu, but after 10 hours of work today I still cannot make it work.
    Could you send me your email address so I can get a bit of help from you?
    My email is dariusz.terefenko@gmail.com

  6. Hi Sam,

    I tried your example.
    It works fine for encapsulation, but not for label swapping and decapsulation.

    Am I missing something? Any ideas?

    Thanks in advance

  7. Can you share the same example without PHP?

  8. Sam,
    I have 2 separate physical hosts connected through a simple hub, and I'm struggling to get something similar to your example to work.
    I have 2 Intel NUCs connected through a Gb hub. The NUCs are
    10.4.4.2/16 and 10.4.3.2/16, both running basic installs of Ubuntu 20.04.
    On Box1 (10.4.3.2) I have:
    ip link add link enp0s25 name mv1 type macvlan mode bridge
    ip addr add 10.4.3.3 dev mv1
    sysctl -w net.mpls.conf.mv1.input=1
    ifconfig mv1 10.4.3.3 up
    ip route add 10.4.4.3/32 encap mpls 112 via inet 10.4.3.3
    ip -f mpls route add 112 dev mv1
    ip -f mpls route add 111 dev mv1
    On Box2 I have:
    ip link add link eno1 name mv1 type macvlan mode bridge
    ip addr add 10.4.4.3 dev mv1
    sysctl -w net.mpls.conf.mv1.input=1
    ifconfig mv1 10.4.4.3 up
    ip route add 10.4.3.3/32 encap mpls 111 via inet 10.4.3.3
    ip -f mpls route add 112 dev mv1
    ip -f mpls route add 111 dev mv1

    I used a macvlan to get a MAC address for the MPLS traffic. If I ping from Box1 to Box2 (ping 10.4.4.3 -I 10.4.3.3), I see the ICMP request packets with label 112 but never see an ICMP reply packet. I'm sure this is something simple. Any suggestions are appreciated.
    Thanks,
    Michael
    garlieb@me.com