
Friday, September 7, 2012

Tunneling traffic through your OpenFlow controller - Building a POX-based OpenFlow router

Why would you do this?

If we want to make an OpenFlow router, we need to be able to communicate with other non-OpenFlow routers. Normally, you would assign an IP address to your router, turn on BGP/OSPF, and then configure these protocols to talk to other routers using this IP address. With OpenFlow, the controller has the brains, but no obvious way to talk to other network devices. If only we could pretend that the controller was in the router somehow...

Can't we just look at the OpenFlow messages?

Sure, and we looked at this last week, but it's clumsy and means we need to reinvent the wheel to make software routers talk to POX. RouteFlow abstracts this by loading software routers in virtual machines, and last week's demonstration hardcoded everything into the controller. Tunnelling gives us a middle-of-the-road solution: no virtual machines needed, but we can still bind stuff to a network interface on the controller and let the Linux network stack handle already-solved problems like TCP.

Building a tunnel

Linux has a fantastic tool called TUN/TAP, which lets you create virtual network interfaces. One end talks to the Linux network stack and lets any application use it, and the other end talks to our program. In the spirit of keeping things modular, and minimising opportunities for me to write bad code, I've used the PyTap library to set this up. PyTap has a pip package, which means we can easily add it to a virtualenv and keep everything self-contained.

Protip: TUN interfaces take IP packets, TAP interfaces take Ethernet packets

If you haven't used virtualenvs, here's the basic idea:

virtualenv tundemo
cd tundemo
source bin/activate
pip install pytap
git clone http://github.com/noxrepo/pox

This will set you up with a virtualenv that has POX and PyTap ready to go. Despite being in a virtualenv, PyTap still needs root privileges to create the TAP device, so you'll need to become root before sourcing bin/activate. If anyone can show me how to make this work without root privileges I'll be happy to hear it (presumably some trickery with the /dev/net/tun device).

As with my other modules, I've hacked code into a copy of forwarding.l2_learning - this time I've renamed it to tundemo, and changed the name of the class all through the source.

Here are all my imports. Add these at the top (struct is needed later for building the TAP header):

from pytun import TunTapDevice, IFF_TAP
from pox.lib.addresses import *
from pox.lib.packet import *
from threading import Thread
import struct
import subprocess

In the __init__() function, I've put the following code to make the TAP device:

    # Our table
    self.macToPort = {}
    
    # TAP device
    self.tap = TunTapDevice(flags=IFF_TAP)
    self.tap.addr = '10.1.1.13'
    self.tap.netmask = '255.255.255.0'
    self.tap.mtu=1300
    print "hwaddr for " + self.tap.name + ": " + str(EthAddr(self.tap.hwaddr))
    
    # Bring tap interface up
    # (argument list instead of shell=True, so the name is never shell-parsed)
    subprocess.check_call(["ifconfig", self.tap.name, "up"])

PyTap chooses a random MAC address when it creates the interface, so printing it out lets us debug things a bit easier.

Tunneling from TAP to switch

Once we have our TAP interface up, we need to handle packets that we receive on it. Let's set up a thread to handle this:

    # Create thread to read from tap and send to switch
    # (args must be a tuple, hence the trailing comma)
    self.th = Thread(target=handle_tap_in, args=(self,))
    self.th.daemon = True
    self.th.start()

    # Set max packet size to 1400 bytes
    self.connection.send(of.ofp_set_config(miss_send_len=1400))
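
The reader-thread pattern can be tried out without a switch or root privileges. The sketch below is a standalone Python 3 toy, not the controller code: FakeTap and FakeSwitch are made-up stand-ins for the TAP device and the POX switch object, and the "send" is just a queue. The thing to notice is that args has to be a tuple, so a single argument needs a trailing comma.

```python
import queue
import threading

class FakeTap(object):
    """Hypothetical stand-in for the TAP device: read() blocks until a frame arrives."""
    def __init__(self):
        self.frames = queue.Queue()
        self.mtu = 1300

    def read(self, size):
        return self.frames.get()[:size]

class FakeSwitch(object):
    """Hypothetical stand-in for the switch object; 'sent' records forwarded frames."""
    def __init__(self):
        self.tap = FakeTap()
        self.sent = queue.Queue()

def handle_tap_in(switch):
    # Loop forever: block on the tap, then hand the frame to the switch side
    while True:
        data = switch.tap.read(switch.tap.mtu + 24)
        switch.sent.put(data)

switch = FakeSwitch()
# args=(switch,) is a one-element tuple; args=(switch) would just be the object itself
reader = threading.Thread(target=handle_tap_in, args=(switch,))
reader.daemon = True  # don't keep the process alive just for this thread
reader.start()

switch.tap.frames.put(b"hello from the tap")
result = switch.sent.get(timeout=2)
print(result)
```

Because the thread is a daemon, it dies with the main program instead of blocking shutdown while it waits on the next read.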

Our handler function is fairly straightforward:

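def handle_tap_in(switch):
  while True:
    # Blocking read; the extra 24 bytes leave room for the TAP and Ethernet headers
    packettap = switch.tap.read(switch.tap.mtu + 24)
    print "Packet read from tap"
    # Skip the 4-byte TAP header to get the raw Ethernet frame
    frame = packettap[4:]
    e = ethernet()
    e.parse(frame)

    # Send to all ports unless we've already learned where the destination lives
    port = of.OFPP_ALL
    if e.dst in switch.macToPort:
      port = switch.macToPort[e.dst]

    msg = of.ofp_packet_out()
    msg.data = frame
    msg.actions.append(of.ofp_action_output(port=port))
    switch.connection.send(msg)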

This sends every packet that comes up on the tap0 interface to the switch, either flooding it or sending it out the right port, depending on which MAC addresses we've already learned.
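
To make the `[4:]` slice less magic, here is a standalone Python 3 sketch of the buffer layout. It assumes the TAP device is opened without the IFF_NO_PI flag, in which case Linux prepends a 4-byte packet-information header (16-bit flags, then a 16-bit protocol field) to each Ethernet frame. The helper names and the sample frame are made up for illustration and are not part of POX or PyTap.

```python
import struct

PI_HEADER_LEN = 4    # 2 bytes of flags + 2 bytes of protocol (an ethertype)
ETH_HEADER_LEN = 14  # destination MAC + source MAC + ethertype

def split_tap_frame(buf):
    """Split a TAP read buffer into (flags, proto, dst, src, ethertype, frame)."""
    if len(buf) < PI_HEADER_LEN + ETH_HEADER_LEN:
        raise ValueError("buffer too short to hold a TAP frame")
    # flags is host-endian in the kernel struct, proto is big-endian
    (flags,) = struct.unpack("=H", buf[0:2])
    (proto,) = struct.unpack("!H", buf[2:4])
    frame = buf[PI_HEADER_LEN:]
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:ETH_HEADER_LEN])
    return flags, proto, dst, src, ethertype, frame

def mac_str(raw):
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join("%02x" % b for b in raw)

# Made-up sample: a broadcast IPv4 frame with a 7-byte dummy payload
sample = (struct.pack("=H", 0) + struct.pack("!H", 0x0800)
          + bytes.fromhex("ffffffffffff") + bytes.fromhex("0242ac110002")
          + struct.pack("!H", 0x0800) + b"payload")

flags, proto, dst, src, ethertype, frame = split_tap_frame(sample)
print(mac_str(dst), mac_str(src), hex(ethertype), len(frame))
```

The `frame` value returned here is what the handler above passes to `ethernet().parse()` and puts in `msg.data`.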

Tunneling from switch to TAP

We already get sent packets from the switch by default, and these go to the _handle_PacketIn() function. We just need to get the raw data out and send it to the TAP interface.

My switch always sends VLAN-tagged packets, so if yours doesn't then you'll want to change this a bit. Here is the SendToTap() function:

def SendToTap():
      # remove vlan header and rebuild
      print "Forwarding packet"
      v = packet.next   # the VLAN header
      i = v.next        # whatever the VLAN tag was wrapping
      eth = ethernet(src=packet.src, dst=packet.dst, type=v.eth_type)
      print type(i)
      eth.set_payload(i)
      # first 4 bytes are 00 00 08 00 (null short, then IPv4 ethertype)
      totap = struct.pack('!bbbb', 0, 0, 8, 0) + eth.pack()
      #print totap.encode('hex')
      self.tap.write(totap)
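
The same untag-and-prefix step can be sketched on raw bytes, which makes it easy to test away from the controller. This is a standalone Python 3 illustration, not the code above: strip_vlan and to_tap are hypothetical helper names, and unlike the version above (which always writes 0x0800) this sketch copies the real ethertype into the TAP header, which is my own variation.

```python
import struct

VLAN_TPID = 0x8100   # ethertype that marks an 802.1Q tag
ETH_HEADER_LEN = 14

def strip_vlan(frame):
    """Return the frame with one 802.1Q tag removed, or unchanged if untagged."""
    if len(frame) < ETH_HEADER_LEN:
        raise ValueError("frame too short for an Ethernet header")
    (tpid,) = struct.unpack("!H", frame[12:14])
    if tpid != VLAN_TPID:
        return frame
    # Bytes 12..15 hold the TPID and the tag control field; drop them
    return frame[:12] + frame[16:]

def to_tap(frame):
    """Prefix the 4-byte TAP header: zero flags, then the untagged ethertype."""
    untagged = strip_vlan(frame)
    (ethertype,) = struct.unpack("!H", untagged[12:14])
    return struct.pack("!HH", 0, ethertype) + untagged

# Made-up sample: an ARP frame tagged with VLAN 10
macs = bytes.fromhex("ffffffffffff") + bytes.fromhex("0242ac110002")
tagged = macs + struct.pack("!HH", VLAN_TPID, 10) + struct.pack("!H", 0x0806) + b"arp-body"
untagged = macs + struct.pack("!H", 0x0806) + b"arp-body"

out = to_tap(tagged)
print(out[:4].hex(), len(out))
```

If your switch sends untagged packets, strip_vlan() simply returns the frame as it came in, so the same to_tap() call works for both cases.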

And we call this when a packet comes to us with a multicast MAC or our MAC:

if packet.dst == EthAddr(self.tap.hwaddr):
      print "Packet for us!"
      SendToTap()
      return

if packet.dst.isMulticast():
      SendToTap()
      flood() # 3a
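
The decision being made here is small enough to pull out and test on its own. The sketch below is a standalone Python 3 toy with made-up names (is_multicast, tap_action) and MAC addresses written as strings. It relies on the fact that a multicast or broadcast MAC has the lowest bit of its first octet set.

```python
def is_multicast(mac):
    """True if the group bit (lowest bit of the first octet) is set; covers broadcast too."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 1)

def tap_action(dst_mac, tap_mac):
    """Decide what to do with a packet from the switch (hypothetical helper)."""
    if dst_mac.lower() == tap_mac.lower():
        return "tap"          # addressed to the controller's TAP interface only
    if is_multicast(dst_mac):
        return "tap+flood"    # copy to the TAP interface and flood as usual
    return "switch"           # ordinary unicast, leave it to the learning switch

TAP_MAC = "02:42:ac:11:00:0d"  # made-up address for the example
print(tap_action("02:42:AC:11:00:0D", TAP_MAC))
print(tap_action("ff:ff:ff:ff:ff:ff", TAP_MAC))
print(tap_action("00:1b:21:aa:bb:cc", TAP_MAC))
```

Broadcast ARP requests fall into the second case, which is what lets other hosts resolve the controller's address.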

Now the tunnel is good to go. Just make sure any devices plugged into the switch have an MTU of 1300, and you can talk to the controller and even transfer files off it with SCP (30 minutes to copy an Ubuntu ISO at around 4Mb/s).

A couple of hiccups


Packet sizes

My switch doesn't seem to handle having the packet-size value changed. POX by default tells the switch to send the first 128 bytes of packets, and while we can send messages to increase this, they're ignored. The workaround is to change DEFAULT_MISS_SEND_LEN to 1400 in pox/openflow/libopenflow_01.py

Jitter

Latency varies from 1ms to 50ms, and TCP really, really doesn't like this. Routing protocols that don't run over TCP, like OSPF (which sits directly on top of IP), shouldn't notice, and even TCP-based routing protocols like BGP should be fine. Bulk TCP transfers get really confused, though, so you shouldn't expect any large data flows to work well with this.

MTU sizes

This stuff confuses me. I'm a network engineer, and I'm supposed to know this stuff, but I don't. When we read from the TAP device, we read the MTU + 24 bytes. There's 14 bytes for the Ethernet header, 4 bytes for the TAP header, and another 6 bytes in there for no obvious reason. 24 bytes just seems to work, and I have no idea why.
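
Spelled out as arithmetic, using only the numbers above (the constant names are mine):

```python
MTU = 1300             # MTU set on the TAP interface earlier
ETHERNET_HEADER = 14   # dst MAC (6) + src MAC (6) + ethertype (2)
TAP_HEADER = 4         # flags (2) + protocol (2)
READ_PADDING = 24      # the value that "just seems to work"

accounted_for = ETHERNET_HEADER + TAP_HEADER
unexplained = READ_PADDING - accounted_for
read_size = MTU + READ_PADDING

print(accounted_for, unexplained, read_size)
```

So 18 of the 24 bytes are accounted for, and each read asks for 1324 bytes.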

TAP device

Two things bug me about this: there doesn't seem to be a nice way to bring it up (apart from using ifconfig), and you need root to create it in the first place. I'd want to fix both of these for a nicer solution.

Next steps

  • TAP devices could be created for each physical port on an OpenFlow device, or as routed interfaces for each VLAN - limitless opportunities here
  • BIRD or Quagga could bind to a TAP device, and the controller could turn routes into flows. BIRD has a python interface, but since both use standard routing protocols, you could easily sniff the traffic and build routing tables out of these. Sniffing BGP updates is still way easier than trying to build a Python TCP stack
  • VRFs? Traffic injection? Just another example of how easy it is to grab POX and do novel things with inexpensive hardware

Thursday, March 29, 2012

Polishing pyswitch


I've had my modified version of pyswitch running on NOX for a couple of weeks, and it's working fine. The key to OpenFlow is the controller - if your controller is processing a lot of packets, then it's a bottleneck; but if all your traffic is matching flows in the switch, then it will work at line speed.

As I've been using the switch for more and more test servers, I've noticed that my modifications have oversimplified things a little. Here's a summary of the current pyswitch logic:

  1. If a packet doesn't match a flow in the switch, send to the controller
  2. For each packet sent to the controller, save the source address and source port
  3. If the controller gets a packet with a destination address it knows, it sends it to that port and installs a new flow into the switch

Do you see the problem? It's fine with two computers on the switch, but here's how it works with three:

  1. PC A sends a packet to PC B. No flows in the switch so the controller gets the packet, saves the address and port of A, and floods the packet
  2. PC B replies. No flow matched, controller gets the packet, saves the address and port of B, and recognises PC A. Controller then forwards the packet to the port that PC A was seen on, and installs a flow into the switch
  3. PC A sends another packet to PC B. No flow matched, controller gets the packet, recognises address of PC B so it forwards the packet and stores a flow in the switch.
  4. Flows are in the switch for both PC A and PC B, so packets to them are sent at line speed without touching the controller

What happens when PC C comes along?

  1. PC C sends a packet to PC A. There is a flow for this, so it is forwarded at line speed in the switch
  2. PC A replies to PC C. No flow, so the controller gets the packet, saves the source details (address and port of PC A), doesn't have details of PC C so it floods the packet

The source details of PC C never get stored, because all its outbound packets match flows in the switch. This is a serious problem: every packet sent back to PC C misses in the switch and goes through the OpenFlow controller, which handles about 10 packets per second, and that breaks the network.
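
The failure is easy to reproduce in a toy model. The sketch below is standalone Python 3 and has nothing to do with the NOX API: DstOnlySwitch is a made-up class that keeps a destination-only flow table and a controller MAC table, and it ignores timeouts entirely.

```python
FLOOD = "flood"

class DstOnlySwitch(object):
    """Toy model: flows match on destination MAC only (hypothetical, no timeouts)."""
    def __init__(self):
        self.flows = {}            # dst MAC -> output port, held in the switch
        self.mac_to_port = {}      # controller's learned table
        self.controller_hits = 0   # packets that had to go to the controller

    def send(self, src, dst, in_port):
        """Return the output port for this packet, or FLOOD."""
        if dst in self.flows:
            # Line-rate path: the controller never sees the source address
            return self.flows[dst]
        self.controller_hits += 1
        self.mac_to_port[src] = in_port
        if dst in self.mac_to_port:
            self.flows[dst] = self.mac_to_port[dst]
            return self.flows[dst]
        return FLOOD

sw = DstOnlySwitch()
sw.send("A", "B", 1)   # flooded, controller learns A
sw.send("B", "A", 2)   # controller learns B, installs a flow for A
sw.send("A", "B", 1)   # controller installs a flow for B
sw.send("C", "A", 3)   # matches the flow for A, so C is never learned
replies = [sw.send("A", "C", 1) for _ in range(5)]

print(replies, "C" in sw.mac_to_port, sw.controller_hits)
```

Every reply to PC C is flooded via the controller, no matter how many packets PC C sends.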

The original pyswitch didn't have this problem - it created very specific flows based on all the source and destination attributes. I could have fixed it up to handle VLANs better (by making it recognise ethertype 0x8100 as VLAN and move up the header for the actual ethertype), but this isn't efficient - a connection to a website would have 2 flows for the original ARP requests, another 2 for the DNS lookup, and another 2 for the TCP connection - 6 flows for a single web page?

We could strike a compromise and set flows based on the source and destination MAC addresses, but I still don't like that. It means that for N MAC addresses on the switch, you go from N flows to NxN flows; for a 48-port switch, this is from 48 flows to 2,304 flows. It may be a case of trading extra flows for simpler code, but I think I have a better solution.
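
For a quick feel of how the two approaches scale (flow_counts is just a made-up helper for the arithmetic):

```python
def flow_counts(n_macs):
    """Return (destination-only flows, source+destination flows) for n_macs hosts."""
    per_destination = n_macs
    per_pair = n_macs * n_macs   # the NxN figure used above
    return per_destination, per_pair

for n in (3, 48, 1000):
    print(n, flow_counts(n))
```

For 48 hosts that is 48 flows against 2,304, and the gap only gets worse as the switch fills up.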

My new addition to pyswitch adds a flow to the switch whenever it has to flood a packet. The idea is, when PC C comes along and sends a packet, we want that to go to the controller, even if we know the destination. Here's the new code:

# --
# If we've learned the destination MAC set up a flow and
# send only out of its inport.  Else, flood.
# --
def forward_l2_packet(dpid, inport, packet, buf, bufid):    
    dstaddr = packet.dst.tostring()
    if not ord(dstaddr[0]) & 1 and inst.st[dpid].has_key(dstaddr):
        prt = inst.st[dpid][dstaddr]
        if  prt[0] == inport:
            log.err('**warning** learned port = inport', system="pyswitch")
            logger.info('**warning** learned port = inport')
            inst.send_openflow(dpid, bufid, buf, openflow.OFPP_ALL, inport)
        else:
            # We know the outport, set up a flow
            log.msg('installing flow for ' + mac_to_str(packet.dst), system="pyswitch")
            logger.info('installing flow for ' + mac_to_str(packet.dst))
            # delete src flow if exists
            delflow = {}
            delflow[core.DL_SRC] = packet.dst
            inst.delete_datapath_flow(dpid, delflow)
            # sam edit - just load dest address, the rest doesn't matter
            flow = create_l2_out_flow(packet)
            actions = [[openflow.OFPAT_OUTPUT, [0, prt[0]]]]
            inst.install_datapath_flow(dpid, flow, CACHE_TIMEOUT, 
                                       openflow.OFP_FLOW_PERMANENT, actions,
                                       bufid, openflow.OFP_DEFAULT_PRIORITY,
                                       inport, buf)
    else:    
        # haven't learned destination MAC. Flood 
        if ord(dstaddr[0]) & 1:
            logger.info('broadcast/multicast packet to ' + mac_to_str(packet.dst) + ', flooding')
            inst.send_openflow(dpid, bufid, buf, openflow.OFPP_ALL, inport)
        else:
            logger.info('no MAC known for ' + mac_to_str(packet.dst) + ', flooding')
            # set up flow to capture source packet
            flow = {}
            flow[core.DL_SRC] = packet.dst
            actions = [[openflow.OFPAT_OUTPUT, [65535, openflow.OFPP_CONTROLLER]]]
            inst.send_openflow(dpid, bufid, buf, openflow.OFPP_ALL, inport)
            inst.install_datapath_flow(dpid, flow, CACHE_TIMEOUT,
                                       1, actions,
                                       None, openflow.OFP_DEFAULT_PRIORITY+1,
                                       None, None)

Pay attention to the install_datapath_flow() calls. If we start from the bottom, you'll see that the else branch is a lot larger. Broadcast/multicast packets just get flooded, but a packet to an unknown unicast destination also installs a flow (at default priority+1) that matches the unknown address as a source, so that any packets from this unknown host come to the controller. This is matched by a delete_datapath_flow() call further up the function: when a destination flow is installed, it first tries to delete any flow that matches that address as a source.
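
Here is the same idea as a toy model, so the effect can be checked without a switch. It is a standalone Python 3 sketch, not NOX code: PatchedSwitch is a made-up class, the source-match "trap" flows are a plain set, and timeouts are ignored (the real trap flow above has a hard timeout of 1 second).

```python
FLOOD = "flood"

class PatchedSwitch(object):
    """Toy model of destination flows plus higher-priority source traps (hypothetical)."""
    def __init__(self):
        self.dst_flows = {}       # dst MAC -> output port, default priority
        self.src_traps = set()    # src MACs sent to the controller, priority + 1
        self.mac_to_port = {}     # controller's learned table
        self.controller_hits = 0

    def send(self, src, dst, in_port):
        """Return the output port for this packet, or FLOOD."""
        if src not in self.src_traps and dst in self.dst_flows:
            return self.dst_flows[dst]       # line-rate path
        self.controller_hits += 1
        self.mac_to_port[src] = in_port
        if dst in self.mac_to_port:
            self.src_traps.discard(dst)      # the delete_datapath_flow() step
            self.dst_flows[dst] = self.mac_to_port[dst]
            return self.dst_flows[dst]
        self.src_traps.add(dst)              # trap packets *from* the unknown host
        return FLOOD

sw = PatchedSwitch()
sw.send("A", "B", 1)
sw.send("B", "A", 2)
sw.send("A", "B", 1)
sw.send("C", "A", 3)           # still line rate, C not learned yet
first = sw.send("A", "C", 1)   # flooded, and a trap for source C is installed
sw.send("C", "A", 3)           # trap fires, controller finally learns C
second = sw.send("A", "C", 1)  # now forwarded to C's port, trap removed

print(first, second, sw.controller_hits)
```

The first reply to PC C is still flooded, but PC C's next packet hits the trap, and from then on traffic in both directions stays in the switch.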

How does it perform? Each new flow sends roughly 3 packets to the controller (the first unknown, and a couple because of our source-match flow - it doesn't get deleted before the next queued packet comes through), but we get our O(N) flows in the table. If we look at our ARP + UDP + TCP example from before, it performs way better: the old pyswitch sends 6 packets to the controller to set up 6 flows, while the new version also sends about 6 packets but sets up only 2 flows. This means it uses the controller as much as the old, specific pyswitch, but uses a fraction of the flows.

OFPP_FLOOD vs OFPP_ALL

One extra note for those of you who haven't spotted it - I've changed the action from OFPP_FLOOD to OFPP_ALL. The Pronto 3290 we have at work has always responded to FLOOD messages weirdly - it looks like it sets up individual flows for each active port, and after trawling through the OpenFlow spec I've figured out why:

OpenFlow-only switches support only the required actions below, while OpenFlow-enabled switches, routers, and access points may also support the NORMAL action. Either type of switch can also support the FLOOD action.

Required Action: Forward. OpenFlow switches must support forwarding the packet to physical ports and the following virtual ones:

  • ALL: Send the packet out all interfaces, not including the incoming interface.
  • CONTROLLER: Encapsulate and send the packet to the controller.
  • LOCAL: Send the packet to the switch's local networking stack.
  • TABLE: Perform actions in flow table. Only for packet-out messages.
  • IN_PORT: Send the packet out the input port.

Optional Action: Forward. The switch may optionally support the following virtual ports:

  • NORMAL: Process the packet using the traditional forwarding path supported by the switch (i.e., traditional L2, VLAN, and L3 processing.) The switch may check the VLAN field to determine whether or not to forward the packet along the normal processing route. If the switch cannot forward entries for the OpenFlow-specific VLAN back to the normal processing route, it must indicate that it does not support this action.
  • FLOOD: Flood the packet along the minimum spanning tree, not including the incoming interface.

See the difference? FLOOD is an optional action that activates any spanning-tree code in the switch. It's not as intensive as NORMAL (which only true hybrid switches will support), but it isn't what pyswitch is supposed to do. Changing the code to use OFPP_ALL instead of OFPP_FLOOD seems to make the switch work less on each packet that comes back from the controller - and this means the controller can handle even more flows per second!
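
For reference, these are the reserved port numbers that OpenFlow 1.0 assigns to those virtual ports, grouped the way the spec excerpt groups them. The constants are written out by hand here rather than imported from NOX, and the two sets and the helper function are my own naming.

```python
# OpenFlow 1.0 reserved port numbers (the ofp_port enum)
OFPP_IN_PORT = 0xfff8
OFPP_TABLE = 0xfff9
OFPP_NORMAL = 0xfffa
OFPP_FLOOD = 0xfffb
OFPP_ALL = 0xfffc
OFPP_CONTROLLER = 0xfffd
OFPP_LOCAL = 0xfffe

# Grouping taken from the spec text quoted above
REQUIRED_VIRTUAL_PORTS = {OFPP_ALL, OFPP_CONTROLLER, OFPP_LOCAL, OFPP_TABLE, OFPP_IN_PORT}
OPTIONAL_VIRTUAL_PORTS = {OFPP_NORMAL, OFPP_FLOOD}

def is_safe_everywhere(port):
    """True if every OpenFlow 1.0 switch has to support this virtual port."""
    return port in REQUIRED_VIRTUAL_PORTS

print(is_safe_everywhere(OFPP_ALL), is_safe_everywhere(OFPP_FLOOD))
```

OFPP_ALL lands in the required set and OFPP_FLOOD in the optional one, which is the whole reason for the swap.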

Here's a code dump of my latest version. I may polish it and send it back to the NOX dudes later if I get the time:

# Copyright 2008 (C) Nicira, Inc.
# This file is part of NOX. Additions from Sam Russell for
# compatibility with OVS on Pronto 3290
# NOX is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# NOX is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with NOX.  If not, see <http://www.gnu.org/licenses/>.
# Python L2 learning switch 
#
# ----------------------------------------------------------------------
#
# This app functions as the control logic of an L2 learning switch for
# all switches in the network. On each new switch join, it creates 
# an L2 MAC cache for that switch. 
#
# In addition to learning, flows are set up in the switch for learned
# destination MAC addresses.  Therefore, in the absence of flow-timeout,
# pyswitch should only see one packet per flow (where flows are
# considered to be unidirectional)
#

from nox.lib.core     import *

from nox.lib.packet.ethernet     import ethernet
from nox.lib.packet.packet_utils import mac_to_str, mac_to_int

from twisted.python import log

import logging
from time import time
from socket import htons
from struct import unpack

logger = logging.getLogger('nox.coreapps.examples.pyswitch')

# Global pyswitch instance 
inst = None

# Timeout for cached MAC entries
CACHE_TIMEOUT = 5

# Modified extract_flow except just dest address - another sam edit
def create_l2_out_flow(ethernet):
    attrs = {}
    attrs[core.DL_DST] = ethernet.dst
#    attrs[core.DL_SRC] = ethernet.src
    return attrs

# --
# Given a packet, learn the source and peg to a switch/inport 
# --
def do_l2_learning(dpid, inport, packet):
    global inst 
    logger.info('learning MAC for incoming packet...' + mac_to_str(packet.src))
    # learn MAC on incoming port
    srcaddr = packet.src.tostring()
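    if ord(srcaddr[0]) & 1:
        # group bit set: a multicast/broadcast source is never learned
        log.msg('source MAC is multicast, not learning', system='pyswitch')
        logger.info('source MAC is multicast, not learning')
        return
    if inst.st[dpid].has_key(srcaddr):
        dst = inst.st[dpid][srcaddr]
        if dst[0] != inport:
            # dst[0] is the port we previously learned for this MAC
            log.msg('MAC has moved from '+str(dst[0])+' to '+str(inport), system='pyswitch')
            logger.info('MAC has moved from '+str(dst[0])+' to '+str(inport))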
        else:
            return
    else:
        logger.info('learned MAC '+mac_to_str(packet.src)+' on %d %d'% (dpid,inport))

    # learn or update timestamp of entry
    inst.st[dpid][srcaddr] = (inport, time(), packet)

    # Replace any old entry for (switch,mac).
    mac = mac_to_int(packet.src)

# --
# If we've learned the destination MAC set up a flow and
# send only out of its inport.  Else, flood.
# --
def forward_l2_packet(dpid, inport, packet, buf, bufid):    
    dstaddr = packet.dst.tostring()
    if not ord(dstaddr[0]) & 1 and inst.st[dpid].has_key(dstaddr):
        prt = inst.st[dpid][dstaddr]
        if  prt[0] == inport:
            log.err('**warning** learned port = inport', system="pyswitch")
            logger.info('**warning** learned port = inport')
            inst.send_openflow(dpid, bufid, buf, openflow.OFPP_ALL, inport)
        else:
            # We know the outport, set up a flow
            log.msg('installing flow for ' + mac_to_str(packet.dst), system="pyswitch")
            logger.info('installing flow for ' + mac_to_str(packet.dst))
            # delete src flow if exists
            delflow = {}
            delflow[core.DL_SRC] = packet.dst
            inst.delete_datapath_flow(dpid, delflow)
            # sam edit - just load dest address, the rest doesn't matter
            flow = create_l2_out_flow(packet)
            actions = [[openflow.OFPAT_OUTPUT, [0, prt[0]]]]
            inst.install_datapath_flow(dpid, flow, CACHE_TIMEOUT, 
                                       openflow.OFP_FLOW_PERMANENT, actions,
                                       bufid, openflow.OFP_DEFAULT_PRIORITY,
                                       inport, buf)
    else:    
        # haven't learned destination MAC. Flood 
        if ord(dstaddr[0]) & 1:
            logger.info('broadcast/multicast packet to ' + mac_to_str(packet.dst) + ', flooding')
            inst.send_openflow(dpid, bufid, buf, openflow.OFPP_ALL, inport)
        else:
            logger.info('no MAC known for ' + mac_to_str(packet.dst) + ', flooding')
            # set up flow to capture source packet
            flow = {}
            flow[core.DL_SRC] = packet.dst
            actions = [[openflow.OFPAT_OUTPUT, [65535, openflow.OFPP_CONTROLLER]]]
            inst.send_openflow(dpid, bufid, buf, openflow.OFPP_ALL, inport)
            inst.install_datapath_flow(dpid, flow, CACHE_TIMEOUT,
                                       1, actions,
                                       None, openflow.OFP_DEFAULT_PRIORITY+1,
                                       None, None)
        
# --
# Responsible for timing out cache entries.
# Is called every 1 second.
# --
def timer_callback():
    global inst

    curtime  = time()
    for dpid in inst.st.keys():
        for entry in inst.st[dpid].keys():
            if (curtime - inst.st[dpid][entry][1]) > CACHE_TIMEOUT:
                log.msg('timing out entry'+mac_to_str(entry)+str(inst.st[dpid][entry])+' on switch %x' % dpid, system='pyswitch')
                inst.st[dpid].pop(entry)

    inst.post_callback(1, timer_callback)
    return True

def datapath_leave_callback(dpid):
    logger.info('Switch %x has left the network' % dpid)
    if inst.st.has_key(dpid):
        del inst.st[dpid]

def datapath_join_callback(dpid, stats):
    logger.info('Switch %x has joined the network' % dpid)

# --
# Packet entry method.
# Drop LLDP packets (or we get confused) and attempt learning and
# forwarding
# --
def packet_in_callback(dpid, inport, reason, len, bufid, packet):

    if not packet.parsed:
        log.msg('Ignoring incomplete packet',system='pyswitch')
        # actually ignore it, as the log message says
        return CONTINUE
        
    if not inst.st.has_key(dpid):
        log.msg('registering new switch %x' % dpid,system='pyswitch')
        inst.st[dpid] = {}

    # don't forward lldp packets    
    if packet.type == ethernet.LLDP_TYPE:
        return CONTINUE

    # learn MAC on incoming port
    do_l2_learning(dpid, inport, packet)

    forward_l2_packet(dpid, inport, packet, packet.arr, bufid)

    return CONTINUE

class pyswitch(Component):

    def __init__(self, ctxt):
        global inst
        Component.__init__(self, ctxt)
        self.st = {}

        inst = self

    def install(self):
        inst.register_for_packet_in(packet_in_callback)
        inst.register_for_datapath_leave(datapath_leave_callback)
        inst.register_for_datapath_join(datapath_join_callback)
        inst.post_callback(1, timer_callback)

    def getInterface(self):
        return str(pyswitch)

def getFactory():
    class Factory:
        def instance(self, ctxt):
            return pyswitch(ctxt)

    return Factory()