Sunday, June 2, 2013

Brocade and Juniper Interop - OSPF, MPLS, VLL/VPLS, and VRF interconnects

From the start

We run a mix of Brocade MLX and Juniper MX80s at work, and I've spent the last week trying to make them talk to each other properly. You'd think that by 2013 a multi-vendor network would just work using standardised protocols, but it's still quite time-consuming to find out which approaches work and which most certainly don't. Oddly enough, I'm not the only person who's been working on this recently - Nick Buraglio has done a bit in the last couple of weeks too (thanks for the help on this).

As SDN starts to take over, this will become much less of a problem, but until then, here's how to do the following with Juniper MX-series routers and Brocade MLX/XMR chassis:
  • Jumbo frames
  • OSPF
  • MPLS
  • VPLS
  • VLL/l2circuit
  • Tagging VPLS/VLL/l2circuit into a VRF on Juniper
Disclaimer - the MX80 chassis has lots of stuff built in, including a tunnel services PIC. We need this for some of what follows, and I'm not sure how it works with bigger chassis.

Jumbo frames

This should be easy, but there are a couple of things that will trip you up if you aren't careful. The maximum frame size on the Brocades is 9216 bytes, and on the Junipers it's 9192 bytes. I tried setting the frame size on the Brocades down to 9192 bytes and found a weird quirk: I could send 9146-byte pings from the Juniper, but the Brocade would only respond to 9142-byte pings. It appears the Brocades include the 4-byte FCS in their count of frame size.
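To sanity-check those numbers, here's the header arithmetic as a small Python sketch. It assumes the test pings crossed an 802.1Q-tagged link (14-byte Ethernet header plus a 4-byte tag) and that the Brocade also counts the 4-byte FCS. The function name and defaults are just for illustration.

```python
# Standard header sizes in bytes.
ETH_HEADER = 14   # dst MAC + src MAC + EtherType
DOT1Q_TAG = 4     # 802.1Q VLAN tag (assumed present on the test link)
FCS = 4           # Ethernet frame check sequence
IP_HEADER = 20    # IPv4 header without options
ICMP_HEADER = 8   # ICMP echo header


def max_ping_payload(l2_mtu, tagged=True, counts_fcs=False):
    """Largest ICMP echo payload that fits in one unfragmented frame."""
    overhead = ETH_HEADER + (DOT1Q_TAG if tagged else 0)
    if counts_fcs:
        overhead += FCS  # the Brocade appears to count the FCS as part of the frame
    return l2_mtu - overhead - IP_HEADER - ICMP_HEADER


juniper = max_ping_payload(9192)                   # 9192 - 18 - 28
brocade = max_ping_payload(9192, counts_fcs=True)  # 9192 - 22 - 28
print(juniper, brocade)  # 9146 9142
```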

In the end, the best solution was to just leave both routers at their maximum values, set VPLS/VLL MTUs to 9100 bytes (or some number with a bit of headroom over 9000), and IP MTUs to 9000 (except for OSPF interfaces, but we'll get to that soon).

The main thing to remember is that on the Junipers you can set the MTU on the physical interface, or inside a "family XXX" stanza, but not directly on a logical interface. If you want to set the IP MTU for a logical interface, it goes in "interfaces XXX -> unit Y -> family inet -> mtu 9000".

OSPF

You *can* stand up OSPF with Jumbo frames, but it's fine with 1500 byte frames. We're in the position of introducing Junipers into our Brocade OSPF cloud, and since the defaults for Brocade are already 1500 bytes, it's easier to step the Junipers down than bring the Brocades up. I've set up the Brocade-facing interface like this:

interfaces {
    ge-1/0/0 {
        vlan-tagging;
        mtu 9192;
        unit 1000 {
            vlan-id 1000;
            family inet {
                mtu 1500;
                address 10.1.2.1/30;
            }
            family mpls;
        }
    }
}

The Brocade end looks like this:

interface ve 100
 bfd interval 100 min-rx 100 multiplier 3
 ip ospf area 0
 ip ospf cost 100
 ip ospf dead-interval 40
 ip ospf hello-interval 10
 ip address 10.2.3.2/30
 ip mtu 1500
!

The only tricky bit is making sure the IP MTU is the same at each end. If a huge route update arrives on an interface that can't take the whole packet, you'll end up blackholing routes. Junipers are supposed to refuse to bring up OSPF when there's an IP MTU mismatch, but it doesn't always work, so it pays to test with pings: you should be able to ping in either direction with 1472 bytes of data (1500 - 20-byte IP header - 8-byte ICMP header).
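The same arithmetic works as a quick pre-flight check for an OSPF link. This is a hypothetical helper and isn't part of either vendor's tooling. When you run the ping itself, set the don't-fragment bit. Without it, an oversized ping can get fragmented and succeed, which hides the mismatch.

```python
IP_HEADER = 20    # IPv4 header, no options
ICMP_HEADER = 8   # ICMP echo header


def df_ping_size(ip_mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return ip_mtu - IP_HEADER - ICMP_HEADER


def check_ospf_link(name, ip_mtu_a, ip_mtu_b):
    """Refuse mismatched IP MTUs; otherwise return the DF ping size to test with."""
    if ip_mtu_a != ip_mtu_b:
        raise ValueError(f"{name}: IP MTU mismatch ({ip_mtu_a} vs {ip_mtu_b})")
    return df_ping_size(ip_mtu_a)


print(check_ospf_link("ge-1/0/0.1000 <-> ve 100", 1500, 1500))  # 1472
```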

MPLS

This is pretty straightforward - set up loopback interfaces on each end, and enable RSVP and LDP. We'll use RSVP for the outer tags, and LDP for the inner tags - no clever BGP signalling here.

Juniper:

protocols {
    rsvp {
        interface all {
            disable;
        }
        interface ge-1/0/0.1000;
        interface lo0.0;
    }
    mpls {                              
        label-switched-path 1-to-2 {
            from 10.0.0.1;
            to 10.0.0.2;
            fast-reroute;
        }
        label-switched-path 1-to-3 {
            from 10.0.0.1;
            to 10.0.0.3;
            fast-reroute;
        }
        interface all {
            disable;
        }
        interface ge-1/0/0.1000;
    }
    ospf {
        traffic-engineering;
        area 0.0.0.0 {
            interface all {
                disable;
            }
            interface ge-1/0/0.1000 {
                hello-interval 3;       
                dead-interval 12;
                bfd-liveness-detection {
                    minimum-interval 300;
                    multiplier 3;
                }
            }
            interface lo0.0 {
                passive;
            }
        }
    }
    ldp {
        interface all {
            disable;
        }
        interface lo0.0;
    }
}

Brocade:

router mpls
 policy
  traffic-eng ospf


 mpls-interface ve100


 path 3-to-1
  loose 10.0.0.1                                                  

 path 3-to-2
  loose 10.0.0.2

 path S3-to-1
  loose 10.0.0.1

 path S3-to-2
  loose 10.0.0.2


 lsp LSP-3-to-1
  to 10.0.0.1
  primary 3-to-1
  secondary S3-to-1
    standby
  frr
  revert-timer 30
  enable

 lsp LSP-3-to-2
  to 10.0.0.2
  primary 3-to-2                                                  
  secondary S3-to-2
    standby
  frr
  revert-timer 30
  enable
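If you're new to label stacks, here's roughly what ends up on the wire for a VLL or VPLS between these boxes. The outer transport label is signalled by RSVP along the LSP. The inner VC label is exchanged between the two PEs over targeted LDP and is marked bottom-of-stack. Below is a Python sketch of the RFC 3032 label stack encoding; the label values and default TTL are made up.

```python
import struct


def mpls_entry(label, tc=0, bottom=False, ttl=64):
    """Pack one 32-bit label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    if not (0 <= label < 1 << 20 and 0 <= tc < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def vc_label_stack(transport_label, vc_label):
    """Outer RSVP transport label, then the LDP-signalled VC label at the bottom."""
    return mpls_entry(transport_label) + mpls_entry(vc_label, bottom=True)


stack = vc_label_stack(299776, 800002)  # made-up label values
print(stack.hex())  # 49300040c3502140
```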

VPLS

This is where it gets interesting. In my opinion, Brocade does the right thing (packets come out of a VLAN-tagged "pipe", and then go into a VPLS "pipe"), whereas Juniper does it at a lower and less-abstract level (packets with headers that get altered) - it's more flexible, but it's harder to make it do the right thing.

Raw mode, tagged mode, and tags inside raw mode

VPLS has two modes: raw mode creates a broadcast domain between all peers on the same VPLS, whereas tagged mode allows you to use inner tags within a VPLS circuit. The problem comes when Juniper sends 802.1q VLAN-tagged packets through a raw-mode VPLS. You end up with traffic that can go one way but not the other, and it's all quite confusing.

Raw mode interop

Check out these configs:

On Juniper, we make a routing instance and add interfaces to it. They can be VLAN-tagged or normal access ports, and there's a very important trick to make it all use raw mode properly: the line "vlan-id none". If you leave it out, packets from an untagged port go through fine, but packets from tagged ports come through with their 802.1q VLAN tags still on them. On Brocade, a VPLS gets delivered to a mix of tagged and untagged ports, but all traffic is sent as normal untagged Ethernet. The "vlan-id none" line makes the Junipers behave the same way. The config below delivers VPLS 40 untagged at both ends, and VPLS 140 tagged as VLAN 140.

Don't worry too much about the MTUs - they need to match up, but they don't appear to be enforced. We picked 9100 as it's well under the 9192 byte hardware MTU, but well above the 9000 byte IP MTU - a bit of leeway in each direction.

Juniper:

routing-instances {
    vpls-40 {
        description vpls-40;
        instance-type vpls;
        vlan-id none;
        interface ge-1/1/9.40;           
        protocols {
            vpls {
                no-tunnel-services;
                vpls-id 40;
                mtu 9100;
                neighbor 10.0.0.2;
                neighbor 10.0.0.3;
            }
        }
    }
    vpls-140 {
        description vpls-140;
        instance-type vpls;
        vlan-id none;
        interface ge-1/1/9.140;           
        protocols {
            vpls {
                no-tunnel-services;
                vpls-id 140;
                mtu 9100;
                neighbor 10.0.0.2;
                neighbor 10.0.0.3;
            }
        }
    }
}
interfaces {
    ge-1/1/9 {
        flexible-vlan-tagging;
        native-vlan-id 40;
        mtu 9192;
        encapsulation flexible-ethernet-services;
        unit 40 {
            encapsulation vlan-vpls;
            vlan-id 40;
            family vpls;
        }
        unit 140 {
            encapsulation vlan-vpls;
            vlan-id 140;
            family vpls;
        }
    }
}

Brocade:

router mpls
 vpls vlan40 40 
  vpls-peer 10.0.0.1 10.0.0.2
  vpls-mtu 9100
  vlan 40
   untagged ethe 1/5 


 vpls vlan140 140 
  vpls-peer 10.0.0.1 10.0.0.2
  vpls-mtu 9100
  vlan 140
   tagged ethe 1/5 

If you're interested in the mechanics behind the Juniper implementation, the "show interface" command gives you a bit of insight - Juniper interprets the "vlan-id none" line in the routing instance and converts that to tag push/pop operations on the interface:

  Logical interface ge-1/1/9.40 (Index 332) (SNMP ifIndex 564) 
    Flags: SNMP-Traps 0x0
    VLAN-Tag [ 0x8100.40 ] Native-vlan-id: 40 In(pop) Out(push 0x8100.40) 
    Encapsulation: VLAN-VPLS
    Input packets : 9 
    Output packets: 7
    Protocol vpls, MTU: 9192            
      Flags: Is-Primary

  Logical interface ge-1/1/9.140 (Index 333) (SNMP ifIndex 597) 
    Flags: SNMP-Traps 0x0 VLAN-Tag [ 0x8100.140 ] In(pop) Out(push 0x8100.140) 
    Encapsulation: VLAN-VPLS
    Input packets : 149 
    Output packets: 92
    Protocol vpls, MTU: 9192
      Flags: Is-Primary
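At the byte level, In(pop) and Out(push 0x8100.140) amount to removing or inserting the 4-byte 802.1Q tag that sits right after the source MAC address. The Python sketch below shows only that operation and is not how Juniper implements it. Frames enter the raw-mode VPLS untagged and get their tag back on the way out.

```python
import struct

TPID_8021Q = 0x8100


def pop_vlan(frame):
    """Strip an outer 802.1Q tag; returns (vlan_id or None, frame)."""
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None, frame  # untagged, nothing to pop
    return tci & 0x0FFF, frame[:12] + frame[16:]


def push_vlan(frame, vlan_id, pcp=0):
    """Insert an 802.1Q tag (TPID 0x8100) between the source MAC and EtherType."""
    tci = (pcp << 13) | (vlan_id & 0x0FFF)
    return frame[:12] + struct.pack("!HH", TPID_8021Q, tci) + frame[12:]


# A minimal untagged frame: dst MAC, src MAC, EtherType 0x0800, payload.
untagged = bytes.fromhex("ffffffffffff" "020000000001" "0800") + b"payload"
tagged = push_vlan(untagged, 140)   # like Out(push 0x8100.140)
vid, into_vpls = pop_vlan(tagged)   # like In(pop)
```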

VLL/l2circuit

This is the fun one. Tagged-mode VLLs are the easiest to get up and running, but raw-mode should be doable too. There is one problem though - the only way I can make raw-mode work on the Junipers looks like a filthy hack, but it produces the same results that we see above in the "show interface" output for VPLS.

First off, the configs for tagged mode:

Brocade:

router mpls
 vll vlan42 42
  vll-mtu 9100
  vll-peer 10.0.0.1
  vlan 42
   tagged e 1/5

Juniper:

interfaces {
    ge-1/1/9 {
        flexible-vlan-tagging;
        mtu 9192;
        encapsulation flexible-ethernet-services;
        unit 42 {
            encapsulation vlan-ccc;
            vlan-id 42;                     
            family ccc;
        }
    }
}

protocols {
    l2circuit {
        neighbor 10.0.0.3 {
            interface ge-1/1/9.42 {
                virtual-circuit-id 42;
                mtu 9100;
                encapsulation-type ethernet-vlan;
            }
        }
    }
}

This is all pretty straightforward, and works out of the box. Here's what the raw mode config looks like:

Brocade:

router mpls
 vll vlan41 41 raw-mode
  vll-mtu 9100
  vll-peer 10.0.0.1
  vlan 41                                                         
   tagged e 1/5

Juniper:

interfaces {
    ge-1/1/9 {
        flexible-vlan-tagging;
        mtu 9192;
        encapsulation flexible-ethernet-services;
        unit 41 {
            encapsulation vlan-ccc;
            vlan-id 41;
            input-vlan-map pop;
            output-vlan-map push;
            family ccc;
        }
    }
}
protocols {
    l2circuit {
        neighbor 10.0.0.3 {
            interface ge-1/1/9.41 {
                virtual-circuit-id 41;
                mtu 9100;
                encapsulation-type ethernet;
            }
        }
    }
}

As you can see, the default for Brocade is tagged mode, so we need to explicitly put it in raw mode. On the Juniper end, we set the encapsulation type on the l2circuit to ethernet instead of ethernet-vlan, but on its own this only works with encapsulation ethernet-ccc on the physical interface. With VLAN tagging enabled on the port, there doesn't seem to be any way to tell the MX80 about this. The way I've made it work is with the "input-vlan-map pop" and "output-vlan-map push" statements. These strip the VLAN tag on the way into the circuit and push it back on the way out, which gives the same push/pop operations we saw in the VPLS "show interface" output. Given the default for both Juniper and Brocade is tagged mode, and we need a bit of mad hax to make raw mode work here, it might make sense to just use tagged mode.
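To keep the two working combinations straight, here they are as a small Python lookup table that renders a unit stanza like the ones above. The helper names are hypothetical. The table only records what worked in my testing on a flexible-vlan-tagging port facing a Brocade, and any other combination is rejected as untested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JuniperVll:
    unit_encapsulation: str  # "encapsulation" under interfaces ... unit N
    encapsulation_type: str  # "encapsulation-type" under protocols l2circuit
    vlan_maps: bool          # input-vlan-map pop / output-vlan-map push


# Only the two combinations that worked in this post.
MATCHING = {
    "tagged": JuniperVll("vlan-ccc", "ethernet-vlan", vlan_maps=False),
    "raw-mode": JuniperVll("vlan-ccc", "ethernet", vlan_maps=True),
}


def juniper_side(brocade_mode):
    """Juniper settings for a Brocade VLL in 'tagged' (default) or 'raw-mode'."""
    if brocade_mode not in MATCHING:
        raise ValueError(f"untested Brocade VLL mode: {brocade_mode!r}")
    return MATCHING[brocade_mode]


def render_unit(unit, vlan_id, brocade_mode):
    """Render the interfaces ... unit stanza for one VLL."""
    cfg = juniper_side(brocade_mode)
    lines = [f"unit {unit} {{",
             f"    encapsulation {cfg.unit_encapsulation};",
             f"    vlan-id {vlan_id};"]
    if cfg.vlan_maps:
        lines += ["    input-vlan-map pop;", "    output-vlan-map push;"]
    lines += ["    family ccc;", "}"]
    return "\n".join(lines)


print(render_unit(41, 41, "raw-mode"))
```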

Tagging circuits to VRFs (Juniper only)

This was my favourite part. The way to do this seems to be lt- (logical tunnel) interfaces, which means you need to set up the tunnel services PIC (if you have one - the MX80s have one built in).

chassis {
    fpc 0 {
        pic 0 {
            tunnel-services {
                bandwidth 1g;
            }
        }
    }
}

This next part took a while to figure out, but it totally works - you just need to make sure the encapsulations match up.

Here's some config:

routing-instances {
    vrf {
        instance-type vrf;
        interface lt-0/0/10.1;
        route-distinguisher 1:2;
        vrf-target target:1:2;
        vrf-table-label;
    }
    vrf-43 {
        instance-type vrf;
        interface lt-0/0/10.3;
        route-distinguisher 1:42;
        vrf-target target:1:42;
        vrf-table-label;
    }
    vpls-40 {
        description vpls-40;
        instance-type vpls;
        vlan-id none;
        interface lt-0/0/10.2;           
        protocols {
            vpls {
                no-tunnel-services;
                vpls-id 40;
                mtu 9100;
                neighbor 10.0.0.2;
                neighbor 10.0.0.3;
            }
        }
    }
}
protocols {
    l2circuit {
        neighbor 10.0.0.2 {
            interface lt-0/0/10.4 {
                virtual-circuit-id 43;
                mtu 9100;
                encapsulation-type ethernet-vlan;
            }
        }
    }
}
interfaces {
    lt-0/0/10 {
        mtu 9192;
        unit 1 {
            encapsulation ethernet;
            peer-unit 2;
            family inet {
                mtu 9000;
                address 192.168.0.13/24;
            }
        }
        unit 2 {
            encapsulation ethernet-vpls;
            peer-unit 1;
        }
        unit 3 {
            encapsulation ethernet;
            peer-unit 4;
            family inet {
                mtu 9000;
                address 192.168.43.13/24;
            }
        }
        unit 4 {
            encapsulation ethernet-ccc;
            peer-unit 3;
        }
    }
}

This took about a day to get working, but it's simple once the encapsulations match up. The lt- interfaces take the place of crossover cables hanging out the back of your router, and the key is just to set the encapsulation types correctly on each pair of peer units. You can cheat a little with irb interfaces in VPLS routing instances, but this alters the chassis's main routing table directly. Doing it this way keeps everything locked down to a VRF, which is a bit nicer.
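Since getting the encapsulations to match is the whole trick, here's a hypothetical Python checker for lt- unit pairs. The "compatible" set only contains the pairings used in the config above, so anything else is flagged as untested rather than invalid.

```python
# Encapsulation pairs used for lt- peers in the config above: an "ethernet"
# unit (family inet, in the VRF) peered with an L2 unit in a VPLS or l2circuit.
COMPATIBLE = {
    frozenset({"ethernet", "ethernet-vpls"}),
    frozenset({"ethernet", "ethernet-ccc"}),
}


def check_lt_units(units):
    """units: {unit_number: (encapsulation, peer_unit)}; returns a list of problems."""
    problems = []
    for unit, (encap, peer) in sorted(units.items()):
        if peer not in units:
            problems.append(f"unit {unit}: peer-unit {peer} does not exist")
            continue
        peer_encap, peer_back = units[peer]
        if peer_back != unit:
            problems.append(f"unit {unit}: unit {peer} peers with {peer_back}, not {unit}")
        if frozenset({encap, peer_encap}) not in COMPATIBLE:
            problems.append(f"unit {unit}: {encap} <-> {peer_encap} is untested")
    return problems


# lt-0/0/10 from the config above.
LT_0_0_10 = {1: ("ethernet", 2), 2: ("ethernet-vpls", 1),
             3: ("ethernet", 4), 4: ("ethernet-ccc", 3)}
print(check_lt_units(LT_0_0_10))  # []
```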

End

I hope you've enjoyed this - let me know if there's anything I've missed or got wrong. The moral of the story here is - Juniper and Brocade can do VPLS and VLLs fine between each other - just watch out for the little quirks that would trip you up.

Monday, February 18, 2013

SamShares - Parsing financial data out of annual report PDFs

What's up

I've been doing a lot of financial research, and a big chunk of that is looking through financial reports, manually copying the fields for assets, liabilities, equity, EBIT etc. It's boring as hell, and takes a long time. Why can't we automate this?

Parsing PDFs

I started by forking PyPDF2 to give me better access to the underlying objects. It's a fairly good start for working with PDFs, but it just blurts out (some of) the text in a random order, which isn't what I want. This led me down a bit of a rabbit hole, and to downloading a copy of the PDF 1.7 reference and browsing through it, sections 5.2 and 5.3 in particular.

What's the plan?

  • Find the pages with assets/liabilities and income
  • Render them such that it's obvious where the columns and rows line up
  • Convert this to a spreadsheet
  • ???
  • PROFIT
For example, above is a screenshot from the annual report of New Zealand's largest NZX company, Fletcher Building. The PDF displays lovely rows and columns, but the text can't easily be accessed that way. If we can parse the PDF and render all the text in place, we can then make fairly accurate guesses at which rows and columns the values fall into.

Quick primer to text in PDF

Here are some of the operators you'll find for manipulating text in a PDF

BT, ET - Begin and end a text object. BT initialises the text matrix to the identity matrix, i.e. positioned at the origin of user space, which by default is the bottom-left corner of the page
Td, TD, T* - Operators to move to the start of the next line
Tm - Sets the text matrix. This is an affine transform with 6 parameters: the first 4 matter for manipulating the text itself (scaling, rotation, skew for italics), and the last two essentially just set the start point for the text. This is enough for us to cheat and guess where the text will go
Tc, Tw, Tf and lots more - Spacing and font settings

Tj, TJ - Display a text string. Tj does this simply; TJ takes an array of strings interleaved with numbers that adjust the spacing between them
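To make the operators concrete, here's a minimal tokenizer sketch in Python that pulls (x, y, text) runs out of a content stream. It uses plain `re`, not PyPDF2. It assumes the stream has already been decompressed and handles only literal (...) strings. It also applies Td offsets unscaled, as if the Tm scaling part were the identity, and ignores T*, leading, font encodings and glyph widths.

```python
import re

# Literal strings, array brackets, names, numbers, operators (in that order).
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|\[|\]|/[^\s/\[\]()<>]+"
                   r"|[-+]?(?:\d+\.?\d*|\.\d+)|[A-Za-z*'\"]+")


def text_runs(content):
    """Yield (x, y, text) for every Tj/TJ string in a decompressed content stream."""
    stack, array = [], None
    line_x = line_y = 0.0  # start of the current line (line matrix e, f)
    for tok in TOKEN.findall(content):
        if tok == "[":
            array = []
        elif tok == "]":
            stack.append(array)
            array = None
        elif tok.startswith("("):
            text = re.sub(r"\\(.)", r"\1", tok[1:-1])  # drop simple escapes
            (array if array is not None else stack).append(text)
        elif tok.startswith("/") or tok[0] in "+-.0123456789":
            value = tok if tok.startswith("/") else float(tok)
            (array if array is not None else stack).append(value)
        else:  # an operator consumes the operands collected so far
            if tok == "BT":
                line_x = line_y = 0.0
            elif tok == "Tm":
                line_x, line_y = stack[-2], stack[-1]
            elif tok in ("Td", "TD"):
                line_x += stack[-2]
                line_y += stack[-1]
            elif tok == "Tj":
                yield line_x, line_y, stack[-1]
            elif tok == "TJ":  # keep the strings, skip the kerning numbers
                yield line_x, line_y, "".join(p for p in stack[-1] if isinstance(p, str))
            stack = []


CONTENT = ("BT /F1 10 Tf 1 0 0 1 72 700 Tm (Total assets) Tj "
           "300 0 Td (4,512) Tj 0 -14 Td [(Total ) -20 (liabilities)] TJ ET")
runs = list(text_runs(CONTENT))
```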

Putting it all together

To parse a table out of a PDF, here's the rough idea:
  1. Locate all the strings on a page (BT/ET and TJ/Tj operators)
  2. Create a structure which ties the strings to locations (probably just Tm)
  3. Assign values row and column IDs
Once this is done, just check what is at the leftmost and topmost of each table, and use these as keys to the data. For the above image, the field "total assets" lined up with "June 2012" gives two results, so these just need to be referenced to the headers at the top, OR we can cheat and use the leftmost as this is generally the convention.
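Steps 2 and 3, plus the "leftmost column as key" trick, could look something like the sketch below. It assumes runs are (x, y, text) tuples like the ones a content-stream parser would produce, and the tolerances are guesses. Columns are clustered on the left edge of each string, which will need rethinking for right-aligned numbers. The sample figures are made up.

```python
def to_grid(runs, row_tol=2.0, col_tol=5.0):
    """Cluster (x, y, text) runs into a list of rows, top of the page first."""
    rows = []  # (y, [(x, text), ...]); PDF y grows upwards, so sort descending
    for x, y, text in sorted(runs, key=lambda r: -r[1]):
        if rows and abs(rows[-1][0] - y) <= row_tol:
            rows[-1][1].append((x, text))
        else:
            rows.append((y, [(x, text)]))
    cols = []  # left edge of each column, left to right
    for x in sorted(r[0] for r in runs):
        if not cols or x - cols[-1] > col_tol:
            cols.append(x)
    grid = []
    for _, cells in rows:
        row = [""] * len(cols)
        for x, text in cells:
            nearest = min(range(len(cols)), key=lambda i: abs(cols[i] - x))
            row[nearest] = text
        grid.append(row)
    return grid


def lookup(grid, row_label, col_header):
    """Use the top row as column keys and the leftmost column as row keys."""
    c = grid[0].index(col_header)
    for row in grid[1:]:
        if row[0] == row_label:
            return row[c]
    return None


# Made-up figures, positioned like a two-year balance sheet.
SAMPLE_RUNS = [
    (300, 700, "June 2012"), (400, 700, "June 2011"),
    (50, 680, "Total assets"), (300, 680, "100"), (400, 680, "90"),
    (50, 661, "Total liabilities"), (301, 660.5, "60"), (401, 660.5, "55"),
]
grid = to_grid(SAMPLE_RUNS)
print(lookup(grid, "Total assets", "June 2012"))  # 100
```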

Next steps

Assuming I can make all this work, the data will then just be stored in a DB of some sort, keyed by year and company. Once this is automated enough to just pull PDFs out of NZX announcements, it'll be left in the background accumulating data, eventually building a corpus of financial data from NZX companies that can be used to make financial analysis much, much quicker and more versatile than it currently is.
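As a possible starting point for that DB, here's a sketch using Python's built-in sqlite3 with one row per (company, year, field). The schema, function names and the "EXAMPLE" company in the usage are assumptions for illustration.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the figures table, keyed by company, year and field."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS figures (
                      company TEXT NOT NULL,
                      year    INTEGER NOT NULL,
                      field   TEXT NOT NULL,
                      value   REAL,
                      PRIMARY KEY (company, year, field))""")
    return db


def store(db, company, year, fields):
    """Insert or update one report's figures, e.g. {"total assets": 100.0}."""
    db.executemany("INSERT OR REPLACE INTO figures VALUES (?, ?, ?, ?)",
                   [(company, year, f, v) for f, v in fields.items()])
    db.commit()


def history(db, company, field):
    """All years of one field for one company, oldest first."""
    return db.execute("SELECT year, value FROM figures "
                      "WHERE company = ? AND field = ? ORDER BY year",
                      (company, field)).fetchall()


db = open_db()
store(db, "EXAMPLE", 2011, {"total assets": 90.0})
store(db, "EXAMPLE", 2012, {"total assets": 100.0})
```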


Tuesday, February 12, 2013

OpenFlow 1.0 support on Juniper MX240 with JunOS 12.3

Juniper have added OpenFlow to JunOS 12.3

Do you have a spare MX240 lying around? Chuck a copy of JunOS 12.3 on it and you can get OpenFlow 1.0 up and running and have a play.

Details

  • Fairly full OF1.0 implementation. I don't have a spare MX240 to test, but it would appear that everything is handled in hardware (not sure how Junipers could do otherwise tbh)
  • Supports multiple VLANs - if these can be turned on and off from the controller then this would be awesome (let me know if you find this out)
  • Doesn't handle buffered packets - make sure your controller can handle OFPT_PACKET_IN messages that don't include a buffer ID (the current release of POX may not handle this, but the betta branch does)
  • Doesn't handle TLS connectivity to the controller - not the end of the world, but I'm curious as to why this was done
  • Doesn't do anything related to STP... who cares?
  • Only supports MX240s...
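Here's what "doesn't send a buffer ID" means on the wire. In OpenFlow 1.0, a packet_in with buffer_id 0xffffffff carries the whole packet. The controller then has to send those bytes back in the packet_out instead of referencing a buffer. The sketch below is controller-agnostic and uses plain `struct` rather than POX's own classes.

```python
import struct

OFP_VERSION_1_0 = 0x01
OFPT_PACKET_IN = 10
NO_BUFFER = 0xFFFFFFFF  # buffer_id meaning "the switch didn't buffer this packet"


def parse_packet_in(msg):
    """Parse an OpenFlow 1.0 OFPT_PACKET_IN: 8-byte header, 10 fixed bytes, data."""
    version, msg_type, length, xid = struct.unpack("!BBHI", msg[:8])
    if version != OFP_VERSION_1_0 or msg_type != OFPT_PACKET_IN:
        raise ValueError("not an OpenFlow 1.0 packet_in")
    buffer_id, total_len, in_port, reason = struct.unpack("!IHHBx", msg[8:18])
    return {"xid": xid, "buffer_id": buffer_id, "total_len": total_len,
            "in_port": in_port, "reason": reason, "data": msg[18:length]}


def packet_out_source(pkt_in):
    """What a packet_out must carry to send this packet back out."""
    if pkt_in["buffer_id"] == NO_BUFFER:
        # Nothing is held on the switch, so echo the full frame back as data.
        return ("data", pkt_in["data"])
    return ("buffer_id", pkt_in["buffer_id"])


def make_packet_in(buffer_id, in_port, frame, xid=42):
    """Build a test packet_in (reason 0 = OFPR_NO_MATCH)."""
    body = struct.pack("!IHHBx", buffer_id, len(frame), in_port, 0) + frame
    return struct.pack("!BBHI", OFP_VERSION_1_0, OFPT_PACKET_IN, 8 + len(body), xid) + body
```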
This looks like a great start, well done Juniper! Here's my list of requests for the next iteration:
  • Support more than one device :) MX80s would be great, and I'm also keen to see what the EX series implementation looks like
  • Buffered packets! Everyone else does this, and it greatly speeds up the flows-per-second bottleneck between the switch and controller
That's pretty much all from me. OF1.1 support (or 1.3 as this is where everyone is going) would be awesome so we can drive MPLS, but other than that, this is fantastic news.

Update

It looks like it's not quite ready for RouteFlow - Joe Stringer pointed this out in the notes:

• If the controller pushes a flow with a set source MAC address action, the router cannot program the corresponding filter term. However, CLI show commands still display the flow with the associated action, and the device sends an OFPET_FLOW_MOD_FAILED error message with an OFPMFC_UNSUPPORTED code to the controller. [PR 838699]
• If the controller pushes a flow with a set destination MAC address action, the router cannot program the corresponding filter term. However, CLI show commands still display the flow with the associated action, and the device sends an OFPET_FLOW_MOD_FAILED error message with an OFPMFC_UNSUPPORTED code to the controller. [PR 838709]
• If a flow contains a set IP source address action or a set IP destination address action, the device rejects the flow and sends an OFPET_FLOW_MOD_FAILED error m

In other words, no MAC/IP address rewrites = no routing :(

Disclaimer

I've been told that this info and the linked documents are public... If Juniper isn't happy with this, please get in touch and I'll fix it.

Friday, January 25, 2013

Thimble - Secure, high-speed connectivity with OpenFlow & Science DMZ

I've been busy

Last week I had the pleasure of spending a week in Honolulu at TIP2013, going to workshops, watching presentations, and socialising with some of the most talented network engineers in the world, and I felt incredibly fortunate to do so. I also had the opportunity to present some of my OpenFlow work with them, and was amazed at the feedback that I got from everybody. The recording of my presentation is now available online. I gave the same presentation this week at NZNOG, and I'll link to that when the on-demand video is available.

Thimble

For scientists to move big data in reasonable time frames, they need to have data transfer nodes outside of their campus firewalls, inside a Science DMZ. This is a proven way to optimise file transfers, but exposes your file transfer nodes to the whole internet. Thimble is a way of using OpenFlow to easily program ACLs into your edge switch, and I've covered it briefly here and here. The variation that I presented at TIP2013 and NZNOG positions the Science DMZ between an edge router and campus firewall, allowing a subset of routes to be sent to the OpenFlow switch. This means we only need to send experimental traffic to the Thimble, allowing implementers to test this without risking production traffic.

Clever stuff

You can federate a few Thimbles with a single controller, allowing a file transfer logged on the web application to trigger multiple switches around the world to be reconfigured. We're not just tied to a web interface either - there's nothing to stop us implementing UPnP at the edge and letting file transfer programs communicate with the network to arrange ACLs. The end goal is to take existing concepts and apply modern software design principles, allowing us to easily do things that used to be out of the question - networking doesn't need to be this hard.

If you're interested in building a Thimble, I'd love to hear from you - leave a comment or email me and I'll be more than happy to hear your stories and help wherever I can.

RouteFlow in New Zealand

We've had some cool demos up and running this week - a distributed router was deployed at WIX and another data centre here in Wellington, and a RouteFlow deployment powered one of the internet feeds for the NZNOG conference this week. We had a screen up showing the flows present in the switch as people joined and left the wifi, and counters for data plane and control plane traffic. It was a nice visual demo - we could connect a cell phone, show the MAC and IP addresses in the new flow, and start a download to show the data plane traffic going up without changing the control plane traffic - just to prove that we don't need to send every packet via the controller!

What's coming in 2013

We're off to a really good start with OpenFlow this year, and I reckon the killer app is a year or two off at most. Here's what I want to see happen:
  • Distributed routers - a mesh of 5-10, all controlled centrally
  • MPLS!!! Open vSwitch now does OpenFlow 1.2, and the Pica8 switches have implemented this, so there's no excuse for us not having an OpenFlow replacement for LDP/RSVP
  • Breaking OSI. Flows can match on any combination of ethernet, IP, and TCP/UDP, so I want to see more clever stuff happening with that. OpenFlow will let you do longest-prefix-matching on MAC addresses if you want, or route based on TCP/UDP ports, or some other weird combination. You'd want to find a way to do this that would be valuable, but people just need to jump in and see what new ideas they can come up with.
  • ???? - you tell me what you'd like to see in OpenFlow this year.

Friday, September 7, 2012

Tunneling traffic through your OpenFlow controller - Building a POX-based OpenFlow router

Why would you do this?

If we want to make an OpenFlow router, we need to be able to communicate with other non-OpenFlow routers. Normally, you would assign an IP address to your router, turn on BGP/OSPF, and then configure these protocols to talk to other routers using this IP address. With OpenFlow, the controller has the brains, but no obvious way to talk to other network devices. If only we could pretend that the controller was in the router somehow...

Can't we just look at the OpenFlow messages?

Sure, and we looked at this last week, but it's clumsy and means we need to reinvent the wheel to make software routers talk to POX. RouteFlow abstracts this by running software routers in virtual machines, and last week's demonstration hardcoded everything into the controller. Tunnelling gives us a middle-of-the-road solution: no virtual machines needed, but we can still bind things to a network interface on the controller and let the Linux network stack handle already-solved problems like TCP.

Building a tunnel

Linux has a fantastic tool called TUN/TAP, which lets you create virtual network interfaces. One end talks to the Linux network stack and lets any application use it, and the other end talks to our program. In the spirit of keeping things modular, and minimising opportunities for me to write bad code, I've used the PyTap library to set this up. PyTap has a PIP package, which means we can easily add it to a virtualenv and continue to keep everything self-contained.

Protip: TUN interfaces take IP packets, TAP interfaces take Ethernet packets

If you haven't used virtualenvs, here's the basic idea:

virtualenv tundemo
cd tundemo
source bin/activate
pip install pytap
git clone http://github.com/noxrepo/pox

This will set you up with a virtualenv that has POX and PyTap ready to go. Despite being in a virtualenv, PyTap still needs root privileges, so you'll need to be root before source'ing into your virtualenv to make this work. If anyone can show me how to make this work without root privileges I'll be happy to hear (presumably some trickery with the /dev/net/tun device)

As with my other modules, I've hacked code into a copy of forwarding.l2_learning - this time I've renamed it to tundemo, and changed the name of the class all through the source.

Here are all my imports, add these at the top:

from pytun import TunTapDevice, IFF_TAP
from pox.lib.addresses import *
from pox.lib.packet import *
from threading import Thread
import struct      # used by SendToTap() to build the TAP header
import subprocess

In the __init__() function, I've put the following code to make the TAP device:

    # Our table
    self.macToPort = {}
    
    # TAP device
    self.tap = TunTapDevice(flags=IFF_TAP)
    self.tap.addr = '10.1.1.13'
    self.tap.netmask = '255.255.255.0'
    self.tap.mtu=1300
    print "hwaddr for " + self.tap.name + ": " + str(EthAddr(self.tap.hwaddr))
    
    # Bring tap interface up
    subprocess.check_call("ifconfig " + self.tap.name + " up", shell=True)

PyTap chooses a random MAC address when it creates the interface, so printing it out lets us debug things a bit easier.

Tunneling from TAP to switch

Once we have our TAP interface up, we need to handle packets that we receive on it. Let's set up a thread to handle this:

    # Create thread to read from tap and send to switch
    # (args must be a tuple, hence the trailing comma)
    self.th = Thread(target=handle_tap_in, args=(self,))
    self.th.daemon = True
    self.th.start()

    # Set max packet size to 1400 bytes
    self.connection.send(of.ofp_set_config(miss_send_len=1400))

Our handler function is fairly straightforward

def handle_tap_in(switch):
  while True:
    packettap = switch.tap.read(switch.tap.mtu+24)
    print "Packet read from tap"
    e = ethernet()
    e.parse(packettap[4:])
    
    port = of.OFPP_ALL
    if e.dst in switch.macToPort:
        port = switch.macToPort[e.dst]
    
    msg = of.ofp_packet_out()
    msg.data = packettap[4:]
    msg.actions.append(of.ofp_action_output(port =
                                          port))
    switch.connection.send(msg)

This sends every packet that arrives on the tap0 interface to the switch, and either floods it or sends it out the right port, depending on which MAC addresses we've already learned.

Tunneling from switch to TAP

The switch already sends us packets by default, and these go to the _handle_PacketIn() function. We just need to get the raw data out and send it to the TAP interface.

My switch always sends VLAN-tagged packets, so if yours doesn't then you'll want to change this a bit. Here is the SendToTap() function:

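def SendToTap():
  # Defined inside _handle_PacketIn() (like flood() in l2_learning),
  # so 'packet' and 'self' come from the enclosing scope.
  # Remove the VLAN header and rebuild a plain Ethernet frame
  print "Forwarding packet"
  v = packet.next    # the vlan header
  i = v.next         # the packet inside the VLAN tag
  eth = ethernet(src=packet.src, dst=packet.dst, type=v.eth_type)
  print type(i)
  eth.set_payload(i)
  # First 4 bytes are the TAP header: 00 00 (null short), then 08 00 (IPv4 ethertype)
  totap = struct.pack('!bbbb', 0, 0, 8, 0) + eth.pack()
  #print totap.encode('hex')
  self.tap.write(totap)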

And we call this when a packet comes to us with a multicast MAC or our MAC:

if packet.dst == EthAddr(self.tap.hwaddr):
  print "Packet for us!"
  SendToTap()
  return

if packet.dst.isMulticast():
  SendToTap()
  flood() # 3a

Now the tunnel is good to go. Just make sure any devices plugged into the switch use an MTU of 1300, and you can talk to the controller and even transfer files off it with SCP (30 minutes to copy an Ubuntu ISO, at around 4Mb/s).

A couple of hiccups


Packet sizes

My switch doesn't seem to handle having the packet-size value changed. POX by default tells the switch to send the first 128 bytes of packets, and while we can send messages to increase this, they're ignored. The work-around is to change DEFAULT_MISS_SEND_LEN to 1400 in pox/openflow/libopenflow01.py

Jitter

Latency varies from 1ms to 50ms, and TCP really, really doesn't like this. Routing protocols that don't use TCP, like OSPF (which runs directly over IP), shouldn't notice this, and even TCP-based routing protocols like BGP should be fine. Bulk TCP transfers get really confused, though, so don't expect any large data flows to work well over this.

MTU sizes

This stuff confuses me. I'm a network engineer, and I'm supposed to know this stuff, but I don't. When we read from the TAP device, we read the MTU + 24 bytes. There's 14 bytes for the Ethernet header, 4 bytes for the TAP header, and another 6 bytes in there for no obvious reason. 24 bytes just seems to work, and I have no idea why.
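The 4 bytes in front of every frame are the Linux packet-information header, struct tun_pi. It holds 2 bytes of flags and then the protocol as a big-endian EtherType, and the driver adds it when the device isn't opened with IFF_NO_PI. The Python sketch below adds and strips that header and works out the read size. By this count a full-size untagged frame needs MTU + 18 bytes, so MTU + 24 simply leaves 6 bytes spare. The helper names are mine, not PyTap's.

```python
import struct

ETH_HEADER = 14
PI_HEADER = 4  # struct tun_pi: u16 flags (host byte order) + u16 proto (big-endian)


def strip_pi(raw):
    """Split a read from a TAP device (no IFF_NO_PI) into (flags, proto, frame)."""
    (flags,) = struct.unpack("=H", raw[:2])
    (proto,) = struct.unpack("!H", raw[2:4])
    return flags, proto, raw[PI_HEADER:]


def add_pi(frame, proto=0x0800, flags=0):
    """Prefix a frame with the packet-info header before writing it to TAP."""
    return struct.pack("=H", flags) + struct.pack("!H", proto) + frame


def read_size(mtu, vlan_tagged=False):
    """Bytes needed to read one full-size frame, PI header included."""
    return PI_HEADER + ETH_HEADER + (4 if vlan_tagged else 0) + mtu


print(read_size(1300))  # 1318, versus the 1324 (mtu + 24) used above
```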

TAP device

Two things bug me about this - there doesn't seem to be a nice way to bring it up (apart from using ifconfig), and you need root to create it in the first place - I'd want to fix both of these for a nicer solution

Next steps

  • TAP devices could be created for each physical port on an OpenFlow device, or as routed interfaces for each VLAN - limitless opportunities here
  • BIRD or Quagga could bind to a TAP device, and the controller could turn routes into flows. BIRD has a python interface, but since both use standard routing protocols, you could easily sniff the traffic and build routing tables out of these. Sniffing BGP updates is still way easier than trying to build a Python TCP stack
  • VRFs? Traffic injection? Just another example of how easy it is to grab POX and do novel things with inexpensive hardware