Wednesday, March 14, 2012

Pyswitch bugfix, and DoS vulnerability in open vSwitch

Pyswitch
I had a bit of time to work on Pyswitch today, and I've cut it back so that the flows it installs only match on the destination MAC and set the output port, and that was enough for it to start setting flows properly. You can look at the full source if you like, or just focus on the part I've changed:

The function I've modified is forward_l2_packet - as the name suggests, it either floods all ports with the packet it has received, or sends the packet out the correct port and installs a flow in the switch. Here is the function:


def forward_l2_packet(dpid, inport, packet, buf, bufid):    
    dstaddr = packet.dst.tostring()
    if not ord(dstaddr[0]) & 1 and inst.st[dpid].has_key(dstaddr):
        prt = inst.st[dpid][dstaddr]
        if  prt[0] == inport:
            log.err('**warning** learned port = inport', system="pyswitch")
            inst.send_openflow(dpid, bufid, buf, openflow.OFPP_FLOOD, inport)
        else:
            # We know the outport, set up a flow
            log.msg('installing flow for ' + str(packet), system="pyswitch")
            flow = extract_flow(packet)
            flow[core.IN_PORT] = inport
            actions = [[openflow.OFPAT_OUTPUT, [0, prt[0]]]]
            inst.install_datapath_flow(dpid, flow, CACHE_TIMEOUT, 
                                       openflow.OFP_FLOW_PERMANENT, actions,
                                       bufid, openflow.OFP_DEFAULT_PRIORITY,
                                       inport, buf)
    else:    
        # haven't learned destination MAC. Flood 
        inst.send_openflow(dpid, bufid, buf, openflow.OFPP_FLOOD, inport)

The key to creating the flow is the extract_flow function from util.py:


def extract_flow(ethernet):
    """
    Extracts and returns flow attributes from the given 'ethernet' packet.
    The caller is responsible for setting IN_PORT itself.
    """
    attrs = {}
    attrs[core.DL_SRC] = ethernet.src
    attrs[core.DL_DST] = ethernet.dst
    attrs[core.DL_TYPE] = ethernet.type
    p = ethernet.next


    if isinstance(p, vlan):
        attrs[core.DL_VLAN] = p.id
        attrs[core.DL_VLAN_PCP] = p.pcp
        p = p.next
    else:
        attrs[core.DL_VLAN] = 0xffff # XXX should be written OFP_VLAN_NONE
        attrs[core.DL_VLAN_PCP] = 0


    if isinstance(p, ipv4):
        attrs[core.NW_SRC] = p.srcip
        attrs[core.NW_DST] = p.dstip
        attrs[core.NW_PROTO] = p.protocol
        p = p.next


        if isinstance(p, udp) or isinstance(p, tcp):
            attrs[core.TP_SRC] = p.srcport
            attrs[core.TP_DST] = p.dstport
        else:
            if isinstance(p, icmp):
                attrs[core.TP_SRC] = p.type
                attrs[core.TP_DST] = p.code
            else:
                attrs[core.TP_SRC] = 0
                attrs[core.TP_DST] = 0
    else:
        attrs[core.NW_SRC] = 0
        attrs[core.NW_DST] = 0
        attrs[core.NW_PROTO] = 0
        attrs[core.TP_SRC] = 0
        attrs[core.TP_DST] = 0
    return attrs

Now, if we're just making a basic switch, this does way more than we need - why would a switch care about layer 4 protocols? Fortunately, open vSwitch on the Pronto ignores most of those fields, because the flow carries DL_TYPE=0x8100 (which means the packet is 802.1Q VLAN tagged, and the real ethertype sits 4 bytes further into the frame) - but having that wrong DL_TYPE is exactly why nothing ends up matching the flow...
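For what it's worth, a VLAN-aware fix to that part of extract_flow might look something like the sketch below - this is just my guess at the shape of the change, and it assumes the NOX vlan header object exposes the encapsulated ethertype as eth_type:

    if isinstance(p, vlan):
        attrs[core.DL_VLAN] = p.id
        attrs[core.DL_VLAN_PCP] = p.pcp
        # match on the encapsulated ethertype instead of 0x8100
        attrs[core.DL_TYPE] = p.eth_type
        p = p.next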

A proper fix would be to make util.py interpret VLANs correctly, but in the meantime, pyswitch will work fine as a simple layer two switch if we use a cut-down version of the extract_flow function. And here it is:

def create_l2_out_flow(ethernet):
    attrs = {}
    attrs[core.DL_DST] = ethernet.dst
    return attrs

Simple, right? Now we use this instead of extract_flow, and then we can walk through what the function does in detail:

def forward_l2_packet(dpid, inport, packet, buf, bufid):
    dstaddr = packet.dst.tostring()
    if not ord(dstaddr[0]) & 1 and inst.st[dpid].has_key(dstaddr):
[...]

    else:  
        # haven't learned destination MAC. Flood
        inst.send_openflow(dpid, bufid, buf, openflow.OFPP_FLOOD, inport)


This pulls the destination MAC address out of the packet, converts it to a byte string, and checks that the least-significant bit of the first octet is 0 - in other words, a unicast address. If it is, it then checks whether it has learnt that address before, and if so, we can proceed. Otherwise, it floods out all ports - the right behaviour for both broadcast/multicast and unknown MAC addresses.
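To make that first check concrete, here's a throwaway illustration (not part of pyswitch, MAC addresses made up) of how the least-significant bit of the first octet separates unicast from multicast/broadcast destinations:

# 00:1b:21:11:22:33 -> first octet 0x00, low bit 0 -> unicast
unicast_dst = '\x00\x1b\x21\x11\x22\x33'
# 01:00:5e:00:00:fb -> first octet 0x01, low bit 1 -> multicast
multicast_dst = '\x01\x00\x5e\x00\x00\xfb'

print ord(unicast_dst[0]) & 1    # 0 - safe to look up in the learned MAC table
print ord(multicast_dst[0]) & 1  # 1 - always flood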

        prt = inst.st[dpid][dstaddr]
        if  prt[0] == inport:
            log.err('**warning** learned port = inport', system="pyswitch")
            inst.send_openflow(dpid, bufid, buf, openflow.OFPP_FLOOD, inport)

If the destination MAC has been learnt on the same port the packet arrived on, then something is weird (either a spoof or a loop in the network), so we just behave like a hub for this packet.

        else:
            # We know the outport, set up a flow
            log.msg('installing flow for ' + str(packet), system="pyswitch")
            # sam edit - just load dest address, the rest doesn't matter
            flow = create_l2_out_flow(packet)
            actions = [[openflow.OFPAT_OUTPUT, [0, prt[0]]]]
            inst.install_datapath_flow(dpid, flow, CACHE_TIMEOUT,
                                       openflow.OFP_FLOW_PERMANENT, actions,
                                       bufid, openflow.OFP_DEFAULT_PRIORITY,
                                       inport, buf)

This is the switch part - we create our very specific flow with the new function (matching just the destination MAC address, not all ten or so fields), set the action to output on the correct port, then call install_datapath_flow (part of nox::lib::core::Component), which sends the new flow back to the switch along with an instruction on where to send the buffered packet. All done, and it works well, except for one thing:

Open vSwitch DoS (probably one of many)
The problem with OpenFlow that everybody points out is that you can only really send 10 packets per second to your controller. You can try and optimise this if you want, but this switch-controller connection is where the battle will be fought to make OpenFlow perform better. I didn't think this would be a problem with the Pronto, because I assumed that open vSwitch would process packets somewhat like this:


  1. Find flow for packet - if found, follow the actions and go to next packet
  2. Send packet to controller
  3. Get packet and flow back from controller, follow instruction for this packet and install flow
  4. Go back to 1 for next packet.
Unfortunately, it appears that open vSwitch does things a little differently:

  1. Find flow for packet - if found, follow the actions and go to next packet
  2. Send packet to controller
  3. Get packet and flow back from controller, follow instruction for this packet and add flow to some queue somewhere
  4. Go back to 1 for next packet
  5. If no more packets waiting, look at the queue and install the flow
Surprisingly enough, this works fine for TCP - the 3-way handshake gives the switch enough downtime to install the flow, and get ready for the influx of data. However, if you surprise it with 500Mb/s of UDP iperf, you find the receiving server only getting ~150Kb/s, every single packet going to the controller, and no flow being installed!
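To make the difference concrete, here's a toy model - my own illustration, nothing to do with the actual Open vSwitch code - of why deferring the install until the packet queue drains means a sustained UDP stream never stops hitting the controller:

def punted_packets(num_packets, deferred_install):
    """Count how many packets of a single flow get punted to the controller."""
    flow_installed = False
    pending_install = False
    punted = 0
    for _ in range(num_packets):
        if flow_installed:
            continue                   # matched in the switch, forwarded directly
        punted += 1                    # no flow yet, punt to the controller
        if deferred_install:
            pending_install = True     # install "later, when there's time"
        else:
            flow_installed = True      # install immediately
    if pending_install:
        flow_installed = True          # the queue only drains after the burst ends
    return punted

print punted_packets(10000, deferred_install=False)   # 1 - only the first packet
print punted_packets(10000, deferred_install=True)    # 10000 - every single packet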

Fortunately, the staff at Pronto have been awesome to work with, so I'm hoping we'll get a solution soon, and in the meantime, I'll try to find a workaround myself. If you're testing and stuck in a similar situation, either start off with a little UDP test first, or even ping the other host before starting your iperf - this will set the flows, and then you can send as much data as you like!
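If you'd rather script that priming step than remember to do it by hand, something as dumb as this does the trick - a sketch with made-up addresses, since any single packet that triggers the flow install will do:

import socket, time

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.sendto('hello', ('192.168.1.20', 5001))   # one small packet -> controller installs the flow
time.sleep(1)                               # give the switch a moment to finish installing it
# ...now kick off the full-rate iperf run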

Monday, March 12, 2012

Openflow with NOX & Pronto/Pica8

We've got a Pronto 3290 at work, and with Josh Bailey's help I've been getting it talking Openflow to a NOX controller running pyswitch.

I figure the more I write about it, the more sense it'll make, so here's a summary of how far I've come:


  • The Pronto runs Open vSwitch, which lets you add your own flows manually - this makes it easy to see what flows your controller has added too. They're supposedly going to add Openflow v1.2 support soon, which means IPv6!
  • NOX doesn't find the Python bindings for OpenSSL on Ubuntu 11.10 (oneiric) in its current branch, but the destiny branch does - a bit of Git skill will sort this out for you
  • Wireshark has an OpenFlow dissector which is part of the OpenFlow code, but it doesn't work with newer versions of Wireshark, so you'll need this patch to make it build - confirmed working on Ubuntu 11.10
  • Pyswitch (included as part of NOX) doesn't send back the right flows to the pronto - it sets the ethertype as 0x8100, so the flows look like this: idle_timeout=5,priority=65535,in_port=8,dl_vlan=1,dl_vlan_pcp=0,dl_src=00:XX:XX:XX:XX:XX,dl_dst=00:YY:YY:YY:YY:YY,dl_type=0x8100 actions=output:3 - this is where I'm going to start modding pyswitch
And this is where I am now. The plan for the next few weeks (which will probably change) is going to be something like this:
  1. Make pyswitch send correct Openflow data
  2. Mod pyswitch (or a demo router app) to do some basic routing and ACL
  3. Hope that someone has written a BGP Openflow app so that I don't have to - otherwise, look at options for this
I'll be back with more details

Saturday, July 2, 2011

Woohoo, 3-way!

I added my third router and machine, but couldn't get multicast to go more than one hop... it turns out that when I tried to test MSDP and PIM without tunnelling them, MSDP worked but PIM didn't, so only PIM got changed back to the tunnelled address. Changing MSDP to the tunnelled address as well made it work almost immediately!

Config dump

interfaces {
    em0 {
        unit 0 {
            family inet {
                address 10.1.1.198/8;
            }
        }
    }
    em1 {
        unit 0 {
            family inet {
                address 192.168.11.1/24;
            }
        }
    }
    em2 {
        unit 0 {
            family inet {
                address 192.168.2.1/24;
            }
        }
    }
    em3 {
        unit 0;
    }
    gre {
        unit 0 {
            tunnel {
                source 192.168.11.1;
                destination 192.168.11.2;
            }
            family inet {
                address 192.168.101.1/30;
            }
            family inet6 {
                address 2001:4428:251:2::1:1/120;
            }
        }
    }
    ipip {
        unit 0 {
            tunnel {
                source 192.168.2.1;
                destination 192.168.2.2;
            }
            family inet {
                address 192.168.201.1/30;
            }
            family inet6 {
                address 2001:4428:251:2::1/120;
            }
        }
        unit 1 {
            tunnel {
                source 192.168.2.1;
                destination 192.168.2.3;
            }
            family inet {
                address 192.168.202.1/30;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                address 1.1.1.1/32;
            }
        }
    }
}
routing-options {
    interface-routes {
        rib-group inet if-rib;
    }
    rib-groups {
        multicast-rpf-rib {
            export-rib inet.2;
            import-rib inet.2;
        }
        if-rib {
            import-rib [ inet.2 inet.0 ];
        }
    }
    autonomous-system 65000;
}
protocols {
    igmp {
        interface all {
            version 3;
        }
    }
    bgp {
        local-as 65000;
        group branch1 {
            type external;
            export [ to-branch1 allow-all ];
            peer-as 65001;
            neighbor 192.168.201.2 {
                family inet {
                    any;
                }
            }
            neighbor 2001:4428:251:2::2 {
                family inet6 {
                    any;
                }
            }
        }
        group branch2 {
            type external;
            export [ to-branch1 allow-all ];
            peer-as 65002;
            neighbor 192.168.202.2 {
                family inet {
                    any;
                }
            }
        }
    }
    msdp {
        rib-group inet multicast-rpf-rib;
        export allow-all;
        import allow-all;
        group test {
            peer 192.168.202.2 {
                local-address 192.168.202.1;
            }
            peer 192.168.201.2 {
                local-address 192.168.201.1;
            }
        }
    }
    pim {
        rib-group inet multicast-rpf-rib;
        rp {
            local {
                address 192.168.101.1;
                group-ranges {
                    224.0.0.0/4;
                }
            }
        }
        interface all {
            mode sparse;
            version 2;
        }
        dr-election-on-p2p;
    }
    rip {
        group gateway {
            export gateway-rip;
            neighbor em0.0;
        }
    }
}
policy-options {
    policy-statement allow-all {
        then accept;
    }
    policy-statement gateway-rip {
        from protocol [ direct bgp ];
        then accept;
    }
    policy-statement reject-all {
        from protocol rip;
        then reject;
    }
    policy-statement to-branch {
        from protocol [ direct local ospf bgp static rip pim ];
        then accept;
    }
    policy-statement to-branch1 {
        from protocol [ direct local ospf bgp static rip pim ];
        then accept;
    }
}

As you can see, I've started setting up IPv6 addresses on the routers. I've already got RA and stateful DHCPv6 working on my real network, so there's no point muddying up the config here. By the way, it turns out you can have as many tunnels as you like - stacking routed gre/ipip interfaces is totally okay. I hope to have some IPv6 multicast results this evening, so stay tuned

Clockwork Olive: multicast update

After much pissing around, it turns out multicast does work, but emcast has been having problems. Dbeacon runs well in super-verbose mode, and emcast receives the info but just doesn't seem to send very well - it could be that the olives are just being shit and dropping packets, though.

Want to see the config?


interfaces {
    em0 {
        unit 0 {
            family inet {
                address 192.168.2.2/24;
            }
        }
    }
    em1 {
        unit 0 {
            family inet {
                address 192.168.12.1/24;
            }
        }
    }
    gre {
        unit 0 {
            tunnel {
                source 192.168.12.1;
                destination 192.168.12.2;
            }
            family inet {
                address 192.168.102.1/30;
            }
        }
    }
    ipip {
        unit 0 {
            tunnel {
                source 192.168.2.2;
                destination 192.168.2.1;
            }
            family inet {
                address 192.168.201.2/30;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                address 1.1.1.2/32;
            }
        }
    }
}
routing-options {
    interface-routes {
        rib-group inet if-rib;
    }
    rib-groups {
        multicast-rpf-rib {
            export-rib inet.2;
            import-rib inet.2;
        }
        if-rib {
            import-rib [ inet.2 inet.0 ];
        }
    }
    autonomous-system 65001;
}
protocols {
    igmp {
        interface all {
            version 3;
        }
    }
    bgp {
        local-as 65001;
        group olive {
            type external;
            family inet {
                any;
            }
            export to-branch1;
            peer-as 65000;
            neighbor 192.168.201.1;
        }
    }
    msdp {
        rib-group inet multicast-rpf-rib;
        group test {
            peer 192.168.201.1 {
                local-address 192.168.201.2;
            }
        }
    }
    pim {
        rib-group inet multicast-rpf-rib;
        rp {
            local {
                address 192.168.102.1;
                group-ranges {
                    224.0.0.0/4;
                }
            }
        }
        interface all {
            mode sparse;
            version 2;
        }
        dr-election-on-p2p;
    }
}
policy-options {
    policy-statement allow-all {
        then accept;
    }
    policy-statement to-branch1 {
        from protocol [ direct local ospf bgp pim ];
        then accept;
    }
}


I'm going to be a bastard and not cite any sources except this one. I'm tempted to chalk the emcast send failure down to packets simply being dropped, and maybe try a test VLC stream if I can be bothered with that, but this was only meant to be a means to an end - the next step is IPv6 multicast!
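And in case emcast keeps misbehaving, a dumb standard-library sender/receiver is another way to check whether traffic for a group is actually being routed - just a sketch with a made-up group and port, not a replacement for dbeacon:

import socket, struct, sys

GROUP, PORT = '239.1.1.1', 5007   # made-up test group and port

if 'send' in sys.argv:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 8)   # allow a few router hops
    s.sendto('multicast test', (GROUP, PORT))
else:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(('', PORT))
    mreq = struct.pack('4sl', socket.inet_aton(GROUP), socket.INADDR_ANY)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    print s.recvfrom(1024)        # blocks until something arrives for the group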

Thursday, June 30, 2011

JunOS Router testbed part 3: multicast still not working

So after a bit of research and tons of failed attempts, I've discovered that the olives really don't like multicast. Some people have used a patch to enable it for OSPF on earlier versions of JunOS, but there's nothing for later versions (since OSPF runs fine on these anyway), although MSDP and PIM still don't work.

I had heard about people using gre tunnels, and can confirm that this works. Olives only let you have one of each type of tunnel (due to there being no PICs installed), so I used an ipip tunnel to connect two routers, got PIM and MSDP working, then gre tunnels to my two Ubuntu boxes (as per http://knol.google.com/k/juniper-hacks/gre-tunnel-between-a-linux-host-and/1xqkuq3r2h459/43#).

I can see the routes filling up the MSDP table, and dbeacon seems to be getting some sort of communication, but it still looks like multicast traffic isn't being routed properly... at least it's getting across all the links now

Friday, June 17, 2011

JunOS Router testbed part 2

My topology has since become quite complicated, so I thought it would be best to draw a picture:
The fourth olive (meant to branch off like olive2 and olive3 with a separate AS number, tap interface and Ubuntu virtual machine) has been left out for simplicity at this stage. The main problem with my original design was that layer 3 separation wasn't enough - multicast skips routers at layer 2 - so I needed to give each box its own tap interface. To go with the diagram, here's the config from olive1 and olive2 (olive3 is basically the same as olive2 - this is an exercise for the reader)

Olive 1:


interfaces {
    em0 {
        unit 0 {
            family inet {
                address 192.168.2.1/24;
                address 192.168.11.1/24;
                address 10.1.1.198/8;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                address 1.1.1.1/32;
            }
        }
    }
}
routing-options {
    autonomous-system 65000;
}
protocols {
    bgp {
        local-as 65000;
        group branch1 {
            type external;
            export to-branch1;
            peer-as 65001;
            neighbor 192.168.2.2;
        }
        group branch2 {
            type external;
            export to-branch;
            peer-as 65002;
            neighbor 192.168.2.3;
        }
        group branch3 {
            type external;
            export to-branch;
            peer-as 65003;
            neighbor 192.168.2.4;
        }
    }
    rip {
        group gateway {
            export gateway-rip;
            neighbor em0.0;
        }
    }
}
policy-options {
    policy-statement gateway-rip {
        from protocol [ direct bgp ];
        then accept;
    }
    policy-statement to-branch {
        from protocol [ direct local ospf bgp static rip ];
        then accept;
    }
}
Olive 2:

interfaces {
    em0 {
        unit 0 {
            family inet {
                address 192.168.2.2/24;
            }
        }
    }
    em1 {
        unit 0 {
            family inet {
                address 192.168.12.1/24;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                address 1.1.1.2/32;
            }
        }
    }
}
routing-options {
    autonomous-system 65001;
}
protocols {
    bgp {
        local-as 65001;
        group olive {
            type external;
            export to-branch1;
            peer-as 65000;
            neighbor 192.168.2.1;
        }
    }
}
policy-options {
    policy-statement to-branch1 {
        from protocol [ direct local ospf bgp ];
        then accept;
    }
}

And here's a show route from olive 2

inet.0: 12 destinations, 13 routes (12 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

0.0.0.0/0          *[BGP/170] 00:31:58, MED 3, localpref 100
                      AS path: 65000 I
                    > to 192.168.2.1 via em0.0
1.1.1.1/32         *[BGP/170] 00:31:58, localpref 100
                      AS path: 65000 I
                    > to 192.168.2.1 via em0.0
1.1.1.2/32         *[Direct/0] 00:32:02
                    > via lo0.0
1.1.1.3/32         *[BGP/170] 00:25:00, localpref 100, from 192.168.2.1
                      AS path: 65000 65002 I
                    > to 192.168.2.3 via em0.0
10.0.0.0/8         *[BGP/170] 00:31:58, localpref 100
                      AS path: 65000 I
                    > to 192.168.2.1 via em0.0
192.168.2.0/24     *[Direct/0] 00:32:02
                    > via em0.0
                    [BGP/170] 00:31:58, localpref 100
                      AS path: 65000 I
                    > to 192.168.2.1 via em0.0
192.168.2.2/32     *[Local/0] 00:32:02
                      Local via em0.0
192.168.11.0/24    *[BGP/170] 00:31:58, localpref 100
                      AS path: 65000 I
                    > to 192.168.2.1 via em0.0
192.168.12.0/24    *[Direct/0] 00:31:09
                    > via em1.0
192.168.12.1/32    *[Local/0] 00:31:09
                      Local via em1.0
192.168.13.0/24    *[BGP/170] 00:25:00, localpref 100, from 192.168.2.1
                      AS path: 65000 65002 I
                    > to 192.168.2.3 via em0.0
218.101.61.124/32  *[BGP/170] 00:31:58, MED 2, localpref 100
                      AS path: 65000 I
                    > to 192.168.2.1 via em0.0

It's all going well so far - putting each subnet on a different tap interface stops them cheating and using layer 2 for multicast, so now I can start getting PIM-SM set up (IPv4 only for starters)

Router testbed with JunOS olive on Virtualbox

I did an SRX course earlier in the week and we got to use Olive virtual machines to play with what we had learned. I'd tried making my own but got into trouble when actually installing the package, so I took a copy of this olive (8.3) and tried to get it to work at home. The first results were less than ideal - they would run fine without crashing, but setting addresses had to be done on the commandline with ifconfig rather than in the interfaces stanza. Not only this, but routing was totally broken - not even OSPF would work!

I had read that JunOS 9 didn't suffer from this, and tonight I acquired a copy of JunOS 9.6. The upgrade went smoothly (it needed a force as the leftover disk space wasn't enough, but it installed fine) and it automatically picked up the addresses from the interfaces stanza. OSPF worked fine between 4 of them, so the next thing was to use BGP to set up a basic layer 3 topology, with 3 routers each having a single peering with the router in the middle.

If you've done JunOS BGP before then you'll know this is trivial - I made my life easier by making the export policy take routes from direct, local and bgp (which means readvertising happens automatically). The point of this testbed was simply to check my connectivity.

It did all work in the end, and now I'm on to part two - testing out multicast. The plan is to get a couple of virtual interfaces on a real machine, set up multicast between the routers, and have each virtual interface on a subnet owned by a different router. They're all connected to the same bridged interface which means the layer 2 topology has everything effectively hanging off the same switch, so this will be successful if I can get multicasts happening between the different subnets. This is somewhat trivial though, and the next step is to get IPv6 connectivity and testbed IPv6 multicast - if it works, then I'll put up some detailed instructions of all the ins and outs!