Friday, November 27, 2020

How to profile and fix your slow home wifi


There's a lot of us working from home these days, and that means a lot of Zoom and Skype video calls, Netflix, gaming, and all sorts of other internet traffic filling up your tubes. If you live in the city then you likely have a lot of neighbours nearby, and that means a lot of extra noise from their wireless routers. But how bad is it? Can you make it better? How can you tell?

Speed tests

This is a good first step. There's a simple rule in software and networking: if you can't test it, it doesn't exist. I went to speedtest.net and had a look, and here's what I found:


This is where I should point out that my internet is 100Mb/s down, 10Mb/s up. We're getting nowhere near this on wifi. Here are the specs of the current setup:
  • Frequency: 2.4GHz
  • Technology: 802.11n
  • Connection speed: varies, but between 50 and 100Mb/s, allegedly
Now there are a few things that can affect your internet speeds, but one of the main culprits is packet loss. It depends where in the network you get it though. If you get packet loss on a physical link (like a cable), then those packets will just disappear and your computer will have to resend them. This affects you when you're downloading or uploading files over TCP, but when you're on a Zoom or a Skype call, or some other live content like gaming, the software will be able to tolerate a certain amount of loss so long as the latency (or ping) remains stable. The speed test gives us a hint that there's a problem, but we need to look a bit deeper to find out what exactly is going on under the covers.

We have to go deeper

I've used a lot of tools for network profiling over the years. I've worked with the perfSONAR framework, the WAND framework, straight iperf tests, but the one that I've settled upon is smokeping. Smokeping was designed to track latency to different hosts, but I've found you can use it for a lot more than that. Here are some of the extra things you can measure:
  • Packet loss: we need to tweak the defaults, but we can get good measurements on this
  • Jitter: this is a measure of how latency changes with time; we can definitely get a picture of this
  • Congestion: a little harder, but we can infer this from the graphs too
  • Bad links: this is a really fun one to figure out - is the fault in your local network, the other network, or somewhere in the middle? By graphing to a range of destinations along the path we can see which parts are clear and which parts are dirty and that'll point us to where the problems are being caused
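
To make packet loss and jitter a bit more concrete, here's a rough Python sketch of how you could work them out from one round of pings. This isn't how smokeping calculates things internally; the function name `summarize_pings` and the sample numbers are made up for illustration, and I'm taking jitter to mean the average change between consecutive replies.

```python
from statistics import mean


def summarize_pings(rtts_ms):
    """Summarize one round of pings.

    rtts_ms: round-trip times in milliseconds, with None for every
    ping that never got a reply.
    """
    if not rtts_ms:
        raise ValueError("need at least one ping")
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    # Jitter here: mean absolute difference between consecutive replies
    diffs = [abs(b - a) for a, b in zip(replies, replies[1:])]
    jitter_ms = mean(diffs) if diffs else 0.0
    avg_ms = mean(replies) if replies else None
    return {"loss_pct": loss_pct, "avg_ms": avg_ms, "jitter_ms": jitter_ms}


# Invented samples: a noisy wifi-like link and a steady cable-like link
wifi = summarize_pings([20, 160, 25, None, 90])
cable = summarize_pings([18, 19, 18, 19, 18])
print("wifi: ", wifi)
print("cable:", cable)
```

The wifi-like samples come out with far higher jitter than the cable-like ones, and that's the kind of difference to look for in the smokeping graphs.
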
Let's get it set up. Luckily for us it's not the 90s anymore and we can just spin it up in a docker container. The team at linuxserver.io have kindly made a docker image for it, and their default docker-compose script is very good to get started with. All the details are at https://hub.docker.com/r/linuxserver/smokeping, but I'll put my script here for you as an example:

---
version: "3.7"
services:
  smokeping:
    image: linuxserver/smokeping
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/London
    volumes:
      - ./config:/config
      - ./data:/data
    ports:
      - 8001:80
    restart: unless-stopped

There we go. What this config does is the following:

  • Runs the container with the timezone of your choosing
  • Mounts the folders "config" and "data" into the appropriate places in the container where smokeping is expecting to find them
  • Forwards port 8001 so we can go to the server in our browser.
I normally put this in a folder called "smokeping" and create the two empty "config" and "data" folders inside. I then run the container as follows:

docker-compose up

This will start the server, and you can navigate to it at http://localhost:8001 . You'll see a bunch of tabs on the left, and a bunch of graphs in the middle. This is all the default config; don't worry about this too much. You have smokeping up and running though!

Fine tuning

The next step is to make smokeping do what we want. Stop smokeping by going back to your command prompt and pressing Control+C; you'll see it start to shut down. Then we'll look at the config. Open the config folder, and you'll see a bunch of new files. The two files we care about are Probes and Targets. Open Probes and change the config to look like this:

*** Probes ***

+ FPing
binary = /usr/sbin/fping
pings = 5
step = 30

+ DNS
binary = /usr/bin/dig
lookup = google.com
pings = 5
step = 30

The default ping rate is too low and won't catch small changes. This update sets it to 5 pings per 30 seconds (an average of one ping every 6 seconds). You can increase the pings for FPing to 30 if you want, which averages one ping per second (in 30 second chunks) and gives an even better look at packet loss over short periods of time. You could also do 5 pings per 5 seconds (pings = 5, step = 5), which pings at the same rate but groups the results in 5 second chunks on the graph. One warning here: every time you change this config, you'll have to delete your data (everything in the data directory), as the database format is tuned to these values.
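
If you want to sanity-check a pings/step combination before wiping your data, the arithmetic is simple enough to sketch in Python. The helper `ping_schedule` is my own invention for this example, not part of smokeping, and the interval it reports is only an average over the step.

```python
def ping_schedule(pings, step):
    """Return (average seconds between pings, pings per hour) per target."""
    if pings <= 0 or step <= 0:
        raise ValueError("pings and step must both be positive")
    avg_interval_s = step / pings
    pings_per_hour = pings * 3600 / step
    return avg_interval_s, pings_per_hour


# The (pings, step) combinations mentioned in this post
for pings, step in [(5, 30), (30, 30), (5, 5), (15, 30)]:
    interval, per_hour = ping_schedule(pings, step)
    print(f"pings={pings} step={step}: one ping every {interval:g}s on average, "
          f"{per_hour:g} per hour per target")
```

Note that 30 pings per 30 seconds and 5 pings per 5 seconds give the same rate; the only difference is how the results are grouped on the graph.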

The next thing to do is pick some good targets. Open the Targets file, and delete all the targets except maybe Youtube and Facebook. Your config will look something like this now:

*** Targets ***

probe = FPing

menu = Top
title = My smokeping
remark = Remarkable

+ Home

menu = Home
title = Home

++ Facebook
menu = Facebook
title = Facebook
host = facebook.com

++ Youtube
menu = YouTube
title = YouTube
host = youtube.com

Now close these, delete the contents of your data directory, and restart smokeping with docker-compose up. You'll see the number of graphs has gone down to 2, and we can start working with these.

Number 3 will shock you

Here's what my graphs looked like:


Looks like a bunch of fuzzy junk. Here's what's important about this graph:
  • All the dots are lime green. This is good; it means we're getting 0% packet loss. I thought my slowdowns might have been packet loss in my ISP's network, so it's nice to see this isn't a problem
  • There's a constant large range in latency. Lots of jumps from 20 to 160ms
You can see to the right of the graph (after 18:20) the green line drops a little and all the noise goes away; this is when I plugged in an ethernet cable and used the internet through that. This tells me that the extra latency and all of the noise came from the wifi link. The cause is packet retries on the wireless link: each retry delays a packet, so some packets arrive late or out of order, and this causes a ton of problems for us in practice. In short, packets that arrive late or out of order look a lot like lost packets, and that gives you dropped connections, slow data rates, and laggy and unreliable Zoom and Skype calls.

Why does this happen? The fact is, wifi will never be as reliable as a cable. The gap between your wireless card and your router is full of walls, people, and other household objects, and these all change the way the signal behaves. This is fine if you live in the middle of nowhere, but when you have a lot of neighbours nearby, their wifi signals leak through your walls, and your internet becomes like a conversation in a noisy bar: slow, and full of shouting and misunderstandings. Wifi tries to solve this by having different wireless channels, but here are the problems with this:
  • 2.4GHz wifi has 11 channels (13 in some countries); this isn't enough when you have 30 networks nearby
  • The channels aren't perfectly separate: if channel 3 is strong then you'll still "hear" it on channel 1
  • Most APs will pick either channel 1, 6 or 11 to get around this, but that means you still have 10 other APs on whatever channel you pick
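
Here's a small Python sketch of why 1, 6 and 11 are the usual picks. It assumes the textbook numbers: channel centres 5MHz apart starting at 2412MHz for channel 1, and each channel roughly 22MHz wide. Real signals don't have such neat edges, so treat it as an approximation and not a model of your actual router.

```python
CHANNEL_WIDTH_MHZ = 22  # assumption: nominal width of a 2.4GHz channel


def centre_mhz(channel):
    """Centre frequency of a 2.4GHz channel (channel 1 is 2412MHz)."""
    return 2407 + 5 * channel


def overlaps(a, b):
    """True if two channels share spectrum under this simple model."""
    return abs(centre_mhz(a) - centre_mhz(b)) < CHANNEL_WIDTH_MHZ


channels = range(1, 12)  # channels 1 to 11
for ch in (1, 6, 11):
    clashing = [other for other in channels
                if other != ch and overlaps(ch, other)]
    print(f"channel {ch} overlaps with {clashing}")
```

Under these assumptions channels 1, 6 and 11 don't overlap with each other, but everything in between bleeds into at least one of them.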

Change the channel

I ended up forking out for a new router that had 5GHz wifi. A lot of home wifi is still stuck on 2.4GHz, meaning fewer neighbours to compete with. On top of that, the 5GHz frequency is stopped more easily by building materials (read: doesn't go through walls well), so you can expect it to be "quieter": even if the neighbours all switch to 5GHz tomorrow, I can still expect my own signal to be stronger than theirs at home. So what are the new speeds on 5GHz?


That looks more like it. Here's the change in the packet latency and loss graphs:


Note that while we still get the spikes in latency, they happen a lot less frequently. In practice this means just higher quality internet; a few days ago I'd get massive lag spikes while gaming online that would cut me off completely, whereas now I can have YouTube streaming on one laptop, game on another, and have zero problems. I've had a couple of Skype calls with great quality, and a Zoom call with my friends for Thanksgiving last night, and I definitely noticed the difference. Before, my audio would cut out and my video would be pixellated, but now everything is crisp and reliable with no lags and dropouts in the middle of the call.

Finally, for completeness, I changed the ping rates for you from 5 to 15 per 30 seconds, and switched between 2.4GHz and 5GHz. I also added a new destination to my smokeping setup - my router is at 192.168.0.1, so I've used this config in my Targets file:

++ Router
menu = Router
title = Router
host = 192.168.0.1

That lets me get graphs like this:


The first part is on 5GHz, the middle bit is on 2.4GHz (with a machine reboot in the middle), and then back to 5GHz again. Note that at 15 pings per 30 seconds we see the green dots moving around a bunch on 2.4GHz and a lot more black fuzz, whereas on 5GHz it's a lot calmer; still not perfect like the cable, but noticeably better. And to Facebook:


The drop in the middle is because my 2.4GHz wifi router is using different DNS servers (on 5GHz I'm using a DNS geo unblocker so I get a slightly longer path to Youtube and Facebook). You can see the difference though: 5GHz on the left and right looks fairly clean, and 2.4GHz in the middle is messy and unreliable.
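
The reasoning behind comparing the router graph with the Facebook graph works for any number of targets along the path: start at the nearest one and see where the noise first shows up. Here's a rough Python sketch of that idea. The function `first_noisy_target`, the 10ms threshold and the jitter figures are all hypothetical, picked for the example and not read off my graphs.

```python
def first_noisy_target(targets, jitter_limit_ms=10.0):
    """Return the nearest target whose jitter exceeds the limit, or None.

    targets: list of (name, jitter_ms) ordered from nearest to farthest.
    """
    for name, jitter_ms in targets:
        if jitter_ms > jitter_limit_ms:
            return name
    return None


# Invented figures for three situations
on_2_4ghz = [("Router", 45.0), ("Facebook", 50.0)]  # noisy from the first hop
on_5ghz = [("Router", 3.0), ("Facebook", 4.0)]      # clean all the way
bad_upstream = [("Router", 2.0), ("Facebook", 60.0)]  # clean at home, noisy beyond

for label, targets in [("2.4GHz", on_2_4ghz), ("5GHz", on_5ghz),
                       ("bad upstream", bad_upstream)]:
    print(label, "->", first_noisy_target(targets))
```

If the router itself is already noisy, the problem is on your side of the connection (most likely the wifi); if the router is clean and only the far targets are noisy, look further along the path.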

You need to measure it

This is a nice story, but the important thing here isn't that moving to 5GHz magically fixed everything, or that my Skype and Zoom calls are all crystal clear or that my ping is perfect and I'm an absolute gamer pro now. The point is that you need to test and measure to find out what your problem is, and run the same tests again to see if you've fixed the problem or merely kicked the can down the road. I hope this gives you a chance to dip your toes in with docker and smokeping, and I hope this can help you diagnose (and even solve) network problems in your own home network.