Erratic throughput drops on wireless network
Last Post: August 23, 2018:
-
Hi all,
Wanted to get some help on a problem I am struggling to get to the bottom of.
A wireless network at one of our sites is performing incredibly erratically. I am testing throughput from my laptop (connected wirelessly) to a wired device on the LAN using iperf. The throughput starts as expected (150Mbps, which is correct for the radio in my client and the wireless setup); however, it will suddenly drop into the teens and then flatline at 0 Mbps. Sometimes it will pick back up, but it always flatlines again. The odd thing is that this behaviour occurs 98% of the time, yet there are instances where you don't see it for several minutes before it starts again.
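For anyone wanting to put numbers on drops like these, here is a minimal sketch that parses the per-interval lines of iperf's plain-text output and groups consecutive low-throughput intervals into stalls. The function names, the 20 Mbps threshold and the 1.5 s interval cut-off are assumptions chosen for illustration, and the regex is only meant to cover the common iperf2/iperf3 report line shapes.

```python
import re

# Matches interval report lines such as (iperf2 and iperf3 styles):
#   [  3]  0.0- 1.0 sec  17.9 MBytes   150 Mbits/sec
#   [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec
LINE_RE = re.compile(
    r"\[\s*\d+\]\s+(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s+sec\s+"
    r"\S+\s+\S+\s+(\d+(?:\.\d+)?)\s+([KMG]?)bits/sec"
)
# Conversion factors from the reported unit to Mbps.
SCALE = {"": 1e-6, "K": 1e-3, "M": 1.0, "G": 1e3}


def parse_intervals(text, max_len_s=1.5):
    """Return (start_s, end_s, mbps) tuples for each interval line.

    Lines covering more than max_len_s seconds are skipped, which drops
    the end-of-test summary line (assumes the default 1 s report interval).
    """
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        start, end, rate, unit = m.groups()
        start, end = float(start), float(end)
        if end - start <= max_len_s:
            rows.append((start, end, float(rate) * SCALE[unit]))
    return rows


def find_stalls(rows, threshold_mbps=20.0):
    """Merge consecutive intervals below threshold_mbps into (start_s, end_s)."""
    stalls = []
    current = None
    for start, end, mbps in rows:
        if mbps < threshold_mbps:
            current = (current[0], end) if current else (start, end)
        elif current:
            stalls.append(current)
            current = None
    if current:
        stalls.append(current)
    return stalls
```

Feeding it the saved output of a long iperf run gives a list of stall windows whose timestamps can then be lined up against a packet capture or spectrum trace taken at the same time.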
The wireless network I am connected to is on 5GHz, and I have purposely moved the network onto a channel with no other wireless networks, and no unexpected RF (checked with a spectrum analyser).
Things we have changed:
1. Channel width
2. Power
3. Channel
4. Minimum bitrates
5. Creating a new SSID
6. Removing WPA2 from the SSID
I captured a file transfer using Omnipeek, and I couldn't see anything obvious that would cause this behaviour.
We have one company in the building above us (medial company), and I spoke with their IT department to see if any of our wireless networks had been marked/blacklisted which none had.
Today I got another MR26 sent to the site, and when plugged in it exhibited the same behaviour instantly. We completed a factory reset on the AP, and set it up to broadcast a brand new network. This wireless network is behaving differently, and I would say 80% of the time the iperf throughput is as expected, however we do see the occasional flat line. As soon as I rename the SSID back it performs poorly. We did attempt an SSID rename in the past, however that also performed poorly.
I took the new MR26 home with me today and plugged it into my home network, and so far it is not displaying any of these symptoms. Which feels like it leads me down the following avenues:
1. RF interference
2. LAN
3. Clients on the wireless
Any thoughts, suggestions or ideas to help narrow this one down?
Thanks
Adam
-
You have done a good job so far with the thinking and testing. These can be baffling puzzles to solve. Some thoughts on the three suspects:
1) Have you run the spectrum analyzer at the time when the throughput drops? The interference could be intermittent.
It could be an active deauth attack. Either on purpose or some IPS attacking your network. This should show up in Wireshark.
Have you tested 2.4GHz? Does it occur there?
2) It is hard to saturate Ethernet to the point where 150Mbps 802.11n wouldn't get through. You could keep an eye on the switches to see if they flicker or glow brightly.
3) Does this affect all clients? Different types and models?
But if the new test WLAN didn't have the problem then it should be on the infrastructure side. Especially so if the new AP immediately picked up the problem but it disappeared after reconfiguration.
I'm sorry but I don't have that much experience with Merakis. How do they behave when they have problems with the connection to the controller for example. Could there be something wrong with the uplink? Can you access the logs on the APs or the controller?
Hopefully someone with more Meraki experience chimes in.
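On the deauth suggestion above: in Wireshark the display filters `wlan.fc.type_subtype == 12` (deauthentication) and `wlan.fc.type_subtype == 10` (disassociation) should isolate these frames in a monitor-mode capture. If the frames are exported for scripting instead, a rough sketch of the same check follows. It assumes each frame is raw bytes starting at the 802.11 header (any radiotap header already stripped) and paired with a capture timestamp in seconds; the function names are made up for illustration.

```python
from collections import Counter

# First byte of the 802.11 frame control field:
#   bits 0-1 = protocol version, bits 2-3 = type, bits 4-7 = subtype.
MGMT_TYPE = 0
SUBTYPES = {10: "disassoc", 12: "deauth"}


def classify(frame):
    """Return 'deauth', 'disassoc', or None for a raw 802.11 frame."""
    if not frame:
        return None
    fc0 = frame[0]
    frame_type = (fc0 >> 2) & 0x3
    subtype = (fc0 >> 4) & 0xF
    if frame_type != MGMT_TYPE:
        return None
    return SUBTYPES.get(subtype)


def count_per_second(frames):
    """Count deauth/disassoc frames per whole second of capture time.

    frames: iterable of (timestamp_s, raw_bytes) pairs.
    """
    counts = Counter()
    for ts, raw in frames:
        if classify(raw) is not None:
            counts[int(ts)] += 1
    return dict(counts)
```

A burst of these frames in the same second as a throughput flatline would point towards an attack or an overzealous IPS; a flatline with none would point elsewhere.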
-
Hi,
Thanks very much for the response.
1) Have you run the spectrum analyzer at the time when the throughput drops? The interference could be intermittent.
I did, but to be honest it was fairly meaningless at the time, as I didn't know what 'right' looked like for this setup and so couldn't spot any differences (nothing stood out). Today I am going to do some captures at home, where I know it works, and redo them at the site to spot any differences. If you have recommendations on things to look out for, that would be greatly appreciated.
Have you tested 2.4GHz? Does it occur there?
This was something that crossed my mind yesterday but I never tried it. I have added it to my test plan and will absolutely try it.
2) It is hard to saturate Ethernet to the point where 150Mbps 802.11n wouldn't get through. You could keep an eye on the switches to see if they flicker or glow brightly.
Thanks. Will add a note into my test plan for this.
3) Does this affect all clients? Different types and models?
But if the new test WLAN didn't have the problem then it should be on the infrastructure side. Especially so if the new AP immediately picked up the problem but it disappeared after reconfiguration.
My personal testing has been limited to Windows-based laptops; however, all users in the office have been complaining of problems on all of their devices:
1. Work laptops (Windows)
2. Work phones (iOS and Android)
3. Personal devices
I don't currently believe the problem is isolated to a device type, but that doesn't mean a specific device isn't the cause.
I'm sorry but I don't have that much experience with Merakis. How do they behave when they have problems with the connection to the controller for example. Could there be something wrong with the uplink? Can you access the logs on the APs or the controller?
Possibly. The Merakis are cloud-based, with no controller in the infrastructure. I need to look further into what logs are available, but in my experience the more technical features of the APs are restricted or unavailable to keep Meraki 'friendly' to use.
I am currently writing up a test plan for my visit back to site tomorrow. Something I am wondering is whether it's a device on the LAN, such as another AP someone installed for their 'test labs'. I am going to see if I can get my test AP and a physical device isolated into their own VLAN on the same switch in an attempt to descope everything else on the LAN.
We are also going to be replacing these APs with some more modern Merakis, but I need to find the root cause for my sanity if nothing else (especially if the new APs don't make the problem go away).
Thanks
Adam
-
Not likely, but asking anyway: if you're on a DFS channel, are there any APs placed outdoors on the same channels? Any airports or harbors within (say) 6 miles or so?
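As a quick way to tell whether a given 5GHz channel is a DFS one, here is a small sketch. It assumes the common FCC-style layout, where channels 52-64 and 100-144 require DFS and 36-48 and 149-165 do not; other regulatory domains differ in places, so check the rules that apply at the site.

```python
def is_dfs_channel(channel):
    """Return True if a 5GHz channel number falls in a DFS range.

    Assumes the FCC-style split: 52-64 (U-NII-2A) and 100-144 (U-NII-2C)
    are DFS; 36-48 and 149-165 are not. This is an assumption, not a
    lookup against any regulatory database.
    """
    return 52 <= channel <= 64 or 100 <= channel <= 144
```

If the AP sits on a DFS channel, radar detection can force it off the channel without warning, which clients would see as a sudden loss of throughput.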
Are you using an RRM-like control on the APs? Are they set to totally "automatic"?
What are the "roaming thresholds" on the clients set to? Is there any chance they are trying to roam, either successfully or not?
Please let us know what you find.
Howard
-
Hi all,
Just a quick update.
I spent Wednesday putting together a test plan and I re-visited the site on Thursday. One of the first tests in my plan was to try and narrow the issue down to LAN or wireless, so I got an unmanaged switch and plugged my AP and wired device in. This unmanaged switch only had these two devices, with no internet access. While connected to this unmanaged switch, everything worked perfectly. I repeated this multiple times throughout the course of the day, and each time, moving the AP from the core LAN to the unmanaged switch fixed it, and moving it back broke it again.
Now I just need to work out the culprit on the LAN side :)
Thanks
A
P.S. For any Meraki users: if you ever want to use your Merakis offline, you must have a static IP set and one of the SSIDs in bridge mode, and the AP must have checked into the Meraki cloud with both of those configs set; otherwise it reloads the old 'safe' config.
-
One thought: are you using MFP/PMF on your WPA links?
I have seen some clients disconnect sporadically if it is turned on. Usually they connect just fine, but will drop after a certain amount of traffic. Make sure every client device has the latest firmware installed.