
How "Controls" Help Isolate Wi-Fi Issues

09/22/2015

Experienced wireless networking professionals know that reliable, consistent Wi-Fi performance is never guaranteed. Sooner or later, performance black spots show up somewhere in your network.

From interference and congestion to shifts in user density and ever-increasing demand, there are so many factors at play that it is often difficult to pinpoint and isolate the true culprit.

You’re in the hot seat, no matter the cause

Degraded performance is often caused by something entirely out of your control: congestion at your router, new and unexpected sources of interference, device driver issues, and more. Unfortunately, whatever the reason, whenever the Wi-Fi user experience suffers, everyone assumes the Wi-Fi is what sucks, because that’s where the problem shows up. And the onus is on you to diagnose and fix it before it escalates into meetings with your boss about your future with the company.

But troubleshooting suspected Wi-Fi performance issues is extremely hard when you lack visibility into the performance you should expect under different circumstances, such as user load and time of day. For example, do you know the performance norms for the different types of devices on your Wi-Fi network, or how much deviation from the mean is acceptable before you call it a problem?

No amount of planning can predict change

When you roll out a new network, you will most likely use a predictive planning tool, either from the WLAN equipment vendor or from a third party, such as Ekahau’s Site Wi-Fi Planner. You will also set coverage, performance, and capacity goals, which the planning tool uses to determine how many APs you need and where to place them. For many organizations, the initial deployment is about the only time the Wi-Fi performance profile is well understood. After that, who knows? In Wi-Fi networks, the only guarantee is change.

For one reason or another, APs get added or moved, configurations get tweaked, office furniture gets rearranged, user densities rise and fall, and with each passing week more users bring more devices that demand more from the network. As a result, your original plan and the actual network can quickly diverge. But how often do you circle back to check how the WLAN is actually performing against the planning tool’s predictions? You don’t, because you’re too busy dealing with the next crisis or expansion!

Usage and environment are always in flux

For example, one university customer told us that after months of planning and deployment for its new science building, students reported they could barely connect in one of the auditoriums. The problems did not start right away, but three weeks after school started. The university had planned for 6 Mbps per person at full occupancy.

There was no explanation until they realized that students had made the area outside the building their new favorite place to park their bicycles! “Always-on” student smartphones were connecting from outside to APs inside the building at very low data rates, draining airtime from everyone else.

Another customer, a middle school, discovered that during the summer break facilities staff had moved all the metal lockers from one end of a hall to the other, blissfully unaware of the impact on the RF environment. No doubt you have war stories of your own. Some Wi-Fi performance issues are elusive; figuring out what’s wrong takes all your faculties and discipline.

Although they are highly proficient at their jobs, many WLAN engineers do not have a good measure of their network’s Wi-Fi performance metrics, and without it, figuring out what is really going on takes a lot more work. This is not their fault. First, they don’t have the tools to track end-user performance day to day, or to correlate end-user performance with network load, area by area. Second, they are so busy putting out fires that there is no time for basic performance benchmarking, even with the limited tools they do have.

Lack of visibility complicates troubleshooting

Let’s return to the question of diagnosing root cause. Sooner or later, this lack of visibility will come back to bite you, turning what should be a fast, simple diagnosis into a protracted, frustrating search for a needle in a haystack.

We see this all the time in our customers’ networks. Often, when customers first deploy the Sapphire System, they engage us or our partners to help audit the network, quickly fix the niggling problems, and bring performance up to a target level before setting the performance SLAs by which they’ll monitor the Wi-Fi experience going forward. During these optimization projects, we encounter all manner of unexpected causes that have nothing to do with the Wi-Fi configuration.

Case in point: when a large children’s hospital began a large-scale rollout of Computers on Wheels (CoWs) to support Electronic Medical Records at the bedside, the Sapphire System revealed a sharp rise in retransmission rates that drained as much as 40% of network capacity. With this information, the hospital diagnosed the cause as a faulty Wi-Fi driver on the laptops selected for the CoWs, combined with high attenuation from the carts themselves. Thanks to Sapphire, they averted a fiasco and rectified the problem in short order.

At one of the largest international airports in the US, chronically slow Internet performance over the Wi-Fi serving the terminals was quickly resolved when the Sapphire System revealed that wired-network performance was almost as bad. The culprit turned out to be a congested Internet router that had gone unnoticed for months and was crippling performance for wireless users in all terminals.

How does Sapphire help you isolate root cause?

How does the Sapphire System help IT uncover root causes that other methods miss? The short answer is “Controls.” We’re not talking about “dials,” but about “reference data” that gives context to your real-time test results. As with any research technique, whether it is testing drugs on rats or testing a marketing campaign, you can’t prove anything without a “Control” case. The Sapphire System has many “Controls,” in the form of historical data and alternative ways of measuring the same data points, that help you eliminate phantom causes and quickly isolate the true culprit.

For example, the Sapphire System doesn’t just test performance over the Wi-Fi; it runs the same tests over the wire as well. This “Control” quickly exposes problems that are network-related rather than Wi-Fi-specific. But there’s more...
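The wired-versus-Wi-Fi “Control” boils down to a simple decision rule. This is a minimal sketch, assuming you already have one Wi-Fi and one wired throughput result for the same location and time. The function name and the SLA thresholds are hypothetical illustrations, not part of the Sapphire System.

```python
def classify_degradation(wifi_mbps: float, wired_mbps: float,
                         wifi_target_mbps: float, wired_target_mbps: float) -> str:
    """Use a wired test as the control for a Wi-Fi test (hypothetical helper).

    Both targets are SLA thresholds you would choose for your own network.
    """
    wifi_ok = wifi_mbps >= wifi_target_mbps
    wired_ok = wired_mbps >= wired_target_mbps
    if wifi_ok and wired_ok:
        return "healthy"
    if not wired_ok:
        # The control path is degraded, so look upstream
        # (routers, Internet link, servers) before blaming the Wi-Fi.
        return "network-wide"
    # Wired is fine but Wi-Fi is not: the problem is in the wireless domain.
    return "wifi-specific"


# Airport-style scenario: both paths are slow, so the Wi-Fi is not the culprit.
print(classify_degradation(3.0, 4.0, wifi_target_mbps=20.0, wired_target_mbps=50.0))
# prints network-wide
```

Checking the wired control first keeps the rule honest: a slow Wi-Fi result only counts against the WLAN when the wire, measured at the same time, is healthy.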

The Sapphire System doesn’t just test performance when the network is in use; it also tests performance throughout the night, when the network is mostly idle (unless you’re a hospital). This “Control” gives you a high-water mark against which you can compare loaded behavior. If the high-water mark looks more like a low-water mark, you know you’re in trouble! Take the case depicted in the headline picture: say all the lockers got moved around during summer break. With no traffic on the network, how would you know something had changed? With Sapphire, you’d see the change in performance immediately, compare it with the history for that area, and deal with it in time for school.
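The overnight “Control” amounts to two comparisons: tonight’s idle result against the idle history for the same area, and daytime results against that idle high-water mark. This is a minimal sketch, assuming you log one idle-hours throughput figure per night per area. The function names, the 25% drop threshold, and the sample numbers are hypothetical.

```python
from statistics import mean
from typing import Sequence


def idle_baseline_dropped(history_mbps: Sequence[float], tonight_mbps: float,
                          max_drop: float = 0.25) -> bool:
    """True if tonight's idle-hours result fell more than max_drop (a fraction)
    below the historical idle-hours average for the same area."""
    baseline = mean(history_mbps)
    return tonight_mbps < baseline * (1 - max_drop)


def load_penalty(idle_mbps: float, loaded_mbps: float) -> float:
    """Fraction of the idle (high-water mark) throughput lost under daytime load."""
    return 1 - loaded_mbps / idle_mbps


# Hypothetical overnight results for one hallway before the lockers moved.
history = [100.0, 98.0, 102.0, 100.0]
print(idle_baseline_dropped(history, 60.0))  # True: 60 is below 75
print(idle_baseline_dropped(history, 95.0))  # False
```

Because the check uses idle-hours data only, it can flag an RF change like relocated lockers before any users return to load the network.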

The Sapphire System doesn’t just test performance from Sensor to AP, because users’ mobile devices don’t have the same grade of antenna as a Sensor. It also captures data from all classes of clients as they move around. This “Control” exposes coverage issues and reveals device-specific trends. You’re no longer in the dark about what performance level is normal for an iPad mini with 11ac versus an older iPhone 5 with 11n, or a BlackBerry. You can even compare different OS versions on the same model, or devices from the same brand.
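Per-device-class norms also answer the earlier question of how much deviation from the mean is acceptable. This is a minimal sketch, assuming throughput samples tagged with a device class. The class labels, the 2-standard-deviation limit, and the function names are hypothetical illustrations, not Sapphire’s method.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, Tuple

Norms = Dict[str, Tuple[float, float]]


def device_norms(samples: Iterable[Tuple[str, float]]) -> Norms:
    """Group (device_class, mbps) samples and return (mean, stdev) per class."""
    groups = defaultdict(list)
    for device_class, mbps in samples:
        groups[device_class].append(mbps)
    # stdev() needs at least two data points, so sparser classes are skipped.
    return {cls: (mean(v), stdev(v)) for cls, v in groups.items() if len(v) >= 2}


def is_outlier(norms: Norms, device_class: str, mbps: float,
               z_limit: float = 2.0) -> bool:
    """Flag a result more than z_limit standard deviations below its class mean."""
    mu, sigma = norms[device_class]
    if sigma == 0:
        return mbps < mu
    return (mu - mbps) / sigma > z_limit


samples = [
    ("ipad-mini-11ac", 80.0), ("ipad-mini-11ac", 90.0), ("ipad-mini-11ac", 100.0),
    ("iphone5-11n", 30.0), ("iphone5-11n", 35.0), ("iphone5-11n", 40.0),
]
norms = device_norms(samples)
# The same 35 Mbps is abnormal for the 11ac tablet but normal for the 11n phone.
print(is_outlier(norms, "ipad-mini-11ac", 35.0))  # True
print(is_outlier(norms, "iphone5-11n", 35.0))     # False
```

Judging each result against its own device class avoids two errors: blaming the network for an older radio’s normal performance, and missing a real drop on a newer device.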

The Sapphire System doesn’t just test performance against some arbitrary server in the cloud. It lets you choose which servers to test against, including servers in your network. This “Control” quickly exposes Internet connection and server-specific issues.

The Sapphire System doesn’t just test upload/download performance; it looks at hundreds of metrics through active and passive testing, including packet loss, retransmission rates, attach rates, and much more, and presents that information as meaningful Key Performance Indicators in a browser dashboard.

Finally, the Sapphire System lets you annotate your timeline charts with configuration events, so you can easily verify and visualize the performance impact of recent changes.

Why not take a quick tour of the Sapphire System dashboard, or contact us for a personalized demo to see how the Sapphire System’s many “Controls” help you isolate device and OS issues from Wi-Fi and network issues, and ensure the best possible Wi-Fi experience for your users?