When it comes to troubleshooting enterprise Wi-Fi, it’s important to understand the scope of the issue before taking measures to address it. In this article, we recap a recent webinar in which Lee Badman, who is a Wireless Network Architect, CWNE #200, IT writer, and all-round renaissance man, discusses Wi-Fi troubleshooting best practices and why it's important to recognize that not all network-related problems have the same scope.
Here’s what Lee covered in his webinar:
- First, Slow Down to Consider Scope
- What Is Scope and Why Does It Matter?
- In a Healthy, Well-Designed Network, Most Problems Will Be Single-Client Issues
- However, the Problem May Originate From the WLAN
- Key Considerations for Problem Scoping
- Effective Scoping Drives Response Urgency
- Misjudging Scope Leads to Poor Decisions
- WATCH: Consider the Scope of Problems When Troubleshooting Wi-Fi With Lee Badman
- More About Lee Badman
First, Slow Down to Consider Scope
When troubleshooting enterprise Wi-Fi, there's an understandable urge to fix problems as quickly as possible. After all, Wi-Fi is core to most business operations, and interruptions to connectivity put significant strain on enterprise productivity. Nevertheless, network managers too often jump to conclusions about the cause: they react hastily, without thoroughly examining problems or assessing their scope, and end up wasting time addressing the wrong issues.
So, to kick off his presentation, Lee Badman emphasized the importance of slowing down. To drive this home, he made the point that rushing often leads to more problems, which could potentially impact company reputation as well as performance.
What Is Scope and Why Does It Matter?
So, what does "scope" mean? According to Lee, in the context of auditing issues affecting enterprise Wi-Fi, it involves three key elements:
- The number of clients or devices impacted.
- The physical locations of those clients or devices.
- Any commonalities among the affected clients or devices.
Before taking any concrete steps to address issues on an enterprise network, managers should make sure they have a general understanding of each of the above factors. To illustrate why, let’s look at a couple of examples provided by Lee that highlight the importance of scope.
Example 1: Defining the Problem
On Lee’s network, an area supervisor was having “issues with connectivity.” To address these issues, which the supervisor attributed to “weak signal strength” in their area, they reached out to Lee requesting a new access point (AP).
However, given the high density of coverage in their area, signal strength shouldn't have been an issue. Additionally, all APs and radios were operational, and signal strength on clients appeared good. Still, the supervisor insisted that the problems were real, prompting Lee to dispatch technicians. Once on site, they discovered that the weak signal was actually related to cellular service, not Wi-Fi, and that users were having trouble with the authenticator app, which was necessary for logging on to the network. No Wi-Fi issues were found.
So, by taking the time to thoroughly assess the situation, Lee and his team avoided unnecessary expenses (no new AP) and saved time (no resources wasted on setting up a new AP). In this case, the problem as described by the area supervisor wasn't accurate. Nevertheless, Lee and his team understood the importance of being sensitive to the supervisor’s concerns. It wasn't about proving who was right; the goal was to help the supervisor succeed. As subject matter experts, Lee and his team needed to assist in identifying and resolving the actual issues. To do this, they needed to look at them with a critical eye and scope out the extent of the problem.
Example 2: Identifying Root Causes
Lee’s second example highlights the importance of defining scope to identify the root causes of Wi-Fi problems.
Lee's network began experiencing widespread authentication issues on the 802.1X network. The problem spanned multiple buildings and wasn't specific to any user group. What’s more, key components like RADIUS, the back-end directory, and the WLAN were entirely unchanged: there had been no recent code upgrades, and the network was well instrumented, with alerts set up for all relevant system functions.
Still, the issue affected thousands of users and demanded immediate attention from Lee and his team. As it was an 802.1X network problem, well-meaning but unqualified managers suggested solutions like rebooting the RADIUS servers. But instead of reacting hastily, Lee scoped out the problem. He consulted with the help desk and conducted tests on his own devices. Within 15 minutes, he determined the issue was specific to Windows devices: it didn't affect Apple devices or mobile phones (Android or iOS), and both managed and BYOD Windows devices were impacted.
After some “Google-fu,” Lee discovered that others were experiencing similar issues because of a bad Windows update that caused an 802.11r (fast roaming) issue. The options were to disable 802.11r on that SSID or have all Windows clients roll back the update. Lee chose to disable the feature to prevent user inconvenience and maintain network stability.
So, by defining the problem's scope, Lee avoided unnecessary actions like rebooting RADIUS servers, saved time and money, and quickly identified the root cause. This approach prevented significant disruption and ensured a smooth resolution.
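The "what do the affected clients have in common?" step from this example can be sketched in a few lines of Python. This is only an illustration, not Lee's actual process: the report records, field names, and building names below are invented, and in practice the data would come from your help desk or monitoring tools.

```python
from collections import Counter

# Hypothetical help-desk reports; every field name and value is illustrative.
reports = [
    {"user": "a", "os": "Windows", "building": "Library", "managed": True},
    {"user": "b", "os": "Windows", "building": "Science Hall", "managed": False},
    {"user": "c", "os": "Windows", "building": "Admin", "managed": True},
    {"user": "d", "os": "Windows", "building": "Library", "managed": False},
]

def shared_attributes(reports, fields):
    """Return the fields whose value is identical across every report."""
    shared = {}
    for field in fields:
        values = Counter(r[field] for r in reports)
        if len(values) == 1:  # exactly one distinct value: a commonality
            shared[field] = next(iter(values))
    return shared

common = shared_attributes(reports, ["os", "building", "managed"])
print(common)  # {'os': 'Windows'}
```

Here the only attribute every report shares is the operating system, which mirrors what Lee found by hand: multiple buildings, managed and BYOD devices alike, but always Windows.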
In a Healthy, Well-Designed Network, Most Problems Will Be Single-Client Issues
A healthy network is one that isn't running on outdated or poor-quality code. The designs are robust and consider not only AP placement and density but also everything upstream, like Power over Ethernet (PoE), switches, capacity, and IP space. If your network meets these standards, then most problems will stem from client issues. Indeed, confidence in the WLAN typically leads to faster issue resolution since it effectively eliminates the network from consideration during the resolution process.
Still, there’s a good chance that, when issues crop up, end users will insist their devices are fine and that the network is at fault. This is when troubleshooting moves from the technical to the interpersonal. If a VIP complains that the network isn’t functioning but is reluctant to give you access to their devices, resolving their issues becomes vastly more complicated. However, Lee reminds us that network managers are the experts, and their time shouldn’t be wasted troubleshooting the network when the client device is likely the culprit. Making gratuitous changes to network settings for single-user issues is a fool’s errand.
However, the Problem May Originate From the WLAN
Having said that, some issues do actually originate from the WLAN. If this is the case, then they typically fall into the following categories:
- AP-related issues: These are typically measured by the number of affected users. If an access point (AP) fails, a localized group of users will be impacted.
- Switch-related issues: These are measured by the number of affected APs. If a switch providing PoE fails, it will impact multiple APs. The scope involves determining the extent of the affected area.
- Switch stack-related issues: These are measured by the number of affected switches. The problem becomes larger as more switches in the stack are impacted, leading to more wireless users being affected if an entire switch stack fails.
- VLAN-related issues: These can cause problems across multiple buildings. If a VLAN servicing several buildings encounters an issue upstream in the core, it can impact a large number of users.
- Controller/code-related issues: In a controller-based architecture, if a controller or controller cluster fails, it can cause enterprise-wide problems. The scope of such issues is extensive.
- Core switch-related issues: If a core switch servicing multiple buildings or campuses fails, the scope of the problem is even larger.
- Core services issues: Problems with core services like RADIUS, DNS, DHCP, etc., can be more challenging to detect and resolve, impacting a significant number of users and presenting tough problems to solve.
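To make the idea of widening scope concrete, here is a minimal Python sketch that estimates how many APs and clients sit downstream of a failed component. The topology, device names, and client counts are all made up for illustration; a real version would pull them from your inventory or monitoring system.

```python
# Hypothetical topology: each node names its parent; APs carry a client count.
parents = {
    "stack-1": "core-1",
    "switch-1a": "stack-1",
    "switch-1b": "stack-1",
    "ap-101": "switch-1a",
    "ap-102": "switch-1a",
    "ap-103": "switch-1b",
}
clients_per_ap = {"ap-101": 20, "ap-102": 35, "ap-103": 15}

def depends_on(node, failed):
    """True if `node` is `failed` or sits downstream of it."""
    while node is not None:
        if node == failed:
            return True
        node = parents.get(node)  # walk upstream; None once past the core
    return False

def blast_radius(failed):
    """Return (affected AP names, affected client count) for a failed component."""
    aps = sorted(ap for ap in clients_per_ap if depends_on(ap, failed))
    return aps, sum(clients_per_ap[ap] for ap in aps)

for component in ["ap-101", "switch-1a", "stack-1", "core-1"]:
    print(component, blast_radius(component))
```

In this toy topology, a single AP failure touches 20 clients, its PoE switch touches 55, and the stack or core touches all 70, which is the same progression the list above describes.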
Key Considerations for Problem Scoping
When scoping out problems, focus on answering the following questions:
- Is this a single-client issue, or are multiple clients in the same area being impacted?
- Is this issue happening on a single device type or operating system?
- Is this happening on a single SSID or on multiple? (If it’s happening on multiple SSIDs, then it usually indicates that shared infrastructure such as APs, controllers, or the core is the problem.)
- Have you made any recent changes to the network environment?
- Are core services healthy?
- What do the dashboards say?
- Have you read any discussion groups?
Effective Scoping Drives Response Urgency
Scoping is extremely important when it comes to triage. By understanding the severity and implications of various problems, you can better allocate resources and manpower to address the issues. For example:
- Is one AP down in a dense environment? The problem is likely not urgent enough to dispatch a technician immediately.
- Is an entire controller cluster down? This is a much higher priority situation requiring immediate action.
- Is one guest struggling to log in while others are fine? This probably doesn't warrant rebooting the portal server.
- Does the syslog show no free leases from DHCP servers? You should contact the IPAM manager immediately.
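The triage examples above could be written down as a simple rule set. The Python sketch below is one hypothetical way to do that; the component names, urgency labels, and rule order are assumptions made for illustration, not thresholds from the webinar.

```python
from dataclasses import dataclass

# Components whose failure is treated as wide-scope (illustrative list).
WIDE_SCOPE = {"controller_cluster", "core_switch", "dhcp", "radius", "dns"}

@dataclass
class Incident:
    description: str
    failed_component: str  # hypothetical labels, e.g. "ap", "client", "dhcp"
    users_affected: int
    redundant_coverage: bool = False  # True if neighboring APs cover the area

def urgency(incident):
    """Return a rough urgency label; the rules are illustrative assumptions."""
    if incident.failed_component in WIDE_SCOPE:
        return "immediate"
    if incident.failed_component == "ap" and incident.redundant_coverage:
        return "scheduled"
    if incident.users_affected <= 1:
        return "routine"
    return "prompt"

incidents = [
    Incident("One AP down in a dense area", "ap", 25, redundant_coverage=True),
    Incident("Controller cluster down", "controller_cluster", 4000),
    Incident("One guest cannot log in", "client", 1),
    Incident("DHCP servers report no free leases", "dhcp", 600),
]
for incident in incidents:
    print(f"{incident.description}: {urgency(incident)}")
```

The point of the sketch is not the specific labels but that urgency follows from scope: the same rule set gives very different answers for one guest and for a controller cluster.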
Misjudging Scope Leads to Poor Decisions
To conclude, Lee Badman reiterated that misjudging or ignoring scope can lead to poor decision-making. This includes ill-advised code upgrades, tampering with well-designed network structures, unnecessarily modifying timers, overriding RRM functions, prematurely declaring problems solved, focusing on the wrong areas for solutions, and neglecting Wi-Fi’s role within the overall network topology. To avoid making problems worse, always, always, always scope them out.
WATCH: Consider the Scope of Problems When Troubleshooting Wi-Fi With Lee Badman
Want to learn more about Wi-Fi troubleshooting? Check out Lee’s full remarks on the 7SIGNAL YouTube Channel.
More About Lee Badman
Lee Badman is a Wi-Fi industry maven and Certified Wireless Network Expert. He has 27+ years of networking experience, served for 10+ years in the United States Air Force as an electronic warfare “Journeyman,” and is a widely published freelance writer covering WLAN, Wi-Fi, IT, and related topics.
Learn More From the 7SIGNAL Experts
We’re always here to answer your Wi-Fi questions at 7SIGNAL. Our enterprise Wi-Fi optimization platform helps you plan and execute a healthier network. Contact us to learn more.
7SIGNAL® is the leader in enterprise Wi-Fi optimization, providing insight into wireless networks and control over Wi-Fi performance so businesses and organizations can thrive. Our cloud-based platform continually tests and measures Wi-Fi performance at the edges of the network, enabling fast solutions to digital experience issues and stronger connections for mission-critical users, devices, and applications. Learn more at www.7signal.com.