Colubris LMP: Extending hotspot reach

Despite competition from 3G and WiMAX, the Wi-Fi hotspot market continues to grow, fueled by mobile worker demands for high-speed Internet access to both data and voice services. But, in many public access venues, network infrastructure costs and complexities make it hard to expand hotspot capacity and coverage.


To simplify this, Colubris Networks recently added a new Local Mesh Protocol (LMP) to its MultiService Access Points (MAPs) and MultiService Controllers (MSCs). Using LMP, we quickly created a 5-node Colubris hotspot, interconnected by a self-healing Dynamic Wireless Distribution System to a single wired backhaul link. However, we were unable to centrally provision or monitor our local mesh hotspot—a key enhancement that Colubris expects to ship this October.









Colubris MSC 3300 ($1,099)
Colubris MAP-330 ($699)
5-node Local Mesh as tested: $3,895
Colubris Networks, Inc.
Waltham, Massachusetts
http://www.colubris.com

Colubris APs

Making a mesh


Many hotspot operators are already familiar with Colubris MAPs and MSCs. Today, Colubris sells half a dozen MAP models with single or dual radios, for indoor or outdoor use. This fall, Colubris expects to start shipping a new 802.11n MAP as well. In our test mesh, we used four MAP-330s with dual a/b/g radio APs that support up to 16 Virtual Service Communities (VSCs), each with its own SSID and QoS/security policy.


Colubris WLANs can be controlled in several ways. For example, Colubris 5000 series MSCs can provide access control and management for up to 200 MAPs/2000 users. Colubris 3000 series nodes combine MSC and MAP functionality to yield “hotspot in a box” solutions for up to 100 users. We chose a single MSC 3300 as our fifth (Master) AP—and the only node in our test mesh to have a wired Internet drop.


On each mesh node, we configured one radio with user-accessible 802.11b/g VSCs: an open “GuestNet” VSC requiring web authentication, a WPA2-encrypted “CorpNet” VSC requiring 802.1X authentication, and a MAC-authenticated VSC for tester access (see below). To focus on LMP, we stuck to these basic VSCs and did not attempt to exercise the more advanced QoS/WMM or VPN capabilities of our Colubris nodes.








To connect our MAPs to each other and the MSC, we configured each node’s second radio to participate in an AES-encrypted 802.11a Dynamic Wireless Distribution System (DWDS) group (see below). Our tests focused largely on set-up and operation of this self-healing wireless backhaul mesh and its impact on administration and usability.








Test network

To avoid competition with our 802.11b/g public access hotspot, we chose to dedicate each MAP’s second radio to 802.11a backhaul, secured by WPA2-PSK, without WMM. But we could have applied any radio, protocol, security, or QoS profile supported by our MAPs—for example, using WMM to prioritize VoIP across the mesh—so long as all DWDS group links share the same frequency, channel, and keys.


Overcoming static backhaul barriers


In past Colubris OS releases, MAPs could be inter-connected by static WDS links, nailed up between nodes. But if a static WDS node fails, all downstream MAPs are left high and dry. If one static WDS node changes channels to avoid interference, there is no guarantee that other MAPs will change to the same channel at the same time to preserve the mesh. These factors can make static wireless backhaul links unreliable and very hard to manage remotely. But the conventional alternative—Ethernet drops to every node—can be expensive or impractical in venues with hard-to-wire expanses.


In COS v5, Colubris added a Local Mesh Protocol (LMP) that MAPs can use to discover other dynamic WDS nodes, automatically forming the best possible backhaul links and healing the mesh without administrator intervention after failure or RF change. We found these improvements significantly reduced the effort associated with MAP installation and footprint/performance tuning. For example:



  • When we placed a MAP too far from the mesh to sustain a reliable link, we could easily reposition that MAP without relocating an Ethernet drop.
  • When we wanted to extend our test mesh’s footprint to another building 50 yards away, we just carried a MAP next door and plugged it into an AC outlet.
  • When an external 802.11a AP was placed near our MSC to create channel competition, the MSC shifted the entire mesh to a friendlier channel.
  • With few exceptions, when a MAP was “accidentally” rebooted or unplugged, the mesh reformed itself to work around the lost node within minutes.

Under the covers


In a Colubris DWDS, each node must be configured to serve as a Master, Alternate Master, or Slave. Our MSC served as our Master: a node that accepts upstream DWDS link requests from downstream nodes and relays traffic through an Internet connection. We configured all other MAPs as Alternate Masters: nodes that can simultaneously support both upstream and downstream DWDS links, and will step in to fill the Master role should no other Master exist. Initially, we chose not to configure any Slaves: leaf nodes that form only upstream DWDS links.


Each node tries to discover other nearby nodes with the same configured Group ID. A given group is limited to 10 nodes, but larger hotspots can be constructed by daisy-chaining MAPs that may participate in up to 6 groups. Unless channels are statically configured, the Master automatically chooses the quietest channel, while Alt-Masters and Slaves scan all channels to find a Master. If that Master should go down or change channels, Alt-Masters and Slaves re-scan to find a (new) Master. But if an Alt-Master cannot find a Master within a designated period (by default, 20 seconds), it starts acting as the new Master for its group.
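The promotion rule described above can be sketched in a few lines. This is only an illustration of the behavior as we understood it, not Colubris code: the names `Role`, `PROMOTION_TIMEOUT_S`, and `acting_role` are hypothetical, and only the 20-second default comes from the product.

```python
from enum import Enum


class Role(Enum):
    MASTER = "master"
    ALT_MASTER = "alt-master"
    SLAVE = "slave"


# Default period quoted in the review; the constant name is our own.
PROMOTION_TIMEOUT_S = 20.0


def acting_role(configured: Role,
                seconds_without_master: float,
                timeout_s: float = PROMOTION_TIMEOUT_S) -> Role:
    """Return the role a node acts in after scanning for a Master.

    Assumption: only Alt-Masters promote themselves, and only once the
    scan has gone on for the whole timeout without finding a Master.
    Masters stay Masters; Slaves keep scanning indefinitely.
    """
    if configured is Role.ALT_MASTER and seconds_without_master >= timeout_s:
        return Role.MASTER
    return configured
```

Under these assumptions, an Alt-Master that has scanned for 25 seconds acts as Master, while a Slave never does, no matter how long it waits.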


Due to this auto-discovery, we were never forced to configure channel assignments or initiate backhaul links to create our test mesh or avoid co-channel interference. However, we learned that Masters can only re-evaluate the DWDS channel every 1 to 24 hours, or at a configured time of day. This delayed our mesh’s reaction to 802.11a competition, although once MSC re-evaluation started, the entire mesh reliably shifted to the same new channel within two minutes (see below).






As a local mesh is forming, each downstream node (Alt-Master or Slave) tries to establish a link to the “best” upstream node, based on Signal to Noise Ratio (SNR) and hop count. Configurable parameters influence this decision (see below). For example, the default SNR cost per hop is 10 dB, so a Slave will prefer a Master with SNR=45 over an Alt-Master with SNR=50, but readily shifts to an Alt-Master with SNR=60, assuming both Alt-Masters are one hop from the Master. Links are only attempted to nodes that exceed a configured minimum SNR (by default, 20 dB). If an uplink goes down, downstream nodes try to reconnect for a designated period (by default, 10 seconds) before restarting this DWDS node discovery and link assessment process.
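The SNR example in the paragraph above is consistent with a simple scoring rule: subtract the per-hop cost from a candidate's measured SNR once for each hop between that candidate and the Master, then pick the highest score among candidates that exceed the minimum SNR. The sketch below assumes that rule. We did not see the actual Colubris algorithm, so treat `Candidate`, `effective_snr`, and `pick_uplink` as hypothetical names; applying the minimum to the raw (unpenalized) SNR is also our assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

SNR_COST_PER_HOP_DB = 10.0  # default cost per hop quoted in the review
MIN_SNR_DB = 20.0           # default minimum SNR quoted in the review


@dataclass(frozen=True)
class Candidate:
    name: str
    snr_db: float        # SNR measured by the downstream node
    hops_to_master: int  # 0 for the Master itself


def effective_snr(c: Candidate,
                  cost_per_hop_db: float = SNR_COST_PER_HOP_DB) -> float:
    """Measured SNR minus the assumed penalty for each hop to the Master."""
    return c.snr_db - c.hops_to_master * cost_per_hop_db


def pick_uplink(candidates: Sequence[Candidate],
                min_snr_db: float = MIN_SNR_DB,
                cost_per_hop_db: float = SNR_COST_PER_HOP_DB
                ) -> Optional[Candidate]:
    """Return the best-scoring candidate, or None if none is eligible."""
    eligible = [c for c in candidates if c.snr_db > min_snr_db]
    if not eligible:
        return None
    return max(eligible, key=lambda c: effective_snr(c, cost_per_hop_db))


master = Candidate("Master", snr_db=45.0, hops_to_master=0)
alt_a = Candidate("Alt-Master A", snr_db=50.0, hops_to_master=1)
alt_b = Candidate("Alt-Master B", snr_db=60.0, hops_to_master=1)

first_choice = pick_uplink([master, alt_a])          # 45 beats 50 - 10
second_choice = pick_uplink([master, alt_a, alt_b])  # 60 - 10 beats 45
```

With the default parameters, `first_choice` is the Master and `second_choice` is Alt-Master B, matching the example in the text.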






In general, our test mesh recovered quickly and predictably after loss of an individual MAP, such as when we disabled a DWDS radio or rebooted an upstream node. With default parameters, uplink establishment after MAP loss typically took about 30 seconds. During these events, hotspot clients lost Internet reachability for about 45 seconds and rarely required web user re-authentication. In other words, self-healing meant that many backhaul mesh breaks had no more user impact than a typical web page timeout and did not disconnect data application sessions or VPN tunnels.


Some tuning required


The automation afforded by LMP enabled self-recovery from most induced failures, but we did need some post-installation tuning to optimize results.


At first, our most-distant MAP did not reliably recover from upstream node loss. By watching that MAP’s DWDS SNR measurements, we found that just one other node consistently delivered SNR greater than the configured minimum. Because we could not move those MAPs closer, we adjusted our DWDS group’s minimum SNR just slightly. That let the distant MAP stay connected by re-linking to alternative MAPs with marginal SNR, but only when absolutely necessary. The lesson here: although the DWDS configures itself, you may want to adjust MAP placement or settings to achieve not just the desired hotspot coverage, but also backhaul resiliency.


We followed Colubris guidelines to create a self-healing DWDS with one Master MSC and several Alt-Master MAPs, but quickly realized that we were too dependent on a single Master. When the MSC rebooted, our mesh often recovered within three minutes. However, users not only lost Internet reachability for this period—they were forced to re-authenticate, with noticeable application disruption. Worse, longer MSC outages left surviving MAPs physically inter-connected but unable to authenticate users or deliver Internet access. We could have connected another MAP to Ethernet as a second Master, but resiliency really required redundant DHCP, user authentication, and related network services. Consider this factor when deciding whether to host these higher-layer functions on a Series 3000 or 5000 MSC or on independent upstream servers.


Early on, our mesh occasionally refused to heal after MSC reboot, rooting itself to a less-desirable MAP instead. After experimentation, we found and fixed the culprit: distant Alt-Master MAPs that could hear our MSC only very faintly or not at all. If a distant MAP assumed the Master role when the MSC went down, it would not relinquish that role when the MSC returned. MAPs in between would stick to their new (Alt-)Master until that distant MAP was rebooted or administratively forced back into discovery. We remedied this by changing those distant Alt-Masters to Slaves. Thereafter, our mesh reliably rooted itself to our MSC (with its essential Internet connection and public access controls) whenever that option was available.

Installation and provisioning

Thanks to LMP, our test network was installed and operational within a few
hours. We did not conduct a site survey or pull Ethernet cable; we just placed
MAPs wherever we wanted hotspot coverage. We spent under an hour configuring
the MSC and integrating it with our Juniper AAA. We then spent a couple of
hours configuring every MAP with identical radio, network, and VSC settings.
This part would have gone faster if LMP supported Controlled Mode.

Controlled Mode provides auto-discovery and central provisioning of MAPs within
the same mobility group. Management actions that apply to every MAP, like VSC
configuration and firmware upgrades, are performed just once on the MSC and
pushed to all MAPs. However, data path functions, including QoS and security,
are still performed on each Controlled Mode MAP.

Alternatively, Colubris MAPs can operate in Autonomous Mode—the familiar
paradigm where each “fat AP” is configured independently. When we started our
test, we found that the initial Colubris LMP offering is limited to Autonomous
Mode. Colubris told us that Controlled Mode LMP, now in beta test, will be released
on 5000 series MSCs by October 2007, but is not being ported to 3000 series
MSCs like our 3300.

Autonomous Mode has no impact on DWDS group discovery and self-healing; those
local mesh capabilities were fully automated in tested products. What Autonomous
Mode lacks is central configuration of VSCs and related radio, QoS, and security
parameters. Instead, we had to hand-configure each MAP with identical settings,
using a laptop and cross-over Ethernet cable. This was not a big deal because
our hotspot was small and our VSCs were simple. But in a larger network with
complex VSCs, we believe that Controlled Mode LMP is essential to reduce effort
and ensure consistency.
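Until central provisioning arrives, one way to guard against hand-configuration mistakes is to compare each MAP's settings against a reference copy. The sketch below is a generic illustration, not a Colubris tool: it assumes you have already collected each node's settings into a plain dictionary by some means of your own, and the setting names shown are hypothetical placeholders.

```python
from typing import Any, Dict, List, Tuple

Settings = Dict[str, Any]
Diff = Tuple[str, Any, Any]  # (setting name, reference value, node value)


def find_config_drift(reference: Settings,
                      configs: Dict[str, Settings]) -> Dict[str, List[Diff]]:
    """Report, per node, every setting that differs from the reference.

    A setting missing on one side is reported with None on that side.
    Nodes that match the reference exactly are omitted from the result.
    """
    drift: Dict[str, List[Diff]] = {}
    for node, cfg in configs.items():
        diffs = [
            (key, reference.get(key), cfg.get(key))
            for key in sorted(set(reference) | set(cfg))
            if reference.get(key) != cfg.get(key)
        ]
        if diffs:
            drift[node] = diffs
    return drift


# Hypothetical setting names and values, for illustration only.
reference = {"dwds_group_id": 1, "backhaul_band": "802.11a",
             "vsc_ssids": ["GuestNet", "CorpNet"]}
collected = {
    "map-1": dict(reference),
    "map-2": {**reference, "dwds_group_id": 2},
}
drift = find_config_drift(reference, collected)
```

Here `drift` flags only `map-2`, whose group ID differs from the reference. A mismatch like that would matter in practice, because nodes only discover peers that share their configured Group ID.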

Maintenance and monitoring

Determining mesh status and applying changes was consistently more challenging.
After learning a few tricks from tech support and becoming familiar with log records,
we had little trouble understanding and adjusting our network’s behavior. Nonetheless,
this is where administrators will spend the bulk of their time and could benefit
from a few improvements.

Controlled Mode LMP will be a huge help for routine maintenance. In the meantime,
updates require access to each MAP’s web or ssh admin interface. This wasn’t
as easy as you might think. When access is controlled by an MSC, MAPs are treated
as users of the first configured VSC. For example, MAPs cannot reach an external
time server or syslog server unless they log into the VSC. Similarly, NAT bindings
can relay admin requests to MAPs behind an MSC, but responses are dropped if
the MAP is unauthenticated.

On our MSC, the first VSC required GuestNet web login. Tech support showed
us how to add MAP usernames and passwords to the MSC’s local list, then use
custom attributes to return those credentials in response to RADIUS Access Requests
from each MAP. This is awkward, but it works. Thereafter, each MAP authenticated
as soon as it sent traffic after DWDS link (re)establishment and could be administered
from our LAN.

Of course, MAPs cannot authenticate when DWDS links are down. When MAPs went
AWOL during our tests, we needed to administer them directly. Plugging a laptop
into each MAP’s Ethernet port was inconvenient, so we configured a hidden VSC
on each MAP for Wi-Fi admin access. We would not advise leaving this VSC enabled
in a production network, but it was extremely helpful for DWDS testing and fine-tuning.

Admin interface access is a means to an end. For monitoring, each MSC and MAP
supports real-time status queries and local log file access. For historical
event analysis, you can aggregate individual logs on a shared syslog server.
For example, we used those logs to determine when each node lost and re-created
DWDS links (see below).

To identify active mesh topology, we had to query each node for its own DWDS
group status (below), then correlate all displayed links and MAC addresses
to map out paths from root to leaf. SNR measurements for available-but-unlinked
nodes can also be seen, but on another screen. Thus, in a 5-node mesh, we had
to retrieve 10 screens to get a complete picture of DWDS status. We would love
to see that topology accessible from one place (preferably the MSC) in a future
release. In the meantime, plenty of useful and necessary information can be
found in log files—you just have to dig for it.
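The manual correlation described above amounts to following each node's upstream link until you reach the root. A minimal sketch of that bookkeeping follows, assuming you have noted each node's current upstream peer by hand from its status screen. The function name and the node labels are hypothetical; in practice the keys would be the MAC addresses shown on those screens.

```python
from typing import Dict, List, Optional


def paths_from_root(upstream_of: Dict[str, Optional[str]]
                    ) -> Dict[str, List[str]]:
    """Map each node to its path from the mesh root down to that node.

    upstream_of[node] is the node's current upstream peer, or None for
    a node with no upstream DWDS link (the root).
    """
    paths: Dict[str, List[str]] = {}
    for node in upstream_of:
        path = [node]
        seen = {node}
        current = node
        # Walk upstream until a node with no recorded uplink is reached.
        while upstream_of.get(current) is not None:
            current = upstream_of[current]
            if current in seen:
                raise ValueError(f"loop in recorded links at {current}")
            seen.add(current)
            path.append(current)
        paths[node] = list(reversed(path))
    return paths


# Hypothetical 5-node mesh; labels stand in for MAC addresses.
links = {"msc": None, "map-1": "msc", "map-2": "msc",
         "map-3": "map-1", "map-4": "map-3"}
topology = paths_from_root(links)
```

In this example, `topology["map-4"]` is the three-hop path from `msc` through `map-1` and `map-3`. Rerunning the function after a failure shows at a glance which nodes re-rooted.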

Conclusion

Note that we did not attempt to compare Colubris LMP to outdoor muni-wireless
mesh products—LMP is definitely not aimed at that space. Rather, LMP is an
enhancement for public and private Colubris hotspots, letting operators extend
their reach at lower cost. By putting LMP through its paces, we hoped to assess
this feature’s impact on Colubris hotspot administration and usability.

In static multi-AP hotspots, failures often cause users to connect to APs that
“sound good” but lack upstream connectivity. In our LMP-enabled hotspot, users
rarely made this mistake. Instead, users could roam throughout the network without
persistent loss of connectivity. After minimal tuning, our network availability
was good. In the end, significant disruption only occurred when our MSC went
down and stayed down—a vulnerability we could have reduced by adding upstream
redundancy.

Our tests also demonstrated how LMP lowers the cost and difficulty of hotspot
expansion. With the current release, network installation was faster and cheaper
than it would have been with Ethernet backhaul. Furthermore, the self-healing
DWDS clearly required less care and feeding than static wireless backhaul. However,
we found that LMP provisioning and monitoring capabilities are not yet fully
baked and look forward to seeing those improvements in the next release.
