Very Technical

XenMotion with HP ProCurve Switches

13 Mar

What good is a high availability solution without testing?

Not much, and certainly not worth a three-minute wait, as we found out this morning while testing a client’s implementation.

How Does It Work?

Our design consists of two HP ProLiant servers running Citrix XenServer 5, a Compellent SAN, a bunch of Windows hosts (both clustered and non-clustered) and a couple of HP ProCurve switches connecting everything together with redundant paths.

DedicatedIT's Standard XenServer HA Design

XenServer 5 (enterprise and higher editions) has the capability to move active virtual machines between hosts.  Live.  On-the-fly.  No downtime.  No interruptions. As with any solution, this capability is dependent on a proper foundational architecture. This includes shared storage (SAN) accessible by all Xen hosts in the resource pool and an enterprise-class Ethernet infrastructure.  One popular line of networking products for the SME segment is, of course, HP ProCurve. Cisco is always a good choice as well.

XenMotion VM Migration Normally Quick — Not Today

Xen’s Marathon Technologies-based VM high availability feature restarted the protected VMs quickly after our simulated host failure, without any manual remediation. When the original host was restored, I moved the protected VMs back to their home server. After the move, the VMs had no network connectivity to any IP outside the host on which they now resided, and that lasted three whole minutes. That is neither acceptable nor typical, given our own experience with XenMotion VM migration. We’ve moved VMs running Citrix XenApp with 20+ active clients between hosts. It’s normally so seamless that users don’t notice.

PING Network Connectivity Failure

I immediately suspected MAC problems, since the migrated VMs could still reach the other workloads on the same host and the hosts magically opened to the rest of the world after three minutes. It wasn’t an ARP issue: the virtual MACs assigned to a VM’s network adapters are permanent, even between hosts, so every IP-to-MAC mapping cached elsewhere on the network was still correct. So, it appeared to be a MAC table issue on the switch. This was puzzling to me, as I’d never encountered such a problem before.

A Lesson on HP Procurve Switch MAC Tables and MAC Age

A switch’s MAC table is built from packets leaving a node and entering the switch. The switch grabs the source MAC address from a packet entering a port and adds it to the table, along with the port it came in on. It then uses that table to direct other inbound packets with that destination MAC to the proper port, which avoids the need to broadcast the packet to every port. Since MAC tables are built passively from inbound packets (as opposed to ARP caches on a node, which are built actively), they tend to converge very, very quickly, especially since switches update the table with every passing packet. All switches I’ve encountered exhibit that same update behavior. All of them, that is, except this HP ProCurve 2510G.

MAC Table - Courtesy cisconinja.wordpress.com/
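
To make the learning behavior concrete, here is a minimal Python sketch of a well-behaved switch. It is a toy model, not ProCurve firmware: the class name `LearningSwitch`, the port numbers and the MAC addresses are all made up for illustration. The point to notice is that the table entry is overwritten on every frame, so a migrated VM is found on its new port as soon as it transmits.

```python
class LearningSwitch:
    """Toy model of a switch MAC table (hypothetical, for illustration only)."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # source MAC -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the sender's port, then return the list of egress ports."""
        # Learn or refresh: the newest frame always wins.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        return [out_port]


# Assumed topology: host A on port 1, host B on port 2, a client on port 3.
switch = LearningSwitch(ports=[1, 2, 3])
VM, CLIENT = "aa:aa:aa:aa:aa:01", "bb:bb:bb:bb:bb:02"

flooded = switch.receive(1, VM, CLIENT)       # VM on host A; client not yet known
to_vm = switch.receive(3, CLIENT, VM)         # client replies; VM is known on port 1
switch.receive(2, VM, CLIENT)                 # VM migrates to host B and transmits
to_vm_after = switch.receive(3, CLIENT, VM)   # client traffic follows the VM
```

In this model `to_vm` is `[1]` before the migration and `to_vm_after` is `[2]` right after the VM's first frame from its new host, which is the behavior we had come to expect.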


The Solution

I confirmed my suspicions by lowering the age limit on entries in the MAC table: the length of the VM’s connectivity outage tracked the table’s aging time. Then, I came upon a fellow who had the exact same problem with the exact same switch. Turns out that the switch was not accepting updates for entries already in the MAC table. The entry had to expire before the MAC could be added back to the table with its new port.
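
The faulty behavior can be modeled in a few lines of Python. This is only a sketch under stated assumptions: `StaleEntrySwitch` and `outage_seconds` are hypothetical names, time is counted in whole seconds, the VM sends one frame per second from its new port, and frames arriving on the new port do not refresh the stale entry's age (which matches what we observed, but is my reading rather than anything HP documents).

```python
class StaleEntrySwitch:
    """Toy model of the bug: a live MAC entry is never overwritten."""

    def __init__(self, age_limit):
        self.age_limit = age_limit  # seconds before an entry expires
        self.mac_table = {}         # MAC -> (port, time the entry was learned)

    def learn(self, mac, port, now):
        entry = self.mac_table.get(mac)
        if entry is not None and now - entry[1] < self.age_limit:
            return  # modeled bug: ignore updates until the entry has aged out
        self.mac_table[mac] = (port, now)

    def lookup(self, mac, now):
        entry = self.mac_table.get(mac)
        if entry is None or now - entry[1] >= self.age_limit:
            return None
        return entry[0]


def outage_seconds(age_limit):
    """Count the seconds during which traffic is still sent to the old port."""
    vm = "aa:aa:aa:aa:aa:01"
    switch = StaleEntrySwitch(age_limit)
    switch.learn(vm, port=1, now=0)          # last frame seen from the old host
    outage = 0
    for now in range(0, 2 * age_limit):
        switch.learn(vm, port=2, now=now)    # VM transmits from its new host
        if switch.lookup(vm, now) != 2:
            outage += 1                      # switch still points at port 1
    return outage


three_minute_case = outage_seconds(180)
one_minute_case = outage_seconds(60)
```

Under these assumptions the outage equals the age limit (180 seconds for a 180-second limit, 60 for a 60-second limit), which is the same correlation I saw when I lowered the aging time on the real switch.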

His story can be found on the Citrix forums here.  HP has corrected the bug as of version Y.11.08 as indicated in their release notes.

Until next time, happy VM migrations!

One Response to “XenMotion with HP ProCurve Switches”

  1. S. FL Business Owner

    TLM: XenMotion on an HP Procurve switch with old(er) firmware blew up in our faces this morning: http://tinyurl.com/XenMotionProcurveFAIL
