Solving All Paths Down (APD) when using EMC VPLEX

Let me preface this with a few facts:

  1. I’m not a software engineer! This article is merely a series of logical steps I’ve come up with that may, or may not, be technically possible to implement
  2. I’ve discussed none of this with the VPLEX product management team or VPLEX engineers, so some of my assumptions about the product may not be accurate

I’ve done a little bit of work with VPLEX, and while it does a lot of awesome things with active/active storage and vSphere, there are a few limiting factors, particularly when talking about APD and a non-uniform configuration. Before I begin, let me explain a few terms that will come up:

  • All Paths Down (APD)
    • This indicates that, according to the ESXi host, all paths to all devices on the storage array are dead. The ESXi host is still up and running
  • Permanent Device Loss (PDL)
    • This indicates that access to a particular device has been lost, and all paths show as dead, but the ESXi host can still contact the array, and receive SCSI sense codes. The ESXi host is still up and running
    • PDL behavior was greatly improved with the release of vSphere 5.0 update 1. This improvement brought along two advanced settings:
      • disk.terminateVMOnPDLDefault (advanced host setting) – When this is set to True, a VM will be hard powered off whenever it issues an I/O to a datastore that is in a PDL state (a datastore can only be put into a PDL state if it received a valid PDL sense code)
            • Side Question: if a VM has multiple disks on multiple datastores, and the disk on the datastore that gets put into a PDL state isn’t the OS disk, it’s logical to assume that the VM will still be powered off if an I/O gets issued to that non-OS disk. How can that be mitigated?
      • das.maskCleanShutdownEnabled (advanced HA setting) – When this is set to True, HA will restart a VM that was hard powered off because of a PDL condition. If this setting is not set to True, HA will not restart the VM

Check out VMware KB2004684 for more detail on these conditions
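
To make the difference between the two conditions concrete, here is a rough Python model of the behavior described above. It is only a sketch of my reading of these definitions: it does not call any vSphere API, the function and class names are made up for illustration, and it assumes the VM actually issues an I/O to the affected datastore.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class StorageState(Enum):
    HEALTHY = "healthy"
    PDL = "pdl"  # all paths dead, but the array still returned a PDL sense code
    APD = "apd"  # all paths dead and no sense code was received


def classify(paths_alive: int, pdl_sense_code_received: bool) -> StorageState:
    """Classify a device the way the definitions above describe it (hypothetical helper)."""
    if paths_alive > 0:
        return StorageState.HEALTHY
    return StorageState.PDL if pdl_sense_code_received else StorageState.APD


@dataclass
class Settings:
    terminate_vm_on_pdl: bool  # stands in for disk.terminateVMOnPDLDefault
    mask_clean_shutdown: bool  # stands in for das.maskCleanShutdownEnabled


def vm_outcome(state: StorageState, settings: Settings) -> Tuple[bool, bool]:
    """Return (hard_powered_off, restarted_by_ha) for a VM doing I/O to the datastore."""
    # Only a PDL combined with the host setting leads to a hard power-off.
    powered_off = state is StorageState.PDL and settings.terminate_vm_on_pdl
    # HA only restarts the VM if it was powered off and the HA setting is True.
    restarted = powered_off and settings.mask_clean_shutdown
    return powered_off, restarted


if __name__ == "__main__":
    both_on = Settings(terminate_vm_on_pdl=True, mask_clean_shutdown=True)
    for paths, sense in [(2, False), (0, True), (0, False)]:
        state = classify(paths, sense)
        print(state.value, vm_outcome(state, both_on))
```

The point of the model is the last case: with an APD, neither setting ever comes into play, so the VM is neither powered off nor restarted.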

VMware, VPLEX and PDL

I don’t want to get into what VPLEX is or how it works; that is documented in other places (VMware KB2007547). When there is a reason for I/O to be suspended on one side or the other (site isolation/partition), vSphere now recognizes this as a PDL event. In a non-uniform access configuration this is important because VMs running on the site that has its I/O suspended will be PDL terminated and restarted via HA on hosts in the surviving site. Very cool stuff when you are in this non-uniform access configuration

VMware, VPLEX and APD

After reading Chad Sakac’s article on vSphere metro clusters, I’ve interpreted it to mean that an APD scenario is NOT seen as a PDL event and that in an APD scenario VMs are not killed and restarted on the surviving site. If this assumption is inaccurate and the VMs are PDL terminated, then nothing in this article is really relevant, and my deep apologies for wasting your time!

When storage fails on either site but the hosts are still operational, an APD condition occurs and the ESXi hosts essentially lock up. VMs stay in a running state and are not restarted on the surviving site without manual intervention

A Potential Solution – Make the VPLEX Witness VMware HA aware  – Hopefully Solved!

What I mean by this is that since the VPLEX witness is already performing arbitration between the metro sites, why not get him more involved during a failure? Why can’t the VPLEX witness send API calls (do they exist?) to VMware HA for the hosts with the APD condition, instructing it to kill the VMs on hosts in the failed datacenter and restart them on the surviving site? Here is a ghettofied workflow that I threw together:


Logically that makes sense to me and from my perspective (which is a very ignorant one!) doesn’t seem like it would be THAT hard to accomplish.
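
For what it’s worth, here is the same idea as a small Python sketch. Everything in it is hypothetical: the Host class, the plan_failover function and the action names are mine, and I have no idea whether the witness or VMware HA exposes anything like this. It only produces a plan (a list of steps) and doesn’t talk to any real system.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Host:
    name: str
    site: str
    alive: bool              # the ESXi host itself is up
    storage_reachable: bool  # False models an APD condition on this host
    vms: List[str] = field(default_factory=list)


def plan_failover(hosts: List[Host]) -> List[Tuple[str, str, str]]:
    """Return (action, vm, host) steps a witness could ask HA to carry out."""
    steps: List[Tuple[str, str, str]] = []
    placed = 0  # used to spread restarted VMs across surviving hosts
    for host in hosts:
        # Only hosts that are up but have lost all storage paths are handled here.
        if not host.alive or host.storage_reachable:
            continue
        # Candidate targets: healthy hosts in a different (surviving) site.
        targets = [h for h in hosts
                   if h.alive and h.storage_reachable and h.site != host.site]
        if not targets:
            continue  # nowhere to restart, so leave the VMs alone
        for vm in host.vms:
            target = targets[placed % len(targets)]
            steps.append(("power_off", vm, host.name))
            steps.append(("restart", vm, target.name))
            placed += 1
    return steps


if __name__ == "__main__":
    hosts = [
        Host("esx-a1", "A", alive=True, storage_reachable=False, vms=["vm1", "vm2"]),
        Host("esx-b1", "B", alive=True, storage_reachable=True),
    ]
    for step in plan_failover(hosts):
        print(step)
```

Note the sketch refuses to power anything off unless there is a healthy host in the other site to restart on; killing VMs with nowhere to go would only make things worse.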

Comments 6

  1. Pingback: Possibilities of VMware HA – VM Component Protection » ValCo Labs

    1. Post

      Thanks for the comment, Olly. While the document does talk about PDL and APD, there is still an issue with APD if the storage on one side fails but the hosts survive. So in a VPLEX Metro scenario, where site A happens to be hosting 4 virtual machines and the storage at site A fails but the ESXi hosts remain up, the 4 VMs hosted in site A are NOT failed over to site B via VMware HA, because the hosts in site A just see an APD occurring. Even in vSphere 5.1 this situation isn’t addressed. There are some possible futures on the horizon that will change APD behavior, but for now we are stuck with manual intervention.

      1. Hi,

        Yes, if all paths to an ESX host are cut, then the host will APD (this is the same for any storage solution, not just VPLEX). However, since a VPLEX cluster in either site will have at least two active directors (we can go up to 8 active directors per cluster) and is also connected to a dual fabric, this is an extremely unlikely event, especially if you keep each fabric switch in its own rack, as you would need multiple failures to get into this situation.

        Furthermore, if you hook up a cross connect (detailed in the document) then you further protect yourself from APD as you have an alternate path to the remote active VPLEX cluster. You therefore need even more failures at that point to see an APD in this configuration. (multiple director/SAN failure plus dual WAN failure)

        Finally, if you deploy VPLEX Metro and have an entire storage array failure, VPLEX continues to service IO at both clusters (even without cross connect) so even this is a non-issue.



        1. Post


          Understood. It’s definitely a problem on the ESXi side and not the VPLEX side, per se, but the problem still exists. During an APD with VPLEX (and any other product), manual intervention is needed to get the VMs that were running on the APD host back up.

