VCAP5-DCA – Objective 3.3 – Implement and Maintain Complex DRS Solutions

For this objective I used the following documents:

  • vSphere 5 Clustering Technical Deepdive (referenced later in this post)

Objective 3.3 – Implement and Maintain Complex DRS Solutions

Knowledge

**ITEMS IN BOLD ARE TOPICS PULLED FROM THE BLUEPRINT**

  • Explain DRS / storage DRS affinity and anti-affinity rules
    • DRS affinity and anti-affinity rules
      • Two types of rules exist: VM-Host affinity rules and VM-VM affinity rules
      • VM-Host affinity rules
        • Allow you to tie a virtual machine or group of virtual machines to a particular host or set of hosts; anti-affinity versions keep those same objects off a particular host or set of hosts
        • Before creating a VM-Host affinity rule you need to create a DRS group and a host group
        • Decide whether it is a “must” rule or a “should” rule
          • “Must” rules will never be violated by DRS, DPM, or HA
          • “Should” rules are best effort and can be violated
      • VM-VM affinity rules
        • Used to keep virtual machines on the same host or to ensure they do NOT run on the same host. If you have two servers that provide load-balancing for an application, it's a good idea to ensure they aren't running on the same host (a PowerCLI sketch of this appears at the end of this section)
      • VM-VM affinity rules shouldn't conflict with each other; that is, you shouldn't have one rule that separates two virtual machines and another rule that keeps them together. If you do create conflicting rules, the older rule wins and the newer rule is disabled
    • Storage DRS affinity and anti-affinity rules
      • Storage DRS affinity rules are similar to DRS affinity rules, but instead of being applied to virtual machines and hosts they are applied on virtual disks and virtual machines when using datastore clusters
      • The three different storage DRS affinity/anti-affinity rules are Inter-VM Anti-Affinity, Intra-VM Anti-Affinity and Intra-VM Affinity (the “intra” rules are also known as VMDK anti-affinity and VMDK affinity)
        • Inter-VM anti-affinity allows you to specify which virtual machines should not be kept on the same datastore within a datastore cluster
        • Intra-VM anti-affinity lets you specify that the virtual disks belonging to a particular virtual machine are stored on separate datastores within a datastore cluster
        • Intra-VM affinity will store all of your virtual disks on the same datastore within the datastore cluster (this is the default)
      • Storage DRS affinity rules are invoked during initial placement of the virtual machine and when storage DRS makes its recommendations. A migration initiated by a user will not cause storage DRS to be invoked
      • You can change the default behavior for all virtual machines in a datastore cluster by modifying the Virtual Machine Settings

image

      • This allows you to specify VMDK affinity or VMDK anti-affinity
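
To make the VM-VM rule example above concrete, here is a minimal PowerCLI sketch (the cluster and VM names are hypothetical) that keeps two load-balancer virtual machines on separate hosts:

```powershell
# Minimal PowerCLI sketch; the cluster and VM names are hypothetical.
# Assumes you have already connected with Connect-VIServer.
$cluster = Get-Cluster -Name "Prod-Cluster"

# VM-VM anti-affinity: keep the two load-balancer VMs on different hosts
New-DrsRule -Cluster $cluster -Name "Separate-LB-Nodes" -KeepTogether:$false `
    -VM (Get-VM -Name "lb01","lb02")

# A VM-VM affinity rule is the same call with -KeepTogether:$true
```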

 

  • Identify required hardware components to support DPM
    • DPM uses one of the following methods to bring hosts out of standby:
      • Intelligent Platform Management Interface (IPMI)
      • HP Integrated Lights-Out (HP iLO)
      • Wake on LAN (WOL)
    • IPMI and HP iLO both require a baseboard management controller (BMC) – this allows access to hardware functions from a remote computer over the LAN
      • The BMC is always powered on, whether the host is or not, enabling it to listen for power-on commands
    • IPMI that uses MD2 for authentication is not supported (use plaintext or MD5)
    • To use the WOL feature instead of IPMI or HP iLO the NIC(s) you are using must support WOL. More importantly, the physical NIC that corresponds to the vMotion vmkernel portgroup must be capable of WOL
      • In this case you can see that my vMotion vmkernel is located on vSwitch0, which has vmnic0 as its uplink
      • If you look at the Network Adapters section (host > configuration > network adapters) you can see that vmnic0 has WOL support (a quick PowerCLI check follows the screenshot below)

image
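
If you prefer to check WOL support from the command line, here is a hedged PowerCLI/Get-View sketch (the host name is hypothetical; WakeOnLanSupported comes from the vSphere API's PhysicalNic object):

```powershell
# Hedged PowerCLI/Get-View sketch; the host name is hypothetical.
# WakeOnLanSupported is a property of the vSphere API's PhysicalNic object.
$esx = Get-VMHost -Name "esx01.lab.local" | Get-View
$esx.Config.Network.Pnic |
    Select-Object Device, Driver, WakeOnLanSupported
```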

 

 

  • Identify EVC requirements, baselines and components
    • Enhanced vMotion Compatibility (EVC) is used to mask certain CPU features from virtual machines when hosts in a cluster have slightly different processors than the other hosts in the cluster
    • An AWESOME knowledge base article answers a lot of questions about EVC; VMware KB1005764
    • There are multiple EVC modes so check out the VMware Compatibility Guide to see which mode(s) your CPU can run
    • Enable Intel VT or AMD-V on your hosts
    • Enable the execute disable bit (Intel XD / AMD NX)
    • CPUs must be of the same vendor
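
A quick way to check the configured EVC mode and what each host can support is PowerCLI; this is a minimal sketch and assumes a PowerCLI release that exposes the EVCMode and MaxEVCMode properties:

```powershell
# Minimal PowerCLI sketch; assumes a PowerCLI release that exposes the
# EVCMode (cluster) and MaxEVCMode (host) properties.
Get-Cluster | Select-Object Name, EVCMode

Get-Cluster -Name "Prod-Cluster" | Get-VMHost |
    Select-Object Name, ProcessorType, MaxEVCMode
```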

 

  • Understand the DRS / storage DRS migration algorithms, the Load Imbalance Metrics, and their impact on migration recommendations
    • DRS and Storage DRS use different metrics and algorithms, so I’ll talk about each of them separately
    • DRS
      • By default DRS is invoked every 5 minutes (300 seconds). This can be changed by modifying the vpxd configuration file, but it is highly discouraged and may or may not be supported
      • Prior to performing load-balancing, DRS will first try to correct any constraint violations that exist, such as DRS rule violations
      • Once constraints have been corrected, DRS moves on to load-balancing using the following process:
        • Calculates the Current Host Load Standard Deviation (CHLSD)
        • If the CHLSD is less than the Target Host Load Standard Deviation (THLSD) then DRS has no further actions to execute
        • If CHLSD is greater than the THLSD then:
          • DRS executes a “bestmove” calculation which determines which VMs are candidates to be vMotioned in order to balance the cluster. The CHLSD is then calculated again
          • The costs, benefits and risks are then weighed based on that move
          • If the migration does not exceed the costs, benefits, and risks threshold, the migration will get added to the recommended migration list
        • Once all migration recommendations have been added to the list, the CHLSD is then calculated based on simulating those migrations on the list
      • The tolerance for imbalance is based on the user-defined migration thresholds (five total). The more aggressive the threshold, the lower the tolerance is for cluster imbalance
      • For a much deeper dive into DRS calculations, check out chapter 14 of the vSphere 5 Technical Deepdive mentioned at the top of this post
    • Imbalance Calculation and metrics
      • As mentioned earlier, load imbalance is when the CHLSD is greater than the THLSD (a small worked example appears at the end of this section)
      • Some things that will cause the DRS imbalance calculation to trigger are:
        • Resource settings change in a virtual machine or resource pool
        • When a host is added/removed from a DRS cluster
        • When a host enters/exits maintenance mode
        • Moving a virtual machine in/out of a resource pool
    • Storage DRS
      • There are two types of calculations performed by Storage DRS; initial placement and load-balancing
      • As with DRS, Storage DRS has a default invocation period, however it is much longer – 8 hours is the default interval. Again, it is not recommended that you change the default interval
      • Initial placement takes datastore space and I/O metrics into consideration prior to placing a virtual machine on a datastore. It also prefers to use a datastore that is connected to all hosts in the cluster instead of one that is not
      • Storage DRS Load imbalance
        • Before load-balancing is taken into consideration, corrections to constraints are processed first. Examples of constraints are VMDK affinity and anti-affinity rule violations
        • Once constraint violations have been corrected, load-balancing calculations are processed and recommendations are generated
          • Two Storage DRS thresholds are taken into account when the load-balancing algorithm runs: Utilized Space and I/O Latency. Storage DRS migration recommendations will not be made unless these thresholds are exceeded
          • Additionally, you can set advanced options that specify your tolerance for I/O imbalance and the percentage differential of space between source and destination datastores
            • Example: destination datastore must have more than a 10% utilization difference compared to the source datastore before that destination will be considered
        • Storage DRS also calculates a cost vs. benefits analysis (like DRS) prior to making a recommendation
    • Besides the standard invocation interval, the following will invoke Storage DRS:
      • If you manually click the Run Storage DRS hyperlink
      • When you place a datastore into datastore maintenance mode (the I/O latency metric is ignored during this calculation)
      • When you move a datastore into the datastore cluster
      • If the space threshold for a datastore is exceeded
    • There are a lot more technical details involved, such as workload and device modeling, but these facets of Storage DRS are complex and would make this post extremely long. If you care to review them, check out chapter 24 of the vSphere 5 Technical Deepdive mentioned at the top of this post
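
To illustrate the DRS imbalance check mentioned above, here is a purely illustrative PowerShell calculation, not the exact DRS implementation: it computes a standard deviation from made-up per-host load ratios and compares it against a hypothetical target:

```powershell
# Purely illustrative, not the exact DRS implementation: compute a standard
# deviation from made-up per-host load ratios (VM entitlements / host capacity)
# and compare it against a hypothetical target threshold.
$hostLoads = 0.70, 0.30, 0.60, 0.40          # load ratio per host
$mean  = ($hostLoads | Measure-Object -Average).Average
$chlsd = [math]::Sqrt(
    ($hostLoads | ForEach-Object { [math]::Pow($_ - $mean, 2) } |
        Measure-Object -Sum).Sum / $hostLoads.Count)
$thlsd = 0.1                                  # target derived from the migration threshold

if ($chlsd -gt $thlsd) {
    "Imbalanced (CHLSD = $([math]::Round($chlsd, 3))) - DRS would evaluate migrations"
} else {
    "Balanced (CHLSD = $([math]::Round($chlsd, 3))) - no action needed"
}
```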

 

 

Skills and Abilities

  • Properly configure BIOS and management settings to support DPM
    • This will be slightly different for each system depending on the BIOS that it’s running. You will also need to configure your IPMI or iLO settings if you are using either of those technologies to support DPM. Most IPMI controllers (BMCs) will have their own configuration screen that can be accessed when booting the host
    • Some BIOSes may require you to enable the WOL feature (if it's an onboard NIC)

 

  • Test DPM to verify proper configuration
    • Before you can use the WOL option for DPM and enable it on a DRS cluster you must first successfully enter standby mode and power the host back on. If you aren't able to power the host back on after entering standby mode, you need to disable the power management setting for that host (an API-based sketch of this per-host override follows these steps)
      • Log into the vSphere client
      • From the inventory tree right-click the cluster and select Edit Settings…
      • Under Power Management click on Host Options
      • On the right, find the host(s) that failed to exit standby and, under the Power Management column, select Disabled from the dropdown box

image

    • Click OK
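
The same per-host override can be scripted against the vSphere API; this is a hedged sketch using Get-View (host and cluster names are hypothetical):

```powershell
# Hedged sketch using the vSphere API via Get-View; host and cluster names are
# hypothetical. Disables DPM for a single host that fails to exit standby.
$cluster = Get-Cluster -Name "Prod-Cluster" | Get-View
$esx     = Get-VMHost -Name "esx02.lab.local" | Get-View

$spec = New-Object VMware.Vim.ClusterConfigSpecEx
$hostSpec = New-Object VMware.Vim.ClusterDpmHostConfigSpec
$hostSpec.Operation = "add"                    # use "edit" if an override already exists
$hostSpec.Info = New-Object VMware.Vim.ClusterDpmHostConfigInfo
$hostSpec.Info.Key = $esx.MoRef                # the host this override applies to
$hostSpec.Info.Enabled = $false                # equivalent to "Disabled" under Host Options
$spec.DpmHostConfigSpec = @($hostSpec)

$cluster.ReconfigureComputeResource_Task($spec, $true) | Out-Null
```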

 

  • Configure appropriate DPM Threshold to meet business requirements
    • To a business, all resources consumed cost money, and being as efficient as possible while still meeting business requirements is important. Using DPM can save you on unneeded power consumption, but you don't want to use it to the point of negative returns. Setting the DPM threshold for your cluster(s) is an important consideration. You set the DPM threshold by:
      • Log into the vSphere client
      • From the inventory tree, right-click on your DRS cluster > click Edit Settings…
      • Under vSphere DRS click the Power Management setting

image

    • Here you can see that there are three different options you can choose: Off, Manual and Automatic
      • Off – power management is turned off
      • Manual – vCenter will give you recommendations during low resource utilization for hosts that can be put into standby mode
      • Automatic – vCenter will automatically place hosts in standby mode based on the DPM threshold that is set
    • Setting the Automatic option and figuring out the DPM threshold to use is where business requirements are factored in. Before we can make the correlation, let's talk about the different migration thresholds. Like the DRS migration threshold, the DPM threshold is based on priority recommendations. The further to the right you move the slider, the more aggressive DPM becomes, because additional priority levels of recommendations are included (see the API sketch after this list)
      • There are five priority recommendations from 1-5
      • With the slider all the way to the left, only priority one recommendations are generated. When you move the slider to the right one notch, only priority one and two recommendations are generated. Each notch to the right will include a new priority level
    • Consider your hard requirements regarding resource availability. Determine if your workloads are capable of operating under resource contention should they need to wait for a host to be brought out of standby mode. Workloads can fluctuate, and while DPM will always keep enough resources powered on to satisfy admission control, it may not be able to react fast enough to meet resource demand
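
The cluster-wide DPM behavior and threshold can also be set through the vSphere API; a hedged Get-View sketch follows (the cluster name is hypothetical, and the 1-5 rate mapping is assumed from the API documentation):

```powershell
# Hedged sketch using the vSphere API via Get-View; the cluster name is hypothetical
# and the 1-5 HostPowerActionRate mapping (1 = most conservative, 5 = most aggressive)
# is assumed from the API documentation.
$cluster = Get-Cluster -Name "Prod-Cluster" | Get-View

$spec = New-Object VMware.Vim.ClusterConfigSpecEx
$spec.DpmConfig = New-Object VMware.Vim.ClusterDpmConfigInfo
$spec.DpmConfig.Enabled = $true
$spec.DpmConfig.DefaultDpmBehavior = "automated"   # or "manual"
$spec.DpmConfig.HostPowerActionRate = 3            # middle-of-the-road DPM threshold

$cluster.ReconfigureComputeResource_Task($spec, $true) | Out-Null
```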

 

  • Configure EVC using appropriate baseline
    • EVC allows you to present the same CPU instruction sets to your virtual machines across a DRS cluster, even if the instruction sets of your physical CPUs across hosts are different. A few EVC requirements:
      • All hosts must have CPUs from the same vendor (Intel or AMD)
      • Hardware virtualization for each host should be enabled (Intel-VT or AMD-V)
      • Execute Disabled bit (Intel) or the No Execute bit (AMD) should be enabled in the BIOS
      • Any virtual machine that is running on a host with a higher CPU feature set than what is presented via the configured EVC baseline must be powered off prior to configuring EVC
        • If those virtual machines are not powered off then you will not be able to enable EVC
    • Unless you are using applications that take advantage of certain advanced CPU features that could be masked by EVC, you want to use the highest baseline compatible with your hosts. To configure EVC on a new cluster (a PowerCLI sketch follows these steps):
      • Log into the vSphere client
      • Right-click a datacenter from the inventory tree and click New Cluster…
      • Enter in a Name for the cluster (you will most likely want to enable DRS and HA, but for these purposes we’ll skip those steps and go straight to EVC)
      • Click Next
      • Choose Enable EVC for AMD Hosts for AMD processors or Enable EVC for Intel Hosts if using Intel processors
      • Choose an EVC mode
        • Intel

image

        • AMD

image

      • Each mode you select will give you a description of that mode, as well as the knowledge base article to look at (VMware KB1003212)
      • Complete the cluster configuration
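
Newer PowerCLI releases add an -EVCMode parameter to New-Cluster and Set-Cluster; assuming such a release, creating the cluster with EVC enabled might look like this (the names and the EVC mode key are examples):

```powershell
# Hedged sketch; assumes a PowerCLI release that exposes -EVCMode on New-Cluster.
# Names and the EVC mode key are examples ("intel-nehalem", "intel-westmere",
# "amd-rev-e", and so on are the style of keys used).
New-Cluster -Location (Get-Datacenter -Name "Lab-DC") -Name "Prod-Cluster" `
    -DrsEnabled -HAEnabled -EVCMode "intel-nehalem"

# Verify what was applied
Get-Cluster -Name "Prod-Cluster" | Select-Object Name, EVCMode
```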

 

  • Change the EVC mode on an existing DRS cluster
    • Changing the EVC mode or enabling EVC for the first time on an existing cluster can potentially be disruptive. As stated earlier, if you have virtual machines that are running on hosts that expose a higher level of CPU features than are presented by the EVC baseline you want to configure, then those virtual machines must be powered off. To enable or change the EVC mode on an existing DRS cluster:
      • Log into the vSphere client
      • Right-click the DRS cluster you want to modify from the inventory tree > click Edit Settings…
      • Select the VMware EVC option > click the Change EVC Mode… button
      • Select Enable EVC for AMD Hosts or Enable EVC for Intel Hosts
      • Select the desired mode from the dropdown
        • If the mode you select is not compatible with the processors running in your hosts you will get errors

image

        • If the mode you select is not compatible, possibly due to powered-on virtual machines running on hosts with greater CPU features than the selected EVC mode, or possibly due to a misconfigured BIOS setting on a host(s), you will see the following error

image

        • When you choose a mode that is compatible, it will show as Validation Succeeded

image

        • Click OK when finished

 

  • Create DRS and DPM alarms
    • Since DPM is a facet of DRS, I’ll cover creating DRS and DPM alarms together
    • One of the best pre-configured alarms for DRS/DPM is the Exit Standby Error. This is an event-based alarm, so it will only trigger when the host/cluster reports an event of a host not being able to exit standby mode
    • To create a new DRS/DPM alarm for a cluster (a PowerCLI sketch for attaching actions follows these steps):
      • Log into the vSphere client
      • Select a cluster from the inventory tree > on the right, click the Alarms tab
      • Click the Definitions button > here you will see a list of pre-defined alarms
      • Right-click a white area of that pane > click New Alarm…
      • Enter in an Alarm name and Description > from the alarm type dropdown select Cluster

image

      • Click the Triggers tab > click the Add button > click the event in the event column to get a drop down
      • Select which event you want. Here are a few DRS/DPM alarm events

image

      • Click on the Actions tab > click Add > select a desired action from the dropdown and when the action should be initiated (when alarm goes from green to red, red to green, etc…)

image

      • Click OK when finished
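
Creating alarm definitions is easiest in the GUI, but attaching an action to an existing definition such as Exit Standby Error can be scripted; a hedged PowerCLI sketch (the alarm name, address, and available parameters depend on your environment and PowerCLI version):

```powershell
# Hedged PowerCLI sketch; the alarm name, address, and available parameters depend
# on your environment and PowerCLI version.
$alarm = Get-AlarmDefinition -Name "Exit standby error"
$alarm | New-AlarmAction -Email -To "vmware-admins@lab.local" `
    -Subject "A host failed to exit standby mode"
```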

 

  • Configure applicable power management settings for ESXi hosts
    • Power management settings can be set on the hosts themselves (the active policy) or within the DRS cluster settings for DPM purposes (a read-only PowerCLI check of the active policy appears after these steps)
    • Set the Active Policy power management for an ESXi host
      • Log into the vSphere client
      • Select a host from the inventory tree > click the Configuration tab on the right
      • In the Hardware pane click the Power Management hyperlink > click the Properties hyperlink
      • Choose from one of the following power management policies
        • High Performance
        • Balanced
        • Low power
        • Custom
    • Set the Power management setting for DRS/DPM
      • Log into the vSphere client
      • Right-click a DRS cluster from the inventory tree > click Edit Settings…
      • Click Host Options under vSphere DRS > Power Management
      • Here you will see a list of hosts that are part of the DRS cluster, under the Power Management column choose from one of the following settings
        • Default
        • Disabled
        • Manual
        • Automatic

image

        • Click OK
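
To report the active policy across hosts from PowerCLI, here is a read-only sketch using the vSphere API's CpuPowerManagementInfo (changing the policy is done in the GUI as described above):

```powershell
# Read-only PowerCLI sketch using the vSphere API's CpuPowerManagementInfo;
# it reports the active power policy per host (setting the policy is done in the
# GUI as described above).
Get-VMHost | ForEach-Object {
    $hw = ($_ | Get-View).Hardware
    [pscustomobject]@{
        Host          = $_.Name
        CurrentPolicy = $hw.CpuPowerManagementInfo.CurrentPolicy
        Support       = $hw.CpuPowerManagementInfo.HardwareSupport
    }
}
```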

 

  • Properly size virtual machine and clusters for optimal DRS efficiency
    • You don't want to size your virtual machines to the cluster; rather, you want to size your clusters based on your virtual machines
    • Properly sizing your virtual machines and clusters can be tricky, especially if you don't have hard requirements. Virtual machine sizing is the most important, and cluster sizes will be based on how you size your virtual machines, with a percentage added in for scale and redundancy
    • In order to get optimal DRS efficiency from your clusters you want to
      • Ensure each host has the same resource configuration (memory, CPU)
      • DRS Clusters support a maximum of 32 hosts and 3000 virtual machines
      • Put vMotion on a separate layer 2 network and use 10Gb if possible; also consider multi-NIC vMotion
      • Don't set VM-Host affinity rules ("must" rules) unless you absolutely have to
      • Don’t change the default automation level per virtual machine if you don’t have to
    • Don’t oversize your virtual machines, wasted resources can cause cluster imbalance

 

  • Properly apply virtual machine automation levels based upon application requirements
    • When creating a DRS cluster you set a virtual machine automation level for the cluster. There might be use cases where a virtual machine, or a set of virtual machines, requires a different level of automation than the cluster default. You can set automation levels for virtual machines individually
      • Do this sparingly. The more individual changes you make, the more management overhead you add, as well as potentially reducing the effectiveness of DRS
    • Why would you want to make changes to an individual virtual machine?
      • Applications might have to stay on a particular host due to licensing requirements
      • If you have an application that is constantly changing its memory contents, you may not want it to move hosts as often as other virtual machines
    • Apply automation levels to individual virtual machines (an API-based sketch follows these steps)
      • Log into the vSphere client
      • Right-click on a DRS cluster from the inventory tree and click Edit Settings…
      • Under the vSphere DRS option choose Virtual Machine Options
      • Ensure that the Enable individual virtual machine automation levels checkbox is checked
      • In the Automation Level column, change the virtual machine(s) to the desired automation level using the dropdown

vm_automation

      • Click OK
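
The same per-VM override can be applied through the vSphere API; a hedged Get-View sketch (cluster and VM names are hypothetical):

```powershell
# Hedged sketch using the vSphere API via Get-View; cluster and VM names are
# hypothetical. Behavior accepts "manual", "partiallyAutomated" or "fullyAutomated".
$cluster = Get-Cluster -Name "Prod-Cluster" | Get-View
$vm      = Get-VM -Name "licensing-server" | Get-View

$spec = New-Object VMware.Vim.ClusterConfigSpecEx
$vmSpec = New-Object VMware.Vim.ClusterDrsVmConfigSpec
$vmSpec.Operation = "add"                      # "edit" if an override already exists
$vmSpec.Info = New-Object VMware.Vim.ClusterDrsVmConfigInfo
$vmSpec.Info.Key = $vm.MoRef                   # the VM this override applies to
$vmSpec.Info.Behavior = "manual"               # per-VM automation level
$vmSpec.Info.Enabled = $true
$spec.DrsVmConfigSpec = @($vmSpec)

$cluster.ReconfigureComputeResource_Task($spec, $true) | Out-Null
```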

 

 

  • Administer DRS / Storage DRS
    • Administering a DRS cluster involves creating and managing DRS affinity and anti-affinity rules, DRS virtual machine groups, DRS cluster validation and standard addition/removal of hosts from the DRS cluster
    • All administration takes place within the GUI and almost all of it within the cluster settings
    • Administering DRS
      • Adding and removing hosts
        • This is pretty straightforward; right-click the cluster, click Add Host and go through the wizard
        • To remove a host from the cluster the host must be in maintenance mode first
      • Cluster Validation
        • A cluster can become overcommitted or invalid. The cluster object in the inventory tree will show yellow for overcommitted and red for invalid. A cluster can become invalid if you make changes directly to a host, and those changes aren’t reflected in vCenter. When vCenter comes back into the mix, there is a mismatch, which causes it to become invalid
      • Creating VM Anti Affinity/Affinity rules
        • There is some overlap between some of the VCAP-DCA objectives and the VCP5 objectives. While I hate referring you to another link to get information, I feel that it isn’t very efficient to duplicate some of these items when I could be continuing with other objectives in the blueprint.
    • Storage DRS can only be used with a new construct known as Datastore Clusters. With this new construct come different points of administration, such as datastore maintenance mode, Storage DRS scheduled tasks, Storage DRS recommendations and, as with DRS, automation levels for individual virtual machines
    • Administering Storage DRS
      • Storage DRS Maintenance Mode
        • Must be manually invoked and is only available to datastores within a datastore cluster
        • Log into the vSphere client
        • Navigate to the Datastores and Datastore Clusters view (Ctrl+Shift+D)
        • Right-click on the datastore within the datastore cluster and click Enter SDRS Maintenance Mode

sdrs_maint

      • SDRS Scheduling
        • You can schedule Storage DRS to run at certain times (such as when few or no users are in the office) in order to move VMDKs to different datastores within the cluster. You then set the end settings, which revert SDRS back to its original configuration or to a configuration you specify (a PowerCLI sketch of the equivalent cluster-wide thresholds follows these steps)
        • Log into the vSphere client
        • Navigate to the Datastores and Datastore Clusters view (Ctrl+Shift+D)
        • Right-click a datastore cluster from the inventory tree > click Edit Settings…
        • Choose SDRS Scheduling
        • Click the Add… button

sdrs_sched_add

        • Enter a Start and End time as well as the Frequency > click Next
        • At the Start Settings page enter in a Description
        • Choose the Automation Level (Manual or Fully Automated)
        • Enable the I/O metric for SDRS recommendations (optional)
        • Set the Utilized Space (%)
        • Set the I/O Latency (ms)
        • Decide and set your I/O imbalance threshold (see screenshot for description)

sdrs_sched_start

        • Click Next
        • At the End Settings page enter in a Description
        • Leave the Restore settings to the original configuration checkbox checked
          • If you uncheck this option, set the Utilized Space (%), I/O Latency (ms)  and the I/O imbalance threshold

sdrs_sched_end

        • Click Next
        • Click Finish
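
Later PowerCLI releases include datastore cluster cmdlets that cover the same thresholds the scheduling wizard exposes; assuming such a release, a hedged sketch (the datastore cluster name and the parameter names come from those newer releases):

```powershell
# Hedged sketch; the datastore cluster cmdlets and these parameter names come from
# later PowerCLI releases, and the datastore cluster name is hypothetical.
Get-DatastoreCluster -Name "DSC-Prod" | Set-DatastoreCluster `
    -SdrsAutomationLevel FullyAutomated `
    -SpaceUtilizationThresholdPercent 80 `
    -IOLatencyThresholdMillisecond 15 `
    -IOLoadBalanceEnabled $true
```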
      • Storage DRS Recommendations
        • Before you can get Storage DRS recommendations, or use it at all, you need to make sure it is enabled on the datastore cluster

sdrs_enable

        • Log into the vSphere client
        • Navigate to the Datastores and Datastore Clusters view (Ctrl+Shift+D)
        • Choose a datastore cluster from the inventory tree > click the Storage DRS tab on the right
        • In the Storage DRS Recommendations pane you can choose any pending recommendations and apply them

sdrs_recommend

      • Storage DRS Virtual Machine Settings
        • There are two parts that make up virtual machine settings; the automation level and the option to keep VMDKs (disk affinity) together
        • Log into the vSphere client
        • Navigate to the Datastores and Datastore Clusters view (Ctrl+Shift+D)
        • Right-click a datastore cluster from the inventory tree > click Edit Settings…
        • Choose Virtual Machine Settings
        • For each virtual machine you want to change, set the Automation Level (Fully Automated, Manual, Default or Disabled)
        • Check/uncheck the box to Keep VMDKs together

sdrs_vm_options

        • Click OK
      • Storage DRS Rules
        • Like DRS, Storage DRS has rules you can set up. VM anti-affinity rules keep VMs separate, meaning the disks from those particular virtual machines will be kept on different datastores within the datastore cluster. The other option is VMDK anti-affinity, which keeps the virtual disks that belong to a particular virtual machine on different datastores within the datastore cluster
        • Log into the vSphere client
        • Navigate to the Datastores and Datastore Clusters view (Ctrl+Shift+D)
        • Right-click a datastore cluster from the inventory tree > click Edit Settings…
        • Choose Rules > click Add…
        • Enter in a Name for the new rule > choose the type of rule from the dropdown

sdrs_rules_vmdk

          • VMDK anti-affinity
            • Click Add
            • Click the Select Virtual Machine button
            • Choose a virtual machine from the list > click OK
            • Choose the virtual disks you want to separate (in the screenshot below there is only one virtual disk; you need at least two before you can proceed)
            • Click OK

sdrs_rules_vmdk1

          • VM anti-affinity
            • Click Add
            • Select two or more virtual machines from the list
            • Click OK

sdrs_rules_vm1

            • Click OK when finished

Tools
