In my last blog about vSphere HA basic concepts, I explained the conceptual part of vSphere HA with some design tips.
Now, continuing the same topic, I am going to explain the Admission Control Policy that we use to manage a vSphere HA cluster for better resource utilization and management.
There are two types of Admission Control Policy that run on top of vSphere HA:
Percentage based Admission control Policy
Slot based Admission control Policy
In short, the slot-based Admission Control Policy is more rigid and best suited for clusters built on common / identical hardware, whereas the percentage-based Admission Control Policy is more lenient and flexible and supports all kinds of clusters, whether the hardware is identical or non-identical (as long as it is from the same processor family).
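Since the rest of this post focuses on the percentage-based policy, here is a minimal Python sketch of how the slot-based policy is commonly described, just for comparison: the slot size is taken from the largest CPU and memory reservation among powered-on VMs (32 MHz is the usual default when no CPU reservation exists, and memory overhead is ignored here as a simplification), and a host's slot count is capped by whichever resource runs out first. The host sizes and reservations below are hypothetical.

```python
def slot_counts(hosts, vms, default_cpu_mhz=32):
    """hosts: list of (cpu_mhz, mem_mb); vms: list of (cpu_res_mhz, mem_res_mb) reservations."""
    cpu_slot = max((cpu for cpu, _ in vms), default=0) or default_cpu_mhz
    mem_slot = max((mem for _, mem in vms), default=0) or 1   # simplification: real HA adds VM memory overhead
    return [min(cpu // cpu_slot, mem // mem_slot) for cpu, mem in hosts]

hosts = [(6000, 32768), (6000, 32768)]   # two identical hosts: 6 GHz CPU, 32 GB RAM each (hypothetical)
vms = [(500, 1024), (1000, 2048)]        # per-VM CPU/memory reservations (hypothetical)
print(slot_counts(hosts, vms))           # -> [6, 6] slots per host (CPU is the limiting resource here)
```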
What is Admission Control Policy?
It is the policy that will not let you power on a VM on an ESXi host if doing so would eat into the capacity reserved for a disaster such as a hardware failure or network disconnectivity. So, in a nutshell, Admission Control Policy (ACP) is used to keep a portion of hardware resources reserved (from the pool of resources) for rainy days (disastrous situations).
Below picture explains ACP at a glance
Formula to calculate and manage ACP
You can work out the formula for the "Percentage" based ACP by looking at resources like
Reserved CPU
Reserved Memory
For CPU-based resource reservation for ACP, you work with the MHz / GHz reserved per VM:
(Total CPU Capacity - (Reserved CPU x Number of VMs)) / Total CPU Capacity x 100 = Percentage-based failover capacity
For example, if you have 2 VMs with 500 MHz reserved each, out of 3 GHz of CPU capacity per host (single processor, single socket), the formula above looks like this:
(3000 MHz - (500 MHz x 2 VMs)) / 3000 MHz x 100 ≈ 66%, which is the current failover capacity. Now you can decide how much of it to reserve for Admission Control; let's say 30%, then the remaining 36% is left behind for your day-2 administration and consumption.
Similarly, we calculate reserved memory for VMs with the formula below:
(Total Memory of ESXi host - (Reserved Memory x Number of VMs)) / Total Memory of ESXi host x 100
For example, if there are 2 VMs with 1 GB of reserved memory each and the total memory installed in the ESXi host is 64 GB, the formula looks like this:
(64 GB - (1 GB x 2 VMs)) / 64 GB x 100 ≈ 96%
So, 96% is the failover capacity left, out of which you can again reserve an Admission Control value, say 30% for ACP; in that case the remaining capacity for memory will be 66%.
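If you like to sanity-check these numbers, here is a small Python sketch of the same percentage calculation, using the CPU (MHz) and memory (GB) figures from the two examples above.

```python
def failover_capacity_percent(total_capacity, reservation_per_vm, vm_count):
    """(total - sum of reservations) / total, expressed as a percentage."""
    reserved = reservation_per_vm * vm_count
    return (total_capacity - reserved) / total_capacity * 100

cpu_pct = failover_capacity_percent(3000, 500, 2)   # MHz: (3000 - 1000) / 3000
mem_pct = failover_capacity_percent(64, 1, 2)       # GB:  (64 - 2) / 64
print(f"CPU failover capacity: {cpu_pct:.1f}%")     # ~66% as in the CPU example
print(f"Memory failover capacity: {mem_pct:.1f}%")  # ~96% as in the memory example
```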
Most of the time, the most manageable calculation for vSphere HA clusters is the "Percentage Based ACP".
Think about a cluster built with the server virtualization platform "vSphere". The purpose of this cluster is to pool all the resources of the ESXi hosts running within it.
Background
How can we build the cluster?
Of course, you build the cluster through vCenter Server. For more information about vSphere HA and the cluster object, you can read my last article, "Understanding vSphere HA".
In short, vSphere HA is used to restart VMs (on a surviving host in the cluster) if anything disastrous happens to an ESXi host, such as hardware issues or network isolation.
So, the cluster is a vCenter Server object separate from vSphere HA or DRS. vSphere HA and DRS are services that run on top of the cluster object and try to maintain resources among the ESXi hosts in it.
Definition and Concepts
DRS is short for Distributed Resource Scheduler. This is the service that runs on top of a cluster of ESXi hosts and keeps an eye on VMs' need for resources like
CPU
Memory
Storage IOPs
Network Bandwidth
DRS is a vCenter Server service that maintains the vCenter Server object "Cluster". Each cluster gets its own DRS configuration.
With the introduction of vSphere 7.x and above, DRS focuses on VM-centric resource utilization, scoring VMs and dividing them into happy (>80) and unhappy (<60) VMs. This scoring reflects how well VMs are getting their desired resources (mentioned above) on the relevant ESXi host in the cluster.
So, if a VM does not have a good score, that VM is considered / marked as not happy on the relevant ESXi host and needs to be moved to the next available (qualifying) ESXi host, depending on the DRS configuration settings. We will discuss these settings below.
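As a toy illustration of that scoring idea (using only the >80 / <60 thresholds mentioned above, not any official DRS cut-offs), the small sketch below buckets a few hypothetical VM scores.

```python
def classify_vm(drs_score):
    """Bucket a DRS VM score using the thresholds mentioned above."""
    if drs_score > 80:
        return "happy"
    if drs_score < 60:
        return "unhappy (candidate for migration)"
    return "acceptable"

for name, score in {"app01": 92, "db01": 55, "web01": 71}.items():
    print(name, classify_vm(score))
```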
Requirements
DRS is required whenever we want vSphere to take control of resource management and meet the SLAs of the "applications" running on top of our virtualization platform. Below are some requirements we always need to follow and focus on:
Enterprise Plus license for vSphere (this may change in the future, but for full DRS functionality this license is needed)
vCenter Server - DRS is a vCenter Server service
Cluster object must be configured before DRS enablement - because DRS always works with a cluster
Shared datastore - for compute migration using vMotion amongst ESXi hosts in a cluster
vMotion VMkernel configuration - a must; without this configuration, only the Manual DRS level works
Configuration
It must always be kept in mind that DRS focuses on resource management on top of a cluster, not on an individual ESXi host.
So you select the cluster on which DRS is to be configured and then click the Configure tab of the vSphere Client, as shown in the step below.
As you can see in the picture, DRS is already configured, but you can reconfigure it by clicking the Edit option available in the same interface, as shown below.
Once you click Edit, the configuration page appears, which can be used to reconfigure the DRS service on the selected cluster object.
In the Edit DRS page you will see numerous configuration settings, as you can see below.
So, DRS is configurable in 3 different modes / levels
Manual Level
This is the default and first option offered by DRS. It does not require you to configure vMotion, and it only provides "recommendations" for workload movement based on demand for resources.
Partially Automated
This level covers the "Manual" feature, which is "recommendation only", and it also gives you "initial placement", which means that before the startup of a workload/VM, DRS decides which ESXi host in the cluster is best placed to host it. This decision capability is part of this level.
Fully Automated
This level is always recommended, and it requires everything mentioned above in the requirements section. It provides the feature sets of the two levels above and automatically moves workloads as required.
Migration Threshold
This option configures how readily vMotion should be performed when the DRS level is set to Fully Automated, because vMotion always has a cost when migrating VMs from one host to another, in terms of bandwidth utilization between the source and target ESXi hosts.
It is always recommended to keep the threshold somewhere in the middle: neither too conservative nor too aggressive.
Conservative
At this end of the threshold, if a VM needs resources, DRS will not respond immediately; instead it waits longer so that the resource spike may settle, which could delay providing resources when the VM needs them.
Aggressive
At this end of the threshold, even a slight spike in resource utilization triggers a vMotion (VM migration), which could result in a massive number of vMotions, keeping VMs moving back and forth at all times.
So, moderation is the Best policy!!
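If you prefer to script this, here is a minimal pyVmomi sketch (a sketch under assumptions, not an official recipe) that sets an existing cluster to Fully Automated with the middle migration threshold. It assumes you have already connected with SmartConnect and looked up the cluster object; verify the property names against your pyVmomi version.

```python
from pyVmomi import vim

def set_drs_fully_automated(cluster, rate=3):
    """cluster: a vim.ClusterComputeResource already retrieved via pyVmomi."""
    drs = vim.cluster.DrsConfigInfo(
        enabled=True,
        defaultVmBehavior=vim.cluster.DrsConfigInfo.DrsBehavior.fullyAutomated,
        vmotionRate=rate,   # 3 = the middle of the migration-threshold slider
    )
    spec = vim.cluster.ConfigSpecEx(drsConfig=drs)
    return cluster.ReconfigureComputeResource_Task(spec, modify=True)

# task = set_drs_fully_automated(cluster)
```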
Predictive DRS
As you know, DRS is by nature a reactive service: it responds only once a VM hits a resource spike. But now its nature has become more proactive as well.
If you enable this option (which works with VMware Aria Operations, formerly vRealize Operations), DRS looks into the predictive analysis provided by Aria Operations/vROps, as well as vCenter Server metrics, and moves VMs if needed based on time-based resource utilization. For example, a VM running Active Directory services requires more resources at office start-up time, when employees walk in and start logging in; so that VM must be given good resources from 0900 hrs until 1100 hrs, or depending on the number of logon requests to AD.
Virtual Machine Automation
If you enable this option (check the box), you can add "exceptions" for VMs in the cluster to exclude them from DRS rules and their implications. For example, you can add the vCSA VM so that it is not affected in a cluster running management and compute workloads together.
So no affinity or anti-affinity rule would be applied to such VMs even when rules are applied to the whole cluster. In this case, those VMs are considered to be configured at a separate DRS level and treated as exceptions.
I will come up with a practical demonstration on my YouTube channel. If you haven't subscribed yet, please do have a look and subscribe.
Broadly, there are 2 types of migration for workloads, depending on their state (either powered on or powered off). If the workload is powered off and you migrate the VM, it is known as a cold migration. But if the workload is powered on and you migrate it, it is known as a hot migration.
Where do we do hot migrations and what are the benefits?
Migrate a VM while it is powered on and users are connected, without losing connectivity or data.
Hardware maintenance of the hypervisor is not a problem even while the workload is powered on and connected.
This migration can be done automatically, depending on hardware resource availability, using DRS.
Let's talk about this in more detail.
If an application represents a business and its availability, it should not become unavailable by any means, whether you update or upgrade hardware/software or do maintenance activity while hundreds or thousands of people are connected to that application.
If someone says we can achieve this by having redundant application interfaces, the answer is "yes", but not 100%: if the connections are served by one application interface running on hardware that requires upgrade or maintenance, those connections will be lost, resulting in application unavailability or interruption.
So, vMotion is a technology that keeps the virtual machine available even during underlying hardware updates or upgrades, and even during software upgrades of the hypervisors (ESXi hosts), without users noticing while they stay connected to the very same VM instance.
Requirements!
Configure VMkernel ports for ESXi host to ESXi host communication
A minimum of 250 Mbps of bandwidth is required.
Configure a common (shared storage) datastore for VMs among the ESXi hosts
ESXi hosts must have the same CPU family (Intel or AMD)
VMs must not be connected to local hardware resources like a DVD drive, CPU pinning, or a local datastore
If you comply with the points above, you have completed your vMotion configuration and paved the path to configure DRS, the automated, VM score-based service that identifies VMs' resource needs.
How to initiate vMotion for a VM?
Just right-click the VM and, from the pop-up menu, select Migrate.
Choose "Compute" migration and select the "Target" host.
Step#1 From Pop-up menu choose "Migrate"
Step#2 Choose type of migration "Compute"
Step#3 Choose compute (i.e. Target ESXi host where you want to migrate selected VM)
Step#4 is to select the network if the VM was on a standard virtual switch; but if the VM is on a distributed switch, there is no need for the network step because the network state and configuration move with the VM (or rather remain the same across ESXi hosts, provided the target ESXi host falls under the same vDS).
Step#5 is where you choose the vMotion priority. The default is "Normal" priority, but you can set it to high priority when migrating multiple VMs. This is a system setting and doesn't require you to set any numbers (like higher number = higher priority, etc.).
If you set high priority for VMs, resources such as CPU and memory are allocated to them at higher priority (first) amongst other VMs.
And on the last page, when your migration is about to start, it tells you whether you can migrate the VM(s) or not (perhaps because of compatibility issues or some other problems).
As soon as you initiate the migration, the process starts and finishes without the connected users of that workload ever knowing about it.
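For administrators who script this, here is a minimal pyVmomi sketch of the same compute-only migration, roughly mirroring the wizard steps above. It assumes the VM and target host objects have already been looked up and that the vMotion requirements listed earlier are met; verify the call signature against your pyVmomi version.

```python
from pyVmomi import vim

def migrate_vm(vm, target_host, high_priority=False):
    """vm: vim.VirtualMachine, target_host: vim.HostSystem, both already looked up."""
    priority = (vim.VirtualMachine.MovePriority.highPriority
                if high_priority
                else vim.VirtualMachine.MovePriority.defaultPriority)
    # Compute-only migration: resource pool, power state and storage stay where they are
    return vm.MigrateVM_Task(host=target_host, priority=priority)

# task = migrate_vm(vm, target_host, high_priority=True)
```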
So what happens in this migration?
As we mentioned above in "Requirements", a single vMotion requires 250 Mbps. Why is it needed, and what kind of data does it move?
The VM's occupied memory on the source ESXi host migrates to the target ESXi host using the vMotion VMkernel network.
Because memory pages are updated and managed so quickly, after the complete memory is copied from the source to the target ESXi host, any new changes are then copied (tracked via a memory bitmap) until only minor changes remain.
Then comes a quick pause to switch over from the source to the target ESXi host, known as the "quiesce" (stun) period.
During this phase, a reverse ARP / gratuitous ARP is initiated by the target ESXi host, letting the first-hop physical switch update its CAM table so that VM traffic flows correctly right after the switch-over happens.
Because the VM data files are located on a single shared datastore, the target host needs access to it, which is assessed and verified before the migration starts.
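To get a feel for why the 250 Mbps figure matters, here is a rough back-of-the-envelope sketch in Python: it simply divides the VM's memory by the link speed, so it is only a lower bound (real vMotion also re-copies dirtied pages). The VM size and link speeds are hypothetical.

```python
def precopy_seconds(vm_memory_gb, link_mbps):
    """Memory size divided by link speed; a lower bound, ignoring iterative copies of dirtied pages."""
    megabits = vm_memory_gb * 8 * 1024
    return megabits / link_mbps

for mbps in (250, 1000, 10000):
    print(f"{mbps:>5} Mbps -> ~{precopy_seconds(16, mbps):.0f} s for 16 GB of VM memory")
```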
We shall discuss "Cross-vCenter vMotion" later. I hope you enjoyed the topic. Stay tuned!
vCenter Server Appliance (vCSA) is the management tool that makes administration and lifecycle management easy for:
ESXi hosts
Virtual Machines
Other Management Services (like NSX, vSAN, VMware Aria, vSphere 8 with Tanzu etc.)
Internal Architecture
The vCenter Server Appliance in its current form arrived around 2016 with vSphere 6.5, when VMware announced Photon OS (a VMware-owned Linux flavor) as a container-optimized OS. This appliance is comprised of 3 major parts; let's discuss them:
OS (Photon OS)
Postgres SQL (vPostgres)
vCenter Server Services
It is understood that you cannot deploy the vCenter Server Appliance on bare metal (as you could with vCenter Server for Windows), but you can deploy it on an ESXi host as a VM.
In the beginning, vCSA came with 2 GUI interfaces:
vSphere Web Client
vSphere Client
But from vSphere 7 onwards, only the vSphere Client is left, which is simpler and more independent than the "Web Client", which depended on the "Adobe Flash plugin".
So, now, let's talk about the vCenter Server Appliance application services and their capabilities. The vCenter Server Appliance is now a single VM hosting multiple services, with some configuration changes to its architecture as well.
We will discuss these updates and changes in more detail one by one. So, let's start with
SSO
vCenter Server Single Sign-On (SSO) is a crucial component of VMware's vSphere (vCenter Server), providing authentication services to various VMware products within the vSphere environment. Here are the primary capabilities and features of vCenter Server SSO
Single Authentication source for VMware products
Integration with LDAP Servers (AD) or Open LDAP using SAML
Role based access and control of vSphere environment.
Up to 15 vCenter Server instances can be managed using a single SSO domain
This is the AAA component aligned with the internal vCenter Server directory service "vmdir", and that is the reason we always advise not to use the same name as your Active Directory domain when defining the SSO domain during vCenter Server installation.
vmdir is a service that behaves similarly to Microsoft Active Directory's multi-master replication technique if you use Enhanced Linked Mode (ELM) for vCSA instances.
ELM configuration can only be done during the installation of a new vCSA instance. When you are installing the second vCSA instance, it asks whether to create a new "SSO domain" or join an "existing" one. So, you need to choose the existing one, as shown below.
Once this replication happens between the two instances, ELM connects the vCSA instances with one another to share inventory objects based on RBAC.
Certificate Authority (VMCA)
In order to be more independent and use VMware's own certificate authority for providing certificates to VMware platform products, we no longer need to maintain 3rd-party CA(s) at all. vCenter Server itself can be used as a certificate authority to issue and renew certificates for VMware platform products like ESXi hosts, the VMware Aria family, vCSA itself, etc.
Web Services
The vCenter Server Appliance is equipped with a GUI (vSphere Client) to access its interfaces. There are 2 different types of interfaces offered by the vCenter Server Appliance:
vSphere Client - for datacenter administration (default port: 443) - can be changed using the General settings of vCenter Server.
VAMI (vCenter Server Appliance Management Interface) - for appliance administration (default port: 5480).
We access the admin interface via the vCSA URL ("https://vcsa-fqdn:443/ui") and the VAMI interface via "https://vcsa-fqdn:5480". Both interfaces have their own significance; it solely depends on what you actually want to do.
For example, if you want to do day-2 administration of the ESXi hosts and/or VMs in the datacenter, you always go with the admin interface. But if you want to make appliance-level configuration changes, like changing the appliance password or IP address, you need the appliance's own interface, known as VAMI.
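As a tiny illustration of the two interfaces, here is a hedged Python sketch using the vCenter REST API as I understand it (the session and health endpoints can differ between vCenter versions, so verify the paths against your documentation); the hostname and credentials are placeholders.

```python
import requests

VCSA = "vcsa-fqdn"                               # placeholder
session = requests.Session()
session.verify = False                           # lab only; use trusted certificates in production

# Create an API session on port 443 (the same port the vSphere Client UI uses)
resp = session.post(f"https://{VCSA}/rest/com/vmware/cis/session",
                    auth=("administrator@vsphere.local", "password"))
session.headers["vmware-api-session-id"] = resp.json()["value"]

# Overall appliance health -- the same kind of data VAMI (https://vcsa-fqdn:5480) shows
health = session.get(f"https://{VCSA}/rest/appliance/health/system").json()["value"]
print("Appliance health:", health)
```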
License Service
This service is used to hold information about installed and assigned licenses for ESXi host and other solutions like NSX, vSAN and vCenter Server itself. This service provides common license inventory and management capabilities to all vCenter Server systems within the Single Sign-On domain.
Postgres DB
A bundled version of the VMware distribution of the PostgreSQL database for vSphere and vCloud Hybrid Services. It is used to hold SEAT data and the vCenter Server configuration. SEAT stands for Statistics, Events, Alarms and Tasks, whereas the vCenter Server configuration covers clusters, vDS, ESXi hosts, and other inventory and configuration information.
When you back up your vCSA, it asks whether to back up SEAT and configuration data or only the configuration data. So this is the information that you back up and restore when needed.
Its maximum capacity as of vSphere 8 is up to 62 TB, which is quite large and allows logs to be retained for a longer period.
Lifecycle Manager (vCLM)
vCenter Server Lifecycle Manager, previously known as Update Manager, is a service that takes care of ESXi host and VMware Tools lifecycle management, maintaining compliance and software patching; it is not limited to ESXi hosts, as hardware drivers can also be updated or deployed through this service.
Administrators can not only update existing ESXi hosts by downloading updates directly from VMware, or indirectly through manual imports using FTP (file servers), but can also build bundled ESXi images and push them to bare-metal servers.
vCenter Server Services
This is the collection of various distributed services that vCSA has to offer like
DRS
vMotion
Cluster Services
vSphere HA
vCSA HA
Other services
There are some other services; most of these are disabled by default, and you need to enable them. These include:
Dump collector Service
The vCenter Server support tool. You can configure ESXi to save the VMkernel memory to a network server, rather than to a disk, when the system encounters a critical failure. The vSphere ESXi Dump Collector collects such memory dumps over the network.
Auto-Deploy Service
The vCenter Server support tool that can provision hundreds of physical hosts with ESXi software. You can specify the image to deploy and the hosts to provision with the image. Optionally, you can specify host profiles to apply to the hosts, and a vCenter Server location (folder or cluster) for each host.
Syslog Collector Service
A central location where all the logs collected from ESXi hosts, vCSA, or other VMware products can be retained for a longer period. You can have a dedicated vCSA as a syslog collector server, providing a centralized repository for logs, depending on company compliance policies; examples here could be banks, telcos, etc.
From version 8 onwards this service is enabled by default, but you still need to configure it. It can be integrated with vRealize Log Insight (now VMware Aria Operations for Logs) for troubleshooting, or with vRealize Operations (now VMware Aria Operations) for monitoring/analytics.
You can configure the syslog collector using the VAMI interface, and then you need to configure the other products to send their logs to it.
So, this was a little introduction to the vCenter Server Appliance, but this is not all. We shall continue and dig deeper to understand the role of vCSA in combination with the ESXi host as a hypervisor. Stay tuned...
For detailed explanation with demonstration please visit my Channel as well 😊
It has been quite a long time; I got engaged in my training deliveries, and that is the reason I couldn't spare time to write a blog post.
Let's start our topic Discussions!
vSphere HA is something we normally recognize by the restart of VMs on a surviving host in a vSphere cluster.
We normally enable vSphere HA on the vCenter Server cluster object, and it is helpful in different situations like:
ESXi host Hardware issues
Network disconnectivity among ESXi hosts in a cluster
Shared Storage connectivity or unavailability issues with ESXi hosts
Planned maintenance of ESXi hosts
How does it work?
vSphere HA, despite its name (HA = High Availability), restarts VMs on surviving hosts where the VMs' requirements can be accommodated, as shown in the picture below.
For example, if an ESXi host has a hardware problem that makes it stop working, the VMs on it become unavailable. The (interrupted) VMs are then taken care of by the other available ESXi hosts in the same cluster, which power them on by accessing the same shared datastore.
These failures could be host hardware issues, network interruptions, storage inaccessibility, etc.
So, it means we have to fulfill some important requirements for vSphere HA. Let's discuss them.
The basic high-level requirements are as below
vCenter Server (vpxd)
Fault Domain Management (FDM-local to every host)
Hostd (local to every host)
Let's break these requirements into understandable pieces
Minimum 2 ESXi hosts and Maximum 64 ESXi hosts in a cluster
Minimum 1 Ethernet network with a static IP address per host; recommended 2 Ethernet networks with static IP addresses for ESXi hosts (multiple gateways)
Software Requirements
vCenter Server - To create cluster object
1 Management network must be common among all ESXi hosts in the Cluster
Enable vSphere HA on the cluster object
Minimum vSphere Essentials Plus kit license, or a single Standard vCenter Server license
Talking about the high-level requirements, vCenter Server is required to build or create the cluster object and to push FDM agents to the ESXi hosts that are part of the cluster as member hosts.
The FDM agent is actually a service that runs locally inside each ESXi host in a cluster on which the vSphere HA feature is enabled. FDM is the one taking care of all HA-related actions like:
HA logging
VM restarts on surviving hosts
Selection of the master node in a cluster
Management of all vSphere HA requirements
FDM service talks directly to "hostd" service of each ESXi host.
The basic purpose of "hostd" is to create/delete/start/restart/shut down VMs; in fact, all the necessary actions an ESXi host takes against VMs are handled by "hostd".
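As a small bridge back to the Admission Control post, here is a minimal pyVmomi sketch (assumptions: an existing connection and a resolved cluster object; property names should be verified against your pyVmomi version) that enables vSphere HA together with percentage-based admission control on a cluster.

```python
from pyVmomi import vim

def enable_ha(cluster, cpu_pct=30, mem_pct=30):
    """cluster: a vim.ClusterComputeResource already retrieved via pyVmomi."""
    das = vim.cluster.DasConfigInfo(
        enabled=True,
        admissionControlEnabled=True,
        admissionControlPolicy=vim.cluster.FailoverResourcesAdmissionControlPolicy(
            cpuFailoverResourcesPercent=cpu_pct,
            memoryFailoverResourcesPercent=mem_pct,
        ),
    )
    spec = vim.cluster.ConfigSpecEx(dasConfig=das)
    return cluster.ReconfigureComputeResource_Task(spec, modify=True)

# task = enable_ha(cluster)   # 30% CPU and 30% memory held back for failover
```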
vSphere HA Anatomy
When you enable vSphere HA on a cluster, the members of the cluster are divided into two basic roles:
Master Node
Slave / Subordinate Nodes
There is only one master node in a vSphere HA cluster, and the rest are slave / subordinate nodes. The total size of a vSphere cluster can go up to 64 nodes (1 master & 63 slave nodes) in vSphere 6.5 / 6.7 / 7.
The master node is responsible for restarting VMs on the available surviving hosts (slave / subordinate).
The master node is responsible for evenly dividing the workload of restarting VMs across the surviving hosts.
The master node is responsible for informing vCenter Server about the current status of the vSphere HA cluster.
The master node is responsible for keeping track of heartbeats from slave nodes, either over the network or via datastores.
How does the master node know which VMs need to be restarted on surviving hosts?
There is a file named the "protected list", located on all shared datastores that can be accessed by the master node in the cluster, and it is held / locked by the master node.
This file contains information about Virtual Machines running on their respective hosts.
Another file, named the "poweron" file, is located on the shared datastores and is accessible by all nodes in the cluster, including the master node. The purpose of this file is to maintain a timestamp, updated every 5 minutes by all the hosts, to mark the connectivity of every host and help detect network isolation.
The significance of the "poweron" file is to let the master node know whether a host that has disconnected from the Ethernet network is merely isolated. The master node verifies the alternate connectivity of such network-disconnected hosts by looking at the latest timestamp (updated every 5 minutes) written by the host over the heartbeat channel other than the Ethernet network, which is the datastores.
The minimum number of alternative heartbeat sources (in the form of datastore accessibility) is two. It is highly recommended to choose the heartbeat datastores manually instead of letting vCSA choose them (automatically) for you.
(Design Tip)
Design your vSphere HA network with redundant Ethernet gateways and keep your shared storage network (fabric) physically separate. In case of any network disaster, your vSphere design can then survive / mitigate the situation.
How different Nodes respond to HA failure Scenarios?
Master Node:
The master node is responsible for restarting a failed host's VMs on surviving hosts, and it updates the "protected list" file across all the datastores it can access.
If the master node itself hits a failure (hardware issue, network isolation, etc.), the VMs running on top of this host are evenly distributed amongst the surviving hosts right after the election process. What is the election process?
All the slave nodes in the cluster send heartbeats to each other and to the master node, and wait for the master node's heartbeat.
If the slave nodes do not receive the master node's heartbeat for 15 seconds, they consider it dead.
The slave nodes initiate a special broadcast, known as election traffic, which all the slave nodes sense, and they elect the next master node amongst themselves.
This election process runs for the next 15 seconds, right after the slave nodes have waited 15 seconds for the master node's heartbeat.
Right after the election process (which takes another 15 seconds) elects one master node from the remaining slave nodes, the elected master node takes over the "protected list" file and initiates (initial placement of) the affected VMs that were running on the faulty master node.
Conclusion:
When the master node fails, it takes around 45 seconds before the virtual machines are restarted on surviving hosts.
Slave nodes:
These are the nodes that take instructions from the master node to restart the affected (failed host's) virtual machines.
If a slave node hits a failure (hardware and/or network isolation), the master node takes responsibility for restarting the VMs (from the failed host) on the available hosts in the cluster.
Within 15 seconds the master node takes a decision and evenly distributes the VMs amongst the surviving hosts in the cluster.
Conclusion:
When a slave node fails, it takes about 15 seconds before its VMs are restarted amongst the surviving hosts.
About Network Isolation
In this state, the affected host or hosts cannot contact their gateways, and the master node cannot contact the isolated hosts. That is the reason we choose an alternative to Ethernet in the form of datastore heartbeats.
This kind of isolation has a bigger impact if we have not taken care of the Ethernet design, along with redundant shared storage accessibility.
Better Ethernet-network designs
Choosing a better and physically separated topological approach for vSphere HA always helps a lot, as you can see in the picture below.
The picture above, which depicts the recommended approach for system traffic isolation, clearly explains that physical isolation of system traffic can be achieved by provisioning or creating separate logical switches for separate system traffic.
Though in this picture I have shown 2 separate traffic types as part of the same virtual switch, which shows that you can also combine different (system) traffic types or keep them logically separate as well.
Another important aspect I want to draw your attention to is redundancy, from the most basic component (the vmnic) all the way to the physical switches. This approach can also lessen the impact of any network-level disaster.
Note: You can use the same DNS as well, instead of separate DNS zones for each network, as shown in the picture above.
Logical (Isolation) network
In this scenario, you can work with as few vmnics (physical network cards) as are available, especially in the case of a blade chassis. You can separate system traffic (like Management, vMotion, vSAN, FT, Replication, etc.) logically using VLANs.
Note: A better network design even saves you from disasters like shared storage unavailability, which results in problems like APD (All Paths Down).