In my last post on vSphere HA basic concepts, I explained the conceptual side of vSphere HA along with some design tips.
In this follow-up, I am going to explain the Admission Control Policy that we use to manage a vSphere HA cluster for better resource utilization and management.
This post covers two Admission Control policies that run on top of vSphere HA:
Percentage based Admission control Policy
Slot based Admission control Policy
In short, the slot based Admission Control policy is more rigid and best suited to clusters built on identical hardware, whereas the percentage based Admission Control policy is more lenient and flexible and supports all kinds of clusters, whether the hardware is identical or non-identical but from the same processor family.
What is Admission Control Policy?
It is the policy that stops you from powering on a VM when doing so would consume the capacity an ESXi host holds in reserve for a failure such as a hardware fault or a network disconnection. In a nutshell, Admission Control Policy (ACP) keeps a portion of hardware resources reserved (from the pool of resources) for rainy days (disaster situations).
The picture below explains ACP at a glance.
Formula to calculate and manage ACP
You can work out the value for the "Percentage" based ACP from resource reservations such as
Reserved CPU
Reserved Memory
For CPU, the reservation is expressed in MHz / GHz per VM:
(Total CPU Capacity - (Reserved CPU x Number of VMs)) / Total CPU Capacity x 100 = Percentage based ACP
For example, say you have 2 VMs with 500 MHz reserved for each, out of 3.x GHz of CPU capacity per host (single socket, single processor), taken here as 3000 MHz. The formula above then becomes:
(3000 MHz - (500 MHz x 2 VMs)) / 3000 MHz x 100 = 66.67%, so roughly 66% is the total failover capacity. From this you decide how much to reserve for Admission Control. If you reserve, say, 30%, the remaining 36% is left for your day-2 administration and consumption.
Similarly, we calculate the reserved memory for VMs with the formula below:
(Total Memory of ESXi host - (Reserved Memory x Number of VMs)) / Total Memory of ESXi host x 100
For example, with 2 VMs that each have 1 GB of reserved memory on an ESXi host with 64 GB of installed memory, the formula looks like this:
(64 GB - (1 GB x 2 VMs)) / 64 GB x 100 = 96.875%
So, roughly 96% is the failover capacity left, from which you can set the Admission Control value. If you reserve 30% for ACP in this case, the remaining capacity for memory will be 66%.
In most cases, "Percentage based ACP" is the most common and most manageable calculation for vSphere HA clusters.
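The CPU and memory formulas above are the same calculation with different units, so they can be sketched as one small Python helper. This is only an illustration of the arithmetic in this post. The function names are my own, and it is not how vCenter Server computes admission control internally.

```python
def failover_capacity_pct(total, reserved_per_vm, vm_count):
    """Percentage of capacity left once all VM reservations are subtracted."""
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return (total - reserved_per_vm * vm_count) / total * 100


def remaining_after_acp(failover_pct, acp_reserved_pct):
    """Capacity left for day-2 use after the ACP reservation is taken out."""
    return failover_pct - acp_reserved_pct


# CPU example from the text: 2 VMs x 500 MHz on a 3000 MHz host
cpu_pct = failover_capacity_pct(3000, 500, 2)
# Memory example from the text: 2 VMs x 1 GB on a 64 GB host
mem_pct = failover_capacity_pct(64, 1, 2)

print(f"CPU failover capacity:    {cpu_pct:.2f}%")
print(f"Memory failover capacity: {mem_pct:.2f}%")
print(f"CPU left after 30% ACP:   {remaining_after_acp(cpu_pct, 30):.2f}%")
```

Note that the helper keeps the decimals, whereas the examples above round the results down to whole percentages.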
The appliance management interface is how you interact with the appliance directly through a GUI, and the same is true for the vCenter Server VAMI. You reach this interface by entering the vCenter Server FQDN with port number "5480" in the address bar of a web browser, as you can see below.
Log in with the same user name that you provide for authentication when logging on to the vCSA.
Summary Page
The first "landing" page that comes up after logon is the "Summary" page. It shows the basic health of the appliance (good or bad) along with other useful information such as version and build number, domain join info, SSO status and service health, as you can see below.
Ribbon & Action
At the top there is a black ribbon, as in the vSphere Client, with some options available under the "Action" menu, as you can see below.
The top left corner reads "vCenter Server Management" and the top right corner shows the logged-on user name. Beside it is the "Action" menu, with options to reboot or shut down the vCenter Server Appliance, export logs relevant to the appliance for troubleshooting, or change the "root" password.
You can also change the theme of the appliance from light to dark mode.
Changing the root password here changes the local root account password that you provided during deployment of the appliance.
Network Page
Now, let's talk about the Network tab. Here you can modify the network adapter settings configured for the appliance, the same ones you provided during the deployment phase: IP / subnet mask, DNS and gateway, as you can see below.
Moreover, you can also set up a proxy for vCenter Server if the vCSA needs to go online through a proxy, or to fetch downloads through an FTP server rather than directly from the internet.
Monitor & Services Pages
If you need to look into health and monitor resource utilization, there are three important tabs / pages to look at
Summary Page
Monitor Page
Services Page
The Summary page has already been explained. On the "Monitor" page you can see CPU / memory utilization, storage performance for the appliance, network bandwidth and database performance. Below is a snapshot of the Monitor page.
On the "Services" page you can see whether each service is healthy and, if it is set to automatic, whether it is running. If it is not, you can restart the service or set its start-up type so that it starts automatically.
Update Page
You can also schedule or run updates of the vCenter Server Appliance through the "Update" option, provided you have NATed or proxied settings in place for updates. You can view the update history here as well. The recommendation, however, is to go through Lifecycle Manager for updates.
Time Sync and Settings
You can also set up an NTP server to synchronize clocks for proper log management using the "Time" options. As a time source you can use an Active Directory domain controller (PDC), a router, a Linux appliance, etc. Just click "Edit" to modify the IP or the settings (time zone etc.).
Access Page
To enable more than just GUI access to the vCSA, use the "Access" page to enable or disable other interfaces like "SSH", "Console CLI" or "DCUI", and to set the Bash shell timeout in minutes.
Syslog Forward Setting Page
You can set up central log management with a Syslog collector and point vCenter Server to it through the settings below, specifying the IP / FQDN of the Syslog collector server along with the port number. With these settings, this vCSA forwards its logs to the Syslog collector server.
You can configure a maximum of 3 Syslog collector servers using the VAMI interface of the vCenter Server Appliance.
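If you keep your forwarding targets in a script or an inventory file, a quick sanity check against the 3-collector limit can save you a failed attempt in the GUI. Below is a minimal Python sketch of such a check. The class name, the function name and the host names are all hypothetical, and the sketch only validates a list locally. It does not talk to the VAMI at all.

```python
from dataclasses import dataclass

MAX_SYSLOG_TARGETS = 3  # limit mentioned in the text above


@dataclass(frozen=True)
class SyslogTarget:
    host: str  # IP or FQDN of the Syslog collector (example values only)
    port: int


def validate_targets(targets):
    """Return the list unchanged if it is acceptable, else raise ValueError."""
    if len(targets) > MAX_SYSLOG_TARGETS:
        raise ValueError(f"at most {MAX_SYSLOG_TARGETS} syslog targets allowed")
    for target in targets:
        if not target.host.strip():
            raise ValueError("target host must not be empty")
        if not 1 <= target.port <= 65535:
            raise ValueError(f"invalid port for {target.host}: {target.port}")
    return targets


targets = validate_targets([
    SyslogTarget("syslog01.example.local", 514),
    SyslogTarget("192.0.2.10", 6514),
])
print(len(targets), "targets accepted")
```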
Backup Page
You can back up the vCSA configuration and the logs that are sometimes collectively known as SEAT (Statistics, Events, Alarms and Tasks). This is a file-based backup rather than an image-based backup. An image-based backup is quite big and has to be restored as a whole, whereas a file-based backup can be restored at a granular level, which means a single file can be restored instead of the whole image.
You can back up only the configuration of the vCSA, or include the logs along with the configuration.
You can also schedule the backup on a daily, weekly or monthly basis, or start a backup immediately at any time.
Moreover, you can configure the backup according to your backup location, whether you want to store it on NFS, a web based target or an FTP based solution. All of these are supported.
So, people, this was a brief introduction to the VAMI interface of vCSA. I hope you enjoyed this article. Soon I'll share a video demonstration through my YouTube channel.
vCenter Server Appliance (vCSA) is the management tool that eases administration and life cycle management of
ESXi hosts
Virtual Machines
Other Management Services (like NSX, vSAN, VMware Aria, vSphere 8 with Tanzu etc.)
Internal Architecture
The Photon OS based vCenter Server Appliance arrived with vSphere 6.5 (late 2016), after VMware announced Photon OS (a Linux flavor owned by VMware) as a container optimized OS. The appliance is made up of 3 major parts. Let's discuss them:
OS (Photon OS)
PostgreSQL (vPostgres)
vCenter Server Services
Note that you cannot deploy the vCenter Server Appliance on bare metal (as you could with vCenter Server for Windows), but you can deploy it on an ESXi host as a VM.
In the beginning, vCSA came with 2 GUI interfaces
vSphere Web Client
vSphere Client
But from vSphere 7 onwards only the vSphere Client is left, which is simpler and more independent than the "Web Client", which depended on the Adobe Flash plugin.
Now let's talk about the vCenter Server Appliance application services and their capabilities. vCenter Server Appliance is now a single VM running multiple services, with some changes to its architecture as well.
We will discuss these updates and changes in more detail one by one. So, let's start with
SSO
vCenter Server Single Sign-On (SSO) is a crucial component of VMware's vSphere (vCenter Server), providing authentication services to various VMware products within the vSphere environment. Here are the primary capabilities and features of vCenter Server SSO
Single Authentication source for VMware products
Integration with LDAP servers (AD) or OpenLDAP using SAML
Role based access and control of the vSphere environment
Up to 15 vCenter Server instances can be managed using a single SSO domain
This is the AAA layer, aligned with the internal vCenter Server directory service "vmdir". That is the reason we always advise against using the same name as your Active Directory domain when defining the SSO domain during installation of vCenter Server.
vmdir is a service that behaves similarly to Microsoft Active Directory multi-master replication when you use Enhanced Linked Mode (ELM) for vCSA instances.
ELM can only be configured during installation of a new vCSA instance. When you install the second instance of vCSA, it asks whether to create a new "SSO Domain" or join an "Existing" one. You need to choose the existing one, as shown below.
Once replication happens between the two instances, ELM connects the vCSA instances with one another to share inventory objects based on RBAC.
Certificate Authority (VMCA)
To be more independent and use VMware's own certificate authority to issue certificates for VMware platform products, we no longer need to have or maintain 3rd party CA(s). vCenter Server itself can act as a certificate authority to issue and renew certificates for VMware platform products like ESXi hosts, the VMware Aria family, vCSA itself, etc.
Web Services
vCenter Server Appliance is equipped with GUIs to access its interfaces. There are 2 different types of interfaces offered by vCenter Server Appliance
vSphere Client - for datacenter administration (default port: 443) - can be changed using the General settings of vCenter Server.
VAMI - for appliance management (default port: 5480).
We reach the Admin interface through the vCSA URL ("https://vcsa-fqdn:443/ui") and the VAMI interface through ("https://vcsa-fqdn:5480"). Both interfaces have their own significance. It solely depends on what you actually want to do.
For example, if you want to do day-2 administration of the ESXi hosts and / or VMs in the datacenter, you go with the Admin interface. But if you want to make configuration changes like changing the appliance password, IP address, etc., you need the appliance's own interface, which is known as VAMI.
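To keep the two addresses straight, here is a small Python sketch that builds both URLs from an FQDN and optionally checks whether the port answers. The function names and the host name are made up for illustration, and the reachability check is a plain TCP connect, not a login or an API call.

```python
import socket


def vsphere_client_url(fqdn, port=443):
    """URL of the Admin interface (vSphere Client)."""
    return f"https://{fqdn}:{port}/ui"


def vami_url(fqdn, port=5480):
    """URL of the appliance management interface (VAMI)."""
    return f"https://{fqdn}:{port}"


def port_is_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


fqdn = "vcsa01.example.local"  # hypothetical appliance name
print(vsphere_client_url(fqdn))
print(vami_url(fqdn))
```

In a lab you could call port_is_open(fqdn, 5480) before opening the browser, to tell a network problem apart from a login problem.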
License Service
This service is used to hold information about installed and assigned licenses for ESXi host and other solutions like NSX, vSAN and vCenter Server itself. This service provides common license inventory and management capabilities to all vCenter Server systems within the Single Sign-On domain.
Postgres DB
A bundled version of the VMware distribution of PostgreSQL database for vSphere and vCloud Hybrid Services. It is used to hold SEAT logs and vCenter Server Configuration. SEAT stands for Statistics, Events, Alarms and Tasks logs whereas vCenter Server Configuration covers Cluster, vDS, ESXi hosts and other inventory and configurational information within it.
When you back up your vCSA, it asks whether to back up SEAT and Config or only the Config information. This is the configuration information that you back up and restore when needed.
Its maximum capacity as of vSphere version 8 is up to 62 TB, which is plenty for retaining logs over a longer time period.
Lifecycle Manager (vLCM)
vSphere Lifecycle Manager, previously known as Update Manager, is a service that takes care of ESXi host and VMware Tools life cycle management to maintain compliance and software patch management. It is not limited to ESXi hosts: hardware drivers can also be updated or deployed through this service.
Administrators can not only update existing ESXi hosts, by downloading updates directly from VMware or indirectly through manual updates from FTP (file) servers, but can also build bundled ESXi host images and push these images to bare metal servers.
vCenter Server Services
This is the collection of various distributed services that vCSA has to offer like
DRS
vMotion
Cluster Services
vSphere HA
vCSA HA
Other services
There are some other services, most of which are disabled by default, so you need to enable them. These include
Dump collector Service
The vCenter Server support tool. You can configure ESXi to save the VMkernel memory to a network server, rather than to a disk, when the system encounters a critical failure. The vSphere ESXi Dump Collector collects such memory dumps over the network.
Auto-Deploy Service
The vCenter Server support tool that can provision hundreds of physical hosts with ESXi software. You can specify the image to deploy and the hosts to provision with the image. Optionally, you can specify host profiles to apply to the hosts, and a vCenter Server location (folder or cluster) for each host.
Syslog Collector Service
A central location for all the logs collected from ESXi hosts, vCSA or other VMware products, to be retained for a longer time period. You can have a dedicated vCSA as Syslog collector server to act as a centralized repository for logs, depending on company compliance policies. Examples here could be banks, telcos, etc.
From version 8 onwards this service is enabled by default, but you still need to configure it. It can be integrated with vRealize Log Insight (new name: VMware Aria Operations for Logs) for troubleshooting, or with vRealize Operations (new name: VMware Aria Operations) for monitoring / analytics.
You can configure the Syslog Collector using the VAMI interface, and then you need to configure the other apps to send their logs.
So, this was a little introduction to vCenter Server Appliance, but this is not all. We shall continue and dig deeper to understand the role of vCSA in combination with the ESXi host as a hypervisor. Stay tuned...
ESXi host interfaces and their use cases
In this skill-up series, we now turn to the other advanced DCUI options that you may need to know about on an ESXi host, such as the "Troubleshooting Mode Options" shown in the picture below.
Here you can enable or disable the local ESXi Shell or the SSH shell, and configure the shell timeout in minutes. The maximum is "1440" minutes, and "0" means the timeout is disabled.
Moreover, you can also set the DCUI idle timeout in minutes, within the same range as mentioned above.
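The timeout rule above (0 disables it, 1440 minutes is the ceiling) is easy to encode if you document or script your host settings. Here is a minimal Python sketch of that rule. The function name is hypothetical and the code only interprets a number. It does not read or change anything on an ESXi host.

```python
MAX_TIMEOUT_MINUTES = 1440  # upper bound quoted above (24 hours)


def describe_timeout(minutes):
    """Explain what a shell / DCUI timeout value in minutes means."""
    if isinstance(minutes, bool) or not isinstance(minutes, int):
        raise TypeError("timeout must be a whole number of minutes")
    if not 0 <= minutes <= MAX_TIMEOUT_MINUTES:
        raise ValueError(f"timeout must be between 0 and {MAX_TIMEOUT_MINUTES}")
    if minutes == 0:
        return "timeout disabled"
    hours, mins = divmod(minutes, 60)
    return f"times out after {hours}h {mins:02d}m"


for value in (0, 90, 1440):
    print(value, "->", describe_timeout(value))
```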
Other than the options above, you can use Restart Management Agents, which restarts the agents that run locally on every ESXi host. These agents / services are "hostd" and "vpxa". But be very careful: if you are connected through SSH, a remote shell or vCenter Server, the ESXi host can be disconnected.
Other than DCUI, there are more interfaces that you can use to access an ESXi host, either from the command line or through a graphical user interface, such as
ESXi Shell (Local command line shell)
SSH (Remote command line shell)
PowerCLI (PowerShell based cmdlets that connect to an ESXi host or vCenter Server)
vSphere Host Client (GUI offered by ESXi host individually)
vSphere Client (GUI offered by vCenter Server)
The picture below shows some of the above interfaces and how they connect.
Some of the points above are also explained in a video demonstration, which you should watch to understand this topic more easily.
It has been quite a long time. I got caught up in my training deliveries, which is why I couldn't spare time to write a blog post.
Let's start our discussion!
We normally associate vSphere HA with restarting VMs on a surviving host in a vSphere cluster.
We normally enable vSphere HA on a vCenter Server cluster object, and it is helpful in different situations like
ESXi host Hardware issues
Network disconnectivity among ESXi hosts in a cluster
Shared Storage connectivity or unavailability issues with ESXi hosts
Planned maintenance of ESXi hosts
How does it work?
vSphere HA, despite its name (HA = High Availability), restarts VMs on surviving hosts that can accommodate the VMs' requirements, as shown in the picture below.
For example, if an ESXi host develops a hardware problem and stops working, its VMs become unavailable. The interrupted VMs are then taken care of by the other available ESXi hosts in the same cluster, which power them on from the same shared datastore.
These failures could be host hardware faults, network interruptions, storage inaccessibility, etc.
This means we have to fulfil some important requirements for vSphere HA. Let's discuss them.
The basic high-level requirements are as below
vCenter Server (vpxd)
Fault Domain Manager (FDM, local to every host)
Hostd (local to every host)
Let's break these requirements into understandable pieces
Hardware Requirements
Minimum 2 ESXi hosts and maximum 64 ESXi hosts in a cluster
Minimum 1 Ethernet network with a static IP address for each host
Recommended 2 Ethernet networks with static IP addresses for ESXi hosts (multiple gateways)
Software Requirements
vCenter Server - To create cluster object
1 Management network must be common among all ESXi hosts in the Cluster
Enable vSphere HA on the cluster object
Minimum vSphere Essentials Plus Kit license or a single vCenter Server Standard license
Coming back to the high-level requirements, vCenter Server is required to create the cluster object and to push FDM agents to the ESXi hosts that are members of the cluster.
The FDM agent is a service that runs locally on each ESXi host in a cluster that has vSphere HA enabled. FDM takes care of all HA related actions like
HA logging
VM restarts on surviving hosts
Selection of the Master node in a cluster
Management of all vSphere HA requirements
FDM service talks directly to "hostd" service of each ESXi host.
The basic purpose of "hostd" is to create / delete / start / restart / shut down VMs. In fact, all the necessary actions of an ESXi host against its VMs are taken care of by "hostd".
vSphere HA Anatomy
When you enable vSphere HA on a cluster, the members of the cluster are divided into two basic roles
Master Node
Slave / Subordinate Nodes
There is only one Master node in a vSphere HA cluster and the rest are Slave / Subordinate nodes. A vSphere cluster can grow up to 64 nodes (1 Master and 63 Slave nodes) in vSphere 6.5 / 6.7 / 7.
The Master node is responsible for restarting VMs on the available surviving hosts (slave / subordinate).
The Master node is responsible for dividing the VM restart workload equally among the surviving hosts.
The Master node is responsible for informing vCenter Server about the current status of the vSphere HA cluster.
The Master node is responsible for keeping track of heartbeats from Slave nodes, either over the network or through the datastore.
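To picture what "dividing the restart workload equally" means, here is a tiny Python sketch that spreads a failed host's VMs across the surviving hosts in round-robin order. This is a deliberate simplification for illustration. The function name, VM names and host names are invented, and the sketch ignores resource availability, which a real placement decision has to take into account.

```python
def distribute_restarts(vms, surviving_hosts):
    """Assign each VM to a surviving host in round-robin order."""
    if not surviving_hosts:
        raise ValueError("no surviving hosts available")
    plan = {host: [] for host in surviving_hosts}
    for index, vm in enumerate(vms):
        host = surviving_hosts[index % len(surviving_hosts)]
        plan[host].append(vm)
    return plan


# VMs that were running on the failed host (example names only)
failed_host_vms = ["vm1", "vm2", "vm3", "vm4", "vm5"]
plan = distribute_restarts(failed_host_vms, ["esxi02", "esxi03"])
for host, vms in plan.items():
    print(host, "restarts", vms)
```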
How does the Master node know which VMs need to be restarted on surviving hosts?
There is a file named "Protected List" located on all shared datastores. It can be accessed by the Master node in the cluster and is held / locked by the Master node.
This file contains information about Virtual Machines running on their respective hosts.
Another file, named "Power-on", is located on shared datastores and is accessible by all nodes in the cluster, including the Master node. Each host updates a time stamp in this file every 5 minutes to mark its connectivity, which helps to identify network isolation.
The significance of the "Power-on" file is that it lets the Master node know about hosts that are disconnected from the Ethernet network. The Master node checks whether such a host is still alive by looking for a time stamp updated within the last 5 minutes, using the datastore as an alternative heartbeat channel to the Ethernet network.
The minimum number of alternative heartbeat sources (datastores) is two. It is highly recommended to choose the heartbeat datastores manually instead of letting vCSA choose them (automatically) for you.
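The decision the Master node makes from these two heartbeat channels can be sketched in a few lines of Python. This is my simplified reading of the description above, not the actual FDM logic. The function name and the three state labels are made up, and the 5 minute window is the update interval mentioned in the text.

```python
from datetime import datetime, timedelta

UPDATE_INTERVAL = timedelta(minutes=5)  # time stamp refresh interval from the text


def classify_host(network_heartbeat_ok, last_file_update, now):
    """Classify a host from its network heartbeat and its datastore time stamp."""
    if network_heartbeat_ok:
        return "healthy"
    # No network heartbeat: fall back to the datastore heartbeat channel.
    if now - last_file_update <= UPDATE_INTERVAL:
        return "isolated"  # host is alive but cut off from the network
    return "failed"        # no sign of life on either channel


now = datetime(2024, 1, 1, 12, 0, 0)  # fixed example time
print(classify_host(True, now - timedelta(minutes=1), now))
print(classify_host(False, now - timedelta(minutes=2), now))
print(classify_host(False, now - timedelta(minutes=9), now))
```

The point of the second channel is visible in the last two calls: without the time stamp, the Master node could not tell an isolated host from a dead one.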
(Design Tip)
Design your vSphere HA network with redundant Ethernet gateways and keep your shared storage network (fabric) physically separate. In case of a network disaster, your vSphere design can then survive / mitigate the situation.
How do different nodes respond to HA failure scenarios?
Master Node:
The Master node is responsible for restarting a failing host's VMs on surviving hosts, and it updates the "Protected List" file across all the datastores it can access.
If the Master node suffers a failure (H/W issue, network isolation, etc.), the VMs running on that host are evenly distributed among the surviving hosts right after the election process.
What is the election process?
All the slave nodes in the cluster send heartbeats to each other and to the master node, and wait for the master node's heartbeat.
If the slave nodes do not receive the Master node's heartbeat for 15 seconds, they consider it dead.
The slave nodes initiate a special broadcast known as election traffic, which all the slave nodes sense, and they elect the next master node from among themselves.
This election process runs for the next 15 seconds, right after the slave nodes have waited 15 seconds for the master node's heartbeat.
Right after the election process (which takes another 15 seconds) has elected one master node from the remaining slave nodes, the elected Master node takes over the "Protected List" file and initiates the restart (initial placement) of the affected VMs that were running on the faulty Master node.
Conclusion:
When the Master node fails, it takes around 45 seconds to restart its virtual machines on surviving hosts.
Slave nodes:
These are the nodes that take instructions from the Master node to restart the affected (failing host's) virtual machines.
If a slave node suffers a failure (H/W and / or network isolation), the Master node takes responsibility for restarting the VMs (from the failing host) on the available hosts in the cluster.
The Master node takes its decision within 15 seconds and evenly distributes the VMs among the surviving hosts in the cluster.
Conclusion:
When a slave node fails, it takes about 15 seconds to restart its VMs on the surviving hosts.
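The two conclusions above can be put side by side as simple arithmetic. The Python sketch below assumes, as my reading of the figures in this post, that the 45 seconds for a Master failure is made up of 15 seconds of waiting for the heartbeat, 15 seconds of election and 15 seconds for the new Master to take its restart decision. The constant and function names are my own, and real timings depend on the environment.

```python
HEARTBEAT_WAIT_S = 15     # slaves wait this long for the Master's heartbeat
ELECTION_S = 15           # duration of the election process
RESTART_DECISION_S = 15   # time the Master takes to decide on VM placement


def restart_delay_seconds(failed_role):
    """Approximate delay before VM restarts begin, by role of the failed host."""
    if failed_role == "slave":
        return RESTART_DECISION_S
    if failed_role == "master":
        return HEARTBEAT_WAIT_S + ELECTION_S + RESTART_DECISION_S
    raise ValueError(f"unknown role: {failed_role!r}")


for role in ("master", "slave"):
    print(f"{role} failure: about {restart_delay_seconds(role)} seconds")
```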
About Network Isolation
In this state, the affected host (or hosts) cannot contact their gateways, and the Master node cannot contact the isolated hosts. That is the reason we choose an alternative to Ethernet in the form of datastore heartbeats.
This kind of isolation has a bigger impact if we have not designed the Ethernet network and shared storage access with redundancy.
Better Ethernet-network designs
Choosing a better, physically separated topology for vSphere HA always helps a lot, as you can see in the picture below.
The picture above depicts the recommended approach for system traffic isolation. It shows that physical isolation of system traffic can be achieved by creating separate logical switches for separate types of system traffic.
In this picture I have also placed 2 separate traffic types on the same virtual switch, which shows that you can combine different (system) traffic types or separate them logically as well.
Another important aspect I want to draw your attention to is redundancy, from the most basic component (vmnic) all the way to the physical switches. This approach also lessens the impact of a network level disaster.
Note: You can also use the same DNS instead of separate DNS zones for each network as shown in the picture above.
Logical (Isolation) network
In this scenario, you can work with as few vmnics (physical network cards) as are available, especially in the case of blade chassis. You separate system traffic (like Management, vMotion, vSAN, FT, Replication, etc.) logically using VLANs.
Note: A better network design can even save you from disasters like shared storage unavailability, which results in problems like APD (All Paths Down).