Section 5 – Performance-tuning and Optimizing a VMware vSphere Solution
Objective 5.1 – Determine effective snapshot use cases
Many companies use the term snapshot, and the definition varies by company. We should first define what VMware means by it.
VMware preserves a point in time (PIT) for a VM. This process freezes the original virtual disk and creates a new delta disk. All new writes are now routed to the delta disk. If a read needs data that still exists only on the original disk, it has to go back to that disk to retrieve it, so you are now accessing two disks. Over time, as you make changes and write new data, the delta can grow as large as the original disk, potentially doubling the space used: the original 10 GB disk becomes 20 GB across 2 disks. Each additional snapshot creates another delta disk, and the pattern continues.
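To put numbers on that growth, here is a minimal Python sketch of the worst case. It assumes every delta disk can grow to the full size of the base disk and ignores the small metadata overhead of each delta file. The function name is made up for illustration and is not part of any VMware tool.

```python
def worst_case_footprint_gb(base_disk_gb: float, snapshot_count: int) -> float:
    """Upper bound on datastore space for one disk plus its snapshot chain.

    Assumption: each delta disk can grow to the full size of the base disk.
    """
    if snapshot_count < 0:
        raise ValueError("snapshot_count must be non-negative")
    # One base disk plus one potentially full-size delta per snapshot.
    return base_disk_gb * (1 + snapshot_count)


if __name__ == "__main__":
    for count in (0, 1, 3):
        print(f"{count} snapshot(s): up to {worst_case_footprint_gb(10, count)} GB")
```

With one snapshot, the 10 GB disk from the example tops out at 20 GB. Real deltas usually stay well below this bound, but it is the number to plan datastore space around.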
Now that we understand a bit more about them, the inherent limitations become clear. This tool was never meant to be a backup. It was designed for reverting to the original state (if needed) after small changes. Most backup tools DO use snapshots as part of their process, but the snapshot only exists for the time needed to copy the data off, and then it is consolidated back again. Here are a few best practices from VMware on how to use them.
- Don’t use snapshots as backups – major performance degradation can occur and I have seen people lose months of data or more when the chain got too long.
- 32 snapshots are supported, but it’s better not to test this.
- Don’t use a snapshot longer than 72 hrs.
- If you are using a third-party backup product that utilizes the snapshot mechanism, ensure its snapshots are getting consolidated and removed after the backup is done. This may need to be checked via the CLI.
- Don’t attempt to increase disk size if the machine has a snapshot. You risk corrupting your snapshot and losing data.
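The age and chain-length guidelines above are easy to check automatically once you have an inventory of snapshots. The sketch below is one way to do that in plain Python. The `SnapshotInfo` record and its fields are hypothetical stand-ins for whatever your inventory export provides, not a vSphere API type. The two thresholds come straight from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_AGE = timedelta(hours=72)  # age guideline from the list above
MAX_CHAIN = 32                 # supported chain length from the list above


@dataclass
class SnapshotInfo:
    """Hypothetical inventory record; not a vSphere API type."""
    vm_name: str
    snapshot_name: str
    created: datetime
    chain_depth: int  # position in the chain, 1 = first snapshot


def audit(snapshots, now):
    """Return (snapshot, reason) pairs for every guideline violation."""
    findings = []
    for snap in snapshots:
        if now - snap.created > MAX_AGE:
            findings.append((snap, "older than 72 hours"))
        if snap.chain_depth >= MAX_CHAIN:
            findings.append((snap, "chain is at the supported limit"))
    return findings


if __name__ == "__main__":
    now = datetime(2024, 1, 10, 12, 0)
    inventory = [
        SnapshotInfo("exch01", "pre-patch", datetime(2024, 1, 5, 12, 0), 1),
        SnapshotInfo("web01", "pre-upgrade", datetime(2024, 1, 10, 9, 0), 1),
    ]
    for snap, reason in audit(inventory, now):
        print(f"{snap.vm_name}/{snap.snapshot_name}: {reason}")
```

Passing `now` in as a parameter keeps the check easy to test. In practice you would feed it the current time and the snapshot list from your own tooling.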
Most use cases involve changing or upgrading the VM and, once you find out whether the change works, removing the snapshot. A good example of this is Microsoft Windows Updates. Create a snapshot, install the updates, and test. If the updates haven’t broken anything, consolidate. Another use case might be the installation or upgrade of an important program, or a dev use case of changing code and executing it to determine if it works. The common thread between all the use cases is that they are temporary: the snapshot exists for only a very short period of time.
Storytime. I had a company that called in once that was creating snapshots for their Microsoft Exchange Server. They were taking one every day and using it as a backup. When I was called, they were at about a year of snapshots. Their server wasn’t turning on and trying to remove the snapshots wasn’t working. Consolidation takes time and a bit more space. We tried to consolidate but you can only merge 32 snapshots at a time. They got impatient about 25% through the process and tried to turn it on again. When that didn’t work, they had to restore from tape backup and lost a decent amount of data.
Objective 5.2 – Monitor resources of VCSA in a vSphere environment
Monitoring resources can be done from more than one place. The first place is the vCenter appliance management page on port 5480. After you log in, the navigation pane has an option called Monitor. This is what it looks like:
Notice the subheadings. You can monitor CPU and Memory, Disks, Networking, and the database. You can change the time period to include metrics up to the last year. Since the VCSA is also a VM, you can view this from inside the vSphere HTML5 client. This view allows you to get a bit more granular. You are looking at it from the host’s perspective, whereas the Appliance Management page shows the view from within the VM. Both are important places to give you a full look at how the vCenter is performing. Here is a screenshot of inside the HTML5 client of my vCenter Appliance.
You can attach to the vCenter via SSH or console and run top for a per-process view of the appliance. Here is what that looks like:
These are the most common ways you would monitor resources of your VCSA.
Objective 5.3 – Identify impacts of VM configurations
There is much to unpack with this objective. I will work through best practices and try to stay brief.
- While you want to allocate the resources that your VM needs to perform, you don’t want to over-allocate, as this can actually make performance worse. Make sure there are still enough resources for ESXi itself as well.
- Unused or unnecessary hardware on a VM can affect the performance of both the host and all VMs on it.
- As mentioned above, over-allocation of vCPU and memory resources will not necessarily increase performance, and it might lower it.
- For most workloads, hyperthreading will increase performance. Hyperthreading is like a person trying to eat food. You have one mouth to consume the food, but if you are only using one arm to put the food in, it isn’t as fast as it could be. If you use both arms (enabling hyperthreading) you still only have one mouth (one core), but you aren’t waiting for more food and just keep constantly chewing. Certain workloads that keep CPU utilization high benefit less from hyperthreading.
- Be aware of NUMA (Non-Uniform Memory Access). Memory is “owned” by sockets. If you use more memory than that socket owns, you need to use memory from the other socket (if available). This causes a small delay because the data has to move across the bus instead of coming from right next to the processor. This can add up. (Oversimplified, but the idea is there.) There are policies that can be set to help if needed, but they are not in the scope of this certification.
- Not having enough physical memory can cause the host to fall back on slower methods of memory reclamation, all the way down to swapping to disk. This causes performance degradation.
- Creating shares and limits on your machine may not have the result you expect. Weigh those options carefully before you apply them.
- Make sure you use VMware Tools in your VMs, as they add a number of useful and performance-increasing features.
- The virtual hardware you use in the configuration can also change performance. For example, using PVSCSI vs LSI SAS or using VMXNET3 vs E1000 NICs can make a decent performance jump.
- Make sure you use VMware snapshots how they were intended and not for long periods of time.
- There are different types of VMDKs you can create: thin provisioned, thick (lazy zeroed), and thick (eager zeroed), and each has its place. Thin disks are best in a scenario where you may not have all the space yet. You may need to buy more disks, or they may already be on their way; eventually you will have this space. It is important that you monitor your space to make sure you don’t consume it before you have it. If you do, the best case is that the VM is suspended; the worst case is that you lose data. Thick (lazy zeroed) fences off all the space for that disk up front. You can’t over-provision this; you have to already have the disk space. The “lazy zero” comes into play when you go to use the space: each block has to be zeroed before it is written for the first time. This can be a slowdown if there is a high number of writes to new areas of the disk. If the VM is more read heavy, you are just fine. Thick (eager zeroed) takes more time to create because it zeroes the whole disk up front before use. This type is best for a VM with heavy writes and reads, such as a DB server.
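The thin-provisioning warning in the last bullet comes down to simple arithmetic: compare what you have promised to guests with what the datastore can actually hold. Here is a rough Python sketch of that check. The function and its inputs are illustrative only; in a real environment the numbers would come from your datastore and VM inventory.

```python
def thin_provisioning_report(datastore_capacity_gb, provisioned_gb, used_gb):
    """Summarize how far thin disks overcommit a datastore.

    provisioned_gb and used_gb are lists with one entry per thin disk:
    the size promised to the guest and the space actually written so far.
    """
    total_provisioned = sum(provisioned_gb)
    total_used = sum(used_gb)
    return {
        # Above 1.0 means more space is promised than physically exists.
        "overcommit_ratio": total_provisioned / datastore_capacity_gb,
        # Space left before writes start failing.
        "free_gb": datastore_capacity_gb - total_used,
        "overcommitted": total_provisioned > datastore_capacity_gb,
    }


if __name__ == "__main__":
    report = thin_provisioning_report(
        datastore_capacity_gb=1000,
        provisioned_gb=[400, 400, 400],
        used_gb=[100, 150, 200],
    )
    print(report)
```

An overcommitted datastore is not a problem by itself; that is the whole point of thin disks. The number to watch is the free space, because that is what runs out when the guests fill their disks.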
Keep these in mind when creating VMs and also take a look at the VMware Performance Best Practices guide here.
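As a quick sanity check on the NUMA point from the list, the sketch below tests whether a VM fits inside a single NUMA node. It is just as oversimplified as the explanation: it assumes one NUMA node per socket, memory and cores split evenly across sockets, and no hypervisor overhead. The function name and parameters are invented for this example.

```python
def fits_in_numa_node(vm_memory_gb, vm_vcpus, host_memory_gb,
                      host_cores, sockets):
    """Return True if the VM fits inside one NUMA node.

    Assumptions: one NUMA node per socket, with memory and cores
    divided evenly between sockets, and no hypervisor overhead.
    """
    node_memory_gb = host_memory_gb / sockets
    node_cores = host_cores // sockets
    return vm_memory_gb <= node_memory_gb and vm_vcpus <= node_cores


if __name__ == "__main__":
    # Dual-socket host with 256 GB of RAM and 32 cores in total.
    print(fits_in_numa_node(96, 8, 256, 32, 2))
    print(fits_in_numa_node(160, 8, 256, 32, 2))
```

On this example host each node has 128 GB and 16 cores, so the 96 GB VM stays local while the 160 GB VM has to reach across to the other socket for part of its memory.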