Creating a VI (Virtual Infrastructure) Cluster in VCF 4.0.1.1

I have wanted to learn more about VMware Cloud Foundation (VCF) for a while but never had the time. Recently (ahem, COVID) I found extra time to try new things and learn with my home lab. For the setup, I used the VCF Lab Constructor (VLC) to create VCF. After deployment, I updated it to the latest version (currently 4.0.1.1). This is all running nested on a single Dell server (PowerEdge XR2). I don’t believe that has much bearing on how things run overall, though it does make life simpler not having to deal with “real” hardware. While VLC is not officially supported by VMware, it is slick enough that maybe it should be.

VMware positions VCF as making your environment operationally simpler. While still not point and click, VCF does help immensely with setting up a VMware Validated Design for a datacenter. Using VLC to create my VCF environment made it even simpler. Normally you would architect how you want it set up (networks, VLANs, NSX, etc.) and then use the Cloud Builder appliance to “bring up” (deploy) the infrastructure. The pre-work includes adding all parameters to an XLS spreadsheet, generating a JSON file, and then feeding that to the Cloud Builder appliance. VLC simplifies this by pre-assigning most of these parameters. Your work isn’t much more than entering license keys into the JSON file and pointing the PowerShell script to where the packages and JSON files are.

I had a few hiccups deploying my environment. Specifically, it was choosy about the storage used. Either way, I got it running, and running well considering it is a nested environment. The hardware backing it is 2x Xeon Gold 6138 (20c/40t) processors, 384GB RAM, a 2TB NVMe drive, 5TB of Synology SSD storage, and 10GbE networking. I found myself wanting to create additional clusters for infrastructure and more. To do so, apparently, I needed a cluster image. I tried to extract one from the management cluster but received an error I couldn’t decipher. The error on the SDDC Manager side was the following:

And if I wanted to import one, I needed all the following:

That seems like a lot to do. Extracting a cluster image would be a lot simpler, but it appears you can’t do that unless you have already created one previously. This guide assumes you are on VCF 4.0+ and running VMware ESXi 7.0 (since that is when this feature became available).

First, you need to download the .zip deployment image of ESXi you want to use. The one I am using is VMware-ESXi-7.0b-16324942-depot.zip. You cannot use ISO images for a baseline image.

In vCenter, go to Lifecycle Manager and click Actions. Select Sync Updates or, if you are using the .zip file, Import Updates, and then select the .zip you downloaded.

You should now see the image show up under ESXi Versions. You can now add any Vendor Addons or extra components to it.

Create a blank cluster in vCenter. Name it something such as NewCluster. At the bottom, there is a checkbox to manage all hosts in the cluster with a single image. Tick it and select the image you wish to use.

Next, you need to export the cluster image specification and components. To do this, select the new cluster and click on the Updates tab. Then click the ellipsis under Image and choose Export.

You need to perform the export three times: once for the JSON file, once for the ISO file, and once for the .zip file. Do not rename these files.

Now we need to download the cluster settings JSON file. From the Menu in the HTML5 Web Client, select Developer Center and then select the API Explorer tab.

Expand the cluster section, then expand GET /rest/vcenter/cluster and scroll down a bit to click Execute.

A response will appear. Click on vcenter.cluster.list_resp and then click on the cluster you created.

Copy that cluster ID (in this case, domain-c6003), then go to the Select API dropdown and change it to esx.

Scroll down to settings/cluster/software, expand the GET for /api/esx/settings/cluster/{cluster}/software, and enter the cluster ID from before as the cluster value.

Click on Execute and then Download. The response-body.json file is downloaded to your local computer. These are all the files you need.
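If you would rather pull the same response from a shell than click through the API Explorer, something along these lines should work. Treat it as a sketch under assumptions: the function names are made up, the hostname in the usage line is a placeholder, VC_SESSION is assumed to already hold a valid vCenter API session ID (sent in the vmware-api-session-id header), and whether SDDC Manager accepts a file fetched this way rather than through the Download button is untested here.

```shell
# Build the URL for the cluster software endpoint used in the API Explorer.
# Usage: software_spec_url <vcenter_fqdn> <cluster_id>
software_spec_url() {
    printf 'https://%s/api/esx/settings/cluster/%s/software\n' "$1" "$2"
}

# Hypothetical wrapper: save the response as response-body.json.
# Assumes VC_SESSION holds a valid vCenter API session ID.
# -k skips certificate checks, which is only reasonable in a lab.
fetch_software_spec() {
    curl -sk --fail -H "vmware-api-session-id: $VC_SESSION" \
        -o response-body.json "$(software_spec_url "$1" "$2")"
}
```

Usage would look like: fetch_software_spec vcenter.example.local domain-c6003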

Go back to SDDC Manager and open Image Management.

Give each prompt its matching file:

Cluster Settings = response-body.json
Software Spec = SOFTWARE_SPEC_xxx.json
Zip = Offline_Bundle_xxxx.zip (the .zip file you downloaded)
ISO = ISO you downloaded
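Before uploading, it can save a failed attempt to confirm that all four files are sitting in one folder. A minimal sketch, assuming the files keep the names listed above; matching is case-insensitive because the exact casing of the exported names is not certain, and the function name check_image_files is made up for illustration.

```shell
# Hypothetical helper: report which of the four required files are missing.
# Usage: check_image_files <folder>
check_image_files() {
    local dir="$1" missing=0 pattern
    for pattern in 'response-body.json' 'SOFTWARE_SPEC_*.json' 'OFFLINE_BUNDLE_*.zip' '*.iso'; do
        # -iname matches the file name without regard to case.
        if [ -z "$(find "$dir" -maxdepth 1 -type f -iname "$pattern")" ]; then
            echo "missing: $pattern"
            missing=1
        fi
    done
    # Returns 0 when everything is present, 1 otherwise.
    return "$missing"
}
```

Run it as check_image_files ~/Downloads (or wherever the exports landed); it prints nothing when all four files are found.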

Click Upload Image Components. Grab a beverage of choice, as this might take a while. When it is done, you will get a message at the bottom telling you the files are uploaded.

You will also notice an entry under Available Images now.

This time, when you create the VI cluster, you will see something a little different.

And there you go…

VMware Cloud Foundation 4.0.1: Problems with SDDC Manager refreshing

I’ve been studying VMware Cloud Foundation 4.0.1 and have it running in my lab. I’ve noticed it can be a bit finicky at times. One of the issues I’ve run into so far came after I added 3 more hosts. Everything seemed fine at first. I then wanted to add a third NIC to the hosts in order to access iSCSI storage on them. When I created the NIC (while the nested host was powered on), it locked up my physical host and I ended up needing to reboot it. Not nice….

Anyway, I got that sorted. The next issue was that SDDC Manager didn’t want to refresh or connect to vCenter, since it hadn’t been shut down properly. I started doing some research, and a HUGE shoutout to vSAM.pro for figuring out what was going on. I ended up having to do a bit more, though. So here is what I ended up doing (and his blogs made much more sense after I figured it out, weirdly enough 🙂 )

The Issue:

First, I checked the logs. They are located at

/var/log/vmware/vcf/

Underneath there are a bunch of service folders with logs in them. In this particular case, I checked through most of the logs and found the following issue:

root@sddc-manager [ /var/log/vmware/vcf/operationsmanager ]# tail operationsmanager.log

2020-08-03T22:57:50.268+0000 INFO [0000000000000000,0000] [liquibase.executor.jvm.JdbcExecutor,main] SELECT LOCKED FROM public.databasechangeloglock WHERE ID=1
2020-08-03T22:57:50.270+0000 INFO [0000000000000000,0000] [l.lockservice.StandardLockService,main] Waiting for changelog lock….
2020-08-03T22:58:00.270+0000 INFO [0000000000000000,0000] [liquibase.executor.jvm.JdbcExecutor,main] SELECT LOCKED FROM public.databasechangeloglock WHERE ID=1
2020-08-03T22:58:00.273+0000 INFO [0000000000000000,0000] [l.lockservice.StandardLockService,main] Waiting for changelog lock….
2020-08-03T22:58:10.273+0000 INFO [0000000000000000,0000] [liquibase.executor.jvm.JdbcExecutor,main] SELECT LOCKED FROM public.databasechangeloglock WHERE ID=1
2020-08-03T22:58:10.276+0000 INFO [0000000000000000,0000] [l.lockservice.StandardLockService,main] Waiting for changelog lock….
2020-08-03T22:58:20.277+0000 INFO [0000000000000000,0000] [liquibase.executor.jvm.JdbcExecutor,main] SELECT LOCKED FROM public.databasechangeloglock WHERE ID=
2020-08-03T22:58:20.278+0000 INFO [0000000000000000,0000] [l.lockservice.StandardLockService,main] Waiting for changelog lock….
2020-08-03T22:58:30.279+0000 INFO [0000000000000000,0000] [liquibase.executor.jvm.JdbcExecutor,main] SELECT LOCKED FROM public.databasechangeloglock WHERE ID=1
2020-08-03T22:58:30.281+0000 INFO [0000000000000000,0000] [l.lockservice.StandardLockService,main] Waiting for changelog lock….

It looked like something was holding a lock: the service kept polling the databasechangeloglock table and waiting for the changelog lock to be released.
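A quick way to see which services are stuck like this is to count the “Waiting for changelog lock” lines across the service logs. A minimal sketch, assuming the logs sit one folder deep under /var/log/vmware/vcf/ with a .log extension (as operationsmanager.log does); the function name find_lock_waits is made up for illustration.

```shell
# Hypothetical helper: list logs that contain changelog lock waits.
# Usage: find_lock_waits [log_root]   (defaults to /var/log/vmware/vcf)
find_lock_waits() {
    local root="${1:-/var/log/vmware/vcf}" log count
    for log in "$root"/*/*.log; do
        # Skip the literal pattern when nothing matches the glob.
        [ -f "$log" ] || continue
        # grep -c prints 0 and exits non-zero when there are no matches.
        count=$(grep -c 'Waiting for changelog lock' "$log" 2>/dev/null || true)
        if [ "${count:-0}" -gt 0 ]; then
            echo "$log: $count"
        fi
    done
}
```

Running find_lock_waits with no arguments prints one line per affected log with the number of waits found, and prints nothing if no service is waiting.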

To check the database, open the PostgreSQL command line on the SDDC Manager:

psql --host=localhost -U postgres

Then change to the database that is locked. You can list the databases by typing

\l (that is a lowercase L)

To change databases, type the following

\c [database_name] (without the brackets)

This will allow you to run the command

\dt

which shows the tables in that database. In this case, for OperationsManager, it looked like this:


Next, check whether there is a row in that table by typing the following:

select * from databasechangeloglock;

And it will return the following:


You can now clear the lock by deleting the rows in this table (the table itself stays). Type the following

delete from databasechangeloglock;

That clears the lock. Give SDDC Manager a few minutes and it should start working again.
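The same check and cleanup can be wrapped up so it runs in one go from the SDDC Manager shell. This is only a sketch: the function name clear_changelog_lock is made up, the database name is passed in because it depends on which service is stuck, and it assumes psql can connect the same way as in the interactive session above.

```shell
# Hypothetical helper: show the changelog lock rows, then delete them.
# Usage: clear_changelog_lock <database_name>
clear_changelog_lock() {
    local db="$1"
    if [ -z "$db" ]; then
        echo "usage: clear_changelog_lock <database_name>" >&2
        return 1
    fi
    # Print the current lock rows first so there is a record of what was removed.
    psql --host=localhost -U postgres -d "$db" \
        -c 'select * from databasechangeloglock;' || return 1
    # Same statement as in the manual steps above.
    psql --host=localhost -U postgres -d "$db" \
        -c 'delete from databasechangeloglock;'
}
```

Printing the rows before deleting them is a deliberate choice: if the service still does not recover, you at least know what the lock looked like.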