Category: Operating Systems

VCSA and ESXi password security


I recently went looking for information on password security for the VCSA 6.0 & 6.5 and ESXi 6.0 & 6.5. More specifically, I was interested in the number of passwords remembered, so I could define that in documentation for a client.

Try as I might, I couldn't find documentation anywhere for the number of passwords the VCSA remembers, or how to configure it!

VCSA disks become full over time

I’ve recently spoken with a number of VMware vCenter Server Appliance 6 (VCSA) users that have had issues with the root filesystem of VCSA running out of space.

This situation seems to be occurring more often now due to a combination of when VCSA 6 went mainstream (18 to 24 months ago) and the default 365-day password expiration. That combination is just long enough for the root password to expire, and after about 6 more months (depending entirely on the size and activity of the vSphere environment) the /dev/sda3 disk fills!
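If you suspect this is happening in your environment, a quick check from the appliance shell (assuming you have console or SSH access to the VCSA) is to look at the root account's password aging and the root filesystem usage:

chage -l root
df -h /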

OVF and OVA formatted Virtual Appliances

The Open Virtual Machine Format (OVF) originally came about in 2007 as the result of a proposal by vendors (VMware, HP, Dell and others) to the Distributed Management Task Force (DMTF), the goal being to create an open standard for interchangeability (portability) of Virtual Machines between hypervisors.

VMware was an early and enthusiastic adopter of the OVF standard, with support for import and export of OVF-packaged VMs included in its hypervisors by 2008. Other vendors have shown varying degrees of support, possibly as a reaction to VMware's early adoption. Most vendors and cloud architectures have supported the OVF standard since the DMTF announced OVF 2.0[1] in January 2013.

Changing existing LUNs to Round Robin on ESXi

In the following steps, I am going to show you how to set all of the VMFS Volumes (LUNs) on an ESXi Host to use the PSP known as Round Robin, using only the ESXi Shell and/or SSH. This is clearly the simplest and most direct method of changing the PSP for existing volumes, and it is available from all ESXi Hosts in every environment.

There are other ways of changing the PSP, including using the vSphere Client to set each VMFS Volume individually, or using VMware vSphere PowerCLI to set the PSP for all of the VMFS Volumes at once, but either of these methods may be undesirable or unusable in a given situation:

  • Using the vSphere client and setting VMFS Volume PSP for each LUN individually would be extremely time-consuming if there were more than just a few ESXi Hosts or volumes.
  • Some environments may not have a vCenter Server, or the vSphere PowerCLI may not be available at the time you need to change the PSP.

Determining the default PSP for an ESXi Host

In the following example, we have an ESXi Host on which all of the VMFS Volumes have been created using the Most Recently Used (MRU) PSP, which is not the best or most optimal choice for our SAN.


To begin, let’s check what the default PSP is for ALUA arrays (VMW_SATP_ALUA) for new VMFS volumes on this ESXi Host.

Run the command:

esxcli storage nmp satp list

In the first line of the output, we can see that the default PSP for ALUA Arrays is Most Recently Used (VMW_PSP_MRU), which is not correct or desirable for our SAN.

Change the default PSP for new VMFS Volumes to Round Robin.

Run the command:

esxcli storage nmp satp set --default-psp=VMW_PSP_RR --satp=VMW_SATP_ALUA

And check your success by running the command:

esxcli storage nmp satp list

Notice that the association for VMW_SATP_ALUA is now VMW_PSP_RR; or, put in simpler terms, we have changed the default PSP from Most Recently Used to Round Robin for ALUA Arrays. Unfortunately, even though we changed the default PSP for the ESXi Host, all of the existing VMFS volumes retain their former PSP.

Changing existing VMFS volumes to use Round Robin

Existing VMFS volumes may be changed to Round Robin one at a time, or, by using a scriptlet, we can find all of the VMFS Volumes on a host and change them all to use Round Robin at once!

First, list all of the LUNs by running the command:

ls /vmfs/devices/disks | grep naa.600

You will see two lines for each LUN, one is the device (first 36 characters) and the other is the first partition (:1).

Because we only need to set the PSP for the device and not the partition, we will cut the first 36 characters from our grep output into a variable 'i', and pass each value to the command 'esxcli storage nmp device set --device', inserting '$i' in place of the device name, like this:

Run the command:

for i in `ls /vmfs/devices/disks/ | grep naa.600 | cut -b 1-36` ; do esxcli storage nmp device set --device $i --psp VMW_PSP_RR;done 

When complete, you will find that all of the VMFS Volumes on this ESXi Host have been switched to using the PSP Round Robin!
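To verify the change, list the devices again and check the Path Selection Policy reported for each one (the grep simply trims the output down to the relevant lines):

esxcli storage nmp device list | grep "Path Selection Policy:"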

Timekeeping on ESXi

Timekeeping on ESXi Hosts is a particularly important, yet often overlooked or misunderstood topic among vSphere Administrators.

I recall a recent situation where I created an anti-affinity DRS rule (separate virtual machines) for a customer’s domain controllers. Although ESXi time was correctly configured, the firewall had been recently changed and no longer allowed NTP. As it happened, the entire domain was running fine and time was correct before the anti-affinity rule took effect. Unfortunately, as soon as the DC migrated (based on the rule I created), its time was synchronized with the ESXi host it was moved to, which was approximately 7 minutes slow! The net result was users immediately experienced log-in issues.

Unfortunately, when you configure time on your ESXi Host, there is no affirmative confirmation that the NTP servers you specified are reachable or valid! It doesn’t matter if you add correct NTP servers, or completely bogus addresses to the Time Configuration; the result is that the ESXi will report that the NTP client is running and seemingly in good health! Moreover, there is no warning or alarm when NTP cannot sync with the specified server.

Let’s create an example where we add three bogus NTP servers:

In this example, you can see the three bogus NTP servers, yet the vSphere Client reports that the NTP Client is running and there were no errors!

The only way to tell whether your NTP servers are valid and/or functioning is to access the shell of your ESXi host (SSH or Console) and run the command: ntpq -p 127.0.0.1


The result from ntpq -p demonstrates that *.mary.little.lamb is not an NTP server.

Now, let’s try using three valid NTP servers:

In this example, I have used us.pool.ntp.org to point to three valid NTP servers outside my network, and the result (as seen from the vSphere Client) is exactly the same as when we used three bogus servers!


The result from ntpq -p demonstrates that there are three valid NTP servers resolvable by DNS (we used pool.ntp.org), but that the ESXi host has not been able to poll them. This is what you see when the firewall is blocking traffic on port 123!

Additionally, when firewall rules change, preventing access to NTP, the ‘when’ column will show a value (sometimes in days!) much larger than the poll interval!

When an ESXi host is correctly configured with valid NTP servers and is actually getting time from those servers, the result from ntpq -p will look like this:


Here you see the following values:

  • remote: Hostname or IP of the NTP server this ESXi host is actually using.
  • refid: Identification of the time source.
      • INIT means the ESXi host has not yet received a response.
      • CDMA means the time stream is coming from a cellular network.
  • st: Stratum.
  • t: Type of association (for example, u = unicast, b = broadcast, l = local).
  • when: Time (in seconds) since the NTP server was last successfully queried. This is the important value: when the 'when' value is larger than the 'poll' value, NTP is not working!
  • poll: Poll interval (in seconds).
  • reach: An 8-bit shift register shown in octal (base 8), with each bit representing success (1) or failure (0) in contacting the configured NTP server. A value of 377 is ideal, representing success in the last 8 attempts to query the NTP server.
  • delay: Round-trip time (in milliseconds) to the NTP server.
  • offset: Difference (in milliseconds) between the actual time on the ESXi host and the time reported by the NTP server.
  • jitter: Observed variance in responses from the NTP server. Lower values are better.

NIST publishes a list of valid NTP IP addresses and hostnames, but I prefer to use pool.ntp.org in all situations where the ESXi Host can be permitted access to an NTP server on port 123. The advantage to pool.ntp.org is that it changes dynamically with the availability and usability of NTP servers. Theoretically, pool.ntp.org is a set-and-forget kind of thing!
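For example (the exact pool hostnames are your choice; these are the standard US pool aliases), the Time Configuration entries might simply be:

0.us.pool.ntp.org
1.us.pool.ntp.org
2.us.pool.ntp.org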

ESXi Time Best Practices

  • Do not use a VM (such as a Domain Controller) that could potentially be hosted by this ESXi Host as a time source.
  • Use only Stratum 1 or Stratum 2 NTP servers.
  • Verify NTP functionality with: ntpq -p 127.0.0.1 (see the commands below)
  • VMs which are already timeservers (such as Domain Controllers) should use either native time services such as w32time or VMware Tools time synchronization, not both! See: VMware KB 1318
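For a quick health check from the ESXi Shell, the following commands show whether the NTP daemon is running, which servers are configured, and whether they are actually being polled (these are the standard locations on ESXi 6.x; adjust if your build differs):

/etc/init.d/ntpd status
cat /etc/ntp.conf
ntpq -p 127.0.0.1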

Upgrading to VCSA 6 fails

I began an upgrade of the VMware vCenter Server Appliance from 5.5 to 6 for a small (in VMware’s own terminology ‘Tiny’) vSphere environment of 3 hosts and about 30 VMs. I certainly didn’t anticipate any trouble beyond the usual hassles associated with upgrading an infrastructure-level service like vCenter.

Unfortunately, even after carefully following VMware's documented procedures and the best advice some of my favorite blogs had to offer (see: VMware KB 2109772, http://www.vladan.fr/how-to-upgrade-from-vcsa-5-5-to-6-0/, and http://www.virtuallyghetto.com/2015/09/how-to-upgrade-from-vcsa-5-x-6-x-to-vcsa-6-0-update-1.html), the upgrade still failed.

When the installer failed, the only message the web-installer had to offer was the well-known: Firstboot script execution error.


Furthermore, the log files failed to download, leaving me with fewer diagnostic resources than I might otherwise find on my Windows desktop.

Normally, the Firstboot script execution error is the result of incorrectly configured DNS, so the first thing I tested was forward and reverse DNS; both were working perfectly.

I decided to delve further into the issue and found that the SSH daemon on the VM had started, so I connected with PuTTY for a look around.

Side note: upgrading from VCSA 5.5 to VCSA 6 requires the creation of a brand-new VM and a supposedly automatic migration of data, leaving your original VCSA in a powered-off state while the new VM takes its place.

As long as I was in with SSH (PuTTY), I did some more poking around, and finally ran: df -h on the VM that was supposed to be my new 'upgraded' VCSA 6.


The problem was immediately apparent; the /storage/seat partition (virtual disk) was completely full! In VCSA 6, /storage/seat is used for the Postgres Stats Events And Tasks (SEAT).

Side note: VCSA 6 puts all of its primary partitions on separate virtual disks, 11 in total. This is a great advantage for long-term scalability, but somewhat of a disadvantage as compared to one disk where every partition can grow to the capacity of the disk. To learn more about what all of the different partitions/disks do, look at this excellent write-up on virtuallyGhetto: Multiple VMDKs in VCSA 6.0?

What I (and VMware) had failed to take into account in sizing of the VCSA, was the potential for an extraordinary number of Tasks and Events. While this may be a ‘Tiny’ deployment by VMware’s standards, with a Horizon View environment plus Veeam Backup and Replication running on a sub 1-hour R.P.O., the number of Tasks and Events presents more like what VMware seems to expect from a ‘Medium’ deployment.

One potential solution might have been to reclaim or purge data from the Postgres database on the VCSA 5.5 before attempting the upgrade, but the owner decided in favor of preserving all of the data if possible.

The Solution

In the end, the solution was simply to select a larger deployment size while going through the web-installer wizard. As it turned out, the /storage/seat disk for a 'Tiny' deployment was only 10GB, while it was 50GB for a 'Medium' deployment.

During the upgrade (which took over 2 hours), I connected via SSH as soon as the daemon had started and ran df -h a number of times (I should've used: watch). I saw the /storage/seat volume grow slowly, eventually reaching over 17GB of used space, before settling back to 16GB on the successful upgrade.
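For reference, a one-liner like the following would have refreshed the output automatically (assuming watch is available on the appliance; adjust the interval to taste):

watch -n 30 df -h /storage/seat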

The only drawback I can think of to having specified a 'Medium' size deployment for a relatively small environment is that the vCPU and RAM allocated to the VM (24GB RAM and 8 vCPU) are now vastly beyond what is required. I plan to shut the VCSA down and scale back to around 16GB RAM and maybe 4 vCPU, to better suit the environment, at my earliest opportunity.

NET 3.5 in Windows 10 while offline

I was recently working on a Windows 10 Desktop with an isolated network, when the need to install the VMware vSphere Client for Windows arose. Of course, the vSphere Client requires .NET Framework 3.5, and Windows 10 presents special challenges to those of us who are forced to work without a connection to the Internet.

Here’s how to accomplish the installation offline, provided you have the installation media, or a copy of the SxS folder from the media.

I copied the x64\sources\sxs\ folder from the media (actually a USB drive) to C:\sxs on the VM before I ran the command, but there is no reason these steps wouldn't apply to any Windows 10 system, using any type of media.

Once I had the sxs folder on the root of C:\, I ran the command:

dism /online /enable-feature /featurename:NetFx3 /all /source:C:\sxs


and the whole installation took about 30 seconds!
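If you want to double-check that the feature really is enabled (an optional sanity check, not part of the procedure above), DISM can report its state:

dism /online /get-featureinfo /featurename:NetFx3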

Using VMware Paravirtual devices


One of the most common oversights in vSphere deployments is a failure to use the Paravirtual drivers that VMware has provided us for networking and storage.

On a physical platform, one chooses supported device(s) for networking and storage, and then installs the correct driver(s) to support those devices. For example, on a physical system you might specify an LSI SAS controller for storage and Intel E1000 NICs for networking. That particular combination is, in fact, so common that operating systems like Windows have the drivers for those devices pre-installed, so they will be recognized both during and after installation. The 'during' part is particularly important, because if the storage driver is not present at the time of install, the hard disk will not be recognized and the installation fails!

On a virtual platform, it's a completely different story. Even if the ESXi host actually has LSI SAS storage adapters and Intel E1000E NICs, there is no correlation between those physical devices and the network and storage devices presented to Virtual Machines. In fact, if you choose LSI or Intel (they are the default choices for Windows Server VM builds), the only potential benefit is that Windows includes those drivers by default. You will, in fact, be emulating the corresponding physical LSI and Intel devices, with a resulting loss of performance!

The only true native storage and network devices for vSphere VMs are the VMware Paravirtual SCSI (pvscsi) and Network (vmxnet3) device types and their corresponding drivers. The problem is that while most Linux distros include support for Paravirtual devices by default, Microsoft is not so magnanimous. Users choosing either (or both) of the VMware Paravirtual device types will have to install the corresponding drivers.

In most cases, VMware Paravirtual devices are supported for installation in Windows Family 5 (Server 2003, XP) and later, and natively supported by most Linux OS.

Benefits of using VMware Paravirtual SCSI and Network devices include:

  • Better data integrity[1] as compared to Intel E1000e
  • Reduced CPU Usage within the Guest
  • Increased Throughput
  • Less Overhead
  • Better overall performance

I have created an example Windows Server 2012 R2 VM using only the default E1000e and LSI SAS device types, and I am going to show you how easy it is to convert from the default (emulated physical) devices to the VMware Paravirtual drivers. For the following steps to work, VMware Tools must be installed in the VM which is being updated.

Upgrading a VM to vmxnet3 Paravirtual Network Adapter

During the following procedures, it is important to use the Virtual Machine Remote Console (as opposed to RDP) because we will be causing a momentary disconnection from the network.

The biggest challenge is that the static IP address, if assigned, is associated with the device and not with the VM. Therefore, when you upgrade to the vmxnet3 adapter, your challenge will be un-installing and eliminating any trace of the "old" NIC to avoid seeing the dreaded message: "The IP address XXX.XXX.XXX.XXX you have entered for this network adapter is already assigned to another adapter"[2]

Using the VMRC, log in to your Windows VM and run the device manager with: devmgmt.msc

You will see that the Network adapter is clearly listed as an Intel device

Now go to the Network and Sharing Center and click on any (all) of the active Networks to observe their settings

You will notice that the speed is clearly 1.0 Gbps

Click on: Properties

Choose TCP/IPv4 and then click: Properties

Take note of the IP Address, Subnet Mask, Gateway, and DNS

Go to: VM > Edit Settings


Remove the Network Adapter(s) from the VM and click OK. In truth, you could both remove the old adapter and add the new vmxnet3 adapter simultaneously, but we will do it in separate steps for clarity.

Notice, the active networks list is empty

Although we have removed the device from the VM, we have not removed its configuration from the system. Therefore, the IP address we saw earlier is still assigned to the E1000e Virtual NIC we just removed. In order to cleanly install a Paravirtual NIC, we need to remove the Intel NIC completely.

Open a command window (the environment variable must be set from the command window, and Device Manager must then be launched from that same window) and run the following commands:

set devmgr_show_nonpresent_devices=1

start devmgmt.msc

After the device Manager window is open, select: View > Show Hidden Devices

Many admins falsely believe that it is simply enough to show hidden devices, but this is not true. It is absolutely necessary to set "devmgr_show_nonpresent_devices" at the command line first!
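As an aside (my own habit, not from any VMware KB): if you happen to be working in PowerShell rather than cmd.exe, the equivalent is to set the environment variable and then launch Device Manager from that same session:

$env:devmgr_show_nonpresent_devices = 1
devmgmt.msc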

You should now be able to find the (now removed) Intel NIC listed in lighter text than the devices which remain present.

Right-click and select: Uninstall


Click: OK to confirm the uninstall

And it’s gone!

Go to: VM > Edit Settings


Click: Add

Choose: Ethernet Adapter

Set the Type to: VMXNET 3 and then choose the appropriate Network Connection (usually VM Network), then click: Next

Click: Finish

Now click: OK

You will see the vmxnet3 Ethernet Adapter added to the Device Manager

Now click the active network, in this case "Ethernet"

Notice the speed is listed as 10 Gbps. This does not mean that there are 10 Gbps NICs in the ESXi host, merely that the observed speed of the network for this VM is 10 Gbps.

Click on: Properties

Now choose: TCP/IPv4 and select: Properties

Re-assign the IP address, subnet mask, gateway, and DNS settings you noted earlier
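If you prefer to do this from the command line, netsh can reapply the settings; the interface name and addresses below are placeholders for this example, so substitute the values you noted earlier:

netsh interface ipv4 set address name="Ethernet" static 192.168.1.50 255.255.255.0 192.168.1.1
netsh interface ipv4 set dnsservers name="Ethernet" static 192.168.1.2 primary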

And you have upgraded to the VMware Paravirtual device VMXNET 3

Upgrading a VM to pvscsi VMware Paravirtual SCSI Adapter

The trick in switching to the VMware Paravirtual SCSI adapter is in adding a dummy disk to the VM, which will force Windows to install the pvscsi driver, included with the VMware Tools package you have installed as part of a separate process.

Start the device manager with devmgmt.msc

Observe the LSI Adapter listed under Storage Controllers

Go to: VM > Edit Settings


Choose: Add

Select: Hard Disk and then: Next

Choose: Create a new virtual disk and then: Next

The disk you create can be almost any size and provisioning type; we chose 10 GB. Click: Next

In this step, it is critical that you place the new disk on a unique SCSI node. That is to say, if the existing disk is on 0:0, then place the new disk on 1:0 (you must not combine it with any LSI nodes, such as 0:1, or the process will not work).

Now click: Finish

Notice, you have added, not just a disk, but also a New SCSI Controller.

Now click: Change Type

Select: VMware Paravirtual

Now click: OK

Once the disk is added, look again in the Windows Device Manager and make sure that you can see the VMware PVSCSI Controller. If you can, that means the PVSCSI driver has loaded successfully, and you can proceed.

Now we have to shut down the VM.

Choose: Shut Down Guest

Once the VM is off, go to: VM > Edit Settings


Choose the dummy disk (whichever one it was; BE CAREFUL HERE!) and click: Remove

Although I failed to do so in creating this demo, you probably want to choose “Remove from virtual machine and delete files from disk,” to avoid leaving orphan files around.

Now select the SCSI controller(s) which are not already Paravirtual and choose: Change Type

Select: VMware Paravirtual

Now click: OK

Power your VM back on and observe that only the VMware Paravirtual device remains!

It should be noted that, just as with the Intel NIC, the LSI device remains as a "nonpresent" device. If you feel like going the extra mile, repeat the steps to show nonpresent devices and uninstall the LSI device!

  1. http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2058692
  2. http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1179

Backing up and restoring the vCenter Server Appliance 6 database

One extremely important advantage of the VMware vCenter Server Appliance (VCSA) is its native PostgreSQL (vPostgres) database. With the embedded database and VCSA, it is now possible to support installations which scale to the maximum capability of vCenter, without additional Operating System or Database licensing costs.

With the use of the VCSA, however, comes a certain degree of responsibility for backing up the PostgreSQL database embedded in the vCenter Server Appliance. A good backup of the VCSA database makes the following tasks much easier:

  • In-place restore of VCSA database
  • Migration to a different installation of VCSA
  • Protection of vCenter tasks & events for auditing purposes

The process is actually very simple and is detailed in VMware KB: 2091961; however, as is typical of VMware, there are few actual procedural details to help an admin who is not intimately familiar with Linux, for example:

  • How do you create the folder you want to keep database backups on the VCSA?
  • How do you transfer the vCenter vPostgres backup and restore package linux_backup_restore.zip to the VCSA?
  • How do you extract a ZIP archive on a Linux system?
  • How do you retrieve the database backup from the VCSA once it is created?

I will answer these questions in a simple, step-by-step procedure with screenshots and suggestions for applications and settings to use in the process.

Preparing to back up the vCenter Server Appliance (6.X) Database

Open the VMware KB: 2091961 and scroll down to locate the attachment: linux_backup_restore.zip

Save the file to an appropriate folder on your local Windows system

Install WinSCP

Now, if you don’t already have it installed, go get the WinSCP Installation Package. WinSCP is free and one of the most useful utilities with vSphere in general, but there are some WinSCP settings specific to vSphere and the VCSA.

Get the Installation Package so you can save settings specific to your environment

Run as administrator


Choose your language and then: OK

Next

Accept and then Install

Choose to donate and/or Finish

Enable SSH on the VMware vCenter Server Appliance 6

Using a vSphere Client, open a Virtual Machine Remote Console window to your VCSA installation to make sure SSH is enabled.


This works just like it does in ESXi: press F2 to log in.

The password you enter here is the OS password you set when installing the VCSA.

Using the up/down arrow on your keyboard, scroll down to: Troubleshooting Mode Options then press: Enter

Now, using the up/down arrows, highlight: Enable SSH and press: Enter to toggle SSH for your VCSA installation. NOTE: it is not necessary to enable the BASH Shell at this time; we will do that from PuTTY

When set correctly, this is how it looks:

Press: Esc to exit

Log in to the vCenter Server Appliance Linux console as root

Open PuTTY (I am going to take for granted that you already have this one!) and type in the IP address of your VCSA installation (in my example above, it is: 192.168.153.110)

Yes to accept

Enter the username: root and the password that you set for your OS installation.

Now, type (or copy & paste) the two commands to Enable BASH Access and Launch BASH Shell on your VCSA
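On VCSA 6.0 those two commands are typically the following (run from the appliance shell; confirm against the KB for your exact version):

shell.set --enabled True
shell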

You will find yourself at the root of the VCSA installation

Now, create a folder to store the PostgreSQL backups, at least until you are able to transfer them off the system. Run the command: mkdir db_backups

Now, list the root folder, including permissions, by typing: ls -la

We can now see the permissions for: db_backups as: drwx (or just rwx for the User)

The Linux permissions listing works as follows: the first character is the type ('d' for a directory), followed by three Read/Write/eXecute triplets, one each for the User, the Group, and Others:

Type   User   Group   Others
d      rwx    rwx     rwx

so the fact that our directory "db_backups" shows "drwx" means that the user (that's us) has Read, Write and eXecute on this folder.

Change directory into the db_backups folder with: cd db_backups

Connecting to VCSA with WinSCP

Using WinSCP successfully with the VMware vCenter Server Appliance requires one of two things to occur:

1. Reconfigure the VCSA

-OR-

2. Reconfigure the connection in WinSCP

I universally choose to re-configure WinSCP to work with my vCenter Appliance!

Enter your basic connection parameters in WinSCP and click: Advanced

Now choose: SFTP and enter the following value as SFTP Server: shell /usr/lib64/ssh/sftp-server

Now click: OK and then: Login

Yes

Continue

And you are in

Now locate the folder where you saved: linux_backup_restore.zip (on the left) and the folder you created on the VCSA (on the right)

and drag-and-drop the file to copy to your VCSA

Extract the ZIP on the VCSA

List the contents of the directory: db_backups with the command: ls -la

Unzip the archive with the command: unzip linux_backup_restore.zip

List the contents of the directory: db_backups with the command: ls -la

Neither of the scripts has eXecute permissions, so add eXecute for the User with the command: chmod +x *.py

List the contents of the directory: db_backups with the command: ls -la (again)

The *.py scripts have become eXecutable!

Backup the VMware vCenter Server Appliance PostgreSQL (vPostgres) database

Run the backup_lin.py script and provide a filename: python backup_lin.py -f 11112015_VCDB.bak

Now use WinSCP to transfer the backup to a different location

You will have to refresh the folder listing in WinSCP (Ctrl+R) to see the files created

Drag the database backup to your chosen folder on Windows

Restore the VMware vCenter Server Appliance PostgreSQL (vPostgres) database

First, use WinSCP to upload the appropriate backup file. WinSCP will prompt you to overwrite, if a copy of that file exists. Make your choice.

Stop the vCenter Server with: service vmware-vpxd stop

Stop the vCenter Datacenter Content Library Service with: service vmware-vdcs stop

Restore the vCenter database with: python restore_lin.py -f 11112015_VCDB.bak (or whatever the name of your file is)

There may be numerous “NOTICE” lines referencing parts of the vCenter Server Appliance which simply don’t exist in your configuration. Just ignore these and look for the ultimate message: Restore completed successfully

Now start the vCenter with: service vmware-vpxd start

Now start the vCenter Datacenter Content Library Service with: service vmware-vdcs start
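If you want to confirm that vCenter actually came back before calling it done (a quick sanity check, not part of the KB procedure), the init script can report its status:

service vmware-vpxd status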

And you should be in business!