VCSA disks become full over time

I’ve recently spoken with a number of VMware vCenter Server Appliance 6 (VCSA) users who have had issues with the root filesystem of VCSA running out of space.

This situation seems to be occurring more often now due to a combination of when the VCSA 6 went mainstream (18 to 24 months ago) and the default 365-day password expiration. That is just long enough for the root password to expire and, about 6 months later (depending entirely on the size and activity of the vSphere environment), for the /dev/sda3 disk to fill!

HPE Custom Image for ESXi 6.5U1 has been withdrawn due to purple-screen issues

HPE has quietly withdrawn the HPE Custom Image for ESXi 6.5U1 July 2017 due to purple-screen issues being experienced on a number of current VMware-supported servers (http://vmware.com/go/hcl)!

The particular purple screen we saw when deploying this ISO against an HP BL460 G7 was:

#PF Exception 14 in world 6824:sfcb-smx IP 0x1 addr 0x1

 


Revisiting scripted installation for ESXi 6.5

I thought I would revisit scripted ESXi installation for my lab. I have not gone into depth on this since 5.0 or earlier, and there are some significant changes for 6.5. The example script draws heavily from other sources, and it is now working.

OVF and OVA formatted Virtual Appliances

The Open Virtual Machine Format (OVF) originally came about in 2007 as the result of a proposal by vendors (VMware, HP, Dell and others) to the Distributed Management Task Force (DMTF), the goal being to create an open standard for interchangeability (portability) of Virtual Machines between hypervisors.

VMware was an early and enthusiastic adopter of the OVF standard, with support for import and export of OVF packaged VMs included in its hypervisors by 2008. Other vendors have shown varying degrees of support for the standard, possibly as a reaction to VMware’s early adoption of it. Most vendors and Cloud architectures have supported the OVF standard since the DMTF announced OVF 2.0[1] in January 2013.

Patch your ESXi Hosts from the command line easily and quickly

In many situations it is desirable to patch your ESXi host(s) prior to being able to install or use VMware vSphere® Update Manager™.

UPDATED 4/18/2016: HP has a new URL for HP Customized VMware ISOs and VIBs

For example:

  • Prior to installing vCenter in a new cluster
  • Standalone ESXi installations without a vCenter Server
  • Hardware replacement where you have ESXi Configurations backed-up with vicfg-cfgbackup.pl, but the rest of the hosts in the cluster are running a higher build number than the latest ISO available
  • It is just convenient on a new ESXi host, when internet connectivity is available!
  • Non-Windows environments that do not intend to create a Windows instance just for patching ESXi


vCenter 6 Email configuration remains stuck in 1997

On the release of vCenter 6, I was personally very excited to see several email configurations which had not been previously possible. Listed under vCenter Server Settings > Advanced Settings were the following new keys:

  • mail.smtp.password
  • mail.smtp.username

vCenter SMTP keys

One might be led to believe that the ability to configure an SMTP username and password implied that one could also use an SMTP username and password; unfortunately that is not the case!

I have confirmed, after many hours of frustrating calls with VMware Support, that VMware vCenter still does not support email servers which require authentication.

To answer your question about SMTP authentication – I expect this to work in one of the next versions and from my understanding username and password fields are there for that reason, just not implemented under the hood yet.   -VMware Escalation Engineer

This fact is (and has been) noted in VMware KB 1004070 for quite some time, but the addition of the new keys led to some hope. I am told that KB 1004070 is being amended to reflect the fact that the keys “mail.smtp.password” and “mail.smtp.username” are fully without function in vCenter 6.

Changing existing LUNs to Round Robin on ESXi

In the following steps, I am going to show you how to set all of the VMFS Volumes (LUNs) on an ESXi Host to use the PSP known as Round Robin, using only the ESXi Shell and/or SSH. This is clearly the simplest and most direct method of changing the PSP for existing volumes, and it is available from all ESXi Hosts in every environment.

There are other ways of changing the PSP, including using the vSphere Client and setting each VMFS Volume individually or using the VMware vSphere PowerCLI and setting the PSP for all of the VMFS Volumes at once, but either of these methods may be undesirable or unusable in any given situation:

  • Using the vSphere client and setting VMFS Volume PSP for each LUN individually would be extremely time-consuming if there were more than just a few ESXi Hosts or volumes.
  • Some environments may not have a vCenter Server, or the vSphere PowerCLI may not be available at the time you need to change the PSP.

Determining the default PSP for an ESXi Host

In the following example, we have an ESXi Host on which all of the VMFS Volumes have been created using the Most Recently Used (MRU) PSP, which is not the best or most optimal choice for our SAN.


To begin, let’s check what the default PSP is for ALUA arrays (VMW_SATP_ALUA) for new VMFS volumes on this ESXi Host.

Run the command:

esxcli storage nmp satp list

In the first line of the output, we can see that the default PSP for ALUA Arrays is Most Recently Used (VMW_PSP_MRU), which is not correct or desirable for our SAN.

Change the default PSP for new VMFS Volumes to Round Robin.

Run the command:

esxcli storage nmp satp set --default-psp=VMW_PSP_RR --satp=VMW_SATP_ALUA

And check your success by running the command:

esxcli storage nmp satp list

Notice that the association for VMW_SATP_ALUA is now VMW_PSP_RR; or put in simpler terms, we have changed the default PSP from Most Recently Used to Round Robin for ALUA Arrays. Unfortunately, even though we changed the default PSP for the ESXi Host, all of the existing VMFS volumes retain their former PSP.
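The before-and-after check of `esxcli storage nmp satp list` can also be read programmatically. Here is a minimal Python sketch, assuming the output has been captured as text and that each data row starts with the SATP name followed by its default PSP. The function name and the sample text are my own illustration, and SATPs whose names do not start with VMW_SATP_ would be skipped by this filter.

```python
def default_psps(satp_list_output):
    """Map each SATP name to its default PSP from `esxcli storage nmp satp list` text."""
    mapping = {}
    for line in satp_list_output.splitlines():
        fields = line.split()
        # Data rows carry the SATP name in column 1 and the default PSP in column 2.
        if (len(fields) >= 2
                and fields[0].startswith("VMW_SATP_")
                and fields[1].startswith("VMW_PSP_")):
            mapping[fields[0]] = fields[1]
    return mapping


# Illustrative sample only; real output has more rows and full descriptions.
SAMPLE = """\
Name                 Default PSP    Description
-------------------  -------------  -----------
VMW_SATP_ALUA        VMW_PSP_MRU    (description omitted)
VMW_SATP_DEFAULT_AA  VMW_PSP_FIXED  (description omitted)
"""

if __name__ == "__main__":
    for satp, psp in default_psps(SAMPLE).items():
        note = "" if psp == "VMW_PSP_RR" else "  <- not Round Robin"
        print(f"{satp}: {psp}{note}")
```

Running the same function on output captured before and after the change makes it easy to confirm that only VMW_SATP_ALUA moved to VMW_PSP_RR.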

Changing existing VMFS volumes to use Round Robin

Existing VMFS volumes may be changed to Round Robin one at a time, or, by using a scriptlet, we can search for all VMFS Volumes on a host and then change them all to use Round Robin at once!

First, list all of the LUNs by running the command:

ls /vmfs/devices/disks | grep naa.600

You will see two lines for each LUN: one is the device (the first 36 characters) and the other is the first partition (the same name followed by :1).

Because we only need to set the PSP for the device and not the partition, we will cut the first 36 characters from each line of the grep output, assign them to the variable ‘i’ in a loop, and pass ‘$i’ as the device name to the command ‘esxcli storage nmp device set --device’, like this:

Run the command:

for i in `ls /vmfs/devices/disks/ | grep naa.600 | cut -b 1-36` ; do esxcli storage nmp device set --device $i --psp VMW_PSP_RR;done 

When complete, you will find that all of the VMFS Volumes on this ESXi Host have been switched to using the PSP Round Robin!
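One detail of the loop is worth spelling out: because the device line and its :1 partition line both cut down to the same 36 characters, each device is set twice. That is harmless, but the selection logic can be sketched in Python to show a de-duplicated version. The function names are hypothetical, and the device IDs in the usage example are made up; the only assumption carried over from the loop is the naa.600 prefix.

```python
def lun_devices(disk_listing, prefix="naa.600"):
    """Return unique device names from an `ls /vmfs/devices/disks` listing."""
    devices = []
    for entry in disk_listing.split():
        if not entry.startswith(prefix):
            continue
        # "naa.600...:1" is a partition; keep only the part before the colon.
        device = entry.split(":", 1)[0]
        if device not in devices:
            devices.append(device)
    return devices


def round_robin_commands(devices):
    """Build one esxcli command per device, mirroring the shell loop."""
    return [
        f"esxcli storage nmp device set --device {device} --psp VMW_PSP_RR"
        for device in devices
    ]


if __name__ == "__main__":
    # Made-up 36-character device IDs, each listed with its first partition.
    dev_a = "naa.600" + "a" * 29
    dev_b = "naa.600" + "b" * 29
    listing = "\n".join([dev_a, dev_a + ":1", dev_b, dev_b + ":1"])
    for command in round_robin_commands(lun_devices(listing)):
        print(command)
```

Printing the commands first, rather than executing them, gives you a chance to review exactly which devices would be changed.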

Timekeeping on ESXi

Timekeeping on ESXi Hosts is a particularly important, yet often overlooked or misunderstood topic among vSphere Administrators.

I recall a recent situation where I created an anti-affinity DRS rule (separate virtual machines) for a customer’s domain controllers. Although ESXi time was correctly configured, the firewall had been recently changed and no longer allowed NTP. As it happened, the entire domain was running fine and time was correct before the anti-affinity rule took effect. Unfortunately, as soon as the DC migrated (based on the rule I created), its time was synchronized with the ESXi host it was moved to, which was approximately 7 minutes slow! The net result was users immediately experienced log-in issues.

Unfortunately, when you configure time on your ESXi Host, there is no affirmative confirmation that the NTP servers you specified are reachable or valid! It doesn’t matter if you add correct NTP servers, or completely bogus addresses to the Time Configuration; the result is that the ESXi will report that the NTP client is running and seemingly in good health! Moreover, there is no warning or alarm when NTP cannot sync with the specified server.

Let’s create an example where we add three bogus NTP servers:

In this example, you can see the three bogus NTP servers, yet the vSphere Client reports that the NTP Client is running and there were no errors!

The only way to tell if your NTP servers are valid and/or functioning is to access the shell of your ESXi host (SSH or Console) and run the command: ntpq -p 127.0.0.1

[Screenshot: ntpq -p output with the three bogus NTP servers]

The result from ntpq -p demonstrates that *.mary.little.lamb is not an NTP server.

Now, let’s try using three valid NTP servers:

In this example, I have used us.pool.ntp.org to point to three valid NTP servers outside my network and the result (as seen from the vSphere Client) is exactly the same as when we used three bogus servers!

[Screenshot: ntpq -p output with three valid NTP servers that cannot be polled]

The result from ntpq -p demonstrates that there are three valid NTP servers resolvable by DNS (we used pool.ntp.org), but that the ESXi host has not been able to poll them. This is what you see when the firewall is blocking traffic on port 123!

Additionally, when firewall rules change, preventing access to NTP, the ‘when’ column will show a value (sometimes in days!) much larger than the poll interval!

When an ESXi host is correctly configured with valid NTP servers and it is actually getting time from those servers, the result from ntpq -p will look like this:

[Screenshot: ntpq -p output from a correctly synchronized ESXi host]

Here you see the following values:

remote Hostname or IP of the NTP server this ESXi host is actually using.
refid Identification of the time stream.

  • INIT means the ESXi host has not yet received a response
  • CDMA means that the time stream is coming from a cellular network.
st Stratum
t Type of association (u for unicast, b for broadcast, m for multicast, l for a local reference clock)
when Time (in seconds, unless suffixed with m, h or d) since the NTP server was last successfully queried. This is the important value: when the ‘when’ value is larger than the “poll” field, NTP is not working!
poll Poll interval (in seconds)
reach An 8-bit shift register in octal (base 8), with each bit representing success (1) or failure (0) in contacting the configured NTP server. A value of 377 is ideal, representing success in the last 8 attempts to query the NTP server.
delay Round trip (in milliseconds) to the NTP Server
offset Difference (in milliseconds) between the actual time on the ESXi host and the reported time from the NTP server.
jitter The observed variance in the response from the NTP server. Lower values are better.
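To make the ‘when’, ‘poll’ and ‘reach’ columns concrete, here is a minimal Python sketch that decodes one peer line of ntpq -p output. It assumes the usual layout (a one-character tally code in the first column, then whitespace-separated fields) and applies the rule of thumb above that a ‘when’ larger than ‘poll’ means trouble. The function names and the sample lines are my own illustration, not real output.

```python
UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400}


def when_seconds(when):
    """Convert the 'when' column to seconds; '-' means the peer never answered."""
    if when == "-":
        return None
    if when[-1] in UNIT_SECONDS:
        return int(when[:-1]) * UNIT_SECONDS[when[-1]]
    return int(when)


def reach_history(reach):
    """Decode the octal 'reach' register; index 0 is the most recent poll."""
    value = int(str(reach), 8) & 0xFF
    return [bool((value >> bit) & 1) for bit in range(8)]


def peer_status(line):
    """Summarize one peer line of `ntpq -p` output (not the two header lines)."""
    # Column 0 is the tally code (' ', '*', '+', '-', ...); skip it before splitting.
    remote, refid, st, t, when, poll, reach = line[1:].split()[:7]
    age = when_seconds(when)
    return {
        "remote": remote,
        "successes": sum(reach_history(reach)),
        "stale": age is None or age > int(poll),
    }


# Made-up sample lines for illustration only.
SAMPLE = [
    "*ntp1.example.org .GPS.   1 u   33   64  377   12.345    0.123   0.456",
    " ntp2.example.org .INIT. 16 u    -   64    0    0.000    0.000   0.000",
]

if __name__ == "__main__":
    for line in SAMPLE:
        print(peer_status(line))
```

A reach of 377 decodes to eight successes, while a peer stuck in INIT shows zero successes and is flagged as stale.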

The NIST publishes a list of valid NTP IP addresses and Hostnames, but I prefer to use pool.ntp.org in all situations where the ESXi Host can be permitted access to an NTP server on port 123. The advantage to pool.ntp.org is that it changes dynamically with availability and usability of NTP servers. Theoretically, pool.ntp.org is a set-and-forget kind of thing!

ESXi Time Best Practices

Do not use a VM (such as a Domain Controller) that could potentially be hosted by this ESXi Host as a time-source.

Use only Stratum 1 or Stratum 2 NTP Servers

Verify NTP Functionality with: ntpq -p 127.0.0.1

VMs which are already timeservers (such as Domain Controllers) should use either native time services such as w32time or VMware Tools time synchronization, not both! See: VMware KB 1318

SSL3 GET RECORD

I was doing an in-place upgrade of a vCenter 5.5 to vCenter 6 (Windows), when I encountered an unusual error, that didn’t seem to have a relevant KB article or much other information. This was most definitely NOT a database incompatibility, as was indicated by the second error.

It turns out that the solution is buried in the vCenter 6.0 U1b Release Notes

On Windows OS:

Open file C:\ProgramData\VMware\CIS\runtime\VMwareSTS\Conf\Server.xml.

Remove the tag sslEnabledProtocols="TLSv1,TLSv1.1,TLSv1.2" from the below line in the server.xml file: <Connector SSLEnabled="true" sslEnabledProtocols="TLSv1,TLSv1.1,TLSv1.2"

Restart VMwareSTS and VMwareIdentityMgmtService services.

Start the SSO service.

Of course, in true VMware style, you will find no services with the names “VMwareSTS,” “VMwareIdentityMgmtService,” and/or “SSO,” so after you edit the file, restart all VMware and VirtualCenter services.

Here are the steps:

Edit the file: C:\ProgramData\VMware\CIS\runtime\VMwareSTS\Conf\Server.xml

 

Remove the text: sslEnabledProtocols="TLSv1,TLSv1.1,TLSv1.2"

The file should look like this now:

Now restart all of the VMware and VirtualCenter services.
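If you would rather script the edit than make it by hand, the following is a minimal Python sketch of the same change. It assumes Python is available on the Windows server and that the file is UTF-8 text; the function names are hypothetical. It removes the sslEnabledProtocols attribute wherever it appears and writes a .bak copy before touching the file. You still need to restart the services afterwards.

```python
import re
import shutil

# Matches the attribute together with the whitespace that precedes it.
_ATTR = re.compile(r'\s+sslEnabledProtocols\s*=\s*"[^"]*"')


def strip_ssl_enabled_protocols(xml_text):
    """Return the text with every sslEnabledProtocols="..." attribute removed."""
    return _ATTR.sub("", xml_text)


def patch_file(path):
    """Remove the attribute from the file at `path`; return True if it changed."""
    with open(path, encoding="utf-8") as handle:
        original = handle.read()
    patched = strip_ssl_enabled_protocols(original)
    if patched == original:
        return False  # nothing to do, so no backup is written
    shutil.copy2(path, path + ".bak")  # keep the untouched original
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(patched)
    return True


if __name__ == "__main__":
    line = '<Connector SSLEnabled="true" sslEnabledProtocols="TLSv1,TLSv1.1,TLSv1.2"'
    print(strip_ssl_enabled_protocols(line))
```

Calling patch_file a second time returns False and leaves the backup alone, so the script is safe to re-run.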

Data Center Etiquette

For the last 10 to 15 years, I have been in and out of Data Centers of all types. I’ve also had the privilege of working in many different capacities, from corporate IT pawn, to the guy running the project. During this time I have had the privilege of working with and learning from some wise old hands, but I’ve also witnessed colossal screw-ups on the part of uninformed Newbies!

I have recently been on an extended 3-city trip, working in and on large corporate Data Centers. I’m talking about million-plus-square-foot facilities around Kings Mountain and Maiden, North Carolina, highly-sensitive financial systems in Manhattan, and big-name IT in Silicon Valley. There is a particular etiquette to working in these massive corporate facilities; some of it written, some not. I thought I would collect my notes and present them here. Hopefully, this document will evolve and grow over time and with contribution.

Gaining access to the site/facility

Be expected or make an appointment

Even if the facility is a hosting or co-location facility and you are the client, simply showing up at the door is probably not going to work. You may need to be accompanied to and from your work area, or you may need to be accompanied the entire time you are in the facility. Either way, you need to make an appointment so the facility can schedule adequate resources to address your needs.

Be on-time

This should be self-explanatory. Never be late, but also don’t be early by more than 10 or 15 minutes. This is especially true if you are a contractor (as I often am) and/or working on a construction site; just wait in the car until it is your time.

Bring proper ID

Generally one form of government-issued picture ID will be acceptable. Some sites may issue their own ID/Access card, which you will be expected to provide on future visits.

Be prepared to sit orientation training

Hosting and colocation facilities generally do not require any sort of orientation prior to granting access. Corporate facilities, however, may require you to sit in orientation/security/safety training, before allowing you to access the site. These training “sessions” generally last 2-4 hours, but can take up most of a full day depending on the availability of Human Resources personnel to deliver the training.

If you ask in advance about the protocol required to access the site, you can be prepared to budget the necessary additional time in your work schedule.

Know the hours

Don’t get caught in a facility after-hours, or make people wait around on you. Some of the individuals who will be tasked with your visit may work for entirely different departments, or even different companies. Your decision to get “just one more thing done” might end up costing a great deal of money or making you very unpopular with management by pushing someone else into overtime! Don’t even be 5 minutes late leaving the facility if you have been given a hard-stop time!

Once you are in the facility

Don’t use anything that isn’t specifically earmarked for your use

What I am talking about can be as innocuous as a table or chair, or liability-inducing as a ladder. If you are working in a room and there are 3 ladders crowding the floor space, there’s a reason: one ladder belongs to the electricians, one ladder belongs to the HVAC team and one ladder belongs to the guys pulling copper and fiber through the facility! Chances are, you aren’t any one of these so don’t use their ladders under any circumstances!

Assume everything is being video-recorded

Any SSAE 16 audited facility will have recorded video surveillance throughout the facility. It’s not simply that you need to act professionally, but you need to mind your business and your business only! Don’t go looking around, peeking into server racks/cabinets that you haven’t been tasked with, or wander around. If you are observed looking or walking in places not specifically designated for you, the best case result is you will not be asked back on site; the worst case is you will be shown the door!

Assume all Internet traffic is being captured

If you are working as a customer in a colocation facility, your network is your network. If you have the authority to do so, you can ping, scan and trace to your heart’s content.

If you are working as a contractor or employee on a corporate network, you must assume that all traffic is being monitored. First and foremost, don’t take your personal laptop with you if it has anything loaded on it that might cause unwanted/prohibited traffic. Peer-to-Peer (BitTorrent), gaming client, and remote access software are certain to be not only prohibited, but also easily detectable.

Diagnostic scanning and packet-capture (even done for legitimate diagnostic purposes) should only be done with the proper authorization and notice!

Special considerations for construction sites

When the Data Center you will be working on or in is still under construction, an entirely different set of rules applies, over and above the generalities expressed above.

Be prepared with PPE (Personal Protective Equipment)

  • Safety glasses
  • Hard Hat
  • Steel-toe shoes with ankle protection
  • Safety jacket or vest

Some contractors may have loaner PPE, but you will be imposing and end up looking like a doofus. Bring your own PPE whenever possible.

Wear your PPE at all times, unless you are specifically allowed to remove part or all of it in areas of a facility. A good example from experience; hard-hats don’t fit well inside server-racks. Make sure to get permission to remove your hard-hat, prior to removing it and leaning into a rack!

Be ready to take a drug-test

Be ready to take a drug-test on-site or immediately in advance of accessing the site. Contractors will often require their own drug-testing protocol and provider, over and above any pre-employment or random policy your employer may already have in place.

Sign in and out daily

Even if the opportunity to circumvent security presents itself, be sure to sign in and especially out! Signing out of a job-site releases both you and the General Contractor of liability for things that happen when you aren’t there.

Equipment

If you’re going to be working IN the Data Center, then your work probably isn’t 100% logical. It’s best to be prepared for all eventualities, because your work will probably involve “touching” some form of equipment. A good Data Center kit will allow you to do your work, and address eventualities that may carry well beyond what you intended to do.

Tools and Equipment

  • Gigabit Ethernet port. My laptop doesn’t have a built-in Ethernet port, so I carry a USB to Dual-Gigabit Ethernet adapter.
  • DB-9 Serial port. Unfortunately these 1980’s-technology ports are still all over the place in the Data Center, and you may need to connect to one at any given time. Most computers of the last 10 years no longer have these obsolete ports, so I carry a USB to DB-9 Serial Adapter. You’ll also want:
    • USB Extension
    • Null-modem cable
    • DB-9 Gender-changer
  • External DVD-RW Drive
  • Quality computer-connectable label maker
  • RJ45 Cable-crimper
  • Cable tester, preferably the kind that not only tests continuity but also qualifies speed (CAT5, CAT6, etc.)
  • Scissors
  • Clamshell tool kit with screwdrivers, Allen wrenches, side cutters
  • Needle-nose pliers
  • Side-cutters

Supplies

  • Cable ties and/or Velcro
  • Flexible label tape
  • DVD-R Media
  • RJ45 ends
  • Colored vinyl tape (Red, Blue, Green, White)

Binaries

Believe it or not, many Data Centers either do not have Internet Access, or only highly restricted access. You should bring with you all of the binaries you intend to use to complete your work, plus a standard set of utilities, just in case. I can’t emphasize enough how important it is to download these utilities from trusted sources in advance of the job and then scan them with up-to-date enterprise antivirus software. Moreover, check with network administration prior to running a utility that creates any sort of broadcast traffic (like an IP or port scanner), or you will likely be shown the door in short order!

  • Wireshark
  • IP and Port Scanners. Sometimes, known good IP scanners are flagged by firewalls and antivirus software as malicious because they could be used by hackers – get better antivirus software if this happens, because these tools are legitimate and necessary in the right hands.
  • GParted partition manager
  • PuTTY or a trusted Terminal/SSH client. You may have to emulate VT-100.
  • Have Telnet installed on your workstation – yes another 1980’s technology that remains prevalent in today’s Data Centers!
  • Windows source files for any Windows OS you will be touching. You may have to install .NET Framework or similar while offline.

Staying in touch

Some Data Centers are built with the properties of a Faraday Cage, or simply have no cell-phone signal inside. Yet many of these same facilities have a “Guest” WiFi network inside. It makes perfect sense; they get to control and statefully inspect all traffic in or out!

Since we have all become totally dependent on constant communication, think about how you will stay in-touch while in a Wi-Fi equipped facility with no cell-phone signal.

  • I have CSipSimple (VoIP) installed on my cell-phone and can connect to the company PBX anywhere there is a reasonable WiFi connection (although Verizon blocks SIP port 5060 over mobile networks)
  • There are dedicated wired and WiFi VoIP phones available
  • There are VoIP clients available for your laptop computer
  • You can use GoToMeeting or similar technology
  • A good noise-cancelling headset with microphone. Data Centers are noisy places, so test your headset in advance at the noisiest location you can find. Remember, it’s not just that you need to hear the other person, they need to be able to hear and understand you as well.