Posts Tagged ‘esxi’

Running VMware vSphere ESXi 5.0 on the HP Proliant Microserver

VMware

Those of you who have bought an HP Proliant Microserver for your work or home VMware vSphere lab are probably wondering whether it will work with VMware vSphere 5.0.  Well, the good news is that it does, and that even the installation process goes through without a hitch, both to a local internal USB pen drive and to local disk.  I have tested this with the final RTM version of vSphere ESXi 5.0, not just the beta builds, and can confirm that the CPU, storage controller, memory and network card are detected without a problem, meaning that you’re all set to go for when VMware make the VMware vSphere ESXi 5.0 download available sometime soon.

With the free downloadable version of ESXi 5.0 the amount of physical memory accessible in the host has been reduced to 8GB, so this, combined with the fact that the HP Microserver can only take 8GB anyway, won’t leave you feeling like you’ve wasted your money on adding extra memory.  Those of you who, like me, also have an HP Proliant ML110 G6 with 16GB will have to either use only 8GB of its memory, keep installing the eval license on a regular basis (too much hassle) or purchase an entry level vSphere license to access the full 16GB of memory in the server.  Either way, the exciting new features (see my post here for more details) found in VMware vSphere 5.0 are too much of a temptation to leave my vSphere lab servers at vSphere 4.1.

 

HP Proliant Microserver & VMware vSphere 5 - Summary

VMware vSphere and HP Proliant - CPU

HP Proliant and VMware vSphere 5 - Memory

HP Proliant Microserver & VMware vSphere 5 - Storage Adapters

HP Proliant Microserver & VMware vSphere 5 - Network Adapters

VMware ESXi vSwapping with SandForce SSDs

VMware

If, like me (James), your lab server is struggling along maxed out with only 8GB of RAM, disk IO can be a real problem because of guest paging and vSwapping.

I should really upgrade the box, but that means ditching the trusty ML115 – and in any case, the whole point of ESX is to do more with less.  There are also still many machines appearing with 8GB max RAM capacity, such as HP’s new microserver.  So I’ve looked at other options.

Memory Over-commitment

ESXi 4 achieves RAM over-commitment with page sharing, ballooning and vSwapping, and v4.1 adds compression.  Yet whenever my box is over-committed to any serious extent, my mediawiki VM for example takes minutes to spew out a page, and any audio streaming just stops.

The disks are the issue – a 4-drive RAID-10 volume for everything, provided by a Perc 5i RAID controller.  A pretty harsh test is firing up a 3GB VM with the box already running at 90% RAM – ballooning requests guest level paging and the SATA array clatters away at something like 500 IOPS to service it.  With vSwapping doing more of the same, the controller queue depth of 128 commands pushes latency to a whopping 250ms, with one inevitable consequence – everything pretty much grinds to a halt.
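The 250ms figure is roughly what Little's law predicts when the controller queue is kept full: average latency is the number of outstanding commands divided by the throughput. A minimal sketch using the figures quoted above (the function name is only for illustration):

```python
def queued_latency_ms(queue_depth: int, iops: float) -> float:
    """Average time a command spends queued when the queue stays full (Little's law)."""
    return queue_depth / iops * 1000.0

# Figures from the text: 128 outstanding commands, roughly 500 IOPS from the array.
latency = queued_latency_ms(128, 500)
print(f"{latency:.0f} ms")  # 256 ms
```

The same formula shows why a faster device helps so much: at the same queue depth, a tenfold increase in IOPS cuts the queued latency tenfold.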

SandForce SSDs

I’ve had an eye on SSDs since reading this VMware communities article on vSwapping to SSD at the start of the year.  The concept is simple enough – use an SSD for vSwapping, which can respond 50 times quicker than a mechanical disk.  The problem is that SSDs have been pretty expensive and many of the more affordable drives have awful 4K random write performance (slower than a mechanical disk in some cases).

Simon wrote on SSDs earlier this year, which inspired me to look again – and led me to the line of drives appearing with SandForce controllers, one of which I’ve been testing in my ML115.

Running an OCZ Vertex II in the ML115 G5

At time of writing there is a compatibility issue with the SandForce 1200 controller and the nVidia MCP55 SATA controller used in the ML115 G5.  The drive is detected in the BIOS, but ESXi doesn’t see it so I’ve hooked it up to the Perc 5i instead.  My box is pretty full so I’ve made do with slotting the SSD into one of the PCI card supports with a screw – hardly ideal, but the box isn’t going anywhere.

SSDs on Dell Perc RAID Controllers

The 5i is an old design and lacks SATA NCQ.  The controller works fine with the SSD, but performance is sub-optimal (about 7,000 4K random IOPS) so I swapped out the 5i for a 6i since it’s a drop-in replacement – it’s a bit faster, uses less power and has SSD and SATA NCQ support.  Note that in the ML115 G5, the on-board SATA ports need to be disabled to get into the card’s <CTRL-R> BIOS utility.

NCQ enables the controller to pass multiple commands to the SSD, making use of internal parallelism in the drive and so boosting throughput.  The 4K IOPS test (50% write) jumped to 14,700 on the 6i – quite simply stunning compared to mechanical storage and just what’s needed for vSwapping!
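To put the IOPS figures in more familiar terms, they can be converted to throughput by multiplying by the 4K block size. A small sketch, assuming 4K means 4,096 bytes and reporting decimal megabytes:

```python
BLOCK_SIZE = 4096  # bytes in one 4K IO (assumed)

def iops_to_mb_per_s(iops: float, block_size: int = BLOCK_SIZE) -> float:
    """Convert an IOPS figure to decimal megabytes per second."""
    return iops * block_size / 1_000_000

perc5i = iops_to_mb_per_s(7_000)    # Perc 5i, no NCQ
perc6i = iops_to_mb_per_s(14_700)   # Perc 6i with NCQ
print(f"5i: {perc5i:.1f} MB/s, 6i: {perc6i:.1f} MB/s")  # 5i: 28.7 MB/s, 6i: 60.2 MB/s
```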

NCQ also gives the SATA RAID-10 array a 30% speed boost.  The 6i struggles through with the SandForce SSD, with sequential write levelling out at about 60MB/s.  Neither OCZ nor Dell could help unfortunately but it’s not of concern for my random IO use anyway.

Configuring vSwapping to run on SSD

With the SSD working OK on the 6i, a datastore can be created in the usual way and then used for vSwapping (see ‘Virtual Machine Swapfile Location’ page in the vSphere client). VMs then need to be suspended and resumed for the change to take effect.

vSwapping is a very blunt stick though – ballooning generates less disk IO and usually has less impact on VMs, because the VM’s own memory management is more intelligent.  It knows about areas that shouldn’t be paged and RAM that hasn’t been used recently, whereas vSwapping just chooses a bunch of pages at random and swaps them out (and stalls the VM while it does so).

So it seems to me that guest swap space should also be on the SSD, by adding a thick-provisioned disk to each VM on the SSD, creating a 64K aligned partition on the disk in the VM, and finally moving the swap file onto that drive (see here for Linux info).  The downside is the amount of space required – essentially twice the VM’s RAM allocation.
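The alignment and space points can be checked with a little arithmetic. The sketch below assumes 512-byte logical sectors and assumes the guest swap disk is sized the same as the VM's RAM, which is how the "twice the RAM allocation" figure comes about; both function names are made up for the example.

```python
ALIGNMENT = 64 * 1024  # the 64K boundary suggested in the text
SECTOR_SIZE = 512      # assumed logical sector size

def is_aligned(start_sector: int, sector_size: int = SECTOR_SIZE) -> bool:
    """True if the partition's starting byte offset falls on a 64K boundary."""
    return (start_sector * sector_size) % ALIGNMENT == 0

def ssd_space_needed_gb(vm_ram_gb: int) -> int:
    """Rough SSD budget per VM: a RAM-sized vSwap file plus a RAM-sized guest swap disk."""
    return 2 * vm_ram_gb

print(is_aligned(63))          # False: the legacy 63-sector starting offset
print(is_aligned(128))         # True: 128 * 512 = 65536 bytes
print(ssd_space_needed_gb(3))  # 6
```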

Does the SSD Deliver?

It’s already been demonstrated that SQL-Server performs quite well, but that was in an environment with much more RAM in the first place and an enterprise grade SSD.  My testing is less scientific, but here’s what I’ve found.

My test box typically runs with about 7GB used, so generating memory over-commitment isn’t hard – just starting a vSphere lab does the trick: two ESXi VMs, vCenter Server, a Windows domain controller and a virtual router together take the RAM load to about 15GB.  Ramping that up in a short space of time is a harsh test, and without the SSD it completely stalled the box – RDP sessions were dropped, audio streaming died and web servers appeared offline.

With the vSwap and guest paging all running from the SSD, the host survives the test with some stuttering on audio streaming.  Response from web servers on a ‘first page out’ basis after the test seems to vary from about nine to about 30 seconds.  An active session to a VM running Photoshop during test was a bit patchy but mostly usable.

Once the system stabilises, however, performance of everything seems pretty OK, even with the RAM initially truly maxed.

Being a home lab server, it hasn’t got hundreds of users pounding every VM so the system hangs together pretty well.  Essentially I guess that provided the active memory is comfortably within the physical RAM, performance should hold up.

SSD Swap and RAM Compression

With RAM compression disabled, the swap rate peaked at 36 MB/s with the ongoing rate depending entirely on the load.

I was expecting RAM compression to help a bit, but was surprised by how much.  RAM compression consistently reduced the swap rate by well over half with my ‘keen’ settings, but did seem to slow things down quite a bit too:

mem.MemZipAllocPct – 50
mem.MemZipLowMemMaxSwapOut – 50
mem.MemZipBalloonXferPct – 30
mem.MemZipMaxRejectionPct – 10
mem.MemSwapSkipPct – 75

Tweaking MemZipAllocPct and MemZipLowMemMaxSwapOut to 25% seems to provide a happy balance of swap and RAM compression throughputs.  Reducing the write load on the SSD is a good thing, since SSDs have a limited write cycle life.

With this configuration, creating sudden memory pressure by starting a 3GB VM seems to enable the system to work everything much harder – the SSD peaks at nearly 50MB/s and compression at 30MB/s.  As I write this, a WHS VM is performing de-duplicated backups, I’m installing vCenter, and audio streaming is continuing pretty well.

In the lab environment some attention is also needed to mem.idletax, since it is not always desirable for idle VMs to be more heavily paged.

In Summary

The SandForce SSDs completely change the storage dynamic for small offices and home labs – a single SSD provides three times the random (swap/database) throughput of the EqualLogic PS4000 reviewed on Techhead previously, at 1/200th of the power consumption.

Paging memory to memory is hardly new (remember EMS?), but for ESX, using SSD for swapping enables much higher RAM over-commitment (and hence VM density).  Echoing the earlier VMware communities blog on the subject, vSwapping to SSD is something it seems that VMware should be looking at supporting formally, for example by adding TRIM support in the vSwapping configuration (to maintain the SSD performance) and enabling any queue depth throttling to be overridden for a vSwap datastore.

The quick RAM loading test performed here proves that there is no substitute for real RAM, but with workloads more in line with the box capacity (each with perhaps 10% of physical RAM allocated) everything holds up without a hitch with some serious over-commitment.

I found best performance with the guest paging and vSwapping on the SSD and RAM compression enabled.  The balloon driver was able to ramp up recovered RAM more quickly than vSwap, and VM responsiveness was improved because spinning storage latencies were not then affected by host level swapping by multiple VMs.

The bottom line – adding the SSD has significantly increased the VM capacity of my ML115, but for how long will remain to be seen.

_________________________________________________

About the Author

James Pearce is a regular guest contributor to TechHead and is a Kent based qualified accountant, currently working in information security and technical architecture with most of his time “being spent on virtualisation and business continuity at the moment”. Check out his virtualisation and storage blog here for more interesting and informative posts.

_________________________________________________

 

VMware ESX “I moved it” or “I copied it” – What’s the difference?

VMware

When you copy or move the data store location of an existing VM you will be presented with a message box (as seen below) in the vCenter Client asking if your VM has either been ‘moved’ or ‘copied’.  As you can see the message box also mentions “msg.uuid.altered: This virtual machine may have been moved or copied”, but what does this actually mean?

Figure 1. Has the VM been moved or copied?

What is a VM’s UUID?

Firstly, it is important to have an understanding of what a ‘UUID’ (universally unique identifier) is.  As the name suggests, the UUID is an ‘identifier’ (128 bit integer) which is ‘unique’ to that VM, and effectively gives it a digital fingerprint to differentiate it from other VMs.

The UUID is automatically generated when a VM is first powered on or moved, with the UUID value being based on the physical host’s identifier and also the path to the VM’s configuration (vmx) file.  Within this configuration file the UUID value is stored in two places:

  • uuid.bios
  • uuid.location (hash based on the current path of the VM)

For example: uuid.bios = "56 4d 5e 58 66 f5 2d 04-03 31 0a bd 6f a7 19 88"
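The value in the vmx file is just the 16 bytes of the UUID written out in hex. As an illustration, Python's standard `uuid` module can parse it once the spaces and dash are removed (the helper name `parse_vmx_uuid` is made up for this sketch):

```python
import uuid

def parse_vmx_uuid(value: str) -> uuid.UUID:
    """Turn the space/dash separated hex bytes from a vmx file into a UUID object."""
    hex_digits = value.replace(" ", "").replace("-", "")
    return uuid.UUID(hex=hex_digits)

bios = parse_vmx_uuid("56 4d 5e 58 66 f5 2d 04-03 31 0a bd 6f a7 19 88")
print(bios)                   # 564d5e58-66f5-2d04-0331-0abd6fa71988
print(bios.int.bit_length())  # never more than 128
```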

The UUID is also stored in the SMBIOS system information (ie: the BIOS of the VM) descriptor.  When the VM is started or moved, the location UUID (ie: uuid.location), which is hashed from the VM’s data store path, is compared to the UUID location hash which already exists in the configuration file.  At this point, if the new and existing location UUID values differ, then ESX knows that the VM is now running from a different data store location and will present the ‘Virtual Machine Message’ in figure 1 above.
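The comparison can be sketched as follows. Note that the hash used here (a truncated SHA-1 of the path) is purely a stand-in: the post doesn't say which algorithm ESX uses, and the datastore paths are hypothetical.

```python
import hashlib

def location_hash(vmx_path: str) -> str:
    """Stand-in for ESX's path hash (the real algorithm is not given in the post)."""
    return hashlib.sha1(vmx_path.encode("utf-8")).hexdigest()[:32]

def has_moved_or_copied(stored_location: str, current_vmx_path: str) -> bool:
    """True when the stored uuid.location no longer matches the current path."""
    return stored_location != location_hash(current_vmx_path)

# Hypothetical datastore paths, for illustration only.
stored = location_hash("/vmfs/volumes/datastore1/web01/web01.vmx")
print(has_moved_or_copied(stored, "/vmfs/volumes/datastore1/web01/web01.vmx"))  # False
print(has_moved_or_copied(stored, "/vmfs/volumes/ssd01/web01/web01.vmx"))       # True
```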

But why do we care if the VM has the same or a new UUID? 

We saw in the message above that ESX reports the UUID has in some way been altered, but why does this really matter?  The answer, you’ll be pleased to know, is quite simple.  A VM’s unique UUID is used to generate other unique values used by the VM, such as the unique MAC (media access control) address of the network card(s).  For example, if you had multiple copies of the same VM/Guest OS running in your vSphere environment, all with the same (ie: non-unique) network MAC address, you would likely receive duplicate MAC address error messages within the guest OS, which can cause a number of issues.
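If you suspect a copied VM has kept its original identity, a quick check over an exported list of VM names and MAC addresses will show any clashes. A minimal sketch; the inventory below is invented for the example:

```python
from collections import defaultdict

def find_duplicate_macs(vm_macs: dict[str, str]) -> dict[str, list[str]]:
    """Group VM names by MAC address and return only the addresses used more than once."""
    by_mac = defaultdict(list)
    for vm_name, mac in vm_macs.items():
        by_mac[mac.lower()].append(vm_name)  # compare case-insensitively
    return {mac: names for mac, names in by_mac.items() if len(names) > 1}

# Hypothetical inventory: web01-copy was copied but kept the original MAC.
inventory = {
    "web01": "00:0c:29:a7:19:88",
    "web01-copy": "00:0C:29:A7:19:88",
    "db01": "00:0c:29:3e:51:02",
}
print(find_duplicate_macs(inventory))
```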

Another potential point to be mindful of is that some software licensing can be linked to the MAC address of a guest OS’s network card.  This includes software such as Microsoft Windows, where changing the MAC address and some other key hardware components (eg: moving from an Intel based ESX host to an AMD based ESX host) can mean you have to re-activate the software again.  The changing of a VM’s MAC address will occur when you select “I copied it”; the next couple of sections will go into more detail on what exactly is altered.

 

Should I Select “I Moved It” or “I Copied It”?

So what is the difference between selecting “I moved it” or “I copied it”?  The easiest way to demonstrate the differences is by viewing the configuration file (vmx) for the VM before and after the two different options have been selected.

 

“I Moved It”

By indicating that you have moved the VM (instead of copying it), the only UUID change that is made to the configuration file is to the ‘uuid.location’ setting, which as you’d expect indicates a change of location for the VM. The ‘uuid.bios’ and the existing generated network MAC address remain the same.

You will also notice that the CPUID settings have changed, which is also the case when you indicate that the VM was copied.

The “I Moved It” option should be used when ‘moving’ the location of where a VM resides and a copy of the VM has not been made.

VMware ESX I Moved It

 

“I Copied It”

When you select that the VM has been copied, there are a few more changes made to the VM’s configuration file compared to just moving it.  These changes are to the ‘uuid.bios’, ‘uuid.location’ and, as a result of these changes, a newly generated network MAC address (ethernet.generatedaddress).

The “I Copied It” option should be used when you’ve made, and intend to run, more than one copy of the VM in your vSphere environment.

VMware ESX I Copied It

 

To summarise, here is a table which outlines the changes that are made when either “I Moved It” or “I Copied It” is selected:

Setting                      “I Moved It” (change?)    “I Copied It” (change?)
uuid.bios                    No                        Yes
uuid.location                Yes                       Yes
ethernet.generatedaddress    No                        Yes
guestCPUID.x                 Yes                       Yes
hostCPUID.x                  Yes                       Yes
userCPUID.x                  Yes                       Yes
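The same before-and-after comparison can be done in a few lines of code rather than by eye. This is only a sketch: it assumes the vmx file uses simple `key = "value"` lines, and the "after" UUID values are invented for the example.

```python
def parse_vmx(text: str) -> dict[str, str]:
    """Parse 'key = "value"' lines from a vmx file into a dictionary."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line or line.lstrip().startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings

def classify_change(before: dict[str, str], after: dict[str, str]) -> str:
    """Guess which answer was given from which UUID settings differ."""
    bios_changed = before.get("uuid.bios") != after.get("uuid.bios")
    location_changed = before.get("uuid.location") != after.get("uuid.location")
    if bios_changed and location_changed:
        return "copied"
    if location_changed:
        return "moved"
    return "unchanged"

before = parse_vmx('''
uuid.bios = "56 4d 5e 58 66 f5 2d 04-03 31 0a bd 6f a7 19 88"
uuid.location = "56 4d 5e 58 66 f5 2d 04-03 31 0a bd 6f a7 19 88"
''')
# Invented replacement value, standing in for whatever ESX would write.
new_value = "56 4d 11 22 33 44 55 66-77 88 99 aa bb cc dd ee"
after_move = dict(before, **{"uuid.location": new_value})
after_copy = dict(after_move, **{"uuid.bios": new_value})
print(classify_change(before, after_move))  # moved
print(classify_change(before, after_copy))  # copied
```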

 

As you can see it is worth spending the time to understand the changes which will be made when presented with the “I moved it” or “I copied it” options as it can impact (eg: software re-activation) the guest OS of the VM.

I hope this helps clarify this small aspect of vSphere administration which can sometimes be an area of confusion.

 

How to automatically shut down VMware ESXi gracefully during power failure using an APC UPS.

VMware

In this latest post, TechHead guest contributor James Pearce covers a topic near and dear to many of us – how to get VMware ESX/ESXi and its VMs to shut down gracefully upon power failure to the host.  Tighter integration between a UPS and a VMware ESX/ESXi host is no doubt something that will become more mature over time, though for now it can be an issue for many administrators, especially those running the free version of ESXi.  So read on to find out how James overcame this issue in his virtualization lab.

 

A Nearly Free UPS

I recently acquired an APC Smart UPS for my home lab that was being chucked out from work (having never worked), along with an ancient AP9606 management card. With the batteries changed the UPS burst into life – but after some messing about getting the right firmware on it, I was disappointed to find no easy way to get it to shut down my ESXi box when it needed to.

 

VMware License Restriction

The VMware management appliance (vima) can shut down only paid-for installations of ESXi (using apcupsd and VMware community member lamw’s scripts) – the necessary interfaces on the free version have been made read-only since ESXi v3.5 U3.

 

Burp Suite

I’ve been finding a lot of use recently for network sniffers, so thought I’d have a look at how the VMware vSphere Client works, as obviously that can shut down the host. As luck would have it, the client is nothing more than a glorified web browser with the slight complication that it’s talking over SSL – but that’s no problem for PortSwigger’s Burp suite in its transparent proxy mode.

The traffic captures revealed that only three frames would be needed to perform the shutdown (hello, authenticate, and shutdown). A little manipulation is needed to get the session keys in, but that is basically it. ESXi’s startup and shutdown policy will do the work suspending or shutting down individual VMs, as configured through the vSphere Client.

 

The Script – shutdown.bat

Using this newly found knowledge I’ve created a Windows batch file (with a few supporting text files which are basically HTTP requests) that takes the hostname, username and password as parameters and will then shut down the host cleanly. The script needs something to launch it – APC PowerChute Network Shutdown in my case – and a utility to send the commands over SSL, for which I’ve used Nmap ncat (which just needs to be installed).
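The "manipulation to get the session keys in" step can be pictured as: read the session cookie from one response, then substitute it into the next canned request before sending it. The sketch below is not the batch file from the download; it is a Python illustration under stated assumptions. The cookie name `vmware_soap_session`, the `{{SESSION}}` placeholder and the sample response and request text are all assumptions made for the example.

```python
import re

SESSION_PLACEHOLDER = "{{SESSION}}"  # hypothetical marker used in the request templates

def extract_session_cookie(response: str) -> str:
    """Pull the session cookie value out of a raw HTTP response."""
    match = re.search(
        r'^Set-Cookie:\s*vmware_soap_session=("?)([^";\r\n]+)\1',
        response,
        re.IGNORECASE | re.MULTILINE,
    )
    if match is None:
        raise ValueError("no session cookie in response")
    return match.group(2)

def inject_session(template: str, session: str) -> str:
    """Fill the session value into a canned request before it is sent."""
    return template.replace(SESSION_PLACEHOLDER, session)

# Made-up response and request template, for illustration only.
login_response = (
    "HTTP/1.1 200 OK\r\n"
    'Set-Cookie: vmware_soap_session="52c1f0aa-example"; Path=/\r\n'
    "\r\n"
)
shutdown_template = (
    "POST /sdk HTTP/1.1\r\n"
    'Cookie: vmware_soap_session="{{SESSION}}"\r\n'
    "\r\n"
)
session = extract_session_cookie(login_response)
request = inject_session(shutdown_template, session)
print(request)
```

Keeping the requests as plain text templates mirrors the approach described above, where the supporting files are essentially captured HTTP requests.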

I have put all the necessary script files into a single convenient zipped file which you can download from here – the scripts are fairly well commented so you should be able to follow what is happening.

 

APC PowerChute

A potential issue is that APC’s PowerChute Network Shutdown utility will always shut down the Windows machine it’s running on. I’ve therefore used a separate Windows management VM to host PowerChute and my script, since I wanted everything else just suspended.

PowerChute has an option to ‘run this command’ but it’s limited to 8.3 paths and won’t accept command line parameters. A separate batch file is needed (poweroff.bat) that runs the shutdown script with the parameters – but that could shut down other ESXi boxes as well if required. Also the PowerChute service needs to be run as local Administrator as the default Local System account doesn’t have sufficient rights.
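The wrapper idea is simply "one parameterless entry point that calls the real script once per host". In the download this is a batch file; the Python sketch below shows the same shape, with hypothetical hostnames and credentials and the script path assumed to be c:\scripts\esxi\shutdown.bat.

```python
def build_shutdown_commands(script: str, hosts: list[tuple[str, str, str]]) -> list[list[str]]:
    """Build one argument list per ESXi host: script, hostname, username, password."""
    return [[script, hostname, username, password] for hostname, username, password in hosts]

# Hypothetical hosts and credentials, for illustration only.
hosts = [
    ("esxi01.lab.local", "root", "secret1"),
    ("esxi02.lab.local", "root", "secret2"),
]
commands = build_shutdown_commands(r"c:\scripts\esxi\shutdown.bat", hosts)
for command in commands:
    print(command)
```

Each argument list could then be handed to something like `subprocess.run` in turn, so a single launch from PowerChute covers every host.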

 

Testing the Scripts

Download the ZIP and extract the files – I’ve assumed the package will be extracted to c:\scripts\esxi; update the path in poweroff.bat otherwise. The hostname, username and password also need to be specified in poweroff.bat.

Next install and configure PowerChute (in particular change the service user account) and enter the script in the ‘run this command’ box – I also increased the time allowed, but in practice it runs in a few seconds.

Some waiting around can be avoided when testing by setting the UPS low-battery duration as high as it will go – just remember to change it back.

Next open up vSphere Client from a real machine and pull the UPS plug. Once the battery gets down to the specified number of minutes remaining, the script should run and the tasks will appear in vSphere Client.  Shortly afterwards the VM used to launch the script will itself shut down under the control of PowerChute!

 

In Summary

The complete set of files can be downloaded here, and the Nmap ncat installation for Windows from here. A UPS management application is also needed; for APC Smart UPSs use PowerChute for Windows.

The shutdown script includes logging and should report most errors. Bear in mind though that once a host is shut down, it probably won’t be restarted when utility power is restored.

Burp Suite is a handy utility to bypass device limitations by enabling the scripting of management tasks that are only usually available through a web interface. I’ve used it to build scripts to regularly reboot home-spec routers every couple of weeks to keep them stable, and to set the time on the APC AP9606 management card daily since it doesn’t support NTP – and here to build a UPS shutdown script for ESXi; functionality that should really be built into ESXi in the first place.

 
