Thursday, May 18, 2017

VM Performance Issue Troubleshooting

Always Remove Unnecessary Hardware on a VM after Doing P2V


One day, a customer of mine called asking for advice about an application that his team had just converted from physical to virtual (P2V) and that was now having a performance issue. As usual, my response was to understand the issue and the application itself and come up with some suggestions, including asking them to raise a support request (SR) with VMware Support. A couple of days passed without a resolution, so I decided to visit the customer to see whether I could help. This post documents the process I followed, which ended in resolving the issue, and points out one important post-P2V step that tends to be missed.

What's the issue? On what grounds does the user say the application is slow?


This is the first thing I try to understand when facing a performance issue. Can the user really quantify the slowness, or is it just based on feeling? In this case, the performance issue was quite clear. They showed me that running one process took about 10 seconds in the P2V'd application, whereas on physical it normally took only 3 seconds. OK, now I knew what to expect. My goal was to get those 3 seconds back.
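As a rough illustration of quantifying slowness rather than relying on feeling, here is a minimal Python sketch. The helper names `median_runtime` and `slowdown` are my own and not part of any VMware tool; the only real numbers are the 10 and 3 seconds quoted above.

```python
import statistics
import time


def median_runtime(task, runs=5):
    """Run `task` several times and return the median wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def slowdown(measured_s, baseline_s):
    """How many times slower the measured run is than the baseline."""
    if baseline_s <= 0:
        raise ValueError("baseline_s must be positive")
    return measured_s / baseline_s


# Figures from the customer: about 10 s after P2V, about 3 s on physical.
print(f"P2V is {slowdown(10, 3):.1f}x slower than physical")
```

Taking the median of several runs keeps one unusually slow or fast run from distorting the comparison.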

What's the application? What's the architecture? How do users access it?


Next is to know what the application is. We might want to check whether any documented best practices are available for that application. One place to check is the Virtualizing Business Critical Applications page on the VMware website, or just search Google with the keywords "application name on VMware best practices". If a guide exists, you can use it later to check whether the VM and application are already configured according to the best practices.

Thursday, May 4, 2017

My VCAP6-DCV Deployment Exam Preparation

Last Friday, I finally sat my VCAP6-DCV Deployment Exam after around 6 weeks of intensive preparation. The preparation itself was really exhausting. It took a lot of determination to keep going with my study plan, and that included sacrificing a lot of precious time with my loved ones during weekends and public holidays (somehow there were so many public holidays in Indonesia in the last 6 weeks), and I don't know how many glasses of coffee it took to keep my eyes and brain active enough to absorb all the information. My study plan was simple: the VCAP6-DCV Deploy Exam has 26 objectives across 8 sections, so every day I tried to cover all the skills and abilities required for one objective. Often, though, I missed the time target because the objective was too extensive to cover in a day, or because I was busy with work. I started with the blueprint sections I thought I was strong in, which are storage, network, availability and scalability, and performance, and then continued to the rest of the blueprint sections.

Here are some sources that helped me prepare for the exam:
  • Official Exam Blueprint: https://mylearn.vmware.com/mgrReg/plan.cfm?plan=88753&ui=www_cert
    • Officially only available online; a PDF version (as in the VCAP5 days) is no longer available.
    • The blueprint lists all the official documents you need in order to cover the required skills and abilities.
      • Read each skill and ability under each objective, find the related theory in the manuals, and then practice the steps required to implement that objective.
  • VCAP6-DCV Deployment Study Group: https://plus.google.com/u/0/communities/108002000588564278612
    • Check this Google+ group for other test takers' experiences in preparing for the exam.
  • Notable Study Guides (To Kyle Jenner, Mordi Shushan, and Ramy Mahmoud, who dedicated their time to writing these great study guides: your work is awesome! Much appreciated.)
  • VMware Hands-on Lab: http://labs.hol.vmware.com.
    • One of the key things for this exam is hands-on practice, so having access to a lab is really crucial. I have a personal lab, built as nested ESXi using VMware Fusion on two MacBook Pros, but I also used the VMware Hands-on Labs extensively.
      • FYI, my HOL account transcript records that I completed 25 labs during my exam preparation period. I took some labs a couple of times, since the same lab can be used to cover several objectives.

Monday, November 28, 2016

vSphere (and Some Other Products) Upgrade Notes



Recently, a lot of my customers have been planning, doing, or have just finished a vSphere upgrade, mostly because vSphere 5.1 reached the end of general support on 24 August 2016. Technical guidance will still be provided for vSphere 5.1 until 24 August 2018 (for the complete list of important dates in your product's support phases, please check this VMware product lifecycle matrix), but please note that no more security patches or bug fixes will be released for vSphere 5.1, unless stated otherwise. In addition, during the technical guidance phase, support requests are only handled for low-severity issues on supported configurations, as stated in the VMware lifecycle policies. These are my personal notes on some information that can help in planning a VMware environment upgrade.

Monday, November 14, 2016

Why Is the Guest OS Task Manager Showing a Different Value Compared to the vSphere Performance Monitor?

Demystifying CPU States in vCPU World


Have you experienced a situation where your guest OS task manager shows a different value compared to the vSphere performance monitor? Or have you received a request for additional vCPUs from the application team that uses your VM because they see the VM utilizing almost all of its vCPUs, but when you check that VM's vCPU usage in the vSphere Web Client, it shows only low utilization? Is there something wrong? Before you conclude that there's something wrong with the vSphere performance monitor, read this article to understand what's causing that situation.

Figure 1. Windows task manager shows ~100% CPU Utilization
Before going further, let me first describe the situation more clearly. Figures 1 and 2 come from the same virtual machine, perf-worker-01b. The first figure shows Windows Task Manager, where CPU utilization is hitting 100% for most of the last 4 minutes. The second figure shows the vSphere performance monitor, captured at about the same time as Figure 1, and it reveals that for the last 4 minutes VM CPU usage is only around 50%. FYI, I actually ran a CPU benchmark tool on perf-worker-01b for about 30 minutes, and for most of that period Windows Task Manager showed 100% CPU utilization while the vSphere performance monitor showed around 50% CPU usage. Why did the vSphere performance monitor show only 50% CPU usage when Windows Task Manager showed ~100%?

Monday, November 7, 2016

Quickly Identify Whether My Virtual Machines Get All CPU Resources They Need

Background

One of the capabilities brought by virtualization is the ability to run several virtual machines on one physical machine. This ability may lead to something called over-provisioning, where we provision more resources to VMs than we have in the physical layer. For instance, we can create 20 VMs with 4 vCPUs each, for a total of 80 vCPUs provisioned, while the server we use has only 20 CPU cores. Wait.. wait.... If we only have 20, how can we give 80? How can we give more than what we actually have? The answer is one of the reasons virtualization rose in the first place: most of our servers have, on average, low CPU utilization, and each server experiences its peak and low utilization at different times. VMware vSphere manages how VMs take turns using physical CPU resources in an efficient and fair manner through a component called the CPU Scheduler. In simple words, the CPU Scheduler is like a traffic light. It rules who may go, or in this case who may use the physical CPU resources, and who needs to stop and wait. More about the CPU Scheduler can be found in this CPU Scheduler Technical Whitepaper.
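To make the arithmetic in the example concrete, here is a minimal Python sketch that computes the vCPU-to-core ratio. The function name `overcommit_ratio` is my own invention for illustration and is not a vSphere API; the numbers are the ones from the example above.

```python
def overcommit_ratio(vm_count, vcpus_per_vm, physical_cores):
    """Return provisioned vCPUs divided by physical CPU cores."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return (vm_count * vcpus_per_vm) / physical_cores


# Example from the text: 20 VMs with 4 vCPUs each on a 20-core server.
total_vcpus = 20 * 4
ratio = overcommit_ratio(20, 4, 20)
print(f"{total_vcpus} vCPUs on 20 cores -> {ratio:.0f}:1 overcommit")
```

A ratio above 1 only says that queueing is possible. Whether VMs actually wait depends on how many vCPUs want to run at the same time.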

Using the traffic light analogy, we know that the number of cars that can go at one time is defined by the number of lanes available. If the road has 4 lanes, then a maximum of 4 cars can pass at the same time, and the other cars queue behind the first row. This is also true in virtualization: even though we can over-provision, what the CPU scheduler can schedule at one time is limited by how many logical CPUs are available on the physical server. This means that if, at any one time, several VMs whose total vCPUs exceed the available logical CPUs all ask for their share of the logical CPUs, then some of those VMs will need to queue. Having to queue means it takes longer for a VM to finish its job. Now the challenge is how to identify this queue, and furthermore how to keep that queue within an acceptable timeframe. This article will try to answer the first part, while the latter will be discussed in a future article.
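The lane analogy can be sketched as a toy simulation in Python. This is only a simple round-robin model that I made up for illustration; the real ESXi CPU scheduler is far more sophisticated, and the function name `simulate_queue` and the notion of a "tick" are assumptions, not vSphere terms.

```python
from collections import deque


def simulate_queue(runnable_vcpus, logical_cpus, work_ticks):
    """Toy round-robin model: return the ticks each vCPU spent waiting.

    Every vCPU needs `work_ticks` ticks of CPU time, and at most
    `logical_cpus` vCPUs can run during any one tick.
    """
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    remaining = [work_ticks] * runnable_vcpus
    waited = [0] * runnable_vcpus
    queue = deque(range(runnable_vcpus))
    while queue:
        # The vCPUs at the front of the queue get a "lane" for this tick.
        lanes = min(logical_cpus, len(queue))
        running = [queue.popleft() for _ in range(lanes)]
        # Everyone still in the queue waits for this tick.
        for vcpu in queue:
            waited[vcpu] += 1
        # vCPUs with work left go to the back of the queue.
        for vcpu in running:
            remaining[vcpu] -= 1
            if remaining[vcpu] > 0:
                queue.append(vcpu)
    return waited


# 4 lanes, 4 cars: nobody waits. 4 lanes, 8 cars: everybody waits.
print(simulate_queue(4, 4, 3))
print(simulate_queue(8, 4, 3))
```

In this model, doubling the runnable vCPUs on the same number of logical CPUs doubles the time the last vCPU needs to finish, which is the "takes longer to finish its job" effect described above.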

Monday, October 24, 2016

VMware Virtual Machine Virtual Disk Security


Last week, during VMworld 2016 Europe, VMware announced vSphere 6.5, the latest release of vSphere. You can read about this, among other announcements, in this press release. One area of improvement in vSphere 6.5 is virtual infrastructure security, which you can read about here. What interests me among the new security features is VM encryption, as some customers I have met asked about this capability. So I dug out an old post, originally personal notes I wrote back in 2014 about some points of discussion regarding virtual disk security, and modified it to be relevant to the recent announcement.

OK, let's understand the problem first. Remember one of the characteristics of virtualisation? Encapsulation. In other words, a VM is basically just a set of files. If those files happen to walk out the door, then people can mount them, extract the files/information, or even get the VM up and running. Check this article if you want to get an idea of how that could be done.

You might say that if that situation happens, the company is not applying a good security policy, and if that is the case, anything can happen, even in the non-virtualised world. Well, you've got that right, but let's see what we can do to prevent that situation, how VMware caters for it, and how VMware can make sure that if a virtual disk leaks, the person who has it cannot take advantage of it.

Monday, October 17, 2016

[lunar.lab] Build My Lab Network Using VyOS

I am trying to build my own lab. The idea is to have three "virtual datacenters" as described in the following figure. Datacenter A and datacenter B will be two independent datacenters, between which I can later simulate DR failover, workload mobility, stretched networks, etc. Each datacenter will have its own ESXi hosts and vCenter. Datacenter C is where I keep shared services that are required by datacenter A or B but are not relevant to the tests I want to perform. In addition, datacenter C will host some workloads that mimic users accessing workloads in datacenter A or B. Each datacenter will have its own router, and dynamic routing should be configured between the 3 datacenters, as later I want to explore NSX multi-site capabilities. You can see the networks and addresses that I plan to use in the following figure.