Virtualization Realized

March 23, 2012

vSphere 5.0 – Cannot Create Diagnostic Partition

Filed under: Uncategorized — dferguson75 @ 2:43 pm

While I have been working with Auto Deploy to hatch out an army of stateless ESXi hosts, it has come to my attention that the ESXi dump collector functionality is intermittent at best. As a result, we have been forced (sort of) to use local disks to capture PSoD dumps, which works fine. That is, until I had to set up diagnostic partitions on my HP servers. For some odd reason, when I add a local diagnostic partition on my HP hosts, I get the message Call “HostDiagnosticSystem.CreateDiagnosticPartition” for object “diagnosticSystem-60” on vCenter Server “VC5” failed.

So, I log into the host and try it from there, only to get the message Call “HostDiagnosticSystem.CreateDiagnosticPartition” for object “diagnosticsystem” on ESXi “esxiserver” failed.

I see now, it’s not necessarily a vCenter issue or a host issue, per se. So I turn on ESXi shell and dig up the command parameters for partedUtil and manually create the partition on a pair of hosts by entering this really long command (which works):

partedUtil setptbl "/vmfs/devices/disks/naa.6005............................" gpt "1 128 200000 9D27538040AD11DBBF97000C2911D1B8 0"
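
For anyone decoding that quoted string: as far as I understand the setptbl format, the five fields are partition number, start sector, end sector, partition type GUID, and an attribute value. The snippet below is only a rough sanity check of the resulting partition size, and it assumes 512-byte sectors (an assumption on my part, not something the command states).

```shell
#!/bin/sh
# Partition spec copied verbatim from the partedUtil command above.
spec="1 128 200000 9D27538040AD11DBBF97000C2911D1B8 0"
sector_bytes=512   # ASSUMPTION: 512-byte sectors

# Split the spec into its five whitespace-separated fields.
set -- $spec
part_num=$1; start=$2; end=$3; type_guid=$4; attr=$5

# Start and end sectors are both inclusive, hence the +1.
sectors=$(( end - start + 1 ))
size_bytes=$(( sectors * sector_bytes ))
size_mib=$(( size_bytes / 1048576 ))   # integer division, rounds down

echo "partition=$part_num sectors=$sectors bytes=$size_bytes (~${size_mib} MiB)"
```

Under that assumption the partition comes out just under 100 MB, which lines up with the 100MB vmkcore partition the Dell hosts ended up with.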

Since this could get quite nasty, I poked around to see what my options were and discovered that, if I create a VMFS datastore on the disk (either via vCenter or by managing the host directly), I can then remove the VMFS datastore and then create the diagnostic partition. Looks like I have found a little more work for VMware if they haven’t already discovered it. This issue doesn’t seem to be present with Dell servers, since I created a freshly-initialized RAID1 volume on the onboard PERC adapter prior to booting and ended up with a 4GB scratch partition, a 100MB vmkcore partition, and the rest being VMFS. This happened on my R900s and my R710s.

January 10, 2012

Doodle of the day

Filed under: Uncategorized — dferguson75 @ 3:56 pm

20120110-155141.jpg

I admit I’m a pretty crappy artist, but here is a visual of what was happening to us because of some in-box QLogic NetXen NIC drivers on hosts running ESX 4.0u3.

December 1, 2011

Follow-up on my previous post

Filed under: Computers and Internet — dferguson75 @ 10:38 pm

In a previous post, I ranted about RIM’s ineptitude in kindling any excitement surrounding their latest product line-up. Why did RIM possess such a market share, only to find the rug practically pulled out from beneath them?

I learned the answer when I sold my Bold 9700 for enough to score my new iPhone 4 and an otterbox defender case. I was fortunate enough to have found a seller in the Amazon marketplace selling a replacement OEM bezel to fix the damage done by one of my little ones. The fix, plus an $8 unlock code, helped me rake in $150 for a nearly two year old device that I got for free.

Since I was fairly impressed with my wife’s 3G S she scooped up for free, I decided to go for the iPhone 4. I really like the mail client (no more clumsy mobile web sites that I had to deal with using the Bold without a data plan) and iMessage is quite a treat. The calendar could use some improvement (namely being able to set up weekly recurrence that spans multiple days of the week), the ringtones are pretty lame, and it goes through a data plan like a liberal through the debt ceiling. (I would not advise using a 200MB data plan. Minimum should be at least 1GiB.)

The wealth of apps available speaks volumes for its ecosystem. Though I do have a few pet peeves about the app store. I’d like to be able to shun games from the top 25 lists. I am a recovering time waster and games are like an enormous black hole to me. Another feature I’d like to see is to be able to sort or filter by rating. If thousands of reviewers agree a product is 4+ star quality, I’ll wager some time in seeing if the app suits my use case for it.

Ok, so why is RIM falling so far behind? Probably because it’s a high-tech version of the horse and buggy. Devices ran on Java. Each device had ‘just enough’ hardware to work, but not enough to future-proof the device adequately. I saw evidence of this when I took my Blackberry to VMworld. Not only did I feel like the skunk at the garden party, I was repeatedly frustrated to no end with its constant out-of-memory issues trying to access the VMworld mobile website. There were times when the browser would dump me immediately and others where it would dump me at random (dump, meaning the browser would close all tabs, only one in nearly all cases, and open my home page in a new tab). Staying logged in to the hotspots was also an issue. It repeatedly would act as if it were still logged in, but I couldn’t surf until I manually opened the Cisco hotspot logout page and then logged in again.

I was running Blackberry 6.0 on my 9700, even before it hit the AT&T downloads page. By the time VMworld rolled around, I was running the second official 6.0 build for the 9700, and really liked the improvements in usability over 5.0. The real reason I updated to 6.0 was all the Java exception errors I used to get at random. At the cost of precious RAM resources and application space, I updated to 6.0. It took some searching, but I found a build for the 9700 and after several wipe-and-load operations, I had my device running 6.0 reliably. No more exception errors and web browsing, thanks to the webkit functionality, was finally viable. I could zoom pages and only have to scroll down to read the text I zoomed for in the first place.

The other chief complaint was functionality. Applications and features, which are child’s play on Android and iOS, are non-existent on Blackberry. This goes back to the Java part. In my experience, Java apps run really well on *nix platforms; not so well on others. Keeping such a tight footprint on Blackberry can be likened to a car salesman asking a customer whether they want power steering or wheels on the car he’s selling, because you can’t have both. Most customers would leave the lot lickety-split, and that’s what the market is doing with RIM. Sorry, guys. I was a Blackberry skeptic back in ’03-’06, then turned evangelist, and now I’m once again a skeptic. RIM is in damage-control mode and, from there, odds are pretty slim for emerging. Most tech companies fade into obscurity once they reach this point and few return to prominence. RIM needs some executives with vision, and fast. They need to find the next Steve Jobs or Paul Maritz.

October 23, 2011

A little reflection on what Steve Jobs said during his commencement speech at Stanford

Filed under: Uncategorized — dferguson75 @ 11:02 pm

I bought an iPhone for my wife the other day (it was the 3GS 8GB model that was free, which appealed to my Scottish origin). This post isn’t about the iPhone, but as my wife was impressed with the experience, the conversation surrounding it led to a discussion about the statement Steve Jobs made during his commencement speech at Stanford U.

Mr. Jobs talked about how he dropped out of college – partially. Basically, he abandoned his major of studies and ‘dropped in’ on various classes of interest to him. From that, he said (and I paraphrase) that it led to the birth of the Mac, which we all know today to be a fine piece of equipment, no matter which flavor one partakes of.

That got me to thinking about my own story. I too had been in college in my early adult life, and just like Mr. Jobs, I dropped out. I was an electrical engineering student at a local community college. I got there because I graduated in the top tenth of my class with a 3.06 GPA. The ironic part of it was that I was pretty much a slacker throughout high school. Metal shop and (eventually) computers were my interests, because I was a very practical minded person. Sure, math, science, and the other core academics had their place (nobody would want someone who couldn’t add or subtract milling metal stock that would be instrumental in keeping an aircraft at 30,000 feet with hundreds of passengers aboard), but to me, I enjoyed seeing concept becoming a finished product. More importantly, I was a hands-on person that liked to touch every stage of production. Kind of like an artist, but using machines instead of art supplies. When my four year sentence at the local high school was commuted, I found myself, not at a fork in the road, but at the base of a seemingly immense mountain. It was actually a hill, but any hill looks like a mountain when you’re smack-dab in front of it. I left the graduation ceremony not knowing what I was going to do with myself.

Since I liked to tinker with electronics and didn’t have the luxury of taking any formal classes on it during high school, I enrolled in Electrical Engineering at the college where I had been given a two-year scholarship. I figured I’d eventually be designing some high-tech computer components that would solve complex problems. Boy was I wrong. My first semester had me wondering how anyone would subject themselves to such boring material. I did pretty well in English comp, but learned that homework, which I spent the last four years blowing off, was really critical when taking trigonometry and higher level math courses. During my second and third semesters, before I had trashed my academic progress so badly that I lost the remainder of my scholarship, I dropped in on a couple of classes on PC networking using Netware 3.12. I was also employed at the college and had been a co-instructor for a nighttime PC repair class there. Over time, between those classes and taking a job at the college maintaining lab PCs, I discovered that pursuing a degree in EE was not what was in my best interest.

Then the letter came. I came to the realization that I had trashed my chances of completing my AA degree when I read that my scholarship was being revoked for three reasons: 1) my GPA was below the acceptable threshold, 2) I spent too many credits on elective classes, meaning I would not graduate on the scholarship alone, and 3) I wasn’t maintaining full-time status, on account of being employed full-time by this time. It turned out to be a blessing in disguise.

I got married in the fall of my sophomore year in college. Just prior to that event, I received news that the grant money which funded the department in which I worked at the college had been cut back such that I could only work about eight hours a week. I went from about 30 hours down to 8, which made it impractical to continue working there.

Shortly after getting married, I interviewed at Computer City for two positions in preparation for the Christmas season in ’94. One was in sales; the other was the configuration desk at the back office repair facility. I was offered the sales position and the manager offered to keep it open to me, had the configuration tech position not panned out. I was offered the configuration tech position and eventually moved up the ranks, meaning I got a more regular, predictable schedule and was performing repairs, which earned more commission per capita than installing memory, hard disks, and other peripherals.

Armed with some knowledge of networking, Netware 3.12, and a good deal of enthusiasm to expand the horizons of the back office service center, I approached the division director and asked him what he thought about having us perform onsite services and getting more involved with networking and the network OSes. After all, those services were listed on our rate chart, so why not? His reply was like the seasoned hunter skillfully shooting down an unsuspecting waterfowl in total coolness. They were nice enough to pay for me to take my A+ exam, though. I took it and passed it with absolutely no preparation other than on-the-job experience (it consisted of two sections in ’96; I scored 90% on one and 100% on the other).

I interviewed with a company called Bay Resources out of St. Petersburg, FL in 1996 and accepted an offer from them. The pay was about the same and it involved some driving. The good of it was that there were clearly defined advancement opportunities, once I got used to not being a bench tech anymore. Tom, the service manager, offered me a significant raise if I could get MCSE-certified within a certain timeframe, putting me on the advanced services team, where I would be performing the kind of work I proposed to the division director at my previous post. From the wisdom of the last chapter of Dumas’ The Count of Monte Cristo, “Wait and hope”. Had I stayed the course of a college education at the time, I would have probably amassed more than just the $3600 student loan debt from that computer I so badly ‘needed’. It only cost about $1000 between books and exams to get certified, all of which was taken on by my employer.

Since then, I have been laid off twice, started and closed my own business, and have moved to another state. I have been a consultant twice, worked with the DoD and small, medium, and large companies. Like Steve Jobs, had I not dropped out of what would probably have been a dead-end career for me and dropped in on something interesting, there’s no telling what it would be like for me and those in my sphere of influence. I really think folks need to look at their lives and apply that little bit of wisdom. There is a good deal of disillusionment going around because of this idea that a college education equates to a better way of life. Maybe it’s time to start listening to more stories from successful college dropouts and less from broke graduates.

July 20, 2011

VUM failure caused by Emulex VMware Core Kit

Filed under: Uncategorized — dferguson75 @ 6:11 pm

For the past few days, I have been chasing my tail on an issue with VUM not patching certain hosts. The seemingly common thread was that they were upgraded from 3.5U2. At first, I thought it was due to lack of free space on the / partition, but eventually that proved to be a flawed theory when I couldn’t even apply a single patch in the few hundred KB range. A colleague and I were up and down the esxupdate.log and couldn’t find anything (though it was late at night and difficult to spot anything when you’re bleary-eyed).

Finally, I caught the culprit (practically in the corner of my eye). Something about a failed RPM transaction showed up while tailing the esxupdate.log file. I was able to zero in on the error and found out that there was an unresolved dependency with the elxvmwarecorekit RPM and a missing Emulex CIM provider RPM.

Could it be? I quickly uninstalled the RPM and retried remediation. It worked!
It was only a small subset of patches, so I had to try a bigger batch. I threw the whole U3 baseline (all patches released on or before 5/30/11) at another host that had only the U1 base build (that’s another story) and ran the remediation. It succeeded this time, so I ran the uninstall against the rest of the hosts that kept failing.

I used the following command to remove the core kit (the loop only fires on hosts that actually have it installed):
for i in `rpm -qa | grep elxvmwarecorekit-esx40-4.0a44-1` ; do rpm -e $i ; done
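
To see why that loop is harmless on hosts without the core kit, here is a small dry-run sketch. The `list_packages` function and the two extra package names in it are made up so the snippet runs anywhere; on a real host the list would come from `rpm -qa`, and the echo would be replaced by the actual `rpm -e`.

```shell
#!/bin/sh
# HYPOTHETICAL stand-in for `rpm -qa`; the first and last names are invented.
list_packages() {
  printf '%s\n' \
    "example-package-1.0-1" \
    "elxvmwarecorekit-esx40-4.0a44-1" \
    "another-example-2.3-4"
}

# Host WITH the core kit: grep emits one name, so the loop body runs once.
removed=0
for pkg in $(list_packages | grep 'elxvmwarecorekit-esx40-4.0a44-1'); do
  echo "would run: rpm -e $pkg"   # dry run only, nothing is uninstalled
  removed=$((removed + 1))
done

# Host WITHOUT a matching package: grep emits nothing, so the loop never runs.
skipped=0
for pkg in $(list_packages | grep 'no-such-package'); do
  skipped=$((skipped + 1))
done

echo "matched=$removed iterations_without_match=$skipped"
```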

Crossing my fingers for tonight’s follow-on deployment…

July 16, 2011

Never change the date & time on a HA-enabled ESX host in a vDS

Filed under: Uncategorized — dferguson75 @ 1:31 pm

…Unless it’s in maintenance mode. I made this mistake yesterday and induced a HA isolation event. Here’s the scenario and how something seemingly insignificant became so impacting:

The configuration:

  • 8 ESX hosts in a HA + DRS-enabled cluster.
  • vCenter running in a VM on a host in said cluster.
  • Distributed Virtual Switching enabled with vCenter VM in a vDS port group.
  • Cisco Nexus 1000v (though I am in agreement with VMware that it didn’t come into play much in the incident).
  • One of said hosts had its clock 5 hours behind the rest. Not sure how that happened since we use the same Kickstart template to build all hosts in this and other clusters.

So here’s what happened:
    An Exchange admin pointed out that two BES servers had their clocks set wrong. Upon investigation, I asked if VMware tools was set to sync time from the host. He checked and it was. A lightbulb went off in my head that all my troubles trying to SSH into that same host were likely due to the time skew being out of tolerance by nearly five hours.

    I promptly investigated the host and found that both a) NTP wasn’t enabled, and b) the clock was about 5 hrs behind. So, I enabled NTP and proceeded to fix the date & time on the host. What happened next was quite a ride.

    vCenter reported a possible host failure on the host I just attempted to adjust the date & time on. However, vCenter was still operational, so the host didn’t fail. Within seconds, the vCenter ‘dropped out’, meaning it lost connection with its linked mode peers and I was unable to manage it.

    I promptly engaged our network resource that had the most experience with the 1000v and logged an SR with VMware. While on hold, our network resource and I determined that, if VMware had no better course of action, we would pull one of the two 10Gb uplinks out of the vDS, create a standard vSwitch and VM port group, and put the vCenter VM into the ‘emergency port group’.

    VMware had no tricks up their sleeve, so we proceeded with the previously-discussed plan of setting up a standard vSwitch, etc. Eventually the Nexus sync’ed with vCenter and the hosts’ VEMs synced with the NX1k VSM and connectivity was gradually restored. I say ‘gradually’ because it took a while to apply the port group policies to the 8 hosts.

    A review of the logs on the host (grep -A4 -B4 -i isolation /var/log/vmk*) showed that I had adjusted the date and time and triggered HA to ping the default isolation address. Apparently the time change occurred in such a way that the RTT on all heartbeat traffic on that host spiked, leading AAM to believe it was isolated.

    In the end, I learned my lesson. Any config changes that might impact HA behavior should only be performed with the host in maintenance mode, or at least with HA disabled altogether. And all this just hours before migrating another wave of servers from our old datacenter using SRM installed, guess where, on the vCenter server impacted by this misfire.

    July 1, 2011

    RIM’s next ‘Ho-Hum’

    Filed under: Uncategorized — dferguson75 @ 1:58 pm

    With RIM’s announcement of their next generation of the Blackberry Bold (9900 & 9930), my response was a resounding ‘YAWN’. The specs are nice, but RIM isn’t doing a good enough job of targeting the smartphone buyers. I own a Bold 9700 and, while I don’t have full-blown buyer’s remorse, I would have chosen differently had I the chance to do it over again. Multimedia and selection of apps (and the fact that most apps don’t work if you don’t have a data plan) have been key contributors to my buyer’s remorse.

    What I think would be a game-changer for RIM would be to do the following:

    1. Develop a BES client app for the different mobile platforms (tablets included) and make it available for free to compete with ActiveSync. It should keep BES data in a sandbox so that individuals don’t lose personal data when their employers’ IT staff wipes the ‘devices’ since they are virtual. Appropriate security measures should be taken so that information spillage can’t occur between the personal device and the ‘corporate’ virtual device. One of my chief complaints about ActiveSync comes during every password change cycle. These days, it’s becoming more commonplace for users to have more than one device (e.g. an iPhone and an iPad) on ActiveSync, making account lockouts more frequent and likely.

    2. Price the BES carrier service so that it fits into the mobile carriers’ personal service pricing models. For example, AT&T’s 200MB service for $15 would include BES. Their 2GB service for $30 would also include BES. No more up-charges for using BES. Make the end-user experience be as convenient and seamless as possible. Download app, perform activation, sync, and go.

    Being a former BES user, I have a real appreciation for the user experience it offers. Once your device is activated, there’s not much else to it. You receive your email and your contacts, calendar, tasks, and notes are always in sync.

    One caveat would be that any applications that ride on BES would need to be re-engineered to run on the different mobile platforms and the BES client would have to include hooks that allow those apps access to BES.

    Supposedly Good does something like it, but field reviews are mixed and the product is expensive.

    May 18, 2011

    How to determine if your VM is using hardware assisted CPU/Memory features

    Filed under: Uncategorized — dferguson75 @ 6:53 pm

    Newer AMD and Intel(r) CPUs have the ability to offload, or assist, CPU instructions and memory translation from the hypervisor, but figuring out which feature is in use is a little bit challenging.
    To determine which is in use, you will need to look at the vmware.log that is stored in the same folder as the VM’s .vmx file.

    grep "HV Settings" vmware.log

    will return a line similar to:

    May 16 15:10:43.998: vmx| HV Settings: virtual exec = 'hardware'; virtual mmu = 'hardware'

    ‘virtual exec’ tells you whether or not hardware CPU virtualization (Intel VT-x or AMD-V) is in use.
    ‘virtual mmu’ tells you whether or not hardware MMU virtualization (AMD RVI or Intel EPT) is in use.
    Intel EPT is only available on Nehalem or newer CPUs.
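
If you want just the two values rather than the whole log line, a sed one-liner per field does the job. This is a minimal sketch that works on the sample line from this post; on a real host you would feed it the output of the grep above, and I am assuming the log line keeps this exact quoting.

```shell
#!/bin/sh
# Sample line copied from the post; replace with real grep output on a host.
line="May 16 15:10:43.998: vmx| HV Settings: virtual exec = 'hardware'; virtual mmu = 'hardware'"

# Pull out the single-quoted value that follows each setting name.
virt_exec=$(printf '%s\n' "$line" | sed -n "s/.*virtual exec = '\([^']*\)'.*/\1/p")
virt_mmu=$(printf '%s\n' "$line" | sed -n "s/.*virtual mmu = '\([^']*\)'.*/\1/p")

echo "virtual exec: $virt_exec"
echo "virtual mmu:  $virt_mmu"
```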

    May 5, 2011

    A little esxcli relief when it comes to masking rules

    Filed under: Uncategorized — dferguson75 @ 3:58 pm

    Special thanks to Antonia AbuMatar from Level3 storage on this one…

    I’m a complete newbie when it comes to shell scripts, so I asked Antonia for a little help with this. Basically, this script prints the esxcli commands that create the claim rules needed to mask LUNs presented to ESX hosts before those LUNs are removed. (For ESXi, substitute vicfg-mpath for esxcfg-mpath and run the script from your vMA.) The script assumes you have dumped your naa IDs to a text file called naa.txt.

    #!/bin/bash
    #
    # bulkLunMask.sh
    #
    # Parse naa list file (naa.txt) and build claim rules
    # Make sure you verify first
    #
    cat /dev/null > cleanupclaimrules.sh
    count=111
    cat naa.txt | while read naa; do
      esxcfg-mpath -b -d $naa | grep 'vmhba' | awk '{print $1}' | sed 's/:/ /g' | tr -d C | tr -d T | tr -d L >file1
      cat file1 | while read hba cont target lun; do
        #Remove next line's echo to actually create the claim rules. VERIFY the output and compare with existing claim rules first!!!
        echo esxcli corestorage claimrule add --rule $count -t location -A $hba -C $cont -T $target -L $lun -P MASK_PATH
        #Creating cleanup script to facilitate removing the claim rules once no longer necessary
        echo esxcli corestorage claimrule delete --rule $count >> cleanupclaimrules.sh
        let "count = $count + 1"
      done
      counter=`wc -l file1 | awk '{print $1}'`
      let "count = $count + $counter"
    done
    echo esxcli corestorage claimrule load >> cleanupclaimrules.sh
    # Load claim rules and reclaim devices
    esxcli corestorage claimrule load
    cat naa.txt | while read naa; do
      esxcli corestorage claiming reclaim -d $naa ; done
    echo Refresh host storage in vCenter now.
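
One detail that is easy to miss: the script bumps count inside the inner loop and then again after it, using the line count of file1. That appears to be a workaround for the fact that, in bash with default settings, a while loop fed by a pipe runs in a subshell, so increments made inside it are lost once the loop ends. A minimal sketch of the effect, with two made-up input lines standing in for the paths in file1:

```shell
#!/bin/bash
count=111

# The loop is the last stage of a pipeline, so it runs in a subshell.
printf 'path1\npath2\n' | while read -r p; do
  count=$((count + 1))     # only changes the subshell's copy of count
done
after_loop=$count          # still 111 in the parent shell

# Catch-up step, mirroring the wc -l trick in the script above.
lines=$(printf 'path1\npath2\n' | wc -l | awk '{print $1}')
count=$((count + lines))

echo "after loop: $after_loop, after catch-up: $count"
```

Without the catch-up step, the next naa ID would reuse rule numbers already handed out.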
    

    Your output will look like this:

    esxcli corestorage claimrule add --rule 111 -t location -A vmhba0 -C 0 -T 0 -L 117 -P MASK_PATH
    esxcli corestorage claimrule add --rule 112 -t location -A vmhba0 -C 0 -T 1 -L 117 -P MASK_PATH
    esxcli corestorage claimrule add --rule 113 -t location -A vmhba1 -C 0 -T 1 -L 117 -P MASK_PATH
    esxcli corestorage claimrule add --rule 114 -t location -A vmhba1 -C 0 -T 0 -L 117 -P MASK_PATH
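
To see how one path turns into one of those lines, here is the sed/tr part of the pipeline run on its own. The sample runtime name is an assumption chosen to match the first output line above; nothing here touches esxcli, it only builds and prints the string.

```shell
#!/bin/sh
# ASSUMED sample runtime name, picked to match the first output line above.
path="vmhba0:C0:T0:L117"

# Same transformation as the script: colons to spaces, then drop the C/T/L prefixes.
fields=$(echo "$path" | sed 's/:/ /g' | tr -d C | tr -d T | tr -d L)

# Split into adapter, channel, target and LUN.
set -- $fields
hba=$1; cont=$2; target=$3; lun=$4

rule="esxcli corestorage claimrule add --rule 111 -t location -A $hba -C $cont -T $target -L $lun -P MASK_PATH"
echo "$rule"
```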
    

    This will really rock.

    April 13, 2011

    PowerCLI to the rescue: “Help! Someone took my IP and I don’t know who!”

    Filed under: Computers and Internet — dferguson75 @ 1:54 pm

    Have no fear, PowerCLI is here!

    Ok, so you got a critical issue and find out that the VM can’t ping anything due to a duplicate IP on the network. Rather than shutting down VMs one-by-one (well, not exactly), you call on your best friend and superhero PowerCLI. PowerCLI reveals the offender and you put the rogue vagrant behind bars (sort of). So what’s the secret?

    A quick look at the decoder ring reveals the incredibly cryptic code as follows:

    get-cluster clustername | get-vm | Get-NetworkAdapter | where {$_.MacAddress -eq "00:50:56:xx:xx:xx"} | Select Parent,MacAddress
    

    The get-cluster portion is optional; it will help hunt down the offender faster unless the offender is on another cluster that has the same VLANs available.

    Just replace the placeholder MAC address (00:50:56:xx:xx:xx) with the MAC address of the offending VM and voila!

    Your output will look like the following (names and addresses obfuscated to protect the guilty):

    Parent                MacAddress
    ------                ----------
    ROGUEVM               00:50:56:xx:xx:xx
    