I’ve spent more time than usual the past two weeks talking with people, and listening to people, about Hadoop. I’ve been administering Hadoop clusters for (part of) a living for about 4-5 years now, and I’ve gotten pretty good at answering questions people don’t have, or want, answers for.
In the past week or so I’ve heard one vendor advocate that Hadoop gives you a free analytics environment with no need for expensive developers since it’s free software, and another advocate that you can just virtualize Hadoop by putting lots of datanodes on a single host and save lots of money. Easy peasy, right?
I’m proposing we consider Mohs’ Law in this situation.
No, I’m not misspelling Moore’s Law, which tells us that compute power/efficiency will double every 24 months. I’m suggesting a law that’s more of a diamond in the rough, if you don’t mind.
Hadoop is hard.
It’s based on Friedrich Mohs developing a method of describing hardness of materials about 200 years ago. And it’s a great pun. But it’s also a reminder that “yum install” does not a production application make.
But Rob, I can get Hadoop in 15 minutes!
It is pretty easy to get started with Hadoop. It’s even free of charge to get started (or even to go into production) with the platform itself. I recommend it. Go do it now. I’ll wait.
For starters, go grab the Cloudera QuickStart VM or the Hortonworks Sandbox VM from their respective websites. Pull it into your desktop virtualization platform of choice. Look at the docs. Run some of the tests. At that point you’re farther along than most people who promote Hadoop.
But at that point you don’t have a functioning business intelligence/data warehouse/analytics application environment, any more than installing Ubuntu 13.04 into VirtualBox gives you a production e-commerce site.
There’s still a lot of work to be done. Some of it is difficult, but a fair bit of it is just downright hard. Understand what you want to do, what data you can pull into your environment. Figure out what your customers/users/analysts need out of the data. Make sure you can validate the output. Automate all your tests. Go back to your data sources and make sure you’re getting all the data. Go back to your end users and make sure you’re giving them what they want. Lather, rinse, repeat.
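To make "validate the output" a little more concrete, here is a minimal sketch of an automated reconciliation check in plain Python. It is only an illustration under assumptions: the function names and the idea of comparing a row count plus one numeric total are mine, not part of Hadoop or any vendor tooling, and a real pipeline would need checks that fit its own data.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Row count and total of one numeric field for a batch of records."""
    rows: int
    total: float


def summarize(records, field):
    """Count the records and sum one numeric field across them."""
    rows = 0
    total = 0.0
    for record in records:
        rows += 1
        total += float(record[field])
    return Summary(rows, total)


def reconcile(source, output, field, tolerance=1e-9):
    """Return a list of mismatch descriptions; an empty list means the check passed."""
    src = summarize(source, field)
    out = summarize(output, field)
    problems = []
    if src.rows != out.rows:
        problems.append(f"row count: source={src.rows} output={out.rows}")
    if abs(src.total - out.total) > tolerance:
        problems.append(f"{field} total: source={src.total} output={out.total}")
    return problems
```

Run something like this after every load, with records as dictionaries keyed by field name, and treat a non-empty result as a failed test rather than a warning to look at later.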
Rob’s Corollaries to Mohs’ Law
If you remember nothing else, think about an analytics environment the way you would a monitoring environment. I’ve supported both for almost a decade, and the take-home I’ll save you ten years on is this:
Make sure you’re measuring what you think you’re measuring.
Make sure you’re measuring what you need to be measuring.
This rule also applies to a lot of other technology… customer surveys, dating sites, and so forth. But it takes formidable effort to get these two corollaries right (without coronaries), and even if you do throw together something with Insta-analytics.com (probably not a real site, not meant as an endorsement), they won’t be able to tell you what you need or whether you’re getting it.
So where do we go from here?
First of all, if you’re interested in getting familiar with Hadoop, go grab a VM above and give it a try. Simulate Pi Indiana-style. Grab a book and try some of the stuff it suggests.
Then, go talk to the BI team in your company, or the analyst who does performance dashboards when she’s not writing code and designing employee event signage and chasing your kids out of the server closet, or whoever. Find out what they’re doing.
And finally, unless your vendor makes its livelihood supporting Hadoop, don’t take their take on Hadoop as gospel. Apocrypha maybe, mistranslation at worst, and probably not enough to go on.
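If you’re wondering what simulating Pi actually involves, the idea fits in a few lines. This is a rough single-machine sketch in plain Python, not the Hadoop example job itself; the function name and the seed parameter are my own, and the point is only to show the sampling idea that a cluster would spread across many tasks.

```python
import random


def estimate_pi(samples, seed=None):
    """Estimate pi by sampling random points in the unit square.

    The fraction of points that land inside the quarter circle of
    radius 1 approximates pi / 4.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples
```

More samples give a better estimate, but slowly, which is exactly why it makes a handy toy workload for a cluster.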
Hey, I’m in Silicon Valley and want to learn more, what can I do?
Funny you should ask.
BayLISA is hosting a Hadoop meeting on Thursday, May 16, at Yahoo! in Sunnyvale. There’s a waiting list but it usually fades closer to the event. Come see Alan Gates of Hortonworks, Eric Sammer of Cloudera, and Ryan Orban of Nutanix talking about Hadoop innovations and how to get involved. (Disclaimer: I am president of BayLISA, but I don’t get any profit or direct benefit if people come to the meetups.)
There’s also a Hadoop User Group meetup on Wednesday, May 15, although it’s a bit more suited to advanced users who are already familiar with the technology. Their waitlist is also a fair bit longer. But check it out and see if it fits your needs.
If you’re not in Silicon Valley, check Meetup for local groups, or see if one of the Hadoop vendors has local meetings or events you can attend. If you find one, feel free to add it in the comments here so other people will know where to look.
I made it back from my first Interop expedition. I’m sure a lot of you are finding my blog as a result of meeting me at Interop — I owe a few of you an email to follow up on our conversations, and those will be going out next week. Feel free to initiate contact if you like, leave a comment, drop me an email, or catch me on Twitter.
I’d like to take a moment to thank Stephen Foskett and the Tech Field Day organization for facilitating my Interop visit, as well as Spirent, NEC Networking, and Juniper Networks for sponsoring our activities and presence this week. I’d also like to thank Jennifer “JJ” Jessup, General Manager of Interop, for her help dealing with an interesting PR contact before the event, and Jamie Porter from the UBM/Interop PR team for helping to set up a couple of meetings with exhibitors while I was there.
I met a lot of interesting vendors, found some products and technologies to dig into more over the next couple of months, and managed to catch up on my email. There will be a couple more blog entries coming this month, but oddly one of the most impressive things I saw at Interop was that a couple of my babies were running in the core of the network.
From 1997 to 2000, I worked as the sysadmin for what used to be called Rapid City Communications. They brought out an Accelar line of routing switches with Gigabit Ethernet, got acquired by Bay Networks, got acquired by Nortel Networks, and somewhere along the line converted to the Passport naming structure with the 8000 line of chassis switches. The picture to the right is the descendant of the 8606, which I probably built code for tens of thousands of times.
The Avaya fellow I spoke with came from Bay Networks… Avaya acquired Bay/Nortel’s Ethernet Routing Switch product line in 2009. Even if 10/100 with two 1GBE ports doesn’t seem that powerful anymore, and even if Nortel (and now Avaya) have had 10GBE for 12 years on the ERS8600 line, I still have a soft spot for the whole Accelar line, and always love seeing them turn up in places like the Tech Museum in San Jose and now Interop in Las Vegas.
Stay tuned for some further thoughts and experiments with WAN load balancing, Hadoop grumblings, and some interesting consumer tech that I’m expecting to try out in the foreseeable future. Thanks for dropping by.
VMware lab who? VMware lab in pocket-size format!
So in our last installment, I found out that I can upgrade my Shuttle SH67H ESXi servers to support Ivy Bridge processors. If you want to read more about that, feel free to visit my Compact VMware Server At Home post from Winter 2012, and my Upgrading my home VMware lab with Ivy Bridge post from Spring 2013.
The replacement boards came in from Shuttle, and they’ll be going back into the chassis. But as you may have seen at the end of the last post, I discovered the Intel Next Unit of Computing server line. The NUC line currently includes three models.
- DC3217IYE - i3-3217U processor (1.8 GHz dual core with 3MB cache), dual HDMI, Gigabit Ethernet at $293 (pictured)
- DC3217BY - i3-3217U processor, single HDMI, single Thunderbolt, no native Ethernet, at $323
- DCCP847DYE - Celeron 847 (1.1 GHz dual core with 2MB L3 cache), dual HDMI, Gigabit Ethernet at $172
(Prices are estimated list from Intel’s site; probably cheaper by a buck or ten at Amazon, Fry’s, Central Computer, or your favorite retailer. Feel free to use my links and help me buy the next one.)
All three have three USB 2.0 ports outside (one front, two rear), as well as two USB headers inside, conceivably useful for a USB flash drive or flash reader. They also have an mSATA-capable Mini-PCIe slot as well as a short mini-PCIe slot suitable for a WiFi/Bluetooth card. And there are two DDR3 SODIMM slots, supporting a reported 16GB of RAM (the processor supports 32GB, but the board/kit do not mention this). They all include VT-x with EPT.
I don’t see the Celeron being that useful for virtualization labs, but these are rather multifunctional for a little 4″ square computer. Imagine putting a broadband modem (3G/4G/Wimax) inside for reasonably potent portable kiosk purposes (VESA mount kit included). A card reader and a DVD burner for saving and sharing (and even editing) photos. Intel’s WiDi wireless display technology is supported as well, if you have a suitable receiver. Or use it with a portable projector for presentations on the go (no more fiddling with display adapters for presentations at your meetings!).
But we’re talking about a VMware lab here.
And let me get this out of the way… this was one of the coolest features of the NUC.
That’s correct, the box has its own sound effects.
Let’s get this party started…
Those of you of my era and inclinations may remember when KITT’s brain was removed and placed in a non-vehicle form factor on the original Knight Rider tv series. When I got ready to RMA my Shuttle motherboards, I was thinking about this sort of effect for a couple of VMs on the in-service ESXi server that was about to have its board sent to southern California. And that’s what I had to do. I couldn’t quite miniaturize the server Orac-style, but that thought had crossed my mind as well.
So I picked up the DC3217IYE unit at Fry’s, got an mSATA 64GB card (Crucial m4 64GB CT064M4SSD3) and a spare low profile USB flash drive (Patriot Autobahn 8GB (PSF8GLSABUSB)) at Central Computers, and took a Corsair 16GB DDR3 Kit (CMSO16GX3M2A1333C9) from my stock. Assembling it took only a few minutes and a jeweler’s screwdriver, and then I was ready to implement ESXi.
I backed up the VMs from the original system using vSphere Client, so that I could re-deploy them later to the NUC. Someday I’ll get Veeam or something better going to actively back up and replicate my VMs, but for the limited persistent use of my cluster (cacti and mediawiki VMs) this was sufficient.
One gotcha: Fixing the NUC network…
I originally tried reusing the 4GB USB drive my existing server was booting from, but ESXi didn’t recognize the NUC’s Ethernet interface. I installed a fresh 5.0u2 on a new flash drive, and still no luck. I found a post at tekhead.org that detailed slipstreaming the new driver into ESXi’s install ISO. I did so, installed again, and was up and running.
I did have to create a new datastore on the mSATA card — my original server had used a small Vertex 2 SSD from OCZ, which obviously wouldn’t work here. But I was able to upload my backed up OVF files and bring up the VMs very quickly.
And one warning I’ll bring up is that the unit does get a bit warm, and if you use a metal USB flash drive, it will get hot to the touch. My original ESXi lab box used a plastic-shelled USB drive, and I’m inclined to go back to that.
What’s next, Robert?
My next step is going to be bringing central storage back. There is a new HP MicroServer N54L on the market, but my N40L should be sufficient for now–put the 16GB upgrade in and load it up with drives. As those of you who saw my lab post last year know, it was running FreeNAS 8, but I’m thinking about cutting over to NexentaStor Community Edition.
I’ve taken the original Shuttle box and replaced a mid-tower PC with it for my primary desktop. I will probably set the other one up with a Linux of some sort.
And in a week or so I’ll grab a second NUC and build it out as a second cluster machine for the ESXi lab. All five of them are slated to go into my new EXPEDIT shelving thingie in the home office, and I’ll bring you the latest on these adventures as soon as they happen.
My most popular post on rsts11 has been my compact VMware server at home post. Thanks to Chris Wahl mentioning me on the VMware forums, and linking from his lab post, I see a dozen visits or more a day to that page.
Imitation is the sincerest form of laziness^wflattery
I have to admit that I’ve been a follower in my use of intriguing lab environments. I got the vTARDIS idea from Simon Gallagher, and built a version of it at work at my last job on a Dell Core 2 Quad workstation under my desk. Then I saw Kendrick Coleman tweet about this new SH67H3 from Shuttle that supported 32GB of non-registered RAM… bought one and put 32GB and an i7-2600S processor into it, as described in the “server at home” post above.
Now as you may know, the i7-2600 series processors are now a generation behind. Sandy Bridge gave way to Ivy Bridge (the i7-3×00 processors) which are currently easily found at retail. But… SH67H3 v1.0 motherboards don’t support Ivy Bridge. And that’s what was shipping when I bought mine in early 2012.
I found an unbelievable deal on a second SH67H3 open (missing) box at Fry’s in February 2013… let’s just say I spent more on a basic Pentium chip to test it with than I did on the chassis itself. But alas, the second one also had a v1.0 motherboard.
Let’s make the ivy (bridge) grow!
I found sources on the Internets that said a v2.0 board supporting Ivy Bridge was out. I further discovered that Shuttle would trade in your v1.0 board for a v2.0 board for $40. Instructions here at Cinlor Tech’s blog if you’re interested in doing this yourself. Note that you can request the case number through Shuttle’s web-email portal if you prefer this to calling. That’s what I did.
I shipped off my two boards in a medium Priority Mail box to Shuttle on the 26th. On the 29th I got confirmation of the return shipment. They should be at my door on April 2nd. I’ll be reinstalling them, and at some point upgrading to i7-3770S processors on both.
Waitasec, 2600 “S”? 3770 “S”? What’s this all about, then?
Yes, that’s correct. I chose to go with a low power version of the i7-2600 processor a year and change ago. The i7-2600S has a lower base speed than the 2600 or 2600K (unlocked version), 2.8GHz vs 3.4GHz. All three support turbo boost to 3.8GHz though. And the i7-2600S is 65W where the others are 95W.
(Here’s a comparison chart of the three i7-2600 and three i7-3770 processor options via Intel, if you’re curious.)
Other noteworthy differences are on the 2600K, which costs $20 more, but does not support VT-d (directed I/O), vPro management features, or Trusted Execution. VT-d is the only feature of particular concern when you’re building your virtualization lab though. (I’ll admit the VT-d was an accidental discovery; I chose the 2600S more for power savings than anything else.) If you’re building a desktop, the “K” model has HD3000 graphics vs HD2000 for the other two chips, by the way.
Now that I’m building a second box, I find that my usual local retail sources don’t have the i7-2600S in stock anymore. I could order one on eBay or maybe find it at Fry’s, but for about the same price I could get the Ivy Bridge version and be slightly future-proofed. Once again, the “S” is the way to go.
The 3770 series run at 3.1GHz (“S”), 3.4GHz (3770), and 3.5GHz (“K”) base speeds, all turbo capable to 3.9GHz. The “S” processor is 65W again, vs only 77W for the other two chips. They all have Intel’s HD4000 integrated graphics and the newer PCIe 3.0 support. They support 1600MHz RAM speeds, vs 1333 top for the previous generation. The “K” processor lacks VT-d, vPro, and Trusted Execution, but does have a nearly $40 premium over the other two chips.
All six of these chips have VT-x including extended page tables (EPT/SLAT), hyperthreading, and enhanced SpeedStep. And they’re all 4 core/8 thread/32GB RAM capable processors that make a great basis for a virtualization environment.
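For anyone who prefers to see the comparison as data, here is a small Python sketch that encodes the six chips using only the figures quoted above and then ranks the ones that keep VT-d. The `Chip` class, the `lab_candidates` function, and the ranking rule (lowest TDP first, then highest base clock) are my own illustration of how I weighed them, not anything from Intel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    model: str
    base_ghz: float
    turbo_ghz: float
    tdp_watts: int
    vt_d: bool


# Figures as quoted in this post; check Intel's comparison chart before buying.
CHIPS = [
    Chip("i7-2600S", 2.8, 3.8, 65, True),
    Chip("i7-2600", 3.4, 3.8, 95, True),
    Chip("i7-2600K", 3.4, 3.8, 95, False),
    Chip("i7-3770S", 3.1, 3.9, 65, True),
    Chip("i7-3770", 3.4, 3.9, 77, True),
    Chip("i7-3770K", 3.5, 3.9, 77, False),
]


def lab_candidates(chips):
    """Keep chips with VT-d, lowest TDP first, then highest base clock."""
    usable = [c for c in chips if c.vt_d]
    return sorted(usable, key=lambda c: (c.tdp_watts, -c.base_ghz))
```

Calling `lab_candidates(CHIPS)` drops both “K” parts and puts the two “S” chips at the front, which matches where I ended up.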
So what’s next, Robert?
Well, with two matching machines, I’ll be basically starting from scratch. Time to upgrade the N40L Microserver NAS box to 16GB (Thanks Chris for finding this too!) and probably splitting off a distinct physical storage network for that purpose.
But now, thanks to Marco Broeken’s recent lab rebuild, I’ve been introduced to Intel’s Next Unit of Computing (NUC), so tune in soon for my experience with my first NUC system. Sneak peek of the ESXi splash screen and the actual unit here… stay tuned!
I was just looking up some Juniper gear I saw in a local auction… and discovered that the wheels of progress are indeed rolling along.
According to the Hardware EOS Milestone page, the NetScreen 5XT and 5GT, cute little firewall/vpn boxes that seem to be all over the place, reach their end of support life on June 30th and December 31st, 2013, respectively. Considering they were announced as EOL about 5 years ago, this isn’t a big surprise.
I was a bit concerned when the same page reported that the replacement products, the SSG-5 and SSG-20, had their EOL announced in December 2011, and their “Last Date to Convert Warranty” and “Same Day Support Discontinued” date is April 29th of this year (4 weeks away). But it looks like this only applies to the Japan, Korea, and Taiwan versions. Whew.
However, some further digging… and I see ScreenOS is on its own End Of Life path… 6.1 is gone, 6.2 has through the end of 2013, and 6.3 is gone at the end of 2015.
I actually use an SSG-20 with the ADSL2+ PIM for my store’s Internet connection… and while it’s not under warranty and I don’t expect to need support, this did make me wonder what I should consider for my next CPE need.
I’d be tempted to put together an SRX240 with DOCSIS and ADSL2+, but the best price I can imagine for that is $2k or so, which is more than I want to spend on this project. So maybe I’ll drive the SSG-20 into the ground, and deal with the problem when it arises. There’s always a spare ADSL2+ modem in the cabinet just in case…
Why so blue, panda bear?
I’m not all that sad, to be honest. But I have a habit of going with old technology until it no longer does what I need. Or until it’s cheaper to replace than to maintain, which can be the same thing.
Heck, I have actually installed Windows XP in the past month… and it stops getting updates any day now. And I’m used to far worse support prognoses–I’m looking at you, Cisco Linksys, with the “it’s a year old? Oh, no updates for you!” policies on a lot of your home network gear (wouldn’t be so bad if it was stuff that can run DD-WRT or OpenWRT… but RV042 and the like aren’t a fit there).
Anyway, this gear has had a good run, in the market and in my own environment. So I’ll keep an eye out for new and better gear within a minimal budget, and see where the world takes my networks.