Both the “briefcases” and the “backpacks” were out in force at the Hadoop Summit at the San Jose Convention Center June 26–27, 2013. Marketing types (the “briefcases”) and data scientists and hackers (the “backpacks”) showed up to discuss and show off all things Hadoop. The exhibition hall was a mix of established veterans, such as Salesforce, Yahoo, and Dell, alongside startups of varying degrees of maturity.
Like many conferences focused on technology—and especially with enterprise-class server technology like Hadoop—nearly everyone was showcasing ways to make things easier to use and more accessible.
Startup Qubole was there showing off their SaaS product, which is reminiscent of Amazon’s Elastic MapReduce (EMR). If you don’t want to own any computing infrastructure but still want to run MapReduce jobs on a Hadoop cluster, you can use Qubole’s convenient GUI to access their managed Hadoop cluster (which, by the way, runs on Amazon’s EC2). The GUI includes some nascent web-based tools for code editing and data management, and in some cases it goes well beyond what EMR offers. The pricing model is aggressive.
On the other hand, if you want to run Hadoop in your own server room, Dell’s service group will gladly fill it with their servers and will install and support Hadoop through their partnership with Cloudera.
If you just want Hadoop software and support, then Hortonworks, MapR, and Cloudera were there to help out.
Almost one hundred presentations were delivered over the two-day event, ranging from case studies like Sears/MetaScale’s “Move to Hadoop, Go Faster, and Save Millions – Mainframe Legacy Modernization” to practical nuts-and-bolts pieces like Yahoo’s “Running YARN at Scale.”
The forthcoming Hadoop 2.0 release is an important overhaul of Hadoop, and there was quite a bit of buzz about it at the Summit. It’s already in use on several large clusters, and it’s expected to be production-ready before the end of 2013. Many improvements in Hadoop 2.0 make it more robust, but the most interesting feature is Yet Another Resource Negotiator (YARN), which, among other things, lets other analysis engines run alongside MapReduce. This should let businesses get much more out of a cluster, since they’ll be able to run other types of analyses on it without switching to a system other than Hadoop.
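For readers who haven’t written a MapReduce job, the programming model that these clusters schedule can be sketched in a few lines of plain Python. This is only a toy illustration of the map, shuffle, and reduce phases of a word count (the canonical MapReduce example); it uses no Hadoop APIs, and the function names are my own:

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every input line
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by their key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values; here, sum the counts
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["Hadoop runs MapReduce", "YARN runs more than MapReduce"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["mapreduce"])  # prints 2
```

On a real cluster, the map and reduce functions run in parallel across many machines and the shuffle happens over the network; the point of YARN is that this is only one of several computation models a Hadoop 2.0 cluster can host.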
To make things easier for the more typical enterprise end user who’s not a data scientist or itinerant hacker, there were plenty of announcements and talks about various ways to use Structured Query Language (SQL) with Hadoop, with or without MapReduce underneath. SQL shows no signs of wear despite being invented in the 1970s. Can you think of any other computing language from the 1970s or 1980s that is still in such widespread use and still growing? With so many people who know how to analyze and mine data via SQL, it is sure to have a long shelf life. The SQL-related talks at Hadoop Summit focused on future improvements to Hive (Hadoop’s SQL-like interface) and various initiatives, like Tez, that offer faster ways to execute SQL without using MapReduce. For more background on SQL at the Summit, see my Ars Technica article.
Plenty of business intelligence (BI) vendors, like IBM, SAS, and Teradata, were also in the room to let you know that they can make use of Hadoop on the backend and then turn your results into some attractive data visualization. This is a big deal for many enterprise customers who already have these licenses and want to leverage Hadoop.
Hadoop Summit attendance has grown 10-fold since the first event in 2008, and the technology appears poised for significant growth. If you’re a Hadoop aficionado or vendor, Hadoop Summit is worth putting on your calendar for 2014.
Jason Levitt is a writer and consultant with 25 years of IT experience in roles ranging from technology evangelist to software developer. He is currently a business development manager for Spirit.io as well as a Hadoop trainer for Global Knowledge.