This final segment of our interview with Brad Johnson, who manages Apache Hadoop certification at Cloudera University and recently completed the testing upgrades to CDH 4.1, covers student preparation.
Ron: While there aren’t any official prerequisites to taking a Cloudera exam, what would you recommend as a baseline of knowledge and expertise?
Brad: Well, I think if you have good experience as a Linux administrator, or if you have good Java programming experience, you know, you can move into Hadoop and do well quickly. That experience, I think, translates into the exams. I think you would be able to come up to speed. The people I see who come from those backgrounds come up to speed a lot quicker.
Ron: Yep, that’s good. So, students have this mysterious exam in front of them. How does the student know when they’re prepared to take that next step? It’s a pretty expensive proposition for students. There are lots of practice exams out there; I wonder what is available for Cloudera?
Brad: Well, practice tests are on the road map, and now that I’ve gotten the CDH 4.1 tests out the door, practice tests are coming close on their heels. We’ve been working on that for a while. By the time this interview goes live, the main things we have are these: we’ve posted our main test objectives, we publish what’s covered on the tests, and we publish sample questions so candidates can get a feel for the style and types of test items. A full-fledged exam prep is certainly forthcoming, but obviously, today, we’re not there yet.
Ron: I gotcha, in the pipeline. Okay, do you have any recommended reading lists or suggestions for students looking to certify beyond what’s available through Cloudera?
Brad: Yeah, I think Tom White’s [Hadoop:] The Definitive Guide remains central and key to anyone’s test prep. I find myself returning to that book during test reviews fairly consistently.
Outside of Tom’s book, Eric Sammer’s Hadoop Operations is a great resource, especially for the Administrator test. Both Amandeep Khurana’s HBase in Action and Lars George’s HBase: The Definitive Guide are excellent for HBase prep. And, of course, to get back to that whole value proposition, you’ll notice all those people are Cloudera folks who are writing the books and also helping to review the tests. And so it’s extremely helpful to have that kind of talent in-house.
We also have a bunch of free video tutorials on Cloudera.com for people to come up to speed. We also offer free VMs of CDH, and of course you can download the software for free. So you can download the source, install it, configure it, check out the web UIs, and understand what’s going on and how to configure a cluster. There are kind of unlimited resources around. There are free datasets out there to play with. There really are quite a few resources in the community that will get you up to speed. The thing that we’re missing at this point is really, “Okay, I’ve got my skills and knowledge, what do I expect from the test?” That’s where that whole test prep stuff comes in, and it’s incumbent upon us to get that test prep out.
Ron: We’re looking forward to that, Brad. I noticed how you didn’t put any dates here on your comments about when it’s coming out. We’ll look for it sometime soon in the near future.
Brad: We’re close to releasing them. At least the first one this quarter, by the end of this year.
Here’s the hard thing about practice tests. With a practice test, you really get a chance to spend a lot of time with it, and hammer on it, and try and figure out ways to make it wrong. We’re taking a lot of time to say, “This is the answer, this is why this is the answer,” because practice tests can become a substantial topic for debate. I want to make sure we roll these things out properly so that we’re not creating more anxiety in our community. I want to be able to offer this as a resource and have it work well and help people prepare for a test and not create something that is contentious.
Ron: I got you, like a fine line between giving them the answers and really, truly preparing them as an extension of their education.
Brad: Right, and so we want these practice tests to be a real learning experience. We want to provide references, some of which will be links back to The Definitive Guide or some of these books that we talked about before, to say if you want to read more about this, or why we’re saying this is the answer, here are some places to look for further study. And that just takes time.
Ron: Okay, I guess that’s the answer. Now we’re down to our very last question. What do you see on the horizon for Cloudera certification exams? Can you give us a little bit of a peek into the future?
Brad: I think you see the future with our just-announced Data Science certification. This is going to be our first performance-based certification, which will require the candidate to pass a written test very similar to our current test. That’s a prequalifying test. And then you must successfully complete a hands-on, real-world data science challenge working on a live system. So the candidate will have to acquire data from disparate sources using a variety of techniques. They’ll have to clean the data, get it into a cluster, and transform the data using any tools and languages they want. We’re not going to limit how they work. And they’ll have to produce a usable data product that solves a particular challenge.
So it’s hands-on with real data on a real system. There are so many great data sets out there and so many great challenges.
Ron: So, Brad, will this Data Science certification be sort of an extension between – is there a hierarchy within these certifications or is it a complete standalone that sort of represents the BI part of it, the analytics part of it?
Brad: In the case of Data Science, the certification is its own thing. On the training side, we’re now live with a very exciting class called Introduction to Data Science: Building Recommender Systems. In that class, students will go through all these common processes and workflows that building a data product requires and really come to understand a data science project lifecycle, and then build a recommendation engine around a movie rating system. And so the test will be
Ron: like Netflix.
Brad: Yes, something similar to that. And I think you’re going to see our certification follow a similar path in the sense of looking at real data and solving interesting problems. Data Science is, I think, a little unique in that it allows us to sort of set up a series of problems and say, come up with a solution for it. And we can evaluate that solution but also create a series of checks along the way to make sure a candidate understands a broad range of data challenges.
Ron: So looking forward to seeing that all come out into the public, and it was a privilege to get a sneak preview.