Abstract
Learning how to program and develop for the Hadoop platform can lead to lucrative new career opportunities in Big Data. But like the problems it solves, the Hadoop framework can be quite complex and challenging. Join Global Knowledge instructor and Technology Consultant Rich Morrow as he leads you through some of the hurdles and pitfalls students encounter on the Hadoop learning path. Building a strong foundation, leveraging online resources, and focusing on the basics with professional training can help neophytes across the Hadoop finish line.
Sample
If I've learned one thing in two decades of IT, it's that the learning never ends.
As the leader of an independent consulting firm, one of the most important decisions I make each year is choosing which technologies my consultants and I need to learn. If we can identify and quickly ramp up on technologies that really move the needle for our clients, then everyone wins. In the following pages, I'd like to walk you through the path I took in identifying Hadoop as a "must have" skill our clients would need, and how I quickly got up to speed on the technology.
Hadoop is a paradigm-shifting technology that lets you do things you could not do before - namely compile and analyze vast stores of data that your business has collected. "What would you want to analyze?" you may ask. How about customer click and/or buying patterns? How about buying recommendations? How about personalized ad targeting, or more efficient use of marketing dollars?
From a business perspective, Hadoop is often used to build deeper relationships with external customers, providing them with valuable features like recommendations, fraud detection, and social graph analysis. In-house, Hadoop is used for log analysis, data mining, image processing, extract-transform-load (ETL), network monitoring - anywhere you'd want to process gigabytes, terabytes, or petabytes of data.
Hadoop allows businesses to find answers to questions they didn't even know how to ask, providing insights into daily operations, driving new product ideas, or putting compelling recommendations and/or advertisements in front of consumers who are likely to buy.
The fact that Hadoop can do all of the above is not the compelling argument for its use. Other technologies that can and do address everything we've listed so far have been around for a long, long while. What makes Hadoop shine is that it performs these tasks in minutes or hours, for little or no cost, versus the days or weeks and substantial costs (licensing, product, specialized hardware) of previous solutions.
Hadoop does this by abstracting away all of the difficult work of analyzing large data sets, performing its work on commodity hardware, and scaling linearly: add twice as many worker nodes, and your processing will generally complete about twice as fast. With data sets growing larger and larger, Hadoop has become the go-to solution for businesses that need fast, reliable processing of large, growing data sets at little cost.
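To make that abstraction concrete, below is a minimal sketch of a Hadoop MapReduce job in Java, essentially the canonical word-count example from the Apache Hadoop documentation (input and output paths are placeholders supplied on the command line). Notice what the developer does not write: the framework handles splitting the input across the cluster, shipping the code to the data, shuffling intermediate results, and retrying failed tasks. You supply only a map function and a reduce function.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs once per input split, emitting (word, 1) for every word it sees.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: receives all counts for a given word and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // pre-aggregate on each node to cut network traffic
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The same fifty-odd lines run unchanged whether the cluster has two nodes or two thousand, which is exactly where the linear scaling comes from.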
Because it needs only commodity hardware to operate, Hadoop also works incredibly well with public cloud infrastructure: spin up a large cluster only when you need it, then turn everything off once the analysis is done. Big success stories here include The New York Times using Hadoop to convert about 4 million entities to PDF in just under 36 hours, and the well-known story of Pete Warden using it to analyze 220 million Facebook profiles in just under 11 hours for a total cost of $100. In the hands of a business-savvy technologist, Hadoop makes the impossible look trivial.
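As a sketch of that spin-up-and-tear-down pattern, the AWS CLI command below (the bucket names, job JAR, and paths are hypothetical placeholders) launches a transient Amazon EMR cluster that runs a single Hadoop job and terminates itself once the job completes:

# Launch a 10-node cluster, run one Hadoop step, then shut everything down.
aws emr create-cluster \
  --name "transient-wordcount" \
  --release-label emr-6.15.0 \
  --applications Name=Hadoop \
  --instance-type m5.xlarge \
  --instance-count 10 \
  --use-default-roles \
  --log-uri s3://my-bucket/emr-logs/ \
  --steps Type=CUSTOM_JAR,Name=WordCount,ActionOnFailure=TERMINATE_CLUSTER,Jar=s3://my-bucket/wordcount.jar,Args=[s3://my-bucket/input,s3://my-bucket/output] \
  --auto-terminate

Because the cluster exists only for the life of the job, you pay only for the instance-hours the analysis actually consumes, which is what makes price tags like Warden's $100 possible.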