Hadoop is a open-source platform designed to efficiently query and analyze data distributed across large clusters. It's built around MapReduce, Google's algorithm for rapidly creating a distributed index of the Internet.Nearly 100 tested, ready-to-use techniques
Conceptual overview of Hadoop and MapReduce
Real problems, real solutions
Because it's especially effective for "Big Data" systems, many well-known companies use Hadoop, including Apple, eBay and LinkedIn. Yahoo and Facebook each claim to have the largest Hadoop implementation, with petabytes of data spread across thousands of machines.
The theory behind MapReduce is straight-forward: break down a large unit of work into small parts that execute in parallel across a cluster. It gets more complicated when you start applying Hadoop to problems like complex queries, statistical calculations, real-time financial transactions, and machine learning. You need tested, practical techniques you can rely on to get the job done.
Hadoop in Practice collects nearly 100 Hadoop examples and presents them in a problem/solution format. Each technique addresses a specific task you'll face, like querying big data using Pig or writing a log file loader. You'll explore each problem step by step, learning both how to build and deploy that specific solution along with the thinking that went into its design. As you work through the tasks, you'll find yourself growing more comfortable with Hadoop and at home in the world of big data.
This book assumes you've already started exploring Hadoop and want concrete advice on how to use it in production.
Unless otherwise noted above, most orders ship within 1 to 2 days. We will promptly notify you if there is a stock problem with any items on your order and provide you with an estimated delivery date. If you have a firm need by date, please provide such information in the comment section at checkout.
Page Count (est.): 511
Pub Date: 10/8/2012