欢迎来到相识电子书!

标签:MapReduce

  • Hadoop权威指南(中文版)

    作者:(美) Tom White

    本书是您纵情享用数据之美的得力助手。作为处理海量数据集的理想工具,Apache Hadoop架构是MapReduce算法的一种开源应用,是Google(谷歌)开创其帝国的重要基石。本书内容丰富,展示了如何使用Hadoop构建可靠、可伸缩的分布式系统,程序员可从中探索如何分析海量数据集,管理员可以了解如何建立与运行Hadoop集群。. 本书完全通过案例学习来展示如何用Hadoop解决特殊问题,它将帮助您: 使用Hadoop分布式文件系统(HDFS)来存储海量数据集,通过MapReduce对这些数据集运行分布式计算.. 熟悉Hadoop的数据和I/O构件,用于压缩、数据集成、序列化和持久处理 洞悉编写MapReduce实际应用程序时常见陷阱和高级特性 设计、构建和管理专用的Hadoop集群或在云上运行Hadoop 使用Pig这种高级的查询语言来处理大规模数据 利用HBase这个Hadoop数据库来处理结构化和半结构化数据 学习Zookeeper,这是一个用于构建分布式系统的协作原语工具箱 如果您拥有海量数据,无论是GB级还是PB级,Hadoop都是完美的选择。本书是这方面最全面的参考。
  • Hadoop: The Definitive Guide

    作者:Tom White

    Apache Hadoop is ideal for organizations with a growing need to store and process massive application datasets. Hadoop: The Definitive Guide is a comprehensive resource for using Hadoop to build reliable, scalable, distributed systems. Programmers will find details for analyzing large datasets with Hadoop, and administrators will learn how to set up and run Hadoop clusters. The book includes case studies that illustrate how Hadoop solves specific problems. Organizations large and small are adopting Apache Hadoop to deal with huge application datasets. Hadoop: The Definitive Guide provides you with the key for unlocking the wealth this data holds. Hadoop is ideal for storing and processing massive amounts of data, but until now, information on this open-source project has been lacking -- especially with regard to best practices. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems. Programmers will find details for analyzing large datasets with Hadoop, and administrators will learn how to set up and run Hadoop clusters. With case studies that illustrate how Hadoop solves specific problems, this book helps you: * Learn the Hadoop Distributed File System (HDFS), including ways to use its many APIs to transfer data * Write distributed computations with MapReduce, Hadoop's most vital component * Become familiar with Hadoop's data and IO building blocks for compression, data integrity, serialization, and persistence * Learn the common pitfalls and advanced features for writing real-world MapReduce programs * Design, build, and administer a dedicated Hadoop cluster * Use HBase, Hadoop's database for structured and semi-structured data And more. Hadoop: The Definitive Guide is still in progress, but you can get started on this technology with the Rough Cuts edition, which lets you read the book online or download it in PDF format as the manuscript evolves.