Hadoop Study Notes

Published: 2012-06-29 15:48:47 Author: rapoo

Introduction to Hadoop

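Hadoop is an open-source implementation of the distributed computing framework that Google described in its GFS and MapReduce papers. It is a framework for distributed storage and distributed computation, and its two main parts are HDFS and MapReduce.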

HDFS

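HDFS consists of one NameNode and multiple DataNodes. The NameNode is where the system's metadata lives, which makes it the nerve center of a Hadoop system, while the DataNodes store the actual data.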
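
To make the split between metadata and data concrete, here is a minimal in-memory sketch. It is not the HDFS API: the `DataNode` and `NameNode` classes, the 4-character block size, and the round-robin placement are all assumptions made for illustration. Real HDFS uses much larger blocks and also replicates each block, which this toy leaves out.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy DataNode: stores block contents keyed by block id."""
    name: str
    blocks: dict = field(default_factory=dict)


@dataclass
class NameNode:
    """Toy NameNode: holds only metadata, never file contents."""
    datanodes: list
    block_size: int = 4  # assumed tiny block size, for illustration only
    files: dict = field(default_factory=dict)      # path -> list of block ids
    locations: dict = field(default_factory=dict)  # block id -> DataNode name

    def write(self, path, data):
        """Split data into blocks and spread them over the DataNodes."""
        block_ids = []
        for offset in range(0, len(data), self.block_size):
            block_id = f"{path}#{len(block_ids)}"
            # Round-robin placement: an assumption, not HDFS's real policy.
            node = self.datanodes[len(self.locations) % len(self.datanodes)]
            node.blocks[block_id] = data[offset:offset + self.block_size]
            self.locations[block_id] = node.name
            block_ids.append(block_id)
        self.files[path] = block_ids

    def read(self, path):
        """Look up block locations in the metadata, then fetch each block."""
        by_name = {node.name: node for node in self.datanodes}
        return "".join(
            by_name[self.locations[block_id]].blocks[block_id]
            for block_id in self.files[path]
        )
```

Writing `"hello world"` to two DataNodes through this toy NameNode yields three blocks, and reading the path back reassembles the original string. The NameNode object itself never holds any file content, only the block list and the block locations.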

MapReduce: Distributed Computation

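A client submits its job to the JobTracker, which splits the job into multiple subtasks and hands each one to a TaskTracker. TaskTrackers run on the same machines as DataNodes, so data is processed locally, on the node where it is stored.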
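
The flow of a job can be sketched with the classic word count. This is a single-process toy in plain Python, not Hadoop code: the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are made up for illustration, and each inner list in `splits` stands in for the input that one TaskTracker would process locally.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(splits):
    """Run the map step over every input split, then shuffle and reduce."""
    emitted = []
    for split in splits:          # one split per (imaginary) TaskTracker
        for line in split:
            emitted.extend(map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())
```

For example, `run_job([["hadoop stores data", "hadoop computes"], ["data is local"]])` counts `hadoop` and `data` twice each and every other word once. In a real cluster the map calls would run in parallel on different machines; here they simply run one after another.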


Hadoop Subprojects

Avro:

A data serialization system.

Cassandra:

A scalable multi-master database with no single points of failure.

Chukwa:

A data collection system for managing large distributed systems.

HBase:

A scalable, distributed database that supports structured data storage for large tables.

Hive:

A data warehouse infrastructure that provides data summarization and ad hoc querying.

Mahout:

A scalable machine learning and data mining library.

Pig:

A high-level data-flow language and execution framework for parallel computation.

ZooKeeper:

A high-performance coordination service for distributed applications.
Heh, this is only an introduction. I don't recommend studying on the weekend, by the way; going out for a walk is better!
