Hadoop学习笔记
Hadoop介绍
Hadoop是Google云计算框架的开源实现,是一个分布式存储和分布式计算的框架,主要包括HDFS和MapReduce的实现。
HDFS
HDFS由一个NameNode和多个DataNode组成,其中NameNode相当于系统的元数据存放地,它是Hadoop系统的神经中枢,而多个DataNode存储数据。
MapReduce:分布式计算
一个调用客户端由一个JobTracker代表,它将一个任务划分为多个子任务,每个子任务分别由一个TaskTracker负责。TaskTracker和DataNode在一起,本地数据本地计算。
Hadoop的子项目
Avro?
A data serialization system.
Cassandra?:
A scalable multi-master database with no single points of failure.
Chukwa?:
A data collection system for managing large distributed systems.
HBase?:
A scalable, distributed database that supports structured data storage for large tables.
Hive?:
A data warehouse infrastructure that provides data summarization and ad hoc querying.
Mahout?:
A Scalable machine learning and data mining library.
Pig?:
A high-level data-flow language and execution framework for parallel computation.
ZooKeeper?:
A high-performance coordination service for distributed applications.
呵呵,只是入门。不建议周末学习啊,出去走走更好!