读书人

Examples of Operating HDFS Through Its Java API

Published: 2012-11-08 08:48:11  Author: rapoo

20:55, 2010-6-2. Environment: Hadoop 0.20.2 on CentOS 5.4, java version "1.6.0_20-ea", single-machine Hadoop setup.
This walkthrough is based mainly on http://myjavanotebook.blogspot.com/2008/05/hadoop-file-system-tutorial.html
1. Copy a file from the local file system to HDFS

The srcFile variable needs to contain the full name (path + file name) of the file in the local file system. The dstFile variable needs to contain the desired full name of the file in the Hadoop file system.

  Configuration config = new Configuration();
  FileSystem hdfs = FileSystem.get(config);
  Path srcPath = new Path(srcFile);
  Path dstPath = new Path(dstFile);
  hdfs.copyFromLocalFile(srcPath, dstPath);



2. Create an HDFS file

The fileName variable contains the file name and path in the Hadoop file system. The content of the file is the buff variable, an array of bytes.

  // byte[] buff - the content of the file

  Configuration config = new Configuration();
  FileSystem hdfs = FileSystem.get(config);
  Path path = new Path(fileName);
  FSDataOutputStream outputStream = hdfs.create(path);
  outputStream.write(buff, 0, buff.length);
  outputStream.close();  // flush and release the stream


3. Rename an HDFS file

In order to rename a file in the Hadoop file system, we need the full name (path + name) of the file we want to rename. The rename method returns true if the file was renamed, otherwise false.

  Configuration config = new Configuration();
  FileSystem hdfs = FileSystem.get(config);
  Path fromPath = new Path(fromFileName);
  Path toPath = new Path(toFileName);
  boolean isRenamed = hdfs.rename(fromPath, toPath);



4. Delete an HDFS file

In order to delete a file in the Hadoop file system, we need the full name (path + name) of the file we want to delete. The delete method returns true if the file was deleted, otherwise false.

  Configuration config = new Configuration();
  FileSystem hdfs = FileSystem.get(config);
  Path path = new Path(fileName);
  boolean isDeleted = hdfs.delete(path, false);

Recursive delete:

  Configuration config = new Configuration();
  FileSystem hdfs = FileSystem.get(config);
  Path path = new Path(fileName);
  boolean isDeleted = hdfs.delete(path, true);
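The second argument controls recursion: with false, deleting a non-empty directory does not succeed (depending on the Hadoop version, it may throw an IOException rather than return false). The same semantics can be sketched with the plain JDK file API, no Hadoop required; the directory name and helper method below are illustrative only:

```java
import java.io.File;
import java.io.IOException;

public class RecursiveDeleteDemo {
    // Hypothetical helper mirroring hdfs.delete(path, true):
    // removes all children first, then the directory itself.
    static boolean deleteRecursively(File f) {
        File[] children = f.listFiles(); // null if f is a plain file
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        return f.delete();
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "delete-demo");
        dir.mkdir();
        new File(dir, "child.txt").createNewFile();

        // Like hdfs.delete(path, false): fails on a non-empty directory.
        System.out.println("non-recursive delete: " + dir.delete());

        // Like hdfs.delete(path, true): children go first, so this succeeds.
        System.out.println("recursive delete: " + deleteRecursively(dir));
    }
}
```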


5. Get an HDFS file's last modification time

In order to get the last modification time of a file in the Hadoop file system, we need the full name (path + name) of the file.

  Configuration config = new Configuration();
  FileSystem hdfs = FileSystem.get(config);
  Path path = new Path(fileName);
  FileStatus fileStatus = hdfs.getFileStatus(path);
  long modificationTime = fileStatus.getModificationTime();


6. Check if a file exists in HDFS

In order to check the existence of a file in the Hadoop file system, we need the full name (path + name) of the file we want to check. The exists method returns true if the file exists, otherwise false.

  Configuration config = new Configuration();
  FileSystem hdfs = FileSystem.get(config);
  Path path = new Path(fileName);
  boolean isExists = hdfs.exists(path);


7. Get the locations of a file in the HDFS cluster

A file can exist on more than one node in the Hadoop file system cluster for two reasons: based on the HDFS cluster configuration, Hadoop splits files into blocks that are stored on different nodes; and, for redundancy, Hadoop keeps more than one copy of each block on different nodes (the default replication factor is three).

  Configuration config = new Configuration();
  FileSystem hdfs = FileSystem.get(config);
  Path path = new Path(fileName);
  FileStatus fileStatus = hdfs.getFileStatus(path);

  // The original tutorial passed `path` here, which is wrong:
  // getFileBlockLocations takes the FileStatus as its first argument.
  BlockLocation[] blkLocations = hdfs.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
  int blkCount = blkLocations.length;
  for (int i = 0; i < blkCount; i++) {
    String[] hosts = blkLocations[i].getHosts();
    // Do something with the block hosts
  }
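One pitfall when inspecting the hosts: passing a String[] straight to println goes through Object.toString() and prints only a type tag and identity hash such as [Ljava.lang.String;@72ffb, which is exactly what the sample run at the end of this post shows. A minimal, Hadoop-free sketch (the host names are made up):

```java
import java.util.Arrays;

public class PrintHostsDemo {
    public static void main(String[] args) {
        // Stand-in for BlockLocation.getHosts(); these names are illustrative.
        String[] hosts = {"node1", "node2", "node3"};

        // Arrays inherit Object.toString(), so this prints something like
        // [Ljava.lang.String;@72ffb rather than the host names.
        System.out.println(hosts);

        // Arrays.toString renders the elements themselves.
        System.out.println(Arrays.toString(hosts)); // [node1, node2, node3]
    }
}
```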


8. Get a list of all the node host names in the HDFS cluster

This method casts the FileSystem object to a DistributedFileSystem object, so it works only when Hadoop is configured as a cluster. Running Hadoop on the local machine only, in a non-cluster configuration, will cause this method to throw an exception.

  Configuration config = new Configuration();
  FileSystem fs = FileSystem.get(config);
  DistributedFileSystem hdfs = (DistributedFileSystem) fs;
  DatanodeInfo[] dataNodeStats = hdfs.getDataNodeStats();
  String[] names = new String[dataNodeStats.length];
  for (int i = 0; i < dataNodeStats.length; i++) {
      names[i] = dataNodeStats[i].getHostName();
  }


Example program

/*
 * Demonstrates the HDFS Java API.
 */


import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.hdfs.*;
import org.apache.hadoop.hdfs.protocol.*;
import java.util.Date;

public class DFSOperater {

    /**
     * @param args
     */
    public static void main(String[] args) {

        Configuration conf = new Configuration();

        try {
            // Get a list of all the node host names in the HDFS cluster
            FileSystem fs = FileSystem.get(conf);
            DistributedFileSystem hdfs = (DistributedFileSystem) fs;
            DatanodeInfo[] dataNodeStats = hdfs.getDataNodeStats();
            String[] names = new String[dataNodeStats.length];
            System.out.println("list of all the nodes in HDFS cluster:");
            for (int i = 0; i < dataNodeStats.length; i++) {
                names[i] = dataNodeStats[i].getHostName();
                System.out.println(names[i]);
            }

            Path f = new Path("/user/cluster/dfs.txt");

            // Check if the file exists in HDFS
            boolean isExists = fs.exists(f);
            System.out.println("The file exists? [" + isExists + "]");

            // If the file exists, delete it
            if (isExists) {
                boolean isDeleted = hdfs.delete(f, false); // false: not recursive
                if (isDeleted) {
                    System.out.println("now delete " + f.getName());
                }
            }

            // Create the file and write to it
            System.out.println("create and write [" + f.getName() + "] to hdfs:");
            FSDataOutputStream os = fs.create(f, true, 0);
            for (int i = 0; i < 10; i++) {
                os.writeChars("test hdfs ");
            }
            os.writeChars("\n");
            os.close();

            // Get the locations of the file in HDFS
            System.out.println("locations of file in HDFS:");
            FileStatus filestatus = fs.getFileStatus(f);
            BlockLocation[] blkLocations = fs.getFileBlockLocations(filestatus, 0, filestatus.getLen());
            int blkCount = blkLocations.length;
            for (int i = 0; i < blkCount; i++) {
                String[] hosts = blkLocations[i].getHosts();
                // Do something with the block hosts. Note: printing the array
                // itself only shows its identity hash, as the run output
                // below demonstrates.
                System.out.println(hosts);
            }

            // Get the file's last modification time,
            // measured in milliseconds since the epoch
            long modificationTime = filestatus.getModificationTime();
            Date d = new Date(modificationTime);
            System.out.println(d);

            // Read the file back from HDFS. Note: readUTF() does not match
            // the writeChars() format used above, which is why the run
            // output below starts with "est hdfs" instead of "test hdfs".
            System.out.println("read [" + f.getName() + "] from hdfs:");
            FSDataInputStream dis = fs.open(f);
            System.out.println(dis.readUTF());
            dis.close();

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}



After compiling, I copied the jar to node1 and ran it there (embarrassingly, I haven't figured out the Eclipse plugin yet):

[cluster /opt/hadoop/source]$ cp /opt/winxp/hadoop/dfs_operator.jar .
[cluster /opt/hadoop/source]$ hadoop jar dfs_operator.jar DFSOperater
list of all the nodes in HDFS cluster:
node1
The file exists? [true]
now delete dfs.txt
create and write [dfs.txt] to hdfs:
locations of file in HDFS:
[Ljava.lang.String;@72ffb
Wed Jun 02 18:29:14 CST 2010
read [dfs.txt] from hdfs:
est hdfs test hdfs test hdfs test hdfs test hdfs test hdfs
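The truncated "est hdfs ..." line is not an HDFS problem: the program writes with writeChars() (two bytes per character, UTF-16) but reads with readUTF(), which expects a two-byte length prefix followed by modified UTF-8. The first byte pair of the leading 't' is consumed as the length, and the NUL bytes interleaved with the remaining characters are invisible on the terminal. The same mismatch can be reproduced with the plain JDK stream classes, no Hadoop needed:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class WriteCharsReadUtfDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (int i = 0; i < 10; i++) {
            out.writeChars("test hdfs "); // two bytes per char: 00 74 00 65 ...
        }
        out.close();

        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        // readUTF() treats the first two bytes (00 74, the 't') as a length
        // of 116, then decodes the next 116 bytes; the 00 bytes come through
        // as invisible NUL characters.
        String s = in.readUTF();
        System.out.println(s.replace("\u0000", "")); // est hdfs test hdfs ...
    }
}
```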



It ran successfully! Check the output file:

[cluster /opt/hadoop/source]$ hadoop fs -cat dfs.txt
test hdfs test hdfs test hdfs test hdfs test hdfs test hdfs test hdfs test hdfs test hdfs test hdfs
