读书人

Hadoop custom InputFormat and OutputFormat

Published: 2013-02-19 11:11:41 · Author: rapoo


Hadoop's InputFormat and OutputFormat


The best example is the Vertica connector: although it is implemented as Pig UDFs, it is essentially a Hadoop InputFormat and OutputFormat, so it can be reused as-is in Hive. Download link: http://blackproof.iteye.com/blog/1791995


Below is a simple example of the InputFormat and OutputFormat used in a project when implementing a Hadoop join.

The Hadoop join itself is covered at http://blackproof.iteye.com/blog/1757530


Custom OutputFormat (the generic type parameters are the reducer's output key and value types):

import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Employee is the value type from the join example linked above (not shown here).
public class MyOutputFormat extends FileOutputFormat<Text, Employee> {

    @Override
    public RecordWriter<Text, Employee> getRecordWriter(TaskAttemptContext job)
            throws IOException, InterruptedException {
        Configuration conf = job.getConfiguration();
        Path file = getDefaultWorkFile(job, "");
        FileSystem fs = file.getFileSystem(conf);
        FSDataOutputStream fileOut = fs.create(file, false);
        return new MyRecordWriter(fileOut);
    }

    public static class MyRecordWriter extends RecordWriter<Text, Employee> {

        public static final String NEW_LINE = System.getProperty("line.separator");

        protected DataOutputStream out;
        private final byte[] keyValueSeparator;

        public MyRecordWriter(DataOutputStream out) {
            this(out, ":");
        }

        public MyRecordWriter(DataOutputStream out, String keyValueSeparator) {
            this.out = out;
            this.keyValueSeparator = keyValueSeparator.getBytes();
        }

        @Override
        public void write(Text key, Employee value) throws IOException, InterruptedException {
            // Emit "key:value" per line; skip the key and separator when the key is null.
            if (key != null) {
                out.write(key.toString().getBytes());
                out.write(keyValueSeparator);
            }
            out.write(value.toString().getBytes());
            out.write(NEW_LINE.getBytes());
        }

        @Override
        public void close(TaskAttemptContext context) throws IOException, InterruptedException {
            out.close();
        }
    }
}
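The Employee class is not shown in this post (it lives in the join example linked above). As a plain-Java sketch of the record layout this writer produces, and of the inverse parse a matching custom RecordReader would have to perform, here is a self-contained version with a hypothetical two-field Employee stand-in (newline handling omitted for clarity):

```java
// Sketch of the "key:value" line layout MyRecordWriter emits, plus the inverse
// parse a matching RecordReader would do. Employee here is a hypothetical
// stand-in; the real Writable class is defined in the linked join example.
public class RecordLayoutSketch {

    static class Employee {
        final String name;
        final String dept;
        Employee(String name, String dept) { this.name = name; this.dept = dept; }
        @Override public String toString() { return name + "," + dept; }
    }

    // Mirrors MyRecordWriter.write: key, separator, then value (newline omitted).
    static String format(String key, Employee value, String separator) {
        StringBuilder sb = new StringBuilder();
        if (key != null) {
            sb.append(key).append(separator);
        }
        return sb.append(value.toString()).toString();
    }

    // What a matching RecordReader would do with one line of that output:
    // split on the first separator into a key string and a value string.
    static String[] parse(String line, String separator) {
        int i = line.indexOf(separator);
        return new String[] { line.substring(0, i), line.substring(i + 1) };
    }

    public static void main(String[] args) {
        String line = format("1001", new Employee("alice", "hr"), ":");
        System.out.println(line);                  // 1001:alice,hr
        String[] kv = parse(line, ":");
        System.out.println(kv[0] + " | " + kv[1]); // 1001 | alice,hr
    }
}
```

In a job driver, the real format above would be wired in with job.setOutputFormatClass(MyOutputFormat.class), alongside the usual setOutputKeyClass/setOutputValueClass calls.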
