Analysis of Parallel Bayesian Classification

Published: 2012-12-21 12:03:49  Author: rapoo

1 Bayes Trainer

Package: org.apache.mahout.classifier.bayes

Implementation

The implementation is divided up into three parts:

    The Trainer -- responsible for doing the counting of the words and the labels

    The Model -- responsible for holding the training data in a useful way

    The Classifier -- responsible for using the trainer's output to determine the category of previously unseen documents
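
The three-part split can be illustrated with a small in-memory sketch. This is not Mahout's implementation: the class name TinyNaiveBayes and its methods are hypothetical, it runs on a single machine with no Hadoop, and it assumes plain word counts with add-one smoothing, whereas the driver names in this package suggest Mahout applies TF-IDF weighting and normalization.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical single-machine sketch; none of these names come from Mahout.
public class TinyNaiveBayes {

    // Model: holds the counts gathered by the trainer.
    private final Map<String, Map<String, Integer>> wordCountsByLabel = new HashMap<>();
    private final Map<String, Integer> totalWordsByLabel = new HashMap<>();
    private final Map<String, Integer> docsByLabel = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Trainer: counts the words and the label of one labeled document.
    void train(String label, String... words) {
        docsByLabel.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts =
                wordCountsByLabel.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
            totalWordsByLabel.merge(label, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    // Classifier: returns the label with the highest log-score for an unseen
    // document, or null if nothing has been trained yet.
    String classify(String... words) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsByLabel.keySet()) {
            // Start from the log prior of the label.
            double score = Math.log(docsByLabel.get(label) / (double) totalDocs);
            Map<String, Integer> counts = wordCountsByLabel.get(label);
            int total = totalWordsByLabel.getOrDefault(label, 0);
            for (String w : words) {
                int c = counts.getOrDefault(w, 0);
                // Add-one smoothing (an assumption of this sketch) keeps
                // unseen words from driving the score to minus infinity.
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TinyNaiveBayes nb = new TinyNaiveBayes();
        nb.train("hockey", "puck", "stick", "goalie", "ice");
        nb.train("football", "field", "pigskin", "turf", "tackle");
        System.out.println(nb.classify("ice", "puck"));
    }
}
```

Keeping the counts in their own fields mirrors the split described above: train only writes to the model, and classify only reads from it.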

1 Trainer

The trainer is manifested in several classes:

    BayesDriver

    Creates the Hadoop Bayes job and outputs the model. This class wraps four map/reduce classes.

    common.BayesFeatureDriver

    common.BayesTfIdfDriver

    common.BayesWeightSummerDriver

    BayesThetaNormalizerDriver

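The trainer's input is in KeyValueTextInputFormat: the first token on each line is the class label and the remaining tokens are the features (words), as in the following format: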

hockey puck stick goalie forward defenseman referee ice checking slapshot helmet
football field football pigskin referee helmet turf tackle

hockey and football are the class labels; the remaining words are the features.
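
As a rough illustration of this format, the sketch below splits one training line into its label and its features. The class TrainingLineParser and its nested Example type are hypothetical names, not part of Mahout. The sketch assumes tokens are separated by any whitespace; note that Hadoop's KeyValueTextInputFormat itself splits the key from the value at the first tab by default.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not a Mahout class.
public class TrainingLineParser {

    /** One training record: a class label plus its features (words). */
    static final class Example {
        final String label;
        final List<String> features;

        Example(String label, List<String> features) {
            this.label = label;
            this.features = features;
        }
    }

    /** Splits a line on whitespace: first token is the label, the rest are features. */
    static Example parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        // Splitting an empty string yields a single empty token.
        if (tokens[0].isEmpty()) {
            throw new IllegalArgumentException("empty training line");
        }
        List<String> features = Arrays.asList(tokens).subList(1, tokens.length);
        return new Example(tokens[0], features);
    }

    public static void main(String[] args) {
        Example e = parse("football field football pigskin referee helmet turf tackle");
        System.out.println(e.label + " -> " + e.features);
    }
}
```

A word may appear both as a label and as a feature, as "football" does in the second sample line; only its position on the line decides which role it plays.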

2 Model
