Mahout learning resources

https://cwiki.apache.org/MAHOUT/clustering-of-synthetic-control-data.html
http://www.cnblogs.com/biyeymyhjob/archive/2012/07/18/2597711.html
file:///home/lixiaoming/tools/hadoop-1.0.4/docs/cn/index.html

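Euclidean distance: used to compute similarity. It is simply the straight-line distance between any two points in a coordinate system; the smaller the distance, the higher the similarity.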
http://www.blogjava.net/spec-second/archive/2008/08/17/222609.html
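
As a rough illustration of the Euclidean distance described above, here is a minimal Python sketch. The function name euclidean_distance and the sample points are made up for this note; this is not Mahout's EuclideanDistanceMeasure, only the same formula.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points of equal dimension."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Illustrative points (assumed, not taken from synthetic_control.data).
origin = [0.0, 0.0]
near = [3.0, 4.0]
far = [6.0, 8.0]

print(euclidean_distance(origin, near))  # 5.0
print(euclidean_distance(origin, far))   # 10.0
```

Here near is closer to origin than far is, so under this measure it counts as the more similar point.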

export MAHOUT_HOME=/home/lixiaoming/open-sources/mahout-distribution-0.7
export HADOOP_HOME=/home/lixiaoming/tools/hadoop-1.0.4

$HADOOP_HOME/bin/hadoop fs -mkdir testdata
$HADOOP_HOME/bin/hadoop fs -put /home/lixiaoming/synthetic_control.data testdata


$MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.canopy.Job

EuclideanDistanceMeasure: the Euclidean distance measure
org.apache.mahout.common.distance.CosineDistanceMeasure
Cosine distance; the best fit for text data.
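
A small Python sketch may help show why cosine distance suits text. It assumes the common definition, distance = 1 - cosine similarity. The function name and the term-count vectors are hypothetical, and this is not Mahout's own code.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: depends on direction, not on vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical term-count vectors over a three-word vocabulary.
doc_short = [1.0, 2.0, 0.0]   # a short document
doc_long = [2.0, 4.0, 0.0]    # same word mix, twice as long
doc_other = [0.0, 0.0, 3.0]   # shares no words with the other two

print(cosine_distance(doc_short, doc_long))   # very close to 0
print(cosine_distance(doc_short, doc_other))  # 1.0, nothing in common
```

Because only the direction of the vectors matters, a long and a short document with the same word proportions come out as near neighbours, which Euclidean distance would not give.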

$HADOOP_HOME/bin/hadoop fs -cat output/*

Recommendation

Clustering
How should the canopy parameters be set?
InputDriver:
This class converts text files containing space-delimited floating point numbers into Mahout sequence files of VectorWritable suitable for input to the clustering jobs in particular, and any Mahout job requiring this input in general.
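
To make the input format concrete, here is a Python sketch of the parsing step only: turning lines of space-delimited numbers into vectors. The function name parse_vectors and the sample lines are assumptions for illustration. It does not write Mahout sequence files or VectorWritable objects.

```python
def parse_vectors(lines):
    """Turn lines of space-delimited numbers into lists of floats.

    Blank lines are skipped; each remaining line becomes one vector.
    """
    vectors = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        vectors.append([float(field) for field in fields])
    return vectors


# Made-up sample lines in the same space-delimited layout.
sample_lines = ["1.0 2.5 3.0", "", "4 5 6"]
print(parse_vectors(sample_lines))  # [[1.0, 2.5, 3.0], [4.0, 5.0, 6.0]]
```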
CanopyDriver:
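
On the canopy parameters: canopy clustering takes two distance thresholds, a loose one (T1) and a tight one (T2), with T1 > T2. The sketch below is a textbook single-machine version in Python, written only to show what the two thresholds do. The function names and sample points are hypothetical, and it is not the MapReduce implementation behind CanopyDriver.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_clusters(points, t1, t2, distance=euclidean_distance):
    """Group points into canopies using a loose (t1) and a tight (t2) threshold."""
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        # Take the next unassigned point as a new canopy center.
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for point in remaining:
            d = distance(center, point)
            if d < t1:
                # Within the loose threshold: the point joins this canopy.
                members.append(point)
            if d >= t2:
                # Outside the tight threshold: it may still seed or join others.
                still_remaining.append(point)
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


# Made-up one-dimensional layout: two groups plus a point in between.
points = [[0.0, 0.0], [1.0, 0.0], [2.5, 0.0], [10.0, 0.0], [11.0, 0.0]]
for center, members in canopy_clusters(points, t1=3.0, t2=1.5):
    print(center, members)
```

In this sketch a larger T2 removes more points per pass and so yields fewer canopies, while a larger T1 makes canopies overlap more. Suitable values depend on the scale of the distances in the data, so it is worth looking at typical pairwise distances before choosing them.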

Reposted from bjmike.iteye.com/blog/1878857