Run a MapReduce word count program written in Eclipse: package it as a jar and run it on a virtual machine


1. First prepare the file to count, word.txt. The words in it are separated by spaces, so the program will split on a single space when counting.

hello hadoop
hello yarn
hello zookeeper
hdfs hadoop
select from hadoop
select from yarn
mapReduce
MapReduce
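Before running the job, the expected counts for this sample can be checked with a quick local sketch (plain Java, outside Hadoop). Note that keys are case-sensitive, so mapReduce and MapReduce are counted as two different words:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExpectedCounts {
    static final String[] SAMPLE = {
        "hello hadoop", "hello yarn", "hello zookeeper", "hdfs hadoop",
        "select from hadoop", "select from yarn", "mapReduce", "MapReduce"
    };

    // Count words exactly as the job will: split each line on a single space
    static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // hello and hadoop appear 3 times; mapReduce and MapReduce stay separate
        System.out.println(count(SAMPLE));
    }
}
```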

2. Upload word.txt to the HDFS root directory

$ bin/hdfs dfs -put test/word.txt /

3. With the preparation done, write the code in Eclipse: the Map, Reduce, and Driver Java files

WordCountMap.java

The map method processes word.txt line by line: one map call is executed per input line.


package com.ijeffrey.mapreduce.wordcount.client;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
/**
* The key/value types output by map must match the reducer's input key/value types
* @author PXY
*
*/
public class WordCountMap extends Mapper<LongWritable, Text, Text, IntWritable> {

private Text keyout = new Text();
private IntWritable valueout = new IntWritable(1);

@Override
protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
throws IOException, InterruptedException {

String line = value.toString();
// words in word.txt are separated by spaces, so split on a space here
String[] words = line.split(" ");

// loop over the array and emit each word as a key/value pair
for (String word : words) {
keyout.set(word);
context.write(keyout, valueout);
}
}

}
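To see what a single map call emits, the split-and-emit logic above can be exercised outside Hadoop. This is a sketch: the returned list stands in for the pairs passed to context.write:

```java
import java.util.ArrayList;
import java.util.List;

public class MapSketch {
    // Mirrors the body of WordCountMap.map: split on spaces, emit (word, 1)
    static List<String> mapLine(String line) {
        List<String> emitted = new ArrayList<>();
        for (String word : line.split(" ")) {
            emitted.add(word + "\t1"); // stands in for context.write(keyout, valueout)
        }
        return emitted;
    }

    public static void main(String[] args) {
        // the first input line produces the pairs (hello,1) and (hadoop,1)
        System.out.println(mapLine("hello hadoop"));
    }
}
```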

WordCountReducer.java

package com.ijeffrey.mapreduce.wordcount.client;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
* The reducer's input key/value types must match the map's output key/value types
* map emits <hello,1> <world,1> <hello,1> <apple,1> ....
* reduce receives <apple,[1]> <hello,[1,1]> <world,[1]>
* @author PXY
*
*/
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
private IntWritable valueout = new IntWritable();

@Override
protected void reduce(Text key, Iterable<IntWritable> values,
Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
int count = 0; // running total for this key

// iterate and accumulate the sum
for (IntWritable value : values) {

// an IntWritable cannot be added to an int directly; use get() to convert it
count += value.get();
}

// wrap the total back into an IntWritable
valueout.set(count);

// write the final key/value pair for this reduce call
context.write(key, valueout);

}
}
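The summation in reduce can be checked the same way: for a key like hello whose value list is [1, 1, 1], the loop yields 3. A minimal local sketch, using plain ints in place of IntWritable:

```java
public class ReduceSketch {
    // Mirrors the body of WordCountReducer.reduce: sum the values for one key
    static int reduceValues(int[] values) {
        int count = 0;
        for (int value : values) {
            count += value; // in the real reducer: count += value.get()
        }
        return count;
    }

    public static void main(String[] args) {
        // hello arrives as <hello,[1,1,1]> and reduces to 3
        System.out.println(reduceValues(new int[] {1, 1, 1}));
    }
}
```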

WordCountDriver.java

package com.ijeffrey.mapreduce.wordcount.client;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
* Main entry point for running the job
* @author PXY
*
*/
public class WordCountDriver {
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
Configuration conf = new Configuration();

// get a Job object representing one MapReduce job
Job job = Job.getInstance(conf);

// set the jar's main class so the cluster can locate the program entry point
job.setJarByClass(WordCountDriver.class);

// specify the input data directory and the directory the results are written to
// e.g. bin/yarn jar share/hadoop/xxxxxxx.jar wordCount /wordCount/input/ /wordCount/output/
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

// tell the job which Mapper and Reducer classes to call
job.setMapperClass(WordCountMap.class);
job.setReducerClass(WordCountReducer.class);

// specify the types of the map output key/value pairs
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);

// specify the types of the reduce output key/value pairs
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

// submit the job and wait for it to complete
boolean result = job.waitForCompletion(true);
System.exit(result ? 0 : 1);

}
}

4. Package the finished code as a jar and run it on the cluster

Upload the jar containing our MapReduce program to the server and start the Hadoop services, then run it against word.txt in the HDFS root directory, writing the word count results to the output directory:

$ bin/yarn jar test/wordCount.jar com.ijeffrey.mapreduce.wordcount.client.WordCountDriver /word.txt /output

Note: when running the jar, specify the Driver's fully qualified class name.

After the run completes, view the output:

$ bin/hdfs dfs -text /output/part-r-00000
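In part-r-00000 the keys appear in Text's byte-order sort, where uppercase letters come before all lowercase ones, so MapReduce is listed before the lowercase words. A TreeMap-style sketch (not the actual Hadoop output) reproduces the ordering:

```java
import java.util.Collections;
import java.util.TreeSet;

public class KeyOrder {
    // Sort the sample's words the way Hadoop sorts Text keys: natural String order
    static TreeSet<String> sortedKeys() {
        TreeSet<String> keys = new TreeSet<>();
        Collections.addAll(keys,
            "hello", "hadoop", "yarn", "zookeeper", "hdfs",
            "select", "from", "mapReduce", "MapReduce");
        return keys;
    }

    public static void main(String[] args) {
        // 'M' (0x4D) sorts before every lowercase letter, so MapReduce comes first
        System.out.println(sortedKeys());
    }
}
```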


Origin www.cnblogs.com/huashengweilong/p/10924916.html