Environment
- JDK 1.8
- IntelliJ IDEA 2018.1
- Hadoop 2.6.0 (Hadoop is not installed locally)
- Maven 3.5.4
Create word count project
- Create a new Maven Java project in IDEA (and adjust the JDK that Maven uses if needed)
Configure pom dependencies
- pom.xml file
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
</properties>

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
        <groupId>commons-cli</groupId>
        <artifactId>commons-cli</artifactId>
        <version>1.2</version>
    </dependency>
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
    </dependency>
</dependencies>
Create mapper class
package com.lens.task;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;
import java.util.StringTokenizer;

/**
 * @author lens
 * @create 2020-02-25 10:24
 */
public class VoteCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    // Reused for every record to avoid allocating new Writable objects.
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Split the input line on whitespace and emit (word, 1) for each token.
        StringTokenizer words = new StringTokenizer(value.toString());
        while (words.hasMoreTokens()) {
            word.set(words.nextToken());
            context.write(word, one);
        }
    }
}
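To see what the mapper emits without starting a job, the tokenizing step can be tried in plain Java. The following is only a sketch: the class name MapSketch and the "word, tab, 1" string form are illustrative assumptions, not part of the project or of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper: mimics the tokenizing step of VoteCountMapper without Hadoop.
public class MapSketch {

    // Returns one "word<TAB>1" record per whitespace-separated token, in input order.
    static List<String> mapLine(String line) {
        List<String> records = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            records.add(tokens.nextToken() + "\t1");
        }
        return records;
    }

    public static void main(String[] args) {
        for (String record : mapLine("hello world hello")) {
            System.out.println(record);
        }
    }
}
```

Note that a repeated word produces a separate record each time; adding the records up is left to the reducer.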
Create reducer class
package com.lens.task;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

/**
 * @author lens
 * @create 2020-02-25 10:24
 */
public class VoteCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum all counts emitted for this word.
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        result.set(count);
        context.write(key, result);
    }
}
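The reducer's summation can also be illustrated in plain Java. This is a minimal sketch, assuming each token contributes a count of 1; ReduceSketch is a hypothetical name, and the TreeMap only approximates the sorted key order that a single-reducer job produces.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: mimics grouping by key and summing, as VoteCountReducer does.
public class ReduceSketch {

    // Adds 1 to the running total of each word; TreeMap keeps the keys sorted.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(Arrays.asList("hello", "world", "hello"));
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```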
Create VoteCount driver class
package com.lens.task;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @author lens
 * @create 2020-02-25 10:22
 */
public class VoteCount extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new VoteCount(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("Incorrect input, expected: [input] [output]");
            return -1;
        }

        Configuration conf = this.getConf();
        // Job.getInstance replaces the deprecated Job constructor.
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(VoteCount.class);

        job.setMapperClass(VoteCountMapper.class);
        job.setReducerClass(VoteCountReducer.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Both the map and the reduce phase emit (Text, IntWritable) pairs.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // waitForCompletion submits the job itself, so no separate submit() call is needed.
        return job.waitForCompletion(true) ? 0 : 1;
    }
}
Note: FileInputFormat and FileOutputFormat must be imported from the org.apache.hadoop.mapreduce.lib packages (lib.input and lib.output), not from the older org.apache.hadoop.mapred package.
Create the word count input directory
The program counts the words in the input files and writes the counts to the output directory.
First, set up the input path: create a new folder named input in the project, at the same level as src, and add one or more text files to it as sample input.
Note: open File -> Project Structure, select Modules in the dialog that pops up, and mark the input folder as Excluded.
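If you prefer to create the sample input from code instead of by hand, something like the following would do. It is a sketch under assumptions: the class name InputSetup, the file name sample.txt, and the two sample lines are all made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: creates the input folder and one small sample text file.
public class InputSetup {

    // Creates inputDir if it is missing and writes a two-line sample file into it.
    static Path writeSample(Path inputDir) throws IOException {
        Files.createDirectories(inputDir);
        Path file = inputDir.resolve("sample.txt"); // assumed file name
        Files.write(file, Arrays.asList("hello world", "hello hadoop"), StandardCharsets.UTF_8);
        return file;
    }

    public static void main(String[] args) throws IOException {
        // "input" is resolved against the working directory, normally the project root in IDEA.
        System.out.println("Wrote " + writeSample(Paths.get("input")));
    }
}
```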
Configure run parameters
Here you need to configure the input and output paths that the main class VoteCount requires at run time.
In the IntelliJ menu bar, select Run -> Edit Configurations, and in the dialog that pops up click + to create a new Application configuration. Set Main class to VoteCount (you can click the ... button on the right to select it) and set Program arguments to input/ output/. The input path is the input folder created earlier, and the output path is output (this folder does not need to be created).
Run
After the configuration is complete, select Run -> Run 'VoteCount' in the menu bar to start the MapReduce program. When the program finishes, an output folder appears in the project tree on the upper left, and the part-r-00000 file inside it holds the result.
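TextOutputFormat writes one record per line with the key and value separated by a tab by default, so the result file is easy to read back. The sketch below assumes that default separator and the output/part-r-00000 location used above; ResultReader is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: parses "word<TAB>count" lines as written by the job.
public class ResultReader {

    // Keeps the file's line order; blank lines are skipped.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            int tab = line.indexOf('\t');
            counts.put(line.substring(0, tab), Integer.parseInt(line.substring(tab + 1)));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path result = Paths.get("output", "part-r-00000");
        if (Files.exists(result)) {
            parse(Files.readAllLines(result)).forEach((word, count) -> System.out.println(word + " = " + count));
        } else {
            System.out.println("No result file found at " + result);
        }
    }
}
```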
Input file
Run result