MapReduce Study Notes 2: Finding the Highest Score per Subject

Requirements

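We have a sample file, subject_score, which serves as score table A. Each line of the file contains two fields: a subject and a score. The task is to find the highest-scoring record for each subject and write the results to the highest-score table B.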

Data preparation

Chinese 96
Math 102
English 130
Physics 19
Chemistry 44
Biology 44
Chinese 109
Math 118
English 141
Physics 72
Chemistry 21
Biology 7

Full dataset: https://download.csdn.net/download/weixin_44018458/14953973

Approach

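In the Mapper class, the map function reads each record from score table A, splits the line on the space, and emits a <subject, score> key-value pair, so the map output types are set to <Text,IntWritable>.
In the Reducer, because the map output type is <Text,IntWritable>, the reduce function receives <Text,Iterable<IntWritable>>: one key together with all of its values. For each key (a subject), it iterates over the values (the scores), keeps the largest one (the highest score), and emits the pair <subject, highest score>.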
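
The flow described above can be tried out without a cluster. The following is a minimal plain-Java sketch, not the Hadoop job itself: the class name LocalFindMax and its helper methods are hypothetical, and a TreeMap stands in for the framework's shuffle step, which groups values by key. The rows in main follow the same "subject score" format as the sample data.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the map -> shuffle -> reduce flow (no Hadoop dependency).
class LocalFindMax {

    // "Map" step: turn one "subject score" line into a <subject, score> pair.
    static Map.Entry<String, Integer> mapLine(String line) {
        String[] fields = line.trim().split("\\s+");
        return new AbstractMap.SimpleEntry<>(fields[0], Integer.parseInt(fields[1]));
    }

    // "Shuffle" step (done by the framework in a real job): group scores by subject.
    static Map<String, List<Integer>> shuffle(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> pair = mapLine(line);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // "Reduce" step: scan all scores of one subject and keep the largest.
    static int reduceMax(Iterable<Integer> scores) {
        int max = Integer.MIN_VALUE;
        for (int s : scores) {
            max = Math.max(max, s);
        }
        return max;
    }

    // Runs the three steps in order and returns <subject, highest score>.
    static Map<String, Integer> findMax(List<String> lines) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(lines).entrySet()) {
            result.put(e.getKey(), reduceMax(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("Math 102", "English 130", "Math 118", "English 141");
        for (Map.Entry<String, Integer> e : findMax(sample).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Running main prints one line per subject, with the subject and its highest score separated by a tab: English 141, then Math 118.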

Writing the code

FindMax.java

package test;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


import java.io.IOException;

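public class FindMax {

    // Mapper: parses each "subject score" line and emits <subject, score>.
    public static class FindMaxMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        // Reused across map() calls to avoid allocating new objects for every record.
        private final Text course = new Text();
        private final IntWritable score = new IntWritable();

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Split on any run of whitespace so tabs or repeated spaces are tolerated.
            String[] values = value.toString().trim().split("\\s+");
            if (values.length < 2) {
                return; // skip blank or malformed lines
            }
            course.set(values[0]);
            score.set(Integer.parseInt(values[1]));
            context.write(course, score);
        }
    }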
    // Reducer: receives all scores of one subject and emits the largest.
    public static class FindMaxReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            // Start from the smallest int so that zero or negative scores are handled too.
            int maxScore = Integer.MIN_VALUE;
            for (IntWritable score : values) {
                maxScore = Math.max(maxScore, score.get());
            }
            context.write(key, new IntWritable(maxScore));
        }
    }
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: FindMax <input> <output>");
            System.exit(-1);
        }

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "findmax");
        job.setJarByClass(FindMax.class);
        job.setMapperClass(FindMaxMapper.class);
        job.setReducerClass(FindMaxReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(1);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Delete any existing output directory; otherwise the job fails because the path already exists.
        FileSystem.get(conf).delete(new Path(args[1]), true);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Exit with 0 on success and 1 on failure so the caller can check the result.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
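
One optional tweak, which the original job does not use: because taking a maximum is associative and commutative, the reducer could probably also be registered as a combiner with job.setCombinerClass(FindMaxReducer.class), so that each map task forwards only its local maximum per subject. The plain-Java sketch below only illustrates why this leaves the answer unchanged. The class name CombinerSketch, its methods, and the scores 87 and 95 are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java check that pre-aggregating with max per split gives the same final result.
class CombinerSketch {

    // Maximum of a list of scores (what the reducer computes for one subject).
    static int max(List<Integer> scores) {
        int m = Integer.MIN_VALUE;
        for (int s : scores) {
            m = Math.max(m, s);
        }
        return m;
    }

    // Hypothetical setup: each inner list holds the scores one map task saw for a subject.
    // A combiner reduces each split to its local max; the reducer then takes the max of those.
    static int maxWithCombiner(List<List<Integer>> splits) {
        int m = Integer.MIN_VALUE;
        for (List<Integer> split : splits) {
            m = Math.max(m, max(split));
        }
        return m;
    }

    public static void main(String[] args) {
        // 102 and 118 come from the sample data; 87 and 95 are made-up filler values.
        List<List<Integer>> splits = Arrays.asList(Arrays.asList(102, 87), Arrays.asList(118, 95));
        int direct = max(Arrays.asList(102, 87, 118, 95));
        System.out.println(direct == maxWithCombiner(splits));
    }
}
```

With only one reducer and a small input file the gain would be negligible; the idea matters mainly when many map tasks emit many records per subject.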

Running on the cluster

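Submit the MapReduce program to the YARN cluster, where it is distributed to many nodes and executed concurrently.
Both the input data and the output should reside on HDFS.
To submit the job, package the program as a JAR, upload it, and then launch it on the cluster with the hadoop command, passing the input path and the output path as the two arguments: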
```
hadoop jar findMax-1.0-SNAPSHOT.jar test.FindMax findmax findmax_out
```

Results

[Screenshot of the job output; image not included]
