MapReduce Case Study: Friend Recommendation

Problems you may run into:

Cannot create directory /mr/fof/input. Name node is in safe mode.
Fix: leave safe mode
bin/hadoop dfsadmin -safemode leave
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Fix: add the following line to hadoop-2.6.5/etc/hadoop/log4j.properties (this only silences the warning; the built-in Java classes are still used)
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
---------------Main text begins----------------------------------------------------------

Requirements:

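For each user, recommend the top 2 users who are not yet direct friends, ranked by the number of friends they have in common.

Dataset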

tom hello hadoop cat
hadoop tom hive world
world hadoop hello hive
cat tom hive
mr hive hello
hive cat hadoop world hello mr
hello tom world hive mr

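Note: take the first record, tom hello hadoop cat:
tom's friends are hello, hadoop and cat

Data characteristics

Each line is one record
The first name in a record is the user
Every name after the first is one of that user's friends
The friends in a record may or may not know each other

Case analysis

A recommended user and the user receiving the recommendation must share one or more friends
Enumerate every pairwise relationship within each friend list, across the whole dataset
Discard pairs that are already direct friends
Count how many times each remaining pair occurs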

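tom hello hadoop cat
tom:hello 0      0 marks a direct friendship
tom:hadoop 0
tom:cat  0

hello:hadoop 1   1 marks an indirect relationship: the two share a friend (here, tom)
hello:cat  1
hadoop:cat  1

Whether the relationship is direct or indirect, the order of the two names in the key matters. Both "A and B" and "B and A" must be written as A:B so that they land in the same group.
So the keys above are normalized (a dedicated method can handle this later):
hello:tom	 	0
hadoop:tom 	0
cat:tom 		0
hadoop:hello 1
cat:hello		1
cat:hadoop	1
The Mapper task can do all of this.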
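
The pair generation described above can be tried out without a cluster. The following is a minimal plain-Java sketch, not the Hadoop Mapper itself: the class name FofPairSketch and its methods are hypothetical names introduced here for illustration, and each emitted pair is represented as a "key<TAB>flag" string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the original project.
final class FofPairSketch {

    // Order the two names so that "A and B" and "B and A" produce the same key.
    static String pairKey(String a, String b) {
        return a.compareTo(b) < 0 ? a + ":" + b : b + ":" + a;
    }

    // Emit "key<TAB>flag" for one record: flag 0 = direct friends, 1 = share a friend.
    static List<String> emit(String line) {
        String[] names = line.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 1; i < names.length; i++) {
            // names[0] is the user, names[i] is a direct friend
            out.add(pairKey(names[0], names[i]) + "\t0");
            // every later friend shares names[0] as a common friend with names[i]
            for (int j = i + 1; j < names.length; j++) {
                out.add(pairKey(names[i], names[j]) + "\t1");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String kv : emit("tom hello hadoop cat")) {
            System.out.println(kv);
        }
    }
}
```

For the record tom hello hadoop cat this yields three pairs flagged 0 and three flagged 1, the same six keys listed above, though emitted in loop order rather than grouped by flag.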

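After every record has been processed, gather all occurrences of each pair and count the common friends. Note: if the two users are already direct friends, discard the pair (there is nothing to recommend).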

hadoop:hive 1
hadoop:hive 1
hadoop:hive 1
hadoop:hive 0
hadoop:hive 1
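When a group like the one above contains a 0, the reduce task discards it (context.write() is not called). Groups without a 0 are summed and written, for example: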
cat:hadoop	2
cat:hello	2
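
The discard-or-sum rule can be written as one small function. This is a sketch in plain Java under the assumption that the values of one group arrive as a list of 0/1 flags; FofReduceSketch and commonFriends are hypothetical names, and OptionalInt (Java 8+) stands in for "nothing is written".

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical helper for illustration only; not part of the original project.
final class FofReduceSketch {

    // Returns the number of common friends, or empty when the pair
    // contains a 0 flag (the two users are already direct friends).
    static OptionalInt commonFriends(List<Integer> flags) {
        int sum = 0;
        for (int flag : flags) {
            if (flag == 0) {
                return OptionalInt.empty(); // direct friends: drop the whole group
            }
            sum += flag;
        }
        return OptionalInt.of(sum);
    }

    public static void main(String[] args) {
        // The hadoop:hive group from the text contains a 0, so it is dropped.
        System.out.println(commonFriends(Arrays.asList(1, 1, 1, 0, 1)).isPresent());
        // A group with two 1 flags is kept with a count of 2.
        System.out.println(commonFriends(Arrays.asList(1, 1)).getAsInt());
    }
}
```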

After the first MR job, the intermediate data looks like this:

cat:hadoop	2
cat:hello	2
cat:mr	1
cat:world	1
hadoop:hello	3
hadoop:mr	1
hive:tom	3
mr:tom	1
mr:world	2
tom:world	2
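
The table above can be cross-checked with a small in-memory simulation of the first job. This is only a sketch: Mr1Simulation is a hypothetical class, it uses a TreeMap and a HashSet in place of Hadoop's shuffle and reduce phases, and it hard-codes the seven records of the dataset.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the first MR job; no Hadoop involved.
final class Mr1Simulation {

    static final String[] DATA = {
        "tom hello hadoop cat",
        "hadoop tom hive world",
        "world hadoop hello hive",
        "cat tom hive",
        "mr hive hello",
        "hive cat hadoop world hello mr",
        "hello tom world hive mr"
    };

    // Same normalization as in the text: the smaller name always comes first.
    static String pairKey(String a, String b) {
        return a.compareTo(b) < 0 ? a + ":" + b : b + ":" + a;
    }

    static Map<String, Integer> run(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by key, like reducer input
        Set<String> direct = new HashSet<>();          // pairs that received a 0 flag
        for (String line : lines) {
            String[] n = line.trim().split("\\s+");
            for (int i = 1; i < n.length; i++) {
                direct.add(pairKey(n[0], n[i]));
                for (int j = i + 1; j < n.length; j++) {
                    counts.merge(pairKey(n[i], n[j]), 1, Integer::sum);
                }
            }
        }
        counts.keySet().removeAll(direct); // drop pairs that are already friends
        return counts;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : run(DATA).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Running it on the dataset gives the same ten pairs and counts as the table, which is a quick way to confirm the expected output before submitting the real job.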

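Next step of the analysis

cat:hadoop  2    hadoop can be recommended to cat because they have 2 common friends; likewise, cat can be recommended to hadoop
Centering on each of the two users in turn, split the record into two outputs:
Key    value
cat    hadoop,2
hadoop   cat,2

After all the intermediate data has been processed and grouped by key, we get data like the following. Taking one key as an example: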

key         value
hadoop     cat,2
hadoop     hello,3
hadoop     mr,1
Map<String,Integer> map = ......;
map.put("cat",2);
map.put("hello",3);
map.put("mr",1);
List<Map.Entry<String,Integer>> list = ...

Sort the list by count in descending order and take the top 2:

hadoop  hello,cat
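
The sort-and-truncate step can be sketched in plain Java as follows. TopNSketch and topN are hypothetical names, and breaking ties by name is an assumption added here to make the result deterministic; the text above does not say how ties are ordered. The output uses the name:count format of the reducer shown later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the original project.
final class TopNSketch {

    // Returns up to n candidates as "name:count,name:count", highest count first.
    static String topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
        list.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // descending count
            // Assumption: ties are broken alphabetically by name.
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(n, list.size()); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(list.get(i).getKey()).append(':').append(list.get(i).getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("cat", 2);
        map.put("hello", 3);
        map.put("mr", 1);
        System.out.println("hadoop\t" + topN(map, 2)); // candidates for user hadoop
    }
}
```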

So we need two MR programs, with the output of MR1 serving as the input of MR2.

Implementation

[Figure: project directory structure]
MainClass

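import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MainClass {
    public static void main(String[] args) throws Exception {
        //yarn jar fof1.jar com.bupt.fof.mr1.MainClass /input /output2
        if(args==null||args.length!=2){
            System.out.println("Usage:yarn jar fof1.jar com.bupt.fof.mr1.MainClass <inputPath> <outputPath>");
            System.exit(1);
        }
        //Create the Configuration object (true = load the default resources)
        Configuration configuration = new Configuration(true);
        //Force the local job runner (overrides mapreduce.framework.name from the cluster configuration)
        configuration.set("mapreduce.framework.name","local");
        //Create the Job object
        Job job = Job.getInstance(configuration);
        //Locate the job jar through this class
        job.setJarByClass(MainClass.class);
        //Set the job name
        job.setJobName("Friend recommendation MR1");
        //Set the Mapper and the types of its output key and value
        job.setMapperClass(FofMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        //Set the Reducer and the types of its output key and value
        job.setReducerClass(FofReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        //Set the input and output paths
        FileInputFormat.addInputPath(job,new Path(args[0]));
        Path outputPath = new Path(args[1]);
        /*FileSystem fileSystem = FileSystem.get(configuration);
        if(fileSystem.exists(outputPath)){
            fileSystem.delete(outputPath,true);
        }*/
        //Delete the output directory if it already exists; otherwise the job refuses to start
        if(outputPath.getFileSystem(configuration).exists(outputPath)){
            outputPath.getFileSystem(configuration).delete(outputPath,true);
        }
        FileOutputFormat.setOutputPath(job,outputPath);
        //Submit the job, wait for it to finish, and report failure through the exit code
        System.exit(job.waitForCompletion(true) ? 0 : 1);

    }
}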

FofMapper

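import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FofMapper extends Mapper<LongWritable,Text,Text,IntWritable>{
    private Text mkey = new Text();
    private IntWritable mval = new IntWritable();
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        //tom hello hadoop cat
        //Split on any run of whitespace so repeated spaces or tabs do not produce empty names
        String[] names = value.toString().trim().split("\\s+");
        //Direct relationships first:  tom--> hello hadoop cat
        for (int i = 1;i<names.length;i++){
            //names[0] and names[i]: keep a fixed order so that A,B and B,A both become A:B
            mkey.set(getFof(names[0],names[i]));
            mval.set(0);//0 = direct relationship, 1 = indirect relationship
            context.write(mkey,mval);
            //Indirect relationships:  hello-> hadoop cat    hadoop-> cat
            for(int j = i+1;j<names.length;j++){
                mkey.set(getFof(names[i],names[j]));
                mval.set(1);//0 = direct relationship, 1 = indirect relationship
                context.write(mkey,mval);
            }
        }

    }
    //Keep a fixed order: whether given A,B or B,A, always build A:B
    private String getFof(String s1,String s2){
        if(s1.compareTo(s2)<0){
            return s1+":"+s2;
        }
        return s2+":"+s1;
    }
}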

FofReducer

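import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FofReducer extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable rval = new IntWritable();
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        //hello:hadoop 1
        //hello:hadoop 1
        //hello:hadoop 0
        //hello:hadoop 1
        //flag stays true only if no value is 0, i.e. the pair are not direct friends
        boolean flag = true;
        int sum = 0;
        for (IntWritable val:values) {
            if(val.get()==0){
                flag = false;
            }
            sum += val.get();
        }
        //Write the common-friend count only for pairs that are not already friends
        if(flag){
            rval.set(sum);
            context.write(key,rval);
        }
    }
}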

MainClass2

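import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MainClass2 {
    public static void main(String[] args) throws Exception {
        if(args==null||args.length!=2){
            System.out.println("Usage: MainClass2 <inputPath> <outputPath>");
            System.exit(1);
        }
        Configuration conf = new Configuration(true);
        //Force the local job runner
        conf.set("mapreduce.framework.name","local");
        Job job = Job.getInstance(conf);
        job.setJarByClass(MainClass2.class);

        job.setMapperClass(FofMapper2.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setReducerClass(FofReducer2.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        //Choose the InputFormat: the key is the part of the line before the first \t; if there is no \t, the whole line is the key and the value is empty
        job.setInputFormatClass(KeyValueTextInputFormat.class);

        //args[0] is the output directory of MR1
        FileInputFormat.addInputPath(job,new Path(args[0]));
        Path outputPath = new Path(args[1]);
        if(outputPath.getFileSystem(conf).exists(outputPath)){
            outputPath.getFileSystem(conf).delete(outputPath,true);
        }
        FileOutputFormat.setOutputPath(job,outputPath);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}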

FofMapper2

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FofMapper2 extends Mapper<Text,Text,Text,Text> {
    @Override
    protected void map(Text key, Text value, Context context) throws IOException, InterruptedException {
        //in:  key->cat:hadoop    value->2
        String[] names = key.toString().split(":");
        String numString = value.toString();

        //out: key->cat  value->hadoop,2
        context.write(new Text(names[0]),new Text(names[1]+","+numString));
        //out: key->hadoop   value->cat,2
        context.write(new Text(names[1]),new Text(names[0]+","+numString));
    }
}

FofReducer2

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FofReducer2 extends Reducer<Text,Text,Text,Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        /* Input for one key, for example:
         * hadoop     cat,2
         * hadoop     hello,3
         * hadoop     mr,1
         */
        Map<String,Integer> contents = new HashMap<>();

        for (Text val:values) {
            String[] arrs = val.toString().split(",");
            contents.put(arrs[0],Integer.parseInt(arrs[1]));
        }
        /* map (unordered):      list (sorted by count, descending):
         *      cat    2              hello  3
         *      hello  3              cat    2
         *      mr     1              mr     1
         */
        List<Map.Entry<String,Integer>> list = new ArrayList<>();
        //Insert each map entry into the list at its sorted position
        for (Map.Entry<String,Integer> entry:contents.entrySet()) {
            int valNum = entry.getValue();
            boolean flag = false;
            //Find the first element with a smaller count and insert before it
            for (int i =0;i<list.size();i++) {
                if(valNum>list.get(i).getValue()){
                    list.add(i,entry);
                    flag = true;
                    break;
                }
            }
            //Not larger than any existing element: append at the end
            if(!flag){
                list.add(entry);
            }
        }
        //Build the recommendation from the top 2 entries, joined by ","
        StringBuilder result = new StringBuilder();
        for (int i = 0;i<Math.min(2,list.size());i++){
            if(i>0){
                result.append(",");
            }
            result.append(list.get(i).getKey()).append(":").append(list.get(i).getValue());
        }
        context.write(key,new Text(result.toString()));

    }
}

Test results

[Figure: job output]

吴成伟0122
