Hadoop Black Magic (3): Using MapWritable

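  I have only recently started with Hadoop and needed MapWritable. Using it directly, however, produced output in part-r-00000 like the following (this sample is taken from https://blog.csdn.net/jiyuanyi1992/article/details/37739413 , because my own code has since been modified and overwritten):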

key1      org.apache.hadoop.io.MapWritable@396cbd97   
key2      org.apache.hadoop.io.MapWritable@17991de1   
key3      org.apache.hadoop.io.MapWritable@18f63055

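  The cause is that the toString() being called here does not print the map's entries: the output has the default Object.toString() form, i.e. the class name followed by a hash code. The fix is to create a new Java class that extends MapWritable and overrides toString(). Taking my own code as an example, my MapWritable stores Text keys and IntWritable values, so the new class looks like this: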

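package mySimijoin;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Writable;

// MapWritable subclass whose toString() prints the entries instead of ClassName@hash.
public class myMapWritable extends MapWritable {

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("{ ");
        boolean first = true;
        // Walk the key set and look up the value stored under each key.
        for (Writable key : this.keySet()) {
            // This job only stores IntWritable values, hence the cast.
            IntWritable count = (IntWritable) this.get(key);
            if (!first) {
                sb.append(", "); // separator, so adjacent pairs do not run together
            }
            sb.append(key).append(' ').append(count);
            first = false;
        }
        return sb.append(" }").toString();
    }
}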
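  To see where output such as MapWritable@396cbd97 comes from, here is a minimal JDK-only sketch. PlainBox and PrintableBox are hypothetical stand-ins invented for this illustration (they are not Hadoop classes), with String and Integer in place of Text and IntWritable. A class that does not override toString() prints its class name, an @ sign, and a hex hash code; a subclass that overrides it prints whatever it chooses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a container that does not override toString().
class PlainBox {
    protected final Map<String, Integer> entries = new LinkedHashMap<>();

    void put(String key, int value) {
        entries.put(key, value);
    }
}

// Hypothetical stand-in for a subclass that overrides toString().
class PrintableBox extends PlainBox {
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("{ ");
        boolean first = true;
        for (Map.Entry<String, Integer> e : entries.entrySet()) {
            if (!first) {
                sb.append(", ");
            }
            sb.append(e.getKey()).append(' ').append(e.getValue());
            first = false;
        }
        return sb.append(" }").toString();
    }
}

class ToStringDemo {
    public static void main(String[] args) {
        PlainBox plain = new PlainBox();
        plain.put("hello", 1);
        // Falls back to Object.toString(): class name, '@', hex hash code.
        System.out.println(plain);

        PrintableBox printable = new PrintableBox();
        printable.put("hello", 1);
        printable.put("hi", 11);
        System.out.println(printable); // { hello 1, hi 11 }
    }
}
```

  The hash code after the @ sign varies between runs, which is why that form of output is of no use in a result file.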

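  The loop in the middle of toString() traverses the MapWritable. Given a key, MapWritable.get(key) returns its value. When the keys are not known in advance, call keySet() to obtain the set of keys, then iterate over that set and look up the value for each key. That is all it takes to traverse a MapWritable.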
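  MapWritable implements java.util.Map<Writable, Writable>, so the same traversal can be tried on any Map. The sketch below uses a plain LinkedHashMap as a stand-in (String and Integer replace Text and IntWritable; the class and method names are made up for illustration) and compares keySet() plus get() with entrySet(), which hands back key and value together.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TraversalDemo {
    // Traverse with keySet() and look each value up with get(key).
    static List<String> viaKeySet(Map<String, Integer> map) {
        List<String> out = new ArrayList<>();
        for (String key : map.keySet()) {
            out.add(key + "=" + map.get(key));
        }
        return out;
    }

    // Traverse with entrySet(), which yields key and value in one step.
    static List<String> viaEntrySet(Map<String, Integer> map) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            out.add(e.getKey() + "=" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("hello", 1);
        map.put("hi", 11);
        System.out.println(viaKeySet(map));   // [hello=1, hi=11]
        System.out.println(viaEntrySet(map)); // [hello=1, hi=11]
    }
}
```

  Both loops visit the same entries; entrySet() simply avoids the extra lookup per key.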
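  That solves the MapWritable output problem. Since MapWritable has come up, here are a few more details on storing and reading data with it.
  A MapWritable can hold not only simple Writable values but also other MapWritable instances. Since put() takes a Writable key and a Writable value, anything that implements Writable can be stored, for example: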

        // One outer map and two inner maps.
        MapWritable mapWritable = new MapWritable();
        MapWritable mapWritable1 = new MapWritable();
        MapWritable mapWritable2 = new MapWritable();

        Text text1 = new Text("hello");
        IntWritable intWritable1 = new IntWritable(1);
        Text text2 = new Text("hi");
        IntWritable intWritable2 = new IntWritable(11);

        // Fill the inner maps, then use one as the key and the other as the value.
        mapWritable1.put(text1, intWritable1);
        mapWritable2.put(text2, intWritable2);
        mapWritable.put(mapWritable1, mapWritable2);

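  Traversal works the same way; a nested map can be walked with two levels of keySet() calls. As an example of traversal in practice, here is my reduce function: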

public static class myReducer extends Reducer<Text, myMapWritable, Text, myMapWritable> {
    @Override
    public void reduce(Text key, Iterable<myMapWritable> values, Context context)
            throws IOException, InterruptedException {
        // Merge every incoming map for this key into a single map.
        myMapWritable tmp = new myMapWritable();
        for (myMapWritable val : values) {
            for (Writable valkey : val.keySet()) {
                IntWritable intWritable = (IntWritable) val.get(valkey);
                // Copy the key into a new Text; a later value for the same key
                // overwrites the earlier one.
                tmp.put(new Text(valkey.toString()), intWritable);
            }
        }
        context.write(key, tmp);
    }
}

  Of course, toString() may also need to be overridden to suit your own requirements.
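  The reducer above walks only one level. For the nested layout built earlier (one map as the key, another as the value), the two-level traversal can be sketched as follows. This is a JDK-only illustration under stated assumptions: plain Map objects stand in for MapWritable, String and Integer for Text and IntWritable, and the class and method names are hypothetical. With a real MapWritable, the nested keys and values come back typed as Writable and would need a cast to MapWritable before the inner keySet() call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NestedTraversalDemo {
    // Flatten a map whose keys and values are themselves maps.
    static List<String> flatten(Map<Map<String, Integer>, Map<String, Integer>> outer) {
        List<String> out = new ArrayList<>();
        for (Map<String, Integer> keyMap : outer.keySet()) {   // outer keySet()
            for (String k : keyMap.keySet()) {                 // inner keySet() on the key map
                out.add("key-side " + k + "=" + keyMap.get(k));
            }
            Map<String, Integer> valueMap = outer.get(keyMap);
            for (String k : valueMap.keySet()) {               // inner keySet() on the value map
                out.add("value-side " + k + "=" + valueMap.get(k));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> inner1 = new LinkedHashMap<>();
        inner1.put("hello", 1);
        Map<String, Integer> inner2 = new LinkedHashMap<>();
        inner2.put("hi", 11);

        // Same shape as the earlier example: inner1 is the key, inner2 the value.
        Map<Map<String, Integer>, Map<String, Integer>> outer = new LinkedHashMap<>();
        outer.put(inner1, inner2);

        System.out.println(flatten(outer)); // [key-side hello=1, value-side hi=11]
    }
}
```

  In this sketch the key map must not be modified after it has been inserted, because a hash-based outer map locates it by its contents.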


Reposted from blog.csdn.net/u013700358/article/details/80786263