Hadoop series (12), case study: MapReduce grouped sorting and using GroupingComparator

GroupingComparator

In the Hadoop MapReduce programming model, once the map side has finished processing and emitted its key-value pairs, all pairs with the same key are passed to a single call of the reduce function. When the key is a Java object, however, the framework needs a way to decide whether two objects count as the same key. This is the job of a GroupingComparator: you implement its compare() method to define, according to your needs, when two keys are considered equal, so that their records are handled by the same reduce call.
GroupingComparator grouping (secondary sort)

In the reduce phase, records are grouped by one or more fields.

Steps for grouped sorting:

(1) Define a custom class that extends WritableComparator.

(2) Override the compare() method with your own comparison logic. If it returns 0, the two objects are treated as the same key.

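import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// To group keys with Hadoop's GroupingComparator, first define a class that extends WritableComparator.
// TextPair is assumed to be a WritableComparable key class with a getFirst() accessor; it is not shown in this article.
public class GroupComparator extends WritableComparator{
    public GroupComparator() {
        // Register the key class; true tells the parent class to create key instances for comparison
        super(TextPair.class,true);
    }
 
    @Override
    public int compare(WritableComparable a, WritableComparable b) {  
        // Implement the comparison your job needs; returning 0 means the two objects are treated as the same key
        TextPair t1 = (TextPair) a;
        TextPair t2 = (TextPair) b;                       
        return t1.getFirst().compareTo(t2.getFirst());
    }
}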

Grouped sorting case study

Data

order001,u001,Xiaomi 6,1999.9,2
order001,u001,Nescafe,99.0,2
order001,u001,An Muxi,250.0,2
order001,u001,classic Double Happiness,200.0,4
order001,u001,waterproof laptop bag,400.0,2
order002,U002,Xiaomi bracelet,199.0,3
order002,U002,durian,15.0,10
order002,U002,apples,4.5,20
order002,U002,soap,10.0,40

Requirement:

For each order, find the three items with the largest transaction amount.
In essence, this is a grouped top-N problem.

Implementation approach:

1. Define a Java object, OrderBean, to hold the order data. It implements the WritableComparable<> interface, which covers both serialization and comparison. The comparison rule used in the code: compare by order ID first; within the same order, the record with the larger total amount sorts first.
2. Override the Partitioner's distribution rule so that records are distributed by orderId.
3. map: read each line, split it into fields, and wrap the data in a bean that is emitted as the key. Within each order, keys are sorted by total amount, largest first.
4. reduce: output the first N records of each group. For this to work, records with the same orderId must be treated as the same key and handled by one reduce call, which requires overriding the key-equality rule with a GroupingComparator.
5. Define a custom GroupingComparator that treats keys with the same order ID as the same key.
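
The steps above can be sketched without Hadoop to show how the two comparison rules interact. This is only a simulation under stated assumptions: `OrderItem`, `GroupedTopNSketch`, `SORT`, `GROUP`, and `topN` are hypothetical names that do not appear in the article, and a plain in-memory sort stands in for the shuffle. The sort rule orders records by order ID and then by amount (largest first); the grouping rule looks only at the order ID, so each run of adjacent records with the same ID behaves like one reduce call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for OrderBean: only the fields this sketch needs.
final class OrderItem {
    final String orderId;
    final String name;
    final float amount; // price * number

    OrderItem(String orderId, String name, float price, int number) {
        this.orderId = orderId;
        this.name = name;
        this.amount = price * number;
    }
}

final class GroupedTopNSketch {

    // Sort rule (role of the key's compareTo): order ID ascending,
    // then amount descending, so each order's largest items come first.
    static final Comparator<OrderItem> SORT = (a, b) -> {
        int byOrder = a.orderId.compareTo(b.orderId);
        return byOrder != 0 ? byOrder : Float.compare(b.amount, a.amount);
    };

    // Grouping rule (role of the GroupingComparator): only the order ID counts.
    static final Comparator<OrderItem> GROUP = (a, b) -> a.orderId.compareTo(b.orderId);

    // Sorts the items, then keeps the first n items of every run of "equal" keys.
    static List<OrderItem> topN(List<OrderItem> items, int n) {
        List<OrderItem> sorted = new ArrayList<>(items);
        sorted.sort(SORT);
        List<OrderItem> result = new ArrayList<>();
        OrderItem previous = null;
        int taken = 0;
        for (OrderItem current : sorted) {
            if (previous == null || GROUP.compare(previous, current) != 0) {
                taken = 0; // a new group starts, like a new reduce() call
            }
            if (taken < n) {
                result.add(current);
                taken++;
            }
            previous = current;
        }
        return result;
    }

    public static void main(String[] args) {
        List<OrderItem> items = Arrays.asList(
                new OrderItem("order001", "Nescafe", 99.0f, 2),
                new OrderItem("order001", "An Muxi", 250.0f, 2),
                new OrderItem("order001", "classic Double Happiness", 200.0f, 4),
                new OrderItem("order001", "waterproof laptop bag", 400.0f, 2),
                new OrderItem("order002", "apples", 4.5f, 20),
                new OrderItem("order002", "soap", 10.0f, 40),
                new OrderItem("order002", "durian", 15.0f, 10));
        for (OrderItem item : topN(items, 3)) {
            System.out.println(item.orderId + "," + item.name + "," + item.amount);
        }
    }
}
```

The grouping rule only merges records that are already adjacent after sorting, which is why the sort rule has to compare the order ID before the amount.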

Code:

Define OrderBean

OrderBean is the Java object that holds the order data. It implements the WritableComparable<> interface, which covers both serialization and comparison. The comparison rule used in the code: compare by order ID first; within the same order, the record with the larger total amount sorts first.

package cn.edu360.mr.order.topn.grouping;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.Serializable;

import org.apache.hadoop.io.WritableComparable;

public class OrderBean implements WritableComparable<OrderBean>{

	private String orderId;
	private String userId;
	private String pdtName;
	private float price;
	private int number;
	private float amountFee;

	public void set(String orderId, String userId, String pdtName, float price, int number) {
		this.orderId = orderId;
		this.userId = userId;
		this.pdtName = pdtName;
		this.price = price;
		this.number = number;
		this.amountFee = price * number;
	}

	public String getOrderId() {
		return orderId;
	}

	public void setOrderId(String orderId) {
		this.orderId = orderId;
	}

	public String getUserId() {
		return userId;
	}

	public void setUserId(String userId) {
		this.userId = userId;
	}

	public String getPdtName() {
		return pdtName;
	}

	public void setPdtName(String pdtName) {
		this.pdtName = pdtName;
	}

	public float getPrice() {
		return price;
	}

	public void setPrice(float price) {
		this.price = price;
	}

	public int getNumber() {
		return number;
	}

	public void setNumber(int number) {
		this.number = number;
	}

	public float getAmountFee() {
		return amountFee;
	}

	public void setAmountFee(float amountFee) {
		this.amountFee = amountFee;
	}

	@Override
	public String toString() {

		return this.orderId + "," + this.userId + "," + this.pdtName + "," + this.price + "," + this.number + ","
				+ this.amountFee;
	}

	@Override
	public void write(DataOutput out) throws IOException {
		out.writeUTF(this.orderId);
		out.writeUTF(this.userId);
		out.writeUTF(this.pdtName);
		out.writeFloat(this.price);
		out.writeInt(this.number);

	}

	@Override
	public void readFields(DataInput in) throws IOException {
		this.orderId = in.readUTF();
		this.userId = in.readUTF();
		this.pdtName = in.readUTF();
		this.price = in.readFloat();
		this.number = in.readInt();
		this.amountFee = this.price * this.number;
	}

	// Comparison rule: order ID first; within the same order, larger total amount first
	@Override
	public int compareTo(OrderBean o) {
		int byOrderId = this.orderId.compareTo(o.getOrderId());
		return byOrderId != 0 ? byOrderId : Float.compare(o.getAmountFee(), this.getAmountFee());
		
	}

}
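
A detail of OrderBean worth noting is that write() does not serialize amountFee; readFields() recomputes it from price and number. The sketch below replays that round trip with plain java.io streams so it runs without Hadoop. `OrderRecord`, `RoundTripSketch`, and `roundTrip` are hypothetical names introduced only for this illustration, and the record carries fewer fields than OrderBean.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical, Hadoop-free copy of the serialization logic, for illustration only.
final class OrderRecord {
    String orderId;
    float price;
    int number;
    float amountFee;

    void write(DataOutput out) throws IOException {
        out.writeUTF(orderId);
        out.writeFloat(price);
        out.writeInt(number);
        // amountFee is deliberately not written; it is derived from price and number.
    }

    void readFields(DataInput in) throws IOException {
        // Fields must be read in exactly the order they were written.
        orderId = in.readUTF();
        price = in.readFloat();
        number = in.readInt();
        amountFee = price * number;
    }
}

final class RoundTripSketch {
    // Serializes a record to bytes and reads it back into a fresh object.
    static OrderRecord roundTrip(OrderRecord original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            OrderRecord copy = new OrderRecord();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        OrderRecord record = new OrderRecord();
        record.orderId = "order002";
        record.price = 4.5f;
        record.number = 20;
        OrderRecord copy = roundTrip(record);
        System.out.println(copy.orderId + "," + copy.price + "," + copy.number + "," + copy.amountFee);
    }
}
```

Skipping the derived field saves four bytes per record, at the cost that write() and readFields() must stay in sync whenever a field is added.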

Custom Partitioner

Override the Partitioner's distribution rule so that records are distributed by orderId.

package cn.edu360.mr.order.topn.grouping;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Partitioner;

public class OrderIdPartitioner extends Partitioner<OrderBean, NullWritable>{

	@Override
	public int getPartition(OrderBean key, NullWritable value, int numPartitions) {
		// Distribute records by the order's orderId
		return (key.getOrderId().hashCode() & Integer.MAX_VALUE) %  numPartitions;   // numPartitions is the number of reduce tasks
	}

}
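
The bit mask in getPartition() is there because hashCode() can be negative, and a negative remainder is not a valid partition number. The following sketch isolates the formula in a hypothetical helper (`PartitionSketch.partition` is not part of the article or of Hadoop) and contrasts it with Math.abs, which does not help for Integer.MIN_VALUE.

```java
final class PartitionSketch {
    // Same formula as the partitioner: clear the sign bit, then take the remainder.
    static int partition(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 2;
        // Records of the same order always map to the same partition.
        System.out.println(partition("order001".hashCode(), numPartitions));
        System.out.println(partition("order001".hashCode(), numPartitions));
        // Integer.MIN_VALUE is the edge case: Math.abs leaves it negative,
        // so the remainder is negative too, while the mask turns it into 0.
        System.out.println(Math.abs(Integer.MIN_VALUE) % 3);
        System.out.println(partition(Integer.MIN_VALUE, 3));
    }
}
```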


MapReduce program

map: read each line, split it into fields, and wrap the data in a bean that is emitted as the key. Within each order, keys are sorted by total amount, largest first.
reduce: output the first N records of each group. For this to work, records with the same orderId must be treated as the same key and handled by one reduce call, which requires overriding the key-equality rule with a GroupingComparator.

package cn.edu360.mr.order.topn.grouping;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class OrderTopn {

	public static class OrderTopnMapper extends Mapper<LongWritable, Text, OrderBean, NullWritable>{
		OrderBean orderBean = new OrderBean();
		NullWritable v = NullWritable.get();
		@Override
		protected void map(LongWritable key, Text value,
				Mapper<LongWritable, Text, OrderBean, NullWritable>.Context context)
				throws IOException, InterruptedException {
			
			String[] fields = value.toString().split(",");
			
			orderBean.set(fields[0], fields[1], fields[2], Float.parseFloat(fields[3]), Integer.parseInt(fields[4]));
			
			context.write(orderBean,v);
		}
		
		
	}
	
	
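	public static class OrderTopnReducer extends Reducer< OrderBean, NullWritable,  OrderBean, NullWritable>{
		
		/**
		 * Although reduce() receives a single key parameter, the framework reuses that
		 * object: each time the values iterator advances, the key's fields are updated
		 * to those of the current record in the group.
		 */
		@Override
		protected void reduce(OrderBean key, Iterable<NullWritable> values,
				Reducer<OrderBean, NullWritable, OrderBean, NullWritable>.Context context)
				throws IOException, InterruptedException {
			// Read N from the job configuration (set in main); default to 3
			int topn = context.getConfiguration().getInt("order.top.n", 3);
			int i=0;
			for (NullWritable v : values) {
				context.write(key, v);
				if(++i==topn) return;
			}
			
		}
		
		
	}
	
	public static void main(String[] args) throws Exception {

		
		Configuration conf = new Configuration(); // by default loads only core-default.xml and core-site.xml
		conf.setInt("order.top.n", 3); // the requirement asks for the top 3 items per order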
		
		Job job = Job.getInstance(conf);

		job.setJarByClass(OrderTopn.class);

		job.setMapperClass(OrderTopnMapper.class);
		job.setReducerClass(OrderTopnReducer.class);
		
		job.setPartitionerClass(OrderIdPartitioner.class);
		job.setGroupingComparatorClass(OrderIdGroupingComparator.class);
		
		job.setNumReduceTasks(2);

		job.setMapOutputKeyClass(OrderBean.class);
		job.setMapOutputValueClass(NullWritable.class);
		
		job.setOutputKeyClass(OrderBean.class);
		job.setOutputValueClass(NullWritable.class);

		FileInputFormat.setInputPaths(job, new Path("F:\\mrdata\\order\\input"));
		FileOutputFormat.setOutputPath(job, new Path("F:\\mrdata\\order\\out-3"));

		job.waitForCompletion(true);
	}
	
}
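
The reducer can write `key` inside the loop and still emit a different record each time because the key object is reused and refilled as the values iterator advances. The sketch below imitates that behaviour in plain Java. It is a simplified model, not Hadoop's actual implementation, and `ReusedKey`, `KeyReuseSketch`, and `observe` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mutable key, standing in for OrderBean.
final class ReusedKey {
    String pdtName;
}

final class KeyReuseSketch {

    // Mimics iterating one reduce group: the same key object is refilled for
    // every record. Returns the product names observed through that one key.
    static List<String> observe(ReusedKey key, List<String> recordsInGroup) {
        List<String> observed = new ArrayList<>();
        for (String record : recordsInGroup) {
            key.pdtName = record;       // the "framework" overwrites the shared key
            observed.add(key.pdtName);  // what the loop body in reduce() would see
        }
        return observed;
    }

    public static void main(String[] args) {
        ReusedKey key = new ReusedKey();
        List<String> group = Arrays.asList("soap", "durian", "apples");
        System.out.println(observe(key, group));
        // After the loop the single key object holds the last record of the group.
        System.out.println(key.pdtName);
    }
}
```

One consequence of this design is that a reducer which needs to keep keys beyond the current iteration has to copy their fields rather than store the reference.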

Custom GroupingComparator

Define a custom GroupingComparator that treats keys with the same order ID as the same key.

package cn.edu360.mr.order.topn.grouping;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class OrderIdGroupingComparator extends WritableComparator{
	
	public OrderIdGroupingComparator() {   // Tells the parent class which key class it operates on
		super(OrderBean.class,true);
	}
	
	@Override
	public int compare(WritableComparable a, WritableComparable b) {   // Keys with the same orderId are treated as one key
		
		OrderBean o1 = (OrderBean) a;
		OrderBean o2 = (OrderBean) b;
		
		return o1.getOrderId().compareTo(o2.getOrderId());
	}
}

Origin blog.csdn.net/heartless_killer/article/details/102643325