Pig: grouping multiple relations by a shared set of attributes

-- For events and clicks, pull out the grouping fields and wrap each complete record in a tuple named actual.
events = foreach events generate opxpid, client_id, TOTUPLE(*) as actual;
clicks = foreach clicks generate opxpid, client_id, TOTUPLE(*) as actual;
-- Merge the two relations into one stream
cstream = union events, clicks;
-- Group the merged stream by the shared key
grpd = group cstream by (opxpid, client_id) parallel 18;
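-- Unpack the grouped stream: first flatten the bag of wrapped tuples, then flatten each tuple back into its fields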
strmi = foreach grpd generate FLATTEN(cstream.actual);
strmi = foreach strmi generate FLATTEN(actual);
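
To make the data flow easier to follow, here is a minimal Python sketch that imitates the same four steps (wrap, union, group, flatten) on a few in-memory rows. It is only a model of the script's logic, not Pig itself. The sample rows and the `event_type` and `url` fields are hypothetical; only `opxpid` and `client_id` come from the script. Pig does not promise any particular order of groups in its output, while this sketch simply keeps first-seen order.

```python
from collections import defaultdict

# Hypothetical sample rows; event_type and url are made-up fields for illustration.
events = [
    {"opxpid": "p1", "client_id": "c1", "event_type": "view"},
    {"opxpid": "p2", "client_id": "c1", "event_type": "view"},
]
clicks = [
    {"opxpid": "p1", "client_id": "c1", "url": "/home"},
]


def wrap(rows):
    # Imitates: generate opxpid, client_id, TOTUPLE(*) as actual
    return [(r["opxpid"], r["client_id"], tuple(r.values())) for r in rows]


# Imitates: union events, clicks
cstream = wrap(events) + wrap(clicks)

# Imitates: group cstream by (opxpid, client_id)
grpd = defaultdict(list)
for opxpid, client_id, actual in cstream:
    grpd[(opxpid, client_id)].append(actual)

# Imitates the two FLATTEN steps: emit the original records again,
# now arranged so that records sharing a key sit next to each other.
strmi = [actual for key in grpd for actual in grpd[key]]

for row in strmi:
    print(row)
```

In this model the event and the click that share the key (p1, c1) end up adjacent, even though they came from different inputs and have different fields.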

Reposted from schooltop.iteye.com/blog/2109039