Hadoop series -- MapReduce: user profiling

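Given the following two input files, estimate the probability of each user's approximate profile (age group, gender):

  1. App ID file: 103 KB
  2. User app-usage data file: 50 MB

App ID file [field index: app_id=0]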

10013|Kakao Talk|0.001|0.001|0|0.2|0.3|0.2|0.3
10014|Whatsapp|0.001|0.001|0|0.2|0.3|0.2|0.3
10015|比邻|0.001|0.001|0|0.2|0.3|0.2|0.3
20016|新浪读书|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20017|潇湘书院|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20018|红袖添香|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20019|纵横中文网|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20020|掌上书院|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20021|和阅读|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20022|掌阅iReader|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20023|QQ阅读|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20024|百阅|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20025|塔读小说|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20026|Flipboard|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20027|zaker|0.001|0.001|0.1|0.3|0.3|0.2|0.1
....
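
As a rough illustration of how one line of the app ID file could be read, here is a minimal Python sketch. The names `AppWeights` and `parse_app_line` are hypothetical. The source does not label the seven numeric columns; a plausible reading is two gender weights followed by five age-group weights, but that is an assumption, so the sketch keeps them as an unlabeled tuple.

```python
from typing import NamedTuple, Tuple


class AppWeights(NamedTuple):
    """One row of the app ID file (hypothetical container)."""
    app_id: str
    name: str
    weights: Tuple[float, ...]  # column meanings are NOT stated in the source


def parse_app_line(line: str) -> AppWeights:
    """Split a '|'-delimited app line into id, name, and numeric weights."""
    fields = line.rstrip("\n").split("|")
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}")
    return AppWeights(fields[0], fields[1], tuple(float(x) for x in fields[2:]))


# Sample row copied from the listing above.
sample = "20016|新浪读书|0.001|0.001|0.1|0.3|0.3|0.2|0.1"
app = parse_app_line(sample)
print(app.app_id, app.name, app.weights)
```

At 103 KB the whole file fits comfortably in memory, so a lookup table keyed by `app_id` is one reasonable way to hold it.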

User app-usage record file [field indices: user_id=0, user_playTime=11, user_onlineTime=12, user_appid=15]

/hSE2vKA9QLK25kCSpd/eg==|1|100.88.255.72|100.88.190.158|2152|2152|0|166987602|18332|102309528|cmnet.mnc000.mcc460.gprs|1471020968058|0|1|05|050053|0|2|10.167.98.215||56908|0|218.203.111.17||80|269|0|2|vpic.video.qq.com|/22499741/q0184zboviu.png||Dalvik/2.1.0 (Linux; U; Android 6.0.1; SM-A9000 Build/MMB29M)||||0|0||||2|0|255|4002|40020078|3000-4000|||412|119.52556|45.69899|951||1||
...
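
The field indices listed above can be applied to a record as in the following sketch. `UsageRecord` and `parse_usage_line` are hypothetical names, and the sample string is only the first 16 fields of the record shown above. The values are kept as strings because the source does not state their units.

```python
from typing import NamedTuple

# Field positions as given in the source (0-based, '|'-delimited).
USER_ID, PLAY_TIME, ONLINE_TIME, APP_ID = 0, 11, 12, 15


class UsageRecord(NamedTuple):
    """The four fields of interest from one usage record (hypothetical container)."""
    user_id: str
    play_time: str
    online_time: str
    app_id: str


def parse_usage_line(line: str) -> UsageRecord:
    """Pick the four fields of interest out of one usage record."""
    fields = line.rstrip("\n").split("|")
    if len(fields) <= APP_ID:
        raise ValueError(f"expected more than {APP_ID} fields, got {len(fields)}")
    return UsageRecord(
        fields[USER_ID], fields[PLAY_TIME], fields[ONLINE_TIME], fields[APP_ID]
    )


# First 16 fields of the sample record shown above.
sample = (
    "/hSE2vKA9QLK25kCSpd/eg==|1|100.88.255.72|100.88.190.158|2152|2152|0|"
    "166987602|18332|102309528|cmnet.mnc000.mcc460.gprs|1471020968058|0|1|05|050053"
)
rec = parse_usage_line(sample)
print(rec)
```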

Approach:

  1. One user may -----> use multiple apps (multiple app-usage records)
  2. job1: map-reduce -----> compute:
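
The source breaks off before saying what job1 computes, so the following is only an illustration of step 1: grouping a user's many records under one key, simulated in plain Python rather than with the Hadoop API. Every name here (`map_record`, `shuffle`, `reduce_user`, `make_line`) and the test records are hypothetical, and collecting the distinct app IDs per user is an assumed output, not the author's stated one.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

USER_ID, APP_ID = 0, 15  # field positions given in the source


def map_record(line: str) -> Iterator[Tuple[str, str]]:
    """Map phase: emit (user_id, app_id) for each well-formed record."""
    fields = line.rstrip("\n").split("|")
    if len(fields) > APP_ID:
        yield fields[USER_ID], fields[APP_ID]


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Shuffle phase: group all emitted values by key."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_user(user_id: str, app_ids: List[str]) -> Tuple[str, List[str]]:
    """Reduce phase (assumed): collect the distinct apps one user touched."""
    return user_id, sorted(set(app_ids))


def make_line(user_id: str, app_id: str) -> str:
    """Build a made-up 16-field record with only the two key fields filled."""
    fields = [""] * 16
    fields[USER_ID] = user_id
    fields[APP_ID] = app_id
    return "|".join(fields)


lines = [
    make_line("userA", "20016"),
    make_line("userB", "20016"),
    make_line("userA", "10013"),
    make_line("userA", "20016"),
]
pairs = [pair for line in lines for pair in map_record(line)]
apps_by_user = dict(reduce_user(u, apps) for u, apps in shuffle(pairs).items())
print(apps_by_user)
```

In a real Hadoop job the shuffle is done by the framework; only the map and reduce functions would be written by hand.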

Reposted from blog.csdn.net/eyeofeagle/article/details/81368776