My first MapReduce program written in Python

Copyright notice: this is the blogger's original article and may not be reposted without the blogger's permission. https://blog.csdn.net/weijianpeng2013_2015/article/details/71908340

map:

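#!/usr/bin/env python
# Mapper: emit "<word><TAB>1" for every whitespace-separated word read from stdin.

import sys

for line in sys.stdin:
    line = line.strip()
    words = line.split()

    for word in words:
        # Tab-separated key/value pair; print() with one argument runs on Python 2 and 3.
        print("%s\t%s" % (word, 1))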

reduce:

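#!/usr/bin/env python
# Reducer: sum the counts of each word. The input must be sorted by word,
# so that all lines for the same word arrive consecutively.
import sys

current_word = None
current_count = 0
for line in sys.stdin:
    line = line.strip()
    try:
        word, count = line.split('\t', 1)
        count = int(count)
    except ValueError:
        # Skip lines that have no tab or whose count is not a number.
        continue
    if current_word == word:
        current_count += count
    else:
        # A new word starts: output the finished one first.
        if current_word:
            print('%s\t%s' % (current_word, current_count))
        current_word = word
        current_count = count

# Output the last word, if there was any input at all.
if current_word:
    print('%s\t%s' % (current_word, current_count))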
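The reducer above only compares each word with the previous one, so its totals are correct only when identical words arrive on consecutive lines. The following is a minimal sketch of that dependency, not part of the original scripts: it wraps the same running-total logic in a function (the name `reduce_lines` and the sample data are assumptions made for illustration) and feeds it unsorted and sorted input.

```python
def reduce_lines(lines):
    """Running-total logic of the reducer, wrapped in a function (hypothetical name)."""
    results = []
    current_word = None
    current_count = 0
    for line in lines:
        try:
            word, count = line.strip().split("\t", 1)
            count = int(count)
        except ValueError:
            # Skip malformed lines, as the reducer script does.
            continue
        if word == current_word:
            current_count += count
        else:
            if current_word is not None:
                results.append((current_word, current_count))
            current_word = word
            current_count = count
    if current_word is not None:
        results.append((current_word, current_count))
    return results


# Illustrative sample data, not taken from the original post.
unsorted_lines = ["foo\t1", "bar\t1", "foo\t1"]
print(reduce_lines(unsorted_lines))          # "foo" is reported twice, once per run
print(reduce_lines(sorted(unsorted_lines)))  # one total per word
```

With unsorted input the word "foo" appears as two separate entries; after sorting it is summed once. This is why the local test pipes the mapper output through sort before the reducer.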

Test:

[root@node1 input]# echo "foo foo quux labs foo bar zoo zoo hying" | /home/hadoop/input/max_map.py | sort | /home/hadoop/input/max_reduce.py
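To see what this pipeline should print without running the scripts, the three stages (map, sort, reduce) can be imitated in one process. This is only a sketch under assumptions: the function names `map_words` and `reduce_counts` are made up here, and `itertools.groupby` stands in for the reducer's hand-written running total.

```python
from itertools import groupby


def map_words(text):
    """Mapper step: one tab-separated 'word, 1' line per word."""
    return ["%s\t%d" % (word, 1) for word in text.split()]


def reduce_counts(sorted_lines):
    """Reducer step: sum the counts of consecutive lines that share a word."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    return [
        "%s\t%d" % (word, sum(int(count) for _, count in group))
        for word, group in groupby(pairs, key=lambda pair: pair[0])
    ]


# Same sentence as in the echo command above; sorted() plays the role of `sort`.
text = "foo foo quux labs foo bar zoo zoo hying"
for line in reduce_counts(sorted(map_words(text))):
    print(line)
```

For this sentence the sketch prints bar 1, foo 3, hying 1, labs 1, quux 1, zoo 2 (tab-separated, one pair per line), which is what the shell pipeline is expected to produce as well.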


Run: the command can be saved in a script file

 // Note: there must be no space between the backslash and -file in \-file
hadoop jar /hadoop64/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-*streaming*.jar -D stream.non.zero.exit.is.failure=false \-file /home/hadoop/input/max_map.py -mapper /home/hadoop/input/max_map.py \-file /home/hadoop/input/max_reduce.py  -reducer /home/hadoop/input/max_reduce.py \-input /input/temperature/ -output /output/temperature
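If the script file is itself written in Python, one possible approach is to build the command as an argument list, which avoids the backslash and spacing issue entirely. The sketch below is an assumption, not the author's script: the helper name `build_streaming_command` is hypothetical, and it only reuses the options and paths that appear in the command above.

```python
def build_streaming_command(jar, mapper, reducer, input_dir, output_dir):
    """Return the streaming command above as an argument list (hypothetical helper)."""
    return [
        "hadoop", "jar", jar,
        "-D", "stream.non.zero.exit.is.failure=false",
        "-file", mapper, "-mapper", mapper,
        "-file", reducer, "-reducer", reducer,
        "-input", input_dir, "-output", output_dir,
    ]


# Paths copied from the command in the post; the jar path still contains
# the wildcard pattern, which has to be resolved before the job is started.
command = build_streaming_command(
    "/hadoop64/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-*streaming*.jar",
    "/home/hadoop/input/max_map.py",
    "/home/hadoop/input/max_reduce.py",
    "/input/temperature/",
    "/output/temperature",
)
print(" ".join(command))
```

To actually start the job, the list could be passed to `subprocess.call`. Since `subprocess` does not go through a shell by default, the wildcard in the jar path would first need to be expanded, for example with `glob.glob`.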
