MapReduce(2): How does Mapper work

Posted: 2019-05-29 23:01:33

In the previous post, we illustrated how Hadoop MapReduce prepares input for Mappers. Long story short, InputSplits divide the physically stored data into many logical units, and each one is processed by a RecordReader, which generates the input (K,V) pairs for a Mapper. I used to be confused about how the (K,V) pairs are generated, but it is actually simple: for plain text input, for example, the RecordReader just breaks a 128 MB chunk of a file into single lines, and each line becomes one (K,V) pair. A mapper processes these pairs one by one until the end of its input.
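
To make the "one line, one (K,V) pair" idea concrete, here is a minimal sketch in plain Python. It imitates the default behaviour of Hadoop's text input, where the key is the byte offset of the line and the value is the line's text. The function name `read_records` and the sample data are made up for illustration; this is not Hadoop's actual API.

```python
import io


def read_records(stream):
    """Yield (byte_offset, line_text) pairs, one per line of a binary stream.

    Hypothetical helper that only imitates a line-oriented RecordReader.
    """
    offset = 0
    for raw in stream:
        # The key is where the line starts; the value is the line itself
        # without its trailing newline.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


# A tiny in-memory stand-in for one chunk of an input file.
split = io.BytesIO(b"hello world\nhello mapreduce\n")
records = list(read_records(split))

for key, value in records:
    print(key, value)
```

Each pair printed here corresponds to one call of the mapper: the first line starts at offset 0, and the second at offset 12, because the first line plus its newline is 12 bytes long.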

A user-defined mapper takes the input (K,V) pairs from the RecordReader and generates a new set of key/value pairs on the output side. These new (K,V) pairs are usually called 'intermediate (K,V) pairs'. For example, in the post (Using MapReduce on Azure), we defined a Mapper as follows:

#!/usr/bin/env python
"""mapper.py"""

import sys

# input comes from STDIN (standard input)
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    # increase counters
    for word in words:
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the
        # Reduce step, i.e. the input for reducer.py
        #
        # tab-delimited; the trivial word count is 1
        print('%s\t%s' % (word, 1))

As we can see, this mapper just breaks each line into words and outputs intermediate (K,V) pairs, in which the key is the word and the value is 1.
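
To see what these intermediate pairs look like without a cluster, the same logic can be tried on a couple of hard-coded lines. This is only a sketch: `map_line` and `sample_lines` are hypothetical names introduced here, and the real mapper.py reads from STDIN instead.

```python
def map_line(line):
    """Return one (word, 1) pair per whitespace-separated word in the line."""
    return [(word, 1) for word in line.strip().split()]


# Two made-up input lines standing in for the records a mapper would receive.
sample_lines = ["the quick brown fox\n", "the lazy dog\n"]

pairs = [pair for line in sample_lines for pair in map_line(line)]

# Print in the same tab-delimited form that mapper.py writes to STDOUT.
for word, count in pairs:
    print('%s\t%s' % (word, count))
```

Note that "the" is emitted twice, each time with the value 1. The mapper does no counting of its own; adding up the ones per word is left to the Reduce step.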

A funny but intuitive illustration of this process is cutting a car into pieces.

Reposted from www.cnblogs.com/rhyswang/p/10946727.html
