A Compact Example: Word Count with Python and Spark

Other 2021-04-05 05:52:17

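As the title says, this example counts the words in a text file. The file is on the D: drive and is named testfileA.txt (the program reads it from D:/Python_Path/testfileA.txt).
The file contents are as follows: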

Without further ado, here is the program:

from pyspark import SparkContext

sc = SparkContext("local", "wordcount")
text_file = sc.textFile("D:/Python_Path/testfileA.txt")

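# A trailing backslash continues the statement on the next line.
# The value in (word, 1) must be 1: with 2 every count would be doubled, with 3 tripled.
wordcount = text_file.flatMap(lambda line: line.split(" "))\
                         .map(lambda word: (word, 1))\
                         .reduceByKey(lambda a, b: a + b)

wordcount.foreach(print)  # print each (word, count) pair in turn (visible in the console in local mode)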

The output is as follows:
**Result**
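
To make the three steps easier to follow, here is a minimal sketch of the same flatMap, map, and reduceByKey pipeline in plain Python, without Spark. The function name `word_count`, its `weight` parameter, and the sample lines are made up for illustration and are not part of the original program. It only mimics what each transformation does on a small in-memory list.

```python
from collections import defaultdict
from itertools import chain


def word_count(lines, weight=1):
    """Mimic flatMap -> map -> reduceByKey on a plain list of strings."""
    # flatMap: split every line on single spaces and flatten into one sequence
    words = chain.from_iterable(line.split(" ") for line in lines)
    # map: pair every word with the weight (1 in the original program)
    pairs = ((word, weight) for word in words)
    # reduceByKey: add up the values that share the same key
    counts = defaultdict(int)
    for word, value in pairs:
        counts[word] = counts[word] + value
    return dict(counts)


sample_lines = ["a b a", "b c"]  # hypothetical input, not the author's file
counts = word_count(sample_lines)
doubled = word_count(sample_lines, weight=2)
print(counts)   # {'a': 2, 'b': 2, 'c': 1}
print(doubled)  # {'a': 4, 'b': 4, 'c': 2}
```

The second call shows why the value in (word, 1) has to be 1: with 2, every count comes out doubled. Note also that split(" ") treats two consecutive spaces as surrounding an empty word, so a line such as "a  b" produces a count for the empty string; calling split() with no argument splits on any run of whitespace and avoids this.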

Reposted from blog.csdn.net/weixin_39464400/article/details/105678832
