[spark-src] 1-overview

what is spark

  "Apache Spark™ is a fast and general engine for large-scale data processing....Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk." stated in apache spark 

  Whether or not these numbers hold in every case, I think several key concepts/components support this claim:

a. The Resilient Distributed Dataset (RDD) programming model differs greatly from common models such as MapReduce. Spark uses many optimizations (e.g. iterative computation, data locality) to spread the workload across the workers in a cluster, especially for reusing computed data.

  RDD: "A resilient distributed dataset (RDD) is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost." [1]

 

b. It uses memory as far as possible. Most of Spark's intermediate results are retained in memory rather than on disk, so there is no need to suffer from disk I/O and serialization/deserialization overhead (a minimal sketch of caching and reuse follows this list).

  In fact we already use many tools for similar purposes, like Memcached, Redis, etc.

c. It emphasizes parallelism.

d. It reduces the JVM supervision overhead, e.g. one executor holds multiple tasks instead of one container per task as in YARN MapReduce.
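
  A minimal sketch of points a and b (the app name, local[2] master and data are just for illustration): an RDD is built, the intermediate result is cached, and two actions reuse it from memory instead of recomputing it or reading from disk.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddReuseSketch {
  def main(args: Array[String]): Unit = {
    // hypothetical local setup, just for illustration
    val conf = new SparkConf().setAppName("rdd-reuse").setMaster("local[2]")
    val sc   = new SparkContext(conf)

    // an RDD is a read-only, partitioned collection; transformations only build a lineage
    val nums  = sc.parallelize(1 to 1000000, numSlices = 4)
    val evens = nums.filter(_ % 2 == 0)

    // cache() keeps the intermediate result in memory, so both actions below
    // reuse it instead of recomputing (no disk I/O, no re-serialization)
    evens.cache()

    val count = evens.count()                       // first action: computes and caches
    val sum   = evens.map(_.toLong).reduce(_ + _)   // second action: reads from memory

    println(s"count=$count sum=$sum")
    sc.stop()
  }
}
```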

architecture

  (The core component acts as a platform on which the other components are built.)
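
  For instance (a sketch using the 1.x-era entry points from the time of this post; names and the local master are illustrative), the higher-level components are constructed on top of the core SparkContext and ultimately execute as core RDD jobs:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CoreAsPlatform {
  def main(args: Array[String]): Unit = {
    // Spark Core provides the SparkContext; the other components build on it
    val sc  = new SparkContext(new SparkConf().setAppName("core-as-platform").setMaster("local[2]"))

    val sql = new SQLContext(sc)                   // Spark SQL (1.x API) wraps the same core context
    val ssc = new StreamingContext(sc, Seconds(1)) // Spark Streaming runs micro-batches as core jobs

    // ... define DataFrames / DStreams here; both execute as RDD jobs on the core engine
    ssc.stop(stopSparkContext = true)
  }
}
```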

uses of spark

1. Iterative algorithms, e.g. machine learning, clustering, etc. (see the sketch after this list)

2. Interactive analytics, e.g. querying a large dataset loaded from disk into memory to reduce I/O latency.

3. Batch processing.
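
  A toy sketch of case 1: a simple gradient-descent loop for a 1-D linear fit. The data, learning rate and iteration count are made up for illustration; the point is that the dataset is cached once and re-read from memory on every iteration, which is where Spark shines for iterative workloads.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object IterativeSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("iterative-sketch").setMaster("local[2]"))

    // toy (x, y) pairs for fitting y ~ w * x; cached because every iteration re-reads them
    val points = sc.parallelize(Seq.fill(10000)(math.random).map(x => (x, 3.0 * x))).cache()

    var w  = 0.0
    val lr = 1.0
    for (_ <- 1 to 20) {
      // gradient of squared error, averaged over the cached points;
      // only the small scalar gradient travels back to the driver each iteration
      val grad = points.map { case (x, y) => (w * x - y) * x }.mean()
      w -= lr * grad
    }
    println(s"learned w = $w (true value 3.0)")
    sc.stop()
  }
}
```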

programming language

  Most of the source code is written in Scala (I think many of its functions and ideas are inspired by Scala ;), but you can also write Spark programs in Java or Python.
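
  The Scala influence shows in the RDD API, which mirrors Scala's collection combinators. A small word-count sketch (the input path is just a placeholder):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("wordcount-sketch").setMaster("local[2]"))

    // flatMap / map / reduceByKey read like operations on an ordinary Scala collection
    val counts = sc.textFile("README.md")   // placeholder path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)
    sc.stop()
  }
}
```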

flexible integrations

  Many popular frameworks are supported by Spark, e.g. Hadoop, HBase, Mesos, etc.
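
  Two of these integrations in one sketch (both addresses below are placeholders): running on a Mesos cluster is just a master URL, and reading from HDFS works because Spark reuses the Hadoop InputFormat machinery.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object IntegrationSketch {
  def main(args: Array[String]): Unit = {
    // placeholder Mesos master and HDFS namenode addresses
    val conf = new SparkConf()
      .setAppName("integration-sketch")
      .setMaster("mesos://mesos-master.example.com:5050")   // schedule on a Mesos cluster

    val sc = new SparkContext(conf)

    // HDFS paths work out of the box via the Hadoop input format support
    val logs = sc.textFile("hdfs://namenode.example.com:8020/data/logs/*.log")
    println(s"lines: ${logs.count()}")
    sc.stop()
  }
}
```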

ref:

[1] M. Zaharia et al., "Spark: Cluster Computing with Working Sets" (the original Spark paper).

[spark-src]-source reading
