Summary:
RDD: Resilient Distributed Dataset. An RDD is a special kind of collection that can be created from multiple data sources, has a fault-tolerance mechanism, can be cached, and supports parallel operations. The dataset an RDD represents is split into partitions.
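To make "split into partitions" and "supports parallel operations" concrete, here is a minimal plain-Python sketch. It is a toy model, not Spark code: the helpers `partition`, `parallel_map` and `recompute_partition` are hypothetical, a thread pool stands in for a cluster, and the round-robin splitting is an arbitrary choice that differs from how Spark assigns data to partitions.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_partitions):
    """Hypothetical helper: split a list into num_partitions chunks (round-robin)."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def parallel_map(partitions, fn):
    """Hypothetical helper: apply fn to every element, one worker task per partition."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # pool.map keeps the partition order in its results
        return list(pool.map(lambda part: [fn(x) for x in part], partitions))


def recompute_partition(partitions, fn, index):
    """Fault-tolerance idea: rebuild one lost result from its input partition and fn."""
    return [fn(x) for x in partitions[index]]


parts = partition(list(range(10)), 3)
print(parts)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
squared = parallel_map(parts, lambda x: x * x)
print(squared)  # [[0, 9, 36, 81], [1, 16, 49], [4, 25, 64]]
```

The point of `recompute_partition` is that only the lost partition needs to be recomputed, because its input and the function applied to it are known.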
RDDs support two kinds of operators:
Transformation: a transformation is lazily evaluated. When one RDD is converted into another RDD, the transformation is not executed immediately; Spark only records the logical operation on the dataset.
Action: an action triggers a Spark job to run, which is what actually causes the recorded transformations to be computed.
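The difference between lazy transformations and actions can be illustrated with a small toy class. `ToyRDD` is hypothetical and is not Spark's API: its `map` and `filter` only record the operation, and its `collect` and `count` replay the recorded operations to produce a result, which mirrors the behaviour described above.

```python
class ToyRDD:
    """Hypothetical toy, NOT Spark's API: records transformations lazily."""

    def __init__(self, source, ops=()):
        self._source = source   # the original data
        self._ops = tuple(ops)  # recorded operations: ("map" | "filter", fn)

    # Transformations: return a new ToyRDD and compute nothing.
    def map(self, fn):
        return ToyRDD(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._source, self._ops + (("filter", pred),))

    # Actions: replay the recorded operations and return a result.
    def collect(self):
        result = list(self._source)
        for kind, fn in self._ops:
            if kind == "map":
                result = [fn(x) for x in result]
            else:
                result = [x for x in result if fn(x)]
        return result

    def count(self):
        return len(self.collect())


calls = []


def double(x):
    calls.append(x)  # record when the function actually runs
    return x * 2


rdd = ToyRDD([1, 2, 3, 4])
doubled = rdd.map(double).filter(lambda x: x > 4)
print(calls)              # [] : no transformation has run yet
print(doubled.collect())  # [6, 8]
print(calls)              # [1, 2, 3, 4]
```

Each action replays the whole chain here; real Spark can avoid that by caching an RDD, which this toy does not model.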
This series mainly explains the functions commonly used in Spark:
1. Basic RDD transformations
2. Key-value RDD transformations
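Key-value transformations operate on RDDs whose elements are (key, value) pairs. As one example, the semantics of Spark's `reduceByKey` (combine values per key inside each partition, then merge the partial results across partitions) can be emulated in plain Python. The `reduce_by_key` helper below is hypothetical and only a sketch of the idea; it does not perform any real shuffle.

```python
def reduce_by_key(partitions, fn):
    """Hypothetical plain-Python emulation of reduceByKey semantics."""
    # Step 1: combine values for each key locally inside every partition.
    local = []
    for part in partitions:
        acc = {}
        for key, value in part:
            acc[key] = fn(acc[key], value) if key in acc else value
        local.append(acc)
    # Step 2: merge the per-partition results for each key.
    merged = {}
    for acc in local:
        for key, value in acc.items():
            merged[key] = fn(merged[key], value) if key in merged else value
    return merged


pairs = [[("a", 1), ("b", 1)], [("a", 1), ("c", 1)], [("a", 1)]]
print(reduce_by_key(pairs, lambda x, y: x + y))  # {'a': 3, 'b': 1, 'c': 1}
```

Combining locally first means fewer values have to be moved between partitions, which is the usual reason `reduceByKey` is preferred over grouping all values before reducing them.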
Link: https://www.cnblogs.com/MOBIN/p/5384543.html#9