RDD中cache和persist的区别

RDD中cache和persist的区别

通过观察RDD.scala源代码即可知道cache和persist的区别:
def persist(newLevel: StorageLevel): this.type = {
  if (storageLevel != StorageLevel.NONE && newLevel != storageLevel) {
    throw new UnsupportedOperationException( "Cannot change storage level of an       
                 RDD after it was already assigned a level")
  }
  sc.persistRDD(this)
  sc.cleaner.foreach(_.registerRDDForCleanup(this))
  storageLevel = newLevel
  this
}
/** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
def persist(): this.type = persist(StorageLevel.MEMORY_ONLY)
 
/** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
def cache(): this.type = persist()
可知:
1)RDD的cache()方法其实调用的就是persist方法,缓存策略均为MEMORY_ONLY;
2)可以通过persist方法手工设定StorageLevel来满足工程需要的存储级别;
3)cache或者persist并不是action;
注意:cache和persist都可以用unpersist来取消
参考:http://www.it610.com/article/5230635.htm

猜你喜欢

转载自blog.csdn.net/helloxiaozhe/article/details/80464069