Using zstd compression in Spark 3

If you do not want Spark to split a file into multiple partitions, raise the split-related parameters so that the number of tasks is driven by whole-file sizes. For example, setting both values to 2 GiB keeps any file smaller than 2 GiB in a single partition:
--conf spark.sql.files.maxPartitionBytes=2147483648 --conf spark.sql.files.openCostInBytes=2147483648
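
As a minimal sketch, the two settings above might be passed at submission time like this (the script name `app.py` and the lack of other resource flags are placeholder assumptions, not from the original post):

```shell
# Hypothetical job submission; app.py is a placeholder script name.
# 2147483648 bytes = 2 GiB: both split-size knobs are raised so that
# any input file under 2 GiB is read as a single partition/task.
spark-submit \
  --conf spark.sql.files.maxPartitionBytes=2147483648 \
  --conf spark.sql.files.openCostInBytes=2147483648 \
  app.py
```

`spark.sql.files.maxPartitionBytes` caps how many bytes go into one partition when reading files, and `spark.sql.files.openCostInBytes` is the estimated cost of opening a file; raising both discourages Spark from packing or splitting files across partitions.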

Output files can be compressed with zstd; support for zstd as the Parquet compression codec only begins with Spark 3:
--conf spark.sql.parquet.compression.codec=zstd
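
A minimal sketch of passing this codec setting at submission time (the script name `etl_job.py` is a placeholder, not from the original post):

```shell
# Hypothetical submission; etl_job.py is a placeholder script name.
# All Parquet writes in the job will use zstd compression by default.
spark-submit \
  --conf spark.sql.parquet.compression.codec=zstd \
  etl_job.py
```

The codec can also be chosen per write instead of session-wide, e.g. `df.write.option("compression", "zstd").parquet(path)`.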


Origin blog.csdn.net/weixin_43015677/article/details/131686983