Differences between the ORC and Parquet formats

References:

https://www.cnblogs.com/ITtangtang/p/7677912.html

https://blog.csdn.net/yu616568/article/details/51868447

https://hortonworks.com/blog/orcfile-in-hdp-2-better-compression-better-performance/

Summary

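Both are columnar storage formats that keep precomputed statistics (such as per-column minimum and maximum values) alongside the data. Parquet's data model follows Google's Dremel; ORC grew out of Hive's RCFile and borrows similar ideas.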
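To make the value of precomputed statistics concrete, here is a minimal pure-Python sketch of min/max pruning. It is not the on-disk layout of either format; `ColumnChunk`, `write_chunks`, and `scan_greater_than` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ColumnChunk:
    """One block of a single column plus statistics computed at write time."""
    values: List[int]
    min_value: int
    max_value: int


def write_chunks(values: List[int], chunk_size: int) -> List[ColumnChunk]:
    """Split a column into fixed-size chunks and record min/max for each."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        block = values[start:start + chunk_size]
        chunks.append(ColumnChunk(block, min(block), max(block)))
    return chunks


def scan_greater_than(chunks: List[ColumnChunk], threshold: int) -> Tuple[List[int], int]:
    """Return values > threshold and the number of chunks actually read."""
    matches: List[int] = []
    chunks_read = 0
    for chunk in chunks:
        # The stored maximum proves no value in this chunk can match,
        # so the chunk is skipped without touching its values.
        if chunk.max_value <= threshold:
            continue
        chunks_read += 1
        matches.extend(v for v in chunk.values if v > threshold)
    return matches, chunks_read


chunks = write_chunks(list(range(100)), chunk_size=10)
matches, chunks_read = scan_greater_than(chunks, threshold=84)
print(len(chunks), chunks_read, len(matches))
```

With sorted or clustered data, most chunks are ruled out by their statistics alone, which is the effect both formats aim for when a query carries a filter.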

The main difference is that Parquet has better support for nested data (nested and complex types such as struct).
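As a rough illustration of what storing nested data in a columnar layout involves, the sketch below shreds struct-like records into one column per leaf field. It is a simplification under stated assumptions: every record has every field, and there are no repeated (list) fields, which real formats handle with extra metadata. `flatten` and `to_columns` are hypothetical helper names, not part of any library.

```python
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Map a nested dict to {dotted.leaf.path: value}."""
    leaves: Dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into the struct; only leaf fields become columns.
            leaves.update(flatten(value, path))
        else:
            leaves[path] = value
    return leaves


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Turn row-oriented nested records into one value list per leaf path."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for path, value in flatten(record).items():
            columns.setdefault(path, []).append(value)
    return columns


records = [
    {"id": 1, "user": {"name": "a", "address": {"city": "X"}}},
    {"id": 2, "user": {"name": "b", "address": {"city": "Y"}}},
]
columns = to_columns(records)
print(columns)
```

A query that only needs `user.address.city` can then read that single column and ignore the rest of the struct.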

In other respects, ORC tends to perform slightly better.

Cloudera promotes Parquet, while Hortonworks promotes ORC.

Reposted from blog.csdn.net/rav009/article/details/82706307