Use HyperLogLog functions for advanced analysis in Apache Spark

Pre-aggregation is a common technique in high-performance analytics. For example, 10 billion website visit records per hour can be reduced to 10 million rows of visit statistics by aggregating on the commonly queried dimensions. That is a 1,000-fold reduction in the volume of data to process, which significantly cuts the computation needed at query time and improves response speed. Higher levels of aggregation may bring further performance gains, for example aggregating by day in the time dimension, or by site rather than by URL.
In this article, we introduce the advanced HyperLogLog functionality of the open-source library spark-alchemy and explore how it addresses data aggregation at big-data scale. First, let's look at the challenges involved.
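
To make the idea concrete, here is a minimal sketch in plain Python rather than Spark. The record layout, the URLs, and the helper name `pre_aggregate` are hypothetical and chosen only for illustration; the final line simply restates the arithmetic of the example above.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw visit records: (timestamp, url). Illustrative data only.
raw_visits = [
    (datetime(2019, 9, 1, 10, 5), "https://example.com/a"),
    (datetime(2019, 9, 1, 10, 17), "https://example.com/a"),
    (datetime(2019, 9, 1, 10, 42), "https://example.com/b"),
    (datetime(2019, 9, 1, 11, 3), "https://example.com/a"),
]


def pre_aggregate(visits):
    """Collapse raw records into one visit count per (hour, url) pair."""
    counts = Counter()
    for ts, url in visits:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[(hour, url)] += 1
    return counts


hourly = pre_aggregate(raw_visits)

# The reduction factor quoted in the text: 10 billion records -> 10 million rows.
reduction = 10_000_000_000 // 10_000_000
```

Queries that only need visits per hour and URL can then read the much smaller `hourly` table instead of scanning the raw records.
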

The challenges of reaggregation

Pre-aggregation is a powerful data analysis technique, provided that the measures being computed can be reaggregated. Aggregations such as sums and counts are associative, so their results can easily be aggregated further.
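
The sketch below, again in plain Python with hypothetical data and names, illustrates what reaggregation means in practice. Hourly visit counts roll up correctly by simple addition. As a contrast, and as an assumption about the kind of measure that motivates HyperLogLog, distinct visitor counts do not: adding hourly distinct counts double-counts anyone seen in more than one hour.

```python
from collections import defaultdict

# Hypothetical pre-aggregated visit counts per (hour, url).
hourly_visits = {("10:00", "/a"): 2, ("10:00", "/b"): 1, ("11:00", "/a"): 1}


def roll_up_by_url(hourly):
    """Reaggregate hourly counts into per-URL totals by summing."""
    totals = defaultdict(int)
    for (_hour, url), n in hourly.items():
        totals[url] += n
    return dict(totals)


# Sums are associative, so summing the hourly sums gives the correct total.
daily_visits = roll_up_by_url(hourly_visits)

# Hypothetical visitor IDs seen in each hour; "u2" appears in both hours.
visitors_by_hour = {"10:00": {"u1", "u2"}, "11:00": {"u2", "u3"}}

# Pre-aggregated distinct counts per hour.
hourly_distinct = {h: len(ids) for h, ids in visitors_by_hour.items()}

# Naive reaggregation: adding the hourly distinct counts.
summed_distinct = sum(hourly_distinct.values())

# The correct answer requires the underlying sets of visitors.
true_distinct = len(set().union(*visitors_by_hour.values()))
```

Here `summed_distinct` overstates `true_distinct` because the pre-aggregated counts no longer carry the information needed to detect the overlap.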

Origin yq.aliyun.com/articles/718186