Bucket sort for sorting big data

A big data set cannot be loaded into memory all at once, so it is repeatedly subdivided into bucket files on disk.
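To make the idea concrete, here is a minimal sketch (my own illustration, not code from the original post) of one scatter pass. It assumes the input is a text file of non-negative 32-bit integers, one per line; the bucket count and file names are made up for the example.

```python
# Sketch: one pass that scatters a huge file of 32-bit unsigned integers
# (one value per line) into NUM_BUCKETS smaller files on disk.
# NUM_BUCKETS and the file naming scheme are illustrative assumptions.

NUM_BUCKETS = 256
RANGE_PER_BUCKET = 2 ** 32 // NUM_BUCKETS   # width of each bucket's value range


def split_into_bucket_files(input_path, prefix="bucket"):
    outs = [open(f"{prefix}_{i}.txt", "w") for i in range(NUM_BUCKETS)]
    try:
        with open(input_path) as f:
            for line in f:
                x = int(line)                        # value in [0, 2^32)
                outs[x // RANGE_PER_BUCKET].write(line)
    finally:
        for o in outs:
            o.close()
    return [f"{prefix}_{i}.txt" for i in range(NUM_BUCKETS)]
```

Each bucket file then covers a narrow value range and can be sorted independently, which is all the later steps rely on.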

 

Take integers as an example: if the values being sorted are 32-bit integers, a plain counting sort needs 2^32 buckets, each bucket being a 4-byte integer counter.

2^32 buckets × 4 bytes = 2^2 × 2^10 × 2^10 × 2^10 × 4 bytes = 16 GB, so 16 GB of memory would be needed to compute the median of the integer array this way. Now suppose only 1 GB of memory is actually available: the number of buckets has to be compressed by a factor of 2^4, so a value x goes into bucket x / 2^4, and we count how many values fall into each bucket.

That leaves 2^28 buckets at 4 bytes each, i.e. 1 GB, so the counters fit in memory. From the counters we find the bucket that contains the median, then traverse the data once more, recording every value that lands in that bucket (assuming a roughly uniform distribution, so they fit in memory), quick-sort them, and read off the median.
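A sketch of that two-pass median search, assuming unsigned 32-bit values and an input that can be traversed twice; read_values and total_count are placeholder helpers for the example, and the 2^28 four-byte counters correspond to the 1 GB budget above.

```python
from array import array

SHIFT = 4                        # compress the bucket count by 2^4, as above
NUM_BUCKETS = 2 ** (32 - SHIFT)  # 2^28 buckets


def median_of_uint32(read_values, total_count):
    # Pass 1: 2^28 four-byte counters, roughly the 1 GB computed in the text.
    counts = array("I", [0]) * NUM_BUCKETS
    for x in read_values():                 # read_values() yields the data once
        counts[x >> SHIFT] += 1

    # Walk the counters to find the bucket holding the (lower) median.
    target = (total_count - 1) // 2
    seen = 0
    for bucket, c in enumerate(counts):
        if seen + c > target:
            median_bucket, rank_in_bucket = bucket, target - seen
            break
        seen += c

    # Pass 2: keep only the values in that bucket (assumed to fit in memory
    # when the data is roughly uniform), sort them, and pick the median.
    in_bucket = sorted(x for x in read_values() if x >> SHIFT == median_bucket)
    return in_bucket[rank_in_bucket]
```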

If the distribution is uneven, oversized buckets are handled by subdividing them recursively, as in the following diagram (a rough code sketch follows as well):

 
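The diagram is not reproduced here, but the recursive idea can be sketched as follows (my own sketch, with an assumed 1 GB memory budget and made-up file naming): a bucket file that is still too large for memory is split again over its narrower value range, until every piece can be sorted in memory.

```python
import os

MEMORY_BUDGET = 1 << 30   # assumed 1 GB budget, matching the example above
FANOUT = 16               # sub-buckets created per split; an illustrative choice


def sort_bucket_file(path, lo, hi, out):
    """Sort the values of `path` (all in [lo, hi), one per line) into `out`."""
    if os.path.getsize(path) <= MEMORY_BUDGET:
        # Small enough: load, sort in memory, write out.
        with open(path) as f:
            values = sorted(int(line) for line in f)
        out.writelines(f"{v}\n" for v in values)
        return
    if hi - lo <= 1:
        # Every value in this file is identical: nothing to sort, just copy.
        with open(path) as f:
            out.writelines(f)
        return
    # Too big and the range can still be narrowed: split and recurse.
    width = max(1, (hi - lo + FANOUT - 1) // FANOUT)
    n_subs = (hi - lo + width - 1) // width
    subs = [open(f"{path}.{i}", "w") for i in range(n_subs)]
    with open(path) as f:
        for line in f:
            subs[(int(line) - lo) // width].write(line)
    for s in subs:
        s.close()
    for i in range(n_subs):
        sort_bucket_file(f"{path}.{i}", lo + i * width,
                         min(lo + (i + 1) * width, hi), out)
```

Calling it on a top-level bucket file together with that bucket's value range and an open output file produces the fully sorted contents of that bucket.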

The following two posts both discuss this approach, and both have problems:

https://www.jianshu.com/p/9f4ce4ec5684

"Step Four: keep on going until the end of the least significant byte (7-0bit) of bucket sort I believe that this time can be used once ranked in the fast memory can be a " less rigorous

https://juejin.im/post/5d4c2158f265da03ae7861c7?utm_source=gold_browser_extension

The second post keeps mentioning further division when the split is uneven: "If, after division, the orders between 101 yuan and 200 yuan are still too many to be read into memory at once, keep dividing until every file can be read into memory." Compared with the first post it does state a stop condition for the recursive partitioning, but it does not consider the extreme case with floating-point amounts: a bucket file crammed with extremely dense (for instance identical) values may still not fit into memory no matter how many more times it is divided.
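One way to patch that corner case (my own suggestion, not from either post): once a bucket's range has collapsed to essentially a single value, stop splitting and keep only (value, count) pairs, whose memory use is bounded by the number of distinct values rather than the number of records.

```python
from collections import Counter


def compact_dense_bucket(path):
    """Collapse a bucket file full of duplicate amounts into (value, count)
    pairs; memory use depends on the number of distinct values, not on the
    number of records in the file."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            counts[line.strip()] += 1
    # Emit the pairs in numeric order; repeating each value `count` times
    # reproduces the sorted bucket without ever holding it all in memory.
    return sorted(counts.items(), key=lambda kv: float(kv[0]))
```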


Source: www.cnblogs.com/silyvin/p/11613772.html