Big data cannot be loaded into memory all at once, so it is repeatedly subdivided into bucket files.
Take 32-bit integers as an example. A full counting sort needs 2^32 buckets, each holding a 4-byte integer count, so 2^32 * 4 bytes = 16 GB of memory is required just to find the median of an integer array. Now suppose only 1 GB is available: the number of buckets must be compressed by a factor of 2^4, so a value x falls into bucket x / 2^4. That leaves 2^28 buckets of 4 bytes each, i.e. 1 GB. Walk the counts to find which bucket the median falls into, then traverse the data a second time (assuming a roughly uniform distribution), collect only that bucket's values, quick-sort them, and read off the median.
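The two-pass scheme above can be sketched as follows, scaled down for illustration. The sketch assumes non-negative values below 2^16 and shifts by 4 bits, mirroring the x / 2^4 compression; `bucket_median`, `SHIFT`, and `NUM_BUCKETS` are names chosen here, not from the original posts.

```python
# Minimal sketch of the two-pass bucket median, scaled down from the
# 2**32 case: values are assumed non-negative and below 2**16, and the
# bucket index is x >> 4, i.e. the x / 2**4 compression described above.

SHIFT = 4                  # compress the bucket count by 2**4
NUM_BUCKETS = 2 ** 12      # 2**16 possible values / 2**4 values per bucket

def bucket_median(values):
    # Pass 1: count how many values fall into each compressed bucket.
    counts = [0] * NUM_BUCKETS
    for x in values:
        counts[x >> SHIFT] += 1
    n = len(values)

    # Walk the counts to locate the bucket holding rank n // 2.
    target, seen = n // 2, 0
    for b, c in enumerate(counts):
        if seen + c > target:
            median_bucket, rank_in_bucket = b, target - seen
            break
        seen += c

    # Pass 2: re-traverse, keep only that bucket's values, sort, pick.
    in_bucket = sorted(x for x in values if x >> SHIFT == median_bucket)
    return in_bucket[rank_in_bucket]
```

In the real 16 GB setting the two passes would re-read the data file rather than a list; only the 1 GB `counts` array stays resident.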
If the distribution is uneven, handle it by splitting the overloaded bucket recursively, as the following diagram shows:
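The recursive split can be sketched as follows: re-partition only the sub-range that contains the sought rank, halving the value range whenever a sub-range still holds too many elements. This is a hypothetical sketch, not the posts' code; `median_rank` and the deliberately tiny `MEM_LIMIT` are illustrative names.

```python
# Hypothetical sketch: recursively re-partition a range whose element
# count exceeds the in-memory budget, narrowing the value range each time.

MEM_LIMIT = 4  # pretend only this many elements fit in memory at once

def median_rank(values, rank, lo, hi):
    """Return the element of the given rank among values in [lo, hi)."""
    sub = [x for x in values if lo <= x < hi]
    if len(sub) <= MEM_LIMIT or hi - lo <= 1:
        # Small enough to sort in memory -- or the range has collapsed
        # to a single value (a run of duplicates), which sorts trivially.
        return sorted(sub)[rank]
    mid = (lo + hi) // 2
    left = sum(1 for x in sub if x < mid)   # how many fall in [lo, mid)
    if rank < left:
        return median_rank(sub, rank, lo, mid)
    return median_rank(sub, rank - left, mid, hi)
```

The `hi - lo <= 1` guard is what terminates the recursion even when one value dominates the data, a case revisited below.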
These two posts are broadly correct, but each has a gap:
https://www.jianshu.com/p/9f4ce4ec5684
"Step Four: keep on going until the end of the least significant byte (7-0bit) of bucket sort I believe that this time can be used once ranked in the fast memory can be a " less rigorous
https://juejin.im/post/5d4c2158f265da03ae7861c7?utm_source=gold_browser_extension
The second post keeps subdividing uneven buckets: "If, after splitting, the orders between 101 yuan and 200 yuan are still too many to read into memory at once, keep splitting until every file can be read into memory." Unlike the first post, it does give a stop condition for the recursive partitioning, but it ignores an extreme case with float amounts: a file packed with duplicates of the same value will never fit into memory, no matter how many times the bucket is split.
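A small sketch of the stop condition that post is missing (all names here are hypothetical): besides "the file fits in memory", the recursion must also stop once a bucket's value range has collapsed to a single value, because splitting can never shrink a run of duplicates -- their count alone answers the rank query.

```python
# Hypothetical stop-condition check for the recursive bucket split.
# count:      number of elements in the bucket file
# lo, hi:     the bucket's value range (collapsed when lo == hi)
# mem_limit:  how many elements fit in memory at once

def needs_more_splitting(count, lo, hi, mem_limit):
    if count <= mem_limit:
        return False   # fits in memory: sort it directly
    if lo == hi:
        return False   # all duplicates of one value: the answer is lo
    return True        # otherwise, split [lo, hi] again
```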