Source Code Analysis of the Arrays.sort() Method

The source code of Arrays.sort(Object[] a) is as follows:

public static void sort(Object[] a) {
    if (LegacyMergeSort.userRequested)
        legacyMergeSort(a);
    else
        ComparableTimSort.sort(a, 0, a.length, null, 0, 0);
}
/** To be removed in a future release. */
private static void legacyMergeSort(Object[] a) {
    Object[] aux = a.clone();
    mergeSort(aux, a, 0, a.length, 0);
}

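sort contains a single branch: when LegacyMergeSort.userRequested is true it uses legacyMergeSort, otherwise it uses ComparableTimSort. The name LegacyMergeSort.userRequested roughly means "the user requested the legacy merge sort"; the flag is read from the system property java.util.Arrays.useLegacyMergeSort.
legacyMergeSort is the traditional merge sort, while ComparableTimSort is an optimized merge sort.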


Suppose the following statements are executed (Person must implement Comparable for the sort to work):

Person[] p = new Person[4];
p[0] = new Person("AAA",30);
p[1] = new Person("BBB",20);
p[2] = new Person("CCC",25);
p[3] = new Person("DDD",5);
Arrays.sort(p);
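
The snippet above does not show the Person class. As a minimal sketch, assuming a hypothetical Person whose natural ordering is ascending by age (the real class may order differently), a runnable version could look like this:

```java
import java.util.Arrays;

class PersonSortDemo {
    // Builds the same four-element array as the snippet and sorts it.
    static Person[] sortedSample() {
        Person[] p = new Person[4];
        p[0] = new Person("AAA", 30);
        p[1] = new Person("BBB", 20);
        p[2] = new Person("CCC", 25);
        p[3] = new Person("DDD", 5);
        Arrays.sort(p); // uses Person.compareTo
        return p;
    }

    public static void main(String[] args) {
        // prints [DDD(5), BBB(20), CCC(25), AAA(30)]
        System.out.println(Arrays.toString(sortedSample()));
    }
}

// Hypothetical Person class: its definition is an assumption, not from the article.
class Person implements Comparable<Person> {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // Assumption: natural ordering is ascending by age.
    @Override
    public int compareTo(Person other) {
        return Integer.compare(this.age, other.age);
    }

    @Override
    public String toString() {
        return name + "(" + age + ")";
    }
}
```

If Person did not implement Comparable, Arrays.sort(p) would fail with a ClassCastException, because the sort casts each element to Comparable.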

This calls the sort(Object[] a) method of Arrays, whose source is:

public static void sort(Object[] a) {
     if (LegacyMergeSort.userRequested)
         legacyMergeSort(a);
     else
         ComparableTimSort.sort(a, 0, a.length, null, 0, 0);
}

If LegacyMergeSort.userRequested (roughly, "the user requested the legacy merge sort") is true, legacyMergeSort(Object[] a) is called; otherwise the sort(Object[] a, int lo, int hi, Object[] work, int workBase, int workLen) method of ComparableTimSort is called.

First, let's look at legacyMergeSort(Object[] a), whose source is as follows:

/** To be removed in a future release. */
private static void legacyMergeSort(Object[] a) {
    Object[] aux = a.clone();
    mergeSort(aux, a, 0, a.length, 0);
}

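The doc comment states that this method will be removed in a future release.
The method clones a into aux and then calls mergeSort(Object[] src, Object[] dest, int low, int high, int off) with the arguments aux, a, 0, a.length, 0. The source of mergeSort is as follows: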

/**
 * Src is the source array that starts at index 0
 * Dest is the (possibly larger) array destination with a possible offset
 * low is the index in dest to start sorting
 * high is the end index in dest to end sorting
 * off is the offset to generate corresponding low, high in src
 * To be removed in a future release.
 */
    private static void mergeSort(Object[] src,
                                  Object[] dest,
                                  int low,
                                  int high,
                                  int off) {
        int length = high - low;

        // Insertion sort on smallest arrays
        if (length < INSERTIONSORT_THRESHOLD) {
            for (int i=low; i<high; i++)
                for (int j=i; j>low &&
                         ((Comparable) dest[j-1]).compareTo(dest[j])>0; j--)
                    swap(dest, j, j-1);
            return;
        }

        // Recursively sort halves of dest into src
        int destLow  = low;
        int destHigh = high;
        low  += off;
        high += off;
        int mid = (low + high) >>> 1;
        mergeSort(dest, src, low, mid, -off);
        mergeSort(dest, src, mid, high, -off);

        // If list is already sorted, just copy from src to dest.  This is an
        // optimization that results in faster sorts for nearly ordered lists.
        if (((Comparable)src[mid-1]).compareTo(src[mid]) <= 0) {
            System.arraycopy(src, low, dest, destLow, length);
            return;
        }

        // Merge sorted halves (now in src) into dest
        for(int i = destLow, p = low, q = mid; i < destHigh; i++) {
            if (q >= high || p < mid && ((Comparable)src[p]).compareTo(src[q])<=0)
                dest[i] = src[p++];
            else
                dest[i] = src[q++];
        }
    }

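The doc comment describes the parameters:
src is the source array, indexed from 0. In the top-level call it is aux, the clone of the array being sorted.
dest is the destination array, which ends up holding the sorted result. In the top-level call it is a itself.
low is the index in dest at which sorting starts (inclusive).
high is the index in dest at which sorting ends (exclusive).
off is the offset added to low and high to obtain the corresponding indices in src.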

Merge sort splits the elements into two subarrays, sorts each subarray, and then merges the two sorted subarrays back together.

int length = high - low;

This computes the number of elements to sort.

// Insertion sort on smallest arrays
if (length < INSERTIONSORT_THRESHOLD) {
     for (int i=low; i<high; i++)
        for (int j=i; j>low &&((Comparable) dest[j-1]).compareTo(dest[j])>0; j--)
           swap(dest, j, j-1);
    return;
}

If the number of elements to sort is less than INSERTIONSORT_THRESHOLD (7), the range is sorted directly, with no merging. The threshold is defined as:

/**
  * Tuning parameter: list size at or below which insertion sort will be
  * used in preference to mergesort.
  * To be removed in a future release.
  */
  private static final int INSERTIONSORT_THRESHOLD = 7;

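Each element is compared in turn with the elements before it and swapped backward for as long as it is smaller than its predecessor. This is an insertion sort implemented with adjacent swaps, not a bubble sort. Note that the doc comment says "at or below", while the code actually tests length < INSERTIONSORT_THRESHOLD.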
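
To make the loop structure easier to follow, here is a minimal sketch of the same nested loops on a plain int array. The class and method names are illustrative, and the real code compares Comparable objects rather than ints:

```java
import java.util.Arrays;

class InsertionSortSketch {
    // Sorts dest[low, high) in place, mirroring the nested loops in mergeSort.
    static void insertionSort(int[] dest, int low, int high) {
        for (int i = low; i < high; i++) {
            // Move dest[i] backward while it is smaller than its predecessor.
            for (int j = i; j > low && dest[j - 1] > dest[j]; j--) {
                int tmp = dest[j];
                dest[j] = dest[j - 1];
                dest[j - 1] = tmp;
            }
        }
    }

    public static void main(String[] args) {
        int[] ages = {30, 20, 25, 5};
        insertionSort(ages, 0, ages.length);
        System.out.println(Arrays.toString(ages)); // prints [5, 20, 25, 30]
    }
}
```

After iteration i of the outer loop, the prefix dest[low, i] is sorted, which is the defining property of insertion sort.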

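If the number of elements is greater than or equal to INSERTIONSORT_THRESHOLD (7), insertion sort is not used at this level, and execution continues with the following code: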

// Recursively sort halves of dest into src
int destLow  = low;
int destHigh = high;
low  += off;
high += off;
int mid = (low + high) >>> 1;
mergeSort(dest, src, low, mid, -off);
mergeSort(dest, src, mid, high, -off);

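This is the merge sort proper: the range is split in half and the halves are sorted recursively. Once a subrange has fewer than 7 elements it is sorted with insertion sort, and the sorted halves are then merged, which completes the sort.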
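
The whole scheme can be sketched on an int array. This is a simplified illustration, not the JDK code: it assumes off is always 0 (so the off parameter is dropped), compares ints instead of Comparable objects, and uses illustrative names:

```java
import java.util.Arrays;

class HybridMergeSortSketch {
    static final int THRESHOLD = 7; // same value as INSERTIONSORT_THRESHOLD

    static void sort(int[] a) {
        int[] aux = a.clone();           // aux plays the role of src
        mergeSort(aux, a, 0, a.length);  // a plays the role of dest
    }

    // Sorts dest[low, high). On entry src and dest hold the same elements there.
    private static void mergeSort(int[] src, int[] dest, int low, int high) {
        int length = high - low;
        if (length < THRESHOLD) {        // small range: insertion sort in dest
            for (int i = low; i < high; i++)
                for (int j = i; j > low && dest[j - 1] > dest[j]; j--) {
                    int tmp = dest[j];
                    dest[j] = dest[j - 1];
                    dest[j - 1] = tmp;
                }
            return;
        }
        int mid = (low + high) >>> 1;
        mergeSort(dest, src, low, mid);  // roles swap: sorted halves land in src
        mergeSort(dest, src, mid, high);
        if (src[mid - 1] <= src[mid]) {  // halves already in order: just copy
            System.arraycopy(src, low, dest, low, length);
            return;
        }
        // Merge the two sorted halves of src into dest.
        for (int i = low, p = low, q = mid; i < high; i++) {
            if (q >= high || (p < mid && src[p] <= src[q]))
                dest[i] = src[p++];
            else
                dest[i] = src[q++];
        }
    }

    public static void main(String[] args) {
        int[] data = {17, 3, 25, 8, 8, 42, -5, 0, 13, 99, 1, 64};
        sort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

Swapping the roles of src and dest at each recursion level is what lets the algorithm work with a single auxiliary array instead of allocating a new one per merge.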


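Now let's look at the sort(Object[] a, int lo, int hi, Object[] work, int workBase, int workLen) method of ComparableTimSort.
TimSort is a hybrid sorting algorithm derived from merge sort and insertion sort, designed to perform well on many kinds of real-world data. It was originally implemented by Tim Peters in 2002 for the Python language.
TimSort is a heavily optimized merge sort. It detects strictly descending runs and reverses them in place, so input that is already sorted in reverse order is handled cheaply. For input that is already sorted in ascending order it needs far fewer comparisons than a plain merge sort. It also copes well with input that mixes the two (ascending for a while, then descending).
The source of ComparableTimSort.sort is as follows: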

    static void sort(Object[] a, int lo, int hi, Object[] work, int workBase, int workLen) {
        assert a != null && lo >= 0 && lo <= hi && hi <= a.length;

        int nRemaining  = hi - lo;
        if (nRemaining < 2)
            return;  // Arrays of size 0 and 1 are always sorted

        // If array is small, do a "mini-TimSort" with no merges
        if (nRemaining < MIN_MERGE) {
            int initRunLen = countRunAndMakeAscending(a, lo, hi);
            binarySort(a, lo, hi, lo + initRunLen);
            return;
        }

        /**
         * March over the array once, left to right, finding natural runs,
         * extending short natural runs to minRun elements, and merging runs
         * to maintain stack invariant.
         */
        ComparableTimSort ts = new ComparableTimSort(a, work, workBase, workLen);
        int minRun = minRunLength(nRemaining);
        do {
            // Identify next run
            int runLen = countRunAndMakeAscending(a, lo, hi);

            // If run is short, extend to min(minRun, nRemaining)
            if (runLen < minRun) {
                int force = nRemaining <= minRun ? nRemaining : minRun;
                binarySort(a, lo, lo + force, lo + runLen);
                runLen = force;
            }

            // Push run onto pending-run stack, and maybe merge
            ts.pushRun(lo, runLen);
            ts.mergeCollapse();

            // Advance to find next run
            lo += runLen;
            nRemaining -= runLen;
        } while (nRemaining != 0);

        // Merge all remaining runs to complete sort
        assert lo == hi;
        ts.mergeForceCollapse();
        assert ts.stackSize == 1;
    }

Let's go through it step by step:

int nRemaining  = hi - lo;
if (nRemaining < 2)
   return;  // Arrays of size 0 and 1 are always sorted

This computes the number of elements to sort; if there are fewer than 2, no sorting is needed.

// If array is small, do a "mini-TimSort" with no merges
if (nRemaining < MIN_MERGE) {
    int initRunLen = countRunAndMakeAscending(a, lo, hi);
    binarySort(a, lo, hi, lo + initRunLen);
    return;
}

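If the range to sort has fewer than MIN_MERGE elements (32 in the Java implementation, 64 in the Python implementation), a "mini-TimSort" with no merge step is performed: countRunAndMakeAscending finds the initial run and binarySort sorts the rest.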

private static final int MIN_MERGE = 32;

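a) Starting at the beginning of the range, find a run of consecutive elements that is either ascending (equal neighbors allowed) or strictly descending; a descending run is reversed once found.
b) Binary sort: insert each following element into the already sorted prefix, using binary search to find its position. binarySort(a, lo, hi, start) sorts a[lo:hi], given that a[lo:start] is already sorted. For each element of a[start:hi], it uses a binary search to find the element's position in the sorted prefix and inserts it there.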

Let's see how countRunAndMakeAscending finds an ascending or strictly descending run.
Its source is as follows:
@SuppressWarnings({"unchecked", "rawtypes"})
private static int countRunAndMakeAscending(Object[] a, int lo, int hi) {
   assert lo < hi;
   int runHi = lo + 1;
   if (runHi == hi)
      return 1;
   // Find end of run, and reverse range if descending
   if (((Comparable) a[runHi++]).compareTo(a[lo]) < 0) {// Descending
      while(runHi<hi&&((Comparable)a[runHi]).compareTo(a[runHi-1])<0)
          runHi++;
      reverseRange(a, lo, runHi);
   } else {// Ascending
      while(runHi<hi&&((Comparable)a[runHi]).compareTo(a[runHi-1])>=0)
         runHi++;
   }
   return runHi - lo;
}
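countRunAndMakeAscending takes three parameters: the array to search, the start index (inclusive), and the end index (exclusive).
The basic idea is to compare the second element with the first to decide whether the run is ascending or descending:
    If the second element is less than the first, the run is descending. In the while loop, runHi is incremented as long as each element is still strictly less than the one before it; once that no longer holds, reverseRange is called to reverse the run into ascending order.
    If the second element is greater than or equal to the first, the run is ascending. In the while loop, runHi is incremented as long as each element is still greater than or equal to the one before it.
    The method returns runHi - lo, the length of the run. The run always starts at lo, and on return it is always in ascending order (possibly with equal neighbors). Descending runs must be strict so that reversing them cannot reorder equal elements, which keeps the sort stable.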
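
As an illustration, here is a sketch of the same run detection on an int array, with two small inputs. The class name and the int-based signature are assumptions made for the example; the JDK method works on Object[] and casts to Comparable:

```java
import java.util.Arrays;

class RunDetectionSketch {
    // Returns the length of the run starting at lo; a descending run is reversed.
    static int countRunAndMakeAscending(int[] a, int lo, int hi) {
        int runHi = lo + 1;
        if (runHi == hi)
            return 1;
        if (a[runHi++] < a[lo]) {                        // strictly descending
            while (runHi < hi && a[runHi] < a[runHi - 1])
                runHi++;
            reverseRange(a, lo, runHi);
        } else {                                         // ascending, equals allowed
            while (runHi < hi && a[runHi] >= a[runHi - 1])
                runHi++;
        }
        return runHi - lo;
    }

    // Reverses a[lo, hi) in place.
    static void reverseRange(int[] a, int lo, int hi) {
        hi--;
        while (lo < hi) {
            int t = a[lo];
            a[lo++] = a[hi];
            a[hi--] = t;
        }
    }

    public static void main(String[] args) {
        int[] desc = {9, 7, 4, 6, 8};
        System.out.println(countRunAndMakeAscending(desc, 0, desc.length)); // prints 3
        System.out.println(Arrays.toString(desc)); // prints [4, 7, 9, 6, 8]

        int[] asc = {1, 2, 2, 5, 3};
        System.out.println(countRunAndMakeAscending(asc, 0, asc.length));   // prints 4
    }
}
```

In the second input the equal pair 2, 2 stays inside the ascending run, while an equal pair would end a descending run.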

So when binarySort runs, it only needs to insert the elements from index lo + initRunLen onward, one by one, into the ascending run in front of them.
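
The article does not list the source of binarySort, so the following is only a sketch of a binary insertion sort with the same contract (a[lo, start) already sorted, sort a[lo, hi)), written for an int array. It is not the JDK implementation, and the class name is illustrative:

```java
import java.util.Arrays;

class BinarySortSketch {
    // Sorts a[lo, hi), assuming a[lo, start) is already sorted ascending.
    static void binarySort(int[] a, int lo, int hi, int start) {
        if (start == lo)
            start++;
        for (; start < hi; start++) {
            int pivot = a[start];
            // Binary search for the insertion point of pivot in a[lo, start).
            int left = lo;
            int right = start;
            while (left < right) {
                int mid = (left + right) >>> 1;
                if (pivot < a[mid])
                    right = mid;
                else
                    left = mid + 1; // equal elements stay in front of pivot
            }
            // Shift a[left, start) one slot right and drop pivot into the gap.
            System.arraycopy(a, left, a, left + 1, start - left);
            a[left] = pivot;
        }
    }

    public static void main(String[] args) {
        int[] a = {4, 7, 9, 6, 8};  // a[0, 3) is an ascending run
        binarySort(a, 0, a.length, 3);
        System.out.println(Arrays.toString(a)); // prints [4, 6, 7, 8, 9]
    }
}
```

Binary search reduces the number of comparisons per insertion to about log2 of the prefix length, although the number of element moves is the same as in a plain insertion sort.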

If the range has at least MIN_MERGE elements, the full TimSort procedure is used:

/**
 * March over the array once, left to right, finding natural runs,
 * extending short natural runs to minRun elements, and merging runs
 * to maintain stack invariant.
 */
ComparableTimSort ts = new ComparableTimSort(a,work,workBase, workLen);
int minRun = minRunLength(nRemaining);
do {
    // Identify next run
    int runLen = countRunAndMakeAscending(a, lo, hi);

    // If run is short, extend to min(minRun, nRemaining)
    if (runLen < minRun) {
        int force = nRemaining <= minRun ? nRemaining : minRun;
        binarySort(a, lo, lo + force, lo + runLen);
        runLen = force;
    }

    // Push run onto pending-run stack, and maybe merge
    ts.pushRun(lo, runLen);
    ts.mergeCollapse();

    // Advance to find next run
    lo += runLen;
    nRemaining -= runLen;
} while (nRemaining != 0);

Let's go through it step by step:

int minRun = minRunLength(nRemaining);

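minRunLength() is called with the number of elements to sort, nRemaining, to determine minRun. The array is then processed as a sequence of runs, each of which has at least minRun elements (except possibly the last one).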

The source of minRunLength is as follows:
private static int minRunLength(int n) {
   assert n >= 0;
   int r = 0;// Becomes 1 if any 1 bits are shifted off
   while (n >= MIN_MERGE) {
      r |= (n & 1);
      n >>= 1;
   }
   return n + r;
}
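-----------------------------------------------------------------------
>> is the signed (arithmetic) right shift: the vacated high-order bits are filled with copies of the sign bit, so the sign is preserved.
The expression has the form: exp1 >> exp2
It shifts the value exp1 to the right by exp2 bits.
Taking int as an example:

res = 20 >> 2
An int occupies 4 bytes, so 20 is represented in binary as:
0(sign bit) 000 0000 0000 0000 0000 0000 0001 0100,
Shift right by 2 bits, filling the high-order bits with the sign bit:
0(sign bit) 000 0000 0000 0000 0000 0000 0000 0101, so res = 5;

res = -20 >> 2
20 is represented in binary as:
0000 0000 0000 0000 0000 0000 0001 0100
-20 is obtained as follows: set the sign bit to 1, invert the remaining bits, and add 1
Set the sign bit to 1: 1000 0000 0000 0000 0000 0000 0001 0100
Invert the remaining bits: 1111 1111 1111 1111 1111 1111 1110 1011
Add 1: 1111 1111 1111 1111 1111 1111 1110 1100
So -20 is represented as: 1111 1111 1111 1111 1111 1111 1110 1100
Shift right by 2 bits, filling the high-order bits with the sign bit:
1111 1111 1111 1111 1111 1111 1111 1011
The sign bit is 1, so the result is still negative. To read off its value:
Keep the sign bit: 1111 1111 1111 1111 1111 1111 1111 1011
Invert the remaining bits: 1000 0000 0000 0000 0000 0000 0000 0100
Add 1: 1000 0000 0000 0000 0000 0000 0000 0101, so res = -5
In minRunLength, n is never negative, so n >>= 1 simply halves n, rounding down.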
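
The two worked examples can be checked with a few lines of Java. The sketch below also prints the unsigned shift >>> for contrast, which is not discussed above; the class and helper names are illustrative:

```java
class ShiftDemo {
    // Returns the 32-bit two's-complement pattern of x as a zero-padded string.
    static String bits(int x) {
        return String.format("%32s", Integer.toBinaryString(x)).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(bits(20) + "  (20)");
        System.out.println(bits(20 >> 2) + "  (20 >> 2 = " + (20 >> 2) + ")");     // 5
        System.out.println(bits(-20) + "  (-20)");
        System.out.println(bits(-20 >> 2) + "  (-20 >> 2 = " + (-20 >> 2) + ")");  // -5
        // Unsigned shift fills with zeros instead of the sign bit.
        System.out.println(bits(-20 >>> 2) + "  (-20 >>> 2 = " + (-20 >>> 2) + ")"); // 1073741819
    }
}
```

Every printed pattern is exactly 32 characters long, matching the 4-byte int representation used in the walkthrough.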
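-----------------------------------------------------------------------
n & 1
This is the bitwise AND of n and 1. Every bit of 1 is 0 except the lowest, so the result depends only on the lowest bit of n: it is 1 if that bit is 1, and 0 otherwise.
(n & 1) == 1 tests whether the lowest bit is 1, which can be used to check whether n is odd or even.
In other words, n & 1 is 1 when n is odd and 0 when n is even.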
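-----------------------------------------------------------------------
r |= (n & 1); is equivalent to r = r | (n & 1)
As shown above, n & 1 is either 0 or 1, so the possible outcomes are:
r = 0, n & 1 = 0: r becomes 0
r = 0, n & 1 = 1: r becomes 1
r = 1, n & 1 = 0: r stays 1
r = 1, n & 1 = 1: r stays 1
Once r becomes 1 it stays 1, so r records whether any 1 bit has been shifted off.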
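
A small sketch of this "sticky bit" behavior, using hypothetical helper names that do not appear in the JDK source:

```java
class StickyBitDemo {
    // True when the lowest bit of n is 1, i.e. n is odd.
    static boolean isOdd(int n) {
        return (n & 1) == 1;
    }

    // Shifts n right `shifts` times; returns 1 if any shifted-off bit was 1, else 0.
    static int shiftedOffAnyOne(int n, int shifts) {
        int r = 0;
        for (int i = 0; i < shifts; i++) {
            r |= (n & 1); // once r is 1 it can never go back to 0
            n >>= 1;
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(isOdd(41));                // prints true
        System.out.println(shiftedOffAnyOne(40, 3));  // 40 = 101000b, prints 0
        System.out.println(shiftedOffAnyOne(44, 3));  // 44 = 101100b, prints 1
    }
}
```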
-----------------------------------------------------------------------
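While n is greater than or equal to MIN_MERGE, r is updated and n is shifted right by 1 bit, until n is less than MIN_MERGE.
If n is already less than MIN_MERGE, n itself is returned.
For n greater than or equal to MIN_MERGE, the returned value lies in the range [16, 32].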
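
To see the range concretely, the method can be copied into a small standalone class (the class name and the sample sizes are chosen for illustration) and run on a few inputs:

```java
class MinRunLengthDemo {
    static final int MIN_MERGE = 32;

    // Same logic as the minRunLength method shown above.
    static int minRunLength(int n) {
        int r = 0; // becomes 1 if any 1 bits are shifted off
        while (n >= MIN_MERGE) {
            r |= (n & 1);
            n >>= 1;
        }
        return n + r;
    }

    public static void main(String[] args) {
        int[] sizes = {10, 31, 32, 33, 63, 64, 65, 100, 1000};
        for (int n : sizes) {
            System.out.println(n + " -> " + minRunLength(n));
        }
        // prints: 10 -> 10, 31 -> 31, 32 -> 16, 33 -> 17, 63 -> 32,
        //         64 -> 16, 65 -> 17, 100 -> 25, 1000 -> 32 (one per line)
    }
}
```

For example, 1000 is halved to 500, 250, 125, 62, and 31; the 1 bit dropped when 125 becomes 62 sets r, so the result is 31 + 1 = 32.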

Next, let's analyze the code inside the do-while loop:

int runLen = countRunAndMakeAscending(a, lo, hi);

This finds the run that starts at a[lo]. If the run is descending, it is reversed into ascending order.

if (runLen < minRun) {
   int force = nRemaining <= minRun ? nRemaining : minRun;
   binarySort(a, lo, lo + force, lo + runLen);
   runLen = force;
}

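If the length runLen of the run found is less than minRun (which, as shown above, is in the range [16, 32]), force is set to the smaller of nRemaining and minRun. binarySort, a binary insertion sort, then extends the run to force elements by sorting a[lo, lo + force), of which the first runLen elements are already in order.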

ts.pushRun(lo, runLen);
ts.mergeCollapse();

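At this point a[lo, lo + runLen) is sorted. ts.pushRun(lo, runLen) pushes it onto the stack of pending runs, recording the start and length of each sorted run in preparation for the later merges. ts.mergeCollapse() then merges runs on the stack where needed to maintain the stack invariant.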

lo += runLen;
nRemaining -= runLen;

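This effectively sets aside the sorted segment a[lo, lo + runLen) and repeats the do-while steps on the remainder, until no elements remain. After the loop, ts.mergeForceCollapse() merges all the runs still on the stack to complete the sort.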
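
The run-finding part of the loop can be sketched on an int array. This is a simplified illustration under stated assumptions: it records run lengths in a list instead of calling pushRun, it performs no merging at all, binarySort is a stand-in binary insertion sort rather than the JDK code, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

class RunDecompositionSketch {
    static final int MIN_MERGE = 32;

    // Returns the run lengths found by the loop. Each run ends up sorted; runs are NOT merged.
    static List<Integer> splitIntoRuns(int[] a) {
        List<Integer> runs = new ArrayList<>();
        int lo = 0, hi = a.length, nRemaining = hi - lo;
        if (nRemaining == 0) return runs;
        int minRun = minRunLength(nRemaining);
        do {
            int runLen = countRunAndMakeAscending(a, lo, hi);
            if (runLen < minRun) {               // short run: extend it
                int force = Math.min(nRemaining, minRun);
                binarySort(a, lo, lo + force, lo + runLen);
                runLen = force;
            }
            runs.add(runLen);                    // stands in for ts.pushRun(lo, runLen)
            lo += runLen;
            nRemaining -= runLen;
        } while (nRemaining != 0);
        return runs;
    }

    static int minRunLength(int n) {
        int r = 0;
        while (n >= MIN_MERGE) { r |= (n & 1); n >>= 1; }
        return n + r;
    }

    static int countRunAndMakeAscending(int[] a, int lo, int hi) {
        int runHi = lo + 1;
        if (runHi == hi) return 1;
        if (a[runHi++] < a[lo]) {                // strictly descending: reverse it
            while (runHi < hi && a[runHi] < a[runHi - 1]) runHi++;
            for (int i = lo, j = runHi - 1; i < j; i++, j--) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        } else {                                 // ascending, equals allowed
            while (runHi < hi && a[runHi] >= a[runHi - 1]) runHi++;
        }
        return runHi - lo;
    }

    // Binary insertion sort of a[lo, hi), given that a[lo, start) is sorted.
    static void binarySort(int[] a, int lo, int hi, int start) {
        for (; start < hi; start++) {
            int pivot = a[start], left = lo, right = start;
            while (left < right) {
                int mid = (left + right) >>> 1;
                if (pivot < a[mid]) right = mid; else left = mid + 1;
            }
            System.arraycopy(a, left, a, left + 1, start - left);
            a[left] = pivot;
        }
    }

    public static void main(String[] args) {
        int[] a = new int[64];
        for (int i = 0; i < a.length; i++) a[i] = (i * 37) % 64; // scrambled 0..63
        System.out.println(splitIntoRuns(a)); // prints [16, 16, 16, 16]
    }
}
```

With 64 scrambled elements, minRun is 16 and every natural run has length 2, so each run is extended to 16 elements by the binary insertion sort. Merging these four sorted runs is the job of mergeCollapse and mergeForceCollapse, which this sketch leaves out.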


Reposted from blog.csdn.net/qq_16403141/article/details/78234868