Sorting algorithms: the insertion sort family and a performance test

A first look at insertion sort

Insertion sort is a relatively simple sorting algorithm. The idea is to treat the first element of the array as an already sorted sequence, then take the remaining elements from left to right, one at a time, and insert each into the sorted sequence on its left. The sorted sequence keeps growing until the entire array is ordered.

The diagram is as follows

Suppose we want to sort the following array with insertion sort

First, add the first element to the sorted sequence

Then take the first element of the unsorted part and insert it at the appropriate position in the sorted sequence on the left

The first element of the unsorted part is 3, which should be inserted before 9

The result of the first round of insertion is as follows

Continue to the next round and take the first element of the unsorted part, 6

Insert 6 at the appropriate position in the sorted sequence on the left, that is, between 3 and 9

Continue to the next round and take the first element of the unsorted part, 1, which should be inserted to the left of 3

In the next round, 4 is inserted between 3 and 6

In the next round, 2 is inserted between 1 and 3

… Finally, the entire array is ordered

Note: inserting a number at the appropriate position in the sorted sequence can be implemented in several ways. The simplest works like bubbling: repeatedly compare the element being inserted with its left neighbour and swap the two until the element reaches its position. Following this idea, the code for insertion sort is as follows

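public void insertSort(int[] array) {
    /* Keep inserting numbers into the appropriate position, expanding the boundary of the sorted sequence */
    for (int sortedSize = 1; sortedSize < array.length; sortedSize++) {
        /* Bubble array[sortedSize] leftward until its left neighbour is not larger */
        for (int i = sortedSize - 1; i >= 0; i--) {
            if (array[i + 1] < array[i]) {
                swap(array, i + 1, i);
            } else {
                break;
            }
        }
    }
}

/* Helper assumed by the code above: exchange two array elements */
private void swap(int[] array, int a, int b) {
    int temp = array[a];
    array[a] = array[b];
    array[b] = temp;
}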
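
To see the rounds described above, here is a minimal, self-contained sketch that prints the array after every round. The class name `InsertionSortTrace` and the sample array are assumptions for illustration: the array holds only the six values named in the walkthrough, since the rest of the original array is not recoverable from the text.

```java
import java.util.Arrays;

public class InsertionSortTrace {

    /** Swap-based insertion sort that prints the array after each round. */
    static void insertSort(int[] array) {
        for (int sortedSize = 1; sortedSize < array.length; sortedSize++) {
            // Move array[sortedSize] leftward while its left neighbour is larger.
            for (int i = sortedSize - 1; i >= 0 && array[i + 1] < array[i]; i--) {
                int temp = array[i];
                array[i] = array[i + 1];
                array[i + 1] = temp;
            }
            System.out.println("round " + sortedSize + ": " + Arrays.toString(array));
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: the six values mentioned in the walkthrough.
        int[] sample = {9, 3, 6, 1, 4, 2};
        insertSort(sample);
    }
}
```

Each printed line shows the sorted prefix one element longer than in the previous round.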

Optimization ideas

Idea One

Change comparison and exchange to one-way assignment

The insertion sort above leaves room for optimization. Each round only needs to find the right position for one element, but the implementation above finds it by repeatedly swapping elements, and every swap costs three assignments. Instead, we can store the element to be inserted in a temporary variable and compare it with the elements of the sorted sequence from right to left. Whenever an element of the sorted sequence is larger than the element to be inserted, move it one position to the right. Each swap is thereby replaced by a one-way assignment, which costs a single assignment. Once the right position is found, assign the stored element to it. The diagram is as follows

Assume that the array state at a certain time during the insertion sort process is as follows

The next element to be inserted is 5. Store it temporarily, then compare it with the elements of the sorted sequence, starting from the rightmost one. Since 9 is greater than 5, copy 9 one position to the right, overwriting 5. Logically, the original slot of 9 is now free, because 9 has moved one place to the right, and since 5 is stored in a temporary variable there is no risk of losing it. Free slots are marked in purple in the picture below

Then move left to the next element of the sorted sequence, 8

Compare 8 and 5: 8 is greater than 5, so 8 is also copied to the right, overwriting the purple 9

Continue to the left, to the next element, 2

Compare 2 and 5: 2 is less than 5, so the insertion position has been found, and the stored 5 is assigned to the purple slot

This completes this round of insertion sort

Logically, this shifts the larger elements of the sorted sequence one position to the right, one after another, leaving a gap for the element to be inserted. Because each move is a one-way assignment, far fewer assignments are performed than with the compare-and-swap approach, so performance improves. The only cost is one extra variable to hold the element being inserted. The code for this idea is as follows

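    public void insertSortV1(int[] array) {
        for (int sortedSize = 1; sortedSize < array.length; sortedSize++) {
            /* Store the element to be inserted */
            int temp = array[sortedSize];
            int i = sortedSize;
            /* Shift larger elements one position to the right */
            while (i > 0 && array[i - 1] > temp) {
                array[i] = array[i - 1];
                i--;
            }
            /* Drop the stored element into the freed slot */
            array[i] = temp;
        }
    }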
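
The saving can be made concrete by counting assignments. The sketch below is an illustration, not a benchmark: the class name `WriteCountComparison`, the method names, and the sample array are assumptions, and each swap is counted as three assignments as described above.

```java
public class WriteCountComparison {

    /** Swap-based insertion sort; every swap is counted as three assignments. */
    static long countSwapAssignments(int[] array) {
        long assignments = 0;
        for (int sortedSize = 1; sortedSize < array.length; sortedSize++) {
            for (int i = sortedSize - 1; i >= 0 && array[i + 1] < array[i]; i--) {
                int temp = array[i];
                array[i] = array[i + 1];
                array[i + 1] = temp;
                assignments += 3;
            }
        }
        return assignments;
    }

    /** Shift-based insertion sort; counts the save, each shift, and the final store. */
    static long countShiftAssignments(int[] array) {
        long assignments = 0;
        for (int sortedSize = 1; sortedSize < array.length; sortedSize++) {
            int temp = array[sortedSize];
            assignments++;                 // save the element to be inserted
            int i = sortedSize;
            while (i > 0 && array[i - 1] > temp) {
                array[i] = array[i - 1];
                assignments++;             // one-way shift to the right
                i--;
            }
            array[i] = temp;
            assignments++;                 // store the saved element
        }
        return assignments;
    }

    public static void main(String[] args) {
        // Hypothetical sample array, chosen only for illustration.
        int[] sample = {9, 3, 6, 1, 4, 2};
        System.out.println("swap version:  " + countSwapAssignments(sample.clone()));
        System.out.println("shift version: " + countShiftAssignments(sample.clone()));
    }
}
```

On this sample the swap version performs 33 assignments and the shift version 21. The gap widens as the number of out-of-order pairs grows, because each such pair costs three assignments in one version and one in the other.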

Idea Two

Change linear search to binary search

The key step in each round of insertion sort is finding the insertion position, which is a search. The implementations above use linear search: start at the right end of the sorted sequence and compare leftward until the position is found. This search takes linear time, O(n). We can replace linear search with binary search. Binary search jumps through the sequence: each step compares the middle element of the search range with the target. If the middle element is less than the target, the search continues in the right half; if it is greater, in the left half. Each step halves the search range, so the time complexity is O(log(n)). Note that this reduces only the number of comparisons. The elements after the insertion position still have to be shifted one by one, so a round still costs O(n) in the worst case.

The optimized code using binary search is as follows

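    /**
     * Binary search for the insertion position of target
     * @param left left bound (inclusive)
     * @param right right bound (inclusive)
     * */
    private int binarySearch(int[] array, int left, int right, int target) {
        while (left <= right) {
            /* Take the middle position */
            int mid = (left + right) >> 1;
            /*
             * Boundary cases:
             * 1. Two elements remain: right = left + 1 and mid = left
             *  1.1 If array[mid] > target, search the left half; the insertion position is mid, i.e. left
             *  1.2 If array[mid] < target, search the right half, starting from mid + 1, i.e. the updated left
             * 2. Three elements remain: right = left + 2 and mid = left + 1
             *  2.1 If array[mid] > target, search the left half, where one element remains
             *  2.2 If array[mid] < target, search the right half; in the next iteration left = right = mid
             * With one element remaining (left = right = mid): if array[mid] > target, the insertion
             * position is left; if array[mid] < target, it is the updated left.
             * Either way, the loop exits with left at the insertion position.
             * */
            if (array[mid] > target) {
                /* Search the left half */
                right = mid - 1;
            } else if (array[mid] < target) {
                /* Search the right half */
                left = mid + 1;
            } else {
                /* Equal: the insertion position is mid + 1 */
                return mid + 1;
            }
        }
        return left;
    }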

    public void insertSortBinarySearch(int[] array) {
        for (int sortedSize = 1; sortedSize < array.length; sortedSize++) {
            int temp = array[sortedSize];
            /* Get the insertion position */
            int insertPos = binarySearch(array, 0, sortedSize - 1, temp);
            /* Shift the sorted elements from the insertion position onward one place to the right */
            for (int i = sortedSize; i > insertPos; i--) {
                array[i] = array[i - 1];
            }
            array[insertPos] = temp;
        }
    }
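
To illustrate the difference between the two searches, the following sketch counts how many elements each one inspects before it finds the insertion position. The class `InsertPositionDemo`, the `probes` counter, and the test data are assumptions made for this example. The binary search follows the same logic as `binarySearch` above, except that it computes the midpoint with `>>>`.

```java
public class InsertPositionDemo {

    /** Number of elements inspected by the most recent search (illustrative counter). */
    static int probes;

    /** Binary search for the insertion position in array[left..right], both inclusive. */
    static int binaryInsertPos(int[] array, int left, int right, int target) {
        probes = 0;
        while (left <= right) {
            int mid = (left + right) >>> 1;
            probes++;
            if (array[mid] > target) {
                right = mid - 1;
            } else if (array[mid] < target) {
                left = mid + 1;
            } else {
                return mid + 1;
            }
        }
        return left;
    }

    /** Linear search from the right end, as done by the shifting loop of insertSortV1. */
    static int linearInsertPos(int[] array, int left, int right, int target) {
        probes = 0;
        int i = right;
        while (i >= left) {
            probes++;
            if (array[i] <= target) {
                break;
            }
            i--;
        }
        return i + 1;
    }

    public static void main(String[] args) {
        // A sorted prefix 0, 1, ..., 999 and a target smaller than every element:
        // the worst case for a linear scan that starts at the right end.
        int[] sorted = new int[1000];
        for (int i = 0; i < sorted.length; i++) {
            sorted[i] = i;
        }
        int linearPos = linearInsertPos(sorted, 0, sorted.length - 1, -1);
        int linearProbes = probes;
        int binaryPos = binaryInsertPos(sorted, 0, sorted.length - 1, -1);
        int binaryProbes = probes;
        System.out.println("linear: position " + linearPos + ", probes " + linearProbes);
        System.out.println("binary: position " + binaryPos + ", probes " + binaryProbes);
    }
}
```

Both searches return the same position when the target does not already occur in the sorted prefix. When it does occur, the binary version stops at whichever equal element it hits first, so the two positions may differ while both remain valid for sorting plain `int` values.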

Idea Three

Shell sort

Shell sort is a variant of insertion sort. Its core idea is to introduce a step size (gap). Shell sort uses the step size to divide the array into several small groups, applies insertion sort within each group, and then gradually reduces the step size. In the final round the step size is one, and Shell sort with a step size of one is exactly the simple insertion sort above. Compared with simple insertion sort, the advantage of Shell sort is that an element can move across many positions at once, whereas simple insertion sort must compare neighbours one by one to find the insertion position.

The diagram of Shell sort is as follows. The initial step size is generally set to half of the array length

The array above is divided into 4 groups with gap=4. Elements whose positions are a multiple of gap apart belong to the same group. The first group is [1,5,9] (5 is 4 positions after 1, and 9 is 4 positions after 5), marked in light blue; the second group is [4,7], marked in light yellow; the third group is [2,6], marked in light purple; and the fourth group is [8,3], marked in orange. Each group is sorted with simple insertion sort, then the gap is reduced and each new group is sorted again, until the gap reaches 1. The last round is a simple insertion sort. Insertion sort is very efficient when the array is nearly sorted, and the step size lets elements move across many positions at once instead of one position at a time. After each round, the array as a whole is closer to sorted (after a round, the smaller elements of each group sit toward the front of the array and the larger elements of each group sit toward the back)

After the first round of Shell sort, the array is as follows (only 3 and 8 have been swapped)

In the second round of Shell sort, the gap is halved to 2

Each of the 2 groups is sorted with insertion sort, and the result is as follows (the first group, marked in light blue, is unchanged; in the second group, marked in orange, only 3 and 4 have been swapped)

Finally, with gap=1, the last round of Shell sort is performed (this is a simple insertion sort)

By inspection, the last round only needs to swap 2 and 3, and 6 and 7. After these two swaps the array is sorted

Shell sort orders the array above with only 4 swaps in total, which is very efficient. Simple insertion sort needs more than 4 swaps on the same array. This shows the strength of Shell sort. It also explains its other name, diminishing increment sort. Step size, increment, and gap all mean the same thing. The choice of increment sequence affects the efficiency of Shell sort. A common choice is to start with half the array length and halve the increment each round.
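
The swap counts in the walkthrough above can be checked with a short sketch. The array `{1, 4, 2, 8, 5, 7, 6, 3, 9}` is reconstructed from the four groups listed for gap=4, and the class and method names are assumptions made for this example. A swap-based gapped insertion pass is used so that swaps can be counted directly.

```java
import java.util.Arrays;

public class GapSwapCount {

    /** One gapped insertion pass using swaps; returns the number of swaps performed. */
    static int gappedInsertionPass(int[] array, int gap) {
        int swaps = 0;
        for (int i = gap; i < array.length; i++) {
            // Move array[i] backward in steps of gap while it is smaller than its predecessor.
            for (int j = i; j - gap >= 0 && array[j] < array[j - gap]; j -= gap) {
                int temp = array[j];
                array[j] = array[j - gap];
                array[j - gap] = temp;
                swaps++;
            }
        }
        return swaps;
    }

    /** Runs passes with gaps n/2, n/4, ..., 1 and returns the total number of swaps. */
    static int shellSortSwaps(int[] array) {
        int total = 0;
        for (int gap = array.length / 2; gap > 0; gap /= 2) {
            int swaps = gappedInsertionPass(array, gap);
            System.out.println("gap=" + gap + ", swaps=" + swaps + ", array=" + Arrays.toString(array));
            total += swaps;
        }
        return total;
    }

    public static void main(String[] args) {
        // Array reconstructed from the groups [1,5,9], [4,7], [2,6], [8,3] for gap=4.
        int[] example = {1, 4, 2, 8, 5, 7, 6, 3, 9};
        int withGaps = shellSortSwaps(example.clone());
        // A single pass with gap=1 is plain insertion sort.
        int plain = gappedInsertionPass(example.clone(), 1);
        System.out.println("gapped total: " + withGaps + " swaps, plain insertion sort: " + plain + " swaps");
    }
}
```

For this array the gapped passes use 1, 1, and 2 swaps (4 in total), while plain insertion sort needs 10 swaps, one for each out-of-order pair in the input.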

The code implementation of Shell sort is as follows

    public void shellSort(int[] array) {
        for (int gap = array.length / 2; gap > 0; gap /= 2) {
            for (int i = gap; i < array.length; i++) {
                int j = i;
                int temp = array[i];
                /* Binary search is not used here; linear search is used directly */
                while (j - gap >= 0 && array[j - gap] > temp) {
                    array[j] = array[j - gap];
                    j -= gap;
                }
                array[j] = temp;
            }
        }
    }

Note that the code traverses the array directly from index gap to the last element and, for each element, performs a direct insertion sort backward in steps of gap. This differs from the diagram, which sorts one group at a time. The advantage of writing it this way is that the whole Shell sort needs only 3 nested loops and is easier to follow. Following the diagram literally, with one loop per group, takes 4 nested loops and is harder to read. Interested readers can refer to the following code

    public void shellSort(int[] array) {
        int gap = array.length / 2;
        while (gap >= 1) {
            /* One iteration per group: group i starts at index i */
            for (int i = 0; i < gap; i++) {
                /* This loop performs insertion sort on the elements of one group */
                for (int j = i; j + gap < array.length; j += gap) {
                    int pos = j + gap;
                    while (pos - gap >= 0) {
                        if (array[pos] < array[pos - gap]) {
                            swap(array, pos, pos - gap);
                            pos -= gap;
                        } else {
                            break;
                        }
                    }
                }
            }
            gap /= 2;
        }
    }
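
As a sanity check that the two formulations are interchangeable, the sketch below runs both on random arrays and compares each result with `Arrays.sort`. The class name, method names, seed, and value range are assumptions chosen for this example.

```java
import java.util.Arrays;
import java.util.Random;

public class ShellSortCrossCheck {

    /** Three-loop version: walks the array once per gap, shifting elements. */
    static void shellSortInterleaved(int[] array) {
        for (int gap = array.length / 2; gap > 0; gap /= 2) {
            for (int i = gap; i < array.length; i++) {
                int temp = array[i];
                int j = i;
                while (j - gap >= 0 && array[j - gap] > temp) {
                    array[j] = array[j - gap];
                    j -= gap;
                }
                array[j] = temp;
            }
        }
    }

    /** Four-loop version: sorts one group at a time, using swaps. */
    static void shellSortByGroup(int[] array) {
        for (int gap = array.length / 2; gap >= 1; gap /= 2) {
            for (int start = 0; start < gap; start++) {
                for (int j = start; j + gap < array.length; j += gap) {
                    int pos = j + gap;
                    while (pos - gap >= 0 && array[pos] < array[pos - gap]) {
                        int temp = array[pos];
                        array[pos] = array[pos - gap];
                        array[pos - gap] = temp;
                        pos -= gap;
                    }
                }
            }
        }
    }

    /** Returns true if both versions match Arrays.sort on every random trial. */
    static boolean agreeOnRandomInput(int trials, int maxLength, long seed) {
        Random random = new Random(seed);
        for (int t = 0; t < trials; t++) {
            int length = random.nextInt(maxLength + 1);
            int[] input = random.ints(length, -100, 100).toArray();
            int[] expected = input.clone();
            Arrays.sort(expected);
            int[] first = input.clone();
            shellSortInterleaved(first);
            int[] second = input.clone();
            shellSortByGroup(second);
            if (!Arrays.equals(first, expected) || !Arrays.equals(second, expected)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("both versions agree: " + agreeOnRandomInput(1000, 50, 42L));
    }
}
```

The small value range is deliberate: it produces many duplicate values, which exercises the comparisons on equal elements.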

Performance Testing

Each insertion sort variant was tested on random arrays of 10,000 to 500,000 elements, and the results are plotted in the line graph below

Shell sort far outperforms the other versions. The binary search version comes second, the one-way assignment version third, and the unoptimized version is the slowest.
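
The original test harness is not shown, so the following is only a rough sketch of how such a comparison might be run. The class name, array sizes, and seed are assumptions. The timing is naive, using `System.nanoTime` with no JIT warm-up or repeated runs, so the numbers indicate trends only. A dedicated harness such as JMH would give more reliable measurements.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.Consumer;

public class InsertionSortBenchmark {

    static void swapInsert(int[] a) {
        for (int n = 1; n < a.length; n++) {
            for (int i = n - 1; i >= 0 && a[i + 1] < a[i]; i--) {
                int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
            }
        }
    }

    static void shiftInsert(int[] a) {
        for (int n = 1; n < a.length; n++) {
            int temp = a[n];
            int i = n;
            while (i > 0 && a[i - 1] > temp) { a[i] = a[i - 1]; i--; }
            a[i] = temp;
        }
    }

    /** Insertion position of target in the sorted range a[0..right]. */
    static int insertPos(int[] a, int right, int target) {
        int left = 0;
        while (left <= right) {
            int mid = (left + right) >>> 1;
            if (a[mid] > target) right = mid - 1;
            else if (a[mid] < target) left = mid + 1;
            else return mid + 1;
        }
        return left;
    }

    static void binaryInsert(int[] a) {
        for (int n = 1; n < a.length; n++) {
            int temp = a[n];
            int pos = insertPos(a, n - 1, temp);
            for (int i = n; i > pos; i--) a[i] = a[i - 1];
            a[pos] = temp;
        }
    }

    static void shell(int[] a) {
        for (int gap = a.length / 2; gap > 0; gap /= 2) {
            for (int i = gap; i < a.length; i++) {
                int temp = a[i];
                int j = i;
                while (j - gap >= 0 && a[j - gap] > temp) { a[j] = a[j - gap]; j -= gap; }
                a[j] = temp;
            }
        }
    }

    /** Sorts a copy of input, verifies the result, and returns the elapsed milliseconds. */
    static long timeMillis(Consumer<int[]> sorter, int[] input) {
        int[] copy = input.clone();
        long start = System.nanoTime();
        sorter.accept(copy);
        long elapsed = System.nanoTime() - start;
        for (int i = 1; i < copy.length; i++) {
            if (copy[i - 1] > copy[i]) throw new IllegalStateException("result is not sorted");
        }
        return elapsed / 1_000_000;
    }

    public static void main(String[] args) {
        Map<String, Consumer<int[]>> sorters = new LinkedHashMap<>();
        sorters.put("swap", InsertionSortBenchmark::swapInsert);
        sorters.put("one-way assignment", InsertionSortBenchmark::shiftInsert);
        sorters.put("binary search", InsertionSortBenchmark::binaryInsert);
        sorters.put("shell", InsertionSortBenchmark::shell);
        Random random = new Random(42L);
        // Sizes are kept small so the sketch finishes quickly; they are not the article's sizes.
        for (int size : new int[]{10_000, 20_000}) {
            int[] input = random.ints(size).toArray();
            for (Map.Entry<String, Consumer<int[]>> e : sorters.entrySet()) {
                System.out.println(size + " elements, " + e.getKey() + ": "
                        + timeMillis(e.getValue(), input) + " ms");
            }
        }
    }
}
```

Every variant sorts a fresh copy of the same input, so the timings are comparable within one run.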

Origin blog.csdn.net/vcj1009784814/article/details/109026483