Various data structures, their advantages, disadvantages and applications

1. Sequence table

Advantages: elements are stored in one contiguous block of memory, so access by index is O(1).
Disadvantages: the size is fixed, expansion is expensive, and insertion and deletion cost O(n) because elements must be shifted.
Usage scenario: programs that mostly read elements by index and rarely insert or delete.
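As a small illustration, here is a minimal sketch of a fixed-capacity sequential list (the class and method names are illustrative, not from the original), showing the O(1) access and O(n) insertion described above:

```java
// Minimal fixed-capacity sequential list sketch (illustrative names).
class SeqList {
    private final int[] data;
    private int size;

    SeqList(int capacity) { data = new int[capacity]; }

    // O(1): direct index into the contiguous block.
    int get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException();
        return data[i];
    }

    // O(n): elements after position i must be shifted to the right.
    void insert(int i, int value) {
        if (size == data.length) throw new IllegalStateException("full: fixed size");
        if (i < 0 || i > size) throw new IndexOutOfBoundsException();
        for (int j = size; j > i; j--) data[j] = data[j - 1];
        data[i] = value;
        size++;
    }
}
```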

2. Linked list

Advantages: inserting and deleting a node (once located) is O(1), and no expansion is needed.
Disadvantages: a linked list is not a random-access structure; finding an element costs O(n).
Usage scenario: programs that insert and delete elements frequently and rarely need random access by position.
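A minimal singly linked list sketch (illustrative names, not from the original): splicing a node in after a known node is O(1), while locating a node by value is O(n).

```java
// Minimal singly linked list sketch.
class ListNode {
    int value;
    ListNode next;
    ListNode(int value) { this.value = value; }
}

class LinkedListOps {
    // O(1): splice a new node in right after 'node'.
    static void insertAfter(ListNode node, int value) {
        ListNode n = new ListNode(value);
        n.next = node.next;
        node.next = n;
    }

    // O(n): must walk the list from the head to find a value.
    static ListNode find(ListNode head, int value) {
        for (ListNode cur = head; cur != null; cur = cur.next)
            if (cur.value == value) return cur;
        return null;
    }
}
```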

3. Binary sorting tree (binary search tree)

Features: smaller keys on the left, larger keys on the right; search is O(logn) on average, degrading to O(n) in the extreme case of a degenerate, list-like tree.
Insertion: usually implemented recursively; the newly inserted node is always a leaf.
Deletion: a leaf node is removed directly; a node with only a left or a right subtree is replaced by that subtree; a node with both subtrees is replaced by its inorder successor (copy the successor's key into the node, then delete the successor).
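A compact sketch of these insert and delete cases, assuming an illustrative BstNode class (duplicates are ignored):

```java
// Binary search tree insert and delete, following the cases above.
class BstNode {
    int key;
    BstNode left, right;
    BstNode(int key) { this.key = key; }
}

class Bst {
    // Recursive insert: the new key always ends up in a leaf position.
    static BstNode insert(BstNode root, int key) {
        if (root == null) return new BstNode(key);
        if (key < root.key) root.left = insert(root.left, key);
        else if (key > root.key) root.right = insert(root.right, key);
        return root;
    }

    static BstNode delete(BstNode root, int key) {
        if (root == null) return null;
        if (key < root.key) root.left = delete(root.left, key);
        else if (key > root.key) root.right = delete(root.right, key);
        else {
            // One (or no) child: replace the node with that subtree.
            if (root.left == null) return root.right;
            if (root.right == null) return root.left;
            // Two children: copy the inorder successor's key, then delete the successor.
            BstNode succ = root.right;
            while (succ.left != null) succ = succ.left;
            root.key = succ.key;
            root.right = delete(root.right, succ.key);
        }
        return root;
    }
}
```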

4. Balanced binary tree (AVL tree)

Features: smaller keys on the left, larger keys on the right, no duplicate values. It was introduced to solve the problem of a binary sorting tree degenerating into a linked list.
Advantages: query time is proportional to the tree height h; keeping the tree balanced keeps h small, so queries run in O(logn).
Disadvantages: insertion and deletion can break the balance and require left/right rotations to restore it, so frequent insertion and deletion hurts efficiency.
Rebalancing cases: LL: single right rotation; RR: single left rotation; LR: left rotation on the left child, then right rotation; RL: right rotation on the right child, then left rotation. A sketch of these rotations follows.
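A sketch of the four rebalancing cases, assuming an illustrative AvlNode with a height field; the insertion/deletion code that would call rebalance() is omitted.

```java
// AVL rotations and the LL/RR/LR/RL dispatch described above.
class AvlNode {
    int key, height = 1;
    AvlNode left, right;
}

class AvlRotations {
    static int height(AvlNode n) { return n == null ? 0 : n.height; }
    static void update(AvlNode n) { n.height = 1 + Math.max(height(n.left), height(n.right)); }
    static int balance(AvlNode n) { return height(n.left) - height(n.right); }

    static AvlNode rotateRight(AvlNode y) {   // used for the LL case
        AvlNode x = y.left;
        y.left = x.right;
        x.right = y;
        update(y); update(x);
        return x;
    }

    static AvlNode rotateLeft(AvlNode x) {    // used for the RR case
        AvlNode y = x.right;
        x.right = y.left;
        y.left = x;
        update(x); update(y);
        return y;
    }

    static AvlNode rebalance(AvlNode n) {
        update(n);
        if (balance(n) > 1) {                                        // left-heavy
            if (balance(n.left) < 0) n.left = rotateLeft(n.left);    // LR: rotate the left child first
            return rotateRight(n);                                   // LL: single right rotation
        }
        if (balance(n) < -1) {                                       // right-heavy
            if (balance(n.right) > 0) n.right = rotateRight(n.right); // RL: rotate the right child first
            return rotateLeft(n);                                     // RR: single left rotation
        }
        return n;
    }
}
```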

5. Red-black tree

Features: the root node is always black; no two adjacent nodes are red; every path from the root down to a node with NULL descendants contains the same number of black nodes; every leaf (NIL) node is black.
Advantages: the red-black tree gives up strict balance in favour of approximate balance, which keeps rebalancing cheap (an insertion needs at most two rotations, a deletion at most three) while still guaranteeing O(logn) operations, and it is simpler to maintain than an AVL tree.
Disadvantages: lookups are slightly slower than in an AVL tree (the tree can be taller), but deletion is much cheaper.
Applications: set, multiset, map and multimap in the C++ STL (Java's TreeMap and TreeSet use the same structure).
Insertion rules:
1. If X is the root, color X black and stop.
2. Uncle is red:
1. The newly inserted node X is colored red.
2. If X's parent P is also red, the rule that no two adjacent nodes are red is violated.
3. If X's uncle U is also red:
4. Color P and U black.
5. Color X's grandparent G red.
6. Treat G as the new X and repeat from step 2.
7. If G is the root, color it black. End.
3. Uncle is black: there are four cases to handle.
3.1. LL: P is the left child of G, and X is the left child of P.
3.2. LR: P is the left child of G, and X is the right child of P.
3.3. RR: P is the right child of G, and X is the right child of P.
3.4. RL: P is the right child of G, and X is the left child of P.
3.5. Fixes (rotations combined with recoloring): LL: single right rotation; RR: single left rotation; LR: left rotation then right rotation; RL: right rotation then left rotation. A partial sketch of the uncle-red recoloring loop follows.
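A partial sketch of the uncle-red case described above, assuming an illustrative RBNode class with color, parent and child links. The uncle-black rotation cases are intentionally omitted; this is not a complete red-black tree implementation.

```java
// Partial red-black insertion fix-up: only the recoloring loop for a red uncle.
class RBNode {
    int key;
    boolean red = true;          // step 1: new nodes start red
    RBNode parent, left, right;
}

class RBInsertFixup {
    // Restores properties after x has been linked in and colored red.
    static void fixAfterInsert(RBNode x, RBNode root) {
        while (x != root && x.parent != null && x.parent.red) {   // step 2: red parent
            RBNode p = x.parent;
            RBNode g = p.parent;                                  // grandparent
            if (g == null) break;
            RBNode u = (p == g.left) ? g.right : g.left;          // uncle
            if (u != null && u.red) {                             // steps 3-6: uncle is red
                p.red = false;                                    // recolor parent black
                u.red = false;                                    // recolor uncle black
                g.red = true;                                     // recolor grandparent red
                x = g;                                            // continue fixing from G
            } else {
                // uncle black: LL/RR/LR/RL rotation cases, omitted in this sketch
                break;
            }
        }
        root.red = false;                                         // step 7: the root is always black
    }
}
```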

6. B-tree

Features: also called a balanced multi-way search tree. When a B-tree is used by a database, the database exploits how disks work: data is stored in blocks (typically 4 KB), and each I/O reads one whole disk block at a time, so node size is limited to fit within a disk block and that space is used as fully as possible. Because each node holds many keys, the tree has fewer levels, which reduces the number and cost of disk accesses during a search.
Application: database indexing technology, file system.

7. B+ tree

Features: the B+ tree is an improved version of the B-tree. It makes fuller use of node space and gives more stable query times, close to binary search over the key set.
Application: database, file system, interval access.
Advantages:
1. Each non-leaf node stores more keys, so the tree has fewer levels and queries are fast.
2. Query time is stable: all keys (with their record pointers) live in the leaf nodes, so every lookup descends the same number of levels. Queries are therefore more stable than in a B-tree.
3. The B+ tree sorts naturally: its leaf nodes form an ordered linked list, which makes range queries convenient. The data is also very compact, so the cache hit rate is higher than for a B-tree.
4. Full traversal is faster: the whole B+ tree can be traversed by walking the leaf list, without visiting every level as in a B-tree, which helps full table scans in a database.
5. The B-tree's one advantage over the B+ tree: if frequently accessed data sits close to the root, then, because B-tree non-leaf nodes store the record data (or its address) alongside the key, such lookups can finish earlier than in a B+ tree.
Rules:
1. Non-leaf nodes of a B+ tree do not store pointers to the records; they only index. This greatly increases the number of keys each non-leaf node can hold.
2. The leaf nodes store the record pointers for all keys, so every data address is obtained from a leaf and every query performs the same number of node accesses.
3. Keys in the leaf nodes are arranged in ascending order, and the last entry of each leaf stores a pointer to the first entry of the leaf to its right.
4. In this scheme the number of children of a non-leaf node equals its number of keys. MySQL (InnoDB) implements its indexes with B+ trees. A small sketch of the leaf level follows.
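A toy sketch of the B+ tree leaf level only, with illustrative names: keys and record pointers live in the leaves, and the leaves form an ordered linked list, which is what makes interval (range) queries cheap. The internal index nodes are omitted.

```java
// B+ tree leaf chain and a range scan over it (illustrative sketch).
class BPlusLeaf {
    int[] keys;          // sorted keys in this leaf
    Object[] records;    // pointers/addresses of the corresponding records
    BPlusLeaf next;      // link to the right sibling leaf
}

class BPlusRangeScan {
    // Print all records with key in [lo, hi], starting from the leaf containing lo.
    static void rangeScan(BPlusLeaf start, int lo, int hi) {
        for (BPlusLeaf leaf = start; leaf != null; leaf = leaf.next) {
            for (int i = 0; i < leaf.keys.length; i++) {
                if (leaf.keys[i] > hi) return;   // past the interval: done
                if (leaf.keys[i] >= lo)
                    System.out.println(leaf.keys[i] + " -> " + leaf.records[i]);
            }
        }
    }
}
```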

8. Huffman tree

Definition: given N leaf nodes, each with its own weight, the binary tree built from them whose weighted path length is minimal is called the optimal binary tree, also known as a Huffman tree. In a Huffman tree, nodes with larger weights are closer to the root.
Applications: decision making in minimum-cost problems; Huffman coding, data compression, encryption/decryption, file transfer.
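A minimal sketch of building a Huffman tree from leaf weights (illustrative names): repeatedly merge the two nodes with the smallest weights, so heavier leaves end up closer to the root.

```java
import java.util.PriorityQueue;

// Build a Huffman tree from an array of leaf weights.
class HuffmanNode {
    int weight;
    HuffmanNode left, right;
    HuffmanNode(int weight, HuffmanNode left, HuffmanNode right) {
        this.weight = weight; this.left = left; this.right = right;
    }
}

class Huffman {
    static HuffmanNode build(int[] weights) {
        PriorityQueue<HuffmanNode> pq = new PriorityQueue<>((a, b) -> a.weight - b.weight);
        for (int w : weights) pq.offer(new HuffmanNode(w, null, null));
        while (pq.size() > 1) {
            HuffmanNode a = pq.poll(), b = pq.poll();           // two smallest weights
            pq.offer(new HuffmanNode(a.weight + b.weight, a, b)); // merge them
        }
        return pq.poll();                                        // root of the optimal tree
    }
}
```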

9. Heap

Property 1: a heap is a complete binary tree, also called a binary heap. Every node is always no greater than (max-heap) or no less than (min-heap) the values of its children.
Property 2: because a heap is a complete binary tree, it maps perfectly onto an array. Max-heap: arr[i] >= arr[2i+1] && arr[i] >= arr[2i+2]; min-heap: arr[i] <= arr[2i+1] && arr[i] <= arr[2i+2]. Arrays are therefore the usual way to implement a heap.
Property 3: a heap is only a partial order: there is an ordering between a parent and its children, but none between the two children. So the same batch of elements heapified by different algorithms may end up stored in different array orders, and heap sort is an unstable sorting algorithm.
Application: heaps are commonly used to implement priority queues, which accept data in any order but always remove the smallest (for a min-heap) or largest (for a max-heap) element first. Heaps are also the basis of heap sort. A sift-down sketch follows.
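A sketch of the array mapping above: for a min-heap stored in arr[0..size-1], the children of index i are 2i+1 and 2i+2. siftDown restores the heap property after the root is replaced, which is the core step of priority-queue removal and of heap sort.

```java
// Sift-down for an array-based min-heap.
class MinHeapOps {
    static void siftDown(int[] arr, int size, int i) {
        while (true) {
            int left = 2 * i + 1, right = 2 * i + 2, smallest = i;
            if (left < size && arr[left] < arr[smallest]) smallest = left;
            if (right < size && arr[right] < arr[smallest]) smallest = right;
            if (smallest == i) return;                 // parent already <= both children
            int tmp = arr[i]; arr[i] = arr[smallest]; arr[smallest] = tmp;
            i = smallest;                              // continue one level down
        }
    }
}
```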

10. Stack

Applications: number base conversion, bracket matching, expression evaluation, binary tree traversal, etc. A bracket-matching sketch follows.
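A small sketch of one of the listed applications, bracket matching, using a stack (here Java's ArrayDeque as the stack):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Check whether the brackets in a string are balanced.
class BracketMatcher {
    static boolean isBalanced(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : s.toCharArray()) {
            if (c == '(' || c == '[' || c == '{') {
                stack.push(c);                       // opening bracket: push
            } else if (c == ')' || c == ']' || c == '}') {
                if (stack.isEmpty()) return false;   // closing bracket with nothing open
                char open = stack.pop();
                if ((c == ')' && open != '(') ||
                    (c == ']' && open != '[') ||
                    (c == '}' && open != '{')) return false;
            }
        }
        return stack.isEmpty();                      // everything opened must be closed
    }
}
```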

11. Queue

Application: its first-in-first-out property suits business scenarios such as message notification, order processing and other asynchronous processing.

12. Hash table

1. HashMap
Application: suitable when lookup performance matters and there is no logical relationship between the data elements.
Definition: the underlying structure is an array in which each slot heads a linked list; this "array of linked lists" structure is called a hash table.
Storage (put) process:
1. Call hashCode() on the HashMap key, which returns an int hash code.
2. Use this hash code to compute an index into the hash table. If that position is empty, wrap the key/value pair in an Entry node and place it there.
3. If the position is not empty, walk the linked list stored at that index and use equals() to look for an Entry with the same key; if found, replace its value.
4. If no equal key is found, insert the new Entry at the head of that position's linked list. A simplified sketch follows.
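A simplified sketch of the put process above; this is illustrative, not the real java.util.HashMap: an array of buckets, each bucket a singly linked list of Entry nodes, with new entries inserted at the head of the chain.

```java
// Simplified chained hash map illustrating the put steps above.
class SimpleHashMap<K, V> {
    static class Entry<K, V> {
        final K key; V value; Entry<K, V> next;
        Entry(K key, V value, Entry<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }

    @SuppressWarnings("unchecked")
    private Entry<K, V>[] table = new Entry[16];

    public void put(K key, V value) {
        int index = (key.hashCode() & 0x7fffffff) % table.length;  // steps 1-2: hash -> bucket index
        for (Entry<K, V> e = table[index]; e != null; e = e.next) {
            if (e.key.equals(key)) { e.value = value; return; }    // step 3: same key -> replace value
        }
        table[index] = new Entry<>(key, value, table[index]);      // step 4: new entry at chain head
    }
}
```
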
2. HashSet
Definition: HashSet is implemented on top of HashMap. HashMap stores key/value pairs; HashSet stores its elements only as keys (the value is a fixed placeholder object).
Storage process:
1. HashSet will first call the HashCode method of the element to obtain the hash value of the element.
2. Then calculate the storage location in the hash table by performing operations such as shifting on the hash value.
3. If there is no element stored at the current position, the element can be placed directly at that position.
4. If there are other elements, equals() is called to compare with the element at that position: if it returns true the element is treated as a duplicate and is not added; if it returns false it is added.
3. Solution to Hash Collision
Reason: Different keys get the same value after calculation by the hash function.
1. Open hashing (separate chaining): each position of the underlying array is the head of a linked list, so colliding entries end up in the same chain.
2. Closed hashing (open addressing): when a collision occurs, the element probes subsequent positions until it finds an empty slot. A toy sketch follows.
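A toy sketch of the closed hashing (open addressing) idea with linear probing; resizing and deletion are omitted and the names are illustrative.

```java
// Linear probing: on a collision, try the next slots until an empty one is found.
class LinearProbingTable {
    private final Integer[] slots = new Integer[16];

    boolean insert(int key) {
        int index = (key & 0x7fffffff) % slots.length;
        for (int i = 0; i < slots.length; i++) {
            int probe = (index + i) % slots.length;          // next position, wrapping around
            if (slots[probe] == null) { slots[probe] = key; return true; }
            if (slots[probe] == key) return true;            // already present
        }
        return false;                                        // table is full
    }
}
```
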
4. Expansion of hash table
Reason: as the number of entries grows, the chains in the hash table get longer and longer, and efficiency drops sharply.
Expansion: grow the table to twice its original size; all existing entries must then be rehashed (the hash function is applied again to compute each entry's new position).

13. Graph

Storage of graphs
1. Adjacency matrix
Definition: the adjacency-matrix representation of a graph uses two arrays: a one-dimensional array that stores the vertex information and a two-dimensional array that stores the edge information.
Note:
1. In simple applications, you can directly use a two-dimensional array as the adjacency matrix of the graph.
2. When the elements in the adjacency matrix only indicate whether the corresponding edge exists, an enumeration type with a value of 0 or 1 can be defined.
3. The adjacency matrix of an undirected graph is a symmetric matrix, and a compression matrix can be used for large-scale adjacency matrices.
4. The space complexity of the adjacency-matrix representation is O(n²), where n = |V| is the number of vertices.
5. With an adjacency matrix it is easy to check whether an edge exists between any two vertices, but counting how many edges the graph has requires examining every row and column, which is time-consuming.
6. Dense graphs are well suited to adjacency-matrix storage. A minimal sketch follows.
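A minimal adjacency-matrix sketch for an unweighted undirected graph with n vertices (illustrative names): O(1) edge lookup, O(n²) space.

```java
// Adjacency-matrix graph storage.
class MatrixGraph {
    private final boolean[][] adj;

    MatrixGraph(int n) { adj = new boolean[n][n]; }

    void addEdge(int u, int v) {      // undirected: set both symmetric entries
        adj[u][v] = true;
        adj[v][u] = true;
    }

    boolean hasEdge(int u, int v) {   // O(1) check
        return adj[u][v];
    }
}
```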

2. Adjacency list

Concept: the adjacency list builds one singly linked list for each vertex vi of the graph; the nodes of vi's list represent the edges incident to vi, and this list is called vi's edge list. The head pointers of the edge lists and the vertex data are stored sequentially in a vertex table, so an adjacency list has two kinds of nodes: vertex-table nodes and edge-table nodes. A vertex-table node consists of a vertex field and a pointer to the first incident edge; an edge-table node consists of an adjacent-vertex field and a pointer to the next incident edge.

Features:
1. If G is an undirected graph, the storage required is O(|V| + 2|E|); if G is a directed graph it is O(|V| + |E|). The factor of 2 appears because each edge of an undirected graph occurs in two edge lists.
2. For sparse graphs, the adjacency-list representation saves a great deal of space.
3. In an adjacency list it is easy to find all edges incident to a given vertex: just read its list. In an adjacency matrix the same operation scans a whole row, taking O(n) time. Conversely, checking whether an edge exists between two given vertices is immediate in an adjacency matrix, whereas in an adjacency list one must search the vertex's edge list, which is less efficient.
4. In the adjacency list of a directed graph, the out-degree of a vertex is just the length of its edge list, but finding its in-degree requires traversing the whole structure; an inverse adjacency list can be stored to make in-degree queries cheap.
5. The adjacency-list representation of a graph is not unique: within each vertex's list, the edge nodes may appear in any order, depending on the construction algorithm and the input order of the edges. A minimal sketch follows.
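A minimal adjacency-list sketch for a directed graph (illustrative names): one list of neighbours per vertex, O(|V| + |E|) space, cheap out-degree and expensive in-degree, as described in point 4 above.

```java
import java.util.ArrayList;
import java.util.List;

// Adjacency-list graph storage.
class ListGraph {
    private final List<List<Integer>> adj = new ArrayList<>();

    ListGraph(int n) {
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
    }

    void addEdge(int u, int v) { adj.get(u).add(v); }           // directed edge u -> v

    int outDegree(int u) { return adj.get(u).size(); }          // just the list length

    int inDegree(int v) {                                       // needs a scan of every list
        int count = 0;
        for (List<Integer> list : adj)
            for (int w : list) if (w == v) count++;
        return count;
    }
}
```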

3. Cross linked list
Definition: the cross linked list (orthogonal list) is a linked storage structure for directed graphs. A plain adjacency list is one-sided: finding a vertex's in-degree requires traversing the whole structure (and an inverse adjacency list has the symmetric problem for out-degree). The cross linked list solves this well. In its vertex node, firstin is the head pointer of the vertex's incoming-edge list, pointing to the first arc that ends at the vertex, and firstout is the head pointer of its outgoing-edge list, pointing to the first arc that starts at the vertex. In each edge (arc) node, tailvex is the index of the arc's start vertex in the vertex table, headvex is the index of its end vertex, headlink is a pointer to the next arc with the same end vertex, and taillink is a pointer to the next arc with the same start vertex.
The advantage of the cross linked list is that it merges the adjacency list and the inverse adjacency list, so the arcs ending at a vertex and the arcs starting at it are both easy to find, and therefore so are its in-degree and out-degree. Apart from its more complex structure, the time complexity of building the graph is the same as for an adjacency list, so for directed-graph applications the cross linked list is a very good data structure.
4. Edge set array
Definition: the edge set array consists of two one-dimensional arrays: one stores the vertex information, the other stores the edge information, and each element of the edge array is made up of the indices of the edge's start vertex and end vertex together with its weight. Looking up a vertex in the edge set array requires scanning the whole edge array, which is not very efficient, so this representation is better suited to algorithms that process edges one by one than to vertex-centred operations.

5. Graph traversal
1. DFS performance: DFS needs a recursion (work) stack, so its space complexity is O(V). For a graph with n vertices and e edges stored as an adjacency matrix, finding the neighbours of every vertex means visiting every matrix entry, so the time is O(V²). With an adjacency list, the time to find neighbours depends on the numbers of vertices and edges, giving O(V + E); for sparse graphs with few edges relative to the number of vertices, the adjacency list therefore makes the algorithm much faster. For directed graphs, since each edge is simply either traversable or not, the algorithm is unchanged and fully general.
2. Depth-first spanning tree and spanning forest: calling DFS on a connected graph produces a depth-first spanning tree; on a non-connected graph it produces a depth-first spanning forest.
3. Performance analysis of BFS algorithm.
Whichever storage is used, BFS needs an auxiliary queue Q, and each of the n vertices enters the queue once, so in the worst case the space complexity is O(V). With adjacency-list storage, every vertex is dequeued once and every edge is examined at least once, so the total time is O(V + E); with an adjacency matrix, finding the neighbours of each vertex costs O(V), so the total is O(V²).
Note: the adjacency-matrix representation of a graph is unique, but the adjacency list depends on the input order of the edges. Hence, for the same graph, the DFS and BFS sequences obtained from adjacency-matrix traversal are unique, while those obtained from adjacency-list traversal are not.
4. Determine the connectivity of the graph through graph traversal
For an undirected graph: if it is connected, a single traversal starting from any vertex visits all vertices; if it is not connected, a traversal starting from one vertex only reaches the vertices of that vertex's connected component, and vertices in other components cannot be reached by that traversal. For a directed graph: if there is a path from the chosen start vertex to every other vertex, one traversal reaches them all; otherwise it does not. Therefore BFSTraverse() and DFSTraverse() wrap the traversal in an outer loop that picks a new, still unvisited start vertex and continues, so that no vertex of the graph is missed (see the sketch below). For undirected graphs the number of such calls equals the number of connected components. For directed graphs this is not the case, because a directed graph may be strongly connected or not, and its subgraphs split into strongly connected and non-strongly-connected components; for a component that is not strongly connected, a single DFS or BFS call may fail to reach all of its vertices.
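A sketch of DFS and BFS over an adjacency list (adj.get(u) is the neighbour list of u), including the outer loop over all vertices so every connected component is visited, as described above. Names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// DFS and BFS with a restart loop for non-connected graphs.
class GraphTraversal {
    static void traverseAll(List<List<Integer>> adj) {
        boolean[] visited = new boolean[adj.size()];
        for (int v = 0; v < adj.size(); v++)          // restart from each unvisited vertex
            if (!visited[v]) dfs(adj, v, visited);
    }

    static void dfs(List<List<Integer>> adj, int u, boolean[] visited) {
        visited[u] = true;
        System.out.println("visit " + u);
        for (int w : adj.get(u)) if (!visited[w]) dfs(adj, w, visited);
    }

    static void bfs(List<List<Integer>> adj, int start, boolean[] visited) {
        Queue<Integer> queue = new ArrayDeque<>();    // the auxiliary queue Q
        visited[start] = true;
        queue.offer(start);
        while (!queue.isEmpty()) {
            int u = queue.poll();
            System.out.println("visit " + u);
            for (int w : adj.get(u)) if (!visited[w]) { visited[w] = true; queue.offer(w); }
        }
    }
}
```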

6. Minimum Spanning Tree
A spanning tree of a connected graph is a minimal connected subgraph: it contains all the vertices of the graph but only the n-1 edges needed to form a tree. Removing any one of its edges disconnects it, and adding any extra edge creates a cycle in the graph. For a weighted connected undirected graph G = (V, E), the spanning trees are generally not unique; the one whose edge weights sum to the minimum is called the minimum spanning tree of G.
There are many algorithms for constructing a minimum spanning tree, but most exploit the following property. Let G = (V, E) be a weighted connected undirected graph and U a non-empty subset of the vertex set V. If (u, v), with u ∈ U and v ∈ V − U, is an edge of minimum weight crossing the cut, then some minimum spanning tree contains (u, v). The main algorithms based on this property are Prim's algorithm and Kruskal's algorithm, both of which follow a greedy strategy.
1. Prim's algorithm (Prim)
Informally: start from one vertex and, while making sure no cycle is formed, repeatedly find the shortest edge leaving the connected component built so far and add it, treating that component as a single vertex; repeat this find-and-add step until all vertices are included.
2. Kruskal's algorithm (Kruskal)
Kruskal's algorithm starts from an edgeless graph T = (V, {}) with only the n vertices, each forming its own connected component. The edges are then sorted by weight in ascending order and examined one by one, always taking the smallest edge not yet considered: if its endpoints lie in different connected components of T, the edge is added to T; otherwise it would form a cycle, so it is discarded and the next smallest edge is taken. This continues until all vertices of T belong to one connected component.
Algorithm idea: build directly from the edges. Since the weights live on the edges, it is natural to keep picking the minimum-weight edge to build the spanning tree, while checking that adding it does not form a cycle. The edge set array from the graph storage structures above is used here. A sketch of Kruskal's algorithm over such an edge array follows.
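A sketch of Kruskal's algorithm over an edge-set array (illustrative names): sort the edges by weight and use a union-find structure to reject edges whose endpoints are already in the same connected component.

```java
import java.util.Arrays;

// Kruskal's MST using sorted edges and union-find.
class Kruskal {
    static int[] parent;

    static int find(int x) { return parent[x] == x ? x : (parent[x] = find(parent[x])); }

    // edges[i] = {u, v, weight}; returns the total weight of the chosen MST edges.
    static int mst(int vertexCount, int[][] edges) {
        parent = new int[vertexCount];
        for (int i = 0; i < vertexCount; i++) parent[i] = i;
        Arrays.sort(edges, (a, b) -> a[2] - b[2]);          // edges by increasing weight
        int total = 0;
        for (int[] e : edges) {
            int ru = find(e[0]), rv = find(e[1]);
            if (ru != rv) {                                  // different components: accept the edge
                parent[ru] = rv;
                total += e[2];
            }                                                // same component: would form a cycle, skip
        }
        return total;
    }
}
```
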
7. Shortest path
The meaning of "shortest path" differs between network (weighted) graphs and non-network graphs. Since a non-network graph has no edge weights, its shortest path is simply the path with the fewest edges between two vertices. In a network graph, the shortest path is the path whose edge weights sum to the minimum. The first vertex on the path is called the source and the last vertex the destination.
1. Dijkstra's algorithm (Dijkstra)
Dijkstra's algorithm computes single-source shortest paths, i.e. the shortest distance from one source vertex to every other vertex in the graph; for example, a map application finding the shortest distance from your current position to a landmark. It works on directed graphs, but negative edge weights are not allowed. Roughly speaking, the algorithm does not find the shortest path from, say, v0 to v8 in one step; it finds the shortest paths to the intermediate vertices step by step and, building on the shortest paths already confirmed, extends them to vertices farther away until the final result is obtained.
Dijkstra's algorithm is also based on a greedy strategy. With an adjacency matrix or a weighted adjacency list the time complexity is O(V²). One might only want the shortest path from the source to one particular vertex, but that problem is as hard as computing the shortest paths from the source to all other vertices, and its time complexity is also O(V²). A sketch follows.
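A sketch of Dijkstra's algorithm on an adjacency matrix (illustrative names; graph[u][v] is the edge weight, or Integer.MAX_VALUE if there is no edge; weights are assumed non-negative and moderate in size). It returns the shortest distance from source to every vertex in O(V²) time.

```java
import java.util.Arrays;

// Single-source shortest paths with the O(V^2) matrix version of Dijkstra.
class Dijkstra {
    static int[] shortestPaths(int[][] graph, int source) {
        int n = graph.length;
        int[] dist = new int[n];
        boolean[] done = new boolean[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[source] = 0;
        for (int iter = 0; iter < n; iter++) {
            int u = -1;
            for (int v = 0; v < n; v++)                      // pick the closest unfinished vertex
                if (!done[v] && (u == -1 || dist[v] < dist[u])) u = v;
            if (dist[u] == Integer.MAX_VALUE) break;         // remaining vertices are unreachable
            done[u] = true;
            for (int v = 0; v < n; v++)                      // relax edges going out of u
                if (graph[u][v] != Integer.MAX_VALUE && dist[u] + graph[u][v] < dist[v])
                    dist[v] = dist[u] + graph[u][v];
        }
        return dist;
    }
}
```
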
2. Floyd's algorithm (Floyd)
The idea of Floyd's algorithm: first initialise the distance matrix, then update it round by round, allowing one more vertex as an intermediate point in each round. d[i][j] is the distance from vertex i to vertex j; in round k, compare d[i][k] + d[k][j] with d[i][j]; if the former is smaller, update d[i][j], otherwise leave it unchanged.
State transition equation: map[i][j] = min(map[i][k] + map[k][j], map[i][j])
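A sketch of the update above (illustrative names): d[i][j] starts as the direct edge weight, with a large INF value where there is no edge, and is relaxed through every intermediate vertex k.

```java
// All-pairs shortest paths with the Floyd triple loop.
class Floyd {
    static final int INF = 1_000_000_000;   // "no edge", large enough that INF + INF does not overflow int

    static void allPairs(int[][] d) {
        int n = d.length;
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (d[i][k] + d[k][j] < d[i][j])
                        d[i][j] = d[i][k] + d[k][j];   // map[i][j] = min(map[i][k] + map[k][j], map[i][j])
    }
}
```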


Source: blog.csdn.net/weixin_44802051/article/details/121824062