Java Multithreading (Part 3): Atomic Variables and Non-blocking Synchronization Mechanisms

Concurrent Notes Portal:
1.0 Concurrent Programming - Mind Map
2.0 Concurrent Programming - Thread Safety Fundamentals
3.0 Concurrent Programming - Basic Building Blocks
4.0 Concurrent Programming - Task Execution and Future
5.0 Concurrent Programming - Multi-threaded Performance and Scalability
6.0 Concurrent Programming - Explicit Locks and synchronized
7.0 Concurrent Programming - AbstractQueuedSynchronizer
8.0 Concurrent Programming - Atomic Variables and Non-blocking Synchronization Mechanisms

Many classes in the java.util.concurrent package, such as Semaphore and ConcurrentLinkedQueue, provide better performance and scalability than the synchronized mechanism. The main source of this performance improvement is the use of atomic variables and non-blocking synchronization mechanisms.

Disadvantages of locks

Scheduling overhead

When multiple threads compete for a lock, the JVM relies on the operating system to suspend some of the threads and resume them later. When a suspended thread resumes, it must wait for other threads to finish their time slices before it can be scheduled again. Suspending and resuming threads incurs significant overhead and causes a long interruption in execution.

Limitations of volatile

volatile variables are a lighter-weight synchronization mechanism, because using them involves no context switching or thread scheduling. However, although volatile provides visibility guarantees, it cannot be used to construct atomic compound operations.

For example, the i++ increment operation. It looks like a single atomic operation, but it actually consists of three independent steps:

  • Read the current value of the variable
  • Add 1 to the value
  • Write the new value back

So far, the only way to make such a compound operation atomic is to use a lock, which reintroduces the scheduling overhead problem described above.
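A minimal sketch of the problem (the VolatileCounter class below is illustrative, not from the original text): count is volatile, so every thread sees the latest written value, yet count++ can still lose updates because the read-modify-write is not atomic.

public class VolatileCounter {

    // volatile guarantees that every thread sees the latest written value...
    private volatile int count = 0;

    // ...but count++ is still three steps (read, add 1, write back), so two threads
    // can read the same value and one of the increments is silently lost.
    public void increment() {
        count++;
    }

    public int get() {
        return count;
    }
}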

Blocking problem

While a thread is waiting for a lock, it can do nothing else. If the thread holding the lock is delayed, every thread that needs that lock is unable to make progress.

Priority inversion

When threads compete for a lock, a blocked high-priority thread must still wait for a lower-priority thread holding the lock to release it. Even though the scheduler would normally run the high-priority thread first, its effective priority is dragged down to that of the lock holder.

Hardware support for concurrency

Early multiprocessors provided special instructions for concurrency, such as test-and-set, fetch-and-increment, and swap. Today, almost every multiprocessor includes some form of atomic read-modify-write instruction, such as compare-and-swap (CAS) or load-linked/store-conditional (LL/SC). Operating systems and JVMs use these instructions to implement locks and concurrent data structures.

An exclusive lock is a pessimistic technique: it assumes the worst case and must guarantee that other threads cannot interfere before it can execute correctly.

For fine-grained operations, an optimistic approach is often more efficient: it performs the update hoping that no one interferes, relying on a cheap conflict-detection mechanism to determine whether another thread interfered during the update. If interference is detected, the operation fails and can be retried or abandoned.

CAS instruction

Most processor architectures implement a compare-and-swap (CAS) instruction.

CAS takes three operands:

  • The memory location V to read and write
  • The value A to compare against
  • The new value B to write

The meaning of CAS is: "I think the value at location V should be A; if it is, update V to B; otherwise do not modify it, and in either case tell me what the value of V actually is."

An informal Java version that simulates the semantics of CAS:

public class SimulatedCAS {

    private int value;

    public synchronized int get() {
        return value;
    }

    /**
     * Returns the old value; the update happens only if the old value matched expectValue.
     */
    public synchronized int compareAndSwap(int expectValue, int newValue) {
        int oldValue = value;
        if (oldValue == expectValue) {
            value = newValue;
        }
        return oldValue;
    }

    /**
     * Returns true if the update succeeded.
     */
    public synchronized boolean compareAndSet(int expectValue, int newValue) {
        return expectValue == compareAndSwap(expectValue, newValue);
    }
}

CAS is an optimistic technique: it attempts the update hoping it will succeed, and if another thread has modified the variable in the meantime, CAS detects the conflict.

A useful rule of thumb: on most processors, the "fast code path" for uncontended lock acquisition and release costs roughly twice as much as a CAS.

JAVA lock and CAS

Although the locking syntax of the Java language is concise, the work the JVM must do to manage locks is not. Acquiring a lock may traverse a very complex code path in the JVM and can involve operating-system-level locking, thread suspension, and context switches. The main disadvantage of CAS is that the caller must handle contention explicitly (by retrying, rolling back, or giving up), whereas a lock handles contention automatically (by blocking).

Doing nothing when a CAS fails is often a sensible choice: a failed CAS usually means another thread has already completed the operation you were about to perform.

Java support for CAS

Since Java 5.0, the atomic variable classes in the java.util.concurrent.atomic package (for example AtomicInteger and AtomicReference) have provided efficient CAS operations on numeric and reference types.
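A minimal usage sketch (the AtomicIntegerDemo class is illustrative, not from the original article) showing both the high-level incrementAndGet and the same update written as an explicit compareAndSet loop:

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicIntegerDemo {

    private final AtomicInteger counter = new AtomicInteger(0);

    // Atomic increment: the CAS retry loop is hidden inside the class.
    public int increment() {
        return counter.incrementAndGet();
    }

    // The same effect written as an explicit CAS loop.
    public int incrementWithCas() {
        int current;
        do {
            current = counter.get();
        } while (!counter.compareAndSet(current, current + 1));
        return current + 1;
    }
}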

Atomic variable class

Atomic variables are finer-grained and lighter-weight than locks, which is critical for writing high-performance concurrent code on multiprocessor systems.
An atomic variable can be used as a kind of "better volatile variable": it provides the same memory semantics as a volatile variable and additionally supports atomic update operations.

Java 5 added twelve atomic variable classes, divided into four groups: scalar classes, field updater classes, array classes, and compound variable classes.

Scalar classes: AtomicBoolean, AtomicInteger, AtomicLong, AtomicReference
Field updater classes: AtomicIntegerFieldUpdater, AtomicLongFieldUpdater, AtomicReferenceFieldUpdater
Array classes: AtomicIntegerArray, AtomicLongArray, AtomicReferenceArray
Compound variable classes: AtomicStampedReference, AtomicMarkableReference
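As a small illustrative sketch of the array group (the BucketCounters class and its field names are assumptions for this example, not from the original article), the array classes let individual elements be updated atomically and independently:

import java.util.concurrent.atomic.AtomicLongArray;

public class BucketCounters {

    // One counter per bucket; each element can be incremented atomically.
    private final AtomicLongArray counts = new AtomicLongArray(16);

    public void record(int bucket) {
        counts.incrementAndGet(bucket % counts.length());
    }

    public long get(int bucket) {
        return counts.get(bucket);
    }
}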

If each thread performs only a small amount of local computation between accesses, contention on locks and atomic variables will be fierce.
If each thread performs a large amount of local computation between accesses, contention on locks and atomic variables decreases.

Under low to moderate contention, atomic variables provide better scalability; under high contention, locks handle contention more effectively (blocked threads stop competing instead of repeatedly retrying).

If shared state can be avoided altogether, the overhead is smaller still. We can improve scalability by handling contention more efficiently, but true scalability is achieved only by eliminating contention entirely. (This sounds abstract, but the ThreadLocal-style sketch below shows one way to do it.)
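A minimal sketch of that idea (PartitionedCounter and its methods are illustrative names, not from any library, and it assumes per-thread partial counts can be combined later): each thread accumulates into its own ThreadLocal count, so threads never contend on a shared variable; the shared AtomicLong is touched only once per flush instead of once per event.

import java.util.concurrent.atomic.AtomicLong;

public class PartitionedCounter {

    private final AtomicLong total = new AtomicLong();

    // Each thread gets its own one-element array to accumulate into: no sharing, no contention.
    private final ThreadLocal<long[]> local = ThreadLocal.withInitial(() -> new long[1]);

    public void increment() {
        local.get()[0]++;                 // thread-private, never contended
    }

    public void flush() {
        long[] mine = local.get();
        total.addAndGet(mine[0]);         // shared state touched only here
        mine[0] = 0;
    }

    public long totalSoFar() {
        return total.get();
    }
}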

Non-blocking algorithm

An algorithm is called non-blocking if the failure or suspension of one thread cannot cause the failure or suspension of other threads.

Non-blocking algorithms can be used in many common data structures, including stacks, queues, priority queues, hash tables, etc.

A thread-safe counter, non-blocking version:

import java.util.concurrent.CountDownLatch;

public class CasCounter {

    /**
     * Thread-safe counter state, updated through the simulated CAS class above
     * (a fake CAS, purely for demonstration).
     */
    private SimulatedCAS simulatedCAS;

    /**
     * Non-thread-safe counter, kept only to contrast with the CAS version.
     */
    private int temp;

    public CasCounter() {
        this.simulatedCAS = new SimulatedCAS();
    }

    public int get() {
        return simulatedCAS.get();
    }

    public int increment() {
        int value;
        do {
            value = simulatedCAS.get();
        } while (value != simulatedCAS.compareAndSwap(value, value + 1));
        return value + 1;
    }

    public void tempIncrement() {
        temp++;
    }

    public static void main(String[] args) throws InterruptedException {
        CasCounter casCounter = new CasCounter();
        CountDownLatch count = new CountDownLatch(50);

        for (int i = 0; i < 50; i++) {
            new Thread(new Runnable() {
                @Override
                public void run() {
                    for (int j = 0; j < 30; j++) {
                        try {
                            Thread.sleep(100);
                        } catch (InterruptedException e) {
                            e.printStackTrace();
                        }
                        casCounter.increment();

                        try {
                            Thread.sleep(100);
                        } catch (InterruptedException e) {
                            e.printStackTrace();
                        }
                        casCounter.tempIncrement();
                    }
                    count.countDown();
                }
            }).start();
        }
        count.await();
        System.out.println("Thread safe final cas Counter : " + casCounter.get());
        System.out.println("Thread unsafe final temp value : " + casCounter.temp);
    }
}

Non-blocking stack

The key to creating a non-blocking algorithm is to figure out how to narrow the scope of atomic changes down to a single variable while maintaining data consistency.

The stack is the simplest linked data structure: each element points to only one other element, and each element is referenced by only one other element.

import java.util.concurrent.atomic.AtomicReference;

/**
 * Thread-safe push and pop implemented with AtomicReference.
 *
 * @param <E> element type
 */
public class ConcurrentStack<E> {

    private final AtomicReference<Node<E>> top = new AtomicReference<>();

    /**
     * Push an element onto the top of the stack.
     *
     * @param item the element to push
     */
    public void push(E item) {
        Node<E> newHead = new Node<>(item);
        Node<E> oldHead;
        do {
            oldHead = top.get();
            newHead.next = oldHead;
        } while (!top.compareAndSet(oldHead, newHead));
    }

    /**
     * Pop the element at the top of the stack.
     *
     * @return the top element, or null if the stack is empty
     */
    public E pop() {
        Node<E> oldHead;
        Node<E> newHead;
        do {
            oldHead = top.get();
            if (oldHead == null) {
                return null;
            }
            newHead = oldHead.next;
        } while (!top.compareAndSet(oldHead, newHead));
        return oldHead.item;
    }

    /**
     * Singly linked list node.
     *
     * @param <E> element type
     */
    private static class Node<E> {
        public final E item;
        public Node<E> next;

        public Node(E item) {
            this.item = item;
        }
    }
}

Non-blocking linked list

A linked queue is more complicated than a stack because it requires separate head and tail pointers. When a new element is successfully inserted, two pointers must change: the next pointer of the old tail node and the tail pointer itself, each via its own atomic operation.

We need to understand the following two techniques:

Technique 1

Even in an update operation that involves multiple steps, make sure the data structure is always in a consistent state. That way, when thread B arrives and finds thread A in the middle of an update, B can recognize that an operation has been partially completed and must not immediately begin its own update. B can then wait (by repeatedly checking the queue's state) until A finishes, so the two threads do not interfere with each other.

Technique 2

If B arrives and finds A in the middle of modifying the data structure, the data structure should carry enough information for B to finish A's update on A's behalf. If B "helps" A complete the update, then B can carry out its own operation without waiting for A to finish. When A resumes and tries to complete its operation, it will find that B has already done it for it.

for example:

import java.util.concurrent.atomic.AtomicReference;

public class LinkedQueue<E> {

    /**
     * Linked-list node; next is wrapped in an AtomicReference so that the
     * pointer can be updated atomically and thread-safely.
     *
     * @param <E> element type
     */
    private static class Node<E> {
        final E item;
        final AtomicReference<Node<E>> next;

        /**
         * @param item element value
         * @param next next node
         */
        public Node(E item, Node<E> next) {
            this.item = item;
            this.next = new AtomicReference<>(next);
        }
    }

    /**
     * Sentinel node; when the queue is empty, both head and tail point to it.
     */
    private final Node<E> GUARD = new Node<>(null, null);
    /**
     * Head pointer, initially pointing to GUARD.
     */
    private final AtomicReference<Node<E>> head = new AtomicReference<>(GUARD);
    /**
     * Tail pointer, initially pointing to GUARD.
     */
    private final AtomicReference<Node<E>> tail = new AtomicReference<>(GUARD);

    /**
     * Append an element at the tail of the queue.
     *
     * Before inserting the new element, first check whether the tail pointer is in an
     * intermediate state. If it is, another thread is in the middle of an insertion;
     * instead of waiting for that thread to finish, this thread helps it by advancing
     * the tail pointer to the next node, then re-checks until tail really points at the
     * last node before attempting its own insertion.
     *
     * If two threads insert at the same time, curTail.next.compareAndSet fails for one
     * of them; the data structure is not damaged, and that thread simply re-reads tail
     * and retries.
     *
     * Once curTail.next.compareAndSet succeeds, the insertion has taken effect, and
     * tail.compareAndSet(curTail, newNode) advances the tail pointer. If that CAS fails,
     * the current thread can return immediately without retrying, because another
     * thread will advance tail when it next checks it.
     *
     * @param item element to insert
     * @return true on success
     */
    public boolean put(E item) {
        Node<E> newNode = new Node<>(item, null);
        while (true) {
            Node<E> curTail = tail.get();
            Node<E> tailNext = curTail.next.get();
            // Make sure the tail pointer has not changed since we read it
            if (curTail == tail.get()) {
                // If tailNext is null, the current tail really is the last node
                if (tailNext == null) {
                    // Atomically link the new node after the current tail;
                    // on failure, loop and retry (Technique 1)
                    if (curTail.next.compareAndSet(null, newNode)) {
                        // Advance tail to the new node. It does not matter if this CAS
                        // fails: failure means another thread is active and will advance
                        // the pointer for us via the tailNext != null branch below
                        tail.compareAndSet(curTail, newNode);
                        return true;
                    }
                } else {
                    // Intermediate state: another thread has linked a node but not yet
                    // advanced tail, so help it by advancing tail (Technique 2)
                    tail.compareAndSet(curTail, tailNext);
                }
            }
        }
    }
}

Looking at more recent JDK code, many of the classes in the concurrent package have since been revised and optimized. For example, the internal node implementation (the snippet below matches the Node class of ConcurrentLinkedQueue in JDK 8) no longer wraps its fields in AtomicReference; instead it performs CAS on plain volatile fields through sun.misc.Unsafe field offsets, which has become the implementation pattern of most of the concurrent classes:

        private static final sun.misc.Unsafe UNSAFE;
        private static final long itemOffset;
        private static final long nextOffset;

        static {
            try {
                UNSAFE = sun.misc.Unsafe.getUnsafe();
                Class<?> k = Node.class;
                itemOffset = UNSAFE.objectFieldOffset
                    (k.getDeclaredField("item"));
                nextOffset = UNSAFE.objectFieldOffset
                    (k.getDeclaredField("next"));
            } catch (Exception e) {
                throw new Error(e);
            }
        }
    }
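Application code cannot normally obtain sun.misc.Unsafe, but the field updater classes listed earlier give a similar effect with public API: the node keeps a plain volatile field and no per-node AtomicReference object is allocated. A minimal sketch under that assumption (UpdaterNode and casNext are illustrative names, not JDK code):

import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

public class UpdaterNode<E> {

    final E item;
    // Plain volatile field: no extra AtomicReference object per node.
    volatile UpdaterNode<E> next;

    // One shared updater instance performs CAS directly on the volatile field.
    @SuppressWarnings("rawtypes")
    private static final AtomicReferenceFieldUpdater<UpdaterNode, UpdaterNode> NEXT =
            AtomicReferenceFieldUpdater.newUpdater(UpdaterNode.class, UpdaterNode.class, "next");

    public UpdaterNode(E item) {
        this.item = item;
    }

    public boolean casNext(UpdaterNode<E> expect, UpdaterNode<E> update) {
        return NEXT.compareAndSet(this, expect, update);
    }
}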

ABA problem

The ABA problem is a real headache for CAS-based algorithms. Java provides AtomicStampedReference, which attaches a version number (stamp) to the reference to avoid the ABA problem. Similarly, AtomicMarkableReference attaches a boolean mark (for example, to flag a node as logically deleted).
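A minimal sketch of how the stamp defeats ABA (the AbaDemo class and the values used are illustrative): a CAS that presents a stale stamp fails even though the reference has returned to "A".

import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {

    public static void main(String[] args) {
        AtomicStampedReference<String> ref = new AtomicStampedReference<>("A", 0);

        int stampSeenByThread1 = ref.getStamp();            // thread 1 reads ("A", 0)

        // Meanwhile another thread changes A -> B -> A, bumping the stamp each time.
        ref.compareAndSet("A", "B", ref.getStamp(), ref.getStamp() + 1);
        ref.compareAndSet("B", "A", ref.getStamp(), ref.getStamp() + 1);

        // Thread 1's CAS fails because its stamp (0) is stale, even though the value is "A" again.
        boolean swapped = ref.compareAndSet("A", "C", stampSeenByThread1, stampSeenByThread1 + 1);
        System.out.println("swapped = " + swapped);          // prints: swapped = false
    }
}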

Summary

Non-blocking algorithms maintain thread safety through low-level concurrency primitives such as CAS instead of locks; these primitives are exposed through the atomic variable classes.
Non-blocking algorithms are very difficult to design and implement, but they usually offer much better scalability. Much of the improvement in concurrent performance across JVM releases has come from the use of non-blocking algorithms, both inside the JVM and in the platform libraries.
