After working for 5 years, I don't even know the volatile keyword?

"After working for 5 years, I don't even know the volatile keyword!"

After listening to the architect who just finished the interview, several other colleagues also participated in this time.

It is said that domestic interviews are "interviewing aircraft carriers and screwing the screws at work." Sometimes you will be PASSed because of a problem.

How long have you been working? Do you know the volatile keyword?

Today, let us learn about the volatile keyword together and be a screw worker who can build an aircraft carrier in an interview!

Introduction to volatile

volatile

The definition of volatile in the third edition of the Java Language Specification is as follows:

The Java programming language allows threads to access shared variables. To ensure that shared variables are updated accurately and consistently, a thread should normally obtain an exclusive lock on such a variable.

The Java language also provides volatile, which in some cases is more convenient than a lock.

If a field is declared volatile, the Java thread memory model ensures that all threads see a consistent value for this variable.

Semantics

Once a shared variable (an instance field or a static field of a class) is declared volatile, it carries two layers of semantics:

  1. Visibility is guaranteed across threads operating on the variable: when one thread modifies its value, the new value is immediately visible to other threads.

  2. Instruction reordering around it is prohibited.

  • Note

Declaring a variable both final and volatile is a compile-time error.

PS: one says "changes are visible," the other says "it never changes." Naturally, fire and water don't mix.
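A minimal sketch of that conflict (the class and field names here are illustrative, not from the JLS):

  • Flags.java
class Flags {
    volatile boolean running;        // OK: mutable, writes visible across threads
    final int limit = 10;            // OK: immutable once constructed

    // final volatile int bad = 1;   // compile-time error:
    // "illegal combination of modifiers: final and volatile"
}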

Problem introduction

  • Error.java
// Thread 1
boolean stop = false;
while(!stop){
    doSomething();
}

// Thread 2
stop = true;

This is a typical piece of code; many people use this flag-based approach when interrupting a thread.

Problem analysis

But will this code actually run correctly? Will the thread be interrupted?

Not necessarily. Most of the time this code will interrupt the thread, but it can also fail to do so (the probability is small, but when it happens the result is an infinite loop).

Here is why this code may fail to interrupt the thread.

As explained earlier, each thread has its own working memory while it runs, so when thread 1 starts, it copies the value of the stop variable into its own working memory.

If thread 2 changes the value of stop but has not yet written it back to main memory before moving on to other things,

then thread 1 never learns of thread 2's change to stop, so it keeps looping.

Use volatile

First: the volatile keyword forces the modified value to be written back to main memory immediately.

Second: with volatile, when thread 2 modifies the variable, the cache line holding stop in thread 1's working memory is invalidated (at the hardware level, the corresponding line in the CPU's L1 or L2 cache becomes invalid).

Third: because that cache line is invalid, thread 1 re-reads the value of stop from main memory.

So when thread 2 modifies stop (which involves two steps: modifying the value in thread 2's working memory, then writing the modified value back to main memory), the cache line holding stop in thread 1's working memory is invalidated. When thread 1 next reads the variable, it finds its cache line invalid, waits for the corresponding main-memory address to be updated, and then reads the latest value from main memory.

Thread 1 therefore reads the latest, correct value.
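Applied to the Error.java fragment above, the fix is a one-word change (doSomething() remains a placeholder, as in the original):

  • Fixed.java
// Declaring the flag volatile makes thread 2's write
// immediately visible to thread 1, so the loop can exit.
volatile boolean stop = false;

// Thread 1
while(!stop){
    doSomething();
}

// Thread 2
stop = true;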

Does volatile guarantee atomicity?

From the above we know that the volatile keyword guarantees the visibility of operations, but can it guarantee that operations on a variable are atomic?

Problem introduction

public class VolatileAtomicTest {

    public volatile int inc = 0;

    public void increase() {
        inc++;
    }

    public static void main(String[] args) {
        final VolatileAtomicTest test = new VolatileAtomicTest();
        for (int i = 0; i < 10; i++) {
            new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    test.increase();
                }
            }).start();
        }

        // make sure all the threads above have finished
        while (Thread.activeCount() > 1) {
            Thread.yield();
        }
        System.out.println(test.inc);
    }
}
  • What does this program print?

You might expect 10000, but the actual result is usually smaller than that.

The reason

Some readers may object: the code above simply increments the variable inc, and since volatile guarantees visibility, each thread's increment should be visible to all the others; with 10 threads each performing 1000 increments, the final value of inc should be 1000 * 10 = 10000.

The misunderstanding lies here: volatile does guarantee visibility, but the program above also needs atomicity, which volatile does not provide.

inc++ is not one operation but three: read the current value of inc, add 1, and write the result back. Two threads can both read the same value, each add 1, and write back the same result, losing one increment.

Visibility only ensures that each read sees the latest value at that moment; volatile cannot make such a compound operation atomic.

  • Solution

Use a Lock, synchronized, or AtomicInteger, as in the sketch below.
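A minimal sketch of the AtomicInteger variant (a synchronized increase() method would work just as well; the class name is illustrative):

  • VolatileAtomicFixed.java
import java.util.concurrent.atomic.AtomicInteger;

public class VolatileAtomicFixed {

    // incrementAndGet() performs read-add-write as one atomic CAS operation
    public AtomicInteger inc = new AtomicInteger(0);

    public void increase() {
        inc.incrementAndGet();
    }
}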

Can volatile guarantee ordering?

The prohibition of instruction reordering by volatile has two aspects:

  1. When the program reaches a read or write of a volatile variable, all operations before it must already have completed, with their results visible to the operations that follow; the operations after it must not yet have started.

  2. During instruction optimization, statements before an access to a volatile variable cannot be moved after it for execution, and statements after the access cannot be moved before it.

Examples

  • Example one
// x and y are non-volatile variables
// flag is a volatile variable

x = 2;        // statement 1
y = 0;        // statement 2
flag = true;  // statement 3
x = 4;        // statement 4
y = -1;       // statement 5

Since flag is a volatile variable, during instruction reordering statement 3 will not be moved before statements 1 and 2, nor after statements 4 and 5.

Note, however, that the relative order of statements 1 and 2, and of statements 4 and 5, is not guaranteed.

Moreover, volatile guarantees that by the time statement 3 executes, statements 1 and 2 have completed, and their results are visible to statements 3, 4, and 5.

  • Example two
// Thread 1:
context = loadContext();   // statement 1
inited = true;             // statement 2

// Thread 2:
while(!inited){
  sleep();
}
doSomethingWithConfig(context);

If statement 2 were reordered before statement 1, thread 2 could exit its loop while context is still uninitialized and then operate on the uninitialized context, causing a program error.

If the inited variable is declared with the volatile keyword, this problem cannot occur: by the time statement 2 executes, context is guaranteed to have been initialized.
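A sketch of the declarations that make this safe (Context, loadContext(), and doSomethingWithConfig() are placeholder names carried over from the example above):

  • ConfigInit.java
class ConfigInit {
    Context context;                  // ordinary shared variable
    volatile boolean inited = false;  // the volatile write in writer() cannot be
                                      // reordered before the write to context

    void writer() {                   // thread 1
        context = loadContext();      // statement 1
        inited = true;                // statement 2: safely publishes context
    }
}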

Common usage scenarios

In some cases volatile performs better than synchronized,

but note that the volatile keyword cannot replace synchronized, because volatile cannot guarantee the atomicity of operations.

Generally speaking, using volatile requires that two conditions hold:

  1. Writes to the variable do not depend on its current value.

  2. The variable is not part of an invariant together with other variables.

In effect, these conditions say that the valid values that can be written to a volatile variable are independent of any program state, including the variable's current state.

My own reading is that these two conditions ensure the operations involved are atomic, so that a program using volatile executes correctly under concurrency.

Common scenarios

  • Status flag
volatile boolean flag = false;

// worker thread loops until another thread sets the flag
while(!flag){
    doSomething();
}

// called from another thread
public void setFlag() {
    flag = true;
}
  • Singleton double check
public class Singleton{
    // volatile prevents reordering inside "instance = new Singleton()"
    // (allocate memory -> initialize object -> assign reference); without it,
    // another thread could observe a non-null but not-yet-initialized instance.
    private volatile static Singleton instance = null;

    private Singleton() {

    }

    public static Singleton getInstance() {
        if (instance == null) {                  // first check, lock-free
            synchronized (Singleton.class) {
                if (instance == null) {          // second check, under the lock
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }
}

JSR-133 enhancements

In the old Java memory model, before JSR-133, reordering among volatile variables was not allowed, but reordering between a volatile variable and an ordinary variable was.

In the old memory model, the VolatileExample program below could be reordered into the following execution sequence:

class VolatileExample {
    int a = 0;
    volatile boolean flag = false;

    public void writer() {
        a = 1;                      //1
        flag = true;                //2
    }

    public void reader() {
        if (flag) {                //3
            int i =  a;            //4
        }
    }
}
  • Timeline
Timeline: ----------------------------------------------------------------->
Thread A: (2) write volatile variable;                        (1) modify shared variable
Thread B:                     (3) read volatile variable;  (4) read shared variable

In the old memory model, when there was no data dependence between 1 and 2, they could be reordered (and similarly 3 and 4).

The result: when reader thread B executes 4, it does not necessarily see the write that writer thread A made to the shared variable in 1.

Therefore, in the old memory model, volatile write-read did not have the memory semantics of monitor release-acquire.

To provide an inter-thread communication mechanism lighter-weight than monitor locks,

the JSR-133 expert group decided to enhance the memory semantics of volatile:

strictly restrict compiler and processor reordering between volatile variables and ordinary variables, so that a volatile write-read pair has the same memory semantics as a monitor release-acquire pair.

In terms of the compiler's reordering rules and the processor's memory-barrier insertion strategy, any reordering between a volatile variable and an ordinary variable that could break volatile's memory semantics is prohibited.
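A conservative sketch of that barrier insertion strategy, following the JSR-133 cookbook (barrier positions shown as comments; real compilers may elide some of them):

  • Barriers.java
class Barriers {
    int a = 0;                  // ordinary variable
    volatile boolean flag = false;

    void writer() {
        a = 1;
        // StoreStore barrier: earlier ordinary writes complete first
        flag = true;            // volatile write
        // StoreLoad barrier: the volatile write drains before later reads
    }

    void reader() {
        boolean f = flag;       // volatile read
        // LoadLoad barrier: later reads cannot pass the volatile read
        // LoadStore barrier: later writes cannot pass the volatile read
        int i = a;              // guaranteed to see a == 1 if f is true
    }
}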

volatile implementation principle

Definition of Terms

  • Shared variable: a variable that can be shared among multiple threads. Shared variables include all instance fields, static fields, and array elements; they are all stored in heap memory, and volatile acts only on shared variables.
  • Memory barrier: a set of processor instructions used to limit the order of memory operations.
  • Cache line: the smallest unit of storage that can be allocated in the cache. When the processor fills a cache line, it loads the entire line, which may require multiple main-memory read cycles.
  • Atomic operation: an operation, or series of operations, that cannot be interrupted.
  • Cache line fill: when the processor recognizes that an operand read from memory is cacheable, it reads the entire cache line into the appropriate cache (L1, L2, L3, or all of them).
  • Cache hit: if a subsequent access by the processor targets a memory location already brought in by a cache line fill, the processor reads the operand from the cache instead of from memory.
  • Write hit: when the processor writes an operand back, it first checks whether a valid cache line holds the target memory address; if so, it writes the operand to the cache instead of back to memory. This is called a write hit.
  • Write miss: a write operation whose target memory area is not covered by a valid cache line.

Principle

So how does volatile guarantee visibility?

On an x86 processor, we can use a tool to dump the assembly instructions generated by the JIT compiler and see what the CPU does when writing a volatile variable.

  • java
instance = new Singleton(); // instance is a volatile variable

Corresponding assembly

0x01a3de1d: movb $0x0,0x1104800(%esi);
0x01a3de24: lock addl $0x0,(%esp);

When a write is performed on a shared variable declared volatile, the second line of assembly appears.
Consulting the IA-32 Architecture Software Developer's Manual shows that a lock-prefixed instruction causes two things on multi-core processors:

  • The data of the current processor cache line will be written back to the system memory.

  • This operation of writing back to the memory will invalidate the data cached at that memory address in other CPUs.

To improve processing speed, the processor does not communicate with memory directly; it first reads data from system memory into its internal cache (L1, L2, or others) before operating on it, but after the operation it is not known when the data will be written back to memory.

If a write is performed on a variable declared volatile, the JVM sends a lock-prefixed instruction to the processor, writing the cache line containing the variable back to system memory.

But even after the write-back, other processors' cached values may still be stale, which would cause problems in subsequent computations.

Therefore, with multiple processors, a cache coherence protocol is implemented to keep each processor's cache consistent: each processor sniffs the data propagated on the bus to check whether its own cached values have expired.
When a processor finds that the memory address corresponding to one of its cache lines has been modified, it marks that cache line invalid; when it next wants to operate on that data, it is forced to re-read it from system memory into its cache.

Visibility

These two effects are explained in detail in the multiprocessor management chapter (Chapter 8) of Volume 3 of the IA-32 Software Developer's Manual.

The Lock prefix instruction causes the processor's cache to be written back to memory

The Lock prefix instruction causes the processor's LOCK# signal to be asserted while the instruction executes.

In a multiprocessor environment, the LOCK# signal ensures that the processor has exclusive use of any shared memory while the signal is asserted. (It locks the bus, so other CPUs cannot access the bus; no access to the bus means no access to system memory.) In recent processors, however, the LOCK# signal generally locks the cache rather than the bus, because locking the bus is relatively expensive.

Section 8.1.4 describes in detail the effect of lock operations on processor caches. On Intel486 and Pentium processors, the LOCK# signal is always asserted on the bus during a lock operation.

On P6 and more recent processors, however, if the accessed memory area is already cached inside the processor, the LOCK# signal is not asserted.

Instead, the processor locks the cache line holding that memory area and writes it back to memory, relying on the cache coherence mechanism to guarantee the atomicity of the modification. This operation is called "cache locking." The cache coherence mechanism prevents a memory area cached by two or more processors from being modified simultaneously.

Writing one processor's cache back to memory invalidates other processors' caches of that data

IA-32 and Intel 64 processors use the MESI (Modified, Exclusive, Shared, Invalid) protocol to maintain coherence between their internal caches and the caches of other processors.

When operating in a multi-core system, IA-32 and Intel 64 processors can sniff other processors' accesses to system memory and to their internal caches.

They use this sniffing technique to keep the data in their internal caches, in system memory, and in other processors' caches consistent across the bus.

For example, on Pentium and P6 family processors, if one processor sniffs that another processor intends to write to a memory address currently held in the shared state, the sniffing processor invalidates its cache line, forcing a cache line fill the next time it accesses the same address.

Optimizing the use of volatile

The well-known Java concurrency master Doug Lea added a queue collection class, LinkedTransferQueue, to the concurrent package in JDK 7. For its volatile variables,
he used a byte-appending trick to optimize the performance of enqueue and dequeue.

Can appending bytes really improve performance? The trick looks magical, but once you understand processor architecture in depth, the mystery dissolves.

Take a look at the LinkedTransferQueue class:
it uses an inner class to define the queue's head node and tail node,
and this inner class, PaddedAtomicReference, does just one thing relative to its parent class AtomicReference: it pads the shared variable out to 64 bytes.

We can do the math: an object reference occupies 4 bytes, and the class appends 15 reference fields occupying 60 bytes in total; together with the parent class's value field, that makes 64 bytes.

  • LinkedTransferQueue.java
/** head of the queue */
private transient final PaddedAtomicReference<QNode> head;

/** tail of the queue */
private transient final PaddedAtomicReference<QNode> tail;

static final class PaddedAtomicReference<T> extends AtomicReference<T> {

    // enough padding for 64 bytes with 4-byte refs
    Object p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pa, pb, pc, pd, pe;

    PaddedAtomicReference(T r) {
        super(r);
    }
}

public class AtomicReference<V> implements java.io.Serializable {

    private volatile V value;

    // other code omitted
}

Why does padding to 64 bytes improve the efficiency of concurrent programming?

For Intel Core i7, Core, Atom, NetBurst, Core Solo, and Pentium M processors, the cache lines of the L1, L2, and L3 caches are 64 bytes wide, and partially filled cache lines are not supported. This means that if the queue's head node and tail node together occupy less than 64 bytes, the processor will read both of them into the same cache line. With multiple processors, each processor then caches the same head and tail nodes; when one processor tries to modify the head node, it locks the entire cache line, so under the cache coherence mechanism other processors cannot access the tail node in their own caches. Enqueue and dequeue operations must constantly modify the head node and tail node, so with multiple processors this seriously hurts enqueue and dequeue efficiency.

Doug Lea fills the cache line by padding to 64 bytes, preventing the head node and tail node from being loaded into the same cache line, so that modifying one does not lock out the other.

  • So should every volatile variable be padded to 64 bytes?

No.

This technique should not be used in two scenarios.

First: processors whose cache lines are not 64 bytes wide, such as P6 family and Pentium processors, whose L1 and L2 cache lines are 32 bytes wide.

Second: shared variables that are not written frequently.

Padding requires the processor to read more bytes into the cache, which itself has a performance cost; if a shared variable is rarely written, the chance of cache line contention is small, and there is no need to pad it.

PS: it suddenly strikes me that to truly master a field, both knowledge and wisdom are indispensable.

long/double operations are not thread-safe

One of the rules defined by the Java Virtual Machine Specification: all operations on primitive types are atomic, except certain operations on long and double.

A JVM is allowed to treat 32-bit accesses as atomic, not 64-bit ones.

When a thread transfers a long/double value between main memory and its working memory, the transfer may therefore take two 32-bit operations. Clearly, if several threads operate on the value at the same time, the high and low 32-bit halves may come from different writes, producing a corrupted value.

To share long and double fields between threads, either operate on them inside synchronized blocks or declare them volatile.
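A minimal sketch (the class and field names are illustrative):

  • SharedCounter.java
public class SharedCounter {
    // Without volatile, a 64-bit write could be split into two 32-bit
    // writes on a 32-bit JVM, so a reader might observe a torn value.
    private volatile long total;

    public void setTotal(long value) { total = value; }  // atomic for a volatile long
    public long getTotal() { return total; }
}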

Summary

As a very important keyword in the JMM, volatile is practically a must-ask topic in high-concurrency interviews.

I hope this article helps with your work, study, and interviews. If you have other ideas, feel free to share them in the comments.

Your likes, favorites, and shares are the biggest motivation for Lao Ma's writing!

For more content, follow [Lao Ma Xiaoxifeng].
