CUDA Programming: __syncthreads()

__syncthreads() synchronizes the threads within a single thread block.

B.6. Synchronization Functions

void __syncthreads();

waits until all threads in the thread block have reached this point and all global and shared memory accesses made by these threads prior to __syncthreads() are visible to all threads in the block.

__syncthreads() is used to coordinate communication between the threads of the same block. When some threads within a block access the same addresses in shared or global memory, there are potential read-after-write, write-after-read, or write-after-write hazards for some of these memory accesses. These data hazards can be avoided by synchronizing threads in-between these accesses.
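As an illustration of the read-after-write hazard described above, here is a minimal sketch that is not taken from the guide. The kernel name reverse_block, the block size N, and the host helper run_reverse_demo are hypothetical names chosen for this example, and it assumes a single block of N threads.

```cuda
#include <cuda_runtime.h>

constexpr int N = 256;  // hypothetical block size for this sketch

// Each thread writes one element to shared memory, then reads an element
// that a different thread wrote. The barrier separates the two phases.
__global__ void reverse_block(const int* in, int* out) {
    __shared__ int tile[N];
    int t = threadIdx.x;
    tile[t] = in[t];           // write phase
    __syncthreads();           // every write to tile is now visible block-wide
    out[t] = tile[N - 1 - t];  // read phase: reads another thread's write
}

// Hypothetical host helper: true if the device output is the reversed input.
bool run_reverse_demo() {
    int h_in[N], h_out[N];
    for (int i = 0; i < N; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    reverse_block<<<1, N>>>(d_in, d_out);
    cudaError_t err =
        cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < N; ++i)
        if (h_out[i] != N - 1 - i) return false;
    return true;
}
```

Without the barrier, a thread could read tile[N - 1 - t] before the owning thread has stored it, which is exactly the read-after-write case.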

__syncthreads() is allowed in conditional code but only if the conditional evaluates identically across the entire thread block, otherwise the code execution is likely to hang or produce unintended side effects.
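One common way to respect this rule is to keep the divergent work inside the branch and the barrier outside it. The sketch below is an assumption-laden example, not code from the guide: block_sum, BLOCK, and run_block_sum are hypothetical names, and the kernel assumes it is launched with one block of exactly BLOCK threads.

```cuda
#include <cuda_runtime.h>

constexpr int BLOCK = 128;  // hypothetical block size, must be a power of two

// Sums the first n inputs (n <= BLOCK) with a shared-memory tree reduction.
__global__ void block_sum(const int* in, int* out, int n) {
    __shared__ int buf[BLOCK];
    int t = threadIdx.x;
    buf[t] = (t < n) ? in[t] : 0;  // the divergent condition covers only the load
    __syncthreads();               // reached by every thread in the block
    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (t < stride) buf[t] += buf[t + stride];  // divergent work
        __syncthreads();  // outside the if; the loop bound is block-uniform
    }
    if (t == 0) *out = buf[0];
}

// Hypothetical host helper: fills the input with 1..n and returns the device sum.
// Error checking is omitted for brevity; n is expected to be in [0, BLOCK].
int run_block_sum(int n) {
    int h_in[BLOCK] = {0};
    for (int i = 0; i < n && i < BLOCK; ++i) h_in[i] = i + 1;

    int *d_in = nullptr, *d_out = nullptr;
    int h_out = -1;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    block_sum<<<1, BLOCK>>>(d_in, d_out, n);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Placing the __syncthreads() call inside `if (t < stride)` would mean only part of the block reaches it, which is the situation the paragraph above warns about.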

Devices of compute capability 2.x and higher support three variations of __syncthreads() described below.

int __syncthreads_count(int predicate);

is identical to __syncthreads() with the additional feature that it evaluates predicate for all threads of the block and returns the number of threads for which predicate evaluates to non-zero.
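A small sketch of how __syncthreads_count() might be used to count matching elements in a block follows. The names count_positive, THREADS, and run_count_positive are hypothetical, and the example assumes one block with one input element per thread.

```cuda
#include <cuda_runtime.h>

constexpr int THREADS = 256;  // hypothetical block size

// Every thread contributes one predicate; every thread receives the block total.
__global__ void count_positive(const int* in, int* out) {
    int n = __syncthreads_count(in[threadIdx.x] > 0);
    if (threadIdx.x == 0) *out = n;  // one thread publishes the result
}

// Hypothetical host helper: input is i - 100 for i in [0, THREADS).
// Error checking is omitted for brevity.
int run_count_positive() {
    int h_in[THREADS];
    for (int i = 0; i < THREADS; ++i) h_in[i] = i - 100;

    int *d_in = nullptr, *d_out = nullptr;
    int h_out = -1;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    count_positive<<<1, THREADS>>>(d_in, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

With this input the positive values are those at indices 101 through 255, so the helper is expected to return 155.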

int __syncthreads_and(int predicate);

is identical to __syncthreads() with the additional feature that it evaluates predicate for all threads of the block and returns non-zero if and only if predicate evaluates to non-zero for all of them.

int __syncthreads_or(int predicate);

is identical to __syncthreads() with the additional feature that it evaluates predicate for all threads of the block and returns non-zero if and only if predicate evaluates to non-zero for any of them.
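The "all" and "any" variants can be compared side by side in a short sketch. Everything named here (all_any, Flags, run_flags, THREADS) is hypothetical and only for illustration; the example assumes one block with one input element per thread.

```cuda
#include <cuda_runtime.h>

constexpr int THREADS = 64;  // hypothetical block size

struct Flags {
    int all;  // 1 if every element is positive
    int any;  // 1 if at least one element is positive
};

__global__ void all_any(const int* in, Flags* out) {
    int p = in[threadIdx.x] > 0;
    int a = __syncthreads_and(p);  // non-zero only if p holds for all threads
    int o = __syncthreads_or(p);   // non-zero if p holds for any thread
    if (threadIdx.x == 0) {
        out->all = (a != 0);
        out->any = (o != 0);
    }
}

// Hypothetical host helper: input is all ones, except that h_in[zero_index]
// is set to 0 when zero_index is a valid index. Pass -1 to leave all ones.
// Error checking is omitted for brevity.
Flags run_flags(int zero_index) {
    int h_in[THREADS];
    for (int i = 0; i < THREADS; ++i) h_in[i] = 1;
    if (zero_index >= 0 && zero_index < THREADS) h_in[zero_index] = 0;

    int* d_in = nullptr;
    Flags* d_out = nullptr;
    Flags h_out = {-1, -1};
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(Flags));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    all_any<<<1, THREADS>>>(d_in, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(Flags), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

A single zero in the input should flip the "all" flag to 0 while leaving the "any" flag at 1.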

void __syncwarp(unsigned mask=0xffffffff);

will cause the executing thread to wait until all warp lanes named in mask have executed a __syncwarp() (with the same mask) before resuming execution. All non-exited threads named in mask must execute a corresponding __syncwarp() with the same mask, or the result is undefined.

Executing __syncwarp() guarantees memory ordering among threads participating in the barrier. Thus, threads within a warp that wish to communicate via memory can store to memory, execute __syncwarp(), and then safely read values stored by other threads in the warp.
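The store, __syncwarp(), read sequence described above could look like the following sketch. The names warp_rotate and run_warp_rotate are hypothetical, and the example assumes a launch with exactly one warp of 32 threads on a toolkit that provides __syncwarp() (CUDA 9 or later).

```cuda
#include <cuda_runtime.h>

constexpr int WARP = 32;  // warp size assumed by this sketch

// Each lane stores a value, synchronizes with the rest of the warp,
// then reads the value stored by the next lane.
__global__ void warp_rotate(int* out) {
    __shared__ int slot[WARP];
    int lane = threadIdx.x;  // valid as a lane index only for a one-warp launch
    slot[lane] = lane * 10;  // store
    __syncwarp();            // default mask 0xffffffff: all 32 lanes take part
    out[lane] = slot[(lane + 1) % WARP];  // read the neighbor's store
}

// Hypothetical host helper: true if every lane saw its neighbor's value.
bool run_warp_rotate() {
    int h_out[WARP];
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return false;
    warp_rotate<<<1, WARP>>>(d_out);
    cudaError_t err =
        cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < WARP; ++i)
        if (h_out[i] != ((i + 1) % WARP) * 10) return false;
    return true;
}
```

Because only one warp participates, __syncwarp() is sufficient here; data shared across several warps of a block would still need __syncthreads().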


Reposted from blog.csdn.net/u010454261/article/details/78314587