About CPU's multi-core and hyper-threading technology

1. About CPU's multi-core and hyper-threading technology

The number of physical CPUs is determined by the number of sockets on the motherboard. Each CPU can have multiple cores, and each core may have multiple threads.

Each core of a multi-core CPU (each core is a chip of its own) appears to the OS as an independent CPU.

With hyper-threading, each CPU core can have multiple threads (usually two, such as 1 core with 2 threads, 2 cores with 4 threads, or 4 cores with 8 threads). Each thread is a virtual logical CPU (Windows, for example, calls it a logical processor), and each thread is also seen by the OS as an independent CPU.

This amounts to deceiving the operating system. Physically there is still only 1 core, but from the point of view of a hyper-threaded CPU, its hyper-threads speed up program execution.
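As a rough illustration of the socket / core / logical-CPU distinction, the sketch below counts all three from text in the style of Linux's /proc/cpuinfo. It assumes the x86 field names `processor`, `physical id` and `core id`; other platforms may not expose them. The helper name `summarize_cpuinfo` and the `SAMPLE` text (a 1-socket, 2-core, 4-thread CPU) are made up for this example.

```python
import os

def summarize_cpuinfo(text):
    """Count sockets, physical cores and logical CPUs in /proc/cpuinfo-style text."""
    sockets = set()
    cores = set()
    logical = 0
    # Each logical CPU is described by one blank-line-separated block.
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        socket_id = fields.get("physical id", "0")
        # Without a "core id" field, fall back to one core per logical CPU.
        core_id = fields.get("core id", fields["processor"])
        sockets.add(socket_id)
        cores.add((socket_id, core_id))
    return {"sockets": len(sockets), "cores": len(cores), "logical_cpus": logical}

# Hypothetical sample: 1 socket, 2 cores, 2 hardware threads per core.
SAMPLE = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 1

processor : 2
physical id : 0
core id : 0

processor : 3
physical id : 0
core id : 1
"""

print(summarize_cpuinfo(SAMPLE))
# os.cpu_count() reports logical CPUs, i.e. what the OS sees as independent CPUs.
print("logical CPUs on this machine:", os.cpu_count())
```

On a Linux machine the same function could be fed the contents of /proc/cpuinfo instead of the sample text.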


CPU multithreading is different from program multithreading. CPU multithreading is hardware multithreading, while program multithreading is software multithreading. Software multithreading defines multiple task branches that can execute concurrently; hardware multithreading acts as CPU cores, providing CPU resources that are scheduled to execute those tasks.

To take advantage of hyper-threading, the operating system needs to be specially optimized for hyper-threading.

A hyper-threaded CPU core is more capable than a core without multithreading, but each individual thread does not match the capability of an independent CPU core.

The threads on each CPU core share that core's resources. For example, assuming each CPU core has only one "engine" resource, once thread 1 of the virtual CPU is using this "engine", thread 2 cannot use it and can only wait.
Therefore, the main purpose of hyper-threading is to put more independent instructions on the pipeline (see the earlier explanation of the pipeline), so that thread 1 and thread 2 on the pipeline compete for the core's resources as little as possible. In other words, hyper-threading takes advantage of the superscalar architecture.
Multithreading means that each core can hold the state of multiple threads. For example, thread 1 of a core may be idle while thread 2 is running.

Hyper-threading does not provide parallel processing in the full sense. Each CPU core can still only run one process at a given moment, because thread 1 and thread 2 share the resources of the same core. Put simply, each CPU core has one unique resource that represents its ability to execute a process independently: if thread 1 acquires this resource, thread 2 cannot obtain it. (How hyper-threading achieves parallel operation is explained in depth later.)
However, thread 1 and thread 2 can work in parallel in many respects. For example, instructions can be fetched in parallel, decoded in parallel, and executed in parallel. So although a single core can only execute one process at a time, thread 1 and thread 2 can help each other speed up the execution of that process.
Moreover, if thread 1 holds the core's ability to execute a process at some moment and the process issues an IO request at that moment, this ability can be taken over by thread 2; that is, execution switches to thread 2. This is a switch between execution threads and is very lightweight. (Wiki: if resources for one process are not available, then another process can continue if its resources are available)

One phenomenon can occur with multithreading: if a 2-core, 4-thread CPU has two processes to schedule, only two threads will be running. If those two threads are on the same core, the other core sits completely idle and is wasted. A more desirable result is for one CPU thread on each core to be scheduled for each of the two processes.
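The placement idea in the previous paragraph can be sketched as a small selection function: given a map from logical CPU to physical core, it uses every core once before placing a second task on any core. The function name `spread_across_cores` and the `topology` dictionary are hypothetical and only model the 2-core, 4-thread case; this is not how any particular OS scheduler is implemented.

```python
def spread_across_cores(cpu_to_core, n_tasks):
    """Pick logical CPUs for n_tasks, using each physical core once before reusing one."""
    by_core = {}
    for cpu, core in sorted(cpu_to_core.items()):
        by_core.setdefault(core, []).append(cpu)

    # Round-robin over cores: first hardware thread of every core,
    # then the second hardware thread of every core, and so on.
    order = []
    depth = 0
    while len(order) < len(cpu_to_core):
        for core in sorted(by_core):
            if depth < len(by_core[core]):
                order.append(by_core[core][depth])
        depth += 1
    return order[:n_tasks]

# Hypothetical 2-core, 4-thread CPU: logical CPUs 0,1 on core 0 and 2,3 on core 1.
topology = {0: 0, 1: 0, 2: 1, 3: 1}
print(spread_across_cores(topology, 2))  # [0, 2]
```

On Linux, a chosen logical CPU could then be applied to a process with `os.sched_setaffinity(pid, {cpu})`, although in practice the kernel scheduler usually handles this placement on its own.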

2. The relationship and difference between CPU multithreading and program multithreading

Program multithreading is software multithreading; multiple software threads make it possible for multiple tasks to execute concurrently.

The thread count of a CPU refers to hyper-threads, which are hardware multithreading. Each hardware thread (hyper-thread) can be regarded as a logical CPU, but not as a CPU in the full sense.

The hyper-threads in a CPU core share some of the core's functional units. For example, while hyper-thread 1 performs an addition, hyper-thread 2 can perform a multiplication, but hyper-thread 2 cannot perform an addition at the same time. This example is only an illustration and is not realistic, but it shows that a hyper-threaded logical CPU cannot be regarded as a complete physical core. In addition, since each hyper-thread is a CPU resource that can run tasks independently, each hyper-thread has its own execution state, such as its own registers and its own PC.

On the other hand, because the hyper-threads all live inside the core, they share the core's L1 and L2 caches, so the L1 and L2 of each core need a dedicated cache controller and a dedicated caching strategy.
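The add/multiply example above can be turned into a toy simulation. It assumes a core with exactly one adder and one multiplier and two hardware threads that each issue at most one operation per cycle; this is a deliberate simplification for illustration, not a model of any real CPU. The name `cycles_needed` is made up for this sketch.

```python
from collections import deque

UNITS = ("add", "mul")  # assumed: one functional unit of each kind per core

def cycles_needed(stream1, stream2):
    """Cycles for two hardware threads to drain their operation streams on one core."""
    for op in list(stream1) + list(stream2):
        if op not in UNITS:
            raise ValueError(f"unsupported operation: {op}")
    queues = [deque(stream1), deque(stream2)]
    cycles = 0
    while any(queues):
        busy = set()
        # Alternate which thread picks first so neither thread starves.
        order = queues if cycles % 2 == 0 else queues[::-1]
        for queue in order:
            # A thread issues its next operation only if that unit is free this cycle.
            if queue and queue[0] not in busy:
                busy.add(queue.popleft())
        cycles += 1
    return cycles

print(cycles_needed(["add", "add"], ["mul", "mul"]))  # 2: different units, no conflict
print(cycles_needed(["add", "add"], ["add", "add"]))  # 4: both threads need the adder
```

When the two threads need different units they finish in the time of one thread; when they need the same unit, the second hardware thread adds nothing, which is why a logical CPU is not equivalent to a full physical core.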

Let's go back to the scheduling problem of multi-process and software multi-threading.

Whether it is multi-process or software multithreading, both mean multiple task branches that can execute concurrently. When the CPU provides multiple hardware threads, tasks can execute in parallel, so multiple processes and software threads can be scheduled onto hyper-threads for execution, because hyper-threads are CPU resources.
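A minimal sketch of this relationship in Python: the program defines software task branches, and the OS decides which logical CPUs run them. The names `task` and `run_tasks` are hypothetical. Note that in standard CPython the global interpreter lock keeps CPU-bound threads from truly running in parallel, so this only shows how the number of software tasks is independent of the number of hardware threads.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def task(n):
    """One software task branch: sum of squares below n."""
    return sum(i * i for i in range(n))

def run_tasks(sizes, workers=None):
    """Run one task per size on a pool sized to the logical CPU count by default."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The OS scheduler, not this program, chooses the hardware thread for each worker.
        return list(pool.map(task, sizes))

results = run_tasks([10, 100, 1000, 10000] * 2)
print(len(results), "software tasks, up to", os.cpu_count(), "logical CPUs")
```

There can be more software tasks than logical CPUs; the extra tasks simply wait for a worker, and each worker waits for the OS to give it a hardware thread.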

3. Multiple CPUs

For the architectural organization of multiple CPUs, there are:

  • AMP (asymmetric multiprocessing): asymmetric multiprocessor structure
  • SMP (symmetric multiprocessing): symmetric multiprocessor structure
  • UMA (uniform memory access): uniform memory access structure
  • NUMA (non-uniform memory access): non-uniform memory access structure
  • MPP (massively parallel processing): massively parallel processing structure

Of these, SMP and NUMA are the ones usually discussed.

4. SMP

The symmetric multiprocessing structure treats all CPUs as equal in role, and all CPUs share resources such as memory and the bus. In fact, the inside of a CPU is also an SMP structure: all cores share memory and bus resources.

In the SMP structure, since every CPU core operates on the shared storage (memory), the consistency of memory data must be guaranteed (that is, cache coherence). For example, when Core1 of CPU1 modifies data A and the caches use bus snooping, a broadcast must be sent on the bus to tell every core of every CPU to invalidate its cached copy of A. With a small number of CPUs this is not a huge problem, but as the number of CPUs increases, the bus traffic caused by cache coherence and shared objects explodes.

Therefore, the SMP structure does not scale well to more CPUs. For example, with 2-4 CPUs the SMP architecture can be considered, but with more than 4 CPUs SMP is not suitable.
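The growth in bus traffic can be illustrated with a toy write-invalidate model. It assumes every write broadcasts one invalidation to each other core and that memory is updated on every write (write-through); real coherence protocols are far more refined. The class `SnoopingBus` and the function `broadcast_traffic` are invented for this sketch.

```python
class SnoopingBus:
    """Toy model: private caches kept coherent by broadcast invalidation."""

    def __init__(self, n_cores):
        self.caches = [dict() for _ in range(n_cores)]
        self.memory = {}
        self.messages = 0  # invalidation messages seen on the bus

    def read(self, core, addr):
        cache = self.caches[core]
        if addr not in cache:  # miss: fetch from shared memory
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, core, addr, value):
        # Every other core snoops the broadcast and drops its copy.
        for other, cache in enumerate(self.caches):
            if other != core:
                self.messages += 1
                cache.pop(addr, None)
        self.caches[core][addr] = value
        self.memory[addr] = value  # write-through, for simplicity

def broadcast_traffic(n_cores):
    """Invalidation messages when every core writes the same address once."""
    bus = SnoopingBus(n_cores)
    for core in range(n_cores):
        bus.write(core, "A", core)
    return bus.messages

for n in (2, 4, 8, 16):
    print(n, "cores ->", broadcast_traffic(n), "invalidation messages")
```

In this model the message count is n * (n - 1), so doubling the number of cores roughly quadruples the coherence traffic on the shared bus.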

5. NUMA

The NUMA (non-uniform memory access) structure gives each CPU its own memory resources, and each CPU can also access the memory of other CPUs through the interconnect module between the CPUs.

Because every CPU has its own local memory, each can manage its own memory and guarantee its own cache coherence. However, because the memory of each CPU is separate, it is very slow for CPU1 to access CPU2's memory through the interconnect module (the access passes through an intermediate transmission medium and travels farther), so when using the NUMA structure, programs should avoid interaction between CPUs as much as possible.

In addition, the more CPUs there are, the farther a CPU may have to reach to access memory and the slower that access becomes, so the performance of the NUMA structure cannot increase linearly with the number of CPUs.
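The cost of remote access can be put into a simple weighted-average model. The latencies used below (100 ns local, 300 ns remote) are made-up numbers chosen only to show the trend, and `average_latency_ns` is a hypothetical helper, not a measurement of any real system.

```python
def average_latency_ns(local_ns, remote_ns, remote_fraction):
    """Average memory latency when a given fraction of accesses go to remote memory."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1 - remote_fraction) * local_ns + remote_fraction * remote_ns

# Assumed latencies for illustration only.
for fraction in (0.0, 0.25, 0.5, 1.0):
    print(fraction, average_latency_ns(100, 300, fraction))
```

Under these assumed numbers, a program whose accesses are 25% remote already sees 150 ns on average instead of 100 ns, which is why NUMA-aware programs try to keep each task's data in the memory local to the CPU that runs it.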

The figure below shows a NUMA structure made up of 4 CPUs, with 32 cores and 32 GB of memory in total; each CPU is allocated 8 GB of memory as its own local memory.

[Figure: NUMA structure composed of 4 CPUs, 32 cores and 32 GB of memory in total]
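For the 4-CPU, 32 GB layout just described, the sketch below maps a physical address to the CPU whose local memory holds it. It assumes each CPU's 8 GB is one contiguous range in CPU order, which the text does not state and which real systems may not follow; `numa_node_of` and `is_local` are hypothetical helpers.

```python
GIB = 1024 ** 3

def numa_node_of(address, node_size=8 * GIB, nodes=4):
    """Index of the CPU whose local memory contains this physical address."""
    if not 0 <= address < node_size * nodes:
        raise ValueError("address outside physical memory")
    # Assumption: node 0 owns the first 8 GiB, node 1 the next 8 GiB, and so on.
    return address // node_size

def is_local(cpu, address):
    """True if the access stays in the CPU's own memory, False if it crosses the interconnect."""
    return numa_node_of(address) == cpu

print(numa_node_of(9 * GIB))  # 1
print(is_local(1, 9 * GIB))   # True: CPU 1 reads its own memory
print(is_local(0, 9 * GIB))   # False: CPU 0 must go through the interconnect
```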

Original address: https://www.junmajinlong.com/os/multi_cpu/

Origin blog.csdn.net/qq_36393978/article/details/126744103