1. Pipeline Hazards
In a pipelined processor, there are situations where the instruction currently in the pipeline prevents the next instruction from executing in its expected clock cycle. Such a situation is called a pipeline hazard. When a pipeline hazard occurs, the ideal speedup brought by pipelining is reduced. There are three types of hazards, as follows.
[Note]: Pipeline speedup = average instruction execution time without pipelining / average instruction execution time with pipelining
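The note's formula can be checked with a little arithmetic. This is a minimal sketch; the machine (5 stages of 1 ns each, one instruction at a time when unpipelined) is a hypothetical example chosen for illustration, not a measurement.

```python
def avg_time(cpi: float, cycle_ns: float) -> float:
    """Average time per instruction = CPI x clock cycle time."""
    return cpi * cycle_ns


def pipeline_speedup(t_unpipelined: float, t_pipelined: float) -> float:
    """Speedup = avg. time without pipelining / avg. time with pipelining."""
    return t_unpipelined / t_pipelined


# Hypothetical machine: 5 stages of 1 ns each (assumed numbers).
t_plain = avg_time(cpi=5, cycle_ns=1.0)     # unpipelined: 5 cycles per instruction
t_ideal = avg_time(cpi=1, cycle_ns=1.0)     # ideal pipeline: CPI = 1
t_bubbly = avg_time(cpi=2, cycle_ns=1.0)    # one bubble per instruction: CPI = 2

print(pipeline_speedup(t_plain, t_ideal))   # 5.0
print(pipeline_speedup(t_plain, t_bubbly))  # 2.5
```

Hazards show up in this formula as a larger pipelined CPI, which is exactly how they eat into the ideal speedup.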
(1) Structural hazards: when the processor works in a pipelined manner, instructions overlap in execution. If the hardware cannot support every combination of overlapping instructions at the same time, resource conflicts occur, resulting in structural hazards. That is, the hardware the current instruction needs is still busy serving a previous instruction.
(2) Data hazards: because instructions overlap in execution, a dependency between an earlier and a later instruction may lead to a data hazard. (For example, a data hazard occurs when the instruction currently entering the pipeline needs the result of the previous instruction for its own calculation.)
(3) Control hazards: pipelining branch instructions and other instructions that change the program counter may lead to control hazards, that is, which instruction executes next depends on the result of a previous instruction.
This article first takes a closer look at control hazards; the other hazard types will be covered in later articles. Before that, let's review some things you are already familiar with.
2. Branch and jump instructions (Branches, Jumps)
You may have noticed that branch and jump instructions are a bit different from ordinary arithmetic and logic instructions: executing either type affects the value of the program counter (PC, Program Counter).
A branch instruction sets the PC to a target computed as the current PC value plus an offset: a relative addressing instruction.
A jump instruction (unconditional jump) changes the PC register without regard to its current value: an "absolute addressing" instruction.
Note the quotation marks here. A jump instruction is meant to go to an absolute address, but when instructions have a fixed length equal to the address length, no instruction can hold both the opcode and a full jump address. Therefore, in practice a jump instruction is also realized relative to the current PC value, for example by adding an offset to it or by reusing some of its bits.
- Unconditional jumps (Jumps): the instruction needs the opcode (Opcode), the offset (Offset), and the program counter (PC).
For example, the MIPS J instruction does not directly specify a 32-bit target address. All MIPS instructions are 32 bits long, so an instruction cannot devote all of its bits to the address field; it would have no room left for an opcode. Instead, J holds a 26-bit field that is shifted left by 2 bits (instructions are word-aligned) to give the low 28 bits of the target. Since the highest 4 bits of the destination address cannot be encoded in the instruction, they are taken from the highest 4 bits of the current PC (strictly, of PC + 4). For typical programs, the 256 MB region addressable with 28 bits is large enough.
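The target construction for MIPS J can be sketched in a few lines. J's opcode (binary 000010) and the 26-bit field follow the MIPS encoding; the example addresses are hypothetical, and delay-slot behavior is ignored apart from using PC + 4 as the source of the upper bits.

```python
J_OPCODE = 0b000010  # MIPS opcode for J (bits 31..26)


def encode_j(target: int) -> int:
    """Encode `j target`: keep bits 27..2 of the word-aligned target."""
    assert target % 4 == 0, "MIPS instructions are word-aligned"
    return (J_OPCODE << 26) | ((target >> 2) & 0x03FF_FFFF)


def j_target(pc: int, instr: int) -> int:
    """Rebuild the 32-bit jump target from the J instruction and the PC."""
    index = instr & 0x03FF_FFFF         # 26-bit field stored in the instruction
    upper4 = (pc + 4) & 0xF000_0000     # top 4 bits come from PC + 4
    return upper4 | (index << 2)        # 4 + 26 + 2 = 32 bits


instr = encode_j(0x0040_0030)
print(hex(instr))                         # 0x810000c
print(hex(j_target(0x0040_0000, instr)))  # 0x400030
print(hex(j_target(0x1000_0000, instr)))  # 0x10400030: same field, other 256 MB region
print((1 << 28) // (1024 * 1024), "MB")   # 256 MB
```

The same 26-bit field lands in a different 256 MB region depending on the PC, which is why J cannot leave the region it executes in.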
Another example is the JAL instruction in the RISC-V architecture (RV32). It uses the J-type immediate encoding format: an offset that is an integer multiple of 2 bytes is added to the PC value to form the jump target address, so the instruction can jump within a range of ±1 MiB around the PC. JAL stores the address of the following instruction (PC + 4) in rd. The standard software calling convention uses x1 as the return address register. When rd = x0, JAL is a simple jump (the J pseudo-instruction in assembler).
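As a sketch of the J-type immediate described above, the helpers below encode and decode JAL and apply it to a register file. The bit layout (imm[20], imm[10:1], imm[11], imm[19:12]) and the JAL opcode follow the RISC-V base ISA; the helper names are hypothetical, and only 32-bit wrap-around is modeled.

```python
JAL_OPCODE = 0b1101111  # RISC-V JAL major opcode


def encode_jal(rd: int, offset: int) -> int:
    """Encode `jal rd, offset` using the J-type immediate layout."""
    assert offset % 2 == 0 and -(1 << 20) <= offset < (1 << 20), "+/-1 MiB, even"
    imm = offset & 0x1F_FFFF                      # 21-bit two's complement
    return (((imm >> 20) & 0x1) << 31             # imm[20]
            | ((imm >> 1) & 0x3FF) << 21          # imm[10:1]
            | ((imm >> 11) & 0x1) << 20           # imm[11]
            | ((imm >> 12) & 0xFF) << 12          # imm[19:12]
            | (rd & 0x1F) << 7                    # destination register
            | JAL_OPCODE)


def decode_jal_offset(instr: int) -> int:
    """Reassemble the scattered immediate bits and sign-extend them."""
    imm = (((instr >> 31) & 0x1) << 20
           | ((instr >> 21) & 0x3FF) << 1
           | ((instr >> 20) & 0x1) << 11
           | ((instr >> 12) & 0xFF) << 12)
    return imm - (1 << 21) if imm & (1 << 20) else imm


def execute_jal(pc: int, instr: int, regs: list) -> int:
    """Write PC + 4 to rd (unless rd is x0) and return the new PC."""
    rd = (instr >> 7) & 0x1F
    if rd != 0:                                   # x0 is hard-wired to zero
        regs[rd] = (pc + 4) & 0xFFFF_FFFF
    return (pc + decode_jal_offset(instr)) & 0xFFFF_FFFF


regs = [0] * 32
call = encode_jal(1, 8)                                    # jal ra, 8
print(hex(call))                                           # 0x8000ef
print(hex(execute_jal(0x1000, call, regs)), hex(regs[1]))  # 0x1008 0x1004
loop = encode_jal(0, -4)                                   # j -4 (rd = x0)
print(hex(loop), decode_jal_offset(loop))                  # 0xffdff06f -4
```

The immediate bits are scattered so that they share positions with other RISC-V formats; the hardware only has to rewire them, not shift them.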
OK: when the processor fetches (IF) a JUMP instruction, the decoder in the ID stage determines from the opcode that it is a jump. It then needs the offset (Offset) given in the instruction and the current program counter (PC) value. The offset is added to the PC, either in the ALU or in a dedicated adder, and the result finally becomes the new PC value.
- Register jumps (Jump Register): the instruction needs the opcode (Opcode) and a register value.
When the processor fetches (IF) a register jump instruction, the decoder in the ID stage determines from the opcode that it is a register jump. At this point it does not yet know the jump address, only which register holds it, so it must then read the target address from that register. There is no offset here: the PC jumps directly to the address held in the register.
- Conditional branches (Conditional Branches): the instruction needs the opcode (Opcode), the program counter (PC), a register value (to evaluate the condition), and the offset (Offset).
This is a bit more complicated. When the processor fetches (IF) a conditional branch instruction, the decoder in the ID stage determines from the opcode that it is a conditional branch. It then needs the PC value and the relevant register: the condition is evaluated from the register value (for example, by comparing it with 0 to see whether it is greater or less), and if the branch is taken, the offset given in the instruction is added to the PC value to get the target address, which completes the branch's update of the PC.
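The three cases above differ only in which inputs decide the next PC. The sketch below models them with a made-up mini instruction format: `Instr` and the op names `jump`, `jr`, and `beqz` are hypothetical, not any real ISA's encoding. It assumes 4-byte instructions and, following the description above, adds the offset to the branch's own address (MIPS, for instance, measures it from PC + 4 instead).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Instr:
    op: str                    # hypothetical ops: "jump", "jr", "beqz", or "add"
    offset: int = 0            # PC-relative offset for "jump" and "beqz"
    reg: Optional[str] = None  # target register ("jr") or condition register ("beqz")


def next_pc(pc: int, ins: Instr, regs: Dict[str, int]) -> int:
    if ins.op == "jump":       # opcode + offset + PC
        return pc + ins.offset
    if ins.op == "jr":         # opcode + register value, no offset
        return regs[ins.reg]
    if ins.op == "beqz":       # opcode + register (condition) + PC + offset
        taken = regs[ins.reg] == 0
        return pc + ins.offset if taken else pc + 4
    return pc + 4              # ordinary instruction: fall through


regs = {"r1": 0, "r2": 7, "r31": 0x400}
print(next_pc(100, Instr("jump", offset=204), regs))           # 304
print(next_pc(100, Instr("jr", reg="r31"), regs))              # 1024
print(next_pc(100, Instr("beqz", offset=204, reg="r1"), regs)) # 304 (taken)
print(next_pc(100, Instr("beqz", offset=204, reg="r2"), regs)) # 104 (not taken)
```

Only the conditional branch needs a register comparison before the next PC is known, which is why it causes the longest delay in the pipeline.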
3. Control Hazards
3.1. Pipeline control hazards caused by jump instructions
First, let's look at the most basic control hazard: how to make sure the next instruction executed is the correct one. Consider the following pipeline diagram.
The pipeline above contains two instructions, I1 and I2. Instruction 1 takes the value from register r0, adds the immediate value 10, and saves the result to register r1; instruction 2 takes the value from register r2, adds the immediate value 17, and saves the result to register r3. These are two very simple instructions with no data dependency between them, so no data hazard occurs here. The focus is on the control hazard.
Instruction 1 executes through the normal five-stage pipeline sequence: fetch (IF), decode (ID), execute (EX), memory access (MEM), and write back (WB). You may notice that the second instruction behaves a bit differently. When it enters the pipeline, the first instruction has only just completed its fetch stage, which raises a question: "Is this second instruction really the one we need to execute?" The problem arises because the first instruction has not been decoded yet, so we do not know whether it is a branch or jump. Therefore, instruction 2, which flows into the pipeline in sequence, is not necessarily the instruction the program needs to execute. Only once the first instruction has been decoded in the ID stage do we learn: "Oh, the previous instruction was not a branch or jump" (or that it was one).
So what if instruction 1 is a branch or jump? Then the decode stage determines that it is one, and in clock cycle 1, during its ID stage, instruction 1 changes the program counter (PC) and thus the address of the instruction to be fetched. This is what causes the control hazard. To avoid it, we insert a bubble (Bubble) while instruction 1 is in the ID stage, delaying the fetch stage of the next instruction by one cycle. Continuing this way, the pipeline looks as follows.
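[Note]: To avoid this kind of hazard, a no-op (nop) is often inserted into the pipeline. Such a no-op is usually called a pipeline bubble, or simply a bubble (Pipeline Bubble/Bubble).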
Every instruction that enters the pipeline has to account for control hazards, so to avoid them you need to insert a bubble during every ID stage. Fetching each instruction then takes two clock cycles, and you will realize that this makes for a very inefficient pipeline. Let's analyze this pipeline in detail by redrawing the pipeline diagram with the axes changed.
Drawn this way, it is clear that the pipeline executes in the order I1, nop, I2, nop, I3, nop, I4, ... For this pipeline CPI = 2 (1 + 1), whereas ideally CPI = 1, so a machine built to this design runs at only half the ideal performance.
[Note]:
Non-pipelined CPI = execution clock cycles / number of instructions executed;
Pipelined CPI = ideal CPI + pipeline stall clock cycles per instruction (ideal CPI = 1).
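The redrawn diagram and the CPI figure can be reproduced with a toy model of a 5-stage pipeline. This is a sketch under simple assumptions: one slot enters IF per cycle, nothing else stalls, and every instruction is followed by exactly one bubble; the function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def issue_order(instrs, bubble_after_each=True):
    """Slots entering IF, one per cycle; optionally a bubble after every instruction."""
    order = []
    for ins in instrs:
        order.append(ins)
        if bubble_after_each:
            order.append("nop")        # bubble inserted while `ins` is in ID
    return order


def total_cycles(order):
    """Cycle in which the last real instruction leaves WB (slot i enters IF in cycle i + 1)."""
    last_real = max(i for i, name in enumerate(order) if name != "nop")
    return last_real + len(STAGES)


def cpi(instrs, bubble_after_each=True):
    return total_cycles(issue_order(instrs, bubble_after_each)) / len(instrs)


def timeline(order):
    """Text pipeline diagram: one row per slot, one column per cycle."""
    width = len(order) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(order):
        cells = ["  .  "] * width
        for s, stage in enumerate(STAGES):
            cells[i + s] = f"{stage:^5}"
        rows.append(f"{name:>4} |" + "".join(cells))
    return "\n".join(rows)


print(timeline(issue_order(["I1", "I2", "I3"])))
program = [f"I{k}" for k in range(1, 1001)]
print(cpi(program, bubble_after_each=False))  # 1.004, approaches the ideal 1
print(cpi(program))                           # 2.003, approaches 2
```

The small excess over 1 and 2 is the pipeline fill time; for long programs it vanishes and the bubbles alone account for the doubled CPI.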
3.2. Solving the control hazard caused by jump instructions (basic method)
Now that we know jump instructions bring control hazards into the pipeline, how do we solve the problem? The method is simple: guess (speculate) that the instruction just fetched is not a jump, and directly add 4 to the PC (assuming instructions are 4 bytes long).
In the figure, the part circled in purple shows how the pipeline currently handles this: it guesses that the instruction just fetched is not a jump and uses an adder to add 4 to the PC, so that the PC points to the next sequential instruction. In normal order, the instructions at addresses 96, 100, and 104 would be executed here.
But in fact, looking at this code, the instruction at address 100 is a jump. After it is fetched, the decode (ID) stage discovers that it is a jump, but by then the instruction at address 104 has already been fetched (because we guessed it was not a jump, fetching continued sequentially). The instruction at address 100 tells us to fetch and execute the instruction at address 304 instead. So at this point we need to do two things: first, stop the instruction at address 104 from continuing down the pipeline (that is, kill it); second, change the PC value to the jump target address.
To solve the first problem, we add a selector, IRSrc, to the pipeline. When the previous instruction is decoded as a jump in the ID stage, the selector switches to an empty instruction, nop. To solve the second, an additional adder adds the offset field of the jump instruction to the PC by the end of the cycle, producing the new jump address as the next PC.
The figure above describes, as a timeline table, how the program executes on this pipeline circuit. The second instruction, I2, is decoded as a jump in the ID stage. Although the fetch of I3 has already completed, the selector switches to nop, so I3 flows into the pipeline without performing any actual operation. At the same time, the new PC value is computed by the end of the clock cycle of I2's ID stage. When clock cycle t3 arrives, the IF stage fetches the instruction at address 304, completing the jump.
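The timeline just described can be replayed with a small front-end simulation. This is a sketch, not the circuit in the figure: `Ins`, `run_front_end`, and the op names are hypothetical, only IF and ID are modeled, and the jump target is stored already resolved (standing in for the extra adder's output) rather than computed from an offset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ins:
    name: str
    op: str = "alu"      # "alu", "jump", or "nop" (hypothetical op names)
    target: int = 0      # resolved jump target (stands in for the extra adder's output)


NOP = Ins("nop", "nop")

# The example code from the figure: the jump at 100 goes to 304.
PROGRAM = {
    96: Ins("I1"),
    100: Ins("I2", "jump", target=304),
    104: Ins("I3"),
    304: Ins("I4"),
}


def run_front_end(program, pc, cycles):
    """Simulate IF and ID with predict-not-taken and an IRSrc selector."""
    if_id = NOP                            # instruction latched for the ID stage
    trace = []
    for t in range(cycles):
        fetched = program.get(pc, NOP)     # IF: guess "not a jump", read at PC
        next_pc = pc + 4                   # PC + 4 adder
        irsrc_nop = False
        if if_id.op == "jump":             # ID decodes a jump
            next_pc = if_id.target         # PC <- jump target at end of cycle
            irsrc_nop = True               # IRSrc picks nop over the fetched instr
        trace.append((t, pc, fetched.name, if_id.name))
        if_id = NOP if irsrc_nop else fetched
        pc = next_pc
    return trace


for t, pc, in_if, in_id in run_front_end(PROGRAM, 96, 5):
    print(f"t{t}: IF fetches {pc:3d} ({in_if}), ID holds {in_id}")
```

In the trace, I3 is fetched at t2 but never reaches ID; a nop takes its place, and I4 at address 304 is fetched at t3, so a jump resolved in ID costs one cycle.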
3.3. Pipeline control hazards caused by conditional branches
I1 096 ADD
I2 100 BEQZ r1 +200
I3 104 ADD
108 ...
I4 304 ADD
Here is a piece of code: the instruction at address 100 checks whether register r1 equals 0 and, if so, jumps by the offset 200 (the listing places the target, I4, at 304, which matches the MIPS convention of measuring the offset from PC + 4). In effect, the program flow forks at this point, so the questions are:
- 1. How do we know whether to take this branch?
- 2. What do we do once the branch is taken?
3.4. Solving the pipeline control hazard caused by conditional branch instructions (basic method)
Let's start with the first question: how do we know whether to take the branch? Can it be decided directly in the decode stage, as for a jump instruction (that is, from the type of the decoded instruction)? For conditional branches this is not reasonable, because at that point we only know that the instruction is a conditional branch, not whether its condition holds. A hardware unit capable of comparison is needed. Subtraction and comparison are exactly what the ALU is good at, so the comparison is done in the ALU and a zero signal line (zero wire) is brought out of it, as shown in the figure below.
With this approach, whether the branch is taken is decided from the zero wire. The PC prediction scheme is still "guess not taken", so while I2 is in the ID stage, the IF stage fetches PC + 4 = 104; in the clock cycle in which I2's branch condition is computed, the IF stage fetches again, PC + 4 = 108.
You will notice that by the time we can determine whether the branch is taken, two instructions have already been fetched. Until the zero wire signal arrives, we cannot tell whether these two instructions should be killed (dropped, that is, replaced with nop in the pipeline).
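The two wrong-path fetches can be seen in a three-stage (IF, ID, EX) sketch of the listing from section 3.3. Assumptions, all hypothetical: `Ins`, `run`, and the op names are made up, "X108" is a placeholder name for the unnamed "..." at address 108, the branch target 304 is taken straight from the listing, and a taken branch simply flushes both younger instructions. The stall signal and its priority are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ins:
    name: str
    op: str = "alu"             # "alu", "beqz", or "nop" (hypothetical op names)
    reg: Optional[str] = None   # condition register for "beqz"
    target: int = 0             # branch target, taken from the listing


NOP = Ins("nop", "nop")

# The listing from section 3.3; "X108" is a placeholder for the "..." at 108.
PROGRAM = {
    96: Ins("I1"),
    100: Ins("I2", "beqz", reg="r1", target=304),
    104: Ins("I3"),
    108: Ins("X108"),
    304: Ins("I4"),
}


def run(program, pc, regs, cycles):
    """IF/ID/EX with predict-not-taken; the branch resolves in EX via the zero wire."""
    if_id, id_ex = NOP, NOP
    trace = []
    for t in range(cycles):
        fetched = program.get(pc, NOP)          # IF: keep fetching PC + 4
        next_pc, flush = pc + 4, False
        if id_ex.op == "beqz":
            zero = regs[id_ex.reg] == 0         # ALU compares with 0 -> zero wire
            if zero:                            # taken: redirect PC, kill both
                next_pc, flush = id_ex.target, True  # wrong-path instructions
        trace.append((t, fetched.name, if_id.name, id_ex.name))
        id_ex = NOP if flush else if_id
        if_id = NOP if flush else fetched
        pc = next_pc
    return trace


for r1 in (0, 5):
    print(f"r1 = {r1}")
    for t, in_if, in_id, in_ex in run(PROGRAM, 96, {"r1": r1}, 6):
        print(f"  t{t}: IF {in_if:5} ID {in_id:5} EX {in_ex}")
```

When r1 = 0, I3 (in ID) and X108 (in IF) are both squashed in the cycle I2 is in EX, so a branch resolved in EX costs two cycles; when r1 is nonzero, the guess was right and nothing is lost.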
This raises another problem. We can use the stall signal to stop the pipeline registers from changing, and use the selector to redirect the instructions flowing into the pipeline; together, these two mechanisms cancel the work already in the pipeline. So how should the priority of these two actions be chosen? Is it arbitrary, or must there be a fixed order?
OK, now assume the stall signal has higher priority: the red stall signal line in the figure above will prevent the registers from changing,
Note: this article is not finished and will be improved in the near future...