High-Performance Computer Architecture 10 | Predication, If-Conversion, Condition Move, and Full …

Series: High-Performance Computer Architecture

High-Performance Computer Architecture 10 | Predication, If-Conversion, Condition Move, and Full Predication

1. The Definition of Predication

We already know about the branch predictors, but we also know some of the branches are really difficult to predict even when we have a sophisticated branch predictor. So it is a good idea to avoid branches that are hard to predict by the compiler.

Predication is another way that we can try to deal with control hazards. Recall that we have used the branch predictions to guess the branch direction and the target, while predication is another method for dealing with the control dependencies. The problem with the branch predications is mainly that we have to pay a huge penalty if we mispredict the result for the modern processors.

Predication is about doing the work in both directions, which means that we have to do both the taken branch and not the taken branch. In this way, the waste is up to 50% of the work we have been doing because we simply throw away the work from the wrong path. Therefore, the predication actually gives a penalty of waste for all the branches.

2. Prediction Vs. Predication: Which one to choose?

Let’s see how the branch prediction and predication compare on different types of conditional branches that we may want to predict.

For a loop, we usually want to do branch prediction because the iterations are often quite predictable. The predication will waste too much for getting the true result.
For a function call or return, we should also use prediction because predication actually makes no sense. Because the calls or returns are always taken branches, so it makes no sense if we waste something on the not taken state.
For a large if-then-else, we are also going to choose the prediction. In most cases, the if-then-else shows a pattern that can have only a few mispredictions. So the penalty of waste is even larger than the penalty for misprediction when it is large.
For a small if-then-else, surprisingly, the predication will be better especially for a relatively low predictor accuracy. Let’s see an example to understand this. Suppose we have 5 instructions for taken and another 5 instructions for not taken branch. If we mispredict the branch, we will lose 50 instructions (HUGE penalty).

If predication: cost 10 instructions (5 are wasted)
If prediction: 
- if we have right prediction: cost 5 instructions with 0 penalty 
- if we have misprediction: cost 55 instructions with 50 penalty

So now this problem seems like gambling. If you choose to predicate, you will lose 10 anyway. If you choose to predict, you may lose 55 or nothing. What actually determines whether or not you will take this risk of losing 55? The answer is the accuracy of the predictor.

Let’s say if we have a perfect predictor that never mispredicts the result, then of course we will choose to predict. However, when we have a normal 2BC predictor with 88% accuracy, we will then expect to lose 12% * 50 = 6 as our penalty, while the predication must lose 5 , which is smaller than 6. Therefore, the predication now seems like a good idea.

3. If-Conversion Technique

The if-conversion is important for understanding how the predication works in the hardware. The if-conversion talks about how the compiler creates the code that will be executed along both paths. Let’s see an example. Suppose we have the following C code,

if (cond) {
    x = arr[i];
    y += 1;
} else {
    x = arr[j];
    y -= 1;
}

The if-conversion means that the compiler will compile the work on both paths, then,

x1 = arr[i];
x2 = arr[j];
y1 = y + 1;
y2 = y + 2;
x = cond?x1:x2;
y = cond?y1:y2;

But there is still a question on how do we compile the decision-making tool cond?A:B;. Someone may think that this is going to be like,

BEQ ..., Label
MOV x, x2
B Done
Label: 
    MOV x, x1
Done:

Well, this is WRONG. The reason is that we want to use the predication for not predicting anything. However, the conversion above still has one conditional branch that we must guess the direction and the target. Because we have two decision-making tools in our program so there will be 2 branches that we need to guess. Let’s say if we have a GShare predictor, now we are going to have 2 mispredictions. Meanwhile, we still do both paths so there is also a 50% waste. However, if you look back to the C code, you will find that there may be only 1 misprediction without wastes. Therefore, the cond?A:B; can not be converted in that way and we must not add branches to our conversion.

4. Conditional Move Technique

So the technique that we are going to use for a decision-making tool is the conditional move. For MIPS, the conditional move should be,

MOVZ Rd, Rs, Rt

which means,

if (Rt == 0) {
    Rd = Rs;
}

Or,

MOVN Rd, Rs, Rt

which means,

if (Rt != 0) {
    Rd = Rs;
}

For x86, we have a whole set of the conditional move (CMOV) instructions such as, CMOVZ (move when zero), CMOVNZ (move when not zero), CMOVGT (move when greater than), etc. The condition is determined by the flags or we called the conditional codes.

Let’s see the pseudocode for the decision-making tool x=cond?x1:x2; by MIPS,

R3 = cond;
MOV R1, x1
MOV R2, x2
MOVN x, R1, cond
MOVZ x, R2, cond

5. Performance of MOVZ and MOVN

Let’s say we have the following program with branches,

BEQZ R1, Else
ADDI R2, R2, 1
B End
Else:
    ADDI R3, R3, 1
End:

After if-conversion, this should be,

ADDI R4, R2, 1
ADDI R5, R3, 1
MOVN R2, R4, R1
MOVZ R3, R5, R1

Let’s see we have a predictor for the original program with 80% accuracy and there’s a 40-instruction penalty if misprediction. According to the code, if the branch is taken, we will have 2 instructions, however, if the branch is not taken, we will have 3 instructions. So on average, we will have 2.5 instructions. So the expected overall number of instructions for this program is,

2.5 + (1 - 80%) * 40 = 10.5

If we consider the program after the if-conversion, there will be 4 instructions without branches, so we don’t have to worry about the misprediction problem. Thus the overall number of instructions for this program is 4. And we can find out that the predications actually have better performance.

6. Requirements for Conditional Move Instruction

For utilizing the conditional move instruction,

we need the compiler’s support
we will remove the hard-to-predict branches
we need more registers than the original code because the results from both paths should be kept
we should execute more instructions because we take both paths and we have to add additional instructions to select the results

Note that we can use the technique to make all the instructions conditional in order to deal with the problems of more registers and additional instructions for result selection. This technique is called a full predication.

7. Condition Move Vs. Full Predication

Now let’s compare what we need for the condition move and what we need for the full predication.

For condition move,

We need a separate operation code (opcode) to tell us this is a conditional instruction. we usually have a separate opcode for each particular condition, so we need a number of opcodes.

For full predication,

We add conditional bits to every instruction so every instruction word contains some bits that tell us what is the condition.

8. Full Predication: An Example

Let’s say we have the following program with branches,

BEQZ R1, Else
ADDI R2, R2, 1
B End
Else:
    ADDI R3, R3, 1
End:

If we convert this program to full predication (i.e. the Itanium), we get

MP.EQZ P1, P2, R1    // if R1 == 0, P1 = True, P2 = False
(P2) ADDI R2, R2, 1  // if P2 == True, do the ADDI
(P1) ADDI R3, R3, 1  // if P1 == True, do the ADDI

You can easily find out that the code above is much simplified than the if-conversion code. We can also use the same registers and there is now no additional work for moving the results into the destination registers.