High-Performance Computer Architecture 17 | Experiment for Branch Prediction

Series: High-Performance Computer Architecture

NOTE: The present article does NOT include anything related to the final submission (no answers, no specific values, no results) for the course CS6290 HPCA because of the honor code. Most of the content in this article repeats the basic instructions and directions of Project 1. More Linux commands are provided to complement the project’s guidelines.

Please feel free to contact me if this article violates the rules of Georgia Tech, and I will delete it immediately.

1. Review

(1) Recall Simulation

It has been a long time since our last experiment, so you may have forgotten some of the essentials carried over from it. Let’s start with a quick refresher.

sesc.opt is the simulator executable; it runs a program and generates a simulation report. It is located at,

~/sesc/sesc.opt

cmp4-noc.conf is a plain-text file that holds the simulation configuration. It is located at,

~/sesc/confs/cmp4-noc.conf

report.pl is a script that summarizes a simulation report in readable form. It is located at,

~/sesc/scripts/report.pl

In order to simulate a program (e.g. xxx.mipseb), we can use,

$ ~/sesc/sesc.opt -fxxx0.rpt -c ~/sesc/confs/cmp4-noc.conf -oxxx.out -exxx.err xxx.mipseb

where,

The -f option specifies the suffix of the report file name
The -c option specifies the configuration file
The -o option redirects the simulated program’s standard output to a file
The -e option redirects the simulated program’s standard error to a file

When we want to read the simulation report, we can use,

$ ~/sesc/scripts/report.pl sesc_xxx.mipseb.xxx0.rpt
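The two steps above can be wrapped in a small shell function so that a simulation and its report come out of one command. This is a minimal sketch: run_sesc is a hypothetical helper name, the SESC and REPORT variables default to the paths listed above (override them if your installation differs), and deriving the .out/.err names from the binary name is just a convenience choice. It relies on the report being named sesc_<binary>.<suffix>, which matches the report names that appear later in this article.

```shell
#!/bin/sh
# Hypothetical helper (not part of SESC): run one simulation, then summarize it.
# Override SESC / REPORT if your installation lives somewhere else.
SESC="${SESC:-$HOME/sesc/sesc.opt}"
REPORT="${REPORT:-$HOME/sesc/scripts/report.pl}"

# Usage: run_sesc SUFFIX CONF BINARY [PROGRAM_ARGS...]
run_sesc() {
    suffix=$1
    conf=$2
    binary=$3
    shift 3
    name=${binary%.mipseb}   # e.g. raytrace.mipseb -> raytrace

    # -f: report suffix, -c: configuration, -o/-e: program stdout/stderr files
    "$SESC" -f"$suffix" -c "$conf" -o"$name.out" -e"$name.err" "$binary" "$@" || return

    # The report is written to sesc_<binary>.<suffix> in the current directory
    "$REPORT" "sesc_$binary.$suffix"
}

# Example:
#   run_sesc HyA ~/sesc/confs/cmp4-noc-HyA.conf raytrace.mipseb -p1 -m128 -a2 Input/reduced.env
```

If the simulation itself fails, the function stops before calling report.pl, so a stale report from an earlier run is not shown by mistake.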

(2) Recall Compilation

Suppose we are given a C source file foo.c. How can we compile it into a MIPS executable? We need the following tools,

mips-unknown-linux-gnu-gcc compiles foo.c into a MIPS executable. It is located at,

/mipsroot/cross-tools/bin/mips-unknown-linux-gnu-gcc

It can be invoked as,

$ /mipsroot/cross-tools/bin/mips-unknown-linux-gnu-gcc -O0 -g -static -mabi=32 -fno-delayed-branch -fno-optimize-sibling-calls -msplit-addresses -march=mips4 -o foo.mipseb foo.c

mips-unknown-linux-gnu-objdump disassembles the MIPS executable built from foo.c, so that we can read its assembly code. It is located at,

/mipsroot/cross-tools/bin/mips-unknown-linux-gnu-objdump

It can be invoked as,

$ /mipsroot/cross-tools/bin/mips-unknown-linux-gnu-objdump -d foo.mipseb
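To avoid retyping the long compiler command, both steps can be put into one shell function. This is a minimal sketch: mips_build, the CROSS prefix variable, and the foo.dis output file are hypothetical names chosen here; the compiler flags are exactly the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: compile foo.c -> foo.mipseb with the flags above,
# then save the disassembly to foo.dis so it can be searched later.
CROSS="${CROSS:-/mipsroot/cross-tools/bin/mips-unknown-linux-gnu-}"

# Usage: mips_build foo.c
mips_build() {
    src=$1
    base=${src%.c}

    "${CROSS}gcc" -O0 -g -static -mabi=32 -fno-delayed-branch \
        -fno-optimize-sibling-calls -msplit-addresses -march=mips4 \
        -o "$base.mipseb" "$src" || return

    # objdump -d prints the disassembly; keep it in a file instead of the terminal
    "${CROSS}objdump" -d "$base.mipseb" > "$base.dis"
}
```

Keeping the disassembly in a file makes it easy to search it later, for example with grep for a function name.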

2. Configuration File

Now, let’s look into the configuration file. We should use the following command to view this file,

$ cd ~/sesc/confs
$ cat cmp4-noc.conf | more

The output would be,

# By Ching-Kai Liang

procsPerNode  = 4
cacheLineSize = 64
NOCdim = 2 # Assume a $(NOCdim) x $(NOCdim) NOC architecture

issue         = 2 
cpucore[0:$(procsPerNode)-1] = 'issueX'
...

From this content, we can see that procsPerNode is set to 4, and that all four processors (cores), numbered 0 through 3, are configured by the issueX section through,

cpucore[0:$(procsPerNode)-1] = 'issueX'

The parameter issue is set to 2, which means that the processor can fetch or issue up to 2 instructions per cycle,

issue         = 2

In addition, each core is described by the [issueX] section. Press the Enter key to scroll down until we reach the [issueX] section, which defines many parameters of the core. From these parameters, we can find out that,

The clock frequency is 1GHz because,

frequency       = 1e9

The processor has an out-of-order execution feature because,

inorder         = false
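Instead of paging through the whole file with more, we can print a single section. A minimal sketch with awk, assuming that every section starts with a header line of the form [name] at the beginning of a line; show_section is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print one [section] of a SESC .conf file, from its
# header line up to (but not including) the next header line.
# Usage: show_section ~/sesc/confs/cmp4-noc.conf issueX
show_section() {
    awk -v want="[$2]" '
        /^\[/ { inside = ($1 == want) }   # a header line starts a new section
        inside                            # print lines while inside the wanted one
    ' "$1"
}
```

For example, show_section ~/sesc/confs/cmp4-noc.conf BPredIssueX would print only the branch predictor parameters.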

Moreover, there are also many other sections that define various things, such as,

[BPredIssueX] describes the branch predictor for the current processor
[IMemory] describes IL1, the structure that instructions are fetched from (as specified by the instrSource parameter)
[DMemory] describes DL1, the structure that data is read from and written to

In these experiments, we will be modifying the BPredIssueX section, so let’s take a closer look at it. The type parameter of this branch predictor is hybrid because,

type          = "hybrid"

However, this doesn’t tell us much because we still don’t know what kind of predictor we have specified for this processor. Recall the different predictors we have talked about in the previous sections,

Not-Taken Predictor: the simplest predictor, which always predicts not taken
BTB: Branch Target Buffer, which predicts where a branch goes (its target address)
1BP (1BC) BHT: a Branch History Table of 1-bit counters, which predicts whether a branch is taken from its last outcome; it does not handle repeating patterns such as TNTN
2BP (2BC) BHT: a Branch History Table of 2-bit saturating counters, which predicts whether a branch is taken with some hysteresis; it still does not handle repeating patterns
1H 2BP BHT: 1-bit history with a 2-bit counter per history value, which works for repeating patterns of length 2
2H 2BP BHT: 2-bit history with a 2-bit counter per history value, which works for repeating patterns of length up to 3
NH 2BP BHT: N-bit history with a 2-bit counter per history value, which works for repeating patterns of length up to N+1
RAS: Return Address Stack, which predicts function return addresses
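For reference, the 2-bit counter behind the 2BP tables above is usually described as a saturating counter c ∈ {0, 1, 2, 3}. This is the textbook formulation; SESC’s implementation details may differ. The counter predicts taken in its upper half and moves one step toward the actual outcome after each branch, without wrapping around:

$$
\text{prediction} =
\begin{cases}
\text{taken} & \text{if } c \ge 2 \\
\text{not taken} & \text{if } c \le 1
\end{cases}
\qquad
c \leftarrow
\begin{cases}
\min(c + 1,\ 3) & \text{if the branch was taken} \\
\max(c - 1,\ 0) & \text{if the branch was not taken}
\end{cases}
$$

Because a single surprise only moves a strongly biased counter (c = 0 or c = 3) into its weak state, one odd outcome does not flip the next prediction, which is the hysteresis that distinguishes 2BC from 1BC.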

Note that the history bits and the counters can be organized in different ways. If each branch keeps its own (private) history in the BHT, and that history is combined with the branch address to select a counter in a shared pattern history table (PHT), the predictor is called a PShare. If all branches instead share a single global history register, it is called a GShare. In order to choose between different predictors, we have to use,

Tournament Predictor: chooses between two good predictors (e.g. PShare and GShare)
Hierarchical Predictor: chooses between a good predictor and an okay predictor (e.g. PShare and 2BC)
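One common way to form the counter index for these two designs is to XOR low-order bits of the branch address with history bits. This is a sketch of the usual formulation, not necessarily SESC’s exact bit selection: GHR denotes the single global history register, BHT[PC] the private history of the branch at address PC, and 2^k the number of 2-bit counters in the shared PHT:

$$
i_{\text{GShare}} = (\mathrm{PC} \oplus \mathrm{GHR}) \bmod 2^{k},
\qquad
i_{\text{PShare}} = (\mathrm{PC} \oplus \mathrm{BHT}[\mathrm{PC}]) \bmod 2^{k}
$$

A tournament predictor then keeps an additional table of 2-bit counters (the meta-predictor) that tracks which of the two components has been more accurate and uses that component’s prediction.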

In our experiment, the hybrid type actually means a tournament predictor. For more details, we can check the source code, which is located in ~/sesc/src/libcore/. We can view these source files by,

$ less ~/sesc/src/libcore/BPred.h
$ less ~/sesc/src/libcore/BPred.cpp

From the source code, we can see that lines 1191 to 1192 map the Hybrid type to the class BPHybrid, and that the BPHybrid::BPHybrid constructor defines the globalTable, the localTable, and the metaTable.

3. Experiment 1: Comparing Different Predictors

Now we will compare some branch predictors. The LU benchmark we used in the last experiment does not really stress the branch predictor, so we will use the raytrace benchmark instead. To use this benchmark, we first have to compile it by,

$ cd ~/sesc/apps/Splash2/raytrace
$ make

Now, let’s do some simulations. In this case, we are going to simulate predictors of type Hybrid (as discussed above), Oracle (a perfect direction predictor combined with a BTB for target addresses), and NotTaken (a simple not-taken predictor). Before each simulation, the type parameter in the configuration file has to be changed accordingly.

To do so, we can make three copies of cmp4-noc.conf named cmp4-noc-HyA.conf, cmp4-noc-OrA.conf, and cmp4-noc-NtA.conf, each with its type parameter set to the matching predictor. In our case, we create these files in the shared folder and then copy them to the ~/sesc/confs directory,

$ cd ~/sesc/confs
$ cp /media/sf_CS6290/PRJ\ 1/Confs/cmp4-noc-* ./
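If you prefer to create the three copies directly on the virtual machine instead of editing them in the shared folder, sed can rewrite the type line of the [BPredIssueX] section while leaving every other section untouched. This is a sketch under assumptions: make_bpred_variant is a hypothetical helper, and the strings oracle and NotTaken are assumed to be the type names SESC accepts for the Oracle and NotTaken predictors (check BPred.cpp if a run rejects them).

```shell
#!/bin/sh
# Hypothetical helper: copy SRC to DST, changing only the predictor type
# inside the [BPredIssueX] section (other sections are left untouched).
# Usage: make_bpred_variant SRC DST TYPE
make_bpred_variant() {
    # The address range runs from the [BPredIssueX] header to the next header
    sed '/^\[BPredIssueX\]/,/^\[/ s/^\([[:space:]]*type[[:space:]]*=[[:space:]]*\).*/\1"'"$3"'"/' \
        "$1" > "$2"
}

# Example (run inside ~/sesc/confs; type strings are assumptions, see above):
#   make_bpred_variant cmp4-noc.conf cmp4-noc-HyA.conf hybrid
#   make_bpred_variant cmp4-noc.conf cmp4-noc-OrA.conf oracle
#   make_bpred_variant cmp4-noc.conf cmp4-noc-NtA.conf NotTaken
```

Restricting the substitution to one section matters because a plain search-and-replace on "type =" could also hit parameters in other sections.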

We can check these files by,

$ ls | grep "cmp4-noc"

The output should be,

cmp4-noc.conf
cmp4-noc-HyA.conf
cmp4-noc-NtA.conf
cmp4-noc-OrA.conf

Then we go to the ~/sesc/apps/Splash2/raytrace directory and generate the report with these configuration files,

$ cd ~/sesc/apps/Splash2/raytrace
$ ~/sesc/sesc.opt -fHyA -c ~/sesc/confs/cmp4-noc-HyA.conf -ort.out -ert.err raytrace.mipseb -p1 -m128 -a2 Input/reduced.env
$ ~/sesc/sesc.opt -fOrA -c ~/sesc/confs/cmp4-noc-OrA.conf -ort.out -ert.err raytrace.mipseb -p1 -m128 -a2 Input/reduced.env
$ ~/sesc/sesc.opt -fNTA -c ~/sesc/confs/cmp4-noc-NtA.conf -ort.out -ert.err raytrace.mipseb -p1 -m128 -a2 Input/reduced.env
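Because these simulations should run one at a time (running them in parallel can get an earlier process killed), a loop that launches them one after another saves some typing. A sketch: run_variants is a hypothetical helper, and it assumes each suffix matches a configuration file named cmp4-noc-<suffix>.conf, so the NotTaken report would be named sesc_raytrace.mipseb.NtA rather than .NTA as in the commands above. Run it from ~/sesc/apps/Splash2/raytrace.

```shell
#!/bin/sh
# Hypothetical helper: run the raytrace simulation once per configuration
# suffix, strictly one after another (never in parallel).
SESC="${SESC:-$HOME/sesc/sesc.opt}"
CONFS="${CONFS:-$HOME/sesc/confs}"

# Usage: run_variants HyA OrA NtA
run_variants() {
    for v in "$@"; do
        echo "=== running $v ==="
        "$SESC" -f"$v" -c "$CONFS/cmp4-noc-$v.conf" -ort.out -ert.err \
            raytrace.mipseb -p1 -m128 -a2 Input/reduced.env || return
    done
}
```

The loop stops at the first failing run, so a broken configuration does not silently waste the time of the remaining simulations.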

When you run these commands, you can immediately see,

[0]   Thread 0 (0) Create
Begin skipping: requested 0 instructions

End skipping: requested 0 skipped 0
...

Do not rush these commands: they are quite slow on our virtual machine, and each of them is likely to take 10–15 minutes to complete. Also, do not run them simultaneously in different terminals, because an earlier process may be killed as a result. So just take the time to drink a cup of coffee or do something else until you see something like,

...
MultipleDestInvalidation : 0
TotDestInvalidation : 0

for each of them. Then we can continue our experiment. After the simulations, we can use the following command to check the reports,

$ ls | grep sesc_raytrace.mipseb

The output should be,

sesc_raytrace.mipseb.HyA
sesc_raytrace.mipseb.NTA
sesc_raytrace.mipseb.OrA

Then we can view these reports by,

$ ~/sesc/scripts/report.pl sesc_raytrace.mipseb.NTA
$ ~/sesc/scripts/report.pl sesc_raytrace.mipseb.HyA
$ ~/sesc/scripts/report.pl sesc_raytrace.mipseb.OrA

Then you can compare the simulation time, the cycles, and the branch prediction accuracy of these three predictors.

4. Experiment 2: Branch Prediction with a Deeper Pipeline

Typo 1: (Part 2 Section D) Note that the instructions say “when the pipeline is 6 stages deeper”, but they should say “when the pipeline is 7 stages deeper” to be consistent with how renameDelay has been modified.

Typo 2: (Part 2 Section E) Section E says “The results in Part E) lead us to conclude that better branch prediction”, but it should say “The results in Part D) lead us to conclude that better branch prediction”.

Now, let’s make the pipeline deeper. The renameDelay parameter in the configuration file ~/sesc/confs/cmp4-noc.conf in the [issueX] section is set to value 1 by,

renameDelay     = 1

In this experiment, we change this value to 8, which makes the pipeline of our processor 7 stages deeper. We modify this parameter in all three configuration files cmp4-noc-HyA.conf, cmp4-noc-OrA.conf, and cmp4-noc-NtA.conf, and save the results as three new configuration files cmp4-noc-HyC.conf, cmp4-noc-OrC.conf, and cmp4-noc-NtC.conf.

We should then copy these files to the ~/sesc/confs directory,

$ cd ~/sesc/confs
$ cp /media/sf_CS6290/PRJ\ 1/Confs/cmp4-noc-* ./
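The C variants differ from the A variants only in renameDelay, so they can also be derived on the virtual machine with sed. A sketch: make_deep_variant is a hypothetical helper that rewrites renameDelay inside the [issueX] section only; run it in ~/sesc/confs.

```shell
#!/bin/sh
# Hypothetical helper: copy SRC to DST with renameDelay set to DELAY
# inside the [issueX] section only.
# Usage: make_deep_variant SRC DST DELAY
make_deep_variant() {
    sed '/^\[issueX\]/,/^\[/ s/^\([[:space:]]*renameDelay[[:space:]]*=[[:space:]]*\).*/\1'"$3"'/' \
        "$1" > "$2"
}

# Derive HyC/OrC/NtC from HyA/OrA/NtA with renameDelay = 8:
#   for p in Hy Or Nt; do
#       make_deep_variant cmp4-noc-${p}A.conf cmp4-noc-${p}C.conf 8
#   done
```

Deriving the C files from the A files guarantees that the predictor type stays identical and only the pipeline depth changes between the two experiments.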

We can check these files again by,

$ ls | grep "cmp4-noc"

Then the output should be,

cmp4-noc.conf
cmp4-noc-HyA.conf
cmp4-noc-HyC.conf
cmp4-noc-NtA.conf
cmp4-noc-NtC.conf
cmp4-noc-OrA.conf
cmp4-noc-OrC.conf

Again, let’s simulate under the condition of these new configurations by,

$ cd ~/sesc/apps/Splash2/raytrace
$ ~/sesc/sesc.opt -fHyC -c ~/sesc/confs/cmp4-noc-HyC.conf -ort.out -ert.err raytrace.mipseb -p1 -m128 -a2 Input/reduced.env
$ ~/sesc/sesc.opt -fOrC -c ~/sesc/confs/cmp4-noc-OrC.conf -ort.out -ert.err raytrace.mipseb -p1 -m128 -a2 Input/reduced.env
$ ~/sesc/sesc.opt -fNTC -c ~/sesc/confs/cmp4-noc-NtC.conf -ort.out -ert.err raytrace.mipseb -p1 -m128 -a2 Input/reduced.env

Then we can view these reports by,

$ ~/sesc/scripts/report.pl sesc_raytrace.mipseb.NTC
$ ~/sesc/scripts/report.pl sesc_raytrace.mipseb.HyC
$ ~/sesc/scripts/report.pl sesc_raytrace.mipseb.OrC

From these reports, we can draw a conclusion about the pipeline depth and the importance of prediction accuracy. In theory, a deeper pipeline has more instructions in flight behind a mispredicted branch, and all of them must be flushed, so the misprediction penalty is higher. Thus, prediction accuracy should matter more for a deeper pipeline. You can check this theory against the results of the reports above.
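As a rough back-of-the-envelope model (an approximation, not a quantity the simulator reports), the extra time caused by lengthening the front end is about the number of mispredictions times the added penalty per misprediction. With renameDelay raised from 1 to 8, each misprediction costs roughly 7 more cycles:

$$
\Delta\text{Cycles} \approx N_{\text{mispred}} \times \Delta P,
\qquad
\Delta P \approx 8 - 1 = 7 \text{ cycles}
$$

The same 7-cycle increase is multiplied by a different N_mispred for each predictor, so the predictor with more mispredictions loses more cycles when the pipeline is deepened.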

5. Experiment 3: Find out which branch tends to be mispredicted

In this part, we use the original configuration file cmp4-noc.conf, and our goal is to determine, for each branch instruction in the program, how many times the direction predictor predicts it correctly and how many times it mispredicts it.

To do this, we have to modify BPred.h and BPred.cpp so that the simulator prints the following information,

the # of static branch instructions that are completed 1~19 times
the # of static branch instructions that are completed 20~199 times
the # of static branch instructions that are completed 200~1999 times
the # of static branch instructions that are completed 2000+ times

You may need to understand helper classes such as GStatsCntr, so browsing the SESC library source can be helpful.
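The bucketing itself is simple. Purely as an illustration of the four ranges (not of how SESC stores its statistics), suppose the per-branch completion counts had been dumped to a text file with one "<branch address> <count>" pair per line, which is a hypothetical format; awk could then tally the buckets with a hypothetical helper named bucket_counts:

```shell
#!/bin/sh
# Hypothetical post-processing sketch: read "<address> <count>" lines
# (from the given files, or stdin) and count static branches per range.
# Usage: bucket_counts counts.txt
bucket_counts() {
    awk '
        $2 >= 1    && $2 < 20   { b1++ }
        $2 >= 20   && $2 < 200  { b2++ }
        $2 >= 200  && $2 < 2000 { b3++ }
        $2 >= 2000              { b4++ }
        END {
            printf "1-19: %d\n20-199: %d\n200-1999: %d\n2000+: %d\n", b1, b2, b3, b4
        }
    ' "$@"
}
```

Note the half-open boundaries: a branch completed exactly 20 times belongs to the 20~199 bucket, and one completed exactly 2000 times belongs to 2000+.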

After modifying the code, we have to copy the modified files to the sesc source directory,

$ cd /media/sf_CS6290/sesc/src/libcore
$ cp BPred.* ~/sesc/src/libcore

Then we have to rebuild the sesc simulator by,

$ cd ~/sesc
$ make
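It is easy to forget the rebuild or to miss a failed compile in the long make output. A quick sanity check is to confirm that sesc.opt is newer than the sources you copied in. This is a sketch: check_rebuilt is a hypothetical helper, and -nt is the "newer than" file test supported by bash and dash.

```shell
#!/bin/sh
# Hypothetical sanity check: succeed only if the simulator binary is newer
# than every given source file, i.e. make actually rebuilt it.
# Usage: check_rebuilt ~/sesc/sesc.opt ~/sesc/src/libcore/BPred.h ~/sesc/src/libcore/BPred.cpp
check_rebuilt() {
    bin=$1
    shift
    for src in "$@"; do
        if [ ! "$bin" -nt "$src" ]; then
            echo "$bin is not newer than $src: rebuild with make" >&2
            return 1
        fi
    done
    echo "$bin is up to date"
}
```

Running this before the long simulations avoids spending 10–15 minutes per run on an old binary.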

Finally, we run the simulations and save the simulator’s output to the files rt.out.Hybrid and rt.out.NT,

$ ~/sesc/sesc.opt -fHyG -c ~/sesc/confs/cmp4-noc-HyA.conf -ort.out -ert.err raytrace.mipseb -p1 -m128 -a2 Input/reduced.env > rt.out.Hybrid
$ ~/sesc/sesc.opt -fNTG -c ~/sesc/confs/cmp4-noc-NtA.conf -ort.out -ert.err raytrace.mipseb -p1 -m128 -a2 Input/reduced.env > rt.out.NT