High-Performance Computer Architecture 34 | Cache Coherence and Multicore Experiments

NOTE: This article does NOT include anything related to the final submission (no answers, no specific values, no results) for the course CS6290 HPCA, because of the honor code. Most of the content repeats the basic instructions and directions of Project 3. Additional Linux commands are provided to complement the project’s guidelines.
Please feel free to contact me if this article violates the rules of Georgia Tech, and I will delete it immediately.
1. Experiment 1: Running a Parallel Application
Before we start this experiment, we have to remove all the code we implemented for the previous projects. It may be easier to start fresh by reinstalling the virtual machine, which resets all the configurations. You may refer to the following article to do so:
Series: High-Performance Computer Architecture (medium.com)
In this experiment, we will use the LU benchmark on a 16-core system. The configuration file we will use is ~/sesc/confs/cmp16-noc.conf, and we can set the number of threads with the -p option. First, change to the LU directory:
$ cd ~/sesc/apps/Splash2/lu
We simulate the lu application with 1, 4, and 16 threads and save the reports as sesc_lu.mipseb.Ap1, sesc_lu.mipseb.Ap4, and sesc_lu.mipseb.Ap16.
$ ~/sesc/sesc.opt -fAp1 -c ~/sesc/confs/cmp16-noc.conf -olu.out -elu.err lu.mipseb -n512 -p1
$ ~/sesc/sesc.opt -fAp4 -c ~/sesc/confs/cmp16-noc.conf -olu.out -elu.err lu.mipseb -n512 -p4
$ ~/sesc/sesc.opt -fAp16 -c ~/sesc/confs/cmp16-noc.conf -olu.out -elu.err lu.mipseb -n512 -p16
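Since the three commands differ only in the thread count, they can also be generated in a loop. The following is a minimal sketch, not part of the project instructions: the helper name sesc_cmd is hypothetical, the paths and flags are copied from the commands above (with ~ written as $HOME), and the loop only prints each command so that you can inspect it first.

```shell
# Hypothetical helper: build the sesc command line for one run.
#   $1 = report prefix letter (A in this experiment)
#   $2 = number of threads (passed to both -f and -p)
sesc_cmd() {
  echo "$HOME/sesc/sesc.opt -f${1}p${2} -c $HOME/sesc/confs/cmp16-noc.conf -olu.out -elu.err lu.mipseb -n512 -p${2}"
}

# Print the commands for 1, 4, and 16 threads (nothing is executed here).
for p in 1 4 16; do
  sesc_cmd A "$p"
done
```

To actually run the simulations, append `| sh` after `done` while inside the lu directory.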
Then we can read the report by,
$ ~/sesc/scripts/report.pl sesc_lu.mipseb.Ap1
$ ~/sesc/scripts/report.pl sesc_lu.mipseb.Ap4
$ ~/sesc/scripts/report.pl sesc_lu.mipseb.Ap16
Now, let’s compare the results of these three reports. We find that the speedup is not equal to the number of cores used. What do you think causes this? Moreover, as we add more cores, core #0 executes more instructions than the other cores and its IPC drops. What do you think causes this?
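For the comparison, speedup is conventionally defined as the single-thread execution time divided by the p-thread execution time, and parallel efficiency as the speedup divided by p. The sketch below computes both with awk. The function name speedup is hypothetical, and the numbers in the example call are made up for illustration; they are not simulation results.

```shell
# Hypothetical helper: speedup and parallel efficiency.
#   $1 = execution time (or cycles) with 1 thread
#   $2 = execution time (or cycles) with p threads
#   $3 = p, the number of threads
speedup() {
  awk -v t1="$1" -v tp="$2" -v p="$3" 'BEGIN {
    s = t1 / tp
    printf "speedup=%.2f efficiency=%.2f\n", s, s / p
  }'
}

# Made-up example: 1000 time units on 1 thread, 500 on 4 threads.
speedup 1000 500 4   # prints: speedup=2.00 efficiency=0.50
```

An efficiency of 1.00 would mean that the speedup equals the number of threads.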
2. Experiment 2: Read Misses for L1 Caches
Now, let’s look at the number of read misses that occur in each L1 data cache (DL1 cache). We can print the read misses of core #0’s DL1 cache by,
$ cat sesc_lu.mipseb.Ap1 | grep "P(0)_DL1:readMiss="
$ cat sesc_lu.mipseb.Ap4 | grep "P(0)_DL1:readMiss="
$ cat sesc_lu.mipseb.Ap16 | grep "P(0)_DL1:readMiss="
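To look at the other cores as well, the same grep can be repeated with a different core index. This is a minimal sketch that assumes the other cores' counters follow the same P(i)_DL1:readMiss= naming as core #0; the function name dl1_read_misses is hypothetical.

```shell
# Hypothetical helper: print the DL1 read-miss line of cores 0..N-1.
#   $1 = report file, $2 = number of cores to print
dl1_read_misses() {
  i=0
  while [ "$i" -lt "$2" ]; do
    # -F treats the pattern as a fixed string, so the parentheses are literal
    grep -F "P(${i})_DL1:readMiss=" "$1"
    i=$((i + 1))
  done
}

# Example (run inside the lu directory after the simulations):
#   dl1_read_misses sesc_lu.mipseb.Ap16 16
```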
From the results, we can see that the number of these misses decreases as we go from 1 to 16 threads. Why?
3. Experiment 3: Counting Coherence Misses
In the last project, we counted different types of misses. In this project, we also count coherence misses, a new type that appears in multithreaded programs. Specifically, we will count how many read misses in core #0’s DL1 cache are compulsory misses (readCompMiss), replacement misses (capacity or conflict; the counter should be called readReplMiss), and coherence misses (readCoheMiss), and separately classify write misses in the same way (writeCompMiss, writeReplMiss, and writeCoheMiss).
Again, we can get the value of the write misses by,
$ cat sesc_lu.mipseb.Ap1 | grep "P(0)_DL1:writeMiss="
$ cat sesc_lu.mipseb.Ap4 | grep "P(0)_DL1:writeMiss="
$ cat sesc_lu.mipseb.Ap16 | grep "P(0)_DL1:writeMiss="
You need to modify some of the sesc code so that, in the end, we are able to count the following numbers:
rcompMiss: read compulsory misses
rcoheMiss: read coherence misses
rreplMiss: read replacement misses (conflict misses + capacity misses)
wcompMiss: write compulsory misses
wcoheMiss: write coherence misses
wreplMiss: write replacement misses (conflict misses + capacity misses)
After we finish modifying the code, we have to rebuild sesc by,
$ cd ~/sesc
$ make
Finally, we need to re-run the simulations from Experiment 1 and generate the new simulation report files by,
$ ~/sesc/sesc.opt -fHp1 -c ~/sesc/confs/cmp16-noc.conf -olu.out -elu.err lu.mipseb -n512 -p1
$ ~/sesc/sesc.opt -fHp4 -c ~/sesc/confs/cmp16-noc.conf -olu.out -elu.err lu.mipseb -n512 -p4
$ ~/sesc/sesc.opt -fHp16 -c ~/sesc/confs/cmp16-noc.conf -olu.out -elu.err lu.mipseb -n512 -p16
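Once the runs finish, the new counters can be inspected with the same grep approach used earlier. The sketch below lists every core #0 DL1 counter whose name contains "Miss", so it works regardless of the exact names you gave the new counters. The function name core0_dl1_misses is hypothetical, and the report file names in the example are inferred from the -fHp1, -fHp4, and -fHp16 options by analogy with Experiment 1.

```shell
# Hypothetical helper: list all miss counters of core #0's DL1 cache.
#   $1 = report file
core0_dl1_misses() {
  # First keep only core #0's DL1 lines, then only the miss counters.
  grep -F "P(0)_DL1:" "$1" | grep "Miss"
}

# Example (file names assumed from the -f options above):
#   core0_dl1_misses sesc_lu.mipseb.Hp1
#   core0_dl1_misses sesc_lu.mipseb.Hp4
#   core0_dl1_misses sesc_lu.mipseb.Hp16
```

Since every read miss falls into exactly one of the three categories, a quick sanity check is that the three read counters add up to readMiss, and likewise for the write counters.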