High-Performance Computer Architecture 34 | Cache Coherence and Multicore Experiments

Series: High-Performance Computer Architecture

NOTE: This article does NOT include anything related to the final submission (no answers, no specific values, no results) for the course CS6290 HPCA because of the honor code. Most of its content repeats the basic instructions and directions of Project 3. Additional Linux commands are provided as a complement to the project’s guidelines.
Please feel free to contact me if this article violates the rules of Georgia Tech, and I will delete it immediately.
1. Experiment 1: Running a Parallel Application

Before we start this experiment, we have to remove all the code we implemented for the previous projects. The easiest way may be a fresh start: reinstall the virtual machine so that every earlier change is gone. A separate article describes how to do so.

In this experiment, we will be using the LU benchmark on a 16-core system. The configuration file we would like to use is ~/sesc/confs/cmp16-noc.conf, and we can set the number of threads with the -p option. First, change to the LU directory:

$ cd ~/sesc/apps/Splash2/lu

We simulate the lu application with 1, 4, and 16 threads and save the reports as sesc_lu.mipseb.Ap1, sesc_lu.mipseb.Ap4, and sesc_lu.mipseb.Ap16, respectively.

$ ~/sesc/sesc.opt -fAp1 -c ~/sesc/confs/cmp16-noc.conf -olu.out -elu.err lu.mipseb -n512 -p1
$ ~/sesc/sesc.opt -fAp4 -c ~/sesc/confs/cmp16-noc.conf -olu.out -elu.err lu.mipseb -n512 -p4
$ ~/sesc/sesc.opt -fAp16 -c ~/sesc/confs/cmp16-noc.conf -olu.out -elu.err lu.mipseb -n512 -p16
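
The three commands above differ only in the thread count, so they can be wrapped in a small helper. This is just a convenience sketch, not part of the project instructions: the function name run_lu and the variables SESC and CONF are hypothetical names introduced here, and the flags are copied unchanged from the commands above.

```shell
# Hypothetical helper (not part of SESC): run LU with a given report prefix
# and thread count, using the same flags as the commands above.
SESC="${SESC:-$HOME/sesc/sesc.opt}"
CONF="${CONF:-$HOME/sesc/confs/cmp16-noc.conf}"

run_lu() {
    prefix="$1"    # report prefix letter, e.g. A
    threads="$2"   # value passed to -p
    "$SESC" "-f${prefix}p${threads}" -c "$CONF" \
        -olu.out -elu.err lu.mipseb -n512 "-p${threads}"
}
```

Assuming the helper is defined in the current shell and we are in the LU directory, the three runs become:

$ for p in 1 4 16; do run_lu A "$p"; done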

Then we can read the reports with:

$ ~/sesc/scripts/report.pl sesc_lu.mipseb.Ap1
$ ~/sesc/scripts/report.pl sesc_lu.mipseb.Ap4
$ ~/sesc/scripts/report.pl sesc_lu.mipseb.Ap16

Now, let’s compare the results of these three reports. We find that the speedup does not match the number of cores we use. What do you think is the problem? What’s more, as the number of cores grows, core #0 executes more instructions than the other cores and its IPC drops. What do you think causes this?
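
To make the comparison concrete, speedup is usually defined as the 1-thread execution time divided by the p-thread execution time, and parallel efficiency as that speedup divided by p. The sketch below assumes you read the execution times off the reports yourself; the function names speedup and efficiency are hypothetical, and the numbers in the usage lines are made up for illustration only, not simulation results.

```shell
# speedup T1 Tp        -> T1 / Tp
# efficiency T1 Tp P   -> (T1 / Tp) / P
# T1 and Tp are execution times in the same unit; P is the thread count.
speedup() {
    awk -v t1="$1" -v tp="$2" 'BEGIN { printf "%.2f\n", t1 / tp }'
}

efficiency() {
    awk -v t1="$1" -v tp="$2" -v p="$3" 'BEGIN { printf "%.2f\n", t1 / tp / p }'
}
```

With made-up times of 100 and 50 on 4 threads:

$ speedup 100 50
2.00
$ efficiency 100 50 4
0.50

An efficiency below 1.00 means the extra cores are not being fully used.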

2. Experiment 2: Read Misses for L1 Caches

Now, let’s have a look at the number of cache read misses that occur in each L1 data cache (DL1 cache). We can print the read misses of core #0’s DL1 cache with:

$ cat sesc_lu.mipseb.Ap1 | grep "P(0)_DL1:readMiss="
$ cat sesc_lu.mipseb.Ap4 | grep "P(0)_DL1:readMiss="
$ cat sesc_lu.mipseb.Ap16 | grep "P(0)_DL1:readMiss="
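
Since the same counter is read from several report files, a small function can print them side by side. This is a sketch under one assumption: each counter sits on its own line as name=value, which is what the grep patterns above suggest. The function name counter is hypothetical.

```shell
# counter KEY FILE...  -> prints "<file> <value>" for each report file.
# Assumes report lines look like "KEY=value" (an assumption, see above).
counter() {
    key="$1"; shift
    for f in "$@"; do
        # -F matches the key as a fixed string rather than a regex
        val=$(grep -F "${key}=" "$f" | head -n 1 | cut -d= -f2)
        printf '%s %s\n' "$f" "$val"
    done
}
```

For example:

$ counter "P(0)_DL1:readMiss" sesc_lu.mipseb.Ap1 sesc_lu.mipseb.Ap4 sesc_lu.mipseb.Ap16

The same function works for any other counter name, such as P(0)_DL1:writeMiss.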

From the results, we can see that the number of these misses decreases as we go from 1 to 16 threads. Why?

3. Experiment 3: Counting Coherence Misses

In the last project, we counted different types of misses. In this project, we add a new type that matters for multithreaded programs: coherence misses. We will count how many read misses in each core’s DL1 cache are compulsory (readCompMiss), replacement (capacity or conflict, the counter should be called readReplMiss), and coherence misses (readCoheMiss), and separately classify write misses in the same way (writeCompMiss, writeReplMiss, and writeCoheMiss).

As with the read misses, we can get the number of write misses for core #0 with:

$ cat sesc_lu.mipseb.Ap1 | grep "P(0)_DL1:writeMiss="
$ cat sesc_lu.mipseb.Ap4 | grep "P(0)_DL1:writeMiss="
$ cat sesc_lu.mipseb.Ap16 | grep "P(0)_DL1:writeMiss="

You need to modify parts of the SESC source code so that, in the end, the simulator can report the following counts:

  • rcompMiss: read compulsory misses
  • rcoheMiss: read coherence misses
  • rreplMiss: read replacement misses (conflict misses + capacity misses)
  • wcompMiss: write compulsory misses
  • wcoheMiss: write coherence misses
  • wreplMiss: write replacement misses (conflict misses + capacity misses)

After we finish modifying the code, we have to rebuild SESC with:

$ cd ~/sesc
$ make

Finally, we need to re-run the simulations from Experiment 1 to produce new simulation report files (note the H prefix in place of A):

$ ~/sesc/sesc.opt -fHp1 -c ~/sesc/confs/cmp16-noc.conf -olu.out -elu.err lu.mipseb -n512 -p1
$ ~/sesc/sesc.opt -fHp4 -c ~/sesc/confs/cmp16-noc.conf -olu.out -elu.err lu.mipseb -n512 -p4
$ ~/sesc/sesc.opt -fHp16 -c ~/sesc/confs/cmp16-noc.conf -olu.out -elu.err lu.mipseb -n512 -p16
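
Once the new reports exist, one possible sanity check is that the three categories add up to the total miss count. This assumes every miss is classified into exactly one category, which is how the experiment is described above. The function name check_split is hypothetical; you pass in the four numbers read from your own report.

```shell
# check_split TOTAL COMP COHE REPL
# Succeeds if COMP + COHE + REPL equals TOTAL, otherwise prints the mismatch.
check_split() {
    total="$1"; comp="$2"; cohe="$3"; repl="$4"
    sum=$((comp + cohe + repl))
    if [ "$sum" -eq "$total" ]; then
        echo "ok: $sum"
    else
        echo "mismatch: categories sum to $sum, total is $total"
        return 1
    fi
}
```

With made-up numbers (not simulation results):

$ check_split 10 2 3 5
ok: 10

A mismatch would suggest that some misses are counted twice or not classified at all.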