The attached diagrams (pdf, doc) show three types of cache: (1) a direct mapped cache with one word per line, (2) a direct mapped cache with four words per line and (3) a two-way set associative cache with four words per line. The code segments below give two possible instruction streams, each with a variable number of instructions. We will analyze how each of these caches is used with each instruction stream as described below, in order to determine how the caches affect performance of the code.
The caches are all the same size, 128 Bytes or 32 Words. Since we consider only instruction caches, the least significant two bits of each address are always zero and are not used in addressing individual bytes within the words (instructions) in the cache, as they might be in a data cache.
The caches have the following additional characteristics:
Consider cache (1), which is direct mapped and contains 32 lines of one word each. Five bits of the instruction address (bits 2-6) are used to index the cache line. The remaining 25 bits (bits 7-31) are stored in the tag field of the cache.
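The same reasoning extends to the other two caches: with 32 words in total, the number of sets is 32 / (words per line × ways), and whatever address bits are left above the set index form the tag. The Python sketch below works this out for all three organizations. The function name `split_address` and its parameters are illustrative and not part of the assignment; the sample address is the first one in the listing that follows.

```python
def split_address(addr, words_per_line=1, ways=1, total_words=32):
    """Split a 32-bit word-aligned instruction address into cache fields.

    Returns (tag, set_index, word_in_line). All names are illustrative.
    """
    num_sets = total_words // (words_per_line * ways)
    word = addr >> 2                      # drop the two always-zero byte-offset bits
    word_in_line = word % words_per_line  # selects a word within the line
    block = word // words_per_line        # line-sized block number in memory
    set_index = block % num_sets
    tag = block // num_sets
    return tag, set_index, word_in_line


addr = 0b1001_1000_1111_0000_0000_1100_1101_1000
for name, wpl, ways in [("cache 1", 1, 1), ("cache 2", 4, 1), ("cache 3", 4, 2)]:
    tag, index, word = split_address(addr, wpl, ways)
    print(f"{name}: tag={tag:#x} index={index} word={word}")
```

For cache (1) the index is the value of bits 2-6; for the four-word caches, bits 2-3 pick the word within the line and the index shrinks to three bits (cache 2) or two bits (cache 3).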
1001 1000 1111 0000 0000 1100 1101 1000 | <instruction 30> |
1001 1000 1111 0000 0000 1100 1101 1100 | addi $s5, $s5, 1 |
1001 1000 1111 0000 0000 1100 1110 0000 | bne $s5, $s6, loop |
Use the diagram of the direct-mapped, one-word-per-line cache to show the contents of the cache and the values of the tag fields after the first iteration of the loop. Calculate the time (in clock cycles) for the loop to complete 1,001 iterations. Remember to include the instructions preceding the loop in your calculation. (Note: do not forget the compulsory cache misses the first time around the loop.)
Code segment 2 is used for this problem and contains a loop and a subroutine call.
Address | Instruction | Comment |
---|---|---|
1001 1000 1111 0000 0000 1100 0101 1100 | addi $s6, $0, 1001 | # initialize number of iterations |
1001 1000 1111 0000 0000 1100 0110 0000 | add $s5, $0, $0 | # initialize loop counter |
loop: | ||
1001 1000 1111 0000 0000 1100 0110 0100 | <instruction 1> | # beginning of loop body, |
1001 1000 1111 0000 0000 1100 0110 1000 | <instruction 2> | # which has a total |
1001 1000 1111 0000 0000 1100 0110 1100 | <instruction 3> | # of k instructions |
... | ... | |
1001 1000 1111 0000 0000 1100 0101 1100+4k | addi $s5, $s5, 1 | # instruction k-1 |
1001 1000 1111 0000 0000 1100 0110 0000+4k | bne $s5, $s6, loop | # instruction k |
# end of loop | ||
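Hand-derived hit and miss counts for this loop can be cross-checked with a small trace-driven model. The sketch below is only that: the names `simulate` and `code_segment_1` are hypothetical, LRU replacement is an assumption for the two-way cache, and the model reports hit and miss counts only. Converting the counts to clock cycles still requires the hit time and miss penalty specified for the caches.

```python
from collections import OrderedDict


def simulate(addresses, total_words=32, words_per_line=1, ways=1):
    """Return (hits, misses) for a stream of word-aligned byte addresses."""
    num_sets = total_words // (words_per_line * ways)
    sets = [OrderedDict() for _ in range(num_sets)]   # per set: tag -> True, in LRU order
    hits = misses = 0
    for addr in addresses:
        block = addr // (4 * words_per_line)
        index, tag = block % num_sets, block // num_sets
        lines = sets[index]
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)                    # mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)             # evict the LRU line (assumed policy)
            lines[tag] = True
    return hits, misses


def code_segment_1(k, iterations=1001, start=0x98F00C5C):
    """Fetch addresses: two setup instructions, then a k-instruction loop."""
    yield start
    yield start + 4
    loop = start + 8
    for _ in range(iterations):
        for i in range(k):
            yield loop + 4 * i


for label, wpl, ways in [("cache 1", 1, 1), ("cache 2", 4, 1), ("cache 3", 4, 2)]:
    print(label, simulate(code_segment_1(32), words_per_line=wpl, ways=ways))
```

Changing the argument of `code_segment_1` shows how the miss count responds to the loop length k.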
Address | Instruction | Comment |
---|---|---|
1001 1000 1111 0000 0000 1100 0101 0100 | add $s4, $0, $0 | # initialize total |
1001 1000 1111 0000 0000 1100 0101 1000 | addi $s6, $0, 1001 | # initialize number of iterations |
1001 1000 1111 0000 0000 1100 0101 1100 | add $s5, $0, $0 | # initialize loop counter |
loop: | ||
1001 1000 1111 0000 0000 1100 0110 0000 | add $a0, $s0, $0 | # first parameter |
1001 1000 1111 0000 0000 1100 0110 0100 | add $a1, $s1, $0 | # second parameter |
1001 1000 1111 0000 0000 1100 0110 1000 | add $a2, $s2, $0 | # third parameter |
1001 1000 1111 0000 0000 1100 0110 1100 | add $a3, $s3, $0 | # fourth parameter |
1001 1000 1111 0000 0000 1100 0111 0000 | jal function | # function call |
1001 1000 1111 0000 0000 1100 0111 0100 | add $s4, $s4, $v0 | # add result to total |
1001 1000 1111 0000 0000 1100 0111 1000 | <instruction 7> | # remainder of loop |
1001 1000 1111 0000 0000 1100 0111 1100 | <instruction 8> | # which has a total |
1001 1000 1111 0000 0000 1100 1000 0000 | <instruction 9> | # of 16 instructions |
... | ... | |
1001 1000 1111 0000 0000 1100 1001 0100 | <instruction 14> | |
1001 1000 1111 0000 0000 1100 1001 1000 | addi $s5, $s5, 1 | # instruction 15 |
1001 1000 1111 0000 0000 1100 1001 1100 | bne $s5, $s6, loop | # instruction 16 |
# end of loop | ||
... | ... | |
function: | ||
1001 1000 1111 0000 0000 1111 0110 1000 | addi $sp, $sp, -16 | # save state |
1001 1000 1111 0000 0000 1111 0110 1100 | sw $s0, 0($sp) | # instruction B |
1001 1000 1111 0000 0000 1111 0111 0000 | sw $s1, 4($sp) | # instruction C |
1001 1000 1111 0000 0000 1111 0111 0100 | sw $s2, 8($sp) | # instruction D |
1001 1000 1111 0000 0000 1111 0111 1000 | sw $ra, 12($sp) | # instruction E |
1001 1000 1111 0000 0000 1111 0111 1100 | <instruction F> | # body of subroutine |
1001 1000 1111 0000 0000 1111 1000 0000 | <instruction G> | |
1001 1000 1111 0000 0000 1111 1000 0100 | <instruction H> | |
1001 1000 1111 0000 0000 1111 1000 1000 | <instruction I> | |
1001 1000 1111 0000 0000 1111 1000 1100 | lw $s0, 0($sp) | # restore state |
1001 1000 1111 0000 0000 1111 1001 0000 | lw $s1, 4($sp) | # instruction K |
1001 1000 1111 0000 0000 1111 1001 0100 | lw $s2, 8($sp) | # instruction L |
1001 1000 1111 0000 0000 1111 1001 1000 | lw $ra, 12($sp) | # instruction M |
1001 1000 1111 0000 0000 1111 1001 1100 | addi $sp, $sp, 16 | # instruction N |
1001 1000 1111 0000 0000 1111 1010 0000 | jr $ra | # return |
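Because the loop and the function sit at different tags, any cache line they both map to is evicted on every call and every return. As a rough aid, the sketch below lists the line indices shared by the two address ranges in the direct-mapped, one-word-per-line cache. The helper name `cache_index` is illustrative; the address ranges are read off the listing above.

```python
def cache_index(addr, num_lines=32):
    """Line index in a direct-mapped, one-word-per-line cache (address bits 2-6)."""
    return (addr >> 2) % num_lines


# Address ranges taken from the code segment 2 listing.
loop_addrs = range(0x98F00C60, 0x98F00C9C + 4, 4)   # 16 loop instructions
func_addrs = range(0x98F00F68, 0x98F00FA0 + 4, 4)   # 15 function instructions

loop_lines = {cache_index(a) for a in loop_addrs}
func_lines = {cache_index(a) for a in func_addrs}
shared = sorted(loop_lines & func_lines)

print(f"{len(shared)} shared line indices: {shared}")
```

Lines used by only one of the two ranges keep their contents across iterations; the shared ones are where the conflict misses come from.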
In this problem, you will be introduced to the sim-cache simulator. You will use this simulator to perform cache simulations with various configurations.
The sim-cache simulator performs a functional simulation of an executable program coupled with an emulation of the memory system supporting the program. The emulated memory system is capable of supporting multiple levels of instruction and data caches, each of which can be configured for different sizes and organization. This allows us to measure the actual hit/miss rate of the given program for the emulated cache organization.
The sim-cache simulator, along with other simulators, is available on each of the machines in the Olin 219 lab. The executables for these simulators are all in the /usr/local/cs281/simplesim-3.0 directory. To avoid typing the full path each time you run the program, add /usr/local/cs281/simplesim-3.0 to your search path. You can do this by adding the following line to the .bash_profile file in your home directory and then logging out and back in:
export PATH=/usr/local/cs281/simplesim-3.0/:$PATH
If you are in a hurry, you can simply execute the above command in a Terminal window, and the effect will last in that shell for as long as the Terminal window remains open.
Start by running sim-cache with the -h option to get the help screen listing all the options and arguments available for configuring a simulation run:
<command-prompt>$ sim-cache -h
Notice that for an execution run, you can use the -config option to specify a configuration file, and you must always specify an executable for the simulator to "run" and gather memory statistics on. The following configuration files can help:
cache_1a.cfg: Example configuration file for an L1 instruction cache, but no L1 data cache or any L2 caches
cache_2a.cfg: Example configuration file for an L1 data cache, but no L1 instruction cache or any L2 caches
The current versions of all the simulators are configured for PISA (Portable Instruction Set Architecture), an instruction set architecture quite similar to the MIPS architecture that we have been working with. Also included with the simulator distribution is a set of PISA executables (both little-endian and big-endian versions). For this problem, we will use the executable: /usr/local/cs281/simplesim-3.0/tests-pisa/bin.little/test-math
You may wish to create a testing directory and copy this executable and the configuration files into it. Then, as you run the simulator and accumulate your results, you can keep the files with the simulator output in the same directory.
Your goal is to use single runs of the sim-cache simulator to determine the miss ratio when we "execute" the test-math program under different conditions:
Run these experiments twice, once for a data-only cache and once for an instruction-only cache. Fill in the two tables below:
Miss Ratio (I-Cache) | 1-way | 2-way | 4-way | 8-way |
---|---|---|---|---|
32 sets | | | | |
64 sets | | | | |
128 sets | | | | |
256 sets | | | | |
512 sets | | | | |
Miss Ratio (D-Cache) | 1-way | 2-way | 4-way | 8-way |
---|---|---|---|---|
32 sets | | | | |
64 sets | | | | |
128 sets | | | | |
256 sets | | | | |
512 sets | | | | |
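Filling the tables takes 20 runs per cache type, so it can help to generate the command lines rather than type them. The sketch below only prints the commands; it does not run them. It relies on the sim-cache cache description format `<name>:<nsets>:<bsize>:<assoc>:<repl>` and the `-cache:il1`, `-cache:dl1`, `-cache:il2` and `-cache:dl2` options, which you should confirm against the `sim-cache -h` output on your machine. The block size of 32 bytes and LRU replacement (`l`) are assumptions made here for illustration; substitute the values your experiment calls for. The equivalent settings can also be placed in a configuration file passed with -config.

```python
from itertools import product

SETS = [32, 64, 128, 256, 512]
WAYS = [1, 2, 4, 8]
BLOCK_SIZE = 32   # bytes; ASSUMED, replace with the value your experiment specifies
PROGRAM = "/usr/local/cs281/simplesim-3.0/tests-pisa/bin.little/test-math"


def cache_spec(name, nsets, bsize, assoc, repl="l"):
    """Cache description string: <name>:<nsets>:<bsize>:<assoc>:<repl> ('l' = LRU, assumed)."""
    return f"{name}:{nsets}:{bsize}:{assoc}:{repl}"


def commands(kind):
    """Yield one sim-cache argument list per table cell; kind is 'il1' or 'dl1'."""
    other = "dl1" if kind == "il1" else "il1"
    for nsets, assoc in product(SETS, WAYS):
        yield [
            "sim-cache",
            f"-cache:{kind}", cache_spec(kind, nsets, BLOCK_SIZE, assoc),
            f"-cache:{other}", "none",      # disable the other L1 cache
            "-cache:il2", "none",           # no L2 caches
            "-cache:dl2", "none",
            PROGRAM,
        ]


for cmd in commands("il1"):
    print(" ".join(cmd))
```

Calling `commands("dl1")` produces the corresponding set of runs for the data-only cache.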
Once you have collected the experimental data, use Excel to plot the results of the simulations. For each of the simulations (data, instruction), plot the miss ratio versus associativity for each number of sets. Using markers, show the points on the curves that correspond to total cache sizes of 1 Kbyte, 2 Kbytes, 4 Kbytes and 8 Kbytes (total cache size = sets * block size * associativity). For each simulation, you should produce something that resembles the plot below.
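To decide which points to mark, it helps to list the table cells that share each total size. The sketch below applies the size formula above; the function names are illustrative, and the block size of 32 bytes is an assumption used only to make the example concrete.

```python
def cache_size_bytes(nsets, block_size, assoc):
    """Total cache size = sets * block size * associativity."""
    return nsets * block_size * assoc


def configs_of_size(target_bytes, block_size,
                    set_counts=(32, 64, 128, 256, 512), ways=(1, 2, 4, 8)):
    """Return the (sets, ways) table cells whose total size equals target_bytes."""
    return [(s, w) for s in set_counts for w in ways
            if cache_size_bytes(s, block_size, w) == target_bytes]


BLOCK_SIZE = 32   # bytes; ASSUMED for illustration
for kbytes in (1, 2, 4, 8):
    print(f"{kbytes} Kbytes:", configs_of_size(kbytes * 1024, BLOCK_SIZE))
```

Each list is one constant-size group: moving along it trades sets for associativity at a fixed capacity, which is the comparison the questions about a fixed cache size ask for.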
Now answer the following questions based on the above results.
Q 1) For a given number of sets, what effect does increasing associativity have on the miss ratio?
Q 2) For a given associativity, what is the effect of increasing the number of sets?
Q 3) For a given cache size, how does the miss ratio change when going from an associativity of one to two to four? Explain.
Q 4) If you were to design an instruction cache, limited to a total cache size of 4 Kbytes, which cache organization would you choose, based solely on performance?
Q 5) If you were to design a data cache, limited to a total cache size of 4 Kbytes, which cache organization would you choose, based solely on performance?