
Interpretation of Results

Two graphs have been drawn, one for the compress benchmark and the other for eqntott, each comparing the optimistic and conservative cache access latencies and the cache occupancy. Latency and occupancy are measured in average cycles per reference. Rather than averaging the results of the two benchmarks into a single plot, we plot and interpret them separately. Although latency and occupancy can be less than one cycle per reference, for our benchmarks they are greater than one.

The latencies of most of the two-way associative caches are similar, since the latency equations for the HR, CA, MRU, and PSA caches are identical; any difference in latency arises from differing hit rates and from the fraction of references resolved on the first or the second cycle. While PSA(eff) has the lowest latency for the compress benchmark, CAC performs better for the eqntott benchmark. However, PSA(eff) simply extends the MRU cache to use a larger table of prediction bits, and it may not be possible to use the effective address to index a table of steering bits and still access the data in a single cycle. In spite of this, PSA(eff) demonstrates how the MRU-cache design could be improved for the domains considered by Kessler et al. [6]. For the compress benchmark, PSA(xor) has the second-lowest latency, followed by CAC. The PSA(xor) configuration requires an exclusive-or of the register contents and the offset before the SBT is accessed, which may not be possible in some designs.

With respect to occupancy, the PSA caches give the best performance. Both PSA and CAC use rehash bits to avoid examining the second half of the cache in some situations. In the optimistic model, the rehash bit has no effect on latency because misses are always initiated early, but rehash bits influence occupancy in all configurations. CAC has greater cache occupancy because it exchanges entire cache lines to implement a partial LRU replacement strategy, whereas PSA caches do not exchange cache lines; instead, steering bits are used to maximize first-probe hits without swapping.
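To make the latency comparison concrete, the following is a minimal sketch, in C, of a simple two-probe latency model. It is not the exact set of equations used above; the one-cycle first probe, two-cycle second probe, and the MISS_PENALTY constant are assumptions chosen only to show how the hit rate and the fraction of references resolved on the first probe determine the average cycles per reference.

    /* Illustrative sketch only: a simple two-probe latency model, not the
     * exact equations used in this study.  Assumes first-probe hits cost
     * 1 cycle, second-probe hits cost 2 cycles, and misses cost
     * MISS_PENALTY cycles (hypothetical value). */
    #include <stdio.h>

    #define MISS_PENALTY 10.0   /* assumed miss cost in cycles */

    /* Average cycles per reference, given the hit rate and the fraction
     * of hits resolved on the first probe. */
    static double avg_latency(double hit_rate, double first_probe_fraction)
    {
        double first_hits  = hit_rate * first_probe_fraction;         /* 1 cycle  */
        double second_hits = hit_rate * (1.0 - first_probe_fraction); /* 2 cycles */
        double misses      = 1.0 - hit_rate;                          /* penalty  */

        return first_hits * 1.0 + second_hits * 2.0 + misses * MISS_PENALTY;
    }

    int main(void)
    {
        /* Example numbers chosen for illustration only. */
        printf("avg latency = %.2f cycles/reference\n", avg_latency(0.95, 0.90));
        return 0;
    }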


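The exclusive-or indexing used by PSA(xor) can also be illustrated with a short sketch. The table size, operand values, and one-bit steering entries below are assumptions made for illustration, not details taken from the design; the point is only that the SBT index is formed from the base register and the offset before the effective address is available, and that a second-probe hit updates a steering bit rather than swapping cache lines.

    /* Illustrative sketch only: a PSA(xor)-style steering-bit lookup.
     * Table size and names are assumptions, not details from the design. */
    #include <stdint.h>
    #include <stdio.h>

    #define SBT_SIZE 4096u                 /* assumed steering-bit table size */

    static uint8_t steering_bit_table[SBT_SIZE];   /* one steering bit per entry */

    /* Form the SBT index by XORing the base register with the offset, so
     * the table can be consulted before the effective address is computed. */
    static unsigned sbt_index(uint32_t base_reg, uint32_t offset)
    {
        return (base_reg ^ offset) % SBT_SIZE;
    }

    /* The steering bit predicts which half of the pseudo-associative set
     * to probe first; on a second-probe hit only the bit is updated, and
     * no cache lines are exchanged. */
    static int predicted_half(uint32_t base_reg, uint32_t offset)
    {
        return steering_bit_table[sbt_index(base_reg, offset)];
    }

    int main(void)
    {
        uint32_t base_reg = 0x1000f000u;   /* example operand values */
        uint32_t offset   = 0x00000024u;

        printf("probe half %d first\n", predicted_half(base_reg, offset));
        return 0;
    }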