The cache is the first level of the memory hierarchy encountered once the address leaves the CPU [8]. The memory hierarchy is an economical solution to the essentially unlimited demand for fast memory: it exploits locality of reference and the cost/performance characteristics of different memory technologies. A direct-mapped cache allows a line of memory to be placed in only one block frame in the cache. It is simple to design and implement and requires less area. It has the fastest hit access time, but generally has a higher miss rate than a set-associative cache, in which a block is first mapped to a set and can then be placed anywhere within that set.
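As an illustration, the following C sketch models the address decomposition and lookup of a direct-mapped cache. The block size, cache size, and all identifiers are assumptions made for the example, not parameters taken from [8].

#include <stdint.h>
#include <stdbool.h>

/* Assumed geometry: 32-byte blocks, 256 block frames (an 8 KB direct-mapped cache). */
#define BLOCK_BITS  5
#define INDEX_BITS  8
#define NUM_BLOCKS  (1u << INDEX_BITS)

typedef struct {
    bool     valid;
    uint32_t tag;
    uint8_t  data[1u << BLOCK_BITS];
} cache_block_t;

static cache_block_t cache[NUM_BLOCKS];

/* A memory block can reside in exactly one frame: the one selected by the
 * index field of its address.  A single tag comparison decides hit or miss. */
bool dm_lookup(uint32_t addr)
{
    uint32_t index = (addr >> BLOCK_BITS) & (NUM_BLOCKS - 1);
    uint32_t tag   = addr >> (BLOCK_BITS + INDEX_BITS);
    return cache[index].valid && cache[index].tag == tag;
}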
Caches have an address tag on each block frame that gives the block address. The tag of every cache block that might contain the desired information is checked to see whether it matches the block address from the CPU. As a rule, all candidate tags are searched in parallel because speed is critical. Hence a set-associative cache has a higher access time, due to the comparators and the multiplexer in the data path, and it also requires more area.
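For comparison, a minimal sketch of the set-associative tag check is given below; the 4-way geometry and all identifiers are assumed. In hardware the loop corresponds to parallel comparators whose match signal steers the output multiplexer, which is the source of the extra delay and area mentioned above.

#include <stdint.h>
#include <stdbool.h>

#define WAYS        4          /* 4-way set-associative (assumed) */
#define BLOCK_BITS  5
#define SET_BITS    6
#define NUM_SETS    (1u << SET_BITS)

typedef struct {
    bool     valid;
    uint32_t tag;
} tag_entry_t;

static tag_entry_t tags[NUM_SETS][WAYS];

/* Hardware compares all WAYS tags of the selected set in parallel; this loop
 * is the software analogue of those comparators.  The matching way selects
 * the data through the multiplexer. */
int sa_lookup(uint32_t addr)
{
    uint32_t set = (addr >> BLOCK_BITS) & (NUM_SETS - 1);
    uint32_t tag = addr >> (BLOCK_BITS + SET_BITS);
    for (int way = 0; way < WAYS; way++)
        if (tags[set][way].valid && tags[set][way].tag == tag)
            return way;         /* hit: this way drives the data output */
    return -1;                  /* miss */
}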
Several alternative direct-mapped caches have been proposed for reducing interference misses. In Jouppi's victim cache scheme [7], a small fully associative cache provides some extra lines for data removed from the direct-mapped cache due to misses. This scheme requires a sizeable victim cache for adequate performance because it must store all conflicting data blocks, and it requires two or more accesses to fetch a conflicting datum. Unlike the schemes discussed next, the victim cache does not grow with the primary cache, and hence it is not very effective for large primary caches.
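A rough sketch of the victim-cache probe that follows a primary miss is shown below. The victim-cache size and the identifiers are assumptions for illustration; the swap with the block evicted from the direct-mapped frame is only indicated in a comment.

#include <stdint.h>
#include <stdbool.h>

#define VICTIM_ENTRIES 4   /* small fully associative victim cache (assumed size) */

typedef struct {
    bool     valid;
    uint32_t block_addr;   /* full block address: fully associative, so no index field */
} victim_entry_t;

static victim_entry_t victim[VICTIM_ENTRIES];

/* On a primary (direct-mapped) miss, all victim-cache tags are searched.
 * A hit means the conflicting block was evicted recently; it is swapped back
 * with the block currently occupying the direct-mapped frame, so fetching a
 * conflicting datum still costs an extra access relative to a primary hit. */
int victim_lookup(uint32_t block_addr)
{
    for (int i = 0; i < VICTIM_ENTRIES; i++)
        if (victim[i].valid && victim[i].block_addr == block_addr)
            return i;   /* victim hit: swap with the evicted primary block */
    return -1;          /* victim miss: go to the next level of the hierarchy */
}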
Kessler et al. [6] propose inexpensive implementations of set-associative caches that place the multiple blocks of a set in sequential locations of cache memory. Tag checks, done serially, avoid the wide datapath requirements of conventional set-associative caches. This study focused on reducing implementation cost: with d-way set associativity, only about 1/d of the references succeed on the first access. The first access can usually be made a hit by using the MRU scheme proposed in [4].
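The serial probing can be sketched as follows; the associativity d = 2 and the other parameters are assumptions for the example and are not taken from [6].

#include <stdint.h>
#include <stdbool.h>

#define D           2          /* associativity (assumed) */
#define BLOCK_BITS  5
#define SET_BITS    7
#define NUM_SETS    (1u << SET_BITS)

typedef struct {
    bool     valid;
    uint32_t tag;
} frame_t;

/* The d blocks of a set occupy d consecutive locations of one cache array,
 * so tags are probed one after another instead of with d parallel comparators. */
static frame_t frames[NUM_SETS * D];

/* Returns the number of sequential probes needed (1..D), or 0 on a miss.
 * Only references that match on the first probe enjoy the direct-mapped
 * access time; without an MRU prediction, about 1/d of them do. */
int serial_lookup(uint32_t addr)
{
    uint32_t set = (addr >> BLOCK_BITS) & (NUM_SETS - 1);
    uint32_t tag = addr >> (BLOCK_BITS + SET_BITS);
    for (int i = 0; i < D; i++) {
        frame_t *f = &frames[set * D + i];
        if (f->valid && f->tag == tag)
            return i + 1;       /* hit after i+1 serial probes */
    }
    return 0;                    /* miss */
}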
The MRU cache of So et al. [4] provides a scheme for speeding up set-associative cache accesses. It maintains a few bits with each cache set indicating the most recently used block in the set. An access to a given set immediately reads out its MRU block, betting on the likelihood that it is the desired block; if it is not, an associative search has to be performed. The MRU information must be available before the access, however, which may take an extra cycle. A remedy for this could be taken from the PSA cache [1].
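A simplified software model of the MRU probe is given below. The set geometry and identifiers are assumptions, and the cycle needed to read the MRU bits is not modelled; only the probe order and the MRU update are shown.

#include <stdint.h>
#include <stdbool.h>

#define WAYS        4
#define BLOCK_BITS  5
#define SET_BITS    6
#define NUM_SETS    (1u << SET_BITS)

typedef struct {
    bool     valid;
    uint32_t tag;
} tag_entry_t;

static tag_entry_t tags[NUM_SETS][WAYS];
static uint8_t     mru_way[NUM_SETS];   /* a few bits per set naming the MRU block */

/* First probe the MRU way; only if that bet fails is the rest of the set
 * searched, and the MRU bits are updated to point at the block just hit. */
int mru_lookup(uint32_t addr)
{
    uint32_t set = (addr >> BLOCK_BITS) & (NUM_SETS - 1);
    uint32_t tag = addr >> (BLOCK_BITS + SET_BITS);

    int first = mru_way[set];
    if (tags[set][first].valid && tags[set][first].tag == tag)
        return first;                        /* fast first-access hit */

    for (int way = 0; way < WAYS; way++) {   /* slower associative search */
        if (way == first)
            continue;
        if (tags[set][way].valid && tags[set][way].tag == tag) {
            mru_way[set] = (uint8_t)way;     /* remember the new MRU block */
            return way;
        }
    }
    return -1;                               /* miss */
}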