Next: Introduction Up: ECE 3391 Computer Architecture Previous: ECE 3391 Computer Architecture

Abstract

This review examines ``Fast and Accurate Instruction Fetch and Branch Prediction'' by Calder et al. [1] (hereafter `the paper'). The paper is a logical continuation of currently proposed and implemented designs: architectures that couple very effective branch prediction mechanisms with modified branch target buffers (BTBs), largely based on variations of the two-level adaptive branch prediction of Yeh et al. [12, 13]. As the issue rate and pipeline depth of high-performance superscalar processors increase, an accurate branch predictor becomes more vital to delivering the potential performance of a wide-issue, deeply pipelined microarchitecture. A mispredicted branch can waste tens of cycles in a superscalar architecture, so faster and more accurate instruction fetch and branch prediction schemes are required. Calder et al. [1] propose a combination of less expensive mechanisms that can achieve better performance than BTBs while making the architecture simpler. The proposed schemes are (i) decoupling branch prediction from the BTB, which increases prediction accuracy for branches that miss in the BTB and reduces the information stored in the BTB; (ii) changing the BTB allocation policy so that ``fall-through'' branches are not stored in the BTB, which avoids displacing prediction information, on the premise that not-taken branches do not really benefit from a BTB and that static prediction is fairly accurate; and (iii) dispensing with the BTB altogether, in favor of a new architecture that determines instruction type and destination address through other means. Using the Branch Execution Penalty (BEP) metric, which measures the extra CPI and hence gives a system-level perspective [2], the paper shows that the simpler and more space-efficient `GAg' scheme is as effective as the `PAs' architecture. The paper's method of dispensing with the BTB entirely relies on an explicit branch displacement.
This new proposal challenges the BTB, but more work is needed to make it a feasible way of dispensing with the BTB. The combination of less expensive mechanisms relies on a number of design choices described in the paper. Trace-driven simulation is used to evaluate the proposed design, and further improvements are suggested. Overall, the paper proposes simpler mechanisms and demonstrates that they can be as effective as large and complex BTBs.
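
To make schemes (i) and (ii) concrete, the following is a minimal sketch, not the simulator used by Calder et al. It assumes a `GAg'-style predictor (one global history register indexing one global table of 2-bit counters) kept separate from a direct-mapped BTB that allocates entries for taken branches only. All names (`GAgPredictor`, `TakenOnlyBTB`, `simulate`), the table sizes, and the way mispredicts and misfetches are classified are assumptions made for illustration.

```python
class GAgPredictor:
    """Illustrative GAg-style predictor: a single global history register
    indexes a single global table of 2-bit saturating counters."""

    def __init__(self, history_bits=4):  # history length is an assumed value
        self.history_bits = history_bits
        self.history = 0                              # global history register
        self.counters = [1] * (1 << history_bits)     # start weakly not-taken

    def predict(self):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self.history] >= 2

    def update(self, taken):
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


class TakenOnlyBTB:
    """Illustrative direct-mapped BTB that never allocates an entry
    for a not-taken (fall-through) branch."""

    def __init__(self, entries=16):  # size is an assumed value
        self.entries = entries
        self.table = {}              # index -> (branch pc, target address)

    def lookup(self, pc):
        entry = self.table.get(pc % self.entries)
        return entry[1] if entry and entry[0] == pc else None

    def update(self, pc, taken, target):
        if taken:                    # fall-through branches are not stored
            self.table[pc % self.entries] = (pc, target)


def simulate(trace, predictor, btb):
    """Run a trace of (pc, taken, target) tuples.

    Assumed classification: a wrong direction is a mispredict; a correctly
    predicted taken branch whose target is not in the BTB is a misfetch."""
    mispredicts = misfetches = 0
    for pc, taken, target in trace:
        if predictor.predict() != taken:
            mispredicts += 1
        elif taken and btb.lookup(pc) != target:
            misfetches += 1
        predictor.update(taken)
        btb.update(pc, taken, target)
    return mispredicts, misfetches
```

Because the direction prediction comes from the separate counter table, a branch that misses in the BTB still receives a dynamic prediction, and not-taken branches never displace target entries. The two counts returned by `simulate` are the kind of event totals a penalty-based metric such as BEP would weight; the penalty values themselves are not given here.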



Annamalai Ramanathan
Fri Apr 4 20:04:23 EST 1997