Calder et al. [1] also suggest that the effectiveness of the BTB can be increased further by storing only `taken' branches in the BTB. If a branch is not in the BTB and it is `not taken,' it need not be entered in the BTB. If the `not taken' branch is already in the BTB, the prediction information is updated, but the LRU reference information is not; thus, `not taken' branches will be displaced more frequently than `taken' branches. Taken branches are always entered in the BTB. Not entering fall-through branches keeps the prediction information for other branches from being displaced, and `not taken' branches do not really benefit from the BTB anyway, since they simply fetch the following instruction [1]. If a branch is not found in the BTB, the architecture uses a static prediction mechanism (backward taken, forward not taken) that is fairly accurate.
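The allocation policy described above can be sketched in a few lines of Python. This is only an illustrative model, not the design evaluated in [1]: the class name TakenOnlyBTB, a fully associative organization, and a one-bit "last outcome" as the stored prediction information are all assumptions made for the example. The branch target is passed to predict() so that the static fallback can tell backward from forward branches.

```python
from collections import OrderedDict


class TakenOnlyBTB:
    """Hypothetical fully associative BTB with LRU replacement.

    Assumption: the stored prediction is just the last outcome (one bit).
    """

    def __init__(self, capacity):
        self.capacity = capacity
        # Maps branch address -> (target, last_outcome).
        # The least recently used entry is first, the most recent is last.
        self.entries = OrderedDict()

    def predict(self, pc, target):
        """Return True if the branch at pc is predicted taken."""
        if pc in self.entries:
            return self.entries[pc][1]
        # BTB miss: static rule, backward taken / forward not taken.
        return target < pc

    def update(self, pc, target, taken):
        """Record the resolved outcome of the branch at pc."""
        if pc in self.entries:
            # Hit: always update the prediction information.
            # Assigning to an existing key keeps its LRU position.
            self.entries[pc] = (target, taken)
            if taken:
                # Only taken branches refresh the LRU information.
                self.entries.move_to_end(pc)
        elif taken:
            # Miss on a taken branch: always allocate, evicting the LRU entry.
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)
            self.entries[pc] = (target, taken)
        # Miss on a not-taken branch: nothing is allocated.
```

In this model a branch that hits in the BTB but falls through keeps its old LRU position, so it becomes the next candidate for replacement, which is how `not taken' branches end up being displaced more often than `taken' ones.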
Calder et al. [1] also note that storing only branches that violate the static prediction rule may not be a good alternative, because it increases misfetches in programs with a large number of taken branches. They appear to have observed this in programs such as alvinn and eqntott. Calder et al. [1] report more improvement for the PAs method than for the GAg method, and explain this by reasoning that the PAs method is more sensitive to capacity misses.
It should be noted that the Intel Pentium and P6 use this type of allocation for their BTBs and claim to have the best branch prediction accuracy.