Calder et al. [1] argue, from the good performance of the GAg predictor, that some of the functionality of the BTB can be removed. They describe the BTB [1] as serving three purposes: (1) by virtue of an instruction's address being in the BTB, the instruction is recognized as a branch; (2) accurate prediction information can be associated with each BTB entry; and (3) the BTB provides precomputed destination and fall-through addresses for unconditional and conditional branches. The destination of return instructions can instead be predicted using a return stack. Calder et al. conclude from their experiments that if the instruction type can be determined by other means, the BTB can be dispensed with.
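To make the GAg observation concrete, here is a minimal sketch of a GAg predictor in Yeh and Patt's naming: one global history register indexing one global pattern history table (PHT) of 2-bit saturating counters. The history length, counter width and initial counter value are illustrative assumptions, not parameters taken from [1]. Note that `predict()` takes no branch address: the prediction state lives in the PHT rather than in per-branch BTB entries, so the BTB's role of holding per-entry prediction information is not needed with this scheme.

```python
class GAgPredictor:
    """GAg: a global history register (GHR) indexes a single global PHT."""

    def __init__(self, history_bits: int = 12) -> None:
        self.mask = (1 << history_bits) - 1
        self.ghr = 0  # outcomes of the most recent branches, newest in bit 0
        # 2-bit saturating counters, initialised to 1 ("weakly not taken");
        # values 2 and 3 predict taken.
        self.pht = [1] * (1 << history_bits)

    def predict(self) -> bool:
        # Indexed by global history only; the branch address is not used.
        return self.pht[self.ghr] >= 2

    def update(self, taken: bool) -> None:
        # Train the counter selected by the current history, then shift
        # the resolved outcome into the history register.
        counter = self.pht[self.ghr]
        self.pht[self.ghr] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask
```

For example, after feeding an alternating taken/not-taken sequence, each of the two recurring history patterns selects its own counter, so the predictor learns the alternation without knowing which branch produced it.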
They propose an architecture that determines the instruction type by other means and uses latched branches, avoiding the use of an adder in the datapath. Conventionally, the displacement stored in a branch instruction is sign-extended to the width of the program counter and added to the program counter, so each branch can directly address instructions within roughly ±2^(d-1) instructions of the PC, where d is the width of the displacement field. Many other branch encodings have been proposed. One, due to Katevenis, uses the branch displacement field as the least significant bits of the branch target address; the sign of the offset and the carry out of the lower-bit addition are computed by the compiler and encoded in the instruction. As shown in [1], this can lead to stalls when these encoded bits do not agree.
In [1], an explicit displacement is used instead of a PC-relative one, so that the adder can be removed from the datapath. This restricts a branch to a span of at most 2^d instructions; to branch outside this span, an indirect jump must be used. The architecture relies on the compiler and linker to compensate for the limited branch range. Moreover, PC-relative code relocation, which is important for shared libraries, is not possible without dynamic relinking. On the Alpha AXP, whose branch displacement field is 21 bits wide, a single explicit displacement spans a segment of 8MB (2^21 four-byte instructions). Programs larger than 64KB are common, which makes architectures with small segments undesirable. Calder et al. [1] restrict branches to an 8MB segment, which is not a constraint today but cannot be ruled out as one in the future.
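The difference between the two encodings can be sketched as follows, assuming fixed 4-byte instructions and a 21-bit displacement field as on the Alpha AXP. The function names are hypothetical. `pc_relative_target` needs a full-width addition; `explicit_target` only keeps the upper PC bits and substitutes the displacement for the low bits, which is pure wiring but can never leave the segment containing the branch.

```python
INSTR_BYTES = 4  # fixed-length 32-bit instructions


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def pc_relative_target(pc: int, disp: int, disp_bits: int = 21, pc_bits: int = 64) -> int:
    """Conventional encoding: sign-extend, scale to bytes, and ADD to the PC.

    This needs a full-width adder on the fetch path. For simplicity the base
    is the branch's own address; real ISAs differ (Alpha uses the updated PC).
    """
    offset = sign_extend(disp, disp_bits) * INSTR_BYTES
    return (pc + offset) & ((1 << pc_bits) - 1)


def segment_bytes(disp_bits: int = 21) -> int:
    """Size of the region an explicit displacement can reach."""
    return (1 << disp_bits) * INSTR_BYTES


def explicit_target(pc: int, disp: int, disp_bits: int = 21) -> int:
    """Explicit encoding: keep the PC's upper bits and substitute the
    displacement for the low bits. No adder, but the target always lies
    in the segment that contains the branch."""
    seg = segment_bytes(disp_bits)
    field = disp & ((1 << disp_bits) - 1)
    return (pc & ~(seg - 1)) | (field * INSTR_BYTES)
```

With `disp_bits=21`, `segment_bytes()` is 8MB. No displacement value makes `explicit_target` produce an address in a different 8MB segment, which is exactly the restriction that forces an indirect jump for longer branches.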
When the instruction encoding itself does not have unique bits that identify branches, Calder et al. propose storing this information in extra bits in the instruction cache. There is precedent for this: the Alpha AXP 21064 stores branch prediction information alongside the instruction words in its on-chip instruction cache.
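A minimal sketch of the predecode idea: on a miss, each word of the filled line is decoded once and an is-branch bit is stored beside it, so every later hit returns the instruction together with its type. The opcode rule in `is_branch_word`, the 8-word line and the direct-mapped organisation are hypothetical choices for illustration; they are not taken from [1] or from the 21064.

```python
from dataclasses import dataclass

WORDS_PER_LINE = 8  # hypothetical line size


def is_branch_word(word: int) -> bool:
    # Hypothetical decoder: major opcode (bits 31..26) of 0x30 or above is a branch.
    return ((word >> 26) & 0x3F) >= 0x30


@dataclass
class ICacheLine:
    tag: int
    words: list[int]
    branch_bits: list[bool]  # one predecoded "is a branch" bit per word


class PredecodedICache:
    """Direct-mapped instruction cache whose lines carry predecode bits."""

    def __init__(self, memory: list[int], num_lines: int = 64) -> None:
        self.memory = memory          # backing store, one 32-bit word per instruction
        self.num_lines = num_lines
        self.lines = [None] * num_lines  # each slot: ICacheLine or None

    def fetch(self, addr: int) -> tuple[int, bool]:
        word_index = addr // 4
        line_no = word_index // WORDS_PER_LINE
        index, tag = line_no % self.num_lines, line_no // self.num_lines
        line = self.lines[index]
        if line is None or line.tag != tag:
            # Miss: fill the line and predecode every word exactly once.
            start = line_no * WORDS_PER_LINE
            words = self.memory[start:start + WORDS_PER_LINE]
            line = ICacheLine(tag, words, [is_branch_word(w) for w in words])
            self.lines[index] = line
        offset = word_index % WORDS_PER_LINE
        # A hit yields the instruction and its type with no BTB lookup.
        return line.words[offset], line.branch_bits[offset]
```

The cost is one extra bit per cached instruction; the benefit is that the fetch unit learns "this is a branch" from the same array access that delivers the instruction.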
In the BTB architecture, the fetch address is presented to both the instruction cache and the BTB, and additional information is generally stored in the BTB to cache the prediction for the next branch.
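For comparison, a minimal sketch of a direct-mapped BTB looked up with the fetch address. A hit tells the fetch unit the instruction is a branch, the 2-bit counter stands in for the cached prediction, and the stored target is the precomputed destination. The table size, the counter and the allocate-on-taken policy are illustrative assumptions, not the configuration evaluated in [1].

```python
from dataclasses import dataclass


@dataclass
class BTBEntry:
    tag: int
    target: int       # precomputed destination address
    counter: int = 2  # 2-bit saturating counter; 2 or 3 predicts taken


class BTB:
    """Direct-mapped branch target buffer indexed by the fetch address."""

    def __init__(self, entries: int = 256) -> None:
        self.entries = entries
        self.table = [None] * entries  # each slot: BTBEntry or None

    def _split(self, pc: int) -> tuple[int, int]:
        word = pc >> 2  # 4-byte instructions
        return word % self.entries, word // self.entries

    def next_fetch(self, pc: int) -> int:
        """Next fetch address, produced in parallel with the I-cache read."""
        index, tag = self._split(pc)
        entry = self.table[index]
        if entry is not None and entry.tag == tag and entry.counter >= 2:
            return entry.target  # hit: known branch, predicted taken
        return pc + 4            # miss or predicted not taken: fall through

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Train the BTB once the branch resolves."""
        index, tag = self._split(pc)
        entry = self.table[index]
        if entry is None or entry.tag != tag:
            # Allocate only on taken branches (one common policy among several).
            if taken:
                self.table[index] = BTBEntry(tag, target)
            return
        entry.target = target
        entry.counter = min(entry.counter + 1, 3) if taken else max(entry.counter - 1, 0)
```

Because the BTB is tagged, it must be read and compared in the same cycle as the instruction cache; this access time is one side of the trade-off Calder et al. discuss.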
In the proposed architecture, the instruction address is presented to the instruction cache, the PHT, and an adder that computes the fall-through address (PC + 4). This leaves only a single gate layer and a multiplexor between the instruction cache and the next instruction fetch address. Calder et al. acknowledge that a detailed design and further experimentation are needed to decide on the trade-offs between BTB and instruction cache access times.
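The next-fetch-address selection in the proposed architecture can be sketched as follows, under stated assumptions: only conditional branches are modelled, the displacement sits in the low 21 bits of the fetched word (as in Alpha's branch format), `is_branch` comes from predecode bits in the instruction cache, and `predict_taken` comes from the PHT. We assume the single gate layer combines the predecode bit with the PHT prediction. Return-stack handling, unconditional branches and indirect jumps are omitted; this illustrates the dataflow, not the design in [1].

```python
INSTR_BYTES = 4


def next_fetch_address(pc: int, word: int, is_branch: bool,
                       predict_taken: bool, disp_bits: int = 21) -> int:
    """Select the next fetch address without a branch-target adder (sketch)."""
    # Computed in parallel with the I-cache access: the only adder left.
    fall_through = pc + INSTR_BYTES
    # Explicit target: upper PC bits concatenated with the displacement
    # field of the fetched word. This is wiring, not arithmetic.
    seg = (1 << disp_bits) * INSTR_BYTES
    field = word & ((1 << disp_bits) - 1)
    target = (pc & ~(seg - 1)) | (field * INSTR_BYTES)
    # One gate (AND of predecode bit and PHT prediction) drives the
    # select input of a 2-input multiplexor.
    select_target = is_branch and predict_taken
    return target if select_target else fall_through
```

Both candidate addresses are ready as soon as the instruction word leaves the cache, so the critical path after the cache read is just the select gate and the multiplexor.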