
FINDING DIFFICULT PREDICTABLE BRANCHES

Lucian Vintan, Arpad Gellert, Adrian Florea, Marius Oancea
Lucian Blaga University of Sibiu, Computer Science Department, Emil Cioran Street, No. 4, 550025 Sibiu, Romania
E-mail: {lucian.vintan, arpad.gellert, adrian.florea, marius.oancea}@ulbsibiu.ro

1. Introduction

Two trends are further increasing the importance of branch prediction. From an architectural point of view, processors are getting wider and pipelines deeper, allowing more aggressive clock rates in order to improve overall performance. A very high frequency implies a very short clock cycle, so the prediction can no longer be delivered in one or at most two clock cycles, which is the prediction latency of current commercial processors (see the Alpha 21264 branch predictor) [C]. A very wide superscalar processor also loses performance on a misprediction, when the CPU context must be recovered and the correct path (re)issued. The performance of a Pentium 4-equivalent processor degrades by 0.45% per additional misprediction cycle, so overall performance is very sensitive to branch prediction. From a technological point of view, modern high-end processors use an array of tables for branch direction and target prediction [D]. These tables are quite large (352K bits for the direction predictor in the Alpha EV8) and are accessed every cycle, resulting in significant energy consumption, sometimes more than 10% of the total chip power [E]. Despite the ability of neural branch predictors to achieve very high prediction rates, their complexity in terms of latency, large number of adder circuits, area and power is still an obstacle to the industrial adoption of this technique. The path-based neural predictors [F] improve the instructions-per-cycle (IPC) rate of an aggressively clocked microarchitecture by 16% over the original perceptron predictor. A branch may be linearly inseparable as a whole, but piecewise linearly separable with respect to the distinct associated program paths. In other words, the path-based neural predictor combines path history with pattern history, resulting in superior learning capabilities compared with a neural predictor that relies only on pattern history.

2. Simulation Methodology

Our first goal is to find the difficult-to-predict branches in the SPEC2000 benchmarks. We consider a branch in a certain context difficult to predict if it is unbiased [B], i.e., the numbers of taken and not-taken outcomes that follow the branch's context are close (the closer they are, the more unbiased the branch), and the taken and not-taken outcomes are shuffled. The second goal is to improve the prediction accuracy for branches with a low polarization rate by introducing new feature sets that increase their polarization rate and, therefore, their predictability. A feature is the binary context, on p bits, of some prediction information such as local history, global history or path. Each static branch has k associated dynamic contexts in which it can appear ($k \le 2^p$). A context instance is a dynamic branch executed in the respective context. We introduce the polarization index (P) of a certain branch context:

$$
P(S_i) = \max(f_0, f_1) =
\begin{cases}
f_0, & f_0 \ge 0.5 \\
f_1, & f_0 < 0.5
\end{cases}
\qquad (1)
$$

where:

$S = \{S_1, S_2, \ldots, S_k\}$ is the set of distinct contexts that appear during all branch instances;
$k$ is the number of distinct contexts, $k \le 2^p$, where $p$ is the length of the binary context;
$f_0 = \frac{T}{T + NT}$ and $f_1 = \frac{NT}{T + NT}$, where $NT$ is the number of not-taken branch instances corresponding to context $S_i$ and $T$ is the number of taken branch instances corresponding to context $S_i$, $i = 1, 2, \ldots, k$; obviously $f_0 + f_1 = 1$.

If $P(S_i) = 1$, the context $S_i$ is completely biased (100%), and thus the associated branch is highly predictable. If $P(S_i) = 0.5$, the context $S_i$ is totally unbiased, and the associated branch is not predictable if its taken and not-taken outcomes are shuffled.
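To make equation (1) concrete, the following Python sketch counts T and NT per context and derives P(S_i). It is an illustration under stated assumptions, not the authors' simulator: the trace is a hypothetical list of `(pc, outcome)` pairs (1 = taken), the context is the PC plus a p-bit local history that starts at zero, and `context_counts`, `polarization` and `unbiased_contexts` are hypothetical helper names.

```python
from collections import defaultdict


def context_counts(trace, p):
    """Count taken (T) and not-taken (NT) outcomes per branch context.

    trace: iterable of (pc, outcome) pairs, outcome 1 = taken, 0 = not taken
           (hypothetical trace format). The context of a dynamic branch is its
           PC plus the last p outcomes of the same static branch.
    """
    mask = (1 << p) - 1
    history = defaultdict(int)            # per-PC local history, assumed to start at 0
    counts = defaultdict(lambda: [0, 0])  # (pc, history) -> [T, NT]
    for pc, outcome in trace:
        ctx = (pc, history[pc])
        counts[ctx][0 if outcome else 1] += 1
        history[pc] = ((history[pc] << 1) | outcome) & mask
    return counts


def polarization(t, nt):
    """Equation (1): P(S_i) = max(f0, f1), f0 = T/(T+NT), f1 = NT/(T+NT)."""
    return max(t, nt) / (t + nt)


def unbiased_contexts(counts, threshold=0.95):
    """Contexts whose polarization index is below the threshold (0.95 in the text)."""
    return {ctx for ctx, (t, nt) in counts.items() if polarization(t, nt) < threshold}


trace = [(0x40, 1), (0x40, 1), (0x40, 0), (0x40, 1)]
counts = context_counts(trace, p=1)
print(dict(counts))               # {(64, 0): [2, 0], (64, 1): [1, 1]}
print(unbiased_contexts(counts))  # {(64, 1)}
```

With p = 16, each static branch can produce at most 2^16 distinct contexts, matching the bound k ≤ 2^p.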

As Figure 1 shows, we analyze different feature sets used by present-day branch predictors in order to reduce the list of unbiased branch contexts (contexts with low polarization P). A given feature set is evaluated only on the branches found unbiased with the previous feature sets, not on all branches of the benchmark, because the remaining branches were already solved by the previous feature sets. For the final list of unbiased branches we will try to find new feature sets that further improve their polarization index.
| Feature Set | Simulated Branches | List of Unpolarized Branches | Used Branch Predictor |
|---|---|---|---|
| Local history | All conditional branches | Branches unpolarized on local history | PAg |
| Global history | Branches unpolarized on local history | Branches unpolarized on global history | GAg |
| Global history XOR branch address | Branches unpolarized on global history | Branches unpolarized on GHR XOR PC | Gshare |
| A certain feature set | Branches unpolarized on the previous feature sets | Branches unpolarized on this feature set | Predictor working with this feature set |

Figure 1. Simulation Methodology
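The cascade of Figure 1 can be sketched as a sequence of filters. In this hypothetical Python sketch each dynamic branch is a dict holding one precomputed context per feature set (keys such as `"LH16"` or `"GH16"` are illustrative names, and each context value is assumed to already include the branch PC) plus its `outcome`; a feature set's polarization is computed only over the instances that survived the previous feature sets, as the methodology prescribes.

```python
from collections import defaultdict


def polarization(t, nt):
    """Equation (1): max(f0, f1) for T taken and NT not-taken outcomes."""
    return max(t, nt) / (t + nt)


def cascade(instances, features, threshold=0.95):
    """Filter dynamic branches through feature sets evaluated in order.

    instances: list of dicts; each maps a feature-set name to that branch's
               context under the feature set (PC included) and holds an
               "outcome" key (1 = taken). Hypothetical data layout.
    features:  feature-set names in evaluation order, e.g. the Figure 2
               order ["LH16", "GH16", "XOR16", "LH20"].
    Returns the instances still unbiased after the last feature set and the
    number of surviving instances after each stage.
    """
    survivors, sizes = list(instances), []
    for feat in features:
        counts = defaultdict(lambda: [0, 0])  # context -> [T, NT]
        for inst in survivors:
            counts[inst[feat]][0 if inst["outcome"] else 1] += 1
        unbiased = {ctx for ctx, (t, nt) in counts.items()
                    if polarization(t, nt) < threshold}
        # Only branches unbiased on this feature set move on to the next one.
        survivors = [inst for inst in survivors if inst[feat] in unbiased]
        sizes.append(len(survivors))
    return survivors, sizes


insts = [
    {"LH16": ("b", 0), "GH16": ("b", 0), "outcome": 1},
    {"LH16": ("b", 0), "GH16": ("b", 1), "outcome": 0},
    {"LH16": ("b", 1), "GH16": ("b", 0), "outcome": 1},
    {"LH16": ("b", 1), "GH16": ("b", 0), "outcome": 1},
]
survivors, sizes = cascade(insts, ["LH16", "GH16"])
print(sizes)  # [2, 0]
```

In this toy input, local history leaves one unbiased context (two instances), and global history then separates those two instances into fully biased contexts.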

More precisely, as Figure 2 shows, a context of a branch is evaluated only if that branch was unbiased for all its previously analyzed contexts. Thus, the final list of unbiased branches contains only the branches that were unbiased for all their contexts of all lengths (16 to 28 bits).
[Figure 2 diagram: LH 16 bits → GH 16 bits → XOR PC 16 bits → LH 20 bits → GH 20 bits → XOR PC 20 bits → ... → LH p bits → GH p bits → XOR p bits]

Figure 2. Reducing the number of unbiased branches through feature set extension.

We concentrated only on benchmarks whose percentage of unbiased branch context instances (obtained with equation (2)) is greater than a certain threshold (T = 1%); for benchmarks with less than 1% unbiased context instances, the potential improvement in prediction accuracy is not significant. If unpredictable branch context instances account for 1% of all instances, solving all of them would increase the prediction accuracy by at most 1%.

$$
\frac{NUB_i}{NB_i} \ge T = 0.01 \qquad (2)
$$

where $NUB_i$ is the total number of unbiased branch context instances of benchmark $i$, and $NB_i$ is the number of dynamic branches of benchmark $i$ (i.e., the total number of branch context instances).
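As a worked check of equation (2), the sketch below applies the 1% threshold to the mcf figures reported in Tables 1 and 2 (NB_i from the Dynamic Branches column, NUB_i from the Unbiased Context Instances column); `is_selected` is a hypothetical helper, and treating the threshold as inclusive is an assumption.

```python
def is_selected(nub, nb, t=0.01):
    """Equation (2): keep benchmark i when NUB_i / NB_i reaches the threshold T."""
    return nub / nb >= t


# mcf on 16-bit local history: NB_i from Table 1, NUB_i from Table 2.
nb_mcf, nub_mcf = 118321124, 6812313
print(f"{100 * nub_mcf / nb_mcf:.2f}%")  # 5.76%
print(is_selected(nub_mcf, nb_mcf))      # True
```

The computed 5.76% matches the percentage Table 2 reports for mcf, well above the 1% threshold.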

3. Simulation Results

We started our study by evaluating the branch contexts of the SPEC2000 benchmarks on a local branch history of 16 bits. All simulation results are reported on 1 billion instructions, after skipping the first 300 million instructions. For each benchmark, Table 1 gives the percentages of branch contexts whose polarization indexes fall into five different intervals.

| SPEC2000 Benchmark | Dynamic Branches | Static Branches | P ∈ [0.5, 0.6) [%] | P ∈ [0.6, 0.7) [%] | P ∈ [0.7, 0.8) [%] | P ∈ [0.8, 0.9) [%] | P ∈ [0.9, 1.0] [%] |
|---|---|---|---|---|---|---|---|
| mcf | 118321124 | 370 | 10.06 | 10.50 | 8.17 | 8.52 | 62.74 |
| parser | 85382841 | 1777 | 6.67 | 5.90 | 3.68 | 4.56 | 79.19 |
| bzip | 42591123 | 211 | 15.86 | 16.50 | 8.58 | 6.94 | 52.12 |
| gzip | 71504537 | 136 | 15.08 | 15.63 | 11.03 | 9.50 | 48.76 |
| twolf | 70616018 | 239 | 14.49 | 12.72 | 6.92 | 5.34 | 60.54 |
| gcc | 90868660 | 17248 | 3.06 | 2.68 | 1.72 | 2.30 | 90.24 |
| Average | 79880717 | 3330.16 | 10.87 | 10.65 | 6.68 | 6.19 | 65.59 |

Table 1. Polarization rates of branch contexts on local history of 16 bits. The column Dynamic Branches contains the number of all dynamic conditional branches for each benchmark. The column Static Branches contains the number of static branches for each benchmark.

For each benchmark, we used equation (1) to generate a list of unbiased branch contexts, i.e., contexts with polarization less than 0.95. We considered branch contexts with polarization greater than 0.95 to be predictable: they will obtain relatively high prediction accuracies, around 0.95, so the potential improvement of their prediction accuracy is low. Table 2 compares the prediction accuracies obtained with the original PAg predictor (Figure 3) on all branches and with PAg predicting only the unbiased contexts. Both predictors have the same configuration: 1024 entries in the first level (L1size = 1024), 16-bit LHRs (W = 16), and 2^16 entries in the second level (L2size = 65536). For the PAg predictor we used the sim-bpred simulator from Simplesim-3.0 with the following options: -bpred 2lev -bpred:2lev 1024 65536 16 0.
| SPEC2000 Benchmark | PAg (Address) | PAg (Direction) | PAg/Unbiased Contexts (Address) | PAg/Unbiased Contexts (Direction) | Unbiased Context Instances (P<0.95) |
|---|---|---|---|---|---|
| mcf | 0.9838 | 0.9838 | 0.8267 | 0.8267 | 6812313 (5.76%) |
| parser | 0.9367 | 0.9367 | 0.7748 | 0.7748 | 17589658 (20.60%) |
| bzip | 0.9010 | 0.9010 | 0.6913 | 0.6913 | 11252986 (26.42%) |
| gzip | 0.8905 | 0.8905 | 0.7423 | 0.7423 | 27692102 (38.73%) |
| twolf | 0.8490 | 0.8490 | 0.7078 | 0.7078 | 31763071 (44.98%) |
| gcc | 0.9284 | 0.9299 | 0.8049 | 0.8059 | 9809360 (10.80%) |
| Average | 0.9149 | 0.9151 | 0.7579 | 0.7581 | 17486582 (24.55%) |

Table 2. Prediction accuracy on the unbiased branch contexts for local history of 16 bits. The column Unbiased Context Instances contains, for each benchmark, the number of unbiased context instances and their percentage relative to all context instances (dynamic branches).

[Figure 3 diagram: Per-address Branch History Table (PBHT) with local history registers LHR 0 to LHR L1size-1 of W bits each, selected by log2 L1size low-order PC bits; Global Pattern History Table (GPHT) with L2size entries of prediction bits.]

Figure 3. The PAg branch predictor scheme.
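The PAg organization of Figure 3 can be sketched in a few lines of Python. This is a simplified model, not sim-bpred itself: it mirrors the quoted configuration (L1size = 1024, W = 16, L2size = 65536, no XOR) but assumes 2-bit saturating counters initialized to weakly taken, zero-initialized histories, and PCs already stripped of their instruction-alignment bits. `PAg` and `accuracy` are hypothetical names.

```python
class PAg:
    """Minimal PAg sketch: per-address history registers, one global PHT."""

    def __init__(self, l1size=1024, l2size=65536, w=16):
        # l1size and l2size are assumed to be powers of two (used as masks).
        self.l1size, self.l2size = l1size, l2size
        self.hmask = (1 << w) - 1
        self.lhr = [0] * l1size  # PBHT: one W-bit local history per row
        self.pht = [2] * l2size  # GPHT: 2-bit counters, weakly taken (assumed)

    def _index(self, pc):
        row = pc & (self.l1size - 1)  # low-order PC bits select the LHR
        return row, self.lhr[row] & (self.l2size - 1)

    def predict(self, pc):
        _, idx = self._index(pc)
        return self.pht[idx] >= 2  # counter MSB set means "taken"

    def update(self, pc, taken):
        row, idx = self._index(pc)
        c = self.pht[idx]
        self.pht[idx] = min(c + 1, 3) if taken else max(c - 1, 0)
        self.lhr[row] = ((self.lhr[row] << 1) | int(taken)) & self.hmask


def accuracy(pred, trace):
    """Fraction of correct direction predictions over (pc, taken) pairs."""
    hits = 0
    for pc, taken in trace:
        hits += pred.predict(pc) == taken
        pred.update(pc, taken)
    return hits / len(trace)


trace = [(0x40, i % 2 == 0) for i in range(200)]  # one strictly alternating branch
print(accuracy(PAg(), trace))
```

On this alternating branch the model mispredicts only while its 16-bit history is still filling up; after that, the two recurring histories each map to their own counter.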

As Table 2 shows, the bzip, gzip and twolf benchmarks are difficult to predict with the original PAg predictor (prediction accuracies of about 0.90 or lower on all branches). The low prediction accuracies obtained with PAg on the unbiased contexts alone, together with the high percentages of unbiased contexts, show that the prediction accuracy can be significantly improved. We continue our work by analyzing a global branch history of 16 bits only on the local branch contexts found unbiased for local branch history (see the last column of Table 2). That is, a dynamic branch is used in our evaluations only if its 16-bit local context is one of the unbiased local contexts. For each benchmark, Table 3 gives the percentages of branch contexts whose polarization indexes fall into five different intervals. The column Unbiased Dynamic Branches contains the number of simulated dynamic branches (the local branch context instances found unbiased for local history) and their percentage relative to all dynamic branches. The column Unbiased Static Branches gives the number of static branches simulated for each benchmark.
| SPEC2000 Benchmark | Unbiased Dynamic Branches | Unbiased Static Branches | P ∈ [0.5, 0.6) [%] | P ∈ [0.6, 0.7) [%] | P ∈ [0.7, 0.8) [%] | P ∈ [0.8, 0.9) [%] | P ∈ [0.9, 1.0] [%] |
|---|---|---|---|---|---|---|---|
| mcf | 6812313 (5.76%) | 25 | 14.57 | 11.94 | 9.25 | 8.13 | 56.10 |
| parser | 17589658 (20.60%) | 707 | 6.87 | 6.98 | 5.71 | 6.18 | 74.26 |
| bzip | 11252986 (26.42%) | 83 | 19.34 | 16.62 | 14.36 | 13.80 | 35.88 |
| gzip | 27692102 (38.73%) | 62 | 8.98 | 10.09 | 9.01 | 10.88 | 61.04 |
| twolf | 31763071 (44.98%) | 132 | 8.46 | 7.43 | 6.39 | 9.89 | 67.83 |
| gcc | 9809360 (10.80%) | 4923 | 4.02 | 4.13 | 3.14 | 3.56 | 85.15 |
| Average | 17486582 (24.55%) | 988.66 | 10.37 | 9.53 | 7.97 | 8.74 | 63.37 |

Table 3. Polarization rates of branch contexts on global history of 16 bits evaluating only the unbiased local branch contexts of 16 bits

Continuing the previous methodology, for each benchmark we used equation (1) to generate a list of branch contexts unbiased on both local and global history of 16 bits, i.e., with polarization less than 0.95. Table 4 compares the prediction accuracies obtained with the original GAg predictor (Figure 4) on all branches and with GAg on these unbiased branch contexts only. The last column contains the number of unbiased branch context instances and their percentage relative to all dynamic branches. Both predictors have the same configuration: one global history register of 16 bits and 2^16 entries in the second level (L2size = 65536). For the GAg predictor we used the sim-bpred simulator from Simplesim-3.0 with the following options: -bpred 2lev -bpred:2lev 1 65536 16 0.
| SPEC2000 Benchmark | GAg (Address) | GAg (Direction) | GAg/Unbiased Contexts (Address) | GAg/Unbiased Contexts (Direction) | Unbiased Context Instances (P<0.95) |
|---|---|---|---|---|---|
| mcf | 0.9850 | 0.9850 | 0.8250 | 0.8250 | 3887052 (3.28%) |
| parser | 0.9452 | 0.9452 | 0.7010 | 0.7011 | 11064817 (12.95%) |
| bzip | 0.9080 | 0.9080 | 0.6629 | 0.6629 | 9969701 (23.40%) |
| gzip | 0.9241 | 0.9241 | 0.7363 | 0.7363 | 20659305 (28.89%) |
| twolf | 0.8564 | 0.8564 | 0.6591 | 0.6591 | 22893014 (32.41%) |
| gcc | 0.9492 | 0.9512 | 0.7356 | 0.7365 | 3563776 (3.92%) |
| Average | 0.9279 | 0.9283 | 0.7199 | 0.7201 | 12006278 (17.48%) |

Table 4. Prediction accuracy on the unbiased branch contexts for local and global history of 16 bits.

[Figure 4 diagram: the W-bit global history register (GHR) indexes the Global Pattern History Table (GPHT) with L2size entries of prediction bits.]

Figure 4. The GAg branch predictor scheme.
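A GAg model differs from the PAg organization only in using one global history register instead of a per-address table. The Python sketch below (hypothetical `GAg` class, assuming 2-bit saturating counters initialized to weakly taken and a zero-initialized GHR) also builds a hypothetical correlated pair of branches to show why global history helps: branch B repeats the outcome of a random branch A, so B's own history is random while the GHR exposes the correlation.

```python
import random


class GAg:
    """Minimal GAg sketch: one W-bit GHR indexes one global PHT."""

    def __init__(self, l2size=65536, w=16):
        self.l2size = l2size
        self.hmask = (1 << w) - 1
        self.ghr = 0             # single global history register (assumed zero at start)
        self.pht = [2] * l2size  # 2-bit counters, weakly taken (assumed)

    def _index(self):
        return self.ghr & (self.l2size - 1)  # GAg ignores the branch address

    def predict(self, pc):
        return self.pht[self._index()] >= 2

    def update(self, pc, taken):
        idx = self._index()
        c = self.pht[idx]
        self.pht[idx] = min(c + 1, 3) if taken else max(c - 1, 0)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.hmask


def per_branch_accuracy(pred, trace):
    """Accuracy per PC over a list of (pc, taken) pairs."""
    hits, total = {}, {}
    for pc, taken in trace:
        hits[pc] = hits.get(pc, 0) + (pred.predict(pc) == taken)
        total[pc] = total.get(pc, 0) + 1
        pred.update(pc, taken)
    return {pc: hits[pc] / total[pc] for pc in total}


# Hypothetical pair: A (0x40) is random, B (0x80) always repeats A's outcome.
rng = random.Random(1)
trace = []
for _ in range(5000):
    a = rng.random() < 0.5
    trace += [(0x40, a), (0x80, a)]
acc = per_branch_accuracy(GAg(), trace)
print(acc)
```

Branch A stays near chance because its outcome is random, while branch B becomes almost perfectly predictable once each recurring global history has been seen.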

Comparing Tables 2 and 4, we observe that global branch history reduced the average percentage of unbiased branch context instances from 24.55% to 17.48%, and that the average prediction accuracy on all branches increased from 0.9149 with the PAg predictor to 0.9279 with the GAg predictor. However, the branch contexts that remain unbiased (for local and global history of 16 bits) are more difficult to predict: on these branch contexts, the GAg predictor reaches an average prediction accuracy of only 0.72. The next feature set we analyzed is the XOR between a 16-bit global branch history and the lower part of the branch address (PC bits 18 to 3). We again used only the branch contexts found unbiased for the previous feature sets (local and global branch history of 16 bits). That is, a dynamic branch is used in our evaluations only if its 16-bit local context is one of the unbiased local contexts (Table 2) and its 16-bit global context is one of the unbiased global contexts (Table 4). For each benchmark, Table 5 gives the percentages of branch contexts whose polarization indexes fall into five different intervals. The column Unbiased Dynamic Branches contains the number of simulated dynamic branches and their percentage relative to all dynamic branches. The column Unbiased Static Branches gives the number of static branches simulated for each benchmark.
| SPEC2000 Benchmark | Unbiased Dynamic Branches | Unbiased Static Branches | P ∈ [0.5, 0.6) [%] | P ∈ [0.6, 0.7) [%] | P ∈ [0.7, 0.8) [%] | P ∈ [0.8, 0.9) [%] | P ∈ [0.9, 1.0] [%] |
|---|---|---|---|---|---|---|---|
| mcf | 3887069 (3.28%) | 19 | 30.78 | 25.21 | 19.54 | 17.17 | 7.30 |
| parser | 11065068 (12.95%) | 504 | 23.84 | 24.27 | 19.87 | 21.56 | 10.46 |
| bzip | 9969757 (23.40%) | 76 | 28.45 | 24.43 | 21.12 | 20.30 | 5.70 |
| gzip | 20659343 (28.89%) | 51 | 20.34 | 22.85 | 20.43 | 24.66 | 11.72 |
| twolf | 22893103 (32.41%) | 112 | 21.11 | 18.53 | 15.93 | 24.69 | 19.75 |
| gcc | 3565197 (3.92%) | 2642 | 24.05 | 24.93 | 18.93 | 21.46 | 10.63 |
| Average | 12006590 (17.48%) | 567.33 | 24.76 | 23.37 | 19.30 | 21.64 | 10.92 |

Table 5. Polarization rates on the XOR between global history and branch address on 16 bits, evaluating only the unbiased local and global branch contexts of 16 bits.

For each benchmark we again used equation (1) to generate a list of unbiased branch contexts with polarization less than 0.95 (unbiased for local and global history of 16 bits and for the XOR of global history and branch address on 16 bits). Table 6 compares the prediction accuracies obtained with the original Gshare predictor (Figure 5) on all branches and with Gshare only on the unbiased branch contexts determined above. The last column contains, for each benchmark, the number of unbiased branch context instances and their percentage relative to all dynamic branches. Both predictors have the same configuration: one global history register of 16 bits and 2^16 entries in the second level (L2size = 65536). For the Gshare predictor we used the sim-bpred simulator from Simplesim-3.0 with the following options: -bpred 2lev -bpred:2lev 1 65536 16 1.

| SPEC2000 Benchmark | Gshare (Address) | Gshare (Direction) | Gshare/Unbiased Contexts (Address) | Gshare/Unbiased Contexts (Direction) | Unbiased Context Instances (P<0.95) |
|---|---|---|---|---|---|
| mcf | 0.9849 | 0.9849 | 0.8302 | 0.8302 | 3887050 (3.28%) |
| parser | 0.9510 | 0.9510 | 0.7031 | 0.7032 | 11063791 (12.95%) |
| bzip | 0.9110 | 0.9110 | 0.6563 | 0.6563 | 9969678 (23.40%) |
| gzip | 0.9231 | 0.9231 | 0.7352 | 0.7352 | 20659290 (28.89%) |
| twolf | 0.8837 | 0.8837 | 0.6676 | 0.6676 | 22892985 (32.41%) |
| gcc | 0.9603 | 0.9623 | 0.7388 | 0.7398 | 3561998 (3.91%) |
| Average | 0.9356 | 0.9360 | 0.7218 | 0.7220 | 12005798.7 (17.47%) |

Table 6. Prediction accuracy on the unbiased branch contexts for local and global history of 16 bits and for the XOR between global history and branch address on 16 bits.

As can be observed, the XOR of global history and branch address increased the average prediction accuracy by almost 1% (from 0.9279 with GAg to 0.9356 with Gshare), but it did not reduce the percentage of unbiased context instances. The high percentages of unbiased branch context instances in the bzip, gzip and twolf benchmarks represent a potential for improving prediction accuracy.

[Figure 5 diagram: the W-bit global history register (GHR) is XORed with W bits of the branch address (PC); the result indexes the Global Pattern History Table (GPHT) with L2size entries of prediction bits.]

Figure 5. The Gshare branch predictor scheme.
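Gshare (Figure 5) changes only the PHT index of a GAg predictor: the GHR is XORed with the branch address. The hypothetical `Gshare` class below mirrors `-bpred:2lev 1 65536 16 1` under stated assumptions (2-bit saturating counters initialized to weakly taken, a zero-initialized GHR, PCs already shifted to drop instruction-alignment bits); the usage lines show that two branches sharing the same global history reach different PHT entries, whereas GAg would map both to the same one.

```python
class Gshare:
    """Minimal gshare sketch: (GHR XOR branch address) indexes one global PHT."""

    def __init__(self, l2size=65536, w=16):
        self.l2size = l2size
        self.hmask = (1 << w) - 1
        self.ghr = 0             # W-bit global history register
        self.pht = [2] * l2size  # 2-bit counters, weakly taken (assumed)

    def index(self, pc):
        # The only change from GAg: the branch address is folded into the index.
        return (self.ghr ^ pc) & (self.l2size - 1)

    def predict(self, pc):
        return self.pht[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        c = self.pht[i]
        self.pht[i] = min(c + 1, 3) if taken else max(c - 1, 0)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.hmask


g = Gshare()
g.ghr = 0b1011                       # same global history for both branches
print(g.index(0x40), g.index(0x80))  # 75 139
```

Spreading branches that share a history across different entries reduces destructive aliasing in the PHT, which is the usual motivation for the XOR index.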

We now analyze, for the unbiased branch contexts, whether the taken and not-taken outcomes are grouped separately. This study is necessary because grouped outcomes are predictable, whereas shuffled outcomes cannot be learned by the predictors and are therefore unpredictable. For this study we introduce the distribution index of a branch context, defined as follows:

$$
D(S_i) =
\begin{cases}
0, & n_t = 0 \\
\dfrac{n_t}{2 \cdot \min(NT, T)}, & n_t > 0
\end{cases}
\qquad (3)
$$

where:

$n_t$ is the number of branch outcome transitions, from taken to not taken and vice versa, in context $S_i$;
$2 \cdot \min(NT, T)$ is the maximum number of possible transitions;
$k$ is the number of distinct contexts, $k \le 2^p$, where $p$ is the length of the binary context.

If $D(S_i) = 1$, the behavior of the branch in context $S_i$ is contradictory (the most unfavorable case), and thus learning it is impossible. If $D(S_i) = 0$, the behavior of the branch in context $S_i$ is constant (the most favorable case), and it can be learned.
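Equation (3) can be evaluated directly on the outcome sequence observed in one context. In this Python sketch, `distribution_index` is a hypothetical helper, and the input is assumed to be the 0/1 outcomes of a single context S_i in execution order.

```python
def distribution_index(outcomes):
    """Equation (3): D = n_t / (2 * min(NT, T)) if n_t > 0, else 0.

    outcomes: 0/1 outcomes (1 = taken) of one branch in one context S_i,
    in execution order.
    """
    t = sum(outcomes)       # taken outcomes (T)
    nt = len(outcomes) - t  # not-taken outcomes (NT)
    # n_t: number of taken <-> not-taken transitions
    n_t = sum(a != b for a, b in zip(outcomes, outcomes[1:]))
    if n_t == 0:
        return 0.0
    # n_t > 0 implies both T and NT are at least 1, so the denominator is nonzero.
    return n_t / (2 * min(nt, t))


print(distribution_index([1, 0, 1, 0, 1, 0, 1]))         # 1.0 (alternating)
print(round(distribution_index([1, 1, 1, 0, 0, 0]), 3))  # 0.167 (grouped)
print(distribution_index([1, 1, 1, 1]))                  # 0.0 (constant)
```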

We used equation (3) to determine the distribution index of each unpredictable branch context per benchmark. We evaluated only the dynamic branches all of whose contexts are unbiased (on local history, global history and the XOR of global history and branch address). For each benchmark, Table 7 shows the percentages of branch contexts whose distribution indexes fall into five different intervals for local branch history. Similarly, Tables 8 and 9 present the distribution indexes for global history and for the XOR between global history and branch address, respectively. The column Unbiased Dynamic Branches contains the number of simulated dynamic branches and their percentage relative to all dynamic branches. The column Unbiased Static Branches gives the number of static branches simulated for each benchmark.

Tables 7, 8 and 9 show that, for unbiased branch contexts, the taken and not-taken outcomes are not grouped separately; moreover, they are highly shuffled: 76.3% of the unbiased branch contexts have highly shuffled outcomes for local history of 16 bits (Table 7), 89.37% for global history of 16 bits (Table 8), and 89.37% for the XOR of global history and branch address on 16 bits (Table 9). We obtained the same distribution indexes for the global history and for the XOR between global history and branch address (Tables 8 and 9). A distribution index of 1.0 means the highest possible alternation frequency (taken or not-taken periods of 1). A distribution index of 0.5 still means a high alternation since, assuming a constant frequency, the taken or not-taken periods are only 2, shorter than the predictors' learning times. In the same manner, periods of 3 yield a distribution index of about 0.25, and periods of 5 yield a distribution index of about 0.15; we therefore considered that if the distribution index is lower than 0.2, the taken and not-taken outcomes are not highly shuffled, and the behavior of the branch can be learned.
| SPEC2000 Benchmark | Unbiased Dynamic Branches | Unbiased Static Branches | D ∈ [0, 0.2) [%] | D ∈ [0.2, 0.4) [%] | D ∈ [0.4, 0.6) [%] | D ∈ [0.6, 0.8) [%] | D ∈ [0.8, 1.0] [%] |
|---|---|---|---|---|---|---|---|
| mcf | 3887069 (3.28%) | 19 | 9.21 | 11.02 | 46.30 | 13.32 | 20.15 |
| parser | 11064250 (12.95%) | 483 | 20.23 | 9.50 | 42.44 | 9.63 | 18.19 |
| bzip | 9969752 (23.40%) | 75 | 6.78 | 6.45 | 44.00 | 16.80 | 25.98 |
| gzip | 20659339 (28.89%) | 51 | 5.10 | 5.38 | 38.70 | 20.98 | 29.85 |
| twolf | 22893094 (32.41%) | 110 | 14.63 | 5.81 | 43.42 | 16.71 | 19.43 |
| gcc | 3564489 (3.91%) | 2553 | 39.07 | 9.11 | 33.32 | 6.00 | 12.50 |
| Average | 12006332 (17.47%) | 548.5 | 15.83 | 7.87 | 41.36 | 13.90 | 21.01 |

Table 7. Distribution rates on local history of 16 bits, evaluating only the branches that were unbiased on all their 16-bit contexts (on local history, global history and the XOR of global history and branch address).

| SPEC2000 Benchmark | Unbiased Dynamic Branches | Unbiased Static Branches | D ∈ [0, 0.2) [%] | D ∈ [0.2, 0.4) [%] | D ∈ [0.4, 0.6) [%] | D ∈ [0.6, 0.8) [%] | D ∈ [0.8, 1.0] [%] |
|---|---|---|---|---|---|---|---|
| mcf | 3887069 (3.28%) | 19 | 0.27 | 4.30 | 37.75 | 34.38 | 23.31 |
| parser | 11064250 (12.95%) | 483 | 6.92 | 14.62 | 36.63 | 19.33 | 22.50 |
| bzip | 9969752 (23.40%) | 75 | 0.25 | 2.94 | 32.24 | 37.43 | 27.13 |
| gzip | 20659339 (28.89%) | 51 | 0.26 | 2.18 | 26.45 | 35.19 | 35.91 |
| twolf | 22893094 (32.41%) | 110 | 0.84 | 5.12 | 26.84 | 28.44 | 38.75 |
| gcc | 3564489 (3.91%) | 2553 | 8.10 | 18.03 | 38.66 | 16.06 | 19.15 |
| Average | 12006332 (17.47%) | 548.5 | 2.77 | 7.86 | 33.09 | 28.47 | 27.79 |

Table 8. Distribution rates on global history of 16 bits, evaluating only the branches that have all their 16-bit contexts unbiased (on local history, global history and the XOR of global history and branch address).
| SPEC2000 Benchmark | Unbiased Dynamic Branches | Unbiased Static Branches | D ∈ [0, 0.2) [%] | D ∈ [0.2, 0.4) [%] | D ∈ [0.4, 0.6) [%] | D ∈ [0.6, 0.8) [%] | D ∈ [0.8, 1.0] [%] |
|---|---|---|---|---|---|---|---|
| mcf | 3887069 (3.28%) | 19 | 0.27 | 4.30 | 37.75 | 34.38 | 23.31 |
| parser | 11064250 (12.95%) | 483 | 6.92 | 14.62 | 36.63 | 19.33 | 22.50 |
| bzip | 9969752 (23.40%) | 75 | 0.25 | 2.94 | 32.24 | 37.43 | 27.13 |
| gzip | 20659339 (28.89%) | 51 | 0.26 | 2.18 | 26.45 | 35.19 | 35.91 |
| twolf | 22893094 (32.41%) | 110 | 0.84 | 5.12 | 26.84 | 28.44 | 38.75 |
| gcc | 3564489 (3.91%) | 2553 | 8.10 | 18.03 | 38.66 | 16.06 | 19.15 |
| Average | 12006332 (17.47%) | 548.5 | 2.77 | 7.86 | 33.09 | 28.47 | 27.79 |

Table 9. Distribution rates on the XOR between global history and branch address on 16 bits, evaluating only the branches that have all their 16-bit contexts unbiased (on local history, global history and the XOR of global history and branch address).

We continued our evaluations by extending the lengths of the feature sets from 16 bits to 20, 24 and 28 bits, our hypothesis being that longer feature sets will increase the polarization index and the prediction accuracy. We started with a local branch history of 20 bits, again evaluating only the branch contexts found unbiased for the previous 16-bit feature sets. For each benchmark, Table 10 gives the percentages of branch contexts whose polarization indexes fall into five different intervals. The column Unbiased Dynamic Branches contains the number of simulated dynamic branches and their percentage relative to all dynamic branches. The column Unbiased Static Branches gives the number of static branches simulated for each benchmark. The last column of Table 10 shows, for each benchmark, the number of unbiased dynamic branches (unbiased for local history of 20 bits, global history of 16 bits and the XOR of global history and branch address on 16 bits) and their percentage relative to all dynamic branches.
| SPEC2000 Benchmark | Unbiased Dynamic Branches | Unbiased Static Branches | P ∈ [0.5, 0.6) [%] | P ∈ [0.6, 0.7) [%] | P ∈ [0.7, 0.8) [%] | P ∈ [0.8, 0.9) [%] | P ∈ [0.9, 1.0] [%] | Unbiased Context Instances (P<0.95) |
|---|---|---|---|---|---|---|---|---|
| mcf | 3887050 (3.28%) | 19 | 8.41 | 7.96 | 5.28 | 5.97 | 72.37 | 3147989 (2.66%) |
| parser | 11063878 (12.95%) | 476 | 8.50 | 6.70 | 3.87 | 4.44 | 76.49 | 7838166 (9.18%) |
| bzip | 9969651 (23.40%) | 75 | 8.93 | 4.69 | 2.10 | 2.17 | 82.11 | 6493881 (15.24%) |
| gzip | 20659242 (28.89%) | 51 | 9.98 | 7.47 | 4.55 | 4.84 | 73.16 | 17753722 (24.82%) |
| twolf | 22892904 (32.41%) | 110 | 12.79 | 10.91 | 5.17 | 3.93 | 67.20 | 17540719 (24.83%) |
| gcc | 3563213 (3.91%) | 2546 | 7.79 | 6.31 | 3.68 | 4.56 | 77.66 | 2061136 (2.26%) |
| Average | 12005990 (17.47%) | 546.16 | 9.4 | 7.34 | 4.10 | 4.31 | 74.83 | 9139269 (13.17%) |

Table 10. Polarization rates on local history of 20 bits, evaluating only the branches that have all their 16-bit contexts unbiased (on local history, global history and the XOR of global history and branch address).

Table 11 shows the results of using a global branch history of 20 bits, evaluating only the branches unbiased for local history of 20 bits, global history of 16 bits and the XOR of global history and branch address on 16 bits. The column Polarization Rate presents the percentages of branch contexts whose polarization indexes fall into five different intervals. The column Unbiased Dynamic Branches contains the number of simulated dynamic branches and their percentage relative to all dynamic branches. The column Unbiased Static Branches gives the number of static branches simulated for each benchmark. The last column of Table 11 shows, for each benchmark, the number of unbiased dynamic branches (unbiased for local history of 20 bits, global history of 20 bits and the XOR of global history and branch address on 16 bits) and their percentage relative to all dynamic branches.

| SPEC2000 Benchmark | Unbiased Dynamic Branches | Unbiased Static Branches | P ∈ [0.5, 0.6) [%] | P ∈ [0.6, 0.7) [%] | P ∈ [0.7, 0.8) [%] | P ∈ [0.8, 0.9) [%] | P ∈ [0.9, 1.0] [%] | Unbiased Context Instances (P<0.95) |
|---|---|---|---|---|---|---|---|---|
| mcf | 3148005 (2.66%) | 18 | 20.06 | 20.55 | 13.08 | 10.60 | 35.71 | 3057312 (2.58%) |
| parser | 7838384 (9.18%) | 446 | 15.44 | 14.61 | 10.83 | 11.04 | 48.09 | 7166404 (8.39%) |
| bzip | 6493918 (15.24%) | 74 | 15.86 | 17.02 | 12.45 | 12.43 | 42.24 | 6228047 (14.62%) |
| gzip | 17753750 (24.82%) | 45 | 15.32 | 16.89 | 15.88 | 17.75 | 34.16 | 17215762 (24.07%) |
| twolf | 17540776 (24.83%) | 103 | 13.96 | 12.79 | 11.63 | 17.61 | 44.00 | 16240443 (22.99%) |
| gcc | 2062167 (2.26%) | 2299 | 14.59 | 13.77 | 9.35 | 9.93 | 52.37 | 1767385 (1.94%) |
| Average | 9139500 (13.17%) | 497.5 | 15.87 | 15.93 | 12.20 | 13.22 | 42.76 | 8612559 (12.43%) |

Table 11. Polarization rates on global history of 20 bits, evaluating only the unbiased branches on local history of 20 bits, global history of 16 bits, and the XOR of global history and branch address on 16 bits.

In the same manner, Table 12 shows the results of using an XOR of 20 bits between global history and branch address, evaluating only the branches unbiased for local history of 20 bits, global history of 20 bits and the XOR of global history and branch address on 16 bits. The column Polarization Rate presents the percentages of branch contexts whose polarization indexes fall into five different intervals. The column Unbiased Dynamic Branches contains the number of simulated dynamic branches and their percentage relative to all dynamic branches. The column Unbiased Static Branches gives the number of static branches simulated for each benchmark. The last column of Table 12 shows, for each benchmark, the number of unbiased dynamic branches (unbiased for local history of 20 bits, global history of 20 bits and the XOR of global history and branch address on 20 bits) and their percentage relative to all dynamic branches.

| SPEC2000 Benchmark | Unbiased Dynamic Branches | Unbiased Static Branches | P ∈ [0.5, 0.6) [%] | P ∈ [0.6, 0.7) [%] | P ∈ [0.7, 0.8) [%] | P ∈ [0.8, 0.9) [%] | P ∈ [0.9, 1.0] [%] | Unbiased Context Instances (P<0.95) |
|---|---|---|---|---|---|---|---|---|
| mcf | 3057327 (2.58%) | 18 | 30.53 | 31.28 | 19.91 | 16.14 | 2.13 | 3057309 (2.58%) |
| parser | 7166723 (8.39%) | 429 | 27.62 | 26.16 | 19.37 | 19.76 | 7.08 | 7166215 (8.39%) |
| bzip | 6228107 (14.62%) | 73 | 26.21 | 28.12 | 20.57 | 20.53 | 4.57 | 6228010 (14.62%) |
| gzip | 17215799 (24.07%) | 45 | 20.78 | 22.96 | 21.58 | 24.13 | 10.55 | 17215749 (24.07%) |
| twolf | 16240535 (22.99%) | 101 | 21.26 | 19.48 | 17.70 | 26.81 | 14.74 | 16240434 (22.99%) |
| gcc | 1769008 (1.94%) | 2019 | 28.28 | 26.84 | 18.17 | 19.29 | 7.41 | 1766800 (1.94%) |
| Average | 8612917 (12.43%) | 447.5 | 25.78 | 25.80 | 19.55 | 21.11 | 7.74 | 8612420 (12.43%) |

Table 12. Polarization rates on the XOR of 20 bits between global history and branch address, evaluating only the branches unbiased for local history of 20 bits, global history of 20 bits and the XOR of global history and branch address on 16 bits.

As can be observed, a considerable number of unbiased branches become biased when the feature sets are extended from 16 bits to 20 bits: on average, the percentage of unbiased dynamic branches decreased from 17.47% (Table 6) to 12.43% (Table 12). Using the same simulation methodology, we extended the feature sets to 24 bits. Tables 13, 14 and 15 show the results of using a local history of 24 bits, a global history of 24 bits and an XOR of 24 bits between global history and branch address, respectively. Table 13 shows the results of using a local branch history of 24 bits, evaluating only the branches unbiased for local history of 20 bits, global history of 20 bits and the XOR of global history and branch address on 20 bits. The column Polarization Rate presents the percentages of branch contexts whose polarization indexes fall into five different intervals. The column Unbiased Dynamic Branches contains the number of simulated dynamic branches and their percentage relative to all dynamic branches. The column Unbiased Static Branches gives the number of static branches simulated for each benchmark. The last column of Table 13 shows, for each benchmark, the number of unbiased dynamic branches (unbiased for local history of 24 bits, global history of 20 bits and the XOR of global history and branch address on 20 bits) and their percentage relative to all dynamic branches.
| SPEC2000 Benchmark | Unbiased Dynamic Branches | Unbiased Static Branches | P ∈ [0.5, 0.6) [%] | P ∈ [0.6, 0.7) [%] | P ∈ [0.7, 0.8) [%] | P ∈ [0.8, 0.9) [%] | P ∈ [0.9, 1.0] [%] | Unbiased Context Instances (P<0.95) |
|---|---|---|---|---|---|---|---|---|
| mcf | 3057318 (2.58%) | 18 | 9.04 | 7.95 | 4.59 | 5.41 | 73.01 | 2632531 (2.22%) |
| parser | 7166415 (8.39%) | 424 | 10.88 | 8.16 | 4.19 | 4.44 | 72.34 | 5083585 (5.95%) |
| bzip | 6228031 (14.62%) | 73 | 8.41 | 4.71 | 2.46 | 2.84 | 81.59 | 4250654 (9.98%) |
| gzip | 17215734 (24.07%) | 45 | 9.20 | 6.19 | 3.64 | 4.19 | 76.78 | 13753938 (19.23%) |
| twolf | 16240411 (22.99%) | 101 | 10.14 | 5.40 | 2.21 | 1.95 | 80.31 | 12308193 (17.42%) |
| gcc | 1768113 (1.94%) | 1980 | 11.73 | 9.02 | 5.11 | 6.14 | 68.00 | 1227407 (1.35%) |
| Average | 8612670 (12.43%) | 440.16 | 9.9 | 6.90 | 3.7 | 4.16 | 75.33 | 6542718 (9.36%) |

Table 13. Polarization rates on local history of 24 bits, only for branches that were unbiased on all their 20-bit contexts (on local history, global history and the XOR of global history and branch address).

Table 14 shows the results of using a global branch history of 24 bits, evaluating only the branches unbiased for local history of 24 bits, global history of 20 bits and the XOR of global history and branch address on 20 bits. The column Polarization Rate presents the percentages of branch contexts whose polarization indexes fall into five different intervals. The column Unbiased Dynamic Branches contains the number of simulated dynamic branches and their percentage relative to all dynamic branches. The column Unbiased Static Branches gives the number of static branches simulated for each benchmark. The last column of Table 14 shows, for each benchmark, the number of unbiased dynamic branches (unbiased for local history of 24 bits, global history of 24 bits and the XOR of global history and branch address on 20 bits) and their percentage relative to all dynamic branches.
| SPEC2000 benchmark | Unbiased dynamic branches | Unbiased static branches | P in [0.5, 0.6) [%] | P in [0.6, 0.7) [%] | P in [0.7, 0.8) [%] | P in [0.8, 0.9) [%] | P in [0.9, 1.0] [%] | Unbiased context instances (P<0.95) |
|---|---|---|---|---|---|---|---|---|
| mcf | 2632542 (2.22%) | 18 | 15.20 | 13.79 | 7.13 | 5.90 | 57.98 | 2568911 (2.17%) |
| parser | 5083795 (5.95%) | 414 | 18.82 | 16.61 | 10.90 | 10.41 | 43.25 | 4664394 (5.46%) |
| bzip | 4250689 (9.98%) | 73 | 12.10 | 11.31 | 7.12 | 7.60 | 61.87 | 3799893 (8.92%) |
| gzip | 13753960 (19.23%) | 44 | 18.43 | 18.17 | 15.37 | 16.36 | 31.67 | 13480788 (18.85%) |
| twolf | 5459637 (17.42%) | 93 | 16.99 | 14.90 | 10.91 | 13.88 | 43.32 | 5144339 (7.28%) |
| gcc | 1228364 (1.35%) | 1856 | 17.16 | 14.61 | 9.94 | 10.15 | 48.14 | 1097445 (1.20%) |
| Average | 5401498 (9.36%) | 416.33 | 16.45 | 14.89 | 10.22 | 10.71 | 47.70 | 5125962 (7.31%) |

Table 14. Polarization rates on a 24-bit global history, evaluating only the branches that were unbiased for the 24-bit local history, the 20-bit global history and the 20-bit XOR of global history and branch address.

Table 15 shows the results of using a 24-bit XOR of global branch history and branch address, evaluating only the branches that were unbiased for the 24-bit local history, the 24-bit global history and the 20-bit XOR of global history and branch address. The columns have the same meaning as in Table 13; the last column gives, for each benchmark, the number of dynamic branches that remain unbiased (for the 24-bit local history, the 24-bit global history and the 24-bit XOR of global history and branch address) and their percentage of all dynamic branches.
| SPEC2000 benchmark | Unbiased dynamic branches | Unbiased static branches | P in [0.5, 0.6) [%] | P in [0.6, 0.7) [%] | P in [0.7, 0.8) [%] | P in [0.8, 0.9) [%] | P in [0.9, 1.0] [%] | Unbiased context instances (P<0.95) |
|---|---|---|---|---|---|---|---|---|
| mcf | 2568928 (2.17%) | 18 | 35.55 | 32.24 | 16.67 | 13.79 | 1.75 | 2568910 (2.17%) |
| parser | 4664693 (5.46%) | 398 | 31.21 | 27.52 | 18.08 | 17.25 | 5.93 | 4664273 (5.46%) |
| bzip | 3799936 (8.92%) | 72 | 30.43 | 28.45 | 17.91 | 19.13 | 4.07 | 3799859 (8.92%) |
| gzip | 13480825 (18.85%) | 41 | 24.64 | 24.29 | 20.55 | 21.87 | 8.66 | 13480783 (18.85%) |
| twolf | 5144419 (7.28%) | 89 | 27.03 | 23.73 | 17.38 | 22.10 | 9.76 | 5144327 (7.28%) |
| gcc | 1098795 (1.20%) | 1668 | 30.73 | 26.27 | 17.87 | 18.39 | 6.75 | 1097009 (1.20%) |
| Average | 5126266 (7.31%) | 381 | 29.93 | 27.08 | 18.07 | 18.75 | 6.15 | 5125860 (7.31%) |

Table 15. Polarization rates on a 24-bit XOR of global history and branch address, evaluating only the branches that were unbiased for the 24-bit local history, the 24-bit global history and the 20-bit XOR of global history and branch address.
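The three kinds of context compared in Tables 13 to 18 can be sketched as shift registers truncated to n bits. This is a minimal sketch under our own assumptions: the class name is hypothetical, the XOR uses the raw low-order PC bits (real gshare-style indexing usually drops the instruction-alignment bits first), and every context is keyed by the static branch address so that polarization is measured per static branch.

```python
class FeatureTracker:
    """Hypothetical helper producing the three n-bit contexts compared in the tables:
    local history (LH), global history (GH) and GH XOR branch address (PC).
    """

    def __init__(self, n_bits=16):
        self.mask = (1 << n_bits) - 1   # keep only the last n_bits outcomes
        self.ghr = 0                    # global history register
        self.lhr = {}                   # per-branch local history registers

    def contexts(self, pc):
        """Contexts seen by the branch at `pc` before its outcome is known.

        Each context is keyed by the static branch (pc) as well, so that
        polarization is measured per static branch.
        """
        lh = self.lhr.get(pc, 0)
        gh = self.ghr
        gh_xor_pc = (gh ^ pc) & self.mask
        return (pc, lh), (pc, gh), (pc, gh_xor_pc)

    def update(self, pc, taken):
        """Shift the resolved outcome into both history registers."""
        bit = 1 if taken else 0
        self.ghr = ((self.ghr << 1) | bit) & self.mask
        self.lhr[pc] = ((self.lhr.get(pc, 0) << 1) | bit) & self.mask


tracker = FeatureTracker(n_bits=4)
for pc, taken in [(0x10, True), (0x20, False), (0x10, True)]:
    tracker.update(pc, taken)
print(tracker.contexts(0x10))  # ((16, 3), (16, 5), (16, 5))
```

One property of this particular sketch is worth noting: because each context also carries the PC, XOR-ing the history with the PC is a bijection on the n-bit history values of a given static branch, so it partitions the dynamic branches exactly as the plain global history does. This is a property of the sketch only, not a claim about how the authors' simulator keys its contexts.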

Extending the feature set length from 20 to 24 bits, the average percentage of unbiased dynamic branches decreases from 12.43% (see Table 12) to 7.31% (Table 15). We then extended the feature sets again, to 28 bits. Tables 16, 17 and 18 show the results obtained with a 28-bit local history, a 28-bit global history and a 28-bit XOR of global history and branch address, respectively. Table 16 shows the results of using a 28-bit local branch history, evaluating only the branches that were unbiased for the 24-bit local history, the 24-bit global history and the 24-bit XOR of global history and branch address. The columns have the same meaning as in Table 13; the last column gives, for each benchmark, the number of dynamic branches that remain unbiased (for the 28-bit local history, the 24-bit global history and the 24-bit XOR of global history and branch address) and their percentage of all dynamic branches. For the gcc benchmark, with a 28-bit feature set the percentage of unbiased context instances falls below the threshold T of 1% (see equation (2)), so we eliminate gcc from the subsequent evaluations: we consider that its conditional branches are not difficult to predict with 28-bit features. Accordingly, all values in the Average row of Table 16 are computed without the gcc results.
| SPEC2000 benchmark | Unbiased dynamic branches | Unbiased static branches | P in [0.5, 0.6) [%] | P in [0.6, 0.7) [%] | P in [0.7, 0.8) [%] | P in [0.8, 0.9) [%] | P in [0.9, 1.0] [%] | Unbiased context instances (P<0.95) |
|---|---|---|---|---|---|---|---|---|
| mcf | 2568923 (2.17%) | 18 | 10.62 | 8.64 | 4.69 | 5.35 | 70.69 | 2174101 (1.83%) |
| parser | 4664502 (5.46%) | 395 | 11.17 | 7.09 | 3.72 | 4.07 | 73.95 | 3301587 (3.86%) |
| bzip | 3799904 (8.92%) | 71 | 10.16 | 5.90 | 3.04 | 3.59 | 77.30 | 2728593 (6.40%) |
| gzip | 13480777 (18.85%) | 41 | 9.76 | 6.14 | 3.50 | 4.14 | 76.46 | 10691142 (14.95%) |
| twolf | 5144325 (7.28%) | 87 | 9.03 | 4.44 | 2.81 | 3.76 | 79.96 | 4208376 (5.95%) |
| gcc | 1098269 (1.20%) | 1644 | 13.68 | 10.29 | 5.68 | 6.76 | 63.59 | 774654 (0.85%) |
| Average (without gcc) | 5931686 (8.54%) | 122.4 | 10.14 | 6.44 | 3.55 | 4.18 | 75.67 | 4620759 (6.60%) |

Table 16. Polarization rates on a 28-bit local history, only for branches that were unbiased on all their 24-bit contexts (local history, global history and XOR of global history and branch address).
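The filtering procedure of this section can be sketched as a cascade: each feature set sees only the dynamic branches left unbiased by all previous ones, and the evaluation of a benchmark stops once its unbiased fraction falls below the threshold T of 1%, as was done for gcc. This is a simplified sketch under our own assumptions (one dict per dynamic branch with precomputed contexts, hypothetical names and feature labels); it is not the authors' simulator.

```python
from collections import defaultdict

THRESHOLD = 0.95     # a context with P < 0.95 is unbiased
T_MIN = 0.01         # threshold T = 1% of all dynamic branches (equation (2))


def unbiased_contexts(samples):
    """samples: (context, outcome) pairs; returns the set of contexts with P < THRESHOLD."""
    counts = defaultdict(lambda: [0, 0])  # context -> [not_taken, taken]
    for ctx, outcome in samples:
        counts[ctx][outcome] += 1
    return {ctx for ctx, (nt, t) in counts.items()
            if max(nt, t) / (nt + t) < THRESHOLD}


def cascade(branches, feature_sets):
    """branches: one dict per dynamic branch, mapping feature-set name -> context,
    plus the key "outcome" (1 = taken, 0 = not taken). In a real run each context
    would also include the branch PC.
    feature_sets: names in evaluation order, e.g. ["lh24", "gh24", "xor24", "lh28"].

    Each feature set only sees the dynamic branches left unbiased by all previous
    ones. Returns (name, fraction of all dynamic branches still unbiased) per step
    and stops once the fraction falls below T_MIN.
    """
    total = len(branches)
    remaining = list(branches)
    steps = []
    for name in feature_sets:
        bad = unbiased_contexts((b[name], b["outcome"]) for b in remaining)
        remaining = [b for b in remaining if b[name] in bad]
        fraction = len(remaining) / total
        steps.append((name, fraction))
        if fraction < T_MIN:
            break
    return steps
```

With this structure, the percentages in the last column of each table correspond to the fraction reported after the matching step, multiplied by 100.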

Table 17 shows the results of using a 28-bit global branch history, evaluating only the branches that were unbiased for the 28-bit local history, the 24-bit global history and the 24-bit XOR of global history and branch address. The columns have the same meaning as in Table 13; the last column gives, for each benchmark, the number of dynamic branches that remain unbiased (for the 28-bit local history, the 28-bit global history and the 24-bit XOR of global history and branch address) and their percentage of all dynamic branches.
| SPEC2000 benchmark | Unbiased dynamic branches | Unbiased static branches | P in [0.5, 0.6) [%] | P in [0.6, 0.7) [%] | P in [0.7, 0.8) [%] | P in [0.8, 0.9) [%] | P in [0.9, 1.0] [%] | Unbiased context instances (P<0.95) |
|---|---|---|---|---|---|---|---|---|
| mcf | 2174117 (1.83%) | 18 | 15.41 | 11.53 | 6.18 | 5.29 | 61.60 | 2149108 (1.81%) |
| parser | 3301768 (3.86%) | 370 | 21.26 | 17.06 | 10.39 | 10.18 | 41.11 | 3041426 (3.56%) |
| bzip | 2728627 (6.40%) | 69 | 11.81 | 8.86 | 5.07 | 5.55 | 68.72 | 2280197 (5.35%) |
| gzip | 10691161 (14.95%) | 41 | 19.36 | 17.05 | 13.50 | 14.84 | 35.25 | 10405692 (14.55%) |
| twolf | 4208418 (5.95%) | 85 | 16.53 | 14.43 | 10.21 | 13.55 | 45.29 | 4007088 (5.67%) |
| Average | 4620818 (6.60%) | 116.6 | 16.87 | 13.78 | 9.07 | 9.88 | 50.39 | 4376702 (6.19%) |

Table 17. Polarization rates on a 28-bit global history, evaluating only the branches that were unbiased for the 28-bit local history, the 24-bit global history and the 24-bit XOR of global history and branch address.

Finally, Table 18 shows the results of using a 28-bit XOR of global branch history and branch address, evaluating only the branches that were unbiased for the 28-bit local history, the 28-bit global history and the 24-bit XOR of global history and branch address. The columns have the same meaning as in Table 13; the last column gives, for each benchmark, the number of dynamic branches that remain unbiased (for the 28-bit local history, the 28-bit global history and the 28-bit XOR of global history and branch address) and their percentage of all dynamic branches.
| SPEC2000 benchmark | Unbiased dynamic branches | Unbiased static branches | P in [0.5, 0.6) [%] | P in [0.6, 0.7) [%] | P in [0.7, 0.8) [%] | P in [0.8, 0.9) [%] | P in [0.9, 1.0] [%] | Unbiased context instances (P<0.95) |
|---|---|---|---|---|---|---|---|---|
| mcf | 2149125 (1.81%) | 18 | 39.26 | 29.37 | 15.73 | 13.46 | 2.17 | 2149107 (1.81%) |
| parser | 3041691 (3.56%) | 357 | 34.21 | 27.48 | 16.71 | 16.39 | 5.22 | 3041301 (3.56%) |
| bzip | 2280240 (5.35%) | 69 | 36.29 | 27.22 | 15.57 | 17.05 | 3.87 | 2280161 (5.35%) |
| gzip | 10405726 (14.55%) | 41 | 27.56 | 24.28 | 19.22 | 21.13 | 7.81 | 10405684 (14.55%) |
| twolf | 4007152 (5.67%) | 82 | 27.73 | 24.21 | 17.12 | 22.73 | 8.21 | 4007068 (5.67%) |
| Average | 4376787 (6.19%) | 113.4 | 33.01 | 26.51 | 16.87 | 18.15 | 5.45 | 4376664 (6.19%) |

Table 18. Polarization rates on a 28-bit XOR of global history and branch address, evaluating only the branches that were unbiased for the 28-bit local history, the 28-bit global history and the 24-bit XOR of global history and branch address.

Extending the feature set length from 24 to 28 bits, the average percentage of unbiased dynamic branches decreases from 7.31% (see Table 15) to 6.19% (see Table 18). Despite the feature set extension, the percentage of unbiased dynamic branches remains high (6.19%), so it is clear that longer feature sets alone are not sufficient.

[Figure 6: y-axis, dynamic unpolarized contexts (%); x-axis, feature set length (16, 20, 24, 28 bits); series LH, GH and GH xor PC.]

Figure 6. Reduction of the average percentages of unbiased context instances (P<0.95) by extending the lengths of the feature sets.

On average, the global history solves 2.56% of the unbiased dynamic branches not solved by the local history (see Figure 6). The hash of global history and branch address (XOR) behaves just like the global history and does not further improve the polarization rate. Figure 6 also shows that the percentage of unbiased dynamic branches decreases as the branch history grows, suggesting correlations between branches situated far apart in the dynamic instruction stream. The results also indicate that the ultimate predictability limit of context-based prediction is approximately 94%: based on our simulation methodology, 94% of the dynamic branches can be solved with prediction information of up to 28 bits (some of them with 16 bits, others with 20, 24 or 28 bits).
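The 94% figure is simple arithmetic on the average of Table 18, reading "solved" as "belonging to a context whose polarization index is at least 0.95". Note that this average already excludes gcc, which was eliminated at 28 bits.

```latex
\underbrace{6.19\%}_{\text{unbiased after 28-bit features}}
\;\Rightarrow\;
\text{solved} \approx 100\% - 6.19\% = 93.81\% \approx 94\%
```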

[Figure 7: y-axis, unpolarized contexts (%); x-axis, feature set length (16, 20, 24, 28 bits).]

Figure 7. Reduction of the percentage of unbiased branch context instances with each extension of the feature length.

Since a 1% increase in prediction accuracy improves the IPC (instructions per cycle) by more than 1% (the relationship is not linear), there is a good chance of obtaining considerably better overall performance even if not all of the 6.19% of difficult-to-predict branches are solved. We therefore consider that this 6.19% represents a significant percentage of unbiased branch context instances and, at the same time, a good improvement potential in terms of prediction accuracy and IPC. By focusing on these unbiased branches in order to design efficient path-based predictors for them [8], [13], the overall prediction accuracy should increase by several percent, which would be quite remarkable. The simulation results also suggest that the longer the feature set used in the prediction process, the higher the branch polarization index and, hopefully, the prediction accuracy (Figure 7). A large context (e.g. 100 bits) is more precise, but it has a lower occurrence probability than a shorter one and a higher dispersion (the dispersion grows exponentially with the context length). Thus, very large contexts can significantly improve the branch polarization and the prediction accuracy too; however, they are not always feasible for hardware implementation. The question is: what feature set length is really feasible in hardware and, more importantly, what is the solution for the unbiased branches in that case? In our opinion, a feasible solution could be given by path-based predictors. Path information could help for relatively short contexts (low correlations). Our hypothesis is that short contexts combined with path information can replace significantly longer contexts while providing the same prediction accuracy. A common criticism of all present two-level adaptive branch prediction schemes is that they use insufficient global correlation information [13]. There are situations in which the same static branch, in the same global history context, can have different targets.
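To make the exponential growth concrete, the following arithmetic counts the possible patterns of an n-bit context and the size of a pattern table indexed directly by it. It assumes, for illustration only, a single untagged table (as in a GAg-style scheme) with one 2-bit saturating counter per pattern; these sizes are our own back-of-the-envelope figures, not results from the paper.

```latex
\begin{aligned}
N(n) &= 2^{n} && \text{possible $n$-bit context patterns} \\
S(n) &= 2 \cdot 2^{n}\ \text{bits} && \text{one 2-bit counter per pattern (assumed)} \\
S(16) &= 2^{17}\ \text{bits} = 16\ \text{KiB}, && S(28) = 2^{29}\ \text{bits} = 64\ \text{MiB} \\
N(100) &= 2^{100} \approx 1.27 \times 10^{30}
\end{aligned}
```

Spreading the same number of dynamic branches over exponentially more patterns is also why each individual long context occurs less often.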
If, during the prediction process, each bit of the global history were associated with the PC of its corresponding branch, the context of the current branch would become more precise, and its prediction accuracy could therefore improve. Our next goal is to extend the correlation information with the path, following this idea [13]. Extending the correlation information in this way reflects the fact that different occurrences of a static branch with the same global branch context may have different path contexts. In further work we want to increase the polarization rate through path information, hopefully improving the prediction accuracy as well. We started our evaluation of the path by measuring the gain obtained by introducing paths of different lengths. The analyzed feature consists of a 16-bit global branch history and the last p PCs. We applied this feature only to the dynamic branches found unbiased (P<0.95) for the 16-bit local history, the 16-bit global history and the 16-bit XOR of global history and branch address.

| Benchmark | lh16->gh16->xor16 | lh16->gh16->xor16->path1 | lh16->gh16->xor16->path16 | lh16->gh16->xor16->path20 | lh16->gh16->xor16->lh20 |
|---|---|---|---|---|---|
| bzip | 23.40% | 23.35% | 22.16% | 20.38% | 15.24% |
| gzip | 28.89% | 28.88% | 28.17% | 27.51% | 24.82% |
| mcf | 3.28% | 3.28% | 3.28% | 3.20% | 2.66% |
| parser | 12.95% | 12.89% | 12.01% | 10.95% | 9.18% |
| twolf | 32.41% | 32.41% | 31.46% | 27.10% | 24.83% |
| gcc | 3.91% | 3.91% | 3.56% | 3.02% | 2.26% |
| Average | 17.47% | 17.45% | 16.77% | 15.36% | 13.17% |
| Gain | - | 0.02% | 0.70% | 2.11% | 4.30% |

Table 19. The gain introduced by paths of different lengths (1, 16 and 20 PCs) versus the gain introduced by an extended local history (20 bits).

The column lh16->gh16->xor16 gives the percentage of unbiased context instances for each benchmark. The columns lh16->gh16->xor16->path1, lh16->gh16->xor16->path16 and lh16->gh16->xor16->path20 give the percentages of unbiased context instances obtained using a 16-bit global history together with a path of 1, 16 and 20 PCs, respectively.
The last column gives the percentages of unbiased context instances obtained by extending the local history to 20 bits (without path). The gain of each feature is shown in the last row. A path of 1 PC introduces an insignificant gain of 0.02%. Even a path of 20 PCs yields a gain of only 2.11%, compared with the more significant gain of 4.30% introduced by extending the local branch history to 20 bits. The results (Table 19) show that the path is useful only for short contexts: a 16-bit branch history already compresses and approximates the path information well. In other words, a 16-bit branch history already separates well the different paths that lead to a given dynamic branch.
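The feature analyzed in Table 19 (and, with n_bits = p, the one of Table 21) can be sketched as below. We assume that the path holds the addresses of the last p branches; the class and parameter names are hypothetical, and the pc_bits option only illustrates the idea, mentioned in the conclusions, of keeping part of each branch address instead of the complete PC.

```python
from collections import deque


class PathContext:
    """Hypothetical helper for the "global history + last p PCs" feature.

    The context of the branch at `pc` is (pc, last n_bits of global history,
    addresses of the last p branches). `pc_bits` optionally keeps only the
    low-order bits of each path address.
    """

    def __init__(self, n_bits=16, p=16, pc_bits=32):
        self.mask = (1 << n_bits) - 1
        self.pc_mask = (1 << pc_bits) - 1
        self.ghr = 0
        self.path = deque(maxlen=p)  # the oldest address is dropped automatically

    def context(self, pc):
        return (pc, self.ghr, tuple(self.path))

    def update(self, pc, taken):
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask
        self.path.append(pc & self.pc_mask)


ctx = PathContext(n_bits=4, p=2)
for pc, taken in [(0x100, True), (0x200, False), (0x300, True)]:
    ctx.update(pc, taken)
print(ctx.context(0x400))  # (1024, 5, (512, 768))
```

Contexts produced this way can be counted with the same polarization bookkeeping as the plain histories. Because the tuple grows with p, the number of distinct contexts, and hence their dispersion, grows quickly with the path length.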

| Benchmark | p=1 | p=4 | p=8 | p=12 | p=16 |
|---|---|---|---|---|---|
| bzip | 58.54% | 39.00% | 37.24% | 35.08% | 32.41% |
| gzip | 49.85% | 45.93% | 43.58% | 35.67% | 34.10% |
| mcf | 27.85% | 21.30% | 6.38% | 5.89% | 6.35% |
| parser | 57.75% | 44.64% | 36.37% | 30.63% | 27.25% |
| twolf | 67.49% | 59.07% | 51.28% | 43.51% | 37.12% |
| gcc | 34.17% | 26.34% | 17.65% | 12.61% | 9.51% |
| Average | 49.28% | 39.38% | 32.08% | 27.23% | 24.46% |

Table 20. The percentages of unbiased context instances using only a global history of p bits.

| Benchmark | p=1 | p=4 | p=8 | p=12 | p=16 |
|---|---|---|---|---|---|
| bzip | 38.99% | 36.93% | 34.41% | 32.16% | 30.15% |
| gzip | 48.53% | 44.81% | 42.20% | 34.45% | 33.21% |
| mcf | 26.01% | 20.98% | 6.23% | 5.85% | 6.48% |
| parser | 48.42% | 39.50% | 32.13% | 27.48% | 24.66% |
| twolf | 62.65% | 55.68% | 49.47% | 42.60% | 35.81% |
| gcc | 28.51% | 20.42% | 13.84% | 10.53% | 8.44% |
| Average | 42.19% | 36.39% | 29.71% | 25.51% | 23.13% |

Table 21. The percentages of unbiased context instances using as feature a global history of p bits together with the path of p PCs.

In the case of the mcf benchmark, we obtained a higher percentage of unbiased context instances when we extended the correlation information (Table 21) from 12 bits of global history and 12 PCs (p=12) to 16 bits of global history and 16 PCs (p=16). This growth is possible because, through extension, a biased context (P≥0.95) is split into several longer contexts, some of which can be unbiased (P<0.95), thus increasing the number of unbiased branches.
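A small example shows how this effect can arise. The counts below are invented for illustration only: one context that is 95% taken (biased, since P is not below 0.95) is split by the extra correlation information into a 75%-taken context and an always-taken one.

```python
def polarization(taken, not_taken):
    """P = max(f0, f1) for a context observed taken + not_taken times."""
    return max(taken, not_taken) / (taken + not_taken)


def unbiased_instances(contexts, threshold=0.95):
    """contexts: list of (taken, not_taken) pairs; counts instances with P < threshold."""
    return sum(t + nt for t, nt in contexts if polarization(t, nt) < threshold)


short = [(95, 5)]             # one context seen 100 times, P = 0.95 (biased)
longer = [(15, 5), (80, 0)]   # the same 100 instances split by a longer context

print(unbiased_instances(short))   # 0
print(unbiased_instances(longer))  # 20: the (15, 5) context has P = 0.75
```

The longer context is more precise overall (80 of the 100 instances become fully polarized), yet the unbiased count grows from 0 to 20, which is the kind of increase reported for mcf between p=12 and p=16.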

[Figure 8: y-axis, unbiased context instances (%); x-axis, context length (p=1, 4, 8, 12, 16); series GH (p bits) and GH (p bits) + PATH (p PCs).]

Figure 8. The gain introduced by the path for different context lengths.

As Figure 8 shows, the path yields an important gain for short contexts (p<16). A branch history of 16 bits or more compresses the path information well, so in these cases the gain introduced by the path is not significant.

Conclusions

The simulations show that the path improves the polarization rate and the prediction accuracy only for short contexts. In further work we could reduce the path information by extracting and using only its most important bits; for example, the path could be built from only part of each branch address instead of the complete 32-bit PC. We also want to analyze other correlation information, for instance whether branch behavior correlates with some important registers (e.g. the stack pointer). We also want to study longer contexts, one of which could be the concatenation of the local history with the global history. Being longer than the contexts studied so far, these new contexts have higher precision and higher dispersion, and therefore a lower occurrence probability. Thus, for a 64-bit context (32 bits of local history concatenated with 32 bits of global history), we expect considerably higher polarization rates and, as a consequence, better prediction accuracy. Simulations with these longer contexts require computers with more memory than we currently have. The next stage of the work will exploit the information on branch polarization: we can pre-train a perceptron only with the dynamic branches whose polarization index is at least 0.95, thereby avoiding the contradictory behavior of the unbiased branches, which is difficult to learn. By pre-training the perceptron with the biased branches, we expect higher prediction accuracy and better overall performance (IPC) than those of the original perceptron.
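The proposed pre-training step can be sketched as follows. This is a hypothetical illustration of the idea, not the authors' design: the perceptron follows the usual formulation of Jiménez and Lin [5] (bipolar history inputs, a bias weight, training on a misprediction or when the output magnitude is at most θ), the θ formula is the value commonly quoted from their perceptron work, and the pretrain function and its arguments are our own assumptions.

```python
import math


class Perceptron:
    """Minimal perceptron predictor in the style of Jiménez and Lin [5].

    Inputs x[i] are +1 (taken) or -1 (not taken) global-history bits; w[0] is the
    bias weight. Weight saturation, present in hardware, is omitted here.
    """

    def __init__(self, history_len):
        self.w = [0] * (history_len + 1)
        # Training threshold theta = floor(1.93 * h + 14), as quoted from Jiménez and Lin.
        self.theta = math.floor(1.93 * history_len + 14)

    def output(self, x):
        return self.w[0] + sum(wi * xi for wi, xi in zip(self.w[1:], x))

    def predict(self, x):
        return self.output(x) >= 0  # True means "taken"

    def train(self, x, taken):
        y = self.output(x)
        t = 1 if taken else -1
        # Update on a misprediction or when the output magnitude is at most theta.
        if (y >= 0) != taken or abs(y) <= self.theta:
            self.w[0] += t
            for i, xi in enumerate(x, start=1):
                self.w[i] += t * xi


def pretrain(perceptron, samples, polarization_of, threshold=0.95):
    """Hypothetical pre-training pass that skips unbiased contexts.

    samples: (context, x, taken) triples; polarization_of: context -> P.
    """
    for context, x, taken in samples:
        if polarization_of(context) >= threshold:
            perceptron.train(x, taken)


p = Perceptron(history_len=2)
samples = [("biased", [1, -1], True)] * 3 + [("unbiased", [1, 1], False)] * 3
pretrain(p, samples, {"biased": 1.0, "unbiased": 0.6}.get)
print(p.w)  # [3, 3, -3]: the unbiased samples were never used
```

How the polarization of each context would be known before pre-training (for example, from a profiling run) is left open here, as it is in the text.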

References

[1] [E] Chaver D., Pinuel L., Prieto M., Tirado F., Huang M., Branch Prediction On Demand: an Energy-Efficient Solution, ISLPED'03, August 25-27, 2003, Seoul, Korea.
[2] [F] Jiménez D., Fast Path-Based Neural Branch Prediction, Proceedings of the 36th Annual International Symposium on Microarchitecture, December 2003.
[3] Jiménez D., Improved Latency and Accuracy for Neural Branch Prediction, ACM Transactions on Computer Systems (TOCS), Vol. 23, No. 2, May 2005.
[4] Jiménez D., Piecewise Linear Branch Prediction, Proceedings of the 32nd International Symposium on Computer Architecture (ISCA-32), June 2005.
[5] [C] Jiménez D., Lin C., Neural Methods for Dynamic Branch Prediction, ACM Transactions on Computer Systems, Vol. 20, No. 4, November 2002.
[6] Loh G. H., Jiménez D., A Simple Divide-and-Conquer Approach for Neural-Class Branch Prediction, Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT), September 2005.
[7] [B] Loh G. H., Jiménez D., Reducing the Power and Complexity of Path-Based Neural Branch Prediction, 5th Workshop on Complexity Effective Design (WCED5), June 2005.
[8] Nair R., Dynamic Path-Based Branch Correlation, IEEE Proceedings of MICRO-28, 1995.
[9] [D] Seznec A., Felix S., Krishnan V., Sazeides Y., Design Tradeoffs for the Alpha EV8 Conditional Branch Predictor, Proceedings of the 29th International Symposium on Computer Architecture, Anchorage, AK, USA, May 2002.
[10] SimpleScalar, The SimpleSim Tool Set, ftp://ftp.cs.wisc.edu/pub/sohi/Code/simplescalar.
[11] SPEC, The SPEC benchmark programs, http://www.spec.org.
[12] Tarjan D., Skadron K., Merging Path and Gshare Indexing in Perceptron Branch Prediction, ACM Transactions on Architecture and Code Optimization, Vol. 2, No. 3, September 2005.
[13] [A] Vintan L., Egan C., Extending Correlation in Branch Prediction Schemes, International Euromicro'99 Conference, Italy, September 1999.
[14] Yeh T.-Y., Patt Y. N., A Comparison of Dynamic Branch Predictors that use Two Levels of Branch History, Proceedings of the 20th Annual International Symposium on Computer Architecture, San Diego, California, May 1993.

