4/1/2010

Performance Analysis and Loop Optimization


David Levinthal, Principal Engineer, SSG/DPD

Winning with High-K 45nm Technology


High Value, High Volume, High Preference
1

Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS. Intel may make changes to specifications and product descriptions at any time, without notice. All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request. Merom, Penryn, Harpertown, Nehalem, Dothan, Westmere, Sandy Bridge, and other code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.
Intel, Intel Inside, Core, Pentium, SpeedStep, and the Intel logo are trademarks of Intel Corporation in the United States and other countries. *Other names and brands may be claimed as the property of others. Copyright 2009 Intel Corporation.


Risk Factors
This presentation contains forward-looking statements that involve a number of risks and uncertainties. These statements do not reflect the potential impact of any mergers, acquisitions, divestitures, investments or other similar transactions that may be completed in the future. The information presented is accurate only as of today's date and will not be updated. In addition to any factors discussed in the presentation, the important factors that could cause actual results to differ materially include the following: Demand could be different from Intel's expectations due to factors including changes in business and economic conditions, including conditions in the credit market that could affect consumer confidence; customer acceptance of Intel's and competitors' products; changes in customer order patterns, including order cancellations; and changes in the level of inventory at customers. Intel's results could be affected by the timing of closing of acquisitions and divestitures. Intel operates in intensely competitive industries that are characterized by a high percentage of costs that are fixed or difficult to reduce in the short term and product demand that is highly variable and difficult to forecast. Revenue and the gross margin percentage are affected by the timing of new Intel product introductions and the demand for and market acceptance of Intel's products; actions taken by Intel's competitors, including product offerings and introductions, marketing programs and pricing pressures and Intel's response to such actions; Intel's ability to respond quickly to technological developments and to incorporate new features into its products; and the availability of sufficient supply of components from suppliers to meet demand.
The gross margin percentage could vary significantly from expectations based on changes in revenue levels; product mix and pricing; capacity utilization; variations in inventory valuation, including variations related to the timing of qualifying products for sale; excess or obsolete inventory; manufacturing yields; changes in unit costs; impairments of long-lived assets, including manufacturing, assembly/test and intangible assets; and the timing and execution of the manufacturing ramp and associated costs, including start-up costs. Expenses, particularly certain marketing and compensation expenses, vary depending on the level of demand for Intel's products, the level of revenue and profits, and impairments of long-lived assets. Intel is in the midst of a structure and efficiency program that is resulting in several actions that could have an impact on expected expense levels and gross margin. Intel's results could be impacted by adverse economic, social, political and physical/infrastructure conditions in the countries in which Intel, its customers or its suppliers operate, including military conflict and other security risks, natural disasters, infrastructure disruptions, health concerns and fluctuations in currency exchange rates. Intel's results could be affected by adverse effects associated with product defects and errata (deviations from published specifications), and by litigation or regulatory matters involving intellectual property, stockholder, consumer, antitrust and other issues, such as the litigation and regulatory matters described in Intel's SEC reports. A detailed discussion of these and other factors that could affect Intel's results is included in Intel's SEC filings, including the report on Form 10-Q for the quarter ended June 28, 2008.


Agenda
Dominant issues in loop analysis
Using LBRs
Example


What matters when optimizing a loop?


1. The Trip Count

2. The Trip Count

3. The TRIP COUNT!

4. Variations in the tripcount
5. And some other things

BUT... what you do about them depends on THE TRIP COUNT

And of course there are virtually no tools to help you determine it, other than printf (you can use PIN).

The tripcount dictates the optimization options, as it defines the time available to amortize the cost of the proposed solution.

Basic loop tripcount ranges (for short loops)

Short loops: tripcount < 7?
Unroll the loop completely/vectorize

Medium loops: 7 < tripcount < 15-20?
Most difficult case: too short to do much of anything; the only option is to vectorize

Medium-long loops: 20 < tripcount < 50
Almost long enough for real options

Long loops: 50-100 < tripcount
Lots of options

Of course, all loops have their own issues.
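As a rough sketch, the ranges above can be turned into a lookup that maps an estimated average tripcount to a category and the suggested option. The slide's boundaries are fuzzy: note the question marks and the overlapping 15-20 and 50-100 limits. The cut-offs 7, 20 and 50, and the function name `classify_loop`, are therefore assumptions made for illustration.

```python
def classify_loop(tripcount):
    """Map an (average) loop tripcount to the slide's rough category.

    Assumed cut-offs: the slide gives fuzzy ranges (<7, 7..15-20, 20..50,
    50-100+), so 7, 20 and 50 are used here as hard boundaries.
    """
    if tripcount < 7:
        return "short", "unroll completely / vectorize"
    if tripcount < 20:
        return "medium", "hardest case: too short for much; vectorize"
    if tripcount < 50:
        return "medium-long", "almost long enough for real options"
    return "long", "lots of options"


for tc in (4, 12, 32, 16384):
    print(tc, *classify_loop(tc))
```

Where a measured tripcount falls near a boundary, it is worth checking its variation (item 4 above) before committing to one strategy.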



Basic Branch Analysis


Vastly improved precise branch monitoring capabilities
16-deep Last Branch Record (LBR): records taken branches and their targets
LBR can be filtered by branch type and privilege level

Precise branches retired (br_inst_retired) by branch type:
calls, conditional branches and all branches
Coupled with LBR capture, these yield:
Call counts
Basic block execution counts
HW call graph
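Here is a minimal sketch of how LBR records filtered to calls could be turned into call counts and a call graph. It assumes each record is a (call site, callee entry) address pair. The symbol table, the addresses and the helper names `make_symbolizer` and `call_graph` are hypothetical. The resulting edge counts are per-sample (relative) counts, not absolute call counts.

```python
from bisect import bisect_right
from collections import Counter


def make_symbolizer(func_starts):
    """func_starts: sorted list of (start_address, name); a hypothetical symbol table."""
    addrs = [a for a, _ in func_starts]
    names = [n for _, n in func_starts]

    def func_of(addr):
        # The function containing addr is the last one starting at or below it.
        i = bisect_right(addrs, addr) - 1
        return names[i] if i >= 0 else "?"

    return func_of


def call_graph(lbr_calls, func_of):
    """Count caller -> callee edges from LBR entries filtered to calls."""
    edges = Counter()
    for src, dst in lbr_calls:
        edges[(func_of(src), func_of(dst))] += 1
    return edges


func_of = make_symbolizer([(0x1000, "main"), (0x2000, "kernel"), (0x3000, "helper")])
lbr = [(0x1010, 0x2000), (0x2040, 0x3000), (0x1010, 0x2000)]
edges = call_graph(lbr, func_of)
print(edges)
```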


Processing LBRs
Branch_0 -> Target_0
Branch_1 -> Target_1

All instructions between Target_0 and Branch_1 are retired 1 time.
All basic blocks between Target_0 and Branch_1 are executed 1 time.
All branch instructions between Target_0 and Branch_1 are not taken.

So it would all seem very straightforward...
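The rule above can be sketched as follows. The sketch assumes the LBR snapshot is available as a list of (branch, target) address pairs ordered oldest first, and that basic-block start addresses are known and sorted. No branch is taken between Target_i and Branch_{i+1}, so execution falls through sequential addresses and every block in that address range is counted once. The function names and the example addresses are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def bb_index(bb_starts, addr):
    """Index of the basic block containing addr (bb_starts sorted ascending)."""
    return bisect_right(bb_starts, addr) - 1


def count_basic_blocks(lbr, bb_starts, weight=1.0):
    """lbr: list of (branch, target) pairs, oldest first (an assumption).

    Every BB from the one holding Target_i to the one holding Branch_{i+1}
    executed once in that span.
    """
    counts = Counter()
    for (_, target), (branch, _) in zip(lbr, lbr[1:]):
        first, last = bb_index(bb_starts, target), bb_index(bb_starts, branch)
        for bb in range(first, last + 1):
            counts[bb] += weight
    return counts


bb_starts = [0x100, 0x110, 0x120, 0x130]
lbr = [(0x50, 0x100), (0x125, 0x110), (0x13C, 0x100)]
counts = count_basic_blocks(lbr, bb_starts)
print(dict(counts))
```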



Shadowing and Precise Data Collection
The time between the counter overflow and the PEBS arming creates a shadow (~8 cycles?) during which events cannot be collected.
Example: conditional branches retired
A sequence of short BBs (< 3 cycles in duration): if the branch into the first BB overflows the counter, the PEBS event cannot occur until the branch at the end of the 4th BB.
The intervening branches will never be sampled.


Shadowing
Assume a 10-cycle shadow for this example.
[Figure: timeline of branches separated by 20-cycle and 2-cycle basic blocks, marking counter overflows, PEBS arming and interrupts]
O means counter overflow, P means PEBS enabled, C means interrupt occurs.
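The shadow effect can be reproduced with a toy model. This is a simplified sketch, not a model of real hardware. It assumes one event per branch and a fixed shadow (10 cycles, as in the example), and that the first event retiring after the shadow ends is the one sampled. With one 20-cycle block followed by four 2-cycle blocks, every sample lands on the branch that ends the long block, and the four short blocks are never sampled. The function name and parameters are hypothetical.

```python
def simulate_pebs(bb_cycles, iterations, sav, shadow):
    """Toy model of PEBS shadowing for a loop of basic blocks.

    bb_cycles[i] is the duration of basic block i, which ends in branch i.
    Every `sav` branches the counter overflows; PEBS is armed `shadow`
    cycles later, and the next branch retiring after that is sampled.
    """
    samples = [0] * len(bb_cycles)
    t = 0
    count = 0
    armed_at = None  # cycle at which PEBS becomes armed, if an overflow is pending
    for _ in range(iterations):
        for i, cycles in enumerate(bb_cycles):
            t += cycles  # branch i retires at cycle t
            if armed_at is not None and t >= armed_at:
                samples[i] += 1
                armed_at = None
            count += 1
            if count == sav:  # counter overflow: arm PEBS after the shadow
                count = 0
                armed_at = t + shadow
    return samples


samples = simulate_pebs([20, 2, 2, 2, 2], iterations=1000, sav=7, shadow=10)
print(samples)
```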



Reducing Shadowing Impact


Some events will never be sampled!
They always fall into the shadowed window.

Use the LBR to extend the range of the single sample: count the number of objects in the LBR and increment the count for each of them by 1/NUM, since you have only one sample.


Minimizing Shadowing Impact on BB Execution Count


[Figure: for each branch in the example sequence, the cycles per branch taken, the PEBS samples taken and the number of LBR entries covering it]

In this example there are always 16 BBs covered in the LBR. Incrementing the BB execution count for each BB detected in the LBR by 1/(NUM_LBR - 1) greatly reduces the effect of shadowing.
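Here is a sketch of the 1/(NUM_LBR - 1) weighting. It assumes each PEBS sample comes with an LBR snapshot stored as (branch, target) pairs ordered oldest first. A snapshot with NUM_LBR entries has NUM_LBR - 1 spans running from Target_i to Branch_{i+1}, and each block in each span receives 1/(NUM_LBR - 1) of the sample. Multiplying the totals by the sample-after value would give an absolute estimate. The names `accumulate` and `bb_starts` are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def accumulate(samples, bb_starts):
    """Spread each PEBS sample over every BB its LBR snapshot covers.

    samples: list of LBR snapshots, each a list of (branch, target) pairs,
    oldest first. Each span's blocks get weight 1 / (NUM_LBR - 1).
    """
    counts = Counter()
    for lbr in samples:
        spans = list(zip(lbr, lbr[1:]))
        if not spans:
            continue
        w = 1.0 / len(spans)  # len(spans) == NUM_LBR - 1
        for (_, target), (branch, _) in spans:
            first = bisect_right(bb_starts, target) - 1
            last = bisect_right(bb_starts, branch) - 1
            for bb in range(first, last + 1):
                counts[bb] += w
    return counts


bb_starts = [0x100, 0x110, 0x120]
snapshot = [(0x50, 0x100), (0x11C, 0x100), (0x12C, 0x100)]
counts = accumulate([snapshot], bb_starts)
print(dict(counts))
```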

Nested Loop with 8 Basic Blocks


Basic Block Execution Counts


Basic Block:    0 1 2 3 4 5 6 7 8 9 10 11 12 13
Instructions:   4 8 6 8 3 2 11 40 158 40 183 3 2 5
inst_ret:       0 0 0 0 1036 6 241 2757 8476 4228 10516 0 0 60
BB_exec(inst):  0 0 0 0 345.33 3 21.91 68.92 53.65 105.7 57.46 0 0 12
br_ret:         0 0 0 0 914 140 0 904 601 830 487 0 0 116
lbr_all:        0 0 0 0 9267 5054 9270 28312 9543 28602 9633 5 5 4251 3990
lbr_tkn:        0 0 0 0 6286 3167 6287 19394 6726 19761 6727 2 3122 2762
expected (BBs 4-10 and 13): 1 0.5 1 3 1 3 1 0.5

                tripcount   sav        loop executions
BB_exec(inst)   78.793      2000000    157587619
br_ret          405.1429    200000     81028571
lbr_all         9427.048    13333.3    1.26E+08
lbr_tkn         6480.952    13333.3    86412482
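The BB_exec(inst) column appears to be inst_ret divided by the number of instructions in the block (for example, 1036 / 3 = 345.33 for block 4). A short check follows, using the per-block values from the Basic Block Execution Counts slide. The list and function names are ours, not the author's.

```python
# Per-BB values from the Basic Block Execution Counts slide (blocks 0..13).
instructions = [4, 8, 6, 8, 3, 2, 11, 40, 158, 40, 183, 3, 2, 5]
inst_ret = [0, 0, 0, 0, 1036, 6, 241, 2757, 8476, 4228, 10516, 0, 0, 60]


def bb_exec_from_inst(inst_ret, instructions):
    """Estimate per-BB execution counts (in samples) as inst_ret / BB size."""
    return [r / n for r, n in zip(inst_ret, instructions)]


bb_exec = bb_exec_from_inst(inst_ret, instructions)
print([round(x, 2) for x in bb_exec])
```

Dividing each entry by the expected relative frequency should give roughly equal values for a consistent method. That is how the columns can be compared against each other.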


Normalization = Tripcount
Using br_inst_retired.near_call is difficult, as the function is only called a few thousand times, while the hottest function is called ~1 million times.
The true tripcount is 16384, so call count = loop_executions / 16384:
BB_exec(inst): 9618.38
br_ret: 4945.59
lbr_all: 7671.73
lbr_tkn: 5274.2

True call count from printf: ~5750 (done with the new 11.1-based build)
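The normalization can be checked with a few lines of arithmetic. Pairing each method's sampled value with its sample-after value (SAV) is an inference from the Basic Block Execution Counts slide, where value * SAV reproduces the loop-execution figures. Dividing by the true tripcount of 16384 then gives the call counts listed above. The dictionary and function names are ours.

```python
TRUE_TRIPCOUNT = 16384  # from the slide
PRINTF_CALLS = 5750     # approximate true call count from printf

# (sampled value, SAV) per method; the pairing is inferred, not stated on the slide.
methods = {
    "BB_exec(inst)": (78.793, 2_000_000),
    "br_ret": (405.1429, 200_000),
    "lbr_all": (9427.048, 13333.3),
    "lbr_tkn": (6480.952, 13333.3),
}


def call_count(value, sav, tripcount=TRUE_TRIPCOUNT):
    """loop_executions = value * SAV; each call runs the loop `tripcount` times."""
    return value * sav / tripcount


for name, (value, sav) in methods.items():
    est = call_count(value, sav)
    print(f"{name:14s} {est:9.2f}  ({est / PRINTF_CALLS - 1:+.0%} vs printf)")
```

The BB_exec(inst) estimate differs slightly from 9618.38 because 78.793 is itself a rounded value.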

Summary
LBRs are very useful
