UG907 Vivado Power Analysis and Optimization
Revision History
The following table shows the revision history for this document.
Chapter 1
Introduction
This chapter provides the terminology used in describing power when implementing Xilinx®
devices on a board. It also puts the device development in the greater context of the system
being designed and provides a high-level description of what to expect at each stage of the
design flow. The chapter then describes the Xilinx® tools used for power estimation, analysis,
and optimization.
VIDEO: The Vivado Design Suite QuickTake Video Tutorial: Power Estimation and Analysis Using Vivado shows
how Vivado® can help you to estimate power consumption in your design and reviews best practices for getting
the most accurate estimation.
VIDEO: The Vivado Design Suite QuickTake Video Tutorial: Power Optimization Using Vivado describes the
factors that affect power consumption in a Xilinx device and how Vivado helps to minimize power consumption
in your design, and looks at some advanced control and best practices for getting the most out of Vivado power
optimization.
Power Terminology
The following terminology is used in this guide.
• Device Static Power: Device static power is the power from transistor leakage on all
connected voltage rails and the circuits required for the device to operate normally, post
configuration. This is normally measured by programming a blank bitstream into the device.
Device static power is a function of process, voltage, and temperature. This represents the
steady state, intrinsic leakage in the device.
• Design Power: Design power is the dynamic power of the user design, due to the input data
pattern and the design internal activity. This power is instantaneous and varies at each clock
cycle. It depends on voltage levels and logic and routing resources used. This also includes
static current from I/O terminations, clock managers, and other circuits that need power when
used. It does not include power supplied to off-chip devices.
• Total On-Chip Power: Total on-chip power is the power consumed internally within the
device, equal to the sum of device static power and design power. It is also known as thermal
power.
• Off-Chip Power: Off-chip power is the current that flows from the supply source through the
device power pins, then out of the I/Os and dissipated in external board components. The
currents supplied by the device are generally consumed in off-chip components such as I/O
terminations, LEDs, or the I/O buffers of other chips, and therefore do not raise the device
junction temperature.
Note: Negative off-chip power is power that is sourced from an external source and
dissipated inside the device.
• Power-On Current: Power-on current is transient current that occurs when power is first
applied to the device. This current varies for each voltage supply and depends on the device
construction as well as the ability of the power supply source to ramp up to the nominal
voltage. This current also depends on the device's operating conditions, such as temperature
and sequencing between the different supplies. Power-on current is generally lower than
operating current due to architectural enhancements as well as adherence to proper power-on
sequencing.
• Ambient Temperature (°C): Ambient temperature is the temperature of the air immediately
surrounding the device under the expected system operating conditions.
• Effective Thermal Resistance to Air (ΘJA (°C/W)): Effective thermal resistance to air is also
known as Theta-JA and TJA. This coefficient defines how power is dissipated from the device
silicon to the environment (device junction to ambient air). It includes contributions from all
elements, from the silicon chip dimensions to the surrounding air, plus any material in
between, such as the package, the PCB, any heat sink, and airflow. Typically this combines
thermal resistance and interdependencies from the two main paths by which the generated
heat can escape into the environment:
Thermal data for Xilinx® device packages can be found using the Package Thermal Data Query
tool. A sample Thermal Data Query result is shown in the following figure.
Refer to 7 Series FPGAs Packaging and Pinout Product Specification (UG475) and UltraScale and
UltraScale+ FPGAs Packaging and Pinouts Product Specification (UG575) for detailed information on
thermal resistance.
IMPORTANT! The thermal data mentioned in this user guide is for the device/package comparison only. Do
not use these values for thermal simulations. Use the thermal models provided on Xilinx.com.
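The terms defined above combine in a simple first-order relationship: total on-chip power is the sum of device static power and design power, and the junction temperature sits above ambient by that power multiplied by ΘJA. The following Python sketch shows the arithmetic only. The function names and sample values are hypothetical, and a single coefficient is no substitute for the thermal models referred to above.

```python
def total_on_chip_power(device_static_w: float, design_w: float) -> float:
    """Total on-chip (thermal) power = device static power + design power."""
    return device_static_w + design_w


def junction_temperature(ambient_c: float, theta_ja_c_per_w: float,
                         on_chip_power_w: float) -> float:
    """First-order estimate: Tj = Ta + ThetaJA * P. Off-chip power is excluded."""
    return ambient_c + theta_ja_c_per_w * on_chip_power_w


# Hypothetical values, chosen only to show the arithmetic.
power_w = total_on_chip_power(device_static_w=0.5, design_w=2.0)
tj_c = junction_temperature(ambient_c=25.0, theta_ja_c_per_w=2.0,
                            on_chip_power_w=power_w)
print(f"Total on-chip power: {power_w:.3f} W, junction: {tj_c:.1f} degC")
```

Because device static power is itself a function of temperature, a single pass like this is only a starting point for an estimate.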
Device Characterization
• Advance: Devices with the Advance designation have data models primarily based on
simulation results or measurements from early production device lots. This data is typically
available within a year of product launch. The Power model data with this designation is
considered relatively stable and conservative, although some under or over-reporting can
occur. Advance data accuracy is considered lower than the Preliminary and Production data.
• Preliminary: Devices with the Preliminary designation are based on complete early production
silicon. Almost all the blocks in the device fabric are characterized. The probability of accurate
power reporting is improved compared to Advance data.
• Production: Devices with the Production designation are released after enough production
silicon of a particular device family member has been characterized to provide full power
correlation over numerous production lots. Device models with this characterization data are
not expected to evolve further.
The accuracy of any power estimation is derived from the information input to the models.
Report Power uses the following models based on the device characterization:
• ADVANCE: +/-30%
• PRELIMINARY: +/-25%
• PRODUCTION: +/-15%
These models are as accurate as those in the Xilinx® Power Estimator. However, Report Power has
more detail about the design being implemented, such as resource settings and usage, net fanout,
and net lengths, all of which affect the power estimation of a design. This allows Report Power to
give a more accurate estimation. The result still depends on your input, because the confidence
level of the estimation is critical. A PRODUCTION characterized model with a LOW confidence level
should be evaluated and improved to ensure a more accurate estimation.
Note: For maximum process, the static power in a device should never exceed the reported values in the
tool.
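To make the accuracy figures concrete, the sketch below converts a reported power value into the range implied by each characterization level. The percentages are the ones listed above. The function name and the 10 W sample are illustrative assumptions, and the band says nothing about the confidence level of your own inputs.

```python
# Accuracy bands quoted above for each device characterization level.
ACCURACY_PCT = {"ADVANCE": 30.0, "PRELIMINARY": 25.0, "PRODUCTION": 15.0}


def power_band(reported_w: float, characterization: str) -> tuple:
    """Return the (low, high) power range implied by the accuracy band."""
    pct = ACCURACY_PCT[characterization.upper()]
    delta = reported_w * pct / 100.0
    return reported_w - delta, reported_w + delta


# Hypothetical 10 W report, shown for every characterization level.
for level in ACCURACY_PCT:
    low, high = power_band(10.0, level)
    print(f"{level:<12} {low:.2f} W to {high:.2f} W")
```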
Signal Rate
Signal rate is the number of times an element changes state (high-to-low and low-to-high) per
second. Xilinx tools express this as millions of transitions per second (Mtr/s). For example, if a
signal changes once every four clock cycles with respect to a 100 MHz (10 ns) clock, then the signal
rate is: 1/(4*10 ns) = 25 Mtr/s.
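The signal rate example can be reproduced with a one-line calculation. This is a minimal sketch. The function name is hypothetical, and it assumes the signal changes once every fixed number of clock cycles.

```python
def signal_rate_mtr_s(clock_mhz: float, clocks_per_transition: float) -> float:
    """Signal rate in millions of transitions per second (Mtr/s).

    A signal that changes once every N clock cycles makes clock_mhz / N
    million transitions per second.
    """
    if clocks_per_transition <= 0:
        raise ValueError("clocks_per_transition must be positive")
    return clock_mhz / clocks_per_transition


# One transition every four cycles of a 100 MHz clock.
print(signal_rate_mtr_s(100.0, 4))  # 25.0
```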
Toggle Rate
Toggle rate (%) is the rate at which the output of a synchronous logic element switches compared
to a given clock input. It is modeled as a percentage between 0 - 100%. A toggle rate of 100%
means that on average the output toggles once during every clock cycle. As an example, if a
signal changes at every four clock cycles with respect to a clock of any frequency, then the
Toggle Rate is: (1/4)*100 = 25%.
IMPORTANT! The toggle rate for clock nets is always 200%, which means that the net toggles twice in a cycle.
TIP: Ideally a synchronous net changes at the most once per clock (except DDR nets); thus the maximum toggle
rate is 100%. If a synchronous net is prone to glitches, use Signal Rate to specify the switching activity.
For asynchronous elements such as nets and logic that are not synchronized with a clock, the
toggle rate cannot be computed. The Vivado® power tools expect the use of Signal Rate for
these kinds of elements.
By default the primary inputs of the design are not associated with a specific clock. Use the
set_input_delay constraint to associate a clock with the primary inputs. If you do not
associate a clock, the power tools compute the toggle rate with respect to either the capturing
clock or the fastest clock in the design.
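Toggle rate and signal rate describe the same activity in different units, so one can be converted to the other once the reference clock is known. The sketch below assumes a fixed reference clock. The function names are hypothetical and are not part of any Xilinx tool.

```python
def toggle_rate_pct(clocks_per_transition: float) -> float:
    """Toggle rate (%) of a net that changes once every N clock cycles."""
    return 100.0 / clocks_per_transition


def toggle_to_signal_rate(toggle_pct: float, clock_mhz: float) -> float:
    """Convert a toggle rate (%) to a signal rate (Mtr/s) for a given clock."""
    return toggle_pct / 100.0 * clock_mhz


print(toggle_rate_pct(4))                   # 25.0
print(toggle_to_signal_rate(25.0, 100.0))   # 25.0
print(toggle_to_signal_rate(200.0, 100.0))  # 200.0, a 100 MHz clock net
```

The last line reflects the 200% toggle rate of clock nets: a 100 MHz clock makes 200 million transitions per second.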
Static Probability
Static probability defines the fraction of time during which the considered element is driven at a
high (1’b1) logic level and the valid range is 0 to 1. As an example, if a signal is at Logic 1 for 40 ns
in a duration of 100 ns, the static probability = 40/100 = 0.4.
TIP: Static Probability = 1 represents that the considered element is held at Logic 1 throughout the analysis
duration and never toggles (toggle/signal rate = 0). Similarly, Static Probability=0 represents that the considered
element is held at Logic 0 throughout the analysis duration and never toggles (toggle/signal rate =0).
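The static probability example is a simple ratio, sketched below with the valid range enforced. The function name is hypothetical.

```python
def static_probability(time_high_ns: float, duration_ns: float) -> float:
    """Fraction of the analysis window during which the signal is at logic 1."""
    if duration_ns <= 0:
        raise ValueError("duration_ns must be positive")
    if not 0 <= time_high_ns <= duration_ns:
        raise ValueError("time_high_ns must lie within the analysis window")
    return time_high_ns / duration_ns


# Logic 1 for 40 ns out of a 100 ns window.
print(static_probability(40, 100))  # 0.4
```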
For logic resources typically available in Xilinx® devices, refer to the FPGA Resources and their
Power Supply table in Chapter 1 of the Xilinx Power Estimator User Guide (UG440).
• Physical domain: Enclosure, board shape, power supply and power distribution network
(PDN), thermal power dissipation system.
The next chapters demonstrate the interdependencies between these two classes. These classes
differ in that the physical domain involves hardware decisions, while the functional domain
mostly involves design creation. Typically, hardware selection and sizing occurs very early in the
design flow to allow time to build prototype boards. The effect of device functionality on
power consumption can be estimated early on, then refined as more and more of the design logic
is completed. The following figure illustrates a typical system design process, and highlights
power-related decision points. The figure demonstrates that, at the time you select your device
and associated cooling parts, the device logic is not yet available. Therefore, a careful
methodology to estimate the device logic power requirements is needed. Methodologies are
discussed in:
Figure: Typical system design process with power-related decision points (technology selection,
system specification, routing, lab testing). The board and the FPGA are designed concurrently, so
FPGA power estimates are needed as early as possible. After routing, power can be reduced if
performance allows (automatic) or if the budget is exceeded (manual or automatic). Lab testing
verifies actual versus predicted power.
The following sections provide methodologies to analyze and reduce power consumption
throughout the design process.
Figure 3: Vivado Power Estimation and Analysis Tools in the Design Process
XPE is also commonly used later in the design cycle during implementation and power closure to,
for example, evaluate power implications of engineering change orders (ECO). For large designs
implemented by multiple teams, the project leader can use XPE to import usage and activity for
each team's module, then monitor the total power and reallocate the power budget to ensure
constraints are met. For more information on using the Xilinx Power Estimator, see Xilinx Power
Estimator User Guide (UG440).
Chapter 2
Introduction
This chapter describes a methodology to evaluate your design's power consumption during the
initial evaluation stage of the design cycle. You will work in Xilinx® Power Estimator during this
stage of the design cycle. If you have already completed the initial evaluation stage, go to the
next chapter, which describes a methodology to evaluate your design’s power consumption in
the later stage of the design cycle. At this stage, you will use the Vivado® Design Suite, which
automates and simplifies power estimation.
Xilinx® Power Estimator can answer these questions. It helps you develop the device logic in
parallel with the printed circuit board (PCB) on which the device is to be soldered. This exercise
helps you understand the margin you can expect to have and therefore gain confidence that
your system will work within budget once implemented. The following figure shows the Xilinx
Power Estimator interface.
Underdesigning the power or thermal system can make the device operate out of specification.
This can result in the device not operating at the expected performance and can have other more
serious consequences. Overdesigning the power system is generally less serious, but is still not
desirable because it can add unnecessary cost and complexity to the overall device design.
Estimating power before the design is complete is not a trivial task.
These steps are primarily focused on power analysis. There are several techniques for power
optimization that can be explored and applied during the analysis and can result in significant
power savings. Power Optimization techniques are discussed in the next chapter.
Step 1: Obtain the latest version of Xilinx Power Estimator for the
selected target device.
It is important to make sure you are using the latest version of the Xilinx® Power Estimator tool
because power information is updated periodically to reflect the latest power modeling and
characterization data.
The latest version of XPE can be obtained from the XPE Downloads web page on the Xilinx® web
site. Check this web site occasionally during the design process to determine whether a new
version has become available. If a new version is available, you can import the data from a
previous version into the updated version using the Import File button on the updated version's
Summary sheet. Keeping XPE up to date ensures that the most current power information is
used in the power analysis at all times during the design cycle.
• Family and Device: An improperly set Family or Device can lead to incorrect device and design
power estimations, such as the design power reported for clocks. It will also result in
improperly reported available device resources.
• Package: The package selection can affect the device's heat dissipation and thus affect the
resulting junction temperature. An incorrect junction temperature can result in an incorrect
device static power calculation.
• Speed Grade (if available): Choose the speed grade most appropriate to the design needs.
Some device families may have different power specifications for different speed grades.
• Temp Grade: Select the appropriate grade for the device (typically Commercial or Industrial).
Some devices may have different device static power specifications depending on this setting.
Setting this properly allows for the proper display of junction temperature limits for the
chosen device.
• Process: For the purposes of a worst-case analysis, the recommended process setting is
Maximum. The default setting of Typical gives a closer picture to what would be measured
statistically, but changing the setting to Maximum modifies the power specification to worst-
case values.
• Voltage ID Used: The Voltage ID (VID) voltage is the minimum possible VCCINT voltage at
which the device can run and still meet its performance specifications. This voltage is tested
when the device is manufactured and the value is programmed into the DNA (device
identifier) eFUSE register on the device. Activating the VID feature in your design to operate
the device at this VID voltage can result in a significant static power savings over operating
the device at its nominal voltage.
Note: This option applies to Virtex®-7 -1 speed grade, Commercial Temp grade, and Maximum Process
devices only.
• Ambient Temp (°C): Specify the maximum possible temperature expected inside the enclosure
that will house the device design. This, along with airflow and other thermal dissipation paths
(for example, the heatsink), will allow an accurate calculation of Junction Temperature. This in
turn will allow a more accurate calculation of device static power.
• Effective ΘJA (°C/W): Specify a custom ΘJA value, which is generally derived from
thermal modeling. Set Ambient Temperature and Effective ΘJA when the values are
derived from thermal simulations, for better accuracy in estimation.
• Airflow (LFM): The airflow across the chip is measured in Linear Feet per Minute (LFM). LFM
can be calculated from the fan output in CFM (Cubic Feet per Minute) divided by the cross
sectional area through which the air passes. Specific placement of the device or the fan (or
both) may impact the effective air movement across the device and thus the thermal
dissipation. The default for this parameter is 250 LFM. If you plan to operate the device
without active air flow (still air operation), then change the 250 LFM default to 0 LFM.
• Heat Sink (if available): If a heatsink is used and more detailed thermal dissipation information
is not available, choose an appropriate profile for the type of heatsink used. This, along with
other entered parameters, will be used to help calculate an effective ΘJB, resulting in a more
accurate junction temperature and quiescent power calculation. Some types of sockets may
act as heatsinks, depending on the design and construction of the socket.
• Board Selection and # of Board Layers: Selecting an approximate size and stack of the board
will help calculate the effective ΘJB by taking into account the thermal conductivity of the
board itself.
• ΘJB: If more accurate thermal modeling of the board and system is available, use ΘJB (printed
circuit board thermal resistance) to specify the amount of heat dissipation expected from the
device.
The more accurately custom ΘJB can be specified, the more accurate the estimated junction
temperature will be, thus affecting device static power calculations.
IMPORTANT! In order to specify a custom ΘJB, the Board Selection must be set to Custom. If you do specify a
custom ΘJB, you must also specify a Board Temperature for an accurate power calculation.
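The Airflow setting described above expects linear feet per minute, while fans are usually rated in cubic feet per minute. The conversion is a single division, sketched below. The function name and the sample fan and duct values are hypothetical.

```python
def airflow_lfm(fan_cfm: float, cross_section_sq_ft: float) -> float:
    """Linear feet per minute = fan output (CFM) / cross-sectional area (sq ft)."""
    if cross_section_sq_ft <= 0:
        raise ValueError("cross_section_sq_ft must be positive")
    return fan_cfm / cross_section_sq_ft


# Hypothetical fan: 125 CFM through a 0.5 sq ft opening gives 250 LFM.
print(f"{airflow_lfm(125.0, 0.5):.0f} LFM")
```

This is the idealized figure. As noted above, the placement of the device or the fan can change the effective air movement across the device.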
Figure 9: Power Supply Voltage Source Information - Summary Sheet for 7 Series
Devices
Note: In XPE, the power number cells are configured to display values with three decimal places (for
example, 0.000). Rounding to three decimal places follows Microsoft Excel behavior.
Values less than 1 mW are displayed as 0.000 W. To see the actual value, copy a cell and paste it
into the User sheet, where the precision can be adjusted.
• Clock Tree Power: In the Clock sheet, enter each clock, the expected Frequency, and the
expected clocking resource it will use as shown in the following figure. If you are not certain
which clocking resource will be used, keep the default selection for Type as Global clock. At
this point, don't worry about Fanout. Fanout will be taken care of in Step 6. Leave the Clock
Buffer Enable and Slice Clock Enable set at the system defaults of 100% and 50% respectively.
• Logic Power: In the Logic sheet, enter an estimate for the number of Slice resources as shown
in the following figure. The LUTs column should represent the number of LUTs used for
arithmetic or logic, Shift Registers are the number of LUTs configured as SRLs (Shift Register
LUTs), and SelectRAMs are the number of LUTs configured as memory. Registers are the
number of registers or latches configured in the design. Use the different rows to separate
different logic functions and characteristics (for example, clock speed and toggle rate).
In the early stages of your device design, Xilinx recommends that you work with large, rounded
numbers, because it can be difficult to get accurate numbers for end resources. As the design
progresses, you can update the values to get a more accurate representation.
TIP: When entering the clock frequency information, use Excel's capabilities to relate that cell to the cell
populated in the Clock Tree Power tab. To do this, select the desired Clock (MHz) cell in the logic view, type =,
and select the cell in the Clock sheet corresponding to the clock source for that logic. This should populate that
cell with the value in the Clock sheet. The primary benefit of this methodology is that if the clock frequency
would ever need to be changed, either by a specification change or by exploring power trade-offs vs. frequency,
the value would only need to be updated in one place and can be reflected throughout the analysis. This
methodology can also reduce the chance of errors and inconsistencies during the data entry.
• I/O Power: It is important to fill out the I/O sheet of XPE properly to get an accurate overall
estimation of all rails of the chip as shown in the following figure. Depending on the selected
I/O Standard and I/O circuitry, a significant amount of power may be consumed not only in
the VCCO rail but also in the VCCINT and VCCAUX rails. Many times it is simplest to enter
each device interface separately and also to break out the interface signals to the data,
control, and clock signals. This makes it easier to specify different I/O Standards as well as
other I/O characteristics such as load and toggle rates.
RECOMMENDED: In XPE, use the Memory Interface Configuration wizard to ease the effort of adding I/Os
associated with complex memory interfaces.
For the I/O current calculations, the predicted power assumes standard board trace and
termination is applied.
TIP: If using differential I/O, each input and output should be specified as a pair. Do not specify two inputs in
the spreadsheet to indicate a single differential input.
To ease data entry for more complicated standards, such as the DDR Standards, you can use the
Memory Interface Configuration wizard as shown in the following figure. You can enter the
relevant inputs in the Memory Interface Configuration wizard and the tool will automatically
populate the relevant I/O rows in the I/O sheet.
• Block RAM Power: In the block RAM sheet as shown in the following figure, enter the number
and configurations of the block RAM intended to be used for the design. Make sure to adjust
the Enable Rate to the percentage of time the ENA or ENB port will be enabled. The amount
of time the RAM is enabled is directly proportional to the dynamic power it consumes, so
entering the proper value for this parameter is important to an accurate block RAM power
estimation. For information on how the BRAM Mode impacts power estimation, see the
Setting BRAM Mode for Improved Accuracy section in the Xilinx Power Estimator User Guide
(UG440).
RECOMMENDED: In XPE, use the Memory Generator wizard to ease the effort of adding block RAMs in the
design.
• UltraRAM Power: In the UltraRAM sheet as shown in the following figure, enter the number
and intended configurations of the UltraRAMs to be used for the design. Use realistic values
for the settings that might have the highest impact on dynamic power which include Cascade
Group Size, Input and Output Toggle Rates, Enable Rates, and the Write Enable percentage.
For information on estimating UltraRAM power, see the Xilinx Power Estimator User Guide
(UG440).
• DSP Power: Complete the DSP sheet in XPE. Note that DSP blocks can be used for purposes
other than multipliers, such as counters, barrel shifters, MUXs, and other common functions.
• Clock Manager (CLKMGR): If an MMCM and/or PLL is used in the design, specify the use and
configuration of each in the Clock Manager sheet.
• GT: If GTs (serial transceivers) are used in the design, specify the use and configuration of each
in the GT sheet.
RECOMMENDED: Use the Transceiver Configuration wizard (launched by the Add GTX Interface button) to
ease data entry and improve accuracy, as shown in the following figure.
For clock fanout, the easiest way to specify this in the XPE is to create an equation to SUM all
synchronous elements for any particular clock domain. For instance, in the Fanout field for a
given clock, type =SUM( and then select all of the cells that specify the number of synchronous
elements sourced by that clock (that is, BRAMs, FFs, Shift Registers, SelectRAMs, etc.). When
completed, close the parenthesis to populate the Fanout cell with the appropriate number. This
method of entering clock fanout not only is often the easiest, but also has the added advantage
of automatically updating when adjustments are made to the spreadsheet resource counts. The
resulting Excel equation would be similar to this:
For logic fanout, the nature of the data and control paths needs to be considered. In designs with
well structured sequential data paths, such as DSP designs, fanouts generally tend to be lower
than the set default. In designs with many data execution paths, such as in some embedded
designs, higher fanouts may be seen. As with toggle rates, if this information is not known it is
best to leave the setting at the default and adjust later if needed.
For I/O Output Load, enter a simple capacitive load for each design output. This affects the
dynamic power of the driven output. The Output Load value is primarily made up from the sum of
the individual input capacitances of each device connected to that output. The input capacitance
can generally be obtained from the data sheets of the devices to which the device I/O is
connected.
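Both the clock fanout and the I/O output load described above are simple sums. The sketch below shows the bookkeeping outside the spreadsheet. The function names, resource counts, and capacitance values are hypothetical and do not come from XPE.

```python
def clock_fanout(sync_elements: dict) -> int:
    """Sum the synchronous elements sourced by one clock domain."""
    return sum(sync_elements.values())


def output_load_pf(input_capacitances_pf: list) -> float:
    """Approximate an output's load as the sum of the connected input capacitances."""
    return sum(input_capacitances_pf)


# Hypothetical resource counts for one clock domain.
domain = {
    "block_rams": 16,
    "registers": 12000,
    "shift_registers": 300,
    "select_rams": 64,
}
print(clock_fanout(domain))  # 12380

# Hypothetical input capacitances (pF) from the data sheets of three receivers.
print(output_load_pf([5.0, 3.0, 4.0]))  # 12.0
```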
thermal limits of the selected grade, the power reported by XPE can be used to specify the rails
for the design. If your confidence in the data entered is not very high, you may pad the numbers
to circumvent the possibility of underdesigning the power system for the device. If, however, you
are fairly certain of the data entered, no additional padding above the data reported by the tool is
necessary.
As the design matures, continue to review and update the information in the spreadsheet to
reflect the latest requirements and implementation details. This will present the most current
picture of the power used in the design and could potentially allow early identification of
adjustments to the power budgeting up or down depending on the current power trends of the
design.
See Chapter 3: Estimating Power - Vivado Design Flow Stage, which describes a methodology to
evaluate your design’s power consumption in the later stage of the design cycle, and Chapter 6:
Tips and Techniques for Power Reduction for tips and tricks to reduce power in the design.
Chapter 3
Introduction
This chapter describes tool features in the Vivado® Design Suite that automate or simplify power
estimation during the design flow stage. Once you generate and analyze a power estimation in
the Vivado Design Suite, see Chapter 6: Tips and Techniques for Power Reduction for techniques
to investigate and modify your system, to minimize the device power consumption.
Figure 17: Vivado Power Analysis - Supplying Relevant Input Data for Analysis
For more information on these settings, see Review Device/Design Settings and Adjust
Activity for Known Elements.
4. Specify the name of the report.
Power analysis uses different sources of information for activity definition, including:
For more information, see Running Power Analysis from the Tcl Prompt.
Note: The vectorless power estimator does not propagate activity to the output ports of GTs. If any design
logic depends on these activity rates, you must explicitly specify the activity rates on GT outputs using
set_switching_activity -type gt_rxdata|gt_txdata commands to achieve an accurate analysis.
TIP: The vectorless power estimation is an average power estimation for the design, unless you have specifically
overridden switching rates and static probability for the design.
In any design, users typically know the activity of specific nodes because they are imposed by the
system specification or the interfaces with which the device communicates. Providing this
information to the tools, especially for nodes which drive multiple cells in the device (Set, Reset,
Clock Enable, or clock signals), will help guide the power estimation algorithms. These nodes
include:
• Clock Activity: Users typically know the exact frequency of all device clock domains, whether
externally provided (input ports), internally generated, or externally supplied to the printed
circuit board (output ports). The design should have at least one clock specified using the
create_clock constraint. If no clock is defined, then Report Power issues a warning message
and uses a 10 GHz clock frequency for switching activity computations.
• I/O Data Ports: With your knowledge of the exact protocols and format of the data flowing in
and out of the device, you can usually specify signal transition rate and/or signal static
probability rate in the tools for at least some of the I/Os. For example, some protocols have a
DC balanced requirement (signal static probability rate = 50%) or you may know how often
data is written or read from your memory interface, so you can set the data rate of strobe and
data signals. If no user activity rate is specified on primary inputs, Report Power assigns a
default static probability of 0.5 and a default toggle rate of 12.5%.
• I/O and Internal Control Signals: With your knowledge of the system and the expected
functionality you may be able to predict the activity on control signals such as Set, Reset and
Clock Enable. These signals typically can turn on or off large pieces of the design logic, so
providing this activity information increases the power estimation accuracy. If a primary input
is found to be reset (that is, directly connected to the RESET pin of sequential elements), then
the tool assigns a default static probability of 0 and a default signal rate of 0. Similarly, if a
primary input is found to be Clock Enable (that is, directly connected to the CE pin of
sequential elements), then the tool assigns a default static probability of 0.99 and a default
signal rate of 2.
RECOMMENDED: Providing node activity information to the tools, especially for nodes which drive multiple
cells in the device (Set, Reset, Clock Enable, or clock signals), helps guide the power estimation algorithms.
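The defaults that Report Power applies to primary inputs without user-specified activity can be summarized in a small lookup table. This is only a summary of the values quoted above, not a model of the tool. The class, the role labels, and the function name are hypothetical, and the unit of the quoted signal rate is not stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefaultActivity:
    static_probability: float  # fraction of time at logic 1
    rate: float                # value quoted in the text
    rate_kind: str             # "toggle_rate_pct" or "signal_rate"


# Defaults quoted in the text for primary inputs with no user activity.
# The keys are hypothetical role labels, not Vivado identifiers.
DEFAULTS = {
    "data": DefaultActivity(0.5, 12.5, "toggle_rate_pct"),
    "reset": DefaultActivity(0.0, 0.0, "signal_rate"),
    "clock_enable": DefaultActivity(0.99, 2.0, "signal_rate"),
}


def default_activity(role: str) -> DefaultActivity:
    """Look up the default assumed for a primary input of the given role."""
    return DEFAULTS[role]


for role, activity in DEFAULTS.items():
    print(role, activity)
```

Any input whose real behavior differs from these defaults is a candidate for an explicit activity setting.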
Note: The vectorless power estimator does not propagate activity to the output ports of GTs. If any design
logic depends on these activity rates, you must explicitly specify the activity rates on GT outputs using
set_switching_activity -type gt_txdata|gt_rxdata commands to achieve an accurate
analysis.
coming into the simulated block. This type of information is not necessarily provided while
performing verification or validating functions. Sometimes, invalid data is given as input to verify
that the system can handle it and remain stable even when invalid data or commands are given to
it. Using such test cases to perform power analysis may result in inaccurate power estimation
because the design logic is not stimulated as it would be under typical system operation.
• System Transaction Level: Very early in the design cycle, you may have created a description
of transactions which occur between devices on a PCB or between the different functions of
your device application. You can extract from this the expected activity per functional block
for certain I/O ports and most of the clock domains. This information helps you fill in the
Xilinx® Power Estimator spreadsheet.
• Device Description Level: While defining the RTL for your application you may want to verify
the functionality by performing behavioral simulations. This helps you verify the data flow and
the validity of calculations to the clock cycle. At this stage, exact device resources used, count,
and configuration data is not available. You can manually extrapolate resource usage and
extract activity for I/O ports or internal control signals (Set, Reset, Clock Enable). This
information can be applied to refine the Xilinx Power Estimator spreadsheet information. Your
simulator should be able to extract node activity and export it in the form of a SAIF file. You
can save this file for more accurate power analysis in the Vivado® design flow, for example
after place and route, if you do not plan to run post-implementation simulations.
• Post Synthesis: The netlist is mapped to the actual resources available in the target device.
• Post Placement: The netlist components are placed into the actual device resources. With
this packing information the final logic resource count and configuration becomes available
and you can update the Xilinx Power Estimator spreadsheet for your design.
• Post Routing: After routing is complete all the details about routing resources used and
exact timing information for each path in the design are defined. In addition to verifying
the implemented circuit functionality under best and worst case gate and routing delays,
the simulator can also report the exact activity of internal nodes and include glitching.
Power analysis at this level provides you the most accurate power estimation before you
actually measure power on your prototype board.
this case it is preferable to capture from the simulation results only module I/O ports activity
and let the vectorless engine estimate internal node activity. Functional simulations do not
capture glitch activity. Also, Report Power may not be able to match all nodes between the
design and the simulation netlist because of logic transformations which happen during
implementation (optimizations, replications, gating, retiming, etc.). Nevertheless most primary
ports and control signals will be matched and this information provides the tool with realistic
activity for the matched nodes. The vectorless engine propagates this activity onto the
unmatched portion of the design, which increases the accuracy of the power estimation.
• Ensure test vectors and inputs to the simulation represent the typical or expected behavior of
the design. Error handling and corner case simulations do not typically stimulate the logic in
the way it would be stimulated under normal operation.
• Post-implementation simulation results are preferred over behavioral simulation results. Full
timing simulation would be much more accurate, because it helps with capturing timing glitch
information into the SAIF results.
IMPORTANT! Report Power uses the vectorless algorithm and default switching rates to compute the activity on
design nets that are not matched in the given SAIF file. This results in different toggle rates in the Power Report,
which are eventually reflected in XPE too. Using SAIF files generated from VHDL simulation is not recommended,
because timing simulation is supported in Verilog only.
IMPORTANT! To generate a SAIF file from the Vivado simulator for power analysis, refer to the Vivado Design
Suite User Guide: Logic Simulation (UG900). To generate a SAIF file from the Mentor Graphics ModelSim
simulator for power analysis within the Vivado® Design Suite, see Xilinx® Answer Record 53544. For full timing
simulation, generate a design timing information (SDF) file using the write_sdf command and annotate it
while running simulation.
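As a rough sketch of the flow described in the two notes above, the commands below write the timing data from an implemented design and then capture a SAIF file in the Vivado simulator. The file names, the /tb/dut scope, and the run time are placeholders rather than values from this guide. Check each command's -help output for the options available in your release.

```tcl
# --- In Vivado, with the routed design open ---
# Write a timing simulation netlist and the matching SDF delay file.
write_verilog -mode timesim -sdf_anno true routed_timesim.v
write_sdf routed.sdf

# --- In the Vivado simulator Tcl console, once the timing simulation is loaded ---
open_saif routed.saif                 ;# start a SAIF dump (placeholder file name)
log_saif [get_objects -r /tb/dut/*]   ;# log signals under a hypothetical DUT scope
run 1000ns                            ;# simulate a representative workload
close_saif                            ;# flush and close the SAIF file
```

The resulting SAIF file is only as representative as the stimulus, so the run should cover typical operation rather than error-handling test cases.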
Review the different input tabs to make sure they accurately represent your expected system.
The following input tabs are available in the Report Power dialog box:
• Environment Tab
• Power Supply Tab
• Switching Tab
• Output Tab
Environment Tab
Review the different user-editable selections in the Environment tab. Make sure the process,
voltage and environment data closely match your expected environment. These settings have a
significant influence on the total estimated power. The user-editable selections in the
Environment tab are:
• Device Settings:
• Temp Grade: Select the appropriate grade for the device (typically Commercial or
Industrial). Some devices may have different device static power specifications depending
on this setting. Setting this properly will also allow for the proper display of junction
temperature limits for the chosen device.
• Process: For the purposes of a worst-case analysis, the recommended process setting is
Maximum. The default setting of Typical will give a closer picture to what would be
measured statistically, but changing the setting to Maximum will modify the power
specification to worst-case values.
• Environment Settings:
• Output Load (pF): The board and other external capacitance driven by the outputs in the
I/O ports.
• Ambient Temperature (°C): Specify the maximum possible temperature expected inside
the enclosure that will house the device design. This, along with airflow and other thermal
dissipation paths (for example, the heatsink), will allow an accurate calculation of Junction
Temperature which in turn will allow a more accurate calculation of device static power.
• Effective ΘJA (°C/W): Specify a custom ΘJA value, which is generally derived from
thermal modeling. For better estimation accuracy, set both Ambient Temperature and
Effective ΘJA when these values are derived from thermal simulations.
• Airflow (LFM): The airflow across the chip is measured in Linear Feet per Minute (LFM).
LFM can be calculated from the fan output in CFM (Cubic Feet per Minute) divided by the
cross sectional area through which the air passes. Specific placement of the device and/or
fan may have an effect on the effective air movement across the device and thus the
thermal dissipation. Note that the default for this parameter is 250 LFM. If you plan to
operate the device without active air flow (still air operation) then the 250 LFM default
has to be changed to 0 LFM.
• Heat Sink (if available): If a heatsink is used and more detailed thermal dissipation
information is not available, choose an appropriate profile for the type of heatsink used.
This, along with other entered parameters, will be used to help calculate an effective ΘJB,
resulting in a more accurate junction temperature and quiescent power calculation. Note
that some types of sockets may act as heatsinks, depending on the design and
construction of the socket.
• Board Selection and Number of Board Layers (if available): Selecting an approximate size
and stack of the board will help calculate the effective ΘJB by taking into account the
thermal conductivity of the board itself.
• ΘJB: In the event more accurate thermal modeling of the board and system is available,
ΘJB (printed circuit board thermal resistance) should be used to specify the amount of
heat dissipation expected from the device.
The more accurately custom ΘJB can be specified, the more accurate the estimated junction
temperature will be, thus affecting device static power calculations.
IMPORTANT! In order to specify a custom ΘJB, the Board Selection must be set to Custom. If you do specify a
custom ΘJB, you must also specify a Board Temperature for an accurate power calculation.
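The Environment tab selections can also be applied from the Tcl prompt with set_operating_conditions. The following is a minimal sketch, assuming a still-air enclosure and, separately, a board with a known thermal model. All numeric values are illustrative, and the option names should be confirmed with set_operating_conditions -help for your release.

```tcl
# Worst-case process, 50 degC inside the enclosure, still air, no heat sink.
set_operating_conditions -process maximum -ambient_temp 50 -airflow 0 -heatsink none

# Custom board thermal model: a custom ThetaJB goes together with the Custom
# board selection and a board temperature (values are illustrative only).
set_operating_conditions -board custom -thetajb 3.5 -board_temp 60
```

Setting airflow to 0 matters for fanless designs, because the 250 LFM default would otherwise give an optimistic junction temperature.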
Switching Tab
In the Switching tab, review the design’s Simulation and Default Activity Settings. The clocks
constrained in the design can also be viewed on this page as shown in the following figure.
• Reset switching activity before report power: When enabled, this check box clears all
previously applied switching activity before running Report Power.
• Switching Activity for Resets: Sets the Switching Activity for control sets. See Deassertion of
Switching for Resets for more information.
• Simulation Settings:
• Simulation activity file (.saif): Vivado® Report Power takes input SAIF simulation data
generated for the design. Report Power then matches nets in the design database with
names in the simulation results netlist. See Specifying Switching Activity for the Analysis,
for a description of how input from a simulation results (SAIF) file can be used for a more
accurate power analysis.
• Default toggle rate: The default toggle rate to be used in power analysis on the primary
inputs of the design. The default toggle rate is set on those primary input nets whose
switching activity is not specified by the user, simulation data or constraints of the design.
On asynchronous inputs the toggle rate is set with respect to the capturing clock in the
design. Valid values are: 0 <= value < 100. The default value is 12.5.
• Default Static Probability: The default static probability to be used in power analysis on
the design. The default static probability is set on those primary inputs whose switching
activity is not specified by the user, simulation data or constraints of the design. Valid
values are: 0 <= value <= 1. The default value is 0.5.
• BRAM Port Enable: Sets the activity rate of all the BRAM enable signals of the design to
the value specified.
• BRAM Write Enable: Sets the activity rate of all the BRAM write enable signals of the
design to the value specified.
• Bidi Output Port Enable: Sets the activity rate of all the Bidirectional I/O enable signals
(i.e., T pin of IOBUF) of the design to the value specified.
• Primary Outputs: Sets the switching activity rate of all the enable signals (i.e., T pin of
OBUFT) of the primary outputs of the design to the value specified.
• Logic:
• Registers: Sets switching activity rate on Output pins of all the Registers in the design.
• Shift Registers: Sets switching activity rate on Output pins of all the Shift Registers in
the design.
• Distributed RAMs: Sets switching activity rate on Data Outputs pins of all the
Distributed RAMs in the design.
• LUTs: Sets switching activity rate on Outputs pins of all the LUTs in the design.
• DSPs: Sets switching activity rate on Data Outputs pins of all the DSPs in the design.
• Block RAMs: Sets switching activity rate on Data Outputs pins of all the Block RAMs in
the design.
• RX Data: Sets switching activity rate on RX Data Output pins of all the GTs in the
design.
• TX Data: Sets switching activity rate on TX Data Output pins of all the GTs in the
design.
Note: Specify Static Probability and Toggle Rate together. See the description of the
set_switching_activity command under Netlist Element Activity, for more information and
guidelines.
• Constrained Clocks: Expanding Constrained Clocks lists all the clocks that are constrained in
the design. Review the clock frequencies and ensure they are accurate.
TIP: Make sure all primary clocks are specified. The design clocks are identified based only on
create_clock or create_generated_clock constraints.
RECOMMENDED: Xilinx® recommends that you use the exact clock frequencies in your design for more
accurate power calculation.
Output Tab
The Output tab specifies the power result files to generate. It contains the following settings:
• Output Text File: For project documentation you may want to save the power estimation
results. In other circumstances you may be experimenting with different mapping, placement,
and routing options to close on performance or area constraints. Saving power results for
each experiment will help you select the most power-effective solution when several
experiments meet your requirements.
• Output XPE file (for Xilinx® Power Estimator): This file, when selected, saves all the
environment information, device usage, and design activity in a file (.xpe) which you can later
import into the Xilinx Power Estimator spreadsheet. This proves quite useful when your power
budget is exceeded and you don't think that software optimization features alone will be able
to meet your budgets. In this case, import the current implementation results into Xilinx
Power Estimator, explore different mapping, gating, folding, and other strategies, and estimate
their impact on power before modifying the RTL code or rerunning the implementation. You
can also compare your assumptions in the Xilinx Power Estimator spreadsheet with these
synthesis results and adjust XPE where appropriate.
• Output RPX file: This file saves the power report in RPX format, which can later be opened in
the Vivado® Integrated Design Environment (IDE) by using the open_report command.
This is also helpful if you want to override the default switching activity used by
report_power. In this case, you can create XDC constraints with the desired default values and run
report_power.
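The text and XPE outputs described above can also be requested from the Tcl prompt. A minimal sketch, with placeholder file names:

```tcl
# Write a text power report and an XPE import file for the open design.
report_power -file power_impl.rpt -xpe power_impl.xpe
```

Using a distinct file name per implementation experiment keeps the results available for side-by-side comparison.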
The Summary view also displays a Confidence Level for the power analysis. The Confidence Level is
a measurement of the accuracy and the completeness of the input data Report Power uses as it
performs a power analysis. If you click the Confidence level value (Low, Medium, or High),
Confidence level details are displayed, and these details can suggest ways of increasing the
accuracy of the power analysis. For example, you might increase the accuracy of the power
analysis by specifying activity rates for more of the clocks or more of the I/O inputs in the
design.
Figure 20: Vivado Power Analysis - Report Power in the Vivado Integrated Design
Environment
The Power Supply section shows the current drawn for each supply source and breaks down this
total between static and dynamic power.
From the Utilization Details section you can get more details of the power at the resource level
by clicking on the different resource types in the graph as shown in the following figure. The
different resources views are organized as a tree table. You can drag a column header to reorder
the column arrangement. You can also click on a column header to change the sorting order.
If the reported power exceeds your thermal or supply budget, you can refer to Chapter 6: Tips
and Techniques for Power Reduction, for a list of available techniques to reduce the device
power. These techniques depend on the completeness of your design and your development
process’s tolerance to change.
IMPORTANT! When Maximum Process is selected in the Device table and any power-on supply current values
exceed the estimated operating current requirements, the Power Supply panel displays the minimum power-on
supply requirements, in blue. If any of the current values appear in blue, the total power indicated in the Power
Supply panel will not match the Total On-Chip power in the Summary section of Vivado Power Report.
The following properties can be modified for the SD-FEC object after implementation, before
running Report Power:
These three properties can also be set during SD-FEC IP customization and by using
set_property commands on an implemented design. Also, the .xpe file generated by the Report
Power command can be imported into the XPE spreadsheet for further what-if analysis.
Use the RF data converter IP customization to set all the user configuration values such as
ADC/DAC channel count, sample rate, clock source, decimation, mixer, etc. Also, the power data
can be imported back into the XPE spreadsheet for further analysis of estimated power.
You can also locate HBM instance using Find in the Vivado® Integrated Design Environment as
shown in the following figure:
The property values can be modified before running report_power. The following properties are
used for power analysis:
Note: In the current release, PAGEHIT_PERCENT_00 and PAGEHIT_PERCENT_01 have a default value of
50. The default value will be corrected to 75 in a future release.
The following properties are assigned by HBM IP configuration and are not modified.
• DATARATE_00 to DATARATE_15: Data rate for each memory controller in Gbps. Properties
00 to 07 apply to Stack 0 and 08 to 15 apply to Stack 1.
• SWITCH_ENABLE_00, SWITCH_ENABLE_01: Reflects whether the dedicated AXI switch is
enabled or disabled for a stack.
The following figure is an example of Report Power output for HBM, showing the breakdown of
power between the device and HBM stacks.
All major parameters required for Report Power estimation can be set using UNISIM properties.
The UNISIM Properties are as follows:
• MODULATION_MODE: This is the GTM signal modulation scheme and is used to select NRZ
and PAM4 signaling.
• DATARATE: This is the GTM channel line rate for the given modulation scheme. For PAM4, the
line rate ranges from 19.6 Gb/s to 58 Gb/s; for NRZ, it ranges from 9.8 Gb/s to
29 Gb/s.
• FEC_MODE: This is the hardened RS-FEC usage. If this parameter is set to BYPASS, GTM
bypasses the hardened FEC block. To use FEC, set this attribute to 'KP4'.
• INTERFACE_WIDTH: This is the GTM interface width and this property is added for future
use. As of now, the interface width is derived from MODULATION_MODE.
• INS_LOSS_NYQ: This is the equalization mode. The value of this parameter should be less
than or equal to 10 dB for 'Low Power' mode and greater than 10 dB for High Performance mode.
• TX_AMPLITUDE_SWING: This is the amplitude of the TX driver's differential swing. The valid
values are 250, 275, 300, …, 1000, and 1025.
Note: GTM cannot bypass the hardened FEC block when PAM4 signaling is used. Ensure that the
FEC_MODE parameter is set to KP4 when using PAM4.
Chapter 4
Introduction
This chapter discusses the power-related features and flows available in the Vivado® Design
Suite to get you quickly started with power estimation, analysis, and optimization. You can
perform power analysis after synthesis, optimization, placement or routing. It is not supported
after RTL elaboration. You can perform power optimization only before and after placement.
Using either the Vivado Integrated Design Environment or the Tcl prompt, you can perform
power analysis and optimization, and can experiment with What If? scenarios in a dynamic
manner.
• Reporting the thermal characteristics that impact the static power of the design, including:
○ Thermal statistics, such as junction and ambient temperature values
○ Data on board selection, including number of board layers and board temperature
○ Data on the selection of airflow and the heat sink profile used by the design
• Reporting the device current requirements from the different power supply sources
• Allowing detailed power distribution analysis to guide power saving strategies to reduce
dynamic, thermal or off-chip power
The following figure shows the typical power estimation and analysis flow. This includes the main
steps required to ensure appropriate tool input and settings before running the estimation or
analysis, which ensures the most accurate results. You can run power estimation and analysis
commands from the Vivado Integrated Design Environment or the Tcl prompt.
Supported Inputs
• XDC constraints file to specify timing constraints.
• Simulation activity output files (SAIF files) from behavioral or timing simulation.
• XDC/Tcl file commands to specify environment, operating conditions, tool defaults, and
individual netlist node activity. For UltraScale+™ devices, XPE can export XDC files that
can be sourced in the Vivado® Integrated Design Environment.
• The Vivado power analysis tool has multiple mechanisms to enter default values and node
activity rates. The list below presents the different mechanisms; the list is sorted from highest
priority to lowest.
1. Static (constant tied to GND or VCC).
2. User entered value in any of the Utilization Details views in the Power Results window.
Note: You can adjust default values in the Report Power dialog box. See Review Device/Design Settings
and Adjust Activity for Known Elements for more information.
Supported Outputs
• GUI I/O Bus, Net, and Cell Power properties
• GUI and text power reports
• XML based power report that can be imported into the Xilinx® Power Estimator spreadsheet
tool
• Reporting activity rates and operating conditions through Tcl commands
• The Power Results panel displays all the device, tool, and environment settings used with
power calculations.
• The Summary section displays a concise view of the most important thermal and supply
power results.
Navigate through your design by type of resources with the Utilization Details section or Netlist
view to review configuration, utilization, and activity details for the selected elements in the
Statistics tab of the Properties window. You can generate multiple reports to estimate power
under different operating conditions or different activity patterns. Some of the values in the
Utilization Details views (for example, Frequency in the Clocks view or Signal Rate in the I/O
view) are color coded as shown in the following figure to indicate the source of the value used by
Report Power to perform the power analysis. A legend at the bottom of the window indicates the
source specified by each color (for example, the value was supplied by a Simulation activity file,
or was User Defined, or a Default value was assigned by the vectorless propagation engine).
IMPORTANT! Report Power supports Zynq®-7000 SoC and Zynq® UltraScale+™ MPSoC power analysis on
Zynq-7000/Zynq® UltraScale+™ MPSoC blocks configured through the IP integrator. You configure the PS
usage and functionality through the IP integrator. Report Power estimates power based on these configuration
settings. The power estimate within Vivado® is read-only; you cannot edit the Signal Rate or Static Probability
of the PS specific processor, interfaces or memory at this time. For more details on the individual fields in the PS
tab of Xilinx® Power Estimator, refer to the PS Sheet section in the Xilinx Power Estimator User Guide (UG440).
IMPORTANT! Report Power supports power estimation of VCU (Video Codec Unit) for Zynq UltraScale+ EV
devices. VCU is configured through the IP integrator for resolution, color format and other properties. Report
Power estimates power based on these configuration settings. For more details, refer to the Other Sheet section
in the Xilinx Power Estimator User Guide (UG440).
When Report Power runs, the design power and the individual power supply currents are
compared with the power and supply current budgets. Report Power (GUI/Text) indicates the
power budget margin. It displays a positive margin if the design power is less than the
power budget, or a negative margin in red if the design power exceeds the power budget. If you
have not provided a power budget, then the report will display N/A for the margin. The following
figure shows the Power report when you do not specify any design power budget.
The following figure shows the power report when the design power budget is specified as 4
Watts and power margin is positive. It also displays the power margin in a negative state when
the design power budget is specified as 2 Watts.
Figure 33: Power Report With Power Budget at Positive and Negative Margins
The supply current budget for each power supply is displayed in the Power Supply section. For
each supply, Report Power displays positive margin when the supply current is less than the
specified budget and a negative margin appears in red when supply current exceeds the specified
current budget. If the current budget is not specified for any supply, then Report Power displays
the current budget as unspecified and margin as N/A for that supply rail as shown in following
figure:
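The budgets discussed above are entered as operating conditions. The following is a minimal sketch: the 4 W figure matches the example in this section, while the rail names and current values are illustrative assumptions. Confirm the option names with set_operating_conditions -help.

```tcl
# Overall design power budget in watts.
set_operating_conditions -design_power_budget 4

# Per-rail current budgets in amperes, given as name/value pairs
# (rail names and values are illustrative).
set_operating_conditions -supply_current_budget {vccint 2.0 vccaux 0.5}

# Report Power then shows the margin against each budget.
report_power
```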
open_report
When you run implementation and open the implemented design in project mode, the power report
for impl_1 opens by default, like the timing report. In the checkpoint flow, you can save the report
by using the -rpx option with the report_power Tcl command:
This saved report can be restored in Vivado Integrated Design Environment using the following
Tcl command:
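The command listings for this save and restore sequence did not survive in this copy of the text. A minimal sketch of what they might look like follows. The checkpoint and report file names are placeholders.

```tcl
# Checkpoint flow: open the routed checkpoint and save the power report.
open_checkpoint routed.dcp            ;# placeholder checkpoint name
report_power -rpx power_routed.rpx    ;# save the report in RPX format

# Later, in the Vivado IDE Tcl console, restore the saved report.
open_report power_routed.rpx
```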
In the example above the Toggle Rate has been set to 1.5% and Static Probability is set to 0.8. On
the Tcl console the following XDC constraint will be displayed when the Vivado Integrated
Design Environment commits the change on OK.
IMPORTANT! This XDC constraint makes your design out of date so use Force-up-to-date to restore the design
status.
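For the values mentioned above (Toggle Rate 1.5%, Static Probability 0.8), the committed constraint would plausibly take the following form. The net name is a hypothetical stand-in for whichever object was edited in the GUI.

```tcl
# Toggle rate in percent and static probability for one net.
# my_block/data_en is a placeholder for the object selected in the GUI.
set_switching_activity -toggle_rate 1.5 -static_probability 0.8 [get_nets my_block/data_en]
```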
In the Vivado Integrated Design Environment, select Tools → Power Constraints Advisor to run
the Power Constraints Advisor.
Review the report table and modify inaccurate switching activity on critical control signals such
as inactive enables and reset signals that are asserted for excessive periods of time. The
following fields are available in the Power Constraints Advisor report:
• Net: The nets are control sets, block RAM enables or Reg Enables.
• Confidence: This field shows how accurate the switching activity is for a particular net.
Following are the thresholds used by the power tools when computing the confidence level
for nets:
Low confidence means that the block RAM is not active in the design and should be
revisited to check the possibility of removing it.
• Reg Enables:
Low Confidence informs you that the register in the design is not active and should be
revisited. Medium Confidence informs you that the registers are enabled for a reasonable
amount of time, either defined by you or propagated by the tool.
• Fanout: This field shows the fanout for each control signal, which is the number of driven leaf-
level primitives. Signals with higher fanout are the most important for review and correction
because they are capable of disabling downstream switching of large portions of the design.
This may result in severe under-reporting of power. Low-fanout signals with inaccurate
switching have less impact and are therefore a lower priority.
• Fanout Type: This field specifies whether the nets are control sets (set, reset, clear, preset) or
block RAM enables. If there are multiple entries for a control net, it means that the net
has multiple fanouts and drives different pins in the fanout cells.
• Polarity: This field identifies the polarity for the control set. You should pay attention to the
polarity while setting the static probability of a net.
• Static Probability: This is an editable field. Enter the correct activity based on the
fanout type and polarity of the net.
• Toggle Rate: The toggle rate for the net. This field is also editable. Enter a value that is
consistent with the static probability.
Note: By default, PCA will be sorted by Confidence as Low and Fanout as high to low. Also, the column
filtering is enabled for PCA wizard. To use column filtering, right-click on header row and click Enable
Column Filtering.
The following process is recommended for using the Power Constraints Advisor:
1. Click the Confidence column to sort it so that LOW signals are at the top.
2. Hold down the Ctrl key and click the Fanout column twice to sort it by descending values.
3. Review and define new Static Probability and Toggle Rate for all the control nets which are
LOW in confidence with fanout greater than 200.
4. Click OK to apply the constraints to the design and rerun the Report Power command.
The following are some of the examples which help you to set accurate switching activity for the
control sets and block RAM enables:
This indicates that the reset is high (active) 90% of the time. This means that the load cells are
reset for 90% of the time, which is excessive. Change the switching activity to indicate that the
reset is inactive, a more realistic condition, by setting the Static Probability to 0 and Toggle Rate
to 0.
This indicates that the BRAM is never enabled, which is overly pessimistic. Assign a more
reasonable switching activity on the BRAM Enable such as a 25% enable rate, setting the Static
Probability to 0.25 and Toggle Rate to 50. Use the following command to generate the text
report for power advisory:
The advisory table is added at the end of this report file.
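The two corrections described above, followed by the advisory text report, can be sketched as follows. The net names and the report file name are hypothetical, and the reset is assumed to be active-High, as in the example.

```tcl
# Active-High reset held inactive: static probability 0, toggle rate 0.
set_switching_activity -static_probability 0 -toggle_rate 0 [get_nets sys_rst]

# Block RAM enable active 25% of the time.
set_switching_activity -static_probability 0.25 -toggle_rate 50 [get_nets mem_ctrl/bram_en]

# Text power report that includes the advisory table.
report_power -advisory -file power_advisory.rpt
```

For an active-Low reset the inactive state corresponds to a static probability of 1, which is why the Polarity field should be checked before editing.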
• None: This is the default mode. In this mode, Report Power does not set any value and
leaves the activities as they are after vectorless propagation.
• Deassert: When you select this option, Report Power deasserts all the resets in the
design.
• Do Not Deassert: In this mode, changes made by the Deassert option are reverted to their
original values.
set_switching_activity -deassert_resets
reset_switching_activity -no_deassert_resets
This is equivalent to the Do Not Deassert option for Switching Activity for Resets. The Deassert
option will not be set in the following exceptional conditions:
• If a reset net is connected to pins of different polarity. For example, if a reset net is connected
to both the active-High reset pin and active-Low reset pin, then the command would not try
to set value on this net.
• If a net connected to active-High reset pin is also connected to an active-High enable pin at
the same time, then this command does not do anything.
• Nets connected with synchronizer circuits which provide an asynchronous clear and
synchronous deassert functionality to avoid meta-stability issue crossing different clock
domains.
To enable switching activity reporting on schematics, click the settings icon in the top right
corner of the schematic view and select SP/TR for scalar or bus pins.
• Device Environment
• Netlist Element Activity
• set_case_analysis
Device Environment
Specify all device operating conditions settings such as:
○ Heat sink
○ VCCAUX
○ VCCO
○ Process corner
• report_operating_conditions
Report all or the specified operating condition settings. Examples are:
report_operating_conditions ;# Reports all
report_operating_conditions -voltage
• set_operating_conditions
Modify the specified operating condition parameters. Examples are:
set_operating_conditions -process maximum -junction_temp 50
• reset_operating_conditions
Return all or the specified operating condition parameters to the default values for the
selected device. Examples are:
reset_operating_conditions ;# Resets all
reset_operating_conditions -voltage
• set_switching_activity
Set the activity of the specified elements. You can set either static probability and signal rate
or static probability and toggle rate. Examples are:
○ To set default switching activity on primary ports and black box outputs of the entire
design:
set_switching_activity -default_static_probability 0.5 -default_toggle_rate 12.5
IMPORTANT! Signal rate must be > 0 when static probability is > 0 and <1. Similarly, static probability must
be 0 or 1 when signal rate is 0. Static probability and signal rate must be specified together.
Note that the toggle rate is specific to the clock associated with the element and the valid
range is 0 to 100.
• Setting Switching activity on a group of nodes:
The set_switching_activity command can also be used to set activity rates on a group
of nodes (called types), using the -type option. The supported types are listed in the following
table:
The following section describes usage in the set_switching_activity command. To set the
specified switching activity on all LUTs in the design top scope:
To set the specified toggle rate and static probability on all registers in the hierarchy of CPU/
MEM:
To set the specified toggle rate and static probability on all registers in the hierarchy of CPU/ and
the hierarchy underneath:
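The command listings for these three cases are missing from this copy of the text. A sketch of how they might look is given below. The type names lut and register are assumed, because the types table is not reproduced here, and the rate values are illustrative.

```tcl
# All LUTs in the design top scope.
set_switching_activity -type lut -toggle_rate 20 -static_probability 0.5 -all

# All registers directly in the CPU/MEM hierarchy.
set_switching_activity -type register -toggle_rate 15 -static_probability 0.3 [get_cells CPU/MEM]

# All registers in CPU and in every hierarchy level underneath it.
set_switching_activity -type register -toggle_rate 15 -static_probability 0.3 -hier [get_cells CPU]
```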
IMPORTANT! Ideally, the toggle rate should not include the glitch rate, which implies that the following condition
must be satisfied: (toggle_rate/200) <= static_probability <= 1 - (toggle_rate/200).
Use the signal rate setting for considering glitch switching, along with actual activity rate.
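The condition above is easy to check before applying a constraint. The helper below is a hypothetical convenience procedure in plain Tcl, not a Vivado command.

```tcl
# Returns 1 when a glitch-free toggle rate (in percent) is consistent with the
# static probability, that is, toggle_rate/200 <= sp <= 1 - toggle_rate/200.
proc sp_tr_consistent {toggle_rate static_probability} {
    set bound [expr {$toggle_rate / 200.0}]
    return [expr {$static_probability >= $bound && $static_probability <= 1.0 - $bound}]
}

puts [sp_tr_consistent 50 0.25]   ;# prints 1: 0.25 lies within [0.25, 0.75]
puts [sp_tr_consistent 50 0.10]   ;# prints 0: 0.10 is below the 0.25 bound
```

Intuitively, a signal that toggles on a large fraction of clock cycles cannot also spend almost all of its time at a single logic level.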
IMPORTANT! The set_switching_activity command will not have any effect on design clock nets. To change
the activity on the clock nets, please use timing constraints (create_clock,
create_generated_clock, set_case_analysis etc).
• report_switching_activity
Reports the activity of the specified elements. Displays static probability, signal rate and
toggle rate. The command also displays the source of the assigned switching activity.
Examples of report_switching_activity commands are:
○ Report static probability, signal rate, and toggle rate for a single net:
Vivado% report_switching_activity -static_probability [get_ports clk_p]
clk_p: static probability = 0.5 (C) signal rate = 400 (C) toggle rate = 200 (C)
The source of the assigned switching activity is expressed as: (C)=XDC Constraints, (D)=Tool Default,
(S)=SAIF Annotated, (A)=User Assigned.
○ Report on group nodes:
To report switching activity for all distributed RAMs in the hierarchy CPU/:
report_switching_activity -type lut_ram [get_cells CPU/*]
See Table 4: Types (-type option) in Switching Activity Tcl Commands table for information
on the supported types.
• reset_switching_activity
Resets the activity rates (static probability, signal rate, and toggle rate) on specific netlist
elements to the tool default value. The command resets both user specified values and
Simulation activity rate settings. Examples are:
○ To reset default switching activity on primary ports and black box outputs of the entire
design:
reset_switching_activity -default
○ To reset the switching activity for all LUTs in the hierarchy CPU/MEM and levels underneath:
reset_switching_activity -type lut -hier [get_cells CPU/MEM]
See Table 4: Types (-type option) in Switching Activity Tcl Commands table for information on
the supported types.
• read_saif
Reads a SAIF simulation output file and annotates matched netlist elements with the switching
activity described in the file. An example is:
read_saif -out_file read_saif.rpt -strip_path tb/tb_core/core -file routed.saif
• strip_path: By default it is assumed that the design top is instantiated in the test bench.
Thus the first two levels of hierarchy are stripped while annotating SAIF data into the
design. If the simulation setup has multiple hierarchy levels, then you are expected to
specify the hierarchy to be stripped off from SAIF to better match the actual design.
The read_saif command also displays the SAIF annotation summary to show the number of
design nets matched. Ideally 100% design net match is expected for an accurate analysis.
IMPORTANT! If your design contains any encrypted IP/Blocks, your simulator will not dump the SAIF
information for those IP/Blocks and for any internal blocks within the encrypted hierarchy. This incomplete SAIF
information might affect the power estimation accuracy. The read_saif command will not modify the
activities on the design clock nets. Clock nets activities will be driven by the timing constraints.
The read_saif command can be executed multiple times, once for each SAIF file. This enables you to
read multiple SAIF files for different blocks in the design. Report Power then estimates the power by
considering the switching activity information from all the SAIF files. If common nets exist in
multiple SAIF files, the switching activity from the last SAIF file read is applied.
• create_clock
Synthesis and implementation constraint to specify clock waveforms. An example is:
create_clock -name clk -period 5 [get_ports clk]; # 200MHz
• create_generated_clock
Synthesis and implementation constraint to specify generated clock waveforms. An example
is:
create_generated_clock -name gen_clk -source clk1 -divide_by 2 [get_nets -hier sys_clk]
• set_input_delay
Associates primary inputs to the specific clock. This is very important in a multi-clock design,
especially if the primary port is launched at a different clock. An example is:
create_clock -name clk1 -period 5 [get_ports clk]
set_input_delay -clock clk1 2 [get_ports din] ;# din and the 2 ns delay are placeholder values
Note: If the primary ports are not associated with any clock, then the switching rate is computed based
on the capturing clock in the path.
By default, create_clock and create_generated_clock are defined in the XDC file and
you need not rerun them. However, to do What If? analysis, such as by changing the clock
frequency for Report Power, create_clock or create_generated_clock must be used to
reflect the change.
• set_case_analysis
For global clock primitives (BUFG, BUFGCE, BUFGCE_DIV, BUFG_GT, BUFGCTRL), the enable /
selection of the clock is determined by the set_case_analysis command. This command guides the
timing analyzer to identify the clocks across clocking logic. For example, the select signal of
BUFGMUX must be set using set_case_analysis to guide the timing analyzer's clock selection.
This in turn helps Report Power to estimate power using the right clock. For a BUFGCE block, the CE
input must be set using set_case_analysis to enable or disable the clock output.
• Make sure the activity is defined for all clocks in your netlist.
• If possible, specify the activity of all primary input ports in your design using the Tcl
commands or reading a simulation output file. These port activity rates determine the internal
logic activity rates. Therefore, if the tool’s default settings do not match your application, the
internal logic activity may be overestimated or underestimated.
• If known, specify the activity of any high fanout nets that you defined in your HDL code, such
as global set, reset, and clock enable signals.
When reading the simulation result file, make sure the activity is representative of the worst case
design functional activity (that is, the simulation result at which the maximum design code
coverage is achieved). Using simulation results from basic and corner case tests can lead to
inaccurate power estimations.
• Environment
• Device
• Implementation
• Power tool
Figure 39: Text Report Generated for Power and Thermal Information
You can also use a Tcl script. The script examples below assume that you are using the batch
mode for sourcing the script.
# Open example project with HDL source files and timing constraints
create_project project_1 $work_dir/project_1 -part xc7k70tfbg676-2 -force
set_property target_language VHDL [current_project]
instantiate_example_design -template xilinx.com:design:cpu_hdl:1.0
#----------------------- Run Synthesis then Power estimation -----------------
#open design
open_run synth_1
#open design
open_run impl_1
#---- Run various Implementation steps then run Power estimation after every step ----
opt_design
report_power -verbose -file ex1_post-opt_design.pwr
power_opt_design ;# Optional
report_power -verbose -file ex1_post_pwr_opt_design.pwr
place_design
report_power -verbose -file ex1_post_place_design.pwr
phys_opt_design ;# Optional
report_power -verbose -file ex1_post_phys_opt_design.pwr
route_design
# Report power
report_power -file ex3_power_before.pwr
# Disable reset and enable clock enables in module fftEngine most of the time
set_switching_activity -static_probability 0 -signal_rate 0 [get_nets fftEngine/reset_reg]
set_switching_activity -static_probability 1 -toggle_rate 0 [get_nets fftEngine/wb_we_i_reg]
report_power -file ex3_power_no_reset_activ.pwr
report_switching_activity [get_nets {fftEngine/reset_reg fftEngine/wb_we_i_reg}]
# Enable reset and disable clock enable in module fftEngine most of the time
set_switching_activity -static_probability 1 -toggle_rate 0 [get_nets fftEngine/reset_reg]
set_switching_activity -static_probability 0 -signal_rate 0 [get_nets fftEngine/wb_we_i_reg]
report_power -file ex3_power_reset_activ.pwr
report_switching_activity [get_nets {fftEngine/reset_reg fftEngine/wb_we_i_reg}]
The dynamic power consumption of a device is determined by the operating clock frequency (f),
node capacitance (C), device operating voltage (V), and the activity (α) on various nodes in the
design. For most designs, several of the above parameters are typically fixed either by the device
technology (for example, voltage) or by design requirements (for example, operating frequency).
However, there are several nodes in the design that do not affect the output of the device but
still continue to toggle. This constitutes a significant portion of wasted dynamic power. You can
use the clock enables (CE) in the device for gating such nodes. While this is possible through
optimal coding techniques, this is rarely done by the designer either because the design contains
intellectual property (IP) from other sources or because of the amount of effort involved in
performing such fine grained clock gating. Vivado automates these power optimizations under a
single command to maximize power savings while minimizing your effort.
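As a reminder of how the parameters named above combine, dynamic power is commonly modeled with the first-order CMOS switching relation below. This is the general textbook form and is not necessarily the exact model used by Report Power.

```latex
P_{\text{dynamic}} \propto \alpha \, C \, V^{2} \, f
```

Because V and f are usually fixed, reducing the activity α on nodes that do not affect the output is the remaining lever, and that is what clock gating targets.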
Vivado performs an analysis on the entire design, including legacy and third-party IP blocks, for
potential power savings. It looks at the output logic of sourcing registers that do not contribute
to the result for each clock cycle and then creates fine-grained clock gating and/or logic gating
signals that neutralize unnecessary switching activity.
Figure: Power Consumption Before and After Intelligent Clock Gating (sig gated by CE)
The intelligent clock gating optimization also reduces power for dedicated block RAM in either
simple dual-port or true dual-port mode as shown in the following figure. These blocks provide
several enables: an array enable, a write enable, and an output register clock enable. Most of the
power savings comes from using the array enable, and the software implements functionality to
reduce power when no data is being written and when the output is not being used.
Figure: Block RAM Before and After Intelligent Clock Gating (address, data in, ce)
Xilinx® intelligent clock gating optimizations do not modify user logic but instead create
additional gating logic. Therefore the functionality of the design is preserved at all times.
However, this optimization could impact timing, especially if the optimization is applied on
critical paths.
In UltraScale™ devices, in addition to the above optimization, for block RAM in Simple Dual Port
(SDP) mode, WRITE_MODE of both the read and write ports can be changed to NO_CHANGE
safely if the read and write port clocks are asynchronous. These changes help to save power in
the write cycle by not updating the output port of the block RAM. This optimization will be
performed only when there is no impact to user defined functionality and performance.
• opt_design
• power_opt_design
Optimizations that are performed during the opt_design phase occur without user intervention.
These optimizations primarily focus on power savings on block RAMs.
IMPORTANT! The power optimization might impact the timing performance of your design during
opt_design, power_opt_design, or both.
For UltraScale™ devices, the more aggressive block RAM power optimizations that may
negatively impact timing are included only in power_opt_design. This allows performance to
be traded for power savings. For UltraScale+™ devices, XPM-URAM power optimization occurs
in power_opt_design.
By default the opt_design command performs block RAM power optimization. Block RAM
power optimization can also be run explicitly and standalone by using the -bram_power_opt
option:
opt_design -bram_power_opt
To disable block RAM power optimization from the default opt_design flow, set the
NoBramPowerOpt directive to the opt_design command:
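A minimal sketch of the command described above, using the directive name given in the text:

```tcl
# Run opt_design without the default block RAM power optimization
opt_design -directive NoBramPowerOpt
```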
You can also set this directive in the Implementation settings window as shown in the following
figure.
Figure 42: Disabling block RAM Power Optimization During Opt Design
IMPORTANT! Power Opt Design can be enabled either pre-place or post-place in the design flow, but not in
both places. See Running Power Optimization for more details.
1. In the Flow Navigator, select Open Synthesized Design or Open Implemented Design.
2. Select Reports → Report Power Optimization.
The equivalent Tcl command to perform this operation is:
report_power_opt -name <report_name>
3. In the Report Power Optimization dialog box, specify the following options.
• Results name: Specify the name under which the power optimization report appears in the
Vivado Integrated Design Environment.
• Export to file: Check this box to generate a text report in addition to the power
optimization report in the Vivado Integrated Design Environment. Specify a file name and
location for the text report, and select whether this is a TXT or XML file.
• Open in a new tab: Check this box to add this new power optimization report to any other
power optimization reports currently displayed in the Vivado Integrated Design
Environment. Leave this box unchecked to replace any power optimization reports
currently displayed in the Vivado Integrated Design Environment with this new power
optimization report.
4. Click OK.
A power optimization report appears in the results windows area of the Vivado Integrated
Design Environment.
You can select from different views of the power optimization report.
• General Information: Information about your design, the Xilinx® device into which your
design is implemented, and the Tcl command that generated this power optimization report.
• Summary: Count of block RAMs, SRLs, and Slice Registers that were optimized by the user in
the design and by the power optimization tool.
• Recommendations: Things you can do to further optimize your design for power.
• Hierarchical Information: Details of the block RAMs, SRLs, and Slice Registers for which
Vivado has performed power optimization.
For a description of the power optimizations Vivado performs, see Power
Optimization Feature and Block RAM WRITE_MODE Power Optimizations.
TIP: If any hierarchical module or instance is tagged with a DONT_TOUCH attribute, Power Optimization does
not optimize this logic.
• set_power_opt
• opt_design -bram_power_opt
• power_opt_design
• report_power_opt
These commands can be used to enable power optimization as well as control portions of the
design that are to be optimized, and to generate a report that shows the effect of the
optimizations performed. For information on options, properties, applicable elements, or
returned values for a specific command:
TIP: You need to use the power_opt_design command to enable the power optimization step. The
set_power_opt command is used only for targeting the optimization.
Examples
The following example sets power optimization for block RAM and REG type cells, then adds
SRLs:
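A sketch of the commands this example describes, assuming the bram, reg, and srl values of the -cell_types option:

```tcl
# Target block RAM and register cells, then add SRLs to the set
set_power_opt -cell_types {bram reg}
set_power_opt -cell_types {srl}
```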
The following example sets power optimization for BRAM cells only, then excludes the
cpuEngine block from optimization, but then includes the cpuEngine/cpu_dbg_dat_i block:
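A sketch of the commands this example describes, using the block names from the text:

```tcl
# Target block RAM cells only
set_power_opt -cell_types {bram}
# Exclude the cpuEngine block, then re-include one block inside it
set_power_opt -exclude_cells cpuEngine
set_power_opt -include_cells cpuEngine/cpu_dbg_dat_i
```

These commands only select what to optimize. The optimization itself runs when power_opt_design is called, as in the flow that follows.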
synth_design
opt_design
power_opt_design
place_design
route_design
report_power
Option Name | Optional | Default | Description
-cell | Yes | Top level | Report power optimization for a specific cell
-file | Yes | None | Write the report into the specified file. The specified file will be overwritten if one already exists
-quiet | Yes | N/A | Ignore command errors
-verbose | Yes | N/A | Suspend message limits during command execution
Examples
The following example creates a file named myopt.rep and reports power optimization for the
entire design:
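A sketch of the command this example describes, using the -file option from the table above:

```tcl
# Report power optimization for the entire design into myopt.rep
report_power_opt -file myopt.rep
```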
The following example creates a file named myopt.rep and reports power optimization for the
mctrl0 sub-hierarchy of the design:
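A sketch of the command this example describes, combining the -cell and -file options from the table above:

```tcl
# Report power optimization for the mctrl0 sub-hierarchy only
report_power_opt -cell mctrl0 -file myopt.rep
```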
If the design has been constrained correctly, then review the design for potential coding styles
that could impact power optimizations. The three areas of potential debug are the global set and
reset signals, block RAM enable generation, and register clock gating. A low number of power
optimization generated enables could indicate the need to review coding practices or options/
properties set for design synthesis and implementation.
Finally, ensure that the signal rate and probabilities of the global set and reset signals are set
correctly prior to running power optimization and vectorless power estimation.
• Slice registers and SRLs
A number of different reasons could explain why power_opt_design might not be able to
generate clock enables for slice registers or SRLs in the design. Some examples are:
○ Having combinatorial loops in the design
○ Using set/reset signals at the flip-flops and SRLs that are sourced from primary inputs to
the design
○ Using asynchronous set/reset signals at the datapath flip-flops
○ Large number of clock domains in the design preventing enables being generated due to
clock domain crossing issues
○ SRL sizes: Typically the larger the number of shift register stages in the SRLs, the more
difficult it is to generate a single clock enable for all stages
• Block RAMs
Block RAM (BRAM) rich designs are excellent candidates for power savings. Vivado uses a
variety of optimization techniques to generate enables and save power. If block RAM gating
coverage is low after using power_opt_design, some of the possible reasons could be:
○ Block RAMs are mainly FIFO18/FIFO36 cells. These cannot be optimized by the tool.
○ Memories inferred or instantiated are mainly in true dual port (TDP) mode using
asynchronous clocks on their A and B ports that cannot be optimized by
power_opt_design.
○ Use of asynchronous reset signals to either the block RAM themselves or to the address/
write-enable flip-flops feeding the block RAMs.
Where possible, identify and apply power optimizations only on non-timing critical clock
domains or modules using the set_power_opt XDC command. If the most critical clock domain
happens to cover a large portion of the design or consumes the most power, review critical paths
to see if any cells in the critical path were optimized by power optimization. Note that objects
optimized by power optimization have an IS_CLOCK_GATED property on them. Exclude these
cells from power optimization. To locate clock gated cells, you can use the following Tcl
command:
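A minimal sketch of such a query, assuming the IS_CLOCK_GATED property named above is set to 1 on gated cells:

```tcl
# Return all cells in the design hierarchy that power optimization has clock gated
get_cells -hier -filter {IS_CLOCK_GATED==1}
```

Cells from this list that sit on critical paths are candidates for set_power_opt -exclude_cells before power optimization is rerun.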
You can also use the Find dialog box to locate these cells, as shown in the following figure.
A simpler alternative is to limit power optimization to block RAMs. This minimizes the timing
impact but its effectiveness is dependent on the number of block RAMs present in the design
and how effectively they have been gated. To limit power optimization to block RAMs, run a
set_power_opt -cell_types {bram} command before running the opt_design or
power_opt_design commands.
Chapter 5
Introduction
Accurate power estimation is always challenging for the software tools, because the tools have
to assume various factors on their own. If you can guide the tool as much as possible to minimize
these assumptions, you can achieve a more accurate power estimation. For an accurate power
analysis, the following factors must be considered:
• Thermal Settings
• Power Supply Settings
• Clock Specifications
• Control Signals
• Primary Inputs
• Component Level
Thermal Settings
Ideally, static power is the sum of source to drain and gate leakage power in the transistor. Static
power depends strongly on thermal conditions, so providing accurate thermal information
is a basic requirement for accurate power estimation.
Process Corners
When devices are fabricated, each device has variations of performance and power consumption,
due to the manufacturing process. Report Power offers static power estimation for two process
corners, TYPICAL and MAXIMUM. Ideally all devices should meet the TYPICAL estimation value.
But process variations result in a distribution of devices, which needs to be centered on the
TYPICAL value, adjusted manually based on process variation for any particular device. A
MAXIMUM setting, however, guarantees that the reported numbers are within operating range
and closer to hardware measurements. At a fixed Junction Temperature, the expected variation in
static power from TYPICAL to MAXIMUM would be ~2.5X on Commercial devices.
IMPORTANT! Use the MAXIMUM Process setting to achieve worst-case static power accuracy.
In Vivado®, the default Process is TYPICAL in Report Power. This can be changed to MAXIMUM
in the Environment tab of the Report Power dialog box:
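The same setting can be made from the Tcl console. A minimal sketch, assuming the -process option of set_operating_conditions:

```tcl
# Switch Report Power from the default TYPICAL process to MAXIMUM (worst case)
set_operating_conditions -process maximum
report_power
```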
Junction Temperature
Leakage current increases exponentially with Junction Temperature, which results in higher static
power. Junction Temperature depends on various factors: the total power of the device, the
cooling system, board selection, and ambient conditions. By default the Junction Temperature is
computed based on other Thermal setup inputs: Ambient Temperature, Heat Sink, Board
Selection, etc. Because Junction Temperature rises with total power, it varies
when dynamic power increases. It is very important to specify the right Junction Temperature to
estimate static power accurately.
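The dependence on total power can be illustrated with the usual first-order thermal model. This is a simplified sketch, not necessarily the exact computation Vivado performs. Here θJA stands for the effective junction-to-ambient thermal resistance, which folds in the heat sink, airflow, and board.

```latex
T_{J} = T_{A} + \theta_{JA} \cdot P_{\text{total}}
```

Because static power itself grows with T_J, the estimate is iterative: more power raises the temperature, which raises leakage, which adds power.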
IMPORTANT! Read the Junction Temperature at the time when power is measured on the hardware and
overwrite the existing setting in the Report Power dialog box.
To set Junction Temperature in the Vivado® IDE, enable the Junction Temperature check box in
the Environment tab of the Report Power dialog box and enter the value.
The equivalent Tcl command is:
set_operating_conditions -junction_temp 45
You can measure the approximate Junction Temperature by placing a simple thermistor or other
hand-held temperature measurement device on the Xilinx device. If one of the Xilinx hardware
programming tools is used to program the device, you can read the Die Temperature values.
For example, ISE iMPACT reads Die Temperature values when you select Debug → Read Status
Register. Vivado Hardware Manager plots the Die Temperature in the System
Monitor window.
Power Supply Settings
IMPORTANT! Specify accurate power supply values in the Power Supply tab of the Report Power dialog box.
To specify power supply voltages in the Vivado® Integrated Design Environment, enter the
values in the Power Supply tab of the Report Power dialog box.
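Supply voltages can also be set from Tcl. The following is a sketch assuming the -voltage option of set_operating_conditions. The rail name and value are placeholders, so substitute the rails and measured voltages of your own board.

```tcl
# Placeholder rail and value: set the core supply to the voltage measured on the board
set_operating_conditions -voltage {vccint 0.95}
report_power
```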
Clock Specifications
Design clocks are the main component for dynamic power computation. If no clocks are defined,
switching activity estimates will be inaccurate, resulting in inaccurate power estimates. A clock
node is identified from timing constraints which are defined using create_clock or
create_generated_clock XDC commands.
Note: All the required clocks in the design must be defined using create_clock or
create_generated_clock commands.
The Switching tab of the Report Power dialog box displays all the clocks defined in the design.
Make sure all the clocks defined in the design are displayed. Once Report Power runs, the Power
Report confirms the percentage of clocks defined in the design when you view the Confidence
Level details from the Summary page. This guides you to make sure there is a HIGH confidence
level on Clock Activity.
In Tcl mode, use the get_clocks and report_clocks commands to get the list of defined
clocks. The text report gives the Confidence Level on Clock Activity:
Control Signals
Global and Regional Resets
The Activity rate on Global Resets could change the power estimation dramatically. It conveys
the state of each logic block in the design and the probability of logic output changes. If it is not
set with the right switching information, you can get unrealistic power estimates. For example,
ideally Reset is expected to be asserted (active) at the beginning of the run for a few cycles and
remains inactive the rest of the time. This could be denoted in terms of switching activity as:
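One way to express this, assuming an active-High reset, mirrors the fftEngine script earlier in this guide. The port name reset is a placeholder.

```tcl
# Reset is asserted only briefly at start-up: it is Low almost all the time
# and practically never toggles. "reset" is a placeholder port name.
set_switching_activity -static_probability 0 -signal_rate 0 [get_ports reset]
```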
Report Power identifies primary ports which are found to be global resets and applies the above
switching activity. It uses a very conservative and safe way to identify the global resets - the
ports which are directly connected to Reset pins of leaf primitives. However this does not help
much on complex designs where the Reset logic is generated internally through special logic
circuits (reset generator, debouncer, reset stretching, etc). When there is logic involved to
generate Reset, Report Power is not aware of design intent and does not apply any default
switching information on it.
In this situation, the Reset activity information is derived from the generated logic using a
probabilistic computation and propagation algorithm. Probabilistic computation is done at the
leaf primitive level of logic. At times, the probabilistic algorithm falls short in handling specific logic
blocks, such as deeply nested feedback logic. This results in unexpected switching activity on
Reset nets.
RECOMMENDED: Make sure to supply the correct switching information on global/regional Reset nets.
The designer is expected to be aware of such global reset nets in the design. Set activity rates
directly on these nets in the Power tab of the Net Properties window.
The Power Report also helps identify the Reset nets in the design, so you can verify the switching
information on these nets and take corrective action. You can run a first trial run of Report Power
using the default settings to analyze the activity on Reset nets.
Note that the Power Report also shows the number of logic cells that are affected by this Reset
net: Fanout. If the initial switching activity estimation does not seem correct, you can select the
net in the Power Report (as shown above) and edit the Power properties in the Net Properties
window.
Note: Report Power displays both Preset/Set and Reset nets combined in the design. The above guidelines
for Reset nets also apply to Preset/Set nets.
For example, Enable is expected to be asserted (active) throughout the run and remains inactive
only when the logic cell is not being used - if at all explicitly controlled to save power. This could
be denoted in terms of switching activity as:
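One way to express this, assuming an active-High enable, mirrors the fftEngine script earlier in this guide. The port name enable is a placeholder.

```tcl
# Enable stays asserted throughout the run: it is High almost all the time
# and practically never toggles. "enable" is a placeholder port name.
set_switching_activity -static_probability 1 -toggle_rate 0 [get_ports enable]
```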
Report Power identifies primary ports which are found to be global enables and applies the
above switching activity. It uses a very conservative and safe way to identify the global enables:
the ports which are directly connected to CE pins of leaf primitives.
RECOMMENDED: Make sure to supply the correct switching information on global/regional Enable nets.
The Power Report also helps identify such Enable nets in the design, so that you can quickly
validate the switching information on these nets and take corrective action. You can run a first
trial run of Report Power using the default settings to analyze the activity on Enable nets.
Note that the Power Report also gives information about the number of logic cells that are
affected by this Enable net, in the Fanout and Logic Type columns. If the initial switching activity
estimation does not seem correct, you can select the net in the Power Report and edit Power
properties in the Net Properties window.
Primary Inputs
Common nodes are taken care of with the above recommendations. However, design specific
handshaking (protocols, memory interface, etc.) and data ports also need attention. Ideally, the
activity rates on primary ports decide the overall activity of the design, which influences the
dynamic power accuracy. Report Power assigns a default switching activity of Toggle_rate=12.5
and Static_Probability=0.5 on primary inputs (except clock and control ports). These values mean
that the port will toggle once in eight clock cycles, and 50% of time the port stays at High (Logic
1). This assumption works fairly well on data ports. But it will have a huge accuracy impact when
it is applied to handshaking nodes. This emphasizes the importance of correct switching
information settings on primary inputs. The default activity settings can be found in the
Switching tab of the Report Power dialog box:
You can change the default values which will be applied to all primary inputs (non-clock and non-
control). Equivalent Tcl command:
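A sketch of such a command, assuming the -default_toggle_rate and -default_static_probability options of set_switching_activity and the default values quoted above:

```tcl
# Apply the default activity to all primary inputs (non-clock and non-control)
set_switching_activity -default_toggle_rate 12.5 -default_static_probability 0.5
```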
The same activity rate is applied to all the primary inputs - Report Power does not understand
and distinguish handshaking ports from data ports. So it is important to specify the activity rates
manually for the handshaking ports. This can be done either through the Vivado® Integrated
Design Environment or a Tcl command.
Note: Make sure correct switching values are set on primary I/O Ports.
In the Power Report, the I/O section lists all the ports and corresponding switching activity
information.
Verify the activity rates on I/O ports. To change the activity rate, select the input port in the
Power Report and edit the Power properties in the I/O Port Properties window.
Component Level
Finally, monitor the activity rates across major power consuming primitives in the design. After all
the above points are taken care of, the activity rates across the hard blocks such as block RAMs,
GTs, and DSPs should reflect meaningful values. However, Xilinx® recommends that you
double-check these values to make sure that there are no internal logic propagation or modeling issues
in the tool.
For example, one known limitation is that Report Power does not propagate activity rates
across GTs. If any GT data outputs are consumed by logic, you must set activity rates explicitly on
GT TX/RX outputs.
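A minimal sketch of such a constraint follows. The net name u_gt/rxdata_out and the activity values are placeholders for illustration. Use the nets actually driven by the GT outputs in your design and rates that match your data traffic.

```tcl
# Placeholder net name and rates: annotate the GT receive data bus explicitly,
# because activity is not propagated through the GT itself
set_switching_activity -toggle_rate 25 -static_probability 0.5 [get_nets {u_gt/rxdata_out[*]}]
report_switching_activity [get_nets {u_gt/rxdata_out[*]}]
```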
Report Power offers a simple interface in the Report Power dialog box to set the output activity
rates on various types: registers, shift registers, LUTs, RAMs, block RAMs, DSPs, and GTs. These
settings are the equivalent of the -type argument of set_switching_activity command. After a
value is set, it is retained for subsequent power reporting runs. Global settings affect all the
instances of hard primitives in the design. For example, a Toggle Rate set on block RAMs will be
applied to all the block RAMs in the design. Alternatively, the Cell Properties window could also
be used to change the activity rates. In the Power Report, review block RAM, DSP, and GT
sections:
To change the activity rate, select a hard block instance in the Power Report and edit the Power
properties in the Cell Properties window.
• To set activity rates on all BRAMs in the specific design hierarchy instance u1/transmit:
set_switching_activity -static_probability 0.25 -toggle_rate 10 -type bram [get_cells u1/transmit]
Chapter 6
Introduction
This chapter describes power reduction techniques and their expected effect on total device
power. This information will help you evaluate your best options depending on your time, power
budget, available resources, and freedom to change your design.
Supply Strategy
Voltage has a large effect on both static and dynamic power. Active control of the voltage level
ensures the desired voltage is applied to the device.
• Sense voltage as close as possible to the device and to the highest consuming device if the
same supply powers multiple devices.
• Select regulators with tight tolerances.
Device Selection
• Select the best device for the product.
Increasingly, power is becoming one of the primary factors for selecting a device. Select the
device that best meets your density, functionality, and performance requirements and will also
meet your power budget.
• Minimize the number of devices.
This saves space, I/O interconnect power, total leakage, and other factors. Typically, replacing
multiple components (for example, processor and device) with a single larger device consumes
less static power.
• Select the smallest device possible.
This reduces leakage. Typically in a device family the same package may be available with
different die sizes. You can, for instance, use a larger die during the prototyping and pre-series
phase, then move to a smaller die for volume production.
• Select the largest package possible.
This increases heat dissipation. A larger package has a larger area to dissipate the die heat into
the environment. A larger heat sink can be attached to the package upper side and more heat
can escape onto the PCB via the bottom ball grid array.
• Use low voltage devices.
Some device families are available with a lower power option. The lower core voltage
requirements translate into significant static and dynamic power savings.
• Use low leakage devices.
Some device families are available with lower leakage or static power options in the form of
specific speed or temperature grades. These devices may cost a bit more to purchase but you
or the end user may be able to more than offset this with savings on the electricity bill or
cooling hardware and system maintenance.
• Device Static
• Design Static
• Design Dynamic
Device Static
Download a blank design to ensure that: (1) no input noise is captured; and (2) all internal logic
and configuration circuits are in a known state.
Note: A blank design is a design with a single gate or flip-flop that never toggles, and in which all outputs
are in a 3-state configuration.
Wait for the junction temperature to stabilize, then measure VCCINT, VCCAUX, and any other
supply source of interest. With special equipment, a simple heat gun, or cold spray, you can force
temperature changes to evaluate the influence of the environment on the device static power.
VCCADC should always be connected to VCCAUX.
Design Static
Download the design onto the device and do not start any input or internal activity (input data
and external and internal clock generation). Wait for the device temperature to stabilize, then
measure power on all supply rails of interest. Subtracting the device static measurement from
these values gives you the additional static power from the specific logic resources and
configuration used in your design (design static power).
Design Dynamic
Download the design onto the device and provide clocks and input stimulus representative of the
design. Wait for the junction temperature to stabilize before measuring all supply sources of
interest. This power represents the instantaneous total power of the design. It will vary with the
change in activity at each clock cycle.
• You want to further optimize your design after constraints are met.
Or
• Your design has exceeded its power budget.
Step 3: Experiment
With the list of candidate areas in your design for power optimizations derived from the previous
step, you can now sort this list from easiest to most involved and decide which optimization or
experiment to perform next. The power tools allow you to do What If? analysis so you can
quickly enter design changes and estimate the power implications without having to actually edit
any code or constraint or rerun the implementation tools.
• The amount of power a block RAM consumes is directly proportional to the amount of time
it is enabled. To save power, drive the block RAM enable Low on clock cycles when the
block RAM is not used in the design. The block RAM Enable Rate, along with the clock
rate, is an important parameter to consider for power optimization.
• In TDP mode, use the NO_CHANGE mode if the output latches can remain unchanged
during a write operation. This mode is the most power efficient. It is not available
in SDP mode because there it is identical in behavior to WRITE_FIRST mode.
• I/O: I/O interfaces have to drive long distances with potentially more parasitic effects, hence
they typically represent a large portion of the device power requirements.
• VCCAUX: Use the lowest VCCAUX possible. This minimizes both the static and dynamic
power for this voltage supply.
• I/O Configuration: Review the I/O standard, drive strength, and on-chip termination
settings in the context of your performance needs and evaluate whether you can use a
lower drive strength, use tristatable DCI I/O standards (T_DCI), get by without
terminations, or use external terminations.
• Outputs:
• Transceivers:
• The GTX/GTH/GTP transceiver supports a range of power-down modes that may save
power if applicable.
• There are two types of adaptive filtering available to the GTX/GTH receiver depending on
system level trade-offs between power and performance. Optimized for power with lower
channel loss, the GTX/GTH/GTP receiver has a power-efficient adaptive mode named the
low-power mode (LPM).
• Each GTX/GTH/GTP transceiver provides support for generating the out-of-band (OOB)
sequences described in the Serial ATA (SATA) and Serial Attached SCSI (SAS) specifications,
and the beaconing described in the PCI™ Express specification. If the design does not use
OOB sequences, this circuitry need not be active, which could save additional power.
• Pack the maximum number of transceivers into a single tile to minimize duplicating
supporting circuits.
• XADC:
• The XADC can be powered down by writing to its Configuration register #2 (Address
0x42) from the DRP port during run time. Bits DI4 and DI5 of this register control the
power-down for each channel. To statically emulate power-down behavior in Vivado®, the
configuration registers can be set by entering this command in the Vivado Tcl console:
set_property INIT_42 {16'h0430} [get_cells <inst>]
where <inst> is the XADC instance. The above command powers down both channels of
the XADC.
• Logic:
• Minimize asynchronous control signals, which prevent logic optimization and use more
routing resources.
• Minimize the number of control sets. A control set consists of the unique grouping of a
clock, clock enable, set, reset, and, in the case of LUT RAM, write enable signals. Control
sets matter because a slice limits how many distinct control signals its elements can use
or share. The limit varies with the device architecture. When it is reached, related logic
can no longer be packed in close proximity, which increases the use of routing resources.
• Add pipeline levels to minimize the size of combinatorial logic cones. This minimizes the
propagation of glitches between registers until signals reach their final state at each clock
cycle.
• Use resource time sharing. This technique minimizes device resource usage by time
multiplexing different functions onto the same hardware resources. It can allow you to
use a smaller device or reduce placement and routing congestion, either of which lowers
both static and dynamic core power.
• Processes which are slow and similar can be performed on the same resources instead of
separate resources. This requires careful thinking for how to buffer, multiplex, initialize,
and control the data to be processed. Typical applications for such optimization are similar
parallel processes, such as processing multiple input sensors. Instead of having as many
processing units as inputs, you could use a single processing unit and make it run faster, so
it processes input channels one after the other while ensuring the same response time for
each output. A Xilinx® Power Estimator What If? estimation can help you decide whether
the power savings are worth the engineering effort.
• Use the DSP and block RAM optional registers. For example, in DSP blocks the multiplier
or MREG registers, when enabled, are the most power efficient implementation as they
minimize the propagation of internal glitches between clock cycles.
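For the control-set guidance in the list above, the Vivado `report_control_sets` command gives a starting point for finding which clock, enable, and reset combinations are in use. A minimal sketch, assuming a synthesized or implemented design is open; the report file name is a hypothetical placeholder.

```tcl
# List every control set in the open design, with the signals that make up
# each one, and write the result to a file (hypothetical file name).
report_control_sets -verbose -file control_sets.rpt
```

Control sets that drive only a few registers are usually the first candidates to merge or remove.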
Note: In the Vivado tools, power optimization works to minimize the impact on timing while maximizing
power savings. However, in certain cases, timing may degrade after power optimization. For techniques to
offset this impact, see Preserving Timing After Power Optimization.
Where possible, minimize the use of asynchronous set/reset signals, especially to datapath or
pipeline flip-flops and to block RAMs. You should also consider constraining the global set
and reset signals as dont_touch during the power_opt_design step to avoid their use as
enables. Note that setting the dont_touch property in HDL causes every step in the flow
to obey this property. It is therefore recommended to apply it as an XDC constraint only
for the power optimization step. Here is an example of how to do this:
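A minimal sketch of such a flow, assuming an open design and a global reset net named `sys_rst`. The net name is a hypothetical placeholder; substitute the set and reset nets of your own design.

```tcl
# Protect the global reset net (hypothetical name) for the power optimization step only.
set_property DONT_TOUCH true [get_nets sys_rst]

# Run power optimization while the net is protected.
power_opt_design

# Release the property so that later steps in the flow are not bound by it.
set_property DONT_TOUCH false [get_nets sys_rst]
```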
Finally, ensure that the signal rate and static probability of the global set and reset signals
are set correctly before running the power optimizer and vectorless power estimation.
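As an illustration, the activity of a reset can be stated explicitly with `set_switching_activity`. This sketch assumes an active-High global reset net named `sys_rst` (a hypothetical name) that stays deasserted during normal operation; adjust both values if your reset behaves differently.

```tcl
# The reset is never High and never toggles in normal operation:
# static probability 0, signal rate 0 (signal rate is in millions of transitions per second).
set_switching_activity -signal_rate 0 -static_probability 0 [get_nets sys_rst]

# Vectorless estimation now propagates activity from this setting.
# The report file name is a hypothetical placeholder.
report_power -file power_vectorless.rpt
```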
• Slice registers and SRLs: A number of different reasons could explain why
power_opt_design might not be able to generate clock enables for slice registers or SRLs
in the design. Some examples are:
• Block RAMs: Block RAM rich designs are excellent candidates for power savings. Vivado uses
a variety of optimization techniques to generate enables and save power. If block RAM gating
coverage is low after using power_opt_design, some of the possible reasons could be:
• BRAMs are mainly FIFO18/FIFO36 cells. These cannot be optimized by the tool.
• Memories inferred or instantiated are mainly in true dual port (TDP) mode using
asynchronous clocks on their A and B ports that cannot be optimized by
power_opt_design.
• Use of asynchronous reset signals to either the block RAMs themselves or to the address/
write-enable flip-flops feeding the block RAMs.
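To check gating coverage, run the optimizer and then review the power optimization report. A minimal sketch, assuming an open design; restricting the run to block RAMs with `set_power_opt` is optional and is shown only to isolate block RAM results, and the report file name is a hypothetical placeholder.

```tcl
# Optionally restrict power optimization to block RAM cells.
set_power_opt -cell_types {bram}

# Run power optimization with that restriction in place.
power_opt_design

# Report which cells were optimized, to review block RAM gating coverage.
report_power_opt -file power_opt_bram.rpt
```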
• Design Activity: Adjust the activity of nets or cells in the design. Change one item or change
multiple items at a time. You can also change:
• I/Os: Adjust both static and dynamic activity probabilities. You can also adjust parameters
for the external components connected to the device outputs, such as the load
capacitance or the near-end board termination details.
• Signals: Adjust the dynamic activity rate for data signals. For control signals you can also
adjust the static probability to evaluate power under different Clock Enable, Set, or Reset
scenarios.
• Specific blocks: In addition to the dynamic activity probability you can also adjust the
activity of control ports such as port enables or write enables on block RAMs.
• Resource usage: Explore reducing the resource count. Try remapping pieces of logic from slice
logic to dedicated blocks such as block RAM or DSP, and vice versa.
• Resource configuration: Explore using different configuration settings for the design I/Os,
block RAMs, clock generators, and other resources.
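In the Vivado tools, activity changes like those in the list above can be tried from the Tcl console, and the estimate regenerated without rerunning implementation. A minimal sketch, assuming an implemented design is open; the object names (`data_out[*]`, `bram_en`), the rate values, and the report file name are hypothetical placeholders.

```tcl
# Lower the assumed activity on a group of output ports.
# -signal_rate is in millions of transitions per second;
# -static_probability is the fraction of time the signal is High.
set_switching_activity -signal_rate 5 -static_probability 0.5 [get_ports {data_out[*]}]

# Model a block RAM enable that is High 25% of the time.
set_switching_activity -signal_rate 10 -static_probability 0.25 [get_nets bram_en]

# Re-estimate power with the new activity and save the report for comparison.
report_power -file power_whatif.rpt
```

Comparing this report with one generated before the changes shows the estimated effect of each experiment. `reset_switching_activity` removes the overrides from the listed objects.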
Appendix A
Xilinx Resources
For support resources such as Answers, Documentation, Downloads, and Forums, see Xilinx
Support.
Xilinx Design Hubs provide links to documentation organized by design tasks and other topics,
which you can use to learn key concepts and address frequently asked questions. To access the
Design Hubs:
Note: For more information on DocNav, see the Documentation Navigator page on the Xilinx website.
References
These documents provide supplemental material useful with this guide:
1. Vivado Design Suite User Guide: Release Notes, Installation, and Licensing (UG973)
2. Vivado Design Suite Tcl Command Reference Guide (UG835)
3. Vivado Design Suite User Guide: Using Constraints (UG903)
4. Xilinx Power Estimator User Guide (UG440)
5. Vivado Design Suite Tutorial: Power Analysis and Optimization (UG997)
6. Vivado Design Suite User Guide: Logic Simulation (UG900)
7. 7 Series FPGAs Packaging and Pinout Product Specification (UG475)
8. UltraScale and UltraScale+ FPGAs Packaging and Pinouts Product Specification (UG575)
9. 7 Series FPGAs and Zynq-7000 SoC XADC Dual 12-Bit 1 MSPS Analog-to-Digital Converter User
Guide (UG480)
10. Driving the Xilinx Analog-to-Digital Converter (XAPP795)
11. Vivado Design Suite Documentation
Training Resources
Xilinx® provides a variety of training courses and QuickTake videos to help you learn more about
the concepts presented in this document. Use these links to explore related training resources:
Copyright
© Copyright 2012-2020 Xilinx, Inc. Xilinx, the Xilinx logo, Alveo, Artix, Kintex, Spartan, Versal,
Virtex, Vivado, Zynq, and other designated brands included herein are trademarks of Xilinx in the
United States and other countries. All other trademarks are the property of their respective
owners.