
Introduction to Intel® Core™ Duo Processor Architecture

INTRODUCTION

The Intel® Core™ Duo processor is a new member of the Intel® mobile processor product line. It is the first Intel® mobile microarchitecture that uses CMP (Chip Multi-Processing; i.e., multiple cores on one die) technology. Targeted to the market of general-purpose mobile systems, the Intel Core Duo core was built to achieve high performance, while consuming low power and fitting into different thermal envelopes.
In order to achieve the required performance, a CMP-based microarchitecture was designed. To keep the architecture power-efficient, each performance improvement was evaluated against its power cost, and only the power-efficient performance features were implemented. On top of that, special hardware mechanisms were added to better control the static and the dynamic power consumption. As a result, the Intel Core Duo processor provides higher performance in the same form factors without needing to increase the cooling capability.
Building a general-purpose mobile core is a challenging task since, on the one hand, the system needs to maintain the highest level of performance, while on the other hand, the system must fit into different thermal envelopes, as illustrated by Figure 1, and improve power efficiency.

Figure 1: Products using different thermal envelopes*

Intel Core Duo is based on the Pentium M processor 755/745 core microarchitecture with a few performance improvements at the level of each single core. The major performance boost is achieved from the integration of dual cores on the die (CMP architecture). This agrees with our assessment that continuing to improve single-thread performance is rather costly in terms of power and may achieve diminishing returns in terms of efficiency, if major microarchitecture enhancements are not made. The big potential for improved performance is through exploiting parallelism between threads. However, the CMP architecture presents many challenges for power and thermal control to still fit into the mobility constraints.
In this paper we present the new Intel Core Duo microarchitecture and show how the need to target power-efficient general-purpose processors has affected many of our decisions. We provide a general overview of the different ingredients of the Intel Core Duo system, while the other papers in this issue of the Intel Technology Journal focus on more specific aspects of the system such as the CMP microarchitecture and the power and thermal control methods.

Figure 2: Intel Core Duo processor floor plan

As Figure 2 shows, Intel Core Duo technology is based on two enhanced Pentium M cores that were integrated and use a shared L2 cache. The way we integrated the dual core in the system had a major impact on our design and implementation process. In order to meet the performance and power targets we aimed to do the following:
• Keep the performance similar to or better than that of single thread performance processors in the previous generation of the Pentium M family (that use the same-size L2 cache).
• Significantly improve the performance for multithreaded and multi-processes software environments.
• Keep the average power consumption of the dual core the same as previous generations of mobile processors (that use a single core).
• Ensure that this processor fits in all the different thermal envelopes the processor is targeted to.
In this paper we provide a high-level description of the main Intel Core Duo features and discuss how each feature fits into the targets of the various projects.

THE IMPROVED PENTIUM M PROCESSOR-BASED CORES
The core of the Intel Core Duo processor-based technology is an enhanced Pentium M processor 755/745¹ core converted to 65nm process technology. The main focus of the core enhancements was to do the following:
• Support virtualization (Virtualization Technology²) [3].
• Support the new Streaming SIMD Extension (SSE3) [4].
• Address performance inefficiencies mainly in the handling of SSE/SSE2, FP (x87) and some long latency integer instructions.

¹ Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families.
² Intel® Virtualization Technology requires a computer system with a processor, chipset, BIOS, virtual machine monitor (VMM) and applications enabled for virtualization technology. Functionality, performance or other virtualization technology benefits will vary depending on hardware and software configurations. Virtualization technology-enabled BIOS and VMM applications are currently in development.

Intel Core Duo Processor-based Technology Core Performance Improvements
Intel Core Duo processor-based technology introduces performance improvements in the following areas:
• Streaming SIMD Extensions (SSE/2/3)
• Floating Point (x87)
• Integer
The main difficulty with SSE implementation in Pentium M is caused by the fact that SSE/2/3 is a 128-bit wide microarchitecture while the Pentium M execution core is 64-bits wide (in order to meet power and energy constraints). Making the machine twice as wide may produce more heat and so will have a significant impact on the Thermal Design Point (TDP) of the system as well as some impact on battery life. Since the Pentium M was primarily designed for mobility we preferred to make it relatively narrow and cope with the SSE performance issues. The by-product of this tradeoff is that each SSE vector operation is “broken” into 64-bit wide micro-operation (uOp) pairs. Such instructions suffer from several performance bottlenecks in the Pentium M pipeline, mainly in the Front End (FE) of the pipeline. For example, the Instruction Decoder in the Pentium M processor can potentially handle three instructions per cycle but only the first decoder in a row is capable of handling complex instructions. The other two decoders are limited to single uOp instructions only. This works fine in most cases since the most frequent instructions are single uOp. However, this is not the case with SSE instructions: only scalar SSE operations are single uOps while the vector operations are typically 2-4 uOps. This results in several potential bottlenecks in the
FE: the Instruction Decoder in the Pentium M can only handle one SSE vector operation per cycle, causing starvation in the rest of the machine. This bottleneck was addressed in the Intel Core Duo core: a new mechanism was introduced that allows lamination of pairs of similar uOps. This mechanism along with enhanced uOp fusion allows handling of the SSE/2/3 vector operation by a single laminated uOp. The instruction decoders were modified to handle three such instructions per cycle, significantly increasing the decode bandwidth of SSE vector operations. The laminated uOps streaming down the pipe are at a certain point un-laminated, reproducing again the 64-bit wide uOp pairs to feed the machine. These changes not only improve performance of vector operations but also save some energy since the FE, no longer a bottleneck, can be clock gated whenever its uOp buffer is filled beyond a certain watermark.
Another bottleneck that was discovered was the handling of the floating point (FP) Control Word (CW). The FP CW is part of the x87 state and was usually viewed as “constant”; namely it is loaded once at the beginning and stays constant throughout the program. This is indeed the way the FP CW is used by most programs. However, there are some FP applications that manipulate the “rounding control” which is located in this register: the default rounding mode is “rounding to nearest even” but before converting results to fixed point, some applications change the rounding control to “chop” (this is the rule with C programs for example). Such behavior was treated rather inefficiently by the Pentium M core: each manipulation of the FP CW effectively stalled the pipeline until its completion. The Intel Core Duo core introduced a new renaming mechanism for the FP CW so that four different versions of this register can coexist on the fly without stalling the machine.
Intel Core Duo also improved the latency of some long latency integer operations such as Integer Divide (IDIV). Although these instructions are not very frequent, because of their extremely long latencies, their cumulative effect on integer benchmark scores has been shown to be very significant. The basic Divide algorithm has remained unchanged; however, Intel Core Duo Divide logic exploits opportunities for “early exit.” The Divide logic calculates in advance the number of iterations that are required to accomplish the operation. This is indeed data dependent; however, it is often significantly smaller than the maximal number of iterations. Once the required number of iterations is accomplished the divider wraps up the results. This does not impact the maximal Integer Divide latency; however, on average it is much faster.
Another enhancement that benefits different kinds of applications is the introduction of a new H/W prefetcher mechanism. This mechanism identifies streaming loads at a very early stage in the machine and speculatively predicts the future incarnation of these loads. These speculative requests are looked up in the shared L2 cache and, if they miss, they are speculatively prefetched from the external memory. This mechanism is dynamically deactivated whenever there are many demand requests pending (a watermark mechanism). The benefit of this change is an average reduction in load latency.
The performance implications of these enhancements on single-threaded (ST) applications as well as on multithreaded (MT) applications are discussed in [1].

CMP–GENERAL STRUCTURE
Intel Core Duo processor-based technology implements a shared cache-based CMP microarchitecture in order to maximize the performance of both ST and MT applications (assuming the same L2 cache size). Figure 3 describes the general structure of our implementation. The figure shows the following:
• Each core is assumed to have an independent APIC unit to be presented to the OS as a “separate logical processor.”
• From an external point of view the system behaves like a Dual Processor (DP) system.
• From the software point of view, it is fully compatible with Intel Pentium 4 processors with Hyper-Threading (HT) Technology³ [6], and DP-based systems. However, special optimizations could be applied to improve the performance of the shared cache organization.
• Each core has an independent thermal control unit (discussed later in this paper and also covered in [2]).
• The system combines per-core power state together with package-level power state.

³ Hyper-Threading Technology requires a computer system with an Intel® Pentium® 4 processor supporting HT Technology and a HT Technology enabled chipset, BIOS and operating system. Performance will vary depending on the specific hardware and software you use.

The paper CMP Implementation in Intel Core Duo Systems [1] extends the discussion on the CMP implementation and compares its performance with other configurations such as the use of split cache architecture. The results shown there indicate that the new proposed
microarchitecture maximizes the performance benefits of both ST and MT execution at a given cache size. The enhancements we implemented in each of the cores allow us to improve both the ST performance (in specific cases) as well as the MT execution. They also allow us to improve the power and the thermal control of the system, and to achieve average power consumption similar to that of the single-core Pentium M processor.

Figure 3: The general structure of the Intel Core Duo implementation

POWER CONTROL
Extending the battery life, while improving the performance, was one of the main goals in designing the Intel Core Duo processor. Battery life is affected by dynamic power, caused when the processor is active, and by static power, which is the power wasted when a unit or the entire processor is not active. Intel Core Duo microarchitecture saves both types of power.
Figure 4 describes the general process we followed in order to reduce the power during the development cycle of the Intel Core Duo processor. As can be seen, the average power consumption was reduced by handling the problem at all different levels of the design, starting with adjusting the process technology through all the design stages of production.

Figure 4: Low-power processor–design process

In order to save leakage power, the Intel Core Duo system uses mainly two techniques: enhanced sleep states control and Dynamic Intel® Smart cache sizing. In order to control the active power consumption, Intel Core Duo technology uses a technique based on Intel SpeedStep® technology.
The traditional way to control the power and thermal behavior of the system is via a software/hardware interface. One of the most common schemes to achieve this is called ACPI [5], where the system defines different levels of sleep modes, and each of the states represents a more efficient way to save power, at the expense of a longer time to bring the system back into operational mode. (For more details on this method, please see [2]). The challenge of adding a second core on die while improving the overall power consumption demands an improvement to the power states of the system in order to avoid power being wasted whenever a core is not active. We face two main problems: (1) since only a single power plane is used, it forces us to run all cores with the same voltage and frequency, and (2) the chipset and the OS see both cores as a single entity that has the same state at the same time. Thus, the Intel Core Duo processor presents two separate views on the power state of the system; internally we manage the states of each core independently (we call it per-core power state) and externally we view the system as having a single, synchronized power state. Figure 5 provides an overview of this approach.
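The split between per-core power states and one externally visible power state can be illustrated with a minimal Python sketch. It rests on an assumption that the paper does not spell out: that the externally visible state can be no deeper than the shallowest per-core state, since both cores share one power plane. The names `CState`, `package_state`, and `internal_view` are hypothetical and exist only for this illustration.

```python
from enum import IntEnum


class CState(IntEnum):
    """Idle depth of a core; a larger value means a deeper sleep state."""
    C0 = 0  # active
    C1 = 1
    C2 = 2
    C3 = 3
    C4 = 4


def package_state(core_states):
    """Return the single, synchronized state shown to the chipset and OS.

    Assumption (not stated in the paper): the package can only go as deep
    as its shallowest core, because the cores share one power plane.
    """
    return min(core_states)


def internal_view(core_states):
    """Return the per-core states that are tracked independently on die."""
    return {f"core{i}": state.name for i, state in enumerate(core_states)}
```

With one core idle at `CState.C4` and the other at `CState.C1`, `internal_view` still reports both depths, while `package_state` returns `CState.C1`, so package-wide measures such as voltage reduction would wait until both cores are deep enough.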
CPU/package sleep states:
– C0 – Active: CPU is on
– C1 – Auto Halt: Core clock is off
– C2 – Stop clock: Core and bus clock are off
– C3 – Deep sleep: Clock generator is off
– C4 – Deeper sleep: Reduced VCC
– DC4 – Deeper C4: Further reduced VCC

Figure 5: Power states of the Intel Core Duo processor

As we can see, the Intel Core Duo processor defines five different sleep states of the system. The first three states allow local power-saving measures to be activated individually per core, while the last two states require coordination across the entire package for the power-saving measures to be activated.
A core in the C0 power state is assumed to be in running mode. When the core has nothing to do, the OS issues a halt command that moves it to CC1, where execution is halted and clocks are stopped. When it detects even lower levels of activity (via the ACPI mechanisms [2]), the OS will further promote the idle state of each of the cores beyond CC1 to CC2, CC3, or CC4 states, based on the core activity history. In the CC2 and CC3 states, additional core-level power-saving measures can be activated, achieving a lower average power consumption. Starting from the C4 state, core voltage reduction is applied to further increase average power savings. Since the cores are connected to the same power plane, this must be done in coordination between the two cores, and this is known as package-level C4 and package-level DC4.
While in a sleep state, the system still consumes static power (leakage). In Intel Core Duo technology, we implement an advanced algorithm that tries to anticipate the effective cache memory footprint that the system needs when moving from a deep sleep state to an active mode. The new mechanism keeps active only the minimum cache memory size needed, and it uses special circuit techniques to keep the rest of the cache memory in a state that consumes only a minimal amount of leakage power.
In order to control the active power consumption, Intel Core Duo technology uses Intel SpeedStep technology. A set of working points is defined, each with a different frequency and voltage and therefore a different power consumption. The system can select the working point at which it operates in order to strike a balance between the performance needs and the dynamic power consumption. This is usually done via the OS, using the ACPI interface.

Figure 6: Changing working point in Intel Core Duo processor

The way the system moves from one working point to another is described in Figure 6. As illustrated, in order to move from a “high” working point to a lower one, the system can switch the frequency almost immediately, but it will take the system some time to lower the voltage. When moving from a low working point to a higher one, we need to increase the voltage first (slow operation) and only then can we increase the frequency.
By extending the hardware mechanisms to better support advanced power states and sleep states the Intel Core Duo processor achieves improved power performance efficiency. The power-efficiency improvement over processor generations is shown in Figure 7. As a result, the Intel Core Duo processor provides higher performance in the same form factor without needing to increase the cooling capability.
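The ordering rule for changing working points (frequency first when stepping down, voltage first when stepping up) can be written as a short Python sketch. This is an illustration only, not the processor's actual control logic: `WorkingPoint`, `transition_steps`, and the frequency and voltage values in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkingPoint:
    """A frequency/voltage pair; the field names are illustrative."""
    freq_mhz: int
    voltage_mv: int


def transition_steps(current, target):
    """Return the ordered (action, value) steps to move between points.

    Moving down: drop the frequency first (fast), then lower the voltage.
    Moving up: raise the voltage first (slow), then raise the frequency.
    """
    if target == current:
        return []
    set_freq = ("set_frequency", target.freq_mhz)
    set_volt = ("set_voltage", target.voltage_mv)
    if target.freq_mhz < current.freq_mhz:
        return [set_freq, set_volt]
    return [set_volt, set_freq]
```

For example, with made-up points `WorkingPoint(2000, 1300)` and `WorkingPoint(1000, 950)`, stepping down yields the frequency change first, and stepping back up yields the voltage change first, so the core never runs at the higher frequency on the lower voltage.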

Figure 7: Power performance efficiency
[Chart: Perf/W on a 0 to 200 scale for performance-optimized and power-optimized configurations; x-axis: Pentium M, Pentium M 700, Core Duo]

THERMAL DESIGN POINT
Thermal management is another fundamental capability of all mobile platforms. Managing the platform thermals enables us to maximize CPU and platform performance within thermal constraints. Thermal management also improves ergonomics with a cooler system and lower fan acoustic noise.
In order to better control the thermal conditions of the system, the Intel Core Duo processor presents two new concepts: the use of digital sensors for high accuracy die temperature measurements and dual-core multiple-level thermal control.
In the previous Pentium M processor, a single analog thermal diode was used to measure die temperature. The thermal diode cannot be located at the hottest spot of the die and therefore some offset was applied to keep the CPU within specifications. For these systems it was sufficient, since the die had a single hot spot. In the Intel Core Duo processor, there are several hot spots that change position as a function of the combined workload of both cores. Figure 8 shows the differences between the use of the traditional analog sensor and the use of the new digital sensors.

Figure 8: Analog vs. digital sensors in Intel Core Duo processors

As we can see, the use of multiple sensing points provides high accuracy and close proximity to the hot spot at any time. An analog thermal diode is still available on the Intel Core Duo processor. The use of a digital thermometer allows tighter thermal control functions, allowing higher performance in the same form factor. The improved capability also allows us to achieve better ergonomic systems that do not get too hot, can operate more quietly, and are more reliable. Unlike diode-based thermal management algorithms that require some temperature guard band (or activating the self throttle mechanism as a safety net), the digital thermometer is tested and calibrated against specifications. Full functionality and reliability of the processor are guaranteed, as long as the reported temperature is equal to or below the maximum specified temperature. Any inaccuracy or offset is programmed into the device and already accounted for.
The thermal measurement function provides interfaces to power-management software such as the industry-standard ACPI. Each core can be defined as an independent thermal zone, or a single thermal zone can be defined for the entire chip. The maximum temperature for each thermal zone is reported separately via dedicated registers that can be polled by the software.
In addition to the polling capability, the digital thermometer implements event-based reporting. Control software programs temperature thresholds that require actions. Such actions can be fan activation or a passive control policy such as dynamic voltage and frequency scaling. When the temperature crosses a threshold, an APIC-defined interrupt is generated and it initiates the requested action.
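The two reporting styles, polling and threshold events, can be contrasted in a small Python model. It is a software sketch under stated assumptions, not the hardware interface: the class `DigitalThermometer`, its methods, and the temperatures in the usage note are hypothetical, and a plain callback stands in for the APIC-defined interrupt.

```python
class DigitalThermometer:
    """Toy model of a sensor that supports polling and threshold events."""

    def __init__(self):
        self._thresholds = []  # (limit_c, action) pairs set by control software
        self._last_c = None    # most recent reading, available to polling

    def program_threshold(self, limit_c, action):
        """Register an action to run when the reading rises to limit_c."""
        self._thresholds.append((limit_c, action))

    def report(self, temp_c):
        """Record a new reading and run actions for newly crossed limits.

        A limit fires only on an upward crossing, so a temperature that
        stays above the limit does not trigger the action repeatedly.
        """
        fired = []
        for limit_c, action in self._thresholds:
            was_below = self._last_c is None or self._last_c < limit_c
            if was_below and temp_c >= limit_c:
                action(temp_c)  # stands in for the interrupt handler
                fired.append(limit_c)
        self._last_c = temp_c
        return fired

    def poll(self):
        """Return the latest reading, as polling software would read it."""
        return self._last_c
```

A caller might register a fan action at one limit and a voltage and frequency scaling action at a higher limit, then feed readings through `report`. Each action runs once when its limit is crossed, while `poll` always returns the latest reading.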
Intel Core Duo technology implemented a dual-core power monitor capability. Power monitor functionality is provided in order to prevent thermal exceptions, and it can throttle the CPU once the temperature exceeds specifications. The overview of the power monitoring logic is described in Figure 9.

Figure 9: Thermal control overview
[Block diagram labels: Core #1, Core #2, Temperature, P/C state request, Power and thermal management control, Policy definition, External controls]

The power monitor continuously tracks the die temperature. If the temperature reaches the maximum allowed value, a throttle mechanism is initiated. A multi-level tracking algorithm is implemented. Throttling starts with the more efficient dynamic voltage scaling policy and, if that is not sufficient, the power monitor algorithm continues lowering the frequency. If an extreme cooling malfunction occurs, an Out of Spec notification will be initiated, requesting controlled shutdown. Lastly, the CPU can initiate a thermal shutdown and turn off the system.
Power and thermal management activities in notebook computers are usually performed by the OS and platform control functions. These thermal management features are designed to best serve user preferences under notebook constraint conditions. The thermal monitor function is not expected to be activated under these normal operation conditions. The thermal monitor mechanism ensures that the CPU will never exceed the CPU-specified parameters and guarantees functionality and reliability at any time.
The use of high accuracy temperature reading together with thermal monitoring protection enables high performance in thermally limited form factors, while allowing improved ergonomics and high reliability.

PLATFORM POWER MANAGEMENT
Intel Core Duo processor technology closely interacts with other components on the platform. One such component is the Voltage Regulator (VR). VR power losses at low CPU utilization may get as high as the CPU power. The losses of the VR are due to the need to deliver high current at quick response times. Intel Core Duo processors implemented a feedback mechanism to the VR. The CPU tracks its activity at any time. If utilization goes down, the CPU communicates a signal to the VR, allowing it to switch to a lower power state. A lower power state can be either a reduced number of phases or asynchronous operation. The communication is done using the voltage ID (VID) lines and the PSI signal as described in Figure 10.

Figure 10: Voltage regulator interface
[Block diagram labels: Core, PSI-2, VID, VR]

The CPU has internal knowledge of the activity demand and it communicates a request to go to higher power early enough for the VR to get ready for the increased demand.
Another power optimization is load line control. At low CPU activities, the voltage drop on the load line is smaller, resulting in higher voltage and power to the CPU. At low workloads, the CPU reduces the voltage request, and early enough, before power consumption increases, a voltage increase request is sent to the VR.
Using utilization knowledge, available in the CPU, Intel Core Duo technology made it possible to reduce platform power, increase battery life, and improve form factor ergonomics.

INTEL® CORE™ SOLO PROCESSOR
In order to fit into very limited thermal constraints and power consumption, the Intel Core Duo processor has a derivative that contains a single core only. This can be achieved either by disabling one of the cores at the OS level or as a BIOS option, or at the architecture level, where one core is disconnected from the power grid.
The first option is a user or OS decision. If you run a single-core OS on an Intel Core Duo system, it will keep the second core idle, at CC4 sleep state. Please note that due to the way the BIOS is set, each time an interrupt is received or a broadcast IPI is sent, this core may need to wake up and go immediately back to a sleep state, consuming small amounts of dynamic power.
The user can disable the second core via a BIOS option
as well. In this case, the system does not recognize the
other core and so it is kept in CC4 state all the time,
consuming no dynamic power at all.
The disadvantage of the two methods described above is
that the core still consumes static power. In order to
avoid this and reduce the power consumption of the core
even further, Intel introduces the single-core version of
Intel Core Duo technology, called Intel® Core™ Solo
processor, which disconnects the non-active core from
the power grid, or saves the area and does not fabricate
this part at all.

CONCLUSION
The Intel Core Duo processor is the first Intel processor
that implements dual core on die. The processor
addresses new challenges for providing the best
performance under power and thermal constraints.
This paper described the main architectural features of
the new processor focusing on the different
performance, power, and thermal control features of the
processor and of the system.
By applying precise control across the performance,
power, and thermal features implemented in the Intel
Core Duo system, we achieved a significant
improvement in performance at the same power
consumption, together with improved thermal control
mechanisms.
