Notes of Digital Electronics
Professor: Maurizio Martina
Student: Nicola Antonio Travaglini 235881
2 Processor architecture
2.1 Introduction
2.2 Basic architecture
2.2.1 Working principles
2.2.2 Memory access
2.2.3 Instructions
2.2.4 Architectural Considerations
2.2.5 Pipelining
2.2.6 Two memory architecture
2.2.7 Performance improvement methods
3 Peripherals
3.1 Introduction
3.2 I/O extensions
3.3 Memory-mapped and standard I/O
3.4 Microprocessor interfacing: interrupts
3.5 Management of processor registers
3.6 Maskable/non-maskable interrupts
3.7 Peripheral data managing
3.8 Timer
4 Memories
4.1 Categories
4.1.1 Definitions
4.2 General organization
4.3 Static-RAM (SRAM)
4.3.1 SRAM-cell analysis
4.3.2 6T: structure
4.4 Dual Port SRAM cell
4.5 SRAM timing
4.5.1 Read
4.6 Synchronous SRAM (SSRAM)
4.6.1 Reading
4.7 DRAM
4.7.1 Refresh
4.7.2 Accessing data
4.7.3 Timing: read
4.7.4 Speed
4.7.5 Refresh handling
4.7.6 Writing
4.7.7 SDRAM
4.8 CACHE
4.8.1 Cache organization
4.8.2 Direct mapping cache
4.8.3 Fully Associative Cache
4.8.4 N-way Set Associative Cache
4.8.5 What to do if a miss happens?
4.8.6 Writing
4.8.7 Write-back strategy
4.8.8 Write miss event
4.9 Non volatile memories
4.9.1 Read Only Memory (ROM)
4.9.2 MOS-based ROM
4.9.3 The MOS threshold voltage
4.9.4 Floating Gate Structures
4.10 Flash memories
4.10.1 Architectures
4.10.2 Cell sensing
4.10.3 NOR-flash
4.10.4 NAND-flash
4.10.5 NOR Vs NAND
4.10.6 Interface
4.10.7 Reliability
4.10.8 Wear levelling
5 Interfacing
5.1 Introduction
5.1.1 L-H transition
5.1.2 H-L transition
5.1.3 Minimum period
5.2 Lumped model
5.3 Transmission lines
5.3.1 Multiple reflection lattice case
5.3.2 Loading the line with a capacitor
5.4 Matching
5.4.1 Incident Wave Switching (IWS)
5.4.2 Reflected Wave Switching (RWS)
6 Serial Communications
6.1 Serial and parallel transmission
6.1.1 Parallel Connection
6.1.2 Serial link
6.2 Communication glossary
6.2.1 Basic serial connection system
6.3 Asynchronous and synchronous links
6.3.1 Link glossary
6.3.2 Terminology
6.3.3 Serial asynchronous protocol
6.3.4 Serial Synchronous Protocols
Chapter 1
Thanks to De Morgan's Law, we can obtain an AND gate by simply adding an inverter at each of the inputs A and B, or an OR gate by inserting an inverter at the output O.
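As a quick check of this De Morgan construction, here is a minimal sketch (in Python, not part of the original notes) that builds NOT, AND and OR out of a single NOR primitive and prints their truth tables:

    def NOR(a, b):
        return int(not (a or b))

    def NOT(a):           # a NOR with both inputs tied together
        return NOR(a, a)

    def OR(a, b):         # inverter at the output O of the NOR
        return NOT(NOR(a, b))

    def AND(a, b):        # inverters at the inputs A and B (De Morgan)
        return NOR(NOT(a), NOT(b))

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "NOR:", NOR(a, b), "AND:", AND(a, b), "OR:", OR(a, b))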
These examples show that it is easy to obtain a NOR gate and, from it, an AND or an OR gate. Is it possible to make the circuit more flexible, so that the function performed is programmable? The answer is yes: if we replace each transistor in our circuit with a pair made of a transistor and a switch, the structure becomes programmable.
Depending on which of the AND/OR logic arrays is programmable, we have three basic
organizations:
1.2.1 PROM
The AND array is fixed (it decodes all the input combinations), while the OR array is programmable. If we increase the number of inputs, the number of AND gates becomes quite high, since it must cover all the possible input combinations (2^n gates for n inputs).
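To make the organization concrete, here is an illustrative sketch (mine, with hypothetical names): the fixed AND plane is modelled as a full decoder producing every minterm, and the programmable OR plane as one bitmask per output, where bit i of the mask selects minterm i:

    def prom(or_plane, inputs):
        # Fixed AND plane: the input combination selects exactly one minterm.
        minterm = 0
        for bit in inputs:                  # MSB first
            minterm = (minterm << 1) | bit
        # Programmable OR plane: one mask per output, bit i = minterm i.
        return [(mask >> minterm) & 1 for mask in or_plane]

    # Example programming: a 2-input half adder (sum = minterms 1,2; carry = minterm 3).
    SUM, CARRY = 0b0110, 0b1000
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, prom([SUM, CARRY], [a, b]))

Note how the AND plane always has 2^n gates regardless of the functions implemented: only the OR masks change.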
1.2.2 PLA
It is the most flexible organization: both the AND and the OR gates are programmable. Here, the best way to proceed is to reuse product terms. Let us now consider an example:
1.2.3 PAL
The high flexibility of PLAs is paid in terms of area and, as a consequence, of integration density and cost. For these reasons, in the past, some solutions involving PALs were exploited for programmable arrays (originally PROMs were used just as memories). With PALs, the AND array is programmable while the OR array is fixed at fabrication: a given column of the OR array has access to only a subset of the possible product terms.
PALs are more restricted than PLAs (we trade the number of OR terms against the number of outputs). Many device variations are needed, but each device is cheaper than a PLA.
We don't use all of the combinations of BCD, just 0000 to 1001. All the other configurations (1010 to 1111) are unused, so from the logic function point of view we can use them as we prefer, for example with the goal of minimizing the complexity of the logic. Gray code is a way of coding sequences of bits such that two neighbouring sequences differ in just one bit.
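As an aside, the standard binary-to-Gray conversion is g = b xor (b >> 1) (a well-known formula, not stated in the notes); a quick sketch verifying the one-bit-difference property:

    def to_gray(n):
        return n ^ (n >> 1)

    codes = [to_gray(i) for i in range(8)]
    for prev, curr in zip(codes, codes[1:]):
        # neighbouring codewords differ in exactly one bit
        assert bin(prev ^ curr).count("1") == 1
    print([format(c, "03b") for c in codes])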
In this example we can see that with a PAL we can perform all the logic functions we want, but we need to perform some logic minimization because the OR plane is fixed (in this case we added some zeros). A PLA would give greater flexibility; here we don't have it, but we save area.
PALs are a nice way of building combinational functions. In practice it is useful to implement sequential circuits too, but PALs and PLAs alone cannot provide sequential logic. By combining PALs with FFs we can obtain programmable sequential circuits: with a FF at the output of each OR gate and a multiplexer, we can choose between a pure combinational function and a sequential one. The output of the FF is fed back. This structure is used in CPLDs.
Complex PLDs are the natural evolution of PALs and PLAs into modern programmable devices. They include several PAL-like blocks (their number depends on the size of the chip, and thus on the cost of the CPLD) connected through interconnection wires, plus some I/O blocks to connect the programmable logic to the external world.
Figure 1.6 shows the architecture of the Xilinx CoolRunner. This architecture includes some PLAs, and the core of the device (the ability to create programmable logic) is handled by function blocks, each made of 16 macrocells. Each macrocell is almost like a PLA.
1.2.5 FPGA
FPGAs have much more logic than CPLDs. FPGAs can be RAM-based or Flash-based:
• RAM FPGAs must be programmed at power-on;
• Flash FPGAs store their configuration data in non-volatile memory.
The basic idea of the FPGA is different from PLAs and PALs, since it exploits a multiplexer-based approach: the output is a function of the selector. This can be made programmable by changing the values of A and B (the inputs). Acting on the value of the selector and of the inputs, we can obtain any logic function of these parameters. By using more muxes, we can build more complex structures.
This is the basic architecture of an FPGA: there is a table where we can store constants used to produce a logic function as an output. This table is usually called a LUT. The logic implementing combinational functions is made of LUTs (no more AND or OR planes), then the usual FF is used for the sequential part (and a mux chooses between the combinational and the sequential path).
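A minimal behavioural model of this idea (an illustrative sketch, not from the notes): a k-input LUT is a table of 2^k configuration bits, and the inputs act as the selector of a multiplexer picking one of those bits:

    def lut(config_bits, inputs):
        # The inputs (MSB first) form the mux selector into the config table.
        sel = 0
        for bit in inputs:
            sel = (sel << 1) | bit
        return config_bits[sel]

    # "Program" a 3-input LUT as a majority function by listing its truth table.
    MAJ = [0, 0, 0, 1, 0, 1, 1, 1]   # index = 4*a + 2*b + c
    print(lut(MAJ, [1, 0, 1]))       # -> 1

Reprogramming the cell means rewriting config_bits; the hardware itself never changes.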
The typical organization of an FPGA is a bidimensional array, where we can find:
• CLBs (Configurable Logic Blocks) containing combinational functions and FFs;
• IOBs: used to connect the outside world to the logic array, they usually have some
additional sequential element in the block.
Cyclone V - SoC Cyclone V is a family of FPGAs by Intel (formerly Altera). The chip is rather complex: it includes PLLs, a lot of elements for logic operations, memories, etc.
Each logic array block (LAB) contains dedicated logic for driving the control signals to its ALMs; it also has two unique clock sources and three clock enable signals.
The ALMs can be seen as an advanced version of the basic architecture seen before: instead of a simple LUT there is an adaptive LUT, which can use up to 8 inputs. To implement arithmetic operations effectively, some full adders (with optimized connections for carry propagation) are included, together with multiplexers and registers (for sequential logic).
Cyclone V – memory
Two types of memory blocks:
– M10K: dedicated memory resources (10 kb each); it can be programmed with different depths and widths. The 10 kb in the name also counts the parity bits;
– Memory Logic Array Blocks (MLABs, 640 bits each): fixed depth (32 entries) and programmable width (from 1 bit up to 20 bits).
Usually, in integrated circuit technology, the choice of the memory type depends on the amount of complexity that can be handled. On an FPGA, instead, the memories are already there and we just configure them.
Simplified I/O
With OE disabled we can get an input from the external world, otherwise we can send an output to the external world. The OE and Output registers can be used to control the critical path, while the Input register can be used when the delay of the input signal is unknown.
Chapter 2
Processor architecture
2.1 Introduction
- Special-Purpose: used in dedicated devices that are designed to perform only specific operations;
- General-Purpose: performs generic operations, can be used in many fields and is more flexible.
We will focus on general-purpose processors. This kind of processor is designed for a variety of computation tasks and is characterized by:
- low unit cost, in part because the manufacturer spreads NRE over large numbers of units;
- careful design, since a higher NRE is acceptable (it can yield good performance, size and power);
- user appeal.
2.2 Basic architecture
The basic idea is to have a piece of hardware able to do two things: read instructions and
process binary data. This translates into having three elements:
• Datapath (DP): portion of hardware able to do computations, basically arithmetic and
logic operations;
• Control unit (CU): drives the datapath in the correct way (the CU does not store the algorithm);
• Memory: stores instructions and data (The algorithm is programmed into the memory).
2.2.1 Working principles
The working principle is based on two registers: the program counter (PC) stores the address of the cell we want to access, while the instruction register (IR) stores the instruction coming from the memory and makes it available to the processor.
2.2.2 Memory access
Different computer architectures manage memory access in different ways. We can consider two families of processors: one is able to directly access and modify the content of memory (higher performance, higher complexity), the other is not able to do these operations with just one instruction. The latter is the case of several modern processors, especially for embedded applications, which rely on the load/store approach: the processor takes the value from the memory (LOAD), modifies it and then writes the value back to the memory (STORE). RISC processors are commonly based on a LOAD/STORE architecture.
2.2.3 Instructions
Let us now consider an example showing that, every time a LOAD/STORE processor has to perform some computation and update the content of the memory, three steps are needed: load the content of a memory cell, perform some ALU operation, store the value back into the memory.
Now we have the value of the content inside our processor and we can modify it.
Instruction execution
Usually, each instruction is divided into several steps and each step may require one or more clock cycles, depending on its complexity.
Instruction format
There are many types of instructions, so there is a different format depending on the purpose.
• R format This is the register format. It is used for arithmetic and logic operations
(ADD, SUB, AND, OR).
• I format Used for instructions where one operand is an immediate value (ALU operations, conditional branches). This is the case of adding a fixed quantity to a value, or of an immediate address displacement with respect to the first register.
• J format It is used only for unconditional branches (jumps).
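To make the field layout concrete, here is a decoding sketch assuming MIPS-like field widths (an assumption of mine: the notes do not fix the exact widths or opcodes):

    def decode(word):
        opcode = (word >> 26) & 0x3F                  # 6-bit opcode
        if opcode == 0:                               # R format
            return ("R", (word >> 21) & 31, (word >> 16) & 31,   # rs, rt
                    (word >> 11) & 31, (word >> 6) & 31,         # rd, shamt
                    word & 0x3F)                                 # funct
        if opcode in (2, 3):                          # J format
            return ("J", word & 0x03FFFFFF)           # 26-bit jump target
        return ("I", opcode, (word >> 21) & 31,       # I format: rs, rt, imm
                (word >> 16) & 31, word & 0xFFFF)

    print(decode(0x012A4020))   # MIPS encoding of add $8, $9, $10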
Let us now consider as an example the following sequence of instructions: LOAD a value from the memory, ADD 1 to it, STORE the result in the memory.
Let us assume that our first instruction (which is a LOAD) is stored at address 100.
Therefore, we want to load the content of the memory at location 500 (whose content is 10)
and store it in the register R0.
The first step is the instruction FETCH: the PC must contain the value 100 (the address of the first instruction we want to fetch).
The LOAD instruction is now inside the IR. Inside the Controller, there is a decoding logic
which, knowing the format of the instruction, understands that there is a LOAD instruction
and starts to drive all the signals to the DP to execute the instruction.
Figure 2.4: Decode phase
At this point we have to access the memory, so, during the memory phase, we take the value
at location 500 and, in the next phase, we write it back in the register R0.
Figure 2.5: Memory access phase
Figure 2.6: Write back phase
Let us have a look at the timing. We need one clock cycle for the EXECUTION even if we
are doing nothing.
Figure 2.7: LOAD instruction timing
Then, the following operation is an ADD. Here we still have a clock cycle for the Memory,
even if we don’t access it.
This example shows how even simple operations (in this case: take a value from the memory, increment it by 1 and store it at a different location) can be quite complex: from a timing point of view, we need to fetch, decode, execute, access the memory and write back. Even very simple tasks carry a kind of overhead, because several steps and basic operations must be handled. From the hardware point of view, the solution seen here is probably not the best one (we go through the same steps in all instructions, even when some of them are not needed), but its rigid structure (we know what happens in each step) makes things easier, even if not optimized.
Let us consider that location 103 contains a JUMP. We have to repeat almost the same steps: after the fetch, we decode the instruction and understand that it is a jump; the execution, memory and write back phases do almost nothing. During the decoding operation, we have to take one portion of the instruction itself and carry it to the logic that updates the program counter (so, in this case, the PC is not just a simple counter).
If we have an N-bit processor, the ALU, buses and registers are N bits wide. The size of the PC determines the address space.
The clock frequency is related to the combinational delay inside the processor. The clock period must be longer than the longest register-to-register delay in the entire processor; therefore the maximum clock frequency is limited by the longest path delay. Memory access is often the longest delay. We have a path from PC to IR, then one from IR to one of the DP registers, then another path towards the ALU, and a last one from the registers to the memory (STORE operation) and from the memory to the registers (LOAD operation). So, if we want to maximize the clock frequency, we have to work on two aspects to be sure the processor is fast enough: the processor-memory interface and the inside of the processor (especially the DECODE and EXECUTE stages).
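In formula form (a generic restatement of the constraint; the symbol names are mine, not from the notes): for every register-to-register path i,

    T_{clk} \geq \max_i \left( t_{cq,i} + t_{logic,i} + t_{setup,i} \right), \qquad f_{max} = \frac{1}{T_{clk,min}}

so the slowest path, often the one crossing the processor-memory interface, is what limits f_max.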
Figure 2.11: Paths inside the processor
2.2.5 Pipelining
A technique used to increase the performance of a circuit is pipelining. If the algorithm requires many operations to be completed and the instructions don't share resources, then, while executing one, we can begin to execute the next one, without waiting for the end of the previous one. This solution must be managed carefully to avoid overlap problems.
This technique allows reaching high performance, with a throughput close to 1 instruction per cycle. The minimum clock period is set by the slowest individual stage.
The major problem to be avoided is the resource conflict: in the example shown in fig. 2.12, the memory is requested both by data access and by instruction fetch. This problem can be solved by using two memories.
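A back-of-the-envelope check of the throughput claim, under the simplest possible model (no hazards, illustrative only):

    k = 5                              # stages: fetch, decode, execute, memory, write back
    for n in (1, 10, 100, 10_000):     # number of instructions
        cycles = k + n - 1             # fill the pipe once, then one result per cycle
        print(n, cycles, round(n / cycles, 3))   # throughput tends to 1 instr/cycle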
2.2.6 Two memory architecture
Figure 2.15: Superscalar approach
Chapter 3
Peripherals
3.1 Introduction
A microcontroller contains a Microprocessor, a Memory, I/O ports and some peripherals. These
sections are connected through buses.
There are many kinds of peripherals: timers, converters, Standard Interfaces (to communi-
cate with the external world), and so on.
Each peripheral can be seen, from the microprocessor point of view, as a set of registers that
can be grouped into three main families:
• status registers give information about the status busy or ready of the peripheral, used
to get some feedback from the peripheral (e.g useful to know when the conversion of a
converter is finished);
• data registers are used to move data from peripheral to microprocessor and vice-versa
• Port-based I/O (parallel I/O): the μP uses ports (made of pins) to communicate with the peripherals. The software reads/writes them just like registers and the access is direct. The problem is scalability: if we increase the number of peripherals, the number of ports (and bits) required may become too high. It is not widely used today.
• Bus-based I/O: control, address and data lines form a bus which is accessed through a protocol implemented within the micro-controller: a single instruction carries out the read or write protocol on the bus. The peripheral is accessed like any other memory address. The downside is that, since we are using a bus (a shared link), multiple resources cannot be accessed concurrently. There is no direct access.
• Parallel I/O: when the processor supports only bus-based I/O but parallel I/O is needed, the parallel peripheral is connected to a register of the μC and the peripherals are accessed independently. Moreover, the parallel I/O peripheral can easily be replaced by a larger one;
• Standard I/O (I/O-mapped I/O): an additional pin (M/IO) on the bus indicates whether the address refers to memory or to a peripheral. The address decoding is simpler and there is no loss of memory addresses to peripherals. Special instructions are needed to move data between peripheral registers and memory. When the number of peripherals is much smaller than the address space, the high-order address bits can be ignored, allowing for smaller and/or faster comparators. E.g., if the bus has a 16-bit address:
– all 64K addresses correspond to memory when M/IO is set to 0;
– all 64K addresses correspond to peripherals when M/IO is set to 1.
PRO: we can fully exploit the bit-width of the address bus.
• Fixed interrupt: the addresses of the ISRs are fixed in the processor and cannot be changed. If there are not enough bytes, the ISR contains a jump. It is a simple but not very flexible solution.
• Vectored interrupt: the address is provided by the peripheral; this is commonly used when the μP has more than one peripheral connected by a system bus. When a peripheral raises an interrupt request, the μP asks it for the ISR address.
• some processors have a dedicated set of registers, that are used only by ISRs (fast, but
with HW cost);
• other processors automatically save some registers to the stack (flexible, but slow);
• in other cases the ISR itself saves the registers it uses to the stack (flexible and efficient,
“RISC-like”);
• sometimes the compiler reserves some registers for ISRs (useful for simple ISRs which
need fast response).
• Maskable: the programmer can set a bit in a register to ignore the interrupt. It is useful in the middle of time-critical code, because the MPU cannot be interrupted, while servicing an interrupt, by another one from the same peripheral.
• Non-maskable: it cannot be ignored by the programmer; it is reserved for critical events.
DMA
Without DMA, the data transfer is based on interrupts: every time a datum has to be written to the memory, an interrupt request has to be asserted. Then the data is read from the peripheral and, after that, written to the data memory.
With DMA
Before starting the execution of the main task, the MPU configures the DMA, setting the origin and the destination of the data. After the DMA controller has been configured:
• after executing an instruction, the μP sees the request, asserts Dack and releases the system bus. It stalls only if it needs the bus;
• the μP resumes execution;
• the DMA reads the data from P1 and writes it into the data memory (while the μP is executing the other instructions).
Question from the audience: what happens when both the program memory and the DMA have to drive the bus?
In this architecture, if the DMA is using the bus to move data from the peripheral to the memory, the bus cannot be used for anything else, so we need separate memories for program and data. If the processor is executing a LOAD instruction while there is a DMA transaction on the bus, only one takes control of the bus and the other waits. In the other cases, since the program memory is a distinct entity from the data memory, the processor can keep running while the DMA is working.
3.8 Timer
The timer is basically a programmable counter that consists of 3 registers. The result of the comparison between the counter and the threshold can be used as an interrupt.
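A behavioural sketch of such a timer (the register names are mine; the notes only say there are three registers and a comparison):

    class Timer:
        def __init__(self, threshold):
            self.counter = 0
            self.threshold = threshold   # compare/threshold register
            self.irq = False             # status flag raised by the comparison

        def tick(self):                  # called once per (prescaled) clock cycle
            self.counter += 1
            if self.counter >= self.threshold:
                self.irq = True          # comparison result used as interrupt
                self.counter = 0         # auto-reload behaviour assumed here

    t = Timer(threshold=3)
    for cycle in range(7):
        t.tick()
        print(cycle, t.counter, t.irq)
        t.irq = False                    # the ISR would clear the flag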
Chapter 4
Memories
4.1 Categories
Memories can be divided into many categories depending on:
• Access:
– Read only: useful for storing fixed code and constants;
– Read/write: useful for storing variables and data to be processed.
• Type:
– Volatile: data are lost if power supply goes down;
– Non-volatile: data are preserved even with no power supply.
• Interface:
Memory accesses can read or write the data serially or in parallel, in a synchronous or asynchronous way, giving different types of memories.
• Addressing:
- Explicit addressing is used when specific commands to access the information are available. In this case the specific address where to read or write must be provided;
- Implicit addressing is used in memories where only the read/write commands are present and the access happens at a fixed position (at the beginning or at the end of the memory, as in FIFOs and LIFOs);
- Addressing by content means that, if the requested value is present, the memory provides the address of its location (Content Addressable Memory - CAM).
• Static/Dynamic:
- Static memory has no need to refresh its data, because they remain stable in time;
- Dynamic memory needs to be rewritten periodically to avoid a loss of information, because the stored charge fades away.
4.1.1 Definitions
• Storage cell: 1 bit memory element;
• Word: collection of bits. Its size is the unit of access for the memory;
• Word-Line (WL): the line in the memory array corresponding to one word.
• Bit-Line (BL): the line in the memory array corresponding to one bit.
4.2 General organization
Fig. 4.1 shows the general organization of a memory as an array of words. A decoder generates a signal for each word and selects only the cells of the same word line.
This organization has some issues because, as the number of words increases, the size of the decoder becomes bigger and bigger: this means that, for high values of n, the decoder is slow. Therefore this scheme is good just for small memories.
Usually a memory is organized as a matrix.
Here, as shown in fig. 4.2, every row contains multiple words and the row decoder is smaller, but an additional multiplexer, driven by the column decoder, is required to select the correct word in the line. This organization reduces the decoding time and optimizes the array silicon area (aspect ratio). The width of each column is the word width p in bits.
The basic blocks of a hierarchical organization are array memories; to select one of them there are wires for the block address and for the block selector. This solution saves power, because a block that isn't used doesn't consume energy (another advantage: shorter wires within blocks).
4.3 Static-RAM (SRAM)
The basic cell of an SRAM is based on a latch. The input of N1 is A and its output is B; for N2 the input is B and the output is A. To work as a storage cell, the circuit must operate in its stable points.
The stability is analysed by applying small variations to one input and looking at what happens to the output. The two blue dots behave symmetrically, therefore only one case will be treated. The starting condition is V_A = Vcc: if we apply a small variation on the input we will have a small variation on the output (even smaller than the one on the input); this variation is on the input of N2, which becomes a variation on the input of N1, and so on. The variation propagates from one gate to the other but, at every step, it is reduced.
The case of the unstable point (red dot) is very different. We consider V_A = Vcc/2: a small variation on the input of N1 will give a larger variation of the output. This output is on the input of N2 and, due to the slope of the curve, the variation of its output will be greater still. This output is then applied to N1 and the variation keeps growing at every pass. This is an unstable point. The cell has problems when small variations are applied around the central point of the plot, i.e. when the voltage on A or B gets close to Vcc/2.
So, this cell works correctly only in the two stable points, therefore V_A and V_B must stay far from Vcc/2. The logic value in B is NOT(A). It is perfect for storing binary symbols.
4.3.2 6T: structure
Fig. 4.6 shows the 4T structure: there is a form of redundancy, since both Q, BL and their negations are present. The four transistors (two for each inverter) are used exclusively for the storage cell.
In order to access the content of the cell, two switches are introduced: they are the transistors
M5 and M6. The structure obtained in this way is called 6T and it is shown in fig.4.7.
6T: read operation
The read operation is a critical one for this kind of memory, since it can lead to metastability. Let us suppose that the last read operation left BLi = 1 and that a storage cell containing a 0 is then read. What happens on BLi? WLj is activated (it goes from 0 to 1); this means that the pass transistor M6 is ON and, since it has a resistance R_ON, the capacitance on BL can force a dV_Q > Vcc/2: instead of reading a 0, we end up writing a 1.
This problem can be avoided by precharging the bit lines to Vcc/2 before the read operation. In this way, if the cell contained a 0 the voltage on the BL decreases, if it contained a 1 the voltage increases. This behaviour is shown in fig. 4.10.
However, because of the capacitance of the bit line, a certain amount of time must pass before the exact value is known, limiting the access speed (a problem for large memories). If this settling time is too high, it can be reduced by adding a comparator between BLi and its complement BLi': if BLi is increasing, BLi' is decreasing, and comparing the two of them gives a 0 at the output in a faster way. In a similar way, if BLi is decreasing, BLi' is increasing and the comparison gives a 1.
Figure 4.11: Comparator used to reduce the transition time
In order to have a very dense array, we should reduce the number of transistors. Ideally we just need 4 transistors per cell (for the 2 inverters), but then 2 more are needed for the access and, if for each bit line we add a comparator, we waste even more silicon. Therefore this comparator should be as small as possible, so that most of the silicon is used for the cells. The circuit used to implement the comparator is a sense amplifier; the sense amplifier is shown in fig. 4.12.
When we start reading, we activate a word line and the sensing signal, so that MN3 and MP3 are turned ON and the circuit is activated. If the voltage on node B is increasing, it means that the BL was precharged at Vcc/2 but the cell we are reading contains a 1. As a consequence, the voltage on A, precharged at Vcc/2, is decreasing, so MN2 is turning OFF while MP2 is turning ON. Since MP2 is turning ON, node B is pushed towards Vdd, while B is still rising by itself since the bitline is precharged. This means that MN1 is turning ON and MP1 is turning OFF, so node A is pushed towards GND. We thus have a positive feedback speeding up the reading of what is inside the cell.
6T: transistor sizing
As stated previously, transistors should be of minimum size to increase integration. The sizes of the transistors must be chosen so that the read and write operations are performed correctly.
Transistor sizing: read operation Let us consider a cell storing a 0. During the read operation, transistors M6 and M3 are ON; due to their R_ON, the voltage Vcc/2 precharged on the bitline is partitioned, producing a voltage ∆V on the cell node. If this value reaches the threshold voltage V_tn of M1, M1 turns on.
Analyzing the currents, it can be seen that the current in M6 can only flow into M3 (because the other node is the gate of M2), therefore ∆V must be as low as possible. Considering that M3 is turned ON (one node at 0 and the other at 1) and works in the parabolic region:
This usually means CR > 1, so M6 has the minimum width and M3 the maximum width. The resistance introduced by M3 is smaller than the one introduced by M6, therefore the voltage drop on M6 is larger than the one on M3. The same holds true for M5 and M1, since the structure is symmetric.
Transistor sizing: write operation The current flowing in M2 must be the same as the one flowing in M5 (because of the gate of M3) during the write operation.
V_Q can be written as:
In order to have V_Q as low as possible, PR must be lower than 1. This condition usually means that M2 has a smaller width than M5. The same constraint also holds for M4 and M6.
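Summarizing the two sizing constraints in the usual notation (my restatement of the standard 6T analysis, not a formula from the notes):

    CR = \frac{(W/L)_{M3}}{(W/L)_{M6}} > 1   (read: M3 strong and M6 weak, so that \Delta V < V_{tn})

    PR = \frac{(W/L)_{M2}}{(W/L)_{M5}} < 1   (write: M5 must overpower M2, so that V_Q stays low)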
4.5 SRAM timing
The interface of an SRAM usually contains:
• a data bus;
• an address bus, which specifies the cell or the word we want to access.
From the outside the memory is usually seen as an array of cells. But this is not the whole truth: there is a lot of logic around it (decoding, logic for handling the output, sense amplifiers, ...); therefore keeping everything powered means consuming power even if the memory is doing nothing. The idea is that, if we are not using the memory, we keep only the array connected to the power supply, so as to save power. With CE we can disconnect all the logic that is not involved in retaining the information. OE enables the output logic of the array only when we want to read (write operations don't use this output logic), thus saving power.
4.5.1 Read
The read operation is used to get data from the memory. It requires that the address is valid, that WE is high and that CE is activated. After OE is activated and a certain time t_AA, during which the address is stable, has passed, the output data are valid and remain so for a certain time t_OH after the address changes.
A new read operation can start after the minimum read cycle time t_RC (during which the address remains stable) and new data are ready after the maximum address access time, at least t_AA (worst case scenario). To avoid consuming power when the memory is not used, CE is deactivated. When the memory is on a bus, the output cannot always be active (OE): if a read operation is required, OE = 0 and the other devices must not write on the bus (Z); if one device has to write to the memory, OE = Z.
Figure 4.20: Example of read timing for a two port SRAM with CE1 and CE2
Write The write operation is used to store data into the memory. The needed signals are
the memory location (address), the chip enable and the write enable.
Figure 4.21: Example of write timing for a two port SRAM with CE1 and CE2
4.6.1 Reading
Four reading methods are possible:
• Flow thru: the address and the control signals are set up before the clock rising edge, and on that rising edge the read cycle begins. Data will be available after some delay, but within the same clock cycle.
• Pipeline: address and control signals are set up before the clock rising edge. Data is read from the memory cells and stored in output registers. At the next clock cycle, data is transferred from the output registers to the data output.
• Register to Latch: address and control signals are set up before the clock rising edge. Data is read from the memory cells and stored in output latches. Data is transferred from the output latches to the data output on the falling edge of the clock.
Figure 4.24: Register to Latch
• Burst: Several bits of data are selected using a single address, which is incremented by
an on-chip counter. Both flow thru and pipelined SRAMs may have the burst feature.
Writing
• Standard: similar to the reading modes above: address, control signals and data are set up before the clock rising edge. On the clock rising edge, the data is written.
Figure 4.26: Standard Mode
• Late: useful in single-port read/write memories with pipeline. Address and control signals are set up before clock rising edge 1, while data are set up before clock rising edge 2.
4.7 DRAM
We have seen in section 4.3.2 how the SRAM requires 6 transistors (4 for the cell and 2 for the access). This is a limiting factor if we want high integration. To increase integration, simpler cells are required. The idea is to exploit just one transistor: we use one MOS capacitor Cs to store the information and a pass transistor to access the cell. Charging the capacitor means storing a logic 1, discharging it means storing a logic 0. In this way we achieve a smaller cell with just one BL.
Figure 4.28: Cell of DRAM
The capacitance depends on the dielectric constant and on the geometry: C = εA/d. If we want a capacitor with a large capacitance we need a big area. In order to save space, instead of planar technology, we can work in 3D: a trench is dug in the silicon and filled with a layer of insulator and a layer of metal. In this way the surface of the capacitor is no longer on the plane of the semiconductor: depending on the depth of the trench we can change the area of the capacitor and get a large capacitance in a small footprint.
If we want to write something into the cell, we have to put a value on the BL and activate the WL to access the cell; at that point Cs charges or discharges, then the WL is deactivated. Since the cell is rather small, a large integration is possible. Therefore we have a lot of cells connected to the same BL and a long wire for the BL. As a result the capacitance on the BL is large (C_BL) and, as in SRAMs, the read operation affects the content of the cell.
• Let us assume:
- the BL is precharged at Vcc/2;
- t = 0− is the time immediately before WL activation.
So:
- V_S(t = 0−) = Vcc (the cell stores a 1);
- V_BL(t = 0−) = Vcc/2;
- V_S(t = ∞) = V_BL(t = ∞) = V_X; this is what we want to find: V_X is the voltage inside the cell after the read operation.
• Charge analysis: the total charge is conserved, so
Q_tot(t = 0−) = C_S·Vcc + C_BL·Vcc/2 = Q_tot(t = ∞) = (C_S + C_BL)·V_X
(in steady state we assume the same voltage on the cell and on the bit line).
• Solution:
V_X = (C_S·Vcc + C_BL·Vcc/2) / (C_S + C_BL); since C_S << C_BL, V_X = Vcc/2 + ε, with ε small.
Figure 4.30: Circuit with precharging and comparator
V_X by itself is not so useful: it is something in between logic 0 and logic 1. With this solution we are able to read the content of the cell but not to preserve it. We have to find a way to read without destroying the content of the cell.
What we can do is exploit the output of the comparator:
1. Precharge BL at Vcc/2;
2. Enable WL;
3. Use the comparator output, fed back onto the BL, to restore the full value in the cell;
4. Disable WL.
The output of the comparator provides an answer almost immediately after the activation of the WL, and it is used to restore the cell voltage (by closing the feedback in the circuit). Even if the answer from the memory is very fast, we need to wait a certain time before removing the WL signal, to be sure that the content of the cell is correct, thus limiting the access time.
4.7.1 Refresh
Every time we read a value, the system must do two things: give the answer and restore the cell. Moreover, even when the access transistor is turned off, so that the capacitor should be isolated from the outside world, there is still a leakage current discharging it. This is due to the source/drain-to-bulk junctions, which are reverse biased. So even when the voltage on the WL is zero, the voltage on Cs is not constant but slowly decreases. Therefore a refresh operation (a dummy read) is mandatory after every certain amount of time.
This means that a dynamic RAM needs refreshing logic (additional logic): with this cell architecture (one transistor and one capacitor) we obtain a very dense array, but the price is the additional logic (which should be small, so as not to waste much silicon).
Figure 4.34: DRAM array organization
4.7.2 Accessing data
While in SRAMs row and column addresses are provided concurrently, in DRAMs (for historical reasons) row and column addresses are time-multiplexed: RAS and CAS tell whether the address on the bus refers to a row or to a column. The interface contains:
• Address bus;
• Data bus;
• Write Enable → WE.
From the outside the DRAM is seen as a combinational circuit (no clock signal involved).
4.7.3 Timing: read
Fig. 4.35 shows the operations needed to perform a read. First we put the row address on the address bus, then we move the RAS signal from 1 to 0. We can notice here a first timing constraint, from when we change the address to when we move RAS: we need a set-up time t_ASR for the address before moving the RAS signal. Then the address must be stable for a certain amount of time (hold) t_RAH. RAS then stays active for a time t_RAS.
At this point we change the value on the address bus (column address); it must be stable for a time t_ASC, then we can move CAS from 1 to 0. This signal stays active for a time t_CAS and the address can change after t_CAH.
Write enable must be high at least tCAS before CAS is asserted and at least tRCH after CAS
is de-asserted.
Data is valid after the maximum access time (from address tAA , from RAS tRAC and from
CAS tCAC ).
The read cycle ends after RAS and CAS are de-asserted (tCRP , tRP ).
4.7.4 Speed
The minimum time to complete the read cycle is given by:
t_RS = t_RAS + t_RP + switching time
How can we improve the performance?
• Fast Page Mode;
• Extended Data Out (EDO);
• Synchronous DRAM (SDRAM).
N.B. Page: group of cells with the same row address.
Figure 4.40: Timing: Fast page mode
First, we put the row address on the address bus, then we start moving the RAS signal. At
this point, we can put the column address on the address bus, then we can start moving the
CAS signal. Now we can remove the CAS signal but without removing the RAS signal: RAS
is kept active, we have just to change the column address and CAS. In this way we get several
values from the same page.
t_PC is the time to complete an operation (read or write); the time from the CAS transition to the end of the precharge is (t_CAS + t_PC).
Usually, when working with RAS and CAS, we first drive RAS active, then we start moving the CAS signal and, after a certain amount of time, we get the value. Once we remove CAS we cannot be sure the data is still valid. This means that if we want to switch the RAS and CAS signals very quickly, to be very quick in accessing the content of the memory, we also have to be very quick in getting the value from it. So the amount of time we have to read the value from the memory is the one underlined in red in fig. 4.41.
In EDO DRAMs, even when CAS is de-asserted (back to 1), the data is kept valid, so we have more time to read it. In this way we can narrow the CAS pulse without reducing the DQ time slot.
4.7.5 Refresh handling
Every DRAM contains a refresh controller and a refresh counter to generate the row addresses.
The refresh, as was said previously, is a dummy read and write operation that must be done
periodically. There are two main refresh methods:
• distributed: in between refresh cycles the memory is free;
• burst: provides a lot of free time, but we have to be sure that the first cells are still valid.
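For a feel of the numbers (typical figures, assumed here for illustration, not taken from the notes):

    # Assumed: 64 ms retention time, 8192 rows to refresh.
    retention_s, rows = 64e-3, 8192
    print(retention_s / rows * 1e6, "us between row refreshes")   # -> 7.8125 (distributed)
    # Burst mode instead refreshes all 8192 rows back-to-back once every 64 ms.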
The problem of refresh is not the bandwidth but the uncertainty: if we perform a read operation right after a refresh, we have 100% uncertainty on the latency (from the outside we don't know when the refresh happens).
Usually the refresh is not automatically handled by the DRAM itself: it is handled by the designer, who has to correctly drive the interface signals (RAS and CAS) and the refresh. There are several types of refresh.
RAS-Only-Refresh (ROR)
Figure 4.43: Timing of ROR: 1)Row address is applied 2)RAS is set active and CAS must
remain inactive 3)After a specified amount of time, RAS is set inactive.
We put the row address on the address bus (identifying one of the words in the array); normally, with the column address, we would then select something in the same line (page). If we don't provide any column address, so that RAS is active and CAS stays inactive, it means that we want to perform a refresh operation of the whole page. After a specified amount of time (enough to finish the refresh), RAS is set inactive. This solution is annoying from the designer's point of view: one has to keep track of which rows have been refreshed.
CAS-Before-RAS (CBR)
Figure 4.44: Timing of CBR: 1)CAS is set active 2) WE must be inactive 3) After a specified
amount of time, RAS is set active 4)The refresh counter determines which row to refresh; –
After the required time, CAS and RAS are set inactive.
The main advantage of this strategy is that we don't have to keep track of the addresses of the lines we are refreshing. During the whole operation WE remains inactive and CAS is set active; after a certain amount of time RAS is activated too. An internal counter determines the row to refresh. In this way the work of the designer is simplified.
Hidden refresh
During a refresh operation we are not able to access the content of the memory. We can "hide" the refresh by keeping a value available on the bus.
First we perform a read operation: RAS is activated and the same is done for CAS. At this point CAS is kept active while RAS is removed and then re-activated (CBR-like). In this way the data on the bus is still valid.
Figure 4.46: Timing: hidden refresh 2
Self refresh
4.7.6 Writing
Two modes:
• CAS based (early write), like the read operation: WE is taken low prior to the CAS falling edge.
• WE based (read-modify-write): allows accessing one cell, reading it, modifying it and writing it back with a single bus transaction. WE falls after CAS is taken low.
Read-Modify-Write detail:
Figure 4.47: Timing: read-modify-write
4. Disable OE
5. Enable WE
4.7.7 SDRAM
SDRAMs were developed to have higher storage capability and lower price than SSRAMs.
Two main families: SDR (single data rate) and DDR (double data rate).
Main characteristics:
• Multiple bank architecture;
• All inputs and outputs are synchronous with the clock (so there are registers at the interface).
• Control is easier: the memory executes commands (combinations of the logic levels of the control signals instead of complex timings).
• The memory array is split into banks: we can refresh one bank while accessing another.
• Controls can be performed at bank level (interleaving the control of each bank separately to hide the precharge time).
Separated power supply: synchronous logic draws a large current from the power supply during the rising (or falling) edge of the clock. This can inject noise into the array, which can be dangerous. By separating the power supply of the (synchronization) logic from that of the array, we reduce the amount of noise injected into the array.
Mode register: it stores configuration parameters, e.g. the latency:
• the number of clock cycles that occur from the input of a command to the output of the data.
Figure 4.48: MODE Register
SDRAM block diagram
Figure 4.49: SDRAM block diagram: notice the multiple bank architecture
In order to be sure that the memory works correctly, we have to perform different steps, driven by several control signals. This means that we need a controller between the μP and the SDRAM.
The controller interacts with the SDRAM at start-up to configure it; then, when the memory is ready, the controller receives commands (read and write) from the μP and translates these macro-commands into specific commands for the SDRAM.
Commands
Figure 4.50: SDRAM commands, the # symbol means that the signal is active low
SDRAM use
Operations with an SDRAM begin with the ACTIVE command.
The ACTIVE command is very important because after it we have to wait a minimum delay t_RCD before a READ or WRITE. t_RCD corresponds to a number n of clock periods in which no operation can be issued. Therefore, changing the clock frequency of the memory changes the number of clock cycles we have to wait in order to respect the timing of the memory.
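This is exactly the computation the controller designer has to do; a sketch with illustrative numbers (the tRCD value below is assumed):

    import math

    def wait_cycles(t_ns, clk_mhz):
        period_ns = 1000 / clk_mhz
        return math.ceil(t_ns / period_ns)    # round up to whole clock cycles

    print(wait_cycles(15, 100))   # tRCD = 15 ns at 100 MHz -> 2 idle cycles
    print(wait_cycles(15, 200))   # same tRCD at 200 MHz -> 3 idle cycles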
SDRAM Read
Random: we can issue the read command and specify with the address the bank and the column we want to read.
Figure 4.52: Burst read
Burst mode requires loading the Mode register with the LOAD MODE REGISTER command. The value to be written into the Mode register is taken from the address bus: inside the controller we need a multiplexer that puts on the address bus either the address or the value for the Mode register.
SDRAM write
SDRAM: DDR DDR exploits both the rising and the falling edge of the clock, so this memory should be very fast in reacting, since the available time (to get the data) is no longer a clock period but half of it. DDR is very useful in high-performance systems, where the memory should run very fast.
A problem of DDR SDRAMs is that at high rates there can be some skew between clock and data on the PCB → correct read/write operations become difficult! What we can do is add a data-edge-aligned strobe signal (DQS).
Figure 4.55: DQS
The wire of the DQS must have the same length as the wires that carry the data to be sampled (so that they have the same delay). In this way the DDR memory uses the DQS to perfectly synchronize the sampling.
4.8 CACHE
We are used to working with fast μPs; on the other hand, we want memories that are large, so that they can store a lot of information, and fast. However, large memories (lots of BLs → large capacitance) are slow. In practice, we can solve the problem of coupling the speed of the processor with the speed of the memory by resorting to a hierarchical system: we put small, very fast memories next to the processor and we place the large memories a bit farther from it.
The memory hierarchy can therefore be seen as a pyramid: as the level increases, the distance from the processor and the size of the memory increase, while the speed decreases.
Figure 4.57: Current memory hierarchy
Let us consider now, as an example of memory access, a μP with L1, L2, and main memory,
which executes a load word (lw) instruction.
Sequence of operations:
1. Look for the word in L1: miss;
2. Look for the word in L2: miss;
3. Not present in L2 either: read the block of data (containing the requested word) from main memory and put it in L2;
4. Put the block in L1;
5. Deliver the requested word to the processor;
6. lw is completed.
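The benefit of the hierarchy can be quantified with the usual average-memory-access-time model (a standard formula; the hit times and miss rates below are assumed for illustration):

    # AMAT = hit_time + miss_rate * miss_penalty, applied level by level
    l1_hit, l1_miss = 1, 0.05      # cycles, miss rate (assumed)
    l2_hit, l2_miss = 10, 0.20
    mem = 100
    amat = l1_hit + l1_miss * (l2_hit + l2_miss * mem)
    print(amat)   # -> 2.5 cycles: close to L1 speed, far from the 100 of main memory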
Cache fundamentals
The elementary component of a cache memory is the cache line, which is made of several fields (data, address-related information, ...).
• The data field is matched to the next level of the hierarchy: to connect this memory successfully to the next level, the width of the data field must match the data width of the next level;
• The format and the size of the address-related information depend on the mapping algorithm.
Let us consider the following example: assume we have a 16 MByte main memory → 16 MByte = 2^4 · 2^20 bytes = 2^24 → 24 bits for the address. Let us suppose the data is represented on 32 bits (4 bytes per data). The 24 bits are used in this way: 22 are used to choose one line and 2 bits choose one byte among the 4 available. Let us assume we use a 64 kB cache: 64 kB = 2^6 · 2^10 → 16 bits. Since we have to match the data width of the cache with the data width of the main memory, we need a 32-bit bus for data. As a consequence, the 16 bits that would address single bytes reduce to 14 bits addressing 32-bit words.
4.8.1 Cache organization
Two questions: where can a block be placed in the cache, and how do we find it?
The answers depend on the type and organization of the cache. Usually byte-access support is required!
N.B. At the beginning, the valid field of every line is 0, so initially we have a lot of misses; when we get the correct value from the memory and put it in the cache, the valid bit is set to 1.
In this example, the address next to 000000 cannot be 000001, since 000000 identifies the
byte 68, 000001 the byte 24, 000002 the byte 57 and 000003 the byte 13.
To ease the access to the memory, it is better to translate the byte-level address into the block-level address (just by neglecting the two least significant bits). If we use direct mapping, it is clearly defined where each element of the main memory can be stored inside the cache. The position inside the cache is given by the index, which is a portion of the address: the least significant bits of the block address are the index, while the most significant ones must be stored inside the cache as the tag, to avoid ambiguities.
The microprocessor provides an address on m bits (e.g. m = 24). First, we remove the w LSBs identifying the byte in one block (e.g. w = 2). In this way we get an index on r bits (in our example r = 14), used as the address of the cache, and a tag on s-r bits (in this case 22-14 = 8 bits). From the cache we get a value which contains the data, the valid bit (1 bit) and the tag. At this point we can compare the input and output tags to be sure that what we are reading from the line is exactly what we are looking for. If they are equal and the valid bit is 1, we have a hit.
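Using the numbers of the running example (m = 24, w = 2, r = 14, so an 8-bit tag), the address slicing can be sketched as:

    W, R = 2, 14                      # byte-offset bits, index bits

    def split(addr):
        offset = addr & ((1 << W) - 1)          # byte inside the block
        index = (addr >> W) & ((1 << R) - 1)    # selects one cache line
        tag = addr >> (W + R)                   # stored and compared on each read
        return tag, index, offset

    print(split(0x123456))   # -> (18, 3349, 2), i.e. tag 0x12, index 0x0D15, offset 2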
N.B. To build our cache we can exploit the logic available on an FPGA, where the memory is usually synchronous. When the μP gives the address, it is sampled at the next clock cycle, so the memory does not answer immediately. We have to be sure that we compare TAGs corresponding to the same index value; therefore we add a register at the input of the comparator (on the s-r side), otherwise we would compare the TAG we get from the cache with a TAG that is not consistent from the timing point of view. Moreover, we have to match the number of registers added in this way with the pipeline depth of the memory.
• A direct-mapped cache is simple to design and fast: given a memory address, the index in the cache is easily found and no search is needed.
• Problem: conflict misses, that is, misses caused by accessing different memory locations mapped to the same cache index: there is no flexibility in where a memory block can be placed in the cache.
4.8.3 Fully Associative Cache
In a fully associative cache a block of main memory can be placed in any cache line. The Most Significant s bits identify the address of the block, and these s bits are the tag.
With the direct mapping approach we pay with a rigid placement but gain a rather small tag. With the fully associative approach we have high flexibility, which is paid for with a large tag.
Example
Figure 4.60: Fully Associative Cache example: this time we have to store the full block address
Fully Associative Cache: reading
Since everything can be mapped everywhere, every time we want to check whether the wanted data is inside the cache we have to read all the lines of the cache concurrently and compare each TAG field with the current TAG. Since we read all the lines concurrently, the cache can be seen as a bunch of registers (each line is a register). Moreover, we need a lot of comparators and a decoding logic. This means that, in terms of hardware, a fully associative cache is more complex than a direct-mapped one.
Fully Associative Cache: notes
When a read operation is performed, the tag must be searched in the whole cache, because the item can be in any cache block. The search for the tag must be done in parallel by hardware (a serial search would be slow). The necessary parallel comparator hardware is very expensive (one comparator for each line), so the fully associative approach is practical only for very small caches.
4.8.4 N-way Set Associative Cache
Each block of the main memory maps to a set of cache lines. A block is made of n bits. The address (m bits) is split into two parts:
• the Most Significant s bits identify the address of the block; the most significant s - r + log2(N) bits are the tag;
• the Least Significant r - log2(N) bits identify the set in the cache (index).
The TAG is slightly larger than in direct mapping, but much smaller than in the fully
associative case.
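For instance (reusing the earlier numbers as an illustrative assumption): with m = 24, w = 2 (so s = 22), r = 14 and N = 4 ways, the index shrinks to r − log2(N) = 14 − 2 = 12 bits and the tag grows to s − r + log2(N) = 8 + 2 = 10 bits, in between the 8-bit direct-mapped tag and the 22-bit fully associative one.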
N-way Set Associative Cache: reading
The address comes from the microprocessor. From it we take the w LSBs to identify the
byte inside the word; then the other part of the address is split in 2. The first part (the LSBs)
is used as the index to identify one line in the cache and, since these caches are direct-mapped
ones working concurrently, we access all of them with the same index. Each line is composed of
address-related information and data. The address-related information is basically the TAG.
We compare these TAGs with the second part of the address, corresponding to the TAG. The
decoder takes as input the results of the comparisons and drives the multiplexer accordingly
(its inputs are the data from the caches). The decoder also drives the multiplexer of the
valid field.
N-way Set Associative Cache: note
Direct mapped cache and fully associative cache can be seen as just variations of set associative
cache:
• Direct mapped cache is a 1-way set associative cache;
• Fully associative cache is an n-way set associative cache (where n is the number of lines in
the cache).
If all valid bits are set, one of the lines in the set must be replaced with the block read from
memory!
Replacement strategies:
- Random line;
- LRU → Least Recently Used line;
LRU replacement
It replaces the Least Recently Used (LRU) line, since the least recently used line is (probably)
not going to be used again soon.
We must add logic in order to store information about the use of the stored lines.
In a 2-way set associative cache one bit per set is needed:
• When a line has to be replaced, if the "access bit" is 0, replace the first line (and vice versa).
If N is greater than 2, one additional bit is not enough to find the least recently used line.
E.g. the Intel 80486 uses an 8-kbyte 4-way set associative cache. It implements a pseudo-LRU
strategy, with three additional bits per set. Lines are grouped into two couples (the policy is
sketched right after this list):
• The first bit indicates which couple has been accessed last;
• The second bit indicates which line in the first couple has been accessed last;
• The third bit indicates which line in the second couple has been accessed last.
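A minimal C sketch of this 3-bit pseudo-LRU policy; the bit layout follows the grouping above, and all names are illustrative:

#include <stdio.h>

/* b0: which couple was accessed last (0 = lines 0/1, 1 = lines 2/3);
   b1/b2: which line inside each couple was accessed last.            */
typedef struct { unsigned b0 : 1, b1 : 1, b2 : 1; } plru_t;

static void plru_touch(plru_t *s, int line) {      /* line in 0..3 */
    if (line < 2) { s->b0 = 0; s->b1 = (unsigned)(line & 1); }
    else          { s->b0 = 1; s->b2 = (unsigned)(line & 1); }
}

static int plru_victim(const plru_t *s) {
    /* go to the couple NOT used last, then to the line NOT used last */
    if (s->b0 == 0) return 2 + (int)(s->b2 ^ 1u);  /* evict in 2nd couple */
    else            return 0 + (int)(s->b1 ^ 1u);  /* evict in 1st couple */
}

int main(void) {
    plru_t s = {0, 0, 0};
    plru_touch(&s, 0); plru_touch(&s, 3); plru_touch(&s, 1);
    printf("victim = %d\n", plru_victim(&s));      /* line 2, never touched */
    return 0;
}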
• More data than can fit in the cache (capacity miss; it can be reduced by working on the size
of the cache);
• The block replacement policy discarded a block that is now being referenced and must be
reloaded (conflict miss).
4.8.6 Writing
When a write operation is performed on data present in the cache, we have to update the
value stored in the cache. What about the value stored in the memory?
Let us suppose we write data only to the cache: main memory and cache would be inconsistent,
so this approach must be discarded. If the line in the cache is later replaced and the memory
has not been updated, the new value (the updated one) is lost!
Write-through strategy
When a write operation is performed, we write both the block in the cache and the block in
the main memory. In this way memory and cache are always coherent, but the write operation
is performed at the speed of the slower memory! It is a simple strategy with poor performance.
This strategy works fine if the average time between two writes is greater than the memory
write-cycle time.
• Write-through with buffer → consistency problem (risk of reading the block when it is
not up-to-date).
• Write-back based (write allocate):
- Read the block from the main memory;
- Place it in the cache;
- Write (update) in the cache;
- Update the main memory when the cache line is replaced (as in write back).
A toy model of this policy is sketched below.
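A minimal toy model of write-allocate plus write-back in C (a hypothetical one-line cache, with addresses used directly as tags, just to show when memory is touched):

#include <stdio.h>
#include <stdint.h>

static uint32_t memory[256];                 /* toy main memory             */
typedef struct { uint32_t tag, data; int valid, dirty; } line_t;
static line_t line;                          /* toy single-line cache       */

static void store(uint32_t addr, uint32_t value) {
    if (!line.valid || line.tag != addr) {   /* write miss                  */
        if (line.valid && line.dirty)
            memory[line.tag] = line.data;    /* write back the evicted line */
        line.data = memory[addr];            /* write allocate: fetch block */
        line.tag = addr; line.valid = 1;
    }
    line.data = value;                       /* update the cache only       */
    line.dirty = 1;                          /* memory updated on eviction  */
}

int main(void) {
    store(7, 111);                           /* miss, allocate line 7       */
    store(9, 222);                           /* miss, evicts and flushes 7  */
    printf("memory[7] = %u (flushed), memory[9] = %u (stale until eviction)\n",
           (unsigned)memory[7], (unsigned)memory[9]);
    return 0;
}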
More on writing
The write-back strategy is based on the fact that writes to main memory are performed only
once, when the line in the cache is replaced. What happens if more "actors" can access main
memory (multi-processor systems, DMA controller, ...)? If a modified line is accessed from the
main memory, an error occurs! In order to ensure consistency of the cache, a protocol called
MESI (Modified Exclusive Shared Invalid) has been developed: inside each line of the cache we
have data, TAG, valid and other bits, just to be sure that the value in the cache is coherent
with the value in the main memory and in all the other caches. We are paying the price of a
fast system (if we use a cache we want a fast system) with some overhead.
4.9 ROM
• WLj = 1 → BLi = 1 (diode off);
The true structure contains all the diodes: instead of placing diodes just where we need them, a
simpler solution is to have them in each position of the array and then decide whether to
connect them or not. From the fabrication point of view each ROM is equal to any other; the
difference is in the last step, the metallization, which defines whether the diode is connected or
not (no diode → 1 on BL, connected diode → 0 on BL). We have to be sure to drive just one
WL at a time, otherwise we can create a short.
This solution is, however, not the smartest one: the BL is not an ideal wire, it has a
capacitance, so having a direct connection between WL and BL is not a good idea.
Figure 4.68: MOS-based ROM
Compared with the diode approach, here we have an area overhead as a downside: we must
connect the Source to GND, so we need an additional line. An idea to reduce this problem is
to flip transistors, so that two physically close WLs have their transistors mirrored and their
Sources can share the GND line. In this way we halve the number of GND lines.
We build the array of transistors and with the metallization we choose the connections. To get
a programmable ROM (PROM) we put a fuse between the Drain and the BL: programmability
is achieved by blowing or not blowing the fuse.
Figure 4.69: PROM
- permanently OFF (VT > Vmax);
- switchable (Vmin < VT < Vmax).
When we connect together metallization, oxide and silicon, there is a movement of charges
inside the structure and, if the Si is p-doped, a depleted region appears (the majority carriers
are pushed outside this area). At equilibrium the Fermi level is the same in all the materials.
Due to Schrödinger's theory, electrons inside this structure are organized into bands and,
depending on the type of material, we have different characteristics (metals, semiconductors,
insulators). In metals, conduction and valence band overlap, so the distance between the free
energy level and the Fermi level is given by the extraction potential (qχm). In semiconductors
the two bands are separated but the energy gap (distance between the bottom of the
conduction band and the top of the valence band) is rather narrow. In insulators the energy
gap is rather large.
Inside the semiconductor the intrinsic Fermi level lies close to the lower limit of the conduction
band (Ec), so the distance between E0 and EF is given by the extraction energy of the
semiconductor (qχs), plus Ec − EFi, plus qΦp.
To find VT we need a further step: we apply a voltage, adding to the qVfb step and bending
the band structure. At a certain point, the intrinsic Fermi energy EFi crosses the EF of the
semiconductor. If the two levels cross, it means we are changing the type of semiconductor we
are using: with EFi < EF, at the oxide interface, the behaviour of the semiconductor is n-type.
So, by increasing the voltage on the structure we are able to attract electrons to the interface
between semiconductor and oxide. When we apply a voltage such that EFi − EF = qΦp we are
in the strong inversion condition. During strong inversion the structure is the one shown in
fig. 4.71.
If we apply a voltage between gate and bulk high enough to create strong inversion, we
create a channel of electrons even though the structure has a depleted region. To build a
transistor we need something to exploit this channel, so we add source and drain. In this way,
with an electric field, we are able to move the charges.
We want to be able to change the threshold voltage of the transistor, so we need to change
the physical parameters that define the strong inversion condition.
A way to change the threshold voltage is to exploit the fact that some charges can be
trapped inside the oxide: the complete threshold voltage has a contribution given by the
standard threshold voltage of the transistor, and we can modify this value by trapping charges
inside the oxide. We can also implant some charges into the structure (in this way we get the
wanted value for the threshold voltage, but we are not able to change it again).
VT = VT0 − Qoxi/Cox − α·Qox/Cox
• Electrons in the floating gate increase VT;
• the current decreases.
• Reading:
– programmed transistors do not conduct and the corresponding BIT lines are at 1;
– unprogrammed transistors conduct and the corresponding BIT lines are at 0.
The erasing procedure is done through UV radiation: electrons in the floating gate gain
sufficient energy to return to the semiconductor.
N.B.
- Programming → by cell;
- Erasing → by whole memory.
The erasing operation requires a ceramic package with a small quartz window for exposure.
The FLOTOX cell has a very thin oxide next to the Drain; with a high gate voltage (more
than VDD) and Source and Drain at ground, the electrons pass through the oxide barrier by
tunnelling.
Figure 4.77: FLOTOX
At a first glance we may think to reproduce the EPROM structure with a different elementary
cell. However, this does not work! Let us consider the example in fig. 4.78:
• To program A: WL1 → VPP and BL1 → GND;
• To avoid programming B: BL2 → VPP;
• To avoid programming C: WL2 → GND;
• Side effect: WL2 → GND and BL2 → VPP erases D!
Figure 4.78: Example: EEPROM based on EPROM structure
Therefore, the problem is that programming a transistor interferes with the other ones.
What we can do is add an access transistor to each cell, as shown in fig. 4.79:
• To program A: W0 → VDD and P0 → VDD;
• To avoid programming B: BIT1 → VPP;
• To avoid programming the other cells: W1, ..., WN → 0.
The price of using EEPROMs is that we have to handle several voltage values inside the chip,
at least three (GND, VDD, VPP).
4.10 Flash memories
Flash memories can be seen as an extension of EPROMs and of EEPROMs:
• One transistor per cell (thanks to this they can reach high integration);
• Byte/Word.
• NOR: every BL is connected to the supply voltage through a pull-up resistor Rpu; each
transistor is connected to the BL through the Drain, the Source is connected to GND and the
Gate to the WL.
• NAND: the Drain of each transistor is connected to the BL; the Source is connected to
the Drain of the next transistor.
4.10.1 Architectures
In NOR architectures, due to the fact that the Source must be connected to GND, we need
a metal wire connecting the Source to GND. Exploiting symmetries (consecutive pages share
the same wire) we can reduce this metallization. The width of the source metal layer cannot
be too narrow, otherwise the reliability is compromised (and it is difficult to realise).
The NAND cell solves the problem of achieving high integrability by putting the cells
sufficiently close to one another, so that the n+ region is shared by two transistors, acting as
the source of one transistor and the drain of the other one.
4.10.3 NOR-flash
NOR-Flash Memory: erasing
When we perform an erasing operation we must be sure that all the transistors then act in
the standard way, so we need to remove the trapped charges from the floating gate.
This is done exploiting the Fowler–Nordheim effect:
• we apply a high voltage to the bulk (in this architecture the bulk is connected to
the source).
So, if we then apply a high voltage on the gates, the transistors turn ON and there will be a
relevant current variation, meaning a logic 1.
To perform the program operation we exploit the hot-electron effect: we need a high voltage
(6 V) between source and drain so as to accelerate the electrons, and a high voltage (12 V) on
the gate so that the electrons are able to jump into the floating gate. This increases the
threshold voltage of the transistor.
When, with a regular voltage, we try to access the transistor, it will not turn ON, so the
current variation at the node is very low and it will be detected as a logic 0.
Figure 4.84: NOR-Flash Memory: programming
Let us consider that the red-circled transistor in fig. 4.85 is programmed, so it has electrons in
the floating gate and its threshold voltage is high. On the other hand, the blue-circled
transistor is not programmed, so it has no electrons in the floating gate and its threshold
voltage is regular. Applying 5 V on their WL, the first transistor won't turn ON (high Vt) and
a logic 0 will be read, whereas the second one will turn ON (regular Vt) and a logic 1 will be
detected.
4.10.4 NAND-flash
NAND-Flash Memory: erasing
We apply a high voltage to the bulk and connect the gates to ground so as to discharge the
floating gate. If the bulk voltage is high enough, we can over-discharge the floating gate,
removing its electrons. In this way there is a positive net charge in the floating gate and the
threshold voltage of the erased cells becomes negative.
With a negative threshold, even with 0 V on the WLs, the transistors are turned ON.
NAND-Flash Memory: reading
We apply 5 V on the pages we don't want to read, so that their transistors conduct regardless of their stored value.
NOR-flash:
• Large cell area, small capacity;
• Fast read (tens of ns), random access;
• Slow write (some μs) compared to read;
• Used mainly for code/instructions.

NAND-flash:
• Small cell area, large capacity;
• Slow read (tens of μs), sequential access (shadowing);
• Fast write (several μs), comparable with read;
• Used mainly for data.
4.10.6 Interface
The interfacing is similar to the one seen for the SRAM; the difference is that signals are
grouped to create commands (as in SDRAMs). A complex FSM with a DC-DC converter is
used to handle erasing and programming operations.
Several flash memories contain a Status Register to check erasing and programming
(successful/error), and they can implement write-protection mechanisms.
The information about the memory (block size, density, ...) is contained in the Common
Flash-Memory Interface (CFI).
ONFI contains the specifications of the NAND-flash interface.
4.10.7 Reliability
As time passes and the memory is used, Flash memories lose reliability. This is due to:
• Erasure causes a large and asymmetric distribution of VT (endurance): VT,erased and
VT,programmed tend to become closer.
Erasing/programming can be performed correctly only a limited number of times (about
10^4–10^5).
By checking the status register we know if the block is reliable: if programming/erasing fails,
the block is marked as damaged (thus reducing the size of the usable memory).
• Dynamic: the next block to write is chosen among the ones with the lower erase count
(see the sketch after this list). This type of levelling is used when only dynamic data (data
that change frequently) are involved. If we always wrote/erased the same blocks, they would
age faster and become useless.
• Static: static levelling is used for both static and dynamic data. We track the write/erase
rate of all good blocks; in this case also static data (data that are almost never changed)
are moved to the blocks with higher erase counts, to keep the ageing almost uniform.
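A minimal C sketch of the dynamic policy (the erase counters below are hypothetical):

#include <stdio.h>
#define NBLOCKS 8

int main(void) {
    /* hypothetical per-block erase counters */
    unsigned erase_count[NBLOCKS] = {120, 95, 300, 95, 410, 88, 250, 130};
    int best = 0;
    for (int i = 1; i < NBLOCKS; i++)        /* pick the least-worn block */
        if (erase_count[i] < erase_count[best]) best = i;
    printf("write next data to block %d (erased %u times)\n",
           best, erase_count[best]);
    return 0;
}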
ECC
In a NAND-flash read, the cells of the selected page have the gate at 0 V, while the cells of
unselected pages have the gate at 5 V. The cells of unselected pages undergo an electrical
stress and programmed cells become weak. We need Error Correcting Codes (ECC) to improve
reliability: the following ones are taken from the family of Block Codes, used for the reliability
of memories (another family is that of Convolutional Codes, mainly used for data transmission):
• BCH codes;
• k: number of information symbols;
The graph below shows, for t = 1, 2, that the bit error rate decreases with respect to the
case t = 0, i.e. no error correction.
• Lower performance;
• Lower reliability;
• Lower price.
Chapter 5
Interfacing
5.1 Introduction
Let us consider a digital system with a transmitter (TX) and a receiver (RX). Since we are
dealing with a digital system, the TX transmits logic 0s and 1s. We need to define which
voltages and currents are needed to correctly communicate these values.
Figure 5.1: Digital system transmitting a logic 1 (left) and a logic 0 (right)
Usually, systems are made in a way to have VIH < VOH and VIL > VOL in order to have some
margin.
Figure 5.2: Voltage values and margins
With these conditions the system can properly work: we have introduced a noise margin
(NM):
• NMH = VOH − VIH > 0;
• NML = VIL − VOL > 0.
The NM is static: it does not change with the transmission of different symbols (static com-
patibility check).
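As a quick numeric check (using the classic TTL levels as an illustrative assumption): with VOH = 2.4 V, VIH = 2.0 V, VIL = 0.8 V and VOL = 0.4 V, we get NMH = 2.4 − 2.0 = 0.4 V and NML = 0.8 − 0.4 = 0.4 V, both positive, so TX and RX are statically compatible.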
Let us consider the system from the current point of view. If the output current is too high,
the output voltage drops below VOH. This current depends on what is connected to the TX.
To be sure to have a correct communication, we have to check the NM, but also that the
amount of current required by the RX is compatible with the amount of current the TX is
able to produce.
• IOH is the maximum output current for a high logic value;
• IOL is the maximum output current for a low logic value;
• IIH is the maximum input current for a high logic value;
• IIL is the maximum input current for a low logic value.
A TX can also be connected with several RXs. In this case, the TX can drive an output
current IOH , but each RX can sink a current IIH . So, if we want to connect more than one RX,
we have to check that the sum of all these currents is compatible with IOH .
• nH = ⌊IOH / IIH⌋;
• nL = ⌊IOL / IIL⌋.
The minimum between these two parameters is the static fan-out of the system (maximum
number of RXs we can connect to one TX to be sure we do not violate the interfacing rules).
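A small C sketch of this computation, with hypothetical datasheet currents:

#include <stdio.h>
#include <math.h>

int main(void) {
    /* hypothetical datasheet values, in amperes */
    double IOH = 4e-3,  IOL = 8e-3;     /* TX drive capability     */
    double IIH = 40e-6, IIL = 1.6e-3;   /* RX input currents       */
    int nH = (int)floor(IOH / IIH);     /* nH = floor(IOH / IIH)   */
    int nL = (int)floor(IOL / IIL);     /* nL = floor(IOL / IIL)   */
    int fanout = nH < nL ? nH : nL;     /* static fan-out = min    */
    printf("nH = %d, nL = %d, static fan-out = %d\n", nH, nL, fanout);
    return 0;
}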
Which is the maximum speed at which the system can work correctly?
From the static compatibility we know that TX and RX can be connected together, but we
don't have any information about the speed of communication.
We can derive a model where the TX is represented by its Thevenin equivalent and the RXs
are modelled as capacitors (CMOS circuits). Depending on the number n of RXs, we have
many capacitors connected in parallel (they all share the same nodes: the TX output wire and
GND). The wire too has its own model, which includes a capacitance CL. A more accurate
model also includes a line resistance RL (in series with RTX).
The voltage generator at the TX is very specific: when the TX switches from 0 to 1, the
voltage goes from VOL to VOH as a step, and vice-versa when it switches from 1 to 0.
Figure 5.5: L-H transition: at RX side we need at least VIH to understand there is a switch
Figure 5.6: H-L transition: at RX side we need at least VIL to understand there is a switch
VIRX(t) = (v0 − v∞)e^(−t/τ) + v∞
where τ = (RTX + RL)(CL + nCRX) is the time constant (we will neglect RL from now on).
L-H transition: (v0 = VOL ) v∞ = VOH ;
H-L transition: (v0 = VOH ) v∞ = VOL .
VIRX starts increasing exponentially from VOL and, after a time T, it reaches VIH:
VIRX(T) = (VOL − VOH)e^(−T/τ) + VOH ≥ VIH
VOL, VOH and VIH are given, and τ is known (from the electrical parameters), so the only
unknown is T:
T ≥ τ ln((VOL − VOH)/(VIH − VOH))
This formula provides the minimum period to have a correct transition from L to H.
If the system were symmetric, the dynamic analysis would end here, but this is not the case.
5.1.2 H-L transition
VIRX(t) = (VOH − VOL)e^(−t/τ) + VOL
VIRX must reach at least VIL in one period.
Let M = max{ ln((VOL − VOH)/(VIH − VOH)), ln((VOH − VOL)/(VIL − VOL)) },
so:
Tmin = RTX(CL + nCRX)·M
M depends on VOH, VOL, VIH, VIL (so we know it from the electrical characteristics); CL, CRX
and RTX can be found in datasheets. Therefore, Tmin depends on n!
This equation allows us to answer to:
• Given n, which is the maximum clock frequency we can work with?
• Given a clock frequency, which is the maximum value of n?
Let us now consider an example.
N.B. the sign of the currents depends on the load connection: e.g. if the current flows out of
the TX (when we send a 1) it is negative.
This means that, even if, from the static point of view, we could connect more than 300 RXs
to the same TX with no electrical problems, if we want the system to work at 10 MHz we can
use at most 3 RXs, while at 100 MHz it won't work at all.
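A small C sketch of this computation; the electrical parameters below are hypothetical, not the ones of the original example:

#include <stdio.h>
#include <math.h>

int main(void) {
    /* hypothetical electrical parameters (volts, ohms, farads) */
    double VOH = 3.3, VOL = 0.0, VIH = 2.0, VIL = 0.8;
    double RTX = 100.0, CL = 20e-12, CRX = 10e-12;
    double M = fmax(log((VOL - VOH) / (VIH - VOH)),
                    log((VOH - VOL) / (VIL - VOL)));
    for (int n = 1; n <= 4; n++) {
        double Tmin = RTX * (CL + n * CRX) * M;   /* Tmin = RTX(CL+nCRX)M */
        printf("n = %d  Tmin = %.2f ns  fmax = %.0f MHz\n",
               n, Tmin * 1e9, 1e-6 / Tmin);
    }
    return 0;
}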
but how steep the slopes of the digital signal are. When the rising and falling times are
sufficiently small, so that the transitions are very fast, the period of the signal may no longer
be the main contribution.
As a rule of thumb, an interconnection should be treated as a transmission line when its time
delay T is greater than 1/10 of the signal rise time tr.
How do we compute the time delay?
Signals in a PCB propagate at roughly half the speed of light:
v = c/2 = 150·10^6 m/s = 0.15 m/ns = 15 cm/ns
Given the wire length, we obtain the time delay.
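A minimal C sketch of this rule of thumb, assuming v = 15 cm/ns and hypothetical trace length and rise time:

#include <stdio.h>

int main(void) {
    double length_cm = 20.0;              /* hypothetical trace length  */
    double tr_ns     = 1.0;               /* hypothetical rise time     */
    double T_ns = length_cm / 15.0;       /* delay at v = 15 cm/ns      */
    printf("T = %.2f ns, tr/10 = %.2f ns -> %s\n", T_ns, tr_ns / 10.0,
           T_ns > tr_ns / 10.0 ? "treat as a transmission line"
                               : "lumped model is fine");
    return 0;
}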
If we have a rather large and complex PCB with a lot of digital circuits working at high
frequencies, the drivers of these signals probably produce very steep edges, and the problem is
not the speed of the system itself but the steepness of the edges. With very steep edges and
long wires on the PCB, the model described so far is no longer reliable and we need a more
refined one.
From the previous example, we can notice how the steeper the slope, the more accentuated
the ringing effect.
Figure 5.11: Transmission line model
v(z, t) states that the voltage at any section and any time instant is given by the superposition
of two waves: v+, the forward-propagating one, and v−, which propagates in the opposite
direction.
The signals we will analyse are square waves, not the sinusoidal waves of the usual
transmission-line theory.
Sending a signal
Consider the following system:
This result shows how, at the beginning of time, the effect of the TL is equal to a load of
impedance Z0, so there is a voltage division between Z0 and Rg.
Propagation
Receiving a signal
At the receiver side, if we model the RX as Rl, then v(t, L) = Rl·i(t, L).
With the TL equations it becomes:
Possible scenarios:
• if Rl is equal to Z0, Γ is 0 and there is no reflection;
• depending on Rl and Z0 there can be a reflection: in this case, there is a back-propagating
wave.
The amplitudes of the forward- and back-propagating waves are different, because Γ
modulates the amplitude of the back-propagating one; the shape is the same and so are the
propagation parameters.
After a certain time, the back-propagating wave reaches the beginning of the transmission
line, where it sees a reflection coefficient Γg = (Rg − Z0)/(Rg + Z0), and it may be reflected
again if the TX impedance is not matched to the characteristic impedance of the line.
This phenomenon goes on for a certain amount of time and can give problems to the
electronic system, like providing voltages that are out of the range the system was designed for.
Figure 5.13: Multiple reflections
This phenomenon does not last forever, because Γ is bounded between +1 and −1 and each
reflection attenuates the effect.
If the transmitted signal is limited in time (like a pulse) the phenomenon is even less
accentuated. However, a case closer to real applications is the transmission of a step function.
When both Γg and Γl are positive, each reflection adds a positive value to the far-end signal:
we get staircase waveforms.
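A small lattice-diagram sketch of the staircase build-up in C, with hypothetical values (far end nearly open, Γg = 0.8 at the near end): each arriving wave adds v1(1 + Γl)(ΓlΓg)^k to the far-end voltage.

#include <stdio.h>

int main(void) {
    /* hypothetical values: Z0 = 50 ohm, Rg = 450 ohm (Gg = 0.8),
       far end nearly open (Gl ~ 1), unit step Vs                  */
    double Vs = 1.0, Z0 = 50.0, Rg = 450.0, Rl = 1e9;
    double Gl = (Rl - Z0) / (Rl + Z0);
    double Gg = (Rg - Z0) / (Rg + Z0);
    double v1 = Vs * Z0 / (Rg + Z0);      /* first incident wave       */
    double term = v1 * (1.0 + Gl);        /* far-end step at t = tp    */
    double vfar = 0.0;
    for (int k = 0; k < 6; k++) {
        vfar += term;                     /* wave arriving at (2k+1)tp */
        printf("t = %2d tp   Vfar = %.3f V\n", 2 * k + 1, vfar);
        term *= Gl * Gg;                  /* one more round trip       */
    }
    return 0;
}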
We obtain a first-order differential equation in canonical form.
This result shows that the behaviour of the waveform reflected by a capacitive load is
rather similar to the voltage of an RC circuit. The unit step function u(t) is a reminder that
this solution is valid only for the positive time axis.
5.4 Matching
We have just seen that:
• if the resistance at the driver side is equal to the characteristic impedance, the driver is
matched to the transmission line;
• if the resistance at the load side is equal to the characteristic impedance, the load is
matched to the transmission line.
5.4.1 Incident Wave Switching (IWS)
If the load is matched there is no reflection (parallel termination). The system can be sized
such that the incident wave switches the receiver: without reflection the wave arrives with
low delay and the RX switches very fast.
In this case we should not match the driver, since there is no reflected wave. Moreover,
matching the driver would halve the amplitude of the incident wave, so it is better to have
Rg < Z0 (such that the voltage divider ratio is close to 1).
The voltage-divider effect is very low (difference between VA and VB). Since the load is
matched, VC is equal to VB except for a delay (tp, the propagation delay). The problem is that
the presence of Rl implies a current flowing, hence high power consumption.
At tp, VB is equal to ½VA and the RX would not be able to switch; however, since Γ is 1,
VC is the sum of two contributions equal to ½VA. Therefore, almost instantaneously after tp,
the RX switches. At 2tp the voltage VB is stable.
This solution (series termination) reduces the power-consumption issue of the previous one
and avoids the reflection problems.
Figure 5.14: Near end matching: Destination Open - Source matches line impedance (series
termination)
Figure 5.15: Mismatch at both ends: Γ=1 at far end (open) Γ=0.8 at near end
Figure 5.16: Mismatch at both ends: Γ=1 at far end (open) Γ=- 0.5 at near end
Figure 5.17: Capacitor at far end: remember that the effect of a capacitor as a load has the same
effect of a capacitor in a RC circuit (exponential behaviour that depends on RC). Matching at
the near end side makes the signal at the RX side almost ideal
Chapter 6
Serial Communications
Parallel transfer is very effective in terms of bandwidth: if TX and RX work with the same
clock frequency, we can transmit an N-bit word in just one clock cycle. On the other hand, in
serial communications, N bits are transferred from TX to RX in at least N clock cycles.
6.1.1 Parallel Connection
• Data and timing (Clock) on separate wires → skew (especially at very high clock fre-
quency, see DDR memories).
• Few drivers, N cycles → similar power consumption to the parallel link (because the driver
is used for a much longer time);
• Data and Clock on the same wire → no skew (requires proper protocol).
6.2 Communication glossary
Figure 6.5: Serial transmission rate Bit Rate = 1/Tbit
Tbit (bit time) can be seen as the period of the clock of the transmission system. We need to
sample with correct set-up and hold times. A good solution is having the transmitter and the
receiver work on different clock edges.
Example: we want to transmit 45h. First we need the PISO; we can start from the LSB or
the MSB. At the receiver side, if the clock signal has a phase shift of half a clock period, we
are able to sample each bit correctly at its half period. Therefore we fill the SIPO register and,
after 8 clock cycles, it is completely loaded (with 45h) and the Ready signal goes to 1. The
time we have to wait from when the data is loaded at the transmitter side to when it is read
at the receiver side is quite long; it is called latency.
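A minimal C sketch of the PISO/SIPO mechanism for 45h, LSB first, with the shift registers reduced to plain variables:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t tx = 0x45, rx = 0;
    for (int i = 0; i < 8; i++) {                 /* one bit per clock     */
        int bit = (tx >> i) & 1;                  /* PISO: LSB shifted out */
        rx = (uint8_t)((rx >> 1) | (bit << 7));   /* SIPO: shift in at MSB */
    }
    printf("received 0x%02X after 8 cycles\n", rx); /* prints 0x45        */
    return 0;
}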
6.3 Asynchronous and synchronous links
In serial asynchronous links, bits are organized in characters (8 bits of data) and the
transmission is discontinuous (we have to wait to check that we can start again). Bit
synchronization is therefore required on each character. Synchronization is usually achieved
with some special characters.
• PRO: the overhead is very low;
• CONS: we need a perfectly synchronized clock at RX and TX.
In serial synchronous links, bits are organized in packets (frames; the size of the information
may be quite large) and the transmission is continuous. The bit synchronization is continuous,
but additional information for frame synchronization is needed.
6.3.2 Terminology
• Transmitter: device that sends data to the bus;
• Receiver: device that receives data from the bus. TX and RX are roles defined at the
electrical level of the system;
• Master: device initiating a transfer, generates the clock and terminates a transfer (e.g.
processor);
• Slave: device addressed by the master (e.g. memory);
• Multi-master: more than one master can attempt to control the bus;
• Arbitration: procedure to ensure that only one master has control of the bus at any
instant;
• Synchronization: procedure to synchronize the clocks of two or more devices.
6.3.3 Serial asynchronous protocol
1. The resting line has a defined and steady state (usually high);
4. The CKRX clock is generated at the receiver and phase-synchronized with the falling edge
of the Start bit. Bit synchronization is maintained for a limited time;
5. To guarantee the sensing of the Start bit, there is at least one Stop bit between adjacent
characters. The Stop bit indicates that the data is ended and the line is brought back to the
idle state.
In fig. 6.9 we can notice there is an overhead due to the difference between Character and Data.
Example: UART (Universal Asynchronous Receiver/Transmitter)
• TX side:
- Conversion with a PISO register (a PISO is used to convert parallel data into a stream
of bits);
- Start and Stop bit insertion → insertion of the parity bit, if required.
• RX side:
Fig. 6.11 lists the configuration for this example. The main clock at the RX side is 50 MHz
(even though the communication is asynchronous, we need a clock to sample the data).
The RX is a FF which samples the data line according to a clock signal; then we need logic
to detect the Start bit and a CU to correctly fill the SIPO.
In the timing diagram we can notice how we sample the data line with Tck = 20 ns
(corresponding to 50 MHz). With such a high clock frequency we obtain more than one sample
per bit (since the baud rate is 19200, Tbit is close to 52 μs). Therefore we have a lot of samples
corresponding to just one transmitted bit.
Figure 6.12: Eye diagram: the best sampling point is in the middle of the “Eye”
However, due to noise, one single sample, even if taken in the middle of the eye, might not be
the best choice. A possible solution: take 3 samples and use a voter (to decide the correct
value). In this way we trade complexity for reliability.
It is not mandatory to take 3 consecutive samples, since consecutive samples can all be
affected by the same noise.
We choose the resolution and keep only the needed samples. Since there are a lot of samples
available we can consider, for example, one sample every 16 and then perform the voting
operation on the 3 samples in the region of interest. It is a valid solution involving just one
counter, three registers and a voter.
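A minimal C sketch of the 3-sample majority vote on an oversampled bit (the sample values below are hypothetical):

#include <stdio.h>

static int majority3(int a, int b, int c) { return (a & b) | (a & c) | (b & c); }

int main(void) {
    /* hypothetical 16x oversampling of one bit; s[6] corrupted by noise */
    int s[16] = {1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1};
    int bit = majority3(s[7], s[8], s[9]);   /* 3 samples around the centre */
    printf("decoded bit = %d\n", bit);       /* 1 despite the glitch        */
    return 0;
}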
Figure 6.14: E.g. 16 samples (1 sample every 2604/16= 162 cycles)
I2C:
- Introduced by Philips;
- Designed to connect a few chips;
- Short distance;
- 2 wires (low cost).
SPI:
- Introduced by Motorola;
- Same purpose as I2C;
- 3/4 wires (more complex but simpler to use).
I2C
I2C speed
We need a proper way to connect the devices to the bus. In order to avoid shorts, I2C
devices are connected in open drain. Each device is able to drive the line to the low logic
level, while the logic 1 is imposed by the pull-up resistors. The open-drain characteristic is a
limiting factor for the speed of the system.
Figure 6.15: Only two wires: Serial Data Line (SDA) and Serial Clock Line (SCL).
• Rp sizing:
- Static condition → when one device pulls down a line, the current must be no more
than the IOL of the device:
(VDD − VOL)/Rp ≤ IOL → Rp ≥ (VDD − VOL)/IOL = Rp,min
N.B. increasing Rp we increase, in dynamic conditions, the time constant of the
system, making it slower.
- Dynamic condition → maximum rise time:
The static condition gives the lower bound for Rp, while the dynamic condition gives the
upper bound. We have to be sure that the maximum rise time (required to go from VIL to
VIH) is compatible with our needs. Therefore, the question is: which is the maximum Rp?
Let VIL = αVDD and VIH = βVDD, let C be the maximum capacitance allowed (400 pF) and
R the pull-up resistor.
V(t1) = αVDD = VDD(1 − e^(−t1/RC)) → t1 = RC ln(1/(1−α))
V(t2) = βVDD = VDD(1 − e^(−t2/RC)) → t2 = RC ln(1/(1−β))
tr = t2 − t1 = RC ln((1−α)/(1−β)) ≤ tr,MAX
R ≤ tr,MAX / (C ln((1−α)/(1−β))) = Rp,MAX
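A small C sketch of both bounds, with hypothetical (though I2C-like) values for the electrical parameters:

#include <stdio.h>
#include <math.h>

int main(void) {
    /* hypothetical (I2C-like) values */
    double VDD = 3.3, VOL = 0.4, IOL = 3e-3;       /* static side            */
    double a = 0.3, b = 0.7;                       /* VIL/VDD and VIH/VDD    */
    double C = 400e-12, tr_max = 1000e-9;          /* dynamic side           */
    double Rp_min = (VDD - VOL) / IOL;             /* static lower bound     */
    double Rp_max = tr_max / (C * log((1 - a) / (1 - b))); /* dynamic upper  */
    printf("Rp_min = %.0f ohm, Rp_max = %.0f ohm\n", Rp_min, Rp_max);
    return 0;
}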
The maximum capacitance (CMAX) limits the number of devices which can be connected.
What if C ≥ CMAX? The standard suggests 4 possibilities:
• Reduced frequency;
• Bus buffers;
I2C transitions
• IDLE state: Serial Data Line (SDA) and Serial Clock Line (SCL) both at 1.
In this way we can sample when SCL is 1 and get a proper value, since SDA is kept stable.
Even though I2C is a synchronous standard, Philips decided to use START and STOP
conditions to have better control of what happens on the bus. Since our system is made of
open-drain devices, so the resting line is at logic 1, we need a way to indicate the start of the
data transmission.
Figure 6.17: I2C transitions - 2
Both start and stop conditions are generated by the bus master.
The bus is considered busy after a start condition, until a stop condition occurs.
Transfers are byte-oriented:
- each transferred byte is followed by an acknowledge (the acknowledge is given by the slave
receiving the command);
- SDA is pulled to 0 if the transfer is OK;
- SDA is left at 1 if the transfer is KO.
Figure 6.18: Acknowledge (N.B. the Master does not wait an infinite amount of time for the
acknowledge)
I2C Addressing and Data transfer
The bus master starts a transaction by issuing a start condition. Then the master addresses
a slave.
Addressing: 7-bit address + 1 bit for R/W (0 → write, 1 → read). The slave with the
corresponding address answers with an ACK on SDA.
Data transfer:
• Start Condition (Master)
• Slave address + R/W (Master)
• Acknowledges with ACK (Slave)
I2C multimaster
If the system has more than one master connected to the bus, it is important to decide which
one takes control of the bus. This is based on a two-step procedure:
1. Clock Synchronization:
The synchronization requires a common clock, so SCL is used. One master drives SCL
low and starts counting its low period; another one detects SCL low, starts driving SCL
low too and counts its own low period. When a master ends its low period it releases
SCL and checks its value. Two options are possible: SCL is high, so it starts counting
its high period; or SCL is still low, and it waits for SCL to become high before starting
to count its high period. The first master finishing its high period pulls SCL low again.
The low period of the synchronized clock is thus defined by the slowest master and the
high period by the fastest one.
Figure 6.22: Clock Synchronization: CLK1 is the first master, CLK2 is the second master
In the case of equally fast masters, more than one master can get access concurrently, so
we need an arbitration mechanism.
2. Arbitration
Two or more masters can start a transaction concurrently. They drive SDA low while SCL
is high. Arbitration decides which one completes the transaction.
The procedure is bit by bit and each master reads SDA when SCL is high: if SDA matches
what was sent, then the bit is OK; if SDA does not match what was sent, then the arbitration
is lost and the SDA driver is turned off. Two or more masters with identical transmissions
complete their transaction.
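A one-bit C sketch of the arbitration rule (the open-drain bus behaves as a wired-AND):

#include <stdio.h>

int main(void) {
    int m1 = 1, m2 = 0;            /* bits the two masters try to send   */
    int sda = m1 & m2;             /* open-drain bus acts as a wired-AND */
    printf("master1 %s\n", (m1 != sda) ? "loses arbitration" : "continues");
    printf("master2 %s\n", (m2 != sda) ? "loses arbitration" : "continues");
    return 0;
}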
Figure 6.23: Arbitration: the master understands whether it is winning or losing access to the
bus not just from the ACK of the slave but also by sampling SDA.
• Clock stretching: any slave can hold SCL low to slow down the transfer, so as to sample the
line correctly (the master understands the slave is not on time and waits before continuing
the transmission). In this way we can slow down the transmission without aborting it (thus
still providing an ACK);
SPI
• Up to 10 MHz.
• No standard.
- Clock (SCLK);
- Master Out Slave In (MOSI);
- Master In Slave Out (MISO);
- Slave Select (SS) / Chip Select (CS).
SPI connections: single slave
The Master drives SCLK, SS and MOSI; the Slave drives MISO.
The Master selects one Slave via its SSx line. Unselected slaves must put MISO in high
impedance.
2. Daisy Chain:
The Master has MOSI connected to the first Slave and MISO connected to the last one.
The Slaves in the middle are connected as a chain.
PRO: just one SS, we don't need high impedance and all MISO signals are standard (we
don't have electrical problems). CONS: lower speed, less simple protocol (we select all the
slaves and one of them understands the call).
• Clock polarity (CPOL): defines the idle level of the clock;
• Clock phase (CPHA): defines the edges for sampling and outputting.
The following two figures show an example of a write and a read operation to and from an
SPI memory.
– Command (instruction);
– Data (2 bytes).
Figure 6.28: SPI write