
COA

UNIT 2
SIGNED BINARY INTEGERS

Signed integers are numbers with a "+" or "-" sign. If n bits are used to represent a signed
binary integer, then 1 bit represents the sign of the number and the remaining (n - 1) bits
represent its magnitude.
A real-life example is a list of temperatures (correct to the nearest degree) in various cities of
the world. Obviously they are signed integers like +34, -15, -23, and +17. These numbers,
along with their signs, have to be represented in a computer using only binary notation, or bits.
There are various ways of representing signed numbers in a computer −
 Sign and magnitude
 One's complement
 Two's complement
The simplest way of representing a signed number is the sign-magnitude (SM) method.
Sign and magnitude − The sign-magnitude binary format is the simplest conceptual format.
In this method of representing signed numbers, the most significant digit (MSD) takes on
extra meaning.
 If the MSD is a 0, we evaluate the number just as we would any normal unsigned
integer, and we treat the number as positive.
 If the MSD is a 1, the number is negative.
The other bits indicate the magnitude (absolute value) of the number. Some signed decimal
numbers and their equivalents in SM notation follow, assuming a word size of 4 bits.

Signed decimal    Sign-magnitude
+6                0110
-6                1110
+0                0000
-0                1000
+7                0111
-7                1111

Range
From the above table, it is obvious that if the word size is n bits, the range of numbers that
can be represented is from -(2^(n-1) - 1) to +(2^(n-1) - 1). A table of word sizes and the
corresponding ranges of SM numbers follows.

Word size    Range for SM numbers
4            -7 to +7
8            -127 to +127
16           -32767 to +32767
32           -2147483647 to +2147483647

Notice that the bit sequence 1101 corresponds to the unsigned number 13, as well as the
number –5 in SM notation. Its value depends only on the way the user or the programmer
interprets the bit sequence.
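The encoding and decoding described above can be sketched in a few lines of Python (the function names are illustrative, not from any particular library):

```python
def to_sign_magnitude(value, n=4):
    """Encode a signed integer as an n-bit sign-magnitude bit string."""
    limit = 2 ** (n - 1) - 1
    if abs(value) > limit:
        raise OverflowError(f"{value} outside range -{limit}..+{limit}")
    sign = '1' if value < 0 else '0'
    return sign + format(abs(value), f'0{n - 1}b')

def from_sign_magnitude(bits):
    """Decode a sign-magnitude bit string back to a signed integer."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == '1' else magnitude

print(to_sign_magnitude(+6))        # 0110
print(to_sign_magnitude(-6))        # 1110
print(from_sign_magnitude('1000'))  # -0 decodes as plain 0
```

Note that, as in the table, '1000' is the pattern for -0; Python integers have no negative zero, so it decodes to 0.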
One's complement − This is one of the methods of representing signed integers in the
computer. In this method, the most significant digit (MSD) takes on extra meaning.
 If the MSD is a 0, we can evaluate the number just as we would interpret any normal
unsigned integer.
 If the MSD is a 1, this indicates that the number is negative.
The other bits indicate the magnitude (absolute value) of the number.
If the number is negative, then the other bits signify the 1's complement of the magnitude of
the number.
Some signed decimal numbers and their equivalent in 1's complement notations are shown
below, assuming a word size of 4 bits.

Signed decimal    1's complement
+6                0110
-6                1001
+0                0000
-0                1111
+7                0111
-7                1000

Range
From the above table, it is obvious that if the word size is n bits, the range of numbers that
can be represented is from -(2^(n-1) - 1) to +(2^(n-1) - 1). A table of word sizes and the
corresponding ranges of 1's complement numbers follows.

Word size    Range for 1's complement numbers
4            -7 to +7
8            -127 to +127
16           -32767 to +32767
32           -2147483647 to +2147483647, i.e., about ±2 × 10^9
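A similar sketch for 1's complement, again with illustrative names; a negative value is stored by complementing every bit of its magnitude:

```python
def to_ones_complement(value, n=4):
    """Encode a signed integer as an n-bit 1's complement bit string."""
    if value >= 0:
        return format(value, f'0{n}b')
    # Negative: complement every bit of the magnitude.
    bits = format(-value, f'0{n}b')
    return ''.join('1' if b == '0' else '0' for b in bits)

def from_ones_complement(bits):
    """Decode a 1's complement bit string back to a signed integer."""
    if bits[0] == '0':
        return int(bits, 2)
    flipped = ''.join('1' if b == '0' else '0' for b in bits)
    return -int(flipped, 2)

print(to_ones_complement(-6))        # 1001, as in the table
print(from_ones_complement('1000'))  # -7
```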

FIXED POINT AND FLOATING POINT NUMBER REPRESENTATIONS

Digital computers use the binary number system to represent all types of information
internally. Alphanumeric characters are represented using binary bits (i.e., 0 and 1). Digital
representations are easier to design, storage is easy, and accuracy and precision are greater.
There are various number systems for digital representation, for example the binary, octal,
decimal, and hexadecimal number systems, but the binary number system is the most relevant
and popular for representing numbers in a digital computer system.

Storing Real Numbers


There are two major approaches to store real numbers (i.e., numbers with fractional
component) in modern computing. These are (i) Fixed Point Notation and (ii) Floating Point
Notation. In fixed point notation, there are a fixed number of digits after the decimal point,
whereas floating point number allows for a varying number of digits after the decimal point.
Fixed-Point Representation −
This representation has a fixed number of bits for the integer part and for the fractional part.
For example, if the given fixed-point representation is IIII.FFFF, then the minimum value you
can store is 0000.0001 and the maximum value is 9999.9999. There are three parts of a
fixed-point number representation: the sign field, the integer field, and the fractional field.

We can represent these numbers using:

 Signed representation: range from -(2^(k-1) - 1) to +(2^(k-1) - 1), for k bits.
 1's complement representation: range from -(2^(k-1) - 1) to +(2^(k-1) - 1), for k bits.
 2's complement representation: range from -2^(k-1) to +(2^(k-1) - 1), for k bits.
2's complement representation is preferred in computer systems because of its unambiguous
property (a single representation of zero) and easier arithmetic operations.
Example − Assume a number uses a 32-bit format which reserves 1 bit for the sign, 15 bits for
the integer part and 16 bits for the fractional part.
Then, -43.625 is represented as follows:

1 | 000000000101011 | 1010000000000000

where 1 is used to represent the - sign (0 would represent +), 000000000101011 is the 15-bit
binary value of the decimal integer part 43, and 1010000000000000 is the 16-bit binary value
of the fractional part 0.625.
The advantage of a fixed-point representation is performance; the disadvantage is the
relatively limited range of values it can represent. So it is usually inadequate for numerical
analysis, as it does not allow enough numbers and accuracy. A number whose representation
exceeds 32 bits would have to be stored inexactly.

With the format given above, the smallest positive number that can be stored is 2^-16 ≈
0.000015, and the largest positive number is (2^15 - 1) + (1 - 2^-16) = 2^15 - 2^-16 ≈ 32768.
The gap between any two consecutive representable numbers is 2^-16, and this gap is uniform
across the whole range. The radix point itself is fixed by the format; it cannot be moved left or
right without choosing a different split between the integer and fractional fields.
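The 1 + 15 + 16 bit split described above can be sketched as follows; `to_fixed` is a hypothetical helper (not a standard function) that returns the three fields as bit strings:

```python
def to_fixed(value, int_bits=15, frac_bits=16):
    """Encode value as (sign, integer field, fraction field) bit strings
    in a sign + int_bits + frac_bits fixed-point format."""
    sign = '1' if value < 0 else '0'
    # Scale by 2^frac_bits so the whole value becomes one integer.
    scaled = round(abs(value) * 2 ** frac_bits)
    integer, fraction = divmod(scaled, 2 ** frac_bits)
    return sign, format(integer, f'0{int_bits}b'), format(fraction, f'0{frac_bits}b')

sign, ip, fp = to_fixed(-43.625)
print(sign, ip, fp)   # 1 000000000101011 1010000000000000
```

This reproduces the -43.625 example worked out above.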
FLOATING-POINT REPRESENTATION −
This representation does not reserve a specific number of bits for the integer part or the
fractional part. Instead it reserves a certain number of bits for the number (called the
mantissa or significand) and a certain number of bits to say where within that number the
decimal place sits (called the exponent).
The floating-point representation of a number has two parts: the first part is a signed
fixed-point number called the mantissa; the second part designates the position of the
decimal (or binary) point and is called the exponent. The fixed-point mantissa may be a
fraction or an integer. A floating-point number is always interpreted to represent a number of
the form M × r^e.
Only the mantissa M and the exponent e are physically represented in the register (including
their signs). A floating-point binary number is represented in a similar manner except that it
uses base 2 for the exponent. A floating-point number is said to be normalized if the most
significant digit of the mantissa is 1.
So, the actual number is (-1)^s × (1 + m) × 2^(e - Bias), where s is the sign bit, m is the
mantissa (fraction), e is the exponent value, and Bias is the bias number.
Note that signed integers and exponent are represented by either sign representation, or one’s
complement representation, or two’s complement representation.
The floating-point representation is more flexible. Any non-zero number can be represented
in the normalized form ±(1.b1b2b3...)₂ × 2^n. This is the normalized form of a number x.
Example − Suppose a number uses a 32-bit format: 1 sign bit, 8 bits for a signed exponent,
and 23 bits for the fractional part. The leading bit 1 is not stored (as it is always 1 for a
normalized number) and is referred to as the "hidden bit".
Then -53.5 is normalized as -53.5 = (-110101.1)₂ = (-1.101011)₂ × 2^5, which is represented
as follows:

1 | 00000101 | 10101100000000000000000

where 00000101 is the 8-bit binary value of the exponent value +5.


Note that the 8-bit exponent field is used to store integer exponents -126 ≤ n ≤ 127.
The smallest normalized positive number that fits into 32 bits is
(1.00000000000000000000000)₂ × 2^-126 = 2^-126 ≈ 1.18 × 10^-38, and the largest
normalized positive number that fits into 32 bits is
(1.11111111111111111111111)₂ × 2^127 = (2^24 - 1) × 2^104 ≈ 3.40 × 10^38.

The precision of a floating-point format is the number of positions reserved for binary digits
plus one (for the hidden bit). In the examples considered here the precision is 23 + 1 = 24.
The gap between 1 and the next normalized floating-point number is known as the machine
epsilon; the gap is (1 + 2^-23) - 1 = 2^-23 for the above example. Note that this is not the
same as the smallest positive floating-point number: floating-point values are non-uniformly
spaced, unlike in the fixed-point scenario.
Note that non-terminating binary numbers cannot be represented exactly in floating-point
representation; e.g., 1/3 = (0.010101...)₂ cannot be a floating-point number, as its binary
representation is non-terminating.
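To see these fields on a real machine, a float can be packed into IEEE single precision and unpacked as raw bits; this sketch uses Python's standard `struct` module:

```python
import struct

def float_fields(x):
    """Split a value into IEEE 754 single-precision (sign, biased exponent,
    fraction) fields by reinterpreting its 32 bits as an unsigned integer."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # biased exponent (bias = 127)
    fraction = bits & 0x7FFFFF       # 23 stored mantissa bits, hidden bit omitted
    return sign, exponent, fraction

s, e, f = float_fields(-53.5)
print(s, e - 127, format(f, '023b'))   # 1 5 10101100000000000000000
```

This confirms the -53.5 example: sign 1, true exponent +5 (stored as 5 + 127 = 132), and fraction bits 101011 followed by zeros.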
IEEE FLOATING POINT NUMBER REPRESENTATION −
IEEE (Institute of Electrical and Electronics Engineers) has standardized Floating-Point
Representation as following diagram.

So, the actual number is (-1)^s × (1 + m) × 2^(e - Bias), where s is the sign bit, m is the
mantissa (fraction), e is the stored exponent value, and Bias is the bias number. The sign bit
is 0 for a positive number and 1 for a negative number. Exponents are stored in biased
(excess) representation: the stored exponent is the true exponent plus the bias.
According to IEEE 754 standard, the floating-point number is represented in following ways:

 Half Precision (16 bit): 1 sign bit, 5 bit exponent, and 10 bit mantissa
 Single Precision (32 bit): 1 sign bit, 8 bit exponent, and 23 bit mantissa
 Double Precision (64 bit): 1 sign bit, 11 bit exponent, and 52 bit mantissa
 Quadruple Precision (128 bit): 1 sign bit, 15 bit exponent, and 112 bit mantissa
Special Value Representation −
There are some special values depended upon different values of the exponent and mantissa
in the IEEE 754 standard.

 All the exponent bits 0 with all mantissa bits 0 represents 0. If sign bit is 0, then +0,
else -0.
 All the exponent bits 1 with all mantissa bits 0 represents infinity. If sign bit is 0, then
+∞, else -∞.
 All the exponent bits 0 with non-zero mantissa bits represents a denormalized number.
 All the exponent bits 1 with non-zero mantissa bits represents NaN (Not a Number).

FLOATING POINT ADDITION AND SUBTRACTION

FLOATING POINT ADDITION


To understand floating-point addition, first we look at the addition of real numbers in
decimal, since the same logic applies in both cases.
For example, we have to add 1.1 × 10^3 and 50.
We cannot add these numbers directly. First, we need to align the exponents; then we can
add the significands.
After aligning the exponents, we get 50 = 0.05 × 10^3.
Now adding the significands: 0.05 + 1.1 = 1.15.
So, finally we get (1.1 × 10^3 + 50) = 1.15 × 10^3.
Here, notice that we shifted 50 and made it 0.05 × 10^3 to add these numbers.
Now let us take an example of floating-point number addition.
We follow these steps to add two numbers:
1. Align the significands
2. Add the significands
3. Normalize the result
Let the two numbers be
x = 9.75
y = 0.5625
Converting them into 32-bit floating point representation,
9.75’s representation in 32-bit format = 0 10000010 00111000000000000000000
0.5625’s representation in 32-bit format = 0 01111110 00100000000000000000000
Now we get the difference of exponents to know how much shifting is required.
(10000010 – 01111110)2 = (4)10
Now, we shift the mantissa of lesser number right side by 4 units.
Mantissa of 0.5625 = 1.00100000000000000000000
(note that 1 before decimal point is understood in 32-bit representation)
Shifting right by 4 units, we get 0.00010010000000000000000
Mantissa of 9.75 = 1. 00111000000000000000000
Adding mantissa of both
0. 00010010000000000000000
+ 1. 00111000000000000000000
————————————————-
1. 01001010000000000000000
In the final answer, we take the exponent of the bigger number.
So, the final answer consists of:
Sign bit = 0
Exponent of bigger number = 10000010
Mantissa = 01001010000000000000000
32-bit representation of the answer: x + y = 0 10000010 01001010000000000000000
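The three steps (align, add, normalize) can be sketched with integer significands; `decompose` and `fp_add` are illustrative helpers that assume positive inputs:

```python
def decompose(v, frac_bits=23):
    """Return (exponent, integer significand with hidden bit) for v > 0."""
    e = 0
    while v >= 2:
        v /= 2
        e += 1
    while v < 1:
        v *= 2
        e -= 1
    return e, round(v * 2 ** frac_bits)   # 1.xxx... scaled to an integer

def fp_add(a, b, frac_bits=23):
    (ea, ma), (eb, mb) = decompose(a), decompose(b)
    if ea < eb:                            # make a the number with the bigger exponent
        (ea, ma), (eb, mb) = (eb, mb), (ea, ma)
    mb >>= (ea - eb)                       # step 1: align the smaller significand
    m = ma + mb                            # step 2: add significands
    e = ea                                 # take the exponent of the bigger number
    while m >= 2 ** (frac_bits + 1):       # step 3: renormalize on overflow
        m >>= 1
        e += 1
    return m / 2 ** frac_bits * 2 ** e

print(fp_add(9.75, 0.5625))   # 10.3125
```

The exponent difference for 9.75 and 0.5625 is 4, so the smaller significand is shifted right 4 places, exactly as in the bit-level trace above.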

FLOATING POINT SUBTRACTION

Subtraction is similar to addition, with some differences: when the signs differ we subtract
the mantissas instead of adding them, and the sign bit takes the sign of the number with the
greater magnitude.
Let the two numbers be
x = 9.75
y = – 0.5625
Converting them into 32-bit floating point representation
9.75’s representation in 32-bit format = 0 10000010 00111000000000000000000
– 0.5625’s representation in 32-bit format = 1 01111110 00100000000000000000000
Now, we find the difference of exponents to know how much shifting is required.
(10000010 – 01111110)2 = (4)10
Now, we shift the mantissa of lesser number right side by 4 units.
Mantissa of – 0.5625 = 1.00100000000000000000000
(note that 1 before decimal point is understood in 32-bit representation)
Shifting right by 4 units, 0.00010010000000000000000
Mantissa of 9.75= 1. 00111000000000000000000
Subtracting the smaller mantissa from the bigger one:
  1. 00111000000000000000000
- 0. 00010010000000000000000
————————————————
  1. 00100110000000000000000
Sign bit of bigger number = 0
So, finally the answer = x – y = 0 10000010 00100110000000000000000

PERFORMANCE OF ADDITION AND SUBTRACTION WITH SIGNED MAGNITUDE

A signed-magnitude method is typically used by computers to implement floating-point
operations, while the signed-2's complement method is used by most computers for arithmetic
operations executed on integers. In the signed-magnitude approach, the leftmost bit of the
number signifies the sign: 0 indicates a positive integer, and 1 indicates a negative integer.
The remaining bits hold the magnitude of the number.
Example: -24 (decimal) is represented as −
10011000
In this example, the leftmost bit 1 indicates negative, and the magnitude is 24.
The magnitude is the same for both positive and negative values; only the sign bit changes.
For an 8-bit word, the range of values in sign-magnitude representation is from -127 to +127.
There are eight conditions to consider while adding or subtracting signed numbers. These
conditions are based on the operations implemented and the sign of the numbers.
The table displays the algorithm for addition and subtraction. The first column in the table
displays these conditions. The other columns of the table define the actual operations to be
implemented with the magnitude of numbers. The last column of the table is needed to avoid
a negative zero. This defines that when two same numbers are subtracted, the output must
not be - 0. It should consistently be +0.
In the table, the magnitude of the two numbers is defined by P and Q.
Addition and Subtraction of Signed Magnitude Numbers

Operation      Add Magnitudes    Subtract Magnitudes
                                 P > Q       P < Q       P = Q
(+P) + (+Q)    +(P + Q)
(+P) + (-Q)                      +(P - Q)    -(Q - P)    +(P - Q)
(-P) + (+Q)                      -(P - Q)    +(Q - P)    +(P - Q)
(-P) + (-Q)    -(P + Q)
(+P) - (+Q)                      +(P - Q)    -(Q - P)    +(P - Q)
(+P) - (-Q)    +(P + Q)
(-P) - (+Q)    -(P + Q)
(-P) - (-Q)                      -(P - Q)    +(Q - P)    +(P - Q)


As displayed in the table, the addition algorithm states that −
 When the signs of P and Q are the same, add the two magnitudes and attach the sign of
P to the output.
 When the signs of P and Q are different, compare the magnitudes and subtract the
smaller number from the greater one.
 The sign of the output equals the sign of P when P > Q, or the complement of the sign
of P when P < Q.
 When the two magnitudes are equal, subtract Q from P and make the sign of the
output positive.
The subtraction algorithm states that −
 When the signs of P and Q are different, add the two magnitudes and attach the sign
of P to the output.
 When the signs of P and Q are the same, compare the magnitudes and subtract the
smaller number from the greater one.
 The sign of the output equals the sign of P when P > Q, or the complement of the sign
of P when P < Q.
 When the two magnitudes are equal, subtract Q from P and make the sign of the
output positive.
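The table and rules above can be sketched directly in Python; signs are modeled as '+'/'-' characters and magnitudes as non-negative integers (illustrative names, not from any library):

```python
def sm_add(p_sign, p_mag, q_sign, q_mag):
    """Add two sign-magnitude numbers following the rules in the table."""
    if p_sign == q_sign:
        return p_sign, p_mag + q_mag     # like signs: add magnitudes, keep sign of P
    if p_mag > q_mag:
        return p_sign, p_mag - q_mag     # unlike signs: subtract smaller magnitude
    if p_mag < q_mag:
        return q_sign, q_mag - p_mag
    return '+', 0                        # equal magnitudes: force +0, never -0

def sm_sub(p_sign, p_mag, q_sign, q_mag):
    # Subtraction is addition with the sign of Q inverted.
    return sm_add(p_sign, p_mag, '-' if q_sign == '+' else '+', q_mag)

print(sm_add('+', 7, '-', 7))   # ('+', 0), the -0 case is avoided
print(sm_sub('-', 5, '-', 9))   # ('+', 4), i.e. (-5) - (-9) = +4
```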

RIPPLE CARRY ADDER

Ripple Carry Adder is a combinational logic circuit. It is used for the purpose of
adding two n-bit binary numbers. It requires n full adders in its circuit for adding two n-bit
binary numbers. It is also known as n-bit parallel adder.

Ripple Carry Adder-

In Ripple Carry Adder,


 Each full adder has to wait for its carry-in from its previous stage full adder.
 Thus, nth full adder has to wait until all (n-1) full adders have completed their
operations.
 This causes a delay and makes the ripple carry adder extremely slow.
 The situation becomes worse as the value of n grows very large.
 To overcome this disadvantage, the Carry Look Ahead Adder comes into play.
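A bit-level sketch of the ripple carry adder, with each stage waiting on the previous stage's carry, might look like this (LSB-first bit lists; the names are illustrative):

```python
def full_adder(a, b, cin):
    """One full adder: sum and carry-out from two bits plus carry-in."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first); the carry ripples
    from each stage to the next, which is the source of the delay."""
    result = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # stage i waits on stage i-1's carry
        result.append(s)
    return result, carry

# 0110 (6) + 0011 (3), written LSB first:
print(ripple_carry_add([0, 1, 1, 0], [1, 1, 0, 0]))   # ([1, 0, 0, 1], 0) = 1001 = 9
```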

CARRY LOOK AHEAD ADDER-

 Carry Look Ahead Adder is an improved version of the ripple carry adder.
 It generates the carry-in of each full adder simultaneously without causing any delay.
 The time complexity of a carry look ahead adder is Θ(log n).

Logic Diagram-

The logic diagram for carry look ahead adder is as shown below-

Carry Look Ahead Adder Working-

The working of carry look ahead adder is based on the principle-


The carry-in of any stage full adder is independent of the carry bits generated
during intermediate stages.

The carry-in of any stage full adder depends only on the following two parameters-
 Bits being added in the previous stages
 Carry-in provided in the beginning

Now,
 The above two parameters are always known from the beginning.
 So, the carry-in of any stage full adder can be evaluated at any instant of time.
 Thus, any full adder need not wait until its carry-in is generated by its previous stage
full adder.

4-Bit Carry Look Ahead Adder-

Consider two 4-bit binary numbers A3A2A1A0 and B3B2B1B0 that are to be added.
Mathematically, the two numbers will be added as shown below-

From here, we have-


C1 = C0 (A0 ⊕ B0) + A0B0
C2 = C1 (A1 ⊕ B1) + A1B1
C3 = C2 (A2 ⊕ B2) + A2B2
C4 = C3 (A3 ⊕ B3) + A3B3

For simplicity, Let-


 Gi = AiBi where G is called carry generator
 Pi = Ai ⊕ Bi where P is called carry propagator

Then, re-writing the above equations, we have-


C1 = C0P0 + G0 ………….. (1)
C2 = C1P1 + G1 ………….. (2)
C3 = C2P2 + G2 ………….. (3)
C4 = C3P3 + G3 ………….. (4)

Now,
 Clearly, C1, C2 and C3 are intermediate carry bits.
 So, let’s remove C1, C2 and C3 from RHS of every equation.
 Substituting (1) in (2), we get C2 in terms of C0.
 Then, substituting (2) in (3), we get C3 in terms of C0 and so on.

Finally, we have the following equations-


 C1 = C0P0 + G0
 C2 = C0P0P1 + G0P1 + G1
 C3 = C0P0P1P2 + G0P1P2 + G1P2 + G2
 C4 =C0P0P1P2P3 + G0P1P2P3 + G1P2P3 + G2P3 + G3

These equations are important to remember.

These equations show that the carry-in of any stage full adder depends only on-
 Bits being added in the previous stages
 Carry bit which was provided in the beginning
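The generate/propagate equations above can be checked in software; the loop below computes the same carry values that the expanded sum-of-products equations produce in parallel in hardware (an illustrative sketch, LSB-first bit lists):

```python
def cla_add(a_bits, b_bits, c0=0):
    """Compute sums and carries from generate/propagate terms (LSB first).
    In hardware the expanded equations C1..C4 evaluate in parallel; this
    software recurrence C(i+1) = Gi + Pi*Ci yields identical values."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # Gi = Ai AND Bi (carry generate)
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # Pi = Ai XOR Bi (carry propagate)
    carries = [c0]
    for i in range(len(a_bits)):
        carries.append(g[i] | (p[i] & carries[i]))
    sums = [p[i] ^ carries[i] for i in range(len(a_bits))]
    return sums, carries[-1]

# 0110 (6) + 0011 (3), LSB first:
print(cla_add([0, 1, 1, 0], [1, 1, 0, 0]))   # ([1, 0, 0, 1], 0) = 1001 = 9
```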

FIXED POINT ARITHMETIC : MULTIPLICATION

Multiplication and division are two other arithmetic operations frequently required
even in simple mathematics. CPUs have a set of instructions for integer MULTIPLY and
DIVIDE operations. Internally these instructions are implemented as suitable algorithms in
hardware. Not only integer arithmetic but also floating-point and decimal instruction sets are
likely to have MULTIPLY and DIVIDE instructions in sophisticated CPUs. Hardware
implementation increases the efficiency of the CPU.

Multiplication
Multiplicand M = 12        1100
Multiplier   Q = 11      x 1011
                        --------
                           1100
                          1100
                         0000
                        1100
                        --------
Product      P = 132   10000100

As you can see, we start with the LSB of the multiplier Q, multiply the multiplicand by it, and
jot down the partial product. Then we use the next higher digit of Q to multiply the
multiplicand, this time shifting the partial product one position to the left to match the digit
position in Q. This is repeated until all the digits of Q are used up, and then we sum the
partial products. By multiplying 12 x 11, we get 132. Note that we used binary values, and
the product is also in binary. Binary multiplication is much simpler than decimal
multiplication: since each multiplier digit is either 0 or 1, each step is equivalent to adding
the multiplicand in a properly shifted position, or adding 0's. The whole operation thus
reduces to a sequence of shifts and additions of the multiplicand.
It is to be observed that when we multiplied two 4-bit binary numbers, the product obtained is
8-bits. Hence the product register (P) is double the size of the M and Q register. The sign of
the product is determined from the signs of multiplicand and multiplier. If they are alike, the
sign of the product is positive. If they are unlike, the sign of the product is negative.

Unsigned Multiplication
When multiplication is implemented in a digital computer, it is convenient to change
the process slightly. It is required to provide an adder for the summation of only two binary
numbers and successively accumulate the partial products in a register. The widths of the
registers, shift counter, and ALU are decided by the word size of the CPU. For simplicity of
understanding, we take a 4-bit word length, i.e., the multiplier (Q) and multiplicand (M) are
both 4 bits; the logic extrapolates to any word size requirement.
We need registers to store the multiplicand (M) and multiplier (Q), each 4 bits. However, if
we use the standard minimum register size of 8 bits, the register to collect the product (P)
must be 16 bits. The shift counter keeps track of the number of additions to be done, which
equals the number of bits in Q. The shifting of register contents is handled by shift-register
logic. The ALU performs the additions, so the partial products and the final product are
formed there and stored in the P register. The control unit sequences the micro-steps. The
product register holds the partial results; the final result is available in P when the shift
counter reaches its threshold value.
Figure Data path for typical Multiplication

The flowchart for the unsigned multiplication is shown in figure and table 9.1
explains the work out with an example of 12 x 11 values. The flowchart is self-explanatory of
the unsigned multiplication algorithm. In an unsigned multiplication, the carry bit is used as
an extension of the P register. Since the Q value is a 4-bit number, the algorithm stops when
the shift counter reaches the value of 4. At this point, P holds the result of the multiplication.
Figure : Flowchart for Unsigned Multiplication algorithm

Table: Workout for unsigned multiplication (12 x 11 = 132)

Operation                                   SC   Multiplicand M   Multiplier Q   Product P
Initial values for 12 x 11                  0    1100             1011           0000 0000
Q0 = 1, so left half of P <- left P + M     0    1100             1011           1100 0000
Shift right P, shift right Q                0    1100             0101           0110 0000
SC <- SC + 1                                1    1100             0101           0110 0000
Q0 = 1, so left half of P <- left P + M     1    1100             0101           10010 0000 (carry kept)
Shift right P, shift right Q                1    1100             0010           1001 0000
SC <- SC + 1                                2    1100             0010           1001 0000
Q0 = 0, do nothing                          2    1100             0010           1001 0000
Shift right P, shift right Q                2    1100             0001           0100 1000
SC <- SC + 1                                3    1100             0001           0100 1000
Q0 = 1, so left half of P <- left P + M     3    1100             0001           10000 1000 (carry kept)
Shift right P, shift right Q                3    1100             0000           1000 0100
SC <- SC + 1                                4    1100             0000           1000 0100
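The register-level steps in the table reduce to a short loop; this sketch uses Python's unbounded integers, so the carry out of the left half of P is kept automatically:

```python
def unsigned_multiply(m, q, n=4):
    """Shift-and-add multiplication: P accumulates the product in what
    would be a 2n-bit register; M is added into the upper half of P."""
    p = 0
    for _ in range(n):        # the shift counter runs n times
        if q & 1:             # Q0 = 1: add M into the left (upper) half of P
            p += m << n
        q >>= 1               # shift Q right
        p >>= 1               # shift P right (any carry is carried along,
                              # since Python integers are unbounded)
    return p

print(unsigned_multiply(12, 11))   # 132
```

Tracing 12 x 11 through this loop reproduces the P and Q columns of the table step by step.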

Signed Multiplication
Signed numbers are better handled in 2's complement format. Further, the earlier
unsigned algorithm takes n steps for an n-digit number; even implemented in hardware, one
step per digit is costly in terms of execution time. Booth's algorithm addresses both signed
multiplication and efficiency of operation.

BOOTH'S ALGORITHM

Booth's algorithm gives a procedure for multiplying binary integers in signed 2's
complement representation efficiently, i.e., with fewer additions/subtractions. It exploits
the fact that a string of 0's in the multiplier requires no addition, only shifting, and a string
of 1's from bit weight 2^k down to weight 2^m can be treated as 2^(k+1) - 2^m. As in all
multiplication schemes, Booth's algorithm requires examination of the multiplier bits and
shifting of the partial product. Prior to the shifting, the multiplicand may be added to the
partial product, subtracted from the partial product, or left unchanged according to the
following rules:
1. The multiplicand is subtracted from the partial product upon encountering the
first least significant 1 in a string of 1’s in the multiplier
2. The multiplicand is added to the partial product upon encountering the first 0
(provided that there was a previous ‘1’) in a string of 0’s in the multiplier.
3. The partial product does not change when the multiplier bit is identical to the
previous multiplier bit.
Hardware Implementation of Booth's Algorithm – The hardware implementation of
Booth's algorithm requires the register configuration shown in the figure below.
Booth's Algorithm Flowchart –
We name the registers A, B and Q as AC, BR and QR respectively. Qn designates the least
significant bit of the multiplier in register QR. An extra flip-flop Qn+1 is appended to QR to
facilitate a double-bit inspection of the multiplier. The flowchart for Booth's algorithm is
shown below.

Flow chart of Booth’s Algorithm.


AC and the appended bit Qn+1 are initially cleared to 0, and the sequence counter SC is set
to a number n equal to the number of bits in the multiplier. The two multiplier bits in Qn
and Qn+1 are inspected. If the two bits equal 10, it means that the first 1 in a string of 1's
has been encountered; this requires subtraction of the multiplicand from the partial product
in AC. If the two bits equal 01, it means that the first 0 in a string of 0's has been
encountered; this requires the addition of the multiplicand to the partial product in AC.
When the two bits are equal, the partial product does not change. An overflow cannot occur,
because the additions and subtractions of the multiplicand alternate; as a consequence, the
two numbers being added always have opposite signs, a condition that excludes overflow.
The next step is to shift right the partial product and the multiplier (including Qn+1). This is
an arithmetic shift right (ashr) operation, which shifts AC and QR to the right and leaves the
sign bit in AC unchanged. The sequence counter is decremented and the computational loop
is repeated n times. The product of negative numbers is important: while multiplying
negative numbers we need the 2's complement of the number to change its sign, because it
is easier to add than to perform binary subtraction. The product of two negative numbers is
demonstrated below along with the 2's complements.
Example – A numerical example of Booth's algorithm is shown below for n = 4. It shows
the step-by-step multiplication of -5 and -7.
BR = -5 = 1011
BR' = 0100 (1's complement: change 0 to 1 and 1 to 0)
BR' + 1 = 0101 (2's complement: add 1 to the value obtained after the 1's complement)
QR = -7 = 1001 (2's complement of 0111, since 7 = 0111 in binary)
The explanation of the first step is as follows:
AC = 0000, QR = 1001, Qn+1 = 0, SC = 4
Qn Qn+1 = 10
So, we do AC = AC + (BR)' + 1, which gives AC = 0101
On arithmetic right shifting of AC and QR, we get
AC = 0010, QR = 1100 and Qn+1 = 1

OPERATION        AC     QR     Qn+1   SC
Initial          0000   1001   0      4
AC + BR' + 1     0101   1001   0
ASHR             0010   1100   1      3
AC + BR          1101   1100   1
ASHR             1110   1110   0      2
ASHR             1111   0111   0      1
AC + BR' + 1     0100   0111   0
ASHR             0010   0011   1      0

Product is calculated as follows:

Product = AC QR
Product = 0010 0011 = 35
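Booth's algorithm as traced above can be sketched as follows; the masking emulates n-bit registers, and the final step reinterprets the 2n-bit AC:QR pair as a signed value (an illustrative sketch, not hardware-exact):

```python
def booth_multiply(multiplicand, multiplier, n=4):
    """Booth's algorithm on n-bit two's complement operands."""
    mask = (1 << n) - 1
    br = multiplicand & mask        # BR as an n-bit two's complement value
    ac = 0                          # AC starts cleared
    qr = multiplier & mask          # QR
    q_extra = 0                     # the appended bit Q(n+1)
    for _ in range(n):              # SC counts down n iterations
        pair = (qr & 1, q_extra)
        if pair == (1, 0):          # first 1 of a string: AC = AC - BR
            ac = (ac - br) & mask
        elif pair == (0, 1):        # first 0 after a string: AC = AC + BR
            ac = (ac + br) & mask
        # arithmetic shift right of AC, QR and Q(n+1) as one unit
        q_extra = qr & 1
        qr = ((qr >> 1) | ((ac & 1) << (n - 1))) & mask
        ac = (ac >> 1) | (ac & (1 << (n - 1)))   # keep AC's sign bit
    product = (ac << n) | qr
    if product & (1 << (2 * n - 1)):            # reinterpret 2n bits as signed
        product -= 1 << (2 * n)
    return product

print(booth_multiply(-5, -7))   # 35
```

Running this with -5 and -7 reproduces the AC/QR/Qn+1 columns of the table at every step.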

CARRY SAVE MULTIPLIER


A carry save adder is typically used in a binary multiplier, since a binary
multiplier involves the addition of more than two binary numbers after the partial products
are formed. A large adder implemented using this technique will usually be much faster than
conventional addition of those numbers.

Description: A parallel multiplier for unsigned operands. It is composed of 2-input AND


gates for producing the partial products, a series of carry save adders for adding them and a
ripple-carry adder for producing the final product.
Supported Sizes: 2, 3, 4, ..., 255, 256
Supported Languages: VHDL, Verilog
Example: An 8-bit Carry Save Array Multiplier
An important advance in improving the speed of multipliers, pioneered by Wallace, is the use
of carry save adders (CSA). Even though the building block is still the multiplying adder
(ma), the topology prevents a ripple carry by ensuring that, wherever possible, the carry-out
signal propagates downward rather than sideways.
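The carry save idea can be sketched at the word level: three operands are reduced to a sum word and a carry word with no carry propagation between bit positions, leaving a single conventional add for the very end (illustrative sketch):

```python
def carry_save_add(a, b, c):
    """Reduce three numbers to a sum word and a carry word.
    Each bit position computes independently, so no carry ripples."""
    s = a ^ b ^ c                            # per-bit sum, carries ignored
    cy = ((a & b) | (b & c) | (a & c)) << 1  # per-bit carries, shifted to their weight
    return s, cy

s, cy = carry_save_add(23, 42, 77)
print(s, cy, s + cy)   # 112 30 142; one final carry-propagate add finishes the job
```

In a multiplier, rows of partial products are fed through such reductions, and only the last sum/carry pair goes through a ripple-carry (or carry look ahead) adder.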

The illustration below gives an example of this multiplication process.

Inside a carry-save array multiplier

Again, the building block is the multiplying adder (ma) as described on the previous page.
However, the topology is such that the carry-out from one adder is not connected to the
carry-in of the next adder, preventing a ripple carry. The circuit diagram below shows the
connections between these blocks.

4-bit carry-save array multiplier


The observant reader might notice that the ma0x blocks can be replaced with simple AND
gates and the ma4x blocks with plain adders; also, the block ma43 is not needed. More
interestingly, the ripple adder in the last row can be replaced with the faster carry look ahead
adder. Similar to the carry-propagate array multiplier, using Verilog HDL we can generate
instances of ma blocks based on the word length of the multiplicand and multiplier (N). To
describe the circuit in Verilog HDL, we need to derive the rules that govern the connections
between the blocks.
Start by numbering the output ports based on their location in the matrix. For this circuit, we
have the output signals sum (s) and carry-out (c); e.g., c_13 identifies the carry-out signal for
the block in row 1 and column 3. Next, we express the input signals as a function of the
output signal names s and c, and do the same for the product itself, as shown in the table
below: functions for the output signals 'so' and 'co' and the input signals 'x', 'y', 'si', 'ci',
and the product 'p'.
DIVISION RESTORING TECHNIQUES

A division algorithm provides a quotient and a remainder when we divide two numbers.
Division algorithms are generally of two types: slow and fast. Slow division algorithms
include restoring, non-restoring, non-performing restoring, and SRT; fast algorithms include
Newton–Raphson and Goldschmidt.
Here we will perform the restoring algorithm for unsigned integers. The term "restoring"
comes from the fact that the value of register A is restored after each iteration.

Register Q contains the quotient and register A the remainder. The n-bit dividend is loaded
into Q and the divisor into M. Register A is initially 0, and it is this register whose value is
restored during iterations, which is why the method is named restoring.
Let’s go through the steps involved:
 Step-1: First the registers are initialized with their corresponding values (Q =
dividend, M = divisor, A = 0, n = number of bits in the dividend).
 Step-2: The contents of registers A and Q are shifted left as if they were a single
unit.
 Step-3: The content of register M is subtracted from A and the result is stored in A.
 Step-4: The most significant bit of A is checked: if it is 0, the least significant bit
of Q is set to 1; otherwise, if it is 1, the least significant bit of Q is set to 0 and the
value of register A is restored, i.e. to the value of A before the subtraction of M.
 Step-5: The value of counter n is decremented.
 Step-6: If the value of n becomes zero we exit the loop; otherwise we repeat
from Step 2.
 Step-7: Finally, register Q contains the quotient and A contains the remainder.
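As a sketch, the steps above can be expressed in Python; the register names A, Q and M and the bit width n mirror the text, while the function itself is just an illustrative helper (a negative A stands in for a register whose most significant bit is 1):

```python
def restoring_division(dividend, divisor, n):
    """Restoring division for unsigned integers, following the steps above."""
    A, Q, M = 0, dividend, divisor
    for _ in range(n):
        # Step-2: shift A and Q left as a single 2n-bit unit.
        A = (A << 1) | ((Q >> (n - 1)) & 1)
        Q = (Q << 1) & ((1 << n) - 1)
        A = A - M            # Step-3: trial subtraction
        if A < 0:            # Step-4: "MSB of A is 1" in hardware terms
            Q &= ~1          # Q[0] = 0
            A = A + M        # restore A to its value before the subtraction
        else:
            Q |= 1           # Q[0] = 1
    return Q, A              # Step-7: quotient, remainder
```

Calling `restoring_division(11, 3, 4)` reproduces the worked example below, giving quotient 3 and remainder 2.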
Examples:
Perform the restoring division algorithm for
Dividend = 11
Divisor = 3

n    M        A        Q       Operation

4    00011    00000    1011    initialize
     00011    00001    011_    shift left AQ
     00011    11110    011_    A = A - M
     00011    00001    0110    Q[0] = 0 and restore A
3    00011    00010    110_    shift left AQ
     00011    11111    110_    A = A - M
     00011    00010    1100    Q[0] = 0 and restore A
2    00011    00101    100_    shift left AQ
     00011    00010    100_    A = A - M
     00011    00010    1001    Q[0] = 1
1    00011    00101    001_    shift left AQ
     00011    00010    001_    A = A - M
     00011    00010    0011    Q[0] = 1

Remember to restore the value of A whenever the most significant bit of A is 1. At the end,
register Q contains the quotient, i.e. 3, and register A contains the remainder, 2.

DIVISION NON-RESTORING TECHNIQUES

Now we perform non-restoring division. It is less complex than the restoring method
because simpler operations are involved, i.e. addition and subtraction, and no restoring step
is performed. The method relies on the sign bit of the register named A, which initially
contains zero.
The flow chart is given below.
Let’s go through the steps involved:
 Step-1: First the registers are initialized with their corresponding values (Q =
dividend, M = divisor, A = 0, N = number of bits in the dividend).
 Step-2: Check the sign bit of register A.
 Step-3: If it is 1, shift the content of AQ left and perform A = A + M; otherwise,
shift AQ left and perform A = A - M (that is, add the 2’s complement of M to A
and store it in A).
 Step-4: Again check the sign bit of register A.
 Step-5: If the sign bit is 1, Q[0] becomes 0; otherwise Q[0] becomes 1 (Q[0]
means the least significant bit of register Q).
 Step-6: Decrement the value of N by 1.
 Step-7: If N is not equal to zero, go to Step 2; otherwise go to the next step.
 Step-8: If the sign bit of A is 1, then perform A = A + M.
 Step-9: Register Q contains the quotient and A contains the remainder.
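For comparison, the non-restoring steps can be sketched the same way (again a hypothetical helper; Python's arbitrary-precision integers stand in for the two's complement register A, with A < 0 playing the role of the sign bit):

```python
def non_restoring_division(dividend, divisor, n):
    """Non-restoring division for unsigned integers, following the steps above."""
    A, Q, M = 0, dividend, divisor
    for _ in range(n):
        negative = A < 0                     # Step-2: sign bit of A
        # Step-3: shift AQ left as one unit, then add or subtract M.
        A = (A << 1) | ((Q >> (n - 1)) & 1)
        Q = (Q << 1) & ((1 << n) - 1)
        A = A + M if negative else A - M
        # Steps 4-5: set Q[0] from the new sign of A.
        Q = (Q & ~1) if A < 0 else (Q | 1)
    if A < 0:                                # Step-8: final correction
        A = A + M
    return Q, A                              # Step-9: quotient, remainder
```

`non_restoring_division(11, 3, 4)` again yields quotient 3 and remainder 2, matching the worked example below.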
Example: Perform non-restoring division for unsigned integers
Dividend = 11
Divisor = 3
-M = 11101

N    M        A        Q       Action

4    00011    00000    1011    start
              00001    011_    left shift AQ
              11110    011_    A = A - M
3             11110    0110    Q[0] = 0
              11100    110_    left shift AQ
              11111    110_    A = A + M
2             11111    1100    Q[0] = 0
              11111    100_    left shift AQ
              00010    100_    A = A + M
1             00010    1001    Q[0] = 1
              00101    001_    left shift AQ
              00010    001_    A = A - M
0             00010    0011    Q[0] = 1

Quotient = 3 (Q)
Remainder = 2 (A)

FLOATING POINT ARITHMETIC


1. To convert a floating point number into decimal, we have 3 elements in a 32-bit floating
point representation:
i) Sign
ii) Exponent
iii) Mantissa

 The sign bit is the first bit of the binary representation. ‘1’ implies a negative number
and ‘0’ implies a positive number.
Example: 11000001110100000000000000000000 is a negative number.
 The exponent is decided by the next 8 bits of the binary representation. 127 is the
unique number for 32-bit floating point representation. It is known as the bias. It is
determined by 2^(k-1) - 1, where ‘k’ is the number of bits in the exponent field.
There are 3 exponent bits in an 8-bit representation and 8 exponent bits in a 32-bit
representation.
Thus
bias = 3 for 8-bit conversion (2^(3-1) - 1 = 4 - 1 = 3)
bias = 127 for 32-bit conversion (2^(8-1) - 1 = 128 - 1 = 127)
Example: 01000001110100000000000000000000
10000011 = (131)10
131 - 127 = 4
Hence the exponent of 2 will be 4, i.e. 2^4 = 16.
 The mantissa is calculated from the remaining 23 bits of the binary representation. It
consists of an implicit ‘1’ plus a fractional part determined by the bits themselves.
Example:
01000001110100000000000000000000
The fractional part of the mantissa is given by:
1*(1/2) + 0*(1/4) + 1*(1/8) + 0*(1/16) + ... = 0.625
Thus the mantissa will be 1 + 0.625 = 1.625
The decimal number is hence given as: (-1)^Sign x 2^Exponent x Mantissa =
(-1)^0 x 16 x 1.625 = 26
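The conversion above can be sketched as a small Python helper (a hypothetical function, handling only normalized numbers with the implicit leading 1):

```python
def decode_float32(bits):
    """Decode a 32-bit pattern (a string of '0'/'1') into its decimal value
    using the sign, exponent and mantissa fields described above."""
    sign = int(bits[0])                         # bit 31: sign
    exponent = int(bits[1:9], 2) - 127          # bits 30-23: remove the bias
    # Bits 22-0: fractional part, weighted 1/2, 1/4, 1/8, ...
    fraction = sum(int(b) / 2 ** (i + 1) for i, b in enumerate(bits[9:]))
    mantissa = 1 + fraction                     # implicit leading 1
    return (-1) ** sign * 2 ** exponent * mantissa
```

For the worked example, `decode_float32("01000001110100000000000000000000")` returns 26.0, and flipping the sign bit returns -26.0.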
2. To convert a decimal number into floating point, we have 3 elements in a 32-bit floating
point representation:
i) Sign (MSB)
ii) Exponent (8 bits after MSB)
iii) Mantissa (remaining 23 bits)

 The sign bit is the first bit of the binary representation. ‘1’ implies a negative number
and ‘0’ implies a positive number.
Example: to convert -17 into 32-bit floating point representation, sign bit = 1.
 The exponent is decided by the nearest power of 2 smaller than or equal to the
number. For 17, 16 is the nearest 2^n, hence the exponent of 2 will be 4 since
2^4 = 16. 127 is the unique number for 32-bit floating point representation. It is
known as the bias and is determined by 2^(k-1) - 1, where ‘k’ is the number of bits
in the exponent field.
Thus bias = 127 for 32 bits (2^(8-1) - 1 = 128 - 1 = 127).
Now, 127 + 4 = 131, i.e. 10000011 in binary representation.
 Mantissa: 17 in binary = 10001.
Move the binary point so that there is only one bit to the left of it, and adjust the
exponent of 2 so that the value does not change. This is called normalizing the
number: 1.0001 x 2^4. Now consider the fractional part and represent it as 23 bits
by appending zeros:
00010000000000000000000
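As a cross-check of the hand conversion, Python's standard struct module can pack -17 into its IEEE 754 single-precision bytes and let us pull the three fields apart:

```python
import struct

# Pack -17.0 into IEEE 754 single precision (big-endian) and reinterpret
# the 4 bytes as an unsigned 32-bit integer.
bits = struct.unpack('>I', struct.pack('>f', -17.0))[0]
pattern = format(bits, '032b')

sign, exponent, mantissa = pattern[0], pattern[1:9], pattern[9:]
# sign     = '1'
# exponent = '10000011'  (131 = 127 + 4)
# mantissa = '00010000000000000000000'
```

The three fields match the values derived by hand above.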
FLOATING POINT ARITHMETIC OPERATIONS:

Arithmetic operations on floating point numbers consist of addition, subtraction,
multiplication and division. The operations are done with algorithms similar to those used
on sign-magnitude integers (because of the similarity of representation): for example, only
add numbers of the same sign.

Floating Point Addition and Subtraction

Compared to fixed point addition and subtraction, floating point addition and subtraction
are more complex and hardware consuming. This is because an exponent field is not present
in fixed point arithmetic. Thus floating point addition and subtraction are not as simple as
their fixed point counterparts.
The major steps for a floating point addition or subtraction are:
 Extract the sign of the result from the two sign bits.
 Subtract the two exponents E1 and E2. Find the absolute value of the exponent
difference, E = |E1 - E2|, and choose the exponent of the greater number.
 Shift the mantissa of the lesser number right by E bits, considering the hidden bits.
 Execute the addition or subtraction operation between the shifted version of the mantissa
and the mantissa of the other number. Consider the hidden bits also.
 Normalization for addition: in case of addition, if a carry is generated then
the result is right shifted by 1 bit. This shift operation is reflected in the exponent
computation by an increment operation.
 Normalization for subtraction: a normalization step is performed if there are
leading zeros in case of the subtraction operation. Depending on the leading zero count
the obtained result is left shifted, and accordingly the exponent value is decremented
by the number of leading zeros.
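The major steps can be sketched on a toy format in Python. The function and the operand values (5.5 and 2.75, which also sum to 8.25) are made up for illustration, since the source's operand bit patterns are not shown; each operand is an (unbiased exponent, integer mantissa) pair with the hidden 1 included and frac_bits fraction bits:

```python
def fp_add(e1, m1, e2, m2, frac_bits=11):
    """Add two positive toy floating point numbers, where the value of a
    pair (e, m) is (m / 2**frac_bits) * 2**e and m includes the hidden 1."""
    if e1 < e2:                        # make operand 1 the larger one
        e1, m1, e2, m2 = e2, m2, e1, m1
    m2 >>= e1 - e2                     # align: shift lesser mantissa right
    e, m = e1, m1 + m2                 # add the aligned mantissas
    if m >= (2 << frac_bits):          # carry generated: result >= 2
        m >>= 1                        # normalize with a 1-bit right shift
        e += 1                         # and increment the exponent
    return e, m

def value(e, m, frac_bits=11):
    """Decimal value of a toy (exponent, mantissa) pair."""
    return (m / 2 ** frac_bits) * 2 ** e

# 5.5 = 1.375 * 2**2 -> (2, 2816); 2.75 = 1.375 * 2**1 -> (1, 2816)
```

Here `fp_add(2, 2816, 1, 2816)` aligns by one bit, generates a carry, normalizes, and returns (3, 2112), whose value is 8.25.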
Example: Floating Point Addition
 Representation: the two input operands are first expressed in the floating point
format (sign, exponent, mantissa).
 Sign extraction: as both the numbers are positive, the sign of the output will be
positive. Thus S = 0.
 Exponent subtraction: the two exponents are subtracted; the absolute exponent
difference is E = 0001, and the exponent of the greater number is chosen.
 Shifting of mantissa of lesser number: the mantissa of the lesser number is
shifted right by 1 bit.
 The result of the mantissa addition is 000010000000 and generates a carry. This means
the result is greater than or equal to 2.
 The output of the adder is right shifted and the exponent value is incremented to get
the correct result. The new mantissa value is now 00001000000, choosing the last 11
bits from the LSB, and the exponent is 1010.
 The final result is 0_1010_00001000000, which is equivalent to 8.25 in decimal.
Example: Floating Point Subtraction
 Representation: the two input operands are first expressed in the floating point
format (sign, exponent, mantissa).
 Sign extraction: as the sign of the greater number is negative, S = 1.
 Exponent subtraction: the two exponents are subtracted; the absolute exponent
difference is E = 0010, and the exponent of the greater number is chosen.
 Shifting of mantissa of lesser number: the mantissa of the lesser number is
shifted right by 2 bits.
 The result of the mantissa subtraction has a leading zero, which indicates
that the result is less than 1.
 The output of the adder is left shifted by 1 bit as there is one leading zero, and the
exponent value is decremented by 1 to get the correct result. The new mantissa
value is formed by choosing the last 11 bits from the LSB, and the exponent is
1001.
 The final result is 1_1001_01000100000, which is equivalent to -5.0625 in decimal.
A simple architecture of a floating point adder is shown below in Figure 1.
IEEE 754 FORMAT

The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is a technical standard
for floating-point computation which was established in 1985 by the Institute of Electrical
and Electronics Engineers (IEEE). The standard addressed many problems found in the
diverse floating point implementations that made them difficult to use reliably and reduced
their portability. IEEE Standard 754 floating point is the most common representation
today for real numbers on computers, including Intel-based PC’s, Macs, and most Unix
platforms.
There are several ways to represent floating point numbers, but IEEE 754 is the most
efficient in most cases. IEEE 754 has 3 basic components:
1. The Sign of Mantissa –
This is as simple as the name. 0 represents a positive number while 1 represents a
negative number.
2. The Biased exponent –
The exponent field needs to represent both positive and negative exponents. A bias is
added to the actual exponent in order to get the stored exponent.
3. The Normalised Mantissa –
The mantissa is the part of a number in scientific notation, or of a floating-point
number, consisting of its significant digits. Here we have only 2 digits, i.e. 0 and 1. So a
normalised mantissa is one with only a single 1 to the left of the binary point.

IEEE 754 numbers are divided into two based on the above three components:
single precision and double precision.

TYPES               SIGN           BIASED EXPONENT   NORMALISED MANTISSA   BIAS
Single precision    1 (31st bit)   8 (30-23)         23 (22-0)             127
Double precision    1 (63rd bit)   11 (62-52)        52 (51-0)             1023
Example –
85.125
85 = 1010101
0.125 = 001
85.125 = 1010101.001
=1.010101001 x 2^6
sign = 0

1. Single precision:
biased exponent 127 + 6 = 133
133 = 10000101
Normalised mantissa = 010101001
We will add 0's to complete the 23 bits.

The IEEE 754 single precision representation is:

= 0 10000101 01010100100000000000000
This can be written in hexadecimal form as 42AA4000.

2. Double precision:
biased exponent 1023 + 6 = 1029
1029 = 10000000101
Normalised mantissa = 010101001
We will add 0's to complete the 52 bits.

The IEEE 754 double precision representation is:

= 0 10000000101 0101010010000000000000000000000000000000000000000000
This can be written in hexadecimal form as 4055480000000000.
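Both hand-computed encodings can be checked with Python's struct module, which exposes the raw IEEE 754 bytes of a float:

```python
import struct

# Pack 85.125 as big-endian single ('>f') and double ('>d') precision
# and compare the hexadecimal form with the hand-derived encodings.
single = struct.pack('>f', 85.125).hex().upper()   # '42AA4000'
double = struct.pack('>d', 85.125).hex().upper()   # '4055480000000000'
```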
Special Values: IEEE has reserved some bit patterns to represent special values.
 Zero –
Zero is a special value denoted with an exponent and mantissa of 0. -0 and +0
are distinct values, though they both are equal.
 Denormalised –
If the exponent is all zeros, but the mantissa is not then the value is a
denormalized number. This means this number does not have an assumed
leading one before the binary point.
 Infinity –
The values +infinity and -infinity are denoted with an exponent of all ones and a
mantissa of all zeros. The sign bit distinguishes between negative infinity and
positive infinity. Operations with infinite values are well defined in IEEE.
 Not A Number (NaN) –
The value NaN is used to represent a value that is an error. It is represented
when the exponent field is all ones and the mantissa is not zero. This special
value can, for example, denote a variable that does not yet hold a valid value.
EXPONENT   MANTISSA   VALUE
0          0          exact 0
255        0          Infinity
0          not 0      denormalised
255        not 0      Not a Number (NaN)
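These reserved patterns can be observed directly by reinterpreting hand-picked 32-bit patterns as single-precision values with Python's struct module (f32 is a hypothetical helper name):

```python
import struct

def f32(hexbits):
    """Interpret a 32-bit hex pattern as an IEEE 754 single-precision value."""
    return struct.unpack('>f', bytes.fromhex(hexbits))[0]

pos_inf  = f32('7F800000')   # exponent all ones, mantissa 0     -> +Infinity
neg_inf  = f32('FF800000')   # same, with the sign bit set       -> -Infinity
nan      = f32('7FC00000')   # exponent all ones, mantissa not 0 -> NaN
denormal = f32('00000001')   # exponent 0, mantissa not 0 -> smallest denormal
```

The smallest denormal decodes to 2^-149, matching the range table below.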
The same holds for double precision (just replacing 255 by 2047). Ranges of floating point
numbers:
Single Precision
  Denormalized:        ± 2^-149 to (1 - 2^-23) × 2^-126
  Normalized:          ± 2^-126 to (2 - 2^-23) × 2^127
  Approximate decimal: ± 10^-44.85 to 10^38.53
Double Precision
  Denormalized:        ± 2^-1074 to (1 - 2^-52) × 2^-1022
  Normalized:          ± 2^-1022 to (2 - 2^-52) × 2^1023
  Approximate decimal: ± 10^-323.3 to 10^308.3
The range of positive floating point numbers can be split into normalized numbers and
denormalized numbers, which use only a portion of the fraction’s precision. Since every
floating-point number has a corresponding negated value, the ranges above are symmetric
around zero.
There are five distinct numerical ranges that single-precision floating-point numbers are not
able to represent with the scheme presented so far:
1. Negative numbers less than -(2 - 2^-23) × 2^127 (negative overflow)
2. Negative numbers greater than -2^-149 (negative underflow)
3. Zero
4. Positive numbers less than 2^-149 (positive underflow)
5. Positive numbers greater than (2 - 2^-23) × 2^127 (positive overflow)
Overflow generally means that values have grown too large to be represented. Underflow is
a less serious problem because it just denotes a loss of precision, which is guaranteed to be
closely approximated by zero.
Table of the total effective range of finite IEEE floating-point numbers is shown below:

         Binary                     Decimal

Single   ± (2 - 2^-23) × 2^127      approximately ± 10^38.53

Double   ± (2 - 2^-52) × 2^1023     approximately ± 10^308.25
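A quick way to see the single-precision limits is to round-trip values through 32 bits with Python's struct module (roundtrip32 is a hypothetical helper); the range limits survive exactly, while a value below the denormal range underflows to zero:

```python
import struct

def roundtrip32(x):
    """Pack x into IEEE 754 single precision and unpack it back to a float."""
    return struct.unpack('>f', struct.pack('>f', x))[0]

smallest_denormal = 2.0 ** -149                 # lower edge of the single range
largest_single = (2 - 2 ** -23) * 2.0 ** 127    # upper edge of the single range

ok1 = roundtrip32(smallest_denormal) == smallest_denormal   # survives exactly
ok2 = roundtrip32(largest_single) == largest_single         # survives exactly
underflow = roundtrip32(2.0 ** -160)            # below the range: becomes 0.0
```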

Special Operations –
Operation                   Result
n ÷ ±Infinity               0
±Infinity × ±Infinity       ±Infinity
±nonZero ÷ ±0               ±Infinity
±finite × ±Infinity         ±Infinity
Infinity + Infinity         +Infinity
Infinity - (-Infinity)      +Infinity
-Infinity - Infinity        -Infinity
-Infinity + (-Infinity)     -Infinity
±0 ÷ ±0                     NaN
±Infinity ÷ ±Infinity       NaN
±Infinity × 0               NaN
NaN == NaN                  False
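Since Python floats follow IEEE 754 double precision, most rows of the table can be checked directly (division by ±0 raises ZeroDivisionError in Python rather than returning ±Infinity, so that row is skipped):

```python
import math

inf = math.inf

assert 1.0 / inf == 0.0                  # n / ±Infinity         -> 0
assert inf * inf == inf                  # ±Infinity * ±Infinity -> ±Infinity
assert inf + inf == inf                  # Infinity + Infinity   -> +Infinity
assert -inf - inf == -inf                # -Infinity - Infinity  -> -Infinity
assert math.isnan(inf / inf)             # ±Infinity / ±Infinity -> NaN
assert math.isnan(inf * 0.0)             # ±Infinity * 0         -> NaN
assert float('nan') != float('nan')      # NaN == NaN            -> False
```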
