Coa Unit 2
UNIT 2
SIGNED BINARY INTEGERS
Signed integers are numbers with a "+" or "-" sign. If n bits are used to represent a signed
binary integer, then 1 bit is used to represent the sign of the number and the remaining
(n - 1) bits are used to represent its magnitude.
A real-life example is a list of temperatures (correct to the nearest degree) in various cities of
the world. Obviously they are signed integers, like +34, -15, -23, and +17. These numbers,
along with their signs, have to be represented in a computer using only binary notation, or bits.
There are various ways of representing signed numbers in a computer −
Sign and magnitude
One's complement
Two's complement
The simplest way of representing a signed number is the sign-magnitude (SM) method.
Sign and magnitude − The sign-magnitude binary format is the simplest conceptual format.
In this method of representing signed numbers, the most significant digit (MSD) takes on
extra meaning.
If the MSD is a 0, we can evaluate the number just as we would any normal unsigned
integer, and we treat the number as positive.
If the MSD is a 1, this indicates that the number is negative.
The other bits indicate the magnitude (absolute value) of the number. Some signed decimal
numbers and their equivalents in SM notation follow, assuming a word size of 4 bits.

Signed decimal   Sign-magnitude
+6               0110
-6               1110
+0               0000
-0               1000
+7               0111
-7               1111
Range
From the above table, it is obvious that if the word size is n bits, the range of numbers that
can be represented is from -(2^(n-1) - 1) to +(2^(n-1) - 1). A table of word size and the
range of SM numbers that can be represented follows.

Word size   Range
4           -7 to +7
8           -127 to +127
16          -32767 to +32767
32          -2147483647 to +2147483647
Notice that the bit sequence 1101 corresponds to the unsigned number 13, as well as the
number –5 in SM notation. Its value depends only on the way the user or the programmer
interprets the bit sequence.
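The dual interpretation of a bit pattern can be checked with a short sketch (illustrative Python, not part of the original notes; the helper names are mine):

```python
# Minimal 4-bit sign-magnitude encoder/decoder: the MSB is the sign,
# the remaining bits are the magnitude.

def sm_encode(value, bits=4):
    """Encode a signed integer as an n-bit sign-magnitude bit string."""
    limit = (1 << (bits - 1)) - 1           # range is -(2^(n-1)-1) .. +(2^(n-1)-1)
    if not -limit <= value <= limit:
        raise ValueError("out of range")
    sign = 1 if value < 0 else 0
    return format((sign << (bits - 1)) | abs(value), f"0{bits}b")

def sm_decode(bit_string):
    """Decode a sign-magnitude bit string back to a signed integer."""
    sign, magnitude = bit_string[0], int(bit_string[1:], 2)
    return -magnitude if sign == "1" else magnitude

print(sm_encode(-5))      # 1101
print(sm_decode("1101"))  # -5
print(int("1101", 2))     # 13, the same bits read as an unsigned integer
```

Note that the same four bits 1101 decode to -5 or 13 purely depending on which interpretation is applied.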
One's complement − This is one of the methods of representing signed integers in the
computer. In this method, the most significant digit (MSD) takes on extra meaning.
If the MSD is a 0, we can evaluate the number just as we would interpret any normal
unsigned integer.
If the MSD is a 1, this indicates that the number is negative.
The other bits indicate the magnitude (absolute value) of the number.
If the number is negative, then the other bits signify the 1's complement of the magnitude of
the number.
Some signed decimal numbers and their equivalents in 1's complement notation are shown
below, assuming a word size of 4 bits.

Signed decimal   1's complement
+6               0110
-6               1001
+0               0000
-0               1111
+7               0111
-7               1000
Range
From the above table, it is obvious that if the word size is n bits, the range of numbers that
can be represented is from -(2^(n-1) - 1) to +(2^(n-1) - 1). A table of word size and the
range of 1's complement numbers that can be represented is shown.

Word size   Range
4           -7 to +7
8           -127 to +127
16          -32767 to +32767
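The flip-every-bit rule for negative values can be sketched as follows (illustrative Python, function names are mine):

```python
# 4-bit one's complement: a negative number is stored as the bitwise
# complement of the corresponding positive bit pattern.

def ones_complement_encode(value, bits=4):
    """Encode a signed integer in n-bit one's complement."""
    limit = (1 << (bits - 1)) - 1
    if not -limit <= value <= limit:
        raise ValueError("out of range")
    if value >= 0:
        return format(value, f"0{bits}b")
    # negative: take the positive pattern and invert every bit
    return format(~(-value) & ((1 << bits) - 1), f"0{bits}b")

def ones_complement_decode(bit_string):
    """Decode an n-bit one's complement bit string to a signed integer."""
    if bit_string[0] == "0":
        return int(bit_string, 2)
    bits = len(bit_string)
    return -((~int(bit_string, 2)) & ((1 << bits) - 1))

print(ones_complement_encode(-6))     # 1001
print(ones_complement_decode("1000")) # -7
```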
Digital computers use the binary number system to represent all types of information
internally. Alphanumeric characters are represented using binary bits (i.e., 0 and 1). Digital
representations are easier to design, storage is easy, and accuracy and precision are greater.
There are various number representation techniques for digital systems, for example the
binary, octal, decimal, and hexadecimal number systems, but the binary number system is
the most relevant and popular for representing numbers in a digital computer system.
There are two major approaches to store real numbers (i.e., numbers with fractional
component) in modern computing. These are (i) Fixed Point Notation and (ii) Floating Point
Notation. In fixed point notation, there are a fixed number of digits after the decimal point,
whereas floating point number allows for a varying number of digits after the decimal point.
Fixed-Point Representation −
This representation has a fixed number of bits for the integer part and a fixed number of
bits for the fractional part. For example, if the given fixed-point representation is IIII.FFFF,
then the minimum value you can store is 0000.0001 and the maximum value is 9999.9999.
There are three parts of a fixed-point number representation: the sign field, the integer
field, and the fractional field.
Assume a 32-bit fixed-point format with 1 sign bit, 15 integer bits, and 16 fractional bits.
Then the smallest positive number that can be stored is 2^-16 ≈ 0.000015, the largest
positive number is (2^15 - 1) + (1 - 2^-16) = 2^15 - 2^-16 ≈ 32768, and the gap between
consecutive numbers is uniformly 2^-16.
The position of the radix point is fixed in this format; a representation that lets the radix
point move is the floating-point notation described next.
FLOATING-POINT REPRESENTATION −
This representation does not reserve a specific number of bits for the integer part or the
fractional part. Instead it reserves a certain number of bits for the number (called the
mantissa or significand) and a certain number of bits to say where within that number the
decimal place sits (called the exponent).
The floating-point representation of a number has two parts: the first part represents a
signed fixed-point number called the mantissa; the second part designates the position of the
decimal (or binary) point and is called the exponent. The fixed-point mantissa may be a
fraction or an integer. A floating-point number is always interpreted to represent a number
of the following form: M x r^e.
Only the mantissa M and the exponent e are physically represented in the register (including
their signs). A floating-point binary number is represented in a similar manner except that it
uses base 2 for the exponent. A floating-point binary number is said to be normalized if the
most significant digit of the mantissa is 1.
So, the actual number is (-1)^s x (1 + m) x 2^(e - Bias), where s is the sign bit, m is the
mantissa, e is the exponent value, and Bias is the bias number.
Note that signed integers and exponents may be represented in sign-magnitude, one's
complement, or two's complement representation.
The floating-point representation is more flexible. Any non-zero number can be represented
in the normalized form ±(1.b1b2b3...)_2 x 2^n. This is the normalized form of a number x.
Example − Suppose the number uses a 32-bit format: 1 sign bit, 8 bits for the signed
exponent, and 23 bits for the fractional part. The leading bit 1 is not stored (as it is always 1
for a normalized number) and is referred to as the "hidden bit".
Then -53.5 is normalized as -53.5 = (-110101.1)_2 = (-1.101011)_2 x 2^5, which is
represented as shown below.
The precision of a floating-point format is the number of positions reserved for binary digits
plus one (for the hidden bit). In the examples considered here the precision is 23 + 1 = 24.
The gap between 1 and the next normalized floating-point number is known as machine
epsilon. The gap is (1 + 2^-23) - 1 = 2^-23 for the above example, but this is not the same
as the smallest positive floating-point number, because the spacing of floating-point
numbers is non-uniform, unlike in the fixed-point scenario.
Note that non-terminating binary numbers cannot be represented exactly in floating-point
representation; e.g., 1/3 = (0.010101...)_2 cannot be stored exactly as a floating-point
number, as its binary representation is non-terminating.
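Both effects, machine epsilon and the rounding of non-terminating expansions, are easy to observe in Python, whose floats are IEEE 754 doubles with a 53-bit significand (a sketch, not from the notes):

```python
import sys
from fractions import Fraction

# Machine epsilon of a double (53-bit significand) is 2**-52; the 24-bit
# example in the notes would give 2**-23 instead.
print(sys.float_info.epsilon == 2.0 ** -52)   # True

# 1/3 has a non-terminating binary expansion, so the stored double is only
# a rounded approximation of the exact rational value 1/3:
print(Fraction(1, 3) == Fraction(1 / 3))      # False

# The same rounding explains classic surprises such as:
print(0.1 + 0.2 == 0.3)                       # False
```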
IEEE FLOATING POINT NUMBER REPRESENTATION −
IEEE (Institute of Electrical and Electronics Engineers) has standardized Floating-Point
Representation as following diagram.
So, the actual number is (-1)^s x (1 + m) x 2^(e - Bias), where s is the sign bit, m is the
mantissa, e is the exponent value, and Bias is the bias number. The sign bit is 0 for a
positive number and 1 for a negative number. Exponents are represented in a biased form.
According to IEEE 754 standard, the floating-point number is represented in following ways:
Half Precision (16 bit): 1 sign bit, 5 bit exponent, and 10 bit mantissa
Single Precision (32 bit): 1 sign bit, 8 bit exponent, and 23 bit mantissa
Double Precision (64 bit): 1 sign bit, 11 bit exponent, and 52 bit mantissa
Quadruple Precision (128 bit): 1 sign bit, 15 bit exponent, and 112 bit mantissa
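Python's struct module can expose the stored fields for the formats it supports directly, half ('e'), single ('f'), and double ('d') precision; here is a small sketch (the helper name is mine):

```python
import struct

# Split an IEEE 754 value into its sign, exponent, and mantissa bit fields.
def field_split(value, fmt="f", exp_bits=8, width=32):
    raw = struct.pack(">" + fmt, value)                  # big-endian bytes
    bits = format(int.from_bytes(raw, "big"), f"0{width}b")
    return bits[0], bits[1:1 + exp_bits], bits[1 + exp_bits:]

sign, exponent, mantissa = field_split(1.0)
print(sign, exponent, len(mantissa))     # 0 01111111 23  (stored exponent = bias 127)
print(field_split(1.0, "d", 11, 64)[1])  # 01111111111   (double bias = 1023)
```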
Special Value Representation −
There are some special values depended upon different values of the exponent and mantissa
in the IEEE 754 standard.
All exponent bits 0 with all mantissa bits 0 represents 0. If the sign bit is 0, then +0,
else -0.
All exponent bits 1 with all mantissa bits 0 represents infinity. If the sign bit is 0, then
+∞, else -∞.
All exponent bits 0 with non-zero mantissa bits represents a denormalized number.
All exponent bits 1 with non-zero mantissa bits represents NaN (not a number).
Subtraction is similar to addition, with some differences: we subtract the mantissas, and for
the sign bit we use the sign of the greater number.
Let the two numbers be
x = 9.75
y = 0.5625
and let us compute x - y, i.e. x + (-y).
Converting them into 32-bit floating point representation:
9.75's representation in 32-bit format = 0 10000010 00111000000000000000000
-0.5625's representation in 32-bit format = 1 01111110 00100000000000000000000
Now, we find the difference of exponents to know how much shifting is required.
(10000010 - 01111110)_2 = (4)_10, since 130 - 126 = 4.
Now, we shift the mantissa of lesser number right side by 4 units.
Mantissa of – 0.5625 = 1.00100000000000000000000
(note that 1 before decimal point is understood in 32-bit representation)
Shifting right by 4 units, 0.00010010000000000000000
Mantissa of 9.75= 1. 00111000000000000000000
Subtracting the mantissas:
  1.00111000000000000000000
- 0.00010010000000000000000
———————————————————————————
  1.00100110000000000000000
Sign bit of the bigger number = 0
So, finally the answer = x - y = 0 10000010 00100110000000000000000, which is 9.1875 in
decimal.
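The hand-derived bit pattern can be cross-checked by decoding it with Python's struct module (a verification sketch; the helper name is mine):

```python
import struct

# Rebuild a single-precision float from its sign, exponent, and mantissa
# bit strings and decode it with struct.
def single_to_float(sign, exponent, mantissa):
    word = (int(sign, 2) << 31) | (int(exponent, 2) << 23) | int(mantissa, 2)
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]

result = single_to_float("0", "10000010", "00100110000000000000000")
print(result)         # 9.1875
print(9.75 - 0.5625)  # 9.1875
```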
Ripple Carry Adder is a combinational logic circuit. It is used for the purpose of
adding two n-bit binary numbers. It requires n full adders in its circuit for adding two n-bit
binary numbers. It is also known as n-bit parallel adder.
Carry Look Ahead Adder is an improved version of the ripple carry adder.
It generates the carry-in of each full adder simultaneously, without the ripple delay.
The time complexity of the carry look ahead adder is Θ(log n), compared with Θ(n) for the
ripple carry adder.
Logic Diagram-
The logic diagram for carry look ahead adder is as shown below-
The carry-in of any stage full adder depends only on the following two parameters-
Bits being added in the previous stages
Carry-in provided in the beginning
Now,
The above two parameters are always known from the beginning.
So, the carry-in of any stage full adder can be evaluated at any instant of time.
Thus, any full adder need not wait until its carry-in is generated by its previous stage
full adder.
Consider two 4-bit binary numbers A3A2A1A0 and B3B2B1B0 to be added.
Mathematically, with carry generate Gi = Ai·Bi and carry propagate Pi = Ai ⊕ Bi, the
carries are-
C1 = G0 + P0·C0 ...(1)
C2 = G1 + P1·C1 ...(2)
C3 = G2 + P2·C2 ...(3)
C4 = G3 + P3·C3 ...(4)
Now,
Clearly, C1, C2 and C3 are intermediate carry bits.
So, let's remove C1, C2 and C3 from the RHS of every equation.
Substituting (1) in (2), we get C2 in terms of C0.
Then, substituting (2) in (3), we get C3 in terms of C0, and so on, giving-
C2 = G1 + P1·G0 + P1·P0·C0
C3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·C0
C4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·C0
These equations show that the carry-in of any stage full adder depends only on-
Bits being added in the previous stages
Carry bit which was provided in the beginning
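The expanded recurrence can be sketched in software (illustrative Python; a real CLA evaluates all the product terms in parallel gates, while this loop merely evaluates the same Boolean expressions):

```python
# 4-bit carry-lookahead addition: with G(i) = A(i)·B(i) and P(i) = A(i) XOR B(i),
# every carry C(i+1) is expanded so it depends only on the inputs and C0.

def cla_add(a, b, c0=0, bits=4):
    A = [(a >> i) & 1 for i in range(bits)]
    B = [(b >> i) & 1 for i in range(bits)]
    G = [x & y for x, y in zip(A, B)]          # carry generate
    P = [x ^ y for x, y in zip(A, B)]          # carry propagate
    C = [c0]
    for i in range(bits):
        # expanded form: C(i+1) = G(i) + P(i)G(i-1) + ... + P(i)...P(0)C0
        carry = G[i]
        for j in range(i - 1, -1, -1):
            term = G[j]
            for k in range(j + 1, i + 1):
                term &= P[k]
            carry |= term
        term = c0
        for k in range(i + 1):
            term &= P[k]
        carry |= term
        C.append(carry)
    S = [P[i] ^ C[i] for i in range(bits)]     # sum bits
    return sum(S[i] << i for i in range(bits)) | (C[bits] << bits)

print(cla_add(0b1011, 0b0110))   # 17
```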
Multiplication and division are two other arithmetic operations frequently required
even in simple mathematics. CPUs have a set of instructions for integer MULTIPLY and
DIVIDE operations. Internally these instructions are implemented as suitable algorithms in
hardware. Not only integer arithmetic but also floating-point and decimal instruction sets
are likely to include MULTIPLY and DIVIDE instructions in sophisticated CPUs. Hardware
implementation increases the efficiency of the CPU.
Multiplication
Multiplicand M = 12        1100
Multiplier   Q = 11      x 1011
                       --------
                           1100
                          1100
                         0000
                        1100
                       --------
Product P = 132        10000100
As you see, we start with the LSB of the multiplier Q, multiply the multiplicand by it, and
jot down the partial product. Then we use the next higher digit of Q to multiply the
multiplicand; this time, while jotting the partial product, we shift the jotting left to
correspond to that Q-digit position. This is repeated until all the digits of Q are used up, and
then we sum the partial products. By multiplying 12 x 11, we got 132, with the operands
and the product all in binary. Binary multiplication is much simpler than decimal
multiplication: since each multiplier digit is either 0 or 1, each step is equivalent to adding
the multiplicand in the proper shifted position, or adding 0's. Essentially, the whole
operation is a sequence of shifts and additions of the multiplicand.
It is to be observed that when we multiplied two 4-bit binary numbers, the product obtained is
8-bits. Hence the product register (P) is double the size of the M and Q register. The sign of
the product is determined from the signs of multiplicand and multiplier. If they are alike, the
sign of the product is positive. If they are unlike, the sign of the product is negative.
Unsigned Multiplication
When multiplication is implemented in a digital computer, it is convenient to change
the process slightly. It is required to provide an adder for the summation of only two binary
numbers at a time, and to successively accumulate the partial products in a register. The
widths of the registers, shift counter, and ALU are decided by the word size of the CPU.
For simplicity of understanding, we will take a 4-bit word length, i.e. the multiplier (Q) and
multiplicand (M) are both 4 bits in size; the logic extrapolates to any word size.
We need registers to store the multiplicand (M) and multiplier (Q), each 4 bits wide.
However, we use 8-bit registers, which is the standard minimum, and hence the register that
collects the product (P) is 16 bits. The shift counter keeps track of the number of times the
addition is to be done, which equals the number of bits in Q. The shifting of the contents
of the registers is taken care of by shift-register logic. The ALU takes care of addition, and
hence the partial product and product are obtained here and stored in the P register. The
control unit sequences the micro-steps. The product register holds the partial results; the
final result is also available in P when the shift counter reaches the threshold value.
Figure Data path for typical Multiplication
The flowchart for unsigned multiplication is shown in the figure, and table 9.1
explains the workings with an example of 12 x 11. The flowchart is self-explanatory of
the unsigned multiplication algorithm. In unsigned multiplication, the carry bit is used as
an extension of the P register. Since the Q value is a 4-bit number, the algorithm stops when
the shift counter reaches the value of 4. At this point, P holds the result of the multiplication.
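The shift-and-add scheme in the flowchart can be sketched as follows (illustrative Python; for simplicity the multiplicand is shifted left here, whereas the hardware shifts P and Q right instead):

```python
def shift_add_multiply(m, q, bits=4):
    """Unsigned shift-and-add multiplication: examine Q bit by bit, adding a
    shifted copy of the multiplicand M whenever the current bit is 1."""
    product = 0
    for i in range(bits):          # the shift counter runs n times
        if (q >> i) & 1:           # current multiplier bit
            product += m << i      # add M in the proper shifted position
    return product

print(shift_add_multiply(12, 11))  # 132, matching the worked example
```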
Figure : Flowchart for Unsigned Multiplication algorithm
Signed Multiplication
Signed numbers are always better handled in 2's complement format. Further, the
earlier algorithm takes n steps for an n-digit number; although implemented in hardware,
one step per digit is costly in terms of execution time. Booth's algorithm addresses both
signed multiplication and efficiency of operation.
BOOTH'S ALGORITHM
Booth's algorithm gives a procedure for multiplying binary integers in signed 2's
complement representation efficiently, i.e., with a smaller number of additions/subtractions.
It operates on the fact that strings of 0's in the multiplier require no addition but
just shifting, and a string of 1's in the multiplier from bit weight 2^k down to weight 2^m
can be treated as 2^(k+1) - 2^m. As in all multiplication schemes, Booth's algorithm requires
examination of the multiplier bits and shifting of the partial product. Prior to the shifting,
the multiplicand may be added to the partial product, subtracted from the partial product, or
left unchanged according to the following rules:
1. The multiplicand is subtracted from the partial product upon encountering the
first least significant 1 in a string of 1’s in the multiplier
2. The multiplicand is added to the partial product upon encountering the first 0
(provided that there was a previous ‘1’) in a string of 0’s in the multiplier.
3. The partial product does not change when the multiplier bit is identical to the
previous multiplier bit.
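The three rules can be sketched behaviorally (illustrative Python, using the AC, BR, QR and Qn+1 register names from the text; this mimics the register-level algorithm rather than gate-level hardware):

```python
def booth_multiply(multiplicand, multiplier, bits=4):
    """Booth's algorithm on n-bit two's-complement operands."""
    mask = (1 << bits) - 1
    ac, qr, q_extra = 0, multiplier & mask, 0    # AC, QR, and the Qn+1 flip-flop
    br = multiplicand & mask
    for _ in range(bits):
        pair = ((qr & 1) << 1) | q_extra         # inspect Qn and Qn+1 together
        if pair == 0b10:                         # first 1 of a string: AC -= BR
            ac = (ac - br) & mask
        elif pair == 0b01:                       # first 0 after 1's: AC += BR
            ac = (ac + br) & mask
        # pair 00 or 11: partial product unchanged
        # arithmetic shift right of AC:QR:Qn+1, preserving AC's sign bit
        q_extra = qr & 1
        qr = ((qr >> 1) | ((ac & 1) << (bits - 1))) & mask
        ac = ((ac >> 1) | (ac & (1 << (bits - 1)))) & mask
    result = (ac << bits) | qr
    if result & (1 << (2 * bits - 1)):           # interpret the 2n bits as signed
        result -= 1 << (2 * bits)
    return result

print(booth_multiply(-3, -7))   # 21
```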
Hardware Implementation of Booths Algorithm – The hardware implementation of the
booth algorithm requires the register configuration shown in the figure below.
Booth’s Algorithm Flowchart –
We name the registers A, B and Q as AC, BR and QR respectively. Qn designates the least
significant bit of the multiplier in the register QR. An extra flip-flop Qn+1 is appended to
QR to facilitate a double-bit inspection of the multiplier. The flowchart for the Booth
algorithm is shown below.
[Worked example table, truncated in the source: columns OPERATION, AC, QR, Qn+1, SC;
initial values AC = 0000, QR = 1001, Qn+1 = 0, SC = 4; the first listed operation is
AC + BR, giving AC = 1101, QR = 1100, Qn+1 = 1.]
Again, the building block is the multiplying adder (MA) described previously. However,
the topology is such that the carry-out from one adder is not connected to the carry-in of the
next adder, hence preventing a ripple carry. The circuit diagram below shows the
connections between these blocks.
A division algorithm provides a quotient and a remainder when we divide two numbers.
Division algorithms are generally of two types: slow algorithms and fast algorithms. Slow
division algorithms include restoring, non-restoring, non-performing restoring, and SRT
algorithms; under fast division come Newton-Raphson and Goldschmidt.
Here we perform the restoring algorithm for unsigned integers. The "restoring" term is due
to the fact that the value of register A is restored after each iteration.
Register Q contains the quotient and register A contains the remainder. The n-bit dividend
is loaded in Q and the divisor is loaded in M. The value of register A is initially kept 0, and
this is the register whose value is restored during iteration, due to which the method is
named restoring.
Let’s pick the step involved:
Step-1: First the registers are initialized with the corresponding values (Q =
Dividend, M = Divisor, A = 0, n = number of bits in the dividend)
Step-2: Then the contents of registers A and Q are shifted left as if they were a
single unit
Step-3: Then the content of register M is subtracted from A and the result is stored in A
Step-4: Then the most significant bit of A is checked: if it is 0, the least
significant bit of Q is set to 1; otherwise, if it is 1, the least significant bit of Q is
set to 0 and the value of register A is restored, i.e. to the value of A before the
subtraction of M
Step-5: The value of counter n is decremented
Step-6: If the value of n becomes zero, we exit the loop; otherwise we repeat
from Step-2
Step-7: Finally, register Q contains the quotient and A contains the remainder
Examples:
Perform the restoring division algorithm for
Dividend = 11
Divisor = 3

[Worked example table: columns n, M, A, Q, Operation.]

Remember to restore the value of A whenever the most significant bit of A is 1. At the end,
register Q contains the quotient, i.e. 3, and register A contains the remainder, 2.
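The steps above can be sketched as follows (illustrative Python; Python integers stand in for the registers, with A's sign checked instead of its MSB):

```python
def restoring_divide(dividend, divisor, bits=4):
    """Restoring division for unsigned integers."""
    a, q, m = 0, dividend, divisor
    for _ in range(bits):
        # shift A:Q left one position as a single unit
        a = (a << 1) | ((q >> (bits - 1)) & 1)
        q = (q << 1) & ((1 << bits) - 1)
        a -= m                          # trial subtraction
        if a < 0:
            a += m                      # restore A, quotient bit stays 0
        else:
            q |= 1                      # quotient bit 1
    return q, a                         # quotient, remainder

print(restoring_divide(11, 3))   # (3, 2), matching the worked example
```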
Now we perform non-restoring division. It is less complex than the restoring method
because simpler operations are involved, i.e. addition and subtraction, and no restoring step
is performed inside the loop. The method relies on the sign bit of the register A, which
initially contains zero.
Here is the flow chart given below.
Let’s pick the step involved:
Step-1: First the registers are initialized with the corresponding values (Q =
Dividend, M = Divisor, A = 0, n = number of bits in the dividend)
Step-2: Check the sign bit of register A
Step-3: If it is 1, shift left the contents of AQ and perform A = A + M; otherwise,
shift left AQ and perform A = A - M (meaning add the 2's complement of M to A
and store it in A)
Step-4: Again check the sign bit of register A
Step-5: If the sign bit is 1, Q[0] becomes 0; otherwise Q[0] becomes 1 (Q[0]
means the least significant bit of register Q)
Step-6: Decrement the value of n by 1
Step-7: If n is not equal to zero, go to Step-2; otherwise go to the next step
Step-8: If the sign bit of A is 1, then perform A = A + M
Step-9: Register Q contains the quotient and A contains the remainder
Examples: Perform non-restoring division for unsigned integers
Dividend = 11
Divisor = 3
-M = 11101

[Worked example table: columns N, M, A, Q, Action.]
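The non-restoring steps can be sketched as follows (illustrative Python; Python's signed integers stand in for register A, so checking `a < 0` plays the role of checking A's sign bit):

```python
def non_restoring_divide(dividend, divisor, bits=4):
    """Non-restoring division for unsigned integers."""
    a, q, m = 0, dividend, divisor
    for _ in range(bits):
        msb_q = (q >> (bits - 1)) & 1
        q = (q << 1) & ((1 << bits) - 1)
        if a < 0:                       # sign bit of A is 1: shift, then add M
            a = (a << 1 | msb_q) + m
        else:                           # sign bit of A is 0: shift, then subtract M
            a = (a << 1 | msb_q) - m
        if a >= 0:
            q |= 1                      # Q[0] becomes 1; else it stays 0
    if a < 0:                           # final correction step (Step-8)
        a += m
    return q, a                         # quotient, remainder

print(non_restoring_divide(11, 3))   # (3, 2), matching the worked example
```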
Sign bit is the first bit of the binary representation. ‘1’ implies negative number
and ‘0’ implies positive number.
Example: 11000001110100000000000000000000. This is a negative number, since its first bit is 1.
The exponent is decided by the next 8 bits of the binary representation. 127 is the bias for
32-bit floating-point representation. The bias is determined by 2^(k-1) - 1, where k is the
number of bits in the exponent field.
There are 3 exponent bits in an 8-bit representation and 8 exponent bits in a 32-bit
representation.
Thus
bias = 3 for 8-bit conversion (2^(3-1) - 1 = 4 - 1 = 3)
bias = 127 for 32-bit conversion (2^(8-1) - 1 = 128 - 1 = 127)
Example: 01000001110100000000000000000000
The exponent field is 10000011 = (131)10, and 131 - 127 = 4.
Hence the power of 2 will be 4, i.e. 2^4 = 16.
The mantissa is calculated from the remaining 23 bits of the binary representation. It
consists of an implicit '1' plus a fractional part, obtained by summing each mantissa bit
multiplied by the corresponding negative power of 2.
The fractional part of the mantissa (bits 10100000...) is given by:
1*(1/2) + 0*(1/4) + 1*(1/8) + 0*(1/16) + ... = 0.625
Thus the mantissa will be 1 + 0.625 = 1.625
The decimal number is hence given as Sign * Exponent * Mantissa =
(-1)^0 * (16) * (1.625) = 26
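Decoding the same bit pattern with Python's struct module confirms the hand computation (a verification sketch):

```python
import struct

# Decode the example 32-bit pattern; it should agree with
# (-1)^0 * 2^4 * 1.625 = 26.0.
pattern = "01000001110100000000000000000000"
value = struct.unpack(">f", int(pattern, 2).to_bytes(4, "big"))[0]
print(value)                        # 26.0
print((-1) ** 0 * 2 ** 4 * 1.625)   # 26.0
```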
2. To convert the decimal into floating point, we have 3 elements in a 32-bit floating
point representation:
i) Sign (MSB)
ii) Exponent (8 bits after MSB)
iii) Mantissa (Remaining 23 bits)
Sign bit is the first bit of the binary representation. ‘1’ implies negative number
and ‘0’ implies positive number.
Example: To convert -17 into 32-bit floating point representation Sign bit = 1
The exponent is decided by the nearest power of 2 that is smaller than or equal to the
number. For 17, 16 is the nearest such power, so the power of 2 will be 4, since 2^4 = 16.
127 is the bias for 32-bit floating-point representation; it is determined by 2^(k-1) - 1,
where k is the number of bits in the exponent field.
Thus bias = 127 for 32 bits (2^(8-1) - 1 = 128 - 1 = 127).
Now, 127 + 4 = 131, i.e. 10000011 in binary representation.
Mantissa: 17 in binary = 10001.
Move the binary point so that there is only one bit to the left of it, and adjust the
exponent of 2 so that the value does not change: this is normalizing the number, giving
1.0001 x 2^4. Now consider the fractional part and represent it as 23 bits by appending
zeros:
00010000000000000000000
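Encoding -17 with struct reproduces all three fields derived above (a verification sketch):

```python
import struct

# The single-precision fields of -17.0 should be: sign 1,
# exponent 127 + 4 = 131 (10000011), mantissa 00010000000000000000000.
bits = format(int.from_bytes(struct.pack(">f", -17.0), "big"), "032b")
print(bits[0])    # 1
print(bits[1:9])  # 10000011
print(bits[9:])   # 00010000000000000000000
```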
FLOATING POINT ARITHMETIC OPERATIONS:
Compared to fixed-point addition and subtraction, floating-point addition and subtraction
are more complex and hardware-consuming. This is because the exponent field is not
present in fixed-point arithmetic. Thus floating-point addition and subtraction are not as
simple as their fixed-point counterparts.
The major steps for a floating point addition and subtraction are
Extract the sign of the result from the two sign bits.
Subtract the two exponents E1 and E2. Find the absolute value of the exponent
difference |E1 - E2| and choose the exponent of the greater number.
Shift the mantissa of the lesser number right by |E1 - E2| bits, considering the hidden bits.
Execute the addition or subtraction operation between the shifted mantissa and the
mantissa of the other number. Consider the hidden bits also.
Normalization for addition: in case of addition, if a carry is generated, then the
result is right shifted by 1 bit. This shift operation is reflected in the exponent
computation by an increment operation.
Normalization for subtraction: a normalization step is performed if there are
leading zeros after a subtraction operation. Depending on the leading-zero count,
the obtained result is left shifted. Accordingly, the exponent value is also decremented
by the number of bits equal to the number of leading zeros.
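The steps above can be sketched in Python on an assumed toy format (1 sign bit, a 4-bit exponent, and an 11-bit stored mantissa with a hidden leading 1, matching the worked examples that follow); the operand values in the demo are my own illustration:

```python
# Align-add/subtract on (sign, exponent, mantissa) triples in the toy format.
def fp_add(a, b, mant_bits=11):
    (sa, ea, ma), (sb, eb, mb) = a, b
    ma |= 1 << mant_bits                 # restore the hidden bits
    mb |= 1 << mant_bits
    if ea < eb:                          # make operand a the larger-exponent one
        (sa, ea, ma), (sb, eb, mb) = (sb, eb, mb), (sa, ea, ma)
    mb >>= ea - eb                       # align: shift the lesser mantissa right
    if sa == sb:                         # same signs: add mantissas
        s, m = sa, ma + mb
    else:                                # different signs: subtract, keep the
        s = sa if ma >= mb else sb       # sign of the greater magnitude
        m = abs(ma - mb)
    e = ea
    if m >> (mant_bits + 1):             # carry out: renormalize right
        m >>= 1
        e += 1
    while m and not (m >> mant_bits):    # leading zeros: renormalize left
        m <<= 1
        e -= 1
    return s, e, m & ((1 << mant_bits) - 1)

# 4.5 (1.001 x 2^2, exponent 1001) + 3.75 (1.111 x 2^1, exponent 1000):
print(fp_add((0, 9, 0b00100000000), (0, 8, 0b11100000000)))  # (0, 10, 64)
```

The result (0, 10, 0b00001000000) is the bit pattern 0_1010_00001000000, i.e. 8.25.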
Example: Floating Point Addition
Representation: The two input operands are positive numbers in a short format with 1 sign
bit, a 4-bit exponent, and an 11-bit mantissa; the greater operand has exponent 1001 and the
lesser operand has exponent 1000.
Sign extraction: As both the numbers are positive, the sign of the output will be
positive. Thus S = 0.
Exponent subtraction: 1001 - 1000 = 0001. Thus the result of the subtraction is E = 0001,
and the greater exponent 1001 is chosen for the result.
Shifting of the mantissa of the lesser number: the mantissa of the lesser number (including
its hidden bit) is shifted right by 1 bit.
The result of the mantissa addition is 000010000000 and generates a carry. This means
the result has overflowed the normalized mantissa range.
The output of the adder is right shifted and the exponent value is incremented to get
the correct result. The new mantissa value is now 00001000000, choosing the last 11
bits from the LSB, and the exponent is 1010.
The final result is 0_1010_00001000000, which is equivalent to 8.25 in decimal.
Example: Floating Point Subtraction
Representation: The two input operands use the same short format; the number with the
greater magnitude is negative and has exponent 1010, while the lesser number is positive
and has exponent 1000.
Sign extraction: As the sign of the greater number is negative, S = 1.
Exponent subtraction: 1010 - 1000 = 0010. Thus the exponent difference is 2, and the
greater exponent 1010 is chosen for the result.
Shifting of the mantissa of the lesser number: the mantissa of the lesser number (including
its hidden bit) is shifted right by 2 bits.
The result of the mantissa subtraction has a leading zero. This leading zero indicates
that the result is less than 1.
The output of the adder is left shifted by 1 bit, as there is one leading zero, and the
exponent value is decremented by 1 to get the correct result. The new mantissa
value is formed by choosing the last 11 bits from the LSB, and the exponent is
1001.
The final result is 1_1001_01000100000, which is equivalent to -5.0625 in decimal.
A simple architecture of a floating point adder is shown below in Figure 1.
IEEE 754 FORMAT
The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is a technical standard
for floating-point computation which was established in 1985 by the Institute of Electrical
and Electronics Engineers (IEEE). The standard addressed many problems found in the
diverse floating point implementations that made them difficult to use reliably and reduced
their portability. IEEE Standard 754 floating point is the most common representation
today for real numbers on computers, including Intel-based PCs, Macs, and most Unix
platforms.
There are several ways to represent floating point number but IEEE 754 is the most
efficient in most cases. IEEE 754 has 3 basic components:
1. The Sign of Mantissa –
This is as simple as the name. 0 represents a positive number while 1 represents a
negative number.
2. The Biased exponent –
The exponent field needs to represent both positive and negative exponents. A bias is
added to the actual exponent in order to get the stored exponent.
3. The Normalised Mantissa –
The mantissa is stored in normalized form: only the part after the leading 1 is kept,
and the leading 1 itself is implicit (not stored).
IEEE 754 numbers are divided into two main types based on the above three components:
single precision and double precision.
1. Single precision:
Example (the value being converted here is 85.125 = 1.010101001 x 2^6):
biased exponent 127 + 6 = 133
133 = 10000101
Normalised mantissa = 010101001
we will add 0's to complete the 23 bits
2. Double precision:
biased exponent 1023 + 6 = 1029
1029 = 10000000101
Normalised mantissa = 010101001
we will add 0's to complete the 52 bits
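Assuming the value being encoded is 85.125 (which is what the exponent 6 and mantissa 010101001 imply), struct can cross-check the single-precision fields (a verification sketch):

```python
import struct

# The single-precision fields of 85.125 should be: sign 0,
# exponent 133 (10000101), mantissa 010101001 padded with zeros.
bits = format(int.from_bytes(struct.pack(">f", 85.125), "big"), "032b")
print(bits[0])               # 0
print(bits[1:9])             # 10000101  (133 = 127 + 6)
print(bits[9:].rstrip("0"))  # 010101001
```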
Special Operations –
Operation                  Result
n ÷ ±Infinity              0
±Infinity × ±Infinity      ±Infinity
±nonZero ÷ ±0              ±Infinity
±finite × ±Infinity        ±Infinity
Infinity + Infinity        +Infinity
Infinity - (-Infinity)     +Infinity
-Infinity - Infinity       -Infinity
-Infinity + (-Infinity)    -Infinity
±0 ÷ ±0                    NaN
±Infinity ÷ ±Infinity      NaN
±Infinity × 0              NaN
NaN == NaN                 False
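Several of these rules can be observed directly with Python floats, which are IEEE 754 doubles (a sketch; note that Python raises ZeroDivisionError for float division by zero instead of returning ±Infinity, so those rows are not reproduced here):

```python
import math

inf, nan = math.inf, math.nan

print(1.0 / inf)    # 0.0
print(inf * inf)    # inf
print(inf + inf)    # inf
print(inf - inf)    # nan
print(nan == nan)   # False
```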