Digital Computers
2. Signed Numbers:
Signed numbers carry a sign flag that distinguishes positive numbers from
negative ones. The representation therefore holds both a sign bit and the
magnitude of the number. This is analogous to decimal notation, where a negative
number is written by putting a minus sign in front of its magnitude.
Representation of Signed Binary Numbers:
There are three representations for signed binary numbers: Sign-Magnitude form,
1’s complement form, and 2’s complement form, each explained below. In all
three, a sign bit of 0 marks a positive number and a sign bit of 1 marks a
negative number. In the first two forms the extra sign bit gives the number zero
two representations, positive (+0) and negative (-0), so these forms are
ambiguous. The 2’s complement form is unambiguous because zero has only one
representation.
Sign-Magnitude form:
For an n bit binary number, 1 bit is reserved for the sign. If the sign bit is 0,
the number is positive; if the sign bit is 1, the number is negative. The
remaining (n-1) bits represent the magnitude of the number. Since the magnitude
of zero is always 0, zero has two representations, positive (+0) and negative
(-0), depending on the value of the sign bit. This form is therefore ambiguous.
The sign bit is generally the most significant bit (MSB) of the representation.
The range of Sign-Magnitude form is from -(2^(n-1) - 1) to +(2^(n-1) - 1).
For example, the range of a 6 bit Sign-Magnitude binary number is from
-(2^5 - 1) to +(2^5 - 1), that is, from the minimum value -31 (i.e., 1 11111) to
the maximum value +31 (i.e., 0 11111). Zero has two representations, -0 (i.e.,
1 00000) and +0 (i.e., 0 00000).
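The encoding described above can be sketched in Java as follows. This is a minimal illustration, not a standard library facility: the class name SignMagnitudeDemo and the helpers encode and decode are hypothetical names chosen for this example, and words are handled as bit strings for readability.

```java
final class SignMagnitudeDemo {
    // Encodes value as an n-bit sign-magnitude string:
    // one sign bit followed by n-1 magnitude bits. (Hypothetical helper.)
    static String encode(int value, int bits) {
        int max = (1 << (bits - 1)) - 1; // largest magnitude, 2^(n-1) - 1
        if (value < -max || value > max) {
            throw new IllegalArgumentException(value + " does not fit in " + bits + " bits");
        }
        StringBuilder word = new StringBuilder(Integer.toBinaryString(Math.abs(value)));
        while (word.length() < bits - 1) {
            word.insert(0, '0'); // left-pad the magnitude to n-1 bits
        }
        return word.insert(0, value < 0 ? '1' : '0').toString();
    }

    // Decodes a sign-magnitude bit string back to an int.
    static int decode(String word) {
        int magnitude = Integer.parseInt(word.substring(1), 2);
        return word.charAt(0) == '1' ? -magnitude : magnitude;
    }

    public static void main(String[] args) {
        System.out.println(encode(31, 6));    // 011111
        System.out.println(encode(-31, 6));   // 111111
        System.out.println(decode("000000")); // 0 (positive zero)
        System.out.println(decode("100000")); // 0 (negative zero, same value)
    }
}
```

Both 000000 and 100000 decode to the same integer, which is the ambiguity of this form.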
1’s complement form:
The 1’s complement of a number is obtained by inverting each bit of the number.
In this form, positive numbers are represented in plain binary and negative
numbers as the 1’s complement of the corresponding positive number. The MSB
serves as the sign bit. If the sign bit is 0, the number is positive and the
word is read directly as a binary number; if the sign bit is 1, the number is
negative and its magnitude is found by taking the 1’s complement of the word.
Taking the 1’s complement of a positive number gives the negative number, and
taking the 1’s complement of a negative number gives the positive number. As a
consequence, zero has two representations in this form, so 1’s complement form
is also ambiguous. The range of 1’s complement form is from -(2^(n-1) - 1) to
+(2^(n-1) - 1).
For example, the range of a 6 bit 1’s complement binary number is from
-(2^5 - 1) to +(2^5 - 1), that is, from the minimum value -31 (i.e., 1 00000) to
the maximum value +31 (i.e., 0 11111). Zero has two representations, -0 (i.e.,
1 11111) and +0 (i.e., 0 00000).
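A small Java sketch of the 1’s complement rule may make the bit inversion concrete. The class name OnesComplementDemo and its helpers encode, decode and toBits are assumptions made for this illustration; Java itself stores integers in 2’s complement, so the n-bit word is simulated with a mask.

```java
final class OnesComplementDemo {
    // Encodes value as an n-bit 1's complement string. (Hypothetical helper.)
    static String encode(int value, int bits) {
        int max = (1 << (bits - 1)) - 1; // 2^(n-1) - 1
        if (value < -max || value > max) {
            throw new IllegalArgumentException(value + " does not fit in " + bits + " bits");
        }
        int mask = (1 << bits) - 1;
        // Negative numbers: invert every bit of the positive value, keep n bits.
        int pattern = value >= 0 ? value : ~(-value) & mask;
        return toBits(pattern, bits);
    }

    // Decodes an n-bit 1's complement string back to an int.
    static int decode(String word) {
        int mask = (1 << word.length()) - 1;
        int pattern = Integer.parseInt(word, 2);
        boolean negative = word.charAt(0) == '1';
        return negative ? -(~pattern & mask) : pattern;
    }

    // Formats the low n bits of pattern as a zero-padded bit string.
    static String toBits(int pattern, int bits) {
        StringBuilder sb = new StringBuilder(Integer.toBinaryString(pattern));
        while (sb.length() < bits) {
            sb.insert(0, '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(-31, 6));   // 100000
        System.out.println(encode(-5, 6));    // 111010
        System.out.println(decode("111111")); // 0 (negative zero)
    }
}
```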
2’s complement form:
The 2’s complement of a number is obtained by inverting each bit of the number
and then adding 1 at the least significant bit (LSB). In this form, positive
numbers are represented in plain binary and negative numbers as the 2’s
complement of the corresponding positive number. The MSB serves as the sign bit.
If the sign bit is 0, the number is positive and the word is read directly as a
binary number; if the sign bit is 1, the number is negative and its magnitude is
found by taking the 2’s complement of the word. In this representation, zero has
only one (unique) representation, which is always positive. The range of 2’s
complement form is from -(2^(n-1)) to +(2^(n-1) - 1).
For example, the range of a 6 bit 2’s complement binary number is from -(2^5) to
+(2^5 - 1), that is, from the minimum value -32 (i.e., 1 00000) to the maximum
value +31 (i.e., 0 11111). Zero has a single representation, 0 (i.e., 0 00000);
the pattern 1 11111 represents -1, not -0.
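The following Java sketch illustrates the 2’s complement rule (invert, then add 1) on n-bit words. The class name TwosComplementDemo and the helpers encode, decode, negate and toBits are hypothetical names for this example; the n-bit word is simulated with a mask.

```java
final class TwosComplementDemo {
    // Encodes value as an n-bit 2's complement string. (Hypothetical helper.)
    static String encode(int value, int bits) {
        int min = -(1 << (bits - 1));    // -(2^(n-1))
        int max = (1 << (bits - 1)) - 1; // 2^(n-1) - 1
        if (value < min || value > max) {
            throw new IllegalArgumentException(value + " does not fit in " + bits + " bits");
        }
        int mask = (1 << bits) - 1;
        return toBits(value & mask, bits); // Java ints are already 2's complement
    }

    // Decodes an n-bit 2's complement string back to an int.
    static int decode(String word) {
        int pattern = Integer.parseInt(word, 2);
        return word.charAt(0) == '1' ? pattern - (1 << word.length()) : pattern;
    }

    // Negates a word: invert every bit, add 1 at the LSB, keep n bits.
    static String negate(String word) {
        int bits = word.length();
        int mask = (1 << bits) - 1;
        int pattern = Integer.parseInt(word, 2);
        return toBits((~pattern + 1) & mask, bits);
    }

    // Formats the low n bits of pattern as a zero-padded bit string.
    static String toBits(int pattern, int bits) {
        StringBuilder sb = new StringBuilder(Integer.toBinaryString(pattern));
        while (sb.length() < bits) {
            sb.insert(0, '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(-32, 6));   // 100000
        System.out.println(negate("000101")); // 111011 (that is, -5)
        System.out.println(negate("000000")); // 000000 (zero is unique)
        System.out.println(decode("111111")); // -1
    }
}
```

Negating 000000 yields 000000 again, which is why this form has no negative zero.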
Fixed point arithmetic operations
•Addition
•Subtraction
•Multiplication
•Division
Unsigned binary numbers are always treated as non-negative integers. In signed
binary numbers the MSB acts as the sign bit: it is 0 for positive numbers and 1
for negative numbers.
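As a rough illustration of fixed point arithmetic on signed words, the sketch below uses Java's 32-bit int, which is a 2’s complement type. The class name FixedPointDemo and the helpers subtractViaComplement, signBit and addOverflows are hypothetical; Math.addExact is the standard library method that throws on overflow.

```java
final class FixedPointDemo {
    // Subtraction performed as addition of the 2's complement of b.
    static int subtractViaComplement(int a, int b) {
        return a + (~b + 1);
    }

    // Returns the most significant (sign) bit of a 32-bit int: 0 or 1.
    static int signBit(int x) {
        return x >>> 31;
    }

    // Reports whether a + b leaves the range of int.
    static boolean addOverflows(int a, int b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(subtractViaComplement(7, 12));                   // -5
        System.out.println(signBit(-5));                                    // 1
        System.out.println(signBit(5));                                     // 0
        System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE);     // true (wraparound)
        System.out.println(addOverflows(Integer.MAX_VALUE, 1));             // true
    }
}
```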
• In floating point representation, a number has two parts: the first part is
the mantissa (or fraction) and the second part is the exponent.
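To make the two parts visible, the sketch below pulls the sign, exponent and fraction fields out of a Java double, assuming the IEEE 754 binary64 layout (1 sign bit, 11 exponent bits with a bias of 1023, 52 fraction bits). The class name DoubleFieldsDemo and its helper methods are hypothetical; Double.doubleToLongBits is the standard library call.

```java
final class DoubleFieldsDemo {
    // Sign bit: 0 for positive, 1 for negative.
    static int sign(double d) {
        return (int) (Double.doubleToLongBits(d) >>> 63);
    }

    // Unbiased exponent of a normal number (stored exponent minus 1023).
    static int exponent(double d) {
        return (int) ((Double.doubleToLongBits(d) >>> 52) & 0x7FF) - 1023;
    }

    // The 52 stored fraction (mantissa) bits, without the implicit leading 1.
    static long fraction(double d) {
        return Double.doubleToLongBits(d) & ((1L << 52) - 1);
    }

    public static void main(String[] args) {
        double d = 6.5; // 6.5 = 1.625 * 2^2, and 0.625 = 0.101 in binary
        System.out.println(sign(d));                        // 0
        System.out.println(exponent(d));                    // 2
        System.out.println(Long.toHexString(fraction(d)));  // a000000000000
    }
}
```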
Overflow
As with the integer data types, we might expect a wraparound such as:
assertTrue(Double.MAX_VALUE + 1 == Double.MIN_VALUE);
However, that is not the case for floating-point variables. The following is true:
assertTrue(Double.MAX_VALUE + 1 == Double.MAX_VALUE);
This is because a double value has only a limited number of significant bits. If
we increase the value of a large double value by only one, we do not change
any of the significant bits. Therefore, the value stays the same.
If we increase the value by enough to exceed the largest representable
magnitude, the variable will have the value INFINITY:
assertTrue(Double.MAX_VALUE * 2 == Double.POSITIVE_INFINITY);
and NEGATIVE_INFINITY for negative values:
assertTrue(Double.MAX_VALUE * -2 == Double.NEGATIVE_INFINITY);
We can see that, unlike for integers, there's no wraparound, but two different
possible outcomes of the overflow: the value stays the same, or we get one of
the special values, POSITIVE_INFINITY or NEGATIVE_INFINITY.
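The two outcomes can be reproduced with the sketch below. It assumes standard IEEE 754 round-to-nearest behaviour, and the class name DoubleOverflowDemo and helper staysSame are hypothetical. Math.ulp returns the spacing between adjacent doubles at a given magnitude, which shows how large an addend must be before it has any effect.

```java
final class DoubleOverflowDemo {
    // True if adding addend to MAX_VALUE is lost to rounding.
    static boolean staysSame(double addend) {
        return Double.MAX_VALUE + addend == Double.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Spacing between MAX_VALUE and its neighbours (2^971).
        double gap = Math.ulp(Double.MAX_VALUE);

        System.out.println(staysSame(1));                            // true
        System.out.println(Double.MAX_VALUE + gap);                  // Infinity
        System.out.println(Double.isInfinite(Double.MAX_VALUE * 2)); // true
        System.out.println(Double.MAX_VALUE * -2);                   // -Infinity
    }
}
```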
Underflow
There are two constants defined for the minimum values of
a double value: MIN_VALUE (4.9e-324), the smallest positive value, and
MIN_NORMAL (2.2250738585072014E-308), the smallest positive normal value.
The IEEE Standard for Floating-Point Arithmetic (IEEE 754) explains the
difference between the two in detail.
Let's focus on why we need a minimum value for floating-point numbers at all.
A double value cannot be arbitrarily small as we only have a limited number
of bits to represent the value.
The chapter about Types, Values, and Variables in the Java SE language
specification describes how floating-point types are represented. The minimum
exponent for the binary representation of a double is given as -1074. That means
the smallest positive value a double can have is Math.pow(2, -1074), which is
equal to 4.9e-324.
As a consequence, the precision of a double in Java does not support values
between 0 and 4.9e-324, or between -4.9e-324 and 0 for negative values.
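The relationship between the two minimum constants can be checked with a short sketch. The class name DoubleMinimumDemo and the helper isSubnormal are hypothetical; Math.nextUp and Math.scalb are standard library methods.

```java
final class DoubleMinimumDemo {
    // True for non-zero values smaller in magnitude than MIN_NORMAL.
    static boolean isSubnormal(double d) {
        return d != 0.0 && Math.abs(d) < Double.MIN_NORMAL;
    }

    public static void main(String[] args) {
        // The first double above zero is MIN_VALUE, which equals 2^-1074.
        System.out.println(Math.nextUp(0.0) == Double.MIN_VALUE);        // true
        System.out.println(Math.scalb(1.0, -1074) == Double.MIN_VALUE);  // true

        // Values between MIN_VALUE and MIN_NORMAL exist, with reduced precision.
        System.out.println(isSubnormal(Double.MIN_NORMAL / 2));          // true
        System.out.println(isSubnormal(Double.MIN_NORMAL));              // false

        // Nothing lies between 0 and MIN_VALUE.
        System.out.println(Double.MIN_VALUE / 2);                        // 0.0
    }
}
```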
So what happens if we attempt to assign a too-small value to a variable of
type double? Let's look at an example:
for (int i = 1073; i <= 1076; i++) {
    System.out.println("2^-" + i + " = " + Math.pow(2, -i));
}
With output:
2^-1073 = 1.0E-323
2^-1074 = 4.9E-324
2^-1075 = 0.0
2^-1076 = 0.0
We see that if we assign a value that's too small, we get an underflow, and the
resulting value is 0.0 (positive zero).
Similarly, for negative values, an underflow will result in a value of -0.0 (negative
zero).
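The negative case can be sketched as follows. Because -0.0 == 0.0 evaluates to true in Java, the sign of a zero has to be detected another way, for example with Double.compare or by dividing by it. The class name NegativeZeroDemo and the helper isNegativeZero are hypothetical.

```java
final class NegativeZeroDemo {
    // True only for -0.0; Double.compare orders -0.0 before +0.0.
    static boolean isNegativeZero(double d) {
        return Double.compare(d, -0.0) == 0;
    }

    public static void main(String[] args) {
        // Halving the smallest negative magnitude underflows to negative zero.
        double underflowed = -Double.MIN_VALUE / 2;

        System.out.println(underflowed);                 // -0.0
        System.out.println(underflowed == 0.0);          // true
        System.out.println(isNegativeZero(underflowed)); // true
        System.out.println(isNegativeZero(0.0));         // false
        System.out.println(1.0 / underflowed);           // -Infinity
    }
}
```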
// The method header and the computation of result are assumed here; the
// source shows only the checks. The name powExact is hypothetical.
public static double powExact(double base, double exponent) {
    double result = Math.pow(base, exponent);
    if (result == Double.POSITIVE_INFINITY) {
        throw new ArithmeticException("Double overflow resulting in POSITIVE_INFINITY");
    } else if (result == Double.NEGATIVE_INFINITY) {
        throw new ArithmeticException("Double overflow resulting in NEGATIVE_INFINITY");
    } else if (Double.compare(-0.0, result) == 0) {
        throw new ArithmeticException("Double underflow resulting in negative zero");
    } else if (Double.compare(+0.0, result) == 0) {
        throw new ArithmeticException("Double underflow resulting in positive zero");
    }
    return result;
}