Data Representation in Computer Systems
2.1 Introduction
A bit is the most basic unit of information in a computer.
It is a state of "on" or "off" in a digital circuit; these states are sometimes represented as a high or low voltage.
A byte is a group of eight bits. It is the smallest possible addressable unit of computer storage.
A word is a contiguous group of bytes.
Words can be any number of bits or bytes.
Word sizes of 16, 32, or 64 bits are most common.
The binary number 11001 expressed in powers of 2 is:
1 × 2⁴ + 1 × 2³ + 0 × 2² + 0 × 2¹ + 1 × 2⁰ = 16 + 8 + 0 + 0 + 1 = 25
When the radix of a number is something other than 10, the base is denoted by a subscript.
Sometimes, the subscript 10 is added for emphasis: 11001₂ = 25₁₀
First we take the number that we wish to convert and divide it by the radix in which we want to express our result. In this case, 3 goes into 190 sixty-three times, with a remainder of 1. Record the quotient and the remainder.
Continue in this way until the quotient is zero. In the final calculation, we note that 3 divides 2 zero times with a remainder of 2. Our result, reading from bottom to top is:
190₁₀ = 21001₃
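A short Python sketch of this repeated-division procedure (the function name to_radix is ours, for illustration):

    def to_radix(n, radix):
        # Convert a non-negative decimal integer to a digit string in the given
        # radix by repeatedly dividing by the radix and collecting remainders.
        if n == 0:
            return "0"
        digits = []
        while n > 0:
            n, remainder = divmod(n, radix)   # quotient and remainder
            digits.append(str(remainder))     # remainders come out low-order first
        return "".join(reversed(digits))      # read the remainders bottom to top

    print(to_radix(190, 3))   # prints 21001, i.e. 190 in base 3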
You are finished when the fractional part of the product is zero, or when you have reached the desired number of binary places. Our result, reading from top to bottom, is: 0.8125₁₀ = 0.1101₂. This method also works with any base; just use the target radix as the multiplier.
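A companion Python sketch of the repeated-multiplication procedure (the function name frac_to_binary is ours):

    def frac_to_binary(frac, places):
        # Convert a decimal fraction (0 <= frac < 1) to binary by repeated
        # multiplication, stopping at zero or after the requested number of places.
        bits = []
        for _ in range(places):
            frac *= 2
            bit = int(frac)          # the integer part becomes the next binary digit
            bits.append(str(bit))
            frac -= bit
            if frac == 0:
                break
        return "0." + "".join(bits)

    print(frac_to_binary(0.8125, 8))   # prints 0.1101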
For compactness and ease of reading, binary values are usually expressed using the hexadecimal, or base-16, numbering system.
Octal (base 8) values are derived from binary by using groups of three bits (8 = 2³):
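In Python, for instance, the same bit pattern can be regrouped directly (a minimal sketch):

    bits = "110101011"            # an arbitrary binary value
    value = int(bits, 2)

    # Grouping into 4-bit nibbles gives hexadecimal; 3-bit groups give octal.
    print(format(value, "x"))     # 1ab  (0001 1010 1011, padded on the left)
    print(format(value, "o"))     # 653  (110 101 011)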
In an 8-bit word, signed magnitude representation places the absolute value of the number in the 7 bits to the right of the sign bit.
Example:
In 8-bit one's complement, positive 3 is: 00000011. Negative 3 in one's complement is: 11111100. Adding 1 gives us -3 in two's complement form: 11111101.
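A minimal Python sketch of the same computation, using an 8-bit mask:

    def twos_complement(value, bits=8):
        # Return the bit pattern of value in two's complement,
        # assuming it fits in the given number of bits.
        return format(value & ((1 << bits) - 1), f"0{bits}b")

    print(twos_complement(3))     # 00000011
    print(twos_complement(-3))    # 11111101  (one's complement 11111100, plus 1)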
Using the same number of bits, unsigned integers can express twice as many values as signed numbers. Trouble arises if an unsigned value wraps around.
In four bits: 1111 + 1 = 0000.
In most cases, Booth's algorithm carries out multiplication faster and more accurately than naive pencil-and-paper methods. The general idea is to replace arithmetic operations with bit shifting to the extent possible.
        0011     (multiplicand, +3)
      × 0110     (multiplier, +6)
      + 0000     (00 pair: shift only)
     - 0011      (10 pair: subtract the multiplicand)
    + 0000       (11 pair: shift only)
   + 0011        (01 pair: add the multiplicand)
   ____________
    00010010     (= 18)
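A small Python sketch of this recoding (the function name booth_multiply is ours; it scans the multiplier's bit pairs and adds or subtracts shifted copies of the multiplicand):

    def booth_multiply(multiplicand, multiplier, bits=4):
        # Booth's algorithm: examine each multiplier bit together with the bit
        # to its right (an implicit 0 at the start); a 1-0 boundary subtracts a
        # shifted multiplicand, a 0-1 boundary adds one.
        m_bits = format(multiplier & ((1 << bits) - 1), f"0{bits}b")
        product = 0
        previous = "0"                           # implicit 0 to the right
        for i, bit in enumerate(reversed(m_bits)):
            if bit == "1" and previous == "0":
                product -= multiplicand << i     # subtract shifted multiplicand
            elif bit == "0" and previous == "1":
                product += multiplicand << i     # add shifted multiplicand
            previous = bit
        # Scanning stops at the sign bit, which is what gives a negative
        # multiplier its negative weight in two's complement.
        return product

    print(booth_multiply(3, 6))                  # 18, matching 00010010 above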
Signed number overflow means nothing in the context of unsigned numbers, which set a carry flag instead of an overflow flag. If a carry out of the leftmost bit occurs with an unsigned number, overflow has occurred.
Carry and overflow occur independently of each other.
The table on the next slide summarizes these ideas.
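As a concrete illustration, here is a minimal 4-bit sketch in Python (the helper name add_4bit is ours):

    def add_4bit(a, b):
        # Add two 4-bit values and report the carry and (signed) overflow flags separately.
        result = a + b
        carry = result > 0b1111                   # carry out of the leftmost bit (unsigned view)
        result &= 0b1111
        # Signed overflow: both operands have the same sign, but the result's sign differs.
        overflow = ((a ^ result) & (b ^ result) & 0b1000) != 0
        return result, carry, overflow

    print(add_4bit(0b1111, 0b0001))   # (0, True, False): unsigned wraparound, no signed overflow
    print(add_4bit(0b0111, 0b0001))   # (8, False, True): 7 + 1 overflows the signed 4-bit range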
The size of the exponent field determines the range of values that can be represented.
The size of the significand determines the precision of the representation.
The IEEE-754 single precision floating point standard uses an 8-bit exponent and a 23-bit significand. The IEEE-754 double precision standard uses an 11-bit exponent and a 52-bit significand.
For illustrative purposes, we will use a 14-bit model with a 5-bit exponent and an 8-bit significand.
The significand of a floating-point number is always preceded by an implied binary point. Thus, the significand always contains a fractional binary value. The exponent indicates the power of 2 to which the significand is raised.
Not only do these synonymous representations waste space, but they can also cause confusion.
Another problem with our system is that we have made no allowances for negative exponents. We have no way to express 0.5 (= 2⁻¹)! (Notice that there is no sign in the exponent field!)
All of these problems can be fixed with no changes to our basic model.
In our model, exponent values less than 16 are negative, representing fractional numbers.
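A rough Python sketch of this excess-16 (biased) encoding under the 14-bit model described above; the helper name encode_14bit is ours, it handles only simple positive values, and it truncates the significand to eight bits:

    def encode_14bit(value):
        # Encode a positive, nonzero value in the illustrative 14-bit model:
        # 1 sign bit, a 5-bit excess-16 exponent, and an 8-bit significand with
        # the radix point assumed to sit just left of the significand.
        exponent = 0
        while value >= 1:                  # normalize so that 0.5 <= value < 1 ...
            value /= 2
            exponent += 1
        while value < 0.5:                 # ... so the significand begins with a 1
            value *= 2
            exponent -= 1
        significand = int(value * 2 ** 8)  # keep the first 8 fraction bits
        biased = exponent + 16             # stored exponent = true exponent + 16
        return "0 " + format(biased, "05b") + " " + format(significand, "08b")

    print(encode_14bit(0.5))    # 0 10000 10000000  (stored exponent 16 means 2**0)
    print(encode_14bit(17))     # 0 10101 10001000  (0.10001 x 2**5)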
The double precision standard has a bias of 1023 over its 11-bit exponent.
The special exponent value for a double precision number is 2047, instead of the 255 used by the single precision standard.
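As a quick check against the real standard, Python's struct module can expose these fields of a single precision value (a minimal sketch):

    import struct

    def ieee_single_fields(x):
        # Unpack the sign, 8-bit biased exponent, and 23-bit fraction of an
        # IEEE-754 single precision value.
        bits = struct.unpack(">I", struct.pack(">f", x))[0]
        sign = bits >> 31
        exponent = (bits >> 23) & 0xFF     # biased by 127
        fraction = bits & 0x7FFFFF
        return sign, exponent, fraction

    print(ieee_single_fields(0.5))   # (0, 126, 0): exponent 126 - 127 = -1, so 1.0 x 2**-1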
This is why programmers should avoid testing a floating-point value for equality to zero.
Negative zero does not equal positive zero.
There are other problems with floating point numbers. Because of truncated bits, you cannot always assume that a particular floating point operation is associative or distributive.
Moreover, to test a floating-point value for equality to some other number, first decide how close two values must be to be considered equal. Call this value epsilon and use a test such as:
if (abs(x - y) < epsilon) then ...
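A concrete Python illustration; the epsilon value here is an arbitrary tolerance chosen for the example:

    epsilon = 1e-9                   # tolerance chosen for this illustration

    x = 0.1 + 0.2
    y = 0.3

    print(x == y)                    # False: truncated bits make the two values differ
    print(abs(x - y) < epsilon)      # True: they are equal to within epsilon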
The Unicode codespace is divided into six parts. The first part is for Western alphabet codes, including English, Greek, and Russian.
Thus, error detection and correction is critical to accurate data transmission, storage and retrieval.
Longer data streams require more economical and sophisticated error detection mechanisms. Cyclic redundancy checking (CRC) codes provide error detection for large blocks of data.
You will fully understand why modulo 2 arithmetic is so handy after you study digital circuits in Chapter 3.
As with traditional division, we note that the dividend is divisible once by the divisor. We place the divisor under the dividend and perform modulo 2 subtraction.
Now we bring down the next bit of the dividend. We see that 00101 is not divisible by 1101. So we place a zero in the quotient.
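A short Python sketch of the complete modulo 2 division; the message and divisor below are our own small example rather than the one worked on the slides:

    def mod2_divide(dividend, divisor):
        # Perform the modulo 2 (XOR) long division used by CRC codes and
        # return the remainder as a bit string.
        dividend = list(dividend)
        n = len(divisor)
        for i in range(len(dividend) - n + 1):
            if dividend[i] == "1":                  # the divisor "goes into" this window once
                for j in range(n):                  # modulo 2 subtraction is just XOR
                    dividend[i + j] = str(int(dividend[i + j]) ^ int(divisor[j]))
        return "".join(dividend[-(n - 1):])         # the last n-1 bits are the remainder

    print(mod2_divide("1001000", "1101"))   # '011': the CRC remainder for message 1001
    print(mod2_divide("1001011", "1101"))   # '000': appending the remainder makes it divisible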
Thus, to provide data integrity over the long term, error correcting codes are required.
Because the mathematics of Hamming codes is much simpler than that of Reed-Solomon codes, we discuss Hamming codes in detail.
The minimum Hamming distance for a code is the smallest Hamming distance between all pairs of words in the code.
Thus, a Hamming distance of 2k + 1 is required to be able to correct k errors in any data word.
Hamming distance is provided by adding a suitable number of parity bits to a data word.
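A minimal Python sketch of both definitions, applied to an arbitrary four-word code:

    def hamming_distance(a, b):
        # Count the bit positions in which two equal-length code words differ.
        return sum(bit_a != bit_b for bit_a, bit_b in zip(a, b))

    code = ["00000", "01011", "10101", "11110"]
    # The minimum distance over all pairs determines how many errors the code can handle.
    d_min = min(hamming_distance(a, b)
                for i, a in enumerate(code) for b in code[i + 1:])
    print(d_min)    # 3 for this small code, enough to correct one single-bit error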
(n + 1) × 2^m ≤ 2^n
Because n = m + r, we can rewrite the inequality as:
(m + r + 1) × 2^m ≤ 2^(m + r), or (m + r + 1) ≤ 2^r
This inequality gives us a lower limit on the number of check bits that we need in our code words.
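A small Python sketch that searches for the smallest r satisfying this inequality for a given number of data bits m:

    def check_bits_needed(m):
        # Smallest number of check bits r with (m + r + 1) <= 2**r,
        # i.e. the lower limit for a single-error-correcting Hamming code.
        r = 1
        while (m + r + 1) > 2 ** r:
            r += 1
        return r

    print(check_bits_needed(8))    # 4 check bits for an 8-bit data word
    print(check_bits_needed(32))   # 6 check bits for a 32-bit data word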
1 (= 2⁰) contributes to all of the odd-numbered digits. 2 (= 2¹) contributes to digits 2, 3, 6, 7, 10, and 11. And so forth.
Bit 1 checks digits 3, 5, 7, 9, and 11, so its value is 1. Bit 4 checks digits 5, 6, 7, and 12, so its value is 1. Bit 8 checks digits 9, 10, 11, and 12, so its value is also 1.
Using the Hamming algorithm, we can not only detect single bit errors in this code word, but also correct them!
Suppose an error occurs in bit 5, as shown above. Our parity bit values are:
Bit 1 checks digits 3, 5, 7, 9, and 11. Its value is 1, but it should be zero. Bit 2 checks digits 2, 3, 6, 7, 10, and 11. Its value of zero is correct. Bit 4 checks digits 5, 6, 7, and 12. Its value is 1, but it should be zero. Bit 8 checks digits 9, 10, 11, and 12. This bit is correct.
We have failing parity checks in positions 1 and 4. With two parity bits that don't check, we know that the error is in the data, and not in a parity bit. Which data bit is in error? We find out by adding the positions of the failing parity bits. Simply, 1 + 4 = 5. This tells us that the error is in bit 5. If we change bit 5 to a 1, all parity bits check and our data is restored.
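The entire scheme fits in a short Python sketch; the function names and the data bits below are our own illustrative choices, not the code word worked on the slides:

    def hamming_encode(data_bits):
        # Place the data bits in the non-power-of-two positions of a code word and
        # set each parity bit p so that the group of positions containing p in
        # their binary expansion has even parity.
        m = len(data_bits)
        r = 1
        while (m + r + 1) > 2 ** r:        # the inequality from the earlier slide
            r += 1
        n = m + r
        word = [0] * (n + 1)               # index 0 unused; positions are 1..n
        it = iter(data_bits)
        for pos in range(1, n + 1):
            if pos & (pos - 1):            # not a power of two: a data position
                word[pos] = int(next(it))
        for p in (2 ** i for i in range(r)):
            parity = 0
            for pos in range(1, n + 1):
                if pos & p:
                    parity ^= word[pos]
            word[p] = parity               # make the group's parity even
        return word[1:]

    def hamming_correct(word):
        # Recompute each parity check; the positions of the failing checks sum to
        # the position of the erroneous bit (a syndrome of 0 means no error).
        word = list(word)
        n = len(word)
        syndrome = 0
        p = 1
        while p <= n:
            parity = 0
            for pos in range(1, n + 1):
                if pos & p:
                    parity ^= word[pos - 1]
            if parity:
                syndrome += p
            p *= 2
        if syndrome:
            word[syndrome - 1] ^= 1        # flip the bad bit to correct it
        return syndrome, word

    code = hamming_encode("10101011")      # 8 data bits -> a 12-bit code word
    code[4] ^= 1                           # introduce a single-bit error at position 5
    print(hamming_correct(code))           # syndrome 5; the original code word is restored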
Chapter 2 Conclusion
Computers store data in the form of bits, bytes, and words using the binary numbering system. Hexadecimal numbers are formed using four-bit groups called nibbles (or nybbles).
Signed integers can be stored in one's complement, two's complement, or signed magnitude representation.
Floating-point numbers are usually coded using the IEEE 754 floating-point standard.
Floating-point operations are not necessarily associative or distributive. Character data is stored using ASCII, EBCDIC, or Unicode.
Error detecting and correcting codes are necessary because we can expect no transmission or storage medium to be perfect.
CRC, Reed-Solomon, and Hamming codes are three important error control codes.
End of Chapter 2