Error Correction
Error Control
Used in communications links, error-correcting memories, magnetic
disks (RAID disk arrays), CDs, spacecraft, real-time video/audio (VIP, VoIP), tape backup
Error detection: merely detect when an error has occurred.
Error correction: detect and correct the error.
Correction is much harder!
RAID = Redundant Array of Inexpensive Disks
VIP = Video over Internet Protocol
VoIP = Voice over Internet Protocol
DVD = Digital Video Disk
ECM = Error Correcting Memory
Error Control
If we can detect an error in a data communications link, it may be possible to request a retransmission (ARQ, Automatic Repeat reQuest). ARQ is not suitable in some applications: isochronous (real-time transmission, e.g. speech/video) and storage applications.
Error Detection
Idea: from a received message, we can tell if part of the information has
been corrupted. Analogy 1: Each lighthouse on the coastline has a unique rotation speed, so ships can tell where they are. Lighthouse rotation speeds are allocated so that nearby ones have quite different speeds (in case of timing errors).
Analogy 2: writing a date as day/month/date allows error checking, since (for a given year) the day/month/date must be consistent with the calendar. It can also be error-correcting, if we assume the sender meant a nearby month.
Error Control
Basic approach (for storage or transmission)
1. Split data into chunks (8-bit bytes, 16-bit words, or longer data frames).
2. Append check bits to each frame.
Check bits are redundant in that they do not convey new information. In general, error correction requires a more sophisticated algorithm and more check bits.
Error Control
Obviously we want to:
Minimize the number of check bits, and
Maximize the probability of detecting an error if one occurs.
The check bits must be created in such a way that they are error-protected themselves (i.e. we also check the check bits). Note that most errors occur in bursts, not as single-bit errors.
Error Rate
Need a high probability of detection of errors. Example: an Ethernet data link transmits at 10 Mbps = $10 \times 10^6$ bps. Suppose the probability of a bit error is 1 in 100 million, i.e. $1/10^8$. We would then expect on average $10^7 \times 10^{-8} = 1/10$ error/second, i.e. about one error every 10 seconds.
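The same arithmetic as a minimal Python sketch, assuming independent bit errors with a fixed per-bit error probability (a simplification, since real links tend to produce bursts). The function name `expected_errors_per_second` is hypothetical.

```python
def expected_errors_per_second(bit_rate_bps: float, bit_error_prob: float) -> float:
    """Mean bit errors per second, assuming independent errors with a fixed probability."""
    return bit_rate_bps * bit_error_prob


rate = expected_errors_per_second(10e6, 1e-8)   # 10 Mbps Ethernet, p = 1e-8 per bit
print(f"{rate:.2f} errors/s, about one error every {1 / rate:.0f} s")
```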
Error Tolerance
Error tolerance: file transfer vs. analog coded data. Files must be bit-exact. Analog data may accept some (small) error rate, although an error can affect subsequent bits if compression is used.
Parity Bits
Parity: exclusive-or gates (modulo-2 arithmetic). XOR is a programmable inverter, i.e. A controls whether B is inverted (A = 1) or not inverted (A = 0).
A  B  |  A ⊕ B
0  0  |  0
0  1  |  1
1  0  |  1
1  1  |  0
$A \oplus B = A\overline{B} + \overline{A}B$   (1)
[Figure: parity generator; data bits D3, D2, D1, D0 are XORed together, and an optional final inverter selects even or odd parity.]
Parity can detect single-bit errors. Parity can't correct the error, as there is no way of knowing which bit was in error. If two bits are flipped, the error is undetectable. For a long burst, an even number of flipped bits is roughly as likely as an odd number, so Prob(error not detected) ≈ 0.5. Poor: not much good for burst errors.
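The XOR chain in the parity generator can be modelled with a short Python sketch. It is illustrative only: the function names `parity_bit` and `passes_check` are hypothetical, bits are held as lists of 0/1, and the parity bit is assumed to be appended last. It reproduces the two claims above: one flipped bit is caught, two flipped bits are not.

```python
def parity_bit(bits, odd=False):
    """XOR all data bits together (modulo-2 sum).

    Even parity: the result makes the total count of 1s even.
    Odd parity: the result is inverted, making the total count odd.
    """
    p = 0
    for b in bits:
        p ^= b
    return p ^ 1 if odd else p


def passes_check(word, odd=False):
    """True if a received word (data bits followed by one parity bit) checks out."""
    data, received_parity = word[:-1], word[-1]
    return parity_bit(data, odd) == received_parity


data = [1, 0, 1, 1]                   # D3 D2 D1 D0
word = data + [parity_bit(data)]      # even parity: [1, 0, 1, 1, 1]

one_flip = word.copy()
one_flip[1] ^= 1                      # single-bit error
two_flips = one_flip.copy()
two_flips[2] ^= 1                     # a second error cancels the first

print(passes_check(word), passes_check(one_flip), passes_check(two_flips))
# True False True  -> the double error goes undetected
```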
Row-Column Parity
Example using odd parity. Sending 16 data bits in a block, plus 4 + 4 = 8 check bits (HP = horizontal parity, VP = vertical parity):
      Data       HP
      1 1 1 0    0
      0 0 0 0    1
      0 1 1 1    0
      0 0 0 0    1
VP    0 1 1 0
Each row and each column, together with its parity bit, contains an odd number of 1s. If a single data bit is flipped, the corresponding row and column parity bits will both be wrong.
Row-Column Parity
Single-bit errors can be corrected by intersecting the failing row and column: this locates the bit in error, and we just need to invert it to restore it. What if several bits get corrupted (e.g. along the same row)? The error can still be detected, but not corrected. The efficiency of this scheme is poor: 8 check bits for 16 data bits is 8/16, or 50%, overhead.
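A Python sketch of row-column odd parity using the 4 × 4 example above. The names are hypothetical, the block is a list of rows, and the correction routine assumes at most one error, in a data bit; anything else is reported rather than corrected.

```python
def odd_parity(bits):
    """Odd-parity bit: 1 when the count of 1s is even, so the total becomes odd."""
    return 1 - sum(bits) % 2


def row_col_parity(block):
    """Return (HP, VP): one odd-parity bit per row and one per column."""
    hp = [odd_parity(row) for row in block]
    vp = [odd_parity(col) for col in zip(*block)]
    return hp, vp


def correct_single_error(block, hp, vp):
    """Recompute parity, find the failing row and column, and invert the bit
    where they cross. Assumes at most one error, in a data bit.
    Returns the (row, col) corrected, or None if every check passes."""
    new_hp, new_vp = row_col_parity(block)
    bad_rows = [r for r in range(len(hp)) if hp[r] != new_hp[r]]
    bad_cols = [c for c in range(len(vp)) if vp[c] != new_vp[c]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        r, c = bad_rows[0], bad_cols[0]
        block[r][c] ^= 1
        return r, c
    raise ValueError("error detected, but it is not a single data-bit error")


data = [[1, 1, 1, 0],
        [0, 0, 0, 0],
        [0, 1, 1, 1],
        [0, 0, 0, 0]]
hp, vp = row_col_parity(data)     # hp = [0, 1, 0, 1], vp = [0, 1, 1, 0]
data[2][1] ^= 1                   # corrupt one data bit in transit
print(correct_single_error(data, hp, vp))   # (2, 1)
```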
Error Distance
Say we want to send 00. Encode as 000 000 000. Suppose this
codeword gets corrupted to 000 000 010 due to a single-bit error.
To the receiver, the obvious assumption is that 000 000 010 should have been 000 000 000, which corresponds to 00 originally.
Therefore the error is both detected and corrected.
Error Distance
Suppose the bit pattern gets corrupted to 000 000 011 due to a 2-bit error burst. The receiver knows an error has occurred, as this is an invalid codeword in our error-control coding system. However, it will assume that this codeword should have been the nearest (smallest-distance) valid codeword, 000 000 111, i.e. that the bits 01 were sent in the first place. This is incorrect!
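Nearest-codeword decoding can be sketched in Python as follows. Only the codewords for 00 and 01 come from the text; the codewords for 10 and 11 are assumptions chosen to keep the minimum distance at 3, and the names `CODEBOOK`, `distance` and `decode` are hypothetical.

```python
# Codewords for 00 and 01 are from the text; 10 and 11 are assumed for illustration.
CODEBOOK = {
    "00": "000000000",
    "01": "000000111",
    "10": "111111000",   # assumed
    "11": "111111111",   # assumed
}


def distance(x, y):
    """Number of bit positions in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def decode(received):
    """Nearest-codeword decoding: return (data bits, error detected?)."""
    data = min(CODEBOOK, key=lambda d: distance(CODEBOOK[d], received))
    return data, received not in CODEBOOK.values()


print(decode("000000010"))   # ('00', True): single error detected and corrected
print(decode("000000011"))   # ('01', True): double error detected, wrongly "corrected"
```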
Hamming Distance
This leads to the concept of error distance, commonly called the Hamming distance. d(x, y) is the number of positions in which codewords x and y differ. Example:
x = 0110 1101
y = 0111 0111
Hence d(x, y) = 3. The minimum distance $d_{\min}$ of a codevector set is the smallest Hamming distance between any two codevectors in the set.
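A minimal Python sketch of both definitions, with codewords as bit strings; the function names are hypothetical. It reproduces d(x, y) = 3 for the example and $d_{\min} = 3$ for the two repetition-code codewords given earlier.

```python
from itertools import combinations


def hamming_distance(x: str, y: str) -> int:
    """Count the positions in which two equal-length codewords differ."""
    if len(x) != len(y):
        raise ValueError("codewords must have the same length")
    return sum(a != b for a, b in zip(x, y))


def minimum_distance(codes) -> int:
    """d_min: the smallest Hamming distance over all pairs of distinct codewords."""
    return min(hamming_distance(x, y) for x, y in combinations(codes, 2))


print(hamming_distance("01101101", "01110111"))          # 3
print(minimum_distance(["000000000", "000000111"]))     # 3
```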
Hamming Distance
The error-detecting/correcting capability of a coding system depends on the code's $d_{\min}$. If two codewords are a distance d apart, it takes d single-bit errors to convert one into the other (and thus cause a false decoding).
To detect d errors, a distance d + 1 code is required, because d single-bit errors cannot change a valid codeword into another valid codeword (only into an erroneous codeword, which will be picked up).
To correct d errors, a distance 2d + 1 code is required, because all legal codewords are now spaced so that even if d changes occur, the closest original codeword can still be deduced.
Put another way, a coding system with a minimum Hamming distance $d_{\min}$ can correct up to d errors, provided
$d_{\min} \ge 2d + 1$   (2)
$d \le \frac{1}{2}(d_{\min} - 1)$   (3)
The repetition code example had $d_{\min} = 3$, the minimum number of bit positions by which codewords differed (3 = 2d + 1). So we can correct up to $\frac{1}{2}(d_{\min} - 1) = \frac{1}{2}(3 - 1) = 1$ error, but can detect 2 errors (3 = d + 1).
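Equations (2) and (3), together with the d + 1 rule for detection, reduce to two integer formulas. A minimal sketch with a hypothetical function name:

```python
def capabilities(d_min: int) -> tuple[int, int]:
    """(errors detectable, errors correctable) for a code with minimum distance d_min."""
    detect = d_min - 1               # from d_min >= d + 1
    correct = (d_min - 1) // 2       # from d <= (d_min - 1) / 2, equation (3)
    return detect, correct


for d in range(1, 7):
    print(d, capabilities(d))        # d = 3 gives (2, 1), as in the repetition example
```

Integer division implements the floor in (3): raising $d_{\min}$ from 3 to 4 improves detection but not correction.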
Check Bits
For an ideal single-bit error-correcting code, how many check bits do we need? Using the code example from before to demonstrate, define:
m = the number of message (data) bits to begin with (m = 2 in the example)
n = the total number of bits in the codeword (n = 9)
c = the number of redundant (check) bits.
Hence n = c + m, so $c = n - m = 9 - 2 = 7$.
Check Bits
Number of valid states is $2^m$ (here $2^2 = 4$).
Number of possible codewords is $2^n$ (here $2^9 = 512$).
Number of erroneous states is $2^n - 2^m$ (here $2^9 - 2^2 = 508$).
Substituting n = m + c (total codeword bits = message bits + check bits), the number of erroneous codewords is
$N_e = 2^{m+c} - 2^m = 2^m 2^c - 2^m = 2^m(2^c - 1)$
Hamming Distance
There are $N_v = 2^m$ valid codewords. The ratio of erroneous codewords to valid codewords is
$\frac{N_e}{N_v} = \frac{2^m(2^c - 1)}{2^m}$   (7)
$\phantom{\frac{N_e}{N_v}} = 2^c - 1$   (8)
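There are $2^n$ possible codeword patterns ($2^9$). For single-bit errors, there are $n \cdot 2^m = 9 \cdot 2^2$ possible error patterns, because a single-bit error is possible in any of the n (= 9) bit positions of each valid codeword.
Each of the $2^m$ legal messages has n erroneous codewords at a distance 1 from it, i.e. $n 2^m$ in total. For single-error correction, the erroneous states must be able to hold all of these, so
$2^m(2^c - 1) \ge n 2^m$
$2^c - 1 \ge n$
$2^c - 1 \ge m + c$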
Note the small increase in the number of check bits required as the data
bits go up.
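The bound $2^c - 1 \ge m + c$ can be solved by simple search. This Python sketch (the function name is hypothetical) finds the smallest c for several message lengths; for m = 4 it gives c = 3, the (7,4) case, and for the m = 2 example it gives c = 3, compared with the 7 check bits the repetition code used.

```python
def min_check_bits(m: int) -> int:
    """Smallest c satisfying 2**c - 1 >= m + c (single-error correction bound)."""
    c = 1
    while 2 ** c - 1 < m + c:
        c += 1
    return c


for m in (2, 4, 8, 16, 32, 64, 128):
    c = min_check_bits(m)
    print(f"m = {m:3d}: c = {c}, overhead = {c / m:.0%}")
```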
Hamming Codes
We know the requirements on such a code (the number of check bits needed), so how do we design such an efficient code?
How do we generate the code? That is, does a coding system exist with this efficiency, or do we need more check bits than the theoretical minimum? (Recall the earlier module on Entropy & Huffman codes.) Such a code exists: it is called the Hamming code (Hamming, 1950). It is incorporated into chips; see the data sheet for the Intel 8206, for example.
Hamming Codes
Hamming H(n, m) codes:
n = total length of the codevector
m = number of data (message) bits
c = n − m = redundant check bits
Hamming Codes
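Codeword layout, bit positions 7 (MSB) down to 1 (LSB). Write below each bit position the index's binary code:
Position:  7    6    5    4    3    2    1
Binary:    111  110  101  100  011  010  001
Bit:       m3   m2   m1   c2   m0   c1   c0
The check bits c0, c1, c2 sit in positions 1, 2, 4; each covers the positions whose index has a 1 in the corresponding binary digit:
$c_0 = m_3 \oplus m_1 \oplus m_0$ (positions 7, 5, 3)
$c_1 = m_3 \oplus m_2 \oplus m_0$ (positions 7, 6, 3)
$c_2 = m_3 \oplus m_2 \oplus m_1$ (positions 7, 6, 5)
Take each check equation in turn, and XOR both sides to get the error syndrome bit s:
$s_0 = c_0 \oplus c_0 = c_0 \oplus m_3 \oplus m_1 \oplus m_0$   (15), (16)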
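Similarly, $s_1 = c_1 \oplus m_3 \oplus m_2 \oplus m_0$ and $s_2 = c_2 \oplus m_3 \oplus m_2 \oplus m_1$. XORing a number with itself always yields 0, so each syndrome bit ($s_0 = c_0 \oplus c_0$, etc.) should equal 0. If not, there is an error. Furthermore, the binary value of the syndrome $s_2 s_1 s_0$ points to the bit error position: 0 means no error, 4 means an error in position 4, and so on.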
Example: take the data bits $m_3 m_2 m_1 m_0 = 1011$. Then $c_0 = 1 \oplus 1 \oplus 1 = 1$, $c_1 = 0$, $c_2 = 0$.
With no errors, the syndrome is $s_0 = 1 \oplus 1 \oplus 1 \oplus 1 = 0$, $s_1 = 0$, $s_2 = 0$, i.e. all zero: no errors.
Suppose there is an error in bit m1 , and hence m1 gets flipped from 1 to 0. Repeating the calculations,
$s_0 = 1 \oplus 1 \oplus 0 \oplus 1 = 1$, $s_1 = 0$, $s_2 = 1$
Hence s2 s1 s0 is 101, or 5. Therefore position 5 is in error, which points to m1 . Since we are using the binary system, if we know which bit is incorrect, it is simply a matter of inverting it to correct it.
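The encode, syndrome and correct steps for the (7,4) code can be sketched in Python. The list layout (index 0 holds position 7) and the function names are assumptions for illustration; the check equations are the ones used in the worked example, which the usage lines reproduce.

```python
def hamming74_encode(m3, m2, m1, m0):
    """Return [m3, m2, m1, c2, m0, c1, c0], i.e. bit positions 7 down to 1."""
    c0 = m3 ^ m1 ^ m0      # positions 7, 5, 3 (index LSB = 1)
    c1 = m3 ^ m2 ^ m0      # positions 7, 6, 3 (middle index bit = 1)
    c2 = m3 ^ m2 ^ m1      # positions 7, 6, 5 (index MSB = 1)
    return [m3, m2, m1, c2, m0, c1, c0]


def syndrome(word):
    """Binary value s2 s1 s0: 0 if no error, else the position of the bad bit."""
    m3, m2, m1, c2, m0, c1, c0 = word
    s0 = c0 ^ m3 ^ m1 ^ m0
    s1 = c1 ^ m3 ^ m2 ^ m0
    s2 = c2 ^ m3 ^ m2 ^ m1
    return (s2 << 2) | (s1 << 1) | s0


def hamming74_correct(word):
    """Return (corrected word, syndrome); inverts the bit the syndrome points to."""
    word = list(word)
    pos = syndrome(word)
    if pos:
        word[7 - pos] ^= 1     # list index 0 holds position 7
    return word, pos


sent = hamming74_encode(1, 0, 1, 1)      # [1, 0, 1, 0, 1, 0, 1]
received = sent.copy()
received[7 - 5] ^= 1                     # flip m1 (position 5)
fixed, pos = hamming74_correct(received)
print(pos, fixed == sent)                # 5 True
```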
Take the (7,4) codeword m3 m2 m1 c2 m0 c1 c0 and then concatenate several codewords, one per row, to form a block of k codewords:
m3(0)    m2(0)    m1(0)    c2(0)    m0(0)    c1(0)    c0(0)
m3(1)    m2(1)    m1(1)    c2(1)    m0(1)    c1(1)    c0(1)
m3(2)    m2(2)    m1(2)    c2(2)    m0(2)    c1(2)    c0(2)
...
m3(k−1)  m2(k−1)  m1(k−1)  c2(k−1)  m0(k−1)  c1(k−1)  c0(k−1)
That is, code each group of 4 bits using a standard (7,4) code. Repeat for the next 4 bits, and so on, until a block is formed (say, 256 × 4 data bits). Then transmit the block vertically, i.e. column by column.
An error burst must be longer than a column of the block (256 bits in the previous example) before it hits two bits of the same codeword and can no longer be corrected.
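Column-by-column transmission can be sketched in Python with a small block (8 codewords of 7 bits instead of 256). The function names are hypothetical; all-zero codewords are used so that every 1 received marks an error, which makes it easy to count how many errors each codeword receives from a burst.

```python
def interleave(codewords):
    """Send a block of equal-length codewords column by column."""
    n = len(codewords[0])
    return [cw[j] for j in range(n) for cw in codewords]


def deinterleave(stream, k):
    """Rebuild k codewords from a column-by-column stream."""
    n = len(stream) // k
    return [[stream[j * k + i] for j in range(n)] for i in range(k)]


k, n = 8, 7                                 # 8 codewords of 7 bits (256 in the text)
block = [[0] * n for _ in range(k)]         # all-zero codewords: each 1 marks an error
stream = interleave(block)
for t in range(10, 10 + k):                 # a burst of k consecutive bit errors
    stream[t] ^= 1
print([sum(cw) for cw in deinterleave(stream, k)])   # [1, 1, 1, 1, 1, 1, 1, 1]
```

A burst of k bits therefore leaves at most one error in each codeword, which the (7,4) code can correct.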
Error-Detecting Codes
We have seen error-correcting codes. What about error-detecting codes, for use where a retransmission request is possible?
These can generally provide more powerful error protection with far fewer check bits.
Error-Detecting Codes
Checksums are more suitable for software implementation (they use only addition and shifts); CRCs suit hardware (shifting and xor-ing).
Checksum
Idea: to add up the bytes (or words), each treated as an unsigned 8-bit
(or 16-bit) number.
Using byte calculations and $2^k$ bytes, the largest number required is $2^8 - 1 = 255$ for each byte, and approximately $2^k \times 2^8$ in total. For example, for $2^{10}$ bytes the largest number is just less than $2^{10} \times 2^8 = 2^{18}$. A 16-bit checksum is used in the TCP header and the IP header. It is easy for routers to re-calculate (update) the checksum as datagrams are forwarded.
With modulo-256, the carry bits (10 in the worked example) are ignored. Modulo-255 is similar: the accumulator (running sum) must be large enough to hold the sum, which is then reduced modulo 255 (divided by 255, keeping the remainder). This is equivalent to one's complement addition with end-around carry: the carry out of the MSB is added back in at the LSB (a simple test in software).
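The three variants can be compared with a small Python sketch (function names hypothetical). One detail worth noting: the end-around-carry version can return 255 where the modulo-255 sum returns 0, because one's complement arithmetic has two representations of zero; otherwise the results agree.

```python
def checksum_mod256(data: bytes) -> int:
    """Sum of bytes modulo 256: carries out of the 8-bit sum are discarded."""
    return sum(data) % 256


def checksum_mod255(data: bytes) -> int:
    """Sum in a wide accumulator, then reduced modulo 255."""
    return sum(data) % 255


def ones_complement_sum8(data: bytes) -> int:
    """8-bit one's complement sum: a carry out of the MSB is added back at the LSB."""
    acc = 0
    for b in data:
        acc += b
        if acc > 0xFF:
            acc = (acc & 0xFF) + 1     # end-around carry
    return acc


msg = bytes([200, 100, 50])
print(checksum_mod256(msg), checksum_mod255(msg), ones_complement_sum8(msg))  # 94 95 95
```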
Fletcher Checksum
c0: the modulo-255 sum of each message octet.
c1: the modulo-255 sum of each message octet, weighted in reverse by its position in the message.
The checksum octets are modified so that the receiver's checksum calculation yields zero.
Fletcher Checksum
Checksums $c_0$ and $c_1$ for a length-L message, with value $b_i$ in message octet i, are (both modulo 255):
$c_0 = \sum_{i=0}^{L-1} b_i$   (26)
$c_1 = \sum_{i=0}^{L-1} (L - i)\, b_i$   (27)
The checksum octets are modified so that the receiver calculation yields zero.
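A Python sketch of equations (26) and (27) using two running sums: adding $c_0$ into $c_1$ after every octet weights octet i by L − i. The placement of the check octets is an assumption here: they are appended at the end of the message (protocols using Fletcher may put them elsewhere). Appending octets X then Y turns the sums into $c_0 + X + Y$ and $c_1 + 2c_0 + 2X + Y$; setting both to zero modulo 255 gives $X = -(c_0 + c_1)$ and $Y = c_1$. Function names are hypothetical.

```python
def fletcher_sums(data: bytes) -> tuple[int, int]:
    """Return (c0, c1) of equations (26) and (27), modulo 255, using running sums."""
    c0 = c1 = 0
    for b in data:
        c0 = (c0 + b) % 255     # plain sum of the octets
        c1 = (c1 + c0) % 255    # octet i is added L - i times in total
    return c0, c1


def fletcher_check_octets(data: bytes) -> bytes:
    """Two octets X, Y which, appended to data, make the receiver's sums both zero."""
    c0, c1 = fletcher_sums(data)
    x = (-(c0 + c1)) % 255
    y = c1
    return bytes([x, y])


msg = b"abcde"
print(fletcher_sums(msg))                                 # (240, 200)
print(fletcher_sums(msg + fletcher_check_octets(msg)))    # (0, 0)
```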
Detects all single-bit errors.
Detects all double-bit errors.
Detects 99.999981% of all bursts not exceeding 16 bits.
Detects 99.9985% of all longer bursts.
To calculate, set the header checksum field to 0. Checking is done with the checksum in place, and should yield zero.
Computing the Internet Checksum, R. Braden, D. Borman, C. Partridge, Computer Communication Review, Vol 19, No 2, April 1989, pp 86-101
If additions are done in native byte order (endianness), the result must be stored in native byte order. See RFC 1071 for implementation techniques and RFC 1141 for incremental update techniques. See handout.
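A minimal Python sketch of the Internet checksum in the style of RFC 1071: sum 16-bit words with end-around carry, then take the one's complement. It assumes big-endian (network order) words; the function name is hypothetical, and the 20-byte IPv4 header is a sample for illustration. The usage follows the procedure above: compute with the checksum field set to 0, store the result, then re-check with the checksum in place.

```python
def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"                       # pad an odd-length message
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                        # fold carries back in (end-around carry)
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample IPv4 header with its checksum field (bytes 10-11) set to 0.
header = bytearray.fromhex("45000073000040004011" "0000" "c0a80001c0a800c7")
csum = internet_checksum(bytes(header))
header[10:12] = csum.to_bytes(2, "big")      # store the checksum in the header
print(hex(csum), internet_checksum(bytes(header)))   # 0xb861 0
```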
CRC Calculation
The following introduces the basic concepts behind the CRC:
1. Define a generator (or generator polynomial) of N bits.
2. Treat the data frame to be transmitted as a number, and divide this number by the generator.
3. Transmit the frame followed by the remainder.
4. At the receiver, perform the same calculation and check the remainder.
CRC Calculation
Although it seems that division is required (and hence floating-point computations), the operation is carried out using modulo-2 arithmetic and the exclusive-or (XOR, written as ⊕) digital logic function.
The check bits are appended to the message and the receiver simply
checks for a zero remainder.
CRC Calculation
Define a generator (polynomial) of N bits.
Calculate the N − 1 check bits at the sender with N − 1 zeros appended to the message: divide the message modulo-2 by the generator and ignore the quotient.
Transmit the remainder immediately after the bits of the message (the augmented message).
At the receiver, divide the augmented message by the generator.
A zero remainder indicates that no errors have been detected.
A non-zero remainder indicates that one or more errors occurred and the message must be resent.
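The steps above can be sketched in Python with bit strings, using XOR in place of subtraction (no borrows). The function names are hypothetical. The usage takes the generator 1001 from the CRC example and an example message 10101110: the sender's remainder is 001, and the receiver's remainder over the augmented message is 000.

```python
def mod2_remainder(bits: str, generator: str) -> str:
    """Remainder (N - 1 bits) of modulo-2 long division of bits by an N-bit generator."""
    n = len(generator)
    work = [int(b) for b in bits]
    gen = [int(b) for b in generator]
    for i in range(len(work) - n + 1):
        if work[i]:                   # leading 1: XOR the generator in (no borrows)
            for j in range(n):
                work[i + j] ^= gen[j]
    return "".join(str(b) for b in work[-(n - 1):])


def crc_check_bits(message: str, generator: str) -> str:
    """Sender: remainder of the message with N - 1 zeros appended."""
    return mod2_remainder(message + "0" * (len(generator) - 1), generator)


msg, gen = "10101110", "1001"
crc = crc_check_bits(msg, gen)                # "001"
print(crc, mod2_remainder(msg + crc, gen))   # 001 000
```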
CRC Calculation
The computation may be understood by recalling the manual long-division method. Suppose the calculation is 3421 ÷ 8. Recall how this would be laid out, with the quotient above and the remainder at the end.
CRC Example
Generator = 1001, length N = 4
$g(X) = X^3 + 1$
The sender calculates the N − 1 CRC check bits from the message with N − 1 zeros appended.
CRC Example
Message 10101110 with N − 1 = 3 zeros appended: 10101110 000.
Dividing modulo-2 by the generator polynomial 1001 gives the quotient 10111001 (ignored) and the remainder 001.
CRC result: 001, so the transmitted frame is 10101110 001.
CRC Example
The receiver calculates the N − 1 remainder bits from the message with the N − 1 CRC check bits appended. The remainder should be zero.
CRC Example
Received frame 10101110 001, divided modulo-2 by the generator polynomial 1001: quotient 10111001, remainder 000.
No errors.
CRC Example
Received frame 10101001 001 (three bits corrupted), divided modulo-2 by the generator polynomial 1001: quotient 10111110, remainder 111.
The non-zero remainder indicates an error.
CRC Example
If the error burst is identical to the generator polynomial (at any position in the frame), we fail to detect the error!
CRC Example
Received frame 10100111 001: the error pattern 00001001 000 is the generator 1001 shifted left by 3 bits. Dividing modulo-2 by the generator polynomial 1001 gives quotient 10110001 and remainder 000.
The receiver reports no errors, even though the frame is corrupted.
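Why this happens can be seen in polynomial form: the received frame is the transmitted frame (divisible by g(X)) plus the error pattern E(X), so the remainder equals E(X) mod g(X), and an error goes undetected exactly when g(X) divides E(X). A minimal Python sketch, with bit i of an integer holding the coefficient of $X^i$ (the function name `poly_mod` is hypothetical):

```python
def poly_mod(a: int, g: int) -> int:
    """Remainder of a(X) / g(X) over GF(2); bit i holds the coefficient of X^i."""
    while a.bit_length() >= g.bit_length():
        a ^= g << (a.bit_length() - g.bit_length())
    return a


G = 0b1001                                   # g(X) = X^3 + 1
error = 0b10101110001 ^ 0b10100111001        # sent XOR received = 0b1001000
print(bin(error), poly_mod(error, G))        # 0b1001000 0  -> undetected

# Every burst of length <= 3 (first and last bits 1), at any offset, is caught.
bursts = [0b1, 0b11, 0b101, 0b111]
print(all(poly_mod(b << s, G) != 0 for b in bursts for s in range(16)))  # True
```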
CRC-Ethernet: $X^{32} + X^{26} + X^{23} + X^{22} + X^{16} + X^{12} + X^{11} + X^{10} + X^8 + X^7 + X^5 + X^4 + X^2 + X + 1$
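As a cross-check on this polynomial, here is a minimal bit-at-a-time CRC-32 sketch in Python. It assumes the common CRC-32 conventions (bits processed LSB first using the bit-reversed generator, register preset to all ones, result complemented), which are what Python's `zlib.crc32` computes; the constant names and `crc32_bitwise` are hypothetical. For the ASCII string "123456789" it should give the widely quoted CRC-32 check value 0xCBF43926.

```python
import zlib

# Exponents of the CRC-Ethernet (CRC-32) generator; the X^32 term is implicit.
EXPONENTS = [32, 26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1, 0]
POLY = sum(1 << e for e in EXPONENTS if e < 32)        # 0x04C11DB7
POLY_REFLECTED = int(f"{POLY:032b}"[::-1], 2)         # 0xEDB88320, LSB-first form


def crc32_bitwise(data: bytes) -> int:
    """Bit-at-a-time CRC-32: reflected input/output, preset and final XOR of 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


msg = b"123456789"
print(hex(POLY), hex(crc32_bitwise(msg)), crc32_bitwise(msg) == zlib.crc32(msg))
# 0x4c11db7 0xcbf43926 True
```

Real implementations usually trade this bit loop for a 256-entry lookup table, but the result is the same.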
CRC Performance
CRC-CCITT can catch:
all single-bit errors
all double-bit errors
all bursts of length 16 bits or less
99.997% of 17-bit error bursts
99.998% of 18-bit or longer bursts
Simon Haykin, Digital Communications, Chapter 8: Error-Control Coding, John Wiley & Sons, 1988.
Andrew Tanenbaum, Computer Networks, 3rd ed., Prentice-Hall, 1996.
William Stallings, Data and Computer Communications, 4th ed., Appendix 11A: The ISO Checksum, MacMillan, 1991.
RFC 1071: ftp://ftp.rfc-editor.org/in-notes/rfc1071.txt