Intro To ICT 11
and Communication
The Digital Information Age:
An Introduction To Electrical Engineering
2nd Edition
11.2 Data Compression Textbook Sec. 9.2
“Since channel capacity and memory size are finite, data compression techniques are very useful.”
❖ Source model
1) Source has a vocabulary consisting of m unique symbols: $X_i$ for $1 \le i \le m$
2) Source creates a data file containing a total of $n_T$ symbols
▪ Each symbol comes from the vocabulary.
▪ Symbol probabilities: $P[X_i]$ for $1 \le i \le m$, with $\sum_{i=1}^{m} P[X_i] = 1$
▪ If all symbols are equally likely to be generated, $P[X_1] = P[X_2] = P[X_3] = \cdots = P[X_m] = 1/m$
Source entropy: $H_S = -\sum_{i=1}^{m} P[X_i] \log_2 P[X_i]$ bits/symbol
▪ The average number of bits required to convert one symbol of the data file into binary.
▪ Equivalently, the average amount of information per symbol in the data file generated by the source.
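The entropy formula can be checked numerically. The following is a minimal sketch, assuming the symbol probabilities are given as a plain list; the function name `source_entropy` and the example probabilities are illustrative and do not come from the slides.

```python
import math

def source_entropy(probabilities):
    """Return H_S = -sum(P[X_i] * log2(P[X_i])) in bits/symbol."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Symbols with zero probability contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Equally likely symbols (P[X_i] = 1/m) give H_S = log2(m).
print(source_entropy([0.25, 0.25, 0.25, 0.25]))
# A skewed source carries less information per symbol.
print(source_entropy([0.5, 0.25, 0.125, 0.125]))
```

With m equally likely symbols the result equals log2(m), which is the largest value the entropy can take for a vocabulary of that size.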
⚫ File size: $n_T = 600 \times 200 \times 5 = 6 \times 10^5$ (total no. of symbols in the book)
: the minimum no. of bits required to encode (convert to a binary file) a data file
Fixed-length code words (3 bits)
Ex) binary representation of symbols
A : 000, B : 001, C : 010, D : 011, E : 100
▪ The data file consists of a total of 10 symbols. What is the size of the binary file?
→ 3 bits/symbol × 10 symbols = 30 bits
⚫ However, the effective file entropy calculated earlier was 21.2 bits.
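The fixed-length file size can be reproduced with a few lines. This is a minimal sketch, assuming every symbol receives the same number of bits and that this number is the smallest one able to distinguish m symbols; the helper name `fixed_length_file_bits` is hypothetical.

```python
def fixed_length_file_bits(vocabulary_size, n_symbols):
    """Size in bits of a file encoded with equal-length code words."""
    # Smallest bit count that can label `vocabulary_size` distinct symbols.
    bits_per_symbol = max(1, (vocabulary_size - 1).bit_length())
    return bits_per_symbol * n_symbols

size = fixed_length_file_bits(vocabulary_size=5, n_symbols=10)
print(size)         # fixed-length size in bits
print(size - 21.2)  # bits spent beyond the effective file entropy quoted above
```

The gap between the two numbers is what variable-length code words try to recover.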
⚫ Key idea!
▪ Represent frequently appearing symbols with short code words (few bits).
▪ Represent rarely appearing symbols with long code words (more bits).
→ The average no. of bits per code word approaches the source entropy (or effective entropy).
❖ Huffman code
⚫ Implementing variable-length code words using code trees
▪ A leaf in a code tree is a code word for a specific symbol.
• A : 1
• B : 01
• C : 001
• D : 000
(Figure: code tree and the resulting variable-length code words)
⚫ Continue the above process until the final decoding result is obtained.
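The decoding loop can be sketched as follows, using the code words A : 1, B : 01, C : 001, D : 000 from the code tree. This is an illustrative sketch only: the function name `decode` and the sample bit string are assumptions, and matching accumulated bits against a lookup table stands in for walking the tree from the root to a leaf.

```python
CODE_TABLE = {"A": "1", "B": "01", "C": "001", "D": "000"}

def decode(bits, code_table):
    """Decode a bit string produced with a prefix-free code table."""
    inverse = {word: symbol for symbol, word in code_table.items()}
    symbols = []
    current = ""
    for bit in bits:
        current += bit              # follow one more branch of the code tree
        if current in inverse:      # reached a leaf: emit the symbol
            symbols.append(inverse[current])
            current = ""            # restart from the root
    if current:
        raise ValueError("bit stream ends in the middle of a code word")
    return symbols

print(decode("1010011000", CODE_TABLE))
```

Because no code word is a prefix of another, the first match is always the right one and no separators are needed between code words.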
Step 1. Arrange all symbols in a list in order of decreasing probability. (first ordering)
Step 2. Assign ‘0’ to the symbol at the bottom of the symbol list and ‘1’ to the symbol directly above it. (first assignment)
        The assigned 0 or 1 is called a code bit and becomes the lowest (rightmost) bit of the symbol's code word.
Step 3. Define a composite symbol by combining the two symbols to which code bits have been assigned,
        and set its probability to the sum of the two symbols' probabilities.
        (Ex) Assume that code bits are assigned to two symbols, X2 and X6.
        Probability of the composite symbol: P[X2-X6] = P[X2] + P[X6]
Step 4. Rearrange all symbols, including the composite symbol, as in Step 1. (second ordering)
Step 5. Repeat Step 2 to Step 4 until only two symbols (or composite symbols) remain in the symbol list. (final ordering)
        Assign code bits to the last two remaining symbols. (final assignment)
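The procedure can be sketched in a few lines of Python. This is a minimal illustration rather than the textbook's implementation: the function name `huffman_codes`, the list-of-tuples representation, and the example probabilities are assumptions, and the symbol list is taken to be ordered by decreasing probability before each assignment.

```python
def huffman_codes(probabilities):
    """Build code words from a dict mapping symbol -> probability."""
    # Each entry is (probability, member symbols); a composite symbol
    # simply lists every original symbol it contains.
    entries = [(p, [s]) for s, p in probabilities.items()]
    codes = {s: "" for s in probabilities}
    while len(entries) > 1:
        # Ordering: decreasing probability (stable sort keeps tie order).
        entries.sort(key=lambda entry: entry[0], reverse=True)
        p_bottom, bottom = entries.pop()  # bottom of the list gets '0'
        p_above, above = entries.pop()    # symbol directly above gets '1'
        # Later code bits go in front, so the first assigned bit
        # ends up as the lowest bit of the code word.
        for s in bottom:
            codes[s] = "0" + codes[s]
        for s in above:
            codes[s] = "1" + codes[s]
        # Composite symbol with the summed probability.
        entries.append((p_bottom + p_above, above + bottom))
    return codes

example = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}  # assumed values
codes = huffman_codes(example)
print(codes)
print(sum(example[s] * len(codes[s]) for s in example))  # average bits/symbol
```

When several symbols have equal probability, the order among them is arbitrary, so different but equally short code tables are possible. A vocabulary with a single symbol is left with an empty code word in this sketch.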
▪ Code words
• sum=0: X1 → 00
• sum=1: X2 → 1
• sum=2: X3 → 01
Total: 15 bits
Prob. 9.11 Data Compression
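Code words:
A : 1
B : 010
C : 00
D : 0111
E : 0110
(Figure: code tree. One node branches into the composite symbol DE (1) and B (0); DE branches into D (1) and E (0).)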
[Solution] Now let’s convert the data file into a binary file!
Data file:   A A A C A A A B A A A C A A A D A A A E
Binary file: 1 1 1 00 1 1 1 010 1 1 1 00 1 1 1 0111 1 1 1 0110
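The conversion can be reproduced programmatically. The sketch below uses the code words B : 010, C : 00, D : 0111, E : 0110 from the solution and A : 1 as read off the encoded sequence; the names `CODE_TABLE`, `DATA_FILE`, and `encode` are illustrative.

```python
CODE_TABLE = {"A": "1", "B": "010", "C": "00", "D": "0111", "E": "0110"}
DATA_FILE = "AAACAAABAAACAAADAAAE"  # the 20 symbols of the data file

def encode(symbols, code_table):
    """Concatenate the code word of every symbol in the data file."""
    return "".join(code_table[s] for s in symbols)

binary_file = encode(DATA_FILE, CODE_TABLE)
print(binary_file)
print(len(binary_file))    # size with variable-length code words
print(3 * len(DATA_FILE))  # size with 3-bit fixed-length code words
```

Comparing the last two numbers shows how much the variable-length code words save for a file dominated by a single symbol.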
“As Wi-Fi and remote data storage access become widely available, data security
becomes very important!” Commercial cryptography is based on complex mathematical
algorithms, so we will only explore the basic concepts here.
11.3 Encryption Textbook Sec. 9.3
0000
⊕ 1110
-------
1110
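The bitwise operation above can be tried out directly. This is a minimal sketch, assuming bit patterns are written as strings of '0' and '1'; which operand is the data and which is the key is not stated on the slide, so the roles and the second bit pattern in the example are assumptions.

```python
def xor_bits(data, key):
    """Bitwise exclusive-OR of two equal-length bit strings."""
    if len(data) != len(key):
        raise ValueError("data and key must have the same length")
    return "".join("1" if a != b else "0" for a, b in zip(data, key))

print(xor_bits("0000", "1110"))  # the example shown above

# Applying the same key twice restores the original bits.
encrypted = xor_bits("1011", "1110")
print(encrypted, xor_bits(encrypted, "1110"))
```

The second example shows why exclusive-OR is convenient for encryption: the same operation with the same key both scrambles and recovers the data.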
❖ Good PRNG
1. Even if a deterministic formula is used to generate random numbers, the next random
number should not be predictable from previous random values.
2. Generated random numbers must be evenly distributed over the interval [0, nmax).
3. Random number generation must be repeatable, so that the same sequence of random
numbers can be generated at any time and place, if desired.
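Properties 2 and 3 can be illustrated with Python's standard `random` module. This is only a sketch: the helper name `random_sequence` and the seed values are hypothetical, and Python's built-in generator is not the PRNG from the slides and is not meant for cryptographic use.

```python
import random

def random_sequence(seed, count, n_max):
    """Generate `count` pseudo-random integers in [0, n_max) from a seed."""
    rng = random.Random(seed)  # independent generator with its own state
    return [rng.randrange(n_max) for _ in range(count)]

first = random_sequence(seed=2024, count=5, n_max=16)
second = random_sequence(seed=2024, count=5, n_max=16)
print(first == second)  # same seed, same sequence (repeatable)
print(all(0 <= x < 16 for x in first))
```

Repeatability is what lets a sender and a receiver who share the seed generate the same sequence independently.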
[Note] If the generated random number Xi is equal to a previously generated random
number, then the PRNG repeats the same sequence of random numbers from that point on.
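The repetition can be observed with a small generator in which each value depends only on the previous one. The sketch below assumes a linear congruential formula X(i+1) = (a·X(i) + c) mod m with toy parameters a = 5, c = 3, m = 16; the formula choice, the parameters, and the function names are assumptions for illustration, not values from the slides.

```python
from itertools import islice

def lcg(seed, a=5, c=3, m=16):
    """Yield X1, X2, ... where each value depends only on the previous one."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def period(seed, **params):
    """Number of steps between a value and its first reappearance."""
    seen = {}
    for i, x in enumerate(lcg(seed, **params)):
        if x in seen:
            return i - seen[x]
        seen[x] = i

print(list(islice(lcg(7), 17)))  # the 17th value repeats the 1st
print(period(7))
```

Once a value reappears, every later value is forced to reappear as well, so the length of the cycle limits how many distinct random numbers the generator can supply.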
⚫ If x and y are 128-bit numbers, how long does it take to find a solution?
▪ Number range: 0 ~ $2^{128}$ ≈ 0 ~ $3.4 \times 10^{38}$
▪ Total number of possible (x, y) pairs: $(2^{128})^2 = 2^{256} \approx 1.16 \times 10^{77}$
▪ If one operation (checking one pair) takes 1 μs and, on average, half of the pairs are checked before the solution is found, the total time required is
  $5.8 \times 10^{76} \times 10^{-6}$ s $= 5.8 \times 10^{70}$ s $\approx 1.8 \times 10^{63}$ years
→ Attempts to find an encryption key are futile because it takes too long and costs too much money to obtain it.
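The estimate can be checked with exact integer arithmetic. This is a rough sketch under stated assumptions: one check takes 1 μs, on average half of all (x, y) pairs are tried before the solution is found, and a year is taken as 365.25 days; the variable names are illustrative.

```python
SECONDS_PER_CHECK = 1e-6               # one operation takes 1 microsecond
SECONDS_PER_YEAR = 365.25 * 24 * 3600

values = 2 ** 128                      # possible values of one 128-bit number
pairs = values ** 2                    # possible (x, y) pairs
average_checks = pairs // 2            # assumed: half the pairs on average

seconds = average_checks * SECONDS_PER_CHECK
years = seconds / SECONDS_PER_YEAR

print(f"values per number: {values:.3e}")
print(f"(x, y) pairs:      {pairs:.3e}")
print(f"search time:       {seconds:.3e} s = {years:.3e} years")
```

Python integers have arbitrary precision, so the pair count is computed exactly before it is converted to a floating-point time.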