File Organization Lec910
Introduction
Data Compression
➢ Data Compression:
• The encoding of data in such a way as to reduce its size.
• Data compression is the process of encoding, restructuring, or otherwise modifying data in order to reduce its size.
➢ Redundancy Compression:
• Any form of compression which removes only redundant information.
➢ Advantages:
• Smaller files use less storage space.
• The transfer time for disk access is reduced, since there is less data to read.
• The transmission time to transfer files over a network is reduced.
➢ Disadvantages:
• Program complexity and size are increased (an encoding/decoding module is required).
• Computation time is increased.
• There is a time cost for encoding and decoding.
➢ Data compression is possible because most data contains redundant (repeated) data
or unnecessary information.
(Figure: Data Compression Methods)
➢ When the data is represented in a sparse array, we can use a type of compression called run-length encoding (RLE).
➢ Example 1:
Input: “AAAABBBCCDAA”
Output: “4A3B2C1D2A”
➢ Example 2:
Input: “WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB”
Output: “12WB12W3B24WB”
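➢ A minimal sketch (not from the slides) of run-length encoding in Python; it produces the counted form of Example 1. Note that Example 2 above additionally drops the count for runs of length one, a common variant.
def rle_encode(text):
    # Run-length encode a string: "AAAABBBCCDAA" -> "4A3B2C1D2A".
    if not text:
        return ""
    out = []
    run_char, run_len = text[0], 1
    for ch in text[1:]:
        if ch == run_char:
            run_len += 1
        else:
            out.append(f"{run_len}{run_char}")   # emit the finished run
            run_char, run_len = ch, 1
    out.append(f"{run_len}{run_char}")           # emit the final run
    return "".join(out)

print(rle_encode("AAAABBBCCDAA"))                # prints 4A3B2C1D2A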
➢ Run-length encoding (RLE) works best with simple images and animations that
have a lot of redundant pixels. It is useful for black and white images in particular.
➢ For complex images and animations that do not have many redundant sections,
RLE can make the file size bigger rather than smaller. Therefore it is important to
understand the content and whether this algorithm will help or hinder compression.
➢ Huffman Coding is a variable-length code in which the codeword length depends on the frequency with which each letter occurs in the data set.
➢ Example 1: suppose the content is:
➢ Encoding Message:
• 1010010011011011001111100
➢ Huffman Tree (for easy encoding)
➢ Encoding Message:
• 1010010011011011001111100
➢ Interpret 0’s as “go left” and 1’s as “go right”.
➢ The codeword for a character corresponds to the path from the root of the Huffman tree to the leaf containing that character.
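➢ A minimal decoding sketch in Python. The actual tree and codewords are shown in the slide figure, which is not reproduced here; the code table below is a hypothetical prefix code used only to illustrate the “go left on 0, go right on 1” walk.
codes = {"A": "0", "B": "10", "C": "110", "D": "111"}    # hypothetical codewords

# Build a small tree as nested dicts: each internal node maps "0"/"1" to a child.
root = {}
for ch, code in codes.items():
    node = root
    for bit in code[:-1]:
        node = node.setdefault(bit, {})
    node[code[-1]] = ch                      # a leaf stores the character

def decode(bits, root):
    out, node = [], root
    for bit in bits:
        node = node[bit]                     # 0 = go left, 1 = go right
        if isinstance(node, str):            # reached a leaf: emit the character
            out.append(node)
            node = root                      # restart from the root
    return "".join(out)

print(decode("0101100111", root))            # prints ABCAD with the codes above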
Example 2:
Initial string:
Size = 8 * 15 = 120 bits (15 characters at 8 bits each)
1. Calculate the frequency of each character in the string.
2. Sort the characters in increasing order of frequency. These are stored in a priority queue Q.
3. Make each unique character as a leaf node.
4. Create an empty node z. Make the node with the minimum frequency the left child of z and the node with the second minimum frequency the right child of z. Set the value of z to the sum of these two minimum frequencies.
5. Remove these two minimum frequencies from Q and add the sum into the list of
frequencies.
6. Insert node z into the tree.
7. Repeat steps 3 to 5 for all the characters.
8. For each non-leaf node, assign 0 to the left edge and 1 to the right edge.
Without encoding, the total size of the string was 120 bits. After encoding, the size is reduced to 32 + 9 + 28 = 69 bits.
➢ The techniques we have discussed so far preserve all the information in the original data (they are lossless).
Constructing Huffman Codes
While there is more than one TREE in the FOREST:
    Let i and j be the two trees in FOREST with the smallest weights
    Create a new node with left child FOREST(i)--> root and right child FOREST(j)--> root
    Replace TREE i in FOREST by a tree whose root is the new node and whose weight is
        FOREST(i)--> weight + FOREST(j)--> weight
    Delete TREE j from FOREST
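➢ A minimal sketch (not from the slides) of the loop above in Python, using heapq as the priority queue Q from the earlier steps; the frequencies in the example call are illustrative only.
import heapq
from itertools import count

def huffman_codes(freq):
    # Build {character: codeword} by repeatedly merging the two lowest-weight
    # trees in the forest, then labelling left edges 0 and right edges 1.
    tie = count()                                             # tie-breaker for equal weights
    forest = [(w, next(tie), ch) for ch, w in freq.items()]   # one single-leaf tree per character
    heapq.heapify(forest)
    while len(forest) > 1:
        w1, _, left = heapq.heappop(forest)                   # smallest weight
        w2, _, right = heapq.heappop(forest)                  # second smallest weight
        heapq.heappush(forest, (w1 + w2, next(tie), (left, right)))
    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):                           # internal node: (left, right)
            walk(node[0], code + "0")                         # 0 = left edge
            walk(node[1], code + "1")                         # 1 = right edge
        else:
            codes[node] = code or "0"                         # leaf: a character
    walk(forest[0][2], "")
    return codes

print(huffman_codes({"A": 5, "B": 1, "C": 6, "D": 3}))        # illustrative frequencies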
Reclaiming Space in Files
➢ Storage Compaction (rewriting the file so that the space left by deleted records is squeezed out) can be used with both fixed- and variable-length records.
➢ Deleting Fixed-Length Records for Reclaiming Space Dynamically:
➢ In some applications, it is necessary to reclaim space immediately.
➢ To do so, we can:
• Mark deleted records in some special way.
• Find the space that deleted records once occupied so that we can reuse that space
when we add records.
• Come up with a way to know immediately if there are empty slots in the file and
jump directly to them.
➢ Solution: Use an avail (List of Available Space) linked list in the form of a stack.
Relative Record Numbers (RRNs) play the role of pointers.
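➢ A minimal sketch (not from the slides) of this idea in Python. The record length, the header layout, and the ‘*’ deletion marker are assumptions for illustration; -1 marks the end of the avail list, and the file is assumed open in "r+b" mode.
RECLEN = 64                  # assumed fixed record length in bytes
HEADER = RECLEN              # assume one header-sized block precedes record 0

def delete_record(f, rrn, head):
    # Mark slot `rrn` as deleted and push it onto the avail stack.
    f.seek(HEADER + rrn * RECLEN)
    f.write(("*" + str(head)).ljust(RECLEN).encode())   # '*' marker + link to the old head
    return rrn                                          # the deleted slot becomes the new head

def add_record(f, data, head):
    # Reuse the slot at the head of the avail stack if there is one.
    if head == -1:                                      # avail list empty: append at the end
        f.seek(0, 2)
        rrn = (f.tell() - HEADER) // RECLEN
    else:                                               # pop the head of the stack
        rrn = head
        f.seek(HEADER + rrn * RECLEN)
        head = int(f.read(RECLEN).decode()[1:])         # follow the link to the next free slot
    f.seek(HEADER + rrn * RECLEN)
    f.write(data.ljust(RECLEN).encode())
    return rrn, head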
➢ Deleting Variable-Length Records for Reclaiming Space Dynamically:
• Same ideas as for Fixed-Length Records, but a different implementation must be
used.
• In particular, we must keep a byte count of each record, and the links to the next record on the avail list cannot be RRNs (byte offsets are used instead).
Storage Fragmentation
➢ Wasted Space within a record is called internal fragmentation.
➢ Variable-Length records do not suffer from internal fragmentation. However,
external fragmentation is not avoided.
Placement Strategies I
➢ First Fit Strategy: accept the first available record slot that can accommodate
the new record.
➢ Best Fit Strategy: choose the smallest available record slot that can accommodate the new record.
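➢ A minimal sketch (not from the slides) of the two strategies over an avail list of (offset, size) entries for variable-length record slots.
def first_fit(avail, needed):
    # Accept the first slot large enough for the new record; None if nothing fits.
    for i, (offset, size) in enumerate(avail):
        if size >= needed:
            return avail.pop(i)
    return None

def best_fit(avail, needed):
    # Accept the smallest slot large enough for the new record; None if nothing fits.
    fits = [(size, i) for i, (offset, size) in enumerate(avail) if size >= needed]
    if not fits:
        return None
    _, i = min(fits)                              # tightest fit leaves the least waste
    return avail.pop(i)

avail = [(0, 120), (200, 40), (400, 64)]          # illustrative (byte offset, size) slots
print(first_fit(list(avail), 50))                 # (0, 120): first slot that fits
print(best_fit(list(avail), 50))                  # (400, 64): smallest slot that fits
In practice, best fit is often implemented by keeping the avail list sorted by size, so the first slot that fits is also the smallest.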
Placement Strategies II
➢ Some general remarks about placement strategies:
• Placement strategies only apply to variable-length records.
• If space is lost due to internal fragmentation, the choice is between first fit and best fit. A worst-fit strategy truly makes internal fragmentation worse.
• If the space is lost due to external fragmentation, one should give careful
consideration to a worst-fit strategy.
Finding Things Quickly
➢ The cost of Seeking is very high.
➢ This cost has to be taken into consideration when determining a strategy for
searching a file for a particular piece of information.
➢ The same question also arises with respect to sorting, which often is the first step
to searching efficiently.
➢ Rather than simply trying to sort and search, we concentrate on doing so in a way
that minimizes the number of seeks.
➢ So far, the only way we have to retrieve or find records quickly is by using their RRN (in the case of fixed-length records).
➢ Without an RRN, or in the case of variable-length records, the only way, so far, to look for a record is by doing a sequential search. This is a very inefficient method.
➢ We are interested in more efficient ways to retrieve records based on their key value.
➢ Binary Search:
• Let’s assume that the file is sorted and that we are looking for the record whose key is Kelly in a file of 1000 fixed-length records.
(Figure: the first comparison probes Johnson, the second probes Monroe, and the next comparison narrows the search further.)
➢ Binary Search versus Sequential Search:
➢ Binary Search of a file with n records takes O(log2 n) comparisons.
➢ Sequential search takes O(n) comparisons.
➢ When sequential search is used, doubling the number of records in the file doubles the number of comparisons required.
➢ When binary search is used, doubling the number of records in the file only adds
one more guess to our worst case.
➢ In order to use binary search, though, the file first has to be sorted. This can be
very expensive.
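➢ A minimal sketch (not from the slides) of binary search by key over a file of fixed-length records sorted by key. RECLEN and KEYLEN (the key padded into the first bytes of each record) are assumptions for illustration; note that every comparison costs one seek.
RECLEN, KEYLEN = 64, 12      # assumed record length and key field length

def binary_search(f, key):
    # Return the RRN of the matching record, or -1 if the key is not present.
    f.seek(0, 2)
    low, high = 0, f.tell() // RECLEN - 1        # RRNs of the first and last records
    while low <= high:
        mid = (low + high) // 2
        f.seek(mid * RECLEN)                     # one seek per comparison
        rec_key = f.read(KEYLEN).decode().strip()
        if rec_key == key:
            return mid
        if rec_key < key:
            low = mid + 1                        # key is in the upper half
        else:
            high = mid - 1                       # key is in the lower half
    return -1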
➢ Sorting a Disk File in Memory:
➢ If the entire content of a file can be held in memory, then we can perform an
internal sort. Sorting in memory is very efficient.
➢ However, if the file does not fit entirely in memory, any sorting algorithm will require a large number of seeks. Sorting would, thus, be extremely slow.
Unfortunately, this is often the case, and solutions have to be found.
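➢ A minimal sketch (not from the slides) of an internal sort: the entire file of fixed-length records is read into memory, sorted by its key field, and written back sequentially. The record and key lengths are the same illustrative assumptions as above.
def internal_sort(path, reclen=64, keylen=12):
    with open(path, "rb") as f:
        data = f.read()                                        # the whole file in memory
    records = [data[i:i + reclen] for i in range(0, len(data), reclen)]
    records.sort(key=lambda r: r[:keylen])                     # sort on the key field
    with open(path, "wb") as f:
        f.write(b"".join(records))                             # one sequential write back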
➢ The limitations of Binary Search and Internal Sorting:
➢ Binary Search requires more than one or two accesses. Accessing a record using
the RRN can be done with a single access ==> We would like to achieve RRN
retrieval performance while keeping the advantage of key access.
➢ Keeping a file sorted is very expensive: in addition to searching for the right location for the insert, once this location is found, we have to shift records to open up the space for insertion.
➢ KeySorting:
➢ Overview: when sorting a file in memory, the only things that really need sorting are the record keys.
➢ Keysort algorithms work like internal sorting, but with two important differences:
• Rather than read an entire record into a memory array, we simply read each record into a temporary buffer, extract the key, and then discard the record.
• If we want to write the records in sorted order, we have to read them a second time.
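➢ A minimal sketch (not from the slides) of keysort: only (key, RRN) pairs are sorted in memory, and the records themselves are re-read in key order when the sorted file is written. The record and key lengths are illustrative assumptions.
def keysort(in_path, out_path, reclen=64, keylen=12):
    keynodes = []                                    # (key, RRN) pairs only
    with open(in_path, "rb") as f:
        rrn = 0
        while True:
            rec = f.read(reclen)                     # read each record into a temporary buffer
            if not rec:
                break
            keynodes.append((rec[:keylen], rrn))     # keep the key, discard the record
            rrn += 1
        keynodes.sort()                              # internal sort of the keys alone
        with open(out_path, "wb") as out:
            for _, rrn in keynodes:                  # one random seek per record: the costly part
                f.seek(rrn * reclen)
                out.write(f.read(reclen))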
➢ Limitation of the KeySort Method:
➢ Writing the records in sorted order requires as many random seeks as there are records.
➢ Since writing is interspersed with reading, writing also requires as many seeks as there are
records.
➢ Pinned records (records that other parts of the file refer to by their physical position, so they cannot be moved) make sorting very difficult. One solution is to use an ordered index and not to move the records.