CP4P Compression and Backup
PRINCIPLES FOR
PROGRAMMERS
Compression, 3/\/[®¥|°710/\/, Backup
( Encryption )
How many programmers does it take to change a lightbulb?
t3A1A5TTcFKmylhIoGABjwacB1vIWgIv6S6LdLcSg8s=
News of the Week
Agenda
Lecture:
1. What, Why, and How of “File Compression”
… depends on the Use Case
… overview of formats
… Lossless vs Lossy
2. What, Why, and How of “Backup”
… types of backups
… backup media
Agenda (Cont’d)
Activity:
1. Explore File Compression
2. Compress various native file formats within a ZIP
archive and compare the compression factors
3. Upload files to OneDrive to demonstrate a network
backup.
4. Do your own 3-2-1 backup.
What is “File Compression”?
[Figure: 4 pink, 5 green, 3 blue]
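The counts in the figure read like run-length encoding (RLE), one simple way to store data in less space by minimizing redundancy. Reading the figure that way is an assumption. A minimal Python sketch using `itertools.groupby` collapses each run of identical items into a (count, item) pair and expands it back:

```python
from itertools import groupby


def rle_encode(items):
    """Collapse each run of identical items into a (count, item) pair."""
    return [(len(list(group)), item) for item, group in groupby(items)]


def rle_decode(pairs):
    """Expand (count, item) pairs back into the original sequence."""
    return [item for count, item in pairs for _ in range(count)]


row = ["pink"] * 4 + ["green"] * 5 + ["blue"] * 3   # 12 items
encoded = rle_encode(row)                           # 3 pairs
print(encoded)  # [(4, 'pink'), (5, 'green'), (3, 'blue')]
assert rle_decode(encoded) == row  # lossless: decoding restores every item
```

RLE only pays off when the content has long runs; real archive formats such as ZIP use more general methods (for example Deflate) to find repeated patterns.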
What is “File Compression”?
• storing a file’s data in “less space”
by “minimizing redundancy” in the content
• An archive is a collection of folders and files
stored in one file, e.g. filename.ZIP
• Files are usually compressed
• Files can be encrypted
• Cross-platform exchange
• OS options to compress / encrypt local files
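An archive packs folders and files into one file. A minimal sketch with Python's standard `zipfile` module, assuming a hypothetical in-memory folder tree, writes two files under a `project/` folder into one ZIP with Deflate compression, lists each member's original and compressed size, and reads everything back:

```python
import io
import zipfile

# Hypothetical folder tree held in memory: archive path -> file contents
files = {
    "project/README.txt": b"Course notes\n" * 200,
    "project/data/scores.csv": b"student,score\n" + b"anon,90\n" * 500,
}

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for path, data in files.items():
        zf.writestr(path, data)  # folders are recorded as part of each member's path

with zipfile.ZipFile(buffer) as zf:
    for info in zf.infolist():
        print(f"{info.filename}: {info.file_size} -> {info.compress_size} bytes")
    restored = {name: zf.read(name) for name in zf.namelist()}

assert restored == files  # round trip: Deflate compression is lossless
```

Python's `zipfile` can decrypt traditionally encrypted ZIP members but cannot create encrypted archives, so encryption is left to OS or third-party tools here.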
Why use File Compression?
• Writing and sending data takes bandwidth and I/O time; smaller files need less of both
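• Data: ZIP, the standard for cross-platform data exchange; ZIP-based .docx .pptx .xlsx; .tar.gz 7z RAR
• Music: MP3 AAC MQA WAV Ogg FLAC
• Video: MPG MP4 DIVX XVID MOV AVI
• Images: JPEG JXL AVIF WebP GIF PNG TIFF RAW (AVIF, WebP, and JXL also have lossless modes)
Lossless formats include ZIP, .docx, .pptx, .xlsx, .tar.gz, 7z, RAR, WAV, FLAC, GIF, PNG, and TIFF.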
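How much a file shrinks depends on how much redundancy it has. A minimal sketch with Python's standard `zlib` (Deflate, the method ZIP commonly uses) compares highly repetitive text with random bytes. Using random bytes as a stand-in for already-compressed files such as JPEG, MP3, or MP4 is an assumption: well-compressed data has little redundancy left, so zipping it again gains little.

```python
import os
import zlib


def compression_factor(data: bytes) -> float:
    """Compressed size as a fraction of the original size (smaller is better)."""
    return len(zlib.compress(data, 9)) / len(data)


text = b"the quick brown fox jumps over the lazy dog\n" * 2000  # highly redundant
noise = os.urandom(len(text))  # random bytes: stand-in for already-compressed data

for label, data in [("repetitive text", text), ("random bytes", noise)]:
    print(f"{label}: compressed to {compression_factor(data):.1%} of original size")
```

Expect the text to shrink dramatically and the random bytes to come out about the same size, or slightly larger because of format overhead. Real documents fall somewhere between.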
Drawbacks to Compression
o Time: compression needs CPU and primary storage (RAM) resources
   o PCs have lots of both and only one user. Servers, on the other hand…
o Space: archived files must be decompressed before use, and extra space is needed during both compression and decompression
o Integrity: any data corruption can cause loss of the entire archive
   o Solid or multi-volume archives can be lost to even minor data corruption. Archive repair is possible but rarely succeeds.
   o Test your archives to confirm integrity.
o Recoverability: the lossy trade-off is reduced quality; discarded data cannot be restored
   o Lossy compression is appropriate only for specific Use Cases.
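Testing an archive means reading every member and checking it against its stored CRC-32 checksum. A minimal sketch with Python's `zipfile.ZipFile.testzip()`, which returns the name of the first bad member or `None`, builds a small archive, tests it, flips one byte, and tests again. The file names and contents are hypothetical, and `ZIP_STORED` (no compression) is used only so the sketch can locate the bytes to damage.

```python
from __future__ import annotations

import io
import zipfile


def build_archive(files: dict[str, bytes]) -> bytes:
    """Write files into an in-memory ZIP without compression (ZIP_STORED)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buffer.getvalue()


def first_bad_member(archive: bytes) -> str | None:
    """Read every member and check its CRC-32; return the first bad name, or None."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.testzip()


files = {"notes.txt": b"meeting notes\n" * 50, "budget.csv": b"item,cost\n" * 50}
archive = build_archive(files)
print(first_bad_member(archive))  # None: every CRC-32 matches

damaged = bytearray(archive)
damaged[damaged.find(b"item,cost")] ^= 0xFF  # flip the bits of one byte in budget.csv's data
print(first_bad_member(bytes(damaged)))  # budget.csv
```

With compressed members, damage can also surface as an exception while reading, so a real integrity check should catch errors as well as inspect the return value.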
Why do we need backups?
#1 Accidental deletion by users or IT people
#2 Hardware failure: all storage fails eventually
(2/3 to 3/4 of all data loss)
A copy in a geographically separate location that is platform independent.
Classic File Backup Strategy
Types: Full (all files) + Differential (only files changed since last Full backup)
Full backup is slow; Differential backup is faster but gets slower as more files change since the last Full.
Restore requires the last Full plus the most recent Differential.
Classic File Backup Strategy
Types: Full (all files) + Incremental (files changed since last backup of any type)
Incremental backups are faster than Differential.
Restore is slowest because the Full and every Incremental since it must be restored in sequence.
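The two strategies differ only in the reference point for "changed files" and in what a restore needs. A toy Python simulation, assuming one backup per day, a hypothetical set of files, and a schedule that starts with a Full, shows which files each backup copies and which backup sets a restore must replay. `BackupSet`, `run_schedule`, and `restore_chain` are illustrative names, not any backup product's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BackupSet:
    day: int
    kind: str        # "full", "diff" (Differential), or "incr" (Incremental)
    files: set[str]  # files copied into this backup set


def run_schedule(all_files: set[str], changes: dict[int, set[str]], kinds: list[str]) -> list[BackupSet]:
    """Simulate one backup per day; changes[d] lists files modified on day d.
    Assumes the first backup in the schedule (day 0) is a Full."""
    sets: list[BackupSet] = []
    last_full = last_any = 0
    for day, kind in enumerate(kinds):
        if kind == "full":
            copied = set(all_files)
            last_full = day
        else:
            # Differential: changed since the last Full. Incremental: since the last backup of any type.
            since = last_full if kind == "diff" else last_any
            copied = set().union(*(changes.get(d, set()) for d in range(since + 1, day + 1)))
        sets.append(BackupSet(day, kind, copied))
        last_any = day
    return sets


def restore_chain(sets: list[BackupSet]) -> list[BackupSet]:
    """Sets to restore, in order: last Full, newest Differential after it (if any), then later Incrementals."""
    start = max(i for i, s in enumerate(sets) if s.kind == "full")
    chain = [sets[start]]
    rest = sets[start + 1:]
    diffs = [i for i, s in enumerate(rest) if s.kind == "diff"]
    if diffs:
        chain.append(rest[diffs[-1]])
        rest = rest[diffs[-1] + 1:]
    chain.extend(s for s in rest if s.kind == "incr")
    return chain


all_files = {"a.txt", "b.txt", "c.txt"}
changes = {1: {"a.txt"}, 2: {"b.txt"}, 3: {"a.txt"}}  # hypothetical edits per day
diff_plan = run_schedule(all_files, changes, ["full", "diff", "diff", "diff"])
incr_plan = run_schedule(all_files, changes, ["full", "incr", "incr", "incr"])

print([len(s.files) for s in diff_plan])          # [3, 1, 2, 2]  Differentials grow
print([len(s.files) for s in incr_plan])          # [3, 1, 1, 1]
print([s.day for s in restore_chain(diff_plan)])  # [0, 3]  Full + latest Differential
print([s.day for s in restore_chain(incr_plan)])  # [0, 1, 2, 3]  Full + every Incremental
```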
Enterprise backup
• Backup software supports Full, Differential, and Incremental strategies
• Options for file versions / generations, and periodic snapshots
• Enterprise OS provides for backup of continuously running systems.
• LTO tape or Optical Disc libraries as nearline tertiary storage
• AWS Glacier, Google Nearline, Sync cloud cold storage
• Inexpensive backup storage but slow to restore ($$$ if speed needed)
• Recovery and Restoration speed is highly variable.
• Depends on data transfer rates from the backup device or location, and on the complexity of rebuilding the relationships among database objects
• Data deduplication and single-instance storage optimize storage use
• Eliminate duplicate copies of data within and across systems
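Single-instance storage keeps one copy of identical content, however many systems or paths refer to it. A toy Python sketch, assuming whole-file deduplication keyed by a SHA-256 hash from the standard `hashlib` module, shows the idea. `SingleInstanceStore` and the sample paths are hypothetical; many real systems deduplicate smaller chunks or blocks rather than whole files.

```python
from __future__ import annotations

import hashlib


class SingleInstanceStore:
    """Toy content-addressed backup store (hypothetical, not a real product's API)."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}              # SHA-256 digest -> contents, stored once
        self.catalog: dict[tuple[str, str], str] = {}  # (system, path) -> digest

    def backup(self, system: str, path: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # keep contents only the first time they are seen
        self.catalog[(system, path)] = digest

    def restore(self, system: str, path: str) -> bytes:
        return self.blobs[self.catalog[(system, path)]]

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())


store = SingleInstanceStore()
handbook = b"Company handbook " * 1000
store.backup("server-a", "/docs/handbook.pdf", handbook)
store.backup("server-b", "/shared/handbook-copy.pdf", handbook)  # same contents, other system
store.backup("server-a", "/docs/budget.xlsx", b"Q1,Q2,Q3,Q4\n" * 100)
print(len(store.catalog), "files backed up;", len(store.blobs), "unique blobs stored")
```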
User-Level File Recovery … is not backup
Windows File History and macOS Time Machine are not exactly backup
• Automatic copying of files to external or network drive [good]
• Historical versions of user files maintained. Easy to restore. [good]
• Must configure and test to ensure copying of all user folders. [okay]
• If drive is always connected, it is not a backup, just a copy; it is likely
not geographically separate and not platform independent. [bad!]
Windows Recycle Bin and macOS Trash are not backup
• Only good for oops! and short-term recovery. [hopeful]
Two-way synchronization is not backup [deluded]
• Synchronization is platform interdependent, not independent. [bad!]
• A file on one system does not have a "copy" on other systems, [bad!]
the same file co-exists on all synchronized systems. [good?]
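Why sync is not backup can be shown with a toy model: an accidental deletion on one synchronized system is faithfully applied to every other system, while a one-way backup keeps its own copy. The Python sketch below uses plain dictionaries as hypothetical folders; `sync_two_way` and `backup_one_way` are illustrative, and real sync tools handle conflicts far more carefully than this.

```python
from __future__ import annotations


def sync_two_way(a: dict[str, bytes], b: dict[str, bytes], base: dict[str, bytes]) -> dict[str, bytes]:
    """Toy two-way sync. `base` is the state after the previous sync; adds, edits,
    and deletions made on either side since then are applied to both sides.
    On a conflict, side b's change wins. Returns the new base."""
    merged = dict(base)
    for side in (a, b):
        for name in base.keys() - side.keys():  # deleted on this side
            merged.pop(name, None)
        for name, data in side.items():
            if base.get(name) != data:          # added or edited on this side
                merged[name] = data
    for side in (a, b):
        side.clear()
        side.update(merged)
    return dict(merged)


def backup_one_way(source: dict[str, bytes], snapshots: list[dict[str, bytes]]) -> None:
    """Toy one-way backup: append a snapshot copy; older snapshots are never touched."""
    snapshots.append(dict(source))


laptop = {"thesis.docx": b"draft 1"}
cloud: dict[str, bytes] = {}
snapshots: list[dict[str, bytes]] = []

base = sync_two_way(laptop, cloud, {})
backup_one_way(laptop, snapshots)

del laptop["thesis.docx"]  # oops! accidental deletion on the laptop
base = sync_two_way(laptop, cloud, base)

print("thesis.docx" in cloud)       # False: the deletion was synchronized
print(snapshots[0]["thesis.docx"])  # b'draft 1': the one-way backup still has it
```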
3-2-1 Backup Checklist
• 3 copies (change only the active file, not the backups)
   • 1 active, 1 local backup, 1 remote backup
• 2 different formats/platforms (platform independence)
   • External drive is platform independent only when not plugged in
   • LTO tape or optical disc: initially local, optionally moved off-site
   • One-way backup to cloud cold storage (not two-way cloud sync)
• 1 off-site backup (geographically separate location)
   • Cloud storage from a provider other than your cloud IaaS, PaaS, or SaaS provider
   • Tape/optical media: rotate Full, Diff, and Incr backups to off-site storage services
   • The near loss of Toy Story 2
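The three numeric rules of the checklist can be checked mechanically. A minimal Python sketch, assuming each copy is described by a hypothetical `DataCopy` record with a platform label and an off-site flag, reports which rules a plan breaks. Judgments such as whether an external drive is really unplugged stay with the person filling in the labels.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    """One copy of the data in a hypothetical backup plan."""
    label: str
    platform: str  # storage format/platform, e.g. "laptop SSD" or "LTO tape"
    offsite: bool  # stored in a geographically separate location?


def check_321(copies: list[DataCopy]) -> list[str]:
    """Return the 3-2-1 rules this plan breaks (an empty list means it passes)."""
    problems = []
    if len(copies) < 3:
        problems.append(f"need 3 copies, found {len(copies)}")
    if len({c.platform for c in copies}) < 2:
        problems.append("need at least 2 different formats/platforms")
    if not any(c.offsite for c in copies):
        problems.append("need at least 1 off-site copy")
    return problems


good_plan = [
    DataCopy("active files", "laptop SSD", offsite=False),
    DataCopy("local backup", "external drive (unplugged)", offsite=False),
    DataCopy("remote backup", "cloud cold storage", offsite=True),
]
risky_plan = [
    DataCopy("active files", "laptop SSD", offsite=False),
    DataCopy("local backup", "laptop SSD", offsite=False),  # second folder on the same disk
]
print(check_321(good_plan))  # []
print(check_321(risky_plan))
```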
The final word on backups…
Backups do not matter.
Only RESTORE matters.
NOTES
…not on the quiz but here for further
information and explanation.
Effect of File Compression on Data Transfer
Assumptions:
• 1MB plain text file, unique for each of 30,000 users
• Network transfer takes 2 seconds per file
• Text compressed to 35% of its original size transfers in 1 second per file
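The assumptions above can be turned into totals with a little arithmetic. A minimal Python sketch, additionally assuming the files are sent one after another and ignoring the CPU time spent compressing them, computes the data volume and transfer time with and without compression:

```python
USERS = 30_000
FILE_MB = 1
SECONDS_PER_FILE_PLAIN = 2
SECONDS_PER_FILE_COMPRESSED = 1
COMPRESSED_PERCENT = 35

plain_mb = USERS * FILE_MB
compressed_mb = plain_mb * COMPRESSED_PERCENT // 100  # integer math avoids float rounding
plain_hours = USERS * SECONDS_PER_FILE_PLAIN / 3600
compressed_hours = USERS * SECONDS_PER_FILE_COMPRESSED / 3600

print(f"Data sent: {plain_mb:,} MB plain vs {compressed_mb:,} MB compressed")
print(f"Transfer time: {plain_hours:.1f} h plain vs {compressed_hours:.1f} h compressed")
# Data sent: 30,000 MB plain vs 10,500 MB compressed
# Transfer time: 16.7 h plain vs 8.3 h compressed
```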