
INTRODUCTION TO MULTIMEDIA MODULE
Table of Contents
CHAPTER ONE
INTRODUCTION TO MULTIMEDIA
1.1. What is Multimedia?
1.2. History of Multimedia Systems
1.3. Hypermedia and Multimedia
1.4. Multimedia and World Wide Web (WWW)
1.5. Multimedia System Requirements
Review Questions
CHAPTER TWO
MULTIMEDIA AUTHORING AND TOOLS
2.1. What is Multimedia Authoring?
2.2. Multimedia Authoring Paradigms
2.3. Some Useful Editing and Authoring Tools
Review Questions
CHAPTER THREE:
MULTIMEDIA DATA REPRESENTATIONS
3.1. Graphic/Image Data Representation
3.2. Popular File Formats
3.3. Digital Audio and MIDI
Review Questions
CHAPTER FOUR:
COLORS IN IMAGE AND VIDEO
4.1. Color Spaces
4.2. Color Models in Images
4.3. Color Models in Video
Review Questions
CHAPTER FIVE:
FUNDAMENTAL CONCEPTS IN VIDEO
5.1. Types of Video
5.2. Analog Video
5.3. Digital Video
5.4. Types of Color Video Signals
5.5. Video Broadcasting Standards / TV Standards
Review Questions
CHAPTER SIX:
BASICS OF DIGITAL AUDIO
6.1. Digitizing Sound
6.2. Quantization and Transmission of Audio
Review Questions
CHAPTER SEVEN:
LOSSLESS COMPRESSION ALGORITHMS
7.1. Introduction
7.2. Basics of Information Theory
7.3. Run-Length Coding
7.4. Variable-Length Coding (VLC)
7.5. Huffman Coding
7.6. The Shannon-Fano Encoding Algorithm
7.7. Lempel-Ziv Encoding
7.8. Arithmetic Coding
7.9. Lossless Image Compression
Review Questions

CHAPTER ONE

INTRODUCTION TO MULTIMEDIA

Lesson Content

1.1. What is Multimedia?

1.2. History of Multimedia Systems

1.3. Hypermedia and Multimedia

1.4. Multimedia and World Wide Web (WWW)

1.5. Multimedia System Requirements

1.1. What is Multimedia?


People who use the term “multimedia” often seem to have quite different, even opposing,
viewpoints. A PC vendor would like us to think of multimedia as a PC that has sound capability,
a DVD-ROM drive, and perhaps the superiority of multimedia-enabled microprocessors that
understand additional multimedia instructions. A consumer entertainment vendor may think of
multimedia as interactive cable TV with hundreds of digital channels, or a cable-TV-like service
delivered over a high-speed Internet connection.
Before we go on, it is important to define multimedia. Let us define it from two perspectives:

1) In terms of what multimedia is all about:


It refers to the storage, transmission, interchange, presentation, and perception of different information types (data types) such as text, graphics, voice, audio, and video, where:
Storage- refers to the type of physical means to store data.
-Magnetic tape
-Hard disk
-Optical disk
-DVDs
-CD-ROMs, etc.

Presentation– refers to the type of physical means to reproduce information to the user.
-Speakers
-Video windows, etc.
Perception– describes the nature of information as perceived by the user
-Speech
-Music
-Film

2) Based on the word “Multimedia”
It is composed of two words:

Multi– multiple/many
Media– source

Source refers to the different kinds of information that we use in multimedia.


This includes text, graphics, audio, video and images

Multimedia refers to multiple sources of information. It is a system that integrates all the above types.
Definitions:

Multimedia means that computer information can be represented in audio, video, and animated formats in addition to the traditional formats. The traditional formats are text and graphics.
General and working definition:

Multimedia is the field concerned with the computer controlled integration of text, graphics,
drawings, still and moving images (video), animation, and any other media where every type of
information can be represented, stored, transmitted, and processed digitally.

What is Multimedia Application?


A Multimedia Application is an application which uses a collection of multiple media sources
e.g. text, graphics, images, sound/audio, animation and/or video.

What is Multimedia system?


A Multimedia System is a system capable of processing multimedia data. A Multimedia System
is characterized by the processing, storage, generation, manipulation and rendition of multimedia
information.

Characteristics of a Multimedia System


A Multimedia system has four basic characteristics:
 Multimedia systems must be computer controlled
 Multimedia systems are integrated
 The information they handle must be represented digitally
 The interface to the final presentation of media is usually interactive.

Multimedia Applications (where it is applied)


 Digital video editing and production systems
 Interactive movies, and TV
 Video conferencing

 Virtual reality (the creation of an artificial environment that you can explore, e.g., 3-D images)
 Augmented reality (placing real-appearing computer graphics and video objects into scenes so as to take the physics of objects and lights (e.g., shadows) into account)
 Distributed lectures for higher education
 Digital libraries
 World Wide Web
 On-line reference works e.g. encyclopedias, games, etc.
 Electronic Newspapers/Magazines
 Games
 Groupware (enabling groups of people to collaborate on projects and share information)
 Cooperative work environments that allow business people to edit a shared document or
schoolchildren to share a single game using two mice that pass control back and forth.
 Making multimedia components editable – allowing the user side to decide what components,
video, graphics, and so on are actually viewed and allowing the client to move components
around or delete them – making components distributed

Features of Multimedia
Multimedia has three aspects:
Content: movie, production, etc.
Creative Design: creativity is important in designing the presentation
Enabling Technologies: Network and software tools that allow creative designs to be presented.

1.2. History of Multimedia Systems


The newspaper was perhaps the first mass communication medium; it used mostly text, graphics, and images. In 1895, Guglielmo Marconi sent his first wireless radio transmission at Pontecchio, Italy. A few years later (in 1901), he detected radio waves beamed across the Atlantic. Initially invented for the telegraph, radio is now a major medium for audio broadcasting. Television was the new medium for the 20th century. It brought video and has since changed the world of mass communications.
Motion pictures were originally conceived of in the 1830s to observe motion too rapid for perception by the human eye. Thomas Alva Edison commissioned the invention of a motion picture camera in 1887. Silent feature films appeared from 1910 to 1927; the silent era effectively ended with the release of The Jazz Singer in 1927.

On computers, the following are some of the important events:


1945 -Vannevar Bush (1890-1974) wrote about Memex.
MEMEX stands for MEMory EXtension. A memex is a device in which an individual stores all
his books, records, and communications, and which is mechanized so that it may be consulted
with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.

1960s-Ted Nelson started Xanadu project (Xanadu – a kind of deep Hypertext).
Project Xanadu was the explicit inspiration for the World Wide Web, for Lotus
Notes and for HyperCard, as well as less-well-known systems.
1967 – Nicholas Negroponte formed the Architecture Machine Group at MIT.
A combination lab and think tank responsible for many radically new approaches to the human-
computer interface. Nicholas Negroponte is the Wiesner Professor of Media Technology at the
Massachusetts Institute of Technology.
1968 – Douglas Engelbart demonstrated the NLS (oN-Line System) at SRI.
Shared-screen collaboration involving two persons at different sites communicating over a network with audio and video interfaces was one of the many innovations presented at the demonstration.
1969 – Nelson & van Dam created a hypertext editor at Brown.
1976 – Architecture Machine Group proposal to DARPA: Multiple Media


1985 – Negroponte and Wiesner opened the MIT Media Lab. Research at the Media Lab comprises interconnected developments in an unusual range of disciplines, such as software agents; machine understanding; how children learn; human and machine vision; audition; speech interfaces; wearable computers; affective computing; advanced interface design; tangible media; object-oriented video; interactive cinema; and digital expressions from text, to graphics, to sound.
1989 – Tim Berners-Lee proposed the World Wide Web to CERN (European Council for
Nuclear Research)
 1990 – K. Hooper Woolsey, Apple Multimedia Lab gave education to 100 people
 1992 – The first M-Bone audio multicast on the net (MBONE- Multicast Backbone)
 1993 – U. Illinois National Center for Supercomputing Applications introduced NCSA Mosaic (a
web browser)
 1994 – Jim Clark and Marc Andreessen introduced Netscape Navigator (web browser)
 1995 – Java for platform-independent application development.
 1996 – DVD video was introduced; high-quality, full-length movies were distributed on a single disk. The DVD format promised to transform the music, gaming, and computer industries.
 1998 – XML 1.0 was announced as a W3C Recommendation.
 1998 – Handheld MP3 devices first made inroads into consumer tastes in the fall, with the
introduction of devices holding 32 MB of flash memory.
 2000 – World Wide Web (WWW) size was estimated at over 1 billion pages.

1.3. Hypermedia and Multimedia


What is Hypertext and Hypermedia?
Hypertext is text that contains links to other texts. The term was invented by Ted Nelson around 1965. Hypertext is usually non-linear (as indicated below). Hypermedia is not constrained to be text-based. It can include other media, e.g., graphics, images, and especially the continuous media — sound and video. Apparently, Ted Nelson was also the first to use this term. The World Wide Web (WWW) is the best example of a hypermedia application.

Hypertext

Figure 1. 1 Hypertext is non-linear

Hypertext is therefore usually non-linear (as indicated above).

Hypermedia

Figure 1. 2 Example of Hypermedia

Hypermedia is the application of hypertext principles to a wider variety of media, including audio, animations, video, and images.
As we have seen, multimedia fundamentally means that computer information can be
represented through audio, graphics, images, video, and animation in addition to traditional
media (text and graphics). Hypermedia can be considered one particular multimedia application.
Examples of typical multimedia applications include: digital video editing and production

systems; electronic newspapers and magazines; the World Wide Web; online reference works,
such as encyclopedias; games; groupware; home shopping; interactive TV; multimedia
courseware; video conferencing; video-on-demand; and interactive movies.

Desirable Features for a Multimedia System


The following features are desirable for a multimedia system:

1. Very high processing power. Why? Because there is a large amount of data to be processed. Multimedia systems deal with large volumes of data, and to process the data in real time, the hardware must have high processing capacity.
2. It should support different file formats. Why? Because we deal with different data types (media
types).
3. Efficient and High Input-output: input and output to the file subsystem needs to be efficient and
fast. It has to allow for real-time recording as well as playback of data.

4. Special Operating System: to allow access to file system and process data efficiently and
quickly. It has to support direct transfers to disk, real-time scheduling, fast interrupt processing,
I/O streaming, etc.
5. Storage and Memory: large storage units and large memory are required. Large caches are also required.
6. Network Support: client-server operation over a network is common, since multimedia systems are often distributed.
7. Software Tools: User-friendly tools needed to handle media, design and develop applications,
deliver media.

Challenges of Multimedia Systems


1) Synchronization issues: in a multimedia application, a variety of media are used at the same instant, and there should be some relationship between the media, e.g., between video and sound. This raises the issue of synchronization.
2) Data conversion: in a multimedia application, data is represented digitally. Because of this, analog data has to be converted into digital data.
3) Compression and decompression: why? Because multimedia deals with large amounts of data (e.g., movies, sound, etc.) which take a lot of storage space.
4) Rendering different kinds of data at the same time — continuous data.

1.4. Multimedia and World Wide Web (WWW)


Multimedia is closely tied to the World Wide Web (WWW). Without networks, multimedia is
limited to simply displaying images, videos, and sounds on your local machine. The true power
of multimedia is the ability to deliver this rich content to a large audience.

HyperText Transfer Protocol (HTTP)


HTTP is a protocol that was originally designed for transmitting hypermedia, but it also supports transmission of any file type. HTTP is a “stateless” request/response protocol, in the sense that a
client typically opens a connection to the HTTP server, requests information, the server
responds, and the connection is terminated – no information is carried over for the next request.
The basic request format is
Method URI Version
Additional-Headers
Message-body

The Uniform Resource Identifier (URI) identifies the resource accessed, such as the host name, always preceded by the token "http://". A URI could be a Uniform Resource Locator (URL), for
example. Here, the URI can also include query strings (some interactions require submitting
data). Method is a way of exchanging information or performing tasks on the URI. Two popular
methods are GET and POST. GET specifies that the information requested is in the request string
itself, while the POST method specifies that the resource pointed to in the URI should consider
the message body. POST is generally used for submitting HTML forms. Additional-Headers
specifies additional parameters about the client. For example, to request access to this textbook’s
web site, the following HTTP message might be generated:

GET http://www.cs.sfu.ca/mmbook/ HTTP/1.1


The basic response format is
Version Status-Code Status-Phrase
Additional-Headers
Message-body

Status-Code is a number that identifies the response type (or error that occurs), and Status-Phrase
is a textual description of it. Two commonly seen status codes and phrases are 200 OK when the
request was processed successfully and 404 Not Found when the URI does not exist. For
example, in response to the example request above the web server may return something like:
HTTP/1.1 200 OK
Server: [No-plugs-here-please]
Date: Wed, 25 July 2002 20:04:30 GMT
Content-Length: 1045
Content-Type: text/html

<HTML>
….

</HTML>
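To make the request/response exchange concrete, the short Python sketch below performs a GET request with the standard http.client module and prints the Status-Code, Status-Phrase, a few response headers, and the start of the message body. The host and path are illustrative examples only; any reachable web server could be substituted.

import http.client

# Open a connection to the server named in the URI (example host).
conn = http.client.HTTPConnection("www.example.com", 80)

# Method = GET, URI = /index.html; the Version (HTTP/1.1) is added by the
# library, and extra Additional-Headers go in the headers dictionary.
conn.request("GET", "/index.html", headers={"User-Agent": "mm-module-demo"})

response = conn.getresponse()
print(response.status, response.reason)                      # e.g. 200 OK or 404 Not Found
print("Content-Type:", response.getheader("Content-Type"))   # Additional-Headers
print("Content-Length:", response.getheader("Content-Length"))

body = response.read()           # the Message-body (e.g. the HTML document)
print(body[:200])

conn.close()                     # the connection ends; HTTP keeps no state between requests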

1.5. Multimedia System Requirements


1) Software tools
2) Hardware Requirement

Software Requirements

3-D and Animation Tools:


These tools provide 3D clip-art objects such as people, furniture, buildings, cars, airplanes, trees, etc. You can use these objects in your project easily.
A good 3D modeling tool should include the following features:
 Ability to drag and drop primitive shapes onto the screen
 Ability to create objects from scratch
 Ability to add realistic effects such as transparency, shadowing, etc.
 Multiple windows that allow the user to view the model in each dimension
 Color and texture mapping
Examples: 3ds Max (Discreet), LogoMotion

Text editing and word processing tools:


Word processors are used for writing letters, invoices, project content, etc. They include features
like:

 spell check
 table formatting
 thesaurus
 templates (e.g., letters, resumes, and other common documents)
Examples: Microsoft Word, WordPerfect, Notepad

Sound Editing Tools


They are used to edit sound (music, speech, etc.). The user can see a representation of the sound in fine increments, as a score or waveform. The user can cut, copy, and paste any portion of the sound to edit it. You can also add other effects such as distortion, echo, pitch shifts, etc.
Example: Sound Forge. Sound Forge is a sophisticated PC-based program for editing WAV files. Sound can be captured from a CD-ROM drive or from tape or microphone through the sound card, then mixed and edited. It also permits adding complex special effects.

Multimedia authoring tools:
Multimedia authoring tools provide the important framework that is needed for organizing and editing the objects included in a multimedia project (e.g., graphics, animation, sound, video, etc.). They provide editing capability to a limited extent.
Examples: Macromedia Flash, Macromedia Director, Macromedia Authorware

Macromedia Flash: Flash allows users to create interactive movies by using the score metaphor
– a timeline arranged in parallel event sequences, much like a musical score consisting of
musical notes. Elements in the movie are called symbols in Flash. Symbols are added to a central
repository, called a library, and can be added to the movie’s timeline. Once the symbols are
present at a specific time, they appear on the Stage, which represents what the movie looks like
at a certain time, and can be manipulated and moved by the tools built into Flash. Finished Flash
movies are commonly used to show movies or games on the web.

Macromedia Director: Director uses a movie metaphor to create interactive presentations. This
powerful program includes a built-in scripting language, Lingo, which allows creation of
complex interactive movies. The “cast” of characters in Director includes bitmapped sprites,
scripts, music, sounds, and palettes. Director can read many bitmapped file formats. The program
itself allows a good deal of interactivity, and Lingo, with its own debugger, allows more control,
including control over external devices, such as VCRs and videodisc players. Director also has
web-authoring features available, for creation of fully interactive Shockwave movies playable
over the web.

Authorware: is a mature, well-supported authoring product that has an easy learning curve for
computer science students because it is based on the idea of flowcharts (the so-called
iconic/flow-control metaphor). It allows hyperlinks to link text, digital movies, graphics, and
sound. It also provides compatibility between files produced in PC and Mac versions.
Shockwave Authorware applications can incorporate Shockwave files, including Director
Movies, Flash animations, and audio.

OCR software
OCR software converts printed documents into electronically recognizable ASCII characters. It is used with scanners. Scanners convert a printed document into a bitmap; the OCR software then breaks the bitmap into pieces according to whether they contain text or graphics, by examining the texture and density of areas of the bitmap and by detecting edges. Text areas are converted into ASCII text, while graphic areas remain bitmap images. To do this, the software uses probability and expert systems.
Uses:
 To include printed documents in our project without typing them in from the keyboard
 To include documents in their original format, e.g., signatures, drawings, etc.
Examples: OmniPage Pro, Perceive


Painting and Drawing Tools: to create graphics for the web and other purposes, painting and drawing tools are crucial.

Painting Tools: are also called image-editing tools. They are used to edit images of different formats. They help us to retouch and enhance bitmap images. Some painting tools also allow you to edit vector-based graphics. Some of the editing activities include:
 blurring the picture
 removing part of the picture
 adding text to the picture
 merging two or more pictures together, etc.
Examples: Macromedia Fireworks and Adobe Photoshop

Drawing Tools: used to create vector-based graphics. Examples: Macromedia FreeHand, CorelDRAW, Adobe Illustrator
Drawing and painting tools should have the following features:
 Scalable dimensions, with the ability to resize, stretch, and distort images/graphics
 Customizable pen and brush shapes and sizes
 Multiple undo capabilities
 Capacity to import and export files in different formats
 Ability to create geometric shapes such as circles, rectangles, lines, etc.
 Zooming for magnified image editing
 Support for third party plug-ins.

Graphics and Image Editing


Adobe Illustrator: Illustrator is a powerful publishing tool for creating and editing vector graphics, which can easily be exported for use on the web.
Adobe Photoshop: Photoshop is the standard tool for graphics, image processing, and image manipulation. Layers of images, graphics, and text can be separately manipulated for maximum flexibility, and its “filter factory” permits creation of sophisticated lighting effects.

Macromedia Fireworks: Fireworks is software for making graphics specifically for the web. It includes a bitmap editor, a vector graphics editor, and a JavaScript generator for buttons and rollovers.

Video Editing

Animation and digital video movies are sequences of bitmapped graphic frames played back rapidly. Some of the tools used to edit video include Adobe Premiere, DeskShare Video Edit Magic, and Videoshop. These applications display time references (the relationship between time and the video), frame counts, audio, transparency levels, etc.

Hardware Requirements

Three groups of hardware for multimedia:

1) Memory and storage devices

2) Input and output devices

3) Network devices

1) Memory and Storage Devices


Multimedia products require higher storage capacity than text-based data. Huge drives are essential for the enormous files used in multimedia and audiovisual creation.

I) RAM: is the primary requirement for a multimedia system. Why? Because the authoring software itself must be held in memory (e.g., Flash takes about 20 MB of memory, Photoshop 16-20 MB, etc.), and digitized audio and video, animation files, and so on are also stored in memory. To hold all of these at the same time, you need a large amount of memory.

II) Storage Devices: large capacity storage devices are necessary to store multimedia data.
Floppy Disk: not sufficient to store multimedia data. Because of this, they are not used to store
multimedia data.

Hard Disk: the capacity of the hard disk should be high enough to store large amounts of data.
CD: CDs are important for multimedia because they are used to deliver multimedia data to users. They can hold a wide variety of data, such as:

 Music (sound, & video)


 Multimedia Games
 Educational materials
 Tutorials that include multimedia
 Utility graphics, etc

 DVD: DVDs have higher capacity than CDs. Similarly, they are also used to distribute multimedia data to users. Some of the characteristics of DVDs:
 High storage capacity: 4.7-17 GB
 Narrower tracks than CDs, which gives the higher storage capacity

 High data transfer rate: 4.6 MB/sec

2) Input-Output Devices
I) Interacting with the system: to interact with multimedia system, we use either keyboard,
mouse, track ball, or touch screen, etc.
Mouse: a multimedia project is typically designed to be used with a mouse as the input pointing device. Other devices, like a trackball or a touch screen, could be used in place of the mouse. A trackball is similar to a mouse in many ways.
Wireless mouse: important when the presenter has to move around during presentation
Touch Screen: we use fingers instead of mouse to interact with touch screen computers.
There are three technologies used in touch screens:

i. Infrared light: such touch screens use invisible beams of infrared light that are projected across the surface of the screen. A finger touching the screen interrupts the beams, generating an electronic signal. The controller then identifies the x-y coordinates of the point where the touch occurred and sends a signal to the operating system for processing.
ii. Texture-coated: such monitors are coated with a textured material that is sensitive to pressure. When the user presses the monitor, the coating registers the x-y coordinates of the location and sends a signal to the operating system.
iii. Touch mate:
Use: touch screens are used to display/provide information in public areas such as airports, museums, transport service areas, hotels, etc.

Advantage:
 user friendly
 easy to use even for non-technical people
 easy to learn how to use

II) Information Entry Devices: the purpose of these devices is to enter information to be
included in our multimedia project into our computer.
OCR: OCR hardware and software enable us to convert printed documents into ASCII files.
Graphical Tablets/Digitizers: both are used to convert points, lines, and curves from a sketch into digital format. They use a movable device called a stylus.
Scanners: enable us to convert printed images into digital format.
Microphones: they are important because they enable us to record speech, music, etc. The
microphone is designed to pick up and amplify incoming acoustic waves or harmonics precisely
and correctly and convert them to electrical signals. You have to purchase a superior, high-
quality microphone because your recordings will depend on its quality.
Digital Camera and Video Camera (VCR): these are important for recording and including images and video in a multimedia system, respectively. Digital video cameras store images as digital data; they do not record on film. You can edit the video taken with a video camera or VCR using video editing tools.
Remark: video takes a large amount of memory space.

Output Devices
Depending on the content of the project and how the information is presented, you need different output devices. Some of the output hardware includes:
Speakers: if your project includes speech that is meant to convey a message to the audience, or background music, using speakers is obligatory.
Projector: when to use a projector:
 if you are presenting at a meeting or group discussion,
 if you are presenting to a large audience
Plotter/Printer: when the situation arises to present on paper, you use printers and/or plotters. In such cases, the print quality of the device should be taken into consideration.
Impact printers: poor quality graphics
Non-impact printers: good quality graphics

3) Network Devices
Why do we require network devices?
The following network devices are required for multimedia presentation:
i) Modem: the name stands for modulator/demodulator. A modem is used to convert a digital signal into an analog signal for communication of data over a telephone line, which can carry only analog signals. At the receiving end, it does the reverse, i.e., converts the analog signal back to digital data. Currently, the standard modem is v.90, which has a speed of 56 kbps (kilobits per second). Older standards include v.34, which has a speed of 28 kbps. Data is transferred through a modem in compressed form to save time and cost. (A rough transfer-time estimate is sketched below.)
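To put these rates in perspective, the small Python sketch below estimates transfer times at the two modem speeds mentioned above. The 5 MB file size is an assumed example value; note that modem speeds are quoted in kilobits per second, so bytes must first be converted to bits.

# Rough (uncompressed) transfer-time estimate over dial-up modems.
file_size_bytes = 5 * 1024 * 1024          # assumed example: a 5 MB audio file
file_size_bits = file_size_bytes * 8       # modem speeds are quoted in bits/second

for name, kbps in [("v.34 modem", 28), ("v.90 modem", 56)]:
    seconds = file_size_bits / (kbps * 1000)
    print(f"{name}: about {seconds / 60:.1f} minutes")
# v.34: ~25 minutes, v.90: ~12.5 minutes -- which is why compression matters.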

ii) ISDN: stands for Integrated Services Digital Network. It is a circuit-switched telephone network system designed to allow digital transmission of voice and data over ordinary copper telephone wires. This has the advantage of better quality and higher speeds than are available with analog systems.
 It has a higher transmission speed, i.e., a faster data transfer rate.
 It uses additional hardware, hence it is more expensive.
iii) Cable modem: uses the existing cable laid for television broadcast reception. The data transfer rate of such devices is very fast, i.e., they provide high bandwidth. They are primarily used to deliver broadband internet access, taking advantage of unused bandwidth on a cable television network.

iv) DSL: provides digital data transmission over the wires of the local telephone network. DSL is faster than using a telephone line with a modem. How? It carries a digital signal over the unused frequency spectrum (analog voice transmission uses only a limited range of the spectrum) available on the twisted-pair cables running between the telephone company's central office and the customer premises.

Summary
Multimedia Information Flow

Figure 1. 4 Multimedia information flow

Review Question

1. What is multimedia?
2. What are the desirable features of a multimedia system?
3. Discuss some application areas of multimedia.
4. What are the different hardware and software requirements of multimedia?
5. What is the difference between hypertext and hypermedia?
6. How is the web related to multimedia?

CHAPTER TWO
MULTIMEDIA AUTHORING AND TOOLS

2.1. What is Multimedia Authoring?

2.2. Multimedia Authoring Paradigms

2.3. Some Useful Editing and Authoring Tools

Review Questions

2.1. What is Multimedia Authoring?


Multimedia authoring is the creation of multimedia productions, sometimes called “movies” or “presentations”. Since we are interested in this subject from a computer science point of view, we are mostly interested in interactive applications. In addition, we need to consider still-image
editors, such as Adobe Photoshop, and simple video editors, such as Adobe Premiere, because
these applications help us create interactive multimedia projects. How much interaction is
necessary or meaningful depends on the application. The spectrum runs from almost no
interactivity, as in a slide show, to full-immersion virtual reality.

In a slide show, interactivity generally consists of being able to control the pace (e.g., click to
advance to the next slide). The next level of interactivity is being able to control the sequence
and choose where to go next. Next is media control: start/stop video, search text, scroll the view,
and zoom. More control is available if we can control variables, such as changing a database
search query. The level of control is substantially higher if we can control objects – say, moving
objects around a screen, playing interactive games, and so on. Finally, we can control an entire
simulation: move our perspective in the scene, control scene objects.

What is Authoring System?


Authoring is the process of creating multimedia applications. An authoring system is a program,
which has pre-programmed elements for the development of interactive multimedia
presentations. Authoring tools provide an integrated environment for binding together the
different elements of a Multimedia production. Multimedia Authoring Tools provide tools for
making a complete multimedia presentation where users usually have a lot of interactive
controls.

Multimedia presentations can be created using:

 Simple presentation packages such as PowerPoint


 Powerful RAD tools such as Delphi, .Net, JBuilder.

 True authoring environments, which lie somewhere in between in terms of technical
complexity.
Authoring systems vary widely in:
 Orientation
 Capabilities, and
 Learning curve: how easy it is to learn how to use the application

Why should you use an authoring system?


 Can speed up programming i.e. content development and delivery
 Time gains i.e. accelerated prototyping
 The content creation (graphics, text, video, audio, animation) is not affected by choice of
authoring system
There is a big distinction between authoring and programming.

Characteristics of Authoring Tools


A good authoring tool should be able to:
 integrate text, graphics, video, and audio to create a single multimedia presentation
 control interactivity by the use of menus, buttons, hotspots, hot objects etc.
 publish as a presentation or a self-running executable; on CD/DVD, Intranet, WWW
 Be extended through the use of pre-built or externally supplied components, plug-ins etc
 let you create highly efficient, integrated workflow
 Have a large user base.

2.2. Multimedia Authoring Paradigms


The authoring paradigm, or authoring metaphor, is the methodology by which the authoring
system accomplishes its task. Most authoring programs use one of several authoring metaphors,
also known as authoring paradigms. There are various paradigms:
 Scripting Language
 Icon-Based Control Authoring Tool
 Card and Page Based Authoring Tool
 Time Based Authoring Tool
 Tagging Tools

Scripting Language
The idea here is to use a special language to enable interactivity (buttons, mouse, etc.) and allow
conditionals, jumps, loops, functions/macros, and so on. Closest in form to traditional
programming. The paradigm is that of a programming language, which specifies:
 multimedia elements,
 sequencing of media elements,

 hotspots (e.g., links to other pages),
 synchronization, etc.

These tools usually use a powerful, object-oriented scripting language. Multimedia elements and events become objects that live in a hierarchical order. In-program editing of elements (still graphics, video, audio, etc.) tends to be minimal or non-existent. Most authoring tools provide a visually programmable interface in addition to the scripting language. Media handling can vary widely.
Examples
 Apple's HyperTalk for HyperCard
 Asymetrix's OpenScript for ToolBook
 The Lingo scripting language for Macromedia Director
 ActionScript for Macromedia Flash

Iconic/Flow Control Tools

In these authoring systems, multimedia elements and interaction cues (or events) are organised as
objects in a structural framework.
 Provides visual programming approach to organizing and presenting multimedia
 The core of the paradigm is the icon palette. You build a structure and flowchart of events, tasks, and decisions by dragging appropriate icons from the icon palette library. These icons can represent menu choices, graphic images, sounds, computations, video, etc.
 The flowchart graphically depicts the project logic
 Tends to be the speediest in development time. Because of this, they are best suited for rapid
prototyping and short-development time projects.
 These tools are useful for storyboarding because you can change the sequence of objects, restructure interactions, and add objects by dragging and dropping icons.

 Examples:

-Authorware

– IconAuthor

Figure 2. 1 Iconic/Flow control

Card and page Based Tools


In these authoring systems, elements are organized as pages of a book or a stack of cards. The
authoring system lets you link these pages or cards into organized sequences. You can jump, on
command, to any page you wish in a structured navigation pattern.
 Well suited for Hypertext applications, and especially suited for navigation intensive
applications.
 They are best suited for applications where the bulk of the content consist of elements that can
be viewed individually.
 Extensible via XCMDs (External Command) and DLLs (Dynamic Link Libraries).
 All objects (including individual graphic elements) can be scripted.
 Many entertainment applications are prototyped in a card/scripting system prior to compiled-
language coding.
 Each object may contain programming script that is activated when an event occurs.

Examples:
– HyperCard (Macintosh)
– SuperCard (Macintosh)
– ToolBook (Windows), etc.

Time Based Authoring Tools
In these authoring systems, elements are organized along a timeline with resolutions as high as 1/30th of a second. Sequentially organized graphic frames are played back at a speed set by the developer. Other elements, such as audio events, can be triggered at a given time or location in the sequence of events.
 These are the most popular multimedia authoring tools
 They are best suited for applications that have a message with a beginning and an end, animation-intensive pages, or synchronized media applications.
Examples
o Macromedia Director
o Macromedia Flash

Macromedia Director
Director is a powerful and complex multimedia authoring tool, which has a broad set of features for creating multimedia presentations, animations, and interactive applications. You can assemble and sequence the elements of a project using the Cast and the Score. Three important things that Director uses to arrange and synchronize media elements are:
Cast: the Cast is a multimedia database containing any media type that is to be included in the project. Director imports a wide range of data types and multimedia element formats directly into the Cast. You can also create elements from scratch and add them to the Cast. To include multimedia elements from the Cast in the presentation, you drag and drop the media onto the Stage.

Score: this is where the elements in the Cast are arranged. It is the sequence for displaying, animating, and playing cast members. The Score is made of frames, and frames contain cast members. You can set the frame rate (frames per second).
Lingo
Lingo is a full-featured object oriented scripting language used in Director.

 It enables interactivity and programmed control of elements


 It enables control of external sound and video devices
 It also enables you to control internet operations such as sending mail, reading documents and images, and building web pages.

Macromedia Flash
 Can accept both vector and bitmap graphics
 Uses a scripting language called ActionScript, which gives greater capability to control the movie.
 Flash is commonly used to create animations and advertisements, to design web-page elements, to add video to web pages, and, more recently, to develop Rich Internet Applications. Rich Internet Applications (RIAs) are web applications that have the features and functionality of traditional desktop applications. RIAs use client-side technology that can execute instructions on the client's computer (there is no need to send every piece of data to the server).
 Flash is a simple authoring tool that facilitates the creation of interactive movies. Flash follows
the score metaphor in the way the movie is created and the windows are organized.

Flash uses:
Library: a place where objects that are to be re-used are stored. The Library window shows all
the current symbols in the scene and can be toggled by the Window > Library command. A
symbol can be edited by double-clicking its name in the library, which causes it to appear on the
stage. Symbols can also be added to a scene by simply dragging the symbol from the Library
onto the stage.
Timeline: used to organize and control a movie's content over time. It manages the layers and timelines of the scene. The left portion of the Timeline window consists of one or more layers of
the Stage, which enables you to easily organize the Stage’s contents. Symbols from the Library
can be dragged onto the Stage, into a particular layer. For example, a simple movie could have
two layers, the background and foreground. The background graphic from the library can be dragged onto the stage when the background layer is selected.
Layer: helps to organize content. The Timeline is divided into layers.

ActionScript: enables interactivity and control of movies. Action scripts allow you to trigger events such as moving to a different keyframe or requiring the movie to stop. Action scripts can be attached to a keyframe or to symbols in a keyframe. Right-clicking on a symbol and choosing Actions from the list lets you modify the actions of that symbol. Similarly, by right-clicking on a keyframe and choosing Actions in the pop-up, you can apply actions to the keyframe. A Frame Actions window will come up, with a list of available actions on the left and the current actions being applied to the symbol on the right.

Tagging: tags in text files (e.g., HTML) are used to:


 link to pages,
 provide interactivity, and
 Integrate multimedia elements.
Examples:
o SGML/HTML
o SMIL (Synchronized Media Integration Language)
o VRML
o 3DML

 Most of them are displayed in web browsers using plug-ins or the browser itself can
understand them.
 This metaphor is the basis of WWW.
 It is limited but can be extended by the use of suitable multimedia tags

Multimedia Production
A multimedia project can involve a host of people with specialized skills. Multimedia production
can easily involve an art director, graphic designer, production artist, producer, project manager,
writer, user interface designer, sound designer, videographer, and 3D and 2D animators, as well
as programmers.

2.3. Some Useful Editing and Authoring Tools


Since the first step in creating a multimedia application is probably the creation of interesting video clips, we start by looking at a video editing tool. This is not really an authoring tool, but video
creation is so important that we include a small introduction to one such program. The tools we
look at are the following:
 Adobe Premiere 6
 Macromedia Director

 Macromedia Flash
 Dreamweaver MX

Adobe Premiere
Adobe Premiere is a very simple video editing program that allows you to quickly create a
simple digital video by assembling and merging multimedia components. It effectively uses the
score authoring metaphor, in that components are placed in tracks horizontally, in a Timeline
window. The File > New Project command opens a window that displays a series of presets –
assemblies of values for frame resolution, compression method, and frame rate. There are many
preset options, most of which conform to some NTSC or PAL video standard. Start by importing
resources, such as AVI (Audio Video Interleave) video files and WAV sound files and dragging
them from the Project window onto tracks 1 or 2.

Selecting Authoring Tools


The multimedia project you are developing has its own underlying structure and purpose. When
selecting tools for your project you need to consider that purpose. Some of the features that you
have to take into consideration when selecting authoring tools are:
1) Editing features: editing features for multimedia data, especially images and text, are often included in authoring tools. The more editors your authoring system has, the fewer specialized editing tools you need. The editors that come with authoring tools offer only a subset of the features found in dedicated editing tools. If you need more capability, you still have to go to dedicated editing tools (e.g., sound editing tools for sound editing).
2) Organizing features: the organization of media in your project involves navigation diagrams, flowcharts, etc. Some authoring tools provide a visual flowcharting facility. Such features help you organize the project. For example, IconAuthor and Authorware use flowcharting and navigation diagram methods to organize media.

3) Programming feature: there are different types of programming approach:


a. Visual programming: this is programming using cues, icons, and objects. It is done using drag and drop; to include a sound in your project, you drag and drop it onto the stage. Advantage: the simplest and easiest authoring process; it is particularly useful for slide shows and presentations.
b. Programming with a scripting language: some authoring tools provide a very high-level scripting language and an interpreted scripting environment. This helps with navigation control and enabling user input.
c. Programming with a traditional language such as BASIC or C: some authoring tools can work with traditional programs, such as programs written in C, which can be called from the authoring tool. Some authoring tools also allow calling DLLs (Dynamic Link Libraries).
d. Document development tools

4) Interactivity features: interactivity allows the end user of the project to control the content and flow of information. Some interactivity levels:

a. Simple branching: enables the user to go to any location in the presentation using key press,
mouse click, etc.
b. Conditional branching: branching based on if-then decisions
c. Structured branching: supports complex programming logic, such as nested if-thens and subroutines

5) Performance-tuning features: accomplishing synchronization of multimedia is sometimes difficult because performance varies across different computers. In such cases, you need to use the authoring tool's own scripting language to specify timing and sequencing on the target system.

6) Playback feature: easy testing of the project. Testing enables you to debug the system and find out how the user interacts with it, without wasting time repeatedly assembling the whole project.
7) Delivery feature: delivering your project requires building a run-time version of the project using the authoring tool. Why a run-time (executable) version?
 It does not require the full authoring software to play
 It does not allow users to access or change the content, structure, and programming of the project; you distribute only the run-time version.
8) Cross-platform feature: multimedia projects should be compatible with different platforms like Macintosh, Windows, etc. This enables the designer to use any platform to design the project and to deliver it to any platform.
9) Internet playability: the web is a significant delivery medium for multimedia. Authoring tools typically provide a facility so that output can be delivered in HTML or DHTML format.
10) Ease of learning: is it easy to learn? The designer should not waste much time learning how
to use it. Is it easy to use?

Review Questions

1. Explain briefly what multimedia authoring is. Why is it important for multimedia?
2. Can you mention different multimedia authoring tools and their importance?
3. Explain briefly the characteristics of authoring tools.
4. What is a time-based authoring tool?

CHAPTER THREE:
MULTIMEDIA DATA REPRESENTATIONS
3.1. Graphic/Image Data Representation

3.2. Popular File Formats

3.3. Digital Audio and MIDI

Review Questions

3.1. Graphic/Image Data Representation


The number of file formats used in multimedia continues to proliferate. GIF and JPG are the two image file formats that most web browsers can decompress and display.

An image can be described as a two-dimensional array of points, where every point is allocated its own color. Every such point is called a pixel, short for picture element. An image is a collection of these points, colored in such a way that they produce meaningful information/data. A pixel (picture element) contains the color or hue and the relative brightness of that point in the image. The number of pixels in the image determines the resolution of the image.
 A digital image consists of many picture elements, called pixels
 The number of pixels determines the quality of the image (the image resolution).
 Higher resolution always yields better quality.

 Bitmap resolution: most graphics applications let you create bitmaps up to 300 dots per inch
(dpi). Such high resolution is useful for print media, but on the screen most of the
information is lost, since monitors usually display around 72 to 96 dpi.
 A bit-map representation stores the graphic/image data in the same manner that the computer monitor contents are stored in video memory.
 Most graphic/image formats incorporate compression because of the large size of the data.

Types of Images
There are two basic forms of computer graphics: bit-maps and vector graphics. The kind you use
determines the tools you choose. Bitmap formats are the ones used for digital photographs.
Vector formats are used only for line drawings.
Bit-map images (also called Raster Graphics)
They are formed from pixels – a matrix of dots with different colors. Bitmap images are defined by their dimensions in pixels as well as by the number of colors they represent. For example, a 640 x 480 image contains 640 pixels and 480 pixels in the horizontal and vertical directions, respectively. If you enlarge a small area of a bit-mapped image, you can clearly see the pixels that are used to create it.

Each of the small pixels can be a shade of gray or a color. Using 24-bit color, each pixel can be
set to any one of 16 million colors. All digital photographs and paintings are bitmapped, and any
other kind of image can be saved or exported into a bitmap format. In fact, when you print any
kind of image on a laser or ink-jet printer, it is first converted by either the computer or printer
into a bitmap form so it can be printed with the dots the printer uses.
To edit or modify bitmapped images you use a paint program. Bitmap images are widely used
but they suffer from a few unavoidable problems. They must be printed or displayed at a size
determined by the number of pixels in the image. Bitmap images also have large file sizes that are determined by the image's dimensions in pixels and its color depth. To reduce this problem,
some graphic formats such as GIF and JPEG are used to store images in compressed format.
Vector graphics
They are really just a list of graphical objects such as lines, rectangles, ellipses, arcs, or curves –
called primitives. Draw programs, also called vector graphics programs, are used to create and
edit these vector graphics. These programs store the primitives as a set of numerical coordinates

28
and mathematical formulas that specify their shape and position in the image. This format is
widely used by computer-aided design programs to create detailed engineering and design
drawings. It is also used in multimedia when 3D animation is desired. Draw programs have a number of advantages over paint-type programs. These include (a minimal comparison of the two representations is sketched after the list below):
 Precise control over lines and colors.
 Ability to skew and rotate objects to see them from different angles or add perspective.
 Ability to scale objects to any size to fit the available space. Vector graphics always print at
the best resolution of the printer you use, no matter what size you make them.
 Color blends and shadings can be easily changed.
 Text can be wrapped around objects.
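As a concrete, deliberately tiny illustration of the two forms, the Python sketch below (an assumed example, not tied to any particular graphics package) stores a circle once as a vector primitive — a shape description plus a few parameters — and once rasterized into a small bitmap of pixels.

# Vector representation: just the parameters of the primitive.
circle = {"shape": "circle", "cx": 8, "cy": 8, "r": 6}     # a handful of numbers

# Raster (bitmap) representation: a fixed grid of pixels.
W, H = 16, 16
bitmap = [[0] * W for _ in range(H)]
for y in range(H):
    for x in range(W):
        # Turn on the pixels whose centres fall inside the circle.
        if (x - circle["cx"]) ** 2 + (y - circle["cy"]) ** 2 <= circle["r"] ** 2:
            bitmap[y][x] = 1

for row in bitmap:
    print("".join("#" if p else "." for p in row))

# The vector form can be rescaled losslessly by changing r; the bitmap has a
# fixed pixel count, so enlarging it makes the individual pixels visible.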

Types of Bitmap Images

1. Monochrome/Bit-Map Images
 Images consist of pixels, or pels – picture elements in digital images. A 1-bit image consists of
on and off bits only and thus is the simplest type of image. Each pixel is stored as a single bit (0
or 1). Hence, such an image is also referred to as a binary image. It is also called a 1-bit monochrome image, since it contains no color.
 The value of the bit indicates whether it is light or dark
 A 640 x 480 monochrome image requires 37.5 KB of storage.
 Dithering is used to calculate patterns of dots such that values from 0 to 255 correspond to
patterns that are more and more filled at darker pixel values, for printing on a 1-bit printer.
 Dithering is often used for displaying monochrome images (a small sketch of ordered dithering follows the figure below).

Figure 3. 2 Monochrome image
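As a minimal sketch of the idea — an assumed example using a standard 4x4 Bayer matrix rather than the exact pattern any particular printer would use — the Python code below converts an 8-bit grayscale gradient into a 1-bit dithered pattern by comparing each pixel against a position-dependent threshold.

# Ordered (Bayer) dithering: map 8-bit gray values to a 1-bit dot pattern.
BAYER_4x4 = [
    [ 0,  8,  2, 10],
    [12,  4, 14,  6],
    [ 3, 11,  1,  9],
    [15,  7, 13,  5],
]

def dither(gray):
    """gray: 2D list of values 0..255 -> 2D list of 0/1 pixels."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Compare against a position-dependent threshold: brighter values
            # turn on more dots, darker values turn on fewer.
            threshold = (BAYER_4x4[y % 4][x % 4] + 0.5) * 256 / 16
            out[y][x] = 1 if gray[y][x] > threshold else 0
    return out

# Example input: an 8 x 16 horizontal gradient from black (0) to white (255).
gradient = [[int(x * 255 / 15) for x in range(16)] for _ in range(8)]
for row in dither(gradient):
    print("".join("#" if p else "." for p in row))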

2. Gray-scale Images
 Each pixel is usually stored as a byte (a value between 0 and 255). The entire image can be
thought of as a two-dimensional array of pixel values. We refer to such an array as a bitmap, a
representation of the graphics/image data that parallels the manner in which it is stored in video
memory.
 This value indicates the degree of brightness of that point. This brightness goes from black to
white
 A 640 x 480 grayscale image requires over 300 KB of storage.

Figure 3. 3 Gray-scale Images

3. 8-bit Color Images


 One byte for each pixel
 Supports 256 colors out of the millions possible; acceptable color quality
 Requires Color Look-Up Tables (LUTs)
 A 640 x 480 8-bit color image requires 307.2 KB of storage (the same as 8-bit greyscale).
Examples: GIF

Such image files use the concept of a lookup table to store color information. Basically, the
image stores not color but instead just a set of bytes, each of which is an index into a table with
3-byte values that specify the color for a pixel with that lookup table index. In a way, it is a bit
like a paint-by-number children's art set, with number 1 perhaps standing for orange, number 2 for
green, and so on – there is no inherent pattern to the set of actual colors.

Color lookup Tables (LUTs)


The idea used in 8-bit color images is to store only the index, or code value, for each pixel. Then, if a
pixel stores, say, the value 25, the meaning is to go to row 25 in a color lookup table (LUT).
While images are displayed as two-dimensional arrays of values, they are usually stored in row-
column order as simply a long series of values. For an 8-bit image, the image file can store in the
file header information just what 8-bit values for R, G, and B correspond to each index. The
figure below displays this idea. The LUT is often called a palette.
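A minimal Python sketch of the palette idea follows; the three palette entries and the tiny indexed image are made-up examples, not values taken from any particular file.

# Minimal sketch of an indexed (palette-based) image.
palette = {
    0: (255, 255, 255),   # white
    1: (255, 128, 0),     # orange
    2: (0, 160, 0),       # green
}

indexed_image = [
    [0, 1, 1],
    [2, 2, 0],
]

# At display time, each stored index is replaced by its 3-byte RGB value.
rgb_image = [[palette[i] for i in row] for row in indexed_image]
print(rgb_image[0][1])   # (255, 128, 0) - the color behind index 1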

Figure 3. 5 Color LUT for 8-bit color images.

4. 24-bit Color Images


 Each pixel is represented by three bytes (e.g., RGB)
 Supports 256 x 256 x 256 possible combined colors (16,777,216)
 A 640 x 480 24-bit color image would require 921.6 KB of storage
 Most 24-bit images are actually stored as 32-bit images.
 The extra byte of data for each pixel is used to store an alpha value representing special-effect
information, such as transparency.
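The storage figures quoted above follow directly from width x height x bits per pixel, as the small Python sketch below shows. Note that the 37.5 KB figure for the monochrome image treats 1 KB as 1,024 bytes, while the 307.2 KB and 921.6 KB figures treat 1 KB as 1,000 bytes.

def raw_image_bytes(width, height, bits_per_pixel):
    """Uncompressed bitmap size in bytes."""
    return width * height * bits_per_pixel / 8

print(raw_image_bytes(640, 480, 1))           # 38400.0 bytes = 37.5 KB (1 KB = 1024 bytes)
print(raw_image_bytes(640, 480, 8) / 1000)    # 307.2 KB (8-bit color or grayscale)
print(raw_image_bytes(640, 480, 24) / 1000)   # 921.6 KB (24-bit color)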

Image Resolution

Image resolution refers to the spacing of pixels in an image and is measured in pixels per inch,
ppi, sometimes called dots per inch, dpi. The higher the resolution, the more pixels in the image.
A printed image that has a low resolution may look pixelated or made up of small squares, with
jagged edges and without smoothness. Image size refers to the physical dimensions of an image.

3.2. Popular File Formats


Choosing the right file type in which to save your image is of vital importance. If you are, for
example, creating images for web pages, they should load fast, so such images should be small in
size. Another criterion for choosing a file type is the quality of the image that is possible with the
chosen file type. You should also be concerned about the portability of the image. When choosing
a file type, consider:
 the resulting size of the image – large file size or small
 the quality of image possible with the file type
 the portability of the file across different platforms

The most common formats used on internet are the GIF, JPG, and PNG.
Standard System Independent Formats
Graphics Interchange Format (GIF)
Graphics Interchange Format (GIF) was devised by CompuServe, initially for transmitting graphical
images over phone lines via modems.
 Uses the Lempel-Ziv-Welch (LZW) algorithm (a dictionary-based compression method), modified slightly for
image scan line packets (line grouping of pixels).
 Limited to only 8-bit (256) color images, suitable for images with few distinctive colors (e.g.,
graphics drawing)
 Supports one-dimensional interlacing (downloading gradually in web browsers. Interlaced
images appear gradually while they are downloading. They display at a low blurry resolution
first and then transition to full resolution by the time the download is complete.)

 Supports animation multiple pictures per file (animated GIF)


 GIF format has long been the most popular on the Internet, mainly because of its small size
 GIFs allow single-bit transparency, which means when you are creating your image, you can
specify one color to be transparent. This allows background colors to show through the image

PNG
 stands for Portable Network Graphics
 It is intended as a replacement for GIF in the WWW and image editing tools.
 GIF uses LZW compression, which was patented by Unisys; for years, use of GIF could require
royalty payments to Unisys (the patent has since expired).
 PNG uses unpatented zip technology for compression
 One version of PNG, PNG-8, is similar to the GIF format. It can be saved with a maximum of
256 colors and supports 1-bit transparency. File sizes, when saved in a capable image editor like
Fireworks, will be noticeably smaller than their GIF counterparts, as PNGs save their color
data more efficiently.
 PNG-24 is another version of PNG, with 24-bit color support, allowing ranges of color similar to a
high-color JPG. However, PNG-24 is in no way a replacement format for JPG, because it is a
lossless compression format, which results in larger file sizes.
 Provides transparency using alpha value

 Supports interlacing
 PNG can be animated through the MNG extension of the format, but browser support is less
for this format.

JPEG/JPG
 A standard for photographic image compression
 created by the Joint Photographic Experts Group
 Intended for encoding and compression of photographs and similar images
 Takes advantage of limitations in the human vision system to achieve high rates of
compression

 Uses complex lossy compression, which allows the user to set the desired level of quality
(compression). A compression setting of about 60% often gives a good balance of quality
and file size.
 Though JPGs can be interlaced, they do not support animation and transparency unlike GIF.

TIFF
 Tagged Image File Format (TIFF), stores many different types of images (e.g., monochrome,
grayscale, 8-bit & 24-bit RGB, etc.)
 Uses tags, keywords defining the characteristics of the image that is included in the file. For
example, a picture 320 by 240 pixels would include a ‘width’ tag followed by the number ‘320’
and a ‘depth’ tag followed by the number ‘240’.
 Developed by the Aldus Corp. in the 1980s and later supported by Microsoft
 TIFF is a lossless format (when not utilizing the new JPEG tag which allows for JPEG
compression)
 It does not provide any major advantages over JPEG and is not as user-controllable.
 Do not use TIFF for web images. They produce big files, and more importantly, most web
browsers will not display TIFFs.

System Dependent Formats


Microsoft Windows: BMP
Bit Map (BMP) is the major system standard graphics file format for Microsoft Windows, used
in Microsoft Paint and other programs. It makes use of run-length encoding compression and can
efficiently store 24-bit bitmap images. Note, however, that BMP has many different modes,
including uncompressed 24-bit images.
 A system standard graphics file format for Microsoft Windows
 Used in Many PC Graphics programs
 It is capable of storing 24-bit bitmap images
Macintosh: PAINT and PICT

 PAINT was originally used in Mac Paint program, initially only for 1-bit monochrome images.
 PICT is a file format that was developed by Apple Computer in 1984 as the native format for
Macintosh graphics.
 The PICT format is a meta-format that can be used for both bitmap images and vector images
though it was originally used in MacDraw (a vector based drawing program) for storing
structured graphics.
 Still an underlying Mac format (although PDF has taken over this role on OS X).

3.3. Digital Audio and MIDI


What is Sound?
Sound is produced by a rapid variation in the average density or pressure of air molecules above
and below the current atmospheric pressure. We perceive sound as these pressure fluctuations
cause our eardrums to vibrate. These usually minute changes in atmospheric pressure are referred
to as sound pressure and the fluctuations in pressure as sound waves. Sound waves are produced
by a vibrating body, be it a guitar string, loudspeaker cone or jet engine. The vibrating sound
source causes a disturbance to the surrounding air molecules, causing them to bounce off each other
with a force proportional to the disturbance. The back-and-forth oscillation of pressure produces
a sound wave.

Source — generates sound:
 Air pressure changes
 Electrical — a microphone produces an electric signal
 Acoustic (sound) — direct pressure variations
Destination — receives sound:
 Electrical — a loudspeaker
 Ears — respond to pressure and hear sound

How to Record and Play Digital Audio


In order to play digital audio (i.e WAVE file), you need a card with a Digital to Analog
Converter (DAC) circuitry on it. Most sound cards have both an ADC (Analog to Digital
Converter) and a DAC so that the card can both record and play digital audio. This DAC is
attached to the Line Out jack of your audio card, and converts the digital audio values back into
the original analog audio. This analog audio can then be routed to a mixer, or speakers, or
headphones so that you can hear the recreation of what was originally recorded. The playback
process is almost an exact reverse of the recording process.

First, to record digital audio, you need a card, which has an Analog to Digital Converter (ADC)
circuitry. The ADC is attached to the Line In (and Mic In) jack of your audio card, and converts
the incoming analog audio to a digital signal. Your computer software can store the digitized
audio on your hard drive, display it visually on the computer's monitor, manipulate it mathematically
to add effects or process the sound, etc. While the incoming analog audio is being
recorded, the ADC creates many digital values in its conversion to a digital audio
representation of what is being recorded. These values must be stored for later playback.

Digitizing Sound
 Microphone produces analog signal
 Computers understand only discrete (digital) entities
This creates a need to convert analog audio to digital audio using specialized hardware. This is
also known as sampling.
Common Audio Formats
There are two basic types of audio files:

1. The Traditional Discrete Audio File:


In traditional audio file, you can save to a hard drive or other digital storage medium.
WAV: The WAV format is the standard audio file format for Microsoft Windows applications,
and is the default file type produced when conducting digital recording within Windows. It
supports a variety of bit resolutions, sample rates, and channels of audio. This format is very
popular upon IBM PC (clone) platforms, and is widely used as a basic format for saving and
modifying digital audio data.
AIF: The Audio Interchange File Format (AIFF) is the standard audio format employed by
computers using the Apple Macintosh operating system. Like the WAV format, it supports a
variety of bit resolutions, sample rates, and channels of audio and is widely used in software
programs used to create and modify digital audio.

AU: The AU file format is a compressed audio file format developed by Sun Microsystems and
popular in the unix world. It is also the standard audio file format for the Java programming
language. Only supports 8-bit depth thus cannot provide CD-quality sound.
MP3: MP3 stands for MPEG (Moving Picture Experts Group) Audio Layer 3 compression. MP3 files
provide near-CD-quality sound but are only about 1/10th as large as a standard audio CD file.
Because MP3 files are small, they can easily be transferred across the Internet and played on any
multimedia computer with MP3 player software.
MIDI/MID: MIDI (Musical Instrument Digital Interface), is not a file format for storing or
transmitting recorded sounds, but rather a set of instructions used to play electronic music on
devices such as synthesizers. MIDI files are very small compared to recorded audio file formats.
However, the quality and range of MIDI tones is limited.

2. Streaming Audio File Formats


Streaming is a network technique for transferring data from a server to client in a format that can
be continuously read and processed by the client computer. Using this method, the client
computer can start playing the initial elements of large time-based audio or video files before the
entire file is downloaded. As the Internet grows, streaming technologies are becoming an
increasingly important way to deliver time-based audio and video data. For streaming to work,
the client side has to receive the data and continuously feed it to the player application. If the
client receives the data more quickly than required, it has to temporarily store or buffer the
excess for later play. On the other hand, if the data does not arrive quickly enough, the audio or
video presentation will be interrupted. There are three primary streaming formats that support
audio files: RealNetwork’s RealAudio (RA, RM), Microsoft’s Advanced Streaming Format
(ASF) and its audio subset called Windows Media Audio 7 (WMA), and Apple's QuickTime
4.0+ (MOV).

RA/RM
For audio data on the Internet, the de facto standard is RealNetwork’s RealAudio (.RA)
compressed streaming audio format. These files require a RealPlayer program or browser plug-
in. The latest versions of RealNetworksserver and player software can handle multiple
encodings of a single file, allowing the quality of transmission to vary with the available
bandwidth. Webcast radio broadcast of both talk and music frequently uses RealAudio. treaming
audio can also be provided in conjunction with video as a combined RealMedia (RM) file.

ASF
Microsoft's Advanced Streaming Format (ASF) is similar in design to RealNetworks'
RealMedia format, in that it provides a common definition for internet streaming media and can
accommodate not only synchronized audio, but also video and other multimedia elements, all
while supporting multiple bandwidths within a single media file. Also like RealNetworks'
RealMedia format, Microsoft's ASF requires a player program or browser plug-in.
The pure audio file format used in Windows Media Technologies is Windows Media Audio 7
(WMA files). Like MP3 files, WMA audio files use sophisticated audio compression to reduce
file size. Unlike MP3 files, however, WMA files can function as either discrete or streaming data
and can provide a security mechanism to prevent unauthorized use.

MOV

Apple QuickTime movies (MOV files) can be created without a video channel and used as a
sound-only format. Since version 4.0, QuickTime provides true streaming capability. QuickTime
also accepts different audio sample rates, bit depths, and offers full functionality in both
Windows as well as the Mac OS. Popular audio file formats are:

 au (Unix)
 aiff (MAC)
 wav (PC)
 mp3

MIDI
MIDI stands for Musical Instrument Digital Interface.
Definition of MIDI: MIDI is a protocol that enables computers, synthesizers, keyboards, and other
musical devices to communicate with each other. This protocol is a language that allows
interworking between instruments from different manufacturers by providing a link that is
capable of transmitting and receiving digital data. MIDI transmits only commands; it does not
transmit an audio signal. It was created in 1982.

Components of a MIDI System

1. Synthesizer: It is a sound generator (various pitch, loudness, tone color). A good (musician’s)
synthesizer often has a microprocessor, keyboard, control panels, memory, etc.
2. Sequencer: It can be a stand-alone unit or a software program for a personal computer. (It used to
be a storage device for MIDI data; nowadays it is more a software music editor on the computer.)
It has one or more MIDI INs and MIDI OUTs.

Basic MIDI Concepts


Track: Track in sequencer is used to organize the recordings. Tracks can be turned on or off on
recording or playing back.
Channel: MIDI channels are used to separate information in a MIDI system. There are 16 MIDI
channels in one cable. Channel numbers are coded into each MIDI message.
Timbre: The quality of the sound, e.g., flute sound, cello sound, etc.

Multi-timbral: capable of playing many different sounds at the same time (e.g., piano, brass,
drums,..)
Pitch: The Musical note that the instrument plays
Voice: Voice is the portion of the synthesizer that produces sound. Synthesizers can have many
(12, 20, 24, 36, etc.) voices. Each voice works independently and simultaneously to
produce sounds of different timbre and pitch.
Patch: The control settings that define a particular timbre.

Review Questions

1. What is the difference between bitmap and vector graphics? Which one is better? Why?
2. What is a grayscale image?
3. What are the different popular image file formats?
4. Explain briefly the common audio formats. Where is each used?

CHAPTER FOUR:
COLORS IN IMAGE AND VIDEO
Light and Spectra
In 1672, Isaac Newton discovered that white light could be split into many colors by a prism.
The colors produced by light passing through a prism are arranged in a precise array, or spectrum.
Each color's spectral signature is identified by its wavelength.

Figure 4. 1 Isaac Newton’s experiments

Visible light is an electromagnetic wave in the 400nm-700nm range (Blue~400nm, Red~700nm,


Green~500nm). Most light we see is not one wavelength; it is a combination of many
wavelengths. For example, purple is a mixture of red and violet. 1 nm = 10^-9 m.

The Color of Objects


Here we consider the color of an object illuminated by white light. Color is produced by the
absorption of selected wavelengths of light by an object. Objects can be thought of as absorbing
all colors except the colors of their appearance, which are reflected back. A blue object
illuminated by white light absorbs most of the wavelengths except those corresponding to blue
light. These blue wavelengths are reflected by the object.

4.1. Color Spaces

4.2. Color Models in Images


4.3. Color Models in Video

Review Questions

4.1. Color Spaces


Color space specifies how color information is represented. It is also called color model. Any
color could be described in a three dimensional graph, called a color space. Mathematically the
axis can be tilted (sloped) or moved in different directions to change the way the space is
described, without changing the actual colors. The values along an axis can be linear or non-
linear. This gives a variety of ways to describe colors that have an impact on the way we process
a color image. There are different ways of representing color. Some of these are:
 RGB color space
 YUV color space
 YIQ color space
 CMY/CMYK color space
 YCbCr color space

RGB Color Space

RGB stands for Red, Green, Blue. RGB color space expresses/defines color as a mixture of three
primary colors:
 Red
 Green
 Blue

Absence of all three primaries creates black, and full presence of the three primaries forms white. These colors are
called additive colors. Pure black is (0,0,0) and pure white is (255,255,255); all other colors are produced
by varying the intensity of these three primaries and mixing the colors.

Figure 4. 2 RGB color Model

These colors are called additive colors since they add together the way light adds to make colors,
and RGB is a natural color space to use with video displays.

CMY and CMYK color space


The subtractive color system reproduces colors by subtracting some wavelengths of light from
white. The three subtractive color primaries are cyan, magenta, and yellow (CMY). If none of
these colors is present, the color produced is white because nothing has been subtracted from the
white light. If all the colors are present at their maximum amounts, the color produced is black
because all of the light has been subtracted from the white light.

A color model used with printers and other peripherals. Three primary colors, cyan (C), magenta
(M), and yellow (Y), are used to reproduce all colors.

Figure 4. 3 CMY Color Model

The three colors together absorb all the light that strikes the paper, appearing black (as contrasted to
RGB where the three colors together made white). “Nothing” on the paper is white (as contrasted
to RGB where nothing was black). These are called the subtractive or “paint” colors. Cyan,
Magenta, and Yellow (CMY) are complementary colors of RGB. CMY model is mostly used in
printing devices where the color pigments on the paper absorb certain colors (e.g., no red light
reflected from cyan ink) and in painting.

In practice, it is difficult to have the exact mix of the three colors to perfectly absorb all light and
thus produce a black color. Expensive inks are required to produce the exact color, and the paper
must absorb each color in exactly the same way. To avoid these problems, a fourth color is often
added – black – creating the CMYK color space (K stands for black), even though the black is
mathematically not required. CMYK is widely used in color printing (e.g., to produce a darker
black than simply mixing CMY).

4.2. Color Models in Images


Colors models and spaces used for stored, displayed, and printed images.

RGB Color Model (Additive Model)


Color images are encoded as integer triplet (R,G,B) values. These triplets encode how much the
corresponding phosphor should be excited in devices such as a monitor. For images produced
from computer graphics, we store integers proportional to intensity in the frame buffer.

CMY Color Model (Subtractive Color)

Figure 4. 4 RGB and CMY Cube

So far, we have effectively been dealing only with additive color. Namely, when two light beams
impinge on a target, their colors add; when two phosphors on a CRT screen are turned
on, their colors add. Therefore, for example, red phosphor + green phosphor makes yellow light.
However, for ink deposited on paper, in essence the opposite situation holds: yellow ink
subtracts blue from white illumination but reflects red and green, which is why it appears
yellow! Therefore, instead of red, green, and blue primaries, we need primaries that amount to
-red, -green, and -blue; we need to subtract R, G, or B. These subtractive color primaries are
cyan (C), magenta (M), and yellow (Y) inks. RGB and CMY are connected. In the additive
(RGB) system, black is “no light”, RGB = (0, 0, 0). In the subtractive CMY system, black arises
from subtracting all the light by laying down inks with C = M = Y = 1.
Transformation from RGB to CMY
The simplest model we can invent to specify what ink density to lay down on paper, to make a
certain desired RGB color (with all values normalized to the range 0-1), is:
C = 1 - R,  M = 1 - G,  Y = 1 - B
Then the inverse transform is:
R = 1 - C,  G = 1 - M,  B = 1 - Y
Undercolor Removal: CMYK System
Undercolor removal gives sharper and cheaper printer colors: calculate the part of the CMY mix that
would be black, remove it from the color proportions, and add it back as real black, K:
K = min(C, M, Y),  C' = C - K,  M' = M - K,  Y' = Y - K


The color combinations that result from combining primary colors differ between the two situations,
additive color and subtractive color.
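A short Python sketch of the transform and undercolor removal described above, assuming R, G and B values normalized to the range 0-1:

def rgb_to_cmyk(r, g, b):
    """RGB -> CMYK with simple undercolor removal; inputs are in [0, 1]."""
    c, m, y = 1 - r, 1 - g, 1 - b      # CMY = (1, 1, 1) - RGB
    k = min(c, m, y)                   # the part of the mix that would be black
    return c - k, m - k, y - k, k

print(rgb_to_cmyk(1.0, 0.0, 0.0))   # red   -> (0, 1, 1, 0)
print(rgb_to_cmyk(0.0, 0.0, 0.0))   # black -> (0, 0, 0, 1)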

4.3. Color Models in Video
Video Color Transforms
Methods of dealing with color in digital video derive largely from older analog methods of
coding color for TV. Typically, some version of the luminance is combined with color
information in a single signal. YIQ is used to transmit TV signals in North America and Japan.
This coding also makes its way into VHS videotape coding in these countries, since video tape
technologies also use YIQ. In Europe, videotape uses the PAL or SECAM codings, which are
based on TV that uses a matrix transform called YUV. Finally, digital video mostly uses a matrix
transform called YCbCr that is closely related to YUV.

YUV Color Model


The YUV model was adopted in 1982 as the basis for a digital video standard. Video is represented by a sequence of fields
(odd and even lines). Two fields make a frame. Works in PAL (50 fields/sec) or NTSC (60
fields/sec). The luminance (brightness), Y, is retained separately from the chrominance (color).
The Y component determines the brightness of the color (referred to as luminance or luma),
while the U and V components determine the color itself (it is called chroma). U is the axis from
blue to yellow and V is the axis from magenta to cyan. Y ranges from 0 to 1 (or 0 to 255 in
digital formats), while U and V range from -0.5 to 0.5 (or -128 to 127 in signed digital form, or 0
to 255 in unsigned form). The component of a television signal which carries information on the
brightness of the image. Y(Luminance) is the intensity of light emitted from a surface per unit
area in a given direction. Chrominance refers to the difference between a color and a reference
white at the same luminance.
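As an illustration, the Python sketch below applies the commonly quoted analog RGB-to-YUV transform; the coefficients (the 0.299/0.587/0.114 luma weights and the 0.492/0.877 chroma scalings) are the usual reference values rather than something specified in this module.

def rgb_to_yuv(r, g, b):
    """RGB (each in [0, 1]) to YUV using the usual luma weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance (luma)
    u = 0.492 * (b - y)                      # blue-difference chroma
    v = 0.877 * (r - y)                      # red-difference chroma
    return y, u, v

print(rgb_to_yuv(1.0, 1.0, 1.0))   # white: Y ≈ 1, U ≈ 0, V ≈ 0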

YIQ Color Model
YIQ is used in color TV broadcasting; it is downward compatible with black-and-white TV. The
YIQ color space is commonly used in North American television systems. Note that if the
chrominance is ignored, the result is a “black and white” picture. I and Q are a rotated version of
U and V: Y in YIQ is the same as in YUV, while U and V are rotated by 33 degrees. I is the
red-orange axis, and Q is roughly orthogonal to I. The eye is most sensitive to Y (luminance), next to I,
and then to Q. YIQ is intended to take advantage of human color response characteristics. Since the eye is
more sensitive to I than to Q, less bandwidth is required for Q than for I. NTSC
limits I to 1.5 MHz and Q to 0.6 MHz; Y is assigned the highest bandwidth, 4 MHz.

YCbCr
This is similar to YUV. This color space is closely related to the YUV space, but with the
coordinates shifted to allow all positive valued coefficients. The luminance (brightness), Y, is
retained separately from the chrominance (color). During development and testing of JPEG, it
became apparent that chrominance sub sampling in this space allowed a much better
compression than simply compressing RGB or CYM. Sub sampling means that only one half or
one quarter as much detail is retained for the color as for the brightness. It is used in MPEG and
JPEG compressions.

Y is the luma (brightness) component; Cb and Cr are the blue-difference and red-difference chroma components.

Summary of Color
 Color images are encoded as (R,G,B) integer triplet values. These triplets encode how much
the corresponding phosphor should be excited in devices such as a monitor.

 Three common systems of encoding in video are RGB, YIQ, and YCbCr (YUV).
 Besides the hardware-oriented color models (i.e., RGB, CMY, YIQ, YUV), HSB (Hue,
Saturation, and Brightness, e.g., used in Photoshop) and HLS (Hue, Lightness, and Saturation)
are also commonly used.
 YIQ uses properties of the human eye to prioritize information. Y is the black and white
(luminance) image; I and Q are the color (chrominance) images. YUV uses similar idea.
 YUV is a standard for digital video that specifies image size, and decimates the chrominance
images (for 4:2:2 video)
 A black and white image is a 2-D array of integers

Review Questions

1. Explain RGB and CMY color models with their color cube.
2. Is there any difference between CMY and CMYK colors?
3. Can you describe HSB color model that used in Photoshop?

CHAPTER FIVE:
FUNDAMENTAL CONCEPTS IN VIDEO
Lesson Content

5.1. Types of Video

5.2. Analog Video

5.3. Digital Video

5.4. Types of Color Video Signals

5.5. Video Broadcasting Standards/ TV standards

Review Question

5.1. Types of Video


5.2. Analog Video
Analog technology requires information representing images and sound to be in a real time
continuous-scale electric signal between sources and receivers. It is used throughout the
television industry. Distortion of images and noise are common problems for analog video. In an
analogue video signal, each frame is represented by a fluctuating voltage signal. This is known
as an analogue waveform. One of the earliest formats for this was composite video. Analog
formats are susceptible to loss due to transmission noise effects. Quality loss is also possible
from one generation to another. This type of loss is like photocopying, in which a copy of a copy
is never as good as the original. Most TV is still sent and received as an analog signal. Once the
electrical signal is received, we may assume that brightness is at least a monotonic function of
voltage.

An analog signal f(t) samples a time-varying image. So-called progressive scanning traces
through a complete picture (a frame) row-wise for each time interval. A high-resolution
computer monitor typically uses a time interval of 1/72 second. In TV and in some monitors and
multimedia standards, another system, interlaced scanning, is used. Here, the odd-numbered lines
are traced first, then the even-numbered lines. This results in “odd” and “even” fields – two fields
make up one frame.

In fact, the odd lines (starting from 1) end up at the middle of a line at the end of the odd field,
and the even scan starts at a halfway point. The following figure shows the scheme used. First
the solid (odd) lines are traced – P to Q, then R to S, and so on, ending at T – then the even field
starts at U and ends at V. The scan lines are not horizontal because a small voltage is applied,
moving the electron beam down over time.

Figure 5. 1 Scanning video

5.3. Digital Video


Digital technology is based on images represented in the form of bits. A digital video signal is
actually a pattern of 1’s and 0’s that represent the video image. With a digital video signal, there
is no variation in the original signal once it is captured on to computer disc. Therefore, the image
does not lose any of its original sharpness and clarity. The image is an exact copy of the original.
A computer is the most common form of digital technology. The limitations of analog video led
to the birth of digital video. Digital video is just a digital representation of the analogue video
signal. Unlike analogue video that degrades in quality from one generation to the next, digital
video does not degrade. Each generation of digital video is identical to the parent. Even though
the data is digital, virtually all digital, formats are still stored on sequential tapes. There are two
significant advantages for using computers for digital video:

 the ability to randomly access the storage of video and


 Compress the video stored.

Computer-based digital video is defined as a series of individual images and associated audio.
These elements are stored in a format in which both elements (pixel and sound sample) are
represented as a series of binary digits (bits). Almost all digital video uses component video.

The advantages of digital representation for video are many. It permits

 Storing video on digital devices or in memory, ready to be processed (noise removal, cut and
paste, and so on) and integrated into various multimedia applications
 Direct access, which makes nonlinear video editing simple

 Repeated recording without degradation of image quality
 Ease of encryption and better tolerance to channel noise

Analog vs. Digital Video

An analog video can be very similar to the original video copied, but it is not identical. Digital
copies will always be identical and will not lose their sharpness and clarity over time. However,
digital video has the limitation of the amount of RAM available, whereas this is not a factor with
analog video. Digital technology allows for easy editing and enhancing of videos.

Displaying Video

There are two ways of displaying video on screen:

 Progressive scan
 Interlaced scan

Progressive scan

Progressive scan updates all the lines on the screen at the same time. This is known as
progressive scanning. Today all PC screens draw the picture this way.

Figure 5. 2
Progressive scan

Interlaced Scanning

Interlaced scanning writes every second line of the picture during a scan, and writes the other
half during the next sweep. Doing that, we only need 25/30 pictures per second. This idea of
splitting up the image into two parts became known as interlacing, and the split-up pictures as
fields. Graphically, a field is basically a picture with every second line blank. The following
image shows interlacing so that you can better imagine what happens.

Figure 5. 3 Interlaced Scanning

5.4. Types of Color Video Signals

1. Component video: each primary is sent as a separate video signal. The primaries can either be
RGB or a luminance-chrominance transformation of them (e.g., YIQ, YUV). Best color
reproduction. Requires more bandwidth and good synchronization of the three components.
Component video takes the different components of the video and breaks them into separate
signals. Improvements to component video have led to many video formats, including S-Video,
RGB etc. Component video – Higher-end video systems make use of three separate video signals
for the red, green, and blue image planes. Each color channel is sent as a separate video signal.
Most computer systems use Component Video, with separate signals for R, G, and B signals. For
any color separation scheme, Component Video gives the best color reproduction since there is
no “crosstalk” between the three channels. This is not the case for S-Video or Composite Video.
Component video, however, requires more bandwidth and good synchronization of the three
components.

2. Composite video/1 Signal: color (chrominance) and luminance signals are mixed into a single
carrier wave. Some interference between the two signals is inevitable. Composite analog video

has all its components (brightness, color, synchronization information, etc.) combined into one
signal. Due to the compositing (or combining) of the video components, the quality of composite
video is marginal at best. The results are color bleeding, low clarity and high generational loss.

In NTSC TV, for example, I and Q are combined into a chroma signal, and a color subcarrier
then puts the chroma signal at the higher frequency end of the channel shared with the luminance
signal. The chrominance and luminance components can be separated at the receiver end, and the
two color components can be further recovered.
When connecting to TVs or VCRs, composite video uses only one wire (and hence one
connector, such as a BNC connector at each end of a coaxial cable or an RCA plug at each end
of an ordinary wire), and video color signals are mixed, not sent separately. The audio signal is
another addition to this one signal. Since color information is mixed and both color and intensity
are wrapped into the same signal, some interference between the luminance and chrominance
signals is inevitable

3. S-Video/2 Signal (Separated video): a compromise between component analog video and
composite video. It uses two lines, one for luminance and another for the composite
chrominance signal.
As a compromise, S-video (separated video, or super-video, e.g., in S-VHS) uses two
wires: one for luminance and another for a composite chrominance signal. As a result,
there is less crosstalk between the color information and the crucial gray-scale
information. The reason for placing luminance into its own part of the signal is that
black-and-white information is crucial for visual perception. Humans are able to
differentiate spatial resolution in grayscale images much better than for the color part of
color images (as opposed to the “black-and-white” part). Therefore, color information
sent can be much less accurate than intensity information. We can see only large blobs of
color, so it makes sense to send less color detail.

Table 5.1 Types of Color Video Signals

5.5. Video Broadcasting Standards/ TV standards
There are three different video broadcasting standards: PAL, NTSC, and SECAM

PAL (Phase Alternate Line)

PAL is a TV standard originally invented by German scientists and uses 625 horizontal lines at a
field rate of 50 fields per second (or 25 frames per second). It is used in Australia, New Zealand,
United Kingdom, and Europe.

 Scans 625 lines per frame, 25 frames per second


 Interlaced, each frame is divided into 2 fields, 312.5 lines/field
 For color representation, PAL uses YUV (YCbCr) color model
 In PAL, 5.5 MHz is allocated to Y, 1.8 MHz each to U and V

SECAM (Sequential Color with Memory)

SECAM uses the same bandwidth as PAL but transmits the color information sequentially. It is
used in France, Eastern Europe, etc. SECAM (Système Électronique Couleur Avec Mémoire) is
very similar to PAL. It specifies the same number of scan lines and frames per second. SECAM
also uses 625 scan lines per frame, at 25 frames per second; it is the broadcast standard for
France, Russia, and parts of Africa and Eastern Europe.

SECAM and PAL are similar, differing slightly in their color-coding scheme. In SECAM U and
V, signals are modulated using separate color subcarriers at 4.25 MHz and 4.41 MHz,
respectively. They are sent in alternate lines – that is, only one of the U or V signals will be sent
on each scan line.

NTSC (National Television Standards Committee)


The NTSC TV standard is mostly used in North America and Japan. NTSC is a black-and-white
and color compatible 525-line system that scans a nominal 30 interlaced television picture
frames per second. Used in USA, Canada, and Japan.

 525 scan lines per frame, 30 frames per second (to be exact, 29.97 fps, or 33.37 ms per frame)
 Interlaced, each frame is divided into 2 fields, 262.5 lines/field
 20 lines reserved for control information at the beginning of each field

So a maximum of 485 lines of visible data.

Table 5.2 Comparison of analog broadcast TV systems.

HDTV (High Definition Television)

First-generation HDTV was based on an analog technology developed by Sony and NHK in
Japan in the late 1970s. HDTV successfully broadcast the 1984 Los Angeles Olympic Games in
Japan. Multiple sub-Nyquist Sampling Encoding (MUSE) was an improved NHK HDTV with
hybrid analog/digital technologies that was put in use in the 1990s. It has 1,125 scan lines,
interlaced (60 fields per second), and a 16:9 aspect ratio. It uses satellite broadcasting – quite
appropriate for Japan, which can be covered with one or two satellites. The Direct Broadcast
Satellite (DBS) channels used have a bandwidth of 24 MHz.

High-Definition television (HDTV) means broadcast of television signals with a higher


resolution than traditional formats (NTSC, SECAM, PAL) allow. Except for early analog
formats in Europe and Japan, HDTV is broadcast digitally, and therefore its introduction
sometimes coincides with the introduction of digital television (DTV).

 Modern plasma television uses this
 It consists of 720-1080 lines and higher number of pixels (as many as 1920 pixels).
 Having a choice in between progressive and interlaced is one advantage of HDTV. Many
people have their preferences

Table 5.3 Advanced Digital TV Formats Supported by ATSC

HDTV vs Existing Signals (NTSC, PAL, or SECAM)

The HDTV signal is digital resulting in crystal clear, noise-free pictures and CD quality sound. It
has many viewer benefits like choosing between interlaced or progressive scanning.

 Standard Definition TV (SDTV) ~ the current NTSC TV or higher


 Enhanced Definition TV (EDTV) – 480 active lines or higher
 High Definition TV (HDTV) – 720 active lines or higher. So far, the popular choices are
720P (720 lines, progressive, 30 fps) and 1080I (1,080 lines, interlaced, 30 fps or 60 fields per
second). The latter provides slightly better picture quality but requires much higher bandwidth.

Video File Formats

File formats in the PC platform are indicated by the 3 letter filename extension.

 .mov = QuickTime Movie Format


 .avi = Windows movie format
 .mpg = MPEG file format
 .mp4 = MPEG-4 Video File
 .flv = flash video file

 .rm = Real Media File
 .3gp = 3GPP multimedia File (used in mobile phones)

Four Factors of Digital Video

With digital video, four factors have to be kept in mind. These are:

Frame Rate

The standard for displaying any type of non-film video is 30 frames per second (film is 24
frames per second). This means that the video is made up of 30 (or 24) pictures, or frames, for
every second of video. Additionally, these frames are split in half (odd lines and even lines), to
form what are called fields.

Color Resolution

Color resolution refers to the number of colors displayed on the screen at one time. Computers
deal with color in an RGB (red-green-blue) format, while video uses a variety of formats. One of
the most common video formats is called YUV. Although there is no direct correlation between
RGB and YUV, they are similar in that they both have varying levels of color depth (maximum
number of colours).

Spatial Resolution

The third factor is spatial resolution – or in other words, “How big is the picture?” Since PC and
Macintosh computers generally have resolutions in excess of 640 by 480, most people assume
that this resolution is the video standard. A standard analogue video signal displays a full, over
scanned image without the borders common to computer screens. The National Television
Standards Committee (NTSC) standard used in North America and Japanese television uses a
768 by 484 display. The Phase Alternate Line (PAL) standard for European television is
slightly larger at 768 by 576. Most countries endorse one or the other, but never both.

Since the resolution between analogue video and computers is different, conversion of analogue
video to digital video at times must take this into account. This can often result in the down-
sizing of the video and the loss of some resolution.

Image Quality

The last and most important factor is video quality. The final objective is video that looks
acceptable for your application.

Review Question

1. Describe different TV Standards.
2. What are the different factors of digital video?
3. What is Progressive scan and interlacing scan? Is there any difference?
4. Explain the different video file formats.

CHAPTER SIX:
BASICS OF DIGITAL AUDIO
Audio information is crucial for multimedia presentations and, in a sense, is the simplest type of
multimedia data. However, some important differences between audio and image information
cannot be ignored. For example, while it is customary and useful to occasionally drop a video
frame from a video stream, to facilitate viewing speed, we simply cannot do the same with sound
information or all sense will be lost from that dimension. A voice is a quantity produced by an
animal like a human being when it speaks using its vocal cords. When this voice is digitally
sampled, or in other words, converted into an electric signal and then rendered into a playable
format, it is called audio. E.g: When I record a speech given out by my mother on her
anniversary, the input to the recorder is “voice” while the output, when I play the speech on a
player is audio. Animation is an art of drawing sketches of object and then showing them in a
series of frames so that it looks like a moving and living thing to us while a video is a recording
of either still or moving objects.

Lesson Content

6.1. Digitizing Sound

6.2. Quantization and Transmission of Audio

Review Questions

6.1. Digitizing Sound


What is Sound?
Sound is a wave phenomenon like light, but it is macroscopic and involves molecules of air
being compressed and expanded under the action of some physical device. For example, a
speaker in an audio system vibrates back and forth and produces a longitudinal pressure wave
that we perceive as sound. (As an example, we get a longitudinal wave by vibrating a Slinky
along its length; in contrast, we get a transverse wave by waving the Slinky back and forth
perpendicular to its length.) Without air, there is no sound – for example, in space. Since sound
is a pressure wave, it takes on continuous values, as opposed to digitized ones with a finite range.
Nevertheless, if we wish to use a digital version of sound waves, we must form digitized
representations of audio information.
Even though such pressure waves are longitudinal, they still have ordinary wave properties and
behaviors, such as reflection (bouncing), refraction (change of angle when entering a medium
with a different density), and diffraction (bending around an obstacle). This makes the design of
“surround sound” possible.
 Microphone produces analog signal
 Computer deals with digital signal

Sampling Audio
Analog Audio
Most natural phenomena around us are continuous; they are continuous transitions between two
different states. Sound is no exception to this rule, i.e., sound also constantly varies. Since sound
consists of measurable pressures at any 3D point, we can detect it by measuring the pressure
level at a location, using a transducer to convert pressure to voltage levels.

Figure 6. 1 analog signal: continuous measurement of pressure wave

Figure 6.1 shows the one-dimensional nature of sound. Values change over time in amplitude:
the pressure increases or decreases with time. The amplitude value is a continuous quantity.
Since we are interested in working with such data in computer storage, we must digitize the
analog signals (i.e., continuous-valued voltages) produced by microphones. For image data, we
must likewise digitize the time-dependent analog signals produced by typical video cameras.
Digitization means conversion to a stream of numbers – preferably integers for efficiency.

Continuously varying signals are represented by an analog signal. A signal is a continuous function f
in the time domain. For a value y = f(t), the argument t of the function represents time. If we graph
f, the result is called a wave. A wave has three characteristics:
 Amplitude
 Frequency, and
 Phase
Amplitude: the intensity of the signal. This can be determined by looking at the height of the
signal. If amplitude increases, the sound becomes louder. Amplitude measures how high or
low the voltage of the signal is at a given point in time.
Frequency: the number of times the wave cycle is repeated. This can be determined by
counting the number of cycles in a given time interval. Frequency is related to the pitch of the
sound: increased frequency means higher pitch.

Phase: related to the wave's appearance – the offset of the cycle in time.

Figure 6. 2 Digitization

Analog to Digital Conversion


Converting an analog audio to digital audio requires that the analog signal is sampled. Sampling
is the process of taking periodic measurements of the continuous signal. Samples are taken at
regular time intervals, i.e. every T seconds; how often samples are taken is called the sampling frequency or sampling rate.
Digitized audio is sampled audio. Many times each second, the analog signal is sampled. How
often these samples are taken is referred to as sampling rate. The amount of information stored
about each sample is referred to as sample size.
An analog signal is represented by amplitude and frequency. Converting these waves to digital
information is referred to as digitizing. The challenge is to convert the analog waves to numbers
(digital information). The more numbers on the scale, the better the quality of the sample, but
more bits will be needed to represent that sample.

In digital form, the measure of frequency is referred to as how often the sample is taken. In the
graph below the sample has been taken 7 times (reading across). Frequency is talked about in
terms of Kilohertz (KHz).
Hertz (Hz) = number of cycles per second
KHz = 1000Hz
MHz = 1000 KHz

Music CDs use a frequency of 44.1 KHz. A frequency of 22 KHz for example, would mean that
the sample was taken less often.
Sampling means measuring the value of the signal at a given time period. The samples are then
quantized. Quantization is rounding the value of each sample to the nearest amplitude number in

the graph. For example, if amplitude of a specific sample is 5.6, this should be rounded either up
to 6 or down to 5. This is called quantization. Quantization is assigning a value (from a set) to a
sample. The quantized values are changed to binary pattern. The binary patterns are stored in
computer.

The following diagram shows digitization process (sampling, quantization, and coding)
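In addition to that diagram, here is a minimal numeric Python sketch of the same three steps (sampling, quantization, coding); the test-tone frequency, sample rate, and 8-bit quantization are arbitrary illustrative choices.

import math

sample_rate = 8000      # samples per second (illustrative choice)
tone_freq = 440         # Hz, an arbitrary test tone
duration = 0.001        # seconds, just enough for a few samples

# 1. Sampling: measure the analog signal every 1/sample_rate seconds.
times = [n / sample_rate for n in range(int(duration * sample_rate))]
samples = [math.sin(2 * math.pi * tone_freq * t) for t in times]

# 2. Quantization: round each sample (-1..1) to one of 256 levels.
quantized = [round((s + 1) / 2 * 255) for s in samples]

# 3. Coding: store each level as an 8-bit value.
encoded = bytes(quantized)
print(list(encoded))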

Sample Rate
A sample is a single measurement of amplitude. The sample rate is the number of these
measurements taken every second. In order to accurately represent all of the frequencies in a
recording that fall within the range of human perception, generally accepted as 20Hz-20KHz, we
must choose a sample rate high enough to represent all of these frequencies. At first
consideration, one might choose a sample rate of 20 KHz since this is identical to the highest
frequency. This will not work, however, because every cycle of a waveform has both a positive
and negative amplitude and it is the rate of alternation between positive and negative amplitudes
that determines frequency. Therefore, we need at least two samples for every cycle resulting in a
sample rate of at least 40 KHz.

Common Sampling Rates
 8KHz: Used for telephone
 11.025 KHz: Speech audio
 22.05 KHz: Low Grade Audio (WWW Audio)
 44.1 KHz: CD Quality audio

The sampling rate of a real signal needs to be greater than twice the signal bandwidth. Audio
practically starts at 0 Hz, so the highest frequency present in audio recorded at 44.1 kHz is 22.05
kHz (22.05 kHz bandwidth).

Sample Resolution/Sample Size


Each sample can only be measured to a certain degree of accuracy. The accuracy is dependent on

the number of bits used to represent the amplitude, which is also known as the sample resolution.
How do we store each sample value (quantized value)?
 8-bit value (0-255)
 16-bit value (integer) (0-65535)
The amount of memory required to store a t-second-long sample is as follows:
 8-bit resolution, mono recording: memory = f x t x 8 x 1 bits
 8-bit resolution, stereo recording: memory = f x t x 8 x 2 bits
 16-bit resolution, mono recording: memory = f x t x 16 x 1 bits
 16-bit resolution, stereo recording: memory = f x t x 16 x 2 bits
where f is the sampling frequency and t is the time duration in seconds.

Note: Stereo: double the bandwidth to transmit a digital audio signal.

Examples: Abebe sampled audio for 10 seconds. How much storage space is required if
a) 22.05 KHz sampling rate is used, and 8 bit resolution with mono recording?
b) 44.1 KHz sampling rate is used, and 8 bit resolution with mono recording?
c) 44.1 KHz sampling rate is used, 16 bit resolution with stereo recording?
d) 11.025 KHz sampling rate, 16 bit resolution with stereo recording?

Solution:
a) m = 22050 x 8 x 10 x 1 = 1,764,000 bits = 220,500 bytes = 220.5 KB
b) m = 44100 x 8 x 10 x 1 = 3,528,000 bits = 441,000 bytes = 441 KB
c) m = 44100 x 16 x 10 x 2 = 14,112,000 bits = 1,764,000 bytes = 1,764 KB
d) m = 11025 x 16 x 10 x 2 = 3,528,000 bits = 441,000 bytes = 441 KB
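The same arithmetic can be wrapped in a small Python helper; sizes follow the convention used above (1 KB = 1,000 bytes).

def audio_size_kb(sample_rate_hz, bits_per_sample, seconds, channels):
    """Uncompressed PCM size: f x resolution x t x channels, in kilobytes."""
    bits = sample_rate_hz * bits_per_sample * seconds * channels
    return bits / 8 / 1000

print(audio_size_kb(22050, 8, 10, 1))    # 220.5 KB   (case a)
print(audio_size_kb(44100, 8, 10, 1))    # 441.0 KB   (case b)
print(audio_size_kb(44100, 16, 10, 2))   # 1764.0 KB  (case c)
print(audio_size_kb(11025, 16, 10, 2))   # 441.0 KB   (case d)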
Implications of Sample Rate and Bit Size
 Affects Quality of Audio
 Affects Size of Data

6.2. Quantization and Transmission of Audio


To be transmitted, sampled audio information must be digitized. Once the information has been
quantized, it can then be transmitted or stored.

Quantization and transformation of data are collectively known as coding of the data.
Taking differences in signals between the present and a previous time can effectively reduce the size of
signal values and, most important, concentrate the histogram of values (differences, now)
into a much smaller range. The result of reducing the variance of values is that lossless
compression methods, which assign shorter bit lengths to more likely values, perform better. In
general, producing quantized sampled output for audio is called pulse code modulation or PCM.
The differences version is called DPCM (and a crude but efficient variant is called DM).

Audio is analog – the waves we hear travel through the air to reach our eardrums. We know that
the basic techniques for creating digital signals from analog ones consist of sampling and
quantization. Sampling is invariably done uniformly – we select a sampling rate and produce one
value for each sampling time.

In the magnitude direction, we digitize by quantization, selecting breakpoints in magnitude and


remapping any value within an interval to one representative output level.

Boundaries for quantizer input intervals that will all be mapped into the same output level form a
coder mapping and the representative values that are the output values from a quantizer are a
decoder mapping.

Every compression scheme has three stages:

1. Transformation: The input data is transformed to a new representation that is easier or more
efficient to compress. For example, in Predictive Coding, we predict the next signal from
previous ones and transmit the prediction error.
2. Loss: We may introduce loss of information. Quantization is the main lossy step. Here we use a
limited number of reconstruction levels, fewer than in the original signal. Therefore, quantization
necessitates some loss of information.
3. Coding: Here, we assign a codeword (thus forming a binary bit stream) to each output level or
symbol. This could be a fixed-length code or a variable-length code, such as Huffman coding.
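A toy Python illustration of the three stages on a short sample sequence follows; the predictor, step size, and fixed-length code are deliberately simplistic assumptions, not a description of any real codec.

samples = [100, 102, 105, 105, 104, 101]

# 1. Transformation (predictive coding): keep only the differences from the
#    previous sample, which are small and cluster around zero.
diffs = [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

# 2. Quantization (the lossy step): map each difference onto a coarse level.
step = 2
levels = [round(d / step) for d in diffs]

# 3. Coding: here simply fixed-length 8-bit binary strings; a real codec would
#    use a variable-length code such as Huffman coding.
codewords = [format(level & 0xFF, "08b") for level in levels]

print(diffs)       # [100, 2, 3, 0, -1, -3]
print(levels)      # [50, 1, 2, 0, 0, -2]
print(codewords)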

Review Questions
1. Describe the following terms
a. Amplitude
b. Frequency, and
c. Phase
2. How to convert analog audio to digital audio? Is there any effect if it is not converted?

3. Discuss the different sampling rate and for which purpose we use it?

4. What is sampling resolution?

5. If Mr. X sampled audio for 20 seconds. How much storage space is required if
a. 22.05 KHz sampling rate is used, and 16 bit resolution with mono recording?

b. 44.1 KHz sampling rate is used, and 8 bit resolution with mono recording?
c. 44.1 KHz sampling rate is used, 16 bit resolution with stereo recording.

CHAPTER SEVEN:

LOSSLESS COMPRESSION ALGORITHMS


Lesson Content

7.1. Introduction

7.2. Basics of Information Theory

7.3. Run Length Coding

7.4. Variable-Length Coding (VLC)

7.5. Huffman Coding

7.6. The Shannon-Fano Encoding Algorithm

7.7. Lempel-Ziv Encoding

7.8. Arithmetic Coding

7.9. Lossless Image Compression

Review Questions

7.1. Introduction
The Need for Compression
Take, for example, a video signal with resolution 320×240 pixels and 256 (8 bits) colors, 30
frames per second. Raw bit rate = 320 x 240 x 8 x 30
= 18,432,000 bits per second
= 2,304,000 bytes per second ≈ 2.3 MB per second

A 90-minute movie would take 2.3 x 60 x 90 MB ≈ 12.44 GB. Without compression, data storage
and transmission would pose serious problems!
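The same back-of-the-envelope calculation in Python (MB and GB here are decimal units, matching the figures above):

width, height, bits_per_pixel, fps = 320, 240, 8, 30

bits_per_second = width * height * bits_per_pixel * fps
mb_per_second = bits_per_second / 8 / 1e6
print(bits_per_second)   # 18,432,000 bits per second
print(mb_per_second)     # about 2.3 MB per second

movie_seconds = 90 * 60
print(mb_per_second * movie_seconds / 1000)   # about 12.44 GB for a 90-minute movie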
Figure 7.1 depicts a general data compression scheme, in which compression is performed by an
encoder and decompression is performed by a decoder. We call the output of the encoder codes
or code words. The intermediate medium could be either data storage or a
communication/computer network. If the compression and decompression processes induce no
information loss, the compression scheme is lossless; otherwise, it is lossy. The next several
chapters deal with lossy compression algorithms, as they are commonly used for image, video,
and audio compression. Here, we concentrate on lossless compression.

Figure 7.1 A general data compression scheme.

7.2. Basics of Information Theory


Information theory is the scientific study of the quantification, storage, and communication of
digital information. A key measure in information theory is entropy. Entropy quantifies the
amount of uncertainty involved in the value of a random variable or the outcome of a random
process.

Information theory is the mathematical treatment of the concepts, parameters and rules
governing the transmission of messages through communication systems. It was founded by
Claude Shannon toward the middle of the twentieth century and has since then evolved into a
vigorous branch of mathematics fostering the development of other scientific fields, such as
statistics, biology, behavioral science, neuroscience, and statistical mechanics. The techniques
used in information theory are probabilistic in nature and some view information theory as a
branch of probability theory. In a given set of possible events, the information of a message
describing one of these events quantifies the symbols needed to encode the event in an optimal
way. ‘Optimal’ means that the obtained code word will determine the event unambiguously,
isolating it from all others in the set, and will have minimal length, that is, it will consist of a
minimal number of symbols. Information theory also provides methodologies to separate real
information from noise and to determine the channel capacity required for optimal transmission
conditioned on the transmission rate.

The foundation of information theory was laid in a 1948 paper by Shannon titled, “A
Mathematical Theory of Communication.” Shannon was interested in how much information a
given communication channel could transmit. In neuroscience, for example, one is interested in
how much information a neuron's response can communicate about an experimental stimulus.
Information theory is based on a measure of uncertainty known as entropy (designated “H”). For
example, the entropy of the stimulus S is written H(S) and is defined as follows:
H(S) = − Σs P(s) log2 P(s)
The subscript s underneath the summation simply means that the sum runs over all possible
stimuli, here s = 1, 2, …, 8. This expression is called “entropy” because it is similar to the
definition of entropy in thermodynamics; thus, it is sometimes referred to as “Shannon entropy.”
The entropy of the stimulus can be intuitively understood as “how long a message (in bits) do I
need to convey the value of the stimulus?” For example, suppose a center-out task (in which a
target appears at one of several peripheral locations) had only two peripheral targets (“left” and
“right”), which appeared with equal probability. It would take only one bit (a 0 or a 1) to convey
which target appeared; hence, you would expect the entropy of this stimulus to be 1 bit. That is
what the preceding expression gives, since P(s) = 0.5 for each target and log2(0.5) = −1. A
center-out stimulus that can take on eight possible values with equal probability has, by the same
reasoning, an entropy of 3 bits.
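
A minimal sketch of this entropy calculation in Python (the two probability distributions below
are the hypothetical examples just discussed):

import math

def entropy(probabilities):
    # H(S) = -sum of P(s) * log2(P(s)) over all outcomes with non-zero probability
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 bit: two equally likely targets
print(entropy([1/8] * 8))     # 3.0 bits: eight equally likely stimuli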

Multimedia Data Compression


Data compression is about finding ways to reduce the number of bits or bytes used to store or
transmit the content of multimedia data. It is the process of encoding information using fewer
bits than the original representation, e.g. the ZIP file format. As with any communication,
compressed data communication only works when both the sender and the receiver of the
information understand the encoding scheme.
Is compression useful?
Compression is useful because it helps reduce the consumption of resources, such as hard disk
space or transmission bandwidth.
 It saves storage space.
 It speeds up transmission time.

On the downside, compressed data must be decompressed to be used, and this extra processing
may be costly for some applications. For instance, a compression scheme for video may require
expensive hardware for the video to be decompressed fast enough to be viewed while it is being
decompressed. The alternative of decompressing the video in full before watching it may be
inconvenient and requires storage space for the decompressed video.

Tradeoffs in Data Compression


The design of data compression schemes therefore involves trade-offs among various factors,
including
 The degree of compression: to what extent should the data be compressed?

 The amount of distortion introduced: to what extent is quality loss tolerated?

 The computational resources required to compress and decompress the data: do we have
enough memory and processing power for compression and decompression?

Types of Compression
Lossless Compression
The original content of the data is not lost or changed when it is compressed (encoded). It is used
mainly for compressing symbolic data such as database records, spreadsheets, texts, and
executable programs, where exact replication of the original is essential and changing even a
single bit cannot be tolerated. Lossless compression recovers the exact original data after
decompression.
Examples: Run-Length Encoding (RLE), Lempel-Ziv (LZ), Huffman coding.
Lossy Compression
The original content of the data is lost to a certain degree when compressed. For visual and audio
data, some loss of quality can be tolerated without losing the essential nature of the data. Lossy
compression is used for image compression in digital cameras (JPEG), for audio compression
(MP3), and for video compression on DVDs (MPEG).

Figure 7.2 Lossless and lossy compression techniques

Lossy and Lossless Compression


GIF image files and WinZip use lossless compression. For this reason, zip software is popular
for compressing program and data files. Lossless compression does not lose any data in the
compression process.
Lossless compression has advantages and disadvantages.
 The advantage is that the compressed file will decompress to an exact duplicate of the original
file, mirroring its quality.
 The disadvantage is that the compression ratio is not all that high, precisely because no data is
lost.
To get a higher compression ratio, that is, to reduce a file significantly beyond about 50%, you
must use lossy compression.
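
A quick round-trip demonstration of lossless compression, using Python's zlib module (which
implements the DEFLATE scheme used by zip software); the sample data below are made up:

import zlib

original = b"AAAAABBBBBCCCCCDDDDD" * 100
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))   # the compressed data is much smaller
print(restored == original)             # True: an exact duplicate of the original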

Lossless vs. Lossy compression


Lossless and lossy compression have become part of our everyday vocabulary due to the
popularity of MP3 music files, JPEG image files, and MPEG video files. A sound file in WAV
format converted to an MP3 file loses data, because MP3 employs lossy compression. JPEG uses
lossy compression, while GIF uses a lossless compression technique.

An example of lossless vs. lossy compression is the following string: 25.888888888. This string
can be losslessly compressed as 25.[9]8 and interpreted as “twenty-five point nine eights”: the
original string is perfectly recreated, just written in a smaller form.
In a lossy system it could be compressed as 26, in which case the original data is lost, with the
benefit of an even smaller size. The lossless example above is a very simple instance of
run-length encoding.

7.3. Run Length Coding


Data often contains sequences of identical bytes. By replacing these repeated byte sequences
with the number of occurrences, a substantial reduction of data can be achieved. In Run-length
encoding, large runs of consecutive identical data values are replaced by a simple code with the
data value and length of the run, i.e. (dataValue, LengthOfTheRun)
This encoding scheme records each data value (Xi) along with its run length, i.e. (Xi, Length_of_Xi).
It compresses data by storing runs of data (that is, sequences in which the same data value occurs
in many consecutive data elements) as a single data value and count. For example, consider the
following image row with long runs of white pixels (W) and short runs of black pixels (B).
WWWWWWWWWWBWWWWWWWWWBBBWWWWWWWWWWWW

Applying the RLE data compression algorithm, the compressed code is: 10W1B9W3B12W
(interpreted as ten W’s, one B, nine W’s, three B’s, twelve W’s).
Original sequence: 111122233333311112222 can be encoded as: (1,4),(2,3),(3,6),(1,4),(2,4)
Run-length encoding performs lossless data compression.
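
A small run-length encoder and decoder sketched in Python; it reproduces the white/black pixel
example above:

from itertools import groupby

def rle_encode(data):
    # store each run as a (dataValue, LengthOfTheRun) pair
    return [(value, len(list(run))) for value, run in groupby(data)]

def rle_decode(pairs):
    return "".join(value * count for value, count in pairs)

row = "WWWWWWWWWWBWWWWWWWWWBBBWWWWWWWWWWWW"
encoded = rle_encode(row)
print(encoded)                      # [('W', 10), ('B', 1), ('W', 9), ('B', 3), ('W', 12)]
print(rle_decode(encoded) == row)   # True: run-length encoding is lossless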

Lossless vs. Lossy compression


Generally, the difference between the two compression techniques is that:

 Lossless compression schemes are reversible so that the original data can be reconstructed,
 Lossy schemes accept some loss of data in order to achieve higher compression.
These Lossy data compression methods typically offer a three-way tradeoff between
 Computer resource requirement (compression speed, memory consumption)
 Compressed data size and
 Quality loss.

Lossless compression is a method of reducing the size of computer files without losing any
information. That means when you compress a file, it will take up less space, but when you
decompress it, it will still contain exactly the same information. The idea is to remove
redundancy in the information; this is exactly what happens in ZIP and GIF files. This differs
from lossy compression, such as in JPEG files, which discards some information that is not very
noticeable.

Common compression methods
 Statistical methods: these require prior information about the occurrence of symbols.
E.g. Huffman coding: estimate the probabilities of the symbols, code one symbol at a time, and
assign shorter codes to symbols with higher probabilities.
 Dictionary-based coding: the previous methods (entropy coding and Huffman coding) require
this statistical knowledge in advance. Dictionary-based coding techniques, such as Lempel-Ziv
(LZ), do not require prior information to compress strings; rather, they replace strings of symbols
with pointers to dictionary entries.

7.4. Variable-Length Coding (VLC)


Since the entropy indicates the information content in an information source S, it leads to a
family of coding methods commonly known as entropy coding methods. Variable-length coding
(VLC) is one of the best known such methods. In coding theory, a variable-length code is a code
that maps source symbols to a variable number of bits. Variable-length codes allow sources to be
compressed and decompressed with zero error and still be read back symbol by symbol. In this
section we discuss the Shannon-Fano algorithm and Huffman coding.

7.5. Huffman Coding


Huffman coding was developed in the early 1950s by David Huffman and is widely used for text
compression, multimedia coding, and message transmission.
The problem: Given a set of n symbols and their weights (or frequencies), construct a tree
structure (a binary tree for binary code) with the objective of reducing memory space and
decoding time per symbol. For instance, Huffman coding is constructed based on frequency of
occurrence of letters in text documents.

The output of the Huffman encoder is determined by the model (the symbol probabilities). The
higher the probability of occurrence of a symbol, the shorter the code assigned to that symbol,
and vice versa. This keeps the most frequently occurring symbols in the data short and reduces
the average time taken to decode each symbol.

How to construct Huffman coding


Step 1: Create forest of trees for each symbol, t1, t2,… tn
Step 2: Sort forest of trees according to falling probabilities of symbol occurrence
Step 3: WHILE more than one tree exists DO
Merge two trees t1 and t2 with least probabilities p1 and p2
Label their root with sum p1 + p2

Associate binary code: 1 with the right branch and 0 with the left branch
Step 4: Create a unique code word for each symbol by traversing the tree from the root to the
leaf.
Concatenate all encountered 0s and 1s together during traversal
Example 1: Consider a 7-symbol alphabet given in the following table to construct the Huffman
coding.

At each step, the Huffman encoding algorithm picks the two symbols with the smallest
frequencies and combines them.

Figure 7.3 Huffman Coding Tree

Using the Huffman tree, a code table can be constructed by working down the tree from the root
to each leaf. This gives the binary codeword for each symbol in terms of 1s and 0s.
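
A minimal sketch of the construction in Python, using a priority queue (heapq); the symbol
frequencies below are made up purely for illustration:

import heapq

def huffman_codes(frequencies):
    # each heap entry: (total weight, tie-breaker, {symbol: codeword so far})
    heap = [(weight, i, {symbol: ""}) for i, (symbol, weight) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)                       # two trees with least weight
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}          # 0 for the left branch
        merged.update({s: "1" + c for s, c in right.items()})   # 1 for the right branch
        heapq.heappush(heap, (w1 + w2, counter, merged))        # root labelled w1 + w2
        counter += 1
    return heap[0][2]

freq = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
print(huffman_codes(freq))   # the more frequent a symbol, the shorter its codeword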

Example 3: Construct the tree and the binary codes using Huffman coding.

7.6. The Shannon-Fano Encoding Algorithm
1. Calculate the frequency of each of the symbols in the list.
2. Sort the list in (decreasing) order of frequencies.
3. Divide the list into two halves, with the total frequency counts of each half being as close as
possible to each other.
4. The right half is assigned a code of 1 and the left half with a code of 0.
5. Recursively apply steps 3 and 4 to each of the halves, until each symbol has become a
corresponding code leaf on the tree. That is, treat each split as a list and apply splitting and code
assigning till you are left with lists of single elements.
6. Generate code word for each symbol
Let us assume the source alphabet S = {X1, X2, X3, …, Xn} and associated probabilities
P = {P1, P2, P3, …, Pn}. To encode data using the Shannon-Fano coding algorithm, first order
the source letters into a sequence according to their probability of occurrence in non-increasing
(i.e., decreasing) order.

ShannonFano(sequence S)
    If S has two letters:
        attach 0 to the codeword of one letter and 1 to the codeword of the other;
    Else if S has more than two letters:
        divide S into two subsequences S1 and S2 with the minimal difference between
        the total probabilities of the two subsequences;
        extend the codeword of each letter in S1 by attaching 0, and the codeword of
        each letter in S2 by attaching 1;
        ShannonFano(S1);
        ShannonFano(S2);

Example 1: Given five symbols A to E with their frequencies being 15, 7, 6, 6 & 5; encode them
using Shannon-Fano entropy encoding

Solution:

Step 1: We are given five symbols (A to E) that can occur in a source, with frequencies 15, 7, 6, 6,
and 5. First, sort the symbols in decreasing order of frequency.

Step 2: Divide the list into two halves whose total frequency counts are as close as possible to
each other. In this case we split the list between B and C and assign 0 to the first half and 1 to the
second.

Step 3: We recursively repeat the steps of splitting and assigning codes until each symbol
becomes a code leaf on the tree. That is, treat each split as a list, and apply splitting and code
assignment until you are left with lists of single elements.

Step 4: Note that we split the list containing C, D and E between C and D because the difference
between the split lists is 11 − 6 = 5; if we had divided between D and E, the difference would
have been 12 − 5 = 7.

Step 5: We complete the algorithm and, as a result, have codes assigned to the symbols.

Example 2: Suppose the source S = {A, B, C, D, E} with related probabilities
P = {0.35, 0.17, 0.17, 0.16, 0.15}, and the message to be encoded is “ABCDE”. The probabilities
are already arranged in non-increasing order.

First, we divide the list into {A, B} and {C, D, E}. Why? This gives the smallest difference
between the total probabilities of the two groups:
S1 = {A, B}, P = 0.35 + 0.17 = 0.52
S2 = {C, D, E}, P = 0.17 + 0.16 + 0.15 = 0.48
The difference is only 0.52 − 0.48 = 0.04. This is the smallest possible difference for any split of
the list. Attach 0 to S1 and 1 to S2.

Subdivide S1 into subgroups: S11 = {A}, attach 0; S12 = {B}, attach 1.

Again subdivide S2 into subgroups considering the probabilities: S21 = {C}, P = 0.17;
S22 = {D, E}, P = 0.16 + 0.15 = 0.31. Attach 0 to S21 and 1 to S22. Since S22 has more than one
letter, we subdivide it further: S221 = {D}, attach 0; S222 = {E}, attach 1.

Figure 7.4 Shannon-Fano coding tree


The message is transmitted using the following code (by traversing the tree)

A=00 B=01

C=10 D=110

E=111

Instead of transmitting ABCDE, we transmit 000110110111.
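
A recursive sketch of the algorithm in Python; it reproduces the codes derived above for
S = {A, B, C, D, E} with P = {0.35, 0.17, 0.17, 0.16, 0.15}:

def shannon_fano(symbols, codes=None, prefix=""):
    # symbols: list of (letter, probability) pairs sorted in non-increasing order
    if codes is None:
        codes = {}
    if len(symbols) == 1:
        codes[symbols[0][0]] = prefix or "0"
        return codes
    total = sum(p for _, p in symbols)
    running, split, best = 0.0, 1, float("inf")
    for i in range(1, len(symbols)):             # find the split with minimal difference
        running += symbols[i - 1][1]
        difference = abs(running - (total - running))
        if difference < best:
            best, split = difference, i
    shannon_fano(symbols[:split], codes, prefix + "0")   # attach 0 to the first half
    shannon_fano(symbols[split:], codes, prefix + "1")   # attach 1 to the second half
    return codes

source = [("A", 0.35), ("B", 0.17), ("C", 0.17), ("D", 0.16), ("E", 0.15)]
codes = shannon_fano(source)
print(codes)                                  # {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
print("".join(codes[s] for s in "ABCDE"))     # 000110110111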

7.7. Lempel-Ziv Encoding


Until the late 1970s, work on data compression was mainly directed towards creating better
methodologies for Huffman coding. An innovative, radically different method was introduced in
1977 by Abraham Lempel and Jacob Ziv. The zip and unzip utilities use an LZ-plus-Huffman
(LZH-style) technique, while UNIX's compress belongs to the LZW/LZC family.

Lempel-Ziv compression

The problem with Huffman coding is that it requires knowledge about the data before encoding
takes place: the frequencies of symbol occurrence must be known before codewords can be
assigned to symbols.

Lempel-Ziv compression does not rely on prior knowledge about the data; rather, it builds this
knowledge up in the course of data transmission or data storage. The Lempel-Ziv algorithm (LZ)
uses a table of codewords created during data transmission, and it transmits the index of a
symbol/word instead of the word itself. Each time, it replaces strings of characters with a
reference to a previous occurrence of the string.

Lempel-Ziv Compression Algorithm


 The multi-symbol patterns are of the form C0 C1 … Cn−1 Cn. The prefix of a pattern consists
of all the pattern symbols except the last: C0 C1 … Cn−1.
 Lempel-Ziv output: there are three options for assigning a code to each pattern in the list:
 If a one-symbol pattern is not in the dictionary, assign (0, symbol).
 If a multi-symbol pattern is not in the dictionary, assign (dictionaryPrefixIndex, lastPatternSymbol).
 If the last input symbol or the last pattern is already in the dictionary, assign (dictionaryPrefixIndex, ).
Eg: Encode (i.e., compress) the string ABBCBCABABCAABCAAB using the LZ algorithm.

Figure 7.5 Lempel-Ziv algorithm.

The compressed message is: (0,A)(0,B)(2,C)(3,A)(2,A)(4,A)(6,B)


Note: the above is just a representation; the commas and parentheses are not transmitted.
Consider the string ABBCBCABABCAABCAAB given in the example above. The compressed
string consists of the codewords and the corresponding codeword indices, as shown below:

Codeword: (0, A) (0, B) (2, C) (3, A) (2, A) (4, A) (6, B)


Codeword index: 1 2 3 4 5 6 7
The actual compressed message is: 0A0B10C11A010A100A110B where each character is
replaced by its binary 8-bit ASCII code.

Example (decompression): Decode (i.e., decompress) the sequence
(0, A) (0, B) (2, C) (3, A) (2, A) (4, A) (6, B)
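
A sketch of this dictionary-based coder (and its decoder) in Python; it reproduces the codewords
above and, run in reverse, answers the decompression example:

def lz_encode(text):
    dictionary = {}            # pattern -> index (indices start at 1)
    output, pattern = [], ""
    for ch in text:
        if pattern + ch in dictionary:
            pattern += ch                                       # keep extending the matched pattern
        else:
            output.append((dictionary.get(pattern, 0), ch))     # (dictionaryPrefixIndex, symbol)
            dictionary[pattern + ch] = len(dictionary) + 1
            pattern = ""
    if pattern:                                                 # final pattern already in the dictionary
        output.append((dictionary[pattern], ""))
    return output

def lz_decode(codewords):
    dictionary, text = {0: ""}, ""
    for index, ch in codewords:
        entry = dictionary[index] + ch       # prefix from the dictionary plus the new symbol
        dictionary[len(dictionary)] = entry
        text += entry
    return text

codes = lz_encode("ABBCBCABABCAABCAAB")
print(codes)               # [(0, 'A'), (0, 'B'), (2, 'C'), (3, 'A'), (2, 'A'), (4, 'A'), (6, 'B')]
print(lz_decode(codes))    # ABBCBCABABCAABCAAB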

7.8. Arithmetic Coding


Arithmetic coding (AC) is a form of entropy encoding used in lossless data compression.
Normally, a string of characters is represented using a fixed number of bits per character, as in
the ASCII code. An arithmetic coding algorithm instead encodes an entire file (a sequence of
symbols) into a single number. The input symbols are processed one per iteration, each one
narrowing the current interval. The interval derived at the end of this process is used to determine
the codeword for the entire sequence of symbols.

Example: Arithmetic coding of the word “BELBA”

The upper limit of each symbol's subinterval is computed as

UL = LL + d(u, l) × CF(x)

where LL is the lower limit of the current interval, d(u, l) is the width of the current interval
(upper limit minus lower limit), and CF(x) is the cumulative frequency (probability) up to and
including the letter x. For the first letter, B, the lower limit is zero and the upper limit is 0.4. The
subinterval boundaries inside [0, 0.4) are then:
B: 0 + (0.4 − 0) × 0.4 = 0.16
E: 0 + (0.4 − 0) × 0.6 = 0.24
L: 0 + (0.4 − 0) × 0.8 = 0.32
A: 0 + (0.4 − 0) × 1.0 = 0.40
and similarly for the subsequent letters.

A message is represented by a half-open interval [a, b), where a and b are real numbers between 0
and 1. Initially, the interval is [0, 1). When the message becomes longer, the length of the
interval shortens, and the number of bits needed to represent the interval increases. Suppose the
alphabet is [A, B, C, D, E, F, $], in which $ is a special symbol used to terminate the message,
and the known probability distribution is listed below.
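
A sketch of the interval-narrowing step in Python, using the probability ranges implied by the
BELBA example above (B: [0, 0.4), E: [0.4, 0.6), L: [0.6, 0.8), A: [0.8, 1.0); these ranges are an
assumption inferred from the worked figures):

model = {"B": (0.0, 0.4), "E": (0.4, 0.6), "L": (0.6, 0.8), "A": (0.8, 1.0)}

def arithmetic_interval(message, model):
    low, high = 0.0, 1.0
    for symbol in message:
        width = high - low
        symbol_low, symbol_high = model[symbol]
        high = low + width * symbol_high     # UL = LL + d(u, l) * CF(symbol)
        low = low + width * symbol_low
    return low, high                         # any number in [low, high) encodes the message

print(arithmetic_interval("BELBA", model))   # a narrow interval near 0.213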

7.9. Lossless Image Compression
One of the most commonly used compression techniques in multimedia data compression is
differential coding. The basis of data reduction in differential coding is the redundancy in
consecutive symbols in a data stream. Audio is a signal indexed by one dimension, time. Here
we consider how to apply the lessons learned from audio to the context of digital image signals
that are indexed by two spatial dimensions (x, y).

Let's consider differential coding in the context of digital images. In a sense, we move from
signals with a domain in one dimension to signals indexed by numbers in two dimensions (x, y),
the rows and columns of an image. Later, we'll look at video signals. These are even more
complex, in that they are indexed by space and time (x, y, t). Because of the continuity of the
physical world, the gray-level intensities (or colors) of background and foreground objects in
images tend to change relatively slowly across the image frame. Since we were dealing with
signals in the time domain for audio, practitioners generally refer to images as signals in the
spatial domain. This generally slowly changing nature of image content is what makes
differential (predictive) coding effective.

Lossless JPEG

Lossless JPEG is a special case of JPEG image compression. It differs drastically from the other
JPEG modes in that the algorithm has no lossy steps. Thus we treat it here and consider the more
commonly used JPEG modes in Chapter 9. Lossless JPEG is invoked when the user selects a
100% quality factor in an image tool. Essentially, lossless JPEG is included in the JPEG
compression standard simply for completeness. The following predictive method is applied on
the unprocessed original image (or on each color band of the original color image). It essentially
involves two steps: forming a differential prediction and encoding.
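
A minimal sketch of the two-step predictive idea on one image row (the pixel values below are
made up; the predictor here is simply the pixel to the left, one of the simple neighbour-based
predictors that lossless JPEG allows):

row = [100, 102, 103, 103, 101, 98]   # hypothetical gray-level values

# Step 1: differential prediction - send the first pixel as-is, then only the
# difference between each pixel and its left neighbour
differences = [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]
print(differences)    # [100, 2, 1, 0, -2, -3]: small values that are cheap to encode

# Step 2: encode the differences losslessly (e.g. with Huffman coding).
# The decoder reverses the prediction exactly, so no information is lost:
reconstructed = []
for d in differences:
    reconstructed.append(d if not reconstructed else reconstructed[-1] + d)
print(reconstructed == row)   # True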

Review Questions
1. Given the following symbols and their corresponding frequency of occurrence, find an optimal
binary code for compression

I.
Using the Huffman algorithm
II. Using Entropy coding scheme

2. Encode (i.e., compress) the following strings using the Lempel-Ziv algorithm.
ABBCDBBBDBCCBCCB
3. Encode using Arithmetic coding for the word “HELLO”
4. Encode this by using RLE
a. 4444666667777779999999 b. MMMEEEEDDDDIIIIIIIAAAAAA
