INTRODUCTION TO MULTIMEDIA MODULE
Table of Contents
CHAPTER ONE
INTRODUCTION TO MULTIMEDIA
1.1. What is Multimedia?
1.2. History of Multimedia Systems
1.3. Hypermedia and Multimedia
1.4. Multimedia and World Wide Web (WWW)
1.5. Multimedia System Requirement
Review Question
CHAPTER TWO
MULTIMEDIA AUTHORING AND TOOLS
2.1. What is Multimedia Authoring?
2.2. Multimedia Authoring Paradigms
2.3. Some Useful Editing and Authoring Tools
Review Questions
CHAPTER THREE:
MULTIMEDIA DATA REPRESENTATIONS
3.1. Graphic/Image Data Representation
3.2. Popular File Formats
3.3. Digital Audio and MIDI
Review Questions
CHAPTER FOUR:
COLORS IN IMAGE AND VIDEO
4.1. Color Spaces
4.2. Color Models in Images
4.3. Color Models in Video
Review Questions
CHAPTER FIVE:
FUNDAMENTAL CONCEPTS IN VIDEO
5.1. Types of Video
5.2. Analog Video
5.3. Digital Video
5.4. Types of Color Video Signals
5.5. Video Broadcasting Standards/TV Standards
Review Question
CHAPTER SIX:
BASICS OF DIGITAL AUDIO
6.1. Digitizing Sound
6.2. Quantization and Transmission of Audio
Review Questions
CHAPTER SEVEN:
LOSSLESS COMPRESSION ALGORITHMS
7.1. Introduction
7.2. Basics of Information Theory
7.3. Run Length Coding
7.4. Variable-Length Coding (VLC)
7.5. Huffman Coding
7.6. The Shannon-Fano Encoding Algorithm
7.7. Lempel-Ziv Encoding
7.8. Arithmetic Coding
7.9. Lossless Image Compression
Review Questions
CHAPTER ONE
INTRODUCTION TO MULTIMEDIA
Lesson Content
Presentation – refers to the type of physical means used to reproduce information to the user, e.g.:
- Speakers
- Video windows, etc.
Perception – describes the nature of information as perceived by the user, e.g.:
- Speech
- Music
- Film
2) Based on the word “Multimedia”
It is composed of two words:
Multi – multiple/many
Media – source
Multimedia refers to multiple sources of information. It is a system that integrates all the
above types.
Definitions:
Multimedia means that computer information can be represented in audio, video, and animated formats
in addition to the traditional formats. The traditional formats are text and graphics.
General and working definition:
Multimedia is the field concerned with the computer controlled integration of text, graphics,
drawings, still and moving images (video), animation, and any other media where every type of
information can be represented, stored, transmitted, and processed digitally.
Virtual reality (the creation of an artificial environment that you can explore, e.g. 3-D images)
Augmented reality (placing real-appearing computer graphics and video objects into scenes so
as to take the physics of objects and lights (e.g., shadows) into account)
Distributed lectures for higher education
Digital libraries
World Wide Web
On-line reference works e.g. encyclopedias, games, etc.
Electronic Newspapers/Magazines
Games
Groupware (enabling groups of people to collaborate on projects and share information)
Cooperative work environments that allow business people to edit a shared document or
schoolchildren to share a single game using two mice that pass control back and forth.
Making multimedia components editable – allowing the user side to decide which components
(video, graphics, and so on) are actually viewed, and allowing the client to move components
around or delete them – making components distributed
Features of Multimedia
Multimedia has three aspects:
Content: movie, production, etc.
Creative Design: creativity is important in designing the presentation
Enabling Technologies: Network and software tools that allow creative designs to be presented.
1960s-Ted Nelson started Xanadu project (Xanadu – a kind of deep Hypertext).
Project Xanadu was the explicit inspiration for the World Wide Web, for Lotus
Notes and for HyperCard, as well as less-well-known systems.
1967 – Nicholas Negroponte formed the Architecture Machine Group at MIT.
A combination lab and think tank responsible for many radically new approaches to the human-
computer interface. Nicholas Negroponte is the Wiesner Professor of Media Technology at the
Massachusetts Institute of Technology.
1968 – Douglas Engelbart demonstrated the NLS (oN-Line System) at SRI.
Shared-screen collaboration involving two persons at different sites communicating over a
network with audio and video interfaces was one of the many innovations presented at the
demonstration. 1969 – Nelson & Van Dam created a hypertext editor at Brown.
Hypertext
Hypermedia
systems; electronic newspapers and magazines; the World Wide Web; online reference works,
such as encyclopedias; games; groupware; home shopping; interactive TV; multimedia
courseware; video conferencing; video-on-demand; and interactive movies.
1. Very high processing speed/power. Why? Because there is a large amount of data to be
processed. Multimedia systems deal with large volumes of data, and to process the data in real
time, the hardware must have high processing capacity.
2. It should support different file formats. Why? Because we deal with different data types (media
types).
3. Efficient and High Input-output: input and output to the file subsystem needs to be efficient and
fast. It has to allow for real-time recording as well as playback of data.
4. Special Operating System: to allow access to file system and process data efficiently and
quickly. It has to support direct transfers to disk, real-time scheduling, fast interrupt processing,
I/O streaming, etc.
5. Storage and Memory: large storage units and large memory are required. Large
Caches are also required.
6. Network Support: client-server systems are common, as are distributed systems.
7. Software Tools: User-friendly tools needed to handle media, design and develop applications,
deliver media.
transmission of any file type. HTTP is a “stateless” request/response protocol, in the sense that a
client typically opens a connection to the HTTP server, requests information, the server
responds, and the connection is terminated – no information is carried over for the next request.
The basic request format is
Method URI Version
Additional-Headers

Message-body
The Uniform Resource Identifier (URI) identifies the resource accessed, such as the host name,
always preceded by the token "http://". A URI could be a Uniform Resource Locator (URL), for
example. Here, the URI can also include query strings (some interactions require submitting
data). Method is a way of exchanging information or performing tasks on the URI. Two popular
methods are GET and POST. GET specifies that the information requested is in the request string
itself, while the POST method specifies that the resource pointed to in the URI should consider
the message body. POST is generally used for submitting HTML forms. Additional-Headers
specifies additional parameters about the client. For example, to request access to this textbook’s
web site, the following HTTP message might be generated:
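The original example request is not preserved in this copy of the module; a minimal illustrative GET request, with a hypothetical host name and path, would look like:

GET /index.html HTTP/1.1
Host: www.example.com
Accept: text/html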
Status-Code is a number that identifies the response type (or error that occurs), and Status-Phrase
is a textual description of it. Two commonly seen status codes and phrases are 200 OK when the
request was processed successfully and 404 Not Found when the URI does not exist. For
example, in response to the example request above the web server may return something like:
HTTP/1.1 200 OK
Server: [No~plugs~here~please]
Date: Wed, 25 July 2002 20:04:30 GMT
Content-Length: 1045
Content-Type: text/html

<HTML>
...
</HTML>
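As a sketch of the same request/response exchange in code, the following Python snippet (the host name and path are placeholders, not from the original text) issues a GET using the standard http.client module and prints the status code and phrase:

import http.client

# open a connection to the HTTP server (placeholder host)
conn = http.client.HTTPConnection("www.example.com")
# Method = GET, URI = /index.html, plus an Additional-Header
conn.request("GET", "/index.html", headers={"Accept": "text/html"})
resp = conn.getresponse()
print(resp.status, resp.reason)   # e.g. "200 OK" or "404 Not Found"
body = resp.read()                # the Message-body (here, HTML)
conn.close()                      # the stateless connection is then terminated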
Software Requirement
Text editing and word processing tools provide features such as:
- spell check
- table formatting
- thesaurus
- templates (e.g. letters, resumes, and other common documents)
Examples: Microsoft Word, WordPerfect, Notepad
Multimedia authoring tools:
Multimedia authoring tools provide important framework that is needed for organizing and
editing objects included in the multimedia project (e.g. graphics, animation, sound, video, etc.).
They provide editing capability to a limited extent.
Examples: Macromedia Flash, Macromedia Director, Macromedia Authorware
Macromedia Flash: Flash allows users to create interactive movies by using the score metaphor
– a timeline arranged in parallel event sequences, much like a musical score consisting of
musical notes. Elements in the movie are called symbols in Flash. Symbols are added to a central
repository, called a library, and can be added to the movie’s timeline. Once the symbols are
present at a specific time, they appear on the Stage, which represents what the movie looks like
at a certain time, and can be manipulated and moved by the tools built into Flash. Finished Flash
movies are commonly used to show movies or games on the web.
Macromedia Director: Director uses a movie metaphor to create interactive presentations. This
powerful program includes a built-in scripting language, Lingo, which allows creation of
complex interactive movies. The “cast” of characters in Director includes bitmapped sprites,
scripts, music, sounds, and palettes. Director can read many bitmapped file formats. The program
itself allows a good deal of interactivity, and Lingo, with its own debugger, allows more control,
including control over external devices, such as VCRs and videodisc players. Director also has
web-authoring features available, for creation of fully interactive Shockwave movies playable
over the web.
Authorware: is a mature, well-supported authoring product that has an easy learning curve for
computer science students because it is based on the idea of flowcharts (the so-called
iconic/flow-control metaphor). It allows hyperlinks to link text, digital movies, graphics, and
sound. It also provides compatibility between files produced in PC and Mac versions.
Shockwave Authorware applications can incorporate Shockwave files, including Director
Movies, Flash animations, and audio.
OCR software
This software converts printed documents into electronically recognizable ASCII characters. It is
used with scanners. Scanners convert a printed document into a bitmap; the OCR software then breaks
the bitmap into pieces according to whether each piece contains text or graphics. This is done by
examining the texture and density of the bitmap and by detecting edges.
- Text areas are converted to ASCII text
- Bitmap areas are kept as bitmap images
To do the above, this software uses probability and expert systems.
Use:
To include printed documents in our project without typing them from the keyboard, and to include
documents in their original format, e.g. signatures, drawings, etc.
Examples: OmniPage Pro, Perceive
Painting and Drawing Tools
To create graphics for the web and other purposes, painting and drawing tools are crucial.
Painting Tools: also called image-editing tools, these are used to edit images of different
formats. They help us retouch and enhance bitmap images. Some painting tools also allow editing of
vector-based graphics. Some of the activities of editing include:
- blurring the picture
- removing part of the picture
- adding text to the picture
- merging two or more pictures together, etc.
Examples: Macromedia Fireworks and Adobe Photoshop
Drawing Tools: used to create vector-based graphics. Examples: Macromedia FreeHand,
CorelDraw, Illustrator
Drawing and painting tools should have the following features:
Scalable dimension for restore, stretch, and distorting images/graphics
Customizable pen and brush shapes and sizes
Multiple undo capabilities
Capacity to import and export files in different formats
Ability to create geometric shapes from circle, rectangle, line, etc.
Zooming for magnified image editing
Support for third party plug-ins.
Macromedia Fireworks: Fireworks is software for making graphics specifically for the web. It
includes a bitmap editor, a vector graphics editor, and a JavaScript generator for buttons and
rollovers.
Video Editing
Animation and digital video movies are sequences of bitmapped graphic frames rapidly played
back. Some of the tools used to edit video include Adobe Premiere, DeskShare Video Edit Magic,
and Videoshop. These applications display time references (the relationship between time and the
video), frame counts, audio, transparency level, etc.
Hardware Requirement
The hardware requirements fall into three categories:
1) Memory and storage devices
2) Input-output devices
3) Network devices
1) Memory and Storage Devices
I) RAM: is the primary requirement for a multimedia system. Why? Reasons: you have to store the
authoring software itself (e.g. Flash takes 20MB of memory, Photoshop 16-20MB, etc.); digitized
audio and video are stored in memory; animated files, and so on. To store all of these at the same
time, you need a large amount of memory.
II) Storage Devices: large capacity storage devices are necessary to store multimedia data.
Floppy Disk: not sufficient to store multimedia data. Because of this, they are not used to store
multimedia data.
Hard Disk: the capacity of the hard disk should be high enough to store large volumes of data.
CD: CDs are important for multimedia because they are used to deliver multimedia data to users,
and a wide variety of data can be distributed on them.
DVD: DVDs have higher capacity than CDs. Similarly, they are also used to distribute multimedia
data to users. Some of the characteristics of DVD:
- High storage capacity, 4.7-17GB
- Narrower tracks than CDs, hence the higher storage capacity
- High data transfer rate, 4.6MB/sec
2) Input-Output Devices
I) Interacting with the system: to interact with multimedia system, we use either keyboard,
mouse, track ball, or touch screen, etc.
Mouse: a multimedia project is typically designed to be used with a mouse as an input pointing
device. Other devices like trackballs and touch screens can be used in place of a mouse. A
trackball is similar to a mouse in many ways.
Wireless mouse: important when the presenter has to move around during a presentation.
Touch Screen: we use fingers instead of a mouse to interact with touch-screen computers.
There are three technologies used in touch screens:
i. Infrared light: such touch screens use invisible infrared light beams projected across the
surface of the screen. A finger touching the screen interrupts the beams, generating an electronic
signal. The system then identifies the x-y coordinates of the point where the touch occurred and
sends signals to the operating system for processing.
ii. Texture-coated: such monitors are coated with a textured material that is sensitive to
pressure. When the user presses the monitor, the coating registers the x-y coordinates of the
location and sends signals to the operating system.
iii. Touch mate:
Use: touch screens are used to display/provide information in public areas such as airports,
Museums, transport service areas, hotels, etc.
Advantage:
user friendly
easy to use even for non-technical people
easy to learn how to use
II) Information Entry Devices: the purpose of these devices is to enter information to be
included in our multimedia project into our computer.
OCR devices: used together with OCR software, they convert printed documents into ASCII files.
Graphical Tablets/ Digitizer: both are used to convert points, lines, and curves from sketch into
digital format. They use a movable device called stylus.
Scanners: enable us to convert printed images into digital format.
Microphones: they are important because they enable us to record speech, music, etc. The
microphone is designed to pick up and amplify incoming acoustic waves or harmonics precisely
and correctly and convert them to electrical signals. You have to purchase a superior, high-
quality microphone because your recordings will depend on its quality.
Digital Camera and Video Camera (VCR): important for recording and including images and
video in a multimedia system, respectively. Digital video cameras store images as digital data;
they do not record on film. You can edit the video taken with a video camera or VCR using video
editing tools.
Remark: video takes a large amount of storage space.
Output Devices
Depending on the content of the project and how the information is presented, you need different
output devices. Some of the output hardware devices are:
Speaker: if your project includes speech that is meant to convey a message to the audience, or
background music, using speakers is obligatory.
Projector: when to use a projector:
- if you are presenting at a meeting or group discussion,
- if you are presenting to a large audience
Plotter/Printer: when the situation arises to present using paper, you use printers and/or plotters.
In such cases, the print quality of the device should be taken into consideration.
Impact printers: not good quality graphics/poor quality
Non-impact printers: good quality graphics
3) Network Devices
Why do we require network devices?
The following network devices are required for multimedia presentation:
i) Modem: which stands for modulator/demodulator, is used to convert a digital signal into an
analog signal for communication of data over a telephone line, which can carry only analog
signals. At the receiving end, it does the reverse, i.e. converts the analog signal back to digital
data. Currently, the standard modem is called V.90, which has a speed of 56kbps (kilobits per
second). Older standards include V.34, which has a speed of 28.8kbps. Data is transferred through
a modem in compressed format to save time and cost.
ii) ISDN: stands for Integrated Services Digital Network. It is a circuit-switched telephone
network system designed to allow digital transmission of voice and data over ordinary telephone
copper wires. This has the advantage of better quality and higher speeds than analog systems.
- It has a higher transmission speed, i.e. a faster data transfer rate.
- It uses additional hardware, hence it is more expensive.
iii) Cable modem: uses existing cables stretched for television broadcast reception. The data
transfer rate of such devices is very fast i.e. they provide high bandwidth. They are primarily
used to deliver broadband internet access, taking advantage of unused bandwidth on a cable
television network.
iv) DSL: provides digital data transmission over the wires of the local telephone network. The
speed of DSL is faster than using a telephone line with a modem. How? DSL carries a digital
signal over the unused frequency spectrum (analog voice transmission uses a limited range of the
spectrum) available on the twisted-pair cables running between the telephone company's central
office and the customer premises.
Summary
Multimedia Information Flow
Review Question
1. What is multimedia?
2. What are the desirable features of multimedia?
3. Discuss some application areas of multimedia.
4. What are the different hardware and software requirements of multimedia?
5. What is the difference between hypertext and hypermedia?
6. How is the web related to multimedia?
CHAPTER TWO
MULTIMEDIA AUTHORING AND TOOLS
In a slide show, interactivity generally consists of being able to control the pace (e.g., click to
advance to the next slide). The next level of interactivity is being able to control the sequence
and choose where to go next. Next is media control: start/stop video, search text, scroll the view,
and zoom. More control is available if we can control variables, such as changing a database
search query. The level of control is substantially higher if we can control objects – say, moving
objects around a screen, playing interactive games, and so on. Finally, we can control an entire
simulation: move our perspective in the scene, control scene objects
True authoring environments, which lie somewhere in between in terms of technical
complexity.
Authoring systems vary widely in:
Orientation
Capabilities, and
Learning curve: how easy it is to learn how to use the application
Scripting Language
The idea here is to use a special language to enable interactivity (buttons, mouse, etc.) and allow
conditionals, jumps, loops, functions/macros, and so on. Closest in form to traditional
programming. The paradigm is that of a programming language, which specifies:
multimedia elements,
sequencing of media elements,
hotspots (e.g links to other pages),
synchronization, etc.
These systems usually use a powerful, object-oriented scripting language. Multimedia elements and
events become objects that live in a hierarchical order. In-program editing of elements (still
graphics, video, audio, etc.) tends to be minimal or non-existent. Most authoring tools provide a
visually programmable interface in addition to the scripting language. Media handling can vary widely.
Examples:
Apple's HyperTalk for HyperCard,
Asymetrix's OpenScript for ToolBook,
the Lingo scripting language for Macromedia Director, and
ActionScript for Macromedia Flash
In these authoring systems, multimedia elements and interaction cues (or events) are organised as
objects in a structural framework.
Iconic/Flow-Control Paradigm
This paradigm provides a visual programming approach to organizing and presenting multimedia.
The core of the paradigm is the icon palette. You build a structure and flowchart of events,
tasks, and decisions by dragging appropriate icons from the icon palette library. These icons
represent and include menu choices, graphic images, sounds, computations, video, etc.
The flowchart graphically depicts the project logic.
This paradigm tends to be the speediest in development time; because of this, these tools are best
suited for rapid prototyping and short development-time projects.
They are also useful for storyboarding, because you can change the sequence of objects,
restructure interactions, and add objects by dragging and dropping icons.
Examples:
-Authorware
– IconAuthor
Figure 2. 1 Iconic/Flow control
HyperCard (Macintosh)
SuperCard (Macintosh)
Time Based Authoring Tools
In these authoring systems, elements are organized along a timeline, with resolutions as high as
1/30 second. Sequentially organized graphic frames are played back at a speed set by the
developer. Other elements, such as audio events, can be triggered at a given time or location in
the sequence of events.
These are the most popular multimedia authoring tools.
They are best suited for applications that have a message with a beginning and an end, animation-
intensive pages, or synchronized media applications.
Examples
o Macromedia Director
o Macromedia Flash
Macromedia Director
Director is a powerful and complex multimedia authoring tool, which has a broad set of features
for creating multimedia presentations, animations, and interactive applications. You can assemble
and sequence the elements of a project using the cast and the score. Director uses three important
things to arrange and synchronize media elements: the cast, the score, and Lingo.
Cast: a multimedia database containing any media type that is to be included in the project.
Director imports a wide range of data types and multimedia element formats directly into the cast.
You can also create elements from scratch and add them to the cast. To include multimedia elements
from the cast on the stage, you drag and drop the media onto the stage.
Score: this is where the elements in the cast are arranged. It is the sequence for displaying,
animating, and playing cast members. The score is made of frames, and frames contain cast members.
You can set the frame rate (frames per second).
Lingo
Lingo is a full-featured, object-oriented scripting language used in Director.
Macromedia Flash
Can accept both vector and bitmap graphics
Uses a scripting language called ActionScript, which gives greater capability to control the
movie.
Flash is commonly used to create animations and advertisements, to design web-page elements, to
add video to web pages, and more recently, to develop Rich Internet Applications. Rich Internet
Applications (RIAs) are web applications that have the features and functionality of traditional
desktop applications. RIAs use client-side technology, which can execute instructions on the
client's computer (there is no need to send every piece of data to the server).
Flash is a simple authoring tool that facilitates the creation of interactive movies. Flash follows
the score metaphor in the way the movie is created and the windows are organized.
Flash uses:
Library: a place where objects that are to be re-used are stored. The Library window shows all
the current symbols in the scene and can be toggled by the Window > Library command. A
symbol can be edited by double-clicking its name in the library, which causes it to appear on the
stage. Symbols can also be added to a scene by simply dragging the symbol from the Library
onto the stage.
Timeline: used to organize and control a movie content over time. Manages the layers and
timelines of the scene. The left portion of the Timeline window consists of one or more layers of
the Stage, which enables you to easily organize the Stage’s contents. Symbols from the Library
can be dragged onto the Stage, into a particular layer. For example, a simple movie could have
two layers, the background and foreground. The background graphic from the library can be
dragged onto the stage when the background layer is selected.
Layer: helps to organize contents. Timeline is divided into layers.
ActionScript: enables interactivity and control of movies. Action scripts allow you to trigger
events, such as moving to a different keyframe or requiring the movie to stop. Action scripts can
be attached to a keyframe or to symbols in a keyframe. Right-clicking on a symbol and choosing
Actions from the list lets you modify the actions of the symbol. Similarly, by right-clicking on a
keyframe and choosing Actions in the pop-up, you can apply actions to a keyframe. A Frame
Actions window will come up, with a list of available actions on the left and the current actions
being applied to the symbol on the right.
Most of them are displayed in web browsers using plug-ins, or the browser itself can
understand them.
This metaphor is the basis of the WWW.
It is limited, but can be extended by the use of suitable multimedia tags.
Multimedia Production
A multimedia project can involve a host of people with specialized skills. Multimedia production
can easily involve an art director, graphic designer, production artist, producer, project manager,
writer, user interface designer, sound designer, videographer, and 3D and 2D animators, as well
as programmers.
Macromedia Flash
Dreamweaver MX
Adobe Premiere
Adobe Premiere is a very simple video editing program that allows you to quickly create a
simple digital video by assembling and merging multimedia components. It effectively uses the
score authoring metaphor, in that components are placed in tracks horizontally, in a Timeline
window. The File > New Project command opens a window that displays a series of presets –
assemblies of values for frame resolution, compression method, and frame rate. There are many
preset options, most of which conform to some NTSC or PAL video standard. Start by importing
resources, such as AVI (Audio Video Interleave) video files and WAV sound files and dragging
them from the Project window onto tracks 1 or 2.
4) Interactivity feature: interactivity lets the end user of the project control the content
and flow of information. Some of the interactivity levels:
a. Simple branching: enables the user to go to any location in the presentation using key press,
mouse click, etc.
b. Conditional branching: branching based on if-then decisions
c. Structured branching: support complex programming logic such as nested if-then
subroutines
6) Playback feature: allows easy testing of the project. Testing enables you to debug the system
and find out how the user interacts with it, without wasting time assembling the whole project
just to test it.
7) Delivery feature: delivering your project requires building a run-time (executable) version of
the project using the authoring tool. Why a run-time version?
- It does not require the full authoring software to play.
- It does not allow users to access or change the content, structure, and programming of the
project.
You distribute the run-time version.
8) Cross platform feature: multimedia projects should be compatible with different platform
like Macintosh, Windows, etc. This enables the designer to use any platform to design the project
or deliver it to any platform.
9) Internet playability: web is significant delivery medium for multimedia. Authoring tools
typically provide facility so that output can be delivered in HTML or DHTML format.
10) Ease of learning: is it easy to learn? The designer should not waste much time learning how
to use it. Is it easy to use?
Review Questions
1. Explain briefly what multimedia authoring is. Is it important for multimedia? How?
2. Can you mention different multimedia authoring tools and their importance?
3. Explain briefly the characteristics of authoring tools.
4. What is a time-based authoring tool?
CHAPTER THREE:
MULTIMEDIA DATA REPRESENTATIONS
3.1. Graphic/Image Data Representation
An image can be described as a two-dimensional array of points, where every point is allocated
its own color. Every such single point is called a pixel, short for picture element. An image is a
collection of these points, colored in such a way that they produce meaningful information/data.
A pixel (picture element) contains the color or hue and the relative brightness of that point in
the image. The number of pixels in the image determines the resolution of the image.
A digital image consists of many picture elements, called pixels.
The number of pixels determines the quality of the image (image resolution).
Higher resolution always yields better quality.
Bitmap resolution: most graphics applications let you create bitmaps up to 300 dots per inch
(dpi). Such high resolution is useful for print media, but on the screen most of the
information is lost, since monitors usually display around 72 to 96 dpi.
A bit-map representation stores the graphic/image data in the same manner that the
computer monitor contents are stored in video memory.
Most graphic/image formats incorporate compression because of the large size of the data.
Types of Images
There are two basic forms of computer graphics: bit-maps and vector graphics. The kind you use
determines the tools you choose. Bitmap formats are the ones used for digital photographs.
Vector formats are used only for line drawings.
Bit-map images (also called Raster Graphics)
They are formed from pixels – a matrix of dots with different colors. Bitmap images are defined
by their dimensions in pixels as well as by the number of colors they represent. For example, a
640x480 image contains 640 pixels horizontally and 480 pixels vertically. If you enlarge a small
area of a bit-mapped image, you can clearly see the pixels that are used to create it.
Each of the small pixels can be a shade of gray or a color. Using 24-bit color, each pixel can be
set to any one of 16 million colors. All digital photographs and paintings are bitmapped, and any
other kind of image can be saved or exported into a bitmap format. In fact, when you print any
kind of image on a laser or ink-jet printer, it is first converted by either the computer or printer
into a bitmap form so it can be printed with the dots the printer uses.
To edit or modify bitmapped images you use a paint program. Bitmap images are widely used,
but they suffer from a few unavoidable problems. They must be printed or displayed at a size
determined by the number of pixels in the image. Bitmap images also have large file sizes,
determined by the image's dimensions in pixels and its color depth. To reduce this problem,
some graphic formats, such as GIF and JPEG, are used to store images in compressed form.
Vector graphics
They are really just a list of graphical objects such as lines, rectangles, ellipses, arcs, or curves –
called primitives. Draw programs, also called vector graphics programs, are used to create and
edit these vector graphics. These programs store the primitives as a set of numerical coordinates
and mathematical formulas that specify their shape and position in the image. This format is
widely used by computer-aided design programs to create detailed engineering and design
drawings. It is also used in multimedia when 3D animation is desired. Draw programs have a
number of advantages over paint-type program.
These include:
Precise control over lines and colors.
Ability to skew and rotate objects to see them from different angles or add perspective.
Ability to scale objects to any size to fit the available space. Vector graphics always print at
the best resolution of the printer you use, no matter what size you make them.
Color blends and shadings can be easily changed.
Text can be wrapped around objects.
1. Monochrome/Bit-Map Images
Images consist of pixels, or pels – picture elements in digital images. A 1-bit image consists of
on and off bits only and is thus the simplest type of image. Each pixel is stored as a single bit (0
or 1). Hence, such an image is also referred to as a binary image. It is also called a 1-bit
monochrome image, since it contains no color.
The value of the bit indicates whether it is light or dark
A 640 x 480 monochrome image requires 37.5 KB of storage.
Dithering is used to calculate patterns of dots such that values from 0 to 255 correspond to
patterns that are more and more filled at darker pixel values, for printing on a 1-bit printer.
Dithering is often used for displaying monochrome images
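As an illustration of the idea – this is a generic ordered-dither sketch, not the specific scheme of any particular printer – a small threshold matrix can map 8-bit gray values to on/off dots:

# 2x2 Bayer threshold matrix; larger matrices give smoother dot patterns
BAYER_2X2 = [[0, 2],
             [3, 1]]

def ordered_dither(gray, x, y):
    # gray is 0 (black) .. 255 (white); return 1 to print a dot, 0 to leave blank
    threshold = (BAYER_2X2[y % 2][x % 2] + 0.5) * 255 / 4
    return 0 if gray > threshold else 1   # darker pixel values produce more dots

row = [ordered_dither(100, x, 0) for x in range(8)]
print(row)   # mid-gray prints roughly half the dots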
2. Gray-scale Images
Each pixel is usually stored as a byte (a value between 0 and 255). The entire image can be
thought of as a two-dimensional array of pixel values. We refer to such an array as a bitmap, a
representation of the graphics/image data that parallels the manner in which it is stored in video
memory.
This value indicates the degree of brightness of that point. This brightness goes from black to
white
A 640 x 480 grayscale image requires over 300 KB of storage.
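These storage figures can be checked with a quick calculation (uncompressed data, ignoring any file-header overhead):

def image_size_kb(width, height, bits_per_pixel):
    # pixels * bits per pixel -> bits; / 8 -> bytes; / 1024 -> kilobytes
    return width * height * bits_per_pixel / 8 / 1024

print(image_size_kb(640, 480, 1))   # 37.5  KB, 1-bit monochrome
print(image_size_kb(640, 480, 8))   # 300.0 KB, 8-bit grayscale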
Such image files use the concept of a lookup table (LUT) to store color information. Basically,
the image stores not colors but just a set of bytes, each of which is an index into a table of
3-byte values that specify the color for a pixel with that lookup-table index. In a way, it is a
bit like a paint-by-number children's art set, with number 1 perhaps standing for orange, number 2
for green, and so on – there is no inherent pattern to the set of actual colors.
Figure 3. 5 Color LUT for 8-bit color images.
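A minimal sketch of the lookup idea (the palette entries here are made up for illustration):

# the lookup table: index -> 3-byte (R, G, B) color
palette = [
    (255, 165, 0),   # index 0: orange
    (0, 128, 0),     # index 1: green
    (0, 0, 255),     # index 2: blue
]

# the image stores one index byte per pixel, not the colors themselves
indexed_pixels = [0, 1, 1, 2, 0]

# displaying the image means looking each index up in the table
rgb_pixels = [palette[i] for i in indexed_pixels]
print(rgb_pixels)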
Image Resolution
Image resolution refers to the spacing of pixels in an image and is measured in pixels per inch,
ppi, sometimes called dots per inch, dpi. The higher the resolution, the more pixels in the image.
A printed image that has a low resolution may look pixelated or made up of small squares, with
jagged edges and without smoothness. Image size refers to the physical dimensions of an image.
The most common formats used on the internet are GIF, JPG, and PNG.
Standard System Independent Formats
Graphics Interchange Format (GIF)
The Graphics Interchange Format (GIF) was devised by CompuServe, initially for transmitting
graphical images over phone lines via modems.
32
Uses the Lempel-Ziv-Welch (LZW) algorithm (a dictionary-based compression method), modified
slightly for image scan-line packets (line grouping of pixels).
Limited to only 8-bit (256) color images, suitable for images with few distinctive colors (e.g.,
graphics drawing)
Supports one-dimensional interlacing (downloading gradually in web browsers. Interlaced
images appear gradually while they are downloading. They display at a low blurry resolution
first and then transition to full resolution by the time the download is complete.)
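The row order of GIF's four-pass one-dimensional interlacing can be sketched as follows (every 8th row, then every 8th offset by 4, every 4th offset by 2, and finally every 2nd offset by 1):

def gif_interlace_order(height):
    # (start_row, step) for the four GIF interlace passes
    passes = [(0, 8), (4, 8), (2, 4), (1, 2)]
    order = []
    for start, step in passes:
        order.extend(range(start, height, step))
    return order

print(gif_interlace_order(8))   # [0, 4, 2, 6, 1, 3, 5, 7]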
PNG
stands for Portable Network Graphics
It is intended as a replacement for GIF in the WWW and image editing tools.
GIF uses LZW compression, which is patented by Unisys. Use of GIF may therefore require paying
royalties to Unisys because of the patent.
PNG uses unpatented zip technology for compression
One version of PNG, PNG-8, is similar to the GIF format. It can be saved with a maximum of
256 colors and supports 1-bit transparency. File sizes when saved in a capable image editor like
Fireworks will be noticeably smaller than the GIF counterpart, as PNGs save their color data
more efficiently.
PNG-24 is another version of PNG, with 24-bit color support, allowing ranges of color comparable
to a high-color JPG. However, PNG-24 is in no way a replacement format for JPG, because it is a
lossless compression format, which results in larger file sizes.
Provides transparency using alpha value
Supports interlacing
PNG can be animated through the MNG extension of the format, but browser support is less
for this format.
JPEG/JPG
A standard for photographic image compression
created by the Joint Photographic Experts Group
Intended for encoding and compression of photographs and similar images
Takes advantage of limitations in the human vision system to achieve high rates of
compression
Uses complex lossy compression, which allows user to set the desired level of quality
(compression). A compression setting of about 60% will result in the optimum balance of quality
and file size.
Though JPGs can be interlaced, they do not support animation and transparency unlike GIF.
TIFF
Tagged Image File Format (TIFF), stores many different types of images (e.g., monochrome,
grayscale, 8-bit & 24-bit RGB, etc.)
Uses tags, keywords defining the characteristics of the image that is included in the file. For
example, a picture 320 by 240 pixels would include a ‘width’ tag followed by the number ‘320’
and a ‘depth’ tag followed by the number ‘240’.
Developed by the Aldus Corp. in the 1980s and later supported by Microsoft.
TIFF is a lossless format (when not utilizing the new JPEG tag which allows for JPEG
compression)
It does not provide any major advantages over JPEG and is not as user-controllable.
Do not use TIFF for web images. They produce big files, and more importantly, most web
browsers will not display TIFFs.
PAINT was originally used in Mac Paint program, initially only for 1-bit monochrome images.
PICT is a file format that was developed by Apple Computer in 1984 as the native format for
Macintosh graphics.
The PICT format is a meta-format that can be used for both bitmap images and vector images
though it was originally used in MacDraw (a vector based drawing program) for storing
structured graphics.
Still an underlying Mac format (although PDF is the native format on OS X).
First, to record digital audio, you need a card that has Analog-to-Digital Converter (ADC)
circuitry. The ADC is attached to the Line In (and Mic In) jack of your audio card and converts
the incoming analog audio to a digital signal. Your computer software can store the digitized
audio on your hard drive, display it visually on the computer's monitor, manipulate it
mathematically to add effects, process the sound, etc. While the incoming analog audio is being
recorded, the ADC creates many digital values in its conversion to a digital audio
representation of what is being recorded. These values must be stored for later playback.
Digitizing Sound
A microphone produces an analog signal.
Computers understand only discrete (digital) entities.
This creates a need to convert analog audio to digital audio using specialized hardware. This
conversion is also known as sampling.
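A minimal sketch of sampling in code: the snippet below generates CD-style samples (44,100 samples per second, 16-bit signed values) of a 440 Hz sine tone; a real system samples a microphone's voltage rather than a formula:

import math

SAMPLE_RATE = 44100          # samples per second (CD quality)
MAX_AMP = 2 ** 15 - 1        # largest 16-bit signed sample value

def sample_sine(freq_hz, duration_s):
    # measure the waveform at evenly spaced instants and quantize to integers
    n_samples = int(SAMPLE_RATE * duration_s)
    return [round(MAX_AMP * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE))
            for t in range(n_samples)]

samples = sample_sine(440.0, 0.001)   # 1 ms of a 440 Hz tone
print(len(samples), samples[:4])      # 44 samples; first few quantized values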
Common Audio Formats
There are two basic types of audio files – recorded (sampled) audio and synthesized audio such
as MIDI. Common formats include:
AU: The AU file format is a compressed audio file format developed by Sun Microsystems and
popular in the Unix world. It is also the standard audio file format for the Java programming
language. It supports only 8-bit depth and thus cannot provide CD-quality sound.
MP3: MP3 stands for Moving Picture Experts Group Audio Layer 3 compression. MP3 files
provide near-CD-quality sound but are only about 1/10th as large as a standard audio CD file.
Because MP3 files are small, they can easily be transferred across the Internet and played on any
multimedia computer with MP3 player software.
MIDI/MID: MIDI (Musical Instrument Digital Interface), is not a file format for storing or
transmitting recorded sounds, but rather a set of instructions used to play electronic music on
devices such as synthesizers. MIDI files are very small compared to recorded audio file formats.
However, the quality and range of MIDI tones is limited.
RA/RM
For audio data on the Internet, the de facto standard is RealNetworks' RealAudio (.RA)
compressed streaming audio format. These files require a RealPlayer program or browser plug-in.
The latest versions of RealNetworks' server and player software can handle multiple encodings of
a single file, allowing the quality of transmission to vary with the available bandwidth. Webcast
radio broadcasts of both talk and music frequently use RealAudio. Streaming audio can also be
provided in conjunction with video as a combined RealMedia (.RM) file.
ASF
Microsoft's Advanced Streaming Format (ASF) is similar in design to RealNetworks' RealMedia
format, in that it provides a common definition for internet streaming media and can
accommodate not only synchronized audio but also video and other multimedia elements, all
while supporting multiple bandwidths within a single media file. Also like RealNetworks'
RealMedia format, Microsoft's ASF requires a player program or browser plug-in.
The pure audio file format used in Windows Media Technologies is Windows Media Audio 7
(WMA files). Like MP3 files, WMA audio files use sophisticated audio compression to reduce
file size. Unlike MP3 files, however, WMA files can function as either discrete or streaming data
and can provide a security mechanism to prevent unauthorized use.
MOV
Apple QuickTime movies (MOV files) can be created without a video channel and used as a
sound-only format. Since version 4.0, QuickTime provides true streaming capability. QuickTime
also accepts different audio sample rates and bit depths, and offers full functionality in both
Windows and Mac OS. Popular audio file formats are:
au (Unix)
aiff (MAC)
wav (PC)
mp3
MIDI
MIDI stands for Musical Instrument Digital Interface.
Definition of MIDI: MIDI is a protocol that enables computers, synthesizers, keyboards, and other
musical devices to communicate with each other. This protocol is a language that allows
interworking between instruments from different manufacturers by providing a link that is
capable of transmitting and receiving digital data. MIDI transmits only commands; it does not
transmit an audio signal. It was created in 1982.
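Because MIDI carries commands rather than audio, a message is only a few bytes long. A sketch of building the standard 3-byte "note on" command (status byte 0x90 plus channel, then note number and velocity):

def note_on(channel, note, velocity):
    # status byte: 0x90 | channel (0-15); data bytes: note and velocity (0-127)
    return bytes([0x90 | channel, note, velocity])

msg = note_on(0, 60, 100)   # middle C (note 60) at moderate loudness, channel 1
print(msg.hex())            # '903c64' - three bytes, no audio data at all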
1. Synthesizer: It is a sound generator (various pitch, loudness, tone color). A good (musician’s)
synthesizer often has a microprocessor, keyboard, control panels, memory, etc.
2. Sequencer: It can be a stand-alone unit or a software program for a personal computer. (It used
to be a storage server for MIDI data. Nowadays it is more often a software music editor on the
computer.) It has one or more MIDI INs and MIDI OUTs.
Multi-timbral: capable of playing many different sounds at the same time (e.g., piano, brass,
drums,..)
Pitch: The Musical note that the instrument plays
Voice: Voice is the portion of the synthesizer that produces sound. Synthesizers can have many
(12, 20, 24, 36, etc.) voices. Each voice works independently and simultaneously to produce
sounds of different timbre and pitch.
Patch: The control settings that define a particular timbre.
Review Questions
1. What is the difference between bitmap and vector graphics? Which one is better? Why?
2. What is a grayscale image?
3. What are the different popular image file formats?
4. Explain briefly the common audio formats. Where are they used?
CHAPTER FOUR:
COLORS IN IMAGE AND VIDEO
Light and Spectra
In 1672, Isaac Newton discovered that white light could be split into many colors by a prism.
The colors produced by light passing through a prism are arranged in a precise array, or spectrum.
A color's spectral signature is identified by its wavelength.
RGB stands for Red, Green, Blue. The RGB color space expresses/defines a color as a mixture of
three primary colors:
Red
Green
Blue
The absence of all three colors creates black, and the full presence of all three forms white.
These colors are called additive colors. Pure black is (0,0,0) and pure white is (255,255,255);
all other colors are produced by varying the intensities of the three primaries and mixing the
colors.
Figure 4. 2 RGB color Model
These colors are called additive colors since they add together the way light adds to make colors,
and is a natural color space to use with video displays.
CMY is a color model used with printers and other peripherals. Three primary colors – cyan (C),
magenta (M), and yellow (Y) – are used to reproduce all colors.
Figure 4. 3 CMY Color Model
The three colors together absorb all the light that strikes them, appearing black (as contrasted
to RGB, where the three colors together make white). "Nothing" on the paper is white (as
contrasted to RGB, where nothing is black). These are called the subtractive or "paint" colors.
Cyan, Magenta, and Yellow (CMY) are the complementary colors of RGB. The CMY model is mostly used
in printing devices, where the color pigments on the paper absorb certain colors (e.g., no red
light is reflected from cyan ink), and in painting.
In practice, it is difficult to get an exact mix of the three colors that perfectly absorbs all
light and thus produces black. Expensive inks are required to produce the exact color, and the
paper must absorb each color in exactly the same way. To avoid these problems, a fourth color,
black, is often added, creating the CMYK color space (K stands for black), even though black is
mathematically not required. CMYK is used in color printing, e.g., to produce a darker black than
simply mixing CMY.
CMY Color Model (Subtractive Color)
So far, we have effectively been dealing only with additive color: when two light beams
impinge on a target, their colors add; when two phosphors on a CRT screen are turned
on, their colors add. Therefore, for example, red phosphor + green phosphor makes yellow light.
However, for ink deposited on paper, in essence the opposite situation holds: yellow ink
subtracts blue from white illumination but reflects red and green, which is why it appears
yellow! Therefore, instead of red, green, and blue primaries, we need primaries that amount to
-red, -green, and -blue; we need to subtract R, G, or B. These subtractive color primaries are
cyan (C), magenta (M), and yellow (Y) inks. RGB and CMY are connected. In the additive
(RGB) system, black is "no light", RGB = (0,0,0). In the subtractive CMY system, black arises
from subtracting all the light by laying down inks with C = M = Y = 1.
Transformation from RGB to CMY
The simplest model we can invent to specify what ink density to lay down on paper, to make a
certain desired RGB color (with channel values normalized to [0,1]), is:

C = 1 - R,  M = 1 - G,  Y = 1 - B

Then the inverse transform is:

R = 1 - C,  G = 1 - M,  B = 1 - Y

Undercolor Removal: CMYK System
Undercolor removal gives sharper and cheaper printer colors: calculate the part of the CMY mix
that would be black, remove it from the color proportions, and add it back as real black
(K stands for black). The new specification of inks is thus:

K = min(C, M, Y),  C' = C - K,  M' = M - K,  Y' = Y - K
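These transforms are simple enough to state directly in code; a sketch with channel values normalized to the range [0, 1]:

def rgb_to_cmy(r, g, b):
    # C = 1 - R, M = 1 - G, Y = 1 - B (the inverse transform just subtracts again)
    return 1 - r, 1 - g, 1 - b

def undercolor_removal(c, m, y):
    # pull the common gray component out of CMY and lay it down as real black ink
    k = min(c, m, y)
    return c - k, m - k, y - k, k

c, m, y = rgb_to_cmy(0.2, 0.4, 0.6)
print(undercolor_removal(c, m, y))   # approximately (0.4, 0.2, 0.0, 0.4)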
The color combinations that result from combining primary colors differ between the two
situations, additive color and subtractive color.
4.3. Color Models in Video
Video Color Transforms
Methods of dealing with color in digital video derive largely from older analog methods of
coding color for TV. Typically, some version of the luminance is combined with color
information in a single signal. YIQ is used to transmit TV signals in North America and Japan.
This coding also makes its way into VHS videotape coding in these countries, since videotape
technologies also use YIQ. In Europe, videotape uses the PAL or SECAM codings, which are
based on TV that uses a matrix transform called YUV. Finally, digital video mostly uses a matrix
transform called YCbCr that is closely related to YUV.
YIQ Color Model
YIQ is used in color TV broadcasting; it is downward compatible with black-and-white TV. The
YIQ color space is commonly used in North American television systems. Note that if the
chrominance is ignored, the result is a "black and white" picture. I and Q are a rotated version
of U and V: Y in YIQ is the same as in YUV, while U and V are rotated by 33 degrees. Y is
luminance; I is the red-orange axis, and Q is roughly orthogonal to I. The eye is most sensitive
to Y (luminance), next to I, and next to Q. YIQ is intended to take advantage of human color
response characteristics: since the eye is more sensitive to I than to Q, less bandwidth is
required for Q than for I. NTSC limits I to 1.5 MHz and Q to 0.6 MHz; Y is assigned a higher
bandwidth, 4 MHz.
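In code, the RGB-to-YIQ transform is a 3x3 matrix multiply; the coefficients below are the commonly quoted NTSC values, rounded to three decimals (sources vary slightly in the I and Q rows):

def rgb_to_yiq(r, g, b):
    # R, G, B in [0, 1]; Y gets the perceptual luminance weights
    y = 0.299 * r + 0.587 * g + 0.114 * b
    i = 0.596 * r - 0.275 * g - 0.321 * b
    q = 0.212 * r - 0.523 * g + 0.311 * b
    return y, i, q

print(rgb_to_yiq(1.0, 1.0, 1.0))   # pure white: Y = 1, I and Q near 0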
YCbCr
This color space is similar and closely related to the YUV space, but with the coordinates
shifted to allow all positive-valued coefficients. The luminance (brightness), Y, is retained
separately from the chrominance (color). During the development and testing of JPEG, it became
apparent that chrominance subsampling in this space allowed much better compression than simply
compressing RGB or CMY. Subsampling means that only one half or one quarter as much detail is
retained for the color as for the brightness. YCbCr is used in MPEG and JPEG compression.
Y – luma (brightness) component
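A sketch of 4:2:2-style horizontal chroma subsampling (averaging adjacent chroma samples is one common choice; some systems simply drop every second sample):

def subsample_422(y_row, cb_row, cr_row):
    # keep all luma samples; halve the chroma rows by averaging adjacent pairs
    cb = [(cb_row[i] + cb_row[i + 1]) / 2 for i in range(0, len(cb_row) - 1, 2)]
    cr = [(cr_row[i] + cr_row[i + 1]) / 2 for i in range(0, len(cr_row) - 1, 2)]
    return y_row, cb, cr

y, cb, cr = subsample_422([16, 32, 48, 64], [100, 110, 120, 130], [90, 92, 94, 96])
print(len(y), len(cb), len(cr))   # 4 luma samples, but only 2 Cb and 2 Cr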
Summary of Color
Color images are encoded as (R,G,B) integer triplet values. These triplets encode how much
the corresponding phosphor should be excited in devices such as a monitor.
Three common systems of encoding in video are RGB, YIQ, and YCbCr (YUV).
Besides the hardware-oriented color models (i.e., RGB, CMY, YIQ, YUV), HSB (Hue,
Saturation, and Brightness, e.g., used in Photoshop) and HLS (Hue, Lightness, and Saturation)
are also commonly used.
YIQ uses properties of the human eye to prioritize information. Y is the black and white
(luminance) image; I and Q are the color (chrominance) images. YUV uses a similar idea.
YUV is a standard for digital video that specifies the image size and decimates the chrominance
images (for 4:2:2 video).
A black and white image is a 2-D array of integers
Review Questions
1. Explain the RGB and CMY color models with their color cubes.
2. Is there any difference between CMY and CMYK colors?
3. Can you describe the HSB color model used in Photoshop?
CHAPTER FIVE:
FUNDAMENTAL CONCEPTS IN VIDEO
Lesson Content
An analog signal f(t) samples a time-varying image. So-called progressive scanning traces
through a complete picture (a frame) row-wise for each time interval. A high-resolution
computer monitor typically uses a time interval of 1/72 second. In TV and in some monitors and
multimedia standards, another system, interlaced scanning, is used. Here, the odd-numbered lines
are traced first, then the even-numbered lines. This results in "odd" and "even" fields – two
fields make up one frame.
In fact, the odd lines (starting from 1) end up at the middle of a line at the end of the odd
field, and the even scan starts at a halfway point. The following figure shows the scheme used.
First the solid (odd) lines are traced – P to Q, then R to S, and so on, ending at T – then the
even field starts at U and ends at V. The scan lines are not horizontal because a small voltage
is applied, moving the electron beam down over time.
Figure 5.1 Scanning video
Computer-based digital video is defined as a series of individual images and associated audio.
These elements are stored in a format in which both elements (pixel and sound sample) are
represented as a series of binary digits (bits). Almost all digital video uses component video.
Digital video offers several advantages over analog video:
Storing video on digital devices or in memory, ready to be processed (noise removal, cut and
paste, and so on) and integrated into various multimedia applications
Direct access, which makes nonlinear video editing simple
Repeated recording without degradation of image quality
Ease of encryption and better tolerance to channel noise
An analog copy of a video can be very similar to the original, but it is not identical. Digital
copies will always be identical and will not lose their sharpness and clarity over time. However,
digital video is limited by the amount of RAM available, whereas this is not a factor with
analog video. Digital technology also allows for easy editing and enhancing of videos.
Displaying Video
Progressive scan
Interlaced scan
Progressive scan
Progressive scan updates all the lines on the screen at the same time; this is known as
progressive scanning. Today all PC screens write a picture in this way.
Figure 5.2 Progressive scan
Interlaced Scanning
Interlaced scanning writes every second line of the picture during a scan and writes the other
half during the next sweep. This way, only 25/30 pictures per second are needed. This idea of
splitting up the image into two parts became known as interlacing, and the split-up pictures as
fields. Graphically, a field is basically a picture with every second line blank. Here is an
image that shows interlacing so that you can better imagine what happens.
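A minimal Python sketch of how a frame splits into two fields, assuming the frame is simply a 2-D array of scan lines; the names here are illustrative.

import numpy as np

# A minimal sketch of interlacing: split a frame into an odd field and
# an even field by taking alternate scan lines.
def split_into_fields(frame):
    odd_field = frame[0::2, :]    # lines 1, 3, 5, ... (0-based rows 0, 2, 4)
    even_field = frame[1::2, :]   # lines 2, 4, 6, ...
    return odd_field, even_field

frame = np.arange(16).reshape(4, 4)   # a toy 4x4 "frame"
odd, even = split_into_fields(frame)
print(odd.shape, even.shape)          # (2, 4) (2, 4): two fields per frame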
1. Component video: each primary is sent as a separate video signal. The primaries can either be
RGB or a luminance-chrominance transformation of them (e.g., YIQ, YUV). Component video
takes the different components of the video and breaks them into separate signals; improvements
to component video have led to many video formats, including S-Video and RGB. Higher-end
video systems make use of three separate video signals for the red, green, and blue image
planes, and most computer systems use component video, with separate signals for R, G, and B.
For any color-separation scheme, component video gives the best color reproduction, since there
is no "crosstalk" between the three channels. This is not the case for S-Video or composite
video. Component video, however, requires more bandwidth and good synchronization of the
three components.
2. Composite video (1 signal): color (chrominance) and luminance signals are mixed into a single
carrier wave, so some interference between the two signals is inevitable. Composite analog video
has all its components (brightness, color, synchronization information, etc.) combined into one
signal. Due to this compositing (or combining) of the video components, the quality of composite
video is marginal at best; the results are color bleeding, low clarity, and high generational loss.
In NTSC TV, for example, I and Q are combined into a chroma signal, and a color subcarrier
then puts the chroma signal at the higher-frequency end of the channel shared with the luminance
signal. The chrominance and luminance components can be separated at the receiver end, and the
two color components can be further recovered.
When connecting to TVs or VCRs, composite video uses only one wire (and hence one
connector, such as a BNC connector at each end of a coaxial cable or an RCA plug at each end
of an ordinary wire), and video color signals are mixed, not sent separately. The audio signal is
another addition to this one signal. Since color information is mixed and both color and intensity
are wrapped into the same signal, some interference between the luminance and chrominance
signals is inevitable.
3. S-Video (2 signals, separated video): a compromise between component analog video and
composite video. It uses two lines, one for luminance and another for a composite
chrominance signal.
As a compromise, S-video (separated video, or super-video, e.g., in S-VHS) uses two
wires: one for luminance and another for a composite chrominance signal. As a result,
there is less crosstalk between the color information and the crucial gray-scale
information. The reason for placing luminance into its own part of the signal is that
black-and-white information is crucial for visual perception. Humans are able to
differentiate spatial resolution in grayscale images much better than in the color part of
color images (as opposed to the "black-and-white" part). Therefore, the color information
sent can be much less accurate than the intensity information: we can see only large blobs
of color, so it makes sense to send less color detail.
5.5. Video Broadcasting Standards / TV Standards
There are three different video broadcasting standards: PAL, NTSC, and SECAM.
PAL is a TV standard originally invented by German scientists. It uses 625 horizontal lines at a
field rate of 50 fields per second (or 25 frames per second). It is used in Australia, New Zealand,
the United Kingdom, and Europe.
SECAM uses the same bandwidth as PAL but transmits the color information sequentially.
SECAM (Système Électronique Couleur Avec Mémoire) is very similar to PAL: it specifies the
same number of scan lines and frames per second, namely 625 scan lines per frame at 25 frames
per second. It is the broadcast standard for France, Russia, and parts of Africa and Eastern
Europe.
SECAM and PAL are similar, differing slightly in their color-coding scheme. In SECAM, the U
and V signals are modulated using separate color subcarriers at 4.25 MHz and 4.41 MHz,
respectively, and are sent on alternate lines – that is, only one of the U or V signals is sent on
each scan line.
NTSC, the standard used in North America, specifies:
525 scan lines per frame, 30 frames per second (or, to be exact, 29.97 fps, 33.37 msec/frame)
Interlaced scanning, with each frame divided into 2 fields of 262.5 lines/field
20 lines reserved for control information at the beginning of each field
First-generation HDTV was based on an analog technology developed by Sony and NHK in
Japan in the late 1970s. HDTV was successfully used to broadcast the 1984 Los Angeles Olympic
Games in Japan. Multiple sub-Nyquist Sampling Encoding (MUSE) was an improved NHK
HDTV with hybrid analog/digital technologies that was put in use in the 1990s. It has 1,125 scan
lines, interlaced (60 fields per second), and a 16:9 aspect ratio. It uses satellite broadcasting,
which is quite appropriate for Japan, since the country can be covered with one or two satellites.
The Direct Broadcast Satellite (DBS) channels used have a bandwidth of 24 MHz.
Modern plasma televisions use this technology. HDTV consists of 720 to 1,080 lines and a
higher number of pixels per line (as many as 1,920). The HDTV signal is digital, resulting in
crystal-clear, noise-free pictures and CD-quality sound. It also has many viewer benefits, such
as the choice between interlaced and progressive scanning; many people have their preferences.
File formats on the PC platform are indicated by the three-letter filename extension, for example:
.rm = RealMedia file
.3gp = 3GPP multimedia file (used in mobile phones)
With digital video, four factors have to be kept in mind. These are:
Frame Rate
The standard for displaying any type of non-film video is 30 frames per second (film is 24
frames per second). This means that the video is made up of 30 (or 24) pictures, or frames, for
every second of video. Additionally, these frames are split in half (odd lines and even lines) to
form what are called fields.
Color Resolution
Color resolution refers to the number of colors displayed on the screen at one time. Computers
deal with color in an RGB (red-green-blue) format, while video uses a variety of formats. One of
the most common video formats is called YUV. Although there is no direct correlation between
RGB and YUV, they are similar in that they both have varying levels of color depth (maximum
number of colors).
Spatial Resolution
The third factor is spatial resolution – or, in other words, "How big is the picture?" Since PC and
Macintosh computers generally have resolutions in excess of 640 by 480, most people assume
that this resolution is the video standard. A standard analog video signal displays a full,
overscanned image without the borders common to computer screens. The National Television
System Committee (NTSC) standard used in North American and Japanese television uses a
768 by 484 display. The Phase Alternating Line (PAL) standard for European television is
slightly larger, at 768 by 576. Most countries endorse one or the other, but never both.
Since the resolution of analog video and that of computers differ, conversion of analog video to
digital video must at times take this into account. This can often result in down-sizing of the
video and the loss of some resolution.
Image Quality
The last and most important factor is video quality. The final objective is video that looks
acceptable for your application.
Review Questions
1. Describe the different TV standards.
2. What are the different factors of digital video?
3. What are progressive scan and interlaced scan? Is there any difference?
4. Explain the different video file formats.
CHAPTER SIX:
BASICS OF DIGITAL AUDIO
Audio information is crucial for multimedia presentations and, in a sense, is the simplest type of
multimedia data. However, some important differences between audio and image information
cannot be ignored. For example, while it is customary and useful to occasionally drop a video
frame from a video stream to facilitate viewing speed, we simply cannot do the same with sound
information, or all sense will be lost from that dimension. A voice is a quantity produced by an
animal, such as a human being, when it speaks using its vocal cords. When this voice is digitally
sampled, or in other words converted into an electric signal and then rendered into a playable
format, it is called audio. For example, when I record a speech given by my mother on her
anniversary, the input to the recorder is "voice", while the output, when I play the speech on a
player, is audio. Animation is the art of drawing sketches of objects and then showing them in a
series of frames so that they look like a moving, living thing to us, while a video is a recording
of either still or moving objects.
Lesson Content
Review Questions
Sampling Audio
Analog Audio
Most natural phenomena around us are continuous; they are continuous transitions between two
different states. Sound is no exception to this rule, i.e., sound also varies constantly. Since sound
consists of measurable pressures at any 3D point, we can detect it by measuring the pressure
level at a location, using a transducer to convert pressure to voltage levels.
Figure 6.1 shows the one-dimensional nature of sound. Values change over time in amplitude:
the pressure increases or decreases with time, and the amplitude value is a continuous quantity.
Since we are interested in working with such data in computer storage, we must digitize the
analog signals (i.e., continuous-valued voltages) produced by microphones. For image data, we
must likewise digitize the time-dependent analog signals produced by typical video cameras.
Digitization means conversion to a stream of numbers – preferably integers for efficiency.
A continuously varying signal is represented by an analog signal: a continuous function f in the
time domain. For the value y = f(t), the argument t of the function represents time. If we graph
f, the result is called a wave. A wave has three characteristics:
Amplitude
Frequency, and
Phase
Amplitude: the intensity of the signal, which can be determined by looking at the height of the
signal. If the amplitude increases, the sound becomes louder. Amplitude measures how high or
low the voltage of the signal is at a given point in time.
Frequency: the number of times the wave cycle is repeated, which can be determined by
counting the number of cycles in a given time interval. Frequency is related to the pitch of the
sound: increased frequency means higher pitch.
Phase: related to the wave's appearance; it describes the position of the waveform relative to a
reference point in time.
Figure 6.2 Digitization
An analog signal is represented by amplitude and frequency. Converting these waves to digital
information is referred to as digitizing. The challenge is to convert the analog waves to numbers
(digital information). The more numbers on the scale, the better the quality of the sample, but
more bits will be needed to represent that sample.
In digital form, the measure of frequency refers to how often a sample is taken. In the
graph below, the sample has been taken 7 times (reading across). Frequency is usually expressed
in kilohertz (kHz).
Hertz (Hz) = number of cycles per second
kHz = 1000 Hz
MHz = 1000 kHz
Music CDs use a frequency of 44.1 kHz. A frequency of 22 kHz, for example, would mean that
the sample was taken less often.
Sampling means measuring the value of the signal at a given time period. The samples are then
quantized. Quantization means rounding the value of each sample to the nearest amplitude
number in the graph. For example, if the amplitude of a specific sample is 5.6, it is rounded
either up to 6 or down to 5. Quantization is thus assigning a value (from a fixed set) to a sample.
The quantized values are then changed to binary patterns, and the binary patterns are stored in
the computer.
The following diagram shows the digitization process (sampling, quantization, and coding).
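The same three stages can also be sketched in a few lines of Python; the 440 Hz tone, the sampling rate, and the 8-bit quantizer below are assumptions for illustration.

import math

# A toy digitization of a 440 Hz sine wave: sample at rate fs, then
# quantize each sample to one of 256 levels (8 bits), then store the
# coded values.
fs = 8000                # sampling rate in Hz (assumed for illustration)
samples = []
for n in range(8):       # take 8 samples (sampling)
    t = n / fs                                      # discrete sample time
    amplitude = math.sin(2 * math.pi * 440 * t)     # analog value in [-1, 1]
    level = round((amplitude + 1) / 2 * 255)        # quantization to 0..255
    samples.append(level)                           # coding/storage step
print(samples)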
Sample Rate
A sample is a single measurement of amplitude. The sample rate is the number of these
measurements taken every second. In order to accurately represent all of the frequencies in a
recording that fall within the range of human perception, generally accepted as 20 Hz to 20 kHz,
we must choose a sample rate high enough to represent all of these frequencies. At first
consideration, one might choose a sample rate of 20 kHz, since this is identical to the highest
frequency. This will not work, however, because every cycle of a waveform has both a positive
and a negative amplitude, and it is the rate of alternation between positive and negative
amplitudes that determines frequency. Therefore, we need at least two samples for every cycle,
resulting in a sample rate of at least 40 kHz.
Common Sampling Rates
8 kHz: telephone audio
11.025 kHz: speech audio
22.05 kHz: low-grade audio (WWW audio)
44.1 kHz: CD-quality audio
The sampling rate of a real signal needs to be greater than twice the signal bandwidth. Audio
practically starts at 0 Hz, so the highest frequency present in audio recorded at 44.1 kHz is 22.05
kHz (22.05 kHz bandwidth).
Another important factor is the number of bits used to represent the amplitude of each sample,
also known as the sample resolution. How do we store each sample value (quantized value)?
8-bit value (0-255)
16-bit integer value (0-65535)
The amount of memory required to store a sample t seconds long is as follows:
8-bit resolution, mono recording: memory = f × t × 8 × 1
8-bit resolution, stereo recording: memory = f × t × 8 × 2
16-bit resolution, mono recording: memory = f × t × 16 × 1
16-bit resolution, stereo recording: memory = f × t × 16 × 2
where f is the sampling frequency and t is the time duration in seconds; the result is in bits.
Example: Abebe sampled audio for 10 seconds. How much storage space is required if
a) a 22.05 kHz sampling rate is used, with 8-bit resolution and mono recording?
b) a 44.1 kHz sampling rate is used, with 8-bit resolution and mono recording?
c) a 44.1 kHz sampling rate is used, with 16-bit resolution and stereo recording?
d) an 11.025 kHz sampling rate is used, with 16-bit resolution and stereo recording?
Solution:
a) m = 22050 × 8 × 10 × 1 = 1,764,000 bits = 220,500 bytes = 220.5 KB
b) m = 44100 × 8 × 10 × 1 = 3,528,000 bits = 441,000 bytes = 441 KB
c) m = 44100 × 16 × 10 × 2 = 14,112,000 bits = 1,764,000 bytes = 1,764 KB
d) m = 11025 × 16 × 10 × 2 = 3,528,000 bits = 441,000 bytes = 441 KB
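The formula is easy to check in code; this small sketch (function name assumed for illustration) reproduces the worked answers above.

# A minimal sketch of the storage formula m = f x t x bits x channels.
def audio_storage_bytes(f, t, bits, channels):
    total_bits = f * t * bits * channels
    return total_bits / 8          # convert bits to bytes

# Reproducing the examples above (10 seconds of audio):
print(audio_storage_bytes(22050, 10, 8, 1))    # 220500.0 bytes = 220.5 KB
print(audio_storage_bytes(44100, 10, 8, 1))    # 441000.0 bytes = 441 KB
print(audio_storage_bytes(44100, 10, 16, 2))   # 1764000.0 bytes = 1764 KB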
Implications of Sample Rate and Bit Size
Affects Quality of Audio
Affects Size of Data
Quantization and transformation of data are collectively known as coding of the data. Taking
differences between the present and a previous signal value can effectively reduce the size of
signal values and, most importantly, concentrate the histogram of sample values (now
differences) into a much smaller range. The result of reducing the variance of values is that
lossless compression methods that assign shorter bit lengths to more likely values produce a
shorter bit stream. In general, producing quantized sampled output for audio is called pulse code
modulation, or PCM. The differences version is called DPCM (and a crude but efficient variant
is called DM).
Audio is analog – the waves we hear travel through the air to reach our eardrums. We know that
the basic techniques for creating digital signals from analog ones consist of sampling and
quantization. Sampling is invariably done uniformly – we select a sampling rate and produce one
value for each sampling time.
The boundaries for quantizer input intervals that will all be mapped into the same output level
form a coder mapping, and the representative values that are the output values from a quantizer
form a decoder mapping.
1. Transformation: The input data is transformed to a new representation that is easier or more
efficient to compress. For example, in Predictive Coding, we predict the next signal from
previous ones and transmit the prediction error.
2. Loss: We may introduce loss of information. Quantization is the main lossy step. Here we use a
limited number of reconstruction levels, fewer than in the original signal. Therefore, quantization
necessitates some loss of information.
3. Coding: Here, we assign a codeword (thus forming a binary bit stream) to each output level or
symbol. This could be a fixed-length code or a variable-length code, such as Huffman coding.
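As a rough illustration of these three steps, here is a toy DPCM-style sketch in Python; the previous-sample predictor and the quantizer step size of 4 are assumptions for the example, not parameters of any standard.

# A toy sketch of the three steps above in DPCM style. The predictor
# (previous sample) and quantizer step size (4) are assumptions.
def dpcm_encode(samples, step=4):
    codes, prev = [], 0
    for s in samples:
        diff = s - prev                 # 1. transformation: prediction error
        q = round(diff / step)          # 2. loss: quantize to fewer levels
        codes.append(q)                 # 3. coding: q would get a codeword
        prev = prev + q * step          # track what the decoder reconstructs
    return codes

def dpcm_decode(codes, step=4):
    out, prev = [], 0
    for q in codes:
        prev = prev + q * step
        out.append(prev)
    return out

signal = [0, 3, 8, 12, 13, 11, 6]
codes = dpcm_encode(signal)
print(codes)                # small differences, concentrated near zero
print(dpcm_decode(codes))   # close to, but not exactly, the original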
Review Questions
1. Describe the following terms:
a. Amplitude
b. Frequency
c. Phase
2. How do we convert analog audio to digital audio? Is there any effect if it is not converted?
3. Discuss the different sampling rates and the purposes for which we use them.
4. If Mr. X sampled audio for 20 seconds, how much storage space is required if
a. a 22.05 kHz sampling rate is used, with 16-bit resolution and mono recording?
b. a 44.1 kHz sampling rate is used, with 8-bit resolution and mono recording?
c. a 44.1 kHz sampling rate is used, with 16-bit resolution and stereo recording?
CHAPTER SEVEN:
7.1. Introduction
Review Questions
7.1. Introduction
The Need for Compression
Take, for example, a video signal with a resolution of 320×240 pixels, 256 colors (8 bits per
pixel), and 30 frames per second. The raw bit rate is:
320 × 240 × 8 × 30 = 18,432,000 bits per second = 2,304,000 bytes ≈ 2.3 MB per second
A 90-minute movie would take 2.3 × 60 × 90 MB ≈ 12.44 GB. Without compression, data
storage and transmission would pose serious problems!
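The same arithmetic in a few lines of Python, for checking the figures above:

# A minimal sketch of the raw bit-rate arithmetic above.
width, height, bits, fps = 320, 240, 8, 30
bits_per_second = width * height * bits * fps
mb_per_second = bits_per_second / 8 / 1_000_000
print(bits_per_second)                  # 18432000 bits per second
print(mb_per_second)                    # 2.304 MB per second
print(mb_per_second * 60 * 90 / 1000)   # ~12.44 GB for a 90-minute movie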
Figure 7.1 depicts a general data compression scheme, in which compression is performed by an
encoder and decompression is performed by a decoder. We call the output of the encoder codes
or codewords. The intermediate medium could be either data storage or a
communication/computer network. If the compression and decompression processes induce no
information loss, the compression scheme is lossless; otherwise, it is lossy. The next several
chapters deal with lossy compression algorithms, as they are commonly used for image, video,
and audio compression. Here, we concentrate on lossless compression.
Figure 7.1 A general data compression scheme.
Information theory is the mathematical treatment of the concepts, parameters and rules
governing the transmission of messages through communication systems. It was founded by
Claude Shannon toward the middle of the twentieth century and has since then evolved into a
vigorous branch of mathematics fostering the development of other scientific fields, such as
statistics, biology, behavioral science, neuroscience, and statistical mechanics. The techniques
used in information theory are probabilistic in nature and some view information theory as a
branch of probability theory. In a given set of possible events, the information of a message
describing one of these events quantifies the symbols needed to encode the event in an optimal
way. ‘Optimal’ means that the obtained code word will determine the event unambiguously,
isolating it from all others in the set, and will have minimal length, that is, it will consist of a
minimal number of symbols. Information theory also provides methodologies to separate real
information from noise and to determine the channel capacity required for optimal transmission
conditioned on the transmission rate.
The foundation of information theory was laid in a 1948 paper by Shannon titled, “A
Mathematical Theory of Communication.” Shannon was interested in how much information a
given communication channel could transmit. In neuroscience, you are interested in how much
information the neuron’s response can communicate about the experimental stimulus.
Information theory is based on a measure of uncertainty known as entropy (designated "H"). For
example, the entropy of the stimulus S is written H(S) and is defined as follows:
H(S) = −Σ_s P(s) log2 P(s)
The subscript s underneath the summation simply means to sum over all possible stimuli, S = [1,
2 … 8]. This expression is called "entropy" because it is similar to the definition of entropy in
thermodynamics; thus, the preceding expression is sometimes referred to as "Shannon entropy."
The entropy of the stimulus can be intuitively understood as "how long a message (in bits) do
I need to convey the value of the stimulus?" For example, suppose the center-out task had only
two peripheral targets ("left" and "right"), which appeared with equal probability. It would
take only one bit (a 0 or a 1) to convey which target appeared; hence, you would expect the
entropy of this stimulus to be 1 bit. That is what the preceding expression gives, as P(s) = 0.5
and log2(0.5) = −1. The center-out stimulus in the dataset can take on eight possible values with
equal probability, so you expect its entropy to be 3 bits.
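A minimal Python sketch of this formula, reproducing the two examples above; the function name is illustrative.

import math

# A minimal sketch of Shannon entropy: H(S) = -sum P(s) log2 P(s).
def entropy(probabilities):
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 bit  (two equally likely targets)
print(entropy([1 / 8] * 8))   # 3.0 bits (eight equally likely stimuli)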
On the downside, compressed data must be decompressed to be used, and this extra processing
may be harmful to some applications. For instance, a compression scheme for video may require
expensive hardware for the video to be decompressed fast enough to be viewed as it’s being
decompressed. The option of decompressing the video in full before watching it may be
inconvenient, and requires storage space for the decompressed video.
Types of Compression
Lossless Compression
The original content of the data is not lost/changed when it is compressed (encoded). Lossless
compression is used mainly for compressing symbolic data such as database records,
spreadsheets, texts, executable programs, etc. It recovers the exact original data after
decompression, and is used where exact replication of the original is essential and changing
even a single bit cannot be tolerated.
Examples: Run-Length Encoding (RLE), Lempel-Ziv (LZ), Huffman coding.
Lossy Compression
The original content of the data is lost to a certain degree when compressed. For visual and
audio data, some loss of quality can be tolerated without losing the essential nature of the data.
Lossy compression is used for image compression in digital cameras (JPEG), audio compression
(MP3), and video compression in DVDs (the MPEG format).
An example of lossless vs. lossy compression is the following string: 25.888888888. This string
can be compressed losslessly as 25.[9]8 and interpreted as "twenty-five point 9 eights"; the
original string is perfectly recreated, just written in a smaller form. In a lossy system, it can be
compressed as 26, in which case the original data is lost, at the benefit of a smaller file size. The
above is a very simple example of run-length encoding.
With the RLE data compression algorithm, a run such as
WWWWWWWWWWBWWWWWWWWWBBBWWWWWWWWWWWW is compressed as
10W1B9W3B12W (interpreted as ten W's, one B, nine W's, three B's, twelve W's).
The original sequence 111122233333311112222 can be encoded as (1,4),(2,3),(3,6),(1,4),(2,4).
Run-length encoding performs lossless data compression.
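A minimal run-length encoder/decoder sketch in Python, using the (symbol, count) pair form of the second example above; the function names are illustrative.

# A minimal sketch of run-length encoding as (symbol, count) pairs.
def rle_encode(s):
    pairs, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:   # extend the current run
            j += 1
        pairs.append((s[i], j - i))
        i = j
    return pairs

def rle_decode(pairs):
    return "".join(sym * count for sym, count in pairs)

encoded = rle_encode("111122233333311112222")
print(encoded)              # [('1', 4), ('2', 3), ('3', 6), ('1', 4), ('2', 4)]
print(rle_decode(encoded))  # the original string, recovered losslessly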
Lossless compression schemes are reversible, so that the original data can be reconstructed;
lossy schemes accept some loss of data in order to achieve higher compression.
Lossy data compression methods typically offer a three-way tradeoff between:
Computer resource requirements (compression speed, memory consumption)
Compressed data size, and
Quality loss.
Lossless compression is a method of reducing the size of computer files without losing any
information. That means when you compress a file, it will take up less space, but when you
decompress it, it will still have the exact same information. The idea is to get rid of any
redundancy in the information; this is exactly what happens in ZIP and GIF files. This
differs from lossy compression, such as in JPEG files, which loses some information that is not
very noticeable.
Common compression methods
Statistical methods: these require prior information about the occurrence of symbols.
E.g., Huffman coding: estimate the probabilities of symbols and code one symbol at a time, with
shorter codes for symbols with high probabilities.
Dictionary-based coding: the previous algorithms (both entropy coding and Huffman) require
statistical knowledge. Dictionary-based coding techniques, such as Lempel-Ziv (LZ), do not
require prior information to compress strings; rather, they replace strings of symbols with
pointers to dictionary entries.
The output of the Huffman encoder is determined by the model (probabilities). The higher the
probability of occurrence of a symbol, the shorter the code assigned to that symbol, and vice
versa. This makes it possible to handle the most frequently occurring symbols in the data
efficiently, and also reduces the time taken to decode each symbol.
Associate the binary code 1 with the right branch and 0 with the left branch.
Step 4: Create a unique codeword for each symbol by traversing the tree from the root to the
leaf, concatenating all 0s and 1s encountered during the traversal.
Example 1: Consider a 7-symbol alphabet given in the following table to construct the Huffman
coding.
At each step, the Huffman encoding algorithm picks the two symbols with the smallest
frequencies to combine.
Figure 7.3 Huffman Coding Tree
Using the Huffman coding tree, a table can be constructed by working down the tree, left to
right. This gives the binary equivalent for each symbol in terms of 1s and 0s.
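A minimal Python sketch of Huffman coding using a priority queue; note that tie-breaking choices mean the exact 0/1 labels can differ from the tree in Figure 7.3 while still being optimal. The function name and the five-symbol alphabet are illustrative.

import heapq

# A minimal sketch of Huffman coding: repeatedly merge the two entries
# with the smallest frequencies, prefixing 0 (left) and 1 (right).
def huffman_codes(freqs):
    # Each heap entry: (frequency, tie_breaker, {symbol: code_so_far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two smallest frequencies
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

# One optimal result: A=0, B=111, C=101, D=110, E=100
print(huffman_codes({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}))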
Example 3: Construct the tree and binary codes using Huffman coding.
7.6. The Shannon-Fano Encoding Algorithm
1. Calculate the frequency of each of the symbols in the list.
2. Sort the list in (decreasing) order of frequencies.
3. Divide the list into two halves, with the total frequency counts of each half being as close as
possible to each other.
4. Assign the right half a code of 1 and the left half a code of 0.
5. Recursively apply steps 3 and 4 to each of the halves, until each symbol has become a
corresponding code leaf on the tree. That is, treat each split as a list and apply splitting and code
assignment until you are left with lists of single elements.
6. Generate the codeword for each symbol.
Let us assume the source alphabet S = {X1, X2, X3, …, Xn} and associated probabilities
P = {P1, P2, P3, …, Pn}. The steps to encode data using the Shannon-Fano coding algorithm are
as follows: order the source letters into a sequence according to their probability of occurrence,
in non-increasing (i.e., decreasing) order.
ShannonFano(sequence S):
If S has two letters, attach 0 to the codeword of one letter and 1 to the codeword of the other.
Else, if S has more than two letters, divide S into two subsequences, S1 and S2, with the minimal
difference between the total probabilities of the two subsequences; extend the codeword of each
letter in S1 by attaching 0, and of each letter in S2 by attaching 1; then call
ShannonFano(S1);
ShannonFano(S2);
Example 1: Given five symbols A to E with their frequencies being 15, 7, 6, 6 and 5, encode
them using Shannon-Fano entropy encoding.
Solution:
Step 1: We are given five symbols (A to E) that can occur in a source, with frequencies 15, 7, 6,
6 and 5. First, sort the symbols in decreasing order of frequency.
Step 2: Divide the list into two halves such that the total counts of the two halves are as close as
possible to each other. In this case we split the list between B and C, and assign 0 and 1.
Step 3: Recursively repeat the steps of splitting and assigning codes until each symbol becomes
a code leaf on the tree. That is, treat each split as a list, apply splitting, and assign codes until
you are left with lists of single elements.
Step 4: Note that we split the list containing C, D and E between C and D because the difference
between the split lists is 11 minus 6, which is 5; if we were to have divided between D and E,
we would get a difference of 12 − 5, which is 7.
Step 5: We complete the algorithm and, as a result, have codes assigned to the symbols.
Example 2: Suppose the following source with related probabilities: S = {A, B, C, D, E},
P = {0.35, 0.17, 0.17, 0.16, 0.15}. Message to be encoded: "ABCDE". The probabilities are
already arranged in non-increasing order. First, we divide the message into AB and CDE. Why?
This gives the smallest difference between the total probabilities of the two groups.
A = 00, B = 01, C = 10, D = 110, E = 111
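A minimal recursive Python sketch of the procedure above, which reproduces the codes from Example 2; the function name and list-based representation are assumptions for illustration.

# A minimal sketch of Shannon-Fano coding. Symbols must already be
# sorted by decreasing probability.
def shannon_fano(symbols, probs, codes=None, prefix=""):
    if codes is None:
        codes = {}
    if len(symbols) == 1:
        codes[symbols[0]] = prefix or "0"
        return codes
    total = sum(probs)
    best_diff, split = float("inf"), 1
    for k in range(1, len(symbols)):          # find the most balanced split
        running = sum(probs[:k])
        diff = abs(running - (total - running))
        if diff < best_diff:
            best_diff, split = diff, k
    shannon_fano(symbols[:split], probs[:split], codes, prefix + "0")
    shannon_fano(symbols[split:], probs[split:], codes, prefix + "1")
    return codes

# Example 2 above: gives A=00, B=01, C=10, D=110, E=111.
print(shannon_fano(list("ABCDE"), [0.35, 0.17, 0.17, 0.16, 0.15]))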
Lempel-Ziv compression
The problem with Huffman coding is that it requires knowledge about the data before encoding
takes place: Huffman coding requires the frequencies of symbol occurrence before codewords
can be assigned to symbols.
Lempel-Ziv compression does not rely on prior knowledge about the data; rather, it builds this
knowledge in the course of data transmission/data storage. The Lempel-Ziv algorithm (called
LZ) uses a table of codewords created during data transmission, and it transmits the index of
the symbol/word instead of the word itself. Each time, it replaces strings of characters with a
reference to a previous occurrence of the string.
Example (decompression): Decode (i.e., decompress) the sequence
(0, A) (0, B) (2, C) (3, A) (2, A) (4, A) (6, B)
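A minimal Python sketch of this LZ78-style decoding, assuming index 0 means "no prefix" and new dictionary entries are numbered 1, 2, 3, … in order of creation; the function name is illustrative.

# A minimal sketch of LZ78-style decoding for (index, symbol) pairs,
# where index 0 means "no prefix" and other indices refer to earlier
# dictionary entries.
def lz78_decode(pairs):
    dictionary = {0: ""}
    output = []
    for index, symbol in pairs:
        entry = dictionary[index] + symbol
        dictionary[len(dictionary)] = entry   # new entries get ids 1, 2, 3, ...
        output.append(entry)
    return "".join(output)

pairs = [(0, "A"), (0, "B"), (2, "C"), (3, "A"), (2, "A"), (4, "A"), (6, "B")]
print(lz78_decode(pairs))   # ABBCBCABABCAABCAAB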
UL = LL + d(u, l) × d(f), where LL is the lower limit, d(u, l) is the difference between the upper
and lower limits, and d(f) is the cumulative frequency (probability) of the letter. For the first
letter, B, the lower limit is zero and the upper limit is 0.4. Subdividing this interval:
B = 0 + (0.4 − 0) × 0.4 = 0.16
E = 0 + (0.4 − 0) × 0.6 = 0.24
L = 0 + (0.4 − 0) × 0.8 = 0.32
A = 0 + (0.4 − 0) × 1 = 0.4
and similarly for the other letters.
A message is represented by a half-open interval [a, b), where a and b are real numbers between
0 and 1. Initially, the interval is [0, 1). As the message becomes longer, the length of the
interval shortens, and the number of bits needed to represent the interval increases. Suppose the
alphabet is [A, B, C, D, E, F, $], in which $ is a special symbol used to terminate the message,
and the known probability distribution is listed below.
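A minimal Python sketch of the interval-narrowing idea; since the probability table referred to above is not reproduced here, the three-symbol alphabet and probabilities in the sketch are purely illustrative assumptions.

# A minimal sketch of arithmetic-coding interval narrowing. The alphabet
# and probabilities below are illustrative assumptions.
probs = {"A": 0.4, "B": 0.3, "C": 0.3}

def narrow(low, high, symbol):
    width = high - low
    cum = 0.0
    for s, p in probs.items():            # locate the symbol's sub-interval
        if s == symbol:
            return low + width * cum, low + width * (cum + p)
        cum += p

low, high = 0.0, 1.0
for ch in "ABA":                           # encode the message "ABA"
    low, high = narrow(low, high, ch)
    print(ch, low, high)                   # the interval keeps shrinking
# Any number inside the final interval identifies the message.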
7.9. Lossless Image Compression
One of the most commonly used compression techniques in multimedia data compression is
differential coding. The basis of data reduction in differential coding is the redundancy in
consecutive symbols in a data stream. Audio is a signal indexed by one dimension, time. Here
we consider how to apply the lessons learned from audio to the context of digital image signals
that are indexed by two spatial dimensions (x, y).
Let's consider differential coding in the context of digital images. In a sense, we move from
signals with a domain in one dimension to signals indexed by numbers in two dimensions (x, y)
– the rows and columns of an image. Later, we'll look at video signals. These are even more
complex, in that they are indexed by space and time (x, y, t). Because of the continuity of the
physical world, the gray-level intensities (or colors) of background and foreground objects in
images tend to change relatively slowly across the image frame. Since we were dealing with
signals in the time domain for audio, practitioners generally refer to images as signals in the
spatial domain. Because image content generally changes slowly across the frame, neighboring
pixel values tend to be similar, and this is the redundancy that differential coding exploits.
Lossless JPEG
Lossless JPEG is a special case of JPEG image compression. It differs drastically from the other
JPEG modes in that the algorithm has no lossy steps. Thus we treat it here, and consider the
more commonly used JPEG methods in Chapter 9. Lossless JPEG is invoked when the user
selects a 100% quality factor in an image tool. Essentially, lossless JPEG is included in the JPEG
compression standard simply for completeness. The following predictive method is applied on
the unprocessed original image (or each color band of the original color image). It essentially
involves two steps: forming a differential prediction and encoding.
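As a rough illustration of the differential-prediction step, the following Python sketch uses one of the standard lossless JPEG neighbor predictors (P = A + B − C); handling only interior pixels and picking this one predictor are simplifications for illustration, not the full standard.

import numpy as np

# A minimal sketch of differential prediction with the neighbor
# predictor P = A + B - C, where A is the left neighbor, B the one
# above, and C the one above-left. Borders are skipped for simplicity.
def prediction_errors(img):
    errors = np.zeros_like(img, dtype=int)
    h, w = img.shape
    for y in range(1, h):
        for x in range(1, w):
            a, b, c = img[y, x - 1], img[y - 1, x], img[y - 1, x - 1]
            errors[y, x] = int(img[y, x]) - (int(a) + int(b) - int(c))
    return errors   # small values clustered near zero get short codewords

img = np.array([[10, 11, 12], [11, 12, 13], [12, 13, 14]])
print(prediction_errors(img))   # mostly zeros for smooth images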
Review Questions
1. Given the following symbols and their corresponding frequencies of occurrence, find an
optimal binary code for compression:
I. Using the Huffman algorithm
II. Using the entropy coding scheme
2. Encode (i.e., compress) the following string using the Lempel-Ziv algorithm:
ABBCDBBBDBCCBCCB
3. Encode the word "HELLO" using arithmetic coding.
4. Encode the following using RLE:
a. 4444666667777779999999
b. MMMEEEEDDDDIIIIIIIAAAAAA