
Multimedia Computing

1 UNIT 1: INTRODUCTION (5 HRS.)

The words “multi” and “media” are combined to form the word multimedia. The word “multi”
signifies “many.” Multimedia is a type of medium that allows information to be easily
transferred from one location to another. Multimedia is an interactive medium and provides
multiple ways to represent information to the user in a powerful manner. It provides an
interaction between users and digital information; it is a medium of communication. Some of
the sectors where multimedia is used extensively are education, training, reference material,
business presentations, advertising and documentaries.

1.1 DEFINITION OF MULTIMEDIA


By definition, multimedia is a combination of information having different transport signal
characteristics, represented in an attractive and interactive manner through a combination of
text, audio, video, graphics and animation. In other words, we can say that multimedia is the
presentation of text, pictures, audio and video with links and tools that allow the user to
navigate, engage, create and communicate using a computer.

Multimedia becomes interactive multimedia when the user is given the option of controlling the
elements, such as text, drawings, still and moving images (video), graphics, audio, animation,
and any other media in which any type of information can be expressed, stored, communicated
and processed digitally. “Multi” and “media” together refer to the many types of media
(hardware/software) used for the communication of information.

1.1.1 Objective of multimedia
• Data representation
• Data processing
• Data compression
• Data transmission
• Mobile games
• Data security
• Human computer interaction

1.1.2 Hypermedia and Multimedia


• A hypertext system is meant to be read nonlinearly, by following links that point to other parts
of the document or to other documents.
• Hypermedia is not constrained to be text-based; it can include other media, e.g., graphics,
images, and especially the continuous media, sound and video. The World Wide Web (WWW)
is the best example of a hypermedia application.
Multimedia means that computer information can be represented through audio, graphics,
images, video and animation in addition to traditional media.
1.1.3 Linear VS Non-Linear
A Multimedia Project is identified as Linear when:
• It is not interactive.
• Users have no control over the content that is being shown to them.
Example:
• A movie
• A non-interactive lecture / demo show

A Multimedia Project is identified as Non-Linear when:


• It is interactive.
• Users have control over the content that is being shown to them.
• Users are given navigational control.
Example:
• Games
• Interactive CD

1.2 MULTIMEDIA APPLICATION:

Multimedia applications can be subdivided into different categories:


• Information Systems: The major purpose of such systems is to provide information
for one or several users. The requested information is typically stored in the databases

or media archives. Examples are electronic publishing, online galleries or weather
information systems.
• Remote Representation: By means of a remote representation system a user can take
part in or monitor events at a remote location. Important examples are distance
conferencing or lecturing, virtual reality, and remote robotic agents.
• Entertainment: This major application area of multimedia technology is strongly
oriented towards audio and video data. Example entertainment applications are
digital television, video on demand, distributed games and interactive television.
• Multimedia in Marketing and Advertising: By using multimedia, the marketing of new
products can be greatly enhanced. Multimedia boosts communication at an affordable
cost and has opened the way for marketing and advertising personnel.
• Multimedia in Hospitals: Multimedia is best used in hospitals for real-time monitoring
of the condition of patients who are critically ill or have had an accident. The conditions
are displayed continuously on a computer screen and can alert the doctor or nurse on
duty if any changes are observed on the screen.

1.3 GLOBAL STRUCTURE OF MULTIMEDIA

• Device domain:
It deals with the interaction between the multimedia application and multimedia devices such
as an Accelerated Graphics Port (AGP) card, a sound card, etc. Basic concepts for the
processing of digital audio and video data are based on digital signal processing.
Different methods for the processing of image, graphics and animation are described.
The audio techniques section includes music, the Musical Instrument Digital Interface
(MIDI) and speech processing.

• System Domain:
The interface between the device domain and the system domain is specified by the
computer technology. To utilize the device domain, several system services are needed.
Basically, three services exist. These services are mostly implemented in software. The
operating system, serves as an interface between computer hardware/system and all
other software components. It provides the user with a programming and computational
environment, which should be easy to operate. The database system allows a structured
access to data and a management of large databases. The communication system is
responsible for data transmission according to the timing and reliability requirements
of the networked multimedia.
• Application domain:
Provides functions to the user to develop and present multimedia projects. This includes
software tools, and multimedia projects development methodology. The services of the
system domain are offered to the application domain through proper programming
abstractions. Another topic embedded in the application domain is document handling.
• Cross domain:
It turns out that some aspects, such as synchronization, are difficult to locate in
one or two components or domains. The reason is that synchronization, being the
temporal relationship among various media, relates to many components across all
domains.

1.4 MEDIUM, MULTIMEDIA SYSTEM AND PROPERTIES


A multimedia system is characterized by the computer-controlled, integrated production,
manipulation, presentation, storage and communication of independent information, which is
encoded at least through one continuous (time-dependent) and one discrete (time-independent)
medium.
Media are divided into two types with respect to time in their representation space:
Time-independent (discrete): Information is expressed only through its individual value, without a
time component, e.g. text, image, graphics, etc.
Time-dependent (continuous): Information is expressed not only by its individual value, but also
by the time of its occurrence, e.g. sound and video.
Classification of Media
Medium is defined as means for distribution and presentation of information. Examples of a
medium are text, graphics, speech, and music. Media can be classified with respect to different
criteria. We classify media according to perception, representation, presentation, storage,
transmission, and information exchange.
Media can be categorized into the following types:

• The perception media


• The representation Media
• The presentation Media
• The storage media
• The transmission media
• The information Exchange media

1. Perception Medium: Perception media help humans sense their environment. The
central question is how humans perceive information in a computer environment. The
answer is: through seeing and hearing.
Seeing: For the perception of information through seeing, visual media such as text,
image and video are used.
Hearing: For the perception of information through hearing, media such as
music, noise and speech are used.
2. Representation medium: Representation media are defined by the internal computer
representation of information. The central question is: how is computer information
coded? The answer is that various formats are used to represent media information in the
computer.
3. Presentation medium: Presentation media refer to the tools and devices for the input
and output of information. The central question is: through which medium is information
delivered by the computer, and through which is it introduced to the computer?
4. Storage medium: Storage media refer to the data carriers that enable storage of
information. The central question is: how will information be stored? The answer is
hard disk, CD-ROM, floppy disk, microfilm, printed documents, digital storage, etc.
5. Transmission medium: Transmission media are the different information carriers that
enable continuous data transmission. The central question is: over what will information
be transmitted? Information is transmitted over a network using either wired or
wireless connections. Wired connections can be twisted pair, coaxial cable, optical fiber
cable, etc.; wireless connections can be satellite links or radio links.
6. Information exchange medium: Information exchange media include all information
carriers for transmission, i.e. all storage and transmission media.

A Multimedia system has four basic properties:


1. Multimedia systems must be computer controlled.
2. Multimedia systems are integrated.
3. The information they handle must be represented digitally.
4. The interface to the final presentation of media is usually interactive.

1.5 CHALLENGES FOR MULTIMEDIA SYSTEMS


Some of the challenges for multimedia system are:

• Continuous media types such as video need a lot of space to store and very high
bandwidth to transmit.
• They also have tight timing constraints.
• Automatically analysing, indexing and organizing information in audio, images and
video is much harder than doing so for text.
• Multimedia involves many different research areas and needs more complex and more
efficient algorithms and hardware platforms.
• Distributed Network
• Temporal relationship between data

▪ Rendering different data at the same time (continuous data)
▪ Sequencing within the media
▪ Synchronisation: inter-medium scheduling
• Data representation: digital representation, analog-to-digital conversion, sampling, etc.
• Large data requirements: bandwidth, storage, compression.

1.6 COMPONENTS OF A MULTIMEDIA SYSTEM


Following are the common components of multimedia:
• Text- All multimedia productions contain some amount of text. The text can have various
types of fonts and sizes to suit the professional presentation of the multimedia software.
o Hypermedia: Hypermedia, a term derived from hypertext, extends the notion of
the hypertext link to include links among any set of multimedia objects, including
sound, motion video and virtual reality. It may also suggest a higher level of
user/network interactivity than the interactivity already implicit in hypertext. The
World Wide Web (WWW) is a classic example of hypermedia.

• Graphics- Graphics make the multimedia application attractive. In many cases people do
not like reading large amounts of textual matter on the screen, so graphics are often used
instead of text to explain a concept, present background information, etc. There are
two types of graphics:
▪ Bitmap images- Bitmap images are real images that can be captured from devices
such as digital cameras or scanners. Generally, bitmap images are not editable.
Bitmap images require a large amount of memory.
▪ Vector Graphics- Vector graphics are drawn on the computer and only require a
small amount of memory. These graphics are editable.

• Audio- A multimedia application may require the use of speech, music and sound effects.
These are called the audio or sound elements of multimedia. Speech is also an effective
medium for teaching. Audio can be of analog or digital type. Analog audio or sound refers
to the original sound signal, whereas the computer stores sound in digital form; therefore,
the sound used in multimedia applications is digital audio.

• Video- The term video refers to a moving picture accompanied by sound, such as a picture
on television. The video element of a multimedia application conveys a lot of information
in a short duration of time. Digital video is useful in multimedia applications for showing
real-life objects. Video has the highest performance demands on computer memory and on
bandwidth if placed on the Internet. Digital video files can be stored like any other files in
the computer while the quality of the video is still maintained; they can be transferred
within a computer network, and digital video clips can be edited easily.

• Animation- Animation is a process of making a static image look like it is moving. An


animation is just a continuous series of still images that are displayed in a sequence. The
animation can be used effectively for attracting attention. Animation also makes a
presentation light and attractive. Animation is very popular in multimedia applications.

References:
• “Multimedia: Computing, Communications and Applications”, Ralf Steinmetz and
Klara Nahrstedt, Pearson Education Asia

• “Multimedia Communications: Applications, Networks, Protocols and Standards”, Fred
Halsall, Pearson Education Asia
• “Multimedia Systems”, John F. Koegel Buford, Pearson Education Asia

Assignments:
1. Describe the data stream characteristics for continuous media.
2. Define Multimedia and explain how media can be classified.
3. Define Multimedia. Explain the characteristics of multimedia.
4. What do you mean by medium? Define different types of medium.
5. What is multimedia? With suitable example, discuss the definition and properties of
a multimedia system.

A Gentle Advice: Please go through your textbooks and reference books for detailed study!!!
Thank you all.

Compiled by mail2prakashbaral@gmail.com
UNIT 2: SOUND/AUDIO SYSTEM (6 HRS.)

1.1 CONCEPTS OF SOUND SYSTEM


Sound is a physical phenomenon produced by vibrating matter such as a violin string, a guitar
or a block of wood. As the matter vibrates, pressure variations are created in the surrounding air.
These regions of low and high air pressure propagate through the air in a wave-like motion;
when the wave reaches the human ear, a sound is heard.

The pattern of the oscillation is called a waveform. The waveform repeats the same shape at
regular intervals, and one complete cycle is called a period. Since sound waves occur naturally,
they are never perfectly smooth or uniformly periodic. However, sounds that have a
recognizable periodicity tend to be more musical than non-periodic sounds.
Examples of periodic sound sources are musical instruments, vowel sounds, whistling wind, bird
songs, etc.
Non-periodic sounds include coughs, sneezes, rushing water, etc.
Sound vs Audio
• The key difference between sound and audio is their form of energy.
• Sound is mechanical wave energy (longitudinal sound waves) that propagates through a
medium, causing variations in pressure within the medium.
• Audio is made of electrical energy (analog or digital signals) that represents sound
electrically.
To put it simply:

• Sound is the vibration travelling through materials: the action.
• Audio is the end result: the technology for hearing sounds coming from natural or human-
made sources.
• Sound is a continuous wave that travels through the air; it can be captured by measuring
the pressure level at a point.

• A microphone placed in a sound field moves according to the varying pressure exerted on it;
as a transducer it converts this energy into another form of energy, an electrical voltage
level.

1.2 FREQUENCY
The frequency of a sound is the reciprocal value of the period. It represents the number of times
the pressure rises and falls, or oscillates, in a second and is measured in hertz (Hz) or cycles
per second (cps). A frequency of 100 Hz means 100 oscillations per second. A convenient
abbreviation, kHz for kilohertz, is used to indicate thousands of oscillations per second: 1 kHz
equals 1000 Hz.
The frequency range of normal human hearing extends from around 20 Hz up to about 20 kHz.
Some of the frequency ranges are:

• Infrasound: 0 Hz - 20 Hz
• Human audible sound: 20 Hz - 20 kHz
• Ultrasound: 20 kHz - 1 GHz
• Hypersound: 1 GHz - 10 THz
Human audible sound is also called audio or acoustic signals (waves). Speech is an acoustic
signal produced by the humans.

1.3 AMPLITUDE
The amplitude of the sound is the measure of the displacement of the air pressure wave from
its mean or quiescent state. The greater the amplitude, the louder the sound.
Amplitude is subjectively heard as loudness and is measured in decibels (dB).
• 0 dB - essentially no sound heard
• 35 dB - a quiet home
• 70 dB - a noisy street
• 120 dB - discomfort
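
As an aside (not part of the original notes), the standard relation between an amplitude (pressure) ratio and decibels is 20 times the base-10 logarithm of the ratio; a small Python sketch:

    import math

    def amplitude_to_db(amplitude, reference=1.0):
        # Convert an amplitude ratio to decibels using the 20*log10 rule for pressure/amplitude.
        return 20 * math.log10(amplitude / reference)

    print(amplitude_to_db(2.0))    # doubling the amplitude adds about 6 dB
    print(amplitude_to_db(10.0))   # a tenfold amplitude increase adds 20 dB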

1.4 COMPUTER REPRESENTATION OF SOUND


Sound waves are continuous while computers are good at handling discrete numbers. In order
to store a sound wave in a computer, samples of the wave are taken. Each sample is represented
by a number, the ‘code’. This process is known as digitization.
Digitization is the process of converting an analog signal into a digital signal. There are three
steps in the digitization of sound. These are:
• Sampling
• Quantization
• Encoding

Sampling - Sampling is a process of measuring air pressure amplitude at equally spaced
moments in time, where each measurement constitutes a sample.
A sampling rate is the number of times the analog sound is taken per second. A higher sampling
rate implies that more samples are taken during the given time interval and ultimately, the
quality of reconstruction is better.

To discretize the signals, the gap between the samples should be fixed. That gap can be termed
as a sampling period Ts.

Sampling Frequency= 1/Ts = fs

Where,

• Ts is the sampling time


• fs is the sampling frequency or the sampling rate

Sampling frequency is the reciprocal of the sampling period and is simply called the sampling
rate. The sampling rate denotes the number of samples taken per second.

The sampling rate is measured in hertz (Hz for short), the unit for cycles per second. A
sampling rate of 5,000 Hz (or 5 kHz, which is the more common usage) implies that 5,000
samples are taken per second. The three sampling rates most often used in multimedia are
44.1 kHz (CD quality), 22.05 kHz and 11.025 kHz.

Quantization - Quantization is a process of representing the amplitude of each sample as


integers or numbers. The number of bits used to represent the value of each sample is known
as the sample size, bit depth or resolution. Commonly used sample sizes are either 8 bits or 16
bits. The larger the sample size, the more accurately the data will describe the recorded sound.
An 8-bit sample size provides 256 equal measurement units to describe the level and frequency
of the sound in that slice of time. A 16-bit sample size provides 65,536 equal units to describe
the sound in that sample slice of time. The value of each sample is rounded off to the nearest
integer (quantization) and if the amplitude is greater than the intervals available, clipping of
the top and bottom of the wave occurs.

Encoding - Encoding converts each integer (base-10) sample value to a base-2, that is binary, number.
The output is a binary expression in which each bit is either a 1 (pulse) or a 0 (no pulse).
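
A minimal Python sketch (an illustration only, with an assumed 1 kHz tone, an 8 kHz sampling rate and an 8-bit sample size) that walks through the three steps of sampling, quantization and encoding described above:

    import math

    FS = 8000            # sampling rate in Hz (samples per second), assumed for the example
    BITS = 8             # sample size / bit depth
    LEVELS = 2 ** BITS   # 256 quantization levels for 8 bits

    def digitize(duration_s, freq_hz=1000.0):
        samples = []
        for n in range(int(duration_s * FS)):
            t = n / FS                                   # sampling: equally spaced moments in time
            value = math.sin(2 * math.pi * freq_hz * t)  # analog amplitude in [-1, 1]
            q = round((value + 1) / 2 * (LEVELS - 1))    # quantization: round to the nearest level
            samples.append(format(q, "08b"))             # encoding: base-2 (binary) code word
        return samples

    print(digitize(0.001)[:4])   # the first few 8-bit code words of a 1 kHz tone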

1.4.1 Quantization of Audio
Quantization is a process to assign a discrete value from a range of possible values to each
sample. Number of samples or ranges of values are dependent on the number of bits used to
represent each sample. Quantization results in stepped waveform resembling the source signal.
• Quantization Error/Noise - The difference between sample and the value assigned to it
is known as quantization error or noise.
• Signal to Noise Ratio (SNR) - The signal-to-noise ratio refers to signal quality versus
quantization error: the higher the signal-to-noise ratio, the better the voice quality. Working
with very small signal levels often introduces more error, so instead of uniform quantization,
non-uniform quantization, known as companding, is used. Companding is the process of
distorting the analog signal in a controlled way by compressing large values at the source
and then expanding them at the receiving end before quantization takes place.
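
To make the relationship between bit depth and quantization noise concrete, here is a small sketch (not from the original notes) that quantizes a full-scale sine wave with a uniform quantizer and measures the resulting signal-to-noise ratio; each extra bit adds roughly 6 dB of SNR:

    import math

    def snr_db(bits, num_samples=10000):
        # Quantize a full-scale sine wave with the given bit depth and report the SNR in dB.
        levels = 2 ** bits
        signal_power = noise_power = 0.0
        for n in range(num_samples):
            x = math.sin(2 * math.pi * n / num_samples)   # original sample in [-1, 1]
            q = round((x + 1) / 2 * (levels - 1))         # quantized level
            x_hat = q / (levels - 1) * 2 - 1              # reconstructed value
            signal_power += x * x
            noise_power += (x - x_hat) ** 2               # quantization error
        return 10 * math.log10(signal_power / noise_power)

    print(snr_db(8))    # roughly 50 dB for 8-bit samples
    print(snr_db(16))   # roughly 98 dB for 16-bit samples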

1.4.2 Transmission of Audio


In order to transmit the sampled digital sound/audio over a wire, it must eventually be
recovered as an analog signal. This process is called demodulation.
• PCM Demodulation - A PCM demodulator reads each sampled value, applies analog
filters to suppress energy outside the expected frequency range, and outputs the resulting
analog signal, which can be used to carry the digital audio over the network.

Sound Bit Depth
Sampling rate and sound bit depth are the audio equivalents of the resolution and colour depth of a
graphic image. A single bit rate and a single sampling rate are recommended throughout a
work. Bit depth determines the amount of space used for storing each piece of audio
information: the higher the number of bits, the higher the quality of the sound. Multimedia sound
comes in 8-bit, 16-bit, 32-bit and 64-bit formats. An 8-bit sample has 2^8 or 256 possible values.
An audio file size can be calculated with the simple formula:

File Size in Disk = (Length in seconds) × (sample rate) × (bit depth/8 bits per byte).

Bit rate refers to the amount of data, specifically bits, transmitted or received per second. It is
comparable to the sample rate but refers to the digital encoding of the sound: it specifies
how many digital 1s and 0s are used each second to represent the sound signal.
This means the higher the bit rate, the higher the quality and the size of the recording. For
instance, an MP3 file might be described as having a bit rate of 320 kb/s, or 320,000 b/s. This
indicates the amount of compressed data needed to store one second of music.

Bit Rate = (Sample Rate) × (Bit Depth) × (Number of Channels)
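
Plugging CD-quality values into the two formulas above (one minute of 16-bit stereo at 44.1 kHz; the specific numbers are only an example):

    def file_size_bytes(length_s, sample_rate, bit_depth, channels=1):
        # File size in disk = length in seconds * sample rate * (bit depth / 8 bits per byte),
        # extended here with a channel count so stereo material can be handled too.
        return length_s * sample_rate * (bit_depth / 8) * channels

    def bit_rate_bps(sample_rate, bit_depth, channels):
        # Bit rate = sample rate * bit depth * number of channels
        return sample_rate * bit_depth * channels

    print(file_size_bytes(60, 44100, 16, channels=2))   # 10,584,000 bytes, roughly 10 MB per minute
    print(bit_rate_bps(44100, 16, 2))                   # 1,411,200 bits per second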

1.4.3 Types of Digital Audio File Formats


There are many different types of digital audio file formats that have resulted from working
with different computer platforms and software. Some of the better known formats include:

WAV
WAV is the Waveform format. It is the most commonly used and supported format on the
Windows platform. Developed by Microsoft, the Wave format is a subset of RIFF and is
capable of sample resolutions of 8 and 16 bits. With Wave, there are several different encoding
methods to choose from, including Wave or PCM format. Therefore, when developing sound
for the Internet, it is important to make sure you use the encoding method that the player you are
recommending supports.
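
As an illustration (not part of the original notes), Python's standard wave module can write uncompressed PCM samples into a WAV container; a minimal sketch that stores one second of 16-bit mono silence at 44.1 kHz:

    import wave

    with wave.open("silence.wav", "wb") as wav_file:
        wav_file.setnchannels(1)                    # mono
        wav_file.setsampwidth(2)                    # 2 bytes = 16 bits per sample
        wav_file.setframerate(44100)                # CD-quality sampling rate
        wav_file.writeframes(b"\x00\x00" * 44100)   # one second of silent PCM frames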

AU
AU is the Sun Audio format. It was developed by Sun Microsystems to be used on UNIX,
NeXT and Sun Sparc workstations. It is a 16-bit compressed audio format that is fairly
prevalent on the Web. This is probably because it plays on the widest number of platforms.

RA
RA is Progressive Networks RealAudio format. It is very popular for streaming audio on the
Internet because it offers good compression up to a factor of 18. Streaming technology enables
a sound file to begin playing before the entire file has been downloaded.

AIFF
AIFF or AFF is Apple’s Audio Interchange File Format. This is the Macintosh waveform
format. It is also supported on IBM compatibles and Silicon Graphics machines. The AIFF
format supports a large number of sampling rates up to 32 bits.

MPEG
MPEG and MPEG-2 are the Motion Picture Experts Group formats. They are compressed
audio and video formats. Some Web sites use these formats for their audio because their
compression capabilities offer ratios of at least 14:1. These formats will probably become
quite popular.

2 MUSIC AND SPEECH


The relationship between music and computers has become more and more important,
especially considering the development of MIDI (Musical Instrument Digital Interface) and its
important contributions to the music industry today. The MIDI interface between electronic
musical instruments and computers is a small piece of equipment that plugs directly into the
computer's serial port and allows the transmission of music signals. MIDI is considered to be
the most compact interface that allows full-scale output.

2.1 MIDI BASIC CONCEPTS
MIDI is a standard that manufacturers of electronic musical instruments have agreed upon. It
is a set of specifications they use in building their instruments so that the instruments of
different manufacturers can, without difficulty, communicate musical information between one
another.
A MIDI interface has two different components:
Hardware connects the equipment. It specifies the physical connection between musical
instruments, stipulates that a MIDI port is built into an instrument, specifies a MIDI cable
(which connects two instruments) and deals with electronic signals that are sent over the cable.
A data format encodes the information traveling through the hardware. A MIDI data format
does not include an encoding of individual samples as the audio format does. Instead of
individual samples, an instrument- connected data format is used. The encoding includes,
besides the instrument specification, the notion of the beginning and end of a note, basic
frequency and sound volume. MIDI data allow an encoding of about 10 octaves, which
corresponds to 128 notes.
The MIDI data format is digital; the data are grouped into MIDI messages. Each MIDI message
communicates one musical event between machines. These musical events are usually actions
that a musician performs while playing a musical instrument. The action might be pressing
keys, moving slider controls, setting switches and adjusting foot pedals.
When a musician presses a piano key, the MIDI interface creates a MIDI message where the
beginning of the note with its stroke intensity is encoded. This message is transmitted to another
machine. In the moment the key is released, a corresponding signal (MIDI message) is
transmitted again. For ten minutes of music, this process creates about 200 Kbytes of MIDI
data, which is essentially less than the equivalent volume of a CD-audio coded stream in the
same time.
If a musical instrument satisfies both components of the MIDI standard, the instrument is a MIDI
device (e.g. a synthesizer), capable of communicating with other MIDI devices through
channels. The MIDI standard specifies 16 channels. A MIDI device (musical instrument) is
mapped to a channel. Music data transmitted through a channel are reproduced at the receiver
side with the synthesizer instrument. The MIDI standard identifies 128 instruments, including
noise effects (e.g., telephone, aircraft), with unique numbers. For example, 0 is for the Acoustic
Grand Piano, 12 for the marimba, 40 for the violin, 73 for the flute, etc.
Some instruments allow only one note to be played at a time, such as the flute. Other instruments
allow more than one note to be played simultaneously, such as the organ. The maximum
number of simultaneously played notes per channel is a main property of each synthesizer; the
range can be from 3 to 16 notes per channel. To tune a MIDI device to one or more channels,
the device must be set to one of the MIDI reception modes. There are four modes:
the device must be set to one of the MIDI reception modes. There are four modes:
• Mode 1: Omni On/Poly;
• Mode 2: Omni On/Mono;
• Mode 3: Omni Off/Poly;
• Mode 4: Omni Off/Mono

The first half of the mode name specifies how the MIDI device monitors the incoming MIDI
channels. If Omni is turned on, the MIDI device monitors all the MIDI channels and responds
to all channel messages, no matter which channel they are transmitted on. If Omni is turned
off, the MIDI device responds only to channel messages sent on the channel(s) the device is
set to receive.
The second half of the mode name tells the MIDI device how to play notes coming in over the
MIDI cable. If the option Poly is set, the device can play several notes at a time. If the mode is
set to Mono, the device plays notes like a monophonic synthesizer, one note at a time.

2.2 COMMON MIDI DEVICES:


There are many types of MIDI devices, and they play different roles in making music.
Sound generators: A sound generator synthesizes sound. It produces an audio signal that becomes sound
when fed into a loudspeaker, and it can change the quality of the sound by varying the voltage oscillation
of the audio signal. Sound generation is done in two ways:
1. Storing acoustic signals as MIDI data in advance
2. Creating acoustic signals synthetically
Microprocessor: The microprocessor communicates with the keyboard to know which notes the
musician is playing, and with the control panel to know what commands the musician wants to
send. The microprocessor then specifies note and sound commands to the sound generators
(i.e. the microprocessor sends and receives the MIDI messages).
Keyboard: The keyboard affords the musician direct control of the synthesizer. Pressing keys
signals the microprocessor what notes to play and how long to play them. A keyboard should have
at least five octaves and 61 keys.
Control panel: Controls those functions that are not directly concerned with notes and duration.
A control panel typically includes sliders, buttons and menus.
Auxiliary controllers: Give more control over the notes played on the keyboard. Pitch bend and
modulation are the two common variables on a synthesizer.
Memory: Stores patches for the sound generators and settings on the control panel.
Drum machine: Specializes in percussion sounds and rhythms.
Master keyboard: Increases the quality of the synthesizer keyboard. Related controllers include
guitar synthesizers, drum pad controllers, guitar controllers and many more.
Channel: MIDI supports up to 16 different channels. A MIDI event can be sent on any of
these channels, which are later synchronized by the sequencer.
Sequencer: The sequencer is an important MIDI device. It is used as a storage server for generated
MIDI data and also as a music editor. Musical data are represented as musical notes;
the sequencer transforms the notes into MIDI messages.
Track: It is a sequence of MIDI events.

2.3 MIDI MESSAGES:
MIDI messages are used by MIDI devices to communicate with each other and to determine
what kinds of musical events can be passed from device to device.
Structure of MIDI messages:
➢ A MIDI message includes a status byte and up to two data bytes.
➢ Status byte:
➢ The most significant bit of the status byte is set to 1.
➢ The four low-order bits identify which channel the message belongs to (four bits produce 16
possible channels).
➢ The three remaining bits identify the message type.
➢ Data byte: The most significant bit of a data byte is set to 0.
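
To make this byte layout concrete, here is a small illustrative sketch (the channel, note and velocity values are assumed for the example, not taken from the notes) that builds a Note On channel voice message as three bytes:

    NOTE_ON_STATUS = 0x90   # binary 1001 0000: MSB set, message type "Note On", channel bits zero

    def note_on(channel, note, velocity):
        # Build a three-byte Note On message: one status byte followed by two data bytes.
        assert 0 <= channel <= 15        # the four low-order bits of the status byte
        assert 0 <= note <= 127          # data bytes keep their most significant bit at 0
        assert 0 <= velocity <= 127
        return bytes([NOTE_ON_STATUS | channel, note, velocity])

    message = note_on(channel=0, note=60, velocity=100)   # middle C on MIDI channel 1
    print(message.hex())                                  # prints "903c64"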
Classification of MIDI messages:

Figure 1: MIDI message taxonomy


Channel messages: Since a channel number is specified, channel messages go only to the
specified devices. There are two types of channel messages:
➢ Channel voice messages: Send actual performance data between MIDI devices,
describing keyboard action, controller action and control panel changes, e.g. Note On,
Note Off, channel pressure, control change, etc.
➢ Channel mode messages: Determine the way that a receiving MIDI device responds
to channel voice messages, e.g. local control, All Notes Off, Omni mode off, etc.
System messages: System messages go to all devices in a MIDI system because no channel
numbers are specified. There are three types of system messages:
➢ System real-time messages: These messages are short and simple (one byte). They
synchronize the timing of MIDI devices in performance. To avoid delays, they are sent
in the middle of other messages, e.g. System Reset, Timing Clock (the MIDI clock), etc.
➢ System common messages: Commands that prepare sequencers and synthesizers to play
a song, e.g. Song Select, Tune Request, etc.
➢ System exclusive messages: Allow MIDI manufacturers to create customized MIDI
messages to send between their MIDI devices.

2.4 MIDI AND SMPTE TIMING STANDARDS


MIDI reproduces traditional note lengths using MIDI clocks, which are represented through
Timing Clock messages. Using a MIDI clock, a receiver can synchronize with the clock cycles
of the sender. For example, a MIDI clock helps keep separate sequencers in the same MIDI
system playing at the same tempo. When a master sequencer plays a song, it sends out a stream
of Timing Clock messages to convey the tempo to other sequencers. The faster the Timing
Clock messages come in, the faster the receiving sequencer plays the song. To keep a standard
timing reference, the MIDI specifications state that 24 MIDI clocks equal one quarter note.
As an alternative, the SMPTE timing standard (Society of Motion Picture and Television
Engineers) can be used. The SMPTE timing standard was originally developed by NASA as a
way to mark incoming data from different tracking stations so that receiving computers could
tell exactly what time each piece of data was created. In the film and video version promoted
by SMPTE, the SMPTE timing standard acts as a very precise clock that stamps a time
reading on each frame and fraction of a frame, counting from the beginning of a film or video.
To make the time readings precise, the SMPTE format consists of hours:minutes:seconds:
frames:bits (e.g., 30 frames per second), uses a 24-hour clock and counts from 0 to 23 before
recycling to 0. The number of frames in a second differs depending on the type of visual
medium. To divide time even more precisely, SMPTE breaks each frame into 80 bits (not
digital bits). When SMPTE is counting bits in a frame, it is dividing time into segments as
small as one twenty-four hundredth of a second (30 frames × 80 bits = 2,400 segments per second).
Because many film composers now record their music on a MIDI recorder, it is desirable to
synchronize the MIDI recorder with video equipment. A SMPTE synchronizer should be able
to give a time location to the MIDI recorder so it can move to that location in the MIDI score
(pre-recorded song) to start playback or recording. But MIDI recorders cannot use incoming
SMPTE signals to control their recording and playback. The solution is a MIDI/SMPTE
synchronizer that converts SMPTE into MIDI, and vice versa. The MIDI/SMPTE synchronizer
lets the user specify different tempos and the exact points in SMPTE timing at which each
tempo is to start, change and stop. The synchronizer keeps these tempos and timing points in
memory. As a SMPTE video deck plays and sends a stream of SMPTE times to the
synchronizer, the synchronizer checks the incoming time and sends out MIDI clocks at the
corresponding tempo.
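
As an illustration (not from the notes), the two timing references can be turned into simple elapsed-time calculations; the tempo and frame-rate values below are assumptions chosen only for the example:

    def midi_clock_interval(beats_per_minute):
        # Seconds between MIDI Timing Clock messages: 24 MIDI clocks equal one quarter note.
        seconds_per_quarter_note = 60.0 / beats_per_minute
        return seconds_per_quarter_note / 24

    def smpte_to_seconds(hours, minutes, seconds, frames, frames_per_second=30):
        # Convert an hours:minutes:seconds:frames SMPTE reading into elapsed seconds.
        return hours * 3600 + minutes * 60 + seconds + frames / frames_per_second

    print(midi_clock_interval(120))        # about 0.0208 s between clocks at 120 beats per minute
    print(smpte_to_seconds(0, 1, 30, 15))  # 90.5 seconds from the start of the programme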

2.5 MIDI SOFTWARE:


MIDI software applications generally fall into four major categories:
1. Music recording and performance applications: Provide functions such as recording of
MIDI messages, and editing and playing the messages in performance.
2. Musical notation and printing applications: Allow writing music using traditional
musical notation. The user can play and print music on paper for live performance or
publication.
3. Synthesizer patch editors and librarians: Allow storage of different
synthesizer patches in the computer's memory and disk drives, and editing of patches on the
computer.
4. Music education applications: Teach different aspects of music using the computer
monitor, keyboard and other controllers of attached MIDI instruments.

Processing chain of interactive computer music systems:

➢ Sensing stage: Data is collected from controllers reading the gesture information from
human performers on stage.
➢ Processing stage: The computer reads and interprets information coming from the sensors
and prepares data for the response stage.
➢ Response stage: The computer and some collection of sound-producing devices share in
realizing a musical output.

3 SPEECH GENERATION
Speech can be perceived, understood and generated by humans and by machines. Generated
speech must be understandable and must sound natural. The requirement of understandable
speech is a fundamental assumption, and the natural sound of speech increases user acceptance.
Speech signals have two properties which can be used in speech processing:
• Voiced speech signals show almost periodic behaviour during certain time intervals.
Therefore, we can consider these signals as quasi-stationary for around 30
milliseconds.
• The spectrum of speech signals shows characteristic maxima. These maxima, called
formants, occur because of resonances of the vocal tract.
Speech Generation:
An important requirement for speech generation is real-time signal generation. With such a
requirement met, a speech output system could transform text into speech automatically
without any lengthy pre-processing.

3.1 BASIC NOTIONS:


• The lowest periodic spectral component of the speech signal is called the fundamental
frequency. It is present in a voiced sound.
• A phone is the smallest speech unit, such as the m of mat and the b of bat in English,
that distinguishes one utterance or word from another in a given language.
• Allophones mark the variants of a phone. For example, the aspirated p of pit and the
unaspirated p of spit are allophones of the English phoneme p.
• The morph marks the smallest speech unit which carries a meaning itself. Therefore,
consider is a morph, but reconsideration is not.
• A voiced sound is generated through the vocal cords. m, v and l are examples of voiced
sounds. The pronunciation of a voiced sound depends strongly on each speaker.
• During the generation of an unvoiced sound, the vocal cords are opened. F and S are
unvoiced sounds. Unvoiced sounds are relatively independent from the speaker.
More precisely, there are:
Vowels: a speech sound created by the relatively free passage through the larynx and oral
cavity, usually forming the most prominent and central sound of a syllable (e.g., u from hunt);

Consonants: a speech sound produced by a partial or complete obstruction of the air stream
by any of the various constrictions of the speech organs (e.g., voiced consonants, such as m
from mother, fricative voiced consonants, such as v from voice, fricative voiceless consonants,
such as s from nurse, plosive consonants, such as d from daily and affricate consonants, such
as dg from knowledge. or ch from chew).

3.2 REPRODUCED SPEECH OUTPUT


The easiest method of speech generation/output is to use pre-recorded speech and play it back
in a timely fashion. The speech can be stored as PCM (Pulse Code Modulation) samples.
Further data compression methods, which do not use language-typical properties, can be applied
to the recorded speech.
There are two ways of performing speech generation/output: time-dependent sound
concatenation and frequency-dependent sound concatenation.

3.2.1 Time-dependent Sound Concatenation


• Individual speech units, e.g. phones, are composed like building blocks.
• Transitions between speech units (coarticulation) are handled via allophones, i.e. variants of
phones that depend on the previous and following phones.
• Syllables are created as building blocks for words and sentences.
• Prosody, i.e. the stress and melody course of a spoken phrase, is applied. Problem: prosody is
often context dependent.

3.2.2 Frequency‐dependent Sound Concatenation


• Speech generation/output can also be based on frequency-dependent sound
concatenation, e.g. through formant synthesis. Formants are frequency maxima in the
spectrum of the speech signal. Formant synthesis simulates the vocal tract through a
filter.
• Individual speech elements (e.g., phones) are defined through the characteristic values
of the formants. Similar problems to the time-dependent sound concatenation exist
here. The transitions, known as co-articulation, present the most critical problem.
Additionally, the respective prosody has to be determined.
• New sound-specific methods provide a sound concatenation with combined time and
frequency dependencies. Initial results show that new methods generate fricative and
plosive sounds with higher quality.
• Human speech can be generated using a multi-pole lattice filter. The first four or five
formants occurring in human speech are modeled correctly with this filter type.
• Using speech synthesis, an existent text can be transformed into an acoustic signal. The
typical components of a speech synthesis system with time-dependent concatenation:

Step 1: Generation of a Sound Script
Transcription from text to a sound script uses a library containing (language-specific)
letter-to-phone rules. A dictionary of exceptions is used for words with a non-standard pronunciation.
Step 2: Generation of Speech
The sound script is used to drive the time- or frequency-dependent sound concatenation
process.

3.2.3 Problem of speech synthesis:


➢ Ambiguous pronunciation. In many languages, the pronunciation of certain words
depends on the context.
➢ Example: ‘lead’
➢ This is not so much of a problem for the German language
➢ It is a problem for the English language
➢ Anecdote by G. B. Shaw:
• if we pronounce “gh” as “f” (example: “laugh“)
• if we pronounce “o” as “i” (example: “women”)
• if we pronounce “ti” as “sh” (example: “nation”), then why don’t we write
“ghoti” instead of fish?

4 SPEECH ANALYSIS
Purpose of Speech Analysis:
➢ Who is speaking: speaker identification for security purposes
➢ What is being said: automatic transcription of speech into text
➢ How was a statement said: understanding psychological factors of a speech pattern (was
the speaker angry or calm, is he lying, etc)
The primary goal of speech analysis in multimedia systems is to correctly determine individual
words (speech recognition).

Speech analysis is of strong interest for multimedia systems. Together with speech synthesis,
different media transformations can be implemented. The primary goal of speech analysis is to
correctly determine individual words; a word is recognized only with a certain probability
(less than or equal to 1). Environmental noise, room acoustics and the speaker's physical and
psychological condition play an important role here.

Figure 2: speech recognition system

The system which provides recognition and understanding of a speech signal applies this
principle several times as follows:

• In the first step, the principle is applied to a sound pattern and/or word model. An
acoustical and phonetical analysis is performed.
• In the second step, certain speech units go through syntactical analysis; thereby, the
errors of the previous step can be recognized. Very often during this step, no unambiguous
decision can be made. In this case, syntactical analysis provides additional decision
help, and the result is recognized speech.
• The third step deals with the semantics of the previously recognized language. Here, the
decision errors of the previous step can be recognized and corrected with other analysis
methods. Even today, this step is non-trivial to implement with the methods known
in artificial intelligence and neural network research. The result of this step is understood
speech.
These steps work mostly under the consideration of time and/or frequency-dependent sounds.
The same criteria and speech units (formants, phones, etc.) are considered as in speech
generation/output.

There are still many problems into which speech recognition research is being conducted:
• A specific problem is presented by room acoustics with existent environmental noise.
The frequency-dependent reflections of a sound wave from walls and objects can
overlap with the primary sound wave.
• Further, word boundaries must be determined. Very often neighboring words flow into
one another.
• For the comparison of a speech element to the existing pattern, time normalization is
necessary. The same word can be spoken quickly or slowly. However, the time axis
cannot be modified because the extension factors are not proportional to the global time
interval. There are long and short voiceless sounds (e.g., s, sh). Individual sounds are
extended differently and need a minimal time duration for their recognition.

Figure 3: Components of speech recognition systems

Speech recognition systems are divided into speaker-independent recognition systems and
speaker-dependent recognition systems. A speaker-independent system can recognize with the
same reliability essentially fewer words than a speaker-dependent system because the latter is
trained in advance. Training in advance means that there exists a training phase for the speech
recognition system, which can take around half an hour. Speaker-dependent systems can recognize
around 25,000 words; speaker-independent systems recognize a maximum of about 500 words,
but with a worse recognition rate. These values should be understood as rough guidelines. In a
concrete situation, the marginal conditions must be known (e.g., was the measurement taken
in a sound-deadening room? Does the speaker have to adapt to the system to simplify the time
normalization? etc.).

5 SPEECH TRANSMISSION
The area of speech transmission deals with efficient coding of the speech signal to allow
speech/sound transmission at low transmission rates over networks. The goal is to provide the
receiver with the same speech/sound quality as was generated at the sender side. This section
includes some principles that are connected to speech generation and recognition.

• Signal Form Coding:

This kind of coding considers no speech-specific properties or parameters; the goal is to achieve
the most efficient coding of the audio signal. The data rate of a PCM-coded stereo audio signal
with CD-quality requirements is 1,411,200 bit/s (see the short calculation after this paragraph).
Telephone quality, in comparison to CD quality, needs only 64 Kbit/s. Using Differential Pulse
Code Modulation (DPCM), the data rate can be lowered to 56 Kbit/s without loss of quality, and
Adaptive Differential Pulse Code Modulation (ADPCM) allows a further rate reduction to
32 Kbit/s.
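
The CD-quality figure quoted above follows directly from the PCM parameters (44.1 kHz sampling rate, 16 bits per sample, two channels); the telephone figure assumes the usual 8 kHz, 8-bit, single-channel parameters. A quick check in Python:

    def pcm_data_rate(sample_rate, bits_per_sample, channels):
        # Uncompressed PCM data rate in bits per second.
        return sample_rate * bits_per_sample * channels

    print(pcm_data_rate(44100, 16, 2))   # 1,411,200 bit/s: stereo CD quality
    print(pcm_data_rate(8000, 8, 1))     # 64,000 bit/s: telephone quality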

• Source Coding:
Parameterized systems work with source coding algorithms. Here, specific speech
characteristics are used for data rate reduction. The channel vocoder is an example of such
a parameterized system; it is an extension of sub-channel coding.

Figure 4: source coding

The signal is divided into a set of frequency channels during speech analysis because
only certain frequency maxima are relevant to speech. Additionally, the differences
between voiced and unvoiced sounds are taken into account. Voiceless sounds are
simulated by a noise generator; for the generation of voiced sounds, the simulation
comes from a sequence of pulses whose rate is equivalent to the a priori measured
basic speech frequency. A data rate of about 3 Kbit/s can be achieved with a channel
vocoder; however, the quality is not always satisfactory.
Major effort and work on further data rate reduction from 64 Kbit/s to 6 Kbit/s is
being conducted, where the compressed signal quality should correspond, after
decompression, to the quality of an uncompressed 64 Kbit/s signal.

• Recognition/Synthesis Methods:
There have been attempts to reduce the transmission rate using pure
recognition/synthesis methods. Speech analysis (recognition) takes place on the sender side
of a speech transmission system and speech synthesis (generation) on the receiver side.

Figure 5: Recognition/synthesis system

Only the characteristics of the speech elements are transmitted. For example, the speech
elements with their characteristics may be the formants with their middle frequencies and
bandwidths, which are used in the corresponding digital filters. This reduction brings the
data rate down to about 50 bit/s. However, the quality of the reproduced speech and its
recognition rate are not acceptable by today's standards.

• Achieved Quality:
The essential question regarding speech and audio transmission with respect to multimedia
systems is how to achieve the minimal data rate for a given quality. The published function
from Flanagan shows the dependence of the achieved quality of compressed speech on the data
rate. One can assume that for telephone quality, a data rate of 8 Kbit/s is sufficient. The
achievable audio quality also depends on the number of bits per sample value: for example,
excellent CD quality can be achieved with a reduction from 16 bits per sample value to 2 bits
per sample value, which means that only 1/8 of the actual data needs to be transmitted.

UNIT 3: IMAGES AND GRAPHICS

1.1 INTRODUCTION
An image is the spatial representation of an object; it may be a 2D or 3D scene or another image.
Images may be real or virtual. An image can be thought of abstractly as a continuous function
defined over a (usually rectangular) region of the plane. Examples:
• Recorded image: photographic film or digital format
• Computer vision: video images, digital images or pictures
• Computer graphics: digital images
• Multimedia: deals with all the above formats

1.2 DIGITAL IMAGE REPRESENTATION


A digital image is represented by a matrix of numeric values each representing a quantized
intensity value. When I is a two-dimensional matrix, then I(r, c) is the intensity value at the
position corresponding to row r and column c of the matrix. The points at which an image is
sampled are known as picture elements, commonly abbreviated as pixels. The pixel values of
intensity images are called gray scale levels. The intensity at each pixel is represented by an
integer and is determined from the continuous image by averaging over a small
neighbourhood around the pixel location. If there are just two intensity values, for example,
black and white, they are represented by the numbers 0 and 1; such images are called
binary-valued images. If 8-bit integers are used to store each pixel value, the gray levels
range from 0 (black) to 255 (white).
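
A tiny illustration (the pixel values are invented for the example) of an intensity image stored as a matrix of 8-bit gray levels, where I(r, c) selects row r and column c:

    # A 3 x 4 gray-scale image: 0 is black, 255 is white.
    I = [
        [  0,  64, 128, 255],
        [ 32,  96, 160, 224],
        [  0, 255,   0, 255],
    ]

    r, c = 1, 2
    print(I[r][c])              # 160: the intensity value at row 1, column 2
    print(len(I), len(I[0]))    # 3 rows and 4 columns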

1.2.1 Digital Image Format


There are different kinds of image formats in the literature. We shall consider the image
format that comes out of an image frame grabber, i.e., the captured image format, and the
format when images are stored, i.e., the stored image format.
Captured Image Format
The image format is specified by two main parameters: spatial resolution, which is specified
as pixels × pixels (e.g. 640 × 480), and color encoding, which is specified in bits per pixel. Both
parameter values depend on the hardware and software used for the input/output of images.
For example, for image capturing on a SPARCstation, the VideoPix card and its software are
used. The spatial resolution is 320 × 240 pixels and the color can be encoded with 1 bit (a
binary image format), 8 bits (color or grayscale) or 24 bits (RGB color).
Stored Image Format
When we store an image, we are storing a two-dimensional array of values, in which each
value represents the data associated with a pixel in the image. For a bitmap, this value is a
binary digit.
For a color image (pixmap), the value may be a collection of:

• Three numbers representing the intensities of the red, green and blue components of
the color at that pixel.
• Three numbers that are indices to tables of the red, green and blue intensities.
• A single number that is an index to a table of color triples.
• An index to any number of other data structures that can represent a color.
• Four or five spectral samples for each pixel.
In addition, each pixel may have other information associated with it; e.g., three numbers
indicating the normal to the surface drawn at that pixel.
Information associated with the image as a whole, e.g., width, height, depth, name of the
creator, etc. may also have to be stored.
The image may be compressed before storage for saving storage space. Some current image
file formats for storing images include GIF, X11 Bitmap, Sun Rasterfile, PostScript, IRIS,
JPEG, TIFF, etc.
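
Uncompressed storage requirements follow directly from the spatial resolution and the colour encoding; a short sketch (using the VideoPix resolution mentioned above purely as an example):

    def raw_image_bytes(width, height, bits_per_pixel):
        # Uncompressed size of the stored pixel matrix, before any compression is applied.
        return width * height * bits_per_pixel // 8

    print(raw_image_bytes(320, 240, 1))    # 9,600 bytes for a binary image
    print(raw_image_bytes(320, 240, 8))    # 76,800 bytes for gray scale or indexed colour
    print(raw_image_bytes(320, 240, 24))   # 230,400 bytes for RGB colour
    print(raw_image_bytes(640, 480, 24))   # 921,600 bytes at 640 x 480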

1.3 IMAGE AND GRAPHICS FORMATS


Image File Format
There are many file formats used to store bitmaps and vectored drawing. Following is a list of
few image file formats:

Bitmap (BMP):
BMP is a standard format used by Windows to store device-independent and application
independent images. The number of bits per pixel (1, 4, 8, 15, 24, 32, or 64) for a given BMP
file is specified in a file header. BMP files with 24 bits per pixel are common.
Graphics Interchange Format (GIF)
GIF is a common format for images that appear on webpages. GIFs work well for line
drawings, pictures with blocks of solid color, and pictures with sharp boundaries between
colors. GIFs are compressed, but no information is lost in the compression process; a
decompressed image is exactly the same as the original. One color in a GIF can be designated
as transparent, so that the image will have the background color of any Web page that

displays it. A sequence of GIF images can be stored in a single file to form an animated GIF.
GIFs store at most 8 bits per pixel, so they are limited to 256 colors.
Joint Photographic Experts Group (JPEG)
JPEG is a compression scheme that works well for natural scenes, such as scanned
photographs. Some information is lost in the compression process, but often the loss is
imperceptible to the human eye. Colour JPEG images store 24 bits per pixel, so they are
capable of displaying more than 16 million colors. There is also a grey scale JPEG format
that stores 8 bits per pixel. JPEGs do not support transparency or animation. The level of
compression in JPEG images is configurable, but higher compression levels (smaller files)
result in more loss of information. A 20:1 compression ratio often produces an image that the
human eye finds difficult to distinguish from the original. The Figure below shows a BMP
image and two JPEG images that were compressed from that BMP image. The first JPEG has
a compression ratio of 4:1 and the second JPEG has a compression ratio of about 8:1.

JPEG compression does not work well for line drawings, blocks of solid color, and sharp
boundaries. JPEG is a compression scheme, not a file format. JPEG File Interchange Format
(JFIF) is a file format commonly used for storing and transferring images that have been
compressed according to the JPEG scheme. JFIF files displayed by Web browsers use the
.jpg extension.
Exchangeable Image File (Exif)
Exif is a file format used for photographs captured by digital cameras. An Exif file contains
an image that is compressed according to the JPEG specification. An Exif file also contains
information about the photograph (date taken, shutter speed, exposure time, and so on) and
information about the camera (manufacturer, model, and so on).
Portable Network Graphics (PNG)
The PNG format retains many of the advantages of the GIF format but also provides
capabilities beyond those of GIF. Like GIF files, PNG files are compressed with no loss of
information. PNG files can store colors with 8, 24, or 48 bits per pixel and gray scales with 1,
2, 4, 8, or 16 bits per pixel. In contrast, GIF files can use only 1, 2, 4, or 8 bits per pixel. A
PNG file can also store an alpha value for each pixel, which specifies the degree to which the
color of that pixel is blended with the background colour.
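The per-pixel alpha value mentioned above controls how a PNG pixel's colour is mixed with the background. A minimal sketch of that blending, assuming 8-bit channels (values 0-255):

def blend(fg, bg, alpha):
    """Blend one colour channel of a foreground pixel over a background."""
    a = alpha / 255.0
    return round(a * fg + (1.0 - a) * bg)

# Example: a half-transparent red pixel (alpha 128) over a white background.
pixel = (255, 0, 0)
background = (255, 255, 255)
blended = tuple(blend(f, b, 128) for f, b in zip(pixel, background))
print(blended)   # roughly (255, 127, 127)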

Tag Image File Format (TIFF)
TIFF is a flexible and extendable format that is supported by a wide variety of platforms and
image-processing applications. TIFF files can store images with an arbitrary number of bits
per pixel and can employ a variety of compression algorithms. Several images can be stored
in a single, multiple-page TIFF file. Information related to the image (scanner make, host
computer, type of compression, orientation, samples per pixel, and so on) can be stored in the
file and arranged through the use of tags. The TIFF format can be extended as needed by the
approval and addition of new tags.
Graphics Format
Graphic image formats are specified through graphics primitives and their attributes.

• Graphic primitives – line, rectangle, circle, ellipse, and specifications of 2D and 3D objects.
• Graphic attributes – line style, line width, color.
Graphics formats represent a higher level of image representation, i.e., they are not
represented by a pixel matrix initially.
• PHIGS (Programmer’s Hierarchical Interactive Graphics)
• GKS (Graphical Kernel System)
Vector Drawings
Vector Drawings are completely computer generated. They are otherwise known as object
oriented graphics as they consist of objects such as shapes. Vectors are used to create
graphics such as interface elements (banners, buttons) text, line art and detailed drawings
(plans, maps). Essentially they are computer generated drawings. Effects can be added to
vector graphics to add realism, however, they need to be converted to bitmaps in order to do
this. Vectors don’t consist of pixels. Instead, they are made up of co-ordinates, shapes, line,
and colour data. Therefore they aren’t resolution dependent. It is for this reason that vector
graphics can be scaled without losing their quality. Vectors are also easier to edit. In
comparison to bitmaps, vectors have nice clean edges.
The following extract from the Adobe site gives a further definition of Vector graphics: “You
can freely move or modify vector graphics without losing detail or clarity, because they are
resolution-independent-they maintain crisp edges when resized, printed to a PostScript
printer, saved in a PDF file, or imported into a vector-based graphics application. As a result,
vector graphics are the best choice for artwork, such as logos, that will be used at various
sizes and in various output media.”
Vector file formats generally create smaller file sizes than bitmaps. Vectors still aren't well supported on the World Wide Web. To include a vector in a website it is still best to rasterize it, that is, convert it to a bitmap. This can, however, result in the edges of the vector graphic losing definition and becoming pixelated.

1.4 IMAGE SYNTHESIS, ANALYSIS AND TRANSMISSION
1.4.1 Computer Image Processing
Image processing is a method to perform some operations on an image, in order to get an
enhanced image or to extract some useful information from it. Computer image processing
comprises of image synthesis (generation) and image analysis (recognition). It is a type of
signal processing in which input is an image and output may be image or
characteristics/features associated with that Image, processing basically includes the
following three steps:
• Importing the image via image acquisition tools;
• Analyzing and manipulating the image;
• Output in which result can be altered image or report that is based on image analysis.
There are two types of methods used for image processing namely, analogue and digital
image processing. Analogue image processing can be used for the hard copies like printouts
and photographs. Digital image processing techniques help in manipulation of the digital
images by using computers. The three general phases that all types of data have to undergo when using the digital technique are pre-processing, enhancement and display, and information extraction.

1.4.2 Dynamics in Graphics


Dynamic graphics means simulating motion or movement using the computer. It may also be thought of as multiple plots linked by time. Two main examples of dynamic graphics are animations and tours. An animation, very generally defined, may be produced
for time-indexed data by showing the plots in time order, for example as generated by an
optimization algorithm. With dynamic simulation, you can create many impressive effects
such as explosion, flood, storm, tornado, ocean, etc., for animations and computer games.
Motion Dynamic: With motion dynamics, objects can be moved and tumbled with respect to a
stationary observer.
Update Dynamic: Update dynamic is the actual change of the shape, color, or other
properties of the objects being viewed.

1.4.3 The Framework of Interactive Graphics Systems


In interactive Computer Graphics user have some controls over the picture, i.e., the user can
make any change in the produced image. Interactive Computer Graphics require two-way
communication between the computer and the user. A User can see the image and make any
change by sending his command with an input device. The framework of interactive graphics
systems have following three components:

• Application model:
The application model represents the data or objects to be pictured on the screen; it is
stored in an application database. The model typically stores descriptions of
primitives that define the shape of components of the object, object attributes and
connectivity relationships that describe how the components fit together. The model is
application-specific and is created independently of any particular display system.
• Application program:
The application program converts a description of the relevant portion of the model into whatever procedure calls or commands the graphics system uses to create an image. This conversion process has two phases. First, the application program traverses the application database that stores the model to extract the portions to be viewed, using some selection or query criteria. Second, the extracted geometry is put in a format that can be sent to the graphics system. The application program also handles user input and produces views by sending to the third component, the graphics system, a series of graphics output commands.
• Graphics system:
The graphics output commands contain both a detailed geometric description of what is to be viewed and the attributes describing how the objects should appear. The graphics system is responsible for actually producing the picture from these detailed descriptions and for passing the user's input to the application program for processing.

1.4.4 Graphics input/ output hardware


Graphics Hardware – Input: Current input technology provides us with the ubiquitous
mouse, the data tablet and the transparent, touch-sensitive panel mounted on the screen. Even
fancier input devices that supply, in addition to (x,y) screen location, 3D and higher-
dimensional input values, are becoming common, such as track-balls, space balls or the
data glove.
Track-balls can be made to sense rotation about the vertical axis in addition to that about the
two horizontal axes. However, there is no direct relationship between hand movements with
the device and the corresponding movement in 3D space.
A space-ball is a rigid sphere containing strain gauges. The user pushes or pulls the sphere in
any direction, providing 3D translation and orientation. In this case, the directions of
movement correspond to the user's attempts to move the rigid sphere, although the hand does
not actually move.

The data glove records hand position and orientation as well as finger movements. It is a
glove covered with small, lightweight sensors. Each sensor is a short length of fiber-optic
cable, with a Light-Emitting Diode (LED) at one end and a photo- transistor at the other.
Wearing the data glove, a user can grasp objects, move and rotate them and then release
them.
Graphics Hardware –Output: Current output technology uses raster displays, which store
display primitives in a refresh buffer in terms of their component pixels. The architecture of a
raster display is shown in figure below. In some raster displays, there is a hardware display
controller that receives and interprets sequences of output commands. In simpler, more
common systems, such as those in personal computers, the display controller exists only as a
software component of the graphics library package, and the refresh buffer is no more than a
piece of the CPU's memory that can be read by the image display subsystem that produces
the actual image on the screen.

Figure 1: Architecture of Raster Display

The complete image on a raster display is formed from the raster, which is a set of horizontal
raster lines, each a row of individual pixels; the raster is thus stored as a matrix of pixels
representing the entire screen area. The entire image is scanned out sequentially by the video
controller. The raster scan is shown in figure below.

Figure 2: Raster Scan

At each pixel, the beam's intensity is set to reflect the pixel's intensity; in color systems, three
beams are controlled - one for each primary color (red, green, blue) as specified by the three
color components of each pixel's value.
Raster graphics systems have other characteristics. To avoid flickering of the image, a 60 Hz
or higher refresh rate is used today; an entire image of 1024 lines of 1024 pixels each must be
stored explicitly and a bitmap or pixmap is generated. Raster graphics can display areas filled with solid colors or patterns, as well as realistic images of 3D objects.

1.4.5 Dithering
Dithering is the process by which we create the illusion of colors that are not actually present. It is done by the arrangement of pixels. For example, consider an image containing only black and white pixels. Its pixels can be rearranged to form another image that appears to contain intermediate shades; note that the arrangement of the pixels has changed, but not the quantity of pixels.

The growth of raster graphics has made color and grayscale an integral part of contemporary
computer graphics. The color of an object depends not only on the object itself, but also on
the light source illuminating it, on the color of the surrounding area and on the human visual
system.
What we see on a black-and-white television set or display monitor is achromatic light.
Achromatic light is determined by the attribute quality of light. Quality of light is determined
by the intensity and luminance parameters. For example, if we have hardcopy devices or

displays which are only bi-levelled, which means they produce just two intensity levels, then
we would like to expand the range of available intensity.
The solution lies in our eye's capability for spatial integration. If we view a very small area
from a sufficiently large viewing distance, our eyes average fine detail within the small area
and record only the overall intensity of the area. This phenomenon is exploited in the
technique called half toning, or clustered-dot ordered dithering (half toning
approximation). Each small resolution unit is imprinted with a circle of black ink whose area
is proportional to the blackness 1 - I (where I is the intensity) of the area in the original photograph.

Figure 3:five intensity levels approximated with two 2*2 dither patterns

Graphics output devices can approximate the variable- area circles of halftone reproduction.
For example, a 2 x 2 pixel area of a bi-level display can be used to produce five different
intensity levels at the cost of halving the spatial resolution along each axis. The patterns,
shown in above Figure, can be filled by 2 x 2 areas, with the number of 'on' pixels
proportional to the desired intensity. The patterns can be represented by the dither matrix.
This technique is used on devices which are not able to display individual dots (e.g., laser
printers). This means that these devices are poor at reproducing isolated 'on' pixels (the black
dots in figure above). All pixels that are 'on' for a particular intensity must be adjacent to
other 'on' pixels.
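A minimal sketch of ordered dithering with a 2 x 2 dither matrix, along the lines described above. Intensities are assumed to be normalized to the range 0..1, the output is a matrix of 0/1 ("off"/"on") pixels, and the specific matrix values are the usual 2 x 2 Bayer pattern.

DITHER_2X2 = [[0, 2],
              [3, 1]]          # 2 x 2 dither (Bayer) matrix

def ordered_dither(gray, matrix=DITHER_2X2):
    n = len(matrix)
    out = []
    for y, row in enumerate(gray):
        out_row = []
        for x, value in enumerate(row):
            # compare the pixel intensity against the position-dependent threshold
            threshold = (matrix[y % n][x % n] + 0.5) / (n * n)
            out_row.append(1 if value > threshold else 0)
        out.append(out_row)
    return out

# A constant 50% gray patch comes out as a checkerboard of 'on' and 'off' pixels.
patch = [[0.5] * 4 for _ in range(4)]
for row in ordered_dither(patch):
    print(row)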

1.4.6 Image Analysis


Image analysis is concerned with techniques for extracting descriptions from images that are
necessary for higher-level scene analysis methods. By itself, knowledge of the position and value of any particular pixel conveys almost no information related to the recognition of an
object, the description of an object's shape, its position or orientation, the measurement of any
distance on the object or whether the object is defective. Hence, image analysis techniques
include computation of perceived brightness and color, partial or complete recovery of three-
dimensional data in the scene, location of discontinuities corresponding to objects in the
scene and characterization of the properties of uniform regions in the image.
Image analysis is important in many areas: aerial surveillance photographs, slow-scan television
images of the moon or of planets gathered from space probes, television images taken from
an industrial robot's visual sensor, X-ray images and computerized axial tomography (CAT)
scans. Subareas of image processing include image enhancement, pattern detection and
recognition and scene analysis and computer vision.
Image enhancement deals with improving image quality by eliminating noise (extraneous or
missing pixels) or by enhancing contrast. Pattern detection and recognition deal with
detecting and clarifying standard patterns and finding distortions from these patterns. A
particularly important example is Optical Character Recognition (OCR) technology, which
allows for the economical bulk input of pages of typeset, typewritten or even hand-printed
characters.

Scene Analysis and computer vision deal with recognizing and reconstructing 3D models of a
scene from several 2D images. An example is an industrial robot sensing the relative sizes,
shapes, positions and colors of objects.

1.4.7 Image Recognition


To fully recognize an object in an image means knowing that there is an agreement between
the sensory projection and the observed image. Agreement between the observed spatial
configuration and the expected sensory projection requires the following capabilities:
• Infer explicitly or implicitly an object's position and orientation from the spatial
configuration.
• Confirm that the inference is correct.
To infer an object's (e.g. a cup) position, orientation and category or class from the spatial
configuration of gray levels requires the capability to infer which pixels are part of the object.
Further, from among those pixels that are part of the object, it requires the capability to
distinguish observed object features, such as special markings, lines, curves, surfaces or
boundaries (e.g. edges of the cup).
Analytic inference of object shape, position and orientation depends on matching the
distinguishing image features (in 2D, a point, line segment or region) with corresponding
object features (in 3D, a point, line segment, arc segment, or a curved or planar surface). The
kind of object, background, imaging sensor and viewpoint of the sensor determine whether
the recognition problem is easy or difficult.
Computer recognition and inspection of objects is, in general, a complex procedure, requiring
a variety of steps that successively transform the iconic data into recognition information. A
recognition methodology must pay substantial attention to each of the following six steps:
image formatting, conditioning, labelling, grouping, extracting and matching.

Figure 4: Image Recognition Steps

1. Image Formatting
Image formatting means capturing an image from a camera and bringing it into a digital form.
It means that we will have a digital representation of an image in the form of pixels.

Figure 5: Observed image

Conditioning, labelling, grouping, extracting and matching constitute a canonical


decomposition of the image recognition problem, each step preparing and transforming the
data to facilitate the next step. As these steps work on any level in the unit transformation
process, they prepare the data for the unit transformation, identify the next higher-level unit
and interpret it. The five transformation steps, in more detail, are:
2. Conditioning:
Conditioning is based on a model that suggests the observed image is composed of an
informative pattern modified by uninteresting variations that typically add to or multiply the
informative pattern. Conditioning estimates the informative pattern on the basis of the
observed image. Thus, conditioning suppresses noise, which can be thought of as random un-
patterned variations affecting all measurements. Conditioning can also perform background
normalization by suppressing uninteresting systematic or patterned variations. Conditioning
is typically applied uniformly and is context-independent.
3. Labelling:
Labelling is based on a model that suggests the informative pattern has structure as a spatial
arrangement of events, each spatial event being a set of connected pixels. Labelling
determines in what kinds of spatial events each pixel participates.
An example of a labelling operation is edge detection. Edge detection is an important part of
the recognition process. Edge detection techniques find local discontinuities in some image
attribute, such as intensity or color (e.g. detection of cup edges). These discontinuities are of
interest because they are likely to occur at the boundaries of objects.
An edge is said to occur at a point in the image if some image attribute changes in value
discontinuously at that point. Examples are intensity edges. An ideal edge, in one dimension,
may be viewed as a step change in intensity; for example, a step between high-valued and
low-valued pixels. If the step is detected, the neighbouring high-valued and low-valued pixels
are labelled as part of an edge.

Figure 6: Edge detection of the image

Edge detection recognizes many edges, but not all of them are significant. Therefore, another
labelling operation must occur after edge detection, namely thresholding. Thresholding
specifies which edges should be accepted and which should not; the thresholding operation
filters only the significant edges from the image and labels them. Other edges are removed.
Other kinds of labelling operations include corner finding and identification of pixels that
participate in various shape primitives.
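To make the labelling step more concrete, here is a small sketch that detects intensity edges with a simple horizontal/vertical gradient and then thresholds them, keeping only the significant edge pixels. The input format (a 2D list of gray levels) and the threshold value are assumptions chosen only for illustration.

def detect_edges(image, threshold=50):
    h, w = len(image), len(image[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]     # horizontal intensity change
            gy = image[y + 1][x] - image[y - 1][x]     # vertical intensity change
            magnitude = (gx * gx + gy * gy) ** 0.5
            # labelling with thresholding: keep the pixel only if the local
            # change in intensity is significant
            edges[y][x] = 1 if magnitude > threshold else 0
    return edges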
4. Grouping:
The labelling operation labels the kinds of primitive spatial events in which the pixel
participates. The grouping operation identifies the events by collecting together or identifying
maximal connected sets of pixels participating in the same kind of event. Recalling that intensity edge detection views an edge as a step change in intensity, the detected edges are labelled as step edges, and the grouping operation then constitutes step-edge linking.

Figure 7: Thresholding the image

A grouping operation, where edges are grouped into lines, is called line-fitting. Again the
grouping operation line-fitting is performed on the image shown in Figure below:

Figure 8: Line-fitting of the image

The grouping operation involves a change of logical data structure. The observed image, the
conditioned image and the labelled image are all digital image data structures. Depending on
the implementation, the grouping operation can produce either an image data structure in
which each pixel is given an index associated with the spatial event to which it belongs or a
data structure that is a collection of sets. Each set corresponds to a spatial event and contains
the pairs of positions (row, column) that participate in the event. In either case, a change
occurs in the logical data structure.
The entities of interest prior to grouping are pixels; the entities of interest after grouping are
sets of pixels.
5. Extraction
The grouping operation determines the new set of entities, but they are left naked in the sense that the only thing they possess is their identity. The extracting operation computes
for each group of pixels a list of properties. Example properties might include its centroid,
area, orientation, spatial moments, gray tone moments, spatial-gray tone moments,
circumscribing circle, inscribing circle, and so on.
Other properties might depend on whether the group is considered a region or an arc. If the
group is a region, the number of holes might be a useful property. If the group is an arc,
average curvature might be a useful property. Extraction can also measure topological or
spatial relationships between two or more groupings. For example, an extracting operation
may make explicit that two groupings touch, or are spatially close, or that one grouping is
above another.
6. Matching:
After the completion of the extracting operation, the events occurring on the image have been
identified and measured, but the events in and of themselves have no meaning. The meaning
of the observed spatial events emerges when a perceptual organization has occurred such that
a specific set of spatial events in the observed spatial organization clearly constitutes an
imaged instance of some previously known object, such as a chair or the letter A. Once an
object or set of object parts has been recognized, measurements (such as the distance between
two parts, the angle between two lines or the area of an object part) can be made and related
to the allowed tolerance, as may be the case in an inspection scenario. It is the matching
operation that determines the interpretation of some related set of image events, associating
these events with some given three-dimensional object or two-dimensional shape. There are a

wide variety of matching operations. The classic example is template matching, which
compares the examined pattern with stored models (templates) of known patterns and
chooses the best match.

1.4.8 Image Transmission:

Image transmission takes into account transmission of digital images through computer
networks. There are several requirements on the networks when images are transmitted: (1)
The network must accommodate bursty data transport because image transmission is bursty
(The burst is caused by the large size of the image.); (2) Image transmission requires reliable
transport; (3) Time-dependence is not a dominant characteristic of the image in contrast to
audio/video transmission.
Image size depends on the image representation format used for transmission. There are
several possibilities:

• Raw image data transmission:


In this case, the image is generated through a video digitizer and transmitted in its digital
format. The size can be computed in the following manner:
Size = spatial resolution x pixel quantization
For example, the transmission of an image with a resolution of 640 x 480 pixels and a pixel quantization of 8 bits per pixel requires transmission of 307,200 bytes through the network (a size calculation for the three options is sketched at the end of this section).
• Compressed image data transmission
In this case, the image is generated through a video digitizer and compressed before
transmission. Methods such as JPEG or MPEG, are used to downsize the image. The
reduction of image size depends on the compression method and compression rate.
• Symbolic image data transmission
In this case, the image is represented through symbolic data representation as image
primitives (e.g., 2D or 3D geometric representation), attributes and other control information.
This image representation method is used in computer graphics. Image size is equal to the
structure size, which carries the transmitted symbolic information of the image.
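A minimal sketch of the size calculation for the three transmission options above. The compression ratio and the symbolic structure size are assumed figures used only for illustration.

def raw_image_size(width, height, bits_per_pixel):
    """Raw image size in bytes: spatial resolution x pixel quantization."""
    return width * height * bits_per_pixel // 8

raw = raw_image_size(640, 480, 8)
print("raw:", raw, "bytes")                       # 307200 bytes

assumed_ratio = 20                                # assumption: 20:1 JPEG compression
print("compressed (approx.):", raw // assumed_ratio, "bytes")

symbolic = 4000                                   # assumption: size of the primitive/attribute structure
print("symbolic:", symbolic, "bytes")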

UNIT 4: VIDEO AND ANIMATION
Video is a combination of image and audio. It consists of a set of still images, called frames, displayed to the user one after another at a specific speed, known as the frame rate and measured in frames per second (fps). If the frames are displayed fast enough, our eyes cannot distinguish the individual frames; because of persistence of vision, the individual frames merge with one another, creating an illusion of motion. The frame rate should range between 20 and 30 fps for perceiving smooth, realistic motion. Audio is added and synchronized with the apparent movement of images.

1.1 VIDEO SIGNAL REPRESENTATION


In conventional black-and-white TV sets, the video signal is displayed using a CRT (Cathode Ray
Tube). An electron beam carries corresponding pattern information, such as intensity in a viewed
scene. To understand later reasoning behind data rates of motion video and computer-based
animation, we focus on the description of their respective signals rather than specific camera or
monitor technologies. We analyse the video signal coming from camera and the resulting pictures
(using USA standards).
Video signal representation includes three aspects:
• Visual representation
• Transmission
• Digitalization

1.1.1 Visual representation


A central objective is to offer the viewer a sense of presence in the scene and of participation in
the events portrayed. To meet this objective, the televised image should convey spatial and
temporal content of the scene. Important measures are:
1. Vertical Detail and Viewing Distance:
The geometry of the field occupied by the television image is based on the ratio of the
picture width W to height H. It is called aspect ratio. The conventional aspect ratio is
4/3=1.33.
The smallest detail that can be reproduced in the image is a pixel. Ideally, each detail of
the scene would be reproduced by one pixel. Practically, however, some of the details in
the scene inevitably fall between scanning lines, so that two lines are required for such
picture elements. Thus, some vertical resolution is lost. Measurements of this effect show
that only about 70% of the vertical detail is presented by the scanning lines. The ratio is
known as the Kell factor; it applies irrespective of the manner of scanning, whether the
lines follow each other sequentially (a progressive scan) or alternately (an interlaced scan).

2. Horizontal Detail and Picture Width:

The picture width chosen for conventional television service is 4/3*picture height. Using
the aspect ratio, we can determine the horizontal field of view from the horizontal angle.

3. Total Detail Content of the Image:


The vertical resolution is equal to the number of picture elements separately presented in
the picture height, while the number of elements in the picture width is equal to the
horizontal resolution times the aspect ratio. The product of the number of elements
vertically and horizontally equals the total number of picture elements in the image.

4. Perception of Depth:
In natural vision, perception of the third spatial dimension, depth, depends primarily on the
angular separation of the images received by the two eyes of the viewer. In the flat image of television, a considerable degree of depth perception is inferred from the perspective
appearance of the subject matter. Further, the choice of the focal length of lenses and
changes in depth of focus in a camera influence the depth perception.

5. Luminance and Chrominance:


Color vision is achieved through three signals, proportional to the relative intensities of
Red, Green and Blue light (RGB) in each portion of the scene. The three signals are
conveyed separately to the input terminals of the picture tube, so that the tube reproduces
at each point the relative intensities of red, green and blue discerned by the camera.
During the transmission of the signals from the camera to the receiver (display), a different
division of signals in comparison to the RGB division is often used. The color encoding
during transmission uses luminance and two chrominance signals.

6. Temporal Aspects of Illumination:


Another property of human vision is the boundary of motion resolution. In contrast to
continuous pressure waves of an acoustic signal, a discrete sequence of individual pictures
can be perceived as a continuous sequence. This property is used in television and motion
pictures, i.e., motion is the presentation of a rapid succession of slightly different still
pictures (frames). Between frames, the light is cut off briefly. To represent visual reality,
two conditions must be met. First, the rate of repetition of the images must be high enough
to guarantee smooth motion from frame to frame. Second, the rate must be high enough so
that the persistence of vision extends over the interval between flashes.

7. Continuity of motion:
It is known that we perceive continuous motion at any frame rate faster than 15 frames per second. Video motion appears smooth at 30 frames per second when filmed by a camera rather than synthetically generated. Movies, however, run at 24 frames/s. The new Showscan technology involves making and showing movies at 60 frames per second on 70-millimeter film. This scheme produces a bigger picture, which therefore occupies a larger portion of the visual field, and produces much smoother motion.

There are several standards for motion video signals which determine the frame rate to
achieve proper continuity of motion. The USA standard for motion video signals, NTSC
(National Television Systems Committee) standard, specified the frame rate initially to
30 frames/s, but later changed it to 29.97 Hz to maintain the visual-aural carrier separation
at precisely 4.5 MHz. NTSC scanning equipment presents images at the 24 Hz standard,
but transposes them to the 29.97 Hz scanning rate. The European standard for motion
video, PAL (Phase Alternating Line), adopted the repetition rate of 25 Hz, and the frame
rate therefore is 25 frames/s.

8. Flicker:
At slow refresh rates, a periodic fluctuation of perceived brightness, the flicker effect, arises. The marginal value to avoid flicker is at least 50 refresh cycles/s. To achieve
continuous flicker-free motion, we need a relatively high refresh frequency. Movies, as
well as television, apply some technical measures to work with lower motion frequencies.

9. Temporal Aspect of Video Bandwidth:


An important factor to determine which video bandwidth to use to transmit motion video
is its temporal specification. Temporal specification depends on the rate of the visual
system to scan pixels, as well as on the human eye's scanning capabilities. For example, in
a regular TV device, the time consumed in scanning lines and frames is measured in
microseconds. In an HDTV (High Definition TV) device, however, a pixel can be scanned
in less than a tenth of a millionth of a second.
From the human visual perspective, the eye requires that a video frame be scanned every
1/25 second. This time is equivalent to the time during which a human eye does not see the
flicker effect.

1.1.2 Transmission
Video signals are transmitted to receivers through a single television channel. The NTSC channel
is shown in Figure below:

Figure 1: Bandwidth of the NTSC system

To encode color, a video signal is a composite of three signals. For transmission purposes, a video
signal consists of one luminance and two chrominance signals. Luminance carries the brightness information (different shades of gray), while chrominance (chroma) carries the hue, i.e., the color information.
In NTSC systems, the composite transmission of luminance and chrominance signals in a single
channel is achieved by specifying the chrominance subcarrier to be an odd multiple of one-half of
the line-scanning frequency. This causes the component frequencies of chrominance to be
interleaved with those of luminance. The goal is to separate the two sets of components in the
receiver and avoid interference between them prior to the recovery of the primary color signals for
display.
Several approaches to color encoding are used, e.g., the RGB signal and the composite signal.

1.1.3 Digitalization
Before a picture or motion video can be processed by a computer or transmitted over a computer network, it needs to be converted from an analog to a digital representation. In an ordinary sense, digitalization consists of sampling the gray (color) level in the picture at an M x N array of points.
Since the gray level at these points may take any value in a continuous range, for digital processing,
the gray level must be quantized. By this we mean that we divide the range of gray levels into K
intervals, and require the gray level at any point to take on only one of these values. For a picture
reconstructed from quantized samples to be acceptable, it may be necessary to use 100 or more
quantizing levels.
When samples are obtained by using an array of points or finite strings, a fine degree of
quantization is very important for samples taken in regions of a picture where the gray (color)
levels change slowly. The result of sampling and quantizing is a digital image (picture), at which
point we have obtained a rectangular array of integer values representing pixels.
The next step in the creation of digital motion video is to digitize pictures in time and get a
sequence of digital images per second that approximates analog motion video.

1.2 COMPUTER VIDEO FORMAT


The computer video format depends on the input and output devices for the motion video medium.
Current video digitizers differ in digital image (frame) resolution, quantization and frame rate
(frames/s).
A digitized NTSC video signal can, for example, achieve a spatial resolution of 640 x 480 pixels, a quantization of 8 bits/pixel (256 shades of gray) and a frame rate of 4 frames/second. The SunVideo digitizer from Sun Microsystems, on the other hand, captures the NTSC video signal in the form of an RGB signal with a frame resolution of 320 x 240 pixels, a quantization of 8 bits/pixel and a frame rate of 30 frames/second.
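A small sketch computing the uncompressed data rates implied by these two digitizer configurations:

def video_rate(width, height, bits_per_pixel, fps):
    """Uncompressed video data rate in bytes per second."""
    return width * height * bits_per_pixel // 8 * fps

print(video_rate(640, 480, 8, 4))     # 1,228,800 bytes/s
print(video_rate(320, 240, 8, 30))    # 2,304,000 bytes/s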
The output of the digitalized motion video depends on the display device. The most often used
displays are raster displays, described in the previous chapter. A common raster display system
architecture is shown in below:

Figure 2: Display for video controller

The video controller displays the image stored in the frame buffer, accessing the memory through
a separate access port as often as the raster scan rate dictates. The constant refresh of the display
is its most important task. Because of the disturbing flicker effect, the video controller cycles
through the frame buffer, one scan line at a time, typically 60 times/second. For presentation of
different colors on the screen, the system works with a Color Look Up Table (CLUT or lut). At a
certain time, a limited number of colors (n) is prepared for the whole picture. The set of n colors,
used mostly, is chosen from a color space, consisting of m colors, where generally n << m.
Some of the computer video controller standards are:

• CGA (Color Graphics adapter): The first color monitor and graphics cards for PC
computers. Capable of producing 16 colors at 160x200 pixels.
• EGA (Enhanced Graphics Adapter): an adapter that could display 16 colors with a screen
resolution of 640x350 pixels.
• VGA (Video Graphics Adapter): Currently the base standard for PC video cards and
monitors. True VGA supports 16 colors at 640x480 pixels or 256 colors at 320x200 pixels.
• SVGA (Super VGA): A SVGA card or monitor is capable of displaying more pixels (dots
on the screen) and/or colors than basic VGA. For example, an SVGA graphics card may
be able to display 16-bit color with a resolution of 800x600 pixels.
• XGA (Extended Graphics Array): A standard used on some IBM PS/2 models. XGA supports 256 colors at 1024x768 pixels, or 16-bit color at 640x480 pixels.

1.3 COMPUTER BASED ANIMATION


Animation means giving life to any object in computer graphics. It has the power of injecting
energy and emotions into the most seemingly inanimate objects. Computer-assisted animation and
computer-generated animation are two categories of computer animation. It can be presented via
film or video.
An animation covers all changes that have a visual effect. Visual effects can be of different nature.
They might include time-varying positions (motion dynamics), shape, color, transparency,

structure and texture of an object (update dynamics), and changes in lighting, camera position,
orientation and focus.
A computer-based animation is an animation performed by a computer using graphical tools to
provide visual effects. Processes of computer based animation are as follows:

1.3.1 Input Process


Before the computer can be used, drawings must be digitized because key frames, meaning frames
in which the entities being animated are at extreme or characteristic positions, must be drawn. This
can be done through optical scanning, tracing the drawings with a data tablet or producing the
original drawings with a drawing program in the first place. The drawings may need to be post-
processed (e.g., filtered) to clean up any glitches arising from the input process.

1.3.2 Composition Stage


The composition stage, in which foreground and background figures are combined to generate the
individual frames for the final animation, can be performed with image-composition techniques.
By placing several low-resolution frames of an animation in a rectangular array, a trial film (pencil
test) can be generated using the pan-zoom feature available in some frame buffers. The frame
buffer can take a particular portion of such an image (pan) and then enlarge it to fill the entire
Screen. This process can be repeated on several frames of the animation stored in the single image.
If it is done fast enough, it gives the effect of continuity. Since each frame of the animation is
reduced to a very small part of the total image (1/25 or 1/36), and then expanded to fill the screen,
the display device's resolution is effectively lowered.

1.3.3 Inbetween Process


The animation of movement from one position to another needs a composition of frames with
intermediate positions (intermediate frames) inbetween the key frames. This is called the
inbetween process. The process of inbetweening is performed in computer-based animation
through interpolation. The system gets only the starting and ending positions.
Inbetweening also involves interpolating the shapes of objects in intermediate frames. Interpolation may be either linear or spline-based. One technique builds a skeleton for a motion by choosing a polygonal arc describing the basic shape of a 2D figure (or a portion of a figure) and a neighbourhood of this arc. Inbetweening is then performed by interpolating the characteristics of the skeleton between the key frames (a linear sketch is given below). A similar technique can be developed for 3D, but in general interpolation between key frames is a difficult problem.
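A minimal sketch of linear inbetweening: given a point's position in two key frames, the intermediate positions are generated by interpolation. The 2D point representation is an assumption chosen for illustration.

def inbetween(p_start, p_end, n_frames):
    """Linearly interpolate a 2D point between two key frames."""
    frames = []
    for i in range(1, n_frames + 1):
        t = i / (n_frames + 1)          # 0 < t < 1 for intermediate frames
        x = p_start[0] + t * (p_end[0] - p_start[0])
        y = p_start[1] + t * (p_end[1] - p_start[1])
        frames.append((x, y))
    return frames

print(inbetween((0, 0), (100, 50), 4))   # four intermediate positions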

1.3.4 Changing Colors


For changing colors, computer-based animation uses CLUT (lut) in a frame buffer and the process
of double buffering. The lut animation is generated by manipulating the lut. The simplest method
is to cycle the colors in the lut, thus changing the colors of the various pieces of the image. Using
lut animation is faster than sending an entire new pixmap to the frame buffer for each frame.
Assuming 8 color bits per pixel in a 640 x 512 frame buffer, a single image contains 320 Kbytes
of information. Transferring a new image to the frame buffer every 1/30 of a second requires a

bandwidth of over 9 Mbytes per second. On the other hand, new values for the lut can be sent very
rapidly, since luts are typically on the order of a few hundred to a few thousand bytes.
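A tiny sketch of lut animation by colour cycling: instead of rewriting the whole pixmap, only the small lookup table is rotated on each frame, which changes the displayed colours of every pixel that references it. The 4-entry table below is an arbitrary example.

def cycle_lut(lut):
    """Rotate the colour lookup table by one entry (one animation step)."""
    return lut[-1:] + lut[:-1]

lut = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]
for frame in range(3):
    lut = cycle_lut(lut)
    print("frame", frame, lut)          # the new lut is sent to the frame buffer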

1.4 ANIMATION LANGUAGE


There are many different languages for describing animation, and new ones are constantly being
developed. They fall into three categories:

1.4.1 Linear-list Notations


In linear-list notations for animation each event in the animation is described by a starting and
ending frame number and an action that is to take place (event). The actions typically take
parameters, so a statement such as
42, 53, B, ROTATE "PALM", 1, 30
means "between frames 42 and 53, rotate the object called PALM about axis 1 by 30 degrees, determining the amount of rotation at each frame from table B". Many other linear-list notations have been
developed, and many are supersets of the basic linear-list idea. An example is Scefo (Scene
Format), which also includes the notion of groups and object hierarchy and supports abstractions
of changes (called actions) using higher-level programming language constructs.
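A small sketch of how a statement in such a linear-list notation could be interpreted: for every frame in the given range, a fraction of the total rotation is applied. The easing table ("B") is simplified here to linear easing, which is an assumption for illustration only.

def expand_event(start_frame, end_frame, total_degrees):
    """Yield (frame, incremental rotation) pairs for a ROTATE event."""
    n = end_frame - start_frame
    per_frame = total_degrees / n       # linear easing instead of table B
    for frame in range(start_frame + 1, end_frame + 1):
        yield frame, per_frame

for frame, step in expand_event(42, 53, 30):
    print(f"frame {frame}: rotate PALM about axis 1 by {step:.2f} degrees")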

1.4.2 General-purpose Languages


Another way to describe animation is to embed an animation capability within a general-purpose
programming language. The values of variables in the language can be used as parameters to the
routines, which perform the animation.
ASAS is an example of such a language. It is built on top of LISP, and its primitive entities include
vectors, colors, polygons, solids, groups, points of view, subworlds and lights. ASAS also includes
a wide range of geometric transformations that operate on objects.

1.4.3 Graphical Languages


One problem with textual languages is the inability to visualize the action by looking at the script. If
a real-time previewer for textual animation languages were available, this would not be a problem;
unfortunately the production of real-time animation is still beyond the power of most computer
hardware. Graphical animation languages describe animation in a more visual way. These
languages are used for expressing, editing and comprehending the simultaneous changes taking
place in an animation. The principal notion in such languages is substitution of a visual paradigm
for a textual one. Rather than explicitly writing out descriptions of actions, the animator provides
a picture of the action.
Examples of such systems and languages are GENESYSTM, DIAL and S-Dynamics System.

1.5 METHODS OF CONTROLLING ANIMATION

Controlling animation is independent of the language used for describing it. Animation control
mechanisms can employ different techniques.

1.5.1 Full Explicit Control
Explicit control is the simplest type of animation control. Here, the animator provides a description
of everything that occurs in the animation, either by specifying simple changes, such as scaling,
translation, and rotation, or by providing key frame information and interpolation methods to use
between key frames. This interpolation may be given explicitly or (in an interactive system) by
direct manipulation with a mouse, joystick, data glove or other input device. An example of this
type of control is the BBOP system.

1.5.2 Procedural Control


Procedural control is based on communication between various objects to determine their
properties. Procedural control is a significant part of several other control mechanisms. In
particular, in physically-based systems, the position of one object may influence the motion of
another (e.g., balls cannot pass through walls); in actor based systems, the individual actors may
pass their positions to other actors to affect the other actors' behaviours.

1.5.3 Constraint-based Systems


Some objects in the physical world move in straight lines, but many objects move in a manner
determined by other objects with which they are in contact, and this compound motion may not be
linear at all. Such motion can be modelled by constraints. Specifying an animated sequence using
constraints is often much easier to do than using explicit control. Systems using this type of control
are Sutherland's Sketchpad or Borning's ThingLab. The extension of constraint-based animation
systems to support a hierarchy of constraints and to provide motion where constraints are specified
by the dynamics of physical bodies and structural characteristics of materials is a subject of active
research.

1.5.4 Tracking Live Action


Trajectories of objects in the course of an animation can also be generated by tracking live action.
Traditional animation uses rotoscoping. A film is made in which people/animals act out the parts
of the characters in the animation, then animators draw over the film, enhancing the background
and replacing the human actors with their animated equivalents.
Another live-action technique is to attach some sort of indicator to key points on a person's body.
By tracking the positions of the indicators, one can get locations for corresponding key points in
an animated model. An example of this sort of interaction mechanism is the data glove, which
measures the position and orientation of the wearer's hand, as well as the flexion and
hyperextension of each finger point.

1.5.5 Kinematics and Dynamics


Kinematics refers to the position and velocity of points. A kinematic description of a scene, for
example, might say, "The cube is at the origin at time t =0. It moves with a constant acceleration
in the direction (1, 1, 5) thereafter."
By contrast, dynamics takes into account the physical laws that govern kinematics (e.g. Newton's
laws of motion for large bodies, the Euler-Lagrange equations for fluids, etc.). A particle moves

with an acceleration proportional to the forces acting on it, and the proportionality constant is the
mass of the particle. Thus, a dynamic description of a scene might be, "At time t =0 seconds, the
cube is at position (0 meters, 100 meters, 0 meters). The cube has a mass of 100 grams. The force
of gravity acts on the cube." Naturally, the result of a dynamic simulation of such a model is that
the cube falls.
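A small sketch of the dynamic description above: a 100-gram cube starting at (0, 100, 0) metres with gravity acting on it, integrated with simple Euler steps. The time step and number of steps are arbitrary assumptions.

mass = 0.1                          # kg (100 grams)
position = [0.0, 100.0, 0.0]        # metres
velocity = [0.0, 0.0, 0.0]          # metres/second
force = [0.0, -9.81 * mass, 0.0]    # gravity, in newtons
dt = 0.1                            # time step in seconds

for _ in range(10):
    for i in range(3):
        acceleration = force[i] / mass          # Newton's second law
        velocity[i] += acceleration * dt
        position[i] += velocity[i] * dt

print(position)                     # the y coordinate has decreased: the cube falls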

1.6 DISPLAY OF ANIMATION


To display animations with raster systems, animated objects (which may consist of graphical
primitives such as lines, polygons, and so on) must be scan-converted into their pixmap in the
frame buffer. To show a rotating object, we can scan-convert into the pixmap successive views
from slightly different locations, one after another. This scan-conversion must be done at least 10
(preferably 15 to 20) times per second to give a reasonably smooth effect; hence a new image
must be created in no more than 100 milliseconds. From these 100 milliseconds, scan-converting
should take only a small portion of time.
For example, if scan-converting of an object takes 75 milliseconds, only 25 milliseconds remain
to erase and redraw the complete object on the display, which is not enough, and a distracting
effect occurs. Double-buffering is used to avoid this problem. The frame buffer is divided into two
images, each with half of the bits per pixel of the overall frame buffer. As an example, we describe
the display of the rotation animation. Let us assume that the two halves of the pixmap are image0
and image1.
Load look-up table to display values as background color
Scan-convert object into image0
Load look-up table to display only image0
Repeat
    Scan-convert object into image1
    Load look-up table to display only image1
    Rotate object data structure description
    Scan-convert object into image0
    Load look-up table to display only image0
    Rotate object data structure description
Until (termination condition)

If rotating and scan-converting the object takes longer than 100 milliseconds, the animation is
quite slow, but the transition from one image to the next appears to be instantaneous. Loading the
look-up table typically takes less than one millisecond.

1.7 TRANSMISSION OF ANIMATION
As described above, animated objects may be represented symbolically using graphical objects or
scan-converted pixmap images. Hence, the transmission of animation over computer networks
may be performed using one of two approaches:
• The symbolic representation (e.g. circle) of animation objects (e.g. ball) is transmitted
together with the operation commands (e.g. roll the ball) performed on the object, and at
the receiver side the animation is displayed. In this case, the transmission time is short
because the symbolic representation of an animated object is smaller in byte size than its
pixmap representation, but the display time at the receiver takes longer because the scan
converting operation has to be performed at the receiver side. In this approach, the
transmission rate (bits/second or bytes/second) of animated objects depends
1. On the size of the symbolic representation structure, where the animated object is
encoded,
2. On the size of the structure, where the operation command is encoded,
3. On the number of animated objects and operation commands sent per second.

• The pixmap representation of the animated objects is transmitted and displayed on the
receiver side. In this case, the transmission time is longer in comparison to the previous
approach because of the size of the pixmap representation, but the display time is shorter
because the scan-conversion of the animated objects is avoided at the receiver side. It is
performed at the sender side where animation objects and operation commands are
generated. In this approach, the transmission rate of the animation is equal to the size of
the pixmap representation of an animated object (graphical image) multiplied by the
number of graphical images per second (a rough rate comparison of the two approaches is sketched below).
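A back-of-the-envelope sketch comparing the two approaches. All of the sizes and rates below are assumptions chosen only for illustration:

symbolic_object = 200        # bytes to encode one animated object symbolically (assumed)
command = 20                 # bytes per operation command (assumed)
objects_per_second = 10
commands_per_second = 30

symbolic_rate = symbolic_object * objects_per_second + command * commands_per_second
print("symbolic:", symbolic_rate, "bytes/s")       # small, but the receiver must scan-convert

frame_bytes = 640 * 480      # 8-bit pixmap of one frame
fps = 15
print("pixmap:", frame_bytes * fps, "bytes/s")     # much larger, but displays directly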

THANK YOU!!!

Unit 5: Data Compression
Why compression?
➢ To reduce the volume of data to be transmitted (text, fax, images)
➢ To reduce the bandwidth required for transmission and to reduce storage requirements
(speech, audio, video)
Data compression implies sending or storing a smaller number of bits. Although many methods
are used for this purpose, in general these methods can be divided into two broad categories:
lossless and lossy methods.

Figure 1: Data Compression Methods

1.1 STORAGE SPACE


Uncompressed graphics, audio and video data require considerable storage capacity which in
the case of uncompressed video is often not even feasible given today's technology. Data
transfer of uncompressed video data over digital networks requires very high bandwidth to be
provided for a single point-to-point communication. To provide feasible and cost-effective
solutions, most multimedia systems handle compressed digital video and audio data streams.

1.2 CODING REQUIREMENTS


Images have considerably higher storage requirements than text; audio and video have even
more demanding properties for data storage. Not only is a huge amount of storage required,
but the data rates for the communication of continuous media are also significant.
How is compression possible?
➢ Redundancy in digital audio, image, and video data.
➢ Properties of human perception
Redundancy: Adjacent audio samples are often similar (exploited by predictive encoding); samples corresponding to silence can be removed (silence removal).
In digital image, neighboring samples on a scanning line are normally similar (spatial
redundancy)

In digital video, in addition to spatial redundancy, neighboring images in a video
sequence may be similar (temporal redundancy)
Human perception: Compressed version of digital audio, image, video need not
represent the original information exactly.
Perception sensitivities are different for different signal patterns.
Human eye is less sensitive to the higher spatial frequency components than the lower
frequencies (transform coding).

1.3 DIFFERENCE BETWEEN LOSSLESS AND LOSSY DATA COMPRESSION:

1. In lossless data compression there is no loss of data or quality; in lossy data compression there is a loss of quality and data.
2. With lossless compression the file can be restored to its original form; with lossy compression the file cannot be restored to its original form.
3. Lossless data compression algorithms include Run Length Encoding, Huffman encoding, Shannon-Fano encoding, Arithmetic encoding and Lempel-Ziv-Welch encoding; lossy data compression algorithms include transform coding, the Discrete Cosine Transform, the Discrete Wavelet Transform, fractal compression, etc.
4. Lossless compression is mainly used to compress text, sound and images; lossy compression is mainly used to compress audio, video and images.
5. Lossless data compression retains more of the original data than lossy data compression.
6. File quality is high with lossless compression and low with lossy compression.
7. Lossless compression mainly supports formats such as RAW, BMP, PNG and WAV; lossy compression mainly supports formats such as JPEG, GIF, MP3, MP4 and MKV.

1.4 ENTROPY, SOURCE AND HYBRID CODING

1.4.1 Entropy coding


An entropy coding is any lossless data compression method that attempts to approach the lower
bound declared by Shannon's source coding theorem, which states that any lossless data
compression method must have expected code length greater or equal to the entropy of the
source.

1.4.1.1 Run length encoding


This method replaces the consecutive occurrences of a given symbol with only one copy of the
symbol along with a count of how many times that symbol occurs. Hence the name 'run length'. For example, the string AAABBCDDDD would be encoded as 3A2B1C4D (see the encoder sketched below).
A real-life example where run-length encoding is quite effective is the fax machine. Most faxes are white sheets with occasional black text. So, a run-length encoding scheme can take each line and transmit a code for white followed by the number of pixels, then the code for black and the number of pixels, and so on. This method of compression must be used carefully. If there is not a

lot of repetition in the data then it is possible the run length encoding scheme would actually
increase the size of a file.
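A minimal run-length encoder for the scheme described above (e.g. "AAABBCDDDD" -> "3A2B1C4D"):

def run_length_encode(data):
    if not data:
        return ""
    encoded = []
    count, previous = 1, data[0]
    for symbol in data[1:]:
        if symbol == previous:
            count += 1
        else:
            encoded.append(f"{count}{previous}")
            count, previous = 1, symbol
    encoded.append(f"{count}{previous}")
    return "".join(encoded)

print(run_length_encode("AAABBCDDDD"))   # prints 3A2B1C4D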

1.4.1.2 Arithmetic coding


Arithmetic coding is a type of entropy encoding utilized in lossless data compression. Ordinarily, a string of characters, for example the word "hey", is represented using a fixed number of bits per character. Arithmetic coding instead yields a single codeword for each encoded string of characters. The first step is to divide the numeric range from 0 to 1 into segments, one for each character present in the message to be sent – including the termination character – with the size of each segment given by the probability of the related character.
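A minimal sketch of the interval-narrowing idea: the message shrinks the range [0, 1) step by step according to each character's probability segment, and any number inside the final interval identifies the whole message. The fixed probabilities and plain float arithmetic are simplifying assumptions (a practical coder uses integer arithmetic and a termination symbol).

def arithmetic_encode(message, probabilities):
    # build a cumulative probability segment for each character
    segments, start = {}, 0.0
    for ch, p in probabilities.items():
        segments[ch] = (start, start + p)
        start += p
    low, high = 0.0, 1.0
    for ch in message:
        seg_low, seg_high = segments[ch]
        span = high - low
        high = low + span * seg_high
        low = low + span * seg_low
    return (low + high) / 2          # any value in [low, high) encodes the message

print(arithmetic_encode("hey", {"h": 0.4, "e": 0.3, "y": 0.3}))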

1.4.2 Source coding


Source coding takes into account the semantics and the characteristics of the data. Thus, the degree of compression that can be achieved depends on the data contents. Source coding is a lossy coding process in which there is some loss of information content. For example, in the case of speech, the signal is transformed from the time domain to the frequency domain. In psychoacoustic coding, the encoder analyzes the incoming audio signal to identify perceptually important information by incorporating several psychoacoustic principles of the human ear. One is critical-band spectral analysis, which accounts for the ear's poorer discrimination in higher-frequency regions than in lower-frequency regions. The encoder performs the psychoacoustic analysis based on either a side-chain FFT analysis or the output of the filter bank.
Examples: Differential Pulse Code Modulation, Delta Modulation, Fast Fourier Transform, Discrete Fourier Transform, sub-band coding, etc.

1.4.3 Hybrid Coding


This type of coding mechanism involves the combined use of both source coding and entropy coding to enhance the compression ratio while still preserving the quality of the information content. Examples of hybrid coding include the MPEG, JPEG, H.261 and DVI techniques.

1.4.4 Major Steps of data compression:

Preparation: Preparation includes analog to digital conversion and generating an appropriate


digital representation of the information. An image is divided into blocks of 4*4 or 8*8 pixels,
and represented by a fixed number of bits per pixel.

Processing: This involves the conversion of the information from the time domain to the
frequency domain by using DCT (discrete cosine transform). In the case of motion video
compression, interframe coding uses a motion vector for each 8*8 block.
Quantization: It defines discrete level or values that the information is allowed to take. This
process involves the reduction of precision. The quantization process may be uniform or it may
be differential depending upon the characteristics of the picture.
Entropy Encoding: This is the lossless compression method where the semantics of data is
ignored but only its characteristics are considered. It may be run length coding or entropy
coding. After compression, the compressed video stream contains the specification of the image
starting point and an identification of the compression technique may be the part of the data
stream. The error correction code may also be added to the stream. Decompression is the inverse
process of compression.

1.5 HUFFMAN ENCODING


Huffman coding is a lossless data compression algorithm, also simply known as Huffman encoding. It is widely used in image (JPEG or JPG) compression. The idea is to assign variable-length codes to input characters; the lengths of the assigned codes are based on the frequencies of the corresponding characters.
The Huffman encoding idea is described in the following points:
o It assigns a variable-length code to all the given characters.
o The code length of a character depends on how frequently it occurs in the given text or
string.
o A character gets the smallest code if it occurs most frequently.
o A character gets the largest code if it occurs least frequently.
Prefix rule: The variable-length codes assigned to input characters are prefix codes, meaning that the codes (bit sequences) are assigned in such a way that the code assigned to one character is never the prefix of the code assigned to any other character.
The major steps involved in Huffman coding:
o First, construct a Huffman tree from the given input string or characters or text.
o Assign, a Huffman code to each character by traversing over the tree.

1.5.1 Huffman Tree

Step 1: For each character, create a leaf node. The leaf node of a character contains the frequency of that character.

Step 2: Set all the nodes in sorted order according to their frequency.

Step 3: Take the two nodes with the lowest frequencies and do the following:

1. Create a new internal node.
2. The frequency of the new internal node will be the sum of the frequencies of the two lowest-frequency nodes.
3. Mark the first node as the left child and the other node as the right child of the newly created internal node.

Step 4: Repeat steps 2 and 3 until all the nodes form a single tree. Thus, we get a Huffman tree.

For example:
Character Frequency
A 5
B 9
C 12
D 13
E 16
F 45

1. Step 1: Make pairs of characters and their frequencies. i.e. (a,5), (b,9), (c,12), (d,13),
(e,16), (f,45)
2. Step 2: Sort pairs with respect to frequency, in this case given data are already sorted.
3. Step 3: Pick the first two characters and join them under a parent node. Add a new
internal node with frequency 5 + 9 = 14.

Character Frequency
C 12
D 13
New node 14
E 16
F 45

4. Step 4: Repeat steps 2 and 3 until we get a single tree. The successive merges are 5 + 9 = 14, 12 + 13 = 25, 14 + 16 = 30, 25 + 30 = 55 and finally 45 + 55 = 100, which becomes the root of the Huffman tree.

1.5.2 Coding
Therefore, we get a single tree. Then, we will find the code for each character with the help of the above tree. Assign a weight to each edge: each left edge is weighted 0 and each right edge is weighted 1.

Character Frequency code-word
A 5 1100
B 9 1101
C 12 100
D 13 101
E 16 111
F 45 0

Here, the number of characters = 6.
Number of bits required per character with a fixed-length code = 3, since 2^3 = 8 ≥ 6.
Bits used before coding: 5*3 + 9*3 + 12*3 + 13*3 + 16*3 + 45*3 = 300
Bits used after coding: 5*4 + 9*4 + 12*3 + 13*3 + 16*3 + 45*1 = 224
Therefore, (300-224)/300 ≈ 25.3% of space is saved after Huffman coding.
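A compact sketch of this construction in Python using the standard heapq module is shown below; the tie-breaking counter inside each heap entry only exists so that two equal frequencies never cause the dictionaries to be compared.

import heapq

def huffman_codes(freq):
    # Build prefix codes from a {character: frequency} map.
    # Each heap entry: (frequency, tie_breaker, {character: partial code})
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # the two lowest-frequency nodes
        f2, _, right = heapq.heappop(heap)
        merged = {c: "0" + code for c, code in left.items()}         # left edge -> 0
        merged.update({c: "1" + code for c, code in right.items()})  # right edge -> 1
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

freq = {"a": 5, "b": 9, "c": 12, "d": 13, "e": 16, "f": 45}
codes = huffman_codes(freq)
print(codes)                                        # f=0, c=100, d=101, a=1100, b=1101, e=111
print(sum(freq[c] * len(codes[c]) for c in freq))   # 224 bits in total

For the frequencies above (100 characters in total) this reproduces the code table, giving 224 bits, i.e. an average code length of 224/100 = 2.24 bits per character instead of 3.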
Q. Find the Average Code Length for the String and Length of the Encoded String of the given
string “abcdrcbd” Use the Huffman coding technique to find the representation bits of the
character.

Character Frequency

A 5

B 2

C 1

D 1

R 2

Hints:
Average Code Length = ∑ ( frequency × code length ) / ∑ ( frequency )
Length of the encoded string = total number of characters in the text × average code length per character

2 JPEG

The JPEG standard for compressing continuous-tone still pictures (e.g. photographs) was developed by the Joint Photographic Experts Group, and JPEG became an ISO standard. JPEG is one algorithm designed to satisfy the requirements of a broad range of still-image compression applications.

Why we need image compression?
Let us take an example, A typical digital image has 512x480 pixels. In 24-bit colour (one byte
for each of the red, green and blue components), the image requires 737,280 bytes of storage
space. It would take about 1.5 minutes to transmit the uncompressed image over a 64kb/second
link. The JPEG algorithms offer compression rates of most images at ratios of about 24:1.
Effectively, every 24 bits of data is stuffed into 1 bit, giving a compressed file size (for the above
image dimensions) of 30,720 bytes, and a corresponding transmission time of 3.8 seconds.
There are some requirements of JPEG standard and they are:
• The JPEG implementation should be independent of image size.
• The JPEG implementation should be applicable to any image and pixel aspect ratio.
• Color representation itself should be independent of the special implementation.
• Image content may be of any complexity, with any statistical characteristics.
• The JPEG standard specification should be state of art (or near) regarding the
compression factor and achieved image quality.
• Processing complexity must permit a software solution to run on as many available
standard processors as possible. Additionally, the use of specialized hardware should
substantially enhance image quality.

2.1 STEPS OF JPEG COMPRESSION PROCESS

Figure 2: Steps of JPEG compression process


Step 1: (Block/Image Preparation)
This step involves block preparation. For example, let us assume the input to be a 640*480 RGB image with 24 bits/pixel. The luminance and chrominance components of the image are calculated using the YIQ model for the NTSC system.
Y=0.30R + 0.59G + 0.11B

I= 0.60R – 0.28G – 0.32B
Q= 0.21R – 0.52G + 0.31B
Separate matrices are constructed for Y, I and Q, with each element in the range 0 to 255. Square blocks of four pixels are averaged in the I and Q matrices to reduce them to 320*240; thus the data is compressed by a factor of two. Next, 128 is subtracted from each element of all three matrices to put 0 in the middle of the range. Each matrix is divided up into 8*8 blocks. The Y matrix has 4800 blocks; the other two have 1200 blocks each.
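A hedged NumPy sketch of this block-preparation step is given below; the function and variable names are illustrative, and the 128 level shift and 2x2 chroma averaging simply follow the description above.

import numpy as np

def prepare_blocks(rgb):
    # rgb: uint8 array of shape (480, 640, 3). Returns level-shifted Y blocks
    # plus subsampled, level-shifted I and Q blocks, each block being 8x8.
    r, g, b = (rgb[..., k].astype(float) for k in range(3))
    y = 0.30 * r + 0.59 * g + 0.11 * b
    i = 0.60 * r - 0.28 * g - 0.32 * b
    q = 0.21 * r - 0.52 * g + 0.31 * b

    def subsample(plane):                  # average each square block of four pixels
        h, w = plane.shape
        return plane.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

    def to_blocks(plane):                  # split a plane into 8x8 blocks
        h, w = plane.shape
        return plane.reshape(h // 8, 8, w // 8, 8).swapaxes(1, 2).reshape(-1, 8, 8)

    y -= 128                               # put 0 in the middle of the range
    i, q = subsample(i) - 128, subsample(q) - 128
    return to_blocks(y), to_blocks(i), to_blocks(q)

For a 640*480 input this yields 4800 Y blocks and 1200 blocks each for I and Q, matching the counts above.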

Step 2: (Discrete Cosine Transformation)


The Discrete Cosine Transformation is applied to each of the 7200 blocks separately. The output of each DCT is an 8*8 matrix of DCT coefficients. DCT element (0,0) is the average value of the block. The other elements tell how much spectral power is present at each spatial frequency.
Step 3: (Quantization) In this step the less important DCT coefficients are wiped out. This
transformation is done by dividing each of the coefficients in the 8*8 DCT matrix by a weight
taken from a table. If all the weights are 1, the transformation does nothing; however, if the weights increase sharply from the origin, higher spatial frequencies are dropped quickly.
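The sketch below shows steps 2 and 3 for a single 8*8 block, using the textbook O(N^4) form of the 2D DCT for clarity; the weight table is illustrative (not the standard JPEG table), its entries simply growing away from the (0,0) corner.

import numpy as np

def dct2(block):
    # Direct 2D DCT of an 8x8 block (slow, but mirrors the definition).
    n, out = 8, np.zeros((8, 8))
    for u in range(n):
        for v in range(n):
            cu = 1 / np.sqrt(2) if u == 0 else 1.0
            cv = 1 / np.sqrt(2) if v == 0 else 1.0
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += block[x, y] * np.cos((2 * x + 1) * u * np.pi / 16) \
                                     * np.cos((2 * y + 1) * v * np.pi / 16)
            out[u, v] = 0.25 * cu * cv * s
    return out

# Illustrative weights: larger divisors for higher spatial frequencies.
quant_table = 1 + 4 * np.add.outer(np.arange(8), np.arange(8))

block = np.random.randint(-128, 128, (8, 8)).astype(float)   # a level-shifted 8x8 block
quantized = np.round(dct2(block) / quant_table)

Because the divisors grow with spatial frequency, most high-frequency coefficients round to zero, which is what the next steps exploit.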

Step 4: (Differential Quantization) This step reduces the (0,0) value of each block by replacing
it with the amount it differs from the corresponding element in the previous block. Since these
elements are the averages of their respective blocks, they should change slowly, so taking the
differential values should reduce most of them to small values. The (0,0) values are referred to as
the DC components; the other values are the AC components.
Step 5: (Run length Encoding) This step linearizes the 64 elements and applies run-length
encoding to the list. In order to concentrate zeros together, a zigzag scanning pattern is used.
Finally run length coding is used to compress the elements.
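A small sketch of the zigzag ordering used here: it walks the 8*8 block along its anti-diagonals, alternating direction, so that the many high-frequency zeros end up adjacent and can be run-length coded efficiently.

def zigzag(block):
    # Return the 64 elements of an 8x8 block (list of lists or array) in zigzag order.
    order = sorted(((x, y) for x in range(8) for y in range(8)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[x][y] for x, y in order]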
Step 6: (Statistical Encoding) Huffman encodes the numbers for storage or transmission,
assigning common numbers shorter codes than uncommon ones. JPEG produces a 20:1 or even
better compression ratio. Decoding a JPEG image requires running the algorithm backward and
thus it is roughly symmetric: decoding takes as long as encoding.
Although JPEG is one algorithm, to satisfy the requirements of a broad range of still-image
compression applications, it has 4 modes of operation.

2.2 SEQUENTIAL DCT-BASED


In this mode, 8x8 blocks of the image input are formatted for compression by scanning the image
left to right and top to bottom. A block consists of 64 samples of one component that make up
the image. Each block of samples is transformed to a block of coefficients by the forward
discrete cosine transform (FDCT). The coefficients are then quantized and entropy-encoded.

Figure 3: Steps of the lossy sequential DCT-based coding mode


2.2.1 Image Processing
It basically involves the block preparation where the image samples are grouped into 8*8 pixels
and passed to the encoder. Then Discrete Cosine Transformation is applied to the blocks where
the pixel values are shifted into the range [-128,127] with zero as the center. Each of these values
is then transformed using Forward DCT (FDCT). DCT is similar to Discrete Fourier
Transformation as it maps the values from the time to the frequency domain.

2.2.2 Quantization
The JPEG application provides a table of 64 entries. Each entry will be used for the quantization
of one of the 64 DCT coefficients. Each of the 64 coefficients can be adjusted separately.
Each table entry is an 8-bit integer value (coefficient Qvu). The quantization process becomes less accurate as the size of the table entries increases. Quantization and de-quantization must use the
same tables.

2.2.3 Entropy Encoding


During the initial step of entropy encoding, the quantized DC-coefficients are treated separately
from the quantized AC-coefficients.
• The DC-coefficient determines the basic color of the data units.
• The DCT processing order of the AC coefficients involves the zigzag sequence to
concentrate the number of zeros.

AC coefficients are DCT coefficients for which the frequency in one or both dimensions is non-zero.
The DC coefficient is the DCT coefficient for which the frequency is zero in both dimensions.

JPEG specifies Huffman and arithmetic encoding as entropy encoding methods. However, as
this is the lossy sequential DCT-based mode, only Huffman encoding is allowed. In the lossy
sequential mode the framework of the whole picture is not formed first; instead, parts of it are
drawn one after another, i.e. sequentially.

Figure 4: Sequential picture presentation used in lossy DCT-based mode

2.3 EXPANDED/ PROGRESSIVE DCT-BASED


This method produces a quick low-resolution version of the image, which is gradually
(progressively) refined to higher resolutions. This is particularly useful if the medium separating
the coder and decoder has a low bandwidth (e.g., a 14.4K modem connection to the Internet, in
turn providing a slow connection to a remote image database). The user can stop the download at
any time. This is similar to the sequential DCT-based algorithm, but the image is encoded in
multiple scans.

Figure 5: Progressive picture presentation used in the expanded lossy DCT-based mode
For the expanded lossy DCT-based mode, JPEG specifies progressive encoding in addition to
sequential encoding. At first, a very rough representation of the image appears which is
progressively refined until the whole image is formed. This progressive coding is achieved by
layered coding. Progressiveness is achieved in two different ways:
▪ By using spectral selection: in the first run, only the quantized DCT coefficients of low
frequencies of each data unit are passed to the entropy encoding. In successive runs, the
coefficients of higher frequencies are processed.

▪ Successive approximation transfers all of the quantized coefficients in each run, but
single bits are differentiated according to their significance. The most-significant bits are
encoded first, then the less-significant bits.

2.4 LOSSLESS
The decoder renders an exact reproduction of the original digital image.
This mode is used when it is necessary to decode a compressed image identical to the original.
Compression ratios are typically only 2:1. Rather than grouping the pixels into 8x8 blocks, data
units are equivalent to single pixels. Image processing and quantization use a predictive
technique, rather than a transformation encoding one. For a pixel X in the image, one of 8
possible predictors is selected (see table below). The prediction selected will be the one which
gives the best result from the a priori known values of the pixel's neighbours, A, B, and C. The
number of the predictor as well as the difference of the prediction to the actual value is passed to
the subsequent entropy encoding.

Selection Value Prediction

0 No prediction

1 X=A

2 X=B

3 X=C

4 X=A+B-C

5 X=A+(B-C)/2

6 X=B+(A-C)/2

7 X=(A+B)/2
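A sketch of this predictor table in code, assuming A is the pixel to the left of X, B the pixel above and C the pixel above-left; integer division stands in for the halving, and selector 0 simply predicts nothing.

def predict(selector, a, b, c):
    # Predict pixel X from its neighbours A (left), B (above), C (above-left).
    predictors = {
        1: a,
        2: b,
        3: c,
        4: a + b - c,
        5: a + (b - c) // 2,
        6: b + (a - c) // 2,
        7: (a + b) // 2,
    }
    return predictors.get(selector, 0)     # selector 0: no prediction

The encoder then passes the selector number together with the residual X - predict(...) to the subsequent entropy encoding.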

2.5 HIERARCHICAL
The input image is coded as a sequence of increasingly higher resolution frames. The client
application will stop decoding the image when the appropriate resolution image has been
reproduced.
This mode uses either the lossy DCT-based algorithms or the lossless compression technique.
The main feature of this mode is the encoding of the image at different resolutions. The prepared
image is initially sampled at a lower resolution (reduced by the factor 2^n). Subsequently, the
resolution is reduced by a factor of 2^(n-1) vertically and horizontally. This compressed image is then

subtracted from the previous result. The process is repeated until the full resolution of the image
is compressed.
Hierarchical encoding requires considerably more storage capacity, but the compressed image is
immediately available at the desired resolution. Therefore, applications working at lower
resolutions do not have to decode the whole image and then subsequently reduce the resolution.

2.6 MPEG COMPRESSION PROCESS


MPEG stands for the Moving Picture Experts Group. MPEG algorithms compress data to
form small bits that can be easily transmitted and then decompressed. MPEG achieves its high
compression rate by storing only the changes from one frame to another, instead of each entire
frame. The video information is then encoded using a technique called Discrete Cosine
Transform (DCT). MPEG uses a type of lossy compression, since some data is removed. But the
diminishment of data is generally imperceptible to the human eye.
MPEG compression removes two types of redundancies:
Spatial redundancy:
• Pixel values are not independent, but are correlated with their neighbours both within the
same frame and across frames. So, to some extent, the value of a pixel is predictable
given the values of neighbouring pixels.
• It is removed with the help of DCT compression.
Temporal redundancy:
• Pixels in two video frames that have the same values in the same location (some objects
repeated again and again in every frame).
• It is removed with the help of the motion compensation technique.
Macroblock:
• It is the basic hierarchical component used for achieving a high level of compression.
• The key to achieving a high rate of compression is to remove as much redundant
information as possible.
• Entropy encoding and Huffman coding are two schemes used for encoding video
information.
• MPEG takes the advantage of the fact that there exists a correlation between successive
frames of moving pictures.
2.6.1 MPEG Compression
The MPEG compression algorithm encodes the data in 5 steps.
First a reduction of the resolution is done, which is followed by a motion compensation in order
to reduce temporal redundancy. The next steps are the Discrete Cosine Transformation (DCT)
and a quantization as it is used for the JPEG compression; this reduces the spatial redundancy

(referring to human visual perception). The final step is an entropy coding using the Run Length
Encoding and the Huffman coding algorithm.

Figure 6: Five steps for a standard MPEG compression

Step 1: Reduction of the Resolution


The human eye has a lower sensitivity to colour information than to dark-bright contrasts. A
conversion from the RGB colour space into YUV colour components helps to exploit this effect for
compression. The chrominance components U and V can be reduced (subsampled) to half of the
pixels in the horizontal direction (4:2:2), or to half of the pixels in both the horizontal and vertical
directions (4:2:0).
Step 2: Motion Estimation
An MPEG video can be understood as a sequence of frames. Because two successive frames of a
video sequence often have small differences (except in scene changes), the MPEG-standard
offers a way of reducing this temporal redundancy. It uses three types of frames:
I-frames (intra), P-frames (predicted) and B-frames (bidirectional)
The I-frames are “key-frames”, which have no reference to other frames and their compression
is not that high. The P-frames can be predicted from an earlier I-frame or P-frame. P-frames
cannot be reconstructed without their referencing frame, but they need less space than the I-
frames, because only the differences are stored. The B-frames are a two directional version of
the P-frame, referring to both directions (one forward frame and one backward frame). B-frames
cannot be referenced by other P- or B-frames, because they are interpolated from forward and

backward frames. P-frames and B-frames are called inter coded frames, whereas I-frames are
known as intra coded frames.
The references between the different types of frames are realized by a process called motion
estimation or motion compensation. The correlation between two frames in terms of motion is
represented by a motion vector. The resulting frame correlation, and therefore the pixel
arithmetic difference, strongly depends on how good the motion estimation algorithm is
implemented. The steps involved in motion estimation:
• Frame Segmentation - The Actual frame is divided into nonoverlapping blocks (macro
blocks) usually 8x8 or 16x16 pixels. The smaller the block sizes are chosen, the more
vectors need to be calculated.
• Search Threshold - In order to minimize the number of expensive motion estimation
calculations, they are only calculated if the difference between two blocks at the same
position is higher than a threshold, otherwise the whole block is transmitted.
• Block Matching - In general, block matching tries to “stitch together” an actual predicted
frame by using snippets (blocks) from previous frames (a small block-matching sketch follows this list).
• Prediction Error Coding - Video motions are often more complex, and a simple
“shifting in 2D” is not a perfectly suitable description of the motion in the actual scene,
causing so called prediction errors.
• Vector Coding - After determining the motion vectors and evaluating the correction,
these can be compressed. Large parts of MPEG videos consist of B- and P-frames as seen
before, and most of them have mainly stored motion vectors.
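As a concrete illustration of the block-matching step, here is a hedged sketch of an exhaustive search using the sum of absolute differences (SAD) criterion; the frames are assumed to be 2D grayscale NumPy arrays, and the block size and search range are illustrative.

import numpy as np

def best_motion_vector(prev_frame, curr_frame, bx, by, block=16, search=8):
    # Find the displacement (dx, dy) into the previous frame that best matches
    # the current block whose top-left corner is (bx, by).
    target = curr_frame[by:by + block, bx:bx + block].astype(int)
    h, w = prev_frame.shape
    best_sad, best_vec = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + block > w or y + block > h:
                continue                                   # candidate lies outside the frame
            candidate = prev_frame[y:y + block, x:x + block].astype(int)
            sad = np.abs(target - candidate).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_vec = sad, (dx, dy)
    return best_vec, best_sad

Real encoders usually replace this exhaustive search with faster, suboptimal search strategies, since the full search is by far the most expensive part of motion estimation.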
Step 3: Discrete Cosine Transform (DCT)
DCT allows, similar to the Fast Fourier Transform (FFT), a representation of image data in terms
of frequency components. So the frame-blocks (8x8 or 16x16 pixels) can be represented as
frequency components. The transformation into the frequency domain is described by the
following formula (for an 8*8 block):
F(u,v) = (1/4) C(u) C(v) Σ(x=0..7) Σ(y=0..7) f(x,y) cos[(2x+1)uπ/16] cos[(2y+1)vπ/16],
where C(w) = 1/√2 for w = 0 and C(w) = 1 otherwise.

Step 4: Quantization

During quantization, which is the primary source of data loss, the DCT terms are divided by a
quantization matrix, which takes into account human visual perception. The human eye is
more sensitive to low frequencies than to high ones. Higher frequencies end up with a zero entry
after quantization, so the amount of data is reduced significantly.

Step 5: Entropy Coding


The entropy coding takes two steps: Run Length Encoding (RLE ) [2] and Huffman coding [1].
These are well known lossless compression methods, which can compress data, depending on its
redundancy, by an additional factor of 3 to 4.

UNIT 6: USER INTERFACES
The user interface is the point at which human users interact with a computer, website or
application. The goal of effective UI is to make the user's experience easy and intuitive, requiring
minimum effort on the user's part to receive the maximum desired outcome.
They include both input devices like a keyboard, mouse, trackpad, microphone, touch screen,
fingerprint scanner, e-pen and camera, and output devices like monitors, speakers and printers.
Devices that interact with multiple senses are called "multimedia user interfaces." For example,
everyday UI uses a combination of tactile input (keyboard and mouse) and a visual and auditory
output (monitor and speakers).
GUI is the acronym for graphical user interface—the interface that allows users to interact with
electronic devices, such as computers, laptops, smartphones and tablets, through graphical
elements. It’s a valuable part of software application programming in regards to human-computer
interaction, replacing text-based commands with user-friendly actions. Its goal is to present the
user with decision points that are easy to find, understand and use. In other words, GUI lets you
control your device with a mouse, pen or even your finger.
GUI was created because text command-line interfaces were complicated and difficult to learn.
The GUI process lets you click or point to a small picture, known as an icon or widget, and open
a command or function on your devices, such as tabs, buttons, scroll bars, menus, icons, pointers
and windows. It is now the standard for user-centered design in software application programming.

1.1 BASIC DESIGN ISSUES


The main emphasis in the design of multimedia user interfaces is multimedia presentation. There
are several issues which must be considered:
1. To determine the appropriate information content to be communicated.
2. To represent the essential characteristics of the information.
3. To represent the communicative intent.
4. To choose the proper media for information presentation.
5. To coordinate different media and assembling techniques within a presentation.
6. To provide interactive exploration of the information presented.
“The surface representation used by the artifact should allow the person to work with exactly
the information acceptable to the task: neither more nor less. — (Norman, cognitive artifacts,
1991)”

1.1.1 Architectural Issues


An effective presentation design should be as interactive as it is informative. The user should have
the freedom to choose the direction of navigation. This should be supported by user-oriented goals,
context sensitive help and selection of proper media in order to represent the information.

1.1.2 Information Characteristics for Presentation
The complete set of information characteristics makes knowledge definition and representation
easier because it allows for appropriate mapping between information and presentation techniques.
The information characteristics specify:
• Types –characterization schemes are based on ordering information. There are two types
of ordered data:
1. Coordinates vs. amount which specify points in time, space or other domains.
2. Intervals vs. ratio, which suggests the type of comparisons meaningful among elements
of coordinate and amount data types.
• Relational Structures – This group of characteristics refers to the way in which a relation
maps among its domain sets (dependency). There are functional dependencies and non-
functional dependencies. An example of a relational structure which expresses functional
dependency is a bar chart. An example of a relational structure which expresses non-
functional dependency is a student entry in a relational database.
• Multi-domain Relations –Relations can be considered across multiple domains, such as:
1. Multiple attributes of a single object set (e.g.positions, colours, shapes, and/or sizes
of a set of objects in a chart);
2. Multiple object sets (e.g., a cluster of text and graphical symbols on a map);
3. Multiple displays.
• Large Data Sets – Large datasets refers to numerous attributes of collections of
heterogeneous objects (e.g. presentation of semantic networks, databases with numerous
object types and attributes of technical documents for large systems etc.)

1.1.3 Presentation Functions


Presentation function is a program which displays an object (e.g. printf for a display of a character).
It is important to specify the presentation function independent from presentation form, style or
the information it conveys. One approach of it is the set of information-seeking goals and other is
hierarchical representation of media-independent presentation goals derived from a plan-based
theory of communication.

1.1.4 Presentation Design Knowledge


To design a presentation, issues like content selection, media and presentation technique selection
and presentation coordination must be considered.
Content selection is the key to convey the information to the user. The information should be
simple and revealing. Media selection is making a choice of the media that are used to convey the
content. For selecting presentation techniques, rules can be used, e.g. rules for selection
methods, i.e. for supporting a user's ability to locate one of the facts in the presentation. For example,
numerical data can be effectively presented with the help of a graph, while audio would be suitable
for narration. Coordination can be viewed as a process of composition. Coordination needs
mechanisms such as:
• encoding techniques (e.g., among graphical attributes, sentence forms, audio attributes, or
between media)

• presentation objects that represent facts (e.g., coordination of the spatial and temporal
arrangement of points in a chart)
• multiple displays (e.g, windows)

1.1.5 Effective Human-Computer Interaction


One of the most important issues regarding multimedia interfaces is effective human- computer
interaction of the interface, i.e., user-friendliness. Here are the main issues the user interface
designer should keep in mind:
1. Context;
2. Linkage to the world beyond the presentation display;
3. Evaluation of the interface with respect to other human-computer interfaces;
4. Interactive capabilities, and
5. Separability of the user interface from the application.

1.2 CLASSIFICATION OF SOFTWARE: SYSTEM SOFTWARE AND APPLICATION SOFTWARE

1.2.1 Types of System Software


Here are the important types of System Software:
• Operating systems: - Operating system software helps you for the effective utilization of
all hardware and software components of a computer system.
• Programming language translators: - Transforms the instructions prepared by
developers in a programming language into a form that can be interpreted or compiled and
executed by a computer system.
• Communication Software: – Communication software allows us to transfer data and
programs from one computer system to another.
• Utility programs: – Utility programs are a set of programs that help users in system
maintenance tasks, and in performing tasks of routine nature.

1.2.2 Types of Application Software


Here, are some important types of Application Software
• Word-processing software: - It makes use of a computer for creating, modifying, viewing,
storing, retrieving, and printing documents.
• Spreadsheet software: - Spreadsheet software is a numeric data-analysis tool that allows
you to create a computerized ledger.
• Database software: - A database software is a collection of related data that is stored and
retrieved according to user demand.
• Graphics software: - It allows computer systems for creating, editing, drawings, graphs,
etc.
• Education software: - Education software allows a computer to be used as a learning and
teaching tool.

• Entertainment software: - This type of app allows a computer to be used as an
entertainment tool.
Here are major differences between System and Application software:

• System software is designed to manage the resources of the system, like memory and process management, security, etc.; application software is designed to fulfill the requirements of the user for performing specific tasks.
• System software is written in a low-level language like machine or assembly language; a high-level language is used to write application software.
• System software starts running when the system is powered on and runs until the system is powered off; application software starts when the user begins it and ends when the user stops it.
• System software is general-purpose software; application software is specific-purpose software.
• System software is classified as a package program or customized program; application software is classified as time-sharing, resource sharing, client-server.
• System software is installed on the computer system at the time when the operating system is installed; application software is installed as per the user's requirements.
• System software is capable of running independently; application software can't run independently.
• Users never interact with system software as it functions in the background; users interact with application software while using specific applications.
• System software is independent of the application software; application software needs system software to run.
• System software is crucial for the effective functioning of a system; application software is not extremely important for the functioning of the system.

1.3 VIDEO AT THE USER INTERFACE


Video is actually a continuous sequence of still images, replaced at a rate of at least 15 images per second (30 images per second is used for better quality). Thus video can be manipulated using the same kind of interface that is used to manipulate images.
The user should be allowed to navigate through the video in both the forward and backward directions, possibly by the use of a slider. The properties of the video, like contrast and sharpness, should be adjustable, and if there is audio too, the user should be allowed to fine-tune it. These functionalities are not simple to deliver because the high data transfer rate necessary is not guaranteed by most of the hardware in current graphics systems.

1.3.1 Hardware for Visualization of Motion Pictures


Special hardware for visualization of motion pictures is available today, mostly through additional
video cards. Early examples of such additional hardware are IBM-M-Motion and ActionMedia II
(Intel/IBM) cards, and the Parallax, Sun and RasterOps cards. Today, these cards have become an
integral part of the multimedia system.
Most motion video components integrated in a window system use the chromakey methods where
an application generates a video window with a certain colour. Traditionally, this colour is a certain
blue (coming from a video technique used in the TV). The window system handles, in general, the
video window as a monochrome pixel graphic window, but on the device level, there is a switch
which allows for the selection of the display between the standard graphics and motion video. This
switch usually brings the standard graphics to the screen. If the hardware switch detects motion
video, such a video window presents the video signal taken directly from a camera. Using a
communication-capable multimedia system, this camera can be controlled remotely. The video
data may be transmitted from the camera into a computer network and then displayed.

1.3.2 Example Camera Control Application


Remote camera control is used, for example, in surveillance applications. Another example is a
microscope, remotely controlled in a telesurgery environment. We discuss below an application in
which an engineer remotely controls a CIM-completion process with the help of a remote-control
video camera.
Application Specification:
A camera is connected to a computer which serves as a camera server through a standardized
analogue interface. The camera control occurs, for example, through a serial RS-232-C interface.
The camera server sends commands such as focus, zoom and position to the camera through this
serial interface. The actual control of the camera is initiated by the camera-client, which can be
located remotely.
In addition to the data path for camera control, there is also a video path, i.e. the video data are
digitized, compressed and sent by the camera-server to the camera-client where the engineer is
located. The video image taken from the camera is displayed.
User interface:
In this case, the simplest decision would have been to use the keyboard. Fixed control functions
could be assigned to individual keys. For example, the keys left, right, up, and down would move
the camera in the corresponding directions.
In a window system, individual buttons can be programmed to position a camera. Pushing the
buttons initiates the positioning process. The particular movement is stopped explicitly with the stop
button. Another possibility to position a camera is by pushing and releasing a button, i.e.
continuous movement of the camera follows through several consecutive ‘push’ and ‘release’
button actions.

Instead of using buttons in a window system, positioning in different access can also be done
through scrollbars.
Direct Manipulation of the Video Window
In our setup we decided to use a very user-friendly variant known as direct manipulation of the
video window. There are two possibilities:
1. Absolute Positioning: Imagine a tree in the upper right corner of the video window. The
user positions the cursor on this object and double-clicks with the mouse. Now, the camera
will be positioned so that the tree is the center of the video window, i.e., the camera moves
in the direction of the upper right corner. This method of object pointing and activating a
movement of camera is called absolute positioning. The camera control algorithm must
derive the position command from:
o the relative position of the pointer during the object activation in the video
window; and,
o the specified focal distance.
2. Relative Positioning: Imagine the pointer to the right of the center of the video window.
By pushing the mouse button, the camera moves to the right. The relative position of the
pointer with respect to the center of the video window determines the direction of the
camera movement. When the mouse button is released, the camera movement stops. This
kind of direct manipulation in the video window is called relative positioning. A camera
can move at different speeds. A speed can be specified through the user interface as
follows:
o If the mouse has several buttons, different speeds can be assigned to each button.
For example, the left mouse button could be responsible for slow, accurate motion
(e.g., for calibration of the camera). The right buttons could be for fast movement
of the camera.
o Instead of working with several mouse buttons, the distance of the pointer to the
window center could determine the speed; the larger the distance, the faster the
movement of camera.
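The relative-positioning mapping just described can be sketched as follows; the dead zone, speed cap and command names are assumptions made for illustration and are not part of any particular camera protocol.

def camera_command(pointer, window_size, dead_zone=10, max_speed=5):
    # Map the pointer position in the video window to a pan/tilt command:
    # direction from the offset to the window centre, speed from its magnitude.
    cx, cy = window_size[0] / 2, window_size[1] / 2
    dx, dy = pointer[0] - cx, pointer[1] - cy
    if abs(dx) < dead_zone and abs(dy) < dead_zone:
        return "stop", 0
    speed = min(max_speed, int(max(abs(dx), abs(dy)) / dead_zone))
    direction = ("right" if dx > 0 else "left") if abs(dx) >= abs(dy) \
                else ("down" if dy > 0 else "up")
    return direction, speed

print(camera_command(pointer=(500, 250), window_size=(640, 480)))   # ('right', 5)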

1.4 AUDIO AT THE USER INTERFACE


Audio can be implemented at the user interface for application control. Thus, speech analysis is
necessary. Speech analysis is either speaker-dependent or speaker-independent. Speaker-
dependent solutions allow the input of approximately 25,000 different words with a relatively low
error rate.
During audio output, the additional presentation dimension of space can be introduced using two
or more separate channels to give a more natural distribution of sound. The best known example
of this is stereo.
For example, during a conference with four participants, a fixed place is assigned to each
participant. The motion video of participant L is displayed in the upper left corner of the screen.
The corresponding sound of this participant is transmitted only through left speaker. Participant M
is visually and acoustically located in the middle. Participant R is positioned to the right. In this

example, the conference system always activates the video window with the loudest-speaking
participant. The recognition of the loudest acoustic signal can be measured over a duration of five
seconds. Therefore, short, unwanted and loud signals can be compensated for.
In the case of monophony, all audio sources have the same spatial location. A listener can only
properly understand the loudest audio signal. The same effect can be simulated by closing one ear.
Stereophony allows listeners with bilateral hearing capabilities to hear lower intensity sounds.
The concept of the audio window allows for application independent control of audio parameters,
including spatial positioning. Most current multimedia applications using audio determine the
spatial positioning themselves and do not allow the user to change it. An example of such an
application is the audio tool for SUN workstations. The figure below shows the user interface of
this audio tool:

1.5 USER- FRIENDLINESS AS THE PRIMARY GOAL


User friendliness is the main property of a good user interface. The design of user-friendly
graphical interface requires the consideration of many conditions. The addition of audio and video
to the user interface does not simplify this process. The user-friendliness is implemented by:

1.5.1 Easy to Learn instructions


The instructions guiding the use of interface should be easy to learn. The language should be
simple and graphical. The older dial phones required no time to learn. An ISDN telephone requires
more time as compared to a simple touch phone.

1.5.2 Presentation
The presentation, i.e., the optical image at the user interface, can have the following variants:
• Full text
• Abbreviated text
• Icons i.e. graphics

• Micons i.e. motion video

1.5.3 Dialogue Boxes


Different dialogue boxes should have a similar construction. This requirement applies to the design
of :
• the buttons OK and Abort
• Joined Windows
• Other applications in the same window system
Semantically similar entry functions can be located in one dialogue box instead of several dialogue
boxes.

1.5.4 Additional Design Criteria


Some additional useful hints for designing a user-friendly interface should be mentioned:
• The form of the cursor can change to visualize the current state of the system. For
example, a rotating fish instead of a static pointer shows that a task is in progress.
• If time intensive tasks are performed, the progress of the task should be presented. For
example, during the formatting of a disk, the amount formatted is displayed through a
filling bar; during the remote retrieval of a file, the number of transmitted bytes in relation
to the whole size of the file is presented. This display allows the user to evaluate the state
of the task and react to it, i.e., let the task continue or cancel it. Thus, the Abort function to
cancel the activity should always be present during a time-intensive task.
• A selected entry should be immediately highlighted as "work in progress" before
performance actually starts. This approach ensures that no further input is given to the
entry.

1.5.5 Design-specific Criteria


In addition to the above-described general criteria for the design of a user interface, the problem
specific properties of the actual task need to be considered. These properties are demonstrated
in our telephone service example. The telephone network and telephone-specific end-devices
are provided by the telephone companies. They specify the user interface characteristics:
1. The end-device must have the basic function of dialling a number. The requirement may
be that the dialling is performed using keys and that there is an alphanumeric, single- line
display. This requirement provides compatibility among different phone devices. In a
multimedia system, dialling with keys can be programmed, but it is not very meaningful;
the main advantages of the different media are unused. To provide compatibility, a key set
with corresponding user procedures should be emulated.
2. Ongoing tasks should be signalled. For example, if the call re-routing function is activated,
this function should be signalled optically on the device. In the case of a telephone
(computer) application, its state does not have to be displayed on the whole screen. A
telephone icon can be used if no window is opened, but the application is still active.
3. A telephone device must always be operational. This requirement influences the
corresponding hardware and software. If a telephone service is implemented on PCs as a

multimedia application, these devices are not always meant to be operational for 24 hours.
It also cannot easily become operational when a call arrives.
4. The state of the telephone-device (i.e., telephone application) must be always visible.
While working with the telephone application, it gets into different states. In different states
different functions are performed. Some states imply that a function can be selected, some
states imply that a function cannot be selected. The non-selectable functions can be:
• Nonexistent: The function disappears when no activation is possible.
• Displayed: The function is displayed, but marked as deactivated and any
interaction is ignored; for example, deactivated menu functions are displayed in
gray, the active functions are displayed in black.
• Overlapped: If a function is overlapped with another window which is designed as
the confirmer, this function cannot be selected. First, after closing the confirmer,
other input can be taken.
It is important to point out that the functions most often used are always visible in the form of a
control panel. It is necessary to pick carefully which functions will belong in the control panel.
5. When a call request arrives, it must be immediately signalled (e.g., ringing).
Design of a user interface is also influenced by a specific implementation environment. For
example, in addition to the primitives of the particular window system, the quality of the graphical
terminal with its resolution, size and colour-map is important.

THANK YOU!!!
Assignments:
1. What are the issues which must be considered at design of multimedia user interface?
2. How can user interface be made user-friendly? Explain
3. What are the primary goals which should be considered at design of multimedia user
interface?

Unit 7: Abstractions for programming
The state of the art of programming
Most of the current commercially available multimedia applications are implemented in
procedure-oriented programming languages. Application code is still highly dependent on
hardware; a change of multimedia devices still often requires re-implementation. Common
operating system extensions try to attack these problems. This unit discusses different
programming possibilities for accessing and representing multimedia data.

1 ABSTRACTION LEVELS

Abstraction levels in programming define different approaches with a varying degree of detail
for representing, accessing and manipulating data. The abstraction levels with respect to
multimedia data and their relations among each other are shown in the figure below.

Figure 1: Abstraction Levels of the Programming of Multimedia Systems


A device for processing continuous media can exist as a separate component in a computer.
this case, a device is not part of the OS, but is directly accessible to every component and
application.

A library, the simplest abstraction level, includes the functions necessary for controlling the
corresponding hardware with specific device access operations. Libraries are very useful at
the OS level, but there is no agreement over which functions are best for different drivers.
As with any device, multimedia devices can be bound through a device driver, respectively
with the OS.
The processing of the continuous data becomes part of the system software. Continuous data
processing requires appropriate schedulers, such as a rate-monotonic scheduler or an
earliest-deadline-first scheduler. Multimedia device drivers embedded in the OS simplify
considerably the implementation of device access and scheduling.
A simpler approach in a programming environment than the system software interface for the
control of the audio and video data processing can be taken by using toolkits. Toolkits can also
hide process structures.
A language used to implement multimedia applications that contains abstractions of multimedia
data is known as a higher procedural programming language.
The object-oriented approach was first introduced as a method for the reduction of complexity
in software development, and it is still used mainly with this goal today. It provides the
application with a class hierarchy for the manipulation of multimedia data.

2 LIBRARIES

Libraries contain the set of functions used for processing the continuous media. Libraries are
provided together with the corresponding hardware. Some libraries can be considered as
extensions of the GUI, whereas other libraries consist of control instructions passed as control
blocks to the corresponding drivers.
Libraries are very useful at the operating system level. Since the OS does not provide sufficient
support for continuous data and no integration into the programming environment exists, there
will always be a variety of interfaces and hence a set of different libraries.
Libraries differ in their degree of abstraction.

3 SYSTEM SOFTWARE

Instead of implementing access to multimedia devices through individual libraries, the device
access can become part of the OS, e.g. the Nemo system.
The Nemo system consists of the Nemo trusted supervisor call (NTSC) running in
supervisor mode and three domains running in user mode: system, device drivers and applications.

The NTSC code implements those functions which are required by user-mode processes. It
provides support for three types of processes. System processes implement the majority of the
services provided by the OS. Device processes are similar to system processes, but are attached
to device interrupt stubs which execute in supervisor mode.
The NTSC calls are separated into two classes: one containing calls which may only be
executed by a suitably privileged system process, such as the kernel, the other containing calls
which may be executed by any process.
The NTSC is responsible for providing an interface between a multimedia hardware device and its
associated driver process. This device driver arrangement is used when a device has only a
low-level hardware interface to the system software. Application processes contain user
programs. Processes interact with each other via the system abstraction IPC (Inter-Process
Communication). IPC is implemented using the low-level system abstraction of events and, if
required, shared memory.
Data as time capsule:
Time capsules are a special abstraction related to file systems. These file extensions
serve for the storage, modification and access of continuous media. Each logical data unit (LDU)
carries in its time capsule, in addition to its data type and actual value, its valid life span. This
concept is used more widely in video than in audio.
Data as streams:
A stream denotes the continuous flow of audio and video data. A stream is established between
source and sink before the flow starts. Operations such as play, fast forward, rewind and stop can
be performed on a stream.
In Microsoft windows, a media control interface (MCI) provides the interface for processing
multimedia data. It allows the access to continuous media streams and their corresponding
devices.

4 TOOLKITS

Toolkits are used for controlling the audio and video data processing in a programming
environment. Toolkit hides the process structures. It represents interfaces at the system
software level. Toolkits are used to:
• Abstract from the actual physical layer.
• Allow a uniform interface for communication with all different devices of continuous
media
• Introduce the client-server paradigm
Toolkits can also hide process structures. It would be of great value for the development of
multimedia application software to have the same toolkit on different system platforms, but
according to current experience, this remains a wish, and it would cause a decrease in
performance.
Toolkits should represent interfaces at the system software level. In this case, it is possible to
embed them into programming languages or object-oriented environments. Hence, they become
the available abstractions in the subsequent sections on programming languages and
object-oriented approaches.

5 HIGHER PROGRAMMING LANGUAGES


In the higher programming languages, the processing of continuous media data is influenced by a
group of similar constructed functions. These calls are mostly hardware and driver independent. The
programs in a high-level language (HLL) either directly access multimedia data structures, or
communicate directly with the active processes in the real-time environment. The processing devices
are controlled through corresponding device drivers.

Media can be considered differently inside a programming language.

• Media as types
• Media as files
• Media as processes
• Programming language requirements
➢ Inter-process communication mechanism
➢ Language

5.1 MEDIA AS TYPES:


For example, consider the programming expressions used in OCCAM-2, which was derived from
Communicating Sequential Processes (CSP). This language is used for the programming of transputers.
This notation was chosen because of its simplicity and its embedded expression of parallel behaviour.

a, b REAL;
ldu.left1, ldu.left2, ldu.left_mixed AUDIO_LDU;
...
WHILE
  COBEGIN
    PROCESS_1
      input(micro1, ldu.left1)
    PROCESS_2
      input(micro2, ldu.left2)
  ldu.left_mixed := a*ldu.left1 + b*ldu.left2;
  ...
END WHILE
...

One alternative to programming with an HLL and libraries is the concept of media as types. In
this example, two LDUs from microphones are read and mixed. Here, the data types for
video and audio are defined. In the case of text, character is the type (the smallest addressable
element). A program can address such characters through functions and sometimes directly through
operators. They can be copied, compared with other characters, deleted, created, read from a file or
stored. Further, they can be displayed, be part of other data structures, etc.

5.2 MEDIA AS FILES


Another possibility of programming continuous media data is the consideration of continuous media
streams as files instead of data types.

file_h1 = open(MICROPHONE_1, ...)
file_h2 = open(MICROPHONE_2, ...)
file_h3 = open(SPEAKER, ...)
...
read(file_h1)
read(file_h2)
mix(file_h3, file_h1, file_h2)
activate(file_h1, file_h2, file_h3)
...
deactivate(file_h1, file_h2, file_h3)
rc1 = close(file_h1)
rc2 = close(file_h2)
rc3 = close(file_h3)

The example describes the merging of two audio streams. The physical file is associated during the
open process of a file with a corresponding file name. The program receives a file descriptor through
which the file is accessed. In this case, a device unit, which creates or processes continuous data
streams, can be associated with a file name.

Read and write functions are based on continuous data stream behaviour. Therefore, a new value is
assigned continuously to a specific variable which is connected, for example with one read function.
On other hand, the read and write functions of discrete data occur in separate steps. For each
assignment of a new value from a file to the corresponding variable, the read function is called again.

5.3 MEDIA AS PROCESSES


The processing of continuous data contains a time-dependency because the life span of a process
equals the life span of a connection between source and destination. A connection can exist locally,
as well as remotely. Under this consideration, it is possible to map continuous media to processes and
to integrate them in an HLL.

PROCESS cont_process_a;
  ...
  on_message_do
    set_volume ...
    set_loudness ...
    ...
  ...

[main]
pid = create(cont_process_a)
send(pid, set_volume, 3)
send(pid, set_loudness)
...

In the above example, the process cont_process_a implements a set of actions which apply to a
continuous data stream; two of them are the modification of volume (set_volume) and the
setting of a volume dependent on a band filter (set_loudness).

During the creation of the process, the identification and reservation of the used physical devices
occur. The different actions of the continuous process are controlled through an IPC mechanism.

5.4 PROGRAMMING LANGUAGE REQUIREMENTS


An HLL should support parallel processing. The number of processes does not always have to be
known at compile time; processes should be definable dynamically at run-time.

• Interprocess communication mechanism:
➢ The IPC mechanism must be able to transmit the audio and video in a timely fashion
because these media have a limited life span. The IPC must be able to:-
➢ Understand a prior and/or implicitly specified time requirements.
➢ Transmit the continuous data according to the requirements.
➢ Initiate the processing of the received continuous process on time.
• Language
➢ A simple language should be developed for the purpose of simplicity.
➢ An example of such language is OCCAM-2, ADA, parallel C-variant for transputer etc.

5.5 INTERPROCESS COMMUNICATION MECHANISM


Different processes must be able to communicate through an Inter-Process Communication
mechanism. This IPC mechanism must be able to transmit audio and video in a timely fashion because
these media have a limited life span. Therefore, the IPC must be able to:

• Understand a prior and/or implicitly specified time requirements.


• Transmit the continuous data according to the requirements.
• Initiate the processing of the received continuous process on time.

5.6 LANGUAGE
The authors see no demand for the development of a new dedicated language. A partial language
replacement is also quite difficult because co-operation between the real-time environment and the
remaining programs requires semantic changes in the programming language. Since the IPC must be
designed and implemented for real-time operation, the existing IPC cannot simply be reused.

A language extension is the solution proposed here. For the purpose of simplicity, a simple language
should be developed which satisfies most of the above described requirements. An example of such
language is OCCAM-2.

6 OBJECT –ORIENTED APPROACHES


The object-oriented approach was first introduced as a method for the reduction of complexity in the
software development and it is used mainly with this goal today. Further, the reuse of software
components is a main advantage of this paradigm. The basic ideas of object-oriented programming
are: data encapsulation and inheritance, in connection with class and object definitions. The programs
are implemented, instead of using functions and data structures, by using classes, objects, and
methods.

6.1 CLASS
A class represents a collection of objects having same characteristic properties that exhibit common
behaviour. It gives the blueprint or description of the objects that can be created from it. Creation of
an object as a member of a class is called instantiation. Thus, object is an instance of a class.

Example:

Let us consider a simple class, Circle, that represents the geometrical figure circle in a two–dimensional
space. The attributes of this class can be identified as follows –

▪ x–coord, to denote x–coordinate of the center
▪ y–coord, to denote y–coordinate of the center
▪ a, to denote the radius of the circle

Some of its operations can be defined as follows –

▪ findArea(), method to calculate area


▪ findCircumference(), method to calculate circumference
▪ scale(), method to increase or decrease the radius
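A minimal Python sketch of this Circle class; the method bodies are the obvious ones and are only illustrative.

import math

class Circle:
    # Geometrical figure circle in a two-dimensional space.
    def __init__(self, x_coord, y_coord, a):
        self.x_coord = x_coord          # x-coordinate of the centre
        self.y_coord = y_coord          # y-coordinate of the centre
        self.a = a                      # radius

    def findArea(self):
        return math.pi * self.a ** 2

    def findCircumference(self):
        return 2 * math.pi * self.a

    def scale(self, factor):
        # Increase or decrease the radius.
        self.a *= factor

c = Circle(0, 0, 2)                     # instantiation: c is an object of class Circle
print(c.findArea())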

6.2 OBJECT
An object is a real-world element in an object–oriented environment that may have a physical or a
conceptual existence. Each object has –

▪ Identity that distinguishes it from other objects in the system.


▪ State that determines the characteristic properties of an object as well as the values of the
properties that the object holds.
▪ Behaviour that represents externally visible activities performed by an object in terms of
changes in its state.

Objects can be modelled according to the needs of the application. An object may have a physical
existence, like a customer, a car, etc.; or an intangible conceptual existence, like a project, a process,
etc.
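
Continuing the Circle sketch from Section 6.1 (and assuming that class is in scope), instantiation and the identity, state and behaviour of objects can be illustrated as follows:

// Each object below is a distinct instance of the Circle class, with its own
// identity and state; behaviour is observed through its methods.
#include <iostream>

int main() {
    Circle unit(0.0, 0.0, 1.0);     // one object ...
    Circle big(2.0, 3.0, 10.0);     // ... and another, holding different state

    std::cout << unit.findArea() << '\n';
    big.scale(0.5);                 // changes only big's state, not unit's
    std::cout << big.findCircumference() << '\n';
    return 0;
}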

6.3 INHERITANCE
The concept allows us to inherit or acquire the properties of an existing class (parent class) into a
newly created class (child class). It is known as inheritance. It provides code reusability.

The existing classes are called the base classes/parent classes/super-classes, and the new classes are
called the derived classes/child classes/subclasses. The subclass can inherit or derive the attributes
and methods of the super-class(es) provided that the super-class allows so. Besides, the subclass may
add its own attributes and methods and may modify any of the super-class methods. Inheritance
defines an “is-a” relationship.

Example

From a class Mammal, a number of classes can be derived such as Human, Cat, Dog, Cow, etc. Humans,
cats, dogs, and cows all have the distinct characteristics of mammals. In addition, each has its own
particular characteristics. It can be said that a cow “is-a” mammal.
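
A hedged C++ sketch of this “is-a” relationship; the methods shown are illustrative assumptions rather than part of the example above.

// Cow inherits the general members of Mammal and adds its own.
#include <string>

class Mammal {
public:
    bool isWarmBlooded() const { return true; }    // common mammal characteristic
    void breathe() { /* behaviour shared by every subclass */ }
};

class Cow : public Mammal {                        // Cow "is-a" Mammal
public:
    std::string produceMilk() const { return "milk"; }   // subclass-specific behaviour
};

// Usage: Cow c; c.breathe(); c.produceMilk();  // inherited and own methods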

6.4 POLYMORPHISM
Polymorphism is originally a Greek word that means the ability to take multiple forms. In object-
oriented paradigm, polymorphism implies using operations in different ways, depending upon the
instance they are operating upon. Polymorphism allows objects with different internal structures to
have a common external interface. Polymorphism is particularly effective while implementing
inheritance.

Example

Let us consider two classes, Circle and Square, each with a method findArea(). Though the name and
purpose of the methods in the classes are same, the internal implementation, i.e., the procedure of

calculating area is different for each class. When an object of class Circle invokes its findArea() method,
the operation finds the area of the circle without any conflict with the findArea() method of the Square
class.
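
The following independent C++ sketch illustrates this behaviour. The abstract base class Shape and the use of virtual functions are assumptions added here so that findArea() can be invoked through a common external interface.

#include <iostream>
#include <memory>
#include <vector>

class Shape {
public:
    virtual ~Shape() = default;
    virtual double findArea() const = 0;      // common external interface
};

class Circle : public Shape {
public:
    explicit Circle(double r) : radius(r) {}
    double findArea() const override { return 3.141592653589793 * radius * radius; }
private:
    double radius;
};

class Square : public Shape {
public:
    explicit Square(double s) : side(s) {}
    double findArea() const override { return side * side; }   // different internal implementation
private:
    double side;
};

int main() {
    std::vector<std::unique_ptr<Shape>> shapes;
    shapes.push_back(std::make_unique<Circle>(1.0));
    shapes.push_back(std::make_unique<Square>(2.0));
    for (const auto& s : shapes)
        std::cout << s->findArea() << '\n';   // each call dispatches to the correct class
    return 0;
}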

[Note: see book for more references]

6.5 APPLICATION-SPECIFIC METAPHORS AS CLASSES


A multimedia metaphor is a set of user-interface visuals, actions and procedures that exploit specific
knowledge that users already have of other domains. An application-specific class hierarchy
introduces abstraction specifically designed for a particular application. Thus, it is not necessary to
consider other class hierarchies. Using this approach, one very easily abandons the actual advantages
of object-oriented programming, i.e. the reuse of existing code.

6.6 APPLICATION-GENERIC METAPHORS AS CLASSES


This approach combines similar functionalities of all applications. Properties or functions which occur repeatedly can be defined and implemented as classes shared by all applications. An application is then defined only through a binding of these classes. For example, basic functions or functional units can be modelled as classes, and application-specific subclasses inherit their general methods.

6.7 DEVICES AS CLASSES


In this section we consider objects which reflect a physical view of the multimedia system. The devices
are assigned to objects which represent their behaviour and interface.

Methods with similar semantics, which interact with different devices, should be defined in a device-
independent manner. Internally, these methods use operations such as start, stop and seek. Some units can manipulate several media together.

A computer-controlled VCR or a Laser Disc Player (LDP) are storage units which, by themselves,
integrate (bind) video and audio. In a multimedia system, abstract device definitions can be provided,
e.g., camera and monitor. We did not say anything until now about the actual implementation. The
results show that defining a general and valid interface for several similar audio and video units, as
well as input and output units, is quite a difficult design process.
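
A possible C++ sketch of such device classes follows; the interface and class names (MediaDevice, VCR, LaserDiscPlayer) are illustrative assumptions, not a defined standard interface.

// A device-independent interface specialized by concrete units.
#include <cstdint>

class MediaDevice {                        // abstract device definition
public:
    virtual ~MediaDevice() = default;
    virtual void start() = 0;              // device-independent operations
    virtual void stop() = 0;
    virtual void seek(std::int64_t position) = 0;
};

class VCR : public MediaDevice {           // storage unit binding video and audio
public:
    void start() override { /* send a play command to the recorder */ }
    void stop() override  { /* send a stop command */ }
    void seek(std::int64_t position) override { /* wind the tape to position */ }
};

class LaserDiscPlayer : public MediaDevice {
public:
    void start() override { /* start disc playback */ }
    void stop() override  { /* stop playback */ }
    void seek(std::int64_t position) override { /* jump to the requested frame */ }
};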

6.8 PROCESSING UNITS AS CLASSES


This abstraction comprises source objects, destination objects and combined source-destination
objects which perform intermediate processing of continuous data. With this approach, a kind of
“lego” system is created which allows for the creation of a dataflow path through a connection of
objects. The outputs of objects are connected with inputs of other objects, either directly or through
channel objects.
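
A minimal C++ sketch of this “lego” idea, with assumed class names (Camera, Mixer, Monitor) standing in for source, combined source-destination and destination objects:

#include <cstdint>
#include <vector>

using Frame = std::vector<std::uint8_t>;

class Consumer {                                     // has an input
public:
    virtual ~Consumer() = default;
    virtual void put(const Frame& f) = 0;
};

class Producer {                                     // has an output
public:
    void connect(Consumer* next) { downstream = next; }
protected:
    void emit(const Frame& f) { if (downstream) downstream->put(f); }
private:
    Consumer* downstream = nullptr;
};

class Camera : public Producer {                     // source object
public:
    void capture() { emit(Frame(640 * 480, 0)); }    // produce one dummy frame
};

class Mixer : public Producer, public Consumer {     // combined source-destination object
public:
    void put(const Frame& f) override { emit(f); }   // intermediate processing, then pass on
};

class Monitor : public Consumer {                    // destination object
public:
    void put(const Frame&) override { /* display the frame */ }
};

// Usage: camera.connect(&mixer); mixer.connect(&monitor); camera.capture();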

6.9 DISTRIBUTION OF BMOS AND CMOS


In a multimedia environment, the data elements are more complex, taking the form of video, voice,
text, images and may be real time in nature or can be gathered from a stored environment. More
importantly, the separate data objects may be combined into more complex forms so that the users may
want to create new objects by concatenating several simpler objects into a complex whole. Thus, we
can conceive of a set of three objects composed of an image, a voice annotation and a pointer motion

annotating the voice annotation. The combination of all three of these can also be viewed as a single
identifiable multimedia object.

We can consider a multimedia data object to be composed of several related multimedia data objects
which are a voice segment, an image and a pointer movement (e.g. mouse movement). As we have
just described, these can be combined into a more complex object. We call the initial objects Simple
Multimedia Objects (SMOs) and the combination of several a Compound Multimedia Object (CMO).
In general a multimedia communications process involves one or multiple SMOs and possibly several
CMOs.

The SMO contains two headers, which are to be defined, and a long data string. The data string we call a Basic Multimedia Object (BMO). There are two types of BMOs. The first type we call a segmented BMO, or SG:BMO. It has a definite length in data bits and may result either from a stored data record or from a generated record that has a natural data length, such as a single image, screen or text record. The second type of BMO is a streamed BMO, or ST:BMO. This BMO has an a priori undetermined duration; thus it may be a real-time voice or video segment.

A CMO has two headers, the Orchestration header and the Concatenation header. The Orchestration
header describes the temporal relationship between the SMOs and ensures that they are not only
individually synchronized but also jointly orchestrated. The orchestration concept has also
been introduced by Nicolaou. The concatenation function provides a description of the logical and
spatial relationships amongst the SMOs.
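
The object structure described above could be sketched in C++ as follows. The field names are placeholders, since the text leaves the exact header contents to be defined.

#include <cstdint>
#include <string>
#include <vector>

enum class BmoKind { Segmented, Streamed };   // SG:BMO has a definite length,
                                              // ST:BMO an a priori undetermined duration
struct BMO {
    BmoKind kind;
    std::vector<std::uint8_t> data;           // the long data string
};

struct SMO {
    std::string header1;                      // first header (contents to be defined)
    std::string header2;                      // second header (contents to be defined)
    BMO body;
};

struct CMO {
    std::string orchestrationHeader;          // temporal relationships between the SMOs
    std::string concatenationHeader;          // logical and spatial relationships
    std::vector<SMO> parts;                   // e.g. image + voice annotation + pointer motion
};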

We can also expand the concept of a CMO as a data construct that is created and managed by multiple
users at multiple locations. In this construct we have demonstrated that N users can create a CMO by
entering multiple SMOs into the overall CMO structure. The objectives of the communications system
are thus focused on meeting the interaction between users who are communicating with CMOs.
Specifically we must be able to perform the following tasks:

▪ Allow any user to create an SMO and a CMO.


▪ Allow any user or set of users to share, store, or modify a CMO.
▪ Ensure that the user-to-user communication preserves the temporal, logical and spatial relationships between all CMOs at all users at all times.
▪ Provide an environment to define, manage and monitor the overall activity.
▪ Provide for an environment to monitor, manage and restore all services in the event of system
failures or degradation.

6.10 MEDIA AS CLASSES


The media class hierarchy defines a hierarchical relation between different media. The following example shows such a class hierarchy; the individual methods of the classes are not described here. The class Pixel in this hierarchy uses, for example, multiple inheritance.

Medium
    AcousticMedium
        Music
            Opus
            Note
        Audio_Block
            Sample_Value
        Speech
            .......
    Optical_Medium
        Video
            Video_Scene
                Image
                    Image_Segment
                        Pixel
                    Line
                        Pixel
                    Column
                        Pixel
        Animation
            .......
        Text
            .......
    other_continuous_medium
    discrete_medium

A specific property of all multimedia objects is the continuous change of their internal states during
their life spans. Data transfer of continuous media is performed as long as the corresponding
connection is activated.
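
As a small illustration of the multiple inheritance mentioned for Pixel, the following C++ sketch lets Pixel derive from Image_Segment, Line and Column. The empty base classes and the colour members are assumptions.

class Image_Segment { /* part of an image */ };
class Line          { /* horizontal run of pixels */ };
class Column        { /* vertical run of pixels */ };

// Pixel is derived from more than one class of the hierarchy.
class Pixel : public Image_Segment, public Line, public Column {
public:
    unsigned char r = 0, g = 0, b = 0;   // example of the object's internal state
};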

6.11 COMMUNICATION-SPECIFIC METAPHORS AS CLASSES


Communication-oriented approaches often consider objects in a distributed environment through an
explicit specification of classes and objects tied to a communication system. Blakowski specifies
information, presentation and transport classes.

The information contained in information objects can be used to build presentation objects, which are later used for the presentation of the information. Information objects can also be converted into transport objects for transmission purposes. Information is thus often processed differently depending on whether it is to be presented, transmitted or stored.
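
A hedged C++ sketch of this separation; the class names and conversion methods are assumptions used only to illustrate deriving presentation and transport objects from an information object.

#include <cstdint>
#include <utility>
#include <vector>

struct PresentationObject {                   // form suited to rendering for the user
    std::vector<std::uint8_t> renderableData;
};

struct TransportObject {                      // form suited to transmission over a network
    std::vector<std::uint8_t> packetizedData;
};

class InformationObject {                     // the information itself, independent of its use
public:
    explicit InformationObject(std::vector<std::uint8_t> content)
        : data(std::move(content)) {}

    PresentationObject toPresentation() const { return {data}; }  // decoding/layout omitted
    TransportObject    toTransport()    const { return {data}; }  // packetization omitted
private:
    std::vector<std::uint8_t> data;
};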

