Multimedia Note
The words “multi” and “media” are combined to form the word multimedia. “Multi” signifies
“many.” Multimedia is a medium that allows information to be easily transferred from one
location to another. Multimedia is an interactive medium that provides multiple ways to present
information to the user in a powerful manner. It provides interaction between users and digital
information and is a medium of communication. Some of the sectors where multimedia is used
extensively are education, training, reference material, business presentations, advertising and
documentaries.
Multimedia becomes interactive multimedia when a user is given the option of controlling its
elements, such as text, drawings, still and moving images (video), graphics, audio, animation,
and any other media in which information can be expressed, stored, communicated and
processed digitally. “Multi” and “media” together refer to the many types of media
(hardware/software) used for communicating information.
1.1.1 Objective of multimedia
• Data representation
• Data processing
• Data compression
• Data transmission
• Mobile games
• Data security
• Human computer interaction
or media archives. Examples are electronic publishing, online galleries or weather
information systems.
• Remote Representation: By means of a remote representation system, a user can take
part in or monitor events at a remote location. Important examples are distance
conferencing or lecturing, virtual reality, and remote robotic agents.
• Entertainment: This major application area of multimedia technology is strongly
oriented towards the audio and video data. Example entertainment applications are
digital television, video on demand, distributed games or interactive television.
• Multimedia in Marketing and Advertising: By using multimedia, the marketing of new
products can be greatly enhanced. Multimedia boosts communication at an affordable
cost and has opened the way for marketing and advertising personnel.
• Multimedia in Hospitals: Multimedia is best used in hospitals for real-time monitoring
of the condition of patients who are critically ill or injured. The condition is displayed
continuously on a computer screen and can alert the doctor or nurse on duty if any changes
are observed.
• Device domain:
It deals with interaction between multimedia application and multimedia devices such
as Accelerated Graphics Port (AGP) Card, Sound Card etc. Basic concepts for the
processing of digital audio and video data are based on digital signal processing.
Different methods for the processing of image, graphics and animation are described.
The audio techniques section includes music, Musical Instrument Digital Interface
(MIDI) and speech processing.
• System Domain:
The interface between the device domain and the system domain is specified by the
computer technology. To utilize the device domain, several system services are needed.
Basically, three services exist; these services are mostly implemented in software. The
operating system serves as an interface between the computer hardware/system and all
other software components. It provides the user with a programming and computational
environment, which should be easy to operate. The database system allows a structured
access to data and a management of large databases. The communication system is
responsible for data transmission according to the timing and reliability requirements
of the networked multimedia.
• Application domain:
Provides functions to the user to develop and present multimedia projects. This includes
software tools, and multimedia projects development methodology. The services of the
system domain are offered to the application domain through proper programming
abstractions. Another topic embedded in the application domain is document handling.
• Cross domain:
It turns out that, some aspects such as synchronization aspects, are difficult to locate in
one or two components or domains. The reason is that synchronization, being the
temporal relationship among various media, relates to many components across all
domains.
1. Perception Medium: Perception media help humans sense their environment. The
central question is how humans perceive information in a computer environment. The
answer is: through seeing and hearing.
Seeing: For the perception of information through seeing, visual media such as text,
images and video are used.
Hearing: For the perception of information through hearing, media such as
music, noise and speech are used.
2. Representation medium: Representation media are defined by the internal computer
representation of information. The central question is: how is information coded in the
computer? The answer is that various formats are used to represent media information in
the computer.
3. Presentation medium: Presentation media refer to the tools and devices for the input
and output of information. The central question is: through which devices is information
delivered by the computer, and through which is it introduced to the computer?
4. Storage medium: Storage media refer to the data carriers which enable the storage of
information. The central question is: how will information be stored? The answer is:
hard disk, CD-ROM, floppy disk, microfilm, printed documents, digital storage, etc.
5. Transmission medium: Transmission media are the different information carriers that
enable continuous data transmission. The central question is: over which medium will
information be transmitted? Information is transmitted over a network using either a
wired or a wireless connection. Wired connections include twisted pair, coaxial cable and
optical fiber cable; wireless connections include satellite links and radio links.
6. Information exchange medium: Information exchange media include all information
carriers for transmission, i.e. all storage and transmission media.
• Continuous media types such as video need a lot of space to store and very high
bandwidth to transmit.
• They also have tight timing constraints.
• Automatically analysing, indexing and organizing information in audio, images and
video is much harder than doing so for text.
• Multimedia involves many different research areas and needs more complex and more
efficient algorithms and hardware platforms.
• Distributed Network
• Temporal relationship between data
▪ Render different data at same time- continuous data
▪ Sequencing within the media
▪ Synchronisation – inter medium scheduling
• Data representation – digital: analog-to-digital conversion, sampling, etc.
• Large data requirements – bandwidth, storage, compression.
• Graphics- Graphics make the multimedia application attractive. In many cases people do
not like reading large amount of textual matter on the screen. Therefore, graphics are used
more often than text to explain a concept, present background information etc. There are
two types of Graphics:
▪ Bitmap images- Bitmap images are real images that can be captured from devices
such as digital cameras or scanners. Generally, bitmap images are not editable.
Bitmap images require a large amount of memory.
▪ Vector Graphics- Vector graphics are drawn on the computer and only require a
small amount of memory. These graphics are editable.
• Audio- A multimedia application may require the use of speech, music and sound effects.
These are called the audio or sound elements of multimedia. Speech is also a perfect way of
teaching. Audio is of two types: analog and digital. Analog audio or sound refers to the original
sound signal. The computer stores sound in digital form; therefore, the sound used in a
multimedia application is digital audio.
• Video- The term video refers to a moving picture accompanied by sound, such as a picture
on television. The video element of a multimedia application conveys a lot of information in a
short duration of time. Digital video is useful in multimedia applications for showing real-life
objects. Video places the highest performance demands on computer memory, and on
bandwidth if placed on the Internet. Digital video files can be stored like any other files in the
computer, and the quality of the video can still be maintained. Digital video files can be
transferred within a computer network, and digital video clips can be edited easily.
References:
• “Multimedia: Computing, Communications and Applications”, Ralf Steinmetz and
Klara Nahrstedt, Pearson Education Asia
• “Multimedia Communications: Applications, Networks, Protocols and Standards”, Fred
Halsall, Pearson Education Asia
• “Multimedia Systems”, John F. Koegel Buford, Pearson Education Asia
Assignments:
1. Describe the data stream characteristics for continuous media.
2. Define Multimedia and explain how media can be classified.
3. Define Multimedia. Explain the characteristics of multimedia.
4. What do you mean by medium? Define different types of medium.
5. What is multimedia? With suitable example, discuss the definition and properties of
a multimedia system.
A Gentle Advice: Please go through your textbooks and reference books for detailed study!
Thank you all.
Compiled by mail2prakashbaral@gmail.com
UNIT 2: SOUND/AUDIO SYSTEM (6 HRS.)
The pattern of the oscillation is called a waveform. The waveform repeats the same shape at
regular intervals, and one complete cycle is called a period. Since sound waveforms occur
naturally, sound waves are never perfectly smooth or uniformly periodic. Sounds that have a
recognizable periodicity tend to be more musical than non-periodic sounds.
Examples of periodic sound sources are musical instruments, vowel sounds, whistling wind,
bird songs, etc.
Non-periodic sounds are coughs, sneezes, rushing water, etc.
Sound vs Audio
• The key difference between sound and audio is their form of energy.
• Sound is mechanical wave energy (longitudinal sound waves) that propagates through a
medium, causing variations in pressure within the medium.
• Audio is made of electrical energy (analog or digital signals) that represents sound
electrically.
To put it simply:
• A microphone placed in a sound field moves according to the varying pressure exerted
on it; its transducer converts this mechanical energy into a voltage level, i.e. into energy
of another form (electrical energy).
1.2 FREQUENCY
The frequency of a sound is the reciprocal value of the period. It represents the number of times
the pressure rises and falls, or oscillates, in a second and is measured in hertz (Hz) or cycles
per second (cps). A frequency of 100 Hz means 100 oscillations per second. A convenient
abbreviation, kHz for kilohertz, is used to indicate thousands of oscillations per second: 1 kHz
equals 1000 Hz.
The frequency range of normal human hearing extends from around 20 Hz up to about 20 kHz.
Some of the frequency ranges are:
• Infrasound: 0 - 20 Hz
• Human audible sound: 20 Hz - 20 kHz
• Ultrasound: 20 kHz - 1 GHz
• Hypersound: 1 GHz - 10 THz
Human audible sound is also called audio or an acoustic signal (wave). Speech is an acoustic
signal produced by humans.
1.3 AMPLITUDE
The amplitude of the sound is the measure of the displacement of the air pressure wave from
its mean or quiescent state. The greater the amplitude, the louder the sound.
Amplitude is subjectively heard as loudness and is measured in decibels (dB).
• 0 dB - essentially no sound heard
• 35 dB - quiet home
• 70 dB - noisy street
• 120 dB - discomfort
Sampling - Sampling is a process of measuring air pressure amplitude at equally spaced
moments in time, where each measurement constitutes a sample.
The sampling rate is the number of times the analog sound is measured (sampled) per second.
A higher sampling rate implies that more samples are taken during the given time interval and,
ultimately, that the quality of reconstruction is better.
To discretize the signal, the gap between samples should be fixed. That gap is termed the
sampling period Ts. The sampling frequency is the reciprocal of the sampling period,
fs = 1/Ts, and is simply called the sampling rate. The sampling rate denotes the number of
samples taken per second.
The sampling rate is measured in hertz (Hz), i.e. cycles per second. A sampling rate of 5000 Hz
(or 5 kHz, which is the more common usage) implies that 5000 samples are taken per second.
The three sampling rates most often used in multimedia are 44.1 kHz (CD quality), 22.05 kHz
and 11.025 kHz.
Encoding - Encoding converts each integer (base-10) sample value into a base-2, i.e. binary,
number. The output is a binary expression in which each bit is either a 1 (pulse) or a 0 (no pulse).
1.4.1 Quantization of Audio
Quantization is the process of assigning a discrete value, from a range of possible values, to
each sample. The number of possible values (quantization levels) depends on the number of
bits used to represent each sample. Quantization results in a stepped waveform resembling the
source signal.
• Quantization Error/Noise - The difference between a sample and the value assigned to it
is known as quantization error or noise.
• Signal to Noise Ratio (SNR) - The signal-to-noise ratio refers to signal quality versus
quantization error. The higher the signal-to-noise ratio, the better the voice quality.
Working with very small signal levels often introduces more error, so instead of uniform
quantization, non-uniform quantization is used in the form of companding. Companding
is a process of distorting the analog signal in a controlled way by compressing large values
at the source and then expanding them at the receiving end.
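For illustration, the short Python sketch below walks through the sampling, quantization and encoding chain described above. It is only a minimal sketch, not a codec: the 440 Hz test tone, the 8 kHz sampling rate and the 8-bit depth are assumed example values, not figures prescribed by these notes.

```python
import math

def sample_and_quantize(freq_hz=440.0, sample_rate=8000, bit_depth=8, duration_s=0.01):
    """Sample a sine tone, quantize it uniformly and report the quantization error."""
    levels = 2 ** bit_depth                      # number of quantization levels
    codes, errors = [], []
    n_samples = int(sample_rate * duration_s)
    for n in range(n_samples):
        t = n / sample_rate                      # sampling instants are Ts = 1/fs apart
        x = math.sin(2 * math.pi * freq_hz * t)  # "analog" value in [-1, 1]
        q = round((x + 1) / 2 * (levels - 1))    # nearest discrete level (0 .. levels-1)
        x_hat = q / (levels - 1) * 2 - 1         # value that level actually represents
        codes.append(q)                          # this integer is what gets encoded in binary
        errors.append(abs(x - x_hat))            # quantization error/noise for this sample
    return codes, max(errors)

codes, max_err = sample_and_quantize()
print(f"first codes: {codes[:5]}, worst quantization error: {max_err:.4f}")
print(f"binary encoding of the first sample: {codes[0]:08b}")
```

Raising the bit depth increases the number of quantization levels and visibly shrinks the reported quantization error, which is exactly the signal-to-noise argument made above.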
Sound Bit Depth
Sampling rate and sound bit depth are the audio equivalents of the resolution and colour depth of a
graphic image. Bit depth is the amount of space, in bits, used to store each piece of audio
information; the higher the number of bits, the higher the quality of the sound. Multimedia sound
comes in 8-bit, 16-bit, 32-bit and 64-bit formats. An 8-bit sample has 2^8 or 256 possible values. A
single bit rate and a single sampling rate are recommended throughout a work. An audio file
size can be calculated with the simple formula:
File size on disk = (length in seconds) × (sample rate) × (bit depth / 8 bits per byte)
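As a quick sanity check of the formula, the snippet below computes the size of one minute of CD-quality audio. The chosen values (44.1 kHz, 16 bits, stereo) are illustrative assumptions; note that the formula as written is per channel, so the result is multiplied by the channel count.

```python
length_s    = 60         # one minute of audio
sample_rate = 44100      # CD-quality sampling rate in Hz
bit_depth   = 16         # bits per sample
channels    = 2          # stereo; the formula above covers a single channel

file_size_bytes = length_s * sample_rate * (bit_depth / 8) * channels
print(f"{file_size_bytes / 1_000_000:.1f} MB")   # roughly 10.6 MB
```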
Bit rate refers to the amount of data, specifically bits, transmitted or received per second. It is
comparable to the sample rate but refers to the digital encoding of the sound: it specifies how
many digital 1s and 0s are used each second to represent the sound signal. This means that the
higher the bit rate, the higher the quality and the size of your recording. For
instance, an MP3 file might be described as having a bit rate of 320 kb/s or 320000 b/s. This
indicates the amount of compressed data needed to store one second of music.
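For example, the amount of compressed data implied by a given bit rate follows directly from its definition; the three-minute track length below is an assumed figure used only for illustration.

```python
bit_rate_bps = 320_000        # 320 kb/s MP3, as in the example above
duration_s   = 180            # assumed three-minute track
size_bytes   = bit_rate_bps * duration_s / 8
print(f"{size_bytes / 1_000_000:.1f} MB")   # about 7.2 MB of compressed data
```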
WAV
WAV is the Waveform format. It is the most commonly used and supported format on the
Windows platform. Developed by Microsoft, the Wave format is a subset of RIFF. It is
capable of sample sizes of 8 and 16 bits. With Wave, there are several different encoding
methods to choose from including Wave or PCM format. Therefore, when developing sound
for the Internet, it is important to make sure you use the encoding method that the player you’re
recommending supports.
AU
AU is the Sun Audio format. It was developed by Sun Microsystems to be used on UNIX,
NeXT and Sun Sparc workstations. It is a 16-bit compressed audio format that is fairly
prevalent on the Web. This is probably because it plays on the widest number of platforms.
RA
RA is Progressive Networks RealAudio format. It is very popular for streaming audio on the
Internet because it offers good compression up to a factor of 18. Streaming technology enables
a sound file to begin playing before the entire file has been downloaded.
AIFF
AIFF or AFF is Apple’s Audio Interchange File Format. This is the Macintosh waveform
format. It is also supported on IBM compatibles and Silicon Graphics machines. The AIFF
format supports a large number of sampling rates, with sample sizes of up to 32 bits.
MPEG
MPEG and MPEG-2 are Motion Picture Experts Group formats for compressed audio and
video. Some Web sites use these formats for their audio because their compression
capabilities offer ratios of at least 14:1. These formats will probably become quite popular.
2.1 MIDI BASIC CONCEPTS
MIDI is a standard that manufacturers of electronic musical instruments have agreed upon. It
is a set of specifications they use in building their instruments so that the instruments of
different manufacturers can, without difficulty, communicate musical information between one
another.
A MIDI interface has two different components:
Hardware connects the equipment. It specifies the physical connection between musical
instruments, stipulates that a MIDI port is built into an instrument, specifies a MIDI cable
(which connects two instruments) and deals with electronic signals that are sent over the cable.
A data format encodes the information traveling through the hardware. A MIDI data format
does not include an encoding of individual samples as the audio format does. Instead of
individual samples, an instrument-connected data format is used. The encoding includes,
besides the instrument specification, the notion of the beginning and end of a note, basic
frequency and sound volume. MIDI data allow an encoding of about 10 octaves, which
corresponds to 128 notes.
The MIDI data format is digital; the data are grouped into MIDI messages. Each MIDI message
communicates one musical event between machines. These musical events are usually actions
that a musician performs while playing a musical instrument. The action might be pressing
keys, moving slider controls, setting switches and adjusting foot pedals.
When a musician presses a piano key, the MIDI interface creates a MIDI message where the
beginning of the note with its stroke intensity is encoded. This message is transmitted to another
machine. At the moment the key is released, a corresponding signal (MIDI message) is
transmitted again. For ten minutes of music, this process creates about 200 Kbytes of MIDI
data, which is essentially less than the equivalent volume of a CD-audio coded stream for the
same duration of time.
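A rough back-of-the-envelope comparison makes the saving concrete. The CD parameters used below (44.1 kHz, 16 bits per sample, two channels) are standard CD-audio values and are assumptions added here for illustration, not figures taken from the paragraph above.

```python
minutes    = 10
midi_bytes = 200 * 1024                        # ~200 KB of MIDI data (from the text)
cd_bytes   = minutes * 60 * 44100 * 2 * 2      # samples/s * bytes/sample * channels
print(f"MIDI: {midi_bytes/1e6:.1f} MB, CD audio: {cd_bytes/1e6:.0f} MB "
      f"({cd_bytes // midi_bytes}x larger)")
```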
If a musical instrument satisfies both components of the MIDI standard, the instrument is a MIDI
device (e.g. a synthesizer), capable of communicating with other MIDI devices through
channels. The MIDI standard specifies 16 channels. A MIDI device (musical instrument) is
mapped to a channel. Music data transmitted through a channel are reproduced at the receiver
side with the synthesizer instrument. The MIDI standard identifies 128 instruments, including
noise effects (e.g., telephone, aircraft), with unique numbers. For example, 0 is for the Acoustic
Grand Piano, 12 for the marimba, 40 for the violin, 73 for the flute, etc.
Some instruments allow only one note to be played at a time, such as the flute. Other instruments
allow more than one note to be played simultaneously, such as the organ. The maximum
number of simultaneously played notes per channel is a main property of each synthesizer. The
range can be from 3 to 16 notes per channel. To tune a MIDI device to one or more channels,
the device must be set to one of the MIDI reception modes. There are four modes:
• Mode 1: Omni On/Poly;
• Mode 2: Omni On/Mono;
• Mode 3: Omni Off/Poly;
• Mode 4: Omni Off/Mono
The first half of the mode name specifies how the MIDI device monitors the incoming MIDI
channels. If Omni is turned on, the MIDI device monitors all the MIDI channels and responds
to all channel messages, no matter which channel they are transmitted on. If Omni is turned
off, the MIDI device responds only to channel messages sent on the channel(s) the device is
set to receive.
The second half of the mode name tells the MIDI device how to play notes coming in over the
MIDI cable. If the option Poly is set, the device can play several notes at a time. If the mode is
set to Mono, the device plays notes like a monophonic synthesizer, one note at a time.
2.3 MIDI MESSAGES:
MIDI messages are used by MIDI devices to communicate with each other and to determine
what kinds of musical events can be passed from device to device.
Structure of MIDI messages:
➢ A MIDI message includes a status byte and up to two data bytes.
➢ Status byte
➢ The most significant bit of status byte is set to 1.
➢ The 4 low-order bits identify which channel it belongs to (four bits produce 16
possible channels).
➢ The 3 remaining bits identify the message.
➢ Data Byte: The most significant bit of a data byte is set to 0. (A decoding sketch of this layout follows below.)
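A minimal sketch of how the bit layout just described can be decoded in software. It handles channel messages only; system messages, whose low nibble is not a channel number, are deliberately ignored, and the example status byte is an assumed value.

```python
def decode_status_byte(status: int):
    """Split a MIDI channel-message status byte into message type and channel."""
    assert status & 0x80, "status bytes always have the most significant bit set"
    message_type = (status >> 4) & 0x07   # the 3 bits identifying the message
    channel      = status & 0x0F          # the 4 low-order bits: channel 0..15
    return message_type, channel

# 0x93 = Note On (type 0b001) on channel 3 (counting from 0)
print(decode_status_byte(0x93))   # -> (1, 3)
```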
Classification of MIDI messages:
of the sender. For example, a MIDI clock helps keep separate sequencers in the same MIDI
system playing at the same tempo. When a master sequencer plays a song, it sends out a stream
of Timing Clock messages to convey the tempo to other sequencers. The faster the Timing
Clock messages come in, the faster the receiving sequencer plays the song. To keep a standard
timing reference, the MIDI specifications state that 24 MIDI clocks equal one quarter note.
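Because 24 MIDI clocks correspond to one quarter note, the interval between successive Timing Clock messages follows directly from the tempo. The sketch below is illustrative; the 120 BPM tempo is an assumed example.

```python
def clock_interval_seconds(tempo_bpm: float) -> float:
    """Seconds between successive MIDI Timing Clock messages at a given tempo."""
    quarter_note_s = 60.0 / tempo_bpm      # one quarter note lasts 60/BPM seconds
    return quarter_note_s / 24             # 24 MIDI clocks per quarter note

print(f"{clock_interval_seconds(120):.4f} s")   # ~0.0208 s at 120 BPM
```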
As an alternative, the SMPTE timing standard (Society of Motion Picture and Television
Engineers) can be used. The SMPTE timing standard was originally developed by NASA as a
way to mark incoming data from different tracking stations so that receiving computers could
tell exactly what time each piece of data was created. In the film and video version promoted
by the SMPTE, the SMPTE timing standard acts as a very precise clock that stamps a time
reading on each frame and fraction of a frame, counting from the beginning of a film or video.
To make the time readings precise, the SMPTE format consists of
hours:minutes:seconds:frames:bits (e.g., at 30 frames per second), uses a 24-hour clock and
counts from 0 to 23 before recycling to 0. The number of frames in a second differs depending
on the type of visual medium. To divide time even more precisely, SMPTE breaks each frame
into 80 bits (not digital bits). When SMPTE is counting bits in a frame, it is dividing time into
segments as small as one twenty-four hundredth (1/2400) of a second.
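The hours:minutes:seconds:frames:bits reading can be reduced to a single absolute count, which is what a synchronizer needs for comparing or seeking positions. The sketch below assumes the 30 frames-per-second, 80-bits-per-frame case described above and ignores drop-frame and other frame rates.

```python
def smpte_to_bits(hours, minutes, seconds, frames, bits, fps=30, bits_per_frame=80):
    """Convert an SMPTE reading into an absolute count of SMPTE 'bits'."""
    total_seconds = (hours * 60 + minutes) * 60 + seconds
    total_frames = total_seconds * fps + frames
    return total_frames * bits_per_frame + bits

# one hour, 2 minutes, 3 seconds, 4 frames and 5 bits into the programme
print(smpte_to_bits(1, 2, 3, 4, 5))
```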
Because many film composers now record their music on a MIDI recorder, it is desirable to
synchronize the MIDI recorder with video equipment. A SMPTE synchronizer should be able
to give a time location to the MIDI recorder so it can move to that location in the MIDI score
(pre-recorded song) to start playback or recording. But MIDI recorders cannot use incoming
SMPTE signals to control their recording and playback. The solution is a MIDI/SMPTE
synchronizer that converts SMPTE into MIDI, and vice versa. The MIDI/SMPTE synchronizer
lets the user specify different tempos and the exact points in SMPTE timing at which each
tempo is to start, change, and stop. The synchronizer keeps these tempos and timing points in
memory. As a SMPTE video deck plays and sends a stream of SMPTE times to the
synchronizer, the synchronizer checks the incoming time and sends out MIDI clocks at a
corresponding tempo.
➢ Sensing stage: Data is collected from controllers reading the gesture information from
human performers on stage.
➢ Processing stage: Computer reads and interprets information coming from the sensors
and prepares data for the response stage.
➢ Response stage: The computer and some collection of sound-producing devices share in
realizing a musical output.
3 SPEECH GENERATION
Speech can be perceived, understood and generated by humans and by machines. Generated
speech must be understandable and must sound natural. The requirement of understandable
speech is a fundamental assumption, and the natural sound of speech increases user acceptance.
Speech signals have two properties which can be used in speech processing:
• Voiced speech signals show during certain time intervals almost periodic behavior.
Therefore, we can consider these signals as quasi-stationary signals for around 30
milliseconds.
• The spectrum of audio signals shows characteristic maxima; these maxima, called
formants, occur because of resonances of the vocal tract.
Speech Generation:
An important requirement for speech generation is real-time signal generation. With such a
requirement met, a speech output system could transform text into speech automatically
without any lengthy pre-processing.
Consonants: a speech sound produced by a partial or complete obstruction of the air stream
by any of the various constrictions of the speech organs (e.g., voiced consonants, such as m
in mother; fricative voiced consonants, such as v in voice; fricative voiceless consonants,
such as s in nurse; plosive consonants, such as d in daily; and affricate consonants, such
as dg in knowledge or ch in chew).
Step 1: Generation of a Sound Script
Transcription from text to a sound script using a library containing (language-specific)
letter-to-phone rules. A dictionary of exceptions is used for words with a non-standard pronunciation.
Step 2: Generation of Speech
The sound script is used to drive the time- or frequency-dependent sound concatenation
process.
4 SPEECH ANALYSIS
Purpose of Speech Analysis:
➢ Who is speaking: speaker identification for security purposes
➢ What is being said: automatic transcription of speech into text
➢ How was a statement said: understanding the psychological factors of a speech pattern (was
the speaker angry or calm, is he lying, etc.)
The primary goal of speech analysis in multimedia systems is to correctly determine individual
words (speech recognition).
Speech analysis is of strong interest for multimedia systems. Together with speech synthesis,
different media transformations can be implemented. The primary goal of speech analysis is to
correctly determine individual words; a word is recognized only with a certain probability
(at most 1). Environmental noise, room acoustics and a speaker's physical and psychological
condition all play an important role here.
The system which provides recognition and understanding of a speech signal applies this
principle several times as follows:
• In the first step, the principle is applied to a sound pattern and/or word model. An
acoustical and phonetical analysis is performed.
• In the second step, certain speech units go through syntactical analysis; thereby, the
errors of the previous step can be recognized. Very often during this step, no unambiguous
decisions can be made. In this case, syntactical analysis provides additional decision
help, and the result is recognized speech.
• The third step deals with the semantics of the previously recognized language. Here the
decision errors of the previous step can be recognized and corrected with other analysis
methods. Even today, this step is non-trivial to implement with current methods known
in artificial intelligence and neural network research. The result of this step is understood
speech.
These steps work mostly under the consideration of time and/or frequency-dependent sounds.
The same criteria and speech units (formants, phones, etc.) are considered as in speech
generation/output.
There are still many problems into which speech recognition research is being conducted:
• A specific problem is presented by room acoustics with existent environmental noise.
The frequency-dependent reflections of a sound wave from walls and objects can
overlap with the primary sound wave.
• Further, word boundaries must be determined. Very often neighboring words flow into
one another.
• For the comparison of a speech element to the existing pattern, time normalization is
necessary. The same word can be spoken quickly or slowly. However, the time axis
cannot be modified because the extension factors are not proportional to the global time
interval. There are long and short voiceless sounds (e.g., s, sh). Individual sounds are
extended differently and need a minimal time duration for their recognition.
Speech recognition systems are divided into speaker-independent recognition systems and
speaker-dependent recognition systems. A speaker-independent system can recognize with the
same reliability essentially fewer words than a speaker-dependent system because the latter is
trained in advance. Training in advance means that there exists a training phase for the speech
recognition system, which takes about half an hour. Speaker-dependent systems can recognize
around 25,000 words; speaker-independent systems recognize a maximum of about 500 words,
but with a worse recognition rate. These values should be understood as gross guidelines. In a
concrete situation, the marginal conditions must be known (e.g., was the measurement taken
in a sound-deadening room? Does the speaker have to adapt to the system to simplify the time
normalization? etc.).
5 SPEECH TRANSMISSION
The area of speech transmission deals with efficient coding of the speech signal to allow
speech/sound transmission at low transmission rates over networks. The goal is to provide the
receiver with the same speech/sound quality as was generated at the sender side. This section
includes some principles that are connected to speech generation and recognition.
quality. Adaptive Differential Pulse Code Modulation (ADPCM) allows a further rate reduction to
32 Kbits/s.
• Source Coding:
Parameterized systems work with source coding algorithms. Here, the specific speech
characteristics are used for data rate reduction. The channel vocoder is an example of such
a parameterized system. The channel vocoder is an extension of sub-channel coding.
The signal is divided into a set of frequency channels during speech analysis because
only certain frequency maxima are relevant to speech. Additionally, the differences
between voiced and unvoiced sounds are taken into account. Voiceless sounds are
simulated by a noise generator. For the generation of voiced sounds, the simulation
comes from a sequence of pulses; the rate of the pulses is equivalent to the a priori
measured basic speech frequency. A data rate of about 3 Kbits/s can be achieved
with a channel vocoder; however, the quality is not always satisfactory.
Major effort and work on further data rate reduction from 64 Kbits/s to 6 Kbits/s is
being conducted, where the compressed signal quality should correspond, after a
decompression, to the quality of an uncompressed 64 Kbits/s signal.
• Recognition/Synthesis Methods:
There have been attempts to reduce the transmission rate using pure
recognition/synthesis methods. Speech analysis (recognition) follows on the sender side
of a speech transmission system and speech synthesis (generation) follows on the
receiver side.
Only the characteristics of the speech elements are transmitted. For example, the speech
elements with their characteristics are the formants with their middle frequencies and
bandwidths. These frequencies and bandwidths are used in the corresponding digital filter. This
reduction brings the data rate down to 50 bits/s. The quality of the reproduced speech
and its recognition rate are not acceptable by today's standards.
• Achieved Quality:
The essential question regarding speech and audio transmission with respect to multimedia
systems is how to achieve the minimal data rate for a given quality. The published function
from Flanagan shows the dependence of the achieved quality of compressed speech on the data
rate. One can assume that for telephone quality, a data rate of 8 Kbits/s is sufficient. Figure
below shows the dependence of audio quality on the number of bits per sample value. For
example, excellent CD-quality can be achieved with a reduction from 16 bits per sample value
to 2 bits per sample value. This means that only 1/8 of the actual data needs to be transmitted.
UNIT 3: IMAGES AND GRAPHICS
1.1 INTRODUCTION
An image is a spatial representation of an object; it may represent a 2D or 3D scene or another
image. Images may be real or virtual. An image can be thought of abstractly as a continuous
function defined over, usually, a rectangular region of a plane. Examples:
• Recorded image - photographic, or in digital format
• Computer vision - video image, digital image or picture
• Computer graphics - digital image
• Multimedia - deals with all of the above formats
• Three numbers representing the intensities of the red, green and blue components of
the color at that pixel.
• Three numbers that are indices to tables of the red, green and blue intensities.
• A single number that is an index to a table of color triples.
• An index to any number of other data structures that can represent a color.
• Four or five spectral samples for each pixel.
In addition, each pixel may have other information associated with it; e.g., three numbers
indicating the normal to the surface drawn at that pixel.
Information associated with the image as a whole, e.g., width, height, depth, name of the
creator, etc. may also have to be stored.
The image may be compressed before storage for saving storage space. Some current image
file formats for storing images include GIF, X11 Bitmap, Sun Rasterfile, PostScript, IRIS,
JPEG, TIFF, etc.
Bitmap (BMP):
BMP is a standard format used by Windows to store device-independent and application-
independent images. The number of bits per pixel (1, 4, 8, 15, 24, 32, or 64) for a given BMP
file is specified in a file header. BMP files with 24 bits per pixel are common.
Graphics Interchange Format (GIF)
GIF is a common format for images that appear on webpages. GIFs work well for line
drawings, pictures with blocks of solid color, and pictures with sharp boundaries between
colors. GIFs are compressed, but no information is lost in the compression process; a
decompressed image is exactly the same as the original. One color in a GIF can be designated
as transparent, so that the image will have the background color of any Web page that
displays it. A sequence of GIF images can be stored in a single file to form an animated GIF.
GIFs store at most 8 bits per pixel, so they are limited to 256 colors.
Joint Photographic Experts Group (JPEG)
JPEG is a compression scheme that works well for natural scenes, such as scanned
photographs. Some information is lost in the compression process, but often the loss is
imperceptible to the human eye. Colour JPEG images store 24 bits per pixel, so they are
capable of displaying more than 16 million colors. There is also a grey scale JPEG format
that stores 8 bits per pixel. JPEGs do not support transparency or animation. The level of
compression in JPEG images is configurable, but higher compression levels (smaller files)
result in more loss of information. A 20:1 compression ratio often produces an image that the
human eye finds difficult to distinguish from the original. The Figure below shows a BMP
image and two JPEG images that were compressed from that BMP image. The first JPEG has
a compression ratio of 4:1 and the second JPEG has a compression ratio of about 8:1.
JPEG compression does not work well for line drawings, blocks of solid color, and sharp
boundaries. JPEG is a compression scheme, not a file format. JPEG File Interchange Format
(JFIF) is a file format commonly used for storing and transferring images that have been
compressed according to the JPEG scheme. JFIF files displayed by Web browsers use the
.jpg extension.
Exchangeable Image File (Exif)
Exif is a file format used for photographs captured by digital cameras. An Exif file contains
an image that is compressed according to the JPEG specification. An Exif file also contains
information about the photograph (date taken, shutter speed, exposure time, and so on) and
information about the camera (manufacturer, model, and so on).
Portable Network Graphics (PNG)
The PNG format retains many of the advantages of the GIF format but also provides
capabilities beyond those of GIF. Like GIF files, PNG files are compressed with no loss of
information. PNG files can store colors with 8, 24, or 48 bits per pixel and gray scales with 1,
2, 4, 8, or 16 bits per pixel. In contrast, GIF files can use only 1, 2, 4, or 8 bits per pixel. A
PNG file can also store an alpha value for each pixel, which specifies the degree to which the
color of that pixel is blended with the background colour.
Tag Image File Format (TIFF)
TIFF is a flexible and extendable format that is supported by a wide variety of platforms and
image-processing applications. TIFF files can store images with an arbitrary number of bits
per pixel and can employ a variety of compression algorithms. Several images can be stored
in a single, multiple-page TIFF file. Information related to the image (scanner make, host
computer, type of compression, orientation, samples per pixel, and so on) can be stored in the
file and arranged through the use of tags. The TIFF format can be extended as needed by the
approval and addition of new tags.
Graphics Format
Graphic image formats are specified through graphics primitives and their attributes.
1.4 IMAGE SYNTHESIS, ANALYSIS AND TRANSMISSION
1.4.1 Computer Image Processing
Image processing is a method of performing operations on an image in order to obtain an
enhanced image or to extract some useful information from it. Computer image processing
comprises image synthesis (generation) and image analysis (recognition). It is a type of
signal processing in which the input is an image and the output may be an image or
characteristics/features associated with that image. Image processing basically includes the
following three steps:
• Importing the image via image acquisition tools;
• Analyzing and manipulating the image;
• Output in which result can be altered image or report that is based on image analysis.
There are two types of methods used for image processing, namely analogue and digital
image processing. Analogue image processing can be used for hard copies such as printouts
and photographs. Digital image processing techniques help in the manipulation of digital
images by using computers. The three general phases that all types of data have to undergo
when using the digital technique are pre-processing, enhancement and display, and
information extraction.
• Application model:
The application model represents the data or objects to be pictured on the screen; it is
stored in an application database. The model typically stores descriptions of
primitives that define the shape of components of the object, object attributes and
connectivity relationships that describe how the components fit together. The model is
application-specific and is created independently of any particular display system.
• Application program:
The application program must convert a description of the relevant portion of the
model into whatever procedure calls or commands the graphics system uses to create an
image. This conversion process has two phases. First, the application program traverses the
application database that stores the model to extract the portions to be viewed, using
some selection or query system.
• Graphics system:
Second, the extracted geometry is put in a format that can be sent to the graphics
system. The application program handles user input. It produces views by sending to
the third component, the graphics system, a series of graphics output commands that
contain both a detailed geometric description of what is to be viewed and the
attributes describing how the objects should appear. The graphics system is
responsible for actually producing the picture from the detailed descriptions and for
passing the user's input to the application program for processing.
The data glove records hand position and orientation as well as finger movements. It is a
glove covered with small, lightweight sensors. Each sensor is a short length of fiber-optic
cable, with a Light-Emitting Diode (LED) at one end and a phototransistor at the other.
Wearing the data glove, a user can grasp objects, move and rotate them and then release
them.
Graphics Hardware –Output: Current output technology uses raster displays, which store
display primitives in a refresh buffer in terms of their component pixels. The architecture of a
raster display is shown in figure below. In some raster displays, there is a hardware display
controller that receives and interprets sequences of output commands. In simpler, more
common systems, such as those in personal computers, the display controller exists only as a
software component of the graphics library package, and the refresh buffer is no more than a
piece of the CPU's memory that can be read by the image display subsystem that produces
the actual image on the screen.
The complete image on a raster display is formed from the raster, which is a set of horizontal
raster lines, each a row of individual pixels; the raster is thus stored as a matrix of pixels
representing the entire screen area. The entire image is scanned out sequentially by the video
controller. The raster scan is shown in figure below.
Figure 2: Raster Scan
At each pixel, the beam's intensity is set to reflect the pixel's intensity; in color systems, three
beams are controlled - one for each primary color (red, green, blue) as specified by the three
color components of each pixel's value.
Raster graphics systems have other characteristics. To avoid flickering of the image, a refresh
rate of 60 Hz or higher is used today; an entire image of, for example, 1024 lines of 1024 pixels
each must be stored explicitly, and a bitmap or pixmap is generated. Raster graphics can display
areas filled with solid colors or patterns, i.e., realistic images of 3D objects.
1.4.5 Dithering
Dithering is the process by which we create the illusion of colors that are not actually present.
It is done by the random arrangement of pixels. For example, consider an image with only
black and white pixels in it. Its pixels can be arranged in an order that forms another image;
note that only the arrangement of the pixels has been changed, not their quantity.
The growth of raster graphics has made color and grayscale an integral part of contemporary
computer graphics. The color of an object depends not only on the object itself, but also on
the light source illuminating it, on the color of the surrounding area and on the human visual
system.
What we see on a black-and-white television set or display monitor is achromatic light.
Achromatic light is determined by the attribute quality of light. Quality of light is determined
by the intensity and luminance parameters. For example, if we have hardcopy devices or
displays which are only bi-levelled, which means they produce just two intensity levels, then
we would like to expand the range of available intensity.
The solution lies in our eye's capability for spatial integration. If we view a very small area
from a sufficiently large viewing distance, our eyes average fine detail within the small area
and record only the overall intensity of the area. This phenomenon is exploited in the
technique called halftoning, or clustered-dot ordered dithering (halftone approximation).
Each small resolution unit is imprinted with a circle of black ink whose area is proportional
to the blackness 1 - I (where I is the intensity) of the corresponding area in the original photograph.
Figure 3: Five intensity levels approximated with 2 x 2 dither patterns
Graphics output devices can approximate the variable-area circles of halftone reproduction.
For example, a 2 x 2 pixel area of a bi-level display can be used to produce five different
intensity levels at the cost of halving the spatial resolution along each axis. The patterns,
shown in the figure above, can be filled in 2 x 2 areas, with the number of 'on' pixels
proportional to the desired intensity. The patterns can be represented by a dither matrix.
This technique is used on devices which are not able to display individual dots (e.g., laser
printers). This means that these devices are poor at reproducing isolated 'on' pixels (the black
dots in figure above). All pixels that are 'on' for a particular intensity must be adjacent to
other 'on' pixels.
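A minimal sketch of ordered dithering with a 2 x 2 dither matrix, along the lines described above: each input intensity (0 to 4) is compared against a position-dependent threshold, so a 2 x 2 cell can render five apparent levels on a bi-level device. The particular matrix values and the tiny test image are illustrative assumptions.

```python
# 2x2 dither matrix: thresholds 0..3, arranged so that 'on' pixels cluster together
DITHER_2x2 = [[0, 1],
              [3, 2]]

def ordered_dither(image, levels=5):
    """Map a grayscale image (values 0..levels-1) to a bi-level image."""
    out = []
    for y, row in enumerate(image):
        out_row = []
        for x, intensity in enumerate(row):
            threshold = DITHER_2x2[y % 2][x % 2]
            out_row.append(1 if intensity > threshold else 0)   # 1 = 'on' pixel
        out.append(out_row)
    return out

# a 2x4 patch whose left half is darker than its right half
print(ordered_dither([[1, 1, 3, 3],
                      [1, 1, 3, 3]]))
```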
Scene Analysis and computer vision deal with recognizing and reconstructing 3D models of a
scene from several 2D images. An example is an industrial robot sensing the relative sizes,
shapes, positions and colors of objects.
1. Image Formatting
Image formatting means capturing an image from a camera and bringing it into a digital form.
It means that we will have a digital representation of an image in the form of pixels.
Figure 6: Edge detection of the image
Edge detection recognizes many edges, but not all of them are significant. Therefore, another
labelling operation must occur after edge detection, namely thresholding. Thresholding
specifies which edges should be accepted and which should not; the thresholding operation
filters only the significant edges from the image and labels them. Other edges are removed.
Other kinds of labelling operations include corner finding and identification of pixels that
participate in various shape primitives.
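A hedged sketch of this labelling step: a simple gradient magnitude is computed for each interior pixel and only values above a chosen threshold are labelled as significant edges. The gradient operator and the threshold value are illustrative choices, not ones prescribed by these notes.

```python
def threshold_edges(image, threshold=2):
    """Label pixels whose horizontal/vertical intensity change exceeds a threshold."""
    h, w = len(image), len(image[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]      # horizontal change
            gy = image[y + 1][x] - image[y - 1][x]      # vertical change
            if abs(gx) + abs(gy) > threshold:           # simple gradient magnitude
                edges[y][x] = 1                         # keep: significant edge
    return edges

step = [[0, 0, 5, 5]] * 4       # a step change in intensity
print(threshold_edges(step))
```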
4. Grouping:
The labelling operation labels the kinds of primitive spatial events in which the pixel
participates. The grouping operation identifies the events by collecting together or identifying
maximal connected sets of pixels participating in the same kind of event. When the reader
recalls the intensity edge detection viewed as a step change in intensity, the edges are labelled
as step edges, and the grouping operation constitutes the step edge linking.
A grouping operation, where edges are grouped into lines, is called line-fitting. Again the
grouping operation line-fitting is performed on the image shown in Figure below:
Figure 8: Line-fitting of the image
The grouping operation involves a change of logical data structure. The observed image, the
conditioned image and the labelled image are all digital image data structures. Depending on
the implementation, the grouping operation can produce either an image data structure in
which each pixel is given an index associated with the spatial event to which it belongs or a
data structure that is a collection of sets. Each set corresponds to a spatial event and contains
the pairs of positions (row, column) that participate in the event. In either case, a change
occurs in the logical data structure.
The entities of interest prior to grouping are pixels; the entities of interest after grouping are
sets of pixels.
5. Extraction
The grouping operation determines a new set of entities, but they are left naked in
the sense that the only thing they possess is their identity. The extracting operation computes
for each group of pixels a list of properties. Example properties might include its centroid,
area, orientation, spatial moments, gray tone moments, spatial-gray tone moments,
circumscribing circle, inscribing circle, and so on.
Other properties might depend on whether the group is considered a region or an arc. If the
group is a region, the number of holes might be a useful property. If the group is an arc,
average curvature might be a useful property. Extraction can also measure topological or
spatial relationships between two or more groupings. For example, an extracting operation
may make explicit that two groupings touch, or are spatially close, or that one grouping is
above another.
6. Matching:
After the completion of the extracting operation, the events occurring on the image have been
identified and measured, but the events in and of themselves have no meaning. The meaning
of the observed spatial events emerges when a perceptual organization has occurred such that
a specific set of spatial events in the observed spatial organization clearly constitutes an
imaged instance of some previously known object, such as a chair or the letter A. Once an
object or set of object parts has been recognized, measurements (such as the distance between
two parts, the angle between two lines or the area of an object part) can be made and related
to the allowed tolerance, as may be the case in an inspection scenario. It is the matching
operation that determines the interpretation of some related set of image events, associating
these events with some given three-dimensional object or two-dimensional shape. There are a
wide variety of matching operations. The classic example is template matching, which
compares the examined pattern with stored models (templates) of known patterns and
chooses the best match.
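A minimal sketch of template matching as just described: the examined pattern is compared against each stored template with a sum-of-squared-differences score, and the best (lowest-scoring) template wins. The tiny 3 x 3 patterns are invented for illustration.

```python
def best_match(pattern, templates):
    """Return the name of the stored template closest to the examined pattern."""
    def ssd(a, b):   # sum of squared differences between two equally sized patterns
        return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return min(templates, key=lambda name: ssd(pattern, templates[name]))

templates = {
    "vertical bar":   [[0, 1, 0]] * 3,
    "horizontal bar": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
}
observed = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]    # a noisy vertical bar
print(best_match(observed, templates))          # -> "vertical bar"
```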
Image transmission takes into account transmission of digital images through computer
networks. There are several requirements on the networks when images are transmitted: (1)
The network must accommodate bursty data transport because image transmission is bursty
(The burst is caused by the large size of the image.); (2) Image transmission requires reliable
transport; (3) Time-dependence is not a dominant characteristic of the image in contrast to
audio/video transmission.
Image size depends on the image representation format used for transmission. There are
several possibilities:
UNIT 4: VIDEO AND ANIMATION
Video is a combination of images and audio. It consists of a set of still images, called frames,
displayed to the user one after another at a specific speed known as the frame rate, measured in
frames per second (fps). If the frames are displayed fast enough, our eye cannot distinguish the
individual frames; because of persistence of vision, it merges the individual frames with each
other, thereby creating an illusion of motion. The frame rate should range between 20 and 30 fps
for smooth, realistic motion to be perceived. Audio is added and synchronized with the apparent
movement of images.
The picture width chosen for conventional television service is 4/3*picture height. Using
the aspect ratio, we can determine the horizontal field of view from the horizontal angle.
4. Perception of Depth:
In natural vision, perception of the third spatial dimension, depth, depends primarily on the
angular separation of the images received by the two eyes of the viewer. In the flat image
of television, a considerable degree of depth perception is inferred from the perspective
appearance of the subject matter. Further, the choice of the focal length of lenses and
changes in depth of focus in a camera influence the depth perception.
7. Continuity of motion:
It is known that we perceive a continuous motion to happen at any frame rate faster than
15 frames per second. Smooth video motion is achieved at 30 frames per second when the
material is filmed by a camera and not synthetically generated. Movies, however, run at 24
frames/s. The new Showscan technology involves making and showing movies at 60
frames per second and on 70-millimeter films. This scheme produces a bigger picture,
which therefore occupies a larger portion of the visual field, and produces much smoother
motion.
There are several standards for motion video signals which determine the frame rate to
achieve proper continuity of motion. The USA standard for motion video signals, NTSC
(National Television Systems Committee) standard, specified the frame rate initially to
30 frames/s, but later changed it to 29.97 Hz to maintain the visual-aural carrier separation
at precisely 4.5 MHz. NTSC scanning equipment presents images at the 24 Hz standard,
but transposes them to the 29.97 Hz scanning rate. The European standard for motion
video, PAL (Phase Alternating Line), adopted the repetition rate of 25 Hz, and the frame
rate therefore is 25 frames/s.
8. Flicker:
At slow refresh rates, a periodic fluctuation of brightness perception, the flicker effect,
arises. The marginal value to avoid flicker is at least 50 refresh cycles/s. To achieve
continuous flicker-free motion, we need a relatively high refresh frequency. Movies, as
well as television, apply some technical measures to work with lower motion frequencies.
1.1.2 Transmission
Video signals are transmitted to receivers through a single television channel. The NTSC channel
is shown in Figure below:
To encode color, a video signal is a composite of three signals. For transmission purposes, a video
signal consists of one luminance and two chrominance signals. Luminance represents the different
shades of light as grays, while chrominance (chroma) represents the different hues (shades) of color.
In NTSC systems, the composite transmission of luminance and chrominance signals in a single
channel is achieved by specifying the chrominance subcarrier to be an odd multiple of one-half of
the line-scanning frequency. This causes the component frequencies of chrominance to be
interleaved with those of luminance. The goal is to separate the two sets of components in the
receiver and avoid interference between them prior to the recovery of the primary color signals for
display.
Several approaches to color encoding are the RGB signal and the composite signal.
1.1.3 Digitalization
Before a picture or motion video can be processed by a computer or transmitted over a computer
network, it needs to be converted from an analog to a digital representation. In an ordinary sense,
digitalization consists of sampling the gray (color) level in the picture at an M x N array of points.
Since the gray level at these points may take any value in a continuous range, for digital processing,
the gray level must be quantized. By this we mean that we divide the range of gray levels into K
intervals, and require the gray level at any point to take on only one of these values. For a picture
reconstructed from quantized samples to be acceptable, it may be necessary to use 100 or more
quantizing levels.
When samples are obtained by using an array of points or finite strings, a fine degree of
quantization is very important for samples taken in regions of a picture where the gray (color)
levels change slowly. The result of sampling and quantizing is a digital image (picture), at which
point we have obtained a rectangular array of integer values representing pixels.
The next step in the creation of digital motion video is to digitize pictures in time and get a
sequence of digital images per second that approximates analog motion video.
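To get a feeling for the data volumes that result, the snippet below computes the uncompressed data rate for an assumed digitization of 640 x 480 points, 24 bits per pixel and 30 frames per second; all three figures are illustrative choices, not values fixed by these notes.

```python
width, height  = 640, 480     # M x N sampling points
bits_per_pixel = 24           # quantization: 8 bits per color component
frames_per_sec = 30

bytes_per_frame  = width * height * bits_per_pixel // 8
bytes_per_second = bytes_per_frame * frames_per_sec
print(f"{bytes_per_frame/1e6:.2f} MB per frame, {bytes_per_second/1e6:.1f} MB/s uncompressed")
```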
Figure 2: Display for video controller
The video controller displays the image stored in the frame buffer, accessing the memory through
a separate access port as often as the raster scan rate dictates. The constant refresh of the display
is its most important task. Because of the disturbing flicker effect, the video controller cycles
through the frame buffer, one scan line at a time, typically 60 times/second. For presentation of
different colors on the screen, the system works with a Color Look Up Table (CLUT or lut). At a
certain time, a limited number of colors (n) is prepared for the whole picture. The set of n colors,
used mostly, is chosen from a color space, consisting of m colors, where generally n << m.
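A hedged sketch of the look-up step: the frame buffer holds small indices and the CLUT maps each index to a full RGB triple at display time. The table entries and the four-pixel scan line below are invented for illustration.

```python
# n entries chosen from a much larger color space of m possible RGB triples
clut = {
    0: (0, 0, 0),        # black
    1: (255, 0, 0),      # red
    2: (0, 255, 0),      # green
    3: (255, 255, 255),  # white
}

frame_buffer_scanline = [0, 1, 1, 3]                  # indices stored per pixel
rgb_scanline = [clut[i] for i in frame_buffer_scanline]
print(rgb_scanline)    # the video controller sends these RGB values to the screen
```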
Some of the computer video controller standards are:
• CGA (Color Graphics adapter): The first color monitor and graphics cards for PC
computers. Capable of producing 16 colors at 160x200 pixels.
• EGA (Enhanced Graphics Adapter): an adapter that could display 16 colors with a screen
resolution of 640x350 pixels.
• VGA (Video Graphics Array): Currently the base standard for PC video cards and
monitors. True VGA supports 16 colors at 640x480 pixels or 256 colors at 320x200 pixels.
• SVGA (Super VGA): A SVGA card or monitor is capable of displaying more pixels (dots
on the screen) and/or colors than basic VGA. For example, an SVGA graphics card may
be able to display 16-bit color with a resolution of 800x600 pixels.
• XGA (Extended Graphics Array): A standard used on some IBM PS/2 models. XGA
supports 256 colors at 1024x768 pixels, or 16-bit color at 640x480 pixels.
structure and texture of an object (update dynamics), and changes in lighting, camera position,
orientation and focus.
A computer-based animation is an animation performed by a computer using graphical tools to
provide visual effects. Processes of computer based animation are as follows:
bandwidth of over 9 Mbytes per second. On the other hand, new values for the lut can be sent very
rapidly, since luts are typically on the order of a few hundred to a few thousand bytes.
Controlling animation is independent of the language used for describing it. Animation control
mechanisms can employ different techniques.
1.5.1 Full Explicit Control
Explicit control is the simplest type of animation control. Here, the animator provides a description
of everything that occurs in the animation, either by specifying simple changes, such as scaling,
translation, and rotation, or by providing key frame information and interpolation methods to use
between key frames. This interpolation may be given explicitly or (in an interactive system) by
direct manipulation with a mouse, joystick, data glove or other input device. An example of this
type of control is the BBOP system.
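To make the key-frame idea concrete, here is a small linear-interpolation sketch (my own illustration, not the BBOP system): positions of an object are given at key frames, and the in-between frames are computed from them.

# Sketch: explicit control via key frames and linear interpolation.
keyframes = {0: (0.0, 0.0), 10: (100.0, 50.0), 20: (100.0, 0.0)}  # frame -> (x, y)

def interpolate(frame):
    frames = sorted(keyframes)
    if frame <= frames[0]:
        return keyframes[frames[0]]
    for f0, f1 in zip(frames, frames[1:]):
        if f0 <= frame <= f1:
            t = (frame - f0) / (f1 - f0)           # 0..1 between the two key frames
            x0, y0 = keyframes[f0]
            x1, y1 = keyframes[f1]
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return keyframes[frames[-1]]

for frame in range(0, 21, 5):
    print(frame, interpolate(frame))   # positions of the in-between frames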
with an acceleration proportional to the forces acting on it, and the proportionality constant is the
mass of the particle. Thus, a dynamic description of a scene might be, "At time t =0 seconds, the
cube is at position (0 meters, 100 meters, 0 meters). The cube has a mass of 100 grams. The force
of gravity acts on the cube." Naturally, the result of a dynamic simulation of such a model is that
the cube falls.
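A hedged sketch of how such a dynamic description could be simulated numerically (simple explicit Euler integration; the time step is an arbitrary choice for this example):

# Sketch: dynamic simulation of the falling cube with explicit Euler steps.
mass = 0.1           # 100 grams, expressed in kilograms
position_y = 100.0   # metres, as in the description above
velocity_y = 0.0
gravity = -9.81      # m/s^2, the force of gravity acting on the cube
dt = 0.1             # time step in seconds (an arbitrary choice)

t = 0.0
while position_y > 0.0:
    force_y = mass * gravity           # only gravity acts on the cube
    acceleration_y = force_y / mass    # Newton's second law
    velocity_y += acceleration_y * dt
    position_y += velocity_y * dt
    t += dt

print("cube reaches the ground after about %.1f seconds" % t)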
If rotating and scan-converting the object takes longer than 100 milliseconds, the animation is
quite slow, but the transition from one image to the next appears to be instantaneous. Loading the
look-up table typically takes less than one millisecond.
1.7 TRANSMISSION OF ANIMATION
As described above, animated objects may be represented symbolically using graphical objects or
scan-converted pixmap images. Hence, the transmission of animation over computer networks
may be performed using one of two approaches:
• The symbolic representation (e.g. circle) of animation objects (e.g. ball) is transmitted
together with the operation commands (e.g. roll the ball) performed on the object, and at
the receiver side the animation is displayed. In this case, the transmission time is short
because the symbolic representation of an animated object is smaller in byte size than its
pixmap representation, but the display time at the receiver takes longer because the scan
converting operation has to be performed at the receiver side. In this approach, the
transmission rate (bits/second or bytes/second) of animated objects depends on:
1. the size of the symbolic representation structure in which the animated object is
encoded,
2. the size of the structure in which the operation command is encoded,
3. the number of animated objects and operation commands sent per second.
• The pixmap representation of the animated objects is transmitted and displayed on the
receiver side. In this case, the transmission time is longer in comparison to the previous
approach because of the size of the pixmap representation, but the display time is shorter
because the scan-conversion of the animated objects is avoided at the receiver side. It is
performed at the sender side where animation objects and operation commands are
generated. In this approach, the transmission rate of the animation is equal to the size of
the pixmap representation of an animated object (graphical image) multiplied by the
number of graphical images per second. A rough comparison of the two approaches is sketched below.
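The following back-of-the-envelope sketch contrasts the two approaches; all sizes (object description, command size, frame size and rates) are invented purely for illustration.

# Sketch: comparing the two animation-transmission approaches.
# All numbers below are assumptions for illustration only.

# Approach 1: symbolic representation + operation commands
object_size_bytes = 64        # e.g. a circle description
command_size_bytes = 16       # e.g. "roll the ball"
objects_per_second = 10
commands_per_second = 30
symbolic_rate = (object_size_bytes * objects_per_second +
                 command_size_bytes * commands_per_second)   # bytes/second

# Approach 2: pixmap representation of every generated image
width, height, bytes_per_pixel = 320, 240, 3
images_per_second = 15
pixmap_rate = width * height * bytes_per_pixel * images_per_second  # bytes/second

print("symbolic:", symbolic_rate, "bytes/s")   # 1,120 bytes/s
print("pixmap:  ", pixmap_rate, "bytes/s")     # 3,456,000 bytes/s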
THANK YOU!!!
Unit 5: Data Compression
Why compression?
➢ To reduce the volume of data to be transmitted (text, fax, images)
➢ To reduce the bandwidth required for transmission and to reduce storage requirements
(speech, audio, video)
Data compression implies sending or storing a smaller number of bits. Although many methods
are used for this purpose, in general these methods can be divided into two broad categories:
lossless and lossy methods.
In digital video, in addition to spatial redundancy, neighboring images in a video
sequence may be similar (temporal redundancy).
Human perception: compressed versions of digital audio, images and video need not
represent the original information exactly.
Perception sensitivities are different for different signal patterns.
The human eye is less sensitive to the higher spatial frequency components than to the lower
frequencies (exploited by transform coding).
2. In lossless compression, the file is restored in its original form; in lossy compression, the file is not restored in its original form.
3. Lossless data compression algorithms include Run-Length Encoding, Huffman encoding, Shannon-Fano encoding, Arithmetic encoding and Lempel-Ziv-Welch encoding; lossy data compression algorithms include transform coding, the Discrete Cosine Transform, the Discrete Wavelet Transform, fractal compression, etc.
5. Compared with lossy data compression, lossless data compression retains more data.
6. File quality is high with lossless compression and low with lossy compression.
7. Lossless compression mainly supports RAW, BMP, PNG and WAV; lossy compression mainly supports JPEG, GIF, MP3, MP4 and MKV.
1.4 ENTROPY, SOURCE AND HYBRID CODING
If there is not a lot of repetition in the data, then it is possible that the run-length encoding scheme
would actually increase the size of a file.
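A minimal run-length encoding sketch (my own illustration) makes the point: long runs compress well, while data without repetition can even grow.

# Sketch: simple run-length encoding as (count, symbol) pairs.
def rle_encode(data):
    encoded = []
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i]:
            run += 1
        encoded.append((run, data[i]))
        i += run
    return encoded

print(rle_encode("AAAABBBCCD"))   # [(4, 'A'), (3, 'B'), (2, 'C'), (1, 'D')]  -> shrinks
print(rle_encode("ABCD"))         # [(1, 'A'), (1, 'B'), (1, 'C'), (1, 'D')]  -> grows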
Processing: This involves the conversion of the information from the time domain to the
frequency domain by using DCT (discrete cosine transform). In the case of motion video
compression, interframe coding uses a motion vector for each 8*8 block.
Quantization: This defines the discrete levels or values that the information is allowed to take. The
process involves a reduction of precision. The quantization may be uniform, or it may
be differential depending upon the characteristics of the picture.
Entropy Encoding: This is the lossless compression step, where the semantics of the data are
ignored and only their statistical characteristics are considered. It may use run-length coding or
statistical (e.g., Huffman or arithmetic) coding. After compression, the compressed video stream
contains the specification of the image starting point, and an identification of the compression
technique may be part of the data stream. An error correction code may also be added to the
stream. Decompression is the inverse process of compression.
1.5.1 Huffman Tree
Step 1: For each character, create a leaf node. The leaf node of a character contains
the frequency of that character.
Step 2: Arrange all the nodes in sorted order according to their frequency.
Step 3: There may exist a condition in which two nodes have the same frequency. In such a
case, do the following:
Step 4: Repeat steps 2 and 3 until all the nodes form a single tree. Thus, we get a Huffman tree.
For example:
Character Frequency
A 5
B 9
C 12
D 13
E 16
F 45
1. Step 1: Make pairs of characters and their frequencies. i.e. (a,5), (b,9), (c,12), (d,13),
(e,16), (f,45)
2. Step 2: Sort pairs with respect to frequency, in this case given data are already sorted.
3. Step 3: Pick the first two characters and join them under a parent node. Add a new
internal node with frequency 5 + 9 = 14.
Character Frequency
C 12
D 13
New node 14
E 16
F 45
4. Step 4: Repeat Steps 2 and 3 until, we get a single tree.
The Huffman tree is built by repeatedly merging the two nodes with the lowest frequencies. The
sorted frequency list evolves as follows:
Initial list: 5, 9, 12, 13, 16, 45
After merging 5 + 9 = 14: 12, 13, 14, 16, 45
After merging 12 + 13 = 25: 14, 16, 25, 45
After merging 14 + 16 = 30: 25, 30, 45
After merging 25 + 30 = 55: 45, 55
After merging 45 + 55 = 100: 100 (root of the Huffman tree)
1.5.2 Coding
Therefore, we get a single tree. Then we find the code for each character with the help of
the above tree: assign a weight to each edge, where each left edge is weighted 0 and
each right edge is weighted 1.
Character Frequency code-word
A 5 1100
B 9 1101
C 12 100
D 13 101
E 16 111
F 45 0
Here, the number of characters = 6.
With a fixed-length code, the number of bits required per character = ceil(log2 6) = 3.
Bits used before coding: 5*3 + 9*3 + 12*3 + 13*3 + 16*3 + 45*3 = 300
Bits used after coding: 5*4 + 9*4 + 12*3 + 13*3 + 16*3 + 45*1 = 224
Therefore, (300-224)/300 ≈ 25.3% of space is saved after Huffman coding.
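The construction above can be reproduced with a short program. The sketch below (an illustration using Python's heapq, not part of the original notes) builds the Huffman code for the same frequencies and reports the total bit count.

import heapq

# Sketch: building the Huffman code for the frequencies used above.
freq = {'a': 5, 'b': 9, 'c': 12, 'd': 13, 'e': 16, 'f': 45}

# Each heap entry: (frequency, tie_breaker, {symbol: code_so_far})
heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
heapq.heapify(heap)
counter = len(heap)

while len(heap) > 1:
    f1, _, codes1 = heapq.heappop(heap)   # the two lowest-frequency nodes
    f2, _, codes2 = heapq.heappop(heap)
    merged = {s: "0" + c for s, c in codes1.items()}        # left edge -> 0
    merged.update({s: "1" + c for s, c in codes2.items()})  # right edge -> 1
    counter += 1
    heapq.heappush(heap, (f1 + f2, counter, merged))

codes = heap[0][2]
total_bits = sum(freq[s] * len(codes[s]) for s in freq)
print(codes)        # {'f': '0', 'c': '100', 'd': '101', 'a': '1100', 'b': '1101', 'e': '111'}
print(total_bits)   # 224 bits, versus 300 bits with a fixed 3-bit code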
Q. Find the Average Code Length for the String and Length of the Encoded String of the given
string “abcdrcbd” Use the Huffman coding technique to find the representation bits of the
character.
Character Frequency
A 5
B 2
C 1
D 1
R 2
Hints:
Average Code Length = ∑ ( frequency × code length ) / ∑ ( frequency )
Length of the encoded string = total number of characters in the text x average code length per character
2 JPEG
The JPEG standard for compressing continuous-tone still pictures (e.g. photographs) was
developed by the Joint Photographic Experts Group, and JPEG became an ISO standard. JPEG is a
single algorithm designed to satisfy the requirements of a broad range of still-image compression applications.
Why we need image compression?
Let us take an example, A typical digital image has 512x480 pixels. In 24-bit colour (one byte
for each of the red, green and blue components), the image requires 737,280 bytes of storage
space. It would take about 1.5 minutes to transmit the uncompressed image over a 64kb/second
link. The JPEG algorithms offer compression rates of most images at ratios of about 24:1.
Effectively, every 24 bits of data is stuffed into 1 bit, giving a compressed file size (for the above
image dimensions) of 30,720 bytes, and a corresponding transmission time of 3.8 seconds.
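The arithmetic behind these figures can be checked directly with a small illustrative computation:

# Sketch: checking the storage and transmission figures quoted above.
pixels = 512 * 480
uncompressed_bytes = pixels * 3            # 24-bit colour -> 737,280 bytes
link_bits_per_second = 64_000              # 64 kb/s link

uncompressed_seconds = uncompressed_bytes * 8 / link_bits_per_second
compressed_bytes = uncompressed_bytes // 24          # 24:1 compression -> 30,720 bytes
compressed_seconds = compressed_bytes * 8 / link_bits_per_second

print(uncompressed_bytes, round(uncompressed_seconds / 60, 1))  # 737280 bytes, ~1.5 minutes
print(compressed_bytes, round(compressed_seconds, 1))           # 30720 bytes, ~3.8 seconds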
There are some requirements of JPEG standard and they are:
• The JPEG implementation should be independent of image size.
• The JPEG implementation should be applicable to any image and pixel aspect ratio.
• Color representation itself should be independent of the special implementation.
• Image content may be of any complexity, with any statistical characteristics.
• The JPEG standard specification should be state of the art (or near it) regarding the
compression factor and the achieved image quality.
• Processing complexity must permit a software solution to run on as many available
standard processors as possible. Additionally, the use of specialized hardware should
substantially enhance image quality.
I= 0.60R – 0.28G – 0.32B
Q= 0.21R – 0.52G + 0.31B
Separate matrices are constructed for Y, I and Q, each with elements in the range 0 to 255. The
square blocks of four pixels are averaged in the I and Q matrices to reduce them to 320x240.
Thus the data is compressed by a factor of two. Then 128 is subtracted from each element of all
three matrices to put 0 in the middle of the range. Each image is divided up into 8x8 blocks. The
Y matrix has 4800 blocks; the other two have 1200 blocks each.
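A hedged sketch of this preparation step follows. The Y equation is not shown in the excerpt above, so the commonly quoted form Y = 0.30R + 0.59G + 0.11B is assumed here; the pixel values are invented.

# Sketch: RGB -> YIQ conversion and 2x2 averaging of the I and Q matrices.
def rgb_to_yiq(r, g, b):
    y = 0.30 * r + 0.59 * g + 0.11 * b   # assumed standard luminance weights
    i = 0.60 * r - 0.28 * g - 0.32 * b
    q = 0.21 * r - 0.52 * g + 0.31 * b
    return y, i, q

def average_2x2(matrix):
    # Average square blocks of four pixels, halving each dimension.
    h, w = len(matrix), len(matrix[0])
    return [[(matrix[y][x] + matrix[y][x + 1] +
              matrix[y + 1][x] + matrix[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

# Example: a tiny 2x2 "image" of RGB pixels
rgb_image = [[(200, 30, 30), (190, 40, 35)],
             [(205, 25, 40), (195, 35, 30)]]
i_plane = [[rgb_to_yiq(*px)[1] for px in row] for row in rgb_image]
print(average_2x2(i_plane))   # the I plane reduced from 2x2 to 1x1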
Step 4: (Differential Quantization) This step reduces the (0,0) value of each block by replacing
it with the amount it differs from the corresponding element in the previous block. Since these
elements are the averages of their respective blocks, they should change slowly, so taking the
differential values should reduce most of them to small values. The (0,0) values are referred to as
the DC components; the other values are the AC components.
Step 5: (Run length Encoding) This step linearizes the 64 elements and applies run-length
encoding to the list. In order to concentrate zeros together, a zigzag scanning pattern is used.
Finally run length coding is used to compress the elements.
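The zigzag idea can be sketched as follows (an illustration; the block values are made up): coefficients are read along the anti-diagonals so that the many zeros in the high-frequency corner end up in one long run.

# Sketch: zigzag scan of an 8x8 block followed by simple run-length coding of zeros.
def zigzag_order(n=8):
    # Return (row, col) pairs in zigzag order for an n x n block.
    order = []
    for s in range(2 * n - 1):                 # s = row + col along each anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        order.extend(diag if s % 2 else reversed(diag))
    return order

block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 96, 12, -7   # a few nonzero low-frequency terms

linear = [block[r][c] for r, c in zigzag_order()]
# Run-length code the zeros as (zero_run, value) pairs, ending with an end-of-block marker.
rle, zeros = [], 0
for v in linear:
    if v == 0:
        zeros += 1
    else:
        rle.append((zeros, v))
        zeros = 0
rle.append("EOB")
print(rle)   # [(0, 96), (0, 12), (0, -7), 'EOB']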
Step 6: (Statistical Encoding) Huffman encodes the numbers for storage or transmission,
assigning common numbers shorter codes than uncommon ones. JPEG produces a 20:1 or even
better compression ratio. Decoding a JPEG image requires running the algorithm backward and
thus it is roughly symmetric: decoding takes as long as encoding.
Although JPEG is a single algorithm, it has four modes of operation in order to satisfy the
requirements of a broad range of still-image compression applications.
2.2.2 Quantization
The JPEG application provides a table of 64 entries. Each entry is used for the quantization
of one of the 64 DCT coefficients, so each of the 64 coefficients can be adjusted separately.
Each table entry is an 8-bit integer value (coefficient Qvu). The quantization process becomes less
accurate as the size of the table entries increases. Quantization and dequantization must use the
same tables.
AC coefficients are the DCT coefficients for which the frequency in one or both dimensions is
non-zero.
The DC coefficient is the DCT coefficient for which the frequency is zero in both dimensions.
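A small sketch of the (de)quantization step follows. The coefficient and table values are invented; JPEG's real tables are application-defined, as described above.

# Sketch: quantization and dequantization of DCT coefficients with table entries Qvu.
dct_coefficients = [[96.3, 12.8], [-7.4, 3.1]]   # a tiny made-up excerpt of a block
q_table           = [[16,   11 ], [12,   14 ]]   # matching excerpt of a quantization table

quantized = [[round(c / q) for c, q in zip(crow, qrow)]
             for crow, qrow in zip(dct_coefficients, q_table)]
dequantized = [[c * q for c, q in zip(crow, qrow)]
               for crow, qrow in zip(quantized, q_table)]

print(quantized)     # [[6, 1], [-1, 0]]  -- precision is reduced here (the lossy step)
print(dequantized)   # [[96, 11], [-12, 0]] -- only an approximation of the original values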
JPEG specifies Huffman and arithmetic encoding as entropy encoding methods. However, in
the lossy sequential DCT-based mode, only Huffman encoding is allowed. In the lossy
sequential mode, a rough framework of the whole picture is not formed first; instead, the picture
is decoded and drawn part by part, i.e. sequentially.
Figure 5: Progressive picture presentation used in the expanded lossy DCT-based mode
For the expanded lossy DCT-based mode, JPEG specifies progressive encoding in addition to
sequential encoding. At first, a very rough representation of the image appears which is
progressively refined until the whole image is formed. This progressive coding is achieved by
layered coding. Progressiveness is achieved in two different ways:
▪ Spectral selection: in the first run, only the quantized DCT coefficients of the low
frequencies of each data unit are passed to the entropy encoding. In successive runs, the
coefficients of higher frequencies are processed.
▪ Successive approximation transfers all of the quantized coefficients in each run, but
single bits are differentiated according to their significance. The most-significant bits are
encoded first, then the less-significant bits.
2.4 LOSSLESS
The decoder renders an exact reproduction of the original digital image.
This mode is used when it is necessary to decode a compressed image identical to the original.
Compression ratios are typically only 2:1. Rather than grouping the pixels into 8x8 blocks, data
units are equivalent to single pixels. Image processing and quantization use a predictive
technique, rather than a transformation encoding one. For a pixel X in the image, one of 8
possible predictors is selected (see table below). The prediction selected will be the one which
gives the best result from the a priori known values of the pixel's neighbours, A, B, and C. The
number of the predictor as well as the difference of the prediction to the actual value is passed to
the subsequent entropy encoding.
0 No prediction
1 X=A
2 X=B
3 X=C
4 X=A+B-C
5 X=A+(B-C)/2
6 X=B+(A-C)/2
7 X=(A+B)/2
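As an illustration (not part of the standard text above), the sketch below applies one of these predictors to a pixel X, given its already-known neighbours A (left), B (above) and C (above-left), and forms the difference that would go to entropy encoding.

# Sketch: predictive coding in the lossless mode.
# A = left neighbour, B = neighbour above, C = above-left neighbour of pixel X.
def predict(selection, a, b, c):
    predictors = {
        1: a,
        2: b,
        3: c,
        4: a + b - c,
        5: a + (b - c) // 2,
        6: b + (a - c) // 2,
        7: (a + b) // 2,
    }
    return predictors[selection]

a, b, c, x = 100, 104, 98, 103       # made-up pixel values
selection = 4                         # predictor number chosen for this pixel
difference = x - predict(selection, a, b, c)
# The pair (selection, difference) is what is passed to the entropy encoder.
print(selection, difference)          # 4, -3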
2.5 HIERARCHICAL
The input image is coded as a sequence of increasingly higher resolution frames. The client
application will stop decoding the image when the appropriate resolution image has been
reproduced.
This mode uses either the lossy DCT-based algorithms or the lossless compression technique.
The main feature of this mode is the encoding of the image at different resolutions. The prepared
image is initially sampled at a lower resolution (reduced by a factor of 2^n). Subsequently, the
resolution is reduced by a factor of 2^(n-1) vertically and horizontally, and this compressed image is then
subtracted from the previous result. The process is repeated until the full resolution of the image
is compressed.
Hierarchical encoding requires considerably more storage capacity, but the compressed image is
immediately available at the desired resolution. Therefore, applications working at lower
resolutions do not have to decode the whole image and then subsequently reduce the resolution.
(referring to human visual perception). The final step is an entropy coding using the Run Length
Encoding and the Huffman coding algorithm.
backward frames. P-frames and B-frames are called inter coded frames, whereas I-frames are
known as intra coded frames.
The references between the different types of frames are realized by a process called motion
estimation or motion compensation. The correlation between two frames in terms of motion is
represented by a motion vector. The resulting frame correlation, and therefore the pixel
arithmetic difference, strongly depends on how well the motion estimation algorithm is
implemented. The steps involved in motion estimation are:
• Frame Segmentation - The actual frame is divided into nonoverlapping blocks (macro
blocks), usually 8x8 or 16x16 pixels. The smaller the chosen block size, the more
vectors need to be calculated.
• Search Threshold - In order to minimize the number of expensive motion estimation
calculations, they are only calculated if the difference between two blocks at the same
position is higher than a threshold, otherwise the whole block is transmitted.
• Block Matching - In general, block matching tries to “stitch together” an actual predicted
frame by using snippets (blocks) from previous frames (a block-matching sketch follows this list).
• Prediction Error Coding - Video motions are often more complex, and a simple
“shifting in 2D” is not a perfectly suitable description of the motion in the actual scene,
causing so called prediction errors.
• Vector Coding - After determining the motion vectors and evaluating the correction,
these can be compressed. Large parts of MPEG videos consist of B- and P-frames as seen
before, and most of them have mainly stored motion vectors.
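The block-matching step mentioned above can be sketched as an exhaustive search minimising the sum of absolute differences (SAD) within a small search window. Frame contents, block size and window size are invented for this example.

# Sketch: exhaustive block matching with the sum of absolute differences (SAD).
def sad(block_a, block_b):
    return sum(abs(p - q) for row_a, row_b in zip(block_a, block_b)
                           for p, q in zip(row_a, row_b))

def block(frame, top, left, size):
    return [row[left:left + size] for row in frame[top:top + size]]

def best_motion_vector(prev_frame, cur_frame, top, left, size=4, search=2):
    # Find the (dy, dx) shift in prev_frame that best matches the current block.
    current = block(cur_frame, top, left, size)
    best, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= len(prev_frame) - size and 0 <= x <= len(prev_frame[0]) - size:
                candidate_sad = sad(current, block(prev_frame, y, x, size))
                if candidate_sad < best_sad:
                    best_sad, best = candidate_sad, (dy, dx)
    return best, best_sad

# Tiny example: the previous frame shifted one pixel to the right becomes the current frame.
prev = [[r * 8 + c for c in range(8)] for r in range(8)]
cur = [row[-1:] + row[:-1] for row in prev]
print(best_motion_vector(prev, cur, top=2, left=2))   # ((0, -1), 0): content moved one pixel right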
Step 3: Discrete Cosine Transform (DCT)
DCT allows, similar to the Fast Fourier Transform (FFT), a representation of image data in terms
of frequency components. So the frame-blocks (8x8 or 16x16 pixels) can be represented as
frequency components. The transformation into the frequency domain is described by the
following formula:
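For an N x N block (N = 8 or 16), the commonly used textbook form of the forward two-dimensional DCT, assumed here to be the one intended, is:

$$F(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y)\, \cos\!\left[\frac{(2x+1)u\pi}{2N}\right] \cos\!\left[\frac{(2y+1)v\pi}{2N}\right], \qquad C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k > 0 \end{cases}$$

For N = 8 this gives the familiar 1/4 scaling factor used in JPEG and MPEG.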
Step 4: Quantization
During quantization, which is the primary source of data loss, the DCT terms are divided by a
quantization matrix, which takes human visual perception into account. The human eye is
more sensitive to low frequencies than to high ones. Many higher frequencies end up with a zero entry
after quantization, so the amount of data is reduced significantly.
UNIT 6: USER INTERFACES
The user interface is the point at which human users interact with a computer, website or
application. The goal of effective UI is to make the user's experience easy and intuitive, requiring
minimum effort on the user's part to receive the maximum desired outcome.
They include both input devices like a keyboard, mouse, trackpad, microphone, touch screen,
fingerprint scanner, e-pen and camera, and output devices like monitors, speakers and printers.
Devices that interact with multiple senses are called "multimedia user interfaces." For example,
everyday UI uses a combination of tactile input (keyboard and mouse) and a visual and auditory
output (monitor and speakers).
GUI is the acronym for graphical user interface—the interface that allows users to interact with
electronic devices, such as computers, laptops, smartphones and tablets, through graphical
elements. It’s a valuable part of software application programming in regards to human-computer
interaction, replacing text-based commands with user-friendly actions. Its goal is to present the
user with decision points that are easy to find, understand and use. In other words, GUI lets you
control your device with a mouse, pen or even your finger.
GUI was created because text command-line interfaces were complicated and difficult to learn.
The GUI process lets you click or point to a small picture, known as an icon or widget, and open
a command or function on your devices, such as tabs, buttons, scroll bars, menus, icons, pointers
and windows. It is now the standard for user-centered design in software application programming.
1.1.2 Information Characteristics for Presentation
The complete set of information characteristics makes knowledge definition and representation
easier because it allows for appropriate mapping between information and presentation techniques.
The information characteristics specify:
• Types – characterization schemes are based on the ordering of information. There are two types
of ordered data:
1. Coordinates vs. amounts, which specify points in time, space or other domains;
2. Intervals vs. ratios, which suggest the types of comparisons meaningful among elements
of coordinate and amount data types.
• Relational Structures – This group of characteristics refers to the way in which a relation
maps among its domain sets (dependency). There are functional dependencies and non-
functional dependencies. An example of a relational structure which expresses functional
dependency is a bar chart. An example of a relational structure which expresses non-
functional dependency is a student entry in a relational database.
• Multi-domain Relations –Relations can be considered across multiple domains, such as:
1. Multiple attributes of a single object set (e.g. positions, colours, shapes, and/or sizes
of a set of objects in a chart);
2. Multiple object sets (e.g., a cluster of text and graphical symbols on a map);
3. Multiple displays.
• Large Data Sets – Large data sets refer to numerous attributes of collections of
heterogeneous objects (e.g. presentation of semantic networks, databases with numerous
object types and attributes of technical documents for large systems, etc.)
• presentation objects that represent facts (e.g., coordination of the spatial and temporal
arrangement of points in a chart)
• multiple displays (e.g, windows)
• Entertainment software: - This type of app allows a computer to be used as an
entertainment tool.
Here are major differences between System and Application software:
are not as simple to deliver because the necessary high data transfer rate is not guaranteed by
most of the hardware in current graphics systems.
Instead of using buttons in a window system, positioning along the different axes can also be done
through scrollbars.
Direct Manipulation of the Video Window
In our setup we decided to use a very user-friendly variant known as direct manipulation of the
video window. There are two possibilities:
1. Absolute Positioning: Imagine a tree in the upper right corner of the video window. The
user positions the cursor on this object and double-clicks with the mouse. Now, the camera
will be positioned so that the tree is the center of the video window, i.e., the camera moves
in the direction of the upper right corner. This method of object pointing and activating a
movement of camera is called absolute positioning. The camera control algorithm must
derive the position command from:
o the relative position of the pointer during the object activation in the video
window; and,
o the specified focal distance.
2. Relative Positioning: Imagine the pointer to the right of the center of the video window.
By pushing the mouse button, the camera moves to the right. The relative position of the
pointer with respect to the center of the video window determines the direction of the
camera movement. When the mouse button is released, the camera movement stops. This
kind of direct manipulation in the video window is called relative positioning. A camera
can move at different speeds. A speed can be specified through the user interface as
follows:
o If the mouse has several buttons, different speeds can be assigned to each button.
For example, the left mouse button could be responsible for slow, accurate motion
(e.g., for calibration of the camera). The right buttons could be for fast movement
of the camera.
o Instead of working with several mouse buttons, the distance of the pointer from the
window center could determine the speed: the larger the distance, the faster the
movement of the camera (a small sketch of this mapping follows).
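The distance-to-speed mapping described above could look like the following sketch; window size, dead zone and maximum speed are arbitrary illustration values.

# Sketch: relative positioning -- map the pointer offset from the window centre
# to a camera pan/tilt speed; the farther from the centre, the faster the camera moves.
WINDOW_W, WINDOW_H = 640, 480
MAX_SPEED = 30.0      # degrees per second at the window border (illustrative value)
DEAD_ZONE = 0.05      # ignore tiny offsets around the centre

def camera_speed(pointer_x, pointer_y):
    # Normalised offset from the centre, in the range -1.0 .. 1.0 per axis
    dx = (pointer_x - WINDOW_W / 2) / (WINDOW_W / 2)
    dy = (pointer_y - WINDOW_H / 2) / (WINDOW_H / 2)
    pan  = 0.0 if abs(dx) < DEAD_ZONE else dx * MAX_SPEED
    tilt = 0.0 if abs(dy) < DEAD_ZONE else dy * MAX_SPEED
    return pan, tilt   # applied to the camera while the mouse button is held down

print(camera_speed(480, 240))   # pointer right of centre -> pan right at half speed
print(camera_speed(320, 240))   # pointer at the centre -> camera stands still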
example, the conference system always activates the video window with the loudest-speaking
participant. The recognition of the loudest acoustic signal can be measured over a duration of five
seconds. Therefore, short, unwanted and loud signals can be compensated for.
In the case of monophony, all audio sources have the same spatial location. A listener can only
properly understand the loudest audio signal. The same effect can be simulated by closing one ear.
Stereophony allows listeners with bilateral hearing capabilities to hear lower intensity sounds.
The concept of the audio window allows for application independent control of audio parameters,
including spatial positioning. Most current multimedia applications using audio determine the
spatial positioning themselves and do not allow the user to change it. An example of such an
application is the audio tool for SUN workstations. The figure below shows the user interface of
this audio tool:
1.5.2 Presentation
The presentation, i.e., the optical image at the user interface, can have the following variants:
• Full text
• Abbreviated text
• Icons, i.e. graphics
• Micons, i.e. motion video icons
multimedia application, these devices are not always meant to be operational for 24 hours.
It also cannot easily become operational when a call arrives.
4. The state of the telephone device (i.e., the telephone application) must always be visible.
While working with the telephone application, it gets into different states, and in different states
different functions are performed. Some states imply that a function can be selected, other
states imply that a function cannot be selected. The non-selectable functions can be:
• Nonexistent: The function disappears when no activation is possible.
• Displayed: The function is displayed, but marked as deactivated and any
interaction is ignored; for example, deactivated menu functions are displayed in
gray, the active functions are displayed in black.
• Overlapped: If a function is overlapped by another window which is designated as
the confirmer, this function cannot be selected. Only after closing the confirmer
can other input be accepted.
It is important to point out that the functions most often used are always visible in the form of a
control panel. It is necessary to pick carefully which functions will belong in the control panel.
5. When a call request arrives, it must be immediately signalled (e.g., ringing).
Design of a user interface is also influenced by a specific implementation environment. For
example, in addition to the primitives of the particular window system, the quality of the graphical
terminal with its resolution, size and colour-map is important.
THANK YOU!!!
Assignments:
1. What are the issues which must be considered in the design of a multimedia user interface?
2. How can a user interface be made user-friendly? Explain.
3. What are the primary goals which should be considered in the design of a multimedia user
interface?
Unit 7: Abstractions for programming
The state of the art of programming
Most of the current commercially available multimedia applications are implemented in
procedure-oriented programming languages. Application code is still highly dependent on
hardware; a change of multimedia devices still often requires re-implementation. Common
operating system extensions try to attack these problems. Different programming possibilities
for accessing and representing multimedia data are described below.
1 ABSTRACTION LEVELS
Abstraction levels in programming define different approaches with a varying degree of detail
for representing, accessing and manipulating data. The abstraction levels with respect to
multimedia data, and their relations to each other, are shown in the figure below.
A library, the simplest abstraction level, includes the functions necessary for controlling the
corresponding hardware with specific device access operations. Libraries are very useful at
the OS level, but there is no agreement over which functions are best for the different drivers.
As with any device, multimedia devices can be bound to the operating system through a device
driver.
The processing of continuous data then becomes part of the system software. Continuous
data processing requires appropriate schedulers, such as a rate-monotonic scheduler or an earliest-
deadline-first scheduler. Multimedia device drivers embedded in the OS simplify considerably the
implementation of device access and scheduling.
A simpler approach to controlling audio and video data processing in a programming environment
than the system software interface can be taken by using toolkits. Toolkits can also
hide process structures.
A language used to implement a multimedia application can contain abstractions of multimedia
data; this is the level of higher procedural programming languages.
The object-oriented approach was first introduced as a method for reducing
complexity in software development, and it is mainly used for this purpose today. It provides the
application with a class hierarchy for the manipulation of multimedia data.
2 LIBRARIES
Libraries contain the set of functions used for processing the continuous media. Libraries are
provided together with the corresponding hardware. Some libraries can be considered as
extensions of the GUI, whereas other libraries consist of control instructions passed as control
blocks to the corresponding drivers.
Libraries are very useful at the operating system level. Since there is no sufficient OS support
for continuous data and no integration into the programming environment exists, there
will always be a variety of interfaces and hence a set of different libraries.
Libraries differ in their degree of abstraction.
3 SYSTEM SOFTWARE
Instead of implementing access to multimedia devices through individual libraries, the device
access can become part of the OS, e.g. the Nemo system.
The Nemo system consists of the Nemo Trusted Supervisor Call (NTSC), running in
supervisor mode, and three domains running in user mode: system, device drivers and applications.
The NTSC code implements those functions which are required by user-mode processes. It
provides support for three types of processes. System processes implement the majority of the
services provided by the OS. Device processes are similar to system processes, but are attached
to device interrupt stubs which execute in supervisor mode.
The NTSC calls are separated into two classes: one containing calls which may only be
executed by a suitably privileged system process, such as the kernel, and the other containing calls
which may be executed by any process.
The NTSC is responsible for providing an interface between a multimedia hardware device and its
associated driver process. This device driver arrangement ensures that a device can be supported
even if it offers only a low-level hardware interface to the system software. Application processes contain user
programs. Processes interact with each other via the system abstraction IPC (Inter-Process
Communication). IPC is implemented using low-level system abstractions: events and, if
required, shared memory.
Data as time capsules:
Time capsules are a special abstraction related to file systems. These file extensions
serve for the storage, modification and access of continuous media. Each logical data unit (LDU)
carries in its time capsule, in addition to its data type and actual value, its valid life span. This
concept is used more widely for video than for audio.
Data as streams:
A stream denotes the continuous flow of audio and video data. A stream is established between
a source and a sink before the flow starts. Operations such as play, fast forward, rewind and stop
can be performed on a stream.
In Microsoft windows, a media control interface (MCI) provides the interface for processing
multimedia data. It allows the access to continuous media streams and their corresponding
devices.
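As a rough illustration of the stream abstraction (this is only a sketch, not the MCI API), a stream connects a source to a sink and offers the usual control operations:

# Sketch: a stream abstraction between a source and a sink with play/stop-style control.
class Stream:
    def __init__(self, source, sink):
        self.source, self.sink = source, sink
        self.state = "stopped"
        self.rate = 1.0                 # 1.0 = normal playback speed

    def play(self):         self.state, self.rate = "playing", 1.0
    def fast_forward(self): self.state, self.rate = "playing", 2.0
    def rewind(self):       self.state, self.rate = "playing", -2.0
    def stop(self):         self.state = "stopped"

    def __repr__(self):
        return f"Stream({self.source} -> {self.sink}, {self.state}, rate={self.rate})"

s = Stream("camera", "window")   # establish the stream between source and sink
s.play()
print(s)                         # Stream(camera -> window, playing, rate=1.0)
s.stop()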
4 TOOLKITS
Toolkits are used for controlling the audio and video data processing in a programming
environment. Toolkit hides the process structures. It represents interfaces at the system
software level. Toolkits are used to:
• Abstract from the actual physical layer.
• Allow a uniform interface for communication with all different devices of continuous
media
• Introduce the client-server paradigm
Toolkits can also hide process structures. It would be of great value for the development of
multimedia application software to have the same toolkit on different system platforms but,
according to current experience, this remains a wish, and it would cause a decrease in
performance.
Toolkits should represent interfaces at the system software level. In this case, it is possible to
embed them into programming languages or object-oriented environments. Hence, the
available abstractions are discussed in the subsequent sections on programming languages and
object-oriented approaches.
• Media as types
• Media as files
• Media as processes
• Programming language requirements
➢ Inter-process communication mechanism
➢ Language
a, b: REAL;
………
WHILE … DO
  COBEGIN
    PROCESS_1: input(micro1, ldu.left1);
    PROCESS_2: input(micro2, ldu.left2);
  COEND;
  ldu.left_mixed := a*ldu.left1 + b*ldu.left2;
  ………
END WHILE
…………
One of the alternatives to programming in an HLL with libraries is the concept of media as types. In
this example, two LDUs from microphones are read and mixed. Here, data types for
video and audio are defined. In the case of text, the character is the type (the smallest addressable
element). A program can address such characters through functions and sometimes directly through
operators. They can be copied, compared with other characters, deleted, created, read from a file or
stored. Further, they can be displayed, be part of other data structures, etc.
file_h3 = open(SPEAKER, …)
….
read(file_h1)
read(file_h2)
mix(file_h3, file_h1, file_h2)
….
rc2= close (file_h2)
The example describes the merging of two audio streams. During the open call, the physical file is
associated with a corresponding file name, and the program receives a file descriptor through
which the file is accessed. In this case, a device unit which creates or processes continuous data
streams can also be associated with a file name.
The read and write functions are based on continuous data stream behaviour: a new value is
assigned continuously to a specific variable which is connected, for example, with one read function.
On the other hand, the read and write functions for discrete data occur in separate steps; for each
assignment of a new value from a file to the corresponding variable, the read function is called again.
PROCESS cont_process_a;
……
on_message_do
  set_volume ………..
  set_loudness ……….
………………..
…………….
[main]
pid = create(cont_process_a)
send(pid, set_volume, 3)
send(pid, set_loudness)
………….
In the above example, the process cont_process_a implements a set of actions which apply to a
continuous data stream; two of them are the modification of the volume (set_volume) and the
setting of the volume dependent on a band filter (set_loudness).
During the creation of the process, the identification and reservation of the used physical devices
occur. The different actions of the continuous process are controlled through an IPC mechanism.
• Interprocess communication mechanism:
➢ The IPC mechanism must be able to transmit audio and video in a timely fashion
because these media have a limited life span. The IPC must be able to:
➢ understand a priori and/or implicitly specified time requirements,
➢ transmit the continuous data according to the requirements,
➢ initiate the processing of the received continuous data on time.
• Language
➢ For the purpose of simplicity, a simple language should be developed.
➢ Examples of such languages are OCCAM-2, Ada and parallel C variants for transputers.
5.6 LANGUAGE
The authors see no demand for the development of a new dedicated language. A partial language
replacement is also quite difficult, because co-operation between the real-time environment and the
remaining programs requires semantic changes in the programming languages; the IPC must be
designed and implemented for real-time operation, and the current IPC can be omitted.
A language extension is the solution proposed here. For the purpose of simplicity, a simple language
should be developed which satisfies most of the above-described requirements. An example of such
a language is OCCAM-2.
6.1 CLASS
A class represents a collection of objects having same characteristic properties that exhibit common
behaviour. It gives the blueprint or description of the objects that can be created from it. Creation of
an object as a member of a class is called instantiation. Thus, an object is an instance of a class.
Example:
Let us consider a simple class, Circle, that represents the geometrical figure circle in a two–dimensional
space. The attributes of this class can be identified as follows –
▪ x–coord, to denote x–coordinate of the center
▪ y–coord, to denote y–coordinate of the center
▪ a, to denote the radius of the circle
6.2 OBJECT
An object is a real-world element in an object–oriented environment that may have a physical or a
conceptual existence. Each object has –
Objects can be modelled according to the needs of the application. An object may have a physical
existence, like a customer, a car, etc.; or an intangible conceptual existence, like a project, a process,
etc.
6.3 INHERITANCE
The concept allows us to inherit or acquire the properties of an existing class (parent class) into a
newly created class (child class). It is known as inheritance. It provides code reusability.
The existing classes are called the base classes/parent classes/super-classes, and the new classes are
called the derived classes/child classes/subclasses. The subclass can inherit or derive the attributes
and methods of the super-class(es) provided that the super-class allows so. Besides, the subclass may
add its own attributes and methods and may modify any of the super-class methods. Inheritance
defines an “is – a” relationship.
Example
From a class Mammal, a number of classes can be derived such as Human, Cat, Dog, Cow, etc. Humans,
cats, dogs, and cows all have the distinct characteristics of mammals. In addition, each has its own
particular characteristics. It can be said that a cow “is – a” mammal.
6.4 POLYMORPHISM
Polymorphism is originally a Greek word that means the ability to take multiple forms. In object-
oriented paradigm, polymorphism implies using operations in different ways, depending upon the
instance they are operating upon. Polymorphism allows objects with different internal structures to
have a common external interface. Polymorphism is particularly effective while implementing
inheritance.
Example
Let us consider two classes, Circle and Square, each with a method findArea(). Though the name and
purpose of the methods in the classes are same, the internal implementation, i.e., the procedure of
calculating area is different for each class. When an object of class Circle invokes its findArea() method,
the operation finds the area of the circle without any conflict with the findArea() method of the Square
class.
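A short sketch of this example (illustrative Python, not from the original text):

import math

# Sketch: polymorphism -- Circle and Square share the findArea() interface,
# but each class implements the calculation in its own way.
class Circle:
    def __init__(self, x_coord, y_coord, a):   # a denotes the radius, as above
        self.x_coord, self.y_coord, self.a = x_coord, y_coord, a
    def findArea(self):
        return math.pi * self.a ** 2

class Square:
    def __init__(self, side):
        self.side = side
    def findArea(self):
        return self.side ** 2

for shape in (Circle(0, 0, 2), Square(3)):
    # The same call resolves to a different implementation for each object.
    print(type(shape).__name__, shape.findArea())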
Methods with similar semantics, which interact with different devices, should be defined in a device-
independent manner. The considered methods use internally, for example, methods like start, stop
and seek. Some units can manipulate several media together.
A computer-controlled VCR or a Laser Disc Player (LDP) are storage units which, by themselves,
integrate (bind) video and audio. In a multimedia system, abstract device definitions can be provided,
e.g., camera and monitor. We did not say anything until now about the actual implementation. The
results show that defining a general and valid interface for several similar audio and video units, as
well as input and output units, is quite a difficult design process.
annotating the voice annotation. The combination of all three of these can also be viewed as a single
identifiable multimedia object.
We can consider a multimedia data object to be composed of several related multimedia data objects
which are a voice segment, an image and a pointer movement (e.g. mouse movement). As we have
just described, these can be combined into a more complex object. We call the initial objects Simple
Multimedia Objects (SMOs) and the combination of several a Compound Multimedia Object (CMO).
In general a multimedia communications process involves one or multiple SMOs and possibly several
CMOs.
The SMO contains two headers that are to be defined and a long data string. The data string we call
a Basic Multimedia Object (BMO). There may be two types of BMOs. The first type we call a segmented
BMO or SG:BMO. It has a definite length in data bits and may result from either a stored data record
or from a generated record that has a natural data length such as a single image, screen or text record.
We show the SMO. The second type of BMO is a streamed BMO, ST:BMO. This BMO has an a priori
undetermined duration. Thus it may be a real time voice or video segment.
A CMO has two headers, the Orchestration header and the Concatenation header. The Orchestration
header describes the temporal relationship between the SMOs and ensures that they are not only
individually synchronized but also they are jointly orchestrated. The orchestration concept has also
been introduced by Nicolaou. The concatenation function provides a description of the logical and
spatial relationships amongst the SMOs.
We can also expand the concept of a CMO as a data construct that is created and managed by multiple
users at multiple locations. In this construct we have demonstrated that N users can create a CMO by
entering multiple SMOs into the overall CMO structure. The objectives of the communications system
are thus focused on meeting the interaction between users who are communicating with CMOs.
Specifically we must be able to perform the following tasks:
Medium
    AcousticMedium
        Music
            Opus
            Note
            Audio_Block
            Sample_Value
        Speech
            …….
    Optical_Medium
        Video
            Video_Scene
        Image
            Image_Segment
            Pixel
            Line
            Column
        Animation
            ……
        Text
            …….
    other-continuous_medium
    discrete_medium
A specific property of all multimedia objects is the continuous change of their internal states during
their life spans. Data transfer of continuous media is performed as long as the corresponding
connection is activated.
The information contained in information objects can build a presentation object which is later
used for the presentation of the information. Information objects can be converted to transport objects
for transmission purposes. Information is often processed differently depending on whether it
should be presented, transmitted or stored.