
Syllabus Outline

Topics in the module include the following:

• Introduction: Multimedia applications and requirements (e.g.,
  overview of multimedia systems, video-on-demand, interactive
  television, video conferencing, hypermedia courseware,
  groupware, World Wide Web, and digital libraries).
• Audio/Video fundamentals, including analog and digital
  representations, human perception, and audio/video equipment
  and applications.
• Audio and video compression:
  – perceptual transform coders for images/video
    (e.g., JPEG, MPEG, H.263, etc.),
  – scalable coders (e.g., pyramid coders),
  – perceptual audio encoders,
  – image and video processing applications and algorithms.
Recommended Course Books
Supplied Text:
• Managing Multimedia: Project Management for Interactive Media
  (2nd Edition), Elaine England and Andy Finney,
  Addison Wesley, 1998 (ISBN 0-201-36058-6)
Recommended Course Book
• Fundamentals of Multimedia, Mark S. Drew and Li Ze-Nian,
  Prentice Hall, 2003 (ISBN 0130618721)
Other Good General Texts
• Multimedia Communications: Applications, Networks, Protocols
  and Standards, Fred Halsall, Addison Wesley, 2000
  (ISBN 0-201-39818-4)
OR
• Networked Multimedia Systems, Raghavan and Tripathi,
  Prentice Hall (ISBN 0-13-210642)
The following books are highly recommended reading:
• Hypermedia and the Web: An Engineering Approach, D. Lowe and W. Hall,
  J. Wiley and Sons, 1999 (ISBN 0-471-98312-8).
• Multimedia Systems, J.F.K. Buford, ACM Press, 1994 (ISBN 0-201-53258-1).
• Understanding Networked Multimedia, Fluckiger, Prentice Hall
  (ISBN 0-13-190992-4).
• Design for Multimedia Learning, Boyle, Prentice Hall (ISBN 0-13-242215-8).
• Distributed Multimedia: Technologies, Applications, and Opportunities in the
  Digital Information Industry (1st Edition), P.W. Agnew and A.S. Kellerman,
  Addison Wesley, 1996 (ISBN 0-201-76536-5).
• Multimedia Communication, Sloane, McGraw Hill (ISBN 0-077092228).
• Virtual Reality Systems, J. Vince, Addison Wesley, 1995 (ISBN 0-201-87687-6).
• Encyclopedia of Graphics File Formats (Second Edition), James D. Murray and
  William vanRyper, O'Reilly & Associates, 1996 (ISBN 1-56592-161-5).
Multimedia Authoring — Useful for Assessed Coursework
• Macromedia Director MX Demystified, Phil Gross,
  Macromedia Press (ISBN: 0321180976)
• Macromedia Director MX and Lingo: Training from the Source,
  Phil Gross, Macromedia Press (ISBN: 0321180968)
• Director 8 and Lingo (Inside Macromedia), Scott Wilson,
  Delmar (ISBN: 0766820084)
• Director/Lingo Manuals — Application Help and in Library
• SMIL: Adding Multimedia to the Web, Tim Kennedy and Mary Slowinski,
  Sams.net (ISBN: 067232167X)
The following provide good reference material for parts of the
module:

Multimedia Systems
• Hyperwave: The Next Generation Web Solution, H. Maurer,
  Addison Wesley, 1996 (ISBN 0-201-40346).
Digital Audio
• A Programmer's Guide to Sound, T. Kientzle, Addison Wesley,
  1997 (ISBN 0-201-41972-6).
• Audio on the Web — The Official IUMA Guide, Patterson and
  Melcher, Peachpit Press.
• The Art of Digital Audio, Watkinson, Focal/Butterworth-Heinemann.
• Synthesiser Basics, GPI Publications.
• Signal Processing: Principles and Applications, Brook and
  Wynne, Hodder and Stoughton.
• Digital Signal Processing, Oppenheim and Schafer, Prentice Hall.
Digital Imaging/Graphics/Video
• Digital Video Processing, A.M. Tekalp, Prentice Hall PTR, 1995.
• Encyclopedia of Graphics File Formats (Second Edition),
  James D. Murray and William vanRyper, O'Reilly & Associates, 1996.
Data Compression
• The Data Compression Book, Mark Nelson, M&T Books, 1995.
• Introduction to Data Compression, Khalid Sayood, Morgan Kaufmann, 1996.
• G.K. Wallace, The JPEG Still Picture Compression Standard.
• CCITT, Recommendation H.261.
• D. Le Gall, MPEG: A Video Compression Standard for Multimedia Applications.
• K. Patel et al., Performance of a Software MPEG Video Decoder.
• P. Cosman et al., Using Vector Quantization for Image Processing.
Introduction to Multimedia
What is Multimedia?

Multimedia has many definitions, including:

Multimedia means that computer information can be
represented through audio, video, and animation, in addition
to traditional media (i.e., text, graphics/drawings, images).
General Definition
A good general definition is:

Multimedia is the field concerned with the computer-controlled
integration of text, graphics, drawings, still and moving images
(video), animation, audio, and any other media where every
type of information can be represented, stored, transmitted and
processed digitally.
Multimedia Application Definition
A Multimedia Application is an application which uses a
collection of multiple media sources, e.g. text, graphics, images,
sound/audio, animation and/or video.
What is HyperText and HyperMedia?
Hypertext is text which contains links to other texts.
The term was invented by Ted Nelson around 1965.

Figure 1: Definition of Hypertext
Hypertext is therefore usually non-linear (as indicated below).

Figure 2: Illustration of Hypertext Links
Hypermedia
HyperMedia is not constrained to be text-based. It can include
other media, e.g., graphics, images, and especially the
continuous media — sound and video.

Figure 3: Definition of HyperMedia
Example Hypermedia Applications?
• The World Wide Web (WWW) is the best example of a
  hypermedia application.
• PowerPoint
• Adobe Acrobat
• Macromedia Director
• Many others?
Multimedia Systems
A Multimedia System is a system capable of processing
multimedia data and applications.

A Multimedia System is characterised by the processing,
storage, generation, manipulation and rendition of multimedia
information.
Characteristics of a Multimedia System
A Multimedia system has four basic characteristics:
• Multimedia systems must be computer controlled.
• Multimedia systems are integrated.
• The information they handle must be represented digitally.
• The interface to the final presentation of media is usually
  interactive.
Challenges for Multimedia Systems
• Distributed networks.
• Temporal relationships between data:
  – rendering different data at the same time — continuously;
  – sequencing within the media —
    playing frames in the correct order/time frame in video;
  – synchronisation — inter-media scheduling.
    E.g. video and audio: lip synchronisation is clearly
    important for humans watching playback of video and audio,
    and even animation and audio.

Ever tried watching an out-of-(lip)sync film for a long time?
Key Issues for Multimedia Systems
The key issues multimedia systems need to deal with here are:
• How to represent and store temporal information.
• How to strictly maintain the temporal relationships on
  playback/retrieval.
• What processes are involved in the above.
• Data has to be represented digitally — analog-to-digital
  conversion, sampling, etc.
• Large data requirements — bandwidth, storage, compression.
Desirable Features for a Multimedia System
Given the above challenges, the following features are desirable (if
not a prerequisite) for a Multimedia System:
Very High Processing Power — needed to deal with large data
  processing and real-time delivery of media. Special hardware is
  commonplace.
Multimedia Capable File System — needed to deliver real-time
  media, e.g. video/audio streaming.
  Special hardware/software is needed, e.g. RAID technology.
Data Representations — file formats that support multimedia
  should be easy to handle yet allow for
  compression/decompression in real time.
Efficient and High I/O — input and output to the file subsystem
  needs to be efficient and fast. Needs to allow for real-time
  recording as well as playback of data, e.g. direct-to-disk
  recording systems.
Special Operating System — to allow access to the file system and
  process data efficiently and quickly. Needs to support direct
  transfers to disk, real-time scheduling, fast interrupt processing,
  I/O streaming, etc.
Storage and Memory — large storage units (of the order of
  50-100 GB or more) and large memory (50-100 MB or more).
  Large caches are also required, frequently of Level 2 and 3
  hierarchy, for efficient management.
Network Support — client-server systems are common, as
  distributed systems are common.
Software Tools — user-friendly tools are needed to handle media,
  design and develop applications, and deliver media.
Components of a Multimedia System
Now let us consider the components (hardware and software)
required for a multimedia system:
Capture devices — video camera, video recorder, audio
  microphone, keyboards, mice, graphics tablets, 3D input
  devices, tactile sensors, VR devices; digitising/sampling
  hardware.
Storage devices — hard disks, CD-ROMs, Jaz/Zip drives, DVD,
  etc.
Communication networks — Ethernet, Token Ring, FDDI, ATM,
  intranets, the Internet.
Computer systems — multimedia desktop machines,
  workstations, MPEG/video/DSP hardware.
Display devices — CD-quality speakers, HDTV, SVGA, hi-res
  monitors, colour printers, etc.
Applications
Examples of Multimedia Applications include:
• World Wide Web
• Hypermedia courseware
• Video conferencing
• Video-on-demand
• Interactive TV
• Groupware
• Home shopping
• Games
• Virtual reality
• Digital video editing and production systems
• Multimedia database systems
Trends in Multimedia
Current big application areas in Multimedia include:
World Wide Web — hypermedia systems embrace nearly
  all multimedia technologies and application areas.
MBone — Multicast Backbone: the equivalent of conventional TV
  and radio on the Internet.
Enabling Technologies — developing at a rapid rate to support
  the ever-increasing need for multimedia. Carrier, switching,
  protocol, application, coding/compression, database,
  processing, and system integration technologies are at the
  forefront of this.
Multimedia Data: Input and Format
Text and Static Data
• Source: keyboard, floppies, disks and tapes.
• Stored and input character by character:
  – Storage of text is 1 byte per character (text or format character).
  – For other forms of data, e.g. spreadsheet files, some formats
    may store the data as text (with formatting); others may use
    binary encoding.
• Format: raw text or formatted text, e.g. HTML, Rich Text Format
  (RTF), Word, or program language source (C, Pascal, etc.).
• Not temporal — BUT may have a natural implied sequence, e.g.
  HTML format sequence, sequence of C program statements.
• Size: not significant w.r.t. other multimedia.
Graphics
• Format: constructed by the composition of primitive objects
  such as lines, polygons, circles, curves and arcs.
• Input: graphics are usually generated by a graphics editor
  program (e.g. Freehand) or automatically by a program (e.g.
  PostScript).
• Graphics are usually editable or revisable (unlike images).
• Graphics input devices: keyboard (for text and cursor control),
  mouse, trackball or graphics tablet.
• Graphics standards: OpenGL, PHIGS, GKS.
• Graphics files usually store the primitive assembly.
• Do not take up a very high storage overhead.
Images
• Still pictures which (uncompressed) are represented as a
  bitmap (a grid of pixels).
• Input: generated by programs similar to graphics or animation
  programs.
• Input: scanned for photographs or pictures using a digital
  scanner, or from a digital camera.
• Analog sources will require digitising.
• Stored at 1 bit per pixel (black and white), 8 bits per pixel
  (grey scale, colour map) or 24 bits per pixel (true colour).
• Size: a 512x512 grey scale image takes up 1/4 MB
  (512 x 512 x 1 byte = 262,144 bytes); a 512x512 24-bit image
  takes 3/4 MB (3 bytes per pixel), with no compression.
• This overhead soon increases with image size.
• Compression is commonly applied.
Audio
• Audio signals are continuous analog signals.
• Input: captured via microphones, then digitised and stored.
• Usually compressed.
• CD-quality audio requires 16-bit sampling at 44.1 kHz.
• 1 minute of mono CD-quality audio requires about 5 MB
  (44,100 samples/s x 2 bytes/sample x 60 s = 5,292,000 bytes).
Video
• Input: analog video is usually captured by a video camera
  and then digitised.
• There are a variety of video (analog and digital) formats.
• Raw video can be regarded as a series of single images.
  There are typically 25, 30 or 50 frames per second.
• 512x512 monochrome video takes 25 x 0.25 MB = 6.25 MB per
  second (375 MB per minute) to store uncompressed.
• Digital video clearly needs to be compressed.
Output Devices
The output devices for a basic multimedia system include:
• A high-resolution colour monitor
• CD-quality audio output
• Colour printer
• Video output to save multimedia presentations to (analog)
  video tape, CD-ROM, DVD
• Audio recorder (DAT, DVD, CD-ROM, (analog) cassette)
• Storage medium (hard disk, removable drives, CD-ROM)
Multimedia Authoring:
Systems and Applications

What is an Authoring System?
An Authoring System is a program which has pre-programmed
elements for the development of interactive multimedia software
titles.
Authoring systems vary widely in:
• orientation,
• capabilities, and
• learning curve.
Why should you use an authoring system?
• It can speed up programming, and possibly content development
  and delivery — reputedly to about 1/8th of the time needed
  when coding from scratch.
• However, content creation (graphics, text, video, audio,
  animation, etc.) is not affected by the choice of authoring system.
• Time gains — accelerated prototyping.
Authoring vs Programming
• There is a big distinction between programming and authoring.
• Authoring —
  – assembly of multimedia,
  – possibly high-level graphical interface design,
  – some high-level scripting.
• Programming —
  – involves low-level assembly of multimedia,
  – construction and control of multimedia,
  – involves real languages like C and Java.
Multimedia Authoring Paradigms
The authoring paradigm, or authoring metaphor, is the
methodology by which the authoring system accomplishes its
task.

There are various paradigms:
• Scripting Language
• Iconic/Flow Control
• Frame
• Card/Scripting
• Cast/Score/Scripting — Macromedia Director
• Hierarchical Object
• Hypermedia Linkage
• Tagging — SMIL
Scripting Language
• Closest in form to traditional programming. The paradigm
  is that of a programming language, which specifies (by
  filename):
  – multimedia elements,
  – sequencing,
  – hotspots,
  – synchronization, etc.
• Usually a powerful, object-oriented scripting language.
• In-program editing of elements (still graphics, video, audio,
  etc.) tends to be minimal or non-existent.
• Media handling can vary widely.
Examples
• Apple's HyperTalk for HyperCard,
• Asymetrix's OpenScript for ToolBook, and
• the Lingo scripting language of Macromedia Director.

Here is an example Lingo script to control a frame:

global gNavSprite   -- global variable holding a sprite reference

on exitFrame        -- handler runs when the playback head exits the frame
  go the frame      -- loop on the current frame
  play sprite gNavSprite
end
Iconic/Flow Control
• Tends to be the speediest in development time.
• Best suited for rapid prototyping and short-development-time
  projects.
• The core of the paradigm is the Icon Palette, containing:
  – the possible functions/interactions of a program, and
  – the Flow Line — which shows the actual links between the
    icons.
• Slowest runtimes; high interaction overheads.
Examples:
• Authorware
• IconAuthor
Frame
• Similar to the Iconic/Flow Control paradigm.
• Usually incorporates an icon palette.
• The links drawn between icons are conceptual and
• do not always represent the actual flow of the program.
Examples
• Quest (whose scripting language is C)
• Apple Media Kit
Figure 4: Macromedia Authorware Iconic/Flow Control Examples
Card/Scripting
• The paradigm provides a great deal of power
  (via the incorporated scripting language),
• but suffers from the index-card structure.
• Well suited for hypertext applications, and especially
  suited for navigation-intensive applications
  (e.g. Cyan's "MYST" game).
• Extensible via XCMDs and DLLs.
• Allows all objects (including individual graphic elements) to be
  scripted.
• Many entertainment applications are prototyped in a
  card/scripting system prior to compiled-language coding.
Cast/Score/Scripting
• Uses a music score as its primary authoring metaphor:
  synchronous elements are shown in various horizontal tracks,
  with simultaneity shown via the vertical columns.
• The power of this metaphor lies in the ability to script the
  behavior of each of the cast members.
• Easily extensible to handle other functions (such as
  hypertext) via XOBJs, XCMDs, and DLLs.
• Best suited for animation-intensive or synchronized media
  applications.
Examples
• Macromedia Director
• Macromedia Flash — a cut-down Director interface
Hierarchical Object
• The paradigm uses an object metaphor (like OOP),
• visually represented by embedded objects and iconic
  properties.
• The learning curve is non-trivial, but
• the visual representation of objects can make very
  complicated constructions possible.
Figure 5: Macromedia Director Score Window
Figure 6: Macromedia Director Cast Window
Figure 7: Macromedia Director Script Window
Hypermedia Linkage
• Similar to the Frame paradigm:
• shows conceptual links between elements,
• but lacks the Frame paradigm's visual linkage metaphor.
Tagging
Uses tags in text files to:
• link pages,
• provide interactivity, and
• integrate multimedia elements.
Examples:
• SGML/HTML,
• SMIL (Synchronized Multimedia Integration Language),
• VRML,
• 3DML and
• WinHelp
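A flavour of the tagging approach — a minimal SMIL fragment (file
and region names are purely illustrative; SMIL tags are covered in
detail later in these notes):

<par>
  <img src="logo.gif" region="logo" dur="6s" />
  <audio src="intro.au" dur="6s" />
</par>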
Issues in Multimedia Applications Design
There are various issues in multimedia authoring.

Issues involved:
• Content Design
• Technical Design
Content Design
Content design deals with:
• What to say, what vehicle to use.

"In multimedia, there are five ways to format and deliver your
message. You can:
• write it,
• illustrate it,
• wiggle it,
• hear it, and
• interact with it."
Scripting (writing)

Rules for good writing:
1. Understand your audience and correctly address them.
2. Keep your writing as simple as possible (e.g., write out the
   full message(s) first, then shorten it).
3. Make sure the technologies used complement each other.
Graphics (illustrating)
• Make use of pictures to effectively deliver your messages.
• Create your own (draw, (colour) scanner, PhotoCD, ...), or
  keep "copy files" of artworks — "Cavemen did it first."

Graphics Styles
• fonts
• colours
  – pastels
  – earth colours
  – metallic
  – primary colours
  – neon colours
Animation (wiggling)
1. Types of Animation
• Character animation — humanise an object
• Highlights and sparkles
• Moving text
• Video — live video or digitized video
2. When to Animate
• Enhance emotional impact
• Make a point (instructional)
• Improve information delivery
• Indicate passage of time
• Provide a transition to the next subsection
Audio (hearing)

Types of audio in multimedia applications:
1. Music — sets the mood of the presentation, enhances the
   emotion, illustrates points.
2. Sound effects — to make specific points, e.g., squeaky doors,
   explosions, wind, ...
3. Narration — the most direct message, often effective.
Interactivity (interacting)
• Interactive multimedia systems!
• People remember 70% of what they interact with (according
  to a late-1980s study).
Types of Interactive Multimedia Applications:
1. Menu-driven programs/presentations
   – often a hierarchical structure (main menu, sub-menus, ...)
2. Hypermedia
   +: less structured; cross-links between subsections of the
      same subject –> non-linear, quick access to information
   +: easier for introducing more multimedia features, e.g., more
      interesting "buttons"
   -: users could sometimes get lost navigating the hypermedia
3. Simulations / performance-dependent simulations
   – e.g., games — SimCity, flight simulators
Technical Design
Technological factors may limit the ambition of your multimedia
presentation.
Studied later in detail.
Storyboarding
The concept of storyboarding has been used by animators and
their like for many years.
Storyboarding
• Used to help plan the general organisation of a presentation.
• Used to help plan the content of a presentation by recording and
  organizing ideas on index cards,
• placed on a board/wall.
• The storyboard evolves as the media are collected and
  organised: new ideas and refinements to the presentation are
  made.
Storyboard Examples
• DVD Example
• Storyboarding Explained
• Acting With a Pencil
• The Storyboard Artist
• Star Wars Quicktime Storyboard
Overview of Multimedia Software Tools
Digital Audio
Macromedia SoundEdit — edits a variety of different format
audio files and can apply a variety of effects (Fig 8).

Figure 8: Macromedia SoundEdit Main and Control Windows and
Effects Menu
CoolEdit/Adobe Audition — edits a variety of different format
audio files.

Many public domain audio editing tools also exist.
Music Sequencing and Notation
Cakewalk
• Supports General MIDI.
• Provides several editing views (staff, piano roll, event list)
  and a virtual piano.
• Can insert WAV files and Windows MCI commands (animation
  and video) into tracks.
Cubase
• More capable software than Cakewalk Express.
• Intuitive interface to arrange and play music (Figs 9 and 10).
• Wide variety of editing tools, including audio (Figs 11 and 12).
• Score editing (Fig 13).
Figure 9: Cubase Arrange Window (Main)
Figure 10: Cubase Transport Bar Window — Emulates a Tape
Recorder Interface
Figure 11: Cubase Audio Window
Figure 12: Cubase Audio Editing Window with Editing Functions
Logic Audio
• Cubase competitor; similar functionality.
Mark of the Unicorn Performer
• Cubase/Logic Audio competitor; similar functionality.

Figure 13: Cubase Score Editing Window
Image/Graphics Editing
Adobe Photoshop
• Allows layers of images, graphics and text.
• Includes many graphics drawing and painting tools.
• Sophisticated lighting effects filter.
• A good graphics, image processing and manipulation tool.
Adobe Premiere
• Provides a large number (up to 99) of video and audio tracks,
  superimpositions and virtual clips.
• Supports various transitions, filters and motions for clips.
• A reasonable desktop video editing tool.
Macromedia Freehand
• Graphics drawing/editing package.

Many other editors exist, in the public domain and commercially.
Image/Video Editing
Many commercial packages available:
• Adobe Premiere
• Videoshop
• Avid Cinema
• SGI MovieMaker
Animation
Many packages available, including:
• Avid SoftImage
• Animated GIF building packages, e.g. GifBuilder
Multimedia Authoring
Tools for making a complete multimedia presentation where
users usually have a lot of interactive control.
Macromedia Director
• Movie metaphor (the cast includes bitmapped sprites, scripts,
  music, sounds, palettes, etc.)
• Can accept almost any bitmapped file format.
• Lingo script language with its own debugger allows more
  control, including of external devices, e.g., VCRs and video
  disk players.
• Ready for building more interactivity (buttons, etc.)
• Follows the cast/score/scripting paradigm.
• Tool of choice for animation content (well, Flash for the Web).
Authorware
• Professional multimedia authoring tool.
• Supports interactive applications with hyperlinks,
  drag-and-drop controls, and integrated animation.
• Compatibility between files produced from the PC version and
  the Mac version.
Other authoring tools are mentioned in the notes later.
Multimedia Authoring:
Scripting (Lingo)

Cast/Score/Scripting paradigm.

This section is a very brief introduction to Director.

For further information, you should consult:
• Macromedia Director: Using Director Manual — in Library
• Macromedia Director: Lingo Dictionary Manual — in Library
• Macromedia Director: Application Help — select Help from within the Director
  application. This is a very thorough resource of information.
• Macromedia Director guided tours — see the Help menu option.
• A variety of web sites contain Director tutorials, hints and information, including
  http://www.macromedia.com
More Director References
• Macromedia Director MX Demystified, Phil Gross,
  Macromedia Press (ISBN: 0321180976)
• Macromedia Director MX and Lingo: Training from the Source,
  Phil Gross, Macromedia Press (ISBN: 0321180968)
• Director 8 and Lingo (Inside Macromedia), Scott Wilson,
  Delmar (ISBN: 0766820084)
Related Additional Material and Coursework

Tutorials with additional Director instructional material:

See Lab Worksheets 1 + 2

Also Assessed Exercise 2
Director Overview/Definitions
Movies — the basic Director commodity:
  interactive multimedia pieces that can include
  • animation,
  • sound,
  • text,
  • digital video,
  • many other types of media, and
  • links to external media.
  A movie can be as small and simple as an animated logo or
  as complex as an online chat room or game.
Frames — Director divides lengths of time into a series of frames,
  cf. a celluloid movie.
Creating and editing movies
4 key windows:
the Stage — the rectangular area where the movie plays;
the Score — where the movie is assembled;
one or more Cast windows — where the movie's media elements
are assembled;
and
the Control Panel — controls how the movie plays back.
To create a new movie:
• Choose File > New > Movie
Some other key Director Components (1)
Channels — the rows in the Score that contain sprites for
  controlling media:
  • numbered;
  • contain the sprites that control all the visible media;
  • special effects channels at the top contain behaviors as
    well as controls for the tempo, palettes, transitions, and
    sounds.
Sprites — objects that control when, where, and how media
  appears in a movie.
Some other key Director Components (2)
Cast members —
  • the media assigned to sprites;
  • the media that make up a movie;
  • include bitmap images, text, vector shapes, sounds, Flash
    movies, digital videos, and more.
Lingo — Director's scripting language; adds interactivity to a
  movie.
Behaviors — pre-existing sets of Lingo instructions.
Markers — identify fixed locations at a particular frame in a
  movie.
Lingo Scripting (1)
Commands — terms that instruct a movie to do something while the
  movie is playing. For example, go to sends the playback head to
  a specific frame, marker, or another movie.
Properties — attributes that define an object. For example,
  colorDepth is a property of a bitmap cast member.
Functions — terms that return a value. For example, the date function
  returns the current date set in the computer. The key function
  returns the key that was pressed last. Parentheses occur at the
  end of a function.
Keywords — reserved words that have a special meaning.
  For example, end indicates the end of a handler.
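A minimal sketch putting these term types together in one handler
(the cast member number and frame are chosen purely for
illustration):

on mouseUp                         -- handler run on a mouse-up event
  put the colorDepth of member 1   -- property of a bitmap cast member
  put the date                     -- function returning the current date
  go to frame 10                   -- command moving the playback head
end                                -- keyword closing the handler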

Lingo Scripting (2)
Events — actions that scripts respond to.
Constants — elements that don't change. For example, the constants
  TAB, EMPTY, and RETURN always have the same meaning.
Operators — terms that calculate a new value from one or more
  values. For example, the add operator (+) adds two or more
  values together to produce a new value.
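A matching sketch for these three term types (the handler body is
illustrative only):

on keyDown                   -- event: this handler responds to a key press
  if the key = RETURN then   -- constant: RETURN always means the Return key
    put 2 + 3                -- operator: + calculates a new value (5)
  end if
end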

Lingo Data Types
Lingo supports a variety of data types:
• references to sprites and cast members,
• Boolean values TRUE and FALSE,
• strings,
• constants,
• integers, and
• floating-point numbers.
It uses standard program structure and syntax, as the sketch
below shows.
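A hedged sketch showing these types in use (variable names and
values are illustrative; Director MX-era assignment syntax assumed):

on startMovie
  myFlag = TRUE              -- Boolean value
  myTitle = "Multimedia"     -- string
  myCount = 42               -- integer
  myRatio = 3.14             -- floating-point number
  myBall = sprite(1)         -- reference to a sprite
  myMedia = member("ball")   -- reference to a cast member
end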

Lingo Script Types (1)
Director uses four types of scripts.
Behaviors — attached to sprites or frames in the Score.

Figure 14: Behavior icon

Movie scripts — available to the entire movie.

Figure 15: Movie script icon
Lingo Script Types (2)

Parent scripts — special scripts that contain Lingo used to create
child objects.

Figure 16: Parent script icon

Scripts attached to cast members — independent of the Score;
they don't appear in the Cast window.

Figure 17: Script button
Director Example 1: Simple Animation
A Bouncing Ball Graphic

Run Example in Browser (Shockwave)
Run Example in Browser (Lecture ONLY)

• No Lingo scripting.
• Basic animation where a cast member moves along a path.
Creating the Bouncing Ball Graphic
The following steps achieve a simple bouncing ball animation
along a path:

1. Let us begin by creating a new movie and setting the
   Stage size:
   • Start a new movie: File > New > Movie
     (Shortcut = Command+N)
   • Choose Modify > Movie > Properties.
     In stage size, choose 640 x 480.
2. Now let us create a ball, using the vector shape tool:
   • Choose Window > Vector Shape
     (Shortcut = Command+Shift+V)
   • Click the filled ellipse button.
   • Draw an ellipse (circle) about the size of
     the Vector Shape window.
   • Click on the Gradient fill button.
   • To change the colours, click the colour box on the left side
     of the Gradient colour control.
   • Change the colour on the right side of the Gradient colours
     to a dark blue.
   • Change the Gradient type pull-down menu from Linear to
     Radial.
   • Change the stroke colour to white.
3. Now let us change a few other properties of this ellipse:
   • Close the Vector Shape window.
   • In the Cast window, select the ellipse.
   • Choose Edit > Duplicate (Shortcut = Command+D).
   • Double-click the new cast member, which opens it in the
     Vector Shape tool.
   • Change the Cycles to 3 and the Spread to 200.
   • Name the latest ellipse 'bouncing ball'.
4. Now we are going to animate the ball:
   • Drag 'bouncing ball' from the cast member window to the
     stage.
   • You will notice the sprite (the object that appears in the
     score) is extended over 20 frames.
   • Drag the right end of the sprite to frame 40.
   • Click anywhere in the middle of the sprite to select it.
   • Resize the ellipse.
4. Ball Animation (Key Frames)
   • Click on frame 40 in channel 1 (the end of the sprite), hold
     down Option and Shift, and drag the ellipse to the right end
     of the stage.
   • To curve the path, we are going to insert keyframes within
     the sprite.
   • Click on frame 10 of the sprite and choose Insert > Keyframe
     (Shortcut = Command+Option+K).
   • Create keyframes at frames 20 and 30.
   • At each keyframe, a circle appears on the path shown on the
     stage.
   • Click on the keyframe 10 circle and drag it up.
   • Change the other keyframes.
   • Rewind and play the movie.
Further Animation: 1.1 Shrinking the Ball

Run Example: Shrinking the ball (Shockwave)
Run: Shrinking the ball (Lecture Only)

• (Optional) Click on keyframe 40 in the score and drag it to
  frame 60; notice how all the keyframes spread out
  proportionally.
• (Optional) Click on the keyframes in the score and adjust the
  path if you feel like it.
• While moving the keyframes, resize the balls so they slowly
  get smaller. Notice that while you resize the balls, the path
  changes and you will need to edit the path again.
• Rewind and play the movie.
• Save your movie as example2.dir.
1.2. Animating sprite colour
Run Example: Animating sprite colour (Shockwave)
Run Example: Animating sprite colour (Lecture Only)
• Working still with example1.dir.
• Open the Property Inspector for the sprite:
  – Right Mouse (or Ctrl) on the sprite (Score or Stage)
  – Select Properties...
• Click on the keyframes in the score, and change the
  foreground colour chip, Forecolor, to different colours.
• Changing the foreground colour is like putting a coloured film
  over your object. The resulting colour is a mixture of the
  object's original colour and the 'film'. For this reason, light
  colours work better than dark colours for this effect.
• Rewind and play the movie.
• Save as example3.dir.
1.3. Animating sprite transparency — Making the Ball
Disappear

Run Example: Making the Ball Disappear (Shockwave)
Run Example: Making the Ball Disappear (Lecture Only)

• Open example1.dir.
• Open the Property Inspector for the sprite.
• Click on the keyframes in the score, and
• change the Blend transparency to 100, 75, 50, 25, 0 for the
  consecutive keyframes.
• Rewind and play the movie.
• Save as example4.dir.
1.4. Animating sprite shape — Deforming the Ball

Run Example: Deforming The Ball (Shockwave)
Run Example: Deforming The Ball (Lecture Only)

• Open example1.dir.
• Open the Property Inspector for the sprite.
• Click on the keyframes in the score, and
• change the Skew angle to 0, 20, 40, 60 and 80 for the
  consecutive keyframes.
• Rewind and play the movie.
• Save as example5.dir.
Director Example 2: Importing Media
To import multimedia data there are two basic ways:
• Choose File > Import ...
  Useful for importing batches of data (e.g. several image
  sequences).
• Drag and drop source media into a cast member location.
  Quite intuitive.
Examples: Simple Image Import and Manipulation
• Drag an image into a spare cast member.
• Drag this cast member to the Score.
• Set suitable properties for the sprite.
  – Manipulate as for a vector item above.
• Examples:
  – ex dave roll.dir sets up some keyframes and alters the
    rotation of the image (Shockwave)
  – ex dave roll.dir sets up some keyframes and alters the
    rotation of the image (Lecture Only)
  – ex dave sq.dir alters the skew angle (Shockwave)
  – ex dave sq.dir alters the skew angle (Lecture Only)
Example: Falling Over Movie, ex dave movie.dir

Example: Falling Over Movie, ex dave movie.dir (Shockwave)
Run Example: Falling Over Movie, ex dave movie.dir (Lecture Only)

• Several GIF images depicting the sequence exist on disk.
• Choose File > Import.
• Select the items you wish to import by double-clicking or
  pressing the Add button.
• Click on the Import button.
• Several new cast members should be added.
• Set looping on and play.
Example: Pinching Movie, ex dave pinch.dir
Example: Pinching Movie, ex dave pinch.dir (Shockwave)
Example: Pinching Movie, ex dave pinch.dir (Lecture Only)

• Photoshop has been used to set a pinch effect of varying
  degree for an image.
• Import the images as before.
• To reverse the image set, to obtain a smooth back-and-forth
  animation:
  – Select the sprite sequence in the score.
  – Copy the sequence — press Command+C (Copy).
  – Click on the frame just after the sprite sequence.
  – Paste the sequence — press Command+V (Paste).
  – Click on this second sprite sequence and choose Modify >
    Reverse Sequence.
  – Select the 2 sprites by pressing Shift and clicking on both.
    Choose Modify > Join Sprites.
Simple Lingo Scripting
Director Example 3: Very Simple Action

Here we illustrate the basic mechanism of scripting in Director
by developing and extending a very basic example:

Making a button beep and attaching a message to a button

Making the button beep (Shockwave)
Making the button beep (Lecture Only)
Making the Button Beep Movie
• Open a new movie.
• Turn the looping on in the control panel.
• Open the tool palette.
• Click the push button icon.
• Draw a button on the stage, and type in a label:
  "button" here.
Our First Lingo

Now let's write a simple script for the button:
• Press Ctrl+click the button in the cast window and choose
  Cast Member Script.
• Director writes the first and last line for us; add a beep
  command so the script looks like this:

on mouseUp   -- runs when the mouse is released over the button
  beep       -- play the system beep
end

• Close the window.
• Rewind and play the movie.
• Click the button a few times.
To pop up a message box on button press (and still beep):
• Reopen the cast member script.
• Change the text so it now reads:

on mouseUp
  beep
  alert "Button Pressed"   -- pop up a message box
end

• Close the window.
• Play the movie and click the button.
Director Example 4: Controlling Navigation with Lingo

A slightly more complex Lingo example.

This example illustrates how we may use Lingo scripts as:
• Cast Member Scripts
• Sprite Scripts
• Behaviour Scripts
Director Example 4: Ready-Made Example
To save time, we begin with a preassembled Director movie:

Run Lingo Navigation Example (Shockwave)
Run Lingo Navigation Example (Lecture Only)

• Open lingo ex.3.2.dir
• Play the movie — press some of the buttons:
  – The numbered buttons record moves through
    scenes/frames.
  – The Next/Back buttons replay what has been recorded.
The Loop the Frame Script
We are first going to create a loop-the-frame script:
• Cast member 11 controls the frame looping.
• Note we have created a special frame marking channel in the
  Score.
• To create the associated script, either
  – double-click on the script icon in the Score, or
  – Ctrl-click on the cast member and select Member Script.
• The scripting window appears. You can edit the script text; it
  now reads:

on exitFrame
  go the frame   -- stay on (keep replaying) the current frame
end

This frame script tells Director to keep playing the same frame.
• The loop lasts to frame 24.
• Pressing down Alt and dragging the frame script in the Score
  can change this length.
Scene Markers (1)
Now we will create some markers.
• To create a marker, you click in the marking channel for the
  frame and label the marker with some typed text.
In this example:
• Markers are at frames 1, 10 and 20, named scene1, scene2
  and scene3 respectively.
• Note: you can delete a marker by clicking the triangle and
  dragging it below the marker channel.
• A cast member (9) script for the Next button has also been
  created:

on mouseUp
  go to next   -- jump to the next marker
end

The go to next command tells Director to go to the next
consecutive marker in the score.
Scene Markers (2)
• A cast member (10) script for the Back button has also been
  created:

on mouseUp
  go to previous   -- jump to the previous marker
end

The go to previous command tells Director to go to the
previous marker in the score.
• Once again, play the movie and click on these buttons to see
  how they work.
Sprite Scripts
Now we will create some sprite scripts:
• Sometimes a button will
  – behave one way in one part of the movie and
  – behave another way in a different part of the movie.
  This is a typical use of sprite scripts.
The Next Button Sprite Scripts (1)
Desired action of the Next button: jump to the next scene.
The Next Button Sprite Scripts (2)
• Here we have split actions to map to our scene markers. To
  achieve this:
  – Click on frame 10 of channel 6 (the Next button) and choose
    Modify > Split Sprite.
  – Do the same at frame 20.
• To attach a script to each split action:
  – Select each sprite sequence (here in channel 6).
  – Ctrl-click on the sequence and select Script... from the
    pull-down in the score to give a script window.
  – We add a suitable jump to the next scene.
  – In the example shown we have go to "scene2":
    this command tells Director to send the movie to the marker
    "scene2" (see the full script sketch below).
  – We could do the other sequences similarly, but alternatives
    exist.
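The complete sprite script is tiny; a sketch (the marker name comes
from the example above):

on mouseUp
  go to "scene2"   -- send the playback head to the marker named scene2
end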
Behaviour Scripts

Example here: another way to write the sprite script on the last
slide — using the Behaviour Inspector.

Behaviour scripts can do A LOT MORE.
A Behaviour Script for Next Button (Scene 2) (1)

• We now work with the second sprite sequence (channel 6 in
  the score).
• We will create (or have created) an associated behaviour
  script:
  – Ctrl-click on the second sequence.
  – Open the Behaviour Inspector window.
  – Click on the Script Window icon next to the Behaviour
    Inspector tab.
• To create/name a new behaviour:
  – Click the + icon at the top left of the window and select new
    behaviour from the pull-down.
  – Give the behaviour a name; here it is called next2.
A Behaviour Script for Next Button (Scene 2) (2)

• To add events/actions to the script you can:
  – Under Events, click the + icon.
    In this example we have added a mouseUp from the menu.
  – Under Actions, click the + icon.
    In this example we have chosen Navigation > Go to marker,
    then find scene3 on the list.
  – You can add/edit the Lingo text manually in the Script Editor
    window for the particular behaviour; a sketch of the result
    follows.
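A hedged sketch of the Lingo such a behaviour might contain (the
marker name scene3 comes from the example above; behaviour
handlers receive a me parameter):

on mouseUp me
  go to frame "scene3"   -- jump to the marker named scene3
end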
Summary: Sprite Script Order
We now have 2 scripts attached to a single object (achieving much
the same task):
• a cast member script, and
• a sprite script.
• Sprite scripts take priority over cast member scripts,
• so here the cast member script will be ignored.
Some More Lingo to Add to Our Example

Another part of our application:

The jump buttons 1-3 (buttons 4-6 are currently inactive).

We will be using Lingo play / play done to record actions.

We have created a vector graphic image (cast member 2)
for the main recorder interface.
A Problem in Director?
In Director, a script can only be associated with a complete
object.
For the way we have created the recorder interface we require
(and this is clearly a common requirement in many other cases):
• only part of an image to be linked, instead of the whole object —
• one part for each of the jump buttons 1-3.
There is a solution:
• Use invisible buttons.
• These are shape cast members with an invisible border.
Creating Our Invisible Buttons

• We have added our invisible button as cast member 14. To
  create this component:
  – Open the Tool Palette window.
  – Click on the no-line button.
  – Click on the rectangle button and draw a rectangle on the
    stage around the 1 button.
• We have added this sprite to channel 8 and have attached a
  sprite script:
  – Ctrl-click on frame 1 of channel 8 and select Script.
  – Attach a sprite script to this shape with the command
    play "scene1".
  – Extend the sprite sequence so it covers frames 1 to 24.
• Repeat the steps, placing the sprite over the 2 and 3 buttons.
Final Part: the Back Button (1)
Director provides the ability to record actions for future use:
the Lingo play command.
The play command is similar to the go to command, but:
• Director records every time a play is initiated,
• keeping track of the user's path through the movie.
• You can move back along this path by using the play done
  command.
Final Part: the Back Button (2)
So, in this example:
• Select the sprite sequence in channel 5 and cast member 10.
• Attach a sprite script reading:

on mouseUp
  play done   -- step back along the recorded path
end

• Rewind, play the movie, click all the 1, 2, 3 buttons in various
  orders, click the Back button also, and observe the effect of
  the Back button.
• Complete example: lingo ex3.2.dir (Web Based)
• Complete example: lingo ex3.2.dir (Local Version)
Multimedia Authoring: Tagging (SMIL)
• Last lecture — Lingo scripting.
• This lecture — tagging.
• SMIL: an XML-based language for synchronised multimedia
  integration.
What is SMIL?
• SMIL is to synchronized multimedia what
  HTML is to hyperlinked text.
• Pronounced "smile".
SMIL is:
• a simple,
• vendor-neutral
• markup language,
designed:
• for all skill levels of WWW authors,
• to schedule audio, video, text, and graphics files across a
  timeline,
• with no need to master development tools or complex
  programming languages.
• HTML-like — you need a text editor only.
• Links to media — media are not embedded in the SMIL file.
Drawbacks of SMIL?

Good points:
• A powerful tool for creating synchronized multimedia
  presentations on the web.
• Deals with low-bandwidth connections.
Bad points:
• Meant to work with linear presentations —
  several types of media can be synchronized to one timeline —
• but it does not work well with non-linear presentations:
• the ability to skip around in the timeline is buggy.

For slideshow-style mixed media presentations it is the best the
web has to offer.
SMIL Support
• The W3C recommended SMIL in June 1998.
• QuickTime 4.0 supports SMIL (1999).
• Not universally supported across the Web:
  no Web browser directly supports SMIL.
• RealPlayer G2 supports SMIL.
• Many other SMIL-compliant players, authoring tools, and
  servers are available.
Running SMIL Applications
For this course there are basically four ways to run SMIL
applications (two use a Java applet), so there are basically
three SMIL-capable players:
QuickTime — SMIL supported since QuickTime version 4.0.
RealPlayer G2 — integrated SMIL support.
Web Browser — use the SOJA SMIL applet viewer with an HTML
  wrapper.
Applet Viewer — use the SOJA SMIL applet viewer with an HTML
  wrapper.
QuickTime media support is richer (see later sections on
QuickTime).
You will need to use both, as RealPlayer and SOJA support
different media:

Media              Tag         RealPlayer  GRiNS  SOJA
GIF                img         OK          OK     OK
JPEG               img         OK          OK     OK
WAV                audio       OK          OK     -
.au Audio          audio       OK          OK     OK
.auz Audio Zipped  audio       -           -      OK
MP3                audio       OK          -      -
Plain text         text        OK          OK     OK
RealText           textstream  OK          -      -
RealMovie          video       OK          -      -
AVI                video       OK          OK     -
MPEG               video       OK          OK     -
MOV                video       OK          -      -
Using QuickTime
• Load the SMIL file into the QuickTime plug-in (configure the
  browser helper app or MIME type), or
• use the QuickTime movie player.
Using RealPlayer G2
RealPlayer G2 is installed on the Applications HD in the
RealPlayer folder.
RealPlayer supports lots of file formats and can use plugins.
The main supported formats are:
• Real formats: RealText, RealAudio, etc.
• Images: GIF, JPEG
• Audio: AU, WAV, MIDI, etc.
To run SMIL files:
RealPlayer uses streaming to render presentations. It
• works better when calling a SMIL file given by a Real Server
• rather than from an HTTP one.
To run SMIL files locally:
• drag a SMIL file onto the RealPlayer G2 application, or
• open a local SMIL file inside the RealPlayer G2 application.
Using the SOJA Applet
SOJA stands for SMIL Output in Java Applet.
SOJA is an applet that renders SMIL in a web page or in a
separate window. It supports the following formats:
• Images: GIF, JPEG
• Audio: AU and AUZ (zipped AU) — Sun audio files
• Text: plain text
Running SOJA
To run SMIL through an applet you have to
• call the applet from an HTML file:

<APPLET CODE="org.helio.soja.SojaApplet.class"
        ARCHIVE="soja.jar" CODEBASE="../"
        WIDTH="600" HEIGHT="300">
  <PARAM NAME="source" VALUE="cardiff_eg.smil">
  <PARAM NAME="bgcolor" VALUE="#000066">
</APPLET>

• The SOJA (soja.jar) archive is located in the SMIL folder on
  the Macintoshes.
• You may need to alter the CODEBASE attribute for your own
  applications.
• The PARAM NAME="source" VALUE="MY SMILFILE.smil" is
  how the SMIL file is passed to the applet.
Running Applets
This should be easy to do:
• Run the HTML file through a Java-enabled browser, or
• use Apple Applet Runner:
  – uses Mac OS Runtime for Java (Java 1.2);
  – less fat for SMIL applications (we do not really need a Web
    connection for our examples);
  – efficient Java and Mac OS run;
  – located in the Apple Extras:Mac OS Runtime For Java folder.
  – TO RUN: drag files onto the application, OR
  – TO RUN: open the file from within the application.
Let us begin to SMIL — SMIL Authoring
SMIL Syntax Overview
• SMIL files are usually named with .smi or .smil extensions.
• XML-based syntax.
Basic Layout
The basic layout of a SMIL document is as follows:

<smil>
  <head>
    <meta name="copyright"
          content="Your Name" />
    <layout>
      <!-- layout tags -->
    </layout>
  </head>
  <body>
    <!-- media and synchronization tags -->
  </body>
</smil>
A source begins with <smil> and ends with </smil>.
Note that SMIL is case sensitive.

<smil>
  ....
</smil>
SMIL documents have two parts: head and body. Each of
them must have <smil> as a parent.

<smil>
  <head>
    ....
  </head>
  <body>
    ....
  </body>
</smil>
Some tags, such as meta, can have a slash at their end:

....
<head>
  <meta name="copyright"
        content="Your Name" />
</head>
....

This is because SMIL is XML-based.
Some tags are written:
• <tag> ... </tag>
• <tag />
SMIL Layout
Everything concerning layout (including window settings) is
stored between the <layout> and </layout> tags in the
header, as shown in the above subsection.
A variety of layout tags define the presentation layout:

<smil>
  <head>
    <layout>
      <!-- layout tags -->
    </layout>
  ......
Window settings
You can set the width and height of the window in which your
presentation will be rendered with <root-layout>.
The following source will create a window of 300x200
pixels and also sets the background to white:

<layout>
  <root-layout width="300" height="200"
               background-color="white" />
</layout>
Positioning Media
It is really easy to position media with SMIL.
You can position media in 2 ways:
Absolute positioning — media are located with offsets from
  the origin — the upper left corner of the window.
Relative positioning — media are located relative to the
  window's dimensions.
We define position with a <region> tag.
The Region tag
To insert a medium within our presentation we use the <region>
tag. We:
• must specify the region (the place) where it will be displayed;
• must also assign an id that identifies the region.
Let’s say we want to
• insert the Cardiff icon (533x250 pixels)
• at 30 pixels from the left border and
• at 25 pixels from the top border.
158
The header becomes:
<smil>
<head>
<layout>
<root-layout width="600" height="300"
background-color="white" />
<region id="cardiff_icon"
left="30" top="25"
width="533" height="250" /> JJ
II
</layout>
J
</head> I
...... Back
Close
The img tag
To insert the Cardiff icon in the region called "cardiff_icon", we
use the <img> tag as shown in the source below.
Note that the region attribute is a pointer to the <region>
tag.

<head>
  <layout>
    <root-layout width="600" height="300"
                 background-color="white" />
    <region id="cardiff_icon"
            left="30" top="25"
            width="533" height="250" />
  </layout>
</head>
<body>
  <img src="cardiff.gif"
       alt="The Cardiff icon"
       region="cardiff_icon" />
</body>
This produces the following output:

Figure 18: Simple Cardiff Image Placement in SMIL
Relative Position Example
If you wish to display the Cardiff icon at
• 10% from the left border and
• 5% from the top border,
modify the previous source and replace the left and top
attributes:

<head>
  <layout>
    <root-layout width="600" height="600"
                 background-color="white" />
    <region id="cardiff_icon"
            left="10%" top="5%"
            width="533" height="250" />
  </layout>
</head>
<body>
  <img src="cardiff.gif"
       region="cardiff_icon" />
</body>
Overlaying Regions
We have just seen how to position a medium along the x and y
axes (left and top).
What if two regions overlap?
• Which one should be displayed on top?
The following code points out the problem:

<smil>
  <head>
    <layout>
      <root-layout width="300" height="200"
                   background-color="white" />
      <region id="region_1" left="50" top="50"
              width="150" height="125" />
      <region id="region_2" left="25" top="25"
              width="100" height="100" />
    </layout>
  </head>
  <body>
    <par>
      <text src="text1.txt" region="region_1" />
      <text src="text2.txt" region="region_2" />
    </par>
  </body>
</smil>
To ensure that one region is over the other, add the z-index
attribute to <region>.
When two regions overlay:
• the one with the greater z-index is on top;
• if both regions have the same z-index, the first rendered one
  is below the other.
In the following code, we add z-index to region_1 and
region_2:

<smil>
  <head>
    <layout>
      <root-layout width="300" height="200"
                   background-color="white" />
      <region id="region_1" left="50"
              top="50" width="150"
              height="125" z-index="2"/>
      <region id="region_2" left="25"
              top="25" width="100"
              height="100" z-index="1"/>
    </layout>
  </head>
  <body>
    <par>
      <text src="text1.txt" region="region_1" />
      <text src="text2.txt" region="region_2" />
    </par>
  </body>
</smil>
Fitting media to regions
You can set the fit attribute of the <region> tag to force
media to be resized, etc.
The following values are valid for fit:
• fill — make the media grow and fill the area.
• meet — make the media grow (without any distortion) until it
  meets the region frontier.
• slice — the media grows (without distortion) and fills its
  region entirely.
• scroll — if the media is bigger than its region, the area gets
  scrolled.
• hidden — don't show the media.
Obviously you set the value like this:

<region id="region_1" .....
        fit="fill" />
Synchronisation
There are two basic ways in which we may want to play media:
• play several media one after the other, or
• play several media in parallel.
In order to do this we need to add synchronisation:
• we will need to add time parameters to media elements.
Adding a duration of time to media — dur
To add a duration of time to a media element, simply specify a
dur attribute parameter in an appropriate media tag:

.....
<body>
  <img src="cardiff.gif"
       alt="The Cardiff icon"
       region="cardiff_icon" dur="6s" />
</body>
.....
Delaying Media — the begin attribute
To specify a delay, i.e. when to begin, set the begin attribute
parameter in an appropriate media tag.
If you add begin="2s" in the Cardiff image tag, you will see
that the Cardiff icon appears 2 seconds after the document
begins and remains for 6 further seconds. Have a look at
the source:

.....
<body>
  <img src="cardiff.gif"
       alt="The Cardiff icon"
       region="cardiff_icon"
       dur="6s" begin="2s" />
</body>
.....
Sequencing Media — the seq tag
Scheduling media:
The <seq> tag is used to define a sequence of media.
• The media are executed one after the other:

.....
<seq>
  <img src="img1.gif"
       region="reg1" dur="6s" />
  <img src="img2.gif"
       region="reg2"
       dur="4s" begin="1s" />
</seq>
.....

So the setting 1s makes img2.gif appear 1 second
after img1.gif ends.
Parallel Media — the par tag
We use the <par> tag to play media at the same time:

<par>
  <img src="cardiff.gif"
       alt="The Cardiff icon"
       region="cardiff_icon" dur="6s" />
  <audio src="music.au" alt="Some Music"
         dur="6s" />
</par>

This will display an image and play some music along with it.
Synchronisation Example 1: Planets Soundtrack
The following SMIL code plays one long soundtrack along with
a series of images.
Essentially:
• the audio file and
• the image sequence are played in parallel;
• the images are run in sequence with no break (begin =
  0s).
The files are stored on the Macintoshes in the Multimedia
Lab (in the SMIL folder) as follows:
• planets.html — calls the SMIL source (below) with the SOJA
  applet. This demo uses zipped (Sun) audio files (.auz),
  which are not supported by RealPlayer.
• planets.smil — SMIL source (listed below).
SMIL HEAD DATA
<smil>
<head>
<layout>
<root-layout height="400" width="600"
background-color="#000000"
title="Dreaming out Loud"/> 174
<region id="satfam" width="564" height="400"
top="0" left="0" background-color="#000000"
z-index="2" />
<region id="jupfam" width="349" height="400"
top="0" left="251" background-color="#000000"
z-index="2" />
<region id="redsun" width="400" height="400"
top="0" left="100" background-color="#000000"
z-index="2" />
...........
</layout>
</head>
SMIL BODY DATA
<body>
<par>
<audio src="media/dreamworldb.auz"
dur="61.90s" begin="3.00s"
system-bitrate="14000" />
<seq>
<img src="media/satfam1a.jpg" region="satfam"
begin="1.00s" dur="4.50s" />
<img src="media/jupfam1a.jpg" region="jupfam"
begin="1.50s" dur="4.50s" />
<img src="media/redsun.jpg" region="redsun"
begin="1.00s" dur="4.50s" />
........
<img src="media/orion.jpg" region="orion"
begin="1.00s" dur="4.50s" />
<par>
<img src="media/pillarsb.jpg" region="pillars"
begin="1.00s" end="50s" />
<img src="media/blank.gif" region="blank"
begin="2.00s" end="50.00s" /> JJ
<text src="media/music.txt" region="music" II
begin="3.00s" end="50.00s" />
..........
J
<text src="media/me.txt" region="me" I
begin="20.00s" dur="3.00s" /> Back
Close
<text src="media/jose.txt" region="jose"
begin="23.00s" end="50.00s" />
</par>
<text src="media/title.txt" region="title"
begin="3.00s" end="25.00s" />
</seq>
</par>
</body> 176
</smil>
Synchronisation Example 2: Slides ’N’ Sound
Dr John Rosbottom of Portsmouth University has come up with
a novel way of giving lectures.
This has:
• one long sequence of
• parallel pairs of images and audio files.
The files are stored on the MACINTOSHES in the Multimedia
Lab (in the SMIL folder) as follows:
• slides n sound.smil — SMIL source (listed below), play
with RealPlayer G2. NOTE: This demo uses RealAudio files
which are not supported by SOJA:
<smil>
<head>
<layout>
<root-layout height="400" width="600" background-color="#000000"
title="Slides and Sound"/>
</layout>
</head>
<body>
<seq>
<par>
<audio src="audio/leconlec.rm" dur="24s" title="slide 1"/> 178
<img src="slides/img001.GIF" dur="24s"/>
</par>

<par>
<audio src="audio/leconlec.rm" clip-begin="24s" clip-end="51s" dur="27s"
title="slide 2"/>
<img src="slides/img002.GIF" dur="27s"/>
</par>
............
<par>
<audio src="audio/leconlec.rm" clip-begin="610s"
clip-end="634s" dur="24s" title="The Second Reason"/>
<img src="slides/img018.GIF" clip-begin="610s"
clip-end="634s" dur="24s" title="The Second Reason"/>
</par>

<par>
<audio src="audio/leconlec.rm" clip-begin="634s" clip-end="673s" dur="39s"
title="Slide 19"/>
<img src="slides/img019.GIF" clip-begin="634s" clip-end="673s" dur="39s"
title="Slide 19"/>
</par>
<img src="slides/img006.GIF" fill="freeze" title="And finally..."
author="Abbas Mavani (dis80047@port.ac.uk)"
copyright="Everything is so copyright protected (c)1999"/>

<!-- kept this in to remind me that you can have single things
<audio src="audio/AbbasTest.rm" dur="50.5s"/>
-->
</seq>
</body>
</smil>
SMIL Events
SMIL supports event-based synchronisation:
begin events
• When a media item begins, it sends a begin event.
• If another media item waits for this event, it catches it.
To make a media item wait for an event, one of its
synchronisation attributes (begin or end) should be written
as follows:
<!-- if you want tag to start when
another tag begins -->
<tag begin="id(specifiedId)(begin)" />

<!-- if you want tag to start 3s after
another tag begins -->
<tag begin="id(specifiedId)(3s)" />

<!-- if you want tag to start when
another tag ends -->
<tag begin="id(specifiedId)(end)" />
For example:
<body>
<par>
<img src="cardiff.gif" region="cardiff"
     id="cf" begin="4s" />
<img src="next.gif" region="next"
     begin="id(cf)(2s)" />
</par>
</body>
will make the next.gif image begin 2s after cardiff.gif
begins.
The switch Tag
The syntax for the switch tag is:
<switch>
<!-- child1 testAttributes1 -->
<!-- child2 testAttributes2 -->
<!-- child3 testAttributes3 -->
</switch>
The rule is:
• The first of the <switch> tag children whose test attributes
are all evaluated to TRUE is executed.
• A tag with no test attributes is evaluated to TRUE.
• See the SMIL reference for the list of valid test attributes.
For example you may wish to provide presentations in English
or Welsh:
<body>
<switch>
<!-- English only -->
<par system-language="en">
<img src="cardiff.gif"
region="cardiff"/>
<audio src="english.au" />
</par>
<!-- Welsh only -->
<par system-language="cy">
<img src="caerdydd.gif"
region="cardiff"/>
<audio src="cymraeg.au" />
</par>
</switch>
</body>
Somewhere (typically in the player's preferences) the
system-language test value will be set.
Multimedia Systems Technology

Multimedia systems have to deal with the
• generation,
• manipulation,
• storage,
• presentation, and
• communication of information.

Let's consider some broad implications of the above.
Discrete v Continuous Media

RECALL: Our Definition of Multimedia

• All data must be in the form of digital information.
• The data may be in a variety of formats:
– text,
– graphics,
– images,
– audio,
– video.
Synchronisation
A majority of this data is large and the different media may
need synchronisation:
• The data will usually have temporal relationships as an
integral property.
Static and Continuous Media
Static or Discrete Media — Some media is time independent:
Normal data, text, single images, graphics are examples.
Continuous media — Time dependent Media:
Video, animation and audio are examples.
Analog and Digital Signals
• Some basic definitions — studied here.
• Overview of technology — studied here.
• In-depth study later.
Analog and Digital Signal Converters

The world we sense is full of analog signals:
• Electrical sensors convert the medium they sense into
electrical signals
– e.g. transducers, thermocouples, microphones,
– (usually) continuous signals.
• Analog signals must be converted, or digitised, into
discrete digital signals that the computer can readily deal
with.
• Special hardware devices: Analog-to-Digital converters.
• Playback is the converse operation: Digital-to-Analog.
Multimedia Data: Input and Format

How do we capture and store each media format?

Note that text, graphics and some images are generated
directly by computer and do not require digitising:
they are generated directly in some binary format.

Handwritten text would have to be digitised, either by
electronic pen sensing or by scanning of a paper-based form.
Text and Static Data
• Source: keyboard, floppies, disks and tapes.
• Stored and input character by character:
– Storage of text is 1 byte per character (text or format
character).
– For other forms of data, e.g. spreadsheet files, some
formats may store format as text (with formatting), others
may use binary encoding.
• Format: Raw text or formatted text, e.g. HTML, Rich Text
Format (RTF), Word or a program language source (C, Java,
etc.).
• Not temporal — BUT may have a natural implied sequence,
e.g. HTML format sequence, the sequence of C program
statements.
• Size: not significant compared with other multimedia.
Graphics
• Format: constructed by the composition of primitive objects
such as lines, polygons, circles, curves and arcs.
• Input: Graphics are usually generated by a graphics editor
program (e.g. Freehand) or automatically by a program (e.g.
Postscript).
• Graphics are usually editable or revisable (unlike images).
• Graphics input devices: keyboard (for text and cursor
control), mouse, trackball or graphics tablet.
• Graphics standards: OpenGL, PHIGS, GKS.
• Graphics files usually store the primitive assembly.
• Do not take up a very high storage overhead.
Images
• Still pictures which (uncompressed) are represented as a
bitmap (a grid of pixels).
• Input: Generated by programs similar to graphics or
animation programs.
• Input: Scanned photographs or pictures using a digital
scanner, or from a digital camera.
• Analog sources will require digitising.
• Stored at 1 bit per pixel (black and white), 8 bits per pixel
(grey scale, colour map) or 24 bits per pixel (true colour).
• Size: a 512x512 grey scale image takes up 1/4 Mb; a 512x512
24-bit image takes 3/4 Mb with no compression (see the
check below).
• This overhead soon increases with image size.
• Compression is commonly applied.
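As a quick check on these figures (a worked step not in the original
slides): 512 × 512 pixels × 1 byte = 262,144 bytes ≈ 0.25 Mb for grey
scale; three bytes per pixel triples this to ≈ 0.75 Mb for 24-bit colour.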
Audio
• Audio signals are continuous analog signals.
• Input: microphones and then digitised and stored
• Usually compressed.
• CD quality audio requires 16-bit sampling at 44.1 KHz.
• 1 minute of mono CD quality audio requires 5 Mb.
Video
• Input: Analog Video is usually captured by a video camera
and then digitised.
• There are a variety of video (analog and digital) formats.
• Raw video can be regarded as being a series of single images.
There are typically 25, 30 or 50 frames per second.
• 512x512 monochrome video images take
25 × 0.25 Mb = 6.25 Mb per second
(375 Mb per minute) to store uncompressed.
• Digital video clearly needs to be compressed.
Output Devices
The output devices for a basic multimedia system include
• A High Resolution Colour Monitor
• CD Quality Audio Output
• Colour Printer
• Video Output to save multimedia presentations to (analog)
video tape, CD-ROM or DVD.
• Audio Recorder (DAT, DVD, CD-ROM, (analog) cassette)
• Storage Medium (hard disk, removable drives, CD-ROM)
Storage Media
The major problems that affect storage media are:
• Large volume of data
• Real-time delivery
• Data format
• Storage medium
• Retrieval mechanisms
High performance I/O

There are four factors that influence I/O performance:

Data —
• high volume; continuous; contiguous vs distributed storage.
• Direct relationship between the size of data and how long it
takes to handle.
• Compression
Data Storage —
• Depends on the storage hardware and
• the nature of the data.
• The following storage parameters affect how data is stored:
– Storage capacity
– Read and write operations of the hardware
– Unit of transfer of read and write
– Physical organisation of storage units
– Read/write heads, cylinders per disk,
tracks per cylinder, sectors per track
– Read time
– Seek time
Data Transfer —
• Depends on how data is generated and
• written to disk, and
• in what sequence it needs to be retrieved.
• Writing/generation of multimedia data is usually
sequential, e.g. streaming digital audio/video direct to disk.
• Individual data (e.g. an audio/video file) is usually streamed.
• A RAID architecture can be employed to accomplish high
I/O rates (parallel disk access).
Operating System Support —
• Scheduling of processes when I/O is initiated.
• Time critical operations can adopt special procedures.
• Direct disk transfer operations free up CPU/operating
system resources.
Basic Storage

Basic storage units have problems dealing with large
multimedia data:
• Single hard drives — SCSI/IDE drives.
• AV (Audio-Visual) drives
– avoid thermal recalibration between read/writes,
– suitable for desktop multimedia.
• New drives are fast enough for direct-to-disk audio and video
capture,
• but not adequate for commercial/professional multimedia.
• Removable Media —
– Floppies not adequate
– Jaz/Zip Drives,
– CD-ROM,
– DVD-ROM.
RAID — Redundant Array of Inexpensive Disks
Needed:
• To fulfill the needs of current multimedia and other data
hungry application programs,
• Fault tolerance built into the storage device.
• Parallel processing exploits the arrangement of hard disks.

RAID technology offers some significant advantages as a
storage medium for multimedia data:
• Affordable alternative to mass storage
• High throughput and reliability
The key components of a RAID System are:
• Set of disk drives, disk arrays, viewed by user as one or more
logical drives.
• Data may be distributed across drives
• Redundancy added in order to allow for disk failure
• Disk arrays:
– store large amounts of data,
– have high I/O rates and
– take less power per megabyte (cf. high-end disks)
– but they have very poor reliability:
as more devices are added, reliability deteriorates
– N devices generally have 1/N the reliability of a single
device
Overcoming Reliability Problems

Redundancy — files stored on arrays may be striped across
multiple disks.

There are four ways to do this.
Four Ways of Overcoming Reliability Problems
• Mirroring or shadowing of the contents of disk, which can
be a capacity-killing approach to the problem:
– write on two disks — a 100% capacity overhead,
– reads from the disks may however be optimised.
• Horizontal Hamming Codes: a special means to
reconstruct information using an error correction encoding
technique.
• Parity and Reed-Solomon Codes: also an error correction
coding mechanism. Parity may be computed in a number of
ways.
• Failure Prediction: there is no capacity overhead in this
technique.
RAID Architecture

Each disk within the array needs to have its own I/O controller,
but interaction with a host computer may be mediated through
an array controller.

[Figure: the host processor, via a host adaptor, talks to an
array controller, which manages the control logic and parity
and drives several individual disk controllers.]
Orthogonal RAID
It is possible to combine the disks together to produce a
collection of devices, where
• each vertical array is now the unit of data redundancy.
• Such an arrangement is called an orthogonal RAID.
• Other arrangements of disks are also possible.
The Eight levels of RAID
There are 8 levels of RAID technology:
• each level providing a greater amount of resilience than the
lower levels:

Level 0: Disk Striping
Level 1: Disk Mirroring
Level 2: Bit Interleaving and HEC Parity
Level 3: Bit Interleaving with XOR Parity
Level 4: Block Interleaving with XOR Parity
Level 5: Block Interleaving with Parity Distribution
Level 6: Fault Tolerant System
Level 7: Heterogeneous System
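As an illustrative toy sketch (not from the slides, and no substitute
for a real driver) of what Level 0 striping means, the following Python
spreads logical blocks round-robin across the drives of an array:

# Toy RAID-0 striping: logical block i lands on disk i mod N.
def stripe(blocks, n_disks):
    disks = [[] for _ in range(n_disks)]
    for i, block in enumerate(blocks):
        disks[i % n_disks].append(block)   # round-robin placement
    return disks

print(stripe(list("ABCDEFGH"), 3))
# -> [['A', 'D', 'G'], ['B', 'E', 'H'], ['C', 'F']]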
First Six RAID levels

[Figure: schematics of RAID levels 0–5 — RAID 0: simultaneous
reads on every drive (striping); RAID 1: data duplication on
drive pairs; RAID 2: each read/write spans all drives (parallel
access); RAID 3 and RAID 4: every write must also update a
dedicated parity drive; RAID 5: each drive also handles
distributed parity, indicated by the filled circle.]
Optical Storage
• The most popular storage medium in the multimedia context:
• compact size,
• high-density recording,
• easy handling and
• low cost per Mb.
• CD and, recently, DVD (ROM) are the most common.
• Laser disc — an older format.
CD Storage
There are now various formats of CD:
• CD-DA (Compact Disc-Digital Audio)
• CD-I (Compact Disc-Interactive)
• CD-ROM/XA (eXtended Architecture)
• Photo CD
The capacity of a CD-ROM is
• 620-700 Mb depending on CD material;
• 650/700 Mb (74/80 mins) is a typical write-once CD-ROM
size.
• Drives that read and write CD-ROMs (CD-RW) are similar.
Close
CD Standards

There are several CD standards for different types of media:

Red Book — Digital Audio: most music CDs.
Yellow Book — CD-ROM:
Mode 1 – computer data,
Mode 2 – compressed audio/video data.
Green Book — CD-I
Orange Book — Write-once CDs
Blue Book — Enhanced CD (CD Extra)
DVD

The current best generation of optical disc storage technology


for Multimedia:
• DVD — Digital Versatile Disc (formal),
Digital Video Disc (mistaken).
• Larger storage and faster than CD
– over 2 hours of video / single-sided DVD-ROM 2.4 Gb
• Formats: DVD-Video and DVD-ROM (DVD-R and DVD-RAM)
What are the features of DVD-Video?

The main features of DVD include:


• Over 2 hours of high-quality digital video (over 8 on a
double-sided, dual-layer disc).
• Support for widescreen movies on standard or widescreen
TVs (4:3 and 16:9 aspect ratios).
• Up to 8 tracks of digital audio (for multiple languages), each
with as many as 8 channels.
• Up to 32 subtitle/karaoke tracks.
• Automatic seamless branching of video (for multiple story
lines or ratings on one disc).
• Up to 9 camera angles (different viewpoints can be selected
during playback).
Main features of DVD (Cont.)

• Menus and simple interactive features (for games, quizzes,
etc.).
• Multilingual identifying text for title name, album name, song
name, cast, crew, etc.
• Instant rewind and fast forward, including search to title,
chapter, track, and timecode.
• Durability (no wear from playing, only from physical damage).
• Not susceptible to magnetic fields. Resistant to heat.
• Compact size (easy to handle and store, players can be
portable, replication is cheaper).
What are the disadvantages of DVD?

Despite the several positive attributes mentioned above, there
are some potential disadvantages of DVD:
• It has built-in copy protection and regional lockout.
• It uses digital compression. Poorly compressed audio or
video may be blocky, fuzzy, harsh, or vague.
• The audio downmix process for stereo/Dolby Surround can
reduce dynamic range.
• It doesn't fully support HDTV.
• Some DVD players and drives may not be able to read CD-Rs.
• Disputes over some DVD-R formats.
Comparison of DVD and CD-ROM

The increase in capacity of
DVD-ROM (over CD-ROM)
is due to:
• smaller pit length (∼2.08x),
• tighter tracks (∼2.16x),
• slightly larger data area
(∼1.02x),
• discs single or double
sided
Comparison of DVD and CD-ROM (Cont.)

• another data layer added to each
side, creating a potential for four
layers of data per disc,
• more efficient channel bit
modulation (∼1.06x),
• more efficient error correction
(∼1.32x),
• less sector overhead (∼1.06x).
• The capacity of a dual-layer disc is
slightly less than double that of a
single-layer disc.
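Multiplying these per-layer factors together (a worked check, not in
the original slides): 2.08 × 2.16 × 1.02 × 1.06 × 1.32 × 1.06 ≈ 6.8,
roughly matching a single-layer DVD's ≈4.7 Gb against a CD-ROM's
≈0.68 Gb.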
Multimedia Data Representation
Issues to be covered:
• Digital Audio
• Graphics/Image Formats
• Digital Video (next lecture)
• Sampling/Digitisation
• Compression
Digital Audio

Applications of Digital Audio — Selected Examples

Music Production
• Hard Disk Recording
• Sound Synthesis
• Samplers
• Effects Processing
Video — audio is an important element: music and effects
Web — many uses on the Web
• Spice up Web pages
• Listen to CDs
• Listen to Web radio
What is Sound?

Source — generates sound
• Air pressure changes
• Electrical — loudspeaker
• Acoustic — direct pressure variations

Destination — receives sound
• Electrical — microphone produces an electric signal
• Ears — respond to pressure, i.e. hear sound (more later
(MPEG Audio))
Digitising Sound

• A microphone produces an analog signal.
• Computers like discrete entities.

Need to convert analog to digital — specialised hardware.
Also known as sampling.
Digital Sampling

Sampling basically involves:
• measuring the analog signal at regular discrete intervals,
• recording the value at these points.
Computer Manipulation of Sound

Writing digital signal processing routines ranges from being
trivial to highly complex:
• Volume
• Cross-Fading
• Looping
• Echo/Reverb/Delay
• Filtering
• Signal Analysis
Sound Demos

• Volume
• Cross-Fading
• Looping
• Echo/Reverb/Delay
• Filtering
Sample Rates and Bit Size

How do we store each sample value (quantisation)?

8-bit value (0-255)
16-bit value (integer) (0-65535)

How many samples to take?

11.025 KHz — Speech (telephone quality is 8 KHz)
22.05 KHz — Low grade audio
(WWW audio, AM radio)
44.1 KHz — CD quality
Nyquist’s Sampling Theorem

The sampling frequency is very important in order to accurately
reproduce a digital version of an analog waveform.

Nyquist's Theorem:

The sampling frequency for a signal must be at least twice
the highest frequency component in the signal.
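As a quick numerical illustration of what goes wrong below the Nyquist
rate (an added sketch, not from the slides; illustrative Python): a 3 Hz
cosine sampled at only 4 Hz — below its 6 Hz Nyquist rate — produces
exactly the same samples as a 1 Hz cosine, i.e. it aliases:

# Undersampling demo: a 3 Hz tone masquerades as a 1 Hz tone at fs = 4 Hz.
import math

fs = 4.0                                  # sampling rate in Hz (too low)
for n in range(8):
    t = n / fs
    s3 = math.cos(2 * math.pi * 3 * t)    # true 3 Hz signal at sample times
    s1 = math.cos(2 * math.pi * 1 * t)    # its 1 Hz alias (|3 - 4| = 1 Hz)
    print(f"t={t:.2f}s  3 Hz: {s3:+.3f}  1 Hz alias: {s1:+.3f}")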
Figure 19: Sampling at Signal Frequency
Figure 20: Sampling at Twice Nyquist Frequency
Figure 21: Sampling at above Nyquist Frequency
Implications of Sample Rate and Bit Size

Affects quality of audio:
• Ears do not respond to sound in a linear fashion (more later
(MPEG Audio)).
• Decibel (dB): a logarithmic measurement of sound level.
• 16-bit samples have a signal-to-noise ratio of 98 dB — the
quantisation noise is virtually inaudible.
• 8-bit samples have a signal-to-noise ratio of 50 dB.
• The 8-bit signal is therefore 48 dB noisier — 8 successive
doublings of the noise level, since a 6 dB increment is twice
as loud.
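These SNR figures follow from the standard quantisation-noise formula
(a supporting fact, not derived in the slides): for n-bit samples,

SNR ≈ 6.02 n + 1.76 dB

giving ≈ 98 dB at n = 16 and ≈ 50 dB at n = 8.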
Implications of Sample Rate and Bit Size (Cont.)

Affects size of data:

File Type        44.1 KHz   22.05 KHz   11.025 KHz
16 Bit Stereo    10.1 Mb    5.05 Mb     2.52 Mb
16 Bit Mono      5.05 Mb    2.52 Mb     1.26 Mb
8 Bit Mono       2.52 Mb    1.26 Mb     630 Kb

Figure 22: Memory Required for 1 Minute of Digital Audio
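A minimal sketch (assuming uncompressed PCM; illustrative Python,
not from the slides) that reproduces the figures in Figure 22:

# Bytes for one minute of uncompressed PCM audio.
def pcm_bytes_per_minute(rate_hz, bits, channels):
    return rate_hz * (bits // 8) * channels * 60

for rate in (44100, 22050, 11025):
    mb = 2 ** 20
    print(f"{rate} Hz:",
          f"16-bit stereo {pcm_bytes_per_minute(rate, 16, 2) / mb:.2f} Mb,",
          f"16-bit mono {pcm_bytes_per_minute(rate, 16, 1) / mb:.2f} Mb,",
          f"8-bit mono {pcm_bytes_per_minute(rate, 8, 1) / mb:.2f} Mb")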
Practical Implications of Nyquist Sampling Theory

• Must (low-pass) filter the signal before sampling:
• otherwise aliasing artifacts from high-frequency components
appear in the sampled signal.
Why are CD Sample Rates 44.1 KHz?
The upper range of human hearing is around 20-22 KHz —
apply Nyquist's Theorem: sample at at least twice that,
i.e. around 44 KHz.
Common Audio Formats
• Popular audio file formats include
– .au (Origin: Unix workstations),
– .aiff (MAC, SGI),
– .wav (PC, DEC workstations)
• Compression can be utilised in some of the above but is not
mandatory.
• A simple and widely used (by the above) audio compression
method is Adaptive Delta Pulse Code Modulation (ADPCM).
– Based on past samples, it predicts the next sample and
encodes the difference between the actual value and the
predicted value (see the toy sketch below).
– More on this later (Audio Compression).
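A toy differential-coding sketch (plain DPCM, i.e. ADPCM without the
adaptive step size; illustrative Python, not the actual codec):

# Predict each sample as the previous one; code only the difference.
def dpcm_encode(samples):
    prev, deltas = 0, []
    for s in samples:
        deltas.append(s - prev)    # difference from the prediction
        prev = s
    return deltas

def dpcm_decode(deltas):
    prev, out = 0, []
    for d in deltas:
        prev += d                  # prediction + transmitted difference
        out.append(prev)
    return out

pcm = [0, 3, 7, 8, 8, 6, 2]
assert dpcm_decode(dpcm_encode(pcm)) == pcm   # lossless round trip
print(dpcm_encode(pcm))            # small deltas need fewer bits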
Common Audio Formats (Cont.)

• Many formats are linked to audio applications.
• Can use some compression:
– Soundblaster — .voc (can use Silence Deletion — more on
this later (Audio Compression))
– Protools/Sound Designer — .sd2
– RealAudio — .ra
– Ogg Vorbis — .ogg
• MPEG Audio — more later (MP3 and MPEG-4)
Delivering Audio across a network

• Trade-off between desired fidelity and file size.
• Bandwidth considerations for the Web and other media.
• Compress files:
– could affect live transmission on the Web.
Streaming Audio

• Buffered data:
– Trick: get data to the destination before it's needed,
– temporarily store it in memory (a buffer),
– the server keeps feeding the buffer,
– the client application reads from the buffer.
• Needs a reliable, moderately fast connection.
• Specialised client and streaming audio protocol (PNM for
RealAudio).
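A minimal sketch of the buffering idea (illustrative Python only —
a real player deals with network protocols and timing, not a list):

# Client starts playback only once a preroll of chunks is buffered.
from collections import deque

buf = deque()
PREROLL = 3                        # chunks to accumulate before playing

playing = False
for chunk in (f"chunk{i}" for i in range(6)):   # server keeps feeding
    buf.append(chunk)
    if not playing and len(buf) >= PREROLL:
        playing = True             # enough buffered: playback begins
    if playing:
        print("play", buf.popleft())
while buf:
    print("play", buf.popleft())   # drain the remainder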
Synthetic Sounds — reducing bandwidth?

• Synthesise sounds — in hardware or software.
• The client produces the sound — only send the parameters needed to control it (MIDI, next).
• Many synthesis techniques could be used, for example:
– FM (Frequency Modulation) synthesis — used in low-end Sound Blaster
cards and the OPL-4 chip; the Yamaha DX synthesiser range was popular in the early 1980s.
– Wavetable synthesis — wavetables generated from the sound waves of real instruments.
– Additive synthesis — make up the signal from smaller, simpler waveforms.
– Subtractive synthesis — modify a (complex) waveform by taking out elements.
– Physical modelling — model in software how the acoustic sound is generated.
• Modern synthesisers use a mixture of sampling and synthesis.
MIDI
What is MIDI?
• No longer exclusively the domain of musicians.
• MIDI provides a very low bandwidth alternative on the Web:
– transmit musical and
– certain sound effects data.
• Also now used as a (modified) compression control language
– see the MPEG-4 section soon.
MIDI on the Web
Very low bandwidth (a few hundred Kbytes):

• The responsibility for producing sound is moved to the client:
– synthesiser module,
– samples,
– soundcard,
– software generated.
• Most Web browsers can deal with MIDI.
Definition of MIDI:
A protocol that enables computers, synthesizers, keyboards,
and other musical devices to communicate with each other.
Components of a MIDI System

Synthesizer:
• It is a sound generator (various pitch, loudness, tone colour).
• A good (musician's) synthesizer often has a microprocessor,
keyboard, control panels, memory, etc.
Sequencer:
• It can be a stand-alone unit or a software program for a
personal computer. (It used to be a storage server for MIDI
data. Nowadays it is more a software music editor on the
computer.)
• It has one or more MIDI INs and MIDI OUTs.
Basic MIDI Concepts

Track:
• A track in a sequencer is used to organize recordings.
• Tracks can be turned on or off on recording or playing back.
Channel:
• MIDI channels are used to separate information in a MIDI
system.
• There are 16 MIDI channels in one cable.
• Channel numbers are coded into each MIDI message.
Timbre:
• The quality of the sound, e.g., flute sound, cello sound, etc.
• Multitimbral — capable of playing many different sounds at
the same time (e.g., piano, brass, drums, etc.)
Basic MIDI Concepts (Cont.)

Pitch:
• The musical note that the instrument plays.
Voice:
• The portion of the synthesizer that produces sound.
• Synthesizers can have many (12, 20, 24, 36, etc.) voices.
• Each voice works independently and simultaneously to
produce sounds of different timbre and pitch.
Patch:
• The control settings that define a particular timbre.
Hardware Aspects of MIDI

MIDI connectors:
– three 5-pin ports found on the back
of every MIDI unit:
• MIDI IN: the connector via
which the device receives all MIDI
data.
• MIDI OUT: the connector
through which the device
transmits all the MIDI data it
generates itself.
• MIDI THRU: the
connector by which the device
echoes the data it receives from
MIDI IN.
MIDI Messages

MIDI messages are used by MIDI devices to communicate
with each other.

MIDI messages are very low bandwidth:
• Note On command:
– which key is pressed,
– which MIDI channel (what sound to play),
– 3 hexadecimal numbers in total.
• The Note Off command is similar.
• Other commands (e.g. program change) configure the
sounds to be played.
Structure of MIDI messages:
• A MIDI message includes a status byte and up to two data
bytes.
• Status byte:
– The most significant bit of a status byte is set to 1.
– The 4 low-order bits identify which channel the message
belongs to (four bits produce 16 possible channels).
– The 3 remaining bits identify the message.
• The most significant bit of a data byte is set to 0.
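A minimal sketch (illustrative Python, not part of the MIDI spec text)
unpacking a channel voice status byte into these fields:

# Split a status byte into its message bits and channel bits.
def parse_status(status):
    assert status & 0x80, "status bytes have the top bit set"
    msg_bits = (status >> 4) & 0x07   # 3 bits identifying the message
    channel  = status & 0x0F          # 4 low-order bits: channels 0-15
    return msg_bits, channel

# 0x9C is Note On (status nibble 9) on channel 13 (0x0C, counting from 1).
msg, ch = parse_status(0x9C)
print(f"message bits = {msg:03b}, channel = {ch + 1}")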
Classification of MIDI messages:

                                           -- voice messages
                  --- channel messages ---|
                 |                         -- mode messages
MIDI messages ---|
                 |                         -- common messages
                  --- system messages ----|-- real-time messages
                                           -- exclusive messages
MIDI channel messages:
– messages that are transmitted on individual channels rather
than globally to all devices in the MIDI network.

Channel voice messages:
• Instruct the receiving instrument to assign particular sounds
to its voices,
• turn notes on and off,
• alter the sound of the currently active note or notes.
MIDI Channel Voice Messages

Voice Message            Status Byte   Data Byte1          Data Byte2
-------------            -----------   -----------------   -----------------
Note off                 8x            Key number          Note Off velocity
Note on                  9x            Key number          Note On velocity
Polyphonic Key Pressure  Ax            Key number          Amount of pressure
Control Change           Bx            Controller number   Controller value
Program Change           Cx            Program number      None
Channel Pressure         Dx            Pressure value      None
Pitch Bend               Ex            MSB                 LSB

Note: 'x' in the status byte hex value stands for a channel
number.
MIDI Command Example
A Note On message is followed by two bytes, one to identify
the note, and one to specify the velocity.

To play:
• note number 80 (Hex 50)
• with maximum velocity (127, Hex 7F)
• on channel 13 (Hex C),

the MIDI device would send these three hexadecimal byte
values:

9C 50 7F
MIDI channel mode messages:
• Channel mode messages are a special case of the Control
Change message (Bx Hex, or 1011nnnn binary).
• The difference between a Control Change message and a
Channel Mode message is in the first data byte:
– data byte values 121 through 127 have been reserved in
the Control Change message for the channel mode
messages.
– Channel mode messages determine how an instrument
will process MIDI voice messages.
System Messages:

• System messages carry information that is not channel
specific. Examples:
– timing signals for synchronization,
– positioning information in pre-recorded MIDI sequences,
– detailed setup information for the destination device,
– setting up sounds, patch names, etc.
MIDI System Real-Time Messages

• These messages are related to synchronization/timing etc.

System Real-Time Message   Status Byte
------------------------   -----------
Timing Clock               F8
Start Sequence             FA
Continue Sequence          FB
Stop Sequence              FC
Active Sensing             FE
System Reset               FF
System Common Messages

• These contain the following (unrelated) messages:

System Common Message   Status Byte   Number of Data Bytes
---------------------   -----------   --------------------
MIDI Timing Code        F1            1
Song Position Pointer   F2            2
Song Select             F3            1
Tune Request            F6            None
MIDI System Exclusive Messages
• Messages related to things that cannot be standardized:
– system-dependent creation of sound,
– system-dependent organisation of sounds
(not General MIDI compliant? (more soon))
• An addition to the original MIDI specification.
• Just a stream of bytes:
– all with their high bits set to 0,
– bracketed by a pair of system exclusive start and end
messages:
F0 — Sysex Start
F7 — Sysex End
– The format of the message byte stream is system dependent.
General MIDI (GM)

Problem: MIDI music may not sound the same everywhere.

Basic GM idea:
• MIDI + Instrument Patch Map + Percussion Key Map –> a
piece of MIDI music sounds (more or less) the same anywhere
it is played.
– The instrument patch map is a standardised list consisting
of 128 instruments (patches):
the same instrument type sounds, if not an identical sound.
– The percussion map specifies 47 percussion sounds:
the same drum type sounds on the keyboard map.
– Key-based percussion is always transmitted on MIDI
channel 10 (default);
it can be transmitted on other channels as well.
Requirements for General MIDI Compatibility

• Support all 16 channels — the default standard multitimbral
MIDI specification.
• Each channel can play a different instrument/program —
multitimbral.
• Each channel can play many notes — polyphony.
• Minimum of 24 (usually much higher, 64/128) fully
dynamically allocated voices — shared across all channels.
General MIDI Instrument Patch Map
Prog No. Instrument Prog No. Instrument
-------------------------- -----------------------------------
(1-8 PIANO) (9-16 CHROM PERCUSSION)
1 Acoustic Grand 9 Celesta
2 Bright Acoustic 10 Glockenspiel
3 Electric Grand 11 Music Box
4 Honky-Tonk 12 Vibraphone
5 Electric Piano 1 13 Marimba
6 Electric Piano 2 14 Xylophone
7 Harpsichord 15 Tubular Bells
8 Clav 16 Dulcimer

(17-24 ORGAN) (25-32 GUITAR)


17 Drawbar Organ 25 Acoustic Guitar(nylon)
18 Percussive Organ 26 Acoustic Guitar(steel)
19 Rock Organ 27 Electric Guitar(jazz)
20 Church Organ 28 Electric Guitar(clean)
21 Reed Organ 29 Electric Guitar(muted)
22 Accordion 30 Overdriven Guitar
23 Harmonica 31 Distortion Guitar
24 Tango Accordion 32 Guitar Harmonics

(33-40 BASS) (41-48 STRINGS)


33 Acoustic Bass 41 Violin
34 Electric Bass(finger) 42 Viola
35 Electric Bass(pick) 43 Cello
36 Fretless Bass 44 Contrabass
37 Slap Bass 1 45 Tremolo Strings
38 Slap Bass 2 46 Pizzicato Strings
39 Synth Bass 1 47 Orchestral Strings
40 Synth Bass 2 48 Timpani
(49-56 ENSEMBLE) (57-64 BRASS)
49 String Ensemble 1 57 Trumpet
50 String Ensemble 2 58 Trombone
51 SynthStrings 1 59 Tuba
52 SynthStrings 2 60 Muted Trumpet
53 Choir Aahs 61 French Horn
54 Voice Oohs 62 Brass Section
55 Synth Voice 63 SynthBrass 1
56 Orchestra Hit 64 SynthBrass 2
(65-72 REED) (73-80 PIPE)
65 Soprano Sax 73 Piccolo
66 Alto Sax 74 Flute
67 Tenor Sax 75 Recorder
68 Baritone Sax 76 Pan Flute
69 Oboe 77 Blown Bottle
70 English Horn 78 Shakuhachi
71 Bassoon 79 Whistle
72 Clarinet 80 Ocarina

(81-88 SYNTH LEAD) (89-96 SYNTH PAD)


81 Lead 1 (square) 89 Pad 1 (new age)
82 Lead 2 (sawtooth) 90 Pad 2 (warm)
83 Lead 3 (calliope) 91 Pad 3 (polysynth)
84 Lead 4 (chiff) 92 Pad 4 (choir)
85 Lead 5 (charang) 93 Pad 5 (bowed)
86 Lead 6 (voice) 94 Pad 6 (metallic)
87 Lead 7 (fifths) 95 Pad 7 (halo)
88 Lead 8 (bass+lead) 96 Pad 8 (sweep)
(97-104 SYNTH EFFECTS) (105-112 ETHNIC)
97 FX 1 (rain) 105 Sitar
98 FX 2 (soundtrack) 106 Banjo
99 FX 3 (crystal) 107 Shamisen
100 FX 4 (atmosphere) 108 Koto
101 FX 5 (brightness) 109 Kalimba
102 FX 6 (goblins) 110 Bagpipe
103 FX 7 (echoes) 111 Fiddle
104 FX 8 (sci-fi) 112 Shanai

(113-120 PERCUSSIVE) (121-128 SOUND EFFECTS)


113 Tinkle Bell 121 Guitar Fret Noise
114 Agogo 122 Breath Noise
115 Steel Drums 123 Seashore
116 Woodblock 124 Bird Tweet
117 Taiko Drum 125 Telephone Ring
118 Melodic Tom 126 Helicopter
119 Synth Drum 127 Applause
120 Reverse Cymbal 128 Gunshot
General MIDI Percussion Key Map
MIDI Key Drum Sound MIDI Key Drum Sound
-------- ---------- ---------- ----------

35 Acoustic Bass Drum 59 Ride Cymbal 2


36 Bass Drum 1 60 Hi Bongo
37 Side Stick 61 Low Bongo
38 Acoustic Snare 62 Mute Hi Conga
39 Hand Clap 63 Open Hi Conga
40 Electric Snare 64 Low Conga
41 Low Floor Tom 65 High Timbale
42 Closed Hi-Hat 66 Low Timbale
43 High Floor Tom 67 High Agogo
44 Pedal Hi-Hat 68 Low Agogo
45 Low Tom 69 Cabasa
46 Open Hi-Hat 70 Maracas
47 Low-Mid Tom 71 Short Whistle
48 Hi-Mid Tom 72 Long Whistle
49 Crash Cymbal 1 73 Short Guiro
50 High Tom 74 Long Guiro
51 Ride Cymbal 1 75 Claves
52 Chinese Cymbal 76 Hi Wood Block
53 Ride Bell 77 Low Wood Block
54 Tambourine 78 Mute Cuica
55 Splash Cymbal 79 Open Cuica
56 Cowbell 80 Mute Triangle
57 Crash Cymbal 2 81 Open Triangle
58 Vibraslap
Digital Audio and MIDI
• Modern recording studio — hard disk recording and MIDI:
– analog sounds (live vocals, guitar, sax etc.) — DISK
– keyboards, drums, samples, loops, effects — MIDI
• Sound generators use a mix of:
– synthesis
– samples
• Samplers — digitise (sample) sound, then:
– playback
– loop (beats)
– simulate musical instruments
Digital Audio, Synthesis, Midi and Compression —
MPEG 4 Structured Audio
• We have seen the need for compression already in digital
audio — large data files.
• Basic ideas of compression (next lecture) are used as an
integral part of audio formats — MP3, RealAudio etc.
• MPEG-4 audio actually combines compression, synthesis
and MIDI to have a massive impact on compression:
• MIDI and synthesis encode what note to play and how to
play it with a small number of parameters —
a much greater reduction than simply having some encoded
bits of audio.
• The responsibility for creating the audio is delegated to the
generation side.
MPEG 4 Structured Audio

A newer standard than MP3 audio — which we study in
detail later.

MPEG-4 covers the whole range of digital audio:
• from very low bit rate speech
• to full bandwidth high quality audio
• built-in anti-piracy measures
• Structured Audio
• its relation to MIDI is why we study MPEG-4 audio here
Structured Audio Tools

The 6 MPEG-4 Structured Audio tools are:

SAOL — the Structured Audio Orchestra Language
SASL — the Structured Audio Score Language
SASBF — the Structured Audio Sample Bank Format
Set of MIDI semantics — describes how to control SAOL with
MIDI
Scheduler — describes how to take the above parts and create
sound
AudioBIFS — part of BIFS, which lets you make audio
soundtracks in MPEG-4 using a variety of tools and
effects-processing techniques
SAOL (Structured Audio Orchestra Language)

• Pronounced "sail".
• The central part of the Structured Audio toolset.
• A new software-synthesis language:
a language for describing synthesizers; a program, or
instrument.
• Specifically designed for use in MPEG-4.
• Not based on any particular method of synthesis — supports
many underlying synthesis methods.
SAOL Synthesis Methods

• Any known method of synthesis can be described in SAOL
(open support):
– FM synthesis,
– physical-modelling synthesis,
– sampling synthesis,
– granular synthesis,
– subtractive synthesis,
– FOF synthesis, and
– hybrids of all of these.
SASL (Structured Audio Score Language)
• A very simple language to control the synthesizers specified
as SAOL instruments.
• A SASL program, or score, contains instructions that tell
SAOL:
– what notes to play,
– how loud to play them,
– what tempo to play them at,
– how long they last, and
– how to control them (vary them while they're playing).
• Similar to MIDI, but:
– doesn't suffer from MIDI's restrictions on temporal
resolution or bandwidth,
– has a more sophisticated controller structure.
SASL (Structured Audio Score Language) (Cont.)

• A lightweight scoring language. It does not support:
– looping,
– sections,
– repeats,
– expression evaluation,
– some other things.
• Most SASL scores will therefore be created by automatic
tools.
SASBF (Structured Audio Sample Bank Format)

• A format for efficiently transmitting banks of sound samples.
• Used in wavetable, or sampling, synthesis.
• Partly compatible with the MIDI Downloadable Sounds (DLS)
format.
• The most active participants in this activity are E-mu Systems
(a sampler manufacturer) and the MIDI Manufacturers
Association (MMA).
MPEG-4 MIDI Semantics

SAOL synthesizers can be controlled by:
• SASL scripts, or
• MIDI scores in MPEG-4.

Reasons to use MIDI:
• MIDI is today's most commonly used representation for
music score data,
• many sophisticated authoring tools (such as sequencers)
work with MIDI.
MPEG-4 Midi Control

• The MIDI syntax is external to the MPEG-4 Structured Audio
standard:
• it uses the MIDI Manufacturers Association's standard,
• but redefines some of the semantics for MPEG-4.
• The new semantics are carefully defined as part of the
MPEG-4 specification.
Close
MPEG-4 Scheduler

• The main body of the Structured Audio definition.
• A set of carefully defined and somewhat complicated
instructions.
• Specifies how SAOL is used to create sound when it is driven
by MIDI or SASL.
AudioBIFS

• BIFS is the MPEG-4 Binary Format for Scene Description.
• It describes how the different "objects" in a structured media
scene fit together:
– an MPEG-4 scene also consists of video clips, sounds,
animations, and other pieces of multimedia,
– each has a special format to describe it,
– the pieces need to be put together,
– BIFS lets you describe how to put the pieces together.
AudioBIFS (Cont.)

• AudioBIFS is designed for specifying the mixing and
post-production of audio scenes as they're played back.
• For example, we can specify:
– how the voice track is mixed with the background music,
– that it fades out after 10 seconds, and
– that other music comes in with a nice reverb on it.
• An extended version of VRML, with capabilities for:
– streaming, and
– mixing audio and video data.
• A very advanced sound model.
AudioBIFS (Cont.)

How a simple sound is created from three elementary sound
streams:

Figure 23: AudioBIFS Subgraph
Graphic/Image File Formats
Common graphics and image file formats:
• http://www.dcs.ed.ac.uk/home/mxr/gfx/ —
a comprehensive listing of various formats.
• See the Encyclopedia of Graphics File Formats book in the
library.
• Most formats incorporate compression.
• Graphics, video and audio compression techniques are
covered in the next chapter.
Graphic/Image Data Structures
• A digital image consists of many picture elements, termed
pixels.
• The number of pixels determines the quality of the image
(resolution).
• Higher resolution generally yields better quality (at a storage
cost).
• A bit-map representation stores the graphic/image data in the
same manner that the computer monitor contents are stored
in video memory.
Monochrome/Bit-Map Images

Figure 24: Sample Monochrome Bit-Map Image

• Each pixel is stored as a single bit (0 or 1).
• A 640 x 480 monochrome image requires 37.5 KB of storage.
• Dithering is often used for displaying monochrome images.
Gray-scale Images

Figure 25: Example of a Gray-scale Bit-map Image

• Each pixel is usually stored as a byte (value between 0 and 255).
• A 640 x 480 greyscale image requires over 300 KB of storage.
8-bit Colour Images

Figure 26: Example of 8-Bit Colour Image

• One byte for each pixel.
• Supports 256 out of the millions of colours possible; acceptable
colour quality.
• Requires a Colour Look-Up Table (LUT).
• A 640 x 480 8-bit colour image requires 307.2 KB of storage (the
same as 8-bit greyscale).
24-bit Colour Images

Figure 27: Example of 24-Bit Colour Image

• Each pixel is represented by three bytes (e.g., RGB).
• Supports 256 x 256 x 256 possible combined colours (16,777,216).
• A 640 x 480 24-bit colour image would require 921.6 KB of
storage.
• Most 24-bit images are 32-bit images:
– the extra byte of data for each pixel is used to store an alpha
value representing special effect information.
Standard System Independent Formats

GIF (GIF87a, GIF89a)

• Graphics Interchange Format (GIF), devised by CompuServe
(using the Unisys-patented LZW algorithm), initially for
transmitting graphical images over phone lines via modems.
• Uses the Lempel-Ziv-Welch algorithm (a dictionary-based
compression method), modified slightly for image scan line
packets (line grouping of pixels) — algorithm soon.
• Limited to only 8-bit (256) colour images, suitable for images
with few distinctive colours (e.g., graphics drawings).
• Supports interlacing.
JPEG
• A standard for photographic image compression created by
the Joint Photographic Experts Group
• Takes advantage of limitations in the human vision system
to achieve high rates of compression.
• Lossy compression which allows the user to set the desired
level of quality/compression.
• Algorithm soon — detailed discussion in the next chapter on
compression.
TIFF
• Tagged Image File Format (TIFF) stores many different types
of images (e.g., monochrome, greyscale, 8-bit & 24-bit RGB,
etc.) –> tagged.
• Developed by the Aldus Corp. in the 1980s and later
supported by Microsoft.
• TIFF is a lossless format (when not utilizing the new JPEG
tag, which allows for JPEG compression).
• It does not provide any major advantages over JPEG and is
not as user-controllable; it appears to be declining in
popularity.
Postscript/Encapsulated Postscript
• A typesetting language which includes text as well as
vector/structured graphics and bit-mapped images.
• Used in several popular graphics programs (Illustrator,
FreeHand).
• Does not provide compression; files are often large,
• although it is able to link to external compression
applications.
System Dependent Formats

Microsoft Windows: BMP

• A system-standard graphics file format for Microsoft
Windows.
• Used in many PC graphics programs; cross-platform support.
• It is capable of storing 24-bit bitmap images.
Macintosh: PAINT and PICT
• PAINT was originally used in the MacPaint program, initially
only for 1-bit monochrome images.
• The PICT format was originally used in MacDraw (a
vector-based drawing program) for storing structured graphics.
• Still an underlying Mac format (although PDF on OS X).
X-windows: XBM
• Primary bitmap graphics format for the X Window system.
• XBM itself is monochrome (1-bit); colour pixmaps are handled
by the companion XPM format.
• Many public domain graphic editors, e.g., xv.
• Used in X Windows for storing icons, pixmaps, backdrops,
etc.
Colour in Image and Video — Basics of Colour
Light and Spectra
• Visible light is an electromagnetic wave in the 400 nm - 700
nm range.
• Most light we see is not one wavelength; it's a combination
of many wavelengths (Fig. 28).

Figure 28: Light Wavelengths

• The profile above is called a spectrum.
The Human Retina
• The eye is basically similar to a camera.
• It has a lens to focus light onto the retina of the eye.
• The retina is full of neurons.
• Each light-sensitive neuron is either a rod or a cone.
• Rods are not sensitive to colour.
Cones and Perception
• Cones come in 3 types: red, green and blue. Each responds
differently to various frequencies of light. The following figure
shows the spectral-response functions of the cones and the
luminous-efficiency function of the human eye.

Figure 29: Cones and Luminous-efficiency Function of the Human
Eye
• The colour signal to the brain comes from the response of
the 3 cones to the spectra being observed.
That is, the signal consists of 3 numbers:

R = ∫ E(λ) S_R(λ) dλ
G = ∫ E(λ) S_G(λ) dλ
B = ∫ E(λ) S_B(λ) dλ

where E is the light and the S are the cone sensitivity
functions.
Figure 30: Spectral Response

• A colour can be specified as the sum of three colours, so
colours form a 3-dimensional vector space.
• The following figure shows the amounts of three primaries
needed to match all the wavelengths of the visible spectrum.
Figure 31: Wavelengths of the Visible Spectrum
RGB Colour Space

Figure 32: Original Color Image

• The colour space is made up of Red, Green and Blue
intensity components.

[Figure: the Red, Green, Blue (RGB) image planes of the
image above.]
CRT Displays
• CRT displays have three phosphors (RGB) which produce a
combination of wavelengths when excited with electrons.
• The gamut of colours is all colours that can be reproduced
using the three primaries.
• The gamut of a colour monitor is smaller than that of some
colour models, e.g. the CIE (Lab) model — see later.
CIE Chromaticity Diagram
Does a set of primaries exist that spans the space with only
positive coefficients?
• Yes, but no pure colours.
• In 1931, the CIE defined three standard primaries (X, Y, Z).
The Y primary was intentionally chosen to be identical to the
luminous-efficiency function of human eyes.
• Figure 33 shows the amounts of X, Y, Z needed to exactly
reproduce any visible colour, via the formulae:
Figure 33: Reproducing Visible Colour
X = ∫ E(λ) x(λ) dλ
Y = ∫ E(λ) y(λ) dλ
Z = ∫ E(λ) z(λ) dλ

• All visible colours are in a horseshoe-shaped cone in the
X-Y-Z space. Consider the plane X+Y+Z=1 and project it
onto the X-Y plane: we get the CIE chromaticity diagram, as
shown in Fig. 34.
• The edges represent the pure colours (sine waves at the
appropriate frequency).
• White (a blackbody radiating at 6447 kelvin) is at the dot.
• When added, any two colours (points on the CIE diagram)
produce a point on the line between them.
Figure 34: CIE Chromaticity Diagram
L*a*b (Lab) Colour Model
• A refined CIE model, named CIE L*a*b in 1976.
• Luminance: L. Chrominance: a — ranges from green to red;
b — ranges from blue to yellow.
• Used by Photoshop.
Lab Image Space

[Figure: original colour image and its L, a, b component
images.]
Colour Image and Video Representations
• Recap: a black and white image is a 2-D array of integers.
• Recap: a colour image is a 2-D array of (R,G,B) integer
triplets. These triplets encode how much the corresponding
phosphor should be excited in devices such as a monitor
(see the earlier figures for an example).
Besides the RGB representation, YIQ and YUV are the two
commonly used in video.
YIQ Colour Model
• YIQ is used in colour TV broadcasting; it is downward
compatible with B/W TV.
• Y (luminance) is the CIE Y primary:
Y = 0.299R + 0.587G + 0.114B
• The other two components:
I = 0.596R - 0.275G - 0.321B, Q = 0.212R - 0.528G + 0.311B
• The YIQ transform:

[Y]   [0.299  0.587  0.114] [R]
[I] = [0.596 -0.275 -0.321] [G]
[Q]   [0.212 -0.528  0.311] [B]

• I is the red-orange axis; Q is roughly orthogonal to I.
• The eye is most sensitive to Y, next to I, next to Q. In NTSC,
4 MHz is allocated to Y, 1.5 MHz to I, 0.6 MHz to Q.
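A minimal sketch (assuming numpy; illustrative Python, not from the
slides) applying this transform to one RGB pixel:

# YIQ transform of a single RGB pixel, components in [0, 1].
import numpy as np

rgb_to_yiq = np.array([[0.299,  0.587,  0.114],
                       [0.596, -0.275, -0.321],
                       [0.212, -0.528,  0.311]])

y, i, q = rgb_to_yiq @ np.array([1.0, 1.0, 1.0])   # pure white
print(f"Y={y:.3f} I={i:.3f} Q={q:.3f}")            # white has I, Q ~ 0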
YIQ Colour Space

[Figure: original colour image and its Y, I, Q component
images.]
YUV (CCIR 601 or YCrCb) Color Model
• Established in 1982 to build a digital video standard.
• Video is represented by a sequence of fields (odd and even
lines). Two fields make a frame.
• Works in PAL (50 fields/sec) or NTSC (60 fields/sec).
• Uses the Y, Cr, Cb colour space (also called YUV):
Y = 0.299R + 0.587G + 0.114B, Cr = R - Y, Cb = B - Y
• The YCrCb (YUV) transform:

[Y]   [ 0.299  0.587  0.114] [R]
[U] = [-0.169 -0.331  0.500] [G]
[V]   [ 0.500 -0.419 -0.081] [B]
YUV Colour Space

[Figure: original colour image and its Y, U, V component
images.]
The CMY Colour Model
• Cyan, Magenta, and Yellow (CMY) are complementary
colours of RGB (Fig. 35). They can be used as subtractive
primaries.
• The CMY model is mostly used in printing devices, where the
colour pigments on the paper absorb certain colours (e.g.,
no red light is reflected from cyan ink).

Figure 35: The RGB and CMY Cubes
Conversion between RGB and CMY

E.g., convert White from (1, 1, 1) in RGB to (0, 0, 0) in CMY:

[C]   [1]   [R]
[M] = [1] - [G]
[Y]   [1]   [B]

[R]   [1]   [C]
[G] = [1] - [M]
[B]   [1]   [Y]
CMYK Color Model
• Sometimes an alternative CMYK model (K stands for blacK)
is used in colour printing (e.g., to produce darker black than
simply mixing CMY), where:

K = min(C, M, Y)
C = C - K
M = M - K
Y = Y - K
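A minimal sketch (illustrative Python, not from the slides) of this
undercolour-removal step:

# RGB (components in [0, 1]) to CMYK via K = min(C, M, Y).
def rgb_to_cmyk(r, g, b):
    c, m, y = 1 - r, 1 - g, 1 - b      # CMY are complements of RGB
    k = min(c, m, y)                   # black ink replaces the common part
    return c - k, m - k, y - k, k

print(rgb_to_cmyk(1, 1, 1))   # white -> (0, 0, 0, 0)
print(rgb_to_cmyk(0, 0, 0))   # black -> (0, 0, 0, 1)
print(rgb_to_cmyk(1, 0, 0))   # red   -> (0, 1, 1, 0)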
CMYK Colour Space

[Figure: original colour image and its C, M, Y, K component
images.]
Summary of Colour
• Colour images are encoded as triplets of values.
• Three common systems of encoding in video are RGB, YIQ,
and YCrCb.
• Besides the hardware-oriented colour models (i.e., RGB,
CMY, YIQ, YUV), HSB (Hue, Saturation, and Brightness, e.g.,
used in Photoshop) and HLS (Hue, Lightness, and Saturation)
are also commonly used.
• YIQ uses properties of the human eye to prioritise information:
Y is the black and white (luminance) image; I and Q are the
colour (chrominance) images. YUV uses a similar idea.
• YUV is a standard for digital video that specifies
image size, and decimates the chrominance images (for 4:2:2
video) — more soon.
Basics of Video
Types of Colour Video Signals
• Component video — each primary is sent as a separate video
signal.
– The primaries can either be RGB or a luminance-chrominance
transformation of them (e.g., YIQ, YUV).
– Best colour reproduction.
– Requires more bandwidth and good synchronization of the
three components.
• Composite video — colour (chrominance) and luminance
signals are mixed into a single carrier wave. Some interference
between the two signals is inevitable.
• S-Video (Separated video, e.g., in S-VHS) — a compromise
between component analog video and composite video. It
uses two lines, one for luminance and another for the
composite chrominance signal.
Analog Video
The following figures (Fig. 36 and 37) are from A.M. Tekalp,
Digital video processing, Prentice Hall PTR, 1995.

Figure 36: Raster Scanning

Figure 37: NTSC Signal
NTSC Video
• 525 scan lines per frame, 30 frames per second (or, to be
exact, 29.97 fps, 33.37 msec/frame).
• Aspect ratio 4:3.
• Interlaced; each frame is divided into 2 fields, 262.5 lines/field.
• 20 lines are reserved for control information at the beginning
of each field (Fig. 38):
– so a maximum of 485 lines of visible data,
– laser disc and S-VHS have an actual resolution of ~420 lines,
– ordinary TV — ~320 lines.
NTSC Video Scan Line
• Each line takes 63.5 microseconds to scan. Horizontal retrace
takes 10 microseconds (with a 5 microsecond horizontal sync
pulse embedded), so the active line time is 53.5 microseconds.

Figure 38: Digital Video Rasters
NTSC Video Colour Representation/Compression
• Colour representation:
– NTSC uses the YIQ colour model.
– Composite = Y + I cos(Fsc t) + Q sin(Fsc t),
where Fsc is the frequency of the colour subcarrier.
– Basic compression idea:
the eye is most sensitive to Y, next to I, next to Q.
– This is STILL analog compression. In NTSC,
∗ 4 MHz is allocated to Y,
∗ 1.5 MHz to I,
∗ 0.6 MHz to Q.
– A similar (easier to work out) compression forms part of
digital compression — more soon.
PAL Video
• 625 scan lines per frame, 25 frames per second
(40 msec/frame).
• Aspect ratio 4:3.
• Interlaced; each frame is divided into 2 fields, 312.5 lines/field.
• Colour representation:
– PAL uses the YUV (YCbCr) colour model.
– Composite =
Y + 0.492 × U sin(Fsc t) + 0.877 × V cos(Fsc t)
– In PAL, 5.5 MHz is allocated to Y, 1.8 MHz each to U and
V.
Digital Video
• Advantages:
– direct random access –> good for nonlinear video editing,
– no problem for repeated recording,
– no need for blanking and sync pulse.
• Almost all digital video uses component video.
Chroma Subsampling

Chroma subsampling is a method that stores colour
information at lower resolution than intensity information.
• How do we decimate the chrominance?
What do these numbers mean?
• 4:2:2 –> Horizontally subsampled colour signals by a factor
of 2. Each pixel is two bytes, e.g., (Cb0, Y0)(Cr0, Y1)(Cb2,
Y2)(Cr2, Y3)(Cb4, Y4) ...
• 4:1:1 –> Horizontally subsampled by a factor of 4.
• 4:2:0 –> Subsampled in both the horizontal and vertical axes
by a factor of 2 between pixels.
• 4:1:1 and 4:2:0 are mostly used in JPEG and MPEG (see
later).
Chroma Subsampling in Practice —
Analog/Digital Subsampling
• Analog: simply sample the chrominance signal at a lower
frequency.
• Digital subsampling: perform 2x2 (or 1x2, or 1x4) chroma
subsampling:
– break the image into 2x2 (or 1x2, or 1x4) pixel blocks and
– only store the average colour information for each 2x2 (or
1x2, or 1x4) pixel group (see the sketch below).
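A minimal sketch (assuming numpy; illustrative Python, not from the
slides) of the 2x2 averaging used for a 4:2:0-style chroma plane:

# Average each 2x2 block of a chroma plane (even dimensions assumed).
import numpy as np

def subsample_2x2(chroma):
    h, w = chroma.shape
    blocks = chroma.reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))    # one value per 2x2 pixel group

cb = np.arange(16, dtype=float).reshape(4, 4)
print(subsample_2x2(cb))               # 4x4 plane -> 2x2 averaged plane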
Digital Chroma Subsampling Errors (1)
This sampling process introduces two kinds of errors:
1. The major problem is that colour is typically stored at only
half the horizontal and vertical resolution of the original image.
This is not a real problem:
• Recall: the human eye has lower resolving power for
colour than for intensity.
• Nearly all digital cameras have lower resolution for colour
than for intensity,
so there is no high-resolution colour information present in
digital camera images.
Digital Chroma Subsampling Errors (2)

2. The subsampling process demands two conversions of the
image:
• from the original RGB representation to an intensity+colour
(YIQ/YUV) representation, and
• then back again (YIQ/YUV –> RGB) when the image is
displayed.
• Conversion is done in integer arithmetic — some round-off
error is introduced:
– this is a much smaller effect,
– but it (slightly) affects the colour of (typically) one or two
percent of the pixels in an image.
CCIR Standards for Digital Video
(CCIR – Consultative Committee for International Radio)

CCIR 601 CCIR 601 CIF QCIF


525/60 625/50
NTSC PAL/SECAM NTSC
-------------------- ----------- ----------- ----------- ----------- 334
Luminance resolution 720 x 485 720 x 576 352 x 240 176 x 120
Chrominance resolut. 360 x 485 360 x 576 176 x 120 88 x 60
Colour Subsampling 4:2:2 4:2:2
Fields/sec 60 50 30 30
Interlacing Yes Yes No No

• CCIR 601 uses interlaced scan, so each field only has half as
much vertical resolution (e.g., 243 lines in NTSC).
The CCIR 601 (NTSC) data rate is ~165 Mbps.
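(A rough sanity check of that figure, not from the original notes: 4:2:2
sampling averages 16 bits per pixel, so 720 × 485 pixels × 30 frames/sec
× 16 bits ≈ 168 Mbps.)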
• CIF (Common Intermediate Format) is introduced as an
acceptable temporary standard.
It delivers approximately VHS quality. CIF uses progressive
(non-interlaced) scan.
ATSC Digital Television Standard
(ATSC – Advanced Television Systems Committee)

The ATSC Digital Television Standard was recommended


to be adopted as the Advanced TV broadcasting standard by
the FCC Advisory Committee on Advanced Television Service
on November 28, 1995.

It covers the standard for HDTV (High Definition TV).

Video Format
The video scanning formats supported by the ATSC Digital
Television Standard are shown in the following table.

Vertical Lines   Horizontal Pixels   Aspect Ratio    Picture Rate
1080             1920                16:9            60I 30P 24P
720              1280                16:9            60P 30P 24P
480              704                 16:9 and 4:3    60I 60P 30P 24P
480              640                 4:3             60I 60P 30P 24P

• The aspect ratio for HDTV is 16:9 as opposed to 4:3 in NTSC,


PAL, and SECAM. (A 33% increase in horizontal dimension.)
• In the picture rate column, the I means interlaced scan, and
the P means progressive (non-interlaced) scan.
• Both NTSC rates and integer rates are supported (i.e., 60.00,
59.94, 30.00, 29.97, 24.00, and 23.98).
Compression I:
Basic Compression Algorithms
Recap: The Need for Compression
Raw Video, Image and Audio files are very large beasts:

Uncompressed Audio

1 minute of Audio:

Audio Type      44.1 KHz   22.05 KHz   11.025 KHz
16 Bit Stereo   10.1 Mb    5.05 Mb     2.52 Mb
16 Bit Mono     5.05 Mb    2.52 Mb     1.26 Mb
8 Bit Mono      2.52 Mb    1.26 Mb     630 Kb
Uncompressed Images

Image Type File Size


512 x 512 Monochrome 0.25 Mb
512 x 512 8-bit colour image 0.25 Mb
512 x 512 24-bit colour image 0.75 Mb

Video
Can involve: Stream of audio and images

Raw Video – Uncompressed Image Frames, 512x512 True


Color PAL 1125 Mb Per Min

DV Video — 200-300 Mb per Min (Approx) Compressed

HDTV — Gigabytes per second.


• Relying on higher bandwidths is not a good option — M25
Syndrome.
• Compression HAS TO BE part of the representation of
audio, image and video formats.
Classifying Compression Algorithms
What is Compression?

Compression basically employs redundancy in the data:



• Temporal — in 1D data, 1D signals, Audio etc.


• Spatial — correlation between neighbouring pixels or data
items
• Spectral — correlation between colour or luminance
components.
This uses the frequency domain to exploit relationships
between frequency of change in data.
• Psycho-visual — exploit perceptual properties of the human
visual system.
Lossless v Lossy Compression
Compression can be categorised in two broad ways:
Lossless Compression — Entropy Encoding Schemes,
LZW algorithm used in GIF image file format.
Lossy Compression — Source Coding Transform Coding,
DCT used in JPEG/MPEG etc.

Lossy methods have to be employed for image and video


compression:
• Compression ratio of lossless methods (e.g., Huffman Coding,
Arithmetic Coding, LZW) is not high enough
Lossless Compression Algorithms:
Repetitive Sequence Suppression

• Fairly straightforward to understand and implement.


• Simplicity is their downfall: NOT best compression ratios.
• Some methods have their applications, e.g. Component of
JPEG, Silence Suppression.

Simple Repetition Suppression
If a series of n successive tokens appears
• Replace series with a token and a count number of
occurrences.
• Usually need to have a special flag to denote when the
repeated token appears

For Example

89400000000000000000000000000000000

we can replace with


894f32

where f is the flag for zero.
Simple Repetition Suppression: How Much Compression?

Compression savings depend on the content of the data.



Applications of this simple compression technique include:


• Suppression of zero’s in a file (Zero Length Suppression)
– Silence in audio data, Pauses in conversation etc.
– Bitmaps
– Blanks in text or program source files
– Backgrounds in images
• Other regular image or data tokens
Lossless Compression Algorithms:
Run-length Encoding
This encoding method is frequently applied to images
(or pixels in a scan line).

It is a small compression component used in


JPEG compression.

In this instance:
• Sequences of image elements X1, X2, . . . , Xn (Row by Row)
• Mapped to pairs (c1, l1), (c2, l2), . . . , (cn, ln)

where ci represents image intensity or colour and li the length
of the ith run of pixels
• (Not dissimilar to zero length suppression above).
Run-length Encoding Example

Original Sequence:
111122233333311112222

can be encoded as:


(1,4),(2,3),(3,6),(1,4),(2,4)

How Much Compression?

The savings are dependent on the data.

In the worst case (random noise) the encoding is heavier
than the original file:

2 integers rather than 1 integer if data is represented as integers.
Lossless Compression Algorithms:
Pattern Substitution
This is a simple form of statistical encoding.

Here we substitute a frequently repeating pattern(s) with a

code.

The code is shorter than the pattern giving us


compression.

A simple Pattern Substitution scheme could employ predefined


codes
Simple Pattern Substitution Example

For example replace all occurrences of ‘The’ with the


predefined code ’&’.

So:
The code is The Key

Becomes:

& code is & Key


Similar for other codes — commonly used words
Token Assignment
More typically tokens are assigned to according to frequency of
occurrence of patterns:
• Count occurrence of tokens
• Sort in Descending order
• Assign some symbols to highest count tokens

A predefined symbol table may be used, i.e. assign code i to
token Ti. (E.g. some dictionary of common words/tokens.)

However, it is more usual to dynamically assign codes to tokens.


The entropy encoding schemes below basically attempt to
decide the optimum assignment of codes to achieve the best
compression.
Lossless Compression Algorithms
Entropy Encoding

• Lossless Compression frequently involves some form of
entropy encoding
• Based on information theoretic techniques.

Basics of Information Theory
According to Shannon, the entropy of an information source S
is defined as:
H(S) = η = Σi pi log2(1/pi)
where pi is the probability that symbol Si in S will occur.
• log2(1/pi) indicates the amount of information contained in Si,
i.e., the number of bits needed to code Si.
• For example, in an image with uniform distribution of gray-level
intensity, i.e. pi = 1/256, then
– The number of bits needed to code each gray level is 8
bits.
– The entropy of this image is 8.
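Not in the original notes — a small Python sketch computing H(S) for an
observed token stream (using the counts from the next slide gives η ≈ 2.19):

from collections import Counter
from math import log2

def entropy(stream):
    # H(S) = sum over symbols of p_i * log2(1/p_i)
    counts = Counter(stream)
    total = sum(counts.values())
    return sum((c / total) * log2(total / c) for c in counts.values())

stream = "A" * 15 + "B" * 7 + "C" * 6 + "D" * 6 + "E" * 5
print(round(entropy(stream), 2))   # -> 2.19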
The Shannon-Fano Algorithm — Learn by Example
This is a basic information theoretic algorithm.

A simple example will be used to illustrate the algorithm:



A finite token Stream:


ABBAAAACDEAAABBBDDEEAAA........

Count symbols in stream:

Symbol A B C D E
----------------------------------
Count 15 7 6 6 5
Encoding for the Shannon-Fano Algorithm:
• A top-down approach
1. Sort symbols (Tree Sort) according to their
frequencies/probabilities, e.g., ABCDE.
2. Recursively divide into two parts, each with approx. same
number of counts.

3. Assemble code by depth first traversal of tree to symbol
node
Symbol Count log(1/p) Code Subtotal (# of bits)
------ ----- -------- --------- -------------------
A 15 1.38 00 30
B 7 2.48 01 14
C 6 2.70 10 12
D 6 2.70 110 18
E 5 2.96 111 15
TOTAL (# of bits): 89

4. Transmit Codes instead of Tokens


• Raw token stream 8 bits per (39 chars) token = 312 bits
• Coded data stream = 89 bits

Huffman Coding
• Based on the frequency of occurrence of a data item
(pixels or small blocks of pixels in images).
• Use a lower number of bits to encode more frequent data
• Codes are stored in a Code Book — as for Shannon (previous
slides)
• Code book constructed for each image or a set of images.
• Code book plus encoded data must be transmitted to enable
decoding.

Encoding for Huffman Algorithm:
• A bottom-up approach
1. Initialization: Put all nodes in an OPEN list, keep it sorted
at all times (e.g., ABCDE).

2. Repeat until the OPEN list has only one node left:
(a) From OPEN pick two nodes having the lowest
frequencies/probabilities, create a parent node of them.
(b) Assign the sum of the children’s frequencies/probabilities
to the parent node and insert it into OPEN.
(c) Assign code 0, 1 to the two branches of the tree, and
delete the children from OPEN.

Symbol Count log(1/p) Code Subtotal (# of bits)


------ ----- -------- --------- --------------------
A 15 1.38 0 15
B 7 2.48 100 21
C 6 2.70 101 18
D 6 2.70 110 18
E 5 2.96 111 15
TOTAL (# of bits): 87
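Not part of the original notes — a compact Python sketch of this bottom-up
procedure using a heap. Applied to the counts above it yields a total of
87 bits (the exact 0/1 labels may differ from the table; the code lengths
are what matter):

import heapq

def huffman_codes(freqs):
    # OPEN list as a heap; repeatedly merge the two lowest-count nodes
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (left, right)))
        next_id += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):        # internal node: branches 0 and 1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                              # leaf node: a symbol
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

counts = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = huffman_codes(counts)
print(codes, sum(counts[s] * len(codes[s]) for s in counts))   # ..., 87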
The following points are worth noting about the above algorithm:
• Decoding for the above two algorithms is trivial as long as
the coding table/book is sent before the data.
– There is a bit of an overhead for sending this.
– But negligible if the data file is big.
• Unique Prefix Property: no code is a prefix to any other
code (all symbols are at the leaf nodes) –> great for decoder,
unambiguous.
• If prior statistics are available and accurate, then Huffman
coding is very good.

Huffman Entropy

In the above example:


Ideal entropy = (15 × 1.38 + 7 × 2.48 + 6 × 2.7
                 + 6 × 2.7 + 5 × 2.96) / 39
              = 85.26 / 39
              = 2.19

Number of bits needed for Huffman Coding is: 87/39 = 2.23

Huffman Coding of Images
In order to encode images:
• Divide image up into (typically) 8x8 blocks
• Each block is a symbol to be coded

• Compute Huffman codes for the set of blocks


• Encode blocks accordingly
• In JPEG: Blocks are DCT coded first before Huffman may be
applied (More soon)

Coding image in blocks is common to all image coding methods


Adaptive Huffman Coding
Motivations:
(a) The previous algorithms require prior statistical knowledge
• This may not be available
• E.g. live audio, video
(b) Even when stats dynamically available,
• Heavy overhead if many tables had to be sent — tables may
change drastically
• If a non-order-0 model is used,
• i.e. taking into account the impact of the previous symbol on
the probability of the current symbol can improve efficiency,
• e.g., "qu" often comes together, ....
Solution: Use adaptive algorithms
As an example, the Adaptive Huffman Coding is examined
below.

The idea is however applicable to other adaptive compression

algorithms.

ENCODER DECODER
------- -------

Initialize_model(); Initialize_model();
while ((c = getc (input)) while ((c = decode (input))
!= eof) != eof)
{ {
encode (c, output); putc (c, output);
update_model (c); update_model (c);
} }
• Key: encoder and decoder use same initialization and
update model routines.
• update model does two things:
(a) increment the count,
(b) update the Huffman tree.

– During the updates, the Huffman tree will maintain
its sibling property, i.e. the nodes (internal and leaf) are
arranged in order of increasing weights.
– When swapping is necessary, the farthest node with weight
W is swapped with the node whose weight has just been
increased to W+1.
– Note: If the node with weight W has a subtree beneath it,
then the subtree will go with it.
– The Huffman tree could look very different after swapping.

Arithmetic Coding
• A widely used entropy coder
• Also used in JPEG — more soon
• Its only problem is speed, due to possibly complex computations
and large symbol tables,
• Good compression ratio (better than Huffman coding),
entropy around the Shannon Ideal value.
Why better than Huffman?
• Huffman coding etc. use an integer number (k) of bits for
each symbol,
– hence k is never less than 1.
• Sometimes, e.g., when sending a 1-bit image, compression
becomes impossible.
Decimal Static Arithmetic Coding

• Here we describe the basic approach of Arithmetic Coding


• Initially basic static coding mode of operation.

• Initial example decimal coding


• Extend to Binary and then machine word length later

Basic Idea
The idea behind arithmetic coding is
• To have a probability line, 0–1, and
• Assign to every symbol a range in this line based on its
probability,
• The higher the probability, the higher the range assigned to
it.

Once we have defined the ranges and the probability line,


• Start to encode symbols,
• Every symbol defines where the output floating point number
lands within the range.
Simple Basic Arithmetic Coding Example

Assume we have the following token symbol stream



BACA

Therefore

• A occurs with probability 0.5,


• B and C with probabilities 0.25.
Basic Arithmetic Coding Algorithm

Start by assigning each symbol to the probability range 0–1.

• Sort symbols highest probability first

Symbol Range
A [0.0, 0.5)
B [0.5, 0.75)
C [0.75, 1.0)
The first symbol in our example stream is B
• We now know that the code will be in the range 0.5 to 0.74999...
Range is not yet unique
• Need to narrow down the range to give us a unique code.

Basic arithmetic coding iteration


• Subdivide the range for the first token given the probabilities
of the second token then the third etc.

Subdivide the range as follows
For all the symbols
• Range = high - low
• High = low + range * high range of the symbol being coded

• Low = low + range * low range of the symbol being coded


Where:
• Range, keeps track of where the next range should be.
• High and low, specify the output number.
• Initially High = 1.0, Low = 0.0
Back to our example

For the second symbol we have
(now Range = 0.25, Low = 0.5, High = 0.75):

Symbol Range
BA [0.5, 0.625)
BB [0.625, 0.6875)
BC [0.6875, 0.75)

Third Iteration

We now reapply the subdivision of our scale again to get for


our third symbol
(Range = 0.125, Low = 0.5, High = 0.625):

Symbol Range
BAA [0.5, 0.5625)
BAB [0.5625, 0.59375)
BAC [0.59375, 0.625)

Fourth Iteration

Subdivide again
(Range = 0.03125, Low = 0.59375, High = 0.625):
Symbol Range
BACA [0.59375, 0.60937)
BACB [0.609375, 0.6171875)
BACC [0.6171875, 0.625)

So the (Unique) output code for BACA is any number in the


range:
[0.59375, 0.60937).
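Not part of the original notes — a short Python sketch of this static
(decimal) arithmetic coder, reproducing the BACA interval:

RANGES = {"A": (0.0, 0.5), "B": (0.5, 0.75), "C": (0.75, 1.0)}

def arith_interval(message, ranges=RANGES):
    # Narrow [low, high) symbol by symbol; any number in the
    # final interval uniquely encodes the message.
    low, high = 0.0, 1.0
    for sym in message:
        span = high - low
        sym_low, sym_high = ranges[sym]
        high = low + span * sym_high
        low = low + span * sym_low
    return low, high

print(arith_interval("BACA"))   # -> (0.59375, 0.609375), as on the slides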
Decoding

To decode is essentially the opposite


• We compile the table for the sequence given probabilities.
• Find the range of number within which the code number lies
and carry on

Binary static arithmetic coding
This is very similar to above:
• except we use binary fractions.
Binary fractions are simply an extension of the binary system
into fractions much like decimal fractions.

Binary Fractions — Quick Guide
Fractions in decimal:

0.1 decimal = 10^-1 = 1/10
0.01 decimal = 10^-2 = 1/100
0.11 decimal = 10^-1 + 10^-2 = 11/100

So in binary we get

0.1 binary = 2^-1 = 1/2 decimal
0.01 binary = 2^-2 = 1/4 decimal
0.11 binary = 2^-1 + 2^-2 = 3/4 decimal
Binary Arithmetic Coding Example
• Idea: Suppose alphabet was X, Y and token stream:

XXY

Therefore:

prob(X) = 2/3
prob(Y) = 1/3

• If we are only concerned with encoding length 2 messages,


then we can map all possible messages to intervals in the
range [0..1]:

• To encode message, just send enough bits of a binary fraction
that uniquely specifies the interval.


• Similarly, we can map all possible length 3 messages to
intervals in the range [0..1]:


Implementation Issues
FPU Precision
• Resolution of the number we represent is limited by
FPU precision
• Binary coding extreme example of rounding
• Decimal coding is the other extreme — theoretically no
rounding.
• Some FPUs may use up to 80 bits
• As an example let us consider working with 16 bit resolution.

16-bit arithmetic coding

We now encode the range 0–1 into 65535 segments:

0.000 0.250 0.500 0.750 1.000


0000h 4000h 8000h C000h FFFFh

If we take a number and divide it by the maximum (FFFFh) we


will clearly see this:
0000h: 0/65535 = 0.0
4000h: 16384/65535 = 0.25
8000h: 32768/65535 = 0.5
C000h: 49152/65535 = 0.75
FFFFh: 65535/65535 = 1.0
The operation of coding is similar to what we have seen with
the binary coding:
• Adjust the probabilities so the bits needed for operating with
the number aren’t above 16 bits.
• Define a new interval
• The way to deal with the infinite number is
– to have only loaded the first 16 bits, and when needed
shift more onto it:
1100 0110 0001 000 0011 0100 0100 ...
– work only with those bytes
– as new bits are needed they’ll be shifted.
Memory Problems
What about an alphabet with 26 symbols, or 256 symbols, ...?
• In general, number of bits is determined by the size of the
interval.
• In general, (from entropy) need − log p bits to represent interval
of size p.
• Can be memory and CPU intensive

Estimating Probabilities - Dynamic Arithmetic Coding?
How to determine probabilities?
• If we have a static stream we simply count the tokens.
Could use a priori information for static or dynamic if scenario
familiar.

But for Dynamic Data?

• Simple idea is to use adaptive model:


– Start with guess of symbol frequencies — or all equal
probabilities
– Update frequency with each new symbol.
• Another idea is to take account of inter-symbol probabilities,
e.g., Prediction by Partial Matching.
Lempel-Ziv-Welch (LZW) Algorithm
• A very common compression technique.
• Used in GIF files (LZW), Adobe PDF file (LZW), UNIX compress
(LZ Only)
• Patented — LZW not LZ.

Basic idea/Example by Analogy:


Suppose we want to encode the Oxford Concise English
dictionary which contains about 159,000 entries.

Why not just transmit each word as an 18 bit number?
Problems:
• Too many bits,
• Everyone needs a dictionary,
• Only works for English text.

Solution:
• Find a way to build the dictionary adaptively.
• Original methods (LZ) due to Lempel and Ziv in 1977/8.
• Terry Welch improved the scheme in 1984,
Patented LZW Algorithm
LZW Compression Algorithm
The LZW Compression Algorithm can summarised as follows:
w = NIL;
while ( read a character k )
{

if wk exists in the dictionary


w = wk;
else
{ add wk to the dictionary;
output the code for w;
w = k;
}
}

• Original LZW used dictionary with 4K entries, first 256 (0-255)
are ASCII codes.
Example:
Input string is "ˆWEDˆWEˆWEEˆWEBˆWET".

w     k    output  index  symbol
-----------------------------------------
NIL   ˆ
ˆ     W    ˆ       256    ˆW
W     E    W       257    WE
E     D    E       258    ED
D     ˆ    D       259    Dˆ
ˆ     W
ˆW    E    256     260    ˆWE
E     ˆ    E       261    Eˆ
ˆ     W
ˆW    E
ˆWE   E    260     262    ˆWEE
E     ˆ
Eˆ    W    261     263    EˆW
W     E
WE    B    257     264    WEB
B     ˆ    B       265    Bˆ
ˆ     W
ˆW    E
ˆWE   T    260     266    ˆWET
T     EOF  T

• A 19-symbol input has been reduced to a 7-symbol plus
5-code output. Each code/symbol will need more than 8 bits,
say 9 bits.
• Usually, compression doesn't start until a large number of
bytes (e.g., > 100) are read in.
LZW Decompression Algorithm
The LZW Decompression Algorithm is as follows:
read a character k;
output k;
w = k;
while ( read a character k )
/* k could be a character or a code. */
{
entry = dictionary entry for k;
output entry;
add w + entry[0] to dictionary;
w = entry;
} JJ
II
J
I
Back
Close
Example (continued):
Input string is
"ˆWED<256>E<260><261><257>B<260>T"
w k output index symbol
----------------------------------------
ˆ ˆ
ˆ W W 256 ˆW
W E E 257 WE
E D D 258 ED
D <256> ˆW 259 Dˆ
<256> E E 260 ˆWE
E <260> ˆWE 261 Eˆ
<260> <261> Eˆ 262 ˆWEE
<261> <257> WE 263 EˆW
<257> B B 264 WEB
B <260> ˆWE 265 Bˆ
<260> T T 266 ˆWET
Problems?
• What if we run out of dictionary space?
– Solution 1: Keep track of unused entries and use LRU
– Solution 2: Monitor compression performance and flush
dictionary when performance is poor.
• Implementation Note: LZW can be made really fast;
– it grabs a fixed number of bits from input stream,
– so bit parsing is very easy.
– Table lookup is automatic.

Entropy Encoding Summary
• Huffman maps fixed length symbols to variable length codes.
Optimal only when symbol probabilities are powers of 2.
• Arithmetic maps entire message to real number range based
on statistics. Theoretically optimal for long messages, but
optimality depends on data model. Also can be CPU/memory
intensive.
• Lempel-Ziv-Welch is a dictionary-based compression method.
It maps a variable number of symbols to a fixed length code.
• Adaptive algorithms do not need a priori estimation of
probabilities, they are more useful in real applications.
Lossy Compression: Source Coding Techniques
Source coding is based on changing the content of the original
signal.

Also called semantic-based coding

Compression rates may be high, but at the price of a loss of
information. Good compression rates may be achieved with
source encoding with (occasionally) lossless or (mostly) little
perceivable loss of information.

There are three broad methods that exist:


• Transform Coding
• Differential Encoding
• Vector Quantisation
Transform Coding
A simple transform coding example

A Simple Transform Encoding procedure may be described by


the following steps for a 2x2 block of monochrome pixels:

1. Take top left pixel as the base value for the block, pixel A.
2. Calculate three other transformed values by taking the
difference between these (respective) pixels and pixel A,
i.e. B-A, C-A, D-A.
3. Store the base pixel and the differences as the values of the
transform.
Simple Transforms
Given the above we can easily form the forward transform:

X0 = A
X1 = B−A
X2 = C −A
X3 = D−A

and the inverse transform is:

An = X0
Bn = X1 + X0
Cn = X2 + X0
Dn = X3 + X0
Compressing data with this Transform?
Exploit redundancy in the data:
• Redundancy transformed to values, Xi.
• Compress the data by using fewer bits to represent the
differences.
– I.e if we use 8 bits per pixel then the 2x2 block uses 32
bits
– If we keep 8 bits for the base pixel, X0,
– Assign 4 bits for each difference, then we only use 20 bits.
– That is an average of 5 bits/pixel — better than the original 8.

Example
Consider the following 2x2 image block:

120 130
125 120

then we get:

X0 = 120
X1 = 10
X2 = 5
X3 = 0
We can then compress these values by taking fewer bits to
represent the data.
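Not part of the original notes — the same 2x2 forward/inverse transform
as a couple of Python functions, applied to the block above:

def forward_2x2(A, B, C, D):
    # Base pixel plus differences
    return A, B - A, C - A, D - A

def inverse_2x2(X0, X1, X2, X3):
    return X0, X1 + X0, X2 + X0, X3 + X0

X = forward_2x2(120, 130, 125, 120)
print(X)                 # (120, 10, 5, 0)
print(inverse_2x2(*X))   # (120, 130, 125, 120)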
Inadequacies of Simple Scheme
• It is Too Simple
• Needs to operate on larger blocks (typically 8x8 min)
• Simple encoding of differences for large values will result in
loss of information
– Very poor losses possible here: 4 bits per pixel = values 0-15
unsigned,
– Signed value range: −7 to 7, so either quantise in multiples
of 255/max value or get massive overflow!!
• More advanced transform encoding techniques are very
common – DCT
Frequency Domain Methods

Frequency domains can be obtained through the


transformation from one (Time or Spatial) domain to the other
(Frequency) via

• Discrete Cosine Transform (DCT)— Heart of JPEG and


MPEG Video, (alt.) MPEG Audio.
• Fourier Transform (FT) — MPEG Audio

1D Example
Lets consider a 1D (e.g. Audio) example to see what the different
domains mean:

Consider a complicated sound such as the noise of a car

horn. We can describe this sound in two related ways:


• Sample the amplitude of the sound many times a second,
which gives an approximation to the sound as a function of
time.
• Analyse the sound in terms of the pitches of the notes, or
frequencies, which make the sound up, recording the
amplitude of each frequency.
An 8 Hz Sine Wave
In the example (next slide):
• A signal that consists of a sinusoidal wave at 8 Hz.
• 8 Hz means that the wave completes 8 cycles in 1 second

• The frequency of that wave is 8 Hz.


• From the frequency domain we can see that the composition
of our signal is
– one wave (one peak) occurring with a frequency of 8 Hz
– with a magnitude/fraction of 1.0 i.e. it is the whole signal.

An 8 Hz Sine Wave (Cont.)

(Figure: the 8 Hz sine wave in the time domain and its single-peak
frequency spectrum)
2D Image Example
Now images are no more complex really:
• Brightness along a line can be recorded as a set of values
measured at equally spaced distances apart,
• Or equivalently, at a set of spatial frequency values.
• Each of these frequency values is a frequency component.
• An image is a 2D array of pixel measurements.
• We form a 2D grid of spatial frequencies.
• A given frequency component now specifies what contribution
is made by data which is changing with specified x and y
direction spatial frequencies.
What do frequencies mean in an image?
• Large values at high frequency components then the data
is changing rapidly on a short distance scale.
e.g. a page of text
• Large low frequency components then the large scale features
of the picture are more important.
e.g. a single fairly simple object which occupies most of the
image.

So How to Compress (colour) Images?
• The 2D matrix of the frequency content is with regard to
colour/chrominance:
• This shows if values are changing rapidly or slowly.
• Where the fraction, or value in the frequency matrix is low,
the colour is changing gradually.
• Human eye is insensitive to gradual changes in colour and
sensitive to intensity.
• Ignore gradual changes in colour SO
• Basic Idea: Attempt to throw away data without the human
eye noticing, we hope.
How can the Frequency Domain Transforms Help to Compress?
Any function (signal) can be decomposed into purely sinusoidal
components (sine waves of different size/shape) which when
added together make up our original signal.

Figure 39: DFT of a Square Wave
Thus Transforming a signal into the frequency domain allows
us
• To see what sine waves make up our underlying signal
• E.g.
– One part sinusoidal wave at 50 Hz and
– Second part sinusoidal wave at 200 Hz.
More complex signals will give more complex graphs but the
idea is exactly the same. The graph of the frequency domain is
called the frequency spectrum.

Visualising this: Think Graphic Equaliser

An easy way to visualise what is happening is to think of a


graphic equaliser on a stereo.

Figure 40: A Graphic Equaliser

Fourier Theory

The tool which converts a spatial (real space) description of


an image into one in terms of its frequency components is called
the Fourier transform.

The new version is usually referred to as the Fourier space


description of the image.

The corresponding inverse transformation which turns a Fourier


space description back into a real space one is called the
inverse Fourier transform.
1D Case
Considering a continuous function f (x) of a single variable x
representing distance.

The Fourier transform of that function is denoted F(u), where

u represents spatial frequency is defined by


F(u) = ∫_{−∞}^{∞} f(x) e^{−2πixu} dx.    (1)

Note: In general F (u) will be a complex quantity even though


the original data is purely real.

The meaning of this is that not only is the magnitude of each
frequency present important, but that its phase relationship is
too.
Inverse 1D Fourier Transform
The inverse Fourier transform for regenerating f (x) from F (u) is
given by

f(x) = ∫_{−∞}^{∞} F(u) e^{2πixu} du,    (2)

which is rather similar, except that the exponential term has


the opposite sign.

Example Fourier Transform
Let’s see how we compute a Fourier Transform: consider a
particular function f (x) defined as

f(x) = 1 if |x| ≤ 1,
       0 otherwise.    (3)

Figure 41: A top hat function
So its Fourier transform is:
F(u) = ∫_{−∞}^{∞} f(x) e^{−2πixu} dx
     = ∫_{−1}^{1} 1 × e^{−2πixu} dx
     = −1/(2πiu) · (e^{−2πiu} − e^{2πiu})
     = sin(2πu) / (πu).    (4)
In this case F (u) is purely real, which is a consequence of the
original data being symmetric in x and −x.

A graph of F (u) is shown overleaf.


This function is often referred to as the Sinc function.
The Sinc Function


Figure 42: Fourier transform of a top hat function
2D Case
If f (x, y) is a function, for example the brightness in an image,
its Fourier transform is given by
F(u, v) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) e^{−2πi(xu+yv)} dx dy,    (5)

and the inverse transform, as might be expected, is

f(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} F(u, v) e^{2πi(xu+yv)} du dv.    (6)

Images are digitised !!
Thus, we need a discrete formulation of the Fourier transform,
which takes such regularly spaced data values, and returns the
value of the Fourier transform for a set of values in frequency
space which are equally spaced.

This is done quite naturally by replacing the integral by a


summation, to give the discrete Fourier transform or DFT for
short.

In 1D it is convenient now to assume that x goes up in steps
of 1, and that there are N samples, at values of x from 0 to N − 1.
1D Discrete Fourier transform
So the DFT takes the form
F(u) = (1/N) Σ_{x=0}^{N−1} f(x) e^{−2πixu/N},    (7)

while the inverse DFT is

f(x) = Σ_{u=0}^{N−1} F(u) e^{2πixu/N}.    (8)

NOTE: Minor changes from the continuous case are a factor


of 1/N in the exponential terms, and also the factor 1/N in front
of the forward transform which does not appear in the inverse
transform.
2D Discrete Fourier transform
The 2D DFT works similarly. So for an N × M grid in x and y
we have
F(u, v) = (1/NM) Σ_{x=0}^{N−1} Σ_{y=0}^{M−1} f(x, y) e^{−2πi(xu/N + yv/M)},    (9)

and

f(x, y) = Σ_{u=0}^{N−1} Σ_{v=0}^{M−1} F(u, v) e^{2πi(xu/N + yv/M)}.    (10)

Balancing the 2D DFT
Often N = M, and then it is more convenient to redefine
F (u, v) by multiplying it by a factor of N , so that the forward and
inverse transforms are more symmetrical:
F(u, v) = (1/N) Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} f(x, y) e^{−2πi(xu+yv)/N},    (11)

and

f(x, y) = (1/N) Σ_{u=0}^{N−1} Σ_{v=0}^{N−1} F(u, v) e^{2πi(xu+yv)/N}.    (12)

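Not part of the original notes — a quick numerical check of the DFT using
NumPy, revisiting the earlier 8 Hz sine wave example; the spectrum shows
a single peak at 8 Hz:

import numpy as np

fs = 64                                   # sample 1 second at 64 samples/sec
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 8 * t)        # an 8 Hz sine wave

spectrum = np.fft.fft(signal) / fs        # forward DFT with the 1/N factor
magnitude = np.abs(spectrum[: fs // 2])   # non-negative frequencies only

print(np.argmax(magnitude))               # -> 8: the single 8 Hz peak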
Compression
How do we achieve compression?
• Low pass filter — ignore high frequency noise components
• Only store lower frequency components

• High Pass Filter — Spot Gradual Changes


• If changes are too low the eye does not respond, so ignore?

Where do we put the threshold to cut off?

Relationship between DCT and FFT
DCT (Discrete Cosine Transform) is actually a cut-down version
of the FFT:
• Only the real part of FFT
• Computationally simpler than FFT
• DCT — Effective for Multimedia Compression
• DCT MUCH more commonly used in Multimedia.

The Discrete Cosine Transform (DCT)
• Similar to the discrete Fourier transform:
– it transforms a signal or image from the spatial domain to
the frequency domain
– DCT can approximate lines well with fewer coefficients

Figure 43: DCT Encoding

• Helps separate the image into parts (or spectral sub-bands)
of differing importance (with respect to the image's visual
quality).
1D DCT
For N data items 1D DCT is defined by:
F(u) = (2/N)^{1/2} Σ_{i=0}^{N−1} Λ(i) · cos[ πu(2i + 1) / 2N ] · f(i)

and the corresponding inverse 1D DCT transform is simply F^{−1}(u), i.e.:

f(i) = F^{−1}(u)
     = (2/N)^{1/2} Σ_{u=0}^{N−1} Λ(u) · cos[ πu(2i + 1) / 2N ] · F(u)

where

Λ(ξ) = 1/√2 for ξ = 0; 1 otherwise
2D DCT
For a 2D N by M image the 2D DCT is defined as:

F(u, v) = (2/N)^{1/2} (2/M)^{1/2} Σ_{i=0}^{N−1} Σ_{j=0}^{M−1} Λ(i)·Λ(j) ·
          cos[ πu(2i + 1) / 2N ] · cos[ πv(2j + 1) / 2M ] · f(i, j)

and the corresponding inverse 2D DCT transform is simply F^{−1}(u, v), i.e.:

f(i, j) = F^{−1}(u, v)
        = (2/N)^{1/2} (2/M)^{1/2} Σ_{u=0}^{N−1} Σ_{v=0}^{M−1} Λ(u)·Λ(v) ·
          cos[ πu(2i + 1) / 2N ] · cos[ πv(2j + 1) / 2M ] · F(u, v)

where

Λ(ξ) = 1/√2 for ξ = 0; 1 otherwise
Performing DCT Computations
The basic operation of the DCT is as follows:
• The input image is N by M;
• f(i,j) is the intensity of the pixel in row i and column j;

• F(u,v) is the DCT coefficient in row u and column v of the


DCT matrix.
• The DCT input is an 8 by 8 array of integers.
This array contains each image window’s gray scale pixel
levels;
• 8 bit pixels have levels from 0 to 255.
Compression with DCT
• For most images, much of the signal energy lies at low
frequencies;
– These appear in the upper left corner of the DCT.
• Compression is achieved since the lower right values
represent higher frequencies, and are often small
– Small enough to be neglected with little visible distortion.

Computational Issues (1)
• Image is partitioned into 8 x 8 regions — The DCT input is
an 8 x 8 array of integers.
• An 8 point DCT would be:

F(u, v) = (1/4) Σ_{i,j=0}^{7} Λ(i)·Λ(j) · cos[ πu(2i + 1) / 16 ] ·
          cos[ πv(2j + 1) / 16 ] · f(i, j)

where

Λ(ξ) = 1/√2 for ξ = 0; 1 otherwise
• The output array of DCT coefficients contains integers; these can
range from -1024 to 1023.
Computational Issues (2)
• Computationally easier to implement and more efficient to
regard the DCT as a set of basis functions
– Given a known input array size (8 x 8) can be precomputed
and stored.
– Computing values for a convolution mask (8 x 8 window)
that gets applied:
∗ Sum the products of window values and the pixels the window
overlaps; apply the window across all rows/columns of the image.

Computational Issues (3)
Visualisation of DCT basis functions


Figure 44: The 64 (8 x 8) DCT basis functions
Computational Issues (4)

• Factoring reduces problem to a series of 1D DCTs


(No need to apply 2D form directly):
– apply 1D DCT (Vertically) to Columns
– apply 1D DCT (Horizontally) to resultant
Vertical DCT above.
– or alternatively Horizontal to Vertical.

Figure 45: 2x1D Factored 2D DCT Computation
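Not part of the original notes — a sketch of the factored computation
using SciPy's 1D DCT (columns first, then rows):

import numpy as np
from scipy.fft import dct

def dct2(block):
    # 2D DCT-II built from two 1D passes, as in Figure 45
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

block = np.arange(64, dtype=float).reshape(8, 8)   # toy 8x8 block
coeffs = dct2(block)
print(np.round(coeffs[:2, :2], 1))   # energy gathers in the top-left (low frequencies)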
Computational Issues (5)

• The equations are given by:


G(i, v) = (1/2) Σ_{j=0}^{7} Λ(v) · cos[ πv(2j + 1) / 16 ] · f(i, j)

F(u, v) = (1/2) Σ_{i=0}^{7} Λ(u) · cos[ πu(2i + 1) / 16 ] · G(i, v)

• Most software implementations use fixed point arithmetic.


Some fast implementations approximate coefficients so all
multiplies are shifts and adds.
Differential Encoding
The simple transform coding example mentioned earlier is an
instance of this approach.
Here:
• The difference between the actual value of a sample and a
prediction of that value is encoded.
• Also known as predictive encoding.
• Example of technique include: differential pulse code
modulation, delta modulation and adaptive pulse code
modulation — differ in prediction part.
• Suitable where successive signal samples do not differ much,
but are not zero. E.g. Video — difference between frames,
some audio signals.
Differential Encoding Methods
• Differential pulse code modulation (DPCM)

Simple prediction (also used in JPEG):



fpredict(ti) = factual (ti−1)


I.e. a simple Markov model where the current value is the
prediction of the next value.
So we simply need to encode:

∆f (ti) = factual (ti) − factual (ti−1)


If successive samples are close to each other we only need
to encode the first sample with a large number of bits:
Simple Differential Pulse Code Modulation Example

Actual Data: 9 10 7 6

Predicted Data: 0 9 10 7

∆f (t): +9, +1, -3, -1.

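Not part of the original notes — the DPCM example above as two small
Python functions (predictor: the previous actual value, with an initial
prediction of 0):

def dpcm_encode(samples):
    prev, diffs = 0, []
    for s in samples:
        diffs.append(s - prev)   # encode difference from prediction
        prev = s
    return diffs

def dpcm_decode(diffs):
    prev, samples = 0, []
    for d in diffs:
        prev += d
        samples.append(prev)
    return samples

print(dpcm_encode([9, 10, 7, 6]))    # -> [9, 1, -3, -1]
print(dpcm_decode([9, 1, -3, -1]))   # -> [9, 10, 7, 6]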
Differential Encoding Methods (Cont.)
• Delta modulation is a special case of DPCM:
– Same predictor function,
– Coding error is a single bit or digit that indicates the
current sample should be increased or decreased by a
step.
– Not Suitable for rapidly changing signals.
• Adaptive pulse code modulation

Fuller Temporal/Markov model:


– Data is extracted from a function of a series of previous
values
– E.g. Average of last n samples.
– Characteristics of sample better preserved.
Vector Quantisation
The basic outline of this approach is:
• Data stream divided into (1D or 2D square) blocks — vectors
• A table or code book is used to find a pattern for each block.

• Code book can be dynamically constructed or predefined.


• Each pattern for a block is encoded as a lookup value in the table
• Compression achieved as data is effectively subsampled and
coded at this level.

Compression II: Images (JPEG)
What is JPEG?
• JPEG: Joint Photographic Expert Group — an international
standard in 1992.
• Works with colour and greyscale images
• Up to 24 bit colour images (unlike GIF)
• Targets photographic quality images (unlike GIF)
• Suitable for many applications e.g., satellite, medical, general
photography...
Basic JPEG Compression Pipeline
JPEG compression involves the following:
• Encoding


Figure 46: JPEG Encoding

• Decoding – Reverse the order for encoding
Major Coding Algorithms in JPEG
The Major Steps in JPEG Coding involve:
• Colour Space Transform and subsampling (YIQ)
• DCT (Discrete Cosine Transformation)

• Quantization
• Zigzag Scan
• DPCM on DC component
• RLE on AC Components
• Entropy Coding — Huffman or Arithmetic

We have met most of the algorithms already:
• JPEG exploits them in the compression pipeline to achieve
maximal overall compression.
Quantization
Why do we need to quantise:
• To throw out bits from DCT.
• Example: 101101 = 45 (6 bits).
Truncate to 4 bits: 1011 = 11.
Truncate to 3 bits: 101 = 5.
• Quantization error is the main source of Lossy Compression.
• DCT itself not Lossy
• How we throw away bits in Quantization Step is Lossy

Uniform quantization

• Divide by constant N and round result


(N = 4 or 8 in examples above).
• Non powers-of-two gives fine control
(e.g., N = 6 loses 2.5 bits)

Quantization Tables
• In JPEG, each F[u,v] is divided by a constant q(u,v).
• Table of q(u,v) is called quantization table.
• Eye is most sensitive to low frequencies (upper left corner),
less sensitive to high frequencies (lower right corner)
• Standard defines 2 default quantization tables, one for
luminance (below), one for chrominance.
----------------------------------
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99
----------------------------------
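Not part of the original notes — a toy sketch of the quantization step
using the first row of the luminance table above; the DCT coefficients
are made up for illustration:

import numpy as np

q_row = np.array([16, 11, 10, 16, 24, 40, 51, 61], dtype=float)
f_row = np.array([200, -98, 55, -17, 8, 4, -2, 1], dtype=float)  # made-up F[u,v] values

quantised = np.round(f_row / q_row)   # F[u,v] / q(u,v), rounded — the lossy step
restored = quantised * q_row          # what the decoder recovers
print(quantised)                      # small high-frequency values collapse to 0
print(restored)                       # differs from f_row: quantization error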
Quantization Tables (Cont)

• Q: How would changing the numbers affect the picture?

E.g., if we doubled them all?

Quality factor in most implementations is the scaling factor for


default quantization tables.
• Custom quantization tables can be put in image/scan header.

JPEG Quantisation Examples

• JPEG Quantisation Example (Java Applet)
Zig-zag Scan
What is the purpose of the Zig-zag Scan:
• to group low frequency coefficients in top of vector.
• Maps 8 x 8 to a 1 x 64 vector

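Not part of the original notes — a small Python sketch generating the
zig-zag visiting order for an 8x8 block:

def zigzag_order(n=8):
    # Walk the anti-diagonals (constant i + j), alternating direction
    order = []
    for d in range(2 * n - 1):
        cells = [(i, d - i) for i in range(n) if 0 <= d - i < n]
        order.extend(cells if d % 2 else reversed(cells))
    return order

print(zigzag_order()[:6])
# [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]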
Differential Pulse Code Modulation (DPCM) on DC component

• Another encoding method is employed


• DPCM on the DC component at least.

• Why is this strategy adopted:


– DC component is large and varied, but often close to
previous value (like lossless JPEG).
– Encode the difference from previous 8x8 blocks – DPCM

Run Length Encode (RLE) on AC components
Yet another simple compression technique is applied to the AC
component:
• 1x64 vector has lots of zeros in it
447
• Encode as (skip, value) pairs, where skip is the number of
zeros and value is the next non-zero component.
• Send (0,0) as end-of-block sentinel value.

Entropy Coding
DC and AC components finally need to be represented by a
smaller number of bits:
• Categorize DC values into SSS (number of bits needed to
represent) and actual bits.
--------------------
Value SSS
0 0
-1,1 1
-3,-2,2,3 2
-7..-4,4..7 3
--------------------

• Example: if DC value is 4, 3 bits are needed.


Send off SSS as a Huffman symbol, followed by the actual 3 bits.
• For AC components (skip, value), encode the composite symbol
(skip, SSS) using Huffman coding.
• Huffman Tables can be custom (sent in header) or default.
Example JPEG Compression


Another Enumerated Example


JPEG 2000
• New version released in 2002.
• Based on:
– discrete wavelet transform (DWT), instead of DCT,
– scalar quantization,
– context modeling,
– arithmetic coding,
– post-compression rate allocation.
• Application: variety of uses, ranging from digital photography
to medical imaging to advanced digital scanning and printing.
• Higher compression efficiency — visually lossless
compression at 1 bit per pixel or better.
Further Information
Basic JPEG Information:
• http://www.jpeg.org
• Online JPEG Tutorial

For more information on the JPEG 2000 standard for still image
coding, refer to
http://www.jpeg.org/JPEG2000.htm

Compression III:
Video Compression (MPEG and others)
We need to compress video (more so than audio/images) in
practice since:
1. Uncompressed video (and audio) data are huge.
In HDTV, the bit rate easily exceeds 1 Gbps. — big problems
for storage and network communications.
E.g. HDTV: 1920 x 1080 at 30 frames per second, 8 bits per
RGB (YCrCb actually) channel = 1.5 Gbps.
2. Lossy methods have to be employed since the compression ratio
of lossless methods (e.g., Huffman, Arithmetic, LZW) is not
high enough for image and video compression, especially
when the distribution of pixel values is relatively flat.
Not the complete picture studied here

Much more to MPEG — Plenty of other tricks employed.

We only concentrate on some basic principles of video

compression:
• Earlier H.261 and MPEG 1 and 2 standards.

Compression Standards(1)
Image, Video and Audio Compression standards have been
specified and released by two main groups since 1985:
ISO - International Standards Organisation: JPEG, MEPG.
455
ITU - International Telecommunications Union: H.261 — 264.

Compression Standards (2)
Whilst in many cases one of the groups have specified separate
standards there is some crossover between the groups.

For example:

• JPEG issued by ISO in 1989 (but adopted by ITU as ITU T.81)


• MPEG 1 released by ISO in 1991,
• H.261 released by ITU in 1993 (based on CCITT 1990 draft).
CCITT stands for Comité Consultatif International Téléphonique et
Télégraphique whose parent company is ITU.
• H.262 is alternatively better known as MPEG-2 released in 1994.
• H.263 released in 1996 extended as H.263+, H.263++.
• MPEG 4 release in 1998.
• H.264 released in 2002 for DVD quality and is now part of MPEG 4 (Part 10).
Quicktime 6 supports this.
How to compress video?
Basic Idea of Video Compression:
Motion Estimation/Compensation
• Spatial Redundancy Removal – Intraframe coding (JPEG)
NOT ENOUGH BY ITSELF?
• Temporal — Greater compression by noting the temporal
coherence/incoherence over frames. Essentially we note the
difference between frames.
• Spatial and Temporal Redundancy Removal – Intraframe and
Interframe coding (H.261, MPEG)

Simple Motion Estimation/Compensation Example

Things are much more complex in practice of course.


Which Format to represent the compressed data?


• Simply based on Differential Pulse Code Modulation (DPCM).

Simple Motion Example (Cont.)
Consider a simple image (block) of a moving circle.

Lets just consider the difference between 2 frames.


It is simple to encode/decode:

Now lets Estimate Motion of blocks
We will examine methods of estimating motion vectors in due
course.


Figure 47: Motion estimation/compensation (encoding)
Decoding Motion of blocks


Figure 48: Motion estimation/compensation (decoding)

Why is this a better method than just frame differencing?
How is this used in Video Compression Standards?
Block Matching:
• MPEG-1/H.261 is done by using block matching techniques,
For a certain area of pixels in a picture:

• find a good estimate of this area in a previous (or in a future)


frame, within a specified search area.
Motion compensation:
• uses the motion vectors to compensate the picture.
• parts of a previous (or future) picture can be reused in a
subsequent picture.
• individual parts spatially compressed
Any Overheads?
• Motion estimation/compensation techniques reduces the
video bitrate significantly
but
• introduces extra computational complexity and delay,
– need to buffer reference pictures - backward and forward
referencing.
– reconstruct from motion parameters
Let's see how such ideas are used in practice.

H.261 Compression
The basic approach to H. 261 Compression is summarised as
follows:
H. 261 Compression has been specifically designed for video
telecommunication applications:

• Developed by CCITT in 1988-1990


• Meant for videoconferencing, videotelephone applications
over ISDN telephone lines.
• Baseline ISDN is 64 kbits/sec, and integral multiples (px64)

Overview of H.261
• Frame types are CCIR 601 CIF (352x288) and
QCIF (176x144) images with 4:2:0 subsampling.
• Two frame types:
Intraframes (I-frames) and Interframes (P-frames)
• I-frames use basically JPEG — but YUV (YCrCb) and larger DCT windows,
different quantisation
• I-frames provide us with a (re)fresh access point — Key Frames
• P-frames use pseudo-differences from previous frame (predicted), so frames
depend on each other.

Intra Frame Coding
• Various lossless and lossy compression techniques are used
• Compression contained only within the current frame
• Simpler coding – Not enough by itself for high compression.

• However, can't rely on inter frame differences across a large


number of frames
– So when Errors get too large: Start a new I-Frame

Intraframe coding is very similar to that of a JPEG still image
video encoder:


This is a basic Intra Frame Coding Scheme is as follows:
• Macroblocks are typically 16x16 pixel areas on Y plane of
original image.
• A macroblock usually consists of 4 Y blocks, 1 Cr block, and
1 Cb block. (4:2:0 chroma subsampling)
– Eye most sensitive luminance, less sensitive chrominance.
– So operate on an effective color space: YUV (YCbCr)
colour which we have met.
– Typical to use 4:2:0 macroblocks: one quarter of the
chrominance information used.
• Quantization is by constant value for all DCT coefficients.
I.e., no quantization table as in JPEG.
The Macroblock is coded as follows:

• Many macroblocks will be exact matches (or close enough).


So send address of each block in image –> Addr
• Sometimes no good match can be found, so send INTRA
block –> Type
• Will want to vary the quantization to fine tune compression,
so send quantization value –> Quant
• Motion vector –> vector
• Some blocks in macroblock will match well, others match
poorly. So send a bitmask indicating which blocks are present
(Coded Block Pattern, or CBP).
• Send the blocks (4 Y, 1 Cr, 1 Cb) as in JPEG.
Inter-frame (P-frame) Coding
• Intra frame limited spatial basis relative to 1 frame
• Considerably more compression if the inherent temporal basis
is exploited as well.
BASIC IDEA:
• Most consecutive frames within a sequence are very similar
to the frames both before (and after) the frame of interest.
• Aim to exploit this redundancy.
• Use a technique known as block-based motion compensated
prediction
• Need to use motion estimation
• Coding needs extensions for Inter but the encoder can also
support an Intra subset.

Figure 49: P-Frame Coding
Forward Prediction Basics:
• Start with I frame (spatially with no reference to any other
frame)
• Predict a future P frame(s) in a forward time manner.
• As an example, Predict future 6 frame sequence:
I,P,P,P,P,P,I,P,P,P,P,

P-coding can be summarised as follows:


A Coding Example (P-frame)
• Previous image is called reference image.
• Image to code is called target image.
• Actually, the difference is encoded.

• Subtle points:
1. Need to use decoded image as reference image,
not original. Why?
2. We’re using ”Mean Absolute Difference” (MAD) to decide
best block.
Can also use ”Mean Squared Error” (MSE) = sum(E*E)

Hard Problems in H.261
There are however a few difficult problems in H.261:
• Motion vector search
• Propagation of Errors

• Bit-rate Control

Motion Vector Search


• C(x + k, y + i) – pixels in the macro block with upper left
corner (x, y) in the Target.
• R(x + i + k, y + j + l) – pixels in the macro block with upper
left corner (x + i, y + j) in the Reference.
• Cost function is:

Where MAE stands for Mean Absolute Error.


• Goal is to find a vector (u, v) such that MAE (u, v) is minimum
– Full Search Method
– Two-Dimensional Logarithmic Search
Hierarchical Motion Estimation:


1. Form several low resolution version of the target and reference


pictures
2. Find the best match motion vector in the lowest resolution version.
3. Modify the motion vector level by level when going up.
Propagation of Errors
• Send an I-frame every once in a while
• Make sure you use decoded frame for comparison


Bit-rate Control
• Simple feedback loop based on ”buffer fullness”
If buffer is too full, increase the quantization scale factor to
reduce the data.

MPEG Compression
MPEG stands for:
• Motion Picture Expert Group — established circa 1990 to
create standard for delivery of audio and video
• MPEG-1 (1991). Target: VHS quality on a CD-ROM (320 x
240 + CD audio @ 1.5 Mbits/sec)
• MPEG-2 (1994): Target Television Broadcast
• MPEG-3: HDTV, but subsumed into an extension of MPEG-2
• MPEG 4 (1998): Very Low Bitrate Audio-Visual Coding
• MPEG-7 (2001) ”Multimedia Content Description Interface”.
• MPEG-21 (2002) "Multimedia Framework"
Three Parts to MPEG
• The MPEG standard had three parts:
1. Video: based on H.261 and JPEG
2. Audio: based on MUSICAM technology
3. System: control interleaving of streams

MPEG Video
MPEG compression essentially attempts to overcome some
shortcomings of H.261 and JPEG:
• Recall H.261 dependencies:

• The Problem here is that many macroblocks need information
that is not in the reference frame.
• For example:


• The MPEG solution is to add a third frame type which is a
bidirectional frame, or B-frame
• B-frames search for macroblock in past and future frames.
• Typical pattern is IBBPBBPBB IBBPBBPBB IBBPBBPBB
Actual pattern is up to encoder, and need not be regular.

MPEG Video Layers (1)
MPEG video is broken up into a hierarchy of layers to help
• Error handling,
• Random search and editing, and

• Synchronization, for example with an audio bitstream.

MPEG Video Layers (2)
From the top level, the layers are
Video sequence layer — any self-contained bitstream.
For example a coded movie or advertisement.
Group of pictures – composed of 1 or more groups of intra (I)
frames and/or non-intra (P and/or B) pictures.
Picture layer — the picture itself,
Slice layer — the layer beneath the Picture layer.

Slice Layer
• Each slice: a contiguous sequence of raster ordered
macroblocks,
• Each macroblock ordered on row basis in typical video
applications
• Each macroblock is 16x16 arrays of
– luminance pixels, or
– picture data elements, with 2 8x8 arrays of associated
chrominance pixels.
• Macroblocks may be further divided into distinct 8x8 blocks,
for further processing such as transform coding.
Coding Layers in Macroblock
• Each of layers has its own unique 32 bit start code :
– 23 zero bits followed by a one, then followed by
– 8 bits for the actual start code.
– Start codes may have as many zero bits as desired
preceding them.

B-Frames
New from H.261
• MPEG uses forward/backward interpolated prediction.
• Frames are commonly referred to as bi-directional interpolated
prediction frames, or B frames for short.

Example I, P, and B frames
Consider a group of pictures that lasts for 6 frames:

• Given
I,B,P,B,P,B,I,B,P,B,P,B,

• I frames are coded spatially only (as before)


• P frames are forward predicted based on previous I and P frames(as before).
• B frames are coded based on a forward prediction from a previous I or P frame,
as well as a backward prediction from a succeeding I or P frame.
• Here: 1st B frame is predicted from the 1st I frame and 1st P frame.
• 2nd B frame is predicted from the 2nd and 3rd P frames.
• 3rd B frame is predicted from the 3rd P frame and the 1st I frame of the next
group of pictures.
Backward Prediction
Note: Backward prediction requires that the future frames that
are to be used for backward prediction be
• encoded and
• transmitted first,
• out of order.
This process is summarized in Figure 50.

Figure 50: B-Frame Encoding
Also NOTE:
• No defined limit to the number of consecutive B frames that
may be used in a group of pictures,
• Optimal number is application dependent.
• Most broadcast quality applications however, have tended
to use 2 consecutive B frames (I,B,B,P,B,B,P,) as the ideal
trade-off between compression efficiency and video quality.

Advantage of the usage of B frames
• Coding efficiency.
• Most B frames use less bits.
• Quality can also be improved in the case of moving objects
that reveal hidden areas within a video sequence.
• Better error propagation behaviour: since B frames are not used to
predict future frames, errors generated will not be propagated further
within the sequence.

Disadvantage:
• Frame reconstruction memory buffers within the encoder and
decoder must be doubled in size to accommodate the 2
anchor frames.
Motion Estimation
• The temporal prediction technique used in MPEG video is
based on motion estimation.

The basic premise:


• Consecutive video frames will be similar except for changes
induced by objects moving within the frames.
• Trivial case of zero motion between frames — no other
differences except noise, etc.,
• Easy for the encoder to predict the current frame as a duplicate
of the prediction frame.
• When there is motion in the images, the situation is not as
simple.
Example of a frame with 2 stick figures and a tree
The problem for motion estimation to solve is :
• How to adequately represent the changes, or differences,
between these two video frames.

Figure 51: Motion Estimation Example
Solution:
A comprehensive 2-dimensional spatial search is performed
for each luminance macroblock.
• Motion estimation is not applied directly to chrominance in
MPEG
• MPEG does not define how this search should be performed.
• A detail that the system designer can choose to implement
in one of many possible ways.
• Well known that a full, exhaustive search over a wide 2-D
area yields the best matching results in most cases, but at
extreme computational cost to the encoder.
• Motion estimation usually is the most computationally
expensive portion of the video encoder.

Figure 52: Motion Est. Macroblock Example

Motion Vectors, Matching Blocks
Figure 52 shows an example of a particular macroblock from
Frame 2 of Figure 51, relative to various macroblocks of Frame
1.

• The top frame has a bad match with the macroblock to be coded.
• The middle frame has a fair match, as there is some commonality between the
2 macroblocks.
• The bottom frame has the best match, with only a slight error between the 2
macroblocks.
• Because a relatively good match has been found, the encoder assigns motion
vectors to that macroblock,
• Each forward and backward predicted macroblock may contain 2 motion vectors,
• True bidirectionally predicted macroblocks will utilize 4 motion vectors.

Figure 53: Final Motion Estimation Prediction

Figure 53 shows how a potential predicted Frame 2 can be
generated from Frame 1 by using motion estimation.
• The predicted frame is subtracted from the desired frame,
• Leaving a (hopefully) less complicated residual error frame that can then be encoded much more efficiently than before motion estimation.
• The more accurate the motion is estimated and matched,
the more likely it will be that the residual error will approach
zero,
• And the coding efficiency will be highest.

Further coding efficiency
• Motion vectors tend to be highly correlated between macroblocks:
– The horizontal component is compared to the previously valid horizontal motion vector and
– Only the difference is coded.
– The same difference is calculated for the vertical component.
– The difference codes are then described with a variable length code for maximum compression efficiency (see the sketch below).

What happens if we find no acceptable match?

B/P blocks may not be what they appear to be!

If the encoder decides that no acceptable match exists then it has the option of
• Coding that particular macroblock as an intra macroblock,
• Even though it may be in a P or B frame.
• In this manner, high quality video is maintained at a slight cost to coding efficiency.
Estimating the Motion Vectors
The basic idea is to search for the macroblock (MB):
• Within a ±n x m pixel search window
• Work out the Sum of Absolute Differences (SAD) (or the Mean Absolute Error (MAE) for each window, but this is computationally more expensive)
• Choose the window where the SAD is a minimum.

SAD Computation
SAD is computed, for each candidate offset (i, j) with −n ≤ i ≤ +n and −m ≤ j ≤ +m, as:

SAD(i, j) = Σ_{k=0}^{N−1} Σ_{l=0}^{N−1} | C(x + k, y + l) − R(x + i + k, y + j + l) |

where
• N is the size of the macroblock window (typically 16 or 32 pixels),
• (x, y) is the position of the original macroblock C, and
• R is the reference region over which the SAD is computed.
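A minimal sketch of an exhaustive SAD search over the window, in Python with NumPy (the function names and array layout are illustrative assumptions; real encoders use heavily optimised versions of this loop):

import numpy as np

def sad(C, R, x, y, i, j, N):
    # SAD between the N x N macroblock of frame C at (x, y) and the
    # candidate block of reference frame R displaced by (i, j).
    block = C[y:y+N, x:x+N].astype(int)
    cand = R[y+j:y+j+N, x+i:x+i+N].astype(int)
    return int(np.abs(block - cand).sum())

def full_search(C, R, x, y, N=16, n=16, m=16):
    # Evaluate SAD at every displacement in the search window and
    # return (minimum SAD, motion vector).
    best = None
    for j in range(-m, m + 1):
        for i in range(-n, n + 1):
            # skip candidate blocks falling outside the reference frame
            if (x + i < 0 or y + j < 0 or
                    x + i + N > R.shape[1] or y + j + N > R.shape[0]):
                continue
            s = sad(C, R, x, y, i, j, N)
            if best is None or s < best[0]:
                best = (s, (i, j))
    return best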
Allowing for an Alpha Mask in SAD
It is sometimes applicable for an alpha mask to be applied to the SAD calculation, to mask out certain pixels:

SAD(i, j) = Σ_{k=0}^{N−1} Σ_{l=0}^{N−1} | C(x + k, y + l) − R(x + i + k, y + j + l) | · (alpha_C(k, l) ≠ 0)

(pixels whose alpha value is zero contribute nothing to the sum).

SAD Search Example
So for a ±2 pixel search area (given by the dashed lines) and a 2x2 macroblock window, the best-match SAD window is given by the bold dot-dash line (near the top right corner) in Figure 54.

Figure 54: SAD Window search Example
Selecting Intra/Inter Frame coding
Based upon the motion estimation a decision is made on whether INTRA or INTER coding is used.

To determine the INTRA/INTER mode we do the following calculation:

MB_mean = ( Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} | C(i, j) | ) / N²

A = Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} | C(i, j) − MB_mean | · (alpha_C(i, j) ≠ 0)

If A < (SAD − 2N), INTRA mode is chosen.
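Continuing the NumPy sketch above, the mode decision might look as follows (the threshold form SAD − 2N is taken from the slide; the omission of the alpha mask is a simplifying assumption):

def choose_coding_mode(C_block, min_sad, N=16):
    # Select INTRA vs INTER coding for one N x N macroblock.
    mb_mean = np.abs(C_block).mean()           # MB_mean
    A = int(np.abs(C_block - mb_mean).sum())   # spatial activity A
    return "INTRA" if A < (min_sad - 2 * N) else "INTER"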


Coding of Predicted Frames:Coding Residual Errors
• A predicted frame is subtracted from its reference and
• the residual error frame is generated,
• this information is spatially coded as in I frames,
– by coding 8x8 blocks with the DCT,
– DCT coefficient quantization,
– run-length/amplitude coding, and
– bitstream buffering with rate control feedback.
• The default quantization matrix for non-intra frames is a flat matrix
with a constant value of 16 for each of the 64 locations.
• The non-intra quantization step function contains a dead-zone
around zero that is not present in the intra version. This helps
eliminate any lone DCT coefficient quantization values that might JJ
reduce the run-length amplitude efficiency. II
• Finally, the motion vectors for the residual block information are J
calculated as differential values and are coded with a variable I
length code. Back
Close
Differences from H.261

• Larger gaps between I and P frames, so expand the motion vector search range.
• To get better encoding, allow motion vectors to be specified to a fraction of a pixel (1/2 pixels).
• Bitstream syntax must allow random access,
forward/backward play, etc.
• Added notion of slice for synchronization after loss/corrupt
data.
Differences from H.261 (Cont.)

• B frame macroblocks can specify two motion vectors (one to


past and one to future), indicating result is to be averaged.

MPEG-2, MPEG-3, and MPEG-4

• MPEG-2 target applications

----------------------------------------------------------------
Level      size          Pixels/sec  bit-rate    Application
                                     (Mbits/s)
----------------------------------------------------------------
Low        352 x 240      3 M         4          consumer tape equiv.
Main       720 x 480     10 M        15          studio TV
High 1440  1440 x 1152   47 M        60          consumer HDTV
High       1920 x 1080   63 M        80          film production
----------------------------------------------------------------

• MPEG-2 differences from MPEG-1


1. Search on fields, not just frames.
2. 4:2:2 and 4:4:4 macroblocks
3. Frame sizes as large as 16383 x 16383
4. Scalable modes: Temporal, Progressive, ...
5. Non-linear macroblock quantization factor
6. A bunch of minor fixes
MPEG-2, MPEG-3, and MPEG-4 (Cont.)
• MPEG-3: Originally for HDTV (1920 x 1080), got folded into
MPEG-2
• MPEG-4: very low bit-rate communication (4.8 to 64 kb/sec).

Compression IV:
Audio Compression (MPEG and others)
As with video a number of compression techniques have been
applied to audio.

Simple Audio Compression Methods

RECAP (Already Studied)

Traditional lossless compression methods (Huffman, LZW, etc.)


usually don’t work well on audio compression
• For the same reason as in image and video compression: too much variation in the data over a short time.
Some Simple But Limited Practical Methods

• Silence Compression - detect the ”silence”, similar to


run-length encoding (seen examples before)
• Differential Pulse Code Modulation (DPCM)
Relies on the fact that the difference in amplitude between successive samples is small, so we can use fewer bits to store the difference (seen examples before; a sketch follows).
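As a reminder, a minimal DPCM sketch in Python (purely illustrative; the bit packing of the differences is omitted):

def dpcm_encode(samples):
    # Store successive differences, which are typically small.
    prev, diffs = 0, []
    for s in samples:
        diffs.append(s - prev)
        prev = s
    return diffs

def dpcm_decode(diffs):
    # Rebuild the samples by accumulating the differences.
    prev, samples = 0, []
    for d in diffs:
        prev += d
        samples.append(prev)
    return samples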

Simple But Limited Practical Methods Continued ....

• Adaptive Differential Pulse Code Modulation (ADPCM)


e.g., in CCITT G.721 – 16 or 32 Kbits/sec.

(a) Encodes the difference between two consecutive signals


but a refinement on DPCM,

(b) Adapts the quantisation so fewer bits are used when the value is smaller.
– It is necessary to predict where the waveform is headed –> difficult
– Apple had a proprietary scheme called ACE/MACE: a lossy scheme that tries to predict where the wave will go in the next sample. About 2:1 compression.
Simple But Limited Practical Methods Continued ....

• Adaptive Predictive Coding (APC) typically used on Speech.


– Input signal is divided into fixed segments (windows)

– For each segment, some sample characteristics are


computed, e.g. pitch, period, loudness.
– These characteristics are used to predict the signal
– Computerised talking (Speech Synthesisers use such
methods) but low bandwidth:

acceptable quality at 8 kbits/sec


Simple But Limited Practical Methods Continued ....

• Linear Predictive Coding (LPC) fits signal to speech model


and then transmits the parameters of the model, as in APC.
– Speech model: pitch, period, loudness, vocal tract parameters (voiced and unvoiced sounds).
– Synthesised speech
– Still sounds like a computer talking,
– Bandwidth as low as 2.4 kbits/sec.
Simple But Limited Practical Methods Continued ....

• Code Excited Linear Predictor (CELP) does LPC, but also


transmits error term.
– Based on more sophisticated model of vocal tract than
LPC
– Better perceived speech quality
– Audio conferencing quality at 4.8 kbits/sec.

Psychoacoustics or Perceptual Coding
Basic Idea: Exploit areas where the human ear is less sensitive
to sound to achieve compression
E.g. MPEG audio
How do we hear sound?

Sound revisited
• Sound is produced by a vibrating source.
• The vibrations disturb air molecules
• Produce variations in air pressure:
lower than average pressure, rarefactions, and
higher than average, compressions.
This produces sound waves.
• When a sound wave impinges on a surface (e.g. eardrum or
microphone) it causes the surface to vibrate in sympathy:

• In this way acoustic energy is transferred from a source to a receptor.
Human Hearing
• Upon receiving the waveform, the eardrum vibrates in sympathy.
• Through a variety of mechanisms the acoustic energy is
transferred to nerve impulses that the brain interprets as
sound.
The ear can be regarded as being made up of 3 parts:
• The outer ear,
• The middle ear,
• The inner ear.
Human Ear
We consider:
• The function of the main parts of the ear
• How the transmission of sound is processed.

⇒ FLASH EAR DEMO (Lecture ONLY)


Click Here to Run Flash Ear Demo over the Web (Shockwave
Required)

The Outer Ear


• Ear Canal: Focuses the incoming audio.


• Eardrum (Tympanic Membrane):
– Interface between the external and middle ear.
– Sound is converted into mechanical vibrations via the middle ear.
– Sympathetic vibrations on the membrane of the eardrum.
The Middle Ear


• 3 small bones, the ossicles:


Malleus, Incus, and Stapes.
• Form a system of levers which are linked together and driven by the eardrum.
• The bones amplify the force of sound vibrations.
The Inner Ear


The Cochlea:

• Transforms mechanical ossicle forces into hydraulic pressure,


• The cochlea is filled with fluid.
• Hydraulic pressure imparts movement to the cochlear duct and to the organ of Corti.
• The cochlea is no bigger than the tip of a little finger!!
Semicircular canals
• The body's balance mechanism
• Thought to play no part in hearing.
How the Cochlea Works
• Pressure waves in the cochlea exert energy along a route
that begins at the oval window and ends abruptly at the
membrane-covered round window
• Pressure applied to the oval window is transmitted to all
parts of the cochlea.

Stereocilia
• Inner surface of the cochlea (the basilar membrane) is lined
with over 20,000 hair-like nerve cells — stereocilia,
• One of the most critical aspects of hearing.
Stereocilia Microscope Images

Hearing different frequencies
• Basilar membrane is tight at one end, looser at the other
• High tones create their greatest crests where the membrane
is tight,
• Low tones where the wall is slack.
• Causes resonant frequencies much like what happens in a
tight string.
• Stereocilia differ in length by minuscule amounts
• they also have different degrees of resiliency to the fluid
which passes over them.

Finally to nerve signals

• The compressional wave moves through the middle ear to the cochlea.


• Stereocilia will be set in motion.

• Each stereocilium is sensitive to a particular frequency.


• Stereocilia cell will resonate with a larger amplitude of
vibration.
• Increased vibrational amplitude induces the cell to release
an electrical impulse which passes along the auditory nerve
towards the brain.
In a process which is not clearly understood, the brain is capable of interpreting the qualities of the sound upon reception of these electric nerve impulses.
Sensitivity of the Ear
• Range is about 20 Hz to 20 kHz, most sensitive at 2 to 4 kHz.
• Dynamic range (quietest to loudest) is about 96 dB
• Approximate threshold of pain: 130 dB
• Hearing damage: > 90 dB (prolonged exposure)
• Normal conversation: 60-70 dB
• Typical classroom background noise: 20-30 dB
• Normal voice range is about 500 Hz to 2 kHz
– Low frequencies are vowels and bass
– High frequencies are consonants
Question: How sensitive is human hearing?
The sensitivity of the human ear with respect to frequency is
given by the following graph.


Frequency dependence is also level dependent!
Ear response is even more complicated.
Complex phenomenon to explain.
Illustration : Loudness Curves or Fletcher-Munson Curves:

What do the curves mean?


• Curves indicate perceived loudness is a function of both the


frequency and the level (sinusoidal sound signal)
• Equal loudness curves. Each contour:
– Equal loudness
– Express how much a sound level must be changed as the frequency varies, to maintain a certain perceived loudness.
Physiological Implications

Why are the curves accentuated where they are?


• The accentuated frequency range coincides with speech.
• Sounds like p and t have very important parts of their spectral
energy within the accentuated range
• This makes them easier to discriminate between.
The ability to hear sounds of the accentuated range (around
a few kHz) is thus vital for speech communication.

Traits of Human Hearing
Frequency Masking
• With multiple frequencies present in the audio, sensitivity changes with the relative amplitude of the signals.
• If the frequencies are close and the amplitude of one is less
than the other close frequency then the second frequency
may not be heard.

Critical Bands
• Range of closeness for frequency masking depends on the
frequencies and relative amplitudes.
• Each band where frequencies are masked is called the Critical Band.

• Critical bandwidth for average human hearing varies with


frequency:
– Constant 100 Hz for frequencies less than 500 Hz
– Increases (approximately) linearly by 100 Hz for each additional 500 Hz (see the sketch below).
• Width of critical band is called a bark.
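The slide's approximation of the bandwidth rule can be written directly as a small function (a sketch of this rule only, not a psychoacoustic standard):

def critical_bandwidth(f_hz):
    # Approximate critical bandwidth at centre frequency f_hz.
    if f_hz < 500.0:
        return 100.0                  # constant below 500 Hz
    # grows by roughly 100 Hz for each additional 500 Hz
    return 100.0 + 100.0 * (f_hz - 500.0) / 500.0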

What is the cause of Frequency Masking?
• The stereocilia are excited by air pressure variations,
transmitted via outer and middle ear.
• Different stereocilia respond to different ranges of
frequencies — the critical bands

Frequency masking occurs because, after excitation by one frequency, further excitation of the same group of cells by a weaker, similar frequency is not possible.
Example of frequency masking
• Example: Play a 1 kHz tone (masking tone) at a fixed level (60 dB). Play a test tone at a nearby frequency (e.g., 1.1 kHz), and raise its level until it is just distinguishable.
• Vary the frequency of the test tone and plot the threshold
when it becomes audible:

• If we repeat for various frequencies of masking tones we get:


Temporal masking
After the ear hears a loud sound:
• It takes a further short while before it can hear a quieter sound.
Why is this so?
• Stereocilia vibrate with corresponding force of input sound stimuli.
• If the stimuli is strong then stereocilia will be in a high state of
excitation and get fatigued.
• After extended listening to loud music or headphones this
sometimes manifests itself with ringing in the ears and even
temporary deafness.
• Prolonged exposure to noise permanently damages the Stereocilia.
Temporal Masking occurs because the hairs take time to settle after excitation before they can respond again.
Example of Temporal Masking
• Play 1 kHz masking tone at 60 dB, plus a test tone at 1.1 kHz
at 40 dB. Test tone can’t be heard (it’s masked).
Stop masking tone, then stop test tone after a short delay.
Adjust delay time to the shortest time that test tone can be
heard (e.g., 5 ms).
Repeat with different level of the test tone and plot:

Example of Temporal Masking (Cont.)

• Try other frequencies for test tone (masking tone duration


constant). Total effect of masking:

Summary: How to Exploit?
• If we have a loud tone at, say at 1 kHz, then nearby quieter
tones are masked.
• Best compared on critical band scale – range of masking is
about 1 critical band
• Two factors for masking – frequency masking and temporal
masking
• Question: How to use this for compression?

Two examples:
– MPEG Audio
– Dolby
How to compute?
We have met basic tools:
• Fourier and Discrete Cosine Transforms
• Work in frequency space

• (Critical) Band Pass Filtering — Visualise a graphic equaliser

MPEG Audio Compression

• Exploits the psychoacoustic models above.


• Frequency masking is always utilised

• More complex forms of MPEG also employ temporal masking.

Basic Frequency Filtering Bandpass

MPEG audio compression basically works by:


• Dividing the audio signal up into a set of frequency subbands
• Subbands approximate critical bands.
• Each band quantised according to the audibility of
quantisation noise.

Quantisation is the key to MPEG audio compression and is


the reason why it is lossy.

How good is MPEG compression?
Although (data) lossy
MPEG claims to be perceptually lossless:
• Human tests (part of standard development), expert listeners.
• 6:1 compression ratio: stereo 16-bit samples at 48 kHz compressed to 256 kbits/sec.
• Difficult, real world examples used.
• Under Optimal listening conditions no statistically
distinguishable difference between original and MPEG.
Basic MPEG: MPEG audio coders

• Set of standards for the use of video with sound.


• Compression methods or coders associated with audio

compression are called MPEG audio coders.


• MPEG allows for a variety of different coders to be employed.
• Difference in level of sophistication in applying perceptual
compression.
• Different layers for levels of sophistication.

An Advantage of MPEG approach

Complex psychoacoustic modelling only in coding phase


• Desirable for real time (Hardware or software)
decompression

• Essential for broadcast purposes.


• Decompression is independent of the psychoacoustic
models used
• Different models can be used
• If there is enough bandwidth no models at all.
Basic MPEG: MPEG Standards

Evolving standards for MPEG audio compression:


• MPEG-1 is by far the most prevalent.
• The so-called mp3 files we get off the Internet are members of the MPEG-1 family.
• Standards now extend to MPEG-4 (structured audio) — Previous Lecture.

For now we concentrate on MPEG-1

Basic MPEG: MPEG Facts
• MPEG-1: 1.5 Mbits/sec for audio and video
About 1.2 Mbits/sec for video, 0.3 Mbits/sec for audio
(Uncompressed CD audio is
44,100 samples/sec * 16 bits/sample * 2 channels > 1.4 Mbits/sec)

• Compression factor ranging from 2.7 to 24.


• MPEG audio supports sampling frequencies of 32, 44.1 and 48
kHz.
• Supports one or two audio channels in one of the four modes:
1. Monophonic – single audio channel
2. Dual-monophonic – two independent channels
(functionally identical to stereo)
3. Stereo – for stereo channels that share bits, but not using joint-stereo coding
4. Joint-stereo – takes advantage of the correlations between stereo channels
Basic MPEG-1 Compression algorithm (1)
Basic encoding algorithm summarised below:


Basic MPEG-1 Compression algorithm (2)
The main stages of the algorithm are:
• The audio signal is first sampled and quantised using PCM
– Application dependent: Sample rate and number of bits
• The PCM samples are then divided up into a number of frequency subbands, and subband scaling factors are computed:

Basic MPEG-1 Compression algorithm (3)
Analysis filters
• Also called critical-band filters
• Break signal up into equal width subbands
• Use fast Fourier transform (FFT) (or discrete cosine
transform (DCT))
• Filters divide audio signal into frequency subbands that
approximate the 32 critical bands
• Each band is known as a sub-band sample.
• Example: a 16 kHz signal bandwidth and a 32 kHz sampling rate give each subband a bandwidth of 500 Hz.
• The time duration of each sampled segment of the input signal is the time to accumulate 12 successive sets of 32 PCM (subband) samples, i.e. 32*12 = 384 samples.
Basic MPEG-1 Compression algorithm (4)

Analysis filters (cont)


• In addition to filtering the input, analysis banks determine
– Maximum amplitude of 12 subband samples in each
subband.
– Each known as the scaling factor of the subband.
– Passed to psychoacoustic model and quantiser blocks
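A sketch of the scaling factor computation (the data layout, 32 subbands of 12 samples each, follows the description above; the function name is illustrative):

def scaling_factors(subband_samples):
    # subband_samples: 32 lists of 12 subband samples.
    # The scaling factor of each subband is the maximum absolute
    # amplitude among its 12 samples.
    return [max(abs(s) for s in band) for band in subband_samples]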

Basic MPEG-1 Compression algorithm (5)

Psychoacoustic modeller:
• Frequency Masking and may employ temporal masking.
• Performed concurrently with filtering and analysis operations.
• Determine amount of masking for each band caused by nearby
bands.
• Input: set hearing thresholds and subband masking
properties (model dependent) and scaling factors (above).

Basic MPEG-1 Compression algorithm (6)

Psychoacoustic modeller (cont):


• Output: a set of signal-to-mask ratios:
– Indicate those frequencies components whose amplitude
is below the audio threshold.
– If the power in a band is below the masking threshold,
don’t encode it.
– Otherwise, determine number of bits (from scaling
factors) needed to represent the coefficient such that noise
introduced by quantisation is below the masking effect
(Recall that 1 bit of quantisation introduces about 6 dB of noise).
Basic MPEG-1 Compression algorithm (7)

Example of Quantisation:
• Assume that after analysis, the levels of the first 16 of the 32 bands are these:

----------------------------------------------------------------------
Band        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Level (dB)  0  8 12 10  6  2 10 60 35 20 15  2  3  5  3  1
----------------------------------------------------------------------

• If the level of the 8th band is 60 dB,


then assume (according to model adopted) it gives a masking of
12 dB in the 7th band, 15 dB in the 9th.
Level in 7th band is 10 dB ( < 12 dB ), so ignore it.
Level in 9th band is 35 dB ( > 15 dB ), so send it.
–> Can encode with up to 2 bits (= 12 dB) of quantisation error.
MPEG-1 Output bitstream
The basic output stream for a basic MPEG encoder is as follows:


• Header: contains information such as the sample frequency


and quantisation.
• Subband sample (SBS) format: Quantised scaling factors
and 12 frequency components in each subband.
– Peak amplitude level in each subband quantised using 6
bits (64 levels)
– 12 frequency values quantised to 4 bits
• Ancillary data: Optional. Used, for example, to carry additional coded samples associated with a special broadcast format (e.g. surround sound)
Decoding the bitstream

• Dequantise the subband samples after demultiplexing the


coded bitstream into subbands.
• Synthesis bank decodes the dequantised subband samples
to produce PCM stream.
– This essentially involves applying the inverse Fourier transform (IFFT) to each substream and multiplexing the channels to give the PCM bit stream.

MPEG Layers

MPEG defines 3 processing layers for audio:


• Layer 1 is the basic mode,
• Layers 2 and 3 are more advanced (use temporal masking).
• Layer 3 is the most common form for audio files on the Web
– Our beloved MP3 files that record companies claim are bankrupting their industry.
– Strictly speaking these files should be called MPEG-1 Layer 3 files.
Each successive layer has:
• Increasing sophistication
• Greater compression ratios
• Greater computational expense
Layer 1

• Best suited for bit rates above 128 kbits/sec per channel.
• Example: Philips Digital Compact Cassette uses Layer 1 at 192 kbits/sec compression.
• Divides data into frames,
– Each of them contains 384 samples,
– 12 samples from each of the 32 filtered subbands as shown
above.
• Psychoacoustic model only uses frequency masking.
• Optional Cyclic Redundancy Code (CRC) error checking.
Layer 2
• Targeted at bit rates of around 128 kbits/sec per channel.
• Examples: Coding of Digital Audio Broadcasting (DAB) on
CD-ROM, CD-I and Video CD.
• An enhancement of Layer 1.

• Codes audio data in larger groups:


– Use three frames in filter:
before, current, next, a total of 1152 samples.
– This models a little bit of the temporal masking.
• Imposes some restrictions on bit allocation in middle and high
subbands.
• More compact coding of scale factors and quantised samples.
• Better audio quality: bits saved here can be used for quantised subband values.
Layer 3

• Targeted at bit rates of 64 kbits/sec per channel.


• Example: audio transmission of ISDN or suitable bandwidth network.
• Much more complex approach.

• Psychoacoustic model includes temporal masking effects,


• Takes into account stereo redundancy.
• Better critical band filter is used (non-equal frequencies)
• Uses a modified DCT (MDCT) for lossless subband transformation.
• Two different block lengths: 18 (long) or 6 (short)
• 50% overlap between successive transform windows gives window sizes of 36
or 12 — accounts for temporal masking
• Greater frequency resolution accounts for poorer time resolution
• Uses Huffman coding on quantised samples for better compression.
Comparison of MPEG Levels

--------------------------------------------------------------------
Layer    Target    Ratio   Quality @    Quality @    Theoretical
         bitrate           64 kbits     128 kbits    Min. Delay
--------------------------------------------------------------------
Layer 1  192 kbit   4:1    ---          ---          19 ms
Layer 2  128 kbit   6:1    2.1 to 2.6   4+           35 ms
Layer 3   64 kbit  12:1    3.6 to 3.8   4+           59 ms
--------------------------------------------------------------------

• 5 = perfect, 4 = just noticeable, 3 = slightly annoying,


2 = annoying, 1 = very annoying
• Real delay is about 3 times theoretical delay

Bit Allocation

• Process determines the number of code bits for each subband


• Based on information from the psychoacoustic model.

Bit Allocation For Layers 1 and 2

• Compute the mask-to-noise ratio (MNR) for all subbands:


MNR_dB = SNR_dB − SMR_dB

where
MNR_dB is the mask-to-noise ratio,
SNR_dB is the signal-to-noise ratio (SNR), and
SMR_dB is the signal-to-mask ratio from the psychoacoustic model.

• Standard tables estimate SNR for given quantiser levels.


• Designers are free to try other methods of SNR estimation.
Bit Allocation For Layers 1 and 2 (cont.)

Once MNR computed for all the subbands:


• Search for the subband with the lowest MNR
• Allocate code bits to that subband.
• When a subband gets allocated more code bits, the bit
allocation
– Unit looks up the new estimate for SNR
– Recomputes that subband’s MNR.
• The process repeats until no more code bits can be
allocated.
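A sketch of this greedy loop in Python (the SNR lookup table and the one-bit-at-a-time accounting are simplifying assumptions; a real encoder tracks the actual cost of each quantiser step):

def allocate_bits(smr, total_bits, snr_table):
    # smr: signal-to-mask ratio per subband (dB).
    # snr_table[k]: SNR (dB) achieved by quantiser level k.
    # Repeatedly give more bits to the subband with the lowest MNR.
    levels = [0] * len(smr)
    def mnr(band):
        return snr_table[levels[band]] - smr[band]
    while total_bits > 0:
        candidates = [b for b in range(len(smr))
                      if levels[b] + 1 < len(snr_table)]
        if not candidates:
            break                    # finest quantiser reached everywhere
        band = min(candidates, key=mnr)   # lowest MNR first
        levels[band] += 1
        total_bits -= 1              # simplified accounting
    return levels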
Bit Allocation For Layer 3

• Uses noise allocation, which employs Huffman coding.


• Iteratively varies the quantisers in an orderly way:

– Quantises the spectral values,


– Counts the number of Huffman code bits required to code
the audio data
– Calculates the resulting quantisation noise.
If there exist scale factor bands with more than the
allowed distortion:
• The encoder amplifies the values in those bands,
• which effectively decreases the quantiser step size for those bands.
Bit Allocation For Layer 3 (Cont.)

After this the process repeats. The process stops if any of


these three conditions is true:
• None of the scale factor bands have more than the allowed
distortion.
• The next iteration would cause the amplification for any of
the bands to exceed the maximum allowed value.
• The next iteration would require all the scale factor bands to
be amplified.
Real-time encoders include a time-limit exit condition for this process.
Stereo Redundancy Coding

Can we exploit redundancy between the two coupled stereo channels?

• Another perceptual property of the human auditory system:
• Simply stated, at low frequencies the human auditory system can't detect where the sound is coming from.
– So save bits and encode it mono.

Two types of stereo redundancy coding:


• Intensity stereo coding — all layers
• Middle/Side (MS) stereo coding — Layer 3 only.
Intensity stereo coding

Encoding:
• Code some upper-frequency subband outputs:
– Send a single summed signal instead of independent left and right channel codes for each of the 32 subband outputs.
Decoding:
• Reconstruct left and right channels
– Based only on a single summed signal
– Independent left and right channel scale factors.
With intensity stereo coding,
• The spectral shape of the left and right channels is the same within each intensity-coded subband
• But the magnitude is different.
Middle/Side (MS) stereo coding

• Encodes the left and right channel signals in certain frequency ranges:

– Middle — sum of left and right channels


– Side — difference of left and right channels.
• Encoder uses specially tuned threshold values to compress
the side channel signal further.
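A sketch of the middle/side transform for one sample pair (the 1/2 scaling is one common convention; the exact scaling and thresholds used by the standard differ):

def ms_encode(left, right):
    # Transform a left/right sample pair into middle and side.
    middle = (left + right) / 2.0   # sum signal
    side = (left - right) / 2.0     # difference signal, usually small
    return middle, side

def ms_decode(middle, side):
    # Recover the left and right samples.
    return middle + side, middle - side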

Further MPEG Audio Standards
MPEG-2 audio

Extension of MPEG-1:
• Completed in November 1994.
• Multichannel audio support:
– 5 high fidelity audio channels,
– Additional low frequency enhancement channel.
– Applicable for the compression of audio for High Definition Television or
digital movies.
• Multilingual audio support:
– Supports up to 7 additional commentary channels.

MPEG-2 audio (Cont.)

• Lower compressed audio bit rates:


– Supports bit rates down to 8 kbits/sec.
• Lower audio sampling rates:
– Besides 32, 44.1, and 48 kHz,
– Additional 16, 22.05, and 24 kHz.
– E.g. commentary channels can have half the high fidelity channel sampling rate.

MPEG-1/MPEG-2 Compatibility
Forward/backward compatibility?
• MPEG-2 decoders can decode MPEG-1 audio bitstreams.
• MPEG-1 decoders can decode two main channels of MPEG-2
audio bitstreams.
– Achieved by combining suitably weighted versions of each
of the up to 5 channels into a down-mixed left and right
channel.
– These two channels fit into the audio data framework of a
MPEG-1 audio bitstream.
– Information needed to recover the original left, right, and remaining channels fits into:
• The ancillary data portion of an MPEG-1 audio bitstream, or
• A separate auxiliary bitstream.
MPEG-3/MPEG-4
MPEG-3 audio:
• does not exist anymore — merged with MPEG-2
MPEG-4 audio:

• Previously studied
• Uses the structured audio concept
• Delegates audio production to client synthesis where appropriate
• Otherwise compresses the audio stream as above.

Dolby Audio Compression

Application areas:
• FM radio, satellite transmission and broadcast TV audio (DOLBY AC-1)
• Common compression format in PC sound cards
(DOLBY AC-2)
• High Definition TV standard advanced television (ATV)
(DOLBY AC-3). MPEG a competitor in this area.

Differences with MPEG
• MPEG perceptual coders control quantisation accuracy of
each subband by computing bit numbers for each sample.
• MPEG needs to store each quantiser value with each sample.
• MPEG Decoder uses this information to dequantise:
forward adaptive bit allocation
• Advantage of MPEG: no need for psychoacoustic modelling in the decoder, since every quantiser value is stored.
• DOLBY: Use fixed bit rate allocation for each subband.
– No need to send with each frame — as in MPEG.
– Both DOLBY encoders and decoders need this information.
Fixed Bit Rate Allocation

• Bit allocations are determined by known sensitivity


characteristics of the ear.

Different Dolby standards

DOLBY AC-1 :
Low complexity psychoacoustic model:

• 40 subbands at a sampling rate of 32 kHz, or
• (proportionally more) subbands at 44.1 or 48 kHz.
• Typical compressed bit rate of 512 kbits per second for
stereo.
• Example: FM radio Satellite transmission and broadcast
TV audio
DOLBY AC-2 :
Variation to allow subband bit allocations to vary
• NOW Decoder needs copy of psychoacoustic model.
• Minimised encoder bit stream overheads at the expense of transmitting the encoded frequency coefficients of the sampled waveform segment — known as the encoded spectral envelope.
• Mode of operation known as
backward adaptive bit allocation mode
• High (hi-fi) quality audio at 256 kbits/sec.
• Not suited for broadcast applications:
– the encoder cannot change the model without changing (remote/distributed) decoders
• Example: Common compression format in PC sound cards.
DOLBY AC-3 :
Development of AC-2 to overcome broadcast challenge
• Use hybrid backward/forward adaptive bit allocation mode
• Any model modification information is encoded in a frame.
• Sample rates of 32, 44.1, 48 kHz supported, depending on the bandwidth of the source signal.
• Each encoded block contains 512 subband samples, with 50% (256 samples) overlap between successive blocks.
• At a 32 kHz sample rate each set of 256 new samples spans 8 ms, and the duration of each encoded block is 16 ms.
• The audio bandwidth (at 32 kHz) is 15 kHz, so each subband has 62.5 Hz bandwidth.
• Typical stereo bit rate is 192 kbits/sec.
• Example: High Definition TV standard advanced television (ATV). MPEG is a competitor in this area.
Streaming Audio (and video)
Popular delivery medium for the Web and other Multimedia networks
Real Audio (http://www.realaudio.com/), Shockwave
(http://www.macromedia.com) and Quicktime audio
(http://www.apple.com/quicktime) are examples of streamed audio 584
(and video)
• Need to compress and uncompress data in realtime.
• Buffered Data:
– Trick: get data to the destination before it's needed
– Temporarily store in memory (Buffer)
– Server keeps feeding the buffer
– Client Application reads buffer
• Needs a reliable connection, moderately fast too.
• Specialised client, Streaming Audio Protocol (PNM for Real Audio).
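A toy sketch of the buffering idea (a producer/consumer queue; real systems use dedicated streaming protocols and adaptive buffer sizes, so every name here is illustrative):

from collections import deque

class StreamBuffer:
    # Server keeps feeding the buffer; the client reads from it.
    def __init__(self, preload=64):
        self.queue = deque()
        self.preload = preload      # chunks buffered before playback

    def feed(self, chunk):          # called as data arrives from server
        self.queue.append(chunk)

    def ready(self):                # playback may start once preloaded
        return len(self.queue) >= self.preload

    def read(self):                 # called by the playing client
        return self.queue.popleft() if self.queue else None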
Multimedia Integration, Interaction and
Interchange
Integrating Multimedia
• So far we studied media independently
• Certain media (individually) are based on spatial
and/or temporal representations,
• Others may be static.

Integrating media (Cont.):

• Spatial and temporal implications become even more critical.


• E.g. static text may need to index or label a portion of video
at a given instant or segment of time
– Integration becomes temporal and spatial if the label is
placed at a given location (or locations moving over time).

Synchronisation

• Important to know the tolerance and limits for each medium


• Integration will require knowledge of these for synchronisation
and
– Indeed it creates further limits
– E.g. the bandwidth of the two media types increases if audio encoded at a 48 kHz sampling rate needs to accompany video being streamed out at 60 frames per second.
• Inter-stream synchronisation is not necessarily straightforward.
Integrated standards
• It is common (obvious) that media types are bundled together
for ease of delivery, storage etc.
• Formats have been developed to support, store and deliver
media in an integrated form.

Interchange Between Applications

The need for interchange between different multimedia


applications:
• Running on different platforms
• Evolved common interchange file formats.
• Build on underlying individual media formats (MPEG, JPEG
etc.)
• Truly integrated to become multimedia — Spatial, temporal
structural and procedural constraints will exist between the
media.
• This is especially true now that interaction is a common feature of multimedia.
Interactive Multimedia

Modern multimedia presentation and applications are


becoming increasingly interactive.
• Simple interactions that simply start movie clips, audio
segments animations etc.
• Complex interactions between media are now available:
– Following hyperlinks is instinctively non-linear and
– Advent of digital TV is important
• Interactivity now needs to be incorporated as part of the
media representation/format.
• The MHEG format (see below) has been developed expressly for such purposes.
Multimedia Interchange
The need for interchange formats is significant in several applications:
• As a final storage model for the creation and editing of
multimedia documents.
• As a format for delivery of final form digital media.
E.g. Compact Discs/DVDs to end-use players.
• As a format for real-time delivery over a distributed network
• As a format for interapplication exchange of data.

Quicktime
Introduction
• QuickTime is the most widely used cross-platform multimedia
technology available today.
• QuickTime now has powerful streaming capabilities, so you
can enjoy watching live events as they happen.
• Developed by Apple; QuickTime 6 (2002) is the latest version.
• It includes streaming capabilities as well as the tools needed
to create, edit, and save QuickTime movies.
• These tools include the QuickTime Player, PictureViewer, and the QuickTime Plug-in.
Quicktime Main Features
Versatile support for web-based media
• Access to live and stored streaming media content with the QuickTime Player
• High-Quality Low-Bandwidth delivery of multimedia
• Easy view of QuickTime movies (with enhanced control) in
Web Browsers and applications.
• Multi platform support.
• Built in support for most popular Internet media formats
(well over 40 formats).
• Easy import/export of movies in the QuickTime Player
Sophisticated playback capabilities

• Play back full-screen video


• Play slide shows and movies continuously
• Work with video, still-image, and sound files in all leading
formats

Easy content authoring and editing

• Create new QuickTime streaming movies by copying and


pasting content from any supported format
• Enhance movies and still pictures with filters for sharpening,
color tinting, embossing, and more
• Save files in multiple formats, including the new DV format
for high-quality video
• Create slide shows from pictures
• Add sound to a slide show

Quicktime Support of Media Formats
QuickTime is an open standard:
• Embraces other standards and incorporates them into its environment.
• It supports every major file format for pictures, including BMP, GIF, JPEG, PICT,
and PNG. Even JPEG 2000.
• QuickTime also supports every important professional file format for video,
including AVI, AVR, DV (Digital Video), M-JPEG, MPEG-1 – MPEG-4, and
OpenDML.
• All common Audio format — incl. MPEG-4 Structured Audio.
• MIDI standards support including as the Roland Sound Canvas sound set and
the GM/GS format extensions.
• Other multimedia — FLASH support.
• Other Multimedia integration standards — SMIL
• Key standards for web streaming, including HTTP, RTP, and RTSP as set forth
by the Internet Engineering Task Force, are supported as well.
• Speech models — synthesised speech
• QuickTime supports Timecode tracks, including the critical standard for video timecode (SMPTE) and for musicians.
QuickTime Concepts
To following concepts QuickTime are used by Quicktime:
Movies and Media Data Structures —
• A continuous stream of data — cf. a traditional movie, whether
stored on film, laser disk, or tape.
• A QuickTime movie can consist of data in sequences from
different forms, such as analog video and CD-ROM.
• The movie is not the medium; it is the organizing principle.
• Contains several tracks.
• Each track refers to a media that contains references to the
movie data, which may be stored as images or sound on hard
disks, floppy disks, compact discs, or other devices.
• The data references constitute the track's media.
• Each track has a single media data structure.
Components —
• Provided so that every application doesn’t need to know about
all possible types of audio, visual, and storage devices.
• A component is a code resource that is registered by the
Component Manager.
• The component’s code can be available as a system wide
resource or in a resource that is local to a particular application.
• Each QuickTime component supports a defined set of features
and presents a specified functional interface to its client
applications.
• Applications are thereby isolated from the details of
implementing and managing a given technology.
• For example, you could create a component that supports a
certain data encryption algorithm.
• Applications could then use your algorithm by connecting to your component through the Component Manager, rather than by implementing the algorithm over again.
Image Compression —

• A QuickTime movie can demand substantially more storage than single images.
• Minimizing the storage requirements for image data is an important consideration for any application that works with images or sequences of images.
• The Image Compression Manager provides the application
with an interface for compressing and decompressing.
• Independent of devices and algorithms.

Time —

• Time management in QuickTime is essential for


synchronisation
• QuickTime defines time coordinate systems, which anchor

movies and their media data structures to a common


temporal timeframe.
• A time coordinate system contains a time scale that
provides the translation between real time and the time
frame in a movie.
• Time scales are marked in time units.

Time (cont.) —
• The number of units that pass per second quantifies the scale — that is, a time scale of 26 means that 26 units pass per second and each time unit is 1/26 of a second.
• A time coordinate system also contains a duration, which is the length of a movie or a media in the number of time units it contains.
• Particular points in a movie can be identified by a time
value, the number of time units elapsed to that point.
• Each media has its own time coordinate system, which
starts at time 0.
• The Movie Toolbox maps each type of media data from
the movie’s time coordinate system to the media’s time JJ
coordinate system. II
J
I
Back
Close
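The time scale arithmetic is simple; as a sketch (these function names are illustrative, not the Movie Toolbox API):

def time_value_to_seconds(time_value, time_scale):
    # E.g. with a time scale of 26, a time value of 52 is 2.0 seconds.
    return time_value / time_scale

def seconds_to_time_value(seconds, time_scale):
    return round(seconds * time_scale)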
The QuickTime Architecture

QuickTime comprises two managers:


Movie Toolbox and
Image Compression Manager.

QuickTime also relies on the Component Manager, as well


as a set of predefined components.

The QuickTime Architecture (Cont.)

The relationships of these managers and an application that


is playing a movie:
Figure 55: Quicktime Architecture
The Movie Toolbox
allows you to:
• store,
• retrieve, and
• manipulate time-based data
that is stored in QuickTime movies.

The Image Compression Manager :

Comprises a set of functions that compress and decompress


images or sequences of graphic images.
• Device and driver independent means of compressing and

decompressing images and sequences of images.


• A simple interface for implementing software and hardware
image-compression algorithms.
• System integration functions for storing compressed
images as part of PICT files,
• Ability to automatically decompress compressed PICT files.
• Most applications use the Image Compression Manager indirectly — by calling Movie Toolbox functions or by displaying a compressed picture.
• Applications can also call Image Compression Manager functions directly.
The Component Manager :

The Component Manager allows you to define and register


types of components and communicate with components
using a standard interface.
• A component is a code resource that is registered by the
Component Manager.
• The component’s code can be stored in a system wide
resource or in a resource that is local to a particular
application.

QuickTime Components

QuickTime includes several components


• These components provide useful/essential services to your
application and
• Essential to support the managers that make up the QuickTime
architecture.

QuickTime Components
Movie controller : Components, which allow applications to play movies using a standard user interface
Standard image compression dialog : Components, which allow the user to specify the parameters for a compression operation by supplying a dialog box or a similar mechanism
Image compressor : Components, which compress and decompress image data
Sequence grabber : Components, which allow applications to preview and record video and sound data as QuickTime movies
Video digitizer : Components, which allow applications to control video digitization by an external device
Media data-exchange : Components, which allow applications to move various types of data in and out of a QuickTime movie
Derived media handler : Components, which allow QuickTime to support new types of data in QuickTime movies
QuickTime Components (Cont.)
Clock : Components, which provide timing services defined for QuickTime applications
Preview : Components, which are used by the Movie Toolbox's standard file preview functions to display and create visual previews for files
Sequence grabber : Components, which allow applications to obtain digitized data from sources that are external to a Macintosh computer
Sequence grabber channel : Components, which manipulate captured data for a sequence grabber component
Sequence grabber panel : Components, which allow sequence
grabber components to obtain configuration information from
the user for a particular sequence grabber channel component
Open Media Framework Interchange (OMFI) Format

The OMFI is a common interchange framework developed


in response to an industry led standardisation effort (including
Avid — a major digital video/audio hardware/applications vendor).

Like Quicktime the primary concern of the OMFI format is


concerned with temporal representation of media (such as video
and audio) and a track model is used.

Target: Video/Audio Production

The primary emphasis is video production, and a number of additional features reflect this:
• Source (analogue) material object represent videotape and
film so that the origin of the data is readily identified. Final
footage may resort to this original form so as to ensure highest
possible quality.
• Special track types store (SMPTE) time codes for segments
of data.
• Transitions and effects for overlapping and sequences of
segments are predefined.
• Motion Control — the ability to play one track at a speed which is a ratio of the speed of another track — is supported.
OMFI Format/Support
The OMFI file format incorporates:
• A header — including references for objects contained in file
• Object dictionary — to enhance the OMFI class hierarchy in
an application
• Object data
• Track data
OMFI Support:
• Main Video development tools including

Apple Final Cut Pro, Xpress (Pro/DV), Softimage

• Main Audio development tools including:

Protools, Cakewalk/Sonar 2.0
Multimedia and Hypermedia Information
Encoding Expert Group (MHEG)

• Arose directly out of the increasing convergence of broadcast and interactive


technologies — DIGITAL INTERACTIVE TV
• Specifies an encoding format for multimedia applications independently of service
paradigms and network protocols.
• Like Quicktime and OMFI it is concerned with time-based media objects, whose
encodings are determined by other standards.
• Scope of MHEG is large in that it directly supports interactive media and real-time
delivery over networks.
• The current widespread standard is MHEG-5, but standards exist up to MHEG-8.
Practical MHEG: Digital Terrestrial TV
• Media interchange format in Digital TV set top boxes
• In the UK,
– ITV digital — WENT BUST (2002) !!!
– Freeview digital terrestrial (2002)
• MHEG is also widely used in European Digital TV.

Digital TV Group UK
• UK digital TV interests are managed by the Digital TV Group
UK — http://www.dtg.org.uk/.
• Alternative (satellite) digital TV interest: SKY,
– uses a proprietary API format, called OPEN (!!).
– MHEG advantage: is a truly open format (ISO standard).
– MHEG is the only open standard in this area.
Further reading:
http://www.dtg.org.uk/reference/mheg/mheg_index.html

Digital TV services
What sort of multimedia services does digital TV provide?

Figure 56: UK Digital TV Consortium
The family of MHEG standards

Version        Complete Name
MHEG-1         MHEG object representation — base notation (ASN.1)
MHEG-2         MHEG object representation — alternate notation (SGML)
MHEG-3         MHEG script interchange representation
MHEG-4         MHEG registration procedure
MHEG-5         Support for base-level interactive applications
MHEG-6         Support for enhanced interactive applications
MHEG-7         Interoperability and conformance testing for ISO/IEC 13522-5

Table 1: MHEG Standards
MHEG Standards Timeline

Version        Status
MHEG-1         International standard
MHEG-2         Withdrawn
MHEG-3         International standard
MHEG-4         International standard
MHEG-5         International standard
MHEG-6         International standard (1998)
MHEG-7         International standard (1999)
MHEG-8 (XML)   Draft international standard (Jan 1999)

Table 2: MHEG Standards Timeline
MHEG-5 overview
The major goals of MHEG-5 are:
• To provide a good standard framework for the development
of client/server multimedia applications intended to run on a
memory-constrained Client.
• To define a final-form coded representation for interchange
of applications across platforms of different versions and
brands.
• To provide the basis for concrete conformance levelling,
guaranteeing that a conformant application will run on all
conformant terminals.
• To allow the runtime engine on the Client to be compact and easy to implement.
• To be free of strong constraints on the architecture of the Client.
MHEG-5 Goals (Cont.)

• To allow the building of a wide range of applications — providing access to external libraries. Such applications may only be partly portable.
• To allow for application code that is guaranteed to be “safe”.
• To allow automatic static analysis of (final-form) application
code in order to help insure bug-free applications and
minimize the debugging investment needed to get a robust
application.
• To promote rapid application development by providing
high-level primitives and provide a declarative paradigm for
the application development.
MHEG-5 Model

The MHEG-5 model is object-oriented.


The actions are methods targeted to objects from different


classes to perform a specific behavior and include:
• Preparation,
• Activation,
• Controlling the presentation,
• User interaction,
• Getting the value of attributes,
• and so on.
MHEG Client-Server Interaction


Figure 57: MHEG Client-Server Interaction

MHEG Programming Principles

OBJECT ORIENTED — simple Object-oriented implementation

MHEG-5 provides suitable abstractions for

• managing active, autonomous, and reusable entities


• pure object-oriented approach.

Basic MHEG Class Structure

An MHEG class is specified by three kinds of properties:


• Attributes that make up an object’s structure,
• Events that originate from an object, and
• Actions that target an object to accomplish a specific behavior
or to set or get an attribute’s value.

Main MHEG classes

The most significant classes of MHEG-5 are now briefly


described:
Root — A common Root superclass provides a uniform object
identification mechanism and specifies the general semantics
for preparation/destruction and activation/deactivation of
objects, including notification of changes of an object’s
availability and running status.
Group — This abstract class handles the grouping of objects in
the Ingredient class as a unique entity of interchange.
Group objects can be addressed and independently downloaded from a server.
A Group can be specialized into Application and Scene classes.
Main MHEG classes (Cont.)
Application — An MHEG-5 application is structurally organized
into one Application and one or more Scene objects.
• The Application object represents the entry point that
performs a transition to the presentation’s first Scene.
• Generally, this transition occurs at startup because a
presentation can’t happen without a Scene running.
• The Launch action activates an Application after quitting
the active Application.
• The Quit action ends the active Application, which also
terminates the active Scene’s presentation.
• The Ingredients of an Application are available to the different Scenes that become active, thereby allowing an uninterrupted presentation of contents.
• E.g. a bitmap can serve as the common background for all Scenes in an Application.
Main MHEG classes (Cont.)
Scene — This class allows spatially and temporally coordinated
presentations of Ingredients.
• At most, one Scene can be active at one time.
• Navigating within an Application is performed via the TransitionTo action, which closes the current Scene, including its Ingredients, and activates the new one.
• The SceneCoordinateSystem attribute specifies the
presentation space’s 2D size for the Scene.
• If a user interaction occurs in this space, a UserInput
event is generated.
• A Scene also supports timers.
• A Timer event is generated when a timer expires.
Main MHEG classes (Cont.)
Ingredient — Abstract class provides the common behavior for
all objects included in an Application or a Scene.
• The OriginalContent attribute maps object and content
data

• The ContentHook attribute specifies the encoding format


for the content.
• The action Preload gives hints to the RTE for making
the content available for presentation.
– Especially for streams, this action does not completely
download the content, it just sets up the proper network
connection to the site where the content is stored.
• The action Unload frees allocated resources for new content.
The Presentable, Stream, and Link classes are subclasses of the Ingredient class.
Ingredient subclasses
Presentable — This abstract class specifies the common aspects
for information that can be seen or heard by the user. The
Run and Stop actions activate and terminate the
presentation, while generating the IsRunning and
IsStopped events.
Visible — The Visible abstract class specializes the
Presentable class with provisions for displaying objects in
the active Scene’s presentation space.
The OriginalBoxSize and OriginalPosition attributes
respectively specify the size and position of the object’s
bounding box relative to the Scene’s presentation space.
The actions SetSize and SetPosition change the current values of these attributes.
Visible Object Classes
The specialized objects in the Visible class include:
• Bitmap — This object displays a 2D array of pixels. The
Tiling attribute specifies whether the content will be replicated
throughout the BoxSize area.
The action ScaleBitmap scales the content to a new size.

Example, to create a simple bitmap object:


(bitmap: BgndInfo
content-hook: #bitmapHook
content-data: referenced-content:
"Info.bitmap"
box-size: ( 320 240 )
original-position: ( 0 0 )
)
Visible Object Classes (Cont.)

• LineArt, DynamicLineArt — A LineArt is a


vector representation of graphical entities, like polylines and
ellipses.
DynamicLineArt draws lines and curves on the fly in the
BoxSize area.
• Text — This object represents a text string with a set of
rendition attributes. Essentially, these attributes specify fonts
and formatting information like justification and wrapping.

Ingredient subclasses (Cont.)
Stream — This class controls the synchronized presentation of
multiplexed audio-visual data (such as an MPEG-2 file).
• A Stream object consists of a list of components from the
Video, Audio, and RTGraphics (animated graphics) classes.
• The OriginalContent attribute of the Stream object refers
to the whole multiplex of data streams.
• When a Stream object is running, its streams can be switched
on and off independently — this lets users switch between
different audio trails (e.g., different languages) or choose which
video stream(s) to present among a range of available ones.
• Specific events are associated with playback:
StreamPlaying/StreamStopped notify the actual
initiation/termination, and CounterTrigger notifies the
system when a previously booked time-code event occurs.
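A purely illustrative sketch of a Stream object in the same textual style;
the multiplex: layout, the component-tag: spelling, and the file name are
assumptions rather than wording taken from the standard:
(stream: Movie1
    content-hook: #streamHook
    content-data: referenced-content: "Movie.mpg"
    multiplex:
        (audio: MovieAudio component-tag: 1)
        (video: MovieVideo component-tag: 2)
)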
Ingredient subclasses (Cont.)
Link — The Link class implements event-action behavior by a
condition and an effect.
• The LinkCondition contains
– An EventSource — a reference to the object on which
the event occurs
– An EventType — specifies the kind of event and a
possible EventData that is a data value associated
with the event.
• MHEG-5 Action objects consist of a sequence of
elementary actions.
• Elementary actions are comparable to methods in standard
object-oriented terminology.
• The execution of an Action object means that each of its
elementary actions is invoked sequentially.
Simple Link Example
As an example, consider the following Link, which transitions
to another Scene when the character A is entered in the
EntryField EF1.
Example, to create a simple link:
(link: Link1
    event-source: EF1
    event-type: #NewChar
    event-data: 'A'
    link-effect:
        (action: transition-to: Scene2)
)
Interactible Object Class
Interactible — This abstract class provides a way for users to
interact with objects within the following sub-classes:
Hotspot, PushButton, and SwitchButton —
These subclasses implement button selection capability
and generate the IsSelected event.
Example, to create a simple SwitchButton:
(switchbutton: Switch1
    style: #radiobutton
    position: ( 50 70 )
    label: "On"
)
Interactible Object Class (Cont.)
Hypertext — This class extends the Text class with anchors.
When selected, these anchors link text content to associated
information.
Slider and EntryField — Respectively, these objects let users
adjust a numeric value (such as the volume of an audio
stream) and edit text. Examples of both follow.
Example, to create a simple slider:
(slider: Slider1
    box-size: ( 40 5 )
    original-position: ( 100 100 )
    max-value: 20
    orientation: #right
)
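Example, a sketch of how the EntryField EF1 used in the earlier Link
example might be declared; the max-length: attribute spelling is an
assumption in the style of the examples above:
(entryfield: EF1
    box-size: ( 200 20 )
    original-position: ( 100 140 )
    max-length: 16
)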
UK Digital Terrestrial MHEG Support: EuroMHEG
• Above, only some of the main classes of MHEG have been addressed.
• A few other classes have been omitted.
• The aim is to gain a broad understanding of how MHEG works and
of the basic classes that support this.
MHEG Class Support:
• Not all MHEG engines support all MHEG classes.
• UK digital TV MHEG:
– needed to be initially restricted
– to meet production timescales for the launch.
– The EuroMHEG standard was thus defined.
– EuroMHEG: extensible so as to be able to include updates
and additions in due course.
EuroMHEG Classes
The MHEG classes supported by EuroMHEG are:
Root                 Group                Application
Scene                Ingredient           Link
Program              ResidentProgram      RemoteProgram
Palette              Font                 CursorShape
Variable             BooleanVariable      IntegerVariable
OctetStringVariable  ObjectRefVariable    ContentRefVariable
Presentable          TokenManager         TokenGroup
ListGroup            Visible              Bitmap
LineArt              Rectangle            DynamicLineArt
Text                 Stream               Audio
Video                RTGraphics           Interactible
Slider               EntryField           HyperText
Button               HotSpot              PushButton
SwitchButton         Action
Interaction within a Scene
The MHEG application is event-driven, in the sense that all
actions are called as the result of an event firing a link.
Events can be divided into two main groups:
• Asynchronous events are events that occur asynchronously
to the processing of Links in the MHEG engine. These include
timer events and user input events. An application area
of MHEG-5 (such as DAVIC) must specify the permissible
UserInput events within that area.
Asynchronous events are queued.
• Synchronous events are events that can only occur as the
result of an MHEG-5 action being targeted to some objects.
A typical example of a synchronous event is IsSelected,
which can only occur as the result of the MHEG-5 action
Select being invoked.
MHEG Engine Basics
The mechanism at the heart of the MHEG engine:
1. After a period of idleness, an asynchronous event occurs — e.g.
a user input event, a timer event, a stream event, or some other
type of event.
2. Possibly, a link that reacts to the event is found. This link is then
fired. If no such link is found, the process starts again at 1.
3. The result of a link being fired is the execution of an action object,
which is a sequence of elementary actions. These can change
the state of other objects, create or destroy other objects, or
cause events to occur.
4. As a result of the actions being performed, synchronous events
may occur. These are dealt with immediately, i.e., before processing
any other asynchronous events queued.
When all events have been processed, the process starts again at
1. The whole chain is sketched below.
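To make the loop concrete, here is a hedged sketch in the textual
notation used elsewhere in these notes: a UserInput event fires the first
Link; its Select action on a SwitchButton raises the synchronous
IsSelected event, which is processed immediately and fires the second
Link. The select: and #Select spellings and the object names are
assumptions:
(link: KeyLink
    event-source: InfoScene1
    event-type: #UserInput
    event-data: #Select
    link-effect:
        (action: select: Switch1)
)
(link: SelectedLink
    event-source: Switch1
    event-type: #IsSelected
    link-effect:
        (action: transition-to: InfoScene2)
)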
Availability; Running Status
Before doing anything to an object, the MHEG-5 engine must
prepare it:
• Preparing an object typically entails retrieving it from the
server, decoding the interchange format and creating the
corresponding internal data structures, and making the
object available for further processing.
• The preparation of an object is asynchronous; its completion
is signalled by an IsAvailable event.
• All objects that are part of an application or a scene have a
RunningStatus, which is either true or false.
• Objects whose RunningStatus is true are said to be
running, which means that they perform the behaviour they
are programmed for.
RunningStatus (Cont.)
More concretely, these are the rules governed by
RunningStatus:
• Only running Visibles are actually visible on the screen,
• Only running Audio objects are played out through the
loudspeaker,
• Only running Links will execute the action part if the
associated event occurs, etc.
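As an illustration, a Link could start one Presentable and stop another
using the Run and Stop actions introduced earlier; the run: and stop:
spellings and the object names are assumptions in the style of the other
examples:
(link: StartCaption
    event-source: InfoScene1
    event-type: #UserInput
    event-data: #Up
    link-effect:
        (action: run: Caption1 stop: BgMusic)
)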
Interactibles
The MHEG-5 mix-in class Interactible groups some
functionality associated with user interface-related objects,
e.g. Slider, HyperText, EntryField, and Buttons.
These objects can all be highlighted
• by setting their HighlightStatus to True.
They also have the attribute InteractionStatus, which,
when set to true, allows the object to interact directly with the
user, thus bypassing the normal processing of UserInput
events by the MHEG-5 engine.
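A hedged sketch of a Link that hands input focus to the EntryField EF1
by raising both statuses; the set-highlight-status: and
set-interaction-status: spellings are assumptions for illustration:
(link: FocusEF1
    event-source: InfoScene1
    event-type: #UserInput
    event-data: #Down
    link-effect:
        (action:
            set-highlight-status: ( EF1 true )
            set-interaction-status: ( EF1 true )
        )
)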
Interactibles (Cont.)
Exactly how an Interactible reacts when its
InteractionStatus is true is implementation-specific.
Example:
• The way that a user enters characters in an EntryField
can be implemented in different ways in different MHEG-5
engines.
At most one Interactible at a time can have its
InteractionStatus set to True.
Visual Representation
For objects that are visible on the screen, the following rules
apply:
• Objects are drawn downwards and to the right of their position
on the screen. This point can be changed during the life
cycle of an object, thus making it possible to move objects.
• Objects are drawn without scaling. Objects that do not fit
within their bounding box are clipped.
• Objects are drawn with "natural" priority, i.e., on top of already
existing objects. However, it is possible to move objects to
the top or the bottom of the screen, as well as putting them
before or after another object (see the sketch below).
• The screen can be frozen, allowing the application to perform
many (possibly slow) changes and not update the screen
until it's unfrozen.
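A hedged sketch of the stacking behaviour used by the Simple demo
later in these notes, where pressing a button brings it to the foreground;
the bring-to-front: spelling and the button name are assumptions:
(link: RaiseHello
    event-source: HelloButton
    event-type: #IsSelected
    link-effect:
        (action: bring-to-front: HelloButton)
)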
Object Sharing Between Scenes
It is possible within MHEG-5 to share objects between some
or all scenes of an Application.
For example, sharing can be used:
• To have variables retain their value over scene changes, or
• To have an audio stream play on across a scene change.
Shared objects are always contained in an Application
object:
• Since there is always exactly one Application object
running whenever a scene is running, the objects contained
in an Application object are visible to each of its scenes.
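A sketch of an Application carrying shared Ingredients, modelled on the
scene pseudo-code shown later; the items: keyword and the
integervariable:/audio: spellings are assumptions:
(application: InfoApp
    items:
        (integervariable: Score
            original-value: 0
        )
        (audio: BgMusic
            content-hook: #audioHook
            content-data: referenced-content: "Theme.mp2"
        )
)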
MHEG Object Encoding
The MHEG-5 specification does not prescribe any specific
formats for the encoding of content.
• For example, it is conceivable that a Video object is encoded
as MPEG or as motion-JPEG.
This means that the group using MHEG-5 (e.g. EuroMHEG)
must define which content encoding schemes to apply for the
different objects in order to achieve interoperability.
However, MHEG-5 does specify a final-form encoding of the
MHEG-5 objects themselves.
• This encoding is an instance of ASN.1, using the Basic
Encoding Rules (BER).
MHEG Coding Examples: A Simple MHEG Example
Consider a very simple scene that displays a bitmap and text.
• The user can press the 'Left' mouse (or other input device)
button, and
• A transition is made from the current scene, InfoScene1,
to a new scene, InfoScene2.
The pseudo-code from the above scene may look like the
following:
(scene: InfoScene1
    <other scene attributes here>
    group-items:
        (bitmap: BgndInfo
            content-hook: #bitmapHook
            original-box-size: (320 240)
            original-position: (0 0)
            content-data: referenced-content: "InfoBngd"
        )
        (text:
            content-hook: #textHook
            original-box-size: (280 20)
            original-position: (40 50)
            content-data: included-content: "1. Lubricate..."
        )
    links:
        (link: Link1
            event-source: InfoScene1
            event-type: #UserInput
            event-data: #Left
            link-effect: action: transition-to: InfoScene2
        )
)
An MHEG Player Java Applet — Further MHEG Examples
The Technical University of Berlin has produced an MHEG
Java Engine:
http://www.prz.tu-berlin.de/~joe/mheg/mheg_engine.html
• Java class libraries (with JavaDoc documentation) and details
on installation/compilation etc. are also available.
• Several examples of MHEG coding, including an MHEG
introduction written in MHEG.
Running the MHEG Engine
The MHEG engine exists as a Java applet and supporting
class libraries:
• You can of course use the class library in your own Java
code (applications and applets).
The MHEG engine is available in the MHEG Examples on the
Multimedia Lecture Examples Web Page.
You can run the applet through any Java-enabled Web browser
or applet viewer.
Running the MHEG Engine Applet
Here is an example of how to run the main applet provided
for the demo MHEG example:
<applet name="MHEG 5 Engine"
        code="mheg5/POM/Mheg5Applet.class"
        codebase="applications/"
        archive="mhegwww.zip"
        width="510"
        height="346"
        align="center">
<param name="objectBasePath" value="file:.">
<param name="groupIdentifier" value="demo/startup">
<param name="mon" value="false">
</applet>
Running the MHEG Engine Applet: Own Applications
If you use the applet yourself you may need to change:
• The code and codebase paths — these specify where the
applet classes and applications reside.
• The groupIdentifier value — for most of the application
demos a startup MHEG file is referenced first in a folder for
each application.
See the other examples below.
MHEG Example — The Simple MHEG Presentation
The Simple example produces the following output:
Figure 58: MHEG Simple Application Example
The presentation creates:
• Two buttons, labelled "Hello" and "World" respectively, and
• Some rectangle graphics.
• When pressed, a button is brought to the foreground of the
display.
MHEG Example — The Simple MHEG Presentation Structure
The MHEG modules for this presentation are:
startup — calls helloworld.mheg
helloworld.mheg — sets up the main presentation; calls
scene1.mheg
scene1.mheg — called in helloworld.mheg
MHEG Example — The Demo MHEG Presentation
The Demo example produces the output:
Figure 59: MHEG Demo Application Example
As can be seen, many of the key features of MHEG are illustrated
in further sub-windows (click on a button to move to the respective
window). Try these out for yourself.
The following MHEG modules are used:
startup — Initial module
main.mhg — Called by startup
disp1.mhg — input from numeric keys 1 and 2 to tile rectangles
(Fig 60)
disp2.mhg — input from numeric keys 1 and 2 to tile rectangles
(different display) (Fig 61)
text.mhg — illustrates MHEG control of text display (Fig 62)
intact.mhg — illustrates MHEG interactive objects (Fig 63)
bitmap1.mhg — illustrates MHEG display of bitmaps
bitmap2.mhg — illustrates MHEG display of bitmaps
ea.mhg — illustrates MHEG elementary actions (Fig 64)
allcl.mhg — MHEG concrete classes and elementary actions
(Fig 65)
Figure 60: MHEG Demo Application Display1 Example
Figure 61: MHEG Demo Application Display2 Example
Figure 62: MHEG Demo Application Text Example
Figure 63: MHEG Demo Application Interactive Objects Example
Figure 64: MHEG Demo Application Elementary Actions Example
Figure 65: MHEG Demo Application Concrete Classes Example
token.mhg — MHEG token groups example (Fig 66)
Figure 66: MHEG Demo Application Token Groups Example
More Examples
Further examples are available in the applications folder:
bitmap — further examples of bitmaps in MHEG
interacting — further examples of interaction in MHEG
intvar — integer variables
jmf — video and audio
quiz2 — a quiz written in MHEG
text — further text in MHEG
MHEG Relationships to Major Standards
Important relationships exist between MHEG-5 and other
standards and specifications.
Davic (Digital Audio Visual Council) — aims to maximize
interoperability across applications and services for the
broadcast and interactive domains.
Davic 1.0 selected MHEG-5 for encoding base level
applications and Davic 1.1 relies on MHEG-6 to extend these
applications in terms of the Java virtual machine that uses
services from the MHEG-5 RTE.
DVB (Digital Video Broadcasting) — provides a complete
solution for digital television and data broadcasting across a
range of delivery media where audio and video signals are
encoded in MPEG-2.
MHEG Relationships to Major Standards (Cont.)
MPEG — family of standards used for coding audiovisual
information (such as movies, video, and music) in a digital
compressed format.
MPEG-1 and MPEG-2 streams are likely to be used by
MHEG-5 applications, which can easily control their playback
through the facilities provided by the Stream class.
DSMCC (Digital Storage Media Command and Control) — a
set of protocols for controlling and managing MPEG streams
in a client-server environment.
The user-to-user protocol (both the client and server are
users) consists of VCR commands for playback of streams
stored on the server, as well as commands for downloading
other data (bitmaps, text, and so on).
MHEG Implementation
Several components may be required in implementing an MHEG
system:
Runtime Engine (RTE) — MHEG-5 runtime engines generally
run across a client-server architecture.
• See the Armida (ATM) system (Figure 67) referenced
below for an example application,
• Also the Java MHEG Engine previously mentioned.
MHEG Implementation (Cont.)
Figure 67: Armida Client Architecture
Armida is a client-server based interactive multimedia application
retrieval system.
MHEG Implementation (Cont.)
A preceding Start-up Module may be used to perform general
initialization etc.:
• The client can be launched either as an autonomous Windows
application or
• As a plug-in by an HTML browser, allowing seamless
navigation between the World Wide Web and the webs of
MHEG-5 applications. (See Armida system for more details).
• A Java RTE is also available.
Run Time Engine (RTE)
The MHEG-5 RTE is the kernel of the client's architecture. It
performs
• The pure interpretation of MHEG-5 objects and,
• As a platform-independent module, issues I/O and data
access requests to other components that are optimized for
the specific runtime platform.
The RTE performs two main tasks:
• It prepares the presentation and handles accessing, decoding,
and managing MHEG-5 objects in their internal format.
• It performs the actual presentation, which is based on an event
loop where events trigger actions.
These actions then become requests to the Presentation
layer along with other actions that internally affect the engine.
Presentation layer
The presentation layer (PL)
• manages windowing resources,
• deals with low-level events, and
• performs decoding and rendering of contents from different
media to the user.
• exposes its functionality via an object-oriented API.
Access module
This module provides a consistent API for accessing information
from different sources.
It’s used by the RTE to get objects and the PL to access
content data (either downloaded or streamed).
Typical applications should support:
• Bulk download for bitmaps, text, and MHEG-5 objects;
and
• Progressive download for audio and audiovisual streams.
DSMCC Interface
The implementation of these mechanisms occurs via the
DSMCC interface.
• The user has full interactive control of the data presentation,
including playback of higher quality MPEG-2 streams
delivered through an ATM network.
• Object and content access requests can also be issued to
the Web via HTTP — this may not yet provide adequate
quality-of-service (QoS).
• When accessing the broadcast service, the Access module
requires the DVB channel selection component to select the
program referred to by a Stream object.
MHEG Authoring Tools: MediaTouch
The availability of an adequate authoring tool is mandatory
to create MHEG applications. The MediaTouch (Figure 68)
application is one example, developed for the Armida System
(http://drogo.cselt.stet.it/ufv/ArmidaIS/home_en.htm).
MediaTouch (Cont.)
It is a visual, hierarchical, icon-based authoring tool, similar to
Authorware in many of its approaches.
Figure 68: MediaTouch MHEG Authoring Tool
(Hierarchy and Links Editor windows)
MHEG Authoring Tools: MHEGDitor
MHEGDitor is an MHEG-5 authoring tool based on
Macromedia Director, composed of:
• An Authoring Xtra to edit applications — It opens a window
to set preferences and links a specific external script
castLib to your movie for you to create specific MHEG
behaviours rapidly. You can test your application on the
spot within Director, as if it were played by MHEGPlayer,
the MHEG interpreter companion of MHEGDitor.
• A Converter Xtra to convert resulting movies into MHEG-5
applications — it converts Macromedia Director movies
(edited with the MHEGDitor Authoring Xtra) into a folder
containing all necessary items for an MHEG-5 application.
• The two MHEGDitor Xtras work separately.
MHEG Writing Tools: MHEGWrite
An editor to create and manipulate MHEG-5 applications by
hand:
• Based on the free software "VIM", which is
available via the Internet from various sites for virtually all
operating systems.
• The MHEGWrite extension supports only the MHEG-5
textual notation.
• Provides macros for object templates, syntax highlighting
and syntax error detection.
Playing MHEG files — There are a few ways to play MHEG
files:
• MHEGPlayer is an MHEG-5 interpreter which is able to
execute MHEG-5 applications developed with
MHEGDitor or any other authoring tool. 679
• MHEG Java Engine — Java source code exists to compile
a platform-independent MHEG player
(http://enterprise.prz.tu-berlin.de/imw/).
• MHEG plug-ins for Netscape browsers and Internet
Explorer have been developed.
Note: a Web-to-MHEG converter also exists.
MHEG Future
Several companies and research institutes are currently
developing MHEG tools and applications and conducting
interoperability experiments for international projects and
consortia.
The MHEG Support Center is a European project that hopes
to implement and operate an MHEG support and conformance
testing environment for developers of multimedia systems and
applications.