Multimedia Systems Ebook PDF
Recommended Course Book
Fundamentals of Multimedia
Ze-Nian Li and Mark S. Drew
Prentice Hall, 2003
(ISBN: 0130618721)
Other Good General Texts
• Multimedia Communications:
Applications, Networks, Protocols and
Standards,
Fred Halsall,
Addison Wesley, 2000
(ISBN 0-201-39818-4)
OR
• Networked Multimedia Systems,
Raghavan and Tripathi,
Prentice Hall,
(ISBN 0-13-210642)
The following books are highly recommended reading:
• Hypermedia and the Web: An Engineering Approach, D. Lowe and W. Hall, J. Wiley
and Sons, 1999 (ISBN 0-471-98312-8).
• Multimedia Systems, J.F.K. Buford, ACM Press, 1994 (ISBN 0-201-53258-1).
• Understanding Networked Multimedia, Fluckiger, Prentice Hall, (ISBN 0-13-190992-4)
• Design for Multimedia Learning, Boyle, Prentice Hall, (ISBN 0-13-242215-8)
Multimedia Authoring — Useful for Assessed Coursework
Digital Audio
• A Programmer’s Guide to Sound, T. Kientzle, Addison Wesley,
1997 (ISBN 0-201-41972-6)
• Audio on the Web — The Official IUMA Guide, Patterson and
Melcher, Peachpit Press.
• The Art of Digital Audio, Watkinson,
Focal/Butterworth-Heinemann.
• Synthesiser Basics, GPI Publications.
• Signal Processing: Principles and Applications, Brook and
Wynne, Hodder and Stoughton.
• Digital Signal Processing, Oppenheim and Schafer, Prentice
Hall.
Digital Imaging/Graphics/Video
• Digital Video Processing, A.M. Tekalp, Prentice Hall PTR,
1995.
• Encyclopedia of Graphics File Formats, Second Edition,
James D. Murray and William vanRyper, 1996, O’Reilly &
Associates.
Data Compression
• The Data Compression Book, Mark Nelson, M&T Books, 1995.
• Introduction to Data Compression, Khalid Sayood, Morgan
Kaufmann, 1996.
• G.K. Wallace, The JPEG Still Picture Compression Standard
• CCITT, Recommendation H.261
• D. Le Gall, MPEG: A Video Compression Standard for Multimedia
Applications
• K. Patel, et al., Performance of a Software MPEG Video Decoder
• P. Cosman, et al., Using Vector Quantization for Image Processing
Introduction to Multimedia
What is Multimedia?
Multimedia has many definitions; these include:
General Definition
A good general definition is:
Multimedia Application Definition
A Multimedia Application is an application which uses a
collection of multiple media sources, e.g. text, graphics, images,
sound/audio, animation and/or video.
What is HyperText and HyperMedia?
Hypertext is text which contains links to other texts.
The term was invented by Ted Nelson around 1965.
Figure 1: Definition of Hypertext
Hypertext is therefore usually non-linear (as indicated below).
Figure 3: Definition of HyperMedia
Example Hypermedia Applications?
• The World Wide Web (WWW) is the best example of a
hypermedia application.
• Powerpoint
• Adobe Acrobat
• Macromedia Director
• Many Others?
Multimedia Systems
A Multimedia System is a system capable of processing
multimedia data and applications.
Characteristics of a Multimedia System
A Multimedia system has four basic characteristics:
• Multimedia systems must be computer controlled.
• Multimedia systems are integrated.
• The information they handle must be represented digitally.
• The interface to the final presentation of media is usually interactive.
Challenges for Multimedia Systems
• Distributed Networks
• Temporal relationship between data
– Render different data at same time — continuously.
– Sequencing within the media
playing frames in correct order/time frame in video
– Synchronisation — inter-media scheduling
Desirable Features for a Multimedia System
Given the above challenges, the following features are desirable (if
not a prerequisite) for a Multimedia System:
Very High Processing Power — needed to deal with large data
processing and real-time delivery of media. Special hardware
commonplace.
Multimedia Capable File System — needed to deliver real-time
media — e.g. Video/Audio Streaming.
Special Hardware/Software needed — e.g. RAID technology.
Data Representations — File Formats that support multimedia
should be easy to handle yet allow for
compression/decompression in real-time.
Efficient and High I/O — input and output to the file subsystem
needs to be efficient and fast. Needs to allow for real-time
recording as well as playback of data. e.g. Direct to Disk
recording systems.
Special Operating System — to allow access to file system and
process data efficiently and quickly. Needs to support direct
transfers to disk, real-time scheduling, fast interrupt processing,
I/O streaming etc.
Storage and Memory — large storage units (of the order of
50-100 GB or more) and large memory (50-100 MB or more).
Large caches are also required, frequently in a Level 2 and 3
hierarchy, for efficient management.
Network Support — Client-server systems are common, as are
distributed systems.
Software Tools — user-friendly tools needed to handle media,
design and develop applications, deliver media.
Components of a Multimedia System
Now let us consider the Components (Hardware and Software)
required for a multimedia system:
Capture devices — Video Camera, Video Recorder, Audio
Microphone, Keyboards, mice, graphics tablets, 3D input
devices, tactile sensors, VR devices. Digitising/Sampling
Hardware
Storage Devices — Hard disks, CD-ROMs, Jaz/Zip drives, DVD,
etc.
Communication Networks — Ethernet, Token Ring, FDDI, ATM,
Intranets, Internets.
Computer Systems — Multimedia Desktop machines,
Workstations, MPEG/VIDEO/DSP Hardware
Display Devices — CD-quality speakers, HDTV, SVGA, Hi-Res
monitors, Colour printers etc.
Applications
Examples of Multimedia Applications include:
• World Wide Web
• Hypermedia courseware
• Video conferencing
• Video-on-demand
• Interactive TV
• Groupware
• Home shopping
• Games
• Virtual reality
• Digital video editing and production systems
• Multimedia Database systems
Trends in Multimedia
Current big applications areas in Multimedia include:
World Wide Web — Hypermedia systems — embrace nearly
all multimedia technologies and application areas.
MBone — Multicast Backbone: Equivalent of conventional TV
and Radio on the Internet.
Enabling Technologies — developing at a rapid rate to support
ever increasing need for Multimedia. Carrier, Switching,
Protocols, Applications, Coding/Compression, Database,
Processing, and System Integration Technologies at the
forefront of this.
Multimedia Data: Input and format
Text and Static Data
• Source: keyboard, floppies, disks and tapes.
• Stored and input character by character:
– Storage of text is 1 byte per character (text or format character).
– For other forms of data e.g. Spreadsheet files some formats
may store format as text (with formatting) others may use binary
encoding.
• Format: Raw text or formatted text e.g. HTML, Rich Text Format
(RTF), Word or a programming language source (C, Pascal, etc.).
• Not temporal — BUT may have natural implied sequence e.g.
HTML format sequence, Sequence of C program statements.
• Size: not significant w.r.t. other Multimedia.
Graphics
• Format: constructed by the composition of primitive objects
such as lines, polygons, circles, curves and arcs.
• Input: Graphics are usually generated by a graphics editor
program (e.g. Freehand) or automatically by a program (e.g.
Postscript).
• Graphics are usually editable or revisable (unlike Images).
• Graphics input devices: keyboard (for text and cursor control),
mouse, trackball or graphics tablet.
• Graphics standards: OpenGL, PHIGS, GKS
• Graphics files usually store the primitive assembly
• Do not take up a very high storage overhead.
Images
• Still pictures which (uncompressed) are represented as a
bitmap (a grid of pixels).
• Input: Generated by programs similar to graphics or animation
programs.
• Input: scanned for photographs or pictures using a digital
scanner or from a digital camera.
• Analog sources will require digitising.
• Stored at 1 bit per pixel (Black and White), 8 bits per pixel
(Grey Scale, Colour Map) or 24 bits per pixel (True Colour)
• Size: a 512x512 Grey scale image takes up 1/4 MB; a 512x512
24-bit image takes 3/4 MB with no compression.
• This overhead soon increases with image size
• Compression is commonly applied.
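The sizes quoted above follow directly from the bit depths; a quick sanity check (the helper function is ours, with 1 MB taken as 2^20 bytes):

```python
def image_size_mb(width, height, bits_per_pixel):
    """Uncompressed bitmap size in megabytes (1 MB = 2**20 bytes)."""
    return width * height * bits_per_pixel / 8 / 2**20

# 512x512 grey scale (8 bits/pixel)
print(image_size_mb(512, 512, 8))    # 0.25
# 512x512 true colour (24 bits/pixel)
print(image_size_mb(512, 512, 24))   # 0.75
```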
Audio
• Audio signals are continuous analog signals.
• Input: microphones and then digitised and stored
• usually compressed.
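To see why compression is usually applied, consider uncompressed CD-quality audio (44.1 kHz, 16-bit, stereo — standard parameters, not from the notes; the helper is an illustrative sketch):

```python
def audio_mb_per_minute(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM audio storage per minute, in megabytes."""
    bytes_per_sec = sample_rate_hz * bits_per_sample // 8 * channels
    return bytes_per_sec * 60 / 2**20

# CD quality: 44.1 kHz, 16-bit, stereo -> roughly 10 MB per minute
print(round(audio_mb_per_minute(44100, 16, 2), 1))  # 10.1
```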
Video
• Input: Analog Video is usually captured by a video camera
and then digitised.
• There are a variety of video (analog and digital) formats
• Raw video can be regarded as being a series of single images.
There are typically 25, 30 or 50 frames per second.
• a 512x512 monochrome video takes 25 × 0.25 MB = 6.25 MB
per second (375 MB per minute) to store uncompressed.
• Digital video clearly needs to be compressed.
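The frame-rate arithmetic can be checked mechanically (illustrative helper, 1 MB = 2^20 bytes):

```python
def video_mb(width, height, bytes_per_pixel, fps, seconds):
    """Uncompressed video storage in megabytes (1 MB = 2**20 bytes)."""
    return width * height * bytes_per_pixel * fps * seconds / 2**20

# 512x512 monochrome (1 byte/pixel) at 25 frames/sec:
print(video_mb(512, 512, 1, 25, 1))    # 6.25  (MB per second)
print(video_mb(512, 512, 1, 25, 60))   # 375.0 (MB per minute)
```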
Output Devices
The output devices for a basic multimedia system include
• A High Resolution Colour Monitor
• CD Quality Audio Output
• Colour Printer
• Video Output to save Multimedia presentations to (Analog)
Video Tape, CD-ROM, DVD.
• Audio Recorder (DAT, DVD, CD-ROM, (Analog) Cassette)
• Storage Medium (Hard Disk, Removable Drives, CD-ROM)
Multimedia Authoring:
Systems and Applications
What is an Authoring System?
An Authoring System is a program which has pre-programmed
elements for the development of interactive multimedia software
titles.
Authoring systems vary widely in:
• orientation,
• capabilities, and
• learning curve.
Why should you use an authoring system?
• can speed up programming and possibly content development
and delivery
• development time reportedly cut to about 1/8th of that of
straight programming
• However, the content creation (graphics, text, video, audio,
animation, etc.) not affected by choice of authoring system;
• time gains – accelerated prototyping
Authoring Vs Programming
• Big distinction between Programming and Authoring.
• Authoring —
– assembly of Multimedia
– possibly high level graphical interface design
– some high level scripting.
• Programming —
– involves low level assembly of Multimedia
– construction and control of Multimedia
– involves real languages like C and Java.
Multimedia Authoring Paradigms
The authoring paradigm, or authoring metaphor, is the
methodology by which the authoring system accomplishes its
task.
Scripting Language — the paradigm closest in form to traditional
programming; a script specifies:
– sequencing,
– hotspots,
– synchronization, etc.
• Usually a powerful, object-oriented scripting language
• in-program editing of elements (still graphics, video, audio,
etc.) tends to be minimal or non-existent.
• media handling can vary widely
Examples
• Apple’s HyperTalk for HyperCard,
• Asymetrix’s OpenScript for ToolBook and
• the Lingo scripting language of Macromedia Director
on exitFrame
go the frame
play sprite gNavSprite
end
Iconic/Flow Control
• tends to be the speediest in development time
• best suited for rapid prototyping and short-development
time projects.
• The core of the paradigm is the Icon Palette, which contains
the possible functions/interactions of a program.
Figure 4: Macromedia Authorware Iconic/Flow Control Examples
Card/Scripting
• paradigm provides a great deal of power
(via the incorporated scripting language)
• suffers from the index-card structure.
• Well suited for Hypertext applications.
Figure 5: Macromedia Director Score Window
Figure 6: Macromedia Director Cast Window
Figure 7: Macromedia Director Script Window
Hypermedia Linkage
• similar to the Frame paradigm
• shows conceptual links between elements
• lacks the Frame paradigm’s visual linkage metaphor.
Tagging
The Tagging paradigm uses tags in text files to
• link pages,
• provide interactivity and
• integrate multimedia elements.
Examples:
• SGML/HTML,
• SMIL (Synchronised Media Integration Language),
• VRML,
• 3DML and
• WinHelp
Issues in Multimedia Applications Design
There are various issues involved in Multimedia authoring:
• Content Design
• Technical Design
Content Design
Content design deals with:
• What to say, what vehicle to use.
”In multimedia, there are five ways to format and deliver your
message.
You can
• write it,
• illustrate it,
• wiggle it,
• hear it, and
• interact with it.”
Scripting (writing)
Graphics (illustrating)
• Make use of pictures to effectively deliver your messages.
• Create your own (draw, (color) scanner, PhotoCD, ...), or
keep ”copy files” of art works. – ”Cavemen did it first.”
Graphics Styles
• fonts
• colors
– pastels
– earth-colors
– metallic
– primary color
– neon color
Animation (wiggling)
1. Types of Animation
• Character Animation – humanise an object
• Highlights and Sparkles
• Moving Text
2. When to Animate
• Enhance emotional impact
• Make a point (instructional)
• Improve information delivery
Audio (hearing)
Interactivity (interacting)
• interactive multimedia systems!
• people remember 70% of what they interact with (according
to a late 1980s study)
Types of Interactive Multimedia Applications:
1. Menu driven programs/presentations
– often a hierarchical structure (main menu, sub-menus, ...)
2. Hypermedia
+: less structured, cross-links between subsections of the
same subject -> non-linear, quick access to information
+: easier for introducing more multimedia features, e.g., more
interesting ”buttons”
-: could sometimes get lost in navigating the hypermedia
3. Simulations / Performance-dependent Simulations
– e.g., Games – SimCity, Flight Simulators
Technical Design
Technological factors may limit the ambition of your multimedia
presentation.
Studied Later in detail.
Storyboarding
The concept of storyboarding has been used by animators and
their like for many years.
• used to help plan the general organisation of a presentation
• used to help plan the content of a presentation by recording
and organising ideas
Overview of Multimedia Software Tools
Digital Audio
Macromedia SoundEdit — edits a variety of different format
audio files, and applies a variety of effects (Fig 8)
Figure 8: Macromedia SoundEdit Main and Control Windows and
Effects Menu
CoolEdit/Adobe Audition — edits a variety of different format
audio files
Many public domain audio editing tools also exist.
Music Sequencing and Notation
Cakewalk
• Supports General MIDI
• Provides several editing views (staff, piano roll, event list)
and Virtual Piano
• Can insert WAV files and Windows MCI commands (animation
and video) into tracks
Cubase
• A more capable package than Cakewalk Express
• Intuitive Interface to arrange and play Music (Figs 9 and 10)
• Wide variety of editing tools including Audio (Figs 11 and 12)
• Score Editing
Figure 10: Cubase Transport Bar Window — Emulates a Tape
Recorder Interface
Figure 12: Cubase Audio Editing Window with Editing Functions
Logic Audio
• Cubase Competitor, similar functionality
Mark of the Unicorn Performer
• Cubase/Logic Audio Competitor, similar functionality
Figure 13: Cubase Score Editing Window
Image/Graphics Editing
Adobe Photoshop
• Allows layers of images, graphics and text
• Includes many graphics drawing and painting tools
• Sophisticated lighting effects filters
• A good graphics, image processing and manipulation tool
Adobe Premiere
• Provides large number (up to 99) of video and audio tracks,
superimpositions and virtual clips
• Supports various transitions, filters and motions for clips
• A reasonable desktop video editing tool
Macromedia Freehand
• Graphics drawing editing package J
Many other editors exist, in the public domain and commercially.
Image/Video Editing
Many commercial packages available
• Adobe Premiere
• Videoshop
• Avid Cinema
• SGI MovieMaker
Animation
Many packages available including:
• Avid SoftImage
• Animated Gif building packages e.g. GifBuilder
Multimedia Authoring
– Tools for making a complete multimedia presentation where
users usually have a lot of interactive controls.
Macromedia Director
• Movie metaphor (the cast includes bitmapped sprites, scripts,
music, sounds, and palettes, etc.)
• Can accept almost any bitmapped file formats
• Lingo script language with own debugger allows more control
including external devices, e.g., VCRs and video disk players
• Ready for building more interactivities (buttons, etc.)
• follows the cast/score/scripting paradigm,
• tool of choice for animation content (well, Flash for the Web).
Authorware
• Professional multimedia authoring tool
• Supports interactive applications with hyperlinks,
drag-and-drop controls, and integrated animation
• Compatibility between files produced from PC version and
MAC version
Other Authoring Tools mentioned in notes later
Multimedia Authoring:
Scripting (Lingo)
Cast/Score/Scripting paradigm.
• Macromedia Director MX
Demystified,
Phil Gross,
Macromedia Press (ISBN: 0321180976)
• Macromedia Director MX and
Lingo: Training from the Source
Phil Gross,
Macromedia Press (ISBN:
0321180968)
• Director 8 and Lingo
(Inside Macromedia),
Scott Wilson,
Delmar (ISBN: 0766820084)
Related Additional Material and Coursework
Director Overview/Definitions
movies — Basic Director Commodity:
interactive multimedia pieces that can include
• animation,
• sound,
• text,
• digital video,
• and many other types of media.
• link to external media
A movie can be as small and simple as an animated logo or
as complex as an online chat room or game.
Frames — Director divides lengths of time into a series of frames,
cf. a celluloid movie.
Creating and editing movies
4 Key Windows:
the Stage — Rectangular area where the movie plays
the Score : Where the movie is assembled;
one or more Cast windows — Where the movie’s media elements
are assembled;
and
the Control Panel — Controls how the movie plays back.
To create a new movie:
• Choose File > New > Movie
Some other key Director Components (1)
Channels – the rows in the Score that contain sprites for
controlling media
• numbered
• contain the sprites that control all the visible media
• Special effects channels at the top contain behaviors as
well as controls for the tempo, palettes, transitions, and
sounds.
Sprites —
Sprites are objects that control when, where, and how media
appears in a movie.
Some other key Director Components (2)
Cast members —
• The media assigned to sprites.
• media that make up a movie.
• includes bitmap images, text, vector shapes, sounds, Flash
movies, digital videos, and more.
Lingo — Director’s scripting language, adds interactivity to a
movie.
Behaviors — pre-existing sets of Lingo instructions.
Markers — identify fixed locations at a particular frame in a
movie.
Lingo Scripting (1)
Commands — terms that instruct a movie to do something while the
movie is playing. For example, go to sends the playback head to
a specific frame, marker, or another movie.
Properties — attributes that define an object. For example,
colorDepth is a property of a bitmap cast member.
Functions — terms that return a value. For example, the date function
returns the current date set in the computer. The key function
returns the key that was pressed last. Parentheses occur at the
end of a function.
Keywords — reserved words that have a special meaning.
For example, end indicates the end of a handler.
Lingo Scripting (2)
Events — actions that scripts respond to.
Constants — elements that don’t change. For example, the constants
TAB, EMPTY, and RETURN always have the same meaning, and
Operators — terms that calculate a new value from one or more
values. For example, the add operator (+) adds two or more
values together to produce a new value.
Lingo Data Types
Lingo supports a variety of data types:
• references to sprites and cast members,
• (Boolean) values: TRUE and FALSE,
• strings,
• constants,
• integers, and
• floating-point numbers.
Standard Program structure syntax
Lingo Script Types (1)
Director uses four types of scripts.
Behaviors — Behaviors are attached to sprites or frames in the
Score.
Creating the Bouncing Ball Graphic
The following steps achieve a simple bouncing ball animation
along a path:
4. Now we are going to animate the ball.
• Drag ’bouncing ball’ from the cast
member window to the stage.
• You will notice the sprite (the
object that appears in the score)
is extended over 20 frames.
4. Ball Animation (Key Frames)
Further Animation: 1.1 Shrinking the ball
• Open example1.dir
• Open Property Inspector for Sprite
• Click on the keyframes in the score, and
• Change the Blend Transparency to 100, 75, 50, 25, 0 for the
consecutive keyframes.
• Rewind and play the movie.
• Save as example4.dir
1.4. Animating sprite shape — Deforming The Ball
on mouseUp
beep
end
on mouseUp
beep
alert "Button Pressed"
end
This example illustrates how we may use Lingo Scripts as:
Director Example 4: Ready Made Example
To save time, we begin with a preassembled Director movie:
on exitFrame
go the frame
end
This frame script tells Director to keep playing the same frame.
• The loop lasts to frame 24.
• Pressing down Alt and dragging the frame script in the Score
can change this length.
Scene Markers (1)
Now we will create some markers.
• To create a marker, click in the marker channel for the frame
and label the marker with some typed text.
In this example:
• Markers are at frames 1, 10 and 20, naming them scene1,
scene2 and scene3 respectively.
• Note: You can delete a marker by clicking the triangle and
dragging it below the marker channel.
• A cast member (9) script for the next button has also been
created:
on mouseUp
go to next
end
The go to next command tells Director to go to the next
consecutive marker in the score.
Scene Markers (2)
• A cast member (10) script for the back button has also been
created:
on mouseUp
go to previous
end
The go to previous command tells Director to go to the
previous marker in the score.
• Once again, Play the movie, click on these buttons to see
how they work.
Sprite Scripts
Now We will create some sprite scripts:
• Sometimes a button will
– behave one way in one part of the movie and another way in a different part
The Next Button Sprite Scripts (1)
Desired Action of Next Button: Jump to next scene
The Next Button Sprite Scripts (2)
• Here we have split actions to map to our Scene Markers. To
achieve this:
– Click on frame 10 of channel 6 (the next button) sprite and
choose Modify > Split Sprite.
– Do the same at frame 20.
• To attach a script to each split action:
– Select each sprite sequence (here in channel 6).
– Ctrl-click on the sequence and select Script... from the
pull-down in the score to give a script window.
– We add a suitable jump to the next scene.
– In the example shown we have go to "scene2": this command
tells Director to send the movie to marker "scene2".
– We could do the other sequences similarly, but alternatives
exist.
Behaviour Scripts
A Behaviour Script for Next Button (Scene 2) (1)
Some more Lingo to add to our example
A problem in Director?
In Director: a script can only be associated with a complete
Object
For the way we have created the Recorder Interface we require
(and this is clearly a common requirement in many other cases):
Final Part: the Back Button (1)
So in this Example
• Select the sprite sequence in channel 5 and Cast member
10.
• Attach a Sprite script reading
on mouseUp
play done
end
What is SMIL?
• SMIL is to synchronized multimedia what HTML is to
hyperlinked text.
• Pronounced “smile”
SMIL:
• A simple,
• Vendor-neutral
• Markup language
Designed:
• For all skill levels of WWW authors
• To schedule audio, video, text, and graphics files across a
timeline
• With no need to master development tools or complex
programming languages.
• HTML-like — only a text editor is needed
• Links to media — media not embedded in SMIL file
Drawbacks of SMIL?
Good Points:
• A powerful tool for creating synchronized multimedia
presentations on the web
Running SMIL Applications
For this course there are basically three ways to run SMIL
applications (two use a Java Applet), so there are basically
two SMIL-supported mediums:
Quicktime — supported since Quicktime Version 4.0.
RealPlayer G2 — integrated SMIL support
Web Browser — use the SOJA SMIL applet viewer with html
wrapper
Applet Viewer — use the SOJA SMIL applet viewer with html
wrapper
Quicktime media support is richer (see later sections on
Quicktime).
You will need to use both, as RealPlayer and SOJA support
different media:

Format             Type        RealPlayer  QuickTime  SOJA
GIF                img         OK          OK         OK
JPEG               img         OK          OK         OK
WAV                audio       OK          OK         -
.au Audio          audio       OK          OK         OK
.auz Audio Zipped  audio       -           -          OK
MP3                audio       OK          -          -
Plain text         text        OK          OK         OK
Real text          textstream  OK          -          -
Real movie         video       OK          -          -
AVI                video       OK          OK         -
MPEG               video       OK          OK         -
MOV                video       OK          -          -
Using Quicktime
• Load the SMIL file into a Quicktime plug-in (configure Browser
helper app or mime type) or
• the Quicktime movie player.
Using RealPlayer G2
The RealPlayer G2 is installed on the applications HD in the
RealPlayer folder.
RealPlayer supports lots of file formats and can use plugins.
The main supported formats are:
To run SMIL files
Real Player uses streaming to render presentations.
• works better when calling a SMIL file given by a Real Server,
• rather than from an HTTP one.
Locally run SMIL files:
• drag a SMIL file onto the RealPlayer G2 Application
• Open a local SMIL file inside RealPlayer G2 Application
Using the SOJA applet
SOJA stands for SMIL Output in Java Applet.
SOJA is an applet that renders SMIL in a web page or in a
separate window. It supports the following formats:
• Images: GIF, JPEG
Running SOJA
To run SMIL through an applet you have to
• call the applet from an HTML file:
<APPLET CODE="org.helio.soja.SojaApplet.class"
ARCHIVE="soja.jar" CODEBASE="../"
WIDTH="600" HEIGHT="300">
<PARAM NAME="source" VALUE="cardiff_eg.smil">
<PARAM NAME="bgcolor" VALUE="#000066">
</APPLET>
Basic Layout
The basic layout of a SMIL document is as follows:
<smil>
<head>
<meta name="copyright"
content="Your Name" />
<layout>
<!-- layout tags -->
</layout>
</head>
<body>
<!-- media and synchronization tags -->
</body>
</smil>
A source begins with <smil> and ends with </smil>.
Note that SMIL is case sensitive
<smil>
....
</smil>
SMIL documents have two parts: head and body. Each of
them must have <smil> as a parent.
<smil>
<head>
....
</head>
<body>
....
</body>
</smil>
Some tags, such as meta can have a slash at their end:
....
<head>
<meta name="copyright"
content="Your Name" />
</head>
....
This is because SMIL is XML-based.
Some tags are written:
• <tag> ... </tag>
• <tag />
SMIL Layout
Everything concerning layout (including window settings) is
stored between the <layout> and the </layout> tags in the
header, as shown in the above subsection.
A variety of Layout Tags define the presentation layout:
<smil>
<head>
<layout>
<!-- layout tags -->
</layout>
......
Window settings
You can set width and height for the window in which your
presentation will be rendered with <root-layout>.
The following source will create a window with a 300x200
pixels dimension and also sets the background to be white.
<layout>
<root-layout width="300" height="200"
background-color="white" />
</layout>
Positioning Media
It is really easy to position media with SMIL.
You can position media in 2 ways:
Absolute Positioning — Media are located with offsets from
the origin — the upper left corner of the window.
Relative Positioning — Media are located relative to the window’s
dimensions.
We define position with a <region> tag.
The Region tag —
To insert a media within our presentation we use the <region>
tag.
• must specify the region (the place) where it will be displayed.
• must also assign an id that identifies the region.
Let’s say we want to
• insert the Cardiff icon (533x250 pixels)
• at 30 pixels from the left border and
• at 25 pixels from the top border.
The header becomes:
<smil>
<head>
<layout>
<root-layout width="600" height="300"
background-color="white" />
<region id="cardiff_icon"
left="30" top="25"
width="533" height="250" />
</layout>
</head>
......
The img tag
To insert the Cardiff icon in the region called ”cardiff icon”, we
use the <img> tag as shown in the source below.
Note that the region attribute is a pointer to the <region>
tag.
<head>
<layout>
<root-layout width="600" height="300"
background-color="white" />
<region id="cardiff_icon"
left="30" top="25"
width="533" height="250" />
</layout>
</head>
<body>
<img src="cardiff.gif"
alt="The Cardiff icon"
region="cardiff_icon" />
</body>
This produces the following output:
Relative Position Example
If you wish to display the Cardiff icon at
• 10% from the left border and
• at 5% from the top border, modify the previous source and
replace the left and top attributes.
<head>
<layout>
<root-layout width="600" height="600"
background-color="white" />
<region id="cardiff_icon"
left="10%" top="5%"
width="533" height="250" />
</layout>
</head>
<body>
<img src="cardiff.gif"
region="cardiff_icon" />
</body>
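How a player turns such percentage values into pixel offsets can be sketched as follows (the helper name is ours; real SMIL players handle more value forms than this):

```python
def resolve_offset(value, window_extent):
    """Resolve a SMIL left/top value: '10%' is relative, plain numbers absolute."""
    if isinstance(value, str) and value.endswith("%"):
        return window_extent * int(value[:-1]) // 100
    return int(value)

# root-layout 600x600, region left="10%" top="5%"
print(resolve_offset("10%", 600))  # 60 pixels from the left border
print(resolve_offset("5%", 600))   # 30 pixels from the top border
```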
Overlaying Regions
We have just seen how to position a media along x and y axes
(left and top).
What if two regions overlap ?
• Which one should be displayed on top?
The following code points out the problem:
<smil>
<head>
<layout>
<root-layout width="300" height="200"
background-color="white" />
<region id="region_1" left="50" top="50"
width="150" height="125" />
<region id="region_2" left="25" top="25"
width="100" height="100" />
</layout>
</head>
<body>
<par>
<text src="text1.txt" region="region_1" />
<text src="text2.txt" region="region_2" />
</par>
</body>
</smil>
To ensure that one region is over the other, add z-index
attribute to <region>.
When two regions overlap:
• the one with the greater z-index is on top.
• If both regions have the same z-index, the first rendered one
is below the other.
In the following code, we add z-index to region 1 and
region 2:
<smil>
<head>
<layout>
<root-layout width="300" height="200"
background-color="white" />
<region id="region_1" left="50"
top="50" width="150"
height="125" z-index="2"/>
<region id="region_2" left="25"
top="25" width="100"
height="100" z-index="1"/>
</layout>
</head>
<body>
<par>
<text src="text1.txt" region="region_1" />
<text src="text2.txt" region="region_2" />
</par>
</body>
</smil>
Fitting Media to Regions
You can set the fit attribute of the <region> tag to force
media to be resized etc.
The following values are valid for fit:
• fill — make the media grow and fill the area (distortion is
allowed).
• meet — make the media grow (without any distortion) until it
meets the region frontier.
• slice — the media grows (without distortion) until it fills its
region entirely; any overflow is sliced off.
• scroll — if the media is bigger than its region, it gets scrolled.
• hidden — don't show the media.
You set the value like this:
<region id="region_1" .....
fit="fill" />
Synchronisation
There are two basic ways in which we may want to play media:
• play several media one after the other,
• play several media in parallel.
In order to do this we need to add synchronisation:
• we will need to add time parameters to media elements.
Adding a duration of time to media — dur
To add a duration of time to a media element simply specify a
dur attribute parameter in an appropriate media tag:
.....
<body>
<img src="cardiff.gif"
alt="The Cardiff icon"
region="cardiff_icon" dur="6s" />
</body>
.....
Delaying Media — the begin attribute
To specify a delay, i.e. when to begin, set the begin attribute
parameter in an appropriate media tag.
If you add begin="2s" in the Cardiff image tag, you will see
that the Cardiff icon appears 2 seconds after the document
began and remains for a further 6 seconds. Have a look at
the source:
.....
<body>
<img src="cardiff.gif"
alt="The Cardiff icon"
region="cardiff_icon"
dur="6s" begin="2s" />
</body>
.....
Sequencing Media — the seq tag
Scheduling media:
The <seq> tag is used to define a sequence of media.
• The media are played one after the other:
.....
<seq>
<img src="img1.gif"
region="reg1" dur="6s" />
<img src="img2.gif"
region="reg2"
dur="4s" begin="1s" />
</seq>
.....
So the setting begin="1s" makes img2.gif appear 1 second
after img1.gif finishes.
Parallel Media — the par tag
We use the <par> to play media at the same time:
<par>
<img src="cardiff.gif"
alt="The Cardiff icon"
region="cardiff_icon" dur="6s" />
<audio src="music.au" alt="Some Music"
dur="6s" />
</par>
This will display an image and play some music along with it.
Synchronisation Example 1: Planets Soundtrack
The following SMIL code plays one long soundtrack along with
a series of images.
Essentially:
• The audio file and
The files are stored on the MACINTOSHES in the Multimedia
Lab (in the SMIL folder) as follows:
• planets.html — calls the SMIL source (below) with the SOJA
applet. This demo uses zipped (Sun) audio files (.auz)
which are not supported by RealPlayer.
SMIL HEAD DATA
<smil>
<head>
<layout>
<root-layout height="400" width="600"
background-color="#000000"
title="Dreaming out Loud"/>
<region id="satfam" width="564" height="400"
top="0" left="0" background-color="#000000"
z-index="2" />
<region id="jupfam" width="349" height="400"
top="0" left="251" background-color="#000000"
z-index="2" />
<region id="redsun" width="400" height="400"
top="0" left="100" background-color="#000000"
z-index="2" />
...........
</layout>
</head>
SMIL BODY DATA
<body>
<par>
<audio src="media/dreamworldb.auz"
dur="61.90s" begin="3.00s"
system-bitrate="14000" />
<seq>
<img src="media/satfam1a.jpg" region="satfam"
begin="1.00s" dur="4.50s" />
<img src="media/jupfam1a.jpg" region="jupfam"
begin="1.50s" dur="4.50s" />
<img src="media/redsun.jpg" region="redsun"
begin="1.00s" dur="4.50s" />
........
<img src="media/orion.jpg" region="orion"
begin="1.00s" dur="4.50s" />
<par>
<img src="media/pillarsb.jpg" region="pillars"
begin="1.00s" end="50s" />
<img src="media/blank.gif" region="blank"
begin="2.00s" end="50.00s" />
<text src="media/music.txt" region="music"
begin="3.00s" end="50.00s" />
..........
<text src="media/me.txt" region="me"
begin="20.00s" dur="3.00s" />
<text src="media/jose.txt" region="jose"
begin="23.00s" end="50.00s" />
</par>
<text src="media/title.txt" region="title"
begin="3.00s" end="25.00s" />
</seq>
</par>
</body>
</smil>
Synchronisation Example 2: Slides ’N’ Sound
Dr John Rosbottom of Portsmouth University has come up with
a novel way of giving lectures.
This has
• one long sequence of
<par>
<audio src="audio/leconlec.rm" clip-begin="24s" clip-end="51s" dur="27s"
title="slide 2"/>
<img src="slides/img002.GIF" dur="27s"/>
</par>
............
<par>
<audio src="audio/leconlec.rm" clip-begin="610s"
clip-end="634s" dur="24s" title="The Second Reason"/>
<img src="slides/img018.GIF" clip-begin="610s"
clip-end="634s" dur="24s" title="The Second Reason"/>
</par>
<par>
<audio src="audio/leconlec.rm" clip-begin="634s" clip-end="673s" dur="39s"
title="Slide 19"/>
<img src="slides/img019.GIF" clip-begin="634s" clip-end="673s" dur="39s"
title="Slide 19"/>
</par>
<img src="slides/img006.GIF" fill="freeze" title="And finally..."
author="Abbas Mavani (dis80047@port.ac.uk)"
copyright="Everything is so copyright protected (c)1999"/>
<!-- kept this in to remind me that you can have single things
<audio src="audio/AbbasTest.rm" dur="50.5s"/>
-->
</seq>
</body>
</smil>
SMIL Events
SMIL supports event-based synchronisation:
begin events
• When a media begins, it sends a begin event.
• If another media waits for this event, it catches it.
To make a media wait for an event,
• one of its synchronisation attributes
• (begin or end) should be written as follows:
<!-- if you want a tag to start when
another tag begins -->
<tag begin="id(specifiedId)(begin)" />
.....
</par>
</body>
This will make the next.gif image begin 2s after cardiff.gif
begins.
The switch Tag
The syntax for the switch tag is:
<switch>
<!-- child1 testAttributes1 -->
<!-- child2 testAttributes2 -->
<!-- child3 testAttributes3 -->
</switch>
The rule is:
• The first of the <switch> tag children whose test attributes
are all evaluated to TRUE is executed.
• A tag with no test attributes is evaluated to TRUE.
• See the SMIL reference for a list of valid test attributes.
For example you may wish to provide presentations in English
or Welsh:
<body>
<switch>
<!-- English only -->
<par system-language="en">
<img src="cardiff.gif"
region="cardiff"/>
<audio src="english.au" />
</par>
.....
• All data must be in the form of digital information.
• The data may be in a variety of formats:
– text,
– graphics,
– images,
– audio,
– video.
Synchronisation
A majority of this data is large and the different media may
need synchronisation:
Static and Continuous Media
Static or Discrete Media — Some media is time independent:
Normal data, text, single images, graphics are examples.
Continuous media — Time dependent Media:
Video, animation and audio are examples.
Analog and Digital Signals
• Some basic definitions – Studied HERE
• Overviewing of technology — Studied HERE
• In depth study later.
Analog and Digital Signal Converters
Video
• Input: Analog Video is usually captured by a video camera
and then digitised.
• There are a variety of video (analog and digital) formats
• Raw video can be regarded as being a series of single images.
There are typically 25, 30 or 50 frames per second.
• 512x512 monochrome video images at 25 frames per second take
25 * 0.25 Mb = 6.25 Mb per second
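As a quick sanity check of the figure above, here is a minimal sketch (the function name is mine; it assumes 1 byte per pixel for monochrome and 1 Mb = 2^20 bytes):

```python
# Estimate raw (uncompressed) video data rates.

def video_rate_mb_per_sec(width: int, height: int, fps: int,
                          bytes_per_pixel: int = 1) -> float:
    """Raw data rate in Mb per second (1 Mb = 2**20 bytes)."""
    return width * height * bytes_per_pixel * fps / (1024 * 1024)

rate = video_rate_mb_per_sec(512, 512, 25)  # the slide's example
print(f"512x512 mono @ 25 fps: {rate:.2f} Mb/s")  # 6.25 Mb/s
```

The same calculation shows why raw colour video is so demanding: at 3 bytes per pixel the rate triples.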
• Colour Printer
• Video Output to save Multimedia presentations to (Analog)
Video Tape, CD-ROM DVD.
• Audio Recorder (DAT, DVD, CD-ROM, (Analog) Cassette)
• Storage Medium (Hard Disk, Removable Drives, CD-ROM)
Storage Media
The major problems that affect storage media are:
• Large volume of data
• Real-time delivery
• Data format
• Storage medium
• Retrieval mechanisms
High performance I/O
Data —
• high volume, continuous, contiguous vs distributed storage.
• Direct relationship between size of data and how long it
takes to handle.
• Compression
Data Storage —
• Depends on the storage hardware and
• the nature of the data.
• The following storage parameters affect how data is stored:
– Storage capacity
– Read and write operations of the hardware
– Unit of transfer of read and write
– Physical organisation of storage units
– Read/write heads, cylinders per disk,
tracks per cylinder, sectors per track
– Read time
– Seek time
Data Transfer —
• Depends on how data is generated and
• written to disk, and
• in what sequence it needs to be retrieved.
• Writing/generation of multimedia data is usually
sequential, e.g. streaming digital audio/video direct to disk.
• Individual data (e.g. an audio/video file) is usually streamed.
• A RAID architecture can be employed to accomplish high
I/O rates (parallel disk access).
Operating System Support —
• Scheduling of processes when I/O is initiated.
• Time critical operations can adopt special procedures.
• Direct disk transfer operations free up CPU/operating
system resources.
Basic Storage
RAID — Redundant Array of Inexpensive Disks
Needed:
• To fulfill the needs of current multimedia and other data
hungry application programs,
• Fault tolerance built into the storage device.
• Parallel processing exploits arrangement of hard disks.
Four Ways of Overcoming Reliability Problems
• Mirroring or shadowing of the contents of disk, which can
be a capacity-killing approach to the problem:
– write on two disks - a 100% capacity overhead.
– Reads to disks may however be optimised.
• Horizontal Hamming codes: a special means to
reconstruct information using an error-correction encoding
technique.
• Parity and Reed-Solomon codes: also an error-correction
coding mechanism. Parity may be computed in a number of
ways.
• Failure prediction: there is no capacity overhead in this
technique.
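The parity idea can be sketched in a few lines: the parity block is the XOR of the data blocks, so any single lost block can be rebuilt from the survivors. This is an illustration only (function name and data are mine), not a real RAID implementation:

```python
# XOR parity: parity = d0 ^ d1 ^ d2 ..., so a missing block equals the
# XOR of all remaining blocks plus the parity block.
from functools import reduce

def parity(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"disk0abc", b"disk1def", b"disk2ghi"]
p = parity(data)

# Simulate losing disk 1 and rebuilding it from the survivors plus parity.
rebuilt = parity([data[0], data[2], p])
print(rebuilt)  # b'disk1def'
```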
RAID Architecture
Each disk within the array needs to have its own I/O controller,
but interaction with a host computer may be mediated through
an array controller.
[Diagram: host processor -> host adaptor -> array controller
(manages the control logic and parity) -> individual disk
controllers]
Orthogonal RAID
Possible to combine the disks together to produce a collection
of devices, where
• Each vertical array is now
[Diagram: RAID 3: reads and writes span all drives, with a
dedicated parity drive. RAID 4: simultaneous reads on every
drive, with a dedicated parity drive. RAID 5: simultaneous
reads on every drive; each drive now also handles parity, and
every write must update the parity indicated by the filled
circle.]
Optical Storage
• The most popular storage medium in the multimedia context:
• compact size,
• high-density recording,
• easy handling and
• low cost per MB.
• CD and more recently DVD (ROM) are the most common.
• Laser disc — an older format.
CD Storage
There are now various formats of CD:
• CD-DA (Compact Disc-Digital Audio)
• CD-I (Compact Disc-Interactive)
• CD-ROM/XA (eXtended Architecture)
• Photo CD
The capacity of a CD-ROM is
• 620-700 Mb depending on CD material,
• 650/700 Mb (74/80 mins) is a typical write-once CD-ROM
size.
• Drives that read and write CD-ROMs (CD-RW) are similar.
CD Standards
What are the features of DVD-Video?
Multimedia Data Representation
Issues to be covered:
• Digital Audio
• Graphics/Image Formats
• Digital Video (Next Lecture)
• Sampling/Digitisation
• Compression
Digital Audio
Digital Sampling
Computer Manipulation of Sound
• Volume
• Cross-Fading
• Looping
• Echo/Reverb/Delay
• Filtering
Sample Rates and Bit Size
Nyquist’s Theorem:
• Decibel (dB) a logarithmic measurement of sound
• 16-Bit has a signal-to-noise ratio of 98 dB — virtually
inaudible
• 8-bit has a signal-to-noise ratio of 50 dB
• Therefore, 8-bit is roughly 8 times as noisy
– a 6 dB increment is twice as loud
Implications of Sample Rate and Bit Size (cont)
File Type 44.1 KHz 22.05 KHz 11.025 KHz
16 Bit Stereo 10.1 Mb 5.05 Mb 2.52 Mb
16 Bit Mono 5.05 Mb 2.52 Mb 1.26 Mb
8 Bit Mono 2.52 Mb 1.26 Mb 630 Kb
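The entries in this table follow directly from rate x sample size x channels x duration. A minimal sketch reproducing them (function name is mine; 1 Mb = 2^20 bytes, values rounded as in the table):

```python
# Size of 1 minute of uncompressed PCM audio.

def audio_minutes_mb(rate_hz: int, bits: int, channels: int,
                     seconds: int = 60) -> float:
    """Size in Mb (2**20 bytes) of uncompressed PCM audio."""
    return rate_hz * (bits // 8) * channels * seconds / (1024 * 1024)

for rate in (44100, 22050, 11025):
    stereo16 = audio_minutes_mb(rate, 16, 2)
    mono16 = audio_minutes_mb(rate, 16, 1)
    mono8 = audio_minutes_mb(rate, 8, 1)
    print(f"{rate} Hz: {stereo16:.2f} / {mono16:.2f} / {mono8:.2f} Mb")
```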
Practical Implications of Nyquist Sampling Theory
Why are CD Sample Rates 44.1 KHz?
• Compress Files:
– Could affect live transmission on Web
Streaming Audio
• Buffered Data:
– Trick: get data to the destination before it's needed
– Temporarily store it in memory (buffer)
– Server keeps feeding the buffer
– Client application reads the buffer
• Needs a reliable connection, moderately fast too.
• Specialised client, streaming audio protocol (PNM for
RealAudio).
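The buffering idea above can be illustrated with a toy single-threaded simulation (entirely my own sketch; real streaming clients run the producer and consumer concurrently):

```python
# The server keeps the buffer topped up while the client drains it,
# so playback never starves as long as the feed keeps up.
from collections import deque

def stream(chunks, buffer_target=5):
    """Simulate buffered streaming; returns chunks in playback order."""
    source = iter(chunks)
    buffer = deque()
    played = []
    exhausted = False
    while not exhausted or buffer:
        # Server side: refill the buffer up to the target level.
        while len(buffer) < buffer_target and not exhausted:
            try:
                buffer.append(next(source))
            except StopIteration:
                exhausted = True
        # Client side: read one chunk from the buffer and "play" it.
        if buffer:
            played.append(buffer.popleft())
    return played

print(stream(range(12)))  # chunks come out in order, none lost
```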
Synthetic Sounds — reducing bandwidth?
MIDI
What is MIDI?
• No longer exclusively the domain of musicians.
• MIDI provides a very low bandwidth alternative on the Web:
– transmit musical and
– certain sound effects data
• also now used as a (modified) compression control language
– See the MPEG-4 Section soon
MIDI on the Web
Very low bandwidth (a few hundred Kbytes)
Components of a MIDI System
Synthesizer:
• It is a sound generator (various pitch, loudness, tone colour).
• A good (musician’s) synthesizer often has a microprocessor,
keyboard, control panels, memory, etc.
Sequencer:
• It can be a stand-alone unit or a software program for a
personal computer. (It used to be a storage server for MIDI
data. Nowadays it is more a software music editor on the
computer.)
• It has one or more MIDI INs and MIDI OUTs.
Basic MIDI Concepts
Track:
• Track in sequencer is used to organize the recordings.
• Tracks can be turned on or off on recording or playing back.
Channel:
• MIDI channels are used to separate information in a MIDI
system.
• There are 16 MIDI channels in one cable.
• Channel numbers are coded into each MIDI message.
Timbre:
JJ
• The quality of the sound, e.g., flute sound, cello sound, etc. II
• Multitimbral – capable of playing many different sounds at J
I
the same time (e.g., piano, brass, drums, etc.)
Back
Close
Basic MIDI Concepts (Cont.)
Pitch:
• The Musical note that the instrument plays
Voice:
• Voice is the portion of the synthesizer that produces sound.
• Synthesizers can have many (12, 20, 24, 36, etc.) voices.
• Each voice works independently and simultaneously to produce
sounds of different timbre and pitch.
Patch:
• The control settings that define a particular timbre.
Hardware Aspects of MIDI
MIDI connectors:
– Three 5-pin ports found on the back of every MIDI unit
• MIDI IN: the connector via which the device receives all
MIDI data.
• MIDI OUT: the connector through which the device transmits
all the MIDI data it generates itself.
• MIDI THRU: the connector by which the device echoes the
data it receives from MIDI IN.
MIDI Messages
Classification of MIDI messages:
MIDI Channel Messages:
– messages that are transmitted on individual channels rather
than globally to all devices in the MIDI network.
System Messages:
MIDI System Real-Time Messages
MIDI System Exclusive Messages
• Messages related to things that cannot be standardised:
– System-dependent creation of sound
– System-dependent organisation of sounds
(Not General MIDI compliant? (more soon))
• An addition to the original MIDI specification.
• Just a stream of bytes
– all with their high bits set to 0,
– bracketed by a pair of system exclusive start and end
messages:
F0 — Sysex Start
F7 — Sysex End
– The format of the message byte stream is system dependent.
General MIDI (GM)
General MIDI Instrument Patch Map
Prog No. Instrument Prog No. Instrument
-------------------------- -----------------------------------
(1-8 PIANO) (9-16 CHROM PERCUSSION)
1 Acoustic Grand 9 Celesta
2 Bright Acoustic 10 Glockenspiel
3 Electric Grand 11 Music Box
4 Honky-Tonk 12 Vibraphone
5 Electric Piano 1 13 Marimba
6 Electric Piano 2 14 Xylophone
7 Harpsichord 15 Tubular Bells
8 Clav 16 Dulcimer
General MIDI Percussion Key Map
MIDI Key Drum Sound MIDI Key Drum Sound
-------- ---------- ---------- ----------
......

SAOL (Structured Audio Orchestra Language)
• Pronounced "sail"
• The central part of the Structured Audio toolset.
Close
SAOL Synthesis Methods
.....
SASL (Structured Audio Score Language) supports:
– sections,
– repeats,
– expression evaluation,
– some other things.
– Most SASL scores will be created by automatic tools.
SASBF (Structured Audio Sample Bank Format)
MPEG-4 MIDI Semantics
MPEG-4 Scheduler
AudioBIFS
AudioBIFS (Cont.)
Figure 23: AudioBIFS Subgraph
Graphic/Image File Formats
Common graphics and image file formats:
• http://www.dcs.ed.ac.uk/home/mxr/gfx/ —
comprehensive listing of various formats.
Graphic/Image Data Structures
• A digital image consists of many picture elements, termed
pixels.
• The number of pixels determines the quality of the image
(resolution).
• Higher resolution always yields better quality.
• A bit-map representation stores the graphic/image data in the
same manner that the computer monitor contents are stored
in video memory.
Monochrome/Bit-Map Images
TIFF
• Tagged Image File Format (TIFF), stores many different types
of images (e.g., monochrome, greyscale, 8-bit & 24-bit RGB,
etc.) -> tagged
• Developed by the Aldus Corp. in the 1980s and later
Postscript/Encapsulated Postscript
• A typesetting language which includes text as well as
vector/structured graphics and bit-mapped images
• Used in several popular graphics programs (Illustrator,
FreeHand)
System Dependent Formats
Macintosh: PAINT and PICT
• PAINT was originally used in MacPaint program, initially only
for 1-bit monochrome images.
• PICT format was originally used in MacDraw (a vector based
drawing program) for storing structured graphics
X-windows: XBM
• Primary graphics format for the X Window system
• Supports monochrome (1-bit) bitmaps only
• Many public domain graphic editors, e.g., xv
• Used in X Windows for storing icons, pixmaps, backdrops,
etc.
Colour in Image and Video — Basics of Colour
Light and Spectra
• Visible light is an electromagnetic wave in the 400nm - 700
nm range.
• Most light we see is not one wavelength, it’s a combination
of many wavelengths (Fig. 28).
Figure 28: Light Wavelengths
• The profile above is called a spectrum.
The Human Retina
• The eye is basically similar to a camera
• It has a lens to focus light onto the Retina of eye
• Retina full of neurons
Cones and Perception
• Cones come in 3 types: red, green and blue. Each responds
differently to various frequencies of light. The following figure
shows the spectral-response functions of the cones and the
luminous-efficiency function of the human eye.
Figure 29: Cones and the Luminous-efficiency Function of the
Human Eye
• The colour signal to the brain comes from the response of
the 3 cones to the spectra being observed.
That is, the signal consists of 3 numbers:

R = ∫ E(λ) S_R(λ) dλ
G = ∫ E(λ) S_G(λ) dλ
B = ∫ E(λ) S_B(λ) dλ
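Numerically, each of these integrals is just a weighted sum of the light spectrum against a cone's sensitivity curve. A sketch with made-up coarse samples (all the spectra below are hypothetical, chosen only to illustrate the computation):

```python
# Riemann-sum approximation of R = integral E(l) S_R(l) dl, etc.
# Samples cover 400-700 nm in 50 nm steps.

wavelengths = [400, 450, 500, 550, 600, 650, 700]
E   = [0.2, 0.5, 0.9, 1.0, 0.8, 0.4, 0.1]  # hypothetical light spectrum
S_R = [0.0, 0.1, 0.3, 0.7, 1.0, 0.6, 0.2]  # hypothetical cone sensitivities
S_G = [0.1, 0.4, 0.9, 1.0, 0.5, 0.1, 0.0]
S_B = [0.9, 1.0, 0.4, 0.1, 0.0, 0.0, 0.0]

def response(spectrum, sensitivity, step=50):
    """Approximate the integral of E(l) * S(l) over wavelength."""
    return sum(e * s for e, s in zip(spectrum, sensitivity)) * step

R, G, B = response(E, S_R), response(E, S_G), response(E, S_B)
print(R, G, B)  # the three numbers sent to the brain
```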
Figure 30: Spectral Response
CRT Displays
• CRT displays have three phosphors (RGB) which produce a
combination of wavelengths when excited with electrons.
Figure 33: Reproducing Visible Colour
X = ∫ E(λ) x(λ) dλ
Y = ∫ E(λ) y(λ) dλ
Z = ∫ E(λ) z(λ) dλ
Lab Image Space
......
Besides the RGB representation, YIQ and YUV are the two
models commonly used in video.
YIQ Colour Model
• YIQ is used in colour TV broadcasting; it is downward compatible
with B/W TV.
• Y (luminance) is the CIE Y primary.
Y = 0.299R + 0.587G + 0.114B
• The other two vectors:
I = 0.596R - 0.275G - 0.321B
Q = 0.212R - 0.528G + 0.311B
• The YIQ transform:
[Y]   [ 0.299  0.587  0.114 ] [R]
[I] = [ 0.596 -0.275 -0.321 ] [G]
[Q]   [ 0.212 -0.528  0.311 ] [B]
• I is the red-orange axis, Q is roughly orthogonal to I.
• The eye is most sensitive to Y, next to I, next to Q. In NTSC,
4 MHz is allocated to Y, 1.5 MHz to I, 0.6 MHz to Q.
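Applying the transform above is a plain matrix-vector multiply. A small sketch (the function name is mine); note that a pure-white pixel maps to full luminance with near-zero chrominance, which is exactly the B/W-compatibility property:

```python
# RGB -> YIQ using the matrix from the slide above.

YIQ = [
    [0.299,  0.587,  0.114],
    [0.596, -0.275, -0.321],
    [0.212, -0.528,  0.311],
]

def rgb_to_yiq(r, g, b):
    return tuple(row[0] * r + row[1] * g + row[2] * b for row in YIQ)

y, i, q = rgb_to_yiq(1.0, 1.0, 1.0)   # pure white
print(round(y, 3), round(i, 3), round(q, 3))  # Y = 1, I and Q near 0
```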
YUV Colour Model
[Y]   [ 0.299  0.587  0.114 ] [R]
[U] = [-0.169 -0.331  0.500 ] [G]
[V]   [ 0.500 -0.419 -0.081 ] [B]
Figure 35: The RGB and CMY Cubes
Conversion between RGB and CMY
[C]   [1]   [R]
[M] = [1] - [G]
[Y]   [1]   [B]

[R]   [1]   [C]
[G] = [1] - [M]
[B]   [1]   [Y]
CMYK Color Model
• Sometimes, an alternative CMYK model (K stands for Black)
is used in colour printing (e.g., to produce a darker black than
simply mixing CMY), where
K = min(C, M, Y),
C = C - K,
M = M - K,
Y = Y - K.
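Combining the two slides above (CMY = 1 - RGB, then pulling out the black component K) gives a complete RGB-to-CMYK conversion. A minimal sketch, with the function name my own:

```python
# RGB -> CMY -> CMYK, per the formulas above.

def rgb_to_cmyk(r, g, b):
    """r, g, b in [0, 1]; returns (c, m, y, k)."""
    c, m, y = 1 - r, 1 - g, 1 - b       # CMY = 1 - RGB
    k = min(c, m, y)                    # black is the common component
    return c - k, m - k, y - k, k

print(rgb_to_cmyk(1.0, 0.0, 0.0))  # pure red: (0.0, 1.0, 1.0, 0.0)
print(rgb_to_cmyk(0.2, 0.2, 0.2))  # dark grey: only the black channel
```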
Figure 36: Raster Scanning
NTSC Video
• 525 scan lines per frame, 30 frames per second (or to be
exact, 29.97 fps, 33.37 msec/frame)
• Aspect ratio 4:3
• Interlaced, each frame is divided into 2 fields, 262.5 lines/field
• 20 lines reserved for control information at the beginning of
each field (Fig. 38)
– So a maximum of 485 lines of visible data
– Laser disc and S-VHS have an actual resolution of ~420 lines
– Ordinary TV – ~320 lines
NTSC Video Scan Line
• Each line takes 63.5 microseconds to scan. Horizontal retrace
takes 10 microseconds (with 5 microseconds horizontal synch
pulse embedded), so the active line time is 53.5 microseconds.
NTSC Video Colour Representation/Compression
• Colour representation:
– NTSC uses YIQ colour model.
– Composite = Y + I cos(Fsc t) + Q sin(Fsc t),
where Fsc is the frequency of colour subcarrier
– Basic Compression Idea
Chroma Subsampling
What do these numbers mean?
• 4:2:2 -> Horizontally subsampled colour signals by a factor
of 2. Each pixel is two bytes, e.g., (Cb0, Y0)(Cr0, Y1)(Cb2,
Y2)(Cr2, Y3)(Cb4, Y4) ...
• 4:1:1 -> Horizontally subsampled by a factor of 4
• 4:2:0 -> Subsampled in both the horizontal and vertical axes
by a factor of 2 between pixels.
• 4:1:1 and 4:2:0 are mostly used in JPEG and MPEG (see
later).
Chroma Subsampling in Practice —
Analog/Digital Subsampling
• Analog: simply sample the chroma signal at a lower frequency.
• Digital subsampling: perform 2x2 (or 1x2, or 1x4) chroma
subsampling:
– break the image into 2x2 (or 1x2, or 1x4) pixel blocks and
– only store the average colour information for each 2x2 (or
1x2, or 1x4) pixel group.
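The 2x2 averaging step can be sketched directly (function name and sample data are mine; a real codec would do this on the Cb and Cr planes while keeping luma at full resolution):

```python
# 2x2 chroma subsampling: one averaged value per 2x2 block.

def subsample_2x2(chroma):
    """chroma: 2D list with even dimensions; returns per-block averages."""
    out = []
    for r in range(0, len(chroma), 2):
        row = []
        for c in range(0, len(chroma[0]), 2):
            total = (chroma[r][c] + chroma[r][c + 1] +
                     chroma[r + 1][c] + chroma[r + 1][c + 1])
            row.append(total / 4)
        out.append(row)
    return out

cb = [[10, 14, 20, 20],
      [12, 12, 20, 24],
      [30, 30, 40, 44],
      [30, 34, 44, 40]]
print(subsample_2x2(cb))  # [[12.0, 21.0], [31.0, 42.0]]
```

The 4x4 chroma plane shrinks to 2x2: a factor-of-4 saving on each chroma channel.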
Digital Chroma Subsampling Errors (1)
This sampling process introduces two kinds of errors:
1. The major problem is that color is typically stored at only half
the horizontal and vertical resolution as the original image.
This is not a real problem:
• Recall: The human eye has lower resolving power for
color than for intensity.
• Nearly all digital cameras have lower resolution for color
than for intensity,
so there is no high resolution color information present in
digital camera images.
Digital Chroma Subsampling Errors (2)
• CCIR 601 uses interlaced scan, so each field only has half as
much vertical resolution (e.g., 243 lines in NTSC).
The CCIR 601 (NTSC) data rate is ~165 Mbps.
• CIF (Common Intermediate Format) was introduced as an
acceptable temporary standard.
It delivers about VHS quality. CIF uses progressive
(non-interlaced) scan.
ATSC Digital Television Standard
(ATSC – Advanced Television Systems Committee)
Video Format
The video scanning formats supported by the ATSC Digital
Television Standard are shown in the following table.
Uncompressed Audio
1 minute of Audio:
Video
Can involve: Stream of audio and images
Simple Repetition Suppression
If a series of n successive tokens appears:
• Replace the series with a token and a count of the number of
occurrences.
• Usually need to have a special flag to denote when the
repeated token appears.
For example:
89400000000000000000000000000000000

Run-length Encoding
In this instance:
• Sequences of image elements X1, X2, . . . , Xn (row by row)
• Mapped to pairs (c1, l1), (c2, l2), . . . , (cn, ln),
where ci represents the image intensity or colour and li the
length of the i-th run of pixels
• (Not dissimilar to zero-length suppression above).
Run-length Encoding Example
Original Sequence:
111122233333311112222
This becomes, as (value, run-length) pairs:
(1,4), (2,3), (3,6), (1,4), (2,4)
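The (value, run-length) mapping described above is a one-liner in Python (a sketch; the function name is mine):

```python
# Run-length encode a sequence into (value, run-length) pairs.
from itertools import groupby

def rle(seq):
    return [(value, len(list(run))) for value, run in groupby(seq)]

print(rle("111122233333311112222"))
# [('1', 4), ('2', 3), ('3', 6), ('1', 4), ('2', 4)]
```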
• Lossless Compression frequently involves some form of
entropy encoding
• Based on information theoretic techniques.
Basics of Information Theory
According to Shannon, the entropy of an information source S
is defined as:

H(S) = η = Σ_i p_i log2(1/p_i)

where p_i is the probability that symbol S_i in S will occur.
• log2(1/p_i) indicates the amount of information contained in
S_i, i.e., the number of bits needed to code S_i.
• For example, in an image with a uniform distribution of
gray-level intensity, i.e. p_i = 1/256:
– The number of bits needed to code each gray level is 8
bits.
– The entropy of this image is 8.
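The definition above translates directly into code; the uniform grey-level example from the slide comes out to exactly 8 bits per symbol (a sketch; the function name is mine):

```python
# Shannon entropy: H(S) = sum_i p_i * log2(1 / p_i)
import math

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([1 / 256] * 256))  # uniform 256-level image: 8.0
print(entropy([0.5, 0.5]))       # a fair coin: 1.0
```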
The Shannon-Fano Algorithm — Learn by Example
This is a basic information theoretic algorithm.
Symbol A B C D E
----------------------------------
Count 15 7 6 6 5
Encoding for the Shannon-Fano Algorithm:
• A top-down approach
1. Sort symbols (Tree Sort) according to their
frequencies/probabilities, e.g., ABCDE.
2. Recursively divide into two parts, each with approx. same
number of counts.
3. Assemble code by depth first traversal of tree to symbol
node
Symbol Count log(1/p) Code Subtotal (# of bits)
------ ----- -------- --------- -------------------
A 15 1.38 00 30
B 7 2.48 01 14
C 6 2.70 10 12
D 6 2.70 110 18
E 5 2.96 111 15
TOTAL (# of bits): 89
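The recursive split can be sketched as follows; on the table above it reproduces the codes 00, 01, 10, 110, 111 and the 89-bit total (a sketch under the usual "split at the most balanced point" rule; function names are mine):

```python
# Shannon-Fano: sort by count descending, then recursively split into
# two halves whose total counts are as equal as possible.

def shannon_fano(symbols):
    """symbols: list of (symbol, count), sorted by count descending."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(c for _, c in symbols)
    best_diff, split = None, 1
    for i in range(1, len(symbols)):           # find the most balanced split
        left_count = sum(c for _, c in symbols[:i])
        diff = abs(left_count - (total - left_count))
        if best_diff is None or diff < best_diff:
            best_diff, split = diff, i
    codes = {s: "0" + c for s, c in shannon_fano(symbols[:split]).items()}
    codes.update({s: "1" + c for s, c in shannon_fano(symbols[split:]).items()})
    return codes

table = [("A", 15), ("B", 7), ("C", 6), ("D", 6), ("E", 5)]
codes = shannon_fano(table)
total_bits = sum(count * len(codes[s]) for s, count in table)
print(codes, total_bits)  # 89 bits, matching the table
```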
Huffman Coding
• Based on the frequency of occurrence of a data item
(pixels or small blocks of pixels in images).
• Use a lower number of bits to encode more frequent data
• Codes are stored in a Code Book — as for Shannon (previous
slides)
• Code book constructed for each image or a set of images.
• Code book plus encoded data must be transmitted to enable
decoding.
Encoding for Huffman Algorithm:
• A bottom-up approach
1. Initialization: Put all nodes in an OPEN list, keep it sorted
at all times (e.g., ABCDE).
2. Repeat until the OPEN list has only one node left:
(a) From OPEN pick two nodes having the lowest
frequencies/probabilities, create a parent node of them.
(b) Assign the sum of the children’s frequencies/probabilities
to the parent node and insert it into OPEN.
(c) Assign code 0, 1 to the two branches of the tree, and
delete the children from OPEN.
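The bottom-up procedure above maps naturally onto a min-heap as the OPEN list. A sketch (function name is mine; ties are broken by insertion order, so individual codes may differ from a hand-built tree, but the total bit count is the optimal one). On the Shannon-Fano table from earlier this gives 87 bits, two better than Shannon-Fano's 89:

```python
# Bottom-up Huffman construction using a heap as the OPEN list.
import heapq
from itertools import count

def huffman(freqs):
    """freqs: dict symbol -> count. Returns dict symbol -> code string."""
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two lowest-frequency nodes
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}   # branch codes
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

freqs = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = huffman(freqs)
bits = sum(freqs[s] * len(c) for s, c in codes.items())
print(codes, bits)  # 87 bits in total
```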
Huffman Entropy
Huffman Coding of Images
In order to encode images:
• Divide the image up into (typically) 8x8 blocks
• Each block is a symbol to be coded
.....

Adaptive Huffman Coding
The encoder and decoder build and update the same model as
the data is processed:

ENCODER                          DECODER
-------                          -------
Initialize_model();              Initialize_model();
while ((c = getc (input))        while ((c = decode (input))
       != eof)                          != eof)
{                                {
    encode (c, output);              putc (c, output);
    update_model (c);                update_model (c);
}                                }
• Key: encoder and decoder use same initialization and
update model routines.
• update model does two things:
(a) increment the count,
(b) update the Huffman tree.
Arithmetic Coding
• A widely used entropy coder
• Also used in JPEG — more soon
• Its only problem is speed, due to the possibly complex
computations required by large symbol tables.
• Good compression ratio (better than Huffman coding),
entropy around the Shannon ideal value.
Why better than Huffman?
• Huffman coding etc. use an integer number (k) of bits for
each symbol,
– hence k is never less than 1.
• Sometimes, e.g., when sending a 1-bit image, compression
becomes impossible.
Decimal Static Arithmetic Coding
Basic Idea
The idea behind arithmetic coding is:
• To have a probability line, 0-1, and
• Assign to every symbol a range on this line based on its
probability,
• The higher the probability, the higher the range assigned to
it.
Example message: BACA
Therefore:
• Sort symbols highest probability first
Symbol Range
A [0.0, 0.5)
B [0.5, 0.75)
C [0.75, 1.0)
The first symbol in our example stream is B
• We now know that the code will be in the range 0.5 to 0.74999 . . .
Range is not yet unique
• Need to narrow down the range to give us a unique code.
Subdivide the range as follows
For all the symbols:
• Range = high - low
• High = low + range * high range of the symbol being coded
• Low = low + range * low range of the symbol being coded
Symbol Range
BA [0.5, 0.625)
BB [0.625, 0.6875)
BC [0.6875, 0.75)
Third Iteration
Symbol Range
BAA [0.5, 0.5625)
BAB [0.5625, 0.59375)
BAC [0.59375, 0.625)
Fourth Iteration
Subdivide again
(Range = 0.03125, Low = 0.59375, High = 0.625):
Symbol Range
BACA [0.59375, 0.609375)
BACB [0.609375, 0.6171875)
BACC [0.6171875, 0.625)
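The interval-narrowing iterations above can be replayed mechanically (a sketch; function name is mine, and the symbol ranges are the ones from the example). The final interval for "BACA" matches the table exactly:

```python
# Arithmetic-coding interval narrowing for the example message "BACA".

RANGES = {"A": (0.0, 0.5), "B": (0.5, 0.75), "C": (0.75, 1.0)}

def encode_interval(message):
    low, high = 0.0, 1.0
    for sym in message:
        span = high - low
        s_low, s_high = RANGES[sym]
        high = low + span * s_high   # narrow the interval to the
        low = low + span * s_low     # symbol's sub-range
    return low, high

print(encode_interval("BACA"))
# (0.59375, 0.609375): any number inside this interval encodes "BACA"
```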
Binary Static Arithmetic Coding
This is very similar to the above:
• except we use binary fractions.
Binary fractions are simply an extension of the binary system
into fractions, much like decimal fractions.
Binary Fractions — Quick Guide
Fractions in decimal:
.....
So in binary, for the example message XXY we get:
prob(X) = 2/3
prob(Y) = 1/3
• To encode message, just send enough bits of a binary fraction
that uniquely specifies the interval.
• Similarly, we can map all possible length 3 messages to
intervals in the range [0..1]:
Implementation Issues
FPU Precision
• The resolution of the number we represent is limited by
FPU precision.
• Binary coding is an extreme example of rounding;
• decimal coding is the other extreme — theoretically no
rounding.
• Some FPUs may use up to 80 bits.
• As an example let us consider working with 16-bit resolution.
16-bit arithmetic coding
Estimating Probabilities — Dynamic Arithmetic Coding?
How to determine probabilities?
• If we have a static stream we simply count the tokens.
• Could use a priori information for static or dynamic if the
scenario is familiar.

Dictionary-Based Compression
Solution:
• Find a way to build the dictionary adaptively.
• Original methods (LZ) due to Lempel and Ziv in 1977/8.
• Terry Welch improved the scheme in 1984:
the patented LZW algorithm.
LZW Compression Algorithm
The LZW Compression Algorithm can summarised as follows:
w = NIL;
while ( read a character k )
{
.....
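The loop body elided above follows the standard LZW scheme: w accumulates the longest string already in the dictionary; on a miss, the code for w is output and w+k is added as a new dictionary entry. A runnable sketch (function name and test string are mine):

```python
# LZW compression: emit codes for the longest known prefixes while
# growing the dictionary with each newly seen string.

def lzw_compress(text):
    dictionary = {chr(i): i for i in range(256)}   # single characters
    w, out = "", []
    for k in text:
        if w + k in dictionary:
            w += k                               # keep extending the match
        else:
            out.append(dictionary[w])            # emit code for w
            dictionary[w + k] = len(dictionary)  # add new entry
            w = k
    if w:
        out.append(dictionary[w])
    return out

codes = lzw_compress("WABBAWABBA")
print(len("WABBAWABBA"), "symbols ->", len(codes), "codes:", codes)
```

Note how the second occurrence of "WA" and "BB" each come out as a single dictionary code.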
Entropy Encoding Summary
• Huffman maps fixed-length symbols to variable-length codes.
Optimal only when symbol probabilities are (negative) powers of 2.
• Arithmetic maps an entire message to a real-number range based
on statistics. Theoretically optimal for long messages, but
optimality depends on the data model. It can also be CPU/memory
intensive.
• Lempel-Ziv-Welch is a dictionary-based compression method.
It maps a variable number of symbols to a fixed length code.
• Adaptive algorithms do not need a priori estimation of
probabilities, so they are more useful in real applications.
Lossy Compression: Source Coding Techniques
Source coding is based on changing the content of the original
signal.
1. Take the top-left pixel as the base value for the block, pixel A.
2. Calculate three other transformed values by taking the
difference between these (respective) pixels and pixel A,
i.e. B−A, C−A, D−A.
3. Store the base pixel and the differences as the values of the
transform.
Simple Transforms
Given the above we can easily form the forward transform:

X0 = A
X1 = B − A
X2 = C − A
X3 = D − A

and the inverse transform:

An = X0
Bn = X1 + X0
Cn = X2 + X0
Dn = X3 + X0
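The forward and inverse transforms above are direct to code. A minimal sketch, with the 2x2 block given as a flat list [A, B, C, D]; function names are illustrative.

```python
def forward(block):
    """Forward transform of a 2x2 block [A, B, C, D]:
    base value plus three differences from the base."""
    A, B, C, D = block
    return [A, B - A, C - A, D - A]

def inverse(x):
    """Inverse transform: add the base value back onto each difference."""
    x0, x1, x2, x3 = x
    return [x0, x1 + x0, x2 + x0, x3 + x0]
```

With the block from the example below (120, 130, 125, 120), `forward` gives [120, 10, 5, 0], and `inverse` recovers the original block exactly.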
Compressing data with this Transform?
Exploit redundancy in the data:
• Redundancy is transformed into the values X_i.
• Compress the data by using fewer bits to represent the
differences.
– i.e. if we use 8 bits per pixel then the 2x2 block uses 32
bits.
– If we keep 8 bits for the base pixel, X0,
– and assign 4 bits for each difference, then we only use 20 bits.
– That is an average of 5 bits/pixel — better than 8.
Example
Consider the following 2x2 image block:

120 130
125 120

then we get:
X0 = 120
X1 = 10
X2 = 5
X3 = 0

We can then compress these values by using fewer bits to
represent the data.
Inadequacies of Simple Scheme
• It is too simple.
• Needs to operate on larger blocks (typically 8x8 minimum).
• Simple encoding of differences for large values will result in
loss of information:
– Very poor losses are possible here: 4 bits per pixel gives values
0–15 unsigned,
– or a signed value range of −7 to 7, so either quantise in multiples
of 255/max value or suffer massive overflow!
• More advanced transform encoding techniques are very
common — e.g. the DCT.
Frequency Domain Methods
1D Example
Let's consider a 1D (e.g. audio) example to see what the different
domains mean:
An 8 Hz Sine Wave (Cont.)
2D Image Example
Now images are no more complex really:
• Brightness along a line can be recorded as a set of values
measured at equally spaced distances apart,
• or equivalently, as a set of spatial frequency values.
• Each of these frequency values is a frequency component.
• An image is a 2D array of pixel measurements.
• We form a 2D grid of spatial frequencies.
• A given frequency component now specifies what contribution
is made by data which is changing at the specified spatial
frequencies in the x and y directions.
What do frequencies mean in an image?
• Large values at high-frequency components mean the data
is changing rapidly on a short distance scale,
e.g. a page of text.
• Large low-frequency components mean the large-scale features
of the picture are more important,
e.g. a single fairly simple object which occupies most of the
image.
So how do we compress (colour) images?
• The 2D matrix of the frequency content describes the
colour/chrominance:
• This shows whether values are changing rapidly or slowly.
• Where the value in the frequency matrix is low,
the colour is changing gradually.
• The human eye is insensitive to gradual changes in colour and
sensitive to intensity.
• So we can ignore gradual changes in colour.
• Basic Idea: attempt to throw away data without the human
eye noticing, we hope.
How can the Frequency Domain Transforms Help to Compress?
Any function (signal) can be decomposed into purely sinusoidal
components (sine waves of different size/shape) which when
added together make up our original signal.
Figure 39: DFT of a Square Wave
Thus transforming a signal into the frequency domain allows
us:
• to see what sine waves make up our underlying signal,
• e.g.
– one part sinusoidal wave at 50 Hz and
– a second part sinusoidal wave at 200 Hz.
More complex signals will give more complex graphs but the
idea is exactly the same. The graph of the frequency domain is
called the frequency spectrum.
Visualising this: Think Graphic Equaliser
Fourier Theory
f(x) = ∫_{−∞}^{∞} F(u) e^{2πixu} du.    (2)
Example Fourier Transform
Let’s see how we compute a Fourier Transform: consider a
particular function f (x) defined as
f(x) = 1 if |x| ≤ 1, 0 otherwise.    (3)

Figure 41: A top hat function
So its Fourier transform is:
F(u) = ∫_{−∞}^{∞} f(x) e^{−2πixu} dx
     = ∫_{−1}^{1} 1 × e^{−2πixu} dx
     = (1/(2πiu)) (e^{2πiu} − e^{−2πiu})
     = sin(2πu) / (πu).    (4)
In this case F (u) is purely real, which is a consequence of the
original data being symmetric in x and −x.
Figure 42: Fourier transform of a top hat function
2D Case
If f (x, y) is a function, for example the brightness in an image,
its Fourier transform is given by
F(u, v) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) e^{−2πi(xu+yv)} dx dy,    (5)

f(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} F(u, v) e^{2πi(xu+yv)} du dv.    (6)
Images are digitised !!
Thus, we need a discrete formulation of the Fourier transform,
which takes such regularly spaced data values, and returns the
value of the Fourier transform for a set of values in frequency
space which are equally spaced,

and

f(x, y) = Σ_{u=0}^{N−1} Σ_{v=0}^{M−1} F(u, v) e^{2πi(xu/N + yv/M)}.    (10)
Balancing the 2D DFT
Often N = M, and it is then more convenient to redefine
F(u, v) by multiplying it by a factor of N, so that the forward and
inverse transforms are more symmetrical:

F(u, v) = (1/N) Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} f(x, y) e^{−2πi(xu+yv)/N},    (11)

and

f(x, y) = (1/N) Σ_{u=0}^{N−1} Σ_{v=0}^{N−1} F(u, v) e^{2πi(xu+yv)/N}.    (12)
Compression
How do we achieve compression?
• Low pass filter — ignore high frequency noise components
• Only store lower frequency components
Relationship between DCT and FFT
DCT (Discrete Cosine Transform) is actually a cut-down version
of the FFT:
• Only the real part of FFT
• Computationally simpler than FFT
• DCT — Effective for Multimedia Compression
• DCT MUCH more commonly used in Multimedia.
The Discrete Cosine Transform (DCT)
• Similar to the discrete Fourier transform:
– it transforms a signal or image from the spatial domain to
the frequency domain
– DCT can approximate lines well with fewer coefficients.

f(i) = F^{−1}(u)
     = sqrt(2/N) Σ_{u=0}^{N−1} Λ(u) · cos[ π·u (2i + 1) / (2·N) ] · F(u)

where
Λ(u) = 1/√2 for u = 0; 1 otherwise.
2D DCT
For a 2D N by M image the 2D DCT is defined as:

F(u, v) = sqrt(2/N) · sqrt(2/M) Σ_{i=0}^{N−1} Σ_{j=0}^{M−1} Λ(u)·Λ(v) ·
          cos[ π·u (2i + 1) / (2·N) ] · cos[ π·v (2j + 1) / (2·M) ] · f(i, j)

and the inverse transform is:

f(i, j) = F^{−1}(u, v)
        = sqrt(2/N) · sqrt(2/M) Σ_{u=0}^{N−1} Σ_{v=0}^{M−1} Λ(u)·Λ(v) ·
          cos[ π·u (2i + 1) / (2·N) ] · cos[ π·v (2j + 1) / (2·M) ] · F(u, v)

where
Λ(ξ) = 1/√2 for ξ = 0; 1 otherwise.
Performing DCT Computations
The basic operation of the DCT is as follows:
• The input image is N by M;
• f(i,j) is the intensity of the pixel in row i and column j;
Computational Issues (1)
• Image is partitioned into 8 x 8 regions — The DCT input is
an 8 x 8 array of integers.
• An 8-point DCT would be:

F(u, v) = (1/4) Σ_{i,j} Λ(u)·Λ(v) · cos[ π·u (2i + 1) / 16 ] ·
          cos[ π·v (2j + 1) / 16 ] · f(i, j)

where
Λ(ξ) = 1/√2 for ξ = 0; 1 otherwise.

• The output array of DCT coefficients contains integers; these can
range from −1024 to 1023.
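The 1D version of the orthonormal DCT is easy to code directly from the formula. A minimal sketch (a 2D DCT is then two 1D passes, as the factored computation in a later figure shows); the function name is illustrative.

```python
import math

def dct1d(f):
    """Orthonormal 1D DCT-II:
    F(u) = sqrt(2/N) * L(u) * sum_i f(i) cos((2i+1) u pi / (2N)),
    with L(0) = 1/sqrt(2) and L(u) = 1 otherwise."""
    N = len(f)
    out = []
    for u in range(N):
        lam = 1 / math.sqrt(2) if u == 0 else 1.0
        s = sum(f[i] * math.cos((2 * i + 1) * u * math.pi / (2 * N))
                for i in range(N))
        out.append(math.sqrt(2 / N) * lam * s)
    return out
```

A constant 8-sample input produces a single non-zero DC coefficient, and the transform preserves energy (it is orthogonal), which is easy to check numerically.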
Computational Issues (2)
• Computationally easier to implement and more efficient to
regard the DCT as a set of basis functions:
– Given a known input array size (8 x 8), these can be precomputed
and stored.
– Compute values for a convolution mask (8 x 8 window)
that gets applied:
∗ Sum the values × pixels where the window overlaps the image;
apply the window across all rows/columns of the image.
– The values are simply calculated from the DCT formula.
Computational Issues (3)
Visualisation of DCT basis functions
Figure 44: The 64 (8 x 8) DCT basis functions
Computational Issues (4)
Figure 45: 2x1D Factored 2D DCT Computation
Computational Issues (5)
Actual Data: 9 10 7 6
Predicted Data: 0 9 10 7
Differential Encoding Methods (Cont.)
• Delta modulation is a special case of DPCM:
– Same predictor function,
– Coding error is a single bit or digit that indicates the
current sample should be increased or decreased by a
step.
– Not suitable for rapidly changing signals.
• Adaptive pulse code modulation
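The special case above can be sketched in a few lines: each output bit says only whether the running prediction should be raised or lowered by a fixed step. A minimal sketch; the function name and fixed-step predictor are illustrative.

```python
def delta_modulate(samples, step=1):
    """1-bit delta modulation: emit one bit per sample indicating
    whether the running prediction is raised or lowered by `step`.
    Returns (bits, reconstruction seen by the decoder)."""
    pred = 0
    bits, recon = [], []
    for s in samples:
        bit = 1 if s > pred else 0   # single-bit coding "error"
        pred += step if bit else -step
        bits.append(bit)
        recon.append(pred)
    return bits, recon
```

On a slowly varying input the reconstruction tracks the signal closely; a rapidly changing signal outruns the fixed step, which is why delta modulation is not suitable for it.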
Compression II: Images (JPEG)
What is JPEG?
• JPEG: Joint Photographic Experts Group — an international
standard since 1992.
• Works with colour and greyscale images.
• Up to 24-bit colour images (unlike GIF).
• Target: photographic quality images (unlike GIF).
• Suitable for many applications, e.g. satellite, medical, general
photography...
Basic JPEG Compression Pipeline
JPEG compression involves the following:
• Encoding
Figure 46: JPEG Encoding

• Decoding — reverse the order of encoding.
Major Coding Algorithms in JPEG
The Major Steps in JPEG Coding involve:
• Colour Space Transform and subsampling (YIQ)
• DCT (Discrete Cosine Transformation)
• Quantization
• Zigzag Scan
• DPCM on DC component
• RLE on AC Components
• Entropy Coding — Huffman or Arithmetic
We have met most of the algorithms already:
• JPEG exploits them in the compression pipeline to achieve
maximal overall compression.
Quantization
Why do we need to quantise?
• To throw out bits from the DCT.
• Example: 101101 = 45 (6 bits).
Truncate to 4 bits: 1011 = 11.
Truncate to 3 bits: 101 = 5.
• Quantisation error is the main source of lossy compression:
• the DCT itself is not lossy;
• how we throw away bits in the quantisation step is lossy.
Uniform quantization
Quantization Tables
• In JPEG, each F[u,v] is divided by a constant q(u,v).
• Table of q(u,v) is called quantization table.
• Eye is most sensitive to low frequencies (upper left corner),
less sensitive to high frequencies (lower right corner).
• Standard defines 2 default quantization tables, one for
luminance (below), one for chrominance.

----------------------------------
16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99
----------------------------------
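The table-driven step is just element-wise divide-and-round (and multiply back on decode). A minimal sketch with a small illustrative 2x2 table and coefficients rather than the full 8x8 case; function names are illustrative.

```python
def quantise(F, q):
    """JPEG-style quantisation: each DCT coefficient F[u][v] is divided
    by the table entry q[u][v] and rounded — this is the lossy step."""
    return [[round(F[u][v] / q[u][v]) for v in range(len(q[0]))]
            for u in range(len(q))]

def dequantise(Fq, q):
    """Decoder side: multiply back by the table; rounding loss remains."""
    return [[Fq[u][v] * q[u][v] for v in range(len(q[0]))]
            for u in range(len(q))]
```

Note that dequantising does not recover the original coefficients exactly; the residual rounding error is exactly where the loss lives.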
Quantization Tables (Cont)
E.g., if we doubled them all?
Differential Pulse Code Modulation (DPCM) on DC component
Run Length Encode (RLE) on AC components
Yet another simple compression technique is applied to the AC
component:
• 1x64 vector has lots of zeros in it
• Encode as (skip, value) pairs, where skip is the number of
zeros and value is the next non-zero component.
• Send (0,0) as end-of-block sentinel value.
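The (skip, value) scheme above is a few lines of code. A minimal sketch; the function name is illustrative.

```python
def rle_ac(coeffs):
    """Encode an AC coefficient vector as (skip, value) pairs, where
    skip counts the zeros before the next non-zero value;
    (0, 0) is the end-of-block sentinel."""
    out = []
    skip = 0
    for c in coeffs:
        if c == 0:
            skip += 1
        else:
            out.append((skip, c))
            skip = 0
    out.append((0, 0))  # end-of-block
    return out
```

For example, [0, 0, 5, 0, 3, 0, 0, 0] encodes as (2, 5), (1, 3), (0, 0): trailing zeros are absorbed by the sentinel, which is where the compression comes from.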
Entropy Coding
DC and AC components finally need to be represented by a
smaller number of bits:
• Categorize DC values into SSS (the number of bits needed to
represent the value) and the actual bits.
--------------------
Value SSS
0 0
-1,1 1
-3,-2,2,3 2
-7..-4,4..7 3
--------------------
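The SSS category in the table is simply the bit length of the magnitude, which makes it a one-liner. A minimal sketch; the function name is illustrative.

```python
def sss(value):
    """SSS category = number of bits needed for the magnitude of a DC
    value; equals the bit length of |value| (0 for value 0)."""
    return abs(value).bit_length()
```

This reproduces the table rows: 0 → 0; ±1 → 1; ±2..3 → 2; ±4..7 → 3, and so on.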
Another Enumerated Example
JPEG 2000
• New version released in 2002.
• Based on:
– discrete wavelet transform (DWT), instead of DCT,
– scalar quantization,
– context modeling,
– arithmetic coding,
– post-compression rate allocation.
• Application: variety of uses, ranging from digital photography
to medical imaging to advanced digital scanning and printing.
• Higher compression efficiency — visually lossless
compression at 1 bit per pixel or better.
Further Information
Basic JPEG Information:
• http://www.jpeg.org
• Online JPEG Tutorial
For more information on the JPEG 2000 standard for still image
coding, refer to
http://www.jpeg.org/JPEG2000.htm
Compression III:
Video Compression (MPEG and others)
We need to compress video (more so than audio/images) in
practice since:
1. Uncompressed video (and audio) data are huge.
In HDTV, the bit rate easily exceeds 1 Gbps — big problems
for storage and network communications.
E.g. HDTV: 1920 x 1080 at 30 frames per second, 8 bits per
RGB (YCrCb actually) channel = 1.5 Gbps.
2. Lossy methods have to be employed since the compression ratio
of lossless methods (e.g., Huffman, Arithmetic, LZW) is not
high enough for image and video compression, especially
when the distribution of pixel values is relatively flat.
Not the complete picture is studied here — we focus on video
compression:
• Earlier H.261 and MPEG 1 and 2 standards.
Compression Standards(1)
Image, video and audio compression standards have been
specified and released by two main groups since 1985:
ISO — International Standards Organisation: JPEG, MPEG.
ITU — International Telecommunications Union: H.261–H.264.
Compression Standards (2)
Whilst in many cases the groups have specified separate
standards, there is some crossover between them.
Simple Motion Estimation/Compensation Example
Simple Motion Example (Cont.)
Consider a simple image (block) of a moving circle.
Now lets Estimate Motion of blocks
We will examine methods of estimating motion vectors in due
course.
Figure 47: Motion estimation/compensation (encoding)
Decoding Motion of blocks
H.261 Compression
The basic approach to H.261 compression is summarised as
follows:
H.261 compression has been specifically designed for video
telecommunication applications:
Overview of H.261
• Frame types are CCIR 601 CIF (352x288) and
QCIF (176x144) images with 4:2:0 subsampling.
• Two frame types:
Intraframes (I-frames) and Interframes (P-frames)
• I-frames use basically JPEG — but YUV (YCrCb), larger DCT windows and
different quantisation.
• I-frames provide us with a (re)fresh access point — key frames.
• P-frames use pseudo-differences from previous frame (predicted), so frames
depend on each other.
Intra Frame Coding
• Various lossless and lossy compression techniques are used.
• Compression is contained only within the current frame.
• Simpler coding — not enough by itself for high compression.
Intraframe coding is very similar to that of a JPEG still image
video encoder:
The basic Intra Frame Coding Scheme is as follows:
• Macroblocks are typically 16x16 pixel areas on Y plane of
original image.
• A macroblock usually consists of 4 Y blocks, 1 Cr block, and
1 Cb block (4:2:0 chroma subsampling).
– The eye is most sensitive to luminance, less sensitive to chrominance.
– So operate in an effective colour space: YUV (YCbCr)
colour, which we have met.
– Typical to use 4:2:0 macroblocks: one quarter of the
chrominance information is used.
• Quantization is by a constant value for all DCT coefficients,
i.e., no quantization table as in JPEG.
The Macroblock is coded as follows:
P-coding can be summarised as follows:
A Coding Example (P-frame)
• Previous image is called reference image.
• Image to code is called target image.
• Actually, the difference is encoded.
• Subtle points:
1. Need to use decoded image as reference image,
not original. Why?
2. We’re using ”Mean Absolute Difference” (MAD) to decide
best block.
Can also use ”Mean Squared Error” (MSE) = sum(E*E)
Hard Problems in H.261
There are however a few difficult problems in H.261:
• Motion vector search
• Propagation of Errors
• Bit-rate Control
Motion Vector Search
• C(x + k, y + l) — pixels in the macroblock with upper left
corner (x, y) in the Target.
• R(x + i + k, y + j + l) — pixels in the macroblock with upper
left corner (x + i, y + j) in the Reference.
• Cost function is the Mean Absolute Error:

MAE(i, j) = (1/N²) Σ_{k=0}^{N−1} Σ_{l=0}^{N−1} | C(x + k, y + l) − R(x + i + k, y + j + l) |
Bit-rate Control
• Simple feedback loop based on ”buffer fullness”
If buffer is too full, increase the quantization scale factor to
reduce the data.
MPEG Compression
MPEG stands for:
• Motion Picture Experts Group — established circa 1990 to
create standards for delivery of audio and video.
• MPEG-1 (1991). Target: VHS quality on a CD-ROM (320 x
240 + CD audio @ 1.5 Mbits/sec).
• MPEG-2 (1994). Target: television broadcast.
• MPEG-3: HDTV, but subsumed into an extension of MPEG-2.
• MPEG-4 (1998): very low bitrate audio-visual coding.
• MPEG-7 (2001): "Multimedia Content Description Interface".
• MPEG-21 (2002): "Multimedia Framework".
Three Parts to MPEG
• The MPEG standard had three parts:
1. Video: based on H.261 and JPEG
2. Audio: based on MUSICAM technology
3. System: control interleaving of streams
MPEG Video
MPEG compression is essentially an attempt to overcome some
shortcomings of H.261 and JPEG:
• Recall H.261 dependencies:
• The problem here is that many macroblocks need information
that is not in the reference frame.
• For example:
• The MPEG solution is to add a third frame type which is a
bidirectional frame, or B-frame
• B-frames search for macroblock in past and future frames.
• Typical pattern is IBBPBBPBB IBBPBBPBB IBBPBBPBB
Actual pattern is up to encoder, and need not be regular.
MPEG Video Layers (1)
MPEG video is broken up into a hierarchy of layers to help
• Error handling,
• Random search and editing, and
MPEG Video Layers (2)
From the top level, the layers are
Video sequence layer — any self-contained bitstream.
For example a coded movie or advertisement.
Group of pictures — composed of one or more groups of intra (I)
frames and/or non-intra (P and/or B) pictures.
Picture layer — the picture itself.
Slice layer — the layer beneath the Picture layer is called the slice layer.
Slice Layer
• Each slice is a contiguous sequence of raster-ordered
macroblocks,
• each macroblock ordered on a row basis in typical video
applications.
• Each macroblock is a 16x16 array of
– luminance pixels, or
– picture data elements, with two 8x8 arrays of associated
chrominance pixels.
• Macroblocks may be further divided into distinct 8x8 blocks
for further processing, such as transform coding.
Coding Layers in Macroblock
• Each of the layers has its own unique 32-bit start code:
– 23 zero bits followed by a one, then followed by
– 8 bits for the actual start code.
– Start codes may have as many zero bits as desired
preceding them.
B-Frames
New from H.261
• MPEG uses forward/backward interpolated prediction.
• Frames are commonly referred to as bi-directional interpolated
prediction frames, or B-frames for short.
Example I, P, and B frames
Consider a group of pictures that lasts for 6 frames:
• Given
I,B,P,B,P,B,I,B,P,B,P,B,

Figure 50: B-Frame Encoding
Also NOTE:
• No defined limit to the number of consecutive B frames that
may be used in a group of pictures,
• Optimal number is application dependent.
• Most broadcast quality applications however, have tended
to use 2 consecutive B frames (I,B,B,P,B,B,P,) as the ideal
trade-off between compression efficiency and video quality.
Advantage of the usage of B frames
• Coding efficiency.
• Most B frames use less bits.
• Quality can also be improved in the case of moving objects
that reveal hidden areas within a video sequence.
• Better error behaviour: since B-frames are not used to predict
future frames, errors generated will not be propagated further
within the sequence.
Disadvantage:
• Frame reconstruction memory buffers within the encoder and
decoder must be doubled in size to accommodate the 2
anchor frames.
Motion Estimation
• The temporal prediction technique used in MPEG video is
based on motion estimation.
Motion Vectors, Matching Blocks
Figure 52 shows an example of a particular macroblock from
Frame 2 of Figure 51, relative to various macroblocks of Frame
1.
• The top frame has a bad match with the macroblock to be coded.
• The middle frame has a fair match, as there is some commonality between the
2 macroblocks.
• The bottom frame has the best match, with only a slight error between the 2
macroblocks.
• Because a relatively good match has been found, the encoder assigns motion
vectors to that macroblock.
• Each forward and backward predicted macroblock may contain 2 motion vectors;
• true bidirectionally predicted macroblocks will utilise 4 motion vectors.
Figure 53 shows how a potential predicted Frame 2 can be
generated from Frame 1 by using motion estimation.
• The predicted frame is subtracted from the desired frame,
• Leaving a (hopefully) less complicated residual error frame
that can then be encoded much more efficiently than before
motion estimation.
• The more accurate the motion is estimated and matched,
the more likely it will be that the residual error will approach
zero,
• And the coding efficiency will be highest.
Further coding efficiency
• Motion vectors tend to be highly correlated between
macroblocks:
– The horizontal component is compared to the previously
valid horizontal motion vector and
– Only the difference is coded.
– The same difference is calculated for the vertical component.
– Difference codes are then described with a variable length
code for maximum compression efficiency.
What happens if we find an acceptable match?
SAD Computation
SAD is computed by:

For i = −n to +n
    For j = −m to +m

        SAD(i, j) = Σ_{k=0}^{N−1} Σ_{l=0}^{N−1} | C(x + k, y + l) − R(x + i + k, y + j + l) |
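The inner SAD sum maps directly to code. A minimal sketch for one candidate offset (i, j); the search loop over the ±n, ±m area simply calls this for every offset and keeps the minimum. Names follow the formula's symbols but are otherwise illustrative.

```python
def sad(C, R, x, y, i, j, N):
    """Sum of Absolute Differences between the N x N target macroblock
    with upper-left corner (x, y) in C and the candidate macroblock
    with upper-left corner (x + i, y + j) in the reference R."""
    return sum(abs(C[x + k][y + l] - R[x + i + k][y + j + l])
               for k in range(N) for l in range(N))
```

A perfect match gives SAD = 0; motion estimation picks the (i, j) minimising this value within the search area.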
SAD Search Example
So, for a ±2x2 search area (given by the dashed lines) and a 2x2
macroblock window example, the best SAD match is given by the bold
dot-dash line (near the top right corner) in Figure 54.
Figure 54: SAD Window search Example
Selecting Intra/Inter Frame coding
Based upon the motion estimation a decision is made on
whether INTRA or INTER coding is used, via the following
calculation:

MB_mean = ( Σ_{i=0,j=0}^{N−1} |C(i, j)| ) / N

A = Σ_{i=0,j=0}^{n,m} | C(i, j) − MB_mean | · (α_c(i, j) ≠ 0)
MPEG-2, MPEG-3, and MPEG-4
----------------------------------------------------------------
Level       size          Pixels/sec  bit-rate  Application
                                      (Mbits)
----------------------------------------------------------------
Low          352 x 240       3 M         4      consumer tape equiv.
Main         720 x 480      10 M        15      studio TV
High 1440   1440 x 1152     47 M        60      consumer HDTV
High        1920 x 1080     63 M        80      film production
----------------------------------------------------------------
Compression IV:
Audio Compression (MPEG and others)
As with video a number of compression techniques have been
applied to audio.
Simple But Limited Practical Methods Continued ....
Psychoacoustics or Perceptual Coding
Basic Idea: Exploit areas where the human ear is less sensitive
to sound to achieve compression
E.g. MPEG audio
How do we hear sound?
Sound revisited
• Sound is produced by a vibrating source.
• The vibrations disturb air molecules
• Produce variations in air pressure:
lower than average pressure, rarefactions, and
higher than average, compressions.
This produces sound waves.
• When a sound wave impinges on a surface (e.g. eardrum or
microphone) it causes the surface to vibrate in sympathy:
• In this way acoustic energy is transferred from a source to a
receptor.
Human Hearing
• Upon receiving the waveform, the eardrum vibrates in
sympathy.
• Through a variety of mechanisms the acoustic energy is
transferred to nerve impulses that the brain interprets as
sound.
The ear can be regarded as being made up of 3 parts:
• The outer ear,
• The middle ear,
• The inner ear.
Human Ear
We consider:
• The function of the main parts of the ear
• How the transmission of sound is processed.
The Outer Ear
The Cochlea:
Stereocilia
• Inner surface of the cochlea (the basilar membrane) is lined
with over 20,000 hair-like nerve cells — stereocilia,
• One of the most critical aspects of hearing.
Stereocilia Microscope Images
Hearing different frequencies
• Basilar membrane is tight at one end, looser at the other
• High tones create their greatest crests where the membrane
is tight,
• low tones where the wall is slack.
• This causes resonant frequencies, much like what happens in a
tight string.
• Stereocilia differ in length by minuscule amounts;
• they also have different degrees of resiliency to the fluid
which passes over them.
Finally to nerve signals
Frequency dependence is also level dependent!
Ear response is even more complicated.
Complex phenomenon to explain.
Illustration : Loudness Curves or Fletcher-Munson Curves:
What do the curves mean?
Traits of Human Hearing
Frequency Masking
• With multiple-frequency audio, sensitivity changes with the
relative amplitude of the signals.
• If the frequencies are close and the amplitude of one is less
than that of the other close frequency, then the quieter
frequency may not be heard.
Critical Bands
• Range of closeness for frequency masking depends on the
frequencies and relative amplitudes.
• Each band where frequencies are masked is called a Critical
Band.
What is the cause of Frequency Masking?
• The stereocilia are excited by air pressure variations,
transmitted via outer and middle ear.
• Different stereocilia respond to different ranges of
frequencies — the critical bands.
Example of Temporal Masking (Cont.)
Summary: How to Exploit?
• If we have a loud tone at, say at 1 kHz, then nearby quieter
tones are masked.
• Best compared on a critical-band scale — the range of masking is
about 1 critical band.
• Two factors for masking – frequency masking and temporal
masking
• Question: How to use this for compression?
Two examples:
– MPEG Audio
– Dolby
How to compute?
We have met basic tools:
• Fourier and Discrete Cosine Transforms
• Work in frequency space
MPEG Audio Compression
Basic Frequency Filtering Bandpass
How good is MPEG compression?
Although (data) lossy,
MPEG claims to be perceptually lossless:
• Human tests (part of the standard development), expert
listeners.
• 6:1 compression ratio: stereo 16-bit samples at 48 kHz
compressed to 256 kbits/sec.
• Difficult, real-world examples were used.
• Under optimal listening conditions there was no statistically
distinguishable difference between original and MPEG.
Basic MPEG: MPEG audio coders
An Advantage of MPEG approach
Basic MPEG: MPEG Facts
• MPEG-1: 1.5 Mbits/sec for audio and video.
About 1.2 Mbits/sec for video, 0.3 Mbits/sec for audio.
(Uncompressed CD audio is
44,100 samples/sec × 16 bits/sample × 2 channels > 1.4 Mbits/sec.)
Basic MPEG-1 Compression algorithm (2)
The main stages of the algorithm are:
• The audio signal is first sampled and quantised using PCM.
– Application dependent: sample rate and number of bits.
• The PCM samples are then divided up into a number of
frequency subbands; subband scaling factors are computed:
Basic MPEG-1 Compression algorithm (3)
Analysis filters
• Also called critical-band filters
• Break signal up into equal width subbands
• Use fast Fourier transform (FFT) (or discrete cosine
transform (DCT))
• Filters divide audio signal into frequency subbands that
approximate the 32 critical bands
• Each band is known as a sub-band sample.
• Example: 16 kHz maximum signal frequency, sampling rate 32
ksamples/sec, gives each subband a bandwidth of 500 Hz.
• Time duration of each sampled segment of input signal is the
time to accumulate 12 successive sets of 32 PCM (subband)
samples, i.e. 32 × 12 = 384 samples.
Basic MPEG-1 Compression algorithm (4)
Basic MPEG-1 Compression algorithm (5)
Psychoacoustic modeller:
• Uses frequency masking and may employ temporal masking.
• Performed concurrently with the filtering and analysis operations.
• Determines the amount of masking for each band caused by nearby
bands.
• Input: a set of hearing thresholds and subband masking
properties (model dependent), and the scaling factors (above).
Basic MPEG-1 Compression algorithm (6)
Example of Quantisation:
• Assume that after analysis, the levels of the first 16 of the 32
bands are these:

----------------------------------------------------------------------
Band        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Level (db)  0  8 12 10  6  2 10 60 35 20 15  2  3  5  3  1
----------------------------------------------------------------------
MPEG Layers
Layer 1:
• Best suited for bit rates above 128 kbits/sec per channel.
• Example: Philips Digital Compact Cassette uses Layer 1 at
192 kbits/sec.
• Divides data into frames,
– each of which contains 384 samples:
– 12 samples from each of the 32 filtered subbands, as shown
above.
• Psychoacoustic model only uses frequency masking.
• Optional Cyclic Redundancy Code (CRC) error checking.
Layer 2
• Targeted at bit rates of around 128 kbits/sec per channel.
• Examples: Coding of Digital Audio Broadcasting (DAB) on
CD-ROM, CD-I and Video CD.
• Enhancement of Layer 1.

--------------------------------------------------------------------
Layer    Target    Ratio  Quality @    Quality @   Theoretical
         bitrate          64 kbits     128 kbits   Min. Delay
--------------------------------------------------------------------
Layer 1  192 kbit   4:1   ---          ---         19 ms
Layer 2  128 kbit   6:1   2.1 to 2.6   4+          35 ms
Layer 3   64 kbit  12:1   3.6 to 3.8   4+          59 ms
--------------------------------------------------------------------
Bit Allocation
Bit Allocation For Layer I and 2
Encoding:
• Code some upper-frequency subband outputs:
– A single summed signal is sent instead of independent left
and right channel signals.
Further MPEG Audio Standards
MPEG-2 audio
Extension of MPEG-1:
• Completed in November 1994.
• Multichannel audio support:
– 5 high fidelity audio channels,
– Additional low frequency enhancement channel.
– Applicable for the compression of audio for High Definition Television or
digital movies.
• Multilingual audio support:
– Supports up to 7 additional commentary channels.
MPEG-2 audio (Cont.)
MPEG-1/MPEG-2 Compatibility
Forward/backward compatibility?
• MPEG-2 decoders can decode MPEG-1 audio bitstreams.
• MPEG-1 decoders can decode the two main channels of MPEG-2
audio bitstreams.
– Achieved by combining suitably weighted versions of each
of the up to 5 channels into a down-mixed left and right
channel.
– These two channels fit into the audio data framework of a
MPEG-1 audio bitstream.
– Information needed to recover the original left, right, and
remaining channels fit into: JJ
• The ancillary data portion of a MPEG-1 audio bitstream, II
or J
I
• In a separate auxiliary bitstream.
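The down-mix idea can be sketched as follows. The weights here are purely illustrative (a common textbook choice of 1/sqrt(2)); the normative MPEG-2 down-mix coefficients are not given in the text above.

```python
# Sketch: down-mix 5 channels (L, R, C, Ls, Rs) into a compatible MPEG-1
# left/right pair. The weight a = 1/sqrt(2) is an illustrative assumption,
# not the normative MPEG-2 coefficient set.
import math

def downmix(l, r, c, ls, rs, a=1 / math.sqrt(2)):
    """Fold centre and surround channels into left/right with weight a."""
    lo = l + a * c + a * ls  # compatible left channel
    ro = r + a * c + a * rs  # compatible right channel
    return lo, ro

# Centre-only input appears equally in both down-mixed channels:
lo, ro = downmix(0.0, 0.0, 1.0, 0.0, 0.0)
print(lo, ro)
```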
MPEG-3/MPEG-4
MPEG-3 audio:
• Does not exist any more; it was merged into MPEG-2.
MPEG-4 audio:
• Previously studied.
• Uses the structured audio concept.
• Delegates audio production to client-side synthesis where
  appropriate.
• Otherwise compresses the audio stream as above.
Dolby Audio Compression
Application areas:
• FM radio, satellite transmission and broadcast TV audio
  (DOLBY AC-1)
• Common compression format in PC sound cards
(DOLBY AC-2)
• High Definition TV standard advanced television (ATV)
  (DOLBY AC-3). MPEG is a competitor in this area.
Differences with MPEG
• MPEG perceptual coders control the quantisation accuracy of
  each subband by computing the number of bits for each sample.
• MPEG needs to store each quantiser value with each sample.
• The MPEG decoder uses this information to dequantise:
  forward adaptive bit allocation.
• Advantage for MPEG?: no psychoacoustic modelling is needed
  in the decoder, since every quantiser value is stored.
• DOLBY: uses a fixed bit rate allocation for each subband.
  – No need to send it with each frame, as in MPEG.
  – DOLBY encoders and decoders both need this information.
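Forward adaptive bit allocation can be sketched as a greedy loop: the encoder spends bits on whichever subband still has the worst noise-to-mask ratio, then transmits the per-subband bit counts so the decoder needs no psychoacoustic model. The SMR values and the 6 dB-per-bit rule of thumb are illustrative assumptions, not figures from the standard.

```python
# Sketch of forward-adaptive bit allocation. SMR inputs are invented;
# the ~6 dB noise reduction per extra bit is the usual rule of thumb.
def allocate_bits(smr_db, bit_pool):
    """Greedily give one bit at a time to the subband with the worst
    remaining noise-to-mask ratio, until the bit pool is exhausted."""
    bits = [0] * len(smr_db)
    need = list(smr_db)  # noise-to-mask ratio still to be covered
    for _ in range(bit_pool):
        worst = need.index(max(need))
        bits[worst] += 1
        need[worst] -= 6.0  # each extra bit lowers quantising noise ~6 dB
    return bits

# Four subbands, made-up SMRs, seven bits to spend:
print(allocate_bits([24.0, 12.0, 6.0, 0.0], bit_pool=7))  # [4, 2, 1, 0]
```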
Fixed Bit Rate Allocation
Different Dolby standards
DOLBY AC-1 :
Low complexity psychoacoustic model
Integrating media (Cont.):
Synchronisation
Interchange Between Applications
QuickTime
Introduction
• QuickTime is the most widely used cross-platform multimedia
technology available today.
• QuickTime now has powerful streaming capabilities, so you
can enjoy watching live events as they happen.
• Developed by Apple; QuickTime 6 (2002) is the latest version.
• It includes streaming capabilities as well as the tools needed
  to create, edit, and save QuickTime movies.
• These tools include the QuickTime Player, PictureViewer,
  and the QuickTime Plug-in.
QuickTime Main Features
Versatile support for web-based media
• Access to live and stored streaming media content with the
  QuickTime Player
• High-quality, low-bandwidth delivery of multimedia.
• Easy viewing of QuickTime movies (with enhanced control) in
  web browsers and applications.
• Multi-platform support.
• Built-in support for most popular Internet media formats
  (well over 40 formats).
• Easy import/export of movies in the QuickTime Player.
Sophisticated playback capabilities
Easy content authoring and editing
QuickTime Support of Media Formats
QuickTime is an open standard:
• Embraces other standards and incorporates them into its environment.
• It supports every major file format for pictures, including BMP, GIF, JPEG, PICT,
  and PNG, and even JPEG 2000.
• QuickTime also supports every important professional file format for video,
including AVI, AVR, DV (Digital Video), M-JPEG, MPEG-1 – MPEG-4, and
OpenDML.
• All common audio formats, including MPEG-4 Structured Audio.
• MIDI standards support, including the Roland Sound Canvas sound set and
  the GM/GS format extensions.
• Other multimedia — FLASH support.
• Other Multimedia integration standards — SMIL
• Key standards for web streaming, including HTTP, RTP, and RTSP as set forth
  by the Internet Engineering Task Force, are supported as well.
• Speech models: synthesised speech.
• QuickTime supports timecode tracks, including the critical
  standard for video timecode (SMPTE) and timing for musicians.
QuickTime Concepts
The following concepts are used by QuickTime:
Movies and Media Data Structures —
• A continuous stream of data — cf. a traditional movie, whether
  stored on film, laser disk, or tape.
• A QuickTime movie can consist of data in sequences from
different forms, such as analog video and CD-ROM.
• The movie is not the medium; it is the organizing principle.
• Contains several tracks.
• Each track refers to a media that contains references to the
movie data, which may be stored as images or sound on hard
disks, floppy disks, compact discs, or other devices.
• The data references constitute the track’s media.
• Each track has a single media data structure.
Components —
• Provided so that every application doesn’t need to know about
all possible types of audio, visual, and storage devices.
• A component is a code resource that is registered by the
Component Manager.
• The component’s code can be available as a system wide
resource or in a resource that is local to a particular application.
• Each QuickTime component supports a defined set of features
and presents a specified functional interface to its client
applications.
• Applications are thereby isolated from the details of
implementing and managing a given technology.
• For example, you could create a component that supports a
  certain data-encryption algorithm.
• Applications could then use your algorithm by connecting to
  your component through the Component Manager, rather than
  by implementing the algorithm all over again.
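The registration/lookup pattern can be sketched as follows. All names here are hypothetical; this is not the real Component Manager API, just an illustration of components being registered under a type and found by clients instead of being reimplemented.

```python
# Sketch (hypothetical names, not Apple's Component Manager API):
# components register under a type; clients look them up by type.
class ComponentManager:
    def __init__(self):
        self._registry = {}

    def register(self, component_type, component):
        """Register a component under a type string."""
        self._registry.setdefault(component_type, []).append(component)

    def open(self, component_type):
        """Return the first registered component of the requested type."""
        return self._registry[component_type][0]

mgr = ComponentManager()
# A toy "encryption" component: XOR with a fixed byte (illustration only).
mgr.register("encryptor", lambda data: bytes(b ^ 0x5A for b in data))

encrypt = mgr.open("encryptor")
print(encrypt(b"hi"))  # b'23'
```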
Image Compression —
Time —
Time (cont.) —
• The number of units that pass per second quantifies the
  scale; that is, a time scale of 26 means that 26 units pass
  per second, and each time unit is 1/26 of a second.
• A time coordinate system also contains a duration, expressed
  in the same time units.
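The time-scale idea above reduces to a simple unit conversion, sketched below. The 600-units-per-second scale in the usage example is a commonly cited QuickTime default; the function names are invented for illustration.

```python
# Sketch of QuickTime-style time values: a time scale gives units per
# second, so a time value converts to seconds by dividing by the scale.
def units_to_seconds(time_value, time_scale):
    """Convert a time value (in units) to seconds for a given scale."""
    return time_value / time_scale

def seconds_to_units(seconds, time_scale):
    """Convert seconds to the nearest whole number of time units."""
    return round(seconds * time_scale)

print(units_to_seconds(52, 26))    # 2.0 -- 52 units at scale 26 is 2 s
print(seconds_to_units(1.5, 600))  # 900 -- at a 600-unit scale
```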
The QuickTime Architecture (Cont.)
Figure 55: QuickTime Architecture
The Movie Toolbox
allows you to:
• store,
• retrieve, and
• manipulate time-based data
that is stored in QuickTime movies.
The Image Compression Manager :
QuickTime Components
QuickTime Components
Movie controller : Components which allow applications to play
  movies using a standard user interface.
Standard image compression dialog : Components which allow
  the user to specify the parameters for a compression
  operation by supplying a dialog box or a similar mechanism.
Image compressor : Components which compress and decompress
  image data.
Sequence grabber : Components which allow applications to
  preview and record video and sound data as QuickTime movies.
Video digitizer : Components which allow applications to
  control video digitization by an external device.
Media data-exchange : Components which allow applications to
  move various types of data in and out of a QuickTime movie.
Derived media handler : Components which allow QuickTime to
  support new types of data in QuickTime movies.
QuickTime Components (Cont.)
Clock : Components which provide timing services for QuickTime
  applications.
Preview : Components used by the Movie Toolbox’s standard file
  preview functions to display and create visual previews for
  files.
Sequence grabber : Components which allow applications to
  obtain digitized data from sources that are external to a
  Macintosh computer.
Sequence grabber channel : Components which manipulate
  captured data for a sequence grabber component.
Sequence grabber panel : Components which allow sequence
  grabber components to obtain configuration information from
  the user for a particular sequence grabber channel component.
Open Media Framework Interchange (OMFI) Format
Target: Video/Audio Production
Digital TV Group UK
• UK digital TV interests are managed by the Digital TV Group
UK — http://www.dtg.org.uk/.
• Alternative (satellite) digital TV interest: SKY,
– uses a proprietary API format, called OPEN (!!).
– MHEG advantage: it is a truly open format (ISO standard).
– MHEG is the only open standard in this area.
Further reading:
http://www.dtg.org.uk/reference/mheg/ mheg index.html
Digital TV services
What sort of multimedia services does digital TV provide?
Figure 56: UK Digital TV Consortium
The family of MHEG standards
Version       Status
MHEG-1        International standard
MHEG-2        Withdrawn
MHEG-3        International standard
MHEG-4        International standard
MHEG-5        International standard
MHEG-6        International standard (1998)
MHEG-7        International standard (1999)
MHEG-8 (XML)  Draft international standard (Jan 1999)

Table 2: MHEG Standards Timeline
MHEG-5 overview
The major goals of MHEG-5 are:
• To provide a good standard framework for the development
  of client/server multimedia applications intended to run on a
  memory-constrained Client.
• To define a final-form coded representation for interchange
of applications across platforms of different versions and
brands.
• To provide the basis for concrete conformance levelling,
guaranteeing that a conformant application will run on all
conformant terminals.
• To allow the runtime engine on the Client to be compact and
  easy to implement.
• To be free of strong constraints on the architecture of the
  Client.
MHEG-5 Goals (Cont.)
MHEG Programming Principles
Basic MHEG Class Structure
Main MHEG classes
Ingredient subclasses (Cont.)
Stream — This class controls the synchronized presentation of
multiplexed audio-visual data (such as an MPEG-2 file).
• A Stream object consists of a list of components from the
  Video, Audio, and RTGraphics (animated graphics) classes.
• The OriginalContent attribute of the Stream object refers
to the whole multiplex of data streams.
• When a Stream object is running, its streams can be switched
  on and off independently; this allows users to switch between
  different audio trails (different languages) or choose which
  video stream(s) to present among a range of available ones.
• Specific events are associated with playback:
  StreamPlaying/StreamStopped notifies the actual
  initiation/termination, and CounterTrigger notifies the
  system when a previously booked time-code event occurs.
Ingredient subclasses (Cont.)
Link — The Link class implements event-action behavior by a
condition and an effect.
• The LinkCondition contains:
– An EventSource — a reference to the object on which
the event occurs
  – An EventType, which specifies the kind of event, and a
    possible EventData, a data value associated with the
    event.
• MHEG-5 Action objects consist of a sequence of
elementary actions.
• Elementary actions are comparable to methods in standard
  object-oriented terminology.
• The execution of an Action object means that each of its
  elementary actions is invoked sequentially.
Simple Link Example
(link: Link1
   event-source: EF1
   event-type: #NewChar
   event-data: ’A’
   link-effect:
      (action: transition-to: Scene2)
)
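The link example above can be mirrored in a few lines of Python. The field names follow the MHEG notation, but the engine itself (matching and firing) is invented for illustration; it is not part of any MHEG implementation.

```python
# Sketch of MHEG-5 link firing: a link matches an (event-source,
# event-type, event-data) triple and, when fired, runs its elementary
# actions in sequence. Hypothetical engine, field names from the notation.
class Link:
    def __init__(self, source, event_type, event_data, actions):
        self.source, self.event_type, self.event_data = source, event_type, event_data
        self.actions = actions  # sequence of elementary actions (callables)

    def matches(self, source, event_type, event_data):
        return (self.source, self.event_type, self.event_data) == \
               (source, event_type, event_data)

    def fire(self):
        for action in self.actions:  # elementary actions run sequentially
            action()

fired = []
link1 = Link("EF1", "NewChar", "A",
             [lambda: fired.append("transition-to Scene2")])
if link1.matches("EF1", "NewChar", "A"):  # user typed 'A' in EF1
    link1.fire()
print(fired)  # ['transition-to Scene2']
```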
Interactible Object Class
(slider: Slider1
   box-size: ( 40 5 )
   original-position: ( 100 100 )
   max-value: 20
   orientation: #right
)
UK Digital Terrestrial MHEG Support: EuroMHEG
• Only some of the main classes of MHEG have been addressed above.
• A few other classes have been omitted.
• The aim is to gain a broad understanding of how MHEG works.
MHEG event processing proceeds as follows:
1. An asynchronous event is taken from the event queue.
2. Possibly, a link that reacts on the event is found. This link is then
   fired. If no such link is found, the process starts again at 1.
3. The result of a link being fired is the execution of an action object,
   which is a sequence of elementary actions. These can change
   the state of other objects, create or destroy other objects, or
   cause events to occur.
4. As a result of the actions being performed, synchronous events
   may occur. These are dealt with immediately, i.e., before processing
   any other asynchronous events queued.
When all events have been processed, the process starts again at 1.
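The event-processing steps described above can be sketched as a loop. The engine and its data model are invented for illustration; only the control flow (take event, find link, run actions) comes from the text.

```python
# Sketch of the MHEG event-processing loop described above.
# Hypothetical engine: links are dicts mapping an event name to actions.
from collections import deque

def run_engine(async_events, links):
    log = []
    queue = deque(async_events)
    while queue:                  # 1. take the next asynchronous event
        event = queue.popleft()
        for link in links:        # 2. look for a link reacting to it
            if link["event"] == event:
                for action in link["actions"]:  # 3. fire: run actions
                    log.append(action)
                    # 4. synchronous events raised here would be handled
                    # immediately, before the next queued async event
    return log

links = [{"event": "IsAvailable", "actions": ["run Scene1"]}]
print(run_engine(["IsAvailable", "UserInput"], links))  # ['run Scene1']
```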
Availability; Running Status
Before doing anything to an object, the MHEG-5 engine must
prepare it
• Preparing an object typically entails retrieving it from the
  server, decoding the interchange format, creating the
  corresponding internal data structures, and making the
  object available for further processing.
• The preparation of an object is asynchronous; its completion
is signalled by an IsAvailable event.
• All objects that are part of an application or a scene have a
RunningStatus, which is either true or false.
• Objects whose RunningStatus is true are said to be
  running, which means that they perform the behaviour they
  are programmed for.
RunningStatus (Cont.)
Interactibles
Example:
• The way that a user enters characters in an EntryField
can be implemented in different ways in different MHEG-5
engines.
Running the MHEG Engine
You can run the applet through any Java-enabled Web browser
or applet viewer.
Running the MHEG Engine Applet
• The code and codebase paths — these specify where the
  applications and applet classes reside.
• The groupIdentifier value — for most of the application
  demos, a startup MHEG file is referenced first in a folder for
  each application.
MHEG Example — The Demo MHEG Presentation
The Demo example produces the output:
Figure 60: MHEG Demo application Display1 Example
Figure 61: MHEG Demo application Display2 Example
Figure 62: MHEG Demo application Text Example
Figure 63: MHEG Demo application Interactive Objects Example
Figure 64: MHEG Demo application Elementary Actions Example
Figure 65: MHEG Demo application Concrete Classes Example
token.mhg — MHEG token groups example (Fig 66)
Figure 66: MHEG Demo application Token Groups Example
More Examples
Further examples are available in the applications folder:
bitmap — further examples of bitmaps in MHEG
interacting — further examples of interaction in MHEG
MHEG Relationships to Major Standards
Important relationships exist between MHEG-5 and other
standards and specifications.
Davic (Digital Audio Visual Council) — aims to maximize
  interoperability across applications and services for the
  broadcast and interactive domains.
MHEG Implementation (Cont.)
Run Time Engine (RTE)
Access module
This module provides a consistent API for accessing information
from different sources.
It is used by the RTE to get objects and by the PL to access
content data (either downloaded or streamed).
Typical applications should support:
• Bulk download for bitmaps, text, and MHEG-5 objects;
and
• Progressive download for audio and audiovisual streams.
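The two access modes above can be sketched as a small API. The class and method names are invented; the point is only the contrast between bulk download (whole object at once) and progressive download (stream delivered chunk by chunk).

```python
# Sketch of the Access module idea (hypothetical names): bulk download
# returns a whole object; progressive download yields it in chunks.
class AccessModule:
    def __init__(self, store):
        self._store = store  # maps object name -> bytes

    def bulk(self, name):
        """Bulk download: bitmaps, text, MHEG-5 objects."""
        return self._store[name]

    def progressive(self, name, chunk=4):
        """Progressive download: audio/audiovisual streams, chunk by chunk."""
        data = self._store[name]
        for i in range(0, len(data), chunk):
            yield data[i:i + chunk]

am = AccessModule({"scene.mheg": b"objdata", "clip.mpg": b"streamed-bytes"})
print(am.bulk("scene.mheg"))             # b'objdata'
print(list(am.progressive("clip.mpg")))  # [b'stre', b'amed', b'-byt', b'es']
```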
DSMCC Interface
MediaTouch (Cont.)
It is a visual, hierarchical, iconic authoring tool, similar to
Authorware in many respects.
Figure 68: MediaTouch MHEG Authoring Tool
(Hierarchy and Links Editor windows)
MHEG Authoring Tools: MHEGDitor
MHEGDitor is an MHEG-5 authoring tool based on
Macromedia Director, composed of:
• An Authoring Xtra to edit applications — it opens a window
  to set preferences and links a specific external script.
• The MHEGWrite extension supports only the MHEG-5
  textual notation.
• It provides macros for object templates, syntax highlighting
  and syntax error detection.
Playing MHEG files — There are a few ways to play MHEG
files:
• MHEGPlayer is an MHEG-5 interpreter which is able to
execute MHEG-5 applications developed with
  MHEGDitor or any other authoring tool.