Proposed SMPTE Engineering Guideline EG 41

for Television 
Material Exchange Format (MXF)
Engineering Guideline
(Informative)

Table of Contents
1 Scope

2 The MXF document structure

3 Introduction

4 File Interchange Requirements

5 A guide to the wording of the MXF standard

6 Metadata Classifications & Placement

7 MXF in Detail

8 MXF worked examples

Annex B

Preferred Enumerated String Values

Annex C

Bibliography

1 Scope
This Engineering Guideline gives an introduction to and the background for the Material Exchange Format
(MXF). This document describes the technology involved in the Format, the names of the various elements
within the Format, and the way in which the Format may be used within real-world applications.

Some parts of the descriptions within this document are generic to file formats, while other parts are specific to
the Material Exchange Format. There are descriptions of the object-oriented technology used within the MXF
Format, as well as a discussion of the Metadata that may be used within the file. There are worked examples
within this Engineering Guideline to guide implementers and hence improve the interoperability of applications
using different MXF implementations.

Copyright © 2003 by THE SOCIETY OF MOTION PICTURE AND TELEVISION ENGINEERS
595 W. Hartsdale Ave., White Plains, NY 10607
(914) 761-1100
THIS PROPOSAL IS PUBLISHED FOR COMMENT ONLY

2 The MXF document structure


The MXF Specification is split into a number of separate parts in order to create a document structure that
allows new applications to be covered in the future. These parts are:

Part 1 Engineering Guidelines – Informative (this document is SMPTE EG 41)
Part 2 MXF File Format Specification – Normative (SMPTE 377M)
Part 3 Operational Patterns – Normative (e.g. Op1a is SMPTE 378M)
Part 4 MXF Descriptive Metadata Schemes – Normative (e.g. DMS-1 is SMPTE 380M)
Part 5 Essence Containers – Normative (e.g. the MXF Generic Container is SMPTE 379M)
Part 5a Mapping essence and metadata into the Essence Container – Normative (e.g. Mapping MPEG Streams into the Generic Container is SMPTE 381M)

When implementing an MXF application or system, you should ensure that you have the latest version of all of
these documents. The individual Operational Patterns and Essence Container mappings will be independently
updated.

There are several parts to the MXF standard. This is Part 1, the MXF Engineering Guideline, which provides an
introduction and description. This document should be read first because it introduces many of the concepts and
explains what problem MXF is intended to solve. Part 1 also includes other Engineering Guidelines including a
Descriptive Metadata Engineering Guideline, which explains the concepts behind the use of Descriptive
Metadata in MXF files.

Part 2 is a normative definition of the Format of an MXF file. It is the toolbox from which different file interchange
tools are chosen to fulfill the requirements of different applications. The MXF File Format defines the syntax and
semantics of MXF Files.

Part 3 describes the Operational Patterns of the MXF Format. In order to create an application to solve a
particular interchange problem, some constraints and Structural Metadata definitions are required before
SMPTE 377M can be used. An Operational Pattern defines those restrictions of the Format that allow
interoperability between applications of defined levels of complexity. Applications that use the MXF Format
must adhere to one of the Operational Patterns in order to achieve interchange.

Part 4 defines MXF Descriptive Metadata sets that may be plugged in to an MXF file. Different application
environments will require different metadata sets to be carried by MXF. These collections of metadata sets are
described in the part 4 document(s).

Part 5 defines the Essence Container of the MXF Format for containing Picture and Sound Essence. There may
be limitations to the Essence Container that may be required in a particular Operational Pattern. The reader is
advised to cross reference parts 3, 4 and 5 before and during implementation. The MXF Generic Container is a
standardized Essence Container providing an encapsulation mechanism that allows many existing and future
formats to be mapped into MXF.

[Figure: the MXF document structure. Part 1, Engineering Guideline (informative). Part 2, File Format
(normative). Part 3.x, Operational Patterns (normative), i.e. constraints on the format. Part 4.x, Descriptive
Metadata plug-ins (normative), i.e. metadata collections. Part 5.x, Essence Containers (normative), i.e. how to
KLV code. Part 5a.x, Mapping documents (normative), i.e. how to map & index essence in the container.]


Part 5a comprises a number of documents for mapping many of the essence and metadata formats used in the
content creation industry into the defined MXF Essence Container.

The MXF document suite makes reference to other documents that contain information required for the
implementation of an MXF system. One such document is the SMPTE Dictionary, RP210, which contains
definitions of parameters, their data types and their Keys when used in a KLV representation. Another is the
SMPTE Labels Registry, RP224, which contains a list of normalized labels that can be used in MXF sets. Annex
B in this MXF Engineering Guideline contains a list of recommended string constants that an application may
use to improve interoperability.

In the unlikely event of conflict or ambiguity between the different parts of the document, the Format document
has precedence over the Operational Patterns, which have precedence over the Essence Containers, which
have precedence over the Descriptive Metadata documents.
Note: During the early development of MXF, a catalogue of enumerated values was created to list SMPTE Labels, Strings,
Keys and Tags used within the MXF document suite. The normative definition of the SMPTE Labels is maintained in the
SMPTE Labels Registry and the normative definitions of the SMPTE Keys and Tags are to be found in the MXF document
suite.

2.1 About this document

The information in this document is ordered for the novice reader. Concepts are introduced gradually and
repeated in more detail later in the document. This makes the document easier to read, but somewhat less
useful as a reference. For that reason, a Table of Contents is provided
at the start of the document to allow “Random Access” to the information within the text.

Section 8 provides MXF worked examples. In order to improve the readability of the text, an arrow is used to
indicate that an example of a certain subject exists for this section. For example (→8.4) indicates an example for
this subject exists in section 8.4.

3 Introduction
The introduction is constructed as a list of questions. The concepts in MXF can be introduced in a way that gives
an overall view of the specification and the concepts embodied within it. Once the introduction is understood, the
requirements of the file format are discussed. Some specific words and phrases used in the specification are
then defined and finally the Material Exchange Format is introduced in a much more detailed fashion. Although
this entire document is informative, it is hoped that it will give sufficient information for technical and non-
technical readers to understand MXF.

3.1 What problem is the Material Exchange Format trying to solve?

The MXF Specification is intended to encourage an environment where it is convenient to interchange
multimedia information as a file. This will allow users to take advantage of non-real time transfers and to
package together essence and metadata for effective interchange between servers and between businesses.
MXF is not a panacea, but is an aid to automation and machine-machine communication. It allows essence and
metadata transfer without the metadata elements having to be manually re-entered.

The MXF Specification is intended to allow the interchange of captured, ingested, finished or “almost finished”
material. It is not intended to be an authoring format. Despite this, careful thought has gone into SMPTE 377M
to ensure that authoring tools such as those based on AAF Association technology are able to directly open and
use an MXF file efficiently without having to convert the file.

The MXF Specification has also been carefully crafted to ensure that it can be efficiently stored on a variety of
media, as well as transported over communications links. The MXF Format has not forgotten about tape. There
are structures and mechanisms within the file that make MXF appropriate for data tape storage and archiving of
content.

Finally, the MXF Specification is intended to be expandable. A considerable effort has been put into making
SMPTE 377M compression format independent and resolution independent, so that it can be constrained to suit a large
number of application environments. The document structure has been created to allow new applications to take
advantage of the MXF Format in a backwards compatible way.

3.2 How does MXF satisfy the design requirements?

3.2.1 Basic Structure


The MXF Format follows a common theme of many file formats and has the following basic structure:
• A File Header that provides information about the file as a whole, including Labels for the early determination of
decoder compliance.
• A File Body that comprises picture, sound and data essence stored in Essence Containers (see 3.5.8). Essence
Containers from different tracks may be interleaved or separated. The section on Operational Patterns goes into
more detail on this subject.
• A File Footer that terminates the file. The File Footer may include some information not available at the time of
writing the Header (such as the duration of the file). In certain specialized Operational Patterns, the File Footer
may be omitted.
A simple MXF file is shown in Figure 1.

[Figure 1 layout, left to right: File Header (Header partition pack, Header Metadata); File Body (Essence
Container); File Footer (Footer partition pack, Header Metadata)]

Figure 1 : A Simple MXF File

MXF files may include an optional, but recommended, Index Table that provides rapid conversion from sample-
based indexes (e.g. Timecode) into byte offsets within an Essence Container. The Index Table may be
segmented, and may be stored before, after or multiplexed with the essence data segments.
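
As a rough illustration of what an Index Table lookup achieves, the sketch below converts a frame-based Timecode into a byte offset for the simplest possible case, where every Edit Unit occupies the same number of bytes. The function names, the 25 frames-per-second rate and the byte count per Edit Unit are assumptions made for this example only; they are not taken from SMPTE 377M, and a real Index Table also has to cope with Edit Units of varying size.

```python
def timecode_to_edit_unit(hours, minutes, seconds, frames, fps=25):
    """Convert a non-drop-frame timecode into a count of edit units (frames)."""
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def byte_offset(edit_unit, bytes_per_edit_unit, first_byte=0):
    """Byte offset of an edit unit when every edit unit has the same size (assumed)."""
    return first_byte + edit_unit * bytes_per_edit_unit


# 10 seconds and 5 frames into the essence, at 25 frames per second
position = timecode_to_edit_unit(0, 0, 10, 5)
offset = byte_offset(position, bytes_per_edit_unit=250_000)
print(position, offset)
```

With essence whose Edit Units vary in size, the multiplication would be replaced by a table lookup, which is why the Index Table is recommended.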

MXF files may also include optional File Body Partitions that can be inserted at intervals within the File Body
and are used to provide a variety of features:
1. Robustness of metadata information by repetition of the Header Metadata.
2. Multiplexing of different Essence Containers
3. Distributing Index Tables in small chunks (e.g. for devices with limited memory)
4. Providing “per-stream” Index Tables that are position independent within the file
5. Easier location of Essence Container data when using high speed tape devices
6. Optimizing the distribution of the data in a file for storage or transmission
Whether the Header Metadata is repeated within a Body Partition depends on the application. Typical cases are
the transfer of an MXF file as a stream over a uni-directional link and data tape shuttling. One purpose of such
Header Metadata repetition is to support the
recovery of critical metadata in applications where the file may be interrupted or where the decoder starts to
receive data in mid-transfer.

Multiplexing and storage optimization is a complex subject and is highly dependent on the storage or
transmission device used. Hard discs, DVDs, satellite links and tape devices all have different requirements.
The MXF structure allows a great deal of flexibility in the positioning of the partitioning information and the use of
fillers to allow optimization for different devices. Typically, if storage or transmission optimization is important in
an application then the MXF encoder will know which parameters are important to it. MXF provides the tools, but
encoders can make the optimizations that add value to their implementations.

MXF files use Key-Length-Value (KLV) coding throughout for flexibility and extensibility. KLV coding is defined in
SMPTE 336M; a full review was published in the July 2000 edition of the SMPTE Journal (Vol. 109, No 7,
Engineering Report). This mechanism is used to encapsulate the individual elements of an MXF file in such a
way that devices can ignore information when the Key of a KLV triplet is unknown. The Length parameter tells
the KLV decoder how much data should be ignored.
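
The skipping behaviour described above can be sketched as a small reader. This is an illustration under stated assumptions, not a conformant SMPTE 336M decoder: it assumes 16-byte Keys and a BER-coded Length (short form for values below 128, long form otherwise), and the names `iter_klv`, `read_ber_length` and the two example Keys are invented for this example.

```python
import io

KEY_LENGTH = 16  # assumption for this sketch: every Key is 16 bytes long


def read_ber_length(stream):
    """Read a BER-coded Length field (short form or long form)."""
    first = stream.read(1)
    if not first:
        raise EOFError("truncated Length field")
    if first[0] < 0x80:
        return first[0]                      # short form: the byte is the length
    count = first[0] & 0x7F                  # long form: number of length bytes
    if count == 0:
        raise ValueError("indefinite length is not handled in this sketch")
    raw = stream.read(count)
    if len(raw) != count:
        raise EOFError("truncated Length field")
    return int.from_bytes(raw, "big")


def iter_klv(stream, known_keys):
    """Yield (key, value) for known Keys; skip the Value of unknown Keys."""
    while True:
        key = stream.read(KEY_LENGTH)
        if not key:
            return                           # clean end of data
        if len(key) != KEY_LENGTH:
            raise EOFError("truncated Key")
        length = read_ber_length(stream)
        if key in known_keys:
            yield key, stream.read(length)
        else:
            stream.seek(length, io.SEEK_CUR)  # Length says how much to ignore


# Hypothetical Keys, for illustration only
KNOWN = bytes(range(16))
UNKNOWN = bytes(16)
data = UNKNOWN + b"\x03abc" + KNOWN + b"\x81\x05hello"
found = list(iter_klv(io.BytesIO(data), {KNOWN}))
print(found)
```

The decoder never needs to understand the content of the unknown triplet; the Length alone is enough to step over it and stay in sync with the next Key.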

In Specialized Operational Patterns, the Header (see section 3.5.2 below) is allowed to start with a non-KLV
run-in. This allows synchronization bytes or “camouflage” bytes to be added at the front of the file in certain
(limited) applications. In all other circumstances, there will be no run-in and the entire file must consist only of
KLV elements with no gaps.

3.3 2 ways of viewing an MXF file

An MXF file can be viewed in 2 ways:

There is the physical view of the MXF byte stream on disk or on the wire.

There is the description of the file contents obtained by decoding the data model. This will be referred to as the
logical view of the file.

These two views are summarized in Figure 2.


[Figure 2 content. Physical view of an MXF File: a sequence of KLV triplets carrying the Partition Pack, several
Header Metadata sets, fill, a Picture Element and a Sound Element. Logical view of the same MXF File: a Material
Package holding the “played” Picture and “played” Sound, and a Top-Level File Package holding the Stored
Picture Track and Stored Sound Track.]

Figure 2 : Physical and Logical views of an MXF File

3.3.1 The Physical view of an MXF file


This is the simplest way to view the file. Many MXF processes and applications will use this layer only. Some of
the physical properties of the file are the Partitions, the KLV coding, the Index Tables, the Run-In, the KLV
Alignment Grid (KAG) and the Random Index Pack (RIP).

The physical properties of the file are largely independent of the number of tracks in the file, the amount of
metadata carried and the relationship between the different Picture and Sound Elements. The way in which an
MXF File is written depends on the MXF encoder and the application. Many application-specific optimizations may be
incorporated into an application to improve the way an MXF file is physically written to a device.

Physical optimizations may include any or all of the following:


• Matching the KLV Alignment Grid (KAG) to an integer multiple of the underlying physical sector / cluster
/ packet size of the medium
• Adding body partitions with repeated Header Metadata to allow recovery from an interrupted
transmission
• Using the Run-In mechanism to camouflage the MXF file as a different file type
• Repeating Index Tables in the File Header and File Footer for easy access in a tape environment
• Adding a Random Index Pack to quickly find all the partitions in a large file
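
The first optimization in the list, aligning to the KAG, comes down to simple modular arithmetic. The sketch below computes how many fill bytes an encoder would need to write so that the next KLV triplet starts on a grid boundary. The function name `fill_to_kag` is invented for this example, and the minimum fill size of 17 bytes is an assumption (a filler that is itself a KLV triplet with a 16-byte Key and at least one Length byte); the normative rules are in SMPTE 377M.

```python
def fill_to_kag(position, kag_size, min_fill=17):
    """Number of fill bytes needed so the next KLV starts on a KAG boundary.

    min_fill is an assumed minimum size for a KLV-coded filler
    (16-byte Key plus at least one Length byte).
    """
    remainder = position % kag_size
    if remainder == 0:
        return 0                      # already aligned, no filler needed
    padding = kag_size - remainder
    while padding < min_fill:
        padding += kag_size           # too small to hold a filler: go to the next boundary
    return padding


for position in (1024, 1000, 1020):
    print(position, fill_to_kag(position, kag_size=512))
```

Choosing `kag_size` as a multiple of the sector or packet size of the medium is what ties this arithmetic to the physical device.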

3.3.2 The Logical (Metadata) view of an MXF file


The Logical view of the file is defined by the contents of the MXF metadata and not by the way in which it is
organized as a byte stream. The metadata defines the number of different Picture, Sound and Data tracks as
well as the descriptions of the different Essence Types within the file. Figure 2 shows a very simple MXF file that
contains a Material Package and a Top-Level File Package, each of which has a single Picture Track and a
single Sound Track. The data is physically stored in KLV coded triplets and organized by Partitions as shown in
the upper portion of the diagram. The lower part of the diagram shows what the metadata in the file is intended
to represent. Bear in mind that this logical representation is very compact – a typical File Package will be less
than 1kByte, whereas the Essence it represents may be Megabytes, Gigabytes or even Terabytes!

The Material Package can generally be thought of as the “output timeline” of the file. The Top-Level File
Package can be thought of as the stored data or “input timeline” of the file. The metadata within the file
describes the stored data within the file as well as the portion that is to be output when the file is played or used
in some way. The example in Figure 2 shows that all the tracks of the stored data (in the Top-Level File
Package) are used in the Material Package, but an MXF player will play only a small segment from the middle of
the file.
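
The relationship between the stored data and the played portion can be pictured with a very small data model. The class names `StoredTrack` and `PlayedSegment` and the numbers used are hypothetical and chosen for this illustration; they are not the set names defined in SMPTE 377M.

```python
from dataclasses import dataclass


@dataclass
class StoredTrack:
    """A track in the Top-Level File Package (the "input timeline")."""
    name: str
    duration: int          # number of edit units stored in the file


@dataclass
class PlayedSegment:
    """A track in the Material Package (the "output timeline")."""
    source: StoredTrack
    start: int             # first stored edit unit that is played
    duration: int          # number of edit units that are played

    def played_range(self):
        end = self.start + self.duration
        if self.start < 0 or end > self.source.duration:
            raise ValueError("played segment lies outside the stored data")
        return range(self.start, end)


stored_picture = StoredTrack("Stored Picture Track", duration=1000)
played_picture = PlayedSegment(stored_picture, start=400, duration=200)
print(played_picture.played_range())
```

Only the metadata changes when a different segment is selected for output; the stored essence is untouched.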

3.3.2.1 Structural Metadata

The Structural Metadata is the way in which MXF describes different Essence types and their relationship along
a timeline. The MXF Structural Metadata defines the way in which the output timeline of the file relates to the
one or more stored Top-Level File Packages. The Structural Metadata defines the synchronization of different
tracks along a timeline. It also defines the Picture Size, Picture Rate, Aspect Ratio, Audio Sampling and other
essence description parameters.
The Structural Metadata is defined in SMPTE 377M. Most of the parameters are defined in the MXF File Format
document, but additional descriptors and labels may be defined in essence mapping documents. The MXF
Structural Metadata is derived from the AAF data model. This means the relationships between all the different
sets and their properties are precisely defined. More information on the structural concepts appears later in this
document.
3.3.2.2 Descriptive Metadata

MXF Descriptive Metadata comprises information in addition to the structure of the MXF File. This may be
intended for human use (as in the majority of the SMPTE 380M: MXF DMS-1 specification) or it may be
information for machine use, such as a track of information containing depth information for 3D processing.
SMPTE 377M provides a very simple plug-in mechanism that allows different Metadata sets to be defined and
used in an MXF environment. SMPTE 377M provides mechanisms for uniquely identifying the Metadata
Scheme(s) present in the file, mechanisms for preventing numerical conflict with existing metadata and a
mechanism for determining the version of the Descriptive Metadata Specification used.
The MXF Metadata plug-in scheme was developed as a result of strong User Requirements. No single Metadata
definition and structure will be appropriate for everyone. A mechanism that properly allows the integration of new
metadata schemes without redeveloping applications and equipment needed to be created. The MXF plug-in
mechanism is very lightweight and allows versatility for the implementers and extensibility for the users.
When Descriptive Metadata is added using the plug-in mechanism, many of the features of MXF are achieved
automatically. The ability to create multiple tracks and synchronize them against each other, the ability to add
metadata events synchronized with the video / audio or other tracks and the ability to use metadata in the output
timeline that was available in the source file are all part of the standard MXF feature set. This document will
outline only the basics of a descriptive metadata scheme. A fuller treatment of the subject can be found in the
Descriptive Metadata Engineering Guideline, SMPTE EG42. It is worth noting that Descriptive Metadata can be
for both Human and Machine use. Much of the machine-Descriptive Metadata relates to special properties of the
Essence and has an intimate spatio-temporal relationship to the Essence. For this reason it is often called
Intimate metadata.

3.3.2.3 Dark Metadata

Dark metadata is the term given to metadata that is unknown by an application. This metadata may be privately
defined and generated, it may be new properties added to SMPTE 377M or it may be metadata that is part of
the MXF standard, but not relevant to the application processing the MXF file. It is important that there are some
rules on the use of Dark metadata to prevent numerical or namespace clashes when private metadata is added
to a file that already contains Dark Metadata. Rules are given in SMPTE 377M along with the specification of
a data structure called the Primer Pack. Guidance on the use of this structure is given in section 8.5.1 of this
document. (→8.5.1)

3.4 What is the Header Metadata?

Although only occupying a small fraction of the size of a typical MXF file, the Header Metadata is often, for those
inexperienced in data models, the most difficult part to understand. The following sections introduce the topic of
object-oriented coding in a general and easy to understand manner. For a more rigorous explanation, there are
many reference books that cover the principles and methods of implementation in far more detail than given
here.

3.4.1 Why an object-oriented approach?


The underlying data structure of MXF was chosen to be a subset of the AAF data model. This AAF data model
uses an object-oriented approach so MXF adopted the principle. This document will give a brief outline of some
of the concepts. More detail on how this relates to the Descriptive Metadata Structure is given in SMPTE EG42.

3.4.2 But what is an object-oriented approach?


This is a technique for describing the functionality of a complex system by describing each of its components as
though each is an independent object (or thingy or blob – whatever word is easiest). The quickest way of
explaining objects is by means of an example. We will use the track object.

A track can be thought of as a straight line on a piece of paper. It starts at the start; it ends at the end and it lasts
for its duration. The start, end and duration are known as properties.

A feature of object orientation is “inheritance”. This means that we can have different sorts of track. They all
share some common properties that they inherit from the parent or superclass, but have extra properties or
functionality added to make them useful. For example, consider an event track. The straight line on the piece of
paper can now be marked with events. An event can start at any point along the track. It may be instantaneous
(i.e. no duration) or it may last for a defined time. Events may also overlap.

Another sort of track is a timeline track. Similar to an event track, it starts at a certain time, ends at a certain
time and has a duration. This track has a restricted functionality in that it only allows Source Clips to be placed
on the track. All the Source Clips must be contiguous, which means there are no overlaps and no gaps. Both of
these track types inherit properties and functionality from a common track class.
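
The track example can be written down directly as classes. This is a teaching sketch only: the class and method names are invented here and are not the MXF or AAF class definitions. A common `Track` class supplies start, duration and end; `EventTrack` adds events that may overlap; `TimelineTrack` accepts only contiguous clips.

```python
class Track:
    """Common parent class: every track has a start, a duration and an end."""

    def __init__(self, start, duration):
        self.start = start
        self.duration = duration

    @property
    def end(self):
        return self.start + self.duration


class EventTrack(Track):
    """Events may start anywhere, may be instantaneous, and may overlap."""

    def __init__(self, start, duration):
        super().__init__(start, duration)
        self.events = []

    def add_event(self, start, duration=0):
        self.events.append((start, duration))


class TimelineTrack(Track):
    """Clips must be contiguous: no gaps and no overlaps."""

    def __init__(self, start, duration):
        super().__init__(start, duration)
        self.clips = []

    def add_clip(self, start, duration):
        if self.clips:
            last_start, last_duration = self.clips[-1]
            expected = last_start + last_duration
        else:
            expected = self.start
        if start != expected:
            raise ValueError("clips must be contiguous: no gaps, no overlaps")
        self.clips.append((start, duration))


events = EventTrack(0, 100)
events.add_event(10)            # instantaneous event
events.add_event(5, 20)         # overlaps the first event, which is allowed

timeline = TimelineTrack(0, 100)
timeline.add_clip(0, 40)
timeline.add_clip(40, 60)       # starts exactly where the previous clip ends
print(events.end, timeline.end, len(timeline.clips))
```

Both subclasses reuse `start`, `duration` and `end` from the parent without redefining them, which is the inheritance the text describes.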

This principle is the basis for the object-oriented definition of the MXF file Format. The definition of the classes
from which MXF objects are created comes primarily from the AAF Association class model. Generic classes
with general functionality are defined. Classes with specific functionality then inherit the general class features.
SMPTE 377M restricts some of the flexibility of these AAF classes to define the MXF sets. MXF applications
populate these sets with values to create MXF objects in a file.

During the development of MXF, a Zero Divergence Doctrine (ZDD) was created in order to ensure that any
change in the model of behavior between AAF and MXF was severely restricted and eliminated wherever
possible.

3.4.3 What sort of metadata can be put in?


Broadly speaking, the metadata items can be split into 2 groups: Structural Metadata and Descriptive Metadata
as described above. The Structural Metadata is intended to bind the different elements of the file together and is
needed to define the basic file structure. The Descriptive Metadata is intended to supply extra information about
the file such as a program name or scene description. There are a large number of metadata elements defined
in the SMPTE dictionary and in SMPTE 377M. To understand the restrictions on the use of metadata elements,
it is necessary to understand the terminology in section 5.2 below.

3.4.4 Where does all the metadata go?


It depends on the metadata. The MXF object model creates “hooks” on which the metadata can be placed.
These hooks live in the File Header, the Body Partitions, some Essence Containers and the File Footer.

The Header Metadata area is able to contain Descriptive Metadata that allows a production to be described. For
example Production, Clip and Scene information is described in the MXF Descriptive Metadata Scheme 1
document (part 4 of this specification).
There are certain metadata parameters that might live in multiple places. The most obvious of these is
Timecode. This may exist in the Header Metadata, but might also live embedded within the Essence Container
data, e.g. in the GOP header of an MPEG Essence Container. This repetition is often important and the handling
of any conflict between the different instances of the data is application dependent.
3.4.5 How does AAF fit into the big picture?
At a first glimpse, the relationship is obvious; MXF is for simple transfers and AAF is for authoring. Both formats
exist to aid interchange of program material as files, which in turn will increase interoperability between file-
based products.

The meaning of the opening sentence is a little more difficult than it first seems. "Authoring" can be seen as a
catch-all phrase for a series of complex processes that take pieces of video and audio essence and put them
together using a variety of composition effects (cuts, dissolves, DVE, rendering, magic). When the authoring
process is complete, the "finished" program material can then be exchanged as a file. This is a simple transfer of
the compiled / rendered / etc., program.

The complexity of AAF has been simplified so that we can state that: "MXF files apply a subset of the AAF class
model". This means that the complexity of the authoring file format has been simplified. But beware; "simplified"
does not mean "completely obvious and like SDI". It is important to remember here that we are mixing two very
complex and different worlds – A/V and IT.

When video engineers look at a series of words in an SDI stream, there is an implicit understanding of the
complex spatio-temporal sampling and visual processing that went into creating those data words. A video
engineer would take great care before modifying any value to ensure proper clipping, filtering and possible
gamma correction took place.

The IT environment that has created AAF is just as complex (and just as "obvious" to those practiced in the art).
AAF arranges its file format in terms of objects. These objects are chosen and defined to reflect the actual
processes and content items that go into the authoring process. AAF is so powerful that the physical
representation of these objects could be redefined providing no information is created or lost. An IT engineer
would take great care before modifying the object model to remove things that looked like they were not needed
- the implications for future enhancement and interoperability might be very serious and not "obvious".

"MXF files apply a subset of the AAF class model" means that MXF contains just enough of the AAF object
model to allow it to represent a file interchange. This means it can represent an output timeline that has video,
audio and data. It has a logical metadata structure, a defined physical representation (KLV) and is interoperable
with other MXF systems and upward compatible with AAF. It has been designed so that an AAF system can
open an MXF file without modification to either the MXF file or the AAF System.

In practical situations this means that there is a lot of overlap between MXF and AAF functionality. MXF is
targeted at interchange throughout the broadcast and content creation chain, whereas AAF is optimized for
round-tripping in Post-Production. As a rough rule of thumb, content interchange, cut-edit functionality or simpler
is an MXF application; AAF is more appropriate for everything else. More details are given in section A.1.

3.4.6 How do we represent Time and Timelines with Tracks?


Time and Timelines are features of the logical representation of an MXF File. The concept of time within the file
is independent of the arrangement of bytes within the file, although constraints may be applied in order to get
certain functionality from the file (streaming →8.2.1). Time is used to measure the duration of the content as well
as to synchronize the content. In MXF a “track” is used to represent the passage of time. A track has units to
represent time and has an associated duration property. Some tracks have segments that butt against each
other to form a continuous sequence of video (timeline tracks) whereas others may have overlapping events that
refer to the point at which Descriptive Metadata is valid (event tracks). In fact, the mechanism for adding new
Descriptive Metadata definitions to an MXF file is to add new tracks on which to “hang” the metadata items.

To synchronize two tracks, they must be somehow related. This is done by putting them within a package (a
container for tracks) that synchronizes the start and duration of multiple tracks. Note, however, that the tracks
may have different time measurement units within the package. Time is normalized within a track by its “Edit
Rate” property. This in turn gives us an Edit Unit, of 1 / Edit Rate.

3.4.7 What are the units of time?


There are 2 main units of Time used with an MXF file. These are:
• Edit Units = 1/Edit Rate – used to mark time along a track
• Sample Units = 1/Sample Rate – used to describe the underlying sample rate of the Essence
Edit Units may be chosen for the convenience of the file writer, whereas Sample Units define the sampling
structure of the Essence. A sequence of audio samples, for example may have a Sample Rate of 48kHz,
whereas the Track that describes the sequence may have an Edit Rate of 50Hz so that synchronization with
parallel tracks is numerically simplified. For video streams, the Sample Rate is usually defined as the Field or
Frame rate of the content and not the sampling clock.
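
The audio example can be checked with exact rational arithmetic. The helper names below are invented for this illustration; the rates are the ones quoted in the text.

```python
from fractions import Fraction


def edit_unit_duration(edit_rate):
    """Duration of one Edit Unit in seconds: 1 / Edit Rate."""
    return 1 / Fraction(edit_rate)


def samples_per_edit_unit(sample_rate, edit_rate):
    """Whole number of samples in one Edit Unit, or an error if it is not whole."""
    ratio = Fraction(sample_rate) / Fraction(edit_rate)
    if ratio.denominator != 1:
        raise ValueError("not a whole number of samples per edit unit")
    return ratio.numerator


print(edit_unit_duration(50))             # 1/50 of a second
print(samples_per_edit_unit(48000, 50))   # 960 samples of 48 kHz audio per edit unit
```

Using `Fraction` rather than floating point keeps rates such as 30000/1001 exact; with that Edit Rate, 48 kHz audio does not divide into a whole number of samples per Edit Unit and the helper raises an error, which shows why the choice of Edit Rate affects how simple the synchronization arithmetic is.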

3.5 How does MXF manage the complexity?

An MXF file is highly structured. There are different structural elements that divide the file in different ways to
make the complexity manageable. This section describes some of these structural elements along with the
reasons for the division.

3.5.1 What are the File Header, the File Body and the File Footer?
The basic File Header, File Body and File Footer are explained in section 3.2.1 above. The reason for the split is
quite simple. The File Header is designed to be small enough that it can easily be isolated and sent to a
microprocessor for parsing. The bulk of the file will usually be the File Body – this is the picture, sound and data
essence. The File Footer provides a means to put the Header Metadata at the end of the file. Why? In certain
applications such as recording a stream to an MXF file, there will be Header Metadata values that won’t be
known until the recording is finished. The File Footer provides a mechanism for doing this. It also provides clear
indication that the file has terminated.

3.5.2 What is a partition?


A partition is a division of data within the file. There are 3 different sorts of partition, each of which can have four
states:
• Header Partition – this is the first partition of the file
• Footer Partition – this is the last partition in the file
• Body Partitions – all the other partitions are in the middle of the file and are used to divide the Essence
Container(s) in a certain way.
A partition may be Open or Closed, except for the Footer, which may only be closed. The normative definition of
these terms is in SMPTE 377M and extra clarification is given here:

Open – this marks the information in a partition with a “caution” notice. Any metadata information in the partition
was correct at the time of writing, but the application writing the file had not completed the writing process. This
means that some of the information may be absent, or may turn out to be plain wrong when the file is ultimately
closed. For example, a capture device may have identified a picture and a sound track when it initially started
writing the file. During the writing process, a second Sound track commenced – this track was not described in
the Open Header Metadata.

Closed – this marks any metadata information in the partition as finalized. The application or device creating the
file correctly terminated the file and all the properties of the Metadata sets were filled in to the best of the
application’s ability. In the example above, a repetition of the Header Metadata would be placed in the footer
that correctly described the existence and duration of the second Sound Track. All closed partitions in a file must
have the same Metadata property values. This is mandatory. This allows an MXF decoder to determine that the
metadata is correct as soon as it finds a closed partition. SMPTE 377M states that the File Footer, if present, will
always be a closed partition.

An MXF File can only be called a “Closed” File if there is at least one closed partition with Metadata. It is
important to note that robustness is enhanced when all the partitions in a file are closed (see 8.2.6). If a file is
accidentally truncated during a transfer and the only closed partition in the file was the footer, then the file is no
longer a “Closed File”. If robustness is desired (and it usually is), application and device developers are urged to
close all the partitions of their files. All valid MXF files must be closed; however, certain situations, such as an
interrupted file transfer, may leave an “Open” file that is still partly usable. The ability of a device to handle
“Open” MXF files is an application issue.

In an ideal world, the two states of “Open” and “Closed” would be sufficient to describe all the files in existence.
The desire for cheap hardware and software, however, means that some capture devices and applications will
not be able to parse the wide variety of essence types they might expect to place in an MXF file. To cope with
this condition, the states “complete” and “incomplete” have been defined to mark the status of the Essence
Descriptor(s) in the MXF File.

Complete – each of the properties in the Header Metadata with a status of “required” or “best effort” exists in the
file and is correct. The status of each of the properties is given in SMPTE 377M.

Incomplete – One or more properties within the Header Metadata with a status of “best effort” has a
distinguished value. The distinguished value is used to mark the property as “unknown at the time of writing”. An
MXF file may still be a closed file because all the other properties of the file are known. Some of the Header
Metadata may be incomplete due to the absence of an essence parser at the time of file creation. This allows an
application to report many of the metadata properties of the file, but certain Essence Decoders may need to
parse portions of the file before it is playable.

Maximum robustness is achieved when applications and devices create Closed and Complete MXF Files
(see 8.2.6).
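
The effect of these states on a whole file can be illustrated with a small Python sketch. The `Partition` class and its field names are hypothetical and exist only for this example; they are not the Partition Pack layout defined in SMPTE 377M.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    kind: str                  # "header", "body" or "footer"
    closed: bool               # Closed (True) or Open (False)
    complete: bool             # Complete (True) or Incomplete (False)
    has_header_metadata: bool  # True if Header Metadata is present


def is_closed_file(partitions):
    """A file is "Closed" if at least one closed partition carries Metadata."""
    return any(p.closed and p.has_header_metadata for p in partitions)


def is_closed_and_complete_file(partitions):
    """Closed file whose closed Metadata is also marked complete."""
    return any(p.closed and p.complete and p.has_header_metadata
               for p in partitions)


# A recording: Open header written first, Closed footer written at the end.
recording = [
    Partition("header", closed=False, complete=False, has_header_metadata=True),
    Partition("body", closed=False, complete=False, has_header_metadata=False),
    Partition("footer", closed=True, complete=True, has_header_metadata=True),
]
print(is_closed_file(recording))       # True
# The same file truncated in transfer, so that the footer is lost.
print(is_closed_file(recording[:-1]))  # False
```

The truncated case shows why closing every partition improves robustness: once the footer is lost, no closed Metadata remains in the file.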

Each partition starts with a Partition Pack that defines what sort of partition it is, followed by the following
optional items:
• Header Metadata
• Index Table Segment(s)
• Essence Container data
From these and other restrictions, we limit an MXF partition to contain only a single “thing”, i.e. a single Essence
type with its associated Index Table Segments. If different Essence Containers need to be multiplexed together
within the file, then a new partition must be started when the Essence Container changes.

3.5.3 How does KLV leave room for expansion?


At a lower level than the object definitions is the KLV coding. KLV stands for Key Length Value. Every object,
piece of metadata or any “thing” in the MXF file has a Key (16 byte value) and a Length that defines how long
the Value of the object, metadata or “thing” is. After this, the Value of the object, metadata or “thing” follows.
Note that the Key is in fact a SMPTE Universal Label and as such follows the rules defined in SMPTE 298M.

KLV coding is fully defined in SMPTE 336M and includes not just the encapsulation of individual data items, but
also the encapsulation of collections of individually coded KLV data items into logical data sets and packs (a.k.a.
objects as above).

A decoder that does not recognize a Key is able to skip over the unknown Value and inspect the next Key. This
allows extra functionality to be added to the MXF specification at a later date, knowing that older decoders will
be able to skip over the Values.
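
The skipping behaviour can be sketched as a simple loop over KLV triplets. This is a minimal illustration and not a conformant parser: the two 16-byte keys are placeholders invented for the example (they are not registered SMPTE Universal Labels), and the Length decoding follows the BER rules described later in this section.

```python
import io


def read_ber_length(stream):
    """Read a short-form or long-form BER Length field from the stream."""
    first = stream.read(1)
    if len(first) != 1:
        raise EOFError("truncated Length field")
    if first[0] < 0x80:                      # short form: value in 7 bits
        return first[0]
    count = first[0] & 0x7F                  # long form: number of bytes
    if count == 0:
        raise ValueError("indefinite length is not handled in this sketch")
    raw = stream.read(count)
    if len(raw) != count:
        raise EOFError("truncated Length field")
    return int.from_bytes(raw, "big")


def iter_klv(stream, known_keys):
    """Yield (key, value) for recognized Keys; skip the Value of all others."""
    while True:
        key = stream.read(16)
        if not key:
            return                           # clean end of data
        if len(key) != 16:
            raise EOFError("truncated Key")
        length = read_ber_length(stream)
        if key in known_keys:
            value = stream.read(length)
            if len(value) != length:
                raise EOFError("truncated Value")
            yield key, value
        else:
            stream.seek(length, io.SEEK_CUR)  # unknown Key: jump over Value


# Placeholder keys, for illustration only.
KNOWN_KEY = bytes(15) + b"\x01"
UNKNOWN_KEY = bytes(15) + b"\x02"

data = (KNOWN_KEY + b"\x03" + b"abc"
        + UNKNOWN_KEY + b"\x83\x00\x00\x02" + b"zz"
        + KNOWN_KEY + b"\x01" + b"!")
values = [value for _, value in iter_klv(io.BytesIO(data), {KNOWN_KEY})]
print(values)  # [b'abc', b'!']
```

The decoder never needs to understand the middle item; the Length alone is enough to find the next Key.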

Words within the Key are ISO Object Identifiers (OID) using primitive BER (Basic Encoding Rules: ISO/IEC
8825-1 ASN.1). This means that the most significant bit of each 8 bit value is a flag to say that the word is
greater than a 7 bit value. For example if the 12 bit value b (b11 .. b0) is to be mapped into a KLV key then here is
a possible mapping into bytes 14 and 15 of a key:

Word    14   14   14   14   14   14   14   14   15   15   15   15   15   15   15   15
Bit      7    6    5    4    3    2    1    0    7    6    5    4    3    2    1    0
Value    1    0    0  b11  b10   b9   b8   b7    0   b6   b5   b4   b3   b2   b1   b0

Figure 3 : Example of BER OID encoding

Figure 3 shows a binary 1 in bit 7 of byte 14 to indicate that this is a multi-byte value. There is a binary 0 in bit 7
of byte 15 to show that this is the last byte of a multi-byte value. A byte value of 0 is often used to terminate a
label and a marker bit in bit 6 of byte 15 may be used to prevent accidental termination from occurring. Note that
the actual mapping of bits into a label Key must be normatively defined in an appropriate document.
Note: At the time of writing this Engineering Guideline, this multi-byte OID technique is not in use in any of the specifications.
MXF parser writers should be aware that this technique may be used in the future, and that, although the number of bytes in
a SMPTE key is 16, the number of words may be less than 16, or alternatively, there may be 16 bytes of which the final
words are assumed to be 0.
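
As an illustration of the example mapping in Figure 3 only, the following Python sketch packs a 12 bit value into two bytes and recovers it again. The function names are invented for this example, and the normative mapping for any real label must come from the appropriate document, as stated above.

```python
def pack_12bit(value):
    """Map a 12 bit value b11..b0 into two bytes as shown in Figure 3."""
    if not 0 <= value < (1 << 12):
        raise ValueError("value must fit in 12 bits")
    first = 0x80 | (value >> 7)   # flag bit set, then 0 0 b11..b7
    second = value & 0x7F         # flag bit clear, then b6..b0
    return bytes([first, second])


def unpack_12bit(pair):
    """Recover the 12 bit value from the two bytes produced above."""
    first, second = pair
    if not first & 0x80 or second & 0x80:
        raise ValueError("not a two-byte value in the Figure 3 layout")
    return ((first & 0x1F) << 7) | second


packed = pack_12bit(0xABC)
print(packed.hex())               # 953c
print(hex(unpack_12bit(packed)))  # 0xabc
```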

The Length field is BER (Basic Encoding Rules: ISO/IEC 8825-1 ASN.1) coded. This allows the length field to
have a variable number of bytes. So how do you know the length of the length field?

The length field is always coded MSB (most significant byte) first. If bit 7 of the first byte is a ‘0’ then the 7 least
significant bits contain the length value (0 .. 127). If bit 7 of the first byte is a ‘1’ then the 7 least significant bits
tell you the number of bytes that follow in the length field, e.g. the value ‘83h’ means that the next 3 bytes contain the
length value. The Format document gives recommendations for the upper limit of the length field. Decoders must
be able to handle both long form and short form BER coding.

The examples below show a length value of 64 coded in the 3 different ways:
40h short form coded
83.00.00.40 long form coding using 4 bytes overall
87.00.00.00.00.00.00.40 long form coding using 8 bytes overall
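
The three codings above can be reproduced with a short Python sketch. The function names and the `total_bytes` parameter are assumptions made for this example; the sketch covers only the short form and the definite long form described here.

```python
def encode_ber_length(length, total_bytes=None):
    """Encode a length; total_bytes forces the overall size of the field."""
    if length < 0:
        raise ValueError("length must not be negative")
    if total_bytes is None:
        if length < 0x80:
            return bytes([length])               # short form
        count = (length.bit_length() + 7) // 8   # smallest long form
    elif total_bytes == 1:
        if length >= 0x80:
            raise ValueError("value does not fit in the short form")
        return bytes([length])
    else:
        count = total_bytes - 1
    if not 1 <= count <= 126:
        raise ValueError("unsupported length field size")
    # to_bytes raises OverflowError if the value needs more than count bytes.
    return bytes([0x80 | count]) + length.to_bytes(count, "big")


def decode_ber_length(data):
    """Return (length value, number of bytes used by the length field)."""
    first = data[0]
    if first < 0x80:
        return first, 1
    count = first & 0x7F
    if count == 0 or len(data) < 1 + count:
        raise ValueError("unsupported or truncated length field")
    return int.from_bytes(data[1:1 + count], "big"), 1 + count


for size in (None, 4, 8):
    coded = encode_ber_length(64, size)
    print(coded.hex("."), decode_ber_length(coded))
# 40 (64, 1)
# 83.00.00.40 (64, 4)
# 87.00.00.00.00.00.00.40 (64, 8)
```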

3.5.4 What is a KAG?


It is a KLV Alignment Grid. This is a performance enhancer for devices with fixed size blocks. During the design
of the MXF Format there were many discussions on whether the format should use rigid sectoring or not. The
conclusion was that sometimes it was important, but a device or application should always be able to read an
MXF file regardless of whether the elements within the file fell on rigid byte boundaries within the file.

The KAG can be thought of as gridlines spaced on uniform byte boundaries in each partition. To achieve good
performance, all the important KLV items within the file (Header Metadata, Content Packages of the Essence
etc.) should line up on the Grid. This means that the first byte of the Key should be on a grid boundary.

The reference point for a KAG is the first byte of the key of a Partition Pack, and the KAG value is valid within
the partition. SMPTE 377M states: “The first gridline in any partition is the first byte of the Key of the Partition
Pack that defines that partition.” In order to have a global KAG value, each and every Partition Pack must have
the same KAG value. Additionally, to maintain this global KAG value, the first byte of each and every Partition
Pack must lie on a KAG boundary. Finally, if there is a run-in, its length in bytes must be an integer multiple of
the KAG value.

This feature is a performance enhancer because it reduces the need to search every byte for the start of a new
file component. It is possible that some process may make a change to a file that breaks the KAG rules, but is
unable to modify the KAG value in the partition header. An MXF decoder that is receiving a file may desire a
certain KAG value because its internal storage is arranged on rigid boundaries. It should continue to check each
of the KLV triplets received for confirmation that they still lie on the KAG. The majority of files that use the KAG
feature will respect the value in the partition header, but some may not. The MXF application receiving the file
that does not respect the KAG should not fail under this condition, but performance may be severely restricted.
For example, the receiving application may choose to process the incoming stream to force it to be aligned to
the KAG by inserting Fill KLVs. This may slow it down and cause it to recalculate Index Tables.
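
The alignment check and the size of a Fill KLV needed to reach the next gridline can be sketched as follows. This is an illustration under stated assumptions: the function names are invented, and the minimum Fill size of 17 bytes assumes a 16 byte Key plus a 1 byte Length; an implementation that always writes a longer Length field would need a larger minimum.

```python
MIN_FILL = 16 + 1  # assumed: 16 byte Key plus a 1 byte short-form Length


def is_on_grid(offset, partition_start, kag):
    """True if a Key starting at `offset` lies on a gridline of the partition."""
    return (offset - partition_start) % kag == 0


def fill_size_to_next_gridline(offset, partition_start, kag, min_fill=MIN_FILL):
    """Total size of the Fill KLV that aligns the next Key to the grid."""
    gap = -(offset - partition_start) % kag
    # A gap too small to hold a Fill KLV has to run on to a later gridline.
    while 0 < gap < min_fill:
        gap += kag
    return gap


print(fill_size_to_next_gridline(700, 0, 512))   # 324, next Key at 1024
print(fill_size_to_next_gridline(1020, 0, 512))  # 516, next Key at 1536
print(fill_size_to_next_gridline(1024, 0, 512))  # 0, already aligned
```

Offsets are measured from the first byte of the Key of the Partition Pack, which is the first gridline of the partition.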

3.5.5 What is an Index Table?


An Index Table improves random access within an MXF file. Specifically, it allows random access by a time
index. This means that if you want to access the picture, sound or data that starts 10 seconds into the file, then
an Index Table will provide the translation between the time value and the byte offset within the file. MXF Index
Tables are quite complex because the Format is designed to cope with interleaved Essence Containers that
may be constant or variable bit rate and that also may be temporally re-ordered on the disk compared to the
presentation order (e.g. Long GOP MPEG2 files). Index Tables are more fully described in section 8.3.
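
The basic translation from a time value to a byte offset can be sketched for the two simplest cases. This is a deliberately simplified model with invented names: it ignores interleaving and temporal re-ordering, and it does not represent the Index Table Segment structure described in section 8.3.

```python
import math
from fractions import Fraction
from itertools import accumulate


def edit_unit_at(seconds, edit_rate):
    """Index of the Edit Unit that contains the given time."""
    return math.floor(Fraction(seconds) * Fraction(edit_rate))


def cbr_byte_offset(edit_unit, bytes_per_edit_unit):
    """Constant bit rate: every Edit Unit occupies the same number of bytes."""
    return edit_unit * bytes_per_edit_unit


def build_vbr_offsets(sizes):
    """Variable bit rate: one stored offset per Edit Unit, from its size."""
    return [0] + list(accumulate(sizes))[:-1]


def vbr_byte_offset(edit_unit, offsets):
    """Variable bit rate: look the offset up in the table."""
    return offsets[edit_unit]


unit = edit_unit_at(10, 25)          # 10 seconds at an assumed Edit Rate of 25
print(unit)                          # 250
print(cbr_byte_offset(unit, 1000))   # 250000

offsets = build_vbr_offsets([100, 40, 60, 100])
print(offsets)                       # [0, 100, 140, 200]
print(vbr_byte_offset(2, offsets))   # 140
```

The constant bit rate case needs only one number, whereas the variable bit rate case needs a table entry for every Edit Unit, which is one reason real Index Tables are more complex.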

3.5.6 What is a Random Index Pack?


A Random Index Pack (RIP) provides a list of the positions of all the partitions within a file. This is different from
an Index Table, which provides the byte offsets of the content within a partition. The difference can be clearly
seen when two different Essence Containers are multiplexed together. There will be two separate Index Tables,
each of which contains conversions between temporal offsets and Byte Offsets within each Essence Container.

The RIP, however, gives absolute positions of the Partitions, so all the Index Tables may be rapidly built without
parsing the entire file. The RIP contains a mechanism for quickly determining its existence.

3.5.7 What is an Operational Pattern?


An Operational Pattern is used to constrain MXF complexity. The Generalized Operational Patterns are intended
to split the complexity depending on the complexity of processing required by an MXF decoder. Specialized
Operational Patterns are likely to be created in order to constrain MXF for a particular “application space”.
Usually an MXF file is interchanged for a purpose. This may be the exchange of an ingested clip, a camera
output, a finished program or the interchange of a partially edited program. Each of these purposes has
different implications for the structure of the file. Different Operational Patterns define the Structural Metadata
that is required to satisfy a particular application. In general, the higher the number of the Operational Pattern,
the more complex the file and the more functionality is required in the decoder. Simple Operational Patterns
such as OP1a can be used with both linear and non-linear access devices. Some, more complex, Operational
Patterns require non-linear access devices.

Each Operational Pattern has an assigned SMPTE Label value that allows MXF decoders to quickly recognize
the complexity of an MXF file.

3.5.8 What is an Essence Container?


An Essence Container defines the encapsulation of a particular type of essence. Its purpose is to allow the
essence to be wrapped in KLV and to have associated with it an optional Index Table to allow rapid access to a
given time offset within the essence. The Essence Container is structured to allow easy multiplexing with other
Essence Containers and to allow identification of the decoding requirements needed to display / listen to / play /
execute the content.

An Essence Container specification defines a unique SMPTE Label for identification as well as a method for
encapsulating the essence in a KLV structure. Different Essence Containers may place restrictions on the
interleaving of the essence data to be compatible with existing applications. The SMPTE Label allows decoders
to make a fast go/no-go check of the essence type at the very beginning of the file.

An MXF file may have more than one Essence Container. The precise number of Essence Containers and their
relationships is constrained by the Operational Pattern with which the file complies.

A “Generic Container” is defined within MXF. This is intended to carry all the mainstream Essence types in
existence at the time of creating SMPTE 377M. It is very simple in operation, yet flexible enough to carry
uncompressed material as well as re-ordered MPEG compressed material. Associated with the Generic
Container are a number of mapping documents that define how the actual Essence byte stream should be
placed in the Essence Container.

3.5.9 How have we specified the Essence Container?


There are several specific questions that need to be asked when putting an Essence Container into an MXF file.
These are notably:
1. What limitations are placed on the Essence Container when it is in an MXF file?
2. Are there interleaved variants of the Essence Container?
3. How do we KLV code the contents of the Essence Container?
4. How do we pad the Essence Containers to fit the chosen KAG size?
5. What do we do with the metadata embedded within the Essence Container?
6. How do we use Index Tables with the Essence Container?
The Essence Container and mapping specifications are basically recommended answers to these questions. It
is the intention of the Essence Container and mapping documents to restrict the choices of an Essence
Container implementation sufficiently to allow interoperability between devices, yet with enough flexibility to
solve real world problems.

3.6 How does MXF interoperate with Stream Interfaces?

MXF files may be directly created from standardized formats such as MPEG2 system and elementary streams,
AES3 data streams and DV DIF packet streams. These formats may be mapped from one of several real-time
interfaces such as SMPTE 259M (SDI), SMPTE 305M (SDTI), SMPTE 292M (HD-SDI), or transport interfaces
with real-time protocols such as IEEE-1394, ATM, IEEE802 (ethernet), ANSI Fibre Channel and so on.

When a streaming file is captured, a File Header is created and the essence is KLV wrapped on the fly. The
data rate increases due to the KLV wrapping and addition of headers. Real Time streaming devices must ensure
that any buffering requirements of a streaming interface are catered for with this change of data rate.
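
The size of this increase can be estimated with simple arithmetic. The sketch below counts only the Key and Length added to each wrapped item; the 4 byte Length field and the payload sizes are assumed values for illustration, and partition packs, Header Metadata and Fill items are not counted.

```python
KEY_BYTES = 16  # every KLV Key is 16 bytes


def wrapped_bytes_per_second(payload_bytes, items_per_second, length_bytes=4):
    """Data rate after adding a Key and a Length to every wrapped item."""
    return (KEY_BYTES + length_bytes + payload_bytes) * items_per_second


def overhead_percent(payload_bytes, length_bytes=4):
    """KLV overhead as a percentage of the payload size."""
    return 100 * (KEY_BYTES + length_bytes) / payload_bytes


# Assumed example: 25 items per second, 250 000 payload bytes per item.
raw = 250_000 * 25
wrapped = wrapped_bytes_per_second(250_000, 25)
print(wrapped - raw)           # 500 extra bytes per second

# Assumed example: small items of 1920 bytes carry a larger relative cost.
print(round(overhead_percent(1920), 2))  # 1.04
```

The relative overhead grows as the wrapped items become smaller, which is where buffering margins on a real-time interface deserve the most attention.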

Conversion to and from the source format is always possible, but sometimes there will be loss of information.
Not all streaming and storage formats are able to store the rich metadata constructs available in an MXF file.
Often there will be a lossy data mapping where information in one format cannot be represented in the other.
Eliminating this undesired loss is a function of the systems engineering that interconnects MXF and non-MXF
systems. In many formats such as the MPEG2 Transport Stream, research is being done to find ways in which
MXF headers can be “tunneled” through the Transport Stream so that its use in an MXF system provides
transparency as well as interoperability.

3.7 How does MXF interoperate with other files?

As previously stated, MXF files apply a subset of the AAF class model. The Material Exchange Format provides
a data structure together with a set of constraints and plug-ins to create files that can be directly written and read
by AAF systems. MXF is also able to inter-operate with other existing file formats by utilizing techniques such as
external essence and using the run-in to “camouflage” the appearance of the file (see the end of 3.2.1 above).
Different metadata models can be plugged into the MXF file Format to provide extensions and the KLV structure
itself can be converted to formats such as XML for exporting MXF data to other systems.

When an application needs to convert the contents of an MXF File to and from other formats, such as AVI, the
entire file will normally need to be unwrapped and re-coded in the new format. Often the Essence itself (for
example, MPEG Long GOP video) will not need re-encoding; however, it is very likely that Metadata will
be lost when an MXF file is converted to another format.

3.8 What is meant by simplicity?

MXF files must be amenable to implementation in high throughput hardware or software devices. This translates
into the need for well-defined design parameters for buffer size, latency, and the need for algorithmic simplicity.
MXF is also intended to cover a very large application space, and not all the requirements apply to all the
applications. The examples below are all application specific:

Example constraints:
• Buffer size must be minimized for low latency streamability.
• KLV wrapping and file partitioning latency must be small and bounded.
• Algorithms should not require distant look-ahead to calculate parameter values.
• Algorithms should not require deep stacks or high performance coprocessors, and should preferably be
straight-line (no looping).
• Operational Patterns should create controlled and bounded application environments that are
constrained enough to ensure interoperability, yet broad enough to allow many implementations.
The design can also be kept simple through the proper use of layering. Network, transport and session layer
functions and data units must be kept separate at all costs, so as not to burden any layer with processing that
belongs to another layer.

3.9 Why does MXF need to work with stream interfaces?

MXF files will often be processed in streaming environments. This will include streaming to and from videotape
and data tape, and transmission over unidirectional links or links with a narrow-band return-channel.

In these environments it is impractical to rewind the stream to update parameter values so files must be written
sequentially. This implies that the minimum buffer size and latency are determined by (among other things) the
maximum KLV packet size. Implementations of MXF streaming should take into account all the constraints of
the Operational Pattern in use, as well as extra restrictions imposed by the particular streaming data link before
recommending buffer sizes or latency requirements.

Sequential writing is necessary when source or link or destination operate only in streaming mode. Random
access writing is permissible before or after data transfer, for example, to optimize downstream access
performance.

Operational Patterns have special qualifier bits that indicate that the file has been created for streaming.

3.10 How does MXF provide for stream recovery?

Streaming environments also impose requirements for recovery and re-synchronization in several different
circumstances:
1. When a packet or other data block is lost.
2. When a decoder joins a transfer that is already in progress.
3. When a transfer or partial transfer is restarted.
4. When it is necessary to access or retransmit a file that is still being received (“Pre-Play”).
5. When overall metadata is modified during the time of transfer.
The first of these (packet loss) usually requires a return-channel or forward error correction for effective
protection. The other circumstances are addressed by judicious design of the Format to allow for re-
synchronization points and for repetition of important metadata.

3.11 How does MXF provide for application diversity?

Different applications may require Metadata to be processed separately from the Essence. Other applications
(such as archive) may require Metadata to be stored with the Essence. This requires efficient insertion and
extraction of the Metadata from the Essence Container(s) of the file.

Some applications may prefer Index Tables to be accessed separately from the Essence; others may require
the two to be accessed together. In some cases, the Index Tables are most naturally stored at the start of the
file; however, while recording, the most natural location is at the end of the file. This diversity requires efficient
insertion, extraction and relocation of Index Tables within the file.

3.12 How does MXF make references to its different components?

MXF uses different referencing mechanisms for different purposes. One example that causes confusion is the
difference between references to the Top-Level “File Package” and “The Essence”. The MXF Content Storage
Set uses Instance UIDs to reference all the Packages in an MXF File. One of these will match the Instance UID
of a File Package within the File. This is a strong reference to the package. The package itself is a description of
the Essence, but is not the Essence itself.

The Content Storage Set also uses Instance UIDs to keep a list of Essence Container Sets. These are used to
group the various IDs that enable an MXF Decoder to work out which Partitions and Index Tables relate to which
Top-Level File Package. Specific details are given in section 7.5.

This seems straightforward until we look at how a Material Package SourceClip references the Essence. This
structure does not use the Instance UID values; instead it uses the 32 byte UMID of the essence as a reference. This is
because the Material Package is referencing the Essence of which the Top-Level File Package is a description.

4 File Interchange Requirements


There are two basic types of File Interchange requirement: User requirements and Technical requirements. The
User requirements are lists of things that users want to be able to do with files. The technical requirements are
features of the file that allow applications to be accommodated.

4.1 User Requirements for an Interchange File

The MXF Format, at its lowest level, should support functionality that is commonly available in today’s video file-
servers. The MXF / AAF Joint File Interchange Working Group, in co-operation with the EBU P/PITV group and
the SMPTE have summarized the user requirements for MXF as follows:

Table 1 : User Requirements table

User Requirements LIST, assigned the following priorities:

A = Baseline ("Must"),
B = Enhanced ("Can"),
C = Extended ("May"),
U = Undecided or not determined,
X = not allowed (should not be allowed)

User Requirement | General Priority | PROFESSIONAL APPLICATIONS: Authoring Interchange | Finished Interchange | Content Repository | Publication (Emission, Transmission, Store & Forward, etc.)

Must be easy to understand & apply and standardized | A++ | Y | Y | Y | Not easy
Must be compression independent | A | Y | Y | Y | Y
Low implementation overhead | A | Y | E.g. could be complex if editing required | Y | No
Must be open (as per ITU definition) | A | Y | Y | Y | Y
Must provide Identification of the payload | A | Y | Y | Y | Y
Must provide for normative templates | A | Y | Y | Y | Y
Must be extensible in header and body (by KLV coding?) (e.g. from one frame to many frames) | A | Y | Y | Y | Y
Scalability (small file/single frame to large file) | A | Y | Y | Y | Y
Must provide synchronization for multiple essence types, e.g. Audio/Video/Data Essence and certain Metadata | A | Y | Y | Y | Y
Must wrap Video Essence[s], Audio Essence[s], Data Essence[s], Metadata | A | Y | Y | Y | Y
Must permit direct mapping for existing transfer format (e.g. MPEG-TS, SMPTE 314M, FC-AV, ATM-Wrapper) | A? | Y | Y? | Y | Not always needed
Must uniquely identify container framework (e.g. FC/AV) | A | Y | Y | Y | Y
Must be usable on major platforms / OSs | A | Y | Y | Y | Y
Must be application independent | A | Y | Y | Y | Y
Must provide means for partial file transfers | A | Y | Y | Y | Not always needed
Must provide means for graceful recovery after interrupted transfer | A | Y | Y | Y | Desirable
Must provide cut-only edit capability (versioning) | A | Y | Y | Desirable | Desirable

Must be transport and storage mechanism independent (e.g. FEC is a transport issue) | A | Y | Y | Y | Y
Simple and complex template (backward-forward compatibility?) | A | Y (simple) | Y (both) | Y (simple) | Y (complex)
Format Expandability in Operational Patterns: A A, B, C, A, B, C, A, B, C A-F
1a: Simple Pattern: single item/representation (e.g. clip) D, E
Extended Pattern that might be an individual pattern or a more generalized pattern
1a.
A: Compiled: Segmented item/representation (e.g. part of a final composition)
B: Uncompiled Program: simple edit representation (e.g. compound clips=each
track has its own time line)
C: Uncompiled Compound: edit representation (as template before but with handles
e.g. for cross fades)
D: Uncompiled Elements:
E: Metadata only representation
F: Effect representation
G: Archiving
Etc.
A prerequisite for all Operational Patterns, generalized patterns etc. is proper
standardization/documentation to guarantee interoperability.
It is also assumed that Operational Patterns reflect a certain application environment
(or environments). This has to be described in the documentation (standards).
Easy conversion from file to stream and vice versa | A | Y | Y | Y | Desirable
Robustness against errors. Examples: during file transfer interrupt; corrupted header; file access error. Note: robustness against errors may belong more to the transfer mechanism than to the file format domain. | A | Y | Y | Y | Y
Interface to pre-existing interconnect standards (mappings into IP, FC etc.) | A | Y | Y | Y | Y
Extensibility to include non-predefined data (e.g. dark Metadata) | A | Undesirable | Undesirable | Undesirable | Y

A/B LIST
Can provide random access: Play/access while transfer; Play/access while record (open ended) | A/B | Y | Y | Y | Y
Fast frame and field level access (e.g. by means of indexing to field/frame/audio frame level) | A/B | Y | Y | Y | Y

B-LIST

Low latency (values see 1st TF report), “goal 1 Frame, 1 GOP” | B | Y, but depends on further applications | Maybe | Y | Maybe
Link Metadata to structural composition information | B | N | Y | Maybe | Y
Can accommodate a range of GoPs (e.g. MPEG) | B | Y | Y | Y | Y
Can provide for re-coding data sets (e.g. compression history information) | B | Y | Y | Y | Y
Can provide Index, i.e. can tabulate byte offsets within a file that correspond to given Timecodes. | B | Optional | Optional | Optional | Y
Assignable granularity of Metadata (field, frame/clip/file) | B | Y | Y | Y | Y

C-List
Extensible for internet. Metadata as binary and text format | C | Y | Y | Y | Y
Discontinuous essence elements (chunking) | C | If required: Y | If required: Y | If required: Y | If required: Y
Allow externally referenced essence files for certain applications such as Archiving. Proper standardization / documentation is a prerequisite if external references are used. | C | N | Undesirable | N | Y

X/U List
Allow proprietary vendor created templates | X | ? | ? | ? | Maybe

4.2 Technical requirements of a file

The technical requirements derive from the user requirements. The individual requirements are introduced
gradually throughout the document. A typical example of a technical requirement is that of Body Partitions. The
user requirements state that the file format must support partial transfers and must provide graceful recovery
after errors. The technical requirement from this is that the file must periodically contain repeated data to allow
partial transfers or recovery. The implementation chosen in MXF is Body Partitions.

5 A guide to the wording of the MXF standard


MXF files apply a subset of the AAF class model. Because of this, many of the words used in the MXF standard
are the same as in the AAF specification. There are occasionally subtle differences of meaning between MXF
and AAF because of the different applications they address. One example of a naming difference concerns
packages: where AAF uses the phrase “File Source Mob”, MXF uses the shorter phrase “File Package”
with the same meaning. In all MXF documents, new normative terms will be defined within the document. Subtle
differences with AAF have not been highlighted in the specification because the MXF standard is self-consistent.
The main glossary of terms and data types can be found at the start of SMPTE 377M.

5.1 Normative vs. Informative

5.1.1 Normative
The definition of Normative is given in the SMPTE Administrative Practices. For information, normative parts of a
document cover those elements of the format that are fully specified. The implication of a normative clause is “if
you do this particular function or encoding process, do it like this”. Normative does not imply that all decoders
must understand all normative elements, just as it does not imply that all encoders will encode all normative
elements. Normative clauses use the verb “shall”.

The value of a Normative clause is that it defines the parameters and syntax for a given function or process.

5.1.2 Informative
Informative parts of a document provide additional explanation or describe optional functions or processes. The
implication of an Informative clause is “you may do this particular function like this”. The value of an Informative
clause is that it provides an illuminating example of how to achieve a function or process to improve
interoperability. Informative clauses use the verb “may”.

Since neither Normative nor Informative convey any information as to which functions an implementation is
expected to perform, additional terminology is needed.

5.1.3 Recommendations
SMPTE 377M contains many recommendations. These arise in places where it was desirable to make a
normative provision, but the provision could not be enforced. For example “the duration property should be
correct in all Header Metadata repetitions”. Devices such as cameras cannot create an MXF File with the correct
duration because the header is written before the file is closed and completed. This provision is therefore a
recommendation rather than a normative requirement. Recommendations use the verb “should”.

5.2 Encoding, Decoding

One of the key points of developing any new techniques is to consider the layering of any file format and its
contents. This helps us to understand the meaning of an ‘encoder’ and a ‘decoder’ at any given layer.
Unfortunately attempts to introduce new words such as “encapsulate” have not been well accepted and words
such as “encoder” are forced to have slightly different meanings depending on context.

The layers for encoders and decoders can be broken down as follows:

Table 2 : Content Layering

Layer            File Body                                              File Header & Footer

Application      Source Coding (525, 625 etc)                           Data Interpretation (e.g. dictionary of data definitions)
Essence Coding   Compression Coding (MPEG, DV etc)                      Data Communication (e.g. relationships between objects)
Container        Essence Container (CP, PS, TS for MPEG, DIF for DV)    Data Container (e.g. KLV sets, Objects)
Encapsulation    MXF coding (KLV)
Transport        Transport (IP packets, etc)

The overall system is as follows:


1. An MXF system accepts essence represented in its "source coded format".
2. The essence is optionally compressed through a source Encoder.
3. The essence components are multiplexed by an MXF Encoder into partitions.
4. The multiplexed partitions are encoded into an MXF file by an MXF Encoder.
5. The MXF file is decoded by an MXF Decoder to present the essence to a user.
6. The MXF file is demultiplexed by an MXF Decoder to split the file into its different essence components.
7. The encoded essence is decompressed by an essence Decoder.
8. The decompressed essence is displayed or presented in its "source coded format".
(Note: processes are italicized, nouns are in bold).

Note that not all processes will be supported by all equipment. Many devices will operate over all layers to
provide a network or stream interface at the lowest layer, and an interface to the user at the highest layer.
However, devices that simply ‘store and forward’ need only respond to the lowest 2 layers and devices that
‘unwrap’ the data contents to provide the raw data streams only respond to the lowest 3 layers.

5.3 Functional descriptions – Encoder Required etc.

The following terms have been proposed to describe functionality that must be supported in order to create an
interoperable MXF environment. SMPTE 377M defines the normative terms; additional explanation is given here
for information.

Summary:

Table 3 : Functional Descriptions

Phrase             Abbreviation   MXF encoder   MXF decoder        Meaning

Required           Req            Shall         Must               See below
Encoder Required   E/req          Shall         May                See below
Decoder Required   D/req          May           Shall              See below
Optional           Opt            May           May                See below
Best Effort        B.Effort       Should        May                See below
Dark               Dark           Should        Shall not ignore   Used to describe essence and metadata items that are unknown to an application at a given time.
Incompatible       Incompat.      Shall not     Can explode        Items that could cause catastrophic decoder failure

5.3.1 Required
A Required Item is essential to both encoder and decoder. An example of a required metadata item is a Preface
Set. The encoder must encode this and the decoder must understand it and act on it.

5.3.2 Encoder Required


An Encoder Required Item must be encoded and sent by the encoder, but a decoder may choose to ignore it and
need not decode it. An Encoder must not assume that a decoder has taken notice of such an item.

5.3.3 Decoder Required


A Decoder Required Item may be sent by the encoder. If sent, the decoder must act upon the Item. If not sent,
then the decoder may either do nothing, or set the item to a default value or take a predefined default action if
specified by the relevant document.

5.3.4 Optional
An Optional Item may be sent by the encoder if it is known. If sent, the decoder may choose to ignore the Item.
If not sent, then the decoder may either do nothing, or set the item to a default value or take a predefined
default action if specified by the relevant document.

5.3.5 Best Effort


A Best Effort Item is very important to a decoder, but may not be known by the encoder at the time of file
creation. These Items have distinguished values that mark them as not known; when these distinguished values
are used, the file becomes an “Incomplete” file as explained in section 3.5.2.

Note that a ‘default’ value for an Item is the value that a decoder should use in the absence of the Item. A
‘distinguished’ value is used by an encoder to signal that the Item value is unknown by the encoder. The
difference between ‘default’ and ‘distinguished’ is important.
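
The distinction between the two kinds of value can be sketched in Python. This is only an illustration under stated assumptions: the sentinel -1, the default 0 and the item name "duration" are placeholders chosen for the sketch, not values taken from SMPTE 377M.

```python
# Hypothetical values, chosen for this sketch only.
DISTINGUISHED_UNKNOWN = -1   # written by an encoder: "I do not know this value"
DEFAULT_VALUE = 0            # assumed by a decoder when the Item is absent


def read_item(properties, name):
    """Return (value, status); status is 'present', 'defaulted' or 'unknown'."""
    if name not in properties:
        # Item was not sent: the decoder falls back to the default value.
        return DEFAULT_VALUE, "defaulted"
    value = properties[name]
    if value == DISTINGUISHED_UNKNOWN:
        # Item was sent, but the encoder marked it as not known.
        return None, "unknown"
    return value, "present"


def is_incomplete(properties, best_effort_names):
    """A file is treated as Incomplete if any Best Effort Item is marked unknown."""
    return any(properties.get(n) == DISTINGUISHED_UNKNOWN for n in best_effort_names)
```

A decoder written this way never confuses "the encoder did not send the Item" with "the encoder sent the Item but could not fill it in".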

5.3.6 Dark
A Dark Item is one that is unknown by a decoder or an encoder. This Item may be proprietary and unknowable
by a decoder. It may be an extension to SMPTE 377M that has not been incorporated into a device or
application. It may even be metadata in the original specification that is not relevant to a device or application.
All that is certain is that the meaning of the metadata is unknown. In certain application environments, encoders
may be required to carry Dark metadata and decoders may be required to make Dark metadata available.

SMPTE 377M uses KLV local sets with 2 byte tags and 2 byte lengths and includes a special pack structure
called the “Primer Pack” to ensure that dark metadata properties can be created and handled without the
possibility of a numerical clash of local tag values.

Why is this important? Imagine that 2 companies X and Y each independently want to extend the MXF
Identification Set to include some vital property of their application in every MXF file that they save. Without the
Primer Pack, there is a finite chance that they will both choose the same local tag value for their private
metadata property and, when they open each other’s files, they will misinterpret or even corrupt each other’s
metadata properties. The Primer Pack mechanism exists to prevent this happening.
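
The tag clash described above can be sketched in a few lines of Python. This is a minimal illustration, not the normative Primer Pack encoding: the primer is modelled as a plain dictionary from a 2-byte local tag to a 16-byte identifier, and the tag value 0x8001 and both identifier byte strings are invented for the example.

```python
# Sketch: each file carries its own primer, mapping 2-byte local tags to
# 16-byte globally unique identifiers. All values below are hypothetical.

COMPANY_X_PROPERTY = bytes.fromhex("8f3a1c2d4e5f6071" "0102030405060708")
COMPANY_Y_PROPERTY = bytes.fromhex("9b7c6d5e4f3a2b1c" "1112131415161718")

# Both companies independently picked the same local tag value.
primer_in_file_from_x = {0x8001: COMPANY_X_PROPERTY}
primer_in_file_from_y = {0x8001: COMPANY_Y_PROPERTY}

# Properties that company X's application understands.
KNOWN_TO_X = {COMPANY_X_PROPERTY: "X private property"}


def interpret(local_tag, primer, known):
    """Resolve a local tag through the file's own primer.

    Returns the property name, or None if the property is dark
    (its identifier is not known to this application).
    """
    identifier = primer.get(local_tag)
    if identifier is None:
        return None  # tag not declared in the primer: treat as dark
    return known.get(identifier)
```

Because the lookup goes through the primer of the file being read, company X's application recognizes tag 0x8001 in its own files and treats the same tag in company Y's files as dark rather than misreading it.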

5.3.7 Incompatible
An encoder must not send Incompatible Items. This data classification is provided to allow certain data items to
be forbidden if they could prevent successful or deterministic decoding. There are no “Incompatible” Items
defined within SMPTE 377M, but the concept of Incompatible Items is described here because it gives a
common word for designers and implementers to describe a class of metadata that should be avoided.

5.4 Element, Item, Container, Stream, Body, Multiplexing, Interleaving

An MXF file may have external essence in addition to essence within the MXF File Body. The MXF File Body
may have several Essence Containers that are multiplexed together, each of which can sometimes be called a
stream. Each of these Essence Containers may have a single piece of essence or may have different essence
elements interleaved together. Each of these elements may be categorized into Picture Items, Audio Items, Data
Items and System Items. This results in an MXF File Body that may contain a multiplex of Essence Containers
that in turn contain interleaved Essence Items that in turn contain the individual interleaved Essence Elements.

[Figure 4 diagram: a Material Package containing a Picture Track, a stereo Sound Track and orchestral Sound
Tracks; a Top-Level File Package (DV + AES Audio) containing a Stored Picture Track and Stored Sound Tracks;
and a Top-Level File Package (AES Audio) containing a Stored Sound Track.]

Figure 4 : Multiplexing and Interleaving - the logical view

That description is complicated, so an example will help to clarify this extreme case of MXF capabilities.
Imagine a file that was captured in DV and has had its stereo Sound extracted and separately edited. Later in
the process, an orchestral score was added in a separate Essence Container and described by a separate
File Package. The Operational Pattern 1b mechanism is used to synchronize the two file packages. The
resulting file looks logically like Figure 4. This seems a simple logical view, but the physical representation is
much more complex, as shown in Figure 5.
[Figure 5 diagram: physical view of the MXF File as a sequence of KLV packets divided into partitions, each
starting with a Partition Pack. Partitions carrying the "DV + stereo" Generic Container (a DV Compound Element
with L. Sound and R. Sound Elements) are multiplexed with partitions carrying the "Orchestral Score" Generic
Container (L. Sound and R. Sound Elements).]

Figure 5 : Multiplexing and Interleaving - the physical view

The final sentence of the opening paragraph may now be a little clearer. The file is a multiplex of different
partitions; in this case two generic containers are multiplexed using the partition mechanism. One of these
Generic Containers is an interleave of Essence Items – a DV Compound Item and a Sound Item. In each of the
multiplexed Generic Containers, the Sound Items contain an interleave of Sound Elements – left and right
channel. It is also worth noting that the DV itself is an intrinsic interleave of DV-DIF blocks. In most MXF
processes, this level of interleaving is left to the Essence codec and is usually opaque to MXF.

There are normative descriptions of these words in the Format document and in the Generic Container
document. It is strongly recommended that new Essence Container documents follow this wording.

5.4.1 Essence Element


In many places in SMPTE 377M documents, the term Essence Element is used generically.

In many discussions of low level wrapping of the data in a Generic Container mapping, the term Essence
Element is used to mean, “A KLV wrapped essence entity that has a defined key”. For any given key, any
Essence Elements with that key relate to the same Essence stream.

In other macroscopic discussion of interleaving and multiplexing, the term Essence Element is used to describe
all the KLV wrapped essence entities with a given key, such as a single video data stream. When a stream has
a single video data stream and an associated audio data stream, the Essence Container would be regarded as
having two Essence Elements, regardless of how many KLVs were used to hold the essence.

This contextual use of the term Essence Element may cause confusion, but the authors felt it would be worse to
try to invent a new term for every one of the subtle changes in context.

5.5 Classes, Objects, Packages & References

In this section, the concept of object implementation will be introduced, as will the idea of collecting objects and
information into packages. This section is intended to improve understanding of the concepts. It is not
intended to be a rigorous definition of the terms. The actual definitions of packages, Strong references and the
like can be found in SMPTE 377M and other MXF documents.

5.5.1 What are classes?


A Class is a generic definition of the behavior and properties of a generic object. The textbook example is a
given make and model of a car. All the cars from the same class have the same generic behavior and
properties. When describing a particular car, all the properties (such as color, engine size) are given values.
This is called an object or an instance of the class. The class definition includes all the core design parameters
that are common to all instances. A given make and model of car (i.e. the class) may be blue or red, but both
are still clearly the same car; only the color ‘property’ differs between the two objects.

Classes are defined by a set of data items, where each item is commonly called a property. When an instance is
made from a class, it becomes an object and values are assigned to all the properties.

Modeling of a system can involve the creation of many similar classes. This document has described several
different sorts of track. Each of these tracks has properties that are very similar. In modeling terms,
there is an abstract superclass that defines the common functionality of all the different tracks. Abstract means
that the class is never used directly. Superclass means that the purpose of this class is to create subclasses that
add to all the properties of the Superclass. A generic Track is an abstract superclass. A Timeline Track and an
Event Track are two subclasses that share all the common properties of the Track class and have added their
own specific properties and behaviors.
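
The class relationships described above can be sketched in Python. This is an illustration of the modeling idea only: the property names (track_id, track_name, edit_rate, origin, event_edit_rate) and the kind() method are assumptions made for the sketch and are not the normative property tables of SMPTE 377M.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Track(ABC):
    """Abstract superclass: properties common to every sort of track."""
    track_id: int
    track_name: str

    @abstractmethod
    def kind(self) -> str:
        """Each concrete subclass names the sort of track it is."""


@dataclass
class TimelineTrack(Track):
    """Subclass: inherits the Track properties and adds its own."""
    edit_rate: tuple   # (numerator, denominator), e.g. (25, 1)
    origin: int

    def kind(self) -> str:
        return "timeline"


@dataclass
class EventTrack(Track):
    """Subclass: a different specialization of the same superclass."""
    event_edit_rate: tuple

    def kind(self) -> str:
        return "event"


# Two objects (instances) of one class differ only in their property values.
picture = TimelineTrack(track_id=1, track_name="Picture", edit_rate=(25, 1), origin=0)
sound = TimelineTrack(track_id=2, track_name="Sound", edit_rate=(48000, 1), origin=0)
```

Attempting to create a plain Track raises a TypeError, which mirrors the statement that an abstract class is never used directly.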

5.5.2 How are objects implemented?


In MXF, objects are implemented as KLV Local Sets as defined in SMPTE 336M. In SMPTE 377M, the word Set
is used in nearly all cases to describe an object of a given class. The specification of the class properties is done
using tables in the normative MXF documents, and the behavior is specified in the text of the document.
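
As an illustration of how the Value of such a Local Set might be walked, the following Python sketch splits a byte string into its items. It assumes 2-byte big-endian local tags and 2-byte big-endian lengths, handles only the Value (not the 16-byte Key or the outer Length), and uses invented tag numbers in its example.

```python
import struct


def parse_local_set_value(value):
    """Split the Value of a KLV Local Set into (tag, item_bytes) pairs.

    Assumes 2-byte big-endian local tags and 2-byte big-endian lengths.
    """
    items = []
    pos = 0
    while pos < len(value):
        if pos + 4 > len(value):
            raise ValueError("truncated tag/length header")
        tag, length = struct.unpack_from(">HH", value, pos)
        pos += 4
        if pos + length > len(value):
            raise ValueError("item runs past the end of the set")
        items.append((tag, value[pos:pos + length]))
        pos += length
    return items


def build_local_set_value(items):
    """Inverse of parse_local_set_value: pack (tag, item_bytes) pairs."""
    return b"".join(struct.pack(">HH", tag, len(data)) + data for tag, data in items)


# Hypothetical set with two properties; the tag numbers are made up.
example = build_local_set_value([(0x0001, bytes(range(16))), (0x0002, b"abc")])
```

Items whose tags are not recognized can simply be carried through as opaque bytes, which is one way an application could preserve properties it does not understand.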

5.5.3 What is a Package?


A package is simply a container for a number of tracks that in turn represent the passage of time. The package
mechanism allows different tracks to be “ganged” together in parallel. This allows metadata and essence to be
synchronized to a common timeline.

Each package describes some aspect of the essence or data in a file and the different types of package will be
explained here with the help of some real world analogies. The Top-Level File Package contains a collection of
metadata items and sets that describe, for example, the embedded video essence. It is described as though the
essence tracks were in a file – hence the name Top-Level File Package.

It is important to note that the Tracks are synchronized in time. This synchronization is determined by a specified
Offset value from the beginning of each track.

For the AAF-conversant reader, it is useful to note that Composition Packages are not currently used in MXF.

5.5.3.1 What is the Material Package?


The Material Package is a metadata structure that generally represents the output timeline of the file. If you
imagine the file being “played” in an MXF player, you would expect to see video, hear audio and view the data
as though it were a tape in a VTR. The Material Package contains the “hooks” that allow this to happen. It
contains timing information about the output – for example how the time is measured. It contains information
about the output tracks – how many and what format they take. It also provides hooks to say where the essence
data comes from to fill these tracks (i.e. which Top-Level File Packages).

As can be seen in Figure 6, the Material Package can be viewed as a set of parallel tracks – one for each kind
of essence in the output stream. There is metadata associated with the file that has a global scope, such as the
Name, the UMID etc. Each Track contains further metadata to describe the way in which the final output should
be created from the Top-Level File Packages.

[Figure 6 diagram: the Material Package, with global metadata (UMID, Name etc.) above a set of parallel tracks:
Timecode track (the output timeline); Picture track(s) (describe the output video); Sound track(s) (describe the
output audio); Event tracks, e.g. Scene Track (describes (overlapping) scene information).]

Figure 6 : the Material Package

Figure 7 shows the relationship between the packages. It shows how the Material Package track can define a
sequence of SourceClips. Each SourceClip in the Material Package indicates which portion of a Top-Level File
Package should be “played” next. This is the way in which MXF supports Edit Decision Lists (EDLs).

The Material Package in Figure 7 shows how the SourceClip references the entire Top-Level File Package. Only
the File Packages in the top level of an MXF File describe the actual Essence in the File Body.

The MXF Operational Patterns constrain the relationships between the Material Package SourceClips and the
File Package(s) in an MXF File. In an OP1a file, there is no EDL support and the Material Package references
the entire Top-Level File Package. In an OP3c file, complex timeline relationships are allowed that may require
the MXF decoder to have random access capabilities.
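
The way a sequence of SourceClips acts as an EDL can be sketched as follows. This is a simplified illustration: positions and durations are plain integers counted in edit units, the package identifiers are short strings standing in for real identifiers, and the function name locate is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class SourceClip:
    source_package_id: str   # stand-in for the referenced File Package identifier
    source_track_id: int     # track within that package
    start_position: int      # first edit unit to play from the source track
    duration: int            # number of edit units to play


def locate(sequence, output_position):
    """Map a position on a Material Package track to (clip, source_position)."""
    if output_position < 0:
        raise IndexError("position is before the start of the sequence")
    offset = 0
    for clip in sequence:
        if output_position < offset + clip.duration:
            return clip, clip.start_position + (output_position - offset)
        offset += clip.duration
    raise IndexError("position is beyond the end of the sequence")


# A two-entry "EDL": 50 edit units from package A, then 25 from package B.
edl = [
    SourceClip("pkg-A", 1, start_position=100, duration=50),
    SourceClip("pkg-B", 1, start_position=0, duration=25),
]
```

With a single SourceClip that covers the whole File Package, the mapping is the identity, which corresponds to the OP1a case described above; longer sequences may require the decoder to seek within or between File Packages.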

[Figure 7 diagram: three tiers of packages, each made up of a Track (defines start), a Sequence (defines duration)
and one or more SourceClips.

Material Package (generally describes the output timeline). The Material Package SourceClip(s) reference the
Top-Level File Package. This can be used to define an “EDL” of File Packages.

Top-Level File Packages (describe the actual Essence in the file), with several SourceClip segments, a Body
Container and an Essence Descriptor (e.g. MPEG). The Top-Level File Package SourceClip(s) may reference
Lower-Level Source Packages. These do not describe actual stored Essence. They describe where the stored
Essence came from, e.g. previously conformed MXF files.

Lower-Level Source Packages (describe where the Essence came from, e.g. tape, reel, source file), with an
Essence Descriptor (e.g. Tape Descriptor).]

Figure 7 : Relationship between the packages

5.5.3.2 What is a Top-Level File Package?


The Top-Level File Package represents the storage of some essence. This essence may be stored in the File
Body or externally in a separate file (located by information in the Essence Descriptor). The Top-Level File
Package contains the tracks that describe the type of essence, the compression scheme used (if any) and the
source coding parameters such as the number of samples, pixels and aspect ratio of the essence as
appropriate.

The tracks in the Top-Level File Package may be made up from a number of SourceClips that are used as
historical annotation to indicate where the content came from.

5.5.3.3 What are Lower-Level Source Packages?


The SourceClips in the Top-Level File Package may refer to either File Packages or Physical Packages. In
SMPTE 377M, the generic class “Source Package” is used to refer to either File Packages or Physical
Packages.

In SMPTE 377M, a Source Package that is not at the top level is used to describe the derivation of the essence;
i.e. where it came from. This is very useful metadata, especially when creating archives or providing historical
information about the source of the File Package. Lower-level Source Packages often contain physical
descriptors such as Tape Descriptors that refer to a physical location or storage medium for the content.

5.5.4 References
Within the MXF Format we need a way of referring to objects. For example the statement, “A Material Package
has one Timecode Track object”, is quite clear. This is known as a strong (one to one) reference between the
Material Package and the Timecode Track object.

Each metadata set is coded and identified as a KLV Local Set and has a Value that contains all the locally
coded metadata items in sequence as a Tag (typically 2 bytes), Length (typically also 2 bytes) and the individual
metadata item value. Note that most MXF sets contain a Unique Identifier (Instance UID) for that set. This
Instance UID is the core data construct used to connect objects together into a logical framework.

A ‘Strong Reference’ to any KLV coded data set is a one-to-one relationship between the reference and the
target data set. In MXF files, a Strong Reference is made by matching the value of a “StrongRef” in the
referencing set to the Instance UID property of the referenced set.

A ‘Weak Reference’ also uses an Instance UID to connect data sets, but any weakly referenced data set or item
may be referenced by more than one other data set. Thus a weakly referenced set is a stand-alone data set with
an Instance UID to which one or more other data sets can refer through the value of a ‘WeakRef’ property. In
order to properly construct an MXF File, each and every set must have one Strong Reference to it. There is no
limit to the number of weak references that may be made to a set. Figure 8 illustrates the concept of Strong and
Weak References in a stream of KLV coded metadata sets.

[Figure 8 diagram: a stream of four KLV coded sets, each carrying an ID, connected by three Strong Ref links,
one Weak Ref link and Other Weak Refs.]

Figure 8 : Strong and Weak Referenced Data Sets in a KLV Coded Data Stream

Note that the metadata sets are contiguous in order to preserve the KLV coding protocol (i.e. there are no gaps
between the metadata sets).

Figure 9 provides a more detailed example of data set organization and includes three techniques for the
connection of data sets:

[Figure 9 diagram: a Participant Set (Contribution Status, Job Function, Job Function Code, Role or Identity
Name), a Person Set and an Organisation Set, connected in three ways.

Strong Referencing by Embedding: the Person Set and Organisation Set are carried inside the Value of the
Participant Set, each as a Tag, a Length and a Value consisting of KLV coded Items. Note: embedding Strongly
Referenced sets is easier to understand, but if the length of the embedded set changes (e.g. by changing a text
string), then the length value of both the embedded set and all outer sets must change accordingly.

Strong Referencing by UID Linking: each set is a separate KLV with a 16-byte Key, a Length and a Value of KLV
coded Items that includes UIDs (x, y); a Strong Reference UID connects the Participant Set to its owned sets.

Weak Referencing by UID Linking: the same layout, with a Weak Reference UID connecting the Participant Set
to shared sets.]

Figure 9 : Strong and Weak Referenced Data Sets in Streams

Strong Referencing. Strong Referencing implies ownership of the referenced object as well as a one to one
relationship with it. When an MXF application creates a tree of interlinked Objects starting at the MXF Preface
Set, all objects will have at least 1 strong reference so that they are “owned” and can fit into the overall tree. An
object may additionally be weakly referenced by a large number of other objects.

Strong Referencing by embedding. This can be used where a strongly referenced data set is easily embedded
into the referencing data set. It is used in applications requiring high-speed operation, but has the drawback that
when the embedded set changes length, the length fields of both the contained and surrounding sets must change
accordingly. This mechanism is not used in the MXF Header Metadata, but may be used in an Essence
Container specification. Ownership of the referenced object is implicit because it is contained within the
referencing object.

Strong Referencing by UID. This requires an Instance UID property in the referenced data set and a property
of type StrongRef in the referring Data Set. The overhead is thus higher than the embedding method above, but
if a property value in the referenced set changes length, it impacts only that data set and its parent data sets, but
does not affect the length of the referencing data set.

Weak Referencing by UID. Weak referencing uses an Instance UID in the referenced data set; one or more
other data sets can refer to the referenced data set by using the same Weak Reference UID value. The
advantage of a weak reference is that the values of metadata items in a data set can be shared by several
referring data sets. It is worth noting that everything within an MXF file that is weak referenced must also be
strongly referenced.

A Reference Collection is a list of UIDs connecting the referencing entity to zero or more other entities (either
weak or strong).

A Reference Array is a set of ordered references (or vector). This implies that the order is significant for
whatever reason.
Note: because all properties in MXF are unique within the AAF class model, all StrongRef and WeakRef properties are
strongly typed. This means that the property can only have a StrongRef to a specific sort of Set (or one of its subclasses).
Thus, SMPTE 377M uses the nomenclature “StrongRef (MyClass)” to mean a strong reference only to an object of type
“MyClass” or an object derived from MyClass.

For every reference in an MXF File, an MXF Decoder should be able to find a set that is the target of that
reference. The previous sentence uses the word “should” and not “shall” – why? From the definitions above, you
would expect that a decoder would always be able to find the target of a strong reference. In the absence of any
extensions to SMPTE 377M, this would be a true statement; however, it is expected that additions will be made
and new metadata sets and schemes will be developed as the format matures. Decoders that do not understand
these extensions are likely to discover that there are Dark metadata sets (i.e. the set Key of the KLV is not
understood by the decoder) within the file and that there are references without identifiable targets.

“Clever” decoders may be able to help in this situation, by looking inside Dark sets, especially those whose local
tags appear to be stored in the Primer Pack. Instance UIDs could then be discovered with some high confidence
and the presence of Dark extensions to SMPTE 377M discovered. In some circumstances, this behavior may be
quite helpful, but in general, making intelligent guesses about Dark sets is outside the scope of SMPTE 377M. It
may also lead to unpredictable results!

To summarize the MXF referencing behavior:


1. References are made from a property in one set of type WeakRef or StrongRef to the InstanceUID property
in another set.
2. All Header Metadata sets (other than the primer) are linked to the preface (directly or indirectly) by strong
references.
3. All strong references in any instance of Header Metadata match one and only one set in that instance.
4. Weak references may be made to "global definitions" that are outside the file; in these cases the WeakRef
will be either a UUID or a UL. Therefore, if a weak reference cannot be matched in the file it can be
regarded as a global definition.
5. Typical global definitions are Codec ULs, Container ULs and Compression ULs, which are used to
enumerate different codec, container and compression mechanisms.
6. Because dark metadata can exist in the header, references of any kind may appear to be
unresolved even though they are correct. MXF Decoders must be able to cope with this.
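
The summary above can be turned into a small consistency check. The Python sketch below is one possible reading of those rules, not a conformance test: sets are modelled as dictionaries, the key names (instance_uid, strong_refs, weak_refs) and the report field names are assumptions of the sketch, and short strings stand in for 16-byte UIDs.

```python
def check_references(sets, preface_uid):
    """Classify the references in one instance of Header Metadata.

    `sets` is a list of dicts with the keys 'instance_uid', 'strong_refs'
    and 'weak_refs' (the last two are lists of UIDs and may be omitted).
    """
    by_uid = {s["instance_uid"]: s for s in sets}
    owners = {uid: 0 for uid in by_uid}
    report = {"dangling_strong": [], "external_weak": [], "ownership_errors": []}

    for s in sets:
        for ref in s.get("strong_refs", []):
            if ref in by_uid:
                owners[ref] += 1
            else:
                # Target not found: possibly a Dark set this decoder cannot read.
                report["dangling_strong"].append(ref)
        for ref in s.get("weak_refs", []):
            if ref not in by_uid:
                # Not matched in the file: may be regarded as a global definition.
                report["external_weak"].append(ref)

    for uid, count in owners.items():
        # Every set except the preface should be strongly referenced exactly once.
        expected = 0 if uid == preface_uid else 1
        if count != expected:
            report["ownership_errors"].append(uid)
    return report
```

A tolerant decoder would treat the entries in "dangling_strong" and "external_weak" as information rather than as fatal errors, in line with point 6 above.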

5.5.5 Resolving ULs and UUIDs


SMPTE 377M contains a large number of sets and properties (referred to as Items). The normative definition of
these properties - what they are and their type (e.g. Integer, UL, string, etc.) - is given by the SMPTE metadata
dictionary, RP210. In the MXF Format document, bytes 9 onwards of the entry in the dictionary are repeated in
the "UL designator" column of the set definitions. Within the file, Local Set coding is used, in which a short
2-byte tag substitutes for a 16 byte UL.

Some of the properties in SMPTE 377M are themselves Universal Labels. Some of the values that these
Properties may take are ULs, and indeed some of the KLV keys in MXF may be ULs. These Labels are
generally used to identify lists of unique things. For example "Picture Coding Type" has a UL value. All the
Picture Coding Types that are known to MXF are simply listed in the SMPTE Labels Registry. Applications that
need to determine the meaning of a label use the SMPTE Labels Registry as the normative reference.

In certain cases an encoder may place an un-registered UL or a non-UL unique identifier in a property of type
“UL”. Example cases are where new MXF features are being developed, but have not yet been standardized,
and where private extensions are added for use in a carefully controlled MXF system. Some of these cases are
outside the scope of the MXF format, but decoders should make every effort to handle these files gracefully. For
example, decoders should not rely on the values being validly coded as a registered SMPTE Label.

5.5.6 UUID properties and their scope


Universally Unique IDs (UUIDs) are arithmetically computed unique numbers that can be used in MXF files in
two different ways. Firstly they are used for making links between different parts of the same file, such as with
strong and weak reference Instance UIDs. Secondly they are used to provide identifying or typing information,
such as where a property’s local tag is translated via the primer pack into a UUID. In the first case the UUIDs
have partition scope; an occurrence of the same UUID in two different files, or even two different partitions of the
same file, does not imply any relationship between them, even though the likelihood of the same UUID being
generated twice is extremely remote. In the second case the UUIDs have global scope: wherever the same
UUID is used it has the same meaning.

5.5.7 Byte order of UUIDs


ISO/IEC 11578 states that in the absence of explicit specification to the contrary, UUIDs are encoded as a
sequence of 16 bytes starting with the bytes holding the time field and ending with the node ID. However, the
significance of byte order depends on the scope of the UUID.

If the scope of a UUID is local to the file then the byte order is unimportant, providing each occurrence of that
UUID uses the same byte order. In these cases the default order specified in ISO/IEC 11578 should be used.

Where UUIDs have global scope the byte order is significant. In these cases the byte order will be given when
the UUID value is published. For example, where a manufacturer publishes the UUID that a particular device
inserts into the “Product UID” field in the identification set, the byte order of that UUID will be specified as well as
the values of the bytes.

5.5.8 Storing ULs and UUIDs in the same property


Some data fields, such as the UID property of the LocalTagEntry batch in the Primer, can contain either a UL or
a UUID. In this case there is an advantage to using a particular byte order for the UUIDs. All UUIDs have a 1 in
the most significant bit of the “clk_seq_hi_res” word (byte 9), whereas all ULs have a 0 in the most significant bit
of the first byte. If UUIDs are stored with a byte order that places the “clk_seq_hi_res” word first, then it is always
possible to tell if the value is a UL or a UUID by examining the MSB of the first byte. This byte order also
prevents the remote possibility of a UUID being stored that matches a registered UL. For these reasons, it is
recommended that when any UUID is published for inclusion in a data field that can also contain ULs, the byte
order specified for that UUID be the same as the ISO/IEC 11578 default order, but with the upper and lower
eight bytes swapped.

The section above gives rise to the following guidelines:


• A UUID may be stored in a data field of type UL by swapping the top and bottom 8 bytes of the UUID
(the most significant bit of the first byte of such a swapped UUID is always 1)
• MXF decoders should accept a swapped UUID in a place where a UL is expected.
Note: AAF uses a compatible byte-swap method for storing ULs and UUIDs in the same properties, which it defines as
AUIDs.
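
The swap described in these guidelines can be sketched with the Python standard library. The function names below are assumptions of the sketch; uuid.UUID.bytes yields the 16 bytes in the default order (time fields first, node last), so the “clk_seq_hi_res” byte is the ninth byte. The UL-shaped value in the example uses the usual 06 0E 2B 34 label prefix followed by zero placeholder bytes and is not a registered label.

```python
import uuid


def uuid_to_ul_field(u):
    """Store a UUID in a UL-typed field: swap the upper and lower 8 bytes."""
    b = u.bytes  # default order: time fields first, node ID last
    return b[8:] + b[:8]


def ul_field_to_uuid(field):
    """Undo the swap to recover the UUID in its default byte order."""
    return uuid.UUID(bytes=field[8:] + field[:8])


def is_swapped_uuid(field):
    """True if the most significant bit of the first byte is 1 (a swapped UUID),
    False if it is 0 (a UL)."""
    return bool(field[0] & 0x80)


# Placeholder, UL-shaped value for comparison (not a registered label).
EXAMPLE_UL = bytes.fromhex("060e2b34" + "00" * 12)
```

A decoder that reads a UL-typed property could call is_swapped_uuid first and only then decide whether to look the value up as a label or convert it back with ul_field_to_uuid.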

5.5.9 ULs identifying the file’s handling requirements


In the Partition Packs of the MXF File, there are a number of properties whose UL values are intended to give
an indication of the codec and handling requirements needed for the file. This information is intended to be a
performance enhancement to provide “fail-fast” functionality. This information is located in the first few bytes of
every file so that an application can quickly determine if it is able to handle the content of a file. The information
is copied from the authoritative information in the Header Metadata.

The Operational Pattern UL identifies the timeline complexity of the file. The Essence Container ULs identify the
Essence Data that is contained in the file so that an application can determine if a suitable codec is available.
These numbers are registered values so that an application that cannot handle a particular Essence Container
Type is able to report the Essence Type in the file. This type of reporting behavior helps users to identify content
and is encouraged. An anonymous failure such as “a codec cannot be found”, with no report of what sort of codec
was sought, is discouraged. Older decoders that are unaware of new UL values should at least attempt to
report the ULs that were not known. Note that it may not be possible for all MXF encoders to provide this
information, and decoders should not fail if it is empty or missing.

If an MXF File contains multiple Essence Containers, but these are all of the same type, then the Essence
Container Label appears in the Partition Pack only once. This non-duplication is to ensure that a higher
Operational Pattern file with 100 small MPEG clips need not insert 100 ULs in the list.

Some Essence Container specifications (such as the MPEG Long GOP Generic Container mapping) define
Essence Container ULs for the different MPEG streams that may be encountered when transwrapping from
MPEG Program Stream to MXF. It is possible that the list of Essence Containers will contain a UL for the Sound
data and a UL for the Picture data even when the resulting file contains only a single Essence Container with
interleaved Sound and Pictures. During the design of MXF it was felt that there needed to be a descriptor for
each of the different types of audio so that the MXF decoder requirements could be determined rapidly.

As an example, if you have an OP1a MXF file with MPEG 2 video, 2 channels of AES audio and Timecode, the
file would have:
• 2 ULs in the EssenceContainer list (1 video, 1 audio)
• OP1a declared in the Partition Pack and the Preface Set
• 4 Tracks: 1 Picture, 2 Sound, 1 Timecode
• Material Package Tracks have the same duration as the Top-Level File Package Tracks
MXF decoders must be able to cope with the case where there are many Essence Containers of the same type
with a single UL in the EssenceContainer list. MXF decoders must also be able to cope with the case where
there are several ULs in the EssenceContainer list, each of which relates to a different Element of a single MXF
Generic Container.
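
The “fail-fast” behavior described in this section might be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical and the two labels are placeholders, not registered Essence Container ULs. The sketch tolerates an empty or missing list, ignores repeated labels, and returns the labels it does not recognize so that they can be reported to the user.

```python
from typing import Iterable, List, Optional, Set


def unknown_essence_containers(
    declared: Optional[Iterable[bytes]], supported: Set[bytes]
) -> List[str]:
    """Return, as hex strings, the declared Essence Container ULs that are not supported.

    An empty or missing list yields an empty result: the decoder should not fail
    at this stage and can fall back on the authoritative Header Metadata.
    """
    labels = list(declared or [])
    unknown = []
    # dict.fromkeys() removes repeated labels while preserving their order.
    for label in dict.fromkeys(labels):
        if label not in supported:
            unknown.append(label.hex())
    return unknown


# Placeholder labels for illustration only (NOT registered SMPTE values).
PICTURE_LABEL = bytes.fromhex("060e2b34" + "00" * 11 + "01")
SOUND_LABEL = bytes.fromhex("060e2b34" + "00" * 11 + "02")

# A decoder that only handles the picture container reports the sound label.
report = unknown_essence_containers([PICTURE_LABEL, SOUND_LABEL], {PICTURE_LABEL})
```

Returning the unknown labels, rather than a simple pass/fail result, supports the reporting behavior encouraged above.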

5.5.10 Data Definitions


There are several MXF sets that are generic (e.g. the sequence set) and the specific behavior is identified by the
Data Definition property. In AAF, these components are implemented as weak references to definition objects in
the dictionary. These definition objects each contain an Identification property that is a 16-byte "magic number"
that the application can use to figure out how to handle the component.

MXF doesn't have such a dictionary, so cannot work the same way. Instead the DataDef property in a
component actually is the 16-byte "magic number" that the application can use to figure out how to handle the
component. This is a very subtle change in behavior between AAF and MXF, and implementers of compatible
systems should take appropriate actions to ensure interoperability. This type of data is actually a weak reference
into an external data set – i.e. a registry or dictionary, such as the SMPTE Labels registry.

5.6 Implementing objects as sets

KLV coding allows related metadata items to be grouped together in sets; e.g. Titling metadata might be
grouped into a set for convenience. SMPTE 336M defines several mechanisms for grouping the data together.

Basically, a set comprises an outer KLV that defines the set and a number of inner KLVs that define the data
items.

The inner keys could be full length (Universal set) or could be shortened for processing and storage
convenience. KLV sets using these shortened item keys are known as local sets and the technique is fully
defined in SMPTE 336M. That standard gives every set a Universal Label whose first 8 bytes are defined
consistently and which identifies the type of data set or data pack being used. The options provided are:
• Universal Set
• Local Set
• Variable Length Pack
• Fixed Length Pack
• Global Set (not used in MXF)
All MXF decoders must support local sets. Encoders should use the sets as required by the Operational Pattern.
If there is no guidance in the Operational Pattern then the encoder should opt for a local set implementation
using the local Tags as defined in SMPTE 377M. Note that 2-byte lengths in local sets are always coded as Big-
endian (i.e. MSB first).
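
As an illustration, the value of a local set that uses 2-byte tags and 2-byte lengths might be split into its items as sketched below. The function name and the tag values in the example are hypothetical, and the sketch assumes that the 2-byte tag is read with the same big-endian byte order as the length.

```python
import struct
from typing import List, Tuple


def parse_local_set(value: bytes) -> List[Tuple[int, bytes]]:
    """Split the value of a local set into (tag, item value) pairs.

    Each item is assumed to be a 2-byte tag and a 2-byte big-endian length,
    followed by that many bytes of value.
    """
    items = []
    offset = 0
    while offset < len(value):
        if offset + 4 > len(value):
            raise ValueError("truncated tag/length header")
        tag, length = struct.unpack_from(">HH", value, offset)
        offset += 4
        end = offset + length
        if end > len(value):
            raise ValueError("item value runs past the end of the set")
        items.append((tag, value[offset:end]))
        offset = end
    return items


# Two made-up items: tag 0x1234 with a 2-byte value, tag 0x5678 with an empty value.
payload = struct.pack(">HH", 0x1234, 2) + b"\xab\xcd" + struct.pack(">HH", 0x5678, 0)
parsed = parse_local_set(payload)
```

A decoder would then look each tag up in the Primer Pack to recover the full 16-byte Universal Label of the property.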

Every property in MXF has a full 16-byte Universal Label so that the property may be interchanged with other
systems as either a single KLV item or as a Universal set.

MXF-specified Metadata is currently implemented using 2-byte tags and lengths. This restriction does not apply
to private metadata schemes, although it is recommended because the Primer Pack mechanism for preventing
numerical clashes of local tags is only defined for two-byte tags.

5.7 Implementing Text

Many of the text fields in MXF are encoded using UNICODE. The coding technique is UTF-16 with big-endian
byte order to allow good international support. More information on UNICODE can be found in reference 5
(section C.1 below). There are occasions when ISO-646 text is used. This is often to comply with some other
standard such as the ISO-639 language descriptor codes.

Text is stored in a KLV or Tag-Length-Value structure. Zero word termination of strings is optional. A string may
be the same length as the “L” of the KLV or the “Length” of the Tag-Length-Value with no zero word at the end.
Alternatively, a shorter string may be placed in the space allocated by the KLV or the Tag-Length-Value
structure by inserting a zero word after the last character of the string. MXF Decoders must support both
mechanisms.
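
A decoder that supports both mechanisms might handle a UTF-16 text value as sketched below. The function name is hypothetical, and the sketch covers only the UTF-16 big-endian case, not ISO-646 text.

```python
def decode_mxf_string(value: bytes) -> str:
    """Decode a big-endian UTF-16 value that may or may not be zero-word terminated."""
    if len(value) % 2:
        raise ValueError("a UTF-16 value must contain an even number of bytes")
    # Examine whole 16-bit words only, so that the zero high byte of an
    # ordinary character is never mistaken for a terminator.
    for index in range(0, len(value), 2):
        if value[index:index + 2] == b"\x00\x00":
            value = value[:index]
            break
    return value.decode("utf-16-be")


# The string fills the whole value: no terminating zero word.
full = decode_mxf_string("MXF".encode("utf-16-be"))

# A shorter string in a larger space: everything after the zero word is ignored.
padded = decode_mxf_string("MXF".encode("utf-16-be") + b"\x00\x00" + b"\x00J")
```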

5.8 Tracking Changes with Generation Numbers

A Generation Number is a weak reference to the Identification Set that was created when the MXF file was
saved or modified by an application. Each time the MXF File is modified, a new Identification set is created. If a
metadata set is changed the Generation ID property is updated so its value will be the same as the Generation
ID of the Identification Set that was created when the property was modified.

It is important to note that Generation Number properties are optional and that decoders should not rely on their
existence; however, in certain applications they can be very useful. If your application stores extended data that
is dependent on data stored in MXF’s built-in classes and properties, your application may need to check if
another application has modified the data in the built-in classes and properties.

The Generation property allows you to track whether another application has modified data in an MXF file that
may invalidate data that your application has stored in extensions. The Generation property is a weak reference
to the Identification object created when an MXF file is created or modified. If your application creates extended
data that is dependent on data stored in MXF built-in classes or properties, you can use the Generation property
to check if another application has modified the MXF file since the time that your application set the extended
data. To do this, your application stores the value of the Generation UID of the Identification object created when
your application sets the value of the extended data.
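
The check described above might be sketched as follows. The class and function names are hypothetical and the UIDs are arbitrary; the sketch only shows the comparison, and it returns no answer when the optional Generation property is absent.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataSet:
    """Hypothetical stand-in for a built-in set carrying an optional Generation UID."""
    name: str
    generation_uid: Optional[uuid.UUID] = None


def modified_since(target: MetadataSet, recorded: uuid.UUID) -> Optional[bool]:
    """Report whether a set changed after the application recorded a Generation UID.

    Returns None when the set carries no Generation UID, because the property
    is optional and its absence gives no information.
    """
    if target.generation_uid is None:
        return None
    return target.generation_uid != recorded


# Arbitrary UIDs standing for two Identification sets.
ours = uuid.UUID("00000000-0000-4000-8000-000000000001")
theirs = uuid.UUID("00000000-0000-4000-8000-000000000002")

untouched = modified_since(MetadataSet("Track", ours), ours)
changed = modified_since(MetadataSet("Track", theirs), ours)
unknown = modified_since(MetadataSet("Track"), ours)
```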

6 Metadata Classifications & Placement


The main objective of the Material Exchange Format is to exchange program material together with attached
metadata information. This section provides a very brief overview of some of the underlying concepts of
metadata as it is used within an MXF file. A fuller description of the use of MXF Descriptive Metadata can be
found in the DM Engineering Guideline SMPTE EG42.

In general terms, the use of metadata has many dimensions as follows:


1. It is in widespread use within different content-based industries, including broadcast, film, music and web
authoring.
2. It is in widespread use in different content-based applications, including capture/creation, production, post-
production and archive/libraries.
3. It can be divided into several different broad categories including business transactions, publication
information, content identification and labeling, compositional information and formatting, etc.
4. It may have different states such as being static for a defined duration, being dynamic (with several kinds of
dynamic including transitory, metronomic, incrementing and so on).
5. It may have different levels of stability with elements having durable values that remain stable or transient
values that may frequently change.

Metadata can be divided into three broad categories:
1. Structural Metadata: a set of information that defines the essence structure, i.e. how the essence was edited
and what source components were included in what derivation chain.
2. Descriptive Metadata: a set of information that describes, parameterizes or catalogues content, such as
episode number, copyright holder and so on.
3. Dark Metadata: metadata that is unknown to an application at the time of processing. This may be for many
reasons, including private metadata, unknown extensions to MXF and standardized Metadata items that are not
handled by the application.

MXF (and AAF) provide the ability to bind Metadata, Essence and Data Essence Streams together via Structural
Metadata. MXF also provides a Descriptive Metadata mechanism that allows independent DM schemes to be
created as plugs into the overall MXF File Format.

The placement of metadata in a file may be in one or more of several possible locations most suited to the
application of the particular metadata item. Figure 10 below indicates several broad locations where metadata
may be stored.

[Figure 10 shows four locations for metadata: Server Metadata held outside the file (e.g. Labelling and
Identification, Compositional, Catalogue, Publication, Business); Header Metadata in the File Header (e.g.
Material, Compositional, Labelling and Identification, Catalogue, Business (access)); inter-track metadata
(multiplex-rate) carried alongside the Picture, Sound and Data in each Content Package of the Essence
Container (e.g. Format, Temporal, Material, Labelling and Identification); and intra-track metadata within the
essence itself (e.g. Format, Temporal, Spatial, Data streams such as subtitling).]

Figure 10 : Different Locations for metadata storage

6.1 Embedded Metadata Location

Embedded metadata (intra-track in the figure above) is that which is tightly embedded in the essence stream
such as is present in MPEG2 Video ES and AES3 data. Metadata that is embedded is typically:
• Format: for decoder operation
• Temporal: with particular reference to time-code
• Spatial: such as pan-scan vectors and aspect ratio
• Extra data: such as captioning, subtitles etc.

6.2 Linked Metadata Location

Linked Metadata (inter-track in the figure above) is that which is closely linked to the content, whether video,
audio or data content, through a container on a picture-by-picture basis. Thus this metadata is interleaved with
the content and maintains a tight timing relationship with it. As an example, the System Item of SDTI-CP
provides this metadata location. Metadata that can be stored as linked to the frame is that relating to:
• Format: often as a duplicate of the embedded metadata
• Temporal: mostly as temporally variable metadata extra to any embedded metadata
• Material: including the extended UMID
• Label: simple labeling of the content

6.3 Attached Metadata Location

Attached metadata (header metadata) is that which may appear in a File Header such as is present in MXF. It
can encompass a wide variety of metadata, in particular:
• Content: providing metadata about the content in the File Body
• Compositional: providing simple or complex editing information for the clip or program
• Label: providing a full set of content labeling and identification
• Catalogue: for location of events, markers and for archival metadata
• Business: for access and security information

6.4 Server Metadata Location

Server metadata can be used to replicate almost all of the metadata described so far. However, it is particularly
useful for the following metadata sets:
• Label: providing a full set of content labeling and identification metadata
• Compositional: providing simple or complex editing information and historical derivation metadata
• Catalogue: for use in off-line searches
• Publication: defining when and where content is to be delivered
• Business: for audience information, program statistics etc.

7 MXF in Detail

7.1 General Overview

SMPTE 377M defines a file format for the transfer of program material between equipment in the professional
broadcast environment. Stream and file transfers are both used for the interchange of program material, with file
transfers increasing in proportion to stream transfers. Neither will dominate; rather they will co-exist and the
MXF file is designed to work within both transfer classes.

File transfer differs from stream transfer in several respects.

Files are often created directly from incoming streams and are often converted into streams for emission and
distribution. The MXF standard specifies an MXF File Format that is readily convertible to and from common
streaming formats with low overhead and without loss of data.

In order to appreciate the differences between stream and file transfers, we can summarize the major
characteristics of each as follows:

File transfers...

1. Can be made using removable file media
2. Use a packet-based reliable network interconnect and are usually acknowledged
3. Are usually transferred as a single unit (or as a known set of segments) with a predetermined start and end
4. Are not normally synchronized to an external clock (during the transfer)
5. Are often point-to-point or point-to-multipoint with limited multipoint size
6. Use formats that are often structured to allow access to essence data at random or widely distributed byte
positions

Stream transfers...

1. Use a data streaming interconnect and are usually unacknowledged
2. Are open-ended, with no predetermined start or end
3. Are normally synchronized to a clock or are asynchronous, with a specified minimum/maximum transfer rate
4. Are often point-to-multipoint or broadcast
5. Use formats that are usually structured to allow access to essence data at sequential byte positions.
Streaming decoders are always sequential.

Figure 11 illustrates the interoperation between streaming transfers based on stream interfaces such as SDTI
and file transfers between disc servers and tape archives. One of the issues of the file transfer is that many
servers support playout before file closure (i.e. read from a partially written file while it is still in the process of
writing), so blurring the distinctions outlined above.

[Figure 11 shows a streaming wrapper (Essence and Metadata) carried on physical media such as tape or over a
streaming interface, and MXF Files (Essence and Metadata) held on file servers and exchanged over an IP
network or on removable media. Interconnects include SMPTE 305M SDTI, Fibre Channel, ATM, Ethernet and
IEEE1394.]

Figure 11 : MXF Files and Streaming Formats

7.2 Content Model

The Content Model used in SMPTE 377M is based on that defined by the EBU/SMPTE Task Force Report,
which defines content as in the figure below:

[Figure 12 shows a Wrapper enclosing a Content Package, which holds several Content Items, each made up of
Content Elements. These are all Content Components, of the following kinds: Essence Component (Video),
Essence Component (Audio), Essence Component (Other Data), Metadata Item, Vital Metadata (e.g. Essence
Type) and Association Metadata (e.g. Timecode).]

Figure 12 : Content Package Model

The content model also uses the terminology of SMPTE 336M (KLV Coding) and SMPTE 298M (Universal
Labels), which define:
• Universal Labels (ULs) used as Keys
• Key-Length-Value formatting of individual metadata and essence items
• Coding of groups of data items into Sets and Packs

The content model also uses the terminology of SMPTE 326M – SDTI-CP, which defines frame-interleaved
content based on the following components:
• A System Item that includes system level Descriptive Metadata and content metadata
• A Picture Item that includes one or more picture Elements
• A Sound Item that contains one or more audio Elements
• A Data Item that contains one or more data essence Elements
• A Compound Item that contains one or more intrinsically interleaved Elements (such as an interleave of
DV-DIF packets)
• A link item that links metadata in the System Item to any one of the Elements

Each of these essence Elements can be separately indexed in an Index Table and is also mapped to a track in
the Header Metadata. The track is the metadata object that controls the way in which this essence Element is
used.

7.3 Operational Patterns

Different applications produce and consume material of various degrees of complexity and structure, from a
single clip to a multitude of clips and effects. Applications requiring only the simplest files should not be
burdened with support of the most complex. To maximize interoperability MXF uses Operational Patterns to
define constrained levels of file complexity.

During the development of MXF there were many different attempts at defining the functionality of an
Operational Pattern. The goal was to create a number of axes that allowed software and hardware developers to
create products with different levels of functionality (and hence cost). These different axes had to correspond to
real world ways of working, and had to provide mechanisms for a file to be “flattened” from a complex
Operational Pattern to a simple Operational Pattern in a way that made sense to someone working with the
Multimedia content.

The description below covers the different axes, followed by a non-exhaustive discussion of some applications.

7.3.1 Operational Pattern “Axes”


When trying to constrain the complexity of an MXF file, there are different axes or degrees of freedom that can
be constrained independently. It is intended that the Operational Patterns be written and standardized as they
are needed. Most Operational Patterns will be written as a constraint on the axes in this section. However, for
certain specialized applications (such as allowing audio-only WAV files to be read by non-MXF devices) there
may be Specialized Operational Patterns that constrain the specification differently. Regardless of the
Operational Pattern, any MXF decoder will be able to read the header and report the contents of the file and why
it can or cannot process the file.

The Operational Pattern axes are arranged so that any Operational Pattern to the left of, or above, another
Operational Pattern is a subset of its functionality. For example Operational Pattern 3b is a superset of the
functionality of OP1a, OP2a, OP1b, OP2b and OP3a, and includes not just the ability for each Material Package
to access sequential Top-Level File Packages, but also the ability to access a sequence of ganged Top-Level
File Packages.
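
The subset relationship between Operational Patterns can be expressed as a comparison on the two axes. The sketch below is illustrative only: the function names are hypothetical, and it handles only the generalized patterns OP1a to OP3c, without qualifiers or Specialized Operational Patterns.

```python
import re
from typing import Tuple

_OP_NAME = re.compile(r"^OP([123])([abc])$")


def parse_op(name: str) -> Tuple[int, str]:
    """Split a name such as 'OP3b' into (item complexity, package complexity)."""
    match = _OP_NAME.match(name)
    if match is None:
        raise ValueError(f"not a generalized Operational Pattern name: {name!r}")
    return int(match.group(1)), match.group(2)


def is_subset(inner: str, outer: str) -> bool:
    """Return True if 'inner' is no more complex than 'outer' on both axes."""
    inner_item, inner_package = parse_op(inner)
    outer_item, outer_package = parse_op(outer)
    return inner_item <= outer_item and inner_package <= outer_package


# The patterns named in the text as subsets of Operational Pattern 3b.
subsets_of_3b = [op for op in ("OP1a", "OP2a", "OP1b", "OP2b", "OP3a") if is_subset(op, "OP3b")]
```

A decoder that declares the most complex pattern it supports could use such a comparison to decide whether a file's signalled Operational Pattern is within its capabilities.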

[Figure 13 is a matrix of Operational Patterns relating Material Packages (MP) to File Packages (FP). The
columns give the Item Complexity: 1 Single Item (only one MP SourceClip, equal to the FP duration), 2 Play-list
Items (each MP SourceClip equals an entire FP) and 3 Edit Items (any MP track may be taken from any FP track).
The rows give the Package Complexity: a Single Package, b Ganged Packages and c Alternate Packages (a
choice between Material Packages MP1 or MP2).]

Figure 13 : Operational Pattern Axes

7.3.1.1 Item complexity

Here we constrain the temporal relationship between different Top-Level File Packages within the MXF file. In
principle, there are 3 levels of constraint:

1. Single item: the file contains Top-Level File Packages that have the same duration as the output timeline
(like a tape).
2. Playlist items: the file contains Top-Level File Packages that are butted one against the other. All tracks are
switched synchronously with optional audio fade out / fade in to prevent clicking. This can be likened to a
playlist of tapes.
3. Edit items: the file contains several Top-Level File Packages with one or more cut edits. Tracks may have
independent editing to allow audio and video to be switched at different points in the timeline. This will often
involve random access within the file and therefore MXF files in this column are unlikely to be streamable.

7.3.1.2 Package complexity

a. Single package: the file contains only one active Essence Container at any point on the output timeline.
b. Ganged packages: the file contains two or more Essence Containers that share a common synchronized
timeline. The MXF structure is used to wrap several Essence Containers and multiplex them using the KLV and
partitioning rules. This could be used to gang together an MPEG Picture track in one package with an
uncompressed Sound track in another (possibly external) package.
c. Alternate packages: the file contains several versions of the “program”. There are several Material Packages
that might be used to control a browse track or different language versions of a program, or different edits of
some finished material destined for different censorship zones. For example, an OP1c file may have 2
continuous timelines – one for the French soundtrack and another for the English Soundtrack. Another example
is an OP3c file, where not only is there a choice of English or French, but the cut lists for the output tracks are
different. Since this OP is a superset of the Ganged Package complexity, it also has the capabilities of Ganged
Packages as well as Alternate Packages.

7.3.2 Operational Pattern Qualifiers


In addition to the axes above, there are Operational Pattern qualifiers that modify the behaviors above.

7.3.2.1 Internal / External flag

This is a simple flag that modifies an Operational Pattern. It has 2 states to indicate that all the Essence
Containers are internal to the file (Internal) or that one or more of the Essence Containers are in an external file
(eXternal). For example an OP1bx file may have internal Picture data, but external Sound data. (see 8.2.7.1)

7.3.2.2 Stream / Non-Stream (Wire / Storage) Flag

This is a simple flag that indicates either that the partitions in the file have been arranged so that it can be
streamed on a wire (Wire file), or that some other non-streaming arrangement has been used (Stored File). The
streamed file representation implies that Essence Containers are multiplexed together and that within an
Essence Container, any interleave that exists will allow decoding of the essence during streaming file transfer so
that the pictures may be viewed and the sound heard during transfer with minimal latency. The size of buffers
required to do this is an application issue and outside the scope of SMPTE 377M. Any file that does not have
this property is just a File. (see 8.2.7.2)

7.3.2.3 Uni-Track / Multi-Track Flag

This is a simple flag that indicates that all the Essence Containers in an MXF file have only a single essence
track. This flag is to aid workflows where all the different essence components of a production are required to be
individual files. This flag helps MXF decoders know that the file meets this criterion. The flag is either Uni-track
or Multi-track. (see 8.2.7.3)

7.3.3 Operational Pattern Applications

MXF applications should, where appropriate, be able to perform the following functions with respect to
Operational Patterns:
• Encoders and Decoders should be able to report the most complex Operational Pattern they can handle.
• A Decoder should be able to indicate what level of Operational Pattern has been processed when its
capabilities have been exceeded.
• Encoders should ALWAYS correctly signal the Operational Pattern of the files they create. This means that an
MXF encoder capable of creating all possible Operational Patterns should not signal the files it creates with the
highest Operational Pattern code. It should signal the Operational Pattern to which the file complies.

Listed below are several MXF applications and possible ways in which they may be implemented using SMPTE
377M. They are intended to give a guide on how MXF might be used. They are not normative definitions of the
Operational Patterns concerned.

An Application might give a file a name depending on its functionality, for example:

Test_OP1aiwm.mxf - mxf file with internal essence, wire-file, multitrack, Operational Pattern 1a
Test_OP3cxm.mxf - mxf file with external essence, not streamable, multitrack, Operational Pattern 3c

7.3.3.1 Video Tape replacement


A video tape is essentially a single container with a single item on it. Even though there may be more than one
“scene” or “shot” or “clip” on the tape, no special processing is required to play the sequence. All the material is
internal to the tape and it is stored in a way that can be streamed. This makes an Operational Pattern for video
tape replacement one of the simplest Operational Patterns.

7.3.3.2 Archive

There are many different Archive applications. Often, it is desirable to have metadata or a browse track “online”
and the full-quality content in some deep store. This requires referencing of external essence as well as multiple
representations of the same content. There may only be one single item in each of the representations (each
having the same duration) and the content could be arranged for streaming or storage depending on the precise
application.

7.3.3.3 D-Cinema

For distribution of D-Cinema content, it may be desirable to have different representations of the same film
distributed on common media. Alternatively, MXF may be used to represent each “reel”, which is then
assembled via a composition list that itself may be an MXF File. Different representations may be as simple as
different language tracks, or may be as complicated as different audio-video cuts to meet local or regional
content restrictions. The Operational Pattern axes allow this split of functionality. In addition a D-Cinema
application will almost certainly require protection of the content. This can be achieved with a metadata plug-in
to describe the encryption / protection scheme and an Essence Container type to contain the encrypted /
protected essence(s). The other mechanisms within MXF remain unchanged.

7.3.3.4 Adding Handles to Material


Handles are extra bits of material before and after the desired content. There are several ways in which these
could be implemented in MXF depending on the desired result.

The most common use of Handles is to adjust edit points, and / or to provide context for production processes
such as color correction. This use of Handles implies that the content within the Handle is not actually used in
the Material Package, but exists within the Top-Level File Package. The resulting file would be in the Edit Items
column of the Operational Pattern axes matrix. The precise row or column of the Operational Pattern would
depend on the construction of the essence within the file. For a mono-essence file it would be constructed as an
OP2a or OP3a file. Multi-track files would be either OP2b or OP3b depending on whether or not the cut points of
the Top-Level File Packages are synchronized on the timeline.

7.4 Relationship between MXF and Essence Containers

MXF files created in accordance with the MXF standard use Essence Containers to encapsulate one or more
essence elements. These essence elements may be intrinsically interleaved (for example a SMPTE 314M DV-
based stream) or may consist of a single non-interleaved essence element.

In order to support stream capability, the essence elements are interleaved over a limited duration (typically 1
frame). Each essence element can be encapsulated using KLV coding over the interleave duration to allow an
MXF decoder to access the essence on these KLV boundaries.
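
A decoder might step across these KLV boundaries as in the sketch below. It is illustrative only: the function names are hypothetical, the key in the example is a placeholder, and the length is assumed to use BER coding as in SMPTE 336M (short form for values below 0x80, otherwise a count of the length bytes that follow). Indefinite lengths are not handled.

```python
from typing import Iterator, Tuple


def read_klv(buffer: bytes, offset: int = 0) -> Tuple[bytes, bytes, int]:
    """Read one KLV packet and return (key, value, offset of the next packet)."""
    key = buffer[offset:offset + 16]
    if len(key) != 16:
        raise ValueError("truncated key")
    position = offset + 16
    if position >= len(buffer):
        raise ValueError("missing length")
    first = buffer[position]
    position += 1
    if first < 0x80:
        length = first  # short form: the byte is the length itself
    else:
        count = first & 0x7F  # long form: number of length bytes that follow
        if count == 0:
            raise ValueError("indefinite length is not handled in this sketch")
        if position + count > len(buffer):
            raise ValueError("truncated length")
        length = int.from_bytes(buffer[position:position + count], "big")
        position += count
    end = position + length
    if end > len(buffer):
        raise ValueError("truncated value")
    return key, buffer[position:end], end


def iter_klv(buffer: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (key, value) for each KLV packet found back to back in the buffer."""
    offset = 0
    while offset < len(buffer):
        key, value, offset = read_klv(buffer, offset)
        yield key, value


# Placeholder 16-byte key (label prefix plus zero padding, not a registered value).
KEY = bytes.fromhex("060e2b34") + bytes(12)
stream = KEY + b"\x03abc" + KEY + b"\x83\x00\x00\x02hi"
packets = list(iter_klv(stream))
```

Because each packet states its own length, a decoder can skip any packet whose key it does not recognize and continue at the next KLV boundary.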

The MXF Format does not provide the individual Essence Container specifications, but defines the constraints
that a compliant Essence Container specification must meet in order for it to be encapsulated in an MXF File
Body. Constraints on the Essence Container are given in the Operational Pattern document and the Essence
Container document. They may be summarized as follows:
1. Must encapsulate each essence component with KLV coding using publicly registered Keys,
2. Must provide for interleaving of the essence components over a limited duration (typically 1 frame), when
inputs or outputs are used for streaming,
3. Must be standardized as an open specification, preferably through the due-process of SMPTE,
4. Must meet the SMPTE criteria for a standard (see the SMPTE Administrative Practices).

It is expected that compliant Essence Containers will become available for the systems below.

Note that none of the compression formats is a compulsory function.

7.4.1 MXF Generic Container


SMPTE 377M provides a Generic Container with intrinsic interleaving. This allows most existing formats to be
mapped into the MXF Format with minimal invention of new techniques.

Wrapping all essence variants in a common Essence Container format is advantageous for system design and
interoperability. The MXF document suite specifies mappings of a variety of essence formats into the MXF
Generic Container as described below.

The MXF Generic Container may also use Essence Elements and Metadata Items defined in SMPTE 331M
through application of the specifications in SMPTE 385M (Mapping SDTI-CP Essence and Metadata into the
MXF GC).

7.4.2 MPEG-2 Long GOP and Type D-10


MPEG compressed picture essence in streams may be interleaved in several different patterns as defined by
the ISO 13818-1 Systems layer, including Elementary Streams, Program Streams, and Transport Streams.
SMPTE Type D-10 MPEG Elementary streams are defined by SMPTE 356M. An MXF Essence Container
specification allowing wrapping of these Essence types currently recommends Frame by Frame wrapping of
Elementary Streams in the Generic Container as the preferred MXF encapsulation method.

7.4.3 DV Compressed Essence


MXF Files created in accordance with this specification are intended for use in systems employing the DV family
of compression schemes defined by IEC61834-2, SMPTE 314M and SMPTE 370M.

7.4.4 Uncompressed Pictures


MXF files may be used for the transfer of program material employing uncompressed video at all resolutions,
including standard and high definitions. The MXF standards specify the use of the KLV data construct for
encapsulating uncompressed video, and the use of a separate KLV packet to carry signal parameters for use by
decoders and transcoders. Like all GC Element mappings, this Picture Element may be used on its own, or may
be used with appropriate Sound or Data Essence Elements.

7.4.5 Audio

An MXF mapping document for the encapsulation of AES3 audio and Broadcast Wave compatible audio in the
Generic Container has been defined. This audio element may be used on its own, or may be used to add audio
to another Generic Container Element such as Uncompressed Pictures or MPEG Long GOP pictures.

7.4.6 Other Compression Types


MXF Files may be used to encapsulate various other video essence compression systems, including M-JPEG,
JPEG-2000, MPEG-4 simple, MPEG-4 studio profile, MPEG-4 part 10 video, and audio essence compression
systems, including Dolby AC-3 and Dolby E.

7.4.7 Essence Container and Essence Type Identification


The types of essence permitted in each specific variant of MXF file are defined by individual Essence Container
Specifications and are identified in the File Header by one or more unique Essence Container Labels.


7.5 How MXF objects / sets relate to the Essence Container

SMPTE 377M is a physical representation of the underlying AAF class model and uses the same methods for
data identification and data relationships. The method of relating the Structural Header Metadata to the Essence
Container is now described.

In each Partition of an MXF file, there may be any or all of the following core components:
1. A Partition Pack that defines:
- a Body SID for the container data stream in this partition,
- an Index SID for the Index Table in this partition.
2. A Primer Pack
3. Header Metadata repetition that includes:
- a Content Storage Set at the top level,
- one or more Top-Level File Packages, each associated with an Essence Container Data Set,
- other metadata that describes the entire file (it is, after all, a Header Metadata repetition).
4. An Essence Container (that occupies the whole File Body or a part).
5. Unique IDs that link data sets together (16-byte Instance UIDs).
6. Unique Material IDs (32-byte UMIDs) that identify the Essence Container.
These components are related as indicated in the following figure:

[Figure: block diagram. The Partition Pack gives BodySID(x) and IndexSID(y) for the Essence Container and the Index Table Segments. In the Header Metadata, the Preface Set contains the Content Storage Set, which references by UID the Material Package, the File Package and the EssenceContainer Data set. The EssenceContainer Data set carries the File Package UMID, BodySID(x) and IndexSID(y) and so defines the IndexSID to BodySID relationship. The Material Package Picture Track (Edit Rate, Picture Sequence, Picture SourceClip with SourcePackageID, SourceTrackID, Start Position and Duration) links by UMID and SourceTrackID to the File Package Picture Track, whose EssenceTrackNumber identifies the Track in the Essence Container.]

Figure 14 : MXF Metadata and Relationship to the Essence Container

The relationships are as follows:

The Partition Pack includes a BodySID and an IndexSID that identify the Essence Container segment and Index
Table Segments in the partition. These are linked to the BodySID and IndexSID in the relevant Top-Level File
Package via the corresponding EssenceContainerData Set. They are also linked to the BodySID and IndexSID
in the relevant Index Table. When the BodySID value in a partition is zero, it indicates that there is no Essence
Container segment in this partition. Likewise, a zero IndexSID value indicates that there are no Index Table
Segments in this partition.

The Header Metadata has a Content Storage set at the top level that contains a set of Package UIDs and a set
of EssenceContainerData UIDs. The Content Storage set strongly references every Package, including each
Top-Level File Package as well as each Material Package. The Content Storage Set will also reference Lower-
Level Source Packages where these are present in the Header Metadata.

Within the Header Metadata, there is also an Essence Container Data set for every Top-level File Package. This
set provides the linking between BodySID, IndexSID and their related Package UMID value. This mechanism
relates the Partitions and Index Tables within the File Body to the Top-Level File Packages in the Header
Metadata.
Note: The Package UIDs are Basic UMIDs.

7.6 A discussion on endian-ism in MXF

The MXF Format is intended to be platform neutral. This means it should not rely on resources available on any
specific platform. There are, however, two distinct ways in which multi-byte numbers are stored in computer
systems, Big-Endian and Little-Endian. Big-Endian systems place higher value bytes in the lower value
addresses, whereas Little-Endian systems do the reverse. This means that any data structure placed directly in
a processor’s memory by hardware can be read “in place” on one system, but must undergo a byte swap
process in the other.

In addition MXF is intended to have a common object model with AAF. AAF implements variable Endian-ism
based on a byte-order property within various classes.

Note that this feature applies only to the Metadata elements in the file. The Essence Containers have fixed byte
orders depending on the specification of the Essence Container.

There are several possible solutions in MXF, of which 3 are listed here:
1. all Header Metadata items will be Big-Endian
2. all Header Metadata items will be Little-Endian
3. the MXF encoder will signal the Endian-ness it used; i.e. Source-Endian.
There were many design discussions during the development of MXF and the final conclusion was that MXF
should be Big-Endian and should not indicate this in the file. The main reason behind this decision was to
simplify the handling of dark metadata where the Endian-ism cannot be known (because the metadata is dark).
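As an illustration of what the Big-Endian decision means for a decoder, the following minimal Python sketch reads a 4-byte unsigned integer the way MXF Header Metadata stores it. The byte values are made up for the example and do not come from any real file.

```python
import struct

# Four bytes as they might appear in a Header Metadata item holding a UInt32.
# The value is invented for this example.
raw = bytes([0x00, 0x00, 0x01, 0x2C])

# MXF Header Metadata is Big-Endian: the most significant byte comes first.
big = int.from_bytes(raw, byteorder="big")

# What a naive "in place" read would produce on a Little-Endian host.
little = int.from_bytes(raw, byteorder="little")

# struct gives the same result; ">" forces Big-Endian regardless of the host.
(big_via_struct,) = struct.unpack(">I", raw)

print(big, little, big_via_struct)
```

Because the byte order is fixed and not signalled, a decoder can always use the explicit Big-Endian form and never needs to inspect the file to choose.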

7.7 MXF Decoder Design

MXF Decoder design is, of course, an application-specific issue. This section is intended to advise implementers
of issues that will improve interoperability with other systems. It is desirable that all MXF decoders should be
able to parse (i.e. understand the syntactic structure) at least the following:
1. The KLV packet structure of all parts of the file (including the KLV packets of any kind of Essence
Container).
2. The KLV structure of the Header Partition, any Body Partition and the Footer Partition.
3. The KLV structure of any optional Index Tables.
4. The optional Random Index Pack.
5. The basic Header Metadata structure in any partition.
6. The SMPTE Universal Labels in all the Partition Packs (the decoder should be able to locate them).
7. Any run-in (the decoder should be able to skip over it).
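The first item in the list above, parsing the KLV packet structure, can be sketched as follows. This is a minimal Python sketch, assuming a seekable byte stream positioned at the start of a KLV packet (any run-in already skipped), 16-byte keys, and BER-encoded lengths in short or long form. The function names `read_ber_length` and `iter_klv` are hypothetical and are not taken from the MXF documents.

```python
import io
from typing import BinaryIO, Iterator, Tuple

KEY_LENGTH = 16  # KLV keys in MXF are 16-byte SMPTE Universal Labels


def read_ber_length(stream: BinaryIO) -> int:
    """Read a BER-encoded length (short form or long form)."""
    first = stream.read(1)
    if not first:
        raise EOFError("stream ended inside a KLV length")
    if first[0] < 0x80:
        return first[0]                  # short form: the byte is the length
    count = first[0] & 0x7F              # long form: low 7 bits give the byte count
    if count == 0:
        raise ValueError("indefinite BER length is not handled in this sketch")
    body = stream.read(count)
    if len(body) != count:
        raise EOFError("stream ended inside a KLV length")
    return int.from_bytes(body, "big")


def iter_klv(stream: BinaryIO) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (key, length, value_offset) for each packet, skipping the values."""
    while True:
        key = stream.read(KEY_LENGTH)
        if len(key) < KEY_LENGTH:
            return                       # end of data (or a truncated key)
        length = read_ber_length(stream)
        value_offset = stream.tell()
        stream.seek(length, io.SEEK_CUR)  # skip the value without interpreting it
        yield key, length, value_offset
```

A decoder built this way can walk the whole file, including Essence Containers it cannot decode, because it only needs the Key and Length of each packet to find the next one.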
In addition, it is desirable that MXF decoders decode (i.e. interpret and act on the values within) at least:


The metadata sets and individual metadata items defined in the minimum implementation of the simplest
Operational Pattern.
Decoding of other aspects such as the compressed bitstream or the specific Essence Container in the File Body
depends on the ability of the decoder to support those aspects. It is desirable that MXF Decoders be able to
locate and present the information that identifies the contents of the MXF file as follows:
1. The MXF file identification itself (that identifies that the file is MXF compliant) through the Key value of the
Header Partition Pack.
2. The UL of the Operational Pattern (Structural Metadata) to which the file conforms.
3. An array of ULs that identify each Essence Container and its contents in the File Body.
4. An array of ULs that identify each Descriptive Metadata collection within the file.

7.7.1 The minimum decoder concept


It may be useful to application specifiers to use the concept of a minimum decoder. This would have a defined
functionality in addition to that listed above. Two examples are given below:
• The minimum decoder for a tape-based MXF player would include the ability to decode and unwrap the
Essence in a restricted number of compression types. It could include a “turbo” mode where aligning the
data to a specified KAG value could guarantee faster-than-real-time behavior.
• The minimum decoder for a content-aware filing system would include the ability to determine which
metadata sets were included in the file and to create menu items to allow the metadata schemes to be
browsed. It may include thumbnail generation for a limited number of essence types. It might also
enable database registration of the UMIDs and Descriptive Metadata with a media asset management
system.
In general the minimum decoder will depend on the system in which the MXF file is being used.

7.8 External files – where is the essence

The MXF Essence Descriptor contains a list of properties called “Locators”. MXF supports two different types of
locator – Network and Text. The Top-Level File Package that describes the Essence (i.e. the one that is
referenced by the Material Package) may have external essence, and the decoder must scan the Locators in the
order they are given to find the Essence. A typical example of this might be the creation of a CD-ROM where the
Network Locators are given as a file reference relative to the location of the MXF file, followed by other locations in
which the file might be found, e.g.:

Network locator: “src/clip1.dv” a relative file reference to clip1.dv in folder src


Network locator: “file://usr/~jon/clip1.dv” an absolute file reference to clip1.dv in jon’s home folder
Text locator: “clip1 DV tape is on shelf 42” a text locator intended for a human to interpret

Even though the actual Essence Data is external to the file, there may be metadata describing the essence
within the file. In the extreme case, all the Essence could be external to the file leaving a small MXF stub that
fully describes the external Essence. MXF Files with Internal essence may also have locators. When all the
essence can be found internally, the locators should be treated as being for information purposes. In higher
Operational Patterns, it is possible that some of the Essence will be internal and some of it will be external. In
this case, Internal Essence, where present, should take precedence over external references. Where there is no
internal essence available from a Material Package SourceClip reference, the locators should be searched in
their listed order to find the content (see also 8.2.7.1). External content can be verified by checking the BodySID
value in the Essence Container Data set for the appropriate UMID. A zero value indicates external essence.
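The search order described in this section can be sketched in Python as follows. This is a minimal sketch under stated assumptions: locators are supplied as already-parsed `(kind, value)` pairs, only plain file paths are resolved (URL handling is left out), and the names `essence_is_external` and `find_external_essence` are hypothetical.

```python
import os
from typing import Iterable, Optional, Tuple


def essence_is_external(body_sid: int) -> bool:
    """A zero BodySID in the Essence Container Data set indicates external essence."""
    return body_sid == 0


def find_external_essence(mxf_path: str,
                          locators: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the first existing file named by a Network locator, or None.

    locators is an ordered sequence of (kind, value) pairs, where kind is
    "network" or "text". Relative paths are resolved against the MXF file.
    """
    base = os.path.dirname(os.path.abspath(mxf_path))
    for kind, value in locators:          # search in the listed order
        if kind != "network":
            continue                      # Text locators are for a human operator
        if "://" in value:
            continue                      # URL schemes are outside this sketch
        candidate = value if os.path.isabs(value) else os.path.join(base, value)
        if os.path.isfile(candidate):
            return candidate
    return None
```

A caller would consult `essence_is_external` first, since internal essence takes precedence, and only then walk the locators.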


8 MXF worked examples

8.1 Identifying the contents of an MXF file

This section is written in a decoder-centric fashion to illustrate why certain parameters are stored the way they
are. An Encoder should create a file so that the maximum number of decoders is likely to be able to read /
decode it. What does this mean? In practice, it means that the MXF Encoder’s designers may discover that
there are choices to be made when creating MXF Files. It may be the case that “elegant little tricks” with the
MXF syntax are found that may make life easier for the Encoder designer. If the use of such tricks reduces the
chance of interoperability with simple decoders, these tricks should be avoided. MXF is an Interchange File
format and the goal of all MXF devices should be to maximize the probability of Interoperability.
The order in which an MXF device or application searches for parameters within the file depends very much on
what the device or application is trying to do with the file. For example:
• An MXF file explorer GUI probably wants ownership information from the Identification Set
• An MXF Asset Manager needs to know UMIDs of the current and previous versions as well as whether
the content is in the file or externally referenced.
• An MXF Tape device probably wants the size of the Header Metadata and the Essence Container type
• A computer based MXF playback application probably wants to know the Operational Pattern and what
Essence Container Type(s) are in the file
• An MXF Edit conformer needs to know the Essence Container Types and whether or not all the
Essence is Internal to the file.
Notice from the list above that there are valid and important MXF applications that do not need to know the
exact Essence Type and are never likely to decode the content. To be able to read the file, the MXF decoder is
likely to go through a number of steps in both the physical and logical structures of the file.

8.1.1 Is it an MXF File?


All MXF Files start with an Optional Run-In followed by the Header Partition Pack Key. The Run-In is less than
64k bytes and the condition for finding the start of the file is to identify the first 11 bytes of the Partition Pack key.
The simplest way to do this is to scan the initial 64k bytes of a file for these 11 bytes. When they are found, the
MXF-specific decoding can begin.
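The scan described above might look like the following Python sketch. The 11-byte prefix value is an assumption of this example and should be checked against SMPTE 377M before use; the function name `find_header_partition` is hypothetical.

```python
from typing import Optional

# First 11 bytes of the Partition Pack key. ASSUMPTION: this value must be
# verified against SMPTE 377M; it is given here only to make the sketch runnable.
PARTITION_PACK_KEY_PREFIX = bytes.fromhex("060e2b34020501010d0102")

MAX_RUN_IN = 65536  # the Run-In is less than 64k bytes


def find_header_partition(data: bytes) -> Optional[int]:
    """Return the offset of the Header Partition Pack key, or None if not found."""
    # The key may start at any offset from 0 to MAX_RUN_IN - 1, so the search
    # window has to extend far enough to hold the whole prefix at that offset.
    window = data[:MAX_RUN_IN - 1 + len(PARTITION_PACK_KEY_PREFIX)]
    offset = window.find(PARTITION_PACK_KEY_PREFIX)
    return None if offset < 0 else offset
```

A return value of `None` means the data does not look like an MXF file; any other value is the length of the Run-In and the point at which KLV parsing can start.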

8.1.2 Is this an MXF File that my application can process?


MXF has been designed to allow the generation of “early failure” messages. This means that MXF Decoder
designers should attempt to determine as early as possible whether or not they can wholly or partially process a
given MXF File. Where possible, feedback should be given to the user if the application is not able to process
some or all of the file. Typical reasons might be:
• “No codec available for Essence Container type <name>”, where <name> is the Human readable (in the
local language) name of the Essence Container as determined by a dictionary
• “Unknown Essence Container type <number> - not found in database”, where <number> is the UL of
the Essence Type that cannot be handled because it was not found in the local dictionary.
• “Operational Pattern Complexity exceeded. This file is OPxx, this device can play files of complexity
OPyy”
It is crucial that MXF Encoders create files with accurate header information. An MXF Encoder may be asked to
create files that are simpler than the highest Operational Pattern it was designed to create. It is a normative
provision of the specification that the MXF Encoder correctly set its header information. For example, if an MXF
Encoder can create files of OP1b complexity, but is asked to create a file with a single mono-Essence Top-Level
File Package, then the MXF Encoder must signal “OP1a” complexity in the header.

Most of the “fail fast” information required by a decoder can be found in the Partition Pack. Typical processing by
the decoder may be:


• Is this an MXF Version I understand? The MXF decoder checks the MajorVersion and MinorVersion
properties of the Partition Pack and checks them against the decoder’s reference value. Note that in
future versions of SMPTE 377M the Partition Pack key may have differences in bytes 14, 15 and 16
compared to previous versions of the specification.

• Is this an Operational Pattern I can handle? The MXF decoder checks the Operational Pattern UL
against the list of ULs it knows how to handle.

• Is the data in this Partition stable? The MXF decoder checks byte 15 of the Partition Pack key to
determine if this partition is of type “closed” or “closed and complete”. If the partition is of type “Open”
then the MXF application should find another Partition Pack because the information in this one may
have been created on the fly and may be inaccurate.

• Can I decode or process the Essence? The MXF decoder processes the EssenceContainers Batch in
the Partition Pack to compare each label against a list of labels it knows how to process. It is possible
that the Essence will be stored in several Essence Containers of the same type (e.g. 3 DV clips) – in
this case, there will be only 1 instance of the EssenceContainer Label. It is also possible that there will
be a single EssenceContainer in the file and that this will contain several different interleaved Essence
Types – for example, there may be uncompressed images in a Generic Container interleaved with
several tracks of AES audio. In this case there would be 2 Essence Container Labels – one for the
uncompressed pictures and the other for the interleaved audio.

• What is the duration of the file? The MXF decoder searches for the Primary Package UID in the Preface
Set and discovers the duration by inspecting the duration property of the sequences of the tracks in that
package.

• What device made it? This information is stored in the Identification Set which can be found using the
most recent Generation UUID.

• Is it HDTV or SDTV? This can be determined by inspecting the Essence Descriptor for the Picture
Track. The Picture Track in the Top-Level File Package(s) has a property called TrackID. This will match
one of the linked TrackID values in one of the EssenceDescriptors within the file. This
EssenceDescriptor contains many properties that fully describe the source Picture Essence. These
include horizontal and vertical sizes as well as the frame rate and nominal aspect ratio of the content.

• Where is the External Essence? Each Essence Descriptor has a Locators property, which is an ordered
list of places where the Essence might be. This list should be searched in order to find the essence. A
locator may be a URL or it may be text intended for a human operator (e.g. “all known URLs have been
searched (<list of URLs inserted by application>) and the essence was not found – it came from the
green cassette on the shelf behind the water cooler”). Mechanisms for finding external essence are
outside the scope of this document, but Media Asset Management systems that use UMIDs for
identification are becoming more common at the time of writing of this document.
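Several of the "fail fast" checks in the list above can be gathered into one function. The following Python sketch assumes the Partition Pack has already been parsed into a simple record; the `PartitionInfo` class, its field names and the function `early_failure_reasons` are hypothetical, and the message texts are only loosely modelled on the examples in 8.1.2.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class PartitionInfo:
    """Hypothetical record of fields already parsed from a Partition Pack."""
    major_version: int
    minor_version: int
    is_closed: bool
    operational_pattern: bytes        # UL of the Operational Pattern
    essence_containers: List[bytes]   # ULs from the EssenceContainers Batch


def early_failure_reasons(p: PartitionInfo,
                          known_versions: Set[Tuple[int, int]],
                          known_patterns: Set[bytes],
                          codecs: Dict[bytes, str]) -> List[str]:
    """Return human-readable reasons why the file cannot be fully processed."""
    reasons = []
    if (p.major_version, p.minor_version) not in known_versions:
        reasons.append(f"Unsupported MXF version {p.major_version}.{p.minor_version}")
    if not p.is_closed:
        reasons.append("Partition is Open; look for a Closed Partition Pack")
    if p.operational_pattern not in known_patterns:
        reasons.append("Operational Pattern Complexity exceeded")
    for label in p.essence_containers:
        if label not in codecs:
            reasons.append(
                f"Unknown Essence Container type {label.hex()} - not found in database")
    return reasons
```

An empty list means none of these checks failed; it does not guarantee that the Essence itself can be decoded.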

8.2 Partitioning a file

8.2.1 Partitioning for streaming – the streamable file


When streaming an MXF file, it is desirable to reduce the size of the buffers needed in the receiver, which in turn
reduces the overall latency of the system. To be streamable, a file will usually contain an interleave of Picture
and Sound Elements. In many systems that use compressed sound material, it is likely that the smallest unit of
Sound does not have the same duration as the field or frame duration of the Pictures. The guidelines below are
intended to improve the chances of interchange when streaming and refer to the placement of Elements in the
Content Package of the MXF Generic Container. The term Access Unit is borrowed from MPEG to indicate the
smallest unit of content that can be allocated a time value. Figure 15 below shows the basic structure of a
Content Package – Different Essence Items that each contain different Essence Elements. The Items can
appear in any order, but all Elements of the same type must be contiguous.


[Figure: a Content Package made up of a System Item, a Picture Item, a Sound Item and a Data Item, each containing one or more Elements of its own type, with System metadata to Element linking. All Content Packages in any Generic Container should have the same number and order of Elements.]

Figure 15: Logical Structure of Items and Elements in a Content Package

In each Content Package:


1. There is one Picture Access Unit
2. The synchronized Sound sample should be in the first Sound Element in the same Content Package. This
implies that the Start Position of the Picture Access Unit should be equal to the Start Position of the Sound
Element or fall within the duration of the first Sound Element.
3. Sound Elements should be placed in the Content Package until a Sound Element is found that may start a
later Content Package. (Note that when the Sound Element duration is greater than the Picture Access Unit
duration, this results in Content Packages with no Sound Elements or with zero-length Sound Elements.)
4. Any Data Element should start with the first indivisible unit of Data where the Start Position of the video
Access Unit is equal to the Start Position of the Data Element or falls within the duration of the first Data
Element.
5. Any Data Element should end with the unit of Data whose position on the timeline is not later than the
position of the next video Access Unit.
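Guidelines 2 and 3 can be illustrated with a small Python sketch that assigns constant-duration Sound Elements to Content Packages. This is one possible reading of the guidelines, not a normative algorithm. It assumes one Picture Access Unit per Content Package, constant picture and sound durations expressed as positive integers in a common time base, and both streams starting at time zero. The function name is hypothetical.

```python
from typing import List


def sound_elements_per_package(picture_period: int, sound_period: int,
                               package_count: int) -> List[List[int]]:
    """Assign Sound Element indices to Content Packages.

    Picture k starts at k * picture_period. Sound Element j covers the
    half-open interval [j * sound_period, (j + 1) * sound_period).
    """
    packages: List[List[int]] = [[] for _ in range(package_count)]
    j = 0
    while True:
        start = j * sound_period
        end = start + sound_period
        # Index of the first picture that starts at or after this element.
        k = -(-start // picture_period)
        if k * picture_period >= end:
            # No picture starts inside this element, so it stays with the
            # Content Package that is already in progress.
            k -= 1
        if k >= package_count:
            break
        packages[k].append(j)
        j += 1
    return packages
```

With a picture period of 40 and a sound period of 24, the first three Content Packages receive Sound Elements 0, then 1 and 2, then 3 and 4. With a sound period of 100 the second and third Content Packages receive no Sound Element at all, which is the case described in the note to guideline 3.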

These guidelines create files that are streamable, but may require large receiver buffers to synchronize the
Picture, Sound and Data. Many compression specifications provide a lot of information on buffering and
streaming, and creating a system with similar buffer characteristics is the goal here. For example, the MPEG-2
specification ISO/IEC 13818-1 gives rules and guidelines for multiplexing the audio and video streams into
either a Program Stream or a Transport Stream.

When streaming a file, the decoder is intended to display the pictures and recreate the sound while the file is
being sent. The delay through the video and audio decoders is often not the same; therefore buffering is
required in the decoder to bring the sound and pictures into synchronization. This buffering is often in addition to
any buffering required for compression decoding and basic demultiplexing of the streams.

The guidance given here is that an MXF encoder should create a stream as though it were creating the content
for streaming using the underlying compression standard; the GC Content Package guidelines above should
then be applied. This should result in a good compromise between low latency and KLV decodability.

8.2.1.1 OP1a file requirements


This simple Operational Pattern is the one that is most likely to be used for streaming. This Operational Pattern
normatively requires that “… the Essence Container shall provide for the continuous decoding of contiguous
essence elements with no processing. The Essence Container or essence element specifications may add extra
restrictions to this condition”.

This constraint is to ensure the continuous decodability of the Essence. It does not constrain changes in aspect
ratio, Active Format Descriptor, Colorimetry or any other parameter that can vary without resetting or crashing
an Essence Decoder. Changes of picture size, frame rate, Essence Coding Mode, discontinuities in timing
parameters and errored data are all examples of Essence Decodability conditions that would break the OP1a
requirement. It is important to note that even if the OP1a Essence Decodability conditions are met, the file must
still be wrapped and delivered in an appropriate fashion to be a streaming file.

8.2.2 How do I know what sort of Track or Package I’ve got?


Each Package in an MXF File has an array of Strong References to Tracks. Following these references will give
the track sets that describe the content for this package. A Material Package can be identified by its Key value
and will have no Essence Descriptors. Top-level File Packages and Lower-level Source Packages will have
Essence Descriptors. For mono-essence content, the Descriptor will be either a type of File Descriptor or a type
of Physical Descriptor; for other essence, it will be a Multiple Descriptor. The File or Physical Source Package
type can be determined as follows:
1. If there is one Descriptor and it is a File Descriptor then the package is a File Package.
2. If there is one Descriptor and it is a Physical Descriptor then the package is a Physical Package.
3. If there is a Multiple Descriptor and any of the Descriptors referenced by the Multiple Descriptor’s array are
Physical Descriptors then the package is a Physical Package.
4. If there is a Multiple Descriptor and all of the Descriptors referenced by the Multiple Descriptor’s array are
File Descriptors then the package is a File Package.
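The four rules above translate directly into code. In this Python sketch the descriptor kinds are represented by the plain strings "file", "physical" and "multiple"; that representation and the function name are assumptions made for the example.

```python
from typing import Sequence


def classify_source_package(descriptor_kind: str,
                            sub_kinds: Sequence[str] = ()) -> str:
    """Return "File Package" or "Physical Package" following rules 1 to 4.

    descriptor_kind is the kind of the package's Descriptor. For a Multiple
    Descriptor, sub_kinds lists the kinds of the Descriptors it references.
    """
    if descriptor_kind == "file":
        return "File Package"                               # rule 1
    if descriptor_kind == "physical":
        return "Physical Package"                           # rule 2
    if descriptor_kind == "multiple":
        if any(kind == "physical" for kind in sub_kinds):
            return "Physical Package"                       # rule 3
        if sub_kinds and all(kind == "file" for kind in sub_kinds):
            return "File Package"                           # rule 4
    raise ValueError("descriptor arrangement not covered by rules 1 to 4")
```

Note that rule 3 is checked before rule 4, so a single Physical Descriptor inside a Multiple Descriptor is enough to make the package a Physical Package.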
The Primary Package property of the Preface Set indicates which package is to be considered the Primary
Package. For an MXF player application, this is the package that should be played out by default. For an MXF
Ingest application, the Primary Package is the one that most accurately describes the Ingested material. By
default, this will be the Material Package of the file.

Now that the underlying Package type is known, the relationships between the packages can be determined.
The Material Package, or Material Packages, have Tracks that have Sequences that have SourceClips that refer
to Top-Level File Package tracks. Only these Top-Level File Packages are allowed to describe actual Essence.
The Top-Level File Packages have Tracks that have Sequences that have SourceClips that may reference
lower-level Source Packages. These lower-level Source Packages contain historical derivation information.
Lower-level Source Packages, whether File Packages or Physical Packages, will always describe essence that is
external to the MXF file.

Now that all the Packages are known, the Track types need to be identified. In MXF, all Tracks look the same
and it is not until the Sequence referenced by the Track is inspected that the Track type is known. Similarly, all
Sequence Sets look the same and it is not until the Data Definition Property value is resolved that the track type
can finally be worked out. The values of the ULs corresponding to the different Track types are given in the
SMPTE Labels Registry. There are different UL values for Picture, Sound and Data Tracks; this Data Definition
value should be consistent between the Sequences and SourceClips along a Track as well as those up and
down the Source Reference chain.

8.2.3 How Multi-Top-Level File Package files are arranged


As mentioned already in this Engineering Guideline, the logical and physical representations of a file are
essentially orthogonal. Any generalized Operational Pattern MXF file that is not OP1a will have multiple Top-
Level File Packages. The physical arrangement of the essence described by these Top-Level File Packages will
depend on the qualifier bits as well as the Operational Pattern.

The most obvious physical constraint is to make a file that is streamable (see 8.2.1). When there are multiple
Top-Level File Packages in the file, managing streaming buffers becomes slightly more complicated because of the
requirement that the essence for each Top-Level File Package must be in a partition with a unique BodySID
value. The management of the data in the Partition Packs and any Index Table segments must be done in such
a way that the receiver Essence buffers are still kept in a condition that prevents overflow and underflow.

8.2.3.1 Which Top-Level File Package goes with which Material Package track?

Each Material Package SourceClip has 2 properties that identify the appropriate Top-Level File Package:
SourcePackageID - a 32 byte Basic UMID
SourceTrackID - a 4 byte Uint32 Track Identifier


These identify, respectively, the Top-Level File Package and the Track within it. The referenced Top-Level File
Package Set will have a PackageUID property that is the same as the SourcePackageID property of the Material
Package SourceClip. This Top-Level File Package will have an InstanceUID that is in the batch of Strong
References to Packages in the ContentStorage Set (when the Top-Level File Package is stored within the file).
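Resolving a SourceClip reference is therefore a two-level lookup: first by PackageUID, then by TrackID. The Python sketch below assumes the packages have already been collected by following the strong references from the ContentStorage Set; the `Track` and `Package` classes and the function name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Track:
    track_id: int                 # TrackID (UInt32)
    name: str = ""


@dataclass
class Package:
    package_uid: bytes            # 32-byte Basic UMID
    tracks: List[Track] = field(default_factory=list)


def resolve_source_clip(packages: List[Package], source_package_id: bytes,
                        source_track_id: int) -> Optional[Track]:
    """Find the Track a SourceClip refers to, or None if it is not in this file."""
    for package in packages:
        if package.package_uid == source_package_id:
            for track in package.tracks:
                if track.track_id == source_track_id:
                    return track
    return None
```

A result of `None` is not necessarily an error: the referenced package may simply not be stored within the file.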

8.2.3.2 Which Partition of Essence goes with which Top-Level File Package?
The important parameter here is the BodySID value, which is found in one of the Essence Container Data sets.
Having identified the Top-Level File Package UMID, which was the same as the SourcePackageID in the
Material Package SourceClip, each of the Essence Container Data sets is searched until the Package UID is
found in the Linked Package UID property. This set will contain a BodySID value and an IndexSID value that are
used to identify the partitions that contain the Essence Data and Index Table data for this Top-Level File
Package. This BodySID value will be found in the BodySID property of the Partition Packs where Essence Data
can be found.
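The search through the Essence Container Data sets can be sketched in Python as below. The record type, its field names, and the representation of partitions as `(byte_offset, body_sid, index_sid)` tuples are assumptions made for the example.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class EssenceContainerData(NamedTuple):
    linked_package_uid: bytes     # UMID of the Top-Level File Package
    body_sid: int
    index_sid: int


def find_stream_ids(ecd_sets: Iterable[EssenceContainerData],
                    package_uid: bytes) -> Optional[Tuple[int, int]]:
    """Return (BodySID, IndexSID) for the given Top-Level File Package UMID."""
    for ecd in ecd_sets:
        if ecd.linked_package_uid == package_uid:
            return ecd.body_sid, ecd.index_sid
    return None


def partitions_with_essence(partitions: Iterable[Tuple[int, int, int]],
                            body_sid: int) -> List[int]:
    """Return byte offsets of the Partition Packs whose BodySID matches."""
    return [offset for offset, sid, _ in partitions if sid == body_sid]
```

The same filter, applied to the IndexSID field instead, gives the partitions that hold the Index Table Segments discussed in 8.2.3.3.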

8.2.3.3 Which Index Table goes with which Essence Container?


The IndexSID value found in the matched Essence Container Data set will be found in the IndexSID property of
the Partition Packs where Index Table Segments can be found. According to the partitioning rules and the Index
Table rules, there is a unique Index Table for each of the Top-Level File Packages. This unique Index Table will
contain segments that will only be found in partitions where the IndexSID has the correct value.

8.2.3.4 Which KLV wrapped Essence goes with which track?


This section is only relevant if the Essence Container has Interleaved Picture, Sound, Data or Systems
Elements. Each of the Interleaved Elements within the identified partition must be associated with a Track in
order for MXF to describe them. The Track Number property of the Track Set is used to identify the Essence
within the Essence Container.

For Essence Containers that use the MXF Generic Container, the Track Number property will match bytes 13-16
of the Key of wrapped Essence Data. Specific details of these 4 bytes can be found in the MXF Generic
Container specification as well as the individual Generic Container mapping documents.
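For the Generic Container case, the match can be expressed as a one-line extraction. This Python sketch assumes the 16-byte Key has already been read and treats bytes 13 to 16 (counting from 1) as a Big-Endian unsigned integer; the function name and the sample key used in the test are invented.

```python
def track_number_from_key(key: bytes) -> int:
    """Return bytes 13-16 of a 16-byte Essence Element Key as a UInt32."""
    if len(key) != 16:
        raise ValueError("a KLV key is 16 bytes long")
    # Bytes 13-16 counting from 1 are indices 12..15 counting from 0.
    return int.from_bytes(key[12:16], "big")
```

An Essence Element belongs to a Track when this value equals the Track Number property of that Track.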

8.2.3.5 Which part of the Top-Level File Package do I use?


The initial answer to this question seems easy – it’s the part referenced by the Material Package SourceClip.
Here are the steps taken to resolve the reference including some finer points of the specification that are
sometimes overlooked:
1. The Material Package SourceClip has a SourcePackageID (UMID) property that identifies the Top-Level File
Package.
2. The Material Package SourceClip has a SourceTrackID that identifies the TrackID of the track within the
Top-Level File Package that is to be used.
3. The Material Package SourceClip has a StartPosition property that determines the start point along the
Track in the Top-Level File Package.
4. The Material Package SourceClip has a Duration property that determines how long the Clip lasts.
Assuming that the Edit Rate of the Material Package Track is the same as that of the Top-Level File Package
Track, and that both Tracks have Origin values of 0, it is straightforward to determine which portion of the
essence to use.

If these assumptions are not valid, some math is required to determine the correct start point. In SMPTE 377M,
synchronization is discussed in section 8.4. The equation for synchronization is copied below:

Essence on tracks n and m are synchronized when:

$$\frac{\mathrm{Position}_{n}}{\mathrm{EditRate}_{n}} = \frac{\mathrm{Position}_{m}}{\mathrm{EditRate}_{m}}$$


In addition, a SourceClip’s StartPosition is measured in Edit Units of the Track containing the SourceClip, not of
the referenced Track. This means that when material is re-digitized or re-linked, there is no need to
re-normalize all the tracks that reference that material.

Now it should be clear that the desired Position along the referenced track (in Edit Units of the referenced track)
is given by the equation below, where the subscript fp denotes the File Package Track and mp the Material
Package Track:

$$\mathrm{Position}_{fp} = \mathrm{EditRate}_{fp} \times \left( \frac{\mathrm{Position}_{mp}}{\mathrm{EditRate}_{mp}} \right)$$

But this is not the end! The Origin Parameter for the File Package indicates how much stored essence exists
before the Position=0 point on the track. The final equation giving the start point along the stored essence
measured in File Package Edit Units is therefore given by the equation below:

$$\mathrm{Offset\_From\_Stored\_Essence\_Start}_{fp} = \mathrm{Position}_{fp} + \mathrm{Origin}_{fp} = \mathrm{EditRate}_{fp} \times \left( \frac{\mathrm{Position}_{mp}}{\mathrm{EditRate}_{mp}} \right) + \mathrm{Origin}_{fp}$$
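As a worked example of this calculation, the following Python sketch converts a Material Package position into an offset along the stored essence. Exact rational arithmetic is used so that rates such as 30000/1001 do not accumulate rounding error. The function name is hypothetical, and how a non-integer result should be rounded is not addressed here.

```python
from fractions import Fraction


def offset_from_stored_essence_start(position_mp: int,
                                     edit_rate_mp: Fraction,
                                     edit_rate_fp: Fraction,
                                     origin_fp: int = 0) -> Fraction:
    """Offset along the stored essence, in File Package Edit Units.

    position_mp   Position on the Material Package Track (its own Edit Units)
    edit_rate_mp  Edit Rate of the Material Package Track
    edit_rate_fp  Edit Rate of the File Package Track
    origin_fp     Origin of the File Package Track
    """
    position_fp = edit_rate_fp * Fraction(position_mp) / edit_rate_mp
    return position_fp + origin_fp


# A Material Package Track at 25 Edit Units per second referencing a
# File Package Sound Track at 48000 Edit Units per second.
print(offset_from_stored_essence_start(50, Fraction(25), Fraction(48000), 1000))
```

In this example a Position of 50 at an Edit Rate of 25 is two seconds, which is 96000 Edit Units at 48000; adding an Origin of 1000 gives an offset of 97000.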

8.2.4 Creating a file with multiple Top-Level File Packages


When a file with multiple Top-Level File Packages is not being streamed, there may be no constraints governing
the construction of the file. Under these circumstances, this Engineering Guideline recommends that each of the
different Essence Containers within the file is kept contiguous within the file – even when each Essence
Container is segmented into multiple Partitions.

The next question to be answered is “In which order should the Essence Containers appear in the file?”

If it is known that some of the Essence Containers are more likely to be changed than the others (for example
audio tracks that might be edited), then those Essence Containers should occur last in the file. The Essence
Container that is least likely to be changed should be placed first in the file.

If no knowledge of the likelihood of change is available to the MXF encoder then the Essence Containers should
be ordered so that the largest Essence Container appears first in the file. There are always going to be
circumstances when this rule is not optimal (e.g. when preview pictures are in the file), so implementers are
advised to think carefully about application requirements before committing to firm multiplexing rules.
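The ordering recommendation above might be expressed as the following sketch. The dictionary keys (`name`, `size`,
`change_likelihood`) are hypothetical and exist only for this illustration; a real encoder would apply its own
application-specific rules, as the text advises.

```python
def order_essence_containers(containers):
    """Return Essence Containers in the recommended file order.

    containers: list of dicts with 'name', 'size' (bytes) and an optional
    'change_likelihood' (0.0 = never changes, 1.0 = certain to change).
    """
    if containers and all(c.get("change_likelihood") is not None for c in containers):
        # Least likely to change first, most likely to change last.
        return sorted(containers, key=lambda c: c["change_likelihood"])
    # No knowledge of likely changes: largest Essence Container first.
    return sorted(containers, key=lambda c: c["size"], reverse=True)


tracks = [
    {"name": "audio", "size": 40_000_000, "change_likelihood": 0.8},
    {"name": "video", "size": 900_000_000, "change_likelihood": 0.1},
]
print([c["name"] for c in order_essence_containers(tracks)])  # prints ['video', 'audio']
```
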

8.2.5 Creating a file with Multiple Material Packages


In many ways, a file with multiple Material Packages is simpler than one with multiple Top-Level File Packages.
There is no extra essence to be added, only extra metadata to give a choice of different timelines using the
content within the file. A few simple examples of this may be:
• OP1c – single Picture track with a choice of different language Sound Tracks
• OP1c – single Picture track with a choice of stereo / multi-channel Sound Tracks
• OP1c – choice of lo-res preview Pictures with mono sound or hi-res Pictures with multi-channel Sound.
• OP2c – feature material with a choice of languages on the Sound Tracks and selectable language
specific Picture clips at the start of the feature material
• OP3c – feature material that has selectable clips (or reels, or whatever terminology is used) within the
feature for localization of the feature.
In general, the arrangement of the essence within the file should follow the same rules as a file in rows a or b of
the Operational Pattern axes matrix. If a file is marked as streamable, then this means that each and every
Material Package is streamable. If a file is marked as having internal essence, this means that all the essence
for all the file packages is internal. The essence described by the Top-Level File Packages must follow the
guidelines in 8.2.4 above and any Interleaving Guidelines (e.g. streaming guidelines in section 8.2.1) that exist
for the essence type being used.

The question of “which Material Package do I use” is application-specific, but in general the
Package whose Instance UID value appears in the Preface Pack’s Primary Package property should be the one
chosen if no additional information is available.

8.2.6 Achieving Robustness for File Recovery and Partial Restore


One of the design requirements of MXF was to accommodate Partial Restore and provide file transfer
robustness. The design feature to implement both of these applications is the use of Partitions. It has already
been noted in this document that a Partition Pack may be inserted at the beginning, end or anywhere in the
middle of the file. It is these Body Partitions in the middle of the file and the use of the Random Index Pack that
allow file recovery and partial restore.

8.2.6.1 File Recovery


This application can be split roughly into 2 different scenarios:
1. A push-mode file transfer was interrupted or joined after the start
2. A stored file needs checking for consistency
In both of these cases, Partition Packs need to be inserted regularly and frequently enough for the physical
parameters to allow recovery without the loss of too much data. How much is too much? Well, that is a highly
application-specific question and may be as small as a Frame, or as big as the entire file. For this reason, an
MXF encoder targeted at this sort of application must be designed with an awareness of the data loss that could
arise from the Partition spacing that is used.

The Partition Pack has two properties that should be consistent throughout the file:

ThisPartition: The offset to the start of this partition in the sequence of partitions (as a byte count relative
to the start of the Header Partition).

PreviousPartition: The offset to the start of the previous partition in the sequence of partitions (as a byte
count relative to the start of the Header Partition).

In addition, the start of an MXF file is identified by the first 11 bytes of the Key of the Partition Pack.

It should now be possible to see that a push-mode transfer may be joined halfway through the stream by
detecting the first 11 bytes of a Partition Pack. If this is a valid Partition Pack then the remaining byte of the key
will match a known Partition Pack, and the values within the Pack will contain valid values. The very first
partition of the file is always the Header Partition and will have a “ThisPartition” value of 0. If a push-mode
transfer is joined and “ThisPartition” is non-zero then the number of missed bytes can be determined.

The PreviousPartition value can be used as a rough measure of the rate of insertion of partitions (assuming that
there is some consistency to the partitioning strategy used by the MXF Encoder). It should also be noted that
although the first 11 bytes of the Partition Pack key is quite a long byte sequence, it is not necessarily sufficiently
unique to never occur in the essence of a file. For this reason, a more robust decoder may wait until the second
Partition Pack header is received and check that:

$$PreviousPartition_{n} = ThisPartition_{n-1}$$


Checking a stored file for consistency now involves counting bytes within a file and verifying that all the
ThisPartition and PreviousPartition properties are correct.
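A stored-file consistency check of this kind could look like the sketch below. It assumes, following the property
definitions given earlier in this section, that both ThisPartition and PreviousPartition are byte offsets relative
to the start of the Header Partition, and that the Header Partition starts at byte 0 of the file. The tuple layout
is hypothetical; a real tool would first parse each Partition Pack to obtain these values.

```python
def check_partition_chain(partitions):
    """Return a list of error strings; an empty list means the chain is consistent.

    partitions: list of (actual_offset, this_partition, previous_partition)
    tuples in file order, where actual_offset is the byte position at which
    the Partition Pack key was really found.
    """
    errors = []
    for n, (actual_offset, this_partition, previous_partition) in enumerate(partitions):
        # ThisPartition must equal the counted byte offset of the pack itself.
        if this_partition != actual_offset:
            errors.append(f"partition {n}: ThisPartition={this_partition}, found at {actual_offset}")
        # PreviousPartition must point at the pack found immediately before.
        if n > 0 and previous_partition != partitions[n - 1][0]:
            errors.append(f"partition {n}: PreviousPartition={previous_partition}, "
                          f"previous pack found at {partitions[n - 1][0]}")
    return errors


print(check_partition_chain([(0, 0, 0), (1000, 1000, 0), (5000, 5000, 1000)]))  # prints []
```
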

8.2.6.2 Partial Restore


This application is subtly different from the one above. The application needs to extract a recoverable portion of
the (possibly damaged) original file and present it as a new MXF file.

Files that act as the Master for this Operation should be constructed with regular Body Partitions, a Random
Index Pack (RIP) and Index Tables. Ideally a complete Index Table for each and every Essence Container will
exist both in the Header and in the Footer of the File.

The portion of the file to be extracted will most often be expressed in terms of time along the file. This example
will only consider the case of an Operational Pattern 1a file. In the higher Operational Patterns, extra work must
be undertaken to ensure that the correct portions of each and every referenced Top-Level File package are
extracted. The complexity of the Index Table handling will also increase because there is one Index Table per
Essence Container that may be segmented. Each Essence Container must be handled separately with the RIP
being used to identify the start of each partition.

In any MXF file, a RIP can be detected by reading the last 4 bytes (32 bits) of the MXF File and using this as a UInt32
backwards offset from the end of the file to the start of the RIP (precise details are in SMPTE 377M). If the RIP
is present then the offset will point to the first byte of the KLV key of the RIP. The RIP can now be read and the
start point of each of the partitions in the file can be determined. In an OP1a file, this data is less critical than in a
higher Operational Pattern file where the Partitions will also be used to separate the different Essence
Containers. In OP1a files, there is only one Essence Container and therefore only one Index Table. An Index
Table Segment can now be located by finding a partition with the correct IndexSID value in the Partition Pack.
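Locating a candidate RIP might be sketched as follows. The sketch assumes that the trailing UInt32 is big-endian and
counts the whole RIP, including its own key, length and trailing length field; these details should be checked
against SMPTE 377M. The function name is hypothetical, and the caller is still expected to confirm that a RIP key
is really present at the returned offset.

```python
import struct


def rip_start_offset(file_bytes):
    """Return the byte offset at which a RIP would start, or None if implausible."""
    if len(file_bytes) < 4:
        return None
    # The final UInt32 is read as a backwards offset from the end of the file.
    (rip_length,) = struct.unpack(">I", file_bytes[-4:])
    start = len(file_bytes) - rip_length
    if rip_length < 4 or start < 0:
        return None  # The value cannot describe a pack inside this file.
    return start


# Synthetic data: 100 bytes of "file", then a 24-byte stand-in for a RIP.
fake_file = b"\x00" * 100 + b"R" * 20 + struct.pack(">I", 24)
print(rip_start_offset(fake_file))  # prints 100
```
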

Now that the Index Table has been found, the byte Offsets within the Essence Stream can be found by an Index
Table look-up. If the partial file extraction is to be done with a minimum of processing then all the partitions from
the one containing the first byte up to and including the last partition containing the last byte can be extracted.
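The minimum-processing selection described above can be sketched with a binary search. The input list is an
assumption made for this example: each entry is the Essence Container stream offset of the first essence byte held
in a partition (information of the kind carried by the BodyOffset property of a Partition Pack), in file order.

```python
from bisect import bisect_right


def partitions_to_extract(essence_starts, first_byte, last_byte):
    """Return (first_index, last_index), inclusive, of the partitions to copy.

    essence_starts: sorted list; essence_starts[i] is the stream offset of the
    first essence byte stored in partition i.
    first_byte, last_byte: stream offsets obtained from the Index Table look-up.
    """
    if not essence_starts or last_byte < first_byte:
        raise ValueError("need at least one partition and a non-empty byte range")
    # The partition containing a byte is the last one starting at or before it.
    first_index = max(bisect_right(essence_starts, first_byte) - 1, 0)
    last_index = max(bisect_right(essence_starts, last_byte) - 1, 0)
    return first_index, last_index


print(partitions_to_extract([0, 1000, 2000, 3000], 1500, 2500))  # prints (1, 2)
```
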

It is strongly recommended that after this extraction process has been done, the partition header data be
processed to correct the MXF file:
• The “ThisPartition” and “PreviousPartition” values in each partition header should be corrected
• Index Tables should be created that are consistent with the new partial file
• The UMIDs should be updated to show that this is not the same as the original material. (A combination
of SMPTE RP205 and Operational Practice will determine the exact UMID modification required)
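The first correction in the list above amounts to recounting bytes in the new file. A minimal sketch, assuming that
the first retained partition becomes the Header Partition of the new file and that the size of every retained
partition is already known, is given below. Rebuilding Index Tables and modifying UMIDs are not covered here.

```python
def renumber_partitions(partition_sizes):
    """Return a list of (ThisPartition, PreviousPartition) values for the new file.

    partition_sizes: byte size of each partition, in the order in which the
    partitions will be written to the extracted file.
    """
    values = []
    offset = 0      # Byte offset of the current partition from the new file start.
    previous = 0    # Byte offset of the partition written just before it.
    for size in partition_sizes:
        values.append((offset, previous))
        previous = offset
        offset += size
    return values


print(renumber_partitions([1000, 4000, 2500]))  # prints [(0, 0), (1000, 0), (5000, 1000)]
```
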

8.2.7 Setting the Operational Pattern Qualifier Bits


The MXF Operational Pattern has 3 qualifier bits that provide global information about the internal arrangement
of the data within the file. This section is intended to clarify how these bits should be set and to explain some of
the pathological cases that may not otherwise be clear.

8.2.7.1 Bit 1: Internal / External Essence


At first glance, this seems obvious – either the content is internal or it isn’t. MXF allows referencing of external
Essence Containers via Locators in the Top-Level File Package. However, Locators are allowed to be present
even when there are Essence Containers internal to the file. This implies that Bit 1 should be zero only if all
Top-Level File Packages in the file have matching Essence Container Data sets and Essence Containers in this
file.

Are Locators the only way of finding external essence? No. If a Material Package references a File Package
that is simply not present in the File, then this is a valid external reference. In this case Bit 1 would have to be
set. Finding the Essence is rather more difficult – an external Media Asset Management system needs to be
used in order to resolve the UMID and find the content.
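The decision for Bit 1 described above can be summarized in a short sketch. The inputs are hypothetical
simplifications: sets of UMIDs (shown here as plain strings) and a mapping from File Package UMID to the BodySID
found in its Essence Container Data set. The function only answers whether the "external" condition applies; it
does not encode the bit itself.

```python
def has_external_essence(referenced_umids, file_package_umids, body_sid_by_umid):
    """Return True if any essence used by the file must be found outside it.

    referenced_umids: SourcePackageIDs used by the Material Package SourceClips.
    file_package_umids: UMIDs of the Top-Level File Packages present in the file.
    body_sid_by_umid: BodySID from each Essence Container Data set, keyed by UMID.
    """
    # A reference to a File Package that is simply not present is external.
    if any(umid not in file_package_umids for umid in referenced_umids):
        return True
    # A File Package without a matching internal Essence Container is external.
    return any(body_sid_by_umid.get(umid, 0) == 0 for umid in file_package_umids)


print(has_external_essence({"XX"}, {"XX"}, {"XX": 0}))  # prints True
```
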

The next 3 figures attempt to show 3 different conditions that could result in external essence. Figure 16 shows
linkage using only UMID as the linking mechanism. The Material Package contains a SourceClip with a
SourcePackageID (UMID) that is not in the file. This can be determined by inspection of all the Top-Level File
Packages and optionally by the presence of an Essence Container Data Set with a BodySID of 0. Some external
mechanism (such as an asset management system) is required to resolve this UMID to a filename that can be
inspected for a UMID match as shown in the lower part of the diagram.

[Figure: two files linked by UMID. The file being inspected holds a Partition Pack and Header Metadata containing an
Essence Container Data set (UMID = XX, BodySID = 0) and a Material Package whose Picture Track, Picture Sequence and
Picture SourceClip (SourcePackageID = XX, SourceTrackID, Start Position, Duration) link by SourcePackageID. The
External MXF Essence File holds a Partition Pack (BodySID = x, IndexSID = y), Header Metadata containing a File
Package (UMID = XX, Picture Track, Picture Sequence) and an Essence Container Data set (UMID = XX, BodySID = x,
IndexSID = y), an Index Table Segment (IndexSID = y) and the Essence Container (BodySID = x given in the Partition
Pack).]

Figure 16 : External Essence example using only UMID for linking

Locators provide a mechanism for discovering the location of external essence using only information within the
file. The advantage is that no external mechanism is required; the disadvantage is that when the external file is
moved, the locators should be updated. Figure 17 shows a similar example to the one above, although this time
the Material Package contains a SourceClip with a SourcePackageID (UMID) that appears to be in the file. Why
“appears to be”? Because a File Package exists in the file with the correct UMID, but the Essence Container
Data set indicates the BodySID value is 0. There are, however, two network locators and a text locator. The first
of these locators is resolved to the file in the lower half of the figure. The locator identifies non-MXF essence
and because of this, it may be difficult for an application to check the UMID for correctness.

[Figure: the file being inspected holds a Partition Pack and Header Metadata containing an Essence Container Data
set (UMID = XX, BodySID = 0, which identifies the essence as external), a Material Package (Picture Track, Picture
Sequence, Picture SourceClip with SourcePackageID = XX, SourceTrackID, Start Position, Duration) and a File Package
(UMID = XX, Picture Track, Picture Sequence, Essence Descriptors). The File Package Locators are: Network Locator
z:/src/clip.mpg, Network Locator //archive/sept/clip.mpg and Text Locator ”DVD-R #14326”. The External Essence File
is z:/src/clip.mpg.]

Figure 17 : External Essence example using locators for linking

Figure 18 shows an example where the external essence is an MXF File. As in the examples above, a Material
Package references a File Package that appears to be in the File. The Essence Container Data set indicates
that the essence is external because the BodySID is 0 and, equally obviously, because there is no essence in the file.
The locator resolves to an MXF File, and this time checks can be made to determine that the target of the
reference is correct. The Top-Level File Package of the target file will have the same values as the Top-Level
File Package in the first file. If the UMIDs match, the target file has been found. If not, the rest of the Locators
should be inspected as above.

The Top-Level File Package in the external file should be identical to that in the original file. If there are any
discrepancies, then the metadata values in the external file should take precedence. The Top-Level File
Package in the original file should be regarded as a copy.
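The locator search described for the MXF-to-MXF case might be sketched as shown below. The `read_umid` callable is a
hypothetical helper supplied by the application: it is assumed to open the target of a locator and return the UMID
of its Top-Level File Package, or None when the target is unreachable or cannot be parsed as MXF.

```python
def resolve_external_essence(locators, expected_umid, read_umid):
    """Return the first locator whose target carries the expected UMID, else None."""
    for locator in locators:
        found_umid = read_umid(locator)
        if found_umid == expected_umid:
            return locator  # The target file has been found.
        # Otherwise carry on and inspect the rest of the Locators.
    return None


# Hypothetical stand-in for real file access, keyed by the locators of Figure 18.
reachable = {"//archive/oct/clip.mxf": "XX"}
print(resolve_external_essence(
    ["z:/tmp/clip.mxf", "//archive/oct/clip.mxf"], "XX", reachable.get))
```

In this usage example the first locator cannot be read, so the search falls through to the second one.
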

[Figure: the file being inspected holds a Partition Pack and Header Metadata containing an Essence Container Data
set (UMID = XX, BodySID = 0, which identifies the essence as external), a Material Package (Picture Track, Picture
Sequence, Picture SourceClip with SourcePackageID = XX, SourceTrackID, Start Position, Duration) and a File Package
(copy) (UMID = XX, Picture Track, Picture Sequence, Essence Descriptors). The File Package Locators are: Network
Locator z:/tmp/clip.mxf, Network Locator //archive/oct/clip.mxf and Text Locator ”LTO #26”. The External MXF
Essence File z:/tmp/clip.mxf holds a Partition Pack (BodySID = x, IndexSID = y), Header Metadata containing a File
Package (UMID = XX, Picture Track, Picture Sequence) and an Essence Container Data set (UMID = XX, BodySID = x,
IndexSID = y), an Index Table Segment (IndexSID = y) and the Essence Container (BodySID = x given in the Partition
Pack).]

Figure 18 : External Essence example using locators and UMIDs for linking

8.2.7.2 Bit 2: Stream File / Non-Stream File


The best description of this is found in MXF Format specification 9.2: "The Essence Containers used in
streaming Operational Patterns must be capable of interleave over a defined interleaving period or must be
capable of being multiplexed in an MXF file using the partition mechanism. The interleave / multiplex duration is
dependent upon the application, but should be the period of the minimum duration of usable picture essence,
typically a picture frame period."

Reading the paragraph above 2 or 3 times, it seems clear. One possibly ambiguous case is where a file
contains only a single Essence Container that is intrinsically streamable but is clip-wrapped, either in a Generic
Container or in its own native container. In this case, Bit 2 should be set to “Wire File” because the resulting file
is still streamable according to the definition above. The interleave duration is set by the intrinsic streamability of
the underlying essence and there is no partitioning (i.e. multiplexing). The application has determined, therefore,
that the Multiplex duration is equal to the length of the file.

Some cases of streamable status are clear and unambiguous. However, other cases can be subjective. The
following illustrate some possible cases of streamable files (all assuming that the essence is, itself, streamable):
1. A single, frame-wrapped, EC with a single essence element (e.g. OP1a with B-Wav essence).
2. A single, frame-wrapped, EC with multiple interleaved essence elements (e.g. OP1a with Type D-10
mapping).
3. Multiple, frame-wrapped, ECs where the ECs are in presentation sequence and in contiguous partitions
(e.g. OP2a with Type D-11 mapping).
4. A single, clip-wrapped, EC with a single essence element (e.g. OP1a with MPEG-2 long-GOP video ES).
5. A single, clip-wrapped, EC with an inherently interleaved essence stream (e.g. OP1a with a DV DIF stream).
6. Multiple, clip-wrapped, essence elements, each in separate ECs, which are multiplexed over clips of short
duration (say <1sec) (e.g. OP2b with MPEG-2 I-frame video ES multiplexed with B-Wav audio).

8.2.7.3 Bit 3: Uni-Track / Multi-Track


At first glance, this too seems straightforward – there is either one track or more than one track. The definition of
this bit refers to the Essence Container, not to the File Package. This bit is intended to give the intention of the
Essence Container, for example a stereo AES file should be signaled as a Single Track because the left and
right channels are treated together. A DV Essence Container in which only the video is used should be flagged
as Uni-track because that is the intention of the Essence Container.

Note that the goal of this flag is to describe a Uni-Track file, i.e. an OP2a file could be uni-track because it could
be constructed to have only one active track in the output timeline. OP1b could not be uni-track because there
will always be two or more synchronized tracks active on the output timeline.

8.2.7.4 Higher Operational Patterns


In Operational Patterns higher than OP1a, care needs to be taken when setting these qualifier bits. An example
is an OP1b that is split into 2 parts (MP=Material Package, FP= Top-Level File Package):
Part1 = MP + FP pictures (internal) + FP sound (external)
Part2 = MP + FP pictures (external) + FP sound (internal)
The following conditions are true and would result in setting of qualifier bits:
• Each file is uni-track so bit 3 is set to Uni-Track
• Both were constructed to be streamable (as a result of the way they were created) so bit 2 set to “Wire
file”
• Each file has external essence so bit 1 should be set to “external”
If the Primary Package property is set to be the internal Top-Level File Package of the file, then it is possible that
the file may be played, even though it is flagged as having external essence.

8.3 Index Tables

MXF Index Tables are intended to be versatile, compression agnostic, streamable and applicable to any and all
of the MXF Operational Patterns defined in SMPTE 377M. The purpose of an Index Table is to convert from
time offsets to byte offsets within a file. The MXF Index Table specification may at first seem rather complex,
but its resulting versatility gives huge functionality to random access systems:
• Cameras and streaming devices can create Segmented Index Tables on the fly
• Storage devices may have Index Tables at the start, end or both
• Index Tables are created for each Essence Container. Multiplexing Essence Containers or changing the
partitioning of a file does not change the Index Table
The Index Table structure for an Essence Container is defined by the “Delta Entries”. There is one Delta Entry
for each of the Interleaved Elements of the Essence Container. These Delta Entries allow an Essence Element
to be categorized as either CBE (Constant Bytes per Element) or VBE (Variable Bytes per Element). MXF
Encoders should always “play it safe” if there is any uncertainty that an Element is CBE. Each and every
Element in the entire file should have the CBE byte count – if this is not true then each Index Table must use the
slice mechanism to indicate a VBE stream.

8.3.1 Using Index Table Delta Entries


Figure 19 below shows the physical and logical representations of a Content Package for use in this Index Table
example. The physical Content Package shows an Edit Unit n with 5 elements and some Fill at the end of the
Content Package. The logical arrangement of these Elements is shown on the right hand side of the picture.

In this example, there are 3 separate Essence Tracks that need to be indexed. The Data and Sound Elements
are all CBE, but the Interleaving Rules used for this Essence Container lead to a variable number of Sound
Elements per Content Package. In the Content Package shown in Figure 19 there are two Sound Elements.
This results in a VBE Sound Stream for the purposes of Indexing because the number of bytes of Sound data
for a given Edit Unit is not constant.

The MXF Index Table Delta Entries are intended to allow identification of each of the Indexed Elements, and to
indicate whether they are CBE or VBE Elements. We will partially fill in the DeltaEntry Array here with the CBE
values we know. The VBE values will be filled in once slices have been introduced.

In the table below the expression BCSystem indicates the byte count for the System Element

Table 4

| Delta Entry | Item Name | Meaning | Value | Why |
|---|---|---|---|---|
| | NDE | Number of delta entries | 5 | There are 5 delta entries as shown below |
| | Length | Length of each delta entry | 6 | Each one is 6 bytes long |
| 0 (System) | PosTableIndex | Temporal Reordering / Index into PosTable | | |
| 0 (System) | Slice | Slice number in IndexEntry | 0 | It’s the start of the Index Table – slice 0 |
| 0 (System) | Element Delta | Delta from start of slice to this Element | 0 | It’s the first entry – offset 0 bytes from the start of this Indexed Edit Unit |
| 1 (Data) | PosTableIndex | Temporal Reordering / Index into PosTable | | |
| 1 (Data) | Slice | Slice number in IndexEntry | 0 | The previous element was CBE, so this is still slice 0 |
| 1 (Data) | Element Delta | Delta from start of slice to this Element | BCSystem | This element starts at the end of the System Element, so this value is the byte count of the System Element |
| 2 (Picture) | PosTableIndex | Temporal Reordering / Index into PosTable | | |
| 2 (Picture) | Slice | Slice number in IndexEntry | | |
| 2 (Picture) | Element Delta | Delta from start of slice to this Element | | |
| 3 (Sound) | PosTableIndex | Temporal Reordering / Index into PosTable | | |
| 3 (Sound) | Slice | Slice number in IndexEntry | | |
| 3 (Sound) | Element Delta | Delta from start of slice to this Element | | |
| 4 (Fill) | PosTableIndex | Temporal Reordering / Index into PosTable | | |
| 4 (Fill) | Slice | Slice number in IndexEntry | | |
| 4 (Fill) | Element Delta | Delta from start of slice to this Element | | |

8.3.2 Using Index Table Slices


If every entry of an Index Table were CBE, the slice mechanism would not be needed. The size of each Element
would be known and the DeltaEntry Array would be filled in as above. In our example, we have 3 variable length
Elements – the Picture, Sound and Fill. The Index Table is “sliced” so that a VBE Element is the last Element in
a slice. In our example Slice 0 is terminated by the Picture Element, Slice 1 is terminated by the Sound Element
and the final slice is terminated by the Fill element. We can now fill in the slice information in the table:

Table 5

| Delta Entry | Item Name | Meaning | Value | Why |
|---|---|---|---|---|
| | NDE | Number of delta entries | 5 | There are 5 delta entries as shown below |
| | Length | Length of each delta entry | 6 | Each one is 6 bytes long |
| 0 (System) | PosTableIndex | Temporal Reordering / Index into PosTable | | |
| 0 (System) | Slice | Slice number in IndexEntry | 0 | It’s the start of the Index Table – slice 0 |
| 0 (System) | Element Delta | Delta from start of slice to this Element | 0 | It’s the first entry – offset 0 bytes from the start of this Indexed Edit Unit |
| 1 (Data) | PosTableIndex | Temporal Reordering / Index into PosTable | | |
| 1 (Data) | Slice | Slice number in IndexEntry | 0 | The previous element was CBE, so this is still slice 0 |
| 1 (Data) | Element Delta | Delta from start of slice to this Element | Sizeof(System) | This element starts at the end of the System Element, so this value is the byte count of the System Element |
| 2 (Picture) | PosTableIndex | Temporal Reordering / Index into PosTable | | |
| 2 (Picture) | Slice | Slice number in IndexEntry | 0 | This is the Element that terminates slice 0 |
| 2 (Picture) | Element Delta | Delta from start of slice to this Element | BCSystem + BCData | The offset to the start of the Picture item is the byte count of the System Element + the byte count of the Data Element |
| 3 (Sound) | PosTableIndex | Temporal Reordering / Index into PosTable | | |
| 3 (Sound) | Slice | Slice number in IndexEntry | 1 | This is the Element that terminates slice 1 |
| 3 (Sound) | Element Delta | Delta from start of slice to this Element | 0 | It is also the first element in slice 1 |
| 4 (Fill) | PosTableIndex | Temporal Reordering / Index into PosTable | | |
| 4 (Fill) | Slice | Slice number in IndexEntry | 2 | This is the Element that terminates slice 2 |
| 4 (Fill) | Element Delta | Delta from start of slice to this Element | 0 | It is also the first element in slice 2 |

The use of the PosTableIndex field is given in 8.3.5 below.
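The slicing rule described in 8.3.2 (a VBE Element is always the last Element in its slice) can be sketched as a
small function that derives the Slice and Element Delta values. The function name is hypothetical, as are the byte
counts of 57 and 700 bytes used for the System and Data Elements; PosTableIndex is not handled here.

```python
def build_delta_entries(elements):
    """Return a list of (name, slice_number, element_delta) tuples.

    elements: list of (name, byte_count) in stored order, where byte_count is
    the constant size of a CBE Element, or None for a VBE Element.
    """
    entries = []
    slice_number = 0
    delta = 0  # Bytes from the start of the current slice to this Element.
    for name, byte_count in elements:
        entries.append((name, slice_number, delta))
        if byte_count is None:
            # A VBE Element terminates its slice; the next Element starts a new one.
            slice_number += 1
            delta = 0
        else:
            delta += byte_count
    return entries


content_package = [("System", 57), ("Data", 700), ("Picture", None), ("Sound", None), ("Fill", None)]
for entry in build_delta_entries(content_package):
    print(entry)
```

With these inputs the Slice column comes out as 0, 0, 0, 1, 2, the same pattern as in the worked example.
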

8.3.3 Indexing frame-wrapped and clip-wrapped Essence Containers


In Frame wrapping mode the tables index the first byte of the Key that wraps the indexed frame. This will most
likely be the first byte of the Picture Element Key or Sound Element Key in the appropriate GC Mapping
Specification. This means that each Element Delta value includes the lengths of the "KL" for each Element.

If the overall length of all the Elements in each frame is constant, then a Delta Entry Array and an "Edit Unit Byte
Count” Item are sufficient to define the Index Table Segment.

In Clip wrapping mode the tables index the first byte of the data for each indexed frame. For example, in the
MPEG Long GOP case, this will be the first byte of the start_code for the appropriate access unit. This means
that each Element Delta value gives precisely the length of the data for each Frame.

If the overall length of all the data for each and every indexed stream is constant for all frames, then a Delta
Entry Array and an "Edit Unit Byte Count” Item are sufficient to define the Index Table Segment.
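When every Edit Unit has the same byte count, the look-up reduces to a multiplication. The sketch below assumes a
single Index Table Segment that covers the whole Essence Container starting at position 0, and uses a hypothetical
Edit Unit Byte Count of 144020 bytes (a 144000-byte frame plus an assumed 16-byte Key and 4-byte Length).

```python
def cbe_stream_offset(position, edit_unit_byte_count):
    """Return the byte offset of an Edit Unit within a constant-size Essence Container.

    position: Edit Unit number, counted from 0 at the start of the container.
    edit_unit_byte_count: the "Edit Unit Byte Count" Item; must be non-zero.
    """
    if edit_unit_byte_count <= 0:
        raise ValueError("an Edit Unit Byte Count of 0 means an Index Entry Array is needed")
    if position < 0:
        raise ValueError("position must not be negative")
    return position * edit_unit_byte_count


print(cbe_stream_offset(10, 144020))  # prints 1440200
```
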

8.3.4 Fixed sized essence


This simple example comes from the Generic Container DV mapping document.

This is a simple case where the Index Table points to the first byte of the DV-DIF Compound Element Generic
Container Key. There are no other Generic Container items and any Sound or Data Information is embedded
within the DV-DIF container. Therefore, the only pieces of information in the Index Table segment are the Start
Position, Duration and (fixed) size of each DV-DIF Compound Item KLV triplet.

Table 6 : Frame Wrapped Index Table Segment Set

| Item Name | Req ? | Meaning | Use |
|---|---|---|---|
| Index Table Segment | Req | An Index Table Segment set | See MXF Format Specification |
| Length | Req | Set Length | See MXF Format Specification |
| Instance ID | Req | Unique ID of this instance | See MXF Format Specification |
| Index Edit Rate | Req | Edit Rate copied from the tracks of the Essence Container | See MXF Format Specification |
| Index Start Position | Req | The first editable unit indexed by this Index Table segment measured in File Package Edit Units | Set to the position value of the first edit unit indexed by this Index Table segment. |
| Index Duration | Req | Time duration of this table segment measured in Edit Units of the referenced Package | May be set to zero to indicate that this Index Table Segment applies to all Edit Units in this Essence Container |
| Edit Unit Byte Count | D/Req | Defines the byte count of each and every Edit Unit. A value of 0 indicates that the byte count of Edit Units is given only in the Index Entry Array | Set to the number of bytes in every KLV, including the length of the Key and Length. The Index Table can be used to find the first byte of the KLV of every DV-DIF frame |
| IndexSID | D/Req | Stream Identifier (SID) of Index Table | See MXF Format Specification |
| BodySID | Req | Stream Identifier (SID) of the indexed Essence Container | See MXF Format Specification |

8.3.5 Variable sized essence


This worked example comes from the MPEG mapping document. It is intended to show the construction of an
Index Table for a Frame Wrapped interleave of sound, Data and Picture Elements where it is important to
preserve the synchronization of the Picture, Sound and Data to an accuracy of better than a frame.

This is a case where the Index Table points to the first byte of the MPEG Picture Element Generic Container
Key. The other Generic Container Elements should be indexed by correct use of the Delta Entries and Index
Entries. This example assumes that the Sound Elements that are Indexed require the use of the fractional
Position mechanism defined in SMPTE 377M. The following figure represents a typical Content Package being
indexed. This figure is based on a figure in SMPTE 377M.

[Figure: two panels. Left panel, "Physical layout of bytes": Edit Unit ‘n’ (Duration = 1 edit unit), located by
Index Entry n, contains in order a Data element (CBE, Delta Entry 2), a Picture Element (VBE, Delta Entry 3), Sound
Element 1 and Sound Element 2 (VBE, Delta Entry 4) and fill (VBE, Delta Entry 5). The Slice 1 and Slice 2 start
points in Index Entry ‘n’ are marked after the Picture Element and after the Sound Elements. Right panel, "Temporal
positioning of Elements": Picture Element, Sound Element 1, Sound Element 2 and Data Element 1 are shown with the
Sound Start Position offset, the Data Start Position offset and the synchronised Sound sample within the Sound
Frame (Position = PCP).]

Figure 19 : Content Package for Index Table Example

In this example, the Picture, Sound and Fill are all VBE. The Fill is indexed so that it can be eliminated from any
Essence Byte Counting based solely on calculations in the Index Table.

Table 7 : Frame Wrapped Index Table Segment Set example

| Item Name | Req ? | Meaning | Use |
|---|---|---|---|
| Index Table Segment | Req | An Index Table Segment set | See MXF Format Specification |
| Length | Req | Set Length | See MXF Format Specification |
| Instance ID | Req | Unique ID of this instance | See MXF Format Specification |
| Edit Rate | Req | Edit Rate copied from the tracks of the Essence Container | See MXF Format Specification |
| Start Position | Req | The first editable unit indexed by this Index Table segment measured in File Package Edit Units | Calculate |
| Duration | Req | Time duration of this table segment measured in Edit Units of the referenced Package | Calculate |
| Edit Unit Byte Count | D/Req | Defines the byte count of each and every Edit Unit. A value of 0 indicates that the byte count of Edit Units is given only in the Index Entry Array | 0 unless the total length of all the GC Elements is constant. In this example only the Data Elements are of constant size, so the value is 0. |
| IndexSID | D/Req | Stream Identifier (SID) of Index Table | See MXF Format Specification |
| BodySID | Req | Stream Identifier (SID) of the indexed Essence Container | See MXF Format Specification |
| Slice Count | D/Req | Number of slices minus 1 (NSL) | 2 |
| PosTableCount | Opt | Number of PosTable Entries minus 1 (NPE) | 1 |
| Delta Entry Array | Opt | Map Elements onto Slices | Table 8 |
| Index Entry Array | D/Req | Index from Edit Unit number to stream offset | Table 9 |

The Delta Entry Array contains an entry for every indexed Element in the Generic Container. The order of the
elements in the Delta Entry Array matches the order of the Elements in the Generic Container. The Example
below is a Delta Entry Array designed to match the example in Figure 19. Implementations should construct a
Delta Entry Array according to the properties of the actual Essence in the file.

In this example, we have several variable length Elements, so an Index Entry Array is required.

Note also that the Delta Entry Array does not distinguish which element is which in the Index Table. To know
which element is indexed, the following rules apply when an MPEG Long GOP stream is indexed:
• Each Content Package starts with the same number and order of Elements as the previous Content
Package
• If new elements are introduced for whatever reason, they are appended to the end of the existing
Content Package elements
• If elements in the Content Package have no data, then an IndexEntry for a zero length VBE element is
created
• Index tables have the same number of delta entries as the maximum number of elements in any
Content Package
• The essence type of an index entry can be determined by inspecting the key that wraps the indexed
essence.

Table 8 : Frame Wrapped Delta Entry Array example for Figure 19

Delta Entry | Field Name | Type | Meaning | Use
(array header) | NDE | UInt32 | Number of delta entries | 4
(array header) | Length | UInt32 | Length of each delta entry | 6
Data | PosTableIndex | Int8 | 0 = no reordering; +ve = PosTable Index | 1 (1st PosTable Entry)
Data | Slice | UInt8 | Slice number in IndexEntry | 0
Data | Element Delta | UInt32 | Delta from start of slice to this Element | 0
Picture | PosTableIndex | Int8 | -ve = reordered | -1 (reordered Long GOP content)
Picture | Slice | UInt8 | Slice number in IndexEntry | 0
Picture | Element Delta | UInt32 | Delta from start of slice to this Element | sizeof(KL) + sizeof(Data)
Sound | PosTableIndex | Int8 | 0 = no reordering; +ve = PosTable Index | 2 (2nd PosTable Entry)
Sound | Slice | UInt8 | Slice number in IndexEntry | 1
Sound | Element Delta | UInt32 | Delta from start of slice to this Element | 0
Fill | PosTableIndex | Int8 | 0 = no reordering; +ve = PosTable Index | 0
Fill | Slice | UInt8 | Slice number in IndexEntry | 2
Fill | Element Delta | UInt32 | Delta from start of slice to this Element | 0
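The Delta Entry Array of Table 8 can be sketched in code. The following is a minimal Python illustration only, assuming each delta entry is serialized big-endian as Int8, UInt8, UInt32 (6 bytes, matching Length = 6) and is preceded by the NDE and Length fields. The 20-byte KL size and the function name are hypothetical; the 700-byte Data Element size is borrowed from the worked example in this section.

```python
import struct

# Assumed sizes, for illustration only: a 16-byte key plus a 4-byte BER length,
# and the 700-byte Data Element used in the worked example.
KL_SIZE = 20
DATA_SIZE = 700

# (name, PosTableIndex, Slice, ElementDelta) as listed in Table 8.
DELTA_ENTRIES = [
    ("Data", 1, 0, 0),
    ("Picture", -1, 0, KL_SIZE + DATA_SIZE),
    ("Sound", 2, 1, 0),
    ("Fill", 0, 2, 0),
]

ENTRY_FORMAT = ">bBI"  # big-endian Int8, UInt8, UInt32 with no padding


def pack_delta_entry_array(entries):
    """Serialize NDE, Length and the delta entries as one byte string."""
    entry_length = struct.calcsize(ENTRY_FORMAT)
    out = struct.pack(">II", len(entries), entry_length)
    for _name, pos_table_index, slice_number, element_delta in entries:
        out += struct.pack(ENTRY_FORMAT, pos_table_index, slice_number, element_delta)
    return out


packed = pack_delta_entry_array(DELTA_ENTRIES)
```

The element names are carried only for readability; as the rules above state, the array itself does not record which element is which.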


Table 9 : Frame Wrapped Index Entry Array description for Figure 19

N | Field Name | Type | Meaning | Use
1 | NIE | UInt32 | Number of index entries | = number of frames
1 | Length | UInt32 | Length of each index array entry | calculate
NIE | Temporal Offset | Int8 | Offset in edit units from Display Order to Coded Order | 0
NIE | Key-Frame Offset | Int8 | Offset in edit units to previous Key Frame. The value is zero if this is a Key-Frame. | 0
NIE | Flags | EditUnitFlag | Flags for this Edit Unit. Bit 7: Random Access; Bit 6: Sequence Header; Bit 5: forward prediction; Bit 4: backward prediction (bits 5 and 4: 00 = I, 10 = P, 01 or 11 = B); Bits 0-3: reserved | calculate
NIE | Stream Offset | UInt64 | Offset in bytes from the first KLV element in this Edit Unit within the Essence Container Stream | Offset from the first byte of the Key of the KLV for the first frame to the first byte of the Key of the KLV for the Data Element in this frame, as shown in Figure 19
NIE | SliceOffset | NSL x UInt32 | The offset in bytes from the Stream Offset to the start of this slice | Optional depending on the complexity of the VBR items. In this case there are 3 slices and NSL is set to 2
NIE | PosTable | NPE x Rational | The fractional position offset from the start of the Content Package to the synchronized sample in the Content Package | This should be calculated for each wrapped element to ensure precise synchronization is maintained. There are 2 elements requiring offsets in this example so NPE=2

Rows marked NIE are repeated once per Index Entry. There is one Index Entry for every frame.
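The Flags bit assignments and the "calculate" entry for Length in Table 9 can be illustrated with a short sketch. This is a minimal Python example, not a normative implementation: the function names are hypothetical, and the 8-byte Rational (two 32-bit integers) is an assumption about the encoding.

```python
def decode_edit_unit_flags(flags):
    """Split an EditUnitFlag byte into the fields described in Table 9."""
    prediction = (flags >> 4) & 0b11  # bit 5 = forward, bit 4 = backward
    frame_type = {0b00: "I", 0b10: "P", 0b01: "B", 0b11: "B"}[prediction]
    return {
        "random_access": bool(flags & 0x80),        # bit 7
        "sequence_header": bool(flags & 0x40),      # bit 6
        "forward_prediction": bool(flags & 0x20),   # bit 5
        "backward_prediction": bool(flags & 0x10),  # bit 4
        "frame_type": frame_type,
    }


def index_entry_length(nsl, npe):
    """Bytes per Index Entry: three 1-byte fields, one UInt64, NSL UInt32s, NPE Rationals."""
    rational_size = 8  # assumption: a Rational is two 32-bit integers
    return 1 + 1 + 1 + 8 + 4 * nsl + rational_size * npe
```

With NSL = 2 and NPE = 2, `index_entry_length` gives 35 bytes, which agrees with the Length value used in the worked example for this figure.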

Table 9 describes the fields required in each Index Entry. Table 10 shows the entries for the first 6 frames
of a Long GOP sequence. The following values have been used in creating the table:
• The GOP display sequence for frames 0-5 is B0I1B2P3B4P5. This is the indexed order of the frames.
• The GOP transmission order for frames 0-5 is I1B0P3B2P5B4. This is the stored order of the frames.
• The GOP is closed (i.e. the first B frame contains predictions only from the I Frame).
• The Data Element length is fixed at 700 bytes and temporally offset by -0.25 edit units.
• The I frames are 48000 bytes, P frames are 9000 bytes and B frames are 1000 bytes.
• In the 6 Content Packages, there are 8 Sound Elements.
• The Sound Elements are multiplexed in the Content Packages as follows: (1)(1)(2)(1)(1)(2).
• Each Sound Element is 1000 bytes.
• Each Fill element is 300 bytes.
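The Temporal Offset, Stream Offset and SliceOffset values that follow from these assumptions can be derived mechanically. The sketch below is a minimal Python illustration; the constant and function names are hypothetical, and, as in the example itself, the KL overhead of each element is ignored.

```python
# Element sizes stated in the example (bytes). KL overhead is ignored here.
DATA_SIZE = 700
SOUND_SIZE = 1000
FILL_SIZE = 300
PICTURE_SIZE = {"I": 48000, "P": 9000, "B": 1000}

DISPLAY_ORDER = ["B0", "I1", "B2", "P3", "B4", "P5"]
STORED_ORDER = ["I1", "B0", "P3", "B2", "P5", "B4"]
SOUNDS_PER_PACKAGE = [1, 1, 2, 1, 1, 2]


def temporal_offsets(display, stored):
    """Offset from each frame's display position to its stored position."""
    return [stored.index(frame) - position for position, frame in enumerate(display)]


def package_offsets(stored, sounds_per_package):
    """Return (stream_offset, slice_offset_0, slice_offset_1) per stored Content Package."""
    rows = []
    stream_offset = 0
    for frame, sound_count in zip(stored, sounds_per_package):
        slice0 = DATA_SIZE + PICTURE_SIZE[frame[0]]  # start of the Sound slice
        slice1 = slice0 + sound_count * SOUND_SIZE   # start of the Fill slice
        rows.append((stream_offset, slice0, slice1))
        stream_offset += slice1 + FILL_SIZE          # next Content Package
    return rows
```

For the six Content Packages this yields temporal offsets of +1, -1, +1, -1, +1, -1 and stream offsets of 0, 50000, 53000, 65000, 68000 and 79000 bytes.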


Table 10

Index Entry | Field Name | Type | Value | Note
(segment) | NIE | UInt32 | 6 | 6 Entries in this Index Table Segment
(segment) | Length | UInt32 | 35 | Sizeof(IndexEntry) including the SliceOffsets & PosTable
[0] “B” | Temporal Offset | Int8 | 1 | B0 is stored in Content Package[1]
[0] “B” | Key-Frame Offset | Int8 | 1 | The key frame is IndexEntry[1] and 1-0=1
[0] “B” | Flags | EditUnitFlag | D0h | Closed GOP B frame – backward reference, sequence_header & random access
[0] “B” | Stream Offset | UInt64 | 0 | Offset of the 1st Stored CP in the Stream – I1
[0] “B” | SliceOffset[0] | UInt32 | 48700 | sizeof(data) + sizeof(I1 frame)
[0] “B” | SliceOffset[1] | UInt32 | 49700 | sizeof(data) + sizeof(I1 frame) + sizeof(sound)
[0] “B” | PosTable[0] | Rational | 0 | Temporal offset from start of data to start of video
[0] “B” | PosTable[1] | Rational | 0 | Temporal offset from start of sound to start of video
[1] “I” | Temporal Offset | Int8 | -1 | I1 is stored in Content Package[0]
[1] “I” | Key-Frame Offset | Int8 | 0 | The key frame is IndexEntry[1] and 1-1=0
[1] “I” | Flags | EditUnitFlag | 00h | I frame – no reference
[1] “I” | Stream Offset | UInt64 | 50000 | Offset of the 2nd Stored CP in the Stream – B0
[1] “I” | SliceOffset[0] | UInt32 | 1700 | sizeof(data) + sizeof(B0 frame)
[1] “I” | SliceOffset[1] | UInt32 | 2700 | sizeof(data) + sizeof(B0 frame) + sizeof(sound)
[1] “I” | PosTable[0] | Rational | -1/4 | Temporal offset from start of data to start of video
[1] “I” | PosTable[1] | Rational | -1/3 | Temporal offset from start of sound to start of video
[2] “B” | Temporal Offset | Int8 | 1 | B2 is stored in Content Package[3]
[2] “B” | Key-Frame Offset | Int8 | -1 | The key frame is IndexEntry[1] and 1-2= -1
[2] “B” | Flags | EditUnitFlag | 30h | B frame – bidirectional reference
[2] “B” | Stream Offset | UInt64 | 53000 | Offset of the 3rd Stored CP in the Stream – P3
[2] “B” | SliceOffset[0] | UInt32 | 9700 | sizeof(data) + sizeof(P3 frame)
[2] “B” | SliceOffset[1] | UInt32 | 11700 | sizeof(data) + sizeof(P3 frame) + 2*sizeof(sound)
[2] “B” | PosTable[0] | Rational | -1/4 | Temporal offset from start of data to start of video
[2] “B” | PosTable[1] | Rational | -2/3 | Temporal offset from start of sound to start of video
[3] “P” | Temporal Offset | Int8 | -1 | P3 is stored in Content Package[2]
[3] “P” | Key-Frame Offset | Int8 | -2 | The key frame is IndexEntry[1] and 1-3= -2
[3] “P” | Flags | EditUnitFlag | 40h | B frame – forward reference
[3] “P” | Stream Offset | UInt64 | 65000 | Offset of the 4th Stored CP in the Stream – B2
[3] “P” | SliceOffset[0] | UInt32 | 1700 | sizeof(data) + sizeof(B2 frame)
[3] “P” | SliceOffset[1] | UInt32 | 2700 | sizeof(data) + sizeof(B2 frame) + sizeof(sound)
[3] “P” | PosTable[0] | Rational | -1/4 | Temporal offset from start of data to start of video
[3] “P” | PosTable[1] | Rational | 0 | Temporal offset from start of sound to start of video
[4] “B” | Temporal Offset | Int8 | 1 | B4 is stored in Content Package[5]
[4] “B” | Key-Frame Offset | Int8 | -3 | The key frame is IndexEntry[1] and 1-4= -3
[4] “B” | Flags | EditUnitFlag | 30h | B frame – bidirectional reference
[4] “B” | Stream Offset | UInt64 | 68000 | Offset of the 5th Stored CP in the Stream – P5
[4] “B” | SliceOffset[0] | UInt32 | 9700 | sizeof(data) + sizeof(P5 frame)
[4] “B” | SliceOffset[1] | UInt32 | 10700 | sizeof(data) + sizeof(P5 frame) + sizeof(sound)
[4] “B” | PosTable[0] | Rational | -1/4 | Temporal offset from start of data to start of video
[4] “B” | PosTable[1] | Rational | -1/3 | Temporal offset from start of sound to start of video
[5] “P” | Temporal Offset | Int8 | -1 | P5 is stored in Content Package[4]
[5] “P” | Key-Frame Offset | Int8 | -4 | The key frame is IndexEntry[1] and 1-5= -4
[5] “P” | Flags | EditUnitFlag | 40h | B frame – forward reference
[5] “P” | Stream Offset | UInt64 | 79000 | Offset of the 6th Stored CP in the Stream – B4
[5] “P” | SliceOffset[0] | UInt32 | 1700 | sizeof(data) + sizeof(B4 frame)
[5] “P” | SliceOffset[1] | UInt32 | 3700 | sizeof(data) + sizeof(B4 frame) + 2*sizeof(sound)
[5] “P” | PosTable[0] | Rational | -1/4 | Temporal offset from start of data to start of video
[5] “P” | PosTable[1] | Rational | -2/3 | Temporal offset from start of sound to start of video

8.3.6 Audio Only Index Tables


In an audio only Index Table the choice of Indexing frequency is at the discretion of the application designer.
Many audio formats have a fixed number of bits per second, and it may be desirable to use the Constant Bytes
per Element mechanism as outlined in section 8.3.3.

Where there are a variable number of bytes per Element, the IndexEntry mechanism needs to be used as
shown in the examples above. An appropriate Indexing Rate is often to provide one Index Entry per second.
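When every edit unit of audio has the same size, a byte offset can be found by multiplication rather than by table lookup. The following minimal Python sketch illustrates the idea for uncompressed PCM; the function names and the example figures (48 kHz, 25 edit units per second, 2 channels, 2 bytes per sample) are assumptions for illustration, and KL overhead is not modelled.

```python
from fractions import Fraction


def bytes_per_edit_unit(sample_rate, edit_rate, channels, bytes_per_sample):
    """Bytes of PCM audio in one edit unit; raises if the count is not constant."""
    samples = Fraction(sample_rate) / Fraction(edit_rate)
    if samples.denominator != 1:
        # A fractional sample count means edit units differ in size.
        raise ValueError("samples per edit unit is not an integer; use Index Entries")
    return int(samples) * channels * bytes_per_sample


def byte_offset(edit_unit, unit_size):
    """Offset of an edit unit when every edit unit has the same size."""
    return edit_unit * unit_size
```

With the assumed figures, each edit unit holds 1920 samples per channel, or 7680 bytes, so edit unit 100 starts at byte 768000. An edit rate of 30000/1001 would not divide 48 kHz evenly, which is the kind of case where Index Entries are needed instead.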

8.3.7 External essence Index Table


When an MXF Index Table is used to Index External content, the problem of reliability of the data comes into
play. When the Index Table was created, the data was accurate. At some later date, when the information is
used, an MXF application must make some basic checks to be sure that the Index Table applies to the file being
processed (e.g. verify file length vs. max (Index Table byte offsets), and perhaps check that a few random
entries point to frame start points.)
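The basic checks suggested above might look like the following. This is a minimal Python sketch under stated assumptions: the function name is hypothetical, and `looks_like_frame_start` stands for an application-supplied callback that inspects the external file at a given byte offset.

```python
import random


def index_matches_file(file_length, byte_offsets, looks_like_frame_start=None, samples=3):
    """Return True if the index byte offsets are plausible for a file of this length."""
    if not byte_offsets:
        return True
    if max(byte_offsets) >= file_length:
        return False  # the index points beyond the end of the file
    if looks_like_frame_start is not None:
        # Spot-check a few randomly chosen entries.
        for offset in random.sample(byte_offsets, min(samples, len(byte_offsets))):
            if not looks_like_frame_start(offset):
                return False
    return True
```

A failed check only shows that the Index Table and the file disagree; a passed check does not prove that every entry is correct.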

When Indexing an external Essence Container, it is recommended that Index Tables be constructed in the same
way they would be constructed if the Essence Container were internal. When Indexing External data that is not
KLV wrapped, the Index Table should be created where the byte offsets refer to the first byte of each Edit Unit of
the Essence – as in Clip Wrapping. Typically this will be the first byte of each frame.

8.4 Wrapping essence in the Generic Container

8.4.1 Long GOP MPEG with uncompressed audio & other data
Long GOP MPEG is unlike many other essence types in that video frames are re-ordered when stored. This
leads to complications in the creation of Index Tables and the synchronization of associated Audio and Data
Elements. The MXF MPEG mapping document describes in detail how the different elements should be
arranged to achieve synchronization and to improve interoperability.

This MXF Engineering Guideline recommends that Frame wrapping of Long GOP MPEG should be used
wherever possible. It also recommends that the interleaving guidelines should be followed so that the
relationship between the Essence Elements in each Content Package is consistent.

The interleaving rules are designed so that when a group of Content Packages are extracted, the likelihood of
extracting the synchronized Picture, Sound and Data Elements is maximized. The figure below shows the
physical arrangement of the KLV triplets in a file. It can be seen that the different channels of Sound and Data
are KLV wrapped and kept contiguous with the Picture KLV.

[Figure: two consecutive 1-frame Content Packages, each consisting of a Picture Element, a Sound Item holding two Sound Elements, and a Data Element; every Element is a K, L, V triplet]

Figure 20 : Frame Wrapping with other GC elements

8.4.2 Uncompressed Video & Audio


Although Uncompressed Video tends to generate the largest file sizes, the physical structure of the files is
simpler than Long GOP MPEG. There is no temporal re-ordering of the frames in an Uncompressed Video file.
Figure 20 is also applicable to the Frame wrapping of Uncompressed Video with associated Audio. To know the
exact format of the uncompressed video, the Essence Descriptor must be inspected. Specifically, the
PixelLayout property defines the storage format of each of the pixels that collectively comprise the stored Image.
The figure in Appendix E of the Format specification defines many of the Image parameters, but the PixelLayout
parameter deserves further mention here:

The intention of PixelLayout is to provide an algorithmic way of describing how stored pixels are packed for the
bit-packing schemes that are likely to be used. The PixelLayout property is a zero-terminated sequence of pairs,
each a character code followed by a UInt8 bit-depth value. These are all defined in SMPTE 377M, but a brief
example illustrates the principle.

To describe 8-bit component 4:2:2 pixels packed into a 32-bit word, the 601 sequence would be:

Cb, Y, Cr, Y, Cb, Y, Cr, Y, Cb, Y, Cr …

If these bytes were stored contiguously in an MXF file, the PixelLayout property to describe this arrangement
would be:

PixelLayout= { ‘U’, 8, ‘Y’, 8, ‘V’, 8, ‘Y’, 8, 0, 0 }

This decodes as: 8 bits U (Cb), followed by 8 bits Y, followed by 8 bits V (Cr), followed by 8 bits Y. The final 2
zero values terminate the property.
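The decoding just described can be sketched as follows. This is a minimal Python illustration in which the function name is hypothetical and the property is assumed to arrive as a sequence of bytes holding the code and depth pairs.

```python
def parse_pixel_layout(layout):
    """Decode (component code, bit depth) pairs up to the zero terminator."""
    components = []
    for i in range(0, len(layout) - 1, 2):
        code, depth = layout[i], layout[i + 1]
        if code == 0:
            break  # a zero code terminates the property
        components.append((chr(code), depth))
    return components


# The example above: 'U', 8, 'Y', 8, 'V', 8, 'Y', 8, 0, 0
layout = bytes([ord("U"), 8, ord("Y"), 8, ord("V"), 8, ord("Y"), 8, 0, 0])
components = parse_pixel_layout(layout)
bits_per_group = sum(depth for _code, depth in components)
```

Summing the depths gives 32 bits, matching the 32-bit word that holds two 4:2:2 pixels.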

8.4.3 VBI Data


SMPTE 331M specifies the formats of VBI lines and Ancillary data to be used in SDTI-CP systems. SMPTE
385M defines the mapping of such CP Essence Elements to the MXF Generic Container including the
definitions of the KLV construct. It is recommended that these definitions be used for the carriage of VBI line
data and Ancillary Data (both H-ANC and V-ANC). Note that Anc Packets have a data type identification that
identifies the payload in the Anc packet. VBI data is, by its nature, un-typed and can carry any kind of payload
without any local identification.

8.4.4 Wrapping Private essence


Wrapping private essence types involves the creation of unique values within the MXF metadata that ensure
that the essence is consistently described during an MXF file interchange. Ideally the data values that are used
should be SMPTE Universal Labels and the essence wrapping would eventually be standardized. This will not
be the case for all essence types however. This section of the EG will assume that the Package, Track and
essence identification sections above are well understood.

It is recommended that all data be wrapped using the MXF Generic Container Specification – even private data.
This allows the maximum re-use of existing tools, tests, code and knowledge in the MXF interchange
environment. This example will assume a mapping of private essence to the Generic Container.

To identify the private essence, there are certain unique identifiers that need to be generated:
1. Keys to wrap the private essence
2. An Essence Container UL to describe the essence containment used
3. An Essence Descriptor with appropriate ULs to describe the actual Essence
4. A Data Definition for use in the Sequences and SourceClips
When the Generic Container is used, the first 13 bytes of the Key are already defined.

An Essence Container UL should ideally be a registered SMPTE UL. This could probably be an organizationally
registered UL if the Essence type is regularly used by an organization. Mechanisms for registering within
SMPTE are being created as this document is being written.

8.5 Adding metadata

8.5.1 Adding Private metadata


Private Metadata additions to MXF can fall into 2 broad categories:
1. Extensions to existing MXF sets
2. New KLV coded sets and groups
The first of these uses a mechanism called the Primer Pack to prevent number clashes in the 2-byte tag values
within the set. Essentially, the 2-byte tag value is shorthand for the full 16-byte Universal Label that identifies the
item within the set. The Primer Pack is a list of all the 2-byte tags used in the file and their 16-byte equivalents.

A private metadata item should have a unique 16-byte identifier. It is recommended that any metadata Item that
is likely to be used often is registered with SMPTE and a 16-byte UL is allocated. This is not always possible, so
a 16-byte UUID may be generated instead. It is important that this UUID is understood both by the encoder and
decoder of this private data, otherwise the data cannot be interchanged.

To extend an existing MXF set, the MXF encoder places the 16-byte identifier for the data in the Primer Pack
and generates a 2-byte local tag from the “dynamic” range of numbers given in the Format Specification. It is
important to check that this allocated number is not already used within the Primer Pack of the file.
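The allocation step can be sketched as below. This is a minimal Python illustration, not the procedure mandated by the Format Specification: the Primer Pack is modelled as a hypothetical dictionary from 2-byte tag to 16-byte identifier, and the dynamic range of 8000h to FFFFh is an assumption that should be checked against SMPTE 377M.

```python
# Assumption: bounds of the "dynamic" local tag range; verify against SMPTE 377M.
DYNAMIC_TAG_FIRST = 0x8000
DYNAMIC_TAG_LAST = 0xFFFF


def allocate_local_tag(primer, identifier):
    """Return the 2-byte tag for a 16-byte identifier, adding a Primer Pack entry if needed."""
    if len(identifier) != 16:
        raise ValueError("identifier must be 16 bytes")
    for tag, known in primer.items():
        if known == identifier:
            return tag  # already listed in the Primer Pack
    for tag in range(DYNAMIC_TAG_FIRST, DYNAMIC_TAG_LAST + 1):
        if tag not in primer:  # never reuse a tag that is already allocated
            primer[tag] = identifier
            return tag
    raise RuntimeError("no free dynamic local tag")
```

Searching the existing entries first keeps one tag per identifier, so that repeated additions of the same private item do not consume further tags.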

Once this procedure is complete the private metadata value can be added to the appropriate Local Sets in the
MXF specification. The Primer Pack mechanism ensures that all decoders that don’t recognize the 16-byte
identifier will ignore it. In order to respect the AAF data model, all private metadata additions to the MXF
specification must follow the requirements of the single-inheritance hierarchy rules of the AAF class hierarchy.
Failure to follow this rule may lead to decoder errors. The best way to add private metadata such that it is
compatible with the AAF data model is to study the model, which is available from the AAF Association (see
section C.1).

A new KLV coded set or group is more straightforward to add. A new 16-byte UL must be registered with
SMPTE to wrap the set. The set should use the Primer Pack mechanism outlined already if 2-byte tags are
being used in the set, otherwise the new set should follow SMPTE 336M. This set may be specified so that it
does not need to follow the single inheritance hierarchy rules and may therefore be “dark” to AAF decoders.

At the time of writing this document a Private Metadata Carrier set was being designed. If the design is
successful, this will provide a standardized way of adding sets of private metadata items to the MXF
specification.

8.5.1.1 Intimate Metadata – e.g. Aspect ratio & AFD information


Intimate metadata carries properties that are intimately associated with the Essence. The metadata is usually
dynamic in nature and may have as many bytes as the Essence itself. When the intimate metadata is very large,
e.g. 3-D depth map information, the metadata should be treated as essence with the guidelines for private
essence being followed.

If the intimate metadata is quite compact, it may be appropriate to represent it as private metadata in the
header. An example could be the representation of Camera movements as private Events on a Descriptive
Metadata Track.

It is likely that well-known intimate metadata properties such as Aspect Ratio and AFD information will have
intimate metadata mechanisms defined for tracking them in MXF.

It is also likely that certain kinds of metadata may be carried within the Essence Container itself, such as in
some Elements of a System Item of the Generic Container.

8.6 Working with Timecode

In MXF, Timecode is a metadata annotation. The concept of time in MXF corresponds to a number of edit units
along a particular track. To determine the Timecode at a given position on a Track, the value of the Timecode
segment must be calculated or read for that Timecode Track. It is highly recommended that all MXF files are
created with Timecode Tracks in the Material and Top-Level File Packages, although this is not a normative
requirement. MXF decoders should still operate correctly if the Timecode Track is missing.
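Calculating a Timecode value from a position can be sketched as follows. This minimal Python example assumes a continuous, non-drop-frame Timecode segment described by a start value in frames and an integer frame rate, and assumes that one edit unit corresponds to one Timecode frame; the function and parameter names are hypothetical.

```python
def timecode_at(start_timecode, position, timecode_base):
    """Non-drop-frame timecode string for a position (in edit units) on a Track."""
    frame_count = start_timecode + position
    frames = frame_count % timecode_base
    seconds = frame_count // timecode_base
    return "%02d:%02d:%02d:%02d" % (
        (seconds // 3600) % 24,  # hours wrap at 24
        (seconds // 60) % 60,
        seconds % 60,
        frames,
    )
```

For example, with a base of 25 frames per second and a start of 0, position 90000 corresponds to 01:00:00:00. Drop-frame counting would need additional handling that is not shown here.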

8.7 Playing a File Backwards

Parsing an MXF file in the forward direction is a relatively simple task thanks to KLV coding. Parsing the file in
the backwards direction is much more difficult without help. At the time of writing this Engineering Guideline, a
proposal exists for a simple Generic Container System Element that does nothing more than provide a
backwards pointer to the previous KLV wrapped Content Package. This allows very simple devices to provide
forwards and backwards play.
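The ease of forward parsing comes from each KLV triplet stating its own length. The sketch below is a minimal Python illustration that walks an in-memory byte string, assuming 16-byte keys and BER short-form or long-form lengths; the function name is hypothetical and indefinite-length encoding is not handled.

```python
def iter_klv(buffer):
    """Yield (key, value) pairs from a byte string of consecutive KLV triplets."""
    position = 0
    while position + 17 <= len(buffer):
        key = buffer[position:position + 16]
        first = buffer[position + 16]
        position += 17
        if first < 0x80:
            length = first  # short form: the byte is the length itself
        else:
            count = first & 0x7F  # long form: number of length bytes that follow
            length = int.from_bytes(buffer[position:position + count], "big")
            position += count
        yield key, buffer[position:position + length]
        position += length
```

Nothing in a triplet records where the previous triplet began, which is why stepping backwards needs extra help such as the pointer element proposed above.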

8.8 Creating new Descriptive Metadata (DM) Plug-Ins

This subject is covered in great depth in SMPTE EG42 and will only be lightly covered here. The plug-in
mechanism is very simple and has the features described in the next sections.

8.8.1 Temporal Properties of the Metadata


MXF provides three sorts of track. These define whether the content on the track is static (Static Track (DM)),
may have overlapping or discontinuous content (Event Track (DM)), or must have continuous content with no
overlaps (Timeline Track (DM)). These track types may have metadata content placed on them according to
the properties of the metadata.

8.8.2 Inserting Metadata values or Referencing Metadata values?


MXF provides a DM Segment for relating Metadata values to one of the tracks. The DM Segment contains the
MXF properties that allow the descriptive metadata to fit into the MXF model along with a StrongRef to a DM
Framework, which is the “socket” into which a new Descriptive Metadata plug-in plugs.


MXF also provides a DM SourceClip for referencing Descriptive Metadata. This is useful in the case where an
application wants to say, “The Descriptive Metadata for the Top-Level File Package is the same as the
Lower-Level Source Package”. Rather than duplicating the Metadata values, a reference can be created between
the two packages.

8.8.3 Linking Properties of the Metadata to tracks


Not all Descriptive Metadata applies to all the tracks. For example in an Arts program, metadata about the
production crew may apply to all tracks, metadata about a Dancer may apply only to the Picture Track, and
metadata about an orchestra may apply only to the Sound Track.

Both the DM Segment and DM SourceClip have a TrackIDs property that references all the MXF Tracks to
which this metadata applies. If this property is omitted the metadata applies to all the tracks in the Package.
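The rule for an omitted TrackIDs property can be expressed in a few lines. This is a minimal Python sketch in which the function name is hypothetical, track identifiers are plain integers, and an omitted property is modelled as `None`.

```python
def tracks_described(track_ids, package_track_ids):
    """Tracks a DM Segment or DM SourceClip applies to; None means TrackIDs is omitted."""
    if track_ids is None:
        return list(package_track_ids)  # omitted: applies to every Track in the Package
    return [t for t in package_track_ids if t in track_ids]
```

In the Arts program example, metadata about the Dancer would list only the Picture Track, while metadata about the production crew could simply omit the property.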

8.8.4 How much metadata?


The theoretical limit to the number of Metadata Tracks in a Package is huge (probably more than 2^32). Practical
limits to do with the size of the Header Metadata and usefulness of such large numbers of Tracks will have more
of an impact on the actual upper limit.

8.8.5 Descriptive Metadata Identification


The final part of the Plug-In mechanism worth mentioning here is identification of the scheme. SMPTE 377M
defines Generic Universal Labels for the identification of MXF Metadata Plug-ins that are in a file and also
defines Generic Keys for the wrapping of the Descriptive Metadata Content.


Annex A (Informative)
The Relationship of MXF to AAF

MXF was designed to have little or no divergence from the underlying AAF model. A joint working group ensured
that any divergence between the two formats was justified. Both formats have benefited from the work carried
out on the two different applications of the common underlying class model.

A.1 Comparison of MXF Files with AAF Files

For MXF files, the Partitioning is designed for the following desirable characteristics:
1. Repetition of Header Metadata
2. Incremental sequential writing
3. The contents of the Index Tables do not change if the Index is relocated within the file
4. The contents of the Metadata KLV triplets do not change for each repetition
5. The Essence is completely unaffected by the insertion or deletion of Partition, Metadata or Index sectors
6. Multiple Independent Essence Containers and Index Tables
7. Simplicity – i.e. the ability for hardware only implementations to process MXF files
8. The encoding of Partitions does not require look ahead, except to record the number of bytes allocated for
Metadata and Index
9. The encoding of Metadata and Index Segments does not require any knowledge of context within the
multiplex
10. The stream can be picked up safely (after failure or join in progress) at ANY Body Partition
11. Minimal overhead
For AAF files using Structured Storage, the overhead of this low-level data structure serves other needs, which
are in conflict with some of the MXF requirements:
1. Efficient edit-in-place of individual Metadata Items and Sets
2. Complex hierarchical relationships between Sets
3. Efficient mixture of small and large data items
To maximize interoperability, the MXF and AAF low-level byte-stream formats share some key concepts:
1. A common class model
2. The widespread use of 16-byte universal labels
3. The notion of Streams
4. Stream Identifiers (SIDs)
5. Essence Descriptors
Using this commonality, the AAF SDK can efficiently implement MXF, with the following desired characteristics:
1. An MXF device can always read and / or write MXF files
2. An AAF application can always open MXF files with no conversion at all
3. If an AAF application modifies an EDL in an MXF file, an MXF device would see this as updated Header
Metadata
4. If an AAF application adds AAF specific metadata to an MXF file, an MXF device would see the additions as
"dark metadata"
5. An AAF application can flatten its internal Object Hierarchy to create an MXF file
6. An AAF application can also, of course, create AAF files
7. An AAF application can convert an AAF file into an MXF file in one of three ways (although not all of these
methods will be built in to the open source AAF SDK):
a. filter out non-MXF data
b. constrain the creation of the AAF file so it is never beyond MXF complexity
c. render an AAF composition.


Note that Metadata to describe Effects other than cuts is defined by the Advanced Authoring Format.
Application-specific variants of MXF Files including Effects Metadata could be defined. However, that is outside
the scope of the MXF standard.

A.2 Relationship between MXF and AAF

Files created according to the MXF standard may be opened by applications that are designed to read AAF, and
can be opened simultaneously by hardware or software designed for MXF.

Specific requirements on the MXF File Format include:


• Must permit precise and repeatable external references to defined positions in the contents of the MXF
File.
A.3 MXF and AAF references – inter-working

A.3.1 Strong references – in-file instance ownership


Strong references are about object instance ownership. Every object in AAF (and MXF) other than the ultimate
root object is owned, i.e. it has one and only one strong reference to it. As a consequence, a valid MXF or AAF
data model can be viewed as a tree-like hierarchy of strong references between objects.

An AAF data model persisted to Microsoft Structured Storage (MSS) represents a strong reference as MSS
storage containment of the target object. This means strong references are efficiently followed in an AAF file.

An MXF data model represented in KLV uses unique object instance identifiers to identify the target of a strong
reference. The target is persisted elsewhere in the KLV stream with the same identifier. (This is a bit like adding
an “artificial” key column to a relational database table when no combination of the existing columns is unique
for every possible row in the table.) These identifiers are transitory, in that provided this referential integrity is
maintained, they may be re-generated whenever an object is persisted to file.
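Resolving strong references by identifier can be sketched as follows. This minimal Python example models each decoded set as a hypothetical dictionary with an `instance_id`, a `name` and a list of `strong_refs`; these field names are illustrative and are not taken from the specification.

```python
def ownership_errors(sets, root_id):
    """Return identifiers that are not the target of exactly one strong reference."""
    counts = {s["instance_id"]: 0 for s in sets}
    for s in sets:
        for target in s.get("strong_refs", []):
            counts[target] = counts.get(target, 0) + 1
    return sorted(i for i, n in counts.items() if i != root_id and n != 1)


def build_tree(sets, root_id):
    """Resolve strong references by identifier into a nested structure."""
    by_id = {s["instance_id"]: s for s in sets}

    def resolve(instance_id):
        s = by_id[instance_id]
        children = [resolve(t) for t in s.get("strong_refs", [])]
        return {"name": s["name"], "children": children}

    return resolve(root_id)
```

Because only referential integrity matters, the identifiers could be replaced with fresh values on every write without changing the tree that `build_tree` returns.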

There is ongoing work to create a common representation in XML of the shared data model. Details of this work
are available, at the time of writing, to the working committees of SMPTE, AAF and Pro-MPEG.

A.3.2 General weak references – in-file shared instances


The concept of general weak references exists in the design of both MXF and AAF. A general weak reference is
a non-ownership reference to an object. Any object may be the target of zero or more weak references. Weak
references allow information to be shared rather than duplicated.

The AAF data model never required general weak references and they are not currently implemented. A future
implementation would probably use a path identifier (like a file system path) that identified the unique route from
the root object to the referenced object via the strong reference tree.

The MXF data model uses general weak references in DMS-1 (Descriptive Metadata Scheme 1). General weak
references are easily represented in MXF files by using the existing file-unique object instance identifiers.

A.3.3 Restricted weak references – inter-file shared


The AAF data model does include shared objects. However these are all definitions of various kinds: data
definitions, operation definitions, codec definitions, etc.

There are two common features of these classes.


1. The most important feature of each definition is that it has a universally unique identifier, which is the same
across all files. These identifiers are defined by an application or by an external registry.


2. In the AAF data model, all object instances of each of these classes always reside in a known location in the
strong reference hierarchy.
Each set of objects in a file (data definitions, codec definitions, etc.) is in effect a local copy of a universal
registry, or at least all those entries that are used by the file. Each object includes the name of the definition and
a short description, much the same as is found in a SMPTE registry.

Definitions are also represented using the universally unique identifier in MXF files. The only difference is that
there is no local copy of the registry.

A.3.4 AAF Classes


The AAF data model is well documented and implementers involved in AAF – MXF interoperability are strongly
advised to compare the latest version of the AAF class model and its implementation in the reference SDK
against the latest version of SMPTE 377M.


Annex B
Preferred Enumerated String Values

This annex defines preferred string values for certain properties defined in SMPTE 377M using English terms
and words.

Strings are listed in the form [SetName : PropertyName] in order to ensure that the target property is clearly
identified.

B.1 Identification : Platform (Operating System used)

The following values are preferred text string values that enable operating system type discovery for common
software platforms. Other string values may be used where needed.
“Windows 95”
“Windows 98”
“Windows ME”
“Windows 2000”
“Windows NT”
“Windows XP”
“Mac OS Classic”
“Mac OS X”
“Unix System V”
“Solaris”
“Unix BSD 4.3”
“Unix BSD 4.4”
“Irix”
“Linux”
“FreeBSD”
“AIX”


Annex C
Bibliography

List of References used normatively in other parts of SMPTE 377M.

The following documents are referred to normatively in other parts of SMPTE 377M. The list is provided here for
information so that a complete list of references can be found in a single place. For dated references,
subsequent amendments to, or revisions of, any of these publications do not apply. For undated references, the
latest edition of the normative documents referred to applies. Members of ISO and IEC maintain registers of
currently valid International Standards.
Note: this list may not include Normative References for MXF documents defined after the publication point of this document
(for example new Essence Container documents).

Note: Approved SMPTE standards may be obtained from http://www.smpte.org. Drafts of SMPTE documents may be
obtained from ftp://smpte.vwh.net/pub. Approved ANSI standards may be obtained from http://www.ansi.org.

1. SMPTE 336M-2001, Television – Data Coding Protocol using Key-Length-Value
2. SMPTE RP 210.4-2002, Metadata Dictionary Registry of Metadata Element Descriptions
3. SMPTE 330M-2000, Television – Unique Material Identifier (UMID)
4. SMPTE 305.2M-2000, Television – Serial Data Transport Interface (SDTI)
5. IEC 61834-2 (1998-08), Recording – Helical-scan digital video cassette recording system using 6.35mm
magnetic tape for consumer use (525-60, 625-50 and 1250-50 Systems), Part 2: SD format for 525-60 and
625-50 systems
6. SMPTE 314M-1999, Television – Data Structure for DV-based Audio, Data and Compressed Video, 25 and
50 Mb/s
7. SMPTE 370M-2002, Television – Data Structure for DV Based Audio, Data and Compressed Video at 100
Mb/s 1080/60i, 1080/50i, 720/60p
8. SMPTE 326M-2000, Television – SDTI Content Package Format (SDTI-CP)
9. SMPTE 331M-2000, Television – Element and Metadata Definitions for SDTI-CP
10. SMPTE RP204-2000, SDTI-CP MPEG-2 Decoder Templates
11. ISO/IEC 646:1991, Information Technology – ISO 7-Bit Coded Character Set for Information Interchange
12. ISO/IEC 13818-1:2000, Information Technology - Generic Coding of Moving Pictures and Associated Audio
Information: Systems (MPEG-2)
13. ISO/IEC 13818-2:2000, Information Technology - Generic Coding of Moving Pictures and Associated Audio
Information: Video (MPEG-2)
14. ISO/IEC 13818-2: Amendment 2: (MPEG-2, 4:2:2P@ML).
15. SMPTE 356M-2001, Type D-10 Stream Specifications – MPEG-2 4:2:2P@ML for 525/60 and 625/50
16. SMPTE 367M-2002, Type D-11 Picture Compression and Data Stream Format
17. SMPTE 369M-2002, for Television: Type D-11 Data Stream and AES3 Data Mapping over SDTI
18. ANSI/SMPTE 298M-1997, Television - Universal Labels for Unique Identification of Digital Data
19. SMPTE 377M – MXF File Format Specification
20. SMPTE 378M – MXF Operational Pattern 1a (Single Item, Single Package)
21. SMPTE 379M – MXF Generic Container
22. SMPTE 380M – MXF Descriptive Metadata Scheme 1
23. SMPTE 381M – MXF Mapping MPEG streams into the MXF Generic Container
24. SMPTE 382M – MXF Mapping AES3 and Broadcast Wave Audio into the MXF Generic Container
25. SMPTE 383M – MXF Mapping DV-DIF Data to the MXF Generic Container
26. SMPTE 384M – MXF Mapping of Uncompressed Pictures into the Generic Container
27. SMPTE 385M – MXF Mapping SDTI-CP Essence and Metadata into the MXF Generic Container
28. SMPTE 386M – MXF Mapping Type D-10 Essence Data to the MXF Generic Container
29. SMPTE 387M – MXF Mapping Type D-11 Essence Data to the MXF Generic Container
30. SMPTE 321M-2002, Television – Data Stream Format for the Exchange of DV-Based Audio Data and
Compressed Video over a Serial Data Transport Interface
31. SMPTE 322M-1999, Television – Format for Transmission of DV Compressed Video Audio and Data over a
Serial Data Transport Interface
32. SMPTE 359M-2001, Television and Motion Pictures – Dynamic Documents
33. SMPTE 352M-2002, Television (Dynamic) - Video Payload Identification for Digital Interfaces
34. ITU-R BR.1352-1:2002: Broadcast Wave Format (BWF), Annex 1, Annex 1 Appendix 1 and 2, and Annex 3
35. EBU tech T3285 Supplement 3 (2001): BWF, Peak Envelope Chunk
36. Draft AES project X66 (tentative designation AES31-2): File format for transferring digital audio data
37. AES3 (1992): Serial transmission format for two-channel linearly represented digital audio data
38. SMPTE 337M-2000, Format for Non-PCM Audio and Data in an AES3 Serial Digital Audio Interface
39. SMPTE 338M-2000, Television – Format for Non-PCM Audio and Data in AES3 – Data Types
40. SMPTE 339M-2000, Format for Non-PCM Audio and Data in an AES3 – Generic Data Types
41. SMPTE EG 42 MXF, Descriptive Metadata Engineering Guideline
42. ISO/IEC 8825-1:1998, ASN.1 Basic Encoding Rules

C.1 Informative Reading

The following list of informative documents is provided to help give background information and an overview of
standards related to SMPTE 377M.
1. EBU / SMPTE Task Force for Harmonized Standards for the Exchange of Program Material as Bit-streams
– 1998, http://www.smpte.org and http://www.ebu.ch
2. Advanced Authoring Format, http://www.AAFassociation.org
3. DVCPRO White Papers, http://www.dvcpropartners.com
4. The SMPTE Data Coding Protocol and Dictionaries, Jim Wilkinson, SMPTE Journal, July 2000 Vol. 109, No
7, Engineering Report
5. UNICODE – http://www.unicode.org for informative reading on the coding of international characters.
6. Pro-MPEG forum web site http://www.pro-mpeg.org
7. UML information for understanding class diagrams and other aspects of data modeling and programming
http://www.oreilly.com
