SMPTE EG41 Engineering Guide
for Television
Material Exchange Format (MXF)
Engineering Guideline
(Informative)
Table of Contents
1 Scope
3 Introduction
7 MXF in Detail
Annex B
Annex C
Bibliography
1 Scope
This Engineering Guideline gives an introduction to and the background for the Material Exchange Format
(MXF). This document describes the technology involved in the Format, the names of the various elements
within the Format, and the way in which the Format may be used in real-world applications.
Some parts of the descriptions within this document are generic to file formats, while other parts are specific to
the Material Exchange Format. There are descriptions of the object-oriented technology used within the MXF
Format, as well as a discussion of the Metadata that may be used within the file. There are worked examples
within this Engineering Guideline to guide implementers and hence improve the interoperability of applications
using different MXF implementations.
Copyright © 2003 by THE SOCIETY OF MOTION PICTURE AND TELEVISION ENGINEERS
595 W. Hartsdale Ave., White Plains, NY 10607
(914) 761-1100
THIS PROPOSAL IS PUBLISHED FOR COMMENT ONLY
When implementing an MXF application or system, you should ensure that you have the latest version of all of
these documents. The individual Operational Patterns and Essence Container mappings will be independently
updated.
There are several parts to the MXF standard. This is Part 1, the MXF Engineering Guideline, which provides an
introduction and description. This document should be read first because it introduces many of the concepts and
explains what problem MXF is intended to solve. Part 1 also includes other Engineering Guidelines including a
Descriptive Metadata Engineering Guideline, which explains the concepts behind the use of Descriptive
Metadata in MXF files.
Part 2 is a normative definition of the Format of an MXF file. It is the toolbox from which different file interchange
tools are chosen to fulfill the requirements of different applications. The MXF File Format defines the syntax and
semantics of MXF Files.
Part 3 describes the Operational Patterns of the MXF Format. In order to create an application to solve a
particular interchange problem, some constraints and Structural Metadata definitions are required before
SMPTE 377M can be used. An Operational Pattern defines
those restrictions of the Format that allow interoperability
between applications of defined levels of complexity.
Applications that use the MXF Format must adhere to one of
[Figure: structure of the MXF document suite, showing Part 1, Engineering Guideline (informative), and Part 2, File Format (normative)]
Part 5a comprises a number of documents for mapping many of the essence and metadata formats used in the
content creation industry into the defined MXF Essence Container.
The MXF document suite makes reference to other documents that contain information required for the
implementation of an MXF system. One such document is the SMPTE Dictionary, RP210, which contains
definitions of parameters, their data types and their Keys when used in a KLV representation. Another is the
SMPTE Labels Registry, RP224, which contains a list of normalized labels that can be used in MXF sets. Annex
B in this MXF Engineering Guideline contains a list of recommended string constants that an application may
use to improve interoperability.
In the unlikely event of conflict or ambiguity between the different parts of the document, the Format document
has precedence over the Operational Patterns, which have precedence over the Essence Containers, which
have precedence over the Descriptive Metadata documents.
Note: During the early development of MXF, a catalogue of enumerated values was created to list SMPTE Labels, Strings,
Keys and Tags used within the MXF document suite. The normative definition of the SMPTE Labels is maintained in the
SMPTE Labels Registry and the normative definitions of the SMPTE Keys and Tags are to be found in the MXF document
suite.
The information in this document is ordered for the novice reader. Concepts are introduced gradually and
repeated in more detail later in the document. This makes the document easier to read, but it also makes the
document less convenient as a reference. For that reason, a Table of Contents is provided
at the start of the document to allow “Random Access” to the information within the text.
Section 8 provides MXF worked examples. In order to improve the readability of the text, an arrow is used to
indicate that an example of a certain subject exists for this section. For example, (→8.4) indicates that an
example for this subject exists in section 8.4.
3 Introduction
The introduction is constructed as a list of questions. The concepts in MXF can be introduced in a way that gives
an overall view of the specification and the concepts embodied within it. Once the introduction is understood, the
requirements of the file format are discussed. Some specific words and phrases used in the specification are
then defined and finally the Material Exchange Format is introduced in a much more detailed fashion. Although
this entire document is informative, it is hoped that it will give sufficient information for technical and non-
technical readers to understand MXF.
The MXF Specification is intended to allow the interchange of captured, ingested, finished or “almost finished”
material. It is not intended to be an authoring format. Despite this, careful thought has gone into SMPTE 377M
to ensure that authoring tools such as those based on AAF Association technology are able to directly open and
use an MXF file efficiently without having to convert the file.
The MXF Specification has also been carefully crafted to ensure that it can be efficiently stored on a variety of
media, as well as transported over communications links. The MXF Format has not forgotten about tape. There
are structures and mechanisms within the file that make MXF appropriate for data tape storage and archiving of
content.
Finally, the MXF Specification is intended to be expandable. A considerable effort has been put into making
SMPTE 377M compression-format independent, resolution independent and able to be constrained to suit a large
number of application environments. The document structure has been created to allow new applications to take
advantage of the MXF Format in a backwards compatible way.
MXF files may include an optional, but recommended, Index Table that provides rapid conversion from sample-
based indexes (e.g. Timecode) into byte offsets within an Essence Container. The Index Table may be
segmented, and may be stored before, after or multiplexed with the essence data segments.
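The conversion that an Index Table provides can be pictured with a toy example. The sketch below is not the Index Table structure of SMPTE 377M; the function names, the list of per-edit-unit sizes and the non-drop-frame Timecode arithmetic are all assumptions made for illustration only.

```python
def build_offsets(sizes):
    """Byte offset of each edit unit, given the stored size of each one."""
    offsets, position = [], 0
    for size in sizes:
        offsets.append(position)
        position += size
    return offsets

def timecode_to_index(timecode, start_timecode, fps):
    """Edit-unit index of a non-drop-frame timecode relative to the start."""
    def frames(tc):
        hours, minutes, seconds, frame = (int(part) for part in tc.split(":"))
        return ((hours * 60 + minutes) * 60 + seconds) * fps + frame
    return frames(timecode) - frames(start_timecode)

def lookup(offsets, index):
    """Byte offset, within the Essence Container, of the given edit unit."""
    if not 0 <= index < len(offsets):
        raise IndexError("edit unit is outside the indexed range")
    return offsets[index]

# Hypothetical essence: three edit units of varying size, starting at 10:00:00:00.
offsets = build_offsets([100, 150, 120])
index = timecode_to_index("10:00:00:02", "10:00:00:00", fps=25)
print(index, lookup(offsets, index))
```

A decoder holding such a table can seek directly to the wanted edit unit instead of parsing the essence from the start.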
MXF files may also include optional File Body Partitions that can be inserted at intervals within the File Body
and are used to provide a variety of features:
1. Robustness of metadata information by repetition of the Header Metadata.
2. Multiplexing of different Essence Containers
3. Distributing Index Tables in small chunks (e.g. for devices with limited memory)
4. Providing “per-stream” Index Tables that are position independent within the file
5. Easier location of Essence Container data when using high speed tape devices
6. Optimizing the distribution of the data in a file for storage or transmission
Whether the Header Metadata is repeated within a Body Partition depends on the application. Such applications
are to be found in the transfer of an MXF file as a stream over a uni-directional link and in data tape
shuttling. One purpose of such Header Metadata repetition is to support the
recovery of critical metadata in applications where the file may be interrupted or where the decoder starts to
receive data in mid-transfer.
Multiplexing and storage optimization is a complex subject and is highly dependent on the storage or
transmission device used. Hard discs, DVDs, satellite links and tape devices all have different requirements.
The MXF structure allows a great deal of flexibility in the positioning of the partitioning information and the use of
fillers to allow optimization for different devices. Typically, if storage or transmission optimization is important in
an application then the MXF encoder will know which parameters are important to it. MXF provides the tools, but
encoders can make the optimizations that add value to their implementations.
MXF files use Key-Length-Value (KLV) coding throughout for flexibility and extensibility. KLV coding is defined in
SMPTE 336M; a full review was published in the July 2000 edition of the SMPTE Journal (Vol. 109, No 7,
Engineering Report). This mechanism is used to encapsulate the individual elements of an MXF file in such a
way that devices can ignore information when the Key of a KLV triplet is unknown. The Length parameter tells
the KLV decoder how much data should be ignored.
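The skipping behavior described above can be sketched as follows. This is a minimal illustration, assuming a seekable byte stream that starts on a Key and contains only 16-byte Keys with BER-coded Lengths; the function names are hypothetical and are not taken from SMPTE 336M or SMPTE 377M.

```python
import io

KEY_LEN = 16  # a SMPTE key occupies 16 bytes

def read_ber_length(stream):
    """Read a BER-coded Length (short or long form) from a binary stream."""
    first = stream.read(1)
    if not first:
        raise EOFError("stream ended inside a Length field")
    if first[0] < 0x80:
        return first[0]                 # short form: the byte is the length
    count = first[0] & 0x7F             # long form: number of bytes that follow
    body = stream.read(count)
    if count == 0 or len(body) != count:
        raise ValueError("unsupported or truncated Length field")
    return int.from_bytes(body, "big")  # most significant byte first

def walk_klv(stream, known_keys):
    """Yield (key, value) for known Keys; skip the Value of unknown Keys."""
    while True:
        key = stream.read(KEY_LEN)
        if not key:
            return                      # clean end of stream
        if len(key) != KEY_LEN:
            raise ValueError("truncated Key")
        length = read_ber_length(stream)
        if key in known_keys:
            value = stream.read(length)
            if len(value) != length:
                raise ValueError("truncated Value")
            yield key, value
        else:
            stream.seek(length, io.SEEK_CUR)  # ignore exactly `length` bytes
```

Because the Length is read before the Key is examined, the walker never needs to understand a Value in order to find the next Key.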
In Specialized Operational Patterns, the Header (see section 3.5.2 below) is allowed to start with a non-KLV
run-in. This is to allow synchronization bytes or “camouflage” bytes to be added at the front of the file in certain
(limited) applications. In all other circumstances, there will be no run-in and the entire file must consist only of
KLV elements with no gaps.
There is the physical view of the MXF byte stream on disk or on the wire.
There is the description of the file contents obtained by decoding the data model. This will be referred to as the
logical view of the file.
[Figure: physical view of an MXF file as a sequence of KLV-coded items (Partition Pack, Header Metadata sets, fill, essence Elements), and the logical view of the same MXF File (a Material Package with "played" Picture and "played" Sound tracks, and a Top-Level File Package with a Stored Picture Track and a Stored Sound Track)]
The physical properties of the file are largely independent of the number of tracks in the file, the amount of
metadata carried and the relationship between the different Picture and Sound Elements. The way in which an
MXF File is written depends on the MXF encoder and the application. Many application-specific optimizations may be
incorporated into an application to improve the way an MXF file is physically written to a device.
The Material Package can generally be thought of as the “output timeline” of the file. The Top-Level File
Package can be thought of as the stored data or “input timeline” of the file. The metadata within the file
describes the stored data within the file as well as the portion that is to be output when the file is played or used
in some way. The example in Figure 2 shows that all the tracks of the stored data (in the Top-Level File
Package) are used in the Material Package, but an MXF player will play only a small segment from the middle of
the file.
The Structural Metadata is the way in which MXF describes different Essence types and their relationship along
a timeline. The MXF Structural Metadata defines the way in which the output timeline of the file relates to the
one or more stored Top-Level File Packages. The Structural Metadata defines the synchronization of different
tracks along a timeline. It also defines the Picture Size, Picture Rate, Aspect Ratio, Audio Sampling and other
essence description parameters.
The Structural Metadata is defined in SMPTE 377M. Most of the parameters are defined in the MXF File Format
document, but additional descriptors and labels may be defined in essence mapping documents. The MXF
Structural Metadata is derived from the AAF data model. This means the relationships between all the different
sets and their properties are precisely defined. More information on the structural concepts appears later in this
document.
3.3.2.2 Descriptive Metadata
MXF Descriptive Metadata comprises information in addition to the structure of the MXF File. This may be
intended for human use (as in the majority of the SMPTE 380M: MXF DMS-1 specification) or it may be
information for machine use, such as a track of information containing depth information for 3D processing.
SMPTE 377M provides a very simple plug-in mechanism that allows different Metadata sets to be defined and
used in an MXF environment. SMPTE 377M provides mechanisms for uniquely identifying the Metadata
Scheme(s) present in the file, mechanisms for preventing numerical conflict with existing metadata and a
mechanism for determining the version of the Descriptive Metadata Specification used.
The MXF Metadata plug-in scheme was developed as a result of strong User Requirements. No single Metadata
definition and structure will be appropriate for everyone. A mechanism that properly allows the integration of new
metadata schemes without redeveloping applications and equipment needed to be created. The MXF plug-in
mechanism is very lightweight and allows versatility for the implementers and extensibility for the users.
When Descriptive Metadata is added using the plug-in mechanism, many of the features of MXF are achieved
automatically. The ability to create multiple tracks and synchronize them against each other, the ability to add
metadata events synchronized with the video / audio or other tracks and the ability to use metadata in the output
timeline that was available in the source file are all part of the standard MXF feature set. This document will
outline only the basics of a descriptive metadata scheme. A fuller treatment of the subject can be found in the
Descriptive Metadata Engineering Guideline, SMPTE EG42. It is worth noting that Descriptive Metadata can be
for both Human and Machine use. Much of the machine-Descriptive Metadata relates to special properties of the
Essence and has an intimate spatio-temporal relationship to the Essence. For this reason it is often called
Intimate metadata.
Dark metadata is the term given to metadata that is unknown by an application. This metadata may be privately
defined and generated, it may be new properties added to SMPTE 377M or it may be metadata that is part of
the MXF standard, but not relevant to the application processing the MXF file. It is important that there are some
rules on the use of Dark metadata to prevent numerical or namespace clashes when private metadata is added
to a file that already contains Dark Metadata. Rules are given in SMPTE 377M along with the specification of
a data structure called the Primer Pack. Guidance on the use of this structure is given in section 8.5.1 of this
document (→8.5.1).
Although only occupying a small fraction of the size of a typical MXF file, the Header Metadata is often, for those
inexperienced in data models, the most difficult part to understand. The following sections introduce the topic of
object-oriented coding in a general and easy to understand manner. For a more rigorous explanation, there are
many reference books that cover the principles and methods of implementation in far more detail than given
here.
A track can be thought of as a straight line on a piece of paper. It starts at the start; it ends at the end and it lasts
for its duration. The start, end and duration are known as properties.
A feature of object orientation is “inheritance”. This means that we can have different sorts of track. They all
share some common properties that they inherit from the parent or superclass, but have extra properties or
functionality added to make them useful. For example, consider an event track. The straight line on the piece of
paper can now be marked with events. An event can start at any point along the track. It may be instantaneous
(i.e. no duration) or it may last for a defined time. Events may also overlap.
Another sort of track is a timeline track. Similar to an event track, it starts at a certain time, ends at a certain
time and has a duration. This track has a restricted functionality in that it only allows Source Clips to be placed
on the track. All the Source Clips must be contiguous, which means there are no overlaps and no gaps. Both of
these track types inherit properties and functionality from a common track class.
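The inheritance described above can be sketched in a few lines. The class and method names below are illustrative assumptions only; they are not the class definitions of SMPTE 377M or of the AAF class model.

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    """Common parent: every track has a start and a duration."""
    start: int
    duration: int

    @property
    def end(self):
        return self.start + self.duration

@dataclass
class Event:
    start: int
    duration: int = 0   # 0 means an instantaneous event

@dataclass
class SourceClip:
    duration: int

@dataclass
class EventTrack(Track):
    """Events may start anywhere on the track and may overlap."""
    events: list = field(default_factory=list)

    def add_event(self, event):
        if event.start < self.start or event.start + event.duration > self.end:
            raise ValueError("event lies outside the track")
        self.events.append(event)

@dataclass
class TimelineTrack(Track):
    """Only Source Clips are allowed, placed back to back."""
    clips: list = field(default_factory=list)

    def append_clip(self, clip):
        used = sum(c.duration for c in self.clips)
        if used + clip.duration > self.duration:
            raise ValueError("clip would run past the end of the track")
        self.clips.append(clip)

    def clip_spans(self):
        """Positions are derived from order, so gaps and overlaps cannot occur."""
        spans, position = [], self.start
        for clip in self.clips:
            spans.append((position, position + clip.duration))
            position += clip.duration
        return spans
```

Both subclasses reuse the start, duration and end of the parent, and each adds only the behavior that makes it useful.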
This principle is the basis for the object-oriented definition of the MXF file Format. The definition of the classes
from which MXF objects are created comes primarily from the AAF Association class model. Generic classes
with general functionality are defined. Classes with specific functionality then inherit the general class features.
SMPTE 377M restricts some of the flexibility of these AAF classes to define the MXF sets. MXF applications
populate these sets with values to create MXF objects in a file.
During the development of MXF, a Zero Divergence Doctrine (ZDD) was created in order to ensure that any
change in the model of behavior between AAF and MXF was severely restricted and eliminated wherever
possible.
the file such as a program name or scene description. There are a large number of metadata elements defined
in the SMPTE dictionary and in SMPTE 377M. To understand the restrictions on the use of metadata elements,
it is necessary to understand the terminology in section 5.2 below.
The Header Metadata area is able to contain Descriptive Metadata that allows a production to be described. For
example Production, Clip and Scene information is described in the MXF Descriptive Metadata Scheme 1
document (part 4 of this specification).
There are certain metadata parameters that might live in multiple places. The most obvious of these is
Timecode. This may exist in the Header Metadata, but might also live embedded within the Essence Container
data, e.g. in the GOP header of an MPEG Essence Container. This repetition is often important and the handling
of any conflict between the different instances of the data is application dependent.
3.4.5 How does AAF fit into the big picture?
At a first glimpse, the relationship is obvious; MXF is for simple transfers and AAF is for authoring. Both formats
exist to aid interchange of program material as files, which in turn will increase interoperability between file-
based products.
The meaning of the opening sentence is a little more difficult than it first seems. "Authoring" can be seen as a
catch-all phrase for a series of complex processes that take pieces of video and audio essence and put them
together using a variety of composition effects (cuts, dissolves, DVE, rendering, magic). When the authoring
process is complete, the "finished" program material can then be exchanged as a file. This is a simple transfer of
the compiled / rendered / etc., program.
The complexity of AAF has been simplified so that we can state that: "MXF files apply a subset of the AAF class
model". This means that the complexity of the authoring file format has been simplified. But beware; "simplified"
does not mean "completely obvious and like SDI". It is important to remember here that we are mixing two very
complex and different worlds – A/V and IT.
When video engineers look at a series of words in an SDI stream, there is an implicit understanding of the
complex spatio-temporal sampling and visual processing that went into creating those data words. A video
engineer would take great care before modifying any value to ensure proper clipping, filtering and possible
gamma correction took place.
The IT environment that has created AAF is just as complex (and just as "obvious" to those practiced in the art).
AAF arranges its file format in terms of objects. These objects are chosen and defined to reflect the actual
processes and content items that go into the authoring process. AAF is so powerful that the physical
representation of these objects could be redefined providing no information is created or lost. An IT engineer
would take great care before modifying the object model to remove things that looked like they were not needed
- the implications for future enhancement and interoperability might be very serious and not "obvious".
"MXF files apply a subset of the AAF class model" means that MXF contains just enough of the AAF object
model to allow it to represent a file interchange. This means it can represent an output timeline that has video,
audio and data. It has a logical metadata structure, a defined physical representation (KLV) and is interoperable
with other MXF systems and upward compatible with AAF. It has been designed so that an AAF system can
open an MXF file without modification to either the MXF file or the AAF System.
In practical situations this means that there is a lot of overlap between MXF and AAF functionality. MXF is
targeted at interchange throughout the broadcast and content creation chain, whereas AAF is optimized for
round-tripping in Post-Production. As a rough rule of thumb, content interchange, cut-edit functionality or simpler
is an MXF application; AAF is more appropriate for everything else. More details are given in section A.1.
To synchronize two tracks, they must be somehow related. This is done by putting them within a package (a
container for tracks) that synchronizes the start and duration of multiple tracks. Note, however, that the tracks
may have different time measurement units within the package. Time is normalized within a track by its “Edit
Rate” property. This in turn gives us an Edit Unit, of 1 / Edit Rate.
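As a worked illustration of the Edit Unit, the sketch below converts a position between two tracks that have different Edit Rates. The function names and the example rates (25 Hz picture, 48 kHz sound) are assumptions chosen for the example; exact rational arithmetic is used so that rates such as 30000/1001 do not accumulate rounding error.

```python
import math
from fractions import Fraction

def edit_unit_seconds(edit_rate):
    """Duration of one Edit Unit: 1 / Edit Rate."""
    return 1 / Fraction(edit_rate)

def convert_position(position, from_rate, to_rate):
    """Express a position in the Edit Units of another track (rounded down)."""
    seconds = position * edit_unit_seconds(from_rate)
    return math.floor(seconds * Fraction(to_rate))

# 50 picture Edit Units at 25 Hz are 2 seconds, i.e. 96000 sound Edit Units at 48 kHz.
print(convert_position(50, 25, 48000))
```

Rounding down is a design choice of this sketch; an application may need a different rounding rule.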
An MXF file is highly structured. There are different structural elements that divide the file in different ways to
make the complexity manageable. This section describes some of these structural elements along with the
reasons for the division.
3.5.1 What are the File Header, the File Body and the File Footer?
The basic File Header, File Body and File Footer are explained in section 3.2.1 above. The reason for the split is
quite simple. The File Header is designed to be small enough that it can easily be isolated and sent to a
microprocessor for parsing. The bulk of the file will usually be the File Body – this is the picture, sound and data
essence. The File Footer provides a means to put the Header Metadata at the end of the file. Why? In certain
applications such as recording a stream to an MXF file, there will be Header Metadata values that won’t be
known until the recording is finished. The File Footer provides a mechanism for doing this. It also provides clear
indication that the file has terminated.
Open – this marks the information in a partition with a “caution” notice. Any metadata information in the partition
was correct at the time of writing, but the application writing the file had not completed the writing process. This
means that some of the information may be absent, or may turn out to be plain wrong when the file is ultimately
closed. For example, a capture device may have identified a picture and a sound track when it initially started
writing the file. During the writing process, a second Sound track commenced – this track was not described in
the Open Header Metadata.
Closed – this marks any metadata information in the partition as finalized. The application or device creating the
file correctly terminated the file and all the properties of the Metadata sets were filled in to the best of the
application’s ability. In the example above, a repetition of the Header Metadata would be placed in the footer
that correctly described the existence and duration of the second Sound Track. All closed partitions in a file must
have the same Metadata property values. This is mandatory. This allows an MXF decoder to determine that the
metadata is correct as soon as it finds a closed partition. SMPTE 377M states that the File Footer, if present, will
always be a closed partition.
An MXF File can only be called a “Closed” File if there is at least one closed partition with Metadata. It is
important to note that robustness is enhanced when all the partitions in a file are closed (→8.2.6). If a file is
accidentally truncated during a transfer and the only closed partition in the file was the footer, then the file is no
longer a “Closed File”. If robustness is desired (and it usually is), application and device developers are urged to
close all the partitions of their files. All valid MXF files must be closed; however, certain situations, such as an
interrupted file transfer, may leave an “Open” file that is still partly usable. The ability of a device to handle
“Open” MXF files is an application issue.
In an ideal world, the two states of “Open” and “Closed” would be sufficient to describe all the files in existence.
The desire for cheap hardware and software, however, means that some capture devices and applications will
not be able to parse the wide variety of essence types they might expect to place in an MXF file. To cope with
this condition, the states “complete” and “incomplete” have been defined to mark the status of the Essence
Descriptor(s) in the MXF File.
Complete – each of the properties in the Header Metadata with a status of “required” or “best effort” exists in the
file and is correct. The status of each of the properties is given in SMPTE 377M.
Incomplete – One or more properties within the Header Metadata with a status of “best effort” has a
distinguished value. The distinguished value is used to mark the property as “unknown at the time of writing”. An
MXF file may still be a closed file because all the other properties of the file are known. Some of the Header
Metadata may be incomplete due to the absence of an essence parser at the time of file creation. This allows an
application to report many of the metadata properties of the file, but certain Essence Decoders may need to
parse portions of the file before it is playable.
Maximum robustness is achieved when applications and devices create Closed and Complete MXF Files
(→8.2.6).
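The rules above can be restated as a small sketch. The record type and its field names are hypothetical and do not correspond to the Partition Pack layout of SMPTE 377M; the sketch assumes that the status of every partition has already been read from the file.

```python
from dataclasses import dataclass

@dataclass
class PartitionStatus:
    closed: bool               # False means the partition is Open
    complete: bool             # False means a "best effort" property is unknown
    has_header_metadata: bool  # True if the partition carries Header Metadata

def is_closed_file(partitions):
    """A Closed File has at least one closed partition with Metadata."""
    return any(p.closed and p.has_header_metadata for p in partitions)

def is_most_robust(partitions):
    """Maximum robustness: every partition is Closed and Complete."""
    return bool(partitions) and all(p.closed and p.complete for p in partitions)

# An Open header followed by a Closed footer is a Closed File,
# but losing the footer through truncation leaves an Open file.
header = PartitionStatus(closed=False, complete=True, has_header_metadata=True)
footer = PartitionStatus(closed=True, complete=True, has_header_metadata=True)
print(is_closed_file([header, footer]), is_closed_file([header]))
```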
Each partition starts with a Partition Pack that defines what sort of partition it is, followed by these
optional items:
• Header Metadata
• Index Table Segment(s)
• Essence Container data
From these and other restrictions, we limit an MXF partition to contain only a single “thing”, i.e. a single Essence
type with its associated Index Table Segments. If different Essence Containers need to be multiplexed together
within the file, then a new partition must be started when the Essence Container changes.
KLV coding is fully defined in SMPTE 336M and includes not just the encapsulation of individual data items, but
also the encapsulation of collections of individually coded KLV data items into logical data sets and packs (a.k.a.
objects as above).
A decoder that does not recognize a Key is able to skip over the unknown Value and inspect the next Key. This
allows extra functionality to be added to the MXF specification at a later date, knowing that older decoders will
be able to skip over the Values.
Words within the Key are ISO Object Identifiers (OID) using primitive BER (Basic Encoding Rules: ISO/IEC
8825-1 ASN.1). This means that the most significant bit of each 8 bit value is a flag to say that the word is
greater than a 7 bit value. For example if the 12 bit value b (b11 .. b0) is to be mapped into a KLV key then here is
a possible mapping into bytes 14 and 15 of a key:
Word    14   14   14   14   14   14   14   14   15   15   15   15   15   15   15   15
Bit      7    6    5    4    3    2    1    0    7    6    5    4    3    2    1    0
Value    1    0    0  b11  b10   b9   b8   b7    0   b6   b5   b4   b3   b2   b1   b0
Figure 3 shows a binary 1 in bit 7 of byte 14 to indicate that this is a multi-byte value. There is a binary 0 in bit 7
of byte 15 to show that this is the last byte of a multi-byte value. A byte value of 0 is often used to terminate a
label and a marker bit in bit 6 of byte 15 may be used to prevent accidental termination from occurring. Note that
the actual mapping of bits into a label Key must be normatively defined in an appropriate document.
Note: At the time of writing this Engineering Guideline, this multi-byte OID technique is not in use in any of the specifications.
MXF parser writers should be aware that this technique may be used in the future and that, although the number of bytes in
a SMPTE key is 16, the number of words may be less than 16; alternatively, there may be 16 bytes of which the final
words are assumed to be 0.
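The possible mapping of a 12 bit value into bytes 14 and 15 can be written out as a short sketch. The function names are hypothetical, and, as stated above, any real mapping of bits into a label Key must be normatively defined elsewhere; this only reproduces the bit layout of the example.

```python
def pack_12bit_word(value):
    """Split a 12 bit value (b11 .. b0) across two bytes as in the example."""
    if not 0 <= value < (1 << 12):
        raise ValueError("value must fit in 12 bits")
    byte14 = 0x80 | (value >> 7)  # bit 7 = 1: more bytes follow; bits 4..0 = b11..b7
    byte15 = value & 0x7F         # bit 7 = 0: last byte; bits 6..0 = b6..b0
    return bytes([byte14, byte15])

def unpack_12bit_word(pair):
    """Recover the 12 bit value from the two bytes."""
    high, low = pair
    if not high & 0x80 or low & 0x80:
        raise ValueError("not a two-byte value in this layout")
    return ((high & 0x7F) << 7) | low

# 0xABC is 1010 1011 1100 in binary: the top five bits go to byte 14, the rest to byte 15.
print(pack_12bit_word(0xABC).hex())
```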
The Length field is BER (Basic Encoding Rules: ISO/IEC 8825-1 ASN.1) coded. This allows the length field to
have a variable number of bytes. So how do you know the length of the length field?
The length field is always coded MSB (most significant byte) first. If bit 7 of the first byte is a ‘0’ then the 7 least
significant bits contain the length value (0 .. 127). If bit 7 of the first byte is a ‘1’ then the 7 least significant bits
tell you the number of bytes that follow in the length field, e.g. the value ‘83h’ means that the next 3 bytes contain
the length value. The Format document gives recommendations for the upper limit of the length field. Decoders must
be able to handle both long form and short form BER coding.
The examples below show a length value of 64 coded in the 3 different ways:
40h short form coded
83.00.00.40 long form coding using 4 bytes overall
87.00.00.00.00.00.00.40 long form coding using 8 bytes overall
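The three codings above can be reproduced with a short sketch. The function names and the `total_bytes` parameter are assumptions made for illustration; they are not part of SMPTE 336M.

```python
def encode_ber_length(value, total_bytes=None):
    """Encode a length; total_bytes=None selects the shortest coding."""
    if value < 0:
        raise ValueError("length must be non-negative")
    if total_bytes is None:
        total_bytes = 1 if value < 0x80 else 1 + (value.bit_length() + 7) // 8
    if total_bytes == 1:
        if value >= 0x80:
            raise ValueError("short form holds 0 .. 127 only")
        return bytes([value])
    count = total_bytes - 1           # bytes that follow the first byte
    if not 1 <= count <= 0x7F:
        raise ValueError("unsupported size for the length field")
    return bytes([0x80 | count]) + value.to_bytes(count, "big")

def decode_ber_length(data):
    """Return (length value, number of bytes the length field occupies)."""
    first = data[0]
    if first < 0x80:
        return first, 1               # short form
    count = first & 0x7F              # long form
    if count == 0 or len(data) < 1 + count:
        raise ValueError("unsupported or truncated length field")
    return int.from_bytes(data[1:1 + count], "big"), 1 + count

for size in (None, 4, 8):
    print(encode_ber_length(64, size).hex("."))
```

All three codings decode to the same value of 64; only the number of bytes occupied by the length field differs.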
The KLV Alignment Grid (KAG) can be thought of as gridlines spaced on uniform byte boundaries in each partition.
To achieve good performance, all the important KLV items within the file (Header Metadata, Content Packages of the
Essence etc.) should line up on the Grid. This means that the first byte of the Key should be on a grid boundary.
The reference point for a KAG is the first byte of the key of a Partition Pack, and the KAG value is valid within
the partition. SMPTE 377M states “The first gridline in any partition is the first byte of the Key of the Partition
Pack that defines that partition.” In order to have a global KAG value, each and every Partition Pack must have
the same KAG value. Additionally, to maintain this global KAG value, the first byte of each and every Partition
Pack must lie on a KAG boundary. Finally, if there is a run-in, its length in bytes must be an integer multiple of
the KAG value.
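The grid arithmetic can be sketched as follows. The function names are hypothetical, and the minimum fill size of 17 bytes is an assumption of this sketch (a fill item is itself a KLV, so it is taken here to need at least a 16-byte Key and a 1-byte Length); the normative rules are those of SMPTE 377M.

```python
def next_gridline(offset, kag, partition_start=0):
    """First gridline at or after `offset`, counted from the Partition Pack Key."""
    if kag < 1:
        raise ValueError("KAG value must be at least 1")
    relative = offset - partition_start
    if relative < 0:
        raise ValueError("offset lies before the partition")
    return partition_start + -(-relative // kag) * kag  # ceiling division

def fill_needed(offset, kag, partition_start=0, min_fill=17):
    """Bytes of fill required so that the next Key starts on a gridline."""
    gap = next_gridline(offset, kag, partition_start) - offset
    while 0 < gap < min_fill:
        gap += kag  # too small to hold a fill item: go to the following gridline
    return gap

# With a KAG of 512, an item ending at byte 1000 needs 24 bytes of fill,
# but one ending at byte 1020 must fill through to byte 1536.
print(fill_needed(1000, 512), fill_needed(1020, 512))
```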
This feature is a performance enhancer because it reduces the need to search every byte for the start of a new
file component. It is possible that some process may make a change to a file that breaks the KAG rules, but is
unable to modify the KAG value in the partition header. An MXF decoder that is receiving a file may desire a
certain KAG value because its internal storage is arranged on rigid boundaries. It should continue to check each
of the KLV triplets received for confirmation that they still lie on the KAG. The majority of files that use the KAG
feature will respect the value in the partition header, but some may not. The MXF application receiving the file
that does not respect the KAG should not fail under this condition, but performance may be severely restricted.
For example, the receiving application may choose to process the incoming stream to force it to be aligned to
the KAG by inserting Fill KLVs. This may slow it down and cause it to recalculate Index Tables.
The Random Index Pack (RIP), however, gives absolute positions of the Partitions, so all the Index Tables may be
rapidly built without parsing the entire file. The RIP contains a mechanism for quickly determining its existence.
Each Operational Pattern has an assigned SMPTE Label value that allows MXF decoders to quickly recognize
the complexity of an MXF file.
An Essence Container specification defines a unique SMPTE Label for identification as well as a method for
encapsulating the essence in a KLV structure. Different Essence Containers may place restrictions on the
interleaving of the essence data to be compatible with existing applications. The SMPTE Label allows decoders
to make a fast go/no-go check of the essence type at the very beginning of the file.
An MXF file may have more than one Essence Container. The precise number of Essence Containers and their
relationships is constrained by the Operational Pattern with which the file complies.
A “Generic Container” is defined within MXF. This is intended to carry all the mainstream Essence types in
existence at the time of creating SMPTE 377M. It is very simple in operation, yet flexible enough to carry
uncompressed material as well as re-ordered MPEG compressed material. Associated with the Generic
Container are a number of mapping documents that define how the actual Essence byte stream should be
placed in the Essence Container.
MXF files may be directly created from standardized formats such as MPEG2 system and elementary streams,
AES3 data streams and DV DIF packet streams. These formats may be mapped from one of several real-time
interfaces such as SMPTE 259M (SDI), SMPTE 305M (SDTI), SMPTE 292M (HD-SDI), or transport interfaces
with real-time protocols such as IEEE-1394, ATM, IEEE802 (ethernet), ANSI Fibre Channel and so on.
When a streaming file is captured, a File Header is created and the essence is KLV wrapped on the fly. The
data rate increases due to the KLV wrapping and addition of headers. Real Time streaming devices must ensure
that any buffering requirements of a streaming interface are catered for with this change of data rate.
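The change of data rate caused by KLV wrapping can be estimated with a short sketch. The Python below is illustrative only: it assumes a 16 byte key followed by a BER coded length, the helper names are hypothetical, and the all-zero key used in the usage note is a placeholder rather than a registered SMPTE key.

```python
def ber_length(n: int) -> bytes:
    """Encode a length in BER: short form below 128, long form otherwise."""
    if n < 0:
        raise ValueError("length cannot be negative")
    if n < 0x80:
        return bytes([n])                      # short form: one byte
    size = (n.bit_length() + 7) // 8           # bytes needed for the value
    return bytes([0x80 | size]) + n.to_bytes(size, "big")


def klv_wrap(key: bytes, value: bytes) -> bytes:
    """Wrap a block of essence as Key, Length, Value."""
    if len(key) != 16:
        raise ValueError("this sketch assumes a 16 byte key")
    return key + ber_length(len(value)) + value


def klv_overhead(value_size: int) -> int:
    """Bytes added by the wrapping for a value of the given size."""
    return 16 + len(ber_length(value_size))
```

For example, `klv_wrap(bytes(16), b"abc")` produces 20 bytes for 3 bytes of payload, and `klv_overhead(144000)` is also 20 bytes. A streaming device would add this per-KLV overhead, plus any headers, to the source data rate when sizing its buffers.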
Conversion to and from the source format is always possible, but sometimes there will be loss of information.
Not all streaming and storage formats are able to store the rich metadata constructs available in an MXF file.
Often there will be a lossy data mapping where information in one format cannot be represented in the other.
Eliminating this undesired loss is a function of the systems engineering that interconnects MXF and non-MXF
systems. In many formats such as the MPEG2 Transport Stream, research is being done to find ways in which
MXF headers can be “tunneled” through the Transport Stream so that its use in an MXF system provides
transparency as well as interoperability.
As previously stated, MXF files apply a subset of the AAF class model. The Material Exchange Format provides
a data structure together with a set of constraints and plug-ins to create files that can be directly written and read
by AAF systems. MXF is also able to inter-operate with other existing file formats by utilizing techniques such as
external essence and using the run-in to “camouflage” the appearance of the file (see the end of 3.2.1 above).
Different metadata models can be plugged into the MXF file Format to provide extensions and the KLV structure
itself can be converted to formats such as XML for exporting MXF data to other systems.
When an application needs to convert the contents of an MXF File to and from other formats, such as AVI, the
entire file will normally need to be unwrapped and re-coded in the new format. Often the Essence itself (for
example, MPEG Long GOP video) will not need re-MPEG encoding; however, it is very likely that Metadata will
be lost when an MXF file is converted to another format.
MXF files must be amenable to implementation in high throughput hardware or software devices. This translates
into the need for well-defined design parameters for buffer size, latency, and the need for algorithmic simplicity.
MXF is also intended to cover a very large application space, and not all the requirements apply to all the
applications. The examples below are all application specific:
Example constraints:
• Buffer size must be minimized for low latency streamability.
• KLV wrapping and file partitioning latency must be small and bounded.
• Algorithms should not require distant look-ahead to calculate parameter values.
• Algorithms should not require deep stacks or high performance coprocessors, and should preferably be
straight-line (no looping).
• Operational Patterns should create controlled and bounded application environments that are
constrained enough to ensure interoperability, yet broad enough to allow many implementations.
The design can also be kept simple through the proper use of layering. Network, transport and session layer
functions and data units must be kept separate at all costs, so as not to burden any layer with processing that
belongs to another layer.
MXF files will often be processed in streaming environments. This will include streaming to and from videotape
and data tape, and transmission over unidirectional links or links with a narrow-band return-channel.
In these environments it is impractical to rewind the stream to update parameter values so files must be written
sequentially. This implies that the minimum buffer size and latency are determined by (among other things) the
maximum KLV packet size. Implementations of MXF streaming should take into account all the constraints of
the Operational Pattern in use, as well as extra restrictions imposed by the particular streaming data link before
recommending buffer sizes or latency requirements.
Sequential writing is necessary when source or link or destination operate only in streaming mode. Random
access writing is permissible before or after data transfer, for example, to optimize downstream access
performance.
Operational Patterns have special qualifier bits that indicate that the file has been created for streaming.
Streaming environments also impose requirements for recovery and re-synchronization in several different
circumstances:
1. When a packet or other data block is lost.
2. When a decoder joins a transfer that is already in progress.
3. When a transfer or partial transfer is restarted.
4. When it is necessary to access or retransmit a file that is still being received (“Pre-Play”).
5. When overall metadata is modified during the time of transfer.
The first of these (packet loss) usually requires a return-channel or forward error correction for effective
protection. The other circumstances are addressed by judicious design of the Format to allow for
re-synchronization points and for repetition of important metadata.
Different applications may require Metadata to be processed separately from the Essence. Other applications
(such as archive) may require Metadata to be stored with the Essence. This requires efficient insertion and
extraction of the Metadata from the Essence Container(s) of the file.
Some applications may prefer Index Tables to be accessed separately from the Essence; others may require
the two to be accessed together. In some cases, the Index Tables are most naturally stored at the start of the
file; however, while recording, the most natural location is at the end of the file. This diversity requires efficient
insertion, extraction and relocation of Index Tables within the file.
MXF uses different referencing mechanisms for different purposes. One example that causes confusion is the
difference between references to the Top-Level “File Package” and “The Essence”. The MXF Content Storage
Set uses Instance UIDs to reference all the Packages in an MXF File. One of these will match the Instance UID
of a File Package within the File. This is a strong reference to the package. The package itself is a description of
the Essence, but is not the Essence itself.
The Content Storage Set also uses Instance UIDs to keep a list of Essence Container Sets. These are used to
group the various IDs that enable an MXF Decoder to work out which Partitions and Index Tables relate to which
Top-Level File Package. Specific details are given in section 7.5.
This seems straightforward until we look at how a Material Package SourceClip references the Essence. This
structure does not use the Instance UID values; instead, it uses the 32-byte UMID of the essence as a reference. This is
because the Material Package is referencing the Essence of which the Top-Level File Package is a description.
The MXF Format, at its lowest level, should support functionality that is commonly available in today’s video file-
servers. The MXF / AAF Joint File Interchange Working Group, in co-operation with the EBU P/PITV group and
the SMPTE have summarized the user requirements for MXF as follows:
Priority key:
A = Baseline ("Must")
B = Enhanced ("Can")
C = Extended ("May")
U = Undecided or not determined
X = Not allowed (should not be allowed)

Requirement | Priority | Authoring Interchange | Finished Interchange | Content Repository | Publication (Emission, Transmission, Store & Forward, etc.)
Must be easy to understand & apply and standardized | A++ | Y | Y | Y | Not easy
Must wrap: Video Essence[s], Audio Essence[s], Data Essence[s], Metadata | A | Y | Y | Y | Y
A/B LIST
Can provide random access: Play/access while transfer; Play/access while record (open ended) | A/B | Y | Y | Y | Y
Fast frame and field level access (e.g. by means of indexing to field/frame/audio frame level) | A/B | Y | Y | Y | Y
B-LIST
The technical requirements derive from the user requirements. The individual requirements are introduced
gradually throughout the document. A typical example of a technical requirement is that of Body Partitions. The
user requirements state that the file format must support partial transfers and must provide graceful recovery
after errors. The technical requirement from this is that the file must periodically contain repeated data to allow
partial transfers or recovery. The implementation chosen in MXF is Body Partitions.
differences with AAF have not been highlighted in the specification because the MXF standard is self-consistent.
The main glossary of terms and data types can be found at the start of SMPTE 377M.
5.1.1 Normative
The definition of Normative is given in the SMPTE Administrative Practices. For information, normative parts of a
document cover those elements of the format that are fully specified. The implication of a normative clause is “if
you do this particular function or encoding process, do it like this”. Normative does not imply that all decoders
must understand all normative elements, just as it does not imply that all encoders will encode all normative
elements. Normative clauses use the verb “shall”.
The value of a Normative clause is that it defines the parameters and syntax for a given function or process.
5.1.2 Informative
Informative parts of a document provide additional explanation or describe optional functions or processes. The
implication of an Informative clause is “you may do this particular function like this”. The value of an Informative
clause is that it provides an illuminating example of how to achieve a function or process to improve
interoperability. Informative clauses use the verb “may”.
Since neither Normative nor Informative conveys any information as to which functions an implementation is
expected to perform, additional terminology is needed.
5.1.3 Recommendations
There are many recommendations in SMPTE 377M. There are many places where it was desirable to make a
normative provision, but the provision could not be enforced. For example “the duration property should be
correct in all Header Metadata repetitions”. Devices such as cameras cannot create an MXF File with the correct
duration because the header is written before the file is closed and completed. This provision is therefore a
recommendation rather than a normative requirement. Recommendations use the verb “should”.
One of the key points of developing any new techniques is to consider the layering of any file format and its
contents. This helps us to understand the meaning of an ‘encoder’ and a ‘decoder’ at any given layer.
Unfortunately attempts to introduce new words such as “encapsulate” have not been well accepted and words
such as “encoder” are forced to have slightly different meanings depending on context.
The layers for encoders and decoders can be broken down as follows:
Note that not all processes will be supported by all equipment. Many devices will operate over all layers to
provide a network or stream interface at the lowest layer, and an interface to the user at the highest layer.
However, devices that simply ‘store and forward’ need only respond to the lowest 2 layers and devices that
‘unwrap’ the data contents to provide the raw data streams only respond to the lowest 3 layers.
The following terms have been proposed to describe functionality that must be supported in order to create an
interoperable MXF environment. SMPTE 377M defines the normative terms; extra text and words are given here
for information.
Summary:
5.3.1 Required
A Required Item is essential to both encoder and decoder. An example of a required metadata item is a Preface
Set. The encoder must encode this and the decoder must understand it and act on it.
5.3.4 Optional
An Optional Item may be sent by the encoder if it is known. If sent, the decoder may choose to ignore the Item.
If not sent, then the decoder may either do nothing, or set the item to a default value or take a predefined
default action if specified by the relevant document.
Note that a ‘default’ value for an Item is the value that a decoder should use in the absence of the Item. A
‘distinguished’ value is used by an encoder to signal that the Item value is unknown by the encoder. The
difference between ‘default’ and ‘distinguished’ is important.
5.3.6 Dark
A Dark Item is one that is unknown by a decoder or an encoder. This Item may be proprietary and unknowable
by a decoder. It may be an extension to SMPTE 377M that has not been incorporated into a device or
application. It may even be metadata in the original specification that is not relevant to a device or application.
All that is certain is that the meaning of the metadata is unknown. In certain application environments, encoders
may be required to carry Dark metadata and decoders may be required to make Dark metadata available.
SMPTE 377M uses KLV local sets with 2 byte tags and 2 byte lengths and includes a special pack structure
called the “Primer Pack” to ensure that dark metadata properties can be created and handled without the
possibility of a numerical clash of local tag values.
Why is this important? Imagine that 2 companies X and Y each independently want to extend the MXF
Identification Set to include some vital property of their application in every MXF file that they save. Without the
Primer Pack, there is a finite chance that they will both choose the same local tag value for their private
metadata property and when they open each other’s files, they will misinterpret or even corrupt each other’s
metadata properties. The Primer Pack mechanism exists to prevent this happening.
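The way a Primer Pack removes the clash between companies X and Y can be sketched as follows. This Python is illustrative only: it assumes big-endian 2 byte tags and 2 byte lengths as described above, models the Primer Pack as a simple mapping from local tag to 16 byte UL, and uses made-up tag and UL values that are not registered anywhere.

```python
import struct


def parse_local_set(value: bytes) -> list:
    """Split the Value of a local set into (tag, item bytes) pairs."""
    items = []
    pos = 0
    while pos < len(value):
        if pos + 4 > len(value):
            raise ValueError("truncated tag/length")
        tag, length = struct.unpack_from(">HH", value, pos)  # 2 byte tag, 2 byte length
        pos += 4
        if pos + length > len(value):
            raise ValueError("item runs past the end of the set")
        items.append((tag, value[pos:pos + length]))
        pos += length
    return items


def resolve_items(items: list, primer: dict) -> tuple:
    """Translate local tags to ULs using this file's Primer Pack mapping.

    Returns (known, dark): items keyed by UL, and items whose tag has no
    entry in the primer and therefore cannot be identified.
    """
    known, dark = {}, []
    for tag, data in items:
        ul = primer.get(tag)
        if ul is None:
            dark.append((tag, data))
        else:
            known[ul] = data
    return known, dark
```

If companies X and Y both pick the hypothetical local tag 0xFFF0, each file's own primer maps that tag to a different UL, so a reader resolves the property by UL rather than by tag and cannot confuse the two. A decoder that does not recognize the resulting UL can still treat the item as Dark and carry it unchanged.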
5.3.7 Incompatible
An encoder must not send Incompatible Items. This data classification is provided to allow certain data items to
be forbidden if they could prevent successful or deterministic decoding. There are no “Incompatible” Items
defined within SMPTE 377M, but the concept of Incompatible Items is described here because it gives a
common word for designers and implementers to describe a class of metadata that should be avoided.
An MXF file may have external essence in addition to essence within the MXF File Body. The MXF File Body
may have several Essence Containers that are multiplexed together, each of which can sometimes be called a
stream. Each of these Essence Containers may have a single piece of essence or may have different essence
elements interleaved together. Each of these elements may be categorized into Picture Items, Audio Items, Data
Items and System Items. This results in an MXF File Body that may contain a multiplex of Essence Containers
that in turn contain interleaved Essence items that in turn contain the individual interleaved Essence Elements.
[Figure 4 (logical view). Labels: Material Package; Picture Track; stereo Sound Track; orchestral Sound Track (twice); Top-Level File Package (DV + AES Audio); Stored Picture Track; Stored Sound Track.]
That description is complicated, so a worked example will help to clarify this extreme case of MXF capabilities.
Imagine a file that was captured in DV and has had its stereo Sound extracted and separately edited. Later in
the process, an orchestral score was added in a separate Essence Container and described by a separate
File Package. The Operational Pattern 1b mechanism is used to synchronize the two file packages. The
resulting file looks logically like Figure 4. This seems a simple logical view, but the physical representation is
much more complex, as shown in Figure 5.
[Figure 5: Physical view of the MXF File. A sequence of four Partitions, each introduced by a Partition Pack and followed by KLV wrapped Elements. The Partitions alternate between "DV + stereo in Generic Container" (a DV Compound Element and a Sound Element carrying L. Sound and R. Sound) and "Orchestral Score" (L. Sound and R. Sound).]
The final sentence of the opening paragraph may now be a little clearer. The file is a multiplex of different
partitions; in this case two generic containers are multiplexed using the partition mechanism. One of these
Generic Containers is an interleave of Essence Items – a DV Compound Item and a Sound Item. In each of the
multiplexed Generic Containers, the Sound Items contain an interleave of Sound Elements – left and right
channel. It is also worth noting that the DV itself is an intrinsic interleave of DV-DIF blocks. In most MXF
processes, this level of interleaving is left to the Essence codec and is usually opaque to MXF.
There are normative descriptions of these words in the Format document and in the Generic Container
document. It is strongly recommended that new Essence Container documents follow this wording.
In many discussions of low level wrapping of the data in a Generic Container mapping, the term Essence
Element is used to mean, “A KLV wrapped essence entity that has a defined key”. For any given key, any
Essence Elements with that key relate to the same Essence stream.
In other macroscopic discussions of interleaving and multiplexing, the term Essence Element is used to describe
all the KLV wrapped essence entities with a given key, such as a single video data stream. When a stream has
a single video data stream and an associated audio data stream, the Essence Container would be regarded as
having two Essence Elements, regardless of how many KLVs were used to hold the essence.
This contextual use of the term Essence Element may cause confusion, but the authors felt it would be worse to
try to invent a new term for every one of the subtle changes in context.
In this section, the concept of object implementation will be introduced, as will the idea of collecting objects and
information into packages. This section is intended to improve understanding of the concepts. It is not
intended to be a rigorous definition of the terms. The actual definitions of packages, Strong references and the
like can be found in SMPTE 377M and other MXF documents.
Classes are defined by a set of data items, where each item is commonly called a property. When an instance is
made from a class, it becomes an object and values are assigned to all the properties.
Modeling of a system can involve the creation of many similar classes. In this document, we have described that
there are different sorts of track. Each of these tracks has properties that are very similar. In modeling terms,
there is an abstract superclass that defines the common functionality of all the different tracks. Abstract means
that the class is never used directly. Superclass means that the purpose of this class is to create subclasses that
add to all the properties of the Superclass. A generic Track is an abstract superclass. A Timeline Track and an
Event Track are two subclasses that share all the common properties of the Track class and have added their
own specific properties and behaviors.
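The relationship between the abstract Track superclass and its subclasses can be sketched in code. The Python below is purely illustrative: the property names (track_id, name, edit_rate, origin, event_edit_rate) are assumptions chosen for the sketch and are not the normative property lists, which are found in SMPTE 377M.

```python
from abc import ABC, abstractmethod


class Track(ABC):
    """Abstract superclass: holds the properties common to every track.
    It is never instantiated directly."""

    def __init__(self, track_id: int, name: str) -> None:
        self.track_id = track_id
        self.name = name

    @abstractmethod
    def describe(self) -> str:
        """Each concrete subclass supplies its own behavior."""


class TimelineTrack(Track):
    """Subclass: inherits the common properties and adds its own."""

    def __init__(self, track_id: int, name: str, edit_rate: int, origin: int = 0) -> None:
        super().__init__(track_id, name)
        self.edit_rate = edit_rate
        self.origin = origin

    def describe(self) -> str:
        return f"Timeline Track {self.track_id} ({self.name}), edit rate {self.edit_rate}"


class EventTrack(Track):
    """Subclass: inherits the common properties and adds its own."""

    def __init__(self, track_id: int, name: str, event_edit_rate: int) -> None:
        super().__init__(track_id, name)
        self.event_edit_rate = event_edit_rate

    def describe(self) -> str:
        return f"Event Track {self.track_id} ({self.name}), event edit rate {self.event_edit_rate}"
```

Attempting `Track(1, "Picture")` raises a TypeError because the class is abstract, whereas `TimelineTrack(1, "Picture", 25)` creates an object that carries both the common and the subclass-specific properties.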
Each package describes some aspect of the essence or data in a file and the different types of package will be
explained here with the help of some real world analogies. The Top-Level File Package contains a collection of
metadata items and sets that describe, for example, the embedded video essence. It is described as though the
essence tracks were in a file – hence the name Top-Level File Package.
It is important to note that the Tracks are synchronized in time. This synchronization is determined by a specified
Offset value from the beginning of each track.
For the AAF-conversant reader, it is useful to note that Composition Packages are not currently used in MXF.
As can be seen in Figure 6, the Material Package can be viewed as a set of parallel tracks – one for each kind
of essence in the output stream. There is metadata associated with the file that has a global scope, such as the
Name, the UMID etc. Each Track contains further metadata to describe the way in which the final output should
be created from the Top-Level File Packages.
Figure 7 shows the relationship between the pictures. It shows how the Material Package track can define a
sequence of SourceClips. Each SourceClip in the Material Package indicates which portion of a Top-Level File
Package should be “played” next. This is the way in which MXF supports Edit Decision Lists (EDLs).
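The EDL behavior of a Material Package track can be sketched as a lookup from a position on the output timeline to a position in a referenced File Package. The Python below is illustrative only: the class and function names are hypothetical, the package identifiers are plain strings rather than UMIDs, and all positions and durations are assumed to be counted in the same edit units.

```python
from dataclasses import dataclass


@dataclass
class SourceClip:
    source_package_id: str   # which File Package to play (a UMID in a real file)
    source_track_id: int     # which Track within that package
    start_position: int      # first edit unit to play from the source track
    duration: int            # number of edit units to play


def locate(sequence: list, position: int) -> tuple:
    """Map a position on the Material Package track to the SourceClip that
    covers it and the corresponding position in the source track."""
    if position < 0:
        raise ValueError("position cannot be negative")
    clip_start = 0
    for clip in sequence:
        if position < clip_start + clip.duration:
            return clip, clip.start_position + (position - clip_start)
        clip_start += clip.duration
    raise ValueError("position is beyond the end of the sequence")
```

For a sequence of two clips, 50 edit units from position 100 of one package followed by 25 edit units from the start of another, output position 10 resolves to position 110 in the first package and output position 60 resolves to position 10 in the second.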
The Material Package in Figure 7 shows how the SourceClip references the entire Top-Level File Package. Only
the File Packages in the top level of an MXF File describe the actual Essence in the File Body.
The MXF Operational Patterns constrain the relationships between the Material Package SourceClips and the
File Package(s) in an MXF File. In an OP1a file, there is no EDL support and the Material Package references
the entire Top-Level File Package. In an OP3c file, complex timeline relationships are allowed that may require
the MXF decoder to have random access capabilities.
[Figure: Top-level File Packages (describe the actual Essence in the file). The Track defines the start and the Sequence defines the duration; the Sequence is divided into segments, each a SourceClip. The package is shown with its Body Container and an Essence Descriptor (e.g. MPEG). Annotation: The Top-Level File Package SourceClip(s) may reference Lower-Level Source Packages. These do not describe actual stored Essence. They describe where the stored Essence came from, e.g. previously conformed MXF files.]
The tracks in the Top-Level File Package may be made up from a number of SourceClips that are used as
historical annotation to indicate where the content came from.
In SMPTE 377M, a Source Package that is not at the top level is used to describe the derivation of the essence;
i.e. where it came from. This is very useful metadata, especially when creating archives or providing historical
information about the source of the File Package. Lower-level Source Packages often contain physical
descriptors such as Tape Descriptors that refer to a physical location or storage medium for the content.
5.5.4 References
Within the MXF Format we need a way of referring to objects. For example the statement, “A Material Package
has one Timecode Track object”, is quite clear. This is known as a strong (one to one) reference between the
Material Package and the Timecode Track object.
Each metadata set is coded and identified as a KLV Local Set and has a Value that contains all the locally
coded metadata items in sequence as a Tag (typically 2 bytes), Length (typically also 2 bytes) and the individual
metadata item value. Note that most MXF sets contain a Unique Identifier (Instance UID) for that set. This
Instance UID is the core data construct used to connect objects together into a logical framework.
A ‘Strong Reference’ to any KLV coded data set is a one-to-one relationship between the reference and the
target data set. In MXF files, a Strong Reference is made by matching the value of a “StrongRef” in the
referencing set to the Instance UID property of the referenced set.
A ‘Weak Reference’ also uses an Instance UID to connect data sets, but any weakly referenced data set or item
may be referenced by more than one other data set. Thus a weakly referenced set is a stand-alone data set with
an Instance UID to which one or more other data sets can refer through the value of a “WeakRef” property. In
order to properly construct an MXF File, each and every set must have one Strong Reference to it. There is no
limit to the number of weak references that may be made to a set. Figure 8 illustrates the concept of Strong and
Weak References in a stream of KLV coded metadata sets.
[Figure 8: Strong and Weak Referenced Data Sets in a KLV Coded Data Stream. A row of KLV coded sets, each carrying an ID, linked by Strong Refs.]
Note that the metadata sets are contiguous in order to preserve the KLV coding protocol (i.e. there are no gaps
between the metadata sets).
Figure 9 provides a more detailed example of data set organization and includes three techniques for the
connection of data sets:
[Figure 9: example data sets with properties including Contribution Status, Job Function, Job Function Code, Role or Identity Name, Data Definition, Duration, Start Position, SourcePackageID and SourceTrackID. Annotation: Embedding Strongly Referenced sets is easier to understand, but if the length of the embedded set changes (e.g. by changing a text string), then the length value of both the embedded set and all outer sets must change accordingly.]
Strong Referencing. Strong Referencing implies ownership of the referenced object as well as a one to one
relationship with it. When an MXF application creates a tree of interlinked Objects starting at the MXF Preface
Set, all objects will have at least 1 strong reference so that they are “owned” and can fit into the overall tree. An
object may additionally be weakly referenced by a large number of other objects.
Strong Referencing by embedding. This can be used where a strongly referenced data set is easily embedded
into the referencing data set. It is used in applications requiring high-speed operation, but has the drawback that
when the embedded set changes length, the length fields of both the contained and surrounding sets must change
accordingly. This mechanism is not used in the MXF Header Metadata, but may be used in an Essence
Container specification. Ownership of the referenced object is implicit because it is contained within the
referencing object.
Strong Referencing by UID. This requires an Instance UID property in the referenced data set and a property
of type StrongRef in the referring Data Set. The overhead is thus higher than the embedding method above, but
if a property value in the referenced set changes length, it impacts only that data set and its parent data sets, but
does not affect the length of the referencing data set.
Weak Referencing by UID. Weak referencing uses an Instance UID in the referenced data set; one or more
other data sets can refer to the referenced data set by using the same Weak Reference UID value. The
advantage of a weak reference is that the values of metadata items in a data set can be shared by several
referring data sets. It is worth noting that everything within an MXF file that is weak referenced must also be
strongly referenced.
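The ownership rules above can be sketched as a consistency check over a set of decoded objects. This Python is illustrative only: it assumes the sets have already been parsed into a dictionary keyed by Instance UID (plain strings here), that the root of the tree (the Preface Set) is the only set without an owner, and that every other set has exactly one strong reference to it. The function name and data layout are hypothetical.

```python
from collections import Counter


def check_references(sets: dict, root_uid: str) -> list:
    """Return a list of problems found in the reference structure.

    `sets` maps each Instance UID to a dict with optional "strong" and
    "weak" lists holding the Instance UIDs that the set refers to.
    """
    problems = []
    strong_counts = Counter()
    for uid, refs in sets.items():
        for target in refs.get("strong", []):
            strong_counts[target] += 1
            if target not in sets:
                problems.append(f"{uid}: strong reference to unknown set {target}")
        for target in refs.get("weak", []):
            if target not in sets:
                problems.append(f"{uid}: weak reference to unknown set {target}")
    for uid in sets:
        if uid == root_uid:
            continue                      # the root is not owned by another set
        found = strong_counts[uid]
        if found != 1:
            problems.append(f"{uid}: expected exactly 1 strong reference, found {found}")
    return problems
```

A set that is only weakly referenced is reported because nothing owns it, which mirrors the rule that everything that is weakly referenced must also be strongly referenced. A reference to an unknown set is reported rather than treated as fatal, since the target may simply be a set that this decoder does not understand.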
A Reference Collection is a list of UIDs connecting the referencing entity to zero or more other entities (either
weak or strong).
A Reference Array is a set of ordered references (or vector). This implies that the order is significant for
whatever reason.
Note: because all properties in MXF are unique within the AAF class model, all StrongRef and WeakRef properties are
strongly typed. This means that the property can only have a StrongRef to a specific sort of Set (or one of its subclasses).
Thus, SMPTE 377M uses the nomenclature “StrongRef (MyClass)” to mean a strong reference only to an object of type
“MyClass” or an object derived from MyClass.
For every reference in an MXF File, an MXF Decoder should be able to find a set that is the target of that
reference. The previous sentence uses the word “should” and not “shall” – why? From the definitions above, you
would expect that a decoder would always be able to find the target of a strong reference. In the absence of any
extensions to SMPTE 377M, this would be a true statement; however, it is expected that additions will be made
and new metadata sets and schemes will be developed as the format matures. Decoders that do not understand
these extensions are likely to discover that there are Dark metadata sets (i.e. the set Key of the KLV is not
understood by the decoder) within the file and that there are references without identifiable targets.
“Clever” decoders may be able to help in this situation, by looking inside Dark sets, especially those whose local
tags appear to be stored in the Primer Pack. Instance UIDs could then be discovered with some high confidence
and the presence of Dark extensions to SMPTE 377M discovered. In some circumstances, this behavior may be
quite helpful, but in general, making intelligent guesses about Dark sets is outside the scope of SMPTE 377M. It
may also lead to unpredictable results!
the "UL designator" column of the set definitions. Within the file, Local Set coding is being used in which a short
2-byte tag is used to substitute for a 16 byte UL.
Some of the properties in SMPTE 377M are themselves Universal Labels. Some of the values that these
Properties may take are ULs, and indeed some of the KLV keys in MXF may be ULs. These Labels are
generally used to identify lists of unique things. For example "Picture Coding Type" has a UL value. All the
Picture Coding Types that are known to MXF are simply listed in the SMPTE Labels Registry. Applications that
need to determine the meaning of a label use the SMPTE Labels Registry as the normative reference.
In certain cases an encoder may place an un-registered UL or a non-UL unique identifier in a property of type
“UL”. Example cases are where new MXF features are being developed, but have not yet been standardized,
and where private extensions are added for use in a carefully controlled MXF system. Some of these cases are
outside the scope of the MXF format, but decoders should make every effort to handle these files gracefully. For
example, decoders should not rely on the values being validly coded as a registered SMPTE Label.
If the scope of a UUID is local to the file then the byte order is unimportant, providing each occurrence of that
UUID uses the same byte order. In these cases the default order specified in ISO/IEC 11578 should be used.
Where UUIDs have global scope the byte order is significant. In these cases the byte order will be given when
the UUID value is published. For example, where a manufacturer publishes the UUID that a particular device
inserts into the “Product UID” field in the identification set, the byte order of that UUID will be specified as well as
the values of the bytes.
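The significance of byte order for a globally scoped UUID can be demonstrated with the standard Python uuid module, which can serialize the same UUID in two different byte orders. This sketch is illustrative only: the UUID value is made up, the helper name is hypothetical, and no claim is made here about which serialization corresponds to ISO/IEC 11578 or to any published Product UID.

```python
import uuid


def uuid_from_bytes(raw: bytes, little_endian_fields: bool) -> uuid.UUID:
    """Rebuild a UUID from 16 raw bytes using the stated byte order."""
    if little_endian_fields:
        return uuid.UUID(bytes_le=raw)   # first three fields stored little-endian
    return uuid.UUID(bytes=raw)          # all fields stored big-endian


example = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")
as_written = example.bytes        # 00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff
field_swapped = example.bytes_le  # 33 22 11 00 55 44 77 66 88 99 aa bb cc dd ee ff
```

Reading `field_swapped` back with the wrong order yields a different UUID, so a comparison against a published value fails unless the byte order is known. Within a single file the choice does not matter, provided every occurrence is written and read with the same order.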
The Operational Pattern UL identifies the timeline complexity of the file. The Essence Container ULs identify the
Essence Data that is contained in the file so that an application can determine if a suitable codec is available.
These numbers are registered values so that an application that cannot handle a particular Essence Container
Type is able to report the Essence Type in the file. This type of reporting behavior helps users to identify content
and is encouraged. Anonymous failure such as “a codec cannot be found” without reporting what sort of codec
was sought is not encouraged. Older decoders that are unaware of new UL values should at least attempt to
report the ULs that were not known. It is important to note that it may not be possible for this information to be
provided by all MXF encoders and that decoders should not fail if this information is empty or missing.
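The reporting behavior encouraged above might be sketched as follows. This is a minimal Python example, not a normative procedure: the function name, the structure of the lookup table and the label values in it are placeholders invented for the illustration. The sketch reports unknown labels in hexadecimal and tolerates an empty or missing list.

```python
def report_essence_containers(labels, known):
    """Split labels into supported names and unsupported hex strings.

    labels: list of 16-byte ULs read from the file (may be empty or None).
    known:  dict mapping a 16-byte UL to a readable name (application-defined).
    """
    supported, unsupported = [], []
    for label in labels or []:
        name = known.get(bytes(label))
        if name is not None:
            supported.append(name)
        else:
            # Report the raw label so the user can identify the missing codec.
            unsupported.append(".".join(f"{b:02x}" for b in label))
    return supported, unsupported


# Placeholder label values, invented for this example only.
KNOWN = {bytes(range(16)): "example picture container"}
ok, missing = report_essence_containers(
    [bytes(range(16)), bytes(range(16, 32))], KNOWN
)
print("supported:", ok)
print("unsupported:", missing)
```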
If an MXF File contains multiple Essence Containers, but these are all of the same type, then the Essence
Container Label appears in the Partition Pack only once. This non-duplication is to ensure that a higher
Operational Pattern file with 100 small MPEG clips need not insert 100 ULs in the list.
Some Essence Container specifications (such as the MPEG Long GOP Generic Container mapping) define
Essence Container ULs for the different MPEG streams that may be encountered when transwrapping from
MPEG Program Stream to MXF. It is possible that the list of Essence Containers will contain a UL for the Sound
data and a UL for the Picture data even when the resulting file contains only a single Essence Container with
interleaved Sound and Pictures. During the design of MXF it was felt that there needed to be a descriptor for
each of the different types of audio so that the MXF decoder requirements could be determined rapidly.
As an example, if you have an OP1a MXF file with MPEG 2 video, 2 channels of AES audio and Timecode, the
file would have:
• 2 ULs in the EssenceContainer list (1 video, 1 audio)
• OP1a declared in the Partition Pack and the Preface Set
• 4 Tracks: 1 Picture, 2 Sound, 1 Timecode
• Material Package Tracks have the same duration as the Top-Level File Package Tracks
MXF decoders must be able to cope with the case where there are many Essence Containers of the same type
with a single UL in the EssenceContainer list. MXF decoders must also be able to cope with the case where
there are several ULs in the EssenceContainer list, each of which relates to a different Element of a single MXF
Generic Container.
MXF doesn't have such a dictionary, so cannot work the same way. Instead the DataDef property in a
component actually is the 16-byte "magic number" that the application can use to figure out how to handle the
component. This is a very subtle change in behavior between AAF and MXF, and implementers of compatible
systems should take appropriate actions to ensure interoperability. This type of data is actually a weak reference
into an external data set – i.e. a registry or dictionary, such as the SMPTE Labels registry.
KLV coding allows related metadata items to be grouped together in sets; e.g. Titling metadata might be
grouped into a set for convenience. SMPTE 336M defines several mechanisms for grouping the data together.
Basically, a set comprises an outer KLV that defines the set and a number of inner KLVs that define the data
items.
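The Key-Length-Value structure can be sketched in a few lines of Python. The example below assumes a 16-byte key followed by a BER-coded length (short form for values below 0x80, long form otherwise) and then the value. The function name is an assumption of this sketch, and the indefinite-length form is rejected rather than handled.

```python
def read_klv(buf: bytes, pos: int = 0):
    """Return (key, value, next_pos) for the KLV triplet starting at pos."""
    key = buf[pos:pos + 16]
    if len(key) < 16:
        raise ValueError("truncated key")
    pos += 16
    if pos >= len(buf):
        raise ValueError("missing length")
    first = buf[pos]
    pos += 1
    if first < 0x80:
        # BER short form: the byte itself is the length.
        length = first
    else:
        # BER long form: low 7 bits give the number of length bytes.
        count = first & 0x7F
        if count == 0 or pos + count > len(buf):
            raise ValueError("unsupported or truncated length")
        length = int.from_bytes(buf[pos:pos + count], "big")
        pos += count
    value = buf[pos:pos + length]
    if len(value) < length:
        raise ValueError("truncated value")
    return key, value, pos + length
```

Calling read_klv repeatedly, each time passing the returned next_pos, walks a sequence of KLV packets. The same function can be applied to the value of a Universal set to walk its inner KLVs.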
The inner keys could be full length (Universal set) or could be shortened for processing and storage
convenience. KLV sets using these shortened item keys are known as local sets and the technique is fully
defined in SMPTE 336M. This standard defines how all sets have Universal Labels with a consistent definition in
the first 8 bytes of the type of data set or data pack being used. The options provided are:
• Universal Set,
• Local Set,
• Variable Length Pack and
• Fixed Length Pack.
• Global Sets (not used in MXF)
All MXF decoders must support local sets. Encoders should use the sets as required by the Operational Pattern.
If there is no guidance in the Operational Pattern then the encoder should opt for a local set implementation
using the local Tags as defined in SMPTE 377M. Note that 2-byte lengths in local sets are always coded as Big-
endian (i.e. MSB first).
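A local set value with 2-byte tags and 2-byte Big-endian lengths can be walked as in the following Python sketch. The function name is an assumption of this example and the tag values used with it are placeholders. Mapping each local tag back to its 16-byte UL is not shown.

```python
import struct


def parse_local_set(value: bytes) -> dict:
    """Split a local set value into {tag: item_bytes} (2-byte tag, 2-byte length)."""
    items = {}
    pos = 0
    while pos < len(value):
        if pos + 4 > len(value):
            raise ValueError("truncated tag/length")
        # ">HH" reads two unsigned 16-bit integers, most significant byte first.
        tag, length = struct.unpack_from(">HH", value, pos)
        pos += 4
        if pos + length > len(value):
            raise ValueError("truncated item")
        items[tag] = value[pos:pos + length]
        pos += length
    return items
```

Items whose tags are not recognized remain available as raw bytes, so an application can skip them or preserve them unchanged.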
Every property in MXF has a full 16-byte Universal Label so that the property may be interchanged with other
systems as either a single KLV item or as a Universal set.
MXF-specified Metadata is currently implemented using 2-byte tags and lengths. This restriction does not apply
to private metadata schemes, although it is recommended because the Primer Pack mechanism for preventing
numerical clashes of local tags is only defined for two-byte tags.
Many of the text fields in MXF are encoded using UNICODE. The coding technique is UTF-16 with big-endian
byte order to allow good international support. More information on UNICODE can be found in reference 5
(section C.1 below). There are occasions when ISO-646 text is used. This is often to comply with some other
standard such as the ISO-639 language descriptor codes.
Text is stored in a KLV or Tag-Length-Value structure. Zero word termination of strings is optional. A string may
be the same length as the “L” of the KLV or the “Length” of the Tag-Length-Value with no zero word at the end.
Alternatively, a shorter string may be placed in the space allocated by the KLV or the Tag-Length-Value
structure by inserting a zero word after the last character of the string. MXF Decoders must support both
mechanisms.
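Both string mechanisms can be supported by one small routine, sketched here in Python. The function name is an assumption of this example. The zero word is searched for on 2-byte boundaries only, so that a zero byte inside a character code is not mistaken for a terminator.

```python
def decode_mxf_string(raw: bytes) -> str:
    """Decode a UTF-16 big-endian string with optional zero-word termination."""
    if len(raw) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    for i in range(0, len(raw), 2):
        if raw[i:i + 2] == b"\x00\x00":
            # A zero word ends the string; the remaining space is ignored.
            raw = raw[:i]
            break
    return raw.decode("utf-16-be")
```

With this approach a string that fills the whole length and a shorter, zero-terminated string in a larger allocation decode to the same result.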
A Generation Number is a weak reference to the Identification Set that was created when the MXF file was
saved or modified by an application. Each time the MXF File is modified, a new Identification set is created. If a
metadata set is changed the Generation ID property is updated so its value will be the same as the Generation
ID of the Identification Set that was created when the property was modified.
It is important to note that Generation Number properties are optional and that decoders should not rely on their
existence; however in certain applications they can be very useful. If your application stores extended data that
is dependent on data stored in AAF’s built-in classes and properties, your application may need to check if
another application has modified the data in the built-in classes and properties.
The Generation property allows you to track whether another application has modified data in an MXF file that
may invalidate data that your application has stored in extensions. The Generation property is a weak reference
to the Identification object created when an MXF file is created or modified. If your application creates extended
data that is dependent on data stored in MXF built-in classes or properties, you can use the Generation property
to check if another application has modified the MXF file since the time that your application set the extended
data. To do this, your application stores the value of the Generation UID of the Identification object created when
your application sets the value of the extended data.
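The check described above might look like the following Python sketch. The function and parameter names are hypothetical. Because the Generation property is optional, the sketch returns None when it is absent, meaning that no conclusion can be drawn.

```python
from typing import Optional


def built_in_data_changed(recorded: bytes, current: Optional[bytes]) -> Optional[bool]:
    """Compare the Generation UID recorded by the application with the current one.

    recorded: Generation UID stored when the extended data was written.
    current:  Generation UID now present on the built-in set, or None if absent.
    """
    if current is None:
        # The property is optional; its absence proves nothing either way.
        return None
    return current != recorded
```

A True result suggests that another application has modified the built-in data since the extended data was stored, so the extended data may need to be revalidated.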
The placement of metadata in a file may be in one or more of several possible locations most suited to the
application of the particular metadata item. Figure 10 below indicates several broad locations where metadata
may be stored.
[Figure 10: Locations where metadata may be stored. Labels recoverable from the figure: File Header, Header Metadata, File Wrapper, Essence Container, sequence of Content Packages, metadata link. Metadata categories shown: Material, Compositional, Content Labelling and Identification, Package Catalogue, Business (access), Publication.]
Embedded metadata (intra-track in the figure above) is that which is tightly embedded in the essence stream
such as is present in MPEG2 Video ES and AES3 data. Metadata that is embedded is typically:
• Format: for decoder operation
• Temporal: with particular reference to time-code
• Spatial: such as pan-scan vectors and aspect ratio
• Extra data: such as captioning, subtitles etc.
6.2 Linked Metadata Location
Linked Metadata (inter-track in the figure above) is that which is closely linked to the content, whether video,
audio or data content, through a container on a picture-by-picture basis. Thus this metadata is interleaved with
the content and maintains a tight timing relationship with it. As an example, the System Item of SDTI-CP
provides this metadata location. Metadata that can be stored as linked to the frame is that relating to:
• Format: often as a duplicate of the embedded metadata,
• Temporal: mostly as temporally variable metadata extra to any embedded metadata,
• Material: including the extended UMID and
• Label: simple labeling of the content.
Attached metadata (header metadata) is that which may appear in a File Header such as is present in MXF. It
can encompass a wide variety of metadata, in particular:
• Content: providing metadata about the content in the File Body,
• Compositional: providing simple or complex editing information for the clip or program,
• Label: providing a full set of content labeling and identification,
• Catalogue: for location of events, markers and for archival metadata and
• Business: for access and security information.
6.4 Server Metadata Location
Server metadata can be used to replicate almost all of the metadata described so far. However, it is particularly
useful for the following metadata sets:
• Label: providing a full set of content labeling and identification metadata,
• Compositional: providing simple or complex editing information and historical derivation metadata,
• Catalogue: for use in off-line searches,
• Publication: defining when and where content is to be delivered and
• Business: for audience information, program statistics etc.
7 MXF in Detail
SMPTE 377M defines a file format for the transfer of program material between equipment in the professional
broadcast environment. Stream and file transfers are both used for the interchange of program material, with file
transfers increasing in proportion to stream transfers. Neither will dominate; rather they will co-exist and the
MXF file is designed to work within both transfer classes.
Files are often created directly from incoming streams and are often converted into streams for emission and
distribution. The MXF standard specifies an MXF File Format that is readily convertible to and from common
streaming formats with low overhead and without loss of data.
In order to appreciate the differences between stream and file transfers, we can summarize the major
characteristics of each as follows:
File transfers...
Stream transfers...
3. Streams are normally synchronized to a clock or are asynchronous, with a specified minimum/maximum
transfer rate.
4. Are often point-to-multipoint or broadcast
5. Streaming formats are usually structured to allow access to essence data at sequential byte positions.
Streaming decoders are always sequential.
Figure 11 illustrates the interoperation between streaming transfers based on stream interfaces such as SDTI
and file transfers between disc servers and tape archives. One of the issues of the file transfer is that many
servers support playout before file closure (i.e. read from a partially written file while it is still in the process of
writing), so blurring the distinctions outlined above.
[Figure 11: Interoperation between streaming transfers and file transfers. Labels recoverable from the figure: Metadata, Removable Media, Interconnect (SMPTE 305M SDTI, Fibre Channel, ATM, Ethernet, IEEE1394, etc.).]
The Content Model used in SMPTE 377M is based on that defined by the EBU/SMPTE Task Force Report,
which defines content as in the figure below:
[Figure: content model. Labels recoverable from the figure: Wrapper, Content Package.]
The content model also uses the terminology of SMPTE 336M (KLV Coding) and SMPTE 298M (Universal
Labels), which define:
• Universal Labels (ULs) used as Keys
• Key-Length-Value formatting of individual metadata and essence items
• Coding of groups of data items into Sets and Packs
The content model also uses the terminology of SMPTE 326M – SDTI-CP, which defines frame-interleaved
content based on the following components:
• A System Item that includes system level Descriptive Metadata and content metadata
• A Picture Item that includes one or more picture Elements
• A Sound Item that contains one or more audio Elements
• A Data Item that contains one or more data essence Elements
• A Compound Item that contains one or more intrinsically interleaved Elements (such as an interleave of DV-DIF packets)
• A Link Item that links metadata in the System Item to any one of the Elements
Each of these essence Elements can be separately indexed in an Index Table and is also mapped to a track in
the Header Metadata. The track is the metadata object that controls the way in which this essence Element is
used.
Different applications produce and consume material of various degrees of complexity and structure, from a
single clip to a multitude of clips and effects. Applications requiring only the simplest files should not be
burdened with support of the most complex. To maximize interoperability MXF uses Operational Patterns to
define constrained levels of file complexity.
During the development of MXF there were many different attempts at defining the functionality of an
Operational Pattern. The goal was to create a number of axes that allowed software and hardware developers to
create products with different levels of functionality (and hence cost). These different axes had to correspond to
real world ways of working, and had to provide mechanisms for a file to be “flattened” from a complex
Operational Pattern to a simple Operational Pattern in a way that made sense to someone working with the
Multimedia content.
The description below covers the different axes, followed by a non-exhaustive discussion of some applications.
The Operational Pattern axes are arranged so that any Operational Pattern to the left of, or above, another
Operational Pattern is a subset of its functionality. For example Operational Pattern 3b is a superset of the
functionality of OP1a, OP2a, OP1b, OP2b and OP3a, and includes not just the ability for each Material Package
to access sequential Top-Level File Packages, but also the ability to access a sequence of ganged Top-Level
File Packages.
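The subset relationship between Operational Patterns can be expressed as a comparison on the two axes, as in this Python sketch. The function names are assumptions of the example, and names are taken in the short form used in this document (for example "OP2b"), without any of the qualifier flags.

```python
ITEM_LEVELS = "123"      # Single item, Playlist items, Edit items
PACKAGE_LEVELS = "abc"   # Single, Ganged, Alternate packages


def op_coordinates(name: str):
    """Convert a name such as 'OP2b' to (item_index, package_index)."""
    if len(name) != 4 or not name.startswith("OP"):
        raise ValueError(f"not an Operational Pattern name: {name!r}")
    # str.index raises ValueError for an unknown digit or letter.
    return ITEM_LEVELS.index(name[2]), PACKAGE_LEVELS.index(name[3])


def is_subset(op: str, other: str) -> bool:
    """True if op lies to the left of, above, or at the same place as other."""
    item, package = op_coordinates(op)
    other_item, other_package = op_coordinates(other)
    return item <= other_item and package <= other_package
```

A decoder that supports up to a given pattern could use is_subset(file_op, supported_op) to decide whether a file falls within its capabilities.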
[Figure: Operational Pattern axes. The horizontal axis is Item Complexity, with columns described as "Only 1 MP SourceClip = FP", "Each MP SourceClip = entire FP" and "Any MP track from any FP track". The rows recoverable from the figure are (b) Ganged Packages and (c) Alternate Packages, the latter offering a choice of Material Packages (MP1 OR MP2).]
Here we constrain the temporal relationship between different Top-Level File Packages within the MXF file. In
principle, there are 3 levels of constraint:
1. Single item: the file contains Top-Level File Packages that have the same duration as the output timeline (like a tape).
2. Playlist items: the file contains Top-Level File Packages that are butted one against the other. All tracks are switched synchronously with optional audio fade out / fade in to prevent clicking. This can be likened to a playlist of tapes.
3. Edit items: the file contains several Top-Level File Packages with one or more cut edits. Tracks may have independent editing to allow audio and video to be switched at different points in the timeline. This will often involve random access within the file and therefore MXF files in this column are unlikely to be streamable.
a. Single package: the file contains only one active Essence Container at any point on the output timeline.
b. Ganged packages: the file contains two or more Essence Containers that share a common synchronized timeline. The MXF structure is used to wrap several Essence Containers and multiplex them using the KLV and partitioning rules. This could be used to gang together an MPEG Picture track in one package with an uncompressed Sound track in another (possibly external) package.
c. Alternate packages: the file contains several versions of the “program”. There are several Material Packages that might be used to control a browse track or different language versions of a program, or different edits of some finished material destined for different censorship zones. For example, an OP1c file may have 2 continuous timelines – one for the French soundtrack and another for the English Soundtrack. Another example is an OP3c file, where not only is there a choice of English or French, but the cut lists for the output tracks are different. Since this OP is a superset of the Ganged Package complexity, it also has the capabilities of Ganged Packages as well as Alternate Packages.
This is a simple flag that modifies an Operational Pattern. It has 2 states to indicate that all the Essence
Containers are internal to the file (Internal) or that one or more of the Essence Containers are in an external file
(eXternal). For example, an OP1bx file may have internal Picture data but external Sound data (see 8.2.7.1).
7.3.2.2 Stream / Non-Stream (Wire / Storage) Flag
This is a simple flag that indicates either that the partitions in the file have been arranged so that it can be
streamed on a wire (Wire file), or that some other non-streaming arrangement has been used (Stored File). The
streamed file representation implies that Essence Containers are multiplexed together and that within an
Essence Container, any interleave that exists will allow decoding of the essence during streaming file transfer so
that the pictures may be viewed and the sound heard during transfer with minimal latency. The size of buffers
required to do this is an application issue and outside the scope of SMPTE 377M. Any file that does not have
this property is just a File (see 8.2.7.2).
7.3.2.3 Uni-Track / Multi-Track Flag
This is a simple flag that indicates that all the Essence Containers in an MXF file have only a single essence
track. This flag is to aid workflows where all the different essence components of a production are required to be
individual files. This flag helps MXF decoders know that the file meets this criterion. The flag is either Uni-track
or Multi-track (see 8.2.7.3).
7.3.3 Operational Pattern Applications
MXF applications should, where appropriate, be able to perform the following functions with respect to
Operational Patterns:
Encoders and Decoders should be able to report the most complex Operational Pattern they can handle.
A Decoder should be able to indicate what level of Operational Pattern has been processed when its capabilities
have been exceeded.
Encoders should ALWAYS correctly signal the Operational Pattern of the files they create. This means that an
MXF encoder capable of creating all possible Operational Patterns should not signal the files it creates with the
highest Operational Pattern code. It should signal the Operational Pattern to which the file complies.
Listed below are several MXF applications and possible ways in which they may be implemented using SMPTE
377M. They are intended to give a guide on how MXF might be used. They are not normative definitions of the
Operational Patterns concerned.
An Application might give a file a name depending on its functionality, for example:
Test_OP1aiwm.mxf - mxf file with internal essence, wire-file, multitrack, Operational Pattern 1a
Test_OP3cxm.mxf - mxf file with external essence, not streamable, multitrack, Operational Pattern 3c
7.3.3.2 Archive
There are many different Archive applications. Often, it is desirable to have metadata or a browse track “online”
and the full-quality content in some deep store. This requires referencing of external essence as well as multiple
representations of the same content. There may only be one single item in each of the representations (each
having the same duration) and the content could be arranged for streaming or storage depending on the precise
application.
7.3.3.3 D-Cinema
For distribution of D-Cinema content, it may be desirable to have different representations of the same film
distributed on common media. Alternatively, MXF may be used to represent each “reel”, which is then
assembled via a composition list that itself may be an MXF File. Different representations may be as simple as
different language tracks, or may be as complicated as different audio-video cuts to meet local or regional
content restrictions. The Operational Pattern axes allow this split of functionality. In addition a D-Cinema
application will almost certainly require protection of the content. This can be achieved with a metadata plug-in
to describe the encryption / protection scheme and an Essence Container type to contain the encrypted /
protected essence(s). The other mechanisms within MXF remain unchanged.
The most common use of Handles is to adjust edit points, and / or to provide context for production processes
such as color correction. This use of Handles implies that the content within the Handle is not actually used in
the Material Package, but exists within the Top-Level File Package. The resulting file would be in the Edit Items
column of the Operational Pattern axes matrix. The precise row or column of the Operational Pattern would
depend on the construction of the essence within the file. For a mono-essence file it would be constructed as an
OP2a or OP3a file. Multi-track files would be either OP2b or OP3b depending on whether or not the cut points of
the Top-Level File Packages are synchronized on the timeline.
MXF files created in accordance with the MXF standard use Essence Containers to encapsulate one or more
essence elements. These essence elements may be intrinsically interleaved (for example a SMPTE 314M DV-
based stream) or may consist of a single non-interleaved essence element.
In order to support stream capability, the essence elements are interleaved over a limited duration (typically 1
frame). Each essence element can be encapsulated using KLV coding over the interleave duration to allow an
MXF decoder to access the essence on these KLV boundaries.
The MXF Format does not provide the individual Essence Container specifications, but defines the constraints
that a compliant Essence Container specification must meet in order for it to be encapsulated in an MXF File
Body. Constraints on the Essence Container are given in the Operational Pattern document and the Essence
Container document. They may be summarized as follows:
1. Must encapsulate each essence component with KLV coding using publicly registered Keys,
2. Must provide for interleaving of the essence components over a limited duration (typically 1 frame) when
inputs or outputs are used for streaming,
3. Must be standardized as an open specification, preferably through the due process of SMPTE,
4. Must meet the SMPTE criteria for a standard (see the SMPTE Administrative Practices).
It is expected that compliant Essence Containers will become available for the systems below.
Wrapping all essence variants in a common Essence Container format is advantageous for system design and
interoperability. The MXF document suite specifies mappings of a variety of essence formats into the MXF
Generic Container as described below.
The MXF Generic Container may also use Essence Elements and Metadata Items defined in SMPTE 331M
through application of the specifications in SMPTE 385M (Mapping SDTI-CP Essence and Metadata into the
MXF GC).
7.4.5 Audio
An MXF mapping document for the encapsulation of AES3 audio and Broadcast Wave compatible audio in the
Generic Container has been defined. This audio element may be used on its own, or may be used to add audio
to another Generic Container Element such as Uncompressed Pictures or MPEG Long GOP pictures.
SMPTE 377M is a physical representation of the underlying AAF class model and uses the same methods for
data identification and data relationships. The method of relating the Structural Header Metadata to the Essence
Container is now described.
In each Partition of an MXF file, there may be any or all of the following core components:
1. A Partition Pack that defines:
- a Body SID for the container data stream in this partition,
- an Index SID for the Index Table in this partition.
2. A Primer Pack
3. Header Metadata repetition that includes:
- a Content Storage Set at the top level,
- one or more Top-Level File Packages each associated with an Essence Container Data Set.
- other metadata to describe the entire file (after all it’s a Header Metadata repetition)
4. An Essence Container (that occupies the whole File Body or a part).
5. Unique IDs that link data sets together (16-byte Instance UIDs).
6. Unique Material IDs (32-byte UMIDs) that identify the Essence Container.
These components are related as indicated in the following figure:
The Partition Pack includes a BodySID and an IndexSID that identify the Essence Container segment and Index
Table Segments in the partition. These are linked to the BodySID and IndexSID in the relevant Top-Level File
Package via the corresponding EssenceContainerData Set. They are also linked to the BodySID and IndexSID
in the relevant Index Table. When the BodySID value in a partition is zero, it indicates that there is no Essence
Container segment in this partition. Likewise a zero IndexSID value indicates there are no Index Table
Segments in this partition.
The Header Metadata has a Content Storage set at the top level that contains a set of Package UIDs and a set
of EssenceContainerData UIDs. The Content Storage set strongly references every Package, including each
Top-Level File Package as well as each Material Package. The Content Storage Set will also reference Lower-
Level Source Packages where these are present in the Header Metadata.
Within the Header Metadata, there is also an Essence Container Data set for every Top-Level File Package. This
set provides the linking between BodySID, IndexSID and their related Package UMID value. This mechanism
relates the Partitions and Index Tables within the File Body to the Top-Level File Packages in the Header
Metadata.
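The linking mechanism can be pictured as a simple lookup, as in the Python sketch below. Each Essence Container Data set is modeled here as a dictionary; the key names body_sid, index_sid and linked_package_uid, and the function name, are choices made for this example rather than names taken from the file format.

```python
def package_for_partition(body_sid: int, essence_container_data: list):
    """Return the Package UID linked to a Partition's BodySID, or None.

    A BodySID of zero means the Partition carries no Essence Container segment.
    """
    if body_sid == 0:
        return None
    for ecd in essence_container_data:
        if ecd["body_sid"] == body_sid:
            return ecd["linked_package_uid"]
    raise LookupError(f"no Essence Container Data set for BodySID {body_sid}")


# Placeholder values, invented for this example only.
ECD_SETS = [
    {"body_sid": 1, "index_sid": 2, "linked_package_uid": b"\xaa" * 32},
    {"body_sid": 3, "index_sid": 4, "linked_package_uid": b"\xbb" * 32},
]
print(package_for_partition(3, ECD_SETS).hex())
```

The same table can be searched by IndexSID to relate an Index Table Segment to its Top-Level File Package.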
Note: The Package UIDs are Basic UMIDs.
The MXF Format is intended to be platform neutral. This means it should not rely on resources available on any
specific platform. There are, however, two distinct ways in which multi-byte numbers are stored in computer
systems, Big-Endian and Little-Endian. Big-Endian systems place higher value bytes in the lower value
addresses, whereas Little-Endian systems do the reverse. This means that any data structure placed directly in
a processor’s memory by hardware can be read “in place” on one system, but must undergo a byte swap
process in the other.
In addition MXF is intended to have a common object model with AAF. AAF implements variable Endian-ism
based on a byte-order property within various classes.
Note that this feature applies only to the Metadata elements in the file. The Essence Containers have fixed byte
orders depending on the specification of the Essence Container.
There are several possible solutions in MXF, of which 3 are listed here:
1. all Header Metadata items will be Big-Endian
2. all Header Metadata items will be Little-Endian
3. the MXF encoder will signal the Endian-ness it used; i.e. Source-Endian.
There were many design discussions during the development of MXF and the final conclusion was that MXF
should be Big-Endian and should not indicate this in the file. The main reason behind this decision was to
simplify the handling of dark metadata where the Endian-ism cannot be known (because the metadata is dark).
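Because the byte order is fixed, a decoder can read multi-byte integers from the Header Metadata without consulting any flag in the file. The Python sketch below shows one way to do this; the function name is an assumption of the example. The final line shows the different number that a Little-Endian reading of the same bytes would produce.

```python
def read_uint_be(buf: bytes, offset: int, size: int) -> int:
    """Read an unsigned big-endian integer of `size` bytes starting at `offset`."""
    field = buf[offset:offset + size]
    if len(field) != size:
        raise ValueError("buffer too short")
    return int.from_bytes(field, "big")


raw = bytes([0x00, 0x00, 0x01, 0x00])
print(read_uint_be(raw, 0, 4))          # Big-Endian reading, as used by MXF
print(int.from_bytes(raw, "little"))    # the same bytes read as Little-Endian
```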
MXF Decoder design is, of course, an application-specific issue. This section is intended to advise implementers
of issues that will improve interoperability with other systems. It is desirable that all MXF decoders should be
able to parse (i.e. understand the syntactic structure) at least the following:
1. The KLV packet structure of all parts of the file (including the KLV packets of any kind of Essence
Container).
2. The KLV structure of the Header Partition, any Body Partition and any Footer Partition
3. The KLV structure of any optional Index Tables.
4. The optional Random Index Pack
5. The basic Header Metadata structure in any partition.
6. Locate the SMPTE Universal Labels in all the Partition Packs
7. Skip over any run-in.
In addition, it is desirable that MXF decoders decode (i.e. interpret and act on the values within) at least:
The metadata sets and individual metadata items defined in the minimum implementation of the simplest
Operational Pattern.
Decoding of other aspects such as the compressed bitstream or the specific Essence Container in the File Body
depends on the ability of the decoder to support those aspects. It is desirable that MXF Decoders be able to
locate and present the information that identifies the contents of the MXF file as follows:
1. The MXF file identification itself (that identifies that the file is MXF compliant) through the Key value of the
Header Partition Pack.
2. The UL of the Operational Pattern (Structural Metadata) to which the file conforms.
3. An array of ULs that identify each Essence Container and its contents in the File Body.
4. An array of ULs that identify each Descriptive Metadata collection within the file.
The MXF Essence Descriptor contains a list of properties called “Locators”. MXF supports two different types of
locator – Network and Text. The Top-Level File Package that describes the Essence (i.e. the one that is
referenced by the Material Package) may have external essence, and the decoder must scan the Locators in the
order they are given to find the Essence. A typical example of this might be the creation of a CD-ROM where the
Network Locators are given as a file reference relative to the location of the MXF file, followed by other locations in
which the file might be found, e.g.:
Even though the actual Essence Data is external to the file, there may be metadata describing the essence
within the file. In the extreme case, all the Essence could be external to the file leaving a small MXF stub that
fully describes the external Essence. MXF Files with Internal essence may also have locators. When all the
essence can be found internally, the locators should be treated as being for information purposes. In higher
Operational Patterns, it is possible that some of the Essence will be internal and some of it will be external. In
this case, Internal Essence, where present, should take precedence over external references. Where there is no
internal essence available from a Material Package SourceClip reference, the locators should be searched in
their listed order to find the content (see also 8.2.7.1). External content can be verified by checking the BodySID
value in the Essence Container set for the appropriate UMID. A zero value indicates external essence.
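The precedence rules above can be summarized in a short Python sketch. The function name, the result strings and the exists callback are assumptions of this example. The callback stands for whatever test an application uses to decide that a locator can be resolved, and the locator strings are placeholders.

```python
def locate_essence(body_sid: int, locators: list, exists):
    """Decide where the essence for a Top-Level File Package is to be found.

    body_sid: BodySID from the Essence Container Data set (zero means external).
    locators: locator strings in the order given in the Essence Descriptor.
    exists:   callable returning True if a locator can be resolved.
    """
    if body_sid != 0:
        # Internal essence takes precedence; locators are informational only.
        return ("internal", None)
    for locator in locators:
        if exists(locator):
            # Locators are tried strictly in their listed order.
            return ("external", locator)
    return ("missing", None)


# Placeholder locators, invented for this example only.
LOCATORS = ["clip.mxf", "backup/clip.mxf"]
print(locate_essence(0, LOCATORS, lambda loc: loc.startswith("backup/")))
```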
This section is written in a decoder-centric fashion to illustrate why certain parameters are stored the way they
are. An Encoder should create a file so that the maximum number of decoders is likely to be able to read /
decode it. What does this mean? In practice, it means that the MXF Encoder’s designers may discover that
there are choices to be made when creating MXF Files. It may be the case that “elegant little tricks” with the
MXF syntax are found that may make life easier for the Encoder designer. If the use of such tricks reduces the
chance of interoperability with simple decoders, these tricks should be avoided. MXF is an Interchange File
format and the goal of all MXF devices should be to maximize the probability of Interoperability.
The order in which an MXF device or application searches for parameters within the file depends very much on
what the device or application is trying to do with the file. For example:
• An MXF file explorer GUI probably wants ownership information from the Identification Set
• An MXF Asset Manager needs to know UMIDs of the current and previous versions as well as whether
the content is in the file or externally referenced.
• An MXF Tape device probably wants the size of the Header Metadata and the Essence Container type
• A computer based MXF playback application probably wants to know the Operational Pattern and what
Essence Container Type(s) are in the file
• An MXF Edit conformer needs to know the Essence Container Types and whether or not all the
Essence is Internal to the file.
Notice from the list above that there are valid and important MXF applications that do not need to know the
exact Essence Type and are never likely to decode the content. To be able to read the file, the MXF decoder is
likely to go through a number of steps in both the physical and logical structures of the file.
Most of the “fail fast” information required by a decoder can be found in the Partition Pack. Typical processing by
the decoder may be:
• Is this an MXF Version I understand? The MXF decoder checks the MajorVersion and MinorVersion
properties of the Partition Pack and checks them against the decoder’s reference value. Note that in
future versions of SMPTE 377M the Partition Pack key may have differences in bytes 14, 15 and 16
compared to previous versions of the specification.
• Is this an Operational Pattern I can handle? The MXF decoder checks the Operational Pattern UL
against the list of ULs it knows how to handle.
• Is the data in this Partition stable? The MXF decoder checks byte 15 of the Partition Pack key to
determine if this partition is of type “closed” or “closed and complete”. If the partition is of type “Open”
then the MXF application should find another Partition Pack because the information in this one may
have been created on the fly and may be inaccurate.
• Can I decode or process the Essence? The MXF decoder processes the EssenceContainers Batch in
the Partition Pack to compare each label against a list of labels it knows how to process. It is possible
that the Essence will be stored in several Essence Containers of the same type (e.g. 3 DV clips) – in
this case, there will be only 1 instance of the EssenceContainer Label. It is also possible that there will
be a single EssenceContainer in the file and that this will contain several different interleaved Essence
Types – for example, there may be uncompressed images in a Generic Container interleaved with
several tracks of AES audio. In this case there would be 2 Essence Container Labels – one for the
uncompressed pictures and the other for the interleaved audio.
• What is the duration of the file? The MXF decoder searches for the Primary Package UID in the Preface
Set and discovers the duration by inspecting the duration property of the sequences of the tracks in that
package.
• What device made it? This information is stored in the Identification Set which can be found using the
most recent Generation UUID.
• Is it HDTV or SDTV? This can be determined by inspecting the Essence Descriptor for the Picture
Track. The Picture Track in the Top-Level File Package(s) has a property called TrackID. This will match
one of the linked TrackID values in one of the EssenceDescriptors within the file. This
EssenceDescriptor contains many properties that fully describe the source Picture Essence. These
include horizontal and vertical sizes as well as the frame rate and nominal aspect ratio of the content.
• Where is the External Essence? Each Essence Descriptor has a Locators property, which is an ordered
list of places where the Essence might be. This list should be searched in order to find the essence. A
locator may be a URL or it may be text intended for a human operator (e.g. “all known URLs have been
searched (<list of URLs inserted by application>) and the essence was not found – it came from the
green cassette on the shelf behind the water cooler”). Mechanisms for finding external essence are
outside the scope of this document, but Media Asset Management systems that use UMIDs for
identification are becoming more common at the time of writing of this document.
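The first four "fail fast" questions above can be sketched as a single check on the Partition Pack. This is a minimal illustration rather than a reference decoder: the PartitionPack class, the fail_fast function and the CLOSED_STATUSES values for byte 15 of the key are assumptions made for this example and should be confirmed against SMPTE 377M.

```python
from dataclasses import dataclass, field

# Assumed byte-15 status values (closed, and closed and complete).
# Hypothetical constants for this sketch; confirm against SMPTE 377M.
CLOSED_STATUSES = {0x02, 0x04}


@dataclass
class PartitionPack:
    """Hypothetical container for already-parsed Partition Pack properties."""
    key: bytes
    major_version: int
    minor_version: int
    operational_pattern: bytes
    essence_containers: list = field(default_factory=list)


def fail_fast(pack, known_versions, known_patterns, known_containers):
    """Return the reasons why this decoder should not go any further."""
    problems = []
    if (pack.major_version, pack.minor_version) not in known_versions:
        problems.append("unsupported MXF version")
    if pack.operational_pattern not in known_patterns:
        problems.append("unsupported Operational Pattern")
    if pack.key[14] not in CLOSED_STATUSES:  # byte 15, counting from 1
        problems.append("partition is open; find another Partition Pack")
    unknown = [ul for ul in pack.essence_containers if ul not in known_containers]
    if unknown:
        problems.append(f"{len(unknown)} unknown Essence Container label(s)")
    return problems
```

An empty list means that none of the four checks failed; duration, originating device and picture format still have to be resolved from the Header Metadata.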
[Figure: a Content Package made up of System, Picture, Sound and Data elements, with System metadata to element linking]
These guidelines create files that are streamable, but may require large receiver buffers to synchronize the
Picture, Sound and Data. Many compression specifications provide a lot of information on buffering and
streaming, and creating a system with similar buffer characteristics is the goal here. For example, the MPEG-2
specification ISO/IEC 13818-1 gives rules and guidelines for multiplexing the audio and video streams into
either a Program Stream or a Transport Stream.
When streaming a file, the decoder is intended to display the pictures and recreate the sound while the file is
being sent. The delay through the video and audio decoders is often not the same; therefore buffering is
required in the decoder to bring the sound and pictures into synchronization. This buffering is often in addition to
any buffering required for compression decoding and basic demultiplexing of the streams.
The guidance given here is that an MXF encoder should create a stream as though it were creating the content
for streaming using the underlying compression standard; the GC Content Package guidelines above should
then be applied. This should result in a good compromise between low latency and KLV decodability.
This constraint is to ensure the continuous decodability of the Essence. It does not constrain changes in aspect
ratio, Active Format Descriptor, Colorimetry or any other parameter that can vary without resetting or crashing
an Essence Decoder. Changes of picture size, frame rate, Essence Coding Mode, discontinuities in timing
parameters and errored data are all examples of Essence Decodability conditions that would break the OP1a
requirement. It is important to note that even if the OP1a Essence Decodability conditions are met, the file must
still be wrapped and delivered in an appropriate fashion to be a streaming file.
Now that the underlying Package type is known, the relationships between the packages can be determined.
The Material Package, or Material Packages, have Tracks that have Sequences that have SourceClips that refer
to Top-Level File Package tracks. Only these Top-Level File Packages are allowed to describe actual Essence.
The Top-Level File Packages have Tracks that have Sequences that have SourceClips that may reference
lower-level Source Packages. These lower-level Source Packages contain historical derivation information.
Lower-level Source Packages, whether File Packages or Physical Packages, will always describe essence that is
external to the MXF file.
Now that all the Packages are known, the Track types need to be identified. In MXF, all Tracks look the same
and it is not until the Sequence referenced by the Track is inspected that the Track type is known. Similarly, all
Sequence Sets look the same and it is not until the Data Definition Property value is resolved that the track type
can finally be worked out. The values of the ULs corresponding to the different Track types are given in the
SMPTE Labels Registry. There are different UL values for Picture, Sound and Data Tracks; this Data Definition
value should be consistent between the Sequences and SourceClips along a Track as well as those up and
down the Source Reference chain.
The most obvious physical constraint is to make a file that is streamable (see 8.2.1). When there are multiple Top-
Level File Packages in the file, managing streaming buffers becomes slightly more complicated because of the
requirement that the essence for each Top-Level File Package must be in a partition with a unique BodySID
value. The management of the data in the Partition Packs and any Index Table segments must be done in such
a way that the receiver Essence buffers are still kept in a condition that prevents overflow and underflow.
8.2.3.1 Which Top-Level File Package goes with which Material Package track?
Each Material Package SourceClip has 2 properties that identify the appropriate Top-Level File Package:
SourcePackageID - a 32 byte Basic UMID
SourceTrackID - a 4 byte Uint32 Track Identifier
These identify respectively the Top-Level Source Package and the track within it. The referenced Top-Level File
Package Set will have a PackageUID property that is the same as the SourcePackageID property of the Material
Package SourceClip. This Top-Level File Package will have an InstanceUID that is in the batch of Strong
References to Packages in the ContentStorage Set (when the Top-Level File Package is stored within the file).
8.2.3.2 Which Partition of Essence goes with which Top-Level File Package?
The important parameter here is the BodySID value, which is found in one of the Essence Container Data sets.
Having identified the Top-Level File Package UMID, which was the same as the SourcePackageID in the
Material Package SourceClip, each of the Essence Container Data sets is searched until the Package UID is
found in the Linked Package UID property. This set will contain a BodySID value and an IndexSID value that are
used to identify the partitions that contain the Essence Data and Index Table data for this Top-Level File
Package. This BodySID value will be found in the BodySID property of the Partition Packs where Essence Data
can be found.
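The look-up described above can be sketched as follows. The dictionary keys and function names are hypothetical stand-ins for parsed Essence Container Data sets and Partition Packs; only the matching logic is taken from the text.

```python
def find_essence_sids(source_package_id, essence_container_data_sets):
    """Return (BodySID, IndexSID) for the File Package, or None if unlisted."""
    for ecd in essence_container_data_sets:
        if ecd["LinkedPackageUID"] == source_package_id:
            return ecd["BodySID"], ecd["IndexSID"]
    return None


def partitions_with_essence(body_sid, partition_packs):
    """Return the partitions whose BodySID matches.

    A BodySID of zero indicates external essence, so no partition is returned.
    """
    if body_sid == 0:
        return []
    return [p for p in partition_packs if p["BodySID"] == body_sid]
```

The same pattern applies to the IndexSID when locating Index Table segments.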
For Essence Containers that use the MXF Generic Container, the Track Number property will match bytes 13-16
of the Key of wrapped Essence Data. Specific details of these 4 bytes can be found in the MXF Generic
Container specification as well as the individual Generic Container mapping documents.
If these assumptions are not valid, some math is required to determine the correct start point. In SMPTE 377M,
synchronization is discussed in section 8.4. The equation for synchronization is copied below:
Essence on tracks n and m are synchronized when:
$$\frac{\mathrm{Position}_{n}}{\mathrm{EditRate}_{n}} = \frac{\mathrm{Position}_{m}}{\mathrm{EditRate}_{m}}$$
In addition, a SourceClip’s StartPosition is measured in Edit Units of the Track containing the SourceClip, not of
the referenced Track. This means that when material is re-digitized or re-linked, you don’t have to go and re-
normalize all the tracks that reference that material.
Now it should be clear that the desired Position along the referenced track (in Edit Units of the referenced track)
is given by the equation below:
Position along File Package Track is:
$$\mathrm{Position}_{fp} = \mathrm{EditRate}_{fp} \times \frac{\mathrm{Position}_{mp}}{\mathrm{EditRate}_{mp}}$$
But this is not the end! The Origin Parameter for the File Package indicates how much stored essence exists
before the Position=0 point on the track. The final equation giving the start point along the stored essence
measured in File Package Edit Units is therefore given by the equation below:
$$\mathrm{Offset\_From\_Stored\_Essence\_Start}_{fp} = \mathrm{Position}_{fp} + \mathrm{Origin}_{fp} = \mathrm{EditRate}_{fp} \times \frac{\mathrm{Position}_{mp}}{\mathrm{EditRate}_{mp}} + \mathrm{Origin}_{fp}$$
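The two equations above can be worked through with exact rational arithmetic, which avoids rounding errors with edit rates such as 30000/1001. The function names and the numeric values are illustrative assumptions, not taken from SMPTE 377M.

```python
from fractions import Fraction


def file_package_position(position_mp, edit_rate_mp, edit_rate_fp):
    """Convert a Material Package position into File Package Edit Units."""
    return Fraction(edit_rate_fp) * position_mp / Fraction(edit_rate_mp)


def offset_from_stored_essence_start(position_mp, edit_rate_mp,
                                     edit_rate_fp, origin_fp):
    """Add the File Package Origin to obtain the offset into stored essence."""
    return file_package_position(position_mp, edit_rate_mp, edit_rate_fp) + origin_fp


# Hypothetical example: a 25 Hz Material Package track referencing a
# 48 kHz sound track whose Origin is 4800 Edit Units.
offset = offset_from_stored_essence_start(50, 25, 48000, 4800)
```

A non-integer result indicates that the requested position does not fall on an Edit Unit boundary of the referenced track; how to round it is an application decision.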
The next question to be answered is “In which order should the Essence Containers appear in the file?”
If it is known that some of the Essence Containers are more likely to be changed than the others (for example
audio tracks that might be edited), then those Essence Containers should occur last in the file. The Essence
Container that is least likely to be changed should be placed first in the file.
If no knowledge of the likelihood of change is available to the MXF encoder then the Essence Containers should
be ordered so that the largest Essence Container appears first in the file. There are always going to be
circumstances when this rule is not optimal (e.g. when preview pictures are in the file), so implementers are
advised to think carefully about application requirements before committing to firm multiplexing rules.
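The ordering guidance above can be sketched as a simple sort. The dictionary keys "size" and "change_likelihood" are hypothetical names for information an encoder might hold; the guideline itself does not define them.

```python
def order_essence_containers(containers):
    """Order Essence Containers for writing, following the guidance above.

    When a change likelihood is known for every container, the one least
    likely to change goes first; otherwise the largest goes first.
    """
    if all(c.get("change_likelihood") is not None for c in containers):
        return sorted(containers, key=lambda c: c["change_likelihood"])
    return sorted(containers, key=lambda c: c["size"], reverse=True)
```

As noted above, this is a default only; application requirements such as preview pictures may call for a different order.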
The question of “which Material Package do I use” is an application-specific question, but in general the
Package whose Instance UID value appears in the Preface Pack’s Primary Package property should be the one
chosen if no additional information is available.
The Partition Pack has two properties that should be consistent throughout the file:
ThisPartition: The offset to the start of this partition in the sequence of partitions (as a byte count relative
to the start of the Header Partition).
PreviousPartition: The offset to the start of the previous partition in the sequence of partitions (as a byte
count relative to the start of the Header Partition).
In addition, the start of an MXF file is identified by the first 11 bytes of the Key of the Partition Pack.
It should now be possible to see that a push-mode transfer may be joined halfway through the stream by
detecting the first 11 bytes of a Partition Pack. If this is a valid Partition Pack then the remaining bytes of the key
will match a known Partition Pack, and the properties within the Pack will contain valid values. The very first
partition of the file is always the Header Partition and will have a “ThisPartition” value of 0. If a push-mode
transfer is joined and “ThisPartition” is non-zero then the number of missed bytes can be determined.
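Joining a push-mode transfer can be sketched as a search for the key prefix followed by simple arithmetic on ThisPartition. The 11-byte prefix below is an assumption for this example and should be checked against SMPTE 377M; the function names are hypothetical.

```python
# Assumed first 11 bytes of every Partition Pack key (verify in SMPTE 377M).
PARTITION_PACK_PREFIX = bytes.fromhex("060e2b34020501010d0102")


def candidate_offsets(received: bytes):
    """Return every offset in the received data where the prefix occurs."""
    offsets = []
    start = received.find(PARTITION_PACK_PREFIX)
    while start != -1:
        offsets.append(start)
        start = received.find(PARTITION_PACK_PREFIX, start + 1)
    return offsets


def missed_bytes(this_partition: int, offset_in_received: int) -> int:
    """Bytes of the file sent before the receiver joined the transfer.

    this_partition is the ThisPartition value read from the Partition Pack
    found at offset_in_received within the data actually received.
    """
    return this_partition - offset_in_received
```

Each candidate offset still needs the validity checks described in the text, because the prefix alone may occur by chance inside essence data.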
The PreviousPartition value can be used as a rough measure of the rate of insertion of partitions (assuming that
there is some consistency to the partitioning strategy used by the MXF Encoder). It should also be noted that
although the first 11 bytes of the Partition Pack key form quite a long byte sequence, they are not necessarily sufficiently
unique to never occur in the essence of a file. For this reason, a more robust decoder may wait until the second
Partition Pack header is received and check that:
Files that act as the Master for this Operation should be constructed with regular Body Partitions, a Random
Index Pack (RIP) and Index Tables. Ideally a complete Index Table for each and every Essence Container will
exist both in the Header and in the Footer of the File.
The portion of the file to be extracted will most often be expressed in terms of time along the file. This example
will only consider the case of an Operational Pattern 1a file. In the higher Operational Patterns, extra work must
be undertaken to ensure that the correct portions of each and every referenced Top-Level File package are
extracted. The complexity of the Index Table handling will also increase because there is one Index Table per
Essence Container that may be segmented. Each Essence Container must be handled separately with the RIP
being used to identify the start of each partition.
In any MXF file, a RIP can be detected by reading the last 4 bytes (32 bits) of the MXF File and using them as a Uint32
backwards offset from the end of the file to the start of the RIP (precise details are in SMPTE 377M). If the RIP
is present then the offset will point to the first byte of the KLV key of the RIP. The RIP can now be read and the
start point of each of the partitions in the file can be determined. In an OP1a file, this data is less critical than in a
higher Operational Pattern file where the Partitions will also be used to separate the different Essence
Containers. In OP1a files, there is only one Essence Container and therefore only one Index Table. An Index
Table Segment can now be located by finding a partition with the correct IndexSID value in the Partition Pack.
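The RIP look-up can be sketched as below. The sketch assumes that the trailing length is a 4-byte big-endian Uint32 covering the whole RIP, that each RIP entry is a Uint32 BodySID followed by a Uint64 byte offset, and that the RIP key has the value shown; all three should be checked against SMPTE 377M. The function names are hypothetical.

```python
import io
import struct

# Assumed Random Index Pack key (verify against SMPTE 377M).
RIP_KEY = bytes.fromhex("060e2b34020501010d01020101110100")


def _read_ber_length(f):
    """Decode a BER length in short or long form."""
    first = f.read(1)[0]
    if first < 0x80:
        return first
    return int.from_bytes(f.read(first & 0x7F), "big")


def read_rip(f, rip_key=RIP_KEY):
    """Return a list of (BodySID, byte offset) pairs, or None if no RIP."""
    f.seek(0, io.SEEK_END)
    file_size = f.tell()
    if file_size < 4:
        return None
    f.seek(file_size - 4)
    (rip_length,) = struct.unpack(">I", f.read(4))
    if rip_length < 21 or rip_length > file_size:
        return None
    f.seek(file_size - rip_length)
    if f.read(16) != rip_key:
        return None
    value_length = _read_ber_length(f)
    if value_length < 4 or (value_length - 4) % 12:
        return None
    pairs = f.read(value_length - 4)  # the final 4 bytes are the length field
    return [struct.unpack_from(">IQ", pairs, i) for i in range(0, len(pairs), 12)]
```

A return value of None simply means that no RIP was recognised; the partitions then have to be found by following the Partition Pack offsets instead.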
Now that the Index Table has been found, the byte Offsets within the Essence Stream can be found by an Index
Table look-up. If the partial file extraction is to be done with a minimum of processing then all the partitions from
the one containing the first byte up to and including the last partition containing the last byte can be extracted.
It is strongly recommended that after this extraction process has been done, the partition header data be
processed to correct the MXF file:
• The “ThisPartition” and “PreviousPartition” values in each partition header should be corrected
• Index Tables should be created that are consistent with the new partial file
• The UMIDs should be updated to show that this is not the same as the original material. (A combination
of SMPTE RP205 and Operational Practice will determine the exact UMID modification required)
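The first correction in the list above can be sketched as a recalculation from the sizes of the extracted partitions. The function name is hypothetical, and the sketch assumes that the first partition of the new file is written as its Header Partition with a PreviousPartition value of 0.

```python
def renumber_partitions(partition_sizes):
    """Recompute ThisPartition and PreviousPartition for an extracted file.

    partition_sizes lists the byte size of each partition, in file order.
    Offsets are relative to the start of the first (Header) partition.
    """
    corrected = []
    offset = 0
    previous = 0
    for size in partition_sizes:
        corrected.append({"ThisPartition": offset, "PreviousPartition": previous})
        previous = offset
        offset += size
    return corrected
```

Any other offsets stored in the partition headers, and any RIP, need the same treatment.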
Are Locators the only way of finding external metadata? No. If a Material Package references a File Package
that is simply not present in the File, then this is a valid external reference. In this case Bit 1 would have to be
set. Finding the Essence is rather more difficult – an external Media Asset Management system needs to be
used in order to resolve the UMID and find the content.
The next 3 figures attempt to show 3 different conditions that could result in external essence. Figure 16 shows
linkage using only UMID as the linking mechanism. The Material Package contains a SourceClip with a
SourcePackageID (UMID) that is not in the file. This can be determined by inspection of all the Top-Level File
Packages and optionally by the presence of an Essence Container Data Set with a BodySID of 0. Some external
mechanism (such as an asset management system) is required to resolve this UMID to a filename that can be
inspected for a UMID match as shown in the lower part of the diagram.
[Figure: a Material Package Picture SourceClip (SourcePackageID = XX, SourceTrackID, Start Position, Duration) linked by UMID to an external file with BodySID = x and IndexSID = y, whose File Package (UMID = XX, Picture Track, Picture Sequence) is linked by SourcePackageID to an Essence Container Data set (UMID = XX, BodySID = x, IndexSID = y)]
Locators provide a mechanism for discovering the location of external essence using only information within the
file. The advantage is that no external mechanism is required; the disadvantage is that when the external file is
moved, the locators should be updated. Figure 17 shows a similar example to the one above, although this time
the Material Package contains a SourceClip with a SourcePackageID (UMID) that appears to be in the file. Why
“appears to be”? Because a File Package exists in the file with the correct UMID, but the Essence Container
Data set indicates the BodySID value is 0. There are, however, two network locators and a text locator. The first
of these locators is resolved to the file in the lower half of the figure. The locator identifies non-MXF essence
and because of this, it may be difficult for an application to check the UMID for correctness.
Figure 18 shows an example where the external essence is an MXF File. As in the examples above, a Material
Package references a File Package that appears to be in the File. The Essence Container Data set indicates
that the essence is external because the BodySID is 0 and, equally obviously, because there is no essence in the file!
The locator resolves to an MXF File, and this time checks can be made to determine that the target of the
reference is correct. The Top-Level File Package of the target file will have the same values as the Top-Level
File Package in the first file. If the UMIDs match, the target file has been found. If not, the rest of the Locators
should be inspected as above.
The Top-Level File Package in the external file should be identical to that in the original file. If there are any
discrepancies, then the metadata values in the external file should take precedence. The Top-Level File
Package in the original file should be regarded as a copy.
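The locator search for an external MXF file can be sketched as a loop over the Locators in their listed order, accepting the first target whose Top-Level File Package UMID matches. The read_package_umids callable is a hypothetical helper supplied by the application; it is assumed to return the package UMIDs found in the target, or None when the locator cannot be resolved.

```python
def resolve_external_essence(wanted_umid, locators, read_package_umids):
    """Return the first locator whose target holds the wanted UMID, else None."""
    for locator in locators:  # the listed order is significant
        umids = read_package_umids(locator)
        if umids is not None and wanted_umid in umids:
            return locator
    return None
```

When no locator matches, an external mechanism such as an asset management system is needed to resolve the UMID.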
[Figure: the locator z:/tmp/clip.mxf resolves to an External MXF Essence File made up of a Partition Pack, Header Metadata, an Index Table Segment (IndexSID = y) and an Essence Container (BodySID = x given in Partition Pack). Its File Package (UMID = XX, Picture Track, Picture Sequence) and Essence Container Data set (UMID = XX, BodySID = x, IndexSID = y) carry the matching UMID.]
Figure 18 : External Essence example using locators and UMIDs for linking
After reading the paragraph above two or three times, its meaning should be clear. One possibly ambiguous case is where a file
contains only a single Essence Container that is intrinsically streamable but is clip-wrapped, either in a Generic
Container or in its own native container. In this case, Bit 2 should be set to “Wire File” because the resulting file
is still streamable according to the definition above. The interleave duration is set by the intrinsic streamability of
the underlying essence and there is no partitioning (i.e. multiplexing). The application has determined, therefore,
that the Multiplex duration is equal to the length of the file.
Some cases of streamable status are clear and unambiguous. However, other cases can be subjective. The
following illustrate some possible cases of streamable files (all assuming that the essence is, itself, streamable):
1. A single, frame-wrapped, EC with a single essence element (e.g. OP1a with B-Wav essence).
2. A single, frame-wrapped, EC with multiple interleaved essence elements (e.g. OP1a with Type D-10
mapping).
3. Multiple, frame-wrapped, ECs where the ECs are in presentation sequence and in contiguous partitions
(e.g. OP2a with Type D-11 mapping).
4. A single, clip-wrapped, EC with a single essence element (e.g. OP1a with MPEG-2 long-GOP video ES).
5. A single, clip-wrapped, EC with an inherently interleaved essence stream (e.g. OP1a with a DV DIF stream).
6. Multiple, clip-wrapped, essence elements, each in separate ECs, which are multiplexed over clips of short
duration (say <1sec) (e.g. OP2b with MPEG-2 I-frame video ES multiplexed with B-Wav audio).
Note that the goal of this flag is to describe a Uni-Track file i.e. an OP2a file could be uni-track because it could
be constructed to have only one active track in the output timeline. OP1b could not be uni-track because there
will always be two or more synchronized tracks active on the output timeline.
MXF Index Tables are intended to be versatile, compression agnostic, streamable and applicable to any and all
of the MXF Operational Patterns defined in SMPTE 377M. The purpose of an Index Table is to convert from
time offsets to byte offsets within a file. The MXF Index Table specification may at first seem rather complex,
but its resulting versatility gives huge functionality to random access systems:
• Cameras and streaming devices can create Segmented Index Tables on the fly
• Storage devices may have Index Tables at the start, end or both
• Index Tables are created for each Essence Container. Multiplexing Essence Containers or changing the
partitioning of a file does not change the Index Table
The Index Table structure for an Essence Container is defined by the “Delta Entries”. There is one Delta Entry
for each of the Interleaved Elements of the Essence Container. These Delta Entries allow an Essence Element
to be categorized as either CBE (Constant Bytes per Element) or VBE (Variable Bytes per Element). MXF
Encoders should always “play it safe” if there is any uncertainty that an Element is CBE. Each and every
Element in the entire file should have the CBE byte count – if this is not true then each Index Table must use the
slice mechanism to indicate a VBE stream.
In this example, there are 3 separate Essence Tracks that need to be indexed. The Data and Sound Elements
are all CBE, but the Interleaving Rules used for this Essence Container lead to a variable number of Sound
Elements per Content Package. In the Content Package shown in Figure 19 there are two Sound Elements.
This results in a VBE Sound Stream for the purposes of Indexing because the number of bytes of Sound data
for a given Edit Unit is not constant.
The MXF Index Table Delta Entries are intended to allow identification of each of the Indexed Elements, and to
indicate whether they are CBE or VBE Elements. We will partially fill in the DeltaEntry Array here with the CBE
values we know. The VBE values will be filled in once slices have been introduced.
In the table below the expression BCSystem indicates the byte count for the System Element
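Table 4
Delta Entry    Field          Description                                 Value      Comment
0 (System)     PosTableIndex  Temporal Reordering / Index into PosTable
               Slice          Slice number in IndexEntry                  0          It’s the start of the Index Table – slice 0
               Element Delta  Delta from start of slice to this Element   0          It’s the first entry – offset 0 bytes from the start of this Indexed Edit Unit
1 (Data)       PosTableIndex  Temporal Reordering / Index into PosTable
               Slice          Slice number in IndexEntry                  0          The previous element was CBE, so this is still slice 0
               Element Delta  Delta from start of slice to this Element   BCSystem   This element starts at the end of the System Element, so this value is the byte count of the System Element
2 (Picture)    PosTableIndex  Temporal Reordering / Index into PosTable
               Slice          Slice number in IndexEntry
               Element Delta  Delta from start of slice to this Element
3 (Sound)      PosTableIndex  Temporal Reordering / Index into PosTable
               Slice          Slice number in IndexEntry
               Element Delta  Delta from start of slice to this Element
4 (Fill)       PosTableIndex  Temporal Reordering / Index into PosTable
               Slice          Slice number in IndexEntry
               Element Delta  Delta from start of slice to this Element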
Table 5
Delta Entry    Field          Description                                 Value                Comment
0 (System)     PosTableIndex  Temporal Reordering / Index into PosTable
               Slice          Slice number in IndexEntry                  0                    It’s the start of the Index Table – slice 0
               Element Delta  Delta from start of slice to this Element   0                    It’s the first entry – offset 0 bytes from the start of this Indexed Edit Unit
1 (Data)       PosTableIndex  Temporal Reordering / Index into PosTable
               Slice          Slice number in IndexEntry                  0                    The previous element was CBE, so this is still slice 0
               Element Delta  Delta from start of slice to this Element   Sizeof(System)       This element starts at the end of the System Element, so this value is the byte count of the System Element
2 (Picture)    PosTableIndex  Temporal Reordering / Index into PosTable
               Slice          Slice number in IndexEntry                  0                    This is the Element that terminates slice 0
               Element Delta  Delta from start of slice to this Element   BCSystem + BCData    The offset to the start of the Picture item is the byte count of the System Element + the byte count of the Data Element
3 (Sound)      PosTableIndex  Temporal Reordering / Index into PosTable
               Slice          Slice number in IndexEntry                  1                    This is the Element that terminates slice 1
               Element Delta  Delta from start of slice to this Element   0                    It is also the first element in slice 1
4 (Fill)       PosTableIndex  Temporal Reordering / Index into PosTable
               Slice          Slice number in IndexEntry                  2                    This is the Element that terminates slice 2
               Element Delta  Delta from start of slice to this Element   0                    It is also the first element in slice 2
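The Slice and Element Delta values in Table 5 follow a simple rule: a CBE element advances the delta within the current slice, while a VBE element terminates the slice so that the next element starts a new slice at delta 0. The sketch below applies that rule; the function name and the byte counts of 57 and 700 are placeholders chosen for this example, standing in for BCSystem and BCData.

```python
def build_delta_entries(elements):
    """Derive Slice and Element Delta values for an ordered element list.

    elements is a list of (name, byte_count) tuples in Content Package
    order, where byte_count is None for a VBE element.
    """
    entries = []
    slice_number = 0
    delta = 0
    for name, byte_count in elements:
        entries.append({"element": name, "slice": slice_number, "element_delta": delta})
        if byte_count is None:
            slice_number += 1  # a VBE element terminates the current slice
            delta = 0
        else:
            delta += byte_count
    return entries


table = build_delta_entries(
    [("System", 57), ("Data", 700), ("Picture", None), ("Sound", None), ("Fill", None)]
)
```

With these inputs the Picture entry lands in slice 0 at a delta of 757, and the Sound and Fill entries open slices 1 and 2 at delta 0, which mirrors the pattern of Table 5.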
If the overall length of all the Elements in each frame is constant, then a Delta Entry Array and an "Edit Unit Byte
Count” Item are sufficient to define the Index Table Segment.
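In this constant-length case the look-up reduces to a multiplication. The sketch below assumes that offsets are counted from the start of the indexed Essence Container and that the segment begins at index_start_position; the function and parameter names are hypothetical.

```python
def cbe_byte_offset(position, index_start_position, edit_unit_byte_count,
                    element_delta=0):
    """Byte offset of an element when every Edit Unit has the same length.

    element_delta is the Element Delta of the wanted element, taken from
    the Delta Entry Array (0 for the first element of the Edit Unit).
    """
    edit_units = position - index_start_position
    return edit_units * edit_unit_byte_count + element_delta
```

No Index Entries are needed, which is why such an Index Table Segment stays small regardless of the duration of the file.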
In Clip wrapping mode the tables index the first byte of the data for each indexed frame. For example, in the
MPEG Long GOP case, this will be the first byte of the start_code for the appropriate access unit. This means
that the Element Delta values give precisely the length of the data for each Frame.
If the overall length of all the data for each and every indexed stream is constant for all frames, then a Delta
Entry Array and an "Edit Unit Byte Count” Item are sufficient to define the Index Table Segment.
This is a simple case where the Index Table points to the first byte of the DV-DIF Compound Element Generic
Container Key. There are no other Generic Container items and any Sound or Data Information is embedded
within the DV-DIF container. Therefore, the only pieces of information in the Index Table segment are the Start
Position, Duration and (fixed) size of each DV-DIF Compound Item KLV triplet.
This is a case where the Index Table points to the first byte of the MPEG Picture Element Generic Container
Key. The other Generic Container Elements should be indexed by correct use of the Delta Entries and Index
Entries. This example assumes that the Sound Elements that are Indexed require the use of the fractional
Position mechanism defined in SMPTE 377M. The following figure represents a typical Content Package being
indexed. This figure is based on a figure in SMPTE 377M.
[Figure: typical Content Package being indexed, showing Index Entry n, Delta Entries 2 to 5 (one CBE and three VBE elements), the Slice 1 and Slice 2 start points in Index Entry n, and the Position offsets locating the synchronised Sound sample within the Sound Frame]
In this example, the Picture, Sound and Fill are all VBE. The Fill is indexed so that it can be eliminated from any
Essence Byte Counting based solely on calculations in the Index Table.
The Delta Entry Array contains an entry for every indexed Element in the Generic Container. The order of the
elements in the Delta Entry Array matches the order of the Elements in the Generic Container. The Example
below is a Delta Entry Array designed to match the example in Figure 19. Implementations should construct a
Delta Entry Array according to the properties of the actual Essence in the file.
In this example, we have several variable length Elements so that an Index Entry array is required.
Note also that the Delta Entry Array does not distinguish which element is which in the Index Table. To know
which element is indexed, the following rules apply when an MPEG Long GOP stream is indexed:
• Each Content Package starts with the same number and order of Elements as the previous Content
Package
• If new elements are introduced for whatever reason, they are appended to the end of the existing
Content Package elements
• If elements in the Content Package have no data, then an IndexEntry for a zero length VBE element is
created
• Index tables have the same number of delta entries as the maximum number of elements in any
Content Package
• The essence type of an index entry can be determined by inspecting the key that wraps the indexed
essence.
The table above shows the descriptions of the various elements required in the Index Entry. Below, the Table
shows entries for the first 6 frames of a Long GOP sequence. The following values have been used in creating
the table:
• The GOP display sequence for frames 0-5 is B0I1B2P3B4P5. This is the indexed order of the frames.
• The GOP transmission order for frames 0-5 is I1B0P3B2P5B4. This is the stored order of the frames
• The GOP is closed (i.e. the first B frame contains predictions only from the I Frame).
• The Data Element length is fixed at 700 bytes and temporally offset by -0.25 edit units
• The I frames are 48000 bytes, P frames are 9000 bytes and B frames 1000 bytes.
• In the 6 Content Packages, there are 8 Sound Elements.
• The Sound Elements are multiplexed in the Content Packages as follows: (1)(1)(2)(1)(1)(2)
• Each Sound Element is 1000 bytes.
• Each Fill element is 300 bytes
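Some of the values listed above can be derived mechanically. The sketch below assumes that the temporal offset for an indexed (display order) position is the stored position of that frame minus its display position; this reading should be checked against SMPTE 377M. The function names are hypothetical, and the frame sizes are those given in the list.

```python
FRAME_BYTES = {"I": 48000, "P": 9000, "B": 1000}


def temporal_offsets(display_order, stored_order):
    """Stored position minus display position, per display position."""
    stored_position = {frame: i for i, frame in enumerate(stored_order)}
    return [stored_position[frame] - i for i, frame in enumerate(display_order)]


def picture_bytes(stored_order):
    """Picture Element size for each stored Content Package."""
    return [FRAME_BYTES[frame[0]] for frame in stored_order]


display = ["B0", "I1", "B2", "P3", "B4", "P5"]
stored = ["I1", "B0", "P3", "B2", "P5", "B4"]
sound_elements = [1, 1, 2, 1, 1, 2]

offsets = temporal_offsets(display, stored)
sizes = picture_bytes(stored)
sound_bytes = [count * 1000 for count in sound_elements]
```

Under this assumption the offsets alternate between +1 and -1 for the six frames, and the sound multiplex adds up to the 8 Sound Elements stated above.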
Table 10
Where there are a variable number of bytes per Element, the IndexEntry mechanism needs to be used as
shown in the examples above. An appropriate Indexing Rate is often to provide one Index Entry per second.
When Indexing an external Essence Container, it is recommended that Index Tables be constructed in the same
way they would be constructed if the Essence Container were internal. When Indexing External data that is not
KLV wrapped, the Index Table should be created where the byte offsets refer to the first byte of each Edit Unit of
the Essence – as in Clip Wrapping. Typically this will be the first byte of each frame.
8.4.1 Long GOP MPEG with uncompressed audio & other data
Long GOP MPEG is unlike many other essence types in that video frames are re-ordered when stored. This
leads to complications in the creation of Index Tables and the synchronization of associated Audio and Data
Elements. The MXF MPEG mapping document goes into detail of how the different elements should be
arranged to achieve synchronization and to improve interoperability.
This MXF Engineering Guideline recommends that Frame wrapping of Long GOP MPEG should be used
wherever possible. It also recommends that the interleaving guidelines should be followed so that the
relationship between the Essence Elements in each Content Package is consistent.
The interleaving rules are designed so that when a group of Content Packages is extracted, the likelihood of
extracting the synchronized Picture, Sound and Data Elements is maximized. The figure below shows the
physical arrangement of the KLV triplets in a file. It can be seen that the different channels of Sound and Data
are KLV wrapped and kept contiguous with the Picture KLV.
[Figure: physical arrangement of the KLV triplets for two consecutive one-frame Content Packages. Each Content Package holds, in order, a Picture Element, a Sound Item made up of two Sound Elements, and a Data Element. Every Element is an individual KLV triplet.]
The intention of PixelLayout is to provide an algorithmic way of expressing the stored pixels for the bit packing
schemes which are likely to be used. The PixelLayout property is a zero-terminated sequence of pairs, each made up
of a character code and a UInt8 bit depth value. These are all defined in SMPTE 377M, but a brief example illustrates the principle.
To describe 8-bit component 4:2:2 pixels packed into a 32-bit word, the 601 sequence would be:
Cb Y Cr Y
If these bytes were stored contiguously in an MXF file, the PixelLayout property to describe this arrangement
would be:
'U', 8, 'Y', 8, 'V', 8, 'Y', 8, 0, 0
This decodes as: 8 bits U (Cb), followed by 8 bits Y, followed by 8 bits V (Cr), followed by 8 bits Y. The final two zero
values terminate the property.
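The decoding described above can be sketched as follows. This is an illustration only: the function name is hypothetical, and the byte sequence is reconstructed from the decoding given in the text (U, Y, V, Y at 8 bits each, followed by two zero bytes). SMPTE 377M remains the authority for the full set of character codes.

```python
def decode_pixel_layout(layout: bytes):
    """Split a PixelLayout value into (component code, bit depth) pairs.

    Reading stops at the first pair whose code and depth are both zero,
    which is the terminator described in the text.
    """
    components = []
    for i in range(0, len(layout) - 1, 2):
        code, depth = layout[i], layout[i + 1]
        if code == 0 and depth == 0:
            break
        components.append((chr(code), depth))
    return components


# 8-bit 4:2:2 components packed as Cb Y Cr Y in a 32-bit word.
layout = bytes([ord("U"), 8, ord("Y"), 8, ord("V"), 8, ord("Y"), 8, 0, 0])
components = decode_pixel_layout(layout)
bits_per_word = sum(depth for _, depth in components)
print(components, bits_per_word)
```

Summing the bit depths gives the size of the packed word, which is a simple consistency check against the stated 32-bit packing.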
identifies the payload in the Anc packet. VBI data is, by its nature, un-typed and can carry any kind of payload
without any local identification.
It is recommended that all data be wrapped using the MXF Generic Container Specification – even private data.
This allows the maximum re-use of existing tools, tests, code and knowledge in the MXF interchange
environment. This example will assume a mapping of private essence to the Generic Container.
To identify the private essence, there are certain unique identifiers that need to be generated:
1. Keys to wrap the private essence
2. An Essence Container UL to describe the essence containment used
3. An Essence Descriptor with appropriate ULs to describe the actual Essence
4. A Data Definition for use in the Sequences and SourceClips
When the Generic Container is used, the first 13 bytes of the Key are already defined.
An Essence Container UL should ideally be a registered SMPTE UL. This could probably be an organizationally
registered UL if the Essence type is regularly used by an organization. Mechanisms for registering within
SMPTE are being created as this document is being written.
A private metadata item should have a unique 16-byte identifier. It is recommended that any metadata Item that
is likely to be used often be registered with SMPTE and allocated a 16-byte UL. This is not always possible, so
a 16-byte UUID may be generated instead. It is important that this UUID is understood by both the encoder and
the decoder of this private data; otherwise the data cannot be interchanged.
To extend an existing MXF set, the MXF encoder places the 16-byte identifier for the data in the Primer Pack
and generates a 2-byte local tag from the “dynamic” range of numbers given in the Format Specification. It is
important to check that this allocated number is not already used within the Primer Pack of the file.
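The allocation step can be sketched as below. This is a minimal illustration, not an encoder implementation. The Primer Pack is modelled as a plain dictionary from 2-byte local tag to 16-byte identifier, the function name is hypothetical, and the dynamic range of 0x8000 to 0xFFFF is an assumption that should be checked against the Format Specification.

```python
# Assumed bounds of the "dynamic" local tag range; verify against SMPTE 377M.
DYNAMIC_TAG_FIRST = 0x8000
DYNAMIC_TAG_LAST = 0xFFFF


def allocate_local_tag(primer, identifier):
    """Return a 2-byte local tag for `identifier`, adding it to `primer`.

    `primer` maps local tag (int) to 16-byte identifier (bytes) and stands in
    for the Primer Pack of the file being written.
    """
    if len(identifier) != 16:
        raise ValueError("identifier must be 16 bytes")
    # Reuse the existing entry if this identifier is already in the Primer Pack.
    for tag, known in primer.items():
        if known == identifier:
            return tag
    # Otherwise take the first dynamic tag that is not already in use.
    for tag in range(DYNAMIC_TAG_FIRST, DYNAMIC_TAG_LAST + 1):
        if tag not in primer:
            primer[tag] = identifier
            return tag
    raise RuntimeError("no free dynamic local tag")


primer = {0x3C0A: bytes([0x06] * 16), 0x8000: bytes([0x01] * 16)}
private_id = bytes([0xAA] * 16)
tag = allocate_local_tag(primer, private_id)
print(hex(tag))
```

Because the check is made against the file's own Primer Pack, the same identifier may legitimately receive a different tag in a different file.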
Once this procedure is complete the private metadata value can be added to the appropriate Local Sets in the
MXF specification. The Primer Pack mechanism ensures that all decoders that don’t recognize the 16-byte
identifier will ignore it. In order to respect the AAF data model, all private metadata additions to the MXF
specification must follow the requirements of the single-inheritance hierarchy rules of the AAF class hierarchy.
Failure to follow this rule may lead to decoder errors. The best way to add private metadata such that it is
compatible with the AAF data model is to study the model, which is available from the AAF Association (see
section C.1).
A new KLV coded set or group is more straightforward to add. A new 16-byte UL must be registered with
SMPTE to wrap the set. The set should use the Primer Pack mechanism outlined already if 2-byte tags are
being used in the set, otherwise the new set should follow SMPTE 336M. This set may be specified so that it
does not need to follow the single inheritance hierarchy rules and may therefore be “dark” to AAF decoders.
At the time of writing this document a Private Metadata Carrier set was being designed. If the design is
successful, this will provide a standardized way of adding sets of private metadata items to the MXF
specification.
If the intimate metadata is quite compact, it may be appropriate to represent it as private metadata in the
header. An example could be the representation of Camera movements as private Events on a Descriptive
Metadata Track.
It is likely that well-known intimate metadata properties such as Aspect Ratio and AFD information will have
intimate metadata mechanisms defined for tracking them in MXF.
It is also likely that certain kinds of metadata may be carried within the Essence Container itself, such as in
some Elements of a System Item of the Generic Container.
In MXF, Timecode is a metadata annotation. The concept of time in MXF corresponds to a number of edit units
along a particular track. To determine the Timecode at a given position on a Track, the value of the Timecode
segment must be calculated or read for that Timecode Track. It is highly recommended that all MXF files are
created with Timecode Tracks in the Material and Top-Level File Packages, although this is not a normative
requirement. MXF decoders should still operate correctly if the Timecode Track is missing.
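As a rough sketch of the calculation, the Timecode at a position can be derived from a start value and a frame count. The parameter names `start_timecode` (a frame count) and `rounded_base` (the integer number of frames per second) are illustrative assumptions modelled on the Timecode segment. The sketch handles non-drop-frame Timecode only.

```python
def timecode_at(position, start_timecode, rounded_base):
    """Return the non-drop-frame Timecode string at `position` edit units.

    `start_timecode` is the Timecode of position 0 expressed as a frame count,
    and `rounded_base` is the integer number of frames per second.
    """
    frames = start_timecode + position
    # Wrap at 24 hours, as a Timecode display would.
    frames %= 24 * 3600 * rounded_base
    ff = frames % rounded_base
    total_seconds = frames // rounded_base
    ss = total_seconds % 60
    mm = (total_seconds // 60) % 60
    hh = total_seconds // 3600
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


# A track starting at 10:00:00:00 at 25 frames per second.
start = 10 * 3600 * 25
tc = timecode_at(1510, start, 25)
print(tc)
```

Drop-frame Timecode needs an additional correction step that is not shown here.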
Parsing an MXF file in the forward direction is a relatively simple task thanks to KLV coding. Parsing the file in
the backwards direction is much more difficult without help. At the time of writing this Engineering Guideline, a
proposal exists for a simple Generic Container System Element that does nothing more than provide a
backwards pointer to the previous KLV wrapped Content Package. This allows very simple devices to provide
forwards and backwards play.
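Forward parsing can be sketched in a few lines: read a 16-byte Key, read a BER-coded Length, then read or skip that many bytes of Value. The function names are hypothetical. The sketch assumes 16-byte Keys and definite-length BER coding, and it ignores Partitions, Run-In and Local Set contents.

```python
import io


def read_ber_length(stream):
    """Read a BER-coded length (short or long form) from `stream`."""
    first = stream.read(1)
    if not first:
        raise EOFError("truncated length")
    value = first[0]
    if value < 0x80:
        return value  # short form: the byte is the length itself
    count = value & 0x7F  # long form: low 7 bits give the number of length bytes
    if count == 0:
        raise ValueError("indefinite length is not handled in this sketch")
    raw = stream.read(count)
    if len(raw) != count:
        raise EOFError("truncated length")
    return int.from_bytes(raw, "big")


def iter_klv(stream):
    """Yield (key, value) for each KLV triplet until the stream ends."""
    while True:
        key = stream.read(16)
        if not key:
            return
        if len(key) != 16:
            raise EOFError("truncated key")
        length = read_ber_length(stream)
        value = stream.read(length)
        if len(value) != length:
            raise EOFError("truncated value")
        yield key, value


key_a = bytes(range(16))
key_b = bytes(range(16, 32))
data = key_a + b"\x03abc" + key_b + b"\x83\x00\x00\x02xy"
triplets = list(iter_klv(io.BytesIO(data)))
print(triplets)
```

Nothing in a triplet records where the previous one started, which is why backwards parsing needs additional help such as the proposed backwards pointer.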
This subject is covered in great depth in SMPTE EG42 and will only be lightly covered here. The plug-in
mechanism is very simple and has the features described in the next sections.
MXF also provides a DM SourceClip for referencing Descriptive Metadata. This is useful in the case where an
application wants to say, “The Descriptive Metadata for the Top-Level File Package is the same as the Lower-
Level Source Package”. Rather than duplicating the Metadata values, a reference can be created between the
two packages.
Both the DM Segment and DM SourceClip have a TrackIDs property that references all the MXF Tracks to
which this metadata applies. If this property is omitted the metadata applies to all the tracks in the Package.
Annex A (Informative)
The Relationship of MXF to AAF
MXF was designed to have little or no divergence from the underlying AAF model. A joint working group ensured
that any divergence between the two formats was justified. Both formats have benefited from the work carried out on the
two different applications of the common underlying class model.
For MXF files, the Partitioning is designed for the following desirable characteristics:
1. Repetition of Header Metadata
2. Incremental sequential writing
3. The contents of the Index Tables do not change if the Index is relocated within the file
4. The contents of the Metadata KLV triplets do not change for each repetition
5. The Essence is completely unaffected by the insertion or deletion of Partition, Metadata or Index sectors
6. Multiple Independent Essence Containers and Index Tables
7. Simplicity – i.e. the ability for hardware-only implementations to process MXF files
8. The encoding of Partitions does not require look ahead, except to record the number of bytes allocated for
Metadata and Index
9. The encoding of Metadata and Index Segments does not require any knowledge of context within the
multiplex
10. The stream can be picked up safely (after failure or join in progress) at ANY Body Partition
11. Minimal overhead
For AAF files using Structured Storage, the overhead of this low-level data structure serves other needs, which
are in conflict with some of the MXF requirements:
1. Efficient edit-in-place of individual Metadata Items and Sets
2. Complex hierarchical relationships between Sets
3. Efficient mixture of small and large data items
To maximize interoperability, the MXF and AAF low-level byte-stream formats share some key concepts:
1. A common class model
2. The widespread use of 16-byte universal labels
3. The notion of Streams
4. Stream Identifiers (SIDs)
5. Essence Descriptors
Using this commonality, the AAF SDK can efficiently implement MXF, with the following desired characteristics:
1. An MXF device can always read and / or write MXF files
2. An AAF application can always open MXF files with no conversion at all
3. If an AAF application modifies an EDL in an MXF file, an MXF device would see this as updated Header
Metadata
4. If an AAF application adds AAF specific metadata to an MXF file, an MXF device would see the additions as
"dark metadata"
5. An AAF application can flatten its internal Object Hierarchy to create an MXF file
6. An AAF application can also, of course, create AAF files
7. An AAF application can convert an AAF file into an MXF file in one of three ways (although not all of these
methods will be built in to the open source AAF SDK):
a. filter out non-MXF data
b. constrain the creation of the AAF file so it is never beyond MXF complexity
c. render an AAF composition.
Note that Metadata to describe Effects other than cuts is defined by the Advanced Authoring Format.
Application-specific variants of MXF Files including Effects Metadata could be defined. However, that is outside
the scope of the MXF standard.
Files created according to the MXF standard may be opened by applications that are designed to read AAF, and
can be opened simultaneously by hardware or software designed for MXF.
An AAF data model persisted to Microsoft Structured Storage (MSS) represents a strong reference as MSS
storage containment of the target object. This means strong references are efficiently followed in an AAF file.
An MXF data model represented in KLV uses unique object instance identifiers to identify the target of a strong
reference. The target is persisted elsewhere in the KLV stream with the same identifier. (This is a bit like adding
an “artificial” key column to a relational database table when no combination of the existing columns is unique
for every possible row in the table.) These identifiers are transitory, in that provided this referential integrity is
maintained, they may be re-generated whenever an object is persisted to file.
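The following sketch illustrates how strong references held as instance identifiers can be resolved back into a containment tree. The dictionary layout, the short string identifiers and the property names used here are hypothetical stand-ins for decoded KLV sets. Real identifiers are 16 bytes.

```python
def build_tree(instance_id, sets_by_id, ref_properties):
    """Return a copy of the set `instance_id` with its strong references resolved.

    `sets_by_id` maps instance identifier to a decoded set (a dict), and
    `ref_properties` names the properties that hold lists of strong references.
    """
    resolved = dict(sets_by_id[instance_id])
    for prop in ref_properties:
        if prop in resolved:
            # Each referenced target is persisted elsewhere under the same identifier.
            resolved[prop] = [
                build_tree(target, sets_by_id, ref_properties)
                for target in resolved[prop]
            ]
    return resolved


# Hypothetical decoded sets, keyed by instance identifier.
sets_by_id = {
    "pkg": {"Name": "package", "Tracks": ["trk"]},
    "trk": {"Name": "track", "Sequence": ["seq"]},
    "seq": {"Name": "seq"},
}
tree = build_tree("pkg", sets_by_id, ("Tracks", "Sequence"))
print(tree)
```

Because only the matching of identifiers matters, a writer could replace "pkg", "trk" and "seq" with fresh values on every save, provided each reference and its target change together.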
There is ongoing work to create a common representation in XML of the shared data model. Details of this work
are available, at the time of writing, to the working committees of SMPTE, AAF and Pro-MPEG.
The AAF data model never required general weak references and they are not currently implemented. A future
implementation would probably use a path identifier (like a file system path) that identified the unique route from
the root object to the referenced object via the strong reference tree.
The MXF data model uses general weak references in DMS-1. General weak references are easily
represented in MXF files by using the existing file-unique object instance identifiers.
2. In the AAF data model, all object instances of each of these classes always reside in a known location in the
strong reference hierarchy.
Each set of objects in a file (data definitions, codec definitions, etc.) is in effect a local copy of a universal
registry, or at least all those entries that are used by the file. Each object includes the name of the definition and
a short description, much the same as is found in a SMPTE registry.
Definitions are also represented using the universally unique identifier in MXF files. The only difference is that
there is no local copy of the registry.
Annex B
Preferred Enumerated String Values
This annex defines preferred string values for certain properties defined in SMPTE 377M using English terms
and words.
Strings are listed in the form [SetName : PropertyName] in order to ensure that the target property is clearly
identified.
The following values are preferred text string values that enable operating system type discovery for common
software platforms. Other string values may be used where needed.
“Windows 95”
“Windows 98”
“Windows ME”
“Windows 2000”
“Windows NT”
“Windows XP”
“Mac OS Classic”
“Mac OS X”
“Unix System V”
“Solaris”
“Unix BSD 4.3”
“Unix BSD 4.4”
“Irix”
“Linux”
“FreeBSD”
“AIX”
Annex C
Bibliography
The following documents are referred to normatively in other parts of SMPTE 377M. The list is provided here for
information so that a complete list of references can be found in a single place. For dated references,
subsequent amendments to, or revisions of, any of these publications do not apply. For undated references, the
latest edition of the normative documents referred to applies. Members of ISO and IEC maintain registers of
currently valid International Standards.
Note: this list may not include Normative References for MXF documents defined after the publication point of this document
(for example new Essence Container documents).
Note: Approved SMPTE standards may be obtained from http://www.smpte.org. Drafts of SMPTE documents may be
obtained from ftp://smpte.vwh.net/pub. Approved ANSI standards may be obtained from http://www.ansi.org.
24. SMPTE 382M – MXF Mapping AES3 and Broadcast Wave Audio into the MXF Generic Container
25. SMPTE 383M – MXF Mapping DV-DIF Data to the MXF Generic Container
26. SMPTE 384M – MXF Mapping of Uncompressed Pictures into the Generic Container
27. SMPTE 385M – MXF Mapping SDTI-CP Essence and Metadata into the MXF Generic Container
28. SMPTE 386M – MXF Mapping Type D-10 Essence Data to the MXF Generic Container
29. SMPTE 387M – MXF Mapping Type D-11 Essence Data to the MXF Generic Container
30. SMPTE 321M-2002, Television – Data Stream Format for the Exchange of DV-Based Audio Data and
Compressed Video over a Serial Data Transport Interface
31. SMPTE 322M-1999, Television – Format for Transmission of DV Compressed Video Audio and Data over a
Serial Data Transport Interface
32. SMPTE 359M-2001, Television and Motion Pictures – Dynamic Documents
33. SMPTE 352M-2002, Television (Dynamic) - Video Payload Identification for Digital Interfaces
34. ITU-R BR.1352-1:2002: Broadcast Wave Format (BWF), Annex 1, Annex 1 Appendix 1 and 2, and Annex 3
35. EBU tech T3285 Supplement 3 (2001): BWF, Peak Envelope Chunk
36. Draft AES project X66 (tentative designation AES31-2): File format for transferring digital audio data
37. AES3 (1992): Serial transmission format for two-channel linearly represented digital audio data
38. SMPTE 337M-2000, Format for Non-PCM Audio and Data in an AES3 Serial Digital Audio Interface
39. SMPTE 338M-2000, Television – Format for Non-PCM Audio and Data in AES3 – Data Types
40. SMPTE 339M-2000, Format for Non-PCM Audio and Data in an AES3 – Generic Data Types
41. SMPTE EG42 – MXF Descriptive Metadata Engineering Guideline
42. ISO/IEC 8825-1:1998, ASN.1 Basic Encoding Rules
The following list of informative documents is provided to help give background information and an overview of
standards related to SMPTE 377M.
1. EBU / SMPTE Task Force for Harmonized Standards for the Exchange of Program Material as Bit-streams
– 1998, http://www.smpte.org and http://www.ebu.ch
2. Advanced Authoring Format, http://www.AAFassociation.org
3. DVCPRO White Papers, http://www.dvcpropartners.com
4. The SMPTE Data Coding Protocol and Dictionaries, Jim Wilkinson, SMPTE Journal, July 2000 Vol. 109, No
7, Engineering Report
5. UNICODE – http://www.unicode.org for informative reading on the coding of international characters.
6. Pro-MPEG forum web site http://www.pro-mpeg.org
7. UML information for understanding class diagrams and other aspects of data modeling and programming
http://www.oreilly.com