AMBER Trajectory NetCDF Convention Version 1.0, Revision B: John Mongan (jmongan@mccammon.ucsd.edu) 20th February 2006
Introduction
The file format described in this document was developed for storing data generated by molecular dynamics simulations. It was introduced in version 9 of the
AMBER suite of programs (http://amber.scripps.edu).
The primary design goals of this format are:
Efficient input and output
Compact, high-precision representation of data
Portability of data files across different machine architectures
Extensibility of the format (ability to add additional data without re-writing parsers)
Compatibility with existing tools and formats
The file format is based on NetCDF (Network Common Data Form), developed
by Unidata (http://www.unidata.ucar.edu/software/netcdf/). NetCDF is designed
for representation of arbitrary array-based data. Unidata provides libraries with
bindings in C, C++, Fortran (F77 and F90), Java, Python, Perl, Ruby and MATLAB
for reading and writing NetCDF files. The design goals above are largely met by
NetCDF and the libraries that implement it. It is expected that all I/O of the format
described here will occur through these libraries; this specification describes the
file format at a high level in terms of the API implemented by version 3.6 of these
libraries. In NetCDF terms, this document is a Convention, describing the names,
dimensions and attributes of the arrays that may be present in the file.
Program behavior
Programs creating trajectory files (creators) shall adhere strictly to the requirements of this document. Programs reading trajectory files (readers) shall be as
permissive as possible in applying the requirements of this document. Readers may
emit warnings if out-of-spec files are encountered; these warnings should include
information about the program that originally created the file (see Global attributes,
section 4). Readers shall not fail to read a file unless the required information cannot be located or interpreted. In particular, to ensure forward compatibility with
later extension of the format, readers shall not fail or emit warnings if elements not
described in this document are present in the file.
Global attributes
Global attributes shall have type character string. Spelling and capitalization of attribute names shall be exactly as appears below. Creators shall include all attributes
marked required and may include attributes marked optional. Creators shall not
write an attribute string having a length greater than 80 characters. Readers may
warn about missing required attributes, but shall not fail, except in the case of a
missing or unexpected Conventions or ConventionVersion attribute.
Conventions (required)
Contents of this attribute are a comma- or space-delimited list of tokens representing all of the conventions to which the file conforms. Creators shall include the
string AMBER as one of the tokens in this list. In the usual case, where the file conforms only to this convention, the value of the attribute will simply be AMBER.
Readers may fail if this attribute is not present or none of the tokens in the list are
AMBER. Optionally, if the reader does not expect NetCDF files other than those
conforming to the AMBER convention, it may emit a warning and attempt to read
the file even when the Conventions attribute is missing.
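The token check described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `conforms_to_amber` is hypothetical, the attribute value is passed in as a plain string (or `None` when the attribute is missing) rather than read through a NetCDF library, and the comparison with AMBER is assumed to be case-sensitive.

```python
import re
from typing import Optional


def conforms_to_amber(conventions: Optional[str]) -> bool:
    """Return True if the Conventions attribute lists the AMBER token.

    Hypothetical helper: `conventions` is the attribute value, or None
    when the attribute is absent from the file.
    """
    if conventions is None:
        # Missing attribute: a reader may fail, or warn and try anyway.
        return False
    # Tokens are delimited by commas and/or whitespace.
    tokens = [t for t in re.split(r"[,\s]+", conventions) if t]
    # Assumption: the token must match AMBER exactly, including case.
    return "AMBER" in tokens
```

A reader that only ever expects AMBER files could treat a `False` result for a missing attribute as a warning rather than an error, as the paragraph above permits.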
ConventionVersion (required)
Contents are a string representation of the version number of this convention. Future revisions of this convention having the same version number may include definitions of additional variables, dimensions or attributes, but are guaranteed to have
no incompatible changes to variables, dimensions or attributes specified in previous
revisions. Creators shall set this attribute to 1.0. If this attribute is present and
has a value other than 1.0, readers may fail or may emit a warning and continue.
It is expected that the version of this convention will change rarely, if ever.
application (optional)
If the creator is part of a suite of programs or modules, this attribute shall be set to
the name of the suite.
program (required)
Creators shall set this attribute to the name of the creating program or module.
programVersion (required)
Creators shall set this attribute to the preferred textual formatting of the current
version number of the creating program or module.
title (optional)
Creators may use this attribute to represent a user-defined title for the data
represented in the file. Absence of a title may be indicated by omitting the attribute
or by including it with an empty string value.
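The creator-side requirements for global attributes can be sketched as a small check. This is only an illustration: `check_global_attributes`, `REQUIRED` and `MAX_LENGTH` are hypothetical names, and the attributes are modelled as a plain dictionary rather than read through a NetCDF library.

```python
from typing import Dict, List

# Attributes marked "required" in this section.
REQUIRED = ("Conventions", "ConventionVersion", "program", "programVersion")
# Creators shall not write attribute strings longer than this.
MAX_LENGTH = 80


def check_global_attributes(attrs: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for name in REQUIRED:
        if name not in attrs:
            problems.append(f"missing required attribute: {name}")
    for name, value in attrs.items():
        if not isinstance(value, str):
            problems.append(f"attribute {name} is not a character string")
        elif len(value) > MAX_LENGTH:
            problems.append(f"attribute {name} exceeds {MAX_LENGTH} characters")
    # Creators shall set ConventionVersion to 1.0 (absence is reported above).
    if "ConventionVersion" in attrs and attrs["ConventionVersion"] != "1.0":
        problems.append("ConventionVersion is not 1.0")
    return problems
```

Attributes not listed in this document are deliberately left unreported, since elements outside the specification are allowed. A reader would typically turn most of these problems into warnings rather than failures.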
5 Dimensions
frame (required, length unlimited)
Coordinates along the frame dimension will generally represent data taken from
different time steps, but may represent arbitrary conformation numbers when the
trajectory file does not represent a true trajectory but rather a collection of conformations (e.g. from clustering).
Variables
6.1 Label variables
Label variables shall be written by creators whenever their corresponding dimension is present. These variables are for self-description purposes, so readers may
generally ignore them. Labels requiring more than one character per coordinate
shall use the label dimension. Individual coordinate labels that are shorter than
the length of the label dimension shall be space padded to the length of the label
dimension.
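The padding rule can be illustrated with a short Python sketch. The helper name `pad_labels` is hypothetical, padding on the right is an assumption (the text says only that labels are space padded), and the labels used in the usage note are illustrative values chosen to fit a label dimension of length 5.

```python
from typing import List


def pad_labels(labels: List[str], label_length: int) -> List[str]:
    """Space-pad each label to the length of the label dimension.

    Assumption: padding is added on the right.
    """
    for label in labels:
        if len(label) > label_length:
            raise ValueError(
                f"label {label!r} is longer than the label dimension ({label_length})"
            )
    return [label.ljust(label_length) for label in labels]
```

For example, `pad_labels(["alpha", "beta", "gamma"], 5)` leaves the first and third labels unchanged and pads the second with one trailing space.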
char spatial(spatial)
Creators shall write the string xyz to this variable, indicating the labels for coordinates along the spatial dimension.
char cell_spatial(cell_spatial)
Creators shall write the string abc to this variable, indicating the labels for the
three lengths defining the size of the unit cell.
6.2 Data variables
All data variables are optional. Some data variables have dependencies on other
data variables, as described below. Creators shall define a units attribute of type
character string for each variable as described below. Creators may define a scale_factor
attribute of type float for each variable. Creators shall ensure that data
values, after being multiplied by the value of scale_factor (if it exists), are in
the units described by the units attribute. If a scale_factor attribute exists for a variable, readers shall multiply data values by the value of the scale_factor attribute
before interpreting the data. This scaling burden is placed on the reader rather than
the creator, as writing data is expected to be a more time-sensitive operation than
reading it.
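A reader-side sketch of this scaling rule, in Python. The function name `apply_scale_factor` is hypothetical, and the variable's attributes are modelled as a plain dictionary rather than read through a NetCDF library.

```python
from typing import Dict, List, Union


def apply_scale_factor(
    values: List[float], attributes: Dict[str, Union[str, float]]
) -> List[float]:
    """Return values in the units named by the variable's units attribute."""
    scale = attributes.get("scale_factor")
    if scale is None:
        # No scale_factor attribute: stored values are already in those units.
        return list(values)
    # The reader, not the creator, performs the multiplication.
    return [v * float(scale) for v in values]
```

Because the creator writes raw values and only records the factor once as an attribute, the per-value multiplication cost is paid at read time.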
It is left as an implementation detail whether creators create a separate file for
each variable grouping (e.g. coordinates and velocities) or a single file containing
all variables. Some creators may allow the user to select the approach. Readers
should support reading both styles, that is, combining data from multiple files or
reading it all from a single file.
float time(frame)
When frames have a temporal sequence, creators shall write a float for each frame coordinate representing the number of picoseconds
of simulated time elapsed since the start of the trajectory. When the file stores a
collection of conformations having no temporal sequence, creators shall omit this
variable.
Example
The following is an example of the CDL for a trajectory file conforming to the
preceding specification and containing most of the elements described in this document. This CDL was generated using ncdump -h <trajectory file>.
netcdf mdtrj {
dimensions:
    frame = UNLIMITED ; // (10 currently)
    spatial = 3 ;
    atom = 28 ;
    cell_spatial = 3 ;
    cell_angular = 3 ;
    label = 5 ;
variables:
    char spatial(spatial) ;
    char cell_spatial(cell_spatial) ;
    char cell_angular(cell_angular, label) ;
    float time(frame) ;
        time:units = "picosecond" ;
    float coordinates(frame, atom, spatial) ;
        coordinates:units = "angstrom" ;
    double cell_lengths(frame, cell_spatial) ;
        cell_lengths:units = "angstrom" ;
    double cell_angles(frame, cell_angular) ;
        cell_angles:units = "degree" ;
    float velocities(frame, atom, spatial) ;
        velocities:units = "angstrom/picosecond" ;
        velocities:scale_factor = 20.455f ;

// global attributes:
        :title = "netCDF output test" ;
        :application = "AMBER" ;
        :program = "sander" ;
        :programVersion = "9.0" ;
        :Conventions = "AMBER" ;
        :ConventionVersion = "1.0" ;
}
Standards and formats are most useful when they are supported widely, and become less useful and more burdensome if they fragment into multiple dialects. If
you plan to support additional variables, dimensions or attributes beyond those described here in a publicly released creator or reader program, please contact the
author (jmongan@mccammon.ucsd.edu) for inclusion of these elements into a future revision of this document.
Revision history
Revision A, February 9, 2006: Initial document
Revision B, February 20, 2006: Better self-description for unit cells in periodic simulations; standards for indicating one and two dimensional periodicity.