.. |br| raw:: html
.. _coards-guide:
#####################################
Prepare COARDS-compliant netCDF files
#####################################
This page describes how to generate netCDF data files in the
proper format for HEMCO and GEOS-Chem.
.. _coards-guide-coards:
==========================
The COARDS netCDF standard
==========================
The `Harmonized Emissions Component (HEMCO)
`_ reads data stored in `the netCDF file
format
`__,
which is a common data format used in atmospheric and climate
sciences.
NetCDF files contain **data arrays** as well as **metadata**, which is
a description of the data.
Several netCDF conventions have been developed in order to facilitate
data exchange and visualization. The `Cooperative Ocean Atmosphere
Research Data Service (COARDS) standard
`_
defines regular conventions for naming dimensions as well as the
`attributes `__
describing the data. You will find more information about these
conventions in the sections below. HEMCO requires its input data to
adhere to the COARDS standard.
Our :ref:`"Work with
netCDF files" supplemental guide `
contains detailed instructions on how you can check a netCDF file for
COARDS compliance.
.. _coards-guide-dims:
=================
COARDS dimensions
=================
The **dimensions** of a netCDF file define how many grid boxes there are
along a given direction. While the COARDS standard does not require any
specific names for dimensions, accepted practice is to use these names
for rectilinear grids:
.. _coards-guide-dims-time:
time
----
Specifies the number of points along the time (:literal:`T`) axis.
The :literal:`time` dimension must always be specified. When you create the
netCDF file, you may declare :literal:`time` to be
:literal:`UNLIMITED` and then later define its size. This allows
you to append further time points into the file later on.
.. _coards-guide-dims-lev:
lev
---
Specifies the number of points along the vertical level
(:literal:`Z`) axis.
This dimension may be omitted if none of the data arrays in the netCDF
file have a vertical dimension.
.. _coards-guide-dims-lat:
lat
---
Specifies the number of points along the latitude (:literal:`Y`)
axis.
.. _coards-guide-dims-lon:
lon
---
Specifies the number of points along the longitude (:literal:`X`) axis.
.. note::
For non-rectilinear grids (e.g. cubed-sphere), the :ref:`coards-guide-dims-lat`
and :ref:`coards-guide-dims-lon` dimensions may be named :literal:`NY` and
:literal:`NX` instead.
.. _coards-guide-coordvec:
=========================
COARDS coordinate vectors
=========================
**Coordinate vectors** (aka **index variables** or **axis variables**) are
1-dimensional arrays that define the values along each axis.
The only COARDS requirements for coordinate vectors are these:
#. Each coordinate vector must be given the same name as the dimension
that is used to define it.
#. All of the values contained within a coordinate vector must be either
monotonically increasing or monotonically decreasing.
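The second requirement is easy to test before you write a file. The sketch below is one way to do so in plain Python. It assumes that "monotonic" is meant strictly (no repeated values), and the helper name :literal:`is_monotonic` is hypothetical rather than part of HEMCO or any netCDF library.

```python
def is_monotonic(values):
    """Return True if values are strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    return increasing or decreasing


# Hypothetical 1-degree latitude centers from -90 to 90 (181 values)
lat = [-90.0 + i for i in range(181)]

print(is_monotonic(lat))               # True
print(is_monotonic([3.0, 2.0, 1.0]))   # True (decreasing)
print(is_monotonic([0.0, 1.0, 1.0]))   # False (repeated value)
```

The same check applies unchanged to :literal:`time`, :literal:`lev`, and :literal:`lon` vectors.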
.. _coards-guide-coordvec-time:
time
----
A COARDS-compliant :literal:`time` coordinate vector will have these features:
.. code-block:: console
dimensions:
time = UNLIMITED ; // (12 currently)
. . .
variables:
double time(time) ;
time:long_name = "time" ;
time:units = "hours since 2010-01-01 00:00:00" ;
time:calendar = "standard" ;
time:axis = "T" ;
.. note::
The output above was generated by the :command:`ncdump` command.
As you can see, :literal:`time` is an 8-byte floating point (aka
:code:`REAL*8`) with 12 time points.
The :literal:`time` coordinate vector has the following attributes:
.. option:: time:long_name
A detailed description of the contents of this array. This is
usually set to :literal:`time` or :literal:`Time`.
.. option:: time:units
Specifies the number of hours, minutes, seconds, etc. that have
elapsed with respect to a reference datetime :literal:`YYYY-MM-DD
hh:mn:ss`. Set this to one of the following values:
- :literal:`"days since YYYY-MM-DD hh:mn:ss"`
- :literal:`"hours since YYYY-MM-DD hh:mn:ss"`
- :literal:`"minutes since YYYY-MM-DD hh:mn:ss"`
- :literal:`"seconds since YYYY-MM-DD hh:mn:ss"`
.. tip::
We recommend that you choose the reference datetime to correspond to
the first time value in the file (i.e. :literal:`time(0) = 0`).
.. option:: time:calendar
Specifies the calendar used to define the time system. Set this to
one of the following values:
.. option:: standard
Synonym for :option:`gregorian`.
.. option:: gregorian
Selects the Gregorian calendar system.
.. option:: time:axis
Identifies the axis :literal:`(X,Y,Z,T)` corresponding to this
coordinate vector. Set this to :literal:`T`.
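To illustrate how the :literal:`time` values relate to the :option:`time:units` attribute, here is a minimal sketch that converts monthly timestamps into "hours since" values. It uses only the Python standard library. The helper name :literal:`hours_since` and the choice of monthly timestamps are assumptions made for illustration.

```python
from datetime import datetime


def hours_since(reference, timestamps):
    """Convert datetimes to floating-point hours elapsed since reference."""
    return [(t - reference).total_seconds() / 3600.0 for t in timestamps]


# The reference datetime matches the first time point, so time(0) = 0
reference = datetime(2010, 1, 1, 0, 0, 0)

# First day of each month in 2010 (12 time points)
timestamps = [datetime(2010, month, 1) for month in range(1, 13)]

time_values = hours_since(reference, timestamps)
units = "hours since " + reference.strftime("%Y-%m-%d %H:%M:%S")

print(units)             # hours since 2010-01-01 00:00:00
print(time_values[:3])   # [0.0, 744.0, 1416.0]
```

The resulting values are floating point, which is consistent with declaring :literal:`time` as :literal:`double`.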
.. _coards-guide-additional-time:
Special considerations for time vectors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#. We recommend that index variables (such as :literal:`time`) be
declared with type :literal:`float` or :literal:`double`. `GCHP
`_ cannot parse files that have
index variables of type :literal:`int`. |br|
|br|
#. We have noticed that netCDF files having a :option:`time:units`
reference datetime prior to :literal:`1900/01/01 00:00:00` may not
be read properly when using `HEMCO `_
or `GCHP `_ within an ESMF
environment. We therefore recommend that you use reference
datetime values after 1900 whenever possible. |br|
|br|
#. Weekly data must contain seven time slices in increments of one
day. The first entry must represent Sunday data, regardless of the
real weekday of the assigned datetime. It is possible to store
weekly data for more than one time interval, in which case the
first weekday (i.e. Sunday) must hold the starting date for the given set
of (seven) time slices.
- For instance, weekly data for every month of a year can be stored
as 12 sets of 7 time slices. The reference datetime of the first
entry of each set must fall on the first day of every month, and
the following six entries must be increments of one day.
Currently, weekly data from netCDF files is not correctly
read in an ESMF environment.
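The weekly layout described above (12 sets of 7 daily slices, each set starting on the first day of a month) can be sketched as follows. This is only an illustration of how the time values would be laid out. The function name :literal:`weekly_time_axis`, the year 2010, and the choice of "days since" units are assumptions.

```python
from datetime import datetime, timedelta


def weekly_time_axis(year, reference):
    """Return days-since-reference values for 12 sets of 7 daily slices."""
    values = []
    for month in range(1, 13):
        start = datetime(year, month, 1)   # first slice of a set holds Sunday data
        for day in range(7):               # Sunday through Saturday
            stamp = start + timedelta(days=day)
            values.append((stamp - reference).total_seconds() / 86400.0)
    return values


time_values = weekly_time_axis(2010, datetime(2010, 1, 1))

print(len(time_values))   # 84
print(time_values[:8])    # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 31.0]
```

Note that the eighth value jumps to the first day of February, which starts the second set of seven slices.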
.. _coards-guide-coordvec-lev:
lev
---
A COARDS-compliant :literal:`lev` coordinate vector will have these features:
.. code-block:: console
dimensions:
lev = 72 ;
. . .
variables:
double lev(lev) ;
lev:long_name = "level" ;
lev:units = "level" ;
lev:positive = "up" ;
lev:axis = "Z" ;
Here, :literal:`lev` is an 8-byte floating point (aka
:literal:`REAL*8`) with 72 levels.
The :literal:`lev` coordinate vector has the following attributes:
.. option:: lev:long_name
A detailed description of the contents of this array. You may set
this to values such as:
- :literal:`"level"`
- :literal:`"GEOS-Chem levels"`
- :literal:`"Eta centers"`
- :literal:`"Sigma centers"`
.. option:: lev:units
**(Required)** Specifies the units of vertical levels. Set this
to one of the following:
- :literal:`"levels"`
- :literal:`"eta_level"`
- :literal:`"sigma_level"`
.. important::
If you set :literal:`long_name:` to :literal:`level` as well,
then HEMCO will be able to regrid between GEOS-Chem vertical
grids.
.. option:: lev:axis
Identifies the axis :literal:`(X,Y,Z,T)` corresponding to this
coordinate vector. Set this to :literal:`Z`.
.. option:: lev:positive
Specifies the direction in which the vertical dimension is indexed.
Set this to one of these values:
- :literal:`"up"` (Level 1 is the surface, and level
indices increase upwards)
- :literal:`"down"` (Level 1 is the atmosphere top, and level
indices increase downwards)
For emissions and most other data sets, you can set
:option:`lev:positive` to :literal:`"up"`.
.. important::
GCHP and the NASA GEOS-ESM use a vertical grid where
:option:`lev:positive` is :literal:`"down"`.
.. _coards-guide-additional-lev:
Additional considerations for lev vectors:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When using `GEOS-Chem `_ or `HEMCO
`_ in a non-ESMF environment, data is
interpolated onto the simulation levels if the input data is on
vertical levels other than the HEMCO model levels (see `HEMCO vertical
regridding
`_).
Data on non-model levels must be on a hybrid sigma pressure coordinate
system. In order to properly determine the vertical pressure levels of
the input data, the file must contain the surface pressure values and
the hybrid coefficients (a, b) of the coordinate system. Furthermore,
the level variable must contain the attributes :literal:`standard_name` and
:literal:`formula_terms` (the attribute :literal:`positive` is recommended but not
required). A header excerpt of a valid netCDF file is shown below:
.. code-block:: console
float lev(lev) ;
lev:standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
lev:units = "level" ;
lev:positive = "down" ;
lev:formula_terms = "ap: hyam b: hybm ps: PS" ;
float hyam(nhym) ;
hyam:long_name = "hybrid A coefficient at layer midpoints" ;
hyam:units = "hPa" ;
float hybm(nhym) ;
hybm:long_name = "hybrid B coefficient at layer midpoints" ;
hybm:units = "1" ;
float time(time) ;
time:standard_name = "time" ;
time:units = "days since 2000-01-01 00:00:00" ;
time:calendar = "standard" ;
float PS(time, lat, lon) ;
PS:long_name = "surface pressure" ;
PS:units = "hPa" ;
float EMIS(time, lev, lat, lon) ;
EMIS:long_name = "emissions" ;
EMIS:units = "kg m-2 s-1" ;
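For reference, the :literal:`formula_terms` attribute in the header above corresponds to the hybrid sigma-pressure relation ``p(k) = ap(k) + b(k) * ps``, where :literal:`ap` maps to :literal:`hyam`, :literal:`b` to :literal:`hybm`, and :literal:`ps` to :literal:`PS`. The sketch below evaluates that relation for a single column. The coefficient values are made up for illustration and do not correspond to any real model grid.

```python
def midpoint_pressures(hyam, hybm, ps):
    """Pressure at each layer midpoint: p(k) = ap(k) + b(k) * ps."""
    return [a + b * ps for a, b in zip(hyam, hybm)]


# Hypothetical coefficients for a 3-layer column, ordered top-down
# (consistent with lev:positive = "down")
hyam = [50.0, 100.0, 20.0]   # hybrid A coefficients [hPa]
hybm = [0.0, 0.5, 0.75]      # hybrid B coefficients [1]
ps = 1000.0                  # surface pressure [hPa]

print(midpoint_pressures(hyam, hybm, ps))   # [50.0, 600.0, 770.0]
```

Because :literal:`hyam` and :literal:`PS` share the same units (hPa), the result is also in hPa.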
.. _coards-guide-coordvec-lat:
lat
---
A COARDS-compliant :literal:`lat` coordinate vector will have these features:
.. code-block:: console
dimensions:
lat = 181 ;
variables:
double lat(lat) ;
lat:long_name = "Latitude" ;
lat:units = "degrees_north" ;
lat:axis = "Y" ;
Here, :literal:`lat` is an 8-byte floating point (aka
:literal:`REAL*8`) with 181 values.
The :literal:`lat` coordinate vector has the following attributes:
.. option:: lat:long_name
A detailed description of the contents of this array. Set this to
:literal:`Latitude`.
.. option:: lat:units
Specifies the units of latitude. Set this to
:literal:`degrees_north`.
.. option:: lat:axis
Identifies the axis :literal:`(X,Y,Z,T)` corresponding to this
coordinate vector. Set this to :literal:`Y`.
.. _coards-guide-coordvec-lon:
lon
---
A COARDS-compliant :literal:`lon` coordinate vector will have these features:
.. code-block:: console
dimensions:
lon = 360 ;
variables:
double lon(lon) ;
lon:long_name = "Longitude" ;
lon:units = "degrees_east" ;
lon:axis = "X" ;
Here, :literal:`lon` is an 8-byte floating point (aka
:literal:`REAL*8`) with 360 values.
The :literal:`lon` coordinate vector has the following attributes:
.. option:: lon:long_name
A detailed description of the contents of this array. Set this to
:literal:`Longitude`.
.. option:: lon:units
Specifies the units of longitude. Set this to
:literal:`degrees_east`.
.. option:: lon:axis
Identifies the axis :literal:`(X,Y,Z,T)` corresponding to this
coordinate vector. Set this to :literal:`X`.
Longitudes may be represented modulo 360. For example, -180, 180, and
540 are all valid representations of the International Dateline and 0
and 360 are both valid representations of the Prime Meridian. Note,
however, that the sequence of numerical longitude values stored in the
netCDF file must be monotonic in a non-modulo sense.
Practical guidelines:
#. If your grid begins at the International Dateline (-180°),
then place your longitudes into the range -180..180.
#. If your grid begins at the Prime Meridian (0°), then place
your longitudes into the range 0..360.
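If you need to move a grid from the 0..360 range into the -180..180 range, the longitudes must be wrapped and then reordered so that they remain monotonic. The sketch below shows one way to do this in plain Python. The function name :literal:`to_dateline_grid` is hypothetical, and the returned index list is meant to be applied to the longitude axis of each data array as well.

```python
def to_dateline_grid(lons):
    """Map longitudes into [-180, 180) and sort them into increasing order."""
    wrapped = [((lon + 180.0) % 360.0) - 180.0 for lon in lons]
    # The sorting index lets the caller reorder the data arrays the same way
    order = sorted(range(len(wrapped)), key=lambda i: wrapped[i])
    return [wrapped[i] for i in order], order


lons = [0.0, 90.0, 180.0, 270.0]
new_lons, order = to_dateline_grid(lons)

print(new_lons)   # [-180.0, -90.0, 0.0, 90.0]
print(order)      # [2, 3, 0, 1]
```

Here the value 180 is mapped to -180, since both represent the International Dateline.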
.. _coards-guide-data:
==================
COARDS data arrays
==================
A COARDS-compliant netCDF file may contain several **data arrays**. In
our example file shown above, there are two data arrays:
.. code-block:: console
dimensions:
time = UNLIMITED ; // (12 currently)
lev = 72 ;
lat = 181 ;
lon = 360 ;
variables:
float PRPE(time, lev, lat, lon) ;
PRPE:long_name = "Propene" ;
PRPE:units = "kgC/m2/s" ;
PRPE:add_offset = 0.f ;
PRPE:missing_value = 1.e+15f ;
float CO(time, lev, lat, lon) ;
CO:long_name = "CO" ;
CO:units = "kg/m2/s" ;
CO:_FillValue = 1.e+15f ;
CO:missing_value = 1.e+15f ;
These arrays contain emissions for the species PRPE (lumped >= C3
alkenes) and CO.
.. _coards-guide-data-attr:
Attributes for data arrays
--------------------------
.. _coards-guide-data-attr-long-name:
long_name
---------
Gives a detailed description of the contents of the array.
.. _coards-guide-data-attr-units:
units
-----
Specifies the units of data contained within the array. SI units
are preferred.
Special usage for HEMCO:
- Use :literal:`kg/m2/s` or :literal:`kg m-2 s-1` for emission
fluxes of species;
- Use :literal:`kg/m3` or :literal:`kg m-3` for concentration data;
- Use :literal:`1` for dimensionless data instead of
:literal:`unitless`. HEMCO will recognize :literal:`unitless`,
but it is non-standard and not recommended.
.. _coards-guide-data-attr-missing-value:
missing_value
-------------
Specifies the value that should represent missing data. This
should be set to a number that will not be mistaken for a valid
data value.
.. _coards-guide-data-attr-fillvalue:
_FillValue
----------
Synonym for :ref:`coards-guide-data-attr-missing-value`. It is recommended to set both
:ref:`coards-guide-data-attr-missing-value` and :literal:`_FillValue` to the same
value. Some data visualization packages look for one but not the
other.
.. _coards-guide-data-ordering:
Ordering of the data
--------------------
2D and 3D array variables in netCDF files must have a specific dimension
order. If the order is incorrect, you will encounter the netCDF read error
"start+count exceeds dimension bound". You can check the dimension
ordering of your arrays by using the :command:`ncdump` command as
shown below:
.. code-block:: console
$ ncdump -h file.nc
Be sure to check the dimensions listed next to the array name rather
than the ordering of the dimensions listed at the top of the
:command:`ncdump` output.
The following dimension orders are acceptable:
.. code-block:: console
array(time,lat,lon)
array(time,lev,lat,lon)
The rest of this section explains why the dimension ordering of arrays
matters.
When you use :command:`ncdump` to examine the contents of a netCDF
file, you will notice that it displays the dimensions of the data in
the opposite order with respect to Fortran. In our sample file,
:command:`ncdump` says that the CO and PRPE arrays have these dimensions:
.. code-block:: console
CO(time,lev,lat,lon)
PRPE(time,lev,lat,lon)
But if you tried to read this netCDF file into GEOS-Chem (or any other
program written in Fortran), you must use data arrays that have these
dimensions:
.. code-block:: console
CO(lon,lat,lev,time)
PRPE(lon,lat,lev,time)
Here's why:
Fortran is a **column-major** language, which means that arrays are stored
in memory by columns first, then by rows. If you have declared arrays
such as:
.. code-block:: fortran
INTEGER :: I, J, L, T
INTEGER, PARAMETER :: N_LON = 360
INTEGER, PARAMETER :: N_LAT = 181
INTEGER, PARAMETER :: N_LEV = 72
INTEGER, PARAMETER :: N_TIME = 12
REAL*4 :: CO (N_LON,N_LAT,N_LEV,N_TIME)
REAL*4 :: PRPE(N_LON,N_LAT,N_LEV,N_TIME)
then for optimal efficiency, the leftmost dimension (:code:`I`) needs
to vary the fastest, and needs to be accessed by the innermost
DO-loop. Then the next leftmost dimension (:code:`J`) should be
accessed by the next innermost DO-loop, and so on. Therefore, the
proper way to loop over these arrays is:
.. code-block:: fortran
DO T = 1, N_TIME
DO L = 1, N_LEV
DO J = 1, N_LAT
DO I = 1, N_LON
CO (I,J,L,T) = ...
PRPE(I,J,L,T) = ...
ENDDO
ENDDO
ENDDO
ENDDO
Note that the :code:`I` index is varying most often, since it is the
innermost DO-loop, then :code:`J`, :code:`L`, and :code:`T`. This is
opposite to how a car's odometer reads.
If you loop through an array in this fashion, with leftmost indices
varying fastest, then the code minimizes the number of times it has to
load subsections of the array into cache memory. In this optimal
manner of execution, all of the array elements sitting in the cache
memory are read in the proper order before the next array subsection
needs to be loaded into the cache. But if you step through array
elements in the wrong order, the number of cache loads is
proportionally increased. Because it takes a finite amount of time to
reload array elements into cache memory, the more times you have to
access the cache, the longer it will take the code to execute. This
can slow down the code dramatically.
On the other hand, C is a **row-major** language, which means that arrays
are stored by rows first, then by columns. This means that the rightmost
array index varies the fastest. This is identical to how a
car's odometer reads.
If you use a Fortran program to write data to disk, and then try to
read that data from disk into a program written in C, then unless
you reverse the order of the array indices, you will be reading the array
in the wrong order. In C you would have to use this ordering scheme
(using Fortran-style syntax to illustrate the point):
.. code-block:: fortran
DO T = 1, N_TIME
DO L = 1, N_LEV
DO J = 1, N_LAT
DO I = 1, N_LON
CO(T,L,J,I) = ...
PRPE(T,L,J,I) = ...
ENDDO
ENDDO
ENDDO
ENDDO
Because :program:`ncdump` is written in C, the order of the array appears
opposite with respect to Fortran. The same goes for any other code
written in a row-major programming language.
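The equivalence between the two index orders can be checked with a little arithmetic. The sketch below computes the position of one grid box in memory under both storage conventions, using the dimension sizes from the sample file. The function names :literal:`c_offset` and :literal:`fortran_offset` are hypothetical, and zero-based indices are used for simplicity.

```python
def c_offset(index, shape):
    """Flat offset in row-major (C) order: the last index varies fastest."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


def fortran_offset(index, shape):
    """Flat offset in column-major (Fortran) order: the first index varies fastest."""
    offset = 0
    for i, n in zip(reversed(index), reversed(shape)):
        offset = offset * n + i
    return offset


# Zero-based indices of one grid box
t, l, j, i = 3, 10, 45, 200

c_shape = (12, 72, 181, 360)   # (time, lev, lat, lon), as ncdump displays it
f_shape = (360, 181, 72, 12)   # (lon, lat, lev, time), as Fortran declares it

same = c_offset((t, l, j, i), c_shape) == fortran_offset((i, j, l, t), f_shape)
print(same)   # True
```

Both conventions therefore address the same element on disk. Only the order in which the indices are written differs.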
.. _coards-guide-global-attr:
========================
COARDS Global attributes
========================
**Global attributes** are `netCDF attributes
`_
that contain information about a netCDF file, as opposed to
information about an individual data array.
From our example in the :ref:`Examine the contents of a netCDF file
`, the output from :command:`ncdump` showed
that our sample netCDF file has several global attributes:
.. code-block:: console
// global attributes:
:Title = "COARDS/netCDF file containing X data" ;
:Contact = "GEOS-Chem Support Team (geos-chem-support@as.harvard.edu)" ;
:References = "www.geos-chem.org; wiki.geos-chem.org" ;
:Conventions = "COARDS" ;
:Filename = "my_sample_data_file.1x1" ;
:History = "Mon Mar 17 16:18:09 2014 GMT" ;
:ProductionDateTime = "File generated on: Mon Mar 17 16:18:09 2014 GMT" ;
:ModificationDateTime = "File generated on: Mon Mar 17 16:18:09 2014 GMT" ;
:VersionID = "1.2" ;
:Format = "NetCDF-3" ;
:Model = "GEOS5" ;
:Grid = "GEOS_1x1" ;
:Delta_Lon = 1.f ;
:Delta_Lat = 1.f ;
:SpatialCoverage = "global" ;
:NLayers = 72 ;
:Start_Date = 20050101 ;
:Start_Time = "00:00:00.0" ;
:End_Date = 20051231 ;
:End_Time = "23:59:59.99999" ;
Global attributes may either have the first letter capitalized, or all
letters in lower-case.
Title
-----
Provides a short description of the file.
Contact
-------
Provides contact information for the person(s) who created the
file.
References
----------
Provides a reference (citation, DOI, or URL) for the data contained
in the file.
Conventions
-----------
Indicates if the netCDF file adheres to a standard (e.g. COARDS or
CF).
Filename
--------
Specifies the name of the file.
History
-------
Specifies the datetime of file creation, and of any subsequent
modifications.
.. note::
If you edit the file with :program:`nco` or :program:`cdo`, then
this attribute will be updated to reflect the modification that
was done.
Format
------
Specifies the format of the netCDF file (such as
:literal:`netCDF-3` or :literal:`netCDF-4`).
.. _coards-guide-more-info:
====================
For more information
====================
Please see our :ref:`ncguide` Supplemental Guide for more information
about commands that you can use to combine, edit, or manipulate data
in netCDF files.