Prepare COARDS-compliant netCDF files

On this page we discuss how you can generate netCDF data files in the proper format for HEMCO and GEOS-Chem.

The COARDS netCDF standard

The Harmonized Emissions Component (HEMCO) reads data stored in the netCDF file format, which is a common data format used in atmospheric and climate sciences. NetCDF files contain data arrays as well as metadata, which is a description of the data.

Several netCDF conventions have been developed in order to facilitate data exchange and visualization. The Cooperative Ocean Atmosphere Research Data Service (COARDS) standard defines regular conventions for naming dimensions as well as the attributes describing the data. You will find more information about these conventions in the sections below. HEMCO requires its input data to adhere to the COARDS standard.

Our “Work with netCDF files” supplemental guide contains detailed instructions on how you can check a netCDF file for COARDS compliance.

COARDS dimensions

The dimensions of a netCDF file define how many grid boxes there are along a given direction. While the COARDS standard does not require any specific names for dimensions, accepted practice is to use these names for rectilinear grids:

time

Specifies the number of points along the time (T) axis.

The time dimension must always be specified. When you create the netCDF file, you may declare time to be UNLIMITED and then later define its size. This allows you to append further time points into the file later on.

lev

Specifies the number of points along the vertical level (Z) axis.

This dimension may be omitted if none of the data arrays in the netCDF file have a vertical dimension.

lat

Specifies the number of points along the latitude (Y) axis.

lon

Specifies the number of points along the longitude (X) axis.

Note

For non-rectilinear grids (e.g. cubed-sphere), the lat and lon dimensions may be named NY and NX instead.

COARDS coordinate vectors

Coordinate vectors (aka index variables or axis variables) are 1-dimensional arrays that define the values along each axis.

The only COARDS requirements for coordinate vectors are these:

  1. Each coordinate vector must be given the same name as the dimension that is used to define it.

  2. All of the values contained within a coordinate vector must be either monotonically increasing or monotonically decreasing.
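As a quick illustration of rule 2, the Python sketch below tests whether a coordinate vector is monotonic before you write it to a file. The function name is our own (hypothetical), and we assume strict monotonicity (no repeated values), since a coordinate with duplicate entries cannot index data unambiguously.

```python
from typing import Sequence


def is_strictly_monotonic(values: Sequence[float]) -> bool:
    """Return True if values are strictly increasing or strictly decreasing."""
    if len(values) < 2:
        return True  # zero or one point is trivially monotonic
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(d > 0 for d in diffs) or all(d < 0 for d in diffs)


if __name__ == "__main__":
    print(is_strictly_monotonic([-89.5, -88.5, -87.5]))  # True (increasing)
    print(is_strictly_monotonic([72.0, 71.0, 70.0]))     # True (decreasing)
    print(is_strictly_monotonic([0.0, 10.0, 5.0]))       # False
```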

time

A COARDS-compliant time coordinate vector will have these features:

dimensions:
        time = UNLIMITED ; // (12 currently)
. . .
variables:
        double time(time) ;
                time:long_name = "time" ;
                time:units = "hours since 2010-01-01 00:00:00" ;
                time:calendar = "standard" ;
                time:axis = "T" ;

Note

The above was generated by the ncdump command.

As you can see, time is an 8-byte floating point (aka REAL*8) with 12 time points.

The time coordinate vector has the following attributes:

time:long_name

A detailed description of the contents of this array. This is usually set to time or Time.

time:units

Specifies the number of hours, minutes, seconds, etc. that have elapsed with respect to a reference datetime YYYY-MM-DD hh:mn:ss. Set this to one of the following values:

  • "days since YYYY-MM-DD hh:mn:ss"

  • "hours since YYYY-MM-DD hh:mn:ss"

  • "minutes since YYYY-MM-DD hh:mn:ss"

  • "seconds since YYYY-MM-DD hh:mn:ss"

Tip

We recommend that you choose the reference datetime to correspond to the first time value in the file (i.e. time(0) = 0).
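The sketch below shows how a time:units string of this form maps stored offsets to datetimes. It is a minimal Python example that assumes the full "<unit> since YYYY-MM-DD hh:mn:ss" form and the standard (Gregorian) calendar; decode_times is a hypothetical helper, not part of HEMCO. Libraries such as cftime handle more calendars and unit variants.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List

# Seconds per unit for the four "<unit> since ..." forms listed above
_SECONDS_PER_UNIT = {"days": 86400, "hours": 3600, "minutes": 60, "seconds": 1}
_UNITS_RE = re.compile(
    r"(days|hours|minutes|seconds) since (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)


def decode_times(units: str, offsets: Iterable[float]) -> List[datetime]:
    """Convert time offsets to datetimes using a time:units string.

    Assumes the full "<unit> since YYYY-MM-DD hh:mn:ss" form and the
    standard (Gregorian) calendar. Python's datetime uses the proleptic
    Gregorian calendar, which agrees with "standard" after 1582-10-15.
    """
    match = _UNITS_RE.fullmatch(units.strip())
    if match is None:
        raise ValueError(f"Unrecognized time units: {units!r}")
    unit, ref_str = match.groups()
    ref = datetime.strptime(ref_str, "%Y-%m-%d %H:%M:%S")
    step = _SECONDS_PER_UNIT[unit]
    return [ref + timedelta(seconds=float(v) * step) for v in offsets]


if __name__ == "__main__":
    # Offsets for the first day of Jan, Feb, Mar 2010 (31 and 28 days)
    for t in decode_times("hours since 2010-01-01 00:00:00", [0, 744, 1416]):
        print(t.isoformat())
    # 2010-01-01T00:00:00
    # 2010-02-01T00:00:00
    # 2010-03-01T00:00:00
```

With the reference datetime equal to the first time value, as recommended in the tip, the first offset is 0.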

time:calendar

Specifies the calendar used to define the time system. Set this to one of the following values:

standard

Synonym for gregorian.

gregorian

Selects the Gregorian calendar system.

time:axis

Identifies the axis (X,Y,Z,T) corresponding to this coordinate vector. Set this to T.
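To check a time coordinate's attributes against the values listed above, you could use a small validator like the one below. It is a sketch that assumes the attributes have already been read into a Python dict (with the netCDF library of your choice); check_time_attrs is a hypothetical name. It also flags a reference datetime before 1900, following the recommendation in the special considerations for time vectors.

```python
import re
from datetime import datetime
from typing import Dict, List

_UNITS_RE = re.compile(
    r"(days|hours|minutes|seconds) since (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)


def check_time_attrs(attrs: Dict[str, str]) -> List[str]:
    """Return a list of problems found in a time coordinate's attributes."""
    problems = []
    units = attrs.get("units", "")
    match = _UNITS_RE.fullmatch(units.strip())
    if match is None:
        problems.append(f"time:units has an unexpected form: {units!r}")
    else:
        ref = datetime.strptime(match.group(2), "%Y-%m-%d %H:%M:%S")
        if ref.year < 1900:
            # Pre-1900 reference datetimes may not be read properly under ESMF
            problems.append("time:units reference datetime is before 1900")
    if attrs.get("calendar") not in ("standard", "gregorian"):
        problems.append("time:calendar should be 'standard' or 'gregorian'")
    if attrs.get("axis") != "T":
        problems.append("time:axis should be 'T'")
    return problems


if __name__ == "__main__":
    good = {
        "long_name": "time",
        "units": "hours since 2010-01-01 00:00:00",
        "calendar": "standard",
        "axis": "T",
    }
    print(check_time_attrs(good))  # []
```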

Special considerations for time vectors

  1. We recommend that index variables (such as time) be declared with type float or double. GCHP cannot parse files that have index variables of type int.

  2. We have noticed that netCDF files having a time:units reference datetime prior to 1900/01/01 00:00:00 may not be read properly when using HEMCO or GCHP within an ESMF environment. We therefore recommend that you use reference datetime values after 1900 whenever possible.

  3. Weekly data must contain seven time slices in increments of one day. The first entry must represent Sunday data, regardless of the real weekday of the assigned datetime. It is possible to store weekly data for more than one time interval, in which case the first weekday (i.e. Sunday) must hold the starting date for the given set of (seven) time slices.

    • For instance, weekly data for every month of a year can be stored as 12 sets of 7 time slices. The reference datetime of the first entry of each set must fall on the first day of every month, and the following six entries must be increments of one day.

    Currently, weekly data from netCDF files is not correctly read in an ESMF environment.
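For example, the 12 sets of 7 daily slices described above could be generated as "days since YYYY-01-01 00:00:00" offsets with the following Python sketch. The helper weekly_time_offsets is hypothetical, and we assume the reference datetime is January 1 of the same year.

```python
from datetime import date
from typing import List


def weekly_time_offsets(year: int) -> List[int]:
    """Time values in "days since <year>-01-01 00:00:00" for 12 monthly sets
    of 7 daily slices, each set starting on the 1st of the month.

    The first slice of each set holds Sunday data, whatever weekday the
    1st of the month actually falls on.
    """
    ref = date(year, 1, 1)
    offsets: List[int] = []
    for month in range(1, 13):
        first = (date(year, month, 1) - ref).days
        offsets.extend(first + d for d in range(7))  # Sun, Mon, ..., Sat
    return offsets


if __name__ == "__main__":
    t = weekly_time_offsets(2010)
    print(len(t))   # 84 (12 sets x 7 slices)
    print(t[:8])    # [0, 1, 2, 3, 4, 5, 6, 31]
```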

lev

A COARDS-compliant lev coordinate vector will have these features:

dimensions:
        lev = 72 ;
. . .
variables:
        double lev(lev) ;
                lev:long_name = "level" ;
                lev:units = "level" ;
                lev:positive = "up" ;
                lev:axis = "Z" ;

Here, lev is an 8-byte floating point (aka REAL*8) with 72 levels.

The lev coordinate vector has the following attributes:

lev:long_name

A detailed description of the contents of this array. You may set this to values such as:

  • "level"

  • "GEOS-Chem levels"

  • "Eta centers"

  • "Sigma centers"

lev:units

(Required) Specifies the units of vertical levels. Set this to one of the following:

  • "level"

  • "eta_level"

  • "sigma_level"

Important

If you also set lev:long_name to "level", then HEMCO will be able to regrid between GEOS-Chem vertical grids.

lev:axis

Identifies the axis (X,Y,Z,T) corresponding to this coordinate vector. Set this to Z.

lev:positive

Specifies the direction in which the vertical dimension is indexed. Set this to one of these values:

  • "up" (Level 1 is the surface, and level indices increase upwards)

  • "down" (Level 1 is the atmosphere top, and level indices increase downwards)

For emissions and most other data sets, you can set lev:positive to "up".

Important

GCHP and the NASA GEOS-ESM use a vertical grid where lev:positive is "down".
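If your source data are stored with lev:positive = "down" and you want to provide them as "up", the level axis of the data must be reversed. The Python sketch below assumes the data are indexed by level along their first axis and that lev simply holds level indices 1..N (lev:units = "level"); flip_levels_to_up is a hypothetical helper. It does not handle physical coordinates such as eta or sigma values.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def flip_levels_to_up(profile: Sequence[T], positive: str) -> Tuple[List[float], List[T], str]:
    """Reorder level-indexed data so that level 1 is the surface.

    Assumes `profile` is indexed by level along its first axis (per-level
    values or 2-D lat/lon slices) and that lev holds indices 1..N.
    Returns (lev, data, positive) with positive set to "up".
    """
    if positive not in ("up", "down"):
        raise ValueError(f"lev:positive must be 'up' or 'down', got {positive!r}")
    data = list(profile) if positive == "up" else list(reversed(profile))
    lev = [float(k) for k in range(1, len(data) + 1)]  # renumber 1..N
    return lev, data, "up"


if __name__ == "__main__":
    # A 3-level profile stored top-down (lev:positive = "down")
    print(flip_levels_to_up([0.1, 0.5, 1.0], "down"))
    # ([1.0, 2.0, 3.0], [1.0, 0.5, 0.1], 'up')
```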

Additional considerations for lev vectors:

When using GEOS-Chem or HEMCO in a non-ESMF environment, data is interpolated onto the simulation levels if the input data is on vertical levels other than the HEMCO model levels (see HEMCO vertical regridding).

Data on non-model levels must be on a hybrid sigma pressure coordinate system. In order to properly determine the vertical pressure levels of the input data, the file must contain the surface pressure values and the hybrid coefficients (a, b) of the coordinate system. Furthermore, the level variable must contain the attributes standard_name and formula_terms (the attribute positive is recommended but not required). A header excerpt of a valid netCDF file is shown below:

float lev(lev) ;
    lev:standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
    lev:units = "level" ;
    lev:positive = "down" ;
    lev:formula_terms = "ap: hyam b: hybm ps: PS" ;
float hyam(nhym) ;
    hyam:long_name = "hybrid A coefficient at layer midpoints" ;
    hyam:units = "hPa" ;
float hybm(nhym) ;
    hybm:long_name = "hybrid B coefficient at layer midpoints" ;
    hybm:units = "1" ;
float time(time) ;
    time:standard_name = "time" ;
    time:units = "days since 2000-01-01 00:00:00" ;
    time:calendar = "standard" ;
float PS(time, lat, lon) ;
    PS:long_name = "surface pressure" ;
    PS:units = "hPa" ;
float EMIS(time, lev, lat, lon) ;
    EMIS:long_name = "emissions" ;
    EMIS:units = "kg m-2 s-1" ;
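Given this header, the pressure at each layer midpoint follows the CF formula for atmosphere_hybrid_sigma_pressure_coordinate, p(k) = ap(k) + b(k) * ps, with the terms mapped by formula_terms (here ap = hyam, b = hybm, ps = PS). The Python sketch below evaluates it for a single column; the coefficient values are made up for illustration, and we assume hyam and PS share the same units (hPa), as in the header above.

```python
from typing import List, Sequence


def hybrid_midpoint_pressures(
    hyam: Sequence[float], hybm: Sequence[float], ps: float
) -> List[float]:
    """Pressure at each layer midpoint of one column: p(k) = hyam(k) + hybm(k) * ps.

    Assumes hyam and ps are in the same units (hPa here) and hybm is
    dimensionless, matching the header above.
    """
    if len(hyam) != len(hybm):
        raise ValueError("hyam and hybm must have the same length")
    return [a + b * ps for a, b in zip(hyam, hybm)]


if __name__ == "__main__":
    # Made-up 3-layer coefficients, ordered top-down (lev:positive = "down")
    hyam = [10.0, 40.0, 5.0]
    hybm = [0.0, 0.5, 0.75]
    print(hybrid_midpoint_pressures(hyam, hybm, ps=1000.0))  # [10.0, 540.0, 755.0]
```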

lat

A COARDS-compliant lat coordinate vector will have these features:

dimensions:
        lat = 181 ;
variables:
        double lat(lat) ;
                lat:long_name = "Latitude" ;
                lat:units = "degrees_north" ;
                lat:axis = "Y" ;

Here, lat is an 8-byte floating point (aka REAL*8) with 181 values.

The lat coordinate vector has the following attributes:

lat:long_name

A detailed description of the contents of this array. Set this to Latitude.

lat:units

Specifies the units of latitude. Set this to degrees_north.

lat:axis

Identifies the axis (X,Y,Z,T) corresponding to this coordinate vector. Set this to Y.

lon

A COARDS-compliant lon coordinate vector will have these features:

dimensions:
        lon = 360 ;
variables:
        double lon(lon) ;
                lon:long_name = "Longitude" ;
                lon:units = "degrees_east" ;
                lon:axis = "X" ;

Here, lon is an 8-byte floating point (aka REAL*8) with 360 values.

The lon coordinate vector has the following attributes:

lon:long_name

A detailed description of the contents of this array. Set this to Longitude.

lon:units

Specifies the units of longitude. Set this to degrees_east.

lon:axis

Identifies the axis (X,Y,Z,T) corresponding to this coordinate vector. Set this to X.
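The lat and lon coordinate vectors from the examples above (a global 1° × 1° grid with 181 latitudes and 360 longitudes) could be built as follows. This Python sketch assumes latitude points at -90, -89, ..., 90 and longitudes starting at the International Dateline; make_latlon_coords is a hypothetical helper, and the attribute values are the ones recommended above.

```python
from typing import Dict, List, Tuple

Coord = Tuple[List[float], Dict[str, str]]


def make_latlon_coords(
    dlat: float = 1.0, dlon: float = 1.0, lon0: float = -180.0
) -> Tuple[Coord, Coord]:
    """Build lat/lon coordinate vectors and their COARDS attributes.

    Latitudes run from -90 to +90 inclusive (181 points at 1 degree);
    longitudes start at lon0 and cover 360 degrees (360 points at 1 degree).
    """
    nlat = int(round(180.0 / dlat)) + 1
    nlon = int(round(360.0 / dlon))
    lat = [-90.0 + j * dlat for j in range(nlat)]
    lon = [lon0 + i * dlon for i in range(nlon)]
    lat_attrs = {"long_name": "Latitude", "units": "degrees_north", "axis": "Y"}
    lon_attrs = {"long_name": "Longitude", "units": "degrees_east", "axis": "X"}
    return (lat, lat_attrs), (lon, lon_attrs)


if __name__ == "__main__":
    (lat, _), (lon, _) = make_latlon_coords()
    print(len(lat), lat[0], lat[-1])  # 181 -90.0 90.0
    print(len(lon), lon[0], lon[-1])  # 360 -180.0 179.0
```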

Longitudes may be represented modulo 360. For example, -180, 180, and 540 are all valid representations of the International Dateline and 0 and 360 are both valid representations of the Prime Meridian. Note, however, that the sequence of numerical longitude values stored in the netCDF file must be monotonic in a non-modulo sense.

Practical guidelines:

  1. If your grid begins at the International Dateline (-180°), then place your longitudes into the range -180..180.

  2. If your grid begins at the Prime Meridian (0°), then place your longitudes into the range 0..360.
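If your longitudes are in the wrong range for these guidelines, wrapping them modulo 360 alone is not enough: the wrapped values would no longer be monotonic, so the data along the longitude axis must be rotated as well. A minimal Python sketch, assuming the input longitudes are increasing, span less than 360°, and that the data are indexed by longitude only (normalize_lon_axis is a hypothetical name). Output longitudes fall in the half-open range [start, start + 360).

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def normalize_lon_axis(
    lons: Sequence[float], data: Sequence[T], start: float
) -> Tuple[List[float], List[T]]:
    """Wrap longitudes into [start, start + 360) and rotate data to match.

    Use start=-180.0 for a grid beginning at the International Dateline and
    start=0.0 for one beginning at the Prime Meridian.
    """
    wrapped = [start + (lon - start) % 360.0 for lon in lons]
    k = wrapped.index(min(wrapped))  # new westernmost point
    # Rotating both sequences by k keeps the longitudes monotonically increasing
    return wrapped[k:] + wrapped[:k], list(data[k:]) + list(data[:k])


if __name__ == "__main__":
    lons, vals = normalize_lon_axis([0.0, 90.0, 180.0, 270.0], ["a", "b", "c", "d"], -180.0)
    print(lons)  # [-180.0, -90.0, 0.0, 90.0]
    print(vals)  # ['c', 'd', 'a', 'b']
```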

COARDS data arrays

A COARDS-compliant netCDF file may contain several data arrays. In our example file shown above, there are two data arrays:

dimensions:
        time = UNLIMITED ; // (12 currently)
        lev = 72 ;
        lat = 181 ;
        lon = 360 ;
variables:
        float PRPE(time, lev, lat, lon) ;
                PRPE:long_name = "Propene" ;
                PRPE:units = "kgC/m2/s" ;
                PRPE:add_offset = 0.f ;
                PRPE:missing_value = 1.e+15f ;
        float CO(time, lev, lat, lon) ;
                CO:long_name = "CO" ;
                CO:units = "kg/m2/s" ;
                CO:_FillValue = 1.e+15f ;
                CO:missing_value = 1.e+15f ;

These arrays contain emissions for the species PRPE (lumped >= C3 alkenes) and CO.

Attributes for data arrays

long_name

Gives a detailed description of the contents of the array.

units

Specifies the units of data contained within the array. SI units are preferred.

Special usage for HEMCO:

  • Use kg/m2/s or kg m-2 s-1 for emission fluxes of species;

  • Use kg/m3 or kg m-3 for concentration data;

  • Use 1 for dimensionless data instead of unitless. HEMCO will recognize unitless, but it is non-standard and not recommended.

missing_value

Specifies the value that should represent missing data. This should be set to a number that will not be mistaken for a valid data value.

_FillValue

Synonym for missing_value. It is recommended to set both missing_value and _FillValue to the same value. Some data visualization packages look for one but not the other.
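When reading data back, a script should treat values equal to either attribute as missing. The Python sketch below assumes the attributes are available as a dict and that data values and fill values have the same precision (for example, both float32 as stored in the file), so exact comparison is safe; mask_missing is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence


def mask_missing(values: Sequence[float], attrs: Dict[str, float]) -> List[float]:
    """Replace values equal to missing_value or _FillValue with NaN."""
    # A file may define either attribute or both, so collect whichever exist
    fills = {attrs[k] for k in ("missing_value", "_FillValue") if k in attrs}
    return [math.nan if v in fills else v for v in values]


if __name__ == "__main__":
    attrs = {"missing_value": 1.0e15, "_FillValue": 1.0e15}
    print(mask_missing([2.5, 1.0e15, 0.0], attrs))  # [2.5, nan, 0.0]
```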

Ordering of the data

2D and 3D array variables in netCDF files must have a specific dimension order. If the order is incorrect, you will encounter the netCDF read error “start+count exceeds dimension bound”. You can check the dimension ordering of your arrays by using the ncdump command as shown below:

$ ncdump -h file.nc

Be sure to check the dimensions listed next to the array name rather than the ordering of the dimensions listed at the top of the ncdump output.

The following dimension orders are acceptable:

array(time,lat,lon)
array(time,lev,lat,lon)
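To check many variables at once, you could parse the declaration lines that ncdump -h prints. This Python sketch treats (time, lat, lon) and (time, lev, lat, lon) as the accepted orders, matching the data arrays in the example file, and assumes one declaration per line in the form float CO(time, lev, lat, lon) ; with names made of letters, digits, and underscores. check_declaration is a hypothetical helper. Apply it to data arrays only, since 1-D coordinate vectors such as time(time) will not match either accepted order.

```python
import re
from typing import Tuple

ACCEPTED_ORDERS = {("time", "lat", "lon"), ("time", "lev", "lat", "lon")}

# A variable declaration from `ncdump -h`, e.g. "float CO(time, lev, lat, lon) ;"
_DECL_RE = re.compile(r"^\s*\w+\s+(\w+)\(([^)]*)\)\s*;")


def check_declaration(line: str) -> Tuple[str, Tuple[str, ...], bool]:
    """Return (name, dims, ok) for one ncdump -h variable declaration line."""
    match = _DECL_RE.match(line)
    if match is None:
        raise ValueError(f"Not a variable declaration: {line!r}")
    name = match.group(1)
    dims = tuple(d.strip() for d in match.group(2).split(","))
    return name, dims, dims in ACCEPTED_ORDERS


if __name__ == "__main__":
    for decl in ("float CO(time, lev, lat, lon) ;", "float CO(time, lat, lon, lev) ;"):
        name, dims, ok = check_declaration(decl)
        print(name, dims, "OK" if ok else "wrong order")
```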

The rest of this section explains why the dimension ordering of arrays matters.

When you use ncdump to examine the contents of a netCDF file, you will notice that it displays the dimensions of the data in the opposite order with respect to Fortran. In our sample file, ncdump says that the CO and PRPE arrays have these dimensions:

CO(time,lev,lat,lon)
PRPE(time,lev,lat,lon)

But if you read this netCDF file into GEOS-Chem (or any other program written in Fortran), you must use data arrays that have these dimensions:

CO(lon,lat,lev,time)
PRPE(lon,lat,lev,time)

Here’s why:

Fortran is a column-major language, which means that arrays are stored in memory by columns first, then by rows. If you have declared arrays such as:

INTEGER            :: I, J, L, T
INTEGER, PARAMETER :: N_LON  = 360
INTEGER, PARAMETER :: N_LAT  = 181
INTEGER, PARAMETER :: N_LEV  = 72
INTEGER, PARAMETER :: N_TIME = 12
REAL*4             :: CO  (N_LON,N_LAT,N_LEV,N_TIME)
REAL*4             :: PRPE(N_LON,N_LAT,N_LEV,N_TIME)

then for optimal efficiency, the leftmost dimension (I) needs to vary the fastest, and needs to be accessed by the innermost DO-loop. Then the next leftmost dimension (J) should be accessed by the next innermost DO-loop, and so on. Therefore, the proper way to loop over these arrays is:

DO T = 1, N_TIME
DO L = 1, N_LEV
DO J = 1, N_LAT
DO I = 1, N_LON
   CO  (I,J,L,T) = ...
   PRPE(I,J,L,T) = ...
ENDDO
ENDDO
ENDDO
ENDDO

Note that the I index is varying most often, since it is the innermost DO-loop, then J, L, and T. This is opposite to how a car’s odometer reads.

If you loop through an array in this fashion, with leftmost indices varying fastest, then the code minimizes the number of times it has to load subsections of the array into cache memory. In this optimal manner of execution, all of the array elements sitting in the cache memory are read in the proper order before the next array subsection needs to be loaded into the cache. But if you step through array elements in the wrong order, the number of cache loads is proportionally increased. Because it takes a finite amount of time to reload array elements into cache memory, the more times you have to access the cache, the longer it will take the code to execute. This can slow down the code dramatically.

On the other hand, C is a row-major language, which means that arrays are stored by rows first, then by columns. This means that the rightmost array index varies the fastest. This is identical to how a car’s odometer reads.

If you use a Fortran program to write data to disk and then read that data into a program written in C, you must reverse the order of the array indices; otherwise you will be reading the array in the wrong order. In C you would have to use this indexing scheme (using Fortran-style syntax to illustrate the point), where the rightmost index (I) now varies the fastest and is therefore accessed by the innermost DO-loop:

DO T = 1, N_TIME
DO L = 1, N_LEV
DO J = 1, N_LAT
DO I = 1, N_LON
   CO  (T,L,J,I) = ...
   PRPE(T,L,J,I) = ...
ENDDO
ENDDO
ENDDO
ENDDO

Because ncdump is written in C, the order of the array dimensions appears reversed with respect to Fortran. The same goes for any other code written in a row-major programming language.
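The equivalence can be checked numerically. The Python sketch below computes the memory offset of an array element under Fortran (column-major, 1-based) and C (row-major, 0-based) conventions, and shows that Fortran CO(I,J,L,T) and C CO[T][L][J][I] refer to the same location. The function names are our own and the index values are arbitrary.

```python
from typing import Sequence


def fortran_offset(idx: Sequence[int], shape: Sequence[int]) -> int:
    """Memory offset of a 1-based index in a column-major (Fortran) array."""
    offset, stride = 0, 1
    for i, n in zip(idx, shape):  # leftmost index varies fastest
        offset += (i - 1) * stride
        stride *= n
    return offset


def c_offset(idx: Sequence[int], shape: Sequence[int]) -> int:
    """Memory offset of a 0-based index in a row-major (C) array."""
    offset, stride = 0, 1
    for i, n in zip(reversed(idx), reversed(shape)):  # rightmost varies fastest
        offset += i * stride
        stride *= n
    return offset


if __name__ == "__main__":
    n_lon, n_lat, n_lev, n_time = 360, 181, 72, 12
    i, j, l, t = 10, 20, 30, 5  # arbitrary 1-based Fortran indices
    f = fortran_offset((i, j, l, t), (n_lon, n_lat, n_lev, n_time))
    # Same element in C: reverse the index order and shift to 0-based
    c = c_offset((t - 1, l - 1, j - 1, i - 1), (n_time, n_lev, n_lat, n_lon))
    print(f == c)  # True
```

Because both layouts place consecutive longitudes next to each other in memory, the file contents are identical; only the way each language writes the indices differs.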

COARDS Global attributes

Global attributes are netCDF attributes that contain information about a netCDF file, as opposed to information about an individual data array.

In our Examine the contents of a netCDF file example, the output from ncdump showed that our sample netCDF file has several global attributes:

// global attributes:
            :Title = "COARDS/netCDF file containing X data" ;
            :Contact = "GEOS-Chem Support Team (geos-chem-support@as.harvard.edu)" ;
            :References = "www.geos-chem.org; wiki.geos-chem.org" ;
            :Conventions = "COARDS" ;
            :Filename = "my_sample_data_file.1x1" ;
            :History = "Mon Mar 17 16:18:09 2014 GMT" ;
            :ProductionDateTime = "File generated on: Mon Mar 17 16:18:09 2014 GMT" ;
            :ModificationDateTime = "File generated on: Mon Mar 17 16:18:09 2014 GMT" ;
            :VersionID = "1.2" ;
            :Format = "NetCDF-3" ;
            :Model = "GEOS5" ;
            :Grid = "GEOS_1x1" ;
            :Delta_Lon = 1.f ;
            :Delta_Lat = 1.f ;
            :SpatialCoverage = "global" ;
            :NLayers = 72 ;
            :Start_Date = 20050101 ;
            :Start_Time = "00:00:00.0" ;
            :End_Date = 20051231 ;
            :End_Time = "23:59:59.99999" ;

Title (or title)

Provides a short description of the file.

Contact (or contact)

Provides contact information for the person(s) who created the file.

References (or references)

Provides a reference (citation, DOI, or URL) for the data contained in the file.

Conventions (or conventions)

Indicates if the netCDF file adheres to a standard (e.g. COARDS or CF).

Filename (or filename)

Specifies the name of the file.

History (or history)

Specifies the datetime of file creation, and of any subsequent modifications.

Note

If you edit the file with NCO or CDO, then this attribute will be updated to reflect the modification that was done.

Format (or format)

Specifies the format of the netCDF file (such as netCDF-3 or netCDF-4).
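Since these attributes may be capitalized either way, scripts that read them should not assume one spelling. A minimal Python sketch of a case-insensitive lookup, assuming the global attributes are already in a dict (get_global_attr is a hypothetical helper, not part of HEMCO or GEOS-Chem):

```python
from typing import Any, Dict, Optional


def get_global_attr(attrs: Dict[str, Any], name: str, default: Optional[Any] = None) -> Any:
    """Look up a global attribute written as e.g. "Title" or "title".

    Tries the exact name first, then falls back to a case-insensitive match.
    """
    if name in attrs:
        return attrs[name]
    lowered = name.lower()
    for key, value in attrs.items():
        if key.lower() == lowered:
            return value
    return default


if __name__ == "__main__":
    attrs = {"title": "COARDS/netCDF file containing X data", "Conventions": "COARDS"}
    print(get_global_attr(attrs, "Title"))
    print(get_global_attr(attrs, "conventions"))
```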

For more information

Please see our Work with netCDF files Supplemental Guide for more information about commands that you can use to combine, edit, or manipulate data in netCDF files.