.. |br| raw:: html

   <br />

.. _coards-guide:

#####################################
Prepare COARDS-compliant netCDF files
#####################################

On this page we discuss how you can generate netCDF data files in the
proper format for HEMCO and GEOS-Chem.

.. _coards-guide-coards:

==========================
The COARDS netCDF standard
==========================

The `Harmonized Emissions Component (HEMCO) `_ reads data stored in
`the netCDF file format `__, which is a common data format used in
atmospheric and climate sciences. NetCDF files contain **data
arrays** as well as **metadata**, which is a description of the data.

Several netCDF conventions have been developed in order to facilitate
data exchange and visualization. The `Cooperative Ocean Atmosphere
Research Data Service (COARDS) standard `_ defines regular conventions
for naming dimensions as well as the `attributes `__ describing the
data. You will find more information about these conventions in the
sections below.

HEMCO requires its input data to adhere to the COARDS standard. Our
:ref:`"Work with netCDF files" supplemental guide ` contains detailed
instructions on how you can check a netCDF file for COARDS compliance.

.. _coards-guide-dims:

=================
COARDS dimensions
=================

The **dimensions** of a netCDF file define how many grid boxes there
are along a given direction. While the COARDS standard does not
require any specific names for dimensions, accepted practice is to use
these names for rectilinear grids:

.. option:: time

   Specifies the number of points along the time (:literal:`T`) axis.

   The :option:`time` dimension must always be specified. When you
   create the netCDF file, you may declare :option:`time` to be
   :literal:`UNLIMITED` and then later define its size. This allows
   you to append further time points into the file later on.

.. option:: lev

   Specifies the number of points along the vertical level
   (:literal:`Z`) axis.
   This dimension may be omitted if none of the data arrays in the
   netCDF file have a vertical dimension.

.. option:: lat

   Specifies the number of points along the latitude (:literal:`Y`)
   axis.

.. option:: lon

   Specifies the number of points along the longitude (:literal:`X`)
   axis.

.. note::

   For non-rectilinear grids (e.g. cubed-sphere), the :option:`lat`
   and :option:`lon` dimensions may be named :literal:`NY` and
   :literal:`NX` instead.

.. _coards-guide-coordvec:

=========================
COARDS coordinate vectors
=========================

**Coordinate vectors** (aka **index variables** or **axis
variables**) are 1-dimensional arrays that define the values along
each axis.

The only COARDS requirements for coordinate vectors are these:

#. Each coordinate vector must be given the same name as the
   dimension that is used to define it.
#. All of the values contained within a coordinate vector must be
   either monotonically increasing or monotonically decreasing.

.. _coards-guide-coordvec-time:

time
----

A COARDS-compliant :option:`time` coordinate vector will have these
features:

.. code-block:: console

   dimensions:
           time = UNLIMITED ; // (12 currently)
   . . .
   variables:
           double time(time) ;
                   time:long_name = "time" ;
                   time:units = "hours since 2010-01-01 00:00:00" ;
                   time:calendar = "standard" ;
                   time:axis = "T" ;

.. note::

   The above was generated by the :command:`ncdump` command.

As you can see, :option:`time` is an 8-byte floating point (aka
:code:`REAL*8`) with 12 time points.

The :option:`time` coordinate vector has the following attributes:

.. option:: time:long_name

   A detailed description of the contents of this array. This is
   usually set to :literal:`time` or :literal:`Time`.

.. option:: time:units

   Specifies the number of hours, minutes, seconds, etc. that has
   elapsed with respect to a reference datetime
   :literal:`YYYY-MM-DD hh:mn:ss`.
   Set this to one of the following values:

   - :literal:`"days since YYYY-MM-DD hh:mn:ss"`
   - :literal:`"hours since YYYY-MM-DD hh:mn:ss"`
   - :literal:`"minutes since YYYY-MM-DD hh:mn:ss"`
   - :literal:`"seconds since YYYY-MM-DD hh:mn:ss"`

   .. tip::

      We recommend that you choose the reference datetime to
      correspond to the first time value in the file
      (i.e. :literal:`time(0) = 0`).

.. option:: time:calendar

   Specifies the calendar used to define the time system. Set this
   to one of the following values:

   .. option:: standard

      Synonym for :option:`gregorian`.

   .. option:: gregorian

      Selects the Gregorian calendar system.

.. option:: time:axis

   Identifies the axis :literal:`(X,Y,Z,T)` corresponding to this
   coordinate vector. Set this to :literal:`T`.

.. _coards-guide-additional-time:

Special considerations for time vectors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#. We recommend that index variables (such as :literal:`time`) be
   declared with type :literal:`float` or :literal:`double`.
   `GCHP `_ cannot parse files that have index variables of type
   :literal:`int`. |br| |br|

#. We have noticed that netCDF files having a :option:`time:units`
   reference datetime prior to :literal:`1900/01/01 00:00:00` may not
   be read properly when using `HEMCO `_ or `GCHP `_ within an ESMF
   environment. We therefore recommend that you use reference
   datetime values after 1900 whenever possible. |br| |br|

#. Weekly data must contain seven time slices in increments of one
   day. The first entry must represent Sunday data, regardless of
   the real weekday of the assigned datetime. It is possible to
   store weekly data for more than one time interval, in which case
   the first weekday (i.e. Sunday) must hold the starting date for
   the given set of (seven) time slices.

   - For instance, weekly data for every month of a year can be
     stored as 12 sets of 7 time slices. The reference datetime of
     the first entry of each set must fall on the first day of every
     month, and the following six entries must be increments of one
     day.
     Currently, weekly data from netCDF files is not correctly read
     in an ESMF environment.

.. _coards-guide-coordvec-lev:

lev
---

A COARDS-compliant :option:`lev` coordinate vector will have these
features:

.. code-block:: console

   dimensions:
           lev = 72 ;
   . . .
   variables:
           double lev(lev) ;
                   lev:long_name = "level" ;
                   lev:units = "level" ;
                   lev:positive = "up" ;
                   lev:axis = "Z" ;

Here, :option:`lev` is an 8-byte floating point (aka
:literal:`REAL*8`) with 72 levels.

The :option:`lev` coordinate vector has the following attributes:

.. option:: lev:long_name

   A detailed description of the contents of this array. You may set
   this to values such as:

   - :literal:`"level"`
   - :literal:`"GEOS-Chem levels"`
   - :literal:`"Eta centers"`
   - :literal:`"Sigma centers"`

.. option:: lev:units

   **(Required)** Specifies the units of vertical levels. Set this
   to one of the following:

   - :literal:`"levels"`
   - :literal:`"eta_level"`
   - :literal:`"sigma_level"`

   .. important::

      If you set :literal:`long_name:` to :literal:`level` as well,
      then HEMCO will be able to regrid between GEOS-Chem vertical
      grids.

.. option:: lev:axis

   Identifies the axis :literal:`(X,Y,Z,T)` corresponding to this
   coordinate vector. Set this to :literal:`Z`.

.. option:: lev:positive

   Specifies the direction in which the vertical dimension is
   indexed. Set this to one of these values:

   - :literal:`"up"` (Level 1 is the surface, and level indices
     increase upwards)
   - :literal:`"down"` (Level 1 is the atmosphere top, and level
     indices increase downwards)

   For emissions and most other data sets, you can set
   :option:`lev:positive` to :literal:`"up"`.

   .. important::

      GCHP and the NASA GEOS-ESM use a vertical grid where
      :option:`lev:positive` is :literal:`"down"`.

.. _coards-guide-additional-lev:

Additional considerations for lev vectors:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When using `GEOS-Chem `_ or `HEMCO `_ in a non-ESMF environment, data
is interpolated onto the simulation levels if the input data is on
vertical levels other than the HEMCO model levels (see `HEMCO vertical
regridding `_).

Data on non-model levels must be on a hybrid sigma pressure coordinate
system. In order to properly determine the vertical pressure levels
of the input data, the file must contain the surface pressure values
and the hybrid coefficients (a, b) of the coordinate system.
Furthermore, the level variable must contain the attributes
:literal:`standard_name` and :literal:`formula_terms` (the attribute
:literal:`positive` is recommended but not required). A header
excerpt of a valid netCDF file is shown below:

.. code-block:: console

   float lev(lev) ;
       lev:standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
       lev:units = "level" ;
       lev:positive = "down" ;
       lev:formula_terms = "ap: hyam b: hybm ps: PS" ;
   float hyam(nhym) ;
       hyam:long_name = "hybrid A coefficient at layer midpoints" ;
       hyam:units = "hPa" ;
   float hybm(nhym) ;
       hybm:long_name = "hybrid B coefficient at layer midpoints" ;
       hybm:units = "1" ;
   float time(time) ;
       time:standard_name = "time" ;
       time:units = "days since 2000-01-01 00:00:00" ;
       time:calendar = "standard" ;
   float PS(time, lat, lon) ;
       PS:long_name = "surface pressure" ;
       PS:units = "hPa" ;
   float EMIS(time, lev, lat, lon) ;
       EMIS:long_name = "emissions" ;
       EMIS:units = "kg m-2 s-1" ;

.. _coards-guide-coordvec-lat:

lat
---

A COARDS-compliant :option:`lat` coordinate vector will have these
features:

.. code-block:: console

   dimensions:
           lat = 181 ;
   variables:
           double lat(lat) ;
                   lat:long_name = "Latitude" ;
                   lat:units = "degrees_north" ;
                   lat:axis = "Y" ;

Here, :option:`lat` is an 8-byte floating point (aka
:literal:`REAL*8`) with 181 values.

The :option:`lat` coordinate vector has the following attributes:

.. option:: lat:long_name

   A detailed description of the contents of this array. Set this to
   :literal:`Latitude`.

.. option:: lat:units

   Specifies the units of latitude. Set this to
   :literal:`degrees_north`.

.. option:: lat:axis

   Identifies the axis :literal:`(X,Y,Z,T)` corresponding to this
   coordinate vector. Set this to :literal:`Y`.

.. _coards-guide-coordvec-lon:

lon
---

A COARDS-compliant :option:`lon` coordinate vector will have these
features:

.. code-block:: console

   dimensions:
           lon = 360 ;
   variables:
           double lon(lon) ;
                   lon:long_name = "Longitude" ;
                   lon:units = "degrees_east" ;
                   lon:axis = "X" ;

Here, :option:`lon` is an 8-byte floating point (aka
:literal:`REAL*8`) with 360 values.

The :option:`lon` coordinate vector has the following attributes:

.. option:: lon:long_name

   A detailed description of the contents of this array. Set this to
   :literal:`Longitude`.

.. option:: lon:units

   Specifies the units of longitude. Set this to
   :literal:`degrees_east`.

.. option:: lon:axis

   Identifies the axis :literal:`(X,Y,Z,T)` corresponding to this
   coordinate vector. Set this to :literal:`X`.

Longitudes may be represented modulo 360. For example, -180, 180, and
540 are all valid representations of the International Dateline and 0
and 360 are both valid representations of the Prime Meridian. Note,
however, that the sequence of numerical longitude values stored in the
netCDF file must be monotonic in a non-modulo sense.

Practical guidelines:

#. If your grid begins at the International Dateline (-180°), then
   place your longitudes into the range -180..180.
#. If your grid begins at the Prime Meridian (0°), then place your
   longitudes into the range 0..360.

.. _coards-guide-data:

==================
COARDS data arrays
==================

A COARDS-compliant netCDF file may contain several **data arrays**.
In our example file shown above, there are two data arrays:

.. code-block:: console

   dimensions:
           time = UNLIMITED ; // (12 currently)
           lev = 72 ;
           lat = 181 ;
           lon = 360 ;
   variables:
           float PRPE(time, lev, lat, lon) ;
                   PRPE:long_name = "Propene" ;
                   PRPE:units = "kgC/m2/s" ;
                   PRPE:add_offset = 0.f ;
                   PRPE:missing_value = 1.e+15f ;
           float CO(time, lev, lat, lon) ;
                   CO:long_name = "CO" ;
                   CO:units = "kg/m2/s" ;
                   CO:_FillValue = 1.e+15f ;
                   CO:missing_value = 1.e+15f ;

These arrays contain emissions for the species PRPE (lumped >= C3
alkenes) and CO.

.. _coards-guide-data-attr:

Attributes for data arrays
--------------------------

.. option:: long_name

   Gives a detailed description of the contents of the array.

.. option:: units

   Specifies the units of data contained within the array. SI units
   are preferred.

   Special usage for HEMCO:

   - Use :literal:`kg/m2/s` or :literal:`kg m-2 s-1` for emission
     fluxes of species;
   - Use :literal:`kg/m3` or :literal:`kg m-3` for concentration data;
   - Use :literal:`1` for dimensionless data instead of
     :literal:`unitless`. HEMCO will recognize :literal:`unitless`,
     but it is non-standard and not recommended.

.. option:: missing_value

   Specifies the value that should represent missing data. This
   should be set to a number that will not be mistaken for a valid
   data value.

.. option:: _FillValue

   Synonym for :option:`missing_value`. It is recommended to set
   both :option:`missing_value` and :option:`_FillValue` to the same
   value. Some data visualization packages look for one but not the
   other.

.. _coards-guide-data-ordering:

Ordering of the data
--------------------

2D and 3D array variables in netCDF files must have a specific
dimension order. If the order is incorrect, you will encounter the
netCDF read error "start+count exceeds dimension bound". You can
check the dimension ordering of your arrays by using the
:command:`ncdump` command as shown below:

.. code-block:: console

   $ ncdump -h file.nc

Be sure to check the dimensions listed next to the array name rather
than the ordering of the dimensions listed at the top of the
:command:`ncdump` output.

The following dimension orders are acceptable:

.. code-block:: console

   array(time,lat,lon)
   array(time,lev,lat,lon)

The rest of this section explains why the dimension ordering of arrays
matters.

When you use :command:`ncdump` to examine the contents of a netCDF
file, you will notice that it displays the dimensions of the data in
the opposite order with respect to Fortran. In our sample file,
:command:`ncdump` says that the CO and PRPE arrays have these
dimensions:

.. code-block:: console

   CO(time,lev,lat,lon)
   PRPE(time,lev,lat,lon)

But if you read this netCDF file into GEOS-Chem (or any other program
written in Fortran), you must use data arrays that have these
dimensions:

.. code-block:: console

   CO(lon,lat,lev,time)
   PRPE(lon,lat,lev,time)

Here's why: Fortran is a **column-major** language, which means that
arrays are stored in memory by columns first, then by rows. If you
have declared arrays such as:

.. code-block:: fortran

   INTEGER            :: I, J, L, T
   INTEGER, PARAMETER :: N_LON  = 360
   INTEGER, PARAMETER :: N_LAT  = 181
   INTEGER, PARAMETER :: N_LEV  = 72
   INTEGER, PARAMETER :: N_TIME = 12
   REAL*4             :: CO  (N_LON,N_LAT,N_LEV,N_TIME)
   REAL*4             :: PRPE(N_LON,N_LAT,N_LEV,N_TIME)

then for optimal efficiency, the leftmost dimension (:code:`I`) needs
to vary the fastest, and needs to be accessed by the innermost
DO-loop. Then the next leftmost dimension (:code:`J`) should be
accessed by the next innermost DO-loop, and so on. Therefore, the
proper way to loop over these arrays is:

.. code-block:: fortran

   DO T = 1, N_TIME
   DO L = 1, N_LEV
   DO J = 1, N_LAT
   DO I = 1, N_LON
      CO  (I,J,L,T) = ...
      PRPE(I,J,L,T) = ...
   ENDDO
   ENDDO
   ENDDO
   ENDDO

Note that the :code:`I` index is varying most often, since it is the
innermost DO-loop, then :code:`J`, :code:`L`, and :code:`T`.
This is opposite to how a car's odometer reads.

If you loop through an array in this fashion, with leftmost indices
varying fastest, then the code minimizes the number of times it has to
load subsections of the array into cache memory. In this optimal
manner of execution, all of the array elements sitting in the cache
memory are read in the proper order before the next array subsection
needs to be loaded into the cache. But if you step through array
elements in the wrong order, the number of cache loads is
proportionally increased. Because it takes a finite amount of time to
reload array elements into cache memory, the more times you have to
reload the cache, the longer it will take the code to execute. This
can slow down the code dramatically.

On the other hand, C is a **row-major** language, which means that
arrays are stored by rows first, then by columns. This means that the
rightmost index varies the fastest and should be accessed by the
innermost loop. This is identical to how a car's odometer reads.

If you use a Fortran program to write data to disk, and then try to
read that data from disk into a program written in C, then unless you
reverse the order of the array indices, you will be reading the array
in the wrong order. In C you would have to use this ordering scheme
(using Fortran-style syntax to illustrate the point):

.. code-block:: fortran

   DO T = 1, N_TIME
   DO L = 1, N_LEV
   DO J = 1, N_LAT
   DO I = 1, N_LON
      CO  (T,L,J,I) = ...
      PRPE(T,L,J,I) = ...
   ENDDO
   ENDDO
   ENDDO
   ENDDO

Because :program:`ncdump` is written in C, the order of the array
dimensions appears opposite with respect to Fortran. The same goes
for any other code written in a row-major programming language.

.. _coards-guide-global-attr:

========================
COARDS Global attributes
========================

**Global attributes** are `netCDF attributes `_ that contain
information about a netCDF file, as opposed to information about an
individual data array.
From our example in :ref:`Examine the contents of a netCDF file `,
the output from :command:`ncdump` showed that our sample netCDF file
has several global attributes:

.. code-block:: console

   // global attributes:
           :Title = "COARDS/netCDF file containing X data" ;
           :Contact = "GEOS-Chem Support Team (geos-chem-support@as.harvard.edu)" ;
           :References = "www.geos-chem.org; wiki.geos-chem.org" ;
           :Conventions = "COARDS" ;
           :Filename = "my_sample_data_file.1x1" ;
           :History = "Mon Mar 17 16:18:09 2014 GMT" ;
           :ProductionDateTime = "File generated on: Mon Mar 17 16:18:09 2014 GMT" ;
           :ModificationDateTime = "File generated on: Mon Mar 17 16:18:09 2014 GMT" ;
           :VersionID = "1.2" ;
           :Format = "NetCDF-3" ;
           :Model = "GEOS5" ;
           :Grid = "GEOS_1x1" ;
           :Delta_Lon = 1.f ;
           :Delta_Lat = 1.f ;
           :SpatialCoverage = "global" ;
           :NLayers = 72 ;
           :Start_Date = 20050101 ;
           :Start_Time = "00:00:00.0" ;
           :End_Date = 20051231 ;
           :End_Time = "23:59:59.99999" ;

.. option:: Title (or title)

   Provides a short description of the file.

.. option:: Contact (or contact)

   Provides contact information for the person(s) who created the
   file.

.. option:: References (or references)

   Provides a reference (citation, DOI, or URL) for the data
   contained in the file.

.. option:: Conventions (or conventions)

   Indicates if the netCDF file adheres to a standard (e.g. COARDS or
   CF).

.. option:: Filename (or filename)

   Specifies the name of the file.

.. option:: History (or history)

   Specifies the datetime of file creation, and of any subsequent
   modifications.

   .. note::

      If you edit the file with :program:`nco` or :program:`cdo`,
      then this attribute will be updated to reflect the modification
      that was done.

.. option:: Format (or format)

   Specifies the format of the netCDF file (such as
   :literal:`netCDF-3` or :literal:`netCDF-4`).

.. _coards-guide-more-info:

====================
For more information
====================

Please see our :ref:`ncguide` Supplemental Guide for more information
about commands that you can use to combine, edit, or manipulate data
in netCDF files.
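
The dimension-ordering rules described in the
:ref:`coards-guide-data-ordering` section can also be checked with a
little arithmetic. The following is a minimal Python sketch, not part
of HEMCO or GEOS-Chem. The helper names
:code:`column_major_offset` and :code:`row_major_offset` are
hypothetical and are introduced here only for illustration, and the
array sizes are assumed from the sample file used in this guide. It
suggests that an element addressed as :code:`(lon,lat,lev,time)` in a
column-major layout sits at the same memory offset as the element
addressed as :code:`(time,lev,lat,lon)` in a row-major layout.

.. code-block:: python

   def column_major_offset(index, shape):
       """Return the 0-based flat offset when the LEFTMOST index varies fastest."""
       offset, stride = 0, 1
       for i, n in zip(index, shape):
           offset += i * stride
           stride *= n
       return offset


   def row_major_offset(index, shape):
       """Return the 0-based flat offset when the RIGHTMOST index varies fastest."""
       offset = 0
       for i, n in zip(index, shape):
           offset = offset * n + i
       return offset


   # Array sizes assumed from the sample file in this guide.
   N_LON, N_LAT, N_LEV, N_TIME = 360, 181, 72, 12
   FORTRAN_SHAPE = (N_LON, N_LAT, N_LEV, N_TIME)  # CO(lon,lat,lev,time)
   C_SHAPE = (N_TIME, N_LEV, N_LAT, N_LON)        # CO(time,lev,lat,lon)

   # Pick an arbitrary element (0-based indices, chosen for illustration).
   i, j, k, t = 10, 20, 5, 3
   fortran_offset = column_major_offset((i, j, k, t), FORTRAN_SHAPE)
   c_offset = row_major_offset((t, k, j, i), C_SHAPE)

   # Reversing the index order addresses the same location in memory.
   print(fortran_offset == c_offset)  # prints True

Both layouts describe the same sequence of values on disk. Only the
order in which the indices are written differs, which is consistent
with :command:`ncdump` listing dimensions in reverse order with
respect to Fortran.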