Download Input Data

Input data for GEOS-Chem is available at the GEOS-Chem Input Data portal. You may browse the contents of the data at this link: https://geos-chem.s3.amazonaws.com/index.html

The bashdatacatalog is the recommended method for downloading and managing your GEOS-Chem input data. Refer to the bashdatacatalog’s Instructions for GEOS-Chem Users. Below is a brief summary of how to use the bashdatacatalog to acquire GCHP input data.

Install the bashdatacatalog

Install the bashdatacatalog with the following command. Follow the prompts and restart your console.

$ bash <(curl -s https://raw.githubusercontent.com/geoschem/bashdatacatalog/main/install.sh)

Note

You can rerun this command to upgrade to the latest version.

Download Data Catalogs

Catalog files can be downloaded from http://geoschemdata.wustl.edu/ExtData/DataCatalogs/.

The catalog files define the input data collections that GEOS-Chem needs. There are four catalog files:

MeteorologicalInputs.csv – Meteorological input data collections
ChemistryInputs.csv – Chemistry input data collections
EmissionsInputs.csv – Emissions input data collections
InitialConditions.csv – Initial conditions input data collections (restart files)

The last three are version-specific, so you need to download the catalogs for the version you intend to use (you can have catalogs for multiple versions at the same time).

Create a directory to house your catalog files at the top level of your GEOS-Chem input data directory (commonly known as ExtData). You should create subdirectories for version-specific catalog files.

$ cd /ExtData                   # navigate to GEOS-Chem data
$ mkdir InputDataCatalogs       # new directory for catalog files
$ mkdir InputDataCatalogs/14.4  # for 14.4-*-specific catalogs (example)

Next, download the meteorological catalog and the catalogs for the version you intend to use:

$ cd InputDataCatalogs
$ wget http://geoschemdata.wustl.edu/ExtData/DataCatalogs/MeteorologicalInputs.csv
$ cd 14.4
$ wget http://geoschemdata.wustl.edu/ExtData/DataCatalogs/14.4/ChemistryInputs.csv
$ wget http://geoschemdata.wustl.edu/ExtData/DataCatalogs/14.4/EmissionsInputs.csv
$ wget http://geoschemdata.wustl.edu/ExtData/DataCatalogs/14.4/InitialConditions.csv
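
The version-specific downloads above follow a regular URL pattern, so they can be generated in a loop if you keep catalogs for several versions. The following is a minimal sketch and not part of the bashdatacatalog: the function name version_catalog_urls is hypothetical, and it assumes that other version directories on the server use the same three file names as the 14.4 example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bashdatacatalog): print the URLs of the
# three version-specific catalogs for the version given as the first argument.
# Assumption: other versions use the same file names as the 14.4 example.
version_catalog_urls() {
    local base="http://geoschemdata.wustl.edu/ExtData/DataCatalogs"
    local name
    for name in ChemistryInputs EmissionsInputs InitialConditions; do
        echo "${base}/$1/${name}.csv"
    done
}

# Print the URLs for the 14.4 catalogs (nothing is downloaded here).
version_catalog_urls 14.4
```

To download rather than print, one option is to run the function from the matching subdirectory and pipe its output to wget, for example: cd InputDataCatalogs/14.4 && version_catalog_urls 14.4 | xargs -n 1 wget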

Fetching Metadata and Downloading Input Data

Important

You should always run bashdatacatalog commands from the top level of your GEOS-Chem data directory (the directory with HEMCO/, CHEM_INPUTS/, etc.).

Before you can run bashdatacatalog-list commands, you need to fetch the metadata of each collection. This is done with the bashdatacatalog-fetch command, whose arguments are catalog files:

$ cd /ExtData  # IMPORTANT: navigate to top-level of GEOS-Chem input data

$ bashdatacatalog-fetch InputDataCatalogs/*.csv InputDataCatalogs/**/*.csv
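
The two glob patterns passed to the command above select the top-level catalog and the catalogs one directory below it. As a rough illustration, the sketch below recreates the catalog layout from the previous section with empty files in a temporary directory and prints what the patterns expand to. The temporary directory and the function name list_catalogs are hypothetical and exist only for this demonstration.

```shell
#!/usr/bin/env bash
# Build a throwaway copy of the catalog layout (empty placeholder files).
demo_root="$(mktemp -d)"
mkdir -p "${demo_root}/InputDataCatalogs/14.4"
touch "${demo_root}/InputDataCatalogs/MeteorologicalInputs.csv"
for name in ChemistryInputs EmissionsInputs InitialConditions; do
    touch "${demo_root}/InputDataCatalogs/14.4/${name}.csv"
done

# Hypothetical helper: print the paths that the two glob patterns expand to
# when evaluated from the directory given as the first argument.
list_catalogs() {
    ( cd "$1" && printf '%s\n' InputDataCatalogs/*.csv InputDataCatalogs/**/*.csv )
}

list_catalogs "${demo_root}"
```

With bash's default settings, ** behaves like *, so the second pattern matches only files one directory below InputDataCatalogs and each of the four catalogs is listed once. If the globstar shell option is enabled, ** also matches zero directories, and the top-level catalog appears a second time.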

Fetching downloads the latest metadata for every active collection in your catalogs. You should run bashdatacatalog-fetch whenever you add or modify a catalog, and periodically so that you get updates to your collections (e.g., new meteorological data that has been processed and added to the meteorological collections).

Now that you have fetched, you can run bashdatacatalog-list commands. You can tailor this command to generate various types of file lists using its command-line arguments. See bashdatacatalog-list -h for details. A common use case is generating a list of required input files that are missing from your local file system.

$ bashdatacatalog-list -am -r 2018-06-30,2018-08-01 InputDataCatalogs/*.csv InputDataCatalogs/**/*.csv

Here, -a means “all” files (temporal files and static files), -m means “missing” (list files that are absent locally), -r START,END is the date range of your simulation (add an extra day before and after your simulation), and the remaining arguments are the paths to your catalog files.
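
The padded date range can be computed rather than worked out by hand. This is a minimal sketch, not part of the bashdatacatalog: the function name padded_range is hypothetical, and it relies on the -d option of GNU date, which is not available in the BSD date shipped with macOS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "START,END" for the -r option, padded by one day
# on each side of the simulation. Dates are YYYY-MM-DD.
# Assumption: GNU date is installed (the -d option is a GNU extension).
padded_range() {
    local start end
    start="$(date -u -d "$1 -1 day" +%F)"   # day before the simulation starts
    end="$(date -u -d "$2 +1 day" +%F)"     # day after the simulation ends
    echo "${start},${end}"
}

# Simulation running from 2018-07-01 through 2018-07-31:
padded_range 2018-07-01 2018-07-31   # prints 2018-06-30,2018-08-01
```

The result can be passed to the list command as -r "$(padded_range 2018-07-01 2018-07-31)".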

The command can be easily modified so that it generates a list of missing files that is compatible with xargs curl to download all the files you are missing:

$ bashdatacatalog-list -am -r 2018-06-30,2018-08-01 -f xargs-curl InputDataCatalogs/*.csv InputDataCatalogs/**/*.csv | xargs curl

Here, -f xargs-curl means the output file list should be formatted for piping into xargs curl.
