Download Input Data

Input data for GEOS-Chem is available at http://geoschemdata.wustl.edu/ExtData/.

The bashdatacatalog is the recommended for downloading and managing your GEOS-Chem input data. Refer to the bashdatacatalog’s Instructions for GEOS-Chem Users. Below is a brief summary of using the bashdatacatalog for aquiring GCHP input data.

Install the bashdatacatalog

Install the bashdatacatalog with the following command. Follow the prompts and restart your console.

gcuser:~$ bash <(curl -s https://raw.githubusercontent.com/LiamBindle/bashdatacatalog/main/install.sh)

Note

You can rerun this command to upgrade to the latest version.

Download Data Catalogs

Catalog files can be downloaded from http://geoschemdata.wustl.edu/ExtData/DataCatalogs/.

The catalog files define the input data collections that GEOS-Chem needs. There are four catalogs files:

  • MeteorologicalInputs.csv – Meteorological input data collections

  • ChemistryInputs.csv – Chemistry input data collections

  • EmissionsInputs.csv – Emissions input data collections

  • InitialConditions.csv – Initial conditions input data collections (restart files)

The latter 3 are version specific, so you need to download the catalogs for the version you intend to use (you can have catalogs for multiple versions at the same time).

Create a directory to house your catalog files in the top-level of your GEOS-Chem input data directory (commonly known as “ExtData”). You should create subdirectories for version-specific catalog files.

gcuser:~$ cd /ExtData  # navigate to GEOS-Chem data
gcuser:/ExtData$ mkdir InputDataCatalogs       # new directory for catalog files
gcuser:/ExtData$ mkdir InputDataCatalogs/13.3  # " for 13.3-specific catalogs (example)

Next, download the catalog for the appropriate version:

gcuser:/ExtData$ cd InputDataCatalogs
gcuser:/ExtData/InputDataCatalogs$ wget http://geoschemdata.wustl.edu/ExtData/DataCatalogs/MeteorologicalInputs.csv
gcuser:/ExtData/InputDataCatalogs$ cd 13.3
gcuser:/ExtData/InputDataCatalogs/13.3$ wget http://geoschemdata.wustl.edu/ExtData/DataCatalogs/13.3/ChemistryInputs.csv
gcuser:/ExtData/InputDataCatalogs/13.3$ wget http://geoschemdata.wustl.edu/ExtData/DataCatalogs/13.3/EmissionsInputs.csv
gcuser:/ExtData/InputDataCatalogs/13.3$ wget http://geoschemdata.wustl.edu/ExtData/DataCatalogs/13.3/InitialConditions.csv

Fetching Metadata and Downloading Input Data

Important

You should always run bashdatacatalog commands from the top-level of your GEOS-Chem data directory (the directory with HEMCO/, CHEM_INPUTS/, etc.).

Before you can run bashdatacatalog-list commands, you need to fetch the metadata of each collection. This is done with the command bashdatacatalog-fetch whose arguments are catalog files:

gcuser:~$ cd /ExtData  # IMPORTANT: navigate to top-level of GEOS-Chem input data
gcuser:/ExtData$ bashdatacatalog-fetch InputDataCatalogs/*.csv InputDataCatalogs/**/*.csv

Fetching downloads the latest metadata for every active collection in your catalogs. You should run bashdatacatalog-fetch whenever you add or modify a catalog, as well as periodically so you get updates to your collections (e.g., new meteorological data that is processed and added to the meteorological collections).

Now that you have fetched, you can run bashdatacatalog-list commands. You can tailor this command the generate various types of file lists using its command-line arguments. See bashdatacatalog-list -h for details. A common use case is generating a list of required input files that missing in your local file system.

gcuser:/ExtData$ bashdatacatalog-list -am -r 2018-06-30,2018-08-01 InputDataCatalogs/*.csv InputDataCatalogs/**/*.csv

Here, -a means “all” files (temporal files and static files), -m means “missing” (list files that are absent locally), -r START,END is the date-range of your simulation (you should add an extra day before/after your simulation), and the remaining arguments are the paths to your catalog files.

The command can be easily modified so that it generates a list of missing files that is compatible with xargs curl to download all the files you are missing:

gcuser:/ExtData$ bashdatacatalog-list -am -r 2018-06-30,2018-08-01 -f xargs-curl InputDataCatalogs/*.csv InputDataCatalogs/**/*.csv | xargs curl

Here, -f xargs-curl means the output file list should be formatted for piping into xargs curl.