AWS ParallelCluster Setup

Important

AWS ParallelCluster and FSx for Lustre cost several hundred dollars per month to use. See FSx for Lustre Pricing and EC2 Pricing for details.

This page describes how to set up AWS ParallelCluster for running GCHP simulations. AWS ParallelCluster is a service that lets you create your own HPC cluster. Using GCHP on AWS ParallelCluster is similar to using GCHP on any other HPC system, so these instructions focus on AWS ParallelCluster setup. The rest of the GCHP documentation, such as Compile, Download Input Data, and Run the model, applies to AWS ParallelCluster as well.

The workflow for getting started with GCHP simulations on AWS ParallelCluster is:

  1. Create an FSx for Lustre file system (described on this page)

  2. Configure AWS CLI (described on this page)

  3. Configure AWS ParallelCluster (described on this page)

  4. Build GCHP’s dependencies on your AWS ParallelCluster

  5. Follow the normal GCHP User Guide

    1. Download the model

    2. Compile

    3. Create a Run Directory

    4. Download Input Data

    5. Run the model

These instructions were written using AWS ParallelCluster 3.0.1.

1. Create an FSx for Lustre file system

Start by creating an FSx for Lustre file system. This is persistent storage that will be mounted to your AWS ParallelCluster cluster. This file system will be used for storing GEOS-Chem input data and for housing your GEOS-Chem run directories.

Refer to the official FSx for Lustre Instructions to create the file system. Only Step 1, Create your Amazon FSx for Lustre file system, is necessary. Step 2, Install the Lustre client, and the subsequent steps explain how to mount your file system to EC2 instances, but AWS ParallelCluster automates this for you.

In subsequent steps you will need the following information about your FSx for Lustre file system:

  • its ID (fs-XXXXXXXXXXXXXXXXX)

  • its subnet (subnet-YYYYYYYYYYYYYYYYY)

  • its security group that has the inbound network rules (sg-ZZZZZZZZZZZZZZZZZ).
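
If the AWS CLI is already configured (see section 2), these values can also be looked up from the terminal instead of the web console. The following is a minimal sketch, not part of the official instructions. The function names fsx_subnet and fsx_security_groups are made up for illustration, and the sketch assumes that the security group you need is the one attached to the file system's first network interface.

```shell
# Print the (first) subnet ID of an FSx for Lustre file system.
# $1: file system ID, e.g. fs-XXXXXXXXXXXXXXXXX
fsx_subnet() {
    aws fsx describe-file-systems --file-system-ids "$1" \
        --query 'FileSystems[0].SubnetIds[0]' --output text
}

# Print the security group IDs attached to the file system's first
# network interface (assumption: this is the group with the inbound rules).
# $1: file system ID
fsx_security_groups() {
    _eni=$(aws fsx describe-file-systems --file-system-ids "$1" \
        --query 'FileSystems[0].NetworkInterfaceIds[0]' --output text) || return 1
    aws ec2 describe-network-interfaces --network-interface-ids "$_eni" \
        --query 'NetworkInterfaces[0].Groups[].GroupId' --output text
}

# Example usage (replace with your own file system ID):
#   fsx_subnet fs-XXXXXXXXXXXXXXXXX
#   fsx_security_groups fs-XXXXXXXXXXXXXXXXX
```

Double-check the printed IDs against the console before copying them into your cluster configuration.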

Once you have created the file system, proceed with 2. AWS CLI Installation and First-Time Setup.

2. AWS CLI Installation and First-Time Setup

Next, make sure you have the AWS CLI installed and configured. The AWS CLI is a terminal command, aws, for working with AWS services. If you have already installed and configured the AWS CLI, continue to 3. Create your AWS ParallelCluster.

Install the aws command by following the Official AWS CLI Install Instructions. Once you have installed the aws command, you need to configure it with the credentials for your AWS account:

$ aws configure

For instructions on aws configure, refer to the Official AWS Instructions or this YouTube tutorial.
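
After running aws configure, it can help to confirm that the credentials and default region were picked up. The sketch below is one way to do that. The function name check_aws_cli is hypothetical, and the check assumes your account is allowed to call aws sts get-caller-identity.

```shell
# Confirm that the aws command exists, that the credentials work,
# and that a default region is set.
check_aws_cli() {
    command -v aws >/dev/null 2>&1 || {
        echo "aws command not found" >&2
        return 1
    }
    # Prints the AWS account ID if the credentials are valid.
    aws sts get-caller-identity --query Account --output text || return 1
    # Prints the default region; this should match the region of your
    # FSx for Lustre file system.
    aws configure get region
}

# Example usage:
#   check_aws_cli
```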

3. Create your AWS ParallelCluster

Note

You should also refer to the official AWS documentation on Configuring AWS ParallelCluster. Those instructions will have the latest information on using AWS ParallelCluster. The instructions on this page are meant to supplement the official instructions and to point out the parts of the configuration that matter for GCHP.

Next, install AWS ParallelCluster with pip. This requires Python 3.

$ pip install aws-parallelcluster

Now you should have the pcluster command. You will use this command to perform actions such as creating a cluster, temporarily shutting your cluster down, and destroying a cluster.
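
As a rough illustration of those actions with the AWS ParallelCluster 3 command line, the sketch below wraps the stop, start, and delete subcommands in small helper functions. The helper names are hypothetical. Stopping the compute fleet only affects the execution nodes, so the head node keeps running and continues to incur charges.

```shell
# Stop the compute fleet of a cluster (execution nodes only).
# $1: cluster name
stop_compute_fleet() {
    pcluster update-compute-fleet --cluster-name "$1" --status STOP_REQUESTED
}

# Restart the compute fleet of a cluster.
# $1: cluster name
start_compute_fleet() {
    pcluster update-compute-fleet --cluster-name "$1" --status START_REQUESTED
}

# Permanently delete a cluster.
# $1: cluster name
delete_cluster() {
    pcluster delete-cluster --cluster-name "$1"
}

# Example usage:
#   stop_compute_fleet pcluster
#   start_compute_fleet pcluster
#   delete_cluster pcluster
```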

Create a cluster config file by running the pcluster configure command:

$ pcluster configure --config cluster-config.yaml

The following settings are recommended:

  • Scheduler: slurm

  • Operating System: alinux2

  • Head node instance type: c5n.large

  • Number of queues: 1

  • Compute instance type: c5n.18xlarge

  • Maximum instance count: Your choice. This is the maximum number of execution nodes that can run concurrently. Execution nodes automatically spin up and shut down according to whether there are jobs in your queue.

Now you should have a file named cluster-config.yaml. This is the configuration file with the settings for your cluster. Before starting your cluster with the pcluster create-cluster command, you need to modify cluster-config.yaml so that your FSx for Lustre file system is mounted to your cluster. Use the following cluster-config.yaml as a template for these changes.

Region: us-east-1  # [replace with] the region with your FSx for Lustre file system
Image:
  Os: alinux2
HeadNode:
  InstanceType: c5n.large  # smallest c5n node to minimize costs when head-node is up
  Networking:
    SubnetId: subnet-YYYYYYYYYYYYYYYYY  # [replace with] the subnet of your FSx for Lustre file system
    AdditionalSecurityGroups:
      - sg-ZZZZZZZZZZZZZZZZZ  # [replace with] the security group with inbound rules for your FSx for Lustre file system
  LocalStorage:
    RootVolume:
      VolumeType: io2
  Ssh:
    KeyName: AAAAAAAAAA  # [replace with] the name of your EC2 SSH key pair
SharedStorage:
  - MountDir: /fsx  # [replace with] where you want to mount your FSx for Lustre file system
    Name: FSxExtData
    StorageType: FsxLustre
    FsxLustreSettings:
      FileSystemId: fs-XXXXXXXXXXXXXXXXX  # [replace with] the ID of your FSx for Lustre file system
Scheduling:
  Scheduler: slurm
  SlurmQueues:
  - Name: main
    ComputeResources:
    - Name: c5n18xlarge
      InstanceType: c5n.18xlarge
      MinCount: 0
      MaxCount: 10  # max number of concurrent exec-nodes
      DisableSimultaneousMultithreading: true  # disable hyperthreading (recommended)
      Efa:
        Enabled: true
    Networking:
      SubnetIds:
      - subnet-YYYYYYYYYYYYYYYYY  # [replace with] the subnet of your FSx for Lustre file system (same as above)
      AdditionalSecurityGroups:
        - sg-ZZZZZZZZZZZZZZZZZ  # [replace with] the security group with inbound rules for your FSx for Lustre file system
      PlacementGroup:
        Enabled: true
    ComputeSettings:
      LocalStorage:
        RootVolume:
          VolumeType: io2
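
Because the template contains several [replace with] placeholders, a quick check before creating the cluster can catch any that were missed. The sketch below is only a convenience. The function name check_placeholders is hypothetical, and it looks for the literal placeholder strings used in the template above.

```shell
# Report any template placeholders still present in a cluster config file.
# $1: path to the config file, e.g. cluster-config.yaml
# Returns 1 (and prints the offending lines) if a placeholder is found.
check_placeholders() {
    if grep -nE 'fs-XXXX|subnet-YYYY|sg-ZZZZ|KeyName: AAAAAAAAAA' "$1"; then
        echo "placeholders remain in $1" >&2
        return 1
    fi
    return 0
}

# Example usage:
#   check_placeholders cluster-config.yaml
```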

When you are ready, run the pcluster create-cluster command.

$ pcluster create-cluster --cluster-name pcluster --cluster-configuration cluster-config.yaml

It may take 30 minutes to an hour for your cluster’s status to change to CREATE_COMPLETE. You can check the status of your cluster with the following command.

$ pcluster describe-cluster --cluster-name pcluster
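
Rather than rerunning the command by hand, you could poll it in a loop. The sketch below assumes that pcluster describe-cluster prints JSON containing a "clusterStatus" field, as it does in AWS ParallelCluster 3. The function names cluster_status and wait_for_cluster are hypothetical.

```shell
# Extract the value of "clusterStatus" from describe-cluster JSON on stdin.
cluster_status() {
    sed -n 's/.*"clusterStatus": *"\([^"]*\)".*/\1/p'
}

# Poll until the cluster is created; return 1 on any unexpected status.
# $1: cluster name   $2: seconds between checks (default 60)
wait_for_cluster() {
    _name=$1
    _interval=${2:-60}
    while :; do
        _status=$(pcluster describe-cluster --cluster-name "$_name" | cluster_status)
        echo "$_name: $_status"
        case "$_status" in
            CREATE_COMPLETE) return 0 ;;
            CREATE_IN_PROGRESS) sleep "$_interval" ;;
            *) return 1 ;;
        esac
    done
}

# Example usage:
#   wait_for_cluster pcluster 60
```

If the loop stops with a status other than CREATE_COMPLETE, inspect the full output of pcluster describe-cluster for details.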

Once your cluster’s status is CREATE_COMPLETE, run the pcluster ssh command to ssh into it.

$ pcluster ssh --cluster-name pcluster -i ~/path/to/keyfile.pem

At this point, your cluster is set up and you can use it like any other HPC. Your next steps will be Building GCHP’s Dependencies followed by the normal instructions found in the User Guide.
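
Before moving on, you may want to confirm from the head node that the file system is mounted and that Slurm is responding. The sketch below is one possible check. The function name check_cluster is hypothetical, it assumes the /fsx mount directory from the template above, and it relies on the -t option of GNU df.

```shell
# Run on the head node after logging in with pcluster ssh.
# $1: mount directory of the FSx for Lustre file system (default /fsx)
check_cluster() {
    _mount_dir=${1:-/fsx}
    # GNU df fails here if the path is not on a Lustre file system.
    df -h -t lustre "$_mount_dir" || return 1
    # List the Slurm partitions and the state of their nodes.
    sinfo
}

# Example usage:
#   check_cluster /fsx
```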