AWS ParallelCluster Setup
This page has instructions on setting up AWS ParallelCluster for running GCHP simulations. AWS ParallelCluster is a service that lets you create your own HPC cluster. Using GCHP on AWS ParallelCluster is similar to using GCHP on any other HPC, so these instructions focus on AWS ParallelCluster setup. The rest of the GCHP documentation, such as Compile, Download Input Data, and Run the model, applies to AWS ParallelCluster as well.
The workflow for getting started with GCHP simulations using AWS ParallelCluster is:
Create an FSx for Lustre file system (described on this page)
Configure AWS CLI (described on this page)
Configure AWS ParallelCluster (described on this page)
Build GCHP’s dependencies on your AWS ParallelCluster
Follow the normal GCHP User Guide
These instructions were written using AWS ParallelCluster 3.0.1.
1. Create an FSx for Lustre file system
Start by creating an FSx for Lustre file system. This is persistent storage that will be mounted to your AWS ParallelCluster cluster. This file system will be used for storing GEOS-Chem input data and for housing your GEOS-Chem run directories.
Refer to the official FSx for Lustre Instructions to create the file system. Only Step 1, Create your Amazon FSx for Lustre file system, is necessary. Step 2, Install the Lustre client, and the subsequent steps describe mounting your file system to EC2 instances, but AWS ParallelCluster automates this for you.
In subsequent steps you will need the following information about your FSx for Lustre file system:
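its ID (shown as fs-XXXXXXXXXXXXXXXXX in the configuration template in step 3)
its subnet (shown as subnet-YYYYYYYYYYYYYYYYY in the template)
its security group that has the inbound network rules (shown as sg-ZZZZZZZZZZZZZZZZZ in the template)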
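If the file system already exists, these values can also be looked up from a terminal once the AWS CLI is configured (step 2). The following is a minimal sketch rather than part of the official instructions. The function names fsx_list and fsx_security_groups are hypothetical helpers, and the sketch assumes the security groups you need are the ones attached to the file system's network interfaces.

```shell
# Hypothetical helpers (the names are illustrative, not part of any AWS tool).

# Print the ID and subnet IDs of every FSx file system in the
# region configured for the AWS CLI.
fsx_list() {
    aws fsx describe-file-systems \
        --query 'FileSystems[].{ID:FileSystemId,Subnets:SubnetIds}' \
        --output json
}

# Print the security group IDs attached to the network interfaces
# of one file system.
# Usage: fsx_security_groups fs-XXXXXXXXXXXXXXXXX
fsx_security_groups() {
    local fs_id="$1" enis
    enis=$(aws fsx describe-file-systems \
        --file-system-ids "$fs_id" \
        --query 'FileSystems[0].NetworkInterfaceIds' \
        --output text) || return 1
    # Word splitting of $enis is intentional: one argument per interface ID.
    aws ec2 describe-network-interfaces \
        --network-interface-ids $enis \
        --query 'NetworkInterfaces[].Groups[].GroupId' \
        --output text
}
```

The same security group may be printed more than once if it is attached to several network interfaces. The AWS console shows the same information on the file system's Network & security tab.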
Once you have created the file system, proceed with 2. AWS CLI Installation and First-Time Setup.
2. AWS CLI Installation and First-Time Setup
Next you need to make sure you have the AWS CLI installed and configured.
The AWS CLI is a terminal command, aws, for working with AWS services.
If you have already installed and configured the AWS CLI, continue to 3. Create your AWS ParallelCluster.
To install the aws command, follow the Official AWS CLI Install Instructions.
Once you have installed the aws command, you need to configure it with the credentials for your AWS account:
$ aws configure
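To confirm that the credentials were picked up, one option is to ask AWS which identity the CLI is using. This is a minimal sketch, and check_aws_cli is a hypothetical helper name, not an AWS command.

```shell
# Hypothetical helper: report whether the aws command is installed and
# whether its configured credentials are accepted by AWS.
check_aws_cli() {
    if ! command -v aws >/dev/null 2>&1; then
        echo "aws command not found" >&2
        return 1
    fi
    # 'sts get-caller-identity' succeeds only with valid credentials.
    if aws sts get-caller-identity --output json; then
        echo "default region: $(aws configure get region)"
    else
        echo "credentials missing or invalid; rerun 'aws configure'" >&2
        return 1
    fi
}
```

Running check_aws_cli prints the account and user for the active credentials, followed by the default region. That region should match the region of your FSx for Lustre file system.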
3. Create your AWS ParallelCluster
You should also refer to the official AWS documentation on Configuring AWS ParallelCluster. Those instructions will have the latest information on using AWS ParallelCluster. The instructions on this page are meant to supplement the official instructions and to point out the parts of the configuration that matter for GCHP.
Next, install AWS ParallelCluster with pip. This requires Python 3.
$ pip install aws-parallelcluster
Now you should have the pcluster command. You will use this command to perform actions such as creating a cluster, temporarily shutting your cluster down, and destroying a cluster.
Create a cluster config file by running the pcluster configure command:
$ pcluster configure --config cluster-config.yaml
The following settings are recommended:
Operating System: alinux2
Head node instance type: c5n.large
Number of queues: 1
Compute instance type: c5n.18xlarge
Maximum instance count: Your choice. This is the maximum number of execution nodes that can run concurrently. Execution nodes automatically spin up and shut down according to whether there are jobs in your queue.
Now you should have a file named cluster-config.yaml. This is the configuration file with the settings for your cluster.
Before starting your cluster with the pcluster create-cluster command, you need to modify cluster-config.yaml so that your FSx for Lustre file system is mounted to your cluster. Use the following cluster-config.yaml as a template for these changes.
Region: us-east-1 # [replace with] the region with your FSx for Lustre file system
Image:
  Os: alinux2
HeadNode:
  InstanceType: c5n.large # smallest c5n node to minimize costs when head-node is up
  Networking:
    SubnetId: subnet-YYYYYYYYYYYYYYYYY # [replace with] the subnet of your FSx for Lustre file system
    AdditionalSecurityGroups:
      - sg-ZZZZZZZZZZZZZZZZZ # [replace with] the security group with inbound rules for your FSx for Lustre file system
  LocalStorage:
    RootVolume:
      VolumeType: io2
  Ssh:
    KeyName: AAAAAAAAAA # [replace with] the name of your ssh key name for AWS CLI
SharedStorage:
  - MountDir: /fsx # [replace with] where you want to mount your FSx for Lustre file system
    Name: FSxExtData
    StorageType: FsxLustre
    FsxLustreSettings:
      FileSystemId: fs-XXXXXXXXXXXXXXXXX # [replace with] the ID of your FSx for Lustre file system
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: main
      ComputeResources:
        - Name: c5n18xlarge
          InstanceType: c5n.18xlarge
          MinCount: 0
          MaxCount: 10 # max number of concurrent exec-nodes
          DisableSimultaneousMultithreading: true # disable hyperthreading (recommended)
          Efa:
            Enabled: true
      Networking:
        SubnetIds:
          - subnet-YYYYYYYYYYYYYYYYY # [replace with] the subnet of your FSx for Lustre file system (same as above)
        AdditionalSecurityGroups:
          - sg-ZZZZZZZZZZZZZZZZZ # [replace with] the security group with inbound rules for your FSx for Lustre file system
        PlacementGroup:
          Enabled: true
      ComputeSettings:
        LocalStorage:
          RootVolume:
            VolumeType: io2
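Because the template contains several placeholder values, it can help to check that none are left before creating the cluster. The following is a minimal sketch. check_placeholders is a hypothetical helper, and the patterns it searches for are simply the placeholder strings used in the template above (fs-XXXX..., subnet-YYYY..., sg-ZZZZ..., and the AAAAAAAAAA key name).

```shell
# Hypothetical helper: fail if the config file still contains any of the
# template placeholders. Defaults to cluster-config.yaml.
check_placeholders() {
    local config="${1:-cluster-config.yaml}"
    if [ ! -f "$config" ]; then
        echo "$config not found" >&2
        return 2
    fi
    # grep prints each offending line with its line number and
    # returns 0 when at least one placeholder is found.
    if grep -nE 'fs-X{5,}|subnet-Y{5,}|sg-Z{5,}|KeyName: *A{10}' "$config"; then
        echo "the placeholders above still need to be replaced in $config" >&2
        return 1
    fi
    echo "no template placeholders found in $config"
}
```

For example, check_placeholders cluster-config.yaml lists every line that still has a placeholder and returns a non-zero exit status until all of them have been replaced. It does not check that the replacement values are valid.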
When you are ready, run the pcluster create-cluster command.
$ pcluster create-cluster --cluster-name pcluster --cluster-configuration cluster-config.yaml
It may take 30 minutes to an hour for your cluster’s status to change to CREATE_COMPLETE. You can check the status of your cluster with the following command.
$ pcluster describe-cluster --cluster-name pcluster
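Rather than rerunning the command by hand, the check can be repeated in a loop. The following is a minimal sketch. cluster_status and wait_for_cluster are hypothetical helpers, and the sketch assumes that pcluster describe-cluster prints JSON containing a clusterStatus field, as AWS ParallelCluster 3 does.

```shell
# Hypothetical helper: print the clusterStatus value from the JSON that
# 'pcluster describe-cluster' writes to standard output.
cluster_status() {
    pcluster describe-cluster --cluster-name "$1" \
        | grep -o '"clusterStatus": *"[A-Z_]*"' \
        | head -n 1 \
        | cut -d '"' -f 4
}

# Poll until the cluster leaves CREATE_IN_PROGRESS, then return success
# only if the final status is CREATE_COMPLETE.
# Usage: wait_for_cluster NAME [SECONDS_BETWEEN_CHECKS]
wait_for_cluster() {
    local name="$1" interval="${2:-60}" status
    while :; do
        status=$(cluster_status "$name")
        echo "$(date '+%H:%M:%S') $name: ${status:-unknown}"
        [ "$status" = "CREATE_IN_PROGRESS" ] || break
        sleep "$interval"
    done
    [ "$status" = "CREATE_COMPLETE" ]
}
```

For the cluster created above, wait_for_cluster pcluster checks once a minute and stops as soon as the status is anything other than CREATE_IN_PROGRESS, so a failed creation ends the loop with a non-zero exit status instead of waiting indefinitely.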
Once your cluster’s status is CREATE_COMPLETE, run the pcluster ssh command to ssh into it.
$ pcluster ssh --cluster-name pcluster -i ~/path/to/keyfile.pem
At this point, your cluster is set up and you can use it like any other HPC. Your next steps will be Building GCHP’s Dependencies followed by the normal instructions found in the User Guide.