diff --git a/Tutorial.md b/Tutorial.md index 7698cf4..56d52c1 100644 --- a/Tutorial.md +++ b/Tutorial.md @@ -104,7 +104,7 @@ Note: Images are also referred to as **AMI**s (Amazon Machine Images). Launch Instance **Step 1: Choose AMI** -Here we choose the base AMI (image) for our instance. Please scroll until you see Ubuntu Server 18.04 and click "Select" on the right. +Here we choose the base AMI (image) for our instance. Please scroll until you see Ubuntu Server 20.04 and click "Select" on the right. **Step 2: Choose an Instance Type** Next, we choose an instance type. Here we can decide how powerful our machine is. The caveat is that more powerful machines cost more per hour. To see the prices, follow [this link](https://aws.amazon.com/ec2/pricing/on-demand/). Since we are only installing software, let's choose a lower performance instance, the 't2.micro'. Then click on 'Next: Configure Instance Details' at the bottom right. @@ -113,7 +113,7 @@ Next, we choose an instance type. Here we can decide how powerful our machine is There are many options on this page, but you can ignore most of them. The one that is good to know is the 'Request Spot Instances' option at the top. Do not click on it now, but in the future when you run long jobs, you should choose this option as spot instances can save you a lot of money. For more information, see the [appendix item on Spot Instances](#spot-instances). For now, just click "Next" at the bottom right. **Step 4: Add Storage** -On this page, you can set the storage space for your instance. Let's set this to 50 GB to give ourselves some room. Please not this change will not make t2.micro eligible for free tier anymore. +On this page, you can set the storage space for your instance. Let's set this to 50 GB to give ourselves some room. Please note that this change will not make t2.micro eligible for free tier anymore. **Step 5: Add Tags** On this page you can add tags. 
This is only useful if you have many servers and you want to organize them all using tags. Click "Next". @@ -121,7 +121,7 @@ On this page you can add tags. This is only useful if you have many servers and **Step 6: Configure Security Group** On this page you can define which ports are open for your instance. By default, 22 will be open for SSH. There will be a warning that any IP address can access your instance. If you'd like you can fix this by specifying your device's IP address on this page to restrict access to your machine, but this isn't required. -We're going to open two more ports so that we can connect to a Jupyter notebook and R Studio Server on our instance. Click "Add Rule" twice and set up the new rules as shown in the image: +We're going to open two more ports so that we can connect to a Jupyter notebook (custom TCP port 8888) and R Studio Server (custom TCP port 8787) on our instance. Click "Add Rule" twice and set up the new rules as shown in the image: Configure Ports @@ -151,7 +151,7 @@ ssh -i "AWS-tutorial.pem" ubuntu@ec2-35-167-139-94.us-west-2.compute.amazonaws.c Now you have a computer in the cloud! Congratulations! So what can we do with it? Not much initially - first we'll have to install some software tools. -The bare Ubuntu 18.04 instance we launched has Python 3.6 installed already, but we'll need to install 'pip' to download other packages: +The bare Ubuntu 20.04 instance we launched has Python 3.8 installed already, but we'll need to install 'pip' to download other packages: ``` sudo apt-get update @@ -178,26 +178,48 @@ Now, on the left side of the dashboard, you can select 'AMIs' under the 'Images' # AWS ParallelCluster -[AWS ParallelCluster](https://aws.amazon.com/blogs/opensource/aws-parallelcluster/) introduces in 2018 is an AWS supported Open Source cluster management tool that makes it easy for you to deploy and manage High Performance Computing (HPC) clusters in the AWS cloud. 
+[AWS ParallelCluster](https://aws.amazon.com/blogs/opensource/aws-parallelcluster/), introduced in 2018, is an [AWS supported Open Source cluster management tool](https://github.com/aws/aws-parallelcluster) that makes it easy for you to deploy and manage High Performance Computing (HPC) clusters in the AWS cloud. It automatically sets up the required compute resources and a shared filesystem and offers a variety of batch schedulers such as AWS Batch, SGE, Torque, and Slurm. AWS ParallelCluster facilitates both quick start proof of concepts (POCs) and production deployments. ## Installing AWS ParallelCluster -### Linux/OSX - +For more details and other options, please see the [documentation](https://docs.aws.amazon.com/parallelcluster/). First set up a `virtualenv` for the Python package. If you do not have virtualenv installed, install it: ``` -sudo pip install aws-parallelcluster +python3 -m pip install --upgrade pip +python3 -m pip install --user --upgrade virtualenv +``` +Then open a new terminal and create a virtualenv: +``` +python3 -m virtualenv awscluster +source awscluster/bin/activate +``` +Install aws-parallelcluster: +``` +python3 -m pip install --upgrade "aws-parallelcluster" ``` -Windows support is experimental. For Windows see [here](https://aws-parallelcluster.readthedocs.io/en/latest/getting_started.html). - -For OSX, you might need to update your path following [these directions](https://docs.aws.amazon.com/parallelcluster/latest/ug/install-macos.html). -Install [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-macos.html): +Install [AWS CLI](https://docs.aws.amazon.com/cli/latest). 
+- Linux X86-64 ``` -curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip" -unzip awscli-bundle.zip -sudo ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws +curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" +unzip awscliv2.zip +sudo ./aws/install +``` +- Linux ARM +``` +curl "https://awscli.amazonaws.com/awscli-exe-linux-aarch64.zip" -o "awscliv2.zip" +unzip awscliv2.zip +sudo ./aws/install +``` +- Mac OS X +``` +curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg" +sudo installer -pkg AWSCLIV2.pkg -target / +``` +- Windows +``` +msiexec.exe /i https://awscli.amazonaws.com/AWSCLIV2.msi ``` Check the installation: ``` @@ -206,7 +228,7 @@ aws --version ## Configuring AWS ParallelCluster -First you’ll need to setup your IAM credentials, see [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html) for more information. +First you’ll need to set up your IAM credentials; see [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html) for more information as well as [the parallel cluster IAM roles](https://docs.aws.amazon.com/parallelcluster/latest/ug/iam-roles-in-parallelcluster-v3.html). It is recommended to set up a separate IAM user. ``` $ aws configure @@ -218,92 +240,147 @@ Default output format [None]: Once installed you will need to setup some initial config. The easiest way to do this is below: ``` -$ pcluster configure +$ pcluster configure --config cluster-config.yaml ``` -This configure wizard will prompt you for everything you need to create your cluster. You will first be prompted for your cluster template name, which is the logical name of the template you will create a cluster from. +This configure wizard will prompt you for everything you need to create your cluster. You will first be prompted for the AWS region of your cluster. 
Choose the region from the list of valid AWS region identifiers in which you’d like your cluster to run. ``` -Cluster Template [mycluster]: +Allowed values for AWS Region ID: +1. af-south-1 +2. ap-northeast-1 +3. ap-northeast-2 +4. ap-south-1 +5. ap-southeast-1 +6. ap-southeast-2 +7. ca-central-1 +8. eu-central-1 +9. eu-north-1 +10. eu-west-1 +11. eu-west-2 +12. eu-west-3 +13. sa-east-1 +14. us-east-1 +15. us-east-2 +16. us-west-1 +17. us-west-2 +AWS Region ID [us-east-1]: +``` +Next, you will need to choose a key pair that already exists in EC2 in order to log into your master instance. If you do not already have a key pair, refer to the EC2 documentation on [EC2 Key Pairs](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html). +``` +Acceptable Values for EC2 Key Pair Name: +1. keypair1 +2. keypair-test +3. production-key +EC2 Key Pair Name [keypair1]: 1 ``` -Now, you will be presented with a list of valid AWS region identifiers. Choose the region in which you’d like your cluster to run. +Choose the scheduler, ``` -Acceptable Values for AWS Region ID: - us-east-1 - cn-north-1 - ap-northeast-1 - eu-west-1 - ap-southeast-1 - ap-southeast-2 - us-west-2 - us-gov-west-1 - us-gov-east-1 - us-west-1 - eu-central-1 - sa-east-1 -AWS Region ID []: +Allowed values for Scheduler: +1. slurm +2. awsbatch +Scheduler [slurm]: 1 ``` -Choose a descriptive name for your VPC. Typically, this will be something like production or test. +Next, choose the operating system ``` -VPC Name [myvpc]: +Allowed values for Operating System: +1. alinux2 +2. centos7 +3. ubuntu1804 +4. ubuntu2004 +Operating System [alinux2]: 4 ``` -Next, you will need to choose a key pair that already exists in EC2 in order to log into your master instance. If you do not already have a key pair, refer to the EC2 documentation on [EC2 Key Pairs](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html). 
+Indicate the type of EC2 instance for the head node ``` -Acceptable Values for Key Name: - keypair1 - keypair-test - production-key -Key Name []: +Head node instance type [t2.micro]: t2.micro ``` -Choose the VPC ID into which you’d like your cluster launched. - +Indicate the number of queues ``` -Acceptable Values for VPC ID: - vpc-1kd24879 - vpc-blk4982d -VPC ID []: +Number of queues [1]: 1 ``` -Finally, choose the subnet in which you’d like your master server to run. - +Give the name of the queue +``` +Name of queue 1 [queue1]: queue1 +``` +Indicate the number of different node types (compute resources) that can be provisioned from that queue +``` +Number of compute resources for queue1 [1]: 1 +``` +Type of compute instance +``` +Compute instance type for compute resource 1 in queue1 [t2.micro]: t2.micro +``` +Maximum number of instances that can be provisioned (note that this determines maximum billing): +``` +Maximum instance count [10]: 4 +``` +It is easiest to allow automated provisioning of the network, though this can be further customised. +``` +Automate VPC creation? (y/n) [n]: y ``` -Acceptable Values for Master Subnet ID: - subnet-9k284a6f - subnet-1k01g357 - subnet-b921nv04 -Master Subnet ID []: +Then choose the availability zone +``` +Allowed values for Availability Zone: +1. us-east-1a +2. us-east-1b +3. us-east-1c +4. us-east-1d +5. us-east-1e +6. us-east-1f +Availability Zone [us-east-1a]: 1 +``` +Finally indicate the network configuration +``` +Allowed values for Network Configuration: +1. Head node in a public subnet and compute fleet in a private subnet +2. Head node and compute fleet in the same public subnet +Network Configuration [Head node in a public subnet and compute fleet in a private subnet]: 1 +``` +After which you should see a message similar to +``` +Beginning VPC creation. Please do not leave the terminal until the creation is finalized +Creating CloudFormation stack... +Do not leave the terminal until the process has finished. 
+Stack Name: parallelclusternetworking-pubpriv-20320129080552 (id: arn:aws:cloudformation:us-east-1:512698773361:stack/parallelclusternetworking-pubpriv-20320129080552/3bce2dd1-9011-11ef-9d25-0e856fe96fgd) +Status: parallelclusternetworking-pubpriv-20320129080552 - CREATE_COMPLETE +The stack has been created. +Configuration file written to cluster-config.yaml + ``` ## Creating your First Cluster Once all of those settings contain valid values, you can launch the cluster by running the create command: ``` -$ pcluster create mycluster +$ pcluster create-cluster --cluster-configuration cluster-config.yaml --cluster-name test-cluster --region us-east-1 ``` -The message “CREATE_COMPLETE” shows that the cluster created successfully. It also provided us with the public and private IP addresses of our master node. We’ll need this IP to log in. - -This operation might take a few minutes based on the size of your clusters. In some case, it might help to open the config file and add the ```initial_count =``` field. - +The message “CREATE_IN_PROGRESS” shows that the cluster is being created. Check for the status of creation using ``` -vim ~/.parallelcluster/config +$ pcluster list-clusters ``` -Your path to the config file might be different. In the config file you can also specify one of your AMIs where you might have already installed software you need. - -Once it's done, you should see something like: +and wait until you get a message `“CREATE_COMPLETE”`. This operation might take a few minutes based on the size of your clusters. 
Once it's done, you should see something like: ``` -$ pcluster create mycluster -Beginning cluster creation for cluster: mycluster -Creating stack named: parallelcluster-mycluster -Status: parallelcluster-mycluster - CREATE_COMPLETE -ClusterUser: ubuntu -MasterPrivateIP: 10.0.0.237 -$ +$ pcluster list-clusters +{ + "clusters": [ + { + "clusterName": "test-cluster", + "cloudformationStackStatus": "CREATE_COMPLETE", + "cloudformationStackArn": "arn:aws:cloudformation:us-east-1:512698773361:stack/test-cluster/d473a1g0-8911-12ec-9252-0a1ba9938529", + "region": "us-east-1", + "version": "3.0.3", + "clusterStatus": "CREATE_COMPLETE" + } + ] +} + ``` ## Logging into Your Master Instance You’ll use your OpenSSH pem file to log into your master instance. ``` -$ pcluster ssh mycluster -i AWS-tutorial.pem +$ pcluster ssh --cluster-name test-cluster -i keypair1.pem ``` Remember the path/name to your key might be different. @@ -357,30 +434,79 @@ We can see that our job successfully ran on instance "compute-st-t2micro-1". Once you're done with your cluster, remember to shut it down: ``` -pcluster deleter mycluster +pcluster delete-cluster --cluster-name test-cluster ``` ## Running an MPI Job with AWS ParallelCluster and awsbatch Scheduler -Once you have created an AWS ParallelCluster as shown above, we can implement a different configuration and [run an MPI job on it using ```awsbatch``` as workload manager](https://docs.aws.amazon.com/parallelcluster/latest/ug/tutorials_03_batch_mpi.html). +Create a new AWS ParallelCluster, and choose a different configuration to [run an MPI job on it using ```awsbatch``` as workload manager](https://docs.aws.amazon.com/parallelcluster/latest/ug/tutorials_03_batch_mpi.html). -You might first need to delete the old config file: ``` -vim ~/.parallelcluster/config +$ pcluster configure --config cluster-config-batch.yaml ``` +This configure wizard will prompt you for everything you need to create your cluster. 
You will first be prompted for the AWS region of your cluster. Choose the region from the list of valid AWS region identifiers in which you’d like your cluster to run. -Then, you need to repeat the configuration but this timw we want to use ```awsbatch``` as workload manager instead of ```slurm```. +``` +Allowed values for AWS Region ID: +1. af-south-1 +2. ap-northeast-1 +3. ap-northeast-2 +4. ap-south-1 +5. ap-southeast-1 +6. ap-southeast-2 +7. ca-central-1 +8. eu-central-1 +9. eu-north-1 +10. eu-west-1 +11. eu-west-2 +12. eu-west-3 +13. sa-east-1 +14. us-east-1 +15. us-east-2 +16. us-west-1 +17. us-west-2 +AWS Region ID [us-east-1]: +``` +Next, you will need to choose a key pair that already exists in EC2 in order to log into your master instance. If you do not already have a key pair, refer to the EC2 documentation on [EC2 Key Pairs](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html). +``` +Acceptable Values for EC2 Key Pair Name: +1. keypair1 +2. keypair-test +3. production-key +EC2 Key Pair Name [keypair1]: 1 +``` +Choose the scheduler, ``` -pcluster configure +Allowed values for Scheduler: +1. slurm +2. awsbatch +Scheduler [slurm]: 2 +``` +In this case, only Amazon Linux is available as the operating system for the head node, so +unlike with SLURM, there is no choice for the operating system. Next choose the EC2 instance type +for the login node, +``` +Head node instance type [t2.micro]: t2.micro +``` +Indicate the name of the queue +``` +Name of queue 1 [queue1]: +``` +Maximum number of instances that can be provisioned (note that this determines maximum billing): +``` +Maximum instance count [10]: 4 +``` +It is easiest to allow automated provisioning of the network, though this can be further customised. +``` +Automate VPC creation? (y/n) [n]: y ``` -You can also use the ```--config /Users/gguidi/.parallelcluster/config-2``` option to change the configuration file name (this is useful when creating multiple clusters simultaneously). 
Once the configuration is complete you should be able to create your cluster typing: ``` -pcluster create -c ~/.parallelcluster/config mpi-cluster +pcluster create-cluster --cluster-configuration cluster-config-batch.yaml --cluster-name batch-cluster --region us-east-1 ``` This operation might takes a few minutes. @@ -388,7 +514,7 @@ This operation might takes a few minutes. Once it's completed you can log in as: ``` -pcluster ssh mpi-cluster -i ~/AWS-tutorial.pem +pcluster ssh --cluster-name batch-cluster -i ~/AWS-tutorial.pem ``` Remember to substitute the above command with your AWS key. @@ -396,27 +522,22 @@ Once you are logged in, run the commands ```awsbqueues``` and ```awsbhosts``` to ``` [ec2-user@ip-10-0-0-11 ~]$ awsbqueues -/usr/lib/python2.7/site-packages/boto3/compat.py:86: PythonDeprecationWarning: Boto3 will no longer support Python 2.7 starting July 15, 2021. To continue receiving service updates, bug fixes, and security updates please upgrade to Python 3.6 or later. 
More information can be found here: https://aws.amazon.com/blogs/developer/announcing-end-of-support-for-python-2-7-in-aws-sdk-for-python-and-aws-cli-v1/ - warnings.warn(warning, PythonDeprecationWarning) ec2InstanceId instanceType privateIpAddress publicIpAddress runningJobs ------------------- -------------- ------------------ ----------------- ------------- -i-01df1e66fceb51b8c m4.large 10.0.27.81 - 0 [ec2-user@ip-10-0-0-11 ~]$ ``` -As you can see from the output, we have one single running host. This is due to the value we chose for ```min_vcpus``` in the configuration. If you want to display additional details about the AWS Batch queue and hosts, add the ```-d``` flag to the command. +If you want to display additional details about the AWS Batch queue and hosts, add the ```-d``` flag to the command. -Logged into the head node, create a file in the ```/shared``` directory named ```mpi_hello_world.c```: +Logged into the head node, create a file in the ```$HOME``` directory named ```mpi_hello_world.c```: ``` -cd /shared -vim mpi_hello_world.c +cd $HOME +nano mpi_hello_world.c ``` Then, add the following MPI program to the file: @@ -518,22 +639,24 @@ watch awsbstat -d ``` When the job enters the ```RUNNING``` status, we can look at its output. To show the output of the main node, append ```#0``` to the job id. To show the output of the compute nodes, use ```#1``` and ```#2```. -You can look at the overall statues typing: +You can look at the overall status by typing: ``` awsbstat -s ALL ``` You might see something like: ``` [ec2-user@ip-10-0-0-11 shared]$ awsbstat -s ALL -/usr/lib/python2.7/site-packages/boto3/compat.py:86: PythonDeprecationWarning: Boto3 will no longer support Python 2.7 starting July 15, 2021. To continue receiving service updates, bug fixes, and security updates please upgrade to Python 3.6 or later. 
More information can be found here: https://aws.amazon.com/blogs/developer/announcing-end-of-support-for-python-2-7-in-aws-sdk-for-python-and-aws-cli-v1/ - warnings.warn(warning, PythonDeprecationWarning) jobId jobName status startedAt stoppedAt exitCode --------------------------------------- ------------- -------- ----------- ----------- ---------- 776c0688-c522-4175-9612-e72d085a70ec *3 submit_mpi_sh RUNNABLE - - - ``` It means the job is still waiting to be run. If you want to terminate a job before it ends, you can use the ```awsbkill``` command. -Once you completed the tutorial, remember to delete your cluster and all the associated machinery using the [Cost Explorer](https://aws.amazon.com/aws-cost-management/aws-cost-explorer/) we mentioned earlier. +Once you have completed the tutorial, remember to delete your cluster and all the associated machinery: + +``` +pcluster delete-cluster --cluster-name batch-cluster +``` ## Placement Group and Performance
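
A note on the `pcluster list-clusters` step in this change: the tutorial asks the reader to re-run the command and look for `CREATE_COMPLETE` by eye. A minimal sketch of reading that status programmatically follows, assuming the JSON has the shape shown in the tutorial output (`clusters`, `clusterName`, `clusterStatus`). The function name `cluster_status` and the `SAMPLE_OUTPUT` string are hypothetical and are not part of the ParallelCluster tooling.

```python
import json
from typing import Optional


def cluster_status(list_clusters_output: str, cluster_name: str) -> Optional[str]:
    """Return the clusterStatus of cluster_name, or None if it is not listed.

    Assumes the JSON layout shown in the tutorial for `pcluster list-clusters`.
    """
    data = json.loads(list_clusters_output)
    for cluster in data.get("clusters", []):
        if cluster.get("clusterName") == cluster_name:
            return cluster.get("clusterStatus")
    return None


# Sample text modelled on the tutorial output (ARN omitted for brevity).
SAMPLE_OUTPUT = """
{
  "clusters": [
    {
      "clusterName": "test-cluster",
      "cloudformationStackStatus": "CREATE_COMPLETE",
      "region": "us-east-1",
      "version": "3.0.3",
      "clusterStatus": "CREATE_COMPLETE"
    }
  ]
}
"""

if __name__ == "__main__":
    print(cluster_status(SAMPLE_OUTPUT, "test-cluster"))  # prints CREATE_COMPLETE
```

In practice the input string could come from capturing the output of `pcluster list-clusters`, for example with `subprocess.run`. A script could then wait and retry until the returned status is `CREATE_COMPLETE` before attempting `pcluster ssh`.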