AWS has announced AWS Parallel Computing Service (AWS PCS), a new managed service that helps customers set up and manage high performance computing (HPC) clusters.
The service makes it easy for system administrators to build clusters using Amazon Elastic Compute Cloud (Amazon EC2) instances, low-latency networking, and storage optimized for HPC workloads.
With AWS Parallel Computing Service, scientists and engineers can quickly scale simulations to validate models and designs, while system administrators and integrators can build and maintain HPC clusters on AWS using Slurm, the most popular open-source HPC workload manager.
The service accelerates innovation in areas such as drug discovery, genomics, engineering design, weather forecasting, and scientific and engineering modeling.
AWS has a history of innovation in supporting HPC workloads. That history includes releases like the open-source cluster orchestration toolkit AWS ParallelCluster, the fully managed batch computing service AWS Batch, the low-latency network interconnect Elastic Fabric Adapter, Amazon FSx for Lustre high performance storage, and dedicated AMD, Intel, and Graviton-based HPC compute instances, the latter delivering up to 65% better price-performance than comparable compute optimized x86-based instances. Thousands of customers from a wide range of industries have migrated their HPC workloads to AWS to fast-track drug discovery, uncover genomic insights, maximize energy resources, and spin up supercomputers with millions of cores. Today AWS continues its innovation in HPC by releasing a fully managed and comprehensive HPC service, which removes the undifferentiated heavy lifting of creating and managing HPC clusters.
AWS Parallel Computing Service is a new managed service that helps customers set up and manage HPC clusters so they can run scientific and engineering workloads at virtually any scale on AWS. System administrators can use familiar tools, including the AWS Management Console, AWS CLI, and AWS SDKs, to deploy a managed Slurm environment. The service builds on open-source foundations that customers already know, and delivers a managed Slurm experience with the reliability and availability of AWS. It significantly reduces the operational burden of managing a cluster and regularly delivers new capabilities and fixes through managed service updates with minimal to no downtime, eliminating the need to apply manual patches or rebuild clusters to receive feature updates.
Highly available APIs also help developers and ISVs create end-to-end HPC solutions on top of AWS, so they can focus on providing value-added features to their users and customers instead of managing infrastructure. AWS Parallel Computing Service enables customers of all sizes (e.g., startups, enterprises, or national labs) to create and manage HPC clusters with the scalability, reliability, and security of AWS. Scientists and engineers using Slurm can migrate their existing on-premises workflows to AWS without re-architecting them, gaining access to cloud infrastructure that scales automatically. Administrators who want to remove capacity or capability constraints for their end users can spin up clusters in minutes instead of months, so those users can run simulations that address the world’s most challenging problems.
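As a rough illustration of the SDK path mentioned above, the following Python sketch assembles the arguments an administrator might pass when creating a cluster programmatically. The announcement does not document the API, so the field names (`clusterName`, `scheduler`, `size`, `networking`), the size values, the Slurm version string, and the resource IDs are all assumptions for illustration; check them against the official AWS PCS API reference before use. The helper `build_create_cluster_request` is hypothetical, and the actual service call is left commented out so the sketch runs without credentials or network access.

```python
from typing import Any

# Assumed set of cluster sizes; verify against the AWS PCS API reference.
ASSUMED_SIZES = {"SMALL", "MEDIUM", "LARGE"}


def build_create_cluster_request(
    cluster_name: str,
    subnet_id: str,
    security_group_id: str,
    slurm_version: str = "23.11",  # assumed version string, not from the announcement
    size: str = "SMALL",
) -> dict[str, Any]:
    """Assemble keyword arguments for a cluster-creation call (assumed shape)."""
    if size not in ASSUMED_SIZES:
        raise ValueError(f"unsupported cluster size: {size}")
    return {
        "clusterName": cluster_name,
        "scheduler": {"type": "SLURM", "version": slurm_version},
        "size": size,
        "networking": {
            "subnetIds": [subnet_id],
            "securityGroupIds": [security_group_id],
        },
    }


# Placeholder resource IDs; replace with real values from your own VPC.
request = build_create_cluster_request(
    "demo-cluster", "subnet-0123456789abcdef0", "sg-0123456789abcdef0"
)
print(request)

# With boto3 installed and credentials configured, the request might be sent
# as follows (assumed client name and method; commented out on purpose):
# import boto3
# pcs = boto3.client("pcs", region_name="us-east-1")
# response = pcs.create_cluster(**request)
```

Keeping the request construction separate from the network call makes the parameters easy to review and validate before any resources are created.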
“Developing a cure for a catastrophic disease, designing novel materials, advancing renewable energy, and revolutionizing transportation are problems that we just can’t afford to have waiting in a queue,” said Ian Colle, director, advanced compute and simulation at AWS. “Managing HPC workloads, particularly the most complex and challenging extreme-scale workloads, is extraordinarily difficult. Our aim is that every scientist and engineer using AWS Parallel Computing Service, regardless of organization size, is the most productive person in their field because they have the same top-tier HPC capabilities as large enterprises to solve the world’s toughest challenges, any time they need to, and at any scale.”
To get started, system administrators use the AWS Management Console to spin up a Slurm cluster securely and run jobs in just a few clicks, rather than orchestrating the cluster manually. With AWS CloudFormation support coming soon, customers will be able to build and deploy HPC clusters using infrastructure as code. AWS Parallel Computing Service is now available in the following Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Europe (Frankfurt), Europe (Stockholm), Europe (Ireland), Asia Pacific (Sydney), Asia Pacific (Singapore), and Asia Pacific (Tokyo).
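For teams scripting their deployments, the Region list above can be captured as a small lookup keyed by standard AWS Region codes. This is a minimal sketch: the codes are the usual identifiers for the named Regions, the helper `is_launch_region` is hypothetical, and the list reflects only the Regions named in this announcement, so availability may have changed since.

```python
# Launch Regions named in the announcement, keyed by standard AWS Region code.
PCS_LAUNCH_REGIONS = {
    "us-east-2": "US East (Ohio)",
    "us-east-1": "US East (N. Virginia)",
    "us-west-2": "US West (Oregon)",
    "eu-central-1": "Europe (Frankfurt)",
    "eu-north-1": "Europe (Stockholm)",
    "eu-west-1": "Europe (Ireland)",
    "ap-southeast-2": "Asia Pacific (Sydney)",
    "ap-southeast-1": "Asia Pacific (Singapore)",
    "ap-northeast-1": "Asia Pacific (Tokyo)",
}


def is_launch_region(region_code: str) -> bool:
    """Return True if the Region code is in the announced launch list."""
    return region_code.strip().lower() in PCS_LAUNCH_REGIONS


for code in ("us-east-1", "sa-east-1"):
    status = "listed" if is_launch_region(code) else "not listed"
    print(f"{code}: {status} at launch")
```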