High Performance Computing (HPC)
Basics
The COARE HPC is one of the services offered by the COARE. It can be used to process massive amounts of data that require high-speed calculations and powerful computing resources.
As of Oct 2020, the COARE HPC's current capacity is as follows:
Resource | Capacity
CPU | 30 TFLOPS
GPU | 72 TFLOPS
More information on the COARE HPC can be found here.
Accessing the HPC
You may have used the wrong private key. You can confirm this by looking for the keywords "Public key denied" in the logs. You can view the logs by connecting with verbose output, e.g., ssh <username>@<hostname> -vv.
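For reference, here is a minimal sketch of such a verbose connection attempt; the username, hostname, and key path below are placeholders, not actual COARE values:

# Connect with verbose output and an explicit private key to see which key
# is being offered and why authentication fails (placeholder values):
ssh -i ~/.ssh/id_rsa -vv your-username@hpc-login.example.org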
The COARE account is valid for three (3) months upon approval of your COARE account application.
No. You have to request the COARE Team to add another public key so you can use your other devices to access the COARE. To do this, please log a service request ticket on the COARE User Portal.
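If you still need to generate a key pair on the additional device, a minimal sketch using standard OpenSSH commands (the key type, comment, and file names are just examples) is shown below; the printed public key is what you attach to your ticket:

# Generate a new key pair on the additional device (example key type and comment):
ssh-keygen -t ed25519 -C "my-second-laptop"
# Print the public key so it can be copied into your service request ticket:
cat ~/.ssh/id_ed25519.pub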
HPC Resource Allocation and Quota Limits
The default allocation per COARE HPC user is summarized in the table below:
Resource | Default allocation
CPU | 240 logical cores
Network filesystem (/home) | 100 GB usable
Parallel filesystem (scratch directories /scratch1 and /scratch2) | 5 TB for each scratch directory
GPU | 2 GPUs
Max running jobs | 30 jobs
Max submitted jobs | 45 jobs
Job waiting time | No guarantee; depends on the status of the queue and the availability of the requested resource/s
Job walltime limit (batch and GPU) | One (1) hour default; automatically extended by one (1) hour by the HPC job scheduler to a maximum of three (3) days only
Job walltime limit (debug) | One (1) hour default; automatically extended by one (1) hour by the HPC job scheduler to a maximum of three (3) hours only
Job walltime limit (serial) | One (1) day default; automatically extended by one (1) day by the HPC job scheduler to a maximum of seven (7) days only
For more information, visit the COARE Service Catalogue.
Some of the best practices on allocating memory and CPU can be found on our Wiki page on benchmarking/parallelizing jobs.
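As an illustration only (the resource values and names below are arbitrary examples, not COARE recommendations), a SLURM batch script requests CPU and memory with directives like these:

#!/bin/bash
#SBATCH --job-name=benchmark_run     # example job name
#SBATCH --ntasks=1                   # number of tasks (processes)
#SBATCH --cpus-per-task=4            # CPU cores per task (arbitrary example)
#SBATCH --mem=8G                     # memory for the job (arbitrary example)
#SBATCH --time=01:00:00              # requested walltime
srun ./my_application                # placeholder for your actual application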
Any requests for allocation increase will be subject to the COARE Team's evaluation and the COARE's current capacity. If you already have a COARE account, you can request an increase by submitting a service request ticket through the COARE User Portal. If you do not have a COARE account yet, you can apply for a COARE account by following the instructions here. You can also email us at gridops@asti.dost.gov.ph before applying for a COARE account if you wish to discuss your request for a higher allocation first.
Home and Scratch Directories
Scratch directories are intended for heavy I/O and temporary files. Because of this, the scratch directories are not resilient enough to store long-term or archived data. Any files lost or damaged in the scratch directories are irrecoverable.
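A common pattern, sketched below with placeholder paths and an imaginary application (adjust /scratch1/$USER and the file names to your actual setup), is to stage data to scratch, work there, and copy only the results you want to keep back to /home:

# Stage input data to scratch, run there, then copy results back to /home.
mkdir -p /scratch1/$USER/myjob
cp -r $HOME/myjob/input /scratch1/$USER/myjob/
cd /scratch1/$USER/myjob
$HOME/myjob/my_application input/ > results.txt   # placeholder application
cp results.txt $HOME/myjob/                        # keep long-term copies in /home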
Files in the home directory are not purged. However, the scratch directories are regularly purged.
Unfortunately, purged data can no longer be retrieved because all files purged from the scratch directories are irrecoverable.
Running Jobs
In the batch script, we recommend setting the stack size limit to "unlimited" so that you will not encounter stack limit errors.
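For example (a generic sketch; the job name and application are placeholders), the stack limit can be raised at the top of the batch script before the application runs:

#!/bin/bash
#SBATCH --job-name=stack_example   # example job name
ulimit -s unlimited                # remove the per-process stack size limit
srun ./my_application              # placeholder for your actual application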
The default output file is a .out file (slurm-<jobid>.out in standard SLURM) created in the directory where the sbatch script is executed. This wiki shows how to set the output file.
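For instance, you can name the output and error files explicitly with SLURM directives (the file names below are only examples; %j expands to the job ID):

#!/bin/bash
#SBATCH --output=myjob_%j.out   # standard output file (example name)
#SBATCH --error=myjob_%j.err    # standard error file (example name)
echo "output goes to myjob_<jobid>.out"   # placeholder workload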
This wiki on how to use SLURM contains some commands that may be helpful to you.
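For quick reference, some commonly used SLURM commands are listed below (standard SLURM; <jobid> is a placeholder):

sbatch myscript.sbatch   # submit a batch script
squeue -u $USER          # list your queued and running jobs
scancel <jobid>          # cancel a job
sinfo                    # show partition and node status
sacct -j <jobid>         # show accounting information for a job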
The COARE Team kills jobs that are run directly on the frontend (login node). Another possible reason is that there are errors in the script or application itself.
This wiki contains some recommendations on job submissions that would yield optimal results.
None.
Yes, but you need to justify this. You can log a service request ticket for this request here.
This can either mean that you may have already maximized your QOS allocation or your job has been assigned a lower priority (since the scheduler increases the priority of jobs that have been queued longer).
Here are some reasons why a job sits in queue:
- No available nodes
- Maxed QOS allocations
- Job has been assigned a low priority (since the scheduler increases the priority of jobs that have been queued longer)
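To see why a specific job is still pending, you can ask the scheduler directly; standard SLURM commands for this include (<jobid> is a placeholder):

squeue -u $USER -t PENDING                 # list your pending jobs and their reason codes
scontrol show job <jobid> | grep Reason    # show the scheduler's reason for one job
sprio -j <jobid>                           # show how the job's priority is computed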
Installation/Compilation/Containerization of Applications
If the software that you need requires a license, you will need to provide the license yourself.
While users are not given SUDO/ADMIN capabilities, they can still choose to compile their own applications. However, it is recommended that compilation and installation of software in the HPC are done by the COARE Team.
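As a generic illustration only (hypothetical package name; the exact steps depend on the software's build system), compiling and installing into your home directory does not require admin rights:

# Build and install a typical autotools-based package under $HOME (no sudo needed).
tar xzf mytool-1.0.tar.gz                  # hypothetical source archive
cd mytool-1.0
./configure --prefix=$HOME/apps/mytool     # install under your home directory
make
make install
export PATH=$HOME/apps/mytool/bin:$PATH    # make the new binaries available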
Installing a specific software package is handled on a case-by-case basis. The COARE Team refers to the documentation of the package to be installed, and the build is optimized with compiler flags that can speed up the application.
Run this command: module avail
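Other commonly used environment-module commands (the module name is a placeholder):

module avail              # list all available modules
module load <module>      # load a module into your environment
module list               # show currently loaded modules
module purge              # unload all loaded modules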
R packages can be installed by users. To do this, you can use the R module or Anaconda environment. If there is a specific R version that is not included in the list of available modules, you may ask the COARE Team to assist you in having this installed by logging a service request ticket on the COARE User Portal.
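As a rough sketch (the module, environment, and package names below are only examples), the two approaches look like this:

# Option A: install into a personal library using an available R module
module load R                                # module name may differ on the HPC
mkdir -p $HOME/R/library
Rscript -e 'install.packages("ggplot2", lib="~/R/library", repos="https://cran.r-project.org")'
# In your R scripts, point to the personal library with .libPaths("~/R/library")

# Option B: create an Anaconda environment with R and the packages you need
conda create -n r_env -c conda-forge r-base r-ggplot2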
We have created a Wiki for transferring files, which you can view here.
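For quick reference, standard file-transfer commands look like this (the username, hostname, and paths are placeholders):

# Copy a single local file to your HPC home directory:
scp mydata.csv your-username@hpc-login.example.org:~/
# Synchronize a whole directory, resuming partial transfers and showing progress:
rsync -avP ./project/ your-username@hpc-login.example.org:~/project/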