High Performance Computing (HPC)

Last modified by Administrator on Tue, 05/07/2024, 2:07 PM

Accessing the HPC
Here are the procedures for generating public and private SSH keys.
Both Windows and macOS users can generate SSH keys via the terminal/CLI (command-line interface), while Windows users can also use a GUI (graphical user interface) tool, PuTTYgen, as another way to generate SSH keys.
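As a minimal sketch of the terminal/CLI route (the key path and comment below are illustrative, and an empty passphrase is used only to keep the example non-interactive; in practice, omit -N "" and set a passphrase when prompted):

```shell
# Generate an Ed25519 key pair (path and comment are illustrative).
mkdir -p ~/.ssh
ssh-keygen -t ed25519 -f ~/.ssh/coare_key -N "" -C "your_email@example.com"

# The private key (coare_key) never leaves your machine; the public key
# (coare_key.pub) is what you send to the COARE team to be appended.
cat ~/.ssh/coare_key.pub
```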
In case you switch to a different machine, we suggest that you generate a new set of SSH keys and have them appended by the COARE team.
Package/Program Installation in the HPC
Most programs/packages can be installed by users themselves. You can use an Anaconda environment to perform the installation. You may refer to our Wiki for the step-by-step process.
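As a minimal sketch of an Anaconda-based installation (the environment name and package are illustrative; see the Wiki for COARE-specific steps):

```shell
# Create an isolated environment (the name "myenv" is an example).
conda create -n myenv python=3.11 -y

# Activate it and install the package you need (numpy as an example).
source activate myenv
conda install -y numpy

# Verify that the package imports correctly.
python -c "import numpy; print(numpy.__version__)"
```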
You may refer to our Wiki for the standard installation procedure for a CUDA-enabled build package.
No. When installing programs, it is recommended to submit the installation as a job through SLURM.
Yes, you may install multiple packages. You can do this by submitting a job via SLURM; see our Wiki resource on installing multiple packages.
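A minimal SLURM submission script for installing several packages in one job might look like this (the partition, environment, and package names are illustrative; check our Wiki for the exact values):

```shell
#!/bin/bash
#SBATCH --job-name=pkg-install   # job name shown in the queue
#SBATCH --partition=debug        # debug suits setup tasks like this
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1

# Activate your conda environment, then install multiple packages
# in a single transaction (package names are examples).
source activate myenv
conda install -y numpy scipy pandas
```

Save it as, e.g., install_packages.sbatch and submit it with `sbatch install_packages.sbatch`.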
We highly suggest that you change the default location where you load and install your conda environments and packages to your scratch folder, as it is much faster than the home folder.
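As a sketch, conda can be pointed at scratch with two config keys (the paths below are placeholders; substitute your actual scratch directory on COARE):

```shell
# Tell conda to create environments and cache packages under scratch.
# Replace the paths with your actual scratch directory.
conda config --add envs_dirs /path/to/scratch/envs
conda config --add pkgs_dirs /path/to/scratch/pkgs

# Confirm the new search order.
conda config --show envs_dirs pkgs_dirs
```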
To view the available modules, you may refer to this link for the procedures. For the packages installed in your Anaconda environment, you may check our COARE Wiki.
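As a quick sketch, the usual environment-modules commands and a conda listing look like this (the module and environment names are illustrative):

```shell
module avail            # list all modules available on the cluster
module load anaconda    # load a module (name is an example)

source activate myenv   # activate your environment (name is an example)
conda list              # packages installed in the active environment
```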
Running Jobs
Yes. However, by default, users' jobs are queued unless you are using a special QOS.
You may refer to our Wiki for more information about the QOS (Quality of Service) per job.
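As a sketch, a QOS is requested in the job script and can be inspected from the command line (the QOS name is illustrative; use the values listed in our Wiki):

```shell
# In a job script: request a specific QOS (name is an example).
#SBATCH --qos=normal

# From the command line: show your jobs with their QOS and state.
squeue -u "$USER" -O jobid,qos,state,reason
```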
Knowing which partition to use is on a case-to-case basis, depending on what is stated in the software's documentation. However, here is some general information about each partition:
  • Debug – You may use this partition for test runs, creating conda environments, and similar setup tasks. 
  • Batch – This is the COARE HPC's main partition, dedicated to processing multi-core/multi-node CPU jobs in parallel. 
  • Serial – This partition is specifically for serial jobs, i.e., those that only require 1 CPU core (e.g., running 5 replicas of a certain program on 5 CPU cores independently).
  • GPU – This partition makes quick work of numerically intensive operations, for workloads such as image processing, training artificial neural networks, and solving complex equations.
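The choice above maps directly to the `--partition` flag at submission time; a sketch (script names and resource counts are illustrative):

```shell
# Test run or environment setup on the debug partition:
sbatch --partition=debug setup.sbatch

# Multi-core/multi-node parallel CPU job on batch:
sbatch --partition=batch --ntasks=32 parallel_job.sbatch

# Single-core serial job (e.g., one of several independent replicas):
sbatch --partition=serial --ntasks=1 replica.sbatch

# GPU job, requesting one GPU:
sbatch --partition=gpu --gres=gpu:1 train.sbatch
```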
Here are some recommendations on job submissions that would yield optimal results.
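One such practice is to check what a finished job actually consumed and right-size future requests accordingly; a sketch using standard SLURM accounting tools (the job ID is illustrative, and seff may not be installed on every system):

```shell
# Summarize what a completed job actually used (job ID is an example).
sacct -j 123456 --format=JobID,Elapsed,MaxRSS,AllocCPUS,State

# If the seff utility is available, it prints CPU/memory efficiency.
seff 123456
```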
Errors Encountered
You may refer to our Wiki.
Yes, as we regularly conduct automated purging. You may refer to the COARE AUP (Acceptable Use Policy) for more details. This also serves as a reminder not to use the HPC for personal storage.
This error might be caused by the following factors:
  • PuTTY can't find the private key; you may try logging in once again.
  • You probably generated the SSH keys in a different format (e.g., SSH2). Please ensure that the SSH keys are in the correct OpenSSH format.
Alternatively, you may opt to generate a new set of SSH keys and have them appended by the COARE team.
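If the key was created in PuTTY's own .ppk format, PuTTYgen can export the OpenSSH form; a sketch (file names are illustrative):

```shell
# With the puttygen CLI (Linux): export the private key as OpenSSH.
puttygen mykey.ppk -O private-openssh -o id_rsa

# On Windows, the PuTTYgen GUI offers the same conversion via
# Conversions > Export OpenSSH key.
```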
Try logging in to your account again using a different computer or network connection.
Files that are accidentally deleted by the rm command can no longer be recovered. We highly recommend that you always back up your files and be mindful when deleting files and folders. You may refer to our Acceptable Use Policy (AUP) - General Conditions of Use.
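A minimal backup sketch with rsync (the paths are illustrative placeholders):

```shell
# Mirror a results folder to a backup location before cleaning up.
# -a preserves permissions/timestamps; -v lists transferred files.
rsync -av /path/to/project/results/ /path/to/backup/results/
```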
This can mean either that you have already maximized your QOS allocation, or that your job has been assigned a lower priority (the scheduler increases the priority of jobs that have been queued longer).
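To see which case applies, standard SLURM commands show a pending job's state reason and computed priority:

```shell
# Show your pending jobs and the scheduler's reason
# (e.g., QOS limits or Priority).
squeue -u "$USER" --states=PENDING -O jobid,reason,qos

# Show the priority factors the scheduler computed for your jobs.
sprio -u "$USER"
```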
This varies on a case-to-case basis. However, here are some possible reasons for the sudden termination of a job:
  • Software/hardware issues
  • Storage is already full
  • Insufficient CPU hours
You might have installed the module without using a SLURM script. To install mamba packages, please submit the installation as a job through SLURM.
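A sketch of wrapping a mamba installation in a SLURM job (the partition, environment, and package names are illustrative):

```shell
#!/bin/bash
#SBATCH --job-name=mamba-install
#SBATCH --partition=debug
#SBATCH --ntasks=1

# Activate the environment, then install through mamba
# (the package name is an example).
source activate myenv
mamba install -y scikit-learn
```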