Advanced Use
X11
The X Window System (X11) provides a basic framework for a GUI environment.
For Windows
Users using PuTTY
- Download Xming.
- Launch Xming before connecting to the HPC.
- Tick Enable X11 forwarding in PuTTY (Connection > SSH > X11).
Users using Command Prompt in Windows 10
- Download Xming.
- Launch Xming before connecting to the HPC:
ssh <username>@saliksik.asti.dost.gov.ph -XC
(-X enables X11 forwarding; -C compresses all data)
For Mac
- Download XQuartz.
- Launch XQuartz before connecting to the HPC:
ssh <username>@saliksik.asti.dost.gov.ph -XC
(-X enables X11 forwarding; -C compresses all data)
For Linux
- Linux distributions have a built-in X11 server, so you can connect directly:
ssh <username>@saliksik.asti.dost.gov.ph -XC
(-X enables X11 forwarding; -C compresses all data)
X11 Example
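As a quick check that X11 forwarding works, connect with forwarding enabled and launch a simple graphical client from the frontend; the window should open on your local desktop. The sketch below assumes a basic X11 utility such as xclock is available on the frontend; any other graphical program will do.
# connect with X11 forwarding (-X) and compression (-C) enabled
ssh <username>@saliksik.asti.dost.gov.ph -XC
# on the frontend, confirm that a display has been assigned
echo $DISPLAY
# launch a simple X11 client; the window should appear on your local machine
xclock &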
Commonly-used Modules
Anaconda Distribution
- Load the anaconda module:
module load anaconda/<version>
- Create an anaconda environment:
conda create -n <environment-name> python=<version>
- Activate the conda environment:
source activate <environment-name>
- Install packages:
conda install <package>
pip install <package>
- Deactivate the conda environment:
source deactivate
- Remove the environment:
conda remove -n <environment-name> --all
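For illustration, a typical end-to-end session might look like the sketch below. The environment name (myenv) and the packages are arbitrary placeholders; adjust the module and Python versions to what is available on the system.
# load the Anaconda module
module load anaconda/2-5.3.1
# create and activate a new environment
conda create -n myenv python=3.5
source activate myenv
# install packages with conda or pip (example packages only)
conda install numpy
pip install requests
# leave the environment when done
source deactivate
# delete the environment once it is no longer needed
conda remove -n myenv --all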
RStudio
- Connect to the VPN.
- Login to the HPC frontend.
- Create an "Rlib" directory. This will serve as your directory of installed modules in R and will bind on Rlib directory created on the virtual environment by singularity.
- Create a job script with the following format:#!/bin/bash
#SBATCH --partition=batch
#SBATCH --qos=240c-1h_batch
#SBATCH --mail-user=<email>
#SBATCH --mail-type=ALL
#SBATCH --ntasks=8
#SBATCH -w saliksik-cpu-<node-number>
#SBATCH --output=rstudio-server-slurm.out
# Load RStudio Server Singularity module
module load singularity/rstudio-server-1.2.5033
# Bind the R library directory you created to the container
# In each entry, the path before the colon is the directory on the HPC and the path after it is the mount point inside the container
export SINGULARITY_BINDPATH="/home/<username>/Rlib:/Rlib,/scratch1:/scratch1,/scratch2:/scratch2"
# Set a temporary password so that only you, and not other users, can log in to the RStudio Server
# Preferably not your HPC login password, as it appears here in plain text
export RSTUDIO_PASSWORD=<any_desired_password>
# Run RStudio Server with the prompt (username = login_user, password = $RSTUDIO_PASSWORD)
RStudio-Server --auth-none 0 --auth-pam-helper rstudio_auth
- Execute the job script: sbatch <script.slurm>
- Determine on which node the RStudio Server was instantiated: squeue -u <username>
- Ping the hostname to determine its IP address: ping saliksik-<cpu/gpu/debug>-XX
- Enter the IP address in your browser followed by port 8787 (e.g., 192.168.204.99:8787).
- Enter your username and the password you set in the job script.
- You should now be able to access an RStudio environment.
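Putting the submission steps together, a session might look like the sketch below. The script file name (rstudio.slurm) and the node number are placeholders; the IP address and port come from the steps above.
# create the directory that will hold your R packages
mkdir -p ~/Rlib
# submit the job script
sbatch rstudio.slurm
# find out on which node the RStudio Server is running (NODELIST column)
squeue -u <username>
# resolve the node's IP address, e.g. for saliksik-cpu-23
ping saliksik-cpu-23
# then browse to http://<node-ip>:8787 and log in with your HPC username
# and the RSTUDIO_PASSWORD you set in the job script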
TensorFlow
To run TensorFlow applications on the COARE, follow the instructions below:
- Log in to the HPC using your COARE credentials:
ssh <username>@saliksik.asti.dost.gov.ph
- Load the "anaconda/2" module, which allows you to set up your own Python environment. This is necessary since the OS (CentOS 7.2) does not provide Python 3.x by default:
module load anaconda/2-5.3.1
- Load the latest CUDA module:
module load cuda/10.1_cudnn-7.6.5
- Create a new anaconda environment:
conda create -n your_environment_name python=3.5
- Activate your newly created conda environment:
source activate your_environment_name
- Install the latest TensorFlow GPU build (1.4.1 as of this writing):
pip install tensorflow-gpu
- Validate your installation. Start an interactive Python session on your shell and import TensorFlow. Make sure you invoke exit() to leave the interactive session:
python
Python 3.5.4 |Continuum Analytics, Inc.| (default, Aug 14 2017, 13:26:58) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> exit()
- To start your first TensorFlow job on the HPC, grab the latest version of the MNIST example mnist_softmax.py from GitHub and store it in scratch1. It is a good idea to put it in the scratch directories instead of home. (Learn more about the different storage services attached to the HPC.)
cd ~/scratch1
wget https://raw.githubusercontent.com/tensorflow/tensorflow/r1.4/tensorflow/examples/tutorials/mnist/mnist_softmax.py
- Copy the job script written below and save it as mnist.slurm in your scratch directory. The SLURM script contains the necessary information about the specific amount and type of computational resources required for a particular job/run. It includes the sequence of commands you would normally invoke in an interactive session in order to properly execute an application using the batch scheduler.
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --qos=12c-1h_2gpu
#SBATCH --output=my_first_job.out
#SBATCH --gres=gpu:1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
# check which GPU device was allocated
echo "CUDA_DEVICE=/dev/nvidia$CUDA_VISIBLE_DEVICES"
# prepare working environment
module load anaconda/2-5.3.1
module load cuda/10.1_cudnn-7.6.5
# activate your python environment
source activate your_environment_name
# execute your application
srun python mnist_softmax.py
source deactivate
By now you should have two files in your scratch directory:
mnist.slurm mnist_softmax.py
- Edit the SLURM script and insert the following line just below the #!/bin/bash directive. Replace <jobname> with any preferred string. This string serves as an identifier for your job (especially when you are managing multiple jobs).
#SBATCH -J <jobname>
- If you rename the MNIST Python script, make sure to reflect the change in your SLURM script. You will also need to import the logging module and replace all instances of the print function in the MNIST script with logging.debug so that the output messages emitted by the script are properly recorded in the SLURM output files. Insert the following lines below the lines where the other Python modules are imported.
…
# load the logging module
import logging
# customize the log message format
logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%Y%m%d%H%M%S', level=logging.DEBUG)
- Submit your job script to the queue and wait for available resources to turn up:
sbatch mnist.slurm
- Check the status of your job (R - Running; PD - Pending):
squeue -u <username>
- As soon as your job starts to run, all of the console messages generated by the MNIST script will appear in a file named my_first_job.out.
The file name can be altered by setting the appropriate parameters in the SLURM job script. More information about the usage of SLURM commands, as well as the parameters available for configuring job runs, can be found here.
- To check the "occupancy" or usage of the GPU devices, you can issue:
nvidia-smi
- Once the MNIST job is finished, you should see content like the following in my_first_job.out (module versions and timings will reflect your own run):
CUDA_DEVICE=/dev/nvidia0
>> anaconda2/4.3.0 has been loaded.
>> cuda-8.0_cudnn-6.0 has been loaded.
2018-01-16 11:37:43.434840: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-01-16 11:37:48.596303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:04:00.0
totalMemory: 11.17GiB freeMemory: 11.11GiB
2018-01-16 11:37:48.596340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0, compute capability: 3.7)
Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz
0.9156
Note that the value 0.9156 varies in each run.
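For reference, the submission and monitoring steps above can be condensed into the short sequence below. The file names follow the examples in this section, and tail -f is just one convenient way to watch the output file while the job writes to it.
cd ~/scratch1
# submit the job; sbatch prints the job ID
sbatch mnist.slurm
# check whether the job is running (R) or still pending (PD)
squeue -u <username>
# follow the console messages as they are appended to the output file
tail -f my_first_job.out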