Advanced Use

Last modified by Administrator on Thu, 06/09/2022, 9:35 AM
Page Rating
2 Votes

X11

X Window System (X11) provides basic framework for a GUI environment.

For Windows

Users using PuTTY

  1. Download Xming.
  2. Launch Xming first before connecting to HPC.
  3. Tick Enable X11.

    1178195211.png

Users using Command Prompt in Windows 10

  1. Download Xming.
  2. Launch Xming first before connecting to HPC.
    ssh <username>@saliksik.asti.dost.gov.ph -XC 
    -XEnables X11
    -CCompressed all data

For Mac

  1. Download XQuartz.
  2. Launch XQuartz first before connecting to HPC.
    ssh <username>@saliksik.asti.dost.gov.ph -XC
    -XEnables X11
    -CCompressed all data

For Linux

  1. Linux distros has a built-in X11.
    ssh <username>@saliksik.asti.dost.gov.ph -XC
    -XEnables X11
    -CCompressed all data

X11 Example

x11-example.png

NOTES:

- Run your application in compute nodes only.
- A window should pop-out after executing the application.


Commonly-used Modules

Anaconda Distribution

  • Load anaconda module
    module load <anaconda>/<X-version>
  • Create anaconda environment
    conda create –n <environment-name> python=<version>
  • Activate the conda environment
    source activate <environment-name>
  • Install packages
    conda install <package>
    pip install <package>
  • Deactivate conda environment
    source deactivate <environment-name>
  • Remove environment
    conda remove –n <environment-name> --all

     

R Studio

  1. Connect to the VPN.
  2. Login to the HPC frontend.
  3. Create an "Rlib" directory. This will serve as your directory of installed modules in R and will bind on Rlib directory created on the virtual environment by singularity.
  4. Create a job script with the following format:
    #!/bin/bash
    #SBATCH --partition=batch
    #SBATCH --qos=240c-1h_batch
    #SBATCH --mail-user=<email>
    #SBATCH --mail-type=ALL
    #SBATCH --ntasks=8
    #SBATCH –w saliksik-cpu-<node-number>
    #SBATCH --output=rstudio-server-slurm.out

    # Load RStudio Server Singularity module
    module load singularity/rstudio-server-1.2.5033

    # Bind create directory for R libraries to container
    # 1st entry before the delimiter colon is the HPC user directory while the next entry is the mount point on the container environment
    export SINGULARITY_BINDPATH="/home/<username>/Rlib:/Rlib,/scratch1:/scratch1,/scratch2:/scratch2"

    # Temporary password so the other users cannot enter the RStudio Server but only you
    # Preferably not your HPC login password as this is seen in plain text
    export RSTUDIO_PASSWORD=<any_desired_password>

    # Run RStudio Server with the prompt (username = login_user, password = $RSTUDIO_PASSWORD)
    RStudio-Server --auth-none 0 --auth-pam-helper rstudio_auth
  5. Execute the job script: sbatch <script.slurm>
  6. Determine which node the singularity server was instantiated: squeue -u <username>
  7. Ping the hostname to determine its IP address: ping saliksik-<cpu/gpu/debug>-XX
  8. Place the IP on your browser with the port 8787 after it. (i.e. 192.168.204.99:8787)
  9. Enter your username and the password you've set on the job script.
  10. After that, you should be able to access an r-studio environment.

NOTES:

  • We will provide you the VPN config files.
  • Your OpenVPN account credentials is the same as your COARE User Portal credentials.

Tensorflow

To run the TensorFlow application on the COARE, follow the instructions below:

  1. Log in to HPC using your COARE credentials.
    ssh <username>@saliksik.asti.dost.gov.ph
  2. Load the “anaconda/2” module which allows you to set up your very own python environment. This is necessary since the OS (CentOS 7.2) does not support Python 3.x as of yet.
    module load anaconda/2-5.3.1
  3. Load the latest CUDA.
    module load cuda/10.1_cudnn-7.6.5 
  4. Create a new anaconda environment.
    conda create -n your_environment_name python=3.5
  5. Activate your newly created conda environment
    source activate your_environment_name
  6. Install the latest TensorFlow (1.4.1 as of this writing).
    pip install tensorflow-gpu
  7. Validate your installation. Try executing an interactive python session on your shell and import tensorflow. Make sure you invoke exit() to leave the interactive session.
    python

    iPython 3.5.4 |Continuum Analytics, Inc.| (default, Aug 14 2017, 13:26:58) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux

    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow as tf
    >>>
  8. To start your first tensorflow job in the HPC, grab the latest version of the mnist code mnist_softmax.py from github and store it in scratch1. It’s a good idea to put it in the scratch directories instead of home. (Learn more about the different storage services attached to the HPC.)
    cd ~/scratch1

    wget https://raw.githubusercontent.com/tensorflow/tensorflow/r1.4/tensorflow/examples/tutorials/mnist/mnist_softmax.py
  9. Copy the job script written below and save as mnist.slurm in scratch directory. The SLURM script contains necessary information about the specific amount and type of computational resources you’ll be requiring for a particular job/run. It includes the sequence of commands you normally invoke in an interactive session in order to properly execute an application using the batch scheduler.
    #!/bin/bash
    #SBATCH --partition=gpu
    #SBATCH --qos=12c-1h_2gpu
    #SBATCH --output=my_first_job.out
    #SBATCH --gres=gpu:1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=2

    # check which GPU device was allocated
    echo "CUDA_DEVICE=/dev/nvidia$CUDA_VISIBLE_DEVICES"

    # prepare working environment
    module load anaconda/2-5.3.1
    module load cuda/10.1_cudnn-7.6.5

    # activate your python environment
    source activate your_environment_name

    # execute your application
    srun python mnist_softmax.py
    source deactivate

    By now you should have two files in your scratch directory.
    mnist.slurm mnist_softmax.py

  10. Edit the slurm script and insert the following line just below the #!/bin/bash directive. Replace the variable <jobname> with any preferred “string”. This string serves as an identifier for your job (especially when you are managing multiple jobs).
    #SBATCH -J <jobname>
  11. If you edit the name of your MNIST python script, make sure to reflect the changes to your slurm script. You will need to import the logging module and replace all the instances of the “print” function in the MNIST script with “logging.debug” so that the output messages emitted by the script are properly recorded in the SLURM output files. You will need to load the logging module in order to call logging.debug. Insert the following lines below the lines where you import other python modules.


    # load the logging module
    import logging

    # customize the log message format
    logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%Y%m%d%H%M%S',level=logging.DEBUG)

    NOTES:

    A few words about SLURM parameters:

    • The --gres=gpu:1 specifies the number of GPU devices that your job requires for it to run. The MNIST code used in the lecture only needs one (1) GPU.
       
    • The --ntasks=1 instructs the batch scheduler that the job will spawn one process. Take note that the base code of TensorFlow is written in Python and inherently, it is single-threaded. However, the developers of the TensorFlow were very smart and had the proper wits to incorporate threading mechanisms in the code to achieve parallelism in order to improve run times and scale the complexity of the models.
       
    • --cpus-per-task=2 indicates the number of processors that will be assigned to the “python” processes. Note that the batch scheduler leverages on Linux Control Groups (cgroups) to prevent users from consuming resources beyond their allocations. It is a kernel mechanism that isolate user processes from each other.
  12. Submit your job script to the queue and wait for available resources to turn up.
    sbatch mnist.slurm
  13. Check the status of your job. - Running; PD - Pending
    squeue -u <username>
  14. As soon as your job starts to run, all of the console messages generated by the MNIST script will appear in a file named my_first_job.out.

    The file name can be altered by setting the appropriate parameters in the SLURM job script. More information about the usage of SLURM commands as well as the parameters available for configuring job runs here.

  15. To check the “occupancy” or usage of the GPU devices, one can issue:
    nvidia-smi
  16. Once the MNIST job is finished, you should see the following content in the my_first_job.out:
    CUDA_DEVICE=/dev/nvidia0

    >> anaconda2/4.3.0 has been loaded.

    >> cuda-8.0_cudnn-6.0 has been loaded.

    2018-01-16 11:37:43.434840: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA

    2018-01-16 11:37:48.596303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:

    name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235

    pciBusID: 0000:04:00.0totalMemory: 11.17GiB freeMemory: 11.11GiB

    2018-01-16 11:37:48.596340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0, compute capability: 3.7)

    Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz

    Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gzExtracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz

    Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz

    0.9156

    Note that the value 0.9156 varies in each run.


Other Modules

MATLAB (Discontinued for now)

Provision of MATLAB has been DISCONTINUED for the time being. This page will be revised for any updates, if there are any.

Tags: