Advanced Use


X11

The X Window System (X11) provides the basic framework for running graphical (GUI) applications on the HPC.

For Windows

Windows 8 and below

  1. Download Xming.
  2. Launch Xming before connecting to the HPC.
  3. Tick Enable X11.

    1178195211.png

For Windows 10

  1. Download Xming.
  2. Launch Xming before connecting to the HPC, then connect with X11 forwarding enabled:
    ssh <username>@X.X.X.X -XC

For Mac

  1. Download XQuartz.
  2. Launch XQuartz before connecting to the HPC, then connect with X11 forwarding enabled:
    ssh <username>@X.X.X.X -XC

For Linux

  1. Linux distributions have X11 built in, so you can connect directly:
    ssh <username>@X.X.X.X -XC
    -X    enables X11 forwarding
    -C    compresses all data
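To confirm that X11 forwarding is working (regardless of your operating system), you can try launching a simple graphical program from your HPC session. This is a minimal check, assuming a basic X11 client such as xclock is available on the frontend:

    ssh <username>@X.X.X.X -XC
    xclock &    # a small clock window should appear on your local desktop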

X11 Example

486711099.png

Commonly-used Modules
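The examples in this section use the Environment Modules system on the HPC frontend. Before loading anything, you can check which modules and versions are actually installed; the commands below are the standard ones, and the exact module names on the cluster may differ:

    module avail          # list all modules installed on the cluster
    module list           # list modules currently loaded in your session
    module unload <name>  # unload a module you no longer need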

Anaconda Distribution

  • Load the Anaconda module
    module load <anacondaX>/<version>
  • Create an Anaconda environment
    conda create -n <environment-name> python=<version>
  • Activate the conda environment
    source activate <environment-name>
  • Install packages
    conda install <package>
    pip install <package>
  • Deactivate the conda environment
    source deactivate
  • Remove the environment
    conda remove -n <environment-name> --all
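For example, a session that sets up a Python 3.6 environment might look like the sketch below. The module version, environment name, and package names are only examples; check module avail for the Anaconda versions actually installed:

    module load anaconda2/4.3.0          # or whichever anaconda module is listed
    conda create -n myproject python=3.6
    source activate myproject
    conda install numpy
    pip install requests                 # pip also works inside the environment
    source deactivate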

R Studio

  1. Connect to the VPN.
  2. Login to the HPC frontend.
  3. Create an "Rlib" directory. This will serve as your directory of installed modules in R and will bind on Rlib directory created on the virtual environment by singularity.
  4. Create a job script with the following format:
    #!/bin/bash
    #SBATCH --partition=batch
    #SBATCH --mail-user=<email>
    #SBATCH --mail-type=ALL
    #SBATCH --ntasks=8
    #SBATCH -w tux-<node-number>
    #SBATCH --output=rstudio-server-slurm.out

    # Load RStudio Server Singularity module
    module load singularity/rstudio-server-1.2.1335

    # Bind the created R library directory to the container
    # The entry before each colon is the path on the HPC, and the entry after it is the mount point inside the container environment

    export SINGULARITY_BINDPATH="/home/<username>/Rlib:/Rlib,/scratch1:/scratch1,/scratch2:/scratch2"

    # Set a temporary password so that only you (and no other users) can access the RStudio Server
    # Preferably not your HPC login password, as this value is visible in plain text
    export RSTUDIO_PASSWORD=<any desired password>

    # Run RStudio Server with the prompt (username= login_user, password = $RSTUDIO_PASSWORD)
    RStudio-Server --auth-none 0 --auth-pam-helper rstudio_auth
  5. Execute the job script: sbatch <script.slurm>
  6. Determine on which node the RStudio Server was instantiated: squeue -u <username>
  7. Ping that node's hostname to determine its IP address: ping tux-<XX>
  8. Enter the IP address in your browser followed by port 8787 (e.g., 192.168.204.99:8787).
  9. Enter your username and the password you set in the job script.
  10. You should now be able to access the RStudio environment. (A sample command sequence is shown after the notes below.)

    NOTES:

    • We will provide you the VPN config files.
    • Your OpenVPN account credentials are the same as your COARE User Portal credentials.
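A typical sequence on the frontend might look like the following sketch; the script name, username, and node number below are placeholders:

    sbatch rstudio.slurm          # submit the job script you created above
    squeue -u <username>          # note which tux node the job is running on
    ping tux-21                   # resolve that node's IP address
    # then browse to http://<node-ip>:8787 and log in with your username and RSTUDIO_PASSWORD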

Tensorflow

To run a TensorFlow application on the COARE Facility (HPC-GPU), follow the instructions below:

  1. Log in using your COARE Facility credentials to the GPU Cluster
    ssh username@<gpu-cluster>
  2. Load the “anaconda2” module, which allows you to set up your own Python environment. This is necessary since the OS (CentOS 7.2) does not ship with Python 3.x.
    module load anaconda2/4.3.0
  3. Load the latest CUDA.
    module load cuda/8.0_cudnn-6.0
  4. Create a new anaconda environment.
    conda create -n your_environment_name python=3.5
  5. Activate your newly created conda environment
    source activate your_environment_name
  6. Install the latest TensorFlow (1.4.1 as of this writing).
    pip install tensorflow-gpu
  7. Validate your installation: start an interactive Python session in your shell and import tensorflow. Make sure you invoke exit() to leave the interactive session.
    python

    Python 3.5.4 |Continuum Analytics, Inc.| (default, Aug 14 2017, 13:26:58) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux

    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow as tf
    >>>
  8. To start your first TensorFlow job on the HPC, grab the latest version of the MNIST example mnist_softmax.py from GitHub and store it in scratch1. It’s a good idea to put it in the scratch directories instead of home. (Learn more about the different storage services attached to the HPC.)
    cd ~/scratch1

    wget https://raw.githubusercontent.com/tensorflow/tensorflow/r1.4/tensorflow/examples/tutorials/mnist/mnist_softmax.py
  9. Copy the job script below and save it as mnist.slurm in your scratch directory. The SLURM script specifies the amount and type of computational resources you’ll require for a particular job/run, and it includes the sequence of commands you would normally invoke in an interactive session so that the batch scheduler can execute the application properly.
    #!/bin/bash
    #SBATCH --output=my_first_job.out
    #SBATCH --gres=gpu:1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=2

    # check which GPU device was allocated
    echo "CUDA_DEVICE=/dev/nvidia$CUDA_VISIBLE_DEVICES"

    # prepare the working environment
    module load anaconda2/4.3.0
    module load cuda/8.0_cudnn-6.0

    # activate your python environment
    source activate your_environment_name

    # execute your application
    srun python mnist_softmax.py

    source deactivate

    By now you should have two files in your scratch directory.
    mnist.slurm mnist_softmax.py

  10. Edit the job script and insert the following line just below the #!/bin/bash directive. Replace <jobname> with any preferred string; it serves as an identifier for your job (especially when you are managing multiple jobs).
    #SBATCH -J <jobname>
  11. If you rename your MNIST Python script, make sure to reflect the change in your job script. You will also need to import the logging module and replace every instance of the print function in the MNIST script with logging.debug (i.e., print(...) becomes logging.debug(...)) so that the output messages emitted by the script are properly recorded in the SLURM output files. Insert the following lines below the lines where you import the other Python modules.


    # load the logging module
    import logging

    # customize the log message format
    logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%Y%m%d%H%M%S', level=logging.DEBUG)

    NOTE:

    A few words about SLURM parameters:

    The --gres=gpu:1 specifies the number of GPU devices that your job requires for it to run. The MNIST code used in the lecture only needs one (1) GPU.

    The --ntasks=1 instructs the batch scheduler that the job will spawn one process. Take note that the Python front end of TensorFlow runs as a single-threaded process; internally, however, TensorFlow incorporates threading mechanisms to achieve parallelism, improve run times, and scale the complexity of the models.

    --cpus-per-task=2 indicates the number of processors that will be assigned to the “python” process. Note that the batch scheduler leverages Linux Control Groups (cgroups), a kernel mechanism that isolates user processes from each other, to prevent users from consuming resources beyond their allocations.

  12. Submit your job script to the queue and wait for available resources to turn up.
    sbatch mnist.slurm
  13. Check the status of your job (R - Running; PD - Pending).
    squeue
  14. As soon as your job starts to run, all of the console messages generated by the MNIST script will appear in a file named my_first_job.out.

    The file name can be altered by setting the appropriate parameter in the SLURM job script. More information about SLURM commands, as well as the parameters available for configuring job runs, can be found here. (A short example of monitoring a running job is shown at the end of this section.)

  15. To check the “occupancy” or usage of the GPU devices, one can issue:
    nvidia-smi
  16. Once the MNIST job is finished, you should see the following content in the my_first_job.out:
    CUDA_DEVICE=/dev/nvidia0
    >> anaconda2/4.3.0 has been loaded.
    >> cuda-8.0_cudnn-6.0 has been loaded.
    2018-01-16 11:37:43.434840: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
    2018-01-16 11:37:48.596303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
    name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
    pciBusID: 0000:04:00.0 totalMemory: 11.17GiB freeMemory: 11.11GiB
    2018-01-16 11:37:48.596340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0, compute capability: 3.7)
    Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz
    Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz
    Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz
    Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz
    0.9156

    Note that the final value (0.9156 here, the model’s test accuracy) varies slightly in each run.
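While the job is queued or running, a quick way to keep an eye on it is to combine the SLURM commands above with tail. This is a small sketch; the output file name simply matches the example job script in this section:

    squeue -u <username>         # R means the job is running, PD means it is still pending
    tail -f my_first_job.out     # follow the console output as the job writes it
    nvidia-smi                   # check GPU usage (step 15 above)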

Other Modules

MATLAB

Installing a Cluster Profile

Install the Generic Scheduler for SLURM, which can be found in the Add-Ons package.

  1. Click Add-Ons.
  2. Search for SLURM.
  3. Install the Parallel Computing Toolbox plugin for MATLAB Parallel Server with SLURM.

image-20200402181508-1.png

Creating a Cluster Profile

Cluster profiles define the relevant properties of your cluster; these properties are then applied when you create cluster, job, and task objects in the MATLAB client.

  1. After installing the Parallel Computing Toolbox plugin, the Generic Profile Wizard window will pop up. Click Next.
    image-20200402181509-2.png
  2. For the Operating System, make sure to select Unix (Note: Unix is the only OS option). Then click Next.
    image-20200402181509-3.png
  3. Under Submission Mode, select No for nonshared submission mode. Click Next.
    image-20200402181509-4.png
  4. Specify the properties for the cluster and then tick Use Unique Subfolders to organize the MATLAB files in your directory.
    • Cluster Host: saliksik.asti.dost.gov.ph
    • Remote Job Storage Location: /scratchX/<username>
      image-20200402181509-5.png
  5. Specify worker properties.
    • Number of Workers: 4 (cores)
    • MATLAB Installation Folder for Workers: /var/MATLAB_R2019a
      image-20200402181509-6.png  
  6. Select Network License Manager for the license management of the cluster.
    image-20200402181509-7.png
  7. Name your SLURM profile (Profile name). Click Next.
    image-20200402181509-8.png
  8. The Profile Wizard will then generate a summary of the profile you created (see image below). Review the summary and check for any errors. If there is information you need to revise, click Previous; otherwise, click Create.
    image-20200402181509-9.png
  9. Set the generated cluster as the default cluster profile by ticking the Set new cluster profile as default option. Click Finish.
    image-20200402181509-10.png

NOTES:

  • Setting up the cluster profile requires the Parallel Computing Toolbox plugin for MATLAB Parallel Server with SLURM Add-On.
  • Executing parallel.cluster.generic.runProfileWizard() in the command window is another way to prompt the Generic Profile Wizard.

Validating your Cluster Profile

Before submitting jobs, users need to confirm that the cluster profile can connect to the HPC:

  1. In the Environment section of the Home tab, click Parallel, then select Create and Manage Clusters.
    image-20200402181509-11.png
  2. Scroll down until you find the SCHEDULER INTEGRATION section. Go to the IntegrationScriptsLocation path.
  3. Edit independentJobWrapper and communicatingJobWrapper, and insert this line: #SBATCH --partition=debug --qos=240c-1h_debug.

    image-20200402181509-13.pngimage-20200402181509-14.png

  4. Go back to the Cluster Profile Manager and click the Validation tab. Uncheck Parallel Pool Test (ParPool): ParPool only works if the server has access to your machine, which is a security issue, so we do not recommend using it. Click Validate.
    image-20200402181509-15.png
  5. Enter your COARE account username and then click OK.
    image-20200402181509-16.png
  6. Click Yes and attach the private key associated with your COARE account.
    image-20200402181509-17.png
  7. Click Yes if your private key has a passphrase; click No if it does not.
    image-20200402181509-18.png
  8. A successful validation should pass all 4 tests.
    image-20200402181509-19.png
  9. If your validation failed, click Show Report and save the generated report. Log an Incident Ticket in the User Portal and attach the file so the COARE Team can troubleshoot.
    image-20200402181509-20.png

Submitting Jobs

  1. Create cluster object.

    clusterObj = parcluster returns a cluster object representing the cluster identified by the default cluster profile, with the cluster object properties set to the values defined in that profile.

    clusterObj = parcluster;
  2. Set your SLURM parameters.
    clusterObj.AdditionalProperties.AdditionalSubmitArgs = '-p <partition> --qos=<partition-qos> ...';
  3. Submit job that runs MATLAB Script or Function.

    batch – runs a MATLAB script or function on the compute nodes.
    job.Tag – used to identify the job later. It is recommended that each submitted job has its own unique tag.

    • Job that runs a script:
    job = batch(clusterObj, 'matlabScript'); job.Tag = 'jobNameTag';
    • Job that runs a function:
    job = batch(clusterObj, @functionName, N, {x1,...,xN}); job.Tag = 'jobNameTag';

NOTES:

  • A User Credential window will pop up. Enter your COARE username and attach your private key. After the job is submitted, you can close the MATLAB application or submit a new job.
  • If your MATLAB script needs to access an input file, you must transfer it first to your scratch directory.
  • If your MATLAB script requires an add-on (ex. AlexNet), kindly create a Service Request Ticket so we can install the add-on in the compute nodes.

Fetching Outputs from HPC

MATLAB users can fetch and retrieve the results of jobs that run a MATLAB script or function:

  • Script: Find and retrieve your job that runs a script.
    job = findJob(clusterObj,'Tag','jobNameTag');
    output = load(job)
  • Function: Find and retrieve your job that runs a function.
    job = findJob(clusterObj,'Tag','jobNameTag');
    output = fetchOutputs(job)

NOTE: You can also find the outputs of your job in your scratch directory.

Monitoring Jobs

  1. Monitor your jobs by executing the SLURM command, squeue.
    image-20200402181509-21.png
  2. Monitor your jobs by adding --mail-user=<email-address> and --mail-type=ALL in your SLURM parameters.
    image-20200402181509-22.png
  3. Monitor your jobs in MATLAB application by clicking Parallel and then select Monitor Jobs.
    image-20200402181509-23.png

Example of MATLAB Job submission

Job that runs a Script

Reference: https://www.mathworks.com/help/parallel-computing/run-a-batch-job.html#bu62o10

  1. Create the script by typing the following in the MATLAB Command Window:
    edit mywave
  2. In the MATLAB Editor, create a for-loop:
    for i = 1:1024
        A(i) = sin(i*2*pi/1024);
    end
  3. Save the file and close the Editor.
  4. Submit the job by executing the following in the MATLAB Command Window:
    c = parcluster;
    c.AdditionalProperties.AdditionalSubmitArgs = '-p debug --qos=240c-1h_debug';
    j = batch(c,'mywave'); j.Tag = 'mywave_test';
  5. After the job finishes, you can retrieve and view its results:
    job = findJob(c,'Tag','mywave_test');
    output = load(job);
    plot(output.A)

    image-20200402181509-24.png

Job that runs a Function
  1. For your reference, here is a sample MATLAB function: birthday.m

    image-20200402181509-25.png

  2. Submit the job by executing the following in the MATLAB Command Window:
    c = parcluster;
    c.AdditionalProperties.AdditionalSubmitArgs = '-p debug --qos=240c-1h_debug';

    % 1  = the number of expected outputs
    % 20 = the input value for groupSize
    j = batch(c,@birthday, 1, {20}); j.Tag = 'birthday_test';
  3. After the job finishes, you can now retrieve and view its results:
    job = findJob(c,'Tag','birthday_test');
    fetchOutputs(job)

    image-20200402181509-26.png

List of Toolboxes Available

Software in Saliksik
  • MATLAB Parallel Server 7.0
  • MATLAB 9.6
  • Simulink 9.3
Toolboxes in Saliksik
  • 5G Toolbox 1.1
  • Aerospace Blockset 4.1
  • Aerospace Toolbox 3.1
  • Antenna Toolbox 4.0
  • Audio Toolbox 2.0
  • Automated Driving Toolbox 2.0
  • AUTOSAR Blockset 2.0
  • Bioinformatics Toolbox 4.12
  • Communications Toolbox 7.1
  • Computer Vision Toolbox 9.0
  • Control System Toolbox 10.6
  • Curve Fitting Toolbox 3.5.9
  • Database Toolbox 9.1
  • Datafeed Toolbox 5.8.1
  • Deep Learning Toolbox 12.1
  • DSP System Toolbox 9.8
  • Econometrics Toolbox 5.2
  • Embedded Coder 7.2
  • Filter Design HDL Coder 3.1.5
  • Financial Instruments Toolbox 2.9
  • Financial Toolbox 5.13
  • Fixed-Point Designer 6.3
  • Fuzzy Logic Toolbox 2.5
  • Global Optimization Toolbox 4.1
  • GPU Coder 1.3
  • HDL Coder 3.14
  • HDL Verifier 5.6
  • Image Acquisition Toolbox 6.0
  • Image Processing Toolbox 10.4
  • Instrument Control Toolbox 4.0
  • LTE HDL Toolbox 1.3
  • LTE Toolbox 3.1
  • Mapping Toolbox 4.8
  • MATLAB Coder 4.2
  • MATLAB Report Generator 5.6
  • Mixed-Signal Blockset 1.0
  • Model Predictive Control Toolbox 6.3
  • Optimization Toolbox 8.3
  • Parallel Computing Toolbox 7.0
  • Partial Differential Equation Toolbox 3.2
  • Phased Array System Toolbox 4.1
  • Powertrain Blockset 1.5
  • Predictive Maintenance Toolbox 2.0
  • Reinforcement Learning Toolbox 1.0
  • RF Blockset 7.2
  • RF Toolbox 3.6
  • Risk Management Toolbox 1.5
  • Robotics System Toolbox 2.2
  • Robust Control Toolbox 6.6
  • Sensor Fusion and Tracking Toolbox 1.1
  • SerDes Toolbox 1.0
  • Signal Processing Toolbox 8.2
  • SimBiology 5.8.2
  • SimEvents 5.6
  • Simscape 4.6
  • Simscape Driveline 2.16
  • Simscape Electrical 7.1
  • Simscape Fluids 2.6
  • Simscape Multibody 6.1
  • Simulink 3D Animation 8.2
  • Simulink Check 4.3
  • Simulink Code Inspector 3.4
  • Simulink Coder 9.1
  • Simulink Control Design 5.3
  • Simulink Coverage 4.3
  • Simulink Design Optimization 3.6
  • Simulink Report Generator 5.6
  • Simulink Requirements 1.3
  • Simulink Test 3.0
  • SoC Blockset 1.0
  • Stateflow 10.0
  • Statistics and Machine Learning Toolbox 11.5
  • Symbolic Math Toolbox 8.3
  • System Composer 1.0
  • System Identification Toolbox 9.10
  • Text Analytics Toolbox 1.3
  • Trading Toolbox 3.5.1
  • Vehicle Dynamics Blockset 1.2
  • Vehicle Network Toolbox 4.2
  • Vision HDL Toolbox 1.8
  • Wavelet Toolbox 5.2
  • WLAN Toolbox 2.1