How to use supercomputers at the Texas Advanced Computing Center (TACC)

Boost your training speed

Last updated on 2019-10-06

Before you begin

Create a TACC account (https://portal.tacc.utexas.edu/)
Set up multi-factor authentication in the TACC user portal
- This is separate from the utexas multi-factor authentication
- https://portal.tacc.utexas.edu/tutorials/multifactor-authentication

```
ssh xy0000@hikari.tacc.utexas.edu
```
- Replace xy0000 with your own EID
- Replace hikari with your own system (e.g. maverick2, lonestar5)

Log in using your TACC password and multi-factor authentication token code

Transfer file

  # For file
  localhost$ scp path/to/file xy0000@maverick2.tacc.utexas.edu:\$WORK/path
  # For folder
  localhost$ tar cvf ./mydata.tar mydata                                   # create archive
  localhost$ scp     ./mydata.tar xy0000@maverick2.tacc.utexas.edu:\$WORK  # transfer archive

The $WORK directory usually has more space than the $HOME directory, so transfer data there.
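
If the `tar` command is not available on your local machine, the archive step above could also be done with Python's standard `tarfile` module. This is only a minimal sketch: the folder name `mydata` follows the example above, and the helper name `make_archive` is hypothetical.

```python
import tarfile
from pathlib import Path


def make_archive(folder, archive_path):
    """Pack `folder` into an uncompressed tar archive, similar to `tar cvf`."""
    folder = Path(folder)
    with tarfile.open(str(archive_path), "w") as tar:
        # arcname stores only the folder name, not the full local path
        tar.add(str(folder), arcname=folder.name)
    return Path(archive_path)
```

Calling `make_archive("mydata", "mydata.tar")` produces the file to send with `scp`. After the transfer, `tar xvf mydata.tar` on the TACC side unpacks it.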

Run

Method 1 (sbatch)

Do not run your code directly on the login node.

Create a .slurm file

Example slurm file below:

#!/bin/bash
#----------------------------------------------------
# Example SLURM job script to run code
#----------------------------------------------------
#SBATCH -J lab_job                  # Job name
#SBATCH -o console_output.txt       # Name of stdout output file
#SBATCH -e console_error_output.txt # Name of stderr output file
#SBATCH -p normal                   # Queue name
#SBATCH -N 1                        # Total number of nodes requested (more than one node means a parallel job)
#SBATCH -n 1                        # Total number of tasks requested
#SBATCH -t 01:30:00                 # Run time limit (hh:mm:ss); the job is stopped after 1.5 h even if the program has not finished
# The next line is required only if the user has more than one project
#SBATCH -A XXXX                     # Project/allocation number, the one you applied to TACC with
# This example runs 1 task on 1 node
# Launch the job: the file you want to run
python ./file.py
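
When you submit many runs that differ only in job name or time limit, generating the Slurm file from a small template may save some editing. The sketch below is one possible approach, not a TACC tool: the function name `render_slurm` and the `.out`/`.err` file names are assumptions, and it uses only the `#SBATCH` options shown in the example above.

```python
def render_slurm(job_name, command, queue="normal", nodes=1, tasks=1,
                 time="01:30:00", allocation=None):
    """Return the text of a Slurm batch script built from the given options."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH -J {job_name}",
        f"#SBATCH -o {job_name}.out",   # assumed stdout file name
        f"#SBATCH -e {job_name}.err",   # assumed stderr file name
        f"#SBATCH -p {queue}",
        f"#SBATCH -N {nodes}",
        f"#SBATCH -n {tasks}",
        f"#SBATCH -t {time}",
    ]
    # -A is only needed when the user has more than one project
    if allocation is not None:
        lines.append(f"#SBATCH -A {allocation}")
    lines.append(command)
    return "\n".join(lines) + "\n"
```

For example, `open("lab_job.slurm", "w").write(render_slurm("lab_job", "python ./file.py"))` writes a file that can then be submitted with `sbatch`.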

Submit the Slurm job

login2.hikari(26)$ sbatch your_filename.slurm

Watch the job

login2.hikari(41)$ watch squeue
login2.hikari(29)$ squeue
JOBID   PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
47767      normal  lab_job   xy0000  R       0:06      1 c262-102
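
In the table above, the ST column is the job state (R means running). If you want to check a job from a script, the table can be parsed with a few lines of Python. This is a rough sketch that assumes the default column layout shown above and job names without spaces; the function name `parse_squeue` is hypothetical.

```python
def parse_squeue(text):
    """Parse squeue's default table into a list of dicts keyed by column name."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    jobs = []
    for ln in lines[1:]:
        # Limit the split so any extra words stay in the last column
        fields = ln.split(None, len(header) - 1)
        jobs.append(dict(zip(header, fields)))
    return jobs


sample = """\
JOBID   PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
47767      normal  lab_job   xy0000  R       0:06      1 c262-102
"""
jobs = parse_squeue(sample)
print(jobs[0]["JOBID"], jobs[0]["ST"])  # prints: 47767 R
```

In practice the text would come from running `squeue` (for example `squeue -u xy0000` to list only your own jobs) rather than from a hard-coded sample.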

Check console output
```
cat console_output.txt 
```
- This is the stdout file you set with #SBATCH -o in the Slurm file

Cancel job
```
login2.hikari(41)$ scancel 47767 
```
- Usage: scancel JOBID (the JOBID is shown by squeue)

Run

Method 2 (idev)

```
login2.hikari(36)$ idev -t 01:30:00 
```
- idev: TACC's interactive development tool; it starts an interactive session on a compute node
- -t: the total session time you request (hh:mm:ss)
- This method doesn’t need a Slurm file
```
c262-104.hikari(2)$ python file.py 
```
- Run the code as you would in a normal terminal; it runs at the same speed as a job submitted with sbatch

c262-104.hikari(3)$ squeue
         JOBID   PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
       47769      normal idv08693   xy0000  R       3:20      1 c262-104

Logout

c262-104.hikari(4)$ exit
Connection to c262-104 closed.
Cleaning up: submitted job (yes) removing job 47769.

Deep Learning Using Python

Use Python 3 if you need h5py

Run the following commands before running your code:

module load intel/17.0.4 python3/3.6.3
module load cuda/10.0 cudnn/7.6.2 nccl/2.4.7
pip3 install --user tensorflow-gpu==1.13.2
pip3 install --user keras
pip3 install --user h5py
export HDF5_USE_FILE_LOCKING='FALSE'
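
Before submitting a long job, it can be worth confirming that the packages installed above are visible to Python and that the environment variable is set. The following is a minimal sketch using only the standard library; the function name `check_environment` is hypothetical. Note that the `tensorflow-gpu` pip package is imported under the name `tensorflow`.

```python
import importlib.util
import os


def check_environment(packages=("tensorflow", "keras", "h5py")):
    """Report which packages can be found and the HDF5 file-locking setting."""
    # find_spec returns None when a top-level package is not installed
    report = {name: importlib.util.find_spec(name) is not None
              for name in packages}
    # None here means the export line above was not run in this shell
    report["HDF5_USE_FILE_LOCKING"] = os.environ.get("HDF5_USE_FILE_LOCKING")
    return report


print(check_environment())
```

A `False` entry suggests that the matching `pip3 install --user` step did not succeed for the Python module currently loaded.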

Xuewen Yao