HPC compatibility

This software was designed to scale easily on High Performance Computing (HPC) facilities. On this page, we show how to access and work on two facilities hosted at Lawrence Berkeley National Laboratory: the Lawrencium and Cori clusters.

Supercomputer access

Lawrencium cluster (LBNL)

Using your credentials, you can access the Lawrencium cluster through its login node as follows:

ssh <username>@lrc-login.lbl.gov

The DAS data on Lawrencium can be found on the mounted bear drive under the /clusterfs/bear/ML_trainingDataset/ directory. To transfer data to or from Lawrencium, we can follow the Lawrencium data transfer instructions and use the Data Transfer Node with Secure Copy (scp), for instance as follows:

scp -r <username>@lrc-xfer.lbl.gov:/clusterfs/bear/ML_trainingDataset/1min_ch4650_4850 .

The above command will copy 10 days' worth of 1-minute, 200-channel Distributed Acoustic Sensing data to your local directory. Data should generally be stored under the so-called SCRATCH folder; the full path to that folder on Lawrencium is as follows:

/global/scratch/<username>
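
Conversely, local data can be uploaded to your SCRATCH space through the same Data Transfer Node. The command below is only a sketch that reuses the 1min_ch4650_4850 folder name from the download example above:

scp -r 1min_ch4650_4850 <username>@lrc-xfer.lbl.gov:/global/scratch/<username>/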

An interactive session can be requested with the salloc command, for example:

salloc -N 3 -t 3:00:00 -C haswell -q interactive
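
Once the allocation is granted, commands can be launched on the allocated nodes with srun. The line below is only a sketch, where my_script.py is a hypothetical placeholder:

# my_script.py is a hypothetical placeholder; srun launches one task per allocated node here
srun -N 3 python my_script.py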

Cori cluster (NERSC)
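
Similarly to Lawrencium, Cori can be reached through its login nodes with SSH. The command below assumes you already have a NERSC account and username:

ssh <username>@cori.nersc.gov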

Software environment

Load/unload modules
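
Both clusters rely on environment modules to manage software. As a sketch, the standard module commands below list, load, inspect, and unload modules; python/3.6 is simply used as an example module name:

module avail              # list available modules
module load python/3.6    # load a module
module list               # show currently loaded modules
module unload python/3.6  # unload a module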

Conda environment

On some clusters, such as Lawrencium, there may not be a module available that already contains all the packages needed to run the MLDAS software. In that case, a custom Conda environment containing all the dependencies can be created once and loaded every time one wishes to use the software. The procedure is simple and consists first of loading Python:

module load python/3.6

Then, we can create and activate the custom environment as follows:

conda create -p /global/scratch/vdumont/myenv python=3.6
source activate /global/scratch/vdumont/myenv

Once the environment is activated, we can install all the dependencies required by the MLDAS software. Note that the default PyTorch package is the CUDA-enabled build, which is what we want:

conda install h5py matplotlib numpy pillow pyyaml scipy pytorch torchvision
pip install hdf5storage
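
To verify that the CUDA-enabled PyTorch build was indeed installed, a quick check can be run from the activated environment (note that the call will only return True on a node with a GPU available):

python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"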

MATLAB module

Some scripts are written in MATLAB, and the MATLAB software must be loaded before executing them on the cluster. There are several versions available; we found that matlab/r2017b is quite stable and loads quickly:

module load matlab/r2017b

Below we show an example command line to execute a script from the terminal. Warning: while the filename of a MATLAB script usually ends with .m, the extension should not be included when calling the script on the command line:

matlab -nodisplay -nosplash -nodesktop -r SCRIPT_convert_dsi2hdf5
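
When running non-interactively (e.g., inside a batch job), it can be useful to make sure MATLAB quits once the script finishes. The following variant, which appends an exit statement to the -r argument, is one possible way to do so:

matlab -nodisplay -nosplash -nodesktop -r "SCRIPT_convert_dsi2hdf5; exit"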

Manual Installation

Octave on Lawrencium

The following references may be useful when building Octave from source:

https://wiki.octave.org/wiki/index.php?title=Building
https://bugs.gentoo.org/730222
https://trac.macports.org/ticket/49824
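
Based on the references above, a manual build of Octave under your SCRATCH space would roughly follow the usual autotools procedure. The commands below are only a sketch: the version number and install prefix are placeholders, and additional dependencies may need to be installed or loaded as modules first:

# version and prefix are placeholders to adapt
wget https://ftp.gnu.org/gnu/octave/octave-5.2.0.tar.gz
tar -xzf octave-5.2.0.tar.gz
cd octave-5.2.0
./configure --prefix=/global/scratch/<username>/octave
make -j 4
make install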

Account & Jobs

Account manager

It can happen that you do not remember which account and/or partitions your username is associated with. Such information can easily be retrieved using the sacctmgr command as follows:

sacctmgr show assoc user=<username>

Information about a specific partition, such as lr3, can be displayed with the sinfo command:

sinfo --partition lr3

Monitoring usage

On Cori, one can display the available storage space by typing myquota in the terminal.

SLURM directives
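
As a sketch, a minimal Lawrencium batch script could look like the following, where the account name, partition, QOS, environment path, and script name are placeholders to be adapted to your allocation:

#!/bin/bash
#SBATCH --job-name=mldas_job      # job name
#SBATCH --account=<account_name>  # account to charge (see sacctmgr above)
#SBATCH --partition=lr3           # partition (see sinfo above)
#SBATCH --qos=lr_normal           # quality of service
#SBATCH --nodes=1                 # number of nodes
#SBATCH --time=01:00:00           # walltime limit
#SBATCH --output=job_%j.out       # standard output file

module load python/3.6
source activate /global/scratch/<username>/myenv
python my_script.py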

Job execution
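
A script containing the above directives can then be submitted and monitored with the standard SLURM commands; my_job.sh is a placeholder name:

sbatch my_job.sh      # submit the batch script
squeue -u <username>  # monitor your queued and running jobs
scancel <job_id>      # cancel a job if needed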