Jobs without InfiniBand (Outdated)
Warning
This page is outdated.
Note
- These hints are meant only for periods when the IB fabric, or parts of it, are down. Do not use this setup on a normal, healthy system!
- This setup must not be used by jobs producing large amounts of output data!
- MPI jobs spanning multiple nodes cannot run.
- Jobs using /scratch or /lustre/ssd cannot run.
Whenever parts of the IB fabric go down, we will start batch system plugins that parse for the batch system option --comment=NO_IB. Jobs with this option set can run on nodes without InfiniBand access if (and only if) they have also set the --tmp option.
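Because a job is only eligible for the nodes without InfiniBand when both options are set, a quick local check before submitting can catch a forgotten --tmp. The following is a minimal sketch; the function name check_no_ib_script is hypothetical and is not part of the batch system plugins mentioned above. It only inspects #SBATCH lines, so options given on the sbatch command line are not seen.

```shell
#!/bin/bash
# Hypothetical helper (not provided by the site or by Slurm): succeeds only if
# the given job script requests both --comment=NO_IB and --tmp in #SBATCH lines.
check_no_ib_script() {
  local script=$1
  # --comment=NO_IB must appear as a complete option value
  grep -Eq '^#SBATCH[[:space:]]+--comment=NO_IB([[:space:]]|$)' "$script" || return 1
  # --tmp may be written as "--tmp=2G" or "--tmp 2G"
  grep -Eq '^#SBATCH[[:space:]]+--tmp[[:space:]=]' "$script" || return 1
}

# Usage: check_no_ib_script job.sh && sbatch job.sh
```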
From the Slurm documentation:
--tmp: Specify a minimum amount of temporary disk space per node. Default units are megabytes unless the SchedulerParameters configuration parameter includes the "default_gbytes" option for gigabytes. Different units can be specified using the suffix [K|M|G|T]. This option applies to job allocations.
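To see what a given --tmp value amounts to, the unit rules quoted above can be written out as a small conversion. This is a sketch under two assumptions: the cluster does not set default_gbytes (so a bare number means megabytes), and the suffixes are binary multiples (1G = 1024M), which we assume matches Slurm's behaviour. The function name tmp_size_to_mb is hypothetical, and kilobyte values are rounded up to whole megabytes here, which may differ from Slurm's own rounding.

```shell
#!/bin/bash
# Hypothetical helper: convert a --tmp size such as "2G" into megabytes.
# Assumes default_gbytes is NOT set and that suffixes are binary multiples.
tmp_size_to_mb() {
  local spec=$1 num unit
  num=${spec%[KMGT]}          # strip one trailing unit suffix, if present
  unit=${spec#"$num"}         # the stripped suffix (empty if there was none)
  [[ $num =~ ^[0-9]+$ ]] || return 1
  num=$((10#$num))            # force base 10 so leading zeros are harmless
  case $unit in
    ""|M) echo "$num" ;;                      # default unit: megabytes
    K)    echo $(( (num + 1023) / 1024 )) ;;  # round kilobytes up to whole MB
    G)    echo $(( num * 1024 )) ;;
    T)    echo $(( num * 1024 * 1024 )) ;;
  esac
}
```

Under these assumptions, the 2G requested in the job script on this page corresponds to 2048 MB.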
Keep in mind: since the scratch filesystems are not available and the project filesystem is mounted read-only on the compute nodes, you have to work in /tmp.
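Since everything happens in /tmp, a job may want to verify at start-up that enough space is actually free there, for example because other jobs on the same node may also write to /tmp. A minimal sketch, assuming a df that supports -m (GNU or BSD); both function names are hypothetical:

```shell
#!/bin/bash
# Hypothetical helpers: check the free space in a directory (default /tmp).

# Print the space available to unprivileged users, in megabytes.
tmp_free_mb() {
  # -P: one line per filesystem (POSIX format); -m: 1M blocks (GNU/BSD df)
  df -Pm "${1:-/tmp}" | awk 'NR == 2 { print $4 }'
}

# Succeed if at least $1 MB are free in $2 (default /tmp); complain otherwise.
require_tmp_mb() {
  local need=$1 dir=${2:-/tmp} have
  have=$(tmp_free_mb "$dir")
  [[ $have =~ ^[0-9]+$ ]] || return 1
  if (( have < need )); then
    echo "only ${have} MB free in ${dir}, ${need} MB needed" >&2
    return 1
  fi
}

# Usage inside the job script, e.g. before creating the work directory:
#   require_tmp_mb 2048 || exit 1
```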
A simple job script should:
- create a temporary directory on the compute node in /tmp and change into it
- start the application (under /sw/ or /projects/) using input data from somewhere in the project filesystem
- archive and transfer the results to some global location
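#!/bin/bash
#SBATCH --comment=NO_IB
#SBATCH --tmp=2G

# Create a job-specific directory in the node-local /tmp and change into it
MYTEMP=/tmp/$SLURM_JOB_ID
mkdir "$MYTEMP"
cd "$MYTEMP" || exit 1
# Read input from the project filesystem, write output locally
<path_to_binary>/myapp < <path_to_input_data> > "./${SLURM_JOB_ID}_out"
# tar if it makes sense!
rsync -a "$MYTEMP" taurusexport3:<path_to_output_data>/
# Clean up the node-local directory
rm -rf "$MYTEMP"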
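A slightly more defensive variant of the same pattern is sketched here; it is an illustration, not a site-provided tool. It uses mktemp for a unique directory, a trap on EXIT so the directory is removed on every exit path of the subshell (including early exits on error), and tar to bundle the results into one archive. As an assumption, the archive is written to a local destination directory; on the cluster you would then transfer it with rsync to taurusexport3 as in the script above. The function name run_staged and the variable OUTDIR are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not site-provided): run a command in a fresh directory
# under $1, pack everything it produced into $2/results.tar.gz, and always
# remove the working directory afterwards.
run_staged() (
  # The ( ... ) body runs in a subshell, so cd and trap do not leak to the caller
  tmp_base=$1
  dest=$(cd "$2" && pwd) || exit 1   # absolute path, because we cd away below
  shift 2
  workdir=$(mktemp -d "$tmp_base/job.XXXXXX") || exit 1
  trap 'rm -rf "$workdir"' EXIT      # runs whenever the subshell exits
  cd "$workdir" || exit 1
  "$@" || exit 1                     # the application writes into $workdir
  # One archive instead of many small files ("tar if it makes sense!")
  tar -czf "$dest/results.tar.gz" -C "$workdir" .
)

# Usage in a job script (placeholders as in the script above):
#   run_staged /tmp "$OUTDIR" bash -c '<path_to_binary>/myapp < <path_to_input_data> > out'
#   rsync -a "$OUTDIR/results.tar.gz" taurusexport3:<path_to_output_data>/
```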