#!/bin/csh
#
# Load Sharing Facility (LSF) options
#
#BSUB -L /bin/csh     # job shell
#BSUB -J myjob.csh    # job title
#BSUB -o myjob.out    # output file (stdout and stderr merged)
#BSUB -q xx_8nm       # default job queue
#BSUB -r              # re-runnable (if the machine crashes)
#and possibly also:
#BSUB -N              # mail me when the job finishes
#BSUB -B              # mail me when the job starts

There are some options we urge you NOT to use! E.g. the options that limit file size and CPU usage are for most jobs not needed and in fact quite often they merely create confusion.
LSF queue    description
==========   =============
xx_8nm       Short queue (8 mins), intended for tests
xx_1nh       Medium queue (1 hr)
xx_8nh       Long queue (8 hrs)
xx_1nd       Very Long queue (24 hrs), with lower priority
xx_deltape   Special queue for copying data to tape

These queues are available both from DXPLUS and HPPLUS clusters, but not from LXPLUS.
Other available batch queues:
DELPHI users may also be interested in the following queues:
LSF     Function
===     ========
bjobs   to see the status of your batch jobs
bpeek   to look at the output of a running job
bkill   to delete a batch job
Please note that these commands must always be used
from the same cluster
from which the job was submitted.
delsub -J "myjob[1-20]" -o myjob.out.%I myjob.csh
The important points here:
In your script myjob.csh you can construct the PDLINPUT file like this, using the environment variable LSB_JOBINDEX (which is defined only in a job array context!):
if ( $?LSB_JOBINDEX ) then # Is this a job in a job array?
    echo "VID=EK1402,NUMBER=${LSB_JOBINDEX}" > PDLINPUT
else # not a job array? then just the first file
    echo "VID=EK1402,NUMBER=1" > PDLINPUT
endif
A more realistic, and slightly more complicated, example:
to read the first 200 files of the FATMEN nickname
XSHORT97_E2 in 20 jobs of 10 files each, construct your PDLINPUT like this:
if ( $?LSB_JOBINDEX ) then # Is this a job in a job array?
    @ last = ${LSB_JOBINDEX} * 10
    @ first = ${last} - 9
    echo "FAT=XSHORT97_E2/C${first}-${last}" > PDLINPUT
else # not a job array? then just the first file
    echo "FAT=XSHORT97_E2/C1" > PDLINPUT
endif
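One way to check the index arithmetic before submitting the array is a dry run on the command line. The one-liner below is only a sketch and not part of the DELPHI tools: it assumes that awk is available on the machine you submit from, and it prints the PDLINPUT line that each of the 20 array jobs would write, using the same formula as the script above (last = index * 10, first = last - 9).

```shell
# Dry run, no LSF needed: print the PDLINPUT line for each array index 1..20.
awk 'BEGIN { for (i = 1; i <= 20; i++) { last = i * 10; first = last - 9; printf "FAT=XSHORT97_E2/C%d-%d\n", first, last } }'
```

Index 1 should give C1-10, index 3 should give C21-30 and index 20 should give C191-200, so the 20 jobs cover files 1 to 200 without gaps or overlaps. The command is kept on a single line because csh does not accept a newline inside a quoted string; at an interactive csh prompt, type only the awk line, since the # comment is recognised only inside scripts.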
Of course you can also use this variable to make sure your
output files are uniquely named:
rfcp ww.ntp shd01:${DELPHI_PAW}/XSHORT97_E2_job${LSB_JOBINDEX}.ntp
rfcp myfile.rz ${CORE_SCRATCH}/myfile.${LSB_JOBINDEX}.rz
So: you can use %I in the (embedded) LSF options,
and the environment variable LSB_JOBINDEX in your script.
Other commands: to bpeek or bkill an individual job in the array, address it like this:
bkill -J "myjob[16]"
Job <34416[16]> is being terminated
Note again the use of double quotes to escape the special character "["
bkill 34416 will kill ALL jobs in this job array!
It may happen that you want to send a special user signal to the executable only, and not to the script that started it. For example, your SKELANA job is producing an Ntuple, and for some reason you want to interrupt it but still keep whatever has been produced so far. So you want to stop just the executable, and let the rest of your script take care of the distribution of the output (rfcp, ftp, ...). If you send the signal USR1 to the SKELANA executable, the executable traps it and finishes immediately. To send this signal to the executable only, you need delkill.
For example: you are analysing LEP-2 data, the FATMEN nickname of the data set is a parameter for your script myjob.csh, and you would like to submit
myjob.csh XSDST97_E2/C1-5
with embedded options as stated before.
Now embedded LSF options are only considered by bsub if you redirect the standard input:
bsub < myjob.csh
But this means that you cannot specify your parameter XSDST97_E2/C1-5!
delsub takes care of this problem, and allows you to submit your job as
delsub myjob.csh XSDST97_E2/C1-5