Submitted by periv4 on
These are the most common LSF commands used:
bjobs displays the current status and job ID of your jobs; add the -w flag for wide output.
Example of bjobs output:
JOBID   USER   STAT  QUEUE     FROM_HOST    EXEC_HOST    JOB_NAME  SUBMIT_TIME
431956  user1  RUN   gpu-v100  bmiclusterp  4*bmi-r740-  bash      Feb 21 11:12
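When scripting around bjobs, it can be handy to pull only the job IDs in a given state out of the default table. The following is a minimal sketch, assuming the default column layout shown in the example above (JOBID in the first field, STAT in the third). The function name jobs_in_state is hypothetical and not part of LSF.

```shell
#!/bin/sh
# jobs_in_state STATE: read default-format bjobs output on stdin and
# print the JOBID of every job whose STAT column equals STATE.
# Assumption: JOBID is field 1 and STAT is field 3, as in the example above.
jobs_in_state() {
    awk -v state="$1" 'NR > 1 && $3 == state { print $1 }'
}

# Sample input copied from the example output above (header + one job).
sample='JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
431956 user1 RUN gpu-v100 bmiclusterp 4*bmi-r740- bash Feb 21 11:12'

printf '%s\n' "$sample" | jobs_in_state RUN
```

On the cluster the same function could be fed directly from the scheduler, for example: bjobs | jobs_in_state RUN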
bjobs -l <job id> gives a more detailed description of the job, for example:
bjobs -l 431956
Job <431956>, User <user1>, Project <default>, Status <RUN>, Queue <gpu-v100>,
                     Interactive pseudo-terminal shell mode, Job Priority <50>,
                     Command <bash>, Esub <set-defaults dynamic-reject>
Tue Feb 21 11:12:06: Submitted from host <bmiclusterp2>, CWD <$HOME>,
                     4 Processors Requested, Requested Resources <span[hosts=1]
                     rusage[mem=128000] order[cpuf:-mem]>, Requested GPU <num=1>;
Tue Feb 21 11:12:06: Started on 4 Hosts/Processors <4*bmi-r740-02>,
                     Execution Home </users/user1>, Execution CWD </users/user1>;
Wed Feb 22 14:02:39: Resource usage collected.
                     The CPU time used is 183 seconds.
                     MEM: 302 Mbytes; SWAP: 0 Mbytes; NTHREAD: 56
                     PGID: 34918; PIDs: 34918

RUNLIMIT
 2880.0 min

MEMLIMIT
 125 G

MEMORY USAGE:
MAX MEM: 302 Mbytes; AVG MEM: 300 Mbytes; MEM Efficiency: 0.24%

CPU USAGE:
CPU PEAK: 0.08 ; CPU Efficiency: 2.02%

SCHEDULING PARAMETERS:
           r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
loadSched  -     -    -     -   -   -   -   -   -    -    -
loadStop   -     -    -     -   -   -   -   -   -    -    -

EXTERNAL MESSAGES:
MSG_ID  FROM   POST_TIME     MESSAGE              ATTACHMENT
0       user1  Feb 21 11:12  bmi-r740-02:gpus=1;  N

RESOURCE REQUIREMENT DETAILS:
Combined: select[(ngpus>0) && (type == local)] order[cpuf:-mem]
          rusage[mem=128000.00:ngpus_physical=1.00] span[hosts=1]
Effective: select[((ngpus>0)) && (type == local)] order[cpuf:-mem]
           rusage[mem=128000.00,ngpus_physical=1.00] span[hosts=1]

GPU REQUIREMENT DETAILS:
Combined: num=1:mode=shared:mps=no:j_exclusive=no:gvendor=nvidia
Effective: num=1:mode=shared:mps=no:j_exclusive=no:gvendor=nvidia
bqueues displays the available queues on the system; add the -w flag for wide output or the -l flag for detailed information.
For example:
bqueues -w
QUEUE_NAME  PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
private1    60    Open:Active  -    -     -     -     0      0     0    0
docker      60    Open:Active  -    -     -     -     0      0     0    0
upgrade     60    Open:Active  -    -     -     -     0      0     0    0
gpu-v100    60    Open:Active  -    -     -     -     6      0     6    0
gpu-a100    60    Open:Active  -    -     -     -     12     0     12   0
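To see at a glance which queues are currently busy, the table can be filtered on its RUN column. This is a rough sketch, assuming the 11-column layout shown above, where RUN is the tenth field. The function name busy_queues is hypothetical and not an LSF command.

```shell
#!/bin/sh
# busy_queues: read bqueues output on stdin and print "queue run_count"
# for every queue with at least one running job.
# Assumption: RUN is field 10, as in the bqueues example above.
busy_queues() {
    awk 'NR > 1 && $10 > 0 { print $1, $10 }'
}

# Sample input copied from the bqueues example above.
sample='QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
private1 60 Open:Active - - - - 0 0 0 0
docker 60 Open:Active - - - - 0 0 0 0
upgrade 60 Open:Active - - - - 0 0 0 0
gpu-v100 60 Open:Active - - - - 6 0 6 0
gpu-a100 60 Open:Active - - - - 12 0 12 0'

printf '%s\n' "$sample" | busy_queues
```

On the cluster, the live table could be piped in instead of the sample: bqueues | busy_queues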
bkill terminates a job. Usage:
bkill <job id>
bstop suspends your job so that it can be resumed later. Usage:
bstop <job id>
bresume resumes a suspended job. Usage:
bresume <job id>
bmod changes the parameters of a submitted job. Most parameters are admin-controlled, but you can extend the wall time by passing the new total run limit (the current limit plus the extension) to -W:
bmod -W <Current time + Extended time> <job id>
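The arithmetic behind the -W value can be sketched as follows. This is only an illustration: it assumes the current limit is read in minutes from the RUNLIMIT field of bjobs -l (2880.0 min in the example above), and the 120-minute extension is a made-up value. The helper name new_runlimit is hypothetical. The script only prints the bmod command rather than running it.

```shell
#!/bin/sh
# new_runlimit CURRENT_MIN EXTRA_MIN: print the new total run limit in minutes.
# A fractional part such as the ".0" in "2880.0" is dropped before adding.
new_runlimit() {
    current=${1%.*}
    extra=${2%.*}
    echo $(( current + extra ))
}

limit=2880.0   # RUNLIMIT reported by bjobs -l in the example above (minutes)
extension=120  # hypothetical extension of two hours

# Print the command that would request the new limit for job 431956.
echo "bmod -W $(new_runlimit "$limit" "$extension") 431956"
```

With these values the script prints: bmod -W 3000 431956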
This command will display all compute nodes on the HPC cluster.
This command will display the current utilization of the compute nodes; add the -w flag for wide output.