Common LSF commands | High Performance Computing

Submitted by periv4 on Wed, 02/22/2023 - 13:52

These are the most commonly used LSF commands:

bjobs

Displays the job ID and current status of your jobs. Add the -w flag for wide output, which prints fields such as host and job names without truncating them.

Example output (without -w, so the EXEC_HOST name is truncated):

JOBID   USER  STAT  QUEUE    FROM_HOST    EXEC_HOST   JOB_NAME  SUBMIT_TIME
431956  user1 RUN   gpu-v100 bmiclusterp  4*bmi-r740- bash      Feb 21 11:12
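
If you want to pass job IDs on to other commands (for example bkill), a small awk filter over the default bjobs table is enough. This is a sketch that assumes the column layout shown above, where STAT is the third column; the function name running_job_ids is hypothetical.

```shell
# running_job_ids (hypothetical helper): read bjobs-style output on stdin and
# print the JOBID of every row whose STAT column (3rd field) is RUN.
# Assumes the default column order JOBID USER STAT QUEUE ...; the header row is skipped.
running_job_ids() {
  awk 'NR > 1 && $3 == "RUN" { print $1 }'
}

# Usage on the cluster:
#   bjobs -w | running_job_ids
```

Filtering on the STAT column rather than on line position keeps pending jobs (which have an empty EXEC_HOST) out of the result.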

bjobs -l <job id> gives a detailed (long-format) description of a single job, for example:

bjobs -l 431956

Job <431956>, User <user1>, Project <default>, Status <RUN>, Queue <gpu-v100>,
                     Interactive pseudo-terminal shell mode, Job Priority <50>,
                     Command <bash>, Esub <set-defaults dynamic-reject>
Tue Feb 21 11:12:06: Submitted from host <bmiclusterp2>, CWD <$HOME>, 4 Process
                     ors Requested, Requested Resources <span[hosts=1] rusage[m
                     em=128000] order[cpuf:-mem]>, Requested GPU <num=1>;
Tue Feb 21 11:12:06: Started on 4 Hosts/Processors <4*bmi-r740-02>, Execution H
                     ome </users/user1>, Execution CWD </users/user1>;
Wed Feb 22 14:02:39: Resource usage collected.
                     The CPU time used is 183 seconds.
                     MEM: 302 Mbytes;  SWAP: 0 Mbytes;  NTHREAD: 56
                     PGID: 34918;  PIDs: 34918

 RUNLIMIT
 2880.0 min

 MEMLIMIT
    125 G

 MEMORY USAGE:
 MAX MEM: 302 Mbytes;  AVG MEM: 300 Mbytes; MEM Efficiency: 0.24%

 CPU USAGE:
 CPU PEAK: 0.08 ;  CPU Efficiency: 2.02%

 SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
 loadSched   -     -     -     -       -     -    -     -     -      -      -
 loadStop    -     -     -     -       -     -    -     -     -      -      -

 EXTERNAL MESSAGES:
 MSG_ID FROM       POST_TIME      MESSAGE                             ATTACHMENT
 0      user1     Feb 21 11:12   bmi-r740-02:gpus=1;                     N

 RESOURCE REQUIREMENT DETAILS:
 Combined: select[(ngpus>0) && (type == local)] order[cpuf:-mem] rusage[mem=128
                     000.00:ngpus_physical=1.00] span[hosts=1]
 Effective: select[( (ngpus>0)) && (type == local)] order[cpuf:-mem] rusage[mem
                     =128000.00,ngpus_physical=1.00] span[hosts=1]

 GPU REQUIREMENT DETAILS:
 Combined: num=1:mode=shared:mps=no:j_exclusive=no:gvendor=nvidia
 Effective: num=1:mode=shared:mps=no:j_exclusive=no:gvendor=nvidia
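
The MEMORY USAGE and CPU USAGE blocks show whether a job requests far more than it uses: the job above reserved 128000 MB (the 125 G MEMLIMIT) but peaked at 302 MB, and its CPU efficiency was 2.02%. A minimal sketch that pulls out just those two figures, assuming the labels appear exactly as above (they may differ between LSF versions); job_efficiency is a hypothetical name.

```shell
# job_efficiency (hypothetical helper): read `bjobs -l` output on stdin and print
# the "MEM Efficiency" and "CPU Efficiency" values, one per line, in input order.
# Assumes the labels match the MEMORY USAGE / CPU USAGE blocks shown above.
job_efficiency() {
  grep -o -E '(CPU|MEM) Efficiency: [0-9.]+%'
}

# Usage on the cluster:
#   bjobs -l 431956 | job_efficiency
```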

bqueues

Displays the queues available on the system. Add the -w flag for wide output or the -l flag for detailed information about each queue.

For example:

bqueues -w

QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
private1        60  Open:Active       -    -    -    -     0     0     0     0
docker          60  Open:Active       -    -    -    -     0     0     0     0
upgrade         60  Open:Active       -    -    -    -     0     0     0     0
gpu-v100        60  Open:Active       -    -    -    -     6     0     6     0
gpu-a100        60  Open:Active       -    -    -    -    12     0    12     0
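
A queue with many PEND jobs will generally take longer to start new work. As a sketch (assuming the column order shown above; the helper name busy_queues is hypothetical), the following prints only the queues that currently hold jobs, together with their NJOBS, PEND and RUN counts:

```shell
# busy_queues (hypothetical helper): read `bqueues -w` output on stdin and print
# QUEUE_NAME, NJOBS, PEND and RUN for every queue that currently has jobs.
# Assumes the column order:
#   QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
busy_queues() {
  awk 'NR > 1 && $8 > 0 { print $1, $8, $9, $10 }'
}

# Usage on the cluster:
#   bqueues -w | busy_queues
```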

bkill

Terminates (kills) a job, for example:

bkill <job id>
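
bkill also accepts several job IDs at once. Job ID 0 is special: bkill 0 targets all of your jobs, so a typo can be costly. The sketch below is a hypothetical wrapper, kill_jobs, that refuses anything other than a plain positive numeric ID before calling bkill; LSF_BKILL is a made-up variable that lets you substitute echo for a dry run.

```shell
# kill_jobs (hypothetical wrapper): check that every argument is a plain, positive
# numeric job ID, then pass them all to a single bkill call.
# IDs starting with 0 are rejected because `bkill 0` would target all of your jobs.
# Array elements such as 431956[3] are also rejected by this simple check.
# LSF_BKILL is a hypothetical override (default: bkill); set it to echo for a dry run.
kill_jobs() {
  [ "$#" -gt 0 ] || { echo "usage: kill_jobs <job id>..." >&2; return 1; }
  for id in "$@"; do
    case "$id" in
      ''|*[!0-9]*|0*) echo "refusing job id: '$id'" >&2; return 1 ;;
    esac
  done
  "${LSF_BKILL:-bkill}" "$@"
}

# Dry run, then the real thing:
#   LSF_BKILL=echo kill_jobs 431956 431957
#   kill_jobs 431956 431957
```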

bstop

Suspends (pauses) your job without terminating it, so it can be continued later with bresume. Usage:

bstop <job id>

bresume

Resumes a job that was suspended with bstop. Usage:

bresume <job id>
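
You can confirm the effect of bstop and bresume with bjobs: a running job you suspend shows USUSP in the STAT column and returns to RUN after bresume. For reference, a small lookup of the common STAT codes; the function name describe_stat is hypothetical and the descriptions are short summaries of the standard LSF job states.

```shell
# describe_stat (hypothetical helper): translate a bjobs STAT code into a short description.
describe_stat() {
  case "$1" in
    PEND)  echo "pending: waiting to be scheduled" ;;
    RUN)   echo "running" ;;
    PSUSP) echo "suspended while pending (e.g. bstop on a pending job)" ;;
    USUSP) echo "suspended while running by the owner or an admin (e.g. bstop)" ;;
    SSUSP) echo "suspended by the LSF system" ;;
    DONE)  echo "finished with exit code 0" ;;
    EXIT)  echo "finished with a non-zero exit code or was killed" ;;
    *)     echo "other state: see the bjobs documentation" ;;
  esac
}

# Usage:
#   describe_stat USUSP
```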

bmod

Modifies the parameters of a submitted job. Most parameters are controlled by the administrators, but you can change the run limit (wall time) with:

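bmod -W <new run limit> <job id>

where the new run limit is the job's current run limit (shown as RUNLIMIT in bjobs -l) plus the extra time you need, given as [hour:]minute. For example, to extend job 431956 from its 2880-minute (48-hour) limit by 24 hours:

bmod -W 72:00 431956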
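
Because -W takes the new total limit rather than the extra time, you have to add the extension to the current limit yourself. A sketch of that arithmetic, assuming the current limit is taken in whole minutes from the RUNLIMIT line of bjobs -l (2880.0 min in the example above) and the extension is also in minutes; new_runlimit is a hypothetical helper that prints the result in the hour:minute form that -W accepts.

```shell
# new_runlimit (hypothetical helper): add an extension to the current run limit
# and print the total as hour:minute, the format accepted by bmod -W.
#   $1 = current limit in whole minutes (e.g. 2880 from "RUNLIMIT 2880.0 min")
#   $2 = extension in whole minutes
new_runlimit() {
  total=$(( $1 + $2 ))
  printf '%d:%02d\n' $(( total / 60 )) $(( total % 60 ))
}

# Usage: extend job 431956 by 12 hours (720 minutes)
#   bmod -W "$(new_runlimit 2880 720)" 431956    # runs: bmod -W 60:00 431956
```

Shell arithmetic only handles integers, so drop the ".0" from the RUNLIMIT value before passing it in.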

bhosts

Displays the compute nodes (hosts) on the HPC cluster together with their status and job slot usage.
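
To see where there is room for new work, you can compare each host's MAX (job slot limit) with NJOBS (slots in use). A sketch, assuming the default bhosts column order HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV; free_slots is a hypothetical helper, and hosts whose MAX is shown as - (no limit) are skipped.

```shell
# free_slots (hypothetical helper): read `bhosts -w` output on stdin and print
# HOST_NAME and the number of unused job slots (MAX - NJOBS) for hosts whose
# STATUS is "ok". Assumes columns: HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV.
free_slots() {
  awk 'NR > 1 && $2 == "ok" && $4 ~ /^[0-9]+$/ { print $1, $4 - $5 }'
}

# Usage on the cluster:
#   bhosts -w | free_slots
```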

lsload

Displays the current load on the compute nodes. Add the -w flag for wide output.
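
As a sketch of reading lsload output, the hypothetical helper least_loaded below picks the host with the lowest one-minute CPU run-queue length (r1m) among hosts whose status is ok. It assumes the default column order HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem; it is meant for a quick look, since LSF itself chooses hosts for submitted jobs.

```shell
# least_loaded (hypothetical helper): read `lsload -w` output on stdin and print the
# name of the "ok" host with the smallest r1m value (4th column).
# Assumes columns: HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem.
least_loaded() {
  awk 'NR > 1 && $2 == "ok" { print $4, $1 }' | LC_ALL=C sort -n | head -n 1 | cut -d ' ' -f 2
}

# Usage on the cluster:
#   lsload -w | least_loaded
```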