Monitoring jobs

In SLURM, the command squeue shows the jobs currently in the queueing system, and squeue -u spqr1 shows only those jobs belonging to the user spqr1. Other selections are possible, e.g. use -A to select the jobs of a particular project. An example (partial) output from CSD3 is shown below:

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            264496   skylake TROVE-sc   abc123 PD       0:00    512 (QOSMaxCpuPerUserLimit)
            264551   skylake mcl6_m5_   abc123  R   11:53:06      2 cpu-e-[208-209]
            264834   skylake etac_pi_    spqr1  R    4:19:15      1 cpu-e-11
            264116 skylake-h      gc7    xyz10  R 1-03:47:20     32 cpu-e-[545-576]
            264225       knl  32_k0.9    spqr2  R    2:19:14     64 knl-e-[21,101,104,112,115,118,122,125,127,130,134,136,138,194,196,199-200,202-204,210-214,216,218,221-223,225,229-230,232-233,236,239-244,246,248,251-252,254-255,258,260,272,274-275,278-279,281-282,284,287-288,294,297-298,302]
            264876    pascal   gpujob    xyz12  R    3:02:07      4 gpu-e-[53,56,59,63]

In the output above, the partitions skylake, knl and pascal indicate jobs targeting Peta4-Skylake, Peta3-KNL and Wilkes2-GPU respectively. If the state is PENDING (PD), i.e. the job is still waiting in the queue and not yet running, the final column gives the reason. For example, the pending reason QOSMaxCpuPerUserLimit shows that job 264496 is blocked because the user is already using as many CPUs as their service level and quality of service permit. If the state is RUNNING (R), the same column lists the nodes that have been allocated to the job.
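
To see only the pending jobs and why each is waiting, the squeue output can be narrowed with its state filter and a custom format string. The following is a minimal sketch rather than a site-recommended recipe: the function name pending_reasons is hypothetical, and it assumes the standard squeue format specifiers %i (job id), %P (partition) and %r (reason).

```shell
# List a user's pending jobs together with the reason each one is waiting.
# $1 = user name (defaults to the current user). The function name is
# illustrative only; it is not part of SLURM.
pending_reasons() {
    # -h      suppress the header line
    # -t PD   select jobs in the PENDING state only
    # -o ...  print job id (%i), partition (%P) and pending reason (%r)
    squeue -h -u "${1:-$USER}" -t PD -o "%i %P %r"
}

# Typical use on a system with SLURM:
#   pending_reasons spqr1
```

Each output line then carries one job id, its partition and a reason such as QOSMaxCpuPerUserLimit, which is convenient to pipe into sort or grep.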

The jobids reported as mmmm_n are elements of an array job, where mmmm is the SLURM_ARRAY_JOB_ID common to all jobs in the array, and n is the array index (SLURM_ARRAY_TASK_ID).
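
The mmmm_n form can be split back into its two parts with ordinary shell parameter expansion. This is a small sketch; the function name split_array_jobid and the job id 264900_7 are made up for illustration, and it assumes a single-element id (pending arrays that squeue prints as a bracketed range are not handled).

```shell
# Split an array-job id of the form mmmm_n into "mmmm n".
# Ids without an underscore (ordinary jobs) are printed unchanged.
split_array_jobid() {
    jobid=$1
    case $jobid in
        # ${jobid%%_*} drops everything from the first "_" onwards (array job id);
        # ${jobid#*_}  drops everything up to the first "_" (array task index).
        *_*) printf '%s %s\n' "${jobid%%_*}" "${jobid#*_}" ;;
        *)   printf '%s\n' "$jobid" ;;
    esac
}

split_array_jobid 264900_7   # hypothetical id; prints "264900 7"
```

The two values correspond to what the job itself sees as SLURM_ARRAY_JOB_ID and SLURM_ARRAY_TASK_ID.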

The scontrol command allows more detailed queries. For example, to examine a particular job with id <jobid> in detail:

scontrol show job <jobid>

or

scontrol show node <nodename>

to see information regarding the node <nodename>.
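
The scontrol show output is a block of Key=Value pairs, so a single field can be picked out with standard text tools. The sketch below assumes that layout and that the value of interest contains no spaces; the helper name job_field is hypothetical.

```shell
# Print the value of one Key=Value field from `scontrol show job` output.
# $1 = field name (e.g. JobState); the scontrol text is read from stdin.
job_field() {
    # Put each Key=Value pair on its own line, then print the first one
    # whose key matches exactly. Values containing spaces are truncated.
    tr -s ' ' '\n' |
        awk -v key="$1" 'index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }'
}

# Typical use on a system with SLURM (<jobid> is a placeholder):
#   scontrol show job <jobid> | job_field JobState
```

Reading from standard input keeps the helper usable with saved output as well as with a live scontrol call.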

Schematic representations of activity across the entire system can be obtained from sinfo and sview.

Further details can be found in the manual pages, e.g. man squeue, man scontrol and man sinfo.