Slurm » History » Version 9

Kerstin Paech, 09/19/2013 07:27 AM

h1. How to run jobs on the euclides nodes

Use slurm to submit jobs to the euclides nodes (node1-8); ssh login access to those nodes will be restricted in the near future.

*Please read through this entire wiki page so everyone can make efficient use of this cluster.*
h2. alexandria

*Please do not use alexandria as a compute node* - its hardware is different from the nodes, and it hosts our file server and other services that are important to us.

You should use alexandria to
- transfer files
- compile your code
- submit jobs to the nodes

If you need to debug, please start an interactive job on one of the nodes using slurm. For instructions see below.
h2. euclides nodes

Job submission to the euclides nodes is handled by the slurm job manager (see http://slurm.schedmd.com and https://computing.llnl.gov/linux/slurm/).

*Important: In order to run jobs, you need to be added to the slurm accounting system - please contact Kerstin*

All slurm commands listed below have very helpful man pages (e.g. man slurm, man squeue, ...).

If you are already familiar with another job manager, the following information may be helpful to you: http://slurm.schedmd.com/rosetta.pdf
h3. Scheduling of Jobs

At this point there are two queues, called partitions in slurm:
* *normal* - the default partition your jobs will be sent to if you do not specify otherwise. At this point there is a time limit of two days, and jobs can only run on 1 node.
* *debug* - meant for debugging. You can only run one job at a time; other jobs you submit will remain in the queue.

We have also set up a scheduler that goes beyond first come, first served - some jobs will be favoured over others depending on how much you or your group have been using euclides in the past 2 weeks, how long the job has been queued, and how many resources it will consume.

This serves as a starting point; we may have to adjust parameters once the slurm job manager is in use. Job scheduling is a complex issue, and we still need to build expertise and gain experience with the user needs in our groups. Please feel free to speak up if there is something that can be improved without creating an unfair disadvantage for other users.

You can run interactive jobs on both partitions.
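If you are curious how the multi-factor scheduling ranks your jobs, slurm ships the sprio and sshare commands for exactly this. A sketch (the exact columns shown depend on the cluster configuration):

<pre>
sprio -l   # priority factors (age, fair-share, job size, ...) of pending jobs
sshare     # recent fair-share usage of your account
</pre>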

h3. Running an interactive job with slurm

To run an interactive job with slurm in the default partition, use
<pre>
srun -u bash -i
</pre>

In case the 'normal' partition is overcrowded, you can use the 'debug' partition:
<pre>
srun --account cosmo_debug -p debug -u bash -i  # if you are part of the Cosmology group
srun --account euclid_debug -p debug -u bash -i # if you are part of the EuclidDM group
</pre>

As soon as a slot is open, slurm will log you in to an interactive session on one of the nodes.

h3. Running a simple one-core batch job with slurm

* To see what queues are available to you (called partitions in slurm), run:
<pre>
sinfo
</pre>
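For this cluster the output might look roughly like the following - this is purely illustrative, the actual limits and node states will differ:

<pre>
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*      up 2-00:00:00      8   idle node[1-8]
debug        up 2-00:00:00      8   idle node[1-8]
</pre>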

* To run a batch job, create a file myjob.slurm containing the following:
<pre>
#!/bin/bash
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --mail-user=<put your email address here>
#SBATCH --mail-type=BEGIN
#SBATCH -p normal

/bin/hostname
</pre>

* To submit a batch job use:
<pre>
sbatch myjob.slurm
</pre>
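sbatch queues the job and returns immediately, printing the job id; the number below is just an example:

<pre>
$ sbatch myjob.slurm
Submitted batch job 1234
</pre>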

* To see the status of your job, use
<pre>
squeue
</pre>

* For some more information on your job, use
<pre>
scontrol show job <jobid>
</pre>
where the <jobid> can be obtained from the squeue output.
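If the squeue list is long, you can restrict it to your own jobs with the -u option; scancel, another standard slurm command, removes a queued job or kills it if it is already running:

<pre>
squeue -u $USER   # show only your jobs; the first column is the JOBID
scancel <jobid>   # cancel the job with that id
</pre>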

h3. Batch script for running a multi-core job

To run a 4-core job you can use:
<pre>
#!/bin/bash
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --mail-user=<put your email address here>
#SBATCH --mail-type=BEGIN
#SBATCH -n 4

mpirun <executable>
</pre>
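The -n 4 requests 4 tasks from slurm. Inside the job, slurm exports this as the SLURM_NTASKS environment variable, which you can use to check the allocation - a minimal sketch, assuming a standard slurm setup:

<pre>
echo "allocated $SLURM_NTASKS tasks on $SLURM_JOB_NODELIST"
srun hostname   # srun starts one copy per allocated task
</pre>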