Slurm » History » Version 10

Kerstin Paech, 09/19/2013 08:45 AM

h1. How to run jobs on the euclides nodes
Use slurm to submit jobs to the euclides nodes (nodes 1-8); ssh login access to those nodes will be restricted in the near future.

*Please read through this entire wiki page so everyone can make efficient use of this cluster.*
h2. alexandria
*Please do not use alexandria as a compute node* - its hardware differs from that of the nodes, and it hosts our file server and other services that are important to us.
You should use alexandria to
* transfer files
* compile your code
* submit jobs to the nodes

If you need to debug, please start an interactive job on one of the nodes using slurm. For instructions see below.
h2. euclides nodes
Job submission to the euclides nodes is handled by the slurm jobmanager (see http://slurm.schedmd.com and https://computing.llnl.gov/linux/slurm/). 
*Important: In order to run jobs, you need to be added to the slurm accounting system - please contact Kerstin*
All slurm commands listed below have very helpful man pages (e.g. man slurm, man squeue, ...). 
If you are already familiar with another jobmanager, the following information may be helpful to you: http://slurm.schedmd.com/rosetta.pdf
h3. Scheduling of Jobs
At this point there are two queues, called partitions in slurm: 

* *normal* - the default partition your jobs will be sent to if you do not specify otherwise. There is currently a time limit of two days, and jobs can only run on one node.
* *debug* - meant for debugging. You can only run one job at a time; any other jobs you submit will remain in the queue.

We have also set up a scheduler that goes beyond first come, first served: jobs are favoured depending on how much you or your group have used euclides in the past two weeks, how long the job has been queued, and how many resources it will consume.

This serves as a starting point; we may have to adjust parameters once the slurm jobmanager is in regular use. Job scheduling is a complex issue, and we still need to build expertise and learn what the user needs in our groups are. Please feel free to speak up if there is something that can be improved without creating an unfair disadvantage for other users.

You can run interactive jobs on both partitions.
h3. Running an interactive job with slurm
To run an interactive job with slurm in the default partition, use

<pre>
srun -u bash -i
</pre>
In case the 'normal' partition is overcrowded, you can use the 'debug' partition:
<pre>
srun --account cosmo_debug -p debug -u bash -i # if you are part of the Cosmology group
srun --account euclid_debug -p debug -u bash -i  # if you are part of the EuclidDM group
</pre>
As soon as a slot is open, slurm will log you in to an interactive session on one of the nodes.
h3. Running a simple one-core batch job with slurm using the default partition
* To see what queues are available to you (called partitions in slurm), run:
<pre>
sinfo
</pre>
* To run a batch job, create a file myjob.slurm containing the following:
<pre>
#!/bin/bash
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --mail-user <put your email address here>
#SBATCH --mail-type=BEGIN
#SBATCH -p normal

/bin/hostname
</pre>
* To submit a batch job use:
<pre>
sbatch myjob.slurm
</pre>
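
sbatch prints a line of the form "Submitted batch job <id>". If you want to use that job id in follow-up commands, a small helper can extract it - a sketch, assuming only the standard sbatch output format:

```shell
# Helper that extracts the trailing job id from sbatch's
# "Submitted batch job <id>" message (a sketch, not a site-specific tool).
job_id_of() { awk '{print $NF}'; }

# On the cluster you would use it as:
#   jobid=$(sbatch myjob.slurm | job_id_of)
# Demonstration on a sample line of sbatch-style output:
echo "Submitted batch job 1234" | job_id_of   # prints 1234
```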
* To see the status of your job, use
<pre>
squeue
</pre>
* For some more information on your job, use
<pre>
scontrol show job <jobid>
</pre>
You can get the <jobid> from squeue.
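
scontrol prints the job record as Key=Value pairs. If you only need a single field, for example the job state, you can filter it out - a sketch, assuming that output format:

```shell
# Extract one Key=Value field (here JobState) from scontrol-style output.
# A sketch: the sample line below mimics "scontrol show job" formatting.
job_state_of() { tr ' ' '\n' | grep '^JobState=' | cut -d= -f2; }

# On the cluster:  scontrol show job <jobid> | job_state_of
echo "JobId=42 JobName=test JobState=RUNNING" | job_state_of   # prints RUNNING
```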
h3. Running a simple one-core batch job with slurm using the debug partition
Change the partition to debug and add the appropriate account, depending on whether you are part of the euclid or cosmology group.

<pre>
#!/bin/bash
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --mail-user <put your email address here>
#SBATCH --mail-type=BEGIN
#SBATCH -p debug
#SBATCH --account [cosmo_debug/euclid_debug]

/bin/hostname
</pre>
h3. Batch script for running a multi-core job
To run a 4-core job you can use:
<pre>
#!/bin/bash
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --mail-user <put your email address here>
#SBATCH --mail-type=BEGIN
#SBATCH -n 4

<mpirun call/program>
</pre>
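
What the <mpirun call/program> line might look like, as a sketch: the program name ./my_program is a placeholder, and SLURM_NTASKS is the task count slurm sets inside a batch job.

```shell
# Inside the batch job, slurm exports SLURM_NTASKS (here 4, from -n 4),
# so the mpirun rank count follows the allocation automatically.
# ./my_program is a placeholder for your MPI executable.
mpirun -np "${SLURM_NTASKS:-4}" ./my_program
```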