Slurm » History » Version 81
Martin Kuemmel, 01/23/2018 09:45 AM
{{toc}}

h1. Hardware overview

You access the Euclid cluster through either cosmogw.kosmo.physik.uni-muenchen.de or cosmofs1.kosmo.physik.uni-muenchen.de.

* cosmogw and cosmofs1 are gateway machines and should *not* be used for computing;
* there are 21 compute nodes named euclides01--euclides11, euclides12-os--euclides17-os (called the os-machines hereafter) and euclides18--euclides21;
* euclides01-05 are reachable via cosmofs1;
* euclides06-21 are reachable via cosmogw;
* euclides01-euclides11 each have 32 logical CPUs and 64GB of RAM;
* euclides12-euclides21 each have 56 logical CPUs and 128GB of RAM.

h1. How to run jobs on the euclides nodes (using Slurm)

Use slurm to submit jobs to or log in to the euclides nodes (euclides01-21).

*Please read through this entire wiki page so everyone can make efficient use of this cluster.*

h2. Control nodes cosmogw and cosmofs1

The machines cosmofs1 and cosmogw are the login and submit nodes for the slurm queues, so please do not use them as simple compute nodes - their hardware differs from that of the compute nodes, and they host our file server and other services that are important to us.

You should use cosmogw or cosmofs1 to
* transfer files
* compile your code
* submit jobs to the nodes via the slurm queues

If you need to debug and would like to log in to a node, please start an interactive job on one of the nodes using slurm. For instructions see below.

h2. euclides nodes

Job submission to the euclides nodes is handled by the slurm job manager (see http://slurm.schedmd.com and https://computing.llnl.gov/linux/slurm/).
*Important: In order to run jobs, you need to be added to the slurm accounting system - please contact the admin.*

All slurm commands listed below have very helpful man pages (e.g. man slurm, man squeue, ...).

If you are already familiar with another job manager, the following information may be helpful to you: http://slurm.schedmd.com/rosetta.pdf

h3. Scheduling of Jobs

At this point there are three queues, called partitions in slurm:
* on cosmofs1:
** *normal*, the default partition your jobs are sent to if you do not specify otherwise. At this point there is a time limit of two days, and jobs can only run on one node.
* on cosmogw:
** *normal*, the default partition your jobs are sent to if you do not specify otherwise. At this point there is a time limit of four days; this queue comprises the computing nodes euclides06-21;
** *lowpri*, which also comprises the computing nodes euclides06-21; it is a so-called preemptible queue, allowing more resources for the users; however, jobs are re-queued (canceled and re-scheduled) if the resources are demanded in the normal queue.

The default memory per core is 2GB; if you need more or less, please specify it with the --mem or --mem-per-cpu option.
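As a sketch (the 4000MB value is only an illustration, adjust it to your job), the memory request can be given as an #SBATCH line in a batch script or directly on the srun command line:

```shell
#!/bin/bash
# batch-script version: request 4GB of RAM per allocated core
# (for an interactive job the same option works on the srun
# command line, e.g. "srun --mem-per-cpu=4000 -u --pty bash")
#SBATCH --mem-per-cpu=4000

/bin/hostname
```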

We have also set up a scheduler that goes beyond first-come, first-served - some jobs will be favoured over others depending on how much you or your group have been using euclides in the past 2 weeks, how long the job has been queued, and how much resources it will consume.

This serves as a starting point; we may have to adjust parameters once the slurm job manager is in regular use. Job scheduling is a complex issue, and we still need to build expertise and gain experience with the user needs in our groups. Please feel free to speak out if there is something that can be improved without creating an unfair disadvantage for other users.
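To see how this multi-factor priority plays out in practice, slurm ships two inspection commands (shown as a sketch; their output depends on how accounting is configured on our cluster):

```shell
# show the priority factors (age, fair-share, job size) of pending jobs
sprio -l

# show the recent fair-share usage of your own associations
sshare -U
```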

You can run interactive jobs on both partitions.

h3. Running an interactive job with slurm (a.k.a. logging in)

To run an interactive job with slurm in the default partition, use

<pre>
srun -u --pty bash
</pre>

If you want to use tcsh, use

<pre>
srun -u --pty tcsh
</pre>

If you want to use a larger amount of memory per job, use e.g.

<pre>
srun -u --mem-per-cpu=8000 --pty tcsh
</pre>

In case you want to open x11 applications, use the --x11=first option, e.g.
<pre>
srun --x11=first -u --pty bash
</pre>

In case the 'normal' partition on cosmofs1 is overcrowded, you can use the 'debug' partition:
<pre>
srun --account cosmo_debug -p debug -u --pty bash # if you are part of the Cosmology group
srun --account euclid_debug -p debug -u --pty bash # if you are part of the EuclidDM group
</pre>
As soon as a slot is open, slurm will log you in to an interactive session on one of the nodes.

h3. limited ssh access

If you have an active job (batch or interactive), you can log in to the node the job is running on. Your ssh session will be killed if the job terminates. Your ssh session will be restricted to the same resources as your job (so you cannot accidentally bypass the job scheduler and harm other users' jobs).
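For example, to find the node your job is running on and then log in to it (euclides07 is just a placeholder for whatever the NODELIST column shows):

```shell
# list your own jobs; the NODELIST column shows the node(s)
squeue -u $USER

# then ssh to the listed node, e.g.
ssh euclides07
```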

h3. Running a simple one-core batch job with slurm using the default partition

* To see what queues (called partitions in slurm) are available to you, run:
<pre>
sinfo
</pre>

* To run slurm, create a file myjob.slurm containing the following:
<pre>
#!/bin/bash
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --mail-user <put your email address here>
#SBATCH --mail-type=BEGIN
#SBATCH -p normal

/bin/hostname
</pre>

* To submit a batch job use:
<pre>
sbatch myjob.slurm
</pre>

* To see the status of your job, use
<pre>
squeue
</pre>

* To kill a job use:
<pre>
scancel <jobid>
</pre>
You can get the <jobid> from the output of squeue.

* For some more information on your job use
<pre>
scontrol show job <jobid>
</pre>
Again, the <jobid> comes from squeue.

h3. Running a simple one-core batch job with slurm using the lowpri partition

Change the partition to lowpri and add the appropriate account depending on whether you are part of the euclid or cosmology group.

<pre>
#!/bin/bash
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --mail-user <put your email address here>
#SBATCH --mail-type=BEGIN
#SBATCH --account=[euclid_lowpri/cosmo_lowpri]
#SBATCH --partition=lowpri

/bin/hostname
</pre>

h3. Accessing a node where a job is running or starting additional processes on a node

You can attach an srun command to an already existing job (batch or interactive). This means you can start an interactive session on a node where a job of yours is running, or start an additional process there.

First determine the jobid of the desired job using squeue, then use

<pre>
srun --jobid <jobid> [options] <executable>
</pre>
Or, more concretely:
<pre>
srun --jobid <jobid> -u --pty bash # to start an interactive session
srun --jobid <jobid> ps -eaFAl # to get detailed process information
</pre>

The processes will only run on cores that have been allocated to you. This works for batch as well as interactive jobs.
*Important: If the original job that was submitted is finished, any process attached in this fashion will be killed.*

h3. Batch script for running a multi-core job

mpi is installed on cosmofs1.

To run a 4-core job for an executable compiled with mpi you can use
<pre>
#!/bin/bash
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --mail-user <put your email address here>
#SBATCH --mail-type=BEGIN
#SBATCH -n 4

mpirun <programname>
</pre>
and it will automatically start on the number of cores specified.

To ensure that the job is executed on only one node, add
<pre>
#SBATCH -N 1
</pre>
to the job script.

If you would like to run a program that itself starts processes, you can use the environment variable $SLURM_NPROCS, which is automatically defined for slurm jobs, to explicitly pass on the number of cores the program can run on.
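As a sketch (my_parallel_tool and its -j option are hypothetical placeholders for your own program), a batch script passing $SLURM_NPROCS on could look like this:

```shell
#!/bin/bash
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH -n 8

# $SLURM_NPROCS holds the number of tasks slurm allocated;
# hand it to a program that starts its own worker processes
my_parallel_tool -j $SLURM_NPROCS
```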

To check if your job is actually running on the specified number of cores, you can check the PSR column of
<pre>
ps -eaFAl
# or: ps -eaFAl | egrep "<yourusername>|UID" to see just your processes
</pre>

h3. environment for jobs

By default, slurm does not initialize the environment (using .bashrc, .profile, .tcshrc, ...).

To use your usual system environment, add the following line to the submission script:
<pre>
#SBATCH --get-user-env
</pre>

h3. Some points on the 'normal' versus 'lowpri' queue on cosmogw

The allowance for each user on the *normal* partition is 250 CPUs and 554700MB of RAM, which corresponds to 1/3 of the entire cluster (euclides06-21). In short, every user is allowed to use up to 1/3 of the cluster in the normal partition.

On the partition *lowpri* (for low priority) there are no limits on CPU numbers or RAM consumption, meaning a user can take all available resources up to the *entire* cluster! However, jobs on the "lowpri" partition have a lower priority through the so-called preemption mechanism. This means that if all nodes are busy (partially through the lowpri queue) and an additional job is submitted to the "normal" partition, slurm will re-queue (meaning cancel and re-schedule to the lowpri queue) job(s) on the "lowpri" partition to get the job on the "normal" partition running.

Here is an example scenario to illustrate the opportunities the "lowpri" partition offers: I want to submit a number of jobs needing 752 CPUs in total. The entire cluster has 752 CPUs in total; this means in the optimal case I get 1/3 of the cluster on the "normal" partition, and it takes at least three cycles to get all my jobs finished. However, if I submit to the "lowpri" partition, in the case of an empty cluster I can use the *entire* cluster and finish in only one cycle. Of course it may happen that other users submit lots of jobs to the "normal" partition afterwards and many of my jobs are re-queued. That would then delay the finishing of my jobs on the "lowpri" partition correspondingly. To highlight some aspects of using the "lowpri" partition:

* it is relevant especially when you want to submit several jobs that significantly exceed the user allowance on the "normal" partition and need the entire cluster to get finished;
* on average, the available resources on the "lowpri" partition are much *larger* than on the "normal" partition, especially during the night or on the weekend;
* please note that *no job ever gets lost* on the "lowpri" partition; if re-queuing occurs, the user gets an email (Subject: "SLURM Job_id=2563 Name=test_mpi_gather.slurm Failed, Run time 00:01:58, PREEMPTED, ExitCode 0") when the job is stopped, and subsequently when it starts again and when it finishes;
* on the "lowpri" partition there is also a queue which decides which job comes first (of course only in the case of oversubscription);
* the preemption mechanism tries to minimize the number of re-queued jobs necessary to get the job in the "normal" partition going; so, if 8 CPUs are requested and the "lowpri" partition contains one job using 8 CPUs, three jobs using 4 CPUs and several dozen jobs using 1 CPU, only the job with 8 CPUs is re-scheduled, independent of run times and other parameters.

To submit a job to the "lowpri" partition, please insert the following lines into the slurm batch script (see also the example above):
<pre>
#SBATCH --account=<your_account>
#SBATCH -p lowpri
</pre>

with <your_account> being either "cosmo_lowpri" or "euclid_lowpri".

There are two typical scenarios where a user can gain from the lowpri queue:
* if a job stores intermediate results at regular intervals and picks up from there once started again; then even a long job loses only the computing time since the last storage point if it is re-scheduled;
* if a single job needs only a small amount of computing time (perhaps <12h) but a lot of jobs need to be run; then the loss of computing time is rather small if a job is re-scheduled.

h2. desdb node

Some specific jobs in cosmodb, such as the "catalog ingest", need to be performed on the machines desdb1/2. For those jobs there is the slurm account "euclid_cat_ing" with the partition "cat_ing". Only selected persons from the Euclid group have access to this node. Please specify "-p cat_ing" and "--account euclid_cat_ing" on the command line or in the slurm script.
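Following the batch examples above, a minimal script for such a job could look like this (the hostname call stands in for the actual catalog-ingest command):

```shell
#!/bin/bash
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --account=euclid_cat_ing
#SBATCH -p cat_ing

/bin/hostname
```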

h2. Software specific setup

h3. Python environment

You can use the python 2.7.3 installed on the euclides cluster by running

<pre>
source /data2/users/ccsoft/etc/setup_all
source /data2/users/ccsoft/etc/setup_python2.7.3
</pre>

h2. Notes For Euclid users

For those submitting jobs to the euclides* nodes through the Cosmo DM pipeline, here are some things which need to be specified for customized job submissions, since a different interface to slurm is used:

* To use a larger amount of memory per block, specify max_memory = 6000 (for 6G) and so on inside the block definition, or in the submit file in case you want to use it for all blocks.

* If you want to run on multiple nodes/cores, use nodes='<number of nodes>:ppn=<number of cores>' inside the block definition of a particular block, or in the submit file in case you want to use it for all blocks.

* If you want to use a larger wall time, specify wall_mod=<wall time in minutes> inside the module definition.

* Note that queue=serial does not work on cosmofs1 (we usually use it for c2pap).

h1. Admin

There is a user "slurm", which however is not really necessary for the administration work. The slurm administrator needs sudo access. Some scripts for adding a user and similar tasks are in "/data1/users/slurm"; with sudo access the admin can execute those scripts. In the mysql database there is the username "slurmdb" with password.

h2. Slurm configuration

h3. Slurm configuration file

The currently valid versions of the configuration file are "/data1/users/slurm/slurm.conf" and "/data1/users/slurm/cosmo/slurm.conf" on cosmofs1 and cosmogw, respectively. To apply a modified slurm configuration, the script "newconfig.sh" can be used.

The script

* copies the configuration file to the submit node and restarts the submit service;
* copies the configuration file to all computing nodes and triggers the reconfiguration there.

Then the slurm daemon needs to be restarted on the submit node and all computing nodes with the script "restart.sh".

*Note:* Right now the slurmd daemons do not properly start on cosmogw. Even if the start fails, the slurmd daemon is there and working.


h2. User management

h3. Overview of users, accounts, etc.

No sudo access needed:
<pre>
/usr/local/bin/sacctmgr show account withassoc
</pre>

h3. Adding a new user

As root on @cosmofs1@,

<pre>
cd /data1/users/slurm/
./add_user.sh UserName account(cosmo or euclid)
/usr/local/bin/scontrol reconfigure
</pre>

h3. To increase memory, cores etc. for a user

The script above contains various commands for changing user settings, e.g.

<pre>
/usr/local/bin/sacctmgr -i modify user name=$1 set GrpCPUs=32
/usr/local/bin/sacctmgr -i modify user name=$1 set GrpMem=128000
</pre>

h2. Trouble shooting

h3. Information on a particular node

The command "/usr/local/bin/scontrol show node <nodename>" gives detailed information on a particular node (status, reason for being down, and so on).

h3. Node in state "drain"

When sinfo shows a node in the "drain" state, run
<pre>
/usr/local/bin/scontrol update nodename=NODE_NAME state=resume
</pre>
to put it back into operation.

h2. Nodes down

Sometimes nodes are reported as "down". This seems to happen as a result of network problems. Here is some "troubleshooting":https://computing.llnl.gov/linux/slurm/troubleshoot.html#nodes for this situation. Also, after a re-boot of cosmofs1 some manual work on slurm might be necessary to get going again.
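A quick way to check for such nodes is sinfo's -R option, which lists down, drained, and failing nodes together with the reason slurm recorded:

```shell
# list unavailable nodes with the recorded reason
/usr/local/bin/sinfo -R
```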

If a job does not finish and remains in the state "CG", the sequence
<pre>
/usr/local/bin/scontrol update NodeName=euclides13-os State=down Reason=hung_proc
/usr/local/bin/scontrol update NodeName=euclides13-os State=resume Reason=hung_proc
</pre>
brings the node back again.

h2. History

* January 23rd 2018: Jobs on euclides12 were no longer finishing. They ended up in the state "CG" and hung there forever. In the slurmd log there was the entry "[2018-01-23T10:12:17.477] [18153] error: Unable to establish controller machine" basically every 15 minutes or so. ssh from euclides12 to cosmogw via name and IP address was possible, so it is difficult to interpret this error message. In the end the problem was solved by:
** stopping slurmd
** removing /var/run/slurmd.pid
** creating /var/run/slurmd.pid via touch
** re-starting slurmd

* May 18th 2017: On cosmogw, three nodes were reported as "DOWN" despite running the slurmd daemon and having connections to the slurmctl daemon on the control node; it turns out that with a normal "/etc/init.d/slurm start" on the control machine only nodes are considered that are *not* DOWN; "/etc/init.d/slurm startclean" must be used to establish new connections to all nodes to take them back into the queue.

* May 2nd 2017: the control daemon on cosmofs1 was no longer working; it also could not be re-started; the corresponding commands "/etc/init.d/slurm status/start" gave no feedback of any kind, and the log files were empty; the relevant daemon on the nodes, "slurmd", was running smoothly; a comparison revealed that the difference was whether the command "/usr/local/bin/scontrol show daemon" returns the daemon name or nothing, and in the latter case the daemon does not run well; further investigation showed that the machine name given in "slurm.conf" as "ControlMachine=" needs to be identical to the name returned by the command "hostname"; this was no longer the case, likely induced by moving the machines to the new sub-net (the exact mechanism is unclear).

* April 24th 2017: took euclides11 out of the queues to free it for the new OS and the slurm test on it; euclides10 is now the development node.

* April 07th 2017: Applying "/usr/local/bin/scontrol show node euclides11" to the debug partition node euclides11 said "Reason=Node unexpectedly rebooted [root@2016-12-14T13:25:01]"; internet research suggested changing "ReturnToService=" from 1 to 2 in the configuration file; after applying the new configuration file and restarting, the debug node worked again.

* April 06th 2017: After the reconfiguration of the cluster the slurm configuration file was adjusted (to reflect the new machine names); minor changes also had to be applied to the scripts "newconfig.sh" and "restart.sh" to loop over the new names; the new configuration files were applied and slurm restarted; all computing nodes for the normal partition came up, the debug partition stayed down.

* March 29th 2017: euclides7 was in drain state; "/usr/local/bin/scontrol show node euclides7" said "Reason=Epilog error"; when resumed, it seemed to work normally.

* March 28th 2017: euclides2 was in drain state; when resumed, it went back into drain state the next time it was used; "/usr/local/bin/scontrol show node euclides2" said "Reason=Prolog error"; after a reboot the machine was in status "idle*"; when resumed, it worked again.
373 | 63 | Martin Kuemmel | * March 29th 2017: euclides7 is in drain state; "/usr/local/bin/scontrol show node euclides2" says "Reason=Epilog error"; when resumed, seems to work normal; |
374 | 63 | Martin Kuemmel | |
375 | 63 | Martin Kuemmel | * March 28th 2017: euclides2 is in drain state; when resumed, it goes into drain state when using it the next time; "/usr/local/bin/scontrol show node euclides2" says "Reason=Prolog error"; after a reboot the machine was in status "idle*"; when resumed, it worked again; |