Version 57 - History - Slurm - Cluster Cosmology - Redmine

Slurm » History » Version 57

Martin Kuemmel, 07/26/2016 12:09 PM

-Kerstin Paech
+{{toc}}
 Kerstin Paech
-Sebastian Bocquet
+h1. Hardware overview
 Sebastian Bocquet
-Sebastian Bocquet
+You access the Euclid cluster through alexandria@usm.uni-muenchen.de
 Sebastian Bocquet
-Sebastian Bocquet
+* alexandria is the file server and should not be used for computing
-Sebastian Bocquet
+* There are 12 compute nodes named euclides1--euclides12
-Sebastian Bocquet
+* euclides8 hosts a virtual machine and is not available for computing
-Sebastian Bocquet
+* euclides12 is only available for debugging, see below
-Martin Kuemmel
+* euclides11 is currently used to test a different OS
-Sebastian Bocquet
+* each node has 32 logical CPUs and 64GB of RAM
 Sebastian Bocquet
-Roy Henderson
+h1. How to run jobs on the euclides nodes (using Slurm)
 Kerstin Paech
-Kerstin Paech
+Use slurm to submit jobs or log in to the euclides nodes (euclides1-12).
 Kerstin Paech
-Kerstin Paech
+*Please read through this entire wikipage so everyone can make efficient use of this cluster*
 Kerstin Paech
-Kerstin Paech
+h2. alexandria
 Kerstin Paech
-Kerstin Paech
+*Please do not use alexandria as a compute node* - its hardware is different from the nodes. It hosts our file server and other services that are important to us.
 Kerstin Paech
-Kerstin Paech
+You should use alexandria to
-Kerstin Paech
+* transfer files
-Sebastian Bocquet
+* compile your code
-Sebastian Bocquet
+* submit jobs to the nodes
 Sebastian Bocquet
-Sebastian Bocquet
+If you need to debug and would like to log in to a node, please start an interactive job on one of the nodes using slurm. For instructions see below.
 Sebastian Bocquet
-Sebastian Bocquet
+h2. euclides nodes
 Sebastian Bocquet
 Kerstin Paech
-Kerstin Paech
+Job submission to the euclides nodes is handled by the slurm jobmanager (see http://slurm.schedmd.com and https://computing.llnl.gov/linux/slurm/).
-Sebastian Bocquet
+*Important: In order to run jobs, you need to be added to the slurm accounting system - please contact the admin*
 Kerstin Paech
-Kerstin Paech
+All slurm commands listed below have very helpful man pages (e.g. man slurm, man squeue, ...).
 Kerstin Paech
-Kerstin Paech
+If you are already familiar with another jobmanager, the following information may be helpful to you: http://slurm.schedmd.com/rosetta.pdf
 Kerstin Paech
-Kerstin Paech
+h3. Scheduling of Jobs
 Kerstin Paech
-Kerstin Paech
+At this point there are two queues, called partitions in slurm:
-Kerstin Paech
+* *normal* which is the default partition your jobs will be sent to if you do not specify otherwise. At this point there is a time limit of
-Kerstin Paech
+two days. Jobs can currently only run on 1 node.
-Kerstin Paech
+* *debug* which is meant for debugging. You can only run one job at a time; other submitted jobs will remain in the queue. Time limit is
-Kerstin Paech
+hours.
 Kerstin Paech
-Kerstin Paech
+The default memory per core is 2GB. If you need more or less, please specify it with the --mem or --mem-per-cpu option.
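As a rough illustration of how the per-core setting adds up, the sketch below computes the total memory a job would request with --mem-per-cpu and compares it with the 64GB of a node. The task count and the per-core value are example numbers, and treating 64GB as 65536 MB is an assumption; the memory slurm actually makes available on a node may be somewhat lower.

```shell
#!/bin/bash
# Example numbers only: 4 tasks with 8000 MB each (as in --mem-per-cpu=8000).
ntasks=4
mem_per_cpu_mb=8000

# Assumption: a node's 64GB is counted as 64 * 1024 MB.
node_mem_mb=$((64 * 1024))

# --mem-per-cpu is multiplied by the number of allocated CPUs.
total_mb=$((ntasks * mem_per_cpu_mb))
echo "Requested: ${total_mb} MB of ${node_mem_mb} MB"

if [ "${total_mb}" -le "${node_mem_mb}" ]; then
    fits="yes"
else
    fits="no"
fi
echo "Fits on one node: ${fits}"
```

Note that --mem sets the memory per node, while --mem-per-cpu sets it per allocated CPU, so only one of the two should be given for a job.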
 Kerstin Paech
-Kerstin Paech
+We have also set up a scheduler that goes beyond first come, first served - some jobs will be favoured over others depending
-Kerstin Paech
+on how much you or your group have been using euclides in the past 2 weeks, how long the job has been queued and how many
-Kerstin Paech
+resources it will consume.
 Kerstin Paech
-Kerstin Paech
+This serves as a starting point; we may have to adjust parameters once the slurm jobmanager is in use. Job scheduling is a complex
-Kerstin Paech
+issue and we still need to build expertise and learn what the user needs in our groups are. Please feel free to speak out if
-Kerstin Paech
+there is something that can be improved without creating an unfair disadvantage for other users.
 Kerstin Paech
-Kerstin Paech
+You can run interactive jobs on both partitions.
 Kerstin Paech
-Kerstin Paech
+h3. Running an interactive job with slurm (a.k.a. logging in)
 Kerstin Paech
-Kerstin Paech
+To run an interactive job with slurm in the default partition, use
 Kerstin Paech
-Kerstin Paech
+<pre>
-Kerstin Paech
+srun -u --pty bash
-Kerstin Paech
+</pre>
 Kerstin Paech
-Shantanu Desai
+If you want to use tcsh, use
 Shantanu Desai
-Shantanu Desai
+<pre>
-Shantanu Desai
+srun -u --pty tcsh
-Shantanu Desai
+</pre>
 Shantanu Desai
-Shantanu Desai
+If you need more memory per CPU (here 8000 MB), use
 Shantanu Desai
-Shantanu Desai
+<pre>
-Shantanu Desai
+srun -u --mem-per-cpu=8000 --pty tcsh
-Shantanu Desai
+</pre>
 Shantanu Desai
-Kerstin Paech
+In case you want to open X11 applications, use the --x11=first option, e.g.
-Kerstin Paech
+<pre>
-Kerstin Paech
+srun --x11=first -u   --pty  bash
-Kerstin Paech
+</pre>
 Kerstin Paech
-Kerstin Paech
+If the 'normal' partition is overcrowded, you can use the 'debug' partition instead:
-Kerstin Paech
+<pre>
-Kerstin Paech
+srun --account cosmo_debug -p debug -u --pty bash # if you are part of the Cosmology group
-Kerstin Paech
+srun --account euclid_debug -p debug -u --pty bash  # if you are part of the EuclidDM group
-Kerstin Paech
+</pre> As soon as a slot is open, slurm will log you in to an interactive session on one of the nodes.
 Kerstin Paech
-Kerstin Paech
+h3. limited ssh access
 Kerstin Paech
-Kerstin Paech
+If you have an active job (batch or interactive), you can log in to the node the job is running on. Your ssh session will be killed if the job terminates. Your ssh session will be restricted to the same resources as your job (so you cannot accidentally bypass the job scheduler and harm other users' jobs).
 Kerstin Paech
-Kerstin Paech
+h3. Running a simple one-core batch job with slurm using the default partition
 Kerstin Paech
-Kerstin Paech
+* To see what queues are available to you (called partitions in slurm), run:
-Kerstin Paech
+<pre>
-Kerstin Paech
+sinfo
-Kerstin Paech
+</pre>
 Kerstin Paech
-Kerstin Paech
+* To submit a batch job, create a file myjob.slurm containing the following information:
-Kerstin Paech
+<pre>
-Kerstin Paech
+#!/bin/bash
-Kerstin Paech
+#SBATCH --output=slurm.out
-Kerstin Paech
+#SBATCH --error=slurm.err
-Kerstin Paech
+#SBATCH --mail-user <put your email address here>
-Kerstin Paech
+#SBATCH --mail-type=BEGIN
-Kerstin Paech
+#SBATCH -p normal
 Kerstin Paech
-Kerstin Paech
+/bin/hostname
-Kerstin Paech
+</pre>
 Kerstin Paech
-Kerstin Paech
+* To submit a batch job use:
-Kerstin Paech
+<pre>
-Kerstin Paech
+sbatch myjob.slurm
-Kerstin Paech
+</pre>
 Kerstin Paech
-Kerstin Paech
+* To see the status of your job, use
-Kerstin Paech
+<pre>
-Kerstin Paech
+squeue
-Kerstin Paech
+</pre>
 Kerstin Paech
-Kerstin Paech
+* To kill a job use:
-Kerstin Paech
+<pre>
-Kerstin Paech
+scancel <jobid>
-Kerstin Paech
+</pre> You can get the <jobid> from squeue.
 Kerstin Paech
-Kerstin Paech
+* For some more information on your job use
-Kerstin Paech
+<pre>
-Kerstin Paech
+scontrol show job <jobid>
-Kerstin Paech
+</pre> You can get the <jobid> from squeue.
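The submit and inspect steps above can be combined in a small helper. This is only a sketch: the function name submit_and_show is made up for illustration, and it assumes that the installed slurm version supports the --parsable option of sbatch, which prints just the job ID (optionally followed by a semicolon and the cluster name).

```shell
#!/bin/bash
# Hypothetical helper: submit a batch script, then show its queue entry and details.
submit_and_show() {
    local script="$1"
    local jobid

    # Only works where the slurm client tools are installed (e.g. on alexandria).
    if ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch not found; run this on a machine with slurm installed" >&2
        return 1
    fi

    # --parsable makes sbatch print "jobid" or "jobid;cluster".
    jobid="$(sbatch --parsable "${script}")" || return 1
    jobid="${jobid%%;*}"   # drop the optional ";cluster" suffix

    echo "Submitted job ${jobid}"
    squeue -j "${jobid}"
    scontrol show job "${jobid}"
}

# Usage: submit_and_show myjob.slurm
```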
 Kerstin Paech
-Kerstin Paech
+h3. Running a simple one-core batch job with slurm using the debug partition
 Kerstin Paech
-Kerstin Paech
+Change the partition to debug and add the appropriate account, depending on whether you are part of
-Kerstin Paech
+the euclid or cosmology group.
 Kerstin Paech
-Kerstin Paech
+<pre>
-Kerstin Paech
+#!/bin/bash
-Kerstin Paech
+#SBATCH --output=slurm.out
-Kerstin Paech
+#SBATCH --error=slurm.err
-Kerstin Paech
+#SBATCH --mail-user <put your email address here>
-Kerstin Paech
+#SBATCH --mail-type=BEGIN
-Martin Kuemmel
+#SBATCH --account [cosmo_debug/euclid_debug]
-Kerstin Paech
+#SBATCH -p debug
 Kerstin Paech
-Kerstin Paech
+/bin/hostname
-Kerstin Paech
+</pre>
 Kerstin Paech
-Kerstin Paech
+h3. Accessing a node where a job is running or starting additional processes on a node
 Kerstin Paech
-Kerstin Paech
+You can attach an srun command to an already existing job (batch or interactive). This
-Kerstin Paech
+means you can start an interactive session on a node where a job of yours is running
-Kerstin Paech
+or start an additional process.
 Kerstin Paech
-Kerstin Paech
+First determine the jobid of the desired job using squeue, then use
 Kerstin Paech
-Kerstin Paech
+<pre>
-Kerstin Paech
+srun  --jobid <jobid> [options] <executable>
-Kerstin Paech
+</pre>
-Kerstin Paech
+Or, more concretely:
-Kerstin Paech
+<pre>
-Kerstin Paech
+srun  --jobid <jobid> -u --pty  bash # to start an interactive session
-Kerstin Paech
+srun  --jobid <jobid> ps -eaFAl  # to get detailed process information
-Kerstin Paech
+</pre>
 Kerstin Paech
-Kerstin Paech
+The processes will only run on cores that have been allocated to you. This works
-Kerstin Paech
+for batch as well as interactive jobs.
-Kerstin Paech
+*Important: If the original job that was submitted is finished, any process
-Kerstin Paech
+attached in this fashion will be killed.*
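A minimal sketch of this workflow is shown here. The function name run_in_job and the job ID in the usage line are placeholders; the squeue format string is only one possible choice of columns.

```shell
#!/bin/bash
# Hypothetical helper: run a command inside an already running job of yours.
run_in_job() {
    local jobid="$1"
    shift

    if ! command -v srun >/dev/null 2>&1; then
        echo "srun not found; run this on a machine with slurm installed" >&2
        return 1
    fi

    # All remaining arguments are passed on as the command to execute.
    srun --jobid "${jobid}" "$@"
}

# List your own jobs without a header line: job ID, partition, state, node list.
#   squeue -h -u "${USER}" -o "%i %P %T %N"
# Then attach to one of them (12345 is a placeholder job ID):
#   run_in_job 12345 ps -eaFAl
```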
 Kerstin Paech
 Kerstin Paech
-Kerstin Paech
+h3. Batch script for running a multi-core job
 Kerstin Paech
-Kerstin Paech
+MPI is installed on alexandria.
 Kerstin Paech
-Kerstin Paech
+To run a 4 core job for an executable compiled with mpi you can use
-Kerstin Paech
+<pre>
-Kerstin Paech
+#!/bin/bash
-Kerstin Paech
+#SBATCH --output=slurm.out
-Kerstin Paech
+#SBATCH --error=slurm.err
-Kerstin Paech
+#SBATCH --mail-user <put your email address here>
-Kerstin Paech
+#SBATCH --mail-type=BEGIN
-Kerstin Paech
+#SBATCH -n 4
 Kerstin Paech
-Kerstin Paech
+mpirun <programname>
 Kerstin Paech
-Kerstin Paech
+</pre>
-Kerstin Paech
+and it will automatically start on the number of cores specified.
 Kerstin Paech
-Kerstin Paech
+To ensure that the job is being executed on only one node, add
-Kerstin Paech
+<pre>
-Kerstin Paech
+#SBATCH -N 1
-Kerstin Paech
+</pre>
-Kerstin Paech
+to the job script.
 Kerstin Paech
-Kerstin Paech
+If you would like to run a program that itself starts processes, you can use the
-Kerstin Paech
+environment variable $SLURM_NPROCS that is automatically defined for slurm
-Kerstin Paech
+jobs to explicitly pass the number of cores the program can run on.
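A minimal sketch of such a job script follows. The program name and its --threads flag are hypothetical, and the fallback to 1 core is only there so that the script also runs outside of slurm, where $SLURM_NPROCS is not defined.

```shell
#!/bin/bash
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH -n 4
#SBATCH -N 1

# Number of cores allocated by slurm; defaults to 1 when not run under slurm.
NCORES="${SLURM_NPROCS:-1}"
echo "Running on ${NCORES} cores"

# Hypothetical program and flag, shown for illustration only:
# ./my_program --threads "${NCORES}"
```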
 Kerstin Paech
-Kerstin Paech
+To check if your job is actually running on the specified number of cores, you can check
-Kerstin Paech
+the PSR column of
-Kerstin Paech
+<pre>
-Kerstin Paech
+ps -eaFAl
-Kerstin Paech
+# or ps -eaFAl | egrep "<yourusername>|UID" if you just want to see your jobs
-Kerstin Paech
+</pre>
 Jiayi Liu
-Kerstin Paech
+h3. environment for jobs
 Jiayi Liu
-Kerstin Paech
+By default, slurm does not initialize the environment (using .bashrc, .profile, .tcshrc, ...).
 Kerstin Paech
-Kerstin Paech
+To use your usual system environment, add the following line in the submission script:
-Jiayi Liu
+<pre>
-Jiayi Liu
+#SBATCH --get-user-env
-Kerstin Paech
+</pre>
 Kerstin Paech
 Kerstin Paech
-Kerstin Paech
+h2. Software specific setup
 Kerstin Paech
-Kerstin Paech
+h3. Python environment
 Kerstin Paech
-Kerstin Paech
+You can use the python 2.7.3 installed on the euclides cluster by using
 Jiayi Liu
-Jiayi Liu
+<pre>
-Jiayi Liu
+source /data2/users/ccsoft/etc/setup_all
-Kerstin Paech
+source  /data2/users/ccsoft/etc/setup_python2.7.3
-Shantanu Desai
+</pre>
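In a batch job, the same setup can be done at the top of the job script. The sketch below uses the two setup files named above and skips any that are missing, so it can also be tried outside the cluster; the variable PYTHON_BIN is introduced here only for illustration.

```shell
#!/bin/bash
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH -p normal

# Source the setup files if they exist on this machine.
for setup in /data2/users/ccsoft/etc/setup_all /data2/users/ccsoft/etc/setup_python2.7.3; do
    if [ -f "${setup}" ]; then
        source "${setup}"
    else
        echo "setup file not found: ${setup}" >&2
    fi
done

# Report which python (if any) is now first in the PATH.
PYTHON_BIN="$(command -v python || true)"
echo "python found at: ${PYTHON_BIN:-none}"
```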
 Shantanu Desai
 Shantanu Desai
-Shantanu Desai
+h2. Notes For Euclid users
 Shantanu Desai
-Shantanu Desai
+For those submitting jobs to the euclides* nodes through the Cosmo DM pipeline, here are some things that need to be specified for customized job submissions,
-Shantanu Desai
+since a different interface to slurm is used.
 Shantanu Desai
-Shantanu Desai
+* To use more memory per block, specify max_memory = 6000 (for 6 GB) and so on, inside the block definition or in the submit file (in
-Shantanu Desai
+case you want to use it for all blocks).
 Shantanu Desai
-Shantanu Desai
+* If you want to run on multiple nodes/cores, use
-Shantanu Desai
+nodes='<number of nodes>:ppn=<number of cores>' inside the block definition of a particular block, or in the submit file in case you want
-Kerstin Paech
+to use it for all blocks.
 Shantanu Desai
-Shantanu Desai
+* If you want to use a longer wall time, specify wall_mod=<wall time in minutes> inside the module definition.
 Shantanu Desai
-Shantanu Desai
+* Note that queue=serial does not work on alexandria (we usually use it for c2pap).
 Roy Henderson
-Roy Henderson
+h1. Admin
 Roy Henderson
-Martin Kuemmel
+There is a user "slurm", which however is not really necessary for the administration work. The slurm administrator needs sudo access. Some scripts for adding a user and similar tasks are in "/data1/users/slurm". With sudo access the admin can execute those scripts. In the mysql database there is the username "slurmdb" with password.
 Martin Kuemmel
-Sebastian Bocquet
+h2. Overview of users, accounts, etc.
 Sebastian Bocquet
-Sebastian Bocquet
+No sudo access needed:
-Sebastian Bocquet
+<pre>
-Sebastian Bocquet
+/usr/local/bin/sacctmgr show account withassoc
-Sebastian Bocquet
+</pre>
 Sebastian Bocquet
-Roy Henderson
+h2. Adding a new user
 Roy Henderson
-Roy Henderson
+As root on @alexandria@,
 Roy Henderson
-Roy Henderson
+<pre>
-Roy Henderson
+cd /data1/users/slurm/
-Sebastian Bocquet
+./add_user.sh <UserName> <account>   # <account> is cosmo or euclid
-Sebastian Bocquet
+/usr/local/bin/scontrol reconfigure
-Roy Henderson
+</pre>
 Roy Henderson
-Roy Henderson
+h2. To increase memory, cores etc for a user
 Roy Henderson
-Roy Henderson
+The script above contains various commands for changing user settings, e.g.
 Roy Henderson
-Roy Henderson
+<pre>
-Roy Henderson
+/usr/local/bin/sacctmgr -i modify user  name=$1 set GrpCPUs=32
-Roy Henderson
+/usr/local/bin/sacctmgr -i modify user  name=$1 set GrpMem=128000
-Roy Henderson
+</pre>
 Sebastian Bocquet
-Sebastian Bocquet
+h2. Node state "drain"
 Sebastian Bocquet
-Sebastian Bocquet
+If a node is shown in "drain" state when calling <pre>sinfo</pre>
-Sebastian Bocquet
+run
-Sebastian Bocquet
+<pre>
-Sebastian Bocquet
+/usr/local/bin/scontrol update nodename=NODE_NAME state=resume
-Sebastian Bocquet
+</pre>
-Sebastian Bocquet
+to put it back into operation.
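Before resuming a node it can help to check why it was drained; sinfo -R lists the recorded reasons. The helper below is a hypothetical dry-run sketch: it only prints the resume command for a given node name so that it can be reviewed before being executed.

```shell
#!/bin/bash
# Hypothetical helper: print (not execute) the resume command for a node.
resume_cmd() {
    local node="$1"
    printf '/usr/local/bin/scontrol update nodename=%s state=resume\n' "${node}"
}

# Show the reasons why nodes are down or drained:
#   sinfo -R
# Print the command for one node, e.g. euclides3:
resume_cmd euclides3
```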
 Martin Kuemmel
-Martin Kuemmel
+h2. Nodes down
 Martin Kuemmel
-Martin Kuemmel
+Sometimes nodes are reported as "down". This seems to happen as a result of network problems. Here is some "troubleshooting":https://computing.llnl.gov/linux/slurm/troubleshoot.html#nodes for this situation. Also after a re-boot of alexandria some manual work on slurm might be necessary to get going again.
