New cluster » History » Version 21

Martin Kuemmel, 07/04/2023 08:49 AM

h1. New computing cluster in Koeniginstrasse

h2. Introduction

Since January 2022 we have a new computing cluster, which is installed in the server room of the physics department at Koeniginstrasse.

h2. Hardware

* there are in total 9 compute nodes available;
* eight nodes are named "usm-cl-bt01n[1-4]" and "usm-cl-bt02n[1-4]";
* there is one new node (Nov. 2022) named "usm-cl-1024us01";
* each node usm-cl-bt[01, 02]n[1-4] has 128 logical cores (64 physical cores) and 512 GB RAM available;
* the node "usm-cl-1024us01" has 256 logical cores (128 physical cores) and 1024 GB RAM available;
* the storage volume for our group has 686 TB (/project/ls-mohr);

h2. Login

* public login server (for non-graphical access, e.g. ssh): login.physik.uni-muenchen.de;
* Jupyterhub: https://jupyter.physik.uni-muenchen.de;
* both the login server and the Jupyterhub require two-factor authentication;
* for the second factor you need to register a smartphone app such as Google Authenticator (or any other app that generates time-based one-time passwords) here: https://otp.physik.uni-muenchen.de. You need to create a so-called "soft-token".
* for all logins you need to provide (see the example after this list):
** the user name of your *physics account*;
** the password of your *physics account*;
** the 6-digit number (soft-token) you read from the smartphone app;
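
A login could then look like this ("jane.doe" is just a placeholder for a physics account name, and the exact prompts may differ):
<pre>
$ ssh jane.doe@login.physik.uni-muenchen.de
Password:           <- the password of the physics account
One-time password:  <- the current 6-digit soft-token from the smartphone app
</pre>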

h2. Graphical Remote Login

A graphical remote login from outside the LMU network requires a VPN connection. Since June 2022 the only VPN connection is provided by "eduVPN":https://doku.lrz.de/display/PUBLIC/VPN+-+eduVPN+-+Installation+und+Konfiguration. After establishing a VPN connection, the login is done with X2GO as explained "here":https://www.en.it.physik.uni-muenchen.de/dienste/netzwerk/rechnerzugriff/zugriff3/remote_login/index.html. I was pointed to the following login servers:
* cip-sv-login01.cip.physik.uni-muenchen.de
* cip-sv-login02.cip.physik.uni-muenchen.de

but I assume that the other connections recommended on the web page of the physics department (e.g. Garching) work as well. X2GO opens a KDE desktop, and from that machine you can of course connect to our cluster.

h2. Processing

* as on our local cluster, "slurm" is used as the job scheduling system. Access to the computing nodes and running jobs requires starting a corresponding slurm job;
* the partition of our cluster is "usm-cl";
* from the login node you can start an interactive job via "intjob --partition=usm-cl" (additional slurm arguments are accepted as well);
* I created a "python script":https://cosmofs3.kosmo.physik.uni-muenchen.de/attachments/download/285/scontrol.py which provides information on our partition (which jobs are running on which node, the owner of each job and so on);
* I have also put together a rather simple "slurm script":https://cosmofs3.kosmo.physik.uni-muenchen.de/attachments/download/283/test.slurm which can be used as a starting point (see also the sketch after this list);
* note that it is possible to directly "ssh" to all nodes on which one of your batch jobs is running. This can help to supervise the processing;
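
For orientation, here is a minimal batch script sketch for our partition (an illustration only, not the attached test.slurm; the job name, resource numbers and program name are placeholders to adapt):
<pre>
#!/bin/bash
#SBATCH --partition=usm-cl        # our partition
#SBATCH --job-name=example        # arbitrary job name
#SBATCH --ntasks=1                # number of tasks
#SBATCH --cpus-per-task=4         # cores per task
#SBATCH --time=01:00:00           # wall clock limit
#SBATCH --output=example_%j.log   # log file ("%j" is the job id)

srun ./my_program                 # replace with the actual command
</pre>
After submitting it with "sbatch", the command "squeue --partition=usm-cl" shows which jobs are running or queued on our partition.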

h2. Disk space

* users can create their own disk space under "/project/ls-mohr/users/", such as "/project/ls-mohr/users/martin.kuemmel" (see the example below);
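
A minimal example, assuming the directory is named after your physics account name (as in "martin.kuemmel" above):
<pre>
$ mkdir /project/ls-mohr/users/$USER
</pre>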

h2. Installed software

We use a package manager called "spack" to download and install software that is not directly available from the Linux distribution. To see what is already installed, do the following on a computing node:

* "module load spack"
* "module avail"

Adding more software is not a problem.

h2. Euclid processing on the cluster

While the OS, libraries and setup are different from EDEN-?.?, it is possible to load and run an EDEN-3.0 environment using a container solution. The cluster offers "singularity":https://sylabs.io/guides/3.0/user-guide/quick_start.html as a container solution. While singularity is not officially supported in Euclid, it is being used in a limited role, and singularity is able to run docker images, which is the supported container format in Euclid. To work in an EDEN-3.0 environment on the new cluster, proceed as follows:
* load singularity via:
  <pre>
  $ module load spack
  $ module load singularity</pre> Note that the singularity version which is directly available on the computing nodes at "/usr/bin/singularity" does *not* work. The correct version loaded via the modules is at "/software/opt/focal/x86_64/singularity/v3.8.1/bin/singularity".
* it is *recommended* to move the singularity cache to somewhere under "/scratch-local", e.g. via:<pre>$ mkdir -p /scratch-local/$USER/singularity
$ export SINGULARITY_CACHEDIR=/scratch-local/$USER/singularity</pre> With the default cache location "$HOME/.cache/singularity" there are problems deleting the entire cache when leaving singularity.

There are docker images available on cvmfs, and one image can be run interactively via:
 <pre>singularity run --bind /cvmfs/euclid.in2p3.fr:/cvmfs/euclid.in2p3.fr --bind /cvmfs/euclid-dev.in2p3.fr:/cvmfs/euclid-dev.in2p3.fr /cvmfs/euclid-dev.in2p3.fr/WORKNODE/CentOS7.sif
</pre>
It is also possible to directly issue a command in EDEN-3.0:
 <pre>$ singularity run --bind /cvmfs/euclid.in2p3.fr:/cvmfs/euclid.in2p3.fr --bind /cvmfs/euclid-dev.in2p3.fr:/cvmfs/euclid-dev.in2p3.fr /cvmfs/euclid-dev.in2p3.fr/WORKNODE/CentOS7.sif <command_name></pre>
In both cases the relevant EDEN environment must first be loaded with one of:
<pre>
$ source /cvmfs/euclid-dev.in2p3.fr/CentOS7/EDEN-3.0/bin/activate
$ source /cvmfs/euclid-dev.in2p3.fr/EDEN-3.1/bin/activate
</pre>

Information on the usage of singularity in Euclid is available at the "Euclid Redmine":https://euclid.roe.ac.uk/projects/codeen-users/wiki/EDEN_SINGULARITY.

h3. Problems with the cvmfs

All Euclid-related software is centrally installed and deployed via cvmfs. This means that the two directories:
<pre>
martin.kuemmel@usm-cl-bt02n4:~$ ls -ltr /cvmfs/
drwxr-xr-x 2 cvmfs cvmfs 0 Feb 13 16:09 euclid-dev.in2p3.fr
drwxr-xr-x 2 cvmfs cvmfs 0 Feb 13 17:22 euclid.in2p3.fr
</pre>
*must* exist on the host machine such that they can be mounted in singularity as indicated above. It looks like cvmfs sometimes "gets stuck" and needs to be re-installed or re-mounted. If there are problems mounting cvmfs in singularity and the above directories do not exist on the host, please write a ticket to the sysadmins (see Support below) and they will fix it.
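
A quick check before writing a ticket is the same "ls" as above:
<pre>
$ ls /cvmfs/euclid.in2p3.fr /cvmfs/euclid-dev.in2p3.fr
</pre>
If one of the two directories is missing or the command hangs, cvmfs is most likely stuck as described above.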

h3. Old

In 2022 the docker image could/had to be downloaded as follows:
* pull the Euclid docker image via: <pre>singularity pull --docker-login docker://gitlab.euclid-sgs.uk:4567/st-tools/ct_xodeen_builder/dockeen</pre> With the gitlab credentials the docker image is stored in the file "dockeen_latest.sif".

Now (July 2023) I am not sure whether this is still possible.

h2. Support

Support is provided by the IT support (Rechnerbetriebsgruppe) of the LMU faculty of physics via the helpdesk email: helpdesk@physik.uni-muenchen.de. Please keep Joe Mohr and me (Martin Kuemmel: mkuemmel@usm.lmu.de) in the loop so that we can maintain an overview of the cluster performance.