New cluster » History » Version 19
Martin Kuemmel, 03/10/2023 09:24 AM
1 | 15 | Igor Zinchenko | h1. New computing cluster in Koeniginstrasse |
---|---|---|---|
2 | 1 | Martin Kuemmel | |
3 | 1 | Martin Kuemmel | h2. Introduction |
4 | 1 | Martin Kuemmel | |
5 | 15 | Igor Zinchenko | Since January 2022 we have had a new computing cluster, which is installed in the server room of the physics department at Koeniginstrasse. |
6 | 1 | Martin Kuemmel | |
7 | 3 | Martin Kuemmel | h2. Hardware |
8 | 3 | Martin Kuemmel | |
9 | 16 | Martin Kuemmel | * there are in total 9 compute nodes available; |
10 | 16 | Martin Kuemmel | * eight nodes are named "usm-cl-bt01n[1-4]" and "usm-cl-bt02n[1-4]"; |
11 | 16 | Martin Kuemmel | * there is one new node (Nov. 2022) named "usm-cl-1024us01"; |
12 | 16 | Martin Kuemmel | * each node usm-cl-bt[01, 02]n[1-4] has 128 logical cores (64 physical cores) and 512GB RAM available; |
13 | 16 | Martin Kuemmel | * the node "usm-cl-1024us01" has 256 logical cores (128 physical cores) and 1024GB RAM available; |
14 | 15 | Igor Zinchenko | * the storage for our group provides 686TB (/project/ls-mohr); |
15 | 3 | Martin Kuemmel | |
16 | 1 | Martin Kuemmel | h2. Login |
17 | 1 | Martin Kuemmel | |
18 | 19 | Martin Kuemmel | * public login server (for non-graphical access, e.g. ssh): login.physik.uni-muenchen.de; |
19 | 1 | Martin Kuemmel | * Jupyterhub: https://jupyter.physik.uni-muenchen.de; |
20 | 19 | Martin Kuemmel | * both the login server and the Jupyterhub require two-factor authentication; |
21 | 19 | Martin Kuemmel | * for the second factor you need to register with a smartphone app such as Google Authenticator (or any other app that generates time-based one-time passwords) here: https://otp.physik.uni-muenchen.de. You need to create a so-called "soft-token"; |
22 | 19 | Martin Kuemmel | * for all logins you need to provide: |
23 | 19 | Martin Kuemmel | ** the user name of your *physics account*; |
24 | 19 | Martin Kuemmel | ** the password of your *physics account*; |
25 | 19 | Martin Kuemmel | ** the 6-digit number (soft-token) you read from the smartphone app; |
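For the non-graphical login, a minimal sketch of the ssh call could look as follows. The helper name "login_command" and the variable "PHYSICS_USER" are made up for illustration; only the host name comes from the list above. After connecting, the server asks for the password of the physics account and the soft-token.

```shell
#!/bin/sh
# Public login server named in the list above.
LOGIN_HOST="login.physik.uni-muenchen.de"

# Print the ssh command for a given physics account name.
# (Hypothetical helper, it only builds the command and does not connect.)
login_command() {
    printf 'ssh %s@%s\n' "$1" "$LOGIN_HOST"
}

# PHYSICS_USER is an assumed variable; "your.name" is a placeholder.
login_command "${PHYSICS_USER:-your.name}"
```

Copy the printed command into a terminal, or call ssh directly with your own account name.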
26 | 1 | Martin Kuemmel | |
27 | 13 | Martin Kuemmel | h2. Graphic Remote Login |
28 | 13 | Martin Kuemmel | |
29 | 13 | Martin Kuemmel | A graphical remote login from outside the LMU network requires a VPN connection. Since June 2022 the only VPN connection is the one provided by "eduVPN":https://doku.lrz.de/display/PUBLIC/VPN+-+eduVPN+-+Installation+und+Konfiguration. After establishing a VPN connection, the login is done with X2GO as explained "here":https://www.en.it.physik.uni-muenchen.de/dienste/netzwerk/rechnerzugriff/zugriff3/remote_login/index.html. I was advised to use the following login servers: |
30 | 13 | Martin Kuemmel | * cip-sv-login01.cip.physik.uni-muenchen.de |
31 | 13 | Martin Kuemmel | * cip-sv-login02.cip.physik.uni-muenchen.de |
32 | 13 | Martin Kuemmel | |
33 | 13 | Martin Kuemmel | but I am assuming the connections for Garching work as well. X2GO opens a KDE desktop, and of course the machine can connect to our cluster. |
34 | 13 | Martin Kuemmel | |
35 | 1 | Martin Kuemmel | |
36 | 1 | Martin Kuemmel | h2. Processing |
37 | 1 | Martin Kuemmel | |
38 | 1 | Martin Kuemmel | * as on our local cluster, "slurm" is used as the job scheduling system. Accessing the computing nodes and running jobs requires starting a corresponding slurm job; |
39 | 1 | Martin Kuemmel | * the partition of our cluster is "usm-cl"; |
40 | 1 | Martin Kuemmel | * from the login node you can start an interactive job via "intjob --partition=usm-cl" (additional slurm arguments are accepted as well); |
41 | 8 | Martin Kuemmel | * I created a "python script":https://cosmofs3.kosmo.physik.uni-muenchen.de/attachments/download/285/scontrol.py which provides information on our partition (which jobs are running on which node, the owner of the job and so on); |
42 | 11 | Martin Kuemmel | * I have also put together a rather silly "slurm script":https://cosmofs3.kosmo.physik.uni-muenchen.de/attachments/download/283/test.slurm which can be used as a starting point; |
43 | 11 | Martin Kuemmel | * note that it is possible to directly "ssh" to all nodes on which one of your batch jobs is running. This can help to supervise the processing; |
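As an illustration of the points above, here is a minimal batch script sketch for the "usm-cl" partition. It is not the "test.slurm" script linked above; the job name, resource values and output file name are illustrative assumptions that need to be adapted.

```shell
#!/bin/bash
# Partition of our cluster (from the list above).
#SBATCH --partition=usm-cl
# Everything below is an illustrative assumption, adapt as needed.
#SBATCH --job-name=example
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:10:00
# %j is replaced by the job ID.
#SBATCH --output=example-%j.out

# Report where the job runs. The fallbacks keep the script runnable
# outside of slurm, e.g. for a quick syntax check.
job_summary() {
    echo "job ${SLURM_JOB_ID:-none} on $(uname -n) with ${SLURM_CPUS_PER_TASK:-1} cpus"
}

job_summary
```

Assuming the script is saved as "example.slurm", it is submitted from the login node with "sbatch example.slurm", and "squeue -u $USER" lists your pending and running jobs.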
44 | 1 | Martin Kuemmel | |
45 | 1 | Martin Kuemmel | h2. Disk space |
46 | 1 | Martin Kuemmel | |
47 | 1 | Martin Kuemmel | * users can create their own disk space under "/project/ls-mohr/users/" such as "/project/ls-mohr/users/martin.kuemmel"; |
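A minimal sketch for creating the personal directory. The helper name "make_user_dir" is hypothetical, and the sketch assumes that "$USER" equals the physics account name, as in the example path above.

```shell
#!/bin/sh
# Create a personal directory below a base directory and print its path.
# Arguments (both optional): base directory, user name.
make_user_dir() (
    base="${1:-/project/ls-mohr/users}"
    user="${2:-$USER}"
    mkdir -p "$base/$user" && echo "$base/$user"
)

# Usage on the cluster (not executed here):
#   make_user_dir
```

"mkdir -p" does not complain if the directory already exists, so the call can be repeated safely.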
48 | 1 | Martin Kuemmel | |
49 | 1 | Martin Kuemmel | h2. Installed software |
50 | 1 | Martin Kuemmel | |
51 | 1 | Martin Kuemmel | We use a package manager called spack to download and install software that is not directly available from the Linux distribution. To see what is already installed, run the following on a computing node: |
52 | 1 | Martin Kuemmel | |
53 | 1 | Martin Kuemmel | * "module load spack" |
54 | 1 | Martin Kuemmel | * "module avail" |
55 | 1 | Martin Kuemmel | |
56 | 1 | Martin Kuemmel | Adding more software is not a problem. |
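To look for a specific package among the installed modules, the two commands above can be combined with grep. This is a sketch with a hypothetical helper name "find_module". It assumes that "module avail" prints its list to stderr, which is the case for common module implementations and is why the output is redirected.

```shell
#!/bin/sh
# List the available modules whose name matches a pattern.
# Must run where the "module" command exists, i.e. on a computing node.
find_module() {
    module load spack
    # "module avail" is assumed to write to stderr, so merge it into stdout.
    module avail 2>&1 | grep -i -- "$1"
}

# Usage on a computing node (not executed here):
#   find_module singularity
```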
57 | 9 | Martin Kuemmel | |
58 | 10 | Martin Kuemmel | h2. Euclid processing on the cluster |
59 | 10 | Martin Kuemmel | |
60 | 10 | Martin Kuemmel | While the OS, libraries and setup are different from EDEN-?.?, it is possible to load and run an EDEN-3.0 environment using a container solution. The cluster offers "singularity":https://sylabs.io/guides/3.0/user-guide/quick_start.html as a container solution. While singularity is not officially supported in Euclid, it is being used in a limited role, and singularity is able to run docker images, which is the supported container format in Euclid. To work in an EDEN-3.0 environment on the new cluster you need to get the docker image as follows: |
61 | 10 | Martin Kuemmel | * load singularity via: |
62 | 10 | Martin Kuemmel | <pre> |
63 | 10 | Martin Kuemmel | $ module load spack |
64 | 10 | Martin Kuemmel | $ module load singularity</pre> Note that the singularity version which is directly available on the computing nodes at "/usr/bin/singularity" does *not* work. The correct version loaded via the modules is at "/software/opt/focal/x86_64/singularity/v3.8.1/bin/singularity". |
65 | 14 | Martin Kuemmel | * it is *recommended* to move the singularity cache to somewhere under "/scratch-local", e.g. via:<pre>$ mkdir -p /scratch-local/$USER/singularity |
66 | 14 | Martin Kuemmel | $ export SINGULARITY_CACHEDIR=/scratch-local/$USER/singularity</pre> With the default cache location "$HOME/.cache/singularity" there are problems deleting the entire cache when leaving singularity. |
67 | 10 | Martin Kuemmel | * pull the Euclid docker image via: <pre>singularity pull --docker-login docker://gitlab.euclid-sgs.uk:4567/st-tools/ct_xodeen_builder/dockeen</pre> After you enter your gitlab credentials, the docker image is stored in the file "dockeen_latest.sif". |
68 | 10 | Martin Kuemmel | |
69 | 10 | Martin Kuemmel | The docker image can be run interactively: |
70 | 12 | Martin Kuemmel | <pre>$ singularity run --bind /cvmfs/euclid.in2p3.fr:/cvmfs/euclid.in2p3.fr --bind /cvmfs/euclid-dev.in2p3.fr:/cvmfs/euclid-dev.in2p3.fr <path_to>dockeen_latest.sif</pre> |
71 | 10 | Martin Kuemmel | It is also possible to directly issue a command in EDEN-3.0: |
72 | 12 | Martin Kuemmel | <pre>$ singularity exec --bind /cvmfs/euclid.in2p3.fr:/cvmfs/euclid.in2p3.fr --bind /cvmfs/euclid-dev.in2p3.fr:/cvmfs/euclid-dev.in2p3.fr <path_to>dockeen_latest.sif <command_name></pre> |
73 | 10 | Martin Kuemmel | In both cases the relevant EDEN environment must first be loaded with: |
74 | 10 | Martin Kuemmel | <pre> |
75 | 10 | Martin Kuemmel | $ source /cvmfs/euclid-dev.in2p3.fr/CentOS7/EDEN-3.0/bin/activate |
76 | 10 | Martin Kuemmel | </pre> |
77 | 10 | Martin Kuemmel | |
78 | 10 | Martin Kuemmel | Information on the usage of singularity in Euclid is available at the "Euclid Redmine":https://euclid.roe.ac.uk/projects/codeen-users/wiki/EDEN_SINGULARITY. |
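The bind options and the activation step described above can be wrapped in one helper. The following is a sketch under assumptions: the function name "eden_exec", the "DRY_RUN" switch and the default image location are made up for illustration, and chaining the "source" call with the actual command through "bash -c" is one possible way to activate EDEN-3.0 before the command runs, not a documented Euclid procedure.

```shell
#!/bin/sh
# Location of the pulled image: an assumption, adjust to your own path.
EDEN_IMAGE="${EDEN_IMAGE:-$HOME/dockeen_latest.sif}"
# Activation script named in the text above.
EDEN_ACTIVATE="/cvmfs/euclid-dev.in2p3.fr/CentOS7/EDEN-3.0/bin/activate"

# Run one command line inside the container with EDEN-3.0 activated.
# Set DRY_RUN=1 to print the singularity call instead of executing it.
eden_exec() (
    inner="source $EDEN_ACTIVATE && $*"
    set -- singularity exec \
        --bind /cvmfs/euclid.in2p3.fr:/cvmfs/euclid.in2p3.fr \
        --bind /cvmfs/euclid-dev.in2p3.fr:/cvmfs/euclid-dev.in2p3.fr \
        "$EDEN_IMAGE" bash -c "$inner"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
)

# Usage on a computing node, after "module load spack" and
# "module load singularity" (not executed here):
#   eden_exec python --version
```

The dry-run mode is useful to check the bind paths and the image location before starting the container.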
79 | 10 | Martin Kuemmel | |
80 | 17 | Martin Kuemmel | h3. Problems with the cvmfs |
81 | 17 | Martin Kuemmel | |
82 | 17 | Martin Kuemmel | All Euclid related software is centrally installed and deployed via cvmfs. This means that the following two directories: |
83 | 17 | Martin Kuemmel | <pre> |
84 | 17 | Martin Kuemmel | martin.kuemmel@usm-cl-bt02n4:~$ ls -ltr /cvmfs/ |
85 | 17 | Martin Kuemmel | drwxr-xr-x 2 cvmfs cvmfs 0 Feb 13 16:09 euclid-dev.in2p3.fr |
86 | 17 | Martin Kuemmel | drwxr-xr-x 2 cvmfs cvmfs 0 Feb 13 17:22 euclid.in2p3.fr |
87 | 17 | Martin Kuemmel | </pre> |
88 | 17 | Martin Kuemmel | *must* exist on the host machine such that they can be mounted in singularity as indicated above. It looks like cvmfs "sometimes gets stuck" and needs to be re-installed or re-mounted. If there are problems mounting cvmfs in singularity and the above directories do not exist on the host, please write a ticket to the sysadmins (see Support below) and they will fix it. |
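Before starting singularity, the presence of the two directories can be checked with a few lines of shell. This is a minimal sketch; the helper name "check_cvmfs" is hypothetical, and the check only tests that each directory exists and can be listed.

```shell
#!/bin/sh
# Return 0 if every given directory exists and can be listed, 1 otherwise.
# Without arguments the two Euclid cvmfs directories are checked.
check_cvmfs() (
    [ "$#" -gt 0 ] || set -- /cvmfs/euclid.in2p3.fr /cvmfs/euclid-dev.in2p3.fr
    status=0
    for dir in "$@"; do
        if [ -d "$dir" ] && ls "$dir" >/dev/null 2>&1; then
            echo "ok: $dir"
        else
            echo "missing: $dir"
            status=1
        fi
    done
    exit "$status"
)

# Usage on a computing node (not executed here):
#   check_cvmfs || echo "cvmfs problem, please contact the helpdesk"
```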
89 | 17 | Martin Kuemmel | |
90 | 9 | Martin Kuemmel | h2. Support |
91 | 9 | Martin Kuemmel | |
92 | 9 | Martin Kuemmel | Support is provided by the IT support (Rechnerbetriebsgruppe) of the LMU faculty of physics via the helpdesk email: helpdesk@physik.uni-muenchen.de. Please keep Joe Mohr and me (Martin Kuemmel: mkuemmel@usm.lmu.de) in the loop so that we can maintain an overview of the cluster performance. |