MPSD Software manager
This repository provides the mpsd-software tool, which is used to install package sets and toolchains on the MPSD HPC cluster.
It can also be used to install the software on other machines, such as Linux laptops and desktops. This is useful for working on a local machine with the same software environment as on the cluster, for example to debug a problem.
Note that this tool is under development, and the recommended way to install and use it, as well as the user interface, may change. This document will be kept up-to-date in any case.
Quick start
To install, for example, the foss2022a-serial toolchain:
-
Install this mpsd-software-manager Python package. The recommended way is to use pipx, so that the tool is available independently of any other Python environments:
$ pipx install git+https://gitlab.gwdg.de/mpsd-cs/mpsd-software-manager
-
Navigate to the location in your file system where you would like to store your "MPSD software instance" that contains the compiled software. Once compiled, the location cannot be changed. For example:
$ cd /home/user/mpsd-software
-
Initiate the installation at this location using:
$ mpsd-software init
Future calls of the mpsd-software command need to be executed from this "mpsd-software-root" directory or from one of its subdirectories.
(The above command creates a hidden file .mpsd-software-root to tag the location as the root of the installation. All compiled files, logs, etc. are written in or below this directory.)
-
From the same directory, run the command to install the foss2022a-serial toolchain:
$ mpsd-software install dev-23a foss2022a-serial
This will take some time (up to several hours depending on hardware).
-
To see installed releases, use the status command:
$ mpsd-software status
Installed MPSD software releases:
    dev-23a
-
To see the installation status and the module use command line required to activate the created modules, run the status command and specify one release:
$ mpsd-software status dev-23a
Installed toolchains (dev-23a):
- cascadelake
    foss2022a-serial
    [module use /home/user/mpsd-software/dev-23a/cascadelake/lmod/Core]
-
The status command can also be used to get a list of all packages in a toolchain and details of how they were compiled (running this may take a few seconds):
$ mpsd-software status dev-23a foss2022a-serial
listing packages installed for package_set='foss2022a-serial', microarch='cascadelake'
autoconf@2.71%gcc@11.3.0 build_system=autotools arch=linux-debian11-cascadelake
autoconf-archive@2022.02.11%gcc@11.3.0 build_system=autotools patches=139214f arch=linux-debian11-cascadelake
automake@1.16.5%gcc@11.3.0 build_system=autotools arch=linux-debian11-cascadelake
bdftopcf@1.0.5%gcc@11.3.0 build_system=autotools arch=linux-debian11-cascadelake
berkeley-db@18.1.40%gcc@11.3.0+cxx~docs+stl build_system=autotools patches=26090f4,b231fcc arch=linux-debian11-cascadelake
berkeleygw@2.1%gcc@11.3.0~debug~elpa+hdf5~mpi+openmp~python~scalapack~verbose build_system=makefile arch=linux-debian11-cascadelake
binutils@2.38%gcc@11.3.0~gas~gold+headers~interwork+ld~libiberty~lto+nls+plugins build_system=autotools libs=shared,static arch=linux-debian11-cascadelake
boost@1.80.0%gcc@11.3.0~atomic~chrono~clanglibcpp+container~context~contract~coroutine~date_time~debug+exception~fiber~filesystem~graph~graph_parallel~icu~iostreams~json~locale~log+math~mpi+multithreaded~nowide~numpy~pic~program_options~python+random~regex~serialization+shared~signals~singlethreaded~stacktrace+system~taggedlayout~test+thread~timer~type_erasure~versionedlayout~wave build_system=generic cxxstd=98 patches=a440f96 visibility=hidden arch=linux-debian11-cascadelake
... <truncated in the README>
-
To compile Octopus, source the provided configure script, for example foss2022a-serial-config.sh (as explained here). The configure scripts are located in dev-23a/spack-environments/octopus:
$ ls -1 dev-23a/spack-environments/octopus
foss2021a-cuda-mpi-config.sh
foss2021a-mpi-config.sh
foss2021a-serial-config.sh
foss2022a-cuda-mpi-config.sh
foss2022a-mpi-config.sh
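The rule from the init step above (commands must be run from the "mpsd-software-root" directory or one of its subdirectories) suggests a lookup that walks up the directory tree until it finds the .mpsd-software-root marker file. The following is a minimal sketch of that idea, not the tool's actual implementation; the function name find_root is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Marker file name as given in the Quick start text.
MARKER = ".mpsd-software-root"


def find_root(start: Path) -> Optional[Path]:
    """Return the closest directory at or above `start` that contains MARKER.

    Hypothetical helper: it only illustrates why the command works from any
    subdirectory of the root but not from outside it.
    """
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / MARKER).is_file():
            return candidate
    return None  # not inside an MPSD software instance


if __name__ == "__main__":
    print(find_root(Path.cwd()))
```

Run from anywhere inside an instance, this prints the root directory; outside of any instance it prints None.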
Documentation
This section provides more detailed documentation that goes beyond the Quick start section.
Package sets and toolchains
-
A package set is a combination of particular versions of multiple software packages (such as anaconda3, or gcc and fftw). For the software that the SSU Computational Science provides on the MPSD HPC cluster and for the Octopus continuous integration services, the packages in each set are compiled together (using Spack).
-
Toolchains are a particular type of package set:
-
the choice of software packages (typically a compiler and scientific computing libraries) and their versions follows the Easybuild toolchains (such as the FOSS toolchains).
-
all packages grouped together in a toolchain can be loaded together using the module load command. Example: the foss2022a-serial toolchain provides (in Spack notation):
- gcc@11.3.0
- binutils@2.38+headers+ld
- fftw@3.3.10+openmp~~mpi
- openblas@0.3.20
-
in addition to the Easybuild-driven choice of packages, each toolchain includes additional packages that support building Octopus within that toolchain. For foss2022a-serial these packages include:
- libxc@5.2.3
# octopus-dependencies:
- gsl@2.7.1
- sparskit@develop # 2021.06.01
- nlopt@2.7.0
- libgd@2.2.4 # 2.3.1
- libvdwxc@0.4.0~~mpi
- nfft@3.2.4
- berkeleygw@2.1~~mpi~scalapack
- python@3.9.5
- cgal@5.0.3 # 5.2
- hdf5@1.12.2~mpi
- etsf-io@1.0.4
MPSD software releases
As explained in the MPSD HPC documentation, we label software releases available on the HPC using a naming scheme consisting of the year (such as 23) and a letter starting from a. The exception is the first available software version, dev-23a, whose dev- prefix indicates that it was a development prototype.
At the moment (June 2023), there is only one release (dev-23a).
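The naming scheme can be made concrete with a small parser. The sketch below assumes a two-digit year, a single lower-case letter and an optional dev- prefix, which is inferred from the description above rather than taken from the tool; parse_release and Release are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# ASSUMPTION: optional "dev-" prefix, two-digit year, one lower-case letter.
# This pattern is inferred from the README text, not from the tool's code.
RELEASE_RE = re.compile(r"^(?P<dev>dev-)?(?P<year>\d{2})(?P<letter>[a-z])$")


class Release(NamedTuple):
    year: int      # two-digit year, e.g. 23
    letter: str    # release letter within that year, starting from "a"
    is_dev: bool   # True for development prototypes such as dev-23a


def parse_release(name: str) -> Optional[Release]:
    """Split a release label into its parts, or return None if it does not fit."""
    match = RELEASE_RE.match(name)
    if match is None:
        return None
    return Release(int(match["year"]), match["letter"], match["dev"] is not None)
```

For example, parse_release("dev-23a") yields year 23, letter "a" and is_dev set, while a toolchain name such as "foss2022a-serial" is rejected.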
For each MPSD software release, multiple toolchains and package sets are available:
$> mpsd-software available dev-23a
MPSD software release dev-23a, AVAILABLE for installation are
Toolchains:
foss2021a-cuda-mpi
foss2021a-mpi
foss2021a-serial
foss2022a-cuda-mpi
foss2022a-mpi
foss2022a-serial
Package sets:
global (octopus@12.1, octopus@12.1)
global_generic (anaconda3@2022.10)
Prerequisites
What needs to be installed for the installation to succeed?
The mpsd-software-manager Python package.
-
This needs a recent Python (3.9 or later).
-
Install via pip or pipx.
Pipx commands are:
- to install:
pipx install git+https://gitlab.gwdg.de/mpsd-cs/mpsd-software-manager
- to update:
pipx upgrade mpsd-software-manager
- to uninstall:
pipx uninstall mpsd-software-manager
-
Requirements to be able to run spack
-
The installation is only expected to work for x86 architectures at the moment.
-
The installation is only expected to work on Linux at the moment (i.e. not on OSX).
Requirements for particular toolchains and package sets
-
foss*-serial should compile with the dependencies outlined above
-
foss*-mpi currently needs the Linux header files installed (to compile the knem package)
-
foss*-cuda-mpi (probably the same as *-mpi; needs testing, TODO)
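The general prerequisites listed above (Python 3.9 or later, Linux, x86) can be checked before starting a build that may run for hours. The following is a sketch only: check_prerequisites is a hypothetical helper that is not part of mpsd-software-manager, and the list of machine names treated as x86 is an assumption.

```python
import platform
import sys
from typing import List, Tuple

# ASSUMPTION: values of platform.machine() that we treat as "x86".
X86_MACHINES = ("x86_64", "AMD64", "i386", "i686")


def check_prerequisites(
    python_version: Tuple[int, int] = sys.version_info[:2],
    system: str = platform.system(),
    machine: str = platform.machine(),
) -> List[str]:
    """Return a list of human-readable problems (empty if all checks pass)."""
    problems = []
    if python_version < (3, 9):
        problems.append("Python 3.9 or later is required")
    if system != "Linux":
        problems.append("the installation is only expected to work on Linux")
    if machine not in X86_MACHINES:
        problems.append("the installation is only expected to work on x86")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

The arguments default to the current interpreter and machine; passing explicit values makes it easy to see what the check would report for another system. Toolchain-specific requirements, such as the Linux header files for foss*-mpi, are not covered.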
Finding the Octopus configure wrapper
For each Octopus toolchain, there is an Octopus configure wrapper available. The wrapper essentially calls the configure script with the right parameters and library locations for the current toolchain. Once the toolchain is loaded, the variable $MPSD_OCTOPUS_CONFIGURE contains the path to that wrapper. The path can also be seen using the module show TOOLCHAIN_NAME command. For example:
$ mpsd-software install dev-23a foss2022a-mpi
$ module use ~/mpsd-software/dev-23a/cascadelake/lmod/Core
$ module show toolchains/foss2022a-mpi
...
depends_on("cgal/5.0.3")
depends_on("hdf5/1.12.2")
setenv("MPSD_OCTOPUS_CONFIGURE","~/mpsd-software/dev-23a/spack-environments/octopus/foss2022a-mpi-config.sh")
$ module load toolchains/foss2022a-mpi
$ echo $MPSD_OCTOPUS_CONFIGURE
~/mpsd-software/dev-23a/spack-environments/octopus/foss2022a-mpi-config.sh
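The paths in the example above follow a regular pattern relative to the mpsd-software-root directory. The sketch below composes them from their parts; the layout is read off the example output in this document and may change in future releases, and both function names are hypothetical.

```python
from pathlib import PurePosixPath


def lmod_core_dir(root: PurePosixPath, release: str, microarch: str) -> PurePosixPath:
    """Directory to pass to `module use`, as seen in the status output."""
    return root / release / microarch / "lmod" / "Core"


def octopus_configure_script(
    root: PurePosixPath, release: str, toolchain: str
) -> PurePosixPath:
    """Location of the Octopus configure wrapper for one toolchain."""
    return root / release / "spack-environments" / "octopus" / f"{toolchain}-config.sh"


if __name__ == "__main__":
    root = PurePosixPath("/home/user/mpsd-software")
    print("module use", lmod_core_dir(root, "dev-23a", "cascadelake"))
    print(octopus_configure_script(root, "dev-23a", "foss2022a-mpi"))
```

In practice, $MPSD_OCTOPUS_CONFIGURE and the status command remain the authoritative sources for these paths.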
Working example
There is an example compilation that shows the complete compilation cycle (including compilation of Octopus) using the foss2022a-serial toolchain.
Frequently asked questions
-
Can I install the mpsd-software-manager package in a Python virtual environment?
Yes. pipx is probably more convenient, but you can create your own Python virtual environment and install the mpsd-software-manager in it as a regular Python package:
python3 -m venv venv
. venv/bin/activate
pip install git+https://gitlab.gwdg.de/mpsd-cs/mpsd-software-manager
You then need to activate that Python virtual environment each time before using the tool.
-
Does the command write anything outside the mpsd-software-root directory?
No. All changes to disk take place in and below the mpsd-software-root directory (the directory in which mpsd-software init was run).
-
How can I uninstall the mpsd-software?
For now, the easiest way is to delete the mpsd-software-root directory. If you have multiple release subdirectories installed and only want to delete one, you can probably delete just that release subdirectory (such as dev-23a). (Untested.)
-
How long does the compilation take?
This depends on the hardware. A few hours per toolchain are typical. If a second toolchain is compiled in the same MPSD software instance and the same MPSD release, it is likely to be faster, in particular if the same compiler is used (and thus the compiler does not need to be re-compiled for the second toolchain).
-
How much disk storage do I need?
A toolchain needs on the order of 5 GB of disk space. The second or third toolchain (in the same MPSD software instance) will use less additional space, as libraries and tools are re-used where possible.
-
Can I have more than one MPSD software instance?
Yes.
We call "MPSD software instance" all the compiled software that is stored in and below a "mpsd-software-root" directory (see instructions above).
It is possible to install multiple MPSD software instances on the same computer, as long as they are in different (not nested) directories. This makes it possible to experiment with toolchains etc.
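Two of the answers above lend themselves to a quick check before creating a new instance: the disk space estimate and the rule that instances must not be nested. The sketch below assumes the rough figure of 5 GB per toolchain quoted above as an upper bound; both helper names are hypothetical and not part of the tool.

```python
import shutil
from pathlib import Path

GB = 1000**3  # decimal gigabyte, matching the rough "5 GB" estimate


def is_nested(a: Path, b: Path) -> bool:
    """True if the two directories are identical or one lies inside the other."""
    a, b = a.resolve(), b.resolve()
    return a == b or a in b.parents or b in a.parents


def enough_space(path: Path, toolchains: int = 1, gb_per_toolchain: float = 5.0) -> bool:
    """Compare free space at `path` with a simple per-toolchain estimate.

    ASSUMPTION: every toolchain is budgeted at the full estimate, although
    later toolchains in the same instance are expected to need less.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= toolchains * gb_per_toolchain * GB


if __name__ == "__main__":
    print(is_nested(Path("/home/user/mpsd-software"), Path("/home/user/mpsd-software/test")))
    print(enough_space(Path.cwd(), toolchains=2))
```

Comparing resolved paths by their parents, rather than by string prefix, avoids treating /data/inst1 and /data/inst10 as nested.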
Development
Developer documentation is available in development.rst.
Upcoming release 24a (for alpha testing)
Work has started to provide newer toolchains and compiler versions. For these, the structure and layout of the modules has been modified compared to what is described above (the changes are experimental and may change in the future; feedback is very welcome).
To use the new compilers on the HPC system run:
mpsd-modules 24a
module avail
You get access (as of Dec 2023) to gcc/12.2.0 (easybuild toolchain foss2022b), gcc/12.3.0 (easybuild toolchain foss2023a), and gcc/13.2.0 (also via the toolchain gcc13, as easybuild has not yet released a toolchain with any gcc13 that we could mimic). The packages are currently only compiled for the Sandybridge architecture. They can be used on more modern architectures but will not make use of additional CPU features.
When loading the toolchain metamodules (called toolchain/<easybuild version>|gcc<gcc major version>), you will load (ignoring internal dependencies):
- gcc
- openmpi
- fftw
- openblas
- scalapack
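The correspondence between compiler modules and toolchain metamodules described above can be written down as a small lookup table. This sketch simply applies the naming pattern from this section to the versions listed as of Dec 2023; the resulting module names are derived from that pattern and should be checked against the output of module avail. The function name is hypothetical.

```python
# Compiler module -> toolchain label, as listed in the text (Dec 2023).
COMPILER_TO_TOOLCHAIN = {
    "gcc/12.2.0": "foss2022b",
    "gcc/12.3.0": "foss2023a",
    "gcc/13.2.0": "gcc13",  # no easybuild toolchain with gcc13 to mimic yet
}


def metamodule_for(compiler_module: str) -> str:
    """Return the metamodule name that follows the toolchain/<label> pattern.

    ASSUMPTION: the name is formed literally as "toolchain/" + label.
    Raises KeyError for compilers that are not in the table.
    """
    return "toolchain/" + COMPILER_TO_TOOLCHAIN[compiler_module]


if __name__ == "__main__":
    for compiler in COMPILER_TO_TOOLCHAIN:
        print(f"{compiler:12s} -> module load {metamodule_for(compiler)}")
```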
Several additional Octopus dependencies have also been compiled with all gcc versions. They can be loaded individually; we currently do not provide a metamodule for Octopus dependencies. Please report any problems, missing modules, or difficulties loading modules that can come with or without MPI support.
Each module sets an environment variable MPSD_<MODULE_NAME>_ROOT that can be used for configure and the rpath (see below).
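A build script can look up these variables instead of hard-coding installation prefixes. The sketch below is illustrative only: how a module name is turned into <MODULE_NAME> is not specified here, so upper-casing the name and replacing "-" with "_" is an assumption that should be verified with module show. The function name module_root is hypothetical.

```python
import os
from typing import Mapping, Optional


def module_root(module_name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the value of MPSD_<MODULE_NAME>_ROOT, or None if it is not set.

    ASSUMPTION: <MODULE_NAME> is the module name in upper case with "-"
    replaced by "_" (for example etsf-io -> MPSD_ETSF_IO_ROOT).
    """
    variable = "MPSD_" + module_name.upper().replace("-", "_") + "_ROOT"
    return env.get(variable)


if __name__ == "__main__":
    for name in ("fftw", "openblas"):
        print(name, "->", module_root(name))
```

A value of None usually means that the corresponding module has not been loaded in the current shell, or that the naming assumption does not hold.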
Spack requires adding the rpath to the linker options (LD_LIBRARY_PATH is not set [and should not be set]). More details are available at:
https://computational-science.mpsd.mpg.de/docs/mpsd-hpc.html#setting-the-rpath-finding-libraries-at-runtime
(also https://docs.mpcdf.mpg.de/faq/hpc_software.html#how-do-i-set-the-rpath ).
Setting the flags for all loaded modules should work via:
export LDFLAGS=`echo ${LIBRARY_PATH:+:$LIBRARY_PATH} | sed -e 's/:/ -Wl,-rpath=/g'`
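To make explicit what the one-liner above does: it turns every directory in the colon-separated LIBRARY_PATH into a -Wl,-rpath=<dir> linker option. The Python sketch below reproduces that transformation for illustration; unlike the sed version, it skips empty entries and emits no leading space. The function name rpath_ldflags is hypothetical.

```python
import os


def rpath_ldflags(library_path: str) -> str:
    """Build rpath linker flags from a colon-separated list of directories."""
    directories = [d for d in library_path.split(":") if d]  # drop empty entries
    return " ".join(f"-Wl,-rpath={d}" for d in directories)


if __name__ == "__main__":
    # Prints an empty line if LIBRARY_PATH is unset, like the shell version.
    print(rpath_ldflags(os.environ.get("LIBRARY_PATH", "")))
```

For LIBRARY_PATH=/a/lib:/b/lib the result is "-Wl,-rpath=/a/lib -Wl,-rpath=/b/lib", which is what LDFLAGS contains after the export above (apart from the leading space).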
Note: An Intel toolchain may at some point also show up in the list. It might not be functional yet, so expect that loading that module might fail.