MPSD Software manager

This repository provides the mpsd-software tool which is used to install package sets and toolchains on the MPSD HPC cluster.

It can also be used to install the software on other machines, such as Linux laptops and desktops. This is useful for working on a local machine with the same software environment as on the cluster, for example to debug a problem.

Note that this tool is under development: the recommended way to install and use it, as well as the user interface, may change. This document will be kept up to date in any case.

Quick start

To install, for example, the foss2022a-serial toolchain:

  1. Install this mpsd-software-manager Python package. The recommended way is to use pipx, so that the tool is available independently of any other Python environment:

    $ pipx install git+https://gitlab.gwdg.de/mpsd-cs/mpsd-software-manager
  2. Navigate to the location in your file system where you would like to store your "MPSD software instance" that contains the compiled software. Once compiled, the location cannot be changed. For example:

    $ cd /home/user/mpsd-software
  3. Initiate the installation at this location using:

    $ mpsd-software init

    Future calls of the mpsd-software command need to be executed from this "mpsd-software-root" directory or in one of its subdirectories.

    (The above command creates a hidden file .mpsd-software-root to tag the location as the root of the installation. All compiled files, logs etc. are written in or below this directory.)

  4. From the same directory, run the command to install the foss2022a-serial toolchain:

    $ mpsd-software install dev-23a foss2022a-serial

    This will take some time (up to several hours depending on hardware).

  5. To see installed releases, use the status command:

    $ mpsd-software status
    
    Installed MPSD software releases:
        dev-23a
  6. To see the installation status of one release, together with the module use command line needed to activate the created modules, pass the release name to the status command:

    $ mpsd-software status dev-23a
    
    Installed toolchains (dev-23a):
    
    - cascadelake
        foss2022a-serial
        [module use /home/user/mpsd-software/dev-23a/cascadelake/lmod/Core]
  7. The status command can also be used to get a list of all packages in a toolchain and details of how they were compiled (running this may take a few seconds):

    $ mpsd-software status dev-23a foss2022a-serial
    
    listing packages installed for package_set='foss2022a-serial', microarch='cascadelake'
    autoconf@2.71%gcc@11.3.0 build_system=autotools arch=linux-debian11-cascadelake
    autoconf-archive@2022.02.11%gcc@11.3.0 build_system=autotools patches=139214f arch=linux-debian11-cascadelake
    automake@1.16.5%gcc@11.3.0 build_system=autotools arch=linux-debian11-cascadelake
    bdftopcf@1.0.5%gcc@11.3.0 build_system=autotools arch=linux-debian11-cascadelake
    berkeley-db@18.1.40%gcc@11.3.0+cxx~docs+stl build_system=autotools patches=26090f4,b231fcc arch=linux-debian11-cascadelake
    berkeleygw@2.1%gcc@11.3.0~debug~elpa+hdf5~mpi+openmp~python~scalapack~verbose build_system=makefile arch=linux-debian11-cascadelake
    binutils@2.38%gcc@11.3.0~gas~gold+headers~interwork+ld~libiberty~lto+nls+plugins build_system=autotools libs=shared,static arch=linux-debian11-cascadelake
    boost@1.80.0%gcc@11.3.0~atomic~chrono~clanglibcpp+container~context~contract~coroutine~date_time~debug+exception~fiber~filesystem~graph~graph_parallel~icu~iostreams~json~locale~log+math~mpi+multithreaded~nowide~numpy~pic~program_options~python+random~regex~serialization+shared~signals~singlethreaded~stacktrace+system~taggedlayout~test+thread~timer~type_erasure~versionedlayout~wave build_system=generic cxxstd=98 patches=a440f96 visibility=hidden arch=linux-debian11-cascadelake
    ... <truncated in the README>
  8. To compile Octopus, source the provided configure script, for example foss2022a-serial-config.sh (as explained here). The configure scripts are located in dev-23a/spack-environments/octopus:

    $ ls -1 dev-23a/spack-environments/octopus
    
    foss2021a-cuda-mpi-config.sh
    foss2021a-mpi-config.sh
    foss2021a-serial-config.sh
    foss2022a-cuda-mpi-config.sh
    foss2022a-mpi-config.sh
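
The marker file from step 3 makes it possible for your own scripts to locate the mpsd-software-root from any subdirectory. The following is a minimal sketch of that idea, not the tool's actual implementation; the function name find_mpsd_root is hypothetical and only the file name .mpsd-software-root comes from the description above.

```shell
# Hypothetical helper (not part of mpsd-software): print the nearest
# directory, at or above the given start directory, that contains the
# marker file .mpsd-software-root. Returns non-zero if none is found.
find_mpsd_root() {
    local dir
    dir=$(cd "${1:-.}" && pwd) || return 1
    while :; do
        if [ -e "$dir/.mpsd-software-root" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        # Stop once the file system root has been checked.
        [ "$dir" = "/" ] && return 1
        dir=$(dirname "$dir")
    done
}

# Usage (from anywhere inside an MPSD software instance):
#   cd "$(find_mpsd_root)" && mpsd-software status
```

Without an argument the search starts in the current directory; a start directory can be passed as the first argument.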

Documentation

More detailed documentation that goes beyond the Quick Start section.

Package sets and toolchains

  • Package sets are a combination of particular versions of multiple software packages (such as anaconda3, or gcc and fftw). In the way the SSU Computational Science provides software on the MPSD HPC cluster, and for the Octopus continuous integration services, these package sets are compiled together (using Spack).

  • Toolchains are a particular type of package set:

    • the choice of software packages (typically a compiler and scientific computing libraries) and their versions follows the Easybuild toolchains (such as the FOSS toolchains).

    • all packages grouped together in a toolchain can be loaded together using the module load command.

      Example: the foss2022a-serial toolchain provides (in Spack notation):

      - gcc@11.3.0
      - binutils@2.38+headers+ld
      - fftw@3.3.10+openmp~~mpi
      - openblas@0.3.20
    • in addition to the Easybuild-driven choice of packages, there are additional packages included in each toolchain which support the build of Octopus within these toolchains. For foss2022a-serial these packages include:

      - libxc@5.2.3       # octopus-dependencies:
      - gsl@2.7.1
      - sparskit@develop  # 2021.06.01
      - nlopt@2.7.0
      - libgd@2.2.4       # 2.3.1
      - libvdwxc@0.4.0~~mpi
      - nfft@3.2.4
      - berkeleygw@2.1~~mpi~scalapack
      - python@3.9.5
      - cgal@5.0.3  # 5.2
      - hdf5@1.12.2~mpi
      - etsf-io@1.0.4
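
For readers unfamiliar with the Spack notation used in the lists above: a spec has the form name@version, optionally followed by variants, where +name enables and ~name disables a variant, and %compiler names the compiler. The sketch below splits a spec into name and version with plain shell parameter expansion. The helper names spec_name and spec_version are hypothetical, and the helpers assume the spec contains an @version part.

```shell
# Illustrative helpers (not part of mpsd-software or Spack).

# Print the package name: everything before the first "@".
spec_name() {
    printf '%s\n' "${1%%@*}"
}

# Print the version: the text after the first "@", cut at the first
# variant marker (+ or ~), compiler marker (%) or space.
spec_version() {
    local rest
    rest=${1#*@}
    printf '%s\n' "${rest%%[+~ %]*}"
}

# Usage:
#   spec_name    'fftw@3.3.10+openmp~~mpi'
#   spec_version 'fftw@3.3.10+openmp~~mpi'
```

The helpers treat a doubled marker such as ~~mpi like any other variant marker and simply cut the string there.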

MPSD software releases

As explained in the MPSD HPC documentation, we label software releases available on the HPC using a naming scheme consisting of the year (such as 23) followed by a letter, starting from a. The exception is the first available software release, dev-23a (the dev- prefix indicates that this was a development prototype).

At the moment (June 2023), there is only one release (that is dev-23a).
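
A script that takes a release name as input can check it against this naming scheme before calling mpsd-software. The sketch below assumes that the year is always written with two digits and is followed by exactly one lowercase letter, with an optional dev- prefix; this is an interpretation of the scheme described above, and the function name is_release_name is hypothetical.

```shell
# Hypothetical helper: succeed if the argument looks like an MPSD release
# name such as "dev-23a" or "24a" (assumed pattern: optional "dev-" prefix,
# two digits for the year, one lowercase letter).
is_release_name() {
    case "${1#dev-}" in
        [0-9][0-9][a-z]) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   is_release_name "dev-23a" && echo "looks like a release name"
```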

For each MPSD software release, multiple toolchains and package sets are available:

$ mpsd-software available dev-23a

MPSD software release dev-23a, AVAILABLE for installation are
Toolchains:
    foss2021a-cuda-mpi
    foss2021a-mpi
    foss2021a-serial
    foss2022a-cuda-mpi
    foss2022a-mpi
    foss2022a-serial
Package sets:
    global (octopus@12.1, octopus@12.1)
    global_generic (anaconda3@2022.10)
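
To use the list of toolchains in a script, the output shown above can be filtered. The sketch below assumes the output keeps exactly this layout (a "Toolchains:" line, indented names, then a "Package sets:" line); as the user interface may change, treat it as an illustration only. The function name list_toolchains is hypothetical.

```shell
# Hypothetical filter: read the output of "mpsd-software available RELEASE"
# on standard input and print one toolchain name per line.
list_toolchains() {
    awk '
        /^Toolchains:/   { in_section = 1; next }  # start of the toolchain list
        /^Package sets:/ { in_section = 0 }        # end of the toolchain list
        in_section && NF { print $1 }              # non-empty lines in between
    '
}

# Usage:
#   mpsd-software available dev-23a | list_toolchains
```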

Prerequisites

What needs to be installed for the installation to succeed?

The mpsd-software-manager Python package.

  • This needs a recent Python (3.9 or later).

  • Install via pip or pipx.

    Pipx commands are:

    • to install: pipx install git+https://gitlab.gwdg.de/mpsd-cs/mpsd-software-manager
    • to update: pipx upgrade mpsd-software-manager
    • to uninstall: pipx uninstall mpsd-software-manager
  • Requirements to be able to run spack

  • The installation is only expected to work for x86 architectures at the moment.

  • The installation is only expected to work on Linux at the moment (i.e. not on OSX).
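
The Python, operating system and architecture requirements above can be checked before starting a long installation. The following sketch uses two hypothetical helper functions; it assumes that "x86" corresponds to the value x86_64 reported by uname -m, which is an assumption and not stated above.

```shell
# Hypothetical helper: succeed if the given Python version string
# (for example "3.9.2") is 3.9 or later.
python_version_ok() {
    local major rest minor
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # text up to the next dot
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 9 ]; }
}

# Hypothetical helper: succeed if kernel name and machine type match the
# supported combination (assumed here to be Linux on x86_64).
is_supported_platform() {
    [ "$1" = "Linux" ] && [ "$2" = "x86_64" ]
}

# Usage:
#   python_version_ok "$(python3 -c 'import platform; print(platform.python_version())')"
#   is_supported_platform "$(uname -s)" "$(uname -m)"
```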

Requirements for particular toolchains and package sets

  • foss*-serial should compile with the dependencies outlined above
  • foss*-mpi currently needs linux header files installed (to compile the knem package)
  • foss*-cuda-mpi (probably the same as *-mpi; needs testing, TODO)

Finding the Octopus configure wrapper

For each Octopus toolchain, there is an Octopus configure wrapper available. The wrapper essentially calls the configure script with the right parameters and library locations for the current toolchain. Once the toolchain is loaded, the variable $MPSD_OCTOPUS_CONFIGURE contains the path to the wrapper. The path can also be seen using the module show TOOLCHAIN_NAME command. For example:

$ mpsd-software install dev-23a foss2022a-mpi
$ module use ~/mpsd-software/dev-23a/cascadelake/lmod/Core
$ module show toolchains/foss2022a-mpi
...
depends_on("cgal/5.0.3")
depends_on("hdf5/1.12.2")
setenv("MPSD_OCTOPUS_CONFIGURE","~/mpsd-software/dev-23a/spack-environments/octopus/foss2022a-mpi-config.sh")
$ module load toolchains/foss2022a-mpi
$ echo $MPSD_OCTOPUS_CONFIGURE
~/mpsd-software/dev-23a/spack-environments/octopus/foss2022a-mpi-config.sh
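
In a build script it can help to verify that the variable is set and points to a readable file before sourcing the wrapper. The following is a minimal sketch; the function name octopus_configure_path is hypothetical, and only the variable name MPSD_OCTOPUS_CONFIGURE is taken from the example above.

```shell
# Hypothetical helper: print the path of the Octopus configure wrapper, or
# fail with a message if no toolchain module has set the variable or the
# file is not readable.
octopus_configure_path() {
    if [ -z "${MPSD_OCTOPUS_CONFIGURE:-}" ]; then
        echo "MPSD_OCTOPUS_CONFIGURE is not set; load a toolchain module first" >&2
        return 1
    fi
    if [ ! -r "$MPSD_OCTOPUS_CONFIGURE" ]; then
        echo "cannot read $MPSD_OCTOPUS_CONFIGURE" >&2
        return 2
    fi
    printf '%s\n' "$MPSD_OCTOPUS_CONFIGURE"
}

# Usage (after "module load toolchains/foss2022a-mpi"):
#   wrapper=$(octopus_configure_path) && . "$wrapper"
```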

Working example

There is an example compilation that shows the complete compilation cycle (including compilation of Octopus) using the foss2022a-serial toolchain.

Frequently asked questions

  • Can I install the mpsd-software-manager package in a Python virtual environment?

    Yes. pipx is probably more convenient, but you can create your own Python virtual environment and install the mpsd-software-manager in that as a regular Python package:

    python3 -m venv venv
    . venv/bin/activate
    pip install git+https://gitlab.gwdg.de/mpsd-cs/mpsd-software-manager

    You just need to activate that Python virtual environment before being able to use the tool.

  • Does the command write anything outside the mpsd-software-root directory?

    No. All changes to disk take place in and below the mpsd-software-root directory (the one in which mpsd-software init was called).

  • How can I uninstall the mpsd-software?

    For now, the easiest is to delete the mpsd-software-root directory. You can probably delete just a release subdirectory (such as dev-23a) if you have multiple release subdirectories installed and you only want to delete one. (Untested.)

  • How long does the compilation take?

    This depends on the hardware. A few hours are typical per toolchain. If a second toolchain is compiled in the same MPSD software instance and the same MPSD release it is likely to be faster, in particular if the same compiler is used (and thus the compiler does not need to be re-compiled for the second toolchain).

  • How much disk storage do I need?

    A toolchain needs on the order of 5 GB of disk space. A second or third toolchain (in the same MPSD software instance) will use less additional space, as libraries and tools are re-used where possible.

  • Can I have more than one MPSD software instance?

    Yes.

    We call "MPSD software instance" all the compiled software that is stored in and below a "mpsd-software-root" directory (see instructions above).

    It is possible to install multiple MPSD software instances on the same computer, as long as they are in different (not nested) directories. This makes it possible to experiment with toolchains etc.
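
Given the disk usage estimate in the answers above, it can be worth checking the free space at the intended mpsd-software-root before starting a build. The sketch below relies on the POSIX output format of df; the function name has_free_space is hypothetical, and the threshold is a parameter because the 5 GB figure is only a rough estimate.

```shell
# Hypothetical helper: succeed if the file system containing DIR has at
# least MIN_GB gibibytes available.
# Usage: has_free_space DIR MIN_GB
has_free_space() {
    local dir min_gb avail_kb
    dir=$1
    min_gb=$2
    # With -P and -k, df prints one header line and then one line per file
    # system; the fourth column is the available space in 1024-byte blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }') || return 1
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Usage:
#   has_free_space /home/user/mpsd-software 5 || echo "less than 5 GB free"
```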

Development

Developer documentation is available at development.rst.

Upcoming release 24a (for alpha testing)

Work has started to provide newer toolchains and compiler versions. For these, the structure and layout of the modules has been modified compared to what is described above (the changes are experimental and may change in the future; feedback is very welcome).

To use the new compilers on the HPC system run:

mpsd-modules 24a
module avail

As of Dec 2023, this gives access to gcc/12.2.0 (EasyBuild toolchain foss2022b), gcc/12.3.0 (EasyBuild toolchain foss2023a), and gcc/13.2.0 (available via the toolchain gcc13, as EasyBuild had not yet released a toolchain with any gcc13 that we could mimic). The packages are currently only compiled for the Sandybridge architecture. They can be used on more modern architectures but will not make use of additional CPU features.

When loading the toolchain metamodules (called toolchain/<easybuild version>|gcc<gcc major version>), you will load (ignoring internal dependencies)

  • gcc
  • openmpi
  • fftw
  • openblas
  • scalapack

Several additional Octopus dependencies have also been compiled with all gcc versions. They can be loaded individually; we currently do not provide any metamodule for Octopus dependencies. Please report any problems, missing modules, or difficulties loading modules that come with or without MPI support.

Each module sets an environment variable MPSD_<MODULE_NAME>_ROOT that can be used for configure and the rpath (see below).

Spack requires adding the rpath to the linker options (LD_LIBRARY_PATH is not set, and should not be set). More details are available at: https://computational-science.mpsd.mpg.de/docs/mpsd-hpc.html#setting-the-rpath-finding-libraries-at-runtime (see also https://docs.mpcdf.mpg.de/faq/hpc_software.html#how-do-i-set-the-rpath ).

Setting the flags for all loaded modules should work via:

export LDFLAGS=`echo ${LIBRARY_PATH:+:$LIBRARY_PATH} | sed -e 's/:/ -Wl,-rpath=/g'`
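
To see what the command above produces, the same transformation can be wrapped in a function and applied to a sample value. This is a sketch for illustration: the function name rpath_flags and the sample directories are hypothetical, while the sed expression is the one used above.

```shell
# Hypothetical helper: turn a colon-separated list of library directories
# (the format of LIBRARY_PATH) into linker flags that add each directory
# to the rpath. An empty argument produces no flags.
rpath_flags() {
    # Prefix the list with ":" if it is non-empty, so that every directory,
    # including the first, is preceded by a colon; then replace each colon
    # with " -Wl,-rpath=".
    printf '%s\n' "${1:+:$1}" | sed -e 's/:/ -Wl,-rpath=/g'
}

# Usage:
#   export LDFLAGS=$(rpath_flags "$LIBRARY_PATH")
#   rpath_flags "/opt/a/lib:/opt/b/lib"
```

For the sample value /opt/a/lib:/opt/b/lib the result is one -Wl,-rpath= flag per directory, separated by spaces.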

Note: an Intel toolchain may at some point also show up in the list. It might not be functional yet, so expect that loading that module might fail.