Forked from Olusola Khalid Yusuf / tranSMART Docker
2 commits ahead of the upstream repository.

The purpose of this repository is to provide a Docker-based installation of tranSMART configured for use in a project at the University Medical Center Göttingen. This repository is based on the work by Denny Verbeeck featured here at GitHub. Since tranSMART consists of multiple services, docker-compose is used to build images for the different services and manage the links between them. Docker images are uploaded to UMG-MI@DockerHub. Apache is used to reverse proxy requests to the Tomcat server.

This branch of the repository contains tranSMART Foundation version 16.2-UMG, available at GWDG-GitLab (https://gitlab.gwdg.de/medinf/AGITF/transmart, internal). A demo study, Breast Cancer Desmedt GSE7390, is uploaded automatically into your new tranSMART by an auto-upload process using transmart-batch (place STUDY.zip into /var/lib/docker/volumes/tm_umg_tm_opt/_data/data/incoming or inside your running container). You can log in to your fresh tranSMART installation with admin/admin.

Contact

Having problems or questions? Send an email to mi.translationaleforschung@med.uni-goettingen.de. A short overview and an exercise with the included demo data can be found in this repository as tranSMART-exercise.pdf.

Running a tranSMART server

Install docker and docker-compose on your server. Clone this repository to an easily accessible location on the same machine. A few configuration files have to be modified before building the images.

Configuration

Open your ports

If you use CentOS, which ships with a restrictive firewall by default, you have to open some ports before starting your server:

firewall-cmd --permanent --add-service=http
firewall-cmd --permanent --add-service=https
firewall-cmd --complete-reload
# check if http and https are open now
firewall-cmd --list-all

.env

Create a local copy of the env file and fill in individual passwords and other settings:

cp .env_RENAME .env

You have to at least change the DOMAIN variable. Here is an explanation of all variables:

| Variable | Standard | Explanation |
|---|---|---|
| DOMAIN | CHANGE_ME | the domain of the server (mandatory) |
| POSTGRES_PASSWORD | CHANGE_ME | come up with your own unique password for the PostgreSQL databases |
| PROTOCOL | http:// | either 'http://' or 'https://'; start with http and then follow the encryption guide below |
| ADMIN_MAIL | NO_ADMIN_MAIL | set to the email address of the server admin; it is displayed on the login screen |
| LETS_ENCRYPT_EMAIL | CHANGE_ME | optional for retrieving a Let's Encrypt certificate, necessary to receive updates on expiring certificates |
| GET_CURRENT_WAR | false | choose true to use the latest tranSMART build from our server, or false to use the working one provided in this repository |
| WAR_BRANCH | release-16.2-UMG | choose a branch from https://gitlab.gwdg.de/medinf/AGITF/transmart/transmartApp |
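As an illustration, a filled-in .env for a first HTTP-only start might look like the sketch below. The domain, password, and email values are placeholders invented for this example, not values from the repository; the other entries are the standard values listed above. Comments are kept on their own lines, since not every docker-compose version handles comments placed after a value.

```shell
# .env (example only; replace every placeholder with your own values)
# Hypothetical domain:
DOMAIN=transmart.example.org
# Placeholder, choose your own password:
POSTGRES_PASSWORD=replace-with-a-long-random-password
PROTOCOL=http://
# Hypothetical addresses:
ADMIN_MAIL=admin@example.org
LETS_ENCRYPT_EMAIL=admin@example.org
GET_CURRENT_WAR=false
WAR_BRANCH=release-16.2-UMG
```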

Port 80/443, include SSL certificate (.crt & .key)

To use only port 80, set the PROTOCOL variable in .env to http:// (only available for tmweb, not tmwebletsencrypt). For port 443, set PROTOCOL to https://.

The tmwebletsencrypt container can be used if you don't have a certificate yet. In that case, tmweb has to be commented out in docker-compose.yml. Please keep the Let's Encrypt limits in mind, e.g. only 7 certificates can be retrieved for one domain per week.

If you already have a certificate (first line: -----BEGIN CERTIFICATE-----) and a private key (first line: -----BEGIN RSA PRIVATE KEY-----), e.g. from OpenSSL, or copied from tmwebletsencrypt container, you can include it this way:

  • Copy the files (as server.crt, server.key) to transmart-web/certs/
  • Uncomment tmweb instead of tmwebletsencrypt, if needed (docker-compose.yml)
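Before copying the files, it can help to confirm that each one starts with the expected first line. The helper below is a minimal sketch for that check; the function name has_pem_header is made up for this example, and it only inspects the first line, so it does not prove that the certificate and key belong together.

```shell
# Return 0 if FILE starts with the line "-----BEGIN <LABEL>-----".
# Usage: has_pem_header FILE LABEL   (LABEL e.g. "CERTIFICATE")
# has_pem_header is a hypothetical helper, not part of this repository.
has_pem_header() {
    first_line=$(head -n 1 "$1") || return 1
    [ "$first_line" = "-----BEGIN $2-----" ]
}

# Example (paths as described above):
#   has_pem_header server.crt "CERTIFICATE" &&
#   has_pem_header server.key "RSA PRIVATE KEY" &&
#   cp server.crt server.key transmart-web/certs/
```

If your key file uses a different header line, adjust the label accordingly.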

Include LDAP

LDAP configuration has to be added in transmart-app/Config.groovy (search for TODO). An example can be found at https://github.com/transmart/transmartApp-Config/blob/master/Config.groovy. After running docker-compose build, the configuration can still be changed in the respective tmapp volume.

Preparing your local images

It should now be sufficient to execute docker-compose pull in the root directory of the repository. This automatically downloads all the necessary components and the build images. The process can run for a long time, since assembling some dependencies (transmart-batch) is very slow.

Starting your server

Now you can start your local tranSMART server with a command like this:

docker-compose up -d && docker-compose logs -f

This will create the network and run the containers in the background while still giving you direct output of the logs. When you see a line like this

tmapp_1     | INFO: Server startup in 40888 ms

this means the services are up and running. Verify this by running docker-compose ps:

$ docker-compose ps
           Name                         Command               State                  Ports
---------------------------------------------------------------------------------------------------------
tm_umg_tmapp_1      /bin/bash -c /entrypoint.s ...   Up      127.0.0.1:8009->8009/tcp, 8080/tcp, 8443/tcp
tm_umg_tmbatch_1    /entrypoint.sh                   Up
tm_umg_tmdb_1       /usr/lib/postgresql/9.3/bi ...   Up      127.0.0.1:5432->5432/tcp
tm_umg_tmgwava_1    catalina.sh run                  Up      8080/tcp
tm_umg_tmrserve_1   R CMD Rserve.dbg --no-save ...   Up
tm_umg_tmsolr_1     java -jar start.jar              Up      8983/tcp
tm_umg_tmweb_1      httpd-foreground                 Up

This overview gives us a lot of information. We can see that all services are up and running. We also see that port 5432 of our own machine is forwarded to port 5432 of the tmdb container, and that port 8009 is forwarded to port 8009 of the tmapp container. Exposing the database port to localhost allows us to connect to it using tools like psql. Port 8009 is used by the tmweb container to proxy requests to the web application over the AJP protocol.

Point your browser to your server URL to see your installation running. By default, you can log in with admin as both username and password. Change the password for the admin user as soon as possible.

After your first docker-compose up command, use docker-compose stop and docker-compose start to stop and start the tranSMART stack.
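If you would rather not watch the log output by hand, the startup line shown earlier can be detected automatically. The following is a minimal sketch; wait_for_startup is a name chosen for this example, and it assumes the log line keeps the "Server startup in <N> ms" wording.

```shell
# Read log lines from stdin; return 0 as soon as Tomcat's
# "Server startup in <N> ms" line appears, 1 if the input ends first.
# wait_for_startup is a hypothetical helper, not part of this repository.
wait_for_startup() {
    grep -q -E 'Server startup in [0-9]+ ms'
}

# Example (assumes the stack was started with "docker-compose up -d"):
#   docker-compose logs -f tmapp | wait_for_startup && docker-compose ps
```

Since grep stops reading after the first match, the log command on the left side of the pipe may only terminate once it tries to write its next line.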

If no WAR file is found, the current version is also available in /artifacts/artifacts.zip. It has to be copied into tm_umg_tmappwebapp/_data/webapps, which requires a restart of Tomcat (or of the whole tmapp container).

Upgrading

For all services, it is sufficient to modify the tag in the docker-compose file (or to pull a new version of the file from this repository) and execute docker-compose up -d again. Compose will auto-detect which services should be recreated. In tmapp, a new WAR file is downloaded automatically when the container starts if a newer version is found on GitLab.

Components

This docker-compose project consists of the following services:

  • tmweb: Apache frontend and reverse proxy for Tomcat,
  • tmwebletsencrypt: EXCLUDED BY DEFAULT, tmweb with a Let's Encrypt certificate, including renewal testing,
  • tmapp: the Tomcat server and application,
  • tmdb: the Postgres database; the database in this image has a superadmin with username docker and password docker,
  • tmsolr: the SOLR installation for faceted search,
  • tmrserve: Rserve instance for advanced analyses,
  • tmload: DEPRECATED, a Kettle installation you can use for loading data,
  • tmgwava: Genome Wide Association Study Visualizer (EXCLUDED because of port overlaps),
  • tmbatch: the transmart-batch ETL tool for loading datasets.

Loading your own data

The container tmbatch contains a precompiled and configured version of the data upload tool transmart-batch. It can directly be called with

docker-compose exec tmbatch 

Initiate the upload of a study with the following call, replacing # with the path to your params file.

docker-compose exec tmbatch /opt/git/transmart-batch/transmart-batch.jar -c /opt/git/transmart-batch/batchdb.properties -n -p #

Please refer to the documentation of transmart-batch on how to set up and upload your data. Your data will be uploaded automatically if you zip a full transmart-batch structure (example) as "STUDYID.zip" (where STUDYID is the actual ID) and copy it from your host system into the incoming folder of the named docker volume "tm_opt". This folder can normally be found at:

/var/lib/docker/volumes/tm_umg_tm_opt/_data/data/incoming

Logs of the upload process are forwarded directly to docker-compose ("docker-compose logs") and can also be viewed in the same volume at "/var/lib/docker/volumes/tm_umg_tm_opt/_data/data/logs".

Loading public datasets

Note: If you plan on copying an existing tranSMART database to your new docker-based one, please do this first. It is explained in the next section.

You can use the tmbatch image to load data to the database through transmart-batch. The easiest way of loading public datasets is using the pre-curated library hosted by the tranSMART foundation. For more information, please read their wiki page.

Copy data from an existing instance

psql dump

If you have an existing instance of tranSMART running, you may want to copy the database to your new dockerized instance. It is best to do this into an empty but initialized tranSMART database, since everything will be copied, including things like sequence values.

The most portable way of copying is to use pg_dump to dump all data from the old database in the form of attribute inserts, and to load this file into the new database. Using the --attribute-inserts option ensures that a single failed insertion (e.g. a row that already exists in the new database, like the definition of the admin user) does not prevent the rest of the table from being loaded. It also guards against minor schema changes, such as a column with a default value that was added to an existing table.

On the host where the old database resides, log in as the postgres user (or use any other means that gives you access to the database) and execute the following:

pg_dump -a --disable-triggers --attribute-inserts transmart | gzip > tmdump.sql.gz

Depending on the size of your database, this can take some time. When the command is finished, you will have a file called tmdump.sql.gz. This is the compressed file containing all SQL statements necessary to restore your database. Copy this file to the host running the transmart-db container. The default configuration exposes port 5432 of the container to localhost, so you should be able to connect to it. Use the following command to unzip the file and immediately send the SQL commands to the database:

zcat tmdump.sql.gz | psql -h 127.0.0.1 -U <user> transmart

You will be asked for the password, which is the one from the .env file. After the command finishes, you should have all your old data in your new tranSMART server!
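Because the dump is copied between hosts, it can be worth confirming that the file arrived intact before piping it into psql. The sketch below does this with gzip's built-in test mode; check_dump is a name invented for this example.

```shell
# Return 0 only if the given file is a non-empty, intact gzip archive.
# Usage: check_dump tmdump.sql.gz
# check_dump is a hypothetical helper, not part of this repository.
check_dump() {
    [ -s "$1" ] || return 1       # file must exist and be non-empty
    gzip -t "$1" 2>/dev/null      # exit status 0 only if the archive is intact
}

# Example, combined with the restore command from above:
#   check_dump tmdump.sql.gz && zcat tmdump.sql.gz | psql -h 127.0.0.1 -U <user> transmart
```

This only verifies the compression layer. It does not check that the SQL inside matches the schema of the new database.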

docker volumes

Another way to copy old data is to copy whole docker volumes. The password has to be the same as in the old installation. The following volumes have to be copied:

  - tm_postgresconfig
  - tm_postgresdata
  - tm_postgreslogs

Create new volumes with the current docker-compose.yml. Write down the owner of the files inside the new postgres volumes (see above) and then replace them with the old ones.

Inside the volumes, the ownership has to be changed:

chown -R new_user:new_group tmumg_tm_postgres*/_data/*

The following link can get lost in the copying process, but it must exist:

cd /var/lib/docker/volumes/tmumg_tm_postgresdatad/_data/9.3/main/pg_tblspc
cp -d * /var/lib/docker/volumes/tmumg_tm_postgresdata/_data/9.3/main/pg_tblspc
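To note the owner of the new volume files before replacing them, the numeric user and group IDs can be read with stat and reused later for chown. This is a minimal sketch that assumes GNU stat (as found on typical Linux Docker hosts); owner_of is a name chosen for this example.

```shell
# Print the numeric owner and group of a path as "UID:GID" (GNU stat).
# owner_of is a hypothetical helper, not part of this repository.
owner_of() {
    stat -c '%u:%g' "$1"
}

# Example (run as root on the Docker host, volume path as used above):
#   new_owner=$(owner_of /var/lib/docker/volumes/tmumg_tm_postgresdata/_data)
#   ... replace the volume contents with the old data ...
#   chown -R "$new_owner" /var/lib/docker/volumes/tmumg_tm_postgresdata/_data
```

Numeric IDs are used because the user inside the container may not exist by name on the host.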

Database changes

Listing all available studies with the view i2b2metadata.i2b2_trial_nodes was very slow with its original definition:

 SELECT DISTINCT ON (i2b2.c_comment) i2b2.c_fullname,
    "substring"(i2b2.c_comment, 7) AS trial
   FROM i2b2metadata.i2b2
  WHERE i2b2.c_comment IS NOT NULL 
  ORDER BY  i2b2.c_comment, char_length(i2b2.c_fullname::text);

Instead of ordering by the length of c_fullname, this view now uses the c_hlevel column to get a study's top level directly (c_visualattributes could also be used).

CREATE OR REPLACE VIEW i2b2metadata.i2b2_trial_nodes AS
   SELECT DISTINCT ON (i2b2.c_comment) i2b2.c_fullname,
    "substring"(i2b2.c_comment, 7) AS trial
   FROM i2b2metadata.i2b2
  WHERE i2b2.c_comment IS NOT NULL AND i2b2.c_hlevel = 1::numeric
  ORDER BY i2b2.c_comment;

Helpful commands

reload webserver config

docker-compose exec letsencrypt nginx -s reload

restart database

docker-compose exec tmdb /etc/init.d/postgresql restart

direct database access

docker-compose exec tmdb psql

remove a study from tranSMART (replace both STUDYX occurrences)

docker-compose exec tmbatch touch /tmp/backout.params && docker-compose exec tmbatch /opt/git/transmart-batch/transmart-batch.jar -c /opt/git/transmart-batch/batchdb.properties -n -p /tmp/backout.params -d TOP_NODE="\Internal Studies\STUDYX\\" -d STUDY_ID=STUDYX
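To avoid editing the study ID in two places, the same call can be wrapped in a small function. This is a sketch under the assumption that the study sits below "\Internal Studies\" as in the command above; top_node_for and remove_study are names invented for this example.

```shell
# Print the TOP_NODE value for a study below "\Internal Studies\".
# Both functions are hypothetical helpers, not part of this repository.
top_node_for() {
    printf '%s\n' "\\Internal Studies\\$1\\"
}

# Run the backout call from above for the given study ID.
# Usage: remove_study STUDYX
remove_study() {
    study_id=$1
    docker-compose exec tmbatch touch /tmp/backout.params &&
    docker-compose exec tmbatch /opt/git/transmart-batch/transmart-batch.jar \
        -c /opt/git/transmart-batch/batchdb.properties -n \
        -p /tmp/backout.params \
        -d TOP_NODE="$(top_node_for "$study_id")" \
        -d STUDY_ID="$study_id"
}
```

Calling remove_study STUDYX then passes the same TOP_NODE and STUDY_ID values as the one-line command.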