The purpose of this repository is to provide a Docker-based installation of tranSMART configured for use in a project at the University Medical Center Göttingen. This repository is based on the work by Denny Verbeeck available on GitHub. Since tranSMART consists of multiple services, docker-compose is used to build images for the different services and manage the links between them. Docker images are uploaded to UMG-MI@DockerHub. Apache is used to reverse proxy requests to the Tomcat server. This branch of the repository contains tranSMART Foundation version 16.2-UMG, available at GWDG-GitLab (https://gitlab.gwdg.de/medinf/AGITF/transmart, internal). A demo study, Breast Cancer Desmedt GSE7390, is automatically uploaded into your new tranSMART by an auto-upload process using transmart-batch (place STUDY.zip into /var/lib/docker/volumes/tm_umg_tm_opt/_data/data/incoming or into the corresponding folder inside your running container).
You can log in with admin/admin to your fresh tranSMART installation.
Contact
Having problems or questions? Send us a mail to mi.translationaleforschung@med.uni-goettingen.de.
A short overview and an exercise with the included demo data can be found in this repository as tranSMART-exercise.pdf.
Running a tranSMART server
Install docker and docker-compose on your local machine. Clone this repository to an easily accessible location on your server. There are a few configuration files to be modified before building the images.
Configuration
Open your ports
If you use CentOS, whose firewall blocks incoming HTTP and HTTPS connections by default, you have to open some ports before starting your server:
firewall-cmd --permanent --add-service=http
firewall-cmd --permanent --add-service=https
firewall-cmd --complete-reload
# check if http and https are open now
firewall-cmd --list-all
.env
Create a local copy of the env file and fill in your individual passwords and other settings:
cp .env_RENAME .env
You have to change at least the DOMAIN variable. Here is an explanation of all variables:
variable | default | explanation |
---|---|---|
DOMAIN | CHANGE_ME | the domain of the server (mandatory) |
POSTGRES_PASSWORD | CHANGE_ME | come up with your own unique password for the PostgreSQL databases |
PROTOCOL | http:// | either 'http://' or 'https://'; start with http and then follow the encryption guide below |
ADMIN_MAIL | NO_ADMIN_MAIL | set to the email address of the server admin; it is displayed on the login screen |
LETS_ENCRYPT_EMAIL | CHANGE_ME | optional; used for retrieving a Let's Encrypt certificate and necessary to receive notifications about expiring certificates |
GET_CURRENT_WAR | false | true to use the latest tranSMART build from our server, false to use the working one provided in this repository |
WAR_BRANCH | release-16.2-UMG | branch of https://gitlab.gwdg.de/medinf/AGITF/transmart/transmartApp to use |
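Before starting the stack, it can help to check that no variable still carries its placeholder. The following bash function is a minimal sketch, assuming .env consists of plain KEY=VALUE lines (optionally with comments) as in .env_RENAME; the name check_env is hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): report every variable in
# an env file whose value is still the placeholder CHANGE_ME.
# Assumes plain KEY=VALUE lines; blank lines and lines starting with # are skipped.
check_env() {
  local env_file="${1:-.env}"
  local status=0 key value
  # "|| [[ -n $key ]]" also processes a last line without a trailing newline
  while IFS='=' read -r key value || [[ -n "$key" ]]; do
    [[ -z "$key" || "$key" == \#* ]] && continue
    # tolerate optional double quotes around the value
    value="${value%\"}"
    value="${value#\"}"
    if [[ "$value" == "CHANGE_ME" ]]; then
      echo "still set to CHANGE_ME: $key"
      status=1
    fi
  done < "$env_file"
  return "$status"
}

# Usage: check_env .env || echo "please edit .env before starting"
```

A non-zero exit status means at least one placeholder is left; note that optional variables such as LETS_ENCRYPT_EMAIL are reported as well.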
Port 80/443, include SSL certificate (.crt & .key)
To use only port 80, set the PROTOCOL variable (.env) to http:// (only available for tmweb, not tmwebletsencrypt). For port 443, set PROTOCOL to https://.
The tmwebletsencrypt container can be used if you don't have a certificate yet; in that case tmweb has to be commented out (docker-compose.yml). Please consider the Let's Encrypt rate limits, e.g. only 5 duplicate certificates for the same set of domain names can be issued per week.
If you already have a certificate (first line: -----BEGIN CERTIFICATE-----) and a private key (first line: -----BEGIN RSA PRIVATE KEY-----), e.g. from OpenSSL, or copied from tmwebletsencrypt container, you can include it this way:
- Copy the files (as server.crt, server.key) to transmart-web/certs/
- If needed, comment out tmwebletsencrypt and uncomment tmweb again (docker-compose.yml)
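The copy step above can be scripted. The sketch below is a hypothetical bash helper (install_certs is not part of this repository) that checks the first line of each file against the PEM headers mentioned above before copying them as server.crt and server.key into transmart-web/certs. Accepting the PKCS#8 header "-----BEGIN PRIVATE KEY-----" as well is an assumption, since OpenSSL 3 writes private keys in that format by default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate PEM headers, then copy certificate and key
# under the file names expected in transmart-web/certs/.
install_certs() {
  local crt="$1" key="$2" dest="${3:-transmart-web/certs}"
  # the certificate must start with the PEM certificate header
  if [[ "$(head -n 1 "$crt")" != "-----BEGIN CERTIFICATE-----" ]]; then
    echo "not a PEM certificate: $crt" >&2
    return 1
  fi
  # accept RSA (PKCS#1) keys and, as an assumption, PKCS#8 keys
  case "$(head -n 1 "$key")" in
    "-----BEGIN RSA PRIVATE KEY-----"|"-----BEGIN PRIVATE KEY-----") ;;
    *) echo "not a PEM private key: $key" >&2; return 1 ;;
  esac
  mkdir -p "$dest"
  cp "$crt" "$dest/server.crt"
  cp "$key" "$dest/server.key"
}

# Usage: install_certs my.crt my.key   (then rebuild/restart tmweb)
```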
Include LDAP
LDAP configuration has to be added in transmart-app/Config.groovy (search for TODO). An example can be found at https://github.com/transmart/transmartApp-Config/blob/master/Config.groovy. After docker-compose build it can be changed in the respective tmapp volume.
Preparing your local images
It should now be sufficient to execute docker-compose pull in the root directory of the repository. This will automatically download all the necessary components and the built images. This process can take a long time, since assembling some dependencies (transmart-batch) is very slow.
Starting your server
Now you can start your local tranSMART server with a command like this:
docker-compose up -d && docker-compose logs -f
This will create the network and run the containers in the background while still giving you direct output of the logs. When you see a line like this
tmapp_1 | INFO: Server startup in 40888 ms
the services are up and running. Verify this by running docker-compose ps:
$ docker-compose ps
Name Command State Ports
---------------------------------------------------------------------------------------------------------
tm_umg_tmapp_1 /bin/bash -c /entrypoint.s ... Up 127.0.0.1:8009->8009/tcp, 8080/tcp, 8443/tcp
tm_umg_tmbatch_1 /entrypoint.sh Up
tm_umg_tmdb_1 /usr/lib/postgresql/9.3/bi ... Up 127.0.0.1:5432->5432/tcp
tm_umg_tmgwava_1 catalina.sh run Up 8080/tcp
tm_umg_tmrserve_1 R CMD Rserve.dbg --no-save ... Up
tm_umg_tmsolr_1 java -jar start.jar Up 8983/tcp
tm_umg_tmweb_1 httpd-foreground Up
This overview gives us a lot of information. We can see that all services are up and running. We also see that port 5432 of our own machine is forwarded to port 5432 of the tmdb container, and that port 8009 is forwarded to port 8009 of the tmapp container. Exposing the database port to localhost allows us to connect to it using tools like psql. Port 8009 is used by the tmweb container to proxy requests to the web application over the AJP protocol. Point your browser to your server URL to see your installation running. By default you can log in with username and password admin. Change the password for the admin user as soon as possible.
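Instead of watching the log output for the startup line by hand, a script can wait for it. This is a minimal bash sketch, assuming Tomcat logs exactly the "Server startup in N ms" format shown above (other Tomcat versions phrase this line differently); wait_for_tomcat is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log lines from stdin and print the Tomcat startup
# time in ms as soon as the "Server startup in N ms" line appears.
# Returns 1 if the input ends before that line is seen.
wait_for_tomcat() {
  local re='Server startup in ([0-9]+) ms'
  local line
  while IFS= read -r line; do
    if [[ "$line" =~ $re ]]; then
      echo "${BASH_REMATCH[1]}"
      return 0
    fi
  done
  return 1
}

# Usage: docker-compose logs -f tmapp | wait_for_tomcat
```

With logs -f, docker-compose only notices that the reader has exited when it writes the next log line, so the pipeline may not return immediately after the startup time is printed.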
After your first docker-compose up command, use docker-compose stop and docker-compose start to stop and start the tranSMART stack.
If no WAR file is found, the current version can also be found in /artifacts/artifacts.zip. It has to be copied into tm_umg_tmappwebapp/_data/webapps, which requires a restart of Tomcat (or of the whole tmapp container).
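Copying the WAR file can be sketched as follows. Writing it under a temporary name and renaming it afterwards means Tomcat never picks up a half-written .war file. install_war is a hypothetical helper, and in the usage comments the WAR file name inside artifacts.zip and the full volume path (the usual /var/lib/docker/volumes location used elsewhere in this README) are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a WAR file into a Tomcat webapps directory.
# The copy goes to a hidden temporary name first (not ending in .war, so
# Tomcat ignores it), then a rename within the same directory makes the
# complete file appear at once.
install_war() {
  local war="$1" webapps="$2" name tmp
  name="$(basename "$war")"
  [[ "$name" == *.war ]] || { echo "not a .war file: $war" >&2; return 1; }
  tmp="$webapps/.${name}.partial"
  cp "$war" "$tmp"
  mv -f "$tmp" "$webapps/$name"
}

# Usage (file name and volume path are assumptions):
#   unzip -o /artifacts/artifacts.zip -d /tmp/artifacts
#   install_war /tmp/artifacts/transmart.war /var/lib/docker/volumes/tm_umg_tmappwebapp/_data/webapps
#   docker-compose restart tmapp
```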
Upgrading
For all services it is sufficient to modify the tag in the docker-compose file (or pull a new version of the file from this repository) and execute docker-compose up -d again. Compose will auto-detect which services should be recreated.
When the tmapp container starts, a new WAR file is downloaded automatically if a newer version can be found on GitLab.
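Changing a tag can be scripted with sed. This is a minimal sketch, assuming each service in docker-compose.yml references its image on a single "image: name:tag" line; set_image_tag and the image names in the test are hypothetical. Regex metacharacters in the image name (such as "." in a registry host) are not escaped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite "image: <image>:<old tag>" to
# "image: <image>:<tag>" in a compose file, leaving other services untouched.
set_image_tag() {
  local file="$1" image="$2" tag="$3" tmp
  tmp="$(mktemp)"
  # '#' as sed delimiter so image names may contain '/'
  sed "s#^\([[:space:]]*image:[[:space:]]*${image}\):.*#\1:${tag}#" "$file" > "$tmp"
  # write back via cat so the original file permissions are kept
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage: set_image_tag docker-compose.yml <image> <new tag> && docker-compose up -d
```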
Components
This docker-compose project consists of the following services:
- tmweb: nginx frontend and reverse proxy for Tomcat
- tmwebletsencrypt: EXCLUDED BY DEFAULT, tmweb with a Let's Encrypt certificate, including renewal testing
- tmapp: the Tomcat server and application
- tmdb: the PostgreSQL database; the database in this image has a superuser with username docker and password docker
- tmsolr: the SOLR installation for faceted search
- tmrserve: Rserve instance for advanced analyses
- tmload: DEPRECATED, a Kettle installation you can use for loading data
- tmgwava: Genome Wide Association Study Visualizer (EXCLUDED because of port overlaps)
- tmbatch: the transmart-batch ETL tool for loading datasets
Loading your own data
The container tmbatch contains a precompiled and configured version of the data upload tool transmart-batch. It can be called directly with
docker-compose exec tmbatch
To upload a study, run the following command, replacing # with the path to your params file:
docker-compose exec tmbatch /opt/git/transmart-batch/transmart-batch.jar -c /opt/git/transmart-batch/batchdb.properties -n -p #
Please refer to the documentation of transmart-batch on how to set up and upload your data. Your data will be uploaded automatically if you zip a full transmart-batch structure (example) as "STUDYID.zip" (where STUDYID is the actual study ID) and copy it from your host system into the incoming folder of the named Docker volume "tm_opt". This folder can normally be found at:
/var/lib/docker/volumes/tm_umg_tm_opt/_data/data/incoming
Logs of the upload process are forwarded directly to docker-compose ("docker-compose logs") or can be viewed in the same volume at "/var/lib/docker/volumes/tm_umg_tm_opt/_data/data/logs".
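Staging a study for the auto-upload can be sketched in bash. The helper copies the archive under a hidden temporary name and then renames it to STUDYID.zip inside the incoming folder, so the complete file appears at once. That the auto-upload only reacts to files ending in .zip is an assumption; stage_study is a hypothetical name, and the default path is the one given above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: place a prepared transmart-batch archive into the
# incoming folder as <STUDYID>.zip. The rename happens inside the same
# directory, so it is atomic and no half-copied archive is ever visible
# under its final name.
stage_study() {
  local zip="$1" study_id="$2"
  local incoming="${3:-/var/lib/docker/volumes/tm_umg_tm_opt/_data/data/incoming}"
  [[ -f "$zip" ]] || { echo "no such file: $zip" >&2; return 1; }
  cp "$zip" "$incoming/.${study_id}.zip.part"
  mv -f "$incoming/.${study_id}.zip.part" "$incoming/${study_id}.zip"
}

# Usage: stage_study ./GSE7390.zip GSE7390 && docker-compose logs -f tmbatch
```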
Loading public datasets
Note: If you plan on copying an existing tranSMART database to your new docker-based one, please do this first, it is explained in the next section.
You can use the tmbatch image to load data into the database through transmart-batch. The easiest way of loading public datasets is to use the pre-curated library hosted by the tranSMART Foundation. For more information, please read their wiki page.
Copy data from an existing instance
psql dump
If you have an existing instance of tranSMART running, you may want to copy the database to your new dockerized instance. It is best to copy into an empty but initialized tranSMART database, since everything will be copied, including things like sequence values. The most portable way of copying is to use pg_dump to dump all data from the old database in the form of attribute inserts, and to load this file into the new database. Using the --attribute-inserts option ensures that a single failed insertion (e.g. a row that already exists in the new database, like the definition of the admin user) does not prevent the rest of the table from being loaded. It also guards against minor schema changes, such as a column with a default value that was added to an existing table. On the host where the old database resides, log in as the postgres user (or use any other means that gives you access to the database) and execute the following:
pg_dump -a --disable-triggers --attribute-inserts transmart | gzip > tmdump.sql.gz
Depending on the size of your database, this can take some time. When the command is finished, you will have a file called tmdump.sql.gz. This compressed file contains all SQL statements necessary to restore your database. Copy this file to the host running the tmdb container. The default configuration exposes port 5432 of the container to localhost, so you should be able to connect to it. Use the following command to decompress the file and immediately send the SQL commands to the database:
zcat tmdump.sql.gz | psql -h 127.0.0.1 -U <user> transmart
You will be asked for the password, which is the one from the .env file. After the command finishes, you should have all your old data in your new tranSMART server!
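Because --attribute-inserts lets individual rows fail (e.g. the existing admin user), psql will typically report some errors during the restore. To separate the expected duplicate-key errors from others worth a closer look, the psql output can be captured and summarized. The following bash function is a hypothetical sketch that only counts lines; it assumes English server messages and does not tell you which rows failed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize errors in a captured psql restore log.
# Duplicate-key errors are expected for rows that already exist in the new
# database; "other" errors should be inspected manually.
summarize_restore_errors() {
  local log="$1" total dup
  [[ -r "$log" ]] || { echo "cannot read $log" >&2; return 1; }
  # grep -c prints 0 but exits 1 when nothing matches, hence "|| true"
  total=$(grep -c 'ERROR:' "$log" || true)
  dup=$(grep -c 'duplicate key value violates unique constraint' "$log" || true)
  echo "errors: $total, duplicate keys: $dup, other: $((total - dup))"
}

# Usage:
#   zcat tmdump.sql.gz | psql -h 127.0.0.1 -U <user> transmart 2> restore.log
#   summarize_restore_errors restore.log
```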
docker volumes
Another way to copy old data is to copy whole docker volumes. The password has to be the same as in the old installation. The following volumes have to be copied:
- tm_postgresconfig
- tm_postgresdata
- tm_postgreslogs
Create the new volumes with the current docker-compose.yml. Write down the owner of the files inside the new postgres volumes (see above) and then replace the volume contents with the old ones.
Inside the volumes, the ownership then has to be changed to that owner:
chown -R new_user:new_group tmumg_tm_postgres*/_data/*
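The symlinks in pg_tblspc can be lost in the copying process, but they need to exist. Copy them from the old volume with cp -d, which copies the links themselves instead of their targets:
cd /var/lib/docker/volumes/tmumg_tm_postgresdatad/_data/9.3/main/pg_tblspc
cp -d * /var/lib/docker/volumes/tmumg_tm_postgresdata/_data/9.3/main/pg_tblspc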
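The ownership and symlink steps above can be sketched in bash. volume_owner records the numeric owner of a directory with GNU stat before its contents are replaced, and copy_tablespace_links recreates every symlink from the old pg_tblspc directory in the new one using readlink and ln -sfn. This swaps in ln for cp -d; with -n, an existing link of the same name is replaced instead of being followed into a directory. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the numeric uid:gid of a path (GNU stat syntax).
volume_owner() {
  stat -c '%u:%g' "$1"
}

# Hypothetical helper: recreate each symlink of the old pg_tblspc directory
# in the new one, pointing to the same target (dangling links included).
copy_tablespace_links() {
  local old="$1" new="$2" link name
  for link in "$old"/*; do
    # only symlinks are copied; anything else in the directory is skipped
    [[ -L "$link" ]] || continue
    name="$(basename "$link")"
    ln -sfn "$(readlink "$link")" "$new/$name"
  done
}

# Usage (paths as above):
#   owner=$(volume_owner /var/lib/docker/volumes/tmumg_tm_postgresdata/_data)
#   ... replace the volume contents with the old ones ...
#   chown -R "$owner" /var/lib/docker/volumes/tmumg_tm_postgres*/_data/*
#   copy_tablespace_links <old pg_tblspc> /var/lib/docker/volumes/tmumg_tm_postgresdata/_data/9.3/main/pg_tblspc
```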
Database changes
Listing all available studies with the view i2b2metadata.i2b2_trial_nodes was very slow with its original definition:
SELECT DISTINCT ON (i2b2.c_comment) i2b2.c_fullname,
"substring"(i2b2.c_comment, 7) AS trial
FROM i2b2metadata.i2b2
WHERE i2b2.c_comment IS NOT NULL
ORDER BY i2b2.c_comment, char_length(i2b2.c_fullname::text);
Instead of ordering by the length of c_fullname, this view now uses the c_hlevel column to directly get a study's top-level node (c_visualattributes could also be used):
CREATE OR REPLACE VIEW i2b2metadata.i2b2_trial_nodes AS
SELECT DISTINCT ON (i2b2.c_comment) i2b2.c_fullname,
"substring"(i2b2.c_comment, 7) AS trial
FROM i2b2metadata.i2b2
WHERE i2b2.c_comment IS NOT NULL AND i2b2.c_hlevel = 1::numeric
ORDER BY i2b2.c_comment;
Helpful commands
reload webserver config
docker-compose exec letsencrypt nginx -s reload
restart database
docker-compose exec tmdb /etc/init.d/postgresql restart
direct database access
docker-compose exec tmdb psql
remove a study from tranSMART (replace both STUDYX occurrences)
docker-compose exec tmbatch touch /tmp/backout.params && docker-compose exec tmbatch /opt/git/transmart-batch/transmart-batch.jar -c /opt/git/transmart-batch/batchdb.properties -n -p /tmp/backout.params -d TOP_NODE="\Internal Studies\STUDYX\\" -d STUDY_ID=STUDYX
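Since the study ID has to be replaced in two places, the command above can be wrapped in a function. This bash sketch builds the same invocation from a single study ID; remove_study, top_node_for and the DRY_RUN switch (which only prints the arguments, one per line) are hypothetical additions, not part of transmart-batch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: TOP_NODE value in the "\Internal Studies\<ID>\" form
# used by the command above. printf '%s' prints the backslashes literally.
top_node_for() {
  printf '%s\n' "\\Internal Studies\\$1\\"
}

# Hypothetical helper: back out a study via transmart-batch in the tmbatch
# container. Set DRY_RUN=1 to print the arguments instead of running them.
remove_study() {
  local study_id="$1"
  local jar=/opt/git/transmart-batch/transmart-batch.jar
  local props=/opt/git/transmart-batch/batchdb.properties
  local params=/tmp/backout.params
  local cmd=(docker-compose exec tmbatch "$jar" -c "$props" -n -p "$params"
             -d "TOP_NODE=$(top_node_for "$study_id")" -d "STUDY_ID=$study_id")
  if [[ -n "${DRY_RUN:-}" ]]; then
    printf '%s\n' "${cmd[@]}"
    return 0
  fi
  # an empty params file is sufficient, as in the command above
  docker-compose exec tmbatch touch "$params" && "${cmd[@]}"
}

# Usage: remove_study STUDYX
```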