... | ... | @@ -11,10 +11,18 @@ The user can select different pre-specified model structures, including |
|
|
by setting the `model` parameter of `gsem.fit()` to `Cholesky`, `Independent`, `IPC` or `Common`, respectively. Like GREML, `grmsem` fits all available data to the model. Each model can be adapted by the user by setting free parameters and starting values. Note that the likelihood for ill-specified models may not reach the global maximum, so the model fit should be confirmed using different starting values. Although k, the number of different phenotypes, is not restricted in principle, computational demands will typically set a limit at k x n \~ 30,000 for Cholesky decomposition models, where n is the number of observations per trait; models using fewer parameters can handle larger k x n.
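As a rough planning aid, the k x n guideline can be checked before a model is fitted. The sketch below is not part of `grmsem`: the helper name `check_kn_budget` is hypothetical, and treating 30,000 as a hard threshold is an assumption based on the approximate limit quoted for Cholesky decomposition models.

```r
# Hypothetical helper (not part of grmsem): compare k x n with the
# approximate limit of 30,000 quoted for Cholesky decomposition models.
check_kn_budget <- function(k, n, limit = 30000) {
  stopifnot(k >= 1, n >= 1)
  kn <- k * n
  list(
    kn = kn,                     # k x n: observations summed over all traits
    within_limit = kn <= limit,  # TRUE if the rough guideline is met
    max_n = floor(limit / k)     # largest n per trait under the guideline
  )
}

# Example: three traits with 5000 observations per trait
str(check_kn_budget(k = 3, n = 5000))
```

Models with fewer parameters may tolerate a larger k x n, so a `FALSE` result here is a prompt to check resources rather than a hard stop.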
|
|
|
|
|
|
# Installation instructions
|
|
|
Detailed installation instructions can be found [here](Installation).
|
|
|
The calculations performed by the package `grmsem` are computationally very demanding. It is therefore highly recommended to optimise the R installation before installing the package. Within a Linux environment, this can be done by using OpenBLAS (an optimised Basic Linear Algebra Subprograms library), ATLAS (Automatically Tuned Linear Algebra Software) or the Intel MKL (Math Kernel Library) to improve the performance of basic vector and matrix operations (see [here](https://csantill.github.io/RPerformanceWBLAS/) for further information).
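To see whether an optimised library is actually in use, it can help to inspect the running R session. The following is a minimal sketch using base R only; the matrix size of 1000 is an arbitrary choice for a quick timing, and the reported paths depend on the system (on some platforms the BLAS entry may be empty).

```r
# Report which BLAS/LAPACK libraries this R session uses
# (both functions are available in recent versions of base R).
blas_path   <- extSoftVersion()[["BLAS"]]
lapack_path <- La_library()
cat("BLAS:  ", blas_path, "\n")
cat("LAPACK:", lapack_path, "\n")

# Rough timing of a dense matrix cross-product; the size is arbitrary.
set.seed(1)
m <- matrix(rnorm(1000 * 1000), nrow = 1000)
timing <- system.time(cp <- crossprod(m))
cat("Elapsed seconds:", timing[["elapsed"]], "\n")
```

Running the same timing before and after switching the BLAS library gives a simple indication of the speed-up for matrix operations on a given machine.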
|
|
|
|
|
|
The package itself can be installed by standard commands, using either of the two options:
|
|
|
|
|
|
* `install.packages("grmsem")` to obtain the newest release from CRAN
|
|
|
* `devtools::install_git('https://gitlab.gwdg.de/beate.stpourcain/grmsem')` for the current development version.
|
|
|
|
|
|
All data sets can be downloaded from here: `https://gitlab.gwdg.de/beate.stpourcain/grmsem_external`.
|
|
|
`grmsem` should be run in parallel by setting the `cores` parameter of `gsem.fit()`. Empirically, good performance was achieved by setting `cores=4` while sharing memory across as many cores as possible. For this, the entire node should be blocked so that the memory of all its cores (usually 8 to 24, depending on the system) is available for the job. For example, fitting a Cholesky model (12 parameters) to simulated trivariate data with 5000 observations per trait and 20,000 SNPs per genetic factor required 1 h 40 min using 4 cores, sharing memory across 24 cores with a maximum vmem of 6.9 GB, under R 3.6.3 with MKL.
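The parameter count and the choice of `cores` can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's own code: the helper `n_cholesky_par` is hypothetical and assumes one lower-triangular factor matrix each for the genetic and the residual part, and the object names `ph` and `G` in the commented call are placeholders.

```r
# Hypothetical helper: parameters of a Cholesky model with k traits,
# assuming one lower-triangular matrix each for genetic and residual parts.
n_cholesky_par <- function(k) {
  per_matrix <- k * (k + 1) / 2  # free entries of a k x k lower triangle
  2 * per_matrix
}
n_cholesky_par(3)  # 12, the trivariate case

# Use 4 workers where possible, capped by the cores that are detected.
available <- parallel::detectCores()
n_cores <- if (is.na(available)) 1L else min(4L, available)
n_cores

# Assumed call shape; `ph` and `G` are placeholder names for the
# phenotype data and the genetic relationship matrix:
# fit <- gsem.fit(ph, G, model = "Cholesky", cores = n_cores)
```

Capping at the detected number of cores keeps the sketch usable on small machines; on a cluster, the number of cores granted by the scheduler may differ from what `detectCores()` reports.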
|
|
|
|
|
|
# Examples
|
|
|
Multivariate models with `grmsem` are time-consuming, especially with large numbers of observations per trait. To illustrate the functionality of `grmsem`, we carried out several analyses using a range of different [data sets](https://gitlab.gwdg.de/beate.stpourcain/grmsem_external), as described in detail in the vignette. An example of a large data set, with a defined genetic architecture but high run-time, is shown here.
|
|
|
|
|
|
|
|
|
**Quad-variate Cholesky decomposition model**
|
|
|
|
... | ... | |