@@ -8,18 +8,18 @@ The user can select different pre-specified model structures, including
|
* [Combined independent pathway and Cholesky model](IPC model)
|
|
* [Common pathway model](Common pathway model)
|
|
|
|
|
|
by setting the `model` parameter of `gsem.fit()` to `Cholesky`, `Independent`, `IPC` or `Common` respectively. `grmsem` fits, like GREML, all available data to the model. Each model can be adapted by the user by setting free parameters and starting values. Note that the likelihood for ill-specified models may not necessarily reach the global maximum and the model fit should be confirmed using different starting values. Although k, the number of different phenotypes is not restricted, in principle, computational demands will typically set a limit based on k x n \~ 30,000 for Cholesky decomposition models, where n is the number of observations per trait; models using less parameters can handle larger k x n.
|
|
by setting the `model` parameter of `grmsem.fit()` to `Cholesky`, `Independent`, `IPC` or `Common`, respectively. Like GREML, grmsem fits all available data to the model. Each model can be adapted by the user by setting free parameters and starting values. Note that the likelihood for ill-specified models may not reach the global maximum, so the model fit should be confirmed using different starting values. Although k, the number of different phenotypes, is not restricted in principle, computational demands will typically set a limit of about k x n \~ 30,000 for Cholesky decomposition models, where n is the number of observations per trait; models using fewer parameters can handle larger k x n.
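The k x n guideline can be checked with a little arithmetic before starting a long run. The following is a minimal sketch in base R; the helper names `cholesky_n_par()` and `kn_size()` are hypothetical and not part of grmsem, and the parameter count assumes a saturated Cholesky model with one lower-triangular k x k loading matrix each for the genetic and the residual part.

```r
# Hypothetical helper (not part of grmsem): number of free parameters in a
# saturated Cholesky model, assuming k(k+1)/2 loadings for the genetic part
# and k(k+1)/2 loadings for the residual part.
cholesky_n_par <- function(k) {
  stopifnot(k >= 1)
  2 * k * (k + 1) / 2
}

# Hypothetical helper: the size measure k x n used as a rule of thumb,
# where n is the number of observations per trait.
kn_size <- function(k, n) k * n

cholesky_n_par(3)   # 12 parameters for a trivariate model
kn_size(3, 5000)    # 15000, below the guideline of about 30,000
```

For a trivariate model this gives 12 parameters, which agrees with the Cholesky example quoted in this wiki.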
|
|
|
|
|
|
# Download and installation
|
|
## Package
|
|
The calculations performed by the package `grmsem` are computationally very demanding. Hence it is highly recommended to optimise the R software prior to installing the package. This can be done within a Linux environment using OpenBLAS (an optimised Basic Linear Algebra Subsystem library), ATLAS (Automatically Tuned Linear Algebra Software) or the Intel MKL (Math Kernel Library) to improve the performance of basic vector and matrix operations (see [here](https://csantill.github.io/RPerformanceWBLAS/) for further information).
|
|
The calculations performed by the package grmsem are computationally very demanding. Hence, it is highly recommended to optimise the R installation prior to installing the package. This can be done within a Linux environment using OpenBLAS (an optimised Basic Linear Algebra Subprograms library), ATLAS (Automatically Tuned Linear Algebra Software) or the Intel MKL (Math Kernel Library) to improve the performance of basic vector and matrix operations (see [here](https://csantill.github.io/RPerformanceWBLAS/) for further information).
|
|
|
|
|
|
The package itself can be installed by standard commands, using either of the two options:
|
|
|
|
|
|
* `install.packages("grmsem")` to obtain the newest release from CRAN
|
|
* `install.packages("grmsem")` (latest CRAN release)
|
|
* `devtools::install_git('https://gitlab.gwdg.de/beate.stpourcain/grmsem')` for the current development version.
|
|
* `devtools::install_git('https://gitlab.gwdg.de/beate.stpourcain/grmsem')` (development version)
|
|
|
|
|
|
`grmsem` should be run in parallel by setting the `cores` parameter of `gsem.fit()`. Empirically, good performance was achieved by setting `cores=4`, while sharing memory across as many cores as possible. For this, the entire node should be blocked so that memory across all cores is available for the job (depending on the system ranging usually between 8-24 cores). For example, the fit of a Cholesky model (12 parameters) to simulated trivariate data with 5000 observations per trait and 20,000 SNPs per genetic factor, requires 1h40min using 4 cores, sharing memory across 24 cores with vmem max of 6.9 Gb, using R MKL 3.6.3.
|
|
grmsem estimations should be run in parallel by setting the `cores` parameter of `grmsem.fit()`. Empirically, good performance can be achieved by setting `cores=4`, while sharing memory across as many cores as possible. For this, the entire node should be blocked so that memory across all cores is available for the job (depending on the system, usually 8-24 cores). For example, fitting a Cholesky model (12 parameters) to simulated trivariate data with 5000 observations per trait and 20,000 SNPs per genetic factor requires 1h40min using 4 cores, sharing memory across 24 cores, with a maximum vmem of 6.9 GB, using R MKL 3.6.3.
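A call along these lines might look as follows. This is a sketch under stated assumptions rather than a reference for the package interface: it assumes that the small example data sets are available as `ph.small` and `G.small` once the package is loaded, and that `grmsem.fit()` accepts the arguments `LogL` and `estSE` in addition to the `model` and `cores` parameters described in this wiki. Please check `?grmsem.fit` for the installed version.

```r
# Sketch only: runs the fit if the grmsem package is installed.
# ph.small / G.small (phenotypes and GRM) and the LogL / estSE arguments
# are assumptions about the installed package version.
if (requireNamespace("grmsem", quietly = TRUE)) {
  library(grmsem)
  fit <- grmsem.fit(ph.small, G.small,
                    model = "Cholesky",   # model structure, see above
                    LogL = TRUE,          # assumed: report the log-likelihood
                    estSE = TRUE,         # assumed: estimate standard errors
                    cores = 4)            # parallel workers, as recommended above
  print(fit)
}
```

On a cluster, the job request itself (blocking a full node) is made through the scheduler and is independent of the `cores` value passed to R.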
|
|
|
|
|
|
## Data sets
|
|
All data sets used in the vignette and this wiki can be downloaded from here:
|
@@ -34,10 +34,10 @@ Note that small data sets are already included in the package.
|
A GRM consisting of pairs of unrelated individuals (with a 0.05 relatedness cut-off or lower) with genome-wide genotyping information can be estimated using [PLINK](https://www.cog-genomics.org/plink2) or [GCTA](https://cnsgenomics.com/software/gcta/#Overview) software. The lower triangle elements of the GRM can be saved in two different forms:
|
|
|
|
|
|
* **grm.gz files:**
|
|
These files have no header line and contain three columns, which are indices of pairs of individuals, number of non-missing SNPs and the estimate of genetic relatedness. The files are generated with the [GCTA](https://cnsgenomics.com/software/gcta/#Overview) `--make-grm-gz` command. The unzipped GRM file can be imported into `grmsem` using the `grm.input.R()` function.
|
|
These files have no header line and contain four columns: the indices of each pair of individuals (two columns), the number of non-missing SNPs and the estimate of genetic relatedness. The files are generated with the [GCTA](https://cnsgenomics.com/software/gcta/#Overview) `--make-grm-gz` command. The unzipped GRM file can be imported into grmsem using the `grm.input.R()` function.
|
|
|
|
|
|
* **grm-bin files:**
|
|
The GRM is saved in binary form and generated with the [GCTA](https://cnsgenomics.com/software/gcta/#Overview) `--make-grm-bin` command. The binary `grm-bin` file can be imported into `grmsem` using the `grm.bin.input.R()` function.
|
|
The GRM is saved in binary form and generated with the [GCTA](https://cnsgenomics.com/software/gcta/#Overview) `--make-grm-bin` command. The binary `grm-bin` file can be imported into grmsem using the `grm.bin.input.R()` function.
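To make the text layout of a `grm.gz` file concrete, the sketch below writes a toy lower-triangle GRM for three individuals and rebuilds the full symmetric matrix in base R. It does not use the grmsem import functions; the toy values, the column names and the assumed field order (index i, index j, number of non-missing SNPs, relatedness) are illustrative assumptions.

```r
# Toy GRM for 3 individuals in the assumed grm.gz text layout:
# index i, index j (j <= i), number of non-missing SNPs, relatedness
grm_txt <- c("1 1 1000 1.01",
             "2 1 1000 0.02",
             "2 2 1000 0.99",
             "3 1 1000 -0.01",
             "3 2 1000 0.03",
             "3 3 1000 1.00")
tmp <- tempfile(fileext = ".grm.gz")
con <- gzfile(tmp, "w")
writeLines(grm_txt, con)
close(con)

# Read the compressed file; column names are chosen here for illustration
tri <- read.table(gzfile(tmp), header = FALSE,
                  col.names = c("i", "j", "n_snps", "rel"))

# Fill the lower triangle (including the diagonal), then mirror it
n <- max(tri$i)
G <- matrix(NA_real_, n, n)
G[cbind(tri$i, tri$j)] <- tri$rel
G[cbind(tri$j, tri$i)] <- tri$rel
G
```

The rows and columns of `G` follow the order of individuals in the accompanying ID file, which is the order the phenotype data must also follow.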
|
|
|
|
|
|
## Phenotype files
|
|
**Z-standardised** phenotype files are created in wide-format and need to be imported with individual observations **in the order** of the constructed GRM matrix, as given, for example, in [GCTA](https://cnsgenomics.com/software/gcta/#Overview) grm.id files (2-column file: family ID, individual ID).
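The two requirements above (GRM order and Z-standardisation) can be sketched in base R as follows. All object names, IDs and values are hypothetical toy data; in practice `grm_id` would be read from the grm.id file and `pheno` from the user's own phenotype table.

```r
# Hypothetical toy inputs: IDs in GRM order (as in a grm.id file) and a
# wide-format phenotype table whose rows are in a different order.
grm_id <- data.frame(FID = c("f1", "f2", "f3", "f4"),
                     IID = c("i1", "i2", "i3", "i4"),
                     stringsAsFactors = FALSE)
pheno <- data.frame(IID = c("i3", "i1", "i4", "i2"),
                    Y1 = c(2.0, 1.0, 4.0, 3.0),
                    Y2 = c(10, NA, 30, 20),
                    stringsAsFactors = FALSE)

# Reorder the rows so that they follow the order of individuals in the GRM
ord <- match(grm_id$IID, pheno$IID)
ph <- pheno[ord, c("Y1", "Y2")]

# Z-standardise each trait (mean 0, SD 1); missing values are ignored
ph <- as.data.frame(scale(ph))
rownames(ph) <- grm_id$IID
ph
```

Individuals that appear in the GRM but not in the phenotype table would end up as rows of missing values after `match()`, which keeps the row order aligned with the GRM.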
|
@@ -69,7 +69,7 @@ An example of a [large data set](https://gitlab.gwdg.de/beate.stpourcain/grmsem_
|
* [Bivariate heritabilities](Bivariate heritabilities)
|
|
|
|
|
|
## Comparison of bi-variate GCTA and GRMSEM estimates
|
|
The software [gcta](https://cnsgenomics.com/software/gcta/) [@Yang2011] can be used to estimate bivariate genetic correlations based on variance/covariance estimates for two traits. Transforming the quad-variate simulated data `large` gsem data above into GCTA format (`large.gcta.grm.gz`,`large.gcta.phe`, `large.gcta.grm.id`), the bivariate model estimates from `GREML` and `grmsem` analyses can be compared, [here shown for traits Y1 and Y2](GCTA GSEM comparison).
|
|
The software [GCTA](https://cnsgenomics.com/software/gcta/) [@Yang2011] can be used to estimate bivariate genetic correlations based on variance/covariance estimates for two traits. After transforming the quad-variate simulated `large` grmsem data set above into GCTA format (`large.gcta.grm.gz`, `large.gcta.phe`, `large.gcta.grm.id`), the bivariate model estimates from `GREML` and `grmsem` analyses can be compared, [here shown for traits Y1 and Y2](GCTA GRMSEM comparison).
|
|
|
|
|
|
# References
|
|
|
|
|