|
|
# Synopsis
|
|
|
The `grmsem` package is an open-source quantitative genetics tool that supports the modelling of multivariate genetic and residual covariance structures in samples of unrelated individuals with genome-wide genotyping information. `grmsem` fits different models describing the underlying multivariate genetic architecture of quantitative traits, as captured by a genetic relationship matrix (GRM), using structural equation modelling (SEM) techniques and a maximum likelihood approach. Analogous to twin models, the `grmsem` package includes multiple models, such as a `Cholesky decomposition` model, an `Independent Pathway` model and the `Direct Symmetric` model, as well as novel models such as a hybrid `Independent Pathway / Cholesky` model. A general form of each model can be fitted automatically, and the user can adapt each model by changing the pre-fitted parameters. All estimates can be obtained in a standardised form. Follow-up analyses include the estimation of genetic correlations, bivariate heritabilities and factorial co-heritabilities. `grmsem` replaces the package `gsem`, presented in [1], because an unrelated package with the same name was released at the same time.
|
|
|
|
|
|
The user can select pre-specified model structures, including the models
|
|
|
|
... | ... | @@ -11,7 +11,7 @@ by setting the `model` option of `grmsem.fit()` to `Cholesky`, `IP` or `IPC` res |
|
|
|
|
|
- Direct Symmetric (DS)
|
|
|
|
|
|
model, estimating genetic and residual covariances directly, using the `model` option `DS`. Like GREML, `grmsem` fits all available data to the model. The user can adapt each model by setting the vector of fitted parameters.
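As a rough illustration of the `model` option described above, the following sketch selects one of the four model names and passes it to `grmsem.fit()`. It assumes that the bundled `small` data set is exposed as the objects `ph.small` and `G.small` and that phenotypes and GRM are the first two arguments; both are assumptions rather than details confirmed here. The call is only attempted if `grmsem` is installed.

```r
# Sketch under stated assumptions, not the package's reference workflow.
models <- c("Cholesky", "IP", "IPC", "DS")  # model names mentioned in the text
model  <- "DS"                              # Direct Symmetric
stopifnot(model %in% models)

if (requireNamespace("grmsem", quietly = TRUE)) {
  library(grmsem)
  # Assumption: `ph.small` (phenotypes) and `G.small` (GRM) ship with the package.
  fit <- grmsem.fit(ph.small, G.small, model = model)
  str(fit, max.level = 1)  # inspect the top-level structure of the fitted object
}
```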
|
|
|
|
|
|
# Download and installation
|
|
|
## Package
|
... | ... | @@ -20,7 +20,7 @@ The calculations performed by the package `grmsem` are computationally demanding |
|
|
* `install.packages("grmsem")` (latest CRAN release)
|
|
|
* `devtools::install_git('https://gitlab.gwdg.de/beate.stpourcain/grmsem')` (development version)
|
|
|
|
|
|
`grmsem` should be run in parallel by setting the `cores` option of `grmsem.fit()`. Empirically, good performance was achieved with `cores=4`, while sharing memory across as many cores as possible. For this, the entire node should be blocked, so that the memory of all its cores is available for the job (typically 8 to 24 cores, depending on the system).
|
|
|
Run times and memory requirements for different examples are detailed below.
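A minimal sketch of how a core count might be chosen before fitting is given here. The cap of 4 follows the empirical recommendation above; the use of `parallel::detectCores()` and the objects `ph.small` and `G.small` are assumptions made for illustration only.

```r
# Sketch: pick a value for the `cores` option of grmsem.fit().
library(parallel)

available <- detectCores()  # may return NA on some platforms
n_cores   <- if (is.na(available)) 1L else min(4L, available)

if (requireNamespace("grmsem", quietly = TRUE)) {
  library(grmsem)
  # Assumption: the bundled small data set is available as ph.small / G.small.
  fit <- grmsem.fit(ph.small, G.small, cores = n_cores)
}
```

On a shared cluster, the number of detected cores may exceed what the scheduler has allocated to the job, so the value is best checked against the job's reservation.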
|
|
|
|
|
|
## Data sets
|
... | ... | @@ -31,7 +31,7 @@ The `small` data set is already included in the package. All data sets used in t |
|
|
A Genetic Relationship Matrix (GRM) is a symmetric matrix with entries representing the (standardized) number of mutually shared alleles among individuals of a sample. A GRM for unrelated individuals (relatedness cut-off $\leq$ 0.05) with genome-wide information can be estimated using [PLINK](https://www.cog-genomics.org/plink2) or [GCTA](https://cnsgenomics.com/software/gcta/#Overview) software. Its lower triangle elements can be saved in two different forms:
|
|
|
|
|
|
* **grm.gz files:**
|
|
|
The lower triangle GRM elements are generated and saved with the [GCTA](https://cnsgenomics.com/software/gcta/#Overview) `--make-grm-gz` command. `grm.gz` files have no header and contain four columns: the indices of each pair of individuals (columns 1 and 2, corresponding to row numbers of the `grm.id` file), the number of non-missing SNPs (column 3) and the estimate of genetic relatedness (column 4). The compressed GRM file (`grm.gz`) can be imported using the `grm.input()` function, specifying the file name in the working directory, and will be returned as a symmetric matrix. An example is shown below:
|
|
|
|
|
|
```{r eval = FALSE}
|
|
|
> G <- grm.input("large.gcta.grm.gz")
|
... | ... | @@ -43,7 +43,7 @@ The lower triangle GRM elements are generated and saved with the [GCTA](https:// |
|
|
```
|
|
|
|
|
|
* **grm-bin files:**
|
|
|
The lower triangle GRM elements are saved in binary form using the [GCTA](https://cnsgenomics.com/software/gcta/#Overview) `--make-grm-bin` command. The binary `grm.bin` file can be imported using the `grm.bin.input()` function, specifying the file name in the working directory, and will be returned as a symmetric matrix.
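To make the four-column `grm.gz` layout described above more concrete, the sketch below rebuilds a symmetric matrix from a toy lower triangle in base R. It is not the implementation of `grm.input()`; the object names and values are invented for illustration.

```r
# Toy lower-triangle GRM for 3 individuals in the four-column layout:
# index 1, index 2, number of non-missing SNPs, relatedness estimate.
grm_long <- data.frame(
  i   = c(1, 2, 2, 3, 3, 3),
  j   = c(1, 1, 2, 1, 2, 3),
  n   = rep(1000, 6),
  rel = c(1.01, 0.02, 0.99, -0.01, 0.03, 1.00)
)

n_ind <- max(grm_long$i)
G_toy <- matrix(NA_real_, nrow = n_ind, ncol = n_ind)

# Fill the lower triangle (including the diagonal), then mirror it.
G_toy[cbind(grm_long$i, grm_long$j)] <- grm_long$rel
G_toy[cbind(grm_long$j, grm_long$i)] <- grm_long$rel

stopifnot(isSymmetric(G_toy))
G_toy
```

Row and column `m` of the resulting matrix correspond to row `m` of the accompanying `.id` file, which is why the phenotype rows have to follow the same order.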
|
|
|
|
|
|
## Phenotype file
|
|
|
**Z-standardised** scores are required in the form of tables (data frames), with each column representing a different phenotype. The number of columns determines the number of phenotypes **k** in the model. The observations must be **in the same order** as the individuals in the columns/rows of the GRM. This order is shown, for example, in the [GCTA](https://cnsgenomics.com/software/gcta/#Overview) .id file (2-column file: family ID, individual ID). An example of a quad-variate phenotype is shown below:
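As a preliminary to such a phenotype table, the ordering and standardisation requirements can be sketched in base R with a bivariate toy example. The identifiers, column names and values are hypothetical; only `match()` and `scale()` from base R are used.

```r
# Sketch: bring phenotypes into GRM order, then Z-standardise each column.
set.seed(1)
grm_ids <- c("id3", "id1", "id2", "id4")  # hypothetical order taken from the .id file

raw <- data.frame(
  IID = c("id1", "id2", "id3", "id4"),
  Y1  = rnorm(4, mean = 10, sd = 2),
  Y2  = rnorm(4, mean = 50, sd = 5)
)

raw <- raw[match(grm_ids, raw$IID), ]                  # reorder rows to GRM order
stopifnot(identical(raw$IID, grm_ids))

ph_toy <- as.data.frame(scale(raw[, c("Y1", "Y2")]))  # mean 0, SD 1 per column
k <- ncol(ph_toy)                                      # number of phenotypes
```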
|
... | ... | @@ -60,7 +60,7 @@ The lower triangle GRM elements are saved in binary form using the [GCTA](https: |
|
|
To illustrate the functionality of `grmsem`, we carried out several analyses using a range of different [data sets](https://gitlab.gwdg.de/beate.stpourcain/grmsem_external), as described in detail in the vignette.
|
|
|
|
|
|
## Quad-variate Cholesky decomposition model
|
|
|
An example of a [large data set](https://gitlab.gwdg.de/beate.stpourcain/grmsem_external), with a defined genetic architecture but a long run time, is shown below, based on the files `G.large.RData`, `ph.large.RData` and a pre-fitted output model, `fit.large.RData`.
|
|
|
|
|
|
* [Data simulation](Data simulation)
|
|
|
* [Model fit and output](Model fit and output)
|
... | ... | |