The grmsem package is an open-source quantitative genetics tool that supports the modelling of multivariate genetic and residual covariance structures in samples of unrelated individuals with genome-wide genotyping information.
grmsem allows fitting different models describing the underlying multivariate genetic architecture of quantitative traits, as captured by a genetic-relationship-matrix (GRM), using structural equation modelling (SEM) techniques and a maximum likelihood approach. Analogous to twin models, the grmsem package includes multiple models, such as a Cholesky decomposition model, an Independent Pathway model and the Direct Symmetric model, but also novel models such as a hybrid Independent Pathway / Cholesky model. A general form of these models can be automatically fitted. The user can adapt each model by changing the pre-fitted parameters. All estimates can be obtained in standardised form. Follow-up analyses include estimations of genetic correlations, bivariate heritabilities and factorial co-heritabilities.
grmsem replaces the package gsem, presented in , because an unrelated package with the same name had been released simultaneously.
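The user can select pre-specified model structures, including the Cholesky decomposition, Independent Pathway (IP) and hybrid Independent Pathway / Cholesky (IPC) models, by setting the model option of grmsem.fit() to Cholesky, IP or IPC respectively. In addition, the Cholesky model can be re-parametrised as a Direct Symmetric (DS) model, estimating genetic and residual covariances directly, using the model option DS.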
grmsem fits, like GREML, all available data to the model. Each model can be adapted by the user by setting free parameters and starting values.
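The link between the Cholesky and Direct Symmetric parametrisations can be illustrated in base R. This is a minimal sketch, not grmsem code: the matrix A_true and its values are made up, and it assumes the usual Cholesky convention in which a covariance matrix is the product of a lower-triangular loading matrix and its transpose.

```r
# Illustrative 3-trait genetic covariance matrix (made-up values, not estimates)
A_true <- matrix(c(0.50, 0.20, 0.10,
                   0.20, 0.40, 0.15,
                   0.10, 0.15, 0.30), nrow = 3, byrow = TRUE)

# Cholesky-type parametrisation: lower-triangular loadings L with A = L L'
# (chol() returns the upper-triangular factor, so it is transposed here)
L <- t(chol(A_true))

# Direct Symmetric-type parametrisation: the free parameters are the
# k(k+1)/2 unique variances and covariances themselves
ds_params <- A_true[lower.tri(A_true, diag = TRUE)]

# Both parametrisations carry the same number of free parameters
n_chol <- sum(lower.tri(L, diag = TRUE))
n_ds <- length(ds_params)

# The loadings reproduce the covariance matrix
A_from_L <- L %*% t(L)
```

The same reasoning applies to the residual covariance matrix, which is why one parametrisation can be rewritten as the other without changing the model-implied covariances.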
Download and installation
The calculations performed by the package grmsem are computationally demanding. Hence, it is highly recommended to optimise the R software prior to installing the package. Within a Linux environment, this can be done using OpenBLAS (an optimised Basic Linear Algebra Subprograms library), ATLAS (Automatically Tuned Linear Algebra Software) or the Intel MKL (Math Kernel Library), all of which improve the performance of basic vector and matrix operations. The package itself can be installed by standard commands, using either of the two options:
- install.packages("grmsem") (latest CRAN release)
grmsem should be run in parallel by setting the cores option of grmsem.fit(). Empirically, good performance was achieved with cores=4, while sharing memory across as many cores as possible. For this, the entire node should be reserved so that the memory of all its cores is available to the job (usually 8-24 cores, depending on the system).
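A minimal sketch of how the number of cores might be chosen before fitting is given below. It only uses the base parallel package; the variable names are illustrative, and the cap of 4 simply mirrors the empirical value mentioned above.

```r
library(parallel)

# Number of logical cores R can see; detectCores() may return NA on some systems
available <- detectCores()
if (is.na(available)) available <- 1L

# Use at most 4 cores, the value reported above to perform well empirically
n_cores <- min(4L, available)

# The value would then be passed on, e.g. grmsem.fit(..., cores = n_cores)
```

On a shared cluster, detectCores() reports the cores of the whole node rather than those allocated to the job, so the scheduler settings should be checked as well.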
Run times and memory requirements for different examples are detailed below.
A small data set is already included in the package. All data sets used in the vignette can be downloaded here.
Genetic relationship matrix
A Genetic Relationship Matrix (GRM) is a symmetric matrix with entries representing the (standardised) number of mutually shared alleles among individuals of a sample. A GRM of unrelated individuals (pairwise relatedness cut-off $\leq$ 0.05) with genome-wide information can be estimated using PLINK or GCTA software. The lower triangle elements of the GRM are saved in one of two forms:
The lower triangle GRM elements are generated and saved with the GCTA --make-grm-gz command. The grm.gz files have no header and contain four columns: indices of pairs of individuals (columns 1 and 2, corresponding to row numbers of the grm.id files), number of non-missing SNPs (column 3) and the estimate of genetic relatedness (column 4). The compressed GRM file (grm.gz) can be imported using the grm.input() function, specifying the file name in the working directory, and will be returned as a symmetric matrix. An example is shown below:
    > G <- grm.input("large.gcta.grm.gz")
    > G[1:3,1:3] # Relationships among the first three individuals
               [,1]       [,2]       [,3]
    [1,] 0.99354762 0.02328514 0.01644197
    [2,] 0.02328514 0.99406837 0.01021175
    [3,] 0.01644197 0.01021175 1.02751472
The lower triangle GRM elements are saved in binary form using the GCTA --make-grm-bin command. The binary grm.bin file can be imported using the grm.bin.input() function, specifying the file name in the working directory, and will be returned as a symmetric matrix.
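How the four-column lower-triangle layout maps onto a symmetric matrix can be sketched in base R. The helper grm_from_long() below is hypothetical and not part of grmsem; the relatedness values are taken from the example output above, while the SNP counts are made up.

```r
# Hypothetical helper (not part of grmsem): rebuild a symmetric matrix from
# GCTA-style lower-triangle rows (index 1, index 2, non-missing SNPs, relatedness)
grm_from_long <- function(long) {
  n <- max(long[[1]], long[[2]])
  G <- matrix(NA_real_, nrow = n, ncol = n)
  idx <- cbind(long[[1]], long[[2]])
  G[idx] <- long[[4]]                        # lower triangle and diagonal
  G[idx[, 2:1, drop = FALSE]] <- long[[4]]   # mirror into the upper triangle
  G
}

# Toy input for three individuals; the SNP counts in column 3 are made up
long <- data.frame(
  i = c(1, 2, 2, 3, 3, 3),
  j = c(1, 1, 2, 1, 2, 3),
  n_snps = rep(1000, 6),
  rel = c(0.99354762, 0.02328514, 0.99406837,
          0.01644197, 0.01021175, 1.02751472)
)
G_toy <- grm_from_long(long)

# Sanity checks worth running on any imported GRM
is_sym <- isSymmetric(G_toy)
max_offdiag <- max(G_toy[lower.tri(G_toy)])  # compare against the 0.05 cut-off
```

The same two checks (symmetry and the largest off-diagonal element) can be applied to a matrix returned by either import function.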
Phenotypes are required as Z-standardised scores in the form of tables (data frames), with each column representing a different phenotype. The number of columns determines the number of phenotypes k in the model. The observations must be in the same order as the individuals in the columns/rows of the GRM. This order is shown, for example, in the GCTA .id file (a two-column file: family ID, individual ID). An example of a quad-variate phenotype is shown below:
    > load("ph.large.RData")
    > ph.large[1:2,]
                 Y1         Y2         Y3         Y4
    [1,] -0.7640819 -0.6016908 -0.3981901 -0.3169821
    [2,] -0.5099606  0.6671311 -1.3119328 -0.5601261
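The two requirements above (Z-standardised columns, rows in GRM order) can be prepared in base R. The following is a sketch under stated assumptions: the IDs and raw scores are simulated, and grm_ids stands in for the individual IDs read from a GCTA .id file.

```r
set.seed(1)

# Made-up raw phenotypes for five individuals and two traits
ids <- c("id3", "id1", "id5", "id2", "id4")
raw <- data.frame(Y1 = rnorm(5, mean = 10, sd = 2),
                  Y2 = rnorm(5, mean = 50, sd = 8),
                  row.names = ids)

# Individual order of the GRM, e.g. column 2 of a GCTA .id file (made up here)
grm_ids <- c("id1", "id2", "id3", "id4", "id5")

# Reorder the phenotype rows to match the GRM, then Z-standardise each column
ph <- raw[match(grm_ids, rownames(raw)), , drop = FALSE]
ph <- as.data.frame(scale(ph))

# Number of phenotypes k in the model
k <- ncol(ph)
```

Reordering before standardising keeps the row names attached, which makes it easy to confirm that the phenotype rows and the GRM rows refer to the same individuals.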
To illustrate the functionality of grmsem, we carried out several analyses using a range of different data sets, as described in detail in the vignette.
Quad-variate Cholesky decomposition model
An example of a large data set, with a defined genetic architecture but high run-time, is shown below, based on the files ph.large.RData and a pre-fitted output model. The example covers:
- Data simulation
- Model fit and output
- Standardisation of parameters
- Factorial co-heritabilities and environmentalities
- Bivariate heritabilities
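The follow-up quantities listed above can be illustrated in base R from two made-up loading matrices. This is not grmsem output and not the package's implementation. The sketch assumes the following definitions: a genetic correlation is a standardised genetic covariance, a bivariate heritability is the genetic share of a phenotypic covariance, and a factorial co-heritability is the share of a trait's genetic variance attributable to one genetic factor.

```r
# Made-up lower-triangular loadings for two traits (not grmsem estimates)
A_load <- matrix(c(0.6, 0.0,
                   0.3, 0.4), nrow = 2, byrow = TRUE)  # genetic factors
E_load <- matrix(c(0.7, 0.0,
                   0.2, 0.8), nrow = 2, byrow = TRUE)  # residual factors

# Model-implied genetic, residual and phenotypic covariance matrices
A <- A_load %*% t(A_load)
E <- E_load %*% t(E_load)
P <- A + E

# Genetic correlations: standardised genetic covariances
r_g <- cov2cor(A)

# Bivariate heritabilities: genetic share of each phenotypic (co)variance;
# the diagonal holds the univariate heritabilities
biher <- A / P

# Factorial co-heritabilities: share of each trait's genetic variance (rows)
# explained by each genetic factor (columns); each row sums to 1
fcoher <- A_load^2 / diag(A)
```

Factorial co-environmentalities follow the same pattern, with E_load and E in place of A_load and A.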
Comparison of bivariate GREML and GRMSEM estimates
The software GCTA can be used to estimate bivariate genetic correlations based on variance/covariance estimates for two traits. After transforming the quad-variate simulated large data set above into GCTA format (large.gcta.grm.id), the bivariate model estimates from GREML and grmsem analyses can be compared, here shown for traits Y1 and Y2.
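Writing phenotypes in the plain-text layout that GCTA reads can be sketched in base R. The IDs, scores and file name below are made up for illustration; the layout (no header, family ID, individual ID, then one column per trait) follows the GCTA phenotype file convention.

```r
# Made-up IDs and phenotype scores for four individuals
ids <- data.frame(FID = paste0("fam", 1:4), IID = paste0("ind", 1:4))
ph <- data.frame(Y1 = c(-0.76, -0.51, 0.32, 0.95),
                 Y2 = c(-0.60, 0.67, NA, 0.11))

# GCTA phenotype files are plain text without a header:
# family ID, individual ID, then one column per trait (missing values as NA)
phe_file <- file.path(tempdir(), "toy.phe")  # hypothetical file name
write.table(cbind(ids, ph), file = phe_file,
            quote = FALSE, row.names = FALSE, col.names = FALSE, sep = "\t")

# Read the file back to check the layout
phe_check <- read.table(phe_file, header = FALSE, stringsAsFactors = FALSE)
```

With the matching GRM files in place, a bivariate GREML of the first two traits is typically requested in GCTA with --reml-bivar 1 2, pointing --pheno at such a file.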
- St Pourcain, B. et al. Developmental Changes Within the Genetic Architecture of Social Communication Behavior: A Multivariate Study of Genetic Variance in Unrelated Individuals. Biological Psychiatry 83, 598–606 (2018). doi:10.1016/j.biopsych.2017.09.020
- Shapland, C. Y. et al. The Multivariate Genetic Architecture of Literacy-, Language- and Working Memory-related Abilities as Captured by Genome-wide Variation. bioRxiv 2020.08.14.251199 (2020). doi:10.1101/2020.08.14.251199