RESEARCH ARTICLE
Fast and flexible linear mixed models for
genome-wide genetics
Abstract
Linear mixed effect models are powerful tools used to account for population structure in
genome-wide association studies (GWASs) and estimate the genetic architecture of com-
plex traits. However, fully-specified models are computationally demanding and common
simplifications often lead to reduced power or biased inference. We describe Grid-LMM, an
extendable algorithm for repeatedly fitting complex linear models that account for
multiple sources of heterogeneity, such as additive and
non-additive genetic variance, spatial heterogeneity, and genotype-environment interac-
tions. Grid-LMM can compute approximate (yet highly accurate) frequentist test statistics
or Bayesian posterior summaries at a genome-wide scale in a fraction of the time compared
to existing general-purpose methods. We apply Grid-LMM to two types of quantitative
genetic analyses. The first is focused on accounting for spatial variability and non-additive
genetic variance while scanning for QTL; and the second aims to identify gene expression
traits affected by non-additive genetic variation. In both cases, modeling multiple sources of
heterogeneity leads to new discoveries.
Author summary
The goal of quantitative genetics is to characterize the relationship between genetic varia-
tion and variation in quantitative traits such as height, productivity, or disease susceptibil-
ity. A statistical method known as the linear mixed effect model has been critical to the
development of quantitative genetics. First applied to animal breeding, this model now
forms the basis of a wide range of modern genomic analyses including genome-wide
associations, polygenic modeling, and genomic prediction. The same model is also widely
used in ecology, evolutionary genetics, social sciences, and many other fields. Mixed mod-
els are frequently multi-faceted, which is necessary for accurately modeling data that is
generated from complex experimental designs. However, most genomic applications use
only the simplest form of linear mixed methods because the computational demands for
model fitting can be too great. We develop a flexible approach for fitting linear mixed
models to genome scale data that greatly reduces their computational burden and pro-
vides flexibility for users to choose the best statistical paradigm for their data analysis. We
demonstrate improved accuracy for genetic association tests, increased power to discover
causal genetic variants, and the ability to provide accurate summaries of model
uncertainty using both simulated and real data examples.
PLOS Genetics
OPEN ACCESS
Citation: Runcie DE, Crawford L (2019) Fast and flexible linear mixed models for
genome-wide genetics. PLoS Genet 15(2): e1007978.
https://doi.org/10.1371/journal.pgen.1007978
Editor: Michael P. Epstein, Emory University, UNITED STATES
Received: September 7, 2018
Accepted: January 21, 2019
Published: February 8, 2019
Copyright: © 2019 Runcie, Crawford. This is an open access article distributed under the
terms of the Creative Commons Attribution License, which permits unrestricted use,
distribution, and reproduction in any medium, provided the original author and source are
credited.
Data Availability Statement: In this study, we used published data from three sources as
examples to demonstrate our methods. Arabidopsis phenotype and genotype data on 199
accessions were accessed from:
https://github.com/Gregor-Mendel-Institute/atpolydb/wiki. Body mass and genotype data of
the heterogeneous stock mice from the Wellcome Trust Centre for Human Genetics were
accessed through the R package BGLR. Arabidopsis gene expression data from the 1001
genomes project were accessed from NCBI GEO, accession GSE80744.
Introduction
Population stratification, genetic relatedness, ascertainment, and other sources of heterogene-
ity lead to spurious signals and reduced power in genetic association studies [1–5]. When not
properly taken into account, non-additive genetic effects and environmental variation can also
bias estimates of heritability, polygenic adaptation, and genetic values in breeding programs
[5–8]. Both issues are caused by departures from a key assumption underlying linear models:
that observations are independent. Non-independent samples lead to a form of
pseudo-replication, effectively reducing the true sample size. Linear mixed effect models (LMMs) are
widely used to account for non-independent samples in quantitative genetics [9]. The flexibil-
ity and interpretability of LMMs make them a dominant statistical tool in much of biological
research [9–18].
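The effect of pseudo-replication on sample size can be illustrated with a small numerical sketch. The Python code below is not part of Grid-LMM and is only one way to quantify the idea. It assumes unit-variance observations with a known correlation matrix K, and it defines the effective sample size for estimating a common mean as 1'K⁻¹1, the inverse variance of the generalized least squares estimate of that mean. The helper names `effective_sample_size` and `equicorrelated` are hypothetical and introduced here for illustration.

```python
import numpy as np


def effective_sample_size(K):
    """Effective sample size for estimating a common mean.

    Assumes unit-variance observations with correlation matrix K.
    The generalized least squares mean has variance 1 / (1' K^{-1} 1),
    so 1' K^{-1} 1 plays the role that n plays for independent samples.
    """
    K = np.asarray(K, dtype=float)
    ones = np.ones(K.shape[0])
    # Solve K x = 1 rather than forming the explicit inverse of K.
    return float(ones @ np.linalg.solve(K, ones))


def equicorrelated(n, rho):
    """n x n correlation matrix with every off-diagonal entry equal to rho."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))


if __name__ == "__main__":
    n = 100
    for rho in (0.0, 0.05, 0.2):
        n_eff = effective_sample_size(equicorrelated(n, rho))
        print(f"rho = {rho:.2f}: effective sample size = {n_eff:.2f}")
```

Under these assumptions, 100 independent samples have an effective sample size of 100, while 100 samples with a pairwise correlation of 0.2 behave like roughly 5 independent samples, since in the equicorrelated case the quantity reduces to n / (1 + (n - 1)ρ). Real relatedness matrices are not equicorrelated, so this is a toy case rather than a description of any dataset analyzed here.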