  Review Article

Classical twin studies and beyond

Key Points

  • Combining twin studies with state-of-the-art molecular genetics results in a powerful approach to the genetics of complex traits. Large twin registers are available in various countries, offering collaborative opportunities for studies of medical, as well as psychological, traits.

  • The study of complex multifactorial traits and diseases in humans faces several problems, including ascertainment bias, the interaction of genes and the environment, and changes in traits over time. Twin studies can address many of these problems in a systematic manner.

  • Beyond the classical estimation of the heritability of single traits, twin studies might now use multivariate techniques to analyse co-morbidity between diseases, to incorporate genetic and environmental covariates, and to estimate linkage as a sub-fraction of the total heritability.

  • Various twin-study designs are described, including twin-based case–control studies and the use of twins in molecular-genetic studies.

  • Some of the basics of setting up and running a twin register are provided, together with key information on how to contact groups experienced in twin research.


Twin studies have been a valuable source of information about the genetic basis of complex traits. To maximize the potential of twin studies, large, worldwide registers of data on twins and their relatives have been established. Here, we provide an overview of the current resources for twin research. These can be used to obtain insights into the genetic epidemiology of complex traits and diseases, to study the interaction of genotype with sex, age and lifestyle factors, and to study the causes of co-morbidity between traits and diseases. Because of their design, these registers offer unique opportunities for selected sampling for quantitative trait loci linkage and association studies.

Figure 1: Velvet twins.
Figure 2: Examples of results from classical twin analysis.
Figure 3: Twin discordance.

We thank J. F. Orlebeke, A. L. Beem, J. M. Vink, J. J. Hudziak and N. G. Martin for their contributions to this paper.

Author information

Corresponding author

Correspondence to Dorret Boomsma.




Glossary

Ascertainment bias

A systematic distortion in measuring the true frequency of a phenomenon, such as a trait or a disease, owing to the way in which the data are obtained.


Quantitative trait locus (QTL)

Genetic locus or chromosomal region that contributes to the variability in complex quantitative traits, as identified by statistical analysis. Quantitative traits are typically affected by several genes and the environment.


Linkage disequilibrium (LD)

The condition in which the frequency of a particular haplotype for two loci is significantly greater than that expected from the product of the observed allelic frequencies at each locus.
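
The comparison in this definition can be sketched numerically. The snippet below is a minimal illustration, not a method taken from the article: it computes the usual disequilibrium coefficient D (observed haplotype frequency minus the product of the allele frequencies) and the standardised r². The function names and the frequencies are hypothetical, and the significance test that the definition calls for is not included.

```python
def ld_coefficient(p_ab, p_a, p_b):
    """D: observed haplotype frequency minus the product of allele frequencies."""
    return p_ab - p_a * p_b


def ld_r_squared(p_ab, p_a, p_b):
    """Squared correlation between alleles at the two loci."""
    d = ld_coefficient(p_ab, p_a, p_b)
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical frequencies: allele A, allele B, and the A-B haplotype.
p_a, p_b, p_ab = 0.6, 0.5, 0.4

d = ld_coefficient(p_ab, p_a, p_b)
r2 = ld_r_squared(p_ab, p_a, p_b)
print(f"D = {d:.3f}, r^2 = {r2:.3f}")
```

A value of D above zero means the haplotype is more common than expected under independence; whether the excess is significant depends on sample size.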


Longitudinal study

A study in which repeated measurements are taken from the same subjects at different time points.


Genotype × environment interaction (G × E)

The influence of specific combinations of genetic and environmental factors on a trait that goes beyond the additive action of these factors. It refers to genes that control sensitivity to the environment, or the environment that controls gene expression.


Multivariate analysis

The simultaneous inclusion of two or more (dependent) variables in one analysis, for example, in estimating the genetic correlation of birth weight with blood pressure.


Covariate

In a multivariate analysis, a variable with known effects that is included so that the effects of the main variables can be tested independently of those known effects. The inclusion of age in studies of age-dependent traits is a simple example.


Heritability

The proportion of the total phenotypic variation in a given characteristic that can be attributed to genetic effects. In the broad sense, heritability involves all additive and non-additive genetic variance, whereas in the narrow sense, it involves only additive genetic variance.
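
As a rough illustration of how the classical twin design turns this definition into numbers, the sketch below applies Falconer's approximation to monozygotic (MZ) and dizygotic (DZ) twin correlations. The formula is not stated in this section; it is a standard textbook shortcut that assumes purely additive genetic effects, equal environments for MZ and DZ pairs, and random mating. The function name and the correlations are hypothetical.

```python
def falconer_estimates(r_mz, r_dz):
    """Rough variance components from MZ and DZ twin correlations.

    Assumes additive genetic effects, equal environments for both twin
    types, and random mating (assumptions of this sketch, not the article).
    """
    h2 = 2 * (r_mz - r_dz)   # additive genetic share (narrow-sense heritability)
    c2 = 2 * r_dz - r_mz     # shared (common) environment share
    e2 = 1 - r_mz            # non-shared environment plus measurement error
    return h2, c2, e2


# Hypothetical twin correlations for a quantitative trait.
h2, c2, e2 = falconer_estimates(r_mz=0.8, r_dz=0.5)
print(f"h2 = {h2:.2f}, c2 = {c2:.2f}, e2 = {e2:.2f}")
```

The three components sum to one by construction. Model-fitting approaches are generally preferred in practice because they also provide standard errors and tests of fit.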


Concordance

The occurrence of the same trait in both members of a pair of twins. Concordance might occur for diseases as well as for behaviours, such as smoking.
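
Concordance is usually reported as a rate. The sketch below shows two common versions, pairwise and probandwise concordance, computed from counts of concordant and discordant affected pairs. Neither formula is given in this section. The probandwise version as written assumes that every affected twin was ascertained independently, and the counts are hypothetical.

```python
def pairwise_concordance(concordant_pairs, discordant_pairs):
    """Share of pairs with at least one affected twin in which both are affected."""
    return concordant_pairs / (concordant_pairs + discordant_pairs)


def probandwise_concordance(concordant_pairs, discordant_pairs):
    """Share of affected twins whose co-twin is also affected.

    Assumes complete ascertainment, so each concordant pair counts twice.
    """
    affected_with_affected_cotwin = 2 * concordant_pairs
    return affected_with_affected_cotwin / (
        affected_with_affected_cotwin + discordant_pairs
    )


# Hypothetical counts: 20 concordant and 60 discordant pairs.
pairwise = pairwise_concordance(20, 60)
probandwise = probandwise_concordance(20, 60)
print(f"pairwise = {pairwise:.2f}, probandwise = {probandwise:.2f}")
```

Comparing these rates between monozygotic and dizygotic pairs gives a first indication of genetic influence on a categorical trait.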


Analysis of variance (ANOVA)

A statistical method to test the null hypothesis that the mean values of two or more groups are equal. The variance around the means in groups is compared with the variance of the group means. In genetic applications, the variance between families is compared with the variance within families. A significant F-ratio implies that variance between families is larger than within families.


Correlation

A statistical measure of the strength and direction of resemblance between two variables (or two family members). It can vary between −1 and +1. Intra-class correlation refers to the correlation in defined subgroups (for example, in monozygotic or dizygotic pairs) and, for groups of two such as twin pairs, can be derived from the ANOVA F-ratio as t = (F − 1)/(F + 1).
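
The relation t = (F − 1)/(F + 1) can be checked with a small one-way ANOVA in which each twin pair is treated as a group of two. This is a minimal sketch with hypothetical trait values; the function name is illustrative, and the code assumes complete pairs with some within-pair variation.

```python
def twin_intraclass_correlation(pairs):
    """Intra-class correlation for twin pairs via a one-way ANOVA F-ratio."""
    n = len(pairs)
    grand_mean = sum(a + b for a, b in pairs) / (2 * n)

    # Between-pair mean square: two observations per pair, n - 1 degrees of freedom.
    ss_between = sum(2 * ((a + b) / 2 - grand_mean) ** 2 for a, b in pairs)
    ms_between = ss_between / (n - 1)

    # Within-pair mean square: one degree of freedom per pair.
    ss_within = sum((a - b) ** 2 / 2 for a, b in pairs)
    ms_within = ss_within / n
    if ms_within == 0:
        raise ValueError("no within-pair variation; F-ratio is undefined")

    f_ratio = ms_between / ms_within
    return (f_ratio - 1) / (f_ratio + 1)


# Hypothetical trait values for five twin pairs.
pairs = [(10, 12), (14, 15), (9, 9), (20, 18), (16, 17)]
icc = twin_intraclass_correlation(pairs)
print(f"intra-class correlation = {icc:.3f}")
```

Because each group has exactly two members, (F − 1)/(F + 1) equals (MSB − MSW)/(MSB + MSW), where MSB and MSW are the between-pair and within-pair mean squares.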


Structural equation modelling (SEM)

Also known as covariance modelling. A method that estimates regression coefficients ('parameters') between latent (unobserved) and observed variables. These estimates minimize the difference between the covariance structure of the observed data and that predicted by the model. Alternative models (such as family resemblance being due to shared genes versus shared environment) can be compared by how well they fit the data and by the number of parameters estimated.


Polygenes

A group of genes that influence a complex trait. In contrast to monogenic traits, most traits and diseases are influenced by several genes, only a sum of which is sufficient to cause the effect.


Regression

Linear regression is a statistical method to test and to describe the linear relationship between two or more variables. The regression coefficient is the slope of the regression line: the expected change in the dependent variable per unit change in the independent variable.


Maximum likelihood (ML)

A statistical method that works by varying the estimates for parameters of a model, so that the likelihood of the observed data points is maximized. Under normal theory, the likelihood corresponds to the height of the normal curve (one variable) and to the height of the multivariate normal probability density function for two or more variables.
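
The idea of varying parameter estimates until the likelihood is maximized can be shown with a single normally distributed variable. The sketch below is illustrative only: the data are hypothetical, and a coarse grid search stands in for the numerical optimizers that model-fitting software would normally use.

```python
import math


def normal_log_likelihood(data, mu, sigma2):
    """Sum of the log heights of the normal curve at each data point."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - squared_error / (2 * sigma2)


# Hypothetical observations of one trait.
data = [4.1, 5.3, 4.8, 6.0, 5.2, 4.6]

# Closed-form ML estimates under normal theory (variance uses divisor n).
mu_hat = sum(data) / len(data)
sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / len(data)

# Vary the mean over a grid around mu_hat and keep the most likely value.
candidates = [mu_hat + step / 100 for step in range(-100, 101)]
best_mu = max(
    candidates, key=lambda mu: normal_log_likelihood(data, mu, sigma2_hat)
)
print(f"closed-form mean = {mu_hat:.3f}, grid-search mean = {best_mu:.3f}")
```

The grid search recovers the sample mean, which is the ML estimate of the mean for normally distributed data.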


Weighted least squares (WLS)

An alternative method for estimating parameters during model fitting. The square of the difference between the observed statistic (for example, mean or covariance) and the statistic that is predicted by the theoretical model is weighted and minimized. Weights are usually chosen to correspond to the accuracy of the observed statistics.
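
A minimal sketch of the weighted criterion follows. It assumes, purely for illustration, a one-parameter model in which the monozygotic (MZ) twin correlation is predicted to equal the heritability h² and the dizygotic (DZ) correlation to equal half of it. The model, the function names, the correlations and the use of pair counts as crude weights are all assumptions of this example rather than details from the article.

```python
def wls_loss(h2, r_mz, r_dz, w_mz, w_dz):
    """Weighted squared distance between observed and predicted correlations."""
    return w_mz * (r_mz - h2) ** 2 + w_dz * (r_dz - 0.5 * h2) ** 2


def wls_fit_h2(r_mz, r_dz, w_mz, w_dz):
    """Value of h2 that minimizes wls_loss, from setting its derivative to zero."""
    return (w_mz * r_mz + 0.5 * w_dz * r_dz) / (w_mz + 0.25 * w_dz)


# Hypothetical observed correlations, weighted by hypothetical pair counts.
r_mz, r_dz = 0.62, 0.28
w_mz, w_dz = 200, 300

h2_hat = wls_fit_h2(r_mz, r_dz, w_mz, w_dz)
print(f"WLS estimate of h2 = {h2_hat:.3f}")
```

Giving the larger group more weight pulls the estimate towards the value implied by that group's correlation, which is the sense in which the weights reflect the accuracy of each observed statistic.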


Disinhibition

A sub-scale of the psychological trait 'sensation seeking', including items that describe experiences or attitudes that relate to sensation seeking through other exciting people, disinhibited or 'wild' parties, social drinking and sexual variety.


Intermediate phenotypes

The physiological traits that are related to a disease trait; for example, for hypertension this could include blood pressure, angiotensin levels or salt sensitivity.


Penetrance

The proportion of affected individuals among the carriers of a particular genotype. If all individuals with a disease genotype show the disease phenotype, then the disease is said to be completely penetrant.

Boomsma, D., Busjahn, A. & Peltonen, L. Classical twin studies and beyond. Nat Rev Genet 3, 872–882 (2002).

