Twins Flashcards
When extracting the influence of D on a phenotype, what are the MZ and DZ coefficients? I.e. what do you multiply the non-additive genetic effects by in a path diagram?
MZ = 1
DZ = 1/4 (0.25)
What does concordance mean?
The probability that both members of a pair will have a certain trait (i.e. both twins have the same psychopathology)
What is the rule for model identification?
The number of unknown parameters must not exceed the number of predictive statistics (covariance MZ, covariance DZ, Vp). When the two numbers are equal the model is just identified.
What does Vp stand for?
Variance of phenotype
When would you decide to run an ADE model over an ACE model?
When the difference between the MZ and DZ correlations is more than half the MZ correlation, i.e. rDZ is less than half rMZ (the heritability estimate, 2(rMZ - rDZ), would be greater than the MZ correlation, which is not possible)
(This shows that the twin similarity cannot be due to additive genetic influences alone; non-additive genetic influences must also be involved)
E.g. rMZ = .6
rDZ = .25
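The check on this card can be sketched in a few lines. This is only an illustration, written in Python rather than R; the function names are hypothetical and the two correlations are the ones from the example above.

```python
def falconer_heritability(r_mz, r_dz):
    """Additive heritability estimate: twice the MZ-DZ difference."""
    return 2 * (r_mz - r_dz)


def suggests_ade(r_mz, r_dz):
    """True when the heritability estimate would exceed rMZ (rDZ < rMZ / 2)."""
    return falconer_heritability(r_mz, r_dz) > r_mz


r_mz, r_dz = 0.60, 0.25
print(round(falconer_heritability(r_mz, r_dz), 2))  # 0.7, larger than rMZ = 0.6
print(suggests_ade(r_mz, r_dz))  # True
```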
What does ‘free’ mean in Mx language?
It is set to TRUE or FALSE and refers to whether or not we are estimating the parameter.
What is a saturated model?
The most basic model to describe the data. It is a perfectly fitting model, as the covariances are treated as free parameters, so that their maximum likelihood estimates will be the sample covariances.
The ACE model is compared to the saturated model (does it fit the data significantly worse or not?)
But there is a lot of noise!
Also used to check the assumptions of an ACE model (equality of means, equality of variances, twin-specific environment (twin-sib data), sex differences)
And to obtain a baseline fit statistic (-2LL)
Explain a variance-covariance matrix
The numbers on the diagonal are the variances of each trait and the numbers on the off-diagonals (symmetrically) are the covariances between traits.
What is an identity matrix?
It has 1 on the diagonal and 0 on the off-diagonals
Multiplying another matrix by it leaves that matrix unchanged
What can the -2 log likelihood (-2LL) do?
It can quantify the difference in fit between one model and another
How could you tell a saturated model from an ACE model?
In an ACE model the variance for twin 1 and twin 2 will be the same but in a saturated model they are free to vary.
What is R?
R is a programming language and a software environment
What is OpenMx?
It is a package in R
What does the library command do?
Loads an installed package so that its functions are available in the current session
The recycling rule means?
Refers to a situation where 2 vectors of different lengths are combined in an operation and the shorter vector is repeated until it matches the length of the longer vector.
What is a vector?
A sequence of data elements of the same basic type. Members in a vector are officially called components.
What is a matrix?
A matrix is a collection of data elements arranged in a two-dimensional rectangular layout. Data must all be of the same type
What is a data frame?
A table, or two-dimensional array-like structure, in which each column contains measurements on one variable and each row contains one case. (A case does not necessarily mean the same as an experimental subject). Columns can hold data of different types.
Given the vector “a” “b” “c” “d” “e” “f” “g” “h” “i” “j”, which letters does c(2:5,8) refer to? And which does (4:8) refer to?
c(2:5,8) = b c d e and h
(4:8) = d e f g h
When indexing, how do you drop an element from the analysis? (E.g. item 1 in a vector)
Z[-1]
How do you select elements from a vector? (E.g 1 & 4 & 7)
Z[c(1,4,7)]
What does c stand for in the R programming language?
Concatenate (combine)
When indexing from a matrix, what is the mnemonic to help remember the order in which it is written or read?
Roman Catholic
Rows then columns!
How do you define a string variable in R?
Enclose the text in quotation marks
Quantitative genetics is based on what theory?
Biometrical genetic theory
What does quantitative genetics aim to do?
Relate the observed variance of a trait to unobserved (latent) genetic, shared environmental and unique environmental factors
How does quantitative genetics make predictions of the effects of underlying latent factors?
By using the relatedness between individuals (e.g. MZ twins are genetically identical and DZ twins share on average half their segregating genes, .5)
What three groups of participants do you get from an adoption design?
Genetic relatives
Environmental relatives
Relatives who are both genetic and environmental relatives
Define non-additive genetic effects (dominance and epistasis)
Dominance is an interaction between alleles at the same locus
Epistasis is an interaction between alleles at different loci.
(Dominance does not contribute to the genetic covariance between parents and offspring)
Which part of ACE contains error?
E
In what two ways can phenotypic variance be decomposed?
ACE
ADE
What are Falconer's formulas for the correlations for MZ and DZ twins?
rMZ = A + C
rDZ = .5A + C
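The two formulas can be solved for A and C by subtraction. The sketch below (Python, for illustration only) assumes standardised variances so that A + C + E = 1; the function name and the input correlations are made up.

```python
def falconer_estimates(r_mz, r_dz):
    """Solve rMZ = A + C and rDZ = 0.5*A + C; E is what is left over."""
    a = 2 * (r_mz - r_dz)  # rMZ - rDZ equals 0.5*A
    c = r_mz - a           # equivalently 2*rDZ - rMZ
    e = 1 - r_mz           # assumes the total standardised variance is 1
    return a, c, e


# Made-up twin correlations, chosen only for illustration.
a, c, e = falconer_estimates(0.8, 0.5)
print(round(a, 2), round(c, 2), round(e, 2))
```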
What are three assumptions of the twin model?
Equal environments
Genotype-environment effects (random mating, no GxE interactions and no GxE correlations)
Generalisability
What does path analysis allow us to do?
Represent linear models for the relationship between variables in a diagram.
Makes it easy to derive expectations for the variances and covariances of variables in terms of the parameters of the proposed linear model.
Also permits easy translation into matrix formulation as used by programs such as Mx and OpenMx
In path analysis what do squares or rectangles represent?
Observed variables
In path analysis what do circles or ellipses denote?
Latent (unmeasured) variables
In path analysis what do upper-case letters denote?
Variables
In path analysis what do lower-case letters or numeric values denote?
Covariances or path coefficients
In path analysis what do single-headed arrows or paths (->) represent?
Hypothesised causal relationships where the variable at the tail is hypothesised to have a direct causal influence on the variable at the head.
In path analysis what do double headed arrows represent?
Covariance between two variables, which may arise from a common cause not represented in the model.
Double headed arrows may also be used to represent the variance of a variable.
What is meant by the term causal in path analysis?
The meaning of causal is the assumption that change in the variable at the tail of the arrow will result in change in the variable at the head of the arrow, with all other variables in the diagram held constant. The causal relationships represented by straight arrows are assumed to be linear.
In path analysis what are variables that do not receive causal input from any other variable called?
Independent, source, predictor or exogenous variables.
In general, only independent variables are connected by double-headed arrows
In path analysis what are variables that do receive causal input from another variable called?
Dependent variables or endogenous variables
In path analysis where can single headed arrows be drawn from?
From independent to dependent variables and from dependent to dependent variables.
In path analysis what does omission of a two-headed arrow between two independent variables imply?
That the covariance of those variables is zero
In path analysis what does omission of a direct path from an independent or dependent variable to a dependent variable imply?
That there is no direct causal effect of the former on the latter variable.
In path tracing the covariance between any two variables can be calculated by?
Summing all legitimate chains connecting the variables
The numerical value of a chain is the product of all the traced path coefficients in it.
In path tracing what are the three rules?
Trace backwards, then forwards, or simply forwards from one variable to another; NEVER forwards then backwards. (Include the double-headed arrows from the independent variables to themselves; these variances will be 1 for latent variables.)
Loops are not allowed, i.e. we cannot trace twice through the same variable.
There is a maximum of one curved (double-headed) arrow per chain. So the double-headed arrow from an independent variable to itself is included unless the chain includes another double-headed arrow (e.g. a correlation path).
In path tracing what is the variance?
Since the variance of a variable is the covariance of the variable with itself, the expected variance will be the sum of all paths from the variable to itself that follow the path tracing rules.
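To make the tracing rules concrete, here is a small sketch (in Python, for illustration) that sums the legitimate chains for a univariate ACE diagram. The function names and the path coefficients are hypothetical; latent variances are fixed at 1 and the A factors of the two twins are joined by a double-headed arrow of 1.0 (MZ) or 0.5 (DZ).

```python
def chain_value(coefficients):
    """Multiply together every coefficient traced along one chain."""
    value = 1.0
    for coefficient in coefficients:
        value *= coefficient
    return value


def expected_twin_moments(a, c, e, genetic_correlation):
    """Expected phenotypic variance and twin covariance for an ACE diagram."""
    # Twin 1 back to itself: up a path, across the latent variance (1), down again.
    variance_chains = [(a, 1.0, a), (c, 1.0, c), (e, 1.0, e)]
    # Twin 1 to twin 2: only A and C connect the twins; E is unique to each twin.
    covariance_chains = [(a, genetic_correlation, a), (c, 1.0, c)]
    variance = sum(chain_value(chain) for chain in variance_chains)
    covariance = sum(chain_value(chain) for chain in covariance_chains)
    return variance, covariance


# Hypothetical path coefficients, chosen only for illustration.
print(expected_twin_moments(0.8, 0.4, 0.4, 1.0))  # MZ pairs
print(expected_twin_moments(0.8, 0.4, 0.4, 0.5))  # DZ pairs
```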
A, D, C and E are all what? (2 points)
Independent variables and parameter estimates.
Why in the classical twin model can only 3 model parameters be estimated at a time? (ACE or ADE)
The number of parameters estimated cannot exceed the number of predictive statistics (covariance of MZ, covariance of DZ and Vp). When 3 parameters are unknown the model is just identified.
Note that E must always be in the model as it contains error.
How can we estimate A, C, E and D in one model?
By combining twin and adoption data it is possible to work them out
Cov(MZ) = a² + d² + c²
Cov(DZ) = 1/2 a² + 1/4 d² + c²
Cov(adoptive sibs) = c²
Vp = a² + d² + c² + e²
4 unknown parameters (a, c, d and e) and 4 predictive statistics.
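With four equations and four unknowns the system can be solved by hand. The sketch below (Python, for illustration) does the algebra directly; the function name and the input covariances are hypothetical and were built from made-up components.

```python
def solve_acde(cov_mz, cov_dz, cov_adoptive_sibs, v_p):
    """Solve the four expectations above for the squared components."""
    c2 = cov_adoptive_sibs      # adoptive siblings share only C
    x = cov_mz - c2             # a2 + d2
    y = cov_dz - c2             # 0.5*a2 + 0.25*d2
    a2 = 4 * y - x              # from 4*y = 2*a2 + d2
    d2 = x - a2
    e2 = v_p - cov_mz           # Vp minus everything MZ twins share
    return a2, c2, d2, e2


# Made-up statistics generated from a2 = .4, c2 = .1, d2 = .2, e2 = .3.
a2, c2, d2, e2 = solve_acde(0.70, 0.35, 0.10, 1.00)
print(round(a2, 2), round(c2, 2), round(d2, 2), round(e2, 2))
```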
How do you calculate the variance of a set of data?
- Calculate the mean
- Calculate each squared deviation (subtract the mean from each observation and square individually)
- Divide the sum of the squared deviations by (N-1)
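The steps above can be sketched as follows (Python, for illustration; the function name and the data are made up). The result is cross-checked against the standard library's `statistics.variance`, which also divides by N-1.

```python
import statistics


def sample_variance(observations):
    """Variance with the (N-1) denominator, following the steps above."""
    mean = sum(observations) / len(observations)
    squared_deviations = [(value - mean) ** 2 for value in observations]
    return sum(squared_deviations) / (len(observations) - 1)


data = [2, 4, 4, 6, 9]  # made-up observations
print(sample_variance(data))  # 7.0
print(abs(sample_variance(data) - statistics.variance(data)) < 1e-12)  # True
```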
How do you calculate the covariance?
- Calculate the mean for each variable
- Calculate each deviation (subtract the mean for variable 1 from each observation & do the same for variable 2)
- Multiply the deviations for variable 1 and variable 2
- Sum up all the multiplied deviations
- Divide the sum of the multiplied deviations by (N-1)
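The same steps in code, as a minimal sketch (Python, for illustration; the function name and the paired data are made up):

```python
def sample_covariance(x, y):
    """Covariance with the (N-1) denominator, following the steps above."""
    if len(x) != len(y):
        raise ValueError("both variables need the same number of observations")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
    return sum(products) / (len(x) - 1)


twin1 = [1, 2, 3, 4, 5]  # made-up scores
twin2 = [2, 4, 5, 4, 5]
print(sample_covariance(twin1, twin2))  # 1.5
print(sample_covariance(twin1, twin1))  # 2.5, the variance of twin1
```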
How are ACE assumptions tested using a saturated model?
By equating means across twins, across zygosities and across both, and then equating variances across twins, across zygosities and across both.
All tested by comparison of fit statistics
What does the saturated model do?
Allows us to test assumptions that underlie the ACE twin model
And to equate means and variances across twin 1 and twin 2 and across the MZ and DZ zygosity groups
What is matrix algebra?
Branch of mathematics devoted to working with matrices
What is element-wise multiplication?
Requires two matrices of the same dimensions
Multiplies corresponding elements
Less common than the more complex matrix multiplications
What is another name for element-wise multiplication?
Dot product (the term used in Mx; mathematically it is also called the Hadamard product)
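A minimal sketch of element-wise multiplication, using plain nested lists in Python for illustration (the function name and the matrices are made up):

```python
def elementwise_product(m1, m2):
    """Multiply corresponding elements of two same-sized matrices."""
    same_rows = len(m1) == len(m2)
    same_cols = all(len(r1) == len(r2) for r1, r2 in zip(m1, m2))
    if not (same_rows and same_cols):
        raise ValueError("matrices must have the same dimensions")
    return [[x * y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


print(elementwise_product([[1, 2], [3, 4]], [[10, 20], [30, 40]]))
# [[10, 40], [90, 160]]
```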
What is the rule for matrix multiplication?
The number of columns in the 1st matrix must equal the number of rows in the 2nd matrix.
How do you determine the size of the matrix required after matrix multiplication?
The number of rows of matrix one by the number of columns of matrix two.
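Both rules can be seen in a short sketch (Python with nested lists, for illustration; the function name and the matrices are made up). A 2 x 3 matrix times a 3 x 2 matrix gives a 2 x 2 result.

```python
def matrix_multiply(m1, m2):
    """Matrix product; columns of m1 must equal rows of m2."""
    if len(m1[0]) != len(m2):
        raise ValueError("columns of matrix 1 must equal rows of matrix 2")
    n_cols = len(m2[0])
    return [
        [sum(row[k] * m2[k][j] for k in range(len(m2))) for j in range(n_cols)]
        for row in m1
    ]


a = [[1, 2, 3], [4, 5, 6]]        # 2 rows x 3 columns
b = [[7, 8], [9, 10], [11, 12]]   # 3 rows x 2 columns
product = matrix_multiply(a, b)
print(product)                         # [[58, 64], [139, 154]]
print(len(product), len(product[0]))   # 2 2
```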
What is a cross-product?
Used in the context of vector multiplication:
We have 2 vectors of the same length (same number of elements)
The cross-product (x) is the sum of the products of the corresponding elements, e.g.
Vector 1 = {a b c}, vector 2 = {d e f}
V1 x V2 = a*d + b*e + c*f
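As a small sketch of the sum of products described on this card (Python, for illustration; the function name and the numbers are made up):

```python
def sum_of_products(v1, v2):
    """Sum of the products of corresponding elements of two vectors."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(v1, v2))


print(sum_of_products([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```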
Why is matrix multiplication not commutative?
Because A %*% B does not generally equal B %*% A (%*% is matrix multiplication in R); the order of multiplication matters
Things to remember about matrix multiplication
The product of a lower triangular matrix and its transpose is symmetric
Not all matrices can be multiplied with one another; the dimensions must conform
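Two of these points can be checked numerically. The sketch below (Python with nested lists, for illustration; all names and matrices are made up) shows that swapping the order changes the product, and that a lower triangular matrix times its transpose comes out symmetric.

```python
def matrix_multiply(m1, m2):
    """Matrix product; columns of m1 must equal rows of m2."""
    if len(m1[0]) != len(m2):
        raise ValueError("matrices are not conformable")
    return [
        [sum(row[k] * m2[k][j] for k in range(len(m2))) for j in range(len(m2[0]))]
        for row in m1
    ]


def transpose(m):
    """Swap rows and columns."""
    return [list(column) for column in zip(*m)]


a = [[1, 2], [3, 4]]
b = [[0, 1], [1, 0]]
print(matrix_multiply(a, b))  # [[2, 1], [4, 3]]
print(matrix_multiply(b, a))  # [[3, 4], [1, 2]]

lower = [[1, 0], [2, 3]]      # lower triangular
sym = matrix_multiply(lower, transpose(lower))
print(sym)                    # [[1, 2], [2, 13]]
print(sym == transpose(sym))  # True
```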
What do multivariate models want to discover?
The reason behind covariance/correlation between traits
Do the same genes/environments influence different traits?
As well as decomposing the variance of each trait, we can decompose the co-variance between two traits
Is a phenotypic correlation required for a genetic correlation to exist?
No
Explain the difference between a univariate and a bivariate decomposition of variance
Using the example of BMI and waist:
Univariate = BMI for twin 1 and BMI for twin 2
Bivariate = BMI for twin 1 and waist for twin 2
(cross-twin cross-trait correlations)
What do the cross-twin covariances in combination with the variances enable us to calculate?
A, C and E for e.g. BMI and waist
What do the within-twin, cross-trait covariances tell us?
They tell us the phenotypic covariance between traits (these are held the same across twins).
What do the cross-twin, cross-trait covariances tell us?
These are held the same across twins and contain the information necessary to calculate the A, C and E contributions to the covariance.
2 facts about bivariate matrices?
Matrices are symmetrical, and variances/covariances are held to be the same across twins
When should you use a Cholesky decomposition over a correlated factors model?
If you have good reason to order the variables in a specific sequence (i.e. if the data are longitudinal)
If not, then the correlated factors model is the most interpretable model to use.
What are the assumptions of the correlated factors solution?
Each variable is influenced by a set of genetic, shared and non-shared environmental factors
The factors associated with each variable are allowed to correlate with each other through rA rC and rE
Correlations among phenotypes are a function of rA, rC and rE and the standardised A, C and E paths connecting them
What do rbind and cbind mean?
Row bind and column bind
They combine vectors or matrices by stacking them as rows (rbind) or placing them side by side as columns (cbind).
In R, when free=TRUE what does this mean?
It means that the parameters of the model are being estimated
When would you run an AE model?
When you want to compare the fit of the AE model with that of the ACE model to test the significance of the dropped parameter (shared environment). Remember you can never drop non-shared environment as it contains error.
What do small a c and e denote?
Path coefficients
What do capital A C and E denote?
Variances and covariances
How do you work out the phenotypic correlation between two traits?
It’s the sum of the components attributable to A, C and E
Square root of a² for trait 1 x genetic correlation (rA) x square root of a² for trait 2, plus the same for shared (c², rC) and non-shared (e², rE) environments.
This should approximate the observed phenotypic (Pearson's) correlation
How do you calculate the proportion of the phenotypic correlation that is explained by A, C and E?
You work out the overall phenotypic correlation and divide each of the genetic, shared environmental and non-shared environmental contributions by it.
E.g.
A = .56/.75 = .75
C = .10/.75 = .13
E = .09/.75 = .12
This means that 75% of the phenotypic correlation can be attributed to additive genetic effects.
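The arithmetic in this example can be reproduced with a few lines (Python, for illustration; the function name is hypothetical and the three contributions are the ones from the card):

```python
def proportions_of_correlation(contributions):
    """Divide each contribution by the total phenotypic correlation."""
    total = sum(contributions.values())
    return {name: value / total for name, value in contributions.items()}


contributions = {"A": 0.56, "C": 0.10, "E": 0.09}  # values from the card
print(round(sum(contributions.values()), 2))       # 0.75
for name, share in proportions_of_correlation(contributions).items():
    print(name, round(share, 2))                   # A 0.75, C 0.13, E 0.12
```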
Why can you not interpret the shared aetiology of two traits just by looking at the genetic and environmental correlations?
You need to look at both the value of the correlation and the size of the pathways in order to understand the aetiology of a phenotypic correlation.
For example, weakly heritable traits can still have a large proportion of their correlation attributable to genetic effects
E.g. rA = .3 and rA = .7 can have the same proportion of the phenotypic correlation attributable to genetic effects
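A sketch of how this can happen (Python, for illustration). All numbers are made up: one pair of traits has heritabilities of .7 with rA = .3, the other has heritabilities of .3 with rA = .7, and both pairs are assumed to have a phenotypic correlation of .30.

```python
import math


def genetic_contribution(h2_trait1, r_a, h2_trait2):
    """sqrt(a2 for trait 1) x rA x sqrt(a2 for trait 2)."""
    return math.sqrt(h2_trait1) * r_a * math.sqrt(h2_trait2)


phenotypic_correlation = 0.30  # assumed to be the same for both trait pairs

pair_one = genetic_contribution(0.7, 0.3, 0.7)  # strongly heritable, low rA
pair_two = genetic_contribution(0.3, 0.7, 0.3)  # weakly heritable, high rA

print(round(pair_one / phenotypic_correlation, 2))  # 0.7
print(round(pair_two / phenotypic_correlation, 2))  # 0.7
```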
How is the goodness of fit of the model measured?
It is measured relative to the perfectly fitting (saturated) model by the likelihood ratio chi-square (x2) statistic.
What does a significant x2 result mean when testing the goodness of fit?
It means that the model provides a poor fit to the data and can be rejected.
What are the degrees of freedom for x2?
The number of observed statistics (normally 3: the MZ covariance, the DZ covariance and the phenotypic variance) minus the number of parameters being estimated in the model.
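A minimal sketch of the degrees-of-freedom rule (Python, for illustration; the function names are hypothetical). The p-value helper only covers the 1 df case, where the chi-square tail probability can be written with the complementary error function; the chi-square value passed to it is made up.

```python
import math


def degrees_of_freedom(n_observed_statistics, n_estimated_parameters):
    """Observed statistics minus estimated parameters."""
    return n_observed_statistics - n_estimated_parameters


def chi_square_p_value_df1(chi_square):
    """Upper-tail probability of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(chi_square / 2.0))


print(degrees_of_freedom(3, 3))  # ACE: 0 df, just identified
print(degrees_of_freedom(3, 2))  # AE: 1 df
# Made-up chi-square value; a p-value below .05 would mean a poor fit.
print(chi_square_p_value_df1(5.2) < 0.05)  # True
```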
How has the equal environments assumption been tested?
By observing twin correlations for twins whose zygosity has been mislabelled: if the assumption were violated, MZ twins who have been mislabelled as DZ twins should be less similar than correctly labelled MZ twins. This has not been found.
MZ twins reared apart have provided correlations for personality variables that are almost the same as those for MZ twins reared together
Amount of contact between twins has also been tested and no difference between MZ and same-sex DZ twins has been found. (There is a slight difference between high-contact MZ and low-contact MZ twins, but the effect is small)
What is assortative mating?
Non-random pairing of mates on the basis of factors other than biological relatedness.
(Tested by observing the phenotypic correlation between parents for the trait in question over time)