Knowledge Questions Part 1 Flashcards
What type of graph or table is used to represent the correlation between two continuous
variables, such as intelligence and income? Which statistical measure and test correspond with
this?
A scatterplot. The corresponding statistical measure is Pearson's correlation coefficient (r). It can be tested with the t-test for r or, equivalently, with (simple linear) regression analysis.
What type of graph or table is used to represent the correlation between two binary
variables? Which statistical measure and test correspond with
this?
You can use a contingency table to display the data. The data are summarized as proportions and probabilities. The association can be measured with the phi coefficient or the odds ratio and tested with the chi-square test or with binary logistic regression.
Let’s assume we have a contingency table for treatment (yes/no) and recovery (yes/no).
Formulate a null hypothesis to establish a difference and a null hypothesis to establish a
correlation. How do these hypotheses relate to each other?
H0 (difference): P(Recovery | Treatment) = P(Recovery | No treatment)
H0 (correlation): P(Recovery) = P(Recovery | Treatment)
-> In both cases, the phenomenon mentioned in the question is supported by rejecting the hypothesis.
-> The two hypotheses are equivalent: if the probability of recovery differs significantly depending on whether there was treatment or not, then treatment and recovery are correlated.
How does the contingency table calculate the frequencies expected under the null hypothesis?
They are calculated under the assumption that the variables are not correlated, that is, P(X) = P(X|Y). Each expected cell frequency is then (row total * column total) / N.
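A minimal sketch of this calculation, assuming a table stored as a list of rows. The counts are made up for illustration and `expected_counts` is a hypothetical helper name, not part of any package.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Made-up 2*2 table: rows = treatment yes/no, columns = recovery yes/no
observed = [[30, 10],
            [20, 40]]
expected = expected_counts(observed)
print(expected)
```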
If both variables are dichotomous, Pearson’s correlation coefficient can be rewritten to which
measure of association?
Pearson’s Phi.
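To illustrate, the sketch below computes phi from the four cell counts and compares it with Pearson's r computed on the same data coded as 0/1. The counts and the cell labelling (a: X=1,Y=1; b: X=1,Y=0; c: X=0,Y=1; d: X=0,Y=0) are assumptions made for this example.

```python
import math

def phi(a, b, c, d):
    """Phi coefficient for a 2*2 table with cells a, b (first row) and c, d (second row)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def pearson_r(xs, ys):
    """Plain Pearson correlation, written out so the example needs no extra packages."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

a, b, c, d = 30, 10, 20, 40  # made-up counts
# Expand the table into one 0/1 score per case
xs = [1] * (a + b) + [0] * (c + d)
ys = [1] * a + [0] * b + [1] * c + [0] * d

phi_value = phi(a, b, c, d)
r_value = pearson_r(xs, ys)
print(phi_value, r_value)  # the two values agree up to rounding error
```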
What is the formula for calculating the odds ratio as a measure of association for a 2*2 table?
(CellA * CellD) / (CellB * CellC)
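As a small worked example, assuming cells A and B form the first row (X=1) and cells C and D the second row (X=0), with made-up counts:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2*2 table: (A * D) / (B * C)."""
    return (a * d) / (b * c)

a, b, c, d = 30, 10, 20, 40      # made-up counts
odds_row1 = a / b                # odds of Y in the first row: 30 / 10
odds_row2 = c / d                # odds of Y in the second row: 20 / 40
print(odds_ratio(a, b, c, d))    # same value as odds_row1 / odds_row2
```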
What is the relation between the odds ratio, the log of the odds ratio and the correlation? What
value must each of these three measures have in order for the association to be negative, positive
or absent?
The odds ratio compares the odds of Y at the two values of X. The log of the odds ratio is its natural logarithm, and its sign always matches the sign of the correlation. No association: odds ratio = 1, log odds ratio = 0, correlation = 0. Positive association: odds ratio > 1, log odds ratio > 0, correlation > 0. Negative association: odds ratio between 0 and 1, log odds ratio < 0, correlation < 0. When the log odds of Y are plotted against X, the log of the odds ratio is the slope of the resulting line.
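The relation between the three measures can be checked on a few made-up tables. This is only an illustrative sketch; the table labels and counts are assumptions.

```python
import math

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

def phi(a, b, c, d):
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Made-up 2*2 tables (cells a, b, c, d), one per kind of association
tables = {
    "negative": (10, 30, 40, 20),
    "absent":   (20, 20, 30, 30),
    "positive": (30, 10, 20, 40),
}

summary = {}
for name, cells in tables.items():
    o = odds_ratio(*cells)
    summary[name] = (o, math.log(o), phi(*cells))
    print(name, summary[name])  # (odds ratio, log odds ratio, phi)
```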
Let’s assume we split a 2*2 table for variables X and Y into the levels for C, a third variable.
We then perform the Mantel-Haenszel test to establish the so-called ‘common odds ratio’. Which
null hypothesis is being tested in this case? Which assumption must be applied in order for this
test to be meaningful?
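The null hypothesis is that X and Y are independent while correcting for C, that is, that the common odds ratio equals 1. The test therefore investigates the main effect of X. For it to be meaningful, the odds ratio between X and Y must be assumed to be the same at every level of C (homogeneity of odds ratios, i.e. no interaction).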
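A sketch of the Mantel-Haenszel estimate of the common odds ratio, using the standard estimator sum(a*d/n) / sum(b*c/n) over the levels of C. The strata below are made up, and the sketch computes only the estimate, not the accompanying significance test.

```python
def mantel_haenszel_or(strata):
    """Common odds ratio estimate over strata given as (a, b, c, d) cell counts."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Made-up 2*2 tables for two levels of C; each has a stratum odds ratio of 4
strata = [(10, 5, 5, 10), (20, 10, 10, 20)]
common_or = mantel_haenszel_or(strata)
print(common_or)
```

Because both made-up strata have the same odds ratio, the common estimate coincides with it, which is the situation the homogeneity assumption describes.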
How does one determine that an interaction between X and C exists in the sample? And what part of the SPSS output for contingency table analysis tests whether there is interaction in the population?
One has to compare the odds ratios between X and Y given the different levels of C. The test of homogeneity of odds ratios is used to investigate this statistically.
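In the sample, this comparison amounts to computing the odds ratio separately at each level of C. The sketch below does so for made-up counts; it does not implement the homogeneity test itself.

```python
def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

# Made-up 2*2 tables of X by Y, one per level of C
tables_by_c = {
    "C=0": (20, 10, 10, 20),
    "C=1": (15, 15, 15, 15),
}

stratum_ors = {level: odds_ratio(*cells) for level, cells in tables_by_c.items()}
print(stratum_ors)  # clearly different odds ratios suggest interaction in the sample
```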
How does one determine that an interaction between X and C exists in the sample? And what
part of the SPSS output for contingency table analysis tests whether there is interaction in the
population?
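One has to compare the odds ratios between X and Y given the different levels of C. In SPSS, interaction in the population is tested by the test of homogeneity of the odds ratio (Breslow-Day), not by the Mantel-Haenszel test of the common odds ratio, which tests whether X and Y are independent given C.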
What is the correct follow-up analysis for the effect of X on Y if interaction is present? And
if there is no interaction?
If interaction is present, one should look at simple effects, that is, the effects of X on Y within the respective levels of C. If there is no interaction, one can proceed by looking at the main effect of X on Y.
When can confounding be said to exist between X and C?
When the design is unbalanced, that is, when X and C are themselves associated: the distribution of X is not the same across the levels of C.
Can confounding and interaction occur at the same time?
Yes.
What is moderation?
Moderation is when the effect of X on Y depends on the level of C, that is, when X and C interact. In the extreme case C acts as a kind of ‘gatekeeper’ for the effect of X: X has an effect on Y only when C equals a certain value.
What is the principal difference between linear and logistic regression?
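In linear regression, the dependent variable (Y) is quantitative, whereas in logistic regression, it is categorical (dichotomous in binary logistic regression). The predictors may be quantitative or categorical in both.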
In the case of ANOVA and regression (Statistics 2), the expected value of continuous
variable Y is modelled as the sum total of a constant + main effects + interactions. In logistic
regression, Y is dichotomous. What is being modelled now as the sum total of the effects?
The natural logarithm of the odds of being in a certain category of Y.
How large is ln(X) if X is 1, has a value between 0 and 1, or is greater than 1?
X = 1 -> ln(X) = 0
0 < X < 1 -> ln(X) < 0
X > 1 -> ln(X) > 0
How large are the log odds if the probability is 50%, less than 50% or more than 50%?
P = 50% -> log odds = 0
P < 50% -> log odds < 0
P > 50% -> log odds > 0
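These three cases follow from the definition log odds = ln(P / (1 - P)). A quick check, with `log_odds` as a hypothetical helper name:

```python
import math

def log_odds(p):
    """Natural log of the odds p / (1 - p), for 0 < p < 1."""
    return math.log(p / (1 - p))

values = {p: log_odds(p) for p in (0.2, 0.5, 0.8)}
print(values)  # negative below 50%, zero at 50%, positive above 50%
```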
How large is exp(X) if X is 0, smaller than 0 or greater than 0?
X = 0 -> Exp(X) = 1
X < 0 -> 0 < Exp(X) < 1
X > 0 -> Exp(X) > 1
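A quick numerical check of these cases; it also shows that exp undoes ln, which is why exp(B) turns a log odds ratio back into an odds ratio. The input values are arbitrary.

```python
import math

exp_values = {x: math.exp(x) for x in (-2.0, 0.0, 2.0)}
print(exp_values)  # between 0 and 1 for x < 0, exactly 1 for x = 0, above 1 for x > 0

round_trip = math.log(math.exp(2.0))  # exp followed by ln gives back 2.0, up to rounding
print(round_trip)
```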
Assume we perform logistic regression using a single predictor, X, and X is dichotomous.
How do we interpret regression weight B for X in this case? And how do we interpret exp(B)?
B represents the slope of the line if we plot the log odds of Y on X. Because X is dichotomous, this is the difference in log odds of Y between X = 1 and X = 0, that is, the log of the odds ratio. Exp(B) is then the odds ratio between X and Y.
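With a single dichotomous predictor, the fitted B equals the log of the sample odds ratio, so it can be illustrated directly from a 2*2 table without fitting a model. The counts below are made up for this sketch.

```python
import math

# Made-up counts: Y=1 and Y=0 cases within each group of X
y1_x1, y0_x1 = 30, 10   # group X = 1
y1_x0, y0_x0 = 20, 40   # group X = 0

log_odds_x1 = math.log(y1_x1 / y0_x1)   # log odds of Y when X = 1
log_odds_x0 = math.log(y1_x0 / y0_x0)   # log odds of Y when X = 0

b = log_odds_x1 - log_odds_x0           # slope: change in log odds from X = 0 to X = 1
exp_b = math.exp(b)                     # odds ratio between X and Y
print(b, exp_b)
```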
How large is the odds ratio for the effect of a dichotomous variable X on the dichotomous variable
Y if the regression weight B for X is 0, negative or positive?
If B is 0, the odds ratio will be 1. If B is negative, the odds ratio will be between 0 and 1, and if B is positive, the odds ratio will be larger than 1.
How large is the odds ratio for the effect of a continuous variable X on a dichotomous variable
Y if the regression weight B for X is 0, negative or positive?
If B is 0, the odds ratio will be 1. If B is negative, the odds ratio will be between 0 and 1, and if B is positive, the odds ratio will be larger than 1. For a continuous X, this odds ratio applies to each one-unit increase in X.
How large is the odds ratio for the effect of a continuous variable X on a dichotomous variable
Y if the regression weight B for X is 0, negative or positive? Answer the question under the assumption that there are additional predictors but no interaction between them.
If B is 0, the odds ratio will be 1. If B is negative, the odds ratio will be between 0 and 1, and if B is positive, the odds ratio will be larger than 1. The odds ratio now applies to a one-unit increase in X while the other predictors are held constant.
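The sketch below illustrates this for a model without interaction. The coefficients `b0`, `b_x` and `b_c` are hypothetical values chosen for the example, not estimates from any data.

```python
import math

# Hypothetical coefficients of a logistic model without interaction
b0, b_x, b_c = -1.0, 0.5, 0.3

def odds(x, c):
    """Odds of Y = 1 given predictor values x and c."""
    return math.exp(b0 + b_x * x + b_c * c)

# Odds ratio for a one-unit increase in X, at two different values of the other predictor
or_at_c0 = odds(3.0, 0.0) / odds(2.0, 0.0)
or_at_c5 = odds(3.0, 5.0) / odds(2.0, 5.0)
print(or_at_c0, or_at_c5, math.exp(b_x))  # all three agree up to rounding
```

Without an interaction term, the value of the other predictor cancels out of the ratio, so exp(B) is the same odds ratio at every level of it.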
Let’s assume that the logistic model for dichotomous variable Y (dementia: 0=no, 1=yes)
includes predictors X (sex: 0=male, 1=female) and C (age in years), and there is no interaction.
The regression weight (B) for X is significantly negative. Will the odds ratio be greater or
smaller than 1? And which of the sexes will have a higher probability of dementia?
It will be smaller than 1. At any given age, females will have a lower probability of dementia than males.
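To make this concrete, the sketch below uses hypothetical coefficients (chosen only so that the weight for sex is negative) and compares the predicted probabilities for a male and a female of the same age.

```python
import math

# Hypothetical coefficients: intercept, sex (0=male, 1=female), age in years
b0, b_sex, b_age = -8.0, -0.5, 0.1

def p_dementia(sex, age):
    """Predicted probability of dementia from the logistic model."""
    log_odds = b0 + b_sex * sex + b_age * age
    return 1 / (1 + math.exp(-log_odds))

odds_ratio_sex = math.exp(b_sex)   # below 1 because b_sex is negative
p_male = p_dementia(0, 80)
p_female = p_dementia(1, 80)
print(odds_ratio_sex, p_male, p_female)
```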