Biostats test 3 Flashcards

1
Q

null hypothesis for chi squared test

A

H0: no association, variables are independent

2
Q

How does H0 translate to cell frequencies?

A

Cell counts are proportional to the marginal (row and column) totals.

3
Q

Formula for expected frequencies

A

E = (row total x column total) / n, where n is the grand total
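
A quick way to see the formula at work (the 2x2 counts below are invented): scipy's chi2_contingency returns the expected frequencies computed with exactly this rule, alongside the test statistic and the degrees of freedom.

    import numpy as np
    from scipy.stats import chi2_contingency

    # Invented 2x2 table: rows = exposed / unexposed, columns = disease / no disease
    observed = np.array([[20, 30],
                         [10, 40]])

    # Note: for 2x2 tables scipy applies Yates' continuity correction to the
    # statistic by default; the expected frequencies are unaffected.
    chi2, p, dof, expected = chi2_contingency(observed)
    # expected[0, 0] = row total x column total / n = 50 x 30 / 100 = 15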

4
Q

What does the chi squared test measure

A

If the differences between observed and expected frequencies are large enough to reject H0

5
Q

DF for a 2x2 table

A

1

6
Q

Formula for DF of a cross-table

A

df = (rows – 1) x (columns – 1)
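
For example, a 3 x 4 cross-table gives df = (3 – 1) x (4 – 1) = 6, and a 2x2 table gives df = (2 – 1) x (2 – 1) = 1, matching the previous card.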

7
Q

What do we use to measure the effect size of a chi squared test

A

Phi for a 2x2 table, Cramer’s V for larger cross-tables. Only do this if the test is significant; the effect size then informs you about the strength of the association.

8
Q

Chi squared goodness of fit test

A

Determines whether the distribution of observed frequency counts differs from some other distribution
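
A minimal sketch with invented die-roll counts (scipy's chisquare tests against a uniform distribution when no expected frequencies are given):

    from scipy.stats import chisquare

    # Invented counts of 120 die rolls; H0: all six faces are equally likely
    observed = [18, 22, 20, 15, 25, 20]
    stat, p = chisquare(observed)  # expected frequency is 20 per face
    print(stat, p)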

9
Q

Odds formula

A

odds = P(event) / P(non-event)
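
For example (illustrative numbers): if P(event) = 0.2, then odds = 0.2 / 0.8 = 0.25.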

10
Q

Risk formula

A

risk = number of events in a specific group / total number in that group
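
For example (invented counts): if 20 of 100 exposed people develop the disease, risk = 20 / 100 = 0.20, while the odds in that group are 20 / 80 = 0.25.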

11
Q

Sensitivity

A

the proportion of positives that are correctly identified as such (so sick people being diagnosed as having the condition)

12
Q

Specificity

A

the proportion of negatives correctly identified as such (healthy people being diagnosed as not having the condition)

13
Q

Prevalence

A

The number of cases of a disease, number of infected people, or number of people with some other attribute present during a particular interval of time

14
Q

Sensitivity formula

A

TP / (TP + FN)

15
Q

Specificity formula

A

TN / (FP + TN)

16
Q

PPV definition

A

The likelihood that a person who has a positive test result does have the disease, condition, biomarker, or mutation (change) in the gene being tested

17
Q

PPV formula

A

PPV = TP / (TP + FP)

18
Q

NPV formula

A

NPV = TN / (FN + TN)
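
All four test characteristics from one confusion matrix, as a minimal sketch (the TP/FP/FN/TN counts are invented):

    # Invented counts from a 2x2 test-versus-disease table
    TP, FP, FN, TN = 90, 45, 10, 855

    sensitivity = TP / (TP + FN)  # 90 / 100  = 0.90: sick people correctly flagged
    specificity = TN / (FP + TN)  # 855 / 900 = 0.95: healthy people correctly cleared
    ppv = TP / (TP + FP)          # 90 / 135  = ~0.67: P(disease | positive test)
    npv = TN / (FN + TN)          # 855 / 865 = ~0.99: P(no disease | negative test)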

19
Q

Bayes

A

demonstrated how prior probabilities may affect estimated probabilities for events.
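
A worked illustration (all numbers invented): with prevalence 1%, sensitivity 90% and specificity 95%, Bayes' theorem gives P(disease | positive test) = (0.90 x 0.01) / (0.90 x 0.01 + 0.05 x 0.99) = 0.009 / 0.0585 ≈ 0.15. The low prior probability (the prevalence) keeps the PPV far below the sensitivity.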

20
Q

Cohen’s kappa

A

measures inter- and intra-rater reliability, i.e., agreement corrected for chance agreement

21
Q

Cohen’s kappa formula

A

kappa = 2(ad – bc) / (r1 x c2 + r2 x c1), where a, b, c and d are the cell counts of the 2x2 agreement table, r1 and r2 the row totals, and c1 and c2 the column totals
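
For a 2x2 table this shortcut is algebraically identical to the general definition kappa = (po – pe) / (1 – pe), where po is the observed proportion of agreement and pe the proportion of agreement expected by chance.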

22
Q

What is regression analysis used for

A

Predict values of an outcome variable on the basis of other variables. Aim is to build a model that describes variability in a dependent variable (Y) as a function of one or more independent (X) variables:
Yi = f(X1i, X2i, …). Causality may very well play a role.

23
Q

What does Pearson’s R squared represent in OLS output

A

the proportion of total variation that the model explains: SSmodel / SStotal

24
Q

What does an outlier in the Y space do to the correlation coefficient

A

Pulls the correlation towards it, boosting it

25
Q

What does an outlier in the X space do to the correlation coefficient

A

Pulls the correlation towards it, lowering it

26
Q

Simpson’s paradox

A

a phenomenon in statistics where a trend that appears in several groups of data reverses when the data is combined. In other words, a relationship between two variables that holds within individual groups can disappear or even reverse when the data from those groups is pooled together.

27
Q

When is Spearman’s rank used

A

variables are at the ordinal data level, the relationship is monotonic but non-linear, or outliers might affect Pearson’s r too much

28
Q

What are model coefficients in a regression equation

A

the terms in your function that optimally relate predicted values of the dependent variable to observed values, often denoted as b0, b1, b2

29
Q

What should you look for in your data before you start a regression analysis

A

Correct mistakes, check for outliers, stratification, non-linearities, … for all possible predictors!

30
Q

What is each factor in this equation: Yi = b0 + b1X1i + ei

A

b0 is called the intercept (it is a constant). b1 is the regression coefficient for X1: the (estimated) slope of the best-fitting line in a scatter plot of X versus Y. ei is a prediction error, a residual: the difference between a predicted and an observed Y value (for a given X value). Yi = predicted value of Y on the basis of the model + prediction error.

31
Q

What is H0 for OLS regression

A

H0: no relation between X1 and Y. b1 = 0, no effect of changes of X1 on Y. r2 = 0, no variance explained by the model

32
Q

Aim in OLS for finding values

A

want to find values for b0 and b1 that optimally relate outcomes of Y to values of X1

33
Q

Predicting Y without any information about X

A

The mean of Y would be the best guess because it is unbiased. This total prediction error = total sum of squares. Without predictors, SStotal = SSerror (sum of squared residuals): all the variability in Y would be unexplained error.

34
Q

Predicting Y when you do have info about predictors

A

By adding predictors to your model, you hope to make better predictions: the ratio of SStotal to SSerror (sum of squared residuals) should improve, i.e., a larger percentage of the variance in Y can be explained.
The better the model, the smaller SSerror.

35
Q

‘best fitting’: Least squares criterion (OLS regression)

A

The ‘best fitting model’ is the one for which SS error reaches a minimum.

36
Q

Yi

A

predicted value on the basis of the model + prediction error

37
Q

SStotal

A

Sum of squared differences between the Y values and the mean of Y (the total variability in Y around its mean); equals SSmodel + SSerror. When nothing is known about the predictors (the so-called null model), SStotal = SSerror

38
Q

SSerror

A

sum of the squared differences between the actual observed values of the dependent variable (Y) and the values predicted by the model: Σ(Yi – Ŷi)^2

39
Q

SSmodel

A

Sum of squared differences between the predicted values of Y and the mean of Y: Σ(Ŷi – Ȳ)^2. The amount of variability in Y that is explained by the model (variance explained).

40
Q

Model fit is proportion of

A

total variability in Y (SStotal) accounted for by the model prediction (SSmodel), so SSmodel / SStotal
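
The whole sum-of-squares decomposition in a few lines, as a sketch with invented data (numpy's polyfit stands in for any OLS routine):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # invented predictor values
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])  # invented outcome values

    b1, b0 = np.polyfit(x, y, 1)  # slope and intercept of the best-fitting line
    y_hat = b0 + b1 * x           # predicted values

    ss_total = np.sum((y - y.mean()) ** 2)      # total variability in Y
    ss_error = np.sum((y - y_hat) ** 2)         # unexplained (residual) variability
    ss_model = np.sum((y_hat - y.mean()) ** 2)  # variability explained by the model

    r_squared = ss_model / ss_total  # model fit, equals 1 - ss_error / ss_total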

41
Q

A Standardized (Beta) regression coefficient

A

indicates how many standard deviations the dependent variable changes with a standard deviation change of the predictor

42
Q

How can beta values be interpreted

A

in terms of the importance the predictors have in the predictive power of the model.
Larger |value|, more influence.
Looking at Beta values is useful with multiple regression models: different coefficients may all reach statistical significance, but their impact may differ

43
Q

indicator coding

A

creating dummy variables with values 0 or 1

44
Q

dummy variables

A

Dummy variables have values 0 or 1, where 0 means: ‘does not have the property’ and 1 means ‘has the property’. To code k categories, you need k - 1 dummies.

45
Q

reference category

A

the original category level that does not have its own dummy variable; the dummy coefficients are interpreted as differences relative to this category
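
A minimal pandas sketch (the 'dose' column and its levels are hypothetical); drop_first leaves out the first level, which thereby becomes the reference category:

    import pandas as pd

    df = pd.DataFrame({"dose": ["high", "low", "standard", "low", "high"]})
    dummies = pd.get_dummies(df["dose"], drop_first=True)
    # Three levels -> two dummies ('low' and 'standard'); 'high' (the first level
    # alphabetically) gets no dummy and serves as the reference category.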

46
Q

Multiple predictors

A

With two predictors X1 and X2, b1 indicates how much Y changes with a unit change of X1, while holding X2 at a constant value.
b2 indicates how much Y changes with a unit change of X2, holding X1 constant

47
Q

What does multiple regression allow us to estimate

A

the unique contribution of a predictor Xk to the outcome, given the other X variable(s) in the model. Please note that multiple regression therefore provides a way of adjusting / accounting for potentially confounding variables by including these in the model

48
Q

B value for a dummy

A

the estimate of the mean difference on the DV between the dummy level and the ‘uncoded’ reference category. For example, b_low tells you how much the mean outcome for “low” differs from the mean outcome for “standard”. If the coefficient is significant, the difference is significant.

49
Q

Modelling interaction between x1 and x2

A

create a new variable X1ByX2 which is simply the product of the two original ones
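
In code this is a one-liner (the data frame and column names are hypothetical):

    import pandas as pd

    df = pd.DataFrame({"X1": [1.0, 2.0, 3.0], "X2": [0.5, 1.5, 2.5]})  # invented values
    df["X1ByX2"] = df["X1"] * df["X2"]  # product term that models the interaction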

50
Q

synergy

A

if an interaction term is significant and its b is positive, the predictors have synergy (they strengthen each other)

51
Q

anti-synergy

A

If an interaction term is significant and its b is negative, anti-synergy (they weaken each other)

52
Q

Adjusted R2

A

R2 may become spuriously high if your model is ‘overspecified’ (i.e., if it has too many predictors relative to the number of cases). Adjusted R2 attempts to compensate for the spurious increase in predictive power. It is therefore more conservative and protects against type I errors.

53
Q

Standard error of estimate in model summary

A

can be interpreted as the average magnitude (in original units of measurement) of the prediction error

54
Q

steps in evaluation of MLR output

A
  1. Check significance of the F-test: if p < α, reject H0
  2. Check size of R2: if large enough (whatever that means, context matters) → relevant model
  3. Check significance of the t-tests for the coefficients (for each: if p < α, reject H0)
  4. Check sign and size of the unstandardized (b) coefficients for substantive interpretation (‘how much does Y change with a unit change in X’)
  5. Check absolute value of the standardized (beta) coefficients for relative importance

55
Q

What does the F-value of multiple regression output tell us

A

If the F-test is significant (p < 0.05), it suggests that the model provides a better fit to the data than a model with no predictors.

56
Q

Sequential analysis

A

test whether adding predictors leads to a significant improvement of our model.
We want to know whether the R2 change (model 2 versus model 1) is significantly different from zero.
Use the F-change test to answer the question whether the more elaborate model 2 is better than model 1
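
The usual form of the F-change test (a standard result, stated here as an assumption rather than taken from these cards): F = [(R2_model2 – R2_model1) / (k2 – k1)] / [(1 – R2_model2) / (n – k2 – 1)], where k1 and k2 are the numbers of predictors in the two models and n is the sample size; compare it against an F distribution with (k2 – k1, n – k2 – 1) degrees of freedom.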

57
Q

Homoscedastic model

A

magnitudes of the error terms do not depend on the x-value: the spread of errors remains constant across the values of the predictor

58
Q

Why is homoscedasticity good

A

If the assumption is not met, the fit of the model (multiple R2) may be overestimated.
Strictly speaking, you cannot then speak of ‘the’ fit of the model, because the fit changes with the values of the predictor

59
Q

heteroscedasticity

A

Heteroscedasticity occurs when the residuals have non-constant variance across levels of the independent variable. the spread of errors changes with the values of the predictor.

60
Q

independence of residuals assumption

A

the error for yi+1 should not depend on the error for yi

61
Q

consequences of multicollinearity

A

At best: the model contains redundant elements. 
→ The model is more complex than needed
Worse: coefficient values may change erratically in response to small changes in the model or the data, and / or the ordering of your model building process (case 4, previous slide) 
→ The model is unstable
→ Model is hard to interpret / unreliable /invalid

62
Q

multicollinearity indicators

A
  • Significant F-test, but insignificant coefficients for specific IV(s)?
Suspect, but it could be that the IV truly does not relate to the DV
  • Large changes in coefficient values when a predictor variable is added or deleted?
’case 4’. Really suspect!!
  • IV is significant as a single predictor, but insignificant in the multiple regression model? Smoking gun!