706 exam Flashcards

1
Q

What is the precision method of determining a sample size?

A

Choosing a sample size that meets a requirement on the precision of the estimates (as measured by their confidence intervals).

2
Q

What is the equation for SE?

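A

For a sample mean: SE = SD/√n, the standard deviation divided by the square root of the sample size.
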
3
Q

How do you calculate the sample mean of a binary variable?

A

The sample mean of a binary variable Y estimates the probability that Y=1. For binary variables the sample mean is a proportion and the proportion estimates the probability.
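
As a quick check, here is a minimal Python sketch with made-up 0/1 data (the list `y` is a hypothetical example, not course data):

```python
# Hypothetical binary outcome: 1 = event occurred, 0 = it did not.
y = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]

# The sample mean of a 0/1 variable is the proportion of 1s,
# which estimates P(Y = 1).
sample_mean = sum(y) / len(y)
proportion_of_ones = y.count(1) / len(y)

print(sample_mean)         # 0.4
print(proportion_of_ones)  # 0.4
```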

4
Q

If you code ethnicity as 1, 2 and 3, can you then include this coded variable as a predictor in a regression model?

A

No. Doing this forces a structure on the model that is unlikely to be true: it assumes that the 1 vs 3 effect is exactly twice the 1 vs 2 effect.
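
One common alternative is dummy (indicator) coding, sketched below in Python. The function name `dummy_code` and the choice of level 1 as the reference category are assumptions made for illustration:

```python
def dummy_code(value, levels=(1, 2, 3)):
    """Return one 0/1 indicator per level, leaving out the first (reference) level."""
    reference, *others = levels
    return [1 if value == level else 0 for level in others]

# Each non-reference level gets its own coefficient in the model,
# so no ordering or spacing between the codes is assumed.
for code in (1, 2, 3):
    print(code, dummy_code(code))
# 1 [0, 0]
# 2 [1, 0]
# 3 [0, 1]
```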

5
Q

What is logistic regression and what is the statistical framework?

A

The outcome in logistic regression is binary, and the analysis uses counts of occurrence. The binomial model provides the statistical framework.

6
Q

What are the problems associated with missing data?

A
  • Loss of statistical power
  • Distortion of analyses
  • Bias
7
Q

How do you calculate a confidence interval?

A

q ± 1.96 × SE(q), for a 95% confidence interval around an estimate q.
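
A minimal Python sketch of this calculation for a sample mean. The data values are made up, and with a sample this small a t critical value would normally replace 1.96; the sketch keeps 1.96 only to mirror the card:

```python
import math
import statistics

# Hypothetical measurements (illustrative values only).
values = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 5.1]

n = len(values)
q = statistics.mean(values)                   # the estimate
se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean

lower = q - 1.96 * se
upper = q + 1.96 * se

print(f"estimate = {q:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```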

8
Q

What is a t-statistic?

A

The calculated difference expressed in units of standard error. The greater the magnitude of t, the greater the evidence against the null hypothesis.

9
Q

How do you calculate relative risk?

A

The probability of an event occurring in group A divided by the probability of it occurring in group B.
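
A small Python sketch of the calculation. The function name `relative_risk` and the counts are hypothetical:

```python
def relative_risk(events_a, total_a, events_b, total_b):
    """Risk in group A divided by risk in group B."""
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    return risk_a / risk_b

# Hypothetical counts: 30 of 100 people in group A have the event,
# compared with 15 of 100 in group B.
rr = relative_risk(30, 100, 15, 100)
print(rr)  # 2.0
```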

10
Q

What is a chi-square test?

A

The chi-square test is used to assess whether two categorical variables are unrelated to each other. The chi-square statistic, χ², is a measure of the discrepancy between the observed cell values and those expected under the hypothesis of independence. A large χ² indicates a big discrepancy between what we observed and what we would have expected under independence.
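
The discrepancy can be computed by hand, as in this Python sketch. It assumes the standard Pearson form, the sum over cells of (observed − expected)² / expected, and uses a made-up 2 × 2 table; `chi_square` is an illustrative name:

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the row and column variables are independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df

table = [[20, 30], [30, 20]]  # hypothetical counts
statistic, df = chi_square(table)
print(statistic, df)  # 4.0 1
```

Here every expected count is 25, so each cell contributes 1 to the statistic.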

11
Q

How do you calculate the odds ratio from a coefficient in a regression analysis?

A

Exponentiate the coefficient: OR = exp(coefficient).
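
For example, in Python, assuming a logistic regression coefficient of 0.693 (a hypothetical value chosen because it is close to ln 2):

```python
import math

coefficient = 0.693  # hypothetical log odds ratio from a logistic model

odds_ratio = math.exp(coefficient)
print(round(odds_ratio, 2))  # 2.0
```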

12
Q

How do you get the probability of two things happening simultaneously?

A

Multiply their probabilities (valid only if the events are independent).

13
Q

How does p relate to x in a logistic model?

A

p = e^x / (1 + e^x), so p is always between 0 and 1, whatever the value of x.

14
Q

What is the central limit theorem?

A

The sampling distribution of the sample mean tends to a normal distribution as n gets large, regardless of the underlying distribution.
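
A rough simulation sketch in Python. The exponential distribution, the sample size of 50 and the 2000 repetitions are arbitrary choices for illustration:

```python
import math
import random
import statistics

random.seed(706)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from a skewed (exponential) distribution with mean 1."""
    return statistics.mean([random.expovariate(1.0) for _ in range(n)])

n = 50
means = [sample_mean(n) for _ in range(2000)]

# The sample means cluster around the population mean (1.0), and their
# spread is close to the theoretical standard error, 1 / sqrt(n).
print(round(statistics.mean(means), 2))
print(round(statistics.stdev(means), 2), round(1 / math.sqrt(n), 2))
```

A histogram of `means` would look roughly bell-shaped even though the individual draws are strongly skewed.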

15
Q

What does a root MSE of 17 tell you for a model?

A

For a given combination of factors, about 95% of the actual values will lie within roughly ±34 units (2 × root MSE) of the mean value.

16
Q

What are the key assumptions of ordinary regression?

A
  1. n observations are independent of each other
  2. the effects add together
  3. the residuals are normally distributed with constant variance. You can check normality with a Q-Q plot
17
Q

What judgements do you make when doing a regression model?

A

Modelling requires judgements about how to include variables: should continuous variables be categorized, should dummy variables be used for ordinal scales, which variables should be included in the model, should those that are not statistically significant be dropped, and should interaction terms be included?

18
Q

What factors affect the power of a study?

A
  1. Sample size
  2. The size of the effect
  3. Significance level
  4. The endpoint being studied
  5. The statistical test being used. If the assumptions of a parametric test hold, it will generally be more powerful than a non-parametric one. A parametric test is based on the parameters of the normal distribution, so it assumes that the underlying distribution is normal.
19
Q

What is a parametric test?

A

A parametric test is based on the parameters of the normal distribution, so it relies on the assumption that the underlying distribution is normal.

20
Q

What is the Hosmer-Lemeshow test?

A

A goodness-of-fit test for logistic regression. It is calculated by comparing predicted and actual counts.

21
Q

You calculate a high p-value of 0.66 for your Hosmer-Lemeshow test. What does this tell you?

A

There is no evidence of lack of fit: the regression model is a good fit.

22
Q

Is the null hypothesis defined in terms of population or sample quantities?

A

population

23
Q

What is a z score and how do you calculate one?

A

Z-scores are expressed in terms of standard deviations from their means. As a result, z-scores have a distribution with a mean of 0 and a standard deviation of 1.

z = (estimate − null value) / SE
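
A minimal Python sketch of the calculation. The function name `z_score` and the numbers are hypothetical:

```python
def z_score(estimate, null_value, se):
    """Distance of the estimate from the null value, in standard errors."""
    return (estimate - null_value) / se

# Hypothetical estimate of 5.2 tested against a null value of 4.0,
# with a standard error of 0.5.
z = z_score(5.2, 4.0, 0.5)
print(round(z, 2))  # 2.4
```

A magnitude above 1.96 corresponds to a two-sided p-value below 0.05.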

24
Q

Is the RR a good approximation for OR?

A

Only if the disease is rare.
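
The Python sketch below compares the two measures for a rare and a common outcome. The risks are made-up values and the function names are illustrative:

```python
def risk_ratio(p_a, p_b):
    """Relative risk: ratio of the two probabilities."""
    return p_a / p_b

def odds_ratio(p_a, p_b):
    """Odds ratio: ratio of the two odds, where odds = p / (1 - p)."""
    return (p_a / (1 - p_a)) / (p_b / (1 - p_b))

# Hypothetical risks in groups A and B.
scenarios = {"rare": (0.002, 0.001), "common": (0.6, 0.3)}

for label, (p_a, p_b) in scenarios.items():
    print(label, round(risk_ratio(p_a, p_b), 3), round(odds_ratio(p_a, p_b), 3))
```

In the rare scenario the RR is 2 and the OR is about 2.002; in the common scenario the RR is still 2 but the OR is about 3.5.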

25
Q

What is the formula to calculate a chi-square statistic?

A
26
Q

What is the distinction between the precision and power methods for sample size determination?

A

The precision method fixes the sample size to achieve a required degree of precision in an estimate, with precision measured by the confidence interval. It focuses on how precisely an effect is estimated.

Power, on the other hand, focuses on hypothesis testing, and trying to avoid saying “no statistically significant difference” when in fact there is a difference. Power is the probability of saying an effect exists when it actually does exist.

27
Q

Why is a regression model an artificial construct?

A

A regression model is a mathematical formula relating an outcome variable Y to a set of other variables X1, X2, X3, … The formula is a human construct that is almost certainly a simplification of reality.

28
Q

What are frequentist statistics?

A

Frequentist statistics regards probabilities as “long run relative frequencies”. It is based on the notion of probability (as in p-values and CIs) as a measure of frequency of occurrence.

29
Q

What is standard error?

A

SE is a measure of sampling variability. It is an intrinsic feature of any statistic that is calculated in repeatedly drawn samples. In itself it is not an “error”; there is nothing wrong about it. The sense of “error” arises when the SE is used to calculate a CI, if the CI is regarded as a measure of the likely degree of error in estimation.

30
Q

What is a confidence interval?

A

An estimate of the interval μ ± 1.96σ/√n, within which there is a 95% chance that the sample mean ȳ will lie. But as the CI is only an estimate of this interval, we do not know whether the 95% probability is correct.

However, if you repeated the study over and over again, calculating a 95% confidence interval each time, you would expect about 95 of every 100 such intervals to cover the true mean μ.

31
Q

What is the _cons value in a regression model?

A

The _cons term is the “intercept” of the model. It gives the mean value of your y variable when all the other variables are zero. It is often unhelpful. The p-value associated with it tests whether the intercept is zero, which is rarely a sensible question to ask.

32
Q

How do you add an interaction in a regression model and why would you want to do this?

A

By default the model is additive. To include an interaction you add the product of x1 and x2 to the model as an extra term. You do this if you think the effect of one variable depends on the value of the other.
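
A small Python sketch of what the product term does. The function `predict` and all coefficient values are hypothetical:

```python
def predict(x1, x2, b0=1.0, b1=2.0, b2=3.0, b3=0.5):
    """Linear predictor with an interaction term b3 * x1 * x2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# Effect of raising x1 from 0 to 1, at two different values of x2.
effect_when_x2_is_0 = predict(1, 0) - predict(0, 0)
effect_when_x2_is_1 = predict(1, 1) - predict(0, 1)

print(effect_when_x2_is_0)  # 2.0
print(effect_when_x2_is_1)  # 2.5
```

With b3 set to 0 the two effects would be identical, which is the purely additive model.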

33
Q

What is the point of making a model?

A

Models provide a framework for estimating simultaneously the effects of any number of variables on an outcome.

34
Q

Why is a confidence interval associated with an estimate q often of the form q ± 1.96 SE(q)?

A

Because of the central limit theorem, it is often safe to assume that the sampling distribution of an estimate is approximately normal, and that the approximation improves with larger n. The distribution of q is then centred on the true value, with standard deviation equal to the standard error SE(q). The interval q ± 1.96 SE(q) is therefore an estimate of the range containing 95% of that distribution, where 1.96 is the critical value of the z distribution that leaves 2.5% probability in each tail.

35
Q

What is the difference between the standard deviation and SEM?

A

The standard deviation (SD) measures the variability, or dispersion, of the individual values in a set of data around their mean, while the standard error of the mean (SEM) measures how far the sample mean is likely to be from the true population mean.
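
The Python sketch below assumes the usual relationship SEM = SD/√n and a hypothetical SD of 10. It shows the SEM shrinking as the sample grows while the SD stays the same:

```python
import math

def sem(sd, n):
    """Standard error of the mean for a sample of size n."""
    return sd / math.sqrt(n)

sd = 10.0  # hypothetical standard deviation of the data

print(sem(sd, 25))   # 2.0
print(sem(sd, 100))  # 1.0
```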

36
Q

What is a p value?

A

It is the probability of obtaining data at least as extreme as those observed, if the null hypothesis is true.

37
Q

When do you multiply probabilities?

A

Only for independent events.

For example, diastolic and systolic blood pressure in the same person are not independent, so their probabilities cannot simply be multiplied.

38
Q

When can you add probabilities?

A

If the two events are mutually exclusive.

If A and B can occur together then: P(A or B) = P(A) + P(B) - P(A and B)
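
Both cases can be checked on a fair six-sided die, as in this Python sketch. The die and the chosen events are illustrative assumptions:

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # faces of a fair die

def prob(event):
    """Probability of an event, given equally likely outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

# Mutually exclusive events: rolling a 1, rolling a 2.
one, two = {1}, {2}
print(prob(one) + prob(two), prob(one | two))  # 1/3 1/3

# Overlapping events: an even number, a number above 3.
even, high = {2, 4, 6}, {4, 5, 6}
by_formula = prob(even) + prob(high) - prob(even & high)
print(by_formula, prob(even | high))  # 2/3 2/3
```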

39
Q

What does logistic regression do?

A

Models the probability of your binary variable in terms of the other variables listed.

40
Q

Why would you use the post-estimation command test after a regression model?

A

The post-estimation command test can be used to test whether non-significant or borderline-significant variables can be removed from the model. The aim is presumably to build a model that contains only significant predictor variables.

41
Q

Is there a way that you can know the discrepancy between the sample mean and the population mean?

A

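No. The SEM is a measure of sampling variability: it indicates how far the sample mean is likely to be from the population mean, not the actual discrepancy.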

42
Q

How do you get the standard error on a proportion?

A
43
Q

How do you calculate degrees of freedom for a chi-square test?

A

The table has (r − 1)(c − 1) degrees of freedom, where r is the number of rows and c is the number of columns.

44
Q

What are the assumptions associated with the ordinary multiple linear regression model?

A
  • First, that the model, with its linear structure, is a true representation of the mean value of Y. Also that the direction of effect is that Y depends on X.
  • The tests of significance are based on the assumption that the residuals are normally distributed about the line (plane).
  • The variance of the residuals does not depend on the combination of X variables; the residuals are said to be homoscedastic (the term itself is not expected in an answer).
  • The observations in the data are assumed to be “independent”, that is, they constitute a random sample of people (if “person” is the basic unit). This may be violated by repeated measurements on the same person, or if individuals are somehow related.
45
Q

What are predictor variables?

A

X variables that predict Y

46
Q

How do you calculate the probability from an odds ratio in regression analysis?

A

p = e^x / (1 + e^x)
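
A minimal Python sketch of this conversion, assuming x is the value of the linear predictor on the log odds scale. The function name `inv_logit` is illustrative:

```python
import math

def inv_logit(x):
    """Convert a log odds value x into a probability, e^x / (1 + e^x)."""
    # Note: math.exp overflows for very large x (around 710 and above).
    return math.exp(x) / (1 + math.exp(x))

print(inv_logit(0))            # 0.5
print(round(inv_logit(2), 3))  # 0.881
```

Whatever the value of x, the result stays between 0 and 1.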