L18, L20, L22, L23- biostats lectures Flashcards

1
Q

what is a sampling distribution

A

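the distribution of a statistic (here, the mean) across all possible samples of a given size
ex. pair every data point with itself and with each other data point (samples of size 2) and take the mean of each pair => n^2 sample means, with a more normal-shaped distribution curve
(mean stays the same, SD dec)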
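A minimal sketch of the pairing idea above, assuming a made-up four-value population (the numbers are illustrative, not from the lecture). Every value is paired with itself and with every other value, which gives N^2 samples of size 2. The mean of the pair means should match the population mean, while their spread should shrink by a factor of sqrt(2).

```python
from itertools import product
from statistics import mean, pstdev

# Hypothetical population, chosen only for illustration.
population = [2, 4, 6, 8]

# All ordered pairs, including each value paired with itself
# (i.e. samples of size 2 drawn with replacement).
pairs = list(product(population, repeat=2))
pair_means = [mean(pair) for pair in pairs]

print(len(pair_means))                          # number of samples: N^2
print(mean(population), mean(pair_means))       # centre of each distribution
print(pstdev(population), pstdev(pair_means))   # spread of each distribution
```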

2
Q

what is the formula for standard error of the mean

A

SEM = σ / sqrt(n)

  • measures variability in the mean
  • decreases as sample size (n) increases
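As a quick illustration of the SEM formula, the sketch below evaluates it for a few sample sizes. The value of σ and the sample sizes are assumptions picked for the example; `standard_error` is a hypothetical helper name.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 10.0  # hypothetical population SD
for n in (4, 25, 100):
    # SEM shrinks as n grows, but only with the square root of n.
    print(n, standard_error(sigma, n))
```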
3
Q

Central Limit Theorem in terms of not normal population distribution

A

even when the population is not normally distributed, the distribution of sample means approaches a normal distribution as the sample size increases
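A small simulation can make this concrete. The sketch below assumes an exponential population (strongly right-skewed, mean 1, SD 1) and arbitrary choices for the sample size and number of repeats. If the sample means are roughly normal, about 95% of them should fall within 1.96 SEM of the population mean.

```python
import math
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

n = 50            # size of each sample (illustrative choice)
n_samples = 5000  # number of repeated samples (illustrative choice)

# Draw repeated samples from a skewed population and keep each sample's mean.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(n_samples)
]

sem = 1.0 / math.sqrt(n)  # theoretical SEM = sigma / sqrt(n), with sigma = 1
within = sum(abs(m - 1.0) <= 1.96 * sem for m in sample_means) / n_samples

print(mean(sample_means))    # should sit near the population mean of 1
print(pstdev(sample_means))  # should sit near the theoretical SEM
print(within)                # should sit near 0.95 if the means are roughly normal
```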

4
Q

calculation of degrees of freedom

A

df = n - 1

5
Q

define t-distribution

A

bell-shaped like the normal distribution, but with fatter tails (more extreme values included)
-as df increases (n - 1 inc), tails get thinner and t-values approach Z-values (equal at n = infinity)

6
Q

define random sample

A

randomness means everyone has an equal probability of being included

7
Q

what does the Central Limit Theorem allow us to infer from a sample

A

allows us to infer population parameters (mean, SD) from a single sample of sufficient size

8
Q

describe when it is better to use standard error OR standard deviation

A

SEM- how well the sample mean estimates the population mean

SD- how widely scattered the measurements are in the population

9
Q

define the purpose of Confidence Interval

A

makes inferences about the true mean based on the mean and SD of the sample
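One way to sketch this in code is shown below. It assumes a Z critical value, which is reasonable for large n; for a small sample like the hypothetical one here, a t critical value with n - 1 df would give a slightly wider interval. The helper name `mean_confidence_interval` and the data are made up for the example.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate CI for the mean: sample mean +/- z * (SD / sqrt(n))."""
    n = len(sample)
    sem = stdev(sample) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    m = mean(sample)
    return m - z * sem, m + z * sem

sample = [98, 102, 101, 97, 103, 99, 100, 104, 96, 100]  # hypothetical data
low, high = mean_confidence_interval(sample)
print(low, high)
```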

10
Q

define null and alternative hypothesis

A

null- nothing unusual is happening, no relationship between exposure/disease

alternative- something unusual is happening, exposure and disease are related

11
Q

what are the two ways to test a hypothesis

A

One-sided: alternative hypothesis specifies a direction (only better or only worse)

Two-sided: alternative hypothesis can go in either direction (either better or worse)

12
Q

CI = (1)

Power = (2)

A

1- CI = 1 - α (confidence level)

2- Power = 1 - β

13
Q

what is the Z value formula

A

Z = (x̄ - µ) / (σ / sqrt(n)), where x̄ is the sample mean
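A worked example of the Z formula, taking the numerator's first term as the sample mean. All input values are hypothetical, and the two-sided p-value step is an addition that uses the standard normal curve.

```python
import math
from statistics import NormalDist

def z_statistic(sample_mean, mu, sigma, n):
    """Z = (sample mean - mu) / (sigma / sqrt(n))."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))

# Hypothetical study: sample mean 103 vs. hypothesized mean 100.
z = z_statistic(sample_mean=103.0, mu=100.0, sigma=15.0, n=100)

# Two-sided p-value: area in both tails beyond |z|.
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(z, p_two_sided)
```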

14
Q

define type I and type II errors in terms of null hypothesis

A

Type I (α error/FP)- Ho is true, study rejects Ho

Type II (β error/FN)- Ho is false, study fails to reject Ho

15
Q

in what situations are Type I errors especially bad

A

(false positives)

  • Tx is expensive, difficult
  • cost of false alarm is high
  • no effective Tx
16
Q

in what situations are Type II errors especially bad

A

(false negative)

  • Tx is cheap, easy
  • cost of false alarm is low
  • disease only responds to Tx in early stages
17
Q

Type I/α error can only occur if Ho is (T/F)

Type II/β error can only occur if Ho is (T/F)

A

1- (false positives), Ho is true

2- (false negatives), Ho is false

if one inc, the other dec

18
Q

inc β is associated with what change in α, σ, n, Δ

A

α- dec (type I error)
σ- inc (standard deviation)
n- dec (sample size)
Δ- dec (effect size)
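These four relationships can be checked numerically. The sketch below assumes a two-sided one-sample Z test and uses the usual normal approximation for power (the negligible far tail is ignored). The baseline values and the function name `beta` are hypothetical.

```python
import math
from statistics import NormalDist

def beta(alpha, sigma, n, delta):
    """Approximate type II error rate for a two-sided one-sample Z test."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Power = P(Z > z_crit - delta * sqrt(n) / sigma) under the alternative.
    power = NormalDist().cdf(delta * math.sqrt(n) / sigma - z_crit)
    return 1 - power

base = dict(alpha=0.05, sigma=10.0, n=50, delta=4.0)  # hypothetical baseline

print(beta(**base))                      # baseline beta
print(beta(**{**base, "alpha": 0.01}))   # smaller alpha -> larger beta
print(beta(**{**base, "sigma": 15.0}))   # larger sigma  -> larger beta
print(beta(**{**base, "n": 25}))         # smaller n     -> larger beta
print(beta(**{**base, "delta": 2.0}))    # smaller delta -> larger beta
```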

19
Q

list the hypothesis test needed when the ind. and dep. variables are both categorical

A

chi-squared test (contingency tables)

20
Q

list the hypothesis test needed when the ind. variable is categorical and the dep. variable is continuous

A

t-tests (2 groups)

ANOVA (3 or more groups)

21
Q

list the hypothesis test needed when the ind. and dep. variables are both continuous

A

correlation regression

22
Q

what are the assumptions of a chi-squared test

A
  • categories are exclusive
  • each category has an expected value of at least 5

23
Q

what is the calculation for Chi-squared test

A

χ² = Σ [(observed - expected)^2 / expected], summed over every cell of the table

24
Q

what is the formula for the degrees of freedom in a chi-squared test

A

df = (# columns - 1) * (# rows - 1)
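The chi-squared statistic and its degrees of freedom can be sketched together for a contingency table. The expected count for each cell is taken as (row total × column total) / grand total, which is the standard approach for a test of independence. The table values and the function name are made up for illustration.

```python
def chi_squared(observed):
    """Return (chi-squared statistic, df) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected

    df = (len(col_totals) - 1) * (len(row_totals) - 1)
    return stat, df

table = [[10, 20], [30, 40]]  # hypothetical 2 x 2 exposure/disease counts
stat, df = chi_squared(table)
print(stat, df)
```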

25
Q

what are the assumptions made in a t-test

A
  • assumes the population being sampled is normally distributed
  • assumes the variances of both samples are the same
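The equal-variance assumption is what allows the two sample variances to be pooled. Below is a minimal sketch of a pooled two-sample t statistic, with df = n1 + n2 - 2 for the two-group case. The data and the name `pooled_t` are hypothetical.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic with a pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Weighted average of the two sample variances (assumes they estimate
    # the same underlying variance).
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df

group_a = [5, 6, 7, 8, 9]  # hypothetical measurements
group_b = [1, 2, 3, 4, 5]
t, df = pooled_t(group_a, group_b)
print(t, df)
```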

26
Q

define both values used from a Pearson correlation

A

r = strength of association / correlation between variables

r^2 = proportion (percentage) of the variance in one variable that is explained by the other
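Both values can be computed directly from paired data. The sketch below uses the standard sum-of-products form of Pearson's r; the data points and the name `pearson_r` are illustrative assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(r)       # strength and direction of the linear association
print(r ** 2)  # share of the variance in y explained by x
```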

27
Q

list the r values and their associated correlation strengths

A
(note absolute value of r)
r < 0.4 --> weak corr.
0.4 < r < 0.6 --> mod. corr.
0.6 < r < 0.8 --> strong corr.
r > 0.8 --> very strong corr.
28
Q

list the assumptions of Pearson Correlation

A
  • normality of variables (continuous normal distribution)
  • linearity of associations (correlation strength doesn’t change between higher and lower values of the variables)
  • oval scatterplot (not triangle)
29
Q

what are some conditions that violate assumptions of Pearson Correlation

A
  • extreme values
  • multiple modes
  • nonlinear / nonmonotone associations
  • triangular scatterplot
30
Q

describe Spearman’s correlation, what is the disadvantage

A
  • correlates the ranks of the values rather than the values themselves, so the size of the differences is ignored; used when Pearson cannot be used
    (ex. extreme values, nonlinear but monotonic data)
  • less statistically powerful
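A short sketch of the ranking idea: Spearman's correlation can be computed as Pearson's r applied to the ranks. The example data follow a curved but steadily increasing pattern, so the rank correlation is perfect while Pearson's r is not. The simple `ranks` helper assumes there are no tied values, and all names and data are hypothetical.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rho(x, y):
    # Spearman = Pearson computed on the ranks instead of the raw values.
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # curved, but always increasing

print(pearson_r(x, y))     # below 1 because the relationship is not linear
print(spearman_rho(x, y))  # 1 because the ordering matches exactly
```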
31
Q

Spearman is the (1) correlation

Pearson is the (2) correlation

A

1- Safe

2- Powerful

32
Q

what is a correlation regression

A

there is a quantifiable correlation between the 2 variables

y = mx + b

33
Q

describe the linear association regression formula

A

y = α + βx

34
Q

what are residuals in linear regressions

A
  • used for individual data points
  • difference in actual value and predicted value
  • y = α + βx + e (the residual)
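To illustrate, the sketch below fits a line by ordinary least squares (an assumption; the card does not name a fitting method) and then computes each point's residual as actual minus predicted. The data and the function name `fit_line` are hypothetical.

```python
def fit_line(x, y):
    """Least-squares estimates of alpha (intercept) and beta (slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
alpha, beta = fit_line(x, y)

# Residual e for each data point: actual y minus predicted y.
residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]

print(alpha, beta)
print(residuals)
```

With a least-squares fit that includes an intercept, the residuals sum to zero (up to rounding).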
35
Q

how do assumptions and power of a test relate

A
  • more assumptions made –> the stronger the power (continuous stuff)
  • fewer assumptions made –> the weaker the power (categorical stuff)
36
Q

list the 4 data types in order of increasing power

A

(in order of inc power and inc assumptions)

nominal < ordinal < interval < ratio

37
Q

what is the Odds formula in terms of probability

A

Odds = probability / (1 - probability)

38
Q

what is the Probability formula in terms of odds

A

Probability = odds / (1 + odds)

39
Q

describe Bayes Theorem

A

1) pretest probability –> pretest odds
2) PostTest Odds = PreTest Odds x LR
3) posttest odds –> probability
(only used if there is pretest knowledge that can be used)
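The three steps can be sketched with the odds and probability formulas from the earlier cards. The pretest probability and likelihood ratio (LR) below are made-up numbers, and the function names are hypothetical.

```python
def prob_to_odds(p):
    """Odds = probability / (1 - probability)."""
    return p / (1 - p)

def odds_to_prob(odds):
    """Probability = odds / (1 + odds)."""
    return odds / (1 + odds)

def posttest_probability(pretest_prob, likelihood_ratio):
    pretest_odds = prob_to_odds(pretest_prob)         # step 1
    posttest_odds = pretest_odds * likelihood_ratio   # step 2
    return odds_to_prob(posttest_odds)                # step 3

# Hypothetical case: 20% pretest probability, test result with LR = 6.
print(posttest_probability(0.20, 6.0))
```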

40
Q

what are the 4 steps for identifying statistical errors in literature

A

1) were the limits of the design and statistical approach acknowledged
2) were appropriate data and statistical tools applied for the hypothesis tested
3) were the assumptions of the approach violated
4) were the results interpreted correctly

41
Q

what are the assumptions of the residuals in a linear regression

A
  • normally distributed
  • uncorrelated with the predictor and the predicted values
  • uncorrelated with each other