BIO Statistics Flashcards

1
Q

Central limit theorem

A

The sampling distribution of the mean of independent random samples from a population with finite variance will be normal, or nearly so, if the sample size is large enough, whatever the shape of the population distribution.
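
A small simulation can make the theorem concrete. The sketch below draws repeated samples from a uniform (non-normal) population and looks at the spread of their means; the sample size of 30, the 2000 repetitions, and the seed are assumptions chosen for illustration only.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 30     # assumed "large enough" sample size
REPETITIONS = 2000   # number of samples drawn

# Each sample comes from a uniform (clearly non-normal) population on [0, 1).
sample_means = [
    statistics.mean(random.random() for _ in range(SAMPLE_SIZE))
    for _ in range(REPETITIONS)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Theory: population SD of a uniform [0, 1) variable is sqrt(1/12),
# so the SD of the sample means should be close to sqrt(1/12) / sqrt(n).
expected_sd = (1 / 12) ** 0.5 / SAMPLE_SIZE ** 0.5

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"SD of sample means:   {sd_of_means:.4f} (theory: {expected_sd:.4f})")
```

The sample means cluster around the population mean of 0.5, with a spread close to the theoretical value.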

2
Q

Gaussian curve: area between μ and 1 SD, 1 SD and 2 SD, 2 SD and 3 SD, 3 SD → infinity

A

μ to 1 SD: 34.1%
1 SD to 2 SD: 13.6%
2 SD to 3 SD: 2.1%
Past 3 SD: 0.1%
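
These areas can be checked from the standard normal cumulative distribution. The sketch below uses only the standard library; the helper names `normal_cdf` and `area_between` are illustrative, not from the original card.

```python
import math

def normal_cdf(z):
    """Cumulative probability of a standard normal variable up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_between(lo, hi):
    """Area under the standard normal curve between lo and hi (in SD units)."""
    return normal_cdf(hi) - normal_cdf(lo)

bands = {
    "mean to 1 SD": area_between(0, 1),
    "1 SD to 2 SD": area_between(1, 2),
    "2 SD to 3 SD": area_between(2, 3),
    "past 3 SD": 1.0 - normal_cdf(3),
}
for label, area in bands.items():
    print(f"{label}: {area * 100:.1f}%")
```

The figures are for one side of the mean; doubling them gives the familiar two-sided totals.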

3
Q

Parametric statistics (definition)

A

A class of statistical procedures that rely on assumptions about the shape of the distribution in the population (usually assumed normal) and about the form or parameters (μ, SD) of that assumed distribution.

4
Q

Non parametric statistics (definition)

A

A class of statistical procedures NOT relying on assumptions about the shape or form of the probability distribution from which the data are drawn.

5
Q

Descriptive statistics include

A

Mean, median, mode, range, variance, SD, SE

6
Q

Range

A

Difference between largest and smallest sample values

Only a crude indicator of the data set’s dispersion: it depends on the two extreme values alone and is sensitive to outliers.

7
Q

Variance

A

Average of the squared distance of each value from the mean.

Squaring removes the sign of negative deviations, so the variance itself is never negative.

8
Q

Standard deviation

A

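Tells you how tightly the sample values are clustered around the mean.

Tight cluster = low SD.

The 68/95/99.7% interpretation holds only under a normal distribution.

Describes the scatter of the data; the precision of the calculated mean is given by the standard error.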

9
Q

Standard error

A

Measure of how far the sample mean is likely to be from the population mean: the SD of the sampling distribution of the mean, estimated as SD/sqrt(n).

Gets smaller as sample size increases, since the mean of a larger sample is likely to be closer to the population mean.
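
The shrinking of the standard error with sample size can be sketched with the usual estimate, sample SD divided by the square root of n. The data values below are hypothetical and chosen only to keep the spread similar while n grows.

```python
import math
import statistics

def standard_error(sample):
    """Sample SD divided by the square root of the sample size."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

small = [4.0, 6.0, 5.0, 7.0]   # hypothetical measurements, n = 4
large = small * 4              # same values repeated, n = 16

print(f"n = {len(small):2d}: SE = {standard_error(small):.3f}")
print(f"n = {len(large):2d}: SE = {standard_error(large):.3f}")
```

With a similar spread, quadrupling the sample size roughly halves the standard error.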

10
Q

Confidence interval (definition)

A

The estimate of the range that is likely to contain the true population mean. Takes into account the size of the sample and the scatter of the measurements.
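
As a rough sketch, a confidence interval for a mean can be built as mean ± z × SE. This assumes a large-sample normal approximation (small samples would normally use a t critical value instead); the function name and the data are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean (an assumption)."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    # Critical z value, e.g. about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean - z * se, mean + z * se

data = [98, 102, 101, 97, 103, 99, 100, 104, 96, 100]  # hypothetical readings
low95, high95 = mean_confidence_interval(data, 0.95)
low99, high99 = mean_confidence_interval(data, 0.99)
print(f"95% CI: {low95:.2f} to {high95:.2f}")
print(f"99% CI: {low99:.2f} to {high99:.2f}")
```

More scatter or a smaller sample widens the interval, and so does asking for higher confidence.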

11
Q

What constitutes reliable data?

A

Precise, accurate, repeatable, reproducible.

12
Q

Random error

A

Caused by inherently unpredictable fluctuations in the readings of the measurement apparatus or in the experimenter’s interpretation of the instrument readings.

Can occur in any direction

13
Q

Systematic error

A

Result of bad science. Predictable, one direction. Caused by imperfect calibration of instruments, imperfect methods.

14
Q

Alpha

A

Significance level. Probability threshold: the H0 is rejected when the p value falls below it. It is also the accepted probability of a type I error.

0.05 and 0.01 are the conventional choices.

15
Q

Type 1 error

A

Incorrect rejection of a true Ho. (False positive)

Say the experiment worked when it didn’t

16
Q

Type II error

A

Incorrectly retaining a false Ho. (False negative)

If the true state of the Ho is false and you fail to reject it. Usually an issue with power.

17
Q

Z Test definition

A

Any statistical test for which the distribution of the test statistic under the Ho can be approximated by a normal distribution, with n>30.

Assumes pop and sample are normally distributed.

18
Q

What does the value of Z mean in a z test?

A

Z is the distance between the experimental mean and the population mean, measured in standard errors, assuming the Ho is true. A large Z means there is less chance that the observed mean arose by chance alone.

A Z score of 2.5 means that the sample mean is 2.5 standard errors away from the population mean.
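
A z statistic for a sample mean can be sketched as the difference from the population mean divided by the standard error. The numbers below are hypothetical, and the two-sided p value is included only to show how a large Z maps to a small probability.

```python
import math
import statistics

def z_statistic(sample_mean, pop_mean, pop_sd, n):
    """Difference between the means in units of standard error."""
    standard_error = pop_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Hypothetical values: population mean 100, population SD 12, sample of 36.
z = z_statistic(sample_mean=105.0, pop_mean=100.0, pop_sd=12.0, n=36)

# Two-sided p value: chance of a |Z| at least this large if the Ho is true.
p_two_sided = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, two-sided p = {p_two_sided:.4f}")
```

Here the standard error is 12/6 = 2, so a 5-unit difference gives z = 2.5.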

19
Q

T test is used when (general)

A

You have a normal distribution in the population and the sample, and have n

20
Q

P value– what do large and small p mean

A

Large p indicates weak evidence against the Ho. Fail to reject it (this does not prove the Ho is true).

Small p indicates strong evidence against the Ho, reject.

21
Q

One tailed t test

A

To test if the experimental mean is significantly greater than the population mean, or significantly less than, but not both.

Assuming the direction of the effect in advance makes this less conservative.

22
Q

Two tailed t test.

A

Testing if the exp. mean is significantly different from the pop. mean in either direction (greater than or less than).

More conservative because alpha is split into a smaller area on each side of the distribution (2.5% on each for alpha = 0.05).

23
Q

Paired t test

A

The observed data are from the same subject, twins, or otherwise matched subjects, and the paired differences are drawn from a population with a normal distribution.

24
Q

Unpaired t test

A

Observed data are from two independent, random samples from a population with a normal distribution.

25
Q

ANOVA

A

Compares 3 or more means. Measures the sum of squares to understand the variance.

ANOVA tells you whether any of the means differ from each other, taking scatter and variability into consideration.

26
Q

One way ANOVA

A

One measurement variable and one nominal variable are explored.

All the groups are independent, and only one thing is being measured in each group. There is theoretically a normal distribution within each group.

27
Q

Two way ANOVA

A

1 measurement variable and 2 nominal variables.

There are two factors within each group that affect the outcome. Ex: how 3 different drugs affect subjects, both men and women. Drug and gender are the two factors; the response is the measurement variable.

28
Q

Post hoc tests

A

Follow-up to the ANOVA, used when ANOVA rejects the Ho. Tests which group means differ significantly from each other, correcting for multiple comparisons.

29
Q

Mann Whitney U test

A

For independent measures with 2 groups. It’s the non-parametric counterpart of the two-sample (unpaired) t test.

Ranks measurements from highest to lowest values, separating the groups– U from each sample set. Lowest U is compared to the table. If Uexp

30
Q

Correlation

A

The extent to which two variables have a linear relationship with each other.

31
Q

Pearson correlation Coefficient

A

Measures the strength and direction of the linear relationship between X and y: how well knowing X predicts y. Ranges from -1 to +1.
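
For illustration, Pearson's r can be computed from the deviations of each variable about its mean. This is a minimal sketch with hypothetical data; the function name `pearson_r` is not from the original card.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-deviation divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]          # hypothetical, loosely increasing
r = pearson_r(xs, ys)
r_perfect = pearson_r(xs, [2 * x + 1 for x in xs])  # exact straight line

print(f"r = {r:.3f}, r for a perfect line = {r_perfect:.3f}")
```

Values near +1 or -1 indicate a tight linear relationship; values near 0 indicate little or none.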

32
Q

Linear regression

A

Adjusts the values of the slope and intercept to find the line that best predicts y from X based on the data. Assumes that the relationship is linear; it may not be.
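
A least-squares fit is one common way to choose the slope and intercept. The sketch below assumes ordinary least squares with a single predictor; the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]   # hypothetical, roughly y = 2x
slope, intercept = fit_line(xs, ys)
predicted_at_6 = slope * 6 + intercept

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"predicted y at x = 6: {predicted_at_6:.2f}")
```

The fit always returns a line, even when the data are curved, so a plot of the data is still needed to judge whether a line is appropriate.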

33
Q

Categorical data

A

No meaningful mean, median, or normal distribution (a mode can still be reported). Dead or alive, diabetes or no diabetes.

May be inherent in the data or made from continuous data.

May be more meaningful clinically

34
Q

Chi square- what it is used for

A

It is the appropriate statistic for measuring relationships between categorical data in a contingency table. Compares observed outcomes to expected outcomes to see if there is a significant difference.
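
The comparison of observed with expected counts can be sketched for a contingency table as follows. The expected count for each cell is (row total × column total) / grand total; the table values are hypothetical, and the resulting statistic would still need to be compared with a critical value for the table's degrees of freedom.

```python
def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical 2x2 table: rows = treated / untreated, columns = improved / not.
observed = [[30, 10],
            [20, 40]]
chi2 = chi_square(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"chi-square = {chi2:.2f} with {degrees_of_freedom} degree(s) of freedom")
```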

35
Q

Assumptions made by a chi square test

A

Data are frequency data
Adequate sample size
Measures are independent of each other (a patient only goes in one box).

36
Q

When to use a Chi Square (check list)

A

Categorical data

Not normally distributed

No assumption that data will be normal.

37
Q

Experimental research design includes these 3 things

A

Independent variables are manipulated, extraneous factors are controlled, and subjects are randomly assigned to groups.

38
Q

Run-in experiment

A

Precedes the randomized control trial. A period of time where subjects are put on the control regimen to see if they will continue with the study and comply. If not then they will be removed before the real study starts.

39
Q

Healthy user bias

A

Sample is healthier, or more medically fluent, than the average population.

40
Q

Berkson’s bias

A

Sample selected from an impaired or diseased group, like hospital patients. Clearly doesn’t reflect the regular population

41
Q

Exclusion bias

A

Excluding subjects based on potential extraneous factors.

Excluding reduces generalizability

42
Q

Selection bias

A

Bias in placing sample subjects into treatment or control arms. (Hand picking). Leads to non-equivalent groups, which builds inherent biases.

43
Q

Investigator bias

A

Where the investigators are aware of which subjects are in each group and this influences how they work with the subject or record results

44
Q

Hawthorne effect

A

Subjects will change their behavior in a study, affecting internal and external validity.

Usually done to gain approval of/please investigators.

45
Q

Incidence (def)

A

The number of new cases of disease arising during a given period of time.

Also “absolute risk”. (Number of new cases during the period)/(number of people at risk at the start of the period)

46
Q

Relative risk

A

Incidence in exposed population/incidence in unexposed population.
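
As a worked sketch, relative risk is the ratio of two incidences. The cohort counts below are hypothetical.

```python
def incidence(new_cases, people_at_risk):
    """New cases during the period divided by people at risk."""
    return new_cases / people_at_risk

def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Incidence in the exposed group divided by incidence in the unexposed group."""
    return (incidence(exposed_cases, exposed_total)
            / incidence(unexposed_cases, unexposed_total))

# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop disease.
rr = relative_risk(30, 100, 10, 100)
print(f"relative risk = {rr:.1f}")
```

A relative risk above 1 means the disease is more frequent in the exposed group; 1 means no difference.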

47
Q

Cohort study

A

A cohort of people who have something in common when they are first assembled are observed to see what happens to them

Not random, the cohort subjects have a relationship.

Goal: to study predictor variables and associated outcomes

48
Q

Case-control studies

A

Looking backward to compare people with and without a condition– trying to determine risk factors for disease or outcome. Good for long latency, or rare disease.

49
Q

Recall bias

A

People may not remember the exposure or details about it, and it may not be in the medical record.

50
Q

Equation for Variance

A

SUM [(each value - sample mean)^2] / (N-1)

51
Q

Equation for Standard Deviation

A

Square root of the variance.

SQRT( SUM [(each value - sample mean)^2] / (N-1) )
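
The sample variance and standard deviation, using the N-1 denominator, can be sketched by hand and checked against Python's `statistics` module. The data values are hypothetical.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by N - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def sample_sd(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements, mean = 5
variance = sample_variance(data)
sd = sample_sd(data)

print(f"variance = {variance:.3f}, SD = {sd:.3f}")
print(f"library check: {statistics.variance(data):.3f}, {statistics.stdev(data):.3f}")
```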

52
Q

Grubbs’ test

A

For outliers

G = |suspected outlier - mean| / SD, compared with a critical value for the sample size
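
The outlier statistic can be sketched as the largest absolute deviation from the mean in SD units. The data are hypothetical, and the critical value that the result is compared against is not computed here; it would come from a published table for the sample size and alpha.

```python
import statistics

def grubbs_statistic(values):
    """Return the most extreme value and its distance from the mean in SD units."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    suspect = max(values, key=lambda v: abs(v - mean))
    return suspect, abs(suspect - mean) / sd

readings = [10.1, 9.8, 10.3, 9.9, 10.0, 14.2]   # hypothetical measurements
suspect, g = grubbs_statistic(readings)
print(f"suspected outlier = {suspect}, G = {g:.2f}")
```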

53
Q

Effect on required N: increased variability

A

Increased N

54
Q

Effect on required N: greater differences between groups

A

Lower N required

55
Q

Effect on required N: smaller alpha

A

Increase N

56
Q

Effect on required N: decrease Power

A

Decrease N

57
Q

R^2 correlation

A

r ranges from -1 to +1; R^2, its square, ranges from 0 to 1

0 means no correlation

58
Q

Odds ratio- values?

A

OR > 1: increased odds that the exposure is associated with the case
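
An odds ratio from a case-control 2x2 table can be sketched as the odds of exposure among cases divided by the odds of exposure among controls. The counts are hypothetical.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds of exposure in cases divided by odds of exposure in controls."""
    odds_in_cases = cases_exposed / cases_unexposed
    odds_in_controls = controls_exposed / controls_unexposed
    return odds_in_cases / odds_in_controls

# Hypothetical study: 40 of 100 cases and 20 of 100 controls were exposed.
oratio = odds_ratio(cases_exposed=40, cases_unexposed=60,
                    controls_exposed=20, controls_unexposed=80)
print(f"odds ratio = {oratio:.2f}")
```

An odds ratio of 1 means the exposure is equally common in cases and controls.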

59
Q

Risk factor definition

A

Characteristic or factor that increases a person’s risk of disease. Can be inherited, environmental, socioeconomic, behavioral.

60
Q

Chemical agents

A

Workplace exposure to chemicals, etc

61
Q

Physical agents

A

Radioactivity in your state, noise, vibration

62
Q

Biologic agents

A

Infectious agents (like bacteria, virus), allergens

63
Q

Psychosocial agents

A

Stress, trauma/ptsd, depression

64
Q

Mechanical agents

A

Repetitive motion jobs/hobbies (typing), heavy lifting

65
Q

Lifestyle risk factors

A

Drugs, alcohol, unsafe sex, sun exposure

66
Q

Framingham calculator

A

Risk assessment tool for 10-year risk of having a heart attack based on risk factors.

67
Q

Absolute risk

A

The probability of an event in a population under study. Same as incidence.

68
Q

Attributable Risk

A

Absolute risk, or incidence, of a disease in exposed persons, minus the absolute risk from non-exposed persons.

Risk attributed to an exposure

69
Q

Relative risk

A

Compares the probability of an event occurring in the exposed group vs the non-exposed group.

70
Q

Relative Risk Reduction RRR

A

By how much the treatment reduced the risk of disease outcomes, relative to the control group who did not receive treatment.

71
Q

Absolute Risk Reduction ARR

A

The most useful. Shows the difference in risk between treated and untreated groups. Its reciprocal is the NNT.

72
Q

NNT

A

Number needed to treat. The number of patients you need to treat for one additional patient to benefit from the intervention.

1/ARR, with ARR as a proportion (equivalently 100/(ARR%))
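
The risk-reduction measures fit together as in the sketch below, which works with event rates as proportions. The trial counts are hypothetical.

```python
def risk_measures(control_events, control_total, treated_events, treated_total):
    """Return (ARR, RRR, NNT) from event counts in control and treated groups."""
    control_risk = control_events / control_total
    treated_risk = treated_events / treated_total
    arr = control_risk - treated_risk    # absolute risk reduction
    rrr = arr / control_risk             # relative risk reduction
    nnt = 1.0 / arr                      # number needed to treat
    return arr, rrr, nnt

# Hypothetical trial: 20 of 100 controls and 15 of 100 treated patients have the event.
arr, rrr, nnt = risk_measures(20, 100, 15, 100)
print(f"ARR = {arr:.2f}, RRR = {rrr:.0%}, NNT = {nnt:.0f}")
```

The same 25% relative reduction would give a much larger NNT if the baseline risk were lower, which is why the absolute figure is often the more informative one.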

73
Q

High sensitivity of a diagnostic test

A

Probability of testing positive given the patient has the disease. High sensitivity means few false negatives, usually at the cost of more false positives.

74
Q

High specificity of a diagnostic test

A

Probability of testing negative given the patient does not have the disease. High specificity means few false positives, usually at the cost of more false negatives.

75
Q

Prevalence of a diagnostic test

A

The proportion of people possessing a clinical condition or outcome at a given point in time. The probability of disease before test result is known.

76
Q

Positive predictive value of a diagnostic test

A

Probability of having disease, given a positive test result.

77
Q

Negative predictive value of a diagnostic test

A

Probability of not having a disease given a negative test result.
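
The diagnostic-test measures on the last few cards can all be read off one 2x2 table of test result against true disease status. The sketch below uses hypothetical counts; the function name and dictionary keys are illustrative.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Compute the standard diagnostic-test measures from 2x2 counts."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # positive test, given disease
        "specificity": tn / (tn + fp),   # negative test, given no disease
        "prevalence": (tp + fn) / total, # disease before the test is known
        "ppv": tp / (tp + fp),           # disease, given a positive test
        "npv": tn / (tn + fn),           # no disease, given a negative test
    }

# Hypothetical screening of 1000 people, 100 of whom have the disease.
measures = diagnostic_measures(tp=90, fp=45, fn=10, tn=855)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Even with 90% sensitivity and 95% specificity, only about two thirds of positive results are true positives here, because the disease is present in just 10% of those tested.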