Statistics I & II Flashcards
Parameter
describes the whole population
Is a parameter fixed or does it vary? Why?
fixed because you know data about everyone in the population
Statistic
describes a sample of the population
Is a statistic fixed or does it vary? Why?
varies because you can take different samples from the same population
Is a parameter or a statistic more reliable?
parameter
What is the issue with using a parameter instead of a statistic?
if the population you are testing is large, the testing requires more time and resources
What are descriptive statistics used to describe?
central tendency, variability, & shape of data
Central tendency
where the middle of the data lies
What is the “king” measurement for central tendency - mean, median or mode? Why?
mean because you can do the most stats w/it
What is the mean affected by?
Outliers
If your data has outliers, which measurement is used instead of mean?
Median
Median
the middlemost number in an ordered set of scores
Mode
the most frequently occurring score(s) in a distribution
Mean (average)
the sum of the scores divided by the number of scores
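A minimal sketch of the three measures above, using Python's standard `statistics` module. The scores are hypothetical values invented for illustration; the second list adds one outlier to show how the mean and median react.

```python
from statistics import mean, median, multimode

# Hypothetical test scores, invented for illustration.
scores = [70, 75, 80, 80, 85]
# The same scores plus one extreme low score (an outlier).
scores_with_outlier = scores + [10]

for label, data in (("no outlier", scores), ("with outlier", scores_with_outlier)):
    # multimode returns a list, so tied modes are all reported.
    print(label, "| mean:", round(mean(data), 2),
          "| median:", median(data), "| mode(s):", multimode(data))
```

With these numbers the mean falls from 78 to about 66.7 once the outlier is added, while the median only moves from 80 to 77.5.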
Variability
the spread or dispersion of a set of research data or distribution
If you have test scores that are all over the place (10’s, 35’s, and 100’s), yet most scored in the 80’s, would you have a lower or higher variability?
higher variability
Range
the difference between the highest and lowest scores in a distribution
Percentiles
describe a score’s position within a distribution compared to others on a scale of 0-100%
If you are in the 75th percentile, you scored ___________ than ________ of others.
higher
75%
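One way to compute a percentile rank, sketched under the assumption that "percentile" means the percentage of scores strictly below yours (some textbooks count scores at or below instead). The function name `percentile_rank` and the class scores are hypothetical.

```python
def percentile_rank(score, all_scores):
    """Percentage of scores in all_scores that are strictly below score."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

# Hypothetical class of 20 students who scored 1, 2, ..., 20.
class_scores = list(range(1, 21))
rank = percentile_rank(16, class_scores)
print(rank)  # 15 of 20 scores are below 16, so this prints 75.0
```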
Quartiles
divide distributions into four equal parts
Q1 = 25th percentile
Q2 = 50th percentile
Q3 = 75th percentile
Q4 = 100th percentile
Interquartile range
The difference between the upper and lower quartiles.
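A short sketch of range, quartiles, and interquartile range with the standard library. The scores are hypothetical, and the `method="inclusive"` setting is an assumption: quartile conventions differ between textbooks and software, so other tools may give slightly different cut points.

```python
from statistics import quantiles

# Hypothetical scores, invented for illustration.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95]

score_range = max(scores) - min(scores)

# n=4 returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

print("range:", score_range, "| Q1:", q1, "| Q2:", q2, "| Q3:", q3, "| IQR:", iqr)
```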
Standard deviation
represents the spread/dispersion/variability of scores relative to the mean
Coefficient of variation compares what?
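compares the variability of 2 distributions by expressing each SD as a percentage of its mean (works even when the units differ)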
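A minimal sketch of standard deviation and coefficient of variation, assuming the common definition CV = SD / mean x 100. The data sets and the helper name `coefficient_of_variation` are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample SD expressed as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

# Hypothetical measurements in different units.
heights_cm = [160, 170, 180]
weights_kg = [60, 70, 80]

# Both samples have the same SD (10), but not the same relative spread.
print("SD:", stdev(heights_cm), stdev(weights_kg))
print("CV %:", round(coefficient_of_variation(heights_cm), 1),
      round(coefficient_of_variation(weights_kg), 1))
```

Here the two SDs are identical, yet the weights vary more relative to their mean, which is the comparison the coefficient of variation is meant to expose.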
What is the “king” measurement for variability?
standard deviation
Which type of measurement is used to describe nominal data?
mode
Which type of measurement is used to describe ordinal data?
median & interquartile range or mode
Which type of measurement is used to describe interval and ratio data?
mean & standard deviation or median & interquartile range
What is a bell curve used to describe?
shape
How much data lies between -1SD and 1SD of the mean?
68.3%
How much data lies between -2SD and 2SD of the mean?
95.5%
How much data lies between -3SD and 3SD of the mean?
99.7%
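The three percentages above can be checked against the standard normal curve. This is a small sketch using `statistics.NormalDist`; the helper name `share_within_sd` is hypothetical.

```python
from statistics import NormalDist

def share_within_sd(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    standard_normal = NormalDist()  # mean 0, SD 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within_sd(k):.4f}")
```

The results are roughly 0.68, 0.95, and 0.997, matching the cards to within rounding.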
In a bell curve with a positive skewed distribution, the mean gets pulled to the ___________ creating a tail to the __________. Why?
right, right
the mean is heavily affected by outliers, so it gets pulled further toward the right
In a bell curve with a negative skewed distribution, the mean gets pulled to the __________ creating a tail to the ___________. Why?
left, left
the mean is heavily affected by outliers, so it gets pulled further toward the left
When the data is skewed in a bell curve, the __________ is often used to describe the central tendency. Why?
median because it is not as impacted by outliers as the mean is
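A quick illustration of the skew cards, with hypothetical data: in the positively skewed list one large value pulls the mean to the right of the median, and in the negatively skewed list one small value pulls it to the left.

```python
from statistics import mean, median

# Hypothetical data with a tail to the right (one very large value).
positive_skew = [30, 32, 35, 38, 40, 200]
# Hypothetical data with a tail to the left (one very small value).
negative_skew = [5, 88, 90, 92, 95, 98]

for label, data in (("positive skew", positive_skew), ("negative skew", negative_skew)):
    print(label, "| mean:", mean(data), "| median:", median(data))
```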
Inferential statistics
allow us to estimate unknown population traits using a sample
When you have collected data from a sample, you can use inferential statistics to understand the larger population from which the sample is taken.
Probability
likelihood that an event will occur given all possible outcomes
it is what SHOULD happen, not what WILL happen
What is probability used for?
used to determine if observed effects are likely to have occurred by chance
Probability is described as _______ and is in ________ form.
“p”, decimal
Simply put, what is sampling error accounting for?
our sample will never fully reflect the entire population
Will a larger or smaller sample be more representative of the entire population?
larger
Null hypothesis
the observed difference (or effect) occurred by chance
Example of a null hypothesis
There will be NO STATISTICAL DIFFERENCES in strength between open and closed chain exercises.
Statistical tests will always test the ________ hypothesis.
null
Example of an alternative hypothesis
There will be SIGNIFICANT DIFFERENCES in strength between open and closed chain exercises.
Alternative hypothesis
the observed difference (or effect) is unlikely to have occurred by chance alone
When we run a statistical test, we either ____________ the null hypothesis or do not _________ the null hypothesis.
reject
Type I error
false-positive findings which means that the research found a difference when there isn’t really a difference
Type II error
false-negative findings which means that research did not find a difference when there is a difference present
In a small sample size, which type of error is more likely?
type II
In a large sample size, which type of error is more likely?
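type I, in the practical sense that very large samples can make trivially small differences statistically significant (the nominal type I rate itself is set by alpha)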
You decide to get covid tested and your results came back positive. However, it turns out that you don’t actually have covid. What type of error is this?
type I
You decide to get covid tested and your results came back negative. However, it turns out that you do have covid. What type of error is this?
type II
Level of significance (alpha)
determines how strict one is with rejecting the null hypothesis
maximal acceptable risk of making a type I error
Probability of a Type I error
probability of rejecting the null hypothesis when it is true
How would an individual reduce the chance of a type I error occurring?
reduce the level of significance (alpha)
Beta
probability of making a type II error
Statistical power (1-beta)
probability of finding statistically significant differences
What does the statistical power depend on?
alpha, sample size, effect size
How would an individual reduce the chance of a type II error occurring?
lower beta or increase statistical power
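A rough sketch of how power depends on alpha, sample size, and effect size. It assumes the simplest textbook case, a one-sided, one-sample z-test with the effect size expressed in SD units; real studies usually need a t-test or dedicated power software. The function name `z_test_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Approximate power (1 - beta) of a one-sided, one-sample z-test.

    effect_size is the true difference divided by the SD.
    """
    standard_normal = NormalDist()
    critical_z = standard_normal.inv_cdf(1 - alpha)  # cutoff for rejecting the null
    return 1 - standard_normal.cdf(critical_z - effect_size * sqrt(n))

baseline = z_test_power(effect_size=0.5, n=25)
print("baseline power:", round(baseline, 2), "| beta:", round(1 - baseline, 2))
print("larger sample:", round(z_test_power(0.5, 50), 2))
print("larger effect:", round(z_test_power(0.8, 25), 2))
print("stricter alpha:", round(z_test_power(0.5, 25, alpha=0.01), 2))
```

Raising the sample size or the effect size raises power, while a stricter alpha lowers it, which is the trade-off between type I and type II error.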
What does the p-value tell you?
probability of getting results at least as extreme as yours if the null hypothesis is true (i.e., if only chance were at work) when testing between and/or within groups
How likely your data could have occurred under the null hypothesis.
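If p = 0.10 this means that
if the null hypothesis were true, there is a 10% probability of getting results at least this extreme by chance.
If p < 0.05 this means that
if the null hypothesis were true, there is less than a 5% probability of getting results at least this extreme by chance.
If p < 0.01 this means that
if the null hypothesis were true, there is less than a 1% probability of getting results at least this extreme by chance.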
The lower the p-value, the more confident you are that your results are _______ due to chance.
NOT
If you have a p-value = 0.10, does this mean that there is a 90% chance that your results are accurate?
no -
results at least this extreme would occur 10% of the time if the null hypothesis were true
BUT you cannot reverse this & assume the opposite is true
What is the p-value dependent upon?
sample size
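A minimal sketch of a p-value calculation, assuming a two-sided, one-sample z-test with a known population SD (the simplest case, chosen only for illustration). All numbers and the function name `two_sided_p_value` are hypothetical. Running the same observed difference with a larger sample shows why the p-value depends on sample size.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(sample_mean, null_mean, sd, n):
    """p-value of a two-sided, one-sample z-test (population SD assumed known)."""
    z = (sample_mean - null_mean) / (sd / sqrt(n))
    # Probability of a z at least this far from 0, in either direction, under the null.
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same 3-point difference from the null value, two different sample sizes.
p_small_sample = two_sided_p_value(sample_mean=103, null_mean=100, sd=15, n=50)
p_large_sample = two_sided_p_value(sample_mean=103, null_mean=100, sd=15, n=200)

print("n = 50: ", round(p_small_sample, 3))
print("n = 200:", round(p_large_sample, 3))
```

With n = 50 the difference is not significant at alpha = 0.05; with n = 200 the identical difference is.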
What are alternative forms of measurement to a p-value?
confidence intervals & effect size
Confidence Intervals (CI)
range of values within which you expect the true population value to fall
If you have a 95% CI = (a, b), what does this mean?
you are 95% confident that the true difference between groups is between a and b
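A sketch of a confidence interval for a single mean, assuming a large-sample normal approximation (a t-based interval is more appropriate for small samples, and an interval for a difference between groups uses a different standard error). The data and the function name `mean_confidence_interval` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * stdev(values) / sqrt(len(values))
    centre = mean(values)
    return centre - margin, centre + margin

# Hypothetical sample of 100 measurements: 1, 2, ..., 100.
sample = list(range(1, 101))
low_95, high_95 = mean_confidence_interval(sample)
low_99, high_99 = mean_confidence_interval(sample, confidence=0.99)

print("95% CI:", round(low_95, 1), "to", round(high_95, 1))
print("99% CI:", round(low_99, 1), "to", round(high_99, 1))
```

Asking for more confidence widens the interval.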
“Is there a difference?”
p-value
“How big is the difference?”
effect size
Larger effect size =
larger difference
What is the first step when choosing a statistical test?
determine the purpose of analysis
what are you trying to do…?
- predict
- relationship
- significance of difference
What is the second step when choosing a statistical test?
determine scale of dependent variable
is it…?
nominal, ordinal, interval or ratio
What are the 3 assumptions of parametric data?
- normality: data follows normal distribution
- equal variances: same spread
- typically ratio or interval data & represented as mean
What are the 3 assumptions of nonparametric data?
- no need for normality or equal variance
- ordinal or nominal data
- typically presented as median
When you see the word predict, which test should we automatically think of?
regression
What is R^2?
measure of how strong the prediction is.
total variance of y (dependent) that can be explained by x (independent)
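A small sketch of R^2 for a simple regression with one predictor, computed as the squared correlation between x and y. The data and the function name `r_squared` are hypothetical.

```python
from statistics import mean

def r_squared(x, y):
    """Share of the variance in y explained by a straight-line fit on x."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)

x = [1, 2, 3, 4]
perfect_y = [2, 4, 6, 8]   # falls exactly on a line
noisy_y = [2, 5, 5, 9]     # roughly linear, with scatter

print("perfect fit:", r_squared(x, perfect_y))
print("noisy fit:  ", round(r_squared(x, noisy_y), 2))
```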
What are the 3 predicting factors?
- likelihood ratios (LR)
- relative risk (RR)
- odds ratio (OR)
Sensitivity test
how often a test gives a positive result in people with a condition
- lots of true positives
- few false negatives
Specificity test
how often a test gives a negative result in people without a condition
- lots of true negatives
- few false positives
If a test has 92% sensitivity and you obtain a negative result, you can confidently rule ______ the condition.
out
If a test has a 95% specificity and you obtain a positive result, you can confidently rule ______ the condition.
in
What two clinical questions do likelihood ratios help answer?
- How sure am I that a person w/a positive test result actually has what I tested him/her for?
- How sure am I that a person w/a negative test result actually does not have what I tested him/her for?
+ LR =
sensitivity / (1-specificity)
- LR =
(1-sensitivity)/specificity
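The sensitivity, specificity, and likelihood ratio formulas above can be sketched from a 2x2 table of test results. The counts and the function name `diagnostic_metrics` are hypothetical; the sketch does not guard against a specificity of exactly 0 or 1, which would cause a division by zero.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and likelihood ratios from 2x2 counts.

    tp/fn: people WITH the condition who tested positive/negative.
    tn/fp: people WITHOUT the condition who tested negative/positive.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "LR+": sensitivity / (1 - specificity),
        "LR-": (1 - sensitivity) / specificity,
    }

# Hypothetical study: 100 people with the condition, 100 without.
metrics = diagnostic_metrics(tp=90, fn=10, tn=80, fp=20)
for name, value in metrics.items():
    print(name, round(value, 3))
```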
Relative Risk
compares one risk to another
Relative risk = 1
the risk in the exposed group is the same as the risk in the unexposed group, so
no indication of benefit or harm
Relative risk < 1
the exposure is associated w/a protective effect
Relative risk > 1
exposed group has a greater risk of contracting the disease, so
exposure is associated w/harm
If you have a RR of 0.5, what does this mean?
you are at a 50% lower risk than the comparison (unexposed) group
If you have a RR of 1.5, what does this mean?
you have a 50% higher risk than the comparison (unexposed) group
In order to put RR into context, what do you need to know?
the baseline risk for the disease
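A minimal sketch of relative risk, and of why the baseline risk matters for interpreting it. All counts and function names are hypothetical. Both scenarios below have the same RR, but very different absolute differences in risk.

```python
def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    return risk_exposed / risk_unexposed

def risk_difference(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Absolute difference in risk (exposed minus unexposed)."""
    return exposed_events / exposed_total - unexposed_events / unexposed_total

# Common condition: baseline (unexposed) risk of 20%.
common = (100, 1000, 200, 1000)
# Rare condition: baseline (unexposed) risk of 0.2%.
rare = (1, 1000, 2, 1000)

for label, counts in (("common", common), ("rare", rare)):
    print(label, "| RR:", relative_risk(*counts),
          "| risk difference:", round(risk_difference(*counts), 4))
```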
Odds Ratio
compares whether the odds of a certain event happening are the same for two groups
OR = 1
the event is equally likely in both groups
OR > 1
event is more likely in the first group
OR < 1
event is less likely in the first group
What is the measure of choice in a case-control study?
odds ratio
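A short sketch of the odds ratio from a 2x2 table. The counts and the function name `odds_ratio` are hypothetical; the sketch assumes no cell is zero.

```python
def odds_ratio(events_1, non_events_1, events_2, non_events_2):
    """Odds of the event in group 1 divided by the odds in group 2."""
    odds_group_1 = events_1 / non_events_1
    odds_group_2 = events_2 / non_events_2
    return odds_group_1 / odds_group_2

# Hypothetical data: 30 of 100 in group 1 had the event, 10 of 100 in group 2.
or_first_vs_second = odds_ratio(30, 70, 10, 90)
# Swapping the groups inverts the ratio.
or_second_vs_first = odds_ratio(10, 90, 30, 70)

print("group 1 vs group 2:", round(or_first_vs_second, 2))  # above 1
print("group 2 vs group 1:", round(or_second_vs_first, 2))  # below 1
```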