Lecture 5 - Biostatistics Part 1-2 Flashcards
What are the Measures of Central Tendency?
Mean - the “average” – sum of the set divided by the number in the set
Median – the middle point (arrange the data smallest to largest, then find the middle point)
Mode – the score that occurs most frequently in a set of data
—-May have two most common values = “bimodal distribution”
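A minimal sketch of the three measures, using Python's standard `statistics` module. The scores are made up for illustration and are not from the lecture; they were chosen so that two values tie for most frequent (a bimodal set).

```python
import statistics

# Made-up scores (an assumption for illustration), already sorted
scores = [2, 3, 3, 5, 7, 7, 8]

mean = statistics.mean(scores)        # sum of the set divided by the number in the set
median = statistics.median(scores)    # middle point of the sorted data
modes = statistics.multimode(scores)  # every value tied for most frequent

print("mean:", mean)
print("median:", median)
print("modes:", modes)  # two modes here, so the set is bimodal
```

`multimode` is used instead of `mode` because it returns all of the tied values rather than only the first one.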
the “average” – sum of the set divided by the number in the set
Mean -
– the middle point (arrange the data smallest to largest, then find the middle point)
Median
– the score that occurs most frequently in a set of data
Mode
—-May have two most common values = “bimodal distribution”
Mode
—-May have two most common values =
“bimodal distribution”
- The point/score at which 50% of scores fall below it and 50% fall above it
Median
This is the most general and least precise measure of central tendency
When two values occur the same number of times – Bimodal distribution
Mode - most frequently occurring value
look at mean median mode images on slide 10
Standard deviation
a measure of variation of scores about the mean
Variance compared to…
standard deviation
quantifies the amount of variability, or spread, around the mean of the measurements.
To calculate: take each difference from the mean, square it, and then average the result
Variance (σ²)
To calculate the variance:
take each difference from the mean, square it, and then average the result
A measure of variation of scores about the mean
Standard deviation (σ)
To calculate the standard deviation:
take the square root (√) of the variance
(the “average distance” to the mean)
In practice, the standard deviation is used more frequently than the variance.
Primarily because the standard deviation has the same units as the original measurements (and therefore as the mean), whereas the variance is in squared units.
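The calculation steps on these cards can be sketched as follows. The measurements are made up for illustration. This version divides by N (the population formula, matching "average the result"); a sample estimate would divide by n - 1 instead, which is an addition not stated on the cards.

```python
import math

# Made-up measurements (an assumption for illustration); their mean is 5
measurements = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(measurements) / len(measurements)

# Take each difference from the mean and square it
squared_diffs = [(x - mean) ** 2 for x in measurements]

# Average the squared differences: population variance
variance = sum(squared_diffs) / len(measurements)

# Standard deviation is the square root of the variance
std_dev = math.sqrt(variance)

print("variance:", variance)  # in squared units
print("std dev:", std_dev)    # in the same units as the measurements
```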
When comparing two groups, the group with the larger standard deviation exhibits a greater amount of _____, while the group with the smaller standard deviation has less _______.
variability (heterogeneous)
variability (homogeneous)
Empirical rule for data (68-95-99.7) - only applies to a set of data having a distribution that is approximately bell-shaped:
Approximately 68% of all scores fall within 1 standard deviation of the mean
Approximately 95% of all scores fall within 2 standard deviations of the mean
Approximately 99.7% of all scores fall within 3 standard deviations of the mean
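The three percentages can be checked against an exact normal (bell-shaped) curve. This is a sketch using `statistics.NormalDist` from the standard library; the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k) * 100:.1f}%")
```

Real data that are only approximately bell-shaped will give proportions close to, but not exactly, these values.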
Scatterplots:
A useful summary of a set of ________
bivariate data (two continuous variables)
Scatterplots:
Gives a good visual picture of the relationship between the two variables, and aids the interpretation of the _____
correlation coefficient or regression model.
the statistic that summarizes the relationship between the variable on the x axis and the variable on the y axis
correlation coefficient
Perfect Positive correlation
X increases and Y increases at a constant rate (every point falls exactly on an upward-sloping straight line; r = +1)
Perfect Negative correlation
X increases and Y decreases at a constant rate (every point falls exactly on a downward-sloping straight line; r = -1)
X increases and Y increases =
positive correlation
X increases and Y decreases =
negative correlation
0 value for correlation coefficient means there is
no correlation
Represented by “r” for a sample (the Greek letter ρ, rho, is used for the population value)
The absolute value of the coefficient (its size, not its sign) tells you how strong the relationship is between the variables.
Correlation Coefficient
Tells us how strongly two variables are related
Correlation Coefficient
Correlation Coefficient
“r” cannot be > 1 or < -1
Closer to -1 or +1: ?
Closer to 0 : ?
Closer to -1 or +1: the stronger the relationship
Closer to 0 : the weaker the relationship
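To make the range of r concrete, here is a sketch that computes the Pearson coefficient by hand for three made-up data sets: a perfect positive relationship, a perfect negative relationship, and one with no linear relationship. The data and the function name `pearson_r` are assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)

x = [1, 2, 3, 4, 5]

r_positive = pearson_r(x, [2, 4, 6, 8, 10])  # Y rises in step with X
r_negative = pearson_r(x, [10, 8, 6, 4, 2])  # Y falls in step with X
r_none = pearson_r(x, [2, 1, 3, 1, 2])       # no upward or downward trend

print(r_positive, r_negative, r_none)
```

The sign gives the direction of the relationship and the absolute value gives its strength.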
The most common measure of association. Results can be misleading if the relationship is non-linear.
Pearson Correlation
______ correlation is very sensitive to outlying values.
Pearson’s
______: a non-parametric version of Pearson’s correlation. The calculation is based on the ranks of the data points of the x and y values.
Spearman Correlation
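A sketch of the rank-based idea: Spearman's coefficient is computed here by replacing each value with its rank and then applying the Pearson formula to the ranks. The data are made up, and the simple `ranks` helper assumes there are no tied values (tied values would need averaged ranks, which this sketch does not handle).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed directly from the raw values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)

def ranks(values):
    """Rank 1 = smallest value. Assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # always increasing, but curved (non-linear)

print("Pearson:", pearson_r(x, y))      # below 1 because the relationship is not a straight line
print("Spearman:", spearman_rho(x, y))  # ranks agree perfectly
```

Because only the ordering matters, Spearman's coefficient is also less affected by a single extreme value than Pearson's.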
The statement that establishes a relationship between variables being assessed
Example: In a clinical trial the hypothesis states that the new drug is better than placebo
Alternative hypothesis (Ha or H1)
The statement of no difference or no relationship between the variables
Example: In a clinical drug trial the null hypothesis states that the new drug is no better than placebo
Null hypothesis (H0)
AKA the “research hypothesis”
Alternative hypothesis
Hypothesis stating the expected relationship between independent and dependent variables.
Alternative hypothesis
If there IS a statistically significant difference, then the researchers “ACCEPT” the alternative hypothesis and “REJECT” the null hypothesis.
Alternative hypothesis
If there IS a statistically significant difference, then the researchers must “REJECT” the Null hypothesis
Null hypothesis
If there IS NOT a statistically significant difference, then the researchers must “RETAIN” or “FAIL to REJECT” the Null hypothesis.
Null hypothesis
Two kinds of errors can be made when we conduct a test of hypothesis.
The first is called a Type I error, also known as a rejection error or an α error.
The second kind of error that can be made during a hypothesis test is a Type II error, also known as an acceptance error or a β error.
A _____ error is made if we reject the null hypothesis when the null hypothesis is true.
type I
The probability of making a type I error is determined by the ______ of the test.
significance level
The second kind of error that can be made during a hypothesis test is a Type II error, also known as an _____ error or an _____ error.
acceptance error or β error
A Type II error is made if we….
…fail to reject the null hypothesis when the null hypothesis is false.
The probability of committing a type II error is represented by…
the Greek letter β.
The probability of finding an effect
The probability of correctly rejecting the null hypothesis
The probability of seeing a true effect if one exists
Designers of studies typically aim for a power of 80% or 0.8
Implies there is an 80% chance of detecting a true effect if one exists
Generally speaking: More people = more power
Statistical power
The probability of correctly rejecting the null hypothesis
Statistical power
The probability of seeing a true effect if one exists
Statistical power
Designers of studies typically aim for a power of 80% or 0.8
Implies there is an 80% chance of detecting a true effect if one exists
Statistical power
______ calculates the number of participants a study must have to draw accurate conclusions
Takes into consideration: estimated effect size, sample means, etc.
power analysis
what does a power analysis take into consideration?
Takes into consideration: estimated effect size, sample means, etc.
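As a rough illustration of what a power analysis computes, the sketch below uses a common textbook normal-approximation formula for a two-sided, one-sample z-test. This formula, the function name `required_n`, and the example numbers are assumptions for illustration; the lecture does not specify a method, and real studies typically use dedicated software.

```python
import math
from statistics import NormalDist

def required_n(effect, sd, alpha=0.05, power=0.80):
    """Approximate participants needed to detect `effect` given spread `sd`.

    Normal-approximation formula (an assumption, not the lecture's method):
    n = ((z for alpha/2 + z for power) * sd / effect) squared, rounded up.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # cutoff for the significance level
    z_power = z.inv_cdf(power)          # cutoff for the desired power
    n = ((z_alpha + z_power) * sd / effect) ** 2
    return math.ceil(n)

# Made-up inputs: a smaller expected effect needs more participants
print(required_n(effect=0.5, sd=1.0))
print(required_n(effect=0.25, sd=1.0))
```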
is the probability of avoiding a type II error.
Power
Power may also be thought of as the likelihood that a particular study will detect a _____________ given that one exists.
deviation from the null hypothesis
If the β is the probability of making a type II error, 1- β is called the….
power of the test of hypothesis.
1- β is called….
the power of the test of hypothesis.
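The relationship power = 1 - β, and the idea that more people means more power, can be sketched for a simple case. This assumes a two-sided, one-sample z-test with known standard deviation and ignores the tiny chance of rejecting in the wrong direction; the function name and numbers are made up for illustration.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal curve

def approx_power(effect, sd, n, alpha=0.05):
    """Approximate power of a two-sided z-test (sketch under stated assumptions)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)      # rejection cutoff set by alpha
    shift = effect * math.sqrt(n) / sd     # how far the true effect moves the test statistic
    beta = Z.cdf(z_crit - shift)           # probability of a Type II error
    return 1 - beta                        # power

# Same effect and spread, increasing numbers of participants
for n in (10, 32, 100):
    print(n, round(approx_power(effect=0.5, sd=1.0, n=n), 2))
```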
Statistical significance and p-value
How do we determine if the study result happened by chance alone?
What determines statistical “significance”?
Significance
The probability of rejecting a true H0
α is usually set at .05 (the acceptable error)
—–Chance that 5 times out of 100 the H0 would be falsely rejected
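The "5 times out of 100" idea can be illustrated with a small simulation in which the null hypothesis is true by construction. This sketch assumes a two-sided z-test with known standard deviation; the sample size, number of trials, and seed are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist, mean

ALPHA = 0.05
Z = NormalDist()
rng = random.Random(0)  # fixed seed so the run is repeatable

def p_value_two_sided(sample, mu0=0.0, sigma=1.0):
    """Two-sided z-test p value for H0: true mean equals mu0 (sigma assumed known)."""
    z = (mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - Z.cdf(abs(z)))

trials = 2000
false_rejections = 0
for _ in range(trials):
    # Data drawn with true mean 0, so H0 is true in every trial
    sample = [rng.gauss(0.0, 1.0) for _ in range(20)]
    if p_value_two_sided(sample) <= ALPHA:
        false_rejections += 1  # a Type I error

type_i_rate = false_rejections / trials
print("share of true H0 falsely rejected:", type_i_rate)
```

The printed share should land near α, since every rejection in this setup is a false one.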
Statistical significance and p-value
Probability level
The likelihood that the difference observed between two interventions could have arisen by chance
Accepted value is 5% risk (p = .05)
Means that, if there were truly no difference, results at least this extreme would be expected about 5% of the time by chance
Allows us to reject or fail to reject the null hypothesis
Statistical significance and p-value
p is…
α is the acceptable error, usually = .05
p ≤ α
_____ the H0 (null hypothesis)
the results…
p > α
_____ the H0 (null hypothesis)
the results….
the chance of random error
α is the acceptable error, usually = .05
p ≤ α
reject the H0 (null hypothesis)
the results are statistically significant
p > α
fail to reject the H0 (null hypothesis)
the results are not statistically significant
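The decision rule on this card can be written as a short function. The function name `decide` is made up for this sketch; the two example p values sit just either side of α = .05.

```python
def decide(p_value, alpha=0.05):
    """Apply the rule: p <= alpha rejects H0, p > alpha fails to reject H0."""
    if p_value <= alpha:
        return "reject H0: the results are statistically significant"
    return "fail to reject H0: the results are not statistically significant"

print(decide(0.049))
print(decide(0.051))
```

The cutoff is a convention: p values of .049 and .051 represent almost the same strength of evidence even though they fall on opposite sides of the rule.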
Less than 5% chance the effect happened by chance.
“There is less than a 5% chance that the null hypothesis is falsely rejected”
Often reported as “significant at the 0.05 level.”
A statistically significant p ≤ .05
If the number of subjects is small, p value tells us that the effect was either large or consistent (or both)
If the number of subjects is large, the effect size may not be that large
A highly significant p < .001
If the number of subjects is small, there might not have been enough subjects to find a difference that truly does exist
If the number of subjects is large, we can be confident that either there is no difference between treatments, or the treatment effect is not consistent
A non-significant p > .05
Statistical significance and p-value
Depends on several factors…
As all of these factors increase, the likelihood of finding statistical significance increases
How large the effect was
How consistent the effect was
How many patients were studied
—As all of these factors increase, the likelihood of finding statistical significance increases
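A sketch of how the three factors move the p value, assuming a two-sided z-test on a mean difference. Here "consistency" is represented by a smaller standard deviation, which is an interpretation for illustration; the function name and all numbers are made up.

```python
import math
from statistics import NormalDist

def two_sided_p(effect, sd, n):
    """Two-sided z-test p value for an observed mean effect (sketch)."""
    z = effect / (sd / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

baseline = two_sided_p(effect=2.0, sd=10.0, n=25)

larger_effect = two_sided_p(effect=5.0, sd=10.0, n=25)   # how large the effect was
more_consistent = two_sided_p(effect=2.0, sd=4.0, n=25)  # how consistent the effect was
more_patients = two_sided_p(effect=2.0, sd=10.0, n=100)  # how many patients were studied

print("baseline:", round(baseline, 3))
print("larger effect:", round(larger_effect, 3))
print("more consistent:", round(more_consistent, 3))
print("more patients:", round(more_patients, 3))
```

Changing any one of the three factors in the favorable direction pushes the baseline result below .05 in this example.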
What’s the difference between….
P value of .051 and .049
Look at the slides if you don’t know for sure…. slide 28 in particular