research stats midterm Flashcards
what is biostatistics?
the statistics of medicine, health sciences and public health
define target population
larger population to which results will need to be generalized
define accessible population
actual population of subjects available
define sample
subgroup of accessible population which allows results to be generalized
define parameter
statistical characteristic of population
define statistic
statistical characteristic of sample
define descriptive statistic
describes sample shape, central tendency, variability
define inferential statistic
used to make inferences about a population
define central tendency
the central value
best representative value of target population
single value
define variability
spread of the data
define frequency distribution
the pattern of frequencies of a variable
3 measures of central tendency
mean - average
median - two equal halves
mode - most frequent score
describe skewed to the right
tail faces right
positive skew
mean > median/mode
describe skewed to the left
tail faces left
negative skew
mean < median/mode
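A quick check of the three measures and the skew rule, using Python's standard `statistics` module. The sample is made up for illustration, with one extreme high score stretching the tail to the right:

```python
import statistics

# Hypothetical right-skewed sample: most scores are low, one extreme high score
scores = [2, 2, 3, 4, 5, 6, 20]

mean = statistics.mean(scores)      # arithmetic average (pulled toward the tail)
median = statistics.median(scores)  # middle value of the sorted scores
mode = statistics.mode(scores)      # most frequent score

print(mean, median, mode)  # right skew here: mean > median > mode
```

Replacing 20 with a more typical value brings the mean back toward the median, which is why the median is preferred for skewed data.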
when is mean best to use?
numeric, symmetric data
not good for skewed
when is median best to use?
skewed data
not affected by extremes
when is mode best to use?
nominal or ordinal
common in surveys
advantages to mean
easy to calculate and interpret
don't need to arrange values
all values represented
all algebraic formulas possible
disadvantages to mean
can't be used with categorical data
can't calculate if data missing
affected by extremes
advantages to median
easy to calculate
not affected by extremes
can be used with ranked data
disadvantages to median
tedious in large data set
problematic with even number of observations (must average the two middle values)
doesn't account for all values
advantages of mode
easy to understand and find
not affected by extremes
easy to ID in data set and in frequency distribution
mode is useful for categorical data
disadvantages of mode
not defined if no repeats
not based on all values
unstable when data has small number of values
sometimes could have 2+ or no modes
when would you choose median over mean?
distribution is skewed
researcher is using ordinal data
define range, percentiles, quartiles
R - max-min
P - divides into 100 parts
Q - four parts
define interquartile range
difference between 25th and 75th percentile
used with median
describe box plot
min
1st quartile
median
3rd quartile
max
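The five values a box plot draws, plus the range and IQR, can be computed with Python's `statistics.quantiles`. This is a sketch on hypothetical data; note that statistics packages use slightly different quartile conventions, and this uses Python's default "exclusive" method, so other software may give slightly different Q1/Q3:

```python
import statistics

# Hypothetical sample of 10 scores (sorted for readability)
data = [1, 3, 4, 5, 6, 7, 8, 9, 12, 15]

# quantiles(n=4) returns the three cut points: Q1, Q2 (median), Q3
q1, q2, q3 = statistics.quantiles(data, n=4)

five_number = (min(data), q1, q2, q3, max(data))  # min, Q1, median, Q3, max
data_range = max(data) - min(data)                 # R = max - min
iqr = q3 - q1                                      # 75th minus 25th percentile

print(five_number, data_range, iqr)
```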
define standard deviation
reported in same units as raw scores
mean +/- SD
define variance
square of SD
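A small sketch of SD and variance with Python's `statistics` module (hypothetical scores). `stdev` and `variance` are the sample versions, dividing by n - 1:

```python
import statistics

scores = [4, 8, 6, 5, 3, 7]  # hypothetical sample

mean = statistics.mean(scores)
sd = statistics.stdev(scores)           # sample SD (divides by n - 1)
variance = statistics.variance(scores)  # sample variance = SD squared

print(f"{mean} +/- {sd:.2f}")  # reported as mean +/- SD, in raw-score units
```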
coefficient of variation
used for interval and ratio data only
expressed as percentage
unitless so good for comparing scales
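The usual formula is CV = (SD / mean) x 100%. The sketch below uses hypothetical weight and height data with the same SD, showing that CV still differs because it is relative to the mean, and that changing units (kg to g) leaves it unchanged. It assumes the mean is not zero:

```python
import statistics

def coefficient_of_variation(values):
    """CV as a percentage: (sample SD / mean) * 100. Assumes a nonzero mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical measurements on two different scales
weights_kg = [60, 65, 70, 75, 80]
heights_cm = [160, 165, 170, 175, 180]

print(round(coefficient_of_variation(weights_kg), 2))                       # same SD as heights...
print(round(coefficient_of_variation(heights_cm), 2))                       # ...but a smaller CV
print(round(coefficient_of_variation([w * 1000 for w in weights_kg]), 2))   # grams: CV unchanged
```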
constant and predictable characteristics of the normal curve
68% within +/- 1 SD
95% within +/- 2 SD
99.7% within +/- 3 SD
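These percentages can be checked against the standard normal curve with `statistics.NormalDist`. The exact values are about 68.3%, 95.4% and 99.7%:

```python
from statistics import NormalDist

def proportion_within(k):
    """Share of a normal distribution lying within +/- k SD of its mean."""
    std_normal = NormalDist(mu=0, sigma=1)
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"+/- {k} SD: {proportion_within(k):.1%}")
```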
define a z-score
standardized score based on normal distribution
z = SD units
z = (score - mean) / SD
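A minimal z-score sketch. The test mean of 70 and SD of 10 are hypothetical; `NormalDist().cdf` then gives the proportion of a normal population expected to score below that z:

```python
from statistics import NormalDist

def z_score(score, mean, sd):
    """Number of SDs a raw score lies above (+) or below (-) the mean."""
    return (score - mean) / sd

# Hypothetical test: mean 70, SD 10
z = z_score(85, 70, 10)
pct_below = NormalDist().cdf(z)  # proportion of a normal population below this z
print(z, round(pct_below, 3))
```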
define sampling error
sample mean will not equal the population mean. the difference is called sampling error
how well does the sample represent the population?
z scores for CI calculations
90% = z 1.65
95% = z 1.96
99% = z 2.58
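The critical z for a confidence level puts (1 - level) / 2 in each tail of the standard normal. This sketch recovers the values above (1.65 and 2.58 are rounded from about 1.645 and 2.576):

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided critical z: leaves (1 - level) / 2 in each tail of the standard normal."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_for_confidence(level):.3f}")
```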
central limit theorem
distribution of sample means approaches a normal distribution centered on the population mean as N increases
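A small simulation sketch, assuming a uniform(0, 1) population (mean 0.5, SD = 1/sqrt(12)) chosen only because it is clearly not normal. It shows the sample means centering on the population mean, with their spread shrinking like SD / sqrt(N). It does not test the normal shape itself:

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is reproducible

POP_MEAN = 0.5            # mean of a uniform(0, 1) population
POP_SD = (1 / 12) ** 0.5  # SD of a uniform(0, 1) population

def sample_means(n, reps=5000):
    """Draw `reps` samples of size n from uniform(0, 1) and return each sample's mean."""
    return [statistics.fmean([random.random() for _ in range(n)]) for _ in range(reps)]

for n in (2, 10, 30):
    means = sample_means(n)
    print(n,
          round(statistics.fmean(means), 3),  # stays close to POP_MEAN
          round(statistics.stdev(means), 4),  # observed spread of the sample means
          round(POP_SD / n ** 0.5, 4))        # predicted spread: SD / sqrt(n)
```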
define point estimate
single value that is the best estimate of a population parameter
define confidence interval
range of values that we are confident contains parameter
how would you increase precision (narrow) in CI?
larger sample size
less variance (lower SD)
lower the selected confidence level (e.g. to 90%)
CI equation
CI = mean +/- (z)(SEM), where SEM = SD / sqrt(n)
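A sketch of the z-based CI on hypothetical data (n = 30). It also illustrates the precision card: a lower confidence level or a larger sample gives a narrower interval. With small samples a t critical value is normally used in place of z; this sketch assumes a large sample:

```python
import math
import statistics
from statistics import NormalDist

def confidence_interval(values, level=0.95):
    """Large-sample CI for a mean: mean +/- z * SEM, with SEM = SD / sqrt(n)."""
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return mean - z * sem, mean + z * sem

scores = [4, 8, 6, 5, 3, 7] * 5  # hypothetical data (n = 30)
for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(scores, level)
    print(f"{level:.0%} CI: {low:.2f} to {high:.2f}")
```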
define null hypothesis
no difference or relationship
we will either reject or fail to reject it
define alternative hypothesis
is a difference or relationship
error: liar or blind
type 1: liar (false positive), probability = alpha
type 2: blind (false negative), probability = beta
if p value is less than or equal to alpha,
reject the null
if p value is greater than alpha,
fail to reject the null
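The decision rule can be sketched for a z statistic: compute a two-tailed p value from the standard normal, then compare it to alpha. The z values below are hypothetical:

```python
from statistics import NormalDist

def two_tailed_p(z):
    """P value: chance of a result at least this extreme (either tail) if the null is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p, alpha=0.05):
    """p <= alpha rejects the null; otherwise fail to reject."""
    return "reject the null" if p <= alpha else "fail to reject the null"

for z in (2.5, 1.2):  # hypothetical test statistics
    p = two_tailed_p(z)
    print(z, round(p, 4), decide(p))
```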
what happens if we fail to reject the null?
attribute any observed difference to sampling error only
what p value and CI are analogous to each other?
95% CI
.05 p value
significance of type 1 error
mistakenly finding difference
p value: probability of a result at least this extreme if the null is true
significance of type 2 error
mistakenly finding no difference
statistical power = 1 - β
power is probability of correctly rejecting a false null
critical values for two tailed test
critical region split 2.5% in each tail (alpha = .05), on either side of the noncritical region
nondirectional hypothesis
critical values of one tailed test
all 5% of critical region in the one tail the hypothesis predicts
directional hypothesis
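At alpha = .05 the z cutoffs differ: about 1.96 when 2.5% sits in each tail versus about 1.645 when all 5% sits in one tail. The lower one-tailed cutoff is why a one-tailed test is more powerful in the predicted direction:

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)  # 2.5% in each tail
z_one_tailed = std_normal.inv_cdf(1 - alpha)      # all 5% in the predicted tail

print(round(z_two_tailed, 3), round(z_one_tailed, 3))
```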
which (one or two tailed) is more powerful
one tailed
define statistical power
probability of finding a statistically significant difference if such difference exists in the real world
what are the four powers of power?
alpha
effect size
variance
sample size
best way to increase power
increase sample size
determinants of statistical power
p = power
a = alpha level
n = sample size
e = effect size
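To see how the determinants interact, here is a sketch of the normal-approximation power of a two-tailed one-sample z test. It assumes a known SD and expresses the effect size in SD units (like Cohen's d), so lower variance shows up as a larger effect size. Real studies typically use t-based power software; this is only an illustration:

```python
import math
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-tailed one-sample z test.
    effect_size: true mean difference in SD units (assumes known SD)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n)  # expected z statistic under the alternative
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

for n in (10, 20, 40):
    print(n, round(z_test_power(0.5, n), 2))  # power rises with sample size
```

With an effect size of 0 the "power" equals alpha, which is just the Type I error rate.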
what is A priori
before data collection
what is Post hoc
after data collection
only an issue if you fail to reject the null
CI analysis
if upper boundary excludes important benefit of treatment, trial is definitively negative
if CI includes important benefit, treatment might still be worthwhile
define parametric statistics
assumes that sample data comes from population that follows a probability distribution based on a fixed set of parameters
what are the 4 assumptions of parametric tests?
scale data - ratio or interval
random sampling
equal variance - roughly equivalent before starting
normality - normal distribution
what does a t-test do?
determines whether the difference between sample means represents a real difference in the population or is just sampling error
what are examples of two levels of one independent variable?
two different groups
one single group with two interventions
one single group with pre and posttest measurements
conceptual basis of comparing means
sample means will be different
variance comes from two sources
the IV and everything else (error)
conceptual basis with independent groups
t= difference between means / variability within groups
conceptual basis with repeated measures
t = mean of differences between pairs / standard error of the difference scores
what if t > 1?
you have a greater difference between groups
what if t< 1?
you have more variability within groups
what is the simplest t test equation?
t = (treatment effect + error) / error
what are degrees of freedom?
the number of independent pieces of information that went into calculating the estimate
number of values that are free to vary
independent (unpaired t-test)
numerator is difference between group means
denominator represents the variance within groups
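A sketch of the unpaired t statistic with a pooled within-group variance in the denominator (the Student version, which assumes equal variances; Welch's version adjusts for unequal ones). The group scores are hypothetical:

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Unpaired t statistic using pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(group_a), len(group_b)
    v1, v2 = statistics.variance(group_a), statistics.variance(group_b)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled within-group variance
    se_diff = math.sqrt(pooled * (1 / n1 + 1 / n2))           # standard error of the difference
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se_diff
    return t, n1 + n2 - 2

group_a = [5, 6, 7, 8, 9]  # hypothetical scores
group_b = [3, 4, 5, 6, 7]
t, df = independent_t(group_a, group_b)
print(t, df)
```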
assumptions for unpaired t-tests
data from interval or ratio
samples are randomly drawn from populations
homogeneity of variance - equal variances
population is normally distributed
are unequal variances an issue?
not a major issue when sample sizes are equal
effect size for t-test
use cohen’s d
small d = 0.20
medium d = 0.50
large d = 0.80
extra large d = 1.0 or 1.1
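A sketch of Cohen's d for two independent groups, dividing the mean difference by the pooled SD (one common version of d; others exist). The scores are hypothetical:

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d: difference between group means divided by the pooled SD."""
    n1, n2 = len(group_a), len(group_b)
    pooled_var = ((n1 - 1) * statistics.variance(group_a)
                  + (n2 - 1) * statistics.variance(group_b)) / (n1 + n2 - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)

d = cohens_d([5, 6, 7, 8, 9], [3, 4, 5, 6, 7])  # hypothetical scores
print(round(d, 2))  # compare against the 0.20 / 0.50 / 0.80 benchmarks
```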
paired t-test
numerator is mean of paired difference scores
denominator is standard error of difference scores
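A sketch of the paired t statistic: take each subject's difference score, then divide the mean difference by its standard error. The pre/post values are hypothetical:

```python
import math
import statistics

def paired_t(pre, post):
    """Paired t: mean of difference scores / standard error of the differences."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

pre = [10, 12, 9, 11, 13]   # hypothetical pretest scores
post = [12, 15, 10, 13, 15] # hypothetical posttest scores
t, df = paired_t(pre, post)
print(round(t, 3), df)
```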
3 assumptions for paired t-test
data from ratio or interval
samples are randomly drawn from populations
population is normally distributed
what is an inappropriate use of multiple t-tests
to compare more than 2 means within the same sample
“family wise error”
increases chance of type I error
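The inflation can be sketched with 1 - (1 - alpha)^k, the chance of at least one Type I error across k tests. This formula assumes the tests are independent, so it is an approximation for pairwise comparisons on shared data:

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one Type I error across n independent tests, each at alpha."""
    return 1 - (1 - alpha) ** n_tests

for groups in (3, 4, 5):
    n_tests = groups * (groups - 1) // 2  # number of pairwise t-tests
    print(groups, n_tests, round(familywise_error(0.05, n_tests), 3))
```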
which t test is used for independent groups with one IV
independent
which t test is used for repeated measures with one IV
paired
levene’s test
for equal variances for independent groups
tests the null: no significant difference in variance between groups
what statistic does the ANOVA use?
the F statistic
(ANOVA) if variance between samples is small,
F will be small
(ANOVA) if variance within samples is small,
F will be large
what is an ANOVA for?
compare 3+ groups
one way ANOVA
one IV with 3+ levels
one way repeated measures ANOVA
one IV with 3+ levels, all measured on the same subjects
comparison of group means in ANOVA
looks at distance of each group from the grand mean
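A sketch of a one-way (independent groups) ANOVA: between-groups variability is measured from each group mean to the grand mean, within-groups variability from each score to its own group mean, and F is their ratio of mean squares. Eta squared falls out of the same sums of squares. The data are hypothetical:

```python
import statistics

def one_way_anova(*groups):
    """Returns (F, df_between, df_within, eta_squared) for independent groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    k, n_total = len(groups), len(all_values)
    # Between-groups SS: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups SS: spread of scores around their own group mean
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)
    return f, df_between, df_within, eta_sq

f, df_b, df_w, eta_sq = one_way_anova([4, 5, 6], [6, 7, 8], [8, 9, 10])
print(f, df_b, df_w, eta_sq)
```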
what is the F test called?
omnibus test
will tell that a difference exists, but not where
what tells where a difference exists?
multiple comparison tests
ANOVA effect size small
eta squared: .01
cohen’s f: .10
ANOVA effect size medium
eta squared: .06
cohen’s f: .25
ANOVA effect size large
eta squared: .14
cohen’s f: .40
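The two sets of benchmarks line up through the conversion f = sqrt(eta² / (1 - eta²)), which this sketch checks:

```python
import math

def cohens_f_from_eta_squared(eta_sq):
    """Convert eta squared to Cohen's f: f = sqrt(eta^2 / (1 - eta^2))."""
    return math.sqrt(eta_sq / (1 - eta_sq))

for eta_sq in (0.01, 0.06, 0.14):
    print(eta_sq, round(cohens_f_from_eta_squared(eta_sq), 2))
```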
increased power in RM ANOVA
less error variance (each subject serves as their own control)
define sphericity
homogeneity of variance of differences
test with mauchly’s test
what is another name for multiple comparison tests?
pairwise comparisons
describe post hoc MCT
performed after ANOVA
most common
test every difference
describe planned comparisons MCT
instead of ANOVA
focused on specific comparisons
what is the goal of MCT
decrease family wise error rate
what is a solution to family-wise error?
bonferroni correction
divide alpha by the number of statistical tests
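A sketch of the Bonferroni rule applied to three hypothetical p values from pairwise comparisons:

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha under Bonferroni: the overall alpha split evenly across tests."""
    return alpha / n_tests

p_values = [0.004, 0.020, 0.030]  # hypothetical p values from 3 comparisons
adjusted = bonferroni_alpha(0.05, len(p_values))  # 0.05 / 3
significant = [p <= adjusted for p in p_values]
print(round(adjusted, 4), significant)
```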
describe fisher’s least significant difference
essentially unadjusted t-tests (LSD)
least conservative
most power
describe tukey’s honestly significant difference
independent groups (IG) only
middle of the road in terms of risk
most common
best balance of type I and II error
describe bonferroni t-test
divides alpha by # of comparisons
most conservative
high type II error
describe sidak
repeated measures (RM)
adjusted alpha
good balance of type I and II error
most common
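The Sidak adjustment is usually computed as 1 - (1 - alpha)^(1/k), which keeps the family-wise rate at exactly alpha for k independent tests. This sketch compares it with Bonferroni's alpha / k; Sidak's per-test alpha is slightly larger, so it is a little less conservative:

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha under Bonferroni."""
    return alpha / n_tests

def sidak_alpha(alpha, n_tests):
    """Per-test alpha under Sidak: solves 1 - (1 - a) ** n_tests = alpha for a."""
    return 1 - (1 - alpha) ** (1 / n_tests)

for k in (3, 6, 10):
    print(k, round(bonferroni_alpha(0.05, k), 5), round(sidak_alpha(0.05, k), 5))
```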