biostatistics Flashcards
what is a sampling error
a statistical error that occurs when a selected sample does not represent the entire population
results found in the sample therefore differ from those that would be found in the entire population
what is random error
error by chance because we have a sample and not the whole population
how to reduce sampling error
increase sample size
select a representative sample (probability/random sampling)
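The effect of sample size can be checked with a small simulation. This is a minimal sketch, not a definitive method: the population (simulated blood pressures), the helper name `mean_abs_sampling_error`, and the seeds are all assumptions made up for illustration.

```python
import random
import statistics

def mean_abs_sampling_error(population, n, trials=500, seed=0):
    """Average absolute gap between the sample mean and the population mean."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(population)
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, n)  # simple random sampling, no replacement
        errors.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(errors)

# Hypothetical population: 10,000 simulated systolic blood pressures
rng = random.Random(42)
population = [rng.gauss(120, 15) for _ in range(10_000)]

small = mean_abs_sampling_error(population, n=10)
large = mean_abs_sampling_error(population, n=1000)
print(f"average error with n=10: {small:.2f}, with n=1000: {large:.2f}")
```

The larger samples should land much closer to the population mean on average.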
data types
variable:
numerical: continuous and discrete
categorical: ordinal and nominal type
what is the best way to summarize and analyze Numerical variables
histograms and box and whisker plots
summary: mean and SD (or variance) if symmetrical
median and interquartile range (IQR) if skewed
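A short sketch of why the summary depends on the shape of the data, using Python's standard `statistics` module. The two data sets and the helper `summarize` are invented for illustration; `statistics.quantiles` uses its default "exclusive" method here, and other software may compute quartiles slightly differently.

```python
import statistics

def summarize(values):
    """Return both summaries so they can be compared side by side."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # quartiles
    return {
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),   # sample standard deviation
        "median": median,
        "iqr": q3 - q1,
    }

symmetric = [4, 5, 5, 6, 6, 6, 7, 7, 8]   # hypothetical, roughly symmetrical
skewed = [1, 1, 2, 2, 3, 3, 4, 5, 40]     # hypothetical, one extreme value

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    print(name, summarize(data))
```

In the skewed set the single extreme value pulls the mean well above the median, which is why the median and IQR describe it better.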
what is the best way to summarize categorical data
pie charts and bar graphs; frequencies (relative and cumulative) and proportions
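A frequency table can be built with `collections.Counter`. This is only a sketch: the blood group data and the helper `frequency_table` are assumptions for illustration, and cumulative frequencies are mainly meaningful when the categories are ordinal.

```python
from collections import Counter

def frequency_table(values):
    """Rows of (category, count, relative frequency, cumulative relative frequency)."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for category, count in counts.most_common():  # most frequent first
        relative = count / total
        cumulative += relative
        rows.append((category, count, relative, cumulative))
    return rows

# Hypothetical nominal variable
blood_groups = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "O"]

rows = frequency_table(blood_groups)
for category, count, relative, cumulative in rows:
    print(f"{category:>2}  n={count}  relative={relative:.2f}  cumulative={cumulative:.2f}")
```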
what is a null hypothesis
the assumption that there is no relationship between variables in the population
it is assumed to be true until you have enough evidence to reject it
what is an alternative hypothesis
that there is a relationship between variables
type 1 error
reject the null hypothesis when it is actually true
alpha
probability of making a type 1 error (the significance level); 1 - alpha is the confidence level
type 2 error
failure to reject the null hypothesis when it is actually false
beta
probability of making a type 2 error; 1 - beta is the statistical power
what does the p-value tell you
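the probability of getting results at least as extreme as those observed if the null hypothesis were true
if p is less than or equal to 0.05 (the conventional cut-off) we reject the null hypothesis and the result is statistically significant
if p is greater than 0.05 we do not reject the null hypothesis and the result is not statistically significant (we do not have enough evidence to reject the null hypothesis)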
factors that influence statistical power
sample size
large samples give more statistical power
effect size (difference in means, proportions, odds or risk ratio): large effects are easier to detect, so a smaller sample is needed for the same power
level of significance: a stricter (smaller) alpha reduces power, so a larger sample is needed to keep the same power
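These three factors can be explored by simulation. The sketch below assumes a two-sided one-sample z-test with a known standard deviation, which is a simplification chosen to stay within the standard library; `rejection_rate` and all the numbers are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def rejection_rate(effect, n, alpha=0.05, sd=1.0, trials=2000, seed=1):
    """Fraction of simulated studies in which a two-sided z-test rejects H0: mean = 0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(effect, sd) for _ in range(n)]
        z = fmean(sample) / (sd / n ** 0.5)  # sd treated as known (assumption)
        if abs(z) >= z_crit:
            rejections += 1
    return rejections / trials

type1_rate = rejection_rate(effect=0.0, n=30)            # H0 true: close to alpha
power_small_n = rejection_rate(effect=0.5, n=10)         # baseline power
power_large_n = rejection_rate(effect=0.5, n=50)         # larger sample
power_big_effect = rejection_rate(effect=1.0, n=10)      # larger effect
power_strict_alpha = rejection_rate(effect=0.5, n=10, alpha=0.01)  # stricter alpha

print(type1_rate, power_small_n, power_large_n, power_big_effect, power_strict_alpha)
```

When the null hypothesis is true, the rejection rate estimates the type 1 error rate; when it is false, the rejection rate estimates power (1 - beta).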
when do we use a chi square test
to test for an association between two categorical variables
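For a 2x2 table the chi-square statistic can be worked out by hand. This is a sketch under stated assumptions: the counts are invented, no continuity correction is applied, and the p-value shortcut with `math.erfc` is valid only for 1 degree of freedom. In practice a statistics package would normally be used.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under H0
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts: rows = exposed / unexposed, columns = disease / no disease
table = [[30, 70], [15, 85]]
statistic, p_value = chi_square_2x2(table)
print(f"chi-square = {statistic:.2f}, p = {p_value:.3f}")
```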
when do we use a t-test
to compare the mean of a numerical variable between the two groups of a categorical variable
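The t statistic for two independent groups can be sketched as follows. The equal-variance (pooled) form of the test is assumed, the data are invented, and the helper `student_t` is hypothetical. Python's standard library has no t distribution, so the p-value is left to a statistics package or a t table.

```python
import math
import statistics

def student_t(group_a, group_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variances
    var_b = statistics.variance(group_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))  # standard error of the difference
    t = (statistics.fmean(group_a) - statistics.fmean(group_b)) / se
    return t, n_a + n_b - 2

# Hypothetical numerical outcome in two groups (the categorical variable)
treated = [5.1, 4.9, 5.6, 5.8, 6.0]
control = [4.2, 4.8, 4.5, 5.0, 4.4]

t, df = student_t(treated, control)
print(f"t = {t:.2f} on {df} degrees of freedom")
```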
when do we use a correlation test
for the relationship between two numerical variables
what do we use the Shapiro-Wilk test for
only used with numerical data
tests whether the distribution of the variable differs from a normal distribution
parametric tests
for normally distributed data
compare means
one-way ANOVA: cannot tell which specific groups are different
Student's t-test
non parametric tests
for skewed data
test for differences in the overall distribution of the variable
Mann-Whitney U
Wilcoxon signed-rank
Kruskal-Wallis: cannot tell which specific groups are different
Pearson's correlation
parametric
for normally distributed data
Spearman's correlation
non parametric
for skewed data
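The difference between the two coefficients shows up clearly with an outlier. In this sketch Spearman's coefficient is computed as Pearson's correlation on the ranks; the data and helper names are assumptions for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: always increasing, but with one extreme value
dose = [1, 2, 3, 4, 5, 6]
response = [1, 4, 9, 16, 25, 200]

print(f"Pearson r = {pearson_r(dose, response):.2f}")
print(f"Spearman rho = {spearman_rho(dose, response):.2f}")
```

Because the relationship is perfectly monotonic, the rank-based coefficient is unaffected by the extreme value, while Pearson's r is pulled down.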
confidence interval
measure of precision
wide- lack of precision
narrow- good precision
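A sketch of how sample size drives the width of a confidence interval for a mean. It assumes a normal approximation (for small samples a t-based interval would be somewhat wider); the summary figures and the helper `mean_confidence_interval` are hypothetical.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    margin = z * sd / math.sqrt(n)                 # z times the standard error
    return mean - margin, mean + margin

# Same hypothetical mean and SD, different sample sizes
narrow = mean_confidence_interval(mean=120, sd=15, n=400)
wide = mean_confidence_interval(mean=120, sd=15, n=25)

print(f"n=400: {narrow[0]:.1f} to {narrow[1]:.1f}")
print(f"n=25:  {wide[0]:.1f} to {wide[1]:.1f}")
```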
pros and cons of random sampling
pros
tends to give an accurate, unbiased representation of the population
ease of use
cons
if sampling frame is large, method is impractical
minority subgroups of interest in a population may not be present in sample
cluster sampling
Subjects in the same cluster are different from one another regarding the factor of interest (heterogeneous)
Each cluster is similar to other clusters
Only a subset of clusters is included in the sample
stratified sampling
Subjects in the same stratum are similar to one another regarding the stratifying factor (homogeneous)
Each stratum is different from other strata
All strata are represented in the sample
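The contrast between stratified and cluster sampling can be sketched in code. Everything here is an assumption for illustration: the sampling frame, the helper names, and the choice of one-stage cluster sampling (every unit in a chosen cluster is kept).

```python
import random

def stratified_sample(units, stratum_of, per_stratum, rng):
    """Draw `per_stratum` units at random from every stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

def cluster_sample(units, cluster_of, n_clusters, rng):
    """Pick `n_clusters` clusters at random and keep every unit in them."""
    clusters = {}
    for unit in units:
        clusters.setdefault(cluster_of(unit), []).append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical sampling frame: (patient_id, age_group, clinic)
rng = random.Random(7)
age_groups = ("under 40", "40-64", "65+")
frame = [(i, age_groups[i % 3], f"clinic {i % 5}") for i in range(60)]

strat = stratified_sample(frame, stratum_of=lambda u: u[1], per_stratum=4, rng=rng)
clust = cluster_sample(frame, cluster_of=lambda u: u[2], n_clusters=2, rng=rng)

print("age groups in stratified sample:", sorted({u[1] for u in strat}))
print("clinics in cluster sample:", sorted({u[2] for u in clust}))
```

Every age group appears in the stratified sample, while the cluster sample contains only the two clinics that were drawn.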
convenience sampling
Selecting participants who are close at hand, readily available, convenient
purposive sampling
Researcher selects participants because they have certain characteristics
Often used in qualitative research when particularly interested in insights from certain types of people
snowball sampling
Find one or two eligible people and ask them to refer others to you
Useful for difficult-to-access groups
volunteer sampling
Participants self-select
Put out an advert and see who signs up