Week 1&2 (Descriptive/Foundations/Experimental Designs/Comparing 2 Means/Inferential Tables/Statistical Software) Flashcards
What are the types of biostatistics
descriptive statistics
probability
estimate population parameters
hypothesis testing
Types of population
target and accessible
target population definition
the LARGER population to which results of a study will be generalized
accessible population definition
the ACTUAL population of subjects available to be chosen for a study
Sample definition
a subgroup of the population of interest
parameter
statistical characteristic of a population
statistic
statistical characteristic of a sample
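The parameter/statistic distinction can be sketched in a few lines of Python. The population values, the seed, and the sample size below are invented for illustration only.

```python
import random
import statistics

# Hypothetical accessible population: 100 invented systolic blood pressures.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.randint(100, 160) for _ in range(100)]

# Parameter: a characteristic of the whole population.
population_mean = statistics.mean(population)

# Statistic: the same characteristic computed from a sample (a subgroup).
sample = rng.sample(population, 20)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.1f}")
print(f"statistic (sample mean):     {sample_mean:.1f}")
```

The two means are usually close but not identical, which is why inferential statistics are needed to estimate a parameter from a statistic.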
descriptive statistic
used to describe a sample's shape, central tendency, and variability
inferential statistic
used to make inferences about a population (t-test, ANOVA, Pearson's r)
measures of central tendency
mean, median, and mode
what is central tendency
central value, BEST representative value of the target population
what is variability
the “spread” of the data
small variability: narrow, spike-like curve
large variability: wide, wave-like curve
frequency definition
the number of times a value appears in a data set
frequency distribution
the pattern of frequencies of a variable
methods of displaying frequency distributions
histogram & stem and leaf plots
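As a rough illustration of frequency and of a stem-and-leaf display, the sketch below counts values and prints a text plot. The scores are hypothetical, and splitting each score into a tens-digit stem and a ones-digit leaf is an assumption that only suits two-digit values.

```python
from collections import Counter, defaultdict

# Hypothetical exam scores, invented for illustration.
scores = [62, 67, 71, 71, 74, 78, 78, 78, 83, 85, 91]

# Frequency: the number of times each value appears in the data set.
frequency = Counter(scores)

# Stem-and-leaf: the tens digit is the stem, the ones digit is the leaf.
stems = defaultdict(list)
for score in sorted(scores):
    stems[score // 10].append(score % 10)

for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```

Each printed row works like a histogram bar turned on its side: the longer the row of leaves, the higher the frequency in that interval.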
skewed to the left (image)
skewed to the right (image)
normal “skewed” (image)
different shapes of distributions
normal (B)
skewed to right (A)
skewed to left (C)
Skewed to right (words)
“tail” points right, away from where the bulk of the curve lies
AKA “positive skew”
mean > median/mode
Skewed to left (words)
“tail” faces left
AKA “negative skew”
mean < median/mode
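The pull of the tail on the mean can be checked with a small sketch. Both data sets are hypothetical; each has one extreme value that forms the tail.

```python
import statistics

# Hypothetical lengths of stay (days): one long stay forms a right tail.
right_skewed = [2, 3, 3, 4, 4, 4, 5, 6, 30]

# Hypothetical quiz scores: one very low score forms a left tail.
left_skewed = [1, 25, 26, 27, 27, 27, 28, 28, 29]

for name, data in (("right", right_skewed), ("left", left_skewed)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    mode = statistics.mode(data)
    print(f"skewed {name}: mean={mean:.2f} median={median} mode={mode}")
```

In the right-skewed set the mean lands above the median and mode; in the left-skewed set it lands below them, because the mean is dragged toward the extreme value.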
Measures of Central Tendency: best choice for MEAN
best choice for numeric data
(not good for skewed data)
Measures of Central Tendency: best choice for MEDIAN
best for non-symmetrical data
Measures of Central Tendency: best choice for MODE
limited utility; nominal or ordinal data
common in surveys
Mean: Advantages
easy to calculate, data don’t have to be arranged in order, can be used in further statistical formulas
Mean: Disadvantages
can’t be used with categorical data, affected by extreme values
Median: advantages
easy, can be used with “ranked” data
Median: disadvantages
tedious in a large data set
should be used with ordinal data
mode: advantages
easy to understand and calculate
Mode: disadvantages
not based on all values
unstable when the data consist of a small number of values
sometimes the data has 2+ modes or no modes at all
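A short sketch of the mode's behavior, using Python's standard `statistics` module (`multimode` requires Python 3.8 or later). All three data sets are made up for illustration.

```python
import statistics

# Hypothetical survey responses (nominal data): the mode still works here,
# while a mean or median would be meaningless.
blood_types = ["O", "A", "O", "B", "A", "O", "AB"]
most_common = statistics.mode(blood_types)

# Two values tie for the highest frequency, so there are two modes.
bimodal = [1, 2, 2, 3, 4, 4, 5]
modes = statistics.multimode(bimodal)

# No value repeats, so every value ties and there is no useful mode.
no_repeats = [1, 2, 3, 4]
all_tied = statistics.multimode(no_repeats)

print(most_common, modes, all_tied)
```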
common measures of variability
range, interquartile range, standard deviation, variance, coefficient of variation
range
difference between highest and lowest score
percentiles of range
a score’s position within the distribution (divides into 100 parts)
quartiles of range
divides distribution into 4 equal parts
interquartile range (IQR)
difference between the 25th and 75th percentiles (Q3 - Q1)
often used with median
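The range, quartiles, and IQR can be sketched with `statistics.quantiles`. The scores are hypothetical, and the choice of `method="inclusive"` is an assumption: textbooks and software use several quartile conventions that can give slightly different cut points.

```python
import statistics

# Hypothetical scores, invented for illustration.
scores = [55, 60, 62, 68, 70, 75, 80, 84, 95]

# Range: difference between the highest and lowest score.
data_range = max(scores) - min(scores)

# With n=4, quantiles() returns the three cut points that split the
# distribution into four equal parts: Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

# IQR: spread of the middle 50% of the scores.
iqr = q3 - q1

print(f"range={data_range} Q1={q1} median={q2} Q3={q3} IQR={iqr}")
```

Unlike the range, the IQR ignores the lowest and highest quarters of the data, so a single extreme score does not change it.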
What is a box plot?
five-number summary of a data set
(minimum, 1st quartile, median, 3rd quartile, maximum)
box = interquartile range
horizontal line at median
“whiskers” = minimum and maximum scores
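The numbers a box plot draws can be collected by a small helper. `five_number_summary` is a hypothetical function name, the data are invented, and the inclusive quartile method is again one convention among several.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical scores, invented for illustration.
scores = [3, 5, 7, 8, 9, 11, 13, 15, 20]
minimum, q1, median, q3, maximum = five_number_summary(scores)

print(f"whiskers: {minimum} to {maximum}")
print(f"box:      {q1} to {q3}, line at {median}")
```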
coefficient of variation
used for interval and ratio data only
unitless
helpful comparing variability between two distributions on different scales
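One common way to compute the coefficient of variation is the standard deviation divided by the mean, expressed as a percentage. The sketch below assumes the sample standard deviation and uses invented height and weight values.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Return the sample SD as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical measurements on two different scales.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 62, 70, 81, 92]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
print(f"height CV: {cv_height:.2f}%  weight CV: {cv_weight:.2f}%")

# Unitless: converting centimeters to inches leaves the CV unchanged.
heights_in = [h / 2.54 for h in heights_cm]
same_cv = math.isclose(coefficient_of_variation(heights_in), cv_height)
```

Because the units cancel, the two CVs can be compared directly even though one variable is in centimeters and the other in kilograms.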
what shape is normal distribution?
bell-shaped
constant and predictable characteristics of normal distribution
about 68% of scores fall within 1 SD of the mean
about 95% of scores fall within 2 SD of the mean
about 99.7% of scores fall within 3 SD of the mean
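These proportions can be checked against the standard normal curve with `statistics.NormalDist` (Python 3.8 or later). `proportion_within` is a hypothetical helper name used only for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def proportion_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.1%}")
```

The computed values are roughly 68.3%, 95.4%, and 99.7%, which is where the rounded figures on the card come from.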
z-scores
a standardized score based on the normal distribution
allows for the interpretation of a single score in relation to the distribution of scores
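A z-score is computed as (score - mean) / SD. The sketch below assumes a hypothetical test with a mean of 100 and an SD of 15, and it assumes the scores are normally distributed when converting the z-score to a percentile.

```python
from statistics import NormalDist


def z_score(score, mean, sd):
    """Return how many SDs a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd


# Hypothetical test: mean 100, SD 15, one person scores 130.
z = z_score(130, mean=100, sd=15)

# Proportion of a normal distribution that falls below this z-score.
proportion_below = NormalDist().cdf(z)

print(f"z = {z:.1f}, proportion of scores below = {proportion_below:.3f}")
```

Here the score sits 2 SDs above the mean, so it is higher than about 97.7% of scores under the normality assumption.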