Basic Statistics Flashcards
What is an independent variable (IV)?
- variable examined to determine its effect on outcome of interest (DV)
- under control of experimenter - manipulated variable
e.g., dose of a drug
What is a variable?
measurable characteristic that changes with person, environment, experiment
e.g., blood pressure, A1c levels, cholesterol LDL/HDL levels
What is a dependent variable (DV)?
- outcome of interest measured to assess effects of IV
- not under experimenter control
e.g., how a person reacts to the drug
What is a subject, or organismic variable?
naturally occurring characteristic of people that serves as an IV but is not controlled by the experimenter
e.g., gender, race, BRCA1
What are the four different types of data?
- nominal
- ordinal
- interval
- ratio
What is nominal data?
- qualitative (name)
- mutually exclusive without logical order
e.g., types of physical activity a diabetic patient engages in: walking, swimming, hiking
What is ordinal data?
- qualitative
- mutually exclusive with logical rank ordering
e.g., ratings of how a patient feels: very poor, poor, average, good, very good
What is interval data?
- quantitative with equal units of measurement, so the distance between any two adjacent values is the same
- the zero point is arbitrary (not meaningful)
e.g., cancer patients rate their level of energy on a 1-10 scale
What is ratio data?
- quantitative with equal units of measurement where numbers can be compared as multiples of one another
- meaningful zero point
e.g., height, weight, length
What are the two different types of numbers?
- discrete/discontinuous data
- continuous data
What is characteristic of discrete data?
only whole numbers allowed
e.g., # of manic episodes in a week
What is characteristic of continuous data?
any values allowed
e.g., weight, height, fasting blood glucose levels
On which axis is the independent variable typically plotted?
x-axis
On which axis is the dependent variable typically plotted?
y-axis
What are some features of a bar graph?
- nominal, sometimes ordinal data
- each bar = category
- height = frequency (proportion or %)
- bars for different categories do not touch (but with two or more groups, the bars for the groups within each category, e.g., males and females, can touch)
- if ordinal data, must preserve order
- can be vertical or horizontal
What are some features of a histogram?
- interval, ratio data; sometimes ordinal
- same rules as bar, BUT bars touch
- usually for discrete data
What are some features of a line/frequency graph?
- interval, ratio, sometimes ordinal data
- usually for continuous data
- points represent data and lines connect the data points showing the continuous nature of data (i.e., can have any value between)
What are the different forms of graphs?
- normal: bell-shaped or symmetric about a line drawn through the center
- skewed: not symmetric, shifted to one side or the other
What are two types of skewed graphs?
- negative skew: fewer scores at the low end, peak shifted to the right
- positive skew: fewer scores at the high end, peak shifted to the left
What is the mean of data?
- a.k.a. average
- a single value meant to typify a list of values
- most common measure of central tendency
- most appropriate when data are normally distributed, because the mean is affected significantly by outliers or extreme values
- symbolized as μ (population data) or x̄ (sample data)
- basic arithmetic mean calculated by adding up all data values and dividing by the number of values (e.g., (4+5+12)/3 = 21/3 = 7 = mean)
What is kurtosis?
the sharpness of the peak of a frequency-distribution curve
Why is mean most appropriate for data that is normally distributed?
the mean is affected significantly by outliers or extreme values, which can pull it away from the center of a skewed distribution
How do you calculate the mean?
basic arithmetic mean calculated by adding up all data values and dividing by the number of values
e.g., (4+5+12)/3 = 21/3 = 7 = mean
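A minimal Python sketch of the same calculation, using the three values from the example. The variable names are illustrative only.

```python
from statistics import mean

values = [4, 5, 12]

# Add up all data values and divide by how many values there are.
arithmetic_mean = sum(values) / len(values)

print(arithmetic_mean)  # 7.0
print(mean(values))     # the standard library gives the same result
```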
What is the median?
- midpoint of a distribution of scores so 1/2 fall above and 1/2 fall below (50th percentile)
- appropriate measure of central tendency with skewed distributions and those with outliers or extreme values
- if you have an odd number of values, put them in ascending order and the median is the number in the middle
- if you have an even number of values, put them in ascending order and take the mean of the two middle values
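The odd and even rules above can be sketched in Python as follows. The function name `median_of` and the sample scores are hypothetical, chosen only for illustration.

```python
def median_of(values):
    """Return the median using the odd/even rule described above."""
    ordered = sorted(values)          # put the values in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the middle value
    # even count: mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_scores = [7, 2, 5]        # hypothetical data
even_scores = [8, 2, 6, 4]    # hypothetical data

print(median_of(odd_scores))   # 5
print(median_of(even_scores))  # 5.0
```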
What is the mode?
- most common score in a distribution
e.g., scores are 2 3 4 4 4 5; mode is 4
- can have more than one mode
e.g., scores are 2 4 4 5 6 6 7; mode = 4, 6
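As a quick check of both examples, here is a short Python sketch. It assumes Python 3.8 or later, where `statistics.multimode` is available and returns every value tied for most common.

```python
from statistics import multimode

one_mode = [2, 3, 4, 4, 4, 5]
two_modes = [2, 4, 4, 5, 6, 6, 7]

# multimode returns a list, so a distribution with several modes is handled too.
print(multimode(one_mode))   # [4]
print(multimode(two_modes))  # [4, 6]
```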
What is variance?
- measure of dispersion which is calculated by taking the average of the squared differences between the mean and all the scores contributing to the mean
- used with the mean
- tells you, on average, how far the score varies from the mean
- symbolized as σ² for a population, s² for a sample
- outliers and extreme values increase variance
- typically shown with the mean and reported as the mean +/- the variance
How do outliers and extreme values affect the variance?
increase variance
What is standard deviation?
- square root of the variance
- used with the mean
- converts variance to a score interpretable in terms of measurement scale
- outliers and extreme values increase standard deviation
- typically shown with the mean and reported as the mean +/- the standard deviation
- σ for population, s for sample
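A small Python sketch of how variance and standard deviation relate. The scores are hypothetical. One assumption beyond the cards: the sample versions divide by n - 1 rather than n, which is the usual convention for s² and s.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
n = len(scores)
mean_score = sum(scores) / n

# Squared differences between each score and the mean.
squared_diffs = [(x - mean_score) ** 2 for x in scores]

# Population variance: the average of the squared differences.
population_variance = sum(squared_diffs) / n
# Standard deviation: the square root of the variance.
population_sd = math.sqrt(population_variance)

# Sample versions (assumed convention: divide by n - 1).
sample_variance = sum(squared_diffs) / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(population_variance, population_sd)  # 4.0 2.0
```

Because the standard deviation is back in the original units of measurement, it is easier to interpret next to the mean than the variance is.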
What is the measure of dispersion with the median?
interquartile range (IQR)
What is the interquartile range (IQR)?
- the middle 50% of scores in a distribution
- used with the median
- not affected by outliers or extreme values
- the range of scores between the 25th and 75th percentiles of a distribution
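The sketch below illustrates the IQR and its resistance to outliers in Python. The data are hypothetical, and the quartile method is an assumption: textbooks and software use several slightly different percentile conventions, and this example uses the `inclusive` method of `statistics.quantiles` (Python 3.8+).

```python
from statistics import quantiles


def iqr(values):
    """Distance between the 25th and 75th percentiles."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]          # hypothetical data
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 900]  # same data, one extreme value

print(iqr(scores))        # 4.0
print(iqr(with_outlier))  # 4.0, unchanged by the extreme value

# The full range, by contrast, is dominated by the outlier.
print(max(scores) - min(scores))              # 8
print(max(with_outlier) - min(with_outlier))  # 899
```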
What is characteristic of a sampling distribution?
the sampling distribution of a statistic is the distribution of all values of that statistic for every sample of a particular size from the population
In a normal curve, how do the mean, median, and mode relate?
they are all equal
What is the Central Limit Theorem (CLT)?
- relationship between a population mean and its sampling distribution
- it describes the conditions under which the sum of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed
- justifies the approximation of large-sample statistics to the normal distribution in controlled experiments
- provides a way to analyze data and test hypotheses
Central Limit Theorem:
When random samples of a fixed size are drawn from a population, three things occur as the sample size gets larger:
- the distribution of sample means approaches normality
- the overall mean of the samples approaches the mean of the population
- the standard deviation of the sample means equals the standard deviation of the population divided by the square root of the sample size
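The three points above can be explored with a small simulation. This is only a sketch under stated assumptions: the population is an exponential distribution with rate 1 (so its mean and standard deviation are both 1), and the sample size, number of samples, and seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed so the sketch is repeatable

sample_size = 50         # assumed size of each sample
num_samples = 2000       # assumed number of repeated samples
population_mean = 1.0    # exponential(rate=1) has mean 1
population_sd = 1.0      # ...and standard deviation 1

# Draw many samples from a skewed population and record each sample mean.
sample_means = []
for _ in range(num_samples):
    sample = [rng.expovariate(1.0) for _ in range(sample_size)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# CLT prediction: population SD divided by the square root of the sample size.
expected_sd = population_sd / math.sqrt(sample_size)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {population_mean})")
print(f"SD of sample means:   {sd_of_means:.3f} (predicted {expected_sd:.3f})")
```

Plotting `sample_means` as a histogram should show a roughly bell-shaped curve even though the underlying population is strongly skewed.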
accuracy
the degree of closeness of a measured quantity to its actual/true value
reliability
the extent to which a measurement procedure yields a consistent outcome when done repeatedly
Think of a target:
What is known as your ability to hit the target itself?
What is known as your ability to repeatedly hit the bulls-eye?
- accuracy
- precision/reliability
If a study has both high accuracy and high precision, what does this say about the sample estimate?
- the sample estimate will be close to the true population value
- repeated studies will show little variability
validity
does the measurement tool really measure what it is intended to measure
The question of whether standardized tests (e.g., SAT, GRE, MCAT) really measure someone’s ability to perform well in college, medical school, etc. is a question of these tests’ what?
validity
What three things affect the quality of the data collected and thus the quality of the decisions made based on those data?
- accuracy
- reliability/precision
- validity
population
complete set of people/objects having some common characteristic
parameter
- value summarizing characteristic of population
- constants
- use Greek letters to represent
sample
- subset of population
- share same characteristics
statistic
- value summarizing characteristic of a sample
- are variable
- use Roman letters to represent
simple random sample
subset of population selected so that each population member has equal and independent chance of being chosen
random assignment
assign subjects to treatments in equal and independent manner to avoid bias
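A minimal Python sketch of both ideas, simple random sampling followed by random assignment. The population IDs, group names, sizes, and seed are all hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 20 people.
population = [f"person_{i:02d}" for i in range(1, 21)]

# Simple random sample: each member has an equal chance of being chosen.
sample = rng.sample(population, k=10)

# Random assignment: shuffle the sampled subjects, then split them
# into two treatment groups of equal size.
shuffled = sample[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2
treatment_group = shuffled[:half]
control_group = shuffled[half:]

print("treatment:", treatment_group)
print("control:  ", control_group)
```

Sampling decides who is in the study; assignment decides which treatment each subject receives. The two steps are separate, and both rely on chance to avoid bias.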