Unit 3: Ch. 12 Flashcards
descriptive statistics
used to describe and synthesize data; typically used to describe demographic data
inferential statistics
used to make inferences about the population based on the sample
frequency distributions
systematic arrangement of numeric values on a variable from lowest to highest, with a count or percentage of the number of times each value occurred
-ex: 57%, 5%, 28%, n=27
Frequency distributions can be described in terms of:
- shape
- central tendency
- variability
Can be presented in a table (Ns and percentages) or graphically
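Rough sketch of the idea in code (hypothetical responses, not data from the chapter):
```python
# Build a simple frequency distribution: count each value and convert counts to percentages.
from collections import Counter

responses = ["yes", "no", "yes", "yes", "unsure", "no", "yes"]  # hypothetical answers

n = len(responses)
for value, count in Counter(responses).most_common():
    print(f"{value}: n={count} ({count / n:.0%})")
```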
standard deviation
abbreviated as “SD” in articles
- average deviation of scores in a distribution
- small SD indicates consistency among respondents; large SD indicates wide variation
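Minimal sketch with made-up scores, showing how a small SD reflects scores clustered near the mean:
```python
# Mean and sample standard deviation of hypothetical scores.
import statistics

scores = [72, 75, 78, 80, 85]                  # hypothetical scores
print(f"M = {statistics.mean(scores):.1f}")    # mean
print(f"SD = {statistics.stdev(scores):.2f}")  # sample standard deviation (~4.95)
```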
frequency distributions: shape
symmetric or asymmetric
asymmetric = “skewed”
skew is a rough indicator of the normality of the distributed data; reflects the degree to which scores on a variable fall at one end or the other of the scale of the variable
- normality: data that follow the normal curve (“bell-shaped curve”) are normally distributed
- skew example: test scores
- if the majority of scores fall at the high end of the scale, the distribution is negatively skewed
- if the majority of the scores fall at the low end of the scale, the distribution is positively skewed
frequency distributions: central tendency
an index of the typicalness of a set of scores that comes from the center of the distribution
-ex: normal curve (“middle 2/3 of the normal curve”)
3 measures:
- Mode: most frequently occurring score in distribution
- Median: the middle value; the point above and below which 50% of the scores fall
- Mean: average; in articles you’ll see “M” for mean or the word “average”; may see “x-bar” (an “x” w/ a line over it)
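Quick sketch of all three measures on hypothetical scores:
```python
# Mode, median, and mean of a hypothetical score distribution.
import statistics

scores = [70, 80, 80, 85, 90, 95, 100]             # hypothetical scores
print("Mode:", statistics.mode(scores))            # most frequent value -> 80
print("Median:", statistics.median(scores))        # middle value -> 85
print("Mean:", round(statistics.mean(scores), 1))  # average -> ~85.7
```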
frequency distributions: variability
degree to which scores in a distribution are spread out or dispersed
- homogeneity: when there is little variability in the group of scores you have (“everybody is roughly getting in the same few scores”)
- heterogeneity: a lot of variability (ex: exam scores)
- Range: highest value minus the lowest value
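Rough illustration (hypothetical scores): a homogeneous set has a small range and SD, a heterogeneous set has large ones:
```python
# Compare the variability of two hypothetical score sets.
import statistics

homogeneous = [78, 79, 80, 81, 82]      # scores cluster tightly -> little variability
heterogeneous = [50, 65, 80, 95, 100]   # scores spread widely -> a lot of variability

for label, scores in (("homogeneous", homogeneous), ("heterogeneous", heterogeneous)):
    score_range = max(scores) - min(scores)   # range = highest minus lowest
    print(f"{label}: range = {score_range}, SD = {statistics.stdev(scores):.1f}")
```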
bivariate descriptive statistics include? (2)
1. cross tabs (aka contingency table)
2. correlation coefficients
bivariate descriptive statistics: Cross Tabs (Contingency Table, i.e., a two-way table)
evaluates whether a statistical relationship exists between 2 variables, each of which has different levels
- ex: types of maltreatment (maltreatment is a “construct” with 5 levels); demographic variables (whether or not someone is a HS graduate, whether or not they’re employed)
- researcher ends up with the probability of a thing happening
- -> ex: “adolescent mothers who have been sexually abused were 2.5 times more likely to be high school graduates and employed”
- produces an odds ratio
- -> odds ratio: the odds of one thing happening versus the odds of another thing happening
- -> ex: high school graduation: either you graduate or you don’t; the odds ratio compares the odds of graduating in one group with the odds in another group
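Sketch of how an OR comes out of a 2x2 cross tab (all counts hypothetical):
```python
# Odds ratio from a hypothetical 2x2 contingency table
# (rows: exposed vs. not exposed; columns: outcome vs. no outcome).
a, b = 30, 70   # exposed:     with outcome, without outcome
c, d = 15, 85   # not exposed: with outcome, without outcome

odds_exposed = a / b       # odds of the outcome in the exposed group
odds_unexposed = c / d     # odds of the outcome in the unexposed group
print(f"OR = {odds_exposed / odds_unexposed:.2f}")  # ~2.43 -> odds about 2.4x higher when exposed
```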
bivariate descriptive statistics: Correlation Coefficients
describe the intensity and direction of a relationship between 2 variables
intensity: represented by the absolute value of a number
direction of the relationship is indicated by either a positive (+) or a negative (-) sign
the higher the absolute value of a number the stronger the relationship
- ex: a correlation of 0.25 is much weaker than a correlation of 0.80
- when looking at the strength of the relationship you’re not looking at the +/- sign –> you’re just looking at the number itself (the absolute value)
Correlation uses the “r” statistic (“r=”)
Negative relationship: inverse relationship; when one variable goes up the other variable goes down
Positive relationship: when one variable goes up the other variable goes up
r = -0.45 vs. r = -0.80 –> -0.80 is the stronger relationship
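Minimal sketch computing r for two hypothetical variables (statistics.correlation requires Python 3.10+):
```python
# Pearson correlation coefficient (r) for two hypothetical variables.
import statistics

x = [1, 2, 3, 4, 5]        # hypothetical variable 1 (e.g., study hours)
y = [55, 62, 70, 74, 85]   # hypothetical variable 2 (e.g., exam score)

r = statistics.correlation(x, y)   # Pearson r
print(f"r = {r:.2f}")              # positive sign: as x goes up, y goes up
```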
describing risk
clinical decision making for EBP may involve the calculation of risk indexes, so that decisions can be made about relative risks for alternative tx or exposures
Some frequently used indexes:
- absolute risk
- absolute risk reduction (ARR)
- odds ratio (OR)
absolute risk
the proportion of people who experienced an undesirable or desirable outcome
-ex: proportion of smokers who get lung cancer
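As a general formula (not quoted from the chapter):
$$\text{Absolute risk} = \frac{\text{number in the group with the outcome}}{\text{total number in the group}}$$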
absolute risk reduction (ARR)
the difference between the absolute risk with exposure to an outcome and the absolute risk without exposure to the outcome (“exposure minus no exposure”)
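Written out, following the card’s “exposure minus no exposure” wording:
$$\text{ARR} = \text{AR}_{\text{exposed}} - \text{AR}_{\text{not exposed}}$$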
odds ratio (OR)
the ratio of the odds of an adverse outcome in one group to the odds in another group; the odds are the proportion of people w/ the adverse outcome relative to the proportion w/o it
ex: OR = 1.75
may be seen in conjunction with confidence intervals
look at p. 225 of textbook for examples of OR
the OR needs to be statistically significant to conclude that the result didn’t happen by chance
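General formula from a 2x2 table, where a/b are the exposed group with/without the outcome and c/d are the unexposed group with/without the outcome:
$$\text{OR} = \frac{a/b}{c/d} = \frac{a \times d}{b \times c}$$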
inferential statistics
used to make objective decisions about population parameters using the sample data
-using a sample to make decisions about the whole population
based on laws of probability
Point estimation
a single, descriptive statistic that estimates the population value (e.g. a mean, percentage, or OR)
-basically saying you have one number you’re dealing with
SEM (standard error of the mean)
the standard deviation of the sampling distribution of the mean; indicates how much sample means are expected to vary from sample to sample
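Standard formula (so larger samples give a smaller SEM):
$$\text{SEM} = \frac{SD}{\sqrt{n}}$$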
Interval Estimation
a range of values within which a population value probably lies
-involves computing a confidence interval (CI)
confidence interval (CI)
indicates an upper and lower limit; expresses how confident one can be that the population value falls within that range
ex: a 95% CI of 40-60 around a sample mean of 45 indicates one can be 95% confident that the population mean lies between 40 and 60
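Standard formula for a 95% CI around a mean (assumes an approximately normal sampling distribution):
$$95\%\ \text{CI} = \bar{x} \pm 1.96 \times \text{SEM}$$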
parametric statistics
involves the estimation of a parameter; assumes variables are normally distributed in the population (“not skewed”); measurements are on interval/ratio scale
use for big samples
nonparametric statistics
does not involve estimation of a parameter; measurements typically on nominal or ordinal scale; doesn’t assume normal distribution in the population
use for small samples
type 1 error
rejecting a null hypothesis when it’s true
- getting a false positive
- saying someone has a condition/dz when they don’t have it (ex: false positive pregnancy test)
type 2 error
accepting the null hypothesis when it’s false
- getting a false negative
- saying that someone doesn’t have a condition/dz when they really do