Chapter 12 - Statistical Analysis of Quantitative Data Flashcards
Descriptive Statistics
Used to synthesize and describe data
Parameters
a descriptive index (e.g., an average or percentage) calculated with data from a population
Statistic
Descriptive index from a sample
Inferential Statistics
Used to help make inferences about the population
Frequency distribution
Systematic arrangement of values from lowest to highest, together with a count or percentage of how many times the data occurred
- easy to see highest and lowest scores, most common scores, where data clusters, and how many patients were in the sample
- can be displayed in a “frequency polygon” where scores are graphed on horizontal line and frequency on vertical line
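As a sketch, a frequency distribution like this can be tallied with Python's `collections.Counter` (the scores below are hypothetical):

```python
from collections import Counter

# Hypothetical anxiety scores from a small sample
scores = [3, 5, 4, 5, 6, 5, 4, 7, 5, 6, 4, 5]

# Frequency distribution: each value from lowest to highest,
# with its count and percentage of the sample
freq = Counter(scores)
n = len(scores)
for value in sorted(freq):
    count = freq[value]
    print(f"{value}: {count} ({100 * count / n:.1f}%)")
```

Plotting `value` on the horizontal axis against `count` on the vertical axis gives the frequency polygon described above.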
Symmetric Distribution
occurs when the two halves of a frequency polygon would line up if folded over
Positive Skew
when the longer tail points to the right
–>ex. personal income
Negative Skew
when the longer tail points to the left
–>ex. age of death
Unimodal vs. Multi-modal
one peak vs. multiple peaks
Normal distribution
“bell-shaped curve”
- symmetrical
- unimodal
- not very peaked
–>ex. height, intelligence
Central Tendency
measures of “typicalness”
- mode
- median
- mean
Mode
number that occurs most frequently in the distribution
Median
the point in a distribution that divides the scores in half, the middle value
-preferred when data is highly skewed
Mean
the sum of all values divided by the number of participants
“average”
-most stable index
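A quick sketch of the three indexes on hypothetical scores; note how one extreme value pulls the mean upward while the median stays put, which is why the median is preferred for highly skewed data:

```python
import statistics

# Hypothetical scores, skewed by one extreme value (30)
scores = [2, 3, 3, 4, 5, 6, 30]

print(statistics.mode(scores))    # most frequent value: 3
print(statistics.median(scores))  # middle value: 4
print(statistics.mean(scores))    # pulled up by the outlier: ~7.57
```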
Variability
how spread out the data are; two distributions with identical means can still differ in shape and dispersion
Range
highest score minus the lowest score in a distribution
- easy to compute
- unstable
- “gross descriptive index”
Standard Deviation
summarizes the AVERAGE amount of deviation of values from the mean
- most widely used variability index
- calculated based on every value in the distribution
- in a normal/near-normal distribution, virtually all values fall within 3 SDs above and below the mean
- lower SD = more homogeneous
+/- 1 SD: 68% of data
+/- 2 SD: 95% of data
+/- 3 SD: 99.7% of data
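A small simulation sketch (hypothetical simulated values) illustrating the 68%/95% rule for a roughly normal sample:

```python
import random
import statistics

random.seed(1)
# Simulate a roughly normal sample (hypothetical mean 100, SD 15)
data = [random.gauss(100, 15) for _ in range(10000)]

mean = statistics.mean(data)
sd = statistics.stdev(data)

# proportion of values within 1 and 2 SDs of the mean
within_1sd = sum(mean - sd <= x <= mean + sd for x in data) / len(data)
within_2sd = sum(mean - 2*sd <= x <= mean + 2*sd for x in data) / len(data)
print(round(within_1sd, 3))  # close to .68
print(round(within_2sd, 3))  # close to .95
```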
Crosstabs (contingency) table
Two-dimensional frequency distribution in which the frequencies of two variables are crosstabulated
–>ex. differentiating between men and women in categories of non-smoker, light smoker, and heavy smoker
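A minimal sketch of building such a table from hypothetical (sex, smoking status) observations:

```python
from collections import Counter

# Hypothetical observations: (sex, smoking status) for each participant
observations = [
    ("M", "non"), ("M", "light"), ("F", "non"), ("F", "heavy"),
    ("M", "heavy"), ("F", "non"), ("F", "light"), ("M", "non"),
]

# Crosstab: count each (row, column) combination
table = Counter(observations)
for sex in ("M", "F"):
    row = [table[(sex, status)] for status in ("non", "light", "heavy")]
    print(sex, row)
```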
Correlation
To what extent are two variables related to each other?
–>ex. anxiety scores and BP
Correlation Coefficient
calculation that describes intensity and direction of a relationship
-how “perfect” a relationship is (ex. tallest person also weighs the most)
Positive Correlation
when an increase in one variable is associated with an increase in the other (.01 to 1.00)
Negative (Inverse) Correlation
when an increase in one variable is associated with a decrease in the other (-.01 to -1.00)
–>ex. depression and self-esteem
Pearson’s r
product-moment correlation coefficient
computed with interval or ratio measurements
-no clear guidelines for interpretation
- descriptive: summarizes the magnitude and direction of a relationship between two variables
- inferential: tests hypotheses about population correlations
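A sketch of Pearson's r computed from its definition, using hypothetical anxiety and BP values:

```python
import math

def pearson_r(x, y):
    # covariance of x and y divided by the product of their spreads
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: anxiety scores and blood pressure
anxiety = [1, 2, 3, 4, 5]
bp = [110, 115, 118, 124, 130]
print(round(pearson_r(anxiety, bp), 3))  # strong positive relationship
```

A "perfect" relationship (e.g., the tallest person always weighs the most, proportionally) would give exactly 1.0.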
Correlation Matrix
variables are displayed in both rows and columns
Absolute Risk
the proportion of people who experienced an undesirable outcome in each group
Absolute Risk Reduction Index
comparison of the two risks
–>computed by subtracting the absolute risk for the exposed group from the absolute risk for the unexposed group
-it is the proportion of people who would be spared the undesirable outcome through exposure to an intervention/protective factor
Odds Ratio
the ratio of the odds of the adverse outcome in one group to the odds in the other (odds = those with the outcome divided by those without it)
–>ex. the odds of continued smoking among those who got the intervention DIVIDED BY the odds among those who did not
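A sketch with hypothetical trial counts showing absolute risk, the absolute risk reduction, and the odds ratio side by side:

```python
# Hypothetical trial: 200 participants per group, counts of adverse outcomes
exposed_bad, exposed_n = 30, 200   # intervention (exposed) group
control_bad, control_n = 50, 200   # control (unexposed) group

# absolute risk = proportion with the undesirable outcome in each group
ar_exposed = exposed_bad / exposed_n   # 0.15
ar_control = control_bad / control_n   # 0.25

# absolute risk reduction = unexposed risk minus exposed risk
arr = ar_control - ar_exposed          # 0.10

# odds = those with the outcome / those without it, within each group
odds_exposed = exposed_bad / (exposed_n - exposed_bad)
odds_control = control_bad / (control_n - control_bad)
odds_ratio = odds_exposed / odds_control  # < 1 favors the intervention
print(round(arr, 2), round(odds_ratio, 3))
```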
Inferential Statistics
based on the “law of probability”
–>provide means for drawing conclusions about a population given the data from a sample
- assume that the population has been randomly sampled
Sampling Distribution of the Mean
Thinking about what would happen if you could draw many samples from the same population using the same procedure, compute each sample's mean, and graph them
-the plotted means would vary from sample to sample, forming a distribution of their own
Not necessary because…
- sampling distributions of means are normally distributed
- the mean of a sampling distribution equals the original population mean
Standard Error of the Mean
the standard deviation of the error in the sample mean with respect to the true mean
-lower SEM = more accurate the mean is as an estimate of the population value
-larger sample = smaller SEM (the sample mean deviates less from the population mean)
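A sketch of the SEM calculation (SEM = SD / √n) on hypothetical sample values:

```python
import math
import statistics

# Hypothetical sample values
sample = [98, 102, 95, 101, 99, 104, 97, 100]

# standard error of the mean = sample SD divided by sqrt(sample size)
sd = statistics.stdev(sample)
sem = sd / math.sqrt(len(sample))
print(round(sem, 2))
```

Because √n is in the denominator, quadrupling the sample size halves the SEM.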
Parameter Estimation
used to estimate a population parameter
–>ex. a mean proportion or a mean difference between two groups
Point estimation
involves calculating a single statistic to estimate the parameter (ex. mean entrance exam score)
-conveys no information about the estimate's margin of error
Interval estimation
indicates a range of values within which the parameter has a specified probability of lying
Confidence Interval
establishes a range of values for the population value and the probability of being right
–>an estimate made with a certain degree of confidence (researchers usually use a 95% or 99% CI)
- reflects how much risk researchers are willing to take of being wrong - depends on the nature of the problem
Confidence Limits
Upper and lower levels of the confidence interval
-involves the SEM
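A sketch of 95% confidence limits built from the mean and the SEM (hypothetical values; the 1.96 multiplier is the normal approximation, and a t multiplier would be more exact for a sample this small):

```python
import math
import statistics

# Hypothetical sample values
sample = [98, 102, 95, 101, 99, 104, 97, 100]

mean = statistics.mean(sample)
sem = statistics.stdev(sample) / math.sqrt(len(sample))

# 95% CI: mean +/- 1.96 standard errors (normal approximation)
lower, upper = mean - 1.96 * sem, mean + 1.96 * sem
print(round(lower, 2), round(upper, 2))  # the confidence limits
```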
Hypothesis Testing
uses objective criteria for deciding whether research hypotheses should be accepted as true or rejected as false
GOAL: use a sample to make inferences about a population
-assume the null hypothesis is true and then gather evidence to disprove it
Statistical Tests
used to help reject null hypotheses
Type I Error
rejecting a null hypothesis that is, in fact, actually true
- false positive conclusion
- reducing the risk of Type I error increases the risk of Type II error
Type II Error
acceptance of a false null hypothesis
- false negative conclusion
- reduce by increasing sample size
Level of Significance
the probability of making a Type I error
- does NOT mean important or meaningful
–>most frequently used levels (alpha) are .05 and .01
ex. .05 level of significance = acceptance that a true null hypothesis would be rejected 5 times out of 100
Power Analysis
estimation of the probability of committing a Type II error (beta)
Power: the ability of a statistical test to detect true relationships; power is the complement of beta (power = 1 - beta)
-acceptable risk for Type II error is .20 (ideally researchers use a sample size that gives a minimum power of .80)
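Power can be sketched by simulation: repeat a hypothetical study many times and count how often the null hypothesis is correctly rejected (all values here are hypothetical; a moderate effect of d = .50 with 50 participants per group yields power of roughly .70, below the usual .80 target):

```python
import math
import random
import statistics

random.seed(42)

def t_stat(a, b):
    # pooled-variance t statistic for two independent groups
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) +
           (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1/na + 1/nb))

n, true_diff, sd, trials = 50, 0.5, 1.0, 2000
critical = 1.984  # two-tailed .05 critical t for df = 98
rejections = 0
for _ in range(trials):
    # simulate one study where a true group difference exists
    a = [random.gauss(true_diff, sd) for _ in range(n)]
    b = [random.gauss(0.0, sd) for _ in range(n)]
    if abs(t_stat(a, b)) > critical:
        rejections += 1

power = rejections / trials  # proportion of studies detecting the true effect
print(round(power, 2))       # roughly .70; beta is roughly .30
```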
Test Statistic
in hypothesis testing, researchers use study data to compute a test statistic
-there is a theoretical distribution to establish probable and improbable values –> used to accept or reject the null hypothesis
Parametric Statistics
Use involves estimation of a parameter; assumes variables are normally distributed in the population; measurements are on interval/ratio scale.
Nonparametric Statistics
Use does not involve estimation of a parameter; measurements typically on nominal or ordinal scale; doesn’t assume normal distribution in the population
Statistically Significant
results are not likely to have been due to chance, at some specified level of probability
Nonsignificant Result
any observed difference/relationship could have been the result of chance fluctuation
How to use hypothesis testing procedures:
- select a test statistic
- specify the level of significance (usually .05)
- compute a test statistic - calculated based on collected data
- determine degrees of freedom - number of observations free to vary
- compare the test statistic to a theoretical value - significant or nonsignificant?
p level
Probability
–>anything greater than .05 indicates a nonsignificant relationship (NS) - could have occurred on the basis of chance in more than 5 of 100 samples
t-test
used to test the significance of differences in two groups
–>can a significant portion of the variation be attributed to the IV?
- uses group means, variability, and sample size
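A sketch of the pooled-variance t statistic computed from its definition, on hypothetical scores for two groups:

```python
import math
import statistics

# Hypothetical outcome scores for two independent groups
group_a = [5.1, 4.8, 6.0, 5.5, 5.2, 4.9]
group_b = [4.2, 4.5, 4.0, 4.8, 4.1, 4.4]

na, nb = len(group_a), len(group_b)
ma, mb = statistics.mean(group_a), statistics.mean(group_b)
va, vb = statistics.variance(group_a), statistics.variance(group_b)

# pooled variance, then t with na + nb - 2 degrees of freedom
sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
t = (ma - mb) / math.sqrt(sp2 * (1/na + 1/nb))
print(round(t, 2))
```

With df = 10, a t this far from 0 exceeds the .05 critical value (about 2.23), so the group difference would be judged statistically significant.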
Analysis of Variance (ANOVA)
used to test mean group differences of 3 or more groups
- sorts out the variability of an outcome variable into two components:
1. variability due to the IV (experimental group status)
2. variability due to all other sources (individual differences, measurement error)
F-ratio
the variation BETWEEN groups is contrasted with variation WITHIN groups
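A sketch of the F-ratio computed by partitioning variability between and within groups (hypothetical data for three groups):

```python
import statistics

# Hypothetical outcome scores for three groups
groups = [
    [5, 6, 7, 6],   # group 1
    [8, 9, 8, 9],   # group 2
    [4, 5, 4, 5],   # group 3
]

k = len(groups)
n = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n

# variability BETWEEN groups (due to the IV) and WITHIN groups (all other sources)
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)

# F = mean square between / mean square within
f_ratio = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(f_ratio, 2))
```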
Post Hoc Tests (multiple comparison procedures)
used to isolate the differences between group means that are responsible for rejecting the overall ANOVA null hypothesis
Repeated Measures ANOVA (RM-ANOVA)
can be used when the means being compared are means at different points in time (ex. mean BP at 2, 4, and 6 hours post-op)
Chi-Squared Test
used to test hypotheses about the proportion of cases in different categories (ex. crosstabulation)
–>computed by summing the differences between the observed frequencies in each cell and expected frequencies (if there were no relationships between variables)
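A sketch of the chi-squared statistic on a hypothetical 2x2 crosstab (intervention vs. control by quit vs. still smoking):

```python
# Hypothetical crosstab: rows = intervention/control, cols = quit/still smoking
observed = [[40, 60],
            [25, 75]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# sum of (observed - expected)^2 / expected over every cell,
# where expected = frequencies if there were no relationship
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand
        chi2 += (obs - expected) ** 2 / expected
print(round(chi2, 2))
```

Here chi-squared exceeds the .05 critical value for df = 1 (3.84), so the proportions would be judged significantly different.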
Null Hypotheses
there is NO relationship between two variables
population r = .00
Effect Size Index
estimates of the magnitude of effects of an “I” component on an “O” component in the PICO questions
-important because even small effects can be statistically significant
d statistic
Effect Size Index
–>summarizes the magnitude of differences in two means (ex. differences between experimental and control group means) on an outcome
- d ≤ .20, small effect
- d = .50, moderate effect
- d ≥ .80, large effect
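A sketch of the d statistic (the mean difference divided by the pooled SD) on hypothetical experimental and control scores:

```python
import math
import statistics

# Hypothetical outcome scores
experimental = [78, 82, 85, 80, 84, 79]
control = [74, 76, 75, 78, 73, 77]

m1, m2 = statistics.mean(experimental), statistics.mean(control)
v1, v2 = statistics.variance(experimental), statistics.variance(control)

# pooled SD (equal group sizes), then d = mean difference in SD units
pooled_sd = math.sqrt((v1 + v2) / 2)
d = (m1 - m2) / pooled_sd
print(round(d, 2))  # well above .80, a large effect
```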
Multivariate statistics
analyses dealing with at least 3 variables simultaneously
Multiple Regression
To test the relationship between 2+ IVs and 1 DV OR to predict a DV from 2+ IVs
-outcome variables are interval or ratio level variables
Multiple Correlation Coefficient (R)
- NO negative values, varies from .00 to 1.00
–>shows the strength of the relationship between several IVs and an outcome (but NOT direction)
R squared
interpreted as the proportion of the variability in the outcome variable that is explained by the predictors
-used over R alone
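Multiple regression with several IVs requires matrix algebra, but the meaning of R squared can be sketched with a single predictor on hypothetical data: R² = 1 − SS_residual / SS_total, the proportion of outcome variability the predictor explains.

```python
# Hypothetical predictor and outcome values
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# least-squares line for the single predictor
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx

# R^2 = 1 - (unexplained variability / total variability)
ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
ss_tot = sum((b - my) ** 2 for b in y)
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 4))  # close to 1: nearly all variability explained
```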
Analysis of Covariance (ANCOVA)
To test the difference between the means of 2+ groups while controlling for 1+ covariate
–>used to control confounding variables statistically - to "equalize" the groups being compared
- powerful; produces an F statistic
Multivariate Analysis of Variance (MANOVA)
To test the difference between the means of 2+ groups for 2+ DVs simultaneously
-tests the significance of differences between the means of two or more groups on two or more outcome variables considered simultaneously
Ex. comparing the effect of two exercise regimens on HR and BP
Logistic Regression
To test the relationship between 2+ IVs and 1 DV, to predict the probability of an event, or to estimate relative risk
- transforms the probability of an event occurring (ex. that a woman will practice breast self-examination or not) into its odds
- odds ratio: the factor by which the odds change for a unit change in a predictor, after controlling for other predictors
–>yields CIs around the OR
–>ex. identifying various risk factors (parent edu, children’s use of computers) for childhood obesity (obese vs. not obese) in a sample of 1644 Korean children
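A sketch of the odds/odds-ratio arithmetic behind logistic regression (the coefficient value below is hypothetical, not from any fitted model):

```python
import math

# Logistic regression models log-odds: logit(p) = b0 + b1*x1 + ...
# exp(b1) is the odds ratio for a one-unit increase in that predictor.
b1 = 0.693  # hypothetical fitted coefficient

odds_ratio = math.exp(b1)
print(round(odds_ratio, 2))  # the odds roughly double per unit increase

# converting a probability into odds (the transformation the model uses)
p = 0.25
odds = p / (1 - p)  # 1 "success" for every 3 "failures"
print(round(odds, 3))
```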
Multivariate Analysis of Covariance (MANCOVA)
To test the difference between the means of 2+ groups for 2+ DVs simultaneously, while controlling 1+ covariate
p < __ vs. p > __
p < .05: less than - results ARE statistically significant
p > .05: more than - results are NOT statistically significant