Chapter 12 - Statistical Analysis of Quantitative Data Flashcards

1
Q

Descriptive Statistics

A

Used to synthesize and describe data

2
Q

Parameters

A

Indexes (ex. averages, percentages) calculated with data from a population

3
Q

Statistic

A

Descriptive index from a sample

4
Q

Inferential Statistics

A

Used to help make inferences about the population

5
Q

Frequency distribution

A

Systematic arrangement of values from lowest to highest, together with a count or percentage of how many times each value occurred

  • easy to see highest and lowest scores, most common scores, where data cluster, and how many patients were in the sample
  • can be displayed in a “frequency polygon,” where scores are graphed on the horizontal axis and frequencies on the vertical axis
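
A frequency distribution can be tallied in a few lines. This is a minimal sketch in Python; the pain scores and the helper name `frequency_distribution` are made up for illustration.

```python
from collections import Counter

# Hypothetical pain scores (0-10 scale) for 12 patients; illustrative data only.
scores = [3, 5, 5, 2, 7, 5, 3, 8, 5, 3, 2, 7]

def frequency_distribution(values):
    """Return (value, count, percentage) rows ordered from lowest to highest value."""
    counts = Counter(values)
    n = len(values)
    return [(v, counts[v], round(100 * counts[v] / n, 1)) for v in sorted(counts)]

table = frequency_distribution(scores)
print("score count percent")
for value, count, pct in table:
    print(f"{value:>5} {count:>5} {pct:>6}%")
```

The most common score (5) and the lowest and highest scores (2 and 8) can be read straight from the table.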
6
Q

Symmetric Distribution

A

occurs if, when folded over, the two halves of a frequency polygon would line up

7
Q

Positive Skew

A

when the longer tail points to the right

–>ex. personal income

8
Q

Negative Skew

A

when the longer tail points to the left

–>ex. age of death

9
Q

Unimodal vs. Multi-modal

A

one peak vs. multiple peaks

10
Q

Normal distribution

A

“bell-shaped curve”

  • symmetrical
  • unimodal
  • not very peaked

–>ex. height, intelligence

11
Q

Central Tendency

A

measures of “typicalness”

  1. mode
  2. median
  3. mean
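
The three indexes can be compared on one small dataset. This is a sketch using Python's standard `statistics` module; the scores are hypothetical, with one extreme value added so the distribution is positively skewed.

```python
import statistics

# Hypothetical scores, illustrative only; the single high value (40) skews the data.
scores = [2, 3, 3, 4, 5, 6, 40]

mode = statistics.mode(scores)      # most frequent value
median = statistics.median(scores)  # middle value of the sorted scores
mean = statistics.mean(scores)      # sum of values divided by the count

print(mode, median, mean)
```

Here the mean (9) is pulled toward the extreme score, while the median (4) stays near the bulk of the data.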
12
Q

Mode

A

number that occurs most frequently in the distribution

13
Q

Median

A

the point in a distribution that divides the scores in half; the middle point
-preferred when data are highly skewed

14
Q

Mean

A

the sum of all values divided by the number of participants
“average”
-most stable index

15
Q

Variability

A

how spread out the data are; two distributions with identical means can still differ in spread and shape

16
Q

Range

A

highest score minus the lowest score in a distribution

  • easy to compute
  • unstable
  • “gross descriptive index”
17
Q

Standard Deviation

A

summarizes the AVERAGE amount of deviation of values from the mean

  • most widely used variability index
  • calculated based on every value in the distribution
  • about 3 SDs above and 3 SDs below the mean cover nearly all values in a normal/near-normal distribution
  • lower SD = more homogeneous

+/- 1 SD: 68% of data
+/- 2 SD: 95% of data
+/- 3 SD: 99.7% of data
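
Two samples with the same mean can have very different variability. This is a minimal sketch with made-up data; it assumes the sample SD (n - 1 in the denominator), which is what `statistics.stdev` computes.

```python
import statistics

# Two hypothetical samples with the same mean (50) but different spread.
homogeneous = [48, 49, 50, 51, 52]
heterogeneous = [30, 40, 50, 60, 70]

def describe(values):
    """Return the mean, the range, and the sample SD (n - 1 denominator)."""
    value_range = max(values) - min(values)
    return statistics.mean(values), value_range, statistics.stdev(values)

mean_1, range_1, sd_1 = describe(homogeneous)
mean_2, range_2, sd_2 = describe(heterogeneous)

print(mean_1, range_1, round(sd_1, 2))
print(mean_2, range_2, round(sd_2, 2))
```

The lower SD belongs to the more homogeneous sample, even though both means are identical.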

18
Q

Crosstabs (contingency) table

A

Two-dimensional frequency distribution in which the frequencies of two variables are crosstabulated
–>ex. differentiating between men and women in categories of non-smoker, light smoker, and heavy smoker
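
A crosstab is a count of cases in each combination of categories. This sketch uses invented records for the smoking example; the variable names are illustrative.

```python
from collections import Counter

# Hypothetical (sex, smoking status) records, illustrative only.
records = [
    ("women", "non-smoker"), ("women", "non-smoker"), ("women", "light smoker"),
    ("women", "heavy smoker"), ("men", "non-smoker"), ("men", "light smoker"),
    ("men", "light smoker"), ("men", "heavy smoker"),
]

rows = ["women", "men"]
cols = ["non-smoker", "light smoker", "heavy smoker"]

# Count how many records fall in each (row, column) cell.
cell_counts = Counter(records)
crosstab = {r: {c: cell_counts[(r, c)] for c in cols} for r in rows}

for r in rows:
    print(r, [crosstab[r][c] for c in cols])
```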

19
Q

Correlation

A

To what extent are two variables related to each other?

–>ex. anxiety scores and BP

20
Q

Correlation Coefficient

A

calculation that describes intensity and direction of a relationship
-how “perfect” a relationship is (ex. tallest person also weighs the most)

21
Q

Positive Correlation

A

when an increase in one variable is accompanied by an increase in the other (r from .01 to 1.00)

22
Q

Negative (Inverse) Correlation

A

when an increase in one variable is accompanied by a decrease in the other (r from -.01 to -1.00)
–>ex. depression and self-esteem

23
Q

Pearson’s r

product-moment correlation coefficient

A

computed with interval or ratio measurements
-no clear guidelines for interpretation

  1. descriptive: summarizes the magnitude and direction of a relationship between two variables
  2. inferential: tests hypotheses about population correlations
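
For the descriptive use, r can be computed directly from the deviations of each variable from its mean. This is a minimal sketch; the anxiety and BP values are hypothetical and `pearson_r` is an illustrative helper name.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical anxiety scores and systolic BP for five patients.
anxiety = [10, 12, 15, 18, 20]
systolic_bp = [110, 118, 121, 130, 136]

r = pearson_r(anxiety, systolic_bp)
print(round(r, 2))
```

A value close to +1.00 indicates a strong positive relationship; a perfectly inverse pair such as `[1, 2, 3]` and `[6, 4, 2]` gives -1.00.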
24
Q

Correlation Matrix

A

variables are displayed in both rows and columns

25
Absolute Risk
the proportion of people who experienced an undesirable outcome in each group
26
Absolute Risk Reduction Index
comparison of the two risks
-->computed by subtracting the absolute risk for the exposed group from the absolute risk for the unexposed group
-the proportion of people who would be spared the undesirable outcome through exposure to an intervention/protective factor
27
Odds Ratio
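the ratio of the odds of the adverse outcome in one group to the odds in the other group
-odds: the proportion of people with the adverse outcome relative to those without it
-->ex. those who continued smoking DIVIDED BY those who stopped smoking with the intervention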
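
The risk indexes above can be worked through on a 2 x 2 table. This is a sketch with invented counts for a smoking-cessation example: the exposed group received the intervention and "bad" means continued smoking.

```python
def risk_indexes(exposed_bad, exposed_good, unexposed_bad, unexposed_good):
    """Absolute risks, absolute risk reduction, and odds ratio from 2 x 2 counts."""
    ar_exposed = exposed_bad / (exposed_bad + exposed_good)
    ar_unexposed = unexposed_bad / (unexposed_bad + unexposed_good)
    # ARR: risk in the unexposed group minus risk in the exposed group.
    arr = ar_unexposed - ar_exposed
    # Odds: those with the adverse outcome divided by those without it.
    odds_exposed = exposed_bad / exposed_good
    odds_unexposed = unexposed_bad / unexposed_good
    return {
        "ar_exposed": ar_exposed,
        "ar_unexposed": ar_unexposed,
        "arr": arr,
        "odds_ratio": odds_exposed / odds_unexposed,
    }

# Hypothetical counts: 50 of 100 exposed and 80 of 100 unexposed kept smoking.
result = risk_indexes(50, 50, 80, 20)
print(result)
```

With these counts the ARR is about .30, and the odds of continued smoking in the exposed group are one quarter of the odds in the unexposed group.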
28
Inferential Statistics
based on the laws of probability
-->provide a means for drawing conclusions about a population given the data from a sample
-assume that the population has been randomly sampled
29
Sampling Distribution of the Mean
Thinking about what would happen if you could draw many samples from the same population and graph their means
-would be able to plot the sample means and compare them
Not necessary in practice because...
  1. sampling distributions of means are normally distributed
  2. the mean of a sampling distribution equals the population mean
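
The thought experiment can be simulated. This is a rough sketch: the population (scores 1 to 100), the sample size, and the number of samples are arbitrary choices, and samples are drawn with replacement.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

population = list(range(1, 101))  # hypothetical population of scores 1..100
sample_size = 25
n_samples = 2000

# Draw many random samples and record the mean of each one.
sample_means = [
    statistics.mean(random.choices(population, k=sample_size))
    for _ in range(n_samples)
]

population_mean = statistics.mean(population)
mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
# Population SD divided by the square root of the sample size.
theoretical_sem = statistics.pstdev(population) / sample_size ** 0.5

print(population_mean, round(mean_of_means, 2))
print(round(sd_of_means, 2), round(theoretical_sem, 2))
```

The mean of the sample means should land close to the population mean, and the spread of the sample means should be close to the population SD divided by the square root of the sample size.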
30
Standard Error of the Mean
the standard deviation of a sampling distribution of the mean; an index of the error in a sample mean as an estimate of the true population mean
-lower SEM = more accurate the mean is as an estimate of the population value
-larger sample = smaller SEM
31
Parameter Estimation
used to estimate a population parameter
-->ex. a mean, a proportion, or a mean difference between two groups
32
Point estimation
involves calculating a single statistic to estimate the parameter (ex. mean entrance exam score)
-conveys no information about the estimate's margin of error
33
Interval estimation
indicates a range of values within which the parameter has a specified probability of lying
34
Confidence Interval
establishes a range of values for the population value and the probability of being right
-->an estimate made with a certain degree of confidence (researchers usually use a 95% or 99% CI)
-reflects how much risk researchers are willing to take of being wrong
-depends on the nature of the problem
35
Confidence Limits
Upper and lower limits of the confidence interval
-computed using the SEM
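
Confidence limits can be built from the sample mean and the SEM. This is a sketch under stated assumptions: the exam scores are hypothetical, SEM is taken as SD divided by the square root of n, and the 95% limits use the normal-curve multiplier 1.96 (with a sample this small, a t-based multiplier would give a somewhat wider interval).

```python
import math
import statistics

# Hypothetical entrance exam scores for 10 students.
scores = [72, 75, 78, 80, 81, 84, 85, 88, 90, 97]

n = len(scores)
mean = statistics.mean(scores)   # point estimate of the population mean
sd = statistics.stdev(scores)    # sample SD (n - 1 denominator)
sem = sd / math.sqrt(n)          # standard error of the mean

z = 1.96  # assumption: normal-curve multiplier for a 95% CI
lower_limit = mean - z * sem
upper_limit = mean + z * sem

print(mean, round(sem, 2), round(lower_limit, 2), round(upper_limit, 2))
```

The point estimate alone (the mean of 83) says nothing about precision; the limits add the margin of error.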
36
Hypothesis Testing
uses objective criteria for deciding whether research hypotheses should be accepted as true or rejected as false
GOAL: use a sample to make inferences about a population
-assume the null hypothesis is true and then gather evidence to reject it
37
Statistical Tests
used to help reject null hypotheses
38
Type I Error
rejecting a null hypothesis that is, in fact, true
*false positive conclusion
-reducing the risk of a Type I error increases the risk of a Type II error
39
Type II Error
accepting a false null hypothesis
*false negative conclusion
-reduce by increasing sample size
40
Level of Significance
the probability of making a Type I error
-does NOT mean important or meaningful
-->most frequently used levels (alpha) are .05 and .01
ex. .05 level of significance = acceptance that a true null hypothesis would be rejected in 5 out of 100 samples
41
Power Analysis
estimation of the probability of committing a Type II error (beta)
Power: ability of a statistical test to detect true relationships; the complement of beta (1 - beta)
-acceptable risk for a Type II error is .20 (researchers ideally use a sample size that gives a minimum power of .80)
42
Test Statistic
in hypothesis testing, researchers use study data to compute a test statistic
-a theoretical distribution establishes probable and improbable values
-->used to accept or reject the null hypothesis
43
Parametric Statistics
Use involves estimation of a parameter; assumes variables are normally distributed in the population; measurements are on interval/ratio scale.
44
Nonparametric Statistics
Use does not involve estimation of a parameter; measurements typically on nominal or ordinal scale; doesn’t assume normal distribution in the population
45
Statistically Significant
results are not likely to have been due to chance, at some specified level of probability
46
Nonsignificant Result
any observed difference/relationship could have been the result of chance fluctuation
47
How to use hypothesis testing procedures:
  1. select a test statistic
  2. specify the level of significance (usually .05)
  3. compute a test statistic - calculated based on collected data
  4. determine degrees of freedom - number of observations free to vary
  5. compare the test statistic to a theoretical value - significant or nonsignificant?
48
p level
Probability
-->anything greater than .05 indicates a nonsignificant relationship (NS)
-could have occurred on the basis of chance in more than 5/100 samples
49
t-test
used to test the significance of the difference between two group means
-->can a significant portion of the variation be attributed to the IV?
-uses group means, variability, and sample size
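
A t statistic for two independent groups can be computed from exactly those ingredients. This is a sketch assuming the pooled-variance (equal variances) form of the test; the data are invented, and the critical value 2.306 is the tabled two-tailed value for df = 8 at alpha = .05.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se_diff, df

# Hypothetical outcome scores for an experimental and a control group.
experimental = [24, 26, 28, 30, 32]
control = [18, 20, 22, 24, 26]

t, df = independent_t(experimental, control)
critical_t = 2.306  # tabled value: df = 8, alpha = .05, two-tailed
significant = abs(t) > critical_t

print(round(t, 2), df, significant)
```

The computed t (3.0) exceeds the tabled value, so the null hypothesis of no difference would be rejected at the .05 level.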
50
Analysis of Variance (ANOVA)
used to test mean group differences of 3 or more groups
-sorts out the variability of an outcome variable into two components:
  1. variability due to the IV (experimental group status)
  2. variability due to all other sources (individual differences, measurement error)
51
F-ratio
the variation BETWEEN groups is contrasted with variation WITHIN groups
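
The contrast can be made concrete by dividing the between-groups mean square by the within-groups mean square. This is a minimal sketch for a one-way ANOVA with made-up scores for three groups.

```python
import statistics

def f_ratio(groups):
    """One-way ANOVA F-ratio with its between- and within-groups degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    # Variation BETWEEN groups: group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation WITHIN groups: scores around their own group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical outcome scores for three treatment groups.
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]

f, df_between, df_within = f_ratio(groups)
print(round(f, 2), df_between, df_within)
```

A large F means the group means differ by much more than the scores vary inside each group.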
52
Post Hoc Tests (multiple comparison procedures)
used to isolate the differences between group means that are responsible for rejecting the overall ANOVA null hypothesis
53
Repeated Measures ANOVA (RM-ANOVA)
can be used when the means being compared are means at different points in time (ex. mean BP at 2, 4, 6 hours post-op)
54
Chi-Squared Test
used to test hypotheses about the proportion of cases in different categories (ex. crosstabulation)
-->computed by summing, across cells, the squared differences between the observed and expected frequencies, each divided by the expected frequency (expected = the frequencies if there were no relationship between the variables)
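
The computation can be sketched for a small crosstab. The observed counts are hypothetical; expected frequencies come from the row and column totals, and 3.841 is the tabled critical value for df = 1 at alpha = .05.

```python
def chi_squared(observed):
    """Chi-squared statistic and degrees of freedom for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Frequency expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Hypothetical 2 x 2 crosstab: rows are groups, columns are outcome categories.
observed = [[30, 20], [20, 30]]

chi2, df = chi_squared(observed)
critical_value = 3.841  # tabled value: df = 1, alpha = .05
print(chi2, df, chi2 > critical_value)
```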
55
Null Hypothesis
there is NO relationship between two variables (ex. population r = .00)
56
Effect Size Index
estimates of the magnitude of the effect of the "I" component on the "O" component in PICO questions
-important because even small effects can be statistically significant
57
d statistic
Effect Size Index
-->summarizes the magnitude of the difference between two means (ex. experimental and control group means) on an outcome
-d ≤ .20, small effect
-d = .50, moderate effect
-d ≥ .80, large effect
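
One common way to compute d is the difference between the two means divided by the pooled standard deviation. The card does not give the formula, so that formula is an assumption here, and the data are invented.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Mean difference divided by the pooled SD (assumed formula for d)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical outcome scores for an experimental and a control group.
experimental = [12, 14, 16, 18, 20]
control = [10, 12, 14, 16, 18]

d = cohens_d(experimental, control)
print(round(d, 2))
```

With these numbers d is about .63, which falls between the moderate and large benchmarks.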
58
Multivariate statistics
analyses dealing with at least 3 variables simultaneously
59
Multiple Regression
To test the relationship between 2+ IVs and 1 DV OR to predict a DV from 2+ IVs
-outcome variables are interval or ratio level variables
60
Multiple Correlation Coefficient (R)
-NO negative values, varies from .00 to 1.00
-->shows the strength of the relationship between several IVs and an outcome (but NOT direction)
61
R squared
interpreted as the proportion of the variability in the outcome variable that is explained by the predictors -used over R alone
62
Analysis of Covariance (ANCOVA)
To test the difference between the means of 2+ groups while controlling for 1+ covariate
-->used to control confounding variables statistically
-"equalizes" the groups being compared
-powerful; produces F statistics
63
Multivariate Analysis of Variance (MANOVA)
To test the significance of differences between the means of 2+ groups on 2+ DVs considered simultaneously
Ex. comparing the effect of two exercise regimens on HR and BP
64
Logistic Regression
To test the relationship between 2+ IVs and 1 DV, to predict the probability of an event, to estimate relative risk
-transforms the probability of an event occurring (ex. that a woman will practice breast self-examination or not) into its odds
-odds ratio: factor by which the odds change for a unit change in a predictor, after controlling for other predictors
-->yields CIs around the OR
-->ex. identifying various risk factors (parent edu, children's use of computers) for childhood obesity (obese vs. not obese) in a sample of 1644 Korean children
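
The probability-to-odds transformation can be shown with a little arithmetic. This is a sketch, not a fitted model: the probability and the coefficient `b` are hypothetical, and it relies on the standard result that exponentiating a logistic regression coefficient gives the odds ratio for a one-unit change in that predictor.

```python
import math

def probability_to_odds(p):
    """Odds of an event: probability it occurs divided by probability it does not."""
    return p / (1 - p)

def odds_to_probability(odds):
    """Convert odds back to a probability."""
    return odds / (1 + odds)

p = 0.75                      # hypothetical probability of the event
odds = probability_to_odds(p)
log_odds = math.log(odds)     # the scale on which logistic regression is linear

b = 0.693                     # hypothetical coefficient for one predictor
odds_ratio = math.exp(b)      # factor by which the odds change per unit of that predictor

print(odds, round(log_odds, 3), round(odds_ratio, 2))
```

Here a probability of .75 corresponds to odds of 3 to 1, and a coefficient of about .69 roughly doubles the odds for each one-unit increase in the predictor.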
65
Multivariate Analysis of Covariance (MANCOVA)
To test the difference between the means of 2+ groups for 2+ DVs simultaneously, while controlling 1+ covariate
66
p < __ vs. p > __
p < __: less than (results ARE statistically significant)
p > __: more than (results are NOT statistically significant)