Chapter 12 - Statistical Analysis of Quantitative Data Flashcards

1
Q

Descriptive Statistics

A

Used to synthesize and describe data

2
Q

Parameters

A

Descriptive indexes (averages, percentages) calculated with data from an entire population

3
Q

Statistic

A

Descriptive index from a sample

4
Q

Inferential Statistics

A

Used to help make inferences about the population

5
Q

Frequency distribution

A

Systematic arrangement of values from lowest to highest, together with a count or percentage of how many times each value occurred

  • easy to see highest and lowest scores, most common scores, where data clusters, and how many patients were in the sample
  • can be displayed in a “frequency polygon” where scores are graphed on horizontal line and frequency on vertical line
6
Q

Symmetric Distribution

A

occurs if, when folded over, the two halves of a frequency polygon would line up

7
Q

Positive Skew

A

when the longer tail points to the right

–>ex. personal income

8
Q

Negative Skew

A

when the longer tail points to the left

–>ex. age of death

9
Q

Unimodal vs. Multi-modal

A

one peak vs. multiple peaks

10
Q

Normal distribution

A

“bell-shaped curve”

  • symmetrical
  • unimodal
  • not very peaked

–>ex. height, intelligence

11
Q

Central Tendency

A

measures of “typicalness”

  1. mode
  2. median
  3. mean
12
Q

Mode

A

number that occurs most frequently in the distribution

13
Q

Median

A

the point in a distribution that divides the scores in half; the middle value
-preferred when data are highly skewed

14
Q

Mean

A

the sum of all values divided by the number of participants
“average”
-most stable index

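A quick illustration (not from the chapter) of all three central tendency indexes, computed in Python with made-up scores:

```python
# Illustration only: mode, median, and mean for a small, made-up set of scores.
from statistics import mean, median, mode

scores = [2, 3, 3, 4, 5, 6, 9]   # hypothetical data
print(mode(scores))    # 3   -> most frequent value
print(median(scores))  # 4   -> middle value
print(mean(scores))    # 4.57... -> the "average"
```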
15
Q

Variability

A

how spread out the data are; two distributions with identical means can still differ in shape and spread

16
Q

Range

A

highest score minus the lowest score in a distribution

  • easy to compute
  • unstable
  • “gross descriptive index”
17
Q

Standard Deviation

A

summarizes the AVERAGE amount of deviation of values from the mean

  • most widely used variability index
  • calculated based on every value in the distribution
  • in a normal/near-normal distribution, virtually all values fall within 3 SDs above and below the mean
  • lower SD = more homogeneous

+/- 1 SD: 68% of data
+/- 2 SD: 95% of data
+/- 3 SD: 99.7% of data

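A small sketch (not from the chapter) of the SD and the 68/95 coverage, in Python with made-up scores:

```python
# Illustration only: sample SD and the 68-95-99.7 rule with made-up scores.
import statistics

scores = [4, 5, 5, 6, 6, 6, 7, 7, 8]     # hypothetical data
m = statistics.mean(scores)              # 6.0
sd = statistics.stdev(scores)            # sample SD (n - 1 in the denominator)
print(f"mean = {m}, SD = {sd:.2f}")
print(f"~68% of a normal distribution falls in [{m - sd:.2f}, {m + sd:.2f}]")
print(f"~95% falls in [{m - 2*sd:.2f}, {m + 2*sd:.2f}]")
```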
18
Q

Crosstabs (contingency) table

A

Two-dimensional frequency distribution in which the frequencies of two variables are crosstabulated
–>ex. differentiating between men and women in categories of non-smoker, light smoker, and heavy smoker

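A minimal sketch of building such a table (assuming the pandas package is available), with made-up records:

```python
# Illustration only: a sex-by-smoking-status contingency table from made-up data.
import pandas as pd

df = pd.DataFrame({
    "sex":     ["M", "M", "F", "F", "F", "M", "F", "M"],
    "smoking": ["non", "heavy", "non", "light", "non", "light", "heavy", "non"],
})
print(pd.crosstab(df["sex"], df["smoking"]))   # counts in each cell
```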
19
Q

Correlation

A

To what extent are two variables related to each other?

–>ex. anxiety scores and BP

20
Q

Correlation Coefficient

A

calculation that describes intensity and direction of a relationship
-how “perfect” a relationship is (ex. tallest person also weighs the most)

21
Q

Positive Correlation

A

when an increase in one variable is accompanied by an increase in the other (.01 to 1.00)

22
Q

Negative (Inverse) Correlation

A

when a decrease in one variable leads to an increase in the other (-.01 to -1.00)
–>ex. depression and self-esteem

23
Q

Pearson’s r

product-moment correlation coefficient

A

computed with interval or ratio measurements
-no clear guidelines for interpretation

  1. descriptive: summarizes the magnitude and direction of a relationship between two variables
  2. inferential: tests hypotheses about population correlations
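A minimal sketch (assuming SciPy is available) of both uses at once, with made-up interval-level values:

```python
# Illustration only: Pearson's r (descriptive) and its p-value (inferential)
# for made-up anxiety scores and systolic BP readings.
from scipy import stats

anxiety = [10, 12, 15, 18, 20, 25]
systolic_bp = [118, 120, 126, 130, 135, 142]
r, p_value = stats.pearsonr(anxiety, systolic_bp)
print(f"r = {r:.2f}, p = {p_value:.3f}")   # magnitude/direction plus a significance test
```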
24
Q

Correlation Matrix

A

variables are displayed in both rows and columns, with the correlation coefficient for each pair shown where a row and column intersect

25
Q

Absolute Risk

A

the proportion of people who experienced an undesirable outcome in each group

26
Q

Absolute Risk Reduction Index

A

comparison of the two risks

  • computed by subtracting the absolute risk for the exposed group from the absolute risk for the unexposed group
  • it is the proportion of people who would be spared the undesirable outcome through exposure to an intervention/protective factor
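A worked sketch with made-up counts (not from the chapter):

```python
# Illustration only: absolute risk per group and the absolute risk reduction.
exposed_bad, exposed_n = 10, 100       # hypothetical intervention group
unexposed_bad, unexposed_n = 25, 100   # hypothetical control group

ar_exposed = exposed_bad / exposed_n        # 0.10
ar_unexposed = unexposed_bad / unexposed_n  # 0.25
arr = ar_unexposed - ar_exposed             # 0.15 -> 15% spared the outcome
print(ar_exposed, ar_unexposed, arr)
```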
27
Q

Odds Ratio

A

the ratio of the odds for two groups, where the odds = the proportion of people with the adverse outcome relative to those without it

–>ex. the odds of continued smoking with the intervention DIVIDED BY the odds of continued smoking without it

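A worked sketch with made-up counts from a 2 x 2 crosstab:

```python
# Illustration only: odds in each group, then the odds ratio.
treated_bad, treated_good = 20, 80    # hypothetical: still smoking vs. quit, intervention
control_bad, control_good = 40, 60    # hypothetical: still smoking vs. quit, control

odds_treated = treated_bad / treated_good   # 0.25
odds_control = control_bad / control_good   # 0.67
odds_ratio = odds_treated / odds_control    # 0.375 -> lower odds with the intervention
print(odds_ratio)
```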
28
Q

Inferential Statistics

A

based on the laws of probability

  • provide a means for drawing conclusions about a population given the data from a sample
  • assume that the population has been randomly sampled
29
Q

Sampling Distribution of the Mean

A

A theoretical distribution: what would happen if you drew many samples of the same size from the same population, computed each sample's mean, and graphed those means
-would show how much sample means vary from one another

Actually drawing repeated samples is not necessary because…

  1. sampling distributions of means are normally distributed
  2. the mean of a sampling distribution equals the original population mean
30
Q

Standard Error of the Mean

A

the standard deviation of the error in the sample mean with respect to the true mean
-lower SEM = more accurate the mean is as an estimate of the population value

-larger sample = less deviation of sample means from the population mean (smaller SEM)

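A minimal sketch of the formula SEM = SD / sqrt(n), with made-up values:

```python
# Illustration only: the same SD with a larger sample gives a smaller SEM.
import math

sd = 10.0
print(sd / math.sqrt(25))    # n = 25  -> SEM = 2.0
print(sd / math.sqrt(100))   # n = 100 -> SEM = 1.0
```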
31
Q

Parameter Estimation

A

used to estimate a population parameter

–>ex. a mean, a proportion, or a mean difference between two groups

32
Q

Point estimation

A

involves calculating a single statistic to estimate the parameter (ex. mean entrance exam score)
-conveys no information about the estimate's margin of error

33
Q

Interval estimation

A

indicates a range of values within which the parameter has a specified probability of lying

34
Q

Confidence Interval

A

establishes a range of values for the population value and the probability of being right

  • an estimate made with a certain degree of confidence (researchers usually use a 95% or 99% CI)
  • reflects how much risk researchers are willing to take of being wrong - depends on the nature of the problem
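A minimal sketch of a 95% CI for a mean (mean ± 1.96 × SEM, assuming a normal sampling distribution), with made-up summary values:

```python
# Illustration only: 95% confidence limits around a sample mean.
import math

mean, sd, n = 100.0, 15.0, 36      # hypothetical sample statistics
sem = sd / math.sqrt(n)            # 2.5
lower = mean - 1.96 * sem          # ~95.1  (lower confidence limit)
upper = mean + 1.96 * sem          # ~104.9 (upper confidence limit)
print(lower, upper)
```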
35
Q

Confidence Limits

A

Upper and lower levels of the confidence interval

-involves the SEM

36
Q

Hypothesis Testing

A

uses objective criteria for deciding whether research hypotheses should be accepted as true or rejected as false
GOAL: use a sample to make inferences about a population
-assume the hypothesis is true and then gather evidence to disprove it

37
Q

Statistical Tests

A

used to help reject null hypotheses

38
Q

Type I Error

A

rejecting a null hypothesis that is actually true

  • false positive conclusion
  • reducing the risk of a Type I error increases the risk of a Type II error
39
Q

Type II Error

A

acceptance of a false null hypothesis

  • false negative conclusion
  • reduce by increasing sample size
40
Q

Level of Significance

A

the probability of making a Type I error

  • does NOT mean important or meaningful
  • most frequently used levels (alpha) are .05 and .01
    ex. a .05 level of significance = accepting that a true null hypothesis would be rejected in 5 out of 100 samples
41
Q

Power Analysis

A

estimation of the probability of committing a Type II error (beta)

Power: the ability of a statistical test to detect true relationships; power is the complement of beta (1 - beta)

-acceptable risk for a Type II error is .20 (researchers ideally use a sample size that gives them a minimum power of .80)

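A minimal sketch (assuming the statsmodels package is available) of estimating the sample size needed for power = .80:

```python
# Illustration only: n per group for a two-group t-test, d = .50, alpha = .05, power = .80.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_per_group))   # roughly 64 participants per group
```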
42
Q

Test Statistic

A

in hypothesis testing, researchers use study data to compute a test statistic
-each test statistic has a theoretical distribution that establishes probable and improbable values –> used to accept or reject the null hypothesis

43
Q

Parametric Statistics

A

Use involves estimation of a parameter; assumes variables are normally distributed in the population; measurements are on interval/ratio scale.

44
Q

Nonparametric Statistics

A

Use does not involve estimation of a parameter; measurements typically on nominal or ordinal scale; doesn’t assume normal distribution in the population

45
Q

Statistically Significant

A

results are not likely to have been due to chance, at some specified level of probability

46
Q

Nonsignificant Result

A

any observed difference/relationship could have been the result of chance fluctuation

47
Q

How to use hypothesis testing procedures:

A
  1. select a test statistic
  2. specify the level of significance (usually .05)
  3. compute a test statistic - calculated based on collected data
  4. determine degrees of freedom - number of observations free to vary
  5. compare the test statistic to a theoretical value - significant or nonsignificant?
48
Q

p level

A

Probability
–>anything greater than .05 indicates a nonsignificant relationship (NS) - could have occurred on the basis of chance in more than 5/100 samples

49
Q

t-test

A

used to test the significance of differences in two groups

  • can a significant portion of the variation be attributed to the IV?
  • uses group means, variability, and sample size
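A minimal sketch (assuming SciPy is available) with made-up scores for two groups:

```python
# Illustration only: independent-groups t-test on made-up outcome scores.
from scipy import stats

experimental = [52, 55, 58, 60, 62, 65]
control = [48, 50, 51, 53, 55, 57]
t, p = stats.ttest_ind(experimental, control)
print(f"t = {t:.2f}, p = {p:.3f}")   # small p -> difference unlikely to be due to chance
```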
50
Q

Analysis of Variance (ANOVA)

A

used to test mean group differences of 3 or more groups

  • sorts out the variability of an outcome variable into two components:
    1. variability due to the IV (experimental group status)
    2. variability due to all other sources (individual differences, measurement error)
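A minimal sketch (assuming SciPy is available) comparing three made-up groups:

```python
# Illustration only: one-way ANOVA across three groups.
from scipy import stats

group_a = [5, 6, 7, 8, 9]
group_b = [7, 8, 9, 10, 11]
group_c = [9, 10, 11, 12, 13]
f_ratio, p = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_ratio:.2f}, p = {p:.4f}")
```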
51
Q

F-ratio

A

the variation BETWEEN groups is contrasted with variation WITHIN groups

52
Q

Post Hoc Tests (multiple comparison procedures)

A

used to isolate the differences between group means that are responsible for rejecting the overall ANOVA null hypothesis

53
Q

Repeated Measures ANOVA (RM-ANOVA)

A

can be used when the means being compared are means at different points in time (ex. mean BP at 2, 4, 6 hours post-op)

54
Q

Chi-Squared Test

A

used to test hypotheses about the proportion of cases in different categories (ex. crosstabulation)
–>computed by summing the differences between the observed frequencies in each cell and expected frequencies (if there were no relationships between variables)

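A minimal sketch (assuming SciPy is available) on a made-up 2 x 2 crosstab:

```python
# Illustration only: chi-squared test of observed vs. expected cell frequencies.
from scipy import stats

observed = [[20, 80],    # hypothetical: intervention group, outcome vs. no outcome
            [40, 60]]    # hypothetical: control group, outcome vs. no outcome
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
```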
55
Q

Null Hypotheses

A

there is NO relationship between two variables

population r = .00

56
Q

Effect Size Index

A

estimates of the magnitude of effects of an “I” component on an “O” component in the PICO questions
-important because even small effects can be statistically significant in large samples

57
Q

d statistic

A

Effect Size Index
–>summarizes the magnitude of differences in two means (ex. differences between experimental and control group means) on an outcome

  • d ≤ .20, small effect
  • d = .50, moderate effect
  • d ≥ .80, large effect
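A worked sketch of d = (difference in means) / (pooled SD), using made-up group summaries:

```python
# Illustration only: Cohen's d from two groups' means, SDs, and sizes.
import math

mean_exp, sd_exp, n_exp = 58.0, 10.0, 50    # hypothetical experimental group
mean_ctl, sd_ctl, n_ctl = 52.0, 12.0, 50    # hypothetical control group

pooled_sd = math.sqrt(((n_exp - 1) * sd_exp**2 + (n_ctl - 1) * sd_ctl**2)
                      / (n_exp + n_ctl - 2))
d = (mean_exp - mean_ctl) / pooled_sd
print(round(d, 2))   # ~0.54 -> a moderate effect
```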
58
Q

Multivariate statistics

A

analyses dealing with at least 3 variables simultaneously

59
Q

Multiple Regression

A

To test the relationship between 2+ IVs and 1 DV, OR to predict a DV from 2+ IVs
-outcome variables are interval or ratio level variables

60
Q

Multiple Correlation Coefficient (R)

A
  • NO negative values, varies from .00 to 1.00

–>shows the strength of the relationship between several IVs and an outcome (but NOT direction)

61
Q

R squared

A

interpreted as the proportion of the variability in the outcome variable that is explained by the predictors
-used over R alone

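A minimal sketch (assuming NumPy and statsmodels are available) of a two-predictor regression and its R squared, on simulated data:

```python
# Illustration only: multiple regression with 2 IVs; rsquared = proportion of
# DV variability explained by the predictors together.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)                          # hypothetical IV 1
x2 = rng.normal(size=100)                          # hypothetical IV 2
y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=100)     # hypothetical interval-level DV

X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(y, X).fit()
print(model.rsquared)
```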
62
Q

Analysis of Covariance (ANCOVA)

A

To test the difference between the means of 2+ groups while controlling for 1+ covariate

  • used to control confounding variables statistically - “equalize” the groups being compared
  • powerful; produces F statistics
63
Q

Multivariate Analysis of Variance (MANOVA)

A

To test the difference between the means of 2+ groups for 2+ DVs simultaneously
-used to test the significance of differences between the means of two or more groups on two or more outcome variables considered simultaneously

Ex. comparing the effect of two exercise regimens on HR and BP

64
Q

Logistic Regression

A

To test the relationship between 2+ IVs and 1 DV, to predict the probability of an event, to estimate relative risk

  • transforms the probability of an event occurring (ex. that a woman will practice breast self-examination or not) into its odds
  • odds ratio: the factor by which the odds change for a unit change in a predictor, after controlling for the other predictors
  • yields CIs around the OR

–>ex. identifying various risk factors (parent edu, children’s use of computers) for childhood obesity (obese vs. not obese) in a sample of 1644 Korean children

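A minimal sketch (assuming NumPy and statsmodels are available): fitting a logistic model on simulated data, then exponentiating coefficients to get ORs with CIs:

```python
# Illustration only: logistic regression; exp(coefficient) = odds ratio.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
risk_factor = rng.normal(size=200)                    # hypothetical predictor
prob = 1 / (1 + np.exp(-(0.8 * risk_factor - 0.5)))   # made-up true model
outcome = rng.binomial(1, prob)                       # 1 = event occurred

X = sm.add_constant(risk_factor)
result = sm.Logit(outcome, X).fit(disp=False)
print(np.exp(result.params))       # odds ratios
print(np.exp(result.conf_int()))   # 95% CIs around the ORs
```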
65
Q

Multivariate Analysis of Covariance (MANCOVA)

A

To test the difference between the means of 2+ groups for 2+ DVs simultaneously, while controlling 1+ covariate

66
Q

p < __ vs. p > __

A

p < .05: less than (results ARE statistically significant)
p > .05: more than (results are NOT statistically significant)