Statistical Analysis of Quantitative Data Flashcards

1
Q

Purposes of Statistical Analysis in Quantitative Research

A
  1. To describe the data (ex: sample characteristics)
  2. Estimate population values
  3. To test hypotheses
  4. To provide evidence regarding measurement properties of quantified variables
2
Q

Levels of Measurement from Lowest to Highest

A

Nominal
Ordinal
Interval
Ratio

3
Q

Nominal Level

A

Lowest level

involves using numbers simply to categorize attributes

Named

ex: eye color

4
Q

Ordinal level

A

2nd Lowest Level

Ranks people on an attribute in a hierarchy, but the distance between levels is unknown and cannot be quantified

Named and Natural Order

ex: Level of satisfaction

5
Q

Interval Level

A

2nd highest level

Ranks people on an attribute AND specifies the distance between them - often used interchangeably with ratio

Named, Natural Order, and Equal distance between intervals

ex: Temperature

6
Q

Ratio Level

A

highest level

ratio scales, unlike interval scales, have a meaningful zero and provide information about the absolute magnitude of the attribute

Named, Natural Order, Equal distance between intervals, and a “True Zero” so the ratio between values can be calculated

ex: Height

7
Q

Nominal = ___

A

Names

ex: Male = 1; Female = 2, etc.

8
Q

Nominal Level is more like taking ____ data

A

qualitative

9
Q

In many experiments, the independent variable is what level?

A

Nominal!!

10
Q

Numeric Pain Scale is what level

A

Ordinal

11
Q

Age is what level

A

Ratio

12
Q

Hours studied for a test is what level

A

Ratio

13
Q

Most biophysiologic data like pulse is what level

A

Ratio

14
Q

Amount of money in bank account is what level

A

Ratio

15
Q

What level are the following 4 things:

  1. Time of Day
  2. Completion time for running (hr/time)
  3. Runner registration # in a race
  4. Finish order for a race
A
  1. Interval (0 does not mean absence of time, so it's not ratio)
  2. Ratio
  3. Nominal
  4. Ordinal
16
Q

What level is gender

A

nominal

17
Q

What level is height, weight, and pulse

A

ratio

18
Q

What level is Grade in School

A

Ordinal

19
Q

What level is temperature

A

interval

20
Q

What level is zip code

A

nominal

21
Q

What level is dates on a calendar

A

interval

22
Q

Descriptive Statistics

A

Used to describe and synthesize data

Describes the data and what the sample looks like

Involves parameters and statistics

23
Q

What sets parameters and statistics apart

A

Parameters are descriptors for a population

A statistic is a descriptive index from a sample

24
Q

Inferential Statistics

A

Used to make inferences about the population based on sample data

25
Q

How do descriptive and inferential statistics differ

A

Descriptive stats describe only the group in front of you, while inferential stats make inferences about the broader population the sample represents

26
Q

Frequency Distribution

A

A systematic arrangement of numeric values on a variable from lowest to highest and a count of the number of times (and/or percentage) each value was obtained

27
Q

Frequency distributions can be described in terms of what 3 things

A
  1. Shape
  2. Central Tendency
  3. Variability
28
Q

In what ways can frequency distributions be presented

A
  1. In a table (Ns and percentages)
  2. Graphically (ex: frequency polygons)
29
Q

Frequency distributions can be described by their ____

A

symmetry

30
Q

What is normal symmetry of a frequency distribution called

A

Normal Distribution (Bell Curve)

31
Q

Skewed/Asymmetric Frequency Distribution

A

A distribution either skewed positively or negatively

32
Q

Positive Skew

A

Long tails point right

ex: Income

33
Q

Negative Skew

A

Long tails point left

ex: Age at death

34
Q

Modality

A

number of peaks in a frequency distribution

can be unimodal, bimodal, multimodal

35
Q

Unimodal

A

1 peak

36
Q

Bimodal

A

2 peaks

can include normal distribution if averaging 2 peaks into a bell shaped curve

37
Q

Multimodal

A

2+ peaks

38
Q

Central Tendency

A

index of “typicalness” of a set of scores that comes from center of the distribution

Includes mode median and mean

39
Q

Mode

A

Measure of central tendency that is the most frequently occurring score in a distribution

ex: 2333456789 - Mode = 3

40
Q

Median

A

measure of central tendency where the point in a distribution above which and below which 50% of cases fall

ex: 23334|56789 - Median = 4.5

41
Q

Mean

A

measure of central tendency that equals the sum of all the scores divided by the total number of scores

ex: 2333456789 Mean = 5
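
The three measures of central tendency can be checked on the same ten scores used in the mode, median, and mean cards. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative only.

```python
import statistics

# The ten scores from the mode, median, and mean cards
scores = [2, 3, 3, 3, 4, 5, 6, 7, 8, 9]

mode = statistics.mode(scores)      # most frequently occurring score
median = statistics.median(scores)  # midpoint; averages the two middle scores (4 and 5)
mean = statistics.mean(scores)      # sum of all scores / number of scores

print(mode, median, mean)
```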

42
Q

What measure of central tendency is most useful for when scores are skewed

A

Median

43
Q

What measure of central tendency is seen most frequently

A

Mean

44
Q

Which measure of central tendency is least helpful and which is most helpful when using standard deviation

A

Mode - least helpful

Mean - most helpful

45
Q

Why is median helpful for skewed results

A

because it is not pulled toward the tail by extreme scores the way the mean is

46
Q

Variability

A

the degree to which scores in a distribution are spread out or dispersed: homogeneity v heterogeneity

47
Q

Homogeneity

A

Little variability in a frequency distribution sample

Makes for a taller and less wide spike

48
Q

Heterogeneity

A

Great variability in a frequency distribution sample

49
Q

What are the 2 indexes of variability not seen in something like the mean

A

Range

Standard Deviation (SD)

50
Q

Range

A

The highest value minus the lowest value

shows variability

can be misled by outliers

51
Q

Standard Deviation (SD)

A

average amount by which scores in a distribution deviate from the mean

shows variability - preferred to range
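
Both indexes of variability can be computed in a few lines. The sketch below reuses the ten scores from the mode, median, and mean cards; the added outlier of 40 is a made-up value for illustration.

```python
import statistics

scores = [2, 3, 3, 3, 4, 5, 6, 7, 8, 9]

value_range = max(scores) - min(scores)  # highest value minus lowest value
sd = statistics.stdev(scores)            # sample SD (divides by n - 1)

# The range depends only on the two most extreme scores,
# so a single outlier changes it sharply.
with_outlier = scores + [40]
range_with_outlier = max(with_outlier) - min(with_outlier)

print(value_range, round(sd, 2), range_with_outlier)
```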

52
Q

What is the Rule when it comes to standard deviations?

A

Rule of 68, 95, 99.7 (for a normal distribution)

68% of all Data/Sampling occurs within +/- 1 SD

95% of all data/sampling occurs within +/- 2 SD

99.7% of all data/sampling occurs within +/- 3 SD
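
The three percentages come straight from the normal curve. As a rough check, the sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, SD 1

def share_within(k):
    """Proportion of a normal distribution lying within +/- k SD of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, share_within(k))
```

The printed proportions are slightly more exact than the rounded 68, 95, and 99.7 figures on the card.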

53
Q

What is important to know about the tails of Standard Deviation in a Normal Distribution

A

the 0.3% outside 3 SD means that the tails never truly touch 0 - so there is always a theoretical possibility for outliers any distance out

54
Q

Bivariate Descriptive Statistics

A

Used for DESCRIBING the relationship between 2 variables

Approaches: Crosstabs (Contingency Table) or Correlation Coefficients

55
Q

Correlation Coefficient

A

Describes the intensity and direction of a relationship

Ranges from -1 to 1

56
Q

Negative Correlation Coefficient Relationship

A

-1 to 0

One variable increases in value as the other decreases

ex: amount of exercise and weight

57
Q

Positive Correlation Coefficient Relationship

A

0 to 1

Both variables increase/decrease

ex: Calorie consumption and weight

58
Q

What does a correlation coefficient of 0 mean

A

there is no (linear) relationship between the two variables

59
Q

The greater the absolute value of the correlation coefficient…

A

the stronger the relationship

ex: r=-.45 is stronger than r=.40

60
Q

If there are multiple variables and you want to see all of the correlations/relationships what can be displayed

A

A correlation matrix

61
Q

Pearson’s r

A

the product-moment correlation coefficient

computed with continuous measurements

r

used for interval or ratio level scales

62
Q

Spearman’s rho

A

used for correlations between variables measured on an ordinal scale (lower level), whereas Pearson’s r is for interval or ratio measures
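
To make the two coefficients concrete, here is a minimal sketch that computes Pearson's r from its definition and Spearman's rho as Pearson's r on ranks. The data are hypothetical (weekly exercise hours and weight in kg), and the function names are made up for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (lowest); tied values share the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data, for illustration only
exercise = [1, 2, 3, 4, 5]
weight = [90, 85, 86, 78, 70]

r = pearson_r(exercise, weight)
rho = spearman_rho(exercise, weight)
print(r, rho)
```

Both values come out negative here, matching the card's example of exercise and weight as a negative relationship.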

63
Q

Clinical Decision Making in EBP involves the calculation of what ?

A

Risk indexes - so that decisions can be made about relative risks for alternative treatments or exposures

ex: Absolute Risk, Absolute Risk Reduction (ARR), Odds Ratio (OR), Number Needed to Treat (NNT)

64
Q

Absolute Risk

A

The proportion of people in a group who experience an undesirable outcome - used a lot in clinical decision making to decide whether an intervention will actually reduce poor outcomes

65
Q

Absolute Risk Reduction (ARR)

A

Compares the absolute risk in the group that got the intervention with the risk in the group that did not - the estimated proportion of people spared the undesirable outcome because of their exposure to the intervention

66
Q

Odds Ratio (OR)

A

Odds = the proportion of people with the adverse outcome relative to those without it; the OR compares these odds between groups - what are the odds the experimental group v the control group develops the undesirable outcome

Often seen in media/lay terms

67
Q

Number Needed to Treat (NNT) Risk Index

A

Estimation of how many people need to get an intervention before we see the prevention of one undesirable outcome

So if 3.3 people need a smoking intervention before 1 quits smoking we can take this into account for budgeting purposes
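
The risk indexes can all be worked out from one 2 x 2 table. The counts below are hypothetical, chosen so that the number needed to treat comes out near the 3.3 in the smoking example; "bad" means the undesirable outcome (still smoking).

```python
# Hypothetical counts, for illustration only
treated_bad, treated_total = 30, 100  # still smoking after the intervention
control_bad, control_total = 60, 100  # still smoking without it

ar_treated = treated_bad / treated_total  # absolute risk, intervention group
ar_control = control_bad / control_total  # absolute risk, control group
arr = ar_control - ar_treated             # absolute risk reduction
nnt = 1 / arr                             # number needed to treat

# Odds = those with the outcome relative to those without it
odds_treated = treated_bad / (treated_total - treated_bad)
odds_control = control_bad / (control_total - control_bad)
odds_ratio = odds_treated / odds_control

print(ar_treated, ar_control, arr, nnt, odds_ratio)
```

An odds ratio below 1 here means the odds of the undesirable outcome are lower in the intervention group.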

68
Q

Inferential Statistics

A

Used to make objective decisions about population parameters using sample data

Provides a means for drawing inferences about a population, given data from a sample

ex: When you take Tylenol, you are relying on the trial’s results generalizing to you

69
Q

Inferential stats is based on …

A

the laws of probability

70
Q

Why is sampling error a big issue for inferential statistics

A

Because fluctuation in samples/Unrepresentative samples do not allow accurate generalizability to the greater population

A math program will assume we used best methods, but if our convenience sample under or overrepresented the population then it still runs the numbers assuming this and gives false results - this is why we should remain skeptical

71
Q

Inferential statistics uses the concept of…

A

theoretical distributions (to the entire population)

ex: Sampling distribution of the mean

72
Q

What do the stats/sampling distributions of inferential samplings act as a proxy for

A

Since we do not have the time or means to do infinite sampling, we rely on statistical principles to estimate what the general population mean would be

73
Q

Inferential statistics always assumes that the population is…

A

normally distributed (strictly, this is an assumption of parametric tests)

74
Q

What is the standard deviation called in inferential statistics

A

Standard Error of the Mean (SEM)

So the SEM is estimated from the SD of the actual sample (SEM = SD / √n)

75
Q

The ____ the SEM the better the generalizability

A

smaller

76
Q

What improves accuracy of the estimate and shrinks SEM

A

larger sample size
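
The effect of sample size follows from the usual formula SEM = SD / sqrt(n). A minimal sketch, assuming a hypothetical sample SD of 10:

```python
from math import sqrt

def sem(sd, n):
    """Standard error of the mean, estimated from the sample SD and sample size."""
    return sd / sqrt(n)

sd = 10.0  # hypothetical sample SD, for illustration only
sem_by_n = {n: sem(sd, n) for n in (25, 100, 400)}
print(sem_by_n)
```

Because n sits under a square root, quadrupling the sample size only halves the SEM.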

77
Q

alpha represents…

A

threshold of risk for a Type I error (ex: alpha = 0.05 means a 5% chance of error, i.e., of a result falling outside the central 95% of the sampling distribution)

78
Q

Why is it important not to under/over represent groups in sampling, as it impacts SEM

A

because we can end up in the tails of the distribution without knowing it

sometimes it is not our fault but we have to prevent the times it is

79
Q

Alpha states there is a 5% risk…

A

that we reject the null hypothesis when it is actually true, i.e., that our results arose by chance

80
Q

2 Purposes of Inferential Statistics

A

Point Estimation / Interval estimation

Hypothesis Testing

81
Q

Point Estimation

A

a single descriptive statistic that estimates the population value

ex: a mean, percentage, or OR

ex: mean BP, mean score on a scale, etc

82
Q

Interval Estimation

A

a range of values within which a population value probably lies

involves computing a confidence interval (CI)

83
Q

Confidence Intervals reflect…

A

how much risk of being wrong researchers take in interval estimation

84
Q

Confidence Intervals

A

indicate the upper and lower confidence limits and the probability that the population value is between those limits

Confidence Limit is the estimate for a population range

85
Q

What are the 2 main confidence interval numbers seen

A

99%

95%

86
Q

What does a 95% CI of 40-50 for a sample mean of 45 indicate

A

that there is a 95% probability that the population mean is between 40 and 50

87
Q

How do 95% and 99% CI differ

A

95% = narrower interval, so the estimate is more precise, but with less confidence (more risk of being wrong)

99% = less risk of being wrong, but the interval is wider, so the estimate is not as precise
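
The trade-off between confidence and precision can be seen by computing both intervals for the same sample. This sketch uses the large-sample normal approximation (mean +/- z x SEM). The mean of 45 comes from the earlier card; the SEM of 2.55 is an assumed value picked because it roughly reproduces the 40-50 limits.

```python
from statistics import NormalDist

def confidence_interval(mean, sem, level=0.95):
    """Normal-approximation CI: mean +/- z * SEM for the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%, 2.58 for 99%
    return mean - z * sem, mean + z * sem

low95, high95 = confidence_interval(45, 2.55)              # SEM is an assumption
low99, high99 = confidence_interval(45, 2.55, level=0.99)

print(low95, high95)
print(low99, high99)
```

The 99% interval is wider than the 95% interval for the same data, which is the loss of precision the card describes. With small samples a t value would normally replace z.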

88
Q

Hypothesis testing helps researchers…

A

to make objective decisions about whether results are likely to reflect chance differences or hypothesized effects

89
Q

We can only ever ___ or ___ the ___ hypothesis with statistical decisions from hypothesis testing

A

accept or reject the null hypothesis

(the research hypothesis is never proven or accepted directly)

90
Q

Decisions in hypothesis testing are always made regarding which hypothesis

A

the null (accept or reject)

91
Q

Rejecting the null implies..

A

there is a difference large enough between groups to say they are different from intervention rather than just general differences between the groups

92
Q

If the value of the test statistic indicates that the null hypothesis is improbable then…

A

results are statistically significant

93
Q

Nonsignificant results mean…

A

that any observed difference or relationship could have happened by chance

94
Q

Statistical decisions (sig or not) are either ___ or ____

A

correct or incorrect

95
Q

When can we know if a stat decision was correct or not

A

Not in the initial research but rather after enough replication

96
Q

Type I Error

A

” False Positive “

Rejection of the null when it should not be rejected - thought we saw something when there was not

97
Q

Any stat decision in an initial trial has some level/risk of…

A

type I or II error

98
Q

Telling a man he is pregnant would be what type of error

A

Type I Error

99
Q

Risk of Type I error is controlled by …

A

the level of significance (Alpha) - lowering alpha reduces Type I risk but raises Type II risk, which is addressed through power

ex: Alpha = 0.05 or 0.01

100
Q

alpha is usually ____

A

0.05

the probability of rejecting the null hypothesis when it is true - if your p value is less than the alpha you reject the null

101
Q

Type II Error

A

“False Negative”

Failure to reject a null hypothesis when it should be rejected

102
Q

Telling a pregnant woman she is not pregnant is what error

A

Type II Error - false negative

103
Q

A type 1 error can only occur..

A

with statistically significant results

104
Q

Power

A

the ability of a test to detect true relationships

increases with larger samples –> larger power

105
Q

Power needs to be at least…

A

0.80

106
Q

Does Type I and II error mean an error was made necessarily?

A

No it means there was risk for making that error based on the conclusion

107
Q

Hypothesis Testing Procedure

A
  1. Select an appropriate Stat Test
  2. Specify level of significance (ex: alpha = 0.05)
  3. Compute a test statistic with actual data
  4. Determine Degrees of Freedom (df) for the test stat (made by program)
  5. Compare computed test stat to a theoretical value - decide if significant or not
108
Q

Important Bivariate Stat Tests

A

t tests

ANOVA

chi squared test

correlation coefficients

effect size indexes

109
Q

t-test

A

tests the difference between 2 means

2 types: independent groups between subjects and dependent (paired) groups within subjects

110
Q

t test for independent groups: between subjects test

A

tests difference of means for 2 independent groups

ex: men and women

IV is nominal

DV is continuous
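
A minimal sketch of the independent groups t statistic, using the pooled-variance form (which assumes roughly equal variances in the two groups). The scores and the function name are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    se_diff = sqrt(pooled_var * (1 / na + 1 / nb))  # standard error of the difference
    return (mean(group_a) - mean(group_b)) / se_diff, df

# Hypothetical scores for two independent groups
group_a = [5, 6, 7, 8, 9]
group_b = [2, 3, 4, 5, 6]
t_independent, df_independent = independent_t(group_a, group_b)
print(t_independent, df_independent)
```

The computed t is then compared with the tabled critical value for its df (2.306 for df = 8 at a two-tailed alpha of 0.05); statistical software reports the p value directly.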

111
Q

t test for paired groups: within subjects test

A

to test the difference of means of a paired group

ex: pretest v post test for same people

IV is nominal

DV is continuous
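
For the paired version, the test works on each person's difference score. A minimal sketch with hypothetical pretest and posttest scores for the same five people:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic and degrees of freedom, based on within-person differences."""
    diffs = [b - a for a, b in zip(before, after)]  # after minus before
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

# Hypothetical scores, same people measured twice
pretest = [10, 12, 9, 14, 11]
posttest = [12, 15, 10, 15, 13]
t_paired, df_paired = paired_t(pretest, posttest)
print(t_paired, df_paired)
```

Here df is the number of pairs minus 1, not the total number of scores minus 2.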

112
Q

p-value

A

probability of getting a difference between the means at least this large if the null hypothesis is true

ex: p = 0.01 means there is a 1% chance that the difference in means is explained by regular normal variation

113
Q

alpha v p-value

A

Alpha is the risk of error set in advance (ex: 5%), while the p value is the chance computed from the data (ex: 1%) that the difference is from regular error

if the p value is smaller than the alpha you can reject the null hypothesis

error does not mean mistake here; it means normal random variation - the opposite of bias

114
Q

ANOVA (Analysis of Variance)

A

Tests the difference between more than 2 means (3+ independent groups)

IV - Nominal
DV - continuous

Can be One Way (3+ groups), Multifactor/Two Way, or Repeated Measures ANOVA (within subjects)

115
Q

What does ANOVA sort out

A

the variability of an outcome variable into 2 components:

  1. variability due to the IV
  2. Variability due to all other sources

ex: Variation between groups is contrasted with variation within groups

116
Q

What is the statistic yielded with ANOVA

A

F Ratio statistic (the variation between groups contrasted with the variation within groups)
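
The contrast between the two sources of variability can be sketched for a one-way ANOVA as follows. The three groups of scores are hypothetical and the function name is made up for this example.

```python
from statistics import mean

def f_ratio(groups):
    """One-way ANOVA F ratio: between-group variance over within-group variance."""
    all_scores = [s for g in groups for s in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Variability due to the IV (group membership)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability due to all other sources (differences inside each group)
    ss_within = sum((s - mean(g)) ** 2 for g in groups for s in g)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

# Hypothetical scores for three independent groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f, df_between, df_within = f_ratio(groups)
print(f, df_between, df_within)
```

A large F means the group means differ by more than the spread inside the groups would suggest. For df = 2 and 6, the tabled critical value at alpha = 0.05 is 5.14.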

117
Q

Chi Squared Test

A

Tests the difference in proportions in 2+ independent groups

Uses a contingency table - comparing observed frequencies in each cell with expected frequencies (the frequencies expected if there was no relationship)

IV - Nominal (or ordinal)
DV - NOMINAL!!! (or ordinal in some)
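
The observed v expected comparison can be sketched directly. The 2 x 2 counts below are hypothetical, and this version applies no continuity correction.

```python
def chi_squared(table):
    """Chi-squared statistic and df for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected in this cell if there were no relationship
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: rows = group, columns = outcome yes / no
observed = [[30, 70], [60, 40]]
chi2, chi2_df = chi_squared(observed)
print(chi2, chi2_df)
```

For df = 1, the tabled critical value at alpha = 0.05 is 3.84, so a statistic this large would be statistically significant.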

118
Q

Chi Squared Tests are the inferential statistics version of a…

A

crosstab table

119
Q

Test stat for Chi Squared Tests

A

χ² (chi-squared)

120
Q

What are test statistics

A

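values computed from the data and compared against a table of theoretical values to get the p value - the table lookup is not used much anymore because software reports p directly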

121
Q

If p is lower than the alpha..

A

results are statistically significant

122
Q

Correlation Coefficients can be used in both…

A

inferential and descriptive statistics

IV and DV -Continuous

123
Q

What are the 3 things needed for any inferential statistic test

A

Test Statistic Number

P Value

Degrees of Freedom (DF)

could also include effect size

124
Q

Effect Size Indexes

A

summarize the magnitude of the effect of the IV on the DV - how much effect on the outcome measured

an important concept in power analysis

125
Q

In a comparison of two group means (ex. in a t test situation) the effect size is represented by…

A

Cohen’s d

126
Q

d < or equal to .20 means…

A

small effect

127
Q

d = 0.50 means…

A

moderate effect

128
Q

d > or equal to .80 means…

A

large effect
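
Cohen's d is the difference between two group means divided by the pooled standard deviation. A minimal sketch with hypothetical scores:

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled SD of the two groups."""
    na, nb = len(group_a), len(group_b)
    pooled_sd = sqrt(
        ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    )
    return (mean(group_a) - mean(group_b)) / pooled_sd

# Hypothetical scores, for illustration only
d = cohens_d([5, 6, 7, 8, 9], [2, 3, 4, 5, 6])
print(d)
```

By the benchmarks on the preceding cards, a d this far above .80 would count as a large effect.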

129
Q

Multivariate Stat Analysis

A

stat procedure for analyzing relationships among 3 or more variables simultaneously

ex: Multiple regression, ANCOVA, logistic regression

130
Q

Multiple Regression

A

used to predict a DV based on 2 or more IV (predictors)

IV - continuous (interval or ratio) or dichotomous

DV - continuous (interval or ratio level data)

ex: What things affect birth weight? DV = grams at birth; the IVs are the predictors that help determine it

ex: maternal age, income in dollars, maternal weight, SBP, smoking etc

131
Q

What is the stat used in multiple regression

A

the Multiple Correlation Coefficient symbolized as R

132
Q

Multiple Correlation Coefficient (R)

A

The correlation index for a DV and 2 or more IVs, represented by R

ranges from 0 to 1 with no negative values, so it shows strength of relationships - not direction

133
Q

R sees ___ not ___

A

strength not direction

134
Q

R^2

A

an estimate of the proportion of variability in the DV accounted for by all predictors (multiple regression)

135
Q

ANCOVA (Analysis of Covariance)

A

Extends ANOVA by removing the effect of confounding variables (covariates) before testing whether mean group differences are stat significant

IV - Nominal (group status)

Covariates - cont./dichotomous

Individual differences variability due to all other sources

136
Q

Logistic Regression

A

analyzes relationships between a nominal-level DV and 2 or more IVs

yields an ODDS RATIO - the odds of an outcome occurring given one condition versus the odds of it occurring given a different condition

137
Q

Reliability Assessment Tests

A

Test Retest Reliability

Interrater Reliability

Internal Consistency Reliability

138
Q

Validity Assessment Tests

A

Content Validity

Construct Validity

Criterion Validity

139
Q

Reliability

A

Consistency of results - the extent to which scores are free from measurement error

140
Q

Test Retest Reliability

A

Give the same test over and over and hope to see similar results in that person

141
Q

Interrater Reliability

A

Extent to which 2 (or more) raters will assign the same score to some attribute

142
Q

Internal Consistency Reliability

A

Extent to which various components all measure the same thing - ex: Cronbach’s alpha
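
Cronbach's alpha can be computed from item variances and the variance of total scores: alpha = k / (k - 1) x (1 - sum of item variances / variance of totals), where k is the number of items. The sketch below uses hypothetical ratings for a 3-item scale answered by 4 people; the function name is made up for this example.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per scale item, respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    item_var_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Hypothetical ratings: 3 items, 4 respondents
items = [
    [2, 4, 3, 5],
    [3, 4, 3, 5],
    [2, 5, 4, 4],
]
reliability_alpha = cronbach_alpha(items)
print(reliability_alpha)
```

Values closer to 1 indicate that the items vary together, which is what internal consistency means.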

143
Q

Content Validity

A

For multi-item scales: whether the content of the items adequately measures the construct of interest

144
Q

Criterion Validity

A

How consistent measurements on a scale are with a gold standard criterion

Sensitivity and Specificity

145
Q

Sensitivity

A

ability to correctly ID a case

146
Q

Specificity

A

Ability to correctly identify noncases (rule out those without the condition)
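
Sensitivity and specificity are both simple proportions from a comparison against the gold standard. The counts below are hypothetical.

```python
# Hypothetical screening results compared with a gold standard
true_pos, false_neg = 45, 5    # people who have the condition
true_neg, false_pos = 90, 10   # people who do not have the condition

sensitivity = true_pos / (true_pos + false_neg)  # share of real cases correctly identified
specificity = true_neg / (true_neg + false_pos)  # share of noncases correctly ruled out

print(sensitivity, specificity)
```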

147
Q

Construct Validity

A

Extent to which measurement really measures the true construct

done via hypothesis testing

148
Q

When reading a research article and its hypothesis testing, what things are important to look for

A
  1. The Test Used
  2. The value of the calculated statistic
  3. Degrees of freedom
  4. Level of statistical significance (p-value)
149
Q

A researcher measures the weight of people in a study involving obesity and Type 2 diabetes. What type of measurement is being employed?

A. Nominal
B. Ordinal
C. Interval
D. Ratio

A

D. Ratio

Rationale: Many physical measures, such as a person’s weight, are ratio measures. Gender is an example of a nominally measured variable. A measurement of ability to perform ADLs is an example of ordinal measurement, and interval measurement occurs when researchers can rank people on an attribute and specify the distance between them, e.g., psychological testing.

150
Q

T/F: A bell shaped Curve is also called a normal distribution

A

True

Rationale: A special distribution called the normal distribution (a bell shaped curve) is symmetric, unimodal, and not very peaked

151
Q

The researcher subtracts the lowest value of data from the highest value of data to obtain:

A. Mode

B. Median

C. Mean

D. Range

A

D. Range

Rationale: The range is calculated by subtracting the lowest value of data from the highest value of data. The mode refers to the most frequently occurring score. The median refers to the point in a distribution above which and below which 50% of the cases fall. The mean is the sum of all the scores divided by the total number of scores.

152
Q

T/F: A correlation coefficient of -.38 is stronger than a correlation coefficient of +.32

A

True

Rationale: For a correlation coefficient, the greater the absolute value of the coefficient, the stronger the relationship. So, the absolute value of −.38 is greater than the absolute value of +.32 and thus is stronger.

153
Q

Which test would be used to compare the observed frequencies with expected frequencies within a contingency table?

A. Pearson’s r
B. Chi squared test
C. t test
D. ANOVA

A

B. Chi Squared Test

Rationale: The chi-squared test evaluates the difference in proportions in categories within a contingency table, comparing the observed frequencies with the expected frequencies. Pearson’s r tests that the relationship between two variables is not zero. The t-test evaluates the difference between two means. The ANOVA tests the difference between more than two means.