Statistical Analysis of Quantitative Data Flashcards

Question

How do descriptive and inferential statistics differ?

Answer 1

Descriptive stats describe only the group in front of you, whereas inferential stats make inferences about the wider population the results are generalized to

Answer 2

A systematic arrangement of numeric values on a variable from lowest to highest and a count of the number of times (and/or percentage) each value was obtained

Answer 3

1. Shape 2. Central Tendency 3. Variability

Answer 4

1. In a table (Ns and percentages) 2. Graphically (ex: frequency polygons)

Answer 5

Normal Distribution (Bell Curve)

Answer 6

A distribution either skewed positively or negatively

Answer 7

Long tails point right ex: Income

Answer 8

Long tails point left ex: Age at death (deaths in youth form the long left tail)

Answer 9

The number of peaks in a frequency distribution; a distribution can be unimodal, bimodal, or multimodal

Answer 10

2 peaks can include normal distribution if averaging 2 peaks into a bell shaped curve

Answer 11

An index of the "typicalness" of a set of scores that comes from the center of the distribution. Includes the mode, median, and mean

Answer 12

Measure of central tendency that is the most frequently occurring score in a distribution ex: 2333456789 - Mode = 3

Answer 13

Measure of central tendency that is the point in a distribution above which and below which 50% of cases fall ex: 23334|56789 - Median = 4.5

Answer 14

measure of central tendency that equals the sum of all the scores divided by the total number of scores ex: 2333456789 Mean = 5
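As a quick check of the mode, median, and mean examples on these cards, the sketch below recomputes all three for the scores 2, 3, 3, 3, 4, 5, 6, 7, 8, 9 with Python's standard `statistics` module. The variable names are illustrative only.

```python
from statistics import mean, median, mode

# Example scores used on the mode, median, and mean cards
scores = [2, 3, 3, 3, 4, 5, 6, 7, 8, 9]

central_tendency = {
    "mode": mode(scores),      # most frequently occurring score
    "median": median(scores),  # point with half the cases above and half below
    "mean": mean(scores),      # sum of the scores divided by the number of scores
}

print(central_tendency)
```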

Answer 15

Mode - least helpful Mean - most helpful

Answer 16

because it can offset the skew

Answer 17

the degree to which scores in a distribution are spread out or dispersed: homogeneity v heterogeneity

Answer 18

Little variability in a frequency distribution sample. Makes for a taller and less wide spike

Answer 19

Great variability in a frequency distribution sample

Answer 20

Range Standard Deviation (SD)

Answer 21

The highest value minus the lowest value. Shows variability, but can be distorted by outliers

Answer 22

The average deviation of scores from the mean in a distribution. Shows variability - preferred to range
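A minimal sketch of both variability measures, reusing the example scores from the central tendency cards. The extra score of 50 is a hypothetical outlier added only to show how strongly the range depends on the two most extreme values.

```python
from statistics import stdev

scores = [2, 3, 3, 3, 4, 5, 6, 7, 8, 9]


def value_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)


score_range = value_range(scores)
score_sd = stdev(scores)  # sample SD (divides by n - 1)

# One hypothetical extreme score is enough to change the range drastically
with_outlier = scores + [50]
outlier_range = value_range(with_outlier)

print(score_range, round(score_sd, 2), outlier_range)
```

The range uses only two scores, while the SD uses every score in the distribution, which is one reason the SD is usually the preferred summary.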

Answer 23

Rule of 68, 95, 99.7: 68% of all data/sampling occurs within +/- 1 SD; 95% of all data/sampling occurs within +/- 2 SD; 99.7% of all data/sampling occurs within +/- 3 SD
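The 68, 95, 99.7 figures can be recovered from the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1


def share_within(k):
    """Proportion of a normal distribution lying within +/- k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"+/- {k} SD: {share:.1%}")
```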

Answer 24

the 0.3% outside 3 SD means that the tails never truly touch 0 - so there is always a theoretical possibility for outliers any distance out

Answer 25

Used for DESCRIBING the relationship between 2 variables. Approaches: Crosstabs (Contingency Table) or Correlation Coefficients

Answer 26

Describes the intensity and direction of a relationship. Ranges from -1 to 1

Answer 27

-1 to 0 One variable increases in value as the other decreases ex: amount of exercise and weight

Answer 28

0 to 1 Both variables increase/decrease ex: Calorie consumption and weight

Answer 29

there is no value difference / there is no relationship

Answer 30

the stronger the relationship ex: r=-.45 is stronger than r=.40

Answer 31

A correlation matrix

Answer 32

the product-moment correlation coefficient, computed with continuous measurements; r is used for interval- or ratio-level scales
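A minimal sketch of how the product-moment correlation can be computed by hand. The function name `pearson_r` and the small data sets are hypothetical, chosen so the arithmetic is easy to follow.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross_products / sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]
y_positive = [2, 1, 4, 3, 5]  # tends to rise as x rises
y_negative = [5, 4, 3, 2, 1]  # falls as x rises

r_positive = pearson_r(x, y_positive)
r_negative = pearson_r(x, y_negative)

print(r_positive, r_negative)
```

The sign gives the direction of the relationship and the absolute value gives its strength, so in this made-up data the negative relationship is the stronger of the two.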

Answer 33

Used for correlations between variables measured on an ordinal scale (a lower level of measurement), whereas Pearson's r is for interval- or ratio-level data

Answer 34

Risk indexes - so that decisions can be made about relative risks for alternative treatments or exposures ex: Absolute Risk, Absolute Risk Reduction (ARR), Odds Ratio (OR), Number Needed to Treat (NNT)

Answer 35

Index used a lot in clinical decision making to decide in doing an intervention and whether there will be actual reduction of poor outcomes

Answer 36

Comparing risks in the group who got the intervention and the group who did not - the estimated proportion of those spared undesirable outcomes because of their exposure to this intervention

Answer 37

The odds are the proportion of those with the adverse outcome relative to those without it; the OR compares the odds that the experimental group v the control group develop undesirable outcomes. Often seen in media/lay terms

Answer 38

Estimation of how many people need to get an intervention before we see the prevention of one true undesirable outcome. So if 3.3 people need a smoking intervention before 1 quits smoking, we can take this into account for budgeting purposes
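The risk indexes on these cards can all be derived from one 2x2 table. The counts below are hypothetical (10 of 100 people in an intervention group and 40 of 100 in a control group have the adverse outcome); they were picked so that the number needed to treat comes out near the 3.3 used in the card above.

```python
# Hypothetical 2x2 counts: adverse outcome v no adverse outcome, by group
intervention = {"adverse": 10, "no_adverse": 90}
control = {"adverse": 40, "no_adverse": 60}


def absolute_risk(group):
    """Proportion of the group that experiences the adverse outcome."""
    return group["adverse"] / (group["adverse"] + group["no_adverse"])


def odds(group):
    """Those with the adverse outcome relative to those without it."""
    return group["adverse"] / group["no_adverse"]


ar_intervention = absolute_risk(intervention)
ar_control = absolute_risk(control)

arr = ar_control - ar_intervention               # absolute risk reduction
nnt = 1 / arr                                    # number needed to treat
odds_ratio = odds(intervention) / odds(control)  # OR below 1 favors the intervention

print(f"ARR = {arr:.2f}, NNT = {nnt:.1f}, OR = {odds_ratio:.2f}")
```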

Answer 39

Used to make objective decisions about population parameters using sample data Provides a means for drawing inferences about a population, given data from a sample ex: Taking tylenol is the assumption of the trial's generalizations

Answer 40

the laws of probability

Answer 41

Because fluctuation in samples/unrepresentative samples do not allow accurate generalizability to the greater population. A math program will assume we used best methods, but if our convenience sample under- or overrepresented the population then it still runs the numbers assuming this and gives false results - this is why we should remain skeptical

Answer 42

theoretical distributions (to the entire population) ex: Sampling distribution of the mean error

Answer 43

Since we do not have the time or means to do infinite sampling we can assume principles of stats to assume what the general population mean would be

Answer 44

normally distributed

Answer 45

Standard Error of the Mean (SEM). The SEM is estimated from the SD (and the size) of the actual sample
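The usual estimate of the SEM is the sample SD divided by the square root of the sample size. The sketch below applies that formula to two hypothetical samples with the same SD but different sizes; the function name is illustrative.

```python
from math import sqrt


def standard_error_of_mean(sample_sd, n):
    """Estimate the SEM from a sample's SD and its size."""
    return sample_sd / sqrt(n)


# Same hypothetical SD of 10, two different sample sizes
sem_small_sample = standard_error_of_mean(10, 25)
sem_large_sample = standard_error_of_mean(10, 100)

print(sem_small_sample, sem_large_sample)
```

Quadrupling the sample size halves the SEM, which is why larger samples give more accurate estimates of the population mean.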

Answer 46

larger sample size

Answer 47

threshold of risk (5% chance for error and chance of being outside the 95% SE)

Answer 48

because we can end up in the tails of the distribution without knowing it sometimes it is not our fault but we have to prevent the times it is

Answer 49

that our results came from the chance the null hypothesis is true

Answer 50

Point Estimation / Interval estimation Hypothesis Testing

Answer 51

a single descriptive statistic that estimates the population value ex: a mean, percentage, or OR ex: mean BP, mean score on a scale, etc

Answer 52

a range of values within which a population value probably lies; involves computing a confidence interval (CI)

Answer 53

how much risk of being wrong researchers take in interval estimation

Answer 54

indicate the upper and lower confidence limits and the probability that the population value is between those limits Confidence Limit is the estimate for a population range

Answer 55

that there is a 95% probability that the population mean is between 40 and 50

Answer 56

95% = tighter limits but less confidence, allows for a more precise estimate; 99% = less risk and less tolerance for risk, but naturally means the estimate is not as precise
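A rough sketch of how confidence limits are built from a sample mean and its SEM. It assumes a normal (z-based) approximation rather than the t distribution, and the mean of 45 and SEM of 2.55 are hypothetical values chosen to give limits close to 40 and 50.

```python
from statistics import NormalDist


def confidence_interval(sample_mean, sem, confidence):
    """Lower and upper confidence limits under a normal approximation."""
    tail = (1 - confidence) / 2
    z = NormalDist().inv_cdf(1 - tail)  # about 1.96 for 95%, 2.58 for 99%
    margin = z * sem
    return sample_mean - margin, sample_mean + margin


ci_95 = confidence_interval(45, 2.55, 0.95)
ci_99 = confidence_interval(45, 2.55, 0.99)

print([round(limit, 1) for limit in ci_95])
print([round(limit, 1) for limit in ci_99])
```

The 99% interval is wider than the 95% interval for the same data: less risk of being wrong, at the cost of a less precise estimate.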

Answer 57

to make objective decisions about whether results are likely to reflect chance differences or hypothesized effects

Answer 58

accept or reject the null hypothesis (the research hypothesis is never proven or accepted)

Answer 59

the null (accept or reject)

Answer 60

there is a difference large enough between groups to say they are different from intervention rather than just general differences between the groups

Answer 61

results are statistically significant

Answer 62

that any observed difference or relationship could have happened by chance

Answer 63

correct or incorrect

Answer 64

Not in the initial research but rather after enough replication

Answer 65

" False Positive " Rejection of the null when it should not be rejected - thought we saw something when there was not

Answer 66

type I or II error

Answer 67

Type I Error

Answer 68

the level of significance (Alpha) ex: Alpha = 0.05 or 0.01

Answer 69

0.05 the probability of rejecting the null hypothesis when it is true - if your p value is less than the alpha you reject the null

Answer 70

"False Negative" Failure to reject a null hypothesis when it should be rejected

Answer 71

Type II - false negative

Answer 72

with statistically significant results

Answer 73

the ability of a test to detect true relationships increases with larger samples --> larger power

Answer 74

No it means there was risk for making that error based on the conclusion

Answer 75

1. Select an appropriate Stat Test 2. Specify level of significance (ex: alpha = 0.05) 3. Compute a test statistic with actual data 4. Determine Degrees of Freedom (df) for the test stat (made by program) 5. Compare computed test stat to a theoretical value - decide if significant or not
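The five steps can be traced by hand for an independent-groups t test. This is only a sketch: the two groups are made-up data, equal variances are assumed (pooled variance), and the critical value 2.306 is the tabled two-tailed t for df = 8 at alpha = 0.05 rather than something computed by the code.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical scores for two independent groups
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]

# Step 1: select the test (independent-groups t test, pooled variance)
# Step 2: specify the level of significance
alpha = 0.05

# Step 3: compute the test statistic from the actual data
n_a, n_b = len(group_a), len(group_b)
pooled_variance = (
    (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
) / (n_a + n_b - 2)
standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
t_statistic = (mean(group_a) - mean(group_b)) / standard_error

# Step 4: determine the degrees of freedom
df = n_a + n_b - 2

# Step 5: compare with the theoretical (tabled) value for df = 8, alpha = 0.05
critical_t = 2.306
significant = abs(t_statistic) > critical_t

print(f"t({df}) = {t_statistic:.2f}, significant at {alpha}: {significant}")
```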

Answer 76

t tests ANOVA chi squared test correlation coefficients effect size indexes

Answer 77

Tests the difference between 2 means. 2 types: independent groups (between subjects) and dependent (paired) groups (within subjects)

Answer 78

tests difference of means for 2 independent groups ex: men and women IV is nominal DV is continuous

Answer 79

to test the difference of means of a paired group ex: pretest v post test for same people IV is nominal DV is continuous

Answer 80

The probability that a difference between the means this large would occur if the null hypothesis were true. So there is a 0.01 (1%) chance that the difference in means is explained by regular normal variation

Answer 81

Alpha is a 5% risk for error, but the p value is a 1% chance that the difference is from regular error. If the p value is smaller than the alpha, you can reject the null hypothesis. Error does not mean mistake here; it means there is normal distribution - the opposite of bias

Answer 82

Tests the difference between more than 2 means (3+ independent groups) IV - Nominal DV - continuous Can be one way (3 groups) Multifactor/Two Way, or Repeated measures ANOVA (within subjects)

Answer 83

the variability of an outcome variable into 2 components: 1. variability due to the IV 2. Variability due to all other sources ex: Variation between groups is contrasted with variation within groups

Answer 84

F Ratio Statistic (it is the variation between groups contrasted with the variation within groups)
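A minimal sketch of a one-way F ratio, contrasting variation between groups with variation within groups. The three small groups are hypothetical and were chosen so the sums of squares are whole numbers.

```python
from statistics import mean

# Hypothetical scores for three independent groups
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]

all_scores = [score for group in groups for score in group]
grand_mean = mean(all_scores)

# Variability due to the IV: group means around the grand mean
ss_between = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in groups)
# Variability due to all other sources: scores around their own group mean
ss_within = sum((score - mean(group)) ** 2 for group in groups for score in group)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)

f_ratio = (ss_between / df_between) / (ss_within / df_within)

print(f"F({df_between}, {df_within}) = {f_ratio:.1f}")
```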

Answer 85

Tests the difference in proportions in 2+ independent groups. Uses a contingency table - comparing observed frequencies in each cell with expected frequencies (the frequencies expected if there was no relationship). IV - Nominal (or ordinal) DV - NOMINAL!!! (or ordinal in some)
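The observed-versus-expected comparison can be sketched on a small contingency table. The counts are hypothetical, and 3.841 is the tabled chi-squared critical value for df = 1 at alpha = 0.05, hard-coded here rather than computed.

```python
# Hypothetical 2x2 contingency table of observed frequencies
# rows: intervention, control; columns: adverse outcome, no adverse outcome
observed = [[10, 90], [40, 60]]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
n = sum(row_totals)

chi_squared = 0.0
for i, row in enumerate(observed):
    for j, observed_count in enumerate(row):
        # Frequency expected in this cell if there were no relationship
        expected_count = row_totals[i] * column_totals[j] / n
        chi_squared += (observed_count - expected_count) ** 2 / expected_count

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 3.841  # tabled value for df = 1, alpha = 0.05
significant = chi_squared > critical_value

print(f"chi-squared({df}) = {chi_squared:.1f}, significant: {significant}")
```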

Answer 86

crosstab table

Answer 87

values used to compare in a table to get the p value - not used much anymore

Answer 88

results are statistically significant

Answer 89

inferential and descriptive statistics IV and DV -Continuous

Answer 90

Test Statistic Number P Value Degrees of Freedom (DF) *could also include effect size*

Answer 91

summarize the magnitude of the effect of the IV on the DV - how much effect on the outcome measured an important concept in power analysis
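As one possible illustration, the sketch below computes Cohen's d, a common effect size index for the difference between two means. The cards do not name a specific index, so the choice of d, the function name, and the data are all assumptions made for this example.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_1, group_2):
    """Difference between two group means in pooled-SD units."""
    n_1, n_2 = len(group_1), len(group_2)
    pooled_variance = (
        (n_1 - 1) * variance(group_1) + (n_2 - 1) * variance(group_2)
    ) / (n_1 + n_2 - 2)
    return (mean(group_1) - mean(group_2)) / sqrt(pooled_variance)


# Hypothetical scores for two independent groups
treatment = [5, 6, 7, 8, 9]
comparison = [1, 2, 3, 4, 5]

d = cohens_d(treatment, comparison)

print(round(d, 2))
```

By Cohen's widely used conventions, a d of about 0.2 is read as small, 0.5 as moderate, and 0.8 as large, so the made-up groups here differ by a very large effect.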

Answer 92

small effect

Answer 93

moderate effect

Answer 94

large effect

Answer 95

stat procedure for analyzing relationships among 3 or more variables simultaneously ex: Multiple regression, ANCOVA, logistic regression

Answer 96

used to predict a DV based on 2 or more IVs (predictors). IV - continuous (interval or ratio) or dichotomous. DV - continuous (interval or ratio level data). ex: What are things that affect birth weight: Grams at Birth - what is the number of IVs determining that ex: maternal age, income in dollars, maternal weight, SBP, smoking etc

Answer 97

the Multiple Correlation Coefficient symbolized as R

Answer 98

The correlation index for a DV and 2 or more IVs, represented by R. Does not have negative values, but shows strength of relationships - not direction

Answer 99

strength not direction

Answer 100

an estimate of the proportion of variability in the DV accounted for by all predictors (multiple regression)

Answer 101

Extends ANOVA by removing the effect of confounding variables (covariates) before testing whether mean group differences are stat significant IV - Nominal (group status) Covariates - cont./dichotomous Individual differences variability due to all other sources

Answer 102

analyzes relationships between a nominal-level DV and 2 or more IVs; yields an ODDS RATIO - the risk of an outcome occurring given one condition versus the risk of it occurring given a different condition

Answer 103

Test Retest Reliability Interrater Reliability Internal Consistency Reliability

Answer 104

Content Validity Construct Validity Criterion Validity

Answer 105

Accuracy of Results

Answer 106

Give the same test over and over and hope to see similar results in that person

Answer 107

Extent to which 2 raters will assign the same score to some attribute

Answer 108

Extent to which various components all measure the same thing - ex: Cronbach's alpha
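A small sketch of Cronbach's alpha using the standard formula: k / (k - 1) times one minus the ratio of summed item variances to the variance of total scores. The three-item scale and four respondents are hypothetical.

```python
from statistics import variance

# Hypothetical responses: one list per scale item, one entry per respondent
item_scores = [
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [2, 1, 4, 3],
]

k = len(item_scores)
# Each respondent's total score across all items
total_scores = [sum(responses) for responses in zip(*item_scores)]

sum_of_item_variances = sum(variance(item) for item in item_scores)
cronbach_alpha = (k / (k - 1)) * (1 - sum_of_item_variances / variance(total_scores))

print(round(cronbach_alpha, 2))
```

Values closer to 1 indicate that the items are more consistently measuring the same thing.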

Answer 109

Multiple item scales whether content measures constructs of interest

Answer 110

How consistent measurements on a scale are with a gold standard criterion. Assessed with sensitivity and specificity

Answer 111

ability to correctly ID a case

Answer 112

Ability to correctly rule out certain cases
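Sensitivity and specificity both come from comparing a measure against the gold standard. The counts below are hypothetical screening results used only to show the two ratios.

```python
# Hypothetical screening results compared against a gold standard
true_positives = 80   # cases correctly identified
false_negatives = 20  # cases the measure missed
true_negatives = 90   # non-cases correctly ruled out
false_positives = 10  # non-cases wrongly flagged

# Sensitivity: ability to correctly identify a case
sensitivity = true_positives / (true_positives + false_negatives)
# Specificity: ability to correctly rule out a non-case
specificity = true_negatives / (true_negatives + false_positives)

print(sensitivity, specificity)
```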

Answer 113

Extent to which measurement really measures the true construct done via hypothesis testing

Answer 114

1. The Test Used 2. The value of the calculated statistic 3. Degrees of freedom 4. Level of statistical significance (p-value)

Answer 115

D. Ratio Rationale: Many physical measures, such as a person’s weight, are ratio measures. Gender is an example of a nominally measured variable. A measurement of ability to perform ADLs is an example of ordinal measurement, and interval measurement occurs when researchers can rank people on an attribute and specify the distance between them, e.g., psychological testing.

Answer 116

True Rationale: A special distribution called the normal distribution (a bell shaped curve) is symmetric, unimodal, and not very peaked

Answer 117

D. Range Rationale: The range is calculated by subtracting the lowest value of data from the highest value of data. The mode refers to the most frequently occurring score. The median refers to the point in a distribution above which and below which 50% of the cases fall. The mean is the sum of all the scores divided by the total number of scores.

Answer 118

True Rationale: For a correlation coefficient, the greater the absolute value of the coefficient, the stronger the relationship. So, the absolute value of −.38 is greater than the absolute value of +.32 and thus is stronger.

Answer 119

B. Chi Squared Test Rationale: The chi-squared test evaluates the difference in proportions in categories within a contingency table, comparing the observed frequencies with the expected frequencies. Pearson’s r tests that the relationship between two variables is not zero. The t-test evaluates the difference between two means. The ANOVA tests the difference between more than two means.

(153 cards)