Midterm 2 - Chapter 12 Flashcards

1
Q

When do we need stats in the research process?

A

After collecting data - need to summarize & communicate findings

2
Q

2 types of stats:

A
  1. Descriptive Statistics
  2. Inferential Statistics
3
Q

How should we most efficiently present research:

A

Want to convey maximum information using minimum space

4
Q

Purpose of descriptive statistics?

A

Summarizes mass of data points
- Understanding and interpretation
- Visual displays, appropriate calculations

In experiments, can calculate within each condition/group:
- Mean, standard deviation…

In correlation designs
- For each variable, calculate mean, standard deviations, etc
- For every pair of variables, calculate a correlation coefficient (also a descriptive statistic)

5
Q

3 Types of Descriptive Statistics

A
  1. Measures of Central Tendency (Mean, Median, Mode)
  2. Measures of Variability (Range, Variance, Standard Deviation)
  3. Measures of Relationship (Correlation, Multiple Regression, Multiple Correlation, Partial Correlation)
6
Q

Scales of Measures

A
  • Nominal
  • Ordinal
  • Interval
  • Ratio
7
Q

Nominal

A

Group or categorization
- No order or direction
- Summarized by proportion/percentages or the mode

8
Q

Ordinal

A

Ranked order (1st, 2nd, 3rd, ...)
- Uneven spaces between “scores”

9
Q

Interval

A

Numerical scales in which intervals have the same interpretation throughout but there is no true zero (e.g., temperature in Celsius: 0 degrees still indicates a temperature)

10
Q

Ratio

A

An interval scale with a true zero reference point (e.g. 0 pounds)
- Summarized with the mean or median and standard deviation

11
Q

Measures of central tendency

A
  • Describe what’s happening at middle of data
  • What’s “normal”?
12
Q

3 measures of central tendency:

A

Mean, Median, Mode

13
Q

Mean

A

= arithmetic average

  • What we usually use!
  • Uses information from every single score
  • Add up all scores in each group and divide by the
    number of scores in each group
14
Q

Downsides of Mean:

A

□ Affected by outliers (i.e., extreme scores)

15
Q

Upsides of mean:

A

□ With increasing sample size, each extreme score has less effect on the mean.
□ Maximizes use of all of our data.
□ Has mathematical properties that enable us to
use it in statistical analysis.

16
Q

Outliers - With increasing sample size, the mean is

A

Less affected by outliers

□ Main idea here - check for outliers if you only have a small sample, but try to get a large sample
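
A small sketch of the sample-size point, using made-up scores (the helper name `shift_from_outlier` is hypothetical): the same extreme score moves the mean far less when the sample is larger.

```python
from statistics import mean

def shift_from_outlier(scores, outlier):
    """Return how far the mean moves when one extreme score is added."""
    return mean(scores + [outlier]) - mean(scores)

small = [5] * 4    # 4 made-up scores, all equal to 5
large = [5] * 99   # 99 made-up scores, all equal to 5

# Adding a score of 105 moves the small-sample mean from 5 to 25,
# but moves the large-sample mean only from 5 to 6.
print(shift_from_outlier(small, 105))
print(shift_from_outlier(large, 105))
```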

17
Q

Median

A

= score that divides group in half

  • 50% of the scores above, 50% below
  • Used if there are extreme scores (outliers)
18
Q

How to find median:

A

Put scores in order. Count number of scores.

If odd #: identify the middlemost score.

If even #: identify the two middle scores, take the average of them.
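
The steps above can be sketched in a few lines. This is only an illustration with made-up incomes (in thousands); Python's built-in `statistics.median` does the same job.

```python
def median(scores):
    """Middle score of an ordered list; average of the two middle scores if the count is even."""
    ordered = sorted(scores)          # put scores in order
    n = len(ordered)                  # count number of scores
    mid = n // 2
    if n % 2 == 1:                    # odd #: the middlemost score
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even #: average the two middle scores

incomes = [30, 35, 40, 45, 1000]      # one extreme score (outlier)
print(median(incomes))                # 40, unaffected by the outlier
print(sum(incomes) / len(incomes))    # 230.0, the mean is pulled upward
```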

19
Q

When is median useful?

A

Whenever it’s most descriptively informative to report the value for which equal numbers of people score higher
and lower (e.g. income)

  • Also, when you can spot an outlier
20
Q

Mode

A

= most frequently occurring score

  • Sometimes no mode; sometimes more than one
  • Usually used for nominal or ordinal variables
  • Put the scores in order – look for most frequently occurring score(s)
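
A minimal sketch of finding the mode(s). The function name `modes` and the choice to report "no mode" when every score occurs exactly once are assumptions made for this illustration.

```python
from collections import Counter

def modes(scores):
    """Return a sorted list of the most frequently occurring score(s); empty if there is no mode."""
    if not scores:
        return []
    counts = Counter(scores)
    top = max(counts.values())
    if top == 1:
        return []                     # every score occurs once: treated here as no mode
    return sorted(s for s, c in counts.items() if c == top)

print(modes([1, 2, 2, 3]))      # [2]
print(modes([1, 1, 2, 2, 3]))   # [1, 2], more than one mode
print(modes([1, 2, 3]))         # [], no mode
```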
21
Q

When is mode useful?

A

Whenever it’s most descriptively informative to
report the most frequently occurring score (e.g.: employee salary distribution)

22
Q

Measures of Spread:

A
  • Variability
  • Standard Deviation
23
Q

Variability:

A

The spread in a distribution of scores

AMOUNT of spread is often measured by Standard Deviation

How much each score deviates from the mean

24
Q

Possible Measures of Variability

A

□ Range (max – min)
□ Variance
□ Standard Deviation

25
Issues with range
can be too simplistic
26
Issues with variance:
Not very descriptive (it is expressed in squared units, so it is harder to interpret than the standard deviation)
27
Standard Deviation
□ A measure of variability that enables reference to the Normal Distribution, so it’s meaningful (as opposed to variance).
□ Defines what’s “normal” for that variable
28
How to calculate variance:
- Find out how much each score deviates from the mean (e.g., with a mean of 5, a score of 0 deviates by 5)
- Square each deviation
- Add up the squared deviations
- Divide by the NUMBER of scores MINUS 1 (how many scores there are, not their sum; e.g., for the scores 7 and 3, divide by 2 - 1 = 1, not by 10)
29
Percentage of scores in a normal distribution: one half of the bell curve, within +/- 1 SD, within +/- 2 SD, within +/- 3 SD
- 50%
- 68%
- 95%
- 99.7%
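
These percentages can be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (3.8 or later); the helper name `within` is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

def within(k):
    """Proportion of scores falling within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/- {k} SD: {within(k):.1%}")   # roughly 68%, 95%, 99.7%
```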
30
How to calculate Standard Deviation:
SQUARE ROOT of variance
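
A short sketch of the variance and standard deviation steps, using the two scores 7 and 3 as data. The function names are made up for this example; Python's `statistics.variance` and `statistics.stdev` compute the same n - 1 versions.

```python
from math import sqrt

def sample_variance(scores):
    """Sum of squared deviations from the mean, divided by (number of scores - 1)."""
    n = len(scores)
    mean = sum(scores) / n
    squared_deviations = [(x - mean) ** 2 for x in scores]
    return sum(squared_deviations) / (n - 1)

def sample_sd(scores):
    """Standard deviation: the square root of the variance."""
    return sqrt(sample_variance(scores))

scores = [7, 3]                  # mean = 5, deviations = +2 and -2
print(sample_variance(scores))   # (4 + 4) / (2 - 1) = 8.0
print(sample_sd(scores))         # square root of 8
```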
31
Measures of Relationship (third type of descriptive stats)
- Correlation
- Multiple Regression
- Multiple Correlation
- Partial Correlation
32
Correlation (r) and r-squared
□ +/- = “direction” of relationship: positive or negative?
□ Number = “strength” of relationship (how closely is one set of scores associated with the other set?)
□ For linear relationships only!
- r = 0 could mean no relationship OR a non-linear relationship!
33
Correlations - Restriction of range
Correlations can be misleading if the full range on both variables is not measured
34
Multiple Correlation (R) and R-squared
Expressing correlation as a percentage:
- r² = a proportion
- r² = .28 means that 28% of the variance in scores is shared by the MC and written portions
- r² = .28 means that 28% of the variance in MC scores is predicted by written scores, and vice versa
R-squared measures the *proportion of variance* in the dependent variable that can be explained by the independent variables in a regression model.
35
Amount of Shared Variance
If r² = 0, there is NO shared variance! If r² = 1, there is 100% shared variance (r² ranges from 0 to 1 only)
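
A minimal sketch of computing Pearson's r and the shared variance (r squared) by hand. The data are invented for illustration and `pearson_r` is a hypothetical helper, not a library function.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]     # made-up predictor scores
grades = [2, 4, 5, 4, 5]    # made-up outcome scores

r = pearson_r(hours, grades)
print(f"r = {r:.2f}")             # positive sign: positive relationship
print(f"r squared = {r * r:.2f}") # 0.60, i.e. 60% shared variance
```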
36
READINGS
37
In which type of scale are the intervals between each rank order NOT equal?
Ordinal scale
38
In which type of scale ARE the intervals between each rank order equal?
Interval Scale (differences between 90-95 are the same as 115-120)
39
When it's difficult to know whether an ordinal or interval scale is being used, what should we do?
Treat the variables as an interval scale, because when ordinal scales are averaged across many instances, they take on properties similar to an interval scale
40
Data measured on __ and __ scales can be summarized using __
interval; ratio; MEAN
41
Variables measured on interval and ratio scale are often referred to as...
continuous variables - the values represent an underlying continuum
42
Interval and ratio scales can be treated the same way..
Statistically
43
Why should we first explore variables separately?
Allows us to get a sense for what the data for each of our variables look like and also identify any possible errors that might have occurred during data collection
44
Graphing frequency distributions
Frequency Distribution: indicates the number of participants who select each possible category/scale value on a variable
EX: a poll asks 100 people how many pets they have. They find that 38 people have no pets, 25 have one pet, 17 have two pets, 6 have three pets, and 14 have four or more pets.
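
The pet poll can be laid out as a simple text frequency distribution. This is only a sketch; `frequency_table` is a made-up helper, and the bar scaling (one `#` per two people) is an arbitrary display choice.

```python
def frequency_table(counts):
    """Return one display line per category: a text bar, the count, and the percentage."""
    total = sum(counts.values())
    lines = []
    for category, count in counts.items():
        bar = "#" * (count // 2)   # one '#' per 2 people, purely for display
        lines.append(f"{category:>2} pets | {bar} {count} ({count / total:.0%})")
    return lines

pets = {"0": 38, "1": 25, "2": 17, "3": 6, "4+": 14}   # counts from the poll example
for line in frequency_table(pets):
    print(line)
```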
45
Pros of Graphing frequency distributions:
- See what scores are most common/uncommon
- See shape of distribution
- Identify OUTLIERS: scores that are unusual, unexpected, very different from scores of other participants
46
Bar Graph:
Uses a separate and distinct bar for each piece of information
Used for comparing group means and percentages
47
Types of Frequency Distributions:
- Bar Graphs
- Pie Charts
- Histograms
- Frequency Polygons
48
Pie Charts
Divide a whole circle into slices that represent relative percentages
Useful in representing data on a nominal scale
49
Histograms:
Uses bars to display a frequency distribution for a continuous variable (e.g. continuous, increasing amounts of a variable)
50
How do histograms differ from bar graphs:
- Histograms: bars touch each other, reflecting a continuous variable (i.e., on the x-axis)
- Bar Graph: gaps between each bar, helping communicate that values on the x-axis are nominal categories
51
Normal Distribution
A distribution of scores that is frequently observed, and rather important for stats
- Majority of scores cluster around the mean
- Only possible for continuous variables (interval or ratio)
52
Standard Deviation:
How scores spread out from the mean, on average
53
Breakdown of normal distribution/deviation:
- About 68% of scores fall within 1 standard deviation above and below the mean
- About 95% fall within 2 standard deviations above/below the mean
54
Frequency Polygons:
Alternative to histograms: use a line to represent frequencies for continuous variables
Helpful when you want to examine frequencies for multiple groups simultaneously
55
Descriptive Statistics:
Calculating statistics to describe or summarize our data
56
2 main types of descriptive stats:
1: measures of central tendency - capture how participants scored overall, across the entire sample
2: measures of variability - how different the scores are from each other, or how widely they're spread out or distributed
57
Central Tendency
Tells us what the scores are like as a whole, or how people scored on average:
- Mean (represented by X̄ in calculations, M in reports)
- Median (Mdn in scientific reports)
- Mode (for variables that employ an interval, ratio, or ordinal scale)
58
Variability
Characterizes the amount of spread in a distribution of scores, for continuous variables
59
Comparing Group Percentages
e.g., wanting to know how groups differ in the ways they respond to questions
- Can calculate percentages for each group and compare
60
Comparing group means
e.g. wanting to see how groups, on average, responded and comparing these numbers
61
Graphing Nominal Data
Common way to graph relationships between variables when one variable is nominal is to use a bar graph or line graph
62
When are bar graphs used compared to line graphs?
- Bar graphs: when values on the x-axis are nominal
- Line graphs: when values on the x-axis are numeric
63
Describing Effect-Size Between Two Groups
Effect-Size: describing relationships among variables in terms of size, amount, or strength; helps determine how large effects are
64
Effect Size - Cohen's d:
Cohen's d: comparing two groups on their responses to a continuous variable
- Difference in means between two groups, standardized by expressing it in units of standard deviation
- In a true experiment, the Cohen's d value describes the magnitude of the effect of the IV on the DV
- When studying naturally occurring groups, describes the magnitude of the effect of group membership on a continuous variable
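
A hedged sketch of the calculation. The card only says the mean difference is expressed in standard deviation units; using the pooled standard deviation of the two groups as that unit is an assumption here (it is one common convention), and the scores are invented.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group1, group2):
    """Mean difference divided by the pooled standard deviation (assumed convention)."""
    n1, n2 = len(group1), len(group2)
    pooled_variance = (
        (n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)
    ) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_variance)

treatment = [6, 7, 8, 9, 10]   # made-up scores, mean 8
control = [4, 5, 6, 7, 8]      # made-up scores, mean 6

# The groups differ by 2 points, which is about 1.26 pooled standard deviations.
print(round(cohens_d(treatment, control), 2))
```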
65
Smallest possible value for Cohen's d:
0 (no effect); there is no maximum value
66
Different analyses are needed when you don't have distinct groups you wish to compare, but rather..
have a range of scores to investigate in terms of their relationship with other scores
67
What data is appropriate for correlational designs?
Correlation coefficient: statistic describing whether, how, and how much two variables relate to one another (many different types)
68
Pearson R Correlation Coefficient
- r ranges from -1 to +1 (NOT a percentage or probability)
- The sign tells us the direction of the relationship; the absolute value (0 to 1) tells us its strength
69
How can R be graphed - Scatterplots
- Scatterplot: each pair of scores is plotted as a single point in a graph
- Perfect relationships = perfectly diagonal lines (however, remember measurement errors!)
- Whenever relationships aren't perfect, if you know a person's score on the first variable, you can't perfectly predict what that person's score will be on the second variable
70
Pros of scatterplots:
- Provide ways of seeing how variables relate to one another
- Allow researchers to detect outliers
71
Important Considerations:
- Restriction of Range
- Curvilinear Relationship
72
Important Considerations - Restriction of Range:
- If the full range of possible scores isn't sampled, but instead restricted, the correlation coefficient produced with these data can be misleading
- LESS variability in the scores and thus less variability that can be explained or predicted by the other variable
- The issue can occur when the people you're sampling are all very similar on one or both of the variables you are studying
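
A small demonstration with invented data: the same relationship looks strong across the full range of X but nearly vanishes when only a narrow slice of X is sampled. `pearson_r` is a hypothetical helper written out for this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up scores: Y rises with X, plus a fixed wobble of +3 / -3.
xs = list(range(20))
ys = [x + (3 if x % 2 == 0 else -3) for x in xs]

full_r = pearson_r(xs, ys)

# Restrict the sample to people who are very similar on X (scores 8 to 11).
pairs = [(x, y) for x, y in zip(xs, ys) if 8 <= x <= 11]
restricted_r = pearson_r([x for x, _ in pairs], [y for _, y in pairs])

print(f"full range:       r = {full_r:.2f}")        # strong, about 0.88
print(f"restricted range: r = {restricted_r:.2f}")  # near zero
```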
73
Important Considerations - Curvilinear Relationship
Pearson correlation is only designed to detect linear relationships: if the relationship is not linear but curvilinear, the correlation coefficient will fail to detect this relationship
Another type of statistic must be used to determine the strength of the relationship
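
To illustrate with invented data: below, Y is completely determined by X (Y = X squared), yet Pearson's r comes out as zero because the relationship is U-shaped rather than linear. `pearson_r` is a hypothetical helper written out for this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]    # perfect U-shaped (curvilinear) relationship

print(pearson_r(xs, ys))     # 0.0, even though Y depends entirely on X
```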
74
Correlation Coefficients as Effect-Sizes
Correlation coefficients not only allow us to examine relationships between continuous variables, they are also indicators of effect size
75
Correlation Coefficients as Effect-Sizes - Square Value of R
Multiplying r by itself gives a value with a simple interpretation: THE PROPORTION OF VARIANCE BEING EXPLAINED - AKA Squared Correlation Coefficient
76
Regression
Regression: advanced way of examining how variables relate or covary (a statistical technique); analyzes relationships among variables
77
The Regression Equation
Y = a + bX
- Y = criterion variable: score we wish to predict
- X = predictor variable: known score
- a = y-intercept
- b = slope of line
The same as an equation for drawing a straight line: the line that best summarizes all of the data points
Can be used to make specific predictions
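
A minimal sketch of fitting Y = a + bX and using it to predict. It assumes the usual least-squares definition of the best-fitting line, which the card does not spell out; the data and the name `fit_line` are invented.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y = a + bX (assumed fitting method)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    return a, b

xs = [1, 2, 3, 4]    # made-up predictor scores (X)
ys = [3, 5, 7, 9]    # made-up criterion scores (Y)

a, b = fit_line(xs, ys)
predicted = a + b * 5    # predict Y for a new known score X = 5
print(a, b, predicted)   # intercept 1, slope 2, prediction 11
```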
78
Multiple Correlation
(Symbolized as R, distinguished from Pearson r)
Provides the correlation between a combined set of predictor variables and a single criterion variable (as any phenomenon is likely determined by many factors, accounting for these permits greater accuracy of prediction)
79
Squared Multiple Correlation Coefficient
R squared can be interpreted in the same way as the Squared Correlation Coefficient (r squared)
R squared tells you the proportion of variability in the criterion variable that is accounted for by the combined set of predictor variables
80
Regression is more powerful than correlation because...
it can be expanded to accommodate more than one predictor of the criterion variable. This expanded model is AKA multiple regression, and it allows us to examine the unique relationship between each predictor and the criterion
This is in contrast to multiple correlation, which only provides a single value for the relationship between the combined set of predictors and the criterion variable
81
Order of Interpreting Data Analysis:
1. Correlation Coefficient
2. Multiple Correlation
3. Multiple Regression
82
What technique helps address the third variable problem?
Partial Correlation: provides a way of statistically controlling for possible third variables in correlational designs
Estimates what the correlation between the two primary variables would be if the third variable were held constant (in other words, if everyone responded to this third variable in the exact same way)
83
What can you do with a calculated partial correlation?
You can compare it with the original correlation to see whether the third variable was influencing the original relationship
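
A sketch of that comparison using the standard first-order partial correlation formula. The three input correlations are invented, and the names (`partial_r`, `r_xy`, `r_xz`, `r_yz`) are labels chosen for this example.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between X and Y with the third variable Z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

original = 0.50                               # made-up correlation between X and Y
controlled = partial_r(original, 0.60, 0.70)  # Z correlates .60 with X and .70 with Y

# The drop from .50 to about .14 suggests that Z was driving much of the
# original relationship in this invented example.
print(f"original r = {original:.2f}, partial r = {controlled:.2f}")
```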
84
Advanced Modelling Techniques:
- Structural Equation Modelling (SEM): examines models (an expected pattern of relationships among numerous different variables) that specify a set of relationships among many variables