Midterm 2 - Chapter 12 Flashcards
When do we need stats in the research process?
After collecting data - need to summarize & communicate findings
2 types of stats:
- Descriptive Statistics
- Inferential Statistics
How should we most efficiently present research:
Want to convey maximum information using minimum space
Purpose of descriptive statistics?
Summarizes mass of data points
- Understanding and interpretation
- Visual displays, appropriate calculations
In experiments, calculate statistics within each
- condition/group
- Mean, standard deviation, etc.
In correlation designs
- For each variable, calculate mean, standard deviations, etc
- For every pair of variables, calculate a correlation coefficient (also a descriptive statistic)
3 Types of Descriptive Statistics
- Measures of Central Tendency (Mean, Median, Mode)
- Measures of Variability (Range, Variance, Standard Deviation)
- Measures of Relationship (Correlation, Multiple Regression, Multiple Correlation, Partial Correlation)
Scales of Measures
- Nominal
- Ordinal
- Interval
- Ratio
Nominal
Group or categorization
- No order or direction
- Summarized by proportion/percentages or the mode
Ordinal
Ranked order (1st, 2nd, 3rd..)
- Uneven spaces between “scores”
Interval
Numerical scales in which intervals have the same interpretation throughout, but with no true zero (e.g. temperature in Celsius: 0 degrees still indicates a temperature, not the absence of one)
Ratio
An interval scale with a true zero reference point (e.g. 0 pounds)
- Summarized with the mean or median and standard deviation
Measures of central tendency
- Describe what’s happening at middle of data
- What’s “normal”?
3 measures of central tendency:
Mean, Median, Mode
Mean
= arithmetic average
- What we usually use!
- Uses information from every single score
- Add up all scores in each group and divide by the
number of scores in each group
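A minimal Python sketch of that calculation (the scores are made-up example values):
```python
scores = [4, 8, 6, 5, 7]          # hypothetical scores for one group

mean = sum(scores) / len(scores)  # add up all scores, divide by how many there are
print(mean)                       # 6.0
```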
Downsides of Mean:
□ Affected by outliers (i.e., extreme scores)
Upsides of mean:
□ With increasing sample size, each extreme score has less effect on the mean.
□ Maximizes use of all of our data.
□ Has mathematical properties that enable us to
use it in statistical analysis.
Outliers - With increasing sample size, the mean is
Less affected by outliers
□ Main idea here - check for outliers if you only have a small sample, but try to get a large sample
Median
= score that divides group in half
- 50% of the scores above, 50% below
- Used if there are extreme scores (outliers)
How to find median:
Put scores in order. Count number of scores.
If odd #: identify the middlemost score.
If even#: identify two middle scores, take average of them.
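A minimal Python sketch of those steps (the example scores are made up):
```python
def median(scores):
    ordered = sorted(scores)          # put scores in order
    n = len(ordered)                  # count number of scores
    mid = n // 2
    if n % 2 == 1:                    # odd: identify the middlemost score
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even: average the two middle scores

print(median([3, 1, 7, 5, 9]))    # 5
print(median([3, 1, 7, 5]))       # 4.0
```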
When is median useful?
Whenever it’s most descriptively informative to report the value for which equal numbers of people score higher
and lower (e.g. income)
- Also, when you can spot an outlier
Mode
= most frequently occurring score
- Sometimes no mode; sometimes more than one
- Usually used for nominal or ordinal variables
- Put the scores in order – look for most frequently occurring score(s)
- May be none, or more than one
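A short Python sketch of finding the mode(s), returning none or more than one (scores are hypothetical):
```python
from collections import Counter

def modes(scores):
    counts = Counter(scores)                 # frequency of each score
    top = max(counts.values())
    if top == 1:                             # every score occurs once: no mode
        return []
    return [s for s, c in counts.items() if c == top]   # may be more than one

print(modes([2, 4, 4, 6, 7, 7]))   # [4, 7]
print(modes([1, 2, 3]))            # []  (no mode)
```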
When is mode useful?
Whenever it’s most descriptively informative to
report the most frequently occurring score (e.g.: employee salary distribution)
Measures of Spread:
- Variability
- Standard Deviation
Variability:
The spread in a distribution of scores
AMOUNT of spread is often measured by Standard Deviation
How much each score deviates from the mean
Possible Measures of Variability
□ Range (max – min)
□ Variance
□ Standard Deviation
Issues with range
can be too simplistic
Issues with variance:
expressed in squared units, so not very descriptive on its own
Standard Deviation
□ A measure of variability that enables reference to the Normal Distribution so it’s meaningful (as opposed to variance).
□ Defines what’s “normal” for that variable
How to calculate variance:
- Find out how much each score deviates from the mean (e.g., with a mean of 5, a score of 0 deviates by 5)
- Square each deviation
- Add up these squared deviations
- Divide by the TOTAL number of scores MINUS 1 (the count of scores, not their sum: for the scores 7 and 3 there are 2 scores, so divide by 2 - 1 = 1)
Percentage of scores in one half of the bell curve / within +/- 1 SD / within +/- 2 SD / within +/- 3 SD:
- 50%
- 68%
- 95%
- 99.7%
How to calculate Standard Deviation:
SQUARE ROOT of variance
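A minimal Python sketch of the variance and standard deviation calculations described above (sample variance, dividing by n - 1; the scores are made up):
```python
import math

scores = [7, 3, 5, 9, 6]
mean = sum(scores) / len(scores)                 # 6.0

deviations = [x - mean for x in scores]          # how much each score deviates from the mean
squared = [d ** 2 for d in deviations]           # square each deviation
variance = sum(squared) / (len(scores) - 1)      # divide by n - 1 (count of scores minus 1)
sd = math.sqrt(variance)                         # standard deviation = square root of variance

print(variance, sd)                              # 5.0 2.236...
```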
Measures of Relationship (the third type of descriptive statistics):
- Correlation
- Multiple Regression
- Multiple Correlation
- Partial Correlation
Correlation (r) and r-squared
□ +/- sign = “direction” of relationship (positive or negative)
□ Number (absolute value) = “strength” of relationship
(How closely is one set of scores associated with the other set?)
For linear relationships!!
- r = 0 could mean no relationship OR a non-linear relationship!
Correlations - Restriction of range
Correlations can be misleading if the full range on both variables is not measured
Squared Correlation Coefficient (r²)
Expressing correlation as a percentage:
- r2 = a proportion
- r2 = .28 means that 28% of the variance in scores is shared by MC and written portions.
- r2 = .28 means that 28% of the variance in MC scores is predicted by written scores & vice versa
Measures the proportion of variance in the dependent variable that can be explained by the independent variable in a regression model.
Amount of Shared Variance
If r2 = 0, there is NO shared variance!
If r2 = 1, there is 100% shared variance (goes from 0-1 only)
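A minimal sketch computing r and r² with numpy, using made-up multiple-choice and written exam scores in the spirit of the example above:
```python
import numpy as np

# hypothetical paired scores: multiple-choice vs. written exam
mc      = np.array([55, 60, 65, 70, 80, 85])
written = np.array([50, 58, 62, 75, 78, 90])

r = np.corrcoef(mc, written)[0, 1]   # Pearson correlation coefficient
print(round(r, 3))                   # sign = direction, magnitude = strength
print(round(r ** 2, 3))              # r²: proportion of shared variance (0 to 1)
```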
READINGS
In which type of scale are the intervals between each rank order NOT equal?
Ordinal scale
In which type of scale ARE the intervals between each rank order equal?
Interval Scale (differences between 90-95 are the same as 115-120)
When it’s difficult to know whether an ordinal or interval scale is being used, what should we do?
Treat the variable as an interval scale, because when ordinal scales are averaged across many instances, they take on properties similar to an interval scale
Data measured on __ and __ scales can be summarized using __
interval; ratio; MEAN
Variables measured on interval and ratio scale are often referred to as…
continuous variables - the values represent an underlying continuum
Interval and ratio scales can be treated the same way..
Statistically
Why should we first explore variables separately?
Allows us to get a sense for what the data for each of our variables look like and also identify any possible errors that might have occurred during data collection
Graphing frequency distributions
Frequency Distribution: indicates the number of participants who selected each possible category/value of a variable
EX: a poll asks 100 people how many pets they have. They find that 38 people have no pets, 25 have one pet, 17 have two pets, 6 have three pets, and 14 have four or more pets.
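A minimal Python sketch of tallying a frequency distribution, with made-up responses in the spirit of the pet poll above:
```python
from collections import Counter

# hypothetical responses: number of pets reported by each participant
responses = [0, 1, 0, 2, 4, 1, 0, 3, 2, 1]

frequencies = Counter(responses)               # count how many participants gave each answer
for value, count in sorted(frequencies.items()):
    print(value, count)                        # e.g. "0 3" means three people have no pets
```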
Pros of Graphing frequency distributions:
- See what scores are most common/uncommon
- See shape of distribution
- Identify OUTLIERS: scores that are unusual, unexpected, very different from scores of other participants
Bar Graph:
Uses a separate and distinct bar for each piece of information
Used for comparing group means and percentages
Types of Frequency Distributions:
- Bar Graphs
- Pie Charts
- Histograms
- Frequency Polygons
Pie Charts
Divide a whole circle that represents relative percentages
Useful in representing data on a nominal scale
Histograms:
Uses bars to display a frequency distribution for a continuous variable (i.e., increasing amounts of a variable on the x-axis)
How do histograms differ from bar graphs:
- Histograms: bars touch each other, reflecting a continuous variable on the x-axis
- Bar Graph: gaps between each bar, helping communicate that values on x-axis are nominal categories
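A quick matplotlib sketch contrasting the two; the group names and scores are made up:
```python
import matplotlib.pyplot as plt

# bar graph: nominal categories on the x-axis, so the bars have gaps
groups = ["Control", "Drug A", "Drug B"]
means = [5.2, 6.8, 7.4]                        # hypothetical group means
plt.bar(groups, means)
plt.ylabel("Mean score")
plt.show()

# histogram: a continuous variable on the x-axis, so the bars touch
scores = [62, 65, 67, 70, 71, 73, 75, 75, 78, 80, 84, 90]   # hypothetical scores
plt.hist(scores, bins=6)
plt.xlabel("Score")
plt.ylabel("Frequency")
plt.show()
```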
Normal Distribution
A distribution of scores that is frequently observed, and rather important for stats
Majority of scores cluster around the mean
Only possible for continuous variables (interval or ratio)
Standard Deviation:
How scores spread out from the mean, on average
Breakdown of normal distribution/deviation:
- 68%: fall within 1 standard deviation above and below the mean
- 95% fall within 2 standard deviations above/below mean
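A quick simulation-based check of those percentages with numpy (the simulated scores and their mean/SD are arbitrary):
```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(loc=100, scale=15, size=100_000)   # simulated normally distributed scores

mean, sd = scores.mean(), scores.std()
within_1 = np.mean(np.abs(scores - mean) <= 1 * sd)    # ~0.68
within_2 = np.mean(np.abs(scores - mean) <= 2 * sd)    # ~0.95
print(round(within_1, 2), round(within_2, 2))
```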
Frequency Polygons:
Alternative to histograms - use a line to represent frequencies for continuous variables
Helpful when you want to examine frequencies for multiple groups simultaneously
Descriptive Statistics:
Calculating statistics to describe or summarize our data
2 main types of descriptive stats:
1: measures of central tendency - capture how participants scored overall, across the entire sample
2: measures of variability - how differently the scores are from each other, or how widely they’re spread out or distributed
Central Tendency
Tells us what the scores are like as a whole, or how people scored on average:
- Mean (represented by X̄ ("X-bar") in calculations, M in scientific reports)
- Median (Mdn in scientific reports)
- Mode (can be used with any scale, and the only measure appropriate for nominal variables)
Variability
Characterizes the amount of spread in a distribution of scores, for continuous variables
Comparing Group Percentages
e.g. - wanting to know how groups differ in the ways they respond to questions
- Can calculate percentages for each group and compare
Comparing group means
e.g. wanting to see how groups, on average, responded and comparing these numbers
Graphing Nominal Data
A common way to graph relationships between variables when one variable is nominal is to use a bar graph or line graph
When are bar graphs used compared to line graphs?
Bar graphs - when values on x-axis are nominal
Line graphs - when values on x axis are numeric
Describing Effect-Size Between Two Groups
Effect-Size: describing relationships among variables in terms of size, amount, or strength; helps determine how large effects are
Effect Size - Cohen’s d:
Cohen’s d: comparing two groups on their responses to a continuous variable
- Difference in means between two groups, standardized by expressing it in units of standard deviation
- In a true experiment, the Cohen’s d value describes the magnitude of the effect of the IV on the DV
- When studying naturally occurring groups, describes magnitude of effect of group membership on a continuous variable
Smallest possible value for Cohen’s d:
0 - no effect, no max value
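A minimal sketch of computing Cohen's d as the mean difference standardized by the pooled standard deviation (one common convention); the group scores are hypothetical:
```python
import math

def cohens_d(group1, group2):
    """Difference in means, expressed in units of the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    var1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    var2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

print(cohens_d([8, 9, 7, 10, 6], [5, 6, 4, 7, 5]))   # hypothetical treatment vs. control scores
```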
Different analyses are needed when you don’t have distinct groups you wish to compare, but rather..
have a range of scores to investigate in terms of their relationship with other scores
What data is appropriate for correlational designs?
Correlation coefficient: statistic describing whether, how, and how much two variables relate to one another (many different types)
Pearson R Correlation Coefficient
- r ranges from -1 to +1 (NOT a percentage or probability)
- The sign tells us the direction of the relationship; the absolute value tells us its strength
How can r be graphed? Scatterplots
- Scatterplot: each pair of scores is plotted as a single point in a graph
- Perfect relationships = perfectly diagonal lines (however, remember measurement errors!)
- Whenever relationships aren’t perfect, if you know a person’s score on the first variable, you can’t perfectly predict what that person’s score will be on the second variable
Pros of scatterplots:
- Provide ways of seeing how variables relate to one another
- Allow researchers to detect outliers
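A minimal matplotlib sketch of a scatterplot; the variable names and scores are hypothetical:
```python
import matplotlib.pyplot as plt

# hypothetical paired scores: each participant contributes one (x, y) point
hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
exam_score    = [55, 60, 58, 68, 72, 75, 80, 78]

plt.scatter(hours_studied, exam_score)   # one point per pair of scores
plt.xlabel("Hours studied")
plt.ylabel("Exam score")
plt.show()
```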
Important Considerations:
- Restriction of Range
- Curvilinear Relationship
Important Considerations - Restriction of Range:
- If the full range of possible scores isn’t sampled, but instead restricted, the correlation coefficient produced with these data can be misleading - LESS variability in the scores and thus, less variability that can be explained or predicted by the other variable
- The issue can occur when people you’re sampling are all very similar on one or both of the variables you are studying
Important Considerations - Curvilinear Relationship
Pearson Correlation only designed to detect linear relationships - if relationship is not linear but curvilinear, the correlation coefficient will fail to detect this relationship
Another type of statistic must be used to determine the strength of the relationship
Correlation Coefficients as Effect-Sizes
Correlation coefficients not only allow us to examine relationships between continuous variables, they are also indicators of effect size
Correlation Coefficients as Effect-Sizes - Square Value of R
By multiplying r by itself (r²), we get a simple interpretation: THE PROPORTION OF VARIANCE BEING EXPLAINED - AKA the Squared Correlation Coefficient
Regression
Regression: advanced way of examining how variables relate or covary (a statistical technique); analyzes relationships among variables
The Regression Equation
Y = a + bX
(Y = criterion variable: score we wish to predict, X = predictor variable: known score, a = y-intercept, b = slope of line)
The same as an equation for drawing a straight line - the line that best summarizes all of the data points
Can be used to make specific predictions
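A minimal sketch fitting Y = a + bX with numpy and using it to make a specific prediction; the data values are made up:
```python
import numpy as np

# hypothetical data: predictor X and criterion Y
X = np.array([1, 2, 3, 4, 5, 6])
Y = np.array([3.1, 4.9, 7.2, 9.1, 10.8, 13.2])

b, a = np.polyfit(X, Y, deg=1)     # slope (b) and y-intercept (a) of the best-fitting line
print(round(a, 2), round(b, 2))

x_new = 7
prediction = a + b * x_new         # Y = a + bX, used to predict a new criterion score
print(round(prediction, 2))
```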
Multiple Correlation
(symbolized as R, distinguished from Pearson r)
Provides correlation between a combined set of predictor variables and a single criterion variable (as any phenomenon is likely determined by many factors, accounting for these permits a greater accuracy of prediction)
Squared Multiple Correlation Coefficient
R squared can be interpreted in the same way as the Squared Correlation Coefficient (r squared)
R squared tells you the proportion of variability in the criterion variable that is accounted for by the combined set of predictor variables
Regression is more powerful than correlation because…
it can be expanded to accommodate more than one predictor of the criterion variable - this expanded model is AKA multiple regression, which allows us to examine the unique relationship between each predictor and the criterion
This is in contrast to multiple correlation, which only provides a single value for the relationship between the combined set of predictors and the criterion variable
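A minimal sketch of a two-predictor regression and its R² using numpy least squares; the predictors, criterion, and values are all made up:
```python
import numpy as np

# hypothetical data: two predictors and one criterion variable
X = np.array([[2, 30], [3, 25], [4, 40], [5, 38], [6, 50], [7, 45]], dtype=float)
y = np.array([10.0, 12.0, 15.0, 16.0, 20.0, 21.0])

X_design = np.column_stack([np.ones(len(y)), X])       # add an intercept column
coefs, *_ = np.linalg.lstsq(X_design, y, rcond=None)   # least-squares regression weights

predicted = X_design @ coefs
ss_res = np.sum((y - predicted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot                        # proportion of criterion variance explained
print(coefs, round(r_squared, 3))
```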
Order of Interpreting Data Analysis:
- Correlation Coefficient
- Multiple Correlation
- Multiple Regression
What technique helps address the third variable problem?
Partial Correlation: provides a way of statistically controlling for possible third variables in correlational designs
Estimates what the correlation between the two primary variables would be if the third variable were held constant - in other words, if everyone responded to this third variable in the exact same way
What can you do with a calculated partial correlation?
With a calculated partial correlation, you can compare with the original correlation to see if the third variable was influencing the original relationship
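A sketch of the standard formula for a first-order partial correlation (the two primary variables x and y, controlling for z); the variable names and scores are hypothetical:
```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y, statistically controlling for z."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# hypothetical scores for two primary variables and a possible third variable
x = np.array([2, 4, 5, 7, 8, 10])
y = np.array([1, 3, 6, 6, 9, 11])
z = np.array([3, 4, 4, 6, 7, 9])

print(round(partial_corr(x, y, z), 3))   # compare with the original np.corrcoef(x, y)[0, 1]
```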
Advanced Modelling Techniques:
- Structural Equation Modelling (SEM): examines models that specify a set of relationships among many variables (a model = an expected pattern of relationships among numerous different variables)