exam 3 Flashcards
Define ANOVA
Analysis of variance between groups
Why use ANOVA?
To test more than two groups. A t-test compares only the difference between two means, while ANOVA compares the differences among more than two means.
Objective of ANOVA
Test for the difference between the means of two or more groups on one factor/dimension/variable. Each group is only tested once.
Question to ask when using ANOVA
Is there a difference between the means of the different groups? And is this difference more than we would expect by chance?
Hypothesis and ANOVA
Null Hypothesis: There is no difference between groups on variable/measure X except that which is expected by chance.
H0: μ1 = μ2 = μ3
Research Hypothesis: There is a difference between groups on variable X that is more than what is expected by chance; at least one difference is significant. H1: X̄1 ≠ X̄2 ≠ X̄3
Hypotheses and ANOVA notes
- This does not tell us which of the three differences is responsible for the rejection of the null. It could be one of the three, or it could be all three.
- "More than what is expected by chance": we interpret this to mean that it is due to the grouping variable.
Non-directional. All ANOVAs are non-directional.
Types of ANOVA we will use
1) Simple Analysis of Variance/one-way analysis of variance
2) Factorial design
Simple analysis of variance (ANOVA)
Used when there is one factor or one treatment variable, i.e. group membership. It is also called one-way analysis of variance because there is only one grouping dimension.
Factorial design (ANOVA)
Factorial ANOVA (e.g. 3x2): Effect of exercise (high, medium, low impact) on weight loss by gender.
- Factor 1 (independent variable): Treatment (high/medium/low impact exercise)
- Factor 2 (independent variable): Gender (male/female)
- Outcome (dependent variable): Weight loss
- Questions to be answered: Main effect of exercise? Main effect of gender? Interaction between exercise and gender?
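The three questions on this card can be previewed from a table of cell means. The sketch below is only an illustration: the weight-loss numbers are made up, equal cell sizes are assumed, and the helper name `marginal_means` is hypothetical. It describes patterns in the means; whether any effect is significant still requires the F tests.

```python
# Hypothetical cell means (pounds lost) for a 3x2 design; numbers are made up.
cell_means = {
    ("high", "male"): 8.0, ("high", "female"): 10.0,
    ("medium", "male"): 6.0, ("medium", "female"): 6.0,
    ("low", "male"): 4.0, ("low", "female"): 2.0,
}

def marginal_means(cells, position):
    """Average the cell means over the other factor (equal cell sizes assumed)."""
    grouped = {}
    for key, value in cells.items():
        grouped.setdefault(key[position], []).append(value)
    return {level: sum(vals) / len(vals) for level, vals in grouped.items()}

# Main effect of exercise: compare the means of high, medium, and low.
exercise_means = marginal_means(cell_means, 0)

# Main effect of gender: compare the means of male and female.
gender_means = marginal_means(cell_means, 1)

# Interaction: does the female-minus-male gap change across exercise levels?
gender_gap = {
    level: cell_means[(level, "female")] - cell_means[(level, "male")]
    for level in ("high", "medium", "low")
}

print(exercise_means)
print(gender_means)
print(gender_gap)
```

In this made-up table the exercise means differ, the gender means are identical, and the gender gap changes sign across exercise levels, which is the pattern an interaction would produce.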
COMPUTING ANOVA
F test statistic
F = MS Between / MS Within
Logic behind this ratio (test statistic):
- Mean squares are estimates of variance
- Within Group Variance is Due to Chance (individual difference)
- Between Group Variance is due to the grouping category (Independent Variable)
- An increasing F value…
STEPS to compute ANOVA
1) A statement of the null and research hypotheses. Null H0: μ1 = μ2 = μ3. Research H1: X̄1 ≠ X̄2 ≠ X̄3
2) Level of risk: .05 (always)
3) Test statistic: ANOVA, F = MS Between / MS Within
4) Compute the test statistic value
The F ratio definition:
The ratio of variability between groups to variability within groups. To get it, we compute a sum of squares for each source of variability: between groups, within groups, and total.
Explain between groups:
The sum of the squared differences between the mean of all scores and the mean of each group. (How different is each group's mean from the overall mean?)
Explain within groups:
The sum of the squared differences between each individual score in a group and the mean of that group. (How different is each score in a group from the group mean?)
Total in ANOVA means:
The sum of the between-group and within-group sum of squares
Total sum of squares in ANOVA means:
The sum of the between-group and within-group sums of squares
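The three sums of squares can be checked by hand on a small data set. This is a minimal sketch with made-up scores for three groups. It assumes the usual convention that each group's squared deviation from the grand mean is weighted by the group size, which the card above does not spell out.

```python
# Made-up scores for three groups of three people each.
groups = {
    "group_1": [4, 5, 6],
    "group_2": [6, 7, 8],
    "group_3": [8, 9, 10],
}

def mean(values):
    return sum(values) / len(values)

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

# Between groups: how far is each group mean from the grand mean?
# Each squared difference is weighted by the number of scores in the group.
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())

# Within groups: how far is each score from its own group mean?
ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)

# Total: how far is each score from the grand mean?
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

print(ss_between, ss_within, ss_total)
```

Here the between-group and within-group sums of squares add up to the total, as the card states.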
Degrees of freedom definition
approximation of the sample or the group size
There are 2 sets of degrees of freedom for ANOVA
- Between groups: k − 1 (k equals the number of groups), i.e. the number of groups minus 1
- Within groups: N − k (N equals the total sample size), i.e. the total number of people minus the number of groups
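As a rough sketch of how the degrees of freedom feed into the F ratio: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The function name `f_ratio` and the input values (sums of squares of 24 and 6, three groups, nine people) are made up for illustration.

```python
def f_ratio(ss_between, ss_within, k, n_total):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    df_between = k - 1          # number of groups minus 1
    df_within = n_total - k     # total sample size minus number of groups
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within

# Hypothetical inputs: SS between = 24, SS within = 6, 3 groups, 9 people.
f_value, df_between, df_within = f_ratio(24.0, 6.0, k=3, n_total=9)
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```

The obtained F would then be compared with the critical value for those two degrees of freedom at the .05 level.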
Post-hoc ("after the fact") comparisons
Each mean is compared with each other mean; Type I error is controlled by SPSS (Bonferroni).
Use if you get a significant F; it gives you more information on where the differences between groups lie.
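A small sketch of the idea behind the Bonferroni adjustment, assuming the standard form in which the level of risk is divided by the number of pairwise comparisons. The group names are hypothetical, and the way SPSS reports the adjustment in its output may look different.

```python
from itertools import combinations

# Hypothetical group labels for a three-group design.
group_names = ["high", "medium", "low"]

# Every mean is compared with every other mean.
pairs = list(combinations(group_names, 2))

alpha = 0.05
# Bonferroni: split the overall level of risk across all comparisons.
adjusted_alpha = alpha / len(pairs)

for first, second in pairs:
    print(f"{first} vs {second}: test at alpha = {adjusted_alpha:.4f}")
```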
Results interpretation (ANOVA)
State what test you performed, give the results, and then interpret them.
LINEAR REGRESSION
Regression is statistical prediction. Data from past events, such as correlations between variables, are applied to future events given knowledge of only one variable.
Explain prediction
Using a set of previously collected data to calculate how correlated the variables are with one another.
Prediction requirements
We need an already established correlation, and we need data on one of the variables. The higher the coefficient, the more accurate the prediction of one variable from the other based on that correlation.
Purpose
Regression uses our knowledge of relationships between variables to predict the value of one variable from another
regression line
Reflects our best guess as to what score on the Y variable (the dependent variable, e.g. when you will die) would be predicted by a score on the X variable (the independent variable, e.g. the number of cigarettes smoked). It is the line drawn based on the values in the regression equation.
allows us our best guess at estimating
What is another name for the regression line?
The line of best fit, because it minimizes the distance between each individual point and the regression line, which minimizes the error in prediction.
What is an error in prediction
Directly related to correlation, it is the distance between each individual data point and the regression line. Question: What would the line look like if the correlation were perfect?
What is the role of the Regression equation?
It allows us to plot this line correctly.
It is the equation that defines the points and the line that are closest to the actual scores.
Regression equation
Y’ = bX + a
a and b can be calculated from the data
Y’ = dependent variable: the predicted score or criterion
X = independent variable: the score being used as the predictor (you know this value)
b = the slope: the direction and "steepness" of the line
a = the intercept: the point at which the line crosses the y-axis
Table columns: X, Y, X², Y², XY
How to calculate the slope and intercept from your data
b = [∑XY − (∑X × ∑Y) / n] / [∑X² − (∑X)² / n]
a = (∑Y − b∑X) / n
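The slope and intercept formulas can be worked through on a tiny data set. This is a minimal sketch: the five (X, Y) pairs are made up, and the function name `slope_intercept` is hypothetical. It reads the formulas as the whole numerator divided by the whole denominator.

```python
def slope_intercept(xs, ys):
    """Compute b (slope) and a (intercept) from the column sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - (sum_x * sum_y) / n) / (sum_x2 - sum_x ** 2 / n)
    a = (sum_y - b * sum_x) / n
    return b, a

# Made-up data: X is the predictor, Y is the outcome.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b, a = slope_intercept(xs, ys)
predicted = b * 6 + a  # Y' for a new X value of 6
print(f"b = {b:.2f}, a = {a:.2f}, Y' at X = 6: {predicted:.2f}")
```

With these numbers the sums are ∑X = 15, ∑Y = 20, ∑XY = 66, and ∑X² = 55, so b = 6 / 10 = 0.6 and a = (20 − 9) / 5 = 2.2.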
How good is our prediction?
- Look at the absolute value of the correlation our prediction is based on
- Look at the difference between the predicted value (Y’) and the actual value (Y) from the data set, i.e. the error of estimate
What is an error of estimate?
the difference between the predicted value (Y’) and the actual value (Y) from the data set.
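To make the error of estimate concrete, the sketch below computes it for each point in a small made-up data set. The slope (0.6) and intercept (2.2) are hypothetical values assumed for illustration, and the sign convention (actual minus predicted) is also an assumption, since the card only says "difference".

```python
# Hypothetical regression line: Y' = 0.6 * X + 2.2
b, a = 0.6, 2.2

# Made-up data points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

predictions = [b * x + a for x in xs]

# Error of estimate for each point: actual Y minus predicted Y'.
errors = [y - y_hat for y, y_hat in zip(ys, predictions)]

for x, y, y_hat, err in zip(xs, ys, predictions, errors):
    print(f"X = {x}: Y = {y}, Y' = {y_hat:.1f}, error = {err:+.1f}")
```

The smaller these errors are overall, the better the line predicts the actual scores.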
What is a MULTIPLE REGRESSION?
Predicting an outcome from 2 or more variables rather than one.
Multiple regression equation
Y’ = b1X1 + b2X2 + a
X1 = value for the first independent variable
X2 = value for the second independent variable
b1, b2 = regression weight for each variable
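A minimal sketch of using the multiple regression equation to make a prediction, with a separate regression weight for each variable (written `b1` and `b2` here). The weights, the intercept, and the function name `predict` are hypothetical; in practice the weights would come from software such as SPSS.

```python
def predict(x1, x2, b1, b2, a):
    """Predicted score Y' from two independent variables."""
    return b1 * x1 + b2 * x2 + a

# Hypothetical weights and intercept, chosen only for illustration.
b1, b2, a = 0.5, 2.0, 1.0

y_predicted = predict(x1=4, x2=3, b1=b1, b2=b2, a=a)
print(y_predicted)
```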
Rules for the new independent variables
- should be adding something unique to the equation
- should be correlated to dependent variable (should share something in common)
- The 2 (or more) variables should be independent from each other, but related to the outcome variable.
What is a Chi Square?
A non-parametric test that allows you to determine whether what you observe in a distribution of frequencies (e.g. rates, percentages, fair allocation) is what would be expected by chance alone.
Types of Chi-Square
One-sample vs. two-sample test: they differ in the number of dimensions
e.g. average GPA vs. average GPA by college level
How do you compute a chi-square?
You can easily compute what you would expect by chance.
The test compares what is observed in the data with what would be expected by chance.
Chi-square equation
χ² = ∑ (O − E)² / E
∑ = summation sign
χ² = test statistic
O = observed frequency
E = expected frequency (by chance)
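The equation can be sketched in a few lines for a one-sample test. The observed counts below are illustrative, chosen so that the result comes out to the 20.6 used in the results statement at the end of these cards. The expected frequencies assume that chance spreads the total evenly across the categories, and the function name `chi_square` is hypothetical.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Illustrative observed frequencies for three categories.
observed = [23, 17, 50]

# Expected by chance: the total split evenly across the categories.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

obtained = chi_square(observed, expected)
print(f"chi-square = {obtained:.1f}")
```

Each category contributes (O − E)² / E, here 49/30, 169/30, and 400/30, and the contributions are summed to give the obtained value.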
STEPS TO COMPUTE Chi Square
- Null and Research Hypothesis (Words and Symbols)
- Compute the Test Statistic
- Determine if it is significant or not
- Write a 2-3 sentence results section, including a numerical χ² statement.
Hypotheses
Null hypothesis H0: P1 = P2 = P3, where P is the percentage of occurrences in any one category
Research hypothesis H1: P1 ≠ P2 ≠ P3 states that there is a difference in the frequency/proportion of occurrences in each category
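How to compute the test statistic?
Build a table with one row per category and these columns:
Category | Observed (O) | Expected (E) | Difference (D = O − E) | D squared | D squared / E
Total row: the total of the last column is the obtained value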
Significant?
Determine if it is significant or not
Degrees of freedom: r − 1 (r equals the number of rows)
Obtained value exceeds the critical value → significant → reject the null → accept the research hypothesis → results not due to chance
Obtained value below the critical value → non-significant → accept the null → results due to chance
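The decision rule can be sketched as a simple comparison. The critical values below are the standard chi-square table values at the .05 level for 1 to 5 degrees of freedom. The function name `decide` and the example inputs are hypothetical, and in a course setting the critical value would normally be read from the table in the textbook.

```python
# Chi-square critical values at the .05 level, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def decide(obtained, n_rows):
    """Compare the obtained value with the critical value for df = r - 1."""
    df = n_rows - 1
    critical = CRITICAL_05[df]
    if obtained > critical:
        return df, "significant: reject the null, results not due to chance"
    return df, "non-significant: results due to chance"

# Example: an obtained value of 20.6 with three categories (rows).
df, decision = decide(20.6, n_rows=3)
print(f"df = {df}: {decision}")
```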
Write a 2-3 sentence results
In chi-square, the null hypothesis is that what we observed is what we expected by chance.
Do not just say the categories "will be equal"; say they occur at equal rates instead.
The research hypothesis is that the rates are not due to chance.
Numerical χ² statement
χ²(2) = 20.6, p < .05
- χ² represents the test statistic
- 2 is the number of degrees of freedom
- 20.6 is the obtained value
- p < .05 is the probability