FINAL REVIEW Flashcards
BIVARIATE TABLE
a table that displays the joint frequency distributions of 2 variables
CELLS
the cross-classification categories of the variables in a bivariate table
X^2 (CRITICAL)
the score on the sampling distribution of ALL possible sample chi squares that marks the beginning of the critical region
X^2 (OBTAINED)
the test statistic as computed from SAMPLE RESULTS
CHI SQUARE TEST
a non-parametric test of hypothesis for variables that have been organized into a bivariate table
COLUMN
the vertical dimension of a bivariate table
- each column represents a score on the INDEPENDENT VARIABLE
EXPECTED FREQUENCY (fe)
the cell frequencies that’d be expected in a bivariate table if the variables were INDEPENDENT
GOODNESS OF FIT TEST
an additional use for chi square that tests the significance of the distribution of a single variable
INDEPENDENCE
the NULL hypothesis in the chi square test
- 2 variables are independent if, for all cases, the classification of a case on one variable has NO EFFECT on the probability that the case will be classified in any particular category of the second variable
MARGINALS
the row and column subtotals in the bivariate table
NONPARAMETRIC
a “distribution free” test
- these tests don’t assume a normal sampling distribution
OBSERVED FREQUENCIES (fo)
the cell frequencies actually observed in a bivariate table
ROW
the HORIZONTAL dimension of the bivariate table, conventionally representing a score on the dependent variable
THE DECISION TO REJECT THE NULL HYPOTHESIS IS NOT SPECIFIC
- means only that at least one statement, either in the model OR in the null hypothesis, is WRONG; the test alone doesn't say which
AT WHAT LEVEL OF MEASUREMENT CAN A CHI SQUARE TEST BE CONDUCTED
the variables can be measured at the NOMINAL LEVEL, which is the lowest level of measurement
- because it is NONPARAMETRIC, chi square requires NO ASSUMPTION at all about the shape of the population or sampling distribution
THE ____ CERTAIN WE ARE OF THE MODEL, THE _____ OUR CONFIDENCE THAT THE NULL HYPOTHESIS IS THE FAULTY ASSUMPTION
more, greater
A “_____” OR EASILY SATISFIED MODEL MEANS THAT OUR DECISION TO ______ THE NULL HYPOTHESIS CAN BE MADE WITH EVEN GREATER CERTAINTY
weak, reject
CHI SQUARE’S FLEXIBILITY
can be used and conducted with any variables at ANY LEVEL OF MEASUREMENT
WHAT ARE A BIVARIATE TABLE’S TWO DIMENSIONS
- the horizontal (across) dimension (ROWS)
- the vertical (up and down) dimension (COLUMNS)
ROW = ________
COLUMN = ___________
dependent, independent
SUBTOTALS THAT ARE ADDED TO EACH COLUMN AND ROW ARE CALLED _________
marginals
WHAT IS REPORTED AT THE INTERSECTION OF THE ROW AND COLUMN MARGINALS
the total number of cases
ON A TABLE, WHAT VARIABLE IS LISTED FIRST
the dependent variable
THE CONCEPT OF INDEPENDENCE
the relationship between the VARIABLES, not between SAMPLES
WHAT IS THE NULL HYPOTHESIS FOR CHI SQUARE
that the variables are INDEPENDENT
IF THE NULL HYPOTHESIS IS _____ AND THE VARIABLES ARE _________, THEN THERE SHOULD BE _________ DIFFERENCE BETWEEN THE EXPECTED AND OBSERVED FREQUENCIES
true, independent, little
IF THE NULL HYPOTHESIS IS ______, THERE SHOULD BE _______ DIFFERENCES BETWEEN THE EXPECTED AND OBSERVED FREQUENCIES
false, large
THE GREATER THE ________ BETWEEN EXPECTED AND OBSERVED FREQUENCIES, THE _____ LIKELY THAT THE VARIABLES ARE ______ AND _____ LIKELY THAT WE WILL BE ABLE TO _______ THE NULL HYPOTHESIS
differences, less, independent, more, reject
X^2(OBTAINED) = Σ (fo - fe) ^2 / fe
calculation of chi square
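The formula above can be sketched in Python as a double loop over the cells. This is a minimal illustration, assuming the observed (fo) and expected (fe) frequencies are already laid out as lists of rows with the same shape; the function name `chi_square_obtained` and the counts are hypothetical.

```python
def chi_square_obtained(observed, expected):
    """Sum (fo - fe)^2 / fe over every cell of a bivariate table."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected tables must have the same shape")
    total = 0.0
    for fo_row, fe_row in zip(observed, expected):
        for fo, fe in zip(fo_row, fe_row):
            total += (fo - fe) ** 2 / fe
    return total

# Hypothetical 2x2 table and the frequencies expected if the variables were independent
observed = [[30, 20], [20, 30]]
expected = [[25, 25], [25, 25]]
print(chi_square_obtained(observed, expected))  # each cell adds 25/25 = 1, so 4.0
```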
FE = ROW MARGINAL X COLUMN MARGINAL / N
expected frequency formula of each cell
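A short Python sketch of the fe formula, assuming the table is a list of rows of observed counts. The function name `expected_frequencies` and the counts are hypothetical; the point is that every fe comes only from the marginals and N.

```python
def expected_frequencies(observed):
    """fe for each cell = (row marginal * column marginal) / N."""
    row_marginals = [sum(row) for row in observed]
    col_marginals = [sum(col) for col in zip(*observed)]
    n = sum(row_marginals)  # the total number of cases
    return [[r * c / n for c in col_marginals] for r in row_marginals]

# Hypothetical table: row marginals 50, 50; column marginals 40, 60; N = 100
observed = [[30, 20], [10, 40]]
print(expected_frequencies(observed))  # [[20.0, 30.0], [20.0, 30.0]]
```

Note that the expected frequencies keep the same marginals as the observed table; only the pattern inside the cells changes.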
STEP ONE : MAKING ASSUMPTIONS AND MEETING TEST REQUIREMENTS
model : independent random samples
- level of measurement is NOMINAL
STEP TWO : STATING THE NULL HYPOTHESIS
ho : the two variables are INDEPENDENT
(h1 : the two variables are DEPENDENT)
STEP THREE : SELECTING THE SAMPLING DISTRIBUTION AND ESTABLISHING THE CRITICAL REGION
- the sampling distribution of sample chi squares is POSITIVELY SKEWED, with higher values of sample chi squares in the upper tail of the distribution
- the critical region is established in the UPPER TAIL OF THE SAMPLING DISTRIBUTION
DF = (R-1)(C-1)
degrees of freedom formula
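As a quick illustration, the df formula in Python, where `rows` and `cols` are the number of rows (R) and columns (C) in the bivariate table. The function name is hypothetical.

```python
def degrees_of_freedom(rows, cols):
    """df = (R - 1)(C - 1) for an R x C bivariate table."""
    return (rows - 1) * (cols - 1)

print(degrees_of_freedom(2, 2))  # a 2x2 table has 1 degree of freedom
print(degrees_of_freedom(3, 4))  # a 3x4 table has 6
```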
STEP FOUR : COMPUTING THE TEST STATISTIC
x^2 (obtained) = Σ (fo - fe) ^2 / fe
CHI SQUARE TEST OF STATISTICAL SIGNIFICANCE
tests the null hypothesis that the variables are INDEPENDENT in the population
IF WE _______ THE NULL HYPOTHESIS, WE ARE CONCLUDING, WITH A KNOWN PROBABILITY OF ERROR (determined by alpha level), THAT THE VARIABLES ARE ________ ON EACH OTHER IN THE POPULATION
reject, dependent
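The reject/fail-to-reject decision can be sketched end to end in Python: compute fe from the marginals, sum (fo - fe)^2 / fe, find df, and compare X^2 (obtained) with X^2 (critical). This is a sketch under stated assumptions: `chi_square_test` and `CRITICAL_05` are hypothetical names, the dictionary holds only the standard table values for alpha = 0.05 at df 1 to 4, and the counts are made up.

```python
# Standard chi square table values at alpha = 0.05 (only df 1 to 4 included here)
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def chi_square_test(observed, critical_values=CRITICAL_05):
    """Return (obtained, df, critical, reject) for a bivariate table of counts."""
    row_m = [sum(row) for row in observed]
    col_m = [sum(col) for col in zip(*observed)]
    n = sum(row_m)
    obtained = 0.0
    for i, r in enumerate(row_m):
        for j, c in enumerate(col_m):
            fe = r * c / n                      # expected frequency under independence
            obtained += (observed[i][j] - fe) ** 2 / fe
    df = (len(row_m) - 1) * (len(col_m) - 1)
    critical = critical_values[df]              # KeyError if df is not in the table above
    reject = obtained >= critical               # obtained falls in the critical region
    return obtained, df, critical, reject

obtained, df, critical, reject = chi_square_test([[30, 20], [10, 40]])
print(round(obtained, 2), df, critical, reject)  # 16.67 1 3.841 True
```

Here the obtained value lands in the upper-tail critical region, so the null hypothesis of independence would be rejected at the 0.05 level.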
A _____ SAMPLE IS DEFINED AS ONE WHERE A _____ PERCENTAGE OF THE CELLS HAVE EXPECTED FREQUENCIES (fe) OF 5 OR LESS
small, high
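A minimal Python check for this condition: compute every fe from the marginals and report the share of cells with fe of 5 or less. The cards don't say how high a percentage counts as "high", so this hypothetical `share_of_small_cells` function only reports the share and leaves that judgment to the reader; the counts are illustrative.

```python
def share_of_small_cells(observed, threshold=5):
    """Fraction of cells whose expected frequency is <= threshold."""
    row_m = [sum(row) for row in observed]
    col_m = [sum(col) for col in zip(*observed)]
    n = sum(row_m)
    fe_cells = [r * c / n for r in row_m for c in col_m]
    small = sum(1 for fe in fe_cells if fe <= threshold)
    return small / len(fe_cells)

# Hypothetical small sample: fe values are 1.75, 3.25, 5.25, 9.75
print(share_of_small_cells([[3, 2], [4, 11]]))  # 2 of 4 cells -> 0.5
```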
CONSIDERING TWO VARIABLES SIMULTANEOUSLY
relationship between variables
RELATIONSHIP
2 variables are related if the distribution of cases in (or among) the values (or categories) of one variable differs depending on which value (or category) of the other variable is considered
WHEN IS THERE A RELATIONSHIP
when two variables are considered together
- a single variable on its own can never have a relationship
RELATIONSHIP BETWEEN
the distributions differ
NO RELATIONSHIP BETWEEN
distributions don’t differ
NULL
no relationship
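OBSERVED FREQUENCIES
the cell frequencies we actually see in the sample, which we compare against what the null hypothesis of no relationship predicts
- may show some coordination (association) between the variables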
EXPECTED FREQUENCIES
no coordination between the variables
- “what we’d expect to see in the world if the null hypothesis (no relationship) were true”
X^2 (OBTAINED) = Σ (Fo - Fe)^2 / Fe
formula for x^2
- “how far is what I’m observing from what I expect”
BECOMES MORE NORMAL = __________
becomes a sampling distribution
PHI Φ
a chi square based measure of association
- appropriate for nominally measured variables that have been organized into a 2x2 bivariate table
MEASURES OF ASSOCIATION ARE DESCRIPTIVE STATISTICS THAT
summarize the overall strength of the association between 2 variables
WHAT’S COMPUTED TO MEASURE THE STRENGTH OF THE ASSOCIATION
phi Φ
Φ = √(X^2 / N)
formula for phi Φ
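A Python sketch of phi for a 2x2 table, assuming raw counts as input: it first computes X^2 (obtained) from the marginals, then takes √(X^2 / N). The function name `phi_2x2` and the counts are hypothetical.

```python
import math

def phi_2x2(observed):
    """Phi = sqrt(chi_square / N) for a 2x2 table of counts."""
    if len(observed) != 2 or any(len(row) != 2 for row in observed):
        raise ValueError("phi is meant for 2x2 tables")
    row_m = [sum(row) for row in observed]
    col_m = [sum(col) for col in zip(*observed)]
    n = sum(row_m)
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            fe = row_m[i] * col_m[j] / n
            chi_square += (observed[i][j] - fe) ** 2 / fe
    return math.sqrt(chi_square / n)

print(round(phi_2x2([[30, 20], [10, 40]]), 3))  # 0.408, a moderate association
print(phi_2x2([[10, 10], [10, 10]]))            # 0.0, distributions don't differ
```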
MEASURES OF ASSOCIATION
a number that summarizes the extent (strength) of the relationship between variables
DISTRIBUTIONS NOT DIFFERING
0.0
DISTRIBUTIONS DIFFERING AS MUCH AS POSSIBLE
1.0
CORRELATIONS NEAR 0
weak
CORRELATIONS NEAR 1
strong
LINEAR RELATIONSHIP
a relationship between 2 variables in which the observation points (dots) in the scattergram can be approximated with a straight line
REGRESSION LINE
the single, best-fitting straight line that summarizes the relationship between 2 variables
SCATTERGRAM
graphic display device that depicts the relationship between 2 variables
SLOPE (b)
the amount of change in one variable per unit change in the other
TOTAL VARIATION
the spread of the Y scores around the mean of Y
- equal to Σ(Yi - Ybar)^2
UNEXPLAINED VARIATION
the proportion of the total variation in Y that’s NOT accounted for by X
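Total and unexplained variation can be sketched together in Python. Total variation is Σ(Yi - Ybar)^2; for unexplained variation this sketch assumes the usual sum of squared distances between each Y and its predicted score Y' = a + bX on the least squares line, and also reports it as a proportion of the total, matching the card above. The function name `regression_variation` and the data are hypothetical.

```python
def regression_variation(xs, ys):
    """Return (total, unexplained, unexplained / total) for paired X, Y scores."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Least squares slope and intercept
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    total = sum((y - y_bar) ** 2 for y in ys)                         # spread of Y around its mean
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # spread of Y around the line
    return total, unexplained, unexplained / total

total, unexplained, share = regression_variation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(total, 2), round(unexplained, 2), round(share, 2))  # 6.0 2.4 0.4
```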
Y INTERCEPT (a)
the point where the regression line crosses the y axis
THE STATISTICAL TECHNIQUES OF CORRELATION AND REGRESSION ARE MORE APPROPRIATELY USED WITH HIGH QUALITY, PRECISELY MEASURED VARIABLES AT THE _____ ______ ________
interval ratio level
WHICH SCORES ARE ARRAYED ALONG THE HORIZONTAL AXIS
the independent (X) variable
WHICH SCORES ARE ARRAYED ALONG THE VERTICAL AXIS
the dependent (Y) variable
2 REASONS WHY SCATTERGRAMS ARE USED
- they provide at least impressionistic information about the existence, strength, and direction of the relationship, and let you check it for linearity
- the scattergram can be used to predict the score of a case on one variable from the score of that case on the other variable
WHERE WOULD ALL DOTS LIE IN A PERFECT ASSOCIATION
all dots would lie on the regression line
THE ________ OF THE BIVARIATE ASSOCIATION CAN BE JUDGED BY OBSERVING THE SPREAD OF DOTS AROUND THE _______ ______
strength, regression line
IN A POSITIVE RELATIONSHIP, AS ___ INCREASES, ___ ALSO INCREASES
X, Y
IF THE RELATIONSHIP HAD BEEN NEGATIVE, THE REGRESSION LINE WOULD HAVE SLOPED IN THE _________ DIRECTION TO INDICATE THAT ______ SCORES ON ONE VARIABLE WERE ASSOCIATED WITH ____ SCORES ON THE OTHER
opposite, high, low
THE OBSERVATION POINTS OR DOTS ON A SCATTERGRAM ___________.
must form a pattern that can be approximated with a straight line
IF THE RELATIONSHIP IS NONLINEAR, YOU MIGHT NEED TO TREAT THE VARIABLES AS IF THEY WERE _______ RATHER THAN _______ IN LEVEL OF MEASUREMENT
ordinal, interval ratio
THE MEAN OF ANY DISTRIBUTION OF SCORES IS THE POINT AROUND WHICH _____________
the variation of the scores, as measured by squared deviations, is minimized
Σ(Xi - XBar) ^2
sum of squared deviations of X (the numerator of the variance of X)
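A small Python demonstration of the two cards above: the sum of squared deviations is smallest when measured around the mean, and dividing it by N gives the (population) variance of X. The scores and the helper `sum_of_squares` are hypothetical.

```python
def sum_of_squares(xs, center):
    """Sum of squared deviations of the scores from an arbitrary center."""
    return sum((x - center) ** 2 for x in xs)

scores = [2, 4, 9]                        # hypothetical X scores
mean = sum(scores) / len(scores)          # 5.0
ss_mean = sum_of_squares(scores, mean)    # (2-5)^2 + (4-5)^2 + (9-5)^2 = 26.0

# Any center other than the mean gives a larger sum of squared deviations.
for center in [4, 4.5, 5.5, 6]:
    print(center, sum_of_squares(scores, center), ">", ss_mean)

# Dividing the sum of squares by N gives the population variance of X.
print(ss_mean / len(scores))
```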
IF THE REGRESSION LINE IS DRAWN SO THAT IT TOUCHES EACH _____________, IT WOULD BE THE STRAIGHT LINE THAT COMES AS CLOSE AS POSSIBLE TO ALL THE SCORES
conditional mean of Y
CONDITIONAL MEANS ARE FOUND BY
summing the Y values of all cases with a given value of X and then dividing by the number of those cases
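Conditional means can be sketched in Python by grouping the Y scores by their X value and averaging each group. The function name `conditional_means` and the data are hypothetical.

```python
from collections import defaultdict

def conditional_means(xs, ys):
    """Mean of Y for each distinct value of X, ordered by X."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return {x: sum(vals) / len(vals) for x, vals in sorted(groups.items())}

xs = [1, 1, 2, 2, 2, 3]
ys = [4, 6, 5, 7, 9, 10]
print(conditional_means(xs, ys))  # {1: 5.0, 2: 7.0, 3: 10.0}
```

If these conditional means fall on (or near) a straight line, that line is the regression line described in the card above.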
THE Y INTERCEPT, OR THE POINT WHERE THE REGRESSION LINE CROSSES THE Y AXIS
a
THE SLOPE OF THE REGRESSION LINE, OR THE AMOUNT OF CHANGE PRODUCED IN Y BY A UNIT CHANGE IN X
b
SCORE OF INDEPENDENT VARIABLE
x
THE POINT AT WHICH THE REGRESSION LINE CROSSES THE VERTICAL, OR Y, AXIS
y intercept (a)
THE SLOPE OF THE LEAST SQUARES REGRESSION LINE IS THE AMOUNT OF CHANGE PRODUCED IN THE DEPENDENT VARIABLE (Y) BY A UNIT CHANGE IN THE INDEPENDENT VARIABLE (X)
slope (b)
IF THE VARIABLES HAVE A ______ ASSOCIATION, THEN CHANGES IN THE VALUE OF X WILL BE ACCOMPANIED BY SUBSTANTIAL CHANGES IN THE VALUE OF Y, AND THE SLOPE (b) WILL HAVE A _________ VALUE
strong, high
THE ________ THE EFFECT OF X ON Y, THE _______ THE VALUE OF THE SLOPE (b)
weaker, lower
IF THE TWO VARIABLES ARE UNRELATED, THE LEAST SQUARES REGRESSION LINE WOULD BE PARALLEL TO THE X AXIS, AND SLOPE (b) WOULD BE
0.0, line would have NO slope
b = COV(X,Y) / VAR(X)
formula for slope (b)
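A Python sketch of the least squares line: b = COV(X,Y) / VAR(X), then the intercept from a = Ybar - b * Xbar, so the prediction is Y = a + bX. The intercept formula is the standard least squares result rather than something stated on these cards, and the function name `least_squares_line` and the data are hypothetical. Covariance and variance are both divided by N here; since the same N appears in both, the ratio b is unaffected.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line Y = a + bX."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    b = cov_xy / var_x        # change in Y per unit change in X
    a = y_bar - b * x_bar     # where the line crosses the Y axis
    return a, b

a, b = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 2), round(b, 2))  # 2.2 0.6

# Unrelated variables: Y doesn't change with X, so the slope is 0
print(least_squares_line([1, 2, 3], [5, 5, 5]))  # (5.0, 0.0)
```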
VERTICAL ABOVE MEAN
x
DEVIATIONS FROM THE MEAN
“how far are my scores from the mean?”