3: Exploratory Data Analysis: Relationships between variables Flashcards
scatterplot
shows the relationship between two quantitative variables measured on the same individuals.
Values of one variable appear on the horizontal axis and values of the other appear on the vertical axis. Each individual in the data appears as the point in the plot fixed by the values of both variables for that individual.
for scatterplot, the explanatory variable always goes on which axis?
explanatory variable (if there is one) goes on the X axis; the response variable goes on the Y axis
two variables are __________ when above average values of one tend to accompany above average values of the other and below average values also tend to occur together
two variables are POSITIVELY ASSOCIATED when above average values of one tend to accompany above average values of the other and below average values also tend to occur together
two variables are ______________ when above-average values of one tend to accompany below average values of the other and vice versa
two variables are NEGATIVELY ASSOCIATED when above-average values of one tend to accompany below average values of the other and vice versa
strength of a scatterplot relationship is shown by…
how closely the points follow a clear form (e.g. line)
transformation of data
we replace the original values by the transformed values and then use the transformed values for our analysis
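A minimal sketch of this idea, assuming a base-10 logarithm as the transformation (the card does not name a specific one) and using made-up measurements. The names `body_mass_g` and `log_mass` are hypothetical.

```python
from math import log10

# Made-up values that grow by a constant factor, so they are badly
# stretched out on the original scale.
body_mass_g = [10, 100, 1000, 10000]

# Replace each original value by its transformed value.
log_mass = [log10(m) for m in body_mass_g]

# Gaps between neighbouring transformed values; any further analysis
# would use log_mass instead of body_mass_g.
steps = [b - a for a, b in zip(log_mass, log_mass[1:])]
print(log_mass)
print(steps)
```

On the log scale the values are evenly spaced, which is the kind of pattern a scatterplot or correlation handles well.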
correlation
measures the direction and strength of the linear relationship between two quantitative variables. Usually written r.
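Suppose we have data on variables x and y for n individuals. The means and standard deviations of the two variables are \bar{x} and s_x for the x-values and \bar{y} and s_y for the y-values. The correlation is:
$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$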
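The correlation described on this card can be sketched in a few lines of Python: standardize each x and y value, multiply the pairs, add them up, and divide by n - 1. The function name `correlation` and the `hours` and `scores` data are made up for illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Product of the standardized x and y value for each individual.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

hours = [1, 2, 3, 4, 5]          # made-up explanatory values
scores = [52, 58, 61, 70, 74]    # made-up response values
r = correlation(hours, scores)
print(round(r, 3))
```

Because the points rise almost along a straight line, r comes out close to 1.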
correlation indicates the direction of a linear relationship by its sign. r > 0 for ____ association and r < 0 for ________ association.
correlation indicates the direction of a linear relationship by its sign. r > 0 for POSITIVE association and r < 0 for NEGATIVE association.
correlation is/is not a resistant measure.
correlation is not a resistant measure.
normal distributions
- density curve is always on or above the X axis
- density curve has exactly 1.0 total area beneath it
- a density curve (a Normal curve is one example) describes the overall pattern of a distribution
- the area under the curve and above any range of values is the proportion (relative frequency) of all observations that fall in that range
center and spread of density curves
- mode
- median
- mean
- mode = peak point
- median = point at which half the total area is on each side
- mean = point at which the curve would balance if made of solid material
In a Normal curve…
approx ____% of observations fall within 1 std. dev of mean
approx ____% of observations fall within 2 std. dev of mean
approx ____% of observations fall within 3 std. dev of mean
- 68
- 95
- 99.7
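These three percentages can be checked numerically. The sketch below assumes the standard result that the proportion of a Normal distribution within k standard deviations of the mean equals erf(k / sqrt(2)); the helper name `share_within` is made up.

```python
from math import erf, sqrt

def share_within(k):
    """Proportion of a Normal distribution within k std. dev. of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    # Print the percentage of observations within k standard deviations.
    print(k, round(100 * share_within(k), 1))
```

The computed values are about 68.3%, 95.4%, and 99.7%, which the rule rounds to 68, 95, and 99.7.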
A crosstab (two-way table) relates two ____ variables. Rows hold the ____ variable and columns hold the ____ variable.
A crosstab (two-way table) relates two CATEGORICAL variables. Rows hold the RESPONSE variable and columns hold the EXPLANATORY variable.
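A small sketch of building such a table by hand, following this card's layout (response in rows, explanatory in columns). The records and category labels are made up, and the column percentages are an added illustration of how the response is usually compared across explanatory groups.

```python
from collections import Counter

# Hypothetical records: (explanatory category, response category).
records = [
    ("tutored", "pass"), ("tutored", "pass"), ("tutored", "fail"),
    ("untutored", "pass"), ("untutored", "fail"),
    ("untutored", "fail"), ("untutored", "fail"),
]

counts = Counter(records)
columns = sorted({explanatory for explanatory, _ in records})
rows = sorted({response for _, response in records})

# table[response][explanatory] = number of individuals in that cell.
table = {row: {col: counts[(col, row)] for col in columns} for row in rows}

# Column totals, then each cell as a percentage of its column.
col_totals = {col: sum(table[row][col] for row in rows) for col in columns}
col_pct = {row: {col: 100 * table[row][col] / col_totals[col]
                 for col in columns} for row in rows}

for row in rows:
    print(row, table[row], col_pct[row])
```

Reading down a column gives the distribution of the response within one explanatory group.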
In a scatterplot, the response variable = ___ axis and the explanatory variable = ___ axis
In a scatterplot, the response variable = Y axis and the explanatory variable = X axis
correlation
standardized measure of the direction and strength of the linear relationship between two quantitative variables
ranges from -1.0 to 1.0
because r uses the standardized values of the observations, the correlation doesn’t change when we change units of measurement
correlation ignores the distinction between response and explanatory variables
correlation r is strongly affected by a few outlying observations
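The last three properties can be seen on a small made-up data set. The helper `corr` below is a sketch that uses an algebraically equivalent form of r (the n - 1 factors cancel), and all variable names and values are hypothetical.

```python
from math import sqrt

def corr(xs, ys):
    """Correlation r, written as S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

heights_in = [60, 62, 65, 68, 71]        # made-up heights in inches
weights_lb = [115, 120, 140, 155, 170]   # made-up weights in pounds

r_in = corr(heights_in, weights_lb)

# Changing units (inches to centimetres) leaves r unchanged.
heights_cm = [2.54 * h for h in heights_in]
r_cm = corr(heights_cm, weights_lb)

# Swapping the roles of the two variables leaves r unchanged.
r_swapped = corr(weights_lb, heights_in)

# Adding a single outlying individual changes r a great deal.
r_outlier = corr(heights_in + [75], weights_lb + [100])

print(round(r_in, 3), round(r_cm, 3), round(r_swapped, 3), round(r_outlier, 3))
```

The first three values agree, while the one added point pulls r from nearly 1 down to close to 0.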