Week 2 Flashcards
Bivariate distributions
Two scores for each individual
Scatter Diagram
picture of the relationship between two variables
an important reason for examining the scatter diagram is that the relationships between X and Y are not always best described by a straight line.
Regression
Trying to predict a variable Y from another variable X
Example: make a best guess of a final mark from a midterm mark, using data from past students, then apply the equation to a new population
make predictions about scores on one variable from knowledge of scores on another variable
Regression - Galton
Individuals with unusual characteristics tended to produce offspring who were closer to average
Galton called this “regression toward mediocrity”; the idea became the basis for a statistical procedure that describes how scores tend to regress toward the mean
Why is regression important in psychological testing?
Figure out associations between different variables and measurements
Determine whether changes in test scores are related to changes in performance
make predictions about scores on one variable from knowledge of scores on another variable
Difference between regression and correlation
Regression is done on the actual (raw) scores
Correlation converts those scores to standardized (Z) units
use correlation to assess the magnitude and direction of a relationship.
regression, is used to make predictions about scores on one variable from knowledge of scores on another variable.
Regression equation & Residual
Gives a predicted value for Y, denoted Y’
Y’ = bX + a
Y’ = the predicted value of Y
b = regression coefficient - slope of the line
The regression coefficient can be expressed as the ratio of the sum of squares for the covariance (the sum of cross products) to the sum of squares for X: b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)². Sum of squares is defined as the sum of the squared deviations around the mean.
a = intercept, the value of Y’ when X is 0: a = Ȳ − bX̄
Actual and predicted scores are rarely the same
The difference between the observed and predicted score (Y − Y’) is the residual. The best-fitting line keeps residuals to a minimum, i.e. it minimizes the deviation between observed and predicted scores
Because residuals can be positive or negative and will cancel to 0 if averaged, the best-fitting line is most appropriately found by squaring each residual.
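A minimal Python sketch of the regression equation and residuals, assuming made-up midterm (X) and final (Y) marks for five past students; the data and the helper name `fit_line` are hypothetical. It computes b as the sum of cross products divided by the sum of squares for X, sets a = Ȳ − bX̄, predicts Y’ for a new student, and shows that the raw residuals cancel to about 0 while the squared residuals do not.

```python
def fit_line(x, y):
    """Return (b, a) for the least-squares line Y' = bX + a."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross products: sum of (X - Xbar)(Y - Ybar)
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Sum of squares for X: sum of (X - Xbar)^2
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b = sp / ss_x
    a = y_bar - b * x_bar  # intercept: a = Ybar - b * Xbar
    return b, a


# Hypothetical data: midterm (X) and final (Y) marks for five past students
midterm = [50, 60, 70, 80, 90]
final = [55, 65, 70, 75, 85]

b, a = fit_line(midterm, final)  # b is 0.7 and a is 21 (up to rounding)
new_student = b * 68 + a         # Y' for a new student with a midterm of 68

# Residual = observed - predicted
residuals = [y - (b * x + a) for x, y in zip(midterm, final)]
total = sum(residuals)                          # about 0: the signs cancel
squared_total = sum(r ** 2 for r in residuals)  # about 10: squaring stops the cancelling
```

For these data the predicted final for a midterm of 68 is 0.7(68) + 21 = 68.6.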
Regression line & Principle of least squares
Used to find the regression line
Minimizes the squared deviation around the regression line
Understand:
Mean is the point of least squares for any variable. Sum of squared deviations around the mean will be less than it is around any value other than the mean.
Regression line is the running mean or line of least squares.
The least squares method in regression finds the straight line that comes as close as possible to as many of the Y means (the mean of Y at each value of X) as possible. In other words, it is the line for which the squared deviations around the line are at a minimum.
best-fitting line is obtained by keeping these squared residuals as small as possible. This is known as the principle of least squares
Σ(Y − Y’)² is at a minimum
Y − Y’ = observed − predicted
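A small check of both least-squares ideas on this card, using hypothetical scores: the mean gives a smaller sum of squared deviations than any other point on a grid, and the least-squares line (b = 0.7, a = 21 for these made-up data) gives a smaller Σ(Y − Y’)² than lines with a nudged slope or intercept. The helper names are illustrative only, and a grid search demonstrates the property rather than proving it.

```python
def ss_around(values, c):
    """Sum of squared deviations of the values around the point c."""
    return sum((v - c) ** 2 for v in values)


def ss_residual(x, y, b, a):
    """Sum of (Y - Y')^2 for the line Y' = bX + a."""
    return sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))


# Hypothetical scores
midterm = [50, 60, 70, 80, 90]
final = [55, 65, 70, 75, 85]

# 1. The mean of the final marks (70) is their point of least squares
candidates = [c / 10 for c in range(500, 901)]  # 50.0, 50.1, ..., 90.0
best_point = min(candidates, key=lambda c: ss_around(final, c))

# 2. The least-squares line (b = 0.7, a = 21 for these data) has a smaller
#    sum of squared residuals than nearby lines with a nudged slope or intercept
best_line = ss_residual(midterm, final, 0.7, 21.0)
nudged = [
    ss_residual(midterm, final, 0.7 + db, 21.0 + da)
    for db in (-0.05, 0.0, 0.05)
    for da in (-1.0, 0.0, 1.0)
    if (db, da) != (0.0, 0.0)
]
```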
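Sum of cross products (covariance)
Based on the deviations around each mean:
How far each X is from the mean of X: (X − X̄)
How far each Y is from the mean of Y: (Y − Ȳ)
Cross product for each person = (X − X̄)(Y − Ȳ); the sum of cross products adds these up across people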
Covariance & the goal of regression analysis
Covariance: whether two variables covary, e.g. does Y get larger as X gets larger?
The covariance is calculated from the cross products, or products of variations around each mean.
Regression analysis attempts to determine how similar the variance between two variables is by dividing the covariance by the average variance of each variable
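A sketch of how the cross products build the covariance, on hypothetical paired scores. It assumes the sample covariance (dividing by N − 1); some textbooks divide by N, which changes the size of the covariance but not its sign. As a side note, the regression slope b equals the covariance divided by the variance of X when both use the same denominator.

```python
# Hypothetical paired scores
x = [50, 60, 70, 80, 90]
y = [55, 65, 70, 75, 85]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Cross product for each person: (X - Xbar) * (Y - Ybar)
cross_products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]
sp = sum(cross_products)  # sum of cross products: 700.0

# Covariance as the average cross product. Assumption: divide by N - 1
# (sample covariance); some texts divide by N instead.
cov = sp / (n - 1)  # 175.0, positive: Y tends to rise as X rises

# Covariance divided by the variance of X (same N - 1) gives the slope b
var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
b = cov / var_x  # 0.7
```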
Intercept of the regression line = a
a = Ȳ − bX̄
Regression Plot
Pictures that show the relationship between variables
A common use of correlation is to determine the criterion validity evidence for a test, or the relationship between a test score and some well-defined criterion.
The association between a test of job aptitude and the criterion of actual performance on the job is an example of criterion validity evidence.
This use of regression plots is called normative because it uses information gained from a representative group
Correlation
Correlation is a special case of regression in which the scores for both variables are in standardized, or Z, units.
A key property of the correlation coefficient is its reciprocal nature: the correlation between X and Y will always be the same as the correlation between Y and X.
Regression does not have this property.
Working in Z units eliminates the need to find the intercept
In correlation, the intercept is always 0
Correlation coefficient - describes the direction and magnitude of the relationship
assess the magnitude and direction of a relationship
Regression with the scores standardized (Z units): r varies between −1 and 1, and there is no intercept value
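A sketch, on hypothetical scores, of correlation as regression in Z units: regressing Z_Y on Z_X gives a slope equal to r and an intercept of about 0, and swapping X and Y gives the same r. In raw units the two regressions give different slopes. `fit_line` and `z_scores` are illustrative helpers built on Python's standard `statistics` module.

```python
import statistics


def fit_line(x, y):
    """Least-squares slope b and intercept a for predicting y from x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b = sp / ss_x
    return b, y_bar - b * x_bar


def z_scores(values):
    """Convert raw scores to Z units (mean 0, SD 1)."""
    m, s = statistics.fmean(values), statistics.pstdev(values)
    return [(v - m) / s for v in values]


# Hypothetical paired scores
x = [50, 60, 70, 80, 90]
y = [55, 65, 70, 75, 85]

# Regression in Z units: the slope is r and the intercept is (about) 0
r_xy, a_z = fit_line(z_scores(x), z_scores(y))
r_yx, _ = fit_line(z_scores(y), z_scores(x))  # same r in the other direction

# Regression in raw units is not reciprocal: the two slopes differ
b_y_on_x, _ = fit_line(x, y)  # 0.7
b_x_on_y, _ = fit_line(y, x)  # 1.4
```

For these data r is about .99, while the raw slopes are 0.7 (Y on X) and 1.4 (X on Y); their product, 0.98, equals r².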
Correlation between two randomly created variables will not always be 0
By chance alone it's possible to observe a correlation higher or lower than 0
null hypothesis is rejected if there is evidence that the association between two variables is significantly different from 0.
Correlation coefficients can be tested for statistical significance using the t distribution
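A simulation sketch of the chance-correlation point, assuming 1,000 pairs of unrelated normal variables with 20 scores each (arbitrary choices). The sample correlations average out near 0, but individual samples stray well above and below it, which is why an observed r needs a significance test. The seed is fixed only so the run repeats.

```python
import math
import random


def pearson_r(x, y):
    """Pearson r: sum of cross products over sqrt(SSx * SSy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return sp / math.sqrt(ss_x * ss_y)


rng = random.Random(12345)  # fixed seed so the run is repeatable
rs = []
for _ in range(1000):
    # Two unrelated variables, 20 random scores each
    x = [rng.gauss(0, 1) for _ in range(20)]
    y = [rng.gauss(0, 1) for _ in range(20)]
    rs.append(pearson_r(x, y))

mean_r = sum(rs) / len(rs)         # close to 0 on average...
largest = max(abs(r) for r in rs)  # ...but single samples stray well away from 0
```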
t distribution
t distribution is not a single distribution (such as the Z distribution) but a family of distributions, each with its own degrees of freedom.
For testing a correlation, the degrees of freedom (df) are defined as the sample size minus two, or N − 2
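A sketch of the significance test for r, using the standard formula t = r√(N − 2) / √(1 − r²) with df = N − 2. The two r values and N = 5 are hypothetical, and the critical value 3.182 (two-tailed, α = .05, df = 3) is read from a standard t table rather than computed.

```python
import math


def t_for_r(r, n):
    """t statistic for H0: population correlation = 0, with df = N - 2."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


# Two-tailed critical t for df = 3 at alpha = .05, taken from a standard t table
CRITICAL_T_DF3 = 3.182

# Hypothetical results from two samples of N = 5
t_strong, df = t_for_r(0.99, 5)  # t is about 12.16
t_weak, _ = t_for_r(0.50, 5)     # t = 1.0

reject_strong = abs(t_strong) > CRITICAL_T_DF3  # True: r = .99 is significant
reject_weak = abs(t_weak) > CRITICAL_T_DF3      # False: r = .50 could be chance
```

Because t grows with √(N − 2), the same r = .50 would reach significance in a large enough sample.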
Different kinds of correlation coefficient
Pearson's r: used with ratio-scale data, and occasionally with interval data such as Likert scales
determine the degree of variation in one variable that can be estimated from knowledge about variation in the other variable
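The degree of variation in one variable that can be estimated from the other is usually expressed as r², the coefficient of determination. A sketch on hypothetical scores: r² computed from r matches 1 minus the proportion of Y's sum of squares left over in the regression residuals.

```python
import math

# Hypothetical paired scores
x = [50, 60, 70, 80, 90]
y = [55, 65, 70, 75, 85]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 700
ss_x = sum((xi - x_bar) ** 2 for xi in x)                      # 1000
ss_y = sum((yi - y_bar) ** 2 for yi in y)                      # 500

r = sp / math.sqrt(ss_x * ss_y)  # about .99
r_squared = r ** 2               # about .98

# Same proportion from the regression: 1 - (squared residuals / SSy)
b = sp / ss_x
a = y_bar - b * x_bar
ss_resid = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
explained = 1 - ss_resid / ss_y  # about .98
```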
Different kinds of correlation coefficient
Biserial r
biserial correlation expresses the relationship between a continuous variable and an artificial dichotomous variable
relationship between passing or failing the bar examination (artificial dichotomous variable) and GPA in law school (continuous variable).
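A sketch of the biserial r using a common textbook formula, r_b = (M_p − M_q) / s_Y × (pq / h), where p and q are the proportions in the two groups, s_Y is the standard deviation of the continuous variable for the whole group, and h is the height of the standard normal curve at the point that splits it into p and q. The pass/fail and GPA values are made up. The point biserial value is computed alongside for comparison; the biserial comes out larger because it estimates the correlation with the continuous ability assumed to underlie the artificial dichotomy.

```python
import math
from statistics import NormalDist, fmean, pstdev

# Hypothetical data: 1 = passed the bar exam, 0 = failed (an artificial
# dichotomy of an assumed underlying continuous ability); GPA in law school
passed = [0, 0, 0, 1, 1, 1, 1, 1]
gpa = [2.6, 3.4, 3.0, 2.8, 3.1, 3.6, 3.2, 3.7]

p = fmean(passed)  # proportion who passed
q = 1 - p          # proportion who failed
mean_pass = fmean([g for g, s in zip(gpa, passed) if s == 1])
mean_fail = fmean([g for g, s in zip(gpa, passed) if s == 0])
sd_y = pstdev(gpa)  # SD of the continuous variable for the whole group

# Height of the standard normal curve at the z that splits it into p and q
h = NormalDist().pdf(NormalDist().inv_cdf(q))

r_biserial = (mean_pass - mean_fail) / sd_y * (p * q / h)
# Point biserial on the same data, for comparison only
r_point_biserial = (mean_pass - mean_fail) / sd_y * math.sqrt(p * q)
```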
Different kinds of correlation coefficient
Point biserial r
Used when the dichotomous variable is “true” (such as gender)
For instance, the point biserial correlation would be used to find the relationship between gender and GPA
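A sketch showing that the point biserial r is the ordinary Pearson r computed with the true dichotomy coded 0 and 1. The data are hypothetical (a 0/1 group code such as gender, and GPA); the shortcut r_pb = (M_1 − M_0) / s_Y × √(pq), with s_Y the whole-group SD, gives the same value.

```python
import math
from statistics import fmean, pstdev


def pearson_r(x, y):
    """Pearson r: sum of cross products over sqrt(SSx * SSy)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical data: a true dichotomy coded 0/1, and GPA
group = [0, 0, 0, 1, 1, 1]
gpa = [2.0, 2.5, 3.0, 3.0, 3.5, 4.0]

p = fmean(group)  # proportion coded 1
q = 1 - p
mean_1 = fmean([g for g, c in zip(gpa, group) if c == 1])
mean_0 = fmean([g for g, c in zip(gpa, group) if c == 0])

r_pb = (mean_1 - mean_0) / pstdev(gpa) * math.sqrt(p * q)
r_pearson = pearson_r(group, gpa)  # same value: Pearson r on the 0/1 codes
```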
Different kinds of correlation coefficient
Tetrachoric r
If both dichotomous variables are artificial, we might use a special correlation coefficient, the tetrachoric r
Different kinds of correlation coefficient
Phi
The choice of coefficient depends on whether the variables are continuous or dichotomous (artificial or true)
If both variables are dichotomous and at least one of the dichotomies is “true,” the association between them can be estimated using the phi coefficient
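A sketch of the phi coefficient from a 2 × 2 table of counts, φ = (ad − bc) / √((a + b)(c + d)(a + c)(b + d)), with hypothetical counts. Expanding the table back into 0/1 scores and computing Pearson r gives the same value, since phi is Pearson r for two dichotomous variables.

```python
import math


def phi_from_table(a, b, c, d):
    """Phi for a 2x2 table of counts laid out as
                  Y = 1   Y = 0
        X = 1       a       b
        X = 0       c       d
    """
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts
phi = phi_from_table(20, 10, 5, 15)

# Check: phi equals Pearson r computed on the expanded 0/1 scores
x = [1] * 20 + [1] * 10 + [0] * 5 + [0] * 15
y = [1] * 20 + [0] * 10 + [1] * 5 + [0] * 15
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)
r = sp / math.sqrt(ss_x * ss_y)
```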
Also coefficients for rank correlations
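The card does not name a rank coefficient; one common example is Spearman's rho, which is Pearson r computed on ranks. A minimal sketch with hypothetical scores, assuming no tied scores so the shortcut ρ = 1 − 6Σd² / (N(N² − 1)) applies (d is the difference between a person's two ranks).

```python
def ranks(values):
    """Rank scores from 1 (smallest) to N; assumes no tied scores."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Spearman rank correlation, 1 - 6 * sum(d^2) / (N(N^2 - 1)), no ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical scores
x = [10, 20, 30, 40, 50]
y = [12, 25, 22, 40, 48]
rho = spearman_rho(x, y)  # 0.9 for these data (up to rounding)
```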