Week 2 Flashcards

1
Q

Bivariate distributions

A

Two scores for each individual.

2
Q

Scatter Diagram

A

A picture of the relationship between two variables.

An important reason for examining the scatter diagram is that the relationship between X and Y is not always best described by a straight line.

3
Q

Regression

A

Trying to predict one variable, Y, from another variable, X.

Example: the best guess of a final exam mark from a midterm mark. Build the equation from past data, then apply it to a new group of students.

More generally, regression makes predictions about scores on one variable from knowledge of scores on another variable.

4
Q

Regression - Galton

A

Individuals with unusual characteristics tended to produce offspring who were closer to average.

Galton called this "regression towards mediocrity". The idea became the basis for a statistical procedure that describes how scores tend to regress toward the mean.

5
Q

Why is regression important in psychological testing?

A

Figure out associations between different variables and measurements

Determine whether changes in test scores are related to changes in performance

Make predictions about scores on one variable from knowledge of scores on another variable

6
Q

Difference between regression and correlation

A

Regression is done on the actual (raw) scores.
Correlation takes those scores and expresses them in standardized units.

Use correlation to assess the magnitude and direction of a relationship.

Use regression to make predictions about scores on one variable from knowledge of scores on another variable.

7
Q

Regression equation & Residual

A

The regression equation gives a predicted value for Y, denoted Y′.

Y′ = bX + a

Y′ = the predicted value of Y
b = regression coefficient, the slope of the line. It can be expressed as the ratio of the sum of cross products (the covariance term) to the sum of squares for X: b = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)². A sum of squares is the sum of the squared deviations around the mean.
a = intercept, the value of Y′ when X is 0: a = Ȳ - bX̄

Actual and predicted scores are rarely the same.

The difference between the observed and the predicted score is the residual. The best-fitting line keeps the residuals to a minimum; that is, it minimizes the deviation between observed and predicted scores.

Because residuals can be positive or negative and will cancel to 0 if averaged, the best-fitting line is found by squaring each residual and minimizing the sum of those squares.
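
The slope, intercept, and residuals described on this card can be sketched in a few lines of Python. This is only an illustration: the marks are invented, and the function name `fit_line` is hypothetical rather than taken from any course material.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (b, a) for the least squares line Y' = b*X + a."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross products: deviations of X times deviations of Y.
    sum_cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sum of squares for X: squared deviations of X around its mean.
    sum_sq_x = sum((x - x_bar) ** 2 for x in xs)
    b = sum_cross / sum_sq_x
    a = y_bar - b * x_bar
    return b, a

# Hypothetical midterm (X) and final (Y) marks, invented for illustration.
midterm = [50, 60, 70, 80, 90]
final = [55, 60, 72, 75, 88]

b, a = fit_line(midterm, final)
predicted = [b * x + a for x in midterm]
# Residual = observed minus predicted.
residuals = [y - y_hat for y, y_hat in zip(final, predicted)]
```

With these made-up marks the slope works out to 0.81 and the intercept to 13.3, and the residuals sum to 0 (up to floating-point rounding), which is why they have to be squared before they can measure fit.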

8
Q

Regression line & Principle of least squares

A

The principle of least squares is used to find the regression line.

It minimizes the squared deviations around the regression line.

Understand:
The mean is the point of least squares for any variable. The sum of squared deviations around the mean is smaller than around any value other than the mean.

The regression line is the running mean, or line of least squares.
The least squares method in regression finds the straight line that comes as close as possible to as many of these Y means (the mean of Y at each value of X) as it can. In other words, it is the line for which the squared deviations around the line are at a minimum.

The best-fitting line is obtained by keeping the squared residuals as small as possible. This is known as the principle of least squares.

Σ(Y - Y′)² is at a minimum,
where Y - Y′ is observed minus predicted.
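
The claim that the mean is the point of least squares can be checked numerically. The sketch below uses invented scores and a hypothetical helper, `sum_sq_around`; it compares the sum of squared deviations around the mean with the same sum around a few other values.

```python
from statistics import mean

def sum_sq_around(values, point):
    """Sum of squared deviations of the values around a given point."""
    return sum((v - point) ** 2 for v in values)

scores = [2, 4, 4, 5, 10]  # invented scores; their mean is 5
m = mean(scores)

ss_at_mean = sum_sq_around(scores, m)
# Try a few points on either side of the mean.
candidates = [m - 2, m - 0.5, m + 0.5, m + 2]
ss_elsewhere = [sum_sq_around(scores, c) for c in candidates]
```

Here the sum of squares around the mean is 36, and every other candidate gives a larger value. The regression line plays the same role for Y at each value of X.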

9
Q

Sum of cross Products (covariance)

A

The sum, over all individuals, of the products of the deviations around each mean: Σ(X - X̄)(Y - Ȳ)

How far is each X from the mean of X?

How far is each Y from the mean of Y?

10
Q

Covariance & the goal of regression analysis

A

Covariance: whether two variables covary. Does Y get larger as X gets larger?

The covariance is calculated from the cross products, or products of the deviations around each mean.

Regression analysis attempts to determine how similar the variation in the two variables is by scaling the covariance: dividing it by the variance of X gives the regression slope b, and dividing it by the product of the two standard deviations gives the correlation r.

11
Q

Intercept of the regression line = a

A

a = Ȳ - bX̄ (the mean of Y minus the slope times the mean of X)

12
Q

Regression Plot

A

Pictures that show the relationship between variables.

A common use of correlation is to determine the criterion validity evidence for a test, or the relationship between a test score and some well-defined criterion.

The association between a test of job aptitude and the criterion of actual performance on the job is an example of criterion validity evidence.

The approach is normative because it uses information gained from a representative group.

13
Q

Correlation

A

Correlation is a special case of regression in which the scores for both variables are in standardized, or Z, units.

An important property of the correlation coefficient is that it is reciprocal: the correlation between X and Y is always the same as the correlation between Y and X.

Regression does not have this property.
Using standardized units eliminates the need to find the intercept.

In correlation, the intercept is always 0.

The correlation coefficient describes the direction and magnitude of the relationship.

In short: regression with the scores standardized. The coefficient varies between -1 and 1, and there is no intercept value.
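
One way to see correlation as regression in Z units is to standardize both variables and fit the line. The sketch below does that with invented data; the helper names `z_scores` and `slope_and_intercept` are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Convert raw scores to Z units (mean 0, SD 1)."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

def slope_and_intercept(xs, ys):
    """Least squares slope and intercept for predicting ys from xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return b, y_bar - b * x_bar

x = [1, 2, 3, 4, 5]   # invented scores
y = [2, 4, 5, 4, 10]  # invented scores

# In Z units the slope is the correlation and the intercept is 0.
r_xy, a_xy = slope_and_intercept(z_scores(x), z_scores(y))
r_yx, a_yx = slope_and_intercept(z_scores(y), z_scores(x))

# In raw units the two slopes differ, so regression is not reciprocal.
b_y_from_x, _ = slope_and_intercept(x, y)
b_x_from_y, _ = slope_and_intercept(y, x)
```

For these numbers the standardized slope is about 0.84 in both directions, while the raw slopes are 1.6 (Y from X) and about 0.44 (X from Y).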

14
Q

Correlation between two randomly created variables will not always be 0

A

By chance alone, it is possible to observe a correlation higher or lower than 0.

The null hypothesis (that the true correlation is 0) is rejected if there is evidence that the association between the two variables is significantly different from 0.

Correlation coefficients can be tested for statistical significance using the t distribution.

15
Q

t distribution

A

The t distribution is not a single distribution (such as the Z distribution) but a family of distributions, each with its own degrees of freedom.

When testing a correlation coefficient, the degrees of freedom (df) are defined as the sample size minus two, or N - 2.
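
The card does not give the test statistic itself. The sketch below assumes the standard formula t = r × √((N - 2) / (1 - r²)), which is not stated in these notes, and applies it to an invented correlation and sample size. The function name is hypothetical.

```python
import math

def t_for_correlation(r, n):
    """Return (t, df) for testing whether a correlation differs from 0.

    Assumes the standard formula t = r * sqrt((N - 2) / (1 - r^2)).
    Not defined for r = 1 or r = -1 (division by zero).
    """
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df

# Invented example: r = .50 observed in a sample of 27 people.
t_value, df = t_for_correlation(0.5, 27)
```

Here df = 25 and t is about 2.89. The value would then be compared with the critical t for 25 degrees of freedom from a t table.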

16
Q

Different kinds of correlation coefficient
Pearson's r (for ratio-scale data, and occasionally interval-like data such as Likert ratings)

A

Determines the degree of variation in one variable that can be estimated from knowledge about variation in the other variable.

17
Q

Different kinds of correlation coefficient
Biserial r

A

The biserial correlation expresses the relationship between a continuous variable and an artificial dichotomous variable.

Example: the relationship between passing or failing the bar examination (artificial dichotomous variable) and GPA in law school (continuous variable).

18
Q

Different kinds of correlation coefficient
Point biserial r

A

Used when the dichotomous variable is “true” (such as gender) and the other variable is continuous.

For instance, the point biserial correlation would be used to find the relationship between gender and GPA.

19
Q

Different kinds of correlation coefficient
Tetrachoric r

A

When both dichotomous variables are artificial, we might use a special correlation coefficient, the tetrachoric correlation.

20
Q

Different kinds of correlation coefficient
Phi

A

Which coefficient to use depends on whether the variables are continuous or dichotomous (artificial or true).

When both variables are dichotomous and at least one of the dichotomies is “true,” the association between them can be estimated using the phi coefficient.

There are also coefficients for rank correlations.

21
Q

Spearman’s Rho

A

Used with rank-order variables.

A correlation for finding the association between two sets of ranks.

The rho coefficient (ρ) is easy to calculate and is often used when the individuals in a sample can be ranked on two variables but their actual scores are either not known or not normally distributed.

One whole family of correlation coefficients involves dichotomous variables.
Some variables are true dichotomies because they naturally form two categories (for example, gender).

Others are artificially dichotomous because they reflect an underlying continuous scale forced into a dichotomy. Passing or failing a bar examination is an example of such an artificial dichotomy.
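
Returning to Spearman's rho: the card calls it easy to calculate but gives no formula. The sketch below assumes the standard shortcut for ranks without ties, ρ = 1 - 6Σd² / (N(N² - 1)), where d is the difference between a person's two ranks. That formula and the example ranks are additions, not part of the original notes.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rho from two sets of ranks, assuming no tied ranks."""
    n = len(ranks_a)
    # d is the difference between each individual's two ranks.
    sum_d_sq = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - (6 * sum_d_sq) / (n * (n ** 2 - 1))

# Invented ranks of five people on two variables.
rank_on_first = [1, 2, 3, 4, 5]
rank_on_second = [2, 1, 4, 3, 5]

rho = spearman_rho(rank_on_first, rank_on_second)
```

For these ranks Σd² = 4, so ρ = 1 - 24/120 = 0.8.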

22
Q

Residual

A

Y - Y′
Observed minus predicted

The difference between the observed and the predicted values is called the residual.

The sum of the residuals always equals 0.
The sum of the squared residuals is the smallest possible value, according to the principle of least squares: Σ(Y - Y′)² = smallest value.

23
Q

Standard Error of Estimate

A

How far apart are the predicted and observed scores?
It is the standard deviation of the residuals.

It is a measure of the accuracy of prediction.
Prediction is most accurate when the standard error of estimate is relatively small.
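
A minimal sketch of the calculation follows. It assumes the common textbook form, √(Σ(Y - Y′)² / (N - 2)), which divides by N - 2 rather than N; the card itself only says "standard deviation of the residuals", so the divisor is an assumption. The scores are invented.

```python
import math

def standard_error_of_estimate(observed, predicted):
    """sqrt(sum of squared residuals / (N - 2)); the N - 2 divisor is assumed."""
    sum_sq_resid = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return math.sqrt(sum_sq_resid / (len(observed) - 2))

# Invented observed scores and the predictions from a fitted line.
observed = [55, 60, 72, 75, 88]
predicted = [53.8, 61.9, 70.0, 78.1, 86.2]

see = standard_error_of_estimate(observed, predicted)
```

The squared residuals sum to 21.9, so the standard error of estimate is √(21.9 / 3), or about 2.7 points.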

24
Q

Coefficient of Determination r^2

A

The proportion of the variation in Y that is accounted for by knowing X.

How much of the variation is accounted for?

25
Coefficient of Alienation
√(1 - r²). How unassociated the variables are. Here r² is the coefficient of determination.
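As a quick worked example of the two coefficients, the sketch below takes an invented correlation of .60 and computes both. The function names are hypothetical.

```python
import math

def coefficient_of_determination(r):
    """Proportion of variation in Y accounted for by X."""
    return r ** 2

def coefficient_of_alienation(r):
    """sqrt(1 - r^2): a measure of how unassociated the variables are."""
    return math.sqrt(1 - r ** 2)

r = 0.6  # invented correlation
r_squared = coefficient_of_determination(r)
alienation = coefficient_of_alienation(r)
```

With r = .60, r² = .36 (36% of the variation accounted for) and the coefficient of alienation is √.64 = .80.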
26
General Multivariate Models: Linear Combination
Multiple X variables, each with its own regression coefficient. Multivariate methods study the relationships among combinations of three or more variables: the relationship between many predictors and one outcome, as well as the relationships among the predictors. In multiple regression, the goal of the analysis is to find the linear combination of the predictors (here, three variables) that provides the best prediction of the outcome (here, law school success). Law school GPA = .80 (Z scores of undergraduate GPA) + 1.54 (Z scores of professor ratings) + 1.03 (Z scores of age). The reason for using Z scores for the three predictors is that the coefficients in the linear composite are greatly affected by the range of values taken on by the variables.
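The linear combination above can be evaluated directly. The sketch below uses the three weights from the card; the applicant's Z scores are invented, and the names are hypothetical.

```python
# Weights from the card's example equation (predictors in Z units).
WEIGHTS = {
    "undergrad_gpa_z": 0.80,
    "professor_rating_z": 1.54,
    "age_z": 1.03,
}

def linear_combination(z_scores):
    """Weighted sum of the predictors, each expressed as a Z score."""
    return sum(WEIGHTS[name] * z for name, z in z_scores.items())

# Invented applicant: 1 SD above the mean on GPA, half an SD above on
# professor ratings, and 1 SD below the mean on age.
applicant = {"undergrad_gpa_z": 1.0, "professor_rating_z": 0.5, "age_z": -1.0}
predicted = linear_combination(applicant)
```

For this invented applicant the composite is .80 + .77 - 1.03 = .54.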
27
standardized regression coefficients
When the variables are expressed in Z units, the coefficients, or weights for the variables, are known as standardized regression coefficients
28
raw regression coefficients
When the variables are left in their original (raw) units, the weights in the model are called raw regression coefficients.
29
Discriminant Analysis
When the task is to find the linear combination of variables that provides maximum discrimination between categories, the appropriate multivariate method is discriminant analysis. It attempts to determine whether a set of measures predicts success or failure on a particular performance evaluation. For example, say that two groups of children are classified as “language disabled” and “normal.” After a variety of items are presented, discriminant analysis is used to find the linear combination of items that best accounts for differences between the two groups.
30
Shrinkage
A regression equation tends to overestimate the relationship, particularly if the sample of subjects is small. Shrinkage is the amount of decrease observed when a regression equation is created for one population and then applied to another. For example, suppose a regression equation is developed to predict first-year college GPAs on the basis of SAT scores. Although the proportion of variance in GPA accounted for might be fairly high for the original group, we can expect to account for a smaller proportion of the variance when the equation is used to predict GPA in the next year’s class.
31
Cross Validation
The best way to ensure that proper inferences are being made is to use the regression equation to predict performance in a group of subjects other than the one from which the equation was derived. A standard error of estimate can then be obtained for the relationship between the values predicted by the equation and the values actually observed.
32
Correlation-Causation Problem
The fact that two variables are correlated does not necessarily imply that one has caused the other.
33
Third Variable Explanation
The apparent relationship between two variables (for example, viewing and aggression) might actually be the result of some third variable not included in the analysis.
34
Restricted Range
Problems arise in circumstances in which the ranges of variability are restricted. Example: the relationship between scores on the Graduate Record Examination (GRE) quantitative test and performance during the first year of graduate school in the math department of an elite Ivy League university. No students had been admitted to the program with GRE verbal scores less than 700, and most grades given in graduate school were A’s. Under these conditions it might be extremely difficult to demonstrate a relationship even though a true underlying relationship may exist. Correlation requires variability. If the variability is restricted, then significant correlations are difficult to find.
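The effect of a restricted range can be illustrated with simulated data. Everything in the sketch below is invented: the "test" and "outcome" scores are random numbers built to be correlated, and the admission cutoff of one SD above the mean is an arbitrary assumption.

```python
import random
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross products."""
    x_bar, y_bar = mean(xs), mean(ys)
    sum_cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return sum_cross / (ss_x * ss_y) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simulated applicants: outcome = test score plus independent noise.
test = [rng.gauss(0, 1) for _ in range(5000)]
outcome = [t + rng.gauss(0, 1) for t in test]

r_full = pearson_r(test, outcome)

# Restrict the range: keep only applicants more than 1 SD above the mean.
admitted = [(t, o) for t, o in zip(test, outcome) if t > 1.0]
r_restricted = pearson_r([t for t, _ in admitted], [o for _, o in admitted])
```

The underlying relationship is the same for everyone, yet the correlation among the admitted group comes out clearly lower than in the full group, because little variability in test scores is left.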
35
Factor Analysis
Trying to find common factors in complex, intercorrelated data sets. How many factors do you need to explain the most variance? How do developmental psychologists find underlying dimensions when we can only observe specific behaviors, such as how often a baby cries, sensitivity to lights, or excessive fear of strangers? Some behaviors will cluster together, for example sensitivity to pain and crying.
36
Sea monster analogy
Some visible parts move together and others move independently: an intuitive picture of correlation. Correlations between the parts we can see = observable behaviors. What we can infer about their underlying nature = theoretical constructs.
37
Factor analysis
A statistical method that looks at how a large number of different observations correlate and determines how many theoretical constructs could most simply explain what you see. Unlike multiple regression and discriminant analysis, which find linear combinations of variables that maximize the prediction of some criterion, factor analysis studies the interrelationships among the variables themselves. It starts from a matrix that shows the correlation between every variable and every other variable. The goal is to find the linear combinations, or principal components, of the variables that describe as many of the interrelationships among the variables as possible. The first component will be the most successful in describing the variation among the variables, with each succeeding component somewhat less successful. Thus, we often decide to examine only a few components that account for larger proportions of the variation. We then find the correlations between the original items and the factors. These correlations are called factor loadings.
38
Measurement: what’s the point? We usually have a choice
There is a trade-off between complexity and precision. Scales: nominal, ordinal, interval, ratio, ordered from least complex to most complex and from least precise to most precise. Examples: program, percentile rank, McMaster grade, final percentage.
39
Three more correlation concepts: Bidirectionality of Predictions
X correlates with Y exactly as much as Y correlates with X.
40
Three more correlation concepts: Restriction of Range
Example: school level and knowledge of geography. If the sample is restricted to grade 3, you can’t see a correlation. Does the GRE predict performance in grad school? It doesn’t correlate well with grad school performance and looks like a poor predictor, but that reflects restriction of range: we only admit people whose GRE scores were high. To test this properly, we would need to let EVERYONE in. Always inspect scatterplots for range restrictions, outliers, and nonlinearities (curves).
41
Regression to the Mean
Fathers and biological sons. If a father is taller than average, we predict that his son will also be taller than average. Height is genetically linked, so the two heights will be positively correlated. However, we predict that the son will be a little shorter (closer to average) than his dad. If a father is shorter than average, we predict that his son will also be shorter than average, but a little taller (closer to average) than his dad. The taller the father is, the more we expect the son to be shorter than him, and vice versa for short fathers.
42
regression to the mean - grades
Midterm grades are positively, but not perfectly, correlated with final exam grades. If you do better than average on the midterm, you would be predicted to do better than average on the final, but probably a little worse than on the midterm. If you do worse than average on the midterm, you would be predicted to do better on the final, but the best prediction is still below average. What if one test is easier than the other? That is a problem for raw scores, so transform them into z scores (if the distributions are normal). A SCORE FAR FROM THE MEAN ON ONE TEST IS EXPECTED TO BE CLOSER TO THE MEAN ON THE NEXT.
43
Regression to the mean only happens when stats are
imperfectly correlated. Remember that a perfect correlation is +1 or -1.
44
Correlation: y = rx
Here x and y are in standardized units (z scores) and r is the correlation coefficient. We are trying to predict y in z units from x in z units. If two scores are perfectly positively correlated, what is the relationship between x and y? What if x were one SD above the mean? What if x were two SDs above the mean? (With r = 1, y = x, so y is predicted to be just as far above the mean as x.) If two scores have a correlation of 0.5, what is the relationship between x and y? y = 0.5x, so if x is one SD above the mean, y is predicted to be 0.5 SDs above the mean. If a tall dad (x) is 2 SDs above the mean, y = 0.5(2) = 1: the son is predicted to be one SD above the mean, which is 1 SD shorter than his dad.
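The prediction rule y = rx is simple enough to check by hand, but a short sketch makes the pattern across several cases easy to see. The function name `predict_z` is hypothetical.

```python
def predict_z(r, x_z):
    """Predicted z score on y from a z score on x, given correlation r."""
    return r * x_z

# Perfect positive correlation: no regression to the mean.
perfect_one_sd = predict_z(1.0, 1.0)   # x is 1 SD above the mean
perfect_two_sd = predict_z(1.0, 2.0)   # x is 2 SDs above the mean

# Correlation of 0.5: predictions move halfway back toward the mean.
half_one_sd = predict_z(0.5, 1.0)
half_two_sd = predict_z(0.5, 2.0)      # the tall dad from the card
```

With r = 1 the predictions are 1 and 2 SDs above the mean, matching x. With r = 0.5 they are 0.5 and 1 SD above the mean, so the further x is from the mean, the larger the predicted drop back toward it.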
45
Statistical Concepts and Rationality
If you understand the actual concepts of the normal distribution (lots of people are near the middle, fewer on the outside), plus how correlation works, including restriction of range and regression to the mean, you are in a position to act more rationally than the vast majority of the population
46
Spearman’s Early Studies
Spearman actually worked out most of the basics of contemporary reliability theory and published his work in a 1904 article entitled “The Proof and Measurement of Association between Two Things.”
47
Reliability
Does a test measure something the same way each time? Do we get the same results every time? We don’t have perfect measures; we are trying to measure things that are difficult to measure. For example, does our depression meter come up with the same result each time?
48
What are some reasons people do better or worse in an exam than they “should”?
The test itself. The test taker: not feeling well, etc. The environment: the room was hot or loud, coughing, an alarm. How the test was scored: for example, an essay marked unfairly, or unreliable TA scoring.
49
True vs. Observed Scores
The true score is a theoretical idea. Imagine taking a test and receiving a score: that is the observed score. We use it to estimate some theoretical TRUE score.
50
Basics of Test Score Theory
Classical test score theory assumes that each person has a true score that would be obtained if there were no errors in measurement. The score observed for each person almost always differs from the person’s true ability or characteristic. If it is a reliable test, the observed score should be pretty close to this theoretical true score. If the test isn’t very reliable, the observed score might not be all that close to the true score.
51
Imagine being IN THEORY able to take that same test over and over, and receiving an observed score X each time
In the real world we can’t really do this, because of practice effects etc. In theory, we could plot the distribution of all those observed X values. They would turn out normally distributed, clustered around the mean. We already know a lot about the normal distribution: we need only the mean and the SD, so we can describe hundreds of observations with 2 numbers. The mean of this distribution is the theoretical true score, and the observed scores are normally distributed around it. A small SD means the scores are tightly clustered around the mean; with a small variance, an observed score is probably a good guess for the true score.
52
Why would an observed score differ from a true score
Error: nothing is perfect. X = T + E: the observed score X is the true score T plus the error E. This is all theoretical; we can’t actually calculate T or E, and the error can be positive or negative.
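Although T and E cannot be observed for a real person, the model X = T + E can be simulated. In the sketch below the true score and the error SD are invented values, and the errors are assumed to be normally distributed with a mean of 0.

```python
import random
from statistics import mean, pstdev

rng = random.Random(1)  # fixed seed so the sketch is repeatable

TRUE_SCORE = 75.0  # hypothetical T; unknowable for a real test taker
ERROR_SD = 5.0     # hypothetical spread of the measurement error E

# Imagine the same person taking the same test 10,000 times: X = T + E.
observed = [TRUE_SCORE + rng.gauss(0, ERROR_SD) for _ in range(10000)]

mean_observed = mean(observed)  # should land close to T
sd_observed = pstdev(observed)  # should land close to the error SD
```

Each single observed score can miss T in either direction, but the mean of the repeated scores settles near the true score. A smaller error SD would cluster the observed scores more tightly around it.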