Week 2 Flashcards

1
Q

Bivariate distributions

A

Two scores for each individual, one on each of two variables

2
Q

Scatter Diagram

A

A picture of the relationship between two variables

An important reason for examining the scatter diagram is that the relationship between X and Y is not always best described by a straight line.

3
Q

Regression

A

Trying to predict a variable Y from another variable X

Example: making a best guess about a final mark from a midterm mark - use data from past classes to build the equation, then apply it to new students

Makes predictions about scores on one variable from knowledge of scores on another variable

4
Q

Regression - Galton

A

Individuals with unusual characteristics tended to produce offspring who were closer to average

Galton called this “regression towards mediocrity” - the idea became the basis for a statistical procedure that describes how scores tend to regress toward the mean

5
Q

Why is regression important in psychological testing?

A

Figure out associations between different variables and measurements

Determine whether changes in test scores are related to changes in performance

Make predictions about scores on one variable from knowledge of scores on another variable

6
Q

Difference between regression and correlation

A

Regression is done on the actual (raw) numbers
Correlation takes those numbers and expresses them in standardized units

Use correlation to assess the magnitude and direction of a relationship.

Use regression to make predictions about scores on one variable from knowledge of scores on another variable.

7
Q

Regression equation & Residual

A

Gives a predicted value for Y, denoted Y’

Y’ = bX + a

Y’ = the predicted value of Y
b = regression coefficient - the slope of the line
The regression coefficient can be expressed as the ratio of the sum of squares for the covariance (the sum of cross products of X and Y) to the sum of squares for X. A sum of squares is defined as the sum of the squared deviations around the mean.
a = intercept - the value of Y when X is 0. a = ybar - b(xbar)

Actual and predicted values are rarely the same

The difference between the observed and predicted values is the residual - the best-fitting line keeps residuals to a minimum, that is, it minimizes the deviation between observed and predicted

Because residuals can be positive or negative and will cancel to 0 if summed or averaged, the best-fitting line is found by squaring each residual and making the sum of the squares as small as possible.
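
Worked sketch of this card in Python. The marks are made up for illustration, and the function names `fit_line` and `residuals` are hypothetical helpers, not something from the course. It computes b as the sum of cross products over the sum of squares for X, then a from the two means, and checks that the residuals cancel out.

```python
def fit_line(xs, ys):
    """Return (b, a) for the least-squares line Y' = b*X + a."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross products: deviations of X times deviations of Y.
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sum of squares for X: squared deviations around the mean of X.
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b = sp_xy / ss_x
    a = y_bar - b * x_bar
    return b, a


def residuals(xs, ys, b, a):
    """Observed minus predicted, Y - Y', for every individual."""
    return [y - (b * x + a) for x, y in zip(xs, ys)]


# Made-up midterm (X) and final (Y) marks, for illustration only.
midterm = [50, 60, 70, 80, 90]
final = [55, 60, 72, 75, 88]

b, a = fit_line(midterm, final)
res = residuals(midterm, final, b, a)

print(f"Y' = {b:.2f}X + {a:.2f}")
print("residuals:", [round(e, 2) for e in res])
print("sum of residuals:", round(sum(res), 10))
```

The residuals are a mix of positive and negative values that sum to 0 (up to rounding), which is why the squared residuals are used to judge the fit.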

8
Q

Regression line & Principle of least squares

A

Used to find the regression line

Minimizes the squared deviation around the regression line

Understand:
Mean is the point of least squares for any variable. Sum of squared deviations around the mean will be less than it is around any value other than the mean.

Regression line is the running mean or line of least squares.
The least squares method in regression finds the straight line that comes as close to as many of these Y means as possible. In other words, it is the line for which the squared deviations around the line are at a minimum.

The best-fitting line is obtained by keeping these squared residuals as small as possible. This is known as the principle of least squares

Σ(Y - Y’)^2 is at a minimum
where Y - Y’ = observed - predicted
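
Quick check of the "mean is the point of least squares" idea, as a minimal sketch. The scores are made up and `sum_sq_dev` is a hypothetical helper name. It compares the sum of squared deviations around the mean with the sum around a few other values.

```python
def sum_sq_dev(values, point):
    """Sum of squared deviations of the values around a chosen point."""
    return sum((v - point) ** 2 for v in values)


# Made-up scores, for illustration only.
scores = [2, 4, 4, 5, 10]
mean = sum(scores) / len(scores)

at_mean = sum_sq_dev(scores, mean)
# Try some points other than the mean.
others = {p: sum_sq_dev(scores, p) for p in (3, 4, 6, 7)}

print("around the mean:", at_mean)
for point, total in others.items():
    print(f"around {point}: {total}")
```

Every other point gives a larger total than the mean does. The regression line does the same job for a running mean of Y across values of X.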

9
Q

Sum of cross Products (covariance)

A

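Built from the deviations around each mean

How far is each X from the mean of X?

How far is each Y from the mean of Y?

For each individual, multiply the two deviations, then add up the products: Σ(X - xbar)(Y - ybar)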

10
Q

Covariance & the goal of regression analysis

A

Covariance - Whether two variables covary - does y get larger as X gets larger

The covariance is calculated from the cross products, or products of variations around each mean.

Regression analysis attempts to determine how similar the variance between two variables is by dividing the covariance by the average variance of each variable

11
Q

Intercept of the regression line = a

A

a = ybar - b(xbar)

12
Q

Regression Plot

A

Pictures that show the relationship between variables

A common use of correlation is to determine the criterion validity evidence for a test, or the relationship between a test score and some well-defined criterion.

The association between a test of job aptitude and the criterion of actual performance on the job is an example of criterion validity evidence.

A regression plot is normative because it uses information gained from a representative group

13
Q

Correlation

A

Correlation is a special case of regression in which the scores for both variables are in standardized, or Z, units.

An important property of the correlation coefficient is its reciprocal nature. The correlation between X and Y will always be the same as the correlation between Y and X

Regression does not have this property.
Using standardized scores eliminates the need to find the intercept

In correlation, the intercept is always 0

Correlation coefficient - describes the direction and magnitude of the relationship

Regression but with the scores standardized - r varies between -1 and 1, and there is no intercept value
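
A minimal sketch of "correlation is regression in Z units", using made-up marks and hypothetical helper names (`mean`, `z_scores`, `fit_line`). It fits the line in raw units in both directions, then again after converting both variables to Z scores.

```python
def mean(values):
    return sum(values) / len(values)


def z_scores(values):
    """Convert raw scores to standardized (Z) units."""
    m = mean(values)
    sd = (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - m) / sd for v in values]


def fit_line(xs, ys):
    """Least-squares slope b and intercept a for predicting ys from xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b = sp_xy / ss_x
    return b, y_bar - b * x_bar


# Made-up midterm (X) and final (Y) marks, for illustration only.
x = [50, 60, 70, 80, 90]
y = [55, 60, 72, 75, 88]

raw_b_yx, _ = fit_line(x, y)   # raw slope predicting Y from X
raw_b_xy, _ = fit_line(y, x)   # raw slope predicting X from Y

zx, zy = z_scores(x), z_scores(y)
r_yx, a_yx = fit_line(zx, zy)  # slope in Z units is the correlation r
r_xy, a_xy = fit_line(zy, zx)  # same value in the other direction

print(f"raw slopes: {raw_b_yx:.3f} vs {raw_b_xy:.3f}")
print(f"Z-unit slopes: {r_yx:.3f} vs {r_xy:.3f}")
```

The raw slopes differ depending on which variable is predicted, while the Z-unit slope is the same both ways and the intercept is 0 up to rounding.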

14
Q

Correlation between two randomly created variables will not always be 0

A

By chance alone it’s possible to observe a correlation higher or lower than 0

The null hypothesis (that the true correlation is 0) is rejected if there is evidence that the association between two variables is significantly different from 0.

Correlation coefficients can be tested for statistical significance using the t distribution

15
Q

t distribution

A

The t distribution is not a single distribution (such as the Z distribution) but a family of distributions, each with its own degrees of freedom.

For testing a correlation coefficient, the degrees of freedom (df) are defined as the sample size minus two, or N - 2
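
Sketch of how r is turned into a t value with N - 2 degrees of freedom. The formula t = r * sqrt((N - 2) / (1 - r^2)) is the usual textbook form and is an addition here, not something written on the cards. The function name `t_for_correlation` and the example numbers are made up.

```python
import math


def t_for_correlation(r, n):
    """Return (t, df) for testing whether a correlation differs from 0."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


# Hypothetical example: r = 0.5 observed in a sample of 27 people.
t, df = t_for_correlation(0.5, 27)
print(f"t = {t:.3f} with df = {df}")
```

The t value is then compared with the tabled critical value for that df (about 2.06 for df = 25, two-tailed, at the .05 level).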

16
Q

Different kinds of correlation coefficient
Pearson’s r = ratio scale, occasionally interval scales such as Likert

A

Determines the degree of variation in one variable that can be estimated from knowledge about variation in the other variable

17
Q

Different kinds of correlation coefficient
Biserial r

A

The biserial correlation expresses the relationship between a continuous variable and an artificial dichotomous variable

Example: the relationship between passing or failing the bar examination (artificial dichotomous variable) and GPA in law school (continuous variable).

18
Q

Different kinds of correlation coefficient
Point biserial r

A

Used when the dichotomous variable is “true” (such as gender) and the other variable is continuous

For instance, the point biserial correlation would be used to find the relationship between gender and GPA

19
Q

Different kinds of correlation coefficient
Tetrachoric r

A

When both dichotomous variables are artificial, we might use a special correlation coefficient, the tetrachoric correlation

20
Q

Different kinds of correlation coefficient
Phi

A

Which coefficient to use depends on whether the variables are continuous or dichotomous (artificial or true)

When both variables are dichotomous and at least one of the dichotomies is “true,” the association between them can be estimated using the phi coefficient

There are also coefficients for rank correlations

21
Q

Spearman’s Rho

A

Used with rank-order variables

A correlation for finding the association between two sets of ranks

The rho coefficient (ρ) is easy to calculate and is often used when the individuals in a sample can be ranked on two variables but their actual scores are not known or are not normally distributed
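
Sketch of how easy rho is to calculate. It assumes the common shortcut formula rho = 1 - 6Σd^2 / (n(n^2 - 1)), which only applies when there are no tied ranks. The formula is an addition here, not from the cards, and the ranks and the name `spearman_rho` are made up.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman's rho from two sets of ranks (assumes no tied ranks)."""
    n = len(ranks_x)
    # d is the difference between each individual's two ranks.
    sum_d_sq = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d_sq) / (n * (n ** 2 - 1))


# Made-up example: five people ranked by two different judges.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]

rho = spearman_rho(judge_1, judge_2)
print(f"rho = {rho:.2f}")
```

Identical rankings give rho = 1 and exactly reversed rankings give rho = -1.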

One whole family of correlation coefficients involves dichotomous variables.
Some variables are true dichotomies because they naturally form two categories - for example, gender

Others are artificially dichotomous because they reflect an underlying continuous scale forced into a dichotomy. Passing or failing a bar examination is an example of such an artificial dichotomy.

22
Q

Residual

A

Y - Y’
Observed - predicted

The difference between the observed and the predicted values is called the residual.

The sum of the residuals always equals 0
The sum of the squared residuals is the smallest possible value according to the principle of least squares [Σ(Y - Y’)^2 = smallest value].

23
Q

Standard Error of Estimate

A

How far apart are the predicted and observed values?
The standard deviation of the residuals

A measure of the accuracy of prediction
Prediction is most accurate when the standard error of estimate is relatively small.
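
Minimal sketch of the calculation, assuming the common textbook form sqrt(Σ(Y - Y’)^2 / (N - 2)). The N - 2 denominator is an assumption added here, not stated on the card. The marks and the predicted values (from a hypothetical line Y’ = 0.81X + 13.3) are made up.

```python
def standard_error_of_estimate(observed, predicted):
    """Standard deviation of the residuals, using N - 2 in the denominator."""
    n = len(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return (ss_res / (n - 2)) ** 0.5


# Made-up final marks and the values predicted for them by a hypothetical line.
observed = [55, 60, 72, 75, 88]
predicted = [53.8, 61.9, 70.0, 78.1, 86.2]

see = standard_error_of_estimate(observed, predicted)
print(f"standard error of estimate = {see:.2f}")
```

A smaller value means the observed scores sit closer to the line, so predictions are more accurate.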

24
Q

Coefficient of Determination r^2

A

The proportion of the total variation in Y that is accounted for by knowing X

How much is accounted for

25
Q

Coefficient of Alienation

A

sqrt(1 - r^2)

How not associated the variables are
r^2 is the coefficient of determination, so 1 - r^2 is the proportion of variation left unexplained
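
Small worked example tying the two coefficients together. The value r = 0.8 and the function name are made up for illustration.

```python
def determination_and_alienation(r):
    """Return (r squared, coefficient of alienation) for a correlation r."""
    r_squared = r ** 2
    alienation = (1 - r_squared) ** 0.5
    return r_squared, alienation


# Hypothetical correlation of 0.8 between two tests.
r_squared, alienation = determination_and_alienation(0.8)
print(f"coefficient of determination = {r_squared:.2f}")
print(f"coefficient of alienation = {alienation:.2f}")
```

With r = 0.8, about 64% of the variation in Y is accounted for by X, and the coefficient of alienation is about 0.6.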

26
Q

General Multivariate Models: Linear Combination

A

Multiple X variables, each with its own regression coefficient
Relationships among combinations of three or more variables

Study the relationship between many predictors and one outcome, as well as the relationship among the predictors.

Example: predicting law school success from undergraduate GPA, professor ratings, and age. The method is multiple regression, and the goal of the analysis is to find the linear combination of the three variables that provides the best prediction of law school success.

Law school GPA = .80 (Z scores of undergraduate GPA) + 1.54 (Z scores of professor ratings) + 1.03 (Z scores of age)
The reason for using Z scores for the three predictors is that the coefficients in the linear composite are greatly affected by the range of values taken on by the variables.
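
The linear combination above written as a function, as a minimal sketch. The weights are copied from the card as written. The function name and the example student are hypothetical.

```python
def predicted_law_school_gpa(z_undergrad_gpa, z_professor_rating, z_age):
    """Weighted sum of three predictors, each already converted to Z scores."""
    return (0.80 * z_undergrad_gpa
            + 1.54 * z_professor_rating
            + 1.03 * z_age)


# Hypothetical student: 1 SD above the mean on undergraduate GPA,
# exactly average on professor ratings and on age.
prediction = predicted_law_school_gpa(1.0, 0.0, 0.0)
print(f"predicted value = {prediction:.2f}")
```

Because every predictor is in Z units, the weights can be compared directly with each other.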

27
Q

standardized regression coefficients

A

When the variables are expressed in Z units, the coefficients, or weights for the variables, are known as standardized regression coefficients

28
Q

raw regression coefficients

A

When the variables are left in their original (raw) units, the weights in the model are called raw regression coefficients

29
Q

Discriminant Analysis

A

When the task is to find the linear combination of variables that provides maximum discrimination between categories, the appropriate multivariate method is discriminant analysis.

Discriminant analysis attempts to determine whether a set of measures predicts success or failure on a particular performance evaluation

For example, say that two groups of children are classified as “language disabled” and “normal.” After a variety of items are presented, discriminant analysis is used to find the linear combination of items that best accounts for differences between the two groups

30
Q

Shrinkage

A

A regression equation has a tendency to overestimate the relationship, particularly if the sample of subjects is small

Shrinkage is the amount of decrease observed when a regression equation is created for one population and then applied to another

Example: a regression equation is developed to predict first-year college GPAs on the basis of SAT scores.

Although the proportion of variance in GPA accounted for might be fairly high for the original group, we can expect to account for a smaller proportion of the variance when the equation is used to predict GPA in the next year’s class

31
Q

Cross Validation

A

The best way to ensure that proper inferences are being made is to use the regression equation to predict performance in a group of subjects other than the ones on which the equation was developed.

A standard error of estimate can then be obtained for the relationship between the values predicted by the equation and the values actually observed

32
Q

Correlation-Causation Problem

A

Just because two variables are correlated does not necessarily imply that one has caused the other

33
Q

Third Variable Explanation

A

For example, the apparent relationship between television viewing and aggression actually might be the result of some third variable not included in the analysis.

34
Q

Restricted Range

A

Problems arise in circumstances in which the ranges of variability are restricted.

Example: the relationship between scores on the Graduate Record Examination (GRE) quantitative test and performance during the first year of graduate school in the math department of an elite Ivy League university.

No students had been admitted to the program with GRE quantitative scores less than 700.

Most grades given in graduate school were A’s.

It might be extremely difficult to demonstrate a relationship even though a true underlying relationship may exist.

Correlation requires variability. If the variability is restricted, then significant correlations are difficult to find.
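
A toy demonstration of restricted range, as a sketch under made-up assumptions. Y rises with X plus a fixed +20 / -20 wobble standing in for noise, and `pearson_r` is a hypothetical helper. The correlation is computed over the full range of X and again using only the top scorers.

```python
def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return sp_xy / (ss_x * ss_y) ** 0.5


# Made-up data: 100 people, Y = X plus an alternating +20 / -20 wobble.
x_all = list(range(100))
y_all = [x + (20 if x % 2 == 0 else -20) for x in x_all]
r_full = pearson_r(x_all, y_all)

# Keep only the top of the X range, as if only high scorers were admitted.
top = [(x, y) for x, y in zip(x_all, y_all) if x >= 90]
x_top = [x for x, _ in top]
y_top = [y for _, y in top]
r_restricted = pearson_r(x_top, y_top)

print(f"full range: r = {r_full:.2f}")
print(f"top scorers only: r = {r_restricted:.2f}")
```

The underlying relationship is the same in both cases, but once X is limited to a narrow band the wobble swamps it and r falls to near 0.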

35
Q

Factor Analysis

A

Trying to find common factors in complex, intercorrelated datasets

How many factors do you need to explain the most variance?

How do developmental psychologists find underlying dimensions when they can only observe specific behaviors?

Observed behaviors: how often the baby cries, sensitivity to lights, excessive fear of strangers

Some behaviors will cluster together
For example, sensitivity to pain and crying

36
Q

Sea monster analogy

A

Some visible parts move together and others move independently - intuitive correlation

Correlations between parts we can see = observable behaviors

We can infer about their underlying nature = theoretical constructs

37
Q

Factor analysis

A

A statistical method that looks at how lots of different observations correlate and determines how many theoretical constructs could most simply explain what you see

Multiple regression and discriminant analysis find linear combinations of variables that maximize the prediction of some criterion; factor analysis instead studies the interrelationships among a set of variables without reference to a criterion

Starts from a correlation matrix that shows the correlation between every variable and every other variable

Find the linear combinations, or principal components, of the variables that describe as many of the interrelationships among the variables as possible

The first component will be the most successful in describing the variation among the variables, with each succeeding component somewhat less successful. Thus, we often decide to examine only a few components that account for larger proportions of the variation

Find the correlation between the original items and the factors. These correlations are called factor loadings

38
Q

Measurement
What’s the point - we usually have a choice of scale

A

Trade off between complexity and precision

Nominal, ordinal, interval, ratio

Least complex to most complex
Least precise to most precise

Examples, in the same order: program, percentile rank, McMaster grade, final percentage

39
Q

Three more correlation concepts
Bidirectionality of Predictions

A

X correlates with Y, Y correlates with X just as much

40
Q

Three more correlation concepts
Restriction of Range

A

Example: school level and knowledge of geography

Restrict the sample to grade 3 and you can’t see a correlation
Does the GRE predict performance in grad school? It doesn’t correlate well with grad school performance and looks like a poor predictor, but that is restriction of range - we are only looking at students whose GRE scores were high

Would need to let EVERYONE in to test this
Always inspect scatterplots - range restrictions, outliers, nonlinearities (curves)

41
Q

Regression to the Mean

A

Fathers and biological sons

If a father is taller than average - we predict his son will also be taller than average
Height is genetically linked, so will be correlated

However, we predict that the son will be a little shorter - closer to average - than his dad

If a father is shorter than average - we predict that his son will also be shorter than average
Positive correlation

However, we predict that the son will be a little taller - closer to average - than his dad
The further the father is from average, the larger the predicted gap between father and son: a very tall father is expected to have a son noticeably shorter than he is, and vice versa

42
Q

regression to the mean - grades

A

Midterm grades are positively, but not perfectly, correlated with final exam grades

If you do better than average on the midterm, the prediction is that you will do better than average on the final - but probably a little worse than on the midterm

If you do worse than average on the midterm, the prediction is that you will do better on the final than on the midterm - but the best prediction is still below average

What if one test is easier than the other? That is a problem with raw scores - transform both into z scores, which works if the scores are roughly normal

FAR FROM THE MEAN ON ONE TEST - EXPECT TO BE CLOSER TO THE MEAN ON THE NEXT ONES

43
Q

Regression to the mean only happens when the two variables are

A

imperfectly correlated

Remember: a perfect correlation is +1 or -1, and in that case there is no regression to the mean

44
Q

Correlation: y = rx

A

If x and y are in standardized units (z scores) and r is the correlation coefficient, then y = rx

Trying to predict y in z units from x in z units

If two scores are perfectly positively correlated (r = 1), what is the relationship between x and y? y = x

What if x was one SD higher than the mean? Predicted y is one SD above the mean
What if x was two SDs higher than the mean? Predicted y is two SDs above the mean

If two scores have a correlation of 0.5, what is the relationship between x and y? y = 0.5x

What if x was one SD higher than the mean? Predicted y is 0.5 SDs above the mean
What if x was two SDs higher than the mean? Predicted y is one SD above the mean

Father-son example with r = 0.5: y = 0.5x
Taller dad: x = 2 SDs above the mean
y = 0.5(2) = 1
Son is predicted to be one SD above the mean - shorter than dad by 1 SD
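
The y = rx rule from this card as a tiny function. The name `predict_z` is a made-up helper for illustration. It reproduces the cases on the card.

```python
def predict_z(r, z_x):
    """Predicted Z score on Y from a Z score on X, given correlation r."""
    return r * z_x


# Perfect correlation: the prediction is as far from the mean as x is.
print(predict_z(1.0, 1.0), predict_z(1.0, 2.0))

# r = 0.5: the prediction is only half as far from the mean as x is.
print(predict_z(0.5, 1.0), predict_z(0.5, 2.0))

# Father 2 SDs above the mean in height, r = 0.5 between father and son.
son_z = predict_z(0.5, 2.0)
print("son's predicted z score:", son_z)
```

Whenever r is between -1 and 1, the predicted score is closer to the mean than the score it was predicted from, which is regression to the mean.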

45
Q

Statistical Concepts and Rationality

A

If you understand the actual concepts of the normal distribution (lots of people are near the middle, fewer on the outside), plus how correlation works, including restriction of range and regression to the mean, you are in a position to act more rationally than the vast majority of the population

46
Q

Spearman’s Early Studies

A

Spearman actually worked out most of the basics of contemporary reliability theory and published his work in a 1904 article entitled “The Proof and Measurement of Association between Two Things.”

47
Q

Reliability

A

Does a test measure something the same way - do we get the same results every time?

We don’t have perfect measures - we are trying to measure things that are difficult to measure

Does our depression meter come up with the same reading each time?

48
Q

What are some reasons people do better or worse in an exam than they “should”?

A

The test itself
The test taker - not feeling well, etc.
The environment - room was hot or loud, coughing, an alarm
How the test was scored - for example, an essay marked unfairly or unreliably by a TA

49
Q

True vs. Observed Scores

A

Theoretical idea of a true score
Imagine taking a test and receiving a score - observed score
We use it to estimate some theoretical TRUE score

50
Q

Basics of Test Score Theory

A

Classical test score theory assumes that each person has a true score that would be obtained if there were no errors in measurement.

The score observed for each person almost always differs from the person’s true ability or characteristic

If it is a reliable test, the observed score should be pretty close to this theoretical true score

If the test isn’t very reliable, we would expect the observed score might be not all that close to the true score

51
Q

Imagine being IN THEORY able to take that same test over and over, and receiving an observed score X each time

A

In the real world, we can’t really do this because of practice effects, etc.

We could plot the distribution of all those observed X values

They turn out to be normally distributed - plot them and find that they are clustered around the mean - a basic normal distribution

Normally distributed - we know lots of things about this already

Need to know the mean
Need to know the SD
Can describe hundreds of observations with 2 numbers

The mean of this distribution is the theoretical true score
The observed scores are normally distributed around the true score
Small SD - tightly clustered around the mean
Small variance - observed score is probably a good guess for true score

52
Q

Why would an observed score differ from a true score

A

Error - nothing is perfect

X = T + E

Observed score X is the true score T plus the error E
This is all theoretical - we can’t actually calculate T or E - and the error can be positive or negative
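
A simulation sketch of X = T + E. It does the thing that can’t be done in real life: it picks a true score, then generates many observed scores by adding random error. The true score of 75, the error SD of 5, and the name `simulate_observed` are all made-up assumptions, and the error is assumed to be normally distributed with a mean of 0.

```python
import random


def simulate_observed(true_score, error_sd, n, seed=0):
    """Generate n observed scores X = T + E with normally distributed error."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [true_score + rng.gauss(0, error_sd) for _ in range(n)]


# Hypothetical person with a true score of 75, tested 10,000 times.
true_score = 75
observed = simulate_observed(true_score, error_sd=5, n=10_000)

mean_observed = sum(observed) / len(observed)
above = sum(1 for x in observed if x > true_score)
below = sum(1 for x in observed if x < true_score)

print(f"mean of observed scores = {mean_observed:.2f}")
print(f"scores above T: {above}, scores below T: {below}")
```

The errors fall on both sides of the true score and largely cancel, so the mean of the observed scores lands close to T. A smaller error SD would cluster the observed scores more tightly around T.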