Exam 2 Flashcards
correlation
measures and describes a relationship between two variables
linear vs curvilinear
linear: forms a line
curvilinear: has a curve
* to determine which one applies, make a scatterplot
positive vs negative relationship
positive: has an upward slope, direct (when Y increases X increases)
negative: has a downward slope, inverse (when Y increases X decreases)
scatterplot
graph used for correlation that plots every paired (X, Y) point in a data set
correlation coefficient (pearson r)
specific measure of a correlation
from -1 to 1
pearson r formula
r = SP / sqrt(SSx * SSy)
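A minimal sketch of the formula above in Python. The scores are made up for illustration, and it assumes SP is the sum of products of deviations and SSx, SSy are the sums of squared deviations.

```python
import math

def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SSx * SSy) for paired scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # SP: sum of products of the X and Y deviations
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # SSx, SSy: sums of squared deviations
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Illustrative (made-up) paired scores
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # 0.775
```

Here SP = 6, SSx = 10, SSy = 6, so r = 6 / sqrt(60).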
coefficient of determination (r^2)
measures the percentage of variability in one variable which is determined by its relationship with the other variable
what is another word for the coefficient of determination?
effect size
what are the ranges of effect size?
small: r = .10 -> r^2 = 1%
medium: r = .30 -> r^2 = 9%
large: r = .50 -> r^2 = 25%
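The percentages come from squaring r and multiplying by 100. A quick arithmetic check in Python (the helper name is only for illustration):

```python
def percent_variability_explained(r):
    """Coefficient of determination expressed as a percentage."""
    return r ** 2 * 100

for label, r in [("small", 0.10), ("medium", 0.30), ("large", 0.50)]:
    print(label, r, round(percent_variability_explained(r), 2))
# small 0.1 1.0
# medium 0.3 9.0
# large 0.5 25.0
```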
spearman rho (rs)
used for ordinal scales (ranked data)
spearman rho formula
rs = 1 - (6 ΣD^2) / (N^3 - N)
where:
D = Rank B - Rank A for each pair * the sum of the D values should equal 0
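A small sketch of the Spearman formula in Python. The ranks are made up, and the sketch assumes there are no tied ranks.

```python
def spearman_rho(rank_a, rank_b):
    """rs = 1 - (6 * sum of D^2) / (N^3 - N), assuming no tied ranks."""
    n = len(rank_a)
    # D = Rank B - Rank A for each pair
    d = [b - a for a, b in zip(rank_a, rank_b)]
    if sum(d) != 0:
        raise ValueError("the differences in ranks should sum to 0")
    sum_d_squared = sum(di ** 2 for di in d)
    return 1 - (6 * sum_d_squared) / (n ** 3 - n)

# Illustrative (made-up) ranks for five individuals
rank_a = [1, 2, 3, 4, 5]
rank_b = [2, 1, 4, 3, 5]
rho = spearman_rho(rank_a, rank_b)
print(round(rho, 2))  # 0.8
```

Here the sum of D^2 is 4 and N = 5, so rs = 1 - 24/120.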
why does correlation not imply causation
we are only observing the relationship between the two variables; we are not manipulating either of them
* there can always be a third variable that is influencing them
pearson r
measures the degree of straight-line (linear) relationship between two variables
- only for interval and ratio scales
define regression
regression focuses on using the relationship for the purpose of PREDICTION
- linear relationship between 2 variables
- paired scores
define regression line
the best-fitting straight line through the paired scores; with it we can predict a score on one variable (Y) based on our knowledge of the other (X)
Linear regression equation
Y’ = byX + ay
where:
by= SP/SSx
ay= Ybar - by(Xbar)
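A minimal sketch of these formulas in Python, using made-up paired scores. The function and variable names are only for illustration.

```python
def fit_regression_line(xs, ys):
    """Return (by, ay) for the line Y' = by * X + ay."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b_y = sp / ss_x                # slope: by = SP / SSx
    a_y = mean_y - b_y * mean_x    # regression constant: ay = Ybar - by(Xbar)
    return b_y, a_y

# Illustrative (made-up) paired scores
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b_y, a_y = fit_regression_line(x, y)
predicted = b_y * 3 + a_y  # predicted Y' for X = 3 (inside the range of X)
print(round(b_y, 2), round(a_y, 2), round(predicted, 2))  # 0.6 2.2 4.0
```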
regression constant
the value of Y' where the regression line crosses the y-axis (where X = 0)
* ay
standard error of estimate
tells us how far away, on average, a point will be from the regression line
* for accuracy we want a low value
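The card above gives no formula, so this sketch assumes the common textbook version: the square root of the sum of squared prediction errors, sum of (Y - Y')^2, divided by N - 2. The scores are made up, and the slope and constant are the values obtained by applying by = SP/SSx and ay = Ybar - by(Xbar) to them.

```python
import math

def standard_error_of_estimate(xs, ys, b_y, a_y):
    """sqrt( sum (Y - Y')^2 / (N - 2) ), assuming the usual textbook formula."""
    residual_ss = sum((y - (b_y * x + a_y)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(residual_ss / (len(xs) - 2))

# Illustrative (made-up) paired scores and their regression line Y' = 0.6X + 2.2
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
s_yx = standard_error_of_estimate(x, y, b_y=0.6, a_y=2.2)
print(round(s_yx, 3))  # 0.894
```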
Specify the conditions that must be met to use linear regression
- the relationship must be linear (use a scatterplot to check)
- random sampling
- can only be used to make predictions within the range of X values the equation was based on
- predictions are meant for individuals who were not a part of the group used to build the regression equation
define a random sample
each possible sample of a given size has an equal chance of being selected
why is it important to use random sampling?
it makes the sample more likely to be representative of the population, which decreases error
what are the two ways to conduct random sampling?
- sampling with replacement
- sampling without replacement
sampling WITH replacement
each member of the population selected is returned to the population before the next selection
- does not change probability
sampling WITHOUT replacement
the selected member of the population is not returned before the next selection
- probability WILL change
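A short sketch of the difference, assuming a standard 52-card deck with 4 aces and a first draw that was an ace:

```python
from fractions import Fraction

aces, deck_size = 4, 52

# First draw: probability of an ace
p_first = Fraction(aces, deck_size)

# Second draw, given the first card was an ace
p_second_with = Fraction(aces, deck_size)             # card returned: unchanged
p_second_without = Fraction(aces - 1, deck_size - 1)  # card kept out: changes

print(p_first, p_second_with, p_second_without)  # 1/13 1/13 1/17
```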
a priori
beforehand, based on rationalism
a posteriori
after the fact, based on empiricism
a priori formula
p(A) = number of events classifiable as A / total number of possible events
a posteriori formula
p(A) = number of times A occurred / total number of occurrences
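A sketch contrasting the two definitions for the event "roll a 2" on a six-sided die. The a priori value is reasoned out beforehand; the a posteriori value is counted after simulated rolls. The seed and the number of rolls are arbitrary choices for illustration.

```python
import random
from fractions import Fraction

# a priori: 1 event classifiable as "2" out of 6 possible events
p_a_priori = Fraction(1, 6)

# a posteriori: roll the die many times and count what actually occurred
rng = random.Random(0)  # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(6000)]
p_a_posteriori = Fraction(rolls.count(2), len(rolls))

print(float(p_a_priori), float(p_a_posteriori))
```

With a fair die and many rolls, the a posteriori value should land close to the a priori value.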
what are the three basic points in probability?
- can be defined in two ways: a priori or a posteriori
- can be a fraction, decimal, or percentage
- range is from 0 to 1
the addition rule
p(A or B) = p(A) + p(B) - p(A and B)
* OR = ADDITION
!! if mutually exclusive: p(A or B) = p(A) + p(B)
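A sketch that checks the addition rule against a direct count, assuming one card drawn from a standard 52-card deck. The helper names are only for illustration.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs

def p(event):
    """Cards classifiable as the event / total number of cards."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

def is_king(card):
    return card[0] == "K"

def is_queen(card):
    return card[0] == "Q"

def is_heart(card):
    return card[1] == "hearts"

# Not mutually exclusive (the king of hearts is both): subtract the overlap
p_king_or_heart = p(is_king) + p(is_heart) - p(lambda c: is_king(c) and is_heart(c))
p_direct_count = p(lambda c: is_king(c) or is_heart(c))

# Mutually exclusive (no card is both a king and a queen): just add
p_king_or_queen = p(is_king) + p(is_queen)

print(p_king_or_heart, p_direct_count, p_king_or_queen)  # 4/13 4/13 2/13
```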
multiplication rule
INDEPENDENT: p(A and B) = p(A) x p(B)
DEPENDENT: p(A and B) = p(A) x p(B|A)
Mutually exclusive: p(A and B) = 0
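A sketch of the three cases, assuming a standard 52-card deck and the event "ace" on each of two draws:

```python
from fractions import Fraction

p_ace = Fraction(4, 52)            # p(A): ace on the first draw
p_ace_given_ace = Fraction(3, 51)  # p(B|A): ace on the second draw, first ace not returned

# Independent (sampling with replacement): p(A and B) = p(A) x p(B)
p_independent = p_ace * p_ace

# Dependent (sampling without replacement): p(A and B) = p(A) x p(B|A)
p_dependent = p_ace * p_ace_given_ace

# Mutually exclusive: one card cannot be both an ace and a king
p_mutually_exclusive = Fraction(0)

print(p_independent, p_dependent, p_mutually_exclusive)  # 1/169 1/221 0
```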
mutually exclusive events
A and B CANNOT happen at the SAME time; has to be OR
independent
they do NOT influence each other = sampling with replacement
dependent
one event has an effect on the probability of the other = sampling without replacement
exhaustive
includes all of the possible events
when there are only two events…
we call that P and Q
- P + Q = 1
Define probability in conjunction with a continuous variable
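when using a continuous variable, probability is the proportion of area under the curve for the event; for normally distributed scores, convert to a z-score to find that area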
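A sketch of finding a probability from a z-score, assuming the scores are normally distributed. The mean, standard deviation, and cutoff score are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

mean, sd = 100, 15  # hypothetical population mean and standard deviation
score = 115         # hypothetical cutoff

# z-score: how many standard deviations the score is from the mean
z = (score - mean) / sd

# p(X > score) = area under the standard normal curve above z
p_above = 1 - NormalDist().cdf(z)

print(round(z, 2), round(p_above, 4))  # 1.0 0.1587
```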