Summa Week 6 Flashcards
What is the GLM?
the GLM, or General Linear Model, is the conceptual framework unifying a large set of statistical methods
What can the GLM answer?
almost any question if the DV is continuous
What must the DV be for the GLM to likely be able to answer it?
it must be continuous, and therefore NOT categorical
What are familiar statistics within the GLM?
t-tests
correlation
ANOVA
regression
What is bivariate correlation?
an association between scores on 2 random variables
e.g. hrs spent on Twitter and # followers
What is the relation in correlation?
a straight line, or “linear correlation” or “linear regression”
What is assumed about a correlation?
the relationship can be a straight line (linear correlation or regression) or a curved one (non-linear regression)
How are correlations detected by the eye?
usually by a scatterplot
What are scatterplots?
a graph that shows the degree and pattern of the relationship between two variables
What is on the horizontal axis of a scatterplot?
a variable that does the predicting (likely arbitrary, and an IV)
e.g. hours of studying, income
What is on the vertical axis of a scatterplot?
usually the variable that is predicted (likely a DV)
e.g. grades, happiness
What is the shape of a positive bivariate distribution on a scatterplot?
a line from the bottom left to the top right
What is the shape of a negative bivariate distribution on a scatterplot?
a line from the top left to the bottom right
What is the shape of no correlation in a scatterplot?
the data points are scattered all over the plot
What is the shape of a curvilinear distribution on a scatterplot?
a curve rather than a straight line, e.g. an inverted-U (bell-like) pattern of points
What is the correlation coefficient?
a bivariate statistic that measures the degree of linear association between two quantitative variables
What is covariance?
a number that reflects the degree to which two variables vary together
What represents the correlation coefficient?
italicized r
What does italicized r indicate?
the precise degree of linear correlation between two variables, or the correlation coefficient
What is the range of italicized r?
the range for a correlation coefficient is between -1 and 1 [-1,1]
What is the formula for covariance?
COVxy = sum of [(X - Mx)(Y - My)] / (N - 1), where Mx and My are the means of X and Y
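The covariance formula above can be sketched in a few lines of Python. This is only an illustration: the function name `covariance` and the Twitter-style data are made up for the example and are not from the course material.

```python
def covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations, divided by N - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with N >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # (score1 - mean1) * (score2 - mean2), summed over all cases
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross_products / (n - 1)


# Hypothetical data: hours spent on Twitter and number of followers
hours = [1, 2, 3, 4, 5]
followers = [10, 25, 30, 45, 60]
cov_xy = covariance(hours, followers)
print(cov_xy)  # 30.0
```

A positive covariance here means the two variables tend to rise together; its size depends on the units, which is why r is usually reported instead.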
What is the formula for a correlation coefficient?
r = COVxy / (Sx * Sy), where Sx and Sy are the standard deviations of X and Y
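As a rough sketch of the formula above, r can be computed by dividing the covariance by the product of the two sample standard deviations. The function name `pearson_r` and the data are assumptions made for this example.

```python
import statistics


def pearson_r(xs, ys):
    """Pearson r = COVxy / (Sx * Sy), using N - 1 in both covariance and SDs."""
    n = len(xs)
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # statistics.stdev is the sample standard deviation (N - 1 denominator)
    return cov_xy / (statistics.stdev(xs) * statistics.stdev(ys))


# Hypothetical data: hours spent on Twitter and number of followers
hours = [1, 2, 3, 4, 5]
followers = [10, 25, 30, 45, 60]
r = pearson_r(hours, followers)
print(round(r, 3))  # roughly 0.99, a strong positive linear association
```

Dividing by the standard deviations removes the units, so r always falls between -1 and 1.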
Who developed the correlation coefficient r?
Pearson
What is the random sampling assumption for Pearson’s r?
each sample is a random sample from its population
How robust is Pearson’s r to violations of the random sampling assumption?
considered inappropriate to conduct if violated, but some argue it is robust if violated
What is the independence of cases assumption for Pearson’s r?
each case is NOT influenced by other cases
How robust is Pearson’s r to violations of the independence of cases?
NOT robust to violations
What is the normality assumption for Pearson’s r?
the DV is normally distributed in each population
How robust is Pearson’s r to violations of normality?
robust to violations if the sample size is large
What is the linearity assumption for Pearson’s r?
the relationship between the two variables in the population is a linear one
How robust is Pearson’s r to violations of linearity?
not robust
What is the sign for adjusted correlation coefficient?
Radj
What is Radj?
when the number of observations is small, the sample correlation will be a BIASED (inaccurate) estimate of the population correlation coefficient. To correct for this, we can compute Radj, which is a less biased (approximately unbiased) estimate of the population correlation coefficient
What is the formula for Radj?
Radj = sqrt(1 - [(1 - r^2)(N - 1) / (N - 2)])
What is the reduced formula for Radj?
Radj = sqrt(1 - [(N - 1) / (N - 2)] (1 - r^2))
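A minimal sketch of the Radj formula above. The function name `adjusted_r` is hypothetical, and clamping a negative value under the square root to 0 is an assumption of this sketch (the cards do not say how to handle that case, which can occur when r and N are both small).

```python
import math


def adjusted_r(r, n):
    """Radj = sqrt(1 - (1 - r^2)(N - 1)/(N - 2))."""
    if n <= 2:
        raise ValueError("N must be greater than 2")
    inner = 1 - (1 - r ** 2) * (n - 1) / (n - 2)
    # Assumption: treat a negative value under the root as 0 rather than failing
    return math.sqrt(max(inner, 0.0))


print(round(adjusted_r(0.5, 10), 3))   # 0.395, noticeably smaller than r = .50
print(round(adjusted_r(0.5, 200), 3))  # close to .50 once N is large
```

The correction shrinks r most when N is small, which matches the idea that small samples tend to overstate the population correlation.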
What is the symbol for the correlation coefficient in the population?
ρ (rho)
What is ρ (rho)?
the correlation coefficient in the population; it is a parameter, and the sample r is used to estimate it
What value of the population correlation between X and Y, denoted by ρ (rho), do we most commonly test as the null hypothesis?
zero
Why is testing ρ (rho) = 0 a meaningful test of the population correlation coefficient?
the null hypothesis being tested is really the hypothesis that X and Y are linearly independent.
Rejection of the hypothesis that X and Y are linearly independent leads to…
the conclusion that they are not independent, and there is some linear relationship between them
What is the sampling distribution of r when ρ = 0?
it is centered on 0: sample r values above 0 are positive and sample r values below 0 are negative
When ρ = 0 and the sample size is relatively large, the sampling distribution of r will be _____ with a standard error of ____
normal, sr
What is the formula for the standard error of r (sr)?
sr = sqrt((1 - r^2) / (N - 2))
How can we calculate a t-statistic for a correlation coefficient?
t = r * sqrt(N - 2) / sqrt(1 - r^2), with df = N - 2
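The standard error and t formulas above can be sketched together, since t is just r divided by sr. The function names and the example values (r = .50, N = 27) are made up for illustration.

```python
import math


def r_standard_error(r, n):
    """sr = sqrt((1 - r^2) / (N - 2))."""
    if n <= 2 or abs(r) >= 1:
        raise ValueError("need N > 2 and -1 < r < 1")
    return math.sqrt((1 - r ** 2) / (n - 2))


def r_t_statistic(r, n):
    """Return (t, df) for testing the null hypothesis that rho = 0."""
    if n <= 2 or abs(r) >= 1:
        raise ValueError("need N > 2 and -1 < r < 1")
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t, n - 2


# Hypothetical result: r = .50 from a sample of N = 27
se = r_standard_error(0.5, 27)
t, df = r_t_statistic(0.5, 27)
print(round(se, 3), round(t, 3), df)  # 0.173 2.887 25
```

With df = 25, the two-tailed critical t at alpha = .05 is roughly 2.06, so in this made-up case the null hypothesis that X and Y are linearly independent would be rejected.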
What factors influence the Pearson r?
(L) linearity
(O) outliers
(R) restriction of range
(C) context
Why does linearity affect the Pearson r?
r will underestimate the strength of the relationship when the bivariate distribution departs from linearity
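A small illustration of the linearity point, using made-up data: here Y is perfectly determined by X (Y = X squared), yet r comes out as 0 because the relationship is curved rather than straight. The helper `pearson_r` is a sketch written for this example; it uses sums of squares, which is algebraically the same as COVxy / (Sx * Sy) because the N - 1 terms cancel.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from sums of cross-products and sums of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical U-shaped data: a perfect, but non-linear, relationship
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]
r_curved = pearson_r(x, y)
print(r_curved)  # 0.0
```

This is why a scatterplot should be checked before interpreting r: a value near 0 only rules out a linear relationship.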
Why do outliers influence the Pearson r?
discrepant data points, or outliers, affect the magnitude of r and the direction of the effect depending on the outlier’s location in the scatterplot
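To make the outlier point concrete, here is a sketch with made-up data showing a single discrepant point pulling r in either direction, depending on where it sits in the scatterplot. The helper `pearson_r` and all values are assumptions for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from sums of cross-products and sums of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Case 1: a perfect line, then one point far off the line (bottom right)
r_line = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_line_outlier = pearson_r([1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 0])

# Case 2: no linear relationship, then one extreme point (far top right)
r_flat = pearson_r([1, 2, 3, 4, 5], [3, 1, 2, 1, 3])
r_flat_outlier = pearson_r([1, 2, 3, 4, 5, 20], [3, 1, 2, 1, 3, 20])

print(round(r_line, 3), round(r_line_outlier, 3))  # 1.0 0.143
print(round(r_flat, 3), round(r_flat_outlier, 3))  # 0.0 0.973
```

In the first case one outlier nearly erases a perfect correlation; in the second it manufactures a strong one out of unrelated scores.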