Lecture 9 - Correlation & Regression Flashcards

1
Q

In a bivariate relationship, what are x and y?

A
x = explanatory variable
y = response variable
2
Q

Give an example of a response variable that you cannot put on the y-axis because it is dichotomous (yes/no).

A
x = smoking
y = lung cancer

*Lung cancer can’t be plotted on the y-axis because it is either yes or no.

3
Q

A scatter plot can only be used when both variables are _______

A

numerical

4
Q

How can you describe the overall pattern of a scatterplot?

A

by the form, direction, and strength of the relationship between the two variables

5
Q

Describe the form and direction of the relationship

A
  • Linear relationships, where the points roughly follow a straight line, are especially important.
  • Relationships can be negative (A) or positive (B) in direction.
  • Curvilinear relationships (C) and clusters (D) are other things to watch for.
  • An important kind of deviation is an OUTLIER (E): an individual value that falls outside the overall pattern.
6
Q

The strength of a relationship is determined by?

A

how close the points in the scatter plot lie to a simple form such as a line

7
Q

We use the _____ to evaluate the variability of univariate data, where n is the sample size.

A

variance

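A minimal Python sketch of the sample variance, assuming the usual n − 1 divisor for a sample (the lecture's own formula is not reproduced on the card). The height values are made-up illustrative numbers.

```python
def sample_variance(xs):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)  # n is the sample size
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


heights_cm = [160, 165, 170, 175, 180]  # made-up example data
print(sample_variance(heights_cm))  # 62.5 (units: cm^2)
```

Python's standard library also provides `statistics.variance`, which uses the same n − 1 divisor.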
8
Q

For bivariate data, we use the _______ (how x and y vary together), where n is the number of (x, y) pairs.

A

covariance

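A minimal sketch of the sample covariance, again assuming an n − 1 divisor. The heights and weights are made-up numbers; the second call shows that simply changing the unit of x changes the covariance, which is why its composite units are hard to interpret.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of (x - mean_x) * (y - mean_y) over all pairs, divided by n - 1."""
    n = len(xs)  # n is the number of (x, y) pairs
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


heights_cm = [160, 165, 170, 175, 180]  # made-up example data
weights_kg = [55, 60, 63, 70, 72]
heights_m = [h / 100 for h in heights_cm]

print(sample_covariance(heights_cm, weights_kg))  # 55.0 (units: cm * kg)
print(sample_covariance(heights_m, weights_kg))   # about 0.55 (units: m * kg)
```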
9
Q

The covariance is limited as ______.

A

a tool for measuring and describing relationships, because its composite units (units of x times units of y) are difficult to interpret

10
Q

Standardized values of a variable (z-scores) have ____ units.

A

no (they are the same whether x is measured in cm or mg)

11
Q

What do standardized values express?

A

Each value's deviation from the mean in units of the standard deviation s, i.e. z = (x − x̄)/s. This avoids the problem of trying to interpret unit-dependent covariance units.

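A small sketch of standardizing with Python's `statistics` module, assuming z = (x − mean)/s with s the sample standard deviation. The same made-up heights are standardized once in centimetres and once in metres to show that the z-scores do not depend on the unit.

```python
import statistics


def standardize(xs):
    """Convert each value to a z-score: (x - mean) / s, where s is the sample SD."""
    mean = statistics.mean(xs)
    s = statistics.stdev(xs)  # sample standard deviation (n - 1 divisor)
    return [(x - mean) / s for x in xs]


heights_cm = [160, 165, 170, 175, 180]  # made-up example data
heights_m = [h / 100 for h in heights_cm]

z_cm = standardize(heights_cm)
z_m = standardize(heights_m)
# True: the unit cancels out, so both lists hold the same z-scores
print(max(abs(a - b) for a, b in zip(z_cm, z_m)) < 1e-9)
```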
12
Q

What is the correlation coefficient?

A

The strength of a relationship is quantified by the standardized covariance of the two continuous measures, which is termed the correlation coefficient ρ (rho), or more formally the Pearson correlation coefficient.

The sample correlation coefficient is r.

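A minimal sketch of the sample Pearson r as the standardized covariance, r = s_xy / (s_x · s_y), assuming n − 1 divisors throughout (they cancel). The data are made up. Swapping x and y, or changing the unit of x, leaves r unchanged; negating y flips its sign.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation r: covariance of x and y divided by (s_x * s_y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Raises ZeroDivisionError if either variable is constant (s = 0).
    return s_xy / (s_x * s_y)


heights_cm = [160, 165, 170, 175, 180]  # made-up example data
weights_kg = [55, 60, 63, 70, 72]

r = pearson_r(heights_cm, weights_kg)
print(round(r, 3))       # 0.989: strong positive linear relationship
print(round(r ** 2, 3))  # 0.978: coefficient of determination r^2
```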
13
Q

What does the correlation coefficient measure?

A

It measures both the strength and direction of a linear relationship between 2 continuous variables.

14
Q

What is the square of the correlation coefficient?

A

r^2 is referred to as the coefficient of determination

15
Q

When are two variables x and y positively associated?

A

when above-average values of one variable tend to go with above-average values of the other; in this case r will be positive

16
Q

When are two variables x and y negatively associated?

A

when above-average values of one variable tend to go with below-average values of the other; in this case r will be negative

17
Q

For the correlation coefficient, what are the null and alternative hypotheses?

A

Null hypothesis: ρ = 0 (there is no linear relationship, e.g. between height and weight)

Alternative hypothesis: ρ ≠ 0 (height and weight are linearly related)

**We can evaluate the correlation using α = 0.05 with n-2 degrees of freedom.

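The card does not show the test statistic. A common choice, assumed here, is t = r·√(n − 2) / √(1 − r²), compared with a t table at n − 2 degrees of freedom. The r and n in the example are hypothetical.

```python
import math


def correlation_t_test(r, n):
    """t statistic for H0: rho = 0, returned with its n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need n >= 3 so that df = n - 2 is positive")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1 for this formula")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df


# Hypothetical example: r = 0.5 computed from n = 27 pairs.
t, df = correlation_t_test(0.5, 27)
print(df)           # 25
print(round(t, 3))  # 2.887
```

A standard t table gives a two-tailed critical value of about 2.06 for df = 25 at α = 0.05, so in this hypothetical case |t| exceeds it and H0 would be rejected.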
18
Q

Correlation makes no distinction between ______ and ______ variables.

A

explanatory and response

19
Q

Correlation requires both variables to be _____

A

quantitative (numerical)

20
Q

Correlation does not depend on the scale of ______ used

A

measurement

21
Q

r > 0 indicates?

A

a positive relationship between the variables

22
Q

r < 0 indicates?

A

a negative relationship between the variables

23
Q

Correlation is always between?

A

-1 and +1

24
Q

Correlation measures the strength of linear relations and doesn’t apply to ____ relations

A

curved (nonlinear)

25
Q

Like the mean and SD, the correlation is strongly affected by ______

A

outliers

26
Q

The correlation coefficient test can become a ____-tailed test

A

one (when the direction of the relationship is predicted in advance)

27
Q

Correlation coefficient test: degrees of freedom?

A

n-2

28
Q

Compare regression to correlation

A

With correlation, we capture the degree to which two variables co-vary; there is no question about causal relationships. With regression, we introduce the concept of dependent and independent variables.

29
Q

What is the task in regression?

A

to find some meaningful way of describing the essence of a dependent relationship

30
Q

Regression extends the concept of ______ by summarizing the relationship between 2 variables with a straight line that best describes their relationship. This line is termed the regression line.

A

correlation

31
Q

The dependent variable is always presented as the __-variable and is always plotted on the _____ axis of the regression plot

A

y, vertical

32
Q

The independent variable __ is plotted on the _______ axis

A

x, horizontal

33
Q

To describe the regression line we need to know two values. What are they?

A

1) The slope b. This value has a clear practical interpretation, such as the average increase in response per unit increase in dose.
2) The y-intercept a. This may have a practical interpretation (such as heart rate (HR) at rest); in some cases it may not, e.g. the predicted weight of an individual with zero height.

34
Q

What is ŷ ("y-hat")?

A

The predicted value of y from the regression line.

35
Q

How can the fit of the predicted values be summarized?

A

By the difference between the observed value of y and the value ŷ (y-hat) predicted by the regression line. This difference is called the residual: y - ŷ = observed - predicted.

36
Q

What is the formula for SSE?

A

SSE (sum of squared errors) = Σ(y - ŷ)^2

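A short sketch of residuals and SSE, assuming the line has already been fitted. The values a = −85.6 and b = 0.88 are the least-squares intercept and slope for the made-up height/weight data used here; they are inputs to the sketch, not results from the lecture.

```python
def residuals_and_sse(xs, ys, a, b):
    """Residuals y - y_hat for the line y_hat = a + b * x, and their sum of squares (SSE)."""
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # observed - predicted
    sse = sum(e ** 2 for e in residuals)
    return residuals, sse


heights_cm = [160, 165, 170, 175, 180]  # made-up example data
weights_kg = [55, 60, 63, 70, 72]
a, b = -85.6, 0.88  # assumed: least-squares intercept and slope for these data

res, sse = residuals_and_sse(heights_cm, weights_kg, a, b)
print([round(e, 2) for e in res])  # [-0.2, 0.4, -1.0, 1.6, -0.8]
print(round(sse, 2))               # 4.4
```

For a least-squares line the residuals sum to zero, which is a quick sanity check on a hand calculation.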
37
Q

When you solve for the slope b, what are you testing for? (slide 50)

A

whether the relationship is negative or positive (the sign of b gives the direction)

38
Q

Having obtained the regression line, we need to establish that its predictive value has not arisen by _____ alone.

A

chance

39
Q

If b = 0, what does that mean?

A

There is no linear relationship between x and y at all: the regression line is flat, so knowing x does not help predict y.

40
Q

For regression, what are the null and alternative hypotheses? *This test is similar to ANOVA.

A

Null: β = 0
Alternative: β ≠ 0

*We still use an F test, just like ANOVA.

41
Q

We find SST, which is?

A

SSR + SSE

42
Q

df = ?

A

(SSR) Explained: df = 1
(SSE) Error: df = n-2
(SST) Total: df = n-1

43
Q

How do we calculate r^2?

A

r^2 = SSR (explained) / SST (total)
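A minimal sketch of the regression ANOVA table, assuming a least-squares line and the usual definitions SST = Σ(y − ȳ)^2, SSR = Σ(ŷ − ȳ)^2, SSE = Σ(y − ŷ)^2 and F = (SSR/1) / (SSE/(n − 2)). The data are made up; the function name is hypothetical.

```python
def regression_anova(xs, ys):
    """Split SST into SSR + SSE for a least-squares line; return sums of squares, df, F and r^2."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    b = sum_xy / sum_xx
    a = mean_y - b * mean_x
    y_hat = [a + b * x for x in xs]

    sst = sum((y - mean_y) ** 2 for y in ys)              # total, df = n - 1
    ssr = sum((yh - mean_y) ** 2 for yh in y_hat)         # explained, df = 1
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # error, df = n - 2

    # F = MSR / MSE on (1, n - 2) df; raises ZeroDivisionError for a perfect fit (SSE = 0).
    f = (ssr / 1) / (sse / (n - 2))
    return {"SST": sst, "SSR": ssr, "SSE": sse,
            "df": (1, n - 2, n - 1), "F": f, "r2": ssr / sst}


heights_cm = [160, 165, 170, 175, 180]  # made-up example data
weights_kg = [55, 60, 63, 70, 72]

table = regression_anova(heights_cm, weights_kg)
for key in ("SST", "SSR", "SSE", "F", "r2"):
    print(key, round(table[key], 3))
print("df (explained, error, total):", table["df"])
```

In simple linear regression, this F equals the square of the t statistic for testing ρ = 0, and r2 equals the square of Pearson's r, so the correlation test and the regression F test lead to the same conclusion.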