Lecture 9 - Correlation & Regression Flashcards

1
Q

In a bivariate relationship, what are x and y?

A
x = explanatory variable
y = response variable
2
Q

Give an example of a response variable that you cannot put on the y-axis because it is dichotomous (yes/no).

A
x = smoking
y = lung cancer

*Lung cancer can't go on the y-axis of a scatter plot because it is either yes or no.

3
Q

A scatter plot can only be used when both variables are _______

A

numerical

4
Q

How can you describe the overall pattern of a scatterplot ?

A

by the form, direction, and strength of the relationship between the data

5
Q

Describe the form and direction of the relationship

A
  • Linear relationships, where the points roughly follow a straight line, are especially important.
  • Relationships can be negative (A) or positive (B) in direction.
  • Curvilinear relationships (C) and clusters (D) are other things to watch for.
  • An important kind of deviation is an OUTLIER (E): an individual value that falls outside the overall pattern.
6
Q

The strength of a relationship is determined by?

A

how close the points in the scatter plot lie to a simple form such as a line

7
Q

We use the _____ to evaluate the variability of univariate data, where n is the sample size.

A

variance

8
Q

For bivariate data, we use the _______ (how x and y vary together), where n is the number of (x, y) pairs.

A

covariance
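
The two cards above can be put side by side in a small sketch. It assumes the usual sample formulas with an n - 1 denominator (the lecture's exact formula is not shown on the cards), and the height and weight numbers are made up for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def sample_variance(x):
    # Univariate spread: average squared deviation from the mean (n - 1 denominator).
    m = mean(x)
    return sum((xi - m) ** 2 for xi in x) / (len(x) - 1)

def sample_covariance(x, y):
    # Bivariate version: average product of paired deviations (n = number of x,y pairs).
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

# Hypothetical data, not from the lecture.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 66, 74, 82]

var_h = sample_variance(heights_cm)                 # 250.0, in cm^2
cov_hw = sample_covariance(heights_cm, weights_kg)  # 200.0, in cm*kg
```

Note that the covariance of a variable with itself is just its variance, and that the covariance here carries the composite unit cm*kg.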

9
Q

The covariance is limited as?

A

a tool for measuring and describing relationships because its composite units are difficult to interpret

10
Q

Standardized values have ____ units.

A

no (they are the same whether x is measured in cm or mg)

11
Q

What do standardized values express?

A

Deviations from the mean in units of the standard deviation (s), which avoids the problem of trying to interpret unit-dependent covariance values.

12
Q

What is the correlation coefficient ?

A

The strength of a relationship is quantified by the standardized covariance of the two continuous measures. This is the correlation coefficient ρ (rho), more formally the Pearson correlation coefficient.

The sample correlation coefficient is r.
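
As a rough sketch of "standardized covariance", the sample r can be computed by dividing the sum of paired deviation products by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the toy data are assumptions for illustration only.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # co-variation
    sxx = sum((xi - mx) ** 2 for xi in x)                     # variation in x
    syy = sum((yi - my) ** 2 for yi in y)                     # variation in y
    # The n - 1 denominators cancel, so r is unit-free.
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, not from the lecture.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)  # about 0.775
```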

13
Q

What does the correlation coefficient measure?

A

It measures both the strength and direction of a linear relationship between 2 continuous variables.

14
Q

What is the square of the correlation coefficient?

A

r^2 is referred to as the coefficient of determination

15
Q

When are two variables x and y positively associated ?

A

when above-average values of one variable tend to go with above-average values of the other; in this case r will be positive

16
Q

When are two variables x and y negatively associated ?

A

when above-average values of one variable tend to go with below-average values of the other; in this case r will be negative

17
Q

For the correlation coefficient, what are the null and alternative hypotheses?

A

Null hypothesis: ρ = 0 (there is no linear relationship)

Alternative hypothesis: ρ ≠ 0 (there is a linear relationship, e.g. height and weight are linearly related)

**We can evaluate the correlation using alpha = 0.05 with n-2 degrees of freedom
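
One common way to carry out this test, sketched here as an assumption since the card does not give the formula, is the t statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The values of r and n are hypothetical, and the critical value 3.182 is the standard two-tailed t-table entry for alpha = 0.05 and df = 3.

```python
import math

def correlation_t_statistic(r, n):
    # t statistic for H0: rho = 0, with n - 2 degrees of freedom.
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# Hypothetical sample: r^2 = 0.6 from n = 5 pairs.
r = math.sqrt(0.6)
t, df = correlation_t_statistic(r, n=5)

T_CRITICAL_DF3 = 3.182  # two-tailed, alpha = 0.05, df = 3 (from a t table)
reject_null = abs(t) > T_CRITICAL_DF3
```

With so few pairs, even r of about 0.77 gives t of about 2.12, which is below the critical value, so the null hypothesis is not rejected in this toy case.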

18
Q

Correlation makes no distinction between ______ and ______ variables.

A

explanatory and response

19
Q

Correlation requires both variables to be _____

A

numerical

20
Q

Correlation does not depend on the scale of ______ used

A

measurement
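
A quick way to see this is to change units and recompute. In the sketch below (made-up data, helper names are assumptions), converting cm to m and kg to g changes the covariance but leaves r untouched.

```python
import math

def covariance(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    # Covariance divided by the product of the two standard deviations.
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical data, not from the lecture.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 57, 68, 72, 83]

heights_m = [h / 100 for h in heights_cm]   # cm -> m
weights_g = [w * 1000 for w in weights_kg]  # kg -> g

cov_original = covariance(heights_cm, weights_kg)  # in cm*kg
cov_rescaled = covariance(heights_m, weights_g)    # in m*g, 10 times larger
r_original = pearson_r(heights_cm, weights_kg)
r_rescaled = pearson_r(heights_m, weights_g)       # same as r_original
```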

21
Q

r > 0 indicates ?

A

a positive relationship between the variables

22
Q

r < 0 indicates ?

A

a negative relationship between the variables

23
Q

Correlation is always between ?

A

-1 and 1

24
Q

Correlation measures the strength of linear relations and doesn’t apply to ____ relations

A

curved
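
A small illustration with invented data: y = x^2 is a perfect curved relationship, yet on x values symmetric around zero the correlation comes out as zero, because r only picks up the linear part.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# y depends on x exactly, but along a parabola rather than a line.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

r_curved = pearson_r(x, y)  # 0.0: no linear relationship detected
```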

25
Q

Like the mean and SD, the correlation is strongly affected by ______

A

outliers
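
To illustrate with made-up numbers: five points with a clearly positive r, plus one extreme point, are enough to flip the sign of the correlation. The outlier value (6, -10) is an arbitrary choice for the demonstration.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, not from the lecture.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_without_outlier = pearson_r(x, y)                # positive

# Add a single point far below the overall pattern.
r_with_outlier = pearson_r(x + [6], y + [-10])     # negative
```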

26
Q

The correlation coefficient test can become a ____-tailed test

A

one

27
Q

Correlation coefficient test:

Degrees of freedom ?

A

n-2

28
Q

Compare regression to correlation

A

With correlation, we capture the degree to which two variables co-vary; there is no question about causal relationships.

With regression, we introduce the concept of dependent and independent variables.

29
Q

What is the task in regression?

A

to find some meaningful way of describing the essence of a dependent relationship

30
Q

Regression extends the concept of ______ by summarizing the relationship between 2 variables with a straight line that best describes their relationship. This line is termed the regression line.

A

correlation

31
Q

The dependent variable is always presented as the __-variable and is always plotted on the _____ axis of the regression plot

A

y, vertical

32
Q

The independent variable __ is plotted on the _______ axis

A

x, horizontal

33
Q

To describe the regression line we need to know two values, what are they?

A

1) The slope “b”.
- This value has a clear practical interpretation, such as the average increase in response per unit increase in dose.

2) The y-intercept “a”.
- This may have a practical interpretation (such as HR at rest); in some cases it may not, e.g. the weight of an individual with zero height.
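
As a minimal sketch, assuming the ordinary least-squares line y-hat = a + b*x (the card names a and b but not how they are computed), the slope is the co-variation of x and y divided by the variation in x, and the intercept makes the line pass through the point of means. The data are invented.

```python
def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx    # slope: average change in y per one-unit change in x
    a = my - b * mx  # y-intercept: predicted y when x = 0
    return a, b

# Hypothetical data, not from the lecture.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)  # a is about 2.2, b is about 0.6
```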

34
Q

What is y-hat (ŷ)?

A

The predicted value.

35
Q

The fit of the predicted value can be summarized by ?

A

the difference between the observed value of y and the value of y predicted by the regression line (y-hat, ŷ).

The difference is called the residual.

y - ŷ = observed - predicted

36
Q

What is the formula for SSE?

A

SSE is the sum of squared errors (residuals):

SSE = Σ (y - ŷ)²
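
A short sketch of the calculation, using invented data and taking the line y-hat = 2.2 + 0.6x as given (it is the least-squares line for these five points, but the coefficients are treated as inputs here):

```python
# Hypothetical data and fitted line, not from the lecture.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

predicted = [a + b * xi for xi in x]                    # y-hat for each x
residuals = [yi - yh for yi, yh in zip(y, predicted)]   # observed - predicted
sse = sum(res ** 2 for res in residuals)                # about 2.4
```

For a least-squares line the residuals sum to (approximately) zero, which is why they are squared before being added up.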

37
Q

When you find the y intercept, what are you testing for? (slide 50)

A

solving for b

testing to see if it’s a negative or positive relationship

38
Q

Having obtained the regression line, we need to establish that its predictive value has not arisen by _____ alone.

A

chance

39
Q

If b = 0, what does that mean?

A

there is no linear relationship between x and y (the regression line is flat, so x does not help predict y)

40
Q

For regression, what are the null and alternative hypotheses?

*this test is similar to ANOVA

A

Null: β = 0 (the population slope is zero)

Alternative: β ≠ 0

*We still use an F test, just like ANOVA

41
Q

We find SST which is ?

A

SSR + SSE

42
Q

df = ?

A

(SSR) Explained: df = 1
(SSE) Error: df = n-2
(SST) Total: df = n-1
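
Putting the sums of squares and degrees of freedom together, a sketch of the ANOVA-style F statistic might look like this. It assumes the usual definition F = (SSR / 1) / (SSE / (n - 2)); the SSR, SSE, and n values are hypothetical.

```python
def regression_f_statistic(ssr, sse, n):
    df_explained = 1   # SSR: one slope
    df_error = n - 2   # SSE
    df_total = n - 1   # SST: equals df_explained + df_error
    msr = ssr / df_explained   # mean square, explained
    mse = sse / df_error       # mean square, error
    return msr / mse, df_explained, df_error, df_total

# Hypothetical values, not from the lecture.
f_stat, df1, df2, df_total = regression_f_statistic(ssr=3.6, sse=2.4, n=5)
# f_stat is about 4.5 on (1, 3) degrees of freedom
```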

43
Q

How do we calculate r^2?

A

r^2 = SSR (explained) / SST (total)
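
The decomposition SST = SSR + SSE and the ratio r^2 = SSR / SST can be checked numerically. This sketch fits a least-squares line to invented data; the variable names are assumptions.

```python
# Hypothetical data, not from the lecture.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx
predicted = [a + b * xi for xi in x]

sst = sum((yi - my) ** 2 for yi in y)                      # total variation
ssr = sum((yh - my) ** 2 for yh in predicted)              # explained by the line
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, predicted))  # left over (error)

r_squared = ssr / sst  # about 0.6: the line explains about 60% of the variation in y
```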