Biostat | Prefinal - Regression Analysis Flashcards

1
Q

A graph that shows the relationship between two variables.

A

Scatter Plot

2
Q

Also called a regression line; a straight line that best represents the data on a scatter plot.

A

LINE OF BEST FIT

3
Q

REGRESSION EQUATION:

A

Ŷ = bX + a, where b is the slope and a is the Y-intercept

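The regression equation above can be estimated by least squares; a minimal Python sketch, where `fit_line` and the data are illustrative (not from the deck):

```python
def fit_line(xs, ys):
    """Least-squares fit of the line of best fit Y = bX + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b = sum((x - x̄)(y - ȳ)) / sum((x - x̄)²)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x  # the fitted line passes through (x̄, ȳ)
    return a, b

# Hypothetical data lying exactly on y = 2x + 1
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```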
4
Q

The single variable being explained by the regression model; also called the criterion.

A

DEPENDENT VARIABLE (Y)

5
Q

The explanatory variables used to predict the dependent variable; also called predictors.

A

INDEPENDENT VARIABLE (X)

6
Q

The values computed by the regression tool, reflecting the relationship between each explanatory variable and the dependent variable.

A

COEFFICIENTS (b)

7
Q

The portion of the dependent variable that isn’t explained by the model.

A

RESIDUALS

8
Q

METHODS
Linear Regression

A

> Straight-line relationship
> Form: y = mx + b

9
Q

METHODS OF REGRESSION ANALYSIS:
> Straight-line relationship
> Form: y=mx+b

A

Linear Regression

10
Q

METHODS OF REGRESSION ANALYSIS:
> Implies curved relationship
> Logarithmic relationships

A

Non-Linear

11
Q

METHODS OF REGRESSION ANALYSIS:
> data gathered from the same time period

A

Cross-Sectional

12
Q

METHODS OF REGRESSION ANALYSIS:
> Involves data observed over equally spaced points in time.

A

Time series

13
Q

> Only one independent variable, x
> Relationship between x and y is described by a linear function.
> Changes in y are assumed to be caused by changes in x.

A

SIMPLE LINEAR REGRESSION MODEL

14
Q

Regression variability that is explained by the relationship between X and Y

A

SSR

15
Q

Unexplained variability, due to factors other than the regression

A

SSE

16
Q

CORRELATION COEFFICIENT:
> the strength of the relationship between X and Y variables

A

r

17
Q

Total variability about the mean

A

SST

18
Q

COEFFICIENT OF DETERMINATION:
> Proportion of explained variation

A

r Square

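The SST, SSR, SSE, and r² cards fit together as SST = SSR + SSE; a minimal Python sketch with hypothetical observed and predicted values (not from the deck):

```python
def variation(ys, y_hats):
    """Decompose total variability: SST = SSR + SSE; r² = SSR / SST."""
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)               # total variability about the mean
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))  # unexplained (residual) variability
    ssr = sst - sse                                        # variability explained by the regression
    return sst, ssr, sse, ssr / sst

# Hypothetical observed values and regression predictions
sst, ssr, sse, r2 = variation([1, 3, 5, 7], [2, 3, 5, 6])
print(sst, ssr, sse, r2)  # 20.0 18.0 2.0 0.9
```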
19
Q

SD of error around the regression line

A

Standard Error

20
Q

Significance of the Regression Model

A

TEST FOR LINEARITY

21
Q

Variation of Model

A

SST = SSR + SSE (total variation = explained + unexplained)

22
Q

Errors may be positive or negative.

A

VARIABILITY

23
Q
  • Measures the total variability in Y
A

Sum of Squares Total (SST)

24
Q

– Less than SST because the regression line reduces the variability

A

Sum of Squared Error (SSE)

25
Q
  • Indicates how much of the total variability is explained by the regression model.
A

Sum of Squared due to Regression (SSR)

26
Q

The proportion of the variability in Y that is explained by the regression equation.

A

COEFFICIENT OF DETERMINATION

27
Q

TEST FOR LINEARITY:
If the significance level for the F test is low, …

A

reject the null hypothesis and conclude there is a linear relationship.

27
Q

An F test is used to statistically test the null hypothesis that there is no linear relationship between the X and Y variables.

A

TEST FOR LINEARITY

27
Q

The mean squared error (MSE) is the estimate of the error variance of the regression equation

s² = MSE = SSE / (n − k − 1)

A

STANDARD ERROR

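The MSE formula above can be sketched directly in Python; the SSE, n, and k values below are hypothetical:

```python
import math

def std_error(sse, n, k):
    """s² = MSE = SSE / (n - k - 1); the standard error is its square root."""
    mse = sse / (n - k - 1)
    return mse, math.sqrt(mse)

# Hypothetical: SSE = 18 from n = 11 observations and k = 1 predictor
mse, se = std_error(18.0, n=11, k=1)
print(mse)  # 2.0
```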
28
Q

ASSUMPTIONS OF THE REGRESSION MODEL

A

Errors are independent
Errors are normally distributed
Errors have a mean of zero
Errors have a constant variance

29
Q

Special variables that are created for qualitative data
The number of dummy variables must equal one less than the number of categories of the qualitative variable.

A

BINARY/DUMMY VARIABLES

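The dummy-variable rule (k categories → k − 1 dummies) can be sketched in Python; `make_dummies` and the category values are illustrative, not from the deck:

```python
def make_dummies(values, categories):
    """Encode a qualitative variable with k categories as k - 1 dummy columns."""
    dummy_cats = categories[1:]  # first category omitted: it is the baseline
    return [[1 if v == c else 0 for c in dummy_cats] for v in values]

# Hypothetical 3-category variable -> 2 dummy columns
rows = make_dummies(["A", "B", "C", "A"], ["A", "B", "C"])
print(rows)  # [[0, 0], [1, 0], [0, 1], [0, 0]]
```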
30
Q

Takes into account the number of independent variables in the model.

A

ADJUSTED R-SQUARE

31
Q

Occurs when two or more predictor variables are highly correlated to each other.

A

MULTICOLLINEARITY

31
Q

Exists when an independent variable is correlated with another independent variable.

A

MULTICOLLINEARITY

32
Q

Creates problems in the coefficients because duplication of information may occur.

A

MULTICOLLINEARITY

33
Q

A metric to detect multicollinearity that measures the correlation and strength of correlation between the predictor variables in a regression model.

A

Variance inflation factor (VIF)

34
Q

Variance inflation factor (VIF)
no correlation between a given predictor variable and the others

A

1

35
Q

Variance inflation factor (VIF)
moderate correlation between a given predictor variable and the others

A

1-5

36
Q

Variance inflation factor (VIF)
potentially severe correlation between a given predictor variable and the others

A

> 5

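The VIF thresholds above (1 / 1–5 / > 5) can be checked with a minimal sketch; this two-predictor version, with made-up data, uses the fact that with only two predictors the R² of one on the other is their squared correlation:

```python
def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R²); with two predictors, R² is their squared correlation."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    sxx = sum((a - m1) ** 2 for a in x1)
    syy = sum((b - m2) ** 2 for b in x2)
    r2 = sxy * sxy / (sxx * syy)
    return 1.0 / (1.0 - r2)

print(vif_two_predictors([1, 2, 3, 4], [0, 1, 1, 0]))      # 1.0 (no correlation)
print(vif_two_predictors([1, 2, 3, 4], [1, 2, 3, 5]) > 5)  # True (potentially severe)
```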
37
Q

Form of regression that allows the prediction of discrete variables by a mix of continuous and discrete predictors.

A

LOGISTIC REGRESSION

38
Q

Types of Logistic Regression:
- Used when the dependent variable is dichotomous

A

Binary Logistic Regression

39
Q

Types of Logistic Regression:
- Used when the dependent variable has more than two categories

A

Multinomial Logistic Regression

40
Q

WHEN TO USE LOGISTIC REGRESSION?

A

> When the dependent variable has only two levels (yes/no, male/female, taken/not taken)
> If multivariate normality is suspected
> If we don't have linearity

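A minimal sketch of how a fitted binary logistic model turns a linear predictor into a probability for a dichotomous outcome; the coefficients b0 and b1 are hypothetical, not from the deck:

```python
import math

def predict_prob(x, b0, b1):
    """P(Y = 1 | x) = 1 / (1 + e^-(b0 + b1*x)) -- the logistic (sigmoid) curve."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

p = predict_prob(0.0, b0=0.0, b1=1.0)  # linear predictor 0 -> probability 0.5
label = 1 if p >= 0.5 else 0           # classify the dichotomous outcome
print(p, label)  # 0.5 1
```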
41
Q

Assumptions in Logistic Regression

A

No assumptions about the distributions of the predictor variables
Predictors do not have to be normally distributed
Does not have to be linearly related
Does not have to have equal variance within each group
There should be a minimum of 20 cases per predictor, with a minimum of 60 total cases.

42
Q

Captures how one variable differs from its mean as the other variable differs from its mean.
Between two random variables, it is a statistical measure of the degree to which the two variables move together.

A

covariance

43
Q

A measure of the strength of the relationship between or among variables.

A

correlation coefficient

43
Q

A positive covariance indicates that….

A

the variables tend to move together; a negative covariance indicates that the variables tend to move in opposite directions.

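The covariance and correlation-coefficient cards above can be sketched as follows; the sample formulas and data are illustrative (not from the deck):

```python
import math

def covariance(xs, ys):
    """Sample covariance: how X and Y co-deviate from their means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """r = cov(X, Y) / (sd(X) * sd(Y)); strength of the relationship, in [-1, 1]."""
    sx = math.sqrt(covariance(xs, xs))
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

# Y moves exactly with X -> positive covariance, r = 1
print(covariance([1, 2, 3], [2, 4, 6]), correlation([1, 2, 3], [2, 4, 6]))  # 2.0 1.0
```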
44
Q

an extreme value of a variable

A

OUTLIER

45
Q

The appearance of a relationship when in fact there is no relation.

A

Spurious correlation

45
Q

is the analysis of the relation between one variable and some other variable(s), assuming a linear relation. Also referred to as least squares regression and ordinary least squares (OLS).

A

REGRESSION ANALYSIS

46
Q

The purpose is to explain the variation in a variable (that is, how a variable differs from its mean value) using the variation in one or more other variables.

A

REGRESSION ANALYSIS

46
Q

is the variable whose variation is used to explain that of the dependent variable. Also referred to as the explanatory variable, the exogenous variable, or the predicting variable.

A

INDEPENDENT VARIABLE

47
Q

is the variable whose variation is being explained by the other variable(s). Also referred to as the explained variable, the endogenous variable, or the predicted variable.

A

DEPENDENT VARIABLE

48
Q

exists between dependent and independent variable.

A

Linear Relationship

49
Q

What is the expected value of the disturbance term?

A

zero

50
Q

The disturbance terms are assumed to be…

A

homoskedastic.

51
Q

is the percentage of variation in the dependent variable (variation of Yi’s or the sum of squares total, SST) explained by the independent variable(s).

A

coefficient of determination

52
Q

It is the range of regression coefficient values for a given estimate of the coefficient and a given level of probability.

A

confidence interval

53
Q

is the square root of the ratio of the variance of the regression to the variation in the independent variable

A

standard error

54
Q

using regression involves making predictions about the dependent variable based on average relationships observed in the estimated regression.

A

forecasting

55
Q

A regression analysis with more than one independent variable.

A

Multiple regression

56
Q

has the same interpretation as it did under the simple linear case – the intercept is the value of the dependent variable when all independent variables equal zero.

A

intercept

57
Q

are values of the dependent variable based on the estimated regression coefficients and a prediction about the values of the independent variables.

A

Predicted values

58
Q

is a measure of how well a set of independent variables, as a group, explain the variation in the dependent variable.

A

F-statistic

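The F-statistic compares explained to unexplained variance per degree of freedom; a minimal sketch with hypothetical sums of squares:

```python
def f_statistic(ssr, sse, n, k):
    """F = (SSR / k) / (SSE / (n - k - 1)): mean square regression over mean square error."""
    return (ssr / k) / (sse / (n - k - 1))

# Hypothetical: SSR = 18, SSE = 2, from n = 11 observations and k = 2 predictors
print(f_statistic(ssr=18.0, sse=2.0, n=11, k=2))  # 36.0
```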
59
Q

is the percentage of variation in the dependent variable explained by the independent variables.

A

coefficient of determination

60
Q

are qualitative variables that take on a value of zero or one.

A

Dummy variables

61
Q

The situation in which the variance of the residuals is not constant across all observations.

A

Heteroskedasticity

62
Q

is the situation in which the residual terms are correlated with one another. This occurs frequently in time-series analysis.

A

Autocorrelation

63
Q

The residuals are independently distributed, meaning that…

A

the residual or disturbance for one observation is not correlated with that of another observation. [A violation of this is referred to as autocorrelation.]

64
Q

If last year's earnings were high, this means that this year's earnings may have a greater probability of being high than being low. This is an example of?

A

positive autocorrelation

65
Q

When a good year is always followed by a bad year, this is…

A

a negative autocorrelation

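The positive/negative autocorrelation cards above can be sketched with a simple lag-1 sign check on residuals; the function name and data are illustrative, not from the deck:

```python
def autocorrelation_sign(residuals):
    """Sign of the lag-1 co-movement of consecutive residuals."""
    s = sum(a * b for a, b in zip(residuals, residuals[1:]))
    return "positive" if s > 0 else "negative" if s < 0 else "none"

print(autocorrelation_sign([1, 2, 1, 3]))    # positive: residuals move together
print(autocorrelation_sign([1, -1, 1, -1]))  # negative: a good year followed by a bad year
```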
66
Q

is the problem of high correlation between or among two or more independent variables.

A

Multicollinearity

67
Q

Form of regression that allows the prediction of discrete variables by a mix of continuous and discrete predictors.

A

LOGISTIC REGRESSION

68
Q

TYPES OF LOGISTIC REGRESSION
- used when the dependent variable is dichotomous

A

Binary Logistic Regression

69
Q

TYPES OF LOGISTIC REGRESSION
- It is used when the dependent or outcome variable has more than two categories

A

Multinomial Logistic Regression

70
Q

WHEN TO USE LOGISTIC REGRESSION?

A
  • When the dependent variable is nonparametric and we don't have homoscedasticity (the variances of the dependent and independent variables are not equal).
  • Used when the dependent variable has only two levels. (Yes/No, Male/Female, Taken/Not Taken)
  • If multivariate normality is suspected
  • If we don’t have linearity.
71
Q

are the number of independent pieces of information that are used to estimate the regression parameters.

A

DEGREES OF FREEDOM

72
Q

is the square root of the ratio of the variance of the regression to the variation in the independent variable

A

Standard Error