Chapter 1 Flashcards

0
Q

What is a functional relation?

A

If X is the independent variable and Y the dependent variable, the functional relation is Y = f(X): each value of X determines exactly one value of Y.

1
Q

What is Regression Analysis?

A

Statistical methodology that utilizes the relation between two or more quantitative variables so that a response/outcome variable can be predicted from the others.

2
Q

What is a Statistical Relation?

A

A statistical relation is not exact, unlike a functional relation. For example, the book uses a scatter plot to show a linear trend; there is variation in the points around the trend. The relation could also be curvilinear (not linear).

3
Q

What is a regression model?

A

A formal means of expressing the two essential ingredients of a statistical relation:

  1. A tendency of the response variable Y to vary with the predictor variable X in a systematic fashion
  2. A scattering of points around the curve of statistical relationship
4
Q

The 2 characteristics are embodied in a regression model by postulating that:

A
  1. There is a probability distribution of Y for each level of X
  2. The means of these probability distributions vary in some systematic fashion with X.
5
Q

What is a major consideration in the selection of Predictor Variables during the construction of regression models?

A

The extent to which a chosen variable contributes to reducing the remaining variation in Y after allowance is made for the contributions of other predictor variables that have tentatively been included in the regression model.

6
Q

List some other considerations in selecting Predictor Variables.

A
  1. The importance of the variable as a causal agent in the process under analysis
  2. The degree to which observations on the variable can be attained more accurately, or quickly, or economically than on competing variables
  3. The degree to which the variable can be controlled
7
Q

How is the scope of a model determined?

A

Either by the design of the investigation or by the range of data at hand.

8
Q

What are the 3 major purposes of Regression Analysis?

A
  1. Description
  2. Control
  3. Prediction
9
Q

T/F: No matter how strong the statistical relation between X and Y, there is a cause-effect pattern implied by the regression model.

A

F: No matter how strong the statistical relation between X and Y, no cause-and-effect pattern is necessarily implied by the regression model.

10
Q

Define the basic regression model:

A

Y_i = B_0 + B_1X_i + e_i
note: Y_i is the value of the response variable in the i-th trial; B_0 and B_1 are parameters (betas); X_i is a known constant, namely the value of the predictor variable in the i-th trial; e_i is a random error term with mean E[e_i] = 0 and variance V[e_i] = (sigma)^2
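
As an illustrative sketch (not from the text), the model can be simulated in Python; the values of B_0, B_1, sigma, and X below are arbitrary assumptions:

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
beta0, beta1, sigma = 2.0, 0.5, 1.0

rng = np.random.default_rng(0)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # X_i: known constants
e = rng.normal(0.0, sigma, size=X.size)  # e_i: random errors with E[e_i] = 0, V[e_i] = sigma^2
Y = beta0 + beta1 * X + e                # Y_i = B_0 + B_1 X_i + e_i
print(Y)
```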

11
Q

Why is this model: Y_i=B_0+B_1X_i+e_i called a first order model?

A

The regression model is said to be simple, linear in the parameters, and linear in the predictor variable: simple because there is only one predictor variable; linear in the parameters because no parameter appears as an exponent or is multiplied or divided by another parameter; and linear in the predictor variable because X_i appears only in the first power. A model that is linear in the parameters and in the predictor variable is called a first-order model.

12
Q

What is E[Y_i], where Y_i follows the simple regression model?

A

E[Y_i] = E[B_0+B_1X_i+e_i] = B_0 + B_1X_i + E[e_i] = B_0 + B_1X_i
because E[e_i]=0

13
Q

V[Y_i] =

A

(sigma)^2

because B_0 + B_1X_i is a constant and the error term e_i has constant variance (sigma)^2, so Y_i has the same variance as e_i.

14
Q

T/F: Since the error terms, say e_i and e_j, in a regression model are assumed to be uncorrelated then Y_i and Y_j (any two responses) are also uncorrelated.

A

T

15
Q

What are the regression coefficients in a simple regression model?

A

B_0, B_1 are the parameters

16
Q

What is B_1 in the simple regression model?

A

Slope of the regression line. It is the change in the mean of the probability distribution of Y per unit increase in X.

17
Q

What is the parameter B_0 in the simple regression model?

A

Y intercept of the regression line.

18
Q

What happens when the scope of the simple regression model includes X = 0?

A

B_0 gives the mean of the probability distribution of Y at X = 0.

19
Q

What does B_0 mean when the scope of the model does not cover X = 0?

A

B_0 does not have any particular meaning as a separate term in the regression model.

20
Q

How is observational data obtained?

A

Observational data are obtained from non-experimental studies. These studies do not control the explanatory or predictor variable(s) of interest.

21
Q

What is one major limitation of observational data?

A

They often do not provide adequate information about cause and effect relationships.

22
Q

When control over the explanatory variable(s) is exercised through random assignments the resulting experimental data provide much stronger information about cause and effect relationships than observational data. Why?

A

The reason is that randomization tends to balance out the effects of any other variables that might affect the response variable.

23
Q

T/F: Control over the explanatory variable(s) consists of assigning a treatment to each of the experimental units by means of randomization.

A

T

24
Q

Describe Completely Randomized Design.

A

Makes randomized assignments of treatments to experimental units (or vice versa). Useful when experimental units are homogeneous. The design is flexible because it accommodates any number of treatments and permits different sample sizes for different treatments.

25
Q

What is the Method of Least Squares?

A

A way to find “good” estimators of the regression parameters B_0 and B_1.

26
Q

How do you use the Method of Least Squares?

A

For the observations (X_i, Y_i) for each case, the method of least squares considers the deviation of Y_i from its expected value: Y_i - (B_0 + B_1X_i).
The method then requires that we consider the sum of the n squared deviations: Q = (sum from i=1 to n) (Y_i - B_0 - B_1X_i)^2
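
A hedged sketch (not the book's code) of evaluating the criterion Q for trial values of B_0 and B_1; the data are invented:

```python
import numpy as np

def Q(b0, b1, X, Y):
    """Least squares criterion: sum of squared deviations of Y_i from b0 + b1*X_i."""
    return np.sum((Y - (b0 + b1 * X)) ** 2)

# Invented data for illustration.
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([2.1, 3.9, 6.2, 7.8])

# Q for one candidate pair; the least squares estimates are the pair minimizing Q.
print(Q(0.0, 2.0, X, Y))
```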

27
Q

What is another way to describe the Method of Least Squares?

A

The estimators of B_0 and B_1 are those values b_0 and b_1 that minimize the criterion Q for the given sample observations (X_1, Y_1), …, (X_n, Y_n).

28
Q

What are the point estimators of B_0 and B_1?

A

b_0 and b_1 (page 17 of Applied Linear Statistical Models):
b_1 = [sum from i=1 to n of (X_i - X(bar))(Y_i - Y(bar))] / [sum from i=1 to n of (X_i - X(bar))^2]
b_0 = Y(bar) - b_1X(bar)
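
A minimal sketch computing these estimators from the formulas above; the data are invented for illustration:

```python
import numpy as np

# Invented data for illustration.
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([2.1, 3.9, 6.2, 7.8])

xbar, ybar = X.mean(), Y.mean()
b1 = np.sum((X - xbar) * (Y - ybar)) / np.sum((X - xbar) ** 2)  # slope estimator
b0 = ybar - b1 * xbar                                           # intercept estimator
print(b0, b1)
```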

29
Q

What is the Gauss-Markov Theorem?

A

Under the conditions of the simple regression model, the least squares estimators b_0 and b_1 are unbiased and have minimum variance among all unbiased linear estimators (the sampling distributions of b_0 and b_1 are less variable than those of any other unbiased linear estimators).
E[b_0] = B_0 and E[b_1] = B_1

30
Q

What is the estimate of the regression function?

A

Y(hat) = b_0 + b_1X

Y hat is the value of the estimated regression function at the level X of the predictor variable.

31
Q

What is the difference between the response and the mean response?

A

We call a value of the response variable a response and E[Y] the mean response. The mean response is the mean of the probability distribution of Y corresponding to the level X of the predictor variable. Y hat is a point estimator of the mean response when the level of the predictor variable is X; it is an unbiased estimator of E[Y], with minimum variance in the class of unbiased linear estimators (by the Gauss-Markov theorem).

32
Q

Define residual.

A

The i-th residual is the difference between the observed value Y_i and the corresponding fitted value Y_i hat. e_i = Y_i - Y_i hat
e_i = Y_i - (b_0 + b_1X_i)
The magnitude of a residual is represented by the vertical deviation of the Y_i observation from the corresponding point on the estimated regression function.

33
Q

Distinguish between the model error term value epsilon_i = Y_i - E[Y_i] and the residual e_i = Y_i - Y_i hat.

A

The model error term is the vertical deviation from the unknown true regression line and hence is unknown. The residual is the vertical deviation of Y_i from the fitted value Y_i hat on the estimated line and is known. Residuals are highly useful for studying whether a given regression model is appropriate for the data at hand.

34
Q

What is the sum of the residuals?

A

0

35
Q

What is the sum of the squared residuals?

A

a minimum

36
Q

What does the sum of the observed values Y_i equal?

A

The sum of the fitted values, Y_i hat.

37
Q

What is the sum of the weighted residuals when the residual in the i-th trial is weighted by the level of the predictor variable in the i-th trial?

A

Sum from i=1 to n of X_i e_i = 0

38
Q

What point does the regression line always go through?

A

(X(bar), Y(bar))

39
Q

T/F: The sum of the weighted residuals is 1 when the residual in the i-th trial is weighted by the fitted value of the response variable for the ith trial.

A

F: Sum from i=1 to n of Yhat_i e_i = 0
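
The residual properties in cards 34 through 39 can be checked numerically; a sketch with invented data:

```python
import numpy as np

# Invented data for illustration.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([1.8, 4.1, 5.9, 8.2, 9.8])

xbar, ybar = X.mean(), Y.mean()
b1 = np.sum((X - xbar) * (Y - ybar)) / np.sum((X - xbar) ** 2)
b0 = ybar - b1 * xbar
Yhat = b0 + b1 * X
e = Y - Yhat  # residuals

print(np.isclose(e.sum(), 0.0))           # card 34: sum of residuals is 0
print(np.isclose(Y.sum(), Yhat.sum()))    # card 36: sum of observed = sum of fitted
print(np.isclose((X * e).sum(), 0.0))     # card 37: sum of X_i-weighted residuals is 0
print(np.isclose(b0 + b1 * xbar, ybar))   # card 38: line passes through (X(bar), Y(bar))
print(np.isclose((Yhat * e).sum(), 0.0))  # card 39: sum of Yhat_i-weighted residuals is 0
```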

40
Q

Why does the variance, (sigma)^2, of the error terms in a simple regression model need to be estimated?

A

To obtain an indication of the variability of the probability distributions of Y.

41
Q

What is the variance (sigma)^2 of a single population estimated by?

A

s^2, the sample variance. The sample variance is also known as the mean square.

42
Q

T/F: s^2, the sample variance, is an unbiased estimator of the variance (sigma)^2 of an infinite population.

A

True

43
Q

What is E[MSE], where MSE is the error mean square (residual mean square)?

A

(sigma)^2, because MSE = SSE/(n - 2) = [sum from i=1 to n of (e_i)^2] / (n - 2) is an unbiased estimator of (sigma)^2 for the regression model. Dividing by n - 2 rather than n accounts for the two parameters (b_0 and b_1) estimated from the data.
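
Continuing the same invented data, a sketch of computing MSE as the point estimate of (sigma)^2:

```python
import numpy as np

# Invented data for illustration.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([1.8, 4.1, 5.9, 8.2, 9.8])

xbar, ybar = X.mean(), Y.mean()
b1 = np.sum((X - xbar) * (Y - ybar)) / np.sum((X - xbar) ** 2)
b0 = ybar - b1 * xbar
e = Y - (b0 + b1 * X)  # residuals

SSE = np.sum(e ** 2)
MSE = SSE / (X.size - 2)  # n - 2 degrees of freedom: b_0 and b_1 were estimated
print(MSE)                # point estimate of (sigma)^2
```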

44
Q

Describe the Method of Maximum likelihood.

A

This method chooses as estimates those values of the parameters that are most consistent with the sample data. It is a way to obtain estimators of the parameters B_0, B_1, and (sigma)^2. The method of maximum likelihood chooses as the maximum likelihood estimate the value of the parameter for which the likelihood function is largest.
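
A hedged sketch of the idea, assuming normal error terms (the assumption under which maximum likelihood for B_0 and B_1 coincides with least squares); the data and candidate parameter values are invented:

```python
import numpy as np

def log_likelihood(b0, b1, sigma2, X, Y):
    """Log-likelihood of (b0, b1, sigma2) assuming normally distributed errors."""
    resid = Y - (b0 + b1 * X)
    n = X.size
    return -n / 2.0 * np.log(2.0 * np.pi * sigma2) - np.sum(resid ** 2) / (2.0 * sigma2)

# Invented data and candidate parameter values for illustration.
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([2.1, 3.9, 6.2, 7.8])

# Maximum likelihood picks the parameter values that make the observed data most likely.
print(log_likelihood(0.0, 2.0, 1.0, X, Y))
print(log_likelihood(0.5, 1.5, 1.0, X, Y))
```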