ENGDAT2 Flashcards

1
Q

Type of graph that can be used to show the relationship between two variables

A

Scatterplot

2
Q

Used to measure the strength of the association (linear relationship) between two variables.

A

Correlation

3
Q

Variable we wish to explain or predict

A

Dependent Variable

4
Q

Variable used to predict or explain the dependent variable

A

Independent Variable

5
Q

Equation of Simple Linear Regression Model

A

Yi = b0 + b1Xi for the fitted line (also written Y = A + BX); the population model adds an error term: Yi = β0 + β1Xi + εi

6
Q

Differentiate SST, SSR, and SSE.

A

SST (Total Sum of Squares) = Total Variation
SSR (Regression Sum of Squares) = Explained Variation/Outcome
SSE (Error Sum of Squares) = Unexplained Variation/Outcome
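
As a rough illustration, the three sums of squares can be computed from observed values and fitted values. The data and the fitted values (taken from the line 2.2 + 0.6x) are made up for this sketch, and the identity SST = SSR + SSE is only expected to hold when the fitted values come from a least-squares line with an intercept.

```python
def sums_of_squares(y, y_hat):
    """Return (SST, SSR, SSE) for observed y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
    return sst, ssr, sse


# Hypothetical data: y observed at x = 1..5, y_hat from the line 2.2 + 0.6x.
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
sst, ssr, sse = sums_of_squares(y, y_hat)
print(sst, ssr, sse)
```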

7
Q

Portion of the total variation in the dependent variable that is explained by variation in the independent variable

A

Coefficient of Determination (r^2) wherein 0 <= r^2 <= 1
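
For simple linear regression with one X, r^2 can also be obtained by squaring the correlation coefficient between X and Y. The sketch below uses made-up data and a hypothetical helper name; it is one way to compute the value, not the only one.

```python
def r_squared(x, y):
    """Squared Pearson correlation between x and y (simple regression only)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2 = r_squared(x, y)
print(r2)  # about 0.6: roughly 60% of the variation in y is explained by x
```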

8
Q

If r^2 = 1, what does it imply?

A

Perfect linear relationship between X and Y. 100% of the variation in Y is explained by the variation in X.

9
Q

If r^2 = 0, what does it imply?

A

No linear relationship between X and Y. The value of Y does not depend linearly on X.

10
Q

Standard deviation of the variation of observations around the regression line

A

Standard Error of Estimate (S)
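
A minimal sketch of the usual formula for simple linear regression, S = sqrt(SSE / (n - 2)), where n - 2 reflects the two estimated coefficients. The data and fitted values are made up for illustration.

```python
import math


def standard_error_of_estimate(y, y_hat):
    """S = sqrt(SSE / (n - 2)) for a simple linear regression fit."""
    n = len(y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return math.sqrt(sse / (n - 2))


# Hypothetical observations and fitted values (line 2.2 + 0.6x at x = 1..5).
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
s = standard_error_of_estimate(y, y_hat)
print(s)
```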

11
Q

Assumptions of Regression L.I.N.E

A

Linearity - Relationship between X and Y is linear
Independence of Errors - Error values are statistically independent
Normality of Error - Error values are normally distributed for any given value of X
Equal Variance - The probability distribution of the errors has constant variance

12
Q

Difference between observed and predicted value

A

Residual

13
Q

Process that checks assumptions of regression by examining residuals

A

Residual Analysis

14
Q

Exists if residuals in one time period are related to residuals in another period. Violates regression assumption that residuals are random and independent.

A

Autocorrelation

15
Q

Test Statistic used to test for autocorrelation

A

Durbin-Watson
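
The statistic is commonly defined as the sum of squared differences between successive residuals divided by the sum of squared residuals. It ranges from 0 to 4, and values near 2 suggest little first-order autocorrelation. The residuals below are made up for this sketch.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals ordered in time."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals that alternate in sign (negative autocorrelation).
alternating = durbin_watson([0.5, -0.5, 0.5, -0.5])
# Hypothetical residuals that stay on one side (positive autocorrelation).
persistent = durbin_watson([1.0, 1.0, 1.0, 1.0])
print(alternating, persistent)
```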

16
Q

Original hypothesis that motivates the experiment

A

Conjecture

17
Q

Test performed to investigate the conjecture

A

Experiment

18
Q

Statistical analysis of the data from the experiment

A

Analysis

19
Q

What has been learned about the original conjecture from the experiment

A

Conclusion

20
Q

Sequence of Activities for every experiment

A
  1. Conjecture
  2. Experiment
  3. Analysis
  4. Conclusion
21
Q

Level of the factor

A

Treatments

22
Q

Each treatment can have multiple observations or ______ .

A

Replicates

23
Q

A value of the correlation coefficient r that is near ______ will have residuals of predicted vs. actual values that are not normally distributed.

A

Zero

24
Q

Experimental design wherein observations are taken in random order and that the environment in which the treatments are used is as uniform as possible.

A

Completely Randomized Design (CRD)

25
Q

From which table can the F-statistic be derived if the total sum of squares, the sum of squares between groups, and the degrees of freedom are known?

A

ANOVA
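
A sketch of how the F-statistic follows from those ANOVA table entries in the one-way case: the within-group sum of squares is the total minus the between-group sum, and each mean square is a sum of squares divided by its degrees of freedom. The numbers are made up for illustration.

```python
def anova_f(sst, ssb, df_between, df_within):
    """One-way ANOVA F-statistic from table entries."""
    ssw = sst - ssb              # within-group (error) sum of squares
    msb = ssb / df_between       # mean square between groups
    msw = ssw / df_within        # mean square within groups
    return msb / msw


# Hypothetical table: 3 groups, 18 observations, so df = (3 - 1, 18 - 3).
f_stat = anova_f(sst=100.0, ssb=40.0, df_between=2, df_within=15)
print(f_stat)  # 5.0
```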

26
Q

True or False. Blocking in experimental design allows ANOVA to detect especially small differences in treatment means.

A

True

27
Q

True or False. When a scatter plot shows all y values lined up at a constant value, a strong correlation is indicated.

A

False

28
Q

True or False. The number of levels in One-Way ANOVA is always synonymous with the number of groups in Completely Randomized Experiments.

A

True

29
Q

Penalizes excessive use of unimportant independent variables. Shows the proportion of variation in Y explained by all X variables adjusted for the number of X variables used.

A

Adjusted r^2
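
The standard formula is adjusted r^2 = 1 - (1 - r^2)(n - 1)/(n - k - 1), with n observations and k independent variables. The values below are made up for this sketch.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted r^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical fit: r^2 = 0.6 from 5 observations.
one_x = adjusted_r_squared(0.6, n=5, k=1)
# Same r^2 with one more predictor: the penalty makes the value smaller.
two_x = adjusted_r_squared(0.6, n=5, k=2)
print(one_x, two_x)
```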

30
Q

If the slope b1=0, what does it mean?

A

No linear relationship

31
Q

The task is to find the best fit regression model for Y for candidate variables X1, X2, X3, X4, X5, and X6. If the p-values are 0.993, 0.329, 0.920, 0.017, 0.691, and 0.006, what should you do?

A

Remove X1 since it has the highest p-value. Refit the model with X2 through X6. Repeat the process until every remaining predictor variable has a significant p-value.
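
The selection step of this procedure can be sketched as follows. The function name and the 0.05 significance level are assumptions for illustration; in practice the p-values must be recomputed by refitting the model after each removal, which this sketch does not do.

```python
def next_to_drop(p_values, alpha=0.05):
    """Return the predictor with the largest p-value above alpha, else None."""
    name, p = max(p_values.items(), key=lambda item: item[1])
    return name if p > alpha else None


# p-values from the card; alpha = 0.05 is an assumed significance level.
p_values = {"X1": 0.993, "X2": 0.329, "X3": 0.920,
            "X4": 0.017, "X5": 0.691, "X6": 0.006}
first = next_to_drop(p_values)
# Hypothetical later stage where every remaining predictor is significant.
done = next_to_drop({"X4": 0.017, "X6": 0.006})
print(first, done)
```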

32
Q

True or False. The larger the variance inflation factor (VIF), the less severe the multicollinearity.

A

False. (More severe)
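
The usual definition is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor X_j on the other predictors. The R_j^2 values below are made up for this sketch.

```python
def vif(r2_j):
    """Variance inflation factor from the R^2 of X_j regressed on the other X's."""
    return 1.0 / (1.0 - r2_j)


# Hypothetical cases: an uncorrelated predictor and a highly collinear one.
uncorrelated = vif(0.0)
collinear = vif(0.9)
print(uncorrelated, collinear)
```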

33
Q

Estimated average value of Y when the value of X is zero

A

b0

34
Q

Estimated change in the average value of Y as a result of a one unit change in X

A

b1

35
Q

Statistical formula used to compare variances across the means (or average) of different groups

A

Analysis of Variance (ANOVA)

36
Q

True or False. R-squared never decreases when a new X variable is added to the model.

A

True

37
Q

Number of independent values that a statistical analysis can measure.

A

Degrees of Freedom

38
Q

These are obtained by finding the values that minimize the sum of the squared differences between the observed Y and the predicted value of Y.

A

b0 and b1
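
For one X variable, the minimizing values have a closed form: b1 = Sxy / Sxx and b0 = mean(Y) - b1 * mean(X). The sketch below uses made-up data and a hypothetical helper name.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx               # estimated change in Y per unit change in X
    b0 = y_bar - b1 * x_bar      # estimated Y when X is zero
    return b0, b1


# Hypothetical data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
print(b0, b1)
```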

39
Q

Used in the context of the analysis of variance, when the F-ratio suggests rejection of the null hypothesis, that is, when the difference between the population means is significant.

A

Least Significant Difference (LSD) Method

40
Q

Test or a series of tests

A

Experiment

41
Q

The ______ of an experiment plays a major role in the eventual solution of the problem.

A

Design

42
Q

In a ______________________ , experimental trials (or runs) are performed at all combinations of the factor levels.

A

Factorial Experimental Design

43
Q

A collection of mathematical and statistical techniques that are useful for modeling and analysis in applications where a response of interest is influenced by several variables and the objective is to optimize this response.

A

Response Surface Methodology (RSM)

44
Q

Enumerate the four types of Multivariate Analysis

A
  1. Analysis of Data Structure
  2. Internal Consistency
  3. Grouping Observation
  4. Correspondence Analysis
45
Q

Statistical study of experiments in which multiple measurements are made on each experimental unit and for which the relationship among multivariate measurements and their structure are important to the experiment’s understanding.

A

Multivariate Analysis

46
Q

b0 is denoted as the _________ while b1 is denoted as the __________ .

A

Y-intercept; slope coefficient (b0 and b1 are the sample estimates of the population Y-intercept β0 and population slope coefficient β1)

47
Q

Used to identify a smaller number of uncorrelated variables from a large set of data. A technique that transforms high-dimensional data into lower dimensions while retaining as much information as possible.

A

Principal Component Analysis

48
Q

A statistical technique that reduces a set of variables by extracting all their commonalities into a smaller number of factors. Evaluates the correlations between variables

A

Factor Analysis

49
Q

An assessment of how reliably survey or test items that are designed to measure the same construct actually do so

A

Item Analysis

50
Q

A __________ is an underlying theme, characteristic, or skill such as reading comprehension or customer satisfaction

A

Construct

51
Q

Used to join observations that share common characteristics into groups. This analysis is appropriate when you do not have any initial information about how to form the groups. Analyzed through a row view.

A

Cluster Observation

52
Q

Used to join observations that share common characteristics into groups. This analysis is appropriate when you do not have any initial information about how to form the groups. Analyzed through a column view.

A

Cluster Variable

53
Q

Used to join observations that share common characteristics into groups. This method is appropriate when you have sufficient information to make good starting cluster designations for the clusters

A

Cluster K-Means

54
Q

Used to classify observations into two or more groups when you have a sample with known groups

A

Discriminant Analysis