ENGDAT2 Flashcards by Jacob Austin Chua

Type of graph that can be used to show the relationship between two variables

Scatterplot

Used to measure the strength of the association (linear relationship) between two variables.

Correlation
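The sketch below shows how the sample correlation coefficient r is usually computed, using the standard Pearson formula r = Sxy / sqrt(Sxx * Syy). The function name and the sample data are hypothetical and only illustrate the formula.

```python
import math

def pearson_r(x, y):
    """Pearson sample correlation coefficient r between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        # A constant X or Y has no variation, so r is undefined
        raise ValueError("r is undefined when x or y is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.7746
```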

Variable we wish to explain or predict

Dependent Variable

Variable used to predict or explain the dependent variable

Independent Variable

Equation of Simple Linear Regression Model

Y = A + BX, or more precisely Yi = β0 + β1Xi + εi (population model) and Ŷi = b0 + b1Xi (estimated regression line)

Differentiate SST, SSR, and SSE.

SST (Total Sum of Squares) = Total Variation
SSR (Regression Sum of Squares) = Explained Variation/Outcome
SSE (Error Sum of Squares) = Unexplained Variation/Outcome
SST = SSR + SSE
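To make the three sums of squares concrete, this sketch fits a least-squares line to a small hypothetical data set and computes SST, SSR and SSE from their definitions, along with r^2 = SSR/SST. For a least-squares line with an intercept, SST = SSR + SSE, which the printed values reflect. The function name and data are illustrative only.

```python
def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the least-squares line fitted to x, y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # Least-squares slope and intercept
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
    return sst, ssr, sse

sst, ssr, sse = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
print(round(sst, 4), round(ssr, 4), round(sse, 4))  # 6.0 3.6 2.4
print("r^2 =", round(ssr / sst, 4))                 # r^2 = 0.6
```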

Portion of the total variation in the dependent variable that is explained by variation in the independent variable

Coefficient of Determination, r^2 = SSR/SST, where 0 <= r^2 <= 1

If r^2 = 1, what does it imply?

Perfect linear relationship between X and Y. 100% of the variation in Y is explained by the variation in X.

If r^2 = 0, what does it imply?

No linear relationship between X and Y. The value of Y does not depend linearly on X.

Standard deviation of the observations around the regression line

Standard Error of Estimate (S)
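A sketch of the usual formula S = sqrt(SSE / (n - 2)), where SSE is the sum of squared residuals (observed minus predicted Y) and n - 2 is the error degrees of freedom for a line with two estimated coefficients. The function name and data are hypothetical.

```python
import math

def standard_error_of_estimate(x, y):
    """S = sqrt(SSE / (n - 2)) for a simple least-squares line."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # e_i = Y_i - Yhat_i
    sse = sum(e ** 2 for e in residuals)
    return math.sqrt(sse / (n - 2))

print(round(standard_error_of_estimate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.8944
```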

Assumptions of Regression (L.I.N.E.)

Linearity - Relationship between X and Y is linear
Independence of Errors - Error values are statistically independent
Normality of Error - Error values are normally distributed for any given value of X
Equal Variance - The probability distribution of the errors has constant variance

Difference between the observed value of Y and its predicted value (ei = Yi - Ŷi)

Residual

Process that checks assumptions of regression by examining residuals

Residual Analysis

Exists if residuals in one time period are related to residuals in another period. Violates the regression assumption that residuals are random and independent.

Autocorrelation

Test Statistic used to test for autocorrelation

Durbin-Watson
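The Durbin-Watson statistic is D = sum of (e_i - e_(i-1))^2 divided by sum of e_i^2, computed from residuals in time order. Values near 2 suggest no autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. A minimal sketch with hypothetical residuals; a formal decision still needs the tabulated critical values dL and dU.

```python
def durbin_watson(residuals):
    """D = sum of squared successive differences / sum of squared residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least 2 residuals")
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0: alternating signs, negative autocorrelation
print(durbin_watson([1.0, 1.1, 1.2, 1.3]))    # close to 0: positive autocorrelation
```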

Original hypothesis that motivates the experiment

Conjecture

Test performed to investigate the conjecture

Experiment

Statistical analysis of the data from the experiment

Analysis

What has been learned about the original conjecture from the experiment

Conclusion

Sequence of Activities for every experiment

Conjecture
Experiment
Analysis
Conclusion

Level of the factor

Treatments

Each treatment can have multiple observations or ______.

Replicates

A correlation coefficient r near ______ will have residuals (actual vs. predicted values) that are not normally distributed.

Zero

Experimental design in which observations are taken in random order and the environment in which the treatments are applied is as uniform as possible.

Completely Randomized Design (CRD)
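In a completely randomized design, every run (each treatment repeated for its number of replicates) is performed in a random order. A minimal sketch that builds such a run order with Python's `random` module; the treatment labels, replicate count and seed are hypothetical.

```python
import random

def crd_run_order(treatments, replicates, seed=None):
    """Return a randomized run order with each treatment repeated `replicates` times."""
    runs = [t for t in treatments for _ in range(replicates)]
    random.Random(seed).shuffle(runs)  # seeded generator so the order is reproducible
    return runs

order = crd_run_order(["A", "B", "C"], replicates=3, seed=42)
print(order)
```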

In which table can the F-statistic be derived if the total sum of squares, the sum of squares between groups, and the degrees of freedom are known?

ANOVA
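For a one-way ANOVA with k groups and N total observations, the table is filled in as SSW = SST - SSB, MSB = SSB/(k - 1), MSW = SSW/(N - k) and F = MSB/MSW. A sketch with hypothetical numbers; the function name is illustrative.

```python
def one_way_anova_f(sst, ssb, k, n_total):
    """F statistic from the total and between-group sums of squares."""
    ssw = sst - ssb             # within-group (error) sum of squares
    msb = ssb / (k - 1)         # between-group mean square, df = k - 1
    msw = ssw / (n_total - k)   # within-group mean square, df = N - k
    return msb / msw

# Hypothetical: 3 groups, 15 observations, SST = 100, SSB = 40
print(one_way_anova_f(sst=100, ssb=40, k=3, n_total=15))  # 4.0
```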

True or False. Blocking in experimental design allows ANOVA to detect especially small differences in treatment means.

True

True or False. When a scatter plot shows all y values lined up at a constant value, then strong correlation is indicated.

False. (If every y value is the same, Y has no variation and r is undefined.)

True or False. The number of levels in One-Way ANOVA is always synonymous with the number of groups in Completely Randomized Experiments.

True

Penalizes excessive use of unimportant independent variables. Shows the proportion of variation in Y explained by all X variables adjusted for the number of X variables used.

Adjusted r^2
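The usual formula is adjusted r^2 = 1 - (1 - r^2)(n - 1)/(n - k - 1), where n is the sample size and k is the number of independent variables. The sketch uses hypothetical values to show the penalty for adding X variables at the same r^2.

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 for a model with k independent variables fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(adjusted_r2(0.80, n=20, k=3), 4))   # 0.7625
print(round(adjusted_r2(0.80, n=20, k=10), 4))  # 0.5778: same r^2, more X variables
```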

If the slope b1 = 0, what does it mean?

No linear relationship between X and Y

The task is to find the best-fit regression model for Y from candidate variables X1, X2, X3, X4, X5, and X6. If the p-values are 0.993, 0.329, 0.920, 0.017, 0.691, and 0.006, what should you do?

Remove X1 since it has the highest p-value. Refit the model using X2 to X6. Repeat the process until all remaining predictor variables have significant p-values.
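The procedure on this card is backward elimination. Below is a sketch of the loop, assuming a hypothetical callback `fit_pvalues` that refits the model on the given predictors and returns each predictor's p-value (in practice this would come from a regression routine). The stub simply replays the card's p-values, so it ignores the fact that p-values change after each refit; it only illustrates the control flow.

```python
def backward_elimination(predictors, fit_pvalues, alpha=0.05):
    """Drop the least significant predictor and refit until all p-values are below alpha."""
    current = list(predictors)
    while current:
        pvalues = fit_pvalues(current)  # refit on the remaining predictors (hypothetical)
        worst = max(current, key=lambda name: pvalues[name])
        if pvalues[worst] < alpha:
            break                       # every remaining predictor is significant
        current.remove(worst)           # remove the highest p-value and repeat
    return current

# Hypothetical stub: replays the p-values from the card instead of refitting
card_pvalues = {"X1": 0.993, "X2": 0.329, "X3": 0.920, "X4": 0.017, "X5": 0.691, "X6": 0.006}
print(backward_elimination(list(card_pvalues), lambda names: card_pvalues))  # ['X4', 'X6']
```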

True or False. The larger the variance inflation factor (VIF), the less severe the multicollinearity.

False. (More severe)
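The VIF for predictor Xj is 1/(1 - Rj^2), where Rj^2 is the coefficient of determination from regressing Xj on all the other X variables (with only two predictors, Rj^2 is the squared correlation between them). A sketch with hypothetical Rj^2 values shows why a larger VIF means more severe multicollinearity.

```python
def vif(r2_j):
    """Variance inflation factor from Rj^2 (Xj regressed on the other X variables)."""
    if not 0 <= r2_j < 1:
        raise ValueError("Rj^2 must be in [0, 1)")
    return 1 / (1 - r2_j)

for r2_j in (0.0, 0.5, 0.9):
    print(r2_j, round(vif(r2_j), 2))  # 1.0, 2.0, 10.0: stronger overlap, larger VIF
```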

Estimated average value of Y when the value of X is zero

b0 (Y-intercept)

Estimated change in the average value of Y as a result of a one-unit change in X

b1 (Slope)

Statistical formula used to compare variances across the means (or averages) of different groups

Analysis of Variance (ANOVA)

True or False. R-squared never decreases when a new X variable is added to the model.

True

Number of independent values that are free to vary in a statistical calculation.

Degrees of Freedom

These are obtained by finding the values that minimize the sum of the squared differences between the observed Y and the predicted value of Y.

b0 and b1
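Minimizing the sum of (Yi - b0 - b1Xi)^2 gives the closed-form estimates b1 = Sxy/Sxx, with Sxy = sum of (Xi - X̄)(Yi - Ȳ) and Sxx = sum of (Xi - X̄)^2, and b0 = Ȳ - b1X̄. A sketch with hypothetical data; the function name is illustrative.

```python
def least_squares(x, y):
    """Return (b0, b1) minimizing the sum of squared differences between observed and predicted Y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept
    return b0, b1

b0, b1 = least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
print(round(b0, 4), round(b1, 4))  # 2.2 0.6
```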

Used in the context of the analysis of variance when the F-ratio suggests rejection of the null hypothesis, that is, when the difference between the population means is significant.

Least Significant Difference (LSD) Method
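The situation described (a significant F-ratio in ANOVA) is where a post hoc comparison such as Fisher's least significant difference (LSD) method is applied to find which pairs of means differ. With equal group sizes n, LSD = t(α/2, N - k) * sqrt(2 * MSE / n), and two means differ significantly if their absolute difference exceeds the LSD. A sketch under stated assumptions: the critical t value is taken from a t table (2.179 is t(0.025, 12)), and the group means and MSE are hypothetical.

```python
import math
from itertools import combinations

def fisher_lsd(t_crit, mse, n_per_group):
    """Least significant difference for equal group sizes."""
    return t_crit * math.sqrt(2 * mse / n_per_group)

def significant_pairs(means, lsd):
    """Pairs of groups whose means differ by more than the LSD."""
    return [(a, b) for (a, ma), (b, mb) in combinations(means.items(), 2)
            if abs(ma - mb) > lsd]

# Hypothetical: 3 groups of 5 (N - k = 12 error df), MSE = 5, alpha = 0.05
lsd = fisher_lsd(t_crit=2.179, mse=5, n_per_group=5)
print(round(lsd, 3))                                              # 3.082
print(significant_pairs({"A": 10.0, "B": 12.0, "C": 15.0}, lsd))  # [('A', 'C')]
```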

Test or a series of tests

Experiment

The ______ of an experiment plays a major role in the eventual solution of the problem.

Design

In a ______________________, experimental trials (or runs) are performed at all combinations of the factor levels.

Factorial Experimental Design
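A full factorial design has one run per combination of factor levels, so the number of runs is the product of the numbers of levels. A sketch using `itertools.product`; the factors and levels are hypothetical.

```python
from itertools import product

# Hypothetical factors and levels
factors = {
    "temperature": [150, 180],
    "pressure": [10, 20],
    "catalyst": ["A", "B"],
}

runs = list(product(*factors.values()))  # every combination of levels
print(len(runs))  # 8 = 2 * 2 * 2
for run in runs:
    print(dict(zip(factors, run)))
```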

A collection of mathematical and statistical techniques that are useful for modeling and analysis in applications where a response of interest is influenced by several variables and the objective is to optimize this response.

Response Surface Methodology (RSM)

Enumerate the four types of Multivariate Analysis

1. Analysis of the Data Structure
2. Internal Consistency
3. Grouping Observations
4. Correspondence Analysis

Statistical study of experiments in which multiple measurements are made on each experimental unit and for which the relationship among multivariate measurements and their structure are important to the experiment's understanding.

Multivariate Analysis

b0 is denoted as the _________ while b1 is denoted as the __________.

Estimated (sample) Y-intercept; estimated (sample) slope coefficient. (β0 and β1 are the population Y-intercept and population slope coefficient.)

Used to identify a smaller number of uncorrelated variables from a large set of data. A technique that transforms high-dimensional data into lower dimensions while retaining as much information as possible

Principal Component Analysis

A statistical technique that reduces a set of variables by extracting all their commonalities into a smaller number of factors. Evaluates the correlations between variables

Factor Analysis

An assessment of how reliably survey or test items that are designed to measure the same construct actually do so

Item Analysis
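Item analysis commonly reports Cronbach's alpha as a measure of internal consistency: alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. A sketch using the standard library's sample variance; the ratings are hypothetical.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: list of k lists, each holding one item's scores for the same respondents."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least 2 items")
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical 1-5 ratings: 3 items answered by 5 respondents
items = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [5, 3, 4, 2, 4],
]
print(round(cronbach_alpha(items), 3))  # 0.897
```

Values closer to 1 indicate that the items measure the same construct more consistently.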

A __________ is an underlying theme, characteristic, or skill such as reading comprehension or customer satisfaction

Construct

Used to join observations that share common characteristics into groups. This analysis is appropriate when you do not have any initial information about how to form the groups. Analyzed through a row view.

Cluster Observation

Cluster Variable

Used to join observations that share common characteristics into groups. This method is appropriate when you have sufficient information to make good starting designations for the clusters

Cluster K-Means
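K-means starts from initial cluster centers, then repeatedly assigns each observation to the nearest center and moves each center to the mean of its cluster until nothing changes. A minimal one-dimensional sketch of that standard (Lloyd-style) algorithm; it is not Minitab's implementation, and the data and starting centers are hypothetical.

```python
def k_means_1d(points, centers, max_iter=100):
    """Lloyd-style k-means on 1-D data, starting from the given centers."""
    centers = [float(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster mean (keep it if the cluster is empty)
        new_centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break  # centers stopped moving, so assignments are stable
        centers = new_centers
    return centers, clusters

centers, clusters = k_means_1d([1, 2, 3, 10, 11, 12], centers=[0, 5])
print(centers)   # [2.0, 11.0]
print(clusters)  # [[1, 2, 3], [10, 11, 12]]
```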

Used to classify observations into two or more groups when you have a sample with known groups

Discriminant Analysis

(54 cards)