ENGDAT2 Flashcards
Type of graph that can be used to show the relationship between two variables
Scatterplot
Used to measure the strength of the association (linear relationship) between two variables.
Correlation
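A minimal sketch of the sample (Pearson) correlation coefficient r in plain Python. The function name pearson_r and the sample data are illustrative only. r is the sum of cross-products divided by the square root of the product of the two sums of squared deviations.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient r for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        # A constant X or Y has no variation, so r is undefined
        raise ValueError("r is undefined when X or Y is constant")
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # ≈ 0.7746
```

r always lies between -1 and +1; values near -1 or +1 indicate a strong linear relationship and values near 0 a weak one. If every Y value is the same, syy is zero and r is undefined, so the sketch raises an error instead of reporting a correlation.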
Variable we wish to explain or predict
Dependent Variable
Variable used to predict or explain the dependent variable
Independent Variable
Equation of Simple Linear Regression Model
Yi = β0 + β1Xi + εi (population model); Ŷi = b0 + b1Xi (estimated regression line)
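Differentiate SST, SSR, and SSE.
SST (Total Sum of Squares) = Total variation of Y around its mean
SSR (Regression Sum of Squares) = Explained variation, attributable to the relationship between X and Y
SSE (Error Sum of Squares) = Unexplained variation, attributable to factors other than X
SST = SSR + SSE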
Portion of the total variation in the dependent variable that is explained by variation in the independent variable
Coefficient of Determination, r^2 = SSR/SST, where 0 <= r^2 <= 1
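To make the split of total variation and the ratio r^2 = SSR/SST concrete, here is a small sketch that takes observed values y and fitted values y_hat. It assumes y_hat comes from a least-squares line with an intercept; the decomposition SST = SSR + SSE does not hold for arbitrary predictions. The data and the fitted line Ŷ = 2.2 + 0.6X are a made-up example.

```python
def variation_decomposition(y, y_hat):
    """Return (SST, SSR, SSE, r^2) for observed y and least-squares fitted y_hat."""
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
    return sst, ssr, sse, ssr / sst


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]  # least-squares line for this data
sst, ssr, sse, r2 = variation_decomposition(y, y_hat)
print(sst, ssr, sse, r2)  # approximately 6.0, 3.6, 2.4, 0.6
```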
If r^2 = 1, what does it imply?
Perfect linear relationship between X and Y. 100% of the variation in Y is explained by the variation in X.
If r^2 = 0, what does it imply?
No linear relationship between X and Y. None of the variation in Y is explained by the variation in X.
Standard deviation of the observations around the regression line
Standard Error of the Estimate (S)
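A minimal sketch of the standard error of the estimate for simple linear regression, S = sqrt(SSE / (n - 2)); the n - 2 divisor reflects the two estimated coefficients b0 and b1. The function name and the example values are illustrative assumptions.

```python
import math


def standard_error_of_estimate(y, y_hat):
    """S = sqrt(SSE / (n - 2)) for a simple linear regression fit."""
    n = len(y)
    if n != len(y_hat) or n < 3:
        raise ValueError("need matching y and y_hat with at least 3 points")
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares
    return math.sqrt(sse / (n - 2))


y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]  # fitted values from the line Y-hat = 2.2 + 0.6X
print(standard_error_of_estimate(y, y_hat))  # ≈ 0.894
```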
Assumptions of Regression (L.I.N.E.)
Linearity - Relationship between X and Y is linear
Independence of Errors - Error values are statistically independent
Normality of Error - Error values are normally distributed for any given value of X
Equal Variance - The probability distribution of the errors has constant variance for all values of X.
Difference between an observed value of Y and its predicted value (ei = Yi - Ŷi)
Residual
Process that checks assumptions of regression by examining residuals
Residual Analysis
Exists if residuals in one time period are related to residuals in another period. Violates the regression assumption that residuals are random and independent.
Autocorrelation
Test Statistic used to test for autocorrelation
Durbin-Watson
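The Durbin-Watson statistic compares successive residuals: D = Σ(e_i - e_{i-1})^2 / Σ e_i^2, with the numerator summed from i = 2 to n. Values near 2 suggest no autocorrelation, values well below 2 suggest positive autocorrelation, and values above 2 (up to 4) suggest negative autocorrelation. This sketch only computes D; the critical values d_L and d_U still have to be looked up in a Durbin-Watson table. The residual sequences are made up.

```python
def durbin_watson(residuals):
    """D = sum_{i=2..n} (e_i - e_{i-1})^2 / sum_{i=1..n} e_i^2."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


print(durbin_watson([1, 1, 1, -1, -1, -1]))  # residuals that drift together -> small D
print(durbin_watson([1, -1, 1, -1, 1, -1]))  # residuals that alternate -> large D
```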
Original hypothesis that motivates the experiment
Conjecture
Test performed to investigate the conjecture
Experiment
Statistical analysis of the data from the experiment
Analysis
What has been learned about the original conjecture from the experiment
Conclusion
Sequence of Activities for every experiment
- Conjecture
- Experiment
- Analysis
- Conclusion
Levels of a factor
Treatments
Each treatment can have multiple observations or ______ .
Replicates
A correlation coefficient r that is near ______ will have residuals of predicted vs. actual values that are not normally distributed.
Zero
Experimental design wherein observations are taken in random order and that the environment in which the treatments are used is as uniform as possible.
Completely Randomized Design (CRD)
In which table can the F-statistic value be derived if the total sum of squares, the sum of squares between groups, and the degrees of freedom are known?
ANOVA
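As a sketch of how the F-statistic falls out of a one-way ANOVA table when only SST, the between-group sum of squares SSB, and the degrees of freedom are known: the within-group sum of squares is SSW = SST - SSB, each sum of squares is divided by its degrees of freedom to get a mean square, and F = MSB / MSW. With c groups and n total observations, the degrees of freedom are c - 1 (between) and n - c (within). The numbers in the example are made up.

```python
def f_statistic(sst, ssb, df_between, df_within):
    """F = MSB / MSW, recovering SSW from SST = SSB + SSW."""
    ssw = sst - ssb            # within-group (error) sum of squares
    msb = ssb / df_between     # mean square between groups
    msw = ssw / df_within      # mean square within groups
    return msb / msw


# Hypothetical table: 3 groups, 33 observations, SST = 100, SSB = 40
print(f_statistic(sst=100, ssb=40, df_between=3 - 1, df_within=33 - 3))  # 10.0
```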
True or False. Blocking in experimental design allows ANOVA to detect especially small differences in treatment means.
True
True or False. When a scatter plot shows all y values lying at a constant value, strong correlation is indicated.
False
True or False. The number of levels in One-Way ANOVA is always synonymous with the number of groups in Completely Randomized Experiments.
True
Penalizes excessive use of unimportant independent variables. Shows the proportion of variation in Y explained by all X variables adjusted for the number of X variables used.
Adjusted r^2
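Adjusted r^2 is commonly computed as 1 - (1 - r^2)(n - 1)/(n - k - 1), where n is the sample size and k the number of X variables. The sketch below (illustrative values) shows the penalty: a second variable that barely raises r^2 lowers the adjusted value.

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 for n observations and k independent (X) variables."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


print(adjusted_r2(0.60, n=5, k=1))  # ≈ 0.467
print(adjusted_r2(0.61, n=5, k=2))  # ≈ 0.22, lower despite the higher r^2
```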
If the slope b1 = 0, what does it mean?
No linear relationship
The task is to find the best fit regression model for Y for candidate variables X1, X2, X3, X4, X5, and X6. If the p-values are 0.993, 0.329, 0.920, 0.017, 0.691, and 0.006, what should you do?
Remove X1, since it has the highest p-value (0.993). Refit the model using X2 to X6. Repeat the process until all remaining predictor variables have significant p-values.
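The procedure on this card (backward elimination) can be sketched as a loop: refit, find the predictor with the largest p-value, drop it if that p-value exceeds the significance level, and stop once every remaining p-value is significant. The refit argument is a hypothetical callback standing in for whatever regression tool produces the p-values; alpha = 0.05 is an assumed significance level.

```python
def variable_to_remove(p_values, alpha=0.05):
    """Return the predictor with the largest p-value if it is not significant, else None."""
    name, p = max(p_values.items(), key=lambda item: item[1])
    return name if p > alpha else None


def backward_eliminate(predictors, refit, alpha=0.05):
    """Repeatedly refit and drop the least significant predictor.

    refit(list_of_names) must return {name: p_value}. It is a hypothetical
    placeholder for a real regression fit, since p-values change after every refit.
    """
    current = list(predictors)
    while current:
        drop = variable_to_remove(refit(current), alpha)
        if drop is None:
            break
        current.remove(drop)
    return current


card = {"X1": 0.993, "X2": 0.329, "X3": 0.920, "X4": 0.017, "X5": 0.691, "X6": 0.006}
print(variable_to_remove(card))  # X1
```

In a real analysis the p-values of X2 to X6 would be recomputed after X1 is dropped, so the later steps cannot be read off the first table.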
True or False. The larger the variance inflation factor (VIF), the less severe the multicollinearity.
False. The larger the VIF, the more severe the multicollinearity.
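For predictor X_j, the variance inflation factor is VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination from regressing X_j on all the other X variables. This minimal sketch (R_j^2 values illustrative) shows why a larger VIF means more severe multicollinearity: VIF grows without bound as R_j^2 approaches 1. Rules of thumb commonly flag VIF values above 5 or 10.

```python
def vif(r2_j):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing X_j on the other X's."""
    if not 0 <= r2_j < 1:
        raise ValueError("R_j^2 must be in [0, 1)")
    return 1 / (1 - r2_j)


for r2_j in (0.0, 0.5, 0.8, 0.9):
    print(r2_j, vif(r2_j))  # VIF = 1 means X_j is uncorrelated with the other X's
```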
Estimated average value of Y when the value of X is zero
b0
Estimated change in the average value of Y as a result of a one unit change in X
b1
Statistical method used to compare the means (averages) of different groups by analyzing variances
Analysis of Variance (ANOVA)
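A minimal one-way ANOVA sketch starting from raw group data: it splits the total variation into between-group (SSB) and within-group (SSW) sums of squares, then forms F = [SSB/(c - 1)] / [SSW/(n - c)]. The group values are made up. A p-value would additionally need the F distribution (for example from a statistics library), which this sketch leaves out.

```python
def one_way_anova(groups):
    """Return SSB, SSW, SST and F for a list of groups (lists of observations)."""
    values = [v for g in groups for v in g]
    n, c = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: group means around the grand mean, weighted by group size
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: observations around their own group mean
    ssw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    f = (ssb / (c - 1)) / (ssw / (n - c))
    return {"SSB": ssb, "SSW": ssw, "SST": ssb + ssw, "F": f}


print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```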
True or False. R-squared never decreases when a new X variable is added to the model.
True
Number of independent values in a statistical calculation that are free to vary.
Degrees of Freedom
These are obtained by finding the values that minimize the sum of the squared differences between the observed values of Y and the predicted values of Y.
b0 and b1
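In simple linear regression the least-squares values of b0 and b1 have a closed form: b1 = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)^2 and b0 = Ȳ - b1X̄. A minimal sketch with made-up data; the residuals from the fitted line sum to zero, which is a handy check.

```python
def least_squares(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals for Y-hat = b0 + b1*X."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept
    return b0, b1


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1)  # approximately 2.2 and 0.6
```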
Used in the context of analysis of variance when the F-ratio leads to rejection of the null hypothesis, that is, when the difference between the population means is significant.
Post Hoc Tests (multiple comparison procedures, e.g., Tukey-Kramer)
Test or a series of tests
Experiment
The ______ of an experiment plays a major role in the eventual solution of the problem.
Design
In a ______________________ , experimental trials (or runs) are performed at all combinations of the factor levels.
Factorial Experimental Design
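A full factorial design can be enumerated as the Cartesian product of the factor levels, which itertools.product does directly. The factor names and levels below are hypothetical; with three factors at two levels each there are 2 × 2 × 2 = 8 runs. In practice the run order is usually randomized before the trials are performed.

```python
from itertools import product


def full_factorial(factors):
    """Return one run (dict of factor -> level) per combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*(factors[name] for name in names))]


# Hypothetical factors and levels
runs = full_factorial({"temperature": [150, 180], "pressure": [10, 20], "catalyst": ["A", "B"]})
print(len(runs))  # 8
for run in runs:
    print(run)
```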
A collection of mathematical and statistical techniques that are useful for modeling and analysis in applications where a response of interest is influenced by several variables and the objective is to optimize this response.
Response Surface Methodology (RSM)
Enumerate the four types of Multivariate Analysis
- Analysis of Data Structure
- Internal Consistency
- Grouping Observations
- Correspondence Analysis
Statistical study of experiments in which multiple measurements are made on each experimental unit and for which the relationships among the multivariate measurements and their structure are important to understanding the experiment.
Multivariate Analysis
b0 is denoted as the _________ while b1 is denoted as the __________ .
Sample Y-intercept; sample slope coefficient (estimates of the population Y-intercept β0 and population slope β1)
Used to identify a smaller number of uncorrelated variables from a large set of data. A technique that transforms high-dimensional data into lower dimensions while retaining as much information (variance) as possible.
Principal Component Analysis
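One common way to compute principal components is a singular value decomposition of the mean-centered data matrix, sketched below with NumPy. Assumptions: rows are observations and columns are variables, and the data are only centered, not standardized (standardizing first gives correlation-based PCA, which is preferable when variables are on different scales). The synthetic data are for illustration only.

```python
import numpy as np


def pca(X, n_components):
    """Return (scores, components, explained_variance_ratio) via SVD of centered data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # center each variable (column)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]            # unit-length principal directions
    scores = Xc @ components.T                # data expressed in the new, uncorrelated axes
    variance = s ** 2 / (X.shape[0] - 1)      # variance captured by each component
    return scores, components, variance[:n_components] / variance.sum()


rng = np.random.default_rng(0)
a = rng.normal(size=200)
# Three variables, two of which are nearly copies (or mirror images) of the first
X = np.column_stack([a, a + 0.1 * rng.normal(size=200), -a + 0.1 * rng.normal(size=200)])
scores, components, ratio = pca(X, n_components=2)
print(ratio)  # the first component should carry almost all of the variance
```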
A statistical technique that reduces a set of variables by extracting their commonalities into a smaller number of factors. Evaluates the correlations between variables.
Factor Analysis
An assessment of how reliably survey or test items that are designed to measure the same construct actually do so
Item Analysis
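Cronbach's alpha is a statistic commonly reported in item analysis to quantify internal consistency: α = k/(k - 1) · (1 - Σ s_i^2 / s_T^2), where k is the number of items, s_i^2 is the sample variance of item i, and s_T^2 is the sample variance of respondents' total scores. Whether a given course or software package reports exactly this statistic is an assumption; the response data below are made up.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses: one list of k item scores per respondent."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    items = list(zip(*responses))  # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])  # variance of total scores
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Made-up responses: 5 respondents x 3 items on a 1-5 scale
responses = [[4, 5, 4], [2, 2, 3], [5, 5, 5], [3, 3, 2], [1, 2, 1]]
print(cronbach_alpha(responses))  # values near 1 indicate high internal consistency
```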
A __________ is an underlying theme, characteristic, or skill such as reading comprehension or customer satisfaction.
Construct
Used to join observations that share common characteristics into groups. This analysis is appropriate when you do not have any initial information about how to form the groups. Analyzed through a row view.
Cluster Observation
Used to join variables that share common characteristics into groups. This analysis is appropriate when you do not have any initial information about how to form the groups. Analyzed through a column view.
Cluster Variable
Used to join observations that share common characteristics into groups. This method is appropriate when you have sufficient information to make good starting cluster designations for the clusters
Cluster K-Means
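K-means with user-chosen starting centroids can be sketched with Lloyd's algorithm: assign each observation to the nearest centroid (Euclidean distance), move each centroid to the mean of its members, and repeat until the assignments stop changing. This is the standard textbook iteration; whether a particular software package uses exactly this update rule is an assumption. The points and starting centroids are made up.

```python
import math


def k_means(points, initial_centroids, max_iter=100):
    """Return (labels, centroids) starting from the given initial centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centroids = k_means(points, initial_centroids=[(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```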
Used to classify observations into two or more groups when you have a sample with known groups
Discriminant Analysis