Exam2 Flashcards
Naive Bayes predicts…
The probability that a new data point has each possible label (A, B, C, etc.)
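To make the "probability per label" idea concrete, here is a minimal sketch of a Gaussian Naive Bayes classifier. The Gaussian variant, the function names (`fit_gaussian_nb`, `predict_proba`) and the toy data are assumptions for illustration; the card does not say which variant is meant.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a prior, per-feature means and variances for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # The small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_proba(model, x):
    """Return {label: P(label | x)}, treating features as independent."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        log_p = math.log(prior)
        for v, m, var in zip(x, means, variances):
            log_p += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        log_scores[label] = log_p
    # Normalise in log space so the probabilities sum to 1.
    top = max(log_scores.values())
    unnorm = {k: math.exp(s - top) for k, s in log_scores.items()}
    total = sum(unnorm.values())
    return {k: u / total for k, u in unnorm.items()}

# Hypothetical toy data: two features, two labels.
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.9, 4.0]]
y = ["A", "A", "A", "B", "B", "B"]
model = fit_gaussian_nb(X, y)
probs = predict_proba(model, [1.1, 2.0])  # one probability per label
```

The predicted label is simply the key of `probs` with the largest probability.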
Regression models are used to predict…
responses which have a continuous span of values
The adjusted R^2 can be used
to compare models with different numbers of terms
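A short sketch of the standard adjusted R^2 formula shows why it suits this comparison: it penalizes each extra term. The function name and the example numbers are assumptions chosen for illustration.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_predictors - 1 <= 0:
        raise ValueError("need n_samples > n_predictors + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Hypothetical comparison: the larger model has a slightly higher raw R^2,
# but the penalty for its extra terms gives it a lower adjusted R^2.
small_model = adjusted_r2(0.90, n_samples=11, n_predictors=2)
large_model = adjusted_r2(0.91, n_samples=11, n_predictors=6)
```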
R^2, Mean Squared Error, Mean absolute error, etc, are examples of
goodness measures
For Root Mean Squared Error (RMSE), a value < ? is a good sign
< 1 standard deviation of the response variable
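The rule of thumb above can be checked in a few lines. This is a sketch with made-up numbers; it uses the sample standard deviation of the response, which is one reasonable reading of the card.

```python
import math
import statistics

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted responses."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical observed responses and model predictions.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.1]

error = rmse(y_true, y_pred)
spread = statistics.stdev(y_true)  # sample standard deviation of the response
good_sign = error < spread         # the heuristic from the card
```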
Multiple linear regression uses…
more than 1 input variable
Regression trees are
a non-parametric method
What is a stochastic process?
A collection of random variables indexed by some variable (e.g. space or time)
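A simple random walk is one familiar example: the position at each time step is a random quantity, and time is the index. The sketch below is illustrative only; the function name and seeds are assumptions.

```python
import random

def random_walk(n_steps, seed=None):
    """Simulate one realization of a +/-1 random walk starting at 0."""
    rng = random.Random(seed)
    position = 0
    path = [position]
    for _ in range(n_steps):
        position += rng.choice((-1, 1))  # each step is a random draw
        path.append(position)
    return path

# Two seeds give two realizations of the same process.
path_a = random_walk(10, seed=1)
path_b = random_walk(10, seed=2)
```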
What is Lazy Learning?
A model fit using local data. It does not build a general model; instead it memorizes the training data and defers the work to prediction time
What is a technique that is an example of lazy learning?
K nearest neighbors
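The "lazy" part is visible in code: fitting only stores the data, and all the work happens at prediction time. This is a minimal sketch with a hypothetical class name and toy data, not a reference implementation.

```python
import math
from collections import Counter

class KNNClassifier:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Lazy learning: no model is built, the training data is memorized.
        self.X = list(X)
        self.y = list(y)
        return self

    def predict(self, x):
        # Distances to every stored point are computed only when queried.
        neighbors = sorted(
            (math.dist(row, x), label) for row, label in zip(self.X, self.y)
        )
        votes = Counter(label for _, label in neighbors[: self.k])
        return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = ["A", "A", "A", "B", "B", "B"]
knn = KNNClassifier(k=3).fit(X, y)
near_origin = knn.predict([0.5, 0.5])
near_far_cluster = knn.predict([5.5, 5.5])
```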
What is Eager Learning?
A model fit which is “eager” to produce a general model that fits all the data, built at training time before any prediction is requested
A Gaussian Process for regression produces
a probability distribution over functions
For Gaussian Processes for regression, a Kernel function…
describes how similar each data point is to the others. It is chosen by the modeler and determines the covariance function
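As an illustration, the squared-exponential (RBF) kernel is one common choice. The sketch below assumes that kernel and one-dimensional inputs; the function names and parameters are hypothetical.

```python
import math

def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Similarity of two scalar inputs; decays as they move apart."""
    return variance * math.exp(-((x1 - x2) ** 2) / (2 * length_scale ** 2))

def covariance_matrix(xs, **kernel_params):
    """Covariance matrix obtained by applying the kernel to every pair."""
    return [[rbf_kernel(a, b, **kernel_params) for b in xs] for a in xs]

xs = [0.0, 0.5, 3.0]
K = covariance_matrix(xs, length_scale=1.0)
# K[0][1] is larger than K[0][2]: 0.5 is closer to 0.0 than 3.0 is,
# so the function values there are modeled as more strongly correlated.
```

Changing `length_scale` changes how quickly similarity falls off with distance, which is why the kernel choice shapes the functions the process favors.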
What are 3 main types of pre-processing for AI model improvement?
- Transformations
- Feature Selection
- Feature Engineering
The selection of a subset of input variables to use in the model is called
Feature Selection (AI improvement)
A “Transformation” is…
a mathematical alteration to the data space to facilitate improving the model goodness; rotating/stretching the input features
PCA stands for ___ and is an example of
Principal Component Analysis, a Transformation technique for data pre-processing
Converts a set of correlated input variables to uncorrelated variables (i.e. Principal Components)
What is “Feature Engineering”?
Creation of a new variable (feature) from existing variables to give the ML model important information; it often adds human context to the data set (AI model improvement)
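A small sketch of the idea: body mass index is a new feature built from two existing ones, and it encodes human knowledge about how they relate. The column names and values are hypothetical.

```python
def add_bmi(rows):
    """Return copies of the rows with an engineered 'bmi' feature added."""
    return [dict(r, bmi=r["weight_kg"] / r["height_m"] ** 2) for r in rows]

# Hypothetical records with two raw features.
patients = [
    {"weight_kg": 70.0, "height_m": 1.75},
    {"weight_kg": 90.0, "height_m": 1.80},
]
engineered = add_bmi(patients)
```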
What is a primary purpose of text analytics in model improvement?
To perform feature engineering to integrate text data into the ML process
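One simple way to turn text into model inputs is a bag-of-words count matrix. This sketch assumes that representation; the function name, tokenization rule and sample texts are illustrative only.

```python
import re
from collections import Counter

def bag_of_words(texts):
    """Return (vocabulary, rows) where each row holds word counts per text."""
    tokenized = [re.findall(r"[a-z']+", t.lower()) for t in texts]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[word] for word in vocab])
    return vocab, rows

# Hypothetical review snippets.
texts = ["Great product, great price", "Terrible product"]
vocab, features = bag_of_words(texts)
```

Each column of `features` is a numeric feature that can sit alongside the other input variables.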
When you transform the data using PCA, a common application is
Dimensionality reduction: keeping only the most influential principal components (those with the largest eigenvalues) while minimizing information loss
In PCA, the ____ are referred to as the principal components
Eigenvectors (of the covariance matrix); the eigenvalues give the variance explained by each component
With PCA, the transformed space is…
Orthonormal and uncorrelated (0 covariance)
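The PCA cards can be tied together with a two-variable sketch. It assumes 2-D data so the eigen-decomposition of the covariance matrix has a closed form; the function name and data are hypothetical. The principal directions are the eigenvectors of the covariance matrix, and the eigenvalues are the variances along them.

```python
import math

def pca_2d(points):
    """PCA for 2-D data via the closed-form eigen-decomposition of a 2x2 matrix."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(p[0] - mean_x, p[1] - mean_y) for p in points]
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Rotation angle of the direction with the largest variance.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    eigenvalues = (half_trace + radius, half_trace - radius)
    # Project the centered data onto the principal directions.
    scores = [(x * pc1[0] + y * pc1[1], x * pc2[0] + y * pc2[1])
              for x, y in centered]
    return (pc1, pc2), eigenvalues, scores

# Hypothetical correlated data.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
(pc1, pc2), eigenvalues, scores = pca_2d(data)

# Covariance between the two transformed variables (should be about 0).
score_cov = sum(a * b for a, b in scores) / (len(scores) - 1)
# Share of total variance kept if only the first component is retained.
explained_by_first = eigenvalues[0] / sum(eigenvalues)
reduced = [s[0] for s in scores]  # 1-D representation of the data
```

Keeping only `reduced` is the dimension-reduction use case: one variable instead of two, with `explained_by_first` showing how much of the variance survives.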