Exam2 Flashcards
Naive Bayes predicts…
What is the probability that a new data point has label A, B, C… etc
Regression models are used to predict…
responses which have a continuous span of values
The adjusted R^2 can be used
to compare models with different numbers of terms
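A minimal sketch of why the adjusted R^2 supports that comparison, using the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1). The function name and the sample numbers are illustrative assumptions, not taken from the course material.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Standard adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_predictors - 1 <= 0:
        raise ValueError("need n_samples > n_predictors + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Hypothetical comparison: the larger model has a slightly higher R^2,
# but a lower adjusted R^2 once its extra terms are penalized.
small_model = adjusted_r2(0.80, n_samples=50, n_predictors=2)
large_model = adjusted_r2(0.81, n_samples=50, n_predictors=10)
```

With these made-up numbers the smaller model scores higher after adjustment, even though its raw R^2 is lower.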
R^2, Mean Squared Error, Mean absolute error, etc, are examples of
goodness measures
For Root Mean Squared Error (RMSE) if the value is < ? , it’s a good sign
< 1 standard deviation of the response variable
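A rough sketch of that rule of thumb: compute the RMSE and compare it with the standard deviation of the response. The data values are hypothetical, and using the sample standard deviation is an assumption.

```python
import math
import statistics

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical response values and model predictions.
y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [2.5, 5.5, 6.5, 9.5, 10.5]

error = rmse(y_true, y_pred)
spread = statistics.stdev(y_true)  # sample standard deviation of the response
looks_good = error < spread        # the "< 1 standard deviation" check
```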
For linear regression, always use…
more than 1 input variable
Regression trees are
a non-parametric method
What is a stochastic process
a collection of random variables indexed by some variable (e.g., space or time)
What is Lazy Learning
A model fit using local data; it does not build a general model but instead memorizes the training data
What is a technique that is an example of lazy learning
K nearest neighbors
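A minimal sketch of why K nearest neighbors counts as lazy learning: there is no training step, and all the work happens at prediction time against the stored data. The function name, the sample points, and the choice of Euclidean distance are assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # No fitting step: the "model" is just the stored training data.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: two well-separated clusters.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["A", "A", "A", "B", "B", "B"]

near_first_cluster = knn_predict(points, labels, (1.1, 1.0))
near_second_cluster = knn_predict(points, labels, (5.1, 5.0))
```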
What is Eager Learning
A model fit that is “eager” to produce one general model, built from the training data, to fit all data
A Gaussian Process for regression produces
a probability distribution over functions
For Gaussian Processes for regression, a Kernel function…
describes how similar each data point is to the others. It is chosen by the modeler and determines the covariance function
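A sketch of how a kernel turns pairwise similarity into a covariance matrix. The squared-exponential (RBF) kernel is one common choice and is an assumption here; the cards do not say which kernel the course used. The input values and `length_scale` are illustrative.

```python
import math

def rbf_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential (RBF) kernel for scalar inputs."""
    return math.exp(-((x1 - x2) ** 2) / (2 * length_scale ** 2))

def covariance_matrix(xs, kernel=rbf_kernel):
    """Covariance matrix whose (i, j) entry is kernel(xs[i], xs[j])."""
    return [[kernel(a, b) for b in xs] for a in xs]

# Nearby inputs (0 and 1) get a larger covariance than distant ones (0 and 3).
K = covariance_matrix([0.0, 1.0, 3.0])
```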
What are 3 main types of pre-processing for AI Model improvement
- Transformations
- Feature Selection
- Feature Engineering
The selection of a subset of input variables to use in the model is called
Feature Selection (AI improvement)
A “Transformation” is…
a mathematical alteration to the data space to facilitate improving the model goodness; rotating/stretching the input features
PCA stands for ___ and is an example of
Principal Components Analysis, a Transformation technique for data pre-processing
Converts a set of correlated input variables to uncorrelated variables (ie Principal Components)
What is “Feature Engineering”?
Creating a new variable (feature) from existing variables to give the ML model important information; it often adds human context to the data set (AI model improvement)
What is a primary purpose of text analytics in model improvement?
To perform feature engineering to integrate text data into the ML process
When you transform the data using PCA, a common application is
Reducing dimensions, to just use the most influential (largest) principal components, while minimizing information loss
In PCA, the ____ are referred to as the principal components
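Eigenvectors (of the covariance matrix); the corresponding eigenvalues give the variance captured by each component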
With PCA, the transformed space is…
Orthonormal and uncorrelated (0 covariance)
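A small sketch of PCA on 2-D data, using the closed-form eigen-decomposition of the 2x2 covariance matrix. The eigenvectors give the component directions and the eigenvalues give the variance along each one. The function name and the data set are illustrative assumptions; real work would normally use a linear algebra library.

```python
import math

def pca_2d(points):
    """PCA for 2-D data via the eigen-decomposition of the 2x2 covariance matrix."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]], largest first.
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    eigenvalues = (half_trace + radius, half_trace - radius)
    # Unit eigenvectors: pc1 points along the direction of largest variance.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))
    # Project the centered data onto the principal components.
    scores = [(x * pc1[0] + y * pc1[1], x * pc2[0] + y * pc2[1]) for x, y in centered]
    return eigenvalues, (pc1, pc2), scores

# Hypothetical correlated inputs.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
eigenvalues, components, scores = pca_2d(data)
```

Keeping only the first column of `scores` is the dimension-reduction use: it retains the direction with the largest eigenvalue and discards the rest.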
Creation of Dummy Variables is considered a ___ technique
Transformation
Dummy variables are…
a numerical equivalent to categorical variables
If you have 3 or more categories to convert to a number, it is better to introduce a dummy variable than a scale to avoid…
introducing bias (a numeric scale implies an ordering and spacing between categories that may not exist)
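A minimal sketch of dummy-variable encoding for a three-category variable. Each category gets its own 0/1 column, so no category is treated as "larger" than another, which an integer scale such as red=1, green=2, blue=3 would imply. The function name and color values are hypothetical.

```python
def dummy_encode(values):
    """Map each category to a 0/1 indicator column (one column per category)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

categories, rows = dummy_encode(["red", "green", "blue", "green"])
# categories is ['blue', 'green', 'red']; each row has exactly one 1.
```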
What is the limiting assumption of PCA
It is based on covariance, which is the linear statistical variation between 2 variables. The variables could have a more complex/nonlinear correlation in reality
Covariance is…
the linear statistical variation between 2 variables
What are some ML techniques that have built-in feature selection
Decision/Regression trees and Linear regression (for example via coefficient significance or L1/lasso regularization)
What is a disadvantage of feature selection
You’re still losing information
What are 3 types of importance measures used for feature selection
- Filter (select features in pre-processing, train on your selection)
- Wrapper (train on a subset, then add or remove iteratively)
- Embedded (integrated into learning process, ex Trees)
What is the simplest importance measure to use for feature selection
Linear statistical importance - squared correlation
“Feature Importance” means
assigning a numerical importance value to each feature
When using squared correlation technique for feature importance, you could drop (input) features that…
show a very weak correlation to your target value (output) - ex 6.3
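A sketch of squared correlation used as a filter-style importance measure. The feature names, the data, and the 0.2 cutoff are all hypothetical; the cards do not specify a threshold.

```python
def squared_correlation(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)

target = [2.0, 4.0, 6.0, 8.0, 10.0]
features = {
    "size": [1.0, 2.0, 3.0, 4.0, 5.0],   # perfectly linear in the target
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],  # only weakly related
}

scores = {name: squared_correlation(col, target) for name, col in features.items()}
threshold = 0.2  # hypothetical cutoff
kept = [name for name, score in scores.items() if score >= threshold]
```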
Cross Validation can produce more _ models, while ensemble typically has the goal of more _ models
Robust, accurate
One technique for Cross Validation is
K-folds
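A minimal sketch of how K-folds splits the data: every sample lands in the test fold exactly once and in the training set the other k - 1 times. This version uses contiguous, unshuffled folds for simplicity, which is an assumption; shuffling first is common in practice.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop

folds = list(k_fold_indices(10, 3))
```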
_____ is a newer ensemble learning technique
Random Forests
Ensemble learning often uses ___ methods
decision/regression tree
By themselves, Trees are considered ____
weak learners
Trees can be prone to
overfitting
Bagging can help
avoid overfitting
Bagging definition in class
Each model uses a random subset of training data
Boosting can help
reduce bias
What is the Random Forests technique
Ensemble method in which a multitude of decision trees are built using randomization, and their outputs are combined for prediction
The accuracy of what technique can reach the same level as ANNs?
Random Forests
CART stands for
classification and regression trees
Bootstrap Aggregation is another term for
Bagging
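A toy sketch of bootstrap aggregation: draw several resamples of the training data with replacement (the usual meaning of "bootstrap"), fit one model per resample, and average the predictions. To stay short, the "model" here is just the sample mean, a stand-in assumption for the trees normally used; all names and data are hypothetical.

```python
import random
import statistics

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def bagged_mean(data, n_models=25, seed=0):
    """Average the predictions of trivial models, each fit on its own bootstrap sample."""
    rng = random.Random(seed)
    predictions = [statistics.mean(bootstrap_sample(data, rng)) for _ in range(n_models)]
    return statistics.mean(predictions)

training_targets = [2.0, 4.0, 4.5, 5.0, 9.0]
ensemble_prediction = bagged_mean(training_targets)
```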
Text Analytics is the process of
quantifying information from raw text
What are 3 methods of text analytics
- Feature-Value Mapping
- Similarity Measures
- Vectorizing
Feature-value mapping is the same as…
Dummy Variables
In text analytics, similarity methods work by…
calculating an equivalent distance metric for text variables
What is an example of similarity method for text analytics
Levenshtein distance
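A short sketch of Levenshtein distance using the standard dynamic-programming recurrence. It counts the minimum number of single-character insertions, deletions, or substitutions needed to turn one string into the other. The function name and example strings are illustrative.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits that turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]

distance = levenshtein("kitten", "sitting")
```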
Definition of vectorizing
Conversion of raw text data into a numerical equivalent
What are 3 ways to vectorize text?
- tokenizing
- counting
- normalizing
What are “stop words”
words that may not be informative in a set of text data, that can be excluded from vectorization
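A minimal sketch tying the vectorizing steps together: tokenize the text, drop stop words, count the remaining tokens, and normalize the counts. The stop-word list, the vocabulary, and normalizing by the total in-vocabulary count are all illustrative assumptions.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "is", "on", "and"}  # tiny illustrative list

def vectorize(text, vocabulary):
    """Return normalized counts of each vocabulary word in text."""
    # Tokenize: lowercase the text and split it into word tokens.
    tokens = [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]
    # Count: occurrences of each remaining token.
    counts = Counter(tokens)
    # Normalize: divide by the total count of in-vocabulary words.
    total = sum(counts[w] for w in vocabulary)
    return [counts[w] / total if total else 0.0 for w in vocabulary]

vocabulary = ["cat", "mat", "sat"]
vector = vectorize("The cat sat on the mat and the cat is happy", vocabulary)
```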
Validation sets are often used for…
Detecting/avoiding overfitting when training ANNs