Exam2 Flashcards by Christine Vernon

Naive Bayes predicts…

the probability that a new data point has each possible label (A, B, C, etc.)
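
As a rough illustration of the card above, here is a minimal sketch of a categorical Naive Bayes classifier that returns a probability for every label. The toy weather data, the function names, and the add-one smoothing (`alpha`, `n_values`) are assumptions made for this example, not something taken from the deck.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count label frequencies and per-label feature-value frequencies."""
    label_counts = Counter(labels)
    # value_counts[label][feature_index][value] -> count
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
    return label_counts, value_counts

def predict_proba(row, label_counts, value_counts, alpha=1.0, n_values=2):
    """Return P(label | row) for every label, with add-alpha smoothing.

    n_values is the assumed number of distinct values per feature.
    """
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        score = count / total  # prior P(label)
        for i, value in enumerate(row):
            seen = value_counts[label][i][value]
            score *= (seen + alpha) / (count + alpha * n_values)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Hypothetical toy data: (outlook, temperature) -> label
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot")]
labels = ["A", "A", "B", "B"]
label_counts, value_counts = train_naive_bayes(rows, labels)
probs = predict_proba(("sunny", "hot"), label_counts, value_counts)
print(probs)
```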

Regression models are used to predict…

responses which have a continuous span of values

The adjusted R^2 can be used

to compare models with different numbers of terms
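
A small sketch of why adjusted R^2 suits this comparison: it penalizes extra terms, so two models with the same raw R^2 but different numbers of terms get different scores. The formula is the standard one; the sample size and term counts below are hypothetical numbers chosen for illustration.

```python
def adjusted_r_squared(r_squared, n_samples, n_terms):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r_squared) * (n_samples - 1) / (n_samples - n_terms - 1)

# Hypothetical numbers: two models reach the same raw R^2 on 50 samples,
# but one uses 3 terms and the other uses 10.
small_model = adjusted_r_squared(0.90, n_samples=50, n_terms=3)
large_model = adjusted_r_squared(0.90, n_samples=50, n_terms=10)
print(small_model, large_model)  # the model with fewer terms scores higher
```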

R^2, Mean Squared Error, Mean Absolute Error, etc. are examples of

goodness measures (goodness of fit)
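
The three measures named on this card can be computed directly from true and predicted responses. The sketch below uses their standard definitions; the data values are made up for the example.

```python
def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical responses and model predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]
mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(mse, mae, r2)
```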

For Root Mean Squared Error (RMSE), a value below ___ is a good sign

1 standard deviation of the response variable
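
The rule of thumb on this card can be checked in a few lines. This is only a sketch: the data are invented, and using the sample standard deviation (`statistics.stdev`) rather than the population version is an assumption.

```python
import math
import statistics

def rmse(y_true, y_pred):
    """Root of the mean squared difference between truth and prediction."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical responses and model predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]

rmse_value = rmse(y_true, y_pred)
response_sd = statistics.stdev(y_true)  # sample standard deviation (assumption)
good_sign = rmse_value < response_sd
print(rmse_value, response_sd, good_sign)
```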

For linear regression, always use…

more than 1 input variable

Regression trees are

a non-parametric method

What is a stochastic process

a collection of random variables indexed by some variable (e.g. space or time)

What is Lazy Learning

A model fit using local data; it does not create a general model but instead memorizes the training data

What is a technique that is an example of lazy learning

K nearest neighbors
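
A minimal sketch of why K nearest neighbors counts as lazy learning: there is no training step at all, and every prediction looks up the stored training points. The points, labels, and the choice of `k=3` are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k closest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# "Training" is just keeping the data around (hypothetical toy points).
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]

near_a = knn_predict(train_points, train_labels, (1.5, 1.5))
near_b = knn_predict(train_points, train_labels, (6.5, 6.5))
print(near_a, near_b)
```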

What is Eager Learning

A model fit that is “eager” to produce a general model that fits all the data

A Gaussian Process for regression produces

a probability distribution over functions

For Gaussian Processes for regression, a Kernel function…

describes how similar each data point is to the others. It is chosen by the modeler and determines the covariance function
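
To make the kernel idea concrete, the sketch below builds a covariance matrix from a kernel. The deck does not name a specific kernel; the squared-exponential (RBF) kernel and the `length_scale` parameter used here are assumptions picked because they are a common choice.

```python
import math

def rbf_kernel(x1, x2, length_scale=1.0):
    """Similarity of two inputs: 1 when identical, decaying with distance."""
    return math.exp(-((x1 - x2) ** 2) / (2 * length_scale ** 2))

def covariance_matrix(xs, length_scale=1.0):
    """Covariance between every pair of inputs, as defined by the kernel."""
    return [[rbf_kernel(a, b, length_scale) for b in xs] for a in xs]

# Hypothetical 1-D inputs: two close together, one far away
xs = [0.0, 0.5, 3.0]
K = covariance_matrix(xs)
for row in K:
    print([round(v, 4) for v in row])
```

Nearby inputs get a covariance close to 1 and distant inputs a covariance close to 0, which is how the kernel encodes the assumption that similar inputs have similar outputs.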

What are 3 main types of pre-processing for AI Model improvement

Transformations
Feature Selection
Feature Engineering

The selection of a subset of input variables to use in the model is called

Feature Selection (AI improvement)

A “Transformation” is…

a mathematical alteration of the data space intended to improve model goodness, e.g. rotating or stretching the input features

PCA stands for ___ and is an example of

Principal Component Analysis, a transformation technique for data pre-processing

It converts a set of correlated input variables into uncorrelated variables (i.e. principal components)

What is “Feature Engineering”?

Adding a new variable (feature) derived from existing variables to give the ML model important information; it often adds human context to the data set (AI model improvement)

What is a primary purpose of text analytics in model improvement?

To perform feature engineering to integrate text data into the ML process

When you transform the data using PCA, a common application is

Reducing dimensions by keeping only the most influential principal components (those with the largest variance), while minimizing information loss

In PCA, the ____ are referred to as the principal components

Eigenvectors of the covariance matrix (the matching eigenvalues give the variance along each component)

With PCA, the transformed space is…

Orthonormal and uncorrelated (0 covariance)
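
The PCA cards above can be checked numerically. The following is a sketch for two variables only, using the closed-form eigen-decomposition of a 2x2 covariance matrix; it assumes the two variables have nonzero covariance, and the data values are made up for illustration.

```python
import math

def covariance(u, v):
    """Sample covariance of two equally long lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def pca_2d(x, y):
    """Eigenvalues and unit eigenvectors of the 2x2 covariance matrix.

    Assumes covariance(x, y) != 0 so the eigenvector formula is well defined.
    """
    a, b, c = covariance(x, x), covariance(x, y), covariance(y, y)
    mid = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigenvalues = [mid + spread, mid - spread]  # largest first
    components = []
    for lam in eigenvalues:
        vx, vy = b, lam - a  # solves (a - lam) * vx + b * vy = 0
        norm = math.hypot(vx, vy)
        components.append((vx / norm, vy / norm))
    return eigenvalues, components

def project(x, y, component):
    """Coordinates of the centered data along one principal direction."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [(xi - mx) * component[0] + (yi - my) * component[1]
            for xi, yi in zip(x, y)]

# Hypothetical correlated inputs
x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
y = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

eigenvalues, components = pca_2d(x, y)
z1 = project(x, y, components[0])
z2 = project(x, y, components[1])
print(covariance(x, y))    # original variables are correlated
print(covariance(z1, z2))  # transformed variables: approximately 0
```

Dropping `z2` and keeping only `z1` would be the dimension-reduction use: `z1` carries the larger share of the variance.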

Creation of Dummy Variables is considered a ___ technique

Transformation

Dummy variables are…

a numerical equivalent to categorical variables

If you have 3 or more categories to convert to numbers, it is better to introduce dummy variables than a single numeric scale, to avoid...

introducing bias (a scale implies an ordering and spacing between categories that may not exist)
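
A short sketch of dummy-variable creation for a three-category variable. The function name and the color data are hypothetical; the point is that each category gets its own 0/1 column instead of a position on a scale such as 1, 2, 3.

```python
def make_dummies(values):
    """Return the sorted category names and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows

# Hypothetical categorical column
colors = ["red", "green", "blue", "green"]
categories, rows = make_dummies(colors)
print(categories)
for color, row in zip(colors, rows):
    print(color, row)
```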

What is the limiting assumption of PCA

It is based on covariance, which measures only the linear statistical variation between 2 variables. In reality the variables could have a more complex, nonlinear relationship

Covariance is...

the linear statistical variation between 2 variables

What are some ML techniques that have built-in feature selection

Decision/Regression trees and Linear regression

What is a disadvantage of feature selection

You're still losing information

What are 3 types of importance measures used for feature selection

- Filter (select features in pre-processing, then train on your selection)
- Wrapper (train on a subset, then add or remove features iteratively)
- Embedded (integrated into the learning process, e.g. trees)

What is the simplest importance measure to use for feature selection

Linear statistical importance - squared correlation

"Feature Importance" means

assigning a numerical importance value to each feature

When using squared correlation technique for feature importance, you could drop (input) features that...

show a very weak correlation to your target value (output) - ex 6.3
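
The squared-correlation filter can be sketched as follows. The feature names, the data, and the cutoff of 0.1 are all hypothetical; what counts as "very weak" depends on the problem.

```python
def squared_correlation(x, y):
    """Square of the Pearson correlation between a feature and the target."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical inputs and output
features = {
    "size": [1, 2, 3, 4, 5],
    "noise": [3, 1, 4, 1, 3],
}
target = [2.1, 3.9, 6.2, 7.8, 10.1]

importance = {name: squared_correlation(values, target)
              for name, values in features.items()}
threshold = 0.1  # hypothetical cutoff for "very weak"
kept = [name for name, score in importance.items() if score >= threshold]
print(importance, kept)
```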

Cross Validation can produce more _ models, while ensemble learning typically has the goal of more _ models

Robust, accurate

One technique for Cross Validation is

K-folds
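
A minimal sketch of how K-fold splitting works: the data are cut into K folds, and each fold serves once as the held-out set while the rest is used for training. The helper name and the sizes (10 samples, 3 folds) are assumptions; shuffling is omitted to keep the example short.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```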

_____ is a newer ensemble learning technique

Random Forests

Ensemble learning often uses ___ methods

decision/regression tree

By themselves, Trees are considered ____

weak learners

Trees can be prone to

overfitting

Bagging can help

avoid overfitting

Bagging definition in class

Each model uses a random subset of training data
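
The class definition can be illustrated with a small bagging sketch. Several assumptions are made here: the base learner is a one-split regression tree (a "stump") on a single feature, the random subsets are bootstrap samples drawn with replacement, predictions are combined by averaging, and the data are invented.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree: (threshold, left_mean, right_mean)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not right:
            continue  # a split must leave points on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: predict the overall mean
        mean = sum(ys) / len(ys)
        return (max(xs), mean, mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_bagged_stumps(xs, ys, n_models=25, seed=0):
    """Train each stump on its own random sample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def predict_bagged(models, x):
    """Aggregate by averaging the individual predictions."""
    return sum(predict_stump(m, x) for m in models) / len(models)

# Hypothetical data with a jump between x = 4 and x = 5
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
models = fit_bagged_stumps(xs, ys)
low = predict_bagged(models, 1.5)
high = predict_bagged(models, 7.5)
print(low, high)
```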

Boosting can help

reduce bias

What is the Random Forests technique

An ensemble method in which a multitude of decision trees are used, with probability measures governing their construction and prediction

The accuracy of which technique can reach the same level as ANNs?

Random Forests

CART stands for

classification and regression trees

Bootstrap Aggregation is another term for

Bagging

Text Analytics is the process of

quantifying information from raw text

What are 3 methods of text analytics

- Feature-Value Mapping
- Similarity Measures
- Vectorizing

Feature-value mapping is the same as...

Dummy Variables

In text analytics, similarity methods work by...

calculating an equivalent distance metric for text variables

What is an example of similarity method for text analytics

Levenshtein distance
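
Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below uses the usual dynamic-programming formulation, keeping only two rows of the table; the example words are arbitrary.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning string a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))
print(levenshtein("flaw", "lawn"))
```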

Definition of vectorizing

Conversion of raw text data into a numerical equivalent

What are 3 ways to vectorize text?

- tokenizing
- counting
- normalizing

What are "stop words"

words that may not be informative in a set of text data, that can be excluded from vectorization
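
The three vectorizing steps and the stop-word filter can be combined in one short sketch. The stop-word list, the vocabulary, the tokenizing regular expression, and the choice to normalize counts so they sum to 1 are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list
STOP_WORDS = {"the", "a", "is", "of", "and"}

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def vectorize(text, vocabulary, stop_words=STOP_WORDS):
    """Tokenize, count, and normalize into one number per vocabulary word."""
    tokens = [t for t in tokenize(text) if t not in stop_words]
    counts = Counter(tokens)
    total = sum(counts[word] for word in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[word] / total for word in vocabulary]

vocabulary = ["cat", "dog", "sat"]
vector = vectorize("The cat sat and the cat is a cat", vocabulary)
print(vector)
```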

Validation sets are often used when...

detecting or avoiding overfitting while training ANNs
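
One common way a validation set is used for this, sketched below, is early stopping: training halts once the validation loss stops improving, since a rising validation loss alongside a falling training loss suggests overfitting. The deck does not mention early stopping by name, so treat it as an assumed example; the loss values and the `patience` setting are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch with the best validation loss.

    Scanning stops once `patience` epochs pass without improvement.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # validation loss has stopped improving
    return best_epoch

# Hypothetical validation losses per epoch: improving, then getting worse
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]
stop_at = early_stopping_epoch(val_losses)
print(stop_at)
```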

Exam2 Flashcards

(56 cards)