Exam2 Flashcards

1
Q

Naive Bayes predicts…

A

The probability that a new data point has each possible label (A, B, C, etc.)

2
Q

Regression models are used to predict…

A

responses which have a continuous span of values

3
Q

The adjusted R^2 can be used

A

to compare models with different numbers of terms
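
As a rough illustration of why the adjusted R^2 suits this comparison, the sketch below uses the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1), which penalizes extra terms. The function name and the sample numbers are made up for the example and do not come from the course.

```python
def adjusted_r2(r2: float, n_samples: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_predictors - 1 <= 0:
        raise ValueError("need more samples than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Hypothetical models fit to the same 11 samples: the larger model has the
# higher raw R^2, but the penalty for its extra terms outweighs the gain.
small_model = adjusted_r2(r2=0.90, n_samples=11, n_predictors=2)
large_model = adjusted_r2(r2=0.91, n_samples=11, n_predictors=6)
```

With these invented numbers, the 2-term model ends up with the higher adjusted R^2 even though its raw R^2 is lower.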

4
Q

R^2, Mean Squared Error, Mean absolute error, etc, are examples of

A

goodness measures
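
A minimal sketch of how these goodness measures could be computed by hand, using their usual textbook definitions. The function name and the data are invented for illustration.

```python
import math

def goodness_measures(y_true, y_pred):
    """Return R^2, MSE, MAE and RMSE for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)    # total sum of squares
    mse = ss_res / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mse": mse,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(mse),
    }

# Made-up responses and predictions.
y_true = [2.0, 4.0, 6.0, 8.0, 10.0]
y_pred = [2.5, 3.5, 6.5, 7.5, 10.5]
measures = goodness_measures(y_true, y_pred)
```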

5
Q

For Root Mean Squared Error (RMSE) if the value is < ? , it’s a good sign

A

< 1 standard deviation of the response variable

6
Q

For linear regression, always use…

A

more than 1 input variable

7
Q

Regression trees are

A

a non-parametric method

8
Q

What is a stochastic process

A

a random variable that is a function of some index (e.g., space or time)

9
Q

What is Lazy Learning

A

A model fit using local data. It does not create a general model but instead memorizes the training data

10
Q

What is a technique that is an example of lazy learning

A

K nearest neighbors
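
To make the "lazy" part concrete, here is a small sketch of k-nearest-neighbors classification, assuming Euclidean distance and a majority vote. Nothing is fit in advance: all the work happens at prediction time, against the stored training data. The function name and points are illustrative only.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # The stored training data is the whole "model"; distances are computed on demand.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up 2-D points forming two well-separated groups.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["A", "A", "A", "B", "B", "B"]
prediction = knn_predict(train_points, train_labels, query=(1.1, 1.0), k=3)
```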

11
Q

What is Eager Learning

A

A model fit which is “eager” to produce a general model to fit all data

12
Q

A Gaussian Process for regression produces

A

a probability distribution of functions

13
Q

For Gaussian Processes for regression, a Kernel function…

A

describes how similar each data point is to the others. It is chosen by the modeler and determines the covariance function
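
One possible kernel, shown here only as an example, is the squared-exponential (RBF) kernel. The sketch assumes 1-D inputs and a hypothetical `length_scale` parameter, and builds the covariance matrix that the kernel implies.

```python
import math

def rbf_kernel(x1: float, x2: float, length_scale: float = 1.0) -> float:
    """Squared-exponential kernel: 1 for identical inputs, near 0 for distant ones."""
    return math.exp(-((x1 - x2) ** 2) / (2.0 * length_scale ** 2))

def covariance_matrix(xs, length_scale=1.0):
    """Matrix whose (i, j) entry is the kernel value for inputs i and j."""
    return [[rbf_kernel(a, b, length_scale) for b in xs] for a in xs]

xs = [0.0, 0.5, 3.0]  # made-up inputs
K = covariance_matrix(xs)
```

Inputs 0.0 and 0.5 are close, so their covariance entry is larger than the entry for 0.0 and 3.0.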

14
Q

What are 3 main types of pre-processing for AI Model improvement

A
  • Transformations
  • Feature Selection
  • Feature Engineering
15
Q

The selection of a subset of input variables to use in the model is called

A

Feature Selection (AI improvement)

16
Q

A “Transformation” is…

A

a mathematical alteration to the data space to facilitate improving the model goodness; rotating/stretching the input features

17
Q

PCA stands for ___ and is an example of

A

Principal Components Analysis, a Transformation technique for data pre-processing

Converts a set of correlated input variables to uncorrelated variables (ie Principal Components)

18
Q

What is “Feature Engineering”?

A

Adding a new variable (feature) derived from existing variables to give the ML model important information; it often adds human context to the data set (AI model improvement)

19
Q

What is a primary purpose of text analytics in model improvement?

A

To perform feature engineering to integrate text data into the ML process

20
Q

When you transform the data using PCA, a common application is

A

Reducing dimensions, to just use the most influential (largest) principal components, while minimizing information loss

21
Q

In PCA, the ____ are referred to as the principal components

A

Eigenvectors of the covariance matrix (the eigenvalues give the variance captured by each component)

22
Q

With PCA, the transformed space is…

A

Orthonormal and uncorrelated (0 covariance)
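
The zero-covariance property can be checked on a toy case. The sketch below handles only two input variables, where the principal axes can be found in closed form as a rotation angle; it is an illustration with made-up data, not a general PCA implementation.

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

def pca_2d(xs, ys):
    """Center 2-D data and rotate it onto its principal axes."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cxx, cyy, cxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    # Angle of the leading principal axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    pc1 = [(x - mean_x) * cos_t + (y - mean_y) * sin_t for x, y in zip(xs, ys)]
    pc2 = [-(x - mean_x) * sin_t + (y - mean_y) * cos_t for x, y in zip(xs, ys)]
    return pc1, pc2

# Made-up, strongly correlated inputs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.0]
pc1, pc2 = pca_2d(xs, ys)
residual_covariance = covariance(pc1, pc2)  # zero up to floating-point rounding
```

Here `pc1` carries almost all of the variance, which is why dropping `pc2` would lose little information.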

23
Q

Creation of Dummy Variables is considered a ___ technique

A

Transformation

24
Q

Dummy variables are…

A

a numerical equivalent to categorical variables

25
If you have 3 or more categories to convert to a number, it is better to introduce a dummy variable than a scale to avoid...
introducing bias
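
A small sketch of the idea: each category gets its own 0/1 column, so no category is treated as "larger" than another, which a 1-2-3 scale would imply. The helper name and the color values are invented for the example.

```python
def dummy_variables(values):
    """Return the sorted categories and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, rows

colors = ["red", "green", "blue", "green"]  # made-up categorical feature
categories, rows = dummy_variables(colors)
# categories -> ['blue', 'green', 'red']
# rows       -> [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```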
26
What is the limiting assumption of PCA
It is based on covariance, which captures only the linear statistical variation between 2 variables. In reality, the variables could have a more complex, nonlinear relationship
27
Covariance is...
the linear statistical variation between 2 variables
28
What are some ML techniques that have built-in feature selection
Decision/Regression trees and Linear regression
29
What is a disadvantage of feature selection
You're still losing information
30
What are 3 types of importance measures used for feature selection
  • Filter (select features in pre-processing, train on your selection)
  • Wrapper (train on a subset, then add or remove iteratively)
  • Embedded (integrated into learning process, ex Trees)
31
What is the simplest importance measure to use for feature selection
Linear statistical importance - squared correlation
32
"Feature Importance" means
assigning a numerical importance value to each feature
33
When using squared correlation technique for feature importance, you could drop (input) features that...
show a very weak correlation to your target value (output) - ex 6.3
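
A hedged sketch of this filter step: compute the squared Pearson correlation between each input feature and the target, then keep only the features above a cutoff. The 0.1 threshold, the feature names, and the data are all assumptions made for the example.

```python
def squared_correlation(xs, ys):
    """Squared Pearson correlation between one feature and the target."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Made-up data: one feature tracks the target closely, the other barely at all.
features = {
    "strong": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "weak": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
}
target = [2.1, 3.9, 6.2, 8.1, 9.8, 12.0]

THRESHOLD = 0.1  # hypothetical cutoff; choose one that suits the problem
importance = {name: squared_correlation(xs, target) for name, xs in features.items()}
kept = [name for name, score in importance.items() if score >= THRESHOLD]
```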
34
Cross Validation can produce more _ models, while ensemble typically has the goal of more _ models
Robust, accurate
35
One technique for Cross Validation is
K-folds
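
As an illustration of K-folds, the sketch below splits sample indices into k folds without shuffling. Each fold serves once as the validation set while the rest are used for training. The function name is hypothetical.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
```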
36
_____ is a newer ensemble learning technique
Random Forests
37
Ensemble learning often uses ___ methods
decision/regression tree
38
By themselves, Trees are considered ____
weak learners
39
Trees can be prone to
overfitting
40
Bagging can help
avoid overfitting
41
Bagging definition in class
Each model uses a random subset of training data
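
A toy sketch of the idea, assuming the common variant in which each random subset is drawn with replacement (a bootstrap sample). To stay short, each "model" here simply predicts the mean of its own sample, and the ensemble averages those predictions; a real ensemble would train trees instead.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement: one model's training subset."""
    return [rng.choice(data) for _ in data]

def bagged_mean(data, n_models=25, seed=0):
    """Average the predictions of n_models trivial mean-predicting models."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    predictions = []
    for _ in range(n_models):
        sample = bootstrap_sample(data, rng)
        predictions.append(sum(sample) / len(sample))
    return sum(predictions) / len(predictions)

data = [3.0, 5.0, 7.0, 9.0, 11.0]  # made-up training responses
estimate = bagged_mean(data)
```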
42
Boosting can help
remove bias
43
What is the Random Forests technique
Ensemble method in which a multitude of decision trees are built, using probability measures for their construction and prediction
44
The accuracy of what technique can reach the same level as ANNs?
Random Forests
45
CART stands for
classification and regression trees
46
Bootstrap Aggregation is another term for
Bagging
47
Text Analytics is the process of
quantifying information from raw text
48
What are 3 methods of text analytics
  • Feature-Value Mapping
  • Similarity Measures
  • Vectorizing
49
Feature-value mapping is the same as...
Dummy Variables
50
In text analytics, similarity methods work by...
calculating an equivalent distance metric for text variables
51
What is an example of similarity method for text analytics
Levenshtein distance
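
For reference, a compact sketch of the Levenshtein distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. It uses the standard dynamic-programming approach, keeping one row at a time.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between strings a and b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]

distance = levenshtein("kitten", "sitting")  # 3
```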
52
Definition of vectorizing
Conversion of raw text data into a numerical equivalent
53
What are 3 ways to vectorize text?
  • tokenizing
  • counting
  • normalizing
54
What are "stop words"
words that may not be informative in a set of text data, that can be excluded from vectorization
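
The pieces of vectorizing can be sketched together: tokenize the text, drop stop words, count the remaining tokens, and normalize the counts. The stop-word list, the vocabulary, and the choice to normalize by the total count are assumptions made for this example.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "is", "on"}  # tiny illustrative list, not a standard one

def vectorize(text, vocabulary):
    """Return normalized counts of each vocabulary word in `text`."""
    # Tokenize: lowercase, keep runs of letters, then drop stop words.
    tokens = [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]
    counts = Counter(tokens)  # count
    total = sum(counts[word] for word in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[word] / total for word in vocabulary]  # normalize

vocabulary = ["cat", "mat", "sat"]
vector = vectorize("The cat sat on the mat. The cat is happy.", vocabulary)
# vector -> [0.5, 0.25, 0.25]
```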
55
Validation test sets are used often when...
Detecting/avoiding overfitting when training ANNs
56