AI Flashcards

1
Q

What are the 2 types of data?

A

Numerical Data and Categorical Data.

2
Q

What kind of value does Numerical or Continuous data accept?

A

Can accept any value within a finite or infinite interval (e.g., height, weight, temperature, blood glucose, …).

3
Q

What are the 2 types of Numerical or Continuous data?

A

Interval and ratio.

4
Q

Describe data on an interval scale.

A

Can be added and subtracted but cannot be meaningfully multiplied or divided because there is no true zero. For example, we cannot say that one day is twice as hot as another day.

5
Q

Describe data on a ratio scale.

A

Has true zero and can be added, subtracted, multiplied or divided (e.g., weight).

6
Q

A Categorical or Discrete variable is one that has ….. .

A

two or more categories (values).

7
Q

What are the 2 types of categorical variables?

A

Nominal and ordinal.

8
Q

Describe Nominal variables.

A

Has no intrinsic ordering to its categories. For example, gender is a categorical variable having two categories (male and female) with no intrinsic ordering to the categories.

9
Q

Describe Ordinal variables.

A

Has a clear ordering. For example, temperature as a variable with three orderly categories (low, medium and high).

10
Q

What is a frequency table?

A

Is a way of counting how often each category of the variable in question occurs. It may be enhanced by the addition of percentages that fall into each category.

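A minimal sketch of such a table in plain Python; the `outlook` values are made up for illustration:

```python
from collections import Counter

# Made-up categorical variable
outlook = ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Rainy", "Overcast"]

counts = Counter(outlook)     # how often each category occurs
total = sum(counts.values())
# Enhanced with the percentage that falls into each category
freq_table = {cat: (n, round(100 * n / total, 1)) for cat, n in counts.items()}
print(freq_table)  # {'Sunny': (2, 28.6), 'Overcast': (2, 28.6), 'Rainy': (3, 42.9)}
```
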
11
Q

What is Encoding or continuization?

A

Is the transformation of categorical variables to binary or numerical counterparts. An example is to treat male or female for gender as 1 or 0. Categorical variables must be encoded in many modeling methods (e.g., linear regression, SVM, neural networks).

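A sketch of both flavors in plain Python, on made-up values: a two-category variable maps to 1/0, while a variable with more categories gets one indicator column per category (one-hot):

```python
# Binary encoding of a two-category variable (gender as 1/0), plus a minimal
# one-hot encoding for a variable with more than two categories.
gender = ["male", "female", "female", "male"]
gender_encoded = [1 if g == "male" else 0 for g in gender]

outlook = ["Sunny", "Overcast", "Rainy"]
categories = sorted(set(outlook))   # ['Overcast', 'Rainy', 'Sunny']
one_hot = [[1 if v == c else 0 for c in categories] for v in outlook]

print(gender_encoded)  # [1, 0, 0, 1]
print(one_hot)         # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
```
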
12
Q

What are the 2 types of encoding?

A

Binary and Target-based.

13
Q

What is Binning or discretization?

A

Is the process of transforming numerical variables into categorical counterparts.
An example is to bin values for Age into categories such as 20-39, 40-59, and 60-79.

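The Age example can be sketched as a small Python function; the bin edges follow the card, while the "other" fallback for out-of-range ages is an added assumption:

```python
def bin_age(age):
    """Map a numerical Age to the categorical bins from the example.
    The "other" fallback for out-of-range ages is an added assumption."""
    if 20 <= age <= 39:
        return "20-39"
    if 40 <= age <= 59:
        return "40-59"
    if 60 <= age <= 79:
        return "60-79"
    return "other"

print([bin_age(a) for a in (25, 44, 67, 81)])  # ['20-39', '40-59', '60-79', 'other']
```
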
14
Q

Numerical variables are usually discretized in the modeling methods based on ….. .

A

frequency tables (e.g., decision trees).

15
Q

Binning may improve accuracy of the predictive models by ….. or ….. .

A

reducing the noise, non-linearity.

16
Q

What is a Dataset?

A

Is a collection of data, usually presented in a tabular form. Each column represents a particular variable, and each row corresponds to a given member of the data.

17
Q

Alternatives for columns: ….., ….., ….. .

A

Fields, Attributes, Variables.

18
Q

Alternatives for rows: ….., ….., ….., ….., ….., ….. .

A

Records, Objects, Cases, Instances, Examples, Vectors.

19
Q

Alternatives for values: ….. .

A

Data.

20
Q

In predictive modeling, ….. or ….. are the input variables.

A

predictors, attributes.

21
Q

In predictive modeling, ….. or ….. is the output variable.

A

target, class attribute.

22
Q

In predictive modeling, the output variable value is determined by ….. and ….. .

A

the values of the predictors, function of the predictive model.

23
Q

Pattern recognition predicts the future by ….. .

A

means of modeling.

24
Q

What is Predictive modeling?

A

Is the process by which a model is created to predict an outcome.

25
Q

If the outcome is categorical it is called ….. .

A

Classification.

26
Q

If the outcome is numerical it is called ….. .

A

Regression.

27
Q

What is Descriptive modeling or clustering?

A

Is the assignment of observations into clusters so that observations in the same cluster are similar.

28
Q

What is Classification?

A

Is predicting the value of a categorical variable (target or class) by building a model based on one or more numerical and/or categorical variables (predictors or attributes).

29
Q

What is the ZeroR classifier?

A

Is the simplest classification method which relies on the target and ignores all predictors.

30
Q

ZeroR classifier simply predicts the ….. .

A

Majority category (class).

31
Q

Although there is no predictability power in ZeroR, it is useful for ….. .

A

determining a baseline performance as a benchmark for other classification methods.

32
Q

What is the ZeroR classifier algorithm?

A

Construct a frequency table for the target and select its most frequent value.

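A minimal sketch of this algorithm in Python, on a made-up Play Golf target:

```python
from collections import Counter

def zero_r(targets):
    """ZeroR: frequency table over the target; always predict its most frequent value."""
    return Counter(targets).most_common(1)[0][0]

play_golf = ["yes", "yes", "no", "yes", "no", "yes", "yes", "no", "yes"]
print(zero_r(play_golf))  # yes  (6 of 9 -- the baseline predicts yes for every case)
```
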
33
Q

What is the OneR classifier?

A

Short for “One Rule”, is a simple classification algorithm that generates one rule for each predictor in the data, then selects the rule with the smallest total error as its “one rule”.

34
Q

To create a rule for a predictor, we ….. .

A

Construct a frequency table for each predictor against the target.

35
Q

OneR produces rules only slightly less accurate than state-of-the-art classification algorithms while ….. .

A

producing rules that are simple for humans to interpret.

36
Q

What is the OneR classifier algorithm?

A

  • For each predictor, and for each value of that predictor, make a rule as follows: count how often each value of the target (class) appears, find the most frequent class, and make the rule assign that class to this value of the predictor.
  • Calculate the total error of the rules of each predictor.
  • Choose the predictor with the smallest total error.

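The steps above can be sketched in Python; the tiny dataset is made up, and ties between predictors are broken by keeping the first one seen:

```python
from collections import Counter, defaultdict

def one_r(rows, target):
    """OneR: per predictor, map each value to its majority class;
    pick the predictor whose rules make the fewest errors on the data."""
    best = None
    for attr in rows[0]:
        if attr == target:
            continue
        tables = defaultdict(Counter)          # value -> class frequency table
        for row in rows:
            tables[row[attr]][row[target]] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in tables.items()}
        errors = sum(n for v, c in tables.items()
                     for cls, n in c.items() if cls != rule[v])
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

rows = [
    {"outlook": "sunny", "windy": "no",  "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no",  "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
attr, rule, errors = one_r(rows, "play")
print(attr, rule, errors)  # outlook {'sunny': 'no', 'rainy': 'yes'} 1
```
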
37
Q

A low total error means ….. .

A

a higher contribution to the predictability of the model.

38
Q

A is a random variable if:

A
  • A denotes something about which we are uncertain, perhaps the outcome of a randomized experiment.

39
Q

What is P(A)?

A

The fraction of possible worlds in which A is true.

40
Q

What is the set of possible worlds called?

A

Sample space (S).

41
Q

The Naive Bayesian (NB) classifier is based on ….. .

A

Bayes’ theorem with independence assumptions between predictors.

42
Q

A Naive Bayesian model is easy to build, with no complicated iterative parameter estimation which makes it particularly useful for ….. .

A

very large datasets.

43
Q

Why does the Naive Bayesian classifier often do surprisingly well, and why is it widely used?

A

Because it often outperforms more sophisticated classification methods.

44
Q

What is the Naive Bayes classifier algorithm?

A
Bayes’ theorem provides a way of calculating the posterior probability, P(c|x), from P(c), P(x), and P(x|c). The Naive Bayes classifier assumes that the effect of the value of a predictor (x) on a given class (c) is independent of the values of the other predictors.
This assumption is called class conditional independence.
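
A toy frequency-count version of this computation, hand-rolled on made-up data (not a library implementation); note that an unseen value-class combination yields a zero probability, which is exactly the zero-frequency problem:

```python
from collections import Counter, defaultdict

def naive_bayes(rows, target, query):
    """Score each class c by P(c) * prod_i P(x_i | c), using raw counts."""
    classes = Counter(row[target] for row in rows)
    cond = defaultdict(Counter)                # (attr, class) -> value counts
    for row in rows:
        for attr, value in row.items():
            if attr != target:
                cond[(attr, row[target])][value] += 1
    scores = {}
    for c, n_c in classes.items():
        p = n_c / len(rows)                    # prior P(c)
        for attr, value in query.items():
            p *= cond[(attr, c)][value] / n_c  # likelihood P(x|c)
        scores[c] = p
    return max(scores, key=scores.get)

rows = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "sunny", "play": "yes"},
]
print(naive_bayes(rows, "play", {"outlook": "sunny"}))  # no
```
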
45
Q

How to solve the zero-frequency problem?

A

Add 1 to the count for every attribute value-class combination (Laplace estimator) when an attribute value (Outlook=Overcast) doesn’t occur with every class value (Play Golf=no).

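A numeric sketch of the Laplace estimator, with made-up counts for one class:

```python
# Made-up counts of Outlook values within one class (say Play Golf=no):
# add 1 to every attribute value-class count so the unseen combination
# (here Outlook=Overcast) no longer produces a zero probability.
counts = {"sunny": 3, "overcast": 0, "rainy": 2}
k = len(counts)                 # number of distinct attribute values
n = sum(counts.values())

p_raw = {v: c / n for v, c in counts.items()}
p_laplace = {v: (c + 1) / (n + k) for v, c in counts.items()}
print(p_raw["overcast"], p_laplace["overcast"])  # 0.0 0.125
```
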
46
Q

What to do when working with numerical predictors?

A
  • Numerical variables need to be transformed to their categorical counterparts (binning) before constructing their frequency tables.
  • The other option we have is using the distribution of the numerical variable to have a good guess of the frequency.
  • For example, one common practice is to assume normal distributions for numerical variables.
47
Q

What are the 2 parameters that define the probability density function (PDF) for the normal distribution?

A

Mean and standard deviation.

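The PDF with these two parameters can be written directly; the example numbers (class mean 79.1, standard deviation 10.2, evaluated at x = 74) are illustrative:

```python
import math

def normal_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

print(round(normal_pdf(74, 79.1, 10.2), 4))  # 0.0345
```
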
48
Q

What is Decision Tree Classification?

A
  • Decision tree builds classification or regression models in the form of a tree structure.
  • It breaks down a dataset into smaller and smaller subsets while at the same time an associated decision tree is incrementally developed.
  • The final result is a tree with decision nodes and leaf nodes.
  • Decision trees can handle both categorical and numerical data.
49
Q

Describe a decision node.

A

A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy).

50
Q

Describe a leaf node.

A

A leaf node (e.g., Play) represents a classification or decision.

51
Q

The topmost decision node in a tree which corresponds to the best predictor is called ….. .

A

root node.

52
Q

What is the decision tree classifier algorithm?

A

The core algorithm, called ID3, employs a top-down, greedy search through the space of possible branches with no backtracking. ID3 uses Entropy and Information Gain to construct a decision tree.

53
Q

What is Entropy?

A
  • A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous).
  • The ID3 algorithm uses entropy to calculate the homogeneity of a sample. If the sample is completely homogeneous the entropy is zero, and if the sample is equally divided it has an entropy of one.
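
A minimal entropy function matching this description, on made-up labels:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class distribution, in bits."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

print(entropy(["yes"] * 9 + ["no"] * 9))  # 1.0  (equally divided)
# A completely homogeneous sample, e.g. entropy(["yes"] * 14), gives zero.
```
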
54
Q

The information gain is based on ….. .

A

the decrease in entropy after a dataset is split on an attribute.

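A sketch of the computation on a made-up four-row dataset:

```python
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    return -sum((n / len(labels)) * math.log2(n / len(labels))
                for n in counts.values())

def information_gain(rows, attr, target):
    """Entropy of the target minus the weighted entropy after splitting on attr."""
    total = entropy([r[target] for r in rows])
    split = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        split += len(subset) / len(rows) * entropy(subset)
    return total - split

rows = [
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "yes"},
]
print(round(information_gain(rows, "outlook", "play"), 3))  # 0.311
```
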
55
Q

Constructing a decision tree is all about ….. .

A

Finding the attribute that returns the highest information gain.

56
Q

How to transform a decision tree into decision rules?

A

By mapping from the root node to the leaf nodes one by one.

57
Q

What are some decision tree issues?

A
  • Working with continuous attributes (binning).
  • Avoiding overfitting.
  • Super Attributes (attributes with many values).
  • Working with missing values.
58
Q

What is Logistic Regression?

A

Logistic regression predicts the probability of an outcome that can only have two values (i.e. a dichotomy).

59
Q

In logistic regression, what is the prediction based on?

A

The use of one or several predictors (numerical and categorical).

60
Q

What are the 2 reasons why a linear regression is not appropriate for predicting the value of a binary variable?

A
  • A linear regression will predict values outside the acceptable range (e.g. predicting probabilities outside the range 0 to 1).
  • Since the dichotomous experiments can only have one of two possible values for each experiment, the residuals will not be normally distributed about the predicted line.
61
Q

A logistic regression produces a logistic curve, which is limited to values between ….. .

A

0 and 1.

62
Q

Logistic regression is similar to a linear regression, but the curve is constructed using ….. , rather than the probability.

A

The natural logarithm of the “odds” of the target variable.

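The logistic curve and its inverse (the logit, i.e. the natural log of the odds) can be sketched as:

```python
import math

def logistic(z):
    """Map the log-odds (a linear combination of predictors) to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

def logit(p):
    """Inverse of the logistic curve: the natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1 - p))

print(logistic(0))           # 0.5
print(round(logit(0.5), 1))  # 0.0
```
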
63
Q

….. is the method used to estimate coefficients for the best fit line in linear regression.

A

Ordinary least square regression.

64
Q

Logistic regression uses ….. to obtain the model coefficients that relate predictors to the target.

A

Maximum likelihood estimation (MLE).

65
Q

A ….. value is available to indicate the adequacy of the regression model.

A

pseudo R2.

66
Q

What is a Likelihood ratio test?

A

Is a test of the significance of the difference between the likelihood ratio for the baseline model and the likelihood ratio for a reduced model.

67
Q

A ….. is the difference between the likelihood ratio for the baseline model and the likelihood ratio for a reduced model.

A

model chi-square

68
Q

….. is used to test the statistical significance of each coefficient (b) in the model (i.e., predictors contribution).

A

Wald test

69
Q

What is the Core of ML?

A

Making predictions or decisions from Data.

70
Q

What are the 5 principles of Representation?

A
  • Coverage.
  • Concision.
  • Directness.
  • Templates.
  • Histograms.
71
Q

What is K nearest neighbors (KNN)?

A

Is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., distance functions).

72
Q

….. has been used in statistical estimation and pattern recognition as a non-parametric technique since the beginning of the 1970s.

A

K nearest neighbors (KNN).

73
Q

What is the KNN algorithm?

A
  • A case is classified by a majority vote of its neighbors, with the case being assigned to the class most common amongst its K nearest neighbors measured by a distance function.
  • If K = 1, then the case is simply assigned to the class of its nearest neighbor.
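
A minimal sketch of this majority vote with Euclidean distance; the training points are made up:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training cases
    (Euclidean distance via math.dist)."""
    nearest = sorted(train, key=lambda case: math.dist(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up two-class training points: (features, label)
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"),
         ((5.0, 5.0), "b"), ((5.5, 4.5), "b"), ((4.8, 5.2), "b")]
print(knn_predict(train, (5.1, 5.0), k=3))  # b
print(knn_predict(train, (1.1, 0.9), k=1))  # a  (K=1: class of the single nearest neighbor)
```
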
74
Q

What are the 3 distance functions?

A
  • Euclidean
  • Manhattan
  • Minkowski
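
All three can be expressed through the Minkowski form, of which Manhattan (p = 1) and Euclidean (p = 2) are special cases; a sketch:

```python
def minkowski(x, y, p):
    """Minkowski distance; p = 1 gives Manhattan, p = 2 gives Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

x, y = (0, 0), (3, 4)
print(minkowski(x, y, 1))  # 7.0 (Manhattan)
print(minkowski(x, y, 2))  # 5.0 (Euclidean)
```
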
75
Q

All three distance measures are only valid for ….. variables.

A

Continuous.

76
Q

In the case of categorical variables, the ….. distance must be used.

A

Hamming.

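A minimal sketch of Hamming distance, counting mismatched categories between two made-up cases:

```python
def hamming(x, y):
    """Hamming distance: the number of positions where the categories differ."""
    return sum(a != b for a, b in zip(x, y))

print(hamming(("male", "Sunny", "high"), ("female", "Sunny", "low")))  # 2
```
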
77
Q

Choosing the optimal value for K is best done by first ….. .

A

Inspecting the data.

78
Q

In general, a ….. K value is more precise as it reduces the overall noise but there is no guarantee.

A

Large.

79
Q

….. is a way to retrospectively determine a good K value by using an independent dataset to validate the K value.

A

Cross-validation.

80
Q

Historically, the optimal K for most datasets has been between ….. and ….. . That produces much better results than 1NN.

A

3, 10.

81
Q

One major drawback in calculating distance measures directly from the training set is in the case where ….. or ….. .

A

Variables have different measurement scales, there is a mixture of numerical and categorical variables.

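A common remedy for the different-measurement-scales case is to normalize each numerical variable before computing distances; a min-max sketch on made-up ages:

```python
def min_max_scale(values):
    """Rescale a numerical variable to [0, 1] so that predictors measured on
    different scales contribute comparably to a distance function."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_scale([20, 35, 50, 80]))  # [0.0, 0.25, 0.5, 1.0]
```
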
82
Q

What is Linear Discriminant Analysis (LDA)?

A

LDA is a classification method based upon the concept of searching for a linear combination of the variables that best separates two classes. It is simple, mathematically robust, and often produces models whose accuracy is as good as that of more complex methods.

83
Q

One way of assessing the effectiveness of the discrimination is to calculate ….. .

A

the Mahalanobis distance between two groups.

84
Q

In LDA model assessment, what does a distance greater than 3 mean?

A

It means that the two group averages differ by more than 3 standard deviations, so the overlap (probability of misclassification) is quite small.

85
Q

A simple linear correlation between the LDA model scores and predictors can be used to ….. .

A

test which predictors contribute significantly to the discriminant function.

86
Q

How to avoid overfitting?

A
  • Stop growing when the data split is not statistically significant.
  • Grow full tree, then post-prune.
87
Q

What are decision tree Pros?

A
  • Simple to understand and interpret.
  • Little data preparation and little computation.
  • Indicates which attributes are most important for classification.
88
Q

What are decision tree Cons?

A
  • Learning an optimal decision tree is NP-complete.
  • Perform poorly with many classes and small data.
  • Computationally expensive to train.
  • Over-complex trees do not generalize well from the training data (overfitting).