Unit 2 Flashcards

1
Q

Decision Tree Learning

A

Definition: Supervised machine learning method where the data is repeatedly split based on a parameter (feature) value.

Entities: Decision nodes (splits) and leaves (decisions/outcomes).

Types: Classification Trees (Categorical Outcomes), Regression Trees (Continuous Outcomes).
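As a minimal sketch of the two tree types (assuming scikit-learn is available; the toy data is hypothetical):

```python
# Minimal sketch: classification vs regression trees with scikit-learn.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0], [1], [2], [3]]  # one numeric feature

# Classification tree: predicts a categorical outcome.
clf = DecisionTreeClassifier(random_state=0).fit(X, [0, 0, 1, 1])
print(clf.predict([[0.5]]))  # a point on the low side of the learned split

# Regression tree: predicts a continuous outcome.
reg = DecisionTreeRegressor(random_state=0).fit(X, [0.0, 0.1, 0.9, 1.0])
print(reg.predict([[2.5]]))
```

Both learners share the same splitting machinery; only the leaf values (class labels vs averaged numbers) differ.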

2
Q

Representing Concepts as Decision Trees

A

Tree Structure: Root node, leaf node, splitting, branches/subtrees.

Building Trees: Using the CART (Classification and Regression Tree) algorithm.
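Scikit-learn's decision trees implement an optimised CART; as a sketch (assuming it is installed), the root split, branches, and leaves of a fitted tree can be inspected directly:

```python
# Sketch: viewing the root node, split threshold, and leaves of a CART tree.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# The printout shows the root split at the top and class labels at the leaves.
text = export_text(tree, feature_names=["x"])
print(text)
```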

3
Q

Recursive Induction of Decision Trees

A

Recursive Partitioning: Statistical method that analyses variables by repeatedly splitting the data into subgroups.

Decision Tree Example: Predicting survival of Titanic passengers based on passenger variables.
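The recursion can be sketched in plain Python: stop when a node is pure, otherwise pick the threshold with the lowest Gini impurity and recurse on both halves (single numeric feature and Titanic-flavoured labels are illustrative assumptions):

```python
# Recursive induction sketch: one numeric feature, separable toy data.
def gini(points):
    n = len(points)
    counts = {}
    for _, lab in points:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def build_tree(points):
    labels = {lab for _, lab in points}
    if len(labels) == 1:              # base case: pure node becomes a leaf
        return labels.pop()
    xs = sorted({x for x, _ in points})
    best_t, best_score = None, float("inf")
    for a, b in zip(xs, xs[1:]):      # candidate thresholds: midpoints
        t = (a + b) / 2
        left = [p for p in points if p[0] <= t]
        right = [p for p in points if p[0] > t]
        score = gini(left) * len(left) + gini(right) * len(right)
        if score < best_score:
            best_t, best_score = t, score
    left = [p for p in points if p[0] <= best_t]
    right = [p for p in points if p[0] > best_t]
    return {"t": best_t, "le": build_tree(left), "gt": build_tree(right)}

def predict(tree, x):
    while isinstance(tree, dict):
        tree = tree["le"] if x <= tree["t"] else tree["gt"]
    return tree

pts = [(0, "died"), (1, "died"), (2, "survived"), (3, "survived")]
tree = build_tree(pts)
print(tree)
```

Note the sketch assumes separable data; a production version would also cap the depth or minimum node size to guarantee termination.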

4
Q

Picking the Best Splitting Attribute

A

Information Gain: Measure for choosing the feature that provides the best split.

Ex: Classifying people at a theatre based on attributes.

5
Q

Entropy and Information Gain

A

Entropy: Measure of randomness and disorder in information.
Information Gain: Reduction in entropy, calculated for each attribute.
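These two quantities can be sketched in a few lines of standard-library Python (the example labels are hypothetical):

```python
# Entropy and information gain for categorical labels.
from math import log2
from collections import Counter

def entropy(labels):
    """Randomness of a label set: 0 when pure, 1 bit for a 50/50 binary split."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Reduction in entropy after splitting `labels` into `groups` on an attribute."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)

print(entropy(["a", "a", "b", "b"]))                             # 1.0
print(information_gain(["a", "a", "b", "b"], [["a", "a"], ["b", "b"]]))  # 1.0
```

A split with the highest information gain is the best splitting attribute in the sense of the previous card.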

6
Q

Computational Complexities of ML Models

A

Ex: K-Nearest Neighbour, Logistic Regression, SVM, Decision Tree, Random Forest, Naive Bayes.
Time and Space Complexity: Big O Notation.

7
Q

Occam’s Razor

A

Principle: The simplest explanation is likely the correct one.
Application in ML: Balancing model complexity for accurate predictions.

8
Q

Overfitting in ML

A

Definition: Model captures noise in the training data, reducing accuracy on new data.
Reasons: Uncleaned data, high variance, inadequate training data, model complexity.
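A small sketch of the effect (assuming scikit-learn; the labels below are deliberately noise-like): an unconstrained tree memorises noisy labels perfectly, while a depth-limited tree cannot.

```python
# Overfitting sketch: full-depth tree memorises noise, shallow tree does not.
from sklearn.tree import DecisionTreeClassifier

X = [[i] for i in range(10)]
y = [0, 1, 0, 1, 0, 0, 1, 1, 0, 1]   # labels with essentially no pattern

deep = DecisionTreeClassifier(random_state=0).fit(X, y)
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

print(deep.score(X, y))     # 1.0: the noise has been memorised
print(shallow.score(X, y))  # below 1.0: the simpler model cannot fit the noise
```

Perfect training accuracy on noise is exactly the symptom overfitting describes; it will not carry over to new data.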

9
Q

Noisy Data and Pruning

A

Noisy Data: Corrupted or distorted data with a low signal-to-noise ratio.
Pruning: Compression technique that removes non-critical parts of a decision tree, reducing its size and overfitting.
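One concrete pruning mechanism is cost-complexity pruning, exposed in scikit-learn via the `ccp_alpha` parameter (an assumed dependency; the noisy labels are illustrative):

```python
# Pruning sketch: cost-complexity pruning shrinks a tree grown on noisy labels.
from sklearn.tree import DecisionTreeClassifier

X = [[i] for i in range(10)]
y = [0, 1, 0, 1, 0, 0, 1, 1, 0, 1]

full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.1).fit(X, y)

# Pruning removes subtrees whose complexity is not worth their impurity reduction.
print(full.tree_.node_count, pruned.tree_.node_count)
```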

10
Q

Experimental Evaluation of Learning Algorithms

A

Hypothesis Accuracy: Estimating accuracy using statistical methods.
Factors: Testing approach, likelihood of estimation error, and evaluation strategy with limited data.

11
Q

Comparing Learning Algorithms and Cross-Validation

A

Factors: Time complexity, space complexity, sample complexity, unbiased data, online/offline algorithms, parallelizability, parametricity.
Cross-Validation Types: Holdout method, K-fold cross validation, Leave-p-out, Leave-one-out.
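As a sketch of the k-fold idea (assuming scikit-learn): the data is cut into k subsets, and each subset serves as the test fold exactly once.

```python
# K-fold cross-validation sketch: generate the train/test index splits.
from sklearn.model_selection import KFold

X = [[i] for i in range(10)]
folds = list(KFold(n_splits=5).split(X))

# 5 folds: each sample appears in exactly one test fold (2 per fold here).
for train_idx, test_idx in folds:
    print(list(train_idx), list(test_idx))
```

In practice the model is fitted on each training split and scored on the matching test fold, and the k scores are averaged.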

12
Q

Learning Curves and Statistical Hypothesis Testing

A

Learning Curves: Plots showing progress over training experience.
Hypothesis Testing: Confirming observations using sample data and statistical tests.

13
Q

Random Forest Complexity

A

Training Time: O(n * log(n) * d * k), k being the number of Decision Trees (n samples, d features).
Run-time: O(depth of tree * k).
Space Complexity: O(depth of tree * k).

14
Q

Naive Bayes Complexity

A

Training Time: O(n*d), for n training samples with d features.

Run-time: O(c*d), retrieving the d stored feature statistics for each of the c classes.

15
Q

Occam’s Razor in Model Selection

A

Model Selection: Choosing the appropriate model for a machine learning problem.
Balance: Achieving a balance between model simplicity and accuracy.

16
Q

Limitations of Cross-Validation

A

Computational Resources: Cross-validation can be computationally expensive.
Unseen Data: A held-out test fold may contain crucial information the model never gets to train on.

17
Q

Learning Curve Types

A

Diminishing-Returns Curve: Rapid progression initially, slows over time.
Increasing-Returns Curve: Progression accelerates over time.
Increasing-Decreasing Returns Curve (S-curve): Combination of both.
Complex Learning Curve: Varied progression patterns.

18
Q

Hypothesis Testing in ML

A

Purpose: Confirming observations about the population using sample data.
Null Hypothesis: Assumes no significant difference.
Alternate Hypothesis: Assumes a significant difference.
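A common concrete test is the two-sample t-test; sketched with SciPy (an assumed dependency, with made-up error measurements):

```python
# Hypothesis testing sketch: do two samples share the same mean?
from scipy import stats

a = [2.1, 2.0, 1.9, 2.2, 2.0]   # e.g. errors of model A (hypothetical numbers)
b = [3.1, 3.0, 2.9, 3.2, 3.0]   # e.g. errors of model B

t_stat, p_value = stats.ttest_ind(a, b)
# A small p-value lets us reject the null hypothesis of no significant difference.
print(t_stat, p_value)
```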

19
Q

Types of Cross-Validation Methods

A

Holdout Method: Basic, dividing dataset into training and testing.
K-fold Cross-Validation: Improved holdout method with k subsets.
Leave-p-out Cross-validation: Exhaustive method leaving p data points out.
Leave-one-out Cross Validation: Simplified version, p equals one.
Applications: Evaluating and selecting ML models.
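Leave-one-out is the p = 1 special case: with n samples there are exactly n splits, each holding out a single point. A sketch (assuming scikit-learn):

```python
# Leave-one-out cross-validation sketch: n samples give n one-point test sets.
from sklearn.model_selection import LeaveOneOut

X = [[i] for i in range(4)]
splits = list(LeaveOneOut().split(X))
print(len(splits))   # one split per sample
```

This is exhaustive and unbiased but expensive: it requires n model fits, which is why k-fold is usually preferred on larger datasets.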

20
Q

Bias-Variance Tradeoff

A

Balance: The tradeoff between bias (inflexibility) and variance (sensitivity to noise).
High Bias: Model is too simple, may underfit.
High Variance: Model is too complex, may overfit.
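The high-bias side can be sketched with NumPy (an assumed dependency): a straight line is too inflexible to capture a quadratic signal, while a matched-complexity model fits it; the mirror case, high variance, is a model flexible enough to fit noise, as in the overfitting card.

```python
# High-bias sketch: a degree-1 model underfits a quadratic relationship.
import numpy as np

x = np.linspace(-1, 1, 11)
y = x ** 2                                   # true relationship is quadratic

line = np.polyval(np.polyfit(x, y, 1), x)    # high bias: too simple
quad = np.polyval(np.polyfit(x, y, 2), x)    # matched complexity

print(np.mean((line - y) ** 2))              # large residual error (underfit)
print(np.mean((quad - y) ** 2))              # near zero
```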