Unit 2 Flashcards
Decision Tree Learning
Definition: Supervised Machine Learning method where data is repeatedly split into subsets based on feature values.
Entities: Decision nodes (splits) and leaves (decisions/outcomes).
Types: Classification Trees (categorical outcomes), Regression Trees (continuous outcomes).
Representing Concepts as Decision Trees
Tree Structure: Root node, leaf nodes, splitting, branches/subtrees.
Building Trees: Using CART algorithm (Classification and Regression Tree).
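A minimal sketch of fitting a CART tree with scikit-learn (assumed installed; the toy dataset is illustrative, not from the notes):

```python
from sklearn.tree import DecisionTreeClassifier

# Toy data: two binary features; the label depends only on the first feature.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]

# CART uses an impurity measure (Gini by default) to pick each split.
tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(X, y)
print(tree.predict([[1, 1], [0, 0]]))
```

Because the label is perfectly determined by the first feature, CART finds that split at the root and classifies both queries correctly.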
Recursive Induction of Decision Trees
Recursive Partitioning: Statistical method that repeatedly splits the data into subgroups based on predictor variables.
Decision Tree Example: Predicting passenger survival on the Titanic based on passenger variables.
Picking the Best Splitting Attribute
Information Gain: Measure for choosing the feature that provides the best split.
Ex: Classifying people at a theatre based on attributes.
Entropy and Information Gain
Entropy: Measure of randomness and disorder in information.
Information Gain: Reduction in entropy, calculated for each attribute.
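The two measures above can be computed directly; a small sketch (data values are illustrative):

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(labels, splits):
    """Reduction in entropy when `labels` is partitioned into `splits`."""
    n = len(labels)
    weighted = sum(len(s) / n * entropy(s) for s in splits)
    return entropy(labels) - weighted

parent = ["yes", "yes", "no", "no"]
print(entropy(parent))  # 1.0 bit: maximum disorder for two equally likely classes
# A split that perfectly separates the classes removes all entropy:
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0
```

The splitting attribute chosen at each node is the one with the highest information gain over the remaining data.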
Computational Complexities of ML Models
Ex: K Nearest Neighbour, Logistic Regression, SVM, Decision Tree, Random Forest, Naive Bayes.
Time and Space Complexity: Big O Notation.
Occam’s Razor
Principle: The simplest explanation is likely the correct one.
Application in ML: Balancing model complexity for accurate predictions.
Overfitting in ML
Definition: Model captures noise in the training data, reducing its accuracy on new data.
Reasons: Uncleaned data, high variance, inadequate training data, model complexity.
Noisy Data and Pruning
Noisy Data: Corrupted or distorted data with a low signal-to-noise ratio.
Pruning: Technique that removes non-critical parts of a decision tree, reducing its size and overfitting.
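One common approach is cost-complexity pruning, exposed in scikit-learn via the `ccp_alpha` parameter; a sketch on the Iris dataset (assumes scikit-learn is installed, and `ccp_alpha=0.05` is an illustrative value):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# An unpruned tree grows until every leaf is pure.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# A larger ccp_alpha prunes more aggressively, yielding a smaller tree.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.05).fit(X, y)

print(full.tree_.node_count, pruned.tree_.node_count)
```

The pruned tree trades a little training accuracy for a simpler structure that is less likely to fit noise.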
Experimental Evaluation of Learning Algorithms
Hypothesis Accuracy: Estimating accuracy using statistical methods.
Factors: Choice of test set, likelihood of the estimates, and evaluation strategy when data is limited.
Comparing Learning Algorithms and Cross-Validation
Factors: Time complexity, space complexity, sample complexity, unbiased data, online/offline algorithms, parallelizability, parametricity.
Cross-Validation Types: Holdout method, K-fold cross validation, Leave-p-out, Leave-one-out.
Learning Curves and Statistical Hypothesis Testing
Learning Curves: Plots showing progress over training experience.
Hypothesis Testing: Confirming observations using sample data and statistical tests.
Random Forest Complexity
Training Time: O(n * log(n) * d * k), where n is the number of samples, d the number of features, and k the number of decision trees.
Run-time: O(depth of tree * k).
Space Complexity: O(depth of tree * k).
Naive Bayes Complexity
Training Time: O(n*d), where n is the number of training samples and d the number of features.
Run-time: O(c*d), where c is the number of classes, since feature likelihoods are evaluated for each class.
Occam’s Razor in Model Selection
Model Selection: Choosing the appropriate model for a machine learning problem.
Balance: Achieving a balance between model simplicity and accuracy.
Limitations of Cross-Validation
Computational Resources: Cross-validation can be computationally expensive.
Unseen Data: Cross-validation estimates may not reflect performance on truly unseen data whose distribution differs from the training set.
Learning Curve Types
Diminishing-Returns Curve: Rapid progression initially, slows over time.
Increasing-Returns Curve: Progression accelerates over time.
Increasing-Decreasing Returns Curve (S-curve): Combination of both.
Complex Learning Curve: Varied progression patterns.
Hypothesis Testing in ML
Purpose: Confirming observations about the population using sample data.
Null Hypothesis: Assumes no significant difference.
Alternate Hypothesis: Assumes a significant difference.
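A common use in ML is testing whether two models' cross-validation scores differ significantly; a sketch using a two-sample t-test from SciPy (the fold accuracies below are illustrative, not real results):

```python
from scipy import stats

# Per-fold accuracies of two models (illustrative numbers).
acc_model_a = [0.81, 0.79, 0.83, 0.80, 0.82]
acc_model_b = [0.75, 0.74, 0.77, 0.76, 0.75]

# Null hypothesis: no significant difference between the mean accuracies.
t_stat, p_value = stats.ttest_ind(acc_model_a, acc_model_b)

# A small p-value means we reject the null in favour of the alternate hypothesis.
print(p_value < 0.05)
```

Here the gap between the two score lists is large relative to their spread, so the null hypothesis is rejected.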
Types of Cross-Validation Methods
Holdout Method: Basic, dividing dataset into training and testing.
K-fold Cross-Validation: Improved holdout method with k subsets.
Leave-p-out Cross-validation: Exhaustive method leaving p data points out.
Leave-one-out Cross Validation: Simplified version, p equals one.
Applications: Evaluating and selecting ML models.
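The k-fold method above can be run in a few lines with scikit-learn (assumed installed); a sketch on the Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# 5-fold cross-validation: each of the 5 subsets serves once as the test set.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)

print(len(scores), scores.mean())
```

Averaging the fold scores gives a more stable accuracy estimate than a single holdout split.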
Bias-Variance Tradeoff
Balance: The tradeoff between bias (inflexibility) and variance (sensitivity to noise).
High Bias: Model is too simple, may underfit.
High Variance: Model is too complex, may overfit.
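The two failure modes can be seen by varying tree depth; a sketch with scikit-learn (assumed installed) on Iris:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# High bias: a depth-1 stump is too simple to fit even the training data.
shallow = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_tr, y_tr)

# High variance: an unconstrained tree memorizes the training set.
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

print(shallow.score(X_tr, y_tr), deep.score(X_tr, y_tr))
```

The unconstrained tree reaches perfect training accuracy, a hallmark of overfitting, while the stump underfits; a good depth sits between the two.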