Unit 2 Flashcards
Decision Tree Learning
Definition: Supervised Machine Learning method where data is split based on a parameter.
Entities: Decision nodes (splits) and leaves (decisions/outcomes).
Types: Classification Trees (categorical outcomes), Regression Trees (continuous outcomes).
Representing Concepts as Decision Trees
Tree Structure: Root node, leaf nodes, splitting, branches/subtrees.
Building Trees: Using the CART (Classification and Regression Tree) algorithm.
Recursive Induction of Decision Trees
Recursive Partitioning: Statistical method that repeatedly splits data into subgroups based on variables.
Decision Tree Example: Predicting survival of Titanic passengers based on passenger variables.
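Recursive induction can be sketched as follows: pick the best split, partition the data, and recurse until each node is pure. This is a minimal pure-Python sketch using Gini impurity as the split criterion; the toy Titanic-style features (sex, passenger class) and the dictionary tree format are illustrative assumptions, not a standard implementation.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels):
    """Recursively partition rows until each leaf is pure (one class)."""
    if len(set(labels)) == 1:
        return labels[0]                      # leaf node: the decision
    best = None
    for f in range(len(rows[0])):             # try every feature/value split
        for v in {r[f] for r in rows}:
            left = [l for r, l in zip(rows, labels) if r[f] == v]
            right = [l for r, l in zip(rows, labels) if r[f] != v]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, v)
    if best is None:                          # no useful split: majority vote
        return Counter(labels).most_common(1)[0][0]
    _, f, v = best
    yes = [(r, l) for r, l in zip(rows, labels) if r[f] == v]
    no = [(r, l) for r, l in zip(rows, labels) if r[f] != v]
    return {"feature": f, "value": v,
            "yes": build_tree([r for r, _ in yes], [l for _, l in yes]),
            "no": build_tree([r for r, _ in no], [l for _, l in no])}

# Hypothetical Titanic-style data: (sex, passenger_class) -> survived?
rows = [("female", 1), ("female", 3), ("male", 1), ("male", 3)]
tree = build_tree(rows, ["yes", "yes", "no", "no"])
print(tree["feature"])   # splits on feature 0 (sex), which separates the classes
```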
Picking the Best Splitting Attribute
Information Gain: Measure for choosing the feature that provides the best split.
Ex: Classifying people at a theatre based on attributes.
Entropy and Information Gain
Entropy: Measure of randomness and disorder in information.
Information Gain: Reduction in entropy, calculated for each attribute.
Computational Complexities of ML Models
Ex: K-Nearest Neighbour, Logistic Regression, SVM, Decision Tree, Random Forest, Naive Bayes.
Time and Space Complexity: Big O Notation.
Occam’s Razor
Principle: The simplest explanation is likely the correct one.
Application in ML: Balancing model complexity for accurate predictions.
Overfitting in ML
Definition: Model learns noise in the training data, reducing its accuracy on new data.
Reasons: Uncleaned data, high variance, inadequate training data, model complexity.
Noisy Data and Pruning
Noisy Data: Corrupted or distorted data with a low signal-to-noise ratio.
Pruning: Technique that removes non-critical parts of a decision tree, reducing its size and overfitting.
Experimental Evaluation of Learning Algorithms
Hypothesis Accuracy: Estimating accuracy using statistical methods.
Factors: Testing, likelihood, strategy with limited data.
Comparing Learning Algorithms and Cross-Validation
Factors: Time complexity, space complexity, sample complexity, unbiased data, online/offline algorithms, parallelizability, parametricity.
Cross-Validation Types: Holdout method, K-fold cross validation, Leave-p-out, Leave-one-out.
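K-fold cross validation splits the n samples into k folds, holding out each fold once as the test set. A minimal index-splitting sketch (assumed toy sizes n=10, k=3):

```python
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k-fold cross-validation."""
    # Distribute n samples across k folds as evenly as possible
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

# 10 samples, 3 folds: each sample appears in exactly one test fold
folds = list(k_fold_splits(10, 3))
print([len(test) for _, test in folds])   # [4, 3, 3]
```

Leave-one-out is the special case k = n; the holdout method corresponds to using a single train/test split.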
Learning Curves and Statistical Hypothesis Testing
Learning Curves: Plots showing progress over training experience.
Hypothesis Testing: Confirming observations using sample data and statistical tests.
Random Forest Complexity
Training Time: O(n * log(n) * d * k), where n is the number of samples, d the number of features, and k the number of Decision Trees.
Run-time: O(depth of tree * k).
Space Complexity: O(depth of tree * k).
Naive Bayes Complexity
Training Time: O(n*d), for n samples and d features.
Run-time: O(c*d), evaluating d features for each of c classes.
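These complexities follow from the shape of the algorithm: training is one pass over n rows of d features, and prediction scores each of c classes over d features. A pure-Python sketch for categorical features, with Laplace smoothing and a toy weather dataset added as assumptions:

```python
from collections import defaultdict

def train_nb(rows, labels):
    """One pass over n rows of d features each: O(n*d) training time."""
    counts = defaultdict(int)        # (class, feature_index, value) -> count
    class_counts = defaultdict(int)  # class -> count
    for row, label in zip(rows, labels):
        class_counts[label] += 1
        for j, v in enumerate(row):
            counts[(label, j, v)] += 1
    return counts, class_counts

def predict_nb(counts, class_counts, row):
    """Score each of c classes over d features: O(c*d) per prediction."""
    n = sum(class_counts.values())
    best, best_p = None, -1.0
    for cls, cc in class_counts.items():
        p = cc / n                   # class prior
        for j, v in enumerate(row):
            # Laplace smoothing (assumed choice) avoids zero probabilities
            p *= (counts[(cls, j, v)] + 1) / (cc + 2)
        if p > best_p:
            best, best_p = cls, p
    return best

# Toy data: (outlook, temperature) -> stayed "in" or went "out"
rows = [("sunny", "hot"), ("rain", "cool"), ("sunny", "cool"), ("rain", "hot")]
model = train_nb(rows, ["out", "in", "out", "in"])
print(predict_nb(*model, ("sunny", "hot")))   # out
```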
Occam’s Razor in Model Selection
Model Selection: Choosing the appropriate model for a machine learning problem.
Balance: Achieving a balance between model simplicity and accuracy.