Lecture Notes 2 Flashcards
What are the three methods of learning?
Trial and error, listening to others, watching others
What can models in Machine Learning be?
Tree diagram, neural network, collection of examples
What are the three tasks a model is used for?
- Describe the samples used in building the model
- Predict something about unseen data
- Generate new data
What is Supervised Learning? What are the 2 types?
Takes in labeled data s = <i,o> and makes model M such that M(i) -> o. Classification & Regression
What are the two main types of Supervised Learning?
- Classification - find boundaries to separate classes
- Regression - find best fitting line to predict outcomes
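A minimal sketch of the regression case, assuming one-dimensional inputs and an ordinary least-squares fit (the notes do not say which fitting method the lecture used). The sample values and the names `fit_line` and `M` are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Labeled samples s = <i, o>; here o = 2*i + 1 (illustrative data only).
inputs = [0.0, 1.0, 2.0, 3.0]
outputs = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(inputs, outputs)


def M(i):
    """The learned model: maps an input i to a predicted output o."""
    return slope * i + intercept
```

Calling `M(4.0)` then predicts an output for an input that was not among the training samples.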
What does Unsupervised Learning focus on?
Only gets samples s = <i> and finds relationships between data points
Clustering, describe groups based on similarities, outlier detection, density estimation
What is the goal of Semi-supervised Learning?
To build clusters of unlabeled data and label them using the labeled points
What is Reinforcement Learning?
Learns a policy π(s) -> a that maps states ‘s’ to actions ‘a’ so as to maximize reward over the agent’s lifetime
Doesn’t build the model from data points; instead it tries various actions and receives reward/punishment
What does Generalization in Machine Learning refer to?
Tigers Example
Applying what was learned from the samples to unseen cases. Can lead to stereotyping; it fails through overfitting (under-generalizing) or underfitting (over-generalizing)
What is Bias Error in the context of Machine Learning?
Error produced by underfitting
What is Variance Error in Machine Learning?
Error produced by overfitting
What do probabilistic models in Machine Learning output?
A probability of success/fail instead of a simple yes or no
How is a Machine Learning model tested?
Build a model using half of the data (training) and test it on the other half
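A rough sketch of the half/half split, assuming the samples are shuffled first so that neither half is biased by the original ordering. The function name `split_half` and the fixed seed are illustrative choices, not from the lecture.

```python
import random


def split_half(samples, seed=0):
    """Shuffle a copy of the samples and cut it into two halves."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


data = list(range(10))  # stand-in for real samples
train, test = split_half(data)
```

The model is built from `train` only; `test` is held back and used just to measure how well the model does on data it has not seen.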
What is K-fold validation?
Divides the data into k subsets and iterates k times; each time one subset is used for testing and the rest for training
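One possible way to generate the k train/test splits, working on sample indices only. This sketch assumes the data has already been shuffled; `k_fold_splits` is a hypothetical helper name.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0)
                  for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_splits(10, 5))
```

Every sample ends up in the test subset exactly once, so the k test scores can be averaged into a single estimate.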
What is the Curse Of Dimensionality?
A good model needs an adequate number of samples, and the number of samples needed grows exponentially as the number of dimensions increases
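A small arithmetic illustration of the exponential growth, under the simplifying assumption that "adequate" means covering every axis with the same number of evenly spaced sample points. The function name `samples_needed` is made up for this sketch.

```python
def samples_needed(points_per_axis, dimensions):
    """Samples required to keep the same per-axis coverage in every dimension."""
    return points_per_axis ** dimensions


# Keep 10 points per axis and watch the total grow with the dimension count.
for d in (1, 2, 3, 10):
    print(d, samples_needed(10, d))
```

Each added dimension multiplies the required sample count by the per-axis count, which is why high-dimensional data is so hard to cover.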
What is Dimensionality Reduction?
A field that tries to recast high-dimensional spaces into lower-dimensional ones
What does Clustering in Unsupervised Learning aim to do? Why is clustering an ill-posed problem?
Builds model from set of unlabeled data
Describe/generate data based on similarities
Clustering is an ill-posed problem: no single grouping is the correct one (male / female / penguin problem)
What is K-means in clustering?
A method that finds the ‘k’ means (centers) of clusters in the data through optimization. If you want the clusters to be the same size, consider using weights on the distance
What are the steps involved in K-means clustering?
Looking to find ‘k’ means of clusters in the data. It is an optimization.
- Start with ‘n’ samples ‘X’ and a collection of ‘k’ means ‘M’. Each m in M starts as a randomly chosen x
- Cycle through the next 2 steps until the means don’t change
- Color each sample x in ‘X’ according to its closest mean ‘m’ in ‘M’
- Re-average each ‘m’ from the samples of its color and move it to its new location
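The steps above can be sketched as follows. This is an illustrative version only: it assumes squared Euclidean distance, keeps an empty cluster's mean where it was, and caps the number of cycles with a `max_iters` parameter that is not part of the notes.

```python
import random


def closest(point, means):
    """Index of the mean nearest to point (squared Euclidean distance)."""
    return min(range(len(means)),
               key=lambda j: sum((p - m) ** 2 for p, m in zip(point, means[j])))


def k_means(X, k, seed=0, max_iters=100):
    rng = random.Random(seed)
    means = [tuple(x) for x in rng.sample(X, k)]  # each m starts as a random x
    for _ in range(max_iters):
        labels = [closest(x, means) for x in X]   # "color" every sample
        new_means = []
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if not members:                       # assumption: keep an empty mean in place
                new_means.append(means[j])
                continue
            dim = len(members[0])
            new_means.append(tuple(sum(x[d] for x in members) / len(members)
                                   for d in range(dim)))
        if new_means == means:                    # means did not move: done
            break
        means = new_means
    labels = [closest(x, means) for x in X]       # final coloring for the final means
    return means, labels


# Two well-separated groups of made-up 2-D points.
X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, labels = k_means(X, 2)
```

Because the starting means are random, different seeds can settle on different answers for less cleanly separated data.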
What is Expectation Maximization (EM) in clustering? What algorithm is part of this? What is its main problem?
Labels the data according to the current model’s predictions, then modifies the model to match the distribution of the labels
K-means is part of this
Getting stuck in local optima
What is K-nearest-neighbor? What type of machine learning algorithm is it? Steps?
A classification algorithm that stores all samples <points, labels>. Supervised Learning
(1) Store all samples <point,label>
(2) When queried with new point, find “k” points closest to it. Points then vote using label values.
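The two steps can be sketched roughly like this, assuming squared Euclidean distance and a simple majority vote. The sample points, labels, and the name `knn_classify` are made up for illustration, and the brute-force sort stands in for any faster neighbor search.

```python
from collections import Counter


def knn_classify(samples, query, k):
    """samples: list of (point, label). Return the majority label of the k nearest."""
    # Step (2a): order the stored samples by distance to the query point.
    by_distance = sorted(
        samples,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], query)),
    )
    # Step (2b): the k closest points vote with their labels.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Step (1): store all samples <point, label>.
samples = [
    ((0, 0), "red"), ((0, 1), "red"), ((1, 0), "red"),
    ((5, 5), "blue"), ((5, 6), "blue"), ((6, 5), "blue"),
]

near_origin = knn_classify(samples, (0.5, 0.5), 3)
near_five = knn_classify(samples, (5.5, 5.5), 3)
```

With two classes, an odd ‘k’ avoids tied votes; this sketch does not otherwise define how a tie is broken.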
How does K-nearest-neighbor classify a new point?
Finds ‘k’ points closest to it and those points vote using label values
What is the effect of a low ‘k’ in K-nearest-neighbor?
Affected by noise
What is the effect of a large ‘k’ in K-nearest-neighbor?
Areas with few samples get corrupted, because the vote pulls in points from farther away
What is the purpose of a quad tree in K-nearest-neighbor?
To make finding the nearest points more efficient by reducing the big-O cost of the search, so not every stored sample has to be checked
True or False: Reinforcement Learning uses data points for models.
False
What are the types of machine learning algorithms?
Supervised Learning - <i,o>, Classification, regression
Unsupervised Learning <i>
Semi-supervised Learning
Reinforcement Learning π(s) -> a