Machine Learning Flashcards
What is machine learning?
- technology that allows computers to learn from data without being explicitly programmed
- a subset of artificial intelligence (AI) that uses algorithms to identify patterns in data and uses this knowledge to make predictions or decisions
What’s the difference between supervised machine learning and unsupervised machine learning?
- supervised learning uses labeled data to train models for prediction or classification (e.g., Mom telling you “this is a teddy bear”)
- unsupervised learning uses unlabeled data to discover patterns and structures (you have to figure out what something is by grouping things with common characteristics)
What is the difference between labeled data and unlabeled data?
- labeled data: a cat picture tagged with the label “cat”
- unlabeled data: a cat picture with no label
In supervised machine learning, what’s the difference between regression problems and classification problems?
- regression problem: predicts a continuous number (e.g., what will the temperature be tomorrow?)
- classification problem: predicts a category (e.g., will it be hot or cold tomorrow?)
What is the difference between dimension reduction and clustering in unsupervised machine learning?
- dimension reduction: reducing the number of features (independent variables) so you don’t overfit, keeping the few that explain the most variation in the data (similar to parsimony)
- clustering: grouping observations based on common characteristics
What’s the difference between deep learning and reinforcement learning?
- deep learning uses neural networks with many hidden layers that learn from lots of examples, often labeled data (like teaching a robot to learn on its own by showing it lots of examples)
- reinforcement learning learns by interacting with an environment and receiving rewards or penalties for its actions; it records those rewards or penalties and uses them to do better in the next interaction
What is generalization in machine learning algorithms?
- how well a model that explains the training data applies this knowledge to new, unseen data
- a model that doesn’t explain the training data very well is underfit
- a model that explains the training data too well (including its noise) is overfit and generalizes poorly to new data
The dataset for machine learning models is usually divided into 3 samples. What are the 3 samples and their uses?
- training sample: used to fit the model (find the relationships); in-sample data
- validation sample: used to validate and fine-tune the model; in-sample data
- test sample: used to test the model on new data; usually a small portion of the total dataset; out-of-sample data
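A minimal sketch of the 3-way split in Python, assuming the rows can simply be shuffled (e.g., the data is not time-ordered) and using illustrative 60/20/20 proportions; the function name split_dataset is hypothetical.

```python
import random

def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle rows and cut them into training, validation, and test samples.

    The 60/20/20 proportions are an illustrative assumption, not a rule.
    """
    shuffled = list(rows)                    # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # fixed seed -> repeatable split
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]                   # in-sample: fit the model
    val = shuffled[n_train:n_train + n_val]      # in-sample: tune the model
    test = shuffled[n_train + n_val:]            # out-of-sample: final check
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```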
What is the difference between bias errors, variance errors, and base errors?
- bias error: error from oversimplifying a model, causing underfitting (often due to not enough independent variables to explain the dependent variable)
- variance error: error from making the model too complex, causing overfitting (often due to too many independent variables that explain the training data too well)
- base error: error due to randomness (noise) in the data that no model can remove; a good-fit/robust model balances bias error and variance error so that mostly base error is left
What are two methods for addressing overfitting models? CC
- complexity reduction: limiting the number of features (independent variables) and penalizing algorithms that are too complex (achieved by including only parameters that reduce out-of-sample error)
- cross-validation: divides data into training, validation, and test samples and then checks how well the model generalizes to the samples it hasn’t seen
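One common way to run cross-validation is k-fold cross-validation: the in-sample data is cut into k folds, and each fold takes a turn as the validation sample while the other k - 1 folds train the model. The sketch below only generates the index splits; k_fold_indices is a hypothetical helper, and a real workflow would fit and score a model on each split.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Each of the k folds is used once as the validation sample while the
    remaining k - 1 folds form the training sample.
    """
    indices = list(range(n))
    fold_size, remainder = divmod(n, k)
    start = 0
    for fold in range(k):
        # The first `remainder` folds get one extra observation each.
        size = fold_size + (1 if fold < remainder else 0)
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

for train_idx, val_idx in k_fold_indices(10, 5):
    print(val_idx)  # [0, 1], then [2, 3], ... up to [8, 9]
```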
What is penalized regression in supervised machine learning, and what is noise?
- penalized regression: a technique used in machine learning to prevent models from becoming too complex and overfitting the training data. Overfitting happens when a model learns not just the underlying patterns but also the noise in the data, which makes it perform poorly on new, unseen data.
- noise: random, irrelevant, or erroneous information that doesn’t represent the true underlying patterns you’re trying to learn (e.g., measurement errors when collecting data, outliers)
How does penalized regression deal with overly complex models that assign excessively large coefficients to some features?
- Penalized regression addresses this by adding a penalty term to the loss function, discouraging the model from relying too heavily on any single feature or using too many features unnecessarily.
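How does LASSO work, and what happens to the coefficients as the LASSO penalty increases?
- LASSO (least absolute shrinkage and selection operator) adds a penalty to the loss function equal to lambda (λ) times the sum of the absolute values of the coefficients
- as λ increases, the coefficients decrease, shrinking the effect of each independent variable
- the model estimates a coefficient for each independent variable; as λ increases, each coefficient shrinks, and eventually the less important coefficients are reduced to exactly 0 (those features drop out of the model)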
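A minimal LASSO sketch in pure Python, assuming centered data (so no intercept) and using cyclic coordinate descent, which is one common way to fit LASSO but not the only one. It minimizes half the mean squared error plus λ (lam) times the sum of the absolute coefficients; the soft-threshold step is what pushes small coefficients to exactly 0. The toy data and function names are made up for illustration.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward 0 by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso(X, y, lam, n_iter=100):
    """Fit LASSO coefficients by cyclic coordinate descent.

    Minimizes (1/2n) * sum of squared errors + lam * sum(|b_j|).
    Assumes X (list of rows) and y are already centered, so no intercept.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual left by the other features.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / z
    return b

# Toy centered data built as y = 3*x1 + 0.5*x2 (x1 matters much more than x2).
X = [[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]]
y = [-5.5, -3.5, 0.0, 2.5, 6.5]

for lam in (0.0, 0.5, 6.0):
    print(lam, lasso(X, y, lam))
# 0.0 [3.0, 0.5]   -> no penalty: ordinary least-squares fit
# 0.5 [2.75, 0.0]  -> weak feature x2 dropped to exactly 0
# 6.0 [0.0, 0.0]   -> every coefficient shrunk to 0
```

The weaker feature hits 0 first, which is the "selection" part of the name.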
What is a support vector machine (SVM) in supervised machine learning?
- a linear classifier that sorts an image or data point into one of 2 classes
- e.g., comparing apples to oranges: you graph apples on one side and oranges on the other side. Draw a line down the middle called the separating hyperplane, placed as far as possible from both groups. Then draw 2 more lines parallel to the hyperplane, one through the closest apples and one through the closest oranges; the data points sitting on these boundary lines are called support vectors. When the machine is given a new fruit, it looks at which side of the hyperplane the fruit falls on.
What happens if a data point falls between the hyperplane and the support vector boundary lines, an area called the margin?
- a point within the margin is in a more uncertain area, so the model is less confident about its classification
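Once an SVM is trained, classifying a new point only needs the hyperplane. The sketch below assumes a linear SVM in the standard scaled form, where the hyperplane is w·x + b = 0 and the two margin boundaries through the support vectors are w·x + b = +1 and -1; the weights are made up rather than learned, and svm_classify is a hypothetical name.

```python
def svm_classify(point, w, b):
    """Classify a 2-D point with an already-trained linear SVM.

    w and b are assumed to come from training; here they are made up.
    The hyperplane is w.x + b = 0 and the margin boundaries are w.x + b = +/-1.
    """
    score = w[0] * point[0] + w[1] * point[1] + b
    label = "apple" if score > 0 else "orange"
    # Points with |score| < 1 sit inside the margin: the model is less confident.
    confident = abs(score) >= 1
    return label, score, confident

# Hypothetical hyperplane x1 - x2 = 0 (w = (1, -1), b = 0); apples lie below the line.
w, b = (1.0, -1.0), 0.0
print(svm_classify((4.0, 1.0), w, b))  # ('apple', 3.0, True)
print(svm_classify((2.0, 1.5), w, b))  # ('apple', 0.5, False) -> inside the margin
print(svm_classify((1.0, 5.0), w, b))  # ('orange', -4.0, True)
```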
What is k-nearest neighbor (KNN) in supervised machine learning?
- the model classifies a new data point based on its k nearest neighbors: it follows the majority of what the neighboring data points are doing
- e.g., you see a bunch of kids playing soccer and basketball on the playground. Pick a number (k): let’s say k = 3, so you check what the 3 closest kids are playing. Count the games: if 2 kids are playing soccer and 1 kid is playing basketball, you choose soccer because more kids near you are playing it.
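The playground example as a small k-nearest-neighbor sketch, assuming Euclidean distance and made-up kid positions; knn_predict is a hypothetical name, and ties in the vote are not handled specially (Counter.most_common returns tied labels in the order first encountered).

```python
import math
from collections import Counter

def knn_predict(train, new_point, k=3):
    """Predict a label for new_point by majority vote of its k nearest neighbors.

    train is a list of ((x, y), label) pairs; distance is Euclidean.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], new_point))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Kids on the playground: (position, game they are playing). Positions are made up.
kids = [
    ((1, 1), "soccer"),
    ((2, 1), "soccer"),
    ((1, 3), "basketball"),
    ((8, 8), "basketball"),
    ((9, 8), "basketball"),
]
print(knn_predict(kids, (1, 2), k=3))  # soccer (2 soccer vs 1 basketball nearby)
```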
What is a classification and regression tree (CART) in supervised learning?
- CART is like a decision tree that helps you figure things out by asking yes/no questions until you reach an answer.
e.g.
1. Is it sunny?
• Yes → Go to the next question
• No → Stay inside
2. Is it hot outside?
• Yes → Play outside with water balloons
• No → Play outside with a ball
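A finished tree is just nested yes/no questions. This hand-written sketch encodes the sunny/hot example tree directly; a real CART algorithm would learn which questions to ask (and where to split) from training data, which is not shown here. decide_activity is a hypothetical name.

```python
def decide_activity(sunny: bool, hot: bool) -> str:
    """Walk the example yes/no tree, one question per level."""
    if not sunny:            # Question 1: Is it sunny?
        return "Stay inside"
    if hot:                  # Question 2: Is it hot outside?
        return "Play outside with water balloons"
    return "Play outside with a ball"

print(decide_activity(sunny=True, hot=False))  # Play outside with a ball
```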
What is ensemble learning in supervised machine learning?
- ensemble learning: like having a team of experts instead of relying on just one person’s opinion. The idea is to combine the predictions of multiple models (often weak learners) into a stronger, more accurate model. Working together, these models usually make better predictions than any single model could on its own.
What are 3 types of ensemble learning techniques for supervised machine learning? VBR
- voting classifiers: follow the majority vote (e.g., if 4 models say default and 3 models say no default, you go with default)
- bagging (bootstrap aggregating): training the same algorithm many times, each time on a different bootstrap sample (a random sample drawn with replacement) of the training data; each model says either bankruptcy or no bankruptcy, and the ensemble goes with the majority
- random forest: an algorithm that combines multiple decision trees, each trained on a random subset of the data and features, to make predictions
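Sketches of the two building blocks behind voting and bagging: a majority vote over model predictions, and drawing a bootstrap sample (random rows with replacement) for each model in a bagging ensemble. Model training itself is left out; the function names and loan names are placeholders.

```python
import random
from collections import Counter

def majority_vote(predictions):
    """Voting classifier: return the label predicted by the most models."""
    return Counter(predictions).most_common(1)[0][0]

def bootstrap_sample(data, rng):
    """Bagging step: draw len(data) rows at random, with replacement."""
    return [rng.choice(data) for _ in data]

# Voting example from the card: 4 models say "default", 3 say "no default".
votes = ["default"] * 4 + ["no default"] * 3
print(majority_vote(votes))  # default

# Each model in a bagging ensemble would be trained on its own bootstrap sample;
# some loans will repeat within a sample and others will be left out.
rng = random.Random(0)
loans = ["loan_a", "loan_b", "loan_c", "loan_d", "loan_e"]
samples = [bootstrap_sample(loans, rng) for _ in range(3)]
```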
What is principal component analysis (PCA) in unsupervised machine learning, and what can’t PCA be used for?
- takes many features and combines them into just a few composite variables (principal components). The composite variable with the most explanatory power, i.e., the one that captures the most variation in the observations (like a line of best fit), is graphed first, then the next most explanatory, down to the least explanatory, all on the same graph as the observations. Composite variables are uncorrelated with each other, so the 2nd composite variable is drawn perpendicular to the 1st instead of along it
- can’t be used for regression problems on its own (it has no target variable to predict; it only summarizes the features)
What are eigenvectors and eigenvalues?
- eigenvectors (direction): define the direction of each principal component, i.e., the best way to combine the original features (e.g., combine height and weight because they change together)
- eigenvalues (importance): tell you how much of the total variance in the original data each principal component (eigenvector) explains; the higher the eigenvalue, the more important and explanatory that component is
What is projection error in principal component analysis?
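- the perpendicular distance between an observation and its projection onto a principal component (e.g., the most explanatory composite variable, like a line of best fit); PCA picks the first principal component so that the sum of squared projection errors is as small as possible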
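A hand-rolled PCA sketch for just 2 features, assuming the sample covariance matrix (divide by n - 1) and using the closed-form eigenvalues of a symmetric 2x2 matrix instead of a linear-algebra library. It returns the eigenvalues (importance), the first eigenvector (PC1 direction), and each observation's projection error onto PC1; pca_2d and the data points are made up.

```python
import math

def pca_2d(points):
    """Principal component analysis for 2 features, done by hand.

    Returns the eigenvalues (largest first), the first eigenvector (PC1),
    and each observation's projection error onto PC1.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Sample covariance matrix [[a, c], [c, d]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    d = sum(y * y for _, y in centered) / (n - 1)
    c = sum(x * y for x, y in centered) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix (closed form).
    mid = (a + d) / 2
    spread = math.sqrt(((a - d) / 2) ** 2 + c ** 2)
    eig1, eig2 = mid + spread, mid - spread
    # Eigenvector for the largest eigenvalue = direction of PC1.
    if c != 0:
        vx, vy = eig1 - d, c
    else:
        vx, vy = (1.0, 0.0) if a >= d else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    pc1 = (vx / norm, vy / norm)
    # Projection error: perpendicular distance from each point to the PC1 line.
    errors = [abs(x * pc1[1] - y * pc1[0]) for x, y in centered]
    return (eig1, eig2), pc1, errors

points = [(2, 2), (-2, -2), (1, -1), (-1, 1)]
(eig1, eig2), pc1, errors = pca_2d(points)
print(eig1 / (eig1 + eig2))  # share of total variance explained by PC1 (about 0.8)
```

Here the two points on the diagonal have (almost) zero projection error, while (1, -1) and (-1, 1) sit about 1.41 away from the PC1 line.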
What is a scree plot?
- a chart that plots each principal component’s share of the total variance it explains, ordered from most explanatory to least explanatory; most people keep enough components to explain around 80-90% of the total variance
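The cumulative-share rule behind a scree plot can be sketched as follows, assuming the eigenvalues are already sorted from largest to smallest; the eigenvalues, the 85% target (inside the 80-90% range above), and the name components_needed are made-up illustrations.

```python
def components_needed(eigenvalues, target=0.85):
    """Return how many principal components explain at least `target` of the variance.

    eigenvalues are assumed sorted from largest to smallest (scree-plot order).
    """
    total = sum(eigenvalues)
    cumulative = 0.0
    for count, value in enumerate(eigenvalues, start=1):
        cumulative += value / total          # running share of total variance
        if cumulative >= target:
            return count
    return len(eigenvalues)

# Made-up eigenvalues for 5 principal components.
eigs = [4.0, 2.5, 1.5, 1.0, 1.0]
print([round(e / sum(eigs), 2) for e in eigs])  # [0.4, 0.25, 0.15, 0.1, 0.1]
print(components_needed(eigs, target=0.85))     # 4 (cumulative 0.4, 0.65, 0.8, 0.9)
```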
What is clustering for unsupervised machine learning?
- the process of organizing observations into groups whose members share common features
- PCA is different because it focuses on reducing features into a few composite variables rather than grouping observations
What is k-means clustering in unsupervised machine learning?
- the process of organizing observations into k non-overlapping clusters
- in other words, you’re trying to minimize the distance between each observation and its cluster’s centroid (center of the cluster) and maximize the distance between clusters (e.g., cluster 1 and cluster 2)
What are the steps in k-means clustering?
- Choose k, the number of clusters, and place k starting centroids (often at random)
- Assign each observation to the cluster with the nearest centroid
- Find the average (mean) point of each cluster (e.g., cluster 1 and cluster 2)
- Move each cluster’s centroid to its average point
- Repeat steps 2-4 over and over until the centroids no longer need to be repositioned
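A minimal k-means sketch in pure Python that follows the steps above, assuming 2-D points, Euclidean distance, and starting centroids picked by hand (in practice they are often chosen at random); k_means and the data are hypothetical.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Basic k-means: points is a list of (x, y); centroids holds k starting centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Step 2: assign every observation to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Steps 3-4: move each centroid to the average of its cluster.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(old)  # empty cluster: leave its centroid in place
        # Step 5: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

data = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(data, centroids=[(0, 0), (5, 5)])
print(centroids)  # roughly [(1.33, 1.33), (8.33, 8.33)]
```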
What is hierarchical clustering?
- similar to k-means clustering, it organizes data into clusters, except the number of clusters isn’t predetermined; it builds a hierarchy of clusters instead
What are the 2 types of hierarchical clustering?
- agglomerative clustering (bottom-up clustering): starts with each observation as its own cluster, merges the two closest clusters into 1, then merges the next two closest into 1, and so on until all the data is in one big cluster
- divisive clustering (top-down clustering): starts with one big cluster containing all the observations, then breaks it into 2 clusters, then 3 clusters, and so on until each observation is its own cluster
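A bottom-up (agglomerative) sketch, assuming single linkage, i.e., the distance between two clusters is the distance between their closest members; other linkage rules (average, complete, centroid) are also common. It records each merge and its distance, which is the information a dendrogram draws. The function names and points are hypothetical.

```python
import math
from itertools import combinations

def single_link(c1, c2):
    """Single-linkage distance: the distance between the two closest members."""
    return min(math.dist(p, q) for p in c1 for q in c2)

def agglomerative(points):
    """Merge the two closest clusters until one is left; return (left, right, distance) merges."""
    clusters = [[p] for p in points]     # every observation starts as its own cluster
    merges = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], single_link(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

points = [(0, 0), (1, 0), (4, 0), (6, 0)]
merges = agglomerative(points)
print([d for _, _, d in merges])  # [1.0, 2.0, 3.0] -> merge heights a dendrogram would draw
```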
What is a dendrogram in hierarchical clustering?
- tree-like structure that shows how data points are grouped together at different levels of similarity
- arch: the horizontal line in the dendrogram that connects a & b into one cluster
- dendrites: the vertical lines rising from a & b up to the arch; their height shows the distance between the two clusters being merged
- e.g., points a, b, c, d, e, f, g, h on a graph: cluster a & b, cluster c & d, cluster e & f, cluster g & h, then keep merging the closest clusters until one cluster is left
What are neural networks and their 3 layers?
- machine learning algorithms modeled after the human brain (networks of neurons)
- Input layer (receives the data) (e.g., friends look at a puzzle piece and pass it along)
- Hidden layer (where learning takes place) (friends figure out how the puzzle pieces fit together, talk to each other, share ideas, etc.)
- Output layer (produces the final result) (makes the final guess on the puzzle; the network repeats until it gets it right)
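A minimal forward pass through the 3 layers, assuming a sigmoid activation and made-up weights (2 inputs, 3 hidden nodes, 1 output). It only shows how information flows from the input layer through the hidden layer to the output layer; the training step that adjusts the weights is not shown, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: input layer -> one hidden layer -> single output node."""
    # Hidden layer: each node takes a weighted sum of the inputs, then an activation.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(node_w, inputs)) + b)
        for node_w, b in zip(hidden_weights, hidden_biases)
    ]
    # Output layer: weighted sum of the hidden nodes, squashed to a 0-1 "guess".
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)

# Made-up weights for 2 inputs, 3 hidden nodes, 1 output. Training (not shown)
# would repeatedly adjust these weights to reduce the prediction error.
prediction = forward(
    inputs=[0.5, -1.0],
    hidden_weights=[[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]],
    hidden_biases=[0.0, 0.1, -0.1],
    output_weights=[0.6, -0.2, 0.9],
    output_bias=0.05,
)
print(prediction)
```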
What’s the difference between neural networks and deep learning?
- deep learning uses neural networks with many more hidden layers (a huge team of friends split into many groups, each learning a small part of the puzzle) (at least 3 hidden layers, usually 10-20 hidden layers)
What size of dataset is a support vector machine (SVM) best for, is SVM affected by outliers, and is SVM used for supervised or unsupervised learning?
- best suited for small to medium-size datasets
- relatively unaffected by outliers that plot beyond the support vectors
- supervised learning