Unsupervised Machine Learning Flashcards
What is unsupervised learning?
Unsupervised learning is used when our dataset has no target (y), so the machine has to learn patterns in the data on its own.
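What is k-means clustering?
K-means groups data into k clusters. It starts by randomly choosing k points as the initial cluster centers (commonly called centroids), assigns every data point to its nearest centroid, moves each centroid to the mean of the points assigned to it, and repeats the last two steps until the centroids stop moving.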
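A minimal NumPy sketch of the loop described above, assuming Euclidean distance and a fixed random seed. The function name `kmeans` and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np


def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means sketch (Lloyd's algorithm) on an array X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly pick k distinct data points as the starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign every point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its old centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = kmeans(X, k=2)
```

In practice, scikit-learn's `sklearn.cluster.KMeans` implements the same idea with a smarter starting step (k-means++ initialization by default).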
What is explanatory k-means?
What is Hierarchical Clustering?
Hierarchical clustering is another way to cluster our data into different groups. It builds a hierarchy of nested clusters, either top-down (divisive) or bottom-up (agglomerative).
What is Divisive Clustering?
This is a top-down approach: it starts with all data in one cluster and repeatedly splits clusters into smaller ones based on similar traits.
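A rough sketch of the top-down idea, assuming each split is done with a small 2-means step (the strategy usually called bisecting k-means) and that the cluster with the largest within-cluster sum of squares is split next. Both choices, and the names `split_in_two` and `divisive_clustering`, are assumptions for illustration, not the only way to do divisive clustering.

```python
import numpy as np


def split_in_two(X, n_iters=50):
    """Split one cluster into two using a small 2-means loop."""
    # Deterministic start: the point farthest from the mean, then the point farthest from that one
    a = X[np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))]
    b = X[np.argmax(np.linalg.norm(X - a, axis=1))]
    centers = np.array([a, b])
    for _ in range(n_iters):
        labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(2)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def divisive_clustering(X, n_clusters):
    """Top-down clustering: start with one cluster and keep splitting."""
    labels = np.zeros(len(X), dtype=int)  # every point starts in cluster 0
    while labels.max() + 1 < n_clusters:
        # Within-cluster sum of squares for each current cluster
        sse = [((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
               for c in range(labels.max() + 1)]
        target = int(np.argmax(sse))
        if sse[target] == 0:
            break  # every remaining cluster is a single repeated point
        idx = np.where(labels == target)[0]
        halves = split_in_two(X[idx])
        # Points in the second half get a brand-new cluster id
        labels[idx[halves == 1]] = labels.max() + 1
    return labels


X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0],
              [10.0, 11.0], [20.0, 0.0], [20.0, 1.0]])
labels = divisive_clustering(X, n_clusters=3)
```

Newer scikit-learn releases include `sklearn.cluster.BisectingKMeans`, which follows this same split-in-two strategy.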
What is Agglomerative Clustering?
This is a bottom-up approach: it starts with each data point in its own cluster, then repeatedly merges the closest clusters until only one cluster remains.
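A naive sketch of the bottom-up merging, assuming single linkage (the distance between two clusters is the distance between their closest pair of points). The function name `agglomerative` and the optional `n_clusters` stopping point are hypothetical; leaving `n_clusters=1` merges all the way down to one cluster, as the card describes.

```python
import numpy as np


def agglomerative(X, n_clusters=1):
    """Naive single-linkage agglomerative clustering (bottom-up)."""
    # Bottom-up: every point starts in its own cluster
    clusters = [[i] for i in range(len(X))]
    # Pairwise Euclidean distances between all points
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    merges = []  # (cluster a, cluster b, distance) in the order they were merged
    while len(clusters) > n_clusters:
        # Find the closest pair of clusters; single linkage uses their closest pair of points
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].min()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], float(d)))
        # Merge cluster b into cluster a (b > a, so deleting b leaves a's index unchanged)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges


X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
two_groups, _ = agglomerative(X, n_clusters=2)
all_in_one, history = agglomerative(X)
```

The merge history is what a dendrogram draws. Library versions include `scipy.cluster.hierarchy.linkage` (for example with `method="single"`) and `sklearn.cluster.AgglomerativeClustering`.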
What is DBSCAN?
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is an algorithm used for clustering
How effective is DBSCAN?
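DBSCAN is useful for grouping points that lie in high-density regions into clusters, while labeling points that do not fit into any dense region as “noise”. Unlike k-means, it does not need the number of clusters in advance and can find clusters of arbitrary shape.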
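A minimal sketch of how DBSCAN finds dense regions, assuming the common convention that a point counts as its own neighbor. `eps` is the neighborhood radius and `min_samples` is how many neighbors make a point a "core" point; the function name `dbscan` is hypothetical.

```python
import numpy as np


def dbscan(X, eps, min_samples):
    """Minimal DBSCAN sketch. Returns one label per point; -1 marks noise."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Neighbors of each point within radius eps (each point counts as its own neighbor)
    neighbors = [np.where(D[i] <= eps)[0] for i in range(n)]
    # Core points sit in dense regions: at least min_samples neighbors
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(n, -1)  # everything starts as noise
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster outward from this unassigned core point
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    # Only core points keep expanding; border points join but stop there
                    if is_core[q]:
                        stack.append(q)
        cluster += 1
    return labels


X = np.array([[0.0, 0.0], [0.0, 0.5], [0.5, 0.0],
              [10.0, 10.0], [10.0, 10.5], [10.5, 10.0],
              [50.0, 50.0]])
labels = dbscan(X, eps=1.0, min_samples=3)
```

The isolated point at (50, 50) never joins a cluster and keeps the label -1. `sklearn.cluster.DBSCAN(eps=..., min_samples=...)` uses the same -1 label for noise.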
What is Dimensionality Reduction?
Dimensionality reduction refers to any technique that reduces the number of features (dimensions) we are working with.
What are the two types of dimensionality reduction?
Feature selection and feature extraction
What is feature selection?
Feature selection means keeping only the most important features in your final model and leaving out the features that are not important to it.
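A minimal sketch of one simple selection rule, assuming a target y is available: score each feature by its absolute correlation with y and keep the top k. The scoring rule and the function name `select_top_k_features` are assumptions for illustration; other scores are common.

```python
import numpy as np


def select_top_k_features(X, y, k):
    """Keep the k columns of X most correlated (in absolute value) with y."""
    # Score each feature by |Pearson correlation| with the target
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    # A constant column has an undefined (NaN) correlation; treat it as unimportant
    scores = np.nan_to_num(scores, nan=0.0)
    # Indices of the k highest-scoring features, best first
    keep = np.argsort(scores)[::-1][:k]
    return X[:, keep], keep


# Column 0 tracks y exactly, column 1 only loosely, column 2 not at all
X = np.array([[1.0, 1.0, 0.0],
              [2.0, -1.0, 1.0],
              [3.0, 1.0, 1.0],
              [4.0, -1.0, 0.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
X_selected, kept = select_top_k_features(X, y, k=2)
```

scikit-learn offers the same pattern through `sklearn.feature_selection.SelectKBest` with a scoring function such as `f_regression`.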
What is feature extraction?
Feature extraction refers to techniques where you take all of your features and combine them in certain ways to reduce them to a lower number of dimensions.
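PCA (principal component analysis) is the classic feature-extraction technique. A minimal NumPy sketch, assuming X is a numeric array with one row per sample: center the data, take its singular value decomposition, and project onto the top directions. The function name `pca` is hypothetical.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components (a minimal PCA sketch)."""
    # Center each feature at zero mean
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by explained variance
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Every new feature is a linear combination of ALL the original features
    X_reduced = X_centered @ components.T
    explained_variance_ratio = (S ** 2 / (S ** 2).sum())[:n_components]
    return X_reduced, explained_variance_ratio


t = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
# Three features that all follow one underlying direction
X = np.column_stack([t, 2 * t, -t])
X_reduced, ratio = pca(X, n_components=1)
```

Here three features collapse into one extracted feature that keeps essentially all of the variance. `sklearn.decomposition.PCA` provides the same through `fit_transform` and `explained_variance_ratio_`.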
What can you use PCA for?
You can use PCA to speed up a supervised machine learning model: the model trains on a few principal components instead of all the original features, even though PCA itself is an unsupervised technique.
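A sketch of that workflow, assuming a regression task and using ordinary least squares as a stand-in supervised model: learn the PCA directions on the training data only, reuse them to transform the test data, then train on the few extracted features. The class `SimplePCA`, the two helper functions, and the synthetic data are hypothetical.

```python
import numpy as np


class SimplePCA:
    """Tiny PCA transformer that learns its directions from training data only."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = Vt[: self.n_components]
        return self

    def transform(self, X):
        # Reuse the training mean and directions so train and test share one projection
        return (X - self.mean_) @ self.components_.T


def fit_least_squares(Z, y):
    """The supervised step: ordinary least squares with an intercept column."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(Z1, y, rcond=None)
    return coef


def predict_least_squares(coef, Z):
    return np.column_stack([np.ones(len(Z)), Z]) @ coef


# Synthetic data: 50 correlated features driven by 2 hidden factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
X = latent @ rng.normal(size=(2, 50)) + 0.01 * rng.normal(size=(300, 50))
y = latent @ np.array([1.0, -2.0])
X_train, X_test, y_train, y_test = X[:200], X[200:], y[:200], y[200:]

pca = SimplePCA(n_components=2).fit(X_train)
coef = fit_least_squares(pca.transform(X_train), y_train)  # 2 inputs instead of 50
pred = predict_least_squares(coef, pca.transform(X_test))
```

With scikit-learn this is usually written as `make_pipeline(PCA(n_components=2), LinearRegression())`, which also ensures PCA is fit only on the training data.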
What is Feature Engineering?
Feature engineering is when you add or modify features in your data. PCA is one example of feature engineering.
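A small pandas sketch of adding and modifying features. The housing columns, their values, and the reference year 2024 are all hypothetical.

```python
import pandas as pd

# Hypothetical housing data
df = pd.DataFrame({
    "price": [300000, 450000, 250000],
    "sqft": [1500, 2000, 1000],
    "year_built": [1990, 2005, 1975],
})

# Add a feature by combining two existing columns
df["price_per_sqft"] = df["price"] / df["sqft"]

# Modify a feature: replace the build year with the house's age
# (2024 is an assumed reference year)
df["age"] = 2024 - df["year_built"]
df = df.drop(columns="year_built")
```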
How does pandas overload operators?
Pandas overloads the + operator: on two string columns it concatenates the values row by row, while on numeric columns it adds them.
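A short illustration, with made-up column names, of the same + operator behaving differently depending on the columns' data type.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Ada", "Alan"],
    "last": ["Lovelace", "Turing"],
    "a": [1, 2],
    "b": [10, 20],
})

# On string columns, + concatenates row by row (a scalar string is applied to every row)
df["full_name"] = df["first"] + " " + df["last"]

# On numeric columns, the same + operator adds row by row
df["a_plus_b"] = df["a"] + df["b"]
```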