Unsupervised Machine Learning Flashcards
Unlabeled Data
Data that is not organized in an easily identifiable manner, with no labels or tags describing each example, is known as unstructured or unlabeled data.
Goals of Unsupervised Learning
The goal is to learn about the data’s underlying structure and find out how different features relate to each other.
Name 2 Methodologies of unsupervised learning
- Recommendation Systems
- K-means models
Briefly describe a Recommendation system
Recommendation systems
- are a subclass of machine learning algorithms,
- can be either supervised or unsupervised, and
- offer relevant suggestions to users.
What is the goal of a recommendation system?
To quantify how similar one thing is to another, and to use this information to suggest a closely related option.
What is content-based filtering?
Content-based filtering is a type of recommendation system where comparisons are made based on the attributes of the content itself.
For example, attributes of a song you played are compared to attributes of other songs to determine similarity.
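A minimal sketch of the song comparison above, assuming each song is described by a small vector of numeric attributes. The attribute names, the song values, and the choice of cosine similarity as the similarity measure are illustrative assumptions, not part of the original card.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical song attributes: [tempo, energy, acousticness], each scaled to 0..1.
songs = {
    "played_song": [0.8, 0.9, 0.1],
    "song_a": [0.7, 0.8, 0.2],
    "song_b": [0.2, 0.1, 0.9],
}

played = songs["played_song"]
# Score every other song against the one the user played.
scores = {
    name: cosine_similarity(played, attrs)
    for name, attrs in songs.items()
    if name != "played_song"
}
best_match = max(scores, key=scores.get)
print(best_match)  # song_a
```

Only the attributes of the content are used here; no information about other users is needed.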
What are some benefits of content-based filtering?
The benefits include
- being easy to understand,
- recommending more of what a user likes,
- not needing other users’ information to work, and
- being able to map users and items in the same space to recommend things that are closest to a user’s typical preferences.
What are some drawbacks of content-based filtering?
- Always recommends more of the same
- Requires manual input of attributes
- Cannot recommend across content types
- Limited use cases
What is collaborative filtering?
Collaborative filtering is a type of recommendation system that uses the likes and dislikes of users to make recommendations.
It does not need to know anything about the content itself. All that matters is whether the user liked it.
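A minimal sketch of this idea, assuming a tiny table of likes (1) and dislikes (0). The user names, the item names, and the simple "fraction of agreeing ratings" similarity are hypothetical choices made for illustration; real systems typically use more robust measures.

```python
# Hypothetical like/dislike data: 1 = liked, 0 = disliked; a missing item = not rated.
ratings = {
    "ana":  {"song_x": 1, "book_y": 1, "film_z": 0},
    "ben":  {"song_x": 1, "book_y": 1, "film_w": 1},
    "cara": {"song_x": 0, "film_z": 1},
}

def agreement(user_a, user_b):
    """Fraction of commonly rated items on which two users agree."""
    shared = set(ratings[user_a]) & set(ratings[user_b])
    if not shared:
        return 0.0
    same = sum(ratings[user_a][item] == ratings[user_b][item] for item in shared)
    return same / len(shared)

def recommend(user):
    """Suggest items liked by the most similar other user that `user` has not rated."""
    others = [u for u in ratings if u != user]
    nearest = max(others, key=lambda u: agreement(user, u))
    return sorted(
        item for item, liked in ratings[nearest].items()
        if liked and item not in ratings[user]
    )

print(recommend("ana"))  # ['film_w']
```

The code never inspects what a song, book, or film is. It only compares who liked what, which is why it can recommend across content types.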
What are some benefits of collaborative filtering?
The benefits include
- recommending across content types,
- finding hidden correlations in the data, and
- not requiring tedious manual mapping.
What are some drawbacks of collaborative filtering?
Drawbacks include
- needing lots of data to even start getting useful results,
- requiring every user to give the system lots of data, and
- dealing with sparse data that has a lot of missing values.
What type of model is K-means and what does it do?
- an unsupervised learning model
- a partitioning algorithm
- organizes unlabeled data into k clusters
What is a Centroid?
- the central point of a cluster
- calculated as the mathematical mean of all points in the cluster
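As a small worked example, assuming points are stored as tuples of coordinates, the centroid is the coordinate-wise mean. The function name `centroid` and the sample points are illustrative.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
print(centroid(cluster))  # (3.0, 4.0)
```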
List the 4 steps to build K-means model
- Initialize k centroids
- Assign all points to the nearest centroid
- Recalculate the centroid of each cluster
- Repeat steps 2 and 3 until the algorithm converges
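The four steps can be sketched in plain Python. This is a minimal illustration, not a production implementation. It assumes that the initial centroids are k data points sampled with a fixed seed, that "converged" means the assignments stop changing, and that an empty cluster keeps its previous centroid. The function name `kmeans` and the sample points are hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal K-means sketch; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: initialize k centroids (assumption: sample k distinct data points).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change (converged).
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recalculate each centroid as the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

On this well-separated sample, the first three points end up in one cluster and the last three in the other. Which cluster receives index 0 depends on the random initialization.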
What is the difference between clustering and partitioning algorithms?
Clustering algorithms: outlying points can exist outside of the clusters.
Partitioning algorithms: all points must be assigned to a cluster.
In other words, K-means, as a partitioning algorithm, does not allow unassigned outliers.
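A short sketch of the difference, assuming two already-fitted centroids. The first function shows partitioning behavior: the nearest centroid always wins, so even a distant outlier gets a cluster. The second function, with its `max_distance` cutoff, is a hypothetical stand-in for algorithms that may leave outliers unassigned; it is not a specific named algorithm.

```python
import math

centroids = [(0.0, 0.0), (10.0, 10.0)]  # hypothetical, already-fitted centroids
outlier = (50.0, 50.0)

def assign_partition(point):
    """Partitioning behavior: always return the index of the nearest centroid."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def assign_with_cutoff(point, max_distance=5.0):
    """Illustrative contrast: return None when no centroid is within max_distance."""
    nearest = assign_partition(point)
    if math.dist(point, centroids[nearest]) > max_distance:
        return None
    return nearest

print(assign_partition(outlier))    # 1
print(assign_with_cutoff(outlier))  # None
```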