Introduction & Collaborative Filtering Flashcards
What is data mining?
Analyzing large datasets to discover patterns and insights that enable data-driven decision making.
What are the 3 reasons to mine data?
- Rapid growth,
- technology advancement and
- competitive advantage
What is rapid growth of data?
The volume of data generated is increasing exponentially due to:
- transactions,
- media,
- IoT, and
- cloud
What are the technology advancements? (3)
- Modern data storage,
- processing,
- large scale data analysis
What is competitive advantage? (3)
- Uncover trends,
- optimize operations,
- strategic advantage
What is the goal of supervised learning?
Predict a single target variable whose value is known in the training data
What are two methods in supervised learning?
Classification and regression (prediction)
What are the 2 goals of unsupervised learning?
- Segment data into groups;
- detect patterns where the target variable is unknown
What are four methods of unsupervised learning?
- Association rules & recommendation systems
- Cluster analysis
- Data & dimension reduction
- Data exploration/visualization
What are the 7 steps in data mining?
- Define business purpose
- Obtain data (random sampling)
- Explore, clean, pre-process (reduce data)
- Specify task and choose technique
- Iterative implementation and tuning
- Assess and compare results
- Deploy solution
What is collaborative filtering?
Technique to make predictions/recommendations by leveraging:
- preferences,
- behaviors, or
- interactions of groups/users
How does collaborative filtering operate?
Individuals with similar preferences in the past are likely to share preferences in the future
What are three real-world applications of recommendation systems?
- e-commerce platforms,
- streaming services, and
- social networks
What is association rule mining?
Focuses on discovering relationships or patterns between items in transactional data
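As a rough illustration of finding relationships between items in transactional data, the sketch below counts how often each pair of items appears in the same basket. The transactions and the helper name `pair_counts` are hypothetical, and pair counting is only one simple way to surface such patterns, not a full rule-mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one basket of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(transactions)
most_common_pair, times_together = counts.most_common(1)[0]
print(most_common_pair, times_together)
```

Pairs that co-occur in many baskets are candidates for a relationship worth examining further.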
What does collaborative filtering aim to provide, and how?
Personalized recommendations, by leveraging user interactions and similarities
What are the two ways to measure similarity?
- Pearson correlation and
- cosine similarity
What is the range of Pearson correlation?
-1 (perfect negative) to 1 (perfect positive)
What is the range of cosine similarity?
0 (no similarity) to 1 (perfect similarity) for non-negative data such as ratings; in general, cosine similarity ranges from -1 to 1
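The two similarity measures can be sketched in plain Python as follows. The function names and the rating vectors are illustrative assumptions, each vector is assumed to hold ratings for the same items in the same order, and neither vector may be all zeros (or constant, for Pearson), since that would divide by zero.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson_correlation(a, b):
    """Pearson correlation, computed as the cosine of mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)

# Hypothetical ratings of the same four items by three users.
alice = [5, 3, 4, 1]
bob = [4, 2, 3, 0]    # same pattern as alice, shifted down by 1
carol = [1, 3, 2, 5]  # opposite pattern to alice

print(pearson_correlation(alice, bob))    # close to 1
print(pearson_correlation(alice, carol))  # close to -1
print(cosine_similarity(alice, carol))    # between 0 and 1 for non-negative ratings
```

Pearson subtracts each user's mean rating first, so it ignores the fact that one user rates everything a point lower, while cosine on the raw ratings does not.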
Why can't collaborative filtering be used to create recommendations for new users or new items?
It suffers from the cold-start problem: new users and new items have no interaction history to compare against
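A small sketch of why cold start blocks similarity-based recommendations: a user with no ratings shares no co-rated items with anyone, so there is nothing to compute a similarity from. The user names, ratings, and helper functions here are hypothetical.

```python
# Hypothetical ratings: user -> {item: rating}. "new_user" has rated nothing yet.
ratings = {
    "ana": {"A": 5, "B": 3},
    "ben": {"A": 4, "C": 2},
    "new_user": {},
}

def co_rated(ratings, u, v):
    """Items rated by both users, in sorted order."""
    return sorted(set(ratings[u]) & set(ratings[v]))

def can_compare(ratings, u, v):
    """A similarity score needs at least one co-rated item."""
    return len(co_rated(ratings, u, v)) > 0

print(can_compare(ratings, "ana", "ben"))       # True: both rated item A
print(can_compare(ratings, "new_user", "ana"))  # False: no shared history
```

The same reasoning applies to a new item that no user has rated yet.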
What are the advantages of the clustering alternative?
Moves the large computations to an earlier, offline clustering step, so real-time recommendations are faster and cheaper
What are the disadvantages of the clustering alternative?
Less accurate recommendations
What is the item-based alternative?
- Find the items that were co-rated by users with the item of interest (its nearest neighbors, KNN), then
- recommend the most popular items among those similar items
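The item-based steps above can be sketched as follows. The ratings and the function name `item_based_recommend` are hypothetical, and "most popular" is assumed here to mean the item co-rated by the largest number of users; rating values are ignored in this simplified version.

```python
from collections import Counter

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "ana": {"A": 5, "B": 3, "C": 4},
    "ben": {"A": 4, "C": 2},
    "cy": {"B": 5, "D": 4},
    "dee": {"A": 3, "C": 5, "D": 1},
}

def item_based_recommend(ratings, item, top_n=1):
    """Recommend the items most often co-rated with the item of interest."""
    co_rated = Counter()
    for user_ratings in ratings.values():
        if item in user_ratings:
            # Every other item this user rated was co-rated with `item`.
            for other in user_ratings:
                if other != item:
                    co_rated[other] += 1
    return [name for name, _ in co_rated.most_common(top_n)]

print(item_based_recommend(ratings, "A"))
```

With these ratings, item C is co-rated with A by three users, so it is the top recommendation for someone interested in A.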
What is the user-based alternative?
Recommends items by identifying users with similar preferences and suggesting items those users rated
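A minimal sketch of the user-based approach, assuming cosine similarity over co-rated items and a single nearest neighbor. The ratings and function names are hypothetical, and real systems typically use several neighbors and weight their ratings.

```python
import math

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "ana": {"A": 5, "B": 1, "C": 4},
    "ben": {"A": 5, "B": 2, "C": 5, "D": 4},
    "cy": {"A": 1, "B": 5, "D": 2, "E": 5},
}

def cosine_on_shared(r_u, r_v):
    """Cosine similarity using only the items both users rated."""
    shared = set(r_u) & set(r_v)
    if not shared:
        return 0.0  # assumption: no shared items is treated as no similarity
    dot = sum(r_u[i] * r_v[i] for i in shared)
    norm_u = math.sqrt(sum(r_u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(r_v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def user_based_recommend(ratings, target):
    """Find the most similar user and list their items the target has not rated."""
    others = [u for u in ratings if u != target]
    nearest = max(others, key=lambda u: cosine_on_shared(ratings[target], ratings[u]))
    unseen = {i: r for i, r in ratings[nearest].items() if i not in ratings[target]}
    # Highest-rated unseen items first.
    return nearest, sorted(unseen, key=unseen.get, reverse=True)

print(user_based_recommend(ratings, "ana"))
```

Here ben's ratings track ana's most closely, so ana is recommended item D, the only item ben rated that ana has not.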