Class Six Flashcards
What is outlier detection in machine learning?
Outlier detection refers to the process of identifying data points or observations that deviate significantly from the majority of the dataset. Outliers can be caused by errors, anomalies, or rare events.
What are the advantages of outlier detection?
Advantages of outlier detection include identifying data quality issues, detecting anomalies or fraudulent activities, and improving the accuracy of predictive models by removing influential outliers.
What are the limitations of outlier detection?
Limitations of outlier detection methods include the subjectivity of defining what constitutes an outlier, the potential presence of masked outliers, and the impact of outliers on the overall analysis.
What is feature selection?
Feature selection is the process of selecting a subset of relevant features from a larger set of available features in the data. It aims to improve model performance, reduce overfitting, and enhance interpretability.
What are the advantages of feature selection?
Advantages of feature selection include improved model interpretability, reduced computational complexity, increased generalization performance, and the elimination of irrelevant or redundant features.
What are the limitations of feature selection?
Limitations of feature selection include potential loss of information if relevant features are removed, the challenge of selecting the optimal subset of features, and the sensitivity to feature interactions.
What is finding similar items?
Finding similar items involves identifying items that are similar or related to a given item based on their attributes, characteristics, or usage patterns. It is commonly used in recommendation systems and search engines.
What are the advantages of finding similar items?
Advantages of finding similar items include personalized recommendations, improved user experience, identification of related products or content, and the potential for cross-selling or upselling.
What are the limitations of finding similar items?
Limitations of finding similar items include the challenge of defining similarity metrics, scalability issues with large datasets, and the potential for serendipity problems where similar items may not always be relevant or desired.
What are recommender systems?
Recommender systems are information filtering systems that predict and suggest relevant items to users based on their preferences, historical behavior, or similarities with other users.
What are the different types of recommender systems?
Content Filtering: Assumes access to side information about items
Example: Pandora
Collaborative Filtering: Does not assume access to side information about items
* Example: Netflix
* Personal tastes are correlated:
* If Alice and Bob both like X and Alice likes Y then Bob is more likely to like Y.
In summary, collaborative filtering is based on finding similarities in user or item behavior to make recommendations, while content-based filtering relies on item attributes and user preferences for those attributes. Collaborative filtering looks at user-user or item-item relationships, while content-based filtering focuses on item characteristics.
What are the two types of collaborative filtering?
Neighborhood: Find neighbors based on similarity of movie preferences.
Latent Factor: Assume that both movies and users live in some low-dimensional space describing their properties.Recommend a movie based on its proximity to the user in the latent space.
What are the advantages of recommender systems?
Advantages of recommender systems include personalized recommendations, increased user engagement, improved customer satisfaction, and potential revenue growth through cross-selling and upselling.
What are the limitations of recommender systems?
Limitations of recommender systems include the cold-start problem for new users or items, the potential for echo chamber effects or limited diversity, and the need for data privacy and ethical considerations.
Issues:
* Diversity: How different are the recommendations?
* Persistence: How long should recommendations last?
* Trust: Tell user why you made a recommendation..
* Social recommendation: What did your friends watch?
* Freshness: people tend to get more excited about new/surprising things.
How can outlier detection be performed?
Outlier detection can be performed using various techniques such as statistical methods (e.g., z-score, modified z-score), distance-based approaches (e.g., k-nearest neighbors), or machine learning algorithms (e.g., isolation forest, one-class SVM).