Class Six Flashcards

Question 1

Q

What is outlier detection in machine learning?

Answer

A

Outlier detection refers to the process of identifying data points or observations that deviate significantly from the majority of the dataset. Outliers can be caused by measurement or data-entry errors, genuine anomalies (such as fraud), or rare but valid events.
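A minimal sketch of one common way to make "deviates significantly" concrete, assuming the z-score rule: a value is flagged when it lies more than `threshold` population standard deviations from the mean. The function name `zscore_outliers`, the threshold of 3, and the sample readings are illustrative assumptions.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing deviates
    return [i for i, x in enumerate(values) if abs(x - mu) / sigma > threshold]

# Hypothetical readings: twelve typical values and one extreme one.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 10, 9, 50]
print(zscore_outliers(readings))  # [12]
```

The extreme reading only just clears the threshold here: with the population standard deviation, no point in a sample of n values can have |z| above sqrt(n - 1), so on very small samples this rule can never flag anything.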

Question 2

Q

What are the advantages of outlier detection?

Answer

A

Advantages of outlier detection include identifying data quality issues, detecting anomalies or fraudulent activities, and improving the accuracy of predictive models by removing influential outliers.

Question 3

Q

What are the limitations of outlier detection?

Answer

A

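Limitations of outlier detection methods include the subjectivity of defining what constitutes an outlier (the method and threshold are often arbitrary choices), masking, where several outliers distort the statistics used for detection (for example, by inflating the mean and standard deviation) so that some of them go undetected, and the fact that deciding which points to flag, remove, or keep can substantially change the results of the overall analysis.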
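To illustrate masking, here is a sketch under illustrative assumptions: the z-score rule (threshold 3) is compared with the median-based modified z-score, 0.6745 * (x - median) / MAD, using the commonly cited cutoff of 3.5. The data are made up; the two identical extreme values inflate the mean and standard deviation enough to hide each other from the z-score, while the median and the median absolute deviation (MAD) barely move.

```python
from statistics import mean, median, pstdev

def zscore_outliers(values, threshold=3.0):
    """Flag values far from the mean, measured in standard deviations."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, x in enumerate(values) if abs(x - mu) / sigma > threshold]

def modified_zscore_outliers(values, threshold=3.5):
    """Flag values far from the median, measured in median absolute deviations."""
    med = median(values)
    mad = median([abs(x - med) for x in values])
    if mad == 0:
        return []  # more than half the values equal the median; this sketch does not handle that case
    return [i for i, x in enumerate(values) if 0.6745 * abs(x - med) / mad > threshold]

# Hypothetical data with two identical extreme values at indices 10 and 11.
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 50, 50]
print(zscore_outliers(data))           # []: the two outliers mask each other
print(modified_zscore_outliers(data))  # [10, 11]
```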

Question 4

Q

What is feature selection?

Answer

A

Feature selection is the process of selecting a subset of relevant features from a larger set of available features in the data. It aims to improve model performance, reduce overfitting, and enhance interpretability.

Question 5

Q

What are the advantages of feature selection?

Answer

A

Advantages of feature selection include improved model interpretability, reduced computational complexity, increased generalization performance, and the elimination of irrelevant or redundant features.

Question 6

Q

What are the limitations of feature selection?

Answer

A

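Limitations of feature selection include potential loss of information if relevant features are removed, the challenge of selecting the optimal subset of features (an exhaustive search over all subsets is usually infeasible), and sensitivity to feature interactions: features that look uninformative on their own may be highly predictive in combination, so methods that score features one at a time can discard them.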

Question 7

Q

What is finding similar items?

Answer

A

Finding similar items involves identifying items that are similar or related to a given item based on their attributes, characteristics, or usage patterns. It is commonly used in recommendation systems and search engines.
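A minimal sketch of attribute-based similar-item lookup, assuming each item is described by a numeric attribute vector and similarity is measured with cosine similarity (one common choice among many). The movie names and the [action, comedy, romance] scores are invented for illustration.

```python
from math import sqrt

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def most_similar(target, items, k=2):
    """Return the k items most similar to `target`, as (name, score) pairs."""
    scores = [(name, cosine(items[target], vec))
              for name, vec in items.items() if name != target]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

# Hypothetical attribute scores: [action, comedy, romance]
movies = {
    "Movie A": [1.0, 0.0, 0.2],
    "Movie B": [0.9, 0.1, 0.0],
    "Movie C": [0.0, 1.0, 0.8],
    "Movie D": [0.1, 0.9, 0.9],
}
print(most_similar("Movie A", movies, k=1))  # Movie B is the closest match
```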

Question 8

Q

What are the advantages of finding similar items?

Answer

A

Advantages of finding similar items include personalized recommendations, improved user experience, identification of related products or content, and the potential for cross-selling or upselling.

Question 9

Q

What are the limitations of finding similar items?

Answer

A

Limitations of finding similar items include the challenge of defining a suitable similarity metric, scalability issues with large datasets (naive all-pairs comparison grows quadratically with the number of items), and a lack of serendipity: items that are very similar to ones the user already knows may not always be relevant or desired.
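To show why the choice of similarity metric matters, here is a sketch with invented ratings (stored per item as {user: rating}) in which two reasonable metrics disagree about which item is most similar to item A: Jaccard similarity on the sets of users who rated each item, and cosine similarity on the rating values.

```python
from math import sqrt

# Hypothetical ratings, stored per item as {user: rating}.
ratings = {
    "A": {"u1": 5, "u2": 5},
    "B": {"u1": 5, "u2": 5, "u3": 1, "u4": 1},
    "C": {"u1": 1, "u2": 5},
}

def jaccard(a, b):
    """Overlap of the sets of users who rated each item (ignores rating values)."""
    users_a, users_b = set(a), set(b)
    union = users_a | users_b
    return len(users_a & users_b) / len(union) if union else 0.0

def cosine(a, b):
    """Cosine similarity of the rating vectors (missing ratings treated as 0)."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest(target, metric):
    """Name of the item most similar to `target` under the given metric."""
    others = [name for name in ratings if name != target]
    return max(others, key=lambda name: metric(ratings[target], ratings[name]))

print(nearest("A", jaccard))  # C: exactly the same two users rated both items
print(nearest("A", cosine))   # B: those users also gave B identical high ratings
```

Neither answer is wrong; each metric encodes a different assumption about what "similar" means.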

Question 10

Q

What are recommender systems?

Answer

A

Recommender systems are information filtering systems that predict and suggest relevant items to users based on their preferences, historical behavior, or similarities with other users.

Question 11

Q

What are the different types of recommender systems?

Answer

A

Content Filtering: Assumes access to side information about items (their attributes).
* Example: Pandora

Collaborative Filtering: Does not assume access to side information about items; it relies only on past user behavior such as ratings or purchases.
* Example: Netflix
* Personal tastes are correlated: if Alice and Bob both like X and Alice likes Y, then Bob is more likely to like Y.

In summary, collaborative filtering makes recommendations from similarities in user behavior (user-user or item-item relationships), while content-based filtering relies on item attributes and on a user's preferences for those attributes.
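The Alice and Bob intuition can be sketched as user-based collaborative filtering: predict a user's rating for an item as a similarity-weighted average of other users' ratings for it, with no item attributes involved. The ratings, the third user Carol, and the use of cosine similarity over co-rated items are illustrative assumptions; practical systems usually add refinements such as mean-centering ratings and keeping only the top-k neighbors.

```python
from math import sqrt

# Hypothetical 1-5 star ratings; Bob has not rated movie "Y" yet.
ratings = {
    "Alice": {"X": 5, "Y": 5, "Z": 1},
    "Bob":   {"X": 5, "Z": 1},
    "Carol": {"X": 1, "Y": 2, "Z": 5},
}

def similarity(a, b):
    """Cosine similarity computed only over the items both users have rated."""
    common = a.keys() & b.keys()
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = similarity(ratings[user], their_ratings)
        num += sim * their_ratings[item]
        den += abs(sim)
    return num / den if den else None

print(round(predict("Bob", "Y"), 2))  # 4.17
```

Bob's ratings of X and Z match Alice's exactly (similarity 1.0) and differ from Carol's (about 0.38), so the prediction leans toward Alice's 5 rather than Carol's 2.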

Question 12

Q

What are the two types of collaborative filtering?

Answer

A

Neighborhood: Find neighbors (similar users or movies) based on similarity of movie preferences, and recommend what those neighbors liked.
Latent Factor: Assume that both movies and users live in some low-dimensional space describing their properties. Recommend a movie based on its proximity to the user in the latent space.
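The latent factor idea can be sketched as matrix factorization trained with stochastic gradient descent: each user and each movie gets a k-dimensional vector, and a rating is predicted by their dot product, which plays the role of "proximity" in this sketch. The toy ratings, k = 2, the learning rate, the regularization strength, and the epoch count are arbitrary assumptions, not tuned values.

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn user vectors P[u] and movie vectors Q[i] so that P[u] . Q[i] approximates r."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating: dot product of the user and movie vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))

def rmse(P, Q, ratings):
    """Root-mean-square error on the given (user, movie, rating) triples."""
    return (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5

# Hypothetical (user, movie, rating) triples for 4 users and 4 movies; some pairs are unrated.
observed = [(0, 0, 5), (0, 1, 4), (0, 3, 1), (1, 0, 4), (1, 3, 1),
            (2, 1, 1), (2, 2, 5), (2, 3, 4), (3, 2, 4), (3, 3, 5)]

P, Q = factorize(observed, n_users=4, n_items=4)
print(round(rmse(P, Q, observed), 3))  # training error after fitting
print(round(predict(P, Q, 0, 2), 2))   # estimate for a pair that was never rated
```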

Question 13

Q

What are the advantages of recommender systems?

Answer

A

Advantages of recommender systems include personalized recommendations, increased user engagement, improved customer satisfaction, and potential revenue growth through cross-selling and upselling.

Question 14

Q

What are the limitations of recommender systems?

Answer

A

Limitations of recommender systems include the cold-start problem for new users or items, the potential for echo chamber effects or limited diversity, and the need for data privacy and ethical considerations.

Issues:
* Diversity: How different are the recommendations?
* Persistence: How long should recommendations last?
* Trust: Tell the user why you made a recommendation.
* Social recommendation: What did your friends watch?
* Freshness: People tend to get more excited about new or surprising things.

Question 15

Q

How can outlier detection be performed?

Answer

A

Outlier detection can be performed using various techniques such as statistical methods (e.g., z-score, modified z-score), distance-based approaches (e.g., k-nearest neighbors), or machine learning algorithms (e.g., isolation forest, one-class SVM).
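A minimal sketch of the distance-based approach, assuming the common rule of scoring each point by the distance to its k-th nearest neighbor, so isolated points get large scores. The points and k = 2 are invented; this brute-force version compares every pair of points, so it only suits small datasets.

```python
from math import dist  # Euclidean distance, Python 3.8+

def knn_outlier_scores(points, k=2):
    """Score each point by its distance to its k-th nearest other point."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    return scores

# Hypothetical 2-D points: a tight cluster plus one far-away point.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5), (8, 8)]
scores = knn_outlier_scores(points)
print(max(range(len(points)), key=scores.__getitem__))  # 5, the index of (8, 8)
```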

Question 16

Q

How can feature selection be done?

Answer

A

Feature selection can be done using methods like filter methods (e.g., correlation, chi-square test), wrapper methods (e.g., recursive feature elimination), or embedded methods (e.g., Lasso regression, decision tree-based feature importance).

Question 17

Q

How can recommender systems be implemented?

Answer

A

Recommender systems can be implemented using techniques like collaborative filtering (user-based or item-based), content-based filtering, hybrid methods combining multiple approaches, or more advanced techniques like matrix factorization and deep learning.
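As one concrete instance of the content-based option, here is a sketch that builds a user profile as the average attribute vector of the items the user liked and recommends the unseen items closest to it by cosine similarity. The song names, attributes, and averaging scheme are illustrative assumptions; a hybrid system would combine a score like this with a collaborative one.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def content_based_recommend(liked, catalog, k=1):
    """Score unseen items by cosine similarity to the average vector of the liked items."""
    dims = len(next(iter(catalog.values())))
    profile = [sum(catalog[name][d] for name in liked) / len(liked) for d in range(dims)]
    candidates = [(name, cosine(profile, vec))
                  for name, vec in catalog.items() if name not in liked]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in candidates[:k]]

# Hypothetical song attributes: [rock, jazz, electronic]
songs = {
    "Song A": [1.0, 0.0, 0.0],
    "Song B": [0.9, 0.1, 0.0],
    "Song C": [0.0, 1.0, 0.2],
    "Song D": [0.8, 0.0, 0.3],
    "Song E": [0.0, 0.2, 1.0],
}
print(content_based_recommend(["Song A", "Song B"], songs))  # ['Song D']
```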

Question 18

Q

What are L1 and L2 penalties?

Answer

A

L1 penalty (Lasso regularization): L1 penalty refers to the use of the absolute values of the coefficients as a regularization term in the loss function. Lasso regularization encourages sparsity by driving many coefficients to exactly zero, effectively performing feature selection. It is particularly useful when dealing with high-dimensional datasets and can lead to models that have a subset of important features.

L2 penalty (Ridge regularization): L2 penalty refers to the use of the squared magnitudes of the coefficients as a regularization term in the loss function. Ridge regularization helps control overfitting by penalizing large coefficient values and encourages the model to distribute the weight among all features rather than emphasizing a few. It tends to produce models with smaller but non-zero coefficients.

Both L1 (Lasso) and L2 (Ridge) regularization methods are widely used for controlling model complexity and preventing overfitting in machine learning. They have different effects on the resulting models, with L1 regularization promoting sparsity and feature selection, while L2 regularization encourages a more even distribution of weights across features.
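A small numeric sketch of the sparsity difference, assuming the special case of orthonormal features (X^T X = I), where both problems have closed-form solutions in terms of the ordinary least squares coefficients b_j. For the objectives 0.5 * ||y - Xw||^2 + lam * sum_j |w_j| (Lasso) and 0.5 * ||y - Xw||^2 + (lam / 2) * sum_j w_j^2 (Ridge), Lasso soft-thresholds each b_j by lam, while Ridge divides each b_j by (1 + lam). The coefficient values are made up.

```python
from math import copysign

def l1_penalty(w):
    return sum(abs(x) for x in w)  # sum of absolute values

def l2_penalty(w):
    return sum(x * x for x in w)   # sum of squared magnitudes

def lasso_orthonormal(ols, lam):
    """Soft-thresholding: shrink each coefficient toward 0 by lam, clipping at exactly 0."""
    return [copysign(abs(b) - lam, b) if abs(b) > lam else 0.0 for b in ols]

def ridge_orthonormal(ols, lam):
    """Proportional shrinkage: every coefficient is scaled by 1 / (1 + lam); none becomes 0."""
    return [b / (1 + lam) for b in ols]

ols = [3.0, -0.5, 0.2, 1.5]  # hypothetical unregularized coefficients
print(lasso_orthonormal(ols, lam=1.0))  # [2.0, 0.0, 0.0, 0.5]
print(ridge_orthonormal(ols, lam=1.0))  # [1.5, -0.25, 0.1, 0.75]
print(l1_penalty(ols), l2_penalty(ols))
```

The two small coefficients become exactly zero under Lasso, while Ridge keeps all four and only scales them down.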

(18 cards)