Unsupervised learning and model evaluation Flashcards
What is the main goal of unsupervised learning?
a) Predicting future outcomes
b) Grouping or finding patterns in data without labels
c) Testing hypotheses with known outputs
d) Optimizing supervised algorithms
b) Grouping or finding patterns in data without labels
What is cluster analysis?
a) A statistical method for finding correlations between variables
b) The process of partitioning data into subsets based on similarity
c) A supervised learning technique for predicting outcomes
d) A method for data cleaning
b) The process of partitioning data into subsets based on similarity
In what applications is clustering commonly used?
a) Fraud detection, image recognition, and customer segmentation
b) Regression tasks and time-series analysis
c) Hyperparameter tuning for machine learning models
d) Feature engineering for supervised tasks
a) Fraud detection, image recognition, and customer segmentation
Clustering is an unsupervised learning technique that groups data points into clusters based on their similarity. It is used when no labels are provided and the goal is to identify inherent patterns or groupings in the data.
What is a partitioning clustering method?
a) A hierarchical decomposition of data into clusters
b) Dividing data into a predefined number of non-overlapping clusters
c) Using density measures to find clusters of arbitrary shapes
d) Grouping based on sequential patterns in time-series data
b) Dividing data into a predefined number of non-overlapping clusters
What type of clustering method is k-means?
a) Density-based clustering
b) Grid-based clustering
c) Centroid-based partitioning
d) Hierarchical clustering
c) Centroid-based partitioning
What is the first step in the k-means clustering algorithm?
a) Calculate the distances between all data points
b) Assign data points randomly to clusters
c) Select k initial centroids from the dataset
d) Measure the density of each cluster
c) Select k initial centroids from the dataset
How does k-means determine cluster membership for a data point?
a) By assigning it to the closest centroid
b) By checking its density within a neighborhood
c) Based on predefined labels
d) Using hierarchical splitting of the dataset
a) By assigning it to the closest centroid
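The two cards above (choose k initial centroids, then assign each point to the closest centroid) can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function name `kmeans`, the sample points, and the fixed iteration count are assumptions made for the example, and the first k points are used as initial centroids only to keep the run reproducible (random selection is more common in practice).

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    # Step 1: select k initial centroids from the dataset.
    # Assumption for reproducibility: simply take the first k points.
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its closest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
```

Even though both starting centroids come from the same group, the alternating assignment and update steps separate the two groups after a couple of iterations on this data.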
What is a major limitation of k-means clustering?
a) It requires labeled data for training
b) It struggles with high-dimensional data and outliers
c) It only works for binary classification problems
d) It is computationally too slow for small datasets
b) It struggles with high-dimensional data and outliers
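One reason for the outlier sensitivity is that a k-means centroid is the mean of its members, and a mean is pulled strongly by extreme values. The one-dimensional numbers below are made up purely to illustrate that effect.

```python
def centroid(values):
    """Mean of a list of 1-D values, as used for a k-means centroid."""
    return sum(values) / len(values)

cluster = [1.0, 2.0, 3.0]          # hypothetical tight cluster
with_outlier = cluster + [100.0]   # same cluster plus one extreme point

clean_centroid = centroid(cluster)         # 2.0
skewed_centroid = centroid(with_outlier)   # 26.5
print(clean_centroid, skewed_centroid)
```

With the outlier included, the centroid sits far from every ordinary member of the cluster.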
What is frequent pattern mining?
a) Discovering itemsets, associations, and correlations that occur frequently in a dataset
b) A method to predict the next event in a sequence
c) Grouping data points into clusters
d) A supervised learning approach for regression
a) Discovering itemsets, associations, and correlations that occur frequently in a dataset
What is an example of a frequent pattern?
a) Clustering customers by demographics
b) Milk and bread frequently bought together in transactions
c) Predicting house prices based on features
d) Finding the best split in a decision tree
b) Milk and bread frequently bought together in transactions
What does “support” indicate in market basket analysis?
a) The number of items in a cluster
b) The fraction of transactions containing a specific itemset
c) The probability of an itemset occurring given another itemset
d) The number of times an item appears in the dataset
b) The fraction of transactions containing a specific itemset
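As a small worked example of support, the sketch below counts how many transactions contain an itemset and divides by the total. The five transactions and the helper name `support` are invented for illustration.

```python
# Hypothetical market-basket data: each transaction is a set of items.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"milk", "bread", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

milk_bread_support = support({"milk", "bread"}, transactions)
print(milk_bread_support)  # 3 of 5 transactions contain both items
```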
What does “lift” measure in association rules?
a) The total number of transactions in the dataset
b) The strength of an association relative to what would be expected if the items were independent
c) The distance between clusters
d) The time complexity of the rule-mining algorithm
b) The strength of an association relative to what would be expected if the items were independent
What is association rule mining?
a) Grouping items in a dataset into clusters
b) Predicting future sales trends
c) Finding relationships between items in transactional data
d) Labeling data points for supervised learning
c) Finding relationships between items in transactional data
Which metrics are commonly used to evaluate association rules?
a) Accuracy and precision
b) Support, confidence, and lift
c) Variance and standard deviation
d) Recall and specificity
b) Support, confidence, and lift
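The three metrics can be computed together for a single rule. The sketch below uses the standard definitions: confidence(A → B) = support(A and B) / support(A), and lift(A → B) = confidence(A → B) / support(B). The transactions and the rule milk → bread are assumptions chosen for the example.

```python
# Hypothetical market-basket data: each transaction is a set of items.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"milk", "bread", "butter"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

antecedent, consequent = {"milk"}, {"bread"}

rule_support = support(antecedent | consequent)    # both items together
confidence = rule_support / support(antecedent)    # estimate of P(bread | milk)
lift = confidence / support(consequent)            # compared with independence

print(rule_support, confidence, lift)
```

A lift above 1 suggests the items co-occur more often than independence would predict, and a lift below 1 suggests they co-occur less often. In this toy data the lift comes out slightly below 1, because bread is already in most baskets.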
What is the Apriori algorithm designed for?
a) Clustering high-dimensional datasets
b) Mining frequent itemsets in a dataset
c) Optimizing the parameters of a regression model
d) Predicting sequential patterns in time-series data
b) Mining frequent itemsets in a dataset
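A compact sketch of the level-wise idea behind Apriori follows: find frequent single items, join them into larger candidates, prune any candidate that has an infrequent subset, and repeat. The function name `apriori`, the `min_support` threshold, and the sample transactions are assumptions for this example; real implementations add optimizations that are omitted here.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return a dict mapping each frequent itemset (frozenset) to its support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        previous = set(current)
        candidates = [
            c for c in candidates
            if all(frozenset(s) in previous for s in combinations(c, k - 1))
        ]
        current = [c for c in candidates if support(c) >= min_support]
    return frequent

# Hypothetical market-basket data.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"milk", "bread", "butter"},
]
frequent_itemsets = apriori(transactions, min_support=0.5)
for itemset, s in frequent_itemsets.items():
    print(sorted(itemset), s)
```

The prune step relies on the property that every subset of a frequent itemset must also be frequent, which is what lets the search skip most candidate itemsets.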