Data Mining Flashcards
What is data mining?
A. Extracting minerals from the earth
B. Extracting useful information from large datasets
C. Creating new databases
D. None of the above
B
Which of the following is not a data mining task?
A. Classification
B. Clustering
C. Sorting
D. Association rule mining
C
Which technique is used for dimensionality reduction in data mining?
A. Principal Component Analysis (PCA)
B. Linear Regression
C. Support Vector Machines (SVM)
D. K-Means Clustering
A
What is association rule mining?
A. Finding patterns where the occurrence of one item or event implies the occurrence of another
B. Predicting future stock prices
C. Classifying data into multiple classes
D. None of the above
A
Which algorithm is used for frequent itemset generation in association rule mining?
A. Apriori
B. Decision Tree
C. k-Nearest Neighbors (k-NN)
D. Naive Bayes
A
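The pruning idea behind Apriori can be sketched in a few lines. This is a simplified illustration that stops at itemsets of size two, not a full implementation; the function name `frequent_itemsets`, the `min_support` count threshold, and the sample baskets are assumptions made for the example.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for frequent itemsets of size 1 and 2."""
    # Pass 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in set(t):
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {k: v for k, v in counts.items() if v >= min_support}

    # Pass 2: build candidate pairs only from frequent single items.
    # Apriori pruning: any superset of an infrequent itemset is infrequent.
    frequent_items = sorted(next(iter(k)) for k in frequent)
    for a, b in combinations(frequent_items, 2):
        pair = frozenset([a, b])
        n = sum(1 for t in transactions if pair <= set(t))
        if n >= min_support:
            frequent[pair] = n
    return frequent

# Hypothetical market-basket data.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
result = frequent_itemsets(baskets, min_support=2)
strict = frequent_itemsets(baskets, min_support=3)
```

With a threshold of 2, every single item and every pair is frequent; with a threshold of 3, only the single items remain.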
In clustering, which method requires predefining the number of clusters?
A. K-Means
B. Hierarchical Clustering
C. DBSCAN
D. Mean-Shift
A
What does the acronym “CRISP-DM” stand for in data mining methodology?
A. Comprehensive Regression for Intelligent Statistical Prediction in Data Mining
B. Cross-Industry Standard Process for Data Mining
C. Critical Review of Innovative Statistical Processes in Data Mining
D. None of the above
B
What is the primary goal of regression analysis in data mining?
A. Predicting categorical values
B. Predicting continuous values
C. Finding association rules
D. Classifying data
B
Which data mining technique is used for anomaly detection?
A. Classification
B. Clustering
C. Outlier detection
D. Association rule mining
C
Which evaluation metric is not used for assessing the performance of classification models?
A. Accuracy
B. Mean Squared Error (MSE)
C. Precision
D. Recall
B
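Accuracy, precision, and recall are all derived from counts of correct and incorrect class predictions, whereas MSE averages squared numeric errors and so belongs to regression. A minimal sketch of the three classification metrics follows; the function name and the sample labels are assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # true positive
        elif p == positive:
            fp += 1  # false positive
        elif t == positive:
            fn += 1  # false negative
        else:
            tn += 1  # true negative
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
```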
Which algorithm is a supervised learning method used for classification?
A. K-Means
B. Apriori
C. Decision Tree
D. PCA
C
What is the purpose of cross-validation in machine learning?
A. To divide data into training and testing sets
B. To reduce overfitting in models
C. To validate results using an independent dataset
D. To evaluate a model’s performance
D
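The splitting step of k-fold cross-validation can be sketched as follows: each sample is held out for testing exactly once, and the model is trained on the remaining folds. The function name `k_fold_indices` is hypothetical, and the sketch omits shuffling and the model-fitting loop.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if i < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds

folds = k_fold_indices(n_samples=10, k=3)
```

A model would be fitted on each `train` list and scored on the matching `test` list, and the k scores averaged.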
Which technique is used to handle missing values in a dataset?
A. Mean/Median imputation
B. Dropping rows with missing values
C. Using the mode value
D. All of the above
D
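The options above can be compared side by side in a small sketch. It assumes missing values are represented as `None`; the function name `impute` and the sample data are hypothetical.

```python
import statistics

def impute(values, strategy="mean"):
    """Fill or drop missing entries (None) using the chosen strategy."""
    observed = [v for v in values if v is not None]
    if strategy == "drop":
        return observed
    chooser = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[strategy]
    fill = chooser(observed)
    return [fill if v is None else v for v in values]

readings = [2.0, None, 4.0, 12.0, None]
colors = ["red", None, "blue", "red"]

by_mean = impute(readings, "mean")
by_median = impute(readings, "median")
dropped = impute(readings, "drop")
by_mode = impute(colors, "mode")
```

The mean is pulled toward the large value 12.0 while the median is not, which is one reason the median is often preferred for skewed data.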
In which phase of the data mining process are patterns and insights discovered?
A. Data cleaning
B. Data exploration
C. Pattern evaluation
D. Data modeling
C
Which algorithm is a popular ensemble learning method?
A. Random Forest
B. K-Means
C. Linear Regression
D. Gradient Descent
A
What does the term “overfitting” refer to in machine learning?
A. Model performs well on unseen data
B. Model learns noise and irrelevant details from the training data
C. Model has too few parameters
D. None of the above
B
Which algorithm is used for anomaly detection in time series data?
A. DBSCAN
B. ARIMA
C. K-Nearest Neighbors (KNN)
D. AdaBoost
B
Which technique is used to handle imbalanced datasets in classification?
A. Oversampling
B. Undersampling
C. SMOTE (Synthetic Minority Over-sampling Technique)
D. All of the above
D
Which type of data mining task involves assigning predefined categories to items?
A. Classification
B. Clustering
C. Association rule mining
D. Regression
A
Which method is used to measure the similarity between two data points in clustering?
A. Euclidean distance
B. Manhattan distance
C. Cosine similarity
D. All of the above
D
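The three measures can be written out directly. This is a minimal sketch for equal-length numeric vectors; the function names and sample points are chosen for illustration. Note that the first two are distances (smaller means more similar), while cosine similarity grows as vectors point in the same direction.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

p = (3.0, 4.0)
q = (6.0, 8.0)

d_euclid = euclidean(p, q)
d_manhattan = manhattan(p, q)
sim_cosine = cosine_similarity(p, q)
```

Here `q` is `p` scaled by two, so the two points are far apart by distance yet have the same direction.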
Which algorithm is a type of unsupervised learning?
A. Decision Tree
B. K-Means
C. Support Vector Machine (SVM)
D. Random Forest
B
Which technique is used for reducing the dimensionality of data while preserving its structure?
A. Principal Component Analysis (PCA)
B. Singular Value Decomposition (SVD)
C. Linear Discriminant Analysis (LDA)
D. All of the above
D
Which algorithm is used for collaborative filtering in recommendation systems?
A. Apriori
B. K-Means
C. Singular Value Decomposition (SVD)
D. Decision Tree
C
Which evaluation metric is used for regression models?
A. Accuracy
B. F1 Score
C. Mean Absolute Error (MAE)
D. Precision
C
Which technique is used for reducing the noise in data?
A. Outlier detection
B. Normalization
C. Feature selection
D. Smoothing
D
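One common form of smoothing is a simple moving average, sketched below. The function name, the window size, and the sample series are assumptions for illustration; other smoothing methods such as binning or exponential smoothing follow the same idea of replacing each value with a local summary.

```python
def moving_average(values, window):
    """Return the mean of each consecutive run of `window` values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical series with one noisy spike (30).
noisy = [10, 12, 8, 11, 30, 9, 10]
smoothed = moving_average(noisy, window=3)
```

The spike is spread over its neighbours, so the largest smoothed value is well below the original peak.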
Which algorithm is sensitive to the initialization of centroids?
A. K-Means
B. Decision Tree
C. Random Forest
D. AdaBoost
A
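The effect of initialization can be seen in a one-dimensional sketch of the K-Means loop (Lloyd's algorithm). The function name `kmeans_1d`, the data, and both sets of starting centroids are assumptions chosen to make the difference visible; the number of clusters is fixed by how many starting centroids are supplied.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Run K-Means on 1-D points from the given starting centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids

# Three well-separated groups: {0, 1}, {10, 11}, {20, 21}.
data = [0, 1, 10, 11, 20, 21]

good_start = kmeans_1d(data, [0, 10, 20])
poor_start = kmeans_1d(data, [0, 1, 10])
```

Starting with two centroids inside the first group leaves them stuck there, and the third centroid ends up covering both remaining groups. This is why implementations commonly rerun K-Means from several starting points.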
Which data mining task is used for discovering hidden patterns in large datasets?
A. Classification
B. Clustering
C. Regression
D. Association rule mining
B