Question 1

What is data mining? A. Extracting minerals from the earth B. Extracting useful information from large datasets C. Creating new databases D. None of the above

Accepted Answer

B

Question 2

Which of the following is not a data mining task? A. Classification B. Clustering C. Sorting D. Association rule mining

Accepted Answer

C

Question 3

Which technique is used for dimensionality reduction in data mining? A. Principal Component Analysis (PCA) B. Linear Regression C. Support Vector Machines (SVM) D. K-Means Clustering

Accepted Answer

A
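
PCA reduces dimensionality by finding the direction of greatest variance and projecting the data onto it. The sketch below is a minimal illustration, assuming small numeric data and only the Python standard library. It finds the first principal component by power iteration on the sample covariance matrix. The function name `pca_first_component` and the sample points are hypothetical. Practical work would normally use an SVD-based library routine instead.

```python
import math
import random


def pca_first_component(rows, iterations=200, seed=0):
    """Return (mean, unit direction, 1-D projections) for the top principal component.

    Sketch only: power iteration on the sample covariance matrix. Assumes at
    least two rows and that the rows are not all identical.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d), dividing by n - 1.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize; the vector
    # converges to the eigenvector with the largest eigenvalue (largest variance).
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Projecting onto the component turns each d-dimensional row into one number.
    projections = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return mean, v, projections


points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # illustrative data on y = 2x
mean, direction, projected = pca_first_component(points)
print(direction)  # close to (1, 2) / sqrt(5)
print(projected)  # four 1-D coordinates replacing the 2-D points
```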

Question 4

What is association rule mining? A. Finding patterns where one event leads to another B. Predicting future stock prices C. Classifying data into multiple classes D. None of the above

Accepted Answer

A
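
Association rules are usually scored by support and confidence. Support is how often the items appear together. Confidence is how often the consequent appears when the antecedent does. The sketch below computes both directly, assuming each transaction is a Python set of item names. The helper names and the basket data are made up for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


transactions = [  # hypothetical shopping baskets
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support({"bread", "milk"}, transactions))       # 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # 0.666... (2 of the 3 bread baskets)
```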

Question 5

Which algorithm is used for frequent itemset generation in association rule mining? A. Apriori B. Decision Tree C. k-Nearest Neighbors (k-NN) D. Naive Bayes

Accepted Answer

A
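
Apriori rests on one property: an itemset can only be frequent if all of its subsets are frequent. The level-wise search that follows from this can be sketched in plain Python. The function name `apriori`, the frozenset representation, and the basket data are illustrative assumptions, not any particular library's API.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support.

    Level-wise search: frequent (k-1)-itemsets are joined into k-item candidates, and a
    candidate is pruned if any of its (k-1)-subsets is infrequent (the Apriori property).
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1 candidates: every individual item.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count candidates and keep those meeting the threshold.
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


transactions = [  # hypothetical shopping baskets
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for itemset, s in sorted(apriori(transactions, 0.5).items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a 0.5 threshold, {milk, butter} appears in only one of the four baskets. The three-item candidate {bread, milk, butter} is therefore pruned without ever being counted.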

Question 6

In clustering, which method requires predefining the number of clusters? A. K-Means B. Hierarchical Clustering C. DBSCAN D. Mean-Shift

Accepted Answer

A
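
K-Means takes the number of clusters `k` as an input before it starts. It then alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. Its centroids start at random positions, which is also why the results depend on initialization (the sensitivity asked about in Question 27). The sketch below is a minimal version of Lloyd's algorithm, assuming numeric tuples and Python 3.8+ for `math.dist`. The function name and the sample points are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm sketch: k must be chosen before the algorithm runs."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids drawn from the data
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # illustrative data
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```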

Question 7

What does the acronym "CRISP-DM" stand for in data mining methodology? A. Comprehensive Regression for Intelligent Statistical Prediction in Data Mining B. Cross-Industry Standard Process for Data Mining C. Critical Review of Innovative Statistical Processes in Data Mining D. None of the above

Accepted Answer

B

Question 8

What is the primary goal of regression analysis in data mining? A. Predicting categorical values B. Predicting continuous values C. Finding association rules D. Classifying data

Accepted Answer

B
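
Regression predicts a continuous number rather than a class label. As a minimal sketch, the code below fits simple linear regression (one input feature) by ordinary least squares, assuming plain Python lists. The function name and the data points are illustrative.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one input feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)       # 2.0 1.0
print(slope * 5 + intercept)  # 11.0, a continuous prediction for x = 5
```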

Question 9

Which data mining technique is used for anomaly detection? A. Classification B. Clustering C. Outlier detection D. Association rule mining

Accepted Answer

C
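
One simple way to detect outliers is the interquartile-range rule (Tukey's fences). It flags any value that lies more than 1.5 × IQR below the first quartile or above the third. The sketch below assumes numeric data and Python 3.8+ for `statistics.quantiles`. The function name and the readings are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


readings = [10, 12, 11, 13, 12, 11, 95]  # hypothetical sensor readings
print(iqr_outliers(readings))  # [95]
```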

Question 10

Which evaluation metric is not used for assessing the performance of classification models? A. Accuracy B. Mean Squared Error (MSE) C. Precision D. Recall

Accepted Answer

B
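
Accuracy, precision, and recall compare predicted class labels with true labels. MSE instead measures squared numeric error, which suits continuous regression targets. The sketch below computes the three classification metrics for a binary problem, assuming labels are encoded as 1 (positive) and 0 (negative). The function name and the label lists are illustrative.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for a binary classifier's predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))  # (0.5, 0.5, 0.666...)
```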

Question 11

Which algorithm is a supervised learning method used for classification? A. K-Means B. Apriori C. Decision Tree D. PCA

Accepted Answer

C
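
A decision tree is supervised because it learns its splits from labeled examples. One common split criterion is Gini impurity, which CART uses. The sketch below shows a single split step on one numeric feature, not a full tree builder. The function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(xs, labels):
    """Try every split `x <= t` on one numeric feature and return (t, weighted Gini)
    for the split whose two children are purest."""
    n = len(xs)
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [y for x, y in zip(xs, labels) if x <= t]
        right = [y for x, y in zip(xs, labels) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical labeled training data: the labels are what make this supervised.
hours = [1, 2, 3, 10, 11, 12]
bought = ["no", "no", "no", "yes", "yes", "yes"]
print(best_threshold(hours, bought))  # (3, 0.0)
```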

Question 12

What is the purpose of cross-validation in machine learning?

A. To divide data into training and testing sets
B. To reduce overfitting in models
C. To validate results using an independent dataset
D. To evaluate a model’s performance

Accepted Answer

B
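
In k-fold cross-validation, each fold is held out once as the test set while the model trains on the remaining folds. The per-fold scores are then averaged to estimate performance on unseen data, which is how overfitting gets detected. The sketch below assumes a hypothetical `fit(xs, ys)` function that returns a model and a `score(model, xs, ys)` function. All the names are illustrative, not a library API.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds (no shuffling)."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validate(xs, ys, k, fit, score):
    """Average a score over k folds; each fold is held out once as the test set."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(scores) / len(scores)


# Hypothetical baseline "model" that ignores xs and predicts the training mean.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def mean_absolute_error(model, xs, ys):
    return sum(abs(y - model) for y in ys) / len(ys)


xs = list(range(6))
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(cross_validate(xs, ys, 3, fit_mean, mean_absolute_error))  # about 2.167
```

The folds here are contiguous and unshuffled, so sorted data like this produces pessimistic scores. Shuffling the indices before splitting is common practice.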

Question 13

Which technique is used to handle missing values in a dataset? A. Mean/Median imputation B. Dropping rows with missing values C. Using the mode value D. All of the above

Accepted Answer

D
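
Each of the listed strategies is easy to express in code. The sketch below assumes missing values are represented as `None` and uses the standard `statistics` module. The function names and sample values are illustrative.

```python
import statistics


def impute(column, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in column if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in column]


def drop_incomplete_rows(rows):
    """Listwise deletion: keep only rows with no missing (None) values."""
    return [row for row in rows if all(v is not None for v in row)]


values = [1, None, 3, 3, None, 8]  # hypothetical column with two gaps
print(impute(values, "mean"))    # [1, 3.75, 3, 3, 3.75, 8]
print(impute(values, "median"))  # [1, 3.0, 3, 3, 3.0, 8]
print(impute(values, "mode"))    # [1, 3, 3, 3, 3, 8]
print(drop_incomplete_rows([(1, 2), (None, 3), (4, None), (5, 6)]))  # [(1, 2), (5, 6)]
```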

Question 14

In which phase of the data mining process are patterns and insights discovered? A. Data cleaning B. Data exploration C. Pattern evaluation D. Data modeling

Accepted Answer

C

Question 15

Which algorithm is a popular ensemble learning method? A. Random Forest B. K-Means C. Linear Regression D. Gradient Descent

Accepted Answer

A

Question 17

What does the term "overfitting" refer to in machine learning? A. Model performs well on unseen data B. Model learns noise and irrelevant details from the training data C. Model has too few parameters D. None of the above

Accepted Answer

B

Question 18

Which algorithm is used for anomaly detection in time series data? A. DBSCAN B. ARIMA C. K-Nearest Neighbors (KNN) D. AdaBoost

Accepted Answer

B

Question 19

Which technique is used to handle imbalanced datasets in classification? A. Oversampling B. Undersampling C. SMOTE (Synthetic Minority Over-sampling Technique) D. All of the above

Accepted Answer

D
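
The three techniques balance the classes in different ways:

- Random oversampling duplicates minority samples.
- Random undersampling discards majority samples.
- SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbors.

The sketch below is a simplified illustration using 2-D tuples and the standard library. It omits details of the original SMOTE algorithm and is not the API of any resampling library. All names and data are hypothetical.

```python
import math
import random


def random_oversample(minority, target_size, rng):
    """Duplicate randomly chosen minority samples until there are target_size of them."""
    return minority + [rng.choice(minority) for _ in range(target_size - len(minority))]


def random_undersample(majority, target_size, rng):
    """Keep a random subset of target_size majority samples (without replacement)."""
    return rng.sample(majority, target_size)


def smote(minority, n_synthetic, k=2, rng=None):
    """SMOTE-style synthetic points: interpolate between a minority sample and one of
    its k nearest minority neighbors. Simplified sketch; assumes len(minority) > k."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbors of `base` among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor, in [0, 1)
        synthetic.append(tuple(b + gap * (nb - b) for b, nb in zip(base, neighbor)))
    return synthetic


rng = random.Random(42)
majority = [(float(x), float(x % 3)) for x in range(20)]  # hypothetical majority class
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
print(len(random_oversample(minority, 20, rng)))  # 20
print(len(random_undersample(majority, 4, rng)))  # 4
balanced_minority = minority + smote(minority, 16, rng=rng)  # 20 samples, like the majority
```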

Question 20

Which type of data mining task involves assigning predefined categories to items? A. Classification B. Clustering C. Association rule mining D. Regression

Accepted Answer

A

Question 21

Which method is used to measure the similarity between two data points in clustering? A. Euclidean distance B. Manhattan distance C. Cosine similarity D. All of the above

Accepted Answer

D
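
The three measures can be written in a few lines each. Euclidean and Manhattan are distances, so smaller means more similar. Cosine similarity compares direction only, so larger means more similar. The sketch below assumes equal-length numeric tuples, and the function names are illustrative.

```python
import math


def euclidean(a, b):
    """Straight-line distance: sqrt of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1 = same direction); ignores magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


p, q = (1, 2), (4, 6)
print(euclidean(p, q))                    # 5.0
print(manhattan(p, q))                    # 7
print(cosine_similarity((1, 2), (2, 4)))  # about 1.0: same direction, different length
```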

Question 22

Which algorithm is a type of unsupervised learning? A. Decision Tree B. K-Means C. Support Vector Machine (SVM) D. Random Forest

Accepted Answer

B

Question 23

Which technique is used for reducing the dimensionality of data while preserving its structure?

A. Principal Component Analysis (PCA)
B. Singular Value Decomposition (SVD)
C. Linear Discriminant Analysis (LDA)
D. All of the above

Accepted Answer

D

Question 24

Which algorithm is used for collaborative filtering in recommendation systems? A. Apriori B. K-Means C. Singular Value Decomposition (SVD) D. Decision Tree

Accepted Answer

C

Question 25

Which evaluation metric is used for regression models? A. Accuracy B. F1 Score C. Mean Absolute Error (MAE) D. Precision

Accepted Answer

C
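
MAE and MSE both measure how far numeric predictions fall from the true values. Accuracy, F1, and precision, by contrast, need discrete class labels. The sketch below assumes plain lists of numbers, and the function names and values are illustrative.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference; penalizes large errors more heavily than MAE."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 2.5]
y_pred = [2.5, 5.0, 4.0]
print(mean_absolute_error(y_true, y_pred))  # 0.666... (errors 0.5, 0, 1.5)
print(mean_squared_error(y_true, y_pred))   # 0.833...
```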

Question 26

Which technique is used for reducing the noise in data? A. Outlier detection B. Normalization C. Feature selection D. Smoothing

Accepted Answer

D
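
Two common smoothing methods are binning, where values are sorted, grouped into bins, and replaced by each bin's mean, and the moving average for ordered series. The sketch below assumes numeric lists. The function names and sample values are illustrative.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort, split into equal-frequency bins of bin_size, and replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


def moving_average(series, window):
    """Simple moving average over a sliding window; returns len(series) - window + 1 values."""
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # illustrative values
print(smooth_by_bin_means(prices, 3))  # [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
print(moving_average([1, 5, 2, 8, 3], 3))
```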

Question 27

Which algorithm is sensitive to the initialization of centroids? A. K-Means B. Decision Tree C. Random Forest D. AdaBoost

Accepted Answer

A

Question 28

Which data mining task is used for discovering hidden patterns in large datasets? A. Classification B. Clustering C. Regression D. Association rule mining

Accepted Answer

B

Question 29

Which technique is used in text mining to represent text as numerical vectors? A. One-Hot Encoding B. Bag-of-Words C. TF-IDF D. All of the above

Accepted Answer

D
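
The three options produce different vectors:

- One-hot encoding represents a single word as a vector over the vocabulary.
- Bag-of-Words counts each vocabulary word in a document.
- TF-IDF reweights those counts so that words appearing in every document carry little weight.

The sketch below uses a whitespace tokenizer, raw counts for TF, and IDF = ln(N / df). This is one textbook variant; libraries such as scikit-learn use smoothed variants by default, so their numbers will differ. All function names and documents are illustrative.

```python
import math
from collections import Counter


def tokenize(doc):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return doc.lower().split()


def build_vocabulary(docs):
    return sorted({word for doc in docs for word in tokenize(doc)})


def one_hot(word, vocab):
    """One position per vocabulary word; only the given word's position is 1."""
    return [1 if w == word else 0 for w in vocab]


def bag_of_words(doc, vocab):
    """Count of each vocabulary word in the document (word order is discarded)."""
    counts = Counter(tokenize(doc))
    return [counts[w] for w in vocab]


def tf_idf(docs, vocab):
    """Raw-count TF times IDF = ln(N / df); words found in every document get weight 0."""
    n = len(docs)
    bows = [bag_of_words(doc, vocab) for doc in docs]
    df = [sum(1 for bow in bows if bow[j] > 0) for j in range(len(vocab))]
    return [[bow[j] * math.log(n / df[j]) for j in range(len(vocab))] for bow in bows]


docs = ["the cat sat", "the dog sat", "the dog barked"]
vocab = build_vocabulary(docs)       # ['barked', 'cat', 'dog', 'sat', 'the']
print(one_hot("dog", vocab))         # [0, 0, 1, 0, 0]
print(bag_of_words(docs[0], vocab))  # [0, 1, 0, 1, 1]
print(tf_idf(docs, vocab)[0])        # 'the' gets 0.0; 'cat' (only in doc 0) gets ln(3)
```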
