Data Mining and Supervised Learning Flashcards
What is CRISP-DM?
a) A software for data mining
b) A standard methodology for conducting data mining projects
c) A programming language for machine learning
d) A tool for data visualization
b) A standard methodology for conducting data mining projects
What are the six phases of the CRISP-DM process?
a) Data Cleaning, Analysis, Visualization, Modelling, Evaluation, Presentation
b) Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation, Deployment
c) Problem Identification, Data Gathering, Feature Selection, Modelling, Testing, Reporting
d) Data Collection, Cleaning, Modelling, Testing, Evaluation, Delivery
b) Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation, Deployment
Which phase of CRISP-DM focuses on understanding project objectives from a business perspective?
a) Data Preparation
b) Business Understanding
c) Modelling
d) Evaluation
b) Business Understanding
What is the main goal of the Deployment phase in CRISP-DM?
a) To build machine learning models
b) To ensure the results of data mining are used in decision-making
c) To clean and preprocess data
d) To explore data relationships
b) To ensure the results of data mining are used in decision-making
What is machine learning?
a) A process of explicitly programming computers to solve problems
b) A field of study where computers learn from data without being explicitly programmed
c) A visualization method for large datasets
d) A data cleaning tool
b) A field of study where computers learn from data without being explicitly programmed
What is an example of a machine learning application?
a) Predicting stock market trends
b) Cleaning unstructured data
c) Managing relational databases
d) Archiving data
a) Predicting stock market trends
What is supervised learning?
a) A technique to cluster unlabeled data
b) A learning process using labeled data to predict outcomes for unseen data
c) A statistical method for creating decision trees
d) A method for dimensionality reduction in datasets
b) A learning process using labeled data to predict outcomes for unseen data
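Example: a minimal sketch of supervised learning, assuming a 1-nearest-neighbour rule as the learner. The data, the size labels, and the function name predict_1nn are hypothetical and chosen only for illustration. The labeled observations act as the training set, and the query is an unseen observation whose outcome is predicted.

```python
import math

# Hypothetical labeled training data: (height_cm, weight_kg) with a size label.
train_features = [(150.0, 50.0), (160.0, 55.0), (180.0, 85.0), (190.0, 95.0)]
train_labels = ["small", "small", "large", "large"]


def predict_1nn(features, labels, query):
    """Return the label of the training observation closest to `query`."""
    distances = [math.dist(row, query) for row in features]
    nearest = distances.index(min(distances))
    return labels[nearest]


# Predict the label for an observation that was not in the training data.
prediction = predict_1nn(train_features, train_labels, (175.0, 80.0))
print(prediction)
```

Any other supervised algorithm could stand in for the nearest-neighbour rule; the point is only that labels are available during training.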
What type of data does supervised learning require?
a) Only categorical data
b) Data without labels
c) Data with both features and labels
d) Data with missing values
c) Data with both features and labels
What is unsupervised learning?
a) A method that uses labeled data to make predictions
b) An approach that analyzes and clusters unlabeled data
c) A process for supervised classification tasks
d) A technique to preprocess data for supervised models
b) An approach that analyzes and clusters unlabeled data
Which of these is an example of unsupervised learning?
a) Predicting house prices based on past data
b) Clustering customers based on purchase behavior
c) Sentiment analysis using labeled data
d) Fraud detection with supervised models
b) Clustering customers based on purchase behavior
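Example: a rough sketch of clustering customers without labels, assuming a very small one-dimensional k-means with two clusters. The spend figures, the starting centers, and the function name kmeans_1d are hypothetical. No label is supplied; the groups emerge from the data alone.

```python
# Hypothetical monthly spend per customer. There are no labels.
spend = [12.0, 15.0, 14.0, 95.0, 102.0, 99.0]


def kmeans_1d(values, centers, iterations=10):
    """Tiny k-means for one-dimensional data; `centers` is updated in place."""
    assignments = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: index of the nearest center for each value.
        assignments = [
            min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            for v in values
        ]
        # Update step: move each center to the mean of its assigned values.
        for c in range(len(centers)):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assignments, centers


# Start from the smallest and largest value as initial centers (an assumption).
assignments, centers = kmeans_1d(spend, centers=[min(spend), max(spend)])
print(assignments)
print(centers)
```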
What are features in machine learning?
a) The rows in a dataset
b) Attributes or input variables describing each observation
c) The process of splitting data
d) The final predictions of a model
b) Attributes or input variables describing each observation
What is a label in machine learning?
a) A categorical or continuous value that the model is meant to predict for an observation
b) An algorithm used for training
c) A type of data preprocessing method
d) The summary statistic of a dataset
a) A categorical or continuous value that the model is meant to predict for an observation
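Example: a small sketch of how features and a label might be separated in practice. The housing records and the column names are hypothetical. Each row is one observation, the feature columns are the inputs, and the label column is the value to be predicted.

```python
# Hypothetical housing records; each dict is one observation (a row).
rows = [
    {"area_m2": 50, "rooms": 2, "price": 150000},
    {"area_m2": 80, "rooms": 3, "price": 240000},
    {"area_m2": 120, "rooms": 4, "price": 360000},
]

feature_names = ["area_m2", "rooms"]  # input variables
label_name = "price"                  # value to be predicted

# X holds one feature vector per observation, y holds the matching labels.
X = [[row[name] for name in feature_names] for row in rows]
y = [row[label_name] for row in rows]

print(X)
print(y)
```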
What is the purpose of cross-validation in machine learning?
a) To find the best features in a dataset
b) To estimate how well a model will perform on unseen data
c) To split data into training and testing sets
d) To preprocess the raw data
b) To estimate how well a model will perform on unseen data
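Example: a minimal sketch of k-fold cross-validation, assuming a one-parameter line through the origin as the model and mean absolute error as the score. The data and the function names fit_slope and k_fold_mae are hypothetical. Each fold is held out once, the model is fitted on the remaining observations, and the held-out errors are averaged.

```python
# Hypothetical toy data: one feature x and a continuous label y per observation.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.1]


def fit_slope(x_train, y_train):
    """Least-squares slope for a line through the origin (y = slope * x)."""
    numerator = sum(x * y for x, y in zip(x_train, y_train))
    denominator = sum(x * x for x in x_train)
    return numerator / denominator


def k_fold_mae(xs, ys, k):
    """Average mean absolute error over k held-out folds (assumes k <= len(xs))."""
    fold_errors = []
    for fold in range(k):
        # Every k-th observation, starting at `fold`, is held out for testing.
        test_idx = set(range(fold, len(xs), k))
        x_train = [x for i, x in enumerate(xs) if i not in test_idx]
        y_train = [y for i, y in enumerate(ys) if i not in test_idx]
        slope = fit_slope(x_train, y_train)
        # Score the model only on observations it did not see during fitting.
        errors = [abs(ys[i] - slope * xs[i]) for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k


cv_mae = k_fold_mae(xs, ys, k=3)
print(round(cv_mae, 3))
```

Because every observation is used for testing exactly once, the averaged error is usually a steadier estimate than a single train/test split.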
What is a model in machine learning?
a) A tool for visualizing data
b) A representation of learned patterns used to make predictions
c) A data cleaning technique
d) A preprocessing step for supervised learning
b) A representation of learned patterns used to make predictions
What are the two main types of supervised learning tasks?
a) Clustering and Regression
b) Classification and Regression
c) Clustering and Dimensionality Reduction
d) Regression and Association Rule Learning
b) Classification and Regression
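Example: a short sketch contrasting the two task types on the same feature, assuming a simple least-squares line for regression and a single threshold for classification. The study-hours data, the pass mark of 60, and the function names are hypothetical. Regression predicts a continuous value, while classification predicts a category.

```python
# Hypothetical data: hours studied (feature) with two kinds of label.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
exam_score = [52.0, 58.0, 65.0, 71.0, 78.0]   # continuous label: regression
passed = [s >= 60.0 for s in exam_score]      # categorical label: classification

# Regression: fit score = intercept + slope * hours by least squares.
mean_x = sum(hours) / len(hours)
mean_y = sum(exam_score) / len(exam_score)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, exam_score)) / sum(
    (x - mean_x) ** 2 for x in hours
)
intercept = mean_y - slope * mean_x


def predict_score(h):
    """Regression output: a continuous exam score."""
    return intercept + slope * h


# Classification: threshold halfway between the last fail and the first pass.
max_fail = max(h for h, p in zip(hours, passed) if not p)
min_pass = min(h for h, p in zip(hours, passed) if p)
threshold = (max_fail + min_pass) / 2


def predict_pass(h):
    """Classification output: a category (True for pass, False for fail)."""
    return h >= threshold


print(predict_score(6.0))
print(predict_pass(6.0))
```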