Data Mining 1 Flashcards
Series of tasks, activities, or operations to achieve a goal or an outcome
Process
Combination of hardware and software to facilitate or automate processes
Technology
Discrete measurement, fact, or observation representing a real-world process
Data
The mathematical discipline that studies methods of collecting, analyzing, and interpreting data
Statistics
Entire collection of items of interest
Population
Subset or subcollection of the population
Sample
Two scopes of data
Sample & Population
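A minimal sketch of the population/sample distinction, assuming a made-up list of customer ages as the population. The data, the seed, and the sample size are illustrative assumptions, not part of the flashcards.

```python
import random
import statistics

# Hypothetical population: the ages of every customer of a small business.
population = [23, 35, 41, 29, 52, 38, 47, 31, 26, 44]

random.seed(0)  # fixed seed so the sketch is repeatable
# A sample is a subset of the population, drawn here without replacement.
sample = random.sample(population, k=4)

population_mean = statistics.mean(population)  # describes the whole population
sample_mean = statistics.mean(sample)          # estimate based on the sample only
```

The sample mean usually differs from the population mean; statistics studies how to reason about that gap.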
Logic is built based on business rules
Traditional Rule-Based AI
Logic is learned by training a model on data
Machine Learning
Input data, and sometimes output data, are provided to a machine, which builds its logic using mathematical rules
Machine Learning
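The contrast between rule-based logic and learned logic can be sketched as follows. The order amounts, the 1000 threshold, and the function names are hypothetical, and the "learning" step is a deliberately tiny one-feature decision stump that assumes the two classes do not overlap.

```python
# Rule-based: a person writes the logic directly from a business rule.
def rule_based_is_large_order(amount):
    return amount > 1000  # hypothetical threshold chosen by the business


# Machine learning: the threshold is derived from labelled examples instead.
def learn_threshold(amounts, labels):
    """Return the midpoint between the two classes (assumes they are separable)."""
    largest_negative = max(a for a, y in zip(amounts, labels) if not y)
    smallest_positive = min(a for a, y in zip(amounts, labels) if y)
    return (largest_negative + smallest_positive) / 2


# Hypothetical training data: order amounts and whether each was "large".
amounts = [120, 300, 450, 900, 1500, 2200]
labels = [False, False, False, True, True, True]
threshold = learn_threshold(amounts, labels)


def learned_is_large_order(amount):
    return amount > threshold
```

If the training data changes, the learned threshold changes with it, while the rule-based version stays the same until someone edits the code.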
Machine learning algorithms in which the training data includes both inputs and their known outputs
Supervised Machine Learning
Inputs are called
Feature values
Outputs are called
Label values
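A small sketch of how a set of records might be separated into feature values and label values. The apartment records and the field names (`size_sqm`, `rooms`, `price`) are invented for illustration.

```python
# Hypothetical records: each one mixes inputs and the known output.
records = [
    {"size_sqm": 30, "rooms": 1, "price": 2.1},
    {"size_sqm": 55, "rooms": 2, "price": 3.8},
    {"size_sqm": 80, "rooms": 3, "price": 5.5},
]

# Feature values: the inputs the model is given.
features = [[r["size_sqm"], r["rooms"]] for r in records]

# Label values: the outputs the model should learn to predict.
labels = [r["price"] for r in records]
```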
The label predicted by the model is a numeric value
Regression
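As an illustration of regression, the sketch below fits a straight line with ordinary least squares, one common regression technique among many. The sizes and prices are made-up values that lie exactly on a line, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: apartment size (feature) and price (numeric label).
sizes = [20, 30, 40, 50]
prices = [50, 70, 90, 110]

slope, intercept = fit_line(sizes, prices)

# The prediction for an unseen apartment is a number, not a category.
predicted_price = slope * 60 + intercept
```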
The model predicts whether or not a record belongs to a specific class or category
Binary Classification
The model predicts which one of multiple classes or categories a record belongs to
Multiclass Classification
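The difference between binary and multiclass classification is only the number of possible labels. The sketch below uses a nearest-centroid classifier on a single feature as a stand-in for a real model; the data, labels, and function names are assumptions made for illustration.

```python
from collections import defaultdict


def fit_centroids(values, labels):
    """Compute the mean feature value of each class."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return {label: sum(vs) / len(vs) for label, vs in groups.items()}


def predict(centroids, value):
    """Return the class whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


# Binary classification: two possible labels (hypothetical months of inactivity).
binary_model = fit_centroids(
    [0, 1, 2, 8, 9, 10],
    ["stays", "stays", "stays", "cancels", "cancels", "cancels"],
)

# Multiclass classification: three possible labels (hypothetical sentiment scores).
multiclass_model = fit_centroids(
    [-0.9, -0.7, 0.0, 0.1, 0.8, 0.9],
    ["negative", "negative", "neutral", "neutral", "positive", "positive"],
)
```

For example, `predict(binary_model, 7)` picks between two classes, while `predict(multiclass_model, 0.2)` picks among three.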
Training data consists only of input without any known output
Unsupervised Machine Learning
The model identifies similarities between observations based on their features and groups them into discrete clusters
Clustering
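Clustering can be sketched with k-means on a single feature. K-means is one clustering algorithm among several; the ages, the starting centroids, and the function name `k_means_1d` are assumptions chosen for illustration. No labels are supplied, so the groups come only from the feature values.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group one-dimensional values around the given starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)), key=lambda i: abs(value - centroids[i])
            )
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customer ages with no known output.
ages = [18, 20, 22, 45, 47, 50]
centroids, clusters = k_means_1d(ages, centroids=[18, 50])
```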
A model that groups existing customers into clusters based on age, location, gender, social media usage, and purchasing behavior.
Clustering
A model that classifies whether a social media post is positive, negative, or neutral.
Multiclass Classification
A model that predicts whether a customer will cancel their subscription.
Binary Classification
A model that predicts the price of an apartment based on its size, number of rooms, barangay, and construction date.
Regression
Data used to train the model; the algorithm learns patterns from it
Training Data
Data used to evaluate the model on records it did not see during training
Test Data
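A minimal sketch of splitting records into training data and test data. The helper `train_test_split` here is a hypothetical stdlib-only function written for this example (not the scikit-learn function of the same name), and the 25% test fraction is an arbitrary assumption.

```python
import random


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and hold out a fraction for testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    test = shuffled[:n_test]    # used only to evaluate the model
    train = shuffled[n_test:]   # used to fit the model
    return train, test


# Hypothetical dataset of 20 record identifiers.
records = list(range(20))
train, test = train_test_split(records)
```

Keeping the two sets disjoint is the point of the split: the test data should contain only records the model has not learned from.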