Introduction to Data Mining Flashcards
4 Reasons for Using Data Mining
Existing Solutions
Complex Problems
Fluctuating Environments
Getting Insights
4 Motivating Challenges of Data Mining
Large-scale
High dimensional
Heterogeneous and Complex
Distributed
2 Types of Data Mining Tasks
Predictive Methods
Descriptive Methods
8 Main Challenges of Data Mining
Insufficient Quantity of Training Data
Irrelevant Features
Nonrepresentative Training Data
Poor-Quality Data: Outliers, Missing Values, Noise, Errors
Overfitting the Training Data
Underfitting the Training Data
Hyperparameter Tuning and Model Selection
Testing and Validating
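Example (testing and validating): a minimal sketch of a hold-out split, assuming the data is a simple list of rows. The function name `train_test_split`, the 80/20 ratio, and the fixed seed are illustrative choices, not part of the flashcards. A model is fitted on the training part only and scored on the held-out part; a large gap between the two scores is one common sign of overfitting.

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(rows)              # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(10))                 # hypothetical dataset of 10 rows
train, test = train_test_split(rows)
print(len(train), len(test))
```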
Labelled Data
Data tagged with one or more labels that indicate specific attributes, characteristics, classes, or contained objects.
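Example (labelled data): a small sketch of what labelled data can look like in code, assuming each example is a (features, label) pair. The toy values and the helper `predict_1nn` are hypothetical; the nearest-neighbour rule is only one simple way to use labels for prediction.

```python
# Each example pairs a feature vector with a label (hypothetical toy data).
labelled = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]

def predict_1nn(features, examples):
    """Return the label of the closest labelled example (squared Euclidean)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(examples, key=lambda ex: dist2(ex[0], features))
    return label

prediction = predict_1nn((2.0, 1.0), labelled)
print(prediction)
```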
Clustering
Grouping data points so that points in the same group are similar to each other and different from points in other groups.
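Example (clustering): a minimal sketch using a k-means-style loop on one-dimensional points. K-means is one of many clustering methods and is chosen here only for illustration; the function name `kmeans_1d`, the starting centers, and the toy points are assumptions.

```python
def kmeans_1d(points, centers, iterations=10):
    """Lloyd-style k-means on 1-D points, starting from the given centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]   # two visibly separate clumps
centers, groups = kmeans_1d(points, centers=[0.0, 5.0])
print(centers, groups)
```

No labels are used here: the groups come only from distances between the points.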
4 Types of Machine Learning Systems Based on Amount of Supervision
Supervised Learning
Unsupervised Learning
Semi-supervised Learning
Reinforcement Learning
3 Features of Online Learning
Learns incrementally
Receives data instances sequentially
Instances arrive individually or in small groups (mini-batches)
2 Uses of Online Learning
Huge datasets
Data as a continuous flow
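Example (online learning): a minimal sketch of incremental training on mini-batches, assuming a one-parameter model y = w * x updated by gradient descent on squared error. The generator `stream`, the target slope 3.0, and the learning rate are hypothetical stand-ins for a real data flow.

```python
def sgd_update(w, batch, lr=0.1):
    """One incremental step: adjust w using only the current mini-batch."""
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad

def stream(n, batch_size):
    """Hypothetical data source yielding mini-batches of (x, y) with y = 3x."""
    batch = []
    for i in range(n):
        x = (i % 10) / 10 + 0.1
        batch.append((x, 3.0 * x))
        if len(batch) == batch_size:
            yield batch
            batch = []

w = 0.0
for batch in stream(n=2000, batch_size=4):
    # Each mini-batch is seen once and can then be discarded,
    # so the full dataset never has to sit in memory.
    w = sgd_update(w, batch)
print(round(w, 4))
```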
3 Characteristics of Batch Learning
Trains from scratch on the full dataset
Requires a lot of computing resources
Cannot adapt to rapidly changing data
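Example (batch learning): a minimal sketch, assuming a one-parameter least-squares model y = w * x. The function `fit_batch` and the toy data are hypothetical. The point is that taking new data into account means refitting on the old and new data together.

```python
def fit_batch(data):
    """Fit y = w*x by least squares using the entire dataset at once."""
    num = sum(x * y for x, y in data)
    den = sum(x * x for x, _ in data)
    return num / den

old = [(1.0, 2.0), (2.0, 4.0)]
w_old = fit_batch(old)            # model trained on the original data

new = [(3.0, 9.0), (4.0, 12.0)]   # the underlying relationship has shifted
w_new = fit_batch(old + new)      # retrain from scratch on everything
print(w_old, w_new)
```

Until the retraining is run, the deployed model keeps using `w_old`, which is why batch systems lag behind rapidly changing data.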
3 Steps of Knowledge Discovery in Databases (KDD)
Preprocessing
Data Mining
Postprocessing
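Example (KDD steps): a minimal sketch of the three stages chained together, assuming the raw input is a list of item names. The function names, the frequency-counting "mining" step, and the `min_count` threshold are illustrative assumptions; real pipelines use far richer steps at each stage.

```python
from collections import Counter

def preprocess(raw):
    """Clean input: drop missing entries, normalise case and whitespace."""
    return [r.strip().lower() for r in raw if r is not None and r.strip()]

def mine(items):
    """Data mining step (here: simple frequency counting)."""
    return Counter(items)

def postprocess(counts, min_count=2):
    """Keep only patterns frequent enough to report, most frequent first."""
    return [(item, n) for item, n in counts.most_common() if n >= min_count]

raw = ["Milk", " bread", None, "milk", "", "Bread ", "eggs", "MILK"]
patterns = postprocess(mine(preprocess(raw)))
print(patterns)
```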
6 Steps of the Cross-Industry Standard Process for Data Mining (CRISP-DM)
Business Understanding
Data Understanding
Data Preparation
Modelling
Evaluation
Deployment