Data Mining 1 Flashcards

1
Q

Series of tasks, activities, or operations to achieve a goal or an outcome

A

Process

2
Q

Combination of hardware and software to facilitate or automate processes

A

Technology

3
Q

Discrete measurement, fact, or observation representing a real-world process

A

Data

4
Q

The mathematical discipline that studies methods of collecting, analyzing, and interpreting data

A

Statistics

5
Q

specific collection of items of interest

A

Population

6
Q

subset or subcollection of the population

A

Sample

7
Q

two scopes of data

A

Sample & Population
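
The difference between the two scopes can be sketched in a few lines. This is only an illustration: the population here is simulated (10,000 hypothetical height measurements), and the variable names are assumptions, not part of the deck.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated heights in cm (illustrative data only).
rng = random.Random(42)
population = [rng.gauss(170, 10) for _ in range(10_000)]

# A sample is a subset of the population, here 100 items drawn without replacement.
sample = rng.sample(population, 100)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The sample mean is an estimate of the population mean; it is usually close but rarely identical.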

8
Q

Logic is built based on business rules

A

Traditional Rule-Based AI

9
Q

Logic is built by modelling and training data

A

Machine Learning

10
Q

Input data, and sometimes output data, are provided to a machine, which builds its logic based on mathematical rules

A

Machine Learning

11
Q

Machine learning algorithms in which the training data includes both input and output

A

Supervised Machine Learning

12
Q

Inputs are called

A

feature values

13
Q

Outputs are called

A

label values

14
Q

the label predicted by the model is a numeric value

A

Regression

15
Q

the model predicts whether a record is an instance of a specific class or category

A

Binary Classification

16
Q

the model predicts whether a record is an instance of one of multiple classes or categories

A

Multiclass Classification

17
Q

Training data consists only of input without any known output

A

Unsupervised Machine Learning

18
Q

the model identifies similarities between observations based on their features and groups them into discrete clusters

A

Clustering

19
Q

A model that groups existing customers into clusters based on age, location, gender, social media usage, and purchasing behavior.

A

Clustering

20
Q

A model that classifies whether a social media post is positive, negative, or neutral.

A

Multiclass Classification

21
Q

A model that predicts whether a customer will cancel their subscription.

A

Binary Classification

22
Q

A model that predicts the price of an apartment based on its size, number of rooms, barangay, and construction date.

A

Regression

23
Q

Data used to train the model; the algorithm learns patterns from it

A

Training Data

24
Q

Used to evaluate the model

25
Q

Proportion of predictions that the model got right

A

Accuracy

26
Q

Proportion of predicted positive cases where the true label is actually positive

A

Precision

27
Q

Proportion of actual positive cases that the model identified correctly

A

Recall

28
Q

Overall metric that combines precision and recall (their harmonic mean)

A

F1 Score
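
The four metrics above can be computed from the counts of true/false positives and negatives. The following is a minimal sketch, assuming binary labels where 1 marks the positive class; the function names and the toy labels are illustrative, not taken from any library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (made up for illustration): TP=2, FN=2, FP=1, TN=5.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)  # accuracy 0.7, precision 2/3, recall 0.5, f1 4/7
```

Note that accuracy (0.7) looks better than recall (0.5) here: the model misses half of the actual positives, which accuracy alone hides.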
29
Q

A lazy learning algorithm that predicts the class of a data point from the majority class of its k nearest neighbors

A

k-NN classifier
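
A minimal sketch of the majority-vote idea, assuming numeric features and Euclidean distance (the card does not fix a distance measure). The function name `knn_predict` and the toy points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # "Lazy" learning: there is no training step; all work happens at prediction time.
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance (assumed)
    )
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data: two well-separated groups (illustrative only).
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (2, 2)))  # a
print(knn_predict(points, labels, (6, 5)))  # b
```

An odd k is a common choice for two-class problems because it avoids tied votes.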
30
Q

Predicts the probability that a given data point belongs to a particular class; uses the logistic function

A

Logistic Regression

31
Q

An S-shaped (sigmoid) curve that logistic regression uses to map any real number to a probability between 0 and 1

A

Logistic function
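
The logistic function is σ(z) = 1 / (1 + e^(−z)). Below is a small sketch of it and of how a logistic regression model would turn a weighted sum of features into a probability. The weights and bias are made-up values for illustration, not a fitted model.

```python
import math


def logistic(z):
    """Logistic (sigmoid) function: maps any real z into the interval (0, 1)."""
    # Two algebraically equivalent branches avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(features, weights, bias):
    """Probability of the positive class for one data point."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z)


print(logistic(0))  # 0.5, the midpoint of the S-curve

# Hypothetical weights and bias, chosen only to show the calculation.
p = predict_probability([2.0, 1.0], weights=[0.8, -0.5], bias=-0.3)
predicted_class = 1 if p >= 0.5 else 0
print(p, predicted_class)
```

The 0.5 threshold used to turn the probability into a class is a convention and can be adjusted.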
32
Q

Occurs when one class is significantly more frequent than the other

A

Class Imbalance

33
Q

Reducing the number of instances in the majority class by removing samples until the classes are balanced

A

Undersampling

34
Q

Increasing the number of instances in the minority class by duplicating samples or generating new synthetic examples

A

Oversampling
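
Random undersampling and random oversampling (the duplication variant) can be sketched as follows. This is an illustrative implementation using only the standard library; the function names and the toy dataset are assumptions.

```python
import random
from collections import Counter


def _group_by_label(rows, labels):
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        out_rows.extend(rng.sample(members, target))  # sample without replacement
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate rows until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        extras = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extras)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Toy imbalanced dataset: 8 majority rows, 2 minority rows.
rows = list(range(10))
labels = ["majority"] * 8 + ["minority"] * 2

under_rows, under_labels = random_undersample(rows, labels)
over_rows, over_labels = random_oversample(rows, labels)
print(Counter(under_labels))  # 2 rows per class
print(Counter(over_labels))   # 8 rows per class
```

Resampling is normally applied to the training data only, so that evaluation still reflects the real class distribution.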
35
Q

Generates synthetic samples for the minority class by interpolating between existing samples

A

SMOTE (Synthetic Minority Oversampling Technique)
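
The interpolation step can be sketched as below. This is a simplified illustration of the idea, not the reference SMOTE implementation or any library's API: it assumes numeric features, Euclidean distance, and at least two minority samples, and the name `smote_like` is hypothetical.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    minority sample and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest other minority samples to the chosen base point.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy minority-class samples (illustrative only).
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.2), (2.5, 2.5)]
new_points = smote_like(minority, n_new=5)
print(new_points)
```

Unlike duplication, each synthetic point is new, but it always lies between two existing minority samples.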
36
Q

Cons of Oversampling

A

Oversampling can cause overfitting, especially with random oversampling, because duplicated samples are seen repeatedly during training.

37
Q

Cons of Undersampling

A

Important information from the majority class may be lost, which can cause the model to underfit.