Data Mining 1 Flashcards

1
Q

Series of tasks, activities, or operations to achieve a goal or an outcome

A

Process

2
Q

Combination of hardware and software to facilitate or automate processes

A

Technology

3
Q

Discrete measurement, fact, or observation representing a real-world process

A

Data

4
Q

The mathematical discipline that studies methods of collecting, analyzing, and interpreting data

A

Statistics

5
Q

specific collection of items of interest

A

Population

6
Q

subset or subcollection of the population

A

Sample

7
Q

two scopes of data

A

Sample & Population

8
Q

Logic is built based on business rules

A

Traditional Rule-Based AI

9
Q

Logic is built by modelling and training data

A

Machine Learning

10
Q

Input data, and sometimes output data, are provided to a machine, which builds its logic from mathematical rules

A

Machine Learning

11
Q

Machine learning algorithms in which the training data includes both input and output

A

Supervised Machine Learning

12
Q

Inputs are called

A

feature values

13
Q

outputs are called

A

label values

14
Q

the label predicted by the model is a numeric value

A

Regression
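
As an illustration of predicting a numeric label, here is a minimal sketch of a least-squares line fit on one feature. The function name `fit_line` and the toy numbers are hypothetical and only serve the example; regression models in practice usually take several features.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: the label is a numeric value.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # predicted label for a new input x = 5
```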

15
Q

the model predicts whether a record is an instance of a specific class or category

A

Binary Classification

16
Q

the model predicts whether a record is an instance of one of multiple classes or categories

A

Multiclass Classification

17
Q

Training data consists only of input without any known output

A

Unsupervised Machine Learning

18
Q

the model identifies similarities between observations based on their features and groups them into discrete clusters

A

Clustering
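
The card does not name a specific algorithm. As one possible illustration, the sketch below uses k-means, a common clustering method, with hand-picked starting centroids so the result is reproducible. The function name `kmeans`, the starting centroids, and the toy points are all assumptions made for the example.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and re-averaging centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, [nearest(p, centroids) for p in points]


# Hypothetical observations with two features each; no labels are provided.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, assignments = kmeans(points, centroids=[(0, 0), (10, 10)])
```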

19
Q

A model that groups existing customers into clusters based on age, location, gender, social media usage, and purchasing behavior.

A

Clustering

20
Q

A model that classifies whether a social media post is positive, negative, or neutral.

A

Multiclass Classification

21
Q

A model that predicts whether a customer will cancel their subscription.

A

Binary Classification

22
Q

A model that predicts the price of an apartment based on its size, number of rooms, barangay, and construction date.

A

Regression

23
Q

Data used to train the model; the algorithm learns patterns from it

A

Training Data

24
Q

Used to evaluate the model

A

Test Data
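
The split into training and test data can be sketched as follows. This is a minimal illustration using only the standard library; the helper name `train_test_split`, the 25% test fraction, and the fixed seed are assumptions made for the example, not values from the deck.

```python
import random


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of records and split it into (train, test)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(20))  # stand-in for 20 labeled records
train, test = train_test_split(records)
```

The model would be fitted on `train` only and evaluated on `test`, so the evaluation uses records the model has not seen.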

25
Q

Proportion of predictions that the model got right

A

Accuracy

26
Q

Proportion of predicted positive cases where the true label is actually positive

A

Precision

27
Q

Proportion of positive cases that the model identified correctly

A

Recall

28
Q

Overall metric combining recall and precision (their harmonic mean)

A

F1 Score
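
The four metrics on the preceding cards can be computed from the counts of true/false positives and negatives. The sketch below assumes binary labels where `1` marks the positive class; the function name and the toy label lists are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # predicted positive, actually positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 3 positives and 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = classification_metrics(y_true, y_pred)
```

Here the model makes 6 correct predictions out of 8 (accuracy 0.75), and 2 of its 3 positive predictions are correct, as are 2 of the 3 actual positives, so precision, recall, and F1 all equal 2/3.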

29
Q

A lazy learning algorithm that predicts the class of a data point from the majority class of its k nearest neighbors

A

k-NN classifier
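
A minimal sketch of the majority vote described above, assuming Euclidean distance and `k=3`. The function name `knn_predict` and the toy points are hypothetical. The algorithm is "lazy" in the sense that nothing is learned ahead of time: all the work happens when a prediction is requested.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of query by majority vote among its k nearest neighbors."""
    # Pair every training point's distance to the query with its label, nearest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated groups.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["a", "a", "a", "b", "b", "b"]
near_a = knn_predict(train_points, train_labels, query=(2, 2))
near_b = knn_predict(train_points, train_labels, query=(6, 5))
```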

30
Q

Predicts the probability that a given data point belongs to a particular class, using the logistic function

A

Logistic Regression

31
Q

An S-shaped curve used in logistic regression

A

logistic function
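
The logistic function maps any real number to a value between 0 and 1, which is why logistic regression can read its output as a probability. The sketch below assumes the usual form 1 / (1 + e^(-z)) applied to a weighted sum of features; the weights, bias, and feature values are made up for illustration and were not fitted to any data.

```python
import math


def logistic(z):
    """Logistic (sigmoid) function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, bias):
    """Probability of the positive class for one record under given weights."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z)


# Hypothetical weights and bias, chosen only to show the calculation.
p = predict_proba(features=[2.0, 1.0], weights=[0.5, -1.0], bias=0.0)
```

In this example the weighted sum is 0.5 * 2.0 - 1.0 * 1.0 + 0.0 = 0, and the logistic function maps 0 to a probability of 0.5.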

32
Q

occurs when one class is significantly more frequent than the other

A

Class Imbalance

33
Q

reducing the number of instances in the majority class by removing samples until the classes are balanced.

A

Undersampling
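
Random undersampling can be sketched as below: every class is cut down to the size of the smallest class by drawing records without replacement. The function name, the seed, and the 8-to-2 toy data are assumptions for the example.

```python
import random
from collections import defaultdict


def random_undersample(records, labels, seed=0):
    """Randomly drop records so every class matches the smallest class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for record, label in zip(records, labels):
        by_class[label].append(record)
    target = min(len(items) for items in by_class.values())
    new_records, new_labels = [], []
    for label, items in by_class.items():
        # sample() draws without replacement, so no record is repeated.
        for record in rng.sample(items, target):
            new_records.append(record)
            new_labels.append(label)
    return new_records, new_labels


# Hypothetical imbalanced data: 8 negatives and 2 positives.
records = list(range(10))
labels = ["neg"] * 8 + ["pos"] * 2
under_records, under_labels = random_undersample(records, labels)
```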

34
Q

increasing the number of instances in the minority class by duplicating samples or generating new synthetic examples.

A

Oversampling
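
The duplication variant can be sketched as follows: every smaller class is topped up to the size of the largest class by re-drawing its own records with replacement. The function name, the seed, and the toy data are assumptions for the example.

```python
import random
from collections import defaultdict


def random_oversample(records, labels, seed=0):
    """Duplicate randomly chosen records until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for record, label in zip(records, labels):
        by_class[label].append(record)
    target = max(len(items) for items in by_class.values())
    new_records, new_labels = [], []
    for label, items in by_class.items():
        # choices() draws with replacement, so existing records get repeated.
        extras = rng.choices(items, k=target - len(items))
        for record in items + extras:
            new_records.append(record)
            new_labels.append(label)
    return new_records, new_labels


# Hypothetical imbalanced data: 8 negatives and 2 positives.
records = list(range(10))
labels = ["neg"] * 8 + ["pos"] * 2
over_records, over_labels = random_oversample(records, labels)
```

Because the added records are exact copies, a model can memorize them, which is the overfitting risk associated with random oversampling.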

35
Q

Generates synthetic samples for the minority class by interpolating between existing samples

A

SMOTE (Synthetic Minority Oversampling Technique)
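
The interpolation idea can be sketched as below. This is a simplified illustration, not a reference implementation: it picks a minority sample, picks one of its `k` nearest minority neighbors, and places a new point at a random position on the line segment between them. The function name `smote_sketch`, the value `k=2`, and the toy points are assumptions.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest other minority samples to the chosen base point.
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, between 0 and 1
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical minority-class samples with two features each.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5)]
new_points = smote_sketch(minority, n_new=5)
```

Each synthetic point lies between two real minority samples, so the new examples are not exact duplicates of existing ones.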

36
Q

Cons of Oversampling

A

Oversampling can cause overfitting, especially with random oversampling.

37
Q

Cons of Undersampling

A

Important information from the majority class may be lost, which can cause the model to underfit.