Data Mining Introductions Flashcards

1
Q

is the science of extracting useful knowledge from huge data repositories.

A

Data Mining

2
Q

is an open standard process model for data mining projects.

A

CRISP-DM REFERENCE MODEL

(Cross Industry Standard Process for Data Mining)

3
Q

6 PHASES IN THE CRISP-DM REFERENCE MODEL

A
  1. Business Understanding
  2. Data Understanding
  3. Data Preparation
  4. Modeling
  5. Evaluation
  6. Deployment
4
Q

2 DATA MINING METHODS

A
  1. Descriptive Method
  2. Predictive Method
5
Q

is a method where we find human-interpretable patterns that describe the data.

A

Descriptive Method

6
Q

is a method that uses some features (variables) to predict unknown or future values of other variables.

A

Predictive Method

7
Q

5 DATA MINING TASKS

A
  • Clustering
  • Association Rule Discovery
  • Regression
  • Classification
  • Deviation / Anomaly Detection
8
Q

is a type of data mining task that predicts the value of a given continuous-valued variable based on the values of other variables.

A

Regression
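
As a minimal sketch of the idea on this card, the snippet below fits a straight line by ordinary least squares and uses it to predict a continuous value. The function names (`fit_line`, `predict`) and the floor-area and price figures are hypothetical, chosen only for illustration; real projects would normally use a statistics library rather than hand-written formulas.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict the continuous target for a new value of the input variable."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: floor area (input variable) and price (continuous target).
areas = [50, 60, 80, 100]
prices = [150, 180, 240, 300]

model = fit_line(areas, prices)
print(predict(model, 70))
```

Here the known variable (area) is used to estimate the unknown continuous variable (price) for an area that was not in the data.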

9
Q

is a type of data mining task that detects significant deviations from normal behavior.

A

Deviation / Anomaly Detection
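
One simple way to illustrate this card is a z-score check: flag any value that lies more than a chosen number of standard deviations from the mean. This is only one of many possible techniques; the function name `find_anomalies`, the threshold of 2.0, and the sensor readings are assumptions made for the example.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical sensor readings with one value far from normal behavior.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(find_anomalies(readings))
```

"Normal behavior" here is just the mean of the sample, so a single extreme value also shifts the baseline; more robust methods exist for real data.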

10
Q

5 CHALLENGES OF DATA MINING

A
  1. Scalability
  2. Dimensionality
  3. Complex and Heterogeneous Data
  4. Data Quality
  5. Data Ownership and Privacy
11
Q

3 TYPES OF DATA MINING TOOLS

A
  1. Simple Graphical User Interface
  2. Process Oriented
  3. Programming Oriented
12
Q

2 COMMON PROGRAMMING ORIENTED TOOLS

A
  • R
  • Python
13
Q

4 CHARACTERISTICS OF A DATA WAREHOUSE

A
  • Subject Oriented
  • Integrated
  • Nonvolatile
  • Time Variant
14
Q

data warehouses are designed to help you analyze data about a particular subject area (for example, sales).

A

Subject Oriented

15
Q

integrates data from disparate sources into a consistent format.

A

Integrated

16
Q

data in the data warehouse are never overwritten or deleted.

A

Nonvolatile

17
Q

maintains both historical and (nearly) current data.

A

Time Variant
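
The nonvolatile and time-variant properties can be pictured together with a small append-only store: new rows are added with a date, existing rows are never changed, and queries can ask for the value as of any point in time. The class `PriceHistory`, its methods, and the sample prices are hypothetical and only sketch the idea; a real warehouse would do this with dated tables.

```python
from datetime import date


class PriceHistory:
    """Append-only record of prices: rows are added, never overwritten or deleted."""

    def __init__(self):
        self._rows = []  # each row is (effective_date, product, price)

    def record(self, effective_date, product, price):
        # Nonvolatile: a price change is a new row, not an update in place.
        self._rows.append((effective_date, product, price))

    def price_as_of(self, product, when):
        # Time variant: pick the latest row effective on or before `when`.
        matches = [(d, p) for d, prod, p in self._rows if prod == product and d <= when]
        return max(matches)[1] if matches else None


history = PriceHistory()
history.record(date(2023, 1, 1), "widget", 10.0)
history.record(date(2024, 1, 1), "widget", 12.0)

print(history.price_as_of("widget", date(2023, 6, 1)))  # historical value
print(history.price_as_of("widget", date(2024, 6, 1)))  # (nearly) current value
```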

18
Q

EXPLAIN EXTRACT, TRANSFORM, LOAD

A
  1. Extracting the data from outside sources
  2. Transforming data to fit analytical needs
  3. Loading data into the data warehouse.
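
The three steps above can be sketched with the Python standard library. This is a toy illustration under stated assumptions: the CSV text stands in for an outside source, an in-memory SQLite database stands in for the data warehouse, and the table name `sales`, the column names, and the cleaning rules are all made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an outside source (messy spacing, mixed case,
# one row with a missing amount).
RAW_CSV = """order_id,amount,country
1, 19.99 ,us
2, 5.00 ,PH
3,,us
"""


def extract(text):
    """Extract: read rows from the outside source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean values and convert types to fit analytical needs."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # drop rows with no amount (an assumed cleaning rule)
        cleaned.append((int(row["order_id"]), float(amount), row["country"].strip().upper()))
    return cleaned


def load(rows, conn):
    """Load: insert the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT country, amount FROM sales ORDER BY order_id").fetchall())
```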
19
Q

is a term for data sets that are so large or complex that traditional data processing applications are inadequate to deal with them.

A

Big Data

20
Q

4 CHARACTERISTICS OF BIG DATA

A
  • Variety
  • Veracity
  • Velocity
  • Volume
21
Q

is a characteristic of big data that means there are different forms of data.

A

Variety

22
Q

is a characteristic of big data that means the uncertainty of the data.

A

Veracity

23
Q

is a characteristic of big data that means the speed at which data is generated and must be analyzed (for example, streaming data).

A

Velocity

24
Q

is a characteristic of big data that means the scale of data.

A

Volume