1. From Data Analysis to Data Mining Flashcards

1
Q

What is the first step in the data mining process?

A

Data Collection: gathering raw data from various sources.

2
Q

What is the purpose of Data Cleaning?

A

To remove noise and correct inconsistencies in the data.
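As a minimal sketch of what cleaning can look like, the snippet below tidies a small, made-up list of records using only the Python standard library. The field names (`city`, `age`), the valid age range, and the `clean` helper are assumptions chosen for illustration, not part of the deck.

```python
# Hypothetical raw records: inconsistent casing and whitespace, an impossible
# age, and a duplicate that only shows up after normalisation.
raw_records = [
    {"city": " Lisbon", "age": 34},
    {"city": "lisbon ", "age": 34},
    {"city": "Porto", "age": -5},
    {"city": "Porto", "age": 41},
]

def clean(records, min_age=0, max_age=120):
    """Normalise text fields, drop out-of-range ages, remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        city = rec["city"].strip().title()    # correct inconsistent formatting
        age = rec["age"]
        if not (min_age <= age <= max_age):   # treat impossible values as noise
            continue
        key = (city, age)
        if key in seen:                       # skip records already kept
            continue
        seen.add(key)
        cleaned.append({"city": city, "age": age})
    return cleaned

cleaned_records = clean(raw_records)
print(cleaned_records)
```

Dropping out-of-range rows is only one possible policy; depending on the problem, such values might instead be corrected or flagged for review.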

3
Q

What does Data Integration involve?

A

Combining multiple data sources into a cohesive dataset.
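One common form of integration is joining two sources on a shared key. The sketch below assumes two hypothetical in-memory sources, `customers` and `orders`, linked by a `customer_id` field; the `inner_join` helper is illustrative and keeps only rows that have a match in both sources.

```python
# Two hypothetical sources that share the key "customer_id".
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Rui"},
]
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 5.5},
    {"customer_id": 3, "amount": 12.0},  # no matching customer
]

def inner_join(left, right, key):
    """Combine rows from both sources wherever the key values match."""
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in right:
        for match in index.get(row[key], []):
            joined.append({**match, **row})   # merge the two records
    return joined

integrated = inner_join(customers, orders, "customer_id")
for row in integrated:
    print(row)
```

Rows without a partner (customer 2 and the order for customer 3) are dropped here; whether that is acceptable depends on the analysis.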

4
Q

What is Data Selection?

A

Retrieving relevant data for analysis from databases.

5
Q

What is the goal of Data Transformation?

A

To modify and consolidate data into appropriate formats for analysis.

6
Q

What is Data Mining?

A

Applying intelligent methods to extract patterns from the data.

7
Q

What is Pattern Evaluation?

A

Identifying significant patterns that represent knowledge based on specific measures.
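The card does not name the measures. As an illustration only, the sketch below uses support and confidence, two measures commonly applied to association rules, on a made-up list of shopping baskets. The helper names and the data are assumptions.

```python
# Hypothetical market-basket data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent).

    Assumes the antecedent occurs at least once, otherwise this divides by zero.
    """
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

A rule would then be kept or discarded by comparing these values against thresholds chosen for the problem at hand.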

8
Q

What does Knowledge Presentation involve?

A

Using visualization and representation techniques to present the mined knowledge to users.

9
Q

Why is the data mining process considered iterative?

A

It often requires revisiting previous steps for refinement.

10
Q

Why is domain knowledge important in data mining?

A

It helps in understanding the context and relevance of the data.

11
Q

What role does feature selection play in data mining?

A

It identifies the most relevant variables for analysis.
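One simple way to rank variables, among many, is by the absolute Pearson correlation between each feature and the target. The sketch below assumes a made-up housing dataset; the feature names and the `pearson` helper are illustrative, and the helper assumes no feature is constant (a constant column would cause a division by zero).

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical features and target.
features = {
    "size_m2": [50, 70, 90, 110, 130],
    "door_number": [7, 2, 9, 4, 6],
}
price = [100, 140, 180, 220, 260]

# Score each feature by the strength of its linear relationship to the target.
scores = {name: abs(pearson(values, price)) for name, values in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
top_feature = ranked[0]
print(ranked)
```

Correlation only captures linear, one-variable-at-a-time relationships, so it is a starting point rather than a complete selection method.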

12
Q

What is the significance of data visualization in data mining?

A

It helps in interpreting complex data patterns and results.

13
Q

How does data mining contribute to decision-making?

A

By providing insights and patterns that inform strategic choices.

14
Q

What is the importance of model validation in data mining?

A

To ensure the accuracy and reliability of the extracted patterns.
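A basic form of validation is the holdout method: fit on one part of the data and measure performance on a part the model has not seen. The sketch below assumes a toy one-feature dataset and a deliberately simple threshold classifier; `holdout_split`, `fit_threshold`, and `accuracy` are hypothetical helpers written for this example.

```python
import random

def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and reserve a fraction of them for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def fit_threshold(train):
    """Midpoint between the class means of a single numeric feature."""
    pos = [x for x, label in train if label == 1]
    neg = [x for x, label in train if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, rows):
    """Share of rows whose predicted class (x > threshold) matches the label."""
    correct = sum(1 for x, label in rows if (x > threshold) == (label == 1))
    return correct / len(rows)

# Hypothetical, well-separated data: class 0 in 0..9, class 1 in 20..29.
data = [(x, 0) for x in range(0, 10)] + [(x, 1) for x in range(20, 30)]
train, test = holdout_split(data)
threshold = fit_threshold(train)
test_accuracy = accuracy(threshold, test)
print(len(train), len(test), test_accuracy)
```

With real data a single split can be noisy, which is why repeated splits such as k-fold cross-validation are often preferred.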

15
Q

What is the final step in the data mining process?

A

Knowledge presentation: representing the mined knowledge and communicating the results to stakeholders.

16
Q

What is an interesting pattern?

A

A pattern is interesting if:
1. it is easily understood by humans;
2. it is valid on new or test data with some degree of certainty;
3. it is potentially useful;
4. it is novel;

or if it validates a hypothesis the user sought to confirm.

An interesting pattern represents knowledge.

17
Q

Identify the steps of the data science process.

A
  1. Data Collection
  2. Data Cleaning
  3. Data Integration
  4. Data Selection (retrieving the data relevant to the analysis)
  5. Data Transformation
  6. Data Mining
  7. Pattern evaluation
  8. Knowledge presentation
18
Q

Name the main data mining tasks.

A

1. Anomaly Detection
2. Association Rule Mining
3. Clustering
4. Classification
5. Regression
6. Summarization
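To make one of these tasks concrete, here is a minimal sketch of anomaly detection using z-scores. The sensor readings, the threshold of 2 standard deviations, and the `zscore_outliers` helper are assumptions for illustration; other tasks in the list would need different techniques.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)          # population standard deviation
    if sigma == 0:                  # all values identical: nothing stands out
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical readings with one obviously unusual value.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = zscore_outliers(readings)
print(outliers)
```

On small samples the outlier itself inflates the mean and standard deviation, so the threshold has to be chosen with the sample size in mind.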

19
Q

The quality of the data source is based on

A
  1. Completeness
  2. Correctness
  3. Relevance to the problem being solved
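Completeness is the easiest of these criteria to quantify. A rough sketch, assuming missing entries are stored as `None` and using made-up records and a hypothetical `completeness` helper, measures the share of filled-in values per field:

```python
# Hypothetical records in which None marks a missing value.
records = [
    {"name": "Ana", "email": "ana@example.com", "age": 34},
    {"name": "Rui", "email": None, "age": 41},
    {"name": "Eva", "email": "eva@example.com", "age": None},
    {"name": "Tom", "email": None, "age": 29},
]

def completeness(records, field):
    """Fraction of records with a non-missing value for the given field."""
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)

report = {field: completeness(records, field) for field in ("name", "email", "age")}
print(report)
```

Correctness and relevance usually need domain knowledge and cannot be reduced to a single count like this.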
20
Q

Identify some Python Machine learning packages for data mining.

A
  1. scikit-learn
  2. Kedro (a framework for structuring data pipelines rather than a modelling library)
  3. TensorFlow
  4. Keras
  5. PyTorch
21
Q

Correctness will depend on:

A

- Validity
- Accuracy
- Consistency
- Uniformity

22
Q

Transformations can be made through

A

Normalizing and Scaling
Encoding Categorical Variables
Discretization
Domain-related transformations
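The first three transformations can be sketched with the standard library alone. The helpers below (`min_max_scale`, `one_hot`, `discretize`), the sample values, and the bin edges are assumptions for illustration; `min_max_scale` also assumes the values are not all identical.

```python
def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Encode each category as a 0/1 vector, one column per sorted category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def discretize(values, edges, labels):
    """Map each value to the label of the first bin whose upper edge it does not exceed.

    `labels` must have one more entry than `edges`; the last label covers
    everything above the final edge.
    """
    out = []
    for v in values:
        for edge, label in zip(edges, labels):
            if v <= edge:
                out.append(label)
                break
        else:
            out.append(labels[-1])
    return out

ages = [18, 30, 42, 66]
scaled = min_max_scale(ages)
encoded = one_hot(["red", "green", "red"])
binned = discretize(ages, edges=[25, 60], labels=["young", "adult", "senior"])
print(scaled, encoded, binned)
```

Domain-related transformations are not shown because they depend entirely on the problem, for example converting timestamps into weekdays.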

23
Q

What to do if there are missing values?

A
  1. Do nothing
  2. Drop the missing values
  3. Replace them with other values (mean, median, mode, interpolation); this is known as imputation.
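The last two options can be sketched as follows, assuming missing entries are stored as `None`. The `drop_missing` and `impute` helpers and the sample values are illustrative; interpolation is left out because it depends on the ordering of the data.

```python
from statistics import mean, median, mode

# Hypothetical column in which None marks a missing value.
values = [4.0, None, 6.0, 4.0, None, 10.0]

def drop_missing(values):
    """Option 2: discard the missing entries."""
    return [v for v in values if v is not None]

def impute(values, strategy=mean):
    """Option 3: replace missing entries with a statistic of the observed ones."""
    observed = drop_missing(values)
    fill = strategy(observed)
    return [fill if v is None else v for v in values]

dropped = drop_missing(values)
mean_imputed = impute(values, mean)
median_imputed = impute(values, median)
mode_imputed = impute(values, mode)
print(dropped, mean_imputed, median_imputed, mode_imputed)
```

Which statistic fits best depends on the distribution: the median is less sensitive to extreme values than the mean, and the mode also works for categorical data.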