Data Mining Flashcards

1
Q

What is Data Mining?

A

Data mining is the process of extracting knowledge from data. It aims to identify correlations in data, find patterns and variations, understand trends, and predict probabilities.

2
Q

Patterns

A

A pattern occurs when a variable changes in a repeating or predictable way.

3
Q

Trends

A

A trend is a general change in one variable compared to another over time.
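
One simple way to check for a trend is to fit a straight line to a variable measured over time and look at the sign of the slope. The sketch below is only an illustration under that assumption: the function name `trend_slope` and the monthly sales figures are made up for the example, and a real analysis would also ask whether the slope is statistically meaningful.

```python
def trend_slope(times, values):
    """Return the ordinary least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    # Covariance of time and value, and variance of time (both unnormalized).
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    var = sum((t - mean_t) ** 2 for t in times)
    return cov / var


# Hypothetical monthly sales figures, for illustration only.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 104, 103, 110, 113, 118]

slope = trend_slope(months, sales)
direction = "upward" if slope > 0 else "downward or flat"
print(f"slope = {slope:.2f} per month ({direction} trend)")
```

A positive slope suggests the variable generally rises over time even though individual months may dip.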

4
Q

Data Mining Techniques

A
  • Classification.
  • Clustering.
  • Anomaly detection.
  • Association Rule Mining.
  • Sequential Patterns.
  • Affinity grouping.
  • Decision Trees.
  • Regression.
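
To make one of the techniques above concrete, here is a minimal sketch of anomaly detection using z-scores: values that lie more than a chosen number of standard deviations from the mean are flagged. The function name `find_anomalies`, the threshold of 2.0, and the login counts are assumptions made for illustration; practical anomaly detection usually relies on more robust methods.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily login counts, for illustration only.
daily_logins = [10, 12, 11, 13, 12, 11, 45, 12]

anomalies = find_anomalies(daily_logins)
print(anomalies)  # [45]
```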
5
Q

Commonly used software tools for data mining:

A
  • Spreadsheets.
  • R-Language.
  • Python.
  • IBM SPSS Statistics.
  • IBM Watson Studio.
  • SAS.
6
Q

Spreadsheets (Excel and Google Sheets)

A

Spreadsheets are used to host data exported from other systems so that it is accessible, easy to read, and can be used to draw comparisons between sets of data.

Excel add-ins: Data Mining Client, XLMiner, and KnowledgeMiner.
Google Sheets add-ins: Text Analysis, Text Mining, and Google Analytics.

7
Q

R-Language packages:

A
  • tm: a framework for text mining applications within R.
  • twitteR: a framework for mining tweets.
8
Q

R-Language

A

R is commonly used by statisticians and data miners for statistical modeling and computation. With R libraries we can perform data mining operations such as:
- Regression.
- Classification.
- Data Clustering.
- Association Rule Mining.
- Text Mining.
- Outlier Detection.
- Social Network Analysis.

9
Q

Python libraries and tools:

A
  • Pandas.
  • NumPy.
  • Jupyter.
10
Q

Pandas

A

Pandas can load data in many common formats (such as CSV, Excel, JSON, and SQL tables) and then organize, sort, and manipulate it. With it we can perform:

  • Basic numerical computations such as mean, median, mode, and range.
  • Statistical calculations, including correlations between variables and the distribution of data.
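
The computations listed above can be sketched with a small in-memory table. This is a minimal illustration only: the column names and values are invented, and in practice the data would more likely be loaded from a file with a reader such as `pd.read_csv`.

```python
import pandas as pd

# Hypothetical study data, for illustration only.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [55, 60, 60, 70, 80],
})

scores = df["exam_score"]

mean_score = scores.mean()               # arithmetic mean
median_score = scores.median()           # middle value
modes = scores.mode().tolist()           # most frequent value(s)
score_range = scores.max() - scores.min()

# Pearson correlation between the two columns (the pandas default method).
correlation = df["hours_studied"].corr(df["exam_score"])

# Summary of the distribution: count, mean, std, min, quartiles, max.
summary = scores.describe()

print(mean_score, median_score, modes, score_range)
print(round(correlation, 3))
print(summary)
```

Note that `mode()` returns a Series because a column can have more than one most frequent value.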
11
Q

NumPy

A

NumPy is a library for mathematical computing and data preparation that offers a host of built-in functions and capabilities for data mining.
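
As one example of data preparation, the sketch below rescales the columns of a small array so that features measured on different scales become comparable. The array values are made up for illustration, and min-max scaling and standardization are just two common choices, not something the card itself prescribes.

```python
import numpy as np

# Hypothetical feature matrix: rows are records, columns are features.
raw = np.array([
    [1.0, 200.0],
    [2.0, 400.0],
    [3.0, 600.0],
])

# Min-max scaling: map each column onto the interval [0, 1].
col_min = raw.min(axis=0)
col_max = raw.max(axis=0)
scaled = (raw - col_min) / (col_max - col_min)

# Standardization: give each column zero mean and unit standard deviation.
standardized = (raw - raw.mean(axis=0)) / raw.std(axis=0)

print(scaled)
print(standardized)
```

Both operations rely on NumPy broadcasting, so each column is rescaled without an explicit loop.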

12
Q

Jupyter

A

Data scientists and data analysts use Jupyter notebooks to perform data mining and statistical analysis.

13
Q

IBM SPSS (Statistical Package for the Social Sciences) Statistics

A

IBM SPSS Statistics is widely used for advanced analytics, text analytics, trend analysis, validation of assumptions, and translation of business problems into data science solutions.

14
Q

IBM Watson Studio

A

IBM Watson Studio builds on a collection of open-source tools, such as Jupyter notebooks, and extends them with closed-source IBM tools, making it a powerful environment for data analysis and data science.

15
Q

SAS Enterprise Miner

A

SAS Enterprise Miner is a powerful graphical workbench for interactive data exploration. It lets users mine and transform data, identify anomalies, analyze big data, and identify patterns and relationships within data.
