Data Flashcards

Question 1

Q

The process of processing data follows the following order:
1- ___
2- ___
3- ___
4- ___
5- ___

Answer

A

1- Input Data
2- Preprocess
3- Data Mining
4- Post Processing
5- Knowledge

Question 2

Q

Classification is the problem of identifying to which of a set of ___ a new ___ belongs

Answer

A

classes (categories, labels)

observation (input)
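
As an illustration of assigning a new observation to one of a set of known classes, here is a minimal sketch of a 1-nearest-neighbor classifier. The function name `nearest_neighbor_classify`, the labels "small" and "large", and the training points are all hypothetical; nearest-neighbor is only one of many classification techniques and is not named on the card.

```python
import math

def nearest_neighbor_classify(training, observation):
    """Return the label of the training example closest to the observation.

    training: list of (features, label) pairs, where features is a tuple of numbers.
    """
    _, label = min(training, key=lambda item: math.dist(item[0], observation))
    return label

# Hypothetical labeled data: two classes in a 2-D feature space.
training = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 8.0), "large"),
    ((9.0, 7.5), "large"),
]

predicted = nearest_neighbor_classify(training, (7.0, 9.0))
print(predicted)
```

The key point is that the classes are known in advance and come from labeled examples.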

Question 3

Q

Clustering is categorization in the absence of ___

It finds ___ in the data that share ___

Answer

A

labels
groups
similar characteristics
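
To make "finding groups without labels" concrete, here is a minimal sketch of k-means on one-dimensional data. The function name `k_means_1d`, the starting centroids, and the sample values are assumptions for illustration; k-means is one common clustering method, not the only one.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group unlabeled numbers around centroids by repeated assign-and-update."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

values = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = k_means_1d(values, centroids=[0.0, 5.0])
print(centroids, clusters)
```

No labels are supplied; the two groups emerge only from how close the values are to each other.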

Question 4

Q

Forecasting is the process of making ___ of the ___ based on ___ and ___ data and most commonly by ___

Answer

A

predictions
future
past and present
analysis of trends

Question 5

Q

Optimization is the process of finding the ___ among ___

Answer

A

best solution

all possible solutions

Question 6

Q

Heuristic optimization is the process of finding a ___ in a reasonable ___

Answer

A

near optimal solution

time frame
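
The contrast between exact and heuristic optimization can be sketched on a small routing problem. The example below is an assumption chosen for illustration (the cards do not name a specific problem): `exhaustive_best_tour` checks every possible tour, while `nearest_neighbor_tour` is a greedy heuristic that only promises a reasonably short tour found quickly.

```python
import itertools
import math

def tour_length(tour, points):
    """Total length of a closed tour that visits points in the given order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def exhaustive_best_tour(points):
    """Exact optimization: evaluate all tours starting at point 0, keep the shortest."""
    candidates = ((0,) + perm for perm in itertools.permutations(range(1, len(points))))
    return list(min(candidates, key=lambda t: tour_length(t, points)))

def nearest_neighbor_tour(points):
    """Heuristic: always move to the closest unvisited point."""
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

# Hypothetical coordinates.
points = [(0, 0), (0, 1), (1, 1), (1, 0), (2, 0.5)]
exact = exhaustive_best_tour(points)
greedy = nearest_neighbor_tour(points)
print(tour_length(exact, points), tour_length(greedy, points))
```

The exhaustive search grows factorially with the number of points, which is why a heuristic becomes the practical choice for larger inputs even though its result may be longer than the optimum.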

Question 7

Q

Data types vary from:
1- ___ - categories, states, or “names of things”
2- ___ - attribute with only two states
3- ___ - values have a meaningful order
4- ___ - Quantity / Interval / Ratio
5- ___ Attributes - finite or countably infinite set of values
6- ___ Attributes - real numbers as attribute values

Answer

A

1- Nominal
2- Binary/Boolean
3- Ordinal
4- Numerical
5- Discrete
6- Continuous

Question 8

Q

To measure central tendency of data we can use:
1- ___ - that can be either weighted arithmetic or trimmed
2- ___ - estimated by interpolation
3- ___ - value that occurs most frequently in the data

Answer

A

1- mean
2- median
3- mode

triple M
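
A short sketch of the three measures, using Python's standard `statistics` module. The helper names `weighted_mean` and `trimmed_mean` and the sample data are hypothetical; the trimming rule used here (drop the same fraction of values from each end) is one common convention.

```python
import statistics

def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def trimmed_mean(values, trim_fraction):
    """Mean after dropping trim_fraction of the values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return statistics.mean(kept)

data = [2, 3, 3, 4, 5, 100]  # 100 is an extreme value

mean_value = statistics.mean(data)      # pulled upward by the extreme value
median_value = statistics.median(data)  # middle of the sorted data
mode_value = statistics.mode(data)      # most frequent value
trimmed = trimmed_mean(data, 0.2)       # drops one value from each end
weighted = weighted_mean([1, 2, 3], [3, 2, 1])

print(mean_value, median_value, mode_value, trimmed, weighted)
```

Comparing the plain mean with the trimmed mean and the median shows how a single extreme value affects each measure differently.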

Question 9

Q

To measure the dispersion of data we can use:
1- ___, ___ and ___
2- ___ and ___

Answer

A

1- quartiles, outliers and boxplots

2- Variance and standard deviation
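
The dispersion measures above can be sketched with the standard `statistics` module. Two assumptions are worth flagging: the outlier rule used here (values more than 1.5 times the interquartile range beyond the quartiles, as drawn in a typical boxplot) is a common convention not stated on the card, and the population forms `pvariance` and `pstdev` are used rather than the sample forms.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]  # hypothetical sample

# Quartiles split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Interquartile range and the conventional 1.5 * IQR fences.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Variance is the mean squared deviation from the mean;
# standard deviation is its square root.
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

print(q1, q2, q3, outliers, variance, std_dev)
```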

Question 10

Q

The steps of data preprocessing are:
1- Data ___ - fill missing values, smooth noisy data and identify or remove outliers
2- Data ___ - with multiple datasets
3- Data ___ - data compression and dimensionality reduction
4- Data ___ and data ___ - normalization, aggregation and discretization

Answer

A

1- cleaning
2- integration
3- reduction
4- transformation and discretization

Question 11

Q

In the Data Cleaning step, to handle missing data we can:
1- ___
2- Fill in ___
3- Fill in ___

Answer

A

1- Ignore it
2- Fill manually
3- Fill automatically
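
Two of these options can be sketched in a few lines. The sketch assumes missing values are represented as `None` and that "fill automatically" means substituting the mean of the observed values; other fill rules (a constant, the median, a predicted value) are equally possible. The function names are hypothetical.

```python
import statistics

def drop_missing(values):
    """Option 1: ignore the missing entries."""
    return [v for v in values if v is not None]

def fill_with_mean(values):
    """Option 3: fill each missing entry with the mean of the observed values."""
    fill = statistics.mean(drop_missing(values))
    return [fill if v is None else v for v in values]

readings = [10, None, 20, 30, None]  # hypothetical data
print(drop_missing(readings))
print(fill_with_mean(readings))
```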

Question 12

Q

In the Data Cleaning step, to handle noisy data we can use:
1- ___ - sort data and partition it
2- ___ - smooth by fitting the data into functions
3- ___ - detect and remove outliers
4- combined ___ and ___ inspection - detect suspicious values and check by human

Answer

A

1- binning
2- regression
3- clustering
4- computer and human
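
Binning can be sketched as follows: sort the data, partition it into equal-size bins, then replace every value with the mean of its bin. Smoothing by bin means is one variant (bin medians or bin boundaries are alternatives); the function name and the sample values are illustrative assumptions.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort values, split into bins of bin_size, replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # hypothetical noisy data
smoothed_prices = smooth_by_bin_means(prices, 3)
print(smoothed_prices)
```

Each group of three neighbors collapses to one local average, which removes small fluctuations while keeping the overall trend.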

Question 13

Q

The purpose of Data Reduction is to obtain a ___ representation of the data set that is much ___ in volume but produces the same (or almost the same) ___

Answer

A

reduced
smaller
analytical results

Question 14

Q

Some Data Reduction strategies are:
1- ___ Reduction
2- ___ Reduction
3- Data ___

Answer

A

1- Dimensionality Reduction
2- Numerosity Reduction
3- Data Compression

DND

Question 15

Q

The purpose of Data Transformation is to ___ the entire set of values of a given ___ to a new set of ___

Answer

A

map
attribute
replacement values

Question 16

Q

Some Data Transformation strategies are:
1- ___ - scale values to fall within a smaller, specified range
2- ___ - divide the range of a continuous attribute into intervals
3- ___ - values of multiple objects are grouped together to form a single summary value

Answer

A

1- Normalization
2- Discretization
3- Aggregation
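
A minimal sketch of the three strategies follows. It assumes min-max scaling for normalization, equal-width intervals for discretization, and a per-key sum for aggregation; each of these is only one possible variant, and all function names and data are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Normalization: rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def equal_width_bins(values, n_bins):
    """Discretization: map each value to the index of its equal-width interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last interval.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def aggregate_sum(records):
    """Aggregation: combine (key, amount) records into one total per key."""
    totals = {}
    for key, amount in records:
        totals[key] = totals.get(key, 0) + amount
    return totals

ages = [20, 30, 40, 60]
normalized = min_max_normalize(ages)
bins = equal_width_bins(ages, 2)
totals = aggregate_sum([("Q1", 100), ("Q1", 150), ("Q2", 200)])
print(normalized, bins, totals)
```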

Question 17

Q

Similarity is the numerical measure of how ___ two data objects are, while Dissimilarity is the numerical measure of how ___ two data objects are

Answer

A

alike

different
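
Two widely used measures illustrate the distinction: Euclidean distance as a dissimilarity (larger means more different) and cosine similarity as a similarity (larger means more alike). Choosing these two measures is an assumption; the card does not name specific ones.

```python
import math

def euclidean_distance(a, b):
    """Dissimilarity: straight-line distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Similarity: cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(euclidean_distance((0, 0), (3, 4)))
print(cosine_similarity((1, 2), (2, 4)))
print(cosine_similarity((1, 0), (0, 1)))
```

Identical objects give a distance of 0, while vectors pointing in the same direction give a cosine similarity of 1.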

Question 18

Q

An outlier is a data object that ___ significantly from the ___ objects as if it were generated by a ___

Answer

A

deviates
normal
different mechanism

Question 19

Q

Some Discretization Methods are:
1- ___
2- ___ analysis
3- ___ analysis
4- ___ / ___ analysis
5- ___ analysis

Answer

A

1- Binning
2- Histogram analysis
3- Clustering analysis
4- Decision-tree / classification analysis
5- Correlation analysis

Question 20

Q

The five-number summary consists of the ___, ___, ___ quartile, ___ quartile and ___ of a distribution

Answer

A

minimum
maximum
lower
upper
median
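
The five numbers can be computed in one small helper. This sketch assumes the "inclusive" quartile method of Python's `statistics.quantiles`; other quartile conventions can give slightly different values. The function name and data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

summary = five_number_summary([1, 3, 5, 7, 9])
print(summary)
```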

Question 21

Q

The five-number summary is useful to verify where the data is ___

Answer

A

concentrated

Question 22

Q

Histograms show what ___ of cases fall into each of several ___

Answer

A

proportion
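
A histogram's underlying numbers can be sketched without any plotting library. The function name `histogram_proportions`, the bin edges, and the scores are hypothetical, and the sketch assumes every value lies within the given edges.

```python
def histogram_proportions(values, bin_edges):
    """Return the fraction of values in each interval [edge_i, edge_{i+1}).

    The last interval also includes its right edge.
    """
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            in_bin = bin_edges[i] <= v < bin_edges[i + 1]
            if in_bin or (is_last and v == bin_edges[-1]):
                counts[i] += 1
                break
    return [c / len(values) for c in counts]

scores = [55, 62, 68, 71, 75, 78, 83, 91]
proportions = histogram_proportions(scores, [50, 60, 70, 80, 90, 100])
print(proportions)
```

Each returned number is the height of one bar when the histogram is drawn with proportions on the vertical axis.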

(24 cards)