Data Flashcards
The steps of processing data follow this order: 1- ___ 2- ___ 3- ___ 4- ___ 5- ___
1- Input Data 2- Preprocessing 3- Data Mining 4- Postprocessing 5- Knowledge
Classification is the problem of identifying to which of a set of ___ a new ___ belongs
classes (categories, labels)
observation (input)
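To make the classification card concrete, here is a minimal sketch of one possible classifier, a 1-nearest-neighbor rule (chosen for illustration; the card does not name a method): a new observation is assigned the class of the closest labeled example. The function name `classify_1nn` and the toy data are hypothetical.

```python
import math
from typing import Sequence

Point = tuple[float, ...]


def classify_1nn(training: Sequence[tuple[Point, str]], observation: Point) -> str:
    """Return the class of the labeled example nearest to `observation`.

    Uses a 1-nearest-neighbor rule with Euclidean distance.
    """
    if not training:
        raise ValueError("need at least one labeled example")
    # Find the labeled example closest to the new observation and reuse its class.
    _, label = min(training, key=lambda example: math.dist(example[0], observation))
    return label


# Hypothetical labeled observations: (features, class)
training = [((1.0, 1.0), "small"), ((1.5, 2.0), "small"), ((8.0, 9.0), "large")]
print(classify_1nn(training, (7.0, 8.0)))  # large
```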
Clustering is categorization in the absence of ___
It finds ___ in the data that share ___
labels
groups
similar characteristics
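As an illustration of clustering without labels, the sketch below uses k-means (one common clustering algorithm, picked here as an example): points are repeatedly assigned to the nearest centroid, and each centroid moves to the mean of its group. The function name, the fixed initial centroids, and the data are assumptions for the example.

```python
import math
from typing import Sequence

Point = tuple[float, float]


def kmeans(points: Sequence[Point], initial_centroids: Sequence[Point], iterations: int = 10) -> list[int]:
    """Group unlabeled points with k-means; returns each point's group index.

    No labels are used: groups emerge only from distances between points.
    """
    centroids = list(initial_centroids)
    assignment: list[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins the group of its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its group's points.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty group keeps its previous centroid
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
print(kmeans(points, initial_centroids=[(0.0, 0.0), (10.0, 10.0)]))  # [0, 0, 0, 1, 1, 1]
```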
Forecasting is the process of making ___ of the ___ based on ___ and ___ data, most commonly by ___
predictions
future
past and present
analysis of trends
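A simple form of trend analysis is to fit a straight line to past observations and extend it into the future. The sketch below does this with ordinary least squares; the function name and the sales figures are hypothetical, and real forecasting usually needs more than a linear trend.

```python
from typing import Sequence


def forecast_linear_trend(history: Sequence[float], steps_ahead: int = 1) -> float:
    """Forecast a future value by fitting a least-squares line to past values.

    history[t] is the observation at time t = 0, 1, ..., n - 1.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two past observations")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    # Slope and intercept of the ordinary least-squares trend line.
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    intercept = y_mean - slope * t_mean
    # Extrapolate the trend to time (n - 1) + steps_ahead.
    return intercept + slope * (n - 1 + steps_ahead)


monthly_sales = [10, 12, 14, 16]  # hypothetical past and present data
print(forecast_linear_trend(monthly_sales))  # 18.0
```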
Optimization is the process of finding the ___ among ___
best solution
all possible solutions
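Exact optimization can be illustrated by exhaustive search: evaluate every possible solution and keep the best. The sketch below applies this to a small 0/1 knapsack instance (an example problem chosen here, not from the card); the names and numbers are hypothetical. Because it checks all 2^n subsets, it only scales to small inputs.

```python
from itertools import combinations


def best_subset(items: dict[str, tuple[int, int]], capacity: int) -> tuple[int, tuple[str, ...]]:
    """Exhaustively search every subset of items for the highest total value
    whose total weight fits in `capacity`.

    items maps name -> (weight, value). Returns (best_value, chosen_names).
    """
    names = sorted(items)
    best: tuple[int, tuple[str, ...]] = (0, ())
    # Enumerate all possible solutions: every subset of every size.
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            weight = sum(items[name][0] for name in subset)
            value = sum(items[name][1] for name in subset)
            if weight <= capacity and value > best[0]:
                best = (value, subset)
    return best


items = {"a": (2, 3), "b": (3, 4), "c": (4, 5), "d": (5, 6)}  # name -> (weight, value)
print(best_subset(items, capacity=5))  # (7, ('a', 'b'))
```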
Heuristic optimization is the process of finding a ___ in a reasonable ___
near optimal solution
time frame
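Hill climbing is one example of a heuristic (used here for illustration): it keeps moving to a better neighboring solution and stops when none improves. It is fast but may stop at a local optimum instead of the global one. The function name and the test function `g` are hypothetical.

```python
from typing import Callable


def hill_climb(f: Callable[[int], float], start: int, max_steps: int = 1000) -> int:
    """Greedy local search over integers: move to the better neighbor
    (x - 1 or x + 1) until no neighbor improves f."""
    x = start
    for _ in range(max_steps):
        best_neighbor = max((x - 1, x + 1), key=f)
        if f(best_neighbor) <= f(x):
            return x  # no improving move: a (possibly local) optimum
        x = best_neighbor
    return x  # step budget exhausted: return the best point reached


def g(x: int) -> float:
    # Two peaks: a local maximum at x = 2 (value 10), the global one at x = 15 (value 20).
    return max(10 - abs(x - 2), 20 - abs(x - 15))


print(hill_climb(g, start=0))   # 2  (stuck on the local peak)
print(hill_climb(g, start=10))  # 15 (reaches the global peak)
```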
Data types include:
1- ___ - categories, states, or “names of things”
2- ___ - attribute with only two states
3- ___ - Values have a meaningful order
4- ___ - quantity; interval-scaled or ratio-scaled
5- ___ Attributes - finite or countably infinite set of values
6- ___ Attributes - real numbers as attribute values
1- Nominal 2- Binary/Boolean 3- Ordinal 4- Numerical 5- Discrete 6- Continuous
To measure central tendency of data we can use:
1- ___ - can be a weighted arithmetic mean or a trimmed mean
2- ___ - middle value of the ordered data; estimated by interpolation for grouped data
3- ___ - value that occurs most frequently in the data
1- mean
2- median
3- mode
triple M
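The three measures can be computed with Python's standard `statistics` module; the weighted and trimmed means are written out by hand below. The helper names, the salary data (in thousands), and the weights are hypothetical. `multimode` is used because data can have more than one mode.

```python
import statistics
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def trimmed_mean(values: Sequence[float], trim: int) -> float:
    """Mean after dropping the `trim` smallest and `trim` largest values."""
    if 2 * trim >= len(values):
        raise ValueError("trimming would remove every value")
    ordered = sorted(values)
    return statistics.mean(ordered[trim:len(ordered) - trim])


salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]  # hypothetical, in thousands
print(statistics.mean(salaries))              # 58
print(weighted_mean([1, 2, 3], [3, 1, 1]))    # 1.6
print(trimmed_mean(salaries, trim=1))         # 55.6 (extremes 30 and 110 removed)
print(statistics.median(salaries))            # 54.0
print(statistics.multimode(salaries))         # [52, 70]
```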
To measure the dispersion of data we can use:
1- ___, ___ and ___
2- ___ and ___
1- quartiles, outliers and boxplots
2- Variance and standard deviation
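A sketch of these dispersion measures with the standard `statistics` module, assuming its default ("exclusive") quartile method; other quartile conventions give slightly different values. The outlier test uses the common 1.5 x IQR rule, and the helper names and data are hypothetical.

```python
import statistics
from typing import Sequence


def five_number_summary(values: Sequence[float]) -> tuple[float, float, float, float, float]:
    """Minimum, Q1, median, Q3, maximum: the numbers a boxplot draws."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return min(values), q1, q2, q3, max(values)


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> list[float]:
    """Values more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]  # hypothetical, in thousands
print(five_number_summary(salaries))
print(iqr_outliers(salaries))  # [110]
print(statistics.pvariance(salaries), statistics.pstdev(salaries))  # population (divide by n)
print(statistics.variance(salaries), statistics.stdev(salaries))    # sample (divide by n - 1)
```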
The steps of data preprocessing are:
1- Data ___ - fill missing values, smooth noisy data and identify or remove outliers
2- Data ___ - combine data from multiple datasets
3- Data ___ - data compression and dimensionality reduction
4- Data ___ and data ___ - normalization, aggregation and discretization
1- cleaning
2- integration
3- reduction
4- transformation and discretization
In the Data Cleaning step, to handle missing data we can:
1- ___
2- Fill in ___
3- Fill in ___
1- Ignore it
2- Fill manually
3- Fill automatically
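Options 1 and 3 can be sketched in code (filling in manually needs a human). Filling with the attribute mean is one common automatic choice, used here as an assumption; a constant or a class-wise mean are alternatives. Rows are hypothetical dicts with `None` marking a missing value.

```python
import statistics
from typing import Any

Row = dict[str, Any]


def ignore_missing(rows: list[Row], attribute: str) -> list[Row]:
    """Option 1: ignore (drop) tuples whose `attribute` value is missing (None)."""
    return [row for row in rows if row.get(attribute) is not None]


def fill_with_mean(rows: list[Row], attribute: str) -> list[Row]:
    """Option 3: fill missing `attribute` values automatically with the attribute mean.

    Returns new dicts; the input rows are not modified.
    """
    known = [row[attribute] for row in rows if row.get(attribute) is not None]
    mean = statistics.mean(known)  # raises StatisticsError if every value is missing
    return [
        {**row, attribute: mean} if row.get(attribute) is None else dict(row)
        for row in rows
    ]


# Hypothetical customer records; None marks a missing income.
customers = [{"id": 1, "income": 40}, {"id": 2, "income": None}, {"id": 3, "income": 60}]
print([row["id"] for row in ignore_missing(customers, "income")])  # [1, 3]
print(fill_with_mean(customers, "income")[1])  # {'id': 2, 'income': 50}
```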
In the Data Cleaning step, to handle noisy data we can use:
1- ___ - sort data and partition it into bins
2- ___ - smooth by fitting the data to functions
3- ___ - detect and remove outliers
4- combined ___ and ___ inspection - detect suspicious values and check by human
1- binning
2- regression
3- clustering
4- computer and human
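Binning can be sketched as: sort the values, split them into equal-frequency bins, then smooth by replacing each value with its bin mean (smoothing by bin medians or boundaries are variants). The function name and the price list are hypothetical.

```python
import statistics
from typing import Sequence


def smooth_by_bin_means(values: Sequence[float], depth: int) -> list[float]:
    """Sort the data, partition it into equal-frequency bins of `depth` values,
    and replace every value by the mean of its bin (result is in sorted order)."""
    if depth < 1:
        raise ValueError("depth must be positive")
    ordered = sorted(values)
    smoothed: list[float] = []
    for start in range(0, len(ordered), depth):
        bin_values = ordered[start:start + depth]  # the last bin may be smaller
        smoothed.extend([statistics.mean(bin_values)] * len(bin_values))
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # hypothetical noisy prices
print(smooth_by_bin_means(prices, depth=3))  # [9, 9, 9, 22, 22, 22, 29, 29, 29]
```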
The purpose of data reduction is to obtain a ___ representation of the data set that is much ___ in volume but produces the same (or almost the same) ___
reduced
smaller
analytical results
Some data reduction strategies are:
1- ___ Reduction
2- ___ Reduction
3- Data ___
1- Dimensionality Reduction
2- Numerosity Reduction
3- Data Compression
DND
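As one example of numerosity reduction, an equal-width histogram replaces the raw values with a count per bucket, which is much smaller but still supports many analyses. Dimensionality reduction and compression are not shown. The function name, bucket width, and prices are hypothetical.

```python
from collections import Counter
from typing import Sequence


def equal_width_histogram(values: Sequence[float], width: float) -> dict[float, int]:
    """Numerosity reduction: replace raw values by counts per equal-width bucket.

    Keys are bucket lower bounds, e.g. 10 stands for the bucket [10, 20).
    """
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


prices = [1, 1, 5, 5, 5, 8, 10, 10, 12, 14, 14, 15, 18, 20, 21, 25, 28, 30]  # hypothetical
print(equal_width_histogram(prices, width=10))  # {0: 6, 10: 7, 20: 4, 30: 1}
```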
The purpose of data transformation is to ___ the entire set of values of a given ___ to a new set of ___
map
attribute
replacement values
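Normalization is a typical transformation that maps every value of an attribute to a replacement value. Below is a sketch of two common forms, min-max normalization to a new range and z-score normalization; the function names and income figures are hypothetical.

```python
import statistics
from typing import Sequence


def min_max_normalize(values: Sequence[float], new_min: float = 0.0, new_max: float = 1.0) -> list[float]:
    """Map each value linearly from [min, max] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; the min-max range is zero")
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


def z_score_normalize(values: Sequence[float]) -> list[float]:
    """Map each value to (v - mean) / standard deviation (population std)."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("zero standard deviation")
    return [(v - mu) / sigma for v in values]


incomes = [12000, 73600, 98000]  # hypothetical attribute values
print(min_max_normalize(incomes))  # middle value maps to roughly 0.716
print(z_score_normalize(incomes))
```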