Descriptive analytics Flashcards

1
Q

The term for questioning whether we have confidence and belief in the data source is called ___________

A

Data source reliability

2
Q

The term about questioning if we have the right data for the job is called ______________

A

Data content accuracy

3
Q

The term about questioning if we can easily get to the data is called _____________

A

Data accessibility

4
Q

The term for questioning whether the data source is secured so that only those who are allowed to consult the data can access it is called ___________

A

Data security and data privacy

5
Q

__________ means all the requested data elements are included in the data set

A

Data richness

6
Q

________ means the data is accurately collected and combined. It diminishes the possibility that two records get mixed up during a data merge.

A

Data consistency

7
Q

_________ or ___________ means the data is as up to date as needed

A

data currency or data timeliness

8
Q

__________ means that the data is defined at the lowest level of detail needed for its intended use

A

Data granularity

9
Q

Data ______ is the term used to describe a mismatch between the actual and expected value of a variable

A

validity

10
Q

___________ means that the data in the data set are all relevant for the study

A

Data relevancy

11
Q

Data is a collection of _______ usually obtained by e_____, o______, transactions or e_______

A

facts, experiments, observations, experiences

12
Q

data is the lowest/highest level of abstraction from which information is derived

A

lowest

13
Q

structured data is what data mining techniques use and can be classified as _________ or _______

A

categorical or numeric

14
Q

categorical data is: ________

and can be divided into _______ and ______

A

Categorical = labels of classes used to divide a variable into specific groups: education level, race, gender, etc.

nominal and ordinal

15
Q

nominal classification is ____________

A

simple codes assigned to objects as labels, with no inherent order. Example: marital status = 1, 2 or 3

16
Q

ordinal classification is ____________

A

assigning codes to objects as labels that ALSO represent RANK order
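
A minimal sketch of the nominal versus ordinal distinction from the two cards above. The category lists and the `encode` helper are hypothetical, chosen only for illustration; the point is that both kinds of variable get integer codes, but only ordinal codes can be compared by rank.

```python
# Hypothetical category lists, used only for illustration.
MARITAL_STATUS = ["single", "married", "divorced"]        # nominal: no rank order
EDUCATION_LEVEL = ["high school", "bachelor", "master"]   # ordinal: ranked low to high

def encode(values, categories):
    """Map each label to an integer code (its position in `categories`, starting at 1)."""
    codes = {label: i for i, label in enumerate(categories, start=1)}
    return [codes[v] for v in values]

marital_codes = encode(["married", "single"], MARITAL_STATUS)
education_codes = encode(["master", "high school"], EDUCATION_LEVEL)

# Comparing codes is only meaningful for the ordinal variable.
has_higher_education = education_codes[0] > education_codes[1]
```

Comparing `marital_codes` in the same way would run without error, but the result would carry no meaning, because the codes are labels only.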

17
Q

what is the difference between ‘numeric data’ and ‘ratio data’

A

ratio data has values that can be compared to a non-arbitrary zero point: weight, angle, energy, temperature (in kelvin), velocity, etc.

18
Q

Neural networks, support vector machines and logistic regression expect a certain form of data. Which one?

A

Numeric data
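
One common way to feed a categorical variable to such methods is to turn it into 0/1 indicator columns (often called one-hot encoding). The sketch below is an illustration only; the `one_hot` helper and the colour categories are assumptions, not part of the original cards.

```python
def one_hot(values, categories):
    """Return one 0/1 indicator vector per value, with one position per category."""
    return [[1 if v == c else 0 for c in categories] for v in values]

# Hypothetical categorical variable with three possible labels.
COLOURS = ["red", "green", "blue"]
encoded = one_hot(["red", "blue"], COLOURS)
```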

19
Q

A _____ variable has an infinite value range

A

continuous

20
Q

A discrete variable has a ________ value range

A

finite countable

21
Q

missing values in a collected data set due to an anomaly need to be _______ or ________

A

imputed (most probable value) or ignored
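
A small sketch of both options, assuming missing values are represented as `None`. Using the mean as the "most probable value" is only one simple choice; the median, the mode or a model-based estimate are alternatives. The helper names are hypothetical.

```python
from statistics import mean

def impute_mean(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def drop_missing(values):
    """Ignore missing values by removing them from the list."""
    return [v for v in values if v is not None]

ages = [10, None, 20]
imputed = impute_mean(ages)
ignored = drop_missing(ages)
```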

22
Q

Reasons for missing values in data are: ________ or ________

A

anomaly or intended

23
Q

Noisy data (outliers) should be ________

A

smoothed out
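
The card does not name a smoothing method. As one possible illustration, the sketch below uses a centred moving average; the `moving_average` helper and the window size of 3 are assumptions, and binning or regression would be equally valid choices.

```python
def moving_average(values, window=3):
    """Replace each value with the average of its neighbours inside the window.

    Near the edges the window is truncated to the values that exist.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

noisy = [1, 1, 10, 1, 1]        # the 10 is the outlier
smooth = moving_average(noisy)
```

The outlier is pulled down from 10 to 4.0, at the cost of raising its neighbours.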

24
Q

Sometimes the data of a variable is ______ between a certain minimum and maximum to ______ the potential _____

A

normalized

mitigate the potential bias
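
A minimal sketch of min-max normalization, one common way to rescale a variable into a fixed range so that variables with large values do not dominate those with small values. The function name and the default range of 0 to 1 are assumptions made for illustration.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

incomes = [10, 20, 30]
scaled = min_max_normalize(incomes)
```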

25
Q

What are some transformation tasks?

A

normalization
discretization
aggregation
convert numerical data to a categorical value
reduce the number of nominal values for a variable
reduce complexity: blood match 1 or - instead of blood groups
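
As an illustration of discretization (converting numerical data to a categorical value), the sketch below bins ages into labels. The boundaries, labels and the `discretize` helper are hypothetical.

```python
def discretize(value, boundaries, labels):
    """Return the label of the first bin whose upper boundary exceeds `value`.

    `labels` must have exactly one more entry than `boundaries`;
    the last label covers everything at or above the final boundary.
    """
    for bound, label in zip(boundaries, labels):
        if value < bound:
            return label
    return labels[-1]

# Hypothetical age bins: under 18, 18 to 64, 65 and over.
AGE_BOUNDS = [18, 65]
AGE_LABELS = ["child", "adult", "senior"]
groups = [discretize(age, AGE_BOUNDS, AGE_LABELS) for age in [15, 18, 30, 70]]
```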

26
Q

The final step in data preprocessing, after data transformation, is called ________

A

data reduction

27
Q

In ‘predictive analysis’ and ‘data mining’, data sets have many dimensions (variables) that describe the phenomenon. Reducing the number of dimensions is called __________ (or _________)

A

dimensional reduction (or variable selection)

28
Q

Data reduction can be achieved not only by reducing variables (columns) but also by __________, also called ___________

A

reducing records, sampling
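
A sketch of reducing records by simple random sampling, using Python's standard `random` module. The `sample_records` helper, the fraction and the fixed seed are assumptions for illustration; other sampling schemes, such as stratified sampling, are not shown.

```python
import random

def sample_records(records, fraction, seed=0):
    """Return a random subset holding roughly `fraction` of the records, without replacement."""
    rng = random.Random(seed)              # fixed seed so the sample is repeatable
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)

rows = list(range(100))                    # stand-in for 100 data records
subset = sample_records(rows, 0.1)
```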

29
Q

In a skewed data set, it has been shown that ______ the more represented classes and __________ the less represented classes produces better prediction models than leaving the data unbalanced

A

undersampling, oversampling
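
A minimal sketch of both ideas using random resampling, assuming the records are already split into a majority-class list and a minority-class list. The helper names and the seed are hypothetical, and more refined balancing methods exist.

```python
import random

def undersample(majority, minority, seed=0):
    """Shrink the majority class to the size of the minority class."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)) + list(minority)

def oversample(majority, minority, seed=0):
    """Grow the minority class to the size of the majority class by repeating random records."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority) + list(minority) + extra

# Hypothetical skewed data set: 8 "no" records and 2 "yes" records.
majority = [("no", i) for i in range(8)]
minority = [("yes", i) for i in range(2)]
balanced_small = undersample(majority, minority)
balanced_large = oversample(majority, minority)
```

Undersampling discards information from the larger class, while oversampling repeats records from the smaller one, so each approach has its own trade-off.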