UNIT 2-3 Flashcards

1
Q

The process of preparing data for analysis by removing or modifying incorrect, incomplete, irrelevant, duplicated, or improperly formatted data

A

Data Cleaning

2
Q

T/F: When importing data, you may not change the type, role, or name of each attribute (variable)

A

FALSE

3
Q

Many different string values

A

Polynominal

4
Q

Exactly two values

A

Binominal

5
Q

A fractional number

A

Real

6
Q

A whole number

A

Integer

7
Q

Indicator for date and time

A

date_time

8
Q

Indicator for date without time

A

date

9
Q

Indicator for time without date

A

time

10
Q

It is an operation in RapidMiner that retains only the data meeting the given criteria

A

Filtering
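
Outside RapidMiner, the same idea can be sketched in plain Python: keep only the rows that satisfy a criterion. The records, the column names, and the `age >= 18` criterion below are hypothetical, chosen only for illustration.

```python
# Hypothetical example set; the column names are assumptions for illustration.
rows = [
    {"name": "Ana", "age": 21},
    {"name": "Ben", "age": 16},
    {"name": "Cara", "age": 30},
]

def filter_rows(rows, criterion):
    """Keep only the rows for which the criterion returns True."""
    return [row for row in rows if criterion(row)]

# Criterion: retain rows whose age is at least 18.
adults = filter_rows(rows, lambda row: row["age"] >= 18)
```

Rows that fail the criterion are dropped from the result; the original list is left unchanged.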

11
Q

Instead of adding filters, you may remove all cases with missing values by using the ______________

A

Condition Class
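
The effect of a "no missing values" condition can be sketched in plain Python. This is not RapidMiner's implementation; `None` stands in for a missing value and the data is made up.

```python
# None stands in for a missing value; the data is hypothetical.
rows = [
    {"name": "Ana", "age": 21},
    {"name": "Ben", "age": None},
    {"name": None, "age": 30},
]

def drop_missing(rows):
    """Keep only rows in which every attribute has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]

complete_rows = drop_missing(rows)
```

Only the first row survives, because it is the only one with no missing attribute.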

12
Q

To remove leading and trailing “white spaces” from encoded values, use the ______ operator

A

TRIM
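
A rough plain-Python equivalent of trimming, assuming the stray spaces sit at the start or end of each value (the sample values are hypothetical):

```python
# Hypothetical values with stray leading and trailing whitespace.
values = ["  Manila", "Cebu  ", " Davao "]

# str.strip() removes leading and trailing whitespace only;
# spaces inside a value are left alone.
trimmed = [v.strip() for v in values]
```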

13
Q

It is the graphical representation of data; the techniques used to communicate insights from data through visual representation

A

Data Visualization

14
Q

T/F: Data Visualization distills large datasets into visual graphics, making complex relationships within the data easier to understand and helping people analyze massive amounts of information and make data-driven decisions

A

TRUE

15
Q

What are the common visualization techniques?

A

Bar graph, Line graph, Pie graph, Histogram, Scatterplot, Boxplot, Heatmap

16
Q

Used to compare counts, percentages, or other measures (average) for different discrete categories of data

A

Bar graph

17
Q

When using a bar graph in RapidMiner, set the group by to ______ and use the _________ aggregate function

A

Stage; Average
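
What "group by, then average" computes can be sketched in plain Python. `Stage` is the grouping attribute named on the card; the `Score` attribute and all values are assumptions made up for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: "Stage" is the grouping attribute from the card,
# "Score" is an assumed numeric attribute to average.
rows = [
    {"Stage": "I", "Score": 10},
    {"Stage": "I", "Score": 20},
    {"Stage": "II", "Score": 30},
]

# Collect the scores belonging to each stage.
groups = defaultdict(list)
for row in rows:
    groups[row["Stage"]].append(row["Score"])

# One bar per category; the bar height is the average of that group.
bar_heights = {stage: mean(scores) for stage, scores in groups.items()}
```

Each key of `bar_heights` would become one bar, with the group average as its height.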

18
Q

T/F: Further customization of the title, axes range, font, etc. may be done on your own

A

TRUE

19
Q

It is used to observe trends

A

Line graph

20
Q

It shows the relative contribution of different categories to an overall total

A

Pie graph

21
Q

It is the frequency distribution of a continuous attribute

A

Histogram

22
Q

T/F: A histogram presents a categorical attribute, while a bar graph presents a numerical attribute

A

FALSE

23
Q

T/F: Bar graphs have spaces between bars, while histograms do not

A

TRUE
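
The difference between the two charts comes down to what gets counted. A minimal sketch, assuming a bin width of 10 and made-up data:

```python
from collections import Counter

# Hypothetical continuous measurements.
heights = [150.2, 151.9, 158.0, 163.4, 169.9, 171.0]

bin_width = 10
# A histogram counts values falling into adjacent numeric intervals (bins),
# which is why its bars touch. Each key is the lower edge of a bin.
histogram = Counter(int(h // bin_width) * bin_width for h in heights)

# A bar graph counts discrete categories, which have no numeric adjacency,
# so its bars are drawn with gaps between them.
colors = ["red", "blue", "red"]
bar_counts = Counter(colors)
```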

24
Q

T/F: In a histogram, check the reverse axis to keep the order of the values

A

FALSE

25
Q

It plots two numerical attributes

A

Scatterplot

26
Q

It is the graphical representation of the quartiles

A

Boxplot
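
The quartiles a boxplot draws can be computed with Python's standard library. This is a sketch on made-up data. Quartile conventions differ between tools, so other software may report slightly different cut points; the `inclusive` method used here is one common choice.

```python
from statistics import quantiles

# Hypothetical sample.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into four parts and returns the three cut points:
# Q1 (lower quartile), Q2 (median), Q3 (upper quartile).
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# The box spans Q1 to Q3; its length is the interquartile range.
iqr = q3 - q1
```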

27
Q

It is a graphical representation of data where the individual values contained in a matrix (map) are represented as colors

A

Heatmap