UNIT 2-3 Flashcards
The process of preparing data for analysis by removing or modifying incorrect, incomplete, irrelevant, duplicated, or improperly formatted data
Data Cleaning
T/F: When importing data, you may not change the type, role, or name of each attribute (variable)
FALSE
Many different string values
Polynominal
Exactly two values
Binominal
A fractional number
Real
A whole number
Integer
Indicator for date and time
date_time
Indicator for date without time
date
Indicator for time without date
time
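The value types on the cards above can be illustrated with a rough sketch that guesses a type label for a column of raw strings. The function name `guess_value_type` and its rules are assumptions made for illustration, not RapidMiner's actual import logic; the labels use RapidMiner's spellings (polynominal, binominal), and date and time types are left out.

```python
def guess_value_type(values):
    """Return a rough type label for a list of raw string values.

    Hypothetical helper for illustration only; RapidMiner's own
    type guessing during import may follow different rules.
    """
    # Ignore missing entries when deciding the type.
    present = [v for v in values if v not in ("", None)]
    try:
        numbers = [float(v) for v in present]
    except ValueError:
        # Not numeric: classify by the number of distinct string values.
        return "binominal" if len(set(present)) == 2 else "polynominal"
    # Numeric: whole numbers are "integer", anything fractional is "real".
    return "integer" if all(n.is_integer() for n in numbers) else "real"


print(guess_value_type(["yes", "no", "yes"]))
print(guess_value_type(["red", "green", "blue"]))
print(guess_value_type(["1", "2", "3"]))
print(guess_value_type(["1.5", "2"]))
```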
An operation in RapidMiner that retains only the data meeting the given criteria
Filtering
Instead of adding filters, you may remove all cases with missing values using the ______________
Condition Class
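As a minimal sketch of the idea behind filtering, the snippet below keeps only the rows that satisfy a condition, and then drops every row that has a missing value. The data, the column names, and the helper `filter_examples` are all made up for illustration; this is plain Python, not RapidMiner's operator.

```python
# Made-up example set; None marks a missing value.
rows = [
    {"name": "Ana", "age": 21, "score": 88.5},
    {"name": "Ben", "age": None, "score": 72.0},
    {"name": "Cy", "age": 22, "score": None},
    {"name": "Dee", "age": 25, "score": 91.0},
]


def filter_examples(rows, condition):
    """Keep only the rows for which the condition returns True."""
    return [row for row in rows if condition(row)]


# Filter with an explicit criterion: age is known and at least 21.
adults = filter_examples(rows, lambda r: r["age"] is not None and r["age"] >= 21)

# Remove every case that has any missing value, without writing
# one filter per attribute.
complete = filter_examples(rows, lambda r: all(v is not None for v in r.values()))

print([r["name"] for r in adults])
print([r["name"] for r in complete])
```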
To remove “white spaces” in the encoding, use the ______ operator
TRIM
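Trimming matters because stray spaces make identical values look different. A small sketch in plain Python, using `str.strip()` on made-up city names (not RapidMiner's operator itself):

```python
# Made-up nominal values with stray leading and trailing spaces.
raw = ["  Manila", "Cebu  ", " Davao ", "Manila"]

# strip() removes whitespace at both ends of each string.
trimmed = [value.strip() for value in raw]

# Before trimming, "  Manila" and "Manila" count as different values.
distinct_before = len(set(raw))
distinct_after = len(set(trimmed))

print(trimmed)
print(distinct_before, distinct_after)
```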
It is the graphical representation of data; Techniques used to communicate insights from data through visual representation
Data Visualization
T/F: Data Visualization is used to distill large datasets into visual graphics so that complex relationships within the data are easy to understand, massive amounts of information can be analyzed, and data-driven decisions can be made
TRUE
What are the common visualization techniques?
Bar graph, Line graph, Pie graph, Histogram, Scatterplot, Boxplot, Heatmap
Used to compare counts, percentages, or other measures (such as averages) across different discrete categories of data
Bar graph
When using a bar graph in RapidMiner, set the group by to ______ and use the _________ aggregate function
Stage; Average
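What "group by" plus the "average" aggregate computes for a bar graph can be sketched as follows. The records and the `value` column are invented for illustration; only the grouping attribute name `stage` echoes the card above.

```python
from collections import defaultdict
from statistics import mean

# Invented records: one category attribute and one numeric attribute.
records = [
    {"stage": "A", "value": 10},
    {"stage": "A", "value": 20},
    {"stage": "B", "value": 5},
    {"stage": "B", "value": 7},
    {"stage": "B", "value": 9},
]

# Group by: collect the numeric values under each category.
groups = defaultdict(list)
for rec in records:
    groups[rec["stage"]].append(rec["value"])

# Aggregate: one average per category, which becomes one bar each.
averages = {stage: mean(vals) for stage, vals in groups.items()}

# Crude text bars, one per discrete category.
for stage, avg in averages.items():
    print(f"{stage} | {'#' * round(avg)} {avg}")
```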
T/F: Further customization of the title, axes range, font, etc. may be done on your own
TRUE
It is used to observe trends
Line graph
It shows the relative contribution of different categories to an overall total
Pie graph
It is the frequency distribution of a continuous attribute
Histogram
T/F: A histogram presents a categorical attribute, while a bar graph presents a numerical attribute
FALSE
T/F: Bar graphs have spaces between bars, while histograms do not
TRUE
T/F: In a histogram, check the reverse axis to keep the order of the values
FALSE
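A histogram's frequency distribution comes from cutting a continuous attribute into adjacent bins and counting the values in each, which is why its bars touch. A minimal sketch, assuming equal-width bins; the helper `histogram_counts` and the height values are made up for illustration.

```python
def histogram_counts(values, bin_width, start):
    """Count values per equal-width bin, keyed by each bin's left edge.

    Hypothetical helper; bins with no values are simply not listed.
    """
    counts = {}
    for v in values:
        # Which bin (0, 1, 2, ...) the value falls into.
        index = int((v - start) // bin_width)
        left_edge = start + index * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    # Sort by left edge so the bins keep their numeric order.
    return dict(sorted(counts.items()))


# Made-up continuous attribute (heights in cm).
heights = [150.2, 151.8, 155.0, 158.9, 160.1, 161.5, 169.9]
counts = histogram_counts(heights, bin_width=5, start=150)

for left_edge, n in counts.items():
    print(f"[{left_edge}, {left_edge + 5}) {'#' * n}")
```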
It plots two numerical attributes
Scatterplot
It is the graphical representation of the quartiles
Boxplot
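The quartiles that a boxplot draws can be computed with Python's `statistics.quantiles`. The data below are made up, and the default "exclusive" method is only one of several quartile conventions, so other tools may report slightly different values.

```python
from statistics import quantiles

# Made-up numerical attribute.
data = [1, 2, 3, 4, 5, 6, 7]

# n=4 splits the data into four parts and returns the three cut
# points: Q1, Q2 (the median), and Q3.
q1, q2, q3 = quantiles(data, n=4)

# Five-number summary: the values a basic boxplot is built from.
summary = {"min": min(data), "Q1": q1, "median": q2, "Q3": q3, "max": max(data)}
print(summary)
```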
It is a graphical representation of data where the individual values contained in a matrix (map) are represented as colors
Heatmap
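The mapping from matrix values to colors can be sketched in plain text by using shading characters in place of colors. The matrix, the shade scale, and the helper `shade` are assumptions for illustration, and the scaling assumes the matrix holds at least two different values.

```python
# Made-up matrix of values.
matrix = [
    [0, 5, 10],
    [3, 9, 7],
]

# Stand-in "color scale": from lightest (space) to darkest (#).
shades = " .:*#"

lo = min(min(row) for row in matrix)
hi = max(max(row) for row in matrix)


def shade(value):
    """Map a value to a shading character by its position between lo and hi."""
    scaled = (value - lo) / (hi - lo)  # 0.0 at the minimum, 1.0 at the maximum
    index = min(int(scaled * len(shades)), len(shades) - 1)
    return shades[index]


# One string per matrix row, one character per cell.
rendered = ["".join(shade(v) for v in row) for row in matrix]
for line in rendered:
    print(line)
```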