Exploring Data and R Flashcards
is a preliminary exploration of the data to better understand its characteristics.
Data Exploration
are numbers that summarize properties of the data.
Summary Statistics
is the percentage of the time an attribute value occurs in the data set.
Frequency
is the most frequent attribute value.
Mode
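A minimal sketch of how frequency and mode could be computed, assuming a small made-up list of class years. The function names `frequencies` and `mode` and the sample data are illustrative and not part of the original cards.

```python
from collections import Counter


def frequencies(values):
    """Return each distinct value's share of the data as a percentage."""
    counts = Counter(values)
    total = len(values)
    return {value: 100.0 * count / total for value, count in counts.items()}


def mode(values):
    """Return the most frequent value (the first one seen wins a tie)."""
    return Counter(values).most_common(1)[0][0]


# Hypothetical sample: class year of ten students.
years = ["freshman", "junior", "freshman", "senior", "sophomore",
         "freshman", "junior", "freshman", "senior", "junior"]

print(frequencies(years))
print(mode(years))
```

Frequency and mode only count occurrences, so they also apply to categorical attributes such as the class years used here.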
2 MEASURES OF LOCATION
- Mean
- Median
is the most common measure of the location of a set of points.
Mean
is an alternative to the mean, which is very sensitive to outliers.
Median
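The sensitivity of the mean to outliers can be illustrated with a short sketch. The salary figures below are invented for the example, and the calls come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical salaries (in thousands); the numbers are made up.
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [500]  # add one extreme value

print(mean(salaries), median(salaries))          # both measures agree here
print(mean(with_outlier), median(with_outlier))  # only the mean shifts a lot
```

One extreme value pulls the mean far from the bulk of the data, while the median barely moves. That is why the median is often preferred for skewed data.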
2 WAYS TO MEASURE SPREAD
- Range
- Variance or Standard Deviation
is the difference between the maximum and minimum values.
Range
is the most common measure of the spread of a set of points.
Variance or Standard Deviation
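A small sketch of the spread measures, assuming the population form of the variance (dividing by n). The cards do not say which form is meant; the sample form divides by n - 1 instead. The helper `value_range` and the data are illustrative.

```python
from statistics import pstdev, pvariance


def value_range(values):
    """Range: difference between the largest and smallest value."""
    return max(values) - min(values)


# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(value_range(data))  # spread judged from the two extremes only
print(pvariance(data))    # mean squared deviation from the mean
print(pstdev(data))       # square root of the variance, in the data's units
```

The range depends only on the two extreme values, so a single outlier can dominate it. The variance and standard deviation use every point.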
is the conversion of data into a visual or tabular format so that the characteristics of the data and the relationships among data items or attributes can be analyzed or reported.
Visualization
12 VISUALIZATION TECHNIQUES / METHODS
- Representation
- Arrangement
- Selection
- Histogram
- Box Plots
- Two Dimensional Histograms
- Scatter Plots
- Contour Plots
- Matrix Plots
- Parallel Coordinates
- Star Plot
- Chernoff Faces
is the mapping of information to a visual format.
Representation
is the placement of visual elements within a display.
Arrangement
is the elimination or de-emphasis of certain objects and attributes.
Selection
usually shows the distribution of values of a single variable.
Histogram
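As a rough sketch of what a histogram computes, the values of one variable can be divided into equal-width bins and counted per bin. The function name `histogram`, the bin count, and the data are assumptions made for illustration. The sketch also assumes the values are not all identical, so the bin width is nonzero.

```python
def histogram(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins  # assumes hi > lo
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it
        # into the last bin.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical single-variable data.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
counts = histogram(data, 4)

# Print a simple text rendering: one '#' per object in the bin.
for i, count in enumerate(counts):
    print(f"bin {i}: {'#' * count}")
```

The height of each bar is the number of objects in that bin, so the shape of the bars shows how the values are distributed.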