Lecture 8 Flashcards
Multivariate Data
Univariate: Analysis is based on a single variable.
Bivariate: Analysis is based on two variables.
Multivariate: Analysis is based on more than two variables.
Point Based Techniques
Project records from an n-dimensional data space to an arbitrary k-dimensional display space, such that each data record maps to a k-dimensional point.
For each record, a graphical representation or mark is drawn at its associated k-dimensional point.
This can be achieved in two ways:
Scatterplots and Scatterplot Matrices
Force-based methods
Scatterplots and Scatterplot Matrices
When data with more than two dimensions is shown in scatterplots, the options are:
Dimension subsetting – Allowing the user to select a subset of the dimensions to display
Dimension reduction – Using techniques such as principal component analysis to transform the high-dimensional data into data of lower dimension
Dimension embedding – Mapping dimensions to graphical attributes other than position, such as color, size, or shape
Multiple displays – Showing several plots, either superimposed or juxtaposed, each of which contains some of the dimensions
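As a minimal sketch of the first and third options, the snippet below subsets a record and then embeds two of its dimensions in size and color instead of position. The record, the dimension names d1 to d4, and the helper names subset and embed are hypothetical and chosen only for illustration.

```python
# One made-up four-dimensional record; names and values are illustrative only.
record = {"d1": 2.0, "d2": 7.5, "d3": 0.4, "d4": 12.0}

def subset(rec, keep):
    """Dimension subsetting: keep only the dimensions the user selected."""
    return {name: rec[name] for name in keep}

def embed(rec, x, y, size=None, color=None):
    """Dimension embedding: two dimensions give the position of the mark,
    further dimensions drive other graphical attributes."""
    mark = {"x": rec[x], "y": rec[y]}
    if size is not None:
        mark["size"] = rec[size]
    if color is not None:
        mark["color"] = rec[color]
    return mark

two_d = subset(record, ["d1", "d2"])
mark = embed(record, x="d1", y="d2", size="d3", color="d4")
print(two_d)
print(mark)
```

A plotting library would then draw the mark at (x, y) and scale its size and color from the remaining two values.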
Scatterplots and Scatterplot Matrices (Continued)
A scatterplot matrix uses multiple displays.
It consists of a grid of scatterplots with N^2 cells, where N is the number of dimensions.
Thus, every pairwise plot is shown twice, once on each side of the diagonal, with the two axes swapped (commonly described as a 90-degree rotation).
This can be understood clearly from the scatterplot matrix on the next slide, which plots the well-known iris data set (iris_dataset). It has four variables: sepal_length, sepal_width, petal_length, and petal_width. The plot is used to distinguish three types of iris flowers: Setosa, Versicolor, and Virginica.
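The grid layout can be sketched by enumerating its cells for the four iris variables named in the text. The helper name scatterplot_matrix_cells is hypothetical; the sketch only counts cells and pairs and draws nothing.

```python
from itertools import product

# The four iris variables listed in the text.
dimensions = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

def scatterplot_matrix_cells(dims):
    """Return one (row, col, y_dimension, x_dimension) tuple per grid cell."""
    return [
        (r, c, dims[r], dims[c])
        for r, c in product(range(len(dims)), repeat=2)
    ]

cells = scatterplot_matrix_cells(dimensions)

# Off-diagonal cells come in mirrored pairs: (r, c) and (c, r) show the same
# two dimensions with the x and y axes exchanged.
distinct_pairs = {frozenset((y, x)) for _, _, y, x in cells if y != x}

print(len(cells))           # 16, i.e. N * N cells for N = 4
print(len(distinct_pairs))  # 6, i.e. N * (N - 1) / 2 distinct pairwise plots
```

The remaining four cells lie on the diagonal, where a dimension would be paired with itself.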
Study Scatterplots and Scatterplot Matrices Graphic
Do it
Force Based Method
The key goal when projecting high-dimensional points to a 2D or 3D display is to preserve the features and characteristics of the data throughout the projection.
This, however, is not always possible when the dimension of the data is very high.
Even so, force-based scaling can be used to reduce the dimensionality of the data.
Multidimensional scaling (MDS) is one method for doing so.
The stress, a measure of the difference between the pairwise distances in the original space and those in the reduced space, is also calculated at the end of this process.
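The stress calculation can be sketched as follows. The lecture does not say which stress formula is used, so this assumes one common normalized form: the square root of the summed squared differences between original and projected pairwise distances, divided by the summed squared original distances. The points and the function names are made up for illustration.

```python
import math
from itertools import combinations

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def stress(original, projected):
    """Normalized stress (assumed form): 0 means all pairwise distances
    are preserved exactly; larger values mean more distortion."""
    num = 0.0
    den = 0.0
    for i, j in combinations(range(len(original)), 2):
        d_high = euclidean(original[i], original[j])
        d_low = euclidean(projected[i], projected[j])
        num += (d_high - d_low) ** 2
        den += d_high ** 2
    return math.sqrt(num / den)

# Four made-up 3D points and a naive 2D projection that drops the z value.
points_3d = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
points_2d = [(x, y) for x, y, _ in points_3d]

print(stress(points_3d, points_3d))  # 0.0, nothing was distorted
print(stress(points_3d, points_2d))  # greater than 0, the z information was lost
```

An MDS procedure would move the 2D points iteratively to make this value as small as possible; the sketch only evaluates a given layout.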
Study Force Based Method Graphic
Do it Slide 8
Line Based Technique
In line-based methods, the points corresponding to a particular record or dimension are linked together with straight or curved lines.
These lines not only reinforce the relationships among the data values, but also convey perceivable features of the data via slopes, curvature, crossings, etc.
Popular line-based techniques for representing multivariate data are:
Line Graphs
Parallel Coordinates
Line Graphs
A line graph is a univariate visualization technique, but it can be extended to multivariate data by either superimposing or juxtaposing the visual representations of the individual variables.
For a modest number of dimensions, the line plots can be drawn on a common set of axes, with the dimensions differentiated by color, line style, width, or other graphical attributes.
As the number of dimensions increases, or when the dimensions overlap significantly, superimposing becomes more problematic.
Study Line Graphs Graphics
Slides 11 and 12
Study Parallel Coordinates Graphic
Slide 14
Parallel Coordinates
Used extensively for multivariate data analysis
The basic idea is that the axes, rather than being orthogonal, are parallel, with evenly spaced vertical or horizontal lines representing a particular ordering of the dimensions.
A data point is plotted as a polyline that crosses each axis at a position proportional to its value for that dimension.
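A minimal sketch of how a record becomes a polyline, assuming vertical axes spaced one unit apart and each axis scaled from the minimum to the maximum of its dimension. The records and the helper names are made up for illustration.

```python
def dimension_ranges(records):
    """Return the (min, max) of every dimension across all records."""
    return [(min(col), max(col)) for col in zip(*records)]

def polyline_vertices(record, ranges, axis_spacing=1.0):
    """Return one (axis_position, height) vertex per dimension.
    The height is the value rescaled to 0..1 along that axis."""
    vertices = []
    for k, (value, (lo, hi)) in enumerate(zip(record, ranges)):
        # A constant dimension has no spread, so place it mid-axis.
        t = (value - lo) / (hi - lo) if hi != lo else 0.5
        vertices.append((k * axis_spacing, t))
    return vertices

# Three made-up records with three dimensions each.
records = [
    (5.0, 3.0, 1.0),
    (7.0, 2.0, 5.0),
    (6.0, 4.0, 3.0),
]
ranges = dimension_ranges(records)
lines = [polyline_vertices(r, ranges) for r in records]

for line in lines:
    print(line)
# [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0)]
# [(0.0, 1.0), (1.0, 0.0), (2.0, 1.0)]
# [(0.0, 0.5), (1.0, 1.0), (2.0, 0.5)]
```

Connecting the vertices of each list in order gives the polyline for that record. Changing the order of the dimensions changes which crossings are visible.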
Region Based Techniques
In region-based techniques, filled polygons are used to convey values, based on shape, size, color, or other attributes.
Heat Map
Heatmaps are created by displaying the table of record values using color rather than text.
For this visualization technique, all data values are mapped to the same normalized color space, and each is rendered as a colored square or rectangle.
A well-chosen range of colors makes differences between values easier to see and so enhances the usefulness of this technique.
Rows and columns can be reordered to expose features of the data.
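The mapping and the reordering can be sketched as below. The white-to-red color ramp and the choice to sort rows by their mean are assumptions made for illustration; the lecture does not prescribe a color scale or a reordering rule. The table values are made up.

```python
def normalize_table(table):
    """Map every value to 0..1 using one global min and max,
    so all cells share the same normalized color space."""
    flat = [v for row in table for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    return [[(v - lo) / span if span else 0.0 for v in row] for row in table]

def to_rgb(t, cold=(255, 255, 255), hot=(255, 0, 0)):
    """Linear ramp from the cold color (t = 0) to the hot color (t = 1)."""
    return tuple(round(c + t * (h - c)) for c, h in zip(cold, hot))

def reorder_rows_by_mean(table):
    """One simple reordering: rows sorted by their average value."""
    return sorted(table, key=lambda row: sum(row) / len(row))

# Made-up table of record values: three records, three dimensions.
table = [
    [8.0, 6.0, 10.0],
    [0.0, 2.0, 1.0],
    [5.0, 5.0, 4.0],
]
ordered = reorder_rows_by_mean(table)
colors = [[to_rgb(t) for t in row] for row in normalize_table(ordered)]

print(ordered[0])    # [0.0, 2.0, 1.0], the row with the lowest mean comes first
print(colors[0][0])  # (255, 255, 255), the global minimum is white
print(colors[2][2])  # (255, 0, 0), the global maximum is red
```

Each RGB tuple would be drawn as one colored square or rectangle in the grid. After sorting, similar rows sit next to each other, which is the kind of feature that reordering is meant to expose.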