1. Data Flashcards

Question

What is the process of dimensionality reduction?

Answer 1

Automatically detecting a relationship between multiple attributes and compressing them into fewer attributes

Answer 2

Selection involves choosing which attributes to keep and which to discard. Reduction involves combining attributes so that we deal with fewer of them, while the information in each original attribute is still present (in a compressed form)

Answer 3

The conversion of numerical (continuous) data into categorical (discrete) data. Continuous data is binned to become discrete data
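A minimal sketch of binning, assuming hand-picked bin edges. The function name `discretize`, the age data, and the group labels are all made up for illustration; they are not from the cards.

```python
from bisect import bisect_right


def discretize(values, edges, labels):
    """Map each continuous value to the label of the bin it falls in.

    Bin i covers edges[i] <= value < edges[i + 1]; the last bin also
    includes its upper edge.
    """
    if len(labels) != len(edges) - 1:
        raise ValueError("need exactly one label per bin")
    binned = []
    for v in values:
        if not edges[0] <= v <= edges[-1]:
            raise ValueError(f"{v} is outside the bin range")
        # bisect_right counts how many edges are <= v.
        idx = min(bisect_right(edges, v) - 1, len(labels) - 1)
        binned.append(labels[idx])
    return binned


# Hypothetical continuous data (ages) turned into three categories.
ages = [3, 17, 18, 42, 65, 90]
age_groups = discretize(
    ages, edges=[0, 18, 65, 120], labels=["child", "adult", "senior"]
)
```

Each age is replaced by the category of the bin it lands in, so the continuous column becomes a discrete one.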

Answer 4

Yes, because the only thing ordinal data is lacking in order to be interval data is information on the difference between the units the data is expressed in. With that info comes the ability to calculate the mean and do operations like + and -

Answer 5

A characteristic or descriptor of data

Answer 6

Height, weight, age, eye color

Answer 7

The same thing as discretization. The process of turning continuous data into discrete categories (bins)

Answer 8

How frequently a discrete datum (or bin) appears in a data set

Answer 9

Frequency measures how many times a particular datum appears in a data set, whereas the mode simply identifies the datum with the greatest frequency in that set
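The difference between frequency and mode can be sketched with the standard library's `collections.Counter`. The eye-color sample is invented for illustration.

```python
from collections import Counter

# Hypothetical discrete data set.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]

# Frequency: how many times each distinct value appears.
frequency = Counter(eye_colors)

# Mode: the single value with the greatest frequency.
mode_value, mode_count = frequency.most_common(1)[0]
```

Here `frequency` holds a count for every value, while `mode_value` names only the most common one.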

Answer 10

The average of the squares of the deviations of the data values from the mean.

Answer 11

S^2 = [(x1 - mean)^2 + ... + (xn - mean)^2] / (n - 1)

Answer 12

Take the square root of the variance
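The variance formula and the square-root step in the two answers above can be sketched as follows. The function name `sample_variance` and the data values are assumptions for illustration; the (n - 1) divisor follows the formula on the card.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


# Hypothetical data: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
variance = sample_variance(data)           # 32 / 7
standard_deviation = math.sqrt(variance)   # square root of the variance
```

Python's `statistics.variance` and `statistics.stdev` use the same (n - 1) divisor, so they should agree with this sketch.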

Answer 13

A measure of how changes to one dimension are associated with changes in a second dimension. Covariance measures the degree to which two variables are linearly associated.
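A small sketch of covariance between two dimensions, assuming the same (n - 1) divisor used for the sample variance. The function name and the example data are hypothetical.

```python
def sample_covariance(xs, ys):
    """Average product of paired deviations from each mean, using n - 1."""
    if len(xs) != len(ys):
        raise ValueError("both dimensions need the same number of values")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data: one dimension rises with hours, the other falls.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

cov_rising = sample_covariance(hours, rising)
cov_falling = sample_covariance(hours, falling)
```

A positive result means the two dimensions tend to increase together; a negative result means one tends to decrease as the other increases.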

Answer 14

Converting data into a visual format (because human brains are pretty good at pattern recognition). Examples:
- Histogram
- Two-dimensional histogram
- Box plots
- Scatter plots
- Correlation matrix

Answer 15

It aids in visualization, reduces data noise, makes it easier to do analysis, and still represents the data well with only minimal loss of information

Answer 16

They are combinations of observed dimensions (post-reduction). Observed data are then described in terms of these factors instead of the original dimensions.

Answer 17

Principal Component Analysis

Answer 18

It reduces the high dimensionality of big data sets to fewer dimensions that are easier for humans to comprehend and visualize. The variation (signal) in a data set can be seen as representing the information that we would like to keep. PCA reduces the dimensionality of data by creating new, artificial variables called principal components (linear combinations of the original variables) while still keeping as much variation as possible.

Answer 19

It means that no information about dimensions is used in the dimension reduction. PCA shows a visual representation of the dominant patterns in a data set

Answer 20

The two dimensions must be highly correlated or dependent so that they essentially tell us about the same underlying variance in the data. This way when they are compressed, little info on the original data is lost.
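One way to check whether two dimensions are highly correlated is the Pearson correlation coefficient, which rescales covariance to the range -1 to 1. The sketch below is only an illustration: the function name and the data (the same height in two units, plus an unrelated column) are made up.

```python
import math


def pearson_correlation(xs, ys):
    """Covariance of xs and ys divided by the product of their spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long dimensions with 2+ values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("a constant dimension has no defined correlation")
    return sum(a * b for a, b in zip(dx, dy)) / denominator


# Hypothetical data: the same height in two units, plus an unrelated column.
height_cm = [150, 160, 170, 180, 190]
height_in = [59.1, 63.0, 66.9, 70.9, 74.8]
unrelated = [5, 1, 4, 1, 5]

r_redundant = pearson_correlation(height_cm, height_in)  # very close to 1
r_unrelated = pearson_correlation(height_cm, unrelated)  # close to 0
```

The two height columns carry almost the same information, so compressing them into one dimension loses very little; the unrelated column would not compress well with either.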

Answer 21

Combine related dimensions and focus on uncorrelated/independent dimensions (especially those along which the data have high variance)

Answer 22

We want a smaller set of dimensions that explain most of the variance in the original data, in more compact and insightful form

Answer 23

90 degrees, or uncorrelated. Changes to one will not affect the other

Answer 24

1. Let Xbar be the mean vector.
2. Adjust the original data by the mean using x' = x - Xbar.
3. Compute the covariance matrix S of the adjusted data: S = (1/n) X X^T, where X holds the adjusted vectors x' as columns.
4. Find the eigenvectors and eigenvalues of S: S a = lambda * a.
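The four steps above can be sketched for two-dimensional data in plain Python. This is an illustration under stated assumptions, not a general implementation: the function name `pca_2d` and the data are hypothetical, and the closed-form eigenvalue formula only works for a symmetric 2x2 matrix. For more dimensions a linear-algebra library would normally be used.

```python
import math


def pca_2d(points):
    """Return (mean, eigenvalues, components) for a list of (x, y) points.

    Eigenvalues are sorted from largest to smallest; components are the
    matching unit-length eigenvectors of the covariance matrix.
    """
    n = len(points)
    if n == 0:
        raise ValueError("need at least one point")
    # Step 1: mean vector.
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Step 2: adjust the data by the mean.
    centred = [(x - mean_x, y - mean_y) for x, y in points]
    # Step 3: covariance matrix S = (1/n) X X^T = [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centred) / n
    syy = sum(y * y for _, y in centred) / n
    sxy = sum(x * y for x, y in centred) / n
    # Step 4: eigenvalues of a symmetric 2x2 matrix in closed form.
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    eigenvalues = [half_trace + radius, half_trace - radius]
    if abs(sxy) > 1e-12:
        # (sxy, lambda - sxx) satisfies S a = lambda * a.
        raw = [(sxy, lam - sxx) for lam in eigenvalues]
    elif sxx >= syy:
        # S is already diagonal: the coordinate axes are the eigenvectors.
        raw = [(1.0, 0.0), (0.0, 1.0)]
    else:
        raw = [(0.0, 1.0), (1.0, 0.0)]
    components = [
        (vx / math.hypot(vx, vy), vy / math.hypot(vx, vy)) for vx, vy in raw
    ]
    return (mean_x, mean_y), eigenvalues, components


# Hypothetical data lying close to the line y = x.
points = [(2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1), (6.0, 5.9)]
mean, eigenvalues, components = pca_2d(points)

# Share of the total variance captured by the first component.
explained = eigenvalues[0] / sum(eigenvalues)
```

Each eigenvalue equals the variance of the data projected onto its eigenvector, so `explained` shows how much of the total variance a single component retains. For this nearly linear data the first component keeps almost all of it.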

Answer 25

The variance on a component

(49 cards)