EDA Flashcards

Question

Interquartile range

Answer 1

Difference between the 75th percentile and the 25th percentile (IQR)
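A minimal sketch of this definition, using only the Python standard library. The data is made up, and the choice of `method="inclusive"` (linear interpolation with the minimum as the 0th percentile and the maximum as the 100th) is an assumption; other percentile conventions give slightly different values.

```python
import statistics

def interquartile_range(values):
    """Return the 75th percentile minus the 25th percentile."""
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    # The function sorts the data itself, so input order does not matter.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up data
iqr = interquartile_range(sample)      # Q1 = 3, Q3 = 7, so the IQR is 4
```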

Answer 2

Use n-1 in the denominator instead of n. If you use n, you will underestimate the true value of the variance and the standard deviation in the population (a biased estimate). When you divide by n-1, the variance becomes an unbiased estimate. Degrees of freedom take into account the number of constraints in computing an estimate. Here there is one constraint: the standard deviation depends on calculating the sample mean.
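To make the two denominators concrete, here is a small sketch with made-up data. The function name and the `ddof` parameter (the number subtracted from n) are illustrative choices, not something the card specifies.

```python
def variance(values, ddof):
    """Sum of squared deviations from the mean, divided by n - ddof."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - ddof)

sample = [2, 4, 4, 4, 5, 5, 7, 9]           # made-up data, mean is 5
biased = variance(sample, ddof=0)            # divides by n = 8
unbiased = variance(sample, ddof=1)          # divides by n - 1 = 7
```

Dividing by the smaller number n-1 always gives the larger result, which is what offsets the underestimate described above.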

Answer 3

Visualizes the distribution of the data. The top and bottom of the box are the 75th and 25th percentiles, respectively. The median is the horizontal line in the box. Whiskers extend from the top and bottom of the box to indicate the range of the bulk of the data.
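The numbers behind a boxplot can be computed without drawing anything. The sketch below is one possible version: the card does not say how far the whiskers reach, so the 1.5 × IQR rule used here is an assumption (a common convention), and the function name and data are hypothetical.

```python
import statistics

def box_stats(values, whisker_factor=1.5):
    """Compute box edges, median, whisker ends, and points beyond the whiskers."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: whiskers stop at the most extreme data points
    # that lie within whisker_factor * IQR of the box.
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])  # made-up data with one extreme value
```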

Answer 4

Tally of the count of numeric data values that fall into a set of intervals
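As a rough illustration, the tally can be built by hand with equal-width intervals. The function name, the number of bins, and the data are all hypothetical, and the sketch assumes the values are not all identical (otherwise the bin width would be zero).

```python
def frequency_table(values, n_bins):
    """Return (bin_start, bin_end, count) for n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in values:
        # The maximum value would fall just past the last bin, so clamp it in.
        index = min(int((x - lo) / width), n_bins - 1)
        counts[index] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(n_bins)]

table = frequency_table([1, 2, 2, 3, 5, 7, 8, 9, 10], n_bins=3)
for start, end, count in table:
    print(f"{start:5.1f} to {end:5.1f}: {count}")
```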

Answer 5

Plot of the frequency table with the bins on the x-axis and the count on the y-axis

Answer 6

Smooth version of the histogram, often based on a kernel density estimate

Answer 7

Both frequency tables and percentiles summarize the data by creating bins. In general, quartiles and deciles will have the same count in each bin (equal-count bins), but the bin sizes will differ. If the bins are too small, the result is too granular and the ability to see the bigger picture is lost.
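A short sketch of equal-count bins, using quartile cut points on made-up, right-skewed data. The `inclusive` quantile method is an assumption, and with tied values the counts would not necessarily come out exactly equal.

```python
import bisect
import statistics

skewed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 40]  # made-up, right-skewed data

# Three quartile cut points split the data into four bins.
cuts = statistics.quantiles(skewed, n=4, method="inclusive")

counts = [0] * 4
for x in skewed:
    # bisect_left gives the index of the bin that x falls into.
    counts[bisect.bisect_left(cuts, x)] += 1

# Bin widths, measured from the minimum to the maximum of the data.
edges = [min(skewed)] + cuts + [max(skewed)]
widths = [high - low for low, high in zip(edges, edges[1:])]
```

Here every bin holds three values, but the last bin is far wider than the others because it has to reach out to the extreme values.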

Answer 8

1) location 2) variability 3) skewness 4) kurtosis

Answer 9

Refers to whether the data is skewed to larger or smaller values

Answer 10

Propensity of the data to have extreme values

Answer 11

Smoothed histogram. A density plot corresponds to plotting the histogram as a proportion rather than counts.

Answer 12

The most commonly occurring category or value in a data set

Answer 13

When the categories can be associated with a numeric value, this gives an average value based on a category’s probability of occurrence: 1) multiply each outcome by its probability of occurring; 2) sum these values. It reflects future expectations and probability weights.
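The two steps translate directly into code. The outcomes and probabilities below are hypothetical, and the check that the probabilities sum to 1 is an added safeguard rather than part of the card.

```python
def expected_value(outcomes):
    """outcomes is a list of (numeric value, probability) pairs."""
    total_probability = sum(p for _, p in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Step 1: multiply each outcome by its probability. Step 2: sum the results.
    return sum(value * p for value, p in outcomes)

# Hypothetical example: revenue per customer for three possible outcomes.
plans = [(300, 0.05), (50, 0.15), (0, 0.80)]
ev = expected_value(plans)  # 300*0.05 + 50*0.15 + 0*0.80, about 22.5
```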

Answer 14

Frequency or proportion for each category plotted as bars

Answer 15

Frequency or proportion for each category plotted as wedges in a pie

Answer 16

A metric that measures the extent to which numeric variables are associated with one another (ranges from -1 to +1). Multiply the deviations from the mean for variable 1 by those for variable 2, sum the products, and divide by n-1 times the product of the standard deviations.
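A minimal sketch of this calculation, assuming the sample standard deviation (n-1 denominator) is used, which is why the summed products are divided by n-1 as well. The function name and data are hypothetical.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    n = len(xs)
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    # Sum of (deviation of x) * (deviation of y) over all records.
    products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return products / ((n - 1) * statistics.stdev(xs) * statistics.stdev(ys))

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # perfectly increasing: close to +1
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # perfectly decreasing: close to -1
```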

Answer 17

A table where the variables are shown on both rows and columns, and the cell values are the correlations between the variables

Answer 18

A plot in which the x-axis is the values of one variable, and the y-axis the value of another

Answer 19

A tally of counts between 2 or more categorical variables
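For two categorical variables, the tally can be sketched with `collections.Counter`. The records, category names, and nested-dictionary layout are hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical records: (grade, status) for six items.
records = [
    ("A", "paid"), ("A", "late"), ("B", "paid"),
    ("A", "paid"), ("B", "late"), ("B", "paid"),
]

# Count how often each (grade, status) combination occurs.
pair_counts = Counter(records)

rows = sorted({grade for grade, _ in records})
cols = sorted({status for _, status in records})

# Rows are grades, columns are statuses; a missing combination counts as 0.
table = {row: {col: pair_counts[(row, col)] for col in cols} for row in rows}
```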

Answer 20

A plot of two numeric variables with the records binned into hexagons

Answer 21

A plot showing the density of 2 numeric variables like a topographical map

Answer 22

Similar to a boxplot but showing the density estimate. Plots a numeric variable against a categorical variable.

Answer 23

Visually compare the distributions of a numeric variable grouped according to a categorical variable

(47 cards)