Lecture 1 Flashcards
What is the mode?
For categorical variables, the mode is the most frequent level. (It is sometimes also used for numerical variables when there are only a few distinct values.)
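A minimal Python sketch of the mode, assuming the observations come as a plain list. The function name `mode_of` and the example data are made up for illustration. Ties are resolved by `Counter.most_common`, which keeps equally frequent levels in the order they were first encountered.

```python
from collections import Counter

def mode_of(values):
    """Return the most frequent level (the first-encountered level wins ties)."""
    counts = Counter(values)              # level -> frequency
    level, _freq = counts.most_common(1)[0]
    return level

colors = ["red", "blue", "red", "green", "red", "blue"]
print(mode_of(colors))  # red
```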
What is the variation ratio?
Only for categorical variables: the fraction of cases that differ from the mode, i.e. 1 − (frequency of the mode / number of cases).
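The variation ratio follows directly from the frequency of the mode. This is a small sketch with a hypothetical function name and made-up survey answers.

```python
from collections import Counter

def variation_ratio(values):
    """Fraction of cases NOT in the modal level: 1 - f_mode / N."""
    counts = Counter(values)
    f_mode = counts.most_common(1)[0][1]  # frequency of the mode
    return 1 - f_mode / len(values)

answers = ["yes", "yes", "no", "yes", "maybe"]
print(variation_ratio(answers))  # 0.4  (2 of the 5 cases differ from "yes")
```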
What is entropy?
For categorical variables: the amount of disorder/uncertainty in the information. If the data are spread equally across the levels, uncertainty is high (high H); if the results are strongly biased towards one level, uncertainty is low (low H).
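A sketch of Shannon entropy, H = −Σ pᵢ log(pᵢ), computed from the relative frequencies of the levels. It assumes a base-2 logarithm (entropy in bits). The lecture may use a different base, which only rescales H and does not change which distribution counts as more uncertain. The function name is hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy in bits: H = -sum(p_i * log2(p_i)) over the observed levels."""
    n = len(values)
    counts = Counter(values)
    # Levels that never occur have p_i = 0 and contribute nothing, so only observed levels are summed.
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

print(round(entropy(["a", "b", "c", "d"]), 3))  # 2.0   (equally spread: high H)
print(round(entropy(["a", "a", "a", "b"]), 3))  # 0.811 (biased towards "a": lower H)
```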
What does a contingency table show?
The levels of one or two categorical variables with their frequencies displayed numerically, either as absolute counts or as proportions (relative shares). With two categorical variables the table becomes two-dimensional (matrix style).
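A minimal two-way contingency table built with the standard library only, returned as a nested dict `{row level: {column level: value}}`. The function name, the `proportions` flag and the smoker/sex data are all made up for illustration. In practice a dataframe library would usually be used instead.

```python
from collections import Counter

def contingency_table(row_var, col_var, proportions=False):
    """2D table of two categorical variables: {row_level: {col_level: value}}.

    Values are counts, or shares of the grand total if proportions=True.
    """
    counts = Counter(zip(row_var, col_var))   # (row level, col level) -> count; missing pairs give 0
    n = len(row_var)
    rows, cols = sorted(set(row_var)), sorted(set(col_var))
    return {
        r: {c: counts[(r, c)] / n if proportions else counts[(r, c)] for c in cols}
        for r in rows
    }

smoker = ["yes", "no", "no", "yes", "no", "no"]   # made-up example data
sex    = ["f",   "m",  "f",  "m",   "f",  "f"]
for level, row in contingency_table(smoker, sex).items():
    print(level, row)
# no {'f': 3, 'm': 1}
# yes {'f': 1, 'm': 1}
```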
What is the mean?
Only for numerical variables: the mean is the sum of all values divided by the number of observations (also called the arithmetic average).
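The definition of the mean translates directly into code. This sketch uses a hypothetical function name and made-up values. Python's `statistics.mean` gives the same result.

```python
def mean(values):
    """Arithmetic mean: sum of all values divided by the number of observations."""
    return sum(values) / len(values)

print(mean([2, 4, 4, 5, 7]))  # 4.4  (22 / 5)
```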
What is the median?
Only for numerical variables: it is the value that separates the top 50% of the data from the bottom 50%.
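With an even number of observations no single data point sits exactly in the middle. A common convention, assumed in this sketch, is to average the two middle values, which is also what Python's `statistics.median` does. The function name is hypothetical.

```python
def median(values):
    """Value separating the lower and upper 50% of the sorted data."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                   # odd n: the single middle value
    return (s[mid - 1] + s[mid]) / 2    # even n: average of the two middle values

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```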
What is meant by the dispersion of data?
Dispersion refers to the spread of the data. Common measures are the range, variance, standard deviation and quantiles.
What is the range of data?
The (absolute) difference between the minimum and maximum of numerical data.
What is the downside of using the range to describe data?
Because it only reflects the difference between the minimum and the maximum, it says nothing about how the data are spread between those two values, and it is heavily influenced by outliers.
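A tiny illustration of the range and of its sensitivity to outliers. Adding one extreme value to made-up scores multiplies the range, although the other six values are unchanged. The function name is hypothetical.

```python
def data_range(values):
    """Absolute difference between the maximum and the minimum."""
    return max(values) - min(values)

scores = [10, 11, 12, 12, 13, 14]      # made-up example data
with_outlier = scores + [60]           # same data plus one outlier
print(data_range(scores))        # 4
print(data_range(with_outlier))  # 50
```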
What is variance?
Only for numerical variables: it is the expected squared deviation from the mean.
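"Expected squared deviation" is the population definition, which divides by n. When estimating the variance from a sample, one usually divides by n − 1 instead. This sketch supports both through a hypothetical `sample` flag and uses made-up data.

```python
def variance(values, sample=True):
    """Mean squared deviation from the mean.

    sample=True divides by n - 1 (unbiased sample estimate);
    sample=False divides by n (population variance).
    """
    n = len(values)
    m = sum(values) / n
    squared_devs = sum((x - m) ** 2 for x in values)
    return squared_devs / (n - 1 if sample else n)

data = [2, 4, 4, 4, 5, 5, 7, 9]        # mean 5, squared deviations sum to 32
print(variance(data, sample=False))    # 4.0  (32 / 8)
```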
What is the standard deviation?
Only for numerical variables: it is the square root of the variance. It is easier to interpret because it is expressed in the same units as the variable.
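The point about units can be made concrete. If waiting times are measured in minutes, the variance is in minutes², while the standard deviation is back in minutes. This is a sketch with a hypothetical function name and made-up data, using the same `sample` convention as for the variance (n − 1 vs n).

```python
import math

def std_dev(values, sample=True):
    """Square root of the variance, so it has the same units as the data."""
    n = len(values)
    m = sum(values) / n
    var = sum((x - m) ** 2 for x in values) / (n - 1 if sample else n)
    return math.sqrt(var)

waiting_min = [2, 4, 4, 4, 5, 5, 7, 9]        # made-up waiting times in minutes
print(std_dev(waiting_min, sample=False))    # 2.0 minutes (the variance is 4.0 minutes²)
```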
What are quantiles?
Only for numerical variables: cut points that divide the range of the distribution into intervals of equal probability (e.g. 16% of the observations lie below the 0.16 quantile, i.e. the 16th percentile).
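There are several conventions for computing sample quantiles. This sketch assumes linear interpolation between the sorted observations, which is the default in R (type 7) and NumPy and matches `statistics.quantiles(..., method="inclusive")`. The lecture may use another rule, which gives slightly different values for small samples. The function name is hypothetical.

```python
import math

def quantile(values, p):
    """p-quantile (0 <= p <= 1) by linear interpolation between order statistics."""
    s = sorted(values)
    h = (len(s) - 1) * p          # fractional position in the sorted data
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)  # stay inside the list when p == 1
    return s[lo] + (h - lo) * (s[hi] - s[lo])

data = list(range(1, 101))             # 1, 2, ..., 100
print(round(quantile(data, 0.16), 2))  # 16.84: about 16% of the values lie below it
```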
What is Inter-Quartile-Range (IQR)?
For numerical variables: the quartiles are the quantiles that split the data into 4 equal parts. The IQR is the range between the 1st and 3rd quartiles (Q3 − Q1), so it covers the middle 50% of the data points.
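Unlike the range, the IQR ignores the most extreme values. This sketch uses the standard-library `statistics.quantiles` with `method="inclusive"`, which is an assumed quartile convention (others exist). The function name and data are made up.

```python
import statistics

def iqr(values):
    """Inter-quartile range Q3 - Q1: the spread of the middle 50% of the data."""
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7, 8, 100]   # one extreme outlier
print(iqr(data))  # 4.0 (Q1 = 3, Q3 = 7), while the range would be 99
```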
What is the influence of the bandwidth in density plots?
It controls how much the density plot is smoothed: a higher bandwidth yields a smoother curve, and a lower bandwidth a more jagged curve that follows individual data points more closely.
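A density plot is typically a kernel density estimate: a small bump of width h (the bandwidth) is placed on every observation and the bumps are averaged. This sketch assumes a Gaussian kernel, a common default. The function name and the two-cluster data are made up. With a small h the gap between the clusters stays visible; with a large h it is smoothed away.

```python
import math

def gaussian_kde(data, bandwidth):
    """Density estimate f(x) = 1/(n*h) * sum K((x - x_i)/h) with a standard normal kernel K."""
    n = len(data)
    norm = 1 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density

data = [1.0, 1.2, 1.9, 4.0, 4.1, 4.3]   # two clusters of made-up values
for h in (0.2, 2.0):
    f = gaussian_kde(data, h)
    # density inside the first cluster vs. in the gap between the clusters
    print(h, round(f(1.2), 3), round(f(3.0), 3))
```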
What do the conditional proportions show in a contingency table?
Instead of relating the counts of the two categorical variables to the grand total, we make them relative to the totals of one of the two variables (the marginal row or column totals). Hence each row (or each column) of the table sums to 100%, instead of the whole table summing to 100%.
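A sketch of conditional proportions given the row variable: every cell count is divided by its row total, so each row sums to 1 (100%). Conditioning on the column variable works the same way with column totals. The function name and the smoker/sex data are made up for illustration.

```python
from collections import Counter

def row_proportions(row_var, col_var):
    """Conditional proportions: each cell divided by its ROW total, so every row sums to 1."""
    counts = Counter(zip(row_var, col_var))   # (row level, col level) -> count
    row_totals = Counter(row_var)             # marginal totals of the row variable
    cols = sorted(set(col_var))
    return {
        r: {c: counts[(r, c)] / row_totals[r] for c in cols}
        for r in sorted(row_totals)
    }

smoker = ["yes", "no", "no", "yes", "no", "no"]   # made-up example data
sex    = ["f",   "m",  "f",  "m",   "f",  "f"]
for level, shares in row_proportions(smoker, sex).items():
    print(level, shares)
# no {'f': 0.75, 'm': 0.25}
# yes {'f': 0.5, 'm': 0.5}
```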