Lectures 1-4: Datasets, Variables, Distributions, Estimation, Quantitative Methods Flashcards
What is a variable?
A variable represents a characteristic of each case in a dataset that can take more than one value – e.g. a variable might record incomes or unemployment rates.
What is a nominal (a.k.a. categorical) variable?
A variable with distinct categories that does not tell you anything about the relationship between them, and cannot be ranked in terms of value or order – e.g. birthplace, religion.
Are nominal variables and categorical variables the same thing?
Yes.
What is an ordinal variable?
A variable with categories that can be ordered or ranked according to some criterion (but where you cannot specify the precise size of the interval between any two categories) – e.g. a ranking of low-, semi- or high-skilled workers.
What is an interval (a.k.a. ratio) variable?
A variable on a scale, with an exact distance between any pair of values. It may be either continuous (e.g. height, income) OR discontinuous/discrete (e.g. indivisible units such as numbers of factories or people).
Are interval variables and ratio variables different?
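No, not for these lectures, which treat them as the same thing. Strictly speaking, a ratio variable also has a true zero (e.g. income), whereas an interval variable need not (e.g. temperature in Celsius).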
What is a dummy variable?
A variable for a characteristic that cannot be measured on a scale but can still be used in analysis by assigning numeric codes to its two (or more) categories – e.g. where 0 = no and 1 = yes.
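A minimal Python sketch of dummy coding. The survey answers and the helper name `to_dummy` are hypothetical, chosen only for illustration.

```python
# Hypothetical answers to a yes/no survey question (invented data).
answers = ["yes", "no", "yes", "yes", "no"]

def to_dummy(values, coded_as_one="yes"):
    """Code each value as 1 if it equals `coded_as_one`, otherwise 0."""
    return [1 if value == coded_as_one else 0 for value in values]

dummy = to_dummy(answers)
print(dummy)  # [1, 0, 1, 1, 0]

# The mean of a 0/1 dummy equals the share of cases coded 1.
share_yes = sum(dummy) / len(dummy)
print(share_yes)  # 0.6
```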
What is an independent (a.k.a. explanatory) variable?
A variable used to explain (or predict) variation in your dependent variable.
What is a dependent (a.k.a. response) variable?
It represents a phenomenon that you want to understand through comparison to other variables (i.e. your independent variables).
What are univariate (a.k.a. descriptive) statistics?
They capture the distribution of an individual variable; univariate analysis is the simplest form of statistical analysis.
What types of methods would you use to investigate univariate statistics?
For qualitative variables: frequency, mode and (for ordinal variables only) median. For quantitative variables: mean, median, mode, standard deviation, etc.
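As an illustration of univariate statistics for a qualitative variable, the sketch below tabulates frequencies and picks out the mode. The religion data are invented.

```python
from collections import Counter

# Hypothetical nominal variable: religion recorded for seven respondents.
religion = ["Catholic", "None", "Protestant", "Catholic",
            "Muslim", "Catholic", "None"]

frequencies = Counter(religion)

# The mode is the category with the highest frequency.
mode_category, mode_count = frequencies.most_common(1)[0]

# Frequency table with the share of cases in each category.
for category, count in frequencies.most_common():
    print(f"{category}: {count} ({count / len(religion):.0%})")
```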
What are bivariate statistics?
They capture the relationship between 2 variables – e.g. racism and income.
What types of methods would you use to investigate bivariate statistics?
For qualitative variables: cross-tabulation, Cramér’s V, logistic/multinomial regression. For quantitative variables: correlation (if the independent variable is also quantitative), simple regression.
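For two quantitative variables, correlation can be sketched as follows. This assumes the Pearson coefficient is the one meant; the function name `pearson_r` and the data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    # Sum of the products of paired deviations from the two means.
    covariation = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    return covariation / sqrt(sum(dx ** 2 for dx in dev_x)
                              * sum(dy ** 2 for dy in dev_y))

# Hypothetical scores of five cases on two quantitative variables.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # 0.775
```

A value near +1 or -1 indicates a strong linear relationship; a value near 0 indicates little or none.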
What are multivariate statistics?
They capture or model the relationships among 3 or more variables.
What types of methods would you use to investigate multivariate statistics?
For qualitative variables: logistic/multinomial regression. For quantitative variables: multiple regression.
What is statistical inference?
The process of analysing data to infer the properties of an underlying distribution.
What is a dataset?
A series of units (individuals, households, ...) with one or more characteristics (variables). For each variable there is a sequence of observations, each with its own value.
What do cross-sectional datasets capture?
They capture the characteristics of comparable units at a single point in time. The observations differ in their characteristics, not in the type of unit. Example: one round of the ESS.
What do time-series datasets capture?
They capture repeated observations of the same unit at different time periods. Example: inflation in one country over several decades.
What do cross-sectional time-series (CSTS) datasets capture?
They capture fixed, non-sampled units at different time periods. Example: inflation in several countries over several decades.
What is panel data?
A dataset that captures the same sampled units at different time intervals. Example: electoral panels (the same individuals over several elections).
What is a rolling cross-section dataset?
A dataset that captures sampled units (called cohorts) at different time intervals. Example: several rounds of the ESS (different individuals, but with the same characteristics of age, nationality…).
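The dataset types above differ mainly in how units and time periods combine. The sketch below is a rough heuristic, not a formal definition: the function name `describe_layout` and the records are hypothetical, and the layout alone cannot show whether units were sampled (panel) or fixed (CSTS).

```python
def describe_layout(records):
    """Classify a list of (unit, period) observations by its layout."""
    periods = sorted({period for _, period in records})
    # The set of units observed in each period.
    units_by_period = [
        {unit for unit, p in records if p == period} for period in periods
    ]
    all_units = set().union(*units_by_period)
    if len(periods) == 1:
        return "cross-sectional"
    if len(all_units) == 1:
        return "time series"
    if all(units == all_units for units in units_by_period):
        return "same units over time (CSTS or panel)"
    return "different units over time (rolling cross-section)"

one_round = [("anna", 2018), ("ben", 2018)]
one_country = [("FR", 2000), ("FR", 2001)]
countries = [("FR", 2000), ("DE", 2000), ("FR", 2001), ("DE", 2001)]
rounds = [("anna", 2016), ("ben", 2016), ("carl", 2018), ("dora", 2018)]

for data in (one_round, one_country, countries, rounds):
    print(describe_layout(data))
```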
What is the mean?
It is the average of all values: their sum divided by the number of values.
What is the median?
It is the value in the middle when the values are sorted. There is the same number of observations above and below it.
What is the mode?
The value that occurs most often.
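The three measures of central tendency can be checked with Python's standard `statistics` module. The observations are invented.

```python
import statistics

values = [2, 3, 3, 5, 7, 10]  # hypothetical observations

mean = statistics.mean(values)      # (2 + 3 + 3 + 5 + 7 + 10) / 6 = 5
median = statistics.median(values)  # average of the two middle values, 3 and 5
mode = statistics.mode(values)      # 3 occurs twice, more than any other value

print(mean, median, mode)
```

With an even number of observations there is no single middle value, so the median is taken as the average of the two middle ones.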
What is a quartile?
One of the three values that divide the ordered data into four equal parts. Ex. The first quartile has ¼ of the values below it. The second quartile is the median. The 3rd quartile has ¾ of the values below it.
What is an interquartile range?
The distance between the 1st and 3rd quartiles (Q3 − Q1). It covers the middle half of the values.
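A short sketch of quartiles and the interquartile range, using `statistics.quantiles` from the standard library. The data are invented, and the choice of `method="inclusive"` is an assumption: several quartile conventions exist and they can give slightly different results.

```python
import statistics

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical observations

# n=4 asks for the three cut points that split the data into quarters.
# method="inclusive" treats the data as the whole population.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

# Interquartile range: distance between the first and third quartiles.
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 3.0 5.0 7.0 4.0
```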
What is the mean deviation?
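It measures how much the values typically differ from the mean. It is the average of the absolute differences between each value and the mean (the signed differences would cancel out to zero).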
What is the variance?
It is the mean of the squared deviations from the mean. Squaring gets rid of the negative signs that arise when calculating deviations. It also gives more weight to large deviations, so it better captures a wide spread of values. There is a disadvantage: it is expressed in “squared units”.
What is the standard deviation?
The square root of the variance. It is the typical deviation from the mean, expressed in the same units as the variable.
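The three measures of spread can be compared on one invented series. The sketch takes absolute values for the mean deviation (signed deviations always sum to zero) and assumes the population formulas, which divide by n; `statistics.variance` and `statistics.stdev` divide by n - 1 instead.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
mean = statistics.mean(values)     # 40 / 8 = 5

deviations = [v - mean for v in values]
signed_total = sum(deviations)     # always 0: positives and negatives cancel

# Mean deviation: average of the absolute deviations, 12 / 8 = 1.5.
mean_deviation = sum(abs(d) for d in deviations) / len(values)

# Variance: mean of the squared deviations, 32 / 8 = 4 (squared units).
variance = statistics.pvariance(values)

# Standard deviation: square root of the variance, back in original units.
std_dev = statistics.pstdev(values)

print(signed_total, mean_deviation, variance, std_dev)
```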
What is the coefficient of variation?
The standard deviation divided by the mean. Because it has no units, it allows you to compare the spread of series measured in different units.
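A sketch of the coefficient of variation, computed here as the population standard deviation divided by the mean (an assumption about which standard deviation is meant). The income figures and the helper name are hypothetical. The same series recorded in two different units gives the same coefficient.

```python
import statistics

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (mean must be positive)."""
    return statistics.pstdev(values) / statistics.mean(values)

# Hypothetical incomes, recorded in thousands of euros and in euros.
income_thousands = [2, 4, 4, 4, 5, 5, 7, 9]
income_euros = [v * 1000 for v in income_thousands]

cv_thousands = coefficient_of_variation(income_thousands)  # 2 / 5
cv_euros = coefficient_of_variation(income_euros)          # 2000 / 5000

print(cv_thousands, cv_euros)
```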
What is a normal distribution or normal curve?
It takes the form of a bell-shaped curve. The curve is symmetrical and unimodal, so the mean, the median and the mode are identical.
What is skewness?
It measures the extent to which a distribution is asymmetrical. It shows up in the relationship between the mean and the median: broadly, the greater the skewness, the greater the difference between the mean and the median.
When is a distribution skewed to the right (positive skewness)?
When the tail of the curve stretches to the right containing a small number of very large values.
When is a distribution skewed to the left (negative skewness)?
When the tail of the curve stretches to the left with a small number of very low values.
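The sign of the skewness can be checked with a small sketch. It assumes the moment coefficient of skewness (third central moment divided by the cubed standard deviation), which is one common definition among several; the data are invented.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness: third central moment / sd cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # population variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 4, 15]     # one very large value in the right tail
left_skewed = [-v for v in right_skewed]  # mirror image: tail on the left
symmetric = [1, 2, 3, 4, 5]

for data in (right_skewed, left_skewed, symmetric):
    print(skewness(data), statistics.mean(data), statistics.median(data))
```

In the right-skewed series the mean is pulled above the median by the large value; in the mirrored series it is pulled below.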
What is kurtosis?
It measures the weight of the tails of a distribution. It is close to 3 for approximately normal distributions.
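The value of 3 can be checked by simulation. The sketch assumes the moment coefficient of kurtosis (fourth central moment divided by the squared variance) and uses simulated samples with an arbitrary sample size and seed.

```python
import random

def kurtosis(values):
    """Moment coefficient of kurtosis: fourth central moment / variance squared."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2

rng = random.Random(0)  # fixed seed so the simulation is repeatable
normal_sample = [rng.gauss(0, 1) for _ in range(50_000)]
uniform_sample = [rng.random() for _ in range(50_000)]

normal_kurtosis = kurtosis(normal_sample)    # close to 3
uniform_kurtosis = kurtosis(uniform_sample)  # close to 1.8: thinner tails

print(normal_kurtosis, uniform_kurtosis)
```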
What is the empirical law of the normal distribution?
A constant proportion of all cases lies within a given distance from the mean, measured in standard deviations: about 68% within 1 standard deviation, about 95% within 2 and about 99.7% within 3.
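The proportions behind this law can be computed with `statistics.NormalDist` from the standard library. The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The same proportions hold for any normal distribution, whatever its mean and standard deviation, because the distance is expressed in standard deviations.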