Lectures 1-4: Datasets, Variables, Distributions, Estimation, Quantitative Methods Flashcards by Jason Rizzo

What is a variable?

A variable records a characteristic of each case in a dataset and can take more than one value – e.g. a variable might record incomes or unemployment rates.

What is a nominal (a.k.a. categorical) variable?

A variable whose distinct categories have no inherent order: the categories tell you nothing about the relationship between them and cannot be ranked – e.g. birthplace, religion.

Are nominal variables and categorical variables the same thing?

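Yes, the two terms are used interchangeably here. Some textbooks use “categorical” more broadly, to cover ordinal variables as well.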

What is an ordinal variable?

A variable with categories that can be ordered or ranked according to some criterion (but where you cannot specify the precise size of the interval between any two categories) – e.g. a ranking of low-, semi- or high-skilled workers.

What is an interval (a.k.a. ratio) variable?

A variable on a scale, with an exact distance between any pair of values. It may be either continuous (e.g. height, income) OR discontinuous/discrete (e.g. indivisible units such as numbers of factories or people).

Are interval variables and ratio variables different?

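No, they are treated as the same thing here: both are measured on a scale with exact distances between values. Strictly speaking, a ratio variable also has a true zero point (e.g. income), whereas an interval variable need not (e.g. temperature in degrees Celsius).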

What is a dummy variable?

A variable that captures a characteristic that cannot be measured numerically by assigning numeric codes to its categories, usually two – e.g. where 0 = no and 1 = yes.
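
As a minimal sketch of dummy coding, the snippet below turns yes/no answers into 0/1 values. The survey answers and variable names are made up for illustration.

```python
# Hypothetical survey answers; the data and names are illustrative only.
answers = ["yes", "no", "no", "yes", "yes"]

# Dummy coding: 1 = yes, 0 = no.
voted = [1 if answer == "yes" else 0 for answer in answers]

# The mean of a 0/1 dummy is the proportion of cases coded 1.
share_yes = sum(voted) / len(voted)

print(voted)      # [1, 0, 0, 1, 1]
print(share_yes)  # 0.6
```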

What is an independent (a.k.a. explanatory) variable?

A variable used to explain variation in your dependent variable.

What is a dependent (a.k.a. response) variable?

It represents a phenomenon that you want to understand through comparison to other variables (i.e. your independent variables).

What are univariate (a.k.a. descriptive) statistics?

They capture the distribution of an individual variable; univariate analysis is the simplest form of statistical analysis.

What types of methods would you use to investigate univariate statistics?

For qualitative variables: frequency, mode and (for ordinal variables only) median. For quantitative variables: mean, median, mode, standard deviation, etc.

What are bivariate statistics?

They capture the relationship between 2 variables – e.g. racism and income.

What types of methods would you use to investigate bivariate statistics?

For qualitative variables: cross-tabulation, Cramér’s V, logistic/multinomial regression. For quantitative variables: correlation (if the independent variable is quantitative), simple regression.
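
To make the cross-tabulation and Cramér’s V step concrete, here is a rough sketch that computes V from a table of counts using the standard formula V = sqrt(chi² / (n × (k − 1))), where k is the smaller of the number of rows and columns. The function name and the two tables are hypothetical, and the sketch assumes no row or column total is zero.

```python
import math

def cramers_v(table):
    """Cramér's V for a cross-tabulation given as a list of rows of counts.

    Assumes every row total and column total is greater than zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Chi-squared: compare observed counts with those expected under independence.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical 2x2 cross-tabulations (rows: group A/B, columns: yes/no).
perfect = [[10, 0], [0, 10]]    # group fully determines the answer
unrelated = [[5, 5], [5, 5]]    # group tells you nothing about the answer

print(cramers_v(perfect))    # 1.0
print(cramers_v(unrelated))  # 0.0
```

V runs from 0 (no association) to 1 (perfect association), which is why the two extreme tables give those values.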

What are multivariate statistics?

They capture or model the relationships among 3 or more variables.

What types of methods would you use to investigate multivariate statistics?

For qualitative variables: logistic/multinomial regression. For quantitative variables: multiple regression.

What is statistical inference?

The process of analysing data to deduce the properties of an underlying distribution.

What is a dataset?

A series of units (individuals, households, ...) with one or more characteristics (variables). For each variable there is a sequence of observations, each with its own value.

What do cross-sectional datasets capture?

They capture the characteristics of comparable units at a single point in time. The observations vary by characteristics, not by unit type. Example: one round of ESS.

What do time-series datasets capture?

Time series capture repeated observations at different time periods. For example: inflation in one country over several decades

What do cross-sectional time-series (CSTS) datasets capture?

They capture fixed and non-sampled units at different time periods. For example: inflation in several countries over several decades

What is panel data?

A dataset that captures the same sampled units at different time intervals. Example: electoral panels (the same individuals over several elections).

What is a rolling cross-section dataset?

A dataset that captures newly sampled units (called cohorts) at different time intervals. Example: several rounds of ESS (different individuals, but with the same characteristics of age, nationality…).
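
The dataset types above differ mainly in how many units and time periods they cover, and in whether the same units reappear. The sketch below is a crude illustration of that idea only: `describe_structure` is a hypothetical helper, the records are invented, and whether units were sampled (panel) or fixed (CSTS) cannot be read off the data, so those two are reported together.

```python
def describe_structure(records):
    """Classify a non-empty list of (unit_id, period) pairs. Illustrative only."""
    units_by_period = {}
    for unit, period in records:
        units_by_period.setdefault(period, set()).add(unit)

    all_units = set().union(*units_by_period.values())
    if len(units_by_period) == 1:
        return "cross-sectional"
    if len(all_units) == 1:
        return "time series"

    # Do the very same units appear in every period?
    same_units = all(units == all_units for units in units_by_period.values())
    if same_units:
        return "same units over time (CSTS or panel)"
    return "different units each period (rolling cross-section)"

# Hypothetical records: (unit, period).
one_round = [("anna", 2020), ("ben", 2020)]
one_country = [("UK", 2000), ("UK", 2001), ("UK", 2002)]
same_people = [("anna", 2016), ("ben", 2016), ("anna", 2020), ("ben", 2020)]
new_people = [("anna", 2016), ("ben", 2016), ("carl", 2020), ("dana", 2020)]

for records in (one_round, one_country, same_people, new_people):
    print(describe_structure(records))
```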

What is the mean?

It is the arithmetic average: the sum of all values divided by the number of values.

What is the median?

It is the value in the middle. There is the same number of observations above and below it

What is the mode?

The value that occurs most often.
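
A short sketch of the three measures of central tendency, using Python’s standard `statistics` module on made-up income figures. Note how the single large value pulls the mean above the median.

```python
import statistics

# Hypothetical incomes (in thousands); one very large value is included on purpose.
incomes = [18, 20, 20, 25, 30, 35, 90]

mean_income = statistics.mean(incomes)      # sum of values / number of values
median_income = statistics.median(incomes)  # middle value of the sorted list
mode_income = statistics.mode(incomes)      # most frequent value

print(mean_income)    # 34
print(median_income)  # 25
print(mode_income)    # 20
```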

What is a quartile?

One of the three cut points that divide the ordered values into four equal parts. A quarter (¼) of the values fall below the first quartile, half fall below the second quartile (which is the median), and three quarters (¾) fall below the third quartile.

What is an interquartile range?

The distance between the 1st and 3rd quartiles; it spans the middle 50% of the values.
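
As an illustration of quartiles and the interquartile range, the sketch below uses `statistics.quantiles` from the standard library on invented values. Quartile conventions differ slightly between textbooks and software; this uses the function’s default ("exclusive") method, which is an assumption rather than the course’s stated method.

```python
import statistics

# Hypothetical values, deliberately unsorted; quantiles() sorts them internally.
values = [21, 2, 15, 4, 13, 5, 12, 7, 10, 8, 18]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

print(q1, q2, q3)                       # 5.0 10.0 15.0
print(iqr)                              # 10.0
print(q2 == statistics.median(values))  # True: the 2nd quartile is the median
```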

What is the mean deviation?

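It measures how much the values differ from the mean on average. It is the average of the absolute differences between each value and the mean (the signed differences would always average to zero).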

What is the variance?

It is the mean of the squared deviations from the mean. Squaring removes the negative signs of the deviations and gives extra weight to large deviations, so the variance captures a wide spread of values better. Its disadvantage is that it is expressed in “square units”.

What is the standard deviation?

The typical deviation from the mean. It is the square root of the variance, so it is expressed in the same units as the data.

What is the coefficient of variation?

It is the standard deviation divided by the mean. Because it has no units, it allows you to compare the spread of series measured in different units.
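
The four measures of spread above can be worked through by hand on a small made-up series. This sketch divides by n (the "mean of the squared deviations", as in the variance card); many textbooks and software packages divide by n − 1 for samples, so treat the divisor as an assumption.

```python
import math

# Hypothetical values chosen so that the results are round numbers.
values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)

mean = sum(values) / n
deviations = [v - mean for v in values]

# Signed deviations cancel out, so the mean deviation uses absolute values.
mean_deviation = sum(abs(d) for d in deviations) / n

# Variance: mean of the squared deviations (in squared units).
variance = sum(d ** 2 for d in deviations) / n

# Standard deviation: square root of the variance (back in original units).
std_dev = math.sqrt(variance)

# Coefficient of variation: unit-free ratio of standard deviation to mean.
coef_variation = std_dev / mean

print(mean)             # 5.0
print(sum(deviations))  # 0.0
print(mean_deviation)   # 1.5
print(variance)         # 4.0
print(std_dev)          # 2.0
print(coef_variation)   # 0.4
```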

What is a normal distribution or normal curve?

It takes the form of a bell-shaped curve. The curve is symmetrical and unimodal, so the mean, the median and the mode are identical.

What is the skewness?

It measures the extent to which a distribution is asymmetrical. It shows up in the relationship between the mean and the median: as a rule, the greater the skewness, the greater the difference between the mean and the median.

When is a distribution skewed to the right (positive skewness)?

When the tail of the curve stretches to the right containing a small number of very large values.

When is a distribution skewed to the left (negative skewness)?

When the tail of the curve stretches to the left with a small number of very low values.
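
A rough sketch of positive and negative skewness on invented data. It uses the moment coefficient of skewness (third central moment divided by the 1.5th power of the second), which is one common definition and not necessarily the one used in the lectures.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5, dividing by n."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical data: mostly small values plus one very large one (right tail).
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 20]
# Mirror image: one very low value (left tail).
left_skewed = [-v for v in right_skewed]

for data in (right_skewed, left_skewed):
    print(
        "skewness sign:", "+" if skewness(data) > 0 else "-",
        "| mean above median:", statistics.mean(data) > statistics.median(data),
    )
```

In the right-skewed series the mean sits above the median, and in the left-skewed series it sits below it.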

What is kurtosis?

It measures the heaviness of the distribution’s tails. It is close to 3 for approximately normal distributions.
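
To see the value of 3 appear, this sketch estimates kurtosis (fourth central moment divided by the squared second moment) from simulated draws. The seed, sample size and the uniform comparison are arbitrary choices for illustration; a uniform distribution has lighter tails and a theoretical kurtosis of 1.8.

```python
import random

def kurtosis(values):
    """Moment coefficient of kurtosis: m4 / m2**2, dividing by n."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2

rng = random.Random(42)  # fixed seed so the run is repeatable
normal_draws = [rng.gauss(0, 1) for _ in range(100_000)]
uniform_draws = [rng.uniform(0, 1) for _ in range(100_000)]

normal_kurtosis = kurtosis(normal_draws)    # should be close to 3
uniform_kurtosis = kurtosis(uniform_draws)  # should be close to 1.8

print(round(normal_kurtosis, 2))
print(round(uniform_kurtosis, 2))
```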

What is the empirical law of the normal distribution?

A constant proportion of all cases lies within a given distance from the mean, measured in standard deviations: about 68% within 1 standard deviation, about 95% within 2, and about 99.7% within 3.
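
The proportions behind this law can be checked with the standard library’s `statistics.NormalDist`. The helper name `share_within` is made up for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Because the distance is measured in standard deviations, the same proportions hold for any normal distribution, whatever its mean and spread.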

(37 cards)