D1. Data Basics Flashcards
Properties of categorical data
Qualitative: often represented using binary indicators (e.g. 1=employed, 0=unemployed).
Ordinal: order is meaningful but differences/ratios are not (e.g. 0=poor, 1=fair, 2=good).
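A minimal sketch of both encodings in Python. The sample labels and the variable names (`status`, `levels`, `ratings`) are made up for illustration.

```python
# Binary indicator for a qualitative variable (hypothetical sample).
status = ["employed", "unemployed", "employed"]
employed = [1 if s == "employed" else 0 for s in status]

# Integer codes for an ordinal variable: only the order carries meaning,
# so "good" (2) is better than "fair" (1), but not "twice as good".
levels = {"poor": 0, "fair": 1, "good": 2}
ratings = ["good", "poor", "fair"]
codes = [levels[r] for r in ratings]

print(employed)  # [1, 0, 1]
print(codes)     # [2, 0, 1]
```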
Properties of numerical data
Value is cardinally meaningful.
Discrete: number of individuals in the household.
Continuous: height, age, etc. In practice, many variables are treated as continuous even though they are strictly discrete (e.g. quantity, wage).
Define support
The set of values which a variable can take.
Discrete: usually a set of integers (e.g. 0, 1, 2, …).
Continuous: any real number (often within an interval).
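As a rough illustration, the distinct values seen in a sample give the observed part of a discrete variable's support. The household sizes below are invented, and the true support may contain values that never appear in a particular sample.

```python
# Hypothetical sample of household sizes (a discrete variable).
household_sizes = [1, 2, 2, 4, 3, 2]

# Distinct observed values: a subset of the variable's support.
observed_support = sorted(set(household_sizes))

print(observed_support)  # [1, 2, 3, 4]
```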
What is multivariate data?
Data in which several variables are measured at once (e.g. height, eye colour and weight for each person, or the same variable measured in 2017, 2018 and 2019).
What is cross-sectional data?
Data on one attribute measured in N cases (e.g. people).
{X_1, X_2, …, X_N} = {X_i}, i = 1, 2, …, N
^ We can lay the N observations out as points on a real number line.
If two attributes are measured, each case is a point in a 2D plane.
If three, each case is a point in 3D space.
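A small sketch of how cross-sectional data might be stored, assuming made-up heights and weights for N = 3 people. With one attribute each case is a single number; with two attributes each case has two coordinates.

```python
# One attribute: each case X_i is a single number.
heights = [1.62, 1.75, 1.80]

# Two attributes: each case X_i is a pair (height, weight).
cases = [(1.62, 55.0), (1.75, 70.0), (1.80, 82.0)]

N = len(cases)
for i, (height, weight) in enumerate(cases, start=1):
    print(f"X_{i} = ({height}, {weight})")
```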
What is time-series data?
Series of data points indexed in time order.
{X_1, X_2, …, X_T} = {X_t}, t = 1, 2, …, T
Time is an implicit second attribute
The series is a sequence of bivariate observations (time, value).
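A minimal sketch of the (time, value) view, assuming three invented values indexed t = 1, 2, 3. Pairing each value with its position makes the implicit time attribute explicit.

```python
# Hypothetical values X_1, ..., X_T in time order.
values = [2.1, 2.4, 2.2]

# Attach the time index t = 1, ..., T to each value.
series = list(enumerate(values, start=1))
T = len(series)

print(series)  # [(1, 2.1), (2, 2.4), (3, 2.2)]
```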
What notation do we use for repeated addition and multiplication?
Σ (capital sigma) = add all terms, Π (capital pi) = multiply all terms
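In Python, the two operations correspond to the built-in `sum` and to `math.prod` (available from Python 3.8). The list `x` is an arbitrary example.

```python
import math

x = [2, 3, 4]

total = sum(x)          # Σ x_i = 2 + 3 + 4
product = math.prod(x)  # Π x_i = 2 * 3 * 4

print(total)    # 9
print(product)  # 24
```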