Data and Sampling Flashcards
What are data?
Observations/measurements of some phenomenon of interest
A dataset is a collection of realisations of random variables
What can the index of a data point indicate?
The unit (which member of the set is observed) for cross-sectional data; the time period for time-series data; both the unit and the time period for panel data
What are the types of data?
Numeric, grouped, or categorical
Datasets can be cross-sectional, time series, or a combination of the two: panel data track the same group over time, while repeated cross-sectional data take a different random sample in each period
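A toy sketch of how the index differs across these structures, using plain Python dictionaries. All unit names, years and values are hypothetical and chosen only for illustration.

```python
# All names, years and values below are hypothetical, for illustration only.

# Cross-sectional: one observation per unit, indexed by the unit.
cross_section = {"alice": 52, "bob": 48, "carol": 61}

# Time series: one unit observed repeatedly, indexed by time.
time_series = {2020: 1.2, 2021: 4.7, 2022: 8.0}

# Panel: the same units tracked over time, indexed by (unit, time).
panel = {
    ("alice", 2021): 50, ("alice", 2022): 52,
    ("bob", 2021): 47, ("bob", 2022): 48,
}

# Repeated cross-section: a fresh random sample each period, indexed by time,
# then by whichever units happened to be drawn in that period.
repeated_cs = {
    2021: {"dave": 45, "erin": 58},
    2022: {"frank": 51, "gina": 63},
}

# The panel observes the same set of units in every period...
units_by_year = {year: {u for (u, y) in panel if y == year} for year in (2021, 2022)}
print(units_by_year[2021] == units_by_year[2022])  # True

# ...whereas the repeated cross-section need not share any units between periods.
print(set(repeated_cs[2021]) & set(repeated_cs[2022]))  # set()
```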
What can be included in summary statistics?
Measures of central tendency, measures of dispersion, other measures of the shape of the distribution, and measures of the relationship between variables
What measures of central tendency are there?
Mean, median (50th percentile), mode, and geometric mean (the nth root of the product of the n data values; only meaningful for positive data)
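A minimal sketch of these measures using Python's standard `statistics` module (Python 3.8+ assumed) on made-up data. The `geometric_mean` function here is a hypothetical hand-written helper that follows the nth-root-of-the-product definition, computed on the log scale to avoid overflow; the standard library also provides its own `statistics.geometric_mean`.

```python
import math
import statistics

def geometric_mean(data):
    """nth root of the product of the n values (hypothetical helper).

    Computed as exp(mean(log x)), which equals (x1 * ... * xn) ** (1/n)
    but avoids overflow when the product is huge. Only strictly positive
    values are accepted, since log(x) is undefined for x <= 0.
    """
    if not data or any(x <= 0 for x in data):
        raise ValueError("geometric mean needs non-empty, strictly positive data")
    return math.exp(sum(math.log(x) for x in data) / len(data))

data = [1, 4, 4, 16]  # made-up values

print(statistics.mean(data))    # 6.25
print(statistics.median(data))  # 4.0 (average of the two middle values)
print(statistics.mode(data))    # 4
print(geometric_mean(data))     # approximately 4 (the 4th root of 256)
```

Here the arithmetic mean is pulled upward by the single large value, while the median and geometric mean are not.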
What measures of dispersion are there?
Standard deviation, range, interquartile range
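A short sketch computing the three dispersion measures with `statistics.stdev` and `statistics.quantiles` (Python 3.8+ assumed) on made-up data. There are several conventions for computing quartiles; `quantiles` defaults to its 'exclusive' method, so other software may report a slightly different IQR for small samples.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values

sd = statistics.stdev(data)                  # sample standard deviation (divisor n - 1)
data_range = max(data) - min(data)           # largest value minus smallest value
q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points, default 'exclusive' method
iqr = q3 - q1                                # spread of the middle half of the data

print(round(sd, 3))   # 2.138
print(data_range)     # 7
print(q1, q3, iqr)    # 4.0 6.5 2.5
```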
What are some other measures of a sample distribution?
Sample skewness measures the asymmetry of a distribution; sample kurtosis measures ‘tailedness’ (sometimes loosely called ‘peakedness’), i.e. how much of the variability is due to infrequent large deviations from the mean
What measures of relationship between variables are there?
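Sample covariance, which is positive if high x values tend to occur with high y values, negative if high x values tend to occur with low y values, and close to 0 if there is no linear relationship; and the correlation coefficient, which rescales the covariance to lie between -1 and 1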
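A hand-rolled sketch of both measures, assuming the usual n - 1 divisor for the sample covariance; the function names are hypothetical. (Python 3.10+ also ships `statistics.covariance` and `statistics.correlation`.)

```python
import math

def sample_covariance(x, y):
    """Sum of cross-products of deviations from the means, divided by n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson correlation: covariance scaled by both sample standard deviations."""
    sx = math.sqrt(sample_covariance(x, x))  # cov(x, x) is the sample variance of x
    sy = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (sx * sy)

x = [1, 2, 3, 4, 5]
print(sample_covariance(x, [2, 4, 6, 8, 10]))      # 5.0: high x with high y
print(sample_covariance(x, [10, 8, 6, 4, 2]))      # -5.0: high x with low y
print(round(correlation(x, [2, 4, 6, 8, 10]), 6))  # 1.0: a perfect linear relationship
```

The covariance's size depends on the units of x and y, which is why the correlation divides it by both standard deviations.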
What is the formula for the population variance?
$\sigma_n^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$
What is the formula for the sample variance?
$s_{n-1}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
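A minimal sketch contrasting the two divisors. The `ddof` ("delta degrees of freedom") parameter name is borrowed from NumPy's convention and the `variance` helper is hypothetical. Dividing by n - 1 rather than n (Bessel's correction) makes $s_{n-1}^2$ an unbiased estimator of the population variance when x̄ is itself estimated from the same sample.

```python
import statistics

def variance(data, ddof):
    """Sum of squared deviations from the mean x-bar, divided by n - ddof.

    ddof=0 gives the divide-by-n formula sigma_n^2;
    ddof=1 gives the divide-by-(n - 1) sample variance s_{n-1}^2.
    """
    n = len(data)
    if n - ddof <= 0:
        raise ValueError("need more data points than ddof")
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - ddof)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values

print(variance(data, ddof=0))  # 4.0
print(variance(data, ddof=1))  # 32/7, about 4.571

# Cross-check against the standard library's two variance functions.
print(statistics.pvariance(data), statistics.variance(data))
```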
What is the formula for sample skewness?
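$\frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^3}{s_n^3}$, where $s_n = \sqrt{\sigma_n^2}$ is the standard deviation with divisor $n$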
What is the formula for sample kurtosis?
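$\frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^4}{s_n^4}$, where $s_n = \sqrt{\sigma_n^2}$ is the standard deviation with divisor $n$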
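A direct transcription of the skewness and kurtosis formulas, with hypothetical helper names. Because $s_n^2$ is the second central moment (divisor n), $s_n^3$ and $s_n^4$ are computed as that moment raised to the powers 1.5 and 2.

```python
def central_moment(data, k):
    """k-th central moment with divisor n: (1/n) * sum((x_i - x_bar) ** k)."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** k for x in data) / n

def sample_skewness(data):
    # s_n ** 3 equals (second central moment) ** 1.5
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def sample_kurtosis(data):
    # s_n ** 4 equals (second central moment) ** 2
    return central_moment(data, 4) / central_moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]      # made-up, evenly spread values
right_skewed = [1, 1, 1, 2, 10]  # made-up values with one large outlier

print(sample_skewness(symmetric))         # 0.0
print(sample_skewness(right_skewed) > 0)  # True: long right tail
print(sample_kurtosis(symmetric))         # 1.7
print(sample_kurtosis(right_skewed))      # larger, driven by the single large deviation
```

This is the plain (non-excess) kurtosis: a normal distribution has kurtosis 3, so some software reports kurtosis minus 3 ("excess kurtosis"). Some statistical packages also apply small-sample corrections to both measures, so their values may differ slightly from these formulas.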
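What is one common reason why correlation does not imply causation?
A third (confounding) variable may causally affect both variables, making them correlated even though neither causes the other; time is a frequent culprit, since any two series that trend over time will tend to be correlated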
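A small simulation, assuming time is the confounder: two hypothetical series are generated independently, each with its own upward trend plus random noise, so neither affects the other. They still show a strong correlation, which largely disappears once each series is detrended by a least-squares straight-line fit on time. Detrending is used here only as one simple illustration, not as a general remedy for confounding; the helper names are hypothetical.

```python
import random

def correlation(a, b):
    """Pearson correlation; the 1/(n - 1) factors cancel, so raw sums suffice."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    ss_a = sum((ai - a_bar) ** 2 for ai in a)
    ss_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / (ss_a * ss_b) ** 0.5

def detrend(series, t):
    """Residuals after a least-squares straight-line fit of series on t."""
    n = len(t)
    t_bar, s_bar = sum(t) / n, sum(series) / n
    slope = (sum((ti - t_bar) * (si - s_bar) for ti, si in zip(t, series))
             / sum((ti - t_bar) ** 2 for ti in t))
    intercept = s_bar - slope * t_bar
    return [si - (intercept + slope * ti) for ti, si in zip(t, series)]

rng = random.Random(42)  # fixed seed so the run is reproducible
t = list(range(50))      # time periods 0..49

# Two hypothetical series that both trend upward over time but are generated
# independently: neither one affects the other.
x = [2.0 * ti + rng.gauss(0, 5) for ti in t]
y = [0.5 * ti + rng.gauss(0, 2) for ti in t]

print(round(correlation(x, y), 2))                          # strongly positive
print(round(correlation(detrend(x, t), detrend(y, t)), 2))  # near 0 once the time trend is removed
```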
What is a sample survey?
A survey in which only a fraction (a sample) of the total population is observed, in order to make statistical inferences about the population from which the sample is drawn, instead of observing every member as in a census
Why are samples needed?
To estimate the parameters of the probability distribution used in the analysis of events
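A sketch of using a simple random sample to estimate population parameters. The population is simulated (hypothetical normal values with mean 170 and standard deviation 10) only so the estimates can be compared with the true values; in a real survey the full population would not be observed.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is reproducible

# Hypothetical population of 10,000 values. In a real survey this full list
# is unobserved; it is simulated here only so the estimates can be checked.
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Simple random sample without replacement: observe 500 units, not all 10,000.
sample = rng.sample(population, k=500)

# Sample statistics used as estimates of the population's parameters.
mean_hat = statistics.fmean(sample)
sd_hat = statistics.stdev(sample)  # divisor n - 1

print(f"population mean {statistics.fmean(population):.2f}, estimate {mean_hat:.2f}")
print(f"population sd   {statistics.pstdev(population):.2f}, estimate {sd_hat:.2f}")
```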