Chapter 1: Origins of Data Flashcards
What are the alternative names for:
1. Data table
2. Observations
3. Variables
- The data matrix
- Cases
- Features
What does a data table consist of?
Rows and columns: each row is an observation, and each column is a variable holding specific info about that observation.
What is the common format for data tables, and what does it stand for?
CSV.
“Comma-separated values”
What are csv files?
Text files containing a data table, with rows and columns. Rows are separated by end-of-line characters and columns are separated by a delimiter (e.g. a comma or semicolon). Can be imported into all stat software.
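A minimal sketch of how a delimited text file maps to a data table, using Python's standard csv module. The file contents and the variable names (family_id, size, income) are made up for illustration, and a semicolon is assumed as the delimiter.

```python
import csv
import io

# Hypothetical data table as text: one observation per line, columns split by ";".
raw_text = "family_id;size;income\n1;4;52000\n2;2;38000\n"

# DictReader splits rows at line ends and columns at the chosen delimiter,
# using the first line as the variable names.
reader = csv.DictReader(io.StringIO(raw_text), delimiter=";")
observations = list(reader)      # one dict per observation (row)
variables = reader.fieldnames    # one name per variable (column)

print(variables)
print(len(observations))
```

Note that the csv module reads every value as text, so numeric variables would still need converting before analysis.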
What are the 3 types of observation structures?
- Cross-sectional
- Time series
- Multi dimensional
What are the 5 features of xsec data?
- Observations come from the same time period and refer to different units, e.g. different families
- Ideally, all observations in a xsec dataset are observed at the exact same time; in practice this means a particular time interval
- When the interval is narrow, it is treated as a single point in time
- In most xsec data, the ordering of observations in the dataset doesn’t matter
- Has the simplest data structure
What are the 2 features of tseries data?
- Observations refer to a single unit observed multiple times, e.g. a shop’s monthly sales
- There is a natural ordering of the observations
What is an alternative name for multi-dimensional data?
Panel data
What is the common type of panel data?
Longitudinal data, also called cross-sectional time series data (xt data). It has many units, each observed multiple times.
What are 2 examples of xt data?
Countries observed repeatedly over several years; employees of a firm observed on a monthly basis.
How can multi-dimensional datasets be represented in table formats for xt data? Explain.
The most convenient format has 1 observation representing 1 unit observed at 1 time (e.g. country-year observations), so that one unit (a country) is represented by multiple observations.
In xt data tables, observations are identified by 2 ID variables: one for the xsec units and one for time.
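The long format described above can be sketched as follows. The country codes, years, and gdp values are hypothetical, and index_by_ids is an illustrative helper, not part of any library. It shows that the pair of ID variables (unit, time) identifies each observation.

```python
# Hypothetical country-year panel in long format: one row per unit per time.
panel = [
    {"country": "A", "year": 2000, "gdp": 1.0},
    {"country": "A", "year": 2001, "gdp": 1.1},
    {"country": "B", "year": 2000, "gdp": 2.0},
    {"country": "B", "year": 2001, "gdp": 2.2},
]

def index_by_ids(rows, unit_key, time_key):
    """Map each (unit, time) pair to its row; fail if a pair appears twice."""
    indexed = {}
    for row in rows:
        key = (row[unit_key], row[time_key])
        if key in indexed:
            raise ValueError(f"duplicate observation for {key}")
        indexed[key] = row
    return indexed

by_ids = index_by_ids(panel, "country", "year")
print(by_ids[("B", 2001)]["gdp"])
```

Each country appears in several rows, but every (country, year) pair appears only once.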
What is balanced xt data?
When all xsec units have observations for the very same time periods.
What is unbalanced xt data?
When some xsec units are observed more times than others, so the units do not all share the same time periods.
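A small sketch of how one might check whether xt data is balanced, assuming each observation is given as a (unit, time) pair. The function name is_balanced and the sample pairs are illustrative only.

```python
from collections import defaultdict

def is_balanced(id_pairs):
    """Return True if every unit is observed in exactly the same time periods."""
    periods_by_unit = defaultdict(set)
    for unit, period in id_pairs:
        periods_by_unit[unit].add(period)
    period_sets = list(periods_by_unit.values())
    # Compare every unit's set of periods with the first unit's set.
    return all(s == period_sets[0] for s in period_sets)

# Hypothetical (unit, year) pairs.
balanced_pairs = [("A", 2000), ("A", 2001), ("B", 2000), ("B", 2001)]
unbalanced_pairs = [("A", 2000), ("A", 2001), ("B", 2001)]

print(is_balanced(balanced_pairs))
print(is_balanced(unbalanced_pairs))
```

In the second list, unit B is missing the year 2000, so the panel is unbalanced.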
Name and explain the other important feature of data
Level of Aggregation of Observations.
Data with info on people may have observations at different levels, e.g. age is at the individual level, home location is at the family level, and real estate prices may be available only as averages for zip code areas.
Time series data on transactions may have observations for each transaction, or for transactions aggregated over some time period.
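To illustrate the two levels of aggregation for transaction data, here is a minimal sketch that turns transaction-level observations into monthly totals. The dates and amounts are made up, and aggregate_by_month is a hypothetical helper that assumes dates are ISO strings (YYYY-MM-DD).

```python
from collections import defaultdict

# Hypothetical transaction-level observations: (ISO date, amount).
transactions = [
    ("2023-01-05", 20.0),
    ("2023-01-17", 35.5),
    ("2023-02-02", 10.0),
]

def aggregate_by_month(records):
    """Sum amounts per calendar month, keyed by the 'YYYY-MM' part of the date."""
    totals = defaultdict(float)
    for date, amount in records:
        month = date[:7]  # first 7 characters of an ISO date are year and month
        totals[month] += amount
    return dict(totals)

monthly_sales = aggregate_by_month(transactions)
print(monthly_sales)
```

The three transaction-level observations become two monthly observations, so the aggregated data has fewer rows and less detail.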
Define the “garbage in - garbage out” principle
Summarises the prime importance of data quality.
The result of an analysis cannot be better than the data it uses.
What are the 6 key aspects of data quality?
- Content
- Validity
- Reliability
- Comparability
- Coverage
- Unbiased Selection
VRUCCC:
Real Value Understands Crap Crap Crap
OR
Cash Value Comes from Reducing Useable Credit