06_secondary data 2 Flashcards
What is secondary data?
Secondary data: Data gathered and recorded by someone else prior to and for a purpose other than the current project.
What are the advantages of secondary data?
- Available
- Faster and less expensive than acquiring primary data
- Requires no access to subjects
- Inexpensive—government data is often free
- May provide information otherwise not accessible
- Often: high data quality (particularly large samples and established measures)
- Availability of time series data
- Simplification of cross-cultural studies
What are the disadvantages of secondary data?
- No control over data quality and representativeness
- Lack of familiarity with variables
- Lack of key variables
- Uncertain validity
- Data not consistent with needs
- Inappropriate units of measurement
- (Probably) too old
What are the key characteristics of big data?
Volume:
- Huge amount of data created each day
- Data explosion:
- “Big” is relative
Variety:
- Internal vs. external data sources
- Mix of three different data formats:
Velocity:
- Speed of data production and processing
- Real-time information creates the highest value
- Quick response to big data can offer a competitive advantage
Veracity:
- Term coined by IBM
- Accuracy, quality, truthfulness, trustworthiness of data
Variability:
- Data flows can be inconsistent with periodic peaks
- Can be challenging to manage
Value Proposition:
- Key purpose
- Assumption: Big data more valuable than small data
What are unstructured and structured data?
Unstructured: Video, Image, text data, voice
Structured: Numeric secondary data, categorical data, geographic data
Evaluate secondary data as a method for inferring causal relationships.
Distinct entities: maybe, none (depends on the conceptual elaborations)
Association: yes, easily identifiable
Temporal Precedence: yes, secondary data are often collected at various points in time
Eliminating rival causal relationships: maybe, can be problematic
–>In case of panel data, control of all time-invariant variables is possible; otherwise problematic…
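The panel-data point above can be illustrated with a small sketch. It assumes a hypothetical, noise-free panel of 3 units over 4 periods and uses the within (demeaning) transformation, one common way to remove time-invariant unit effects. The variable names and numbers are made up for illustration.

```python
import numpy as np

def within_transform(values, unit_ids):
    """Subtract each unit's own mean from its observations (demeaning)."""
    values = np.asarray(values, dtype=float)
    unit_ids = np.asarray(unit_ids)
    out = np.empty_like(values)
    for u in np.unique(unit_ids):
        mask = unit_ids == u
        out[mask] = values[mask] - values[mask].mean()
    return out

def ols_slope(x, y):
    """Slope of a one-regressor OLS fit with an intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

# Hypothetical panel: 3 units observed in 4 periods each.
unit_ids = np.repeat([0, 1, 2], 4)
# Unobserved, time-invariant unit effect (the rival explanation).
unit_effect = np.repeat([0.0, 10.0, 20.0], 4)
# The predictor is correlated with the unit effect across units.
x = np.array([1., 2., 3., 4., 3., 4., 5., 6., 5., 6., 7., 8.])
true_slope = 2.0
y = true_slope * x + unit_effect

# Pooled OLS ignores the unit effect and overstates the slope.
pooled_slope = ols_slope(x, y)
# After demeaning, the time-invariant effect drops out of both x and y.
fe_slope = ols_slope(within_transform(x, unit_ids),
                     within_transform(y, unit_ids))

print(pooled_slope, fe_slope)
```

In this constructed example the demeaned regression recovers the slope of 2, while the pooled regression does not. Rival explanations that vary over time are not removed by this transformation.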
How are rival causal explanations eliminated in experiments, surveys, and secondary data?
- Experiments: Can be eliminated by randomization
- Survey: In theory, rival explanations can be eliminated by measuring all of them and using them as controls, but this is difficult in practice
- Secondary data: Control is possible if the rival variables were measured, but in many cases they were not
What is longitudinal data?
Longitudinal data are data collected from the same subjects or units over a period of time
What are the two forms of longitudinal data?
One unit of analysis: Time series analysis
Several units of analysis: Panel data analysis
What are the challenges of longitudinal data?
Challenges:
- Violation of the OLS assumption “Residuals are uncorrelated”
–>Different estimators necessary: panel estimators, time series estimators
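One way to see the violation is to check residuals for first-order autocorrelation. The sketch below uses the Durbin-Watson statistic, a standard diagnostic that the flashcards themselves do not mention. The two residual series are hypothetical and chosen only to show the extremes.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals.
    Values near 2 suggest no first-order autocorrelation, values toward 0
    suggest positive and values toward 4 suggest negative autocorrelation."""
    e = [float(v) for v in residuals]
    numerator = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    denominator = sum(v * v for v in e)
    return numerator / denominator

# Hypothetical residuals that stay on one side for several periods
# (positive autocorrelation, typical for repeated measurements).
persistent = [1, 1, 1, 1, -1, -1, -1, -1]
# Hypothetical residuals that flip sign every period (negative autocorrelation).
alternating = [1, -1, 1, -1, 1, -1, 1, -1]

dw_persistent = durbin_watson(persistent)    # 0.5
dw_alternating = durbin_watson(alternating)  # 3.5
print(dw_persistent, dw_alternating)
```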
What are the advantages of longitudinal data?
- Allow distinguishing true loyalty effects from spurious effects
- Allow including lagged values of the dependent variable as predictors, and analyzing novel research questions
- Repeated measurements allow addressing omitted variable bias in different ways (Chapter 4.7)
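A lagged dependent variable is the unit's own value from the previous period. The sketch below shows one way to construct it, assuming a hypothetical list of records with the keys `unit`, `t`, and `y`. The helper `add_lag` is illustrative and not taken from any library.

```python
def add_lag(rows, unit_key="unit", time_key="t", value_key="y"):
    """Return new rows with an extra '<value_key>_lag1' field holding the
    same unit's value from its previous observed row.
    Note: this uses the previous observed row and does not check for
    gaps in the time index."""
    ordered = sorted(rows, key=lambda r: (r[unit_key], r[time_key]))
    out = []
    prev_unit, prev_value = None, None
    for r in ordered:
        # The first observation of each unit has no predecessor.
        lag = prev_value if r[unit_key] == prev_unit else None
        out.append({**r, value_key + "_lag1": lag})
        prev_unit, prev_value = r[unit_key], r[value_key]
    return out

# Hypothetical panel records, deliberately unsorted.
rows = [
    {"unit": "A", "t": 2, "y": 5.0},
    {"unit": "A", "t": 1, "y": 3.0},
    {"unit": "B", "t": 1, "y": 7.0},
    {"unit": "A", "t": 3, "y": 6.0},
    {"unit": "B", "t": 2, "y": 8.0},
]
lagged = add_lag(rows)
for r in lagged:
    print(r)
```

Rows with a lag of `None` would typically be dropped before estimation, because the first period of each unit has no earlier value.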
What is unstructured data?
Unstructured Data is a single data unit in which the information offers a relatively concurrent representation of its multifaceted nature without predefined organization or numeric values.
What are the advantages, challenges, and forms of unstructured data?
Advantages:
- Large amounts of unstructured data available at companies
- Can yield deeper and novel insights
Challenges:
- Complex data structure
Forms:
- Text, images, audio, video
What is text analytics?
Idea
* Can investigate “what” is being said and “how” it is said, using both qualitative and quantitative inquiries with various degrees of human involvement
* Based on content analysis (Chapter 5)
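A minimal quantitative sketch of the "what" and "how" idea, using only the Python standard library. Term counts stand in for what is said, and the number of exclamation marks is a deliberately crude, assumed proxy for how it is said. The review text and the `describe` helper are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def describe(text):
    """Return simple content and style features for one text."""
    tokens = tokenize(text)
    return {
        "term_counts": Counter(tokens),       # "what" is being said
        "n_tokens": len(tokens),
        "n_exclamations": text.count("!"),    # crude proxy for "how"
    }

# Hypothetical customer review.
review = "Great phone! Great battery, great price!"
features = describe(review)
print(features["term_counts"].most_common(3))
print(features["n_exclamations"])
```

Real text analytics would add steps such as stop-word removal or human coding. This sketch only shows the basic counting idea.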
What are the distinctions and purposes of text analytics?
Distinctions
* Text as a reflection of the producer
* Text’s impact on receivers
* Important to consider contextual influences on text
Purpose of text analytics
* Text for prediction
* Text for understanding