Week 7 & 8 Flashcards
Data linkage
The merging of datasets
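As a minimal sketch of what merging datasets can look like, the example below links two small record sets on a shared identifier. The records, the field names (`person_id`, `satisfaction`, `income`) and the helper `link` are all hypothetical, and exact matching on one key is an assumption; real linkage often has to deal with missing or inconsistent identifiers.

```python
# Hypothetical survey responses and administrative records that share a person ID.
survey = [
    {"person_id": 1, "satisfaction": 4},
    {"person_id": 2, "satisfaction": 2},
    {"person_id": 3, "satisfaction": 5},
]
admin = [
    {"person_id": 1, "income": 32000},
    {"person_id": 3, "income": 41000},
    {"person_id": 4, "income": 28000},
]


def link(left, right, key):
    """Inner-join two lists of records on an exact match of `key`."""
    # Index the right-hand records by key so each lookup is a dictionary access.
    index = {row[key]: row for row in right}
    # Keep only left-hand records that have a match, and combine the fields.
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


linked = link(survey, admin, "person_id")
for row in linked:
    print(row)
```

Only persons 1 and 3 appear in both sources, so only they end up in the linked dataset; persons 2 and 4 are dropped by this inner join.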
Data integration & augmentation
Complementing survey data with additional information, usually from non-traditional data sources.
Data forms
Designed data (experiments, surveys) and organic data (aspirational, transactional)
Forms of data in big data
Transactional data, social media data, Internet of Things (IoT) data
Characteristics of big data (7 V’s)
- Volume - big data is massive
- Variety - (no) structure
- Velocity - the speed of new data generation
- Veracity - accuracy
- Variability - the difference in meaning
- Value - increases when linked
- Visualization
Three ethical principles
- Beneficence - minimizing harm, while maximizing benefit
- Justice - burdens of research should not be unequally shared among groups of subjects, with some bearing the burdens and some reaping the benefits
- Autonomy - obtaining informed consent
Differences between surveys and big data
Surveys use ‘designed data’; big data is ‘organic data’
For surveys, researchers have control over the content; for big data, there is no such control
Surveys have detailed documentation of the data-generating process; big data does not
Data curation
The management of data so that it is preserved and remains available for analysis and reuse.
Data provenance
The process of tracing and recording the origins of data and its movement between databases.
Privacy-utility trade-off
Balancing the privacy risk to participants against getting the most out of the data.
The big data paradox
With a biased data source, a bigger dataset makes the problem worse, not better. Every added observation narrows the confidence interval around the biased estimate, so the chance that the interval contains the true value becomes smaller.
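The paradox can be illustrated with a small, stylized simulation. It assumes a true population mean of 0 and a data source whose selection shifts the observed values by a fixed 0.1; the function name `coverage`, the bias size, the sample sizes and the number of repetitions are all illustrative choices, not taken from the course material. For each sample size it counts how often a standard 95% confidence interval contains the true mean.

```python
import random
import statistics


def coverage(n, bias=0.1, reps=200, seed=0):
    """Share of 95% confidence intervals that contain the true mean (0.0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Assumed biased source: values are centred on `bias`, not on the true mean 0.
        sample = [rng.gauss(bias, 1.0) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        # Normal-approximation 95% interval around the (biased) sample mean.
        if mean - 1.96 * se <= 0.0 <= mean + 1.96 * se:
            hits += 1
    return hits / reps


results = {n: coverage(n) for n in (25, 400, 2500)}
for n, share in results.items():
    print(f"n={n}: interval contains the true mean in {share:.0%} of samples")
```

The bias stays the same size while the interval shrinks with the square root of n, so the share of intervals that still cover the true value drops as the sample grows.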