Lecture 20 Flashcards
Big Data is
Large or complex datasets, which often need terabytes or petabytes of storage. They contain large amounts of information at a population, regional or local level, or span different geographic areas. Data from multiple sources can be combined to explore population health outcomes.
Volume in big data:
The computing capacity needed to store and analyse data
Velocity:
The speed at which data are created and analysed
Variety
The types of data sources available
Veracity
The accuracy and credibility of data
Variability:
Internal consistency of your data
Data linkage:
The process of matching records from different sources based on key information
Deterministic approach to data linkage:
Exact matches based on personal information appearing in all datasets
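A minimal sketch of deterministic linkage, assuming two small in-memory datasets. The records, field names (`name`, `dob`, `admissions`, `region`) and the helper `deterministic_link` are all hypothetical and made up for illustration. Two records are linked only when every key field matches exactly.

```python
# Hypothetical records; names, dates and fields are invented for illustration.
hospital = [
    {"name": "Aroha Smith", "dob": "1980-02-14", "admissions": 2},
    {"name": "Ben Jones", "dob": "1975-07-01", "admissions": 1},
]
census = [
    {"name": "Aroha Smith", "dob": "1980-02-14", "region": "Otago"},
    {"name": "Benjamin Jones", "dob": "1975-07-01", "region": "Waikato"},
]


def deterministic_link(left, right, keys):
    """Link records whose values agree exactly on every field in `keys`."""
    # Index the right-hand dataset by the tuple of key values.
    index = {tuple(rec[k] for k in keys): rec for rec in right}
    linked = []
    for rec in left:
        match = index.get(tuple(rec[k] for k in keys))
        if match is not None:
            # Merge the two records into one linked record.
            linked.append({**rec, **match})
    return linked


linked = deterministic_link(hospital, census, keys=("name", "dob"))
for rec in linked:
    print(rec)
```

In this sketch "Ben Jones" and "Benjamin Jones" are not linked, because the names differ even though the dates of birth agree. This is the main weakness of exact matching: small recording differences cause missed links.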
Probabilistic approach to data linkage:
Statistical weights are used to calculate the probability that records from different sources belong to the same person
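A rough sketch of how weights can be used in probabilistic linkage, in the style of the Fellegi-Sunter method (this may not be the exact method covered in the lecture). The field names, the records, and the m and u probabilities below are assumptions chosen for illustration, not real estimates. For each field, m is the assumed chance the field agrees when two records are the same person, and u is the assumed chance it agrees when they are different people.

```python
import math

# Assumed (m, u) probabilities per field; illustrative values only.
FIELDS = {
    "surname": (0.95, 0.01),
    "dob": (0.98, 0.003),
    "sex": (0.99, 0.5),
}


def match_weight(rec_a, rec_b):
    """Sum log2 agreement/disagreement weights over all fields."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if rec_a[field] == rec_b[field]:
            # Agreement adds evidence that the records are the same person.
            total += math.log2(m / u)
        else:
            # Disagreement subtracts evidence.
            total += math.log2((1 - m) / (1 - u))
    return total


# Hypothetical records.
a = {"surname": "Smith", "dob": "1980-02-14", "sex": "F"}
b = {"surname": "Smith", "dob": "1980-02-14", "sex": "F"}
c = {"surname": "Smyth", "dob": "1980-02-14", "sex": "F"}
d = {"surname": "Jones", "dob": "1975-07-01", "sex": "M"}

for label, other in (("b", b), ("c", c), ("d", d)):
    print(label, round(match_weight(a, other), 2))
```

A higher total weight means the pair is more likely to be the same person. Unlike exact matching, a pair such as "Smith" and "Smyth" with the same date of birth still gets a positive score, so it can be accepted or sent for manual review depending on the thresholds chosen.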
NHI:
National Health Index. A unique identifier that tracks your interactions with the health system
The IDI:
The Integrated Data Infrastructure is a large research database containing de-identified microdata about people and households. Researchers use the IDI to answer complex questions to improve outcomes for New Zealanders.
Benefits of IDI:
De-identified and linkable. However, the resource is only as good as the data it contains.
Variables included in NZDep2013 (x9)
Communication (no internet access), Income (people aged 18-64 receiving a means-tested benefit), Income (people living in equivalised households with income below an income threshold), Employment, Qualifications, Owned home, Support (people <65 living in a single-parent family), Living space, Transport
What challenges do big data bring?
Data governance, data generation, data output
The five safes:
Safe people, projects, settings, output, data