Lecture 31 Big data Flashcards
What are the 7Vs that characterise big data
Volume, Velocity, Variety, Veracity, Variability, Value and Visualisation
Where does big data come from
• Electronic Medical or Health Records (EMRs and EHRs)
• The Internet of Things (IoT)
• Research/data repositories (genomes, other researchers' data)
• Social media
What is Volume
the amount of data generated, and hence the computing capacity required to store and analyse it
What is Velocity
the speed at which data is created and analysed
What is Variety
the types of data sources available (text, images, social media, administrative)
What is Veracity
the accuracy and credibility of data
What is Variability
the internal consistency of your data (e.g. reproducible research)
What is Value
the investment required to undertake big data analysis should pay dividends for your organisation and its patients.
What is Visualisation
the use of novel techniques to communicate patterns that would otherwise be lost in massive tables
What are the two methods of data linkage
- Deterministic: records are linked only when the personal information held in each database to be linked matches exactly.
- Probabilistic: statistical weights are used to calculate the probability that records from different sources refer to the same individual.
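The two linkage methods can be contrasted in a short Python sketch. It assumes the widely used Fellegi-Sunter scheme for the probabilistic weights (the lecture does not name a specific scheme). The record fields, the m- and u-probabilities and the threshold of 5 are hypothetical values chosen for illustration, not estimates from any real dataset.

```python
import math

# Hypothetical records for illustration only; the surname has a spelling variation.
source_a = {"surname": "smith", "dob": "1980-05-17", "sex": "F"}
source_b = {"surname": "smyth", "dob": "1980-05-17", "sex": "F"}

FIELDS = ["surname", "dob", "sex"]

# Assumed (illustrative) probabilities per field:
# m = P(field agrees | records belong to the same person)
# u = P(field agrees | records belong to different people)
M_PROBS = {"surname": 0.95, "dob": 0.98, "sex": 0.99}
U_PROBS = {"surname": 0.01, "dob": 0.001, "sex": 0.5}


def deterministic_match(a, b, fields):
    """Deterministic linkage: link only if every chosen field matches exactly."""
    return all(a[f] == b[f] for f in fields)


def match_weight(a, b, fields):
    """Sum Fellegi-Sunter log2 weights: agreement adds evidence, disagreement subtracts."""
    total = 0.0
    for f in fields:
        m, u = M_PROBS[f], U_PROBS[f]
        if a[f] == b[f]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def probabilistic_match(a, b, fields, threshold=5.0):
    """Probabilistic linkage: link if the total weight reaches an assumed threshold."""
    return match_weight(a, b, fields) >= threshold


print(deterministic_match(source_a, source_b, FIELDS))  # False: surnames differ
print(probabilistic_match(source_a, source_b, FIELDS))  # True: weight of about 6.6 clears 5
```

The single threshold is a simplification; in practice two thresholds are often used, with pairs between them sent for clerical review. Setting the threshold too low is how the duplicates and mismatches of probabilistic linkage arise, while setting it too high leaves true matches unlinked.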
What are the benefits and risks of data linkage
Benefit: you can investigate multiple risk factors pertaining to one event.
Risk: probabilistic linkage can produce duplicates or mismatches (records wrongly linked, or records for the same person left unlinked).
What is the Integrated Data Infrastructure (IDI)
A large research database holding de-identified microdata about people and households, drawn from Stats NZ, the 2013 Census and non-government organisations
What are the benefits of IDI
It has the variety needed to gain insights across whole systems and across people's lives, supports predictive risk modelling, and can identify characteristics of groups associated with good and bad outcomes, then use these to tailor and evaluate interventions
What are the risks of IDI
The resident population denominator can vary depending on the dataset, so results may be biased by missing data or selection bias, which could be culturally driven. It also cannot identify individuals who are abusing the system.
What are the 3 big challenges of big data
Data governance, Data generation and Data output