Big Data Flashcards
What is big data?
Big data is a large or complex dataset that often needs terabytes or petabytes of storage
What are the 4 terms used to define characteristics of big data?
Volume
Velocity
Variety
Veracity
What are the 3 additional terms regarding data relevance?
Variability
Value
Visualisation
Volume
The computing capacity required to store and analyse data
Velocity
The speed at which data are created and analysed
Variety
The types of data sources available (text, images, social media, administrative)
Veracity
The accuracy and credibility of data
Variability
The internal consistency of your data
Value
The costs required to undertake big data analysis should pay dividends for your organisation and its patients
Visualisation
The use of novel techniques to communicate the patterns that would otherwise be lost in massive tables of data
Where do big data come from?
1) Electronic or health records
2) The internet (IoT: Internet of Things)
3) Research or data repositories
4) Social media
What is data linkage?
It is the process of matching records from different sources based on key information
What is deterministic data linkage?
Exact matches based on personal information appearing in all of the datasets that are to be linked (N.B. the matches have to be exact)
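A minimal sketch of deterministic linkage, assuming two small made-up datasets. The field names (`id`, `name`, `dob`), the records and the `deterministic_link` helper are all hypothetical, chosen only to show that a single differing character stops a link from being made:

```python
# Hypothetical records; field names and values are illustrative only.
hospital = [
    {"id": "A1", "name": "Mere Smith", "dob": "1980-02-14"},
    {"id": "A2", "name": "John Brown", "dob": "1975-07-01"},
]
pharmacy = [
    {"id": "A1", "name": "Mere Smith", "dob": "1980-02-14"},
    {"id": "A2", "name": "Jon Brown", "dob": "1975-07-01"},  # name spelled differently
]

KEYS = ("id", "name", "dob")  # every key must agree exactly


def deterministic_link(left, right, keys=KEYS):
    """Return pairs of records whose key fields all match exactly."""
    # Index the right-hand dataset by its tuple of key values.
    index = {}
    for rec in right:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec)
    # Look up each left-hand record; only exact key tuples are found.
    pairs = []
    for rec in left:
        for match in index.get(tuple(rec[k] for k in keys), []):
            pairs.append((rec, match))
    return pairs


links = deterministic_link(hospital, pharmacy)
```

Only the first pair of records is linked here. The second is missed because "John" and "Jon" differ, which is the kind of case probabilistic linkage is meant to handle.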
What is probabilistic data linkage?
Statistical weights are used to calculate the probability that data from different sources refer to the same individual
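One common way to build such weights is the Fellegi-Sunter approach, sketched below under stated assumptions. The m-probabilities (chance a field agrees when two records are the same person), the u-probabilities (chance it agrees when they are not), the threshold and the records are all made-up values for illustration, not figures from any real linkage project:

```python
import math

# Hypothetical (m, u) probabilities per field.
M_U = {
    "surname": (0.95, 0.01),
    "dob": (0.98, 0.003),
    "sex": (0.99, 0.5),
}

THRESHOLD = 3.0  # hypothetical cut-off: pairs scoring above it are treated as links


def match_weight(rec_a, rec_b, m_u=M_U):
    """Sum log2 agreement/disagreement weights across the compared fields."""
    total = 0.0
    for field, (m, u) in m_u.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)              # agreement adds evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts evidence
    return total


a = {"surname": "Brown", "dob": "1975-07-01", "sex": "M"}
b = {"surname": "Browne", "dob": "1975-07-01", "sex": "M"}  # surname differs slightly
c = {"surname": "Smith", "dob": "1980-02-14", "sex": "F"}

is_link_ab = match_weight(a, b) > THRESHOLD
is_link_ac = match_weight(a, c) > THRESHOLD
```

With these assumed values, records a and b still score above the threshold despite the surname mismatch, because the matching date of birth carries a large weight, while a and c score well below it.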
NHI (National Health Index)
It is a health number used to track your interactions with the health system
Its purpose is so that GPs, pharmacists and DHBs can be reimbursed for their data and services
Increasingly, researchers are using encrypted versions of the NHI to investigate risk and protective factors associated with health outcomes
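The idea of an encrypted identifier can be illustrated with a keyed hash. This is only a sketch of one general pseudonymisation technique (HMAC-SHA256), not a description of how the NHI is actually encrypted. The key, the `pseudonymise` helper and the identifiers are hypothetical:

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be held only by the data custodian.
SECRET_KEY = b"hypothetical-secret-held-by-the-data-custodian"


def pseudonymise(identifier, key=SECRET_KEY):
    """Replace an identifier with a keyed hash so records can be linked without exposing it."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


token_a = pseudonymise("ID-0001")  # made-up identifiers, not real health numbers
token_b = pseudonymise("ID-0001")
token_c = pseudonymise("ID-0002")
```

The same identifier always gives the same token, so records for one person can still be linked across datasets, but the token does not reveal the original number to a researcher who lacks the key.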
What is the IDI?
It is a large research database containing microdata about people and their households
The de-identified data come from a range of government and non-government agencies
Benefits of IDI
De-identified, linkable data accessed in a data safe haven
The resource is only as good as the data it contains
Questions about data quality
Selection biases in data
Resident population definitions vary from study to study
Some data the IDI has
Housing data
People and communities data
Education and training data
Income and work data
Benefits and social services data
Population data
Health data
Justice data
Privacy
Refers to the ability of a person to control the availability of information about themselves
Security
Refers to how the agency stores and controls access to data it holds
Confidentiality
Refers to the protection of information from and about individuals and organisations and ensuring that the information is not made available or disclosed to unauthorised individuals and entities