Lecture 31 Big data Flashcards

1
Q

What are the 7Vs that characterise big data

A

Volume, Velocity, Variety, Veracity, Variability, Value and Visualisation

2
Q

Where does Big data come from

A

• Electronic Medical or Health Records (EMRs and EHRs)
• The Internet of Things (IoT)
• Research/data repositories (genomes, other researchers' data)
• Social media

3
Q

What is Volume

A

the computing capacity required to store and analyse data

4
Q

What is Velocity

A

the speed at which data is created and analysed

5
Q

What is Variety

A

the types of data sources available (text, images, social media, administrative)

6
Q

What is Veracity

A

the accuracy and credibility of data

7
Q

What is Variability

A

the internal consistency of your data (e.g. reproducible research)

8
Q

What is Value

A

the costs required to undertake big data analysis should pay dividends for your organisation and its patients

9
Q

What is Visualisation

A

the use of novel techniques to communicate patterns that would otherwise be lost in massive tables

10
Q

What are the two methods of data linkage

A

• Deterministic: Exact matches of personal information appearing in the databases to be linked.
• Probabilistic: Statistical weights are used to calculate the probability that data from different sources refer to the same individual.
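
The two linkage methods can be sketched in a few lines. This is a minimal illustration, not the lecture's method: the field names, the example records, and the m/u probabilities (the chance a field agrees for a true match versus a non-match) are all hypothetical values chosen for the example, and the log-ratio weighting is one common way to build the statistical weights.

```python
import math

# Hypothetical (m, u) probabilities per field (assumed values, not from the lecture):
# m = P(field agrees | same person), u = P(field agrees | different people)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "dob": (0.98, 0.003),
    "postcode": (0.90, 0.05),
}


def deterministic_match(a, b, keys=("surname", "dob", "postcode")):
    """Deterministic linkage: link only if every key field matches exactly."""
    return all(a[k] == b[k] for k in keys)


def match_weight(a, b):
    """Probabilistic linkage: sum log2 agreement/disagreement weights per field.

    A higher total means the two records are more likely the same individual.
    """
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if a[field] == b[field]:
            total += math.log2(m / u)              # agreement adds weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts weight
    return total


# Made-up records for illustration only
rec_a = {"surname": "Smith", "dob": "1980-02-01", "postcode": "9016"}
rec_b = {"surname": "Smith", "dob": "1980-02-01", "postcode": "9010"}  # moved house
rec_c = {"surname": "Jones", "dob": "1975-07-30", "postcode": "9010"}

print(deterministic_match(rec_a, rec_b))  # exact matching misses the moved record
print(match_weight(rec_a, rec_b) > 0)     # weights still favour a link
print(match_weight(rec_a, rec_c) > 0)     # weights argue against a link
```

In practice the total weight is compared against a threshold chosen by the analyst, which is where the duplicates and mismatches of probabilistic linkage come from.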

11
Q

What are the benefits and risks of data linkage

A

Benefit: you can investigate multiple risk factors pertaining to one event.
Risk: probabilistic linkage can produce duplicates or mismatches.

12
Q

What is Integrated Data infrastructure (IDI)

A

Large research database holding de-identified microdata about people and households from Stats NZ, the 2013 Census, and non-government organisations

13
Q

What are the benefits of IDI

A

It offers variety, giving system-wide and life-wide insights; it enables predictive risk modelling; and it can identify characteristics of groups associated with good and bad outcomes, which can be used to tailor and evaluate interventions

14
Q

What are the risks of IDI

A

The resident population denominator can vary by data set, so results can be biased by missing data or selection bias. This could be culturally driven. Individuals who are abusing the system cannot be identified.

15
Q

What are the 3 big challenges of big data

A

Data governance, Data generation and Data output

16
Q

What is Data governance related to

A

The practices ensuring formal management of data: storage, transfer and privacy

17
Q

What is Data generation related to

A

Ensuring data quality: large numbers don’t always mean greater accuracy. Includes capturing, curating and updating data, and checking its accuracy

18
Q

What is Data Output related to

A

How to analyse large datasets to generate meaningful and reliable outputs, and being able to share them

19
Q

What are the 5 safes of Big data

A

Safe people, projects, settings, data and output

20
Q

What is safe people

A

researchers/analysts must be proven trustworthy and sworn to secrecy

21
Q

What is safe projects

A

must have statistical purpose for public benefit, with analysis not on individuals

22
Q

What is safe settings

A

security arrangements to prevent unauthorised access to data

23
Q

What is safe data

A

de-identified data

24
Q

What is safe output

A

random rounding and de-identification of results
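
Random rounding can be illustrated with a short sketch. The card does not say which base or probabilities are used, so the function name `random_round`, the default base of 3, and the unbiased rounding rule below are assumptions made for the example.

```python
import random


def random_round(count, base=3, rng=random):
    """Randomly round a count to a multiple of `base` (assumed base: 3).

    Counts already on a multiple are left alone. Otherwise the count is
    rounded up with probability remainder/base and down otherwise, so the
    expected value of the rounded count equals the true count.
    """
    remainder = count % base
    if remainder == 0:
        return count
    lower = count - remainder
    upper = lower + base
    if rng.random() < remainder / base:
        return upper
    return lower


# A small cell count of 4 is published as either 3 or 6, never as 4
print(random_round(4))
# A count of 9 is already a multiple of 3 and is unchanged
print(random_round(9))
```

Because the published value no longer reveals the exact count, small cells are harder to trace back to individuals, while totals stay approximately right on average.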

25
Q

What are the policy implications for Big Data

A

It is possible to hypothetically find the impacts of policy. Risks include inadvertent discrimination against sub-populations. Anonymity is not guaranteed, so ethical approvals are involved.
Control over data
Privacy policies.