Big data technologies in Health Study Design and Summarising Data Flashcards

1
Q

What is Big Data?

A
  • Huge volume of data
    • Billions of rows, millions of columns
  • Complexity of data types and structures
    • Relational, unstructured
  • Speed of new data creation and growth

Big Data is data whose scale, distribution, diversity, and/or timeliness require the use of new technical
architectures and analytics to enable insights that unlock new sources of business value.

2
Q

What are the 3 (4) Vs of Big Data?

A

The 3 (4) Vs of Big Data:
* Volume - scale of data
* Velocity - analysis of streaming data; updates/changes over the data life cycle
* Variety - different forms of data, e.g. posts, patient notes
* (Veracity) - uncertainty of the data

3
Q

What are the challenges of research with health data?

A

How do we represent medical knowledge in data so that it is:
* Standardised
* Portable
* Computable

Free text means nothing to a computer. It is:
* Not searchable
* Not interoperable
* Not computable

Computers need codes – i.e. human input that defines a concept more clearly at the point of data entry.
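
To illustrate why codes are computable where free text is not, here is a minimal Python sketch. The records, the code `X001`, and both helper functions are hypothetical and invented for this example; they do not come from any real terminology.

```python
# Hypothetical records: the code "X001" is invented for illustration only.
records = [
    {"patient": "p1", "text": "Pt had an MI last year", "code": "X001"},
    {"patient": "p2", "text": "heart attack in 2019", "code": "X001"},
    {"patient": "p3", "text": "no evidence of myocardial infarction", "code": None},
]


def find_by_text(records, phrase):
    """Naive free-text search: case-insensitive substring match."""
    return [r["patient"] for r in records if phrase.lower() in r["text"].lower()]


def find_by_code(records, code):
    """Coded search: exact match on the recorded concept code."""
    return [r["patient"] for r in records if r["code"] == code]


text_hits = find_by_text(records, "myocardial infarction")
code_hits = find_by_code(records, "X001")
print("text search:", text_hits)
print("code search:", code_hits)
```

In this toy data the text search misses the abbreviation and the lay term and matches a negated statement, while the coded search returns the two patients whose record carries the concept code.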

4
Q

What are the types of bias in big health data?

A
  • Collected data used for multiple purposes
    • Patient information may not be complete, accurate, or current.
    • Clinicians and insurers have to be aware of this
    • Greater attention needs to be paid to the context in which data is recorded in the EHR system.
  • Addressing information gaps in Randomised Controlled Trials
  • Tracking provenance of data being produced
  • Reimbursement bias
    • Why record a Body Mass Index (BMI) in a thin person?
  • Software bias
    • System initiated – UK EHRs don’t allow negative values and <>
  • Data errors
    • 1% ‘resurrection’ rate in one UK longitudinal study
    • Myocardial infarction in code ‘NOT’ in text…
    • Different pick lists for terminologies and the use of non-standard representations, e.g. BP!
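
A data error such as the ‘resurrection’ rate can be checked mechanically. The sketch below assumes that ‘resurrection’ means clinical activity recorded after a recorded date of death; that interpretation, the field names, and the sample patients are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical patient records; field names are assumptions for this sketch.
patients = [
    {"id": "p1", "death_date": date(2015, 3, 1),
     "event_dates": [date(2014, 1, 5), date(2016, 7, 9)]},
    {"id": "p2", "death_date": None,
     "event_dates": [date(2017, 2, 2)]},
    {"id": "p3", "death_date": date(2018, 6, 30),
     "event_dates": [date(2018, 6, 1)]},
]


def resurrected(patient):
    """True if any event is dated after the recorded date of death."""
    death = patient["death_date"]
    if death is None:
        return False
    return any(d > death for d in patient["event_dates"])


flagged = [p["id"] for p in patients if resurrected(p)]
rate = len(flagged) / len(patients)
print(flagged, f"{rate:.0%}")
```

Flagged records still need manual review: the error may lie in the death date, in the event date, or in record linkage.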
5
Q

What is the importance of reproducibility in health research?

A

The research community is struggling to ensure the transparency and correctness of published research.
* The reasons are complex and interrelated (positive bias, intractable analysis, deluge of journals)

6
Q

What is the concept of a Learning Health System?

A

A ‘learning health system’ (LHS) continuously analyses data which is collected as part of routine care to monitor outcomes, identify improvements in care, and implement changes on the basis of evidence.

  • Persistent issues with clinical research
    • Hard to identify subjects
    • Complex, costly CRFs with duplicate data entry
    • Funding not cost-effective
    • Integrated approach needed between clinical trials and observational studies
  • Secondary problem: Diagnostic error
    • 60% of litigation claims against GPs (UK/EU/US)
    • Failure of Decision Support Systems for Diagnosis
  • System increasingly data-driven!
  • Fundamentally a cross-disciplinary challenge
7
Q

What are the functions of a Learning Health System?

A

The defining functions of an LHS are to:
1. routinely and securely aggregate data from disparate sources
2. convert the data to knowledge
3. disseminate that knowledge, in actionable forms, to everyone who can benefit from it.

8
Q

What is Classification?

A

**Classification** – A systematic representation of terms and concepts and the relationships between them.
* The apple is the fruit of the APPLE TREE, which is part of the ROSE family.
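
The apple example can be written as a small data structure to show what "terms plus relationships" means in practice. This is only a sketch: the `RELATIONS` table and the `chain` helper are hypothetical names, and the relationship labels are taken from the sentence above.

```python
# Each term maps to (relationship, related term), following the apple example.
RELATIONS = {
    "apple": ("fruit of", "apple tree"),
    "apple tree": ("part of", "rose family"),
}


def chain(term):
    """Follow recorded relationships from a term until none is left."""
    path = [term]
    while term in RELATIONS:
        _, term = RELATIONS[term]
        path.append(term)
    return path


print(chain("apple"))
```

A plain list of agreed names could not answer "what family does the apple belong to?"; the recorded relationships are what make that question computable.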

9
Q

Possible sources of bias?

A

Possible sources of bias:
1. Health care system bias
   a. Reimbursement system, pay for performance (why record BMI of a thin person?)
   b. Role of clinician in the health care system; gatekeeping/non-gatekeeping
   c. Professional guidelines for recording (UK’s Quality and Outcomes Framework)
   d. Ease of access by patients to their records
   e. Data sharing between health care providers
2. Practice workload
3. Variations between EHR system functionalities and layout
4. Coding systems and thesauri
5. Knowledge and education regarding the use of electronic health record systems
6. Data extraction tools
7. Data processing – re-databasing
8. Research dataset preparation
9. Research methodologies

10
Q

What is Nomenclature?

A

Nomenclature (vocabulary) – An agreed system of assigned names.

11
Q

What are the anonymisation techniques for patient data?

A
  • Quantitative data
    • Remove or aggregate variables
    • Reduce the precision or detailed textual meaning of a variable
    • Take care with relational data, where connections between variables in related datasets can
      disclose identities
    • Take care with geo-referenced data, where identifying spatial references also have a geographical value
  • Qualitative data
    • Identifiers should not be crudely removed or aggregated, as this can distort the data or even make
      them unusable
    • Pseudonyms, replacement terms or vaguer descriptors should be used
  • The objective: a reasonable level of anonymisation whilst maintaining maximum content.
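
The techniques above (removing variables, reducing precision, pseudonyms) can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the field names, the salt value, the 10-year age bands, and the use of a salted SHA-256 hash as the pseudonym. It is not a vetted de-identification procedure.

```python
import hashlib

SALT = "replace-with-a-secret-value"  # hypothetical; keep secret in practice


def pseudonym(identifier, salt=SALT):
    """Replace an identifier with a stable pseudonym (salted SHA-256, truncated)."""
    digest = hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()
    return "P-" + digest[:10]


def age_band(age, width=10):
    """Reduce precision: report a band such as '40-49' instead of the exact age."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def coarsen_postcode(postcode):
    """Reduce geographic precision: keep only the part before the space."""
    return postcode.strip().split(" ")[0]


def anonymise(record):
    """Drop direct identifiers (name) and coarsen the remaining variables."""
    return {
        "pid": pseudonym(record["patient_id"]),
        "age_band": age_band(record["age"]),
        "area": coarsen_postcode(record["postcode"]),
        "diagnosis": record["diagnosis"],
    }


# Made-up record for demonstration only.
record = {"patient_id": "0000000000", "name": "Jane Doe", "age": 47,
          "postcode": "AB1 2CD", "diagnosis": "asthma"}
safe = anonymise(record)
print(safe)
```

The band width and the retained part of the postcode are the knobs that trade anonymisation against content: wider bands disclose less but also support fewer analyses.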
12
Q

Obstacles in Big Data collection

A
  • Restrictive policies on data access
  • Lack of standard policy on patient data privacy/confidentiality
  • No international standardisation on data collection routes
  • Licences for access to data can be expensive
13
Q

What legislation has been passed?

A

Data Protection Act 1998
* Provisions for secure processing of identifiable data for medical research
* No definitions of “secure” or “medical research”
* Led to the consent-or-anonymise approach
  * According to the Information Commissioner’s Office (ICO) anonymisation code
Health and Social Care Act 2002
Section 251 of the NHS Act 2006
* Provisions for allowing linkage of patient-identifiable data
* Applications made to the Health Research Authority (HRA)
* NHS Information Centre for Health and Social Care (NHSIC)
  * Application assistance
  * Trusted third party for data linkage

14
Q

Challenges of Research Data Management

A
15
Q

Threats to reproducible science?

A
16
Q

What are the standards to ensure reproducibility?

A

Traceability and accountability of research data are essential in clinical research.
Standards include:
* GxP (including Good Clinical Data Management Practice and Good Clinical Practice)
* CONSORT for trial reporting
* CDISC ADaM – documents each derived variable
* STROBE for reporting observational studies
* RECORD (REporting of studies Conducted using Observational Routinely-collected Data), an evolution of STROBE

17
Q

Properties of learning health systems?

A
  • Every consenting patient’s characteristics and experience are available to learn from
  • Best practice knowledge is immediately available to support decisions
  • Improvement is continuous through ongoing study
  • An infrastructure enables this to happen routinely and with economy of scale
  • All of this is part of the culture