ERS43 Health Information In The Era Of Big Data Flashcards

Question 1

Q

Data, Information, Knowledge

Answer

A

Real world
—(Collection, Coding)—>
Data
—(Processing, Analysis, Interpretation, Presentation)—>
Information
—(Judgement, Conclusions)—>
Knowledge
—(Politics, Commitment)—>
Decision and Action

Question 2

Q

Key issues with Data

Answer

A

Validity (reflect reality?)
Reliability
Completeness
Timeliness
Analysis
Confidentiality
Information governance

Question 3

Q

Data linkage: Shared health records

Answer

A

Advantages

Timely and accurate information for care
Reduce duplication of tests / treatment
Reduce medical errors
Improve disease surveillance + monitoring of public health
Gather comprehensive statistics for formulating public health policy
Efficiency gains / Reduce cost from health expenditure

Question 4

Q

Routine data

Answer

A

Continually collected, assembled, made available repeatedly (not one-off)
Part of data collection system conducted at regular intervals
—> Track ***trends over time
Information coded according to
—> ***Well-defined protocols, standards to allow comparisons (e.g. with other countries / population / over time)
—> e.g. International Classification of Diseases standards

Demography
- basic characteristics of population e.g. age, sex, geographical distribution
- ***Census / Population registers
—> conducted every 10 years
—> Gold standard (in terms of completeness)
—> Disadvantages: Self-reported, Under-reporting, Problems with small area estimates, Outdated, Expensive
Vital statistics
- systematically tabulated information e.g. births, marriages, deaths
- **Birth records
- **Mortality: causes, distribution (by time, person, place)
—> most reliable health data (∵ death is unambiguous)
—> causes often inaccurate / incomplete (hard to determine exact cause)
—> insensitive measure of health —> non-fatal disease burden not reflected
Morbidity
- prevalence, incidence of diseases
- **Infectious disease notifications
—> notifiable diseases
—> generally adequate for monitoring trends but sometimes incomplete
- **Disease registers (e.g. HK cancer registry)
—> identify a specific group
—> may miss cases due to no contact / non-identification
- ***Impairment, disability, handicap
—> functional status more relevant to patient (compared with disease status)
—> collected only from surveys
Health services data
- access and supply, utilisation, activity, costs of using health services
- e.g. diagnoses, interventions, procedures, outcomes
- relevant if the condition results in health care use e.g. fracture
- data likely to be ***incomplete, poor quality
- record health service activity rather than outcomes / effectiveness e.g. disease burden
Health-related characteristics / risk factors e.g. deprivation, living conditions, employment, housing
- data from other agencies: social care, labour, housing etc.
- ***limited use: categories / definitions may be incompatible between different data sets
- incomplete data, poor quality

Question 5

Q

Specially collected data

Answer

A

Collected for a particular purpose
—> Fulfil a specific ***time-limited study
Without intention of regular repetition / adherence to specific standards (outside of study needs)
Information coded according to
—> ***Task at hand, may not conform to international standards —> difficult to compare with other data
—> e.g. Research, Commissioned studies

Question 6

Q

Disease registers

Answer

A

Collect details of all diagnosed cases
***Reliable identification of cases: inclusion criteria with a defined population
***Continually updated (e.g. recovered, died, moved away)
***Expensive to maintain
Require multiple data sources for case ascertainment + exclude duplication
***Useful for incidence rates, survival, remission, trends, making projections
Linkage to other records e.g. health care events, co-morbidities, medication, spending, lifestyle
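
A minimal sketch of case ascertainment from several sources, followed by an incidence rate. It assumes every source shares a patient identifier; the names `CaseReport`, `ascertain_cases` and `incidence_rate_per_100k`, and all the data, are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseReport:
    patient_id: str      # assumed identifier shared by all sources
    source: str          # e.g. "hospital", "laboratory"
    diagnosis_year: int


def ascertain_cases(reports):
    """Merge reports from several sources, keeping one record per patient.

    If a patient is reported more than once, the earliest diagnosis year
    is kept so the case is counted as incident in the right year.
    """
    register = {}
    for report in reports:
        existing = register.get(report.patient_id)
        if existing is None or report.diagnosis_year < existing.diagnosis_year:
            register[report.patient_id] = report
    return register


def incidence_rate_per_100k(register, year, population_at_risk):
    """New cases diagnosed in `year` per 100,000 population at risk."""
    new_cases = sum(1 for r in register.values() if r.diagnosis_year == year)
    return new_cases * 100_000 / population_at_risk


# Made-up reports: P001 and P003 are each reported by two sources.
reports = [
    CaseReport("P001", "hospital", 2020),
    CaseReport("P001", "laboratory", 2020),
    CaseReport("P002", "laboratory", 2020),
    CaseReport("P003", "death_certificate", 2021),
    CaseReport("P003", "hospital", 2020),
]
register = ascertain_cases(reports)
rate_2020 = incidence_rate_per_100k(register, 2020, population_at_risk=50_000)
```

In practice a shared identifier is often unavailable and records have to be matched on name, date of birth and similar fields, which is harder than this sketch suggests.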

Question 7

Q

Health data in HK

Answer

A

Clinical Iceberg (most people do not seek health services)
—> does not capture whole population disease burden

E.g. HA Clinical Management System

demographic data
health service activity
diagnosis, procedures codes: ICD standards (allow comparison)
laboratory / pathology results
radiology imaging + reports
clinical notes: structured (coded) / semi-structured / unstructured (free text, require mining to extract information)
medications record
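
A rough sketch of what "mining" free-text notes can mean in the simplest case: pulling HbA1c percentages out of a note with a regular expression. The pattern, the function name `extract_hba1c` and the sample note are assumptions for illustration; real clinical text is far messier and usually needs dedicated NLP tools.

```python
import re

# Assumed pattern: "HbA1c", an optional connector word or symbol,
# a number, then a percent sign.
HBA1C_PATTERN = re.compile(
    r"hba1c\s*(?:is|of|=|:)?\s*(\d+(?:\.\d+)?)\s*%",
    re.IGNORECASE,
)


def extract_hba1c(note: str) -> list[float]:
    """Return every HbA1c percentage mentioned in a free-text note."""
    return [float(value) for value in HBA1C_PATTERN.findall(note)]


# Made-up clinical note.
note = "Seen in clinic. HbA1c 7.2% today, previously HbA1c: 8.1 %. BP stable."
values = extract_hba1c(note)
```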

Question 8

Q

Diagnostic coding for looking at diseases

Answer

A

Standardised codes of diagnoses for ***accurate comparison (with other places etc.)
Categorised for analysis
e.g. Diseases, Disorders, Symptoms, Injuries, Procedures

Examples:

ICD9, ICD10
ICPC-2 (primary care), DSM (psychiatry), SNOMED CT (comprehensive clinical terminology, e.g. symptoms, findings, procedures)
**Use:
1. Epidemiology - clinical burden of disease, risk factors
2. Financing, reimbursement
3. Health service planning / resource allocation
4. Evaluation of services

Limitations:

Only covers people who have the disease and use services (not the full picture of morbidity)
Depends on accuracy / completeness of coding
Differing coding practices in different places (e.g. coding changed to increase reimbursement)
Expensive, time-consuming
Changes in case definitions across time / place
Historical comparisons require mapping between ICD versions (e.g. from ICD-9 to ICD-10)
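
A toy sketch of why mapping between ICD versions complicates historical comparisons: one ICD-9 category can correspond to several ICD-10 categories. The table below is hand-written for illustration and is not an official crosswalk; the function names are hypothetical.

```python
# Illustrative, hand-written crosswalk at 3-character category level.
# NOT an official mapping table.
ICD9_TO_ICD10 = {
    "001": ["A00"],                               # cholera (one-to-one)
    "250": ["E10", "E11", "E12", "E13", "E14"],   # diabetes mellitus (one-to-many)
}


def map_icd9_to_icd10(icd9_code):
    """Return candidate ICD-10 categories for an ICD-9 code.

    Only the 3-digit category is used; an empty list means the code is
    missing from this toy table.
    """
    return ICD9_TO_ICD10.get(icd9_code[:3], [])


def is_ambiguous(icd9_code):
    """True if the ICD-9 code maps to more than one ICD-10 category."""
    return len(map_icd9_to_icd10(icd9_code)) > 1


diabetes_targets = map_icd9_to_icd10("250.00")
```

A one-to-many mapping means counts coded under the old version cannot be split across the new categories without extra clinical information.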

Question 9

Q

Surveys

Answer

A

Previous surveys: Local / National
- readily available
- may be authoritative
- may not be generalisable to the specific population of interest —> requires “modelling” assumptions
- ***variable quality: self-reported? representativeness?
- Thematic Household Surveys: chronic disease, insurance, service utilisation
- Behavioural Risk Factor Surveillance System (BRFSS) (by CHP): smoking, alcohol, diet etc.
Commissioned surveys
- **tailor-made, expensive esp. from scratch
- **more relevant —> ∵ collect specific information of interest

Question 10

Q

Qualitative data

Answer

A

Local description of environmental / social factors

may give good understanding / stimulate further research
***difficult to assess scale of health impact of identified problems (∵ lack quantitative data)

Important to assess:

**People’s perception of how health problems affect them
could identify issues important to people
qualitative data need careful handling e.g. context, unstable responses if question wording is inconsistent
e.g. patient feedback

Question 11

Q

***Summary of Routine data Pros + Cons

Answer

A

Pros:

***Readily available, Lower cost (∵ already done)
Useful for ***initial assessment (baseline of expected levels of health / disease)
Identify important issues / hypotheses for further research

Cons:

***Not up-to-date, Less complete (except Census)
***Collected for a different purpose, so may not include specific variables of interest or report on specific populations
***Not reliable e.g. subject to political influence / manipulation
Data linkage may not be possible (∵ cannot access raw data)
***Individual level data inaccessible

Question 12

Q

Alternative to Routine data: Research studies

Answer

A

Ecological studies
Cross-sectional surveys
Cohort studies
Other commissioned studies

Question 13

Q

Application of data: Diabetes

Answer

A

Question: How many people in HK have diabetes?

Answers:

Published studies by academia, government (CHP, household surveys), NGOs
Lab results: HbA1c, OGTT, Fasting glucose etc.
Diagnosis coding
Diabetic medication prescriptions
Self-reported diagnoses
Attendances at diabetic-specific clinics
Population denominator (time / person / place): Census, Population projections (determine population at risk to find out prevalence / incidence)

Limitations:

Completeness (undiagnosed cases, private healthcare system e.g. GP)
Data linkage (double counting of patients) between data sets
Matching numerator (no. of cases) and denominator (population at risk) across different time, area, population
Information governance
- patient confidentiality, data security, consent, ethical approval
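
A minimal sketch of matching numerator and denominator when estimating prevalence from linked data sets. It assumes patient IDs are consistent across data sets and that the census uses the same age bands as the case records; `prevalence_by_age_band` and all the figures are hypothetical.

```python
from collections import defaultdict


def prevalence_by_age_band(records, population):
    """Return {age_band: prevalence} from linked case records.

    records:    iterable of (patient_id, age_band) pairs; the same patient
                may appear in several data sets.
    population: {age_band: population at risk}, e.g. from a census.
    """
    cases = defaultdict(set)
    for patient_id, age_band in records:
        cases[age_band].add(patient_id)   # a set removes double counting
    missing = set(cases) - set(population)
    if missing:
        # Numerator has a group with no matching denominator.
        raise ValueError(f"no denominator for age bands: {sorted(missing)}")
    return {band: len(cases[band]) / population[band] for band in population}


# Made-up records pooled from diagnosis codes, prescriptions and lab results.
records = [
    ("A", "40-64"), ("A", "40-64"), ("B", "40-64"),
    ("C", "65+"), ("D", "65+"), ("D", "65+"),
]
population = {"40-64": 20, "65+": 10}
prevalence = prevalence_by_age_band(records, population)
```

Undiagnosed cases and patients seen only in private care never reach the numerator, so an estimate like this is a lower bound.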

Question 14

Q

Worldwide trends in diabetes

Answer

A

Prevalence ↑
- ∵ population ageing, growth, chronic incurable disease
Disease burden ↑ (in terms of prevalence / number of people affected)
—> ↑ faster in low/ middle income countries (than in high income countries)
—> e.g. higher proportion of deaths
Most people with diabetes from low / middle income countries
- even though prevalence higher in high income countries
Incidence beginning to stabilise in high income countries e.g. HK

Question 15

Q

Diabetes in China, HK

Answer

A

China: Absolute number highest in world, among highest in prevalence

HK: similar prevalence (~11%)

Question 16

Q

Big data

Answer

A

Volume
Variety (e.g. texts, images, numbers)
Velocity (lots of data collected continuously)
Veracity (uncertainty of data)
Value (is data valuable?)

Question 17

Q

Use of big data

Answer

A

***Predictive analytics

Classical:

quantitative risk prediction
based on classical statistical learning
from more structured data sources

Current:
- **Digitalisation of health-related records and data sharing
—> bigger / more variety of data sets
—> cover more people
- Availability of AI / deep learning to analyse **heterogeneous data sets
—> e.g. strengths of digital imaging over human interpretation

Goals:

***Generate new models, predictions
***Improve decision making

Question 18

Q

Decision making

Answer

A

During clinical decisions (esp. difficult decisions)
—> Clinicians should view the output as only a statistical prediction and maintain suspicion
—> Prediction may be wrong

Statistical performance of Risk prediction models (e.g. QRISK3) —> measured by:
1. **Discrimination (Do patients with the outcome have a higher predicted risk than those without?) (i.e. can the model tell higher-risk patients apart from everyone else?)
2. **Calibration (Does the predicted risk match the observed number of outcomes?) (i.e. is the predicted risk accurate?)
—> Studies usually have better Discrimination than Calibration
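
A small sketch of the two measures, assuming discrimination is summarised by the c-statistic (equivalent to the area under the ROC curve) and calibration by the observed/expected ratio. These are common choices, not necessarily the ones used for any particular model, and the data are made up.

```python
def c_statistic(risks, outcomes):
    """Discrimination: probability that a patient with the outcome was given
    a higher predicted risk than a patient without it (ties count half)."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def observed_expected_ratio(risks, outcomes):
    """Calibration-in-the-large: observed events / expected events.

    1.0 is ideal; above 1 means the model under-predicts risk.
    """
    return sum(outcomes) / sum(risks)


# Made-up predictions: the ranking is mostly right, but risks are too low.
risks = [0.1, 0.2, 0.3, 0.4, 0.5, 0.5]
outcomes = [0, 0, 1, 0, 1, 1]

c = c_statistic(risks, outcomes)                 # 8 of 9 pairs concordant
oe = observed_expected_ratio(risks, outcomes)    # 3 observed vs 2 expected
```

Here the model ranks patients well (c about 0.89) yet under-predicts the number of events (O/E of 1.5), so it discriminates better than it calibrates.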

Question 19

Q

Axes of Machine learning and Big data

Answer

A

Traditional clinical studies:

analyse data from many patients using a statistical model
low on machine learning spectrum
Analyse data

Deep learning models:

top of spectrum
generative adversarial networks —> can ***generate new images from learning a large database of images
Analyse data + ***Generate new data (i.e. Predictive models)

Question 20

Q

Trust “black box”?

Answer

A

1. Machine learning algorithms
—> more complicated they are
—> more powerful / results better
—> but the more opaque they are compared with classical statistical models (harder to explain how the output is reached)
—> less easy to interpret

Developers reluctant to report algorithms ∵ proprietary
***Infeasible to interpret hidden features (∵ output depends on complex interactions with uninterpreted features in other layers)

Limitations:
1. ***Biases in training data
—> if the training data have inherent biases (and the reasons for them are not accounted for)
—> AI will preserve the biases
—> generate data with biases
—> data inaccuracy, missingness, selective measurement even though more data is available
—> be careful of performance

***Privacy of health care data
- comparing anonymised data with public information (e.g. Google Images)
—> can re-identify anonymised people
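
A minimal sketch of why removing names does not guarantee anonymity. It uses a k-anonymity style check, a related technique not named in these notes: records whose combination of quasi-identifiers is rare can be singled out by linking to outside information. Field names, function names and data are all hypothetical.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifiers."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def records_at_risk(records, quasi_identifiers, k=2):
    """Return records whose quasi-identifier combination is shared by fewer
    than k records, i.e. those easiest to single out by linkage."""
    sizes = group_sizes(records, quasi_identifiers)
    return [
        r for r in records
        if sizes[tuple(r[q] for q in quasi_identifiers)] < k
    ]


# Made-up "anonymised" extract: names removed, quasi-identifiers kept.
records = [
    {"age_band": "60-69", "sex": "F", "district": "Sha Tin", "diagnosis": "diabetes"},
    {"age_band": "60-69", "sex": "F", "district": "Sha Tin", "diagnosis": "asthma"},
    {"age_band": "30-39", "sex": "M", "district": "Sai Kung", "diagnosis": "epilepsy"},
]
at_risk = records_at_risk(records, ["age_band", "sex", "district"])
```

The third record is unique on age band, sex and district, so anyone who knows those three facts about a person could learn the diagnosis.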