ERS43 Health Information In The Era Of Big Data Flashcards

1
Q

Data, Information, Knowledge

A
Real world
—(Collection, Coding)—>
Data
—(Processing, Analysis, Interpretation, Presentation)—>
Information
—(Judgement, Conclusions)—>
Knowledge
—(Politics, Commitment)—>
Decision and Action
2
Q

Key issues with Data

A
  1. Validity (reflect reality?)
  2. Reliability
  3. Completeness
  4. Timeliness
  5. Analysis
  6. Confidentiality
  7. Information governance
3
Q

Data linkage: Shared health records

A

Advantages

  1. Timely and accurate information for care
  2. Reduce duplication of tests / treatment
  3. Reduce medical errors
  4. Improve disease surveillance + monitoring of public health
  5. Gather comprehensive statistics for formulating public health policy
  6. Efficiency gains / Reduce cost from health expenditure
4
Q

Routine data

A
  • Continually collected, assembled, made available repeatedly (not one-off)
  • Part of data collection system conducted at regular intervals
    —> Track ***trends over time
  • Information coded according to
    —> ***Well-defined protocols, standards to allow comparisons (e.g. with other countries / population / over time)
    —> e.g. International Classification of Diseases standards
  1. Demography
    - basic characteristics of population e.g. age, sex, geographical distribution
    - ***Census / Population registers
    —> conducted every 10 years
    —> Gold standard (in terms of completeness)
    —> Disadvantage: Self-reported, Under-reporting, Problems with small area estimates, Outdated, Expensive
  2. Vital statistics
    - systematically tabulated information e.g. births, marriages, deaths
    - **Birth records
    - **Mortality: causes, distribution (by time, person, place)
    —> most reliable health data (∵ death is unambiguous)
    —> causes often inaccurate / incomplete (hard to determine exact cause)
    —> insensitive measure of health —> non-fatal disease burden not reflected
  3. Morbidity
    - prevalence, incidence of diseases
    - **Infectious disease notifications
    —> notifiable diseases
    —> generally adequate for monitoring trends but sometimes incomplete
    - **Disease registers (e.g. HK cancer registry)
    —> identify a specific group
    —> may miss cases due to no contact / non-identification
    - ***Impairment, disability, handicap
    —> functional status more relevant to patient (compared with disease status)
    —> collected only from surveys
  4. Health services data
    - access and supply, utilisation, activity, costs of using health services
    - e.g. diagnoses, interventions, procedures, outcomes
    - relevant if the condition results in health care use e.g. fracture
    - data likely to be ***incomplete, poor quality
    - record health service activity rather than outcomes / effectiveness e.g. disease burden
  5. Health-related characteristics / risk factors e.g. deprivation, living conditions, employment, housing
    - data from other agencies: social care, labour, housing etc.
    - ***limited use: categories / definitions may be incompatible between different data sets
    - incomplete data, poor quality
5
Q

Specially collected data

A
  • Collected for a particular purpose
    —> Fulfil a specific ***time-limited study
  • Without intention of regular repetition / adherence to specific standards (outside of study needs)
  • Information coded according to
    —> ***Task at hand, may not conform to international standards —> difficult to compare with other data
    —> e.g. Research, Commissioned studies
6
Q

Disease registers

A
  • Collect details of all diagnosed cases
  • ***Reliable identification of cases: inclusion criteria with a defined population
  • ***Continually updated (e.g. recovered, died, moved away)
  • ***Expensive to maintain
  • Require multiple data sources for case ascertainment + exclude duplication
  • ***Useful for incidence rates, survival, remission, trends, making projections
  • Linkage to other records e.g. health care events, co-morbidities, medication, spending, lifestyle
7
Q

Health data in HK

A
Clinical Iceberg (most people with illness do not seek health services)
—> does not capture whole population disease burden

E.g. HA Clinical Management System

  • demographic data
  • health service activity
  • diagnosis, procedures codes: ICD standards (allow comparison)
  • laboratory / pathology results
  • radiology imaging + reports
  • clinical notes: structured (coded) / semi-structured / unstructured (free text, require mining to extract information)
  • medications record
8
Q

Diagnostic coding for looking at diseases

A
  • Standardised codes of diagnoses for ***accurate comparison (with other places etc.)
  • Categorised for analysis
  • e.g. Diseases, Disorders, Symptoms, Injuries, Procedures

Examples:

  • ICD9, ICD10
  • ICPC-2 (primary care), DSM (psychiatry), SNOMED CT (comprehensive clinical terminology: findings, symptoms, diagnoses, procedures)
  • **Use:
    1. Epidemiology - clinical burden of disease, risk factors
    2. Financing, reimbursement
    3. Health service planning / resource allocation
    4. Evaluation of services

Limitations:

  • Only for those with disease + Use services (not full picture of morbidity)
  • Depends on accuracy / completeness of coding
  • Differing coding practices in different places (coding may shift in response to financial incentives)
  • Expensive, time-consuming
  • Changes in case definitions across time / place
  • Historical comparisons require mapping between ICD versions (e.g. ICD-9 to ICD-10)
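
As a rough illustration of how standardised codes are "categorised for analysis", the sketch below groups a few ICD-10 codes by their three-character category and counts episodes per group. The `GROUPS` mapping, the function names and the episode list are made up for illustration. Only the category codes themselves (E10 to E14 diabetes mellitus, I50 heart failure, S72 fracture of femur) follow the WHO ICD-10 classification.

```python
from collections import Counter

# Hypothetical analysis groups keyed by ICD-10 three-character categories.
GROUPS = {
    "Diabetes mellitus": {"E10", "E11", "E12", "E13", "E14"},
    "Heart failure": {"I50"},
    "Fracture of femur": {"S72"},
}


def group_of(icd10_code: str) -> str:
    """Return the analysis group for an ICD-10 code, or 'Other' if unmapped."""
    category = icd10_code.strip().upper()[:3]  # "E11.9" becomes "E11"
    for name, categories in GROUPS.items():
        if category in categories:
            return name
    return "Other"


def count_by_group(codes):
    """Count coded episodes per analysis group."""
    return Counter(group_of(code) for code in codes)


# Made-up coded episodes, for illustration only.
episodes = ["E11.9", "E10.1", "I50.0", "S72.0", "J45.9", "e11.2"]
counts = count_by_group(episodes)
for name, n in counts.most_common():
    print(name, n)
```

The counts are only as good as the coding behind them: a miscoded or uncoded episode lands in the wrong group or in "Other", which is the accuracy / completeness limitation listed above.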
9
Q

Surveys

A
  1. Previous surveys: Local / National
    - readily available
    - may be authoritative
    - may not be generalisable to specific population of interest —> require “modelling” assumption
    - ***variable quality: self-reported? representativeness?
    - Thematic Household Surveys: chronic disease, insurance, service utilisation
    - Behavioural Risk Factor Surveillance System (BRFSS) (by CHP): smoking, alcohol, diet etc.
  2. Commissioned surveys
    - **tailor-made, expensive esp. from scratch
    - **more relevant (∵ collects the specific information of interest)
10
Q

Qualitative data

A

Local description of environmental / social factors

  • may give good understanding / stimulate further research
  • ***difficult to assess scale of health impact of identified problems (∵ lack quantitative data)

Important to assess:

  • **People’s perception of how health problems affect them
  • could identify issues important to people
  • qualitative data need careful handling e.g. context, unstable responses if question wording is inconsistent
  • e.g. patient feedback
11
Q

***Summary of Routine data Pros + Cons

A

Pros:

  1. ***Readily available, Lower cost (∵ already done)
  2. Useful for ***initial assessment (baseline of expected levels of health / disease)
  3. Identify important issues / hypotheses for further research

Cons:

  1. ***Not up-to-date, Less complete (except Census)
  2. ***Collected for a different purpose, so may not include the specific variables of interest or report on specific populations
  3. ***Not reliable e.g. subject to political influence / manipulation
  4. Data linkage may not be possible (∵ cannot access raw data)
  5. ***Individual level data inaccessible
12
Q

Alternative to Routine data: Research studies

A
  1. Ecological studies
  2. Cross-sectional surveys
  3. Cohort studies
  4. Other commissioned studies
13
Q

Application of data: Diabetes

A

Question: How many diabetics in HK?

Answers:

  1. Published studies by academia, government (CHP, household surveys), NGOs
  2. Lab results: HbA1c, OGTT, Fasting glucose etc.
  3. Diagnosis coding
  4. Diabetic medication prescriptions
  5. Self-reported diagnoses
  6. Attendances at diabetic-specific clinics
  7. Population denominator (time / person / place): Census, Population projections (determine population at risk to find out prevalence / incidence)

Limitations:

  1. Completeness (undiagnosed cases, private healthcare system e.g. GP)
  2. Data linkage (double counting of patients) between data sets
  3. Matching numerator (no. of cases) and denominator (population at risk) across different time, area, population
  4. Information governance
    - patient confidentiality, data security, consent, ethical approval
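
A minimal sketch of the numerator / denominator arithmetic, assuming every source carries the same reliable patient identifier (in practice this is often the hard part of data linkage). The identifiers, source names and population figure are hypothetical. Taking the union of the sources removes double counting before dividing by the population at risk.

```python
def estimate_prevalence(case_sources, population_at_risk):
    """Return (distinct cases, prevalence) from several sets of patient IDs.

    Prevalence = distinct cases / population at risk.
    """
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    # The set union counts a patient once even if several sources list them.
    distinct_cases = set().union(*case_sources)
    return len(distinct_cases), len(distinct_cases) / population_at_risk


# Hypothetical patient identifiers from three data sources.
diagnosis_codes = {"P001", "P002", "P003"}
prescriptions = {"P002", "P003", "P004"}
lab_results = {"P003", "P005"}
sources = [diagnosis_codes, prescriptions, lab_results]

# Adding up the sources without linkage counts some patients more than once.
naive_total = sum(len(source) for source in sources)

n_cases, prevalence = estimate_prevalence(sources, population_at_risk=50)
print(naive_total, n_cases, prevalence)
```

The estimate still excludes undiagnosed cases and anyone seen only outside the linked sources, so it is a lower bound under these assumptions. The denominator must also describe the same time, place and population as the numerator.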
14
Q

Worldwide trends in diabetes

A
  1. Prevalence ↑
    - ∵ population ageing, growth, chronic incurable disease
  2. Disease burden ↑ (in terms of prevalence / number of people affected)
    —> ↑ faster in low/ middle income countries (than in high income countries)
    —> e.g. higher proportion of deaths
  3. Most people with diabetes from low / middle income countries
    - even though prevalence higher in high income countries
  4. Incidence beginning to stabilise in high income countries e.g. HK
15
Q

Diabetes in China, HK

A

China: Absolute number highest in world, among highest in prevalence

HK: similar prevalence (~11%)

16
Q

Big data

A
  1. Volume
  2. Variety (e.g. texts, images, numbers)
  3. Velocity (lots of data collected continuously)
  4. Veracity (uncertainty of data)
  5. Value (is data valuable?)
17
Q

Use of big data

A

***Predictive analytics

Classical:

  • quantitative risk prediction
  • based on classical statistical learning
  • from more structured data sources

Current:
  • **Digitalisation of health-related records and data sharing
    —> bigger / more variety of data sets
    —> cover more people
  • Availability of AI / deep learning to analyse **heterogeneous data sets
    —> e.g. strengths of digital imaging over human interpretation

Goals:

  1. ***Generate new models, predictions
  2. ***Improve decision making
18
Q

Decision making

A

During clinical decisions (esp. difficult decisions)
—> Clinicians should view output as only statistical prediction and maintain suspicion
—> Prediction may be wrong

Statistical performance of Risk prediction models (e.g. QRISK3) —> measured by:
1. **Discrimination (Do patients with the outcome have higher predicted risk than those without?) (i.e. does the model actually separate higher-risk people from everyone else)
2. **Calibration (Does the predicted risk agree with the observed frequency of outcomes?) (i.e. is the predicted risk accurate)
—> Studies usually have better Discrimination than Calibration
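
The two measures can be sketched in plain Python. This is an illustration under stated assumptions, not how any particular model such as QRISK3 is validated. Discrimination is shown as the c-statistic (the share of event / non-event pairs where the event has the higher predicted risk). Calibration is shown only in its crudest form, the ratio of observed to expected events. The function names and the six predicted risks are made up.

```python
def c_statistic(risks, outcomes):
    """Discrimination: chance that a patient with the outcome has a higher
    predicted risk than a patient without it (ties count as half)."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def observed_expected_ratio(risks, outcomes):
    """Crude calibration: observed events divided by the sum of predicted risks.
    A value near 1 means the total predicted number of events is about right."""
    return sum(outcomes) / sum(risks)


# Made-up predicted risks and observed outcomes (1 = outcome occurred).
risks = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8]
outcomes = [0, 0, 1, 0, 1, 1]

# Halving every risk keeps the ranking of patients but changes the risk level.
halved_risks = [r / 2 for r in risks]

print(c_statistic(risks, outcomes), observed_expected_ratio(risks, outcomes))
print(c_statistic(halved_risks, outcomes), observed_expected_ratio(halved_risks, outcomes))
```

Halving the risks leaves the c-statistic unchanged because it depends only on ranking, while the observed / expected ratio doubles. This is one way a model can discriminate well yet be poorly calibrated.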

19
Q

Axes of Machine learning and Big data

A

Traditional clinical studies:

  • analyse data from many patients using a statistical model
  • low on machine learning spectrum
  • Analyse data

Deep learning models:

  • top of spectrum
  • generative adversarial networks —> can ***generate new images from learning a large database of images
  • Analyse data + ***Generate new data (i.e. Predictive models)
20
Q

Trust “black box”?

A
1. Machine learning algorithms
—> more complicated they are
—> more powerful / results better
—> but more opaque they are than classical statistical models (harder to give explanation how to arrive at output)
—> less easy to interpret
2. Developers reluctant to report algorithms ∵ proprietary
3. ***Infeasible to interpret hidden features (∵ output depends on complex interactions with uninterpreted features in other layers)

Limitations:
1. ***Biases in training data
—> if the data have inherent biases (and the reasons for them are not accounted for)
—> AI will preserve the biases
—> and generate outputs with the same biases
—> data inaccuracy, missingness and selective measurement persist even when more data is available
—> be careful of performance

2. ***Privacy of health care data
—> anonymised data can be compared with public information (e.g. Google image search)
—> anonymised people can be re-identified
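
To show why removing names alone may not protect privacy, here is a hedged sketch of a simple linkage attack. The card's example is image search. This sketch swaps in tabular quasi-identifiers (birth year, sex, district), which is a different but related route to re-identification. All records, field names and the `reidentify` helper are hypothetical.

```python
QUASI_IDENTIFIERS = ("birth_year", "sex", "district")


def reidentify(anonymised_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Link anonymised records to named public records on shared attributes.

    Only unambiguous links are returned: a record is re-identified when
    exactly one public person has the same combination of attributes.
    """
    index = {}
    for person in public_rows:
        combo = tuple(person[k] for k in keys)
        index.setdefault(combo, []).append(person["name"])

    matches = {}
    for row in anonymised_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:
            matches[row["record_id"]] = candidates[0]
    return matches


# Hypothetical "anonymised" health records: names removed, attributes kept.
anonymised = [
    {"record_id": "R1", "birth_year": 1950, "sex": "F", "district": "District X", "diagnosis": "diabetes"},
    {"record_id": "R2", "birth_year": 1980, "sex": "M", "district": "District Y", "diagnosis": "asthma"},
]

# Hypothetical public information carrying names and the same attributes.
public = [
    {"name": "Person A", "birth_year": 1950, "sex": "F", "district": "District X"},
    {"name": "Person B", "birth_year": 1980, "sex": "M", "district": "District Y"},
    {"name": "Person C", "birth_year": 1980, "sex": "M", "district": "District Y"},
]

linked = reidentify(anonymised, public)
print(linked)
```

Record R1 is re-identified because its combination of attributes is unique in the public data. R2 is not, because two people share its combination. The rarer the combination, the weaker the anonymisation.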