lecture 2 Flashcards

1
Q

Why health and biomedical informatics?

A
  • modern healthcare and biomedical research are information intensive activities
  • the crossover/intersection between health care/biomedical research and information technology
2
Q

What is the demand for specialised workforce?

A
  • people with the ability to combine ICT and biomedical skills and knowledge are in high demand in Australia and around the world
  • A job market study from June 2012 showed that job posts in Health Informatics have increased ten times faster than other health related jobs in recent years
  • need people with specific skills: people able to understand both the language of medicine and the language of informatics technology
  • well paid in many countries
3
Q

What is the rationale?

A

Background
• health care, biomedical research and public health are information intensive activities:
- medical images and clinical records
- DNA sequencing, molecular data
- literature and public databases
- clinical trials, biobanks, GWAS
• new data types (extremely complex and heterogeneous) are being generated at an unprecedented pace

4
Q

How long did/does it take to decode the human genome?

A
  • first decoded in 2003 after one decade of work

  • nowadays it takes one day

5
Q

What is PubMed?

A
  • growing exponentially
  • 5000 biomedical research articles are published daily
  • over 22 million articles
6
Q

What projects has the Human Genome Project led to?

A
  • human microbiome project
  • exposome alliance project
  • ENCODE genome regulation
  • Human Epigenome Project
  • phenome levels (proteome, metabolome)
  • inter and intra individual genetic variation: 1000 genomes, mapping human genetic variation
7
Q

What is Big Data?

A

• global size of “Big Data” in healthcare stands at roughly 150 exabytes (1 exabyte = 10^18 bytes) in 2001, increasing at a rate between 1.2 and 2.4 exabytes per year (SA)

defined by the 4 Vs:
• volume: Data at Rest, terabytes to exabytes of existing data to process
• velocity: data in motion, streaming data, milliseconds to seconds to respond
• variety: data in many forms, structured, unstructured, text, multimedia
• veracity: data in doubt, uncertainty due to data inconsistency and incompleteness, ambiguities, latency, deception, model approximations

8
Q

What is small data?

A
  • our individual digital traces
  • personal devices specifically designed for self-tracking (Fitbit)
  • social networks, search engines, mobile operators, online games, and e-commerce sites that we access every day
  • our everyday behaviours are becoming data
  • data that are just about me, over time
  • data about us, but not being provided to us
  • from chronic pain to depression to memory enhancement and Crohn’s Disease
  • generating evidence where n=me
  • issues: open data, privacy, standards and tools
9
Q

What are issues with informatics?

A
  • how can we efficiently collect, store, search, integrate, analyse and visualise all of this information?
  • how can we use research data to model and simulate human physiology and pathology?
  • how can we facilitate the translation of research findings into clinical solutions?
  • which new information processing methods will be needed to respond to the emerging research approaches?
  • the tools you use are the result of our research
10
Q

How do we connect levels of biomedical information?

A
  • connecting different levels of biomedical information
  • bottom-up approaches (from gene/molecule to environment)
  • top-down approaches (reverse)
  • need to link health informatics (population data, clinical data, patient generated data) with bioinformatics (genomic data, gene expression data, proteomics and metabolomics data)
  • relationship between genotype, phenotype
  • everything must also be connected with the complex interplay between itself and the environment
11
Q

Health vs bioinformatics?

A
  • bioinformatics is different from health informatics
  • increasing opportunities for interaction
  • bioinformatics and computational systems biology
  • health and biomedical informatics
12
Q

Why is working with clinical data so hard? Why is healthcare data different?

A

Humans are a result of evolution - not perfect - many levels

Data about humans that arises from a growing number of sources and contexts:

  • clinical research
  • clinical practice - EHRs
  • patient and disease registries
  • mHealth apps
  • Smart devices and sensors
  • Environmental data
  • Social media data

Why different?
• distributed (EMR, clinical departments) 
• different formats (text, images, numeric, videos) 
• same data exists in different systems 
• patient generated data
• data is structured and unstructured 
• inconsistent/variable definitions 
• new data coming out every day 
• complexity of data (the human body)
• changing regulatory requirements
• privacy issues
13
Q

What is biomedical informatics?

A
  • informatics is the science of information
  • information is data plus meaning
  • biomedical informatics is the science of information as applied to or studied in the context of biomedicine
  • informaticians study information (data + meaning, in contrast to focusing exclusively on data)
  • thus, practitioners must understand the context or domain (biomedicine)
  • IT is different from biomedical informatics
  • IT is basically the technology that we use to process information
  • informatics is about providing meaning to data
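
A minimal sketch of "information is data plus meaning", assuming a hypothetical `Observation` record (the class, its fields and the example values are invented for illustration): the bare number is data, and the unit, analyte and context are what turn it into information.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """A measured value together with the context that gives it meaning."""
    value: float
    unit: str
    analyte: str
    context: str

    def describe(self):
        return f"{self.context} {self.analyte}: {self.value} {self.unit}"


raw = 7.2  # data only: a bare number with no meaning attached

# The same number with meaning attached (hypothetical example values).
obs = Observation(value=7.2, unit="mmol/L", analyte="blood glucose", context="fasting")
print(obs.describe())
```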
14
Q

What are some of the big challenges currently being addressed?

A
  • phenome –> genome * exposome
  • human → environmental sensors, phenomic sensors, genomic sensors
  • environmental sensors → environmental risk factors (pollution, radiation, toxic agents,…)
  • phenomic sensors → physiological, biochemical parameters (cholesterol, temperature, glucose, heart rate…)
  • genomic sensors: biomarkers (DNA sequence, proteins, gene expression, epigenetics)
  • all of these combined → integrated personal health record

challenge:
• how can we measure environmental exposure?
• e.g. diabetes
• measuring the exposome
• environment-wide association study on type 2 diabetes mellitus
• 266 environmental factors
• future: combined: GWAS-EWAS?

15
Q

What is an example of a new way of presenting/visualising data?

A
  • microarrays: analysis of gene expression in cancer
  • how this is translated into survival curve
  • ontologies: systems that are expressed in a standardised way, so you could analyse data from two different places
16
Q

How can we extract knowledge from the literature?

A
  • text mining
  • automatically scan through abstracts and extract complex networks of interrelationships
  • way of filtering information for clinician/researcher
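
A minimal sketch of the idea, assuming a tiny hand-made term list and two invented abstracts (both hypothetical, for illustration only): counting which terms appear together in the same abstract is the simplest way to build a network of interrelationships. Real text-mining systems use far more sophisticated entity recognition.

```python
from collections import Counter
from itertools import combinations

# Hypothetical vocabulary and abstracts, invented purely for illustration.
TERMS = ["braf", "melanoma", "pik3ca", "breast cancer"]

ABSTRACTS = [
    "BRAF mutations are frequently observed in melanoma samples.",
    "PIK3CA mutations were studied in breast cancer; BRAF was also tested.",
]


def cooccurrence(abstracts, terms):
    """Count how many abstracts mention each pair of terms together."""
    pairs = Counter()
    for text in abstracts:
        lowered = text.lower()
        # Naive substring matching stands in for real entity recognition.
        found = sorted(t for t in terms if t in lowered)
        for a, b in combinations(found, 2):
            pairs[(a, b)] += 1
    return pairs


network = cooccurrence(ABSTRACTS, TERMS)
for (a, b), n in sorted(network.items()):
    print(f"{a} -- {b}: {n}")
```

Each counted pair is an edge in the network; a clinician or researcher could then filter the literature by following the edges around a term of interest.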
17
Q

What is increasing every day?

A
  • need for patient-specific decision support assistance
  • the number of facts is beyond the capacity of any human mind
  • traditional health care (i.e. decisions by clinical phenotype) vs decisions that also take into account structural genetics (e.g. SNPs, haplotypes) and functional genetics (gene expression profiles, proteomics and other effector molecules)
  • need to deliver systems that can overcome this, function as reminders, alerts, that can send messages to clinicians saying ‘hey, be careful, this patient could have an adverse drug reaction because there is an incompatibility between x and y’
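
The alert described in the last point can be sketched as a simple rule lookup. This is only an illustration under stated assumptions: the interaction table, the drug names (`drug_x`, `drug_y`, `drug_z`) and the function `check_prescription` are hypothetical placeholders, not real pharmacological data or an existing system.

```python
# Hypothetical interaction table: the drug names and the single rule below
# are placeholders, not real pharmacological data.
INTERACTIONS = {
    frozenset({"drug_x", "drug_y"}): "possible adverse drug reaction",
}


def check_prescription(current_drugs, new_drug):
    """Return alert messages for known conflicts with the new drug."""
    alerts = []
    for drug in current_drugs:
        # frozenset makes the lookup independent of the order of the pair.
        reason = INTERACTIONS.get(frozenset({drug, new_drug}))
        if reason:
            alerts.append(f"Be careful: {new_drug} + {drug}: {reason}")
    return alerts


alerts = check_prescription(["drug_x", "drug_z"], "drug_y")
print(alerts)
```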
18
Q

What is the role of informatics in new taxonomy of disease?

A
  • stratification of disease - ICD 11 - US Nat Academy - Towards precision medicine
  • new taxonomy based on human molecular biology
  • exposome
  • signs and symptoms
  • genome
  • epigenome
  • microbiome
  • other types of patient data
  • individual patients
  • e.g. skin, colon, parathyroid - BRAF mutation
  • MD Anderson Cancer Center - Breast, Ovarian, Uterine, Cervical - PIK3CA mutation trial
  • in the future will classify cancers not in terms of where they are seen but by what is causing them
19
Q

What is network and systems medicine?

A
  • has a role in informatics at all levels
  • personalised and participatory medicine
  • preventative medicine
  • social component of disease
20
Q

How do we access information about genetic diseases?

A
  • there are many online bioinformatics resources that offer updated and reliable information on the molecular causes of genetic diseases
  • different methods of search can be used (catalogues, search engines, databases)
  • navigation across the large number of resources is not straightforward and discerning their quality and reliability poses challenges for clinicians and biomedical researchers
21
Q

What is another layer of complexity?

A
  • all/most of the databases are interconnected

  • the cross-links between them look like spaghetti

22
Q

What are questions you can get databases to answer?

A
  • what are the main features of the disease?
  • are there any drugs for the disease?
  • are there any gene therapies or clinical trials for the disease?
  • what laboratories perform genetic tests for the disease?
  • what genes cause the disease?
  • on which chromosomes are these genes located?
  • what mutations have been found in these genes?
  • what names are used to refer to these genes?
  • what are the proteins coded by these genes?
  • what are the functions of the gene product?
  • what is the 3D structure for these proteins?
  • what are the enzymes associated with these proteins?
23
Q

What are the main centres?

A
  • US Gov - NIH - NLM - NCBI (similar to EBI, offer information across the whole spectrum, genes, genetic information, proteins, metabolites, 3D structures)
  • EC – EBI
  • Japan - DDBJ (DNA Data Bank of Japan, nucleotide sequence data); the metabolic data from Japan is in KEGG
  • Switzerland - SIB - Expasy - protein data
  • usually will offer you a window where you can make a search
  • often offer training resources, online short courses

• in principle, everything you can get from those places is reliable, well funded and will last for a long time

24
Q

What is NAR?

A
  • second strategy in searching for information
  • catalogues of resources
  • Nucleic Acids Research
  • free to access issue on databases
  • peer reviewed
  • high reliability, high quality
  • easy to navigate
  • tree
25
Q

What is bioinformatics.ca?

A
  • compendium of bioinformatics links

  • based on the NAR catalogue, with some more resources added

26
Q

What are integrated search engines?

A
  • NCBI Entrez is best one
  • another strategy in the search for information
  • system automatically sends your query to many databases
  • gives you a number of records that each source has dealing with your query
  • initial number for all may be way too many but within specific catalogues/sources etc there might be a manageable number e.g. if search for human variations of clinical significance
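
A rough sketch of sending one query to several Entrez databases, using the public NCBI E-utilities ESearch endpoint. The helper names, the example term "cystic fibrosis" and the sample response are assumptions made for illustration; the code only builds the URLs and parses a hand-written response, it does not contact NCBI.

```python
import json
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term):
    """Build an ESearch URL that asks one Entrez database for matching records."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": 0}
    return f"{ESEARCH}?{urlencode(params)}"


def parse_count(response_text):
    """Extract the record count from an ESearch JSON response."""
    return int(json.loads(response_text)["esearchresult"]["count"])


# One query sent to several databases, as an integrated search engine would do.
urls = {db: esearch_url(db, "cystic fibrosis") for db in ["pubmed", "gene", "omim"]}
for db, url in urls.items():
    print(db, url)

# Hand-written sample response (not real data) to show the parsing step.
sample = '{"esearchresult": {"count": "42"}}'
print(parse_count(sample))
```

Comparing the counts per database is what lets you drop from an unmanageable total to a source with a manageable number of records.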
27
Q

What is the final strategy?

A
  • for when you are more confident/familiar is to go directly to the database that offers the information that you need
  • e.g. OMIM, HPO, Uniprot, Prosite, Interpro, Mapviewer/Ensembl, Genetests, GTR, PharmGKB, ClinicalTrials.gov, Wave, Enzyme, KEGG Pathways, KEGG medicus, Genecards, Entrez Gene, Gene ontology, HGNC, Orphanet/NORD, MeSH, PDB, RefSeq, etc.

Clinical
• Genetests, GTR: genetic tests
• PharmGKB: drugs
• ClinicalTrials.gov: drugs

DNA
• Genecards
• Entrez Gene
• RefSeq
• Wave
• Mapviewer / Ensembl

Ontologies (not actual data: how to name things in this area, aliases, standardised names for everything)
• HPO
• MeSH
• HGNC
• Gene ontology 

Biochem
• Enzyme
• KEGG Pathways
• both available through japanese centre

Proteins
• PDB: structures 
• Uniprot: sequences 
• Prosite: profiles, motifs within sequence
• Interpro: ditto

Pathology
• OMIM: to understand more about disease, all info about all genetic diseases, the best
• Orphanet/NORD
• KEGG medicus

  • can also categorise these databases in a more information flow sort of diagram
  • starting from DNA (GenBank, EMBL, DDBJ, dbSNP, TSC, Genomas) → RNA (dbEST, Unigene, Array Express SMD) → protein sequences (PIR, UniProt, Swiss, 2D-Page Prowl) → protein structure (PDB) → physiopathology (OMIM, Medline, HGVD, KEGG, BIND pathway) → disease
28
Q

Where would you look if you were interested in just locating things in the human genome?

A

→ Map Viewer

29
Q

Where would you look if you are interested in finding patterns of genetic variation?

A

→ dbSNP

30
Q

Where would you go if you wanted to see protein structures?

A

→ PDB
→ Protein Data Bank Brookhaven
→ single international repository for the processing and distribution of 3D macromolecular structure data primarily determined experimentally by X-ray crystallography and NMR
→ all public solved protein structures
→ non-redundant (only the best determination)

31
Q

Where would you look for metabolic pathways?

A

KEGG pathway

32
Q

Where would you look for chemical compounds and metabolites?

A

HMDB

33
Q

How to find out which labs offer a genetic test?

A

GTR

34
Q

Want to know more about connection between genes, drugs and disease?

A

PharmGKB

35
Q

What is also important?

A
  • being a good user of these databases e.g. medline

  • knowing how to use the MeSH terms, what to search for, and the terminology
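
As a small illustration of searching with MeSH terms, the sketch below builds a PubMed query string using the `[MeSH Terms]` field tag. The helper `mesh_query` is hypothetical, and the two headings are chosen only as an example.

```python
def mesh_query(*headings):
    """Combine MeSH headings into a PubMed query using the [MeSH Terms] tag."""
    return " AND ".join(f'"{h}"[MeSH Terms]' for h in headings)


query = mesh_query("Diabetes Mellitus, Type 2", "Environmental Exposure")
print(query)
```

The resulting string can be pasted into the PubMed search box; restricting terms to MeSH headings usually gives a more focused result set than free-text words.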