C10 Flashcards

1
Q

goals of biomedical text mining

A

interactive knowledge discovery: assisting the expert in finding the information they need

TM can assist researchers in
- finding, evaluating and interpreting the scientific literature and patented biomedical inventions
- generating new medical hypotheses using information extracted from patient information (health records, social media data)

2
Q

topics for TM in biomedical research

A
  • gene/protein/disease extraction
  • adverse events (side-effects)
  • predicting time to death
  • drug interactions
3
Q

What steps do you take in bio-TM, e.g. for the task of finding side effects of medications on online forums?

A
  1. Filter the potentially relevant messages
  2. Get/create training data for NER
  3. Train an NER model to identify drug names and side effects in the messages
  4. Normalize the side effects (map to ontology)
  5. Relation extraction: co-occurrences of drug names and side effects in one message
  6. (Match the found relations to an existing knowledge base to identify which relations are new)

Needed:
- lists/ontologies of drug names and known side effects
- pre-processing
- pre-trained BERT models for NER and ontology linking
- labelled data for supervised NER finetuning and evaluation
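
A minimal sketch of steps 1 and 3 to 6, assuming a simple dictionary lookup as a stand-in for the trained NER model (step 2, creating training data, is skipped). The lexicons, the knowledge base and the example messages are all hypothetical toy data, not real ontologies:

```python
import re

# Hypothetical toy lexicons; a real system would use drug and side-effect ontologies.
DRUGS = {"ibuprofen", "metformin"}
SIDE_EFFECTS = {  # surface form -> normalized concept label (step 4)
    "nausea": "nausea",
    "felt sick": "nausea",
    "headache": "headache",
}
KNOWN = {("metformin", "nausea")}  # hypothetical existing knowledge base


def extract_relations(messages):
    """Return (drug, side effect) pairs that co-occur within one message."""
    relations = set()
    for msg in messages:
        text = msg.lower()
        tokens = set(re.findall(r"[a-z]+", text))
        drugs = DRUGS & tokens  # dictionary lookup stands in for NER (step 3)
        if not drugs:
            continue  # step 1: skip messages that mention no known drug
        effects = {
            norm
            for surface, norm in SIDE_EFFECTS.items()
            if re.search(rf"\b{re.escape(surface)}\b", text)
        }
        # step 5: every drug/side-effect pair in the same message is a candidate relation
        relations |= {(d, e) for d in drugs for e in effects}
    return relations


messages = [
    "Started metformin last week and felt sick all day",
    "Ibuprofen gave me a headache, oddly enough",
    "Anyone know a good pharmacy nearby?",
]
found = extract_relations(messages)
new = found - KNOWN  # step 6: keep only relations missing from the knowledge base
```

Here `found` contains both pairs, while `new` keeps only the ibuprofen/headache pair because the other one is already in the toy knowledge base.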

4
Q

one of the biggest challenges in bio-NER

A

recognition of gene and protein names in scientific text: the same gene is often described using different names and symbols, and multiple genes share symbols and names

5
Q

relation extraction: co-occurrence based methods

A

assume that two concepts that often occur together in the same text are related

Statistics for co-occurrence frequencies:
- actual number of co-occurrences
- expected number of co-occurrences based on the frequencies of both entities
- a statistical test to decide if the co-occurrence is statistically significant
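
A minimal sketch of the three statistics, assuming document-level counts and a one-sided Fisher exact test. The card does not name a specific test, so that choice, the function name and the example counts are assumptions:

```python
from math import comb


def cooccurrence_stats(n_docs, n_a, n_b, n_ab):
    """Return (observed, expected, p_value) for entities A and B.

    n_docs: total documents; n_a / n_b: documents mentioning A / B;
    n_ab: documents mentioning both.
    """
    # Expected co-occurrences if A and B were independent.
    expected = n_a * n_b / n_docs
    # One-sided Fisher exact test: probability of at least n_ab co-occurrences
    # under the hypergeometric distribution.
    upper = min(n_a, n_b)
    p_value = sum(
        comb(n_a, k) * comb(n_docs - n_a, n_b - k) for k in range(n_ab, upper + 1)
    ) / comb(n_docs, n_b)
    return n_ab, expected, p_value


# Hypothetical counts: 1000 documents, A in 50, B in 40, both in 10.
observed, expected, p_value = cooccurrence_stats(1000, 50, 40, 10)
```

With these made-up counts the expected number of co-occurrences is 2.0, so 10 observed co-occurrences yields a very small p-value.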

6
Q

relation extraction: structure-based methods

A

phrase-based: able to detect triples in text, e.g. "gene A inhibits gene B" or "gene C is involved in disease G"

  • provides information about the type of relationship between two concepts
  • structure-based methods often have a higher precision than co-occurrence based methods but lower recall (limited set of relations)
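
A minimal sketch of the idea, assuming a single hand-written regular expression over a fixed set of relation phrases. The pattern and the example text are hypothetical; real systems typically rely on syntactic parsing or learned models:

```python
import re

# Hypothetical, deliberately small pattern: only three relation phrases are covered.
PATTERN = re.compile(
    r"(?P<subj>gene \w+) "
    r"(?P<rel>inhibits|activates|is involved in) "
    r"(?P<obj>(?:gene|disease) \w+)"
)


def extract_triples(text):
    """Return (subject, relation, object) triples matched by PATTERN."""
    return [(m["subj"], m["rel"], m["obj"]) for m in PATTERN.finditer(text)]


text = (
    "gene A inhibits gene B, and gene C is involved in disease G. "
    "gene D is linked to disease H."
)
triples = extract_triples(text)
```

The first two statements are extracted with their relation type, but "is linked to" is not in the pattern and is missed, which illustrates the precision/recall trade-off.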
7
Q

6 modules in bio-tm

A
  • Information Retrieval
  • Named Entity Recognition
  • Ontology linking
  • Relation Extraction
  • Knowledge Discovery
  • Visualization
8
Q

identifying biomedical entities in retrieved documents

A

mentions of entities are highlighted and linked to the specific concept in the controlled vocabulary (thesaurus or ontology)

Example of such a controlled vocabulary: the Unified Medical Language System (UMLS)
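
A minimal sketch of dictionary-based linking, assuming a tiny hand-made thesaurus. The concept IDs are made up for illustration and are not real UMLS identifiers:

```python
import re

# Hypothetical mini-thesaurus: concept ID -> synonyms.
THESAURUS = {
    "CONCEPT:001": ["myocardial infarction", "heart attack"],
    "CONCEPT:002": ["aspirin", "acetylsalicylic acid"],
}


def link_entities(text):
    """Return (start, end, mention, concept_id) for each thesaurus term in text."""
    hits = []
    for concept_id, synonyms in THESAURUS.items():
        for term in synonyms:
            pattern = rf"\b{re.escape(term)}\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((m.start(), m.end(), m.group(), concept_id))
    return sorted(hits)  # ordered by position in the text


text = "Aspirin is often given after a heart attack."
links = link_entities(text)
```

The character offsets are what a viewer would use to highlight each mention, and the concept ID is the link into the vocabulary. Different synonyms map to the same concept.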

9
Q

differences in pre-training of domain-specific models vs general models

A

further pre-training vs. pre-training from scratch: the collection has to be huge for pre-training from scratch, so domain-specific models are often obtained by further pre-training a general model on domain text

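WordPiece vocabulary is optimized for the pre-training corpus: a model pre-trained from scratch gets a domain-specific vocabulary, while a further pre-trained model keeps the vocabulary of the general model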
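
A minimal sketch of why the vocabulary matters, assuming the greedy longest-match-first procedure that WordPiece tokenizers apply at inference time. Both vocabularies are hypothetical toy sets, and learning the vocabulary itself is not shown:

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into the longest matching vocabulary pieces, left to right."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            # Non-initial pieces carry the "##" continuation prefix.
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                break
            end -= 1
        if end == start:  # no piece matched: the whole word is unknown
            return [unk]
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical toy vocabularies.
general_vocab = {"para", "##ce", "##ta", "##mol"}
biomedical_vocab = general_vocab | {"paracetamol"}

general_split = wordpiece("paracetamol", general_vocab)
biomedical_split = wordpiece("paracetamol", biomedical_vocab)
```

With the general toy vocabulary the drug name is fragmented into four pieces, while the domain-specific vocabulary keeps it as a single token.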
