10. NLP for Clinical Text Flashcards
what is NER and how is it used in healthcare
NER = named entity recognition
usage = clinical NER scans clinical documents & research papers and categorises entities such as treatments, drugs and diagnoses
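A toy sketch of the idea: a dictionary lookup that tags a few entity types. The lexicon terms below are invented examples; real clinical NER uses models trained on annotated clinical text rather than a fixed word list.

```python
# Hypothetical mini-lexicon; real clinical NER learns these categories
# from annotated corpora instead of using a hand-written list.
LEXICON = {
    "metformin": "DRUG",
    "insulin": "DRUG",
    "type 2 diabetes": "DIAGNOSIS",
    "dialysis": "TREATMENT",
}

def tag_entities(text):
    """Return (term, label) pairs for lexicon terms found in the text."""
    lowered = text.lower()
    return [(term, label) for term, label in LEXICON.items() if term in lowered]

note = "Patient with Type 2 diabetes was started on metformin."
print(tag_entities(note))
```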
what is clinical de-identification & how does it satisfy the governance around patient privacy
HIPAA requires healthcare providers (HCPs) to protect patient medical information from disclosure; the only exception is de-identified data, which may be disclosed without patient consent or knowledge
how does NLP apply to HIPAA requirements on HCPs
can use NER to scan medical documents & identify protected health information (PHI), including patient names & phone numbers
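A minimal sketch of PHI detection using two regex patterns (US-style phone numbers and titled surnames). These patterns are illustrative assumptions; production de-identification combines trained NER models with coverage of all of HIPAA's identifier categories.

```python
import re

# Two toy PHI patterns; real systems cover many more identifier types.
PHI_PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "NAME": re.compile(r"\b(?:Mr|Ms|Mrs|Dr)\.\s+[A-Z][a-z]+"),
}

def deidentify(text):
    """Replace each detected PHI span with a [LABEL] placeholder."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Dr. Smith called Mr. Jones at 555-123-4567 about his results."
print(deidentify(note))  # [NAME] called [NAME] at [PHONE] about his results.
```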
give examples of NLP in hc
- clinical documentation
- speech recognition/dictation
- computer assisted coding
- clinical decision support
- virtual scribe/chatbot
- clinical trial matching
- computational phenotyping
- EMR dictation
- root cause analysis
what is the EHR & its architecture
structured coded form of lab results, discharge diagnoses, pharmacy orders etc.
uses the XML-based HL7 standard to form the CDA (Clinical Document Architecture)
what is the NLP workflow
- obtain data
- preprocessing
- tokenisation
- word embedding & representation
- build & train model
- evaluation
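The steps above can be sketched end to end on a toy two-class problem. The "model" here is a nearest-centroid classifier over bag-of-words counts, chosen only to keep the example dependency-free; the documents and labels are invented.

```python
import re
from collections import Counter

# 1. obtain data (toy labelled notes)
docs = [("fever and cough noted", "infection"),
        ("persistent cough with fever", "infection"),
        ("fracture of the left wrist", "injury"),
        ("wrist fracture after a fall", "injury")]

def preprocess(text):                 # 2. preprocessing: lowercase, strip punctuation
    return re.sub(r"[^a-z\s]", "", text.lower())

def tokenise(text):                   # 3. tokenisation
    return text.split()

# 4.+5. representation & "training": sum bag-of-words counts per class
centroids = {}
for text, label in docs:
    centroids.setdefault(label, Counter()).update(tokenise(preprocess(text)))

def predict(text):
    v = Counter(tokenise(preprocess(text)))
    # score = token overlap with each class centroid
    return max(centroids, key=lambda lbl: sum(min(v[t], centroids[lbl][t]) for t in v))

# 6. evaluation (toy accuracy on the training docs)
accuracy = sum(predict(t) == lbl for t, lbl in docs) / len(docs)
print(accuracy)  # 1.0 on this tiny training set
```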
what is stemming
crude chopping of affixes to reduce terms to a common root
what is lemmatisation
reducing inflections or variants to base form
give an example of the diff. between stemming & lemmatisation
lem: changing = change
stem: changing = chang
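The contrast can be sketched in a few lines: a crude rule-based stemmer that chops suffixes blindly versus a tiny lookup lemmatiser that maps inflections to a dictionary form. The suffix rules and lemma table are toy assumptions; real systems use the Porter stemmer and vocabulary-aware lemmatisers.

```python
SUFFIXES = ("ing", "ed", "s")

def stem(word):
    """Blindly chop the first matching suffix, even if the result isn't a word."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

LEMMAS = {"changing": "change", "changed": "change"}  # toy lookup table

def lemmatise(word):
    """Map an inflected form to its dictionary base form."""
    return LEMMAS.get(word, word)

print(stem("changing"))       # chops the suffix: "chang"
print(lemmatise("changing"))  # maps to the base form: "change"
```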
what is the purpose of word embedding
representing words in a vectorised format of fixed numbers
this allows the similarity between two pieces of text to be calculated using cosine similarity, which measures the angle between the two vectors when represented in space (cosine distance = 1 − cosine similarity)
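The calculation itself is short. The three-dimensional "embedding" vectors below are made-up values purely to illustrate the formula, not outputs of any trained model.

```python
import math

def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| * |v|); 1 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

drug_a = [0.9, 0.1, 0.3]
drug_b = [0.8, 0.2, 0.4]
unrelated = [0.0, 1.0, 0.0]

print(cosine_similarity(drug_a, drug_b))     # close to 1: similar direction
print(cosine_similarity(drug_a, unrelated))  # much lower similarity
```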
what is a common embedding model
Word2Vec
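As a sketch of how Word2Vec's skip-gram variant frames its training data, each word is paired with its neighbours inside a context window; the model then learns embeddings by predicting the neighbour from the centre word. The window size of 2 is an illustrative choice, not a fixed part of the model.

```python
def skipgram_pairs(tokens, window=2):
    """Generate (centre, context) training pairs within the window."""
    pairs = []
    for i, centre in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs

sentence = "patient prescribed metformin daily".split()
print(skipgram_pairs(sentence))
```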
what is lstm
long short term memory
modified version of the RNN that makes it easier to keep past data in memory
what is BERT
bidirectional encoder representations from transformers
what is the BERT procedure
- uses an MLM (masked language model)
- MLM masks a word and uses words on either side to predict the masked word
- uses a transformer architecture
- transformers assign attention weights that identify the most important words in a sentence to process
- the transformer layer is often called the encoder
- transformers can process large amounts of unlabelled data efficiently
- BERT does not have a decoder
- decoders are used to predict target output
- also uses next sentence prediction (NSP), where pairs of sentences are used for training so the model learns to predict whether the second sentence actually follows the first in the document
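The masking step can be illustrated in miniature: one token is replaced with [MASK] and the model must recover it from the words on both sides. BERT actually masks around 15% of WordPiece tokens at random; this sketch masks a single fixed position for clarity, and the sentence is invented.

```python
def mask_token(tokens, index):
    """Replace tokens[index] with [MASK]; return the masked list and the answer."""
    masked = tokens.copy()
    target = masked[index]
    masked[index] = "[MASK]"
    return masked, target

tokens = "the patient was prescribed metformin daily".split()
masked, target = mask_token(tokens, 4)
print(" ".join(masked))  # the patient was prescribed [MASK] daily
print(target)            # metformin: what the MLM must predict from context
```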
equation for precision
TP / (TP + FP)
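A quick worked example; the true-positive and false-positive counts are made up.

```python
def precision(tp, fp):
    """Fraction of positive predictions that were actually positive."""
    return tp / (tp + fp)

print(precision(8, 2))  # 8 / (8 + 2) = 0.8
```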