NLP Flashcards
Corpus
A collection of text or speech data, used for training and evaluating NLP models.
Word Sense Disambiguation
The process of identifying which meaning of a word is intended in context, when the word has multiple possible meanings.
BLEU
Bilingual Evaluation Understudy.
Metric that scores the quality of a machine translation by its n-gram overlap with one or more reference translations.
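One building block of BLEU is clipped n-gram precision. The sketch below covers only the unigram case, assuming whitespace tokenisation; full BLEU also combines precisions for longer n-grams and applies a brevity penalty, which this sketch omits. The function name and example sentences are illustrative.

```python
from collections import Counter


def clipped_unigram_precision(candidate, reference):
    """Fraction of candidate tokens that also appear in the reference.

    Each candidate word is credited at most as many times as it occurs
    in the reference ("clipping"), so repeating a word does not help.
    """
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    overlap = sum(min(count, ref_counts[word]) for word, count in cand_counts.items())
    return overlap / len(candidate) if candidate else 0.0


# Illustrative sentences, not taken from any real data set.
candidate = "the the the cat".split()
reference = "the cat is on the mat".split()
score = clipped_unigram_precision(candidate, reference)
print(score)  # 0.75: "the" is credited twice (not three times), "cat" once
```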
ROUGE
Recall-Oriented Understudy for Gisting Evaluation.
A metric that scores the quality of a machine-generated summary by its overlap with reference summaries.
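As a rough illustration, the simplest ROUGE variant (ROUGE-1 recall) asks what fraction of the reference's words the summary recovered. The sketch below assumes whitespace tokenisation and a single reference; the function name and sentences are illustrative, and real ROUGE implementations add further variants and preprocessing.

```python
from collections import Counter


def rouge1_recall(summary, reference):
    """Fraction of reference tokens that are matched by the summary."""
    summary_counts = Counter(summary)
    ref_counts = Counter(reference)
    overlap = sum(min(count, summary_counts[word]) for word, count in ref_counts.items())
    return overlap / len(reference) if reference else 0.0


# Illustrative sentences, not taken from any real data set.
summary = "the cat sat".split()
reference = "the cat sat on the mat".split()
recall = rouge1_recall(summary, reference)
print(recall)  # 0.5: three of the six reference tokens are matched
```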
Hidden Markov Model (HMM)
Statistical model in which a sequence of unobserved (hidden) states generates the observed data. Used for part-of-speech tagging and speech recognition.
Part-of-Speech tagging
Labelling each word in a sentence with its grammatical category (noun, verb, adverb, etc.).
Transfer Learning
Reusing a model trained on one task as the starting point for a different, usually related, task.
N-Gram
A sequence of N contiguous items (such as words or characters) from a text or speech sample.
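A minimal sketch of extracting word-level N-grams with a sliding window, assuming the text has already been split into tokens. The function name is illustrative.

```python
def ngrams(tokens, n):
    """Return every run of n adjacent tokens, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


words = "the cat sat on the mat".split()
bigrams = ngrams(words, 2)
print(bigrams)
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
```

With n=1 this yields unigrams, with n=3 trigrams; a sequence shorter than n yields an empty list.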
Lemmatization
Reducing words to their dictionary base form (lemma), undoing any inflections, e.g. "better" becomes "good". Similar to Stemming, but the result is always a real word.
Stemming
Reducing words to their stem by stripping affixes with heuristic rules; the result need not be a real word. Similar to Lemmatization.
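A toy suffix-stripping stemmer, to show the idea only. The suffix list and the minimum stem length are assumptions made for this sketch; it is not the Porter algorithm or any library's implementation.

```python
# Hypothetical suffix list, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


for word in ("jumping", "jumped", "jumps", "studies"):
    print(word, "->", naive_stem(word))
# jumping -> jump
# jumped -> jump
# jumps -> jump
# studies -> studi
```

The last result, "studi", is not a dictionary word; a lemmatizer would return "study" instead.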
Named Entity Recognition
Locating named entities in a text and categorising them into predefined groups such as person, organisation, and location.
Co-reference Resolution
Identifying which expressions in a text refer to the same entity, e.g. linking "she" to "Mary".
Stop Word
A commonly used word, such as "the" or "is", which contributes little to a text's content and is often filtered out before processing.
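A minimal sketch of stop word filtering. The stop word set here is a small hand-picked assumption; NLP libraries ship much longer lists.

```python
# Small illustrative stop word set, not a standard list.
STOP_WORDS = {"a", "an", "the", "is", "on", "of", "and", "to", "in"}


def remove_stop_words(tokens):
    """Drop tokens whose lowercase form is in STOP_WORDS."""
    return [token for token in tokens if token.lower() not in STOP_WORDS]


tokens = "The cat is on the mat".split()
content_words = remove_stop_words(tokens)
print(content_words)  # ['cat', 'mat']
```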
Word Embeddings
Vectorisation of words. Similar words are mapped to nearby vectors in vector space.
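Closeness between word vectors is commonly measured with cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration; real embeddings are learned from data and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical vectors, invented for this example.
embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "apple": [0.1, 0.2, 0.95],
}

royal = cosine_similarity(embeddings["king"], embeddings["queen"])
fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
print(royal > fruit)  # True: "king" lies closer to "queen" than to "apple"
```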
Word2Vec
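A method for learning word embeddings with a shallow neural network that predicts a word from its context (CBOW) or the context from a word (skip-gram).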