NLP Flashcards by Viktor Axén

Corpus

A collection of text or speech data, used for training and evaluating NLP models.

Word Sense Disambiguation

The process of identifying the correct meaning of a word in context, when the word has multiple possible meanings.

BLEU

Bilingual Evaluation Understudy.
Metric used to measure the quality of a machine translation by comparing its n-gram overlap with one or more reference translations.
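A minimal sketch of the clipped n-gram precision that BLEU is built on. This is not full BLEU: it assumes a single reference, whitespace tokenisation, and one n-gram order, and it leaves out the brevity penalty. The function name `clipped_precision` is made up for illustration.

```python
from collections import Counter

def clipped_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as many times as it
    appears in the reference (this is the "clipping").
    """
    cand = candidate.split()
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(cand_ngrams.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    return matched / total

# A degenerate translation that only repeats a common word scores low,
# because "the" is credited only twice (its count in the reference).
score = clipped_precision("the the the the the the the", "the cat is on the mat")
```

Full BLEU combines these precisions for several n-gram orders with a geometric mean and multiplies the result by a brevity penalty.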

ROUGE

Recall-Oriented Understudy for Gisting Evaluation.
A family of metrics used to measure the quality of machine-generated summaries, compared to reference summaries.

Hidden Markov Model (HMM)

Statistical model in which the observed outputs are generated by hidden states that follow a Markov chain. Used for part-of-speech tagging and speech recognition.

Part-of-Speech tagging

Labels each word in a sentence with its part of speech (noun, verb, adverb, etc.).

Transfer Learning

Applying a model trained on one task to a different task.

N-Gram

A sequence of N contiguous items (such as words or characters) from a text or speech sample.
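As a small illustration, word-level n-grams can be extracted with a sliding window. The helper name `ngrams` and the whitespace tokenisation are assumptions made for this sketch.

```python
def ngrams(tokens, n):
    """Return every window of n adjacent tokens, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()
bigrams = ngrams(tokens, 2)   # pairs of adjacent words
trigrams = ngrams(tokens, 3)  # triples of adjacent words
```

A text with fewer than N tokens yields no n-grams at all.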

Lemmatization

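Reducing words to their dictionary base form (lemma), undoing any inflections, typically with the help of a vocabulary and morphological analysis. Similar to Stemming, but the result is a valid dictionary word.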

Stemming

Reducing words to their stem by stripping affixes with heuristic rules, undoing any inflections. Similar to Lemmatization, but the resulting stem need not be a valid word.
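A toy suffix-stripping stemmer, to show the idea. The suffix list and the minimum stem length are arbitrary choices for this sketch and do not reproduce any established algorithm such as the Porter stemmer.

```python
# Illustrative suffix list, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

stems = [naive_stem(w) for w in ["jumping", "jumped", "cats", "running"]]
```

Here "running" becomes "runn", which is not a dictionary word. A lemmatizer would instead look the word up and return "run".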

Named Entity Recognition

Locating named entities in a text and categorising them into predefined groups, such as person, organisation, or location.

Co-reference Resolution

Identifying which words in a text refer to the same entity.

Stop Word

A commonly used word which does not contribute to a text's content.
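Stop words are usually filtered out before further processing. A minimal sketch, assuming a tiny hand-written stop list (real lists are much longer and language-specific):

```python
# Illustrative stop list, not a standard one.
STOP_WORDS = {"a", "an", "the", "is", "of", "on", "and"}

def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

content_words = remove_stop_words("The cat is on the mat".split())
```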

Word Embeddings

Vectorisation of words. Similar words are mapped to nearby vectors in vector space.

Word2Vec

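Word vectorisation method that learns embeddings with a shallow neural network, trained either to predict a word from its context (CBOW) or the context from a word (skip-gram).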

GloVe

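Global Vectors for Word Representation. Word vectorisation method trained on global word-word co-occurrence counts from a corpus.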

BERT

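Bidirectional Encoder Representations from Transformers. Pre-trained Transformer language model that produces contextual word vectors, so the same word can get different vectors in different sentences.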

Bag-of-Words

Method for representing a text as the collection of its words and their counts, without regard to order or grammar.
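A minimal sketch using word counts, assuming lowercasing and whitespace tokenisation. The helper name `bag_of_words` is chosen for illustration.

```python
from collections import Counter

def bag_of_words(text):
    """Map each word to the number of times it occurs; word order is discarded."""
    return Counter(text.lower().split())

bag_a = bag_of_words("the cat saw the dog")
bag_b = bag_of_words("the dog saw the cat")
same_bag = bag_a == bag_b  # the two sentences differ only in word order
```

Because order is ignored, sentences with different meanings can end up with the same representation.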

TF-IDF

Term Frequency - Inverse Document Frequency
Metric measuring how important a word is to a document in a corpus, relative to its frequency in the rest of the corpus.
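TF-IDF has several variants. The sketch below assumes one common form: term frequency is the term's share of the document's tokens, and inverse document frequency is log(N / df), where N is the number of documents and df is the number of documents containing the term. Documents are assumed to be non-empty token lists.

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """Score a term for one tokenised document within a corpus of documents."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0  # term never occurs in the corpus
    idf = math.log(len(corpus) / df)
    return tf * idf

corpus = [["the", "cat"], ["the", "dog"], ["the", "bird"]]
score_the = tf_idf("the", corpus[0], corpus)  # occurs in every document
score_cat = tf_idf("cat", corpus[0], corpus)  # occurs in one document only
```

A word that appears in every document gets a score of zero, while a word concentrated in one document scores higher.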

Latent Semantic Analysis

Analyses relationships between words and documents in a corpus to discover semantic structures, typically by applying singular value decomposition to a term-document matrix.

Latent Dirichlet Allocation

Generative probabilistic model identifying topics in a document corpus.

Generative Probabilistic Model

Perplexity

Metric measuring how well a probabilistic model predicts a sample. Lower perplexity indicates better performance.
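For a language model, perplexity is commonly computed as the exponential of the average negative log-probability the model assigns to each token. A minimal sketch, assuming the per-token probabilities are already available and are all greater than zero:

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability over a non-empty sample."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# A model that gives every token probability 1/4 is as uncertain as a
# uniform choice among four options.
uniform_pp = perplexity([0.25, 0.25, 0.25, 0.25])
```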

Componential Semantics

Words represented by sets of semantic components which together describe the meaning of the word.

Distributional Semantics

Describing words according to the contexts they appear in.

Thematic Distance

Metric measuring the similarity of words based on the angle between their vectors.
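An angle-based comparison of two word vectors can be sketched with cosine similarity. Whether the card means the cosine or the angle itself is not stated, so both are shown here as an assumption. Vectors are assumed to be non-zero and of equal length.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def angle_between(u, v):
    """Angle in radians; the cosine is clamped to guard against rounding error."""
    cos = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.acos(cos)

parallel = cosine_similarity([1, 2], [2, 4])    # same direction
orthogonal = cosine_similarity([1, 0], [0, 1])  # no shared direction
```

Vectors pointing the same way have a cosine close to 1 regardless of their length, while orthogonal vectors have a cosine of 0.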

Saltonian vector

Binary vector representation of a word. Zero everywhere except at the index of the word in the corpus's complete word list.
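The representation described on this card can be sketched as follows, assuming the word list is an ordered Python list and the word is present in it. The helper name `one_hot` is chosen for illustration.

```python
def one_hot(word, vocabulary):
    """Vector of zeros with a single 1 at the word's index in the vocabulary."""
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(word)] = 1  # raises ValueError for unknown words
    return vec

vocabulary = ["cat", "dog", "mat"]
cat_vec = one_hot("cat", vocabulary)
mat_vec = one_hot("mat", vocabulary)
```

The vector length equals the vocabulary size, so these vectors grow very large and sparse for realistic corpora.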

Vector size reduction

Word error rate

A metric measuring the relative error between a generated text and a reference text.
WER = (S + D + I) / N
S: Substitutions
D: Deletions
I: Insertions
N: Number of words in reference text
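The combined count of substitutions, deletions, and insertions is the word-level edit distance between the reference and the generated text. A minimal sketch using dynamic programming, assuming whitespace tokenisation and a non-empty reference:

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution (sat -> sit) and one deletion (the) over six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Since insertions are counted against the reference length, the rate can exceed 1 when the generated text is much longer than the reference.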

Connectionist Temporal Classification

A method for end-to-end Automatic Speech Recognition.

Attention-based Encoder-Decoder Models

A method for end-to-end Automatic Speech Recognition.

Transducer Models (RNN-T)

A method for end-to-end Automatic Speech Recognition.

NLP Flashcards

(33 cards)