NLP Flashcards

1
Q

Corpus

A

A large, structured collection of text or speech data. Used for training NLP models.

2
Q

Word Sense Disambiguation

A

The process of identifying the intended meaning of a word when it has multiple possible senses.

3
Q

BLEU

A

Bilingual Evaluation Understudy.
Metric used to measure the quality of a machine translation, compared to one or more reference translations.

4
Q

ROUGE

A

Recall-Oriented Understudy for Gisting Evaluation.
Metric used to measure the quality of machine-generated summaries, compared to reference summaries.

5
Q

Hidden Markov Model (HMM)

A

A statistical model of a Markov process with unobserved (hidden) states. Used for part-of-speech tagging and speech recognition.

6
Q

Part-of-Speech tagging

A

Labels each word in a sentence with its grammatical category (noun, verb, adverb, etc.).

7
Q

Transfer Learning

A

Applying a model trained on one task to a different task.

8
Q

N-Gram

A

A sequence of N contiguous items from a sample of text or speech.
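
As a minimal Python sketch (the helper name `ngrams` is illustrative, not from any particular library):

```python
# Extract all contiguous n-grams from a token sequence.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

ngrams(["the", "cat", "sat", "on", "the", "mat"], 2)
# -> [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
```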

9
Q

Lemmatization

A

Reducing words to their dictionary base form (lemma) using vocabulary and morphological analysis, e.g. "better" → "good". Similar to Stemming, but always yields a valid word.

10
Q

Stemming

A

Reducing words to their stem by heuristically stripping affixes, e.g. "running" → "run". Similar to Lemmatization, but the result need not be a valid word.

11
Q

Named Entity Recognition

A

Identifying named entities in text and categorising them into predefined groups such as person, organisation, and location.

12
Q

Co-reference Resolution

A

Identifying which words in a text refer to the same entity.

13
Q

Stop Word

A

A commonly used word (e.g. "the", "is") which contributes little to a text's content and is often filtered out.
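
A toy filtering sketch, assuming a small hand-picked stop-word list (real toolkits ship much larger ones):

```python
# Illustrative stop-word list; not a standard set.
STOP_WORDS = {"the", "a", "an", "is", "on", "of", "and"}

def remove_stop_words(tokens):
    # Keep only tokens that are not stop words (case-insensitive).
    return [t for t in tokens if t.lower() not in STOP_WORDS]

remove_stop_words(["The", "cat", "is", "on", "the", "mat"])
# -> ['cat', 'mat']
```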

14
Q

Word Embeddings

A

Vectorisation of words. Similar words are mapped to nearby vectors in vector space.

15
Q

Word2Vec

A

A neural word-embedding method that learns vectors by predicting a word from its context (CBOW) or its context from a word (Skip-gram).

16
Q

GloVe

A

Global Vectors for Word Representation. A word-embedding method that learns vectors from global word co-occurrence statistics.

17
Q

BERT

A

Bidirectional Encoder Representations from Transformers. A Transformer-based language model producing contextual embeddings; unlike Word2Vec and GloVe, a word's vector depends on its surrounding sentence.

18
Q

Bag-of-Words

A

Method for representing a text by the multiset of its words, without regard to order or grammar.
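
A minimal sketch using `collections.Counter`:

```python
from collections import Counter

# Represent a text purely by its word counts, ignoring order and grammar.
def bag_of_words(tokens):
    return Counter(tokens)

bag_of_words(["the", "cat", "sat", "on", "the", "mat"])
# 'the' -> 2; every other word -> 1
```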

19
Q

TF-IDF

A

Term Frequency - Inverse Document Frequency
Metric measuring how important a word is to a document in a corpus, relative to its frequency in the rest of the corpus.
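
One common variant, sketched in plain Python (libraries differ in smoothing details; this assumes the term occurs in at least one document):

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    # Term frequency: share of the document's tokens that are `term`.
    tf = Counter(doc)[term] / len(doc)
    # Document frequency: number of documents containing `term`.
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df)  # assumes df >= 1
    return tf * idf

docs = [["cat", "sat"], ["dog", "sat"]]
tf_idf("cat", docs[0], docs)  # positive, since "cat" is rare in the corpus
tf_idf("sat", docs[0], docs)  # -> 0.0, since "sat" appears in every document
```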

20
Q

Latent Semantic Analysis

A

Analyses relationships between words and documents, typically via singular value decomposition of a term-document matrix, to discover latent semantic structures.

21
Q

Latent Dirichlet Allocation

A

Generative probabilistic model identifying topics in a document corpus.

22
Q

Generative Probabilistic Model

A

A model that specifies a probability distribution over how observed data is generated, so that new samples can be drawn from it (e.g. LDA).

23
Q

Perplexity

A

Metric measuring how well a probabilistic model predicts a sample; lower perplexity indicates better performance.
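
A minimal sketch, assuming the model's per-token probabilities are given:

```python
import math

# Perplexity: the exponentiated average negative log-probability
# the model assigns to each token in the sample.
def perplexity(token_probs):
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

perplexity([0.25, 0.25, 0.25])
# ~4.0: equivalent to choosing uniformly among 4 options per token
```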

24
Q

Componential Semantics

A

Words represented by sets of semantic components which together describe the meaning of the word.

25
Q

Distributional Semantics

A

Describing words according to the contexts they appear in.

26
Q

Thematic Distance

A

Metric measuring the similarity of words based on the angle between their embedding vectors (cosine similarity).
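
A minimal cosine-similarity sketch:

```python
import math

# Cosine of the angle between two vectors:
# 1.0 for the same direction, 0.0 for orthogonal directions.
def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms

cosine_similarity([1.0, 0.0], [0.0, 1.0])  # -> 0.0
```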

27
Q

Saltonian vector

A

Binary (one-hot) vector representation of a word: zero everywhere except at the word's index in the corpus's complete word list.
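
A minimal sketch (the helper name `one_hot` is illustrative):

```python
# One-hot vector: 1 at the word's position in the vocabulary, 0 elsewhere.
def one_hot(word, vocabulary):
    return [1 if w == word else 0 for w in vocabulary]

one_hot("cat", ["the", "cat", "mat"])  # -> [0, 1, 0]
```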

28
Q

Vector size reduction

A

Reducing the dimensionality of word vectors (e.g. via singular value decomposition) to obtain denser, more compact representations.

29
Q

Word error rate

A

A metric measuring the relative error between a generated text and a reference text.
(S + D + I) / N
S: Substitutions
D: Deletions
I: Insertions
N: Number of words in reference text
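
The formula can be computed via word-level Levenshtein distance; a minimal sketch, assuming whitespace tokenisation:

```python
# Word error rate: (S + D + I) / N, found via edit distance over words.
def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j]: minimum edits turning the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,  # substitution / match
                          d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1)        # insertion
    return d[len(ref)][len(hyp)] / len(ref)

word_error_rate("the cat sat", "the cat sat down")
# ~0.333: one insertion over three reference words
```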

30
Q

Connectionist Temporal Classification

A

A method for end-to-end Automatic Speech Recognition that aligns input frames to output labels by summing over all possible alignments, using a special blank symbol.

31
Q

Attention-based Encoder-Decoder Models

A

A method for end-to-end Automatic Speech Recognition in which a decoder attends over encoded audio features to emit output tokens.

32
Q

Transducer Models (RNN-T)

A

A method for end-to-end Automatic Speech Recognition combining an audio encoder with a prediction network; well suited to streaming recognition.
