Lec 6 | Language Flashcards

1
Q

It spans all tasks where the AI gets human language as input.

A

Natural Language Processing (NLP)

2
Q

the AI is given text as input and it produces a summary of the text as output.

A

automatic summarization

3
Q

the AI is given a corpus of text and the AI extracts data as output

A

information extraction

4
Q

the AI is given text and returns the language of the text as output.

A

language identification

5
Q

the AI is given a text in the origin language and it outputs the translation in the target language.

A

machine translation

6
Q

the AI is given text and it extracts the names of the entities in the text (for example, names of companies).

A

named entity recognition

7
Q

the AI is given speech and it produces the same words in text.

A

speech recognition

8
Q

the AI is given text and it needs to classify it as some type of text.

A

text classification

9
Q

where the AI needs to choose the right meaning of a word that has multiple meanings (e.g. bank means both a financial institution and the ground on the sides of a river).

A

Word Sense Disambiguation

10
Q

Sentence structure

A

Syntax

11
Q

The meaning of words or sentences

A

Semantics

12
Q

A system of rules for generating sentences in a language.

A

Formal Grammar

13
Q

The text is abstracted from its meaning to represent the structure of the sentence using formal grammar.

A

Context Free Grammar

14
Q

What do these non-terminal symbols mean?

  • N
  • V
  • NP
  • VP
  • S
  • D
  • P
  • ADJ
A
  • Noun
  • Verb
  • Noun Phrase
  • Verb Phrase
  • Sentence
  • Determiner
  • Preposition
  • Adjective
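
A minimal parsing sketch (not part of the deck), assuming Python with nltk installed; the grammar and sentence are illustrative only, built from the non-terminal symbols above.

import nltk

# A toy grammar using the symbols above: S, NP, VP, D, N, V
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> D N | N
    VP -> V | V NP
    D -> "the"
    N -> "she" | "city"
    V -> "saw"
""")

parser = nltk.ChartParser(grammar)

# Print every structure the grammar licenses for the sentence
for tree in parser.parse("she saw the city".split()):
    tree.pretty_print()
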
15
Q

A sequence of n items from a sample of text.

A

n-gram

16
Q

What n-gram is this?

the items are characters

A

character n-gram

17
Q

What n-gram is this?

the items are words

A

word n-gram

18
Q

Contiguous sequences of items from a sample of text, of length 1, 2, or 3 respectively.

3 answers

A

unigram, bigram, and trigram

19
Q

Where can we use/implement n-grams?

A

It is useful for text processing.

Since some words occur together more often than others, it is possible to predict the next word with some probability. A helpful step in natural language processing is breaking a sentence into n-grams.

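A small sketch (not part of the deck) of extracting word n-grams with nltk's ngrams helper, assuming nltk is installed; the sample sentence is made up.

from nltk.util import ngrams

tokens = "the big cat sat on the mat".split()

# Word trigrams: every contiguous sequence of 3 tokens
for gram in ngrams(tokens, 3):
    print(gram)  # ('the', 'big', 'cat'), ('big', 'cat', 'sat'), ...
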
20
Q

the task of splitting a sequence of characters into pieces (tokens).

A

Tokenization

21
Q

Tokens can be words as well as sentences, in which case the tasks are called what?

A

word tokenization or sentence tokenization

22
Q

What challenges do we face when splitting text into words? How do we deal with them?

A

Words with apostrophes (e.g. “o’clock”) and hyphens (e.g. “pearl-grey”). Additionally, some punctuation is important for sentence structure, like periods. Dealing with these questions is the process of tokenization.
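
A minimal sketch (not part of the deck) of word and sentence tokenization with nltk, assuming it is installed along with its "punkt" tokenizer data; the example text is made up.

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download("punkt", quiet=True)  # tokenizer models, fetched once

text = "It's 5 o'clock. The pearl-grey sky is clearing."

print(sent_tokenize(text))  # sentence tokenization
print(word_tokenize(text))  # word tokenization; note how apostrophes and hyphens are handled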

23
Q
  • Consists of nodes, the value of each of which has a probability distribution based on a finite number of previous nodes.
  • Can be used to generate text.
A

Markov Models

24
Q

How do we use Markov Models?

A

We train the model on a text, and then establish probabilities for every n-th token in an n-gram based on the n − 1 words preceding it.
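
A minimal generation sketch, assuming the markovify library (any Markov-chain text library would do) and a placeholder training file corpus.txt.

import markovify

# Read the training text
with open("corpus.txt") as f:
    text = f.read()

# Build a Markov model: each token's probability depends on the tokens preceding it
model = markovify.Text(text)

# Sample new sentences from the learned probabilities
for _ in range(5):
    print(model.make_sentence())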

25
Q
  • A model that represents text as an unordered collection of words.
  • This model ignores syntax and considers only the meanings of the words in the sentence.
  • This approach is helpful in some classification tasks, such as sentiment analysis.
A

Bag-of-Words Model

26
Q

This can be used, for instance, in product reviews, categorizing reviews as positive or negative.

A

Sentiment Analysis

27
Q

a technique that can be used for sentiment analysis with the bag-of-words model.

A

Naive Bayes
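
A minimal sketch (not part of the deck) of bag-of-words sentiment classification with nltk's Naive Bayes classifier, assuming nltk is installed; the tiny training set is invented.

import nltk

def features(sentence):
    # Bag-of-words: record only which words appear, not their order
    return {word: True for word in sentence.lower().split()}

train = [
    (features("my grandson loved it"), "positive"),
    (features("great purchase and fun to play with"), "positive"),
    (features("it broke after a week"), "negative"),
    (features("kind of cheap and flimsy"), "negative"),
]

classifier = nltk.NaiveBayesClassifier.train(train)
print(classifier.classify(features("this was a fun purchase")))  # likely "positive"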

28
Q

What problem might we run into when using Naive Bayes? And how do we solve this problem?

A

One problem that we can run into is that some words may never appear in a certain type of sentence. One way to address this problem is with Additive Smoothing.

29
Q

It is where we add a value α to each value in our distribution to smooth the data. This way, even if a certain value is 0, by adding α to it we won’t be multiplying the whole probability for a positive or negative sentence by 0.

A

Additive Smoothing

30
Q

A specific type of additive smoothing that adds 1 to each value in our distribution, pretending that all values have been observed at least once.

A

Laplace Smoothing
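
The usual formulation of the smoothed estimate (not part of the deck; |Vocabulary| is the number of distinct words):

P(word | class) = (Count(word, class) + α) / (TotalWords(class) + α × |Vocabulary|)

With Laplace smoothing α = 1, so a word that never appeared with a class still gets a small nonzero probability instead of zeroing out the whole product.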

31
Q

the task of finding relevant documents in response to a user query.

A

Information Retrieval

32
Q
  • We use this to achieve Information Retrieval.
  • Models for discovering the topics for a set of documents.
A

Topic Modeling

33
Q

How can the AI go about extracting the topics of documents? And how does it work?

A

One way to do so is by looking at term frequency, which is simply counting how many times a term appears in a document.

34
Q

words that are used for syntactic purposes in the sentence and not as independent units of meaning, such as “am”, “do”, “is”, “with”, “the,” “by,” “and,” “which,” “yet,” etc.

A

Function Words

35
Q

words that carry meaning independently, such as “crime,” “brothers,” “demons,” “gentle,” “meek,” etc.

A

content words

36
Q

A measure of how common or rare a word is across documents in a corpus.

A

Inverse Document Frequency

37
Q

Give the equation for Inverse Document Frequency.

A
log(TotalDocuments / NumDocumentsContaining(word))

38
Q
  • A method that ranks how important each word is in a document.
  • It multiplies the term frequency of each word by the inverse document frequency, giving a value for each word.
A

Term frequency - inverse document frequency (tf-idf)
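
A minimal sketch (not part of the deck) computing tf-idf by hand with the equation above; the documents are made up, and in practice a library such as scikit-learn's TfidfVectorizer does the same job.

import math

documents = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "dogs and cats make good pets".split(),
]

def tf_idf(word, doc, docs):
    tf = doc.count(word)                            # term frequency in this document
    containing = sum(1 for d in docs if word in d)  # documents containing the word
    idf = math.log(len(docs) / containing)          # inverse document frequency
    return tf * idf

print(tf_idf("the", documents[0], documents))  # common word -> low score
print(tf_idf("mat", documents[0], documents))  # rare word -> higher score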

39
Q

The task of extracting knowledge from documents. This task can take the form of giving a document to the AI as input and getting a list of companies and the years when they were founded as output.

A

Information Extraction

40
Q

A database similar to a dictionary, where words are given definitions as well as broader categories.

A

WordNet

41
Q

each word is represented with a vector that consists of as many values as we have words. Except for a single value in the vector that is equal to 1, all other values are equal to 0.

A

One-hot Representation

42
Q

Give a problem or drawback of one-hot representation. What is the solution to this problem/drawback?

A
  • It becomes incredibly inefficient as we represent more words (if you have 50,000 words, there will be 50,000 vectors of length 50,000 each).
  • Another problem in this kind of representation is that we are unable to represent similarity between words like “wrote” and “authored.”

Solution: we turn to the idea of Distributed Representation

43
Q

Meaning is distributed across multiple values in a vector. Each vector has a limited number of values.

It allows us to generate unique values for each word while using smaller vectors.

We are able to represent similarity between words by how different the values in their vectors are.

A

Distributed Representation
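
A minimal sketch (not part of the deck) contrasting the two representations; the vocabulary and the distributed values are invented (normally they would be learned, e.g. by word2vec).

import numpy as np

vocabulary = ["he", "wrote", "a", "book", "authored"]

# One-hot: one position per word, so vectors grow with the vocabulary,
# and "wrote" is no more similar to "authored" than to any other word.
one_hot = {word: np.eye(len(vocabulary))[i] for i, word in enumerate(vocabulary)}

# Distributed: a few real-valued dimensions per word; similar words get similar vectors.
distributed = {
    "wrote":    np.array([0.80, 0.10, -0.30]),
    "authored": np.array([0.75, 0.15, -0.25]),
    "book":     np.array([-0.20, 0.90, 0.40]),
}

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(one_hot["wrote"], one_hot["authored"]))          # 0.0: no similarity captured
print(cosine(distributed["wrote"], distributed["authored"]))  # close to 1.0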

44
Q

an algorithm for generating distributed representations of words

A

word2vec
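
A minimal training sketch, assuming the gensim library's Word2Vec (the lecture demo may instead load precomputed vectors); the toy corpus is far too small to learn good embeddings and only shows the API shape.

from gensim.models import Word2Vec

sentences = [
    "she wrote a famous book".split(),
    "he authored a short novel".split(),
    "they read the book together".split(),
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50)

print(model.wv["book"])                       # the learned 50-dimensional vector for "book"
print(model.wv.most_similar("book", topn=3))  # nearest words by cosine similarity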

45
Q

CS50 QUIZ

Consider the below context-free grammar, where S is the start symbol.

S -> NP V
NP -> N | A NP
A -> “small” | “white”
N -> “cats” | “trees”
V -> “climb” | “run”
Consider also the following four sentences.

Cats run.
Cats climb trees.
Small cats run.
Small white cats climb.

Of the four sentences above, which sentences can be derived from the context-free grammar?

A

Sentence 1, Sentence 3, and Sentence 4.
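
A quick check of this answer (not part of the quiz), assuming Python with nltk installed: only the derivable sentences produce at least one parse tree.

import nltk

grammar = nltk.CFG.fromstring("""
    S -> NP V
    NP -> N | A NP
    A -> "small" | "white"
    N -> "cats" | "trees"
    V -> "climb" | "run"
""")
parser = nltk.ChartParser(grammar)

sentences = [
    "cats run",
    "cats climb trees",
    "small cats run",
    "small white cats climb",
]

for sentence in sentences:
    trees = list(parser.parse(sentence.split()))
    print(sentence, "->", "derivable" if trees else "not derivable")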

46
Q

CS50 QUIZ

Which of the following is not a true statement?

  • Attention mechanisms can be used to determine which parts of an input sequence are most important to focus on.
  • One-hot representations of words better represent word meaning than distributed representations of words.
  • Transformers can be faster to train than recurrent neural networks because they are more easily parallelized.
  • A Naive Bayes Classifier assumes that the order of words doesn’t matter when determining how they should be classified.
A

One-hot representations of words better represent word meaning than distributed representations of words.

47
Q

CS50 QUIZ

Why is “smoothing” useful when applying Naive Bayes?

  • Smoothing allows Naive Bayes to better handle cases where evidence has never appeared for a particular category.
  • Smoothing allows Naive Bayes to better handle cases where there are many categories to classify between, instead of just two.
  • Smoothing allows Naive Bayes to be less “naive” by not assuming that evidence is conditionally independent.
  • Smoothing allows Naive Bayes to turn a conditional probability of evidence given a category into a probability of a category given evidence.
A

Smoothing allows Naive Bayes to better handle cases where evidence has never appeared for a particular category.

48
Q

CS50 QUIZ

From the phrase “must be the truth”, how many word n-grams of length 2 can be extracted?

A

3