Lec 6 | Language Flashcards

1
Q

It spans all tasks where the AI gets human language as input.

A

Natural Language Processing (NLP)

2
Q

the AI is given text as input and it produces a summary of the text as output.

A

automatic summarization

3
Q

the AI is given a corpus of text and the AI extracts data as output

A

information extraction

4
Q

the AI is given text and returns the language of the text as output.

A

language identification

5
Q

the AI is given a text in the origin language and it outputs the translation in the target language.

A

machine translation

6
Q

the AI is given text and it extracts the names of the entities in the text (for example, names of companies).

A

named entity recognition

7
Q

the AI is given speech and it produces the same words in text.

A

speech recognition

8
Q

the AI is given text and it needs to classify it as some type of text.

A

text classification

9
Q

where the AI needs to choose the right meaning of a word that has multiple meanings (e.g. bank means both a financial institution and the ground on the sides of a river).

A

Word Sense Disambiguation

10
Q

Sentence structure

A

Syntax

11
Q

The meaning of words or sentences

A

Semantics

12
Q

A system of rules for generating sentences in a language.

A

Formal Grammar

13
Q

The text is abstracted from its meaning to represent the structure of the sentence using formal grammar.

A

Context Free Grammar

14
Q

What do these non-terminal symbols mean?

  • N
  • V
  • NP
  • VP
  • S
  • D
  • P
  • ADJ
A
  • Noun
  • Verb
  • Noun Phrase
  • Verb Phrase
  • Sentence
  • Determiner
  • Preposition
  • Adjective
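
A minimal parsing sketch (not part of the deck), assuming Python with nltk installed; the grammar and sentence are illustrative only, built from the non-terminal symbols above.

import nltk

# A toy grammar using the symbols above: S, NP, VP, D, N, V
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> D N | N
    VP -> V | V NP
    D -> "the"
    N -> "she" | "city"
    V -> "saw"
""")

parser = nltk.ChartParser(grammar)

# Print every structure the grammar licenses for the sentence
for tree in parser.parse("she saw the city".split()):
    tree.pretty_print()
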
15
Q

A sequence of n items from a sample of text.

A

n-gram

16
Q

What n-gram is this?

the items are characters

A

character n-gram

17
Q

What n-gram is this?

the items are words

A

word n-gram

18
Q

Contiguous sequences of items from a sample of text, of length 1, 2, or 3 respectively.

3 answers

A

unigram, bigram, and trigram

19
Q

Where can we use/implement n-grams?

A

It is useful for text processing.

Since some words occur together more often than others, it is possible to predict the next word with some probability. A helpful step in natural language processing is breaking a sentence into n-grams.

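A small sketch (not part of the deck) of extracting word n-grams with nltk's ngrams helper, assuming nltk is installed; the sample sentence is made up.

from nltk.util import ngrams

tokens = "the big cat sat on the mat".split()

# Word trigrams: every contiguous sequence of 3 tokens
for gram in ngrams(tokens, 3):
    print(gram)  # ('the', 'big', 'cat'), ('big', 'cat', 'sat'), ...
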
20
Q

the task of splitting a sequence of characters into pieces (tokens).

A

Tokenization

21
Q

Tokens can be words as well as sentences, in which case the tasks are called what?

A

word tokenization or sentence tokenization

22
Q

What challenges do we face when splitting text into words? How do we deal with them?

A

Words with apostrophes (e.g. “o’clock”) and hyphens (e.g. “pearl-grey”). Additionally, some punctuation is important for sentence structure, like periods. Dealing with these questions is the process of tokenization.
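
A minimal sketch (not part of the deck) of word and sentence tokenization with nltk, assuming it is installed along with its "punkt" tokenizer data; the example text is made up.

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download("punkt", quiet=True)  # tokenizer models, fetched once

text = "It's 5 o'clock. The pearl-grey sky is clearing."

print(sent_tokenize(text))  # sentence tokenization
print(word_tokenize(text))  # word tokenization; note how apostrophes and hyphens are handled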

23
Q
  • Consists of nodes, the value of each of which has a probability distribution based on a finite number of previous nodes.
  • Can be used to generate text.
A

Markov Models

24
Q

How do we use Markov Models?

A

We train the model on a text, and then establish probabilities for every n-th token in an n-gram based on the n − 1 words preceding it.
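
A minimal generation sketch, assuming the markovify library (any Markov-chain text library would do) and a placeholder training file corpus.txt.

import markovify

# Read the training text
with open("corpus.txt") as f:
    text = f.read()

# Build a Markov model: each token's probability depends on the tokens preceding it
model = markovify.Text(text)

# Sample new sentences from the learned probabilities
for _ in range(5):
    print(model.make_sentence())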

25
Q
  • A model that represents text as an unordered collection of words.
  • This model ignores syntax and considers only the meanings of the words in the sentence.
  • This approach is helpful in some classification tasks, such as sentiment analysis.
A

Bag-of-Words Model

26
Q

This can be used, for instance, in product reviews, categorizing reviews as positive or negative.

A

Sentiment Analysis

27
Q

a technique that can be used for sentiment analysis with the bag-of-words model.

A

Naive Bayes
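
A minimal sketch (not part of the deck) of bag-of-words sentiment classification with nltk's Naive Bayes classifier, assuming nltk is installed; the tiny training set is invented.

import nltk

def features(sentence):
    # Bag-of-words: record only which words appear, not their order
    return {word: True for word in sentence.lower().split()}

train = [
    (features("my grandson loved it"), "positive"),
    (features("great purchase and fun to play with"), "positive"),
    (features("it broke after a week"), "negative"),
    (features("kind of cheap and flimsy"), "negative"),
]

classifier = nltk.NaiveBayesClassifier.train(train)
print(classifier.classify(features("this was a fun purchase")))  # likely "positive"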

28
Q

What problem might we run into when using Naive Bayes? And how do we solve this problem?

A

One problem that we can run into is that some words may never appear in a certain type of sentence. One way to address this problem is with Additive Smoothing.

29
Q

It is where we add a value α to each value in our distribution to smooth the data. This way, even if a certain value is 0, by adding α to it we won’t be multiplying the whole probability for a positive or negative sentence by 0.

A

Additive Smoothing

30
Q

A specific type of additive smoothing that adds 1 to each value in our distribution, pretending that all values have been observed at least once.

A

Laplace Smoothing
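
The usual formulation of the smoothed estimate (not part of the deck; |Vocabulary| is the number of distinct words):

P(word | class) = (Count(word, class) + α) / (TotalWords(class) + α × |Vocabulary|)

With Laplace smoothing α = 1, so a word that never appeared with a class still gets a small nonzero probability instead of zeroing out the whole product.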

31
Q

the task of finding relevant documents in response to a user query.

A

Information Retrieval

32
Q
  • We use this to achieve Information Retrieval.
  • Models for discovering the topics for a set of documents.
A

Topic Modeling

33
Q

How can the AI go about extracting the topics of documents? And how does it work?

A

One way to do so is by looking at term frequency, which is simply counting how many times a term appears in a document.

34
Q

words that are used for syntactic purposes in the sentence and not as independent units of meaning, such as “am”, “do”, “is”, “with”, “the,” “by,” “and,” “which,” “yet,” etc.

A

Function Words

35
Q

words that carry meaning independently, such as “crime,” “brothers,” “demons,” “gentle,” “meek,” etc.

A

content words

36
Q

A measure of how common or rare a word is across documents in a corpus.

A

Inverse Document Frequency

37
Q

Give the equation for Inverse Document Frequency.

A
log(TotalDocuments / NumDocumentsContaining(word))

38
Q
  • A method that ranks how important each word is in a document.
  • It multiplies the term frequency of each word by the inverse document frequency, giving a value for each word.
A

Term frequency - inverse document frequency (tf-idf)
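
A minimal sketch (not part of the deck) computing tf-idf by hand with the equation above; the documents are made up, and in practice a library such as scikit-learn's TfidfVectorizer does the same job.

import math

documents = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "dogs and cats make good pets".split(),
]

def tf_idf(word, doc, docs):
    tf = doc.count(word)                            # term frequency in this document
    containing = sum(1 for d in docs if word in d)  # documents containing the word
    idf = math.log(len(docs) / containing)          # inverse document frequency
    return tf * idf

print(tf_idf("the", documents[0], documents))  # common word -> low score
print(tf_idf("mat", documents[0], documents))  # rare word -> higher score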

39
Q

The task of extracting knowledge from documents. This task can take the form of giving a document to the AI as input and getting a list of companies and the years when they were founded as output.

A

Information Extraction

40
Q

A database similar to a dictionary, where words are given definitions as well as broader categories.

A

WordNet

41
Q

each word is represented with a vector that consists of as many values as we have words. Except for a single value in the vector that is equal to 1, all other values are equal to 0.

A

One-hot Representation

42
Q

Give a problem or drawback of one-hot representation. What is the solution to this problem/drawback?

A
  • It becomes incredibly inefficient as we represent more words (if you have 50,000 words, there will be 50,000 vectors of length 50,000 each).
  • Another problem in this kind of representation is that we are unable to represent similarity between words like “wrote” and “authored.”

Solution: we turn to the idea of Distributed Representation

43
Q

Meaning is distributed across multiple values in a vector. Each vector has a limited number of values.

It allows us to generate unique values for each word while using smaller vectors.

We are able to represent similarity between words by how different the values in their vectors are.

A

Distributed Representation
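
A minimal sketch (not part of the deck) contrasting the two representations; the vocabulary and the distributed values are invented (normally they would be learned, e.g. by word2vec).

import numpy as np

vocabulary = ["he", "wrote", "a", "book", "authored"]

# One-hot: one position per word, so vectors grow with the vocabulary,
# and "wrote" is no more similar to "authored" than to any other word.
one_hot = {word: np.eye(len(vocabulary))[i] for i, word in enumerate(vocabulary)}

# Distributed: a few real-valued dimensions per word; similar words get similar vectors.
distributed = {
    "wrote":    np.array([0.80, 0.10, -0.30]),
    "authored": np.array([0.75, 0.15, -0.25]),
    "book":     np.array([-0.20, 0.90, 0.40]),
}

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(one_hot["wrote"], one_hot["authored"]))          # 0.0: no similarity captured
print(cosine(distributed["wrote"], distributed["authored"]))  # close to 1.0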

44
Q

an algorithm for generating distributed representations of words

A

word2vec
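
A minimal training sketch, assuming the gensim library's Word2Vec (the lecture demo may instead load precomputed vectors); the toy corpus is far too small to learn good embeddings and only shows the API shape.

from gensim.models import Word2Vec

sentences = [
    "she wrote a famous book".split(),
    "he authored a short novel".split(),
    "they read the book together".split(),
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50)

print(model.wv["book"])                       # the learned 50-dimensional vector for "book"
print(model.wv.most_similar("book", topn=3))  # nearest words by cosine similarity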

45
Q

CS50 QUIZ

Consider the below context-free grammar, where S is the start symbol.

S -> NP V
NP -> N | A NP
A -> “small” | “white”
N -> “cats” | “trees”
V -> “climb” | “run”
Consider also the following four sentences.

Cats run.
Cats climb trees.
Small cats run.
Small white cats climb.

Of the four sentences above, which sentences can be derived from the context-free grammar?

A

Sentence 1, Sentence 3, and Sentence 4.
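
A quick check of this answer (not part of the quiz), assuming Python with nltk installed: only the derivable sentences produce at least one parse tree.

import nltk

grammar = nltk.CFG.fromstring("""
    S -> NP V
    NP -> N | A NP
    A -> "small" | "white"
    N -> "cats" | "trees"
    V -> "climb" | "run"
""")
parser = nltk.ChartParser(grammar)

sentences = [
    "cats run",
    "cats climb trees",
    "small cats run",
    "small white cats climb",
]

for sentence in sentences:
    trees = list(parser.parse(sentence.split()))
    print(sentence, "->", "derivable" if trees else "not derivable")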

46
Q

CS50 QUIZ

Which of the following is not a true statement?

  • Attention mechanisms can be used to determine which parts of an input sequence are most important to focus on.
  • One-hot representations of words better represent word meaning than distributed representations of words.
  • Transformers can be faster to train than recurrent neural networks because they are more easily parallelized.
  • A Naive Bayes Classifier assumes that the order of words doesn’t matter when determining how they should be classified.
A

One-hot representations of words better represent word meaning than distributed representations of words.

47
Q

CS50 QUIZ

Why is “smoothing” useful when applying Naive Bayes?

  • Smoothing allows Naive Bayes to better handle cases where evidence has never appeared for a particular category.
  • Smoothing allows Naive Bayes to better handle cases where there are many categories to classify between, instead of just two.
  • Smoothing allows Naive Bayes to be less “naive” by not assuming that evidence is conditionally independent.
  • Smoothing allows Naive Bayes to turn a conditional probability of evidence given a category into a probability of a category given evidence.
A

Smoothing allows Naive Bayes to better handle cases where evidence has never appeared for a particular category.

48
Q

CS50 QUIZ

From the phrase “must be the truth”, how many word n-grams of length 2 can be extracted?

A

3