Module 1 - chatbot and fundamentals Flashcards
A definite noun refers to a ____________ of a noun(s), while an indefinite noun refers to a ____________ of a noun(s)
A definite noun refers to a specific instance of a noun, while an indefinite noun refers to a general category of nouns
A woman –> indefinite
thereafter,
The woman –> definite
Phonetics and phonology:
how words are related to sounds that realize them
Morphology:
how words are constructed from more basic meaning units
Syntax:
how words can be put together to form correct utterances
What structural role each word plays in the sentence
What phrases are subparts of other phrases
Lexical semantics:
what words mean
sole vs soul
Compositional semantics:
how word meanings combine to form larger meanings
Pragmatics:
how situation affects interpretation of utterance
Context matters
Discourse structure:
how preceding utterances affect processing of the next utterance
Friend 1: I’m hungry.
Friend 2: Let’s go to the Fuji Gardens. (restaurant)
Friend 1: It’s a beautiful day.
Friend 2: Let’s go to the Fuji Gardens.
Morphology: How words are constructed from more basic units, called __________
How words are constructed from more basic units, called morphemes
adverb/adjective, pluralization, suffixes…
friend + ly = friendly
friend is the noun
suffix -ly turns it into an adjective (attached to an adjective, -ly forms an adverb, e.g. quick + ly = quickly)
Temporal Interpretation
a subset of Discourse
Understanding of time impacts your meaning of the sentence.
“Max fell. John pushed him”
"him" refers to Max; the pushing happened before the falling; the second sentence explains the first.
World Knowledge
a subset of discourse
What we know about the world, and what we can assume our hearer knows about the world, is intimately tied to our ability to use language
I took the fugu from the plate and ate it.
refers to the dish made from fugu, not a live fugu fish.
_ is a fundamental problem of computational linguistics. Resolving _ is a crucial goal.
ambiguity
Normalization: Stemming is
the process of reducing a word to its stem/root word.
Normalization: Lemmatization is
related to stemming; it reduces a word to its canonical form, the lemma (the dictionary form of the word).
better —> good
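The difference between the two can be sketched with toy versions. This is a minimal illustration only: the suffix list and the mini-dictionary below are hypothetical, and real stemmers and lemmatizers use far larger rule sets and vocabularies.

```python
# Toy stemmer: strips a few common suffixes by rule, with no dictionary.
SUFFIXES = ("ing", "ly", "ed", "s")  # hypothetical, illustrative rule list


def toy_stem(word: str) -> str:
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Toy lemmatizer: looks the word up in a dictionary of known forms.
LEMMAS = {"better": "good", "went": "go", "mice": "mouse"}  # hypothetical


def toy_lemmatize(word: str) -> str:
    # Fall back to the word itself when no lemma is known.
    return LEMMAS.get(word, word)


print(toy_stem("jumping"))       # jump
print(toy_stem("friendly"))      # friend
print(toy_stem("better"))        # better (no rule applies)
print(toy_lemmatize("better"))   # good (needs dictionary knowledge)
```

Stemming only chops characters, so it cannot map better to good; that mapping requires the dictionary lookup that lemmatization relies on.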
Normalization: everything else
substitution and removal
- chars set to upper/lower
- remove numbers
- remove punctuation
- etc
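A minimal sketch of these substitution and removal steps, using only the Python standard library. The function name normalize is made up for illustration, and the order of the steps is one reasonable choice, not the only one.

```python
import re
import string


def normalize(text: str) -> str:
    text = text.lower()                      # chars set to lower case
    text = re.sub(r"\d+", "", text)          # remove numbers
    # Remove ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse the leftover runs of whitespace.
    return re.sub(r"\s+", " ", text).strip()


print(normalize("Hello, World! It's 2024."))  # hello world its
```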
stop word removal
removing very common words (the, a, is, of…) that carry little meaning on their own
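Stop word removal drops very common words that carry little meaning on their own. A minimal sketch, assuming a tiny hand-written stop list (real lists are much longer and vary by library and task):

```python
# Hypothetical, deliberately short stop list for illustration.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in"}


def remove_stop_words(tokens):
    # Compare in lower case so "The" and "the" are treated alike.
    return [t for t in tokens if t.lower() not in STOP_WORDS]


print(remove_stop_words(["The", "cat", "is", "in", "the", "garden"]))
# ['cat', 'garden']
```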
Tokenization is
splitting text into smaller units called tokens, usually single words
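A minimal word-level tokenizer, sketched with a regular expression. The pattern is an assumption made for illustration: it keeps simple contractions such as Let's together and drops punctuation, which is only one of several possible tokenization policies.

```python
import re


def tokenize(text: str):
    # A run of word characters, optionally followed by an apostrophe part.
    return re.findall(r"\w+(?:'\w+)?", text)


print(tokenize("Let's go to the Fuji Gardens."))
# ["Let's", 'go', 'to', 'the', 'Fuji', 'Gardens']
```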
POS tagging
Part-of-speech tagging: the process of tagging each word in a sentence with a particular part of speech (POS) based on its position in the sentence and its context.
N-grams are the combination of
multiple words used together.
used when we want to preserve sequence info in the doc, like what word is likely to follow a given one.
n- refers to number of words together e.g. bi-gram, tri-gram.
each individual word would be called a unigram. Unigrams don't contain any sequence info because each word is taken individually.
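A short sketch of how n-grams can be built from a token list with a sliding window. The helper name ngrams is made up for this example.

```python
def ngrams(tokens, n):
    # Slide a window of width n across the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["john", "pushed", "max"]
print(ngrams(tokens, 1))  # [('john',), ('pushed',), ('max',)]
print(ngrams(tokens, 2))  # [('john', 'pushed'), ('pushed', 'max')]
print(ngrams(tokens, 3))  # [('john', 'pushed', 'max')]
```

The bigrams record which word follows which; the unigrams do not.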
vectorization is
the process of converting text into numbers - machine readable.
BOW
it is a method of __________
Bag of Words method for vectorization.
The table in the lecture showed the count of each word in each sentence. It does not keep the order of the words.
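A minimal bag-of-words sketch using whitespace tokenization and the standard library. The function name and the two example sentences are made up for illustration; they are not the lecture's table.

```python
from collections import Counter


def bag_of_words(sentences):
    tokenized = [s.lower().split() for s in sentences]
    # The vocabulary is every distinct word, in a fixed (sorted) order.
    vocab = sorted({w for toks in tokenized for w in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # One count per vocabulary word; absent words count as 0.
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


vocab, vectors = bag_of_words(["the dog bit the man", "the man bit the dog"])
print(vocab)    # ['bit', 'dog', 'man', 'the']
print(vectors)  # [[1, 1, 1, 2], [1, 1, 1, 2]]
```

The two sentences mean different things but get identical vectors, which is exactly the word-order information that bag of words throws away.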
Type of regular expression
Literals are
normal text characters
Type of regular expression
Metacharacters are
characters that have special meanings in regex:
. * + $ ? | \ ^ [ { ( )
Need escape character to use them literally.
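A small sketch of escaping with Python's re module. The price string is a made-up example. Unescaped, $ anchors to the end of the string and . matches any character, so the pattern does not behave as a literal.

```python
import re

text = "cost: $5.00"

# Unescaped: "$" means end of string, so nothing can follow it.
unescaped = re.search(r"$5.00", text)
# Escaped by hand with backslashes: matches the literal characters.
escaped = re.search(r"\$5\.00", text)
# re.escape adds the backslashes for us.
auto = re.escape("$5.00")

print(unescaped)        # None
print(escaped.group())  # $5.00
print(auto)             # \$5\.00
```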
Use of metacharacter:
a.b
period
see regex101.com
wildcard, any character except a newline
matches acb or azb or a&b
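The same check can be run in Python instead of regex101.com. A quick sketch; the list of test strings is made up for illustration.

```python
import re

pattern = re.compile(r"a.b")

# fullmatch requires the whole string to fit the pattern.
candidates = ["acb", "azb", "a&b", "a\nb", "ab"]
results = {c: bool(pattern.fullmatch(c)) for c in candidates}

for candidate, matched in results.items():
    print(repr(candidate), matched)
# 'acb' True
# 'azb' True
# 'a&b' True
# 'a\nb' False  (the period does not match a newline by default)
# 'ab' False    (the period must match exactly one character)
```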