natural language Flashcards
2 things under NLP
NLU
NLG
topics under NLU
phonology
morphology
pragmatics
syntax
semantics
what is phonology
Part of linguistics that deals with the systematic arrangement of sounds
what is morphology
Study of the internal structure of words, which are built from the smallest units of meaning, known as morphemes
2 types of morphemes
free/base morphemes
bound morphemes
what are morphemes
The smallest units of meaning in a language; a word is made up of one or more morphemes
what are free/base morphemes
Morphemes that cannot be divided further and carry meaning on their own
(e.g. table, phone)
what are bound morphemes
Morphemes that cannot stand alone; they occur only as part of a word, typically as a prefix or suffix
(e.g. un- in unhappy, -s in cats)
what are inflectional morphemes
- Change what a word does in terms of grammar but do not create a new word
- Still the same word (e.g. run, running, ran)
what are derivational morphemes
- Create a new word out of a base word
- e.g. re + act = react, act + or = actor
what is lexical
- Interpret meaning of individual words
- Assign most probable part-of-speech (PoS) tags
- Use techniques such as stemming and lemmatization
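A toy sketch of the difference between stemming and lemmatization. The suffix list and the mini-dictionary are made up for illustration; real stemmers and lemmatizers use much richer rules and vocabularies.

```python
# Toy illustration only: not a real stemmer or lemmatizer.
SUFFIXES = ("ing", "ed", "s")  # assumed, deliberately tiny suffix list

# Hypothetical mini-dictionary mapping inflected forms to their lemma.
LEMMAS = {"ran": "run", "running": "run", "better": "good", "mice": "mouse"}


def stem(word: str) -> str:
    """Chop off the first matching suffix, keeping at least 3 characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> str:
    """Look the word up in the dictionary; fall back to the word itself."""
    word = word.lower()
    return LEMMAS.get(word, word)


for w in ["running", "ran", "cats", "mice"]:
    print(w, "| stem:", stem(w), "| lemma:", lemmatize(w))
```

Stemming only cuts characters, so it can produce non-words ("runn"), while lemmatization maps to a dictionary form but only knows the words in its dictionary.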
what is syntax
- Study of the structure of phrases and sentences
- After PoS tagging is done at word level, words are grouped into phrases, and phrases into sentences
- Sentences show the structural dependencies between words
- Also known as parsing: uncovers phrases that convey more meaning than individual words
- Examines word order, stop words, morphology and PoS
- Focuses on identifying the correct PoS (e.g. in "frowns on his face", "frowns" is a noun rather than a verb)
what is semantic
- Determine the proper meaning of a sentence by using its most relevant words to derive concepts
- e.g. if a sentence mentions actor, script, rating and reviews, it is probably about a movie
- Also involves disambiguating words (e.g. bark: the sound a dog makes, or the outer layer of a tree)
- Interpret the meaning of words in the context of the sentence
- Focus on the literal meaning of words
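A minimal sketch of keyword-based topic detection and word-sense disambiguation. The cue-word sets are hand-written assumptions; a real system would learn them from data.

```python
import re
from typing import Dict, Optional, Set

# Assumed cue words per topic (the movie cues come from the card above).
TOPIC_CUES: Dict[str, Set[str]] = {
    "movie": {"actor", "script", "rating", "reviews"},
    "sport": {"goal", "team", "match", "referee"},
}

# Assumed cue words for the two senses of "bark".
SENSE_CUES: Dict[str, Set[str]] = {
    "bark (dog sound)": {"dog", "loud", "puppy", "night"},
    "bark (tree covering)": {"tree", "trunk", "oak", "rough"},
}


def best_match(sentence: str, cues: Dict[str, Set[str]]) -> Optional[str]:
    """Return the label whose cue words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {label: len(words & cue) for label, cue in cues.items()}
    label = max(scores, key=scores.get)
    return label if scores[label] > 0 else None  # None when nothing matches


print(best_match("The actor praised the script, and the reviews were kind.", TOPIC_CUES))
print(best_match("The bark of the old oak tree was rough.", SENSE_CUES))
```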
what is pragmatic
- Focus on knowledge or context that comes from outside the document itself (i.e. what the speaker implies or the listener infers): the inferred meaning
- Pragmatic ambiguity arises when different people derive different interpretations of the same text
what is an example of pragmatic vs semantic meaning
Example: “Do you know what time it is?”
- Semantic: Asking for the current time
- Pragmatic: Expressing resentment toward someone who missed the deadline
what are the text preprocessing activities
raw documents
tokenization
(case conversion, remove punctuation, normalize text, remove stop words, extract compound terms, strip special characters and noise)
data structure (features representing the text)
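The pipeline above can be sketched with the standard library only. The stop-word list is a tiny assumed sample, and compound-term extraction is left out to keep the example short.

```python
import re
from collections import Counter

# Assumed, very small stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "is", "on", "of", "and", "to", "in"}


def tokenize(raw: str) -> list:
    """Tokenization: split the raw document on whitespace."""
    return raw.split()


def clean(tokens: list) -> list:
    """Case conversion, punctuation/special-character stripping, stop-word removal."""
    cleaned = []
    for tok in tokens:
        tok = tok.lower()                    # case conversion
        tok = re.sub(r"[^a-z0-9]", "", tok)  # strip punctuation and special characters
        if tok and tok not in STOP_WORDS:    # drop empty tokens and stop words
            cleaned.append(tok)
    return cleaned


def to_features(tokens: list) -> Counter:
    """Data structure: a bag-of-words count representing the text."""
    return Counter(tokens)


raw = "The cat sat on the mat. The CAT purred!"
features = to_features(clean(tokenize(raw)))
print(features)
```

A bag of words is only one possible feature structure; it ignores word order, which later steps such as syntax analysis need.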
what does the GCP pre-trained Natural Language model support?
- Analyse Syntax
- Analyse Entities
- Analyse Sentiment
- Analyse Entity Sentiment
- Classify Content
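A hedged sketch of how these operations map onto REST requests, assuming the v1 endpoint `https://language.googleapis.com/v1/documents:<method>` and a `PLAIN_TEXT` document body. `build_request` is a hypothetical helper that only builds the URL and JSON body; it sends nothing and handles no authentication.

```python
import json

# Assumed v1 base URL of the Cloud Natural Language REST API.
BASE_URL = "https://language.googleapis.com/v1/documents"

# REST method names corresponding to the five operations on the card.
METHODS = (
    "analyzeSyntax",
    "analyzeEntities",
    "analyzeSentiment",
    "analyzeEntitySentiment",
    "classifyText",
)


def build_request(method: str, text: str) -> tuple:
    """Hypothetical helper: return (url, json_body) without sending anything."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    body = {"document": {"type": "PLAIN_TEXT", "content": text}}
    if method != "classifyText":
        # The analyze* methods accept an encoding used to compute text offsets.
        body["encodingType"] = "UTF8"
    return f"{BASE_URL}:{method}", json.dumps(body)


url, body = build_request("analyzeSentiment", "I love this phone.")
print(url)
print(body)
```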
which 2 operations does analyse syntax perform?
- Sentence Extraction
- Tokenisation
True or false: classify text needs enough tokens in the input to generate a classification
True
The classification returns the topics (content categories) found, each with a confidence level
what are score and magnitude in the GCP analyse sentiment method
score
* Indicates overall emotion
* Between -1.0 (negative) and 1.0 (positive)
* Mixed emotions can cancel out, leaving a score near 0.0
magnitude
* Indicates how much emotional content
* 0.0 to infinity
* Not normalised; each expression of emotion adds up
* Often proportional to length of document
* Essential for comparing between documents to gauge relevant amount of emotional content
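A toy model of why score and magnitude must be read together. This is not Google's algorithm: it simply assumes the document score is the mean of per-sentence scores and the magnitude is the sum of their absolute values. The thresholds in `interpret` are arbitrary assumptions.

```python
def summarise(sentence_scores: list) -> tuple:
    """Toy model: score = mean of sentence scores, magnitude = sum of |scores|."""
    score = sum(sentence_scores) / len(sentence_scores)
    magnitude = sum(abs(s) for s in sentence_scores)
    return score, magnitude


def interpret(score: float, magnitude: float,
              neutral_band: float = 0.25, min_magnitude: float = 1.0) -> str:
    """Assumed thresholds: a near-zero score is 'mixed' only if emotion is present."""
    if abs(score) < neutral_band:
        return "mixed" if magnitude >= min_magnitude else "neutral"
    return "positive" if score > 0 else "negative"


for scores in ([0.8, -0.8], [0.0, 0.1], [0.9, 0.7]):
    s, m = summarise(scores)
    print(scores, "->", interpret(s, m))
```

The first two documents both have a score near 0.0, but only the first has a high magnitude, which separates mixed emotion from no emotion.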
what are the keys in analyse entities
- type: Entity Types (UNKNOWN, PERSON, LOCATION, ORGANIZATION, EVENT, WORK_OF_ART, CONSUMER_GOOD, OTHER, PHONE_NUMBER, ADDRESS, DATE, NUMBER, PRICE)
- salience
* Importance or relevance of this entity to the entire document text
* Assists information retrieval and summarization by prioritizing salient entities
* Scores closer to 0.0 are less important, while scores closer to 1.0 are highly important
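A minimal sketch of using salience to prioritise entities. The response below is hand-written sample data shaped like an analyse entities result (an `entities` list with `name`, `type` and `salience`); the values are invented, and the 0.1 cut-off is an assumption.

```python
# Hand-written sample data; the salience values are invented for illustration.
sample_response = {
    "entities": [
        {"name": "1937", "type": "DATE", "salience": 0.07},
        {"name": "San Francisco", "type": "LOCATION", "salience": 0.31},
        {"name": "Golden Gate Bridge", "type": "LOCATION", "salience": 0.62},
    ]
}


def top_entities(response: dict, min_salience: float = 0.1) -> list:
    """Keep entities at or above the cut-off, most salient first."""
    kept = [e for e in response["entities"] if e["salience"] >= min_salience]
    return sorted(kept, key=lambda e: e["salience"], reverse=True)


for entity in top_entities(sample_response):
    print(entity["name"], entity["type"], entity["salience"])
```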
what are the keys in analyse syntax
Keys in each token
* text
* partOfSpeech
* dependencyEdge
* lemma
what do the partOfSpeech and dependencyEdge fields contain in the analyse syntax API response
Part-of-speech and morphological information is returned within the response’s partOfSpeech field.
For each sentence in the text provided to the Natural Language API for syntactic analysis, the API constructs a dependency tree that describes the syntactic structure of that sentence. This syntactic information is returned within the response’s dependencyEdge field.
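A sketch of reading the dependency tree out of the token list. The tokens below are hand-written sample data for "The cat sleeps", shaped like the token keys listed above; it assumes each `dependencyEdge` holds a `headTokenIndex` and a `label`, and that the root token points at its own index.

```python
# Hand-written sample tokens for "The cat sleeps"; not a real API response.
sample_tokens = [
    {"text": {"content": "The"}, "partOfSpeech": {"tag": "DET"},
     "dependencyEdge": {"headTokenIndex": 1, "label": "DET"}, "lemma": "The"},
    {"text": {"content": "cat"}, "partOfSpeech": {"tag": "NOUN"},
     "dependencyEdge": {"headTokenIndex": 2, "label": "NSUBJ"}, "lemma": "cat"},
    {"text": {"content": "sleeps"}, "partOfSpeech": {"tag": "VERB"},
     "dependencyEdge": {"headTokenIndex": 2, "label": "ROOT"}, "lemma": "sleep"},
]


def describe(tokens: list) -> list:
    """List each token's dependency edge as 'word --LABEL--> head'."""
    lines = []
    for i, tok in enumerate(tokens):
        edge = tok["dependencyEdge"]
        word = tok["text"]["content"]
        if edge["headTokenIndex"] == i:
            # Assumption: the root token's head index is its own index.
            lines.append(f"{word} is the ROOT")
        else:
            head = tokens[edge["headTokenIndex"]]["text"]["content"]
            lines.append(f"{word} --{edge['label']}--> {head}")
    return lines


for line in describe(sample_tokens):
    print(line)
```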