Test 2 Flashcards
Chapters 8, 16, 17, 21
Natural Language
Unfettered spoken or written language
-Primary means of human communication
Natural Language Processing (NLP)
Enabling the use of automated methods that represent the relevant information in the text with high validity and reliability.
Patrick Suppes
-Pioneer in computerized learning
“…the challenge to psychological theory made by linguists to provide an adequate theory of language learning may well be regarded as the most significant intellectual challenge to theoretical psychology in this century.”
Bag-of-Words
A language model where text is represented as a collection of words, independent of each other and disregarding word order.
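A minimal Python sketch of the bag-of-words idea. It assumes a deliberately simple tokenizer (lowercase text, keep runs of ASCII letters and digits), which is an illustration choice rather than a standard. Two sentences with the same words in a different order produce identical bags, which is exactly the information this representation discards.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Represent text as word counts, ignoring word order."""
    # Simplifying assumption: lowercase, keep only runs of ASCII letters/digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


a = bag_of_words("Patient denies chest pain.")
b = bag_of_words("Chest pain, patient denies.")
print(a == b)        # same words, different order -> identical bags
print(a["pain"])
```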
Keyword
A word or phrase that conveys a special meaning or refers to information relevant to such a meaning.
Machine Learning
A computer technique in which information learned from data is used to improve system performance.
NLP Text Processing
- Lexical: Tokenization, part of speech, head, lemma
- Parsing and Chunking
- Semantic Tagging: Semantic role, word sense
- Certain Expressions: Named entities
- Discourse: coreference, discourse segments
NLP Speech Processing
- Phonetic transcription
- Segmentation (punctuation)
- Prosody
Types of NLP: Information Extraction
Methods that process text to capture and organize specific information in the text, as well as the relations between those pieces of information.
-Most common form in biomedicine.
Biosurveillance
A public health activity that monitors a population for occurrence of a rare disease or increased occurrence of a common one.
Named-entity Recognition
In language processing, a sub-task of information extraction that seeks to locate and classify atomic elements in text into predefined categories
Named-entity Normalization
The natural language processing method, after finding a named entity in a document, for linking (normalizing) that mention with appropriate database identifiers.
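A toy sketch of dictionary-based recognition followed by normalization, where several surface forms ("MI", "heart attack") link to one identifier. The lexicon, the categories, and the "HYP:..." identifiers are hypothetical placeholders, not codes from any real database; production systems use large terminologies and far more robust matching.

```python
import re

# Hypothetical lexicon mapping surface forms to (category, identifier).
# "HYP:..." identifiers are placeholders, not codes from any real terminology.
LEXICON = {
    "myocardial infarction": ("Disease", "HYP:0001"),
    "heart attack": ("Disease", "HYP:0001"),
    "mi": ("Disease", "HYP:0001"),
    "aspirin": ("Drug", "HYP:0002"),
    "asa": ("Drug", "HYP:0002"),
}


def recognize_and_normalize(text):
    """Return (offset, mention, category, identifier) for each lexicon match.

    Assumes ASCII text, so lowercasing does not shift character offsets.
    """
    lowered = text.lower()
    taken = [False] * len(text)
    mentions = []
    # Longer surface forms first, so a long mention is not split by a shorter one.
    for surface in sorted(LEXICON, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(surface) + r"\b", lowered):
            if any(taken[m.start():m.end()]):
                continue  # overlaps a mention already found
            for i in range(m.start(), m.end()):
                taken[i] = True
            category, identifier = LEXICON[surface]
            mentions.append((m.start(), text[m.start():m.end()], category, identifier))
    return sorted(mentions)


for mention in recognize_and_normalize("Hx of MI; started aspirin after the heart attack."):
    print(mention)
```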
Modifiers of Interest
In NLP, a term used to describe or otherwise modify a named-entity that has been recognized.
Relations Among Named Entities
A characterization of two entities in NLP with respect to the semantic nature of the relationship between them.
Reference Resolution
In NLP, recognizing that two mentions in two different textual locations refer to the same entity.
Question Answering (QA)
A computer-based process whereby a user submits a natural language question that is then automatically answered by returning a specific response.
Text Summarization
Takes one or several documents as input and produces a single, coherent text that synthesizes the main points of the input documents.
Text Generation
Methods that create coherent natural language text from structured data or from textual documents in order to satisfy a communication goal.
Machine Translation
Automatic mapping of text written in one natural language into text of another language.
Text Readability Assessment and Simplification
An application of NLP in which computational methods are used to assess the clarity of writing for a certain audience or to revise the exposition using simpler terminology and sentence construction.
Linguistic Steps in NLP: Morphology
The way words are built up from smaller, meaning-bearing units; the structure of words
- Various forms of basic words
- Build many words from a smaller set of parts.
Linguistic Steps in NLP: Syntax
How words are put together to form correct sentences and what structural role each word has.
-Syntax tree assigned by grammar or lexicon.
Linguistic Steps in NLP: Semantics
What words mean and how these meanings combine in sentences to form sentence meanings.
Linguistic Steps in NLP: Pragmatics
How sentences are used in different situations and how use affects the interpretation of the sentence.
Linguistic Steps in NLP: Discourse
How the immediately preceding sentences affect the interpretation of the next sentence.
Natural Language Understanding (NLU)
Subtopic of NLP in Artificial Intelligence that deals with machine reading comprehension.
Applications of NLP
- Intelligent computer systems
- NLU interfaces to databases
- Computer-aided instruction
- Information Retrieval
- Intelligent web searching
- Data mining
- Machine translation
- Speech Recognition
- Natural Language Generation
- Question Answering
Difficulties of NLP
- Different ways of parsing a sentence.
- Word category ambiguity
- Word sense ambiguity
- Words can mean more than the sum of their parts
- Imparting world knowledge is difficult
- Fictitious worlds
- Defining scope
- Language is changing and evolving
- Complex ways of interaction between the kinds of knowledge
- Exponential complexity at each point in using the knowledge
Ambiguity
The fundamental problem of computational linguistics
Morpheme
The smallest unit in grammar that has a meaning or linguistic function.
-Generally a root of a word, a prefix, or a suffix
Free Morpheme
A morpheme that is a word and does not contain another morpheme
Bound Morpheme
A morpheme that creates a different form of a word but must always occur with another morpheme.
Inflectional Morpheme
A morpheme that creates a different form of a word without changing the meaning or part of speech.
Derivational Morpheme
A morpheme that changes the meaning or part of speech of the word to which it is attached.
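A toy Python analyzer that contrasts free and bound morphemes and labels each suffix as inflectional or derivational. The root lexicon and suffix tables are hypothetical and tiny; real morphological analyzers need full lexicons and spelling rules (for example "happy" + "ness" becoming "happiness").

```python
# Toy suffix tables (hypothetical, far from complete).
INFLECTIONAL = {"s": "plural or 3rd person singular", "ed": "past tense", "ing": "progressive"}
DERIVATIONAL = {"ness": "adjective -> noun", "ly": "adjective -> adverb", "ful": "noun -> adjective"}

# Free morphemes the analyzer knows about (hypothetical mini-lexicon).
ROOTS = {"walk", "dark", "quick", "hope"}


def analyze(word):
    """Split a word into (root, suffix, description); None if no analysis is found."""
    if word in ROOTS:
        return (word, None, "free morpheme")
    for table, kind in ((INFLECTIONAL, "inflectional"), (DERIVATIONAL, "derivational")):
        for suffix, effect in table.items():
            root = word[: -len(suffix)]
            # The suffix is a bound morpheme: it only counts if a known root remains.
            if word.endswith(suffix) and root in ROOTS:
                return (root, suffix, f"{kind}: {effect}")
    return None


for w in ["walked", "walks", "darkness", "quickly", "hopeful", "table"]:
    print(w, analyze(w))
```

Note that "walked" stays a verb (inflection), while "darkness" turns an adjective into a noun (derivation).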
Regular Expression
A mathematical model of a set of strings, defined using characters of an alphabet and the operators concatenation, union, and closure.
-Closure (Kleene star): zero or more occurrences of an expression
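A short Python sketch using the standard `re` module to show the three operators in one pattern: `ab` is concatenation, `|` is union, and `*` is closure. The specific pattern `(ab|c)*d` is an arbitrary illustration; it describes the set of strings made of any number of "ab" or "c" pieces followed by a single "d".

```python
import re

# Concatenation: juxtaposition, e.g. "ab" is "a" followed by "b".
# Union: "|", e.g. "ab|c" matches "ab" or "c".
# Closure (Kleene star): "*", zero or more repetitions of the preceding expression.
PATTERN = re.compile(r"(ab|c)*d")


def in_language(s: str) -> bool:
    """True if the whole string belongs to the set of strings described by PATTERN."""
    return PATTERN.fullmatch(s) is not None


for s in ["d", "abd", "cabcd", "acd", "abab"]:
    print(s, in_language(s))
```

"d" is accepted because closure allows zero repetitions; "acd" is rejected because a lone "a" matches neither branch of the union.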
Lexicon
A catalogue of words in a language, usually containing syntactic information such as parts of speech, pluralization rules, etc.
Finite State Automaton
An abstract, computer-based representation of the state of some entity together with a set of actions that can transform the state.
-Collections of finite state automata can be used to model complex systems.
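A minimal deterministic finite state automaton in Python, written as an explicit transition table. The example language (unsigned decimal numbers such as "42" or "3.14") and the state names are hypothetical choices for illustration; a missing table entry means the input is rejected.

```python
# States of a hypothetical DFA for unsigned decimal numbers such as "42" or "3.14".
START, INT, DOT, FRAC = "start", "int", "dot", "frac"
ACCEPTING = {INT, FRAC}

# Transition function: (state, input class) -> next state; missing entries mean reject.
TRANSITIONS = {
    (START, "digit"): INT,
    (INT, "digit"): INT,
    (INT, "dot"): DOT,
    (DOT, "digit"): FRAC,
    (FRAC, "digit"): FRAC,
}


def char_class(c):
    """Map a character to the input class the automaton reads."""
    if c in "0123456789":
        return "digit"
    if c == ".":
        return "dot"
    return "other"


def accepts(s):
    """Run the automaton over s; accept if it ends in an accepting state."""
    state = START
    for c in s:
        state = TRANSITIONS.get((state, char_class(c)))
        if state is None:
            return False
    return state in ACCEPTING


for s in ["42", "3.14", "3.", ".5", "1.2.3"]:
    print(s, accepts(s))
```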
Tokens (NLP)
The composite entities constructed from individual characters, typically words, numbers, dates, or punctuation.
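A rough Python tokenizer that builds tokens of the four kinds named above (words, numbers, dates, punctuation) using one regular expression with named alternatives. The patterns are simplifying assumptions (for example, only dates written like 03/15/2024, and non-ASCII letters are skipped); alternatives are tried in order, so dates are checked before plain numbers.

```python
import re

# Hypothetical token patterns, tried in order at each position: dates before
# numbers so "03/15/2024" stays one token.
TOKEN_RE = re.compile(
    r"(?P<date>\d{1,2}/\d{1,2}/\d{2,4})"
    r"|(?P<number>\d+(?:\.\d+)?)"
    r"|(?P<word>[A-Za-z]+(?:'[A-Za-z]+)?)"
    r"|(?P<punct>[^\w\s])"
)


def tokenize(text):
    """Return (token, type) pairs; characters matching no pattern (e.g. spaces) are skipped."""
    return [(m.group(), m.lastgroup) for m in TOKEN_RE.finditer(text)]


print(tokenize("Seen on 03/15/2024; temp 38.5, stable."))
```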
Markov Process
A mathematical model of a set of strings in which the probability of a given symbol occurring depends on the identity of the immediately preceding symbol or the two immediately preceding symbols.
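A small Python sketch of the first-order case (the next word depends only on the immediately preceding word), estimated by relative frequency from a hypothetical three-sentence corpus. The `<s>` and `</s>` boundary markers are a common convention assumed here; a second-order model would condition on the two preceding words instead. No smoothing is applied, so unseen transitions get probability 0.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """First-order Markov (bigram) model: P(word | previous word) from relative counts."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark sentence start and end.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        model[prev] = {w: c / total for w, c in followers.items()}
    return model


def sequence_probability(model, sentence):
    """Multiply the transition probabilities along the sentence; unseen transitions give 0."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for prev, nxt in zip(words, words[1:]):
        p *= model.get(prev, {}).get(nxt, 0.0)
    return p


# Hypothetical three-sentence corpus.
model = train_bigram(["no chest pain", "no fever", "chest pain resolved"])
print(model["<s>"])
print(sequence_probability(model, "no chest pain"))
```

Here P("no chest pain") = 2/3 × 1/2 × 1 × 1/2 = 1/6.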
Lexemes
A minimal lexical unit in a language that represents different forms of the same word.
Telegraphic (NLP)
Language that does not follow the usual rules of grammar but is compact and efficient.
Grammar (NLP)
A mathematical model of a potentially infinite set of strings.
Nested Structures (NLP)
A phrase or phrases that are used in place of simpler words within other phrases.
Probabilistic Context-Free Grammar
A context-free grammar (CFG) in which the possible ways to expand a given symbol have varying probabilities rather than equal weight.
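A minimal Python sketch of how a PCFG scores a parse tree: the tree's probability is the product of the probabilities of every rule it uses. The toy grammar and the tuple encoding of trees are hypothetical; note that for each left-hand side the rule probabilities sum to 1.

```python
# Hypothetical toy PCFG: for each left-hand side, rule probabilities sum to 1.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("N",)): 0.4,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
    ("Det", ("the",)): 1.0,
    ("N", ("patient",)): 0.5,
    ("N", ("pain",)): 0.5,
    ("V", ("reports",)): 1.0,
}


def tree_probability(tree):
    """Probability of a parse tree = product of the probabilities of the rules it uses.

    A tree is (label, child, child, ...); a child that is a plain string is a word.
    Rules missing from the grammar get probability 0.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULES.get((label, rhs), 0.0)
    for c in children:
        if not isinstance(c, str):
            p *= tree_probability(c)
    return p


tree = ("S",
        ("NP", ("Det", "the"), ("N", "patient")),
        ("VP", ("V", "reports"), ("NP", ("N", "pain"))))
print(tree_probability(tree))
```

For "the patient reports pain" this is 1.0 × 0.6 × 1.0 × 0.5 × 0.7 × 1.0 × 0.4 × 0.5 = 0.042. When a sentence has several parses, the PCFG prefers the one with the highest product.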
Dependency Grammar (NLP)
A linguistic theory of syntax that is based on dependency relations between words, where one word in the sentence is independent and other words are dependent on that word.
Logic-based Semantics
A knowledge representation method based on the use of predicates.
Conceptual Graph (Semantics)
A formal notation in which knowledge is represented through explicit relationships between concepts.
Word Senses
Possible meanings of a word
Semantic Types
The categorization of words into semantic classes according to meaning.
Semantic Patterns
The study of the patterns formed by the co-occurrence of individual words in a phrase or the co-occurrence of the associated semantic types of words.
Semantic Relations
A classification of the meaning of a linguistic relationship.
Referential Expression
A sequence of one or more words that refer to a particular person, object, or event.
Coreference Chains
Provide a compact representation for encoding the words and phrases in a text that all refer to the same entity.
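A Python sketch of turning pairwise reference-resolution decisions into coreference chains. Grouping with union-find is a swapped-in technique chosen for illustration, and the mentions and links are hypothetical, written by hand rather than produced by a resolver; mentions carry a position suffix so repeated words (e.g. two "he") stay distinct.

```python
def build_chains(mentions, links):
    """Group mentions into coreference chains using union-find.

    mentions: unique mention identifiers, in text order.
    links: pairs (a, b) judged by some resolver to refer to the same entity.
    """
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent pointers to the chain representative, halving the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in links:
        parent[find(a)] = find(b)  # merge the two chains

    chains = {}
    for m in mentions:
        chains.setdefault(find(m), []).append(m)
    return list(chains.values())


# Hypothetical mentions (tagged with position to keep them unique) and pairwise links.
mentions = ["Mr. Smith@0", "he@4", "the patient@9", "his wife@15", "she@20"]
links = [("he@4", "Mr. Smith@0"), ("the patient@9", "he@4"), ("she@20", "his wife@15")]
print(build_chains(mentions, links))
```

Because linking is transitive, "the patient" joins the "Mr. Smith" chain even though it was only linked to "he".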
Parse Tree
The representation of structural relationships that results when using a grammar to analyze a given sentence.
Transition Matrix
A table of numbers giving the probability of moving from one state in a Markov model to another state, or the state that is reached in a finite-state machine, depending on the current character of the alphabet.
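A plain-Python sketch of the Markov-model sense of a transition matrix (the finite-state sense is a lookup table from state and character to next state). The three tag states and all probabilities are hypothetical. Each row sums to 1; a path's probability is its start probability times one matrix entry per step, and multiplying the matrix by itself gives two-step transition probabilities.

```python
# Hypothetical 3-state Markov model over coarse part-of-speech tags.
STATES = ["DET", "NOUN", "VERB"]
START_PROBS = [0.6, 0.3, 0.1]  # P(first state)

# T[i][j] = P(next state is STATES[j] | current state is STATES[i]); every row sums to 1.
T = [
    [0.0, 0.9, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.3, 0.2],
]


def path_probability(path):
    """Probability of a state sequence: start probability times one matrix entry per step."""
    idx = [STATES.index(s) for s in path]
    p = START_PROBS[idx[0]]
    for i, j in zip(idx, idx[1:]):
        p *= T[i][j]
    return p


def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


T2 = matmul(T, T)  # T2[i][j] = probability of reaching state j from state i in two steps
print(path_probability(["DET", "NOUN", "VERB"]))
print(T2[0][2])
```

For example, DET, NOUN, VERB has probability 0.6 × 0.9 × 0.6 = 0.324.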
Chunking (NLP)
A processing method for determining non-recursive phrases where each phrase corresponds to a specific part of speech.
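A minimal Python sketch of noun-phrase chunking over part-of-speech tags, using the simple pattern "optional determiner, any adjectives, one or more nouns". The pattern and the hand-assigned Penn Treebank-style tags (DT, JJ, NN, ...) are assumptions; in practice a tagger would supply the tags. The chunks are flat: no phrase is nested inside another.

```python
def np_chunks(tagged):
    """Return non-recursive noun-phrase chunks matching the tag pattern DT? JJ* NN+."""
    chunks, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if j < n and tagged[j][1] == "DT":       # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":    # any number of adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # NN, NNS, NNP, NNPS
            k += 1
        if k > j:  # at least one noun, so [i, k) is a chunk
            chunks.append([w for w, _ in tagged[i:k]])
            i = k
        else:
            i += 1
    return chunks


# Hypothetical tagged sentence.
sent = [("The", "DT"), ("patient", "NN"), ("reported", "VBD"),
        ("severe", "JJ"), ("chest", "NN"), ("pain", "NN"), (".", ".")]
print(np_chunks(sent))
```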