MID Flashcards
What is Natural Language Processing?
The ability of a computer program to understand human language as it is spoken and written.
Various kinds of knowledge of language?
- Phonetics and Phonology, knowledge about linguistic sounds
- Morphology, knowledge of the meaningful components of words
- Syntax, knowledge of the structural relationships between words
- Semantics, knowledge of meaning
- Pragmatics, knowledge of the relationship of meaning to the goals and intentions of the speaker
- Discourse, knowledge about linguistic units larger than a single utterance
What is Ambiguity?
A single phrase can often be interpreted in more than one way.
ex: "I made her duck"
Most important models?
- state machines
- rule systems
- logic
- probabilistic models (crucial one)
- vector-space models
machine learning tools for language tasks?
- classifiers
- sequence models
What is a Regular Expression (RE)?
- standard notation or language for specifying text sequences
- a formula in a special language that is used for specifying simple classes of strings
- an algebraic notation for characterizing a set of strings
Regex search requires?
pattern and corpus
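A minimal sketch of these two ingredients, using Python's standard re module. The corpus lines and variable names are made up for illustration.

```python
import re

# Hypothetical mini-corpus: each string stands in for one line of text.
corpus = [
    "The woodchuck chucked wood.",
    "No rodents here.",
    "Two woodchucks appeared.",
]

# The pattern: here just a plain sequence of characters.
pattern = re.compile(r"woodchuck")

# Search the corpus and keep every line that contains a match.
matches = [line for line in corpus if pattern.search(line)]
print(matches)
```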
Simplest kind of regex?
a sequence of simple characters
most common anchors in regex?
- caret symbol (^), matches the start of a line
- dollar sign ($), matches the end of a line
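The two anchors can be tried out with a short Python sketch. The sample text is an assumption; note that Python's re module only treats ^ and $ as per-line anchors when the re.MULTILINE flag is set.

```python
import re

# Hypothetical three-line sample text.
text = "The dog.\nA dog barks\nthe end."

# ^The matches "The" only at the start of a line (case-sensitive).
starts = re.findall(r"^The", text, flags=re.MULTILINE)

# \.$ matches a literal period only at the end of a line.
ends = re.findall(r"\.$", text, flags=re.MULTILINE)

print(starts)
print(ends)
```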
RE: \d
any digit
ex: [0-9]
RE: \D
any non-digit
ex: [^0-9]
RE: \w
any alphanumeric or underscore
ex: [a-zA-Z0-9_]
RE: \W
any non-alphanumeric character
ex: [^\w]
RE: \s
whitespace (space, tab)
ex: [ \r\t\n\f]
RE: \S
non-whitespace
ex: [^\s]
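The shorthand classes above can be checked with a small Python sketch. The sample string is made up; in Python's re module, \w covers letters, digits, and the underscore, and on plain ASCII text the classes behave like the bracket expressions shown in the cards.

```python
import re

# Hypothetical sample containing letters, digits, an underscore,
# punctuation, spaces, and a tab.
sample = "Room_42 costs $9.50\ttoday"

digits = re.findall(r"\d", sample)      # single digit characters
words = re.findall(r"\w+", sample)      # runs of letters, digits, underscore
non_words = re.findall(r"\W", sample)   # every character that \w rejects
pieces = re.split(r"\s+", sample)       # split on runs of whitespace

print(digits, words, non_words, pieces, sep="\n")
```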
Special characters that need a backslash?
\* (asterisk) \. (period) \? (question mark) \n (newline) \t (tab)
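A short Python sketch of why the backslash matters: unescaped, characters such as . and * are operators; escaped, they match themselves. The sample text is an assumption. re.escape is a standard-library helper that adds the backslashes automatically.

```python
import re

# Hypothetical sample text.
text = "Is 2*3 = 6? Yes. 2x3 = 6 too"

# Unescaped "." matches any character, so "2.3" matches "2*3" and "2x3".
loose = re.findall(r"2.3", text)

# Escaped "\*" matches only a literal asterisk.
strict = re.findall(r"2\*3", text)

# re.escape inserts the backslash in front of the "?" for us.
question = re.findall(re.escape("6?"), text)

print(loose, strict, question, sep="\n")
```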
A regular language can be described by?
regular expressions and finite-state automata
3 standard solutions to the problem of non-determinism in finite-state automata?
- backup, whenever we come to a choice point, we put a marker to record where we were in the input and what state the automaton was in. If it turns out that we made the wrong choice, we back up and try another path.
- look-ahead, we could look ahead in the input to help us decide which path to take.
- parallelism, whenever we come to a choice point, we could look at every alternative path in parallel.
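The parallelism strategy can be sketched in Python by tracking the set of all states the automaton could be in after each input symbol. The automaton below is a hypothetical non-deterministic FSA for strings of the form b, a, one or more further a, then !; the state numbers and names are assumptions made for illustration.

```python
# Transition table of a hypothetical NFA: (state, symbol) -> set of next states.
TRANSITIONS = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},   # choice point: stay in state 2 or move on to state 3
    (3, "!"): {4},
}
START = 0
ACCEPT = {4}

def accepts(tape: str) -> bool:
    """Follow every alternative path at once by keeping a set of states."""
    current = {START}
    for symbol in tape:
        following = set()
        for state in current:
            following |= TRANSITIONS.get((state, symbol), set())
        current = following
        if not current:          # every path has died
            return False
    return bool(current & ACCEPT)

for tape in ["baa!", "baaaa!", "ba!", "baa"]:
    print(tape, accepts(tape))
```

Because all paths are followed together, no marker or backtracking is needed; the cost is that the set of active states must be carried along at every step.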
primitive operations of a regular expression?
- concatenation, the end of FSA1 is connected to the start of FSA2
- closure, the start state is connected to the end state, and the end state is connected back to the start state
- union, a new start state is connected to the start state of FSA1 and the start state of FSA2
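In regex notation the same three operations appear as juxtaposition, the | operator, and the * operator. The Python sketch below checks membership in each resulting language; the helper name in_language and the sample sub-patterns ab and cd are assumptions for illustration.

```python
import re

def in_language(pattern: str, s: str) -> bool:
    """True if the whole string s belongs to the language of pattern."""
    return re.fullmatch(pattern, s) is not None

concatenation = r"(?:ab)(?:cd)"   # a string from L1 followed by a string from L2
union = r"(?:ab)|(?:cd)"          # a string from L1 or a string from L2
closure = r"(?:ab)*"              # zero or more strings from L1

print(in_language(concatenation, "abcd"))
print(in_language(union, "cd"))
print(in_language(closure, ""))
print(in_language(closure, "ababab"))
print(in_language(union, "abcd"))
```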
Regular languages are also closed under which operations?
- intersection
- difference
- complementation
- reversal
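Three of these operations can be sketched on deterministic automata in Python: complementation swaps accepting and non-accepting states, intersection runs two automata in lockstep (the product construction), and difference combines the two. The two example automata, the tuple layout, and all function names are assumptions; the sketch requires every state to have a transition for every symbol, and reversal is left out.

```python
from itertools import product

ALPHABET = "ab"

# A DFA is (start, accepting_states, delta), with delta[(state, symbol)] -> state.
# Hypothetical DFA 1: strings with an even number of "a".
EVEN_A = (0, {0}, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1})
# Hypothetical DFA 2: strings ending in "b".
ENDS_B = (0, {1}, {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1})

def states_of(dfa):
    start, _, delta = dfa
    return {start} | {state for state, _ in delta} | set(delta.values())

def accepts(dfa, tape):
    state, accepting, delta = dfa
    for symbol in tape:
        state = delta[(state, symbol)]
    return state in accepting

def complement(dfa):
    # Same automaton, but accepting and non-accepting states are swapped.
    start, accepting, delta = dfa
    return (start, states_of(dfa) - accepting, delta)

def intersection(d1, d2):
    # Product construction: each state is a pair (state of d1, state of d2).
    (s1, acc1, t1), (s2, acc2, t2) = d1, d2
    delta = {
        ((p, q), c): (t1[(p, c)], t2[(q, c)])
        for p, q, c in product(states_of(d1), states_of(d2), ALPHABET)
    }
    accepting = {(p, q) for p in acc1 for q in acc2}
    return ((s1, s2), accepting, delta)

def difference(d1, d2):
    # L1 - L2 is L1 intersected with the complement of L2.
    return intersection(d1, complement(d2))

print(accepts(intersection(EVEN_A, ENDS_B), "aab"))
print(accepts(difference(EVEN_A, ENDS_B), "aa"))
```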
Closure is also known as?
Kleene star
process steps in NLP?
- input
- tokenization
- syntactic analysis
- semantic analysis
- pragmatic analysis
- output
What are orthographic rules?
Orthographic (spelling) rules tell us, for example, that English words ending in -y are pluralized by changing the -y to -i- and adding -es.
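The -y rule can be written as a small Python sketch. The function name pluralize is hypothetical, and the sketch adds one assumption not stated in the card: the change applies only when the -y follows a consonant (city, cities), while a vowel before the -y just takes -s (boy, boys). All other spelling rules are ignored.

```python
import re

def pluralize(noun: str) -> str:
    """Toy pluralizer covering only the -y rule; everything else gets -s."""
    # Assumption: the -y must follow a consonant for the rule to apply.
    if re.search(r"[^aeiou]y$", noun):
        return noun[:-1] + "ies"
    return noun + "s"

for noun in ["city", "butterfly", "boy", "cat"]:
    print(noun, pluralize(noun))
```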