Topic 1: Regular Expression Flashcards
what are the key components of regular expressions (RE)?
search, string, pattern, corpus
what is a regular expression?
a language for specifying text search strings.
expression used to specify a set of strings required for a particular purpose
what is a string?
a sequence of symbols.
in text-based search, a string is a sequence of alphanumeric characters
what is a pattern?
a specific sequence of characters/symbols. it is what an RE uses to search text
a regular expression search requires 2 things. what are they?
a pattern to search for, and a corpus (the text to search through)
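a minimal sketch of the two ingredients, using Python's re module. the three corpus lines are made up for illustration:

```python
import re

# Hypothetical mini-corpus: each string stands in for one line of text.
corpus = [
    "The woodchuck sat by the road.",
    "A groundhog is the same animal.",
    "Two woodchucks chuck wood.",
]

# The pattern to search for.
pattern = re.compile(r"woodchuck")

# Keep every line of the corpus that contains at least one match.
matching_lines = [line for line in corpus if pattern.search(line)]
print(matching_lines)
```

the search returns whole lines here, similar to what a grep-style tool does. other tools may return only the matched substring.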
what are the applications of regular expressions?
- testing for a pattern within a string
- selecting data in a database
- substitution (replacing matched text)
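the three uses above can be sketched in Python. the record list and the simplified email pattern are assumptions for illustration, not a real validator or a real database query:

```python
import re

# Hypothetical records standing in for rows pulled from a database.
records = ["alice@example.com", "not an email", "bob@example.org"]

# A deliberately simple pattern, not a full email validator.
email = re.compile(r"^[\w.]+@[\w.]+\.\w+$")

# 1. Test for a pattern within a string.
has_match = email.search("alice@example.com") is not None

# 2. Select only the records that match the pattern.
selected = [r for r in records if email.search(r)]

# 3. Substitution: replace every run of digits with a placeholder.
masked = re.sub(r"[0-9]+", "<NUM>", "call 555 1234 before 9")

print(has_match, selected, masked)
```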
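what are the basic patterns in RE?
- case sensitivity / disjunction of characters, e.g. [wW]oodchuck matches woodchuck or Woodchuck
- negation, e.g. [^A-Z] matches any character that is not an uppercase letter
- range, e.g. [0-9] matches any single digit
- RE symbols: ? (zero or one), * (zero or more), + (one or more)
- disjunction with | and precedence with parentheses, e.g. gupp(y|ies)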
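one possible way to try each basic pattern in Python. the sample strings are made up, and (?:...) is Python's non-capturing group, used here only so that findall returns the whole match:

```python
import re

# Case disjunction with brackets: [wW] matches either w or W.
case = re.findall(r"[wW]oodchuck", "Woodchucks and a woodchuck")

# Negation: a caret first inside brackets means "anything except these".
negation = re.findall(r"[^a-z ]", "Oh no!")

# Range: [0-9] matches any single digit.
digits = re.findall(r"[0-9]", "Chapter 12")

# ? makes the previous character optional (zero or one).
optional = re.findall(r"colou?r", "color colour")

# * means zero or more of the previous character.
star = re.findall(r"baa*!", "ba! baaa! b!")

# + means one or more of the previous character.
plus = re.findall(r"[0-9]+", "room 101, floor 3")

# Disjunction and precedence: without grouping, | splits the whole pattern.
ungrouped = re.findall(r"guppy|ies", "guppy guppies")
grouped = re.findall(r"gupp(?:y|ies)", "guppy guppies")

print(case, negation, digits, optional, star, plus, ungrouped, grouped)
```

the ungrouped pattern matches either guppy or ies, which is why grouping is needed to get guppies as a single match.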
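types of errors and their definitions
- false positive: matching a string that should not have been matched
- false negative: failing to match a string that should have been matched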
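a small sketch of both error types, assuming the goal is to find the word "the" in a made-up sentence:

```python
import re

text = "The other day the cat went there."

# Naive pattern: misses the capitalized "The" (a false negative) and
# also matches inside "other" and "there" (two false positives).
naive = re.findall(r"the", text)

# A case class fixes the false negative; word boundaries (\b) remove
# the false positives.
better = re.findall(r"\b[tT]he\b", text)

print(naive, better)
```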
what are the efforts to reduce the error rate?
- increase accuracy / precision (minimize false positives)
- increase coverage / recall (minimize false negatives)
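precision and recall can be computed from the set of matches a pattern returned and the set of matches it should have returned. a minimal sketch: the helper name precision_recall and the match positions are hypothetical:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of match positions."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Precision: what fraction of the returned matches were correct.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: what fraction of the correct matches were returned.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical character offsets in some text.
relevant = {0, 14}        # where the target word really occurs
retrieved = {5, 14, 27}   # where a naive pattern matched

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

here two of the three returned matches are false positives (low precision) and one true occurrence was missed (a false negative, lowering recall).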
what is a capture group?
the use of parentheses to store the text matched by a pattern in memory, so it can be reused later
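a short sketch of capture groups in Python, where \1 and \2 refer back to the text stored by the first and second pair of parentheses. the example strings are made up:

```python
import re

# (\w+) stores a word; \1 requires the same text to appear again.
repeated = re.compile(r"\b(\w+) \1\b")

sentence = "it was the the best of times"
match = repeated.search(sentence)
stored = match.group(1)  # the text held in the first capture group

# Reuse the stored text in a substitution to drop the repeated word.
fixed = repeated.sub(r"\1", sentence)

# Two groups can be reordered in the replacement string.
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Turing, Alan")

print(stored, fixed, swapped)
```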
what is a corpus?
a computer-readable collection of text or speech
Brown corpus?
Brown sentence?
what is an utterance?
a unit of speech bounded by silence
what are the components of disfluencies?
fragments, filled pauses
give examples of fragments and filled pauses
- fragment: main- (a broken-off word, as in "main- mainly")
- filled pauses: uh, um
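a rough sketch of spotting disfluencies with regular expressions. it assumes a transcript convention where a fragment is written with a trailing hyphen and where uh and um are the only filled pauses; real transcripts may mark these differently:

```python
import re

utterance = "I do uh main- mainly business data processing"

# Filled pauses: the whole words "uh" or "um".
filled_pauses = re.findall(r"\b(?:uh|um)\b", utterance)

# Fragments: a word cut off with a hyphen, followed by whitespace.
fragments = re.findall(r"\b\w+-(?=\s)", utterance)

# Remove both kinds of disfluency along with the space after them.
cleaned = re.sub(r"\b(?:uh|um)\b\s*|\b\w+-\s+", "", utterance)

print(filled_pauses, fragments, cleaned)
```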
definitions of word types and tokens
the number of types is the number of distinct words in a corpus. the number of tokens is the total number of running words
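counting both is a one-liner each in Python. a minimal sketch, assuming whitespace splitting is a good enough tokenizer for the made-up sentence below:

```python
text = "the cat saw the dog and the dog saw the cat"

# Tokens: every running word, repeats included.
tokens = text.lower().split()

# Types: the distinct words, so each one is counted once.
types = set(tokens)

print(len(tokens), len(types))
```

the sentence has 11 tokens but only 5 types (the, cat, saw, dog, and).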