Information Extraction Flashcards
Information Extraction
Turn unstructured information into structured data, in order to make the information more accessible to machines and humans
Named entity recognition
Definition and Two Approaches
Labeling mentions in text with categories such as people, organizations, and locations
Handled as a supervised learning task
Handled as a sequence labeling task, much like part-of-speech tagging (e.g. with Hidden Markov Models), which models the connections between tags and words
Relation extraction
Extraction of relations between entities
Use textual patterns (as with hypernymy)
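A minimal sketch of pattern-based extraction, assuming a single Hearst-style pattern ("X such as Y, Z and W") for hypernymy. The regex and the function name `extract_hypernym_pairs` are illustrative assumptions, not something taken from the card, and a real system would use many patterns plus noun-phrase chunking.

```python
import re

# Assumed pattern: "<hypernym> such as <hyponym list>".
# Limitation: the list is matched greedily up to the next punctuation mark
# other than a comma, so trailing words in the same clause would be included.
SUCH_AS = re.compile(r"(\w+),? such as ([\w ,]+)")

# Separators inside the hyponym list: ", and ", ", ", " and ", " or ".
SEPARATOR = re.compile(r"\s*,\s*(?:and|or)\s+|\s*,\s*|\s+(?:and|or)\s+")


def extract_hypernym_pairs(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        for item in SEPARATOR.split(match.group(2)):
            item = item.strip()
            if item:
                pairs.append((item, hypernym))
    return pairs


pairs = extract_hypernym_pairs("She visited cities such as Paris, Berlin and Rome.")
```

Each pair reads "hyponym is a kind of hypernym", so the sentence above yields Paris, Berlin and Rome as instances of "cities".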
Semantic drift
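Bootstrapped patterns gradually drift away from the target relation as noisy extractions accumulate. To limit this, compute a confidence score for each pattern, based on: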
- The number of already known tuples it finds (hits & misses)
- Productivity: the overall number of tuples that the pattern produces
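One possible way to combine the two ingredients above is to multiply a pattern's precision on the known tuples by the logarithm of its productivity. This particular formula and the name `pattern_confidence` are assumptions made for illustration; the card does not say which scoring function is intended.

```python
import math


def pattern_confidence(extracted, known):
    """Score a pattern from the tuples it extracts.

    extracted: set of tuples the pattern produces (its productivity is len(extracted))
    known:     set of tuples already accepted as correct

    Assumed formula: (hits / finds) * log(finds), where hits is the number of
    extracted tuples that are already known. Everything else counts as a miss.
    """
    finds = len(extracted)
    if finds == 0:
        return 0.0
    hits = len(extracted & known)
    return (hits / finds) * math.log(finds)


known = {("Paris", "France"), ("Berlin", "Germany"), ("Rome", "Italy")}
extracted = {("Paris", "France"), ("Berlin", "Germany"), ("Rome", "Italy"), ("Apple", "Cupertino")}
score = pattern_confidence(extracted, known)  # 3 hits out of 4 finds
```

With this choice a pattern that produces only one tuple scores 0, since log(1) = 0, so rarely matching patterns are not trusted regardless of their precision.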
What information is used by named entity extraction?
- Orthographic shape (e.g. is the token capitalized?)
- Predictive tokens (e.g. a preceding "Mrs.")
- Bags of words
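The three kinds of information listed above could be turned into per-token features roughly as follows. This is a sketch under assumptions: the title list `TITLES`, the window size, and the function names are hypothetical, and a real tagger would feed such features to a trained sequence model.

```python
# Hypothetical list of predictive tokens that often precede a person name.
TITLES = {"Mr.", "Mrs.", "Ms.", "Dr."}


def word_shape(token):
    """Map characters to X (upper), x (lower), d (digit); keep anything else."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)


def token_features(tokens, i, window=2):
    """Build a feature dict for tokens[i]."""
    token = tokens[i]
    prev_token = tokens[i - 1] if i > 0 else "<s>"
    # Bag of words: unordered set of lowercased neighbours within the window.
    context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    return {
        "is_capitalized": token[:1].isupper(),
        "shape": word_shape(token),
        "prev_token": prev_token.lower(),
        "prev_is_title": prev_token in TITLES,
        "context_bag": sorted({t.lower() for t in context}),
    }


tokens = ["Yesterday", "Mrs.", "Smith", "visited", "Berlin"]
features = token_features(tokens, 2)  # features for "Smith"
```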
Bootstrapping in relation extraction
Use bootstrapping:
- Start with seed tuples
- Acquire patterns from a corpus
OR
- Start with seed patterns
- Acquire tuples
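The seed-tuple variant can be sketched as a loop that alternates between learning patterns and applying them. Everything here is a simplifying assumption for illustration: a "pattern" is just the literal text between the two entities, entities are single capitalized words, and no confidence filtering is applied, so this version would be prone to semantic drift on a real corpus.

```python
import re


def learn_patterns(corpus, tuples):
    """Collect the text found between the two members of each known tuple."""
    patterns = set()
    for sentence in corpus:
        for x, y in tuples:
            match = re.search(re.escape(x) + r"(.+?)" + re.escape(y), sentence)
            if match:
                patterns.add(match.group(1))
    return patterns


def apply_patterns(corpus, patterns):
    """Find new tuples: capitalized words on both sides of a known pattern."""
    tuples = set()
    for sentence in corpus:
        for middle in patterns:
            regex = r"([A-Z]\w+)" + re.escape(middle) + r"([A-Z]\w+)"
            for match in re.finditer(regex, sentence):
                tuples.add((match.group(1), match.group(2)))
    return tuples


def bootstrap(corpus, seed_tuples, iterations=2):
    """Alternate between acquiring patterns and acquiring tuples."""
    tuples = set(seed_tuples)
    for _ in range(iterations):
        patterns = learn_patterns(corpus, tuples)
        tuples |= apply_patterns(corpus, patterns)
    return tuples


corpus = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Berlin, the capital city of Germany, is large.",
    "Rome, the capital city of Italy, is old.",
]
result = bootstrap(corpus, {("Paris", "France")})
```

In this toy corpus the first iteration learns " is the capital of " and finds (Berlin, Germany); the second iteration learns ", the capital city of " from that new tuple and finds (Rome, Italy). Starting from seed patterns instead would simply enter the same loop at `apply_patterns`.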