Lexical Knowledge Bases Flashcards
What is a lexicon?
In computer science, a machine-readable dictionary that supports NLP functions such as part-of-speech (POS) tagging, inflection (oxen instead of oxes), and distinguishing transitive from intransitive verbs (does the verb need an object?).
What is a lexical knowledge base?
- Organizes words into senses
- Links senses via relations such as synonymy, hyponymy/hypernymy, and holonymy/meronymy
- Goes beyond a lexicon because it connects the words in a lexicon (synonyms, antonyms, hyponyms, hypernyms, holonyms/meronyms). WordNet is a lexical knowledge base.
What is a hyponym/hypernym relationship?
A hyponym is a more specific term and a hypernym is a more general one. In the dog example, a hyponym might be springer spaniel: the category narrows to one kind of dog. A hypernym would be mammal: the category widens to cover more than just dogs.
What is a meronym/holonym relationship?
A meronym is a part of something else, e.g., a wheel is part of a car, so wheel is a meronym of car. A holonym is the whole thing that other things are part of: car is a holonym of wheel.
What is a semantic network?
A knowledge base organized as a network of different types of relationships between words. A "has" relationship would be: a cat has fur. An "is a" relationship would be: a cat is a mammal.
What is ontological distance?
Uses the hypernym links in WordNet to count the steps needed to get from one word to another. The shorter the distance, the more similar the words are. The same measure can also be applied between documents, or between words in two documents.
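A minimal sketch of the step-counting idea, assuming a tiny hand-made hierarchy rather than the real WordNet data. The HYPERNYM table and the function names are hypothetical and only illustrate the counting; a breadth-first search finds the fewest steps between two words.

```python
from collections import deque

# Toy child -> parent (hypernym) links. Hypothetical data, not taken from WordNet.
HYPERNYM = {
    "springer spaniel": "dog",
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "bird": "animal",
}


def build_graph(hypernym):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, parent in hypernym.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def ontological_distance(a, b, hypernym=HYPERNYM):
    """Return the fewest hypernym/hyponym steps between a and b, or None."""
    graph = build_graph(hypernym)
    if a not in graph or b not in graph:
        return None
    queue = deque([(a, 0)])
    seen = {a}
    while queue:
        word, steps = queue.popleft()
        if word == b:
            return steps
        for neighbor in graph[word]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, steps + 1))
    return None
```

In this toy hierarchy, dog and cat are two steps apart (dog, mammal, cat), while springer spaniel and bird are four steps apart, so dog and cat count as more similar.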
Limits to this approach
Word sense. We don’t know which sense of the word chair was meant. We usually just use sense #1 - it should be right about 70% of the time.
What is monosemy?
When a word has only one meaning.
What is polysemy?
When a word has more than one possible meaning. The more common a word is, the more polysemous it tends to be.
Building or extending lexical knowledge bases
You can supplement WordNet by taking information from other sources such as dictionaries, encyclopedias, and taxonomies. Examples are Getty Vocabularies, Amazon, Urban Dictionary, and Wiktionary.
Urban Dictionary
1) Check the robots.txt file to find out whether we have permission to grab data: http://website.com/robots.txt
2) If you’re good to go, go to the sitemap XML; you can use Beautiful Soup to get all the words
Getty has a JSON API
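The two steps above can be sketched as follows. This is only an illustration under stated assumptions: the robots.txt and sitemap contents are made-up samples embedded as strings so nothing is fetched, the URLs under website.com are placeholders, and the standard library's xml.etree.ElementTree is swapped in for Beautiful Soup to keep the sketch self-contained.

```python
from urllib.robotparser import RobotFileParser
import xml.etree.ElementTree as ET

# Made-up sample robots.txt (hypothetical rules).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Made-up sample sitemap (hypothetical URLs).
SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://website.com/define/word-one</loc></url>
  <url><loc>http://website.com/define/word-two</loc></url>
  <url><loc>http://website.com/private/hidden</loc></url>
</urlset>
"""

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def allowed_urls(robots_txt, sitemap_xml, user_agent="*"):
    """Return the sitemap URLs that robots.txt lets this user agent fetch."""
    # Step 1: load the robots.txt rules.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # Step 2: read every <loc> entry from the sitemap.
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]

    # Keep only the URLs we are permitted to request.
    return [url for url in urls if parser.can_fetch(user_agent, url)]
```

Against a live site, the parser would instead be pointed at the real file with set_url() followed by read(), and the sitemap would be downloaded before parsing.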
Wiktionary can be downloaded as XML
Urban Dictionary also has an undocumented API (a lot of sites have undocumented APIs, search Stack Exchange)
Encyclopedic Resources
- Wikipedia
- IMDB
- DotDash (about.com)
- Investopedia
- International Encyclopedia of the First World War
- Internet Encyclopedia of Philosophy
Taxonomical organizations
- Curlie (formerly DMOZ)
- Sitemaps from CNN Money, Vogue, LA Times, SFGate
Applications of lexical knowledge bases
Examples:
1) Enhance search engines
- Query expansion: Adding more words to the words typed in.
- Related searches
- "More like this" function, which uses pseudorelevance feedback
2) Writing evaluation and advice
(helping children or adults learn how to write better)
- If you use "fun" three times, suggest another option
- Suggest that the writer be more specific
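As an illustration of query expansion from the search engine example, here is a minimal sketch. It assumes a small hand-written synonym table standing in for a real lexical knowledge base; SYNONYMS and expand_query are hypothetical names.

```python
# Hypothetical synonym table standing in for a lexical knowledge base.
SYNONYMS = {
    "car": ["automobile", "auto"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Add known synonyms of each typed word to the list of search terms."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for synonym in synonyms.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded
```

Calling expand_query("cheap car") keeps the two typed words and appends their synonyms, so a document that says "affordable automobile" can still match.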
Capsules
Short text for every search result
Pseudorelevance feedback
Expanding the initial query with terms taken from its own top results, so that a second search returns more results for the query
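A rough sketch of one common way to do this, under assumptions: the top-ranked results are treated as if they were relevant, and their most frequent non-query words are added to the query. The function name, the stopword list, and the parameters top_k and n_terms are all hypothetical.

```python
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "the", "of", "and", "is", "to", "in"}


def pseudo_relevance_expand(query, ranked_docs, top_k=2, n_terms=2):
    """Expand a query with frequent words from its top_k ranked results."""
    query_terms = query.lower().split()
    counts = Counter()
    # Assume the top_k results are relevant and count their words.
    for doc in ranked_docs[:top_k]:
        for token in doc.lower().split():
            if token not in STOPWORDS and token not in query_terms:
                counts[token] += 1
    new_terms = [term for term, _ in counts.most_common(n_terms)]
    return query_terms + new_terms
```

The expanded query is then run again. No user marks anything as relevant, which is why the feedback is called "pseudo".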
What is a graph?
A data structure that allows you to represent relationships. There are two main parts: vertices (nodes), where the data is stored, and edges (connections), which connect the nodes. Once you put actual values in a graph, it becomes a knowledge graph. The ontology of a graph is similar to the schema of a table: the ontology defines the schema for the graph, but as soon as there are instances in it, it becomes a knowledge graph.
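A minimal sketch of vertices and labeled edges, assuming a plain adjacency list. The Graph class, the relation names, and the instance "Whiskers" are hypothetical and only illustrate the ontology versus knowledge graph distinction.

```python
class Graph:
    """Directed graph stored as vertex -> list of (relation, vertex) pairs."""

    def __init__(self):
        self.edges = {}

    def add_edge(self, source, relation, target):
        """Add a labeled edge and make sure both vertices exist."""
        self.edges.setdefault(source, []).append((relation, target))
        self.edges.setdefault(target, [])

    def neighbors(self, vertex, relation=None):
        """Return vertices reachable from vertex, optionally by one relation."""
        return [
            target
            for rel, target in self.edges.get(vertex, [])
            if relation is None or rel == relation
        ]


# Ontology: only the types and how they relate (like a table schema).
ontology = Graph()
ontology.add_edge("Cat", "is_a", "Mammal")
ontology.add_edge("Cat", "has", "Fur")

# Knowledge graph: the same schema plus an actual instance.
knowledge_graph = Graph()
knowledge_graph.add_edge("Cat", "is_a", "Mammal")
knowledge_graph.add_edge("Cat", "has", "Fur")
knowledge_graph.add_edge("Whiskers", "instance_of", "Cat")
```

The "is_a" and "has" edges match the semantic network examples (a cat is a mammal, a cat has fur).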
Challenges
- Freshness: Is the information up to date?
- Coverage: Do we have all the information we need?
- Correctness: Is our information accurate? Correctness is always hard. What is true and correct? There has to be human validation.
You can have two out of three
Entity resolution: When there are multiple sources or entries for the same thing, we need to clean them up or decide which source will be the authoritative one.
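One simple way to approach entity resolution, sketched under assumptions: entries count as the same entity when their names match after trimming and lowercasing, and a fixed source ranking decides which entry wins. SOURCE_PRIORITY, the record fields, and resolve are hypothetical; real systems usually need fuzzier matching.

```python
# Hypothetical ranking: earlier sources are trusted over later ones.
SOURCE_PRIORITY = ["wiktionary", "wikipedia", "urban_dictionary"]


def resolve(records, priority=SOURCE_PRIORITY):
    """Keep one record per normalized name, preferring higher-priority sources."""
    best = {}
    for record in records:
        key = record["name"].strip().lower()  # clean the name before comparing
        current = best.get(key)
        if current is None or priority.index(record["source"]) < priority.index(
            current["source"]
        ):
            best[key] = record
    return best
```

For example, if "Dog " from urban_dictionary and "dog" from wiktionary both appear, only the wiktionary entry is kept under the key "dog".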