AIIM icp Search Flashcards
Crawler
a program that goes through the text and metadata to find content in the system
Search engine index
creates and organizes a database of keywords for all the different documents
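As an illustration of a keyword database, here is a minimal sketch of an inverted index, assuming documents are plain strings split on whitespace. The function name `build_index` and the sample documents are hypothetical, not taken from any particular search product.

```python
from collections import defaultdict

def build_index(docs):
    """Map each keyword to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

# Hypothetical sample documents, for illustration only.
docs = {
    "doc1": "records management policy",
    "doc2": "email retention policy",
}
index = build_index(docs)
print(sorted(index["policy"]))  # ['doc1', 'doc2']
```

A query engine can then look up a word in the index instead of rescanning every document.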
Query Engine
takes a plain-language or Boolean query and actually searches the documents for the words being sought
Human-powered directories
the directories are created by humans (e.g., Mahalo); may be subjective
hybrid search engines
uses both a query engine and human-powered directories (like Google or Yahoo)
homogeneous search
one search tool, multiple repository indexes
federated search
multiple search tools, multiple repository indexes (e.g., MERLOT)
universal search
one search tool, one repository index (disregards any other search tools or pre-existing repository indexes)
application search
built-in search within a specific application (such as email search); keeps application security in place, but is limited to that application's data
Parametric search
rules-based, fielded search. Looks at attributes (parameters) already built into the documents. Most precise, but limited to the declared fields
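A fielded search can be sketched as an exact match on declared attributes. This is a minimal sketch, assuming each document is a dictionary of fields; `parametric_search` and the field names `author` and `doc_type` are hypothetical.

```python
def parametric_search(docs, **criteria):
    """Return documents whose declared fields equal every given criterion."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

# Hypothetical documents with declared fields, for illustration only.
docs = [
    {"title": "Q1 report", "author": "Lee", "doc_type": "report"},
    {"title": "Hiring memo", "author": "Lee", "doc_type": "memo"},
    {"title": "Q2 report", "author": "Kim", "doc_type": "report"},
]
hits = parametric_search(docs, author="Lee", doc_type="report")
print([d["title"] for d in hits])  # ['Q1 report']
```

Searching on a field that was never declared (for example `department`) returns nothing, which is the limitation noted in the card.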
keyword search
a type of parametric search in which the keywords are assigned by users. May be more precise because it does not rely only on the words in the document and applies human reasoning to it. A little inflexible, since it depends on humans
semantic / pattern search (natural language search)
looks for the meaning behind the words, not just the specific word.
Statistical search
uses Bayesian probabilities to evaluate the search. It is not based on language, so it works no matter the language. Uses relevancy ranking
Concept and fuzzy search
looks at synonyms, similar spellings, etc., but can return large result lists.
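The similar-spelling part of fuzzy search can be sketched with Python's standard `difflib` module. This is only an illustration under stated assumptions: the vocabulary is hypothetical, and real search engines may use other similarity measures.

```python
import difflib

# Hypothetical index vocabulary, for illustration only.
vocabulary = ["retention", "detention", "records", "retrieval"]

# Find indexed terms spelled similarly to a misspelled query word.
# cutoff is the minimum similarity (0 to 1) for a term to be returned.
matches = difflib.get_close_matches("retension", vocabulary, n=3, cutoff=0.6)
print(matches)  # 'retention' is the closest match and is listed first
```

Lowering `cutoff` admits more loosely related terms, which is how fuzzy matching can produce large lists.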
Concept clustering
uses algorithms to profile the concepts in a document and compare them to the other documents; creates a large organization of overlapping concepts
Social search
looks at social activity to surface more relevant documents, for example looking at docs your friends have liked
Search Engine Optimization (SEO)
Works on the content rather than the index to make the documents more “findable” (ambient findability)
Effective keyword use
not just in the text, but also in the title, meta tags, and headers of HTML and XML docs
effective link-building
adding more links to make content easier to find; also plain-language URLs and image tags
use of social tools
organizational wikis, tweets, and other social tools used to highlight the best documents for people
thesaurus
manages and tracks the definitions of words and phrases and their relationships to one another in a hierarchical fashion to correlate between groups and applications
semantic networks
like a thesaurus but at a higher level, using a metadata-based infrastructure to connect search terms to other related terms
comparison techniques
going through and weeding out duplicates from results
semantic feature extraction and comparison
we use words to describe things, so if two documents use a portion of the same words, they are likely referring to the same or a similar thing
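One simple way to measure how many words two documents share is Jaccard similarity (shared words divided by all distinct words). This is a swapped-in illustration, not necessarily the measure any given product uses; `jaccard_similarity` and the sample texts are hypothetical.

```python
def jaccard_similarity(text_a, text_b):
    """Share of distinct words that two texts have in common (0.0 to 1.0)."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty texts are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)

# Hypothetical document texts, for illustration only.
a = "annual budget report for finance"
b = "finance annual budget summary"
score = jaccard_similarity(a, b)
print(score)  # 0.5 (3 shared words out of 6 distinct words)
```

A score near 1 suggests the documents describe the same thing; a score near 0 suggests they do not.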
hash code
a mathematical description of a document; if two hash codes are identical, the documents may be the same; if they differ, they are not the same document
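Hash-code comparison can be sketched with Python's standard `hashlib` module. SHA-256 is an assumption chosen for illustration; the card does not name a specific hash function, and `hash_code` is a hypothetical helper.

```python
import hashlib

def hash_code(content: bytes) -> str:
    """Return a fixed-length hexadecimal fingerprint of the content."""
    return hashlib.sha256(content).hexdigest()

# Hypothetical document contents, for illustration only.
original = b"Retention policy v1"
copy = b"Retention policy v1"
edited = b"Retention policy v2"

print(hash_code(original) == hash_code(copy))    # True: likely the same document
print(hash_code(original) == hash_code(edited))  # False: not the same document
```

Comparing short hash codes is much cheaper than comparing full documents, which is why it suits weeding out exact duplicates from results.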