Chapter 6 Querying Full-Text Data Flashcards
What is full-text search?
Full-text search enables you to run full-text queries against character-based data in SQL Server tables.
What are the prerequisites before you start using full-text predicates and functions?
You must create full-text indexes inside full-text catalogs. The full-text indexes are on character-based columns in your database.
What does full-text search allow you to search for?
(1) Simple Terms - One or more specific words or phrases (2) Prefix Terms - Terms the words or phrases begin with (3) Generation Terms - Inflectional forms of the words (4) Proximity Terms - words or phrases close to another word or phrase (5) Thesaurus Terms - synonyms of a word (6) Weighted Terms - words or phrases that use custom weight values (7) Statistical Semantic Search - Key phrases in a document (8) Similar Documents - similarity defined by key phrases.
On what types of columns can you create full-text indexes?
CHAR, VARCHAR, NCHAR, NVARCHAR, TEXT, NTEXT, IMAGE, XML, VARBINARY(MAX)
What do you need in order to use full-text search?
(1) Check whether the Full-Text Search is installed: SELECT SERVERPROPERTY(‘IsFullTextInstalled’) (2) Full-text indexes must be created on character data or document (binary/image/xml) columns (3) For documents, you need appropriate filters (ifilters) that allow you to extract the textual information and remove the formatting from the documents. You can check which filters are installed: EXEC sys.sp_help_fulltext_system_components ‘filter’. (4) After you install the filter pack, register the filters: EXEC sys.sp_fulltext_service ‘load_os_resources’, 1; (5) You may need to restart SQL Server.
What is a word breaker?
Identifies individual words or tokens. Tokens are inserted in a full-text index in compressed format. Word breakers are language specific.
What is a stemmer?
Stemmers generate inflectional forms of a word based on the rules of a language. Stemmers are language specific.
What are stoplists?
You can prevent indexing “noise” words by creating stoplists of stopwords. This helps to prevent bloating a full-text index with words that don’t help your searches.
You can check current stoplists and stopwords using:
SELECT stoplist_id, names
FROM sys.fulltext_stoplists;
SELECT stoplist_id, stopword, language
FROM sys.fulltext_stopwords;
What is a thesaurus file?
Full-text queries can search not only for words you provide in a query, they can search for synonyms as well. SQL Server finds synonyms in thesaurus files. Each language has an associated XML thesaurus file. You can edit the thesaurus files and configure the following elements: (1) diacritics_sensitive - 0 accent insensitive/1 accent sensitive, (2) expansion - expansion word “author” to “writer” in order to link the two, (3) replacement - “Windows 2008” could be a replacement for “Win 2k8”. After you edit the thesaurus file for a specific language, you must load it with the following stored procedure call: EXEC sys.sp_fulltext_load_thesaurus_file 1033; (1033 is the language id - US-EN).
Can full-text queries search on document properties?
Yes. Which properties can be searched depends on the document filter. You can create a search property list to define searchable properties for you documents.
Where are full-text indexes stored?
Full-text indexes are stored in full-text catalogs. A full-text catalog is a virtual object, a container for full-text indexes. As a virtual object it does not belong to any filegroup.
Can you store indexes from the same full-text catalog to different filegroups?
Yes. A full-text catalog is a virtual object only; full-text indexes are physical objects. You can store each full-text index from the same catalog to a different file group.
What is statistical semantic search?
It gives you deeper insight into documents by extracting and indexing statistically relevant key phrases. Full-text search uses these key phrases to identify and index documents that are similar or related.
Which database do you have to install in order to enable the Semantic Search feature?
You need the semanticsdb database.
How can you create synonyms for the words searched?
You can add synonyms by editing the thesaurus file.