Chapter 7 Flashcards
authoritative pages
Web pages that are identified as particularly popular based on links by other Web pages and directories.
clickstream analysis
The analysis of clickstream data, the records of user activity (such as page visits and clicks) that occur in the Web environment.
clustering
Partitioning a database into segments in which the members of a segment share similar qualities.
corpus
In linguistics, a large and structured set of texts (now usually stored and processed electronically) prepared for the purpose of conducting knowledge discovery.
deception detection
A way of identifying deception (intentionally propagating beliefs that are not true) in voice, text, and/or body language of humans.
hubs
One or more Web pages that provide a collection of links to authoritative pages.
hyperlink-induced topic search
(HITS) The most popular publicly known and referenced algorithm in Web mining used to discover hubs and authorities.
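The mutual reinforcement between hubs and authorities can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `hits`, the iteration count, and the toy link graph are all assumptions made for the example.

```python
from math import sqrt

def hits(links, iterations=50):
    """Toy hub/authority scoring. `links` maps a page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority score is the sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # A page's hub score is the sum of the authority scores of pages it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize both score vectors so the values stay bounded.
        a_norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical link graph: two pages that link out, two pages that are linked to.
links = {
    "directory": ["pageA", "pageB"],
    "blog": ["pageA"],
    "pageA": [],
    "pageB": [],
}
hub, auth = hits(links)
top_authority = max(auth, key=auth.get)
top_hub = max(hub, key=hub.get)
```

In this toy graph, the page with the most incoming links from good hubs ends up as the top authority, and the page that links to the most authorities ends up as the top hub.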
polarity identification
Given an opinionated piece of text, the goal is to classify the opinion as falling under one of two opposing sentiment polarities or to locate its position on the continuum between these two polarities.
At the word/term level, polarity can be identified in two ways:
1) Use a lexicon as a reference library.
2) Use a collection of training documents.
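The lexicon approach (option 1) might look like the sketch below. The lexicon here is a tiny hypothetical word list invented for illustration, not SentiWordNet or any published resource, and the names `LEXICON`, `polarity`, and `label` are assumptions.

```python
import re

# Hypothetical toy lexicon: +1.0 for positive terms, -1.0 for negative terms.
LEXICON = {
    "good": 1.0, "great": 1.0, "excellent": 1.0,
    "bad": -1.0, "poor": -1.0, "terrible": -1.0,
}

def polarity(text):
    """Return a score in [-1, 1]: the mean lexicon value of the matched terms."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    if not scores:
        return 0.0  # no opinion-bearing terms found
    return sum(scores) / len(scores)

def label(score):
    """Collapse the continuous score into a coarse polarity class."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

review_score = polarity("Great screen, excellent battery")
review_label = label(review_score)
```

The score places the text on the continuum between the two polarities, and `label` collapses it into a class. A real system would also need to handle negation, intensifiers, and context, which this sketch ignores.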
polyseme
Words, also called homonyms, that are syntactically identical (i.e., spelled exactly the same) but have different meanings (e.g., bow can mean “to bend forward,” “the front of the ship,” “the weapon that shoots arrows,” or “a kind of tied ribbon”).
search engine
A program that finds and lists Web sites or pages (designated by URLs) that match some user-selected criteria.
sentiment analysis
The technique used to detect favorable and unfavorable opinions toward specific products and services using a large number of textual data sources (e.g., customer feedback in the form of Web postings).
SentiWordNet
An extension of WordNet used for sentiment identification.
singular value decomposition
A technique, closely related to principal components analysis, that reduces the overall dimensionality of the input matrix (number of input documents by number of extracted terms) to a lower-dimensional space, where each consecutive dimension represents the largest remaining degree of variability (between words and documents).
social media analytics
The systematic and scientific way to consume the vast amount of content created by Web-based social media outlets, tools, and techniques for the betterment of an organization’s competitiveness.
social network analysis
(SNA) The mapping and measuring of relationships and information flows among people, groups, organizations, computers, and other information- or knowledge-processing entities. The nodes in the network are the people and groups, whereas the links show relationships or flows between the nodes.
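As a rough illustration of "measuring" relationships, the sketch below computes degree centrality, one common SNA measure that the definition above does not itself name. The function name, the edge list, and the people in it are hypothetical.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Fraction of the other nodes each node is directly linked to (undirected)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

# Hypothetical network: nodes are people, links are working relationships.
edges = [("Ana", "Ben"), ("Ana", "Cho"), ("Ana", "Dev"), ("Ben", "Cho")]
centrality = degree_centrality(edges)
most_connected = max(centrality, key=centrality.get)
```

A node with a score of 1.0 is linked to every other node in the network, and lower scores indicate more peripheral nodes.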