Chapter 5 - Predictive Analytics II: Text, Web, and Social Media Flashcards by Justin Austin

What is Text Mining?

The semiautomated process of extracting patterns from large amounts of unstructured data sources.

What are the Seven (7) Application Areas of Text Mining?

Information Extraction
Topic Tracking
Summarization
Categorization
Clustering
Concept Linking
Question Answering

What are the Fourteen (1-5) Text Mining Terms we need to know?

Unstructured Data - Data that does not have a predetermined format and is stored as textual documents.
Corpus - A large and structured set of texts prepared for the purpose of conducting knowledge discovery.
Terms - Single word or phrase extracted directly from the corpus
Concepts - Features generated from a collection of documents
Stemming - Reducing inflected words to their base or root form

What are the Fourteen (6-10) Text Mining Terms we need to know?

Stop Words - Words that are filtered out prior to or after processing of natural language data.
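Synonyms and Polysemes - Synonyms are differently spelled words with the same or similar meanings; polysemes (also called homonyms) are words spelled exactly the same but with different meanings.
Tokenizing - Assigning meaning to blocks of text (known as tokens) according to the function each block performs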
Term Dictionary - Collection of terms specific to a narrow field that can be used to restrict the extracted terms within a corpus
Word Frequency - Number of times a word is found
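
As a rough illustration of how stop words and word frequency fit together, here is a minimal Python sketch. The stop-word list is a hypothetical toy list, and tokenizing is simplified to splitting text into lowercase word tokens, which is narrower than the definition on the card.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequency(text, stop_words=STOP_WORDS):
    """Count how often each non-stop word occurs in the text."""
    tokens = [t for t in tokenize(text) if t not in stop_words]
    return Counter(tokens)

freq = word_frequency("The corpus is a set of texts, and the texts form the corpus.")
print(freq.most_common(2))
```

Filtering before counting keeps very common words such as "the" from dominating the frequency table.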

What are the Fourteen (11-14) Text Mining Terms we need to know?

Part-of-Speech Tagging - Marking up the words in a text as corresponding to a particular part of speech based on a word’s definition and the context in which it is used.
Morphology - Studies the internal structure of words
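Term-By-Document Matrix (Occurrence Matrix) - Indexes the relationships between the terms and the documents
Singular Value Decomposition (Latent Semantic Indexing) - Reduces the dimensionality of the term-by-document matrix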

What does NLP Stand For and How is it Defined?

Natural Language Processing studies the problem of “understanding” the natural human language, with the view of converting depictions of human language into more formal representations that are easier for computer programs to manipulate.

What are some of the Challenges Related to NLP? (6)

Part-Of-Speech Tagging
Text Segmentation
Word Sense Disambiguation
Syntactic Ambiguity
Imperfect or Irregular Input
Speech Acts

What is Deception Detection as it Relates to Text Mining?

It is used in prediction models to differentiate deceptive statements from truthful ones

What is Part-Of-Speech Tagging?

Tokenized terms (words) are marked up with a part of speech based on each term’s definition and the context in which it is used.

What are the Three (3) Steps/Tasks for Text Mining?

Establish the Corpus - Collect all documents related to the context being studied and transform them in a manner that they are all in the same representational form for computer processing.
Create the Term-Document Matrix - Rows represent documents and columns represent terms. Relationships between the terms and documents are characterized by indices.
Extract the Knowledge - Main extraction methods are Classification, Clustering, Association, and Trend Analysis.
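
The second step can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the helper name build_tdm is hypothetical, documents are split on whitespace, and the indices are raw term counts (other weightings are possible).

```python
from collections import Counter

def build_tdm(documents):
    """Return (terms, matrix) where matrix[i][j] counts term j in document i."""
    tokenized = [doc.lower().split() for doc in documents]
    # Columns: every distinct term in the corpus, in alphabetical order.
    terms = sorted({t for tokens in tokenized for t in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One row per document; Counter returns 0 for absent terms.
        matrix.append([counts[t] for t in terms])
    return terms, matrix

docs = ["text mining finds patterns", "web mining finds links", "text text text"]
terms, tdm = build_tdm(docs)
print(terms)
for row in tdm:
    print(row)
```

Rows are documents and columns are terms, matching the layout described in the card.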

What is a TDM?

A Term-Document Matrix that indexes the relationships between terms and documents.

What is SVD?

Singular Value Decomposition reduces the overall dimensionality of the input matrix to a lower-dimensional space where each consecutive dimension represents the largest degree of variability between words and documents.
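
To make the idea of a "largest degree of variability" dimension concrete, the sketch below approximates only the first (largest) singular value and its vectors with power iteration in plain Python. The function name and the small count matrix are hypothetical; in practice a numerical library would compute the full decomposition.

```python
import math

def leading_singular_triplet(a, iterations=200):
    """Approximate the largest singular value of a non-zero matrix `a`
    and its left/right singular vectors by power iteration on A^T A."""
    rows, cols = len(a), len(a[0])
    # Positive start vector; suitable for non-negative count matrices.
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # A v
        w = [sum(a[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return u, sigma, v

# Hypothetical document-by-term count matrix (3 documents, 6 terms).
tdm = [[1, 0, 1, 1, 1, 0],
       [1, 1, 1, 0, 0, 1],
       [0, 0, 0, 0, 3, 0]]
u, sigma, v = leading_singular_triplet(tdm)
# Each document reduced to one coordinate along the strongest dimension.
print([round(x * sigma, 3) for x in u])
```

Keeping only the first few such dimensions is what reduces the matrix to a lower-dimensional space.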

What is Sentiment Analysis?

Sentiment analysis is trying to answer the question “What do people feel about a certain topic?” by digging into opinions using a variety of automated tools.

What are the Seven (7) Discrete Sentiment Analysis Applications Stated by the Author?

Voice of the Customer (VOC)
Voice of the Market (VOM)
Voice of the Employee (VOE)
Brand Management
Financial Markets
Politics
Government Intelligence

What is the Sentiment Analysis Process?

Sentiment Detection
N-P Polarity Classification
Target Identification
Collection and Aggregation

What are the Three (3) Different Elements of Sentiment Analysis?

Polarity Identification
Identifying Semantic Orientation of Sentences and Phrases
Identifying Semantic Orientation of Documents

What is Polarity Identification?

The process of classifying a sentiment under one of two opposing polarities, or locating its position along the continuum between the polarities.

What are the Two (2) Methods of Polarity Identification?

Using a lexicon as a reference library
Using a collection of training documents as the source of knowledge about the polarity of terms within a specific domain.
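
The first method can be sketched as a simple lookup. The lexicon below is a hypothetical toy example with made-up scores, and averaging the scores is just one possible way to place a text on the continuum between the two polarities.

```python
# Hypothetical toy lexicon: term -> polarity score between -1 and 1.
LEXICON = {"good": 1.0, "great": 1.0, "love": 1.0,
           "bad": -1.0, "poor": -1.0, "hate": -1.0}

def polarity(text, lexicon=LEXICON):
    """Average the lexicon scores of the known words; 0.0 if none are known."""
    scores = [lexicon[w] for w in text.lower().split() if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

def label(score):
    """Map a score on the continuum to one of the two polarities (or neutral)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for review in ["I love it", "bad bad service", "great phone but poor battery"]:
    print(review, "->", label(polarity(review)))
```

The second method would learn such scores from labeled training documents instead of a fixed lexicon.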

What is Web Mining?

The process of discovering intrinsic relationships from Web data, which are expressed in the form of textual, linkage, or usage information.

What are Web Crawlers?

Also known as spiders, Web crawlers are programs used to read through the content of a website automatically.

What is an Authoritative Page?

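A Web page identified as particularly popular on a topic based on the links that other pages make to it. Using an index based on such authority improves the search results and rankings of relevant pages.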

What is HITS?

Hyperlink-Induced Topic Search. It is a link-analysis algorithm that rates Web pages using the hyperlink information contained within them.
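
A minimal sketch of the usual hub/authority iteration is shown below, assuming a tiny hypothetical link graph. A page's authority score sums the hub scores of pages linking to it, and its hub score sums the authority scores of pages it links to; the function name and iteration count are illustrative choices.

```python
import math

def hits(links, iterations=50):
    """links maps each page to the list of pages it links to.
    Returns (hub, authority) score dictionaries."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages that link in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub: sum of authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize both score sets so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Hypothetical link graph: A and B both point to C, and C points to D.
links = {"A": ["C"], "B": ["C"], "C": ["D"]}
hub, auth = hits(links)
print("top authority:", max(auth, key=auth.get))
print("top hubs:", sorted(hub, key=hub.get, reverse=True)[:2])
```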

What is Web Structure Mining and Why is it Important?

Web Structure mining is the process of extracting useful information from the links embedded in Web documents. It is used to identify authoritative pages and hubs which are the cornerstones of page-rank algorithms relied upon by Google and other search engines.

What is SEO?

Search Engine Optimization is the intentional activity of affecting the visibility of a website in a search engine’s natural search results.

What is Clickstream Analysis?

The analysis of the information collected by Web servers, which helps us better understand user behavior. It is used to discern interesting patterns from clickstreams.

What is Social Analytics?

Monitoring, analyzing, measuring, and interpreting digital interactions and relationships of people, topics, ideas, and content.

What are the Three (3) Social Network Categories?

Connections
Distributions
Segmentation

What are the Subcategories for Connections? (5)

Homophily - Actors form ties with similar vs. dissimilar others
Multiplexity - Number of content forms contained in a tie
Mutuality/Reciprocity - How much two actors reciprocate interaction or friendship
Network Closure - Measure of the completeness of relational triads
Propinquity - Tendency to have more ties with geographically close others

What are the Subcategories for Distributions? (6)

Bridge - An individual whose weak ties fill a structural hole, providing the only link between two individuals or clusters
Centrality - Metrics that aim to quantify the importance of a particular node within a network
Density - Proportion of direct ties in a network relative to the total number possible
Distance - Minimum number of ties needed to connect two actors
Structural Holes - Absence of ties between two parts of a network
Tie Strength - Linear combination of time, emotional intensity, intimacy, and reciprocity
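
Two of these measures, density and distance, are easy to compute on a small example. The sketch below assumes an undirected network stored as a hypothetical adjacency dictionary, reads "the total" in the density definition as the total number of ties possible, and uses breadth-first search for distance.

```python
from collections import deque

def density(graph):
    """Direct ties present divided by the total number possible (undirected)."""
    n = len(graph)
    ties = sum(len(nbrs) for nbrs in graph.values()) / 2  # each tie is stored twice
    possible = n * (n - 1) / 2
    return ties / possible if possible else 0.0

def distance(graph, start, goal):
    """Minimum number of ties connecting two actors; None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, steps = queue.popleft()
        if node == goal:
            return steps
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, steps + 1))
    return None

# Hypothetical four-person chain: ann - bob - cat - dan.
network = {"ann": {"bob"}, "bob": {"ann", "cat"},
           "cat": {"bob", "dan"}, "dan": {"cat"}}
print(density(network))
print(distance(network, "ann", "dan"))
```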

What are the Subcategories for Segmentation? (3)

Cliques and Social Circles - Cliques if every individual is directly tied to every other individual; Social Circles if there is less stringency of direct contact
Clustering Coefficient - Likelihood that two associates of a node are associates themselves. A higher clustering coefficient indicates greater cliquishness
Cohesion - Degree to which actors are connected directly to each other by cohesive bonds
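
As a small illustration of the clustering coefficient, the sketch below computes the local value for one node: the share of pairs of that node's associates that are themselves directly tied. The network and the function name are hypothetical, and ties are assumed to be undirected.

```python
from itertools import combinations

def clustering_coefficient(graph, node):
    """Fraction of pairs of the node's associates that are directly tied."""
    nbrs = graph[node]
    if len(nbrs) < 2:
        return 0.0  # no pairs of associates to compare
    pairs = list(combinations(sorted(nbrs), 2))
    linked = sum(1 for a, b in pairs if b in graph[a])
    return linked / len(pairs)

# Hypothetical network: ann knows bob, cat, and dan; bob and cat know each other.
network = {"ann": {"bob", "cat", "dan"}, "bob": {"ann", "cat"},
           "cat": {"ann", "bob"}, "dan": {"ann"}}
for person in sorted(network):
    print(person, round(clustering_coefficient(network, person), 2))
```

A value of 1.0 means the node and its associates form a clique; values near 0 mean the associates are mostly strangers to one another.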

(30 cards)