B02 Text Analytics I Flashcards

1
Q

Define Text Analytics

A

Text Analytics is the process of extracting high-quality insights from textual data. It is sometimes referred to as text mining.

2
Q

Define this type of Text Analytics: Search and information retrieval

A

Search and information retrieval covers indexing, searching, and retrieving documents from large text databases with keyword queries.
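
As a rough illustration of indexing and keyword search, here is a minimal sketch of an inverted index in Python. The language choice, the function names `build_index` and `search`, and the three sample documents are all assumptions made for this example; real retrieval systems add ranking, stemming, and much more.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every keyword in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical mini-collection, for illustration only.
docs = {
    1: "text mining extracts insights from text",
    2: "search engines index documents",
    3: "keyword search retrieves documents from an index",
}
index = build_index(docs)
print(sorted(search(index, "search index")))  # documents containing both keywords
```

The index is built once, so each query only needs set intersections instead of a scan over every document.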

3
Q

Define this type of Text Analytics: Document Clustering

A

Document clustering uses an unsupervised machine learning approach to group similar documents into clusters.

4
Q

Define this type of Text Analytics: Document Classification

A

Document classification assigns labels from a known set to untagged documents, using a model learned from documents with known labels.

5
Q

Define this type of Text Analytics: Information Extraction

A

The goal of information extraction is to construct (or extract) structured data, such as names, places and organizations, from unstructured data.
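
A simplified, pattern-based sketch of the idea in Python follows. It pulls email addresses and ISO-style dates out of free text; extracting names, places, and organizations usually needs a trained model, so the patterns `EMAIL` and `DATE`, the function `extract`, and the sample sentence are assumptions chosen only to keep the example small.

```python
import re

# Hypothetical unstructured input, for illustration only.
text = (
    "Contact Ana at ana@example.com before 2024-03-15. "
    "Ben (ben@example.org) replied on 2024-04-02."
)

# Deliberately simple patterns; they do not cover every valid email or date.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

def extract(text):
    """Return the structured fields found in a piece of free text."""
    return {"emails": EMAIL.findall(text), "dates": DATE.findall(text)}

record = extract(text)
print(record)
```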

6
Q

Define this type of Text Analytics: Syntactic Parsing

A

Syntactic parsing uses part-of-speech (POS) tagging techniques to identify words so that they can be used in a grammatical or otherwise useful context.

7
Q

Define this type of Text Analytics: Concept Extraction

A

Concept extraction is focused on grouping words, phrases and other lexical structures into semantically similar groups in order to understand the text.

8
Q

Define Regular Expressions (Regex)

A
-Regular expressions ("regex") provide us with a concise language for describing patterns in text by using a set of metacharacters ($ * + . ? [] ^ { } | ( ) ).
-Identifying patterns and manipulating text allow us to transform textual data from unstructured or semi-structured form to structured form for analysis.
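
To make the move from semi-structured to structured text concrete, here is a minimal Python sketch. The log-line format, the pattern `LINE`, and the field names `date`, `level`, and `message` are hypothetical, chosen only for this example.

```python
import re

# One pattern with named groups: date, an upper-case level, then free text.
LINE = re.compile(r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+) (?P<message>.+)$")

# Hypothetical semi-structured input.
lines = [
    "2024-01-05 ERROR disk full",
    "2024-01-06 INFO backup done",
    "not a log line",
]

records = []
for line in lines:
    match = LINE.match(line)
    if match:  # lines that do not fit the pattern are skipped
        records.append(match.groupdict())

print(records)
```

Each matching line becomes a dictionary with the same three keys, which is the kind of structured form that can be loaded into a table for analysis.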
9
Q

The syntax for regular expressions can be grouped into four major categories:

A

1. Operators
2. Escape sequences
3. Anchors and repetitions
4. Character classes
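
The four categories can be sketched with one small Python example each. Which metacharacter is filed under which category here follows common usage and is an assumption, as are the sample strings.

```python
import re

# 1. Operators: "|" means "or", and "(?: )" groups the alternatives.
colours = re.findall(r"gr(?:a|e)y", "gray grey groy")

# 2. Escape sequences: "\." matches a literal period, not "any character".
prices = re.findall(r"\d+\.\d+", "cost 3.50 or 4x50")

# 3. Anchors and repetitions: "^" and "$" pin the match to the start and end,
#    and "{3}" requires exactly three repetitions.
is_code = bool(re.search(r"^[A-Z]{3}$", "ABC"))
not_code = bool(re.search(r"^[A-Z]{3}$", "ABCD"))

# 4. Character classes: "[0-9]" matches any single digit.
digits = re.findall(r"[0-9]+", "room 101, floor 3")

print(colours, prices, is_code, not_code, digits)
```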

10
Q

Regex

Operators:

A
11
Q

Regex

Escape Sequences

A

-Some special characters in R cannot be referenced directly.
-For example, to refer to a pattern that contains a single quote ('), we refer to the single quote by preceding it with an escape character (\).
-Some common escape sequences include: single quote (\'), double quote (\"), newline (\n), carriage return (\r), tab character (\t) and even the backslash (\\).
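
The card describes R. As an assumption for illustration, the sketch below uses Python, whose string literals accept the same escape sequences listed above. The variable names and sample strings are made up for the example.

```python
quote = 'It\'s here'          # \' keeps the single quote inside a single-quoted string
double = "She said \"hi\""    # \" keeps the double quote inside a double-quoted string
two_lines = "first\nsecond"   # \n is a newline
tabbed = "a\tb"               # \t is a tab character
backslash = "C:\\temp"        # \\ is one literal backslash

for value in (quote, double, two_lines, tabbed, backslash):
    print(value)
```

Each escape sequence is written with two characters in the source but stored as a single character in the string.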

12
Q

Regex

Anchors and Repetitions

A
13
Q

Regex

Character Classes 1

A
14
Q

Regex

Character Classes 2

A
15
Q

Hypertext Markup Language (HTML)

A

-The standard markup language used for creating web pages.
-It describes the structure of web pages by the use of tags.
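
To show how tags describe a page's structure, here is a minimal Python sketch using the standard library's `html.parser`. The class name `TagCollector` and the sample page, including the specific tags in it, are assumptions made for this example.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Record the name of every opening tag, in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

# Hypothetical page, for illustration only.
page = "<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>"

collector = TagCollector()
collector.feed(page)
print(collector.tags)
```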

16
Q

Extensible Markup Language (XML)

A

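-A markup language created with the intention of being both human-readable and machine-readable.
-Similar to HTML but with some key differences: XML tags are defined by the author rather than predefined, and XML describes data rather than how it is displayed.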
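
A minimal Python sketch of reading XML into structured records with the standard library's `xml.etree.ElementTree` follows. The `library` and `book` elements and their contents are hypothetical, invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document, for illustration only.
xml_text = """
<library>
  <book id="1"><title>Text Mining</title><year>2020</year></book>
  <book id="2"><title>Regex Basics</title><year>2018</year></book>
</library>
""".strip()

root = ET.fromstring(xml_text)

# Turn each <book> element into a plain dictionary.
books = [
    {
        "id": book.get("id"),
        "title": book.findtext("title"),
        "year": int(book.findtext("year")),
    }
    for book in root.findall("book")
]
print(books)
```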

17
Q

JavaScript Object Notation (JSON)

A

A text-based open standard designed for exchanging structured data over the web.
Often, JSON is embedded within HTML tags on a website.
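
As a small sketch of JSON embedded in an HTML tag, the Python example below pulls the JSON text out of a `script` element and parses it. The page snippet and its fields are hypothetical, and using a regular expression on HTML is a shortcut taken only to keep the example short; a real page would usually call for an HTML parser.

```python
import json
import re

# Hypothetical page fragment with JSON inside a script tag.
html = '<script type="application/json">{"name": "Ana", "tags": ["nlp", "regex"]}</script>'

# Capture everything between the opening and closing script tags.
match = re.search(r"<script[^>]*>(.*?)</script>", html, re.DOTALL)
data = json.loads(match.group(1))

print(data["name"], data["tags"])
```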
