Chapter 2 Flashcards

1
Q

Characteristics of data warehousing

A
  1. Subject oriented
  2. Integrated
  3. Time-variant (time series)
  4. Nonvolatile
  5. Web based
  6. Relational/multi-dimensional
  7. Client/server
  8. Real-time
  9. Include metadata
2
Q

What is data?

A

A collection of facts usually obtained as the result of experiences, observations or experiments.
- the lowest level of abstraction

3
Q

Data in Analytics can be categorized into:

A
  • structured data
  • unstructured or semi-structured data
4
Q

Structured data can be categorized into:

A
  • categorical
  • numerical
5
Q

Categorical data can be categorized into:

A

  • nominal
  • ordinal

6
Q

Numerical data can be categorized into:

A
  • interval
  • ratio
7
Q

Unstructured or semi-structured data can be categorized into:

A
  • textual
  • multimedia
  • XML/JSON
8
Q

Multimedia data can be categorized into:

A
  • image
  • audio
  • video
9
Q

What are the measures of centrality?

A
  • arithmetic mean
  • median
  • mode
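
A minimal sketch of the three common measures of centrality (mean, median, mode), using Python's standard `statistics` module. The sample values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample, not taken from the flashcards.
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)      # sum of the values divided by their count
median = statistics.median(values)  # middle of the sorted values (average of the two middle ones here)
mode = statistics.mode(values)      # most frequently occurring value

print(mean, median, mode)
```

For this sample the mean (5) sits above the median (4) because the single large value 10 pulls the mean upward while leaving the median unchanged.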
10
Q

What are the measures of dispersion?

A
  • range
  • variance
  • standard deviation
  • mean absolute deviation
  • quartiles
  • box plots
  • shape of a distribution: skewness, kurtosis
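
A rough sketch of several of the dispersion measures above, using only Python's standard library. The sample is hypothetical, the variance and standard deviation shown are the population versions (an assumption; sample versions divide by n - 1), and `statistics.quantiles` needs Python 3.8 or later.

```python
import statistics

# Hypothetical sample, not taken from the flashcards.
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)   # largest value minus smallest value
variance = statistics.pvariance(values)   # population variance: mean squared deviation from the mean
std_dev = statistics.pstdev(values)       # population standard deviation: square root of the variance

mean = statistics.mean(values)
# Mean absolute deviation: average distance of each value from the mean.
mad = sum(abs(x - mean) for x in values) / len(values)

# Quartiles: the three cut points that split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1                             # interquartile range, the width of a box plot's box

print(value_range, variance, std_dev, mad, (q1, q2, q3), iqr)
```

The quartile values depend on the interpolation method; `statistics.quantiles` defaults to the "exclusive" method, so other tools may report slightly different cut points for the same data.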
11
Q

Define data visualization

A

the use of visual representations to explore, make sense of, and communicate data

12
Q

What is the role of dashboards?

A

they provide visual displays of important information that is consolidated and arranged on a single screen

13
Q

What are the best practices in dashboard design?

A
  1. Benchmark KPIs with industry standards
  2. Wrap metrics with contextual metadata
  3. Validate the design with a usability specialist
  4. Prioritize and rank alerts and exceptions
  5. Pick the right visual constructs
  6. Provide guided analytics
14
Q

What are types of Information Retrieval?

A
  • Document Matching
  • Link Analysis
  • Search Engines
15
Q

What are types of Web Mining?

A
  • Web content mining
  • Web structure mining
  • Web usage mining
16
Q

What are types of data mining?

A
  • Classification
  • Clustering
  • Association
17
Q

What are types of Natural Language Processing?

A
  • POS Tagging
  • Lemmatization
  • Word Disambiguation
18
Q

What are types of Text Mining?

A
  • Web Mining
  • Data Mining
  • Information Retrieval
  • Natural Language Processing
19
Q

Why is text difficult?

A
  • often “unstructured”
  • linguistic nature intended for humans, not for computers
  • text is relatively “dirty”
  • context is important
  • goal is to turn text into feature-vector form
20
Q

What is a Document?

A

one piece of text, no matter how large or small

21
Q

What do individual tokens or terms compose?

A

a document

22
Q

What is a collection of documents called?

A

a corpus

23
Q

What are representation techniques?

A
  • bag of words
  • term frequency
  • inverse document frequency
  • TFIDF
  • N-grams
24
Q

Bag of words: what does it involve?

A

creating a “bag” or set of words from a text document

25
Q

Bag of words: what does it only consider?

A

the presence or absence of words, not their sequence
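
A small sketch of the bag-of-words idea, assuming a simple lowercase, regex-based tokenizer (the `tokenize` and `bag_of_words` helpers and the sample sentences are hypothetical). It shows that two documents with the same words in a different order produce the same bag, and how a bag becomes a presence/absence feature vector.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def bag_of_words(text):
    """Return the set of distinct words in the text; order and counts are discarded."""
    return set(tokenize(text))

doc_a = "The dog chased the cat"
doc_b = "The cat chased the dog"

bag_a = bag_of_words(doc_a)
bag_b = bag_of_words(doc_b)
same_bag = bag_a == bag_b  # word order differs, yet the bags are identical

# Presence/absence vector for doc_a over a shared, sorted vocabulary.
vocabulary = sorted(bag_a | bag_of_words("A bird sang"))
vector_a = [1 if word in bag_a else 0 for word in vocabulary]

print(same_bag, vocabulary, vector_a)
```

The two sentences mean different things, which illustrates the cost of ignoring sequence.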

26
Q

What is term frequency?

A

measure of how often a term (word) appears in a document
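
One possible way to compute term frequency, sketched under the assumption that the document is already tokenized and that counts are normalized by document length (raw counts are an equally common convention). The `term_frequency` helper and the sample sentence are hypothetical.

```python
from collections import Counter

def term_frequency(tokens):
    """Map each term to its count divided by the total number of tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}

tokens = "the cat sat on the mat".split()
tf = term_frequency(tokens)

print(tf["the"], tf["cat"])
```

Here "the" appears twice in six tokens, so it gets twice the weight of "cat", even though "cat" says more about the document's content.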

27
Q

What does inverse document frequency measure?

A

how important a term is across a collection of documents: the fewer documents a term appears in, the higher its weight

28
Q

What is TFIDF?

A

numerical statistic that combines the TF and IDF scores to reflect the importance of a term in a document within a larger collection
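
A minimal sketch of how TF and IDF combine, assuming whitespace tokenization, length-normalized TF, and the plain `log(N / df)` form of IDF without smoothing (libraries often use smoothed variants). The tiny corpus and the helper names `tf`, `idf`, and `tfidf` are hypothetical.

```python
import math

# Hypothetical three-document corpus.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the bird sat on the wire",
]
docs = [text.split() for text in corpus]

def tf(term, doc):
    """Share of the document's tokens that are this term."""
    return doc.count(term) / len(doc)

def idf(term, docs):
    """log(N / df), where df is the number of documents containing the term."""
    df = sum(1 for doc in docs if term in doc)
    if df == 0:
        return 0.0  # assumption: a term that appears nowhere gets no weight
    return math.log(len(docs) / df)

def tfidf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

common = tfidf("the", docs[0], docs)  # appears in every document
rare = tfidf("cat", docs[0], docs)    # appears in only one document

print(common, rare)
```

"the" is frequent in the first document but occurs in every document, so its IDF is zero and its score vanishes; "cat" is less frequent but unique to one document, so it scores higher.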

29
Q

What is the role of N-grams?

A

they capture local patterns and relationships between adjacent elements in a sequence
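
A short sketch of N-gram extraction over a token list, assuming whitespace tokenization. The `ngrams` helper and the sample sentence are hypothetical.

```python
def ngrams(tokens, n):
    """Return every run of n adjacent tokens, in order, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "new york is big".split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams)
print(trigrams)
```

The bigram ("new", "york") keeps the two words together as one feature, which a bag of single words would lose.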
