4: Big Data Flashcards
Big Data differs from traditional data sources by a set of characteristics:
- Volume: huge quantities of data
- Variety: many different data sources and formats
- Velocity: speed at which data are created
- (Veracity): trustworthiness/dependability of the data
Preprocessing for structured data:
- Extraction
- Aggregation
- Filtration
- Selection
- Conversion
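A minimal sketch of these structured-data preprocessing steps using pandas; the file name `trades.csv` and the column names are hypothetical placeholders.

```python
import pandas as pd

# Extraction: read the raw structured data (hypothetical file and columns).
df = pd.read_csv("trades.csv", parse_dates=["trade_date"])

# Aggregation: combine rows into a coarser unit of analysis (daily totals).
daily = df.groupby("trade_date", as_index=False)["volume"].sum()

# Filtration: drop rows that are out of scope or clearly erroneous.
daily = daily[daily["volume"] > 0]

# Selection: keep only the columns (features) needed for the model.
daily = daily[["trade_date", "volume"]]

# Conversion: cast values to the appropriate types/units.
daily["volume"] = daily["volume"].astype(float)
```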
Algorithms with errors in their assumptions lead to:
High bias w/ poor approximation,
Leading to: underfitting & high in-sample error
The degree to which a model fits the training data
Bias Error
Unstable models lead to:
Picking up noise and producing high variance,
Resulting in: overfitting and high out-of-sample error
Harmonic mean of precision and recall
F1 score
Most useful in situations where the chance of rejecting the null, when it is true, is costly:
Precision; helps guard against Type 1 errors
High precision = reduced risk of Type 1 error (false positive)
Most useful in situations where the chance of accepting the null, when it is false, is costly:
Recall; helps guard against Type 2 errors
High recall = reduced risk of Type 2 error (false negative)
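A quick sketch of how precision, recall, and their harmonic mean (F1) follow from confusion-matrix counts; the counts passed in below are made up for illustration.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and their harmonic mean (F1)."""
    precision = tp / (tp + fp)  # high precision -> fewer Type 1 errors (false positives)
    recall = tp / (tp + fn)     # high recall -> fewer Type 2 errors (false negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example with made-up counts:
print(precision_recall_f1(tp=40, fp=10, fn=20))  # (0.8, 0.666..., 0.727...)
```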
Training a model such that noise or randomness is mistaken for patterns and relationships leads to:
Overfitting
The Receiver Operating Characteristic (ROC) curve shows the tradeoff between:
False positive rate & true positive rate
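A sketch of that tradeoff: sweep a classification threshold and compute the true positive rate and false positive rate at each cutoff. The scores, labels, and thresholds below are illustrative.

```python
def roc_points(scores, labels, thresholds):
    """Return (false_positive_rate, true_positive_rate) pairs for each threshold."""
    points = []
    for t in thresholds:
        preds = [1 if s >= t else 0 for s in scores]
        tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
        fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
        fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
        tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
        tpr = tp / (tp + fn) if (tp + fn) else 0.0
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        points.append((fpr, tpr))
    return points

# Lowering the threshold raises both TPR (good) and FPR (bad) -- the ROC tradeoff.
print(roc_points([0.9, 0.7, 0.4, 0.2], [1, 1, 0, 0], thresholds=[0.8, 0.5, 0.1]))
# [(0.0, 0.5), (0.0, 1.0), (1.0, 1.0)]
```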
Root mean squared error (RMSE) is used when:
The target variable is continuous
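A short sketch of RMSE for a continuous target; the actual and predicted values are illustrative.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: average squared error, then square root."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

print(rmse([3.0, 5.0, 2.5], [2.5, 5.0, 3.5]))  # ~0.645
```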
When there are unequal class distributions in the data set, the best measure of accuracy is:
F1; the harmonic mean of precision & recall
Bag of words (BOW) is used in feature selection, after the text has been cleansed and normalized, so the BOW is concise. Text cleansing & wrangling have already been completed, so the BOW already reflects:
* stemming
* lemmatization
* lowercasing
* stop-word removal
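A minimal bag-of-words sketch over tokens that have already been cleansed and normalized; the token list is illustrative.

```python
from collections import Counter

# Tokens after cleansing/normalization: lowercased, stemmed, stop words removed.
tokens = ["market", "rose", "market", "fell", "rate", "rose"]

bow = Counter(tokens)  # bag of words: token -> count, word order discarded
print(bow)             # Counter({'market': 2, 'rose': 2, 'fell': 1, 'rate': 1})
```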
Preprocessing for unstructured (text) data:
What is removed in the text processing/cleansing stage?
- HTML tags
- punctuation
- numbers
- white space
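A sketch of the cleansing step using regular expressions; the sample string is made up.

```python
import re

raw = "<p>Revenue grew 12% in 2023!</p>   "

text = re.sub(r"<[^>]+>", " ", raw)       # remove HTML tags
text = re.sub(r"[^\w\s]", " ", text)      # remove punctuation
text = re.sub(r"\d+", " ", text)          # remove numbers
text = re.sub(r"\s+", " ", text).strip()  # collapse extra white space

print(text)  # "Revenue grew in"
```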
What occurs in the text wrangling/preprocessing/normalization stage?
- stemming
- lemmatization
- lowercasing
- stop-word removal
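A toy sketch of those normalization steps in plain Python; the stop-word list and the suffix-stripping "stemmer" are simplified stand-ins (real work would typically use a library such as NLTK).

```python
STOP_WORDS = {"the", "in", "of", "a", "an", "and"}  # tiny illustrative stop list

def crude_stem(word: str) -> str:
    """Very rough stand-in for stemming/lemmatization: strip common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def normalize(cleansed_text: str) -> list[str]:
    tokens = cleansed_text.lower().split()               # lowercasing + tokenizing
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return [crude_stem(t) for t in tokens]               # stemming

print(normalize("Revenue grew strongly in emerging markets"))
# ['revenue', 'grew', 'strongly', 'emerg', 'market']
```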