Book Notes Flashcards

Question 1

Q

What is classification?

Answer

A

Predicts, for each individual in a population, which of a set of classes that individual belongs to.

List of classes must be exhaustive and mutually exclusive.

Related tasks: scoring and class probability estimation.

Scoring model: produces, for each individual, a score representing the probability (or some other measure of likelihood) that the individual belongs to each class.
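This is a minimal sketch of classification with class probability estimation. It assumes a k-nearest-neighbor model; the card names no technique, so this is one simple choice among many. Each class's score is the share of the `k` closest labeled examples that belong to it, and the predicted class is the highest-scoring one. The customer features (age, monthly spend in €) and the `churn`/`stay` labels are hypothetical.

```python
import math
from collections import Counter


def knn_class_probabilities(train, query, k=3):
    """Score each class by the share of the k nearest labeled examples that belong to it."""
    # train is a list of (feature_vector, label) pairs; math.dist is Euclidean distance
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    counts = Counter(label for _, label in nearest)
    classes = sorted({label for _, label in train})
    return {c: counts[c] / k for c in classes}


def classify(train, query, k=3):
    """Predict the class with the highest score (ties go to the alphabetically first class)."""
    scores = knn_class_probabilities(train, query, k)
    return max(scores, key=scores.get)


# Hypothetical labeled customers: (age, monthly spend in €) -> churn / stay
labeled_customers = [
    ((25, 20), "churn"),
    ((30, 25), "churn"),
    ((28, 22), "churn"),
    ((50, 80), "stay"),
    ((55, 90), "stay"),
    ((60, 85), "stay"),
]
print(knn_class_probabilities(labeled_customers, (27, 21)))  # {'churn': 1.0, 'stay': 0.0}
print(classify(labeled_customers, (52, 82)))                 # stay
```

The classes here are exhaustive and mutually exclusive, so the scores for one individual always sum to 1.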

Question 2

Q

What is regression?

Answer

A

Attempts to estimate or predict, for each individual, the numerical value of some variable for that individual: “given x, what is y?”

Also called value estimation: has a numerical target (€, height, time, etc.)

Predicts how much something will happen (whereas classification predicts whether something will happen)
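This is a minimal sketch of regression. It assumes a straight-line relationship fitted by ordinary least squares with a single input variable; real problems usually have many inputs and may call for other model types. The model answers “given x, what is y?” with a number rather than a class. The ad-spend and sales figures are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one input variable."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Answer 'given x, what is y?' with the fitted line."""
    return intercept + slope * x


# Hypothetical data: ad spend (in €k) vs. sales (in €k)
spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]
intercept, slope = fit_simple_linear_regression(spend, sales)
print(intercept, slope)               # 1.0 2.0
print(predict(intercept, slope, 10))  # 21.0
```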

Question 3

Q

What is similarity matching?

Answer

A

Attempts to identify similar individuals (people, products, etc.) based on data known about them

Often used for making product recommendations

“Given these characteristics, who is similar to my target audience?”
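This is a minimal sketch of similarity matching. It assumes each customer is described by purchase counts per product category and compares customers with cosine similarity, which is one of several possible similarity measures. The customer names and counts are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero profile is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(target, candidates, top_n=2):
    """Rank candidate profiles by cosine similarity to the target profile."""
    scored = [(name, cosine_similarity(target, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical purchase counts per category: [books, games, garden]
target_customer = [5, 0, 1]
customers = {
    "ann": [4, 0, 1],
    "bob": [0, 6, 0],
    "cid": [1, 0, 5],
}
for name, score in most_similar(target_customer, customers):
    print(name, round(score, 3))
```

Cosine similarity ignores overall volume. A customer who buys twice as much in the same proportions still counts as identical.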

Question 4

Q

What is clustering?

Answer

A

Attempts to group individuals in a population together by their similarity (but not driven by any specific purpose)

Useful in preliminary domain exploration to see which natural groups exist in the data set

The groups found can serve as a basis for subsequent data mining tasks
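This is a minimal sketch of clustering with k-means, one common clustering algorithm; the card does not prescribe one. No labels are involved, and points are grouped only by distance. The deterministic farthest-first initialization and the data are assumptions made for this illustration.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means with a deterministic farthest-first initialization."""
    # Start from the first point, then repeatedly add the point farthest from all chosen centroids
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        new_assignment = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its member points
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                centroids[i] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return assignment, centroids


# Hypothetical customers: (visits per month, average basket in €10s)
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centers = kmeans(points, k=2)
print(labels)   # [0, 0, 0, 1, 1, 1]
print(centers)  # [(1.5, 1.5), (8.5, 8.5)]
```

The algorithm only finds the groups. Deciding whether a cluster means something (e.g. “occasional small shoppers”) is still up to the analyst.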

Question 5

Q

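What is co-occurrence grouping?

Answer

A

Attempts to find associations between entities based on transactions involving them (also called frequent itemset mining, association rule discovery or market-basket analysis)

Example: “What items are commonly purchased together?”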
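This is a minimal sketch of co-occurrence grouping as simple market-basket counting. It computes the support of each item pair (the share of baskets containing both items) and the confidence of a rule such as bread → butter. The baskets and the `min_support` threshold of 0.5 are hypothetical. Dedicated algorithms such as Apriori scale the same idea to larger itemsets.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions, min_support=0.5):
    """Return item pairs that appear together in at least `min_support` of all transactions."""
    pair_counts = Counter()
    for basket in transactions:
        # sorted(set(...)) counts each unordered pair at most once per basket
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    n = len(transactions)
    return {pair: count / n for pair, count in pair_counts.items() if count / n >= min_support}


def confidence(transactions, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [set(b) for b in transactions if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


# Hypothetical shopping baskets
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["beer", "chips"],
    ["bread", "butter", "beer"],
]
print(pair_support(baskets))                   # {('bread', 'butter'): 0.75}
print(confidence(baskets, "bread", "butter"))  # 1.0
```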

Question 6

Q

What is profiling?

Answer

A

Attempts to characterize the typical behavior of an individual, group or population

Can be done at different levels: entire population or sub-clusters of the data

Use case: Often used to establish behavioral norms (typical purchases or user behavior) for anomaly detection applications such as fraud detection and intrusion monitoring
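This is a minimal sketch of profiling for anomaly detection. It assumes a customer's typical behavior can be summarized by the mean and standard deviation of past payment amounts, and it flags anything more than 3 standard deviations from the mean. The threshold and the amounts are assumptions for illustration, not a recommended fraud rule.

```python
import statistics


def build_profile(amounts):
    """Summarize typical behavior as the mean and sample standard deviation of past values."""
    # statistics.stdev needs at least two values
    return statistics.mean(amounts), statistics.stdev(amounts)


def is_anomalous(amount, profile, threshold=3.0):
    """Flag a value more than `threshold` standard deviations away from the profile mean."""
    mean, stdev = profile
    if stdev == 0:
        return amount != mean  # with no variation, any different value is unusual
    return abs(amount - mean) / stdev > threshold


past_amounts = [20, 25, 22, 30, 18, 27, 24, 21]  # hypothetical card payments in €
profile = build_profile(past_amounts)
print(is_anomalous(26, profile))   # False
print(is_anomalous(500, profile))  # True
```

The same profile can be built per individual or for a whole cluster, matching the different levels mentioned above.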

Question 7

Q

What is link prediction?

Answer

A

Attempts to predict connections between data items, usually by suggesting that a link should exist, and possibly estimating the strength of the link

Use case A: Recommending friends in social networks, based on shared social connections

Use case B: Recommending movies on Netflix

Logic at work: Searching for links that do not exist, but are predicted to exist and be strong
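This is a minimal sketch of link prediction with the common-neighbors heuristic. It assumes an undirected friendship graph stored as a dict of sets. Every pair that is not yet linked is scored by the number of friends it shares, and that count doubles as a rough estimate of link strength. The names are hypothetical.

```python
from itertools import combinations


def suggest_links(graph, top_n=3):
    """Score each unconnected pair by how many neighbors they share (common-neighbors heuristic)."""
    scores = []
    for a, b in combinations(sorted(graph), 2):
        if b in graph[a]:
            continue  # the link already exists
        shared = len(graph[a] & graph[b])
        if shared > 0:
            scores.append(((a, b), shared))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]


# Hypothetical undirected friendship graph (each friendship listed on both sides)
friends = {
    "ana": {"ben", "cara", "dev"},
    "ben": {"ana", "cara", "dev"},
    "cara": {"ana", "ben"},
    "dev": {"ana", "ben", "eli"},
    "eli": {"dev"},
}
for (a, b), shared in suggest_links(friends):
    print(f"{a} - {b}: {shared} shared friend(s)")
```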

Question 8

Q

What is data reduction?

Answer

A

Attempts to take a large set of data and replace it with a smaller set of data that contains much of the important information in the larger set

Goal: a smaller data set that is easier to read and work with, making insights easier to generate, at the cost of a (usually small) loss of information
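This is a minimal sketch of data reduction with principal component analysis (PCA), a common technique that the card does not name explicitly. Two correlated columns are replaced by a single score per row, and the function reports how much of the original variance that score keeps. The power-iteration approach, its fixed start vector and the viewing data are assumptions for illustration.

```python
import math


def first_principal_component(rows, iterations=200):
    """Top eigenvector of the sample covariance matrix, found by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    # Assumption: this fixed start vector is not orthogonal to the top component
    vec = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break  # no variance left to capture
        vec = [x / norm for x in nxt]
    return means, vec, cov


def reduce_to_one_dimension(rows):
    """Replace each d-dimensional row by one score and report the share of variance kept."""
    means, component, cov = first_principal_component(rows)
    scores = [sum((r[j] - means[j]) * component[j] for j in range(len(r))) for r in rows]
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    if total_variance == 0:
        raise ValueError("all rows are identical; nothing to reduce")
    kept_variance = sum(s * s for s in scores) / (len(rows) - 1)
    return scores, kept_variance / total_variance


# Hypothetical viewers: weekly hours of two closely related genres
viewers = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]
scores, kept = reduce_to_one_dimension(viewers)
print(len(scores), round(kept, 3))
```

The share of variance kept makes the trade-off on the card explicit: one column instead of two, in exchange for losing the small remainder of the variance.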

Question 9

Q

What is causal modeling?

Answer

A

Attempts to help us understand what events or actions actually influence others

Key question: Is event B actually influenced by event A, or do other factors explain the observation?

Useful technique: A/B testing (randomized controlled experiments)
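This is a minimal sketch of evaluating an A/B test. It assumes visitors were randomly assigned to variant A (control) or B (treatment) and that the outcome is a conversion rate. Random assignment is what allows a difference to be read as caused by B. The two-proportion z-test only checks whether the difference is larger than chance would explain. The visitor and conversion counts are hypothetical.

```python
import math


def ab_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-proportion z-test comparing conversion rates of variant A (control) and B (treatment)."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate under the null hypothesis that B has no effect
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    if pooled in (0, 1):
        raise ValueError("need at least one conversion and one non-conversion")
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_b - rate_a, z, p_value


# Hypothetical experiment: 5,000 randomly assigned visitors per variant
lift, z, p = ab_test(200, 5000, 260, 5000)
print(f"lift={lift:.3f}, z={z:.2f}, p={p:.4f}")
```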

Question 10

Q

What is supervised data mining?

Answer

A

A specific target is clearly defined (e.g., “Will this customer churn?”)

⇒ If a specific target can be identified, a supervised method should be used

⇒ Results are usually much more useful, because the model is built for that specific target

⇒ Data on the target is essential: target values (labels) must be known for the training data before the analysis

⇒ Common methods: classification, regression & causal modeling

Subclass A: Regression = numeric target

Subclass B: Classification = categorical target
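The following hypothetical helper illustrates the two subclasses above: a numeric target suggests regression, and anything else suggests classification. It is a simplification. Classes coded as numbers (e.g. 0/1) would be misread as regression targets, so in practice the analyst decides based on what the target means, not on its data type.

```python
from numbers import Number


def supervised_task_type(target_values):
    """Hypothetical helper: 'regression' for a numeric target, 'classification' otherwise."""
    if not target_values:
        raise ValueError("supervised learning needs labeled examples (known target values)")
    # bool is a subclass of int in Python, so True/False are treated as class labels explicitly
    if all(isinstance(y, Number) and not isinstance(y, bool) for y in target_values):
        return "regression"
    return "classification"


print(supervised_task_type([12.5, 30.0, 7.25]))         # regression
print(supervised_task_type(["churn", "stay", "churn"]))  # classification
```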

Question 11

Q

What is unsupervised data mining?

Answer

A

No specific outcome, purpose or target is defined at first.

Even if the technique yields a result, it is not clear whether it reflects real (e.g., causal) relationships or whether the result can be used further. The groups it creates may not be meaningful.

Common methods: Clustering, co-occurrence grouping & profiling


(11 cards)