Session 3.1 Flashcards

Question 1

Q

Data science tasks can be split into two groups

Answer

A

Unsupervised Methods
there is no specific target variable
Supervised Methods
there is a specific target variable

Question 2

Q

Unsupervised Methods

Answer

A

Affinity grouping
Similarity Matching
Clustering
Sentiment Analysis

Question 3

Q

Supervised Methods

Answer

A

Predictive Modeling
Causal (Explanatory) Modeling

Question 4

Q

Predictive modeling

Answer

A

is the process of applying a statistical model or data mining algorithm to data for the purpose of predicting new or future observations.

Predictive modeling is a method for estimating an unknown value of interest, which is called the target.

Question 5

Q

Causal (Explanatory) Modeling

Answer

A

is the use of statistical models for explaining how the world works (by testing causal explanations)

Question 6

Q

Why do empirical explanation and empirical prediction differ?

Answer

A

Explanatory models are based on underlying causal relationships between theoretical constructs while predictive models rely on associations between measurable variables.
Explanatory modeling seeks to minimize model bias (i.e., specification error) to obtain the most accurate representation of the underlying theoretical model, whereas predictive modeling seeks to minimize the combination of model bias and sampling variance (how much the model changes with new data; more on this in a future session).

Question 7

Q

Linear Regression is…

Answer

A

an approach for modeling the relationship between a dependent variable and one or more explanatory variables
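
As a rough illustration of the definition above, the following sketch fits a straight line to toy data with ordinary least squares. It assumes a single explanatory variable, and the data values and the function name fit_simple_linear_regression are made up for this example.

```python
def fit_simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (one explanatory variable)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel in the ratio).
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data, generated as y = 2x + 1 without noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_simple_linear_regression(x, y)

# Use the fitted line to estimate the dependent variable for a new observation.
prediction = intercept + slope * 6.0
```

With several explanatory variables the same idea applies, but the fit is usually computed with matrix methods rather than these closed-form sums.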

Question 8

Q

Logistic regression can also be used for classification

Answer

A

If the target variable is binary, we can interpret the dependent variable as a Yes/No outcome
The outcome of a model resulting from a logistic regression can be interpreted as a probability
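
A minimal sketch of how a logistic regression output can be read as a probability and then turned into a Yes/No decision. The coefficients and the 0.5 threshold are hypothetical values chosen for illustration; they are not fitted to any data.

```python
import math


def predict_probability(x, intercept, slope):
    """Logistic (sigmoid) function applied to a linear score; the result lies in (0, 1)."""
    z = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-z))


def classify(x, intercept, slope, threshold=0.5):
    """Turn the probability into a Yes/No outcome using an assumed threshold."""
    return "Yes" if predict_probability(x, intercept, slope) >= threshold else "No"


# Hypothetical coefficients, for illustration only.
INTERCEPT, SLOPE = -4.0, 2.0

p_low = predict_probability(1.0, INTERCEPT, SLOPE)   # score z = -2, probability below 0.5
p_high = predict_probability(3.0, INTERCEPT, SLOPE)  # score z = +2, probability above 0.5
```

The threshold is a modeling choice: a different cut-off changes which observations are labeled Yes without changing the underlying probabilities.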

Question 9

Q

Decision Trees

Answer

A

Trees create a segmentation of the data
Each node in the tree contains a test of an attribute
Each path eventually terminates at a leaf
Each leaf corresponds to a segment, and the attributes and values along the path give the characteristics of that segment
Each leaf contains a value for the target variable
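
The properties listed above can be sketched with a small hand-written tree. The attributes (income, age), thresholds, and target values are hypothetical; the tree is stored as nested dictionaries purely for illustration.

```python
# Each inner node tests one attribute; each leaf holds a value for the target variable.
tree = {
    "attribute": "income",
    "threshold": 50,
    "low": {
        "attribute": "age",
        "threshold": 40,
        "low": {"leaf": "Yes"},
        "high": {"leaf": "No"},
    },
    "high": {"leaf": "No"},
}


def classify(node, record, path=()):
    """Follow the tests from the root to a leaf; return the leaf value and the path taken."""
    if "leaf" in node:
        return node["leaf"], list(path)
    value = record[node["attribute"]]
    branch = "low" if value < node["threshold"] else "high"
    op = "<" if branch == "low" else ">="
    test = f"{node['attribute']} {op} {node['threshold']}"
    return classify(node[branch], record, path + (test,))


# The path describes the segment the record falls into.
target, segment = classify(tree, {"income": 30, "age": 25})
```

Here the record ends in the segment described by "income < 50" and "age < 40", and the leaf of that segment supplies the predicted target value.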

Question 10

Q

The process of recursively segmenting the population stops when at least one of the following conditions is met:

Answer

A

All elements of a segment belong to the same class (special case: the segment contains only one element)
The maximum allowed tree depth has been reached
Using more attributes does not “help”
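
The three stopping conditions above can be sketched as a recursive tree builder. This is an illustrative simplification, not a standard algorithm: it assumes categorical attributes and treats an attribute as "helping" only if splitting on it reduces the number of misclassified rows. The data and all names are hypothetical.

```python
from collections import Counter


def majority_class(rows):
    return Counter(label for _, label in rows).most_common(1)[0][0]


def misclassified(rows):
    """Number of rows that disagree with the majority class of their segment."""
    counts = Counter(label for _, label in rows)
    return len(rows) - max(counts.values())


def build_tree(rows, attributes, depth=0, max_depth=3):
    # Stop 1: all elements of the segment belong to the same class.
    if len({label for _, label in rows}) == 1:
        return {"leaf": majority_class(rows), "reason": "pure"}
    # Stop 2: the maximum allowed tree depth has been reached.
    if depth >= max_depth:
        return {"leaf": majority_class(rows), "reason": "max depth"}
    # Look for the attribute whose split leaves the fewest misclassified rows.
    best = None
    for attr in attributes:
        groups = {}
        for features, label in rows:
            groups.setdefault(features[attr], []).append((features, label))
        errors = sum(misclassified(group) for group in groups.values())
        if best is None or errors < best[0]:
            best = (errors, attr, groups)
    # Stop 3: using more attributes does not help (assumed criterion: no error reduction).
    if best is None or best[0] >= misclassified(rows):
        return {"leaf": majority_class(rows), "reason": "no attribute helps"}
    _, attr, groups = best
    remaining = [a for a in attributes if a != attr]
    children = {
        value: build_tree(group, remaining, depth + 1, max_depth)
        for value, group in groups.items()
    }
    return {"attribute": attr, "children": children}


# Hypothetical training rows: (attribute values, target class).
rows = [
    ({"employed": "yes", "owner": "yes"}, "No"),
    ({"employed": "yes", "owner": "no"}, "No"),
    ({"employed": "no", "owner": "yes"}, "No"),
    ({"employed": "no", "owner": "no"}, "Yes"),
    ({"employed": "no", "owner": "no"}, "Yes"),
]

tree = build_tree(rows, ["employed", "owner"])
shallow_tree = build_tree(rows, ["employed", "owner"], max_depth=1)
```

Practical tree learners typically use measures such as information gain or Gini impurity instead of the raw error count used here; the stopping logic keeps the same shape.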

Question 11

Q

How do we evaluate whether a model is a good model? How do we compare two models?

Answer

A

It is important to consider carefully what we would like to achieve when building a model

This seems simple, but many times we see some statistic reported without a clear understanding of why it is the right statistic

Now: Basic performance metrics

Later: More sensible performance metrics

Question 12

Q

Accuracy is …

Answer

A

the proportion of correct decisions made by the classifier.
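
A short sketch of this definition, using made-up actual and predicted class labels:

```python
def accuracy(actual, predicted):
    """Proportion of predictions that match the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical labels: 4 of the 5 predictions are correct.
actual = ["Yes", "No", "No", "Yes", "No"]
predicted = ["Yes", "No", "Yes", "Yes", "No"]

acc = accuracy(actual, predicted)
```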

Question 13

Q

Accuracy is a popular metric because …

Answer

A

it is very simple to calculate

Question 14

Q

Error rate is …

Answer

A

the proportion of wrong decisions made by the classifier.
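
As a small illustration, the error rate can be computed directly and used to compare two models on the same data. Both sets of predictions below are invented for the example. Since every decision is either correct or wrong, the error rate equals 1 minus the accuracy.

```python
def error_rate(actual, predicted):
    """Proportion of predictions that do not match the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


# Hypothetical actual classes and the predictions of two hypothetical models.
actual = ["Yes", "No", "No", "Yes", "No"]
model_a = ["Yes", "No", "Yes", "Yes", "No"]  # 1 wrong decision
model_b = ["No", "No", "Yes", "Yes", "No"]   # 2 wrong decisions

error_a = error_rate(actual, model_a)
error_b = error_rate(actual, model_b)

# On this metric alone, the model with the lower error rate looks better.
better = "model_a" if error_a < error_b else "model_b"
```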