Session 3.1 Flashcards

1
Q

Data science tasks can be split into two groups

A
  • Unsupervised Methods
    there is no specific target variable
  • Supervised Methods
    there is a specific target variable
2
Q

Unsupervised Methods

A
  • Affinity grouping
  • Similarity Matching
  • Clustering
  • Sentiment Analysis
3
Q

Supervised Methods

A
  • Predictive Modeling
  • Causal (Explanatory) Modeling

4
Q

Predictive modeling

A

is the process of applying a statistical model or data mining algorithm to data for the purpose of predicting new or future observations.

Predictive modeling is a method for estimating an unknown value of interest, which is called the target.

5
Q

Causal (Explanatory) Modeling

A

is the use of statistical models for explaining how the world works (by testing causal explanations)

6
Q

Why do empirical explanation and empirical prediction differ?

A
  1. Explanatory models are based on underlying causal relationships between theoretical constructs while predictive models rely on associations between measurable variables.
  2. Explanatory modeling seeks to minimize model bias (i.e., specification error) to obtain the most accurate representation of the underlying theoretical model, whereas predictive modeling seeks to minimize the combination of model bias and sampling variance (how much the model changes with new data; more on this in a future session).
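
The "combination of model bias and sampling variance" in point 2 can be made concrete with the standard decomposition of expected squared prediction error. This is a sketch under assumptions the card does not state: squared-error loss, and data generated as y = f(x) + ε with zero-mean noise of variance σ². The symbols f, f̂, ε and σ² are introduced here for illustration only.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{sampling variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Under these assumptions, an explanatory model concentrates on driving the first term toward zero, while a predictive model may accept some bias if that lowers the sum of the first two terms.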
7
Q

Linear Regression is…

A

an approach for modeling the relationship between a dependent variable and one or more explanatory variables
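
A minimal sketch of the idea for the simplest case, one explanatory variable fitted by ordinary least squares. The function name `fit_simple_linear_regression` and the toy data are hypothetical and chosen only for illustration; the card itself does not prescribe a fitting method.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data generated from y = 2x + 1 (no noise).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
intercept, slope = fit_simple_linear_regression(xs, ys)

# Predict the dependent variable for a new value of the explanatory variable.
prediction = intercept + slope * 6.0
```

With several explanatory variables the same principle applies, but the fit is usually computed with matrix methods rather than these closed-form sums.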

8
Q

Logistic regression can also be used for classification

A
  • If the target variable is binary, we can interpret the dependent variable as a Yes/No outcome
  • The outcome of a model resulting from a logistic regression can be interpreted as a probability
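
The two points above can be sketched as follows: a fitted logistic regression turns a weighted sum of the inputs into a probability, and a threshold turns that probability into a Yes/No decision. The coefficients, the feature values and the 0.5 threshold are assumptions made for illustration; in practice the coefficients come from fitting the model to data.

```python
import math


def predict_probability(features, weights, bias):
    """Logistic function applied to a linear combination of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify(features, weights, bias, threshold=0.5):
    """Map the predicted probability to a Yes/No outcome."""
    p = predict_probability(features, weights, bias)
    return "Yes" if p >= threshold else "No"


# Hypothetical coefficients, not estimated from any real data set.
weights = [0.8, -0.5]
bias = -1.0

p = predict_probability([3.0, 1.0], weights, bias)  # a value strictly between 0 and 1
label = classify([3.0, 1.0], weights, bias)
```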
9
Q

Decision Trees

A
  • Trees create a segmentation of the data
  • Each node in the tree contains a test of an attribute
  • Each path eventually terminates at a leaf
  • Each leaf corresponds to a segment, and the attributes and values along the path give the characteristics
  • Each leaf contains a value for the target variable
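
The structure described above can be sketched with a small hand-written tree. The attributes (`income`, `age`), the thresholds and the leaf values are hypothetical; the point is only to show that each node tests one attribute, each path ends at a leaf, and the tests along the path describe the segment.

```python
# Hypothetical tree: inner nodes test an attribute, leaves hold a target value.
tree = {
    "attribute": "income",
    "threshold": 50000,
    "below": {"leaf": "no"},
    "at_or_above": {
        "attribute": "age",
        "threshold": 40,
        "below": {"leaf": "yes"},
        "at_or_above": {"leaf": "no"},
    },
}


def predict(node, record):
    """Follow the tests from the root to a leaf; return the leaf value and the path."""
    path = []
    while "leaf" not in node:
        attr, thr = node["attribute"], node["threshold"]
        if record[attr] < thr:
            path.append(f"{attr} < {thr}")
            node = node["below"]
        else:
            path.append(f"{attr} >= {thr}")
            node = node["at_or_above"]
    return node["leaf"], path


value, segment = predict(tree, {"income": 60000, "age": 30})
```

Here `segment` lists the conditions that characterise the leaf the record falls into, and `value` is that leaf's value for the target variable.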
10
Q

The process of recursively segmenting the population stops when at least one of the following conditions is met:

A
  • All elements of a segment belong to the same class (special case: the segment contains only one element)
  • The maximum allowed tree depth has been reached
  • Using more attributes does not “help”
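
The three stopping conditions can be sketched in a small recursive builder. Several things here are assumptions for illustration: attributes are Boolean, "does not help" is read as "no split lowers the number of misclassified records", and the names `build_tree`, `majority_errors` and the toy data are hypothetical. Real tree learners typically use other split criteria, such as information gain.

```python
from collections import Counter


def majority_errors(labels):
    """Records that a majority-class prediction would get wrong in a segment."""
    return len(labels) - max(Counter(labels).values())


def build_tree(rows, attributes, depth=0, max_depth=3):
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]

    # Stop 1: all elements belong to the same class (includes a single element).
    if len(set(labels)) == 1:
        return {"leaf": majority, "reason": "pure"}
    # Stop 2: the maximum allowed tree depth has been reached.
    if depth >= max_depth:
        return {"leaf": majority, "reason": "max_depth"}

    # Try each remaining attribute and keep the split with the fewest errors.
    best = None
    for attr in attributes:
        yes = [r for r in rows if r[0][attr]]
        no = [r for r in rows if not r[0][attr]]
        if not yes or not no:
            continue  # this attribute does not separate the segment at all
        errors = (majority_errors([l for _, l in yes])
                  + majority_errors([l for _, l in no]))
        if best is None or errors < best[0]:
            best = (errors, attr, yes, no)

    # Stop 3: using more attributes does not "help" (no reduction in errors).
    if best is None or best[0] >= majority_errors(labels):
        return {"leaf": majority, "reason": "no_gain"}

    _, attr, yes, no = best
    rest = [a for a in attributes if a != attr]
    return {
        "attribute": attr,
        "yes": build_tree(yes, rest, depth + 1, max_depth),
        "no": build_tree(no, rest, depth + 1, max_depth),
    }


# Hypothetical toy data: (attribute values, class label).
rows = [
    ({"employed": True, "owns_home": True}, "repay"),
    ({"employed": True, "owns_home": False}, "repay"),
    ({"employed": False, "owns_home": True}, "default"),
    ({"employed": False, "owns_home": False}, "default"),
]
tree = build_tree(rows, ["employed", "owns_home"])
```

The `reason` field on each leaf is there only to show which stopping condition fired.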
11
Q

How to evaluate if a model is a good model? How do we compare two models?

A

It is important to consider carefully what we would like to achieve when building a model.

This seems simple, but statistics are often reported without a clear understanding of why they are the right ones to use.

Now: Basic performance metrics

Later: More sensible performance metrics

12
Q

Accuracy is …

A

the proportion of correct decisions made by the classifier.
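
As a minimal sketch, accuracy can be computed by comparing predicted labels with the true ones. The function name `accuracy` and the labels below are hypothetical example data.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: 4 of the 5 decisions are correct.
y_true = ["yes", "no", "no", "yes", "no"]
y_pred = ["yes", "no", "yes", "yes", "no"]
acc = accuracy(y_true, y_pred)  # 4 / 5 = 0.8
```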

13
Q

Accuracy is a popular metric because …

A

it is very simple to calculate

14
Q

Error rate is …

A

the proportion of wrong decisions made by the classifier.
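
Since every decision is either correct or wrong, the error rate is the complement of accuracy (error rate = 1 − accuracy). A short sketch, with the function name `error_rate` and the labels chosen as hypothetical examples:

```python
def error_rate(y_true, y_pred):
    """Share of predictions that differ from the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


# Hypothetical labels: 1 of the 4 decisions is wrong.
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
err = error_rate(y_true, y_pred)  # 1 / 4 = 0.25
implied_accuracy = 1 - err        # 0.75
```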
