Data Science Flashcards

1
Q

What is precision?

A

TP/(TP+FP). Proportion of predicted positives that are actually positive.
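
As a minimal sketch, precision can be computed from paired true and predicted labels. The function name `precision`, the 1 = positive / 0 = negative encoding, and the sample labels are assumptions made for illustration.

```python
def precision(y_true, y_pred):
    """Return TP / (TP + FP) for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fp == 0:
        return 0.0  # assumption: with no positive predictions, report 0.0
    return tp / (tp + fp)

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
print(precision(y_true, y_pred))  # 2 of the 3 predicted positives are correct
```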

2
Q

What is recall?

A

TP/(TP+FN). Proportion of actual positives that are correctly classified as positive.
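
A small sketch of the same ratio in code, assuming binary labels with 1 as the positive class. The function name `recall` and the sample labels are illustrative.

```python
def recall(y_true, y_pred):
    """Return TP / (TP + FN) for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        return 0.0  # assumption: with no actual positives, report 0.0
    return tp / (tp + fn)

y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
print(recall(y_true, y_pred))  # 0.5: 2 of the 4 actual positives were found
```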

3
Q

What is sensitivity?

A

Same as recall: TP/(TP+FN). Also called the true positive rate.

4
Q

What is specificity?

A

TN/(TN+FP). The “negative” counterpart of recall (sensitivity): the proportion of actual negatives that are correctly classified as negative.
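
For illustration, specificity can be computed directly from confusion-matrix counts. The function name `specificity` and the counts below are hypothetical.

```python
def specificity(tn, fp):
    """Return TN / (TN + FP), the share of actual negatives classified as negative."""
    if tn + fp == 0:
        return 0.0  # assumption: with no actual negatives, report 0.0
    return tn / (tn + fp)

print(specificity(tn=90, fp=10))  # 0.9
```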

5
Q

What is accuracy?

A

(TP+TN)/(TP+TN+FP+FN). Proportion of correctly classified samples over all samples.
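
Since TP+TN is simply the number of correct predictions, a minimal sketch only needs to count matches. The function name `accuracy` and the sample labels are assumptions.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
print(accuracy(y_true, y_pred))  # 4 of 6 samples are classified correctly
```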

6
Q

What is the coefficient of determination (r-squared)?

A

The proportion of the variation in the dependent variable that is predictable (explainable) from the independent variable(s): R² = 1 - (residual sum of squares / total sum of squares).
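
The formula can be sketched in a few lines, assuming the observations and the model's predictions are given as equal-length lists. The function name `r_squared` and the sample values are illustrative.

```python
def r_squared(y, y_hat):
    """Return 1 - SS_res / SS_tot for observations y and predictions y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)              # total sum of squares
    return 1 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
print(r_squared(y, y_hat))  # approximately 0.98
```

The sketch assumes the observations are not all identical, since the total sum of squares would then be zero.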

7
Q

What is a residual in regression analysis?

A

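The difference between an observed value and the value predicted by the regression model (observed minus predicted). On a scatter plot, this is the signed vertical distance between a data point and the regression line.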
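
A short sketch, assuming a line with hypothetical coefficients (intercept 1, slope 2) has already been fitted. The function name `residuals` and the data points are made up for illustration.

```python
def residuals(xs, ys, intercept, slope):
    """Return observed minus predicted for the line y = intercept + slope * x."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

xs = [0, 1, 2]
ys = [1.5, 2.5, 5.5]
print(residuals(xs, ys, intercept=1, slope=2))  # [0.5, -0.5, 0.5]
```

A positive residual means the point lies above the line, a negative one below it.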

8
Q

What is the total sum of squares?

A

The sum of the squared differences between the observations and their overall mean
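
As a quick worked example (the function name `total_sum_of_squares` is an assumption):

```python
def total_sum_of_squares(y):
    """Return the sum of squared deviations of y from its mean."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)

# Mean is 2.5; squared deviations are 2.25, 0.25, 0.25, 2.25.
print(total_sum_of_squares([1, 2, 3, 4]))  # 5.0
```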

9
Q

What is bootstrapping?

A

A resampling technique used to estimate the sampling distribution of an estimator by sampling with replacement from the original sample
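
One way this might look in practice is a bootstrap estimate of the standard error of the mean. This is a sketch using only the standard library; the function name `bootstrap_means`, the resample count, the seed, and the sample values are assumptions.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Resample with replacement n_resamples times and return each resample's mean."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
means = bootstrap_means(sample)

# The spread of the resampled means approximates the standard error of the mean.
std_error = statistics.stdev(means)
print(f"bootstrap standard error of the mean: {std_error:.3f}")
```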

10
Q

What is sampling with replacement?

A

Sampling with replacement means that after a unit (e.g., a person or an observation) is selected from the pool being sampled, it is returned to the pool before the next unit is selected. As a result, the same unit can be selected more than once.
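
In Python's standard library the contrast is roughly `random.choices` (with replacement) versus `random.sample` (without replacement). The pool contents and the seed below are arbitrary.

```python
import random

rng = random.Random(42)
pool = ["a", "b", "c", "d", "e"]

# With replacement: units go back into the pool, so repeats are possible
# and we can even draw more units than the pool contains.
with_replacement = rng.choices(pool, k=8)

# Without replacement: each unit can be drawn at most once.
without_replacement = rng.sample(pool, k=5)

print(with_replacement)
print(without_replacement)
```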

11
Q

What is bagging?

A

A machine learning ensemble method whose name is short for “bootstrap aggregating.” It improves the stability and accuracy of machine learning models by training several models on bootstrap samples of the training data (drawn with replacement) and combining their predictions, by averaging for regression or by voting for classification.
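
A toy sketch of the idea for regression, assuming a simple least-squares line as the base model. The helper names `fit_line` and `bagged_predict`, the number of models, and the synthetic data are all assumptions made for illustration.

```python
import random

def fit_line(points):
    """Least-squares fit of y = intercept + slope * x; returns a predict function."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        # Degenerate resample (all x equal): fall back to predicting the mean.
        return lambda x: mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def bagged_predict(points, x_new, n_models=25, seed=0):
    """Train one model per bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    models = [fit_line(rng.choices(points, k=len(points))) for _ in range(n_models)]
    return sum(model(x_new) for model in models) / n_models

# Synthetic data around the line y = 2x + 1 with Gaussian noise.
noise_rng = random.Random(1)
data = [(x, 2 * x + 1 + noise_rng.gauss(0, 0.5)) for x in range(20)]
print(bagged_predict(data, x_new=10))  # should land near 21
```

Bagging tends to help most with unstable base models such as deep decision trees; a straight line is used here only to keep the sketch short.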

12
Q

What is a greedy algorithm?

A

An algorithmic paradigm that follows the problem-solving heuristic of making the locally optimal choice at each stage with the hope of finding a global optimum
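
Coin change is a common illustration: always take the largest coin that still fits. The sketch below (function name `greedy_change` is an assumption) also shows why the global optimum is only a hope, since the second coin system defeats the greedy choice. It assumes a coin of value 1 is available, so the amount can always be paid exactly.

```python
def greedy_change(amount, coins):
    """Repeatedly take the largest coin that still fits (the locally optimal choice)."""
    picked = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            picked.append(coin)
    return picked

print(greedy_change(63, [25, 10, 5, 1]))  # [25, 25, 10, 1, 1, 1]
print(greedy_change(6, [4, 3, 1]))        # [4, 1, 1], although [3, 3] uses fewer coins
```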

13
Q

What is dynamic programming?

A

An algorithmic paradigm that solves problems by breaking them down into smaller subproblems, storing the solutions to these subproblems, and then using these stored solutions to solve the larger problem
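
A minimal bottom-up sketch, using the minimum number of coins needed to pay an amount as the example problem. The function name `min_coins` and the coin systems are assumptions. Each entry `best[v]` stores the solution to the subproblem "fewest coins for amount v" and is reused by larger amounts.

```python
def min_coins(amount, coins):
    """Return the fewest coins that sum to amount (float('inf') if impossible)."""
    INF = float("inf")
    best = [0] + [INF] * amount  # best[v] = fewest coins needed to pay v
    for value in range(1, amount + 1):
        for coin in coins:
            # Reuse the stored answer for the smaller amount (value - coin).
            if coin <= value and best[value - coin] + 1 < best[value]:
                best[value] = best[value - coin] + 1
    return best[amount]

print(min_coins(6, [4, 3, 1]))         # 2  (3 + 3)
print(min_coins(63, [25, 10, 5, 1]))   # 6  (25 + 25 + 10 + 1 + 1 + 1)
```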

14
Q

What is bias?

A

The error that is introduced by simplifying assumptions made by the model

15
Q

What is variance?

A

The error that is introduced by the sensitivity of the model to small fluctuations in the data
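
Bias and variance can be made concrete with a toy simulation, sketched here under several assumptions: the true function is x², the noise is Gaussian, and the two models (a constant predictor and a 1-nearest-neighbour predictor) are chosen only for illustration. Each trial draws a fresh noisy training set, and both models predict at the same query point.

```python
import random
import statistics

def true_f(x):
    return x * x

def simulate(n_trials=500, noise_sd=0.3, seed=0):
    """Return {model name: (bias, variance)} of predictions at x0 = 1.0."""
    rng = random.Random(seed)
    xs = [i / 10 for i in range(11)]  # fixed inputs 0.0, 0.1, ..., 1.0
    x0 = 1.0                          # query point
    const_preds, nn_preds = [], []
    for _ in range(n_trials):
        ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
        # Simple model: always predict the mean of the training targets.
        const_preds.append(statistics.fmean(ys))
        # Flexible model: 1-nearest neighbour copies the target of the closest x.
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
        nn_preds.append(ys[nearest])
    result = {}
    for name, preds in (("constant", const_preds), ("1-NN", nn_preds)):
        bias = statistics.fmean(preds) - true_f(x0)  # average error across training sets
        variance = statistics.pvariance(preds)       # spread across training sets
        result[name] = (bias, variance)
    return result

for name, (bias, variance) in simulate().items():
    print(f"{name}: bias={bias:.3f}, variance={variance:.4f}")
```

In this setup the constant model should show the larger bias (its assumption is too simple), while the 1-NN model should show the larger variance (it follows the noise in each training set).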

16
Q

In what situation is precision more important than recall?

A

When a false positive is more costly than a false negative

  • Classifier to identify spam emails
  • Search engine results
17
Q

In what situation is recall more important than precision?

A

When it’s important to find all (or nearly all) instances of the positive class, i.e., when a false negative is more costly than a false positive.

  • Diagnosing a disease
  • Detecting fraud
18
Q

What are the four types of analytics techniques that are used in organizations?

A
  • Descriptive analytics (what happened?)
  • Diagnostic analytics (why did it happen?)
  • Predictive analytics (what will happen?)
  • Prescriptive analytics (what should be done?)