Questions Flashcards

1
Q

What are the formulas for sensitivity and specificity?

A

Sensitivity (True Positive Rate, or recall, for binary classification): TP / (TP + FN)
Specificity (True Negative Rate): TN / (TN + FP)
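
As a quick check of the two formulas, here is a minimal sketch in Python. The confusion-matrix counts are hypothetical and chosen only for illustration.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate (recall): TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts, not taken from any real model.
tp, fn, tn, fp = 40, 10, 45, 5

sens = sensitivity(tp, fn)  # 40 / (40 + 10)
spec = specificity(tn, fp)  # 45 / (45 + 5)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```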

2
Q

What are parametric and non-parametric statistical tests?

3
Q

How are confidence interval and confidence level related?

A

Using the CLT, given a sample, its mean, and its std (or the population std), we can use the z-score that corresponds to a chosen confidence level to calculate a confidence interval.
Example: Website
Suppose we have a sample of size 100 with
sample mean: 20
population STD: 10
The sample mean estimates the population mean, and the standard error of the mean is population STD / √n = 10 / √100 = 1.
By the CLT, the sampling distribution of the mean is approximately Gaussian, so ±1.96 standard errors cover 95% of the possible sample means (that is why the z-score for a 95% confidence level equals 1.96). We can therefore say we are 95% confident that the population mean lies within sample mean ± 1.96 × standard error = 20 ± 1.96, i.e. between 18.04 and 21.96.

The premise is that we can estimate the parameters of the sampling distribution of the mean from a single sample, and then use those estimates to calculate the confidence interval.
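
The arithmetic of this example can be reproduced with a short sketch. It assumes the population std is known, as in the card, and uses `statistics.NormalDist` from the Python standard library to get the z-score; the function name `mean_confidence_interval` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, population_std, n, confidence=0.95):
    """z-based interval for the population mean, population std assumed known."""
    # Two-sided critical value, roughly 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = population_std / sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Numbers from the example: n = 100, sample mean 20, population STD 10.
low, high = mean_confidence_interval(sample_mean=20, population_std=10, n=100)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```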

4
Q

What’s the difference between using CLT and bootstrapping for sampling?

A

Given a large enough sample size, confidence intervals for the mean can be constructed either by applying the Central Limit Theorem or by the bootstrap method. Bootstrap-estimated distributions of test statistics are not always Gaussian; the beauty of the bootstrap is that you need not make any assumption about the shape of that distribution, an assumption that can often be wrong.
The bootstrap works by resampling the original sample with replacement, whereas the CLT relies on assumptions such as the samples being independent of each other.
CLT video
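
To make the contrast concrete, the sketch below builds a 95% interval for the mean both ways on the same data. The sample is hypothetical (200 draws from an exponential distribution with a fixed seed), and the percentile method used here is only one of several bootstrap interval constructions.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical skewed sample: 200 draws from an exponential with mean 5.
sample = [rng.expovariate(1 / 5) for _ in range(200)]
n = len(sample)

# CLT-based interval: assumes the sampling distribution of the mean is Gaussian.
z = NormalDist().inv_cdf(0.975)
standard_error = stdev(sample) / sqrt(n)
clt_ci = (mean(sample) - z * standard_error, mean(sample) + z * standard_error)

# Bootstrap percentile interval: resample with replacement, no shape assumption.
n_boot = 2000
boot_means = sorted(mean(rng.choices(sample, k=n)) for _ in range(n_boot))
lower_index = n_boot * 25 // 1000        # 2.5th percentile
upper_index = n_boot * 975 // 1000 - 1   # 97.5th percentile
boot_ci = (boot_means[lower_index], boot_means[upper_index])

print("CLT interval:      ", clt_ci)
print("Bootstrap interval:", boot_ci)
```

With a sample this large the two intervals should come out close to each other; they tend to differ more for small or heavily skewed samples.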

5
Q

What’s the average precision?

A

Average Precision (AP) is generally defined as the area under the precision-recall curve.
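
One common way to approximate that area is a step-wise sum: rank the items by score and add the precision at each rank where recall increases. The sketch below assumes binary labels (1 = positive) and no tied scores; the labels and scores are hypothetical.

```python
def average_precision(y_true, scores):
    """Step-wise AP: sum of (R_n - R_{n-1}) * P_n over the ranked items."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(y_true)
    true_positives = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precision_at_rank = true_positives / rank
            # Recall rises by 1 / total_positives at every positive item.
            ap += precision_at_rank / total_positives
    return ap


# Hypothetical labels and model scores.
y_true = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
ap = average_precision(y_true, scores)
print(f"AP = {ap:.4f}")
```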

6
Q

How is balanced accuracy calculated? When is it used?

A

Balanced accuracy is used in both binary and multi-class classification. In the binary case it is the arithmetic mean of sensitivity and specificity, (sensitivity + specificity) / 2; in the multi-class case it is the mean of the per-class recalls. Its use case is imbalanced data, i.e. when one of the target classes appears a lot more often than the others.
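
A small sketch shows why this matters on imbalanced data. The dataset is hypothetical: 95 negatives, 5 positives, and a model that always predicts the majority class.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; for two classes, (sensitivity + specificity) / 2."""
    recalls = []
    for c in sorted(set(y_true)):
        n_class = sum(1 for t in y_true if t == c)
        n_correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(n_correct / n_class)
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced data; the model always predicts class 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, balanced accuracy={bal_acc:.2f}")
```

Plain accuracy looks good here only because the majority class dominates, while balanced accuracy exposes that the positive class is never detected.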

7
Q

What are the assumptions of CLT?

A
  1. The data must satisfy the randomization condition: it must be sampled randomly.
  2. Samples should be independent of each other. One sample should not influence the other samples.
  3. The sample size should be no more than 10% of the population when sampling is done without replacement.
  4. The sample size should be sufficiently large. How large depends on the population: when the population is skewed or asymmetric, the sample size should be large; if the population is symmetric, small samples can work as well.
  5. In general, a sample size of 30 is considered sufficient when the population is symmetric.
8
Q

What does the mean of a bootstrapped sample approximate?

A

The mean of the bootstrapped samples approximates the mean of the original sample,
i.e. the distribution of the means of the samples obtained by bootstrapping is centered approximately on the mean of the original sample.
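
This is easy to see in a small simulation. The original sample below is hypothetical (100 Gaussian draws with a fixed seed); each bootstrap sample is drawn from it with replacement.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical original sample.
sample = [rng.gauss(50, 10) for _ in range(100)]

# Mean of each bootstrap resample (same size as the original, with replacement).
boot_means = [mean(rng.choices(sample, k=len(sample))) for _ in range(5000)]

print(f"original sample mean:    {mean(sample):.3f}")
print(f"mean of bootstrap means: {mean(boot_means):.3f}")
```

Note that the bootstrap means center on the original sample mean, not directly on the unknown population mean.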

9
Q

Micro-precision values can be high even if the model is performing very poorly on a rare class since it gives more weight to the common classes. True/False?

A

True

10
Q

For single-label multi-class problems, micro-averaging would result in precision being exactly the same as accuracy. That does not provide any additional information about the model’s performance. True/False

A

True
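
A small sketch can confirm this. The labels below are hypothetical: class "a" is common, "b" and "c" are rare, and the model is poor on "c". Every sample has exactly one true and one predicted label.

```python
def precision_per_class(y_true, y_pred):
    """Precision for each class: TP / (TP + FP)."""
    result = {}
    for c in sorted(set(y_true) | set(y_pred)):
        predicted_c = sum(1 for p in y_pred if p == c)
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        result[c] = tp / predicted_c if predicted_c else 0.0
    return result


def micro_precision(y_true, y_pred):
    """Pool TP and FP over all classes before dividing."""
    tp_total = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    fp_total = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return tp_total / (tp_total + fp_total)


# Hypothetical single-label, three-class data.
y_true = ["a"] * 90 + ["b"] * 5 + ["c"] * 5
y_pred = ["a"] * 86 + ["c"] * 4 + ["b"] * 5 + ["a"] * 4 + ["c"] * 1

per_class = precision_per_class(y_true, y_pred)
micro = micro_precision(y_true, y_pred)
macro = sum(per_class.values()) / len(per_class)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(per_class)
print(f"micro={micro:.2f}, macro={macro:.2f}, accuracy={accuracy:.2f}")
```

Micro-precision equals accuracy because every wrong prediction is a false positive for exactly one class, and it stays high even though precision on the rare class "c" is low.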

11
Q

What is the definition of P-value?

A

In null-hypothesis significance testing, the p-value is the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct. A very small p-value means that such an extreme observed outcome would be very unlikely under the null hypothesis.

12
Q

How is P-value calculated?

A

Assuming the null hypothesis is true, the sampling distribution of the sample mean is approximately normal, with mean equal to the null hypothesis mean and std equal to the sample std divided by √n. We then calculate the z score as:
z = (sample mean - null mean) / (sample std / √n), where n is the number of observations in the sample

From the z score we find the P-value, which is the probability of obtaining a sample at least as extreme as the one we have, assuming the null hypothesis is true. If the P-value is less than the significance level alpha (commonly 0.05), a result this extreme would occur less than 5% of the time by chance alone under the null hypothesis, so we can reject the null hypothesis.

Reference: Khan Academy
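
The calculation can be sketched as a one-sample z-test. All numbers below are hypothetical, and the sketch assumes the sample is large enough for the normal approximation; with small samples a t distribution would be the usual choice.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean, null_mean, sample_std, n, two_sided=True):
    """Return (z, p_value) for a one-sample z-test on the mean."""
    z = (sample_mean - null_mean) / (sample_std / sqrt(n))
    if two_sided:
        # Probability of a result at least this extreme in either direction.
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    else:
        # Upper-tail test only: sample mean larger than the null mean.
        p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical numbers: null mean 20, observed mean 20.5, std 2, n = 100.
z, p = z_test_p_value(sample_mean=20.5, null_mean=20, sample_std=2, n=100)
alpha = 0.05
print(f"z={z:.2f}, p={p:.4f}, reject null: {p < alpha}")
```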

13
Q

Can we use a sample’s STD as an approximation of the population STD?

A

Yes. The standard deviation is a measurement of the spread of the data — it is the average distance of the data from the mean. We are rarely interested in the amount of variation in our sample: the sample standard deviation is only useful as an approximation of the population standard deviation.

14
Q

When do we use the t-distribution table?

A

The rules for when to use a T distribution table are as follows.

Population standard deviation UNKNOWN and original population normal or symmetrical
OR
sample size greater than or equal to 30 and Population standard deviation UNKNOWN.

15
Q

A statistic is an unbiased estimator of a parameter when the ____ of its sampling distribution is equal to the actual value of the parameter.

A

Mean. In other words, a statistic is unbiased when, on average, it equals the value of the population parameter it is estimating.

For example, if the Q1 of the population is 70, then the mean of the sampling distribution of the sample Q1 should equal 70 if the sample Q1 is an unbiased estimator.
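
The definition can be checked exactly on a tiny case by listing every possible sample. The sketch below uses a hypothetical five-value population and samples of size 2 drawn with replacement, and it uses the mean and variance rather than Q1 because their behavior is easy to verify: the sample mean and the n - 1 variance are unbiased, while the divide-by-n variance is not.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 6]  # hypothetical tiny population
n = 2                         # sample size

pop_mean = Fraction(sum(population), len(population))
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

# Every ordered sample of size n drawn with replacement is equally likely.
samples = list(product(population, repeat=n))


def sample_mean(s):
    return Fraction(sum(s), len(s))


def sample_var(s, ddof):
    m = sample_mean(s)
    return sum((x - m) ** 2 for x in s) / (len(s) - ddof)


# Mean of each statistic's sampling distribution, computed exactly.
expected_mean = sum(sample_mean(s) for s in samples) / len(samples)
expected_var_n = sum(sample_var(s, 0) for s in samples) / len(samples)
expected_var_n1 = sum(sample_var(s, 1) for s in samples) / len(samples)

print("population mean:", pop_mean, "| E[sample mean]:", expected_mean)
print("population var: ", pop_var, "| E[var, divide by n]:", expected_var_n)
print("population var: ", pop_var, "| E[var, divide by n-1]:", expected_var_n1)
```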

16
Q

The shape of the sampling distribution will match the shape of the parent population if the sample size is <30. True/False

A

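True, as a rule of thumb: with small samples the CLT has not yet taken hold, so the sampling distribution of the mean tends to keep the shape (e.g. the skew) of the parent population.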

17
Q

What’s the difference between Bernoulli and Binomial distributions? Give one example for each

A

The Bernoulli distribution represents the success or failure of a single Bernoulli trial. The Binomial distribution represents the number of successes in n independent Bernoulli trials for some given value of n.

Bernoulli deals with the outcome of a single trial of the event, whereas Binomial deals with the outcome of multiple trials of the same event.

Bernoulli example: whether or not a team will win a championship
Binomial example: the number of sixes (0, 1, 2, 3, ..., 50) obtained when rolling a die 50 times
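
The die example can be worked through with the binomial formula P(k) = C(n, k) p^k (1 - p)^(n - k). This sketch assumes a fair die, so p = 1/6; a single roll being a six or not is the underlying Bernoulli trial.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 50, 1 / 6  # 50 rolls of a fair die, success = rolling a six

# Bernoulli: one roll, two outcomes.
bernoulli_pmf = {1: p, 0: 1 - p}

# Binomial: number of sixes in 50 rolls, k = 0..50.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
expected_sixes = sum(k * prob for k, prob in enumerate(pmf))
most_likely = max(range(n + 1), key=lambda k: pmf[k])

print(f"P(exactly 8 sixes) = {pmf[8]:.4f}")
print(f"expected number of sixes = {expected_sixes:.3f}")
print(f"most likely count = {most_likely}")
```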

18
Q

When is a paired t-test used?

A

A paired t-test is used when we are interested in the difference between two variables measured on the same subject. Often the two measurements are separated by time. For example, finding out whether the mean delivery time from a restaurant by drive-through differs from the mean time when ordering at the counter. We have one subject (the restaurant) and two variables related to it, i.e. two properties of the subject (drive-through and counter ordering). Here the variables are dependent (unlike in the two-sample t-test).
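
The test statistic works on the per-pair differences: t = mean(d) / (std(d) / √n), compared against a t distribution with n - 1 degrees of freedom. The sketch below only computes t; the delivery times are hypothetical, with one drive-through and one counter measurement per restaurant.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """t statistic for paired samples, computed from the pairwise differences."""
    diffs = [a - b for a, b in zip(first, second)]
    # stdev uses the n - 1 denominator (sample standard deviation).
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical delivery times in minutes, one pair per restaurant.
drive_through = [5.0, 6.5, 4.0, 7.0, 5.5]
counter = [4.5, 6.0, 4.5, 6.0, 5.0]

t = paired_t_statistic(drive_through, counter)
degrees_of_freedom = len(drive_through) - 1
print(f"t = {t:.3f} with {degrees_of_freedom} degrees of freedom")
```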

19
Q

When is a two-sample t-test used?

A

The two-sample t-test (Snedecor and Cochran, 1989) is used to determine if two population means are equal. A common application is to test if a new process or treatment is superior to a current process or treatment. Here the two samples are independent (unlike in the paired t-test).

20
Q

What is the relationship between confidence level and significance level?

A

The confidence level is equivalent to 1 – the alpha level. So, if your significance level is 0.05, the corresponding confidence level is 95%. If the P value is less than your significance (alpha) level, the result of the hypothesis test is statistically significant.
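
The relationship can be tabulated in a few lines. This sketch assumes a two-sided z-based test; the `p_value` at the end is a hypothetical number used only to show the decision rule.

```python
from statistics import NormalDist


def critical_z(alpha):
    """Two-sided critical z value for significance level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.10, 0.05, 0.01):
    confidence_level = 1 - alpha
    print(f"alpha={alpha:.2f}  confidence={confidence_level:.0%}  z={critical_z(alpha):.3f}")

# Decision rule with a hypothetical p-value.
alpha = 0.05
p_value = 0.012
is_significant = p_value < alpha
print("statistically significant:", is_significant)
```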