Questions Flashcards

Question 1

Q

What are the formulas for sensitivity and specificity?

Answer

A

Sensitivity (true positive rate, or recall in binary classification): TP / (TP + FN)
Specificity (true negative rate): TN / (TN + FP)
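
As a minimal sketch, the two formulas can be computed directly from confusion-matrix counts. The counts below are hypothetical and chosen only for illustration.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate (recall): TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical confusion-matrix counts (not from the flashcards).
tp, fn, tn, fp = 40, 10, 45, 5

sens = sensitivity(tp, fn)  # 40 / 50
spec = specificity(tn, fp)  # 45 / 50
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```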

Question 2

Q

What are parametric and non-parametric statistical tests?

Answer

A

Video

Parametric tests and their non-parametric counterparts

Question 3

Q

How are confidence interval and confidence level related?

Answer

A

Using the CLT, given a sample, its mean, and its STD (or the population STD), we can use the z-score that corresponds to a chosen confidence level to calculate the confidence interval.
Example: Website
Suppose we have a sample of size 100, with
sample mean: 20
population STD: 10
The sample mean estimates the population mean, and the standard deviation of the sampling distribution of the mean (the standard error) is population STD / √n = 10 / √100 = 1.
By the CLT, the sampling distribution of the mean is approximately Gaussian, so 1.96 standard errors on either side of its center cover 95% of the possible sample means (that's why the z-score for a 95% confidence level equals 1.96). So we can say we are 95% confident that the population mean is within this interval: sample mean ± 1.96 × standard error = 20 ± 1.96, i.e. from 18.04 to 21.96.

The premise is that we can estimate the parameters of the sampling distribution of the mean from a single sample, and then use them to calculate the confidence interval.
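
The worked example above (n = 100, sample mean 20, population STD 10) can be reproduced with a short sketch. The function name `z_confidence_interval` is made up for this illustration, and the sketch assumes the population STD is known so that a z-score applies.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, pop_std, n, confidence=0.95):
    """Two-sided z interval for the mean, assuming a known population STD."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    standard_error = pop_std / sqrt(n)       # 10 / sqrt(100) = 1 in the example
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


low, high = z_confidence_interval(sample_mean=20, pop_std=10, n=100)
print(f"95% CI: [{low:.2f}, {high:.2f}]")  # 95% CI: [18.04, 21.96]
```

Passing a different `confidence` (for example 0.99) changes only the z-score, which is how the confidence level and the width of the interval are tied together.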

Question 4

Q

What’s the difference between using CLT and bootstrapping for sampling?

Answer

A

Given a large enough sample size, confidence intervals for the mean can be constructed either by applying the Central Limit Theorem or by the bootstrap method. Bootstrap-estimated distributions of test statistics are not always Gaussian; the beauty of the bootstrap is that you do not need to assume a particular shape for that distribution, an assumption that can often be wrong.
The bootstrap is done by sampling with replacement from the observed sample, whereas the CLT comes with assumptions, such as the samples being independent of each other.
CLT video
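
To make the contrast concrete, here is a rough sketch that builds a 95% interval for the mean both ways on the same data. The data are hypothetical (right-skewed draws generated in the code), the bootstrap variant shown is the simple percentile bootstrap, and the number of resamples is an arbitrary choice.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical right-skewed sample (exponential draws), for illustration only.
sample = [random.expovariate(1 / 20) for _ in range(200)]
n = len(sample)

# CLT-based interval: sample mean +/- z * (sample STD / sqrt(n)).
z = NormalDist().inv_cdf(0.975)
se = stdev(sample) / sqrt(n)
clt_ci = (mean(sample) - z * se, mean(sample) + z * se)

# Percentile bootstrap: resample with replacement, recompute the mean each
# time, then read off the 2.5th and 97.5th percentiles of those means.
n_resamples = 2000
boot_means = sorted(
    mean(random.choices(sample, k=n)) for _ in range(n_resamples)
)
lower_idx = n_resamples * 25 // 1000        # index of the 2.5th percentile
upper_idx = n_resamples * 975 // 1000 - 1   # index of the 97.5th percentile
boot_ci = (boot_means[lower_idx], boot_means[upper_idx])

print("CLT CI:      ", clt_ci)
print("Bootstrap CI:", boot_ci)
```

The bootstrap interval never uses a z-score: its shape comes entirely from the resampled means.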

Question 5

Q

What’s the average precision?

Answer

A

The general definition of Average Precision (AP) is the area under the precision-recall curve.
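
One common way to approximate that area is a step-wise sum, AP = Σ (recall_k − recall_{k−1}) × precision_k over the ranked predictions. The sketch below implements that particular approximation (other interpolation schemes exist) and assumes binary labels, at least one positive, and no tied scores. The labels and scores are hypothetical.

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Assumes binary labels (1 = positive), at least one positive,
    and no tied scores.
    """
    # Rank predictions from highest to lowest score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(y_true)
    true_positives = 0
    prev_recall = 0.0
    ap = 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        precision = true_positives / k
        recall = true_positives / total_positives
        ap += (recall - prev_recall) * precision  # width * height of one step
        prev_recall = recall
    return ap


# Hypothetical labels and model scores.
y_true = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
ap = average_precision(y_true, scores)
print(f"AP = {ap:.4f}")  # 1/3 + 2/9 + 1/4 = 29/36
```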

Question 6

Q

How is balanced accuracy calculated? When is it used?

Answer

A

Balanced accuracy is used in both binary and multi-class classification. In the binary case it is the arithmetic mean of sensitivity and specificity, (sensitivity + specificity) / 2. Its use case is imbalanced data, i.e. when one of the target classes appears much more often than the other.
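
A small sketch of the binary case, using hypothetical counts for a heavily imbalanced dataset (10 positives, 990 negatives), shows why plain accuracy can look better than the model deserves:

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Arithmetic mean of sensitivity and specificity (binary case)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical counts: the model finds only half of the rare positives.
tp, fn, tn, fp = 5, 5, 980, 10

accuracy = (tp + tn) / (tp + fn + tn + fp)
bal_acc = balanced_accuracy(tp, fn, tn, fp)
print(f"accuracy={accuracy:.3f}, balanced accuracy={bal_acc:.3f}")
```

Here accuracy is dominated by the common class, while balanced accuracy is pulled down by the 50% sensitivity on the rare class.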

Question 7

Q

What are the assumptions of CLT?

Answer

A

The data must satisfy the randomization condition: it must be sampled randomly.
Samples should be independent of each other. One sample should not influence the other samples.
The sample size should be no more than 10% of the population when sampling is done without replacement.
The sample size should be sufficiently large. How large depends on the population: when the population is skewed or asymmetric, the sample size should be large; if the population is symmetric, small samples can work as well.
In general, a sample size of 30 is considered sufficient when the population is symmetric.

Question 8

Q

What does the mean of a bootstrapped sample approximate?

Answer

A

The mean of bootstrapped samples approximates the mean of the original sample.
I.e., the distribution of the means of the samples obtained by bootstrapping is centered approximately on the mean of the original sample.

Question 9

Q

Micro-precision values can be high even if the model is performing very poorly on a rare class since it gives more weight to the common classes. True/False?

Question 10

Q

For single-label multi-class problems, micro-averaging would result in precision being exactly the same as accuracy. That does not provide any additional information about the model’s performance. True/False
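
The two micro-averaging statements above can be checked on a toy example. The sketch below uses hypothetical labels ("common" and "rare") and a hand-written helper, `micro_and_macro_precision`, which is not taken from any library. Macro-averaged precision is included only as a point of comparison.

```python
from collections import Counter


def micro_and_macro_precision(y_true, y_pred):
    """Return (micro, macro) precision for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp = Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[pred] += 1
        else:
            fp[pred] += 1  # a wrong prediction is a false positive for `pred`
    # Micro: pool all true and false positives before dividing.
    micro = sum(tp.values()) / (sum(tp.values()) + sum(fp.values()))
    # Macro: compute precision per class, then take the unweighted mean.
    per_class = [
        tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0 for c in labels
    ]
    macro = sum(per_class) / len(per_class)
    return micro, macro


# Hypothetical data: 95 "common" and 5 "rare" examples.
y_true = ["common"] * 95 + ["rare"] * 5
y_pred = ["common"] * 93 + ["rare"] * 2 + ["common"] * 4 + ["rare"] * 1

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
micro, macro = micro_and_macro_precision(y_true, y_pred)
print(f"accuracy={accuracy:.3f}, micro={micro:.3f}, macro={macro:.3f}")
```

In this toy case micro-precision equals accuracy (0.94) even though precision on the rare class is only 1/3, which the macro average makes visible.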

Question 11

Q

What is the definition of P-value?

Answer

A

In null-hypothesis significance testing, the p-value is the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct. A very small p-value means that such an extreme observed outcome would be very unlikely under the null hypothesis.

Question 12

Q

How is P-value calculated?

Answer

A

Assuming the null hypothesis is true, the sampling distribution of the sample mean is approximately normal, with its mean equal to the null hypothesis's mean and its standard deviation equal to the standard error, estimated as the observed sample's STD divided by √n. We then calculate the z-score as:
(sample mean - null mean) / (sample STD / √number of samples)

From the z-score we find the P-value, which is the probability of getting a sample at least as extreme as the one we have, assuming the null hypothesis is true. If this is less than the significance level alpha (commonly 0.05), a result this extreme would occur less than 5% of the time by chance alone under the null hypothesis, hence we can reject the null hypothesis.

Reference: Khan Academy
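
The calculation can be sketched as follows, assuming a two-sided z-test and a sample large enough for the normal approximation to be reasonable. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean, null_mean, sample_std, n):
    """Two-sided z-test: return (z, p_value)."""
    standard_error = sample_std / sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Probability of a result at least this extreme in either direction,
    # assuming the null hypothesis is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical numbers: null mean 20, observed mean 22, sample STD 10, n = 100.
z, p_value = z_test_p_value(sample_mean=22, null_mean=20, sample_std=10, n=100)
alpha = 0.05
print(f"z={z:.2f}, p={p_value:.4f}, reject null: {p_value < alpha}")
```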

Question 13

Q

Can we use a sample’s STD as an approximation of the population STD?

Answer

A

Yes. The standard deviation is a measurement of the spread of the data: roughly, the typical distance of the data from the mean. We are rarely interested in the amount of variation in the sample itself; the sample standard deviation is mainly useful as an approximation of the population standard deviation.

Question 14

Q

When do we use the t-distribution table?

Answer

A

The rules for when to use a t-distribution table are as follows.

Population standard deviation UNKNOWN and original population normal or symmetrical
OR
Sample size greater than or equal to 30 and population standard deviation UNKNOWN.

Question 15

Q

A statistic is an unbiased estimator of a parameter when the ____ of its sampling distribution is equal to the actual value of the parameter.

Answer

A

Mean. In other words, a statistic is unbiased when, on average, it equals the value of the population parameter it's estimating.

So if, for example, the Q1 of the population is 70, then the mean of the sampling distribution of the sample Q1 should equal 70 if the statistic is unbiased.
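
The idea can be checked exactly on a tiny, hypothetical population by listing every possible sample instead of simulating. The sketch uses the sample mean and the sample variance rather than Q1, and assumes samples of size 2 drawn with replacement.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]  # hypothetical tiny population
pop_mean = Fraction(sum(population), len(population))
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

# Every equally likely sample of size 2, drawn with replacement (16 in total).
samples = list(product(population, repeat=2))


def sample_mean(s):
    return Fraction(sum(s), len(s))


def sample_var(s, ddof):
    """Variance with divisor (n - ddof)."""
    m = sample_mean(s)
    return sum((x - m) ** 2 for x in s) / (len(s) - ddof)


# Mean of each statistic's sampling distribution.
mean_of_means = sum(sample_mean(s) for s in samples) / len(samples)
mean_of_var_n_minus_1 = sum(sample_var(s, 1) for s in samples) / len(samples)
mean_of_var_n = sum(sample_var(s, 0) for s in samples) / len(samples)

print(pop_mean, mean_of_means)         # equal: the sample mean is unbiased
print(pop_var, mean_of_var_n_minus_1)  # equal: dividing by n - 1 is unbiased
print(pop_var, mean_of_var_n)          # differ: dividing by n is biased
```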

Question 16

Q

The shape of the sampling distribution will match the shape of the parent population if the sample size is <30. True/False

Answer

A

True

Question 17

Q

What’s the difference between Bernoulli and Binomial distributions? Give one example for each

Answer

A

The Bernoulli distribution represents the success or failure of a single Bernoulli trial. The Binomial distribution represents the number of successes in n independent Bernoulli trials for some given value of n.

Bernoulli deals with the outcome of a single trial of the event, whereas Binomial deals with the outcome of multiple trials of the same event.

Bernoulli example: whether a team wins a championship or not
Binomial example: the number of sixes (0, 1, 2, 3… 50) obtained when rolling a die 50 times
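
As an illustration of the die example, the sketch below computes the binomial probability of each possible number of sixes in 50 rolls, assuming a fair die. The Bernoulli distribution falls out as the special case n = 1.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success prob. p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 50, 1 / 6  # 50 rolls of a fair die; "success" = rolling a six

# Probability of 0, 1, 2, ..., 50 sixes.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Bernoulli is the single-trial case: P(no six), P(six) on one roll.
bernoulli = [binomial_pmf(k, 1, p) for k in (0, 1)]

print(f"P(exactly 8 sixes in 50 rolls) = {pmf[8]:.4f}")
print(f"Bernoulli: P(failure)={bernoulli[0]:.4f}, P(success)={bernoulli[1]:.4f}")
```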

Question 18

Q

When is the paired t-test used?

Answer

A

A paired t-test is used when we are interested in the difference between two variables for the same subject. Often the two variables are separated by time. An example is finding out whether the mean delivery time from a restaurant by drive-through is any different from that of ordering at the counter. We have one subject (the restaurant) and two variables related to it, i.e. two properties of the subject (drive-through and counter ordering). Here the variables are dependent (unlike in the two-sample t-test).
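
A minimal sketch of the paired test statistic: take the per-pair differences, then divide their mean by their standard error. The delivery times below are hypothetical, and the sketch stops at the t statistic and degrees of freedom, which would then be compared against a t-distribution table.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(x, y):
    """Return (t, degrees_of_freedom) for paired observations x[i], y[i]."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical delivery times in minutes, measured on the same five days.
counter = [6.0, 7.0, 5.5, 7.5, 6.5]
drive_through = [5.0, 6.5, 4.0, 7.0, 5.5]

t, df = paired_t_statistic(counter, drive_through)
print(f"t={t:.3f}, df={df}")
```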

Question 19

Q

When is the two-sample t-test used?

Answer

A

The two-sample t-test (Snedecor and Cochran, 1989) is used to determine if two population means are equal. A common application is to test if a new process or treatment is superior to a current process or treatment. Here the variables are independent (unlike in the paired t-test).
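
For two independent groups, the test statistic can be sketched as below. This uses Welch's variant, which does not assume equal variances; that is a choice made for this illustration, not something the card specifies. The measurements are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_statistic(a, b):
    """Return (t, approximate degrees of freedom) for independent samples."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


# Hypothetical measurements from a current process and a new process.
current = [10, 12, 11, 13, 14]
new = [8, 9, 10, 9, 9]

t, df = welch_t_statistic(current, new)
print(f"t={t:.3f}, df={df:.2f}")
```

Unlike the paired case, the two lists are not matched item by item and could have different lengths.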

Question 20

Q

What is the relationship between confidence level and significance level?

Answer

A

The confidence level is equal to 1 minus the alpha level. So, if your significance level is 0.05, the corresponding confidence level is 95%. If the P-value is less than your significance (alpha) level, the hypothesis test is statistically significant.

Questions Flashcards

(20 cards)