Linear regression-2 Flashcards by Sindhusha Boyapati

What is the problem with R squared?

R squared on its own does not account for the amount of data. When we see a pattern in a small dataset, we cannot be confident that the pattern is not due to random chance.

What is p-value?

A p-value gives us a measure of how confident we can be in the results of a statistical analysis.

Build the intuition for the p-value with an example

Let us consider two drugs, Drug A and Drug B. The p-value helps us determine whether both drugs are equally effective or one is more effective than the other. If the p-value allows us to establish a difference, we can then ask whether A is better or worse than B.

p-value interpretation when A and B are different.

We did one experiment giving drugs A and B to people. A cured 37% and B cured 31%. From the overall picture, A worked better than B. But how confidently can we say that A is better than B? This is where p-values help. p-values are numbers between 0 and 1, and they quantify how CONFIDENT WE SHOULD BE THAT DRUG A IS DIFFERENT FROM DRUG B.

What does a p-value close to 0 denote?

The closer a p-value is to 0, the more confidence we have that drugs A and B are different.

The closer a p-value is to 0, the more confidence we have that drugs A and B are different. How close should a p-value be to 0?

In practice, a commonly used threshold is 0.05.

What does p-value threshold of 0.05 mean?

It means that if there is no difference between drugs A and B, and we did the exact same experiment a bunch of times, then ONLY 5% OF THOSE EXPERIMENTS WOULD RESULT IN A WRONG DECISION.

How to calculate a p-value?

Using a statistical test, such as Fisher’s exact test.
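
As a rough illustration of how such a test turns counts into a p-value, here is a minimal sketch of a two-sided Fisher's exact test for a 2x2 table, using only the Python standard library. The group sizes (100 people per drug) are an assumption made for illustration; the cards give only the cure rates of 37% and 31%. The function name is hypothetical as well.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    probs = [prob(x) for x in range(lowest, highest + 1)]
    p_observed = prob(a)
    # Add up every table that is at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    total = sum(p for p in probs if p <= p_observed * (1 + 1e-7))
    return min(1.0, total)

# Assumed counts: 100 people per drug, A cured 37, B cured 31.
p_value = fisher_exact_two_sided(37, 63, 31, 69)
print(f"p-value for A vs B: {p_value:.3f}")
```

The rows of the table are the two drugs and the columns are cured / not cured; swapping rows or columns does not change the two-sided result.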

Explain the definition of p-value threshold when the value is 0.05 with example

Example:
1. Let’s give the same drug, drug A, to two groups of people. The first test gave a p-value of 0.09, so we failed to see a difference between the two groups.
2. Repeat the same experiment, and most of the time the p-value will be high and we will again fail to see a difference.
3. Once in a while, by chance, the people who are allergic to the drug might all end up in the same group. In this case drug A might fail to work on them and we will get a small p-value, suggesting that there is a difference between the drugs used by the two groups, although they are using the same drug. This is a FALSE POSITIVE.

A threshold of 0.05 means that 5% of the experiments where the only differences come from weird random things will generate a p-value smaller than 0.05.
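
The example above can be checked with a small simulation. This is a sketch under stated assumptions: both groups get the same drug with an assumed cure rate of 34%, each group has an assumed 200 people, and the p-value comes from a two-proportion z-test (a normal approximation, swapped in here for brevity instead of Fisher's test). All names and numbers are illustrative.

```python
import random
from statistics import NormalDist

def two_proportion_p_value(cured_1, n_1, cured_2, n_2):
    """Two-sided p-value from a two-proportion z-test (normal approximation)."""
    pooled = (cured_1 + cured_2) / (n_1 + n_2)
    se = (pooled * (1 - pooled) * (1 / n_1 + 1 / n_2)) ** 0.5
    if se == 0:
        return 1.0  # everyone cured or no one cured: no evidence of a difference
    z = (cured_1 / n_1 - cured_2 / n_2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(n_experiments=2000, group_size=200,
                        cure_rate=0.34, threshold=0.05, seed=0):
    """Fraction of same-drug experiments whose p-value falls below the threshold."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_experiments):
        # Both groups take the SAME drug, so any difference is random chance.
        cured_1 = sum(rng.random() < cure_rate for _ in range(group_size))
        cured_2 = sum(rng.random() < cure_rate for _ in range(group_size))
        p = two_proportion_p_value(cured_1, group_size, cured_2, group_size)
        if p < threshold:
            false_positives += 1
    return false_positives / n_experiments

print(f"share of false positives: {false_positive_rate():.3f}")
```

Because the test is approximate and the number of simulated experiments is finite, the printed share should sit close to the threshold rather than match it exactly.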

How to set the threshold for a p-value?

For extremely important tests, like the effectiveness of a drug, we need high confidence when we state that the drugs are or are not different. In such cases we can use extremely small thresholds like 0.01 or 0.001 (with 0.001, about 1 in 1,000 experiments with no real difference would produce a False Positive).

Likewise, for less important tests, we can use a somewhat higher threshold like 0.2 (about 2 in 10 such experiments would produce a False Positive).

p = 0.24 denotes that drugs …..

With the common threshold of 0.05, we are not confident that drugs A and B are different, because 0.24 is larger than 0.05.

p = 0.02 denotes that drugs ….

With the common threshold of 0.05, we are confident that drugs A and B are different, because 0.02 is smaller than 0.05.

What is Hypothesis testing?

The idea of trying to determine whether the drugs are the same or not is called Hypothesis testing.

What is Null Hypothesis?

The Null Hypothesis is that the drugs are the same, and the p-value helps us decide whether we should reject the Null Hypothesis.

What DOESN’T the p-value tell us?

Although p-values help us decide whether drugs A and B are different, they don’t tell us HOW DIFFERENT they are.

What is the MOST IMPORTANT property of p-value that you should keep in mind while viewing the p-value results?

A small p-value DOESN’T IMPLY that the effect size or difference between drugs A and B is large.

A large p-value DOESN’T IMPLY that the effect size or difference between drugs A and B is small.
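
A quick numerical sketch of both statements, using a two-proportion z-test (a normal approximation chosen here for brevity). All counts are assumptions made up for illustration: the same 37% vs 31% cure rates are tested with small and with very large groups, and a much larger gap is tested with only 5 people per group.

```python
from statistics import NormalDist

def two_proportion_p_value(cured_1, n_1, cured_2, n_2):
    """Two-sided p-value from a two-proportion z-test (normal approximation)."""
    pooled = (cured_1 + cured_2) / (n_1 + n_2)
    se = (pooled * (1 - pooled) * (1 / n_1 + 1 / n_2)) ** 0.5
    if se == 0:
        return 1.0
    z = (cured_1 / n_1 - cured_2 / n_2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same 6-point difference in cure rate, very different amounts of data.
p_small_groups = two_proportion_p_value(37, 100, 31, 100)
p_large_groups = two_proportion_p_value(3700, 10_000, 3100, 10_000)

# A 40-point difference in cure rate, but only 5 people per group.
p_tiny_groups = two_proportion_p_value(3, 5, 1, 5)

print(f"37% vs 31%, 100 per group:    p = {p_small_groups:.3f}")
print(f"37% vs 31%, 10,000 per group: p = {p_large_groups:.2e}")
print(f"60% vs 20%, 5 per group:      p = {p_tiny_groups:.3f}")
```

The effect size is identical in the first two rows, yet only the large-sample experiment produces a small p-value; the third row has the largest effect and still a large p-value.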

How does Linear regression fit a line? What criterion decides the final fit?

Linear regression fits a line to the data that minimizes the Sum of Squared Residuals.
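
A minimal sketch of the Sum of Squared Residuals (SSR) for one candidate line. The data points and the function name are made up for illustration.

```python
def sum_of_squared_residuals(xs, ys, intercept, slope):
    """SSR: add up (observed y - predicted y)^2 over all data points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data that lies exactly on the line y = 1 + 2x.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]

print(sum_of_squared_residuals(xs, ys, intercept=1, slope=2))  # the true line
print(sum_of_squared_residuals(xs, ys, intercept=0, slope=2))  # a worse line
```

The line with the smaller SSR is the better fit; linear regression looks for the intercept and slope that make this number as small as possible.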

How to quantify the accuracy or quality of the predictions in Linear regression?

Once we fit a line to the data, we can calculate R squared, which gives us a sense of how accurate our predictions will be.
Linear regression also provides a p-value for R squared, which gives us a sense of how confident we should be in the predictions.
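
A minimal sketch of R squared, assuming the usual definition: 1 minus the SSR around the fitted line divided by the SSR around the mean of y. The data and the function name are made up for illustration.

```python
def r_squared(ys, predictions):
    """R squared = 1 - SSR(fitted line) / SSR(mean of y)."""
    mean_y = sum(ys) / len(ys)
    ssr_fit = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ssr_mean = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ssr_fit / ssr_mean

ys = [1, 3, 5, 7]

# Predictions that hit every point explain all of the variation.
print(r_squared(ys, [1, 3, 5, 7]))

# Predicting the mean for every point explains none of it.
print(r_squared(ys, [4, 4, 4, 4]))
```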

How to minimize the SSR in linear regression?

Choose the y-axis intercept and slope that minimize the SSR.

What is an analytical solution in general?

With an analytical solution, we end up with a formula that we can plug the data into, and the output is the optimal value.

What is the analytical solution for linear regression to find the optimal y-intercept?

Keeping the slope constant, we see how the SSR changes for different y-intercept values, which results in a U-shaped curve.
The goal is to find the lowest point on that curve, where the SSR is smallest.
One way to find the lowest point is to calculate the derivative of the curve. Where the derivative is 0, that is the bottom of the curve.
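
The steps above can be sketched in code. With the slope held fixed, setting the derivative of the SSR with respect to the intercept to 0 gives a formula for the best intercept; the derivation is written out in the comments. The data, the fixed slope of 1.5, and the function names are assumptions made for illustration.

```python
def ssr(xs, ys, intercept, slope):
    """Sum of Squared Residuals for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def optimal_intercept(xs, ys, slope):
    # SSR(b)     = sum((y - b - slope * x)^2)
    # d SSR / db = -2 * sum(y - b - slope * x)
    # Setting the derivative to 0 and solving for b:
    #   b = mean(y - slope * x)
    return sum(y - slope * x for x, y in zip(xs, ys)) / len(xs)

xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
fixed_slope = 1.5

best = optimal_intercept(xs, ys, fixed_slope)
print("best intercept for this slope:", best)

# Nudging the intercept either way makes the SSR larger,
# which is what being at the bottom of the U-shaped curve means.
for candidate in (best - 0.1, best, best + 0.1):
    print(round(candidate, 2), round(ssr(xs, ys, candidate, fixed_slope), 3))
```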

What is the iterative approach in Linear regression?

Gradient Descent: a way to find the optimal slope and y-axis intercept by taking repeated small steps that reduce the SSR.
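
A minimal sketch of Gradient Descent for a straight line, assuming the SSR as the quantity being minimized. The data, the starting guesses of 0, the learning rate, and the number of steps are all illustrative choices, not recommended settings.

```python
def gradient_descent_line(xs, ys, learning_rate=0.01, n_steps=2000):
    """Find intercept and slope by repeatedly stepping downhill on the SSR."""
    intercept, slope = 0.0, 0.0  # arbitrary starting guesses
    for _ in range(n_steps):
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        # Derivatives of the SSR with respect to each parameter.
        d_intercept = -2 * sum(residuals)
        d_slope = -2 * sum(r * x for r, x in zip(residuals, xs))
        # Step in the direction that lowers the SSR.
        intercept -= learning_rate * d_intercept
        slope -= learning_rate * d_slope
    return intercept, slope

# Made-up data that lies exactly on the line y = 1 + 2x.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]

intercept, slope = gradient_descent_line(xs, ys)
print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
```

A learning rate that is too large makes the steps overshoot and diverge, while one that is too small needs many more steps, so this value has to be tuned to the data.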

Which one is quicker: the analytical or the iterative approach?

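Iterative, in the sense that Gradient Descent can be quicker when the data or the number of parameters is very large. For a small problem like fitting one line, the analytical formula is typically fast as well.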

Where is the iterative approach used?

In scenarios where there are no analytical solutions, including Logistic regression, Neural networks, etc.

What do R squared = 0.66 and p-value = 0.1 suggest?

A p-value of 0.1 implies that there is a 10% chance that random data (random data is used in deriving the p-value) could give us R squared >= 0.66. This is a relatively high p-value, so we might not have a lot of confidence in the predictions.
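
The "random data" idea can be sketched as a simulation: generate many random datasets, fit a line to each, and count how often R squared is at least as large as the observed value. This is a sketch under stated assumptions, not how regression software computes the p-value in practice. The card does not say how many data points were used, so the 5 points here are an assumption, as are the normally distributed random values and the function names.

```python
import random

def r_squared_of_best_line(xs, ys):
    """R squared of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # With one predictor, R squared equals the squared correlation.
    return (sxy * sxy) / (sxx * syy)

def p_value_for_r_squared(observed_r2, n_points, n_random_datasets=5000, seed=0):
    """Share of random datasets whose R squared is >= the observed one."""
    rng = random.Random(seed)
    at_least_as_large = 0
    for _ in range(n_random_datasets):
        # x and y are unrelated, so any fit comes from chance alone.
        xs = [rng.gauss(0, 1) for _ in range(n_points)]
        ys = [rng.gauss(0, 1) for _ in range(n_points)]
        if r_squared_of_best_line(xs, ys) >= observed_r2:
            at_least_as_large += 1
    return at_least_as_large / n_random_datasets

# Assumed dataset size of 5 points; observed R squared taken from the card.
print(p_value_for_r_squared(observed_r2=0.66, n_points=5))
```

With more data points, the same R squared of 0.66 becomes much harder for random data to reach, so the estimated p-value shrinks.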

P-values, linear regression (26 cards)