Stats Flashcards

1
Q

What is the Central Limit Theorem?

A

No matter the distribution of the original data (as long as it has finite variance), if you draw k independent samples of n observations each, the distribution of the k sample means is approximately normal when n is large enough. That distribution has the same mean as the original data set, and its variance is n times smaller than the variance of the original data set (σ²/n).
Important when estimating a population mean from samples, e.g. finding the average age of all people.
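
A minimal simulation sketch of the statement above, using only the Python standard library. The exponential population, the sizes n = 50 and k = 2,000, and the helper name `sample_means` are assumptions chosen for illustration.

```python
import random
import statistics

def sample_means(population, n, k, rng):
    """Draw k samples of size n (with replacement) and return their means."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(k)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# A deliberately non-normal (right-skewed) population: exponential, mean near 1.
population = [rng.expovariate(1.0) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_var = statistics.pvariance(population)

n, k = 50, 2_000
means = sample_means(population, n, k, rng)

# The mean of the sample means should sit close to the population mean,
# and their variance should be close to the population variance divided by n.
print(f"population mean      : {pop_mean:.3f}")
print(f"mean of sample means : {statistics.fmean(means):.3f}")
print(f"population var / n   : {pop_var / n:.4f}")
print(f"var of sample means  : {statistics.pvariance(means):.4f}")
```

Plotting a histogram of `means` would show the bell shape even though the population itself is skewed.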

2
Q

What is sampling?

A
  • Obtaining a part of the data population.
  • A good sample is representative of the entire population.
  • It should not be a biased sample (e.g. a convenience or voluntary-response sample).
  • An unbiased sample gives each member of the population an equal chance of being chosen.
3
Q

What are some sampling techniques?

A
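  1. Simple random sampling - every member has the same chance of selection, so it is unbiased
  2. Stratified random sampling - group the data by similar features (strata) and draw an equal number of random samples from each group
  3. Multistage sampling - sample in stages, e.g. randomly select groups first, then randomly sample within the selected groups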
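
The first two techniques can be sketched as follows. This is an illustrative sketch only: the function names, the `(person_id, age_group)` records, and the 700/300 split are hypothetical, not taken from any particular library or data set.

```python
import random
from collections import defaultdict

def simple_random_sample(items, size, rng):
    """Every item has the same chance of being chosen (no replacement)."""
    return rng.sample(items, size)

def stratified_sample(items, key, per_stratum, rng):
    """Group items by key(item), then draw per_stratum items from each group."""
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for label in sorted(strata):  # sorted for a reproducible order
        members = strata[label]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample

rng = random.Random(42)

# Hypothetical population of (person_id, age_group) records.
population = [(i, "under_30") for i in range(700)]
population += [(i, "30_plus") for i in range(700, 1000)]

srs = simple_random_sample(population, 100, rng)
strat = stratified_sample(population, key=lambda rec: rec[1], per_stratum=50, rng=rng)

# The simple random sample mirrors the 70/30 split only on average;
# the stratified sample contains exactly 50 records from each group.
print(sum(1 for _, group in srs if group == "under_30"), "of 100 under 30 (simple random)")
print(sum(1 for _, group in strat if group == "under_30"), "of 100 under 30 (stratified)")
```

Drawing an equal number per stratum follows the card above; sampling in proportion to each stratum's size is another common choice.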
4
Q

What is a null hypothesis?

A

A null hypothesis is a type of conjecture used in statistics that proposes that there is no difference between certain characteristics of a population or data-generating process.

The alternative hypothesis proposes that there is a difference.

Hypothesis testing provides a method to reject a null hypothesis at a chosen significance level. (A null hypothesis can be rejected or fail to be rejected, but it cannot be proven.)

5
Q

What is a type I error?

A

The null hypothesis is true but is rejected (false positive).

6
Q

What is a type II error?

A

The null hypothesis is false but is not rejected (false negative).

7
Q

What is a p-value?

A

The p-value is the probability, assuming the null hypothesis is true, of observing results at least as extreme as those in the data. The probability of making a type I error is represented by your alpha level (α), which is the threshold below which a p-value leads you to reject the null hypothesis. An alpha of 0.05 indicates that you are willing to accept a 5% chance of rejecting the null hypothesis when it is actually true.

You can reduce your risk of committing a type I error by using a lower value for alpha. For example, an alpha of 0.01 would mean there is a 1% chance of committing a type I error when the null hypothesis is true.

However, using a lower value for alpha means that you will be less likely to detect a true difference if one really exists (thus risking a type II error).
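
A small simulation can illustrate the link between alpha and the type I error rate. The sketch below assumes a two-sided z-test with known standard deviation on standard normal data. The helper names, the trial count, and the sample size are illustrative choices.

```python
import math
import random
from statistics import NormalDist, fmean

def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: mean == mu0, with known population sigma."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

def type_i_error_rate(alpha, trials=5_000, n=30, seed=1):
    """Fraction of experiments that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Data are generated with mean 0, so H0 (mean == 0) really is true.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(sample, mu0=0.0, sigma=1.0) < alpha:
            rejections += 1
    return rejections / trials

for alpha in (0.05, 0.01):
    rate = type_i_error_rate(alpha)
    print(f"alpha={alpha}: observed type I error rate = {rate:.3f}")
```

The observed rejection rate should land near the chosen alpha in each case, which is what "a 5% chance of a type I error" means in practice.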

8
Q

What is β (beta)?

A

The probability of making a type II error is called beta (β), and it is related to the power of the statistical test (power = 1 - β). You can decrease your risk of committing a type II error by ensuring your test has enough power.

You can do this by ensuring your sample size is large enough to detect a practical difference when one truly exists.
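
To see how sample size drives power, here is a sketch for one specific case: a one-sided z-test with known standard deviation. The function name `z_test_power` and the effect size of 0.5 are assumptions for illustration; other tests need other formulas.

```python
import math
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test to detect a true mean shift of `effect`.

    H0: mean == mu0, H1: mean == mu0 + effect (effect > 0), sigma known.
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha)    # rejection threshold for z
    shift = effect * math.sqrt(n) / sigma       # mean of z when H1 is true
    return 1 - std_normal.cdf(critical - shift)

for n in (10, 30, 100):
    power = z_test_power(effect=0.5, sigma=1.0, n=n)
    print(f"n={n:>3}: power={power:.3f}, beta={1 - power:.3f}")
```

Power rises (and β falls) as n grows, which is why a larger sample lowers the risk of a type II error.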

9
Q

Why are type I and type II errors important?

A

A type I error means that changes or interventions are made when they are unnecessary, which wastes time, resources, etc.

Type II errors typically lead to the preservation of the status quo (i.e. interventions remain the same) when change is needed.

10
Q

What is bias?

A

In data science, bias is a systematic deviation from expectation in the data. In a general sense, it refers to a systematic error in the data rather than random noise.

11
Q

What are the types of bias? Explain each.

A
  1. Confirmation bias
    Data scientists or analysts tend to favor data that aligns with their beliefs, views, and opinions.
  2. Selection bias
    Develops when a subset of the data is systematically (i.e., non-randomly) excluded from the analysis.
  3. Availability bias
    Data scientists make inferences based on readily available data or recent information alone.
  4. Survivorship bias
    Distorts data sets by focusing on successful examples and ignoring failures.
  5. Recall bias
    An information bias where participants do not accurately recall previous events, memories, or details. It is related to recency bias, where we tend to remember more recent events better.
12
Q

What is statistical significance?

A
  • A determination that a relationship between two or more variables is unlikely to be explained by chance alone.
  • Measured by the p-value, which is the probability of observing results at least as extreme as those in the data, assuming the null hypothesis is true.
  • A p-value of 5% or lower is often considered statistically significant.
13
Q

How do you assess statistical significance?

A
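  1. Set up a hypothesis test: state the null hypothesis and the alternative hypothesis.
  2. Calculate the p-value: the probability of observing results at least as extreme as the data, assuming the null hypothesis is true (it is not the probability that the null hypothesis is true).
  3. Set the significance level, alpha (ideally chosen before looking at the data).
  4. If the p-value < alpha, reject the null hypothesis (the result is statistically significant).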
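
The procedure above can be sketched with an exact binomial test on a coin. The 65-heads-in-100-flips result is a hypothetical example, and `binomial_two_sided_p_value` is an illustrative helper, not a library function.

```python
from math import comb

def binomial_two_sided_p_value(successes, trials, p_null=0.5):
    """Exact two-sided p-value: the total probability, under H0, of every
    outcome that is no more likely than the observed one."""
    def pmf(k):
        return comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)

    observed = pmf(successes)
    # The small relative tolerance guards against floating-point ties.
    return sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9))

# Hypotheses: H0 says the coin is fair (p = 0.5); H1 says it is not.
# Significance level, fixed before looking at the data.
alpha = 0.05

# p-value for the hypothetical observation of 65 heads in 100 flips.
p_value = binomial_two_sided_p_value(65, 100)

# Decision: reject H0 when the p-value falls below alpha.
reject_null = p_value < alpha
print(f"p-value = {p_value:.4f}, reject H0: {reject_null}")
```

With 65 heads the p-value falls well below 0.05, so the null hypothesis is rejected. A result such as 55 heads would not be significant at the same level.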
14
Q

What is statistical power?

A

Statistical power refers to the power of a binary hypothesis test: the probability that the test rejects the null hypothesis given that the alternative hypothesis is true.

15
Q

What is variation?

A

Variation gives an idea of how spread out the data is. The simple range (maximum minus minimum) is susceptible to outliers.
Variance is the average squared deviation from the mean.

Standard deviation is the square root of the variance.

16
Q

State the variance types and explain the differences.

A

Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ (dividing by n - 1 instead of n removes the bias)
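
A short sketch comparing the two formulas on a small made-up data set, cross-checked against Python's `statistics` module. The data values and function names are illustrative assumptions.

```python
import statistics

def population_variance(xs):
    """Average squared deviation from the mean, dividing by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """Squared deviations from the sample mean, dividing by n - 1."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values; mean is 5

print(population_variance(data))        # 4.0  (32 / 8)
print(round(sample_variance(data), 3))  # 4.571  (32 / 7)

# The standard library provides the same two estimators.
print(statistics.pvariance(data), statistics.variance(data))
```

The sample variance is always the larger of the two on the same data, because it divides the same sum of squares by n - 1 rather than n.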