Statistics Flashcards
What is the Central Limit Theorem? What is the importance of this? What are the assumptions for the CLT?
The sampling distribution of the mean will approximately follow a normal distribution, even when the population itself is not normal. This matters because it justifies normal-based inference (confidence intervals, hypothesis tests) on sample means.
Assumptions:
- Data must be sampled randomly
- Samples should be independent of each other
- The sample size should be sufficiently large (a common rule of thumb is n > 30; some sources use 50 or 100)
- When sampling without replacement, the sample size should be no more than 10% of the population
Also:
The standard deviation of the sampling distribution of the mean (the standard error) is equal to the population standard deviation divided by the square root of the sample size.
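A small simulation can make the CLT and the standard error formula concrete. This is only a sketch: the exponential population, the sample size of 50, and the number of repetitions are illustrative assumptions, not part of the flashcard.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Assumed skewed population: exponential with mean 1 and standard deviation 1.
population_sd = 1.0
sample_size = 50
num_samples = 5000

# Draw many independent samples and record the mean of each one.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(sample_size))
    for _ in range(num_samples)
]

# Spread of the sample means versus the formula sigma / sqrt(n).
observed_se = statistics.stdev(sample_means)
theoretical_se = population_sd / sample_size ** 0.5

print(f"mean of sample means: {statistics.fmean(sample_means):.3f}")
print(f"observed SE: {observed_se:.3f}, theoretical SE: {theoretical_se:.3f}")
```

Even though the population is strongly skewed, the observed spread of the sample means should land close to the theoretical value.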
What is a normal distribution? What are the properties?
A bell-shaped distribution with three main properties: the mean, median, and mode are equal; it is symmetric around the mean; and 50 percent of the values lie on each side of the mean.
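These properties can be checked numerically with Python's built-in `statistics.NormalDist`. The mean of 10 and standard deviation of 2 are arbitrary values chosen for illustration.

```python
from statistics import NormalDist

dist = NormalDist(mu=10.0, sigma=2.0)  # assumed, illustrative parameters

# Half of the probability mass lies below the mean.
below_mean = dist.cdf(dist.mean)

# Symmetry: the density is the same at equal distances on either side of the mean.
left_density = dist.pdf(dist.mean - 1.5)
right_density = dist.pdf(dist.mean + 1.5)

# Mean, median, and mode coincide.
print(dist.mean, dist.median, dist.mode)
print(below_mean, left_density, right_density)
```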
What is a simple random sample? What are the advantages of a simple random sample? Disadvantages?
A randomly selected subset of a population where each member of the population has an exactly equal chance of being selected. An advantage is that little has to be known about the population in advance to conduct this sampling method. A disadvantage is that it requires a complete list of every element in the population.
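A minimal sketch of a simple random sample, assuming the complete list of population members is available as a Python list (the IDs 1 to 1000 are made up for illustration):

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Assumed sampling frame: a complete list of member IDs.
population = list(range(1, 1001))

# random.sample draws without replacement, giving every member the same chance.
sample = random.sample(population, k=50)

print(sample[:10])
```

Needing `population` up front is exactly the disadvantage described above: without the full list, this method cannot be applied.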
What is a population?
The entire group you want to draw conclusions about.
What is a sample?
A subset of the population you will collect information from.
What is systematic sampling? When should I use this?
A sampling method where every nth element is taken.
Ex. Taking every 10th element
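One way to sketch this in code, assuming an ordered list of 100 elements and a step of 10 (both illustrative). A random starting point within the first interval is a common convention and is an assumption here:

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

population = list(range(1, 101))  # assumed ordered list of 100 elements
step = 10                         # take every 10th element

# Pick a random starting position within the first interval, then step through.
start = random.randrange(step)
sample = population[start::step]

print(sample)
```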
What is convenience sampling?
A sampling method where the data first accessed is used (hence the term convenience).
What is stratified sampling?
A sampling method that first divides the population into groups called strata. A sample is taken from each of these strata using either random, systematic, or convenience sampling.
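A minimal sketch of stratified sampling with simple random sampling inside each stratum. The two strata ("undergrad" and "grad") and the 10% sampling fraction are hypothetical choices for illustration:

```python
import random
from collections import defaultdict

random.seed(7)  # fixed seed so the sketch is repeatable

# Assumed population: (member_id, stratum) pairs with 150 undergrads and 50 grads.
population = [(i, "undergrad" if i % 4 else "grad") for i in range(200)]

# Step 1: divide the population into strata.
strata = defaultdict(list)
for member_id, stratum in population:
    strata[stratum].append(member_id)

# Step 2: draw a random sample from every stratum (proportional allocation).
sampling_fraction = 0.1
sample = {
    name: random.sample(members, k=max(1, round(len(members) * sampling_fraction)))
    for name, members in strata.items()
}

for name, members in sample.items():
    print(name, len(members))
```

Every stratum is represented in the result, which is the main reason to stratify.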
What is cluster sampling?
A sampling method that first divides the population into clusters. Some of the clusters are then randomly selected, and all elements in the selected clusters are used.
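A minimal sketch of cluster sampling, assuming 10 clusters of 20 members each (the names and sizes are made up). Unlike stratified sampling, only some groups are chosen, and every member of a chosen group is kept:

```python
import random

random.seed(3)  # fixed seed so the sketch is repeatable

# Assumed population: 10 clusters (for example, schools) of 20 members each.
clusters = {
    f"cluster_{c}": [f"cluster_{c}_member_{m}" for m in range(20)]
    for c in range(10)
}

# Step 1: randomly select whole clusters.
chosen = random.sample(sorted(clusters), k=3)

# Step 2: use every element inside the selected clusters.
sample = [member for name in chosen for member in clusters[name]]

print(chosen, len(sample))
```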
How can I test if my data is normally distributed?
1. QQ plot (quantile-quantile plot), which plots the theoretical quantiles of a normal distribution against the actual quantiles of our variable.
2. Hypothesis testing
a. The Shapiro-Wilk test is generally among the most powerful tests for normality.
Note:
If I were explaining to a general audience, I would show it visually with a histogram.
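The points of a QQ plot can be computed with the standard library alone, as sketched below. The simulated data, the `(i + 0.5) / n` plotting positions, and the `pearson_r` helper are illustrative assumptions. The correlation of the QQ points is only an informal summary of how straight the plot is, not a formal test. For the Shapiro-Wilk test itself, SciPy provides `scipy.stats.shapiro`.

```python
import random
from statistics import NormalDist, fmean

random.seed(5)  # fixed seed so the sketch is repeatable

# Assumed sample: 200 draws from a normal distribution.
data = [random.gauss(50, 5) for _ in range(200)]

n = len(data)
actual = sorted(data)  # actual quantiles of the variable
# Theoretical standard normal quantiles at the matching plotting positions.
theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

def pearson_r(xs, ys):
    """Pearson correlation, written out by hand (hypothetical helper)."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Close to 1 when the QQ points fall near a straight line.
qq_correlation = pearson_r(theoretical, actual)
print(f"QQ correlation: {qq_correlation:.4f}")
```

Plotting `theoretical` against `actual` gives the QQ plot; roughly normal data should fall near a straight line.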
What is power?
The probability of rejecting the null hypothesis given that it is false: 1 - beta, where beta is the probability of a Type II error.
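As a worked example, power can be computed in closed form for a one-sided z-test with known standard deviation. The test choice and all numbers below (alpha, effect size, sigma, n) are assumptions chosen for illustration:

```python
from statistics import NormalDist

# Assumed setup: H0: mu = mu0 versus H1: mu > mu0, with known sigma.
alpha = 0.05   # significance level
effect = 0.5   # hypothetical true shift, mu1 - mu0
sigma = 1.0    # known population standard deviation
n = 25         # sample size

std_normal = NormalDist()

# Reject H0 when the z statistic exceeds this critical value.
z_crit = std_normal.inv_cdf(1 - alpha)

# Under H1 the z statistic is centred at effect * sqrt(n) / sigma.
shift = effect * n ** 0.5 / sigma

beta = std_normal.cdf(z_crit - shift)  # probability of a Type II error
power = 1 - beta

print(f"beta = {beta:.3f}, power = {power:.3f}")
```

Increasing `n` or `effect` raises the power, while a smaller `alpha` lowers it.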
What is significance?
What is a hypothesis test? Can you take me through each step?
What is a confidence interval? How do I calculate it?
What is ANOVA?