Chapter 2: Statistics Revisited Flashcards
- What is inferential statistics?
- Why is the Normal Distribution so important?
- What is an i.i.d. random sample?
- How does sample size impact the confidence interval?
- What is a paired t-test?
- What is the OLS estimator all about?
What are descriptive statistics and inferential statistics?
Descriptive statistics summarize the data, either numerically or graphically, to describe the sample (e.g., mean and standard deviation). They are computed from all of the data at hand.
Inferential statistics model patterns in the data, accounting for randomness, in order to draw inferences about the larger population. They are based on a sample.
These inferences may take the form of:
- estimates of numerical characteristics (estimation),
- answers to yes/no questions (hypothesis testing),
- forecasting of future observations (forecasting),
- descriptions of association (correlation), or
- modeling of relationships (regression).
Data mining is sometimes referred to as exploratory statistics, because it generates new hypotheses.
What are random variables?
X is a random variable if it represents a random draw from some population and is associated with a probability distribution.
- a discrete random variable can take on only a countable set of distinct values (e.g., Binomial or Poisson distributed)
- a continuous random variable can take on any value in a real interval (e.g., uniform, Normal or Chi-Square distributions), such as a person's height or an angle between 0 and 180 degrees
For example, a Normal distribution with mean μ and variance σ², written $N(\mu, \sigma^2)$, has the pdf
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
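As a quick numerical check of the Normal density, the sketch below writes the pdf out term by term and compares it with Python's `statistics.NormalDist(mu, sigma).pdf`. The function name `normal_pdf` and the values μ = 170, σ = 10 (loosely, person heights in cm) are illustrative assumptions, not part of the original cards.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x, written out from the formula."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)


# Illustrative parameters (assumed): mean 170, standard deviation 10.
mu, sigma = 170.0, 10.0
for x in (150.0, 170.0, 185.0):
    # Hand-written formula vs. the standard library's implementation.
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```

The two columns should agree up to floating-point rounding; the density is largest at x = μ.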
The Standard Normal
Any random variable can be "standardized" by subtracting the mean, μ, and dividing by the standard deviation, σ, so that $Z = \frac{X - \mu}{\sigma}$ has
$E(Z) = 0, \quad \mathrm{Var}(Z) = 1.$
Thus, the standard normal, $N(0, 1)$, has probability density function (pdf):
$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$
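A minimal sketch of standardization on a small made-up dataset. It treats the listed values as the whole population (hence `pstdev`, the population standard deviation) and checks that the standardized values have mean 0 and standard deviation 1. The helper name `standardize` is hypothetical.

```python
from statistics import mean, pstdev


def standardize(values: list[float]) -> list[float]:
    """Return z = (x - mu) / sigma, using the mean and population SD of the values."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up example data (mu = 5, sigma = 2)
z = standardize(data)
print(z)
print(mean(z), pstdev(z))  # mean 0 and standard deviation 1, up to rounding
```

Standardizing does not change the shape of the distribution; it only shifts and rescales it, which is why a standardized Normal variable is exactly $N(0, 1)$.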
Statistical Estimation
Population (with parameters) → random sample: every member of the population has the same chance of being selected.
Random sample → population: the sample is used to estimate the population parameters.
Expected Value of X: Population Mean E(X)
- The expected value is a probability-weighted average of X
- E(X) is the mean or expected value of the distribution of X, denoted by $\mu_X$
- Let $f(x_i)$ be the (discrete) probability that $X = x_i$; then
- $\mu_X = E(X) = \sum_{i=1}^{n} x_i f(x_i)$
- Law of large numbers: the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.
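To make both ideas concrete, the sketch below computes $E(X) = \sum x_i f(x_i)$ exactly for a fair six-sided die (an assumed example) and then simulates the law of large numbers by averaging more and more rolls. The seed and sample sizes are arbitrary choices.

```python
import random
from fractions import Fraction

# Assumed example: a fair six-sided die, so each face x_i has f(x_i) = 1/6.
faces = [1, 2, 3, 4, 5, 6]
f = {x: Fraction(1, 6) for x in faces}

# E(X) = sum over i of x_i * f(x_i), computed exactly with fractions.
expected = sum(x * f[x] for x in faces)
print("E(X) =", expected)  # prints E(X) = 7/2

# Law of large numbers: the average of more and more simulated rolls
# should settle close to E(X) = 3.5.
rng = random.Random(42)  # fixed seed so the run is reproducible
averages = {}
for n in (10, 1_000, 100_000):
    rolls = [rng.choice(faces) for _ in range(n)]
    averages[n] = sum(rolls) / n
    print(n, averages[n])
```

Small samples can land noticeably away from 3.5; with 100,000 rolls the average is typically within a few hundredths.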
Sampling Distribution of the Mean
- Sample statistics (such as the sample mean) have distributions of their own, which we can describe
- The sample mean is a random variable, and consequently it has its own distribution and variance
- The distribution of sample means for different samples of a population is centered on the population mean
- The mean of the sample means is equal to the population mean
- If the population is normally distributed, sample means are exactly normally distributed; when the sample size is large, they are approximately normally distributed (by the central limit theorem)
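A small simulation of the sampling distribution of the mean, assuming a uniform population on [0, 1) (so μ = 0.5 and σ = √(1/12)). It draws many samples of size n = 30, records each sample mean, and checks that the sample means are centered on μ with a spread close to σ/√n. All sizes and the seed are illustrative assumptions.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed for reproducibility

# Assumed population: uniform on [0, 1), with mu = 0.5 and sigma = sqrt(1/12).
mu = 0.5
sigma = (1 / 12) ** 0.5

n = 30               # size of each sample
num_samples = 5_000  # number of repeated samples

# Draw many samples and record the mean of each one.
sample_means = []
for _ in range(num_samples):
    sample = [rng.random() for _ in range(n)]
    sample_means.append(sum(sample) / n)

print("mean of sample means:", mean(sample_means), "vs mu =", mu)
print("SD of sample means:  ", pstdev(sample_means), "vs sigma/sqrt(n) =", sigma / n ** 0.5)
```

Even though the uniform population is not Normal, a histogram of `sample_means` would already look close to a bell curve at n = 30.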
Examples of Estimators
- Suppose we want to estimate the population mean
- Suppose we use the formula for E(X), but substitute 1/n for $f(x_i)$ as the probability weight, since each point has an equal chance of being included in the sample; then we can calculate the sample mean:
  $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
- $\bar{X}$ describes the random variable for the arithmetic mean of the sample, while $\bar{x}$ is the mean of a particular realization of a sample.
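The substitution of 1/n for $f(x_i)$ can be sketched directly: weight every observed value by 1/n and sum. The helper `sample_mean` and the data are hypothetical; the result is compared with `statistics.mean`.

```python
from statistics import mean


def sample_mean(xs: list[float]) -> float:
    """x-bar = sum of x_i * (1/n): the E(X) formula with equal weights 1/n."""
    n = len(xs)
    weight = 1 / n
    return sum(x * weight for x in xs)


xs = [3.0, 5.0, 7.0, 9.0]  # one particular (made-up) realization of a sample
x_bar = sample_mean(xs)
print(x_bar, mean(xs))  # prints 6.0 6.0
```

Here `x_bar` plays the role of $\bar{x}$: a single number from one realized sample. Drawing a new sample would give a different value, which is what the random variable $\bar{X}$ describes.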
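Estimators Should Be Unbiased
An estimator (e.g., the arithmetic sample mean) is a statistic (a function of the observable sample data) that is used to estimate an unknown population parameter (e.g., the expected value). An estimator $\hat{\theta}$ of a parameter θ is unbiased if $E(\hat{\theta}) = \theta$; for example, the sample mean is unbiased for the population mean, since $E(\bar{X}) = \mu$.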
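To illustrate what "unbiased" means in practice, the sketch below compares two estimators of the population variance (an example chosen here, not taken from the original cards): `statistics.variance`, which divides by n − 1, and `statistics.pvariance`, which divides by n. It assumes a Normal population with σ = 2 (true variance 4) and small samples of size 5, then averages each estimator over many repeated samples.

```python
import random
from statistics import mean, pvariance, variance

rng = random.Random(1)  # fixed seed for reproducibility

# Assumed population: Normal with mean 0 and SD 2, so the true variance is 4.
true_var = 4.0
n = 5         # small samples make the bias easy to see
reps = 20_000

unbiased, biased = [], []
for _ in range(reps):
    sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
    unbiased.append(variance(sample))  # divides by n - 1
    biased.append(pvariance(sample))   # divides by n

print("average with divisor n - 1:", mean(unbiased))  # close to 4
print("average with divisor n:    ", mean(biased))    # close to 4 * (n - 1) / n = 3.2
```

On average the n − 1 version hits the true variance, while the divisor-n version systematically underestimates it by the factor (n − 1)/n.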
Standard Error of the Mean: Standard Deviation of Sample Means
The standard deviation of the sample means is equal to the standard deviation of the population divided by the square root of the sample size.
$\sigma_{\bar{X}} = \sigma / \sqrt{n}$
Rule: $\mathrm{Var}[aX + b] = a^2\,\mathrm{Var}[X]$
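The rule connects directly to the σ/√n formula. The short derivation below assumes the $X_i$ are independent with common variance σ²; independence is what allows the variance of the sum to be written as the sum of the variances, and the rule is applied with a = 1/n and b = 0.

```latex
\mathrm{Var}(\bar{X})
  = \mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i)
  = \frac{1}{n^2}\, n\sigma^2
  = \frac{\sigma^2}{n},
\qquad
\sigma_{\bar{X}} = \sqrt{\mathrm{Var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}
```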
Random Samples and Sampling
- For a random variable X, repeated draws from the same population can be labeled as $X_1, X_2, \dots, X_n$
- If every combination of n sample points has an equal chance of being selected, this is a random sample
- A random sample is a set of independent, identically distributed (i.i.d.) random variables
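The "every combination is equally likely" property can be checked empirically with Python's `random.sample`, which draws n distinct members of a population. The tiny population, sample size, trial count, and seed below are made-up assumptions; with 4 members and n = 2 there are 6 possible combinations, each expected about 1/6 of the time.

```python
import random
from collections import Counter
from itertools import combinations

rng = random.Random(7)  # fixed seed for reproducibility
population = ["A", "B", "C", "D"]  # a tiny, made-up population
n = 2
trials = 60_000

# random.sample draws n distinct members; sort each draw so that
# ("B", "A") and ("A", "B") count as the same combination.
counts = Counter(tuple(sorted(rng.sample(population, n))) for _ in range(trials))

for combo in combinations(population, n):
    print(combo, counts[combo] / trials)  # each should be near 1/6
```

Note that `random.sample` draws without replacement, so the draws are not strictly independent; drawing with replacement (for example with `random.choices`) gives exactly i.i.d. draws. When the population is much larger than n, the two are practically the same.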
Central Limit Theorem
- The central limit theorem states that the standardized average of i.i.d. random variables $X_i$, drawn from any population with mean $\mu_X$ and finite variance σ², is asymptotically $N(0, 1)$, or
  $Z = \frac{\bar{X} - \mu_X}{\sigma/\sqrt{n}} \overset{a}{\sim} N(0, 1)$
- Asymptotic normality implies that $P(Z < z) \to \Phi(z)$ as $n \to \infty$, or $P(Z < z) \approx \Phi(z)$ for large n
- In other words:
- Let $X_1, \dots, X_n$ be n i.i.d. random variables with mean μ and standard deviation σ.
- If n is sufficiently large, the sample mean $\bar{X}$ is approximately Normal with mean μ and standard deviation $\sigma/\sqrt{n}$
- i.e., the mean of the sample means is equal to the population mean
- i.e., the standard deviation of the sample means is equal to the standard deviation of the population divided by the square root of the sample size
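A sketch of the central limit theorem at work, assuming a deliberately skewed population (exponential with rate 1, so μ = 1 and σ = 1). It standardizes many sample means of size n = 50 and compares the simulated $P(Z < 1)$ with $\Phi(1)$ from `statistics.NormalDist().cdf`. The sizes, seed, and the helper name `standardized_mean` are illustrative assumptions.

```python
import random
from statistics import NormalDist

rng = random.Random(3)  # fixed seed for reproducibility

# Assumed, deliberately skewed population: exponential with rate 1,
# which has mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0
n = 50
reps = 20_000


def standardized_mean() -> float:
    """Z = (X-bar - mu) / (sigma / sqrt(n)) for one sample of size n."""
    xs = [rng.expovariate(1.0) for _ in range(n)]
    x_bar = sum(xs) / n
    return (x_bar - mu) / (sigma / n ** 0.5)


zs = [standardized_mean() for _ in range(reps)]

z = 1.0
empirical = sum(v < z for v in zs) / reps
print("P(Z < 1) simulated:", empirical)
print("Phi(1):            ", NormalDist().cdf(z))  # about 0.8413
```

Although individual exponential draws are far from Normal, the standardized means already follow $N(0, 1)$ closely at n = 50; for a more skewed population or a smaller n the agreement would be looser.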
Statistical Estimation
- Population with unknown mean μ →
- A simple random sample of n elements is selected from the population →
- The sample data provide a value for the sample mean $\bar{x}$ →
- The value of $\bar{x}$ is used to make inferences about the value of μ.
Student's t-Distribution
- When the population standard deviation is not known and has to be estimated from the sample, Student's t-distribution should be used; this matters most when the sample size is small
- This distribution is similar to the Normal distribution, but more spread out for small samples
- The formula for standardizing the distribution of sample means to the t-distribution is similar, except that the sample standard deviation s is used in place of σ: $t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$, with n − 1 degrees of freedom
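A minimal sketch of the t standardization, using `statistics.stdev` (which divides by n − 1) for the sample standard deviation s. The data and the hypothesized mean `mu0 = 5.0` are made up, and `t_statistic` is a hypothetical helper name.

```python
from statistics import mean, stdev


def t_statistic(xs: list[float], mu0: float) -> float:
    """t = (x-bar - mu0) / (s / sqrt(n)), with s the sample SD (n - 1 divisor)."""
    n = len(xs)
    return (mean(xs) - mu0) / (stdev(xs) / n ** 0.5)


sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2, 5.5, 5.7]  # made-up data
mu0 = 5.0  # hypothesized population mean (assumed)
t = t_statistic(sample, mu0)
print("t =", round(t, 3), "with", len(sample) - 1, "degrees of freedom")
```

Turning t into a probability requires the CDF of the t-distribution with n − 1 degrees of freedom, which Python's standard library does not provide; `scipy.stats.t` is one common option (not used here).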
Student t-Distribution