Week 7 - Probability and Sampling Distribution Flashcards
Define probability as Long-Run Relative Frequency
it is the proportion of times that a certain outcome would occur in a very long sequence of observations or repeated trials
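A quick sketch of the long-run idea, assuming a fair coin simulated with Python's random module (the function name and seed are just illustrative choices): the proportion of heads should settle near 0.5 as the number of tosses grows.

```python
import random

def relative_frequency(n_tosses, seed=0):
    """Proportion of heads in n_tosses simulated tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The proportion is noisy for short sequences and steadier for long ones.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

The short runs can land far from 0.5; only the very long sequence is what the definition refers to.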
What is a random variable
A variable that takes a certain value or range of values by chance
e.g. in a coin toss, whether we observe heads or tails is a random variable
(1) Suppose Y is assigned a certain value. For each case or observation, the value of Y is fixed; however, the value of Y varies across cases or observations. Y may be called “a random variable.” True or false? Correct it if it is false and explain why.
FALSE: in this example, Y has already been observed in every case.
once a value is observed, it is fixed for that observation, so Y is just a regular variable, not a random variable
it is when we randomly sample Y from the population that Y becomes a random variable, because the value we draw is determined by chance
what is a probability distribution
it specifies how probability is distributed over the possible values of a random variable
eg if we roll a die, Y = 1,2,3,4,5, or 6
1/6 for each value of Y is the probability distribution
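The die example can be written out as a small table of values and probabilities. This is only a sketch, assuming a fair six-sided die; the variable names are illustrative.

```python
from fractions import Fraction

# Probability distribution of a fair die: each value of Y gets probability 1/6.
die = {y: Fraction(1, 6) for y in range(1, 7)}

total = sum(die.values())                  # probabilities must add up to 1
mean = sum(y * p for y, p in die.items())  # E(Y), the long-run average roll

print(total, mean)
```

Using Fraction keeps the arithmetic exact, so the probabilities add up to exactly 1 with no rounding.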
what is Standard deviation
a parameter indicating the variability of the values of a random variable Y - denoted by σ (sigma); the variance is σ²
name the 3 popular probability distributions
- normal
- Bernoulli
- Student's t distribution
what are the characteristics of a normal distribution
it is the probability distribution of a continuous random variable which is symmetric, bell shaped, and characterized by its mean μ and standard deviation σ
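A small sketch of "symmetric and characterized by μ and σ", using statistics.NormalDist from the Python standard library. The values μ = 100 and σ = 15 are made up for illustration.

```python
from statistics import NormalDist

# Hypothetical normal distribution: mu and sigma are all we need to define it.
dist = NormalDist(mu=100, sigma=15)

below_mean = dist.cdf(100)      # half of the probability lies below the mean
left_tail = dist.cdf(85)        # P(Y < mu - sigma)
right_tail = 1 - dist.cdf(115)  # P(Y > mu + sigma)

print(below_mean, left_tail, right_tail)
```

The two tails match because the curve is a mirror image around μ.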
what are the characteristics of a Bernoulli distribution
it is the probability distribution for a binary random variable: Y = 0 or 1
E(Y), or the mean, denoted by μ, is the probability of Y = 1
variance (σ²) is given by μ × (1 - μ)
e.g. if 20% of people chose disagree (Y = 0) and 80% chose agree (Y = 1), the graph would be in terms of proportions, so μ = 0.80
*** note that for a Bernoulli distribution, the further μ is from 0.50 (the closer to 0 or 1), the lower the variance - because the outcome is more predictable
the variance of a Bernoulli distribution is highest at μ = 0.50, because that means a 50/50 chance of drawing Y = 1 or 0
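To see where the variability peaks, here is a minimal sketch that evaluates μ × (1 - μ) for a few values of μ (the function name is just illustrative):

```python
def bernoulli_variance(mu):
    """Variance of a Bernoulli random variable with P(Y = 1) = mu."""
    return mu * (1 - mu)

# Variance is largest at mu = 0.5 and shrinks toward 0 as mu nears 0 or 1.
for mu in (0.1, 0.2, 0.5, 0.8, 0.9):
    print(mu, round(bernoulli_variance(mu), 4))
```

For the agree/disagree example with μ = 0.80 the variance is about 0.16, compared with 0.25 at μ = 0.50.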
what is the central limit theorem
when the sample size N is large, the sampling distribution of y-bar can be approximated by the normal distribution
what is a sampling distribution
Summary statistics, such as sample means, computed from different random samples are different from each other. Hence, we can consider the distribution of summary statistics across repeated sampling. A sampling distribution is the name used to describe this distribution of summary statistics over repeated sampling.
what is the difference between variance and standard deviation
Variance = the average squared distance of the numbers from the mean (in squared units).
Standard Deviation = same idea, but back to original units (not squared).
so the standard deviation carries the same information as the variance: it is the square root of the variance
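A worked example with made-up numbers, treating them as a whole population so that the population formulas (pvariance, pstdev) from Python's statistics module apply:

```python
import math
import statistics

# Hypothetical population of eight values (illustrative data only).
values = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(values)  # average squared distance from the mean
std_dev = statistics.pstdev(values)      # square root of the variance

# Here the mean is 5, the variance is 4 (squared units) and the SD is 2.
print(variance, std_dev, math.sqrt(variance))
```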
Standard error of the sampling distribution of sample means equals the population standard deviation divided by the number of observations in a sample. True or false? Correct it if it is false and explain why.
FALSE: when we calculate the SE, it is always σ (the population standard deviation) divided by the SQUARE ROOT of N, where N is the number of observations in a sample
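A small sketch of the standard error calculation. The value σ = 10 and the sample sizes are assumptions for illustration, and the function name is hypothetical.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

# With sigma = 10: N = 100 gives SE = 1.0, and N = 400 gives SE = 0.5.
print(standard_error(10, 100))
print(standard_error(10, 400))
```

Because N sits under a square root, quadrupling the sample size only halves the standard error.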
The sampling distribution of sample means (y-bar) can be approximated by a normal distribution as long as we take a random sample from the population and our sample includes a large number of observations. True or false? Correct it if it is false and explain why.
Answer: True
This is the Central Limit Theorem. The theorem applies regardless of the shape of the population distribution.
In the above figure, whatever the shape of the green distribution at the top, (the population distribution), the shape of the purple distribution at the bottom (the sampling distribution of y-bar) can be approximated by a normal distribution, as long as a random sample is taken and the number of observations in a sample is large.
what are the three characteristics of the sampling distribution of y-bar that still hold true regardless of the shape of the population distribution
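the centre (the mean of y-bar equals the population mean μ), the variability (the standard error is σ divided by the square root of N), and the shape (approximately normal, as long as N is large)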
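A simulation sketch of the centre, variability, and shape of the sampling distribution of y-bar. It assumes a strongly skewed population (an exponential distribution with μ = 1 and σ = 1); the sample size of 50 and the 5,000 repeated samples are arbitrary choices for illustration.

```python
import math
import random
import statistics

def sample_means(n_obs, n_samples, seed=0):
    """Draw n_samples random samples of size n_obs and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n_obs)])
        for _ in range(n_samples)
    ]

n_obs = 50
means = sample_means(n_obs, 5000)

centre = statistics.fmean(means)   # should be close to mu = 1
spread = statistics.stdev(means)   # should be close to sigma / sqrt(N)
se = 1.0 / math.sqrt(n_obs)        # theoretical standard error

# Shape check: a normal curve puts roughly 68% of values within one SE of mu.
share_within_one_se = sum(abs(m - 1.0) <= se for m in means) / len(means)

print(centre, spread, se, share_within_one_se)
```

Even though the population is far from bell shaped, the sample means cluster around μ with a spread near σ divided by the square root of N.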