Week 5,6,7,8 Flashcards
Properties of probability distributions
Properties
Describe the probability for the entire sample space
Area under the probability distribution always sums to one
Can be used for continuous and discrete random variables
Probability
The proportion of times an event would occur if a random trial were repeated many times
Components of randomness
Random trial, sample space, event
The types of probability distributions
Discrete
-For discrete random variables
-Typically shown as a graph of vertical bars with no space between the events
-Y axis is probability mass
Continuous
-Continuous random variables
-Typically shown as a single curve as a function of the continuous event
-Y axis is probability density
-If the range has zero width (a single exact value), then the probability is also 0
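A minimal sketch of the two distribution types, using only the Python standard library. The fair die and the standard Normal distribution are illustrative assumptions, not examples from the course material. It shows that probability mass sums to one over the sample space, and that a zero-width range under a density curve has probability 0.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete: probability mass for each face of a fair six-sided die (assumed example).
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
total_mass = sum(die_pmf.values())  # mass summed over the entire sample space

# Continuous: a standard Normal distribution (assumed example).
z = NormalDist(mu=0.0, sigma=1.0)

def prob_between(dist, low, high):
    """Probability as the area under the density curve between low and high."""
    return dist.cdf(high) - dist.cdf(low)

point_prob = prob_between(z, 1.0, 1.0)      # zero-width range
central_prob = prob_between(z, -1.96, 1.96)  # a range with non-zero width

print("total mass:", total_mass)
print("P(X = 1.0 exactly):", point_prob)
print("P(-1.96 <= X <= 1.96):", round(central_prob, 3))
```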
What do population parameters do?
They describe attributes of the statistical population - considered fixed
Estimation
Descriptive statistics provide an estimate of the population parameter
Sampling Distributions
Probability distribution of a descriptive statistic from repeatedly sampling a statistical population many times
-probability distribution of the means of repeatedly sampling a population
Aspects of the Central Limit Theorem
Standard Error: the standard deviation of a sampling distribution
-Shape independence
-Sampling distribution becomes a Normal distribution
-Mean of the sampling distribution is the same as the mean of the statistical population
-Variance depends on sample size
-Standard deviation of the sampling distribution is the standard error
-As sample size increases, standard error decreases
-Standard error (SE) can be calculated from the standard deviation (σ) of the statistical population and the sample size (n) as:
SE=σ/√n
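The points above can be checked with a small simulation. This is a sketch under stated assumptions: the population is exponential with rate 1 (so its mean and σ both equal 1), and the sample size and number of repeated samples are arbitrary illustrative choices. It repeatedly samples a skewed population and compares the standard deviation of the sample means with σ/√n.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

n = 30            # sample size (assumed for illustration)
n_samples = 5000  # number of repeated samples (assumed for illustration)

# Skewed population: exponential with rate 1, so mean = 1 and sigma = 1.
population_mean = 1.0
sigma = 1.0

# Sampling distribution of the mean: one mean per repeated sample.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(n_samples)
]

observed_center = statistics.fmean(sample_means)  # should sit near the population mean
observed_se = statistics.stdev(sample_means)      # SD of the sampling distribution
predicted_se = sigma / math.sqrt(n)               # SE = sigma / sqrt(n)

print(f"centre of sampling distribution: {observed_center:.3f}")
print(f"observed SE: {observed_se:.3f}, predicted SE: {predicted_se:.3f}")
```

Raising n in this sketch shrinks both the observed and the predicted standard error.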
Chain of Inference
-a single sample from a statistical population is enough for us to estimate the sampling distribution.
Key characteristics of sampling distributions
Key characteristics
Shape of sampling distribution is independent of the statistical population so long as the sample size is sufficiently large
The variance of a sampling distribution increases as the number of sampling units decreases
Issue and resolution of the central limit theorem
Central limit theorem assumes we know the statistical population perfectly but we must estimate σ
Solution is to use Student's t-distribution, with the sample standard deviation (s) in place of σ
SE=s/√n
Confidence intervals
Confidence intervals are the range over a sampling distribution that brackets the centre-most probability of interest.
Describe the uncertainty in the descriptive statistics of a sample
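A sketch of a 95% confidence interval for a mean, built from SE=s/√n and the t-distribution. The sample values are hypothetical, and the critical value 2.262 (two-sided 95%, df=9) is taken from a standard t-table rather than computed.

```python
import math
import statistics

# Hypothetical sample of n = 10 measurements.
sample = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 4.7, 5.2, 5.0]

n = len(sample)
mean = statistics.fmean(sample)
s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
se = s / math.sqrt(n)         # SE = s / sqrt(n)

# Two-sided 95% critical value for df = n - 1 = 9, from a standard t-table.
T_CRIT_95_DF9 = 2.262

# The interval brackets the centre-most 95% of the sampling distribution.
ci_low = mean - T_CRIT_95_DF9 * se
ci_high = mean + T_CRIT_95_DF9 * se

print(f"mean = {mean:.2f}, SE = {se:.3f}")
print(f"95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

A different sample size needs a different critical value, because the shape of the t-distribution depends on the degrees of freedom.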
Steps in hypothesis testing
- Define the null and alternative hypotheses
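- Mutually exclusive: the null and alternative hypotheses cannot both be true
- Exhaustive: together, the two hypotheses cover all possible outcomes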
- Equality
○ The null hypothesis always includes the equality statement.
- Establish the null distribution
-the sampling distribution from a statistical population where the null hypothesis is true
- Conduct the statistical test
-Need two probabilities from the null distribution
○ Type 1 error rate (α): probability of rejecting the null hypothesis when it is in fact true
○ P-value (p): probability of seeing your data, or something more extreme, under the null hypothesis
- Draw scientific conclusions
-Strength of inference
Acknowledge that your inference is only as good as the data
Avoid absolute statements
-Effect size
Only a consideration when the statistical conclusion is to reject the null hypothesis
Refers to whether the observed difference is meaningful for the research question
Rules for Making the Statistical Decision
If the p-value is less than the Type 1 Error Rate, then we reject the null hypothesis
If the p-value is greater than or equal to the Type 1 Error Rate then we fail to reject the null hypothesis
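The two rules can be written as a single comparison. A minimal sketch: the function name and the default α of 0.05 are illustrative assumptions.

```python
def statistical_decision(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject only when p is strictly less than alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    # p greater than or equal to alpha
    return "fail to reject the null hypothesis"

print(statistical_decision(0.03))  # p below alpha
print(statistical_decision(0.05))  # p equal to alpha
```

Note that a p-value exactly equal to α falls on the "fail to reject" side.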
Type I vs Type II Error rates
Type I error: probability of rejecting the null hypothesis when it is true
probability under the null distribution
In hypothesis testing it is under the control of the researcher and is known as α.
Type II error: probability of failing to reject the null hypothesis when it is false
Probability under the alternative distribution
In hypothesis testing the alternative distribution is typically unknown, meaning the Type II error rate is also typically unknown
However, it trades off with the Type I error rate: lowering one raises the other
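A sketch of the trade-off between the two error rates. It assumes, purely for illustration, a one-sided test whose null distribution is Normal(0, 1) and an alternative distribution of Normal(2, 1). In practice the alternative distribution is not known, so the Type II error rate usually cannot be computed like this.

```python
from statistics import NormalDist

# Assumed for illustration: both distributions are known.
null_dist = NormalDist(mu=0.0, sigma=1.0)
alt_dist = NormalDist(mu=2.0, sigma=1.0)

def type_ii_error(alpha):
    """Type II error rate (beta) for a one-sided test at the given alpha."""
    # Reject the null when the test statistic exceeds this cut-off.
    critical_value = null_dist.inv_cdf(1.0 - alpha)
    # Beta: probability of falling below the cut-off when the alternative is true.
    return alt_dist.cdf(critical_value)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f} -> beta = {type_ii_error(alpha):.3f}")
```

Making α stricter (smaller) moves the cut-off further into the alternative distribution, so the Type II error rate goes up.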
Single Sample T-Test
Evaluate whether your sample is different from a reference value (Does the sample mean differ from my reference value?)
The null distribution for a single-sample t-test is a t-distribution.
Shape depends on degrees of freedom (df=n-1)
The reporting of a single-sample t-test should include
-The sample mean and standard deviation
-The observed t-score (two decimal places)
-Degrees of freedom
-P-value (three decimal places)
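A sketch of a single-sample t-test and its report, using only the Python standard library. The sample, the reference value, and the function names are hypothetical. The p-value is found by numerically integrating the t-distribution density, which is a stand-in for a statistics package and not a prescribed method.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_obs, df, steps=20000):
    """Two-sided p-value: 1 minus the area between -|t| and +|t| (midpoint rule)."""
    upper = abs(t_obs)
    width = upper / steps
    half_area = sum(t_pdf((i + 0.5) * width, df) for i in range(steps)) * width
    return max(0.0, 1.0 - 2.0 * half_area)

# Hypothetical sample and reference value.
sample = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 4.7, 5.2, 5.0]
reference = 4.8

n = len(sample)
mean = statistics.fmean(sample)
s = statistics.stdev(sample)       # sample standard deviation
se = s / math.sqrt(n)              # SE = s / sqrt(n)
t_obs = (mean - reference) / se    # observed t-score
df = n - 1                         # degrees of freedom
p = two_sided_p(t_obs, df)

# Report: mean, SD, t-score (two decimals), df, p-value (three decimals).
print(f"M = {mean:.2f}, SD = {s:.2f}, t({df}) = {t_obs:.2f}, p = {p:.3f}")
```

The gamma-function form of the density is fine for small samples like this one, but `math.gamma` overflows for very large degrees of freedom.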