Data Distributions and Introduction to Inferential Statistics Flashcards
What is a frequency distribution?
A theoretical distribution (drawn as a smooth curve for continuous data) that best fits a histogram of the data
- numerical discrete variables are shown with frequency histograms, while numerical continuous variables are shown with density curves
Why are frequency distributions important?
- help us model our data and decide which descriptive statistics would be most useful
- tell us whether parametric tests are appropriate
What is a parametric test?
A statistical method that assumes that the data come from a specific theoretical distribution (e.g. a normal distribution) and makes inferences based on that assumption.
- examples of parametric tests include t-tests and ANOVA
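As a rough illustration of how a parametric test leans on its distributional assumption, here is a minimal sketch of the one-sample t statistic, t = (sample mean - μ0) / (s / √n). The function name `one_sample_t` and the reaction-time data are hypothetical, and the sketch only computes the statistic; turning it into a p-value requires the t distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def one_sample_t(sample: Sequence[float], mu0: float) -> float:
    """t statistic for H0: the population mean equals mu0.

    The t-test is parametric: it assumes the sample comes from a
    (roughly) normally distributed population.
    """
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # sample SD scaled by sqrt(n)
    return (mean(sample) - mu0) / standard_error


# Hypothetical reaction times in ms; H0: the true mean is 250 ms
times = [262, 248, 271, 255, 266, 259, 249, 270]
print(round(one_sample_t(times, 250.0), 3))
```

In practice a library routine such as SciPy's `scipy.stats.ttest_1samp` computes both the statistic and its p-value.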
When should parametric tests be used?
When the dependent variable (approximately) follows a normal frequency distribution, or more generally the distribution that the test assumes
What is the difference between a frequency histogram and a density curve?
Frequency histogram:
- used for numerical discrete variables
- displays frequency or count of observations for each discrete value or range of values
Density curve:
- used for numerical continuous variables
- displays the probability density function; the area under the curve between two values gives the probability of observing a value within that range
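A small sketch of the two ideas above, using only the Python standard library. The sibling counts are hypothetical data, and the heights are assumed (for illustration only) to follow a normal distribution with mean 170 cm and standard deviation 10 cm. The discrete variable is summarised by counting each value; for the continuous variable, the density at a single point is not itself a probability, but the area under the curve between two values is.

```python
from collections import Counter
from statistics import NormalDist

# Discrete variable: a frequency histogram is a count per value.
# Hypothetical data: number of siblings reported by 10 people.
siblings = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]
freq = Counter(siblings)
for value in sorted(freq):
    print(value, freq[value])  # each value and how often it occurs

# Continuous variable: a density curve.
# Assumption: heights follow Normal(mean=170 cm, SD=10 cm).
heights = NormalDist(mu=170, sigma=10)

# The density at one point is not a probability...
print(heights.pdf(170))
# ...but the area under the curve between two values is.
p_160_180 = heights.cdf(180) - heights.cdf(160)
print(round(p_160_180, 4))  # 0.6827
```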
What are some common frequency distributions?
- Binomial distribution
- Poisson distribution
- Normal distribution
What is a binomial distribution?
Describes the number of successes in a fixed number of independent trials, where each trial has only two possible outcomes (e.g., success or failure, heads or tails, yes or no)
What are the characteristics of a binomial distribution?
- a fixed number of trials
- only two possible outcomes per trial
- independence of the trials
- a constant probability of success on each trial
- a discrete number of successes
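The characteristics above lead to the standard binomial formula, P(X = k) = C(n, k) · p^k · (1 - p)^(n - k), where n is the number of trials and p the constant probability of success. A minimal sketch in Python follows; the function name `binomial_pmf` is our own illustrative choice.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each with the same probability of success p."""
    if not 0 <= k <= n:
        return 0.0  # impossible number of successes
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Example: exactly 3 heads in 5 flips of a fair coin.
print(binomial_pmf(3, 5, 0.5))  # 10 / 32 = 0.3125
```

Summing `binomial_pmf(k, n, p)` over k = 0 … n gives 1, because the possible numbers of successes are exhaustive and mutually exclusive.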
What is a Poisson distribution?
It gives the probability of an event happening a certain number of times (k) within a given interval of time or space.
What are the characteristics of a Poisson distribution?
- a fixed interval of time or space
- rare events occurring with a constant average rate
- independence of the events
- a discrete number of occurrences
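If events occur independently at a constant average rate λ per interval, the Poisson probability of exactly k events is P(X = k) = λ^k · e^(-λ) / k!. The sketch below assumes a hypothetical help desk that receives on average 2 calls per hour; `poisson_pmf` is an illustrative name.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events in an interval where the
    average number of events per interval is lam."""
    if k < 0:
        return 0.0  # a count of events cannot be negative
    return lam**k * exp(-lam) / factorial(k)


# Hypothetical example: on average 2 calls per hour.
for k in range(5):
    print(k, round(poisson_pmf(k, 2.0), 4))
```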
What is a normal distribution?
A symmetrical probability distribution that is characterised by its mean and standard deviation
- often referred to as a bell curve because of its shape
What are the properties of a normal distribution?
- symmetrical
- mean, median and mode are equal
- it is described by its mean & standard deviation
- about 68% of the data falls within 1 standard deviation of the mean
- almost all the data (about 99.7%) falls within 3 standard deviations of the mean
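These properties can be checked numerically with the standard library's `statistics.NormalDist`. The parameters (mean 100, standard deviation 15) are an arbitrary assumption for illustration; the proportions within 1, 2 and 3 standard deviations are the same for any normal distribution.

```python
from statistics import NormalDist

# Assumption for illustration: scores follow Normal(mean=100, SD=15)
scores = NormalDist(mu=100, sigma=15)

# For a normal distribution the mean, median and mode coincide
print(scores.mean, scores.median, scores.mode)

# Proportion of the distribution within k standard deviations of the mean
within = {}
for k in (1, 2, 3):
    low = scores.mean - k * scores.stdev
    high = scores.mean + k * scores.stdev
    within[k] = scores.cdf(high) - scores.cdf(low)
    print(f"within {k} SD: {within[k]:.4f}")
# within 1 SD: 0.6827, within 2 SD: 0.9545, within 3 SD: 0.9973
```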
What is the standard deviation?
A measure of how dispersed the data is in relation to the mean
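The standard deviation is the square root of the average squared deviation from the mean. The sketch below computes it by hand for a hypothetical sample and checks it against the standard library. It also shows the sample standard deviation, which divides by n - 1 instead of n and is the version usually reported when the data are a sample from a larger population.

```python
from math import sqrt
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]

# Population SD: divide the summed squared deviations by n
pop_sd = sqrt(sum(squared_deviations) / len(data))
# Sample SD: divide by n - 1 instead
sample_sd = sqrt(sum(squared_deviations) / (len(data) - 1))

print(pop_sd, pstdev(data))     # both 2.0
print(sample_sd, stdev(data))   # both about 2.138
```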
What is the null hypothesis (H0) in statistical hypothesis testing?
The null hypothesis (H0) is the hypothesis that there is no difference or no association between our variables.
What is the alternative hypothesis (H1) in statistical hypothesis testing?
The alternative hypothesis (H1) is the hypothesis that there is a real difference or association between our variables.
What is the goal of statistical hypothesis testing?
To determine whether we can reject the null hypothesis in favour of the alternative hypothesis
What is the significance level in statistical hypothesis testing?
The threshold for rejecting the null hypothesis.
- represents the probability of making a type I error (rejecting the null hypothesis when it is actually true)
- a commonly used significance level is 0.05
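To show the significance level acting as a threshold, here is a minimal sketch of a two-sided one-sample z-test. It assumes the population standard deviation σ is known and the sample mean is approximately normally distributed; the function name `z_test` and the data are hypothetical. The null hypothesis is rejected only when the p-value falls below α.

```python
from math import sqrt
from statistics import NormalDist, mean

ALPHA = 0.05  # significance level: the Type I error rate we accept


def z_test(sample, mu0, sigma, alpha=ALPHA):
    """Two-sided one-sample z-test (assumes known population SD sigma)."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    # Probability of a statistic at least this extreme if H0 is true
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# Equivalent critical value: reject when |z| exceeds about 1.96
critical_z = NormalDist().inv_cdf(1 - ALPHA / 2)

# Hypothetical scores; H0: population mean is 100, assumed sigma is 15
scores = [112, 95, 120, 108, 101, 115, 99, 114]
z, p, reject = z_test(scores, mu0=100, sigma=15)
print(round(z, 2), round(p, 3), reject)  # not significant at alpha = 0.05
```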
What level of confidence do we like to have before rejecting the null hypothesis?
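By convention, we use a 95% confidence level (a significance level of 0.05): we reject the null hypothesis only if results at least as extreme as ours would occur less than 5% of the time if the null hypothesis were true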
What kind of data do most of the hypotheses we test rely on?
Data that are characterised by variation and uncertainty
What are the two types of errors we risk when evaluating whether we can reject the null hypothesis or not?
Type I and Type II errors
What is a Type I Error?
A Type I Error is when we reject a null hypothesis even though it is actually true.
What is a Type II Error?
A Type II Error is when we fail to reject (i.e. retain) a null hypothesis even though it is actually false.
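Both error types can be estimated by simulation. The sketch below repeatedly draws samples, applies a two-sided z-test at α = 0.05, and counts wrong decisions. All settings are assumptions chosen for illustration: known σ = 15, H0 mean 100, sample size 20, a true mean of 105 when H0 is false, and a fixed random seed. When H0 is true, every rejection is a Type I error, so that rate should land near α. When H0 is false, every failure to reject is a Type II error.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def rejects_h0(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test decision (assumes known population SD sigma)."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha


def rejection_rate(true_mu, mu0=100.0, sigma=15.0, n=20, trials=2000, seed=1):
    """Fraction of simulated samples for which H0 (mean == mu0) is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        rejects_h0([rng.gauss(true_mu, sigma) for _ in range(n)], mu0, sigma)
        for _ in range(trials)
    )
    return rejections / trials


# H0 true (true mean is 100): every rejection is a Type I error
type_i_rate = rejection_rate(true_mu=100.0)
# H0 false (true mean is 105): every failure to reject is a Type II error
type_ii_rate = 1 - rejection_rate(true_mu=105.0)
print(type_i_rate, type_ii_rate)
```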
Why do we set a very stringent confidence level before rejecting the null hypothesis?
Because a Type I Error (concluding that a difference or association exists when it does not) is generally considered more serious than a Type II Error
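Stringency has a cost: for a fixed sample size, lowering the significance level α makes Type II errors more likely. The sketch below computes the Type II error rate (β) of a two-sided one-sample z-test analytically, assuming known σ. The effect size, σ and n are hypothetical values chosen for illustration, and `type_ii_rate` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def type_ii_rate(alpha, effect, sigma, n):
    """Beta for a two-sided one-sample z-test when the true mean differs
    from the H0 mean by `effect` (assumes known population SD sigma)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    shift = effect / (sigma / sqrt(n))  # how far the true effect moves the z statistic
    # We fail to reject when the statistic still lands inside (-z_crit, z_crit)
    return Z.cdf(z_crit - shift) - Z.cdf(-z_crit - shift)


# Hypothetical setting: true effect 5, sigma 15, sample size 20
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(type_ii_rate(alpha, effect=5, sigma=15, n=20), 3))
```

The usual way to keep α strict without inflating β is to collect a larger sample, which increases `shift`.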