Stats Midterm (L2 - L12) Flashcards
What is the difference between sample and population?
The population refers to the entire group of individuals, items, or data points of interest in a specific study.
A sample is a subset of the population, representing a smaller group of individuals or data points selected from the population.
Describe the principles of statistics.
Sample -> Empirical Distribution -> Theoretical Distribution -> Population
Lecture 2, Slide 9
What are the four key components of descriptive statistics?
- Measures of central tendency
- Measures of dispersion (variability)
- Measures of distribution shape
- Frequency distribution (added via ChatGPT)
Lecture 2: Slide 13
Define hypothesis testing.
Hypothesis testing is a method used to evaluate two competing claims about a population based on sample data, with the goal of determining whether there is enough evidence to reject the null hypothesis in favor of the alternative hypothesis.
With regard to hypothesis testing, what is the significance level (α)?
The significance level (α) is the probability of rejecting the null hypothesis when it is actually true (a false positive).
Extra Notes: A common value is 0.05, meaning there is a 5% risk of rejecting the null hypothesis when it is true.
With regard to hypothesis testing, what is the p-value?
The p-value in hypothesis testing is the probability of observing the data (or something more extreme) assuming that the null hypothesis H0 is true. It quantifies the strength of the evidence against the null hypothesis: the smaller the p-value, the stronger the evidence that the null hypothesis may be incorrect.
What is the relationship between p and α when rejecting or failing to reject the null hypothesis?
p ≤ α -> reject the null hypothesis
p > α -> fail to reject the null hypothesis
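Worked example: a minimal Python sketch of the p-value and the decision rule, using a one-sided exact binomial test. The coin-toss data, the function names `upper_tail_p_value` and `decide`, and the choice to treat p = α as a rejection are illustrative assumptions, not taken from the lecture.

```python
from math import comb


def upper_tail_p_value(k: int, n: int, p0: float) -> float:
    """P(X >= k) when X ~ Binomial(n, p0): the chance of a result at least
    as extreme as k successes if the null hypothesis (success rate p0) is true."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule; p == alpha counts as a rejection (assumed convention)."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# Hypothetical data: 9 heads in 10 tosses, H0: the coin is fair (p0 = 0.5).
p_value = upper_tail_p_value(9, 10, 0.5)
print(round(p_value, 4), decide(p_value))  # 0.0107 reject H0
```

Here the p-value is 11/1024, which is below α = 0.05, so the null hypothesis of a fair coin is rejected at the 5% level.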
What is the difference between a histogram and a cumulative histogram?
Histogram: Shows the individual frequency for each bin.
Cumulative Histogram: Shows the running total (cumulative frequency) for each bin and those that came before it.
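Worked example: a short Python sketch that bins a small made-up dataset and then builds the cumulative counts as a running total. The data values and bin edges are assumptions chosen only for illustration.

```python
from itertools import accumulate

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7]  # made-up sample
edges = [0, 2, 4, 6, 8]                 # bins: [0,2), [2,4), [4,6), [6,8)

# Histogram: count how many values fall in each bin.
counts = [0] * (len(edges) - 1)
for x in data:
    for i in range(len(counts)):
        if edges[i] <= x < edges[i + 1]:
            counts[i] += 1
            break

# Cumulative histogram: running total of the bin counts.
cumulative = list(accumulate(counts))

print(counts)      # [1, 5, 3, 1]
print(cumulative)  # [1, 6, 9, 10]
```

The last cumulative value always equals the total number of data points that fell inside the bins.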
True or False:
The mean is sensitive to outliers.
True.
Lecture 2 - Slide 16
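Worked example: a small Python sketch comparing the mean and the median before and after adding a single extreme value. The numbers are made up for illustration.

```python
from statistics import mean, median

values = [10, 12, 11, 13, 12]   # made-up sample
with_outlier = values + [100]   # same sample plus one extreme value

print(mean(values), median(values))              # 11.6 12
print(mean(with_outlier), median(with_outlier))  # mean jumps above 26, median stays at 12
```

One outlier more than doubles the mean, while the median does not move.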
Describe how a histogram could be used to find the median.
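A histogram can be used to estimate the median, the value that splits the dataset so that 50% of the data points lie below it and 50% above. Add up the bin frequencies from the left until the running total reaches half of the total count; the median lies in the bin where that happens. Because the histogram only records counts per bin, the result is an estimate whose precision is limited by the bin width.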
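Worked example: one possible way to estimate the median from bin counts, sketched in Python. The function name `median_from_histogram`, the bin edges, and the counts are hypothetical. Interpolating linearly inside the median bin assumes the data are spread evenly within that bin.

```python
def median_from_histogram(edges, counts):
    """Estimate the median from bin edges and bin counts."""
    half = sum(counts) / 2
    running = 0
    for i, c in enumerate(counts):
        if c > 0 and running + c >= half:
            # The running total crosses 50% in this bin; interpolate within it.
            fraction = (half - running) / c
            return edges[i] + fraction * (edges[i + 1] - edges[i])
        running += c
    raise ValueError("histogram is empty")


edges = [0, 2, 4, 6, 8]   # hypothetical bin edges
counts = [1, 5, 3, 1]     # hypothetical bin counts (10 data points)
estimate = median_from_histogram(edges, counts)
print(estimate)  # about 3.6, inside the [2, 4) bin
```

The estimate only locates the median to within the bin width; narrower bins give a tighter estimate.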
True or False:
The standard deviation is a measure of dispersion.
True.
Lecture 2 - slide 24
Define degrees of freedom (DOF) with respect to statistics.
The DOF is the number of values in the final calculation of a statistic that are free to vary.
Lecture 2 - slide 27
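Worked example: a Python sketch of why the sample variance is usually computed with n - 1 degrees of freedom. The deviations from the sample mean must sum to zero, so once n - 1 of them are known the last one is fixed. The dataset is made up, and linking DOF to the sample variance is a standard illustration rather than something quoted from the slide.

```python
from statistics import variance

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up sample
n = len(data)
sample_mean = sum(data) / n

# Deviations from the sample mean always sum to zero,
# so only n - 1 of them are free to vary.
deviations = [x - sample_mean for x in data]
print(sum(deviations))  # 0.0

# Sample variance divides the sum of squares by the degrees of freedom, n - 1.
sum_of_squares = sum(d * d for d in deviations)
sample_variance = sum_of_squares / (n - 1)
print(sample_variance, variance(data))  # both about 4.571
```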
What are the two basic measures of general shape?
- Skewness
- Kurtosis
Lecture 2 - slides 30 to 33
A normal distribution has a kurtosis of ________.
3 (equivalently, an excess kurtosis of 0)
Lecture 2 - Slide 33
Define the probability density function (PDF).
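The PDF f(x) describes how probability is distributed over the values x of a variate X. For a discrete variate, f(x) is the probability that X takes the value x. For a continuous variate, f(x) is a probability density, and the probability that X falls in an interval is the integral of f(x) over that interval.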
Lecture 2 - slide 37
Define the cumulative distribution function (CDF).
The CDF F(x) is the running sum of a discrete PDF or the integral of a continuous one, taken over all values up to x.
It tells you the probability that the variate X takes on a value less than or equal to x.
Lecture 2 - slide 37
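Worked example: a Python sketch of a discrete PDF and its CDF for a fair six-sided die. The die is an assumed example, not from the lecture. Each CDF value is the running sum of the probabilities up to and including that outcome.

```python
from fractions import Fraction
from itertools import accumulate

outcomes = [1, 2, 3, 4, 5, 6]

# Discrete PDF of a fair die: every face has probability 1/6.
pdf = {x: Fraction(1, 6) for x in outcomes}

# CDF: running sum of the PDF, F(x) = P(X <= x).
cdf = dict(zip(outcomes, accumulate(pdf[x] for x in outcomes)))

print(cdf[3])  # 1/2
print(cdf[6])  # 1
```

The CDF never decreases and reaches 1 at the largest possible outcome.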
Define the interquartile range (IQR).
The interquartile range (IQR) is a measure of statistical dispersion that represents the range within which the central 50% of the data points lie. It is calculated as the difference between the third quartile (Q3) and the first quartile (Q1) of a dataset.
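Worked example: a Python sketch computing the IQR with the standard library. The data are made up. Quartile conventions differ between textbooks and software; this uses the default ("exclusive") method of `statistics.quantiles`, so other tools may give slightly different quartiles for the same data.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7]   # made-up sample

# n=4 splits the data into quartiles and returns the cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

print(q1, q3, iqr)  # 2.0 6.0 4.0
```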
Define a uniform distribution.
A uniform distribution is a type of probability distribution where all outcomes are equally likely within a specified range. In a uniform distribution, each value within the range has the same probability of occurring. The distribution is “flat” because no single outcome is favored over others.
Define the binomial distribution.
The binomial distribution is a discrete probability distribution that describes the probability of obtaining a given number of successes in a fixed number of independent trials of a binary experiment, where each trial has only two possible outcomes (success or failure) and the same probability of success. It is widely used in situations where the outcome of interest is a count of successes over multiple attempts.
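Worked formula: the standard binomial probability mass function, written with n for the number of trials, p for the success probability of each trial, and k for the number of successes. These symbols and the numbers in the example are assumptions chosen for illustration.

```latex
P(X = k) = \binom{n}{k} \, p^{k} (1 - p)^{n - k}, \qquad k = 0, 1, \dots, n

% Worked example (hypothetical numbers): n = 10, p = 0.5, k = 3
P(X = 3) = \binom{10}{3} (0.5)^{3} (0.5)^{7} = \frac{120}{1024} \approx 0.117
```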
Describe a Poisson distribution.
The Poisson distribution expresses the probability of a given number of events occurring in a fixed interval (of space or time), if these events occur with a known constant mean rate and independently of the time since the last event.
Lecture 3 - slide 24
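Worked example: a Python sketch of the standard Poisson probability mass function, P(X = k) = λ^k e^(-λ) / k!, where λ is the mean number of events per interval. The function name `poisson_pmf` and the rate λ = 3 are assumptions for illustration.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the mean count per interval is lam."""
    return lam**k * exp(-lam) / factorial(k)


lam = 3.0  # hypothetical mean rate: 3 events per interval
for k in range(6):
    print(k, round(poisson_pmf(k, lam), 4))

# The probabilities over all k sum to 1 (checked here up to k = 59).
total = sum(poisson_pmf(k, lam) for k in range(60))
print(round(total, 6))  # 1.0
```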
True or False:
The rate must be constant for a Poisson distribution to hold.
True.
Lecture 3 - slide 24