Lesson 1 Flashcards
What does the term ‘random variable’ refer to in statistics?
A variable whose value is subject to variations due to chance.
True or False: The mean and median are always equal in a symmetric distribution.
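True whenever the mean exists: both equal the center of symmetry. The statement fails only for symmetric distributions with no defined mean, such as the Cauchy distribution.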
What is the formula for calculating the variance of a data set?
Population variance: the sum of (each data point - mean)^2 divided by the number of data points N. Sample variance: the same sum divided by n - 1.
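Worked example (a minimal Python sketch; the data values are illustrative and not part of the lesson). It computes the variance both ways, dividing by N for a full population and by n - 1 for a sample:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only

mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]

# Population variance: divide by the number of data points N
population_variance = sum(squared_deviations) / len(data)

# Sample variance: divide by n - 1
sample_variance = sum(squared_deviations) / (len(data) - 1)

print(population_variance)        # 4.0
print(round(sample_variance, 3))  # 4.571
```

The standard library functions `statistics.pvariance` and `statistics.variance` should give the same two results.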
In a normal distribution, what percentage of data falls within one standard deviation of the mean?
Approximately 68%
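Quick check (a minimal Python sketch using the standard library's `statistics.NormalDist`; the variable names are chosen for illustration). It measures the area under the standard normal curve between -1 and +1 standard deviations:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability mass between -1 and +1 standard deviations of the mean
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

print(f"{within_one_sd:.4f}")  # 0.6827
```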
What is the purpose of conducting a hypothesis test in statistics?
To determine if there is enough evidence to reject a null hypothesis.
What is the formula for calculating the z-score of a data point in a normal distribution?
(Data point - mean) divided by standard deviation.
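Worked example (a minimal Python sketch; the helper name `z_score` and the numbers are illustrative assumptions, not from the lesson):

```python
def z_score(value: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev


# A score of 85 when the mean is 70 and the standard deviation is 10
z = z_score(85, mean=70, std_dev=10)

print(z)  # 1.5
```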
What is the definition of the p-value in statistical hypothesis testing?
The probability of obtaining results at least as extreme as the observed results, assuming the null hypothesis is true.
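Worked example (a minimal Python sketch, assuming a two-sided z-test whose test statistic is standard normal under the null hypothesis; the helper name `two_sided_p_value` is hypothetical):

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """Probability of a z statistic at least as extreme as the one observed,
    in either direction, assuming the null hypothesis is true."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * upper_tail


p = two_sided_p_value(1.96)

print(round(p, 3))  # 0.05
```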
What does the term ‘confidence interval’ represent in statistics?
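A range of values computed from sample data by a procedure that, over repeated sampling, captures the true population parameter a stated percentage of the time (for example, 95%).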
What is the difference between correlation and causation in statistics?
Correlation indicates a relationship between two variables, while causation implies that one variable directly affects the other.
What does the ‘central limit theorem’ state in statistics?
For independent observations from a population with finite variance, the sampling distribution of the sample mean is approximately normal when the sample size is large, regardless of the shape of the population distribution.
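Simulation (a minimal Python sketch; the exponential population, sample size, and repetition count are illustrative choices, not from the lesson). It draws many samples from a strongly skewed population and summarizes their means, which should cluster around the population mean with a spread close to the population standard deviation divided by the square root of the sample size:

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 50
REPETITIONS = 2000

# Exponential(rate=1) is right-skewed, with mean 1 and standard deviation 1.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(REPETITIONS)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means: {mean_of_means:.3f} (population mean is 1)")
print(f"sd of sample means:   {sd_of_means:.3f} "
      f"(theory suggests {1 / math.sqrt(SAMPLE_SIZE):.3f})")
```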
What is the formula for calculating the standard error of the mean?
Standard deviation divided by the square root of the sample size.
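Worked example (a minimal Python sketch; the sample values are illustrative, and the sample standard deviation is used as the estimate of the population standard deviation):

```python
import math
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only

sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1)
standard_error = sample_sd / math.sqrt(len(sample))

print(round(standard_error, 4))  # 0.7559
```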
In a regression analysis, what does the coefficient of determination (R^2) measure?
The proportion of the variance in the dependent variable that is predictable from the independent variable(s).
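Worked example (a minimal Python sketch for simple linear regression with one predictor, fitted by ordinary least squares; the data are illustrative). R^2 is computed as 1 minus the ratio of the residual sum of squares to the total sum of squares:

```python
x = [1, 2, 3, 4, 5]  # illustrative predictor values
y = [2, 4, 5, 4, 5]  # illustrative response values

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Ordinary least squares fit of y = intercept + slope * x
slope = (
    sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    / sum((xi - mean_x) ** 2 for xi in x)
)
intercept = mean_y - slope * mean_x
predictions = [intercept + slope * xi for xi in x]

ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
ss_total = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_residual / ss_total

print(round(r_squared, 3))  # 0.6
```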
What is the purpose of conducting a chi-squared test in statistics?
To determine if there is a significant association between two categorical variables.
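Worked example (a minimal Python sketch of the test statistic for a 2x2 table, without a continuity correction; the counts are illustrative). Each expected count is the row total times the column total divided by the grand total:

```python
observed = [
    [20, 30],
    [30, 20],
]  # illustrative counts only

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_squared = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Count expected in this cell if the two variables were independent
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_squared += (count - expected) ** 2 / expected

print(chi_squared)  # 4.0
```

With 1 degree of freedom, the critical value at the 0.05 level is about 3.841, so a statistic of 4.0 would be judged significant at that level.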
What is the formula for calculating the margin of error in a confidence interval?
Critical value multiplied by standard error.
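Worked example (a minimal Python sketch, assuming a normal critical value is appropriate, as for a large sample; the standard deviation and sample size are illustrative):

```python
import math
from statistics import NormalDist

confidence_level = 0.95
sample_sd = 15.0  # illustrative value
n = 100           # illustrative sample size

# Two-sided critical value: about 1.96 for 95% confidence
critical_value = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
standard_error = sample_sd / math.sqrt(n)
margin_of_error = critical_value * standard_error

print(round(margin_of_error, 2))  # 2.94
```

For small samples, a critical value from the t distribution is the usual substitute for the normal one.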
What does the term ‘outlier’ refer to in statistics?
An observation that lies an abnormal distance from other values in a dataset.
What is the difference between a Type I error and a Type II error in hypothesis testing?
Type I error occurs when the null hypothesis is true but is rejected, while Type II error occurs when the null hypothesis is false but is not rejected.
In probability theory, what is the complement rule?
The probability of an event not occurring is equal to 1 minus the probability of the event occurring.
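Worked example (a minimal Python sketch; the dice scenario is illustrative). The chance of at least one six in four rolls of a fair die is easiest to find as 1 minus the chance of no sixes at all:

```python
from fractions import Fraction

p_no_six_single_roll = Fraction(5, 6)

# Independent rolls: multiply the single-roll probabilities
p_no_six_in_four = p_no_six_single_roll ** 4

# Complement rule: P(at least one six) = 1 - P(no six)
p_at_least_one_six = 1 - p_no_six_in_four

print(p_at_least_one_six)                   # 671/1296
print(round(float(p_at_least_one_six), 4))  # 0.5177
```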
What is the formula for calculating the coefficient of variation?
Standard deviation divided by the mean, multiplied by 100.
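Worked example (a minimal Python sketch; the helper name `coefficient_of_variation` and the data are illustrative, and the sample standard deviation is assumed):

```python
import statistics


def coefficient_of_variation(values) -> float:
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.fmean(values) * 100


heights_cm = [160, 165, 170, 175, 180]  # illustrative values only
cv = coefficient_of_variation(heights_cm)

print(f"{cv:.2f}%")  # 4.65%
```

Because the result is unitless, it can be compared across datasets measured in different units, provided the mean is not close to zero.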
What is the purpose of a box plot in statistics?
To visually represent the five-number summary of a dataset and identify outliers.
What is the difference between a population and a sample in statistics?
A population includes all members of a specified group, while a sample is a subset of the population used to make inferences about the population.
What is the formula for calculating the odds ratio in a 2x2 contingency table?
(ad)/(bc), where a and b are the cell counts in the first row of the table and c and d are the cell counts in the second row.
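Worked example (a minimal Python sketch, assuming a and b are the first row of the table and c and d are the second row; the exposure labels and counts are illustrative):

```python
# 2x2 table layout assumed for this sketch:
#                 outcome   no outcome
# exposed            a          b
# not exposed        c          d
a, b = 20, 80
c, d = 10, 90

odds_exposed = a / b        # odds of the outcome among the exposed
odds_not_exposed = c / d    # odds of the outcome among the unexposed
odds_ratio = (a * d) / (b * c)

print(odds_ratio)  # 2.25
```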
What is the definition of a statistical parameter?
A numerical characteristic of a population, such as the mean or standard deviation. It is usually unknown and is estimated from a sample statistic.
What does a p-value of 0.05 indicate in hypothesis testing?
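A 5% probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. The value 0.05 is also a commonly used threshold (significance level) for declaring a result statistically significant.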
What is the formula for calculating the interquartile range of a dataset?
The difference between the third quartile (Q3) and the first quartile (Q1).
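Worked example (a minimal Python sketch using `statistics.quantiles` with its default "exclusive" method; the data are illustrative). Quartile conventions differ between textbooks and software, so other methods can give slightly different values of Q1 and Q3:

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13, 15]  # illustrative values only

# n=4 splits the data into quarters and returns the three cut points
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
```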
In a two-sample t-test, what does the t-statistic measure?
The difference between the two sample means relative to the standard error of that difference, which is estimated from the variability within each sample and the sample sizes.
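Worked example (a minimal Python sketch, assuming the unpooled Welch form of the statistic, which does not require equal variances; the flashcard does not say which form the lesson uses, and the data are illustrative):

```python
import math
import statistics

group_a = [5, 6, 7, 8, 9]  # illustrative values only
group_b = [1, 2, 3, 4, 5]  # illustrative values only

mean_difference = statistics.fmean(group_a) - statistics.fmean(group_b)

# Standard error of the difference between the two sample means
standard_error = math.sqrt(
    statistics.variance(group_a) / len(group_a)
    + statistics.variance(group_b) / len(group_b)
)

t_statistic = mean_difference / standard_error

print(t_statistic)  # 4.0
```

A large absolute value means the gap between the means is large compared with what sampling variability alone would be expected to produce.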