Business Analytics Module 2 Flashcards
Population
The complete set of individuals or items in which an analyst or researcher is interested. When it is difficult to learn about every member of a population, random samples are often drawn from a population and analyzed in order to draw inferences about the population.
Sample
A group of observations selected from a population. We generally compute statistics based on a random sample to help us estimate the parameters of a population.
If a sample is sufficiently large and representative of the population, the sample statistics, x̄ and s, should be reasonably good estimates of the population parameters, μ and σ, respectively.
Normal Distribution
The normal distribution is a symmetric, bell-shaped continuous distribution with a peak at the mean. A normal distribution is completely determined by two parameters: its mean and its standard deviation. Approximately 68% of a normal distribution's outcomes fall within one standard deviation of the mean, and approximately 95% fall within two standard deviations of the mean. The mean, median, and mode of a normal distribution are equal.
One standard deviation
About 68% of the probability lies within one standard deviation of the mean on either side, that is, P(μ − σ ≤ x ≤ μ + σ) ≈ 68%.
Two standard deviations
About 95% of the probability lies within two standard deviations of the mean on either side, that is, P(μ − 2σ ≤ x ≤ μ + 2σ) ≈ 95%. (Exactly 95% lies within 1.96 standard deviations of the mean; the range of ±2σ actually contains about 95.4%.)
Three standard deviations
About 99.7% of the probability lies within three standard deviations of the mean on either side, that is, P(μ − 3σ ≤ x ≤ μ + 3σ) ≈ 99.7%.
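The three rules above can be checked numerically. This is a minimal sketch in Python, assuming the standard-library `statistics.NormalDist` class (Python 3.8+) as a stand-in for Excel; `prob_within` is a hypothetical helper name. Because the rules are stated in units of σ, using the standard normal distribution covers every normal distribution.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(f"within {k} sd: {prob_within(k):.4f}")
```

This prints 0.6827, 0.9500, 0.9545, and 0.9973, which is why 1.96 rather than 2 is used when exactly 95% is needed.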
Z-value
The z-value of a data point is the signed distance, measured in standard deviations, from the mean to the data point: z = (x − μ)/σ. Negative z-values correspond to data points less than the mean; positive z-values correspond to data points greater than the mean.
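A minimal sketch of the z-value formula in Python. The function name `z_value` and the exam-score numbers are hypothetical; in Excel, the `STANDARDIZE(x, mean, standard_dev)` function performs the same calculation.

```python
def z_value(x, mean, std_dev):
    """Signed number of standard deviations between x and the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev

# Hypothetical example: exam scores with mean 70 and standard deviation 8.
print(z_value(86, 70, 8))  # 2.0  (two standard deviations above the mean)
print(z_value(62, 70, 8))  # -1.0 (one standard deviation below the mean)
```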
Central Limit Theorem
A theorem stating that if we take sufficiently large randomly selected samples from a population, the means of these samples will be approximately normally distributed, regardless of the shape of the underlying population. (Technically, the underlying population must have a finite variance.)
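The theorem can be illustrated with a small simulation. This sketch assumes a hypothetical, strongly skewed population (an exponential distribution with mean 1 and standard deviation 1), a sample size of 40, and 5,000 repeated samples; all of these choices are illustrative. If the sample means are roughly normal, about 68% of them should fall within one standard deviation of their center, and their spread should be close to σ/√n.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

def draw_sample(n):
    """Draw n values from a skewed exponential population with mean 1 and sd 1."""
    return [random.expovariate(1.0) for _ in range(n)]

sample_size = 40
num_samples = 5000
sample_means = [statistics.fmean(draw_sample(sample_size)) for _ in range(num_samples)]

center = statistics.fmean(sample_means)  # should be close to the population mean, 1
spread = statistics.stdev(sample_means)  # should be close to 1 / sqrt(40), about 0.158
print(f"mean of sample means: {center:.3f}")
print(f"sd of sample means:   {spread:.3f}")

# For a normal distribution, about 68% of values lie within one sd of the center.
share_within_1sd = sum(abs(m - center) <= spread for m in sample_means) / num_samples
print(f"share within 1 sd:    {share_within_1sd:.3f}")
```

Even though individual exponential values are far from bell-shaped, the means of the samples behave approximately like a normal distribution.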
Distribution of Sample Means
The probability distribution of the means of all randomly selected samples of the same size that could be taken from a population. The Central Limit Theorem states that for sufficiently large randomly selected samples, the distribution of sample means approximates a normal distribution. The standard deviation of the distribution of sample means (often called the standard error) is equal to the standard deviation of the population divided by the square root of the sample size, σ/√n. If we do not know the standard deviation of the population, we can estimate it using the sample standard deviation, s/√n.
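A minimal Python sketch of the standard error calculation. The helper name `standard_error` and both data sets are hypothetical; the second case shows the sample standard deviation (computed with the n − 1 denominator by `statistics.stdev`) standing in for an unknown σ.

```python
import math
import statistics

def standard_error(std_dev, n):
    """Standard deviation of the distribution of sample means: std_dev / sqrt(n)."""
    return std_dev / math.sqrt(n)

# Case 1: population standard deviation known (hypothetical sigma = 12, n = 36).
print(standard_error(12.0, 36))  # 2.0

# Case 2: sigma unknown, so estimate it with the sample standard deviation s.
sample = [48.0, 52.0, 50.0, 47.0, 53.0, 49.0, 51.0, 50.0]
s = statistics.stdev(sample)  # 2.0 for this sample
print(round(standard_error(s, len(sample)), 4))  # 0.7071
```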
Confidence Interval for a Population Mean
A range constructed around a sample mean that estimates the true population mean. The confidence level of a confidence interval indicates how confident we are that the range contains the true population mean. For example, we are 95% confident that a 95% confidence interval contains the true population mean: if we repeated the sampling many times, about 95% of the intervals constructed this way would contain it. The confidence level is equal to 1 − significance level (α).
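The repeated-sampling meaning of "95% confident" can be checked by simulation. This sketch assumes a hypothetical normal population with a known mean of 100 and standard deviation of 15, draws 2,000 samples of size 40, builds an interval x̄ ± z·s/√n from each, and counts how often the interval contains the true mean. The critical value is obtained from α = 1 − confidence level using Python's `statistics.NormalDist`.

```python
import random
import statistics
from statistics import NormalDist

random.seed(7)  # fixed seed so the sketch is reproducible

TRUE_MEAN, TRUE_SD = 100.0, 15.0         # hypothetical population parameters
confidence = 0.95
alpha = 1 - confidence                   # significance level
z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96

n, trials = 40, 2000
hits = 0
for _ in range(trials):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    x_bar = statistics.fmean(sample)
    margin = z * statistics.stdev(sample) / n ** 0.5
    if x_bar - margin <= TRUE_MEAN <= x_bar + margin:
        hits += 1

coverage = hits / trials
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The share comes out close to, and typically slightly below, 0.95, because s is used in place of the unknown σ.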
n≥30
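For large samples (n≥30), the lower and upper bounds are calculated using the following equation:
x̄ ± z·s/√n
where z is the critical value for the chosen confidence level (z ≈ 1.96 for 95% confidence).
The function CONFIDENCE.NORM(alpha, standard_dev, size) calculates the margin of error, z·s/√n, which we add to and subtract from the sample mean to find the confidence interval.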
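A minimal Python sketch of the large-sample interval. The helper `confidence_norm` is hypothetical and is only intended to mirror what CONFIDENCE.NORM computes (z·s/√n, with z = the standard normal value at 1 − α/2); the sample summary numbers are also hypothetical.

```python
import math
from statistics import NormalDist

def confidence_norm(alpha, std_dev, size):
    """Margin of error z * s / sqrt(n), where z is the standard normal value at 1 - alpha/2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * std_dev / math.sqrt(size)

def z_interval(x_bar, s, n, confidence=0.95):
    """Lower and upper bounds x_bar -/+ margin of error."""
    margin = confidence_norm(1 - confidence, s, n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample summary: n = 100, sample mean 50, s = 10.
low, high = z_interval(50.0, 10.0, 100)
print(f"95% CI: ({low:.2f}, {high:.2f})")  # (48.04, 51.96)
```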
n<30
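For small samples (n<30), the lower and upper bounds are calculated using the following equation:
x̄ ± t·s/√n
where t is the critical value from a t-distribution with n − 1 degrees of freedom.
For small samples, we use a t-distribution, which has a lower peak and heavier tails than a normal distribution. Its critical values are larger than the corresponding z-values, so it provides a wider range, a more conservative estimate of where the true population mean lies. As the sample size grows, the t-distribution approaches the normal distribution.
The function CONFIDENCE.T(alpha, standard_dev, size) calculates the margin of error, which we add to and subtract from the sample mean to find the confidence interval.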
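Python's standard library has no t-distribution, so this sketch computes the t critical value numerically: it integrates the Student's t density with Simpson's rule and inverts the result by bisection. The helpers `t_cdf`, `t_critical`, and `confidence_t` are hypothetical names; `confidence_t` is only intended to mirror what CONFIDENCE.T computes. In practice a statistics library or Excel would supply the critical value directly.

```python
import math

def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t with df degrees of freedom, via Simpson's rule on [0, |t|]."""
    # Normalizing constant of the t density: Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)).
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3  # probability between 0 and |t|
    # The density is symmetric about 0, so half of the probability lies below 0.
    return 0.5 + area if t >= 0 else 0.5 - area

def t_critical(p, df):
    """Value t with P(T <= t) = p, for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def confidence_t(alpha, std_dev, size):
    """Margin of error t * s / sqrt(n) using n - 1 degrees of freedom."""
    t = t_critical(1 - alpha / 2, size - 1)
    return t * std_dev / math.sqrt(size)

# Hypothetical small-sample summary: n = 10, sample mean 50, s = 10.
x_bar, s, n = 50.0, 10.0, 10
margin = confidence_t(0.05, s, n)
print(f"t critical (df = 9): {t_critical(0.975, 9):.3f}")  # 2.262
print(f"95% CI: ({x_bar - margin:.2f}, {x_bar + margin:.2f})")
```

With n = 10 the critical value is about 2.262 instead of 1.96, so the interval is noticeably wider than a z-based interval from the same summary.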
Dummy Variable
A variable that takes on one of two values: 0 or 1. Dummy variables are used to transform categorical variables into quantitative variables. A categorical variable with only two categories (e.g. “heads” or “tails”) can be transformed into a quantitative variable using a single dummy variable that takes on the value 1 when a data point falls into one category (e.g. “heads”) and 0 when a data point falls into the other category (e.g. “tails”). For categorical variables with more than two categories, multiple dummy variables are required. Specifically, the number of dummy variables must be the total number of categories minus one.
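A minimal Python sketch of dummy coding with k − 1 dummy variables. The helper `dummy_encode` and the three-category "region" values are hypothetical. The first listed category acts as the baseline and is coded as all zeros; a k-th dummy would be redundant, since its value is always determined by the other k − 1.

```python
def dummy_encode(values, categories):
    """Encode each value as len(categories) - 1 dummy columns; categories[0] is the baseline."""
    unknown = set(values) - set(categories)
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)}")
    dummy_cats = categories[1:]  # one column per non-baseline category
    return [[1 if v == c else 0 for c in dummy_cats] for v in values]

# Two categories -> one dummy variable ("tails" is the baseline, "heads" = 1).
print(dummy_encode(["heads", "tails", "heads"], ["tails", "heads"]))
# [[1], [0], [1]]

# Hypothetical three-category variable -> two dummy variables (baseline "north").
print(dummy_encode(["north", "south", "west"], ["north", "south", "west"]))
# [[0, 0], [1, 0], [0, 1]]
```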
Find Cumulative Probability
=NORM.DIST(x, mean, standard_dev, cumulative)
• When cumulative is set to “TRUE”, NORM.DIST finds the cumulative probability, that is, the probability of being less than or equal to the specified value x, for a normal distribution with the specified mean and standard deviation.
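For comparison outside Excel, this sketch uses Python's `statistics.NormalDist(mu, sigma).cdf(x)`, which returns the same cumulative probability as NORM.DIST(x, mean, standard_dev, TRUE). The wrapper name `norm_dist_cumulative` and the height figures are hypothetical. Upper-tail and between-two-values probabilities follow from subtracting cumulative probabilities.

```python
from statistics import NormalDist

def norm_dist_cumulative(x, mean, standard_dev):
    """P(X <= x) for a normal distribution with the given mean and standard deviation."""
    return NormalDist(mu=mean, sigma=standard_dev).cdf(x)

# Hypothetical: heights with mean 170 cm and standard deviation 10 cm.
below_180 = norm_dist_cumulative(180, 170, 10)         # P(X <= 180)
above_190 = 1 - norm_dist_cumulative(190, 170, 10)     # P(X > 190)
between = below_180 - norm_dist_cumulative(160, 170, 10)  # P(160 <= X <= 180)

print(round(below_180, 4))  # 0.8413
print(round(above_190, 4))  # 0.0228
print(round(between, 4))    # 0.6827
```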
Find Cumulative Probability Z Value
=NORM.S.DIST(z, cumulative)
• When cumulative is set to “TRUE”, NORM.S.DIST finds the cumulative probability, that is, the probability of being less than or equal to the specified value z, for a standard normal distribution (mean 0, standard deviation 1).
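A minimal Python sketch of the standard normal cumulative probability, assuming `statistics.NormalDist()` (default mean 0, standard deviation 1) as the counterpart of NORM.S.DIST(z, TRUE); the wrapper name is hypothetical. The last line shows that converting x to a z-value first gives the same probability as working with the original mean and standard deviation.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def norm_s_dist_cumulative(z):
    """P(Z <= z) for the standard normal distribution."""
    return STANDARD_NORMAL.cdf(z)

print(round(norm_s_dist_cumulative(0), 4))     # 0.5
print(round(norm_s_dist_cumulative(1.96), 4))  # 0.975
print(round(norm_s_dist_cumulative(-1), 4))    # 0.1587

# Same probability via a z-value: x = 180, mean = 170, sd = 10 gives z = 1.
z = (180 - 170) / 10
print(abs(norm_s_dist_cumulative(z) - NormalDist(170, 10).cdf(180)) < 1e-12)  # True
```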