Business Analytics Module 1 Flashcards
Histogram
A common graphical representation of statistical data, used to represent the distribution of values of a single variable in a data set.
A histogram’s x-axis represents bins corresponding to ranges of data; its y-axis indicates the frequency of observations falling into each bin.
In Excel: Data > Data Analysis > Histogram > Select Input Range > Select Bin Range > Check Labels (if the first row contains labels)
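Outside Excel, the binning idea can be sketched in a few lines of Python. The data values and bin edges here are made up for illustration, and the function name histogram_counts is hypothetical. This sketch assumes bins that include their lower edge and exclude their upper edge (except the last bin), which may differ from how Excel assigns values that sit exactly on a bin boundary.

```python
def histogram_counts(values, bin_edges):
    """Count how many values fall into each bin.

    Bin i covers [bin_edges[i], bin_edges[i + 1]); the last bin also
    includes its upper edge. Values outside every bin are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            low, high = bin_edges[i], bin_edges[i + 1]
            if low <= v < high or (i == last and v == high):
                counts[i] += 1
                break
    return counts


# Hypothetical data: ten observations sorted into two bins.
values = [1, 2, 2, 3, 5, 7, 8, 8, 9, 10]
bin_edges = [0, 5, 10]
counts = histogram_counts(values, bin_edges)
```

Each entry of counts is the height of one histogram bar.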
Outlier
A data point in a data set that is atypical in that it lies far outside of the range of the other points in the data set. Technically, an outlier is more than 1.5 times the interquartile range greater than the upper quartile or 1.5 times the interquartile range less than the lower quartile.
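The 1.5 × IQR rule can be sketched in Python as follows. Quartiles can be computed under several conventions; this sketch assumes the "inclusive" method of Python's statistics.quantiles, and the data set is invented for illustration.

```python
import statistics


def find_outliers(values):
    """Return the values lying more than 1.5 * IQR beyond the quartiles."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical data: one value sits far above the rest.
data = [10, 12, 13, 14, 15, 16, 18, 60]
outliers = find_outliers(data)
```

With a different quartile convention the fences shift slightly, so borderline points may be classified differently.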
Scatter plot
A graph showing the relationship between two variables. One variable (generally the independent variable) is measured along the x-axis, and the other (generally the dependent variable) is measured along the y-axis. A single marker is placed for each observation in the data set, allowing for easy visualization of the relationship between the variables.
In Excel: “Insert” > “Scatter” > “Input Y Range” > “Input X Range” > Check “Labels in First Row”
Hidden Variable
A variable that is correlated with two different variables that are not directly related to each other. The two variables may appear to be unrelated, but are mathematically correlated because each of them is correlated with a third, hidden variable that drives the observed correlation. A hidden variable makes its presence known through its relationship with each of the two variables that are being observed.
Descriptive Statistics
Also known as summary statistics, these are numbers that provide a quick overview of the properties of a data set. Typically, descriptive statistics include the data set’s mean, median, mode, standard deviation, sample size, minimum, maximum, and range.
In Excel: Data > Data Analysis > Descriptive Statistics > Select Range > Labels in First Row > Select Output Range > Select Summary Statistics
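For comparison, a similar summary can be produced with Python's standard statistics module. This is only a sketch: the function name describe and the sample data are hypothetical, and the standard deviation here is the sample version (dividing by n - 1).

```python
import statistics


def describe(values):
    """Return a small dictionary of common descriptive statistics."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


# Hypothetical data set of five observations.
summary = describe([2, 4, 4, 5, 10])
```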
Mean
The most common statistic used to describe the center of the values in a data set. The mean is also known as the average. For a distribution that has discrete values, the mean is equal to the sum of the values of all the data points in the set, divided by the number of data points.
In Excel: =AVERAGE(number 1, [number 2], …)
Median
The median identifies the middle of a data set such that the same number of data points have values below the median as have values above the median. The median is found by arranging the values of the data points in order of magnitude. If the total number of data points is odd, the median is the value that lies exactly in the middle. If the total number of data points is even, the median is the average of the two middle values.
In Excel: =MEDIAN(number 1, [number 2], …)
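A quick Python sketch of the odd and even cases, using invented data. For an even count, Python's statistics.median averages the two middle values.

```python
import statistics

# Odd number of points: the middle value after sorting (1, 3, 7) is 3.
odd_median = statistics.median([3, 1, 7])

# Even number of points: sorted values are 1, 3, 5, 7, so the median
# is the average of the two middle values, (3 + 5) / 2.
even_median = statistics.median([3, 1, 7, 5])
```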
Mode
The mode is the value that occurs most frequently in a data set.
In Excel: =MODE.SNGL(number 1, [number 2], …)
Range
The distance between the smallest and greatest values in a data set. Range is the simplest measure of the variability of a data set.
Variance
A measure of the spread of a data set’s values around its mean value. If the true population mean is known, the variance is equal to the sum of the squares of the differences between each point of the data set and the population mean, divided by the total number of data points. If the mean is estimated from a sample, the variance is equal to the sum of the squares of the differences between each point of the data set and the sample mean, divided by the total number of data points in the sample minus one. The variance is the square of the standard deviation. The variance is measured in squared units (e.g., if the data set contains data denominated in dollars, the variance will be in squared dollars).
In Excel: =VAR.S(number 1, [number 2], …)
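The two cases in the definition (dividing by n when the population mean is known, and by n - 1 when the mean is estimated from a sample) can be sketched in Python. The function names and data are hypothetical; the statistics module's pvariance and variance are used only as a cross-check.

```python
import statistics


def population_variance(values):
    """Sum of squared deviations from the mean, divided by n."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


# Hypothetical data: mean is 5, squared deviations sum to 36.
data = [2, 4, 4, 5, 10]
pop_var = population_variance(data)   # 36 / 5
samp_var = sample_variance(data)      # 36 / 4

# Cross-check against the standard library.
lib_pop_var = statistics.pvariance(data)
lib_samp_var = statistics.variance(data)
```

The sample variance is always the larger of the two for the same data, because the same sum is divided by a smaller number.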
Standard Deviation
A measure of the spread of a data set’s values around its mean value. The standard deviation is the square root of the variance. The standard deviation is measured in the same units (such as dollars or hours) as the observations in the data.
In Excel: =STDEV.S(number 1, [number 2], …)
Coefficient of Variation (CV)
A measure of a data set’s variability relative to its mean. The coefficient of variation (CV) is particularly helpful when comparing the variability of two data sets with different means. Calculated as the standard deviation divided by the mean, the CV is typically expressed as a percentage. For example, the CV of a data set with mean = 100 hours and standard deviation = 15 hours is 15 hours/100 hours = 15%.
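A minimal Python sketch of the calculation. It assumes the sample standard deviation is intended (the definition does not say which version to use), and the function name and second data set are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# The worked example from the definition: 15 hours / 100 hours.
cv_example = 15 / 100

# Hypothetical raw data: mean 5, sample standard deviation 3.
cv_data = coefficient_of_variation([2, 4, 4, 5, 10])
```

Multiplying either result by 100 gives the CV as a percentage.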
Conditional Mean
A conditional mean is the mean (average) of a subset of data. We apply a condition and calculate the mean for values that meet that condition. For example, in a data set that contains data on both males and females, a conditional mean might be the mean of the data pertaining to only the females in the data set.
In Excel:
=AVERAGEIF(range, criteria, [average_range])
• Returns the conditional mean, or average of the cells in a specified range that meet the given criteria.
• range contains the one or more cells to which we wish to apply the criteria or condition.
• criteria is the condition that is to be applied to the range.
• [average_range] is the range of cells containing the data we wish to average. It is optional; if it is omitted, the cells in range are averaged.
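The same idea can be sketched in Python: filter the records by a condition, then average what remains. The records, the function name conditional_mean, and the "F"/"M" labels are all invented for illustration.

```python
def conditional_mean(pairs, condition):
    """Average the values whose key satisfies the condition."""
    selected = [value for key, value in pairs if condition(key)]
    if not selected:
        raise ValueError("no records meet the condition")
    return sum(selected) / len(selected)


# Hypothetical (group, value) records.
records = [("F", 62), ("M", 70), ("F", 68), ("M", 74), ("F", 65)]

# Mean of the values for records labeled "F" only.
female_mean = conditional_mean(records, lambda group: group == "F")
overall_mean = sum(value for _, value in records) / len(records)
```

Here the keys play the role of range, the lambda plays the role of criteria, and the values play the role of average_range.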
Percentile
The value of a variable below which a certain percentage of the data set falls. For example, if 87% of students taking the GMAT exam earn scores below 670, the 87th percentile for the GMAT exam is 670 points.
In Excel:
=PERCENTILE.INC(array, k)
• Returns the k-th percentile of the values in the specified array. For example, if we want to know the 95th percentile for an array of data, k would be 0.95.
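As a rough sketch, a percentile with k between 0 and 1 inclusive can be computed by sorting the data and interpolating linearly between neighboring values. This is believed to mirror how PERCENTILE.INC interpolates, but treat that as an assumption; the function name percentile_inc and the data are hypothetical.

```python
def percentile_inc(values, k):
    """Return the k-th percentile (0 <= k <= 1) by linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= k <= 1:
        raise ValueError("k must be between 0 and 1")
    data = sorted(values)
    # Fractional position of the percentile within the sorted data.
    position = k * (len(data) - 1)
    lower = int(position)
    fraction = position - lower
    if lower + 1 >= len(data):
        return data[-1]
    return data[lower] + fraction * (data[lower + 1] - data[lower])


# Hypothetical data.
scores = [10, 20, 30, 40, 50]
p50 = percentile_inc(scores, 0.5)   # the median
p90 = percentile_inc(scores, 0.9)   # falls between 40 and 50
```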
Correlation Coefficient
A measure of the strength of a linear relationship between two variables. The correlation coefficient can range from -1 to +1. A correlation coefficient of -1 indicates a perfect negative linear relationship between two variables, whereas a correlation coefficient of +1 indicates a perfect positive linear relationship. A correlation coefficient of 0 indicates that no linear relationship exists between two variables, though it is possible that a non-linear relationship exists between the two variables.
In Excel: =CORREL(array 1, array 2)
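A small Python sketch of the (Pearson) correlation coefficient, with made-up data covering the three cases described above: perfect positive, perfect negative, and a non-linear relationship whose linear correlation is zero. The function name correlation is hypothetical.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and each variable's own spread.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross / (spread_x * spread_y)


# Perfect positive and perfect negative linear relationships.
r_positive = correlation([1, 2, 3, 4], [2, 4, 6, 8])
r_negative = correlation([1, 2, 3, 4], [8, 6, 4, 2])

# y = x squared: y is fully determined by x, yet the relationship is
# not linear, so the correlation coefficient is 0.
r_nonlinear = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
```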