Business Analytics Module 1 Flashcards

1
Q

Histogram

A

A common graphical representation of statistical data, used to represent the distribution of values of a single variable in a data set.

A histogram’s x-axis represents bins corresponding to ranges of data; its y-axis indicates the frequency of observations falling into each bin.

In Excel: Data > Data Analysis > Histogram > Select Input Range > Select Bin Range > Check Labels if the first row contains labels
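
The binning idea can be sketched in a few lines of Python. This is only an illustration, not Excel's procedure: the function name `histogram_counts`, the choice of equal-width bins, and the sample data are all assumptions made for the example, and the sketch assumes the values are not all identical.

```python
def histogram_counts(values, num_bins):
    """Count how many observations fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins  # assumes hi > lo
    counts = [0] * num_bins
    for v in values:
        # The maximum value would index one past the end, so clamp it into the last bin.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # hypothetical sample
counts = histogram_counts(data, 4)      # bins: [1,3), [3,5), [5,7), [7,9]
print(counts)
```

Each entry of `counts` is the height of one bar on the y-axis; the bins themselves run along the x-axis.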

2
Q

Outlier

A

A data point in a data set that is atypical in that it lies far outside of the range of the other points in the data set. By a common convention, a point is an outlier if it lies more than 1.5 times the interquartile range above the upper quartile or more than 1.5 times the interquartile range below the lower quartile.
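
The 1.5 × interquartile range rule can be sketched in Python as follows. The function name `find_outliers` and the sample data are hypothetical, and quartile definitions vary between tools; this sketch assumes the "inclusive" method of Python's `statistics.quantiles`.

```python
import statistics

def find_outliers(values):
    """Return the points lying more than 1.5 * IQR beyond the quartiles."""
    # Quartiles via the "inclusive" method (an assumption; other methods
    # can give slightly different cut points).
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

data = [10, 12, 13, 14, 15, 16, 18, 40]  # hypothetical sample
outliers = find_outliers(data)
print(outliers)
```

With this sample, only the value 40 lies beyond the upper fence.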

3
Q

Scatter plot

A

A graph showing the relationship between two variables. One variable (generally the independent variable) is measured along the x-axis, and the other (generally the dependent variable) is measured along the y-axis. A single marker is placed for each observation in the data set, allowing for easy visualization of the relationship between the variables.

In Excel: “Insert” > “Scatter” > “Input Y Range” > “Input X Range” > Check Labels in First Row

4
Q

Hidden Variable

A

A variable that is correlated with each of two variables that are not directly related to each other. The two observed variables are mathematically correlated, and so may appear to be related, only because each of them is correlated with a third variable, the hidden variable, which drives the observed correlation. A hidden variable makes its presence known through its relationship with each of the two variables that are being observed.

5
Q

Descriptive Statistics

A

Also known as summary statistics, these are numbers that provide a quick overview of the properties of a data set. Typically, descriptive statistics include the data set’s mean, median, mode, standard deviation, sample size, minimum, maximum, and range.

In Excel: Data > Data Analysis > Descriptive Statistics > Select Range > Labels In First Row > Select Output Range > Select Summary Statistics
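
For comparison, a similar summary can be sketched with Python's standard `statistics` module. The function name `describe` and the sample data are hypothetical; the standard deviation here is the sample (n - 1) form.

```python
import statistics

def describe(values):
    """Return the descriptive statistics listed on this card as a dictionary."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "standard deviation": statistics.stdev(values),  # sample (n - 1) form
        "sample size": len(values),
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
    }

summary = describe([2, 4, 4, 5, 7, 8])  # hypothetical sample
for name, value in summary.items():
    print(f"{name}: {value}")
```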

6
Q

Mean

A

The most common statistic used to describe the center of the values in a data set. The mean is also known as the average. For a distribution that has discrete values, the mean is equal to the sum of the values of all the data points in the set, divided by the number of data points.

In Excel: =AVERAGE(number1, [number2], …)

7
Q

Median

A

The median identifies the middle of a data set such that the same number of data points have values below the median as have values above the median. The median is found by arranging the values of the data points in order of magnitude. If the total number of data points is odd, the median is the value that lies exactly in the middle. If the total number is even, the median is the average of the two middle values.

In Excel: =MEDIAN(number1, [number2], …)
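
The sort-then-pick-the-middle procedure can be sketched as below. The function name `median` and the sample values are chosen for illustration; for an even count, the sketch averages the two middle values, which is the usual convention.

```python
def median(values):
    """Return the middle value of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: exactly one value sits in the middle.
        return ordered[middle]
    # Even count: average the two values that straddle the middle.
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_case = median([7, 1, 3])      # middle of [1, 3, 7]
even_case = median([7, 1, 3, 5])  # average of 3 and 5
print(odd_case, even_case)
```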

8
Q

Mode

A

The mode is the value that occurs most frequently in a data set.

In Excel: =MODE.SNGL(number1, [number2], …)

9
Q

Range

A

The distance between the smallest and greatest values in a data set. Range is the simplest measure of the variability of a data set.

10
Q

Variance

A

A measure of the spread of a data set’s values around its mean value. If the true population mean is known, the variance is equal to the sum of the squares of the differences between each point of the data set and the population mean, divided by the total number of data points. If the mean is estimated from a sample, the variance is equal to the sum of the squares of the differences between each point of the data set and the sample mean, divided by the total number of data points in the sample minus one. The variance is the square of the standard deviation. The variance is measured in squared units (e.g., if the data set contains data denominated in dollars, the variance will be in squared dollars).

In Excel: =VAR.S(number1, [number2], …) for a sample, or =VAR.P(number1, [number2], …) for a full population
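
The two divisors (n for a population, n - 1 for a sample) can be compared side by side in a short Python sketch. The function names and data are hypothetical, and `population_variance` assumes the data set is the entire population, so the mean it computes is the population mean.

```python
import math

def population_variance(values):
    """Sum of squared deviations from the mean, divided by n."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical data, mean = 5
pop_var = population_variance(data)      # 32 / 8
samp_var = sample_variance(data)         # 32 / 7
pop_sd = math.sqrt(pop_var)              # standard deviation = sqrt(variance)
print(pop_var, samp_var, pop_sd)
```

The sample form is always slightly larger than the population form for the same data, because it divides by a smaller number.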

11
Q

Standard Deviation

A

A measure of the spread of a data set’s values around its mean value. The standard deviation is the square root of the variance. The standard deviation is measured in the same units (such as dollars or hours) as the observations in the data.

In Excel: =STDEV.S(number1, [number2], …)

12
Q

Coefficient of Variation (CV)

A

A measure of a data set’s variability relative to its mean. The coefficient of variation (CV) is particularly helpful when comparing the variability of two data sets with different means. Calculated as the standard deviation divided by the mean, the CV is typically expressed as a percentage. For example, the CV of a data set with mean = 100 hours and standard deviation = 15 hours is 15 hours/100 hours = 15%.
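
A minimal Python sketch of the calculation follows, using two hypothetical data sets whose means differ by a factor of ten. The function name `coefficient_of_variation` is an assumption, and the sample standard deviation is used.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (sample standard deviation assumed)."""
    return statistics.stdev(values) / statistics.mean(values)

small_scale = [90, 100, 110]     # hypothetical: mean 100, standard deviation 10
large_scale = [900, 1000, 1100]  # hypothetical: mean 1000, standard deviation 100

cv_small = coefficient_of_variation(small_scale)
cv_large = coefficient_of_variation(large_scale)
print(f"{cv_small:.0%} {cv_large:.0%}")
```

The standard deviations differ tenfold, yet both data sets have the same CV, which is why the CV is useful for comparing variability across different scales.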

13
Q

Conditional Mean

A

A conditional mean is the mean (average) of a subset of data. We apply a condition and calculate the mean for values that meet that condition. For example, in a data set that contains data on both males and females, a conditional mean might be the mean of the data pertaining to only the females in the data set.

In Excel:
=AVERAGEIF(range, criteria, [average_range])
• Returns the conditional mean, or average of the cells in a specified range that meet the given criteria.
• range is the one or more cells to which we wish to apply the criteria or condition.
• criteria is the condition that is to be applied to the range.
• [average_range] is the optional range of cells containing the data we wish to average; if it is omitted, the cells in range are averaged.
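
The same idea can be sketched in Python. The function `average_if` is a hypothetical helper written for this example (it only supports an equality condition, unlike Excel's criteria strings), and the data are invented.

```python
def average_if(criteria_values, criterion, average_values):
    """Average the entries of average_values whose paired criteria value equals criterion."""
    selected = [
        value
        for key, value in zip(criteria_values, average_values)
        if key == criterion
    ]
    if not selected:
        raise ValueError("no rows match the criterion")
    return sum(selected) / len(selected)

genders = ["F", "M", "F", "M", "F"]  # plays the role of range
scores = [80, 70, 90, 60, 100]       # plays the role of average_range

female_mean = average_if(genders, "F", scores)
overall_mean = sum(scores) / len(scores)
print(female_mean, overall_mean)
```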

14
Q

Percentile

A

The value of a variable below which a certain percentage of the data set falls. For example, if 87% of students taking the GMAT exam earn scores below 670, the 87th percentile for the GMAT exam is 670 points.

In Excel:
=PERCENTILE.INC(array, k)
• Returns the k-th percentile of the values in the specified array, where k is a number between 0 and 1. For example, if we want to know the 95th percentile for an array of data, k would be 0.95.
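
One way to compute a percentile is linear interpolation between the sorted values, sketched below in Python. The function name `percentile_inc` is borrowed from the Excel function for readability only; that this interpolation matches Excel's result is an assumption of the sketch, and the data are hypothetical.

```python
import math

def percentile_inc(values, k):
    """Return the k-th percentile (0 <= k <= 1) by linear interpolation."""
    if not 0 <= k <= 1:
        raise ValueError("k must be between 0 and 1")
    ordered = sorted(values)
    # Fractional position of the percentile within the sorted data.
    rank = k * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

data = [1, 2, 3, 4, 5]             # hypothetical sample
p95 = percentile_inc(data, 0.95)   # falls between 4 and 5
p50 = percentile_inc(data, 0.50)   # the median
print(p95, p50)
```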

15
Q

Correlation Coefficient

A

A measure of the strength of a linear relationship between two variables. The correlation coefficient can range from -1 to +1. A correlation coefficient of -1 indicates a perfect negative linear relationship between two variables, whereas a correlation coefficient of +1 indicates a perfect positive linear relationship. A correlation coefficient of 0 indicates that no linear relationship exists between two variables, though it is possible that a non-linear relationship exists between the two variables.

In Excel: =CORREL(array1, array2)
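
The three cases described above (+1, -1, and 0 with a non-linear relationship) can be checked with a small Python sketch of the Pearson correlation coefficient. The function name `correlation` and the data are chosen for illustration.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)

xs = [1, 2, 3, 4, 5]
positive = correlation(xs, [2, 4, 6, 8, 10])  # y rises in step with x
negative = correlation(xs, [10, 8, 6, 4, 2])  # y falls in step with x

# y = x squared over a symmetric range: clearly related, but not linearly.
nonlinear = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
print(round(positive, 6), round(negative, 6), round(nonlinear, 6))
```

The last case shows why a coefficient of 0 rules out only a linear relationship: y is fully determined by x, yet the coefficient is 0.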

16
Q

Time Series

A

Data about an attribute for a given subject (e.g. person, organization, or country) in temporal order, measured at regular time intervals (e.g. minutes, months, or years).

17
Q

Cross-Sectional Data

A

Data that provide a measure of an attribute across multiple different subjects (e.g. people, organizations, or countries) at a given moment in time or during a given time period.