Unit 6 Vocabulary Flashcards
Skewed Left
Also known as negatively skewed, the bulk of the data items are clustered on the positive end of a graph with the long tail to the left.

Mean
The average value of all the data in a dataset. Calculated by adding up the values of all data items and then dividing by the number of items in the dataset.
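The calculation described above can be sketched in a few lines of Python. The function name and the sample values are illustrative assumptions, not part of the flashcard set.

```python
def mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)

scores = [4, 8, 15, 16, 23, 42]  # hypothetical sample data
print(mean(scores))  # 108 / 6 = 18.0
```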
z-score
A value indicating the number of standard deviations a data item is from the mean of its dataset.
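A minimal sketch of the z-score calculation follows. It assumes the population standard deviation is intended (the definition above does not say which form to use), and the function name and sample data are hypothetical.

```python
from statistics import mean, pstdev

def z_score(x, values):
    """Number of standard deviations that x lies from the mean of values."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("z-score is undefined when all values are equal")
    return (x - mu) / sigma

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, population standard deviation is 2
print(z_score(9, data))  # (9 - 5) / 2 = 2.0
```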
Box and Whisker Plot
A graphical representation of the five number summary.

Upper Quartile
The median of the upper half of a dataset.
Bivariate
A dataset that pairs two variables for each item, used to measure correlation.
Strong Positive Correlation
Indicated by a correlation coefficient as defined below:
{ r | 0.7 < r < 1 }
Weak Negative Correlation
Indicated by a correlation coefficient as defined below:
{ r | -0.3 < r < -0.1 }
Weak Positive Correlation
Indicated by a correlation coefficient as defined below:
{ r | 0.1 < r < 0.3 }
Maximum
The largest data value in a dataset.
Neutral Positive Correlation
Indicated by a correlation coefficient as defined below:
{ r | 0.4 < r < 0.6 }
Correlation Coefficient
A statistical measure of how linear a bivariate dataset is. Typically represented with a lowercase r:
{ r | -1 ≤ r ≤ 1 }
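As a rough illustration, r can be computed as shown below. The sketch assumes the Pearson correlation coefficient is the one intended (the flashcard does not name a specific formula), and the function name and sample data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable never changes")
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]     # hypothetical data
marks = [2, 4, 6, 8, 10]    # exactly twice the hours, so perfectly linear
print(pearson_r(hours, marks))  # 1.0
```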
Lower Quartile
The median value of the lower half of a dataset.
Skewed Right
Also known as positively skewed, the bulk of the data items are clustered on the negative end of a graph with the long tail to the right.

Histogram
A graphical representation of the clustering of a dataset based on a specified bin width and the number of data items within each bin.
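The counting step behind a histogram can be sketched as follows. The function name, the `start` parameter, and the sample data are assumptions made for illustration. Each bin is taken to include its lower edge and exclude its upper edge, and empty bins are left out of the result.

```python
from collections import Counter

def histogram_counts(values, bin_width, start=0):
    """Map each bin's lower edge to the number of items that fall in that bin."""
    # Bin k covers start + k*bin_width up to (not including) start + (k+1)*bin_width.
    counts = Counter((v - start) // bin_width for v in values)
    return {start + k * bin_width: counts[k] for k in sorted(counts)}

ages = [3, 7, 12, 15, 18, 21, 22, 29]  # hypothetical sample data
print(histogram_counts(ages, bin_width=10))
# {0: 2, 10: 3, 20: 3}
```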

Bell Curve
A graphical representation of the spread of a normal dataset indicating 1, 2, and 3 standard deviations from the mean.

Median
The middle data item in a dataset once its values are sorted in order. When the number of items is even, the median is calculated by taking the middle 2 terms and averaging them.
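The odd and even cases of the median can be sketched like this. The function name and sample values are illustrative assumptions.

```python
def median(values):
    """Middle item of the sorted values, or the average of the middle two."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]  # odd count: a single middle item
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the middle two

print(median([7, 1, 3]))     # 3
print(median([7, 1, 3, 5]))  # (3 + 5) / 2 = 4.0
```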
Standard Deviation
A statistical measure of the typical distance of the data items in a dataset from the mean, calculated as the square root of the average squared distance from the mean.
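A short sketch of one common way to compute this is given below. It assumes the population form (dividing by the number of items); the sample form divides by one less than that. The function name and data are hypothetical.

```python
from math import sqrt

def population_std_dev(values):
    """Square root of the average squared distance from the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("standard deviation of an empty dataset is undefined")
    mu = sum(values) / n
    variance = sum((v - mu) ** 2 for v in values) / n  # divide by n: population form
    return sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # squared distances from the mean (5) sum to 32
print(population_std_dev(data))  # sqrt(32 / 8) = 2.0
```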
Strong Negative Correlation
Indicated by a correlation coefficient as defined below:
{ r | -1 < r < -0.7 }
Causation
In bivariate data analysis, a high correlation is often cited as evidence of a causal relationship. Causation exists only when one thing is shown to cause a change in another. Correlation does not imply causation.
No Correlation
Indicated by a correlation coefficient near or equal to zero.
Five Number Summary
A measure of a dataset’s spread and distribution accomplished by partitioning the data into quarters:
- Minimum
- Lower Quartile
- Median
- Upper Quartile
- Maximum
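The five values listed above can be computed with the sketch below. It assumes the quartile convention used in these flashcards (the median of each half), with the middle item left out of both halves when the count is odd. Other conventions exist and can give slightly different quartiles. The function name and data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    if len(values) < 2:
        raise ValueError("need at least 2 data items")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]    # items below the middle
    upper_half = ordered[-half:]   # items above the middle
    return (
        ordered[0],
        median(lower_half),
        median(ordered),
        median(upper_half),
        ordered[-1],
    )

print(five_number_summary([1, 3, 5, 7, 9, 11, 13]))
# (1, 3, 7, 11, 13)
```

The interquartile range is then the fourth value minus the second, which is 11 - 3 = 8 for this sample.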
Minimum
The data item with the smallest value in a dataset.
Interquartile Range
The difference between the upper and lower quartiles of a dataset.
Outlier
A data item whose value is far from the bulk of the other values in its dataset.
Normal Distribution
A dataset whose histogram maps closely to a bell curve.

Mode
The number(s) that appear the most in a dataset. If all items appear only once, then there is no mode defined for that dataset.
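A small sketch of this rule, including the case where no mode is defined, might look like the following. The function name and sample data are illustrative assumptions.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or an empty list if there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every item appears only once, so no mode is defined
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 2, 3, 3, 4]))  # [2, 3]
print(modes([1, 2, 3]))           # []
```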
Neutral Negative Correlation
Indicated by a correlation coefficient as defined below:
{ r | -0.6 < r < -0.4 }
Data Spread
A measure of the range of a dataset.
Percentile Ranking
The percentage of data items whose values are less than the item being ranked.
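The definition above translates directly into a short calculation. The function name and the scores are hypothetical, and the sketch counts only values strictly less than the item being ranked, as the definition states.

```python
def percentile_rank(x, values):
    """Percentage of items in values that are strictly less than x."""
    if not values:
        raise ValueError("percentile rank needs a non-empty dataset")
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical test scores
print(percentile_rank(85, scores))  # 6 of the 10 scores are lower: 60.0
```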
Range
The width of a dataset’s values. It is calculated as the difference between the maximum and minimum of a dataset.
Data Distribution
A measure of a dataset’s clustering and spread.