Descriptive Statistics Flashcards
What is a stem and leaf plot?
A stem and leaf plot is a way to visualize smaller sets of data. Each value is split into a leaf, which is its last (least significant) digit, and a stem, which is the rest of the number. The plot helps visualize trends and shows the general center of the data without losing any exact values.
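As a rough illustration, the split into stems and leaves can be sketched in Python. The function name `stem_and_leaf` and the sample data are made up for this example, and the sketch assumes non-negative whole numbers with the ones digit as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (all digits but the last)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # leaf is the last digit, stem is the rest
        plot[stem].append(leaf)
    return dict(plot)

data = [12, 15, 21, 23, 23, 30, 47]  # hypothetical sample data
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
# 1 | 2 5
# 2 | 1 3 3
# 3 | 0
# 4 | 7
```

Stems with no leaves are simply skipped in this sketch; a hand-drawn plot would usually list them with an empty row.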
What is an outlier?
An outlier is a value that does not fit with the rest of the data set, either much larger or much smaller than the rest.
How can a stem and leaf plot represent two different data sets?
Stem and leaf plots can represent two different data sets if the sets share the same stems and the leaves of each set are placed on opposite sides of the stems.
What is relative frequency?
Relative frequency is the frequency of a specific data value divided by the total number of data values, expressed as a percent or a decimal.
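A minimal sketch of that division in Python, in decimal form. The function name `relative_frequencies` and the sample data are assumptions made for illustration.

```python
from collections import Counter

def relative_frequencies(values):
    """Map each distinct value to its count divided by the total count."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

# Hypothetical sample: 8 observations
colors = ["red", "blue", "red", "green", "red", "blue", "red", "green"]
print(relative_frequencies(colors))
# {'red': 0.5, 'blue': 0.25, 'green': 0.25}
```

The relative frequencies always add up to 1 (or 100 percent).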
What is a frequency table?
Frequency tables display classes or individual data points on one side mapped to their respective frequencies on the other.
What are the two major types of data?
The two major types of data are categorical (or discrete) data and continuous data.
What is discrete or categorical data?
Data whose values can be counted individually, like the integers, rather than taking on every value in a continuous range.
What is continuous data?
Data that cannot be individually counted, like the real numbers.
What is a line graph?
A line graph displays continuous data by connecting consecutive data points with line segments, for example each value plotted against the number of times it occurs.
What is a bar graph?
A bar graph represents categorical or discrete data with one bar for each category. The category names go on the x axis and the frequencies on the y axis, and the bars are separated from one another. Bar graphs make it easy to compare categorical data.
How are data classes constructed?
Classes are described by two major indicators: class limits and class boundaries. To construct them, first determine the number of classes, then the size (width) of the classes, the class limits, the class boundaries, and the class midpoints.
What is a histogram?
A histogram displays continuous data with bars that represent classes of data, each bar bounded by its class boundaries. The class boundaries go on the x axis and the frequency or relative frequency on the y axis.
What is a frequency polygon?
Frequency polygons are line graphs that plot one point per data class: the class midpoint on the x axis against the frequency of the class on the y axis.
What is a paired data set?
Paired data sets are 1:1 data sets in which each point in one set maps to exactly one point in the other. An example is data collected over time, where each time value is connected to the data point collected at that time.
What is a time series graph?
A time series graph plots a paired data set collected over time. Typically the data value is on the y axis and the time is on the x axis.
What is the mean?
The mean is the “average” of the data set. It gives a measure of center, but it is strongly affected by skew and outliers.
How is the mean calculated in a full data set?
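Add up every value and divide by the number of values. One standard way to write this is sketched below, where the symbols x_1, ..., x_n for the data values, n for the number of values, and x-bar for the mean are notation assumed for this example.

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i
```

For example, the mean of 2, 4, and 9 is (2 + 4 + 9) / 3 = 5.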
How is the mean calculated in classes or frequency tables?
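Multiply each value (or each class midpoint) by its frequency, add the products, and divide by the total frequency. A sketch of the usual formula follows, where f_j for the frequency of row j and m_j for its value or class midpoint are notation assumed for this example.

```latex
\bar{x} \approx \frac{\sum_{j} f_j \, m_j}{\sum_{j} f_j}
```

When the table lists individual values, the result is exact. When it lists classes, the midpoints stand in for the unknown exact values, so the result is an estimate.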
What is an outlier?
A value that is significantly greater than or less than most of the data.
What are percentiles?
Percentiles are values that divide an ordered data set into 100 equal parts. For example, if 170 is the 60th percentile, then 60 percent of the values are less than 170 and 40 percent are greater than 170. Whether values equal to 170 are counted (inclusive or exclusive) depends on how the percentile is specified.
How do you find a percentile of a data set?
To find the index of a percentile in the ordered data set, use the formula below, where i is the index, k is the percentile, and n is the total number of data points. If the index is an integer, the percentile is the value at that index. If the index is not an integer, take the values at the indices just below and just above it and average them; this average is the percentile.
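A minimal sketch of this lookup in Python. It assumes the index formula is i = (k / 100)(n + 1) with positions counted from 1 in the sorted data; that is a common textbook convention that fits the variables i, k, and n, but other conventions exist. The function name `percentile_value` and the clamping of out-of-range indices are also assumptions of this example.

```python
import math

def percentile_value(data, k):
    """Return the k-th percentile, assuming i = k * (n + 1) / 100 (1-based)."""
    ordered = sorted(data)
    n = len(ordered)
    i = k * (n + 1) / 100
    # Positions just below and just above i; they coincide when i is an integer.
    below = math.floor(i)
    above = math.ceil(i)
    # Assumption: clamp to the first or last position if i falls outside 1..n.
    below = min(max(below, 1), n)
    above = min(max(above, 1), n)
    return (ordered[below - 1] + ordered[above - 1]) / 2

data = [2, 4, 6, 8, 10, 12, 14, 16, 18]  # hypothetical sample, n = 9
print(percentile_value(data, 50))  # i = 5, so the 5th value: 10.0
print(percentile_value(data, 25))  # i = 2.5, so the average of 4 and 6: 5.0
```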
How do you find the percentile of a data point?
Given a specific data point, use the formula below to find its percentile, where x is the number of values in the data set below the specified value (not including the value itself), y is the number of times that value occurs, and n is the total size of the data set.
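A minimal sketch in Python, assuming the formula is (x + 0.5y) / n × 100, which is the usual form for these definitions of x, y, and n. The function name `percentile_of` and the sample data are made up for illustration.

```python
def percentile_of(data, value):
    """Percentile of value, assuming the formula (x + 0.5 * y) / n * 100."""
    x = sum(1 for v in data if v < value)   # values strictly below
    y = sum(1 for v in data if v == value)  # occurrences of the value itself
    n = len(data)
    return 100 * (x + 0.5 * y) / n

data = [2, 4, 4, 6, 8, 10, 12, 14, 16, 18]  # hypothetical sample, n = 10
print(percentile_of(data, 4))   # x = 1, y = 2: 20.0
print(percentile_of(data, 10))  # x = 5, y = 1: 55.0
```

Counting only half of the tied values places the data point in the middle of its own group of equal values.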
What is the median?
The median is a measure of center that cuts the ordered data set into two halves. To find it, order the data set and compute the position (n + 1) / 2, where n is the size of the data set; the value at that position is the median. If the position is not an integer, the median is the average of the two values on either side of it. The median is quartile 2 and also the 50th percentile. The median is less affected by skew than the mean.
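The procedure can be sketched in Python as follows. The function name `median` is chosen for this example, and Python's standard `statistics.median` gives the same results.

```python
def median(data):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: average the two values on either side of position (n + 1) / 2.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))     # 3
print(median([7, 1, 3, 5]))  # 4.0
```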
Is the median included when calculating quartiles?
It depends on the method (inclusive or exclusive), but AP uses the exclusive method.
Where does data go if it falls on a class boundary?
If a value falls on a class's lower boundary, it belongs to that class. If it falls on an upper boundary (which is the lower boundary of the next class), it belongs to the next class up.
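This rule matches half-open classes that include their lower boundary and exclude their upper boundary. A small Python sketch, where the function name `class_index` and the boundary values are assumptions for illustration:

```python
from bisect import bisect_right

def class_index(value, lower_boundaries):
    """Return the 0-based index of the class that contains value.

    lower_boundaries must be sorted ascending; class j starts at
    lower_boundaries[j]. Returns -1 if value is below the first boundary.
    """
    return bisect_right(lower_boundaries, value) - 1

lower_boundaries = [0, 10, 20, 30]  # hypothetical classes starting at 0, 10, 20, 30
print(class_index(9.9, lower_boundaries))  # 0
print(class_index(10, lower_boundaries))   # 1 (on a boundary, so the next class up)
```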
What are quartiles?
Quartiles are values that divide the data set into quarters. Q1 is the 25th percentile, Q2 is the median or 50th percentile, and Q3 is the 75th percentile.
How do you find quartiles?
Order the data set and split it at the median into a lower half and an upper half, typically excluding the median itself. Q1 is the median of the lower half, Q2 is the median of the whole data set, and Q3 is the median of the upper half.
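A minimal sketch of the exclusive method in Python, using the standard library's `statistics.median`. The function name `quartiles_exclusive` is an assumption for this example, and the sketch expects at least two data points.

```python
from statistics import median

def quartiles_exclusive(data):
    """Return (Q1, Q2, Q3), leaving the median out of both halves."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count, skip the middle value so it is in neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)

print(quartiles_exclusive([1, 2, 3, 4, 5, 6, 7]))     # (2, 4, 6)
print(quartiles_exclusive([1, 2, 3, 4, 5, 6, 7, 8]))  # (2.5, 4.5, 6.5)
```

With an even number of values the median falls between two data points, so both methods produce the same halves.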