Descriptive Statistics Flashcards

1
Q

What is a stem and leaf plot?

A

A stem and leaf plot is a way to visualize smaller data sets. Each value is split into a leaf, which is its last (least significant) digit, and a stem, which is the rest of the number. It helps visualize trends and shows the general center of the data without losing any exact values.
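
As a rough illustration of the stem/leaf split, here is a minimal Python sketch. It assumes non-negative whole numbers and a leaf of exactly one digit; the function name and the sample scores are made up for the example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (all but the last digit) and leaf (last digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 42 -> stem 4, leaf 2
        plot[stem].append(leaf)
    return dict(plot)

scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67]  # hypothetical data
for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Every original value can be read back from the printed rows, which is the sense in which no exact values are lost.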

2
Q

What is an outlier?

A

An outlier is a value that does not fit with the rest of the data set, either much larger or much smaller than the rest.

3
Q

How can a stem and leaf plot represent two different data sets?

A

Stem and leaf plots can represent two different data sets if the sets share a stem and the leaves of each set are placed on opposite sides of that stem (a back-to-back plot).

4
Q

What is relative frequency?

A

Relative frequency is the frequency of a specific data value divided by the total number of data values, expressed as a percent or a decimal.
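
A minimal Python sketch of this division, with an illustrative function name and made-up data:

```python
from collections import Counter

def relative_frequencies(values):
    """Map each distinct value to its count divided by the total number of values."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

# Hypothetical data: three of the four values are "a".
print(relative_frequencies(["a", "b", "a", "a"]))
```

The results are decimals; multiply by 100 for percent form.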

5
Q

What is a frequency table?

A

Frequency tables display classes or individual data points on one side mapped to their respective frequencies on the other.

6
Q

What are the two major types of data?

A

The two major types of data are categorical or discrete data and continuous data.

7
Q

What is discrete or categorical data?

A

Data that does not take on continuous values. Discrete data can be counted, like the integers; categorical data consists of labels or categories rather than numbers.

8
Q

What is continuous data?

A

Data that cannot be individually counted because it can take any value in a range, like the real numbers. It is measured rather than counted.

9
Q

What is a line graph?

A

A line graph displays continuous data by plotting paired data points and connecting them with line segments, for example each data value against the frequency with which it occurs.

10
Q

What is a bar graph?

A

A bar graph represents categorical or discrete data with one bar for each category. The categories go on the x axis and the frequency on the y axis, and the bars are separated by gaps. Bar graphs allow you to easily compare categorical data.

11
Q

How are data classes constructed?

A

There are two major indicators of classes: class limits and class boundaries. To construct the classes, determine in order the number of classes, the class width, the class limits, the class boundaries, and the class midpoints.

12
Q

What is a histogram?

A

A histogram displays continuous data with bars that represent classes of data and are bounded by the class boundaries. The class boundaries go on the x axis and the frequency or relative frequency on the y axis.

13
Q

What is a frequency polygon?

A

Frequency polygons are line graphs that plot the midpoint of each data class (x axis) against the frequency of that class (y axis).

14
Q

What is a paired data set?

A

Paired data sets are 1:1 data sets where each point in one set maps to a point in the other set. An example of this is data collected over time, where each time value is connected to the data point collected at that time.

15
Q

What is a time series graph?

A

A time series graph plots a paired data set collected over time. Typically the data value is on the y axis and the time is on the x axis.

16
Q

What is the mean?

A

The mean is the “average” of the data set. It gives a measure of center, but it is highly affected by skew.

17
Q

How is the mean calculated in a full data set?

A
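
Assuming the usual definition, the mean of a full data set is the sum of all the values divided by the number of values. A minimal Python sketch (the function name and data are illustrative):

```python
def mean(values):
    """Sum of all values divided by how many values there are."""
    return sum(values) / len(values)

print(mean([2, 4, 6, 8]))  # hypothetical data
```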
18
Q

How is the mean calculated in classes or frequency tables?

A
19
Q

What is an outlier?

A

A value that is significantly greater than or less than most of the data.

20
Q

What are percentiles?

A

Percentiles are values that divide the ordered data set into 100 equal parts. For example, if 170 is the 60th percentile, then 60 percent of the values are less than 170 and 40 percent are greater than 170. Whether values equal to 170 are counted (inclusive or exclusive) depends on how the percentile is specified.

21
Q

How do you find a percentile of a data set?

A

To find the specific index of a percentile, use the index formula, where i is the index, k is the percentile, and n is the total number of data points. If the index is not an integer, take the values at the integer indices just above and just below it and average them; this average is the percentile.
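
A sketch of this procedure in Python. The card's formula is not shown here, so the code assumes one common textbook convention, i = (k / 100)(n + 1) with positions counted from 1 in the ordered data; other conventions exist and give slightly different results. The function name is illustrative.

```python
import math

def percentile_value(data, k):
    """Return the k-th percentile, ASSUMING the index formula i = k/100 * (n + 1)."""
    ordered = sorted(data)
    n = len(ordered)
    i = k * (n + 1) / 100  # 1-based position in the ordered data (assumed formula)
    # Clamp to valid positions so very small or large k stay inside the data.
    lower = min(max(math.floor(i), 1), n)
    upper = min(max(math.ceil(i), 1), n)
    # When i is an integer, lower == upper and the average is just that value.
    return (ordered[lower - 1] + ordered[upper - 1]) / 2

print(percentile_value([1, 2, 3, 4, 5, 6, 7, 8, 9], 25))
```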

22
Q

How do you find the percentile of a data point?

A

Given a specific data point, use the formula below to find its percentile, where x is the number of values in the data set below the specified value (not including the value itself), y is the number of times that value occurs, and n is the total size of the data set.
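
A sketch in Python using the x, y, and n defined on this card. The formula itself is an assumption: the code uses the common textbook form (x + 0.5y) / n × 100. The function name and data are illustrative.

```python
def percentile_of(data, value):
    """Percentile rank of value, ASSUMING the formula (x + 0.5*y) / n * 100."""
    x = sum(1 for v in data if v < value)   # values strictly below the given value
    y = sum(1 for v in data if v == value)  # occurrences of the value itself
    n = len(data)
    return 100 * (x + 0.5 * y) / n

print(percentile_of([1, 2, 2, 3, 4], 2))  # hypothetical data
```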

23
Q

What is the median?

A

The median is a measure of center of the data set. It cuts the ordered data into two equal halves. The median is found by ordering the data set and computing the position (n + 1)/2, where n is the size of the data set; the value at this position is the median. If the position is not an integer, then the average of the two values on either side of it is the median. The median is quartile 2 and also the 50th percentile. The median is less affected by skew than the mean.
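
A minimal Python sketch of finding the median: the middle value when the count is odd, the average of the two middle values when it is even. The function name and data are illustrative.

```python
def median(values):
    """Middle value of the ordered data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the middle pair

print(median([4, 1, 3, 2]))  # hypothetical data
```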

24
Q

Is the median included when calculating quartiles?

A

It depends on the method, inclusive or exclusive, but AP uses exclusive.

25
Q

Where does data go if it falls on a class boundary?

A

If data falls on the lower boundary of a class, it is in that class. If data falls on an upper boundary (the lower boundary of the next class), then it is in the next class up.
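
This rule can be sketched in Python with floor division, assuming equal-width classes that start at a known lowest boundary. The function name and parameters are illustrative.

```python
def class_index(value, lowest_boundary, class_width):
    """Return the 0-based index of the class containing value.

    A value exactly on a boundary lands in the class whose LOWER boundary it is.
    """
    return int((value - lowest_boundary) // class_width)

# Hypothetical classes with boundaries 0, 10, 20, ...
print(class_index(10, 0, 10))  # on the boundary between the first two classes
```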

26
Q

What are quartiles?

A

Quartiles are values that divide the data set into quarters. Q1 is the 25th percentile, Q2 is the median or 50th percentile, and Q3 is the 75th percentile.

27
Q

How do you find quartiles?

A

Split the ordered data set in half at the median, typically excluding the median itself, then find the median of each half. The median of the lower half is Q1, the overall median is Q2, and the median of the upper half is Q3.
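
A sketch of the exclusive method in Python, assuming at least two data points. The helper and function names are illustrative; statistical software often uses other quartile conventions, so results may differ.

```python
def _median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3) using the exclusive method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count, skip the median itself when forming the upper half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return _median(lower), _median(ordered), _median(upper)

print(quartiles([1, 2, 3, 4, 5, 6, 7]))  # hypothetical data
```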

28
Q

What is the mode?

A

The mode is the most common data value in a set. It is a measure of center and is affected very little by skew. The mode can also be found for non-numerical data sets.

29
Q

What is skew?

A

Skew describes the asymmetry in the shape of a distribution: one tail is drawn out farther than the other.

30
Q

How do you determine the direction of skew?

A

The skew is in the direction of the longer tail, that is, the end that is drawn out more than the other.

31
Q

How does skew affect the measures of center?

A

Skew affects mean the most, median less than mean, and mode the least.

32
Q

How does left skew affect the measures of center?

A
33
Q

How does right skew affect the measures of center?

A
34
Q

What is a 5 point summary?

A

The 5 point summary consists of the minimum, Q1, median, Q3, and maximum. It shows the shape, spread, and center of the data set.

35
Q

What is a box plot?

A

A box plot is a graphical way to display the 5 point summary. Each of its four sections (the two whiskers and the two halves of the box) contains about 25% of the data, though this can differ if the whiskers stop at the lower and upper limits rather than at the minimum and maximum. The box plot is placed over a number line and scaled to fit to show the distribution.

36
Q

What does a box plot tell us about the data?

A

Box plots tell us about the shape, spread, and center of the data.

37
Q

What are the important features of a box plot?

A

The whiskers extend to either the lower and upper limits or the minimum and maximum values. The box is bounded by Q1 and Q3, with a line at the median. The other important feature is any outliers, which are marked with dots.

38
Q

What is the IQR?

A

The interquartile range is the difference between Q3 and Q1. It shows how spread out the middle 50% of the data set is.

39
Q

What is the lower bound of a data set based on IQR?

A

Q1-1.5(IQR)

40
Q

What is the upper bound of a data set based on IQR?

A

Q3+1.5(IQR)
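
The lower and upper bounds can be sketched together in Python. The quartiles are passed in rather than computed, and the function names and data are illustrative; values outside the bounds are the ones usually flagged as outliers.

```python
def iqr_bounds(q1, q3):
    """Return (lower bound, upper bound) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(values, q1, q3):
    """Values that fall below the lower bound or above the upper bound."""
    low, high = iqr_bounds(q1, q3)
    return [v for v in values if v < low or v > high]

# Hypothetical data with Q1 = 2 and Q3 = 6.
print(iqr_bounds(2, 6))
print(outliers([1, 2, 3, 4, 5, 6, 30], 2, 6))
```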

41
Q

What are the three most important things about single variable data to describe?

A

Shape, center, and spread.

42
Q

What must every graph have?

A

Title, labels, scales

43
Q

What is variance?

A

Variance is the average of the squared deviations from the mean.

44
Q

Why is variance squared?

A

Squaring the deviations when calculating variance does two major things: it eliminates the negatives, and it makes larger deviations play a bigger role in the calculation.

45
Q

What is the formula for variance in a sample?

A
46
Q

What is the formula for variance in a population?

A
47
Q

What is a sample?

A

A sample is a subset of the total population.

48
Q

What is a population?

A

The population is the total possible data set.

49
Q

What is the difference between variance in a sample vs variance in a population?

A

The variance of a sample must be corrected for the error introduced by sampling, so the sum of the squared deviations is divided by n - 1. The population variance, on the other hand, does not need to be corrected, so the sum is divided by n.
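
The two divisors can be compared in a short Python sketch. It assumes the standard definitions (sum of squared deviations from the mean, divided by n - 1 for a sample and by n for a population); the function name, the `sample` flag, and the data are illustrative.

```python
import math

def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or n (population)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(variance(data, sample=False))             # population variance
print(variance(data))                           # sample variance, slightly larger
print(math.sqrt(variance(data, sample=False)))  # population standard deviation
```

Dividing by the smaller number n - 1 makes the sample variance a little larger than the population formula would give for the same data.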

50
Q

What is standard deviation?

A

The standard deviation is the square root of variance.

51
Q

Why use standard deviation over variance?

A

Variance and standard deviation are both perfectly fine measures of spread, but variance is in squared units, while standard deviation is in the same units as the data.

52
Q

What two major things does standard deviation say about data/data points?

A

Standard deviation tells us how spread out the data is (variability) and how far data points are from the center of the distribution.

53
Q

What is a z-score?

A

A z-score is the number of standard deviations a data point is away from the mean.

54
Q

Why is a z-score a better measure of how far data is spread?

A

What counts as “far” changes based on the data and how the data is distributed: what is far for one data set may not be far for another. A z-score gives an analytical way to quantify how far a data point is from the mean without regard to the nuances of the individual data set.

55
Q

How is z-score calculated?

A
56
Q

What does the sign of z-score tell us?

A

If the z-score is negative, the data point is that number of standard deviations LOWER than the mean. If the z-score is positive, then the data point is that number of standard deviations HIGHER than the mean.
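
A minimal Python sketch of the calculation and its sign, assuming the standard definition z = (x - mean) / standard deviation. The function name and numbers are illustrative.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

# Hypothetical distribution with mean 5 and standard deviation 2.
print(z_score(7, 5, 2))  # above the mean, so positive
print(z_score(1, 5, 2))  # below the mean, so negative
```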

57
Q

When does standard deviation work best?

A

Standard deviation works best in symmetric or normally distributed data. It is heavily affected by skew, which decreases its reliability.

58
Q

When does standard deviation work less well?

A

In skewed data sets, outliers have an outsized effect on the standard deviation, so it is a less accurate measure of spread.

59
Q

What can standard deviation tell us about all data?

A

At least 75% of any data set will be within 2 standard deviations of the mean, at least 89% within 3, and at least 95% within 4.5 (Chebyshev's rule).
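
These guarantees come from Chebyshev's inequality: for any k greater than 1, at least 1 - 1/k² of the data lies within k standard deviations of the mean. A small Python sketch (the function name is illustrative):

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4, 4.5):
    print(k, round(chebyshev_lower_bound(k), 4))
```

The bound is a minimum that holds for every distribution, so real data sets usually have more of their values within each range.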

60
Q

What does standard deviation tell us about normally distributed data?

A

About 68% of the data will be within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.

61
Q

What is a statistic?

A

A statistic is any calculation made on a sample that is used to analyze the data.

62
Q

What is the law of large numbers?

A

As the number of samples or the sample size increases, the statistics of the data get closer and closer to the true values of the population.

63
Q

What is the difference between population mean and sample mean?

A

The population mean is the mean of the entire population; the sample mean is the mean of only a sample drawn from it.

64
Q

What is a sampling distribution?

A

The sampling distribution is the distribution that the relative frequency distribution of a statistic approaches as more samples are taken.

65
Q

What is standard error?

A

Because a sample is not a perfect representation of a population, errors occur. The typical size of this error, measured as the standard deviation of the sample statistic, is called the standard error.

66
Q

What is the range?

A

The range is the difference between the largest value in a set and the smallest value in a set.

67
Q

What makes the mean, median, mode, and standard deviation different in a frequency table or a histogram?

A

You can’t calculate the exact values because the individual data points are not shown; you can only estimate them and identify general trends.

68
Q

What is the formula for the mean of a nonspecific frequency distribution?

A

The sum, over all intervals, of each interval's frequency times the midpoint of that interval, divided by the sum of all of the frequencies in the distribution.
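
A minimal Python sketch of this weighted calculation, assuming each interval is represented by its midpoint. The function name and the sample table are illustrative.

```python
def grouped_mean(midpoints, frequencies):
    """Estimate the mean as sum(frequency * midpoint) / sum(frequency)."""
    total = sum(frequencies)
    weighted_sum = sum(m * f for m, f in zip(midpoints, frequencies))
    return weighted_sum / total

# Hypothetical table: classes 0-10, 10-20, 20-30 with frequencies 1, 2, 1.
print(grouped_mean([5, 15, 25], [1, 2, 1]))
```

Because every value in an interval is treated as if it sat at the midpoint, the result is an estimate rather than the exact mean.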