Descriptive Statistics Flashcards

Question 1

Q

What is a stem and leaf plot?

Answer

A

A stem and leaf plot is a way to visualize smaller data sets. Each value is split into a leaf, which is its smallest significant digit, and a stem, which is the rest of the number. It helps show trends and the general center of the data without losing any exact values.
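Here is a minimal Python sketch of the idea. It assumes non-negative whole-number data where the leaf is the ones digit and the stem is everything before it. The function names and the sample scores are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return {stem: [leaves]}, where leaf = ones digit and stem = the rest.

    Assumes non-negative whole numbers.
    """
    plot = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    # Keep empty stems between the smallest and largest so gaps stay visible.
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}


def print_plot(plot):
    for stem, leaves in plot.items():
        print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")


scores = [72, 85, 91, 68, 77, 85, 94, 73, 88, 62]  # hypothetical data
print_plot(stem_and_leaf(scores))
```

Because every leaf is kept, each original value can be read back from the plot.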

Question 2

Q

What is an outlier?

Answer

A

An outlier is a value that does not fit with the rest of the data set, either much larger or much smaller than the rest.

Question 3

Q

How can a stem and leaf plot represent two different data sets?

Answer

A

Two data sets can share one stem and leaf plot (a back-to-back plot) when they use the same stems. The stems run down the middle, and the leaves of each data set are written on opposite sides of the plot.

Question 4

Q

What is relative frequency?

Answer

A

Relative frequency is the frequency of a specific data value divided by the total number of data values, written as a decimal or a percent.

Question 5

Q

What is a frequency table?

Answer

A

Frequency tables display classes or individual data points on one side mapped to their respective frequencies on the other.
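A small Python sketch that builds a frequency table for individual data values and adds the relative frequency (frequency divided by the total count) next to each frequency. The function name and the shoe-size data are hypothetical.

```python
from collections import Counter


def frequency_table(data):
    """Map each distinct value to (frequency, relative frequency)."""
    counts = Counter(data)
    n = len(data)
    return {value: (freq, freq / n) for value, freq in sorted(counts.items())}


shoe_sizes = [7, 8, 8, 9, 9, 9, 10, 10, 11, 9]  # hypothetical data
for size, (freq, rel) in frequency_table(shoe_sizes).items():
    print(f"{size:>4} {freq:>3} {rel:>7.1%}")
```

The relative frequencies always add up to 1 (100%), which is a quick check on the table.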

Question 6

Q

What are the two major types of data?

Answer

A

The two major types of data are discrete (or categorical) data and continuous data.

Question 7

Q

What is discrete or categorical data?

Answer

A

Data whose values are separate and can be counted rather than continuous, like the integers.

Question 8

Q

What is continuous data?

Answer

A

Data that can take any value within a range and so cannot be individually counted, like the real numbers.

Question 9

Q

What is a line graph?

Answer

A

A line graph displays continuous data as points connected by line segments, such as the frequency with which each individual data value occurs.

Question 10

Q

What is a bar graph?

Answer

A

A bar graph represents categorical or discrete data with a separate bar for each category. The categories go on the x axis and the frequency on the y axis, and the bars are separated from each other. Bar graphs make it easy to compare categorical data.

Question 11

Q

How are data classes constructed?

Answer

A

Classes are described by two kinds of markers: class limits and class boundaries. To construct classes, first determine the number of classes, then the class width, the class limits, the class boundaries, and the class midpoints.
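The construction steps can be sketched in Python for whole-number data. The width rule here (range divided by the number of classes, rounded up, plus 1 if it divides evenly so the maximum still fits) and the 0.5 gap between limits and boundaries are common textbook conventions assumed for this example, not taken from the card. The function name and data are hypothetical.

```python
import math


def build_classes(data, num_classes):
    """Sketch of class construction for whole-number data.

    Assumed conventions: width = ceil(range / num_classes), increased by 1
    when the division is exact; boundaries lie 0.5 outside the limits.
    """
    low, high = min(data), max(data)
    width = math.ceil((high - low) / num_classes)
    if (high - low) % num_classes == 0:
        width += 1  # otherwise the maximum would fall past the last class
    classes = []
    for i in range(num_classes):
        lower_limit = low + i * width
        upper_limit = lower_limit + width - 1
        classes.append({
            "limits": (lower_limit, upper_limit),
            "boundaries": (lower_limit - 0.5, upper_limit + 0.5),
            "midpoint": (lower_limit + upper_limit) / 2,
        })
    return classes


data = [5, 12, 18, 23, 27, 31, 36, 44]  # hypothetical data
for c in build_classes(data, 4):
    print(c["limits"], c["boundaries"], c["midpoint"])
```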

Question 12

Q

What is a histogram?

Answer

A

A histogram displays continuous data as adjacent bars, one for each class, with each bar bounded by its class boundaries. The class boundaries go on the x axis and the frequency or relative frequency on the y axis.

Question 13

Q

What is a frequency polygon?

Answer

A

Frequency polygons are line graphs that plot the midpoint of each data class (x axis) against the frequency of that class (y axis).

Question 14

Q

What is a paired data set?

Answer

A

Paired data sets are 1:1 data sets in which each point in one set maps to exactly one point in the other. An example is data collected over time, where each time value is paired with the data point collected at that time.

Question 15

Q

What is a time series graph?

Answer

A

A time series graph plots a paired data set collected over time. Typically the data value is on the y axis and the time interval is on the x axis.

Question 16

Q

What is the mean?

Answer

A

The mean is the “average” of the data set. It gives a measure of center, but it is strongly affected by skew.

Question 17

Q

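How is the mean calculated in a full data set?

Answer

A

Add up all of the data values and divide by the number of values: x̄ = Σx / n for a sample, or μ = Σx / N for a population.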

Question 18

Q

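How is the mean calculated in classes or frequency tables?

Answer

A

Multiply each class midpoint by the frequency of its class, add these products, and divide by the sum of the frequencies: x̄ = Σ(f × m) / Σf, where m is the class midpoint and f is the class frequency. Because every value in a class is treated as if it were the midpoint, the result is an estimate of the true mean.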
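A short Python sketch of both calculations: the ordinary mean of raw data, and the grouped estimate that multiplies each class midpoint by its frequency and divides by the total frequency. The function names and the numbers are hypothetical.

```python
def mean(data):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(data) / len(data)


def grouped_mean(midpoints, frequencies):
    """Estimated mean from a frequency table: sum(f * m) / sum(f).

    Each class midpoint stands in for every value in its class, so this is
    an approximation of the mean of the raw data.
    """
    total = sum(f * m for m, f in zip(midpoints, frequencies))
    return total / sum(frequencies)


print(mean([4, 8, 15, 16, 23, 42]))                         # 18.0
print(grouped_mean([9.5, 19.5, 29.5, 39.5], [2, 3, 4, 1]))  # 23.5
```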

Question 19

Q

What is an outlier?

Answer

A

A value that is significantly greater than or less than most of the data.

Question 20

Q

What are percentiles?

Answer

A

Percentiles are values that divide an ordered data set into 100 equal parts. For example, if 170 is the 60th percentile, about 60 percent of the values are less than 170 and about 40 percent are greater. Whether the value itself is counted depends on whether the inclusive or exclusive method is specified.

Question 21

Q

How do you find a percentile of a data set?

Answer

A

To find the index of the kth percentile in an ordered data set, use i = (k / 100)(n + 1), where i is the index (position), k is the percentile, and n is the total number of data points. If i is an integer, the value at position i is the percentile. If i is not an integer, take the values at the positions just below and just above i and average them; this average is the percentile.
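A Python sketch of the position rule i = (k / 100)(n + 1), a common textbook convention for this card's variables. Other software (spreadsheets, NumPy) uses different interpolation rules and can give slightly different answers. Clamping positions to the ends of the data is an added assumption, and the function name is hypothetical.

```python
import math


def kth_percentile(data, k):
    """kth percentile via the position rule i = (k / 100) * (n + 1).

    Whole-number i: return the value at position i (1-based).
    Otherwise: average the values at the positions just below and above i.
    Positions are clamped to 1..n (an assumption for extreme k).
    """
    values = sorted(data)
    n = len(values)
    i = k * (n + 1) / 100  # integer numerator keeps exact cases exact
    lo = min(max(math.floor(i), 1), n)
    hi = min(max(math.ceil(i), 1), n)
    return (values[lo - 1] + values[hi - 1]) / 2


d = [2, 4, 5, 7, 8, 10, 11, 13, 15, 18]  # hypothetical data
print(kth_percentile(d, 25), kth_percentile(d, 50), kth_percentile(d, 90))  # 4.5 9.0 16.5
```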

Question 22

Q

How do you find the percentile of a data point?

Answer

A

Given a specific data value, its percentile is ((x + 0.5y) / n)(100), where x is the number of values in the data set below that value (not including the value itself), y is the number of times the value occurs, and n is the total size of the data set.
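The percentile-of-a-value formula translates directly into Python. The function name is hypothetical; x, y, and n mean the same as on the card.

```python
def percentile_of(data, value):
    """Percentile of `value`: ((x + 0.5 * y) / n) * 100.

    x = number of values strictly below `value`
    y = number of values equal to `value`
    n = total number of values
    """
    x = sum(1 for v in data if v < value)
    y = sum(1 for v in data if v == value)
    return 100 * (x + 0.5 * y) / len(data)


d = [2, 4, 5, 7, 8, 10, 11, 13, 15, 18]  # hypothetical data
print(percentile_of(d, 10))  # 55.0
```

Counting repeated values as half below and half above keeps ties from being pushed entirely to one side.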

Question 23

Q

What is the median?

Answer

A

The median is a measure of center that splits the ordered data set exactly in half. To find it, order the data and locate position (n + 1) / 2. If that position is not an integer (n is even), the median is the average of the two values on either side of it. The median is also quartile 2 and the 50th percentile. The median is less affected by skew than the mean.

Question 24

Q

Is the median included when calculating quartiles?

Answer

A

It depends on the method (inclusive or exclusive). AP Statistics uses the exclusive method, which leaves the median out of both halves.

Question 25

Q

Where does data go if it falls on a class boundary?

Answer

A

If a value falls exactly on a class's lower boundary, it belongs to that class. If it falls on a class's upper boundary (which is also the lower boundary of the next class), it belongs to the next class up.
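One way to apply this rule in Python is to treat each class as including its lower boundary and excluding its upper one. The standard-library `bisect.bisect_right` then places a value sitting on a shared boundary in the higher class. The function name and the class boundaries are hypothetical.

```python
import bisect


def assign_class(value, lower_boundaries):
    """Index of the class containing `value`.

    Classes include their lower boundary and exclude their upper one, so a
    value on a shared boundary goes to the next class up. `lower_boundaries`
    must be sorted; values below the first class return -1, and values past
    the last class are not checked here.
    """
    return bisect.bisect_right(lower_boundaries, value) - 1


lows = [0, 10, 20, 30]  # hypothetical classes: 0-10, 10-20, 20-30, 30-40
for v in (0, 9.5, 10, 20, 25):
    print(v, "-> class", assign_class(v, lows))
```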

Question 26

Q

What are quartiles?

Answer

A

Quartiles are values that divide the data set into quarters. Q1 is the 25th percentile, Q2 is the median or 50th percentile, and Q3 is the 75th percentile.

Question 27

Q

How do you find quartiles?

Answer

A

Order the data and split it in half at the median. The median of the lower half is Q1 and the median of the upper half is Q3; typically the median itself is left out of both halves. These values are Q1, the median (Q2), and Q3.
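A Python sketch of the exclusive method, returning the full 5 point summary along with the range and IQR. It uses the standard-library `statistics.median`. The function name and data are hypothetical.

```python
from statistics import median


def five_number_summary(data):
    """(min, Q1, median, Q3, max) using the exclusive quartile method.

    With an odd count the median is left out of both halves; with an even
    count the data splits cleanly in two. Needs at least 2 values.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 else values[half:]
    return values[0], median(lower), median(values), median(upper), values[-1]


summary = five_number_summary([1, 3, 4, 6, 7, 9, 12])  # hypothetical data
print(summary)                                  # (1, 3, 6, 9, 12)
lo, q1, med, q3, hi = summary
print("range:", hi - lo, "IQR:", q3 - q1)       # range: 11 IQR: 6
```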

Question 28

Q

What is the mode?

Answer

A

The mode is the most common value in a data set. It is a measure of center and is affected very little by skew. Unlike the mean and median, the mode can be found for non-numerical data sets.
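Python's standard-library `statistics.multimode` shows both points on this card. It returns every value tied for the highest count, and it works on non-numerical data as well as numbers. The data below is hypothetical.

```python
from statistics import multimode

print(multimode([3, 7, 7, 2, 9, 7, 3]))                   # [7]
print(multimode(["red", "blue", "red", "green", "blue"]))  # ['red', 'blue']
```

Returning a list rather than a single value also handles data with more than one mode.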

Question 29

Q

What is skew?

Answer

A

Skew is the shape of asymmetrical data: one tail of the distribution is stretched out farther than the other.

Question 30

Q

How do you determine the direction of skew?

Answer

A

The end that is drawn out more than the other is the direction of the skew. A longer left tail means the data is skewed left; a longer right tail means it is skewed right.

Question 31

Q

How does skew affect the measures of center?

Answer

A

Skew affects mean the most, median less than mean, and mode the least.
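A quick Python check with hypothetical right-skewed data (for example, incomes in thousands) shows the ordering on this card. The long right tail pulls the mean well above the median, and the mode stays lowest. Removing the single largest value moves the mean far more than the median.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: most values small, a few very large.
incomes = [30, 32, 32, 35, 38, 40, 45, 60, 120, 250]
print("mode:", mode(incomes), "median:", median(incomes), "mean:", mean(incomes))

# Drop the largest value and compare how much each measure moves.
trimmed = incomes[:-1]
print("median change:", median(incomes) - median(trimmed))
print("mean change:", mean(incomes) - mean(trimmed))
```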

Question 32

Q

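How does left skew affect the measures of center?

Answer

A

The long left tail pulls the mean down the most and the median less, so in left-skewed data typically mean < median < mode.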

Question 33

Q

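How does right skew affect the measures of center?

Answer

A

The long right tail pulls the mean up the most and the median less, so in right-skewed data typically mode < median < mean.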

Question 34

Q

What is a 5 point summary?

Answer

A

The 5 point summary consists of the minimum, Q1, median, Q3, and maximum. It shows the shape, spread, and center of the data set.

Question 35

Q

What is a box plot?

Answer

A

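Box plots are a graphical way to display the 5 point summary. The plot has four sections (the lower whisker, the two halves of the box, and the upper whisker), each holding about 25% of the data; this may differ when the whiskers end at the lower and upper limits rather than at the minimum and maximum. The box plot is placed over a number line and scaled to show the distribution.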

Question 36

Q

What does a box plot tell us about the data?

Answer

A

Box plots tell us about the shape, spread, and center of the data.

Question 37

Q

What are the important features of a box plot?

Answer

A

The whiskers reach either to the most extreme values inside the lower and upper limits or to the minimum and maximum values. The box is bounded by Q1 and Q3 and divided by a line at the median. The other important feature is any outliers, which are marked with dots.

Question 38

Q

What is the IQR?

Answer

A

The interquartile range (IQR) is the difference between Q3 and Q1. It shows how spread out the middle 50% of the data is.

Question 39

Q

What is the lower bound of a data set based on IQR?

Answer

A

Q1 - 1.5(IQR). Values below this bound are considered outliers.

Question 40

Q

What is the upper bound of a data set based on IQR?

Answer

A

Q3 + 1.5(IQR). Values above this bound are considered outliers.
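A Python sketch that computes both IQR bounds and flags any value outside them, using exclusive-method quartiles. The function name and data are hypothetical.

```python
from statistics import median


def iqr_fences(data):
    """Return (lower_bound, upper_bound, outliers) using the 1.5 * IQR rule.

    Quartiles use the exclusive method: with an odd count the median is
    left out of both halves.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    q1 = median(values[:half])
    q3 = median(values[half + 1:] if n % 2 else values[half:])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


print(iqr_fences([1, 3, 4, 6, 7, 9, 12, 30]))  # (-7.0, 21.0, [30])
```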

Question 41

Q

What are the three most important things about single variable data to describe?

Answer

A

Shape, center, and spread.

Question 42

Q

What must every graph have?

Answer

A

Title, labels, scales

Question 43

Q

What is variance?

Answer

A

Variance is the average of the squared deviations from the mean.

Question 44

Q

Why is variance squared?

Answer

A

Squaring the deviations when calculating variance does two major things: it eliminates negative deviations, which would otherwise cancel out the positive ones, and it makes larger deviations play a bigger role in the calculation.

Question 45

Q

What is the formula for variance in a sample?

Answer

A

s² = Σ(x - x̄)² / (n - 1), where x̄ is the sample mean and n is the sample size.

Question 46

Q

What is the formula for variance in a population?

Answer

A

σ² = Σ(x - μ)² / N, where μ is the population mean and N is the population size.

Question 47

Q

What is a sample?

Answer

A

A sample is a subset of the total population.

Question 48

Q

What is a population?

Answer

A

The population is the total possible data set.

Question 49

Q

What is the difference between variance in a sample vs variance in a population?

Answer

A

Sample variance must be corrected for the error that comes from using a sample, so the sum of squared deviations is divided by n - 1. Population variance does not need this correction, so it is divided by the population size (N).
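A Python sketch of the two divisors side by side, cross-checked against the standard library's `statistics.variance` (sample, n - 1) and `statistics.pvariance` (population, N). The helper names and data are hypothetical.

```python
import math
from statistics import pvariance, variance


def sample_variance(data):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / (len(data) - 1)


def population_variance(data):
    """sigma^2: sum of squared deviations from the mean, divided by N."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / len(data)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean 5
print(sample_variance(data), variance(data))        # same value, n - 1
print(population_variance(data), pvariance(data))   # same value, N
print(math.sqrt(population_variance(data)))         # population SD: 2.0
```

Dividing by n - 1 makes the sample variance slightly larger, which offsets a sample's tendency to understate the population's spread.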

Question 50

Q

What is standard deviation?

Answer

A

The standard deviation is the square root of variance.

Question 51

Q

Why use standard deviation over variance?

Answer

A

Variance and standard deviation are both perfectly fine measures of spread, but variance is in squared units, while standard deviation is in the same units as the data.

Question 52

Q

What two major things does standard deviation say about data/data points?

Answer

A

Standard deviation tells us how spread out the data is (variability) and how far data points are from the center of the distribution.

Question 53

Q

What is a z-score?

Answer

A

A z-score is the number of standard deviations a data point is away from the mean.

Question 54

Q

Why is a z-score a better measure of how far data is spread?

Answer

A

What counts as “far” changes based on the data and how it is distributed; what is far in one data set may not be far in another. A z-score measures distance in standard deviations, which gives a consistent way to quantify how far apart data points are, regardless of the scale of the individual data set.

Question 55

Q

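How is z-score calculated?

Answer

A

Subtract the mean from the data value and divide by the standard deviation: z = (x - μ) / σ for a population, or z = (x - x̄) / s for a sample.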

Question 56

Q

What does the sign of z-score tell us?

Answer

A

If the z-score is negative, the data point is that number of standard deviations LOWER than the mean; if the z-score is positive, then the data point is that number of standard deviations HIGHER than the mean.
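A Python sketch comparing scores from two hypothetical classes. Each class is treated as a whole population, so `statistics.pstdev` is used. An 88 in a tightly grouped class turns out to be more unusual than an 85 in a widely spread one.

```python
from statistics import mean, pstdev


def z_score(x, mu, sigma):
    """Standard deviations between x and the mean (negative = below, positive = above)."""
    return (x - mu) / sigma


# Hypothetical class scores, each treated as a full population.
math_scores = [60, 70, 75, 80, 90]     # mean 75, SD 10
history_scores = [82, 84, 85, 86, 88]  # mean 85, SD 2

z_math = z_score(85, mean(math_scores), pstdev(math_scores))
z_history = z_score(88, mean(history_scores), pstdev(history_scores))
print(z_math, z_history)  # 1.0 1.5
```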

Question 57

Q

When does standard deviation work best?

Answer

A

Standard deviation works best for symmetric or normally distributed data. It is heavily affected by skew, which decreases its reliability.

Question 58

Q

When does standard deviation work less well?

Answer

A

In skewed data sets, outliers have an outsized effect on the standard deviation, so it is a less accurate measure of spread.

Question 59

Q

What can standard deviation tell us about all data?

Answer

A

By Chebyshev's theorem, at least 75% of any data set lies within 2 standard deviations of the mean, at least about 89% within 3, and at least about 94% within 4.
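The Chebyshev percentages come from the bound 1 - 1/k², where k is the number of standard deviations. A Python sketch computes the bound and checks it against a hypothetical skewed data set. The function name and data are made up for the example.

```python
from statistics import mean, pstdev


def chebyshev_min_fraction(k):
    """Chebyshev's theorem: at least 1 - 1/k^2 of any data set lies within
    k standard deviations of the mean (meaningful for k > 1)."""
    return 1 - 1 / k ** 2


for k in (2, 3, 4):
    print(k, f"{chebyshev_min_fraction(k):.2%}")

# Hypothetical skewed data: the bound still holds.
data = [1, 1, 2, 2, 3, 3, 4, 5, 8, 21]
m, s = mean(data), pstdev(data)
within_2 = sum(abs(x - m) <= 2 * s for x in data) / len(data)
print(within_2)
```

The theorem only gives a floor; real data sets often have a larger share inside each range.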

Question 60

Q

What does standard deviation tell us about normally distributed data?

Answer

A

About 68% of the data will be within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3 (the empirical rule).
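The empirical-rule percentages can be reproduced with the standard library's `statistics.NormalDist` by taking the area between mean - k·SD and mean + k·SD. The helper name is hypothetical.

```python
from statistics import NormalDist


def fraction_within(k, dist=NormalDist()):
    """Share of a normal distribution within k standard deviations of its mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)


for k in (1, 2, 3):
    print(k, f"{fraction_within(k):.1%}")
```

The result does not depend on the particular mean and standard deviation, which is why the rule applies to every normal distribution.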

Question 61

Q

What is a statistic?

Answer

A

A statistic is any numerical value calculated from a sample that is used to analyze the data, often to estimate a population value.

Question 62

Q

What is the law of large numbers?

Answer

A

As the sample size (the number of observations) increases, the statistics of the data, such as the sample mean, get closer and closer to the true values of the population.
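A small simulation illustrates the law of large numbers. It rolls a fair die more and more times with Python's `random` module and watches the sample mean approach the population mean of 3.5. The function name is hypothetical, and a fixed seed makes the run repeatable.

```python
import random


def mean_of_die_rolls(n, seed=0):
    """Mean of n simulated fair-die rolls (population mean is 3.5)."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n)) / n


for n in (10, 100, 10_000, 100_000):
    print(n, mean_of_die_rolls(n))
```

Small runs can land far from 3.5, while large runs settle close to it.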

Question 63

Q

What is the difference between population mean and sample mean?

Answer

A

The population mean (μ) is calculated from the entire population; the sample mean (x̄) is calculated from only a sample and is used to estimate the population mean.

Question 64

Q

What is a sampling distribution?

Answer

A

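A sampling distribution is the distribution of a statistic (such as the sample mean) across many samples of the same size. It is the distribution that the relative frequency distribution of that statistic approaches as more and more samples are taken.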
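A Python simulation sketches a sampling distribution. It draws many random samples of 25 from a hypothetical population (the numbers 1 to 100) and records each sample mean. The collection of means centers on the population mean but is much less spread out than the population itself. The function name and all parameters are assumptions for the example.

```python
import random
from statistics import mean, pstdev


def sampling_distribution_of_mean(population, sample_size, num_samples, seed=0):
    """Means of many random samples (drawn without replacement)."""
    rng = random.Random(seed)
    return [mean(rng.sample(population, sample_size)) for _ in range(num_samples)]


population = list(range(1, 101))  # hypothetical population: 1..100
means = sampling_distribution_of_mean(population, 25, 5_000)
print(mean(population), mean(means))      # centers nearly match
print(pstdev(population), pstdev(means))  # sample means vary much less
```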

Answer 58

A

Because a sample is not a perfect representation of a population, sampling error occurs. The typical size of this error (the standard deviation of a statistic's sampling distribution) is called the standard error.

Answer 59

A

The range is the difference between the largest value in a set and the smallest value in a set.

Answer 60

A

You can't calculate the exact values; you can only identify general trends.

Answer 61

A

Multiply each class frequency by the midpoint of its class, add these products, and divide by the sum of all of the frequencies in the distribution.

Brainscape's Knowledge Genome™
