2.4-2.8 Flashcards

1
Q

What does knowledge of a data set’s variability and center help us to do?

A

It can help us visualize the shape of the data set, as well as its extreme values.

2
Q

What is the range?

A

The range of a quantitative data set is equal to the largest measurement minus the smallest measurement.

3
Q

What is a con of range? Why is this a con?

A

The range is a rather insensitive measure of data variation when the data sets are large. This is because two data sets can have the same range yet differ vastly with respect to data variation.
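A small sketch of this weakness, using two hypothetical data sets (`clustered`, `spread_out`) and a hypothetical helper `data_range`. Both sets have range 10, but the sample standard deviation of the second is twice that of the first.

```python
import statistics

def data_range(data):
    """Range = largest measurement minus smallest measurement."""
    return max(data) - min(data)

# Hypothetical data sets with the same range (10 - 0 = 10)
clustered = [0, 5, 5, 5, 5, 5, 5, 10]       # most values sit at the center
spread_out = [0, 0, 0, 0, 10, 10, 10, 10]   # values pile up at the extremes

for name, data in [("clustered", clustered), ("spread_out", spread_out)]:
    print(name, "range =", data_range(data),
          "s =", round(statistics.stdev(data), 2))
```

The range only looks at the two extreme values, so it cannot see how the rest of the measurements are distributed.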

4
Q

What is sample variance?

A

The sample variance for a sample of n measurements is equal to the sum of the squared deviations from the mean, divided by (n - 1). The symbol s^2 is used to represent the sample variance: s^2 = Σ(x - x̄)^2 / (n - 1).
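A minimal sketch of the definition, assuming a hypothetical helper `sample_variance` and made-up measurements. Python's `statistics.variance` also divides by (n - 1), so it serves as a cross-check.

```python
import statistics

def sample_variance(data):
    """s^2 = sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two measurements")
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)

sample = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical measurements, mean = 5
print(sample_variance(sample))       # 32 / 7, about 4.571
print(statistics.variance(sample))   # library version, same divisor
```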

5
Q

What does it mean if the deviations are mostly small?

A

The data are clustered around the mean, x̄, and therefore do not exhibit much variability.

6
Q

What is the sample standard deviation?

A

The sample standard deviation, s, is defined as the positive square root of the sample variance
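A short sketch, assuming a hypothetical helper `sample_std_dev` and made-up data: take the positive square root of s^2 and compare it with the library's `statistics.stdev`, which uses the same (n - 1) divisor.

```python
import math
import statistics

def sample_std_dev(data):
    """s = positive square root of the sample variance s^2."""
    return math.sqrt(statistics.variance(data))

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements; s^2 = 32/7
print(sample_std_dev(sample))       # about 2.138
print(statistics.stdev(sample))     # library equivalent
```

Because s is in the same units as the measurements (s^2 is in squared units), it is usually the easier of the two to interpret.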

7
Q

What symbol represents a population variance?

A

sigma^2 (σ^2). The symbol σ alone represents the population standard deviation.

8
Q

Why do we use n-1 when calculating sample variance instead of just n?

A

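Dividing by n tends to produce an underestimate of the population variance, sigma^2. Dividing by (n - 1) corrects for this, so that on average s^2 equals sigma^2.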
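A sketch that checks this claim exactly rather than by random simulation. It assumes a hypothetical population [1, 2, 3, 4] and enumerates every ordered sample of size 2 drawn with replacement, using exact fractions. Averaged over all samples, the n divisor gives 5/8 while the (n - 1) divisor recovers sigma^2 = 5/4.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]                                  # hypothetical population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N        # population variance = 5/4

def variance_with_divisor(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / divisor

n = 2
samples = list(product(population, repeat=n))              # all 16 samples, with replacement

avg_n = sum(variance_with_divisor(s, n) for s in samples) / len(samples)
avg_n_minus_1 = sum(variance_with_divisor(s, n - 1) for s in samples) / len(samples)

print("sigma^2           =", sigma2)          # 5/4
print("average using n   =", avg_n)           # 5/8, too small
print("average using n-1 =", avg_n_minus_1)   # 5/4, matches sigma^2
```

In general the n divisor is off by a factor of (n - 1)/n, which matters most for small samples.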

9
Q

What does the value of standard deviation indicate?

A

The larger the standard deviation, the more variable the data are. The smaller the standard deviation, the less variation there is in the data.

10
Q

What is Chebyshev’s rule?

A

a. It is possible that very few of the measurements will fall within one standard deviation of the mean
b. At least 3/4 of the measurements will fall within two standard deviations of the mean
c. At least 8/9 of the measurements will fall within three standard deviations of the mean
d. Generally, for any number k greater than 1, at least (1 - 1/k^2) of the measurements will fall within k standard deviations of the mean, that is, in the interval (x̄ - ks, x̄ + ks)
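A sketch comparing Chebyshev's guaranteed minimum with the observed fraction for a deliberately skewed, hypothetical data set. The helper names `chebyshev_lower_bound` and `fraction_within` are illustrative only. The observed fractions can be much higher than the bound; the rule only promises "at least".

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum fraction of measurements within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is informative only for k > 1")
    return 1 - 1 / k ** 2

def fraction_within(data, k):
    """Observed fraction of measurements in (xbar - k*s, xbar + k*s)."""
    xbar = statistics.mean(data)
    s = statistics.stdev(data)
    inside = [x for x in data if xbar - k * s < x < xbar + k * s]
    return len(inside) / len(data)

data = [1, 1, 2, 2, 2, 3, 3, 4, 5, 8, 13, 40]   # hypothetical, right-skewed
for k in (2, 3):
    print(f"k={k}: at least {chebyshev_lower_bound(k):.3f}, "
          f"observed {fraction_within(data, k):.3f}")
```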

11
Q

What is the empirical rule?

A

a. Approximately 68% of the measurements will fall within one standard deviation of the mean
b. Approximately 95% of the measurements will fall within two standard deviations of the mean
c. Approximately 99.7% (essentially all) of the measurements will fall within three standard deviations of the mean
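The empirical rule's percentages come from the normal (bell-shaped) curve. Assuming a mound-shaped data set is well approximated by a normal distribution, a short sketch with the error function `math.erf` reproduces them; `normal_fraction_within` is a hypothetical helper name.

```python
import math

def normal_fraction_within(k):
    """P(-k < Z < k) for a standard normal Z."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sd: {normal_fraction_within(k):.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```

Real mound-shaped data only approximate this curve, which is why the rule says "approximately".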

12
Q

What sets of data do Chebyshev’s rule and the empirical rule apply to?

A

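Chebyshev's rule: any data set, regardless of shape
Empirical rule: mound-shaped (approximately symmetric, bell-shaped) data sets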

13
Q

What are measures of relative standing?

A

Descriptive measures of the relationship of a measurement to the rest of the data

14
Q

What is the pth percentile?

A

For any set of n measurements (arranged in ascending or descending order), the pth percentile is a number such that p% of the measurements fall below that number and (100-p)% fall above it.
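A minimal sketch of the "p% fall below" idea, assuming hypothetical exam scores and a hypothetical helper `percentile_rank`. Textbooks and software handle ties and interpolation differently, so this is only the simplest reading of the definition.

```python
def percentile_rank(data, value):
    """Percentage of measurements that fall strictly below `value`."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

scores = [55, 60, 62, 70, 71, 75, 80, 85, 90, 95]   # hypothetical exam scores
print(percentile_rank(scores, 85))   # 7 of 10 scores are below 85 -> 70.0
```

So a score of 85 sits at roughly the 70th percentile of this set: 70% of the scores fall below it and 30% are at or above it.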

15
Q

Explain the quartiles.

A

The lower quartile (QL) is the 25th percentile of a data set. The middle quartile (M) is the median or 50th percentile. The upper quartile (QU) is the 75th percentile.
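A sketch using the standard library's `statistics.quantiles` with hypothetical data. Its default method, "exclusive", locates the quartiles at positions based on (n + 1); other textbooks and software packages use slightly different conventions, so hand-computed answers may differ a little.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8]                  # hypothetical measurements
q_l, m, q_u = statistics.quantiles(data, n=4)    # QL, M, QU
print(q_l, m, q_u)   # 2.25 4.5 6.75
```

The middle value returned is the median, so it agrees with `statistics.median(data)`.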

16
Q

What is the z-score for a measurement x? What does it represent?

A

Sample: z = (x - x̄)/s
Population: z = (x - μ)/σ
The final result, the z-score, represents the distance between a given measurement x and the mean, expressed in standard deviations.
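A minimal sketch of the sample z-score, assuming a hypothetical helper `z_score` and made-up measurements with mean 5.

```python
import statistics

def z_score(x, data):
    """Distance from x to the sample mean, measured in sample standard deviations."""
    xbar = statistics.mean(data)
    s = statistics.stdev(data)
    return (x - xbar) / s

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements, mean = 5
print(z_score(9, sample))   # about 1.87: 9 lies almost two standard deviations above the mean
print(z_score(5, sample))   # 0.0: 5 is exactly at the mean
```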

17
Q

What do a large positive, a large negative, and a zero (or near-zero) z-score indicate?

A

A large positive z-score implies that the measurement is larger than almost all other measurements, whereas a large negative z-score indicates that the measurement is smaller than almost every other measurement. If a z-score is 0 or near 0, the measurement is located at or near the mean of the sample or population.

18
Q

What can we expect of z-scores for mound-shaped data?

A

Approximately 68% of the measurements will have a z-score between -1 and 1.

Approximately 95% of the measurements will have a z-score between -2 and 2.

Approximately 99.7% (almost all) of the measurements will have a z-score between -3 and 3.

19
Q

What is an outlier?

A

An observation that is unusually large or small relative to the data values we want to describe

20
Q

When do outliers most often occur?

A

The measurement is observed, recorded, or entered into the computer incorrectly.
The measurement comes from a different population.
The measurement is correct but represents a rare (chance) event

21
Q

What is the interquartile range?

A

The interquartile range (IQR) is the distance between the lower and upper quartiles: IQR = QU - QL.

22
Q

Describe a box plot.

A

A rectangle (the box) is drawn with the ends (the hinges) drawn at the lower and upper quartiles. The median M of the data is shown in the box, usually by a line or a symbol (such as “+”).

23
Q

In a box plot, what are the inner fences? What are the whiskers?

A

The points at distances 1.5(IQR) from each hinge mark the inner fences of the data set. Lines (the whiskers) are drawn from each hinge to the most extreme measurement inside the inner fence.

24
Q

What are the outer fences?

A

A second pair of fences, the outer fences, appears at a distance of 3(IQR) from the hinges. One symbol (e.g., “*”) is used to represent measurements falling between the inner and outer fences, and another (e.g., “0”) is used to represent measurements that lie beyond the outer fences. Note that the outer fences are not shown unless one or more measurements lie beyond them.

25
Q

What does the line inside the box in a box plot represent?

A

The median, which marks the center of the distribution

26
Q

What does it mean if one whisker is longer than the other in a box plot?

A

The data are probably skewed in the direction of the longer whisker.

27
Q

What fraction of the measurements should fall beyond the inner fences?

A

Less than 5%.

28
Q

What should you do before removing an outlier from the data set?

A

Before removing the outliers from the data set, a good analyst will make a concerted effort to find the cause of the outliers

29
Q

What is a rule of thumb for detecting outliers in box plots?

A

Observations falling between the inner and outer fences are deemed suspect outliers. Observations falling beyond the outer fence are deemed highly suspect outliers.
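A sketch of this rule of thumb, assuming hypothetical data and a hypothetical helper `classify_with_fences`. The quartiles come from `statistics.quantiles` (default "exclusive" method); a different quartile convention could shift the fences slightly.

```python
import statistics

def classify_with_fences(data):
    """Label each measurement using inner (1.5 IQR) and outer (3 IQR) fences."""
    q_l, _, q_u = statistics.quantiles(data, n=4)   # quartile convention is an assumption
    iqr = q_u - q_l
    inner_low, inner_high = q_l - 1.5 * iqr, q_u + 1.5 * iqr
    outer_low, outer_high = q_l - 3 * iqr, q_u + 3 * iqr
    labels = []
    for x in data:
        if x < outer_low or x > outer_high:
            labels.append((x, "highly suspect outlier"))
        elif x < inner_low or x > inner_high:
            labels.append((x, "suspect outlier"))
        else:
            labels.append((x, "inside inner fences"))
    return labels

data = [10, 12, 12, 13, 14, 15, 15, 16, 17, 30, 60]   # hypothetical; QL = 12, QU = 17
for x, label in classify_with_fences(data):
    if label != "inside inner fences":
        print(x, label)   # 30 is a suspect outlier, 60 a highly suspect one
```

Here IQR = 5, so the inner fences are 4.5 and 24.5 and the outer fences are -3 and 32.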

30
Q

What is a rule of thumb for detecting outliers using z-scores?

A

Observations with z-scores greater than 3 in absolute value are considered outliers. For some highly skewed data sets, observations with z-scores greater than 2 in absolute value may be outliers.
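A sketch of the z-score rule, assuming a hypothetical helper `z_score_outliers` and made-up data. The `cutoff` parameter defaults to 3; pass 2 for highly skewed data as the card suggests. The example uses 20 values because, in a sample of n measurements, no sample z-score can exceed (n - 1)/√n in absolute value, so with very small samples |z| > 3 is impossible.

```python
import statistics

def z_score_outliers(data, cutoff=3.0):
    """Return measurements whose sample z-score exceeds `cutoff` in absolute value."""
    xbar = statistics.mean(data)
    s = statistics.stdev(data)
    return [x for x in data if abs((x - xbar) / s) > cutoff]

data = [9, 10, 11] * 6 + [10, 30]   # hypothetical: 20 values, one far from the rest
print(z_score_outliers(data))              # [30]  (its z-score is about 4.2)
print(z_score_outliers(data, cutoff=2.0))  # [30]
```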

31
Q

What is a bivariate relationship?

A

relationship between two quantitative variables

32
Q

What is a way to describe a bivariate relationship?

A

Scatterplot.

33
Q

What does it mean if variables are positively correlated? Negatively?

A

When an increase in one variable is generally associated with an increase in the second variable, we say that the two variables are “positively related” or “positively correlated.”
If one variable has a tendency to decrease as the other increases, we say the variables are “negatively correlated.”

34
Q

Can a measure of reliability be attached to inferences made about bivariate relationships based on scatterplots of sample data?

A

No.

35
Q

Explain the difference between a bar graph and a histogram.

A

A bar graph is used to describe qualitative data, whereas a histogram is used for quantitative data. Therefore, the x-axis of a bar graph can show anything (category names), while the x-axis of a histogram can show only numerical values.
The bars in a bar graph do not touch each other, but the bars in a histogram can.

36
Q

Give the shortcut formula for computing the sample variance.

A