Module 2 - Section 2 Flashcards

1
Q

What graphs are best for smaller data sets of numerical variables?

A

Stem plots and dot plots

2
Q

What graphs are best for large data sets of quantitative data?

A

histograms

3
Q

appearance of a dot plot?

A

x-axis: a number line labeled with the name of the variable, covering the range of values the data fall between
y-axis: frequency (the height of each stack of dots)
. .
. . . . . . .
. . . . . . . . .
values
-one dot is placed above the value of each data point
-dots are stacked above the same value when it occurs more than once

4
Q

stem

A

the leading digit(s) of a number in the data
ex: 75 has stem 7
ex: 100 could have stem 10 (leaf 0) or stem 1 (when values are rounded to tens), depending on the data

5
Q

leaf

A

the last digit of the number in the data

ex: 75 has leaf 5

6
Q

a key is required for …

A

a stemplot

7
Q

bins

A

equal-width intervals that group together data values that are close to each other
ex: 70-79 is one bin when 7 is the stem and 0-9 are the possible leaves

8
Q

appearance of stemplot

A
stem | leaves
4       |0
5       |
6       |05588
7       |00000455
8       |5
9       |05

Price of Walking shoes
8|5 represents $85
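
A rough Python sketch of how a stem plot like this could be built. It is not from the course notes: the function name `stem_plot` is made up, the prices are read back off the plot above, and it assumes whole-number data where the stem is the tens and the leaf is the ones digit.

```python
from collections import defaultdict

def stem_plot(values):
    """Return the lines of a stem plot (stem = tens, leaf = ones digit)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)      # e.g. 85 -> stem 8, leaf 5
        leaves[stem].append(str(leaf))
    lines = []
    # list every stem from smallest to largest, including empty ones
    for stem in range(min(leaves), max(leaves) + 1):
        lines.append(f"{stem} | {''.join(leaves[stem])}")
    return lines

# shoe prices read back off the stem plot above
prices = [40, 60, 65, 65, 68, 68, 70, 70, 70, 70, 70, 74, 75, 75, 85, 90, 95]
plot_lines = stem_plot(prices)
print("\n".join(plot_lines))
print("Key: 8 | 5 represents $85")
```

Empty stems (like 5 here) are kept so that gaps in the data stay visible.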

9
Q

back-to-back stem plots

A

-used to compare the distributions of two groups
-layout: leaves | stem | leaves
-still requires a key
-leaves increase as you move away from the stem, so the left-side group reads from right to left

10
Q

left inclusion

A

-written in interval notation as [a,b): a on the left is included, but b is not
-so a value exactly on a boundary is counted in the bin that starts there
-used along the x-axis of histograms to organize bins
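
A small sketch of left-inclusive binning in Python, assuming equal-width bins. The helper names `bin_index` and `bin_label` and the starting value 60 are made up for illustration.

```python
def bin_index(value, start, width):
    """Index k of the left-inclusive bin [start + k*width, start + (k+1)*width)."""
    return int((value - start) // width)

def bin_label(k, start, width):
    """Interval-notation label for bin k."""
    a = start + k * width
    return f"[{a}, {a + width})"

# 70 and 79 share a bin; 80 sits on a boundary, so it starts the next bin
for v in [70, 79, 80]:
    k = bin_index(v, start=60, width=10)
    print(v, "->", bin_label(k, 60, 10))
```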

11
Q

histogram appearance

A
  • bins on x-axis
  • frequency or relative frequency on y-axis
  • bars with no spaces between (unless there is an empty bin)
12
Q

For dot plots, stem plots, and histograms, which does/does not retain all data values

A

dot plots and stem plots retain all data values; histograms do not

13
Q

how can we describe the distribution of a plot?

A

shape - modes, symmetry or skewness, deviations such as outliers
center
spread

14
Q

mode(s)

A

number of bumps / humps / peaks

15
Q

uniform

A

no modes, square / rectangle appearance

16
Q

unimodal

A

a single peak

17
Q

bimodal

A

two peaks

ex: heights of adults and children together will have two peaks, one for adults and one for children

18
Q

multimodal

A

more than two peaks
rarely occurs (COVID data may be an exception)

19
Q

symmetry

A

when the left and right sides of a graph are approximate mirror images of each other around the center

20
Q

non symmetric graphs are

A

skewed

21
Q

skewed to the right

A

positively skewed
peaks quickly and then slowly trickles down to the right
as if the tail end of the peak on the right has been pulled to the right

22
Q

negatively skewed

A

skewed to the left
the left tail is extended and longer than the right tail (if the peak is essentially symmetric)
. . . . . . ^ . .   (long tail on the left, peak toward the right)

23
Q

Outlier

A

an observation that deviates from the overall pattern of the graph

24
Q

numerical summaries

A

a few important and meaningful numbers that preserve the relevant features of the data set so that you can draw useful conclusions

25
Q

y

A

variable of interest

the variable for which we have sample data

26
Q

n

A

the sample size / number of observations of the variable y

27
Q

y₁

A

the first sample observation of the variable y

28
Q

yₙ

A

the nth sample observation of the variable y

29
Q

center and examples

A

the value that splits the data in half, or a typical range of values at the center of the graph
examples: median, mean, mode

30
Q

spread and examples

A

how much the data values vary around the center
looks at the range of values and their concentration: are most values close to or far from the center?
examples: range, standard deviation, IQR

31
Q

$\sum_{i=1}^{n} y_i$

What is this? describe all elements.

A

Σ is sigma, meaning summation
i is the index; i = 1 is the lower bound (start at the 1st observation)
n is the upper bound (stop at the nth observation)
so the sum runs from the 1st to the nth piece of data: y₁ + y₂ + … + yₙ
This describes adding all of the values of y
used to find the mean

32
Q

ȳ

A

y bar is the mean
$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$
aka the sum of all the values in a data set divided by the number of observations
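
A quick Python sketch of the same calculation, written as an explicit loop so it mirrors the summation. The function name `mean` and the data are made up for illustration.

```python
def mean(ys):
    """Sum of all the values divided by the number of observations."""
    n = len(ys)
    total = 0
    for i in range(n):      # visits every one of the n observations
        total += ys[i]
    return total / n

y = [2, 4, 4, 5, 10]        # made-up sample, n = 5
y_bar = mean(y)
print(y_bar)                # 25 / 5 = 5.0
```
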
33
Q

M

A

median
the value that divides the ordered sample into two equal halves
if n is odd, it is the middle value
if n is even, it is the mean of the two middle values
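
The odd/even rule could be sketched in Python like this (the function name `median` and the sample values are made up):

```python
def median(ys):
    """Middle value of the ordered sample (mean of the two middle values if n is even)."""
    ordered = sorted(ys)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                  # n odd: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2    # n even: average the middle pair

print(median([5, 1, 3]))        # ordered: 1 3 5    -> 3
print(median([4, 1, 3, 2]))     # ordered: 1 2 3 4  -> 2.5
```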

34
Q

mean vs median

A

mean is affected by outliers, while the median is resistant to outliers or skewness
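
A small made-up illustration using Python's built-in `statistics` module: replacing one value with an outlier pulls the mean a long way but leaves the median where it was.

```python
import statistics

clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]    # same data, but 5 replaced by an outlier

mean_clean = statistics.mean(clean)               # 15 / 5 = 3
median_clean = statistics.median(clean)           # middle value = 3
mean_out = statistics.mean(with_outlier)          # 110 / 5 = 22
median_out = statistics.median(with_outlier)      # middle value is still 3

print("mean:", mean_clean, "->", mean_out)
print("median:", median_clean, "->", median_out)
```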

35
Q

mode

A

the value that occurs with the highest frequency in a data set
may be more than one mode
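
For example, `statistics.multimode` in Python (version 3.8 or later) returns every value tied for the highest frequency. The scores below are made up.

```python
import statistics

scores = [70, 75, 75, 80, 85, 85, 90]    # 75 and 85 both occur twice
modes = statistics.multimode(scores)     # all values with the highest frequency
print(modes)                             # [75, 85]
```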

36
Q

center values of symmetric, right skewed and left skewed data sets

A

symmetric: mean=median=mode
right skewed: mean>median>mode
left skewed: mean<median<mode

37
Q

range

A

describes spread
the difference between the maximum and minimum values in a data set
Range = max - min
strongly influenced by outliers

38
Q

larger range means

A
larger variability (usually)
however, outliers can inflate the range so that it overstates the variability

39
Q

deviation

A

yᵢ - ȳ

The deviation of an observation from the mean

40
Q

positive vs negative deviation

A

positive means it is above the mean

negative means it is below the mean

41
Q

the set of all deviations

A
  • all add to 0
  • describes the variability
  • each deviation can be squared before summing so that positives and negatives no longer cancel, which makes the total useful for calculations
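
A short check of these points in Python, on made-up data:

```python
y = [2, 4, 4, 5, 10]                        # made-up sample
y_bar = sum(y) / len(y)                     # mean = 5.0

deviations = [yi - y_bar for yi in y]       # -3, -1, -1, 0, 5
print(sum(deviations))                      # positives and negatives cancel to 0

squared = [d ** 2 for d in deviations]      # 9, 1, 1, 0, 25
sum_of_squares = sum(squared)
print(sum_of_squares)                       # 36.0, no cancelling once squared
```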
42
Q

variance

A

s² = (Σ (yᵢ-ȳ)²) / (n-1)

where Σ has lower bound i=1 and upper bound n

43
Q

why is variance problematic?

A

It is measured in squared units which is not very interpretable on its own

44
Q

standard deviation

A

s
square root of the variance
most common measure of variability
tells us how closely data is clustered around the mean
measured in the same units as the original data
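
A minimal Python sketch of the sample variance and standard deviation, dividing by n - 1 as in the variance formula. The function names and data are made up for illustration.

```python
import math

def sample_variance(ys):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(ys)
    y_bar = sum(ys) / n
    return sum((yi - y_bar) ** 2 for yi in ys) / (n - 1)

def sample_sd(ys):
    """s: square root of the variance, in the same units as the data."""
    return math.sqrt(sample_variance(ys))

y = [2, 4, 4, 5, 10]            # made-up sample with mean 5
print(sample_variance(y))       # 36 / 4 = 9.0 (squared units)
print(sample_sd(y))             # 3.0 (original units)
print(sample_sd([7, 7, 7]))     # 0.0 when every observation is the same
```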

45
Q

when would s=0

A

when all observations have the same value

46
Q

what happens if s > 0

A

s > 0 means the observations are not all the same value
s increases as the observations become more spread out (greater variability)

47
Q

when can/should we use standard deviation? why?

A

standard deviation and mean should be used together
neither is resistant to outliers, so neither should be used when outliers are present and distorting them

48
Q

IQR

A

interquartile range
measure of variability
resistant to outliers, ∴ goes with median
based on the quartiles, which divide the data into 4 equal sections

49
Q

percentile

A

the pth percentile is the value such that p% of the measurements fall below it and (100-p)% fall above it

50
Q

what is the median in percentile?

A

the 50th percentile

51
Q

can 215 be p?

A

no, percentiles are always between 0 and 100

52
Q

Q₁

A

the lower quartile is the 25th percentile (separates the bottom 25% from the top 75% of measurements)
the median of the measurements that fall below the overall median

53
Q

Q₃

A

the upper quartile is the 75th percentile (separates the top 25% from the bottom 75% of measurements)
the median of the measurements that fall above the overall median

54
Q

what is between Q₁ and Q₃?

A

the middle 50% of the measurements

55
Q

IQR calculation

A

Q₃-Q₁
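
A Python sketch of the "median of each half" way of finding Q₁ and Q₃, then the IQR. The function names and data are made up. One assumption to flag: when n is odd, the overall median is left out of both halves. Textbooks and software use slightly different quartile rules, so results can differ a little.

```python
def median_sorted(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(ys):
    """Return (Q1, median, Q3) using the median of each half (needs n >= 2)."""
    ordered = sorted(ys)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # measurements below the overall median
    upper = ordered[half + n % 2:]      # measurements above it (middle skipped if n is odd)
    return median_sorted(lower), median_sorted(ordered), median_sorted(upper)

data = [1, 3, 5, 7, 9, 11, 13, 15]      # made-up sample
q1, med, q3 = quartiles(data)
iqr = q3 - q1
print(q1, med, q3, iqr)                 # 4.0 8.0 12.0 8.0
```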

56
Q

if IQR is small

A

data is clustered around the center

57
Q

if IQR is large

A

data is scattered far from the center

58
Q

how do we choose a numerical summary?

A
  1. draw a graph
  2. use mean and standard deviation for reasonably symmetric data
  3. use median and IQR for skewed data
  4. If there are multiple modes try to understand why and consider splitting data into two groups
  5. If using mean and standard deviation with outliers, report them both with the outliers present and with the outliers removed
59
Q

five-number-summary

A

minimum, Q₁, median, Q₃, maximum

60
Q

boxplot

A

visual representation of data using the 5 number summary
shows the center, spread, symmetry/skewness at the same time
useful for comparing groups

61
Q

fences

A

upper fence = Q₃ + (1.5 x IQR)
lower fence = Q₁ - (1.5 x IQR)
measurements outside the fences are considered outliers
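
The fence rule could be sketched in Python as below. The function names and data are made up, and Q₁ = 4 and Q₃ = 12 are assumed to have been found already for this sample.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(ys, q1, q3):
    """Measurements that fall outside the fences."""
    lower, upper = fences(q1, q3)
    return [y for y in ys if y < lower or y > upper]

data = [1, 3, 5, 7, 9, 11, 13, 40]      # made-up sample
q1, q3 = 4, 12                          # assumed quartiles for this sample
print(fences(q1, q3))                   # (-8.0, 24.0)
print(find_outliers(data, q1, q3))      # [40]
```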

62
Q

whiskers

A

lines drawn from each end of the box out to the highest and lowest values that are still within the fences (not outliers)

63
Q

far outliers

A

outliers that are farther than 3 IQRs from the quartiles

64
Q

appearance of boxplots

A

x      |------[    |      ]------------|

symbols (x) for outliers, whiskers on each side, a box for the IQR, and a line in the box for the median

65
Q

box plots that are skewed

A

symmetric: median near the center of the box and whiskers of about equal length
skewed right: median to the left of center and a long right whisker
skewed left: median to the right of center and a long left whisker

66
Q

comparative box plots

A

draw two box plots in one graph to compare the data in two different categories

67
Q

time plot

A

used when interested in how the data behaves over time