Unit 1 - Exploring One-Variable Data Flashcards

1
Q

How can statistics be used to help answer important, real-world questions based on data that vary?

A

Collect data
Analyze data
Interpret results

2
Q

Individuals

A

May be people, animals, or things described by a set of data

3
Q

Variable

A

Characteristic that changes from one individual to another

4
Q

Categorical variable

A

Takes values that are category names or labels

5
Q

Quantitative variable

A

Takes numerical values for a measured or counted quantity

6
Q

Not all variables that take numerical values are

A

quantitative

7
Q

It is possible to make a quantitative variable categorical by

A

grouping values
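
A minimal sketch of grouping, assuming made-up ages and illustrative cutoffs. The function name `age_group` and the labels are hypothetical, chosen only to show a quantitative variable being turned into a categorical one.

```python
def age_group(age):
    """Map a numeric age to a category label (cutoffs are illustrative)."""
    if age < 13:
        return "child"
    if age < 20:
        return "teen"
    return "adult"

ages = [8, 15, 42, 19, 67]               # quantitative values (made-up data)
groups = [age_group(a) for a in ages]    # categorical labels after grouping
print(groups)
```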

8
Q

How can we represent categorical data in tabular form?

A

With a frequency table or relative frequency table

9
Q

How do these tabular representations help us describe categorical data?

A

Counts & relative frequencies of categorical data reveal information that can be used to justify claims about data in context

10
Q

Frequency table

A

Gives the number of individuals (the count) in each category

11
Q

Relative frequency table

A

Gives the proportion or percent of individuals (cases) in each category
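
As a rough illustration, the sketch below builds both kinds of table from a small made-up list of categories. The data and variable names are assumptions for the example.

```python
from collections import Counter

colors = ["red", "blue", "red", "green", "blue", "red"]  # made-up categorical data

freq = Counter(colors)          # frequency table: count per category
n = len(colors)                 # total number of individuals
rel_freq = {cat: count / n for cat, count in freq.items()}  # proportion per category

for cat in freq:
    print(f"{cat:<6} count={freq[cat]}  proportion={rel_freq[cat]:.2f}")
```

The proportions in a relative frequency table should add up to 1 (or 100%).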

12
Q

How to represent categorical data

A

Bar chart
Pie chart
Frequency or relative frequency table

13
Q

Making bar charts for categorical data

A

Label axes
Scale axes
Draw bars

14
Q

Label axes

A

Variable name on horizontal axis

Frequency or relative frequency on vertical axis

15
Q

Scale axes

A

Category labels spread out along horizontal axis

Start scaling vertical axis at 0 and go up in equal increments until you equal or exceed maximum frequency or relative frequency

16
Q

Draw bars

A

Make the bars equal in width and leave gaps between them

Heights of the bars represent the category frequencies or relative frequencies

17
Q

Pie charts

A

Include legend or key to indicate what each part means

18
Q

Relative frequencies can make it easier to compare

A

distributions of data sets with different numbers of individuals (different sample sizes)

19
Q

A chart can be built around whichever variable

A

best supports the claim being made about the situation

20
Q

Discrete variable

A

Usually involves counting

Variable that can take on a countable number of values, with gaps between them

21
Q

Example of discrete variable

A

number of siblings

22
Q

Continuous variable

A

Usually involves measuring

Variable that can take on infinitely many values that cannot be counted (any value in an interval)

23
Q

Example of continuous variable

A

Height

24
Q

Dotplot

A

Shows every single point in a data set
Easy to see shape of distribution
May be difficult to make for large data sets

25
Q

Stem-and-leaf plot

A

Shows all points in a data set

Easy to see shape

26
Q

Histogram

A

Easier for large data sets
Easy to see shapes
Individual data values are no longer visible

27
Q

4 factors to consider when describing a distribution

A

Shape
Unusual Features
Center
Variability

28
Q

Shape

A
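Symmetric -> left and right sides are roughly mirror images
Skewed left -> tail stretches toward low values; most data on the high end
Skewed right -> tail stretches toward high values; most data on the low end
Unimodal -> one peak
Bimodal -> two peaks
Uniform -> roughly the same frequency across all values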
29
Q

Unusual Features

A

Outliers or gaps/clusters

30
Q

Center

A

Which value in distribution best describes the typical response?

31
Q

Variability

A

How spread out are the values? Are they packed close together or widely dispersed?

32
Q

Mean

A

Sum of all data values divided by number of values

33
Q

Median

A

Middle value of an ordered data set (odd number of values)

Average of the two middle values of an ordered data set (even number of values)
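
A small sketch of both definitions using Python's standard `statistics` module. The two data sets are made up so that one has an odd count and the other an even count.

```python
from statistics import mean, median

odd = [2, 4, 4, 5, 10]    # 5 values: the median is the middle value of the sorted list
even = [2, 4, 5, 10]      # 4 values: the median averages the two middle values

print("odd: ", mean(odd), median(odd))
print("even:", mean(even), median(even))
```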

34
Q

Q1

A

First quartile is the median of the lower half of the ordered data set

35
Q

Q3

A

Third quartile is the median of the upper half of the ordered data set

36
Q

Range

A

Difference between the max and min

37
Q

Interquartile Range

A

Difference between third and first quartiles
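
A hedged sketch of the quartile, range, and IQR definitions from these cards. The helper name `quartiles` is hypothetical, and it assumes the common classroom convention that the overall median is left out of both halves when the number of values is odd. Library routines such as `statistics.quantiles` may use a different rule and give slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least 2 values; for an odd count the middle value
    is excluded from both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]      # values below the median position
    upper = data[-half:]     # values above the median position
    return median(lower), median(upper)

data = [1, 3, 4, 6, 7, 9, 12]          # made-up data
q1, q3 = quartiles(data)
data_range = max(data) - min(data)     # range = max - min
iqr = q3 - q1                          # interquartile range = Q3 - Q1
print(q1, q3, data_range, iqr)
```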

38
Q

Standard Deviation

A

Typical distance that each value is away from the mean

$s_x = \sqrt{\frac{1}{n-1} \sum (x_i - \bar{x})^2}$
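
To make the formula concrete, here is a step-by-step sketch on made-up data, using the n - 1 divisor for a sample. The variable names are assumptions; the last line cross-checks against `statistics.stdev`, which also computes the sample standard deviation.

```python
from math import sqrt
from statistics import stdev

data = [2, 4, 4, 5, 10]                      # made-up sample
n = len(data)
x_bar = sum(data) / n                        # mean
squared_devs = [(x - x_bar) ** 2 for x in data]   # (observed value - mean)^2
variance = sum(squared_devs) / (n - 1)       # divide by n - 1
s_x = sqrt(variance)                         # standard deviation

print(x_bar, variance, s_x)
print(abs(s_x - stdev(data)) < 1e-9)         # agrees with the library function
```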

39
Q

Square of standard deviation is

A

variance

40
Q

What summary stats can be used to describe the center of a distribution of quantitative data?

A

Mean
Median
(Q1 and Q3 measure position, not center)

41
Q

What summary stats can be used to describe the variability of a distribution of quantitative data?

A

Range
IQR
standard deviation

42
Q

What is the 5 number summary?

A
Min
Q1
Median
Q3
Max
43
Q

How do we use the 5 number summary to make a boxplot?

A

Draw a box from Q1 to Q3 with a line at the median and whiskers out to the min and max; this splits the data into quartiles

44
Q

In a skewed right distribution, how do the mean and median compare?

A

mean > median

45
Q

In a skewed left distribution, how do the mean and median compare?

A

mean < median

46
Q

In a symmetric distribution, how do the mean and median compare?

A

mean ≈ median

47
Q

Boxplot

A

Shows the 5 number summary and outliers
Splits the data into quartiles
Does not show every individual value
Can hide certain features of the shape of a distribution

48
Q

How can we determine if a value in a data set is an outlier?

A

More than 1.5 IQR below Q1 or more than 1.5 IQR above Q3

2 or more standard deviations away from the mean
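
Both outlier rules can be sketched as cutoff calculations. The helper names `iqr_fences` and `sd_fences` are hypothetical, the data are made up, and the quartiles use the median-of-halves convention (an assumption; other quartile methods can shift the cutoffs slightly).

```python
from statistics import mean, median, stdev

def iqr_fences(values):
    """Cutoffs at Q1 - 1.5*IQR and Q3 + 1.5*IQR (median-of-halves quartiles)."""
    data = sorted(values)
    half = len(data) // 2
    q1, q3 = median(data[:half]), median(data[-half:])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def sd_fences(values):
    """Cutoffs at mean - 2*SD and mean + 2*SD (sample SD)."""
    m, s = mean(values), stdev(values)
    return m - 2 * s, m + 2 * s

data = [10, 12, 13, 14, 15, 16, 40]    # made-up data with one large value

low, high = iqr_fences(data)
iqr_outliers = [x for x in data if x < low or x > high]

low_sd, high_sd = sd_fences(data)
sd_outliers = [x for x in data if x < low_sd or x > high_sd]

print(iqr_outliers, sd_outliers)
```

The two rules do not always flag the same values, so it helps to say which rule was used.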

49
Q

Which summary statistics are resistant and which are not?

A

Resistant - median, IQR

Nonresistant - mean, SD, range
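
A quick way to see resistance is to add one extreme value and watch which statistics move. The data below are made up for illustration.

```python
from statistics import mean, median

data = [10, 12, 13, 14, 15]       # made-up data
with_outlier = data + [100]       # same data plus one extreme value

mean_shift = mean(with_outlier) - mean(data)        # large change: mean is not resistant
median_shift = median(with_outlier) - median(data)  # small change: median is resistant

print(mean_shift, median_shift)
```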

50
Q

Which measures of center & variability are best for describing a skewed distribution?

A

Median

IQR

51
Q

Which measures of center & variability are best for describing a symmetric distribution?

A

Mean

SD

52
Q

Low outlier

A

< Q1 - 1.5 IQR
OR
< mean - 2SD

53
Q

High outlier

A

> Q3 + 1.5 IQR
OR
> mean + 2SD

54
Q

What are important characteristics to discuss when comparing distributions of quantitative data?

A
Think SOCS:
Shape
Outlier / Unusual features
Center
Spread/ Variability
55
Q

What is needed for a complete response when comparing distributions of quantitative data?

A

Address the 4 important characteristics
Use comparative words
Include context

56
Q

Percentile

A

Percent of data values less than or equal to a given value

57
Q

How to interpret the percentile

A

“The value of _____ is at the pth percentile. About (p) percent of the values are less than or equal to _____”

58
Q

Standardized score

A

Calculated as z = (data value - mean) / standard deviation

59
Q

How to interpret the z-score

A

“The value of _____ is (z-score) standard deviations above or below the mean.”
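
A minimal sketch of the z-score calculation and its interpretation. The function name `z_score` and the numbers (a score of 85 in a distribution with mean 70 and SD 10) are made up for the example.

```python
def z_score(value, mean, sd):
    """Standardized score: how many SDs the value lies from the mean."""
    return (value - mean) / sd

z = z_score(85, 70, 10)
direction = "above" if z >= 0 else "below"
print(f"The value of 85 is {abs(z)} standard deviations {direction} the mean.")
```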

60
Q

Percentiles and z-scores can be calculated for

A

Distributions with any shape

61
Q

If a number repeats, use the last value of the repeated number to

A

calculate the percentile.

Example: 2 2 2 2 2 3 4
Use the 5th “2”: 5 of the 7 values are less than or equal to 2, so 2 is at about the 71st percentile
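
The repeated-value example can be checked with a short sketch that uses the "less than or equal to" definition of percentile from the earlier card. The function name `percentile_of` is hypothetical.

```python
def percentile_of(value, data):
    """Percent of data values less than or equal to the given value."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

data = [2, 2, 2, 2, 2, 3, 4]
p = percentile_of(2, data)     # all five 2s count, so this is 5 out of 7
print(round(p, 1))
```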

62
Q

Normal distribution

A

Mound-shaped and symmetric

Determined by mean and standard deviation

63
Q

Many quantitative variables in the real world can be modeled by

A

a normal distribution

64
Q

Within 1 SD of the mean, about

A

68% of the data values lie (in a normal distribution)

65
Q

Within 2 SD of the mean, about

A

95% of the data values lie (in a normal distribution)

66
Q

Within 3 SD of the mean, about

A

99.7% of the data values lie (in a normal distribution)

67
Q

Empirical rule

A

68-95-99.7
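
The rule can be checked numerically with `statistics.NormalDist` from Python's standard library (3.8 or later). This is only a sanity check of the rounded 68-95-99.7 figures, not a derivation.

```python
from statistics import NormalDist

std_normal = NormalDist()   # standard normal: mean 0, SD 1

# Area within k standard deviations of the mean, for k = 1, 2, 3
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} SD: {100 * area:.1f}%")
```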

68
Q

How can we use the z-score to find the percent of data values in a given interval for a normal distribution?

A

Calculate a z-score and then use Table A

69
Q

How do we find the area to the left of, to the right of, or between z-scores in a normal distribution?

A

Left : get area from Table A
Right : 1 - area from Table A
Between: subtract two areas from Table A
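
The three cases can be sketched in code, with `NormalDist().cdf` standing in for a Table A lookup (an assumption: the cards use the printed table, and table values are rounded). The mean of 100 and SD of 15 are made-up parameters.

```python
from statistics import NormalDist

MEAN, SD = 100, 15            # hypothetical normal distribution
std_normal = NormalDist()     # cdf(z) plays the role of Table A

def z_score(x):
    return (x - MEAN) / SD

left = std_normal.cdf(z_score(85))                 # area to the left of 85
right = 1 - std_normal.cdf(z_score(130))           # area to the right of 130
between = std_normal.cdf(z_score(115)) - std_normal.cdf(z_score(85))  # 85 to 115

print(round(left, 4), round(right, 4), round(between, 4))
```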

70
Q

How do we find a value, given an area from a normal distribution?

A

Use Table A to find the z-score that matches the area

Set up the equation z = (value - mean) / SD and solve for the value
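
A sketch of the reverse lookup, assuming a made-up normal distribution with mean 100 and SD 15 and a target area of 0.90 to the left. `NormalDist().inv_cdf` stands in for reading Table A backwards.

```python
from statistics import NormalDist

MEAN, SD = 100, 15       # hypothetical normal distribution
area_left = 0.90         # target area to the left of the unknown value

z = NormalDist().inv_cdf(area_left)   # z-score with 90% of the area to its left
value = MEAN + z * SD                 # solve z = (value - MEAN) / SD for value

print(round(z, 2), round(value, 1))
```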