Unit 1 - Exploring One-Variable Data Flashcards

1
Q

How can statistics be used to help answer important, real-world questions based on data that vary?

A

Collect data
Analyze data
Interpret results

2
Q

Individuals

A

May be people, animals, or things described by a set of data

3
Q

Variable

A

Characteristic that changes from one individual to another

4
Q

Categorical variable

A

Takes values that are category names or labels

5
Q

Quantitative variable

A

Takes numerical values for a measured or counted quantity

6
Q

Not all variables that take numerical values are

A

quantitative

7
Q

It is possible to make a quantitative variable categorical by

A

grouping values
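A minimal sketch of grouping in Python, assuming made-up ages and illustrative cutoffs (the `age_group` helper and its brackets are hypothetical, not part of the course material):

```python
# Hypothetical ages (quantitative) grouped into age brackets (categorical).
ages = [15, 22, 37, 41, 68, 17, 30]

def age_group(age):
    """Return a category label for a numeric age (cutoffs are illustrative)."""
    if age < 18:
        return "under 18"
    if age < 40:
        return "18-39"
    return "40 and over"

# Each numerical value is replaced by the label of the group it falls in.
groups = [age_group(a) for a in ages]
print(groups)
```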

8
Q

How can we represent categorical data in tabular form?

A

With a frequency table or relative frequency table

9
Q

How do these tabular representations help us describe categorical data?

A

Counts & relative frequencies of categorical data reveal information that can be used to justify claims about data in context

10
Q

Frequency table

A

Gives the number of individuals (the count) in each category

11
Q

Relative frequency table

A

Gives the proportion or percent of individuals (cases) in each category
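As a rough illustration, both kinds of table can be built in Python from a list of category labels. The `colors` data below is invented for the example.

```python
from collections import Counter

# Hypothetical categorical data: favorite color of 10 students.
colors = ["red", "blue", "blue", "green", "red",
          "blue", "green", "blue", "red", "blue"]

# Frequency table: count of individuals in each category.
frequency = Counter(colors)

# Relative frequency table: proportion of individuals in each category.
n = len(colors)
relative_frequency = {category: count / n for category, count in frequency.items()}

# Print category, count, and percent side by side.
for category in frequency:
    print(f"{category:<6} {frequency[category]:>3} {relative_frequency[category]:>6.0%}")
```

The relative frequencies always add up to 1 (100%), which is a quick check on the table.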

12
Q

How to represent categorical data

A

Bar chart
Pie chart
Frequency/relative frequency table

13
Q

Making bar charts for categorical data

A

Label axes
Scale axes
Draw bars

14
Q

Label axes

A

Variable name on horizontal axis

Frequency/relative frequency on vertical axis

15
Q

Scale axes

A

Category labels spread out along horizontal axis

Start scaling vertical axis at 0 and go up in equal increments until you equal or exceed maximum frequency or relative frequency

16
Q

Draw bars

A

Make the bars equal in width and leave gaps between them

Heights of the bars represent the category frequencies or relative frequencies

17
Q

Pie charts

A

Include legend or key to indicate what each part means

18
Q

Relative frequencies can make it easier to compare

A

distributions of data sets with different numbers of individuals

19
Q

Charts can be based on whichever variable is

A

stronger or more supportive of the situation

20
Q

Discrete variable

A

Usually involves counting

Variable that can take on a countable number of values, with gaps between them

21
Q

Example of discrete variable

A

number of siblings

22
Q

Continuous variable

A

Usually involves measuring

Variable that can take on infinitely many values in an interval, with no gaps between them

23
Q

Example of continuous variable

A

Height

24
Q

Dotplot

A

Shows every single point in a data set
Easy to see shape of distribution
May be difficult to make for large data sets

25
Stem-and-leaf plot
Shows all points in a data set
Easy to see shape
26
Histogram
Easier to make for large data sets
Easy to see shape
Individual data values are lost
27
4 factors to consider when describing distribution
Shape
Unusual Features
Center
Variability
28
Shape
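```
Symmetric -> about the same amount of data on both sides
Skewed left -> tail extends to the left; more data on the high end
Skewed right -> tail extends to the right; more data on the low end
Unimodal -> one peak
Bimodal -> two peaks
Uniform -> about the same frequency across all values
```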
29
Unusual Features
Outliers or gaps/clusters
30
Center
Which value in distribution best describes the typical response?
31
Variability
Are values in distribution packed close together?
32
Mean
Sum of all data values divided by number of values
33
Median
Middle value of an ordered data set (odd number of values)
Average of the two middle values of an ordered data set (even number of values)
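A short sketch of both measures of center in Python, using invented data with an even number of values so the median is the average of the two middle values:

```python
import statistics

data = [2, 3, 3, 5, 7, 10]          # hypothetical data, already ordered

# Mean: sum of all values divided by the number of values (30 / 6).
mean = sum(data) / len(data)

# Median: with 6 values, the average of the two middle values (3 and 5).
median = statistics.median(data)

print(mean, median)
```

Here the mean is larger than the median because the large value 10 pulls the mean upward.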
34
Q1
First quartile is median of first half of ordered data set
35
Q3
Third quartile is median of second half of ordered data set
36
Range
Difference between the max and min
37
Interquartile Range
Difference between third and first quartiles
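The quartile, range, and IQR definitions above can be sketched in Python as follows. This assumes the "median of each half" convention from these cards, with the overall median left out of both halves when the number of values is odd. Calculators and software may use slightly different quartile conventions, and the `quartiles` helper is a made-up name for illustration.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves of the ordered data.

    With an odd number of values, the overall median is excluded from both halves.
    Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]      # first half of the ordered data
    upper = ordered[-half:]     # second half of the ordered data
    return statistics.median(lower), statistics.median(upper)

data = [1, 3, 4, 6, 7, 9, 12]   # hypothetical data
q1, q3 = quartiles(data)
iqr = q3 - q1                       # interquartile range
data_range = max(data) - min(data)  # range
print(q1, q3, iqr, data_range)
```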
38
Standard Deviation
Typical distance that each value is away from the mean
$s_x = \sqrt{\frac{1}{n-1} \sum (x_i - \bar{x})^2}$
39
Square of standard deviation is
variance
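A small worked example of the sample standard deviation and variance, computed step by step on invented data and then compared against Python's `statistics.stdev`, which also divides by n - 1:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
n = len(data)
mean = sum(data) / n              # 40 / 8 = 5

# Variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in data) / (n - 1)

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

print(variance, sd, statistics.stdev(data))
```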
40
What summary stats can be used to describe the center of a distribution of quantitative data?
Mean
Median
Q1 and Q3 (measures of position)
41
What summary stats can be used to describe the variability of a distribution of quantitative data?
Range
IQR
Standard deviation
42
What is the 5 number summary?
```
Min
Q1
Median
Q3
Max
```
43
How do we use the 5 number summary to make a boxplot?
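Draw a box from Q1 to Q3 with a line at the median, then extend whiskers to the min and max (or to the most extreme values that are not outliers); this splits the data into quartiles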
44
In a skewed right distribution, how do the mean and median compare?
mean > median
45
In a skewed left distribution, how do the mean and median compare?
mean < median
46
In a symmetric distribution, how do the mean and median compare?
mean ≈ median
47
Boxplot
Shows the 5 number summary and outliers
Splits the data into quartiles
Does not show every individual value
Can hide certain features of the shape of a distribution
48
How can we determine if a value in a data set is an outlier?
More than 1.5 × IQR below Q1 or more than 1.5 × IQR above Q3
2 or more standard deviations away from the mean
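Both outlier rules can be sketched in Python. The data and function names are invented for illustration, and the quartiles are computed as medians of the lower and upper halves, which is an assumed convention that other tools may not share.

```python
import statistics

def iqr_outliers(values):
    """Values more than 1.5 * IQR below Q1 or more than 1.5 * IQR above Q3."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])    # median of the lower half
    q3 = statistics.median(ordered[-half:])   # median of the upper half
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in ordered if x < low_fence or x > high_fence]

def sd_outliers(values):
    """Values more than 2 sample standard deviations away from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [x for x in values if abs(x - mean) > 2 * sd]

data = [10, 12, 13, 14, 15, 16, 18, 40]   # hypothetical data
print(iqr_outliers(data))
print(sd_outliers(data))
```

For this data set the two rules agree, but in general they can flag different values.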
49
Which summary statistics are resistant and which are not?
Resistant - median, IQR
Nonresistant - mean, SD, range
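One way to see resistance, as a sketch with invented numbers: replace the largest value with an extreme one and compare how much each statistic moves.

```python
import statistics

original = [10, 12, 13, 14, 15]        # hypothetical data
with_outlier = [10, 12, 13, 14, 150]   # same data, largest value made extreme

# The median ignores how far out the extreme value is.
median_before = statistics.median(original)
median_after = statistics.median(with_outlier)

# The mean, SD, and range are pulled toward the extreme value.
mean_before = statistics.mean(original)
mean_after = statistics.mean(with_outlier)
sd_before = statistics.stdev(original)
sd_after = statistics.stdev(with_outlier)

print(median_before, median_after)
print(mean_before, mean_after)
print(sd_before, sd_after)
```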
50
Which measures of center & variability are best for describing a skewed distribution?
Median | IQR
51
Which measures of center & variability are best for describing a symmetric distribution?
Mean | SD
52
Low outlier
< Q1 - 1.5 IQR OR < mean - 2SD
53
High outlier
> Q3 + 1.5 IQR OR > mean + 2SD
54
What are important characteristics to discuss when comparing distributions of quantitative data?
```
Think SOCS
Shape
Outliers / Unusual features
Center
Spread / Variability
```
55
What is needed for a complete response when comparing distributions of quantitative data?
Address the 4 important characteristics
Use comparative words
Include context
56
Percentile
Percent of data values less than or equal to a given value
57
How to interpret the percentile
"The value of _____ is at the pth percentile. About (p) percent of the values are less than or equal to _____"
58
Standardized score
Calculated as (data value - mean) / standard deviation
59
How to interpret the z-score
"The value of _____ is (z-score) standard deviations above or below the mean."
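A minimal sketch of the z-score calculation and its interpretation, assuming a made-up test score of 85 in a distribution with mean 70 and standard deviation 10:

```python
def z_score(value, mean, sd):
    """Standardized score: how many standard deviations `value` is from the mean."""
    return (value - mean) / sd

# Hypothetical numbers: a score of 85 when the mean is 70 and the SD is 10.
z = z_score(85, 70, 10)

direction = "above" if z >= 0 else "below"
print(f"The value of 85 is {abs(z)} standard deviations {direction} the mean.")
```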
60
Percentiles and z-scores can be calculated for
Distributions with any shape
61
If a number repeats, use the last value of the repeated number to
calculate the percentile. Example: for 2 2 2 2 2 3 4, use the 5th "2"
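Counting every value less than or equal to the given value handles repeats automatically, since all copies of the repeated number are included. A sketch using the data from this card (the `percentile_of` helper is a hypothetical name):

```python
def percentile_of(value, data):
    """Percent of data values less than or equal to `value`."""
    count = sum(1 for x in data if x <= value)
    return 100 * count / len(data)

data = [2, 2, 2, 2, 2, 3, 4]

# All five 2s count, so the value 2 sits at 5 out of 7 values.
p = percentile_of(2, data)
print(round(p, 1))
```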
62
Normal distribution
Mound-shaped and symmetric Determined by mean and standard deviation
63
Many quantitative variables in the real world can be modeled by
a normal distribution
64
Within 1 SD of the mean, about
68% of the data exists
65
Within 2 SD of the mean, about
95% of the data exists
66
Within 3 SD of the mean, about
99.7% of the data exists
67
Empirical rule
68-95-99.7
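The 68-95-99.7 figures are rounded. As a quick check, Python's standard-library `statistics.NormalDist` gives the areas for a standard normal distribution (the `proportion_within` helper is a name invented here):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
```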
68
How can we use the z-score to find the percent of data values in a given interval for a normal distribution?
Calculate a z-score and then use Table A
69
How can we use z-score to find the percent of data values in a given interval for a normal distribution?
Left: get area from Table A
Right: 1 - area from Table A
Between: subtract two areas from Table A
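The same three cases can be sketched in Python, with `statistics.NormalDist().cdf` standing in for a Table A lookup (it returns the area to the left of a z-score). The function names are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_left(z):
    """Area to the left of z (what Table A reports)."""
    return Z.cdf(z)

def area_right(z):
    """Area to the right of z: 1 minus the area to the left."""
    return 1 - Z.cdf(z)

def area_between(z_low, z_high):
    """Area between two z-scores: subtract the two left-hand areas."""
    return Z.cdf(z_high) - Z.cdf(z_low)

print(round(area_left(1.0), 4))
print(round(area_right(1.0), 4))
print(round(area_between(-1.96, 1.96), 4))
```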
70
How do we find a value, given an area from a normal distribution?
Use Table A to find the z-score | Set up an equation and solve for the value
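Working backward from an area can be sketched the same way. Here `NormalDist().inv_cdf` plays the role of reading Table A in reverse, and the mean of 100 and standard deviation of 15 are made-up values for illustration.

```python
from statistics import NormalDist

# Step 1: find the z-score with 90% of the area to its left.
z = NormalDist().inv_cdf(0.90)

# Step 2: set up z = (value - mean) / sd and solve for the value.
mean, sd = 100, 15            # hypothetical distribution
value = mean + z * sd

print(round(z, 2), round(value, 1))
```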