Chapter 1: Data Analysis Flashcards

1
Q

Define Categorical Variable

A

Assigns labels that place individuals into categories (usually non-numeric; numeric labels such as zip codes still count as categories).

ex. hair color, eye color, zip code, birthday, etc.

2
Q

Define Quantitative Variable

A

Takes numerical values that can be used in calculations (for example, averaging).

ex. height, weight, number of siblings, how long a task takes, etc.

3
Q

How to distinguish between Categorical and Quantitative Variables?

A

Ask: does it make sense to find the average of the data? If yes, the variable is quantitative; if no, it is categorical.

4
Q

Define Discrete Quantitative Variable

A

Takes a countable set of possible values with gaps in between (separate dots on a number line)

ex. Number of pets you have

5
Q

Define Continuous Quantitative Variable

A

Can take any value in an interval (a line segment on a number line)

ex. how long it takes to finish homework

6
Q

How to distinguish between Discrete and Continuous Quantitative Variables?

A

Ask: does it make sense to have 0.5 of something? Yes for height (continuous), but no for the number of pets you have (discrete).

Measurements are usually continuous.

7
Q

What is a distribution?

A

values a variable takes and how often it takes those values.

ex. in a die roll,
[1, 2, 3, 4, 5, 6] –> “values a variable takes”
[got 2 twice, 3 twice, and 4 once] –> “how often it takes those values”
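
As a minimal sketch, the tally in the die-roll example can be reproduced in Python. The list `rolls` is made-up data chosen only to match the counts in the example (2 twice, 3 twice, 4 once).

```python
from collections import Counter

# Hypothetical die rolls matching the example: 2 twice, 3 twice, 4 once
rolls = [2, 4, 3, 2, 3]

# The distribution: each value taken and how often it was taken
distribution = Counter(rolls)

for value in sorted(distribution):
    print(value, distribution[value])
```

Values that never came up (1, 5, 6 here) simply have a count of 0.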

8
Q

Frequency Table

A

“bare bones”
how many times each value was taken

x      1   2   3
freq.  3   6   4

9
Q

Relative Frequency Table

A

“more info”
each frequency expressed as a fraction or percentage of the total

x           1     2     3
rel. freq.  3/20  6/20  4/20
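
A short sketch of the conversion from frequencies to relative frequencies, using the counts from the table. The total of 20 is read off the card's denominators; it is an assumption that the table lists only some of the observed values, since the three counts shown add up to less than 20.

```python
from fractions import Fraction

# Frequencies from the card's table (value -> count)
freq = {1: 3, 2: 6, 3: 4}

# Assumption: total taken from the card's denominators (3/20, 6/20, 4/20),
# so the table presumably shows only part of the data set.
total = 20

# Relative frequency = frequency / total
rel_freq = {x: Fraction(count, total) for x, count in freq.items()}

for x, p in rel_freq.items():
    print(x, p, f"{float(p):.0%}")
```

`Fraction` keeps the values exact (and reduces them, so 6/20 prints as 3/10); the percentage form is shown alongside.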

10
Q

When is it better to use relative frequency?

A

when comparing groups of different sizes

11
Q

marginal relative frequency

A

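Turns a row or column total of a two-way table into a relative frequency.

B/C, where B is a row or column total and C is the grand total –> interpretation: the proportion of all who are ___________ is ____#____.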

12
Q

joint relative frequency

A

A/C, where A is the count in one cell of a two-way table and C is the grand total –> interpretation: the proportion of all who are __________ and __________ is ____#____.

13
Q

conditional relative frequency

A

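A/B, where A is the count in one cell of a two-way table and B is the total of that cell's row or column –> interpretation: the proportion of __________s who are __________ is ____#____.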
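
The three ratios (B/C, A/C, A/B) can be compared side by side on one two-way table. This is a sketch under stated assumptions: A is read as one cell count, B as that cell's row total, and C as the grand total, and the table itself (class year vs. pet ownership) is invented for illustration.

```python
from fractions import Fraction

# Hypothetical two-way table of counts: (class year, pet ownership) -> count
counts = {
    ("junior", "pet"): 30,
    ("junior", "no pet"): 20,
    ("senior", "pet"): 10,
    ("senior", "no pet"): 40,
}

A = counts[("junior", "pet")]  # one cell
B = sum(n for (year, _), n in counts.items() if year == "junior")  # row total
C = sum(counts.values())  # grand total

marginal = Fraction(B, C)     # proportion of all students who are juniors
joint = Fraction(A, C)        # proportion of all students who are juniors and own a pet
conditional = Fraction(A, B)  # proportion of juniors who own a pet

print(marginal, joint, conditional)
```

Note that only the conditional relative frequency changes its denominator: it restricts attention to the juniors rather than to everyone in the table.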

14
Q

What does it mean for two variables to have an association?

A

Knowing the value of one variable helps you predict the value of the other variable.

15
Q

Graphs of Categorical Variables

A
  1. Side-by-side bar graph
  2. segmented bar graph
  3. mosaic plot
  4. pie chart
  5. pictograph
16
Q

Graphs of Quantitative Variables

A
  1. histogram
  2. stem-and-leaf plot (KEY AND LABEL)
  3. dot plot
  4. box plot
  5. scatter plot
  6. ogive
17
Q

What does it mean for two variables to not have an association?

A

Knowing the value of one variable does not help you predict the value of the other variable.

18
Q

Distribution shapes names

A
  1. symmetric
  2. skewed left/right
  3. unimodal (single-peaked)
  4. bimodal (double-peaked)
  5. uniform (no peaks)
19
Q

DESCRIBING A DISTRIBUTION

A

SOCS + context.
Use comparative language when comparing two or more distributions.

  • Shape
  • Outliers
  • Center
  • Spread
20
Q

Define statistic

A

a number that was calculated from a SAMPLE

21
Q

Define parameter

A

a number that was calculated from a POPULATION

22
Q

Define resistant measure (with examples)

A

A measure that outliers/extreme values do not strongly affect.

Resistant measures: median, IQR
Non-resistant measures: mean, range, standard deviation, variance
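
A quick way to see resistance in action is to add one extreme value to a small data set and compare how far the mean and the median move. The numbers below are hypothetical.

```python
import statistics

# Hypothetical data set, then the same data with one extreme value added
data = [2, 3, 4, 5, 6]
with_outlier = data + [100]

mean_before = statistics.mean(data)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(data)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The mean is pulled strongly toward the extreme value, while the median barely moves.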

23
Q

Define Range and explain its problem

A

Range = Max − Min

  1. not a resistant measure
  2. ignores all values in the data set except the max and min
24
Q

Define Standard Deviation

A

How far, on average, the values of the distribution are from the mean (the typical distance of the data from the mean).
The formulas for the population S.D. (a parameter) and the sample S.D. (a statistic) are different. Memorize them or know where to find them on the equation sheet.
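
The difference between the two formulas can be sketched as follows, assuming the usual convention: the population S.D. divides the sum of squared deviations by N, while the sample S.D. divides by n − 1. The data set is hypothetical, and the standard-library functions are used only as a cross-check.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
n = len(data)
mean = sum(data) / n

# Sum of squared distances from the mean
squared_deviations = sum((x - mean) ** 2 for x in data)

population_sd = math.sqrt(squared_deviations / n)    # parameter: divide by N
sample_sd = math.sqrt(squared_deviations / (n - 1))  # statistic: divide by n - 1

# Cross-check against the standard library
print(population_sd, statistics.pstdev(data))
print(sample_sd, statistics.stdev(data))
```

For the same data the sample S.D. is always a little larger than the population S.D., and the gap shrinks as n grows.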

25
Q

Properties of Standard Deviation

A
  1. Sx ≥ 0 (Sx = 0 means no variability; all data values are the same)
  2. larger values of Sx = more variability
  3. Sx is not a resistant measure
  4. Sx measures variation about the mean
26
Q

Interpretation of Standard Deviation

A

The __context__ typically varies by __S.D.__ from the mean of __mean__.

ex. The time spent on homework typically varies by 34.7 minutes from the mean of 41 minutes.

27
Q

Define Quartiles

A

Values that divide the ordered data into four groups with roughly the same number of values in each fourth: [Min., Q1, Q2 (median), Q3, Max.]

28
Q

Define Interquartile Range (IQR)

A

IQR = Q3 − Q1

It is a resistant measure.

29
Q

Define IQR Dance

A

Boundaries used to determine outliers:

  • lower boundary: Q1 − 1.5 × IQR
  • upper boundary: Q3 + 1.5 × IQR
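
The boundary check can be sketched in a few lines of Python. The quartile convention used here (median of the lower and upper halves, leaving out the middle value when the count is odd) is an assumption; other conventions give slightly different quartiles. The function names and the data are hypothetical.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention.

    Assumption: the middle value is left out of both halves when the
    number of values is odd. Needs at least two values.
    """
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]   # values below the median position
    upper = s[-half:]  # values above the median position
    return statistics.median(lower), statistics.median(s), statistics.median(upper)

def outliers(values):
    """Return the values outside Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_boundary = q1 - 1.5 * iqr
    high_boundary = q3 + 1.5 * iqr
    return [x for x in values if x < low_boundary or x > high_boundary]

data = [1, 3, 4, 5, 5, 6, 7, 8, 20]  # hypothetical data
print(quartiles(data))
print(outliers(data))
```

With this data Q1 = 3.5 and Q3 = 7.5, so IQR = 4 and the boundaries are −2.5 and 13.5; only the value 20 falls outside them.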
30
Q

Define Five-Number Summary and how it is displayed

A

[Minimum, Q1, Q2 (median), Q3, Maximum]; displayed with a box plot

31
Q

Strength and weakness of boxplots

A

  • Strength: shows how spread out the data are
  • Weakness: cannot show peaks or gaps in the data
32
Q

How to measure the center and spread of the data depending on the shape of the distribution

A

  • Symmetric data without outliers: mean and standard deviation
  • Skewed data or data with outliers: median and IQR
33
Q

How to estimate the mean of a histogram or dotplot

A

The mean is the balancing point of the histogram/dotplot, as if the graph were made of solid material.

34
Q

Mean vs. median depending on the shape (skewness) of the distribution

A

  • symmetric: mean ≈ median
  • skewed right / high outlier: mean > median
  • skewed left / low outlier: mean < median
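
These comparisons can be checked on small made-up data sets. The three lists below are hypothetical and chosen only so that each has the stated shape; the pattern is a rule of thumb rather than a guarantee for every data set.

```python
import statistics

symmetric = [1, 2, 3, 4, 5]
skewed_right = [1, 2, 2, 3, 3, 3, 4, 20]      # long tail toward high values
skewed_left = [1, 17, 18, 18, 18, 19, 19, 20]  # long tail toward low values

for name, data in [("symmetric", symmetric),
                   ("skewed right", skewed_right),
                   ("skewed left", skewed_left)]:
    print(name, "mean =", statistics.mean(data), "median =", statistics.median(data))
```

In each skewed list the mean is dragged toward the long tail while the median stays near the bulk of the data.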
35
Q

Strengths and weaknesses of each graph