Chapter 2 Flashcards

1
Q

Data

A

Facts that convey information

2
Q

Two parts of Data

A

Observation and variable

3
Q

Variable

A

Name for what is being counted, measured, or observed

4
Q

Observations

A

Actual data values observed

5
Q

Variation

A

Observations vary

6
Q

Data Distribution

A

Pattern summarizing variation

7
Q

Two main types of Quantitative Variables

A

Discrete and Continuous

8
Q

Two main types of Categorical (Qualitative) Variables

A

Ordinal and Nominal

9
Q

Discrete

A

Possible values belong to a set of distinct numbers

10
Q

Continuous

A

Possible values belong to an interval (such as 10-50) and can take on any value in that interval.

11
Q

Ordinal

A

Ordered categories (education levels)

12
Q

Nominal

A

Un-ordered categories (color, marital status)

13
Q

Two notes on Variable types

A

A continuous variable may be simplified into a categorical one for convenience (for example, by grouping values into ranges)

14
Q

What is a Bar Chart used for?

A

Displaying Categorical variables

15
Q

Bar Chart x-axis

A

Categories or classes

16
Q

Bar Chart y-axis

A

Count (frequency) or relative frequency

17
Q

Relative Frequency for sample size n

A

A proportion (often expressed as a percent) found by dividing the frequency of a given class by the sample size (A/n, B/n, …)

18
Q

What does using Relative Frequency change in a Bar Chart?

A

Only the y-axis scale changes (counts become proportions or percents); the relative heights of the bars remain unchanged
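
As a rough illustration of the two cards above, the sketch below computes relative frequencies for a small categorical sample. The sample data and the function name `relative_frequencies` are made up for illustration.

```python
from collections import Counter

def relative_frequencies(observations):
    """Return {category: count / n} for a list of categorical observations."""
    n = len(observations)
    if n == 0:
        raise ValueError("need at least one observation")
    counts = Counter(observations)
    return {category: count / n for category, count in counts.items()}

# Hypothetical sample of n = 10 observations
colors = ["red", "blue", "red", "green", "blue",
          "red", "red", "green", "blue", "red"]

counts = Counter(colors)            # frequencies: red 5, blue 3, green 2
rel = relative_frequencies(colors)  # relative frequencies: 0.5, 0.3, 0.2

for category in counts:
    print(category, counts[category], rel[category])
```

Dividing every count by the same n rescales the y-axis only, so the ratio between any two bars is the same in both versions of the chart.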

19
Q

What is a Frequency Histogram used for?

A

Displaying quantitative variables

20
Q

Frequency Histogram x-axis

A

Intervals (bins)

21
Q

Frequency Histogram y-axis

A

Counts (frequency) or relative frequency

22
Q

Sturges' Rule

A
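
The answer side of this card is blank in the source (most likely a formula that did not survive as text). The rule is commonly stated as k = 1 + log2(n), where n is the number of observations and k is the suggested number of histogram bins. The sketch below assumes that form and assumes rounding up to a whole number; the course's own version may be written differently (for example with log base 10).

```python
import math

def sturges_bins(n):
    """Suggested number of histogram bins for n observations.

    Assumes the common form k = 1 + log2(n), rounded up to an integer.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.ceil(1 + math.log2(n))

print(sturges_bins(100))  # 1 + log2(100) is about 7.64, so 8 bins
```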
23
Q

Two main types of Variables

A

Quantitative and Categorical (Qualitative)

24
Q

Quantitative

A

Observations which take on numerical values

25
Categorical (Qualitative)
Each observation belongs to one of a set of categories
26
Effect of different interval (bin) locations
Changes the appearance of the histogram
27
How to determine if Frequency Histogram shows true pattern of variation?
Create several histograms with different intervals, then analyze one that displays the features common to most of them
28
Four patterns to look for in Frequency Histogram
1. Modality (number of peaks) 2. Symmetry (is it mirrored?) 3. Center (where is it?) 4. Spread (how spread out is the data?)
29
Outliers
Values lying well away from rest of data
30
Modal Bar
Bar with height greater than or equal to those adjacent to it
31
Mode
Location of modal bar
32
Unimodal
Single modal bar
33
Bi-modal
Exactly two modal bars
34
Multimodal
More than one modal bar
35
Symmetric
Bars to left of some point are mirror images of those to right of same point
36
Skewed
Not symmetric
37
Right skewed
Tail extends farther right
38
Left skewed
Tail extends farther left
39
Symmetric relative to mean and median
Both are equal
40
Right skewed relative to mean and median
Mean greater than median
41
Left skewed relative to mean and median
Mean less than median
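
The three cards above can be checked on small samples. The sketch below compares mean and median using Python's standard `statistics` module. The data sets and the helper name `skew_hint` are hypothetical, and the comparison is only a rule of thumb: a mean equal to the median does not by itself prove symmetry.

```python
import statistics

def skew_hint(data):
    """Compare mean and median and report which skew direction that suggests."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        return "right"
    if mean < median:
        return "left"
    return "none"  # mean equals median, consistent with symmetry

# Hypothetical samples
salaries = [30, 32, 35, 38, 40, 45, 200]  # long right tail: mean 60, median 38
ages_at_death = [1, 60, 70, 75, 80, 82, 85]  # long left tail: median 75
balanced = [1, 2, 3, 4, 5]                # mean 3, median 3

print(skew_hint(salaries), skew_hint(ages_at_death), skew_hint(balanced))
```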
42
What causes symmetric unimodal?
Homogeneous populations or measurement errors
43
What causes right skewness?
Data that is lower bounded but not upper bounded (salaries, age of living adults)
44
What causes left skewness?
Data that is upper bounded but not lower bounded (age at death, lifespan)
45
What causes multi-modality?
Non-homogeneous populations (male/female mixed heights)
46
What causes short tails?
Mixture of streams.
47
Time Series Plot
Plots data against time
48
Time Series Plot x-axis
Time or order
49
Time Series Plot y-axis
Observed value at given times
50
Stationary process
A data-generating mechanism in which the variation of data does not change over time (graph averages out to a horizontal line)
51
Stratified Plot
Data broken into groups (strata) and distributions are compared
52
What is a Stratified Plot used for?
Comparing data from stationary processes
53
What to look for in a Stratified Plot
Spread within strata and differences in centers between different strata
54
Within-Variation
Variation or spread within strata
55
Between-Variation
Differences in center location between strata
56
Noise
Random variation caused by poor supervision or training
57
How to reduce Noise?
Change in process
58
Bias
Systematic variation caused by improper machines or tools
59
How to reduce Bias?
Correct mistakes
60
Descriptive Statistics
Numerical summaries reflecting important characteristics of a data set
61
Three Types of Numerical Summaries
1. Measures of center 2. Measures of spread 3. Measures of location
62
Measures of Center
Mean and Median
63
Measures of Spread
Range, Variance, Standard Deviation, IQR
64
Measures of Location
Percentiles and Quartiles
65
Mean
Sum of observations divided by number of observations
66
Median
Halfway point of ordered observation values
67
Median if odd
The middle observation: the ((n+1)/2)th of n ordered observations
68
Median if even
Average of the two middle observations: the (n/2)th and (n/2 + 1)th of n ordered observations
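
The odd and even cases above translate directly into code. This is a minimal sketch; the function name `median` and the sample values are illustrative only. Note that the cards count positions from 1, while Python indexes from 0.

```python
def median(values):
    """Median by position in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n+1)/2)th observation, which is index n // 2 (0-based).
        return ordered[mid]
    # Even n: average of the (n/2)th and (n/2 + 1)th observations,
    # which are indices mid - 1 and mid (0-based).
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5]))     # ordered 1, 5, 7 -> 5
print(median([4, 1, 3, 2]))  # ordered 1, 2, 3, 4 -> 2.5
```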
69
Robust
Not strongly affected by extreme values; the median is more robust than the mean
70
Mean relative to tails
The mean is pulled toward the longer tail of the distribution
71
Range
Max value minus min value
72
Variance
Deviation of each observation from the mean combined into a single number reflecting overall spread of data
73
Formula for Variance
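
The answer side of this card is blank in the source. A common form is the sample variance, s^2 = sum of (x_i - mean)^2 divided by (n - 1). The sketch below assumes that n - 1 divisor, which matches the "s" notation used on the Empirical Rule card; some texts divide by n instead. The function names and data are illustrative.

```python
import math

def sample_variance(data):
    """Sample variance: sum of squared deviations from the mean over (n - 1).

    The (n - 1) divisor is an assumption; the card's own formula is missing.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_sd(data):
    """Standard deviation: square root of the variance."""
    return math.sqrt(sample_variance(data))

# Hypothetical data: mean is 5, squared deviations sum to 32
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data))  # 32 / 7, about 4.571
print(sample_sd(data))        # about 2.138
```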
74
Standard Deviation
Square root of the variance (roughly, the square root of the average squared distance from the mean).
75
Empirical Rule
For distributions that are bell shaped and approximately symmetric, of all observations: about 68% fall within 1 SD of the mean (mean - s, mean + s); about 95% fall within 2 SD of the mean (mean - 2s, mean + 2s); about 99.7% fall within 3 SD of the mean (mean - 3s, mean + 3s)
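
As a quick worked example of the rule, the sketch below builds the three intervals from a mean and standard deviation. The values mean = 100 and s = 15 and the function name are hypothetical, and the percentages only apply when the distribution is roughly bell shaped.

```python
def empirical_rule_intervals(mean, s):
    """Map each approximate percentage to its (lower, upper) interval."""
    return {
        pct: (mean - k * s, mean + k * s)
        for k, pct in ((1, 68), (2, 95), (3, 99.7))
    }

# Hypothetical bell-shaped data with mean 100 and standard deviation 15
intervals = empirical_rule_intervals(100, 15)
for pct, (low, high) in intervals.items():
    print(f"about {pct}% of observations fall between {low} and {high}")
```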
76
The pth Percentile
A value such that p% of the observations fall below it
77
Median as a Percentile/Quartile
50th percentile or Q2 (second quartile)
78
First Second Third Quartile
First: 25% of observations fall below it; Second: 50% of observations fall below and 50% above (the median); Third: 75% of observations fall below it
79
Finding first quartile
Arrange the data in increasing order and find the median, then find the median of the first half of the data (median excluded)
80
Finding third quartile
Arrange the data in increasing order and find the median, then find the median of the second half of the data (median excluded)
81
Interquartile range (IQR)
Range of middle 50% of data
82
Finding IQR
IQR = Q3 - Q1
83
Mathematical determination of a Potential Outlier
Falls more than 1.5(IQR) below first quartile or 1.5(IQR) above third quartile
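
The quartile, IQR, and outlier cards above can be combined into one small sketch. It follows the "median excluded" method described on the quartile cards; statistical software often uses other quartile conventions and may give slightly different values. The function names and the sample data are illustrative.

```python
def _median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-excluded halves method."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the median itself when forming the upper half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return _median(lower), _median(ordered), _median(upper)

def potential_outliers(values):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

# Hypothetical data: Q1 = 3.5, median = 6, Q3 = 8.5, IQR = 5
data = [1, 3, 4, 5, 6, 7, 8, 9, 30]
print(quartiles(data))
print(potential_outliers(data))  # fences are -4 and 16, so only 30 is flagged
```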
84
Five Number Summaries
1. Minimum value 2. First Quartile 3. Median/Second Quartile 4. Third Quartile 5. Maximum value
85
Box-Plot
Graphical display using Five Number Summaries
86
Four features of a Box-Plot
1. Box goes from Q1 to Q3 and contains the central 50% 2. Line inside box marks the median 3. Lines extend from the box to encompass the remaining data, except for potential outliers 4. Outliers shown separately using another symbol
87
Whiskers
Lines extending from the box to the smallest and largest observations that are not potential outliers
88
Box-Plot skewness
Side with larger part of box and longer whisker usually has skew in that direction
89
Box-Plot Modality
Good for unimodal but NOT multimodal data
90
Resistant
Measures that are not seriously affected by outliers
91
Measures that are resistant
Median, mode, IQR
92
Measures that are not resistant
Mean, standard deviation, range
93
Z-Score
Measure of the number of standard deviations an observation falls from the mean
94
Z-Score formula
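
The answer side of this card is blank in the source. Based on the Z-Score definition on the previous card, the usual form is z = (observation - mean) / standard deviation. The sketch below assumes that form; the function name and the numbers are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Hypothetical distribution with mean 100 and standard deviation 15
print(z_score(130, 100, 15))  # 2.0: two standard deviations above the mean
print(z_score(85, 100, 15))   # -1.0: one standard deviation below the mean
```

A negative z-score means the observation lies below the mean, a positive one above it.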
95
Quartiles and shape of the distribution
If distance between Q1 and Q2 is greater than that between Q2 and Q3, data is left skewed and vice versa