Chapter 2 Flashcards

1
Q

Data

A

Facts that convey information

2
Q

Two parts of Data

A

Observation and variable

3
Q

Variable

A

Name for what is being counted, measured, or observed

4
Q

Observations

A

Actual data values observed

5
Q

Variation

A

Observations vary

6
Q

Data Distribution

A

Pattern summarizing variation

7
Q

Two main types of Quantitative Variables

A

Discrete and Continuous

8
Q

Two main types of Categorical (Qualitative) Variables

A

Ordinal and Nominal

9
Q

Discrete

A

Possible values belong to a set of distinct numbers

10
Q

Continuous

A

Possible values belong to an interval (such as 10-50) and can take on any value in that interval.

11
Q

Ordinal

A

Ordered categories (education levels)

12
Q

Nominal

A

Un-ordered categories (color, marital status)

13
Q

Two notes on Variable types

A

A continuous variable may be simplified into a categorical one by grouping its values into intervals

14
Q

What is a Bar Chart used for?

A

Displaying Categorical variables

15
Q

Bar Chart x-axis

A

Categories or classes

16
Q

Bar Chart y-axis

A

Count (frequency) or relative frequency

17
Q

Relative Frequency for sample size n

A

A proportion (or percent) found by dividing the frequency of a given class by the sample size (A/n, B/n, …)
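A minimal Python sketch of the relative-frequency calculation. The data and the function name `relative_frequencies` are hypothetical, chosen only for illustration:

```python
from collections import Counter

def relative_frequencies(observations):
    """Return {category: count / n} for a list of categorical observations."""
    n = len(observations)
    counts = Counter(observations)          # frequency of each class
    return {category: count / n for category, count in counts.items()}

# Hypothetical sample of n = 8 colors
colors = ["red", "blue", "red", "green", "red", "blue", "red", "green"]
rel = relative_frequencies(colors)
print(rel)  # {'red': 0.5, 'blue': 0.25, 'green': 0.25}
```

The relative frequencies always sum to 1 (100%), which is why switching a bar chart to relative frequency rescales the y-axis without changing the bars' proportions.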

18
Q

What does using Relative Frequency change in a Bar Chart?

A

Only the y-axis scale (counts become proportions or percents); the shape of the chart is unchanged

19
Q

What is a Frequency Histogram used for?

A

Displaying quantitative variables

20
Q

Frequency Histogram x-axis

A

Intervals (bins)

21
Q

Frequency Histogram y-axis

A

Counts (frequency) or relative frequency

22
Q

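Sturges’ Rule

A

A guideline for the number of histogram intervals for n observations: $k = \lceil \log_2 n \rceil + 1$ (approximately $1 + 3.322\log_{10} n$)
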
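Sturges' rule suggests roughly 1 + log₂(n) intervals for a sample of size n. This sketch assumes the common ⌈log₂ n⌉ + 1 form; some texts instead round 1 + 3.322·log₁₀(n) to the nearest integer, which can differ by one interval. The function name `sturges_bins` is hypothetical:

```python
import math

def sturges_bins(n):
    """Suggested number of histogram intervals for n observations (Sturges' rule)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return math.ceil(math.log2(n)) + 1

for n in (10, 50, 100, 1000):
    print(n, sturges_bins(n))
```

The rule grows only logarithmically, so a tenfold increase in sample size adds just three or four intervals.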
23
Q

Two main types of Variables

A

Quantitative and Categorical (Qualitative)

24
Q

Quantitative

A

Observations which take on numerical values

25
Q

Categorical (Qualitative)

A

Each observation belongs to exactly one of a set of categories

26
Q

Effect of different interval locations

A

Shifting the interval (bin) locations can change the appearance of the histogram

27
Q

How to determine if Frequency Histogram shows true pattern of variation?

A

Create several histograms (with different interval locations) and analyze one that displays the features common to most of them
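To see why several histograms are worth drawing, this sketch counts the same hypothetical data into intervals of equal width but with two different starting points. The helper `bin_counts` and the data are illustrative assumptions, not course material:

```python
def bin_counts(data, start, width, n_bins):
    """Count observations falling in intervals [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        k = int((x - start) // width)   # index of the interval containing x
        if 0 <= k < n_bins:             # ignore values outside the chosen range
            counts[k] += 1
    return counts

# Hypothetical sample of 10 measurements
data = [2.1, 2.4, 3.0, 3.2, 3.9, 4.1, 4.8, 5.5, 5.9, 6.2]

# Same interval width, two different starting points
counts_a = bin_counts(data, start=2.0, width=1.0, n_bins=5)
counts_b = bin_counts(data, start=1.5, width=1.0, n_bins=5)
print(counts_a)  # [2, 3, 2, 2, 1]
print(counts_b)  # [2, 2, 2, 1, 3]
```

The first set of intervals peaks in the second bar, while the shifted set peaks in the last bar, even though the data are identical.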

28
Q

Four patterns to look for in Frequency Histogram

A

1. Modality (number of peaks)
2. Symmetry (is it mirrored?)
3. Center (where is it located?)
4. Spread (how spread out are the data?)

29
Q

Outliers

A

Values lying well away from rest of data

30
Q

Modal Bar

A

Bar with height greater than or equal to those adjacent to it

31
Q

Mode

A

Location of modal bar

32
Q

Unimodal

A

Single modal bar

33
Q

Bi-modal

A

Exactly two modal bars

34
Q

Multimodal

A

More than one modal bar
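The modal-bar definition can be applied literally to a list of bar heights. In this sketch (hypothetical function names), exactly two modal bars are labeled "bimodal" and three or more "multimodal", even though a bimodal histogram also counts as multimodal under the card above. Note that flat stretches of equal heights make every bar in the stretch "modal" under this literal rule, so judgment is still needed:

```python
def modal_bars(heights):
    """Return indices of bars whose height is >= the height of each adjacent bar."""
    modes = []
    last = len(heights) - 1
    for i, h in enumerate(heights):
        ge_left = i == 0 or h >= heights[i - 1]      # end bars have only one neighbor
        ge_right = i == last or h >= heights[i + 1]
        if ge_left and ge_right:
            modes.append(i)
    return modes

def describe_modality(heights):
    """Label a histogram by how many modal bars it has."""
    if not heights:
        raise ValueError("need at least one bar")
    k = len(modal_bars(heights))
    if k == 1:
        return "unimodal"
    if k == 2:
        return "bimodal"
    return "multimodal"

print(modal_bars([1, 3, 6, 3, 1]), describe_modality([1, 3, 6, 3, 1]))        # [2] unimodal
print(modal_bars([1, 4, 2, 1, 3, 1]), describe_modality([1, 4, 2, 1, 3, 1]))  # [1, 4] bimodal
```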

35
Q

Symmetric

A

Bars to left of some point are mirror images of those to right of same point

36
Q

Skewed

A

Not symmetric

37
Q

Right skewed

A

Tail extends farther right

38
Q

Left skewed

A

Tail extends farther left

39
Q

Symmetric relative to mean and median

A

Mean and median are approximately equal

40
Q

Right skewed relative to mean and median

A

Mean greater than median

41
Q

Left skewed relative to mean and median

A

Mean less than median
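A quick check of the mean/median relationships above, using Python's standard `statistics` module on small hypothetical data sets. This is the typical pattern for skewed data rather than a guarantee for every data set:

```python
import statistics

# Hypothetical data sets
right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 9, 15]   # long tail to the right
left_skewed = [-x for x in right_skewed]          # mirror image: long tail to the left
symmetric = [1, 2, 3, 4, 5, 6, 7]

for name, data in [("right skewed", right_skewed),
                   ("left skewed", left_skewed),
                   ("symmetric", symmetric)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{name:>12}: mean = {mean:.2f}, median = {median:.2f}")
```

The long right tail (9 and 15) pulls the mean up to 4.70 while the median stays at 3.00; mirroring the data reverses the inequality.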

42
Q

What causes a symmetric unimodal distribution?

A

Homogeneous populations or measurement errors

43
Q

What causes right skewness?

A

Data that is lower bounded but not upper bounded (salaries, age of living adults)

44
Q

What causes left skewness?

A

Data that is upper bounded but not lower bounded (age at death, lifespan)

45
Q

What causes multi-modality?

A

Non-homogeneous populations (male/female mixed heights)

46
Q

What causes short tails?

A

Mixture of streams.

47
Q

Time Series Plot

A

Plots data against time

48
Q

Time Series Plot x-axis

A

Time or order

49
Q

Time Series Plot y-axis

A

Observed value at given times

50
Q

Stationary process

A

A data-generating mechanism in which the variation of data does not change over time (graph averages out to a horizontal line)

51
Q

Stratified Plot

A

Data broken into groups (strata) and distributions are compared

52
Q

What is a Stratified Plot used for?

A

Comparing data from stationary processes

53
Q

What to look for in a Stratified Plot

A

Spread within strata and differences in centers between different strata

54
Q

Within-Variation

A

Variation or spread within strata

55
Q

Between-Variation

A

Differences in center location between strata

56
Q

Noise

A

Random variation caused by poor supervision or training

57
Q

How to reduce Noise?

A

Change in process

58
Q

Bias

A

Systematic variation caused by improper machines or tools

59
Q

How to reduce Bias?

A

Correct mistakes

60
Q

Descriptive Statistics

A

Numerical summaries reflecting important characteristics of a data set

61
Q

Three Types of Numerical Summaries

A

1. Measures of center
2. Measures of spread
3. Measures of location

62
Q

Measures of Center

A

Mean and Median

63
Q

Measures of Spread

A

Range, Variance, Standard Deviation, IQR

64
Q

Measures of Location

A

Percentiles and Quartiles

65
Q

Mean

A

Sum of observations divided by number of observations

66
Q

Median

A

Halfway point of ordered observation values

67
Q

Median if odd

A

The middle observation: the ((n+1)/2)th of the n ordered observations

68
Q

Median if even

A

Average of the two middle observations: the (n/2)th and (n/2 + 1)th of the n ordered observations
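The odd and even cases translate directly into code. Python lists are 0-based, so the kth ordered observation is at index k - 1. The function name `median` is our own, shown only to mirror the position rules:

```python
def median(values):
    """Median via the ordered positions described in the cards above."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]            # the (n+1)/2-th ordered value
    return (xs[n // 2 - 1] + xs[n // 2]) / 2   # average of the n/2-th and (n/2+1)-th values

print(median([7, 1, 5]))      # 5
print(median([7, 1, 5, 3]))   # 4.0
```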

69
Q

Robust

A

Little affected by outliers or skewness; the median is more robust than the mean

70
Q

Mean relative to tails

A

The mean is pulled toward the longer tail of the distribution

71
Q

Range

A

Max value minus min value

72
Q

Variance

A

The deviations of the observations from the mean, combined (squared and averaged) into a single number that reflects the overall spread of the data

73
Q

Formula for Variance

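A

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, where $\bar{x}$ is the mean of the n observations
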
74
Q

Standard Deviation

A

Square root of the variance (the average squared distance from the mean)
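A minimal sketch of range, variance and standard deviation on hypothetical data. It assumes the sample version of the variance (divisor n - 1), which matches Python's `statistics.variance` and `statistics.stdev`; dividing by n instead gives the population version (`statistics.pvariance`). The helper names are our own:

```python
import math
import statistics

def sample_variance(xs):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_sd(xs):
    """Standard deviation: square root of the variance."""
    return math.sqrt(sample_variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical observations, mean = 5
print("range   :", max(data) - min(data))  # 7
print("variance:", sample_variance(data))  # 32/7, about 4.571
print("std dev :", sample_sd(data))        # about 2.138

# Cross-check against the standard library's sample versions
assert math.isclose(sample_variance(data), statistics.variance(data))
assert math.isclose(sample_sd(data), statistics.stdev(data))
```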

75
Q

Empirical Rule

A

For distributions that are bell-shaped and approximately symmetric:
about 68% of observations fall within 1 SD of the mean (mean - s, mean + s)
about 95% fall within 2 SD of the mean (mean - 2s, mean + 2s)
about 99.7% fall within 3 SD of the mean (mean - 3s, mean + 3s)
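The Empirical Rule can be checked by simulation. This sketch assumes normally distributed data (mean 100, SD 15, chosen arbitrarily) generated with `random.Random.gauss` and a fixed seed; real data that are only roughly bell-shaped will match less closely:

```python
import math
import random
import statistics

rng = random.Random(42)                                # fixed seed for repeatability
data = [rng.gauss(100, 15) for _ in range(100_000)]    # simulated bell-shaped data
mean = statistics.fmean(data)
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (len(data) - 1))  # sample SD

def fraction_within(k):
    """Fraction of observations in the interval (mean - k*sd, mean + k*sd)."""
    lo, hi = mean - k * sd, mean + k * sd
    return sum(lo < x < hi for x in data) / len(data)

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.3f}")
```

With this many simulated observations the three fractions land very close to 0.68, 0.95 and 0.997.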

76
Q

The pth Percentile

A

A value such that p% of the observations fall below it

77
Q

Median as a Percentile/Quartile

A

50th percentile or Q2 (second quartile)

78
Q

First, Second, and Third Quartiles

A

First (Q1): 25% of observations fall below it
Second (Q2): 50% of observations fall below it and 50% above (the median)
Third (Q3): 75% of observations fall below it

79
Q

Finding first quartile

A

Arrange the data in increasing order and find the median, then find the median of the lower half of the data (median excluded)

80
Q

Finding third quartile

A

Arrange the data in increasing order and find the median, then find the median of the upper half of the data (median excluded)
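The median-of-halves method above, sketched in Python with hypothetical data and our own helper names. Other tools (spreadsheets, NumPy, `statistics.quantiles`) use interpolation rules and may report slightly different quartiles for the same data:

```python
def median_of_sorted(xs):
    """Median of an already-sorted, non-empty list."""
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3 by the median-of-halves method (median excluded from the halves)."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]            # observations below the median position
    upper = xs[half + n % 2:]    # skip the middle value when n is odd
    return median_of_sorted(lower), median_of_sorted(xs), median_of_sorted(upper)

data = [6, 47, 49, 15, 42, 41, 7, 39, 43, 40, 36]   # hypothetical, n = 11
q1, q2, q3 = quartiles(data)
print(q1, q2, q3, "IQR =", q3 - q1)   # 15 40 43 IQR = 28
```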

81
Q

Interquartile range (IQR)

A

Range of middle 50% of data

82
Q

Finding IQR

A

IQR = Q3 - Q1

83
Q

Mathematical determination of a Potential Outlier

A

Falls more than 1.5(IQR) below first quartile or 1.5(IQR) above third quartile
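The 1.5(IQR) rule as a small Python sketch. The quartiles of the hypothetical data set were worked out by hand with the median-of-halves method (Q1 = 3.0, Q3 = 5.5); the function names are our own:

```python
def outlier_fences(q1, q3):
    """Lower and upper fences: 1.5 * IQR beyond the first and third quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def potential_outliers(values, q1, q3):
    """Observations lying outside the fences."""
    lower, upper = outlier_fences(q1, q3)
    return [x for x in values if x < lower or x > upper]

data = [2, 3, 3, 4, 4, 5, 5, 6, 30]   # hypothetical observations
q1, q3 = 3.0, 5.5                      # median-of-halves quartiles of data
print(outlier_fences(q1, q3))            # (-0.75, 9.25)
print(potential_outliers(data, q1, q3))  # [30]
```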

84
Q

Five-Number Summary

A

1. Minimum value
2. First quartile (Q1)
3. Median (second quartile, Q2)
4. Third quartile (Q3)
5. Maximum value

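A compact sketch of the five-number summary, assuming quartiles are found by the median-of-halves method from the earlier cards (software using interpolation may differ slightly). The function name and data are hypothetical:

```python
def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum), quartiles by the median-of-halves method."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")

    def med(s):  # median of a sorted, non-empty list
        m = len(s) // 2
        return s[m] if len(s) % 2 else (s[m - 1] + s[m]) / 2

    half = n // 2
    return xs[0], med(xs[:half]), med(xs), med(xs[half + n % 2:]), xs[-1]

print(five_number_summary([2, 3, 3, 4, 4, 5, 5, 6, 30]))   # (2, 3.0, 4, 5.5, 30)
```

These five values are exactly what a box plot draws: the box from Q1 to Q3, the median line, and the extremes.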
85
Q

Box-Plot

A

Graphical display of the five-number summary

86
Q

Four features of a Box-Plot

A

1. The box goes from Q1 to Q3 and contains the central 50% of the data
2. A line inside the box marks the median
3. Lines (whiskers) extend from the box to cover the remaining data that are not potential outliers
4. Outliers are shown separately using another symbol

87
Q

Whiskers

A

Lines extending from the box to the most extreme observations that are not potential outliers (within 1.5(IQR) of the quartiles)

88
Q

Box-Plot skewness

A

The data are usually skewed toward the side with the larger part of the box and the longer whisker

89
Q

Box-Plot Modality

A

Good for unimodal data but NOT multimodal data (a box plot cannot show multiple peaks)

90
Q

Resistant

A

Measures that are not seriously affected by outliers

91
Q

Measures that are resistant

A

Median, mode, IQR

92
Q

Measures that are not resistant

A

Mean, standard deviation, range
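A small demonstration of resistance, using hypothetical data in which one value is replaced by an extreme one. The median does not move, while the mean, standard deviation and range change dramatically:

```python
import statistics

data = [10, 12, 13, 14, 15]            # hypothetical observations
with_outlier = [10, 12, 13, 14, 150]   # largest value replaced by an extreme one

for label, xs in (("original", data), ("with outlier", with_outlier)):
    print(f"{label:>12}: median = {statistics.median(xs)}, "
          f"mean = {statistics.mean(xs):.1f}, "
          f"sd = {statistics.stdev(xs):.1f}, "
          f"range = {max(xs) - min(xs)}")
```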

93
Q

Z-Score

A

Measure that specifies the number of standard deviations an observation falls from the mean

94
Q

Z-Score formula

A

$z = \frac{x - \bar{x}}{s}$, where $\bar{x}$ is the mean and $s$ is the standard deviation

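The z-score is the observation's distance from the mean, x - x̄, divided by the standard deviation s. A minimal sketch with a hypothetical exam whose mean is 70 and standard deviation is 10; the function name `z_score` is our own:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

print(z_score(85, 70, 10))   # 1.5  (1.5 SD above the mean)
print(z_score(55, 70, 10))   # -1.5 (1.5 SD below the mean)
```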
95
Q

Quartiles and shape of the distribution

A

If the distance between Q1 and Q2 is greater than the distance between Q2 and Q3, the data are left skewed, and vice versa