Elementary Statistics Vocabulary CH 1-3 Flashcards

1
Q

CH 1-3

Data

A

Collections of observations (such as measurements, genders, survey responses).

2
Q

CH 1-3

Statistics

A

The science of planning studies and experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data.

3
Q

CH 1-3

Population

A

The complete collection of all individuals (scores, people, measurements, and so on) to be studied. The collection is complete in the sense that it includes all of the individuals to be studied.

4
Q

CH 1-3

Census

A

The collection of data from every member of the population.

5
Q

CH 1-3

Sample

A

A subcollection of members selected from a population.

6
Q

CH 1-3

Collection of sample data

A

Sample data must be collected in an appropriate way, such as through a process of random selection.

7
Q

CH 1-3

Inappropriate collection of sample data

A

If sample data are not collected in an appropriate way, the data may be so completely useless that no amount of statistical torturing can salvage them.

8
Q

CH 1-3

Statistical thinking - factors

A
  1. Context of the data
  2. Source of the data
  3. Sampling method
  4. Conclusions
  5. Practical implications
9
Q

CH 1-3

Practical implications

A

Statistical significance vs. practical significance

10
Q

CH 1-3

Parameter

A

A numerical measurement describing some characteristic of a population.

11
Q

CH 1-3

Statistic

A

A numerical measurement describing some characteristic of a sample.

12
Q

CH 1-3

Quantitative (numerical) data

A

Numbers representing counts or measurements.

13
Q

CH 1-3

Categorical (qualitative, attribute) data

A

Names or labels that are not numbers representing counts or measurements.

14
Q

CH 1-3

Discrete data

A

Result when the number of possible values is either a finite number or a “countable” number. (That is, the number of possible values is 0 or 1 or 2, and so on.)

15
Q

CH 1-3

Continuous (numerical) data

A

Result from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions, or jumps.

16
Q

CH 1-3

Nominal level of measurement

A

Is characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme (such as low to high).

17
Q

CH 1-3

Ordinal level of measurement

A

Data can be arranged in some order, but differences (obtained by subtraction) between data values either cannot be determined or are meaningless.

18
Q

CH 1-3

Interval level of measurement

A

Is like the ordinal level, with the additional property that the difference between any two data values is meaningful. However, data at this level do not have a natural zero starting point (where none of the quantity is present).

19
Q

CH 1-3

Ratio level of measurement

A

The interval level with the additional property that there is also a natural zero starting point (where zero indicates that none of the quantity is present). For values at this level, differences and ratios are both meaningful.

20
Q

CH 1-3

Voluntary response sample (self-selected sample)

A

One in which the respondents themselves decide whether to be included.

Cannot be used for making conclusions about a population.

21
Q

CH 1-3

Correlation

A

A statistical association between two variables.

22
Q

CH 1-3

Causality

A

The dependence of one variable upon another.

23
Q

CH 1-3

Correlation caveat

A

Correlation does not imply causality.

24
Q

CH 1-3

Observational study

A

Subjects are observed and specific characteristics are measured, but there is no attempt to modify the subjects being studied.

25
Q

CH 1-3

Experiment

A

Some treatment is applied to the subjects (experimental units), and its effects upon them are observed.

26
Q

CH 1-3

Experimental units

A

The subjects of an experiment.

27
Q

CH 1-3

Simple random sample

A

A sample of n subjects is selected in such a way that every possible sample of the same size n has the same chance of being chosen.
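
A minimal Python sketch of drawing a simple random sample, assuming the population fits in a list. The function name `simple_random_sample` and the numbered population are illustrative and not part of the deck; `random.sample` draws without replacement, so every subset of size n is equally likely to be chosen.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every size-n subset is equally likely."""
    rng = random.Random(seed)  # a seed is used only to make the sketch repeatable
    return rng.sample(population, n)

# Hypothetical population: subjects numbered 1 through 100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```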

28
Q

CH 1-3

Random sample

A

Each member of the population has an equal chance of being selected. Computers are often used to generate random samples.

29
Q

CH 1-3

Probability sample

A

Involves selecting members of a population in such a way that each member of the population has a known (but not necessarily the same) chance of being selected.

30
Q

CH 1-3

Systematic sample

A

Select some starting point, then select every kth (such as every 50th) element in the population.
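
A short Python sketch of systematic sampling. Choosing the starting point at random among the first k elements is an assumption made here (the card only says to select some starting point), and the function name and population are hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a starting index among the first k elements, then take every kth element."""
    rng = random.Random(seed)
    start = rng.randrange(k)      # starting index in 0 .. k-1
    return population[start::k]   # every kth element from the start

# Hypothetical population of 500 numbered elements, sampling every 50th.
population = list(range(1, 501))
sample = systematic_sample(population, 50, seed=0)
print(sample)
```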

31
Q

CH 1-3

Convenience sampling

A

Use results that are easy to get.

32
Q

CH 1-3

Stratified sampling

A

Subdivide the population into at least two different subgroups (or strata) so that subjects within the same subgroup share the same characteristics (such as gender or age bracket), then draw a sample from each subgroup.
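
One way to sketch stratified sampling in Python. The records, the age-bracket labels, and the choice to draw the same number from each stratum are all assumptions for illustration; other allocation schemes (such as proportional allocation) are equally valid.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, per_stratum, seed=None):
    """Group records into strata by key, then draw a random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for label in sorted(strata):  # sorted only to keep the output order stable
        sample.extend(rng.sample(strata[label], per_stratum))
    return sample

# Hypothetical population: 40 subjects under 30 and 60 subjects aged 30 or over.
records = [("under 30", i) for i in range(40)] + [("30 and over", i) for i in range(60)]
sample = stratified_sample(records, key=lambda r: r[0], per_stratum=5, seed=2)
print(sample)
```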

33
Q

CH 1-3

Cluster sampling

A

Divide the population into sections (or clusters), then randomly select some of those clusters, and then choose all members from those selected clusters.
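
A minimal Python sketch of cluster sampling, using hypothetical classrooms as the clusters. Unlike stratified sampling, only some clusters are selected, and every member of a selected cluster is kept.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly pick whole clusters, then keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical population: 6 classrooms (clusters) of 4 students each.
clusters = {
    f"room {r}": [f"room {r} student {s}" for s in range(1, 5)]
    for r in range(1, 7)
}
chosen, members = cluster_sample(clusters, num_clusters=2, seed=3)
print(chosen, len(members))
```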

34
Q

CH 1-3

Multistage sampling

A

Uses some combination of the basic sampling methods.

35
Q

CH 1-3

Multistage sample design

A

Pollsters select a sample in different stages, and each stage might use different sampling methods.

36
Q

CH 1-3

Cross-sectional study

A

Data are observed, measured, and collected at one point in time.

37
Q

CH 1-3

Retrospective (case-control) study

A

Data are collected from the past by going back in time (through examination of records, interviews, and so on).

38
Q

CH 1-3

Prospective (longitudinal, cohort) study

A

Data are collected in the future from groups sharing common factors (called cohorts).

39
Q

CH 1-3

Cohort

A

A group sharing common factors.

40
Q

CH 1-3

Randomization

A

The assigning of subjects to different groups through a process of random selection.

41
Q

CH 1-3

Replication

A

The repetition of an experiment on more than one subject.

Alternatively, replication refers to the repetition or duplication of an experiment so that results can be confirmed or verified.

42
Q

CH 1-3

Blinding

A

A technique in which the subject doesn’t know whether he or she is receiving a treatment or a placebo.

43
Q

CH 1-3

Placebo effect

A

Occurs when an untreated subject reports an improvement in symptoms.

44
Q

CH 1-3

Double-blind

A

Blinding occurs at two levels: (1) the subject doesn’t know whether he or she is receiving the treatment or a placebo, and (2) the dispenser of the treatment doesn’t know either.

45
Q

CH 1-3

Confounding

A

Occurs in an experiment when the experimenter cannot distinguish among the effects of various factors.

46
Q

CH 1-3

Completely randomized experimental design

A

Assign subjects to different treatment groups through a process of random selection.
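
A possible Python sketch of a completely randomized assignment. The subject labels, the two group names, and the choice to keep group sizes equal by dealing subjects out in turn after a shuffle are assumptions for illustration.

```python
import random

def randomize(subjects, groups, seed=None):
    """Shuffle the subjects, then deal them out to the treatment groups in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for i, subject in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(subject)
    return assignment

# Hypothetical experiment: 20 subjects, two treatment groups.
subjects = [f"subject {i}" for i in range(1, 21)]
assignment = randomize(subjects, ["treatment", "placebo"], seed=4)
print({group: len(members) for group, members in assignment.items()})
```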

47
Q

CH 1-3

Randomized block design

A

If testing one or more different treatments with different blocks:

(1) Form blocks (or groups) of subjects with similar characteristics.
(2) Randomly assign treatments to the subjects within each block.

48
Q

CH 1-3

Block

A

A group of subjects that are similar, but where the groups differ in ways that might affect the outcome of the experiment.

49
Q

CH 1-3

Rigorously controlled design

A

Carefully assign subjects to different treatment groups, so that those given each treatment are similar in the ways that are important to the experiment.

Extremely difficult to implement due to possible lack of consideration of all relevant factors.

50
Q

CH 1-3

Matched pairs design

A

Compare exactly two treatment groups by using subjects matched in pairs that are somehow related or have similar characteristics.

The matched pairs may also consist of before and after measurements.

51
Q

CH 1-3

Sampling error

A

The difference between a sample result and the true population result, resulting from chance sample fluctuations.

52
Q

CH 1-3

Nonsampling error

A

Occurs when the sample data are incorrectly collected, recorded, or analyzed.

53
Q

CH 1-3

Characteristics of data

A

CVDOT

  1. Center
  2. Variation
  3. Distribution
  4. Outliers
  5. Time
54
Q

CH 1-3

Center

A

A representative or average value that indicates where the middle of the data set is located.

55
Q

CH 1-3

Variation

A

A measure of the amount that the data values vary.

56
Q

CH 1-3

Distribution

A

The nature or shape of the spread of the data over the range of values (such as bell-shaped, uniform, or skewed).

57
Q

CH 1-3

Outliers

A

Sample values that lie very far away from the vast majority of the other sample values.

58
Q

CH 1-3

Time

A

Changing characteristics of the data over time.

59
Q

CH 1-3

Frequency distribution (frequency table)

A

Shows how a data set is partitioned among all of several categories (or classes) by listing all of the categories along with the number of data values in each of the categories.

60
Q

CH 1-3

Frequency

A

The number of original values within a particular class.

61
Q

CH 1-3

Lower class limits

A

The smallest numbers that can belong to the different classes.

62
Q

CH 1-3

Upper class limits

A

The largest numbers that can belong to the different classes.

63
Q

CH 1-3

Class boundaries

A

The numbers used to separate classes, but without the gaps caused by class limits.

64
Q

CH 1-3

Class midpoints

A

The values in the middle of the classes.

65
Q

CH 1-3

Class width

A

The difference between two consecutive lower class limits or two consecutive lower class boundaries in a frequency distribution.
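
A small worked example in Python tying together class limits, boundaries, midpoints, and width. The class limits 50-59, 60-69, 70-79 are made up for illustration, and the boundary rule used here (split the gap between adjacent classes in half) is the usual convention for such limits.

```python
# Hypothetical class limits for three classes: 50-59, 60-69, 70-79.
lower_limits = [50, 60, 70]
upper_limits = [59, 69, 79]

# Class width: difference between two consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Class midpoints: halfway between each lower and upper class limit.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Class boundaries: split the gap between one class's upper limit
# and the next class's lower limit.
gap = lower_limits[1] - upper_limits[0]
boundaries = [lower_limits[0] - gap / 2] + [hi + gap / 2 for hi in upper_limits]

print(class_width)  # 10
print(midpoints)    # [54.5, 64.5, 74.5]
print(boundaries)   # [49.5, 59.5, 69.5, 79.5]
```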

66
Q

CH 1-3

Relative frequency distribution (percentage frequency distribution)

A

The frequency of a class is replaced with a relative frequency (a proportion) or a percentage frequency (a percent).

67
Q

CH 1-3

Sum of relative frequencies

A

The sum of the relative frequencies in a relative frequency distribution must be close to 1 (or 100%).

68
Q

CH 1-3

Cumulative frequency

A

The sum of the frequencies for that class and all previous classes.
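
A minimal Python sketch that builds a frequency table with relative and cumulative frequencies. The scores, the class limits, and the function name `frequency_table` are hypothetical, and the upper class limit is computed assuming integer data.

```python
def frequency_table(values, lower_limits, class_width):
    """Return one row per class with frequency, relative and cumulative frequency."""
    n = len(values)
    cumulative = 0
    table = []
    for lower in lower_limits:
        upper = lower + class_width - 1  # upper class limit, assuming integer data
        frequency = sum(lower <= v <= upper for v in values)
        cumulative += frequency
        table.append({
            "class": (lower, upper),
            "frequency": frequency,
            "relative": frequency / n,
            "cumulative": cumulative,
        })
    return table

# Hypothetical test scores.
scores = [52, 55, 61, 64, 67, 68, 70, 73, 75, 77, 78, 81, 84, 88, 93]
table = frequency_table(scores, lower_limits=[50, 60, 70, 80, 90], class_width=10)
for row in table:
    print(row)
```

The last cumulative frequency equals the sample size, and the relative frequencies sum to 1 up to rounding.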

69
Q

CH 1-3

Normal frequency distribution

A
  1. The frequencies start low, then increase to 1 or 2 high frequencies, then decrease to a low frequency.
  2. The distribution is approximately symmetric, with frequencies preceding the maximum being roughly a mirror image of those that follow the maximum.
70
Q

CH 1-3

Histogram

A

A graph consisting of bars of equal width drawn adjacent to each other (without gaps).

  • The horizontal scale represents classes of quantitative data values.
  • The vertical scale represents frequencies.
  • The heights of the bars correspond to the frequency values.
71
Q

CH 1-3

Relative frequency histogram

A
  • Same shape and horizontal scale as a histogram.
  • The vertical scale is marked with relative frequencies, as percentages or proportions, instead of actual frequencies.

72
Q

CH 1-3

Frequency polygon

A

Uses line segments connected to points located directly above class midpoint values.

73
Q

CH 1-3

Relative frequency polygon

A

A frequency polygon that uses relative frequencies, either proportions or percentages, for the vertical scale.

74
Q

CH 1-3

Ogive

A

A line graph that depicts cumulative frequencies.

75
Q

CH 1-3

Dotplot

A

A graph in which each data value is plotted as a point (or dot) along a scale of values. Dots representing equal values are stacked.

76
Q

CH 1-3

Stemplot (stem-and-leaf plot)

A

Represents quantitative data by separating each value into 2 parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit).

77
Q

CH 1-3

Bar graph

A

Uses bars of equal width to show frequencies of categories of qualitative data.

  • Vertical scale represents frequencies or relative frequencies.
  • Horizontal scale identifies the different categories of qualitative data.
  • Bars may or may not be separated by small gaps.
78
Q

CH 1-3

Multiple bar graph

A

Has 2 or more sets of bars, and is used to compare 2 or more data sets.

79
Q

CH 1-3

Pareto chart

A

A bar graph for qualitative data, with the added stipulation that the bars are arranged in descending order according to frequencies.

  • Vertical scale - frequencies or relative frequencies.
  • Horizontal scale - different categories of qualitative data.
80
Q

CH 1-3

Pie chart

A

A graph that depicts qualitative data as slices of a circle, in which the size of each slice is proportional to the frequency count for the category.

81
Q

CH 1-3

Scatterplot (scatter diagram)

A

A plot of paired (x, y) quantitative data with a horizontal x-axis and a vertical y-axis.

  • Horizontal axis - first (x) value
  • Vertical axis - second (y) value
82
Q

CH 1-3

Time-series graph

A

A graph of time-series data, which are quantitative data that have been collected at different points in time.

83
Q

CH 1-3

Descriptive statistics

A

Summarize or describe relevant characteristics of data.

84
Q

CH 1-3

Inferential statistics

A

Used to make inferences, or generalizations, about a population.

85
Q

CH 1-3

Mean (arithmetic mean)

A

The measure of a data set’s center, found by adding the data values and dividing by the number of data values.

86
Q

CH 1-3

Sample size

A

The number of data values.

87
Q

CH 1-3

Measure of center

A

A value at the center or middle of a data set.

88
Q

CH 1-3

Median

A

The measure of center that is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude.

89
Q

CH 1-3

Mode

A

The value that occurs with the greatest frequency.

90
Q

CH 1-3

Bimodal

A

A data set with 2 data values that occur with the same greatest frequency.

91
Q

CH 1-3

Multimodal

A

A data set with more than 2 data values which occur with the same greatest frequency.

92
Q

CH 1-3

No mode

A

A data set has no mode when no data value is repeated.

93
Q

CH 1-3

Midrange

A

The measure of center that is the value midway between the maximum and minimum values in the original data set.
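
A short Python example of the four measures of center, using the standard `statistics` module on a made-up data set. Note that `statistics.multimode` returns every value tied for the greatest frequency, so for data with no repeated value it returns all values, whereas the textbook convention is to say there is no mode.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical sample values

mean = statistics.mean(data)              # sum of values / number of values
median = statistics.median(data)          # middle value of the sorted data
modes = statistics.multimode(data)        # value(s) with the greatest frequency
midrange = (max(data) + min(data)) / 2    # midway between maximum and minimum

print(mean, median, modes, midrange)

# A bimodal data set: two values share the greatest frequency.
bimodal_modes = statistics.multimode([1, 1, 2, 2, 3])
print(bimodal_modes)
```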

94
Q

CH 1-3

Round-off rule, mean, median and midrange

A

Carry one more decimal place than is present in the original set of values.

95
Q

CH 1-3

Weighted mean

A

The mean calculated when data values are assigned different weights.
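
As an illustration, a weighted mean can be computed as the sum of each value times its weight, divided by the sum of the weights. The grade points and credit hours below are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical grade points weighted by credit hours.
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 1]
gpa = weighted_mean(grade_points, credit_hours)
print(gpa)  # 3.25
```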

96
Q

CH 1-3

Skewed

A

A distribution of data is skewed if it is not symmetric and extends more to one side than to the other.

97
Q

CH 1-3

Symmetric

A

A distribution of data is symmetric if the left half of its histogram is roughly a mirror image of its right half.

98
Q

CH 1-3

Negatively skewed (skewed to the left)

A

Data with a longer left tail, and a mean and median to the left of the mode.

99
Q

CH 1-3

Positively skewed (skewed to the right)

A

Data with a longer right tail, and a mean and median to the right of the mode.

100
Q

CH 1-3

Range

A

The difference between the maximum data value and the minimum data value.

101
Q

CH 1-3

Standard deviation

A

A measure of variation of data values about the mean.

102
Q

CH 1-3

Variance

A

A measure of variation equal to the square of the standard deviation.

Sample variance is an unbiased estimator of the population variance.
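
A brief Python illustration using the standard `statistics` module on made-up data. `statistics.variance` and `statistics.stdev` divide by n - 1 (the sample versions), while `statistics.pvariance` divides by n (the population version).

```python
import statistics

data = [1, 3, 5, 7, 9]  # hypothetical sample; mean is 5

sample_variance = statistics.variance(data)        # sum of squared deviations / (n - 1)
population_variance = statistics.pvariance(data)   # sum of squared deviations / n
sample_sd = statistics.stdev(data)                 # square root of the sample variance

print(sample_variance, population_variance, sample_sd)
```

Here the squared deviations from the mean sum to 40, so the sample variance is 40 / 4 = 10 and the population variance is 40 / 5 = 8.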

103
Q

CH 1-3

Unbiased

A

The sample variance tends to target the population variance instead of systematically over- or underestimating it.

104
Q

CH 1-3

Range rule of thumb

A

For many data sets, the vast majority (~95%) of sample values lie within 2 standard deviations of the mean.

105
Q

CH 1-3

Empirical rule

A

For data sets having a distribution that is approximately bell-shaped:

  • 68% (1 SD)
  • 95% (2 SD)
  • 99.7% (3 SD)
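
The empirical-rule percentages can be checked against an exact normal distribution. This sketch uses `statistics.NormalDist` (Python 3.8+); the helper name `proportion_within` is illustrative. Real bell-shaped data will only match these figures approximately.

```python
from statistics import NormalDist

def proportion_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(proportion_within(k) * 100, 1))
```

The results round to about 68%, 95%, and 99.7% for 1, 2, and 3 standard deviations.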
106
Q

CH 1-3

Chebyshev’s Theorem

A

The proportion (or fraction) of any data set lying within K SDs of the mean is always at least 1 - 1/K^2, where K is any number greater than 1.
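
A quick Python check of the Chebyshev bound for a few values of K. The function name `chebyshev_bound` is illustrative.

```python
def chebyshev_bound(k):
    """Minimum proportion of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

print(chebyshev_bound(2))  # 0.75, so at least 75% of values lie within 2 SDs
print(chebyshev_bound(3))  # about 0.889, so at least 89% lie within 3 SDs
```

Unlike the empirical rule, this bound holds for any distribution, which is why it is weaker (at least 75% within 2 SDs rather than about 95%).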

107
Q

CH 1-3

Mean absolute deviation (MAD)

A

The mean distance of data from the mean.
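
A minimal Python sketch of the mean absolute deviation on made-up data: average the absolute distances of the values from their mean.

```python
def mean_absolute_deviation(values):
    """Average of the absolute distances between each value and the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

data = [1, 3, 5, 7, 9]  # hypothetical data; mean is 5
mad = mean_absolute_deviation(data)
print(mad)  # distances 4, 2, 0, 2, 4 average to 2.4
```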

108
Q

CH 1-3

Coefficient of variation (CV)

A

Describes the standard deviation relative to the mean, expressed as a percent.
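
For illustration, the coefficient of variation for a sample can be computed as the sample standard deviation divided by the mean, times 100. The data below are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation relative to the mean, as a percent."""
    return statistics.stdev(values) / statistics.mean(values) * 100

data = [1, 3, 5, 7, 9]  # hypothetical sample: mean 5, s = sqrt(10)
cv = coefficient_of_variation(data)
print(round(cv, 1))  # 63.2
```

Because the CV has no units, it allows variation to be compared between data sets measured on different scales.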

109
Q

CH 1-3

z score (standardized value)

A

The number of standard deviations that a given value is above or below the mean.

  • Ordinary: z score between -2 and 2
  • Unusual: z score less than -2 or greater than 2
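
A small Python sketch of the z score and the ordinary/unusual cutoff stated above. The mean of 100 and standard deviation of 15 are hypothetical values chosen for the example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def is_unusual(z):
    """Unusual values have z scores less than -2 or greater than 2."""
    return z < -2 or z > 2

# Hypothetical scale with mean 100 and standard deviation 15.
z_130 = z_score(130, 100, 15)
z_136 = z_score(136, 100, 15)
print(z_130, is_unusual(z_130))  # 2.0 False
print(z_136, is_unusual(z_136))  # 2.4 True
```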
110
Q

CH 1-3

Percentile

A

A measure of location dividing a set of data into 100 groups with about 1% of the values in each group.

111
Q

CH 1-3

Quartile

A

Measures of location, denoted Q1, Q2, and Q3, which divide a set of data into 4 groups with about 25% of the values in each group.

112
Q

CH 1-3

Q1 (first quartile)

A

Separates the bottom 25% of the sorted values from the top 75%.

At least 25% of the sorted values are less than or equal to Q1, and at least 75% of the values are greater than or equal to Q1.

113
Q

CH 1-3

Q2 (second quartile)

A

Same as the median; separates the bottom 50% of the sorted values from the top 50%.

114
Q

CH 1-3

Q3 (third quartile)

A

Separates the bottom 75% of the sorted values from the top 25%.

At least 75% of the sorted values are less than or equal to Q3, and at least 25% of the values are greater than or equal to Q3.

115
Q

CH 1-3

Interquartile range (IQR)

A

Q3 - Q1

116
Q

CH 1-3

Semi-quartile range

A

(Q3 - Q1) / 2

117
Q

CH 1-3

Midquartile

A

(Q3 + Q1) / 2

118
Q

CH 1-3

10-90 percentile range

A

P90 - P10

119
Q

CH 1-3

5-number summary

A
  • Minimum
  • Q1 (first quartile)
  • Q2 (median)
  • Q3 (third quartile)
  • Maximum
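
A Python sketch of the quartile-based measures and the 5-number summary on made-up data. It uses `statistics.quantiles` (Python 3.8+) with `method="inclusive"`; this is only one of several quartile conventions, so a textbook or calculator may give slightly different values for other data sets.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical sorted data

# Quartiles under the "inclusive" convention (an assumption; conventions vary).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1                        # interquartile range
semi_quartile_range = (q3 - q1) / 2  # half the interquartile range
midquartile = (q3 + q1) / 2          # midway between Q1 and Q3

five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary)  # (1, 3.0, 5.0, 7.0, 9)
print(iqr, semi_quartile_range, midquartile)
```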
120
Q

CH 1-3

Boxplot (box-and-whisker diagram)

A

A graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile Q1, the median, and the third quartile Q3.