Chapter 1 Vocab Flashcards

1
Q

Data Analysis

A

Organizing, displaying, summarizing, and asking questions about data.

2
Q

Individuals

A

Objects described by a set of data. Individuals may be people, animals or things.

3
Q

Variable

A

A characteristic of an individual. A variable can take different values for different individuals.

4
Q

Categorical Variable

A

Places an individual into one of several groups or categories. (Variables that take on values that are names or labels)

5
Q

Quantitative Variable

A

A variable measured on a numeric or quantitative scale. (Takes numerical values for which it makes sense to find an average.)

6
Q

Distribution

A

The pattern of variation of a variable. Tells us what values the variable takes and how often it takes each value.

7
Q

Inference

A

Drawing conclusions that go beyond the data at hand.

8
Q

Frequency Table

A

Displays the counts (frequencies) of individuals in each category, for example the number of stations in each format category.

9
Q

Relative Frequency Table

A

Shows the percents (relative frequencies) of individuals in each category, for example the percent of stations in each format category.
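A minimal sketch of how a frequency table and a relative frequency table could be built from raw categorical data. The list of station formats is made up for illustration, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical data: the format of each of 10 radio stations.
formats = ["Rock", "News", "Rock", "Country", "News",
           "Rock", "Country", "Rock", "News", "Rock"]

def frequency_table(values):
    """Count how many individuals fall in each category."""
    return dict(Counter(values))

def relative_frequency_table(values):
    """Convert each category count to a percent of all individuals."""
    counts = Counter(values)
    total = len(values)
    return {category: 100 * count / total for category, count in counts.items()}

freq = frequency_table(formats)
rel_freq = relative_frequency_table(formats)

# Print one row per category: name, count, percent.
for category in freq:
    print(f"{category:8s} {freq[category]:3d} {rel_freq[category]:6.1f}%")
```

With this made-up list, Rock has a count of 5 and a relative frequency of 50%.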

10
Q

Roundoff Error

A

The difference between a rounded approximation of a number used in computation and its exact value. In a relative frequency table, roundoff error is why the rounded percents may not add up to exactly 100%.
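A small sketch of roundoff error in a relative frequency table, assuming three made-up categories with equal counts. Each exact percent is 33.33...%, so rounding to one decimal place loses a little in every category.

```python
# Hypothetical counts for three categories (not from the flashcards).
counts = {"A": 1, "B": 1, "C": 1}
total = sum(counts.values())

# Exact percents, then percents rounded to one decimal place.
exact = {k: 100 * v / total for k, v in counts.items()}
rounded = {k: round(p, 1) for k, p in exact.items()}

# Roundoff error for each category: rounded value minus exact value.
errors = {k: rounded[k] - exact[k] for k in counts}

print(rounded)
print(sum(rounded.values()))  # falls slightly short of 100
```

Each rounded percent is 33.3, so the rounded column adds up to about 99.9 rather than 100.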

11
Q

Pie Chart

A

Shows the distribution of a categorical variable as a “pie” whose slices are sized by the counts or percents for the categories.

12
Q

Bar Graph

A

Represents each category as a bar. The bar heights show the category counts or percents.

13
Q

Two-Way Tables

A

A table of counts (frequencies) for two categorical variables, with the rows showing the categories of one variable and the columns showing the categories of the other.

14
Q

Marginal Distribution

A

The marginal distribution of one of the categorical variables in a two-way table is the distribution of values of that variable among all individuals described by the table. Essentially the row totals or column totals of the two-way table, given as counts or percents.

15
Q

Conditional Distribution

A

Describes the values of a variable among individuals who have a specific value of another variable. There is a separate conditional distribution for each value of the other variable.
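A sketch of marginal and conditional distributions computed from a two-way table. The table of counts, the variable names, and the function names are all assumptions made for illustration.

```python
# Hypothetical two-way table of counts: rows are gender, columns are an answer.
table = {
    "Female": {"Yes": 30, "No": 20},
    "Male":   {"Yes": 10, "No": 40},
}

def grand_total(table):
    """Total number of individuals described by the table."""
    return sum(sum(row.values()) for row in table.values())

def marginal_rows(table):
    """Marginal distribution of the row variable, in percents."""
    total = grand_total(table)
    return {r: 100 * sum(row.values()) / total for r, row in table.items()}

def marginal_columns(table):
    """Marginal distribution of the column variable, in percents."""
    total = grand_total(table)
    col_totals = {}
    for row in table.values():
        for col, n in row.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {c: 100 * t / total for c, t in col_totals.items()}

def conditional_distribution(table, row_value):
    """Distribution of the column variable among individuals in one row."""
    row = table[row_value]
    row_total = sum(row.values())
    return {c: 100 * n / row_total for c, n in row.items()}

print(marginal_rows(table))
print(marginal_columns(table))
print(conditional_distribution(table, "Female"))
print(conditional_distribution(table, "Male"))
```

In this made-up table, 60% of females but only 20% of males answered Yes. Conditional distributions that differ like this are what an association between two variables looks like.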

16
Q

Side by Side Bar Graph

A

A bar graph that compares two categorical variables. For each category along the x-axis, the bars for the groups being compared are drawn next to each other in different colors, making it easier to compare the groups.

17
Q

Segmented Bar Graph

A

Used for showing the parts of a whole. Each category gets one bar, and the segments within that bar are stacked on top of each other. Different colors show the distinct parts of the whole bar.

18
Q

Association

A

The relationship between two variables. Two variables have an association when knowing the value of one variable helps predict the value of the other.

19
Q

Dotplot

A

A graph that displays quantitative data by showing each data value as a dot above its location on the number line.

20
Q

Overall Pattern

A

Describes the distribution by the shape, center and spread of the data.

21
Q

Departures

A

An individual value that falls outside the overall pattern.

22
Q

Center

A

Center is the median and/or mean of the data.

23
Q

Spread

A

The spread describes how much the data vary. It can be measured by the range, the interquartile range, or the standard deviation.

24
Q

Shape

A

The shape describes the form of the distribution as shown in its graph. The four ways to describe shape are whether it is symmetric, how many peaks it has, whether it is skewed to the left or right, and whether it is uniform.

25
Q

Outlier

A

A data point that differs significantly from other observations.

26
Q

Mode

A

The most common value of a data set.

27
Q

Symmetric Distribution

A

When the right and left sides of the graph are approximately mirror images of each other.

28
Q

Right-Skewed Graph

A

A graph whose tail is on the right side: the right side of the graph (containing the half of the observations with larger values) is much longer than the left side.

29
Q

Left-Skewed Graph

A

A graph whose tail is on the left side: the left side of the graph (containing the half of the observations with smaller values) is much longer than the right side.

30
Q

Unimodal Dot Plot

A

A dot plot in which there is only one peak

31
Q

Bimodal Dot Plot

A

A dot plot in which there are two peaks

32
Q

Stemplot

A

A plot where each data value is split into a “leaf” (usually the last digit) and a “stem” (the other digits). Stems are written in a vertical column from the smallest at the top to the largest at the bottom, and no stem is skipped, even if it has no data values. A vertical line is drawn to the right of this column, and each leaf is written to the right of the line, next to its stem. The leaves on each stem are arranged in increasing order moving out from the stem. (Ex: [5|2 4] represents the values 52 and 54.)
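The construction steps above can be sketched in a few lines. This is only an illustration that assumes non-negative whole-number data with the ones digit as the leaf; the function name and the sample values (other than 52 and 54) are made up.

```python
from collections import defaultdict

def stemplot(values):
    """Return the lines of a stemplot for non-negative integers.

    The leaf is the ones digit; the stem is all of the other digits.
    """
    leaves = defaultdict(list)
    for v in sorted(values):          # sorting puts each stem's leaves in order
        leaves[v // 10].append(v % 10)

    lines = []
    # List every stem from smallest to largest, even stems with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem}|{leaf_text}".rstrip())
    return lines

for line in stemplot([52, 54, 61, 68, 69, 83]):
    print(line)
```

The stem 7 still appears in the output with no leaves, because no stem is skipped.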

33
Q

Stem

A

All digits but the final (ones) digit

34
Q

Leaf

A

The final (ones) digit

35
Q

Splitting Stems

A

A method used to spread out a stemplot so that the shape of the distribution is easier to identify. Each stem is written twice: leaves 0-4 go on the first copy of the stem and leaves 5-9 go on the second copy.

36
Q

Back-to-Back Stemplot

A

A stemplot used for comparing two sets of quantitative data. The two sets share one column of stems: the leaves of one data set are written to the left of the stems, and the leaves of the other data set are written to the right.

37
Q

Histogram

A

A diagram consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval. Displays the distribution of a quantitative variable.
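A rough sketch of the counting behind a histogram, assuming equal-width classes where each class includes its left endpoint but not its right endpoint. The data, the class boundaries, and the function name are made up for illustration.

```python
def histogram_counts(values, start, width, n_classes):
    """Count how many values fall in each equal-width class.

    Class k covers start + k*width <= value < start + (k+1)*width.
    Values outside all classes are ignored.
    """
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts

data = [52, 54, 61, 68, 69, 83]
counts = histogram_counts(data, start=50, width=10, n_classes=4)

# Text version of the histogram: one row of stars per class.
for k, c in enumerate(counts):
    low = 50 + 10 * k
    print(f"{low} to <{low + 10}: " + "*" * c)
```

Because every class has the same width here, the bar length for each class is proportional to its frequency.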

38
Q

Mean

A

The average of the data set. Also known as “x bar” (x̄), written as the letter x with a horizontal line above it. Found by adding all of the data points and dividing by the number of points that were added. Formula in compact notation: x̄ = (Σxᵢ) / n

39
Q

Resistant Measure of Center

A

A measure of center that is not strongly affected by outliers. If a measure of center changes a lot when an outlier is added, it is not resistant. The median is a resistant measure of center; the mean is not.

40
Q

Median

A

The midpoint of a distribution, the number such that about half the observations are smaller and about half are larger. Arrange all observations in order of size, from smallest to largest. If the number of observations n is odd, the median is the center observation in the ordered list. If the number of observations n is even, the median is the average of the two center observations in the ordered list.
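A short sketch of the mean and median calculations, with a made-up data set that shows how each measure of center reacts when one outlier is added. The function names and data values are assumptions for illustration.

```python
def mean(values):
    """Add all of the data points and divide by how many there are."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the ordered list (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [2, 3, 5, 6, 9]
with_outlier = data + [95]   # same data plus one extreme value

print(mean(data), median(data))                  # 5.0 5
print(mean(with_outlier), median(with_outlier))  # 20.0 5.5
```

Adding the outlier pulls the mean from 5 up to 20, while the median only moves from 5 to 5.5.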

41
Q

Range

A

Shows the full spread of the data. Calculated by subtracting the smallest number in the data set from the largest number in the data set. Because it depends only on the two most extreme values, the range is strongly affected by outliers.

42
Q

First Quartile

A

The median of the observations that are to the left of the median in the ordered list.

43
Q

Third Quartile

A

The median of the observations that are to the right of the median in the ordered list.

44
Q

Interquartile Range

A

IQR = Q3 − Q1. A measure of spread (statistical dispersion) equal to the difference between the third and first quartiles, that is, between the 75th and 25th percentiles.

45
Q

1.5 Outlier Rule

A

This rule uses the first and third quartile values as well as the IQR to identify outliers in the data set. The rule is: if a data point is less than Q1 − 1.5 × IQR or more than Q3 + 1.5 × IQR, it is an outlier.
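A sketch of the rule in code, using the quartile definition from the cards above (Q1 and Q3 are the medians of the observations to the left and right of the overall median). The data and function names are made up, and the data set is assumed to have at least two values.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]             # observations left of the median
    upper = ordered[half + n % 2:]     # observations right of the median
    return median(lower), median(upper)

def find_outliers(values):
    """Apply the 1.5 x IQR rule and return the values flagged as outliers."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

data = [2, 3, 5, 6, 9, 95]
print(quartiles(data))      # (3, 9)
print(find_outliers(data))  # [95]
```

Here IQR = 9 − 3 = 6, so the cutoffs are 3 − 9 = −6 and 9 + 9 = 18, and only 95 falls outside them.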

46
Q

Five Number Summary

A

A set of 5 numbers that summarize the center and spread of a data set. The 5 numbers are: the minimum data point, the first quartile, the median, the third quartile, and the maximum data point (in that order).
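A compact sketch of the five-number summary, assuming the quartiles are the medians of the lower and upper halves of the ordered list as defined in the quartile cards. The data and the function name are made up for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a data set."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])           # median of the lower half
    q3 = median(ordered[half + n % 2:])   # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]

print(five_number_summary([2, 3, 5, 6, 9, 12, 14]))  # (2, 3, 6, 12, 14)
```

These five values are the ones a boxplot is drawn from.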

47
Q

Boxplot

A

A graph that is formed by the 5 number summary, creating a visual on a number line that shows the quarters of the data set. A boxplot is arranged above a number line, with a central box drawn from the first quartile (Q1) to the third quartile (Q3), and a line inside the box to mark the median. Lines (called whiskers) extend from the box out to the smallest and largest observations that are not outliers. Outliers are marked with a special symbol like an asterisk.

48
Q

Deviation

A

The difference between a data point and the mean of the set (xᵢ − x̄). Its size is the distance of the data point from the mean.

49
Q

Variance

A

For a random variable, the expectation of the squared deviation from its mean. For a data set, the average of the squared deviations, where the sum is divided by n − 1 rather than n. Formula: sₓ² = Σ(xᵢ − x̄)² / (n − 1). Variance is written s² or sₓ².

50
Q

Standard Deviation

A

The “typical” distance of the values in the data set from the mean. Formula: standard deviation = √(variance), that is, sₓ = √(sₓ²). Standard deviation is also written sₓ.
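A worked sketch tying together deviation, variance, and standard deviation, assuming the sample formulas with n − 1 in the denominator. The data values and function names are made up for illustration.

```python
import math

def deviations(values):
    """Deviation of each data point from the mean: x_i - x_bar."""
    x_bar = sum(values) / len(values)
    return [x - x_bar for x in values]

def sample_variance(values):
    """Sum of squared deviations divided by n - 1."""
    squared = [d ** 2 for d in deviations(values)]
    return sum(squared) / (len(values) - 1)

def sample_standard_deviation(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 6, 9]            # mean is 5
print(deviations(data))           # [-3.0, -1.0, -1.0, 1.0, 4.0]
print(sample_variance(data))      # 7.0
print(sample_standard_deviation(data))
```

The squared deviations are 9, 1, 1, 1, and 16, which sum to 28. Dividing by n − 1 = 4 gives a variance of 7, and the standard deviation is √7, about 2.65.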