Chap I Flashcards

1
Q

Individuals

A

The objects described by a set of data - can be people, animals, or things

2
Q

Variable

A

Any characteristic of an individual - can take different values for different individuals

3
Q

Categorical Variable

A

Places an individual into one of several groups or categories - values are names or labels

4
Q

Quantitative Variable

A

Takes numerical values for which it makes sense to find an average - represents a measurable quantity

5
Q

Discrete Variables

A

A variable that cannot take on every value between its minimum and maximum value, only separate, countable values - for example, when flipping a coin repeatedly, the number of heads can be any integer from 0 upward, but it could not be a value such as 2.5, because you cannot get 2.5 heads.

6
Q

Continuous Variable

A

A variable that can take on any value between its minimum and maximum value - for example, the weight of a firefighter, which could be any value between 150 and 250 pounds (such as 182.37 pounds).

7
Q

Univariate Data

A

A study that looks at only one variable - e.g. a study that looks at the weight of high school students

8
Q

Bivariate Data

A

A study that examines the relationship between two variables - e.g. a study looking at the relationship between the height and weight of high school students.

9
Q

Population

A

The total set of observations that can be made

10
Q

Sample

A

A set of observations drawn from a population

11
Q

Census

A

A study that obtains data from every member of a population - often not practical because of the time and cost involved.

12
Q

Distribution

A

Tells us what values the variable takes and how often it takes those values

13
Q

Inference

A

Drawing conclusions that go beyond the data at hand - whether the conclusions are justified depends on how the data were produced

14
Q

Frequency Table

A

Displays the counts (frequencies) of a variable in each category

15
Q

Relative Frequency Table

A

Displays the percentages (relative frequencies) of a variable in each category
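
A minimal sketch of both kinds of table, assuming a small made-up list of categorical values (the `colors` data and the function names are illustrative only):

```python
from collections import Counter

# Hypothetical categorical data: a color reported by each of 10 individuals.
colors = ["red", "blue", "red", "green", "blue",
          "red", "blue", "red", "green", "red"]


def frequency_table(values):
    """Return {category: count} for a list of categorical values."""
    return dict(Counter(values))


def relative_frequency_table(values):
    """Return {category: percentage of all observations}."""
    total = len(values)
    return {category: 100 * count / total
            for category, count in Counter(values).items()}


freq = frequency_table(colors)
rel_freq = relative_frequency_table(colors)
print(freq)
print(rel_freq)
```

The relative frequencies are just the counts divided by the total number of observations, so they add up to 100% (apart from any roundoff if they are rounded for display).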

16
Q

Interquartile Range (IQR)

A

Measures the range of the middle 50% of the data - a measure of variability, equal to Q3 - Q1.

17
Q

Five-Number Summary

A

Consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation - these five values divide the distribution roughly into quarters.
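
A minimal sketch of the calculation, assuming the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data (leaving out the overall median when the count is odd). Software packages may use other quartile conventions, and the `scores` data is made up:

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); needs at least two observations."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]           # observations left of the median position
    upper = values[half + n % 2:]   # observations right of the median position
    return (values[0], median(lower), median(values), median(upper), values[-1])


scores = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # hypothetical data
minimum, q1, med, q3, maximum = five_number_summary(scores)
iqr = q3 - q1  # interquartile range: spread of the middle 50% of the data
print(minimum, q1, med, q3, maximum, iqr)
```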

18
Q

Boxplot

A

A type of graph used to display patterns of quantitative data by splitting the data into quartiles. It consists of a box that spans from Q1 to Q3, with a line inside the box marking the median, and lines (whiskers) extending from the box to the smallest and largest observations that aren’t outliers.

19
Q

Standard Deviation

A

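A numerical value used to indicate how widely individuals in a group vary - measures the typical deviation from the mean, and the formula differs for a population and a sample. Standard deviation for a population is found using $\sigma = \sqrt{\sum (X_i - \bar{X})^2 / N}$, where $\bar{X}$ is the population mean and $N$ is the population size. Standard deviation for a sample is found using $s = \sqrt{\sum (x_i - \bar{x})^2 / (n - 1)}$, where $\bar{x}$ is the sample mean and $n$ is the sample size.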

20
Q

Variance

A

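A numerical value used to indicate how widely objects in a group vary - equal to the square of the standard deviation. Variance of a population is found using $\sigma^2 = \sum (X_i - \bar{X})^2 / N$, where $\bar{X}$ is the population mean and $N$ is the population size. Variance of a sample is found using $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$, where $\bar{x}$ is the sample mean and $n$ is the sample size.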
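
A minimal sketch of the population and sample formulas, assuming a small made-up data set. The helper names are illustrative; Python's standard `statistics` module offers `pvariance`, `variance`, `pstdev`, and `stdev` for the same quantities:

```python
import math


def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, dividing by n - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

pop_var = population_variance(data)
pop_sd = math.sqrt(pop_var)      # standard deviation = square root of variance
samp_var = sample_variance(data)
samp_sd = math.sqrt(samp_var)
print(pop_var, pop_sd, samp_var, samp_sd)
```

The only difference between the two versions is the divisor, so the sample values are always slightly larger than the population values for the same data.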

21
Q

Roundoff Error

A

When the exact percentages add up to 100%, but the rounded percentages only come close - does not indicate mistakes in work

22
Q

Pie Chart

A

Shows the distribution of a categorical variable as a pie, with slices sized by the count or percentage in each category - must include all the categories that make up the whole

23
Q

Bar chart

A

Represents each category as a bar, where the heights show the count or percentage - more flexible than a pie chart, since it can display the distribution of a categorical variable or compare quantities.

24
Q

Two-Way Table

A

Examines relationships between categorical variables - contains a row variable and a column variable

25
Q

Marginal Distribution

A

The distribution of values of one of the categorical variables in a two-way table of counts, among all individuals described by the table. Percentages are often more informative than counts: to get the marginal distribution, divide each row (or column) total by the table total and convert to a percentage.

26
Q

Conditional Distribution

A

Describes the values of one variable among individuals who have a specific value of another variable - there is a separate conditional distribution for each value of the other variable, and it is often expressed in relative frequencies
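
A minimal sketch of marginal and conditional distributions, assuming a made-up two-way table stored as nested dictionaries (the row variable is gender and the column variable is a yes/no answer; the counts and function names are hypothetical):

```python
# Hypothetical two-way table of counts: rows = gender, columns = answer.
table = {
    "female": {"yes": 30, "no": 20},
    "male": {"yes": 10, "no": 40},
}


def marginal_distribution_of_rows(table):
    """Percent of all individuals in each row: row total / table total."""
    table_total = sum(sum(row.values()) for row in table.values())
    return {label: 100 * sum(row.values()) / table_total
            for label, row in table.items()}


def conditional_distribution(table, row_label):
    """Percent in each column among individuals in one row only."""
    row = table[row_label]
    row_total = sum(row.values())
    return {column: 100 * count / row_total for column, count in row.items()}


marginal = marginal_distribution_of_rows(table)
given_female = conditional_distribution(table, "female")
given_male = conditional_distribution(table, "male")
print(marginal, given_female, given_male)
```

In this made-up table the conditional distributions for the two rows differ (60% vs. 20% answering yes), which is what an association between the two variables looks like.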

27
Q

Segmented Bar Graph

A

A bar graph with one bar for each category of one variable (ex: male/female), where each bar is divided into connected segments for the categories of a second variable, with the segments of each bar adding up to 100%.

28
Q

Side-by-Side Bar Graph

A

A bar graph where the categories of one variable (ex: male/female) are shown as two (or more) separate bars placed next to each other, with that group of bars repeated for each category of a second variable

29
Q

Association

A

When knowing the value of one variable helps to predict the value of the other

30
Q

Simpson’s Paradox

A

An effect where the marginal association between two categorical variables is qualitatively different from (often the reverse of) the partial association between the same two variables within each category of a third variable - in short, combined data and averages can be misleading
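
A minimal sketch of the effect, using made-up counts chosen only to produce the reversal (the treatments "A" and "B", the "easy"/"hard" groups, and all numbers are hypothetical):

```python
# Hypothetical (successes, total) counts for two treatments in two groups.
counts = {
    "easy": {"A": (9, 10), "B": (80, 100)},
    "hard": {"A": (30, 100), "B": (2, 10)},
}


def group_rate(counts, group, treatment):
    """Success rate for one treatment within one group."""
    successes, total = counts[group][treatment]
    return successes / total


def overall_rate(counts, treatment):
    """Success rate for one treatment with the groups combined."""
    successes = sum(group[treatment][0] for group in counts.values())
    total = sum(group[treatment][1] for group in counts.values())
    return successes / total


for group in counts:
    print(group, group_rate(counts, group, "A"), group_rate(counts, group, "B"))
print("overall", overall_rate(counts, "A"), overall_rate(counts, "B"))
```

Treatment A has the higher success rate inside each group, yet B has the higher rate once the groups are combined, because A was used mostly on hard cases and B mostly on easy ones.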

31
Q

Dotplot

A

A plot where each data value is shown as a dot above its location on a number line.

32
Q

Shape

A

Describes the way a graph looks - focus on the main features, such as major peaks, clusters, obvious gaps, and potential outliers

33
Q

Mode

A

the most common value

34
Q

Center

A

the midpoint of the data

35
Q

Spread

A

Similar to range, but not a single value - stated as "the data vary from __ to __"

36
Q

Range

A

A measure of variability that shows the full spread of the data - a single value found by subtracting the smallest value from the largest value

37
Q

Outlier

A

Any observation that falls more than 1.5 × IQR above the third quartile or below the first quartile
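
A minimal sketch of the 1.5 × IQR rule, assuming quartiles are taken as the medians of the lower and upper halves of the sorted data (other quartile conventions can give slightly different fences). The data and function name are made up for illustration:

```python
from statistics import median


def find_outliers(data):
    """Return the observations beyond 1.5 x IQR from the quartiles."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    q1 = median(values[:half])            # median of the lower half
    q3 = median(values[half + n % 2:])    # median of the upper half
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]


data = [10, 12, 13, 14, 15, 16, 18, 40]  # hypothetical data
print(find_outliers(data))
```

For this data Q1 = 12.5 and Q3 = 17, so the IQR is 4.5 and the fences sit at 5.75 and 23.75; only the value 40 falls outside them.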

38
Q

Symmetric Distribution

A

When the right and left sides of a graph are approximately mirror images of each other

39
Q

Skewed Right

A

When the right side of the graph is much longer than the left - the skew is named for the direction of the long tail

40
Q

Skewed Left

A

When the left side of the graph is much longer than the right - the skew is named for the direction of the long tail

41
Q

Unimodal

A

having a single peak

42
Q

Bimodal

A

having two clear peaks

43
Q

Multimodal

A

having more than two clear peaks

44
Q

Stemplot

A

A plot used to display quantitative data, usually from smaller data sets, consisting of a stem (all but the final digit of an observation) and leaves (the final digit of an observation)
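
A minimal sketch of building a stemplot as text, assuming non-negative whole-number observations so that the stem is the value without its final digit and the leaf is the final digit (the function name and data are made up):

```python
from collections import defaultdict


def stemplot(data):
    """Return one text line per stem for non-negative integers."""
    stems = defaultdict(list)
    for value in sorted(data):
        stems[value // 10].append(value % 10)  # stem = all but the last digit
    lines = []
    # Walk every stem from smallest to largest so gaps stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines


data = [12, 15, 21, 23, 23, 38, 40]  # hypothetical data
for line in stemplot(data):
    print(line)
```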

45
Q

Splitting Stems

A

Dividing a stem into further pieces - e.g. a stem with leaves 0-9 becomes two copies of that stem, one holding leaves 0-4 and the other holding leaves 5-9

46
Q

Back-to-Back Stem

A

A stemplot where leaves are on either side of the stem, often to represent two different categories of data

47
Q

Plots

A

A graphing technique used to represent a data set, often showing the relationship between two or more variables

48
Q

Histogram

A

A graph of the distribution of a quantitative variable, where nearby values are grouped together into classes

49
Q

Mean

A

An average score that shows how large each data value would be if the total were split equally amongst the observations - found by taking the sum of the individual scores and dividing it by the number of individuals. Not a resistant measure of center.

50
Q

Median

A

The midpoint of a distribution, where about half of the observations are smaller than the value and about half are larger. A resistant measure of center.
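
A minimal sketch of what "resistant" means for the mean and the median, assuming a made-up list of values to which one extreme value is added:

```python
from statistics import mean, median

values = [30, 32, 35, 38, 40]   # hypothetical data
with_extreme = values + [400]   # same data plus one extreme observation

mean_shift = mean(with_extreme) - mean(values)
median_shift = median(with_extreme) - median(values)
print(mean(values), mean(with_extreme), mean_shift)
print(median(values), median(with_extreme), median_shift)
```

Adding the single extreme value pulls the mean up by more than 60, while the median moves by only 1.5, which is why the median is described as resistant and the mean is not.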