vocab Flashcards

1
Q

Individuals

A

One of the people or objects described by a data set. Each individual contributes one data point in a survey.

2
Q

Variable

A

A characteristic that can take different values from one individual to another, or that can change in an experiment. Sometimes variables are not obvious and can cause confusing results.

3
Q

Categorical variable

A

A variable that can fall into one of multiple categories, as opposed to a quantitative one.

4
Q

Quantitative variable

A

A variable whose numerical value is meaningful. For example, although a zip code is a number, its numerical value is unimportant, so it is categorical rather than quantitative.

5
Q

Discrete variables

A

A type of data that can only come at set values. For example, number of pairs of socks can only be a whole number.

6
Q

Continuous

A

A type of data that can fall anywhere on a spectrum. For example, temperature in degrees Celsius isn’t restricted to discrete steps; it can take any value.

7
Q

Univariate data

A

Data with one variable associated. For example, when asking students how many socks they have, the only variable associated with that student is their number of socks.

8
Q

Bivariate data

A

Data with two variables associated. For example, a survey that asks students both how many socks they have and how many shoes they have is collecting bivariate data.

9
Q

Population

A

The group of people or objects that a survey collects data about. For an interview asking students at Grady how many socks they have, the population is the students at Grady.

10
Q

Sample

A

A sample is a number of individuals taken from a population that should represent that population fairly well. For example, instead of asking every student at Grady how many socks they have, one could instead take a sample of all the students, and only ask 20 random students from each grade.

11
Q

Census

A

A census is a survey which gathers data from every individual in a population. The goal is to get the most accurate data as no extrapolation is necessary. A good example is the US Census.

12
Q

Distribution

A

The distribution is how the data are spread out. It is easiest to understand when graphed on a dot plot, bar chart, or histogram. The distribution of the number of socks the students at Grady have would look quite different from the distribution of the incomes of families in the United States.

13
Q

Inference

A

An inference is an extrapolation from a sample to a whole population. For example, if in a sample of 80 Grady students, 78% reported having more than 20 pairs of socks, one could infer that a similar percentage of the whole population of Grady students would have more than 20 pairs of socks.

14
Q

Frequency Table

A

A table which matches an occurrence with the number of times it occurred.

15
Q

Relative Frequency Table

A

A relative frequency table matches an occurrence with the percent of times it occurred in relation to the total number of observations.
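
As a minimal sketch of both kinds of table, the Python below counts a small set of made-up survey responses (the colors and counts are hypothetical, chosen only for illustration) and then converts each count to a percent of the total.

```python
from collections import Counter

# Hypothetical survey responses: each student's favorite sock color.
responses = ["red", "blue", "blue", "green", "blue", "red", "blue", "green"]

# Frequency table: each occurrence matched with the number of times it occurred.
frequency = Counter(responses)

# Relative frequency table: each count as a percent of all observations.
total = len(responses)
relative_frequency = {color: 100 * count / total for color, count in frequency.items()}

for color in frequency:
    print(f"{color:>6}: {frequency[color]}  ({relative_frequency[color]:.1f}%)")
```

The relative frequencies should add up to 100%, apart from any roundoff error.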

16
Q

Roundoff error

A

The error created when values are rounded off at a certain point. For example, if 8 categories each had the same frequency (12.5%), but that was rounded to the nearest whole number, one might find that the total comes out to be 13% * 8 or 104%.
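
The arithmetic in this example can be checked with a short Python sketch. It assumes the "round half up" rule that the card implies, written out with the `decimal` module; Python's built-in `round()` rounds halves to the nearest even number, so `round(12.5)` gives 12 rather than 13.

```python
from decimal import Decimal, ROUND_HALF_UP

# Eight categories, each with an exact relative frequency of 12.5%.
exact = [Decimal("12.5")] * 8

# Round each percentage to the nearest whole number, with halves rounding up.
rounded = [p.quantize(Decimal("1"), rounding=ROUND_HALF_UP) for p in exact]

exact_total = sum(exact)      # the true total
rounded_total = sum(rounded)  # the total after rounding each category

print(exact_total, rounded_total)
```

The gap between the two totals is the roundoff error.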

17
Q

Pie chart

A

A visual chart showing how different categories contribute to a whole. Shown as a circle split into segments, where the central angle of each segment is determined by the relative frequency of that segment’s corresponding category.
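
A small sketch of how the segment angles follow from relative frequencies: each category's share of the whole is multiplied by 360 degrees. The categories and shares below are hypothetical.

```python
# Hypothetical relative frequencies, written as fractions of the whole.
relative_frequency = {"red": 0.25, "blue": 0.50, "green": 0.25}

# Each segment's angle is its share of the full 360-degree circle.
angles = {category: 360 * share for category, share in relative_frequency.items()}

for category, angle in angles.items():
    print(f"{category}: {angle} degrees")
```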

18
Q

Bar graph

A

A graph which represents data by using bars of different heights or lengths.

19
Q

Two-way table

A

A way of organizing data in a table so that each cell has a number in it which is the number of people who correspond with that cell’s location. For example, a two-way table could have the countries UK and US along the top, and preferred superpowers of students along the left. Each cell would house the number of students from that country who preferred that superpower.

20
Q

Marginal distribution

A

In a two-way table, the totals of each row and column. Whereas each cell contains the number of individuals matching two criteria, the marginal distributions contain the number of individuals who match with a single criterion.
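
To make the row and column totals concrete, here is a rough Python sketch using the superpower-by-country layout from the two-way table card. All counts are invented for illustration.

```python
# Hypothetical two-way table: table[superpower][country] = number of students.
table = {
    "Fly":          {"UK": 30, "US": 45},
    "Invisibility": {"UK": 20, "US": 15},
    "Telepathy":    {"UK": 50, "US": 40},
}

# Marginal distribution of superpower: one total per row.
row_totals = {power: sum(by_country.values()) for power, by_country in table.items()}

# Marginal distribution of country: one total per column.
column_totals = {}
for by_country in table.values():
    for country, count in by_country.items():
        column_totals[country] = column_totals.get(country, 0) + count

print(row_totals)
print(column_totals)
```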

21
Q

Conditional distribution

A

In a two-way table, instead of each cell having the frequency of an event, it houses the probability of that event given the column that it’s in. For example, a two-way table could have the countries UK and US along the top, and preferred superpowers of students along the left. Then, each cell would have the probability that a student would choose that superpower given they are from either the UK or US.
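
The conversion from counts to conditional probabilities can be sketched as follows: each cell is divided by its column total. The counts are hypothetical and the variable names are only for illustration.

```python
# Hypothetical two-way table: table[superpower][country] = number of students.
table = {
    "Fly":          {"UK": 30, "US": 45},
    "Invisibility": {"UK": 20, "US": 15},
    "Telepathy":    {"UK": 50, "US": 40},
}

# Total number of students surveyed in each country (the column totals).
countries = ["UK", "US"]
column_totals = {c: sum(row[c] for row in table.values()) for c in countries}

# Conditional distribution of superpower given country:
# each cell divided by the total of the column it sits in.
conditional = {
    power: {c: row[c] / column_totals[c] for c in countries}
    for power, row in table.items()
}

for power, row in conditional.items():
    print(power, row)
```

Within each country, the conditional probabilities add up to 1.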

22
Q

Segmented Bar Graph

A

A bar graph in which the segments of each bar are stacked on top of each other so that the top of the bar is at 100%. Similar to a pie chart, but in a bar. Can be useful for comparing the distributions in multiple populations.

23
Q

Side-by-side bar graph

A

A bar graph where each possibility has multiple bars associated with it. Each bar represents the frequency of that possibility in a different population. Useful for comparing the probabilities of various events in different populations.

24
Q

Association

A

Two or more variables are associated when a change in one predicts a change in the other. For two quantitative variables with a linear relationship, one sign of a strong association is a line of best fit with a high r-squared.

25
Q

Simpson’s paradox

A

A paradox that occurs when two populations are treated as one: a trend that holds within each population reverses when the populations are combined. For example, consider income versus happiness among dogs and cats. The dog population is generally poorer and sadder; however, within that population, the richer dogs are the saddest. The cats are generally richer and happier, but again the richer cats are sadder than the poorer cats. Analyzing the data from both groups together, one might conclude that the richer an animal is, the happier it is. However, the truth within each population is that more money is associated with less joy.
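
The reversal can be reproduced with a tiny made-up data set. In this sketch the (income, happiness) pairs are invented, and `slope` is a hypothetical helper that computes the least-squares slope of the line of best fit.

```python
def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

# Hypothetical (income, happiness) pairs.
dogs = [(1, 3), (2, 2), (3, 1)]   # poorer and sadder overall
cats = [(7, 9), (8, 8), (9, 7)]   # richer and happier overall

dog_slope = slope(dogs)               # negative: richer dogs are sadder
cat_slope = slope(cats)               # negative: richer cats are sadder
combined_slope = slope(dogs + cats)   # positive when the groups are pooled

print(dog_slope, cat_slope, combined_slope)
```

Within each group the slope is negative, yet pooling the two groups gives a positive slope, because the group averages differ so much.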

26
Q

Dotplot

A

A plot which shows the frequency of quantitative data by putting a dot above a number on a line each time that number appears in the data.

27
Q

Shape

A

Either symmetrical, skewed right, or skewed left.

28
Q

Mode

A

The most common value(s) in a data set.

29
Q

Center

A

Usually either mean or median. A single number which gives the “average” value of a data set.

30
Q

Spread

A

A measure of how wide a range the data cover; how far apart they are.

31
Q

Range

A

The difference between the highest and lowest values in a data set.

32
Q

Outlier

A

A data point which is so different from the others that it should be given special consideration.

33
Q

Symmetric

A

A data set whose values are roughly mirrored about the median/mean.

34
Q

Skewed Right

A

A data set with a long tail going out to the right.

35
Q

Skewed left

A

A data set with a long tail going out to the left.

36
Q

Unimodal

A

A data set with one clear mode, or most common value.

37
Q

Bimodal

A

A data set with two modes, or most common values.

38
Q

Multimodal

A

A data set with more than two modes, or most common values.

39
Q

Stemplot

A

A way of organizing data so that the stem (all but the last digit of each data point) is written to the left of a line, and the last digits of all the data points sharing that stem are written to the right of the line.
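
One rough way to build such a plot in Python is sketched below. It assumes non-negative whole-number data, and the `stemplot` function and the sample values are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return the lines of a stemplot for non-negative whole numbers."""
    stems = defaultdict(list)
    for v in sorted(values):
        # All but the last digit is the stem; the last digit goes on the right.
        stems[v // 10].append(v % 10)
    lines = []
    # Include every stem between the smallest and largest, even if it is empty.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

data = [47, 31, 42, 35, 50, 47, 63]
plot = stemplot(data)
print("\n".join(plot))
```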

40
Q

Splitting Stems

A

A stemplot in which, instead of each “bucket” containing a range of 10 values (e.g. 30-39), each contains a smaller range, often 5 (e.g. 30-34, 35-39, 40-44).

41
Q

Back-to-back stemplots

A

A stemplot with the final digits of data appearing on both sides of the stem. Often the left and right sides represent different populations.

42
Q

Histogram

A

A graph in which each “bucket” represents a range of values, and the height of the bar above that bucket shows how many data points fall into that range.
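
The counting behind a histogram can be sketched in a few lines of Python. The data, the bucket width of 10, and the `histogram_counts` helper are all assumptions made for illustration; in this sketch, empty buckets are simply left out.

```python
from collections import Counter

def histogram_counts(values, width):
    """Map the start of each bucket to the number of values that fall in it."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

scores = [31, 35, 42, 47, 47, 50, 63, 68]
bucket_width = 10
counts = histogram_counts(scores, bucket_width)

# Print a text version: one '#' per data point in each bucket.
for start, count in counts.items():
    print(f"{start}-{start + bucket_width - 1}: {'#' * count}")
```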

43
Q

Mean

A

The average of a data set. The quotient of the sum of every value and the total number of values.

44
Q

Median

A

The middle value of a data set when organized from least to greatest. If there is an even number of data points, the median is the mean of the two center data points.
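
As a quick check of the mean and median definitions, this sketch uses Python's standard `statistics` module on a made-up data set with an even number of values and one large value.

```python
from statistics import mean, median

# Hypothetical data set with an even number of values.
data = [3, 5, 7, 8, 12, 40]

data_mean = mean(data)        # sum of the values divided by how many there are
data_median = median(data)    # mean of the two center values, 7 and 8

# Dropping the largest value leaves an odd number of values.
odd_median = median(data[:-1])  # the single middle value

print(data_mean, data_median, odd_median)
```

The one large value pulls the mean well above the median here.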

45
Q

Interquartile Range (IQR)

A

The difference between Q3 (75th percentile) and Q1 (25th percentile).

46
Q

Five-Number summary

A

The minimum, 1st quartile, median, 3rd quartile, and maximum of a data set.

47
Q

Boxplot

A

A plot that shows each number from the five-number summary. A box spans Q1 to Q3 with a vertical line through it at the median, and lines extend from the minimum to Q1 and from Q3 to the maximum. Usually situated above a number line.

48
Q

Standard Deviation

A

A measure of the amount of variation or dispersion of a set of values. A low standard deviation means the values are close to the mean.

49
Q

Variance

A

The square of the standard deviation of a data set. A low variance means the values are close to the mean.
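
The relationship between variance and standard deviation can be checked with Python's `statistics` module. This sketch assumes the population versions (dividing by the number of values); the sample versions, `statistics.variance` and `statistics.stdev`, divide by one less than the number of values. The data set is made up.

```python
import math
from statistics import mean, pstdev, pvariance

# Hypothetical data set with a mean of 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

center = mean(data)
variance = pvariance(data)   # average squared distance from the mean
std_dev = pstdev(data)       # square root of the variance

print(center, variance, std_dev)
print(math.isclose(variance, std_dev ** 2))
```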