AP Stat Vocabulary Flashcards

1
Q

Individuals

A

the objects described by a set of data (people, animals, things, etc.)

2
Q

Categorical Variable

A

a variable that places an individual into one of several groups or categories (displayed with pie charts, bar graphs, or two-way tables)

3
Q

Quantitative Variable

A

a variable that takes numerical values for which it makes sense to find an average

4
Q

Discrete Variables

A

variables that can take only a finite (or countable) number of separate values, with gaps between them (e.g., number of siblings)

5
Q

Continuous Variables

A

variables that can take any value within an interval of numbers, so there are infinitely many possible values (e.g., height or time)

6
Q

Univariate Data

A

When we conduct a study that looks at only one variable, we say that we are working with univariate data. Suppose, for example, that we conducted a survey to estimate the average weight of high school students. Since we are only working with one variable (weight), we would be working with univariate data.

7
Q

Bivariate Data

A

When we conduct a study that examines the relationship between two variables, we are working with bivariate data. Suppose we conducted a study to see if there were a relationship between the height and weight of high school students. Since we are working with two variables (height and weight), we would be working with bivariate data.

8
Q

Variable

A

any characteristic of an individual

9
Q

population

A

the entire group of individuals we want information about; the total set of observations that could be made

10
Q

sample

A

the part of the population from which we actually collect data; a set of observations drawn from the population

11
Q

census

A

a study that obtains data from every member of a population. In most studies, a census is not practical, because of the cost and/or time required.

12
Q

distribution

A

tells us what values a variable takes and how often it takes those values; the pattern of variation of a variable

13
Q

Inference

A

drawing conclusions that go beyond the data at hand; making a conclusion about a population based on sample data

14
Q

Frequency Table

A

displays the frequency counts for each category of a categorical variable
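A frequency table is just a tally per category. The sketch below uses Python's `collections.Counter` on a small hypothetical list of survey answers (the data and variable names are invented for illustration).

```python
from collections import Counter

# Hypothetical categorical data: favorite season reported by 10 students
seasons = ["summer", "fall", "summer", "winter", "spring",
           "summer", "fall", "winter", "summer", "fall"]

# Counter tallies how many times each category appears
freq = Counter(seasons)

# Print a simple frequency table, most common category first
print(f"{'Category':<10}{'Count':>6}")
for category, count in freq.most_common():
    print(f"{category:<10}{count:>6}")
```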

15
Q

Relative Frequency

A

the proportion (or percentage) of observations that fall into a category: the number of times an event occurs divided by the total number of observations
Relative frequency = Subgroup count / Total count
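Applying "subgroup count / total count" to every category turns a frequency table into a relative frequency table. A minimal sketch on the same kind of hypothetical survey data:

```python
from collections import Counter

# Hypothetical data: favorite season reported by 10 students
seasons = ["summer", "fall", "summer", "winter", "spring",
           "summer", "fall", "winter", "summer", "fall"]

counts = Counter(seasons)
total = sum(counts.values())

# Relative frequency = subgroup count / total count
rel_freq = {category: count / total for category, count in counts.items()}

for category, rf in rel_freq.items():
    # Show each relative frequency as a proportion and as a percent
    print(f"{category:<8} {rf:.2f}  ({rf:.0%})")
```

The relative frequencies of all categories add to 1 (100%), apart from floating-point or roundoff error.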

16
Q

Relative Frequency Table

A

a table that shows relative frequencies for different categories of a categorical variable.

17
Q

Roundoff Error

A

when each percent is rounded (for example, to the nearest tenth) and the rounded percents do not add to exactly 100%; the difference comes from rounding, not from a mistake in the data
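A small worked example: with hypothetical counts of 2, 2, and 3 out of 7, each percent rounds to the nearest tenth as 28.6, 28.6, and 42.9, which total 100.1% rather than 100%.

```python
# Hypothetical counts: 7 students split into three groups
counts = {"A": 2, "B": 2, "C": 3}
total = sum(counts.values())

# Round each percent to the nearest tenth, as a report might
rounded = {k: round(100 * v / total, 1) for k, v in counts.items()}
print(rounded)

# The rounded percents need not add to exactly 100
print(round(sum(rounded.values()), 1))
```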

18
Q

Pie Chart

A

shows the distribution of a categorical variable as a “pie” whose slices are sized by the counts or percentages for the categories.
- used when you want to emphasize each category's relation to the whole

19
Q

Bar Graph

A

represents each category as a bar; the bar height shows the category's count or percent.
- compares quantities by comparing the heights of the bars

20
Q

Two-way Table

A

A two-way table (also called a contingency table) is a useful tool for examining relationships between categorical variables. The entries in the cells of a two-way table can be frequency counts or relative frequencies (just like a one-way table).

21
Q

Marginal Distribution

A

Entries in the “Total” row and “Total” column of a two-way table are called marginal frequencies or the marginal distribution. A marginal distribution tells us the distribution of one variable's values among ALL the individuals.
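As a sketch, the marginal distributions come from the row and column totals divided by the grand total. The two-way table below (grade level by way of getting to school) is hypothetical, stored as a dictionary of rows.

```python
# Hypothetical two-way table of counts: grade level (rows) by transport (columns)
table = {
    "Junior": {"Bus": 20, "Car": 10, "Walk": 10},
    "Senior": {"Bus": 10, "Car": 40, "Walk": 10},
}
columns = ["Bus", "Car", "Walk"]

grand_total = sum(sum(row.values()) for row in table.values())

# Row totals give the marginal distribution of grade level
row_totals = {grade: sum(row.values()) for grade, row in table.items()}

# Column totals give the marginal distribution of transport
col_totals = {col: sum(row[col] for row in table.values()) for col in columns}

# Convert the totals to proportions of ALL individuals
marginal_grade = {g: n / grand_total for g, n in row_totals.items()}
marginal_transport = {c: n / grand_total for c, n in col_totals.items()}

print(marginal_grade)
print(marginal_transport)
```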

22
Q

Conditional Distribution

A

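The relative frequencies within a single row (or column) of a two-way table, found by dividing each cell count by that row's (or column's) total, are called conditional frequencies or the conditional distribution.
- describes the values of one variable among only those individuals who have a specific value of another variable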
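A minimal sketch of computing conditional distributions, using a hypothetical two-way table: each row is divided by its own row total, not by the grand total.

```python
# Hypothetical two-way table of counts: grade level (rows) by transport (columns)
table = {
    "Junior": {"Bus": 20, "Car": 10, "Walk": 10},
    "Senior": {"Bus": 10, "Car": 40, "Walk": 10},
}

# Conditional distribution of transport for each grade level:
# divide each cell in a row by that row's total
conditional = {}
for grade, row in table.items():
    row_total = sum(row.values())
    conditional[grade] = {mode: count / row_total for mode, count in row.items()}

for grade, dist in conditional.items():
    print(grade, {mode: f"{p:.0%}" for mode, p in dist.items()})
```

Comparing the rows (50% of juniors ride the bus versus about 17% of seniors in this made-up table) is how the conditional distributions reveal an association between the two variables.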

23
Q

Segmented Bar Graph

A

each bar represents one category as a whole and is divided into segments stacked on top of each other, showing how that category breaks down into the categories of a second variable.
- shows the relationship of each segment to the category as a whole

24
Q

Side by Side Bar Graph

A

for each category of one variable, bars for the categories of a second variable are drawn next to each other (usually in different colors), and their heights are compared across groups. (Displays two categorical variables.)

25
Q

Association

A

when knowing the value of one variable helps you predict the value of another, then the variables have this

26
Q

Simpson’s Paradox

A

a situation in which the same set of data shows opposite trends depending on whether it is analyzed in separate groups or combined into one group
- a lurking variable can be hidden in the combined data and greatly influence the results
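The reversal can be checked with simple arithmetic. The survival counts below are entirely hypothetical: Hospital A has the higher survival rate for patients in good condition and for patients in poor condition, yet the lower rate overall, because it treats far more patients in poor condition (the lurking variable).

```python
# Hypothetical data: (survived, total patients) by hospital and patient condition
data = {
    "Hospital A": {"good": (590, 600), "poor": (210, 400)},
    "Hospital B": {"good": (870, 900), "poor": (30, 100)},
}

within = {}   # survival rate inside each condition group
overall = {}  # survival rate after combining the groups
for hospital, groups in data.items():
    within[hospital] = {cond: s / t for cond, (s, t) in groups.items()}
    survived = sum(s for s, _ in groups.values())
    total = sum(t for _, t in groups.values())
    overall[hospital] = survived / total

for hospital in data:
    rates = {cond: round(r, 3) for cond, r in within[hospital].items()}
    print(hospital, rates, "overall:", round(overall[hospital], 3))
```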

27
Q

Dot Plot

A

each data value is shown as a dot above its location on a number line.

  • used to compare frequency counts within categories or groups
  • the dots are stacked in a column over each value or category, so the height of the column represents the relative or absolute frequency of observations there

28
Q

Shape

A

describes the overall form of a distribution's graph

  • the shape is described by whether it is symmetric or skewed, how many peaks it has, and whether it is uniform
  • also look for clusters and gaps

29
Q

Mode

A

most frequently appearing value in a population or sample
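Python's standard `statistics.multimode` returns every value tied for the highest frequency, which also shows when a data set has more than one mode. The data sets here are hypothetical.

```python
from statistics import multimode

# Hypothetical data sets
unimodal_data = [3, 7, 7, 2, 9, 7, 3, 5]
bimodal_data = [1, 1, 2, 2, 3]

# multimode lists each most-frequent value, in order of first appearance
print(multimode(unimodal_data))
print(multimode(bimodal_data))
```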

30
Q

Center

A

a typical or middle value of the data, usually measured by the median or the mean

31
Q

Spread

A

the extent to which a distribution is stretched or squeezed

- can be measured by the range, IQR, standard deviation, or variance

32
Q

Range

A

the difference between the maximum and minimum value in a data set
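The range is a one-line calculation; the data below are hypothetical.

```python
# Hypothetical data set
data = [12, 5, 19, 8, 15]

# Range = maximum - minimum
data_range = max(data) - min(data)
print(data_range)  # 19 - 5 = 14
```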

33
Q

Outlier

A

a data point that diverges greatly from the overall pattern of the data.
- “rule of thumb”: a value is considered an outlier if it falls more than 1.5 interquartile ranges below the first quartile (Q1) or more than 1.5 interquartile ranges above the third quartile (Q3).
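The 1.5 × IQR rule can be sketched as below. The helpers `quartiles` and `outliers` are hypothetical names; quartiles are taken as medians of the lower and upper halves, with the overall median left out of both halves when n is odd (an assumption; other textbooks and software use slightly different quartile rules). The data are made up.

```python
from statistics import median

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves.
    Assumption: for odd n the overall median is left out of both halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(upper)

def outliers(data, k=1.5):
    """Values more than k * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low_fence or x > high_fence]

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # hypothetical data
print(outliers(data))
```

Here Q1 = 2.5 and Q3 = 7.5, so IQR = 5 and the upper fence is 7.5 + 7.5 = 15; only 30 lies beyond it.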

34
Q

Symmetric

A

if the right and left sides of the graph are approximately mirror images of each other

35
Q

Skewed Right

A

when the right side of the graph is much longer than the left side: the peak and most of the values are on the left (the values tend to be lower), and the tail of the graph stretches to the right

36
Q

Skewed Left

A

when the left side of the graph is much longer than the right side: the peak and most of the values are on the right (the values tend to be higher), and the tail of the graph stretches to the left

37
Q

Unimodal

A

when a graph has a clear single peak “one mode”

38
Q

Bimodal

A

when a graph has two clear peaks “two modes”

39
Q

Multimodal

A

when a graph has more than two clear peaks “many modes”

40
Q

Stem Plot

A

A stem plot is used to display quantitative data, generally from small data sets (50 or fewer observations).
- shows the shape of the distribution while keeping the actual numerical values
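A text stem plot can be built by splitting each value into a stem (tens digit) and a leaf (ones digit). This sketch assumes non-negative whole numbers; `stem_plot` is a hypothetical helper and the scores are invented.

```python
from collections import defaultdict

def stem_plot(data):
    """Text stem plot for non-negative whole numbers: stem = tens, leaf = ones digit."""
    stems = defaultdict(list)
    for x in sorted(data):
        stems[x // 10].append(x % 10)
    lines = []
    # Include stems with no leaves so gaps in the data stay visible
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return "\n".join(lines)

scores = [72, 85, 91, 68, 77, 85, 93, 60, 79, 88]  # hypothetical test scores
print(stem_plot(scores))
```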

41
Q

Splitting Stems

A

when each stem is split into two, leaves 0-4 go on the first copy of the stem and leaves 5-9 go on the second
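The splitting rule can be sketched by keying each leaf on its stem plus whether the leaf is 5 or more. As before, `split_stem_plot` is a hypothetical helper for non-negative whole numbers, and the scores are made up.

```python
from collections import defaultdict

def split_stem_plot(data):
    """Each tens-digit stem appears twice: leaves 0-4 first, then leaves 5-9."""
    stems = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        stems[(stem, leaf >= 5)].append(leaf)
    stem_values = [s for s, _ in stems]
    lines = []
    for stem in range(min(stem_values), max(stem_values) + 1):
        for high in (False, True):  # False: leaves 0-4, True: leaves 5-9
            leaves = "".join(map(str, stems.get((stem, high), [])))
            lines.append(f"{stem} | {leaves}")
    return "\n".join(lines)

scores = [72, 85, 91, 68, 77, 85, 93, 60, 79, 88]  # hypothetical test scores
print(split_stem_plot(scores))
```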

42
Q

Back to back Stem Plot

A

Back-to-back stemplots are a graphic option for comparing data from two populations. The center of a back-to-back stemplot consists of a column of stems, with a vertical line on each side. Leaves representing one data set extend from the right, and leaves representing the other data set extend from the left.

43
Q

Histogram

A

made up of adjacent columns (bars) plotted on a graph; each column's height shows how many observations fall in an interval of values

- displays the distribution of a quantitative variable

44
Q

Mean

A

The mean is the arithmetic average, denoted x̄ (“x-bar”) for a sample and μ for a population. It is the sum of the individual values divided by the number of values.
mean = sum of the scores / number of scores
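The formula "sum of the scores / number of scores" matches Python's `statistics.mean`; the data are hypothetical.

```python
from statistics import mean

data = [4, 8, 6, 5, 7]  # hypothetical scores

manual = sum(data) / len(data)  # sum of the scores / number of scores
print(manual, mean(data))
```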

45
Q

Median

A

a simple measure of central tendency; the middle value of a data set
TO FIND:
- arrange the data from smallest to largest
- If there is an odd number of observations, the median is the middle value.
- If there is an even number of observations, the median is the average of the two middle values.
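The three steps translate directly into code. `median_of` is a hypothetical name; Python's `statistics.median` follows the same rule.

```python
def median_of(data):
    xs = sorted(data)        # arrange from smallest to largest
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:           # odd number of observations: the middle value
        return xs[mid]
    # even number of observations: average of the two middle values
    return (xs[mid - 1] + xs[mid]) / 2

print(median_of([7, 1, 3]))     # odd count
print(median_of([4, 1, 3, 2]))  # even count
```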

46
Q

Interquartile Range (IQR)

A

The interquartile range (IQR) is a measure of variability based on dividing a data set into quartiles.
- defined as the difference between the largest and smallest values in the middle 50% of a set of data.
- IQR = Q3 - Q1
Q1 is the “middle” value in the first half of the rank-ordered data set.
Q2 is the median value in the set.
Q3 is the “middle” value in the second half of the rank-ordered data set.
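A sketch of the definitions above, with a hypothetical `quartiles` helper that takes Q1 and Q3 as the medians of the two halves of the sorted data (assumption: for odd n the overall median is excluded from both halves). Library functions such as `statistics.quantiles` use interpolation rules that can give slightly different quartiles.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3); Q1 and Q3 are medians of the lower and upper halves.
    Assumption: for odd n the overall median is excluded from both halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

data = [2, 4, 4, 5, 6, 7, 8, 9]  # hypothetical data
q1, q2, q3 = quartiles(data)
iqr = q3 - q1  # IQR = Q3 - Q1
print(q1, q2, q3, iqr)
```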

47
Q

Five Number Summary

A

consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation
“minimum, Q1, median, Q3, maximum”
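A minimal sketch of the five-number summary, using the same halves-based quartile rule as above (an assumption; other quartile conventions exist). `five_number_summary` is a hypothetical helper and the data are invented.

```python
from statistics import median

def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum); for odd n the median is left out of both halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]  # hypothetical data
print(five_number_summary(data))
```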

48
Q

Box Plot / Box-and-Whisker Plot

A

  • a type of graph used to display the distribution of quantitative data
  • splits the data set into quartiles
  • box: goes from the first quartile (Q1) to the third quartile (Q3)
  • a vertical line is drawn at Q2 (the median)
  • whiskers: extend from Q1 down to the minimum and from Q3 up to the maximum
  • if the data set includes one or more outliers, they are plotted separately as points, and the whiskers extend only to the most extreme values that are not outliers

49
Q

Standard Deviation

A

The standard deviation is a numerical value used to indicate how widely individuals in a group vary. If individual observations vary greatly from the group mean, the standard deviation is large; if they stay close to the mean, it is small.

It is important to distinguish between the standard deviation of a population and the standard deviation of a sample. They have different notation, and they are computed differently. The standard deviation of a population is denoted by σ and the standard deviation of a sample, by s.

The standard deviation of a population is defined by the following formula:

$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$$

where σ is the population standard deviation, μ is the population mean, Xᵢ is the ith element from the population, and N is the number of elements in the population.

The standard deviation of a sample is defined by a slightly different formula:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

where s is the sample standard deviation, x̄ is the sample mean, xᵢ is the ith element from the sample, and n is the number of elements in the sample.

And finally, the standard deviation is equal to the square root of the variance.
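The two formulas differ only in the divisor (N versus n - 1). The sketch below implements both by hand (`population_sd` and `sample_sd` are hypothetical names) and compares them with Python's `statistics.pstdev` and `statistics.stdev` on a hypothetical data set.

```python
import math
from statistics import pstdev, stdev

def population_sd(xs):
    mu = sum(xs) / len(xs)
    # divide the sum of squared deviations by N
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

def sample_sd(xs):
    xbar = sum(xs) / len(xs)
    # divide the sum of squared deviations by n - 1
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
print(population_sd(data), pstdev(data))
print(sample_sd(data), stdev(data))
```

For these data the squared deviations sum to 32, so σ = √(32/8) = 2 while s = √(32/7) ≈ 2.14.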

50
Q

Variance

A

The variance is a numerical value used to indicate how widely individuals in a group vary. If individual observations vary greatly from the group mean, the variance is large; if they stay close to the mean, it is small.

It is important to distinguish between the variance of a population and the variance of a sample. They have different notation, and they are computed differently. The variance of a population is denoted by σ², and the variance of a sample by s².

The variance of a population is defined by the following formula:

$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$

where σ² is the population variance, μ is the population mean, Xᵢ is the ith element from the population, and N is the number of elements in the population.

The variance of a sample is defined by a slightly different formula:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

where s² is the sample variance, x̄ is the sample mean, xᵢ is the ith element from the sample, and n is the number of elements in the sample. Using this formula, the sample variance is an unbiased estimate of the population variance.

- The variance is equal to the square of the standard deviation.
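A sketch comparing the hand-computed variances with Python's `statistics.pvariance` (divides by N) and `statistics.variance` (divides by n - 1), on a hypothetical data set; it also checks that the variance is the square of the standard deviation.

```python
from statistics import pstdev, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
n = len(data)
mean_ = sum(data) / n
ss = sum((x - mean_) ** 2 for x in data)  # sum of squared deviations

pop_var = ss / n         # population variance divides by N
samp_var = ss / (n - 1)  # sample variance divides by n - 1

print(pop_var, pvariance(data))
print(samp_var, variance(data))
print(pstdev(data) ** 2)  # variance = (standard deviation) squared
```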