AP Statistics Flashcards

1
Q

Individuals

A

Individuals are the objects described by a set of data. They make up the group that is being studied or experimented on, and they may be people, animals, or things.

2
Q

Variable

A

A variable is an attribute that describes a person, place, thing, or idea.
The value of the variable can “vary” from one entity to another.
For example, suppose we let the variable x represent the color of a person’s hair. The variable x could have the value of “blond” for one person, and “brunette” for another.

3
Q

Categorical Variable

A

Categorical variables take on values that are names or labels. The color of a ball (e.g., red, green, blue) or the breed of a dog (e.g., collie, shepherd, terrier) would be examples of categorical variables.

4
Q

Quantitative Variable

A

Quantitative variables are numerical. They represent a measurable quantity. For example, when we speak of the population of a city, we are talking about the number of people in the city, a measurable attribute of the city. Therefore, population would be a quantitative variable.

5
Q

Discrete Variables

A

Suppose we flip a coin and count the number of heads. The number of heads could be any integer value between 0 and plus infinity. However, it could not be any number between 0 and plus infinity. We could not, for example, get 2.5 heads. Therefore, the number of heads must be a discrete variable.

6
Q

Continuous

A

Suppose the fire department mandates that all fire fighters must weigh between 150 and 250 pounds. The weight of a fire fighter would be an example of a continuous variable, since a fire fighter’s weight could take on any value between 150 and 250 pounds.

7
Q

Univariate Data

A

Univariate data. When we conduct a study that looks at only one variable, we say that we are working with univariate data. Suppose, for example, that we conducted a survey to estimate the average weight of high school students. Since we are only working with one variable (weight), we would be working with univariate data.

8
Q

Bivariate Data

A

Bivariate data. When we conduct a study that examines the relationship between two variables, we are working with bivariate data. Suppose we conducted a study to see if there were a relationship between the height and weight of high school students. Since we are working with two variables (height and weight), we would be working with bivariate data.

9
Q

Population

A

In statistics, population refers to the total set of observations that can be made.

For example, if we are studying the weight of adult women, the population is the set of weights of all the adult women in the world. If we are studying the grade point average (GPA) of students at Harvard, the population is the set of GPAs of all the students at Harvard.

10
Q

Sample

A

In statistics, a sample refers to a set of observations drawn from a population.

Often, it is necessary to use samples for research, because it is impractical to study the whole population. For example, suppose we wanted to know the average height of 12-year-old American boys. We could not measure all of the 12-year-old boys in America, but we could measure a sample of boys.

11
Q

Census

A

A census is a study that obtains data from every member of a population. In most studies, a census is not practical, because of the cost and/or time required.

12
Q

Distribution

A

The distribution of a statistical data set (or a population) is a listing or function showing all the possible values (or intervals) of the data and how often they occur. When a distribution of categorical data is organized, you see the number or percentage of individuals in each group.

13
Q

Inference

A

The process of using data analysis to deduce properties of an underlying distribution of probability. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates.

14
Q

Frequency Table

A

When a table shows frequency counts for a categorical variable, it is called a frequency table. A bar chart can display the same data as a frequency table.
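
As a rough illustration, a frequency table can be tallied in a few lines of Python. The list of colors below is made up for the example and is not part of the original card.

```python
from collections import Counter

# Hypothetical categorical data (illustrative only).
colors = ["red", "blue", "red", "green", "blue", "red"]

# Counter tallies how many times each category appears.
frequency_table = Counter(colors)

# Print one row per category, most frequent first.
for category, count in frequency_table.most_common():
    print(f"{category:<6} {count}")
```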

15
Q

Relative Frequency

A

A frequency count is a measure of the number of times that an event occurs.

To compute relative frequency, one obtains a frequency count for the total population and a frequency count for a subgroup of the population. The relative frequency for the subgroup is:

Relative frequency = Subgroup count / Total count

The above equation expresses relative frequency as a proportion. It is also often expressed as a percentage. Thus, a relative frequency of 0.50 is equivalent to a percentage of 50%.
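
The formula above can be sketched in Python as follows. The function name and the survey counts are assumptions chosen for illustration.

```python
def relative_frequency(subgroup_count, total_count):
    """Return subgroup_count / total_count as a proportion."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return subgroup_count / total_count

# Hypothetical counts: 20 of 40 surveyed students walk to school.
proportion = relative_frequency(20, 40)
print(proportion)           # 0.5
print(f"{proportion:.0%}")  # 50%
```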

16
Q

Table

A

The values of the cumulative distribution functions, probability functions, or probability density functions of certain common distributions presented as reference tables for different values of their parameters.

17
Q

Pie Chart

A

A pie chart (or a circle chart) is a circular statistical graphic, which is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice (and consequently its central angle and area), is proportional to the quantity it represents.

18
Q

Bar Graph

A

A bar chart is a graphical representation of the categories as bars. … A bar chart can be plotted vertically or horizontally. Usually it is drawn vertically where x-axis represents the categories and y-axis represents the values for these categories.

19
Q

Two-way Table

A

A two-way table (also called a contingency table) is a useful tool for examining relationships between categorical variables. The entries in the cells of a two-way table can be frequency counts or relative frequencies (just like a one-way table).

20
Q

Marginal Distribution

A

Entries in the “Total” row and “Total” column are called marginal frequencies or the marginal distribution. Entries in the body of the table are called joint frequencies.
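
A small Python sketch may help show how the marginal frequencies come from the joint frequencies. The table of grade levels and sports is hypothetical.

```python
# Hypothetical joint frequencies: rows are grade level, columns are preferred sport.
joint = {
    "junior": {"basketball": 12, "volleyball": 8},
    "senior": {"basketball": 9, "volleyball": 11},
}

# Row totals: marginal frequencies for grade level.
row_totals = {row: sum(cells.values()) for row, cells in joint.items()}

# Column totals: marginal frequencies for sport.
column_totals = {}
for cells in joint.values():
    for column, count in cells.items():
        column_totals[column] = column_totals.get(column, 0) + count

grand_total = sum(row_totals.values())
print(row_totals)     # {'junior': 20, 'senior': 20}
print(column_totals)  # {'basketball': 21, 'volleyball': 19}
print(grand_total)    # 40
```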

21
Q

Conditional

A

Conditional probability is the probability that one event occurs given that another event has occurred. For example: Event A is that it is raining outside, and there is a 0.3 (30%) chance of rain today. Event B is that you will need to go outside, and that has a probability of 0.5 (50%). The conditional probability P(B|A) is the probability that you will need to go outside given that it is raining.
The formula for conditional probability is:

P(B|A) = P(A and B) / P(A)

which you can also rewrite as:

P(B|A) = P(A∩B) / P(A)
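
The formula can be tried out in Python. The card gives P(A) = 0.3 but does not give P(A and B), so the value 0.12 below is purely an assumption for illustration. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

def conditional_probability(p_a_and_b, p_a):
    """Return P(B|A) = P(A and B) / P(A)."""
    if p_a == 0:
        raise ValueError("P(A) must be nonzero")
    return p_a_and_b / p_a

p_a = Fraction(3, 10)          # P(A) = 0.3, from the rain example
p_a_and_b = Fraction(12, 100)  # assumed value, not given in the card
p_b_given_a = conditional_probability(p_a_and_b, p_a)
print(p_b_given_a)         # 2/5
print(float(p_b_given_a))  # 0.4
```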

22
Q

Segmented Bar Graph

A

A segmented bar chart is a kind of stacked bar chart in which each bar represents 100% of its category. For example, suppose there are 40 students in your classroom, 25 boys and 15 girls. Of them, 25 students like basketball, 30 like volleyball, and 20 like badminton. The sports are listed along the vertical axis, and the horizontal axis shows percentages. Each sport's bar is divided into two stacked segments, one for boys and one for girls, showing what share of the students who like that sport belongs to each group.

23
Q

Side by Side Bar

A

In a side-by-side bar chart, the bars are split into colored bar segments. The bar segments are placed next to each other. … In a stacked bar chart, the bar segments within a category bar are placed on top of each other, and in a side-by-side bar chart, they are placed next to each other.

24
Q

Graph

A

A statistical graph or chart is defined as the pictorial representation of statistical data in graphical form. The statistical graphs are used to represent a set of data to make it easier to understand and interpret statistical data.

25
Q

Association

A

In statistics, association tells you whether two variables are related: knowing the value of one variable helps predict the value of the other. The direction of an association between quantitative variables is described by a sign, either positive (+) or negative (-).

26
Q

Simpson’s Paradox

A

Simpson’s paradox, which goes by several names, is a phenomenon in probability and statistics, in which a trend appears in several different groups of data but disappears or reverses when these groups are combined.
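
The reversal is easier to believe with numbers in hand. The sketch below uses illustrative (successes, trials) counts for two treatments, A and B, in two groups. The counts and labels are assumptions chosen only to show the effect.

```python
# Illustrative (successes, trials) counts for two treatments in two groups.
data = {
    "group 1": {"A": (81, 87), "B": (234, 270)},
    "group 2": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, trials):
    """Success rate as a proportion."""
    return successes / trials

# Within each group, treatment A has the higher success rate.
for group, results in data.items():
    print(group, {t: round(rate(*counts), 3) for t, counts in results.items()})

# Combining the groups reverses the comparison: B now looks better overall.
combined = {}
for treatment in ("A", "B"):
    successes = sum(data[g][treatment][0] for g in data)
    trials = sum(data[g][treatment][1] for g in data)
    combined[treatment] = rate(successes, trials)
print("combined", {t: round(r, 3) for t, r in combined.items()})
```

The reversal happens because the two treatments are spread very unevenly across the groups, so the combined rates are weighted toward different groups.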

27
Q

Dot Plot

A

A Dot Plot, also called a dot chart or strip plot, is a type of simple histogram-like chart used in statistics for relatively small data sets where values fall into a number of discrete bins (categories).

28
Q

Shape

A

Shape describes the overall pattern of a distribution. Four features are commonly used to describe it: whether it is symmetric, how many peaks it has, whether it is skewed to the left or right, and whether it is uniform. By comparison, the center is the median and/or mean of the data, and the spread is the range of the data.

29
Q

Mode

A

The mode is the value that occurs more often than any other. A data set can have more than one mode if several values tie for the highest count.
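
A quick way to check a mode in Python is `statistics.multimode` (available in Python 3.8 and later), which returns every value tied for the highest count. The data lists are made up for illustration.

```python
from statistics import multimode

# Hypothetical data with a single most frequent value.
values = [2, 3, 3, 5, 7, 7, 7, 9]
single_mode = multimode(values)
print(single_mode)  # [7]

# When two values tie, both are returned in order of first appearance.
tied_modes = multimode([1, 1, 2, 2, 3])
print(tied_modes)   # [1, 2]
```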

30
Q

Center

A

The center of a distribution is the middle of a distribution. For example, the center of 1 2 3 4 5 is the number 3. … Look at a graph, or a list of the numbers, and see if the center is obvious.

31
Q

Spread

A

Measures of spread describe how similar or varied the set of observed values are for a particular variable (data item). Measures of spread include the range, quartiles and the interquartile range, variance and standard deviation.

32
Q

Range

A

In statistics, the range of a set of data is the difference between the largest and smallest values.

33
Q

Outlier

A

In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to variability in the measurement or it may indicate experimental error; the latter are sometimes excluded from the data set. An outlier can cause serious problems in statistical analyses.

34
Q

Symmetric

A

A symmetric distribution is a type of distribution where the left side of the distribution mirrors the right side. By definition, a symmetric distribution is never a skewed distribution. … The normal distribution is symmetric. It is also a unimodal distribution (it has one peak).

35
Q

Skewed Right

A

A distribution that is skewed right is also known as positively skewed. … For a right skewed distribution, the mean is typically greater than the median. Also, the tail of the distribution on the right hand (positive) side is longer than on the left hand side.

36
Q

Skewed Left

A

A distribution that is skewed left has exactly the opposite characteristics of one that is skewed right: the mean is typically less than the median; the tail of the distribution is longer on the left hand side than on the right hand side; and the median is closer to the third quartile than to the first quartile.

37
Q

Unimodal

A

A unimodal distribution is a distribution with one clear peak or most frequent value. The values increase at first, rising to a single peak where they then decrease. … The normal distribution is an example of a unimodal distribution; The normal curve has one local maximum (peak).

38
Q

Multimodal

A

A multimodal distribution is a probability distribution with more than one peak, or “mode.” A distribution with one peak is called unimodal. A distribution with two peaks is called bimodal. A distribution with two peaks or more is multimodal.

39
Q

Stemplot

A

A stem and leaf plot is a way to plot data where each value is split into a stem (the leading digit or digits) and a leaf (the final digit). … A very long row of leaves means that “stem” has a large amount of data.
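
A minimal sketch of a stemplot builder, assuming non-negative whole numbers with the tens as the stem and the ones digit as the leaf. The function name and the data are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows for non-negative integers (stem = tens, leaf = ones)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include every stem in the range, even those with no leaves.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

rows = stemplot([12, 15, 21, 23, 23, 38])
for row in rows:
    print(row)
# 1 | 25
# 2 | 133
# 3 | 8
```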

40
Q

Splitting Stems

A

Split stems is a term used to describe stem-and-leaf plots that have more than 1 space on the stem for the same interval. An example would be a stem of 1 holding leaves 0-4, and a second stem of 1 holding leaves 5-9.

41
Q

Back-to-Back Stem

A

Back-to-back stemplots are a graphic option for comparing data from two populations. The center of a back-to-back stemplot consists of a column of stems, with a vertical line on each side. Leaves representing one data set extend from the right, and leaves representing the other data set extend from the left.

42
Q

Plots

A

A plot is a graphical technique for representing a data set, usually as a graph showing the relationship between two or more variables.

43
Q

Histogram

A

A histogram is a bar graph-like representation of data that buckets a range of outcomes into columns along the x-axis. The y-axis represents the number count or percentage of occurrences in the data for each column and can be used to visualize data distributions.
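
The bucketing step can be sketched in Python. The function name, the bin width of 10, and the test scores are assumptions for illustration. Bins with no values are simply omitted from the result.

```python
def histogram_counts(values, bin_width, start):
    """Count values in bins [start, start + w), [start + w, start + 2w), ..."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)
        left = start + index * bin_width  # left edge of the bin holding v
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical test scores.
scores = [52, 58, 61, 67, 68, 73, 75, 79, 84, 91]
counts = histogram_counts(scores, bin_width=10, start=50)

# Draw a text histogram: one '#' per observation in each bin.
for left, n in counts.items():
    print(f"{left}-{left + 10}: {'#' * n}")
```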

44
Q

Mean

A

A mean score is an average score, often denoted by x̄ (read "x-bar"). It is the sum of individual scores divided by the number of individuals.

45
Q

Median

A

The median is a simple measure of central tendency. To find the median, we arrange the observations in order from smallest to largest value. If there is an odd number of observations, the median is the middle value. If there is an even number of observations, the median is the average of the two middle values.
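
The steps above translate directly into a short Python function. The function name and sample lists are chosen for illustration.

```python
def median(values):
    """Return the median: middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of two

print(median([7, 1, 5]))                   # 5
print(median([1, 3, 4, 5, 5, 6, 7, 11]))   # 5.0
```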

46
Q

Interquartile Range

A

The interquartile range (IQR) is a measure of variability, based on dividing a data set into quartiles.

Quartiles divide a rank-ordered data set into four equal parts. The values that divide each part are called the first, second, and third quartiles; and they are denoted by Q1, Q2, and Q3, respectively.

For example, consider the following numbers: 1, 3, 4, 5, 5, 6, 7, 11. Q1 is the middle value in the first half of the data set. Since there are an even number of data points in the first half of the data set, the middle value is the average of the two middle values; that is, Q1 = (3 + 4)/2 or Q1 = 3.5. Q3 is the middle value in the second half of the data set. Again, since the second half of the data set has an even number of observations, the middle value is the average of the two middle values; that is, Q3 = (6 + 7)/2 or Q3 = 6.5. The interquartile range is Q3 minus Q1, so IQR = 6.5 - 3.5 = 3.
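
The worked example can be checked with a short Python sketch. It follows the same method as the card: split the ordered data into a lower and an upper half (leaving out the middle value when the count is odd) and take the median of each half. The function name is an assumption. Other software may use different quartile conventions and give slightly different answers.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two observations")
    half = len(ordered) // 2
    lower = ordered[:half]   # first half of the data
    upper = ordered[-half:]  # second half (skips the middle value if n is odd)
    return median(lower), median(upper)

data = [1, 3, 4, 5, 5, 6, 7, 11]
q1, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q3, iqr)  # 3.5 6.5 3.0
```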

47
Q

Five-Number

A

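The five-number summary is a set of descriptive statistics that provides information about a dataset. It consists of the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum.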
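
As a sketch, the five numbers (minimum, Q1, median, Q3, maximum) can be computed in Python. Quartiles here are taken as medians of the lower and upper halves of the ordered data, which is one common convention. The function name and the data are chosen for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two observations")
    half = len(ordered) // 2
    q1 = median(ordered[:half])   # median of the lower half
    q3 = median(ordered[-half:])  # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]

summary = five_number_summary([1, 3, 4, 5, 5, 6, 7, 11])
print(summary)  # (1, 3.5, 5.0, 6.5, 11)
```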

48
Q

Summary

A

The information that gives a quick and simple description of the data. Can include mean, median, mode, minimum value, maximum value, range, standard deviation, etc.

49
Q

Boxplot

A

In descriptive statistics, a box plot or boxplot is a method for graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending from the boxes (whiskers) indicating variability outside the upper and lower quartiles, hence the terms box-and-whisker plot and box-and- …

50
Q

Standard Deviation

A

A quantity expressing by how much the members of a group differ from the mean value for the group.

51
Q

Variance

A

The variance is a numerical value used to indicate how widely individuals in a group vary. If individual observations vary greatly from the group mean, the variance is big; and vice versa.

It is important to distinguish between the variance of a population and the variance of a sample. They have different notation, and they are computed differently. The variance of a population is denoted by $\sigma^2$, and the variance of a sample by $s^2$.

The variance of a population is defined by the following formula:

$\sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N}$

where $\sigma^2$ is the population variance, $\bar{X}$ is the population mean, $X_i$ is the $i$th element from the population, and $N$ is the number of elements in the population.

The variance of a sample is defined by a slightly different formula:

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

where $s^2$ is the sample variance, $\bar{x}$ is the sample mean, $x_i$ is the $i$th element from the sample, and $n$ is the number of elements in the sample. Using this formula, the variance of the sample is an unbiased estimate of the variance of the population.

And finally, the variance is equal to the square of the standard deviation.
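
The two formulas differ only in the divisor, which a short Python sketch makes concrete. The function names and the data are assumptions for illustration.

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

# Hypothetical data with mean 5 and squared deviations summing to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))         # 4.0
print(population_variance(data) ** 0.5)  # 2.0 (the standard deviation)
print(sample_variance(data))             # 32 / 7, slightly larger
```

Dividing by n - 1 rather than n makes the sample variance slightly larger, which is what makes it an unbiased estimate of the population variance.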