AP Stat Ch 1 Flashcards

0
Q

Available data

A

The data that were produced in the past for some other purpose but that may help answer a present question

1
Q

Statistics

A

The science of collecting, analyzing, and drawing conclusions from data

2
Q

Observational study

A

In an observational study, we observe individuals and measure variables of interest but do not attempt to influence the responses

3
Q

Experiment

A

In an experiment, we deliberately do something to individuals in order to observe their responses

4
Q

Individuals

A

Individuals are the objects described by a set of data. Individuals may be people, but they may also be animals or things. Do not get individuals confused with the population

5
Q

Population

A

The population of interest is the entire collection of individuals or objects about which information is desired

6
Q

Variable

A

Any characteristic of an individual whose value may change from one individual to another.
Ex. Hair color, height, brand of car, GPA

7
Q

Categorical variable

A

Places an individual into one of several groups or categories.
Ex. Hair color, brand of car
USUALLY WORDS AS OPTIONS

8
Q

Quantitative

A

Numerical data. Takes numerical values for which arithmetic operations such as adding and averaging make sense.

9
Q

Categorical vs quantitative variables

A

Categorical variables take words (labels) as values, whereas quantitative variables take numbers that you can do arithmetic operations on

10
Q

Census

A

When you study an entire population, it is called a census

11
Q

Sample

A

A sample is a subset of the population, selected for study in some prescribed manner

12
Q

Descriptive statistics

A

The branch of statistics that includes methods for organizing and summarizing data

13
Q

Inferential statistics

A

The branch of statistics that involves generalizing about a population based on information from a sample of that population.

14
Q

Statistical inference

A

The process of drawing generalizations about a population from sample data (this is what inferential statistics does)

15
Q

Distribution of a variable

A

Tells us what values the variable takes and how often it takes these values

16
Q

Discrete data

A

Quantitative data is discrete if the possible values are isolated points on the number line.

Ex. Shoe size, number of birthdays. Count them. Usually whole numbers.

17
Q

Continuous data

A

Numerical data is continuous if the possible values form an entire interval on the number line

Foot length, age

18
Q

Discrete vs continuous variables

A

Measure continuous, count discrete

19
Q

Types of variables

A

First decide if categorical or quantitative.
If categorical, then it is words – hair color, favorite color, favorite president
If quantitative, then it is numbers – age, number of siblings

If quantitative, then discrete or continuous
Discrete if you can count it, continuous if you measure it.
Discrete is number of pages, continuous is length of an inseam

20
Q
Are the following quantitative (continuous or discrete) or categorical:
Length of pen
Color of pants
Subject of book
Type of pen
Number of pockets
Number of pages
Number of pens in a box
Length of an inseam
Area of a page
A

Length of pen– quantitative, continuous
Color of pants– categorical
Subject of book– categorical
Type of pen– categorical
Number of pockets– quantitative, discrete
Number of pages– quantitative, discrete
Number of pens in a box – quantitative, discrete
Length of inseam– quantitative, continuous
Area of a page– quantitative, continuous

21
Q

Frequency table

A

For categorical data, make a frequency table – displays the possible categories and either the count or the percent of individuals who fall in each category

22
Q

Frequency

A

Count– # of items in that group

23
Q

Relative frequency

A

The proportion (or percent) of observations that fall in a category. If a category has 2 observations and there are 11 total, relative frequency = 2/11 ≈ 0.18, or about 18%
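
As a rough sketch of how frequency and relative frequency fit together, the Python below builds a small frequency table. The hair color data are made up for illustration, chosen so that one category has 2 of 11 observations like the example above.

```python
from collections import Counter

# Made-up data for illustration: hair color recorded for 11 individuals.
hair = ["brown", "black", "brown", "blond", "red", "brown",
        "black", "brown", "blond", "black", "brown"]

counts = Counter(hair)            # frequency: count of individuals in each category
total = sum(counts.values())      # total number of individuals
rel_freq = {category: count / total for category, count in counts.items()}

# Print the frequency table: category, count, relative frequency.
for category, count in counts.items():
    print(f"{category:>6}  {count:>2}  {rel_freq[category]:.3f}")
```

The relative frequencies should add up to 1 (100%), which is a quick check on the table.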

24
Q

Ways to display categorical data

A

Bar graphs and relative frequency bar graphs
Pie charts and segmented bar charts
Two way table

25
Q

Bar graphs and relative frequency bar graphs

A

Label variables and scales
The bars should be the same width and not touching each other
The order of the categories doesn’t matter
Relative frequency bar charts make it easier to compare multiple distributions, especially when the sample sizes are different

26
Q

Pie charts and segmented bar charts

A

Label variables and categories
Pie charts are easier to construct with a computer spreadsheet program or stat software
Pie charts help us visually see what part of the whole each group forms
Segmented bar charts are basically rectangular pie charts. Each bar is a whole; divide each bar proportionally into segments corresponding to the percentage in each group
Segmented bar charts make it easier to compare distributions

27
Q

AP exam common error with charts

A

BE SURE TO LABEL GRAPHS!!!

28
Q

Suppose I wanted to compare AP stat scores for tenth, eleventh, and twelfth graders. Which type of graph would be the best?

A

Segmented bar chart

Three bars, one with tenth, one with eleventh, one with twelfth

29
Q

Two way table

A

A table with two categorical variables

30
Q

Marginal distribution

A

Distributions of categorical data that appear at the right and bottom margins of a two way table. They help us to look at the distribution of each variable separately

31
Q

Conditional distributions

A

Distributions of one categorical variable inside a two way table, restricted to the individuals in one specific row or column (one value of the other variable)
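
A minimal sketch of the difference between a marginal and a conditional distribution, using a made-up two way table (the grades, subjects, and counts are all hypothetical):

```python
# Made-up two way table of counts: rows = grade, columns = preferred subject.
table = {
    "junior": {"math": 20, "art": 30},
    "senior": {"math": 30, "art": 20},
}
grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of grade: each row total divided by the grand total.
marginal_grade = {
    grade: sum(row.values()) / grand_total for grade, row in table.items()
}

# Conditional distribution of subject given grade:
# each count divided by the total of its own row.
conditional_subject_given_grade = {
    grade: {subject: n / sum(row.values()) for subject, n in row.items()}
    for grade, row in table.items()
}

print(marginal_grade)
print(conditional_subject_given_grade)
```

Each conditional distribution adds up to 1 on its own, because it only describes the individuals in that one row.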

32
Q

How many total conditional distributions are there?

A

Rows + columns

33
Q

Simpson’s paradox

A

An association between two variables that holds for each individual value of a third variable can be changed or even reversed when the data for all values of the third variable are combined. This reversal is called Simpson’s paradox. Therefore, you must be careful when data from several groups are combined to form a single group!
Data that suggests one conclusion when aggregated and a different conclusion when presented in subcategories
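
To see the reversal with actual numbers, here is a small sketch. The counts are invented purely for illustration: two hypothetical teachers, A and B, and the number of students who passed out of the number enrolled at each course level.

```python
# Invented counts for illustration: (passed, enrolled) per teacher and course level.
results = {
    "A": {"easy": (9, 10), "hard": (30, 90)},
    "B": {"easy": (72, 90), "hard": (2, 10)},
}

def rate(passed, enrolled):
    """Pass rate within a single group."""
    return passed / enrolled

def overall_rate(teacher):
    """Pass rate after combining the easy and hard groups."""
    passed = sum(p for p, _ in results[teacher].values())
    enrolled = sum(n for _, n in results[teacher].values())
    return passed / enrolled

for level in ("easy", "hard"):
    a = rate(*results["A"][level])
    b = rate(*results["B"][level])
    print(f"{level}: A = {a:.1%}, B = {b:.1%}")   # A is higher in each group

print(f"combined: A = {overall_rate('A'):.1%}, B = {overall_rate('B'):.1%}")  # B is higher
```

In this made-up data, A does better within each course level, but B looks better once the groups are combined, because most of A's students are in the hard course and most of B's are in the easy one.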

34
Q

Lurking variables

A

Often come up together with Simpson’s paradox.
Sometimes the relationship between two variables is influenced by other variables that we did not measure or even think about! Because the variables are lurking in the background, we call them lurking variables. They are not among the explanatory or response variables in a study, but they may influence the interpretation of the relationship among these variables.

35
Q

Conclusions from Simpson’s paradox

A

It is caused by a combination of a lurking variable and data from unequal sized groups being combined into a single data set. The unequal group sizes, in the presence of a lurking variable, can weight the results incorrectly. This can lead to seriously flawed conclusions. The obvious way to prevent it is to not combine data sets of different sizes from diverse sources!
A great deal of care has to be taken when combining small data sets into a larger one.
Sometimes conclusions from the combined data set are the opposite of conclusions from the smaller ones. When that happens, the conclusion from the combined set is usually the misleading one!

36
Q

Dotplot

A

A simple way to display quantitative data when the set is reasonably small

37
Q

How to construct a dotplot

A

Label your axis (horizontal line) with the variable and title your graph
Scale the axis based on the values of the variable
Mark a dot above the number on the horizontal axis corresponding to each data value. Stack multiple dots vertically
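
The steps above can be sketched as a small text dotplot. This is only an illustration: the function name `dotplot` and the quiz scores are made up, and it assumes the data are whole numbers with at most two digits.

```python
from collections import Counter

def dotplot(values):
    """Return the lines of a text dotplot for whole-number data."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    lines = []
    # Build rows from the tallest stack down, so dots stack vertically.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(" * " if counts[v] >= level else "   "
                      for v in range(lo, hi + 1))
        lines.append(row.rstrip())
    # Scaled horizontal axis: every value from min to max, even if unused.
    lines.append("".join(f"{v:^3}" for v in range(lo, hi + 1)).rstrip())
    return lines

quiz_scores = [3, 5, 5, 6, 6, 6, 8]   # made-up data
lines = dotplot(quiz_scores)
print("\n".join(lines))
print("Quiz score (points)")           # label the axis with the variable
```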

38
Q

Stem and leaf plot

A

Another way to display a relatively small numerical data set. Often the values of the variable are too spread out to make a dotplot, so this is a better option. Stem is the first part of the number and leaf is last digit

39
Q

How to construct a stem plot

A

Separate each observation into a stem, consisting of all but the rightmost digit, and a leaf, the final digit.
Write the stems vertically in increasing order from top to bottom, and draw a vertical line to the right of the stems.
Write each leaf to the right of its stem.

Numbers to the left of the line are the stems and to the right are the leaves.
MUST INCLUDE A KEY WITH UNITS
LEAVES MUST BE IN SINGLE DIGITS, NO COMMAS
it is best if leaves are in numerical order
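
A rough sketch of the construction steps in Python, assuming non-negative whole-number data. The function name `stemplot` and the test scores are made up for illustration.

```python
from collections import defaultdict

def stemplot(values):
    """Return the lines of a stemplot for non-negative whole numbers."""
    leaves = defaultdict(list)
    for v in sorted(values):           # sorting keeps the leaves in numerical order
        stem, leaf = divmod(v, 10)     # stem = all but the last digit, leaf = last digit
        leaves[stem].append(leaf)
    lines = []
    # Stems increase from top to bottom; empty stems are kept so gaps stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves[stem])
        lines.append(f"{stem:>2} | " + digits)
    return lines

scores = [72, 85, 91, 78, 85, 60, 99, 74]   # made-up test scores
plot = stemplot(scores)
print("\n".join(plot))
print("Key: 7 | 2 means 72 points")          # key with units
```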

40
Q

Back to back stemplots

A

Useful for comparing distributions
Example is comparing female and male weights
Have stem in the middle and leaves on both sides with male above one side and female above the other

41
Q

Split stemplot

A

When a data set is very compact, it is often useful to split stems to stretch the display to investigate the shape. Whenever you split stems, be sure that each stem is assigned an equal number of possible leaf digits.
When given data all between 96 and 99, make stems 96,96,97,97,98,98,99,99 and have the top be 0-4 for leaves and bottom be 5-9

42
Q

What to do when data is spread out for stem and leaf plot

A

Truncate or round the data to shrink the display

Ex. Round 10.53 to 11

43
Q

Describing a distribution

A

SHAPE, CENTER, AND SPREAD

44
Q

Shape

A

Symmetric, skewed right, or skewed left
Unimodal if one peak, bimodal if two peaks
Uniform if a plateau (all values occur about equally often)

45
Q

Outliers

A

Data values that fall outside the overall pattern of the rest of the distribution.
Rule: a value is an outlier if it is less than Q1 - 1.5 × IQR or greater than Q3 + 1.5 × IQR
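
A minimal sketch of the 1.5 × IQR check in Python. It assumes the quartiles are found as the medians of the lower and upper halves of the ordered data (other textbooks and calculators may use slightly different quartile conventions), and the data values are made up.

```python
from statistics import median

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data.

    The overall median is left out of both halves when n is odd.
    Needs at least two observations.
    """
    xs = sorted(data)
    half = len(xs) // 2
    return median(xs[:half]), median(xs[len(xs) - half:])

def outliers(data):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

data = [2, 4, 5, 6, 7, 8, 9, 10, 30]   # made-up values
print(quartiles(data))                  # Q1 and Q3
print(outliers(data))                   # values flagged by the rule
```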

46
Q

Clusters

A

Isolated groups of points

47
Q

Gaps

A

Large spaces between points

48
Q

Symmetric

A

If the right and left sides of the histogram are approximately mirror images of each other

49
Q

Skewed

A

The thinner ends of a distribution are called the tails. If one tail stretches out further than the other, the histogram is said to be skewed to the side of the longer tail

50
Q

Histogram

A

Used to display larger data sets for quantitative data

51
Q

Discrete histogram vs continuous histogram

A

In discrete histograms, center each bar over its value on the x-axis.
In continuous histograms, make classes and draw each bar between the class boundaries. For example, with classes of width 5, the first bar has left edge 40 and right edge 45, the next runs from 45 to 50, and the next from 50 to 55.

52
Q

How to make histograms

A

Label axis and scales
Bars should touch
Y axis is frequency or relative frequency
X axis is variable

53
Q

Relative frequency histograms

A

Same as a regular histogram, but with relative frequency (percent of total) rather than frequency (number of observations) on the vertical axis. Relative frequency histograms are more useful because you can compare two distributions more easily

54
Q

Histograms vs bar graphs

A

Histograms use QUANTITATIVE variables while bar graphs use CATEGORICAL variables. Histograms don’t have spaces between bars, bar graphs have spaces

55
Q

Continuous histograms

A
Make classes of the same length that never overlap
Divide the range of the data into classes of equal width. Count the number of observations in each class 
Five classes is a good minimum. Too few will give a skyscraper graph and too many will give a pancake graph. 
Label and scale your axes 
If an observation falls on a boundary, put the value into the upper class.
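
The class-building steps above can be sketched in Python. The function name `class_counts`, the starting edge, the class width, and the data are all made up for illustration. Each class includes its left edge but not its right edge, so an observation on a boundary lands in the upper class.

```python
def class_counts(data, start, width, n_classes):
    """Count observations in equal-width classes [start, start + width), ...

    Values outside the covered range are ignored in this sketch.
    """
    counts = [0] * n_classes
    for x in data:
        index = int((x - start) // width)   # which class x falls in
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts

values = [40, 42, 45, 45, 47, 50, 51, 53, 54, 55, 58, 59, 61, 64]  # made-up data
counts = class_counts(values, start=40, width=5, n_classes=5)

# Text version of the histogram: one row per class, one mark per observation.
for i, count in enumerate(counts):
    left = 40 + 5 * i
    print(f"{left}-{left + 5}: " + "#" * count)
```

Here the two values of 45 are counted in the 45-50 class, not the 40-45 class.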
56
Q

Ogive

A

Cumulative relative frequency graph

Cumulative relative frequency gives the percentile
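
As a sketch of where the points on an ogive come from, the Python below turns class frequencies into cumulative relative frequencies. The class counts and upper edges are made up for illustration.

```python
from itertools import accumulate

counts = [2, 3, 4, 3, 2]             # made-up class frequencies
upper_edges = [45, 50, 55, 60, 65]   # made-up upper edge of each class

total = sum(counts)
# Running total of the counts, divided by the overall total.
cum_rel_freq = [running / total for running in accumulate(counts)]

# Each pair (upper edge, cumulative relative frequency) is one point on the ogive.
for edge, crf in zip(upper_edges, cum_rel_freq):
    print(f"below {edge}: {crf:.1%}")
```

The last cumulative relative frequency is always 100%, since every observation falls below the top edge.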

57
Q

Measuring the center of a data set

A

Look at the mean and the median

58
Q

Population mean

A

Greek letter mu, μ (u with long stem)

The arithmetic average of all values in the entire population

59
Q

Sample mean

A

x̄ (x with a bar above it).
Since we rarely study the entire population, estimate population mean with the sample mean
= sum of all values / number of values

60
Q

Median

A

The middle score

To find which value is the middle score, put all the data in order. With an even number of observations, average the two middle values

61
Q

Mode

A

Most frequent observation. Usually not a useful measure of center.

62
Q

Resistant measure

A

Measure not affected by outliers

63
Q

Are median and mean resistant?

A

Median is resistant–not affected by outliers so it is better for a skewed data set
Mean is not resistant–affected by outliers, as outliers affect arithmetic average.
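
A quick sketch of what "resistant" means in practice, using made-up salary values (in thousands) and Python's built-in `statistics` module:

```python
from statistics import mean, median

salaries = [30, 32, 35, 38, 40]      # made-up values, in thousands
with_outlier = salaries + [400]      # same data plus one extreme value

print(mean(salaries), median(salaries))            # both measures agree here
print(mean(with_outlier), median(with_outlier))    # mean jumps, median barely moves
```

Adding the single value of 400 pulls the mean far above most of the data, while the median only shifts to the average of the two middle values.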

64
Q

When to use mean and when to use median

A

The median can be used with any data, including skewed data or data with outliers

Use the mean with roughly symmetric data without outliers, since the mean is not resistant and the median is

65
Q

Skewed right vs skewed left

A

Skewed left is when the tail is to the left.
Median > mean (usually)
Lower values pull the tail out to the left.
Peak on right. Tail on left.

Skewed right is when the tail is to the right
Mean > median (usually)
Peak on the left.

66
Q

Mean vs median in skewed data

A

Skewed left:
Median > mean

Skewed right:
Mean > median

Symmetric:
Mean roughly equal to median

67
Q

Range

A

Full spread of data by simply finding the difference between the largest and the smallest observation. ONE NUMBER
MAX-MIN

BUT it is not resistant. Outliers heavily influence the range

68
Q

How to measure spread

A

Range for roughly symmetric data without outliers.

IQR when skewed or have outliers

69
Q

Inter-Quartile Range

A

A resistant measure of spread. It is the distance between the first and third quartiles. The range of the middle half of the data.

IQR=Q3-Q1

70
Q

Quartiles

A

Q1 is first quartile–the point that divides the lowest 25% of the data from the upper 75%
Q2 is the median
Q3 is the third quartile–the point that divides the lowest 75% of the data from the upper 25%

71
Q

How to find quartiles

A

Get data in order. Find the median. The median is Q2
Q3 is the median of the half of the data above the median, and Q1 is the median of the half of the data below the median
In other words, take the data above the median and find the median of just those values. That value is Q3. Do the same for the data below the median to get Q1
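
The procedure above can be sketched in Python. This assumes the overall median is left out of both halves when the number of observations is odd. Calculators and software sometimes use slightly different quartile conventions, so answers may not match exactly. The function name and data are made up for illustration.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-each-half method.

    Needs at least two observations.
    """
    xs = sorted(data)                    # step 1: put the data in order
    half = len(xs) // 2
    q2 = median(xs)                      # step 2: the median is Q2
    q1 = median(xs[:half])               # median of the lower half
    q3 = median(xs[len(xs) - half:])     # median of the upper half
    return q1, q2, q3

odd_data = [7, 1, 3, 9, 5, 11, 13]       # made-up data, n = 7
even_data = [1, 2, 3, 4, 5, 6, 7, 8]     # made-up data, n = 8

q1, q2, q3 = quartiles(odd_data)
print(q1, q2, q3, "IQR =", q3 - q1)
print(quartiles(even_data))
```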

72
Q

Shape center spread summary

A

Shape:
Skewed (direction) or symmetric
Unimodal or bimodal

Center:
Mean or median

Spread:
IQR, range, standard deviation

73
Q

Five number summary

A

(MIN,Q1,MED,Q3,MAX)

74
Q

Box plot

A

A graph of the five number summary. Easy to make and clearly shows center and spread of the distribution. The distribution is skewed toward the side with the longer whisker or the longer part of the box.
Useful for comparing multiple distributions: side by side boxplots are usually drawn vertically

75
Q

Drawing a box plot

A

Central box spans the quartiles Q1 and Q3
A line in the box marks the Median, M
Lines extend from the box out to the smallest and largest observations.
Length of the box = IQR
label axes and scale

76
Q

Modified boxplot

A

Specifically identifies outliers (plotted as separate points), in addition to the median and quartiles. A regular boxplot runs its lines all the way out to the outliers.

77
Q

Variance

A

Average squared deviation of the observations from the mean

s squared (s^2)

78
Q

Deviation

A

The deviation of an observation is its distance from the mean (x-x bar). The mean is the point that makes the sum of the deviations=0. We square the deviations to make negatives positives.

79
Q

Population standard deviation

A

Greek letter sigma (σ), which looks like the letter “o”
Standard deviation of all the values in the entire population. Typical deviation from the mean or the average distance from the average

= SQRT (sum of (x-mu)^2 / n)

80
Q

Population variance

A

Square of the population standard deviation

Greek letter sigma, squared (σ², looks like “o” squared)

81
Q

Sample standard deviation

A

Represented by s
Since we rarely study entire populations, use this.
Your distance from the center or your average distance from the average.
Approximates the average, or typical deviation

= SQRT ( sum of (x-x bar)^2 / (n-1))
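
A short sketch that works through the formula step by step and compares it with Python's built-in `statistics` functions (`stdev` divides by n-1, `pstdev` divides by n). The data values are made up for illustration.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]          # made-up observations
n = len(data)
x_bar = mean(data)                        # sample mean

# Squared deviation of each observation from the mean.
squared_devs = [(x - x_bar) ** 2 for x in data]

s = sqrt(sum(squared_devs) / (n - 1))     # sample standard deviation (divide by n-1)
sigma = sqrt(sum(squared_devs) / n)       # population formula (divide by n)

print(f"s = {s:.4f}, built-in stdev = {stdev(data):.4f}")
print(f"sigma = {sigma:.4f}, built-in pstdev = {pstdev(data):.4f}")
```

Dividing by n-1 always gives a slightly larger value than dividing by n for the same data.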

82
Q

Sample variance

A

Square of the sample standard deviation.

S squared

83
Q

Why do we divide by n-1 when calculating the sample standard deviation

A

x bar is only an estimate of mu, so deviations measured from x bar tend to come out a little too small. Dividing by n-1 instead of n helps to account for this.

84
Q

When to use sample and when use population

A

Use sample unless told otherwise

85
Q

When to use sample standard deviation

A

When talking about mean, as this measures spread about the mean.

86
Q

When does s=0

A

When there is no spread. All observations have same value. Otherwise, s>0
As observations are more spread out about their mean, s is larger.

87
Q

Is s resistant?

A

No, because like the mean, it is not resistant

Strong skewness or outliers can make S very large

88
Q

5 number summary

A

Usually better than the mean and standard deviation for describing a skewed distribution or a distribution with strong outliers
Use the mean and s with a reasonably symmetric distribution free of outliers.

89
Q

Transformations to lists

A

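Shape does not change when you add a constant or multiply by a positive constant.
Center always changes: multiplying each observation by b multiplies both the mean and the median by b. Adding the same number a to each observation adds a to the mean and the median.
Spread stays the same when adding the same amount to each observation, but changes when multiplying each data point by something. When multiplying by b, measures of spread (range, IQR, standard deviation) are multiplied by |b|.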
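
A small sketch that checks these rules on made-up numbers. The data and the constants `a` and `b` are chosen only for illustration.

```python
from statistics import mean, median, stdev

data = [1, 2, 3, 4, 10]   # made-up observations
a, b = 5, 3               # add a to every value, or multiply every value by b

shifted = [x + a for x in data]
scaled = [b * x for x in data]

def summary(xs):
    """Center (mean, median) and spread (standard deviation, range) of a list."""
    return mean(xs), median(xs), stdev(xs), max(xs) - min(xs)

print("original:", summary(data))
print("add a:   ", summary(shifted))   # center moves by a, spread unchanged
print("times b: ", summary(scaled))    # center and spread both multiplied by b
```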