AP Stat Ch 1 Flashcards
Available data
The data that were produced in the past for some other purpose but that may help answer a present question
Statistics
The science of collecting, analyzing, and drawing conclusions from data
Observational study
In an observational study, we observe individuals and measure variables of interest but do not attempt to influence the responses
Experiment
In an experiment, we deliberately do something to individuals in order to observe their responses
Individuals
Individuals are the objects described by a set of data. Individuals may be people, but they may also be animals or things. Do not get individuals confused with the population
Population
The population of interest is the entire collection of individuals or objects about which information is desired
Variable
Any characteristic of an individual whose value may change from one individual to another.
Ex. Hair color, height, brand of car, GPA
Categorical variable
Places an individual into one of several groups or categories.
Ex. Hair color, brand of car
USUALLY WORDS AS OPTIONS
Quantitative
Numerical data. Takes numerical values for which arithmetic operations such as adding and averaging make sense.
Categorical vs quantitative variables
Categorical values are words (labels), whereas quantitative values are numbers that you can do arithmetic on
Census
When you study an entire population, it is called a census
Sample
A sample is a subset of the Population, selected for study in some prescribed manner
Descriptive statistics
The branch of statistics that includes methods for organizing and summarizing data
Inferential statistics
The branch of statistics that involves generalizing about a population based on information from a sample of that population.
Statistical inference
The process of drawing conclusions about a population based on data from a sample; this is what inferential statistics does
Distribution of a variable
Tells us what values the variable takes and how often it takes these values
Discrete data
Quantitative data is discrete if the possible values are isolated points on the number line.
Ex. Shoe size, number of birthdays. You count them. Usually whole numbers.
Continuous data
Numerical data is continuous if the possible values form an entire interval on the number line
Foot length, age
Discrete vs continuous variables
Measure continuous, count discrete
Types of variables
First decide if Categorical or quantitative.
If categorical, then it is words– hair color, favorite color, favorite president
If quantitative then it is numbers – age, number siblings
If quantitative, then discrete or continuous
Discrete if you can count it, continuous if you measure it.
Discrete is number of pages, continuous is length of an inseam
Are the following quantitative (continuous or discrete) or categorical: length of pen, color of pants, subject of book, type of pen, number of pockets, number of pages, number of pens in a box, length of an inseam, area of a page
Length of pen– quantitative, continuous
Color of pants– categorical
Subject of book– categorical
Type of pen– categorical
Number of pockets– quantitative, discrete
Number of pages– quantitative, discrete
Number of pens in a box – quantitative, discrete
Length of inseam– quantitative, continuous
Area of a page– quantitative, continuous
Frequency table
For categorical data, make a frequency table – it displays the possible categories and either the count or the percent of individuals who fall in each category
Frequency
Count– # of items in that group
Relative frequency
The proportion (or percent) of individuals in a category. If a category has 2 individuals and there are 11 total, relative frequency = 2/11 ≈ 18%
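A minimal sketch of a frequency table in Python. The hair-color responses and the helper name `frequency_table` are made up for illustration; the data are chosen so that one category has 2 of the 11 individuals.

```python
from collections import Counter

# Hypothetical responses for one categorical variable (hair color).
# These 11 values are invented for illustration only.
hair_colors = ["brown", "black", "brown", "blonde", "red", "brown",
               "black", "blonde", "brown", "black", "brown"]

def frequency_table(values):
    """Return {category: (frequency, relative_frequency)}."""
    counts = Counter(values)   # frequency = count in each category
    total = len(values)        # total number of individuals
    return {cat: (n, n / total) for cat, n in counts.items()}

table = frequency_table(hair_colors)
for cat, (count, rel) in sorted(table.items()):
    # Print category, count, and relative frequency as a percent.
    print(f"{cat:<8} {count:>3} {rel:>7.1%}")
```

The relative frequencies always add up to 1 (100%), which is a quick check that the table is complete.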
Ways to display categorical data
Bar graphs and relative frequency bar graphs
Pie charts and segmented bar charts
Two way table
Bar graphs and relative frequency bar graphs
Label variables and scales
The bars should be the same width and not touching each other
The order of the categories doesn’t matter
Relative frequency bar charts make it easier to compare multiple distributions, especially when the sample sizes are different
Pie charts and segmented bar charts
Label variables and categories
Pie charts are easier to construct with a computer spreadsheet program or stat software
Pie charts help us visually see what part of the whole each group forms
Segmented bar charts are basically rectangular pie charts, each bar is a whole, divide each bar proportionally into segments corresponding to the percentage in each group
Segmented bar charts make it easier to compare distributions
AP exam common error with charts
BE SURE TO LABEL GRAPHS!!!
Suppose I wanted to compare AP stat scores for tenth, eleventh, and twelfth graders. Which type of graph would be the best?
Segmented bar chart
Three bars, one with tenth, one with eleventh, one with twelfth
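A rough sketch of the numbers behind those three bars. The score counts below are invented (they are not real AP results), and `segment_percents` is a hypothetical helper. Each bar is converted to percentages so that grades with different numbers of students can be compared.

```python
# Hypothetical counts of AP Stat scores (1-5) for each grade level.
# Invented for illustration; note the three groups have different sizes.
score_counts = {
    "tenth":    {1: 2,  2: 3,  3: 5,  4: 6,  5: 4},
    "eleventh": {1: 5,  2: 10, 3: 20, 4: 10, 5: 5},
    "twelfth":  {1: 10, 2: 20, 3: 40, 4: 20, 5: 10},
}

def segment_percents(counts):
    """Convert one group's counts to percentages (one segmented bar)."""
    total = sum(counts.values())
    return {score: 100 * n / total for score, n in counts.items()}

# One bar per grade; the segments of each bar add up to 100%.
segments = {grade: segment_percents(c) for grade, c in score_counts.items()}

for grade, pct in segments.items():
    row = "  ".join(f"{score}: {p:4.1f}%" for score, p in pct.items())
    print(f"{grade:<9} {row}")
```

In this made-up data the eleventh and twelfth grade bars come out identical even though one group is twice as large, which is the reason to compare percentages rather than raw counts.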
Two way table
A table with two categorical variables
Marginal distribution
Distributions of categorical data that appear at the right and bottom margins of a two way table. They help us to look at the distribution of each variable separately
Conditional distributions
Distributions of one categorical variable among only the individuals who have a specific value of the other variable. Each one comes from a single row or a single column inside the two way table
How many total conditional distributions are there?
Number of rows + number of columns (one for each row and one for each column)
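A small sketch of marginal and conditional distributions for a two way table. The table (grade level by favorite chart type) and its counts are made up for illustration.

```python
# Hypothetical two way table: rows = grade level, columns = favorite chart.
# Counts are invented for illustration.
table = {
    "junior": {"bar": 30, "pie": 20},
    "senior": {"bar": 10, "pie": 40},
}
rows = list(table)
cols = list(table[rows[0]])

row_totals = {r: sum(table[r].values()) for r in rows}
col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
grand_total = sum(row_totals.values())

# Marginal distributions: totals in the margins divided by the grand total.
row_marginal = {r: row_totals[r] / grand_total for r in rows}
col_marginal = {c: col_totals[c] / grand_total for c in cols}

# Conditional distributions: one per row and one per column.
given_row = {r: {c: table[r][c] / row_totals[r] for c in cols} for r in rows}
given_col = {c: {r: table[r][c] / col_totals[c] for r in rows} for c in cols}

num_conditional = len(given_row) + len(given_col)  # rows + columns

print("Marginal (rows):", row_marginal)
print("Marginal (cols):", col_marginal)
print("Chart type given junior:", given_row["junior"])
print("Grade given bar:", given_col["bar"])
print("Number of conditional distributions:", num_conditional)
```

Each conditional distribution adds up to 1 on its own, because it only looks at the individuals in that one row or column.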
Simpson’s paradox
An association between two variables that holds for each individual value of a third variable can be changed or even reversed when the data for all values of the third variable are combined. This reversal is called Simpson’s paradox. Therefore, you must be careful when data from several groups are combined to form a single group!
Data that suggests one conclusion when aggregated and a different conclusion when presented in subcategories
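A minimal numeric sketch of the reversal. The two methods ("A" and "B"), the "easy"/"hard" subgroups, and all counts are hypothetical, chosen only so the reversal shows up; the subgroup (easy vs hard) plays the role of the third variable.

```python
# Hypothetical (successes, trials) for two methods in two subgroups.
# All numbers are invented for illustration.
results = {
    "A": {"easy": (18, 20), "hard": (40, 80)},
    "B": {"easy": (68, 80), "hard": (8, 20)},
}

def rate(successes, trials):
    """Success rate for one cell."""
    return successes / trials

def overall_rate(groups):
    """Success rate after combining all subgroups into one."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(n for _, n in groups.values())
    return successes / trials

for method, groups in results.items():
    parts = ", ".join(f"{g}: {rate(*cell):.0%}" for g, cell in groups.items())
    print(f"Method {method}: {parts}, combined: {overall_rate(groups):.0%}")
```

With these numbers, A has the higher success rate in both the easy group and the hard group, yet B has the higher rate once the groups are combined. The cause is the unequal group sizes: most of A's trials are hard cases, while most of B's are easy ones.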
Lurking variables
With Simpson’s paradox
Sometimes the relationship between two variables is influenced by other variables that we did not measure or even think about! Because the variables are lurking in the background, we call them lurking variables. They are not among the explanatory or response variables in a study, but they may influence the interpretation of the relationship among these variables.
Conclusions from Simpson’s paradox
It is caused by a combination of a lurking variable and data from unequal sized groups being combined into a single data set. The unequal group sizes, in the presence of a lurking variable, can weight the results incorrectly. This can lead to seriously flawed conclusions. The obvious way to prevent it is to not combine data sets of different sizes from diverse sources!
A great deal of care has to be taken when combining small data sets into a larger one.
Sometimes conclusions from a large combined data set are the opposite of conclusions from the smaller sets that make it up. When that happens, the conclusion from the combined set is usually the misleading one!