AP Stats Midterm Flashcards
Population
Who you are trying to learn about as a whole
Parameter
Numerical value that describes a population
Sample
A smaller group from the population that is hopefully representative
Statistic
Numerical value that describes the sample
Qualitative/categorical data
Mostly non-numerical
Ex: color of car, jersey number, brand of shoe
Quantitative data
All numeric; values you can do calculations with (not category percentages)
Who What Where When Why How
- who is the study about (population)
- variables, quantity
- date it happened, if given
- location study or experiment took place
- what’s the purpose
- how did they get the data:
- survey
- experiment
- record keeping
Pie chart
For percentage categories
Bar chart
Bars show the count for each category and do not touch; when ordered tallest to shortest from left to right it is a Pareto chart
Contingency table
Each cell of the table gives the count for a combination of values of the two variables
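A minimal sketch of how the cell counts in a contingency table could be tallied. The survey data and the variable names (`responses`, `table`) are made up for illustration.

```python
from collections import Counter

# Hypothetical survey responses: (class year, preferred pet)
responses = [
    ("junior", "dog"), ("junior", "cat"), ("senior", "dog"),
    ("senior", "dog"), ("junior", "dog"), ("senior", "cat"),
]

# Each cell counts one combination of values of the two variables.
table = Counter(responses)

for year in ("junior", "senior"):
    row = [table[(year, pet)] for pet in ("dog", "cat")]
    print(year, row)  # row order: [dog count, cat count]
```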
Independence
Tells us whether there is an association between the variables (independent = no association)
Distribution
How are the numbers spread out? Where is the center?
Any repetition?
Histogram
The bars touch and the height shows frequency.
Bin width is how thick one of the bars is
Stem and leaf plot
- Good for small data sets
- still shows relative shape
- maintains data
- always make a key
Dot plots
Good for integer data and small data sets
Describing the distribution using CUSS
C-center
U-unusual:any outliers or gaps
S-shape
S-spread (if all you have is the graph, say the range)
Unimodal and symmetric
One tallest bar, generally symmetric shape
Skewed
The tail of bars stretches out toward the side that is skewed (long tail on the right = skewed right)
Uniform
All bars are generally the same height
Median
- middle number
- numbers need to be in order when finding median
- 1 center number or average of two center numbers
- not affected by outliers
- good for skewed data
Mean
- sum of #s divided by # of #s
- affected by outliers
- only use for unimodal and symmetric distributions
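A quick sketch of why the median holds up against outliers while the mean does not. The scores are made-up values, and `statistics` is Python's standard library module.

```python
from statistics import mean, median

# Hypothetical data set, then the same data with one outlier added.
scores = [70, 72, 75, 78, 80]
scores_with_outlier = scores + [200]

# The mean gets pulled toward the outlier; the median barely moves.
print(mean(scores), median(scores))
print(mean(scores_with_outlier), median(scores_with_outlier))
```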
Mode
Most frequent number
Use term loosely
Range
Max#-min#
Very sensitive to outliers
Interquartile range(IQR)
Q3-Q1
Resistant to outliers
Standard deviation
- always goes with mean
- s = sqrt( sum of (x - x-bar)^2 / (n - 1) ), where n is the number of values
- do not confuse with 1.5 x IQR, which is the outlier rule
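A sketch of the sample standard deviation worked by hand: add up the squared deviations from the mean, divide by (number of values - 1), then take the square root. The data values are made up; `statistics.stdev` is used only as a cross-check.

```python
from math import sqrt
from statistics import stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

x_bar = sum(data) / len(data)                          # the mean
squared_deviations = [(x - x_bar) ** 2 for x in data]  # (x - x-bar)^2 for each value
s = sqrt(sum(squared_deviations) / (len(data) - 1))    # divide by n - 1, then square root

print(s, stdev(data))  # the two results should agree
```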
5 number summary
Min, Q1, median, Q3, max
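A sketch of a five-number summary, assuming the median-of-halves method for quartiles (other software may use a slightly different quartile rule). The function name `five_number_summary` is just for illustration and assumes at least two values.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # lower half, middle value excluded when n is odd
    upper = data[-half:]   # upper half, middle value excluded when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])

summary = five_number_summary([1, 3, 5, 7, 9, 11, 13])
print(summary)
```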
Time plots
What is the trend of the data, increase or decrease?
When adding constant
- center increased by that amount
- the spread does not change
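A small check of the adding-a-constant rule on made-up data: the mean shifts by the constant while the standard deviation and range stay the same.

```python
from statistics import mean, stdev

data = [10, 12, 15, 19]          # hypothetical values
shifted = [x + 5 for x in data]  # add the constant 5 to every value

print(mean(data), mean(shifted))      # center moves up by 5
print(stdev(data), stdev(shifted))    # spread is unchanged
print(max(data) - min(data), max(shifted) - min(shifted))  # range is unchanged
```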
Z-scores formula
z = (x - μ) / σ
Or z = (x - x-bar) / s
(Datum - mean) / standard deviation
Z-scores
How many standard deviations a value is from the mean
If z is less than -2 or greater than 2, the value is unusual
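A minimal sketch of the z-score calculation and the "beyond 2 standard deviations is unusual" check. The function name `z_score` and the numbers are made up for illustration.

```python
def z_score(x, mean, sd):
    """How many standard deviations x is from the mean."""
    return (x - mean) / sd

# Hypothetical test with mean 70 and standard deviation 10
z = z_score(85, 70, 10)
is_unusual = abs(z) > 2   # unusual if z < -2 or z > 2

print(z, is_unusual)
```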
Empirical rule
68%-95%-99.7%
68% fall within 1 SD of the mean
95% fall within 2 SD
99.7% fall within 3 SD
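The 68-95-99.7 percentages can be checked against a normal model. This sketch uses `NormalDist` from Python's standard library; the helper name `proportion_within` is made up.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
```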
Normal model steps
- z-score
- draw normal curve diagram with mean and z-score
- normalcdf and label
- answer to 4 decimal places
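The steps above can be sketched in code. This is only a stand-in for the calculator's normalcdf, written with Python's `NormalDist`; the height example (mean 64, SD 2.5) is made up.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the normal curve between lower and upper."""
    model = NormalDist(mu=mean, sigma=sd)
    return model.cdf(upper) - model.cdf(lower)

# Hypothetical question: heights follow N(64, 2.5). What proportion are below 60?
z = (60 - 64) / 2.5                        # step 1: z-score
p = round(normalcdf(-1e99, 60, 64, 2.5), 4)  # step 3: area, to 4 decimal places

print(z, p)
```

The very large negative lower bound plays the same role as -1E99 on the calculator.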
DUFUS
- direction (positive or negative slope)
- form (linear, curved, quadratic)
- unusual features (outliers, gaps)
- strength (strong, moderate, weak)
Correlation coefficient
Shown with r
y-hat = a + bx
a - the y-intercept
b - the slope
R^2-coefficient of determination
R-correlation coefficient
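A sketch of where a, b, r, and R^2 come from, using the standard least-squares formulas worked by hand with a as the intercept and b as the slope. The paired data are made up.

```python
from math import sqrt

# Hypothetical paired data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx              # slope
a = y_bar - b * x_bar      # y-intercept
r = sxy / sqrt(sxx * syy)  # correlation coefficient
r_squared = r ** 2         # coefficient of determination

print(a, b, r, r_squared)
```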
Y-intercept in context
When x = 0, the predicted y (y-hat) is a
Slope in context
For every one-unit increase in x, y-hat is predicted to change by b (the slope)
R^2 in context
- always start with “according to the model”
- what percent of the variation in y can be explained by the model
R in context
- start with “according to the model”
- for every one standard deviation increase in x, you expect a change of about r standard deviations in y
Residual
Actual - predicted
y - y-hat
Negative residual
The point is below the line
Also an overestimate (the prediction is too high)
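A short sketch of a residual calculation. The model y-hat = 2.2 + 0.6x and the observed point are made up for illustration.

```python
def predict(x):
    """Hypothetical regression model: y-hat = 2.2 + 0.6x."""
    return 2.2 + 0.6 * x

def residual(actual, predicted):
    return actual - predicted

# Observed point (4, 4): the model predicts about 4.6, so the point sits below the line.
res = residual(4, predict(4))
print(res)  # negative, so the model overestimated
```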
Subsets
Breaking the data into manageable parts
Extrapolation
When making a prediction outside the data collected
Influential points
Outlier with leverage that is not near the line of best fit; it changes the slope
Conclusion in context
- start with “according to my simulation”
- I expect an average of “n” before something happens
Undercoverage
Some people aren’t included in the sample (people who could have been included)
Population
Everyone you want to be in your sample
Non-response bias
Ppl just don’t answer/respond
Response bias
When the response is not the real answer
-could be lying, question may be worded to bring in bias
Convenience sample
- Easy to get data
- easily unrepresentative
- bad
Voluntary response
- respond if u want
- only people who feel strongly will respond
- bad
Simple random sample
- Everyone is assigned a number randomly
- use a random number table or random number generator (RNT/RNG) to select people
- everyone has same chance of being selected
- good
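A minimal sketch of a simple random sample using a random number generator. The roster is made up, and the seed is there only so the example repeats the same way each run.

```python
import random

population = [f"student_{i}" for i in range(1, 101)]  # hypothetical roster of 100

rng = random.Random(42)               # seeded only for repeatability
sample = rng.sample(population, 10)   # every student has the same chance; no repeats

print(sample)
```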
Stratified sample
- separate into groups based on some characteristic
- then randomly sample in each group
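A sketch of a stratified sample: split into groups first, then take a random sample inside each group. The strata names and sizes are made up.

```python
import random

# Hypothetical strata: students grouped by class year
strata = {
    "freshmen": [f"F{i}" for i in range(50)],
    "seniors": [f"S{i}" for i in range(30)],
}

rng = random.Random(0)  # seeded only for repeatability

# Randomly sample 5 people within each group.
stratified_sample = {
    name: rng.sample(members, 5) for name, members in strata.items()
}
print(stratified_sample)
```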
Cluster sample
- randomly selecting one whole group
- often done geographically
Systematic sample
Every nth person
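A sketch of a systematic sample. Picking a random starting position is an added assumption here (the card only says "every nth person"), and the population is a made-up list of ID numbers.

```python
import random

population = list(range(1, 101))  # hypothetical ID numbers 1..100
n = 10                            # take every 10th person

rng = random.Random(1)            # seeded only for repeatability
start = rng.randrange(n)          # assumed: random starting position 0..n-1
systematic_sample = population[start::n]

print(systematic_sample)
```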
Single blind experiment
-participants don’t know which group is which
Double blind experiment
-participants & assessor don’t know which group is which
Placebo
Fake treatment
Control group
No treatment, for comparison
Random phenomena
We know the possible outcomes but can’t predict which one will happen
Trial
Ex: one roll of a die
Outcome
What number comes up
Event
An outcome or a combination of outcomes
Sample space
A list of all possible outcomes
Law of large numbers
As the number of trials grows, the observed proportions get closer to the theoretical probabilities
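A simulation sketch of the law of large numbers using die rolls. The helper name `proportion_of_sixes` is made up, and the exact proportions depend on the seed, so no specific outputs are claimed.

```python
import random

rng = random.Random(2024)  # seeded only for repeatability

def proportion_of_sixes(trials):
    """Roll a fair die `trials` times and return the proportion of sixes."""
    sixes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return sixes / trials

# The proportion should settle toward 1/6 (about 0.1667) as trials grow.
for trials in (10, 1000, 100000):
    print(trials, proportion_of_sixes(trials))
```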
Disjoint
Events are disjoint if they cannot happen at the same time
Addition rule
- or
- add probabilities together (when the events are disjoint)
Multiplication rule
- and
- multiply probabilities together (when the events are independent)
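A sketch of the "or" and "and" rules with exact fractions. It assumes a fair die, with disjoint events for adding and independent rolls for multiplying.

```python
from fractions import Fraction

# "Or": on one roll, getting a 1 and getting a 2 are disjoint, so add.
p_one = Fraction(1, 6)
p_two = Fraction(1, 6)
p_one_or_two = p_one + p_two

# "And": two separate rolls are independent, so multiply.
p_six_then_six = Fraction(1, 6) * Fraction(1, 6)

print(p_one_or_two, p_six_then_six)
```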
Complement
Probability of event not happening
P(not A) = 1 - P(A)
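A sketch of the complement at work in an "at least one" question. The four-roll setup is made up and assumes a fair die with independent rolls.

```python
from fractions import Fraction

p_no_six_one_roll = 1 - Fraction(1, 6)         # complement of rolling a six
p_no_six_four_rolls = p_no_six_one_roll ** 4   # no six on any of 4 independent rolls
p_at_least_one_six = 1 - p_no_six_four_rolls   # complement again

print(p_at_least_one_six, float(p_at_least_one_six))
```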
Disjoint
If there is a probability of having both then it can’t be disjoint
P(x | given)
P(both) / P(given)
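A sketch of a conditional probability worked from counts: the probability of both divided by the probability of the given condition. The student counts are made up.

```python
from fractions import Fraction

# Hypothetical counts from a contingency table of 100 students
n_total = 100
n_senior = 40              # the "given" group
n_senior_and_drives = 30   # both conditions true

p_given = Fraction(n_senior, n_total)
p_both = Fraction(n_senior_and_drives, n_total)
p_drives_given_senior = p_both / p_given

print(p_drives_given_senior)
```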
A random variable
Has a variety of numerical outcomes and we cannot predict those outcomes
Expected amount/value
Just the mean: add up each value times its probability
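A sketch of an expected value as the sum of each value times its probability. The game and its payouts are made up.

```python
from fractions import Fraction

# Hypothetical game: win $5 with probability 1/6, otherwise lose $1.
outcomes = {5: Fraction(1, 6), -1: Fraction(5, 6)}

total_probability = sum(outcomes.values())  # should be 1 for a valid model
expected_value = sum(value * p for value, p in outcomes.items())

print(total_probability, expected_value)
```

An expected value of 0 here means the game is fair in the long run.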
Z scores
z = (x - μ) / σ, where x is the value in question, μ is the mean, and σ is the SD
How to find an outlier
Calculate IQR
Multiply: 1.5 x IQR
Q3 + 1.5 x IQR = upper fence
Q1 - 1.5 x IQR = lower fence
Any value beyond a fence is an outlier
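A sketch of the outlier check using the 1.5 x IQR rule, with quartiles found by the median-of-halves method (an assumption; other quartile rules give slightly different fences). The function name `outlier_fences` and the data are made up.

```python
from statistics import median

def outlier_fences(values):
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])    # median of the lower half
    q3 = median(data[-half:])   # median of the upper half
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [1, 3, 5, 7, 9, 11, 40]  # hypothetical values
lower_fence, upper_fence = outlier_fences(data)
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(lower_fence, upper_fence, outliers)
```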
Leverage
On scatter plot a point that is in line with other points but either far right or far left
Influential
Point is far left or right but also not in line with other points
Confounding
Something that may have influenced results that was not an anticipated variable