Chapter 6 - Statistical Data Flashcards
We collect data from ___________ and use them to ____________.
real-world experiences, draw conclusions
We choose a ___________ to study, collect measurements from a small representative subset or ________ of that population, then apply our findings back to the larger population to assess if they ____.
population, sample, fit
What is a simple model?
An average or a mean
What is one way to calculate a simple model?
Calculate an average, then look at the difference between each data point and the average value.
How can we observe how well our data fits the model?
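Take the sum of the squared differences between each data point and the mean (the sum of squared errors). Dividing that sum by n - 1 gives the variance, and the square root of the variance is the Standard Deviation.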
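As a minimal sketch of the mean as a model and of how its fit is measured, the steps can be written out in Python. The sample values are made up for illustration, and the n - 1 denominator assumes we are describing a sample rather than a whole population.

```python
import math

# Hypothetical sample; the values are made up for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)                      # the simple model
deviations = [x - mean for x in data]             # error of the model for each point
sum_of_squares = sum(d ** 2 for d in deviations)  # total squared error
variance = sum_of_squares / (len(data) - 1)       # sample variance (n - 1 denominator)
std_dev = math.sqrt(variance)                     # back in the original units

print(mean, sum_of_squares, round(std_dev, 3))    # 5.0 32.0 2.138
```

The raw deviations always sum to zero, which is why they are squared before being added up.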
How do you quantify model accuracy?
Outcome = Model + Error. The smaller the error term, the more accurately the model represents the data.
What does the shape of the frequency distribution tell us about the deviation around the average? There are two main takeaways:
- Flat Distribution = More deviation
- Skewed Distribution = a few outliers pulling the average up or down
What does the Standard Deviation measure?
The amount of variation among the individuals you sampled.
How does the Standard Deviation measure variation among values?
By comparing each value (measurement) to the mean of all values.
How do you estimate the Standard Error?
- Take a bunch of subsamples
- Calculate the mean of each subsample
- Calculate the standard deviation of the subsample means
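The three steps above can be sketched as a small simulation. The population (mean 50, SD 10), the subsample size of 25, and the number of subsamples are all assumptions chosen for illustration. The last lines show the common single-sample shortcut, SD divided by the square root of n, for comparison.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 values drawn with mean 50 and SD 10 (assumed).
population = [random.gauss(50, 10) for _ in range(10_000)]

sample_size = 25
# Steps 1 and 2: take many subsamples and record the mean of each.
subsample_means = [
    statistics.mean(random.sample(population, sample_size))
    for _ in range(2_000)
]
# Step 3: the SD of those means estimates the standard error of the mean.
standard_error = statistics.stdev(subsample_means)

# Shortcut from a single sample: SD / sqrt(n).
one_sample = random.sample(population, sample_size)
shortcut = statistics.stdev(one_sample) / sample_size ** 0.5

print(round(standard_error, 2), round(shortcut, 2))
```

With these assumed settings the simulated standard error should land near 10 / sqrt(25) = 2, while the shortcut varies more because it relies on one sample only.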
What is a Confidence Interval?
The range, bounded by a lower and an upper limit, within which the true population value (usually the mean) is expected to fall for 95% of samples.
What do we use to find the Confidence Intervals?
A formula that calculates that range from the sample, typically mean ± 1.96 × Standard Error for a 95% interval.
What do Confidence Intervals indicate?
That we can be 95% confident the true population mean falls within that range.
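A minimal sketch of a 95% confidence interval for a mean, assuming a made-up sample and the normal-distribution multiplier of 1.96. For a sample as small as this one, a t-distribution multiplier would be the more careful choice.

```python
import statistics

# Hypothetical sample of measurements (made up for illustration).
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]

n = len(sample)
mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / n ** 0.5

# 1.96 is the normal-distribution multiplier for 95% coverage.
# With n = 10 a t-distribution multiplier (about 2.26) would give a wider interval.
z = 1.96
lower = mean - z * standard_error
upper = mean + z * standard_error

print(f"mean = {mean:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```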
What is the equation of the simplest linear model?
Linear Model: y = mx + b, where m is the slope and b is the intercept.
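One way to find m and b is ordinary least squares, which picks the line with the smallest sum of squared errors. The sketch below assumes made-up data points and computes the slope and intercept by hand.

```python
# Hypothetical data points (made up for illustration).
xs = [1, 2, 3, 4, 5]
ys = [2.2, 4.1, 6.1, 7.9, 10.2]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope: how x and y vary together, divided by how much x varies.
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
# Intercept: forces the line through the point (mean_x, mean_y).
b = mean_y - m * mean_x

print(f"y = {m:.2f}x + {b:.2f}")  # y = 1.98x + 0.16
```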
True or False: Not all relationships can be modelled well using a straight line.
True
What are the stages of constructing a Model?
- State a problem
- Develop a hypothesis about a population
- Make a prediction
- Collect data from a sample
- Fit a model to the data points
- Test how well the model represents the data
What does it mean if a model does not explain variation (relationships) between data points very well?
We have less confidence in that model's ability to predict what's going on in the broader population.
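One common way to put a number on how much variation a model explains is R squared: the share of the total variation around the mean that the fitted line accounts for. The sketch below assumes a straight-line model and made-up data. The function name `r_squared` is hypothetical.

```python
def r_squared(xs, ys):
    """Fit y = m*x + b by least squares and return the share of variation explained."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    b = mean_y - m * mean_x
    ss_total = sum((y - mean_y) ** 2 for y in ys)  # error of the mean-only model
    ss_residual = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))  # error of the line
    return 1 - ss_residual / ss_total


# Made-up examples: one perfectly linear relationship, one with almost no trend.
strong = r_squared([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
weak = r_squared([1, 2, 3, 4, 5], [5, 1, 4, 2, 5])
print(round(strong, 3), round(weak, 3))
```

A value near 1 means the line explains most of the variation. A value near 0 means the line does little better than the mean alone.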
What are two ways to compare means?
- Independent Sample T-Tests
- Analysis of Variance (ANOVA)
What does ANOVA stand for?
Analysis of Variance
Which method of comparing means takes “two groups that are independent of each other”?
Independent Sample T-Test
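As a rough sketch, the t statistic for two independent groups divides the difference in means by its standard error. The version below uses the classic pooled-variance form, which assumes the two groups have similar variances. The data and the function name `pooled_t` are made up for illustration.

```python
import statistics


def pooled_t(a, b):
    """t statistic for two independent samples using a pooled variance."""
    n_a, n_b = len(a), len(b)
    # Weighted average of the two sample variances.
    pooled_var = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / (n_a + n_b - 2)
    # Standard error of the difference between the two means.
    se = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


# Hypothetical measurements from two unrelated groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.2, 4.5, 4.1, 4.8, 4.4]
t = pooled_t(group_a, group_b)
print(round(t, 2))
```

The further t is from zero, the harder it is to explain the difference in means by chance alone.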
Which method of comparing means takes “more than two groups”?
Analysis of Variance
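A minimal sketch of the F statistic behind a one-way ANOVA: it compares the variation between group means with the variation within groups. The function name `one_way_f` and the three groups are hypothetical.

```python
def one_way_f(*groups):
    """F statistic for a one-way ANOVA over two or more groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (v - gm) ** 2 for g, gm in zip(groups, group_means) for v in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up data for three groups.
f = one_way_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(f)
```

A large F means the group means differ by more than the spread inside the groups would suggest.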
When comparing means, how can you determine whether the difference between the means is “real”?
By analyzing the p-value.
What does the p-value indicate in comparing differences in means?
It incorporates three things to make its judgement:
- the magnitude of the difference in means,
- the sample size within each group, and
- the variation of values in each group
The __________ tells you whether the difference between groups is probably just a function of random chance or whether it’s something that likely holds true throughout the population.
p-value
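To make the "random chance" idea concrete, here is a sketch that estimates a p-value with a permutation test. This is a swapped-in technique, not the t-test or ANOVA calculation itself. It shuffles the group labels many times and counts how often chance alone produces a difference in means at least as large as the observed one. The function name, data, and number of shuffles are assumptions.

```python
import random


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in means of a and b."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign values to groups at random
        new_a, new_b = pooled[: len(a)], pooled[len(a):]
        diff = abs(sum(new_a) / len(new_a) - sum(new_b) / len(new_b))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / n_permutations


# Hypothetical groups with clearly different means.
p = permutation_p_value([5.1, 4.9, 5.6, 5.8, 6.0], [4.2, 4.5, 4.1, 4.8, 4.4])
print(p)
```

A small p-value means random relabeling rarely reproduces a gap this large, so the difference probably holds in the population.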
How can you test whether two numeric variables are significantly related to one another?
By using a correlation coefficient or linear regression analysis.
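As an illustration, the Pearson correlation coefficient can be computed by hand. It ranges from -1 (perfect negative relationship) through 0 (no linear relationship) to 1 (perfect positive relationship). The function name `pearson_r` and the data are made up. Judging significance would additionally need a p-value, which this sketch does not compute.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / (ss_x * ss_y) ** 0.5


# Made-up examples: a perfect positive and a perfect negative relationship.
r_up = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_down = pearson_r([1, 2, 3], [3, 2, 1])
print(round(r_up, 3), round(r_down, 3))
```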