Lesson 1 Flashcards
Why do we do statistics?
To systematize the way we account for uncertainty when making data-based decisions.
Consider a one-tailed test whose estimated test statistic t̂ = 1.86 produces a p-value of p = 0.032. What can we say about this p-value?
If the null hypothesis is true, there is a 0.032 probability of observing a test statistic at least as large as t̂ = 1.86.
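To make the "at least as large as t̂, if the null is true" reading concrete, here is a minimal sketch. The card does not give the degrees of freedom, so this assumes a standard normal null distribution (the large-sample approximation). Because the t distribution has heavier tails, the card's t-based p = 0.032 should be slightly larger than the normal value computed here. The simulation draws many test statistics under the null and counts how often they reach t̂. The names `t_hat`, `p_exact`, and `p_sim` are illustrative only.

```python
import random
from statistics import NormalDist

t_hat = 1.86  # observed test statistic from the card

# Upper-tail probability under an ASSUMED standard normal null
# (degrees of freedom are not given, so no t distribution is used).
p_exact = 1 - NormalDist().cdf(t_hat)

# Monte Carlo check: simulate test statistics under the null and count
# the fraction that are at least as large as t_hat.
rng = random.Random(1)
reps = 200_000
hits = sum(rng.gauss(0.0, 1.0) >= t_hat for _ in range(reps))
p_sim = hits / reps

print(f"normal-approximation p-value: {p_exact:.4f}")
print(f"simulated p-value:            {p_sim:.4f}")
```

Both numbers estimate the same quantity: the share of null-world test statistics that are at least as extreme as the one observed.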
What is the difference between Inference and Prediction?
- Inference focuses on understanding how the explanatory variables relate to the outcome.
- Prediction focuses on building a tool that can accurately predict future or unseen values of the outcome.
What best represents a strength of statistical modeling relative to statistical testing?
The ability to control for confounding factors.
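True or False: A p-value is a test statistic.
False. A test statistic is computed from the data; the p-value is a probability computed from the test statistic using its distribution under the null hypothesis.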
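To see the two quantities side by side, here is a minimal one-sample z-test sketch. The sample, the null value `mu0 = 5.0`, and the known population standard deviation `sigma = 0.4` are all hypothetical assumptions chosen for illustration. The test statistic `z` is a number computed from the data; the p-value is then derived from `z` via the standard normal null distribution (one-tailed, upper tail).

```python
import math
from statistics import NormalDist, fmean

# Hypothetical sample; H0: mu = 5.0, sigma ASSUMED known.
sample = [5.1, 4.9, 5.6, 5.3, 5.8, 5.2, 5.4, 5.0]
mu0 = 5.0
sigma = 0.4

# Test statistic: distance of the sample mean from mu0, in standard errors.
z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))

# p-value: probability, under H0, of a statistic at least as large as z.
p = 1 - NormalDist().cdf(z)

print(f"test statistic z = {z:.3f}")
print(f"p-value          = {p:.4f}")
```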
What does the Data Science Cycle look like?
- Define problem
- (Formulate hypotheses)
- Collect data
- Process data
- Clean data
- (EDA)
- (Modeling/Testing)
- (Evaluate)
- (Report findings)
- (Build data product)
What is the difference between exploratory and confirmatory data analysis?
When the data are well understood and we have hypotheses to test -> confirmatory.
When we are not concerned with testing hypotheses -> exploratory.
EDA (exploratory data analysis) can be used to generate hypotheses for CDA (confirmatory data analysis) and to sanity-check existing hypotheses.
Why do we need statistical reasoning?
To quantify the uncertainty of our conclusions
What defines a sampling distribution?
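A probability distribution describing all of the possible values a statistic (for example, the sample mean) can take, and how likely each value is, across repeated random samples of the same size from the population.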
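A sampling distribution can be approximated by brute force: draw many samples, compute the statistic on each, and look at the spread of the results. This sketch assumes a hypothetical normal population with mean 10 and standard deviation 2, a sample size of 25, and 10,000 repetitions; `sampling_distribution_of_mean` is an illustrative helper, not a library function. For the sample mean, theory says the simulated values should center on the population mean with standard deviation σ/√n = 2/5 = 0.4.

```python
import random
import statistics

def sampling_distribution_of_mean(mu, sigma, n, reps, seed=0):
    """Simulate `reps` samples of size n from Normal(mu, sigma);
    return the list of sample means (hypothetical helper)."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

means = sampling_distribution_of_mean(mu=10.0, sigma=2.0, n=25, reps=10_000)

print(f"mean of sample means: {statistics.fmean(means):.3f}")
print(f"std of sample means:  {statistics.stdev(means):.3f} "
      f"(theory: {2.0 / 25 ** 0.5:.3f})")
```

The list `means` is an empirical approximation of the sampling distribution; a histogram of it would show its shape.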