Exam info Flashcards
NUMERICAL VARIABLES
Numerical:
• Continuous (entities get a distinct score), e.g. temperature, body length.
• Discrete (counts), e.g. number of defects.
CATEGORICAL VARIABLES
Categorical (entities are divided into distinct categories):
• Binary variable (two outcomes), e.g. dead or alive.
• Ordinal variable, e.g. bad, intermediate, good.
• Nominal variable (order not important), e.g. whether someone is an omnivore, vegetarian or vegan.
HYPOTHESIS TESTING
- State the null hypothesis H0 and the alternative hypothesis Ha
- Collect evidence (data)
- Can H0 be maintained, given the evidence?
if p-value <= 0.05 – Reject H0
if p-value > 0.05 – Do not reject H0
(0.05 is the significance level α used here)
- Conclusion: At the α significance level (e.g. 5%), there is (not) sufficient statistical evidence to infer …
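The decision rule above can be sketched in Python. This is a minimal illustration, not a prescribed method: the function name `decide` and the default `alpha=0.05` are assumptions chosen to match the threshold used on this card.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the card's rule: reject H0 when the p-value is at most alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    return "reject H0" if p_value <= alpha else "do not reject H0"


if __name__ == "__main__":
    # Hypothetical p-values, used only to show both branches of the rule.
    print(decide(0.03))
    print(decide(0.20))
```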
Types of Hypothesis errors
- Type I error (α): Reject H0 when H0 is true – Jury convicts an innocent person.
- Type II error (β): Do not reject H0 when H0 is false – Jury acquits a guilty person.
- Correct decision: Reject H0 when H0 is false – Jury convicts a guilty person.
- Correct decision: Do not reject H0 when H0 is true – Jury acquits an innocent person.
Confidence interval
A confidence interval consists of an interval of numbers produced by a point estimate, together with an associated confidence level specifying the probability that the interval contains the population parameter (that is, the long-run share of intervals built this way that contain it).
• Confidence intervals have the general form:
Point Estimate +/- Margin of Error
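As a sketch of the general form, the following computes a confidence interval for a mean. It assumes a large sample so that a normal (z) critical value is acceptable; the function name `mean_confidence_interval` and the numbers in the usage lines are hypothetical and do not come from these cards.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Return (lower, upper) as point estimate +/- margin of error.

    Assumption: a z critical value is used (large-sample approximation).
    """
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin_of_error = z * sample_sd / sqrt(n)
    return sample_mean - margin_of_error, sample_mean + margin_of_error


if __name__ == "__main__":
    # Hypothetical sample summary: mean 10.0, standard deviation 2.0, n = 100.
    lower, upper = mean_confidence_interval(10.0, 2.0, 100)
    print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

For small samples, a t critical value would normally replace z.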
Statistical Inference
Methods for estimating/predicting and testing hypotheses about population
characteristics based on information contained in a sample.
Population
A population is the collection of all elements of interest for a particular study.
Parameter
A parameter is a characteristic of a population (e.g., the mean number of customer service calls of all customers).
Sample
A sample is a representative subset of the population.
Statistic
A statistic is a characteristic of a sample (e.g., the mean number of customer service calls of the 5000 customers in the sample, 1.563).
Sample Proportion
The sample proportion p is the statistic used to estimate the unknown value of the population proportion π.
Point estimation
Use of a single known value of a statistic to estimate the associated population
parameter.
p-value
The probability of observing a sample statistic at least as extreme as the statistic actually observed, if we assume that H0 is true.
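To make the definition concrete, here is a small sketch that turns a test statistic into a p-value. It assumes the statistic follows a standard normal distribution under H0 and that the test is two-sided; the function name `two_sided_p_value` is hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """Probability of a statistic at least as extreme as z, in either tail,
    assuming a standard normal distribution under H0."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * upper_tail


if __name__ == "__main__":
    # z = 1.96 gives a p-value close to 0.05.
    print(round(two_sided_p_value(1.96), 4))
```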
1-sample t-test
H0: μ = μ0
Can be used for a numerical variable
The test statistic is t, which follows a t distribution
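The t statistic is usually computed as t = (sample mean - μ0) / (s / sqrt(n)) with n - 1 degrees of freedom. The sketch below covers only that step, using the standard library; the function name `one_sample_t` and the data are hypothetical. The p-value needs the t distribution, which the standard library does not provide (a statistics package such as SciPy, via `scipy.stats.ttest_1samp`, is one option).

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(data, mu0):
    """Return (t, degrees_of_freedom) for H0: mu = mu0."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    # stdev() is the sample standard deviation (divides by n - 1).
    standard_error = stdev(data) / sqrt(n)
    t = (mean(data) - mu0) / standard_error
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical measurements, tested against mu0 = 4.
    t, df = one_sample_t([2, 4, 4, 4, 5, 5, 7, 9], mu0=4)
    print(f"t = {t:.4f}, df = {df}")
```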
Test for Proportion
H0: π = π0
Can be used for a categorical variable
The test statistic is Z, which follows the standard normal distribution
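A minimal sketch of this test, assuming the usual large-sample form Z = (p - π0) / sqrt(π0(1 - π0) / n) and a two-sided alternative. The function name `proportion_z_test` and the counts in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes: int, n: int, pi0: float):
    """Return (z, two_sided_p_value) for H0: pi = pi0."""
    if not 0.0 < pi0 < 1.0:
        raise ValueError("pi0 must lie strictly between 0 and 1")
    p = successes / n  # sample proportion
    # The standard error uses pi0 because the test assumes H0 is true.
    standard_error = sqrt(pi0 * (1 - pi0) / n)
    z = (p - pi0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical data: 60 successes in 100 trials, tested against pi0 = 0.5.
    z, p_value = proportion_z_test(60, 100, 0.5)
    print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```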