Inferential Statistics Flashcards
Normal distribution:
- Mean = median = mode
- The mean shifts the graph to the left or right
- Lower standard deviation: taller peak, thinner tails
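The three properties above can be checked numerically with Python's built-in `statistics.NormalDist`. This is a minimal sketch; the means and standard deviations (100, 5, 15, 120) are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

# Same mean (100), different standard deviations (hypothetical values)
narrow = NormalDist(mu=100, sigma=5)
wide = NormalDist(mu=100, sigma=15)

# Mean = median = mode: the 50th percentile is the mean,
# and the density is highest at the mean
print(narrow.inv_cdf(0.5))                                      # median
print(narrow.pdf(100) > max(narrow.pdf(99), narrow.pdf(101)))   # peak at the mean

# Lower standard deviation: taller peak, thinner tails
print(narrow.pdf(100), wide.pdf(100))                           # peak heights
print(1 - narrow.cdf(130), 1 - wide.cdf(130))                   # right-tail area beyond 130

# Changing the mean only moves the curve; its shape stays the same
shifted = NormalDist(mu=120, sigma=5)
print(shifted.pdf(120) == narrow.pdf(100))                      # same peak height, new centre
```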
Standard normal distribution:
Also known as the Z-distribution
Mean = 0
Standard deviation = 1
To standardize any normal distribution:
Z = (x - mean) / standard deviation
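A small sketch of standardization, assuming a hypothetical exam-score distribution with mean 70 and standard deviation 10. It shows that a probability read from the standard normal after standardizing matches the probability from the original distribution; `z_score` is an illustrative helper name.

```python
from statistics import NormalDist

def z_score(x: float, mean: float, sd: float) -> float:
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mean) / sd

# Hypothetical example: scores ~ N(mean=70, sd=10); a score of 85
z = z_score(85, mean=70, sd=10)
print(z)  # 1.5

# The probability can now be read from the standard normal N(0, 1)
standard_normal = NormalDist(0, 1)
print(standard_normal.cdf(z))        # P(score <= 85) via the Z-distribution
print(NormalDist(70, 10).cdf(85))    # same probability, without standardizing
```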
Central limit theorem:
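No matter the underlying distribution (as long as it has a finite variance), the sampling distribution of the sample mean approaches a normal distribution as the sample size grows. A common rule of thumb is that the CLT applies with at least 30 observations.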
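The CLT can be illustrated with a quick simulation. This sketch assumes a strongly skewed population (exponential with mean 1 and standard deviation 1), draws 5000 samples of size 30, and checks that the sample means behave like a normal distribution centred on 1 with spread about 1/sqrt(30). The seed and sizes are arbitrary choices.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

def sample_mean(n: int) -> float:
    """Mean of n draws from a skewed exponential population (mean 1, sd 1)."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

n = 30
means = [sample_mean(n) for _ in range(5000)]

# The sample means cluster around the population mean (1.0)
# with spread close to sigma / sqrt(n) = 1 / sqrt(30) ≈ 0.183
print(statistics.fmean(means))
print(statistics.stdev(means))

# If the sampling distribution is roughly normal, about 95% of the
# sample means fall within ±1.96 standard errors of the population mean
se = 1 / n ** 0.5
share = sum(abs(m - 1.0) <= 1.96 * se for m in means) / len(means)
print(share)
```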
Standard error:
The standard deviation of a sampling distribution (e.g. of the sample mean). It measures how accurately a sample statistic represents the population; for the mean, SE = standard deviation / sqrt(n). (Used because there is uncertainty associated with the sample.)
The standard error of a proportion has an exact value that depends on the population proportion p: SE = sqrt(p(1 - p) / n). In practice we do not know p, so instead we estimate the standard error from the sample proportion p^: SE ≈ sqrt(p^(1 - p^) / n).
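Both standard errors can be estimated from a sample as sketched below. The survey counts and measurements are hypothetical, and `se_mean` / `se_proportion` are illustrative helper names. Note that the proportion version plugs in p^ because the true p is unknown.

```python
import math
import statistics

def se_mean(sample: list[float]) -> float:
    """Estimated standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def se_proportion(p_hat: float, n: int) -> float:
    """Estimated SE of a proportion, using p^ in place of the unknown p."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical survey: 120 of 400 respondents answer "yes"
p_hat = 120 / 400
print(se_proportion(p_hat, 400))  # sqrt(0.3 * 0.7 / 400)

# Hypothetical measurements
data = [12.1, 11.8, 12.6, 12.0, 11.5, 12.3, 12.9, 11.7]
print(se_mean(data))

# Quadrupling the sample size halves the standard error
print(se_proportion(p_hat, 1600) / se_proportion(p_hat, 400))
```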
Alpha and confidence level:
Confidence level = 1 - alpha
For a confidence level of 95%, alpha is 5%. The 5% is split equally between the two tails: in repeated sampling, there is a 2.5% (= alpha/2) chance that the true mean lies below the interval's lower bound and a 2.5% chance that it lies above the upper bound.
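The split of alpha into two tails is exactly how the two-sided critical z-value is found: take the standard normal quantile at 1 - alpha/2. A minimal sketch using `statistics.NormalDist`; `z_critical` is an illustrative name.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided critical z-value: alpha is split equally between the two tails."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))  # approx. 1.645, 1.96, 2.576

# Each tail beyond ±z holds alpha/2 of the area
z = z_critical(0.95)
print(NormalDist().cdf(-z))  # lower-tail area ≈ 0.025
```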
T-statistic:
- Works for smaller sample sizes and when the population’s standard deviation is unknown (so we use the sample standard deviation instead, which adds uncertainty; hence the fatter tails).
- Follows a Student’s t-distribution: it looks like a normal distribution but has fatter tails.
- Has degrees of freedom. Usually we use n-1 degrees of freedom. Ex.: for 20 observations, we use 19 degrees of freedom. (However, with two variables, e.g. in simple linear regression, we use n-2, and so on.)
- The degrees of freedom and alpha determine the t-score.
After 30 degrees of freedom, the t-table is almost identical to the Z-table.
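The convergence of t to z can be seen by comparing two-sided 95% critical values. Python's standard library has no t-distribution, so this sketch hard-codes values copied from a standard t-table (if SciPy is installed, `scipy.stats.t.ppf(0.975, df)` computes them instead).

```python
from statistics import NormalDist

# Two-sided 95% critical values t_(0.975, df), copied from a standard t-table
T_975 = {1: 12.706, 2: 4.303, 5: 2.571, 10: 2.228, 19: 2.093, 30: 2.042, 60: 2.000, 120: 1.980}

z_975 = NormalDist().inv_cdf(0.975)  # ≈ 1.960

for df, t in T_975.items():
    # Fatter tails give a larger critical value; the gap to z shrinks as df grows
    print(f"df={df:>3}  t={t:.3f}  z={z_975:.3f}  diff={t - z_975:.3f}")
```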
Margin of error:
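- The margin of error is the part of the CI that is added to and subtracted from the estimate: the confidence interval for the mean is the sample mean ± the margin of error, where the margin of error = Z (or t) times the standard error. Therefore, a smaller margin of error gives a narrower CI.
To get a smaller margin of error: reduce the Z- or t-score (i.e. accept a lower confidence level), reduce the standard deviation, or increase the sample size.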
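The levers listed above can be tried out in a short sketch of a z-based interval, assuming a hypothetical sample mean of 50 and a known population standard deviation of 8. `margin_of_error` is an illustrative helper, not a library function.

```python
import math
from statistics import NormalDist

def margin_of_error(sd: float, n: int, confidence: float = 0.95) -> float:
    """z * sd / sqrt(n): half-width of a z-based CI for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd / math.sqrt(n)

# Hypothetical: sample mean 50, known population sd 8
mean, sd = 50.0, 8.0
for n in (25, 100, 400):
    moe = margin_of_error(sd, n)
    # Quadrupling n halves the margin of error
    print(n, round(moe, 2), (round(mean - moe, 2), round(mean + moe, 2)))

# Lower confidence level -> smaller z -> smaller margin of error
print(margin_of_error(sd, 100, 0.90) < margin_of_error(sd, 100, 0.99))
```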
When do we accept or reject the null hypothesis based on the Z-score?
If the calculated Z-score is between -1.96 and 1.96, we do not reject ("accept") the null hypothesis (for a significance level of 0.05 and a two-sided test). If the Z-score falls in the rejection region (below -1.96 or above 1.96), we reject it.
For a one-sided, left-tailed test at the same level, the Z-score must be lower than -1.645 before we can reject the null hypothesis. For example, if the null hypothesis is mean ≥ 300 (alternative: mean < 300), the rejection region lies in the left-hand tail of the Z-distribution.
When testing a hypothesis, you compare the Z-score (calculated value from sample) with the critical value z.
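The critical-value comparison can be sketched as a small function. The `tail` parameter and the function name `reject_h0` are illustrative choices; the critical values come from the standard normal, giving ±1.96 (two-sided) and -1.645 (left-tailed) at alpha = 0.05.

```python
from statistics import NormalDist

def reject_h0(z: float, alpha: float = 0.05, tail: str = "two") -> bool:
    """Compare a calculated z-score with the critical value(s).

    tail="two":   reject if |z| > z_(1 - alpha/2)   (±1.96 at alpha = 0.05)
    tail="left":  reject if z < z_(alpha)           (-1.645 at alpha = 0.05)
    tail="right": reject if z > z_(1 - alpha)       (1.645 at alpha = 0.05)
    """
    std = NormalDist()
    if tail == "two":
        return abs(z) > std.inv_cdf(1 - alpha / 2)
    if tail == "left":
        return z < std.inv_cdf(alpha)
    if tail == "right":
        return z > std.inv_cdf(1 - alpha)
    raise ValueError("tail must be 'two', 'left' or 'right'")

print(reject_h0(1.80))                # False: inside ±1.96
print(reject_h0(-1.80, tail="left"))  # True: below -1.645
```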
P-value:
The smallest significance level at which we can still reject the null hypothesis.
Reject the null hypothesis if the p-value < alpha.
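For a z-statistic, the p-value is the standard normal tail area beyond the observed value (doubled for a two-sided test). A minimal sketch with a hypothetical calculated z-score of 2.1; `p_value` is an illustrative name.

```python
from statistics import NormalDist

def p_value(z: float, tail: str = "two") -> float:
    """P-value of a z-statistic under H0 (standard normal)."""
    std = NormalDist()
    if tail == "two":
        return 2 * (1 - std.cdf(abs(z)))
    if tail == "left":
        return std.cdf(z)
    if tail == "right":
        return 1 - std.cdf(z)
    raise ValueError("tail must be 'two', 'left' or 'right'")

alpha = 0.05
z = 2.1  # hypothetical calculated z-score
p = p_value(z)
print(round(p, 4), p < alpha)  # reject H0 because p < alpha

# At z = ±1.96 the two-sided p-value is about 0.05, the boundary of rejection
print(round(p_value(1.96), 3))
```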
Difference between the Z- and t-statistic formulas
The only difference between the CI formulas for the Z-statistic and the t-statistic is that the t-version uses the t-score and the sample standard deviation instead of the Z-score and the population standard deviation.
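A t-based confidence interval for the mean can be sketched as follows, assuming a hypothetical sample of 20 observations. The critical value 2.093 is t_(0.975, 19) taken from a t-table (df = n - 1 = 19), since the standard library has no t-distribution.

```python
import math
import statistics

# Hypothetical sample of n = 20 observations
sample = [4.8, 5.1, 5.3, 4.9, 5.0, 5.6, 4.7, 5.2, 5.4, 4.6,
          5.1, 5.0, 4.9, 5.3, 5.5, 4.8, 5.2, 5.0, 4.9, 5.1]
n = len(sample)
mean = statistics.fmean(sample)
s = statistics.stdev(sample)  # sample sd (n - 1 in the denominator)

t_crit = 2.093                # t_(0.975, 19) from a t-table
moe = t_crit * s / math.sqrt(n)
lower, upper = mean - moe, mean + moe
print(round(lower, 3), round(upper, 3))

# The Z-version would instead use 1.96 and the population sd sigma
```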
Which tests exist and what are they used for?
All explanatory variables are categorical.
Z-test: two proportions; response variable: categorical
T-test: two means; response variable: quantitative
Chi^2-test: two or more proportions; response variable: categorical
ANOVA-test: two or more means; response variable: quantitative
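As an example of the first test in the list, here is a sketch of a two-sided Z-test for two proportions using the pooled proportion under H0: p1 = p2. The group counts are hypothetical and `two_proportion_z_test` is an illustrative helper name.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided Z-test of H0: p1 = p2, using the pooled proportion for the SE."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # best estimate of the common p if H0 is true
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical: 45 of 200 in group A vs 30 of 200 in group B answer "yes"
z, p = two_proportion_z_test(45, 200, 30, 200)
print(round(z, 3), round(p, 4))  # p is just above 0.05, so H0 is not rejected
```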
What are type 1 and type 2 errors?
Type 1 error: a true null hypothesis is rejected
α equals the risk of committing a type 1 error
P(type 1 error) = α
Type 2 error: a false null hypothesis is accepted (not rejected)
The risk of committing a type 2 error is called beta (β)
P(type 2 error) = β
Reducing the risk of one type of error increases the risk of the other (for a fixed sample size). The only way to lower both risks at once is to include more information, i.e. a larger sample.
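The trade-off between α and β can be computed exactly for a right-tailed z-test with known σ: if the true mean is μ1, the z-statistic is normal with mean (μ1 - μ0)/(σ/sqrt(n)), so β = Φ(z_crit - shift). This sketch assumes a hypothetical scenario (μ0 = 300, true mean 310, σ = 20); `beta` is an illustrative helper name.

```python
import math
from statistics import NormalDist

def beta(alpha: float, n: int, mu0: float, mu1: float, sigma: float) -> float:
    """P(type 2 error) for a right-tailed z-test of H0: mu = mu0 when the true mean is mu1."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha)
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))  # true mean in standard-error units
    return std.cdf(z_crit - shift)

# Hypothetical scenario: mu0 = 300, true mean 310, sigma = 20
for a in (0.10, 0.05, 0.01):
    print("alpha", a, "beta", round(beta(a, 25, 300, 310, 20), 3))    # smaller alpha -> larger beta
for n in (25, 50, 100):
    print("n", n, "beta", round(beta(0.05, n, 300, 310, 20), 3))      # larger n -> smaller beta
```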
5 steps in a hypothesis test
1. Assumptions
2. Hypotheses: null hypothesis (H0) and alternative hypothesis (Ha)
3. Test statistic
   - t-statistic (t-score)
   - z-statistic (z-score)
4. P-value
   - The probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed
5. Conclusion
   - The hypotheses and the p-value are summarized into a conclusion that can be communicated to non-statisticians
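The five steps can be followed in order in a short one-sample z-test. This sketch assumes a known population standard deviation (σ = 20) and uses hypothetical summary statistics (sample mean 306, n = 36); with σ unknown, a t-statistic would replace z.

```python
import math
from statistics import NormalDist

# Step 1, assumptions: random sample, n = 36 (large enough for the CLT),
# population sd known. All numbers are hypothetical.
sigma, n, x_bar = 20.0, 36, 306.0

# Step 2, hypotheses: H0: mu = 300 vs Ha: mu != 300 (two-sided), alpha = 0.05
mu0, alpha = 300.0, 0.05

# Step 3, test statistic
z = (x_bar - mu0) / (sigma / math.sqrt(n))

# Step 4, p-value (two-sided)
p = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 5, conclusion in plain language
if p < alpha:
    print(f"z = {z:.2f}, p = {p:.3f}: the data suggest the mean differs from {mu0:g}.")
else:
    print(f"z = {z:.2f}, p = {p:.3f}: no significant evidence that the mean differs from {mu0:g}.")
```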
Power of the hypothesis test
Power = 1 - P(type 2 error) = 1 - β = the probability of rejecting the null hypothesis when it is false
The higher the power, the better the test
In practice, the ideal is a hypothesis test with high power and a relatively small α (significance level)
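Power can also be estimated by simulation: generate many samples from a world where the null hypothesis is false and count how often the test rejects it. This sketch assumes a hypothetical right-tailed z-test of H0: μ = 300 with true mean 310, σ = 20 and n = 25; the seed and number of trials are arbitrary.

```python
import random
import statistics
from statistics import NormalDist

random.seed(1)  # reproducible simulation

# Hypothetical setup: H0 is false because the true mean is 310, not 300
mu0, mu_true, sigma, n, alpha = 300.0, 310.0, 20.0, 25, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)
se = sigma / n ** 0.5

trials = 2000
rejections = 0
for _ in range(trials):
    sample = [random.gauss(mu_true, sigma) for _ in range(n)]
    z = (statistics.fmean(sample) - mu0) / se
    if z > z_crit:
        rejections += 1

power = rejections / trials  # share of tests that correctly reject the false H0
print(power)                 # should be near the exact value Φ(2.5 - 1.645) ≈ 0.80
```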