02_Basic statistic characteristics Flashcards
What are descriptive statistics, inductive statistics, and hypothesis testing?
Descriptive: distinguishing the different scale levels and understanding their respective analysis constraints + calculating different measures
Inductive: concept of sampling error, fundamental characteristics of theoretical distributions, estimating and testing
Hypothesis testing: univariate and bivariate parametric and non-parametric statistical tests
What are the two things needed to analyze a relationship?
Operationalization: the development of scales for measuring the characteristic values of a particular concept/variable
Scale of measurement: defines the mathematical characteristics of a scale and thereby of the data to be gathered
What are the 4 measurement scales?
- Nominal scale: assignment of objects to categories (e.g., sex: male, female)
- Ordinal scale: ranking (counting and ordering)
- Interval scale: constant units –>inferences about distances (no natural zero point)
- Ratio scale: constant units and a fixed (natural) zero point –>ratios and multiplication are meaningful
What is the likert scale?
Likert scale: ordinal scale with mostly 5 to 7 scale points
(from Fully disagree to fully agree)
What is a quasi-metric ordinal scale?
Ordinal scale with the assumption of equal distances between scale points –>treated just like an interval scale (5 to 7 scale points), so that measures such as mean and variance are meaningful
What is a percentile?
Percentiles are generalizations of the median: observations are arranged according to their size, and a percentile divides them into two groups.
–>The pth percentile: the value such that p percent of the observations fall at or below it
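A minimal sketch of this definition, assuming the nearest-rank convention (other conventions interpolate between neighbouring observations, so results can differ slightly). The function name and the sample data are illustrative and not part of the original cards.

```python
import math

def percentile_nearest_rank(values, p):
    """Smallest observation such that at least p percent of the data fall at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)                    # arrange observations by size
    rank = math.ceil(p * len(ordered) / 100)    # 1-based rank of the percentile
    return ordered[rank - 1]

scores = [7, 1, 3, 9, 5, 2, 8, 4, 6, 10]        # hypothetical data
p50 = percentile_nearest_rank(scores, 50)       # 5 (5 of 10 values are <= 5)
p90 = percentile_nearest_rank(scores, 90)       # 9 (9 of 10 values are <= 9)
```

Under nearest rank the 50th percentile is always an observed value, which is why it can differ from the conventional median of an even-sized data set (5.5 here).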
What is the mode?
= the value that occurs most frequently in a data set
Interpretation of standard deviation
Standard deviation measures the amount of variation
Low value: data points lie close to the mean, so the mean is informative
High value: data points lie further away from the mean, so the mean is less informative
What is the intuition behind the coefficient of variation?
Coefficient of variation (CV): a measure that expresses the variability of a set of data points relative to their mean (standard deviation divided by the mean)
–>independent of the scale of the data, thus makes comparison between two variables on different scales possible
When comparing CVs, a smaller value implies greater consistency relative to the mean, while a larger value implies greater variability
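To illustrate why the CV allows comparisons across scales, here is a small sketch. It assumes the sample standard deviation (n-1 denominator); the height and weight values are made up for the example.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / abs(mean)

heights_cm = [160, 170, 180, 190, 200]   # hypothetical data
weights_kg = [50, 60, 70, 80, 90]        # hypothetical data

cv_height = coefficient_of_variation(heights_cm)   # about 0.088
cv_weight = coefficient_of_variation(weights_kg)   # about 0.226

# Changing the unit (cm -> m) rescales mean and standard deviation alike,
# so the CV stays the same.
cv_height_m = coefficient_of_variation([h / 100 for h in heights_cm])
```

Both variables have the same standard deviation here, yet the weights vary more relative to their mean, which only the CV reveals.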
What is the meaning of Skewness?
Measure of the asymmetry of a distribution
–>Skewness = 0: symmetric
–>Skewness < 0: left-skewed
–>Skewness > 0: right-skewed
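The sign rule can be checked with a short sketch. It assumes the moment-based skewness coefficient (third central moment divided by the 1.5th power of the second central moment); statistics packages often apply additional small-sample corrections. The data sets are invented for illustration.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]        # hypothetical data
right_skewed = [1, 1, 2, 2, 10]    # long tail to the right
left_skewed = [1, 9, 9, 10, 10]    # long tail to the left

results = {
    "symmetric": skewness(symmetric),
    "right": skewness(right_skewed),
    "left": skewness(left_skewed),
}
```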
What is the sampling error?
(Why do we have one?)
By taking a sample from a population, we have uncertainty because many different samples are possible
Sampling error: the deviation of a sample statistic from the true population value; it is quantified by the standard error, i.e. the standard deviation of a statistic (e.g. the mean of variable x) across repeated samples of size n
Standard error = 0.17 –>If several samples were drawn, the standard deviation of their means of variable x would be 0.17
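The "several samples" reading can be made concrete with a small simulation. This is only a sketch: the population (normal, mean 2.0, standard deviation 1.7) and the sample size n = 100 are hypothetical values chosen so that the theoretical standard error equals 0.17.

```python
import math
import random
import statistics

def standard_error_of_mean(sample):
    """Estimated standard error of the mean from one sample: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

rng = random.Random(0)                  # fixed seed so the run is reproducible
POP_MEAN, POP_SD, N = 2.0, 1.7, 100     # hypothetical population and sample size

theoretical_se = POP_SD / math.sqrt(N)  # 1.7 / 10

# Draw many samples of size N and record each sample mean.
sample_means = [
    statistics.fmean([rng.gauss(POP_MEAN, POP_SD) for _ in range(N)])
    for _ in range(2000)
]

# The standard deviation of those means approximates the standard error.
simulated_se = statistics.stdev(sample_means)
```

In practice only one sample is available, so the standard error is estimated from that sample alone, as `standard_error_of_mean` does.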
Confidence interval interpretation
CI (95%) = (1.6, 2.26)
Confidence interval: in repeated sampling, a share of (1-alpha) of the intervals constructed this way contain the true parameter
Ex: in 95% of repeated samples, the interval computed from the sample covers the true average CS; for this sample the interval is 1.6 to 2.26
What is the margin of error?
Margin of error = 1/2 of the width of the confidence interval, e.g. (2.26 - 1.6) / 2 = 0.33 for CI (95%) = (1.6, 2.26)
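A sketch of how such an interval and its margin of error could be computed. It assumes a normal approximation, takes the point estimate 1.93 as the midpoint of the interval on the card, and reuses the standard error of 0.17 from the sampling error card; that these numbers belong together is an assumption, although they do reproduce the interval after rounding.

```python
from statistics import NormalDist

def confidence_interval(estimate, standard_error, confidence=0.95):
    """Return (lower, upper, margin) using the normal approximation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    margin = z * standard_error                          # half the CI width
    return estimate - margin, estimate + margin, margin

# Assumed inputs: midpoint of (1.6, 2.26) and a standard error of 0.17.
lower, upper, margin = confidence_interval(estimate=1.93, standard_error=0.17)
```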
What does the confidence interval depend on?
1. Confidence level (1-alpha): larger –>wider CI (equivalently, a larger significance level alpha –>narrower CI)
2. Sample size:
- Larger –>lower standard error –>narrower CI
- Smaller –>higher standard error –>wider CI
3. Standard deviation:
- Larger –>higher standard error –>wider CI
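The three influences can be checked numerically. This sketch assumes a normal-approximation interval for a mean, so the half-width is z * sd / sqrt(n); all input values are hypothetical.

```python
import math
from statistics import NormalDist

def ci_half_width(sd, n, confidence=0.95):
    """Half-width of a normal-approximation CI for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd / math.sqrt(n)

baseline = ci_half_width(sd=1.7, n=100)                           # reference case
higher_confidence = ci_half_width(sd=1.7, n=100, confidence=0.99) # wider
larger_sample = ci_half_width(sd=1.7, n=400)                      # narrower
larger_sd = ci_half_width(sd=3.4, n=100)                          # wider
```

Quadrupling the sample size halves the interval width, because the standard error shrinks with the square root of n.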
What is the intuition behind the H0 and H1 hypothesis?
H0: the observed result is completely explained by sampling error (chance)
H1: even after accounting for sampling error, the observed result reflects a real effect (it is statistically significant)
What are the two error types?
Alpha error (type 1): rejecting H0 even though H0 is true
Beta error (type 2): not rejecting H0 even though H0 is not true –>related to statistical power (1-beta)
What is the statistical power?
The statistical power (1-beta) of a significance test is the long-run probability of rejecting H0, given the population effect size, the significance criterion and the sample size.
–>Statistical power: the probability that a statistical test will correctly reject a false null hypothesis, or in other words, the probability of detecting a true effect
What are the three drivers of statistical power?
Drivers of statistical power:
- Effect size expressed in the alternative hypothesis: stronger effects are easier to detect
- Chosen significance level: a decrease in the alpha error decreases statistical power (1-beta)
- Sample size: a larger n increases the power of the test (1-beta)
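The three drivers can be seen in a simple closed-form case. This sketch assumes a one-sided, one-sample z-test with a standardized effect size d, for which power = Phi(d * sqrt(n) - z_(1-alpha)); other tests have different formulas, and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def power_one_sided_z(effect_size, n, alpha=0.05):
    """Power of a one-sided one-sample z-test for a standardized effect size."""
    z_crit = NormalDist().inv_cdf(1 - alpha)             # rejection threshold
    return NormalDist().cdf(effect_size * math.sqrt(n) - z_crit)

baseline = power_one_sided_z(effect_size=0.3, n=50)              # reference case
stronger_effect = power_one_sided_z(effect_size=0.5, n=50)       # higher power
stricter_alpha = power_one_sided_z(effect_size=0.3, n=50, alpha=0.01)  # lower power
larger_sample = power_one_sided_z(effect_size=0.3, n=200)        # higher power

# With no true effect, the rejection probability equals alpha (the type 1 error rate).
no_effect = power_one_sided_z(effect_size=0.0, n=50)
```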
Possible outcomes: Test and interpretation –>No rejection of H0 (related to statistical power)
No rejection of H0
High statistical power:
- Evidence for H0
- Refutation of the substantive hypothesis under test (H1)
–>no Type I error (false positive) is possible because H0 is not rejected, and with high power a Type II error (missed effect) is unlikely
Low statistical power:
- Inconclusive status, neither support for H1 nor for H0
- Danger of seemingly contradictory research findings –>type 2 error
Possible outcomes: Test and Interpretation: rejection of H0 (high/low statistical power)
Rejection of H0:
High statistical power:
- Danger that very small effects will be statistically significant
- Practical relevance of the findings needs to be established
Low statistical power:
- support for H1
What does the MAD measure?
The MAD (Mean Absolute Deviation) = average of the absolute deviations from a measure of central tendency (mean or median)
The MAD from the mean is never smaller than the MAD from the median
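A short sketch of both variants on made-up data with one outlier. The inequality holds because the median is the value that minimizes the average absolute deviation.

```python
import statistics

def mean_absolute_deviation(values, center):
    """Average absolute deviation of the values from a given center."""
    return sum(abs(x - center) for x in values) / len(values)

data = [1, 2, 3, 4, 100]   # hypothetical data with an outlier

mad_from_mean = mean_absolute_deviation(data, statistics.mean(data))      # mean = 22
mad_from_median = mean_absolute_deviation(data, statistics.median(data))  # median = 3
```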
What are the 3 assumptions of the t-test for two population means?
t-test for two population means
Assumptions:
- Independent samples
- The variable is normally distributed in both populations
- The variances of the observed variable are equal in both populations
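Under these assumptions the test statistic uses a pooled variance. The following is a sketch of that calculation only; the group data are invented, and the p-value (which needs the t distribution, not available in the Python standard library) is left out.

```python
import math
import statistics

def pooled_t_statistic(sample_a, sample_b):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(sample_a), len(sample_b)
    var1 = statistics.variance(sample_a)   # sample variances (n-1 denominator)
    var2 = statistics.variance(sample_b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / standard_error
    return t, df

group_a = [5, 6, 7, 8, 9]   # hypothetical data
group_b = [3, 4, 5, 6, 7]   # hypothetical data

t_value, df = pooled_t_statistic(group_a, group_b)
```

The resulting t value is then compared with the critical value of the t distribution for the given degrees of freedom (2.306 for a two-sided test at the 5% level with 8 degrees of freedom).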
When can we use the paired t-test?
When the data are not independent –>two related groups or repeated measurements
Cannot be used with aggregated data
For the chi-square test of goodness of fit, what are the prerequisite, H0 and H1?
Ex: are the preferences of males equally distributed across all 3 functions? (one sample)
H0: the observed frequencies follow the expected (here: equal) distribution; H1: they do not
Prerequisite: for every category i, the expected frequency n*p_i >= 5; if not:
- merge adjacent categories
- ignore the corresponding category (if merging is not possible)
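A sketch of the test statistic together with the prerequisite check. The observed counts for the three functions are invented, equal expected proportions are assumed by default, and the function refuses to run when an expected frequency is below 5 rather than merging categories automatically.

```python
def chi_square_gof(observed, expected_probs=None):
    """Chi-square goodness-of-fit statistic; returns (statistic, degrees of freedom)."""
    n = sum(observed)
    k = len(observed)
    if expected_probs is None:
        expected_probs = [1 / k] * k           # H0: equal distribution
    expected = [n * p for p in expected_probs]
    # Prerequisite: every expected frequency n * p_i must be at least 5.
    if any(e < 5 for e in expected):
        raise ValueError("expected frequency below 5: merge or drop categories first")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, k - 1

observed_counts = [30, 40, 50]   # hypothetical preferences for 3 functions
statistic, df = chi_square_gof(observed_counts)
```

With these counts the statistic is 5.0 on 2 degrees of freedom, which is below the 5% critical value of 5.991, so H0 (equal distribution) would not be rejected.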