L7 - Introduction to inferential statistics Flashcards
Measure of central tendency (location)
Mean: average value
Median: exact middle value
Mode: most frequent value
Measure of dispersion (variability / spread)
Range and standard deviation
Range
- The spread of data - distance between the min and max values of the variable.
- Can be used to describe the variability of open-ended questions (respondents define the range by their answers).
Standard deviation
Describes the average distance of the distribution values from the mean.
> Indicates the usefulness of the mean as a typical value.
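As a minimal sketch of the measures of location and dispersion above, the snippet below uses Python's standard `statistics` module. The data (cups of coffee per week) are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical data: cups of coffee per week reported by 9 respondents.
cups = [2, 3, 3, 5, 7, 7, 7, 9, 11]

mean = statistics.mean(cups)          # average value
median = statistics.median(cups)      # exact middle value
mode = statistics.mode(cups)          # most frequent value
value_range = max(cups) - min(cups)   # distance between max and min
std_dev = statistics.stdev(cups)      # sample standard deviation (n - 1 denominator)

print(mean, median, mode, value_range, std_dev)
```

Here the standard deviation (3) is half the mean (6), so the mean is only a moderately good "typical" value for this sample.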
Role of Descriptive analysis
+ Provide summary measures of typical or average values
+ Present data in a digestible format
+ Provide preliminary insights about the distribution of values for each variable
+ Help detect errors in the coding process
Population (Malhotra, 2010)
the complete set of individuals or objects of interest
Sample (Malhotra, 2010)
a subset of population from which information is gathered
Parameter (Malhotra, 2010)
- true value of a variable
- a fixed value that refers to the population and is unknown
> It is the same from sample to sample
Sample statistic (Malhotra, 2010)
- value of a variable that is estimated from a sample.
- it is hoped to be close to the parameter of the population from which the sample is drawn.
Point estimate (Malhotra, 2010)
a single value that is obtained from sample data and is used as the best guess of the corresponding population parameter
> It differs from sample to sample
Confidence interval
a range into which the true population parameter will fall, assuming a given level of confidence.
CI = sample statistic ± k * standard error
Standard error parameter (k)
the number of standard errors placed on each side of the estimate, set by the desired confidence level (ex: k = 1.96 for a 95% CI)
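The confidence interval formula can be sketched as follows. The sample mean, standard deviation and sample size are hypothetical, and k is taken from the standard normal distribution, which assumes a reasonably large sample (for small samples a t-based multiplier would be used instead).

```python
import math
from statistics import NormalDist

# Hypothetical summary figures from a sample.
sample_mean = 50.0
sample_sd = 10.0
n = 100
confidence = 0.95

# Standard error of the mean.
standard_error = sample_sd / math.sqrt(n)

# k: number of standard errors for the chosen confidence level (about 1.96 for 95%).
k = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

lower = sample_mean - k * standard_error
upper = sample_mean + k * standard_error
print(round(k, 2), round(lower, 2), round(upper, 2))
```

With these assumed figures the 95% CI is roughly 48.04 to 51.96.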
Hypothesis (Hair, 2017)
an unproven supposition that tentatively explains certain facts or phenomena. It is developed prior to data collection.
> Tests are designed to disprove the null hypothesis.
Null hypothesis
If the null hypothesis is not rejected, we do not have to change the status quo. Failing to reject it does not prove it; we conclude only that it may be true.
Steps in hypothesis testing (slide)
1) Formulate the hypothesis
2) Decide on test, test statistic
3) Select a significance level
4) Make the statistical decision (reject or do not reject)
5) Draw a conclusion
Test the hypothesis based on 4 factors:
- Type of hypothesis
- Number of variables
- Scale of measurement
- Distribution assumptions
Three types of hypothesis
- Specific population characteristics
- Contrasts / Comparisons
- Associations / Relationships
2 types of Distribution assumptions
Parametric (interval scale, normal bell-shaped distribution) and nonparametric (nominal and ordinal scales) statistics.
Appropriate statistics by type of scale: measure of location, measure of spread, statistical technique
- Nominal: mode, none, Chi-square
- Ordinal: median, percentile, Chi-square
- Interval: mean, standard deviation, t-test and ANOVA
Comparing means with Independent vs. Related samples
- Means are from independent samples: (ex: coffee consumption of females and of males)
- Means are from related samples: (ex: coffee consumption and milk tea consumption of the same females). Since the sample is the same, it is called a paired sample.
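The difference between the two cases shows up in how the t statistic is computed. This is a minimal sketch with hypothetical data; the independent-samples version assumes equal population variances (pooled variance), which is only one of several possible variants.

```python
import math
import statistics

# Hypothetical data: cups per week.
female_coffee = [3, 4, 5, 6, 7]
male_coffee = [1, 2, 3, 4, 5]         # different respondents: independent samples
female_milk_tea = [2, 2, 4, 4, 6]     # same females as female_coffee: paired sample


def independent_t(a, b):
    """t statistic for two independent samples, assuming equal population variances."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def paired_t(a, b):
    """t statistic for related samples, computed on per-respondent differences."""
    diffs = [x - y for x, y in zip(a, b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


t_independent = independent_t(female_coffee, male_coffee)   # df = n1 + n2 - 2 = 8
t_paired = paired_t(female_coffee, female_milk_tea)         # df = n - 1 = 4
print(round(t_independent, 3), round(t_paired, 3))
```

The paired version works on the differences within each respondent, so variation between respondents does not inflate the standard error.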
Test statistic
- serves as the decision maker, since the decision to reject or not reject Ho depends on its magnitude (how close the sample comes to Ho)
- t-test: a univariate hypothesis test using the t distribution, which is used when the population standard deviation is unknown and the sample size is small.
Frequency distribution (Malhotra, 2013; Hair, 2017)
- a mathematical distribution whose objective is to obtain a count of the number of responses associated with different values of one variable and to express these counts in percentage terms.
- descriptive statistics are used to accomplish this task.
Role of frequency distribution (Malhotra, 2013)
- Determine the extent of item nonresponse.
- Indicate the extent of illegitimate responses.
- Detect outlier cases with extreme value.
- Indicate the shape of empirical distribution of the variable. By constructing a histogram, we can examine whether the observed distribution is consistent with the assumed distribution.
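A frequency distribution and the checks listed above can be sketched with `collections.Counter`. The responses, the use of `None` for item nonresponse, and the code 9 as an illegitimate value are all hypothetical.

```python
from collections import Counter

# Hypothetical responses on a 1-5 scale.
# None marks item nonresponse; 9 is not a legitimate code.
responses = [1, 2, 2, 3, 3, 3, 4, 5, None, 9]
valid_codes = {1, 2, 3, 4, 5}

counts = Counter(responses)
total = len(responses)

# Express each count in percentage terms.
percentages = {value: 100 * count / total for value, count in counts.items()}
for value, count in counts.most_common():
    print(value, count, f"{percentages[value]:.0f}%")

# Extent of item nonresponse and of illegitimate responses.
nonresponse = counts[None]
illegitimate = sum(count for value, count in counts.items()
                   if value is not None and value not in valid_codes)
print(nonresponse, illegitimate)
```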
One-tailed and two-tailed test differences
- One-tailed test: the alternative hypothesis is expressed directionally (< or >).
- Two-tailed test: the alternative hypothesis is not expressed directionally (≠).
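The practical consequence is a different critical value at the same significance level. The sketch below uses the standard normal distribution for simplicity (an assumption; a t-test would use t critical values), and the observed statistic of 1.80 is hypothetical.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()

# Two-tailed: alpha is split between both tails.
two_tailed_critical = z.inv_cdf(1 - alpha / 2)   # about 1.96

# One-tailed (upper tail): all of alpha sits in one tail.
one_tailed_critical = z.inv_cdf(1 - alpha)       # about 1.645

observed = 1.80  # hypothetical test statistic
reject_one_tailed = observed > one_tailed_critical
reject_two_tailed = abs(observed) > two_tailed_critical
print(reject_one_tailed, reject_two_tailed)
```

With these numbers the same result is significant in the one-tailed test but not in the two-tailed test.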
Type I error
The sample results lead to rejecting the null hypothesis when it is in fact true.
> Significance level (α): the probability of making a Type I error (ex: α = 0.05).
Type II error
The sample results lead to not rejecting the null hypothesis when it is in fact false. Its probability is denoted β.
Power of a test ( 1 - β )
the probability of rejecting the null hypothesis when it is in fact false and should be rejected.
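How α, β and power relate can be illustrated with a one-tailed z-test. Everything here is an assumption made for the sketch: the hypothesized mean, the "true" mean under the alternative, a known population standard deviation, and the sample size.

```python
import math
from statistics import NormalDist

# Hypothetical one-tailed z-test: Ho: mu = 100 vs H1: mu > 100.
mu_null = 100.0
mu_true = 105.0   # assumed true mean under the alternative
sigma = 15.0      # assumed known population standard deviation
n = 36
alpha = 0.05      # probability of a Type I error

z = NormalDist()
standard_error = sigma / math.sqrt(n)
critical_z = z.inv_cdf(1 - alpha)

# How many standard errors the true mean lies above the hypothesized mean.
shift = (mu_true - mu_null) / standard_error

beta = z.cdf(critical_z - shift)   # probability of a Type II error
power = 1 - beta                   # probability of correctly rejecting Ho
print(round(beta, 2), round(power, 2))
```

Under these assumptions the power is about 0.64; a larger sample or a larger true difference would raise it.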
p value
the probability of observing a value of the test statistic as extreme as, or more extreme than, the value actually observed, assuming that the null hypothesis is true.
> It is compared with the significance level α; it is not equal to α.
Reject Ho when:
- | t statistic | > | critical value |
- or p value of the t-test < significance level ( α )
Coefficient of variation (CV)
- The ratio of the standard deviation to the mean, expressed as a percentage.
- It shows the variability relative to the mean.
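A short sketch of the calculation, using hypothetical data and the sample standard deviation:

```python
import statistics

# Hypothetical data: cups of coffee per week.
cups = [2, 3, 3, 5, 7, 7, 7, 9, 11]

mean = statistics.mean(cups)
std_dev = statistics.stdev(cups)   # sample standard deviation

# Coefficient of variation as a percentage of the mean.
cv = std_dev / mean * 100
print(cv)
```

Because the CV is unit-free, it can be used to compare variability across variables measured on different scales, provided the mean is not zero.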
Statistics associated with frequency distribution
Measures of location, Measures of dispersion, Measures of shape
Measures of shape
The shape is assessed by examining skewness and kurtosis.
Skewness (Malhotra, 2013)
- Assesses the distribution's symmetry about the mean (in a symmetric distribution, mean = median = mode).
=> Skewness - the tendency of the deviations from the mean to be larger in one direction than in the other.
Kurtosis (Malhotra, 2013)
- A measure of the relative peakedness or flatness of the curve defined by the frequency distribution.
- Kurtosis of a normal distribution = 0. More peaked than normal: > 0. Flatter than normal: < 0.
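Both shape measures can be sketched from the central moments. The functions below use the simple moment-based formulas (statistical packages often apply small-sample corrections, so their output may differ slightly), and the data sets are hypothetical.

```python
import statistics


def central_moment(data, order):
    """Average of (x - mean) ** order over the data."""
    mean = statistics.mean(data)
    return sum((x - mean) ** order for x in data) / len(data)


def skewness(data):
    """Moment-based skewness: 0 for a symmetric distribution."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]        # hypothetical, evenly spread
right_skewed = [1, 1, 1, 2, 10]    # hypothetical, long right tail

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed))
```

The evenly spread sample has zero skewness and negative kurtosis (flatter than normal), while the sample with a long right tail has positive skewness.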
Calculate the t statistic
t = (sample statistic - hypothesized parameter value) / standard error of the statistic
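A one-sample example of this formula, combined with the critical-value rejection rule, might look like the sketch below. The sample and the hypothesized mean are hypothetical; the critical value 2.776 is the two-tailed t-table value for df = 4 at α = 0.05.

```python
import math
import statistics

# Hypothetical sample and hypothesized population mean (Ho: mu = 6).
sample = [6, 7, 8, 9, 10]
hypothesized_mean = 6.0

n = len(sample)
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(n)

# t = (sample statistic - hypothesized parameter value) / standard error
t_stat = (sample_mean - hypothesized_mean) / standard_error
df = n - 1

# Two-tailed critical value from a t table for df = 4, alpha = 0.05.
critical_value = 2.776
reject_null = abs(t_stat) > abs(critical_value)
print(round(t_stat, 3), df, reject_null)
```

Here |t| is about 2.83, which exceeds 2.776, so Ho is rejected at the 5% level.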
When to use F-test (Malhotra, 2010)
In a test of two independent samples, the F test is used as the statistical test of the equality of the variances of the two populations.
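The F statistic for this purpose is the ratio of the two sample variances. The sketch below uses hypothetical data and follows the common convention of placing the larger variance in the numerator; the result would then be compared with a critical value from an F table for the given degrees of freedom.

```python
import statistics

# Hypothetical independent samples.
group_a = [3, 4, 5, 6, 7]
group_b = [1, 3, 5, 7, 9]

var_a = statistics.variance(group_a)   # sample variance, n - 1 denominator
var_b = statistics.variance(group_b)

# Larger variance in the numerator so that F >= 1.
if var_a >= var_b:
    f_ratio = var_a / var_b
    df = (len(group_a) - 1, len(group_b) - 1)
else:
    f_ratio = var_b / var_a
    df = (len(group_b) - 1, len(group_a) - 1)

print(f_ratio, df)
```

An F ratio close to 1 is consistent with equal population variances; a large ratio suggests they differ.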
t-distribution
- It is similar to the normal distribution in appearance, but it has more area in the tails and less in the center.
- As the number of degrees of freedom (df) increases, the t distribution approaches the normal distribution, and the two become nearly identical.