Statistical Concepts Flashcards
Orderable versus nonorderable
Nonorderable variable: A discrete measure in which the sequence of categories cannot be meaningfully ordered
Orderable discrete variable: A discrete measure that can be meaningfully arranged into an ascending or descending sequence
Discrete versus continuous
Discrete variable: A variable that classifies persons, objects, or events according to the kind or quality of their attribute
Dichotomous variable: A discrete measure with two categories that may or may not be ordered (two MECE categories like “male” and “female”)
Continuous variable: A variable that, in theory, can take on all possible numerical values in a given interval (slightly counterintuitive examples: age, years at school, number of children born, occupational prestige, annual earned income)
Dichotomous variable
Discrete variables with two categories, as they can always be coded as 0 or 1. Example: Disability status.
Nominal variable
Discrete variables with no clear ordering. Example: Nationality.
Ordinal variable
Discrete variables that have a clear ordering to them. Example: Satisfaction.
Interval variable
Continuous variables where the distance between levels is equal, but without a meaningful zero point; for example, 80 degrees Fahrenheit is not twice as hot as 40 degrees Fahrenheit.
Ratio variable
Continuous variables where the distance between levels is equal and they DO have meaningful zero points; for example, 25 pounds is 5 times as heavy as 5 pounds.
Manifest variable
A variable that can be observed (opposite of latent)
Latent variable
A variable that cannot be observed and can only be measured indirectly (e.g., degree of centralization in decision making; inclusivity of company culture); opposite of manifest
Status variable
A variable whose outcome cannot be manipulated, like race or gender (often treated as independent variables nevertheless)
Predictor variable
A variable that has an antecedent or potentially causal role, usually appearing first in hypotheses
Outcome variable
A variable that has an affected role in relation to the predictor variable; in other words, the values taken on by the outcome variable depend on the predictor variable
Recoding
The process of changing the codes established for a variable, e.g. by grouping values together to winnow down the number of categories.
Recoding becomes important when considering statistical tests: a common rule of thumb is that at least 80% of categories (or cells) should have 5 or more observations, and every category should have at least 1.
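A minimal recoding sketch with pandas; the column name "age" and the bin edges are hypothetical, chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"age": [19, 24, 31, 45, 52, 67, 73]})

# Collapse a continuous variable into ordered groups (fewer categories).
df["age_group"] = pd.cut(df["age"],
                         bins=[0, 29, 49, 120],
                         labels=["under 30", "30-49", "50+"])

# Check the rule of thumb: every category should have at least one
# observation, and most should have five or more.
print(df["age_group"].value_counts())
```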
Inferential Statistics
Allows for generalizations or conclusions to be made about population parameters based on statistics from a random sample.
Needs to be a random sample - every member of the population has an equal, non-zero chance of being selected into the sample.
Null Hypothesis Significance Testing (NHST)
Judges whether variables are related in a population by testing the hypothesis that they are unrelated based on what we see in our sample. That is, we use inference to test the null hypothesis
Test the null hypothesis, not the alternative hypothesis
We can reject the null (in which case we “tentatively accept” the alternative hypothesis) or fail to reject it
You need a strong theoretical basis in your proposal for an alternative hypothesis - NHST alone doesn’t answer all of your questions
If the probability is small, we can tentatively decide that the observed relationship is characteristic of the “true” population
Why does random sampling matter?
Researchers can often only study a sample, because studying the whole population would be too difficult, too expensive and too time-consuming. However, the sample has to be representative of the population. The only way to ensure representativeness is to draw a random sample, so that every unit of the population has an equal chance of being included in the sample – otherwise generalisations to the population are not possible, as one might have selected a biased sample that does not represent the underlying population well.
Measures of central tendency
In statistics, the term central tendency relates to the way in which quantitative data tend to cluster around some value. A measure of central tendency is any of a number of ways of specifying this “central value”. Measures of central tendency include the mode, median and mean.
Mean
The mean gives information about the average value of a certain variable, e.g. the number of hours spent online. For skewed data, the mean might not be the most appropriate statistic, as it is influenced by extreme values and outliers.
Median
With the median, researchers can describe the value below and above which 50% of the population falls. This is especially important for skewed distributions like income, where the mean is not the best measure of central tendency because it is heavily influenced by extreme values and outliers.
Measures of Dispersion
Measures of dispersion are important for describing the spread of the data, or its variation around a central value. The most important measures of dispersion are the variance and its positive square root, the standard deviation.
Standard Deviation
The variance is expressed in squared units of measurement, which in most cases makes an intuitive interpretation impossible. To restore the original measurement units, the positive square root of the variance is taken: the standard deviation.
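A minimal numpy sketch of these measures of central tendency and dispersion on a small right-skewed sample (values invented for illustration):

```python
import numpy as np

income = np.array([18, 22, 25, 27, 30, 32, 35, 40, 250])  # one extreme value

print(np.mean(income))         # pulled upward by the outlier
print(np.median(income))       # 50% of cases fall below this value
print(np.var(income, ddof=1))  # sample variance, in squared units
print(np.std(income, ddof=1))  # standard deviation, back in original units
```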
Positive skew
Right skew. The tail of a skewed distribution is to the right of the median (mean greater than median)
Negative skew
Left skew. The tail of a skewed distribution is to the left of the median (median greater than mean)
Why does skewness matter?
The information about skewness is very important in social statistics, as most distributions of social factors are positively skewed: E.g. for income there are many people in the lower classes, but fewer in the upper classes.
Moreover, information about skewness is important, as then certain statistics might not be as meaningful anymore. For example, the mean is not the best statistic to describe skewed distributions, as it is highly influenced by extreme values and might therefore not represent a very typical value of the distribution.
Interpreting skewness statistics
If skewness is less than -1 or greater than 1, the distribution is highly skewed.
If skewness is between -1 and -0.5 or between 0.5 and 1, the distribution is moderately skewed.
If skewness is between -0.5 and 0.5, the distribution is approximately symmetric.
Percentiles
Percentiles are very useful in social statistics in order to describe the value below which x% of values lie and above which (100-x)% lie. With percentiles, distributions can be compared and information relative to other values can be given: for example, grades are often not given in absolute terms, but as percentiles, so that scoring at the 99th percentile means that 99% of the class scored worse.
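A sketch of the skewness statistic and percentiles with scipy/numpy (data invented for illustration):

```python
import numpy as np
from scipy.stats import skew

scores = np.array([1, 2, 2, 3, 3, 3, 4, 5, 9, 15])

print(skew(scores))               # > 1 would indicate high positive skew
print(np.percentile(scores, 50))  # the median (50th percentile)
print(np.percentile(scores, 99))  # 99% of values lie below this point
```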
Measures of Association
Statistics that show the direction and/or magnitude of a relationship between pairs of discrete variables. Five aspects are usually assessed by measures of association: existence, significance, strength, direction and pattern.
Many measures of association use the PRE interpretation concept: PRE (proportionate reduction in error) statistics reflect how well knowledge of one variable improves prediction of the second variable.
Statistical Significance
A test of inference about whether conclusions based on a sample of observations also hold true for the population from which the sample was selected.
The null hypothesis always states that differences in percentages are due to chance, and the alternative hypothesis states that observed differences are too large to be explained by chance alone, and therefore the variables are associated.
Chi-Square Test
A test of statistical significance based on a comparison of the observed cell frequencies of a joint contingency table with frequencies that would be expected under the null hypothesis of no relationship.
The MAIN difference from the t-test is that the chi-square test focuses on association more broadly (with categorical variables), whereas the t-test is about means (continuous or ordinal outcome variables). Both are about significance.
Existence
if there is any difference in the percentage distributions, an association exists.
Significance
The observed differences are too large to be explained by chance alone, and are thus statistically significant.
Strength
effect size (what is the size of the effect of the IV on the DV?). Not to be confused with statistical significance.
Direction
positive or negative; only applies to ordinal variables. Determined by sort order of pairs.
Pattern
for ordinal variables, linear is simplest. For nominal variables, interpret category by category.
Sample Chi-Square Statistic Interpretation
Null hypothesis: there is no relationship between internet usage and TV consumption.
Alternative hypothesis: there is a relationship between internet usage and TV consumption.
According to the output, we obtained a chi-square statistic of 193.21 at df = 2 and p < .001; since p is below alpha, we reject the null hypothesis and conclude that internet usage and TV consumption are associated.
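A sketch of how such a test might be run with scipy (the counts below are invented for illustration, not the real output above):

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[120,  80,  40],   # low internet usage
                     [ 60, 100,  90]])  # high internet usage

chi2, p, df, expected = chi2_contingency(observed)
print(chi2, df, p)  # compare p with alpha to decide about the null
```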
PRE statistic Interpretation
If, for example, our measure of association has a value of 0.465, we are able to say: “By using the independent variable [income], we reduced errors in prediction of the dependent variable [self-reported ability to use the internet] by 46.5%.”
Examples include Lambda, Gamma
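A sketch of how Goodman-Kruskal's lambda (one PRE statistic) can be computed by hand; the counts are invented, with rows as IV categories and columns as DV categories:

```python
import numpy as np

table = np.array([[50, 10],
                  [20, 40]])

e1 = table.sum() - table.sum(axis=0).max()          # errors without the IV
e2 = (table.sum(axis=1) - table.max(axis=1)).sum()  # errors using the IV
lam = (e1 - e2) / e1
print(lam)  # 0.4 -> "errors in predicting the DV reduced by 40%"
```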
Ex post facto hypothesis
A hypothesis created after confirming data have already been collected.
Test variable
A test variable allows us to examine the relationship between two variables X and Y more closely. With it, a researcher can determine whether the relationship between the two variables reflects replication, explanation (spuriousness), interpretation, or specification. Moreover, in some cases test variables even act as suppressor variables: when the test variable is controlled for, an association becomes visible in the data that was not there before.
Replication - test variable has no effect
Spuriousness - when a test variable causes the false appearance of a relationship between two variables
Suppressor - when a test variable disguises a true relationship between two variables.
Intervening - when a test variable is the mediating factor through which one variable has an effect on another
Specification - when a test variable specifies the conditions under which one variable has an effect on another
Partial Association
The association between 2 variables when we hold the effects of a test variable constant.
Statistical Interaction
The association between 2 variables differs across the levels of a test variable.
Normal Distribution
A smooth, bell-shaped theoretical probability distribution for continuous variables that can be generated from a formula.
The normal distribution is important for social statistics, as it has important mathematical properties or “empirical rules”:
• 68% fall within 1 SD of the mean.
• 95% fall within 1.96 SD of the mean.
• 99.7% fall within 3 SD of the mean.
Why does checking for normal distribution matter?
According to the Central Limit Theorem, for samples of the same size N, the mean of all sample means equals the mean of the population from which these samples were randomly drawn, and the resulting distribution of sample means approaches a normal distribution as N increases.
This makes the normal distribution important for every kind of distribution, as based on these mathematical properties, confidence intervals can be calculated that express how sure one can be that the sample statistic is a true reflection of the population parameter: for example, for a 95% confidence interval, 1.96 times the standard error is added to and subtracted from the sample estimate.
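A minimal sketch of the 1.96-rule confidence interval (data invented for illustration):

```python
import numpy as np

x = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.4, 5.8, 5.0])
mean = x.mean()
se = x.std(ddof=1) / np.sqrt(len(x))  # standard error of the mean

lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(mean, (lower, upper))  # 95% confidence interval for the mean
```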
T-Test
A test used to determine whether there is a significant difference between the means of two groups.
Assumption of normal distribution of the outcome variable.
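A sketch of a two-group t-test with scipy (group values invented for illustration):

```python
from scipy.stats import ttest_ind

group_a = [5.1, 4.8, 6.2, 5.9, 5.4, 6.1]
group_b = [4.2, 4.5, 4.9, 4.1, 4.8, 4.4]

t, p = ttest_ind(group_a, group_b)
print(t, p)  # if p < alpha (say 0.05), reject the null of equal means
```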
Chi-Square Test
Tests for the strength and significance of the association between two categorical variables
Large Chi-Square statistic - evidence of an association
Small Chi-Square statistic - no evidence of an association
Alpha
Alpha is the significance threshold. The lower the value of alpha (typically we set it to 0.05, 0.01 or 0.001), the more stringent and rigorous our criterion is and the more we can trust our result. If we set alpha to 0.05, we are saying that we only have grounds to reject the null hypothesis if the probability of observing a test statistic this extreme or more extreme, given that the null hypothesis is true, is equal to or less than 1 in 20.
Then, compare the p-value with alpha and decide whether to reject or fail to reject the null. Low p-values mean that the observed result would be very unlikely under the null hypothesis and give us grounds to reject it.
What tests/steps can you take when you’re analyzing categorical PV and OV?
Frequency table
Chi-square
Measures of association
What tests/steps can you take when you’re analyzing a categorical PV and continuous OV?
T-Test
What tests/steps can you take when you’re analyzing a continuous PV and categorical OV?
Logistic regression
What tests/steps can you take when you’re analyzing a continuous PV and OV?
Linear regression
Repeated Measures (Paired) T Test
Tests for the difference between two variables from the same population (e.g., a pre- and posttest score)
Useful in within subjects designs
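A sketch of a paired (repeated-measures) t-test on pre/post scores from the same participants (values invented for illustration):

```python
from scipy.stats import ttest_rel

pre  = [62, 70, 55, 68, 74, 60]
post = [68, 75, 59, 70, 80, 66]

t, p = ttest_rel(pre, post)
print(t, p)  # tests whether the mean pre/post difference is zero
```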
Independent Samples T Test
Tests for the difference between the same variable from different populations (e.g., comparing boys to girls)
Useful in between subjects experimental designs
ANOVA
Tests for the difference between two or more group means; when other variance in the outcome variable must first be accounted for (e.g., controlling for sex, income, or age), the analysis becomes an ANCOVA (see below)
Simple Regression
Tests how a predictor variable can predict (or explain the variance in) the outcome variable
Multiple Regression
Tests how a combination of two or more predictor variables can predict (or explain variance in) the outcome variable
Pearson Correlation
Tests for the strength of a potential linear association between two continuous variables
Spearman Correlation
Tests for the strength of the association between two ordinal variables (does not rely on the assumption of normally distributed data)
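A sketch contrasting the two correlations with scipy (data invented for illustration):

```python
from scipy.stats import pearsonr, spearmanr

hours_online = [1, 2, 3, 4, 5, 6, 7, 8]
satisfaction = [2, 3, 3, 4, 5, 5, 6, 8]

r, p_r = pearsonr(hours_online, satisfaction)        # linear association
rho, p_rho = spearmanr(hours_online, satisfaction)   # rank-based association
print(r, rho)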
Between Subjects ANOVA
One of the most common forms of an ANOVA is a between-subjects ANOVA.
This type of analysis is applied when examining for differences between independent groups on a continuous level variable.
Within this “branch” of ANOVA, there are one-way ANOVAs and factorial ANOVAs.
One Way ANOVA
A type of between subjects ANOVA
Used when assessing for differences in one continuous variable across the levels of ONE grouping variable.
For example, a one-way ANOVA would be appropriate if the goal of research is to assess for differences in job satisfaction levels between functional groups (e.g., engineering, PM). In this example, there is only one dependent variable (job satisfaction) and ONE independent variable (functional group).
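A sketch of that example with scipy: job-satisfaction scores for three hypothetical functional groups (values invented for illustration):

```python
from scipy.stats import f_oneway

engineering = [6.1, 5.8, 6.5, 5.9, 6.2]
pm          = [5.2, 5.5, 5.0, 5.7, 5.4]
design      = [6.8, 6.4, 7.0, 6.6, 6.9]

f, p = f_oneway(engineering, pm, design)
print(f, p)  # p < alpha suggests at least one group mean differs
```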
Factorial ANOVA
A type of between subjects ANOVA
A general term applied when examining multiple independent variables.
For example, a factorial ANOVA would be appropriate if the goal of a study was to examine for differences in job satisfaction levels by functional groups and interpersonal skills. In this example, there is only one dependent variable (job satisfaction) and TWO independent variables (functional group and interpersonal skills).
Within Subjects ANOVA
A within-subjects ANOVA is appropriate when examining for differences in a continuous level variable over time. A within-subjects ANOVA is also called a repeated measures ANOVA.
This type of test is frequently used when using a pretest and posttest design, but is not limited to only two time periods. The repeated measures ANOVA can be used when examining for differences over two or more time periods.
For example, this analysis would be appropriate if the researcher seeks to explore for differences in initial product satisfaction, measured at three points in time (pretest, posttest, 2-month follow up).
ANCOVA
An analysis of covariance (ANCOVA) is appropriate when examining for differences in a continuous dependent variable between groups, while controlling for the effect of additional variables.
The “C” in ANCOVA denotes that a covariate is being inputted into the model, and this covariate examination can be applied to a between-subjects design, a within-subjects design, or a mixed-model design.
ANCOVAs are frequently used in experimental studies when the researcher wants to account for the effects of an antecedent (control) variable.
MANOVA
A multivariate analysis of variance (MANOVA) is an extension of the ANOVA, and is appropriate when examining for differences in multiple continuous level outcome variables between groups.
For example, a MANOVA would be applicable if assessing for differences between ethnicities in job satisfaction AND intrinsic motivation levels of participants. In this example, job satisfaction and intrinsic motivation are the continuous level outcome variables. The MANOVA can be conducted with multiple outcome variables, and can also include covariates (i.e., MANCOVA).
Assumptions Underlying Linear Regression
- Model is correct
- No specification error
- We have included all important variables, and excluded all non-relevant variables
- Theory should drive model selection
- Linear relationship correctly describes relationship
- Normal distribution of residuals
- No outliers present (and if so, treated)
- Freedom from collinearity
- Homoscedasticity (constant variance of residuals across all values of the predictor variables); see the sketch below
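A sketch of two quick assumption checks on residuals from a simple fit, assuming scipy is available: normality via Shapiro-Wilk, plus a crude look at homoscedasticity (data invented for illustration):

```python
import numpy as np
from scipy.stats import linregress, shapiro

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.1, 14.2, 15.8, 18.3, 19.9])

fit = linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)

print(shapiro(residuals))  # p > alpha: no evidence against normal residuals
print(np.corrcoef(x, np.abs(residuals))[0, 1])  # near 0 suggests even spread
```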
Simple Linear Regression
It is the simplest form of regression: a technique with a single predictor variable, which is continuous in nature. The relationship between the predictor variable and the outcome variable is assumed to be linear.
R^2 = 0.70: hence we can see that 70% of the variation in fertility rate can be explained by the predictor variable.
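A sketch of a simple linear regression with scipy; the variable names and values are hypothetical:

```python
from scipy.stats import linregress

education_years = [8, 10, 12, 12, 14, 16, 16, 18]
fertility_rate  = [4.1, 3.8, 3.2, 3.4, 2.9, 2.3, 2.5, 1.9]

result = linregress(education_years, fertility_rate)
print(result.slope, result.intercept)
print(result.rvalue ** 2)  # R^2: share of variance explained
```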
Logistic Regression
In logistic regression, the dependent variable is binary in nature (having two categories). Independent variables can be continuous or binary. In multinomial logistic regression, you can have more than two categories in your dependent variable.
Suppose the outcome variable is customer churn and the predictor variable is gender. If the odds ratio is 3, then the odds of a female churning are 3 times the odds of a male churning.
You calculate the odds for each gender and then divide.
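A minimal sketch of that calculation (the counts are invented so that the odds ratio comes out to 3):

```python
churned_f, stayed_f = 60, 40   # hypothetical female counts
churned_m, stayed_m = 20, 40   # hypothetical male counts

odds_f = churned_f / stayed_f  # 1.5
odds_m = churned_m / stayed_m  # 0.5
print(odds_f / odds_m)         # odds ratio = 3.0
```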
Multiple Regression
Multiple regression is an extension of simple linear regression. It is used when we want to predict the value of an outcome based on the value of two or more predictor variables.
Adjusted R^2
The adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model. The adjusted R-squared increases only if the new term improves the model more than would be expected by chance. It decreases when a predictor improves the model by less than expected by chance.
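The usual adjusted R^2 formula as a small sketch, where n is the number of observations and p the number of predictors:

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adjusted_r2(0.70, n=100, p=5))  # ~0.684, slightly below the raw 0.70
```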
Null Hypothesis
A statistical hypothesis that one usually expects to reject. Symbolised H0.
The null hypothesis is usually about randomness in the data, so the question of inference is: what is the probability that the relationship found in the sample data could have come from a population in which there is no relationship between the two variables? In this way, an association is never “proven” to be true; rather, its opposite is “disproven”.
Type 1 Error
A statistical decision error that occurs when a true null hypothesis is rejected
Its probability is alpha
Ways to mitigate risk:
– Increasing sample size, thus reducing sampling error
– Repeating the study using another independent sample
Type 2 Error
A statistical decision error that occurs when a false null hypothesis is not rejected.
Its probability is beta.
Moderation
Both mediation and moderation have to do with how a third variable fits into the relationship between an independent and a dependent variable.
Moderation is a way to check whether that third variable influences the strength or direction of the relationship between an independent and dependent variable. An easy way to remember this is that the moderator variable might change the strength of a relationship from strong to moderate, to nothing at all. It is almost like a turn dial on the relationship; as you change values of the moderator, a statistical relationship that you observed before might dissolve away.
Mediation
Both mediation and moderation have to do with how a third variable fits into the relationship between an independent and a dependent variable.
A mediator mediates the relationship between the independent and dependent variables – explaining the reason for such a relationship to exist. Another way to think about a mediator variable is that it carries an effect.
An obvious real-life mediator is temperature on a stove. Water will not start to boil until you have turned on your stove, but it is not the stove knob that causes the water to boil, it is the heat that results from turning that knob.
Precision
True Positives / Predicted Positives (True Positives + False Positives)
Recall
True Positives / Actual Positives (True Positives + False Negatives)
Accuracy
(True Positives + True Negatives) / Total
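A minimal sketch computing all three from confusion-matrix counts (counts invented for illustration):

```python
tp, fp, fn, tn = 40, 10, 20, 30

precision = tp / (tp + fp)                   # 0.8: predicted positives that are right
recall    = tp / (tp + fn)                   # ~0.67: actual positives that are found
accuracy  = (tp + tn) / (tp + fp + fn + tn)  # 0.7
print(precision, recall, accuracy)
```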
Beta
The beta level (often simply called beta) is the probability of making a Type II error (accepting the null hypothesis when the null hypothesis is false). It is directly related to power, the probability of rejecting the null hypothesis when the null hypothesis is false. Power plus beta always equals 1.0.
Treating Outliers
Outliers are numerically distant from the rest of the dataset, and can distort apparent relationships between variables.
- Remove them if you think they may be errors
- Cap outliers at 3 standard deviations from the mean (when there is no clear reason to exclude them and they could be reasonable data)
- Transform variable
- Leave as is, though report them
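A sketch of the 3-SD capping option, assuming numpy (data invented; note the cap is computed from the sample mean and SD, which the outlier itself inflates):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.append(rng.normal(10, 1, 50), 95.0)  # 50 typical values + 1 outlier
mean, sd = x.mean(), x.std(ddof=1)

capped = np.clip(x, mean - 3 * sd, mean + 3 * sd)
print(x.max(), capped.max())  # the outlier is pulled in to mean + 3*SD
```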
Statistical Significance
How likely is it that a change is due to chance?
Effect size
Is a change important or meaningful?
Why can no change in mean satisfaction be meaningful?
- We didn't measure changes granularly enough (e.g., a small update to a large feature)
- The hypothesis was wrong (the change didn't affect the outcome)
- Polarization may have increased or decreased (offsetting shifts leave the mean unchanged)
- Changes occurred within specific segments, but not overall
Why is cross-product or within-product metric comparison invalid?
Even with a consistent research protocol, comparisons may be invalid due to differences in:
- Sampling methodology
- Types of users using each product
- Users' expectations
- Usage behaviors
- Users' experience with similar products
Why doesn’t correlation prove causation?
Need to establish all three (often only 1 of the 3 is feasible without an experimental procedure):
- Covariation (correlation)
- Directionality (one factor is antecedent to or precedes the other)
- Control of all extraneous variables
Reliability
Are your measurements consistent?
Validity
Are you measuring what you intend to measure?
How to determine sample size (3 cases)
- Uncovering universe of problems
- Estimating a population parameter
- Comparing between groups
Uncovering universe of issues/requirements
- How granular / low incidence are the issues we want to surface?
- If you conduct with 5 participants, you'll surface issues that affect roughly 30% of users
- If you conduct with 9 participants, you'll surface issues that affect roughly 20% of users
Estimating a population parameter
For a survey
- Of all important variables/parameters, identify the one that has the greatest variability (usually one that's binary)
- Determine the confidence level we want for that variable (e.g., 90% or 95%)
- Refer to a table to determine the MOE allowed by different sample sizes
- If you conduct with 400 respondents, you'll have a 5% MOE at a 95% CL (so if the sample estimate is 80%, the interval 75%-85% will contain the true population value in 95% of repeat samples; see the sketch after these examples)
For a tree test
- If we have ~40 people complete, we’ll have a 13% MOE at 90% CL
For a summative usability test
- If we have 30 people complete, we’ll have a ~17% MOE at 90% CL
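These rules of thumb follow from the margin-of-error formula for a proportion, MOE = z * sqrt(p(1-p)/n), which is largest at p = 0.5; a minimal sketch (z = 1.96 for 95% confidence, 1.645 for 90%):

```python
import math

def moe(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Margin of error for a proportion: z * sqrt(p(1-p)/n)."""
    return z * math.sqrt(p * (1 - p) / n)

print(moe(400))          # ~0.049: "400 respondents -> 5% MOE at 95% CL"
print(moe(40, z=1.645))  # ~0.13: the tree-test rule of thumb at 90% CL
```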
Making comparisons
- Of all important variables/parameters, identify the one that has the greatest variability (usually one that's binary)
- What size difference do you expect to or want to be able to detect? E.g., 5% difference in response between Android and iPhone users
- Identify whether your study is between or within subjects; if you're looking at separate populations, it will be between. Within-subjects designs require smaller samples.
- What is the required statistical power? In other words, what probability do you want of detecting a difference of the target size, if one exists? Usually set to 80%, but 90% within industry research.
Comparing over time (usability benchmark)
- Within-subjects design (n = 30 -> a 30% difference detectable at 80% statistical power and a 90% confidence level).
Comparing between groups (survey with segments)
- Between-subjects design (n = 2,500 -> a 10% difference detectable at 80% statistical power and a 90% confidence level).
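A power-analysis sketch for a between-subjects comparison of proportions, assuming statsmodels is installed; the required n depends heavily on the baseline proportion and design, so it will not reproduce the rules of thumb above exactly:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Sample size per group to detect a 0.50 -> 0.60 shift (a 10-point gap)
# at 80% power and a 90% confidence level (alpha = 0.10, two-sided).
h = proportion_effectsize(0.60, 0.50)  # Cohen's h for the gap
n = NormalIndPower().solve_power(effect_size=h, alpha=0.10, power=0.80)
print(round(n))  # observations needed per group under these assumptions
```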