Midterm revision 3 Flashcards
What is data analysis?
Non inferential description of an empirical distribution
What is an empirical distribution?
A set of scores on a variable or set of variables, with one score per individual on each variable
Which has scientific priority - data analysis or inferential stats?
data analysis
What are quantitative representations?
Succinct and accurate descriptions of an empirical distribution’s: location, dispersion and shape
ARE NOT: references to normality, hypothesis speculations, claims regarding population
What is location?
In a metaphorical sense, it is where the empirical distribution sits on the X axis.
Includes the mean, median and mode
What is the median?
The point at/below which 50% of the scores lie
What is the mode?
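The most frequently occurring score, i.e., where the distribution peaks; the number of peaks is the number of modes
Carries poor information about location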
What is dispersion?
how spread out the scores are
Includes: variance, SD, and range
What is variance?
Average squared distance that each score is from the mean.
Natural dispersion counterpart of the mean
Has to be non-negative: bounded below by 0, not bounded above
What is standard deviation?
Square root of variance
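A minimal sketch of the two dispersion cards above, in Python. It follows the card's wording literally (average squared distance, so it divides by N rather than N - 1); the scores are made up for illustration.

```python
import math

def mean(scores):
    """Arithmetic mean of a list of scores."""
    return sum(scores) / len(scores)

def variance(scores):
    """Average squared distance of each score from the mean (divides by N)."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)

def standard_deviation(scores):
    """Square root of the variance."""
    return math.sqrt(variance(scores))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up scores for illustration
print(mean(scores), variance(scores), standard_deviation(scores))  # 5.0 4.0 2.0
```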
What is shape?
Shape of the empirical distribution
Includes skewness and kurtosis
What is skewness?
Captures degree of symmetry (NOT normality)
Symmetrical: 0
Positively skewed: hill falls down to positive end
Negatively skewed: hill falls down to negative end
What is kurtosis?
Captures the degree of peakedness or flatness
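A rough sketch of how shape can be quantified. The cards do not say which formulas the course uses, so this assumes the common moment-based coefficients (third and fourth central moments scaled by the variance); the data are made up.

```python
def central_moment(scores, k):
    """Average of (score - mean) raised to the power k."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** k for x in scores) / len(scores)

def skewness(scores):
    """Assumed moment-based skewness: 0 for a symmetric distribution."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5

def kurtosis(scores):
    """Assumed moment-based kurtosis: fourth moment over squared variance."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
positive_tail = [1, 1, 2, 2, 3, 10]           # tail runs toward the positive end
negative_tail = [-x for x in positive_tail]   # mirror image

print(skewness(symmetric))       # 0.0
print(skewness(positive_tail))   # positive
print(skewness(negative_tail))   # negative
print(kurtosis(symmetric))
```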
What are transformations?
Involve taking a function of a variable
What are the reasons for transforming?
- To re-express distribution for taste, preference, and convenience (e.g., transforming to a proportion for comparison across different scales)
- To bring a distribution into alignment with a theoretical distribution (involves nonlinear transformations that change the shape of a distribution, e.g., normality)
What are the outcomes of linear transformations?
Linear transformations do not change the shape, but change the location and dispersion in predictable ways
What happens to the mean when a constant is added to/subtracted from all scores?
The same constant is added to/subtracted from the mean
What happens to the mean when all scores are multiplied/divided by a constant?
The mean is multiplied/divided by the same constant
What happens to the SD when a constant is added to/subtracted from all scores?
It stays the same
What happens to the SD when all scores are multiplied/divided by a constant?
The SD is multiplied/divided by the absolute value of the constant
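The four linear-transformation cards above can be checked numerically. A small sketch, assuming the transformation y = a + b*x with made-up values for a, b, and the scores:

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up scores
a, b = 10, 3                        # hypothetical constants: add a, multiply by b

transformed = [a + b * x for x in scores]

mean_x = statistics.fmean(scores)
sd_x = statistics.pstdev(scores)    # population SD (divides by N)
mean_y = statistics.fmean(transformed)
sd_y = statistics.pstdev(transformed)

# Mean follows the transformation itself; SD ignores a and scales by |b|.
print(mean_y, a + b * mean_x)
print(sd_y, abs(b) * sd_x)
```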
What is the outcome of a nonlinear transformation?
It will alter the shape in unpredictable ways
What is a sampling distribution?
A frequency distribution of a statistic, computed over an infinite number of samples drawn from the population
*theoretical
What is standard error?
Standard deviation of a sampling distribution
*theoretical
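Because both ideas are theoretical, they can only be approximated by simulation. A sketch under stated assumptions: a normal population with made-up mean and SD, the sample mean as the statistic, and 5000 samples standing in for the infinite number.

```python
import math
import random
import statistics

rng = random.Random(0)            # fixed seed so the run is repeatable
mu, sigma, n = 100.0, 15.0, 25    # hypothetical population and sample size
n_samples = 5000                  # finite stand-in for infinitely many samples

# One sample mean per simulated sample: an approximate sampling distribution.
sample_means = [
    statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(n_samples)
]

empirical_se = statistics.pstdev(sample_means)   # SD of the sampling distribution
theoretical_se = sigma / math.sqrt(n)            # known result for the mean
print(round(empirical_se, 2), theoretical_se)
```

The simulated value should sit close to sigma / sqrt(n), the textbook standard error of the mean.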
What are confidence intervals?
Upper and lower bounds for a population parameter, computed from a sample. Shows the range of values in which the parameter plausibly lies.
e.g., 95% confidence interval = over repeated sampling, 95% of the intervals constructed this way will contain the population parameter. It is not a 95% chance that another sample will have the obtained value.
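The repeated-sampling reading of "95%" can be illustrated by simulation: build an interval from each of many samples and count how often the interval captures the population mean. This is a sketch only, assuming a normal population with a known SD (so a z-based interval applies); all numbers are made up.

```python
import math
import random
import statistics

rng = random.Random(1)
mu, sigma, n = 50.0, 10.0, 30     # hypothetical population; sigma treated as known
z = statistics.NormalDist().inv_cdf(0.975)   # about 1.96 for a 95% interval
half_width = z * sigma / math.sqrt(n)

n_samples = 4000
hits = 0
for _ in range(n_samples):
    sample_mean = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
    lower, upper = sample_mean - half_width, sample_mean + half_width
    if lower <= mu <= upper:      # did this interval capture the parameter?
        hits += 1

coverage = hits / n_samples
print(coverage)                   # should land near 0.95
```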
What are the 10 steps in hypothesis testing?
- Data analysis
- Specify hypothesis pair
- Specify population distribution
- Deduce assumptions
- Formulate if/then link
- Type 1 Error control
- Type 2 Error control
- Decision Rule
- Employ a procedure to make a decision about hypothesis pair
- If H0 rejected, estimate effect size
What is the purpose of inferential testing?
To render a decision as to which of the two (H0 or H1) is in fact the state of nature at the moment the procedure is employed, i.e., to make a correct binary decision regarding the extant state of nature.
What are parametric approaches?
Begin with modeling the distribution of X in P by a theoretical density function
Makes an assertion regarding the parent population distribution
Makes assumptions regarding the parent population distribution
What are the advantages of parametric approaches?
You know what is going on regarding the distribution, error rates, power, etc.
What are the disadvantages of parametric approaches?
They invoke artificiality
What are non-parametric approaches?
No claims made regarding X in P
What are the disadvantages of non-parametric approaches?
- Distributional results are shaky
- Because there is no structure, it is unclear what is being tested: means? variance? shape?
Which has greater strength of inference: parametric or non-parametric approaches?
Parametric have greater strength of inference
What is an assumption?
An assumption is any potentially false assertion made about the hypothesis pair, save for H0 and H1. An assumption is a side condition, asserted solely for the purpose of deriving a statistical method for testing a hypothesis.
What is another term for an assumption?
“nuisance conditions”
What are the typical parametric assumptions?
Normality and homogeneity of variance
What is an IF-THEN link?
The test statistic component of inferential machinery. Establishes what to expect at the level of the sample when H0 happens to be true.
What is the structure of an IF-THEN link?
Antecedent: If H0 is true and assumptions are met
Consequent: THEN [Test stat] will pile up as [insert null distribution]
What is a test statistic?
A statistic, computed from the sample, that is employed to make a decision regarding the hypothesis pair
What is the null distribution?
The complete set of values of the test statistic, over an infinity of samples, under the condition that H0 is true.
A sampling distribution of the statistic observed under the condition that H0 is, in fact, the extant state of nature
i.e., if H0 were true, and you calculated statobs for an infinite number of samples, then plotted those observed statistics, you would get the null distribution
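The thought experiment in this card can be approximated in code. A sketch, assuming a one-sample z statistic, a normal population in which H0 (mu = mu0) is true, and a known SD; 5000 samples stand in for the infinity of samples.

```python
import math
import random
import statistics

rng = random.Random(2)
mu0, sigma, n = 0.0, 1.0, 20      # H0: mu = mu0; sigma assumed known

def z_stat(sample):
    """Test statistic: standardized distance of the sample mean from mu0."""
    return (statistics.fmean(sample) - mu0) / (sigma / math.sqrt(n))

# Sample repeatedly from a population in which H0 is true.
null_stats = [
    z_stat([rng.gauss(mu0, sigma) for _ in range(n)])
    for _ in range(5000)
]

# IF H0 is true and assumptions are met, THEN z piles up as standard normal.
print(round(statistics.fmean(null_stats), 2),
      round(statistics.pstdev(null_stats), 2))
```

The simulated statistics should centre near 0 with an SD near 1, which is the consequent of the IF-THEN link for this particular statistic.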
What is statobs?
The value of the test statistic obtained from one sample
Type 1 error
Rejecting H0 when H0 is true; its probability is alpha
Type 2 error
Retaining H0 when H0 is false; its probability is beta
What is Beta?
The probability of a Type 2 error
What is a sound inferential procedure?
One which yields correct decisions with high probability. We determine whether a procedure is sound by checking whether: 1) assumptions are reasonable; 2) probability of type 1 error has been made small; 3) probability of type 2 error has been made small
What are the two forms of decision rules?
Probability-probability
Point-point
What is the point-point decision rule?
Compares statobs with a critical value
If crit1 < statobs < crit2, then retain H0, else reject
What is a critical value?
A value used to partition the X axis into sections (acceptance and rejection regions)
How do you choose critical values?
The basis of choosing critical values, and thence forming a decision rule, rests on considerations of type 1 error control, and a sound inferential procedure
How do you control Type 1 error?
Choice of critical values - this is done by fixing alpha to a low value.
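A sketch of the point-point rule with critical values chosen by fixing alpha. It assumes a two-sided test whose null distribution is standard normal, so the region between the two critical values is the retention region; the function name is hypothetical.

```python
from statistics import NormalDist

alpha = 0.05                                   # Type 1 error rate fixed in advance
crit_high = NormalDist().inv_cdf(1 - alpha / 2)
crit_low = -crit_high

def point_point_decision(stat_obs):
    """Retain H0 if stat_obs falls between the critical values, else reject."""
    if crit_low < stat_obs < crit_high:
        return "retain H0"
    return "reject H0"

print(round(crit_high, 2))         # 1.96
print(point_point_decision(0.5))   # retain H0
print(point_point_decision(2.5))   # reject H0
```

Lowering alpha pushes the critical values outward, which shrinks the rejection region and with it the probability of a Type 1 error.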
What is the probability-probability rule?
Compare Pobs to alpha
If Pobs is less than or equal to alpha, reject H0, else retain
What is Pobs?
The probability of drawing, from the null distribution, a value as or more extreme than statobs
It is a nonlinear transformation of statobs
Smaller Pobs, less in keeping with H0
Larger Pobs, more in keeping with H0
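A sketch of Pobs and the probability-probability rule, assuming a two-sided test with a standard normal null distribution. The rule coded here rejects H0 when Pobs is at or below alpha; function names are hypothetical.

```python
from statistics import NormalDist

def p_obs(stat_obs):
    """Two-sided probability of a value as or more extreme than stat_obs."""
    return 2 * (1 - NormalDist().cdf(abs(stat_obs)))

def probability_decision(stat_obs, alpha=0.05):
    """Reject H0 when p_obs is at or below alpha, else retain."""
    return "reject H0" if p_obs(stat_obs) <= alpha else "retain H0"

for stat in (0.5, 1.0, 2.5, 3.0):
    print(stat, round(p_obs(stat), 4), probability_decision(stat))
```

The loop shows the nonlinear link: as statobs moves away from 0, Pobs shrinks, and the decision flips once Pobs drops to alpha.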
How do you control for Type II error?
- Change alpha (a larger alpha gives a smaller beta)
- Increase sample size
How can H0 be false?
There are an infinite number of ways that H0 can be false.
For each and every possible departure from H0, there is an associated probability of a type II error
What is phiprime*?
To consider the control of type II error, we require an effect size: phiprime*
phiprime* quantifies possible degrees of departure from H0
Possible degrees by which H0 may be incorrect
What is power?
The probability that the procedure will detect a departure from H0 of phiprime*, should such a departure exist when the procedure is employed
What is the aim of Type II error control?
Maximizing power for each phiprime*, the detection of which is of interest
Describe power analysis
- Nominate one or more phiprime* that (should any of these be the state of nature when the procedure is employed) you are interested in detecting
- Calculate power for each
- Forward looking: consider in advance of employing a test procedure, how the procedures would do in a probabilistic sense, if nature happens to “serve up”, when the procedure is employed, a departure of phiprime*
How do you decide on a phi’* of interest?
- Knowledge of your research area
- Cohen’s guidelines
What are the two power “games”?
- calculation of power for a fixed sample size, alpha and phi’*, the detection of which is of interest to us
- Calculation of sample size to yield desired power(phi’*), for fixed alpha and phi’*, the detection of which is of interest
What does power game one involve?
calculation of power for a fixed sample size, alpha and phi’*, the detection of which is of interest to us
- Use Keisan to evaluate the relevant area, power(phi’*), under the relevant alternative distribution
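Power game one can also be sketched in Python instead of an online calculator. This assumes a two-sided one-sample z test and treats phi’* as a standardized mean difference, so the alternative distribution is a normal curve shifted by phi’* times sqrt(n); neither assumption comes from the cards, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def power_two_sided_z(effect_size, n, alpha=0.05):
    """Power of a two-sided z test to detect a standardized mean shift."""
    std_normal = NormalDist()
    crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n)   # centre of the alternative distribution
    upper_area = 1 - std_normal.cdf(crit - shift)   # beyond the upper critical value
    lower_area = std_normal.cdf(-crit - shift)      # beyond the lower critical value
    return upper_area + lower_area

print(round(power_two_sided_z(0.5, 20), 3))              # baseline
print(round(power_two_sided_z(0.5, 40), 3))              # larger sample
print(round(power_two_sided_z(0.5, 20, alpha=0.10), 3))  # larger alpha
```

Comparing the three lines shows the two levers for Type II error control: power rises with a larger sample and with a larger alpha.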
What is the issue with power game 1?
Does not ensure the procedure is sound
Why would we use power game 1?
If our sample size is fixed or we are calculating for another researcher’s analysis
What does power game 2 involve?
Working backwards
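"Working backwards" can be sketched as a search: fix alpha and the effect size of interest, then find the smallest sample size whose power reaches the target. As before, this assumes a two-sided one-sample z test with phi’* treated as a standardized mean difference; function names and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def power_two_sided_z(effect_size, n, alpha=0.05):
    """Power of a two-sided z test to detect a standardized mean shift."""
    std_normal = NormalDist()
    crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n)
    return (1 - std_normal.cdf(crit - shift)) + std_normal.cdf(-crit - shift)

def smallest_n(effect_size, target_power, alpha=0.05, max_n=100000):
    """Smallest sample size whose power meets or exceeds target_power."""
    for n in range(2, max_n + 1):
        if power_two_sided_z(effect_size, n, alpha) >= target_power:
            return n
    raise ValueError("target power not reached within max_n")

n_needed = smallest_n(effect_size=0.5, target_power=0.80)
print(n_needed, round(power_two_sided_z(0.5, n_needed), 3))
```

A plain loop is used because power increases with n for a fixed nonzero effect, so the first n that reaches the target is the smallest one.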
How does the null distribution change when you have a directional hypothesis pair?
There is an infinity of values associated with the null state of nature
How do you control type II error with a directional hypothesis?
Because the alternative distribution shifts to the right, use the positive critical value when calculating power
How do you fix alpha for a directional hypothesis?
Use the boundary null - the last point in keeping with H0
What is the bivariate distribution of Y and X?
Shows how Y is dependent on X. i.e., how Y varies as a function of X
Represents an infinite number of individuals (p) in the population, each p has an X value and a Y value
What is the conditional distribution of Y given X?
The distribution of Y given that X takes on a particular value.
A population of Y scores (all Y values with some particular X value)
The number of conditional distributions depends on the number of particular X values
What is omega_sq?
Proportion of the total variance that is explained
What are the possibilities with respect to strength for a dichotomous IV and continuous DV?
Linearity not defined, and no sense talking about steepness. Therefore we cannot talk about primary and secondary strength separately.
Thus, we blend primary and secondary strength with omega_sq
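A toy illustration of omega_sq as explained variance over total variance, for a dichotomous X and continuous Y. It treats a tiny made-up set of scores as the entire population (so no sample correction is applied) and takes "explained" to mean the spread of the conditional means of Y around the grand mean; both are assumptions, not the course's stated formula.

```python
import statistics

# Made-up population: X is dichotomous (0 or 1), Y is continuous.
# Each list is one conditional distribution of Y given X.
y_given_x = {0: [1.0, 2.0, 3.0], 1: [5.0, 6.0, 7.0]}

all_y = [y for ys in y_given_x.values() for y in ys]
grand_mean = statistics.fmean(all_y)
total_variance = statistics.pvariance(all_y)

# Explained variance: squared distance of each conditional mean from the
# grand mean, weighted by that conditional distribution's share of scores.
explained_variance = sum(
    len(ys) / len(all_y) * (statistics.fmean(ys) - grand_mean) ** 2
    for ys in y_given_x.values()
)

omega_sq = explained_variance / total_variance
print(round(omega_sq, 3))   # 0.857
```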
What are the focal factors?
The factors of interest