Descriptive Statistics, Inferential Statistics & Statistical Power Flashcards
What are the three measures of central tendency that describe the middle values of a data set?
Mode
▪ The most frequently occurring value
Median
▪ The middle score (also called the 50th percentile) of all the data once they are ordered
- The median reflects only the number of scores in the data set; it does not depend on the magnitudes of the values
Mean
▪ Arithmetic average of all data points
- Reflects both the number of scores and the values of all scores
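A minimal Python sketch of the three measures, using only the standard library `statistics` module. The scores are made up for illustration. It also checks the point above: replacing the highest score with an extreme value moves the mean but leaves the median alone.

```python
import statistics

# Hypothetical scores, invented for illustration
scores = [2, 3, 3, 5, 7]

mode = statistics.mode(scores)      # most frequently occurring value
median = statistics.median(scores)  # middle value once the data are ordered
mean = statistics.mean(scores)      # arithmetic average of all data points

# Replace the highest score with an extreme value:
# the median is unchanged, the mean is pulled upward.
extreme = [2, 3, 3, 5, 70]
median_extreme = statistics.median(extreme)
mean_extreme = statistics.mean(extreme)

print(mode, median, mean)            # 3 3 4
print(median_extreme, mean_extreme)  # 3 16.6
```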
What are the three measures of variability that describe the spread or dispersion of the data?
Range & Inter-quartile Range (IQR)
Range: The spread between the highest and lowest values of data
IQR: The spread between the 75th percentile and the 25th percentile scores
Both consider only two points to determine variability
Variance (variability; how close the data are to the mean) (V or s² or σ²)
Variance uses the squared difference (d) of each score (X) from the mean (X̄) to estimate the spread of the data
Standard Deviation (s or σ)
The square root of the variance (V)
Roughly the typical distance of the scores from the mean, expressed in the original units of the data
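A short Python sketch of these measures of variability, assuming made-up scores. The variance here divides by N (the population form). Note that quartile conventions differ between tools, so the IQR from `statistics.quantiles` may differ slightly from other software.

```python
import statistics

# Hypothetical scores, invented for illustration
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: spread between the highest and lowest values
data_range = max(scores) - min(scores)

# IQR: spread between the 75th and 25th percentile scores
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Variance: average squared difference (d) of each score from the mean
mean = statistics.mean(scores)
variance = sum((x - mean) ** 2 for x in scores) / len(scores)

# Standard deviation: square root of the variance
sd = variance ** 0.5

print(data_range, variance, sd)  # 7 4.0 2.0
print(iqr)
```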
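What is the difference between the population and sample standard deviation?
Samples rarely capture the extreme values that are present in a population, so the standard deviation computed from a sample tends to underestimate the population value.
Therefore, when dealing with sample data, you subtract 1 from your sample size to derive the degrees of freedom (n − 1) and divide by that instead of n, which makes the estimate slightly larger.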
Degrees of freedom = Number of independent pieces of information that go into the estimate of a parameter
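A small Python sketch of the n versus n − 1 distinction, with an invented sample. The hand-computed values are compared against `statistics.pstdev` (divides by n) and `statistics.stdev` (divides by n − 1).

```python
import statistics

# Hypothetical sample, invented for illustration
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)
mean = sum(sample) / n

# Sum of squared differences from the mean
ss = sum((x - mean) ** 2 for x in sample)

sd_n = (ss / n) ** 0.5                # population formula: divide by n
sd_n_minus_1 = (ss / (n - 1)) ** 0.5  # sample formula: divide by degrees of freedom

print(sd_n, statistics.pstdev(sample))
print(sd_n_minus_1, statistics.stdev(sample))
```

Dividing by n − 1 always gives the larger value, which offsets the tendency of a sample to underestimate the spread.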
What is statistical significance?
Statistical significance implies that we are reasonably confident that an effect in our sample is big enough compared to any error in our estimate that it reflects the true state of the population
True or False: The bigger my sample size the more confident I am that even a small difference between samples will be present in the population.
True
What is common about all of these statistics used to make different inferences about a population?
All represent a ratio of (or a value scaled as a function of) the variance that is explained (the effect) to sampling error (i.e., variation that could have occurred due to chance)
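As one example of such a ratio, here is a hedged Python sketch of an independent-samples t statistic computed by hand: the difference between group means (effect) divided by its standard error (sampling error). The unpooled (Welch) form of the standard error is an assumption chosen for illustration, and the group data are invented.

```python
import statistics

# Hypothetical groups, invented for illustration
control = [10, 12, 11, 13, 9, 11]
treatment = [13, 15, 14, 16, 12, 14]

# Numerator: the effect (difference between group means)
effect = statistics.mean(treatment) - statistics.mean(control)

# Denominator: sampling error (standard error of the difference)
standard_error = (
    statistics.variance(treatment) / len(treatment)
    + statistics.variance(control) / len(control)
) ** 0.5

t_stat = effect / standard_error
print(effect, round(standard_error, 3), round(t_stat, 3))
```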
What is the difference between statistical significance and effect size?
▪ Statistical significance – whether something is likely true about the population
▪ Effect Size – The magnitude of the difference in standardized units (in other words, how many standard deviations the experimental group mean is from the control group mean)
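A minimal Python sketch of a standardized effect size. It assumes Cohen's d with a pooled standard deviation, which is one common choice and not necessarily the one intended by these notes. The function name `cohens_d` and the data are hypothetical.

```python
import statistics

def cohens_d(experimental, control):
    """Difference between group means in pooled standard deviation units."""
    n1, n2 = len(experimental), len(control)
    v1 = statistics.variance(experimental)
    v2 = statistics.variance(control)
    # Pooled variance weights each group's variance by its degrees of freedom
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(experimental) - statistics.mean(control)) / pooled_sd

# Hypothetical groups, invented for illustration
control = [10, 12, 11, 13, 9, 11]
treatment = [13, 15, 14, 16, 12, 14]

d = cohens_d(treatment, control)
print(round(d, 3))  # experimental mean is about 2.1 standard deviations above control
```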
As your degrees of freedom in a sample increase, the critical value your t-statistic needs to exceed decreases. What does this mean?
It is easier to find significance when you have a bigger sample size
Therefore, if I perform the same experiment using 100 different random samples, I will get the same result more consistently if the sample is bigger→the true result becomes more likely
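A hedged simulation sketch of this idea in Python. It repeats a one-sample experiment many times at two sample sizes and counts how often |t| exceeds the two-tailed critical value at α = 0.05. The critical values (2.571 for 5 degrees of freedom, 2.042 for 30) come from a standard t table; the true mean of 0.5, the standard deviation of 1, the number of repetitions, and the function name are assumptions made for illustration.

```python
import random
import statistics

# Two-tailed critical t values at alpha = 0.05, keyed by degrees of freedom
T_CRIT = {5: 2.571, 30: 2.042}

def significant_fraction(n, true_mean=0.5, reps=1000, seed=0):
    """Fraction of simulated experiments in which |t| exceeds the critical value."""
    rng = random.Random(seed)
    crit = T_CRIT[n - 1]  # degrees of freedom = n - 1
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        se = statistics.stdev(sample) / n ** 0.5
        t_stat = statistics.mean(sample) / se  # test against a mean of 0
        if abs(t_stat) > crit:
            hits += 1
    return hits / reps

rate_small = significant_fraction(n=6)   # df = 5
rate_large = significant_fraction(n=31)  # df = 30
print(rate_small, rate_large)
```

The larger sample reaches significance in a much larger share of the repeated experiments, both because the critical value is lower and because the standard error is smaller.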
What is statistical power?
The likelihood that a statistical analysis of a sample will find an effect when there is an effect to be found in the population
(i.e., when the control and experimental distributions overlap, there is a portion of the experimental distribution in which the experimental mean can fall and still not be considered different from the control mean)
What are the 3 factors that influence statistical power for a test of difference?
Effect Size
Sample Size
Significance level, α (the p-value threshold, i.e. 1 − level of confidence)
Power = Effect Size | Sample Size | Level of Confidence
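A rough Python sketch of how the three factors combine, assuming a two-group comparison and a normal (z) approximation in place of the t distribution. The function name `approx_power` and the example numbers are hypothetical, and the tiny contribution from the opposite tail is ignored.

```python
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-group test of difference."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value set by the level of confidence
    # Expected size of the test statistic given the effect size and sample size
    noncentrality = effect_size * (n_per_group / 2) ** 0.5
    return z.cdf(noncentrality - z_crit)

baseline = approx_power(0.5, 64)
bigger_effect = approx_power(0.8, 64)
bigger_sample = approx_power(0.5, 100)
lower_confidence = approx_power(0.5, 64, alpha=0.10)

print(round(baseline, 3))
print(round(bigger_effect, 3), round(bigger_sample, 3), round(lower_confidence, 3))
```

Under these assumptions, power rises with a larger effect size, a larger sample size, or a lower level of confidence (larger α).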
What are the two ways statistical power is generally used?
Post-hoc→How likely was I to detect an effect if there was one to find? (i.e. 1 − the probability of a Type II error)
A priori→Given an expected effect size, a level of confidence, and a desired level of statistical power, what sample size do I need to achieve statistical significance?
The conventional desired statistical power is 0.8
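A minimal a priori sketch in Python, assuming a two-group comparison and a normal approximation: n per group ≈ 2 × ((z for the level of confidence + z for the desired power) / effect size)². The function name `n_per_group` is hypothetical. Exact t-based calculations in dedicated power software give slightly larger numbers.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-tailed test of difference."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # from the level of confidence
    z_power = z.inv_cdf(power)          # from the desired statistical power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

print(n_per_group(0.5))  # 63 per group for a medium expected effect
print(n_per_group(0.2))  # a smaller expected effect needs a much larger sample
```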
So you’ve designed a great experiment, determined sample size, collected your data and run the appropriate statistical test but still might not achieve the “true” conclusion about the population.
Why?
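Some error is always possible: a Type I error (finding an effect that is not present in the population) or a Type II error (missing an effect that is present)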
What are the two types of inferential statistics?
Parametric→Makes assumptions about the parameters of the population distribution from which the sample is drawn
Non-parametric→Does not assume a particular shape for the population distribution
What is the normal distribution and its key assumption?
Parametric statistics assume that the variable sampled (e.g. height) is drawn from a population in which it is normally distributed.
If the sample is random and large enough, then the sample should also be approximately normal (key assumption of parametric tests!)
What is skewness, and what are its 3 types?
Asymmetry in the distribution of a set of observations
Normal: Tails are symmetrical with high and low scores equally distributed around the center
Positive: Tail is pulled to right by extreme high scores
Negative: Tail is pulled to left by extreme low scores
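A short Python sketch that puts a number on the three cases, assuming the moment-based skewness coefficient (third central moment divided by the cubed standard deviation). The function name `skewness` and the data sets are invented for illustration.

```python
def skewness(values):
    """Moment-based skewness: 0 if symmetric, > 0 right tail, < 0 left tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]       # high and low scores balanced around the center
positive = [1, 2, 2, 3, 20]       # extreme high score pulls the tail right
negative = [-20, -3, -2, -2, -1]  # extreme low score pulls the tail left

print(skewness(symmetric), skewness(positive), skewness(negative))
```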