Stats - Week 2 Flashcards
When describing distributions, what do we look at?
Measures of shape (kurtosis and skewness), central tendency (mean, median, and mode), and “spread” or variation (range, variance, and standard deviation)
Distribution shapes!
Normal distribution, Positive skew (scores pile up on the left, tail to the right), Negative skew (scores pile up on the right, tail to the left), Leptokurtic (positive kurtosis), Platykurtic (negative kurtosis)
Measures of central tendency
Estimate the “center” of our data: mode, median, and mean
MODE:
most frequent score in a distribution. A distribution can be bimodal or multimodal.
MEDIAN:
Middle score or 50th percentile
Arrange the scores in ascending order
Median = middle score if # of scores is odd
(average of middle 2 scores if # of scores is even)
MEAN:
Arithmetic average of the scores in a distribution.
The symbol for the mean of a population is μ (mu).
The symbol for the mean of a sample is x̄ (x-bar). Both are the sum (Σ) of the scores divided by the number of scores.
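A minimal sketch of the three measures of central tendency, using Python's standard `statistics` module. The scores are hypothetical and chosen only for illustration.

```python
from statistics import mean, median, multimode

scores = [2, 3, 3, 5, 7, 10]  # hypothetical scores, already in ascending order

modes = multimode(scores)  # every most-frequent score (a list, so bimodal data is handled)
middle = median(scores)    # even number of scores, so the average of the middle two (3 and 5)
average = mean(scores)     # sum of the scores divided by the number of scores

print(modes, middle, average)
```

`multimode` is used rather than `mode` so that a bimodal or multimodal distribution returns all of its modes.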
Which is most influenced by the skew: mean, median, or mode?
Mean
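To see why the mean is the measure most influenced by skew, the sketch below replaces one score with an extreme value and compares the mean and median. Both data sets are hypothetical.

```python
from statistics import mean, median

symmetric = [3, 4, 5, 6, 7]
skewed = [3, 4, 5, 6, 70]  # same scores, but the highest one is now extreme

# The median stays at the middle score; the mean is pulled toward the extreme score.
print("symmetric:", mean(symmetric), median(symmetric))
print("skewed:   ", mean(skewed), median(skewed))
```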
Measures of Variation Defined
The more variation in your data, the less precisely you can estimate the population’s location (e.g., mean) from the sample information.
Measures of variation are?
Range: highest score minus lowest score (e.g., hours per day spent on phone = 3, 4, 6, 7, so 7 - 3 = range of 4). Sum of squared errors, or “sum of squares”: the total squared deviation from the mean (subtract the mean from every data point, square each difference, then add them all together). Without squaring, the negative and positive deviations would cancel out and hide the variation.
Range
Highest score minus lowest score (e.g., hours per day spent on phone = 3, 4, 6, 7, so 7 - 3 = range of 4)
sum of squared errors “sum of squares”
Gives the total squared deviation from the mean.
(I.e., subtract the mean from every data point, square each difference to get rid of the negatives, then add them all together. Without squaring, the negative and positive deviations would cancel out and hide the variation.)
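The range and sum of squares can be worked through on the phone-hours data from the cards (3, 4, 6, 7). This is a small sketch; the variable names are only illustrative.

```python
hours = [3, 4, 6, 7]  # hours per day spent on phone, from the example above

data_range = max(hours) - min(hours)            # 7 - 3

mean_hours = sum(hours) / len(hours)            # 20 / 4 = 5.0
deviations = [x - mean_hours for x in hours]    # -2, -1, 1, 2 (these sum to zero)
sum_of_squares = sum(d ** 2 for d in deviations)  # 4 + 1 + 1 + 4

print(data_range, sum(deviations), sum_of_squares)
```

The raw deviations add up to zero, which is why they are squared before being summed.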
Variance
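The average of the squared deviations: the sum of squares divided by N (or by N - 1 when estimating from a sample).
◦Is never negative (it is zero only when every score is identical)
◦Accentuates the extreme differences (because the deviations are squared)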
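A short sketch of variance using the standard `statistics` module, again on the phone-hours data. `pvariance` divides the sum of squares by N, while `variance` divides by N - 1 (the usual choice for a sample).

```python
from statistics import pvariance, variance

hours = [3, 4, 6, 7]  # sum of squares around the mean is 10

population_variance = pvariance(hours)  # 10 / 4
sample_variance = variance(hours)       # 10 / 3

print(population_variance, sample_variance)
```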
Standard Deviation
The standard deviation of a random variable, sample, statistical population, data set, or probability distribution is the square root of its variance.
Why? It is a measure of variation expressed in the same units as the original data, so it is used more often than the variance.
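The sketch below takes the square root of the variance to get the standard deviation, and checks it against `statistics.pstdev`. The units in the comments assume the phone-hours example.

```python
from math import sqrt
from statistics import pstdev, pvariance

hours = [3, 4, 6, 7]

variance_hours = pvariance(hours)  # 2.5, in squared hours
sd_hours = sqrt(variance_hours)    # about 1.58, back in hours

print(round(sd_hours, 2), round(pstdev(hours), 2))
```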
Z-scores (or “standard scores”)
Tell how many standard deviations a raw score is from the mean.
e.g., z = 1.96 means 1.96 SDs above the sample mean.
I.e., deviations from the mean in SD units.
Any standardized variable has a mean = 0 and SD (& variance) = 1
(Does NOT mean variable is normally distributed)
Permits a standard way to compare across scores / measures
Z-scores allow us to determine probabilities (when the scores are normally distributed). E.g., the probability of a randomly selected student passing a class.
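A sketch of standardizing scores and turning a z-score into a probability. The exam scores and the passing mark of 60 are hypothetical, and the probability step assumes the scores are normally distributed.

```python
from statistics import NormalDist, mean, pstdev

scores = [55, 60, 65, 70, 75, 80, 85]  # hypothetical exam scores

m = mean(scores)     # 70
sd = pstdev(scores)  # 10
z_scores = [(x - m) / sd for x in scores]  # deviations from the mean in SD units

# Probability that a randomly selected student scores at least 60 (hypothetical pass mark),
# assuming a normal distribution with this mean and SD.
z_pass = (60 - m) / sd
p_pass = 1 - NormalDist().cdf(z_pass)

print(z_scores)
print(round(p_pass, 3))
```

The standardized scores have a mean of 0 and an SD of 1 whatever the shape of the original distribution; only the probability step relies on normality.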
Null Hypothesis Significance Testing (NHST) Steps
Step 1: State the hypothesis.
Step 2: Set the criterion for rejecting the null hypothesis.
Step 3: Compute the test statistic.
Step 4: Decide whether to reject the null hypothesis.
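The four steps can be sketched with a one-sample z-test. All the numbers here (population mean 100, SD 15, sample mean 106, n = 25) are hypothetical, and the z-test is just one convenient test statistic for the illustration.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: state the hypotheses.
# H0: population mean = 100; H1: population mean > 100 (directional).
mu_0, sigma = 100, 15     # hypothetical population values under H0
sample_mean, n = 106, 25  # hypothetical sample result

# Step 2: set the criterion for rejecting H0 (alpha and its one-tailed critical value).
alpha = 0.05
z_critical = NormalDist().inv_cdf(1 - alpha)  # about 1.645

# Step 3: compute the test statistic.
z = (sample_mean - mu_0) / (sigma / sqrt(n))  # 6 / 3 = 2.0

# Step 4: decide whether to reject H0.
reject_null = z > z_critical

print(round(z, 2), round(z_critical, 3), reject_null)
```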
Null hypothesis Defined
No relationship between variables (in the population!)
Or, no difference between groups
Alternative hypothesis (H1)
The null (no relationship) is not true. (non-directional hypothesis).
Or, there is a relationship, or difference between groups in one direction (a directional hypothesis).
E.g., “I/O is better than Clinical”
Alpha (α) level and p-value
Alpha (α) is the probability of a Type I error that we are willing to accept (e.g., 5%). It is set before looking at the data.
A p-value, or probability value, is the probability of getting data at least as extreme as yours if the null hypothesis is true. It is not the probability that the null hypothesis is true.
A p-value is a number between 0 and 1. The smaller the p-value, the stronger the evidence against the null hypothesis; we reject the null when p is less than α.
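A sketch of how a p-value is obtained from a z statistic, assuming the statistic follows a standard normal distribution when the null hypothesis is true. The value z = 2.0 is hypothetical.

```python
from statistics import NormalDist

z = 2.0       # hypothetical test statistic
alpha = 0.05  # criterion chosen in advance

standard_normal = NormalDist()  # mean 0, SD 1
p_one_tailed = 1 - standard_normal.cdf(z)             # area beyond z in one tail
p_two_tailed = 2 * (1 - standard_normal.cdf(abs(z)))  # area beyond |z| in both tails

significant = p_two_tailed < alpha

print(round(p_one_tailed, 4), round(p_two_tailed, 4), significant)
```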
Type I error
Wrongly rejecting the null hypothesis.
(Concluding that “there is a relationship” or “two groups differ” in the population, when it just ain’t so.)
E.g., telling a man he is pregnant.
Set Criterion for Rejecting H0
Statistical Significance
We decide how much risk to tolerate.
Traditional cutoff for statistical significance = .05 / .01 / .001
Type II error:
Wrongly failing to reject the null hypothesis.
(Saying “no difference”/“no relationship” when there is one in the population.)
β (beta)= probability of making a Type II error
Statistical power = (1 – β).
Probability of rejecting the null hypothesis (H0) when it is false
We want an “80% chance of finding an effect, assuming there is one in the population” (power of .80)
E.g., telling a pregnant woman she isn’t pregnant.
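A rough sketch of how β and power relate, using a one-tailed z-test. The helper `power_one_tailed_z`, the standardized effect size of 0.5, and n = 25 are all hypothetical choices for illustration, not values from the cards.

```python
from math import sqrt
from statistics import NormalDist

def power_one_tailed_z(effect_size, n, alpha=0.05):
    """Power of a one-tailed z-test for a standardized effect size (illustrative helper)."""
    z_critical = NormalDist().inv_cdf(1 - alpha)
    # If H0 is false, the z statistic is centred on effect_size * sqrt(n) instead of 0.
    beta = NormalDist().cdf(z_critical - effect_size * sqrt(n))  # P(Type II error)
    return 1 - beta

power = power_one_tailed_z(effect_size=0.5, n=25)
beta = 1 - power

print(round(power, 2), round(beta, 2))
```

With these assumed values the power comes out close to the conventional .80 target; a smaller sample or a smaller effect would lower it.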
Critical value
Indicates the region of rejection:
Values (of a statistic) of the sampling distribution that are improbable if the null is true.
E.g., z > 1.65 (one-tailed) or z > 1.96 or z < -1.96 (two-tailed), each for α = .05
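The critical values 1.65 and 1.96 can be recovered from the standard normal distribution. This sketch assumes α = .05 (an assumption made for the illustration) and uses `NormalDist.inv_cdf` from the standard library.

```python
from statistics import NormalDist

alpha = 0.05  # assumed significance level

# One-tailed: all of alpha sits in one tail.
one_tailed_critical = NormalDist().inv_cdf(1 - alpha)
# Two-tailed: alpha is split evenly between the two tails.
two_tailed_critical = NormalDist().inv_cdf(1 - alpha / 2)

def in_rejection_region(z, two_tailed=True):
    """Return True if z falls in the region of rejection (illustrative helper)."""
    if two_tailed:
        return abs(z) > two_tailed_critical
    return z > one_tailed_critical

print(round(one_tailed_critical, 3), round(two_tailed_critical, 3))
```

A z of 1.8, for example, would fall in the rejection region for the one-tailed test but not for the two-tailed test.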
Directional H1 is a one- or two-tailed test?
One-tailed test
Nondirectional H1 is a one- or two-tailed test?
Two-tailed test
Compute test statistic
- Some ratio of MODEL / ERROR
(variance explained by our model / unexplained variance), e.g., z, t-ratio, F-ratio, chi-square (we’ll compute these later).
- Compare the test statistic value (e.g., z) to the critical value.
e.g., is the test statistic > 1.65?
Correlation/regression
Is the relationship significantly different from zero?
t-test/ANOVA
Is the difference between groups significantly different from zero?
Correlation defined
How similar or how related your variables are. Do these variables “walk together,” i.e., vary in similar ways? As one increases, does the other increase? Is there a relationship? An index of the linear relatedness of two variables.
Ranges between -1 and +1
The sign of the correlation coefficient indicates?
- whether the relationship is positive or negative.
- Positive = as x increases, y increases. Negative = as x increases, y decreases.
Absolute value of the coefficient indicates?
- How strong the relationship is. Pearson’s r is the coefficient we use to show that relationship.
- |r| = how strong the relationship is, from 0 (no linear relationship) to 1 (perfect linear relationship). The sign only gives the direction.
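A minimal sketch of Pearson's r computed from deviations around the means. The function `pearson_r` and the small data sets are hypothetical, written out by hand so the sign and the absolute value are easy to inspect.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: shared variation divided by total variation (illustrative helper)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross_products / sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises with x: perfect positive relationship
y_down = [10, 8, 6, 4, 2]  # falls as x rises: perfect negative relationship
y_noisy = [2, 1, 4, 3, 5]  # generally rises with x, but not perfectly

for y in (y_up, y_down, y_noisy):
    r = pearson_r(x, y)
    print(round(r, 2), "strength:", round(abs(r), 2))
```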
Positive Skewness
When the tail on the right side of the distribution is longer or fatter than the tail on the left side. The mean and median will typically be greater than the mode.
Negative Skewness
When the tail on the left side of the distribution is longer or fatter than the tail on the right side. The mean and median will typically be less than the mode.
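A sketch of a skewness coefficient, computed here as the average cubed deviation divided by the SD cubed (the population moment formula, one of several in use). The function name and data sets are hypothetical.

```python
def skewness(data):
    """Population skewness: third central moment over the SD cubed (illustrative helper)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n  # variance
    m3 = sum((x - m) ** 3 for x in data) / n  # average cubed deviation keeps the sign
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]   # long tail on the right
left_tailed = [-x for x in right_tailed]   # mirror image: long tail on the left
symmetric = [1, 2, 3, 4, 5]

print(round(skewness(right_tailed), 2))  # positive
print(round(skewness(left_tailed), 2))   # negative
print(round(skewness(symmetric), 2))     # zero
```

In the right-tailed data the mode and median are both 3 while the mean is 3.5, so the extreme score pulls the mean toward the tail.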
Kurtosis
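A measure of how heavy the tails of the distribution are, i.e., how prone it is to outliers. Positive (leptokurtic) = heavier tails; negative (platykurtic) = lighter tails.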
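A sketch of excess kurtosis, computed as the average fourth-power deviation divided by the squared variance, minus 3 so that a normal distribution scores 0. The function name and data sets are hypothetical.

```python
def excess_kurtosis(data):
    """Population excess kurtosis: fourth central moment over variance squared, minus 3."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n  # fourth power weights outliers heavily
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]                            # evenly spread, no outliers
heavy_tails = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]   # mostly identical, two extreme scores

print(round(excess_kurtosis(flat), 2))         # negative: platykurtic
print(round(excess_kurtosis(heavy_tails), 2))  # positive: leptokurtic
```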