Data viz, z-scores, & p Flashcards
Box plots: what 4 measures can they show you?
- Box → 25th to 75th %ile (the interquartile range, IQR)
- Center line → median
- Star → mean
- Outliers show up outside of “whiskers”
How does a boxplot show you a skewed distribution?
- When the whiskers are not the same length (the longer whisker points toward the tail)
What do frq distributions show? (hint: what kind of data)
- Display raw data for ONE scale variable at a time
Why use frq distributions? (ie, what do they help us do?)
- Look for patterns
- Check for accurate data entry
- See outliers easily
- Make sure we meet assumptions of tests
How are frq distributions / histograms different than bar graphs?
- Frq dist (histogram): ONE scale variable
→ Ex. x-axis: travel time, y-axis: frequency
- Bar graph: TWO variables! One scale, one category
→ Ex. x-axis studying yes/no, y-axis exam score
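A minimal sketch of the difference, using made-up numbers (the travel times and exam scores below are hypothetical, not real data). A frq distribution counts how often each value of ONE variable occurs; a bar graph summarizes a scale variable (here, as a mean) for each category.

```python
from collections import Counter
from statistics import mean

# Frq distribution: ONE scale variable (hypothetical travel times in minutes).
travel_times = [10, 15, 15, 20, 20, 20, 25]
frequency = Counter(travel_times)  # value -> how many times it occurs

# Bar graph: one categorical variable (studied yes/no) and one scale
# variable (hypothetical exam scores). Bar height = group mean.
exam_scores = [("yes", 90), ("yes", 80), ("no", 70), ("no", 60)]
groups = {}
for studied, score in exam_scores:
    groups.setdefault(studied, []).append(score)
bar_heights = {group: mean(scores) for group, scores in groups.items()}

print(dict(frequency))  # histogram heights, one per travel time
print(bar_heights)      # bar heights, one per category
```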
Empirical vs theoretical distributions?
- Empirical: distribution of the raw data actually collected from the sample; often only approx. normal
- Theoretical: distribution based on math and logic; assumes it's a normal distribution, so it's okay to use z-scores
Normal distribution: what’s it look like? what can we do w/ it?
- Bell-shaped, unimodal, symmetrical
- Allows us to use stats like percentiles, z-scores, t tests & p values
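A rough sketch of how a z-score turns into a percentile and a p value, assuming the scores really are normally distributed. The mean of 100 and SD of 15 are hypothetical values picked for the example, and the two-tailed p is just one common choice.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """How many standard deviations x sits above or below the mean."""
    return (x - mean) / sd

# Hypothetical example: a score of 115 where M = 100 and SD = 15.
z = z_score(115, mean=100, sd=15)  # 1.0

# The standard normal curve (M = 0, SD = 1) is the theoretical distribution.
standard_normal = NormalDist(mu=0, sigma=1)

# Percentile: % of the curve that falls below this z-score (about 84).
percentile = standard_normal.cdf(z) * 100

# Two-tailed p: chance of a z at least this far from 0 in either direction.
p_two_tailed = 2 * (1 - standard_normal.cdf(abs(z)))

print(z, round(percentile, 2), round(p_two_tailed, 3))
```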
Positive skew: looks like what? Ex?
- Tail points towards the RIGHT, towards the positive end
- Ex: household income, test that was too hard
Negative skew: looks like what? Ex?
- Tail points towards the LEFT, towards the negative end
- Ex. test that was too easy
Bimodal: looks like what? What does this say about the sample? Ex?
- 2 modes, two bumps
- Usually means you really have two different populations within your sample that have different means
- Ex. scores on an exam could be bimodal for “did” and “didn’t” study groups
How does mean compare to median for normal, positive, & negative skewed distributions?
- Normal: M = median
- Positive skew: M > median
- Negative skew: M < median
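A quick check of the three cases with tiny made-up data sets (all numbers are hypothetical). The one extreme score in each skewed set drags the mean toward the tail while the median stays put.

```python
from statistics import mean, median

# Hypothetical scores for each shape.
symmetric = [1, 2, 3, 4, 5]                      # no tail
positive_skew = [20, 25, 30, 35, 40, 150]        # tail on the high end
negative_skew = [10, 120, 125, 130, 135, 140]    # tail on the low end

for name, scores in [
    ("symmetric", symmetric),
    ("positive skew", positive_skew),
    ("negative skew", negative_skew),
]:
    print(name, "M =", mean(scores), "median =", median(scores))
```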
Outliers: what are they? How do they impact M & median?
- An extreme score (very high or low compared to others)
- Can really pull the mean toward them; barely affect the median
Outliers: How do you identify them in JASP & what we do with them in an analysis?
- Scores that fall beyond the whiskers on a boxplot
- AKA more than 1.5 x IQR below the 25th %ile or above the 75th %ile
- What do we do → often remove them, but it's best to report the descriptives with and without them
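A minimal sketch of the 1.5 x IQR rule, plus reporting descriptives with and without the flagged scores. The scores are hypothetical, and `iqr_outliers` is a helper name made up for this example. Quartiles can be computed in several slightly different ways, so the cutoffs here may not exactly match what JASP reports.

```python
from statistics import mean, median, quantiles

def iqr_outliers(scores):
    """Return scores more than 1.5 x IQR below Q1 or above Q3."""
    # Assumption: the "inclusive" quartile method; other software may differ.
    q1, _q2, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in scores if x < lower_fence or x > upper_fence]

# Hypothetical scores with one extreme value.
scores = [10, 12, 13, 14, 15, 16, 18, 45]
outliers = iqr_outliers(scores)
trimmed = [x for x in scores if x not in outliers]

# Report descriptives both ways.
print("outliers:", outliers)
print("with:    M =", mean(scores), "median =", median(scores))
print("without: M =", mean(trimmed), "median =", median(trimmed))
```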
Bar graphs: What variables/levels of measurement do you use them with?
- TWO variables, usually nom/ord (categorical) IV & scale DV
Line graphs: What variables/levels of measurement do you use them with? (2 options)
- BOTH scale (often time on x-axis)
OR
- nom/ord (categorical) IV & scale DV
- ONLY when trying to highlight change