Module 5 - Descriptive Statistics Flashcards
Descriptive Statistics
- summarizing our data set to better understand and communicate important information
- helps researchers identify and communicate important characteristics about the empirical data
Raw Scores
- data resulting from our measurement procedures
- not informative on their own
- ex. listing every individual score from the quiz
instead, using descriptive statistics we could communicate performance on the quiz with a class average
Frequency Distribution
- vital for describing data
- quick way to summarize how many scores were observed at each data point
- the type of frequency distribution used depends on the level of measurement
- x axis: the observed values of the variable in question
- y axis: the frequency of each value
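A minimal sketch of building a frequency distribution, assuming a small made-up list of quiz scores (out of 10). Each distinct score is an x-axis value and its count is the y-axis frequency.

```python
from collections import Counter

# Hypothetical quiz scores out of 10 (made up for illustration)
scores = [7, 8, 8, 9, 6, 8, 7, 10, 9, 8]

# Count how many times each score was observed
freq = Counter(scores)

# Print each score (x axis) next to its frequency (y axis), lowest score first
for score in sorted(freq):
    print(score, freq[score])
```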
Bar Graphs
- used for data representing discrete categories (distinct, non-overlapping categories)
- summarizes nominal or categorical data
- can also be used for interval and ratio data, but rarely
Frequency Polygon
- graphs continuous data
- used for interval and ratio data
- not used for nominal data because nominal categories have no equal intervals between them, so the data points cannot be connected with a continuous line
- the line connecting the points assumes equal intervals between data points
Grouped Bar Graph
- taking ratio data and grouping it into categories
- grouping continuous data into class intervals (categories)
ex. quiz scores: everyone who scored 70-79% is grouped together into one bar
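A rough sketch of grouping continuous scores into 10-point class intervals, like the 70-79% bar. The scores and the interval width of 10 are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical quiz percentages
scores = [72, 75, 79, 81, 88, 64, 93, 70, 85, 77]
width = 10  # assumed size of each class interval

# Map each score to the lower edge of its interval, e.g. 72 -> 70 (the 70-79% bar)
grouped = Counter((s // width) * width for s in scores)

for low in sorted(grouped):
    print(f"{low}-{low + width - 1}%: {grouped[low]}")
```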
Frequency distribution tells us
- number of observations at each data point
- whether the data are normally distributed or skewed
Normal Distribution
- symmetrical bell curve
- ex. IQ, height, weight
- the majority of scores are in the middle, with fewer observations at the ends (extremes)
- most observations around the mean
Skewed Distribution
- scores are bunched toward one end, and a few extreme scores pull the other end out into a long tail
Positive Skew
- mean greater than the median (mean is pulled by higher scores)
- more values are clustered to the left (lower end of the scale)
- right end of the distribution (high end of the scale) gets pulled to the right and has a longer tail
- this happens when there are a few extremely high observations
Negative Skew
- mean less than the median (mean is pulled by low scores)
- more values clustered to the right (higher end of the scale)
- left end of the distribution (lower end of the scale) gets pulled to the left and has a longer tail
- this happens when there are a few extremely low observations
Measures of Central Tendency (MCT)
- Mean
- Median
- Mode
convey info about the typical observation of our data set
Mean
- most commonly used MCT
- mathematical average of our data set
- mean = sum of scores / number of scores
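The mean formula can be written directly as a small function. The quiz scores are a made-up example.

```python
def mean(scores):
    # Sum of scores divided by the number of scores
    return sum(scores) / len(scores)

quiz = [6, 7, 8, 9, 10]  # hypothetical quiz scores
print(mean(quiz))  # 8.0
```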
Mode
- Most frequent score/ observation in the data set
- peak of the frequency distribution
Bimodal Distribution
- when there are 2 peaks in the distribution, i.e. 2 scores tied for the most frequent
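A minimal sketch of finding the mode. It returns every score tied for the highest frequency, so a bimodal data set gives back 2 modes. The helper name `modes` and the data are hypothetical.

```python
from collections import Counter

def modes(scores):
    # Count each score, then keep every score tied for the highest count
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(s for s, c in counts.items() if c == top)

print(modes([6, 7, 7, 8, 9]))     # [7] -> one peak
print(modes([6, 7, 7, 8, 9, 9]))  # [7, 9] -> bimodal
```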
Median
- middle point of the distribution
- to find: list all the scores in order of magnitude; the score in the middle = median (with an even number of scores, the median is the average of the 2 middle scores)
- cuts the distribution in half: 50% of observations fall above and 50% fall below
- used less often than the mean
- use the median when data are skewed, because the mean is strongly affected by extreme scores while the median is not
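The steps for finding the median can be sketched as follows, assuming a plain list of numeric scores. With an even number of scores, this sketch averages the 2 middle scores.

```python
def median(scores):
    ordered = sorted(scores)  # list the scores in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle score
    # even count: average the 2 middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([9, 6, 8, 7, 10]))  # 8
print(median([6, 7, 8, 9]))      # 7.5
```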
The mean can only be calculated for…
Interval and Ratio Data (the median also works for ordinal data, since it only requires the scores to be ranked)
MCT and normal distribution
- Mean, median and mode are all equal
MCT and Skewed distribution
- the mean is strongly affected by extreme scores (outliers)
- the median is more informative and representative of the distribution
- positive skew: Mean > Median because the high scores pull the mean up
- negative skew: Mean < Median because the low scores pull the mean down
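A small illustration of how one extreme score pulls the mean but barely moves the median. Both data sets are made up; `statistics.mean` and `statistics.median` come from Python's standard library.

```python
from statistics import mean, median

# Hypothetical incomes in $1000s: most are modest, one is extremely high
positive_skew = [30, 32, 35, 36, 38, 40, 250]
# Hypothetical quiz scores: most are high, one is extremely low
negative_skew = [95, 92, 90, 88, 87, 85, 20]

print(mean(positive_skew), median(positive_skew))  # mean > median
print(mean(negative_skew), median(negative_skew))  # mean < median
```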
Variability
- provides us with an index of how spread out the scores are around the MCT
measures of variability
- range
- variance
- standard deviation
Range
- most basic way to represent dispersion of scores
- difference bw the largest and smallest score
- not always informative, since it only uses the 2 most extreme scores
- sensitive to outliers: one extreme score can drastically change the range of the data set
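A quick sketch showing the range's sensitivity to a single outlier. The scores are hypothetical; the function is named `score_range` to avoid shadowing Python's built-in `range`.

```python
def score_range(scores):
    # Difference between the largest and smallest score
    return max(scores) - min(scores)

quiz = [70, 72, 75, 78, 80]  # hypothetical quiz scores
with_outlier = quiz + [20]   # add one extremely low score

print(score_range(quiz))          # 10
print(score_range(with_outlier))  # 60
```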
Variance
- how much each score in the distribution varies from the mean of the distribution
- average squared deviation from the mean
Problem with Variance
- because each deviation from the mean is squared, the variance is in squared units, not the same unit of measurement as the observations
- makes it hard to interpret
- ex. if looking at quiz grades in %, the variance would be in % squared
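A minimal sketch of variance as the average squared deviation from the mean. It assumes the population formula (divide by N); many textbooks and software packages divide by N - 1 when estimating from a sample, which gives a slightly larger value.

```python
def variance(scores):
    m = sum(scores) / len(scores)
    # Squared deviation of each score from the mean
    squared_devs = [(x - m) ** 2 for x in scores]
    # Average of the squared deviations (population formula: divide by N)
    return sum(squared_devs) / len(scores)

quiz = [60, 70, 80, 90, 100]  # hypothetical quiz grades in %
print(variance(quiz))  # 200.0 (units: % squared)
```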
Standard Deviation
- Measures the dispersion of the data set relative to the mean.
- in a normal distribution, determines the interval around the mean within which a known percentage of scores will fall
- solves the problem of variance
- is the square root of the variance
- therefore puts the dispersion back on the same scale (units) as the observations
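Taking the square root of the variance brings the units back from % squared to %. This sketch reuses the same hypothetical quiz grades and assumes the population formula (divide by N).

```python
import math

def standard_deviation(scores):
    m = sum(scores) / len(scores)
    var = sum((x - m) ** 2 for x in scores) / len(scores)  # population variance
    # Square root converts % squared back into %
    return math.sqrt(var)

quiz = [60, 70, 80, 90, 100]  # hypothetical quiz grades in %
print(standard_deviation(quiz))  # about 14.14 (%)
```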
Properties of Normal Distribution
- about 68% of all observations/scores fall within ±1 SD of the mean
- about 95% of all observations fall within ±2 SD of the mean
- about 99.7% of all observations fall within ±3 SD of the mean
Smaller the Standard Deviation…
- the narrower the intervals, and the less scores vary around the mean
Larger the Standard Deviation…
- the wider the intervals, and the more scores vary around the mean
If you know the mean and standard deviation of a normal distribution, you can calculate
- the interval in which about 68, 95 or 99.7% of the scores will fall
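A sketch of computing those intervals from a known mean and SD. It assumes a normal distribution and uses the approximate 68 / 95 / 99.7% rule (the last figure is often rounded to 99%). The mean of 100 and SD of 15 are hypothetical, similar to how IQ scores are commonly scaled.

```python
def normal_intervals(mean, sd):
    # Approximate share of scores within k SDs of the mean in a normal distribution
    coverage = {1: "about 68%", 2: "about 95%", 3: "about 99.7%"}
    return {k: (mean - k * sd, mean + k * sd, pct) for k, pct in coverage.items()}

# Hypothetical scale: mean 100, SD 15
for k, (low, high, pct) in normal_intervals(100, 15).items():
    print(f"+/-{k} SD: {low} to {high} ({pct})")
```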
Data Transformation
- transform data from its original state so it can be compared to data collected with different measures
- scores on different measures cannot be compared directly, so the data have to be transformed into the same units
Z scores
- most common transformation of data
- expresses each of the scores or observations in the data set in relation to the mean and standard deviation of the entire distribution
- measures exactly how many standard deviations above or below the mean a data point is
when z scores are used
- when doing inferential statistics (converting to z scores does not change the shape of a distribution, so it does not make non-normal data normal)
- when we want to compare 2 data sets measured on different scales
Z score mean
Mean = 0
Z score Standard Deviation
SD = 1
When can Z score not be used?
- Nominal or Ordinal Data
- because they do not have a meaningful mean (or standard deviation)
equation for z score
z = (score - mean) / standard deviation
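The z score equation as a small function, assuming a hypothetical class with mean 82% and SD 5%, so a score of 73% gives z = -1.8. The second half converts a whole made-up data set to check that the z scores end up with mean 0 and SD 1 (population SD, dividing by N).

```python
def z_score(score, mean, sd):
    # Distance from the mean, expressed in standard deviation units
    return (score - mean) / sd

# Hypothetical class: mean 82%, SD 5%
print(z_score(73, 82, 5))  # -1.8 -> below the class average

# Converting a whole data set: the z scores have mean 0 and SD 1
scores = [60, 70, 80, 90, 100]
m = sum(scores) / len(scores)
sd = (sum((x - m) ** 2 for x in scores) / len(scores)) ** 0.5
zs = [z_score(x, m, sd) for x in scores]
z_mean = sum(zs) / len(zs)
z_sd = (sum((z - z_mean) ** 2 for z in zs) / len(zs)) ** 0.5
print(round(z_mean, 6), round(z_sd, 6))
```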
Z scores tell us
- Valence (sign: + or -)
- Size
Z score Valence
+Z; observed score is larger than the mean
-Z; observed score is smaller than the mean
ex. if Z=-1.8, we know the student fell below the class average
Z score size
- tells us with more precision where on the distribution the score fell
about 68% of all Z scores fall between -1 and +1 (in a normal distribution)
about 95% of all Z scores fall between -2 and +2
about 99.7% of all Z scores fall between -3 and +3
- ex. Z = -1.8 is close to -2, so we know the score fell toward the low (left) end of the distribution
Pearson Product Moment Correlation Coefficient (r)
- a type of descriptive statistic
- describes the strength and direction of the linear relationship between 2 variables based on how much they vary together; ranges from -1 to +1
- used for interval or ratio data
- can use the analogy of 2 overlapping circles: the amount the 2 circles overlap is how much variance the 2 variables share
- small overlap; correlation coefficient is small
- big overlap; correlation coefficient is big
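A minimal sketch of Pearson's r computed from deviations around each mean: the numerator captures how much the 2 variables vary together, and dividing by each variable's spread keeps r between -1 and +1. The study-hours and grade data are hypothetical.

```python
import math

def pearson_r(x, y):
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # How much x and y deviate from their means together
    co = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Spread of each variable, used to scale r into the -1 to +1 range
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return co / (sx * sy)

hours = [1, 2, 3, 4, 5]        # hypothetical hours studied
grades = [60, 65, 70, 80, 85]  # hypothetical quiz grades
print(round(pearson_r(hours, grades), 3))  # strong positive correlation
```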
Coefficient of Determination
- r squared (r²)
- proportion of variance in one variable accounted for by knowing the other variable
- allows us to make predictions: if 2 variables are highly correlated, knowing one lets us predict the other
- THE HIGHER THE r², THE BETTER OUR PREDICTIONS WILL BE
- ex. r² = 0.45: the proportion of variance accounted for is 45%
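Squaring r gives the coefficient of determination. The r of 0.67 is a hypothetical value, chosen because it squares to about 0.45.

```python
def coefficient_of_determination(r):
    # Proportion of variance in one variable accounted for by the other
    return r ** 2

r = 0.67  # hypothetical correlation
print(f"{coefficient_of_determination(r):.0%}")  # 45%
```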
If SD is small
- tall and skinny graph
- little dispersion of scores around the mean
- what we want
If SD is large
- flat and wide graph
- large dispersion of scores around the mean
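A small comparison of 2 hypothetical classes with the same mean but different spread, using the standard library's `statistics.pstdev` (population SD). The first would plot as a tall, skinny distribution, the second as a flat, wide one.

```python
from statistics import mean, pstdev

# Two hypothetical classes with the same average quiz score
consistent = [78, 79, 80, 81, 82]   # scores close together -> small SD
spread_out = [60, 70, 80, 90, 100]  # scores far apart -> large SD

for name, scores in [("consistent", consistent), ("spread_out", spread_out)]:
    print(name, mean(scores), round(pstdev(scores), 2))
```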