Module 14: Descriptive statistics Flashcards
2 most common ways to summarize data
- measure of central tendency
- Measure of variability
Measure of central tendency (4)
what it is + represented by
A measure of the typical value in a collection of numbers or a data set
- measured by mean, median and mode
Mean (2)
+ how to find?
The average
Sum of all the scores divided by the total number of scores
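Population mean
μ = Σx / N (sum of all scores in the population divided by the population size N)
Sample mean
x̄ = Σx / n (sum of all scores in the sample divided by the sample size n)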
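A minimal sketch of the mean as described above (sum of all the scores divided by the total number of scores). The `scores` list is made-up illustration data, not from the course.

```python
# Hypothetical scores, for illustration only
scores = [4, 8, 6, 10, 7]

# Sum of all the scores divided by the total number of scores
mean = sum(scores) / len(scores)
print(mean)  # 7.0
```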
Median (2)
How to find?
The value that lies in the middle of the data when the data set is ordered
- First rank the data; the position of the median is then (n + 1) / 2, where n is the number of entries
Odd number of entries when calculating median:
median is the middle data entry
Even number of entries when calculating median:
Median is the mean of the 2 middle data entries
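The odd and even cases above can be sketched as follows. The function name `median` and the sample lists are illustrative assumptions; the input is assumed to be non-empty.

```python
def median(values):
    """Rank the data, then pick the middle entry (odd n)
    or the mean of the two middle entries (even n)."""
    ranked = sorted(values)
    n = len(ranked)
    position = (n + 1) / 2  # 1-based position of the median
    if n % 2 == 1:
        # Odd n: position is a whole number, so take that entry
        return ranked[int(position) - 1]
    # Even n: position falls halfway between the two middle entries
    lower = ranked[n // 2 - 1]
    upper = ranked[n // 2]
    return (lower + upper) / 2

odd_result = median([7, 1, 5])      # ranked: 1, 5, 7 -> 5
even_result = median([7, 1, 5, 3])  # ranked: 1, 3, 5, 7 -> (3 + 5) / 2 = 4.0
```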
Mode
The most frequent value
If no entry is repeated, then the data set has no
mode
If two entries occur with the same greatest frequency, each entry is a ___ and the data set is called ___
- mode
- bimodal
Finding the mode
finding the greatest frequency
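A small sketch of finding the mode by looking for the greatest frequency, covering the "no mode" and "bimodal" cases from the cards above. The helper name `modes` is hypothetical, and a non-empty input is assumed.

```python
from collections import Counter

def modes(values):
    """Return every entry that occurs with the greatest frequency.
    An empty list means no entry is repeated, so there is no mode."""
    counts = Counter(values)
    greatest = max(counts.values())
    if greatest == 1:
        return []  # no entry is repeated: no mode
    return sorted(v for v, c in counts.items() if c == greatest)

one_mode = modes([2, 3, 3, 5])     # [3]
bimodal = modes([1, 1, 4, 4, 6])   # [1, 4] -> two modes, so bimodal
no_mode = modes([1, 2, 3])         # []
```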
Advantage of using the mean (2)
- most common statistic
- Takes into account every entry of a data set
Disadvantage of using the mean (2)
- greatly affected by extreme scores (outliers)
- Knowledge about individual cases is lost with averages
Advantages of using the median (2)
- Little influence by extreme scores
- Reasonable estimate of what most people mean by the center of a distribution
Disadvantage of using the median
- may not be good to ignore extreme values
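To illustrate the trade-off between the mean and the median, this sketch adds one extreme score to a made-up data set and compares how much each statistic moves. The numbers are hypothetical.

```python
import statistics

# Hypothetical scores, then the same scores plus one extreme value
scores = [30, 32, 35, 38, 40]
with_outlier = scores + [400]

mean_before = statistics.mean(scores)          # 35
median_before = statistics.median(scores)      # 35
mean_after = statistics.mean(with_outlier)     # roughly 95.8: pulled up by the outlier
median_after = statistics.median(with_outlier) # 36.5: barely moves
```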
Advantages of using the mode (2)
- the most frequently obtained score
- not influenced by extreme scores
Disadvantages of using the mode (2)
- may not represent a large proportion of the scores
- ignores extreme values
Variability
numbers which describe how spread out a set of data is
Examples of variability measures (4)
- range (interquartile range)
- deviation
- variance
- standard deviation
Range+ formula (2)
length of the smallest interval that contains all the data
range= largest value - smallest value
range is sensitive to
- sample size: small samples = smaller range (less representative range)
- extreme scores (tells you smallest and largest but not bulk)
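A quick sketch of the range formula and its sensitivity to a single extreme score. The data values are invented for illustration.

```python
# Hypothetical data set
data = [12, 15, 11, 19, 14]

# range = largest value - smallest value
data_range = max(data) - min(data)  # 19 - 11 = 8

# One extreme score changes the range a lot, even though the bulk is unchanged
with_extreme = data + [90]
extreme_range = max(with_extreme) - min(with_extreme)  # 90 - 11 = 79
```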
Interquartile range (2)
+ formula
Measure of distance between first and third quartiles
- IQR= Q3-Q1
second quartile is the
median
benefits of IQR (2)
- less affected by extreme values
- helpful for identifying outliers
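A minimal sketch of IQR = Q3 - Q1 using Python's standard library. `statistics.quantiles` with its default "exclusive" method places quartiles using (n + 1)-based positions and interpolates between entries; other textbooks and tools use slightly different quartile conventions, so results may differ from hand calculations. The data are made up.

```python
import statistics

# Hypothetical ranked data
data = [1, 2, 3, 4, 5, 6, 7]

# n=4 asks for the three cut points that split the data into quarters
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1  # 6.0 - 2.0 = 4.0
# The second quartile is the median
median = statistics.median(data)  # 4
```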
Quartile (2)
What it is+median
- positions in a range of values representing multiples of 25%
- 50% of scores fall below median, 50% scores above
First quartile (Q1)
25% of scores fall below Q1, 75% above
Third quartile (Q3)
75% of scores fall below Q3, 25% above
deviation
The difference between each score and the mean of the data set
How far you are from the mean
deviation formula
deviation of x_i = x_i − μ
deviation scores always sum to
0
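A short check, on made-up scores, that deviation scores (each score minus the mean) sum to 0. With other data, floating-point rounding may leave a tiny nonzero remainder.

```python
# Hypothetical scores
scores = [2, 4, 6, 8]
mean = sum(scores) / len(scores)  # 5.0

# Deviation: each score minus the mean
deviations = [x - mean for x in scores]  # [-3.0, -1.0, 1.0, 3.0]

total = sum(deviations)  # 0.0
```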
Difference between deviation and IQR/boxplots
Deviation scores show dispersion around the mean; IQR and boxplots show dispersion around the median
Variance
single number representing the average amount of variation in a set of scores/ how spread out the scores are
Steps for finding the sample variance (5)
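The card above does not list the five steps, so the sketch below assumes the usual textbook sequence: (1) find the mean, (2) find each deviation, (3) square each deviation, (4) add the squares, (5) divide by n - 1. The scores are hypothetical.

```python
# Hypothetical sample scores
scores = [2, 4, 6, 8]
n = len(scores)

mean = sum(scores) / n                       # step 1: mean = 5.0
deviations = [x - mean for x in scores]      # step 2: deviation of each score
squared = [d ** 2 for d in deviations]       # step 3: [9.0, 1.0, 1.0, 9.0]
sum_of_squares = sum(squared)                # step 4: 20.0
sample_variance = sum_of_squares / (n - 1)   # step 5: 20 / 3, about 6.67
```

Dividing by n - 1 rather than n is what distinguishes the sample variance from the population variance.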
Standard deviation
Measure of the spread of scores out from the mean of the sample
How to calculate standard deviation
- calculate the variance
- find the square root
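The two steps above (calculate the variance, then find the square root) can be sketched like this. The example assumes a sample, so it uses the sample variance; the scores are made up.

```python
import math
import statistics

# Hypothetical sample scores
scores = [2, 4, 6, 8]

variance = statistics.variance(scores)  # sample variance (divides by n - 1)
standard_deviation = math.sqrt(variance)

# statistics.stdev performs both steps in one call
same_result = statistics.stdev(scores)
```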
Population standard deviation formula
σ = √( Σ(x − μ)² / N )
Standard deviation is a measure of the typical amount an entry deviates from the mean, thus the more entries are spread out, the
greater the standard deviation
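A small comparison showing that more spread-out entries give a greater standard deviation. Both data sets are invented and share the same mean of 10; `statistics.pstdev` computes the population standard deviation (dividing by N).

```python
import statistics

# Two hypothetical data sets with the same mean (10) but different spread
tight = [9, 10, 11]
wide = [0, 10, 20]

tight_sd = statistics.pstdev(tight)  # small: entries sit close to the mean
wide_sd = statistics.pstdev(wide)    # larger: entries sit far from the mean
```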
Descriptive statistics (2)
- cannot make predictions or generalizations
- only drawing conclusions about current sample and not extrapolating or going beyond
inferential statistics (2)
- can make predictions or generalizations
- allow conclusions about the population based on data from a sample
Data matrices
a table or worksheet that organizes the data together with all the variables of interest
Frequency distributions
A table indicating the frequency of each value in a data set
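One possible way to build a frequency distribution is with `collections.Counter`, which tallies how often each value occurs. The scores are hypothetical.

```python
from collections import Counter

# Hypothetical scores
scores = [3, 1, 2, 3, 3, 2]

# Frequency of each value in the data set
frequency = Counter(scores)

# Print as a simple two-column table: value, frequency
for value in sorted(frequency):
    print(value, frequency[value])
```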
Histogram (3)
What it is+ illustrates+can help identify
- A graphical representation of the frequency of a variable
- illustrates the distribution of scores
- can help identify outliers or violations of normal distribution assumptions
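symmetrical
The left and right halves of the distribution are approximately mirror images
Negative skew or left skew
The tail of the distribution extends to the left (toward lower values)
Positive skew/right skew
The tail of the distribution extends to the right (toward higher values)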
Central tendency
helps identify the typical or most common value in data
Measures of central tendency
Mean
median
mode
measure of central tendency for symmetrical distribution/ skewed
mean for a symmetrical distribution; median for a skewed distribution
If the mean is 100 and the standard deviation is 10, then (for roughly normal data)
about 2/3 (approximately 68%) of the data falls between 90 and 110
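The "about 2/3" figure can be checked under the assumption that the data follow a normal distribution, using `statistics.NormalDist` from the standard library. The card does not state the normality assumption explicitly; it is assumed here.

```python
from statistics import NormalDist

# Assumed: scores are normally distributed with mean 100 and SD 10
dist = NormalDist(mu=100, sigma=10)

# Share of the distribution within one standard deviation of the mean
share = dist.cdf(110) - dist.cdf(90)  # about 0.68, i.e. roughly 2/3
```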
for data that is skewed or has outliers, ___ may be a better choice to describe the centre of the distribution
median
Q position
Qposition= [(Q#)(n+1)]/4
Q# = number of the quartile you're trying to find
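The Q position formula above, written as a small function. The name `q_position` and the example n = 11 are illustrative; the result is a 1-based position in the ranked data.

```python
def q_position(quartile_number, n):
    """Position of a quartile in ranked data: (Q#)(n + 1) / 4."""
    return quartile_number * (n + 1) / 4

# With a hypothetical n = 11 entries:
q1_pos = q_position(1, 11)  # 3.0 -> Q1 is the 3rd ranked entry
q2_pos = q_position(2, 11)  # 6.0 -> the median, same as (n + 1) / 2
q3_pos = q_position(3, 11)  # 9.0 -> Q3 is the 9th ranked entry
```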
Round Q position to
the median
How to find outliers with IQR
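The card above has no answer recorded. A common convention, assumed here rather than taken from the course, flags any value below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR as an outlier. The helper name `iqr_outliers`, the multiplier default `k=1.5`, and the data are all illustrative.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside the fences Q1 - k*IQR and Q3 + k*IQR.
    k=1.5 is a widely used convention, assumed here."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical data with one extreme score
outliers = iqr_outliers([10, 12, 13, 14, 15, 16, 18, 60])  # [60]
```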
Scatterplots
- visualize the form, direction and strength of 2 variable relationships
correlation coefficients
indicate the degree of covariance between variables: how much one variable changes in relation to another
Data points that are more closely positioned around the best fit line represent
a stronger relationship than when data points are further from the line
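To make "stronger relationship" concrete, the sketch below computes a correlation coefficient for two made-up data sets. It assumes the Pearson correlation coefficient, which the cards do not name explicitly; the function name `pearson_r` and all values are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y divided by
    the product of their individual variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
on_the_line = [2, 4, 6, 8, 10]  # points fall exactly on a straight line
scattered = [2, 9, 4, 10, 6]    # points sit further from any line

r_strong = pearson_r(x, on_the_line)  # 1.0: perfect positive relationship
r_weak = pearson_r(x, scattered)      # positive but noticeably smaller
```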