descriptive statistics (A) Flashcards
what is meant by descriptive statistics?
Descriptive statistics are brief descriptive coefficients that summarize a given data set, which can represent either an entire population or a sample of it.
what descriptive statistics are involved in this set?
- measures of central tendency
- measures of spread (dispersion)
what are the measures of central tendency?
- mean / average
- mode / popular
- median / middle
what are the measures of dispersion?
- range
- IQR
- standard deviation
what is meant by measures of central tendency?
the term average is used in everyday life to express an amount that is typical for a group of people / things
typical tendency of groups
a measure of central tendency (CT) is a single value that attempts to describe a set of data by identifying the central position in the data set
why is average useful?
useful indicator of the general trend
- summarise large amounts of data
- indicate that there is some variability around the single value within original data
what are the pros of the mean?
- most popular measure
- yields one distinct answer
- useful for comparing data sets
what are the cons of the mean?
- answer can be affected by extreme values (outliers) or when skewed data is present
- the mean has a tendency to be pulled towards extreme values
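A minimal sketch of how one extreme value pulls the mean while the median barely moves. The scores are made-up illustration data, not taken from the flashcards.

```python
from statistics import mean, median

scores = [4, 5, 5, 6, 7]        # made-up example data
with_outlier = scores + [60]    # the same data plus one extreme value

mean_before = mean(scores)              # 27 / 5 = 5.4
mean_after = mean(with_outlier)         # 87 / 6 = 14.5
median_before = median(scores)          # middle value = 5
median_after = median(with_outlier)     # average of 5 and 6 = 5.5

print(mean_before, mean_after)
print(median_before, median_after)
```

The mean jumps from 5.4 to 14.5, whereas the median only moves from 5 to 5.5.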
what are the pros of the median?
- unaffected by extreme values
- if there is skewed data it may be a more informative descriptive measure than the mean
what are the cons of the median?
- less amenable to further statistical analysis than the mean
what are the pros of the mode?
- easy to obtain
- only measure that can be used for data on a nominal scale
what are the cons of the mode?
- it is not very stable from sample to sample
- there may be more than one mode for a particular set of scores
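A small sketch of the mode on nominal data, including the case where there is more than one mode. The colour list is made-up illustration data, and `statistics.multimode` needs Python 3.8 or later.

```python
from statistics import multimode

# Nominal data: categories with no order, so the mean and median are not defined
colours = ["red", "blue", "red", "green", "blue"]

# multimode returns every value that shares the highest frequency
modes = multimode(colours)
print(modes)  # ['red', 'blue'] (both appear twice)
```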
what is non skewed data?
symmetrical (evenly balanced) distribution
mean is preferred
what is skewed data?
asymmetrical distribution, often with a large number of outliers on one side
median is less affected by skewed data and is generally considered to be the best measure
when do you use the mode?
for nominal data
when do you use the median?
ordinal
interval/ratio (skewed)
when do you use the mean?
interval/ratio (non skewed)
what are measures of spread?
describe how similar or varied the set of observed values are for a particular variable
large spread = large differences
seen as positive if there is little deviation
what are the types of measures of spread?
- range
- quartiles
- deviation (absolute deviation, variance, standard deviation)
what are the types of deviation?
absolute deviation, variance and standard deviation
what is the range?
difference between the highest and the lowest value
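As a quick sketch, the range is just the largest value minus the smallest. The ages are made-up illustration data.

```python
ages = [18, 21, 19, 35, 22]   # made-up example data

# range = highest value - lowest value
data_range = max(ages) - min(ages)
print(data_range)  # 35 - 18 = 17
```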
what are the pros of range?
- useful when measuring variable with a critical low/high threshold that should not be crossed e.g. drinking age
- easy
what are the cons of the range?
- value is sensitive to outliers
what are quartiles?
tell us about the spread of a data set by breaking it into quarters, just as the median breaks it in half
how many quartiles are there?
Q1
Q2
Q3
how do you work out the quartiles?
- if you find the median you will find Q2
- then work out the median of the values on either side of it to find Q1 and Q3
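The median-of-halves steps can be sketched as below. The function name `quartiles` and the data are made up for illustration, and this is only one convention: here the middle value is left out of both halves when the number of values is odd, and other textbooks or libraries may give slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # values below the median position
    upper = ordered[half + n % 2:]    # values above it (middle value skipped if n is odd)
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles([1, 3, 5, 7, 9, 11, 13])
print(q1, q2, q3)  # 3 7 11
```

The sketch assumes at least two values, otherwise the halves would be empty.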
what is the interquartile range?
the difference between 1st and 3rd quartile
IQR = Q3 - Q1
what is a box plot?
visual description of the distribution based on the minimum, Q1, median, Q3 and maximum
what is an outlier?
observation which does not appear to belong with the other data (e.g. measurement or recording error / equipment failure)
have to find them as they can skew the data
how do you find outliers?
lower fence = Q1 - 1.5 x IQR
upper fence = Q3 + 1.5 x IQR
anything outside of these fences is considered an outlier
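A minimal sketch of the fence rule. The helper name `find_outliers` and the data are made up for illustration, and Q1 and Q3 are assumed to have been worked out already (here Q1 = 3 and Q3 = 11 for the example data).

```python
def find_outliers(values, q1, q3):
    """Return the values that fall outside the 1.5 x IQR fences."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

data = [1, 3, 5, 7, 9, 11, 40]   # made-up example data
outliers = find_outliers(data, q1=3, q3=11)
print(outliers)  # IQR = 8, fences are -9 and 23, so only 40 is flagged
```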
what is the disadvantage to quartiles?
does not take into account every score just 25%
what is the difference between mean absolute deviation and standard deviation?
both look at the distance of the data from its mean
standard deviation squares each difference
absolute deviation only looks at the absolute difference
how do you work out mean absolute deviation?
- find the mean of all values
- find the distance of each value from that mean (subtract the mean from each value, ignore the minus sign)
- find the mean of those distances
SUM OF |x - u| / n
x = each value, u = mean, n = number of values
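The three steps can be sketched as follows. The function name and the data are made up for illustration.

```python
def mean_absolute_deviation(values):
    """Mean of the absolute distances between each value and the mean."""
    mu = sum(values) / len(values)                # step 1: mean of all values
    distances = [abs(x - mu) for x in values]     # step 2: distances, minus signs ignored
    return sum(distances) / len(distances)        # step 3: mean of those distances

mad = mean_absolute_deviation([2, 4, 6, 8])
print(mad)  # mean = 5, distances = 3, 1, 1, 3, so MAD = 8 / 4 = 2.0
```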
what is the variance?
the average of the squared differences from the mean
how do you calculate the variance?
- work out the mean
- for each number subtract the mean and square the result
- then work out the average of those squared differences (but divide by n - 1 instead of n)
SUM OF (x - u)^2 / (n - 1)
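A minimal sketch of the sample variance steps, checked against the standard library's `statistics.variance`, which also divides by n - 1. The function name `sample_variance` and the data are made up for illustration.

```python
from statistics import variance

def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(values)
    mu = sum(values) / n                             # work out the mean
    squared_diffs = [(x - mu) ** 2 for x in values]  # subtract the mean and square
    return sum(squared_diffs) / (n - 1)              # average, using n - 1

data = [2, 4, 6, 8]   # made-up example data
var = sample_variance(data)
print(var)             # squared differences sum to 20, so 20 / 3
print(variance(data))  # library result for comparison
```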
why do you use n-1 for standard deviation and variance?
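to overcome bias: a sample tends to underestimate the spread of the population, and dividing by n - 1 corrects for this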
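To see the effect of n - 1, the sketch below compares dividing by n (`statistics.pvariance`) with dividing by n - 1 (`statistics.variance`) on the same made-up sample.

```python
from statistics import pvariance, variance

sample = [2, 4, 6, 8]   # made-up example data; squared differences sum to 20

divide_by_n = pvariance(sample)           # 20 / 4 = 5
divide_by_n_minus_1 = variance(sample)    # 20 / 3, slightly larger

print(divide_by_n, divide_by_n_minus_1)
```

Dividing by n - 1 always gives the larger figure, which compensates for the sample's tendency to underestimate the population spread.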
what are the problems with sample variance?
- as the differences are squared, this gives more weight to extreme scores
- if the data contains outliers, the variance may not represent the data as a whole
- not in the same units as the data (units squared), so cannot be directly related to it
what is standard deviation?
measure of spread of scores within data set
use in conjunction with mean to summarise continuous data
- normally appropriate only if the data is not skewed and has no outliers
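how do you work out standard deviation?
(square root of the variance)
- work out the mean
- subtract the mean from each value and square the result
- add up all the squares
- find the mean of those squared differences (dividing by n - 1 for a sample)
- find the square root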
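A minimal sketch of the standard deviation steps for a sample, checked against the standard library's `statistics.stdev`. The function name `sample_std_dev` and the data are made up for illustration.

```python
import math
from statistics import stdev

def sample_std_dev(values):
    """Square root of the sample variance (n - 1 in the denominator)."""
    n = len(values)
    mu = sum(values) / n                             # work out the mean
    squared_diffs = [(x - mu) ** 2 for x in values]  # subtract the mean and square
    sample_var = sum(squared_diffs) / (n - 1)        # mean of the squares, using n - 1
    return math.sqrt(sample_var)                     # square root

data = [2, 4, 6, 8]   # made-up example data
sd = sample_std_dev(data)
print(sd)
print(math.isclose(sd, stdev(data)))  # compare with the library result
```

Unlike the variance, the result is in the same units as the original data.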