descriptive stats Flashcards
What 3 factors should you try to encompass when designing a study
Types of data
Whether you are looking for differences or relationships
Number of groups or variables
what are the 2 types of data
measurement and categorical
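what is measurement data
quantitative data: each observation is a score obtained by measuring something
what is categorical data
qualitative (frequency) data: counts of how many observations fall into each category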
what are the 4 types of scales
nominal
ordinal
interval
ratio
what is a nominal scale and when is it used
used for categorical data; the numbers or names act only as labels for categories
why shouldn't you calculate numerical summaries such as the mean for categorical data
the labels have no numerical meaning, so the result is nonsensical
define ordinal scales and what they’re used for
used to order (rank) objects along a continuum
no information given on differences btwn scale points
give an example of a study using ordinal scales
Holmes and Rahe 1967 (Social Readjustment Rating Scale: life events ranked by how stressful they are)
define interval scales and what they’re used for
equal intervals btwn scale points represent equal differences in the thing being measured
ratios are not meaningful because the 0 point on the scale is arbitrary (e.g. 20 °C is not twice as hot as 10 °C)
define arbitrary
not based on any system or reason
define ratio scales and what they’re used for
have true zero point
true zero corresponds to absence of thing being measured
what are the aims of descriptive statistics
to characterise a numerical dataset representatively
to condense a lot of info into a meaningful summary
to minimise the error involved in the condensing process
what are inferential statistics
goal: to infer characteristics of the whole population from a sample, making probable rather than certain assertions
use sample statistics to estimate population parameters
rely on theoretical sampling distributions (the distribution of a statistic over innumerable random samples)
use p-values and confidence intervals
what are the 2 main categories of descriptive statistics
measures of central tendency and measures of dispersion
what are the 3 measures of central tendency
mean
median
mode
what are the measures of dispersion
range
IQR
variance
standard deviation
what is the mean; give the equation
average score; calculated as the sum of the scores divided by the number of scores
Σ x / N
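A minimal Python sketch of the formula above. The function name and the example scores are invented for illustration.

```python
def mean_score(scores):
    """Return the arithmetic mean: sum of scores / number of scores."""
    if len(scores) == 0:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(scores) / len(scores)


example_scores = [2, 4, 4, 6, 9]           # invented scores
example_mean = mean_score(example_scores)  # 25 / 5 = 5.0
```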
When is the mean most useful and why
For normal/symmetric distributions, the mean is the most efficient and least subject to sample fluctuations
what are the disadvantages of using the mean
greatly influenced by extreme scores
can misrepresent the typical score when the data are skewed or contain outliers
how can you tell if the mean is an appropriate measure to use on a dataset
by plotting a histogram and checking whether the data are roughly symmetrical
what type of distribution is unsuitable for the mean
skewed distributions
what is the median
the central value when all scores are arranged in order
why and when is the median useful
less sensitive to extreme scores; gives a more accurate representation of the data when outliers are present
better measure than mean for highly skewed distributions.
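A small sketch of why the median copes better with extreme scores, using Python's standard `statistics` module. The scores are invented; the single extreme score of 250 is an assumption chosen to make the effect obvious.

```python
from statistics import mean, median

typical_scores = [20, 22, 23, 25, 27]   # invented scores
with_outlier = typical_scores + [250]   # same scores plus one extreme score

mean_before, median_before = mean(typical_scores), median(typical_scores)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean is pulled far towards the extreme score; the median barely moves.
print("without outlier:", mean_before, median_before)
print("with outlier:   ", mean_after, median_after)
```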
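what is the median formula
(N + 1) / 2
this gives the position (rank) of the median in the ordered scores, not the median itself; if the position falls between 2 scores, average them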
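A sketch of how the position rule can be applied, assuming the usual convention of averaging the two middle scores when the position is not a whole number. The function name is made up.

```python
import math


def median_by_rank(scores):
    """Find the median via its 1-based position (N + 1) / 2 in the ordered scores."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    position = (n + 1) / 2                       # e.g. 3.0 for N = 5, 2.5 for N = 4
    lower = ordered[math.floor(position) - 1]    # score at or just below the position
    upper = ordered[math.ceil(position) - 1]     # score at or just above the position
    return (lower + upper) / 2                   # identical scores when N is odd
```

For an odd N the two lookups hit the same score, so the average leaves it unchanged.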
define mode
most common score
what happens if you have 2 adjacent modes
average them: (mode 1 + mode 2) / 2
what happens if you have 2 nonadjacent modes
bimodal distribution
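A sketch of the two-mode rules above, using `statistics.multimode` (Python 3.8+). It assumes "adjacent" means the two modes are neighbouring score values in the dataset; the function name and that interpretation are assumptions, not part of the cards.

```python
from statistics import multimode


def describe_modes(scores):
    """Return the mode(s); two adjacent modes are averaged into one value."""
    modes = sorted(multimode(scores))      # every most-frequent score
    distinct = sorted(set(scores))         # the score values that actually occur
    if len(modes) == 2:
        first_index = distinct.index(modes[0])
        if distinct[first_index + 1] == modes[1]:
            # adjacent modes: report their average as a single mode
            return [(modes[0] + modes[1]) / 2]
    # one mode, or non-adjacent modes (e.g. a bimodal distribution)
    return modes
```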
what are the 2 defining features of measures of central tendency
they indicate typical values and are summarised by a single number
what are the suitable summary descriptions for categorical data
frequencies
percentages
mode
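A short sketch of these three summaries for categorical data, using `collections.Counter`. The pet categories are invented example data.

```python
from collections import Counter

responses = ["cat", "dog", "dog", "fish", "dog", "cat"]   # invented categorical data

frequencies = Counter(responses)                           # count per category
total = len(responses)
percentages = {category: 100 * count / total
               for category, count in frequencies.items()}
modal_category = frequencies.most_common(1)[0][0]          # most frequent category

print(dict(frequencies))
print(percentages)
print(modal_category)
```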
what are measures of variability
describe degrees to which values vary
what is the range and how is it computed
measure of distance from lowest to highest score; max value − min value
what are the disadvantages of using range
extreme values/ outliers distort
unstable across diff samples
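A small sketch of the range and of how one outlier distorts it. The scores and the function name are invented for illustration.

```python
def score_range(scores):
    """Range: highest score minus lowest score."""
    if len(scores) == 0:
        raise ValueError("the range of an empty dataset is undefined")
    return max(scores) - min(scores)


typical_scores = [20, 22, 23, 25, 27]              # invented scores
range_typical = score_range(typical_scores)         # 27 - 20
range_outlier = score_range(typical_scores + [250]) # 250 - 20
```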
what is the real advantage of range
straightforward to calculate and easy to interpret
what is the IQR, what does it use and how is it calculated
IQR (interquartile range): the distance covered by the middle 1/2 of the scores
it uses percentiles: IQR = Q3 − Q1, where Q3 is the 75th percentile and Q1 is the 25th percentile
the semi-interquartile range is 1/2 of the IQR: (Q3 − Q1) / 2
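A sketch of the quartile-based spread measures, using `statistics.quantiles` (Python 3.8+) with its default "exclusive" method. Other software may use slightly different percentile rules, so treat the exact values as dependent on that choice. The scores are invented.

```python
from statistics import quantiles

scores = [3, 5, 7, 8, 9, 11, 13, 15]     # invented scores

q1, q2, q3 = quantiles(scores, n=4)      # 25th, 50th and 75th percentiles
interquartile_range = q3 - q1            # Q3 - Q1
semi_interquartile_range = interquartile_range / 2   # (Q3 - Q1) / 2

print(q1, q3, interquartile_range, semi_interquartile_range)
```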
how does the semi-interquartile range behave in a symmetric vs skewed distribution
In a symmetric distribution, an interval stretching from one semi-interquartile range below the median to one semi-interquartile above the median will contain 1/2 of the scores. This will not be true for a skewed distribution, however.
what are the advantages of IQR and what kind of distribution is it useful in
little affected by extreme scores; good measure of spread for skewed distributions.
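what is the calculation for the separate percentiles used in the IQR
1. convert the percentile to a proportion: percentile / 100, e.g. 50th percentile = 0.50
2. compute the rank: 0.50 * (N + 1) = rank X
3. order the scores from lowest to highest and find the score at rank X (if X is not a whole number, interpolate btwn the 2 neighbouring scores)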
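The steps above can be sketched in Python as follows. Linear interpolation for fractional ranks and clamping at the lowest and highest score are assumptions about how to handle the edge cases; the function name and scores are invented.

```python
def percentile_score(scores, percentile):
    """Score at the given percentile, using rank = (percentile / 100) * (N + 1)."""
    ordered = sorted(scores)                 # ranks refer to the ordered scores
    n = len(ordered)
    if n == 0:
        raise ValueError("percentiles of an empty dataset are undefined")
    rank = (percentile / 100) * (n + 1)      # 1-based rank, may be fractional
    if rank <= 1:
        return ordered[0]                    # assumption: clamp to the lowest score
    if rank >= n:
        return ordered[-1]                   # assumption: clamp to the highest score
    whole = int(rank)                        # rank of the score just below
    fraction = rank - whole                  # how far towards the next score
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)


scores = [3, 5, 7, 8, 9, 11, 13, 15]         # invented scores, N = 8
q1 = percentile_score(scores, 25)            # rank 2.25, between 5 and 7
q3 = percentile_score(scores, 75)            # rank 6.75, between 11 and 13
```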
what is the disadvantage for IQR in normal distributions
more subject to sampling fluctuation in normal distributions than the standard deviation and therefore not often used for data that are approximately normally distributed.
define variance
measure of how much scores vary in terms of distance from mean
average of each score’s squared deviation from mean score
what is variance formula
σ² = Σ (x − mean)² / N
how does variance formula change when computing for sample vs population
sample: divide by N − 1, so s² = Σ (x − mean)² / (N − 1)
population: divide by N, so σ² = Σ (x − mean)² / N
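A minimal sketch of the two divisors. The function name, the `sample` flag and the scores are invented for illustration.

```python
def variance_of(scores, sample=False):
    """Average squared deviation from the mean; divide by N - 1 for a sample."""
    n = len(scores)
    if n < (2 if sample else 1):
        raise ValueError("not enough scores to compute the variance")
    mean = sum(scores) / n
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    divisor = n - 1 if sample else n
    return sum_of_squares / divisor


scores = [2, 4, 4, 4, 5, 5, 7, 9]                     # invented scores, mean = 5
population_variance = variance_of(scores)              # 32 / 8
sample_variance = variance_of(scores, sample=True)     # 32 / 7
```

The sample version is always a little larger, which compensates for a sample tending to underestimate the spread of the population.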
when do you use sample variance formula
when you have measured a sample and want to generalise to the wider population, i.e. to estimate the population variance
what is standard deviation
square root of variance
what does a bigger SD value mean
values more spread out
what is the equation for sample and population SD
Population: σ = √σ²
Sample: s = √s²
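A sketch of both SDs as square roots of the corresponding variances, using Python's standard `statistics` module (`pvariance` divides by N, `variance` divides by N − 1). The scores are invented.

```python
import math
from statistics import pvariance, variance, pstdev, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]                 # invented scores

population_sd = math.sqrt(pvariance(scores))      # square root of population variance
sample_sd = math.sqrt(variance(scores))           # square root of sample variance

# The library's own SD functions should agree with the square roots above.
print(population_sd, pstdev(scores))
print(sample_sd, stdev(scores))
```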
what can you do if you know the SD and mean in normal distribution
possible to compute the percentile rank associated with any given score
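A sketch of that computation using `statistics.NormalDist` (Python 3.8+). It assumes the scores really are normally distributed; the function name and the mean of 100 with SD of 15 are invented example values.

```python
from statistics import NormalDist


def percentile_rank(score, mean, sd):
    """Percentage of a normal distribution falling at or below the given score."""
    return 100 * NormalDist(mu=mean, sigma=sd).cdf(score)


rank_at_mean = percentile_rank(100, mean=100, sd=15)       # score equal to the mean
rank_one_sd_above = percentile_rank(115, mean=100, sd=15)  # score 1 SD above the mean
print(rank_at_mean, rank_one_sd_above)
```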
in a normal distribution, how many of the scores are within 1 SD of the mean
about 68%
in a normal distribution, how many of the scores are within 2 SDs of the mean
about 95%
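These rounded figures can be checked against the standard normal curve with `statistics.NormalDist` (Python 3.8+). The helper name is made up.

```python
from statistics import NormalDist

standard_normal = NormalDist()    # mean 0, SD 1


def proportion_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within_one_sd = proportion_within(1)   # roughly 0.68
within_two_sd = proportion_within(2)   # roughly 0.95
print(within_one_sd, within_two_sd)
```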
why is SD useful
used in many inferential stats tests
what is a disadvantage of the SD and how can this be overcome
not a good measure of spread in highly skewed distributions
supplement it with the IQR