statistics Flashcards
random error
lecture: can be conceptualised as sampling variability
notes: it cannot be avoided; the only way to remove it would be to measure the whole population, which is usually impossible, but it can be minimised by increasing the sample size
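sketch (not from the lecture): one way to see random error shrink as the sample grows is to draw many random samples and look at how much the sample means vary. The population, its mean/SD and the helper name `spread_of_sample_means` are all made up for illustration.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, repeats=500, seed=0):
    """Return the SD of the sample means over many repeated random samples."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

# Hypothetical population of 10,000 values (assumed mean ~100, SD ~15).
rng = random.Random(42)
population = [rng.gauss(100, 15) for _ in range(10_000)]

small = spread_of_sample_means(population, sample_size=25)
large = spread_of_sample_means(population, sample_size=400)
# Sampling the whole population leaves no sampling variability at all.
whole = spread_of_sample_means(population, sample_size=len(population), repeats=5)

print(f"n=25:    spread of sample means = {small:.2f}")
print(f"n=400:   spread of sample means = {large:.2f}")
print(f"n=10000: spread of sample means = {whole:.2f}")
```

The spread of the sample means is the random error of the estimate: it should fall as n rises and vanish only when the whole population is measured.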
bias (systematic error)
a difference between the observed value and true value due to all causes except sampling variability
random sample
each member of the population has an equal chance of being chosen
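sketch: a simple random sample can be drawn with the standard library's `random.sample`, which picks without replacement so that every member is equally likely to be chosen. The population of ID numbers below is hypothetical.

```python
import random

# Hypothetical population: ID numbers 1..1000 (illustrative only).
population = list(range(1, 1001))

rng = random.Random(1)                 # fixed seed only so the example repeats
sample = rng.sample(population, k=50)  # equal chance for every member, no repeats

print(sorted(sample)[:10])
```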
properties of a good sample
representative by structure
random
representative by number of cases
how to select a sample
- select a sampling method
- define target population
- determine sample size
high power
large sample size
little scatter
low power
small sample size
scatter is large
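sketch (an assumption, not from the lecture): the two power cards can be illustrated with a normal approximation for a two-sided comparison of two equal-sized groups. The function name `approx_power` and all the numbers are made up; a real power calculation depends on the test actually used.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(difference, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group z-test (normal approximation)."""
    z = NormalDist()
    # Standard error of the difference between two group means.
    standard_error = sd * sqrt(2 / n_per_group)
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return z.cdf(abs(difference) / standard_error - z_crit)

# Same true difference, different sample sizes and scatter (hypothetical values).
print(approx_power(difference=5, sd=10, n_per_group=64))  # larger sample
print(approx_power(difference=5, sd=10, n_per_group=16))  # smaller sample
print(approx_power(difference=5, sd=20, n_per_group=64))  # more scatter
```

Under these assumptions power rises with sample size and falls as the scatter (SD) grows, which matches the two cards.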
definition of paired data
when 2 or more measurements are made on the same observational unit
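sketch: with paired data the two measurements belong together, so a common first step is to work with the within-unit differences. The before/after values are hypothetical.

```python
import statistics

# Hypothetical before/after measurements on the same five subjects.
before = [140, 152, 138, 147, 160]
after = [135, 150, 130, 145, 151]

# Each pair comes from one observational unit, so analyse the differences.
differences = [a - b for b, a in zip(before, after)]

print(differences)
print(statistics.mean(differences))
```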
descriptive statistics
organising and summarising the data using tables (frequency distributions and relative frequency distributions), histograms, pie charts, etc.
measures of central tendency (mean, median, mode)
central tendency describes location and variation describes spread (red book, lec 2)
measures of variability (range, variance, standard deviation)
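sketch: the summaries named on this card can all be computed with Python's standard library. The data set is hypothetical, and `variance`/`stdev` here are the sample versions (n - 1 denominator).

```python
import statistics
from collections import Counter

data = [2, 3, 3, 4, 5, 5, 5, 7, 9]  # hypothetical observations

# Frequency and relative frequency distributions.
frequency = Counter(data)
relative = {value: count / len(data) for value, count in sorted(frequency.items())}

# Central tendency (location).
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Variability (spread).
value_range = max(data) - min(data)
variance = statistics.variance(data)  # sample variance
sd = statistics.stdev(data)           # sample standard deviation

print(relative)
print(mean, median, mode)
print(value_range, variance, sd)
```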
inferential statistics
using the sample you worked with to draw a general conclusion about the population
uses probability to determine how confident we can be that the conclusions we derive are correct
what are the measures of variation in descriptive statistics
interquartile range (IQR)
variance
SD
range
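sketch: the interquartile range is the distance between the first and third quartiles. Textbooks and software use slightly different quartile rules; this uses the default ("exclusive") method of `statistics.quantiles`, and the data are hypothetical.

```python
import statistics

data = [1, 3, 4, 6, 7, 8, 10, 12, 50]  # hypothetical; 50 is an outlier

q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1
value_range = max(data) - min(data)

print(f"Q1={q1}, median={q2}, Q3={q3}")
print(f"IQR={iqr}, range={value_range}")
```

The range is stretched by the single outlier, while the IQR only looks at the middle half of the data.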
mean
it’s the balance point
can be heavily affected by outliers, so outliers can make the mean a poor measure of central tendency
median
it’s the middle value when the observations are ranked in order
it’s the point that divides a distribution into 2 equal halves
it’s largely unaffected by outliers (unlike the mean)
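sketch: adding one extreme value to a small, hypothetical data set shows how differently the mean and the median react to an outlier.

```python
import statistics

values = [4, 5, 5, 6, 7]       # hypothetical data
with_outlier = values + [100]  # same data plus one extreme value

mean_shift = statistics.mean(with_outlier) - statistics.mean(values)
median_shift = statistics.median(with_outlier) - statistics.median(values)

print(f"mean moves by   {mean_shift:.2f}")
print(f"median moves by {median_shift:.2f}")
```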
if you have normally distributed (symmetric) data, how does this affect the central tendency
the mean, median and mode will all be the same
what happens to the central tendency in skewed data
the mean lies further towards the tail of the skew than the median does (because, remember, the mean is affected by outliers)
in skewed data the median and mean both lie further towards the tail than the mode
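sketch: two small, hypothetical data sets illustrate the symmetric and skewed cards. In the symmetric set the three measures coincide; in the right-skewed set the order is mode, then median, then mean, moving towards the tail.

```python
import statistics

symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]        # hypothetical, symmetric
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]  # hypothetical, long right tail

for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(
        name,
        "mode =", statistics.mode(data),
        "median =", statistics.median(data),
        "mean =", statistics.mean(data),
    )
```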
mode
the most common data point. it’s possible to have more than 1. if all values are unique there is no mode
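sketch: a small helper that follows the rule on this card, including the "no mode" case. The name `modes` is made up for illustration. (The standard library's `statistics.multimode` returns every value when all are unique, so the no-mode case is handled by hand here.)

```python
from collections import Counter

def modes(values):
    """Return all modes, or [] when no value occurs more than once."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:  # every value is unique: no mode
        return []
    return [value for value, count in counts.items() if count == top]

print(modes([1, 2, 2, 3]))     # one mode
print(modes([1, 1, 2, 2, 3]))  # more than one mode
print(modes([1, 2, 3]))        # all values unique
```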
SD
takes into account all individual deviations
the larger the SD , the greater the variation around the mean
google: a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the values are spread out over a wider range.
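sketch: computing the SD step by step shows how every individual deviation from the mean contributes. This assumes the sample SD (n - 1 denominator); the helper name `sample_sd` and both data sets are hypothetical.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation from the individual deviations (n - 1 denominator)."""
    mean = sum(values) / len(values)
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (len(values) - 1))

tight = [9, 10, 10, 10, 11]   # values close to the mean of 10
spread = [1, 5, 10, 15, 19]   # same mean, values far from it

print(sample_sd(tight), statistics.stdev(tight))
print(sample_sd(spread), statistics.stdev(spread))
```

Both sets have the same mean, but the second has a much larger SD because its values sit further from that mean.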