general Flashcards
why use stats (2)
- tells us how likely our results are to be true
- tells us how wrong our results may be
the sample
the things/people we observe/experiment on
population
where the sample has been taken from (spread across time and space)
what do we use the sample to do
draw inferences about the population
chance is..
responsible for giving the wrong answer
e.g. tossing a coin - if I toss a coin 10 times and get 8 heads, this does not necessarily mean that the coin is unfair
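A rough sketch of the coin example, assuming a fair coin (P(heads) = 0.5) and independent tosses. The function name `prob_at_least` is made up for illustration; it adds up the binomial probabilities of getting 8, 9 or 10 heads.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of k or more heads in n independent tosses with P(heads) = p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance of 8 or more heads in 10 tosses of a fair coin (assumed p = 0.5)
p_8_or_more = prob_at_least(8, 10)
print(round(p_8_or_more, 4))  # 0.0547, i.e. 56 out of 1024 equally likely outcomes
```

So a fair coin gives 8+ heads in roughly 1 in 18 runs of 10 tosses: unusual, but well within what chance alone can produce.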
why do we use null hypothesis instead of our hypothesis
proving something is true is very hard; it is easier to show that something is not true
stages of hypothesis testing (3)
1) formulate a hypothesis
2) formulate a null hypothesis
3) calculate the chance that you might see your data if the null hypothesis were true (the p-value)
what is a p-value
the probability that you might see something as extreme as, or more extreme than, what you see in your data under the null hypothesis
what is the normal p value threshold
0.05 - anything below this is considered significant and the null hypothesis can be rejected
more modern p-value approach
- 0.1 = weak evidence
- 0.05 = moderate evidence
- 0.01 = strong evidence
- 0.001 = very strong evidence
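One possible way to apply the graded scale, reading each threshold (0.1, 0.05, 0.01, 0.001) as "p below this value". The function name `evidence_label` and the label for p of 0.1 or more are assumptions, not part of the card.

```python
def evidence_label(p):
    """Map a p-value to a strength-of-evidence label (evidence against the null)."""
    if p < 0.001:
        return "very strong"
    if p < 0.01:
        return "strong"
    if p < 0.05:
        return "moderate"
    if p < 0.1:
        return "weak"
    # Assumption: the card gives no label for p >= 0.1
    return "little or no evidence"

print(evidence_label(0.03))    # moderate
print(evidence_label(0.0004))  # very strong
```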
two types of data
numerical and categorical
two types of numerical data
continuous or count
continuous data
can in theory take any value e.g. blood pressure
count data
only integer values; a count of discrete things e.g. number of children
two types of categorical data
nominal and ordinal
nominal
things with NO INHERENT order e.g. eye colour
ordinal
things WITH INHERENT order e.g. large/small
numerical data
anything that can be expressed in numbers
categorical data
things that don’t have an inherent numerical value
what are descriptive stats used for
to describe data in a sample: range and most typical data
inferential stats
used to draw inferences about the population from the sample e.g. will the drug work?
how can categorical data be displayed
frequency tables or graphical means (pie and bar charts)
How can numerical data be displayed
tabulating is not practical because continuous data can take any possible value and count data may have many possible values (e.g. age is continuous: to the nearest second there are over 3 x 10^9 possible ages between 0 and 100)
-> therefore data is grouped e.g. into age brackets
in this way continuous data is turned into categorical data e.g. for histograms
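A small sketch of the grouping idea, using made-up ages and 10-year brackets (the bracket width and the helper name `bracket` are assumptions). It also checks the "over 3 x 10^9" figure, taking a year as 365.25 days.

```python
from collections import Counter

# Check on the card's figure: seconds in 100 years (assuming 365.25 days/year)
seconds_in_100_years = 100 * 365.25 * 24 * 60 * 60
print(seconds_in_100_years > 3e9)  # True

ages = [3.2, 17.5, 25.1, 25.9, 31.0, 38.4, 42.7, 47.3, 59.8, 64.0, 71.6, 88.2]  # made-up data

def bracket(age, width=10):
    """Label the bracket an age falls in; the upper bound is exclusive."""
    start = int(age // width) * width
    return f"{start}-{start + width}"

# Count how many ages fall in each bracket (continuous -> categorical)
counts = Counter(bracket(a) for a in ages)

# Crude text histogram: one '#' per observation in the bracket
for start in range(0, 100, 10):
    label = f"{start}-{start + 10}"
    print(f"{label:>6} {'#' * counts[label]}")
```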
numerical data can be displayed using
histograms
SD
the spread of data around the mean
mean
sum of obs / no. of obs
median
middle observation
mode
most frequently occurring value
interquartile range
range covering middle 50% of data
range
range covering all data
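The summary measures above can be tried out with Python's standard `statistics` module. The data are made up, and two choices here are assumptions: `stdev` is the sample SD (divides by n - 1), and `quantiles` uses the module's default quartile method, so other textbooks or packages may give slightly different quartiles.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]  # made-up observations

mean = statistics.mean(data)      # sum of obs / no. of obs = 42 / 7
median = statistics.median(data)  # middle observation once sorted
mode = statistics.mode(data)      # most frequently occurring value
sd = statistics.stdev(data)       # sample SD: spread of data around the mean

# Quartiles split the sorted data into four parts; IQR covers the middle 50%
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

data_range = max(data) - min(data)  # covers all the data

print(mean, median, mode, round(sd, 2), iqr, data_range)
```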