statistics Flashcards
what is modelling?
the process of setting up and using mathematical equations to describe and make predictions about the real world.
what must we recognise when dealing with models?
all mathematical models make simplifying assumptions and thus limitations must be considered
important exam technique for dealing with modelling questions?
read carefully, underline key terminology and decode into mathematical meaning
how do you formulate a linear model?
use given values and constants to fit an equation into the y = mx + c format. usually requires solving simultaneous equations
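a minimal sketch of the simultaneous solving step, assuming the question gives two (x, y) pairs that the model must pass through. the function name and the taxi figures are made up for illustration:

```python
def linear_model_from_two_points(x1, y1, x2, y2):
    """Solve y1 = m*x1 + c and y2 = m*x2 + c simultaneously for m and c."""
    if x1 == x2:
        raise ValueError("the two x values must differ to define a gradient")
    m = (y2 - y1) / (x2 - x1)  # subtracting the two equations eliminates c
    c = y1 - m * x1            # substitute m back into the first equation
    return m, c


# hypothetical data: a journey of 2 miles costs 7, a journey of 5 miles costs 13
m, c = linear_model_from_two_points(2, 7, 5, 13)
print(f"y = {m}x + {c}")
```

in this made-up context m would be read as the cost per mile and c as the fixed charge.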
what needs to be remembered when interpreting models, evaluating and explaining assumptions/limitations?
be specific. state what constants mean in relation to the actual situation, contrast actual values with model values when evaluating and contextualise limitations to real world scenarios
when is a linear regression model used?
when there is a strong enough correlation between the variables that the points cluster around a straight line, so a linear equation can be fitted to them
regression model in layman's terms and uses
line of best fit used to predict the y value from a known x value
what is the explanatory variable and where is it plotted?
independent variable plotted on the x axis - used to explain changes on the y axis
what is the response variable and where is it plotted?
dependent variable plotted on the y axis - responds to changes in the explanatory variable
full name and official form of the regression line?
least squares regression line
y = a + bx
dependent variable always the subject.
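a rough sketch of how a and b in y = a + bx can be found by least squares. the names s_xx and s_xy are notation assumed here, not taken from these cards, and in an exam the values would normally come from a calculator:

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares regression line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # sums of products of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are equal, so no gradient can be found")
    b = s_xy / s_xx        # gradient
    a = y_bar - b * x_bar  # the line passes through (x_bar, y_bar)
    return a, b


# made-up data lying exactly on y = 1 + 2x
a, b = least_squares_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(f"y = {a} + {b}x")
```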
what is interpolation?
using the regression line to predict values within the range of the data. we know the relationship between the variables over the spread of our data, so the line can be used with reasonable confidence to predict values within that interval
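a small sketch of checking whether a prediction counts as interpolation, assuming a and b are already known. the helper name and numbers are hypothetical:

```python
def predict(a, b, x_new, xs):
    """Return (predicted y, True if x_new lies within the observed x range)."""
    y_hat = a + b * x_new
    within_data = min(xs) <= x_new <= max(xs)
    return y_hat, within_data


xs = [1, 2, 3, 4, 5]               # x values the line was fitted to
print(predict(1.0, 2.0, 3.5, xs))  # x inside the data range
print(predict(1.0, 2.0, 12, xs))   # x outside the data range
```

a False flag means the prediction relies on the pattern continuing beyond the data, so it should be treated with more caution.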
what is bivariate data?
every data item is a pair of values, the association between these is called correlation
what are the degrees of correlation and approximate pmccs?
what is the pmcc?
product-moment correlation coefficient - the measure of the strength of linear correlation, called r for a sample.
it varies between -1 and 1: |r| = 1 is a perfect correlation (the sign gives the direction) and 0 is no linear correlation
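a sketch of how r could be calculated from raw data, using the usual sums of squared deviations. the variable names are assumptions, and a calculator would normally do this step:

```python
import math


def pmcc(xs, ys):
    """Product-moment correlation coefficient r for paired sample data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return s_xy / math.sqrt(s_xx * s_yy)


print(pmcc([1, 2, 3], [2, 4, 6]))  # points on a rising straight line
print(pmcc([1, 2, 3], [6, 4, 2]))  # points on a falling straight line
```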
how do you interpret the relationship between variables?
full interpretation of r in both statistical and non statistical language - mention what the variables actually are in both observations:
eg.
“there is a strong positive correlation between height and weight of british men”(stat language)
“taller men tend to be heavier”(non stat language)
give both stat and non stat interpretations
important note when observing correlation?
correlation does not always imply causation
what is hypothesis testing?
test to see whether a sample set of data supports a claim about a population - sees if some kind of change has affected the population.
hypothesis testing is a way of deciding whether something is unusual
what is a population parameter?
value that describes whole population
what is a null hypothesis (H0)?
statement about a population parameter, normally fixed depending on the type of parameter being tested
what is alternative hypothesis (H1)?
statement about pop. parameter: determined by what kind of claim is being tested
what is a significance level?
the probability of rejecting the null hypothesis when it is actually true, i.e. the proportion of samples that would give an alternative hypothesis outcome purely by chance due to natural variation
what is the critical value? (stats)
pre-calculated “limiting value” for a given significance level for a particular hypothesis test; found in a table
what is the critical region?
range of values for the test stat which are “significant” (unlikely - according to the significance level - to happen by chance)
what is the p value?
probability of getting a sample result at least as extreme as the one observed purely due to natural variation, assuming the pop. parameter in the null hypothesis is correct
what does a hypothesis test do?
compares a test stat with a critical value (or a p value with a significance level) to decide if the chance of the result happening due to natural variation is small enough to suggest there is evidence for the alternative hypothesis.
does not prove anything is true or false on its own, used to suggest whether a further investigation is useful
a pmcc hypothesis test checks whether the pmcc (r) of a sample indicates a linear relationship
what does greek rho denote? (stats)
pmcc of a population, not sample
what is the null hypothesis?
no correlation between 2 variables; pmcc = 0
denoted by H0: ρ = 0
what is a one-tailed test?
test to see if there is correlation in a particular (+ or -) direction
what is a two-tailed test?
tests for relationship between 2 variables in either direction
example of one-tailed alt. hypothesis?
H1: ρ > 0
or
H1: ρ < 0
example of two-tailed alt. hypothesis?
H1: ρ ≠ 0
what is the test stat?
always the pmcc of the sample, r; its value is compared with the critical value for r
which test stat leads to rejection of the null hypothesis?
when the test stat is stronger (closer to |1| and perfect correlation) than the critical value
when is the null hypothesis accepted?
if the test stat is weaker (closer to 0) than the critical value
what must be done once you have decided to accept/reject H0?
state conclusion in context
what is process for conducting a pmcc hypothesis test?
- state hypotheses
- calculate test stat (pmcc of sample)
- identify critical value from table (using sample size, sig. level)
- compare test stat with critical value
- reject/accept H0
- state conclusion in context
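the comparison step in the process above could be sketched like this. the function name and the labels for the alternative hypothesis are made up, and the critical value must still be read from the table for your sample size and significance level (the 0.5494 below is assumed to be the 5% one-tailed value for n = 10, so check it against your own table):

```python
def pmcc_test(r, critical_value, alternative):
    """Return True if H0: rho = 0 is rejected.

    alternative is 'greater' (H1: rho > 0), 'less' (H1: rho < 0)
    or 'two-sided' (H1: rho != 0). critical_value is the positive
    value read from the table.
    """
    if alternative == "greater":
        return r > critical_value
    if alternative == "less":
        return r < -critical_value
    if alternative == "two-sided":
        return abs(r) > critical_value
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# hypothetical sample: n = 10, r = 0.62, testing H1: rho > 0 at 5%
reject = pmcc_test(0.62, 0.5494, "greater")
print("reject H0" if reject else "do not reject H0")
```

the code only makes the decision; the conclusion still has to be written in context.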
what does qualitative mean?
non numerical, something not given a numerical value but still data
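what does quantitative discrete mean?
numerical data that can only take specific values in set increments (usually counted)
what does quantitative continuous mean?
numerical data that is measured rather than counted, so can take any value (including decimals) within a range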
how do you compare the overall typical value of a dataset?
use an average
how do you compare variation within each dataset?
use a measure of spread
what must be compared and commented on when evidencing claims about dataset?
mode, median, IQR and range.
explain what variations in these suggest
what is standard deviation?
σx (sigma x) - measure of spread, showing how far values typically are from the mean
can be used with mean to support comparisons
why are mode and range poor measures?
- range uses the most extreme values so is affected by outliers
- there may be too many modes or none at all and repetition doesn’t indicate a typical value
symbol for mean and how to calculate from summarised data?
x̄ (x bar) - for each group multiply the frequency by the mid value of the group, add these up, then divide by the total frequency
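a short sketch of the grouped mean calculation, assuming the midpoints have already been worked out. the table values are invented:

```python
def grouped_mean(midpoints, freqs):
    """Estimate the mean of grouped data from class midpoints and frequencies."""
    total = sum(freqs)
    if total == 0:
        raise ValueError("total frequency must be positive")
    # sum of (frequency x midpoint), divided by total frequency
    return sum(m * f for m, f in zip(midpoints, freqs)) / total


# hypothetical table: classes 0-10, 10-20, 20-30 with frequencies 2, 5, 3
print(grouped_mean([5, 15, 25], [2, 5, 3]))
```

this is only an estimate, because every value in a group is treated as if it sat at the midpoint.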
where do you find how to calc sd from summary stats?
formula book
what is variance?
standard deviation squared
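a sketch using one common form of the summary-statistics formula, variance = Σx²/n - (Σx/n)². this form is an assumption here, so check it against your own formula book:

```python
import math


def variance_from_summary(n, sum_x, sum_x_sq):
    """Variance from n, the sum of x and the sum of x squared."""
    mean = sum_x / n
    return sum_x_sq / n - mean ** 2  # mean of the squares minus square of the mean


def sd_from_summary(n, sum_x, sum_x_sq):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(variance_from_summary(n, sum_x, sum_x_sq))


# made-up data 2, 4, 4, 4, 5, 5, 7, 9 gives n = 8, sum x = 40, sum x^2 = 232
print(variance_from_summary(8, 40, 232))
print(sd_from_summary(8, 40, 232))
```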
why can range and sd not be calculated from tabulated data with undefined limits?
cannot work out range as no min/max value, cannot work out sd as not all values are known
advantage of mean?
takes account of all values
disadvantages of mean?
outliers can have a significant impact
how do you calculate the median?
take the middle value (if there is an even number of values, go halfway between the middle two); for grouped data, could use linear interpolation
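for a plain list of values, the rule above can be sketched as follows (the function name is just for illustration):

```python
def median(values):
    """Middle value of the ordered data, or halfway between the middle two."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: halfway between


print(median([3, 1, 2]))
print(median([4, 1, 3, 2]))
```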
advantages of using median?
less sensitive to outliers
disadvantage of median?
does not use all values
advantage of mode?
can be used for qualitative data
disadvantages to mode?
- only relevant for high frequencies
- does not consider numerical value of data
advantages of standard deviation?
- includes all data
- takes account of all numerical values
disadvantage of standard deviation?
skewed by outliers
advantage of IQR?
less sensitive to extreme values
disadvantage of IQR?
does not use all data
advantage of range?
quick to calculate
disadvantage of range?
only uses extreme values - highly susceptible to outliers
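a quick illustration of why the range is more exposed to outliers than the IQR. the data is made up, and the quartiles come from Python's `statistics.quantiles` with its default method, which may not match the convention your exam board uses:

```python
import statistics


def range_and_iqr(values):
    """Return (range, interquartile range) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # lower quartile, median, upper quartile
    return max(values) - min(values), q3 - q1


clean = list(range(1, 12))                # 1, 2, ..., 11
with_outlier = list(range(1, 11)) + [100]  # largest value replaced by 100

print(range_and_iqr(clean))
print(range_and_iqr(with_outlier))
```

with this data the range changes a lot when the outlier is introduced, while the IQR stays the same.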
disadvantage of linear interpolation?
assumes even spread of data
how do you estimate values of grouped data using linear interpolation?
lower bound + class width × ((position of desired percentile - cumulative freq of prev. groups) ÷ group freq)
or
double number line
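the formula version can be sketched as a function. the parameter names and the example table are assumptions for illustration:

```python
def interpolate_grouped(lower_bound, class_width, position,
                        cum_freq_before, group_freq):
    """Estimate a value in grouped data by linear interpolation.

    position is the position of the desired percentile (e.g. n/2 for the
    median); cum_freq_before is the cumulative frequency of all groups
    before the one containing that position.
    """
    if group_freq <= 0:
        raise ValueError("group frequency must be positive")
    fraction = (position - cum_freq_before) / group_freq  # how far into the group
    return lower_bound + class_width * fraction


# hypothetical table: 0-10 (freq 4), 10-20 (freq 10), 20-30 (freq 6), n = 20
# median position n/2 = 10 falls in the 10-20 group, which has 4 values before it
print(interpolate_grouped(10, 10, 10, 4, 10))
```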
what is effect of adding or subtracting a value from every value in a dataset on mean and sd?
- mean: increases/decreases by same amount
- standard deviation: no effect
effect of multiplying or dividing all values in dataset by same amount on mean and sd?
- mean: multiplied/divided by same amount
- standard deviation: multiplied/divided by same amount
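both effects can be checked numerically with a small sketch. the data is invented and `statistics.pstdev` (population standard deviation) is assumed to be the version of sd intended here:

```python
import statistics


def mean_and_sd(values):
    """Return (mean, population standard deviation) of a list of numbers."""
    return statistics.mean(values), statistics.pstdev(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]
shifted = [x + 10 for x in data]  # add 10 to every value
scaled = [x * 3 for x in data]    # multiply every value by 3

print(mean_and_sd(data))     # original mean and sd
print(mean_and_sd(shifted))  # mean goes up by 10, sd unchanged
print(mean_and_sd(scaled))   # mean and sd both multiplied by 3
```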