topic 2 Flashcards
What are the different kinds of statistics?
Descriptive statistics, tests of difference and tests of relationship.
The latter two can be sub-divided into the categories parametric and non-parametric.
What denotes descriptive statistics?
Measures of central tendency (mean, median, mode) and measures of spread (standard deviation, range, coefficient of variation).
What denotes a statistical test of difference?
Statistically analysing whether one group is significantly different from another, i.e. how confident the researcher can be that the difference is real and that others would reach the same conclusion.
What denotes a statistical test of relationship?
Is there a correlation or association between x variable and y variable?
Note: A test of relationship is not an analysis of causation.
What are the two key questions in descriptive statistics?
What is the typical value for the data and how variable is the data.
i.e. what is the average and how much does each individual in the population being assessed deviate away from this value.
Describe the arithmetic mean, mode and median.
The arithmetic mean is the sum of all values divided by the number of values; the median is the middle number when all data has been ranked; the mode is the most frequently occurring value.
Note: the median is important for non-parametric data because it is less affected by outliers than the mean.
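As a minimal sketch of the three measures, the snippet below uses Python's standard `statistics` module on a made-up dataset (the values, including the outlier of 40, are hypothetical and chosen only to show how an outlier pulls the mean but not the median).

```python
import statistics

# Hypothetical data: five small values plus one outlier (40).
values = [2, 3, 3, 5, 7, 40]

mean = statistics.mean(values)      # sum of all values / number of values
median = statistics.median(values)  # middle of the ranked values (average of 3 and 5 here)
mode = statistics.mode(values)      # most frequently occurring value

print("mean:", mean)      # pulled upwards by the outlier
print("median:", median)  # barely affected by the outlier
print("mode:", mode)
```

Here the mean is 10 while the median is 4, which is why the median is usually preferred for skewed or non-parametric data.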
What are the 5 measures of variability?
Range (strongly affected by outliers), variance, coefficient of variation, standard deviation and confidence intervals.
Write the formula for calculating sample variance. Why would you use this instead of population variance?
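$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where $x_i$ are the sample values, $\bar{x}$ is the sample mean and $n$ is the sample size.
The sample variance is used because data normally come from a sample rather than the entire population; dividing by n - 1 instead of n corrects for the sample's tendency to underestimate the population variance.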
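The difference between the two divisors can be checked numerically. The sketch below computes both variances by hand for a hypothetical five-value sample and compares them with `statistics.variance` (sample, divides by n - 1) and `statistics.pvariance` (population, divides by n).

```python
import math
import statistics

# Hypothetical sample of five measurements.
sample = [4, 8, 6, 5, 7]

n = len(sample)
mean = sum(sample) / n
sum_sq = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations from the mean

sample_var = sum_sq / (n - 1)  # sample variance
pop_var = sum_sq / n           # population variance

# The standard library functions use the same two divisors.
assert math.isclose(sample_var, statistics.variance(sample))
assert math.isclose(pop_var, statistics.pvariance(sample))

print("sample variance:", sample_var)
print("population variance:", pop_var)
```

The sample variance is always the larger of the two, and the gap shrinks as n grows.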
What is the Standard Error?
The standard deviation of multiple sample means, i.e. a measure of how good an estimate of the population mean the sample mean is.
It is highly affected by the size of a sample; the estimate is more accurate when the sample size is large.
Write the formula for the standard error of the estimated sample mean.
SE = SD / square root of n, where n is the sample size.
Note: for a given SD, the SE therefore falls as the sample size increases.
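A short sketch of the calculation, assuming a hypothetical helper named `standard_error` (not from any library) and made-up numbers. It also shows the effect of sample size: with the SD held fixed, quadrupling n halves the SE.

```python
import math
import statistics

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

# Hypothetical sample of five measurements.
sample = [4, 8, 6, 5, 7]
sd = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
se = standard_error(sd, len(sample))
print("SD:", sd, "SE:", se)

# Same SD, different sample sizes: a larger n gives a smaller SE.
se_small = standard_error(10, 25)
se_large = standard_error(10, 100)
print(se_small, se_large)
```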
Describe the difference between SD and SE.
SD is the analysis of the variation between sample values whereas SE is the analysis of the variation between sample means.
The former estimates the spread of values around the mean, the latter analyses how good the estimate of the population (true) mean is.
What is meant by parametric statistics?
Parametric statistics assume that the data are normally distributed, i.e. about 68% of observations fall within +/- 1 SD of the mean and about 95% fall within +/- 2 SD (more precisely 1.96 SD).
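The share of a normal distribution lying within a given number of SDs of the mean can be checked with `statistics.NormalDist` from the Python standard library. The helper name `coverage` below is an illustrative assumption, not a library function.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

def coverage(k):
    """Proportion of a normal distribution within +/- k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1_sd = coverage(1)
within_196_sd = coverage(1.96)
within_2_sd = coverage(2)

print(round(within_1_sd, 3), round(within_196_sd, 3), round(within_2_sd, 3))
```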
What statistical tests are performed on parametric datasets? And what do they ordinarily analyse?
T-tests, ANOVA and linear regression.
These statistical tests can be used to assess whether the means of two or more populations differ.
What is the null hypothesis for an independent-samples T-test and which output would you utilise to either accept or reject this statement?
Null hypothesis:
The two sample means come from a population with the same true mean (i.e. the two sets of data are the same).
The output box is labelled “Independent Samples Test”; the Sig. (2-tailed) value indicates whether the null should be rejected (value below the critical threshold of 0.05) or accepted (value of 0.05 or above).
What is meant by paired data? Provide an example.
Paired data is when the 2 units in a pair are more closely linked to each other than to units in another pair, i.e. they may be dependent on one another. It therefore arises when an individual has been sampled more than once.
For example, measuring an individual before and after drug treatment OR when 2 clones have been exposed to different treatments.
How does the paired T-test work?
It divides the mean difference of the paired values by the standard error of the differences.
What is the null hypothesis for a paired T-test?
The null hypothesis for this analysis is always that there is no difference between the two samples, i.e. that the mean of the paired differences is zero.
Calculate ‘t’ and the degrees of freedom from a paired samples dataset where the mean difference = 16.3, SD = 22.7 and n = 12 pairs.
First calculate the SE: SD / square root of n = 22.7 / 3.46 = 6.6.
t = mean / SE = 16.3 / 6.6 = 2.5.
df = (n - 1) = 11.
Note: there is a significant difference between the two samples if the significance (p-value) of t is < 0.05.
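The same arithmetic as a short script. The number of pairs, n = 12, is an assumption inferred from df = 11; the variable names are illustrative.

```python
import math

mean_diff = 16.3  # mean of the paired differences
sd_diff = 22.7    # standard deviation of the paired differences
n = 12            # assumption: number of pairs, implied by df = 11

se = sd_diff / math.sqrt(n)  # standard error of the mean difference
t = mean_diff / se           # paired t statistic
df = n - 1                   # degrees of freedom

print("SE =", round(se, 1))
print("t =", round(t, 1))
print("df =", df)
```

The significance of t is then read from a t-table or statistics package at 11 degrees of freedom.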
How does the level of variation differ between independent samples T-test and the paired approach?
All variation is considered in the independent samples T-test, whereas in the paired approach the variation between individuals within each sample has no influence (only the differences between the two paired units are considered).
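To illustrate, the sketch below computes both t statistics by hand on the same hypothetical before/after data, in which individuals differ greatly from one another but each changes by a small, consistent amount. The independent version uses the pooled-variance (equal variances assumed) form; all data and variable names are made up for illustration.

```python
import math
import statistics

# Hypothetical measurements on the same five individuals.
before = [10, 20, 30, 40, 50]
after = [12, 23, 31, 43, 52]
n = len(before)

# Paired approach: only the within-pair differences matter.
diffs = [a - b for a, b in zip(after, before)]
se_paired = statistics.stdev(diffs) / math.sqrt(n)
t_paired = statistics.mean(diffs) / se_paired

# Independent approach: all variation within each sample is included.
pooled_var = (statistics.variance(before) + statistics.variance(after)) / 2  # equal n
se_indep = math.sqrt(pooled_var * (1 / n + 1 / n))
t_indep = (statistics.mean(after) - statistics.mean(before)) / se_indep

print("paired t:", round(t_paired, 2), "df:", n - 1)
print("independent t:", round(t_indep, 2), "df:", 2 * n - 2)
```

The paired t is large because the differences are consistent, while the independent t is small because the large spread between individuals swamps the same mean difference.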