C15 Flashcards
Looking for patterns and relationships
Bivariate descriptive analysis is about describing the relationship or association between two variables, and about measuring the strength of that association.
A measure of association, like a measure of central tendency or variation, is a summary statistic: a single number that tells you something - in this case about the relationship / association / correlation between two variables.
A measure of association will tell you whether or not there is a relationship between two variables - it will not tell you which variable influences the other. Just because there is a relationship or association between two variables does not mean that the relationship is causal, that one causes the other. They may co-vary - one may follow the other, and they may be strongly correlated - but it is possible to observe covariance and correlation without there being a causal relationship. The relationship may be spurious (not causally related at all): the correlation may be the result of another, extraneous / confounding variable.
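As an illustrative sketch (not from the text - the variables and figures are invented), the following Python snippet simulates two variables that are strongly correlated only because both are driven by a third, confounding variable:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# A confounding variable that drives both x and y
confounder = rng.normal(size=n)

# x and y have no direct causal link; each simply follows the confounder
x = confounder + rng.normal(scale=0.5, size=n)
y = confounder + rng.normal(scale=0.5, size=n)

# The correlation is strong even though x does not cause y (or vice versa)
r = np.corrcoef(x, y)[0, 1]
print(f"Correlation between x and y: {r:.2f}")  # typically around 0.8
```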
At a basic level a measure of association tells you whether the relationship between two variables is positive (as the value of one increases, the value of the other increases) or negative (as the value of one increases, the value of the other decreases).
There are several measures of association. Choosing which to use depends largely on the level of measurement of the variables - nominal / ordinal or interval / ratio.
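As a rough sketch of how that choice might look in practice (the data below is made up), scipy offers Pearson's r for interval / ratio variables, Spearman's rho for ordinal variables, and a chi-square based measure such as Cramer's V for nominal variables:

```python
import numpy as np
from scipy import stats

# Interval / ratio variables: Pearson's r
height = np.array([150, 160, 165, 170, 180, 185])
weight = np.array([55, 60, 66, 70, 80, 82])
r, _ = stats.pearsonr(height, weight)

# Ordinal variables: Spearman's rho (rank based)
satisfaction = np.array([1, 2, 2, 3, 4, 5])   # e.g. 1 = very poor ... 5 = very good
loyalty = np.array([1, 1, 3, 3, 4, 5])
rho, _ = stats.spearmanr(satisfaction, loyalty)

# Nominal variables: chi-square on a crosstab, then Cramer's V
crosstab = np.array([[30, 10],    # e.g. gender by newspaper choice
                     [20, 40]])
chi2, p, dof, expected = stats.chi2_contingency(crosstab)
cramers_v = np.sqrt(chi2 / (crosstab.sum() * (min(crosstab.shape) - 1)))

print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}, Cramer's V = {cramers_v:.2f}")
```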
Inferential analysis
If you have findings from a sample that you want to generalise - to talk about the findings in terms of the population and not just the sample - then you should use inferential analysis. You can generalise from your sample to the wider population with some conviction if you know that your sample is truly representative of its population.
The sample must be a random / probability sample, and in doing the research you must have achieved a response rate of 65% (to ensure the sample is representative). If your sample is a quota sample or has a poor response rate, it is not advisable to use inferential analysis.
Even with a random sample there is a chance that it is not truly representative, and therefore you cannot be certain that the findings apply to the population. Inferential analysis - and the statistical tests that are part of it - tells you the probability that differences have arisen by chance rather than being real.
Parametric and non-parametric tests
Samples must meet conditions such as an adequate response rate, random sampling and a normal distribution of the variable (in both groups if using bivariate analysis) in order to use parametric tests. When these conditions are not met you would use a non-parametric test. If you are using a metric variable it is likely to meet the condition of normal distribution; categorical variables are less likely to, so it is better to use a non-parametric test with them.
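A minimal sketch (with invented scores) of a parametric independent-samples t-test alongside its non-parametric counterpart, the Mann-Whitney U test, in scipy:

```python
import numpy as np
from scipy import stats

# Scores for two groups (made-up data)
group_a = np.array([23, 25, 28, 30, 31, 35, 36, 40])
group_b = np.array([18, 20, 22, 24, 26, 27, 29, 33])

# Parametric: independent-samples t-test
# (assumes roughly normal distributions in both groups)
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Non-parametric alternative: Mann-Whitney U test
# (works on ranks, so it makes no normality assumption)
u_stat, u_p = stats.mannwhitneyu(group_a, group_b)

print(f"t-test:       t = {t_stat:.2f}, p = {t_p:.3f}")
print(f"Mann-Whitney: U = {u_stat:.1f}, p = {u_p:.3f}")
```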
Significance levels, confidence levels and confidence intervals
The significance level is the point at which the sample finding or statistic differs too much from the population expectation to have occurred by chance - the difference cannot be explained by random error or sampling variation and is accepted as a true / statistically significant difference.
At the 5% significance level there is a 1 in 20 chance that the result / finding has occurred by chance. It can also be expressed as a confidence level, which in this case would be 95%. It tells people how confident you are about population estimates based on sample statistics / data.
A confidence interval is a range of values around the sample value within which you expect the population value to lie. The extremes of this interval are called confidence limits. The higher the level of confidence, the wider the confidence interval will be: the more confident you want to be that the interval contains the population value, the less precise the estimate will be.
You can narrow the confidence interval and maintain a high confidence level if you increase the size of the sample. This is not often done, as conducting research with large samples is resource heavy (time & £) and the trade-off between precision and price may not be worth it. With a larger sample you also run the risk of introducing a greater level of non-sampling error.
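A minimal sketch, using simulated data, of a 95% confidence interval for a sample mean and of how a larger sample narrows the interval:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def mean_confidence_interval(sample, confidence=0.95):
    """Return (mean, lower limit, upper limit) using the t distribution."""
    mean = sample.mean()
    sem = stats.sem(sample)                  # standard error of the mean
    lower, upper = stats.t.interval(confidence, len(sample) - 1,
                                    loc=mean, scale=sem)
    return mean, lower, upper

# Same population, two different sample sizes
small_sample = rng.normal(loc=50, scale=10, size=30)
large_sample = rng.normal(loc=50, scale=10, size=300)

for label, sample in [("n = 30", small_sample), ("n = 300", large_sample)]:
    mean, lo, hi = mean_confidence_interval(sample)
    print(f"{label}: mean = {mean:.1f}, 95% CI = ({lo:.1f}, {hi:.1f})")
```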
Significance tests
Significance tests can be used during bivariate descriptive / explanatory analysis to determine whether relationships, associations, correlations and influences seen in the sample exist in the population from which it was drawn.
Because you cannot prove an empirical assertion, you instead try to disprove it by testing the null hypothesis. If a significance test leads you to reject the null hypothesis then you can accept the alternative / research hypothesis. If you fail to reject the null hypothesis then you cannot accept the research hypothesis.
Procedure for hypothesis testing:
1. Formulate the specific research hypothesis
2. State the null hypothesis
3. Set the significance level
4. Choose an appropriate significance test
5. Apply the test and obtain the test statistic
6. Interpret the test statistic (determine the probability associated with it, or compare it to the critical value)
7. Accept / reject the null hypothesis
8. State the finding in the context of the research hypothesis and research problem
9. Draw a conclusion
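A hedged sketch of the procedure applied to a made-up example - a chi-square test of association between two nominal variables (the crosstab figures are invented purely for illustration):

```python
import numpy as np
from scipy import stats

# 1-2. Research hypothesis: the two variables are associated.
#      Null hypothesis: there is no association in the population.
# 3.   Significance level: 5% (0.05).
alpha = 0.05

# 4.   Appropriate test for two nominal variables: chi-square test of association.
crosstab = np.array([[45, 15],
                     [25, 35]])

# 5.   Apply the test and obtain the test statistic.
chi2, p_value, dof, expected = stats.chi2_contingency(crosstab)

# 6-7. Interpret the statistic and accept / reject the null hypothesis.
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject the null hypothesis: the association is statistically significant.")
else:
    print("Fail to reject the null hypothesis.")

# 8-9. The finding would then be stated in terms of the original research
#      hypothesis and a conclusion drawn about the research problem.
```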
Type I & II errors
A Type I error is made if you reject a null hypothesis when it is in fact true and should have been accepted; a Type II error is made when you accept a null hypothesis when it should have been rejected. You can reduce the risk of a Type I error by setting the significance level at 1% or 0.1%.
In setting significance levels you must reach a compromise between the two types of error: a more stringent level (e.g. 1%) reduces the risk of a Type I error but increases the risk of a Type II error.
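A rough simulation sketch of that trade-off (all figures invented): when the null hypothesis is true, the Type I error rate tracks the significance level; when it is false, lowering the significance level increases the Type II error rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_trials, n = 2000, 30

for alpha in (0.05, 0.01):
    type_i = type_ii = 0
    for _ in range(n_trials):
        # Null true: both groups drawn from the same population
        a = rng.normal(0, 1, n)
        b = rng.normal(0, 1, n)
        if stats.ttest_ind(a, b).pvalue < alpha:
            type_i += 1          # rejected a true null -> Type I error

        # Null false: the groups really do differ
        c = rng.normal(0, 1, n)
        d = rng.normal(0.5, 1, n)
        if stats.ttest_ind(c, d).pvalue >= alpha:
            type_ii += 1         # failed to reject a false null -> Type II error

    print(f"alpha = {alpha}: Type I rate ~ {type_i / n_trials:.3f}, "
          f"Type II rate ~ {type_ii / n_trials:.3f}")
```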