Stage 5 Flashcards
What is the purpose of descriptive statistics?
- To organise, summarise and describe various measures of a sample. (Descriptive statistics do not allow you to predict anything about the population from which the sample is taken)
What are the common forms of analysis for descriptive statistics?
- Measures of central tendency (mean, median, mode)
- Measures of dispersion (range, interquartile range, standard deviation)
- Comparative measures of dispersion used to compare multiple samples of the same variable
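The three measures of dispersion named above can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the sample values are made up, the function name `dispersion_summary` is hypothetical, and the quartiles use the default method of the standard library's `statistics.quantiles`, which may differ slightly from the method taught in a given textbook.

```python
import statistics

def dispersion_summary(values):
    """Return the range, interquartile range and sample standard deviation."""
    ordered = sorted(values)
    value_range = ordered[-1] - ordered[0]
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _q2, q3 = statistics.quantiles(ordered, n=4)
    return {
        "range": value_range,
        "iqr": q3 - q1,
        "stdev": statistics.stdev(ordered),  # sample standard deviation
    }

# Made-up sample, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = dispersion_summary(sample)
print(summary)
```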
Define the mean.
The sum of all the values in a data set divided by the total number of values
What are the advantages of using the mean value?
- Most widely used measure of average
- Easy to calculate
- Often used/needed for further statistical analysis
What are the disadvantages of using the mean value?
- Distorted by extreme values
- Unreliable if only a few values are used in the calculation
Define the median.
The middle value of a data set when the values are arranged in rank order. It is easily found if values are plotted on a dispersion graph
What are the advantages of using the median value?
- Unaffected by changes in extreme values
- More representative of the whole data set if the data set is extremely skewed
What are the disadvantages of using the median value?
- Equal weight is given to each item in the data set regardless of the value of that item
- Of limited use in further data analysis
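The contrast between the mean and the median can be seen in a short sketch. The two data sets below are made up for illustration; the second simply replaces the largest value with an extreme one.

```python
import statistics

# Made-up values; the second list swaps the largest value for an extreme one.
values = [3, 4, 5, 6, 7]
with_extreme = [3, 4, 5, 6, 70]

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_extreme)
median_before = statistics.median(values)
median_after = statistics.median(with_extreme)

# The mean is pulled towards the extreme value; the median does not move.
print("mean:", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```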
Define the mode.
The value that occurs most frequently in a data set
What are the advantages of using the mode value?
- Simplest measure
- Modal class can be useful when data is grouped
What are the disadvantages of using the mode value?
- Often more than one mode in a data set
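A short sketch of both points about the mode: a data set can have more than one mode, and grouped data has a modal class. The values and the helper `modal_class` are hypothetical, and the equal-width grouping is an assumption made for the example.

```python
import statistics
from collections import Counter

# Made-up data set with two modes.
values = [2, 3, 3, 5, 7, 7, 9]
modes = statistics.multimode(values)  # every value tied for most frequent

def modal_class(data, class_width):
    """Group data into equal-width classes and return the most frequent one."""
    counts = Counter((v // class_width) * class_width for v in data)
    lower, _count = max(counts.items(), key=lambda item: item[1])
    return (lower, lower + class_width)

# Made-up grouped example: classes 10-20, 20-30, 30-40.
grouped = [12, 15, 21, 23, 24, 27, 35]
print(modes)
print(modal_class(grouped, 10))
```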
What is the purpose of inferential statistics?
- To infer (i.e. predict) population parameters from a sample
- Involves testing a hypothesis by working out the probability that a result occurred as a result of random factors (i.e. by chance); the result is significant if that probability is low enough
What are the common forms of analysis for inferential statistics?
- Tests of difference between two samples
- Tests of correlation between two samples
- Tests of association within or between samples
What are the 7 stages of the methodology used for inferential statistics?
1. State a null hypothesis
2. State an alternative hypothesis
3. State the rejection level
4. Work out the calculated statistic
5. Compare the calculated statistic with the critical statistic
6. Decision making
7. Write the conclusion
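Stages 5 to 7 can be sketched as a small function. This is an illustration only: the function name `hypothesis_decision` and both numbers are hypothetical, and the critical statistic here is a placeholder rather than a value from a real significance table. The direction of the comparison depends on the test: for a correlation coefficient the null hypothesis is rejected when the calculated value is at least the critical value, whereas for the Mann Whitney U Test it is rejected when U is at most the critical value.

```python
def hypothesis_decision(calculated, critical, reject_if_at_least=True):
    """Compare the calculated statistic with the critical statistic and decide."""
    if reject_if_at_least:
        # e.g. a correlation coefficient: a large magnitude is significant
        reject_null = abs(calculated) >= critical
    else:
        # e.g. Mann Whitney U: a small value is significant
        reject_null = calculated <= critical
    if reject_null:
        return "Reject the null hypothesis: significant at the stated rejection level"
    return "Do not reject the null hypothesis: the result could have occurred by chance"

# Placeholder numbers, not taken from a real table.
calculated_statistic = 0.72
critical_statistic = 0.60
conclusion = hypothesis_decision(calculated_statistic, critical_statistic)
print(conclusion)
```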
What is Spearman’s Rank correlation coefficient?
• Measures the strength of the relationship between two variables
• Allows you to state the apparent trend
• Varies from -1 to +1 and can be used when there are equal numbers of data in each set (at least 12)
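A minimal sketch of the calculation, assuming the commonly taught formula r = 1 - 6Σd² / (n(n² - 1)), where d is the difference between the two ranks of each pair and n is the number of pairs. The function names are hypothetical, tied values are given their average rank (for which the formula is only approximate), and the example uses fewer than 12 pairs purely to keep it short.

```python
def rank(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def spearman_rank(xs, ys):
    """Spearman's Rank correlation coefficient for paired data."""
    if len(xs) != len(ys):
        raise ValueError("both data sets must contain the same number of values")
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank(xs), rank(ys)))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Made-up paired data: a perfect positive and a perfect negative relationship.
print(spearman_rank([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
print(spearman_rank([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))
```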
What are the advantages of using Spearman’s Rank?
- Shows the significance of the data
- Proves/disproves correlation
- Allows for further analysis
- Doesn’t assume normal distribution
What are the disadvantages of using Spearman’s Rank?
- Does not tell you the cause of the relationship
- Does not allow you to predict one variable from another
- Does not explain anything about anomalies in the relationship
- Quite a complicated formula so can be difficult to work out
- Need 2 sets of variable data so the test can be performed
- Only looks at the ranks of the data in relation to one another, so does not analyse the actual values
Name 4 examples of using inferential statistics.
- Spearman’s Rank correlation coefficient
- Best-fit lines and linear regression
- Mann Whitney U Test
- Chi Square
What are best-fit lines and linear regression?
• A best-fit line highlights the trend in the scatter graph
• The steepness of the best-fit line shows the rate of change in y as x changes
• Best-fit lines are calculated from linear regression: the line is positioned so that the sum of the squared vertical distances between it and all the points is at a minimum
• Calculated using the regression formula
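The card does not state which regression formula is meant; the sketch below assumes ordinary least squares of y on x, where the slope is Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and the intercept is ȳ - slope × x̄. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares best-fit line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)

# The fitted line can then estimate y for an x value missing from the data.
estimate = slope * 2.5 + intercept
print(estimate)
```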
What are the advantages of using lines of best-fit and linear regression?
- Shows the relationship between two variables
- Can be used for prediction, so gaps in the data can be estimated
- Identifies anomalies in the data: the bigger the residual (the vertical distance from the regression line to an observed value of y), the more anomalous the point
- Residuals can be mapped to see if there is a spatial pattern or other explanation for anomalies
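Residuals can be sketched as the observed y minus the y predicted by the line. In this illustration the line y = 2x + 1 is simply assumed to have been fitted already, and the data points and function names are hypothetical.

```python
def residuals(xs, ys, slope, intercept):
    """Observed y minus the y predicted by the line, for each point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

def most_anomalous(xs, ys, slope, intercept):
    """Return (index, residual) of the point furthest from the line."""
    res = residuals(xs, ys, slope, intercept)
    index = max(range(len(res)), key=lambda i: abs(res[i]))
    return index, res[index]

# Made-up data; the third point sits well above the assumed line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3.1, 4.8, 10.0, 9.2]
index, largest = most_anomalous(xs, ys, slope=2, intercept=1)
print(index, largest)
```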
What are the disadvantages of using lines of best-fit and linear regression?
- Regression analysis can only be used for straight line relationships and sometimes relationships are non-linear
- The line is only an estimate and could be affected by sampling error
- There might be an extraneous (third) variable affecting the data set, so x and y may be independent of each other
What is the Mann Whitney U Test?
• Tests whether there is a significant difference between two sets of data for the same variable
• Can be used when there are unequal numbers of data in each set
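A minimal sketch of the U statistic. It counts, for every pair of values drawn one from each set, how often the first set's value is the larger (a tie counts as a half); this gives the same result as the rank-sum formula U = R - n(n + 1)/2 that is often taught. The function name and data are hypothetical, and the two sets deliberately have unequal sizes.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return the Mann Whitney U statistic (the smaller of the two U values)."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1
            elif a == b:
                u_a += 0.5
    # The two U values always sum to the number of pairs.
    u_b = len(sample_a) * len(sample_b) - u_a
    return min(u_a, u_b)

# Made-up samples of unequal size.
u = mann_whitney_u([3, 5, 8], [4, 6, 7, 9])
print(u)
```

The smaller U is then compared with the critical value for the two sample sizes; the null hypothesis is rejected when U is less than or equal to that critical value.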
What are the advantages of using the Mann Whitney U Test?
- Shows spread from the mean
- Very visual
- Indicates reliability of data
- Compare graphs easily and anomalies are clearly shown
- Easy to work out mean, median, etc
What are the disadvantages of using the Mann Whitney U Test?
- Time consuming to construct
- Standard deviation is easily manipulated and biased
- Large data set needed