Stage 5 Flashcards by Tessa Glass

What is the purpose of descriptive statistics?

To organise, summarise and describe various measures of a sample. (Descriptive statistics do not allow you to predict anything about the population from which the sample is taken)

What are the common forms of analysis for descriptive statistics?

1. Measures of central tendency (mean, median, mode) 2. Measures of dispersion (range, interquartile range, standard deviation) 3. Comparative measures of dispersion, used to compare multiple samples of the same variable

Define the mean.

The sum of all the values in a data set divided by the total number of values
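A minimal Python sketch of this definition, assuming a plain list of numbers (the function name `mean` and the sample data are illustrative only). The second call shows how a single extreme value drags the mean upwards.

```python
def mean(values):
    """Sum of all the values divided by the number of values."""
    if not values:
        raise ValueError("cannot take the mean of an empty data set")
    return sum(values) / len(values)


data = [2, 4, 4, 5, 10]
print(mean(data))               # 5.0
print(mean([2, 4, 4, 5, 100]))  # 23.0: one extreme value distorts the mean
```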

What are the advantages of using the mean value?

1. Most widely used measure of average 2. Easy to calculate 3. Often used/needed for further statistical analysis

What are the disadvantages of using the mean value?

1. Distorted by extreme values 2. Unreliable if only a few values are used in the calculation

Define the median.

The middle value of a data set when the values are arranged in rank order. It is easily found if values are plotted on a dispersion graph
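A minimal Python sketch of finding the median by ranking the data, assuming a plain list of numbers (names and data are illustrative). With an even number of values it averages the two central values, which is the usual convention; the second call shows the median ignoring an extreme value.

```python
def median(values):
    """Middle value once the data are arranged in rank order.

    With an even number of values there is no single middle value,
    so the two central values are averaged (the usual convention).
    """
    if not values:
        raise ValueError("cannot take the median of an empty data set")
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


print(median([10, 2, 5, 4, 4]))      # 4
print(median([100, 2, 5, 4, 4]))     # 4: unaffected by the extreme value
print(median([2, 4, 4, 5, 10, 12]))  # 4.5: mean of the two middle values
```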

What are the advantages of using the median value?

1. Unaffected by changes in extreme values 2. More representative of the whole data set if the data set is extremely skewed

What are the disadvantages of using the median value?

1. Equal weight is given to each item in the data set regardless of the value of that item 2. Of limited use in further data analysis

Define the mode.

The value that occurs most frequently in a data set
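A minimal Python sketch using the standard library's `collections.Counter`; the function name `modes` is illustrative. It returns every most-frequent value, because a data set can have more than one mode. Note that if every value occurs once, it returns them all, which many textbooks would instead describe as having no mode.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most frequently (there may be several)."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([2, 4, 4, 5, 10]))  # [4]
print(modes([1, 1, 3, 3, 7]))   # [1, 3]: bimodal
```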

What are the advantages of using the mode value?

1. Simplest measure 2. Modal class can be useful when data are grouped

What are the disadvantages of using the mode value?

Often more than one mode in a data set

What is the purpose of inferential statistics?

1. To infer (i.e. predict) population parameters from a sample 2. Involves testing a hypothesis by assessing the probability that a result has occurred by chance (i.e. through random factors), so that a significant result can be distinguished from a chance one

What are the common forms of analysis for inferential statistics?

1. Tests of difference between two samples 2. Tests of correlation between two samples 3. Tests of association within or between samples

What are the 7 stages of the methodology used for inferential statistics?

1. State a null hypothesis 2. State an alternative hypothesis 3. State the rejection level 4. Work out the calculated statistic 5. Compare the calculated statistic with the critical statistic 6. Decision making 7. Write the conclusion

What is Spearman’s Rank correlation coefficient?

• Measures the strength of the relationship between two variables • Allows you to state the apparent trend • Varies from -1 to +1 and is used with paired data, i.e. equal numbers of values in each set (at least 12 pairs)
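A minimal Python sketch of the usual classroom formula, rs = 1 - 6Σd² / (n(n² - 1)), where d is the difference between the two ranks of each pair. The function names are hypothetical. Two assumptions: tied values share the mean of their ranks (a common convention, under which the formula becomes an approximation), and ranking runs from smallest = 1 (ranking from largest = 1 gives the same rs, provided both sets are ranked the same way). Deciding significance still requires comparing rs with a critical value table, which is not reproduced here.

```python
def rank(values):
    """Rank values from 1 (smallest) upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation coefficient for paired data sets x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired data sets of equal length")
    rank_x, rank_y = rank(x), rank(y)
    n = len(x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


print(spearman_rs([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0: perfect positive
print(spearman_rs([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # -1.0: perfect negative
```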

What are the advantages of using Spearman’s Rank?

1. Shows whether the correlation is statistically significant 2. Tests for the presence or absence of a correlation (supporting or rejecting the null hypothesis) 3. Allows for further analysis 4. Doesn't assume a normal distribution

What are the disadvantages of using Spearman’s Rank?

1. Does not tell you the cause of the relationship 2. Does not allow you to predict one variable from another 3. Does not explain anything about anomalies in the relationship 4. Quite a complicated formula, so it can be difficult to work out 5. Needs two sets of paired data for the test to be performed 6. Uses ranks rather than the actual values, so it does not analyse the size of the values themselves

Name 4 examples of using inferential statistics.

1. Spearman’s Rank correlation coefficient 2. Best-fit lines and linear regression 3. Mann Whitney U Test 4. Chi Square

What are best-fit lines and linear regression?

• Highlights the trend in the scatter graph • The steepness (gradient) of the best-fit line shows the rate of change in y as x changes • Best-fit lines are calculated by linear regression: the line is positioned so that the sum of the squared vertical distances (residuals) between it and the points is at a minimum (least squares) • Calculated using the regression formula
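A minimal Python sketch of a least-squares regression line y = a + bx, assuming the standard formulas b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄. The function names and data are hypothetical. It also returns the residuals, whose size can be used to spot anomalous values.

```python
def least_squares_line(x, y):
    """Return (gradient, intercept) of the least-squares best-fit line y = a + b*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need paired x and y values (at least two points)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical, so the gradient is undefined")
    b = sxy / sxx            # gradient: rate of change in y as x changes
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return b, a


def residuals(x, y, b, a):
    """Vertical distance of each observed y from the line; large values flag anomalies."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


x = [1, 2, 3]
y = [1, 2, 2]
b, a = least_squares_line(x, y)
print(b, a)  # gradient about 0.5, intercept about 0.667
print(residuals(x, y, b, a))
```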

What are the advantages of using lines of best-fit and linear regression?

Shows the relationship between two variables
Can be used for prediction, so gaps in the data can be estimated
Identifies anomalies in the data: the bigger the residual (the distance from the regression line to a value of y), the more anomalous the value
Residuals can be mapped to see whether there is a spatial pattern or another explanation for anomalies

What are the disadvantages of using lines of best-fit and linear regression?

Linear regression can only be used for straight-line relationships, and some relationships are non-linear
It is only an estimate and could be affected by sampling error
An extraneous (third) variable might be affecting the data set, so x and y may not be directly related to each other

What is the Mann Whitney U Test?

• Tests whether there is a significant difference between two samples of data (e.g. the same variable measured in two places) • Can be used when there are unequal numbers of data in each set
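A minimal Python sketch of the U statistic, assuming the common method: rank both samples together (tied values share the mean of their ranks), sum the ranks of sample A, then U_A = R_A - n_A(n_A + 1)/2 and U_B = n_A·n_B - U_A. The function names are hypothetical. The smaller U is taken as the test statistic; a smaller U means less overlap between the samples, so the null hypothesis is rejected when U is at or below the critical table value (not reproduced here).

```python
def rank(values):
    """Rank values from 1 (smallest) upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return the Mann Whitney U statistic (the smaller of U_a and U_b)."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both samples need at least one value")
    combined_ranks = rank(list(sample_a) + list(sample_b))
    rank_sum_a = sum(combined_ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a  # U_a + U_b always equals n_a * n_b
    return min(u_a, u_b)


print(mann_whitney_u([1, 2, 3], [4, 5, 6, 7]))  # 0.0: samples do not overlap at all
print(mann_whitney_u([1, 3, 5], [2, 4, 6]))     # 3.0: samples interleave
```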

What are the advantages of using the Mann Whitney U Test?

Does not assume that the data are normally distributed (it is non-parametric)
Can be used when the two samples contain different numbers of values
Works with ordinal (ranked) data as well as measured values
Relatively straightforward to calculate by hand

What are the disadvantages of using the Mann Whitney U Test?

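Time consuming to rank large data sets by hand
Uses ranks rather than the actual values, so information about the size of the differences is lost
Less powerful than Student's t-test when the data are normally distributed
Needs a reasonable number of values in each sample; very small samples rarely give a significant result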

What is Chi Square?

• Measures 'goodness of fit' (i.e. whether data are in line with theoretical predictions) • Compares the data collected (observed values) with the values expected under the hypothesis being tested • Measures the association/difference between two variables
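A minimal Python sketch of χ² = Σ (O - E)² / E, assuming raw frequency counts. For a test of association, the expected value of each cell of a contingency table is taken as row total × column total ÷ grand total. The function names and counts are hypothetical; the result would then be compared with a critical value for the stated degrees of freedom.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over every category."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_from_table(table):
    """Expected frequencies for a contingency table: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Goodness of fit: 60 observations expected to be spread evenly across 3 categories.
print(chi_squared([30, 10, 20], [20, 20, 20]))  # 10.0, with 3 - 1 = 2 degrees of freedom

# Association: a 2 x 2 table of raw counts (not percentages).
table = [[10, 20], [20, 10]]
expected = expected_from_table(table)
flat_observed = [v for row in table for v in row]
flat_expected = [v for row in expected for v in row]
print(chi_squared(flat_observed, flat_expected))  # about 6.67, with (2 - 1) * (2 - 1) = 1 degree of freedom
```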

What are the advantages of using Chi Square?

1. Can test association between variables 2. Identifies differences between observed and expected values

What are the disadvantages of using Chi Square?

1. Can't use percentages (raw frequency counts are required) 2. A large number of observations is needed for the test to be considered valid (commonly, no expected value should be below 5) 3. Difficult formulae

What are the 3 main groups of statistics?

1. descriptive statistics 2. inferential statistics 3. spatial statistics

What is nearest neighbour analysis?

• Quantifies the spatial distribution of data points • Measures point patterns in space so they can be described as clustered, random or uniform • The index (Rn) varies from clustered (0) through random (1.0) to uniform (2.15)
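A minimal Python sketch of the usual nearest neighbour index, Rn = 2·d̄·√(n/A), where d̄ is the mean distance from each point to its nearest neighbour, n is the number of points and A is the study area (distances and area must use the same units). The function name and coordinates are hypothetical, and no correction is made for edge effects at the study area boundary. It uses `math.dist`, available from Python 3.8.

```python
import math


def nearest_neighbour_index(points, area):
    """Rn = 2 * d_bar * sqrt(n / A), where d_bar is the mean nearest-neighbour distance."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    nearest = []
    for i, p in enumerate(points):
        # Shortest straight-line distance from point p to any other point.
        nearest.append(min(math.dist(p, q) for j, q in enumerate(points) if j != i))
    d_bar = sum(nearest) / n
    return 2 * d_bar * math.sqrt(n / area)


# Four points on a regular grid, 1 unit apart, in an area of 4 square units.
print(nearest_neighbour_index([(0, 0), (1, 0), (0, 1), (1, 1)], area=4))  # 2.0: towards uniform
```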

What are the advantages of using nearest neighbour analysis?

1. Identifies clusters which can be further analysed 2. Identifies changes of distribution over time 3. Useful for comparisons in different areas

What are the disadvantages of using nearest neighbour analysis?

1. Cannot be used reliably in irregularly shaped areas, or where rivers or relief barriers separate nearest neighbours 2. Not always valid for comparison, because settlements vary in population and land area

What is location quotient?

• Measures the concentration of the population of a small area engaged in a particular activity, relative to the wider geographical area in which it is located • Mostly used to measure the concentration of economic activity in an area or region compared with the national average
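A minimal Python sketch of the commonly used location quotient formula, LQ = (local share of the activity) ÷ (national share of the activity). The function name and employment figures are hypothetical. An LQ above 1 means the activity is more concentrated locally than nationally; below 1 means it is under-represented.

```python
def location_quotient(local_in_activity, local_total, national_in_activity, national_total):
    """LQ = (local share of the activity) / (national share of the activity)."""
    local_share = local_in_activity / local_total
    national_share = national_in_activity / national_total
    return local_share / national_share


# 200 of 1,000 local workers in manufacturing (20%) vs 5,000 of 50,000 nationally (10%).
print(location_quotient(200, 1_000, 5_000, 50_000))  # 2.0: twice the national concentration
```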

Gini Coefficient or Index of Dissimilarity

• A statistical measure of the degree of variation represented in a set of values • Used especially in analysing income inequality (Gini coefficient) or the degree of segregation in a town or city (index of dissimilarity) • Both range from 0 to 1; the closer the index is to 1, the more uneven the distribution
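A minimal Python sketch of both measures, assuming common textbook definitions: the Gini coefficient as the mean absolute difference between all pairs of values divided by twice the mean, and the index of dissimilarity as half the sum, over areas, of the absolute differences between each group's share. Function names and data are hypothetical. With this Gini formula, a finite set of n values can reach at most (n - 1)/n, not exactly 1. The index of dissimilarity is sometimes quoted as a percentage (multiplied by 100).

```python
def gini_coefficient(values):
    """Mean absolute difference between all pairs, divided by twice the mean (0 = perfect equality)."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        raise ValueError("mean is zero, so the coefficient is undefined")
    total_abs_diff = sum(abs(a - b) for a in values for b in values)
    return total_abs_diff / (2 * n * n * mean)


def index_of_dissimilarity(group_a, group_b):
    """0.5 * sum |a_i/A - b_i/B| over areas (0 = even spread, 1 = complete segregation)."""
    total_a, total_b = sum(group_a), sum(group_b)
    return 0.5 * sum(abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b))


print(gini_coefficient([10, 10, 10, 10]))          # 0.0: perfectly equal incomes
print(gini_coefficient([0, 0, 0, 10]))             # 0.75: one person holds all income
print(index_of_dissimilarity([50, 50], [50, 50]))  # 0.0: both groups spread evenly
print(index_of_dissimilarity([100, 0], [0, 100]))  # 1.0: complete segregation
```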

When would you use descriptive statistics?

When you need to organise, summarise or describe data. It cannot be used to predict patterns.

When would you use inferential statistics?

When you need to predict (infer) population parameters from a sample. Used when testing a hypothesis against the probability of a result occurring by chance, in order to show whether the result is significant rather than the product of random factors.

Inferential statistics - Give 2 examples of tests of difference between two samples.

1. Mann Whitney U test 2. Student's t-test

Inferential statistics - Give 2 examples of tests of correlation between two samples.

1. Spearman's Rank Correlation Coefficient 2. Pearson's Product Moment Correlation Coefficient

Inferential statistics - Give 2 examples of tests of association between two samples.

1. Chi-squared test

Descriptive statistics - Why are comparative measures of dispersion needed?

Because it is unfair to compare IQRs and SDs directly across data sets, as the value of each is strongly influenced by the size of the median and the mean respectively.

Descriptive statistics - Name 2 examples of comparative measures of dispersion.

1. index of variability 2. coefficient of variation

Descriptive statistics - Comparative measures of dispersion - how does the index of variability work?

Index of variability expresses the quartile deviation as a % of the median of a data set.
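A minimal Python sketch, assuming the quartile deviation is half the interquartile range, (Q3 - Q1) / 2. Quartiles are taken from the standard library's `statistics.quantiles` with its default 'exclusive' method; other quartile conventions give slightly different answers. The function name and data are hypothetical, and a median of zero would make the index undefined.

```python
import statistics


def index_of_variability(values):
    """Quartile deviation (half the interquartile range) as a percentage of the median."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    quartile_deviation = (q3 - q1) / 2
    return quartile_deviation / statistics.median(values) * 100


print(index_of_variability([2, 4, 4, 5, 10]))  # 56.25 with this quartile method
```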

Descriptive statistics - Comparative measures of dispersion - how does the coefficient of variation work?

Coefficient of variation expresses the standard deviation as a % of the mean of a data set.
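A minimal Python sketch using the standard library's `statistics` module. Whether to use the sample standard deviation (dividing by n - 1, `statistics.stdev`) or the population standard deviation (dividing by n, `statistics.pstdev`) varies between courses, so it is left as a parameter here; the function name and data are hypothetical.

```python
import statistics


def coefficient_of_variation(values, sample=True):
    """Standard deviation as a percentage of the mean.

    sample=True uses the sample standard deviation (divides by n - 1);
    sample=False uses the population standard deviation (divides by n).
    """
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return sd * 100 / statistics.mean(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(coefficient_of_variation(data, sample=False))  # 40.0: population SD of 2 around a mean of 5
print(coefficient_of_variation(data))                # slightly larger, using the sample SD
```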
