Stage 5 Flashcards

1
Q

What is the purpose of descriptive statistics?

A
  1. To organise, summarise and describe various measures of a sample. (Descriptive statistics do not allow you to predict anything about the population from which the sample is taken)
2
Q

What are the common forms of analysis for descriptive statistics?

A
  1. Measures of central tendency (mean, median, mode)
  2. Measures of dispersion (range, interquartile range, standard deviation)
  3. Comparative measures of dispersion, used to compare multiple samples of the same variable
3
Q

Define the mean.

A

The sum of all the values in a data set divided by the total number of values
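The definition above translates directly into a calculation. The sketch below assumes the data are a plain list of numbers; the function name `mean` and the river-depth values are hypothetical, chosen only for illustration.

```python
def mean(values):
    """Arithmetic mean: the sum of all values divided by the number of values."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


# Hypothetical sample, e.g. river depths in cm
depths = [12, 15, 11, 18, 14]
print(mean(depths))  # 14.0
```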

4
Q

What are the advantages of using the mean value?

A
  1. Most widely used measure of average
  2. Easy to calculate
  3. Often used/needed for further statistical analysis
5
Q

What are the disadvantages of using the mean value?

A
  1. Distorted by extreme values
  2. Unreliable if only a few values are used in the calculation
6
Q

Define the median.

A

The middle value of a data set when the values are arranged in rank order. It is easily found if values are plotted on a dispersion graph
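Below is a minimal sketch of finding the median by rank order. It assumes the usual convention that the two central values are averaged when the number of values is even. The example data are hypothetical and include one extreme value, so they also show why the median resists distortion better than the mean.

```python
def median(values):
    """Middle value once the data are arranged in rank order.

    With an even number of values there is no single middle value,
    so the two central values are averaged.
    """
    if not values:
        raise ValueError("median requires at least one value")
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


data = [1, 2, 3, 4, 100]        # hypothetical data with one extreme value
print(median(data))             # 3 (unaffected by the extreme value)
print(sum(data) / len(data))    # 22.0 (the mean is pulled upwards)
```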

7
Q

What are the advantages of using the median value?

A
  1. Unaffected by changes in extreme values
  2. More representative of the whole data set than the mean if the data set is extremely skewed
8
Q

What are the disadvantages of using the median value?

A
  1. Equal weight is given to each item in the data set regardless of the value of that item
  2. Of limited use in further data analysis
9
Q

Define the mode.

A

The value that occurs most frequently in a data set
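This sketch returns every value that shares the highest frequency rather than a single value, which makes the "more than one mode" problem visible. The function name `modes` and the sample lists are hypothetical. Python's standard library also offers `statistics.multimode` for the same purpose.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most frequently (there may be several)."""
    if not values:
        raise ValueError("modes requires at least one value")
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes([3, 5, 5, 7, 9]))   # [5]
print(modes([2, 2, 4, 4, 6]))   # [2, 4] (bimodal)
```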

10
Q

What are the advantages of using the mode value?

A
  1. Simplest measure
  2. Modal class can be useful when data are grouped
11
Q

What are the disadvantages of using the mode value?

A
  1. Often more than one mode in a data set
12
Q

What is the purpose of inferential statistics?

A
  1. To infer (i.e. predict) population parameters from a sample
  2. Involves testing a hypothesis to assess the probability that a result occurred because of random factors (i.e. by chance); a significant result is one that is unlikely to have arisen by chance
13
Q

What are the common forms of analysis for inferential statistics?

A
  1. Tests of difference between two samples
  2. Tests of correlation between two samples
  3. Tests of association within or between samples
14
Q

What are the 7 stages of the methodology used for inferential statistics?

A
  1. State a null hypothesis
  2. State an alternative hypothesis
  3. State the rejection level (the significance level, e.g. 0.05)
  4. Work out the calculated statistic
  5. Compare the calculated statistic with the critical statistic
  6. Decision making (accept or reject the null hypothesis)
  7. Write the conclusion
15
Q

What is Spearman’s Rank correlation coefficient?

A

  • Measures the strength of the relationship between two variables
  • Allows you to state the apparent trend (positive or negative)
  • Varies from -1 (perfect negative correlation) to +1 (perfect positive correlation) and can be used when there are equal numbers of data in each set (at least 12)
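The standard formula is rs = 1 - 6Σd² / (n³ - n), where d is the difference between the paired ranks. The sketch below applies that formula. It makes two assumptions: values are ranked from smallest (rank 1) upwards, and tied values share the average of their ranks. Ranking from largest instead gives the same rs, provided both variables are ranked in the same direction. With many ties, this formula is only an approximation. The function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, as 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: rs = 1 - 6 * sum(d^2) / (n^3 - n)."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired (equal length)")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two pairs")
    rx, ry = average_ranks(x), average_ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - (6 * d_squared) / (n ** 3 - n)


print(spearman_rs([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))  # 1.0
print(spearman_rs([1, 2, 3, 4, 5], [50, 40, 30, 20, 10]))  # -1.0
```

The calculated rs would then be compared with a critical value from a significance table for the chosen rejection level and n.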

16
Q

What are the advantages of using Spearman’s Rank?

A

  1. Shows whether the correlation is statistically significant
  2. Provides evidence for or against a correlation
  3. Allows for further analysis
  4. Doesn't assume a normal distribution

17
Q

What are the disadvantages of using Spearman’s Rank?

A
  1. Does not tell you the cause of the relationship
  2. Does not allow you to predict one variable from another
  3. Does not explain anything about anomalies in the relationship
  4. Quite a complicated formula, so it can be difficult to work out
  5. Needs two sets of paired data for the test to be performed
  6. Only looks at the data in relation to one another (their ranks), so it doesn't analyse the actual values
18
Q

Name 4 examples of using inferential statistics.

A
  1. Spearman’s Rank correlation coefficient
  2. Best-fit lines and linear regression
  3. Mann Whitney U Test
  4. Chi Square
19
Q

What are best-fit lines and linear regression?

A

  • Highlights the trend in the scatter graph
  • The steepness (gradient) of the best-fit line shows the rate of change in y as x changes
  • Best-fit lines are calculated by linear regression: the line is positioned so that the sum of the squared vertical distances between it and the points is at a minimum
  • Calculated using the regression formula y = a + bx, where a is the intercept and b is the gradient
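Here is a minimal sketch of an ordinary least-squares best-fit line y = a + bx. It also computes the residuals that are used to spot anomalies. The function names and the sample data are hypothetical. The sketch assumes y is the dependent variable and that at least two distinct x values are present.

```python
def least_squares_line(x, y):
    """Return (a, b) for the best-fit line y = a + b*x by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx              # gradient: change in y per unit change in x
    a = mean_y - b * mean_x    # intercept: the line passes through (mean_x, mean_y)
    return a, b


def residuals(x, y, a, b):
    """Observed y minus predicted y; larger magnitudes mark more anomalous points."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


a, b = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```

Once a and b are known, a gap in the data can be estimated by evaluating a + b * x at the missing x value.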

20
Q

What are the advantages of using lines of best-fit and linear regression?

A
  1. Shows the relationship between two variables
  2. Used for prediction, so gaps in the data can be estimated
  3. Identifies anomalies in the data: the bigger the residual (distance from the regression line to a value of y), the more anomalous the value
  4. Residuals can be mapped to see if there is a spatial pattern or other explanation for anomalies
21
Q

What are the disadvantages of using lines of best-fit and linear regression?

A
  1. Regression analysis can only be used for straight line relationships and sometimes relationships are non-linear
  2. Only an estimate, and may be affected by sampling error
  3. An extraneous (third) variable might be affecting the data set, and x and y may actually be independent of each other
22
Q

What is the Mann Whitney U Test?

A

  • Tests whether there is a significant difference between two samples
  • Can be used when there are unequal numbers of data in each sample
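This is a sketch of the rank-sum form of the test statistic. Both samples are ranked together, with tied values sharing their average rank, and the rank total R1 of the first sample is then used in U1 = n1·n2 + n1(n1 + 1)/2 - R1 and U2 = n1·n2 - U1. The smaller of the two is taken as U. The function names and data are hypothetical. Standard tables typically treat the result as significant when U is equal to or less than the critical value.

```python
def rank_lookup(values):
    """Map each distinct value to its rank (1 = smallest); ties share the mean rank."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return U, the smaller of U1 and U2, for two independent samples a and b."""
    n1, n2 = len(a), len(b)
    rank_of = rank_lookup(list(a) + list(b))  # rank both samples together
    r1 = sum(rank_of[v] for v in a)           # rank total of sample a
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 - u1
    return min(u1, u2)


print(mann_whitney_u([1, 3, 5], [2, 4, 6, 8]))  # 3.0 (unequal sample sizes)
```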

23
Q

What are the advantages of using the Mann Whitney U Test?

A
  1. Shows spread from the mean
  2. Very visual
  3. Indicates reliability of data
  4. Compare graphs easily and anomalies are clearly shown
  5. Easy to work out mean, median, etc
24
Q

What are the disadvantages of using the Mann Whitney U Test?

A
  1. Time consuming to construct
  2. Standard deviation is easily manipulated and biased
  3. Large data set needed
25
Q

What is Chi Square?

A
  • Measures ‘goodness of fit’ (i.e. whether data are in line with theoretical predictions)
  • Compares the observed data with the values expected under the hypothesis being tested
  • Measures the association/difference between two variables
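The calculated statistic is χ² = Σ (O - E)² / E, summed over every category. The sketch below assumes raw observed frequencies and matching expected frequencies, not percentages. The function name and the counts are hypothetical.

```python
def chi_squared(observed, expected):
    """Chi-squared statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts in three categories, tested against an even distribution
print(chi_squared([10, 20, 30], [20, 20, 20]))  # 10.0
```

In a goodness-of-fit test, the result would then be compared with a critical value for k - 1 degrees of freedom, where k is the number of categories.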

26
Q

What are the advantages of using Chi Square?

A
  1. Can test association between variables
  2. Identifies differences between observed and expected values
27
Q

What are the disadvantages of using Chi Square?

A
  1. Can’t use percentages (raw frequency counts are needed)
  2. A large number of observations is needed for the test to be considered valid
  3. Difficult formula
28
Q

What are the 3 main groups of statistics?

A
  1. Descriptive statistics
  2. Inferential statistics
  3. Spatial statistics
29
Q

What is nearest neighbour analysis?

A
  • Quantifies the spatial distribution of data points
  • Measures point patterns in space so they can be described as clustered, random or uniform
  • The index (Rn) varies from clustered (0) through random (1.0) to uniform (2.15)
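The usual formula is Rn = 2d̄√(n / A), where d̄ is the mean distance from each point to its nearest neighbour, n is the number of points and A is the study area. The sketch below follows that formula. It assumes that point coordinates and area use the same units, and it makes no adjustment for boundary effects. The function name and test points are hypothetical.

```python
import math


def nearest_neighbour_index(points, area):
    """Rn = 2 * mean nearest-neighbour distance * sqrt(n / area).

    Rn near 0 suggests clustering, near 1.0 randomness, and up to 2.15 uniformity.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, p in enumerate(points):
        # Distance from point i to the closest other point
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    mean_d = total / n
    return 2 * mean_d * math.sqrt(n / area)


# Four hypothetical points at the corners of a 1 x 1 square, in a 2 x 2 study area
print(nearest_neighbour_index([(0, 0), (1, 0), (0, 1), (1, 1)], area=4))  # 2.0
```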
30
Q

What are the advantages of using nearest neighbour analysis?

A
  1. Identifies clusters which can be further analysed
  2. Identifies changes of distribution over time
  3. Useful for comparisons in different areas
31
Q

What are the disadvantages of using nearest neighbour analysis?

A
  1. Cannot be used reliably in an irregularly shaped area, or where rivers or relief barriers separate nearest neighbours
  2. Not always valid for comparison: settlements vary in terms of population and land area
32
Q

What is location quotient?

A
  • Measures the concentration of the population of a small area engaged in a particular activity, relative to the larger geographical area in which it is located
  • Mostly used to measure the concentration of economic activity in an area or region compared with the national average
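A common formulation divides the local share of workers in an activity by the national share: LQ = (e_i / e) / (E_i / E). The sketch below uses that form. The parameter names and figures are hypothetical. An LQ above 1 indicates that the activity is more concentrated locally than nationally, and an LQ below 1 indicates that it is under-represented.

```python
def location_quotient(local_in_activity, local_total,
                      national_in_activity, national_total):
    """LQ = (local share of the activity) / (national share of the activity)."""
    local_share = local_in_activity / local_total
    national_share = national_in_activity / national_total
    return local_share / national_share


# Hypothetical: 50 of 200 local workers vs 1,000 of 10,000 nationally
print(location_quotient(50, 200, 1000, 10000))
```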
33
Q

What is the Gini Coefficient or Index of Dissimilarity?

A

  • A statistical measure of the degree of variation represented in a set of values
  • Used especially in analysing income inequality or the degree of segregation in a town or city
  • Ranges from 0 to 1; the closer the index is to 1, the more uneven the distribution
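The card treats two related indices together, so the sketch below computes each one. The index of dissimilarity uses D = ½ Σ |aᵢ/A - bᵢ/B| across areas i for two groups. The Gini coefficient is computed here as the mean absolute difference between all pairs of values divided by twice the mean. Other equivalent Gini formulas exist. Function names and data are hypothetical, and with n values this form of the Gini peaks at (n - 1)/n rather than exactly 1.

```python
def index_of_dissimilarity(group_a, group_b):
    """D = 0.5 * sum(|a_i/A - b_i/B|) over areas i (0 = even, 1 = total segregation)."""
    total_a, total_b = sum(group_a), sum(group_b)
    return 0.5 * sum(abs(a / total_a - b / total_b)
                     for a, b in zip(group_a, group_b))


def gini_coefficient(values):
    """Mean absolute difference over all pairs, divided by twice the mean (0 = equal)."""
    n = len(values)
    mean = sum(values) / n
    total_diff = sum(abs(x - y) for x in values for y in values)
    return total_diff / (2 * n * n * mean)


print(index_of_dissimilarity([10, 0], [0, 10]))  # 1.0 (complete segregation)
print(gini_coefficient([5, 5, 5, 5]))            # 0.0 (perfect equality)
```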

34
Q

When would you use descriptive statistics?

A

When you need to organise, summarise or describe data. Descriptive statistics cannot be used to predict patterns in the wider population.

35
Q

When would you use inferential statistics?

A

When you need to predict population parameters from a sample. Used when testing a hypothesis, to show whether a result is significant rather than having occurred because of random factors (i.e. by chance).

36
Q

Inferential statistics - Give 2 examples of tests of difference between two samples.

A
  1. Mann Whitney U test
  2. Student’s t-test
37
Q

Inferential statistics - Give 2 examples of tests of correlation between two samples.

A
  1. Spearman’s Rank Correlation Coefficient 2. Pearson’s Product Moment Correlation Coefficient
38
Q

Inferential statistics - Give an example of a test of association between two samples.

A
  1. Chi-squared test
39
Q

Descriptive statistics - Why are comparative measures of dispersion needed?

A

Because it’s unfair to compare IQRs and standard deviations directly across data sets: the value of each is strongly influenced by the size of the median and the mean respectively.

40
Q

Descriptive statistics - Name 2 examples of comparative measures of dispersion.

A
  1. Index of variability
  2. Coefficient of variation
41
Q

Descriptive statistics - Comparative measures of dispersion - how does the index of variability work?

A

Index of variability expresses the quartile deviation as a % of the median of a data set.
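Here the quartile deviation is taken as (upper quartile - lower quartile) / 2 and expressed as a percentage of the median. Quartiles can be calculated in several ways. The sketch assumes one convention: each quartile is the median of the lower or upper half of the ranked data, with the middle value excluded when n is odd. Other conventions will give slightly different results. The function names are hypothetical.

```python
import statistics


def quartiles(values):
    """Lower and upper quartiles as medians of the lower and upper halves of the data.

    Assumption: when n is odd, the middle value is left out of both halves.
    """
    ranked = sorted(values)
    if len(ranked) < 2:
        raise ValueError("need at least two values")
    half = len(ranked) // 2
    return statistics.median(ranked[:half]), statistics.median(ranked[-half:])


def index_of_variability(values):
    """Quartile deviation as a percentage of the median."""
    lower_q, upper_q = quartiles(values)
    quartile_deviation = (upper_q - lower_q) / 2
    return quartile_deviation / statistics.median(values) * 100


print(index_of_variability(list(range(1, 10))))  # 50.0
```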

42
Q

Descriptive statistics - Comparative measures of dispersion - how does the coefficient of variation work?

A

Coefficient of variation expresses the standard deviation as a % of the mean of a data set.
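The sketch below computes the coefficient of variation as (standard deviation / mean) × 100. The card does not say which standard deviation to use, so this is exposed as an assumption. `sample=True` uses the sample standard deviation (n - 1 divisor), and `sample=False` uses the population standard deviation (n divisor). The function name and data are hypothetical.

```python
import statistics


def coefficient_of_variation(values, sample=True):
    """Standard deviation expressed as a percentage of the mean.

    Assumption: sample=True uses statistics.stdev (n - 1 divisor);
    sample=False uses statistics.pstdev (n divisor).
    """
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return sd / statistics.mean(values) * 100


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, population SD 2
print(coefficient_of_variation(data, sample=False))  # 40.0
```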