Chapter 9 - Collecting, Representing and Interpreting Data Flashcards

1
Q

What are the 6 Sampling Techniques?

A

Sampling Techniques:

  • Simple Random Sampling
  • Systematic Sampling
  • Stratified Sampling
  • Opportunity Sampling
  • Quota Sampling
  • Cluster Sampling
2
Q

Define Population and Sample.

A

A population is the set of things that data is going to be collected about. A sample is a subset of that population.

3
Q

Define a Parameter in Statistics, and therefore, define a statistic.

A

A parameter is a number that describes the entire population. A statistic is a number taken from a single sample to estimate the parameter.

4
Q

Define sampling bias.

A

A sampling method is biased if it creates a sample that does not represent the population.

5
Q

How do you find the mode of a set?

A

The mode of a set of data is the value or category that occurs most often or has the largest frequency. For grouped data, the modal interval or modal group is normally given.
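A minimal Python sketch of finding the mode of raw (ungrouped) data, assuming the data fit in a list. The function name `modes` is hypothetical. It returns every value tied for the largest frequency, because a data set can have more than one mode.

```python
from collections import Counter

def modes(data):
    """Every value (or category) that has the largest frequency."""
    counts = Counter(data)
    top = max(counts.values())
    return [value for value, freq in counts.items() if freq == top]

print(modes([3, 7, 7, 2, 9, 2]))      # two modes → [7, 2]
print(modes(["red", "blue", "red"]))  # works for categories too → ['red']
```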

6
Q

How do you find the mean of a set?

A

To work out the mean x̄ of a set of n observations, add up the observations (the x values) and divide the total by n.

x̄ = ∑x / n

For data given in a frequency table, where f is the frequency of each value x:

x̄ = ∑fx / ∑f
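A short Python sketch of both formulas. The function names are hypothetical, and `mean_from_frequencies` assumes the values and their frequencies are given as two lists of equal length.

```python
def mean(xs):
    """x̄ = Σx / n"""
    return sum(xs) / len(xs)

def mean_from_frequencies(values, freqs):
    """x̄ = Σfx / Σf, where freqs[i] is the frequency of values[i]."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

print(mean([2, 4, 4, 5, 10]))                       # → 5.0
print(mean_from_frequencies([1, 2, 3], [4, 5, 1]))  # (4 + 10 + 3) / 10 → 1.7
```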

7
Q

Define Systematic Sampling.

A
  • Systematic Sampling - To take a sample of size n from a population of size N, list the population in order, choose one of the first k members at random, and then select every kth member after that, where k = N/n.
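A Python sketch of systematic sampling, assuming the population is already an ordered list. `systematic_sample` is a hypothetical name. When N is not an exact multiple of n, this sketch rounds k down, which is one common choice rather than the only one.

```python
import random

def systematic_sample(population, n, rng=random):
    """Take every kth member of an ordered population, starting from a
    randomly chosen member among the first k, where k = N/n."""
    N = len(population)
    k = N // n                # sampling interval; rounded down if N is not a multiple of n
    start = rng.randrange(k)  # random index 0..k-1, i.e. one of the first k members
    return [population[start + i * k] for i in range(n)]

# Hypothetical example: a numbered list of 100 people, sample of 10 (so k = 10)
people = list(range(1, 101))
print(systematic_sample(people, 10))
```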
8
Q

Define Stratified Sampling.

A
  • Stratified Sampling - When you know you want distinct groups to be represented in your sample, split the population into these distinct groups and then sample within each group in proportion to its size.
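A Python sketch of proportional stratified sampling, assuming the groups (strata) are given as a dictionary and that a simple random sample is taken within each group. `stratified_sample` and the example groups are hypothetical. Rounding each group's share to a whole number means the group sizes may not always add up to exactly n.

```python
import random

def stratified_sample(strata, n, rng=random):
    """strata maps each group name to a list of its members.
    Each group contributes in proportion to its size, chosen by simple random sampling."""
    N = sum(len(members) for members in strata.values())
    sample = {}
    for group, members in strata.items():
        # Proportional allocation; Python's round() sends exact halves to the nearest even number
        size = round(n * len(members) / N)
        sample[group] = rng.sample(members, size)
    return sample

# Hypothetical example: 120 Year 12 students and 80 Year 13 students, sample of 20
strata = {
    "Year 12": [f"Y12-{i}" for i in range(120)],
    "Year 13": [f"Y13-{i}" for i in range(80)],
}
result = stratified_sample(strata, 20)
print({group: len(chosen) for group, chosen in result.items()})  # → {'Year 12': 12, 'Year 13': 8}
```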
9
Q

Define Opportunity Sampling.

A
  • Opportunity Sampling - Take samples from members of the population you have access to until you have a sample of the desired size.
10
Q

Define Quota Sampling.

A
  • Quota Sampling - When you want distinct groups to be represented in your sample, decide in advance how many members of each group you wish to sample, then use opportunity sampling until you have enough members for each group.
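A Python sketch of quota sampling, assuming members arrive one at a time in the order they happen to be available (the opportunity-sampling part), each tagged with their group. `quota_sample` and the example names are hypothetical.

```python
def quota_sample(arrivals, quotas):
    """arrivals: (member, group) pairs in the order members become available.
    quotas: number of members wanted from each group.
    Each member is accepted only while their group's quota is unfilled."""
    remaining = dict(quotas)
    sample = {group: [] for group in quotas}
    for member, group in arrivals:
        if remaining.get(group, 0) > 0:
            sample[group].append(member)
            remaining[group] -= 1
        if all(left == 0 for left in remaining.values()):
            break  # every quota is full
    return sample  # may be short if the arrivals run out first

# Hypothetical example: want 2 members of group "F" and 1 of group "M"
arrivals = [("Ann", "F"), ("Bob", "M"), ("Cat", "F"), ("Dan", "M"), ("Eve", "F")]
print(quota_sample(arrivals, {"F": 2, "M": 1}))  # → {'F': ['Ann', 'Cat'], 'M': ['Bob']}
```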
11
Q

Define Cluster Sampling.

A
  • Cluster Sampling - Split the population into clusters that you expect to be similar to each other, then randomly select some of these clusters and sample the members of the chosen clusters.
12
Q

Define Simple Random Sampling.

A
  • Simple Random Sampling - Every member of the population is equally likely to be chosen. For example, assign each member of the population a number, then generate random numbers to choose which members are in the sample.
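A Python sketch of the numbering method, using `random.sample` from Python's standard library to pick distinct numbers without replacement. `simple_random_sample` is a hypothetical name.

```python
import random

def simple_random_sample(population, n, rng=random):
    """Number the members 1..N, then pick n distinct random numbers,
    so every member is equally likely to be chosen."""
    numbered = {i: member for i, member in enumerate(population, start=1)}
    chosen = rng.sample(range(1, len(population) + 1), n)  # without replacement
    return [numbered[i] for i in chosen]

# Hypothetical example: choose 4 of 10 members labelled A to J
print(simple_random_sample(list("ABCDEFGHIJ"), 4))
```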
13
Q

What are the four measures of spread?

A

The four measures of spread are:

  • Range
  • Interquartile Range
  • Variance
  • Standard Deviation
14
Q

Define Range.

A

The range of a set of data is the largest value minus the smallest value.

15
Q

Define the Median, and how to calculate it.

A

The median of a set of data is the middle value when the data are listed in order of size.

To find the position of the median of a set of n observations, work out the value of (n+1)/2. If this is a whole number, the median is the value in that position. If it is not a whole number, the median is the mean of the values in the positions on either side.
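A Python sketch of the (n+1)/2 position rule. Positions are counted from 1, so position p is list index p - 1. `median` is a hypothetical helper; Python's `statistics.median` gives the same result for this rule.

```python
def median(data):
    """Median using the (n + 1)/2 position rule (positions counted from 1)."""
    xs = sorted(data)
    q, r = divmod(len(xs) + 1, 2)   # position (n + 1)/2 = q + r/2
    if r == 0:
        return xs[q - 1]            # whole-number position q
    return (xs[q - 1] + xs[q]) / 2  # between positions q and q + 1: take their mean

print(median([7, 1, 3, 9, 5]))  # n = 5, position 3, median 5
print(median([4, 1, 3, 2]))     # n = 4, position 2.5, median (2 + 3)/2 = 2.5
```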

16
Q

How do you evaluate the lower quartile and the upper quartile?

A
  • To find the lower quartile for a set of n ordered values, work out (n+1)/4; this gives its position.
  • To find the upper quartile for a set of n ordered values, work out 3(n+1)/4; this gives its position.
  • Interpret each position in the same way as for the median: a whole number gives the value in that position; otherwise, take the mean of the values in the positions on either side.
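A Python sketch of the quartile position rules above, reading a non-whole position as the mean of the values either side, as for the median. The helper names are hypothetical. Other conventions (for example linear interpolation, or positions based on n/4) exist and can give slightly different quartiles.

```python
def value_at_position(xs, numerator, denominator):
    """Value at position numerator/denominator (counted from 1) in the sorted list xs.
    A whole-number position gives that value; otherwise use the mean of the values either side."""
    q, r = divmod(numerator, denominator)
    if r == 0:
        return xs[q - 1]
    return (xs[q - 1] + xs[q]) / 2

def quartiles(data):
    """Lower and upper quartiles using positions (n + 1)/4 and 3(n + 1)/4."""
    xs = sorted(data)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 values for these position rules")
    lower = value_at_position(xs, n + 1, 4)
    upper = value_at_position(xs, 3 * (n + 1), 4)
    return lower, upper

data = [15, 3, 12, 5, 9, 6, 11, 8]  # n = 8: positions 2.25 and 6.75
q1, q3 = quartiles(data)
print(q1, q3)    # → 5.5 11.5
print(q3 - q1)   # interquartile range → 6.0
```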
17
Q

How do you find the interquartile range?

A

To find the interquartile range, subtract the value of the lower quartile from the value of the upper quartile (IQR = upper quartile - lower quartile).

18
Q

Define population variance, and how to calculate it.

A

The variance of a set of data measures how spread out the values are from the mean. For a population of n values with mean µ:

σ² = ∑(x - µ)² / n

19
Q

Define population standard deviation.

A

The population standard deviation is the square root of the population variance.

σ = √σ² = √(∑(x - µ)² / n)
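A Python sketch of the population variance and standard deviation formulas, treating the whole list as the population. The function names are hypothetical; the standard library's `statistics.pvariance` and `statistics.pstdev` compute the same quantities.

```python
import math
import statistics

def population_variance(xs):
    """σ² = Σ(x - µ)² / n, where µ is the population mean."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def population_sd(xs):
    """σ = √σ²"""
    return math.sqrt(population_variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # µ = 5
print(population_variance(data))  # → 4.0
print(population_sd(data))        # → 2.0
print(statistics.pvariance(data), statistics.pstdev(data))  # standard-library equivalents
```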

20
Q

Define sample variance, and how to calculate it.

A

An unbiased estimate of the population variance, using a sample of n observations with sample mean x̄, is given by the sample variance, s².

s² = ∑(x - x̄)² / (n - 1)
or equivalently
s² = ∑x² / (n - 1) - (∑x)² / (n(n - 1))
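A Python sketch checking that the two sample-variance formulas agree. The function names are hypothetical; the standard library's `statistics.variance` and `statistics.stdev` use the same n - 1 divisor.

```python
import math
import statistics

def sample_variance(xs):
    """s² = Σ(x - x̄)² / (n - 1)"""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """s² = Σx² / (n - 1) - (Σx)² / (n(n - 1))"""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return sum_x2 / (n - 1) - sum_x ** 2 / (n * (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data))           # ≈ 4.571 (exactly 32/7)
print(sample_variance_shortcut(data))  # same value, up to floating-point rounding
print(math.sqrt(sample_variance(data)), statistics.stdev(data))  # sample standard deviation s
```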

21
Q

Define sample standard deviation.

A

The sample standard deviation is the square root of the sample variance.

s = √s²

22
Q

What are the pros and cons of using the mode as a measure of central tendency?

A

Pros of mode:
- Useful for non-numerical data.
- Not usually affected by outliers.
- Not usually affected by errors or omissions.
- Is always an observed data point.
Cons of mode:
- Doesn’t use all of the data.
- It may not be representative if it has a low frequency.
- There may be other values with similar frequency.

23
Q

What are the pros and cons of using the median as a measure of central tendency?

A
Pros of median:
- Not affected by outliers.
- Not significantly affected by errors.
Cons of median:
- Doesn't make use of all the data.
24
Q

What are the pros and cons of using the mean as a measure of central tendency?

A

Pros of mean:
- When the data set is very large, a few extreme values have a negligible impact.
Cons of mean:
- When the data set is small, a few extreme values or errors have a big impact.

25
Q

What are the pros and cons of using the range as a measure of spread?

A

Pros of range:
- Reflects the full data set.
Cons of range:
- Distorted by outliers.

26
Q

What are the pros and cons of using the IQR as a measure of spread?

A

Pros of IQR:
- Not distorted by outliers.
Cons of IQR:
- Does not reflect the full data set.

27
Q

What are the pros and cons of using the standard deviation as a measure of spread?

A

Pros of standard deviation:
- When the data set is very large, a few outliers have a negligible impact.
Cons of standard deviation:
- When the data set is small, a few outliers have a big impact.

28
Q

What is the five number summary? And how can it be plotted?

A

Data is often summarised by five main statistics.

The five number summary gives the minimum value, lower quartile, median, upper quartile and maximum value.

These can be plotted using a box-and-whisker (or box) plot.

29
Q

What are the methods for plotting continuous single variable data?

A
The methods for plotting continuous single-variable data:
- Box Plot
- Cumulative Frequency Graph
- Histogram
30
Q

How do you plot a cumulative frequency graph?

A

For a cumulative frequency graph, the x-coordinates are the upper class boundaries of the intervals, and the y-coordinates are the cumulative frequencies (the running total of the frequencies up to and including each interval).
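A Python sketch that builds the points to plot, assuming the class boundaries and frequencies are given as lists. The name `cumulative_frequency_points` is hypothetical. The sketch also includes a starting point at the lower boundary of the first class with cumulative frequency 0, which is the usual convention for these graphs.

```python
from itertools import accumulate

def cumulative_frequency_points(boundaries, freqs):
    """boundaries: class boundaries [b0, b1, ..., bk] for k classes; freqs: the k frequencies.
    Returns (upper boundary, running total) points, preceded by (b0, 0)."""
    totals = list(accumulate(freqs))  # running totals of the frequencies
    return [(boundaries[0], 0)] + list(zip(boundaries[1:], totals))

# Hypothetical classes 0-10, 10-20, 20-30 with frequencies 4, 7, 9
print(cumulative_frequency_points([0, 10, 20, 30], [4, 7, 9]))
# → [(0, 0), (10, 4), (20, 11), (30, 20)]
```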

31
Q

How do you plot a histogram?

A

For a histogram, the width of each bar is the class width of its interval, and the height is the frequency density, where:
Frequency Density = Frequency ÷ Class Width
This means the area of each bar represents its frequency.
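A Python sketch of the frequency density calculation for classes of unequal width, assuming the class boundaries are given in order. The function name and the data are hypothetical.

```python
def frequency_densities(boundaries, freqs):
    """Frequency density = frequency ÷ class width, for each class."""
    widths = [upper - lower for lower, upper in zip(boundaries, boundaries[1:])]
    return [f / w for f, w in zip(freqs, widths)]

# Hypothetical classes 0-10, 10-15, 15-20, 20-40 (widths 10, 5, 5, 20)
print(frequency_densities([0, 10, 15, 20, 40], [20, 15, 10, 8]))  # → [2.0, 3.0, 2.0, 0.4]
```

Multiplying each density by its class width gives back the frequency, which is why bar areas, not heights, are compared.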

32
Q

What are the three types of correlation?

A

The three types of correlation:

  • Positive Correlation, r > 0 (perfect positive correlation when r = +1)
  • Negative Correlation, r < 0 (perfect negative correlation when r = -1)
  • No Correlation, r = 0.
33
Q

What are the rules of correlation? (for whether variables affect one another)

A

When a change in one variable causes a change in the other, the two variables have a causal connection.

Correlation does not necessarily imply a causal connection. Correlation without a causal connection is known as spurious correlation.

34
Q

What are the advantages and disadvantages of: Box Plots

A
35
Q

What are the advantages and disadvantages of: Histograms

A
36
Q

What are the advantages and disadvantages of: Cumulative Frequency curves?

A