Stat: Data Collection Flashcards

1
Q

Primary data

A

The data is collected by, or on behalf of, the person who is going to use the data.

2
Q

Secondary Data

A

The data is not collected by, nor on behalf of, the person who is going to use it. The data is second hand.

3
Q

Population

A

The whole set of individuals or items that are of interest.

4
Q

Census

A

Every member of the population is observed or measured.

5
Q

Sample

A
  • A carefully selected sub-set of the population.
  • It should be representative of the population.
6
Q

Sample Survey

A

This is where information about the population is inferred from the information obtained from a sample.

7
Q

Sampling Unit

A

An individual member of the population.

8
Q

Sampling Frame

A

A list identifying every single sampling unit that is in the population.

9
Q

Random Sample meaning

A

Every possible sample of size n has an equal chance of being selected.

10
Q

Simple Random Number Sampling

A
  • Each item or individual is given a number.
  • Selection is then carried out via random number tables or generators.
  • (Random numbers can be generated using a calculator or ERNIE, the Electronic Random Number Indicator Equipment.)
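
The steps on this card can be sketched in Python. This is only an illustration: the function name `simple_random_sample`, the population of 100 and the sample size of 20 are assumptions made for the example, and Python's `random` module stands in for the random number table or generator.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Number the units 1..population_size, then pick sample_size of them at random."""
    rng = random.Random(seed)  # the seed only makes this sketch repeatable
    numbered_units = range(1, population_size + 1)
    # sample() draws without replacement, so no unit is chosen twice
    return sorted(rng.sample(numbered_units, sample_size))

# Hypothetical example: 20 units out of a population numbered 1 to 100
chosen = simple_random_sample(population_size=100, sample_size=20, seed=1)
print(chosen)
```

Every set of 20 numbers is equally likely to be returned, which matches the definition of a random sample.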
11
Q

Systematic Sampling

A

Elements are chosen at regular intervals from an ordered list.
- (if the sample size is 20 from a population of 100, you’d take every fifth person, since 100/20 = 5)
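
A minimal Python sketch of this card's example (20 from 100, so every fifth person). The function name `systematic_sample` is hypothetical, and choosing the starting point at random is an assumption of this sketch rather than something the card states.

```python
import random

def systematic_sample(ordered_list, sample_size, seed=None):
    """Pick elements at a regular interval from an ordered list."""
    interval = len(ordered_list) // sample_size  # 100 // 20 = 5
    rng = random.Random(seed)
    start = rng.randrange(interval)  # assumed: random start within the first interval
    return ordered_list[start::interval][:sample_size]

people = list(range(1, 101))  # hypothetical ordered list of 100 people
sample = systematic_sample(people, 20, seed=0)
print(sample)
```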

12
Q

Stratified Sampling

A
  • Population is divided into mutually exclusive strata.
  • A simple random sample is taken from each stratum.
  • The proportion of each stratum in the sample is the same as that in the population.
13
Q

Stratified sampling formula

A

Number sampled in a stratum = (number in stratum / number in population) × overall sample size
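
The formula can be checked with a short Python sketch. The strata names and sizes below are made up for illustration, and `stratum_sample_size` is a hypothetical helper name.

```python
def stratum_sample_size(stratum_size, population_size, overall_sample_size):
    """(number in stratum / number in population) x overall sample size."""
    # Multiplying before dividing gives the same value as the formula
    return stratum_size * overall_sample_size / population_size

# Hypothetical population of 200 split into two strata, overall sample of 40
strata = {"Year 12": 120, "Year 13": 80}
population = sum(strata.values())

allocation = {
    name: stratum_sample_size(size, population, 40)
    for name, size in strata.items()
}
print(allocation)  # {'Year 12': 24.0, 'Year 13': 16.0}
```

When the formula does not give a whole number, the result has to be rounded, and the rounded values should still add up to the overall sample size.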

14
Q

Quota Sampling

A
  • Population is divided into groups by gender, social class, etc.
  • Number of individuals selected is set to reflect the population.
15
Q

Opportunity sampling

A
  • taking the sample from people who are available at the time the study is carried out
16
Q

Census Advantages

A
  • It should give a completely accurate result.
17
Q

Census Disadvantages

A
  • It is time consuming and expensive.
  • It cannot be used when testing leads to destruction (e.g. testing the lifetime of batteries).
  • There is a lot of data, so it can be difficult to process.
18
Q

Sample survey Advantages

A
  • It is cheaper than a Census.
  • Results are quicker compared to a Census.
  • Less data to deal with than a Census.
19
Q

Sample survey Disadvantages

A

- The data may not be as accurate, since the sample may not be representative of the population.
- The sample may not be large enough to provide information about small subgroups of the population.

20
Q

Random sampling with replacement

A

Each unit is returned to the population before the next selection is made, so each unit can appear more than once in the sample.

21
Q

Sampling without replacement

A

If a unit is selected, it’s not replaced. So for each draw only the sampling units that have not been selected previously are eligible for the next draw.
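
The difference between sampling with and without replacement can be seen in a short Python sketch. The five units below are made up for illustration: `choices` draws with replacement and `sample` draws without.

```python
import random

rng = random.Random(42)  # the seed only makes this sketch repeatable
units = ["A", "B", "C", "D", "E"]  # hypothetical sampling units

# With replacement: a unit goes back after each draw, so it may appear again
with_replacement = rng.choices(units, k=5)

# Without replacement: a selected unit is not eligible for later draws
without_replacement = rng.sample(units, k=5)

print(with_replacement)
print(without_replacement)
```

Drawing 5 from 5 without replacement always returns every unit exactly once, whereas the draw with replacement may contain repeats.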

22
Q

Quota sampling advantages

A
  • quick since a representative sample can be achieved with a small sample size
  • cheap
  • easy
23
Q

Quota sampling disadvantages

A
  • the person picking the sampling units can introduce bias
  • inaccurate, since it’s impossible to estimate the sampling errors when the process is not random
  • non-responses are not recorded
24
Q

Opportunity sampling advantages

A
  • easy to carry out
  • cheap
25
Q

Opportunity sampling disadvantages

A
  • unlikely to provide a representative sample
  • dependent on individual researcher
26
Q

Measures of location

A

Comparing mean and median

27
Q

Measures of spread

A

Standard deviation and IQR

28
Q

Which measures to use together?

A

Mean and standard deviation
Median and IQR

29
Q

What does it mean if the mean is greater than the median?

A
  • There are only a few large distances, which pull the mean above the median
  • The distribution is positively skewed
30
Q

The higher the standard deviation/IQR…

A

The greater the spread

31
Q

Which month has the least sunshine?

A

October, since it records the least sunshine

32
Q

A stratified sample must have…

A

a sampling frame

33
Q

Difference between stratified sampling and quota sampling

A

Both divide the population into groups, but:
- Stratified: a sampling frame is required. Quota: no sampling frame is required.
- Stratified: the random sampling error can be estimated. Quota: it cannot be estimated.

34
Q

When to use median and IQR?

A

If there are outliers (or the data is skewed), use the median and IQR, since outliers distort the mean and standard deviation.
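
A small Python sketch shows why. The data values are invented for illustration, and `iqr` is a hypothetical helper built on `statistics.quantiles`, whose default quartile method may differ slightly from the one used in class.

```python
import statistics

def iqr(values):
    """Interquartile range, Q3 - Q1, using the module's default quartile method."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

data = [10, 12, 13, 14, 15, 16, 18]  # hypothetical data set
with_outlier = data + [95]           # the same data with one outlier added

for label, values in [("original", data), ("with outlier", with_outlier)]:
    print(
        label,
        "mean:", round(statistics.mean(values), 2),
        "sd:", round(statistics.stdev(values), 2),
        "median:", statistics.median(values),
        "IQR:", iqr(values),
    )

mean_shift = statistics.mean(with_outlier) - statistics.mean(data)
median_shift = statistics.median(with_outlier) - statistics.median(data)
sd_shift = statistics.stdev(with_outlier) - statistics.stdev(data)
iqr_shift = iqr(with_outlier) - iqr(data)
```

The single outlier moves the mean and standard deviation far more than it moves the median and IQR.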

35
Q

How will extra values below the median affect the quartiles?

A
  • Q2 will be lower
  • Q3 will be lower
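
This can be checked on a small made-up data set. The sketch uses `statistics.quantiles`, whose default quartile method may not match the one taught in class, so treat the exact values as illustrative.

```python
import statistics

before = [2, 4, 6, 8, 10, 12, 14]  # hypothetical data, median 8
after = sorted(before + [1, 3])    # two extra values below the median

q1_before, q2_before, q3_before = statistics.quantiles(before, n=4)
q1_after, q2_after, q3_after = statistics.quantiles(after, n=4)

print("Q2:", q2_before, "->", q2_after)
print("Q3:", q3_before, "->", q3_after)
```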
36
Q

Statistic

A

A random variable that is a function of a random sample that contains no unknown parameters

37
Q

Explain what you understand by the sampling distribution of Y

A

The probability distribution of Y: all the possible values of the statistic Y, together with their associated probabilities

38
Q

What’s not a statistic?

A

Any expression that contains unknown parameters

39
Q

Sampling distribution

A

All the possible values of a statistic, together with their associated probabilities
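
As an illustration, the sampling distribution of a sample mean can be listed in full for a tiny case. The population {1, 2, 3}, the sample size of 2 and sampling with replacement are all assumptions chosen to keep the sketch small.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical population, each value equally likely
sample_size = 2

# Every ordered sample of size 2 drawn with replacement (9 equally likely samples)
samples = list(product(population, repeat=sample_size))

# The statistic here is the sample mean
counts = Counter(Fraction(sum(s), sample_size) for s in samples)

# Sampling distribution: each value of the statistic with its probability
distribution = {
    value: Fraction(count, len(samples))
    for value, count in sorted(counts.items())
}

for value, probability in distribution.items():
    print(value, probability)
```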

40
Q

Give a reason we should include outliers and a reason why we shouldn’t

A
  • Include: it’s a piece of data, and we should consider all the data
  • Exclude: it’s an outlier that could distort the results
41
Q

The range of distances in m that corresponds to the recorded value 0 for daily mean visibility

A

0-500m

42
Q

Use mean or median to analyse data

A

If there are outliers, use the median, since outliers distort the mean

43
Q

Quota vs stratified

A

Stratified: take a (simple) random sample from each of the (mutually exclusive) groups of the population, with sample sizes within strata in strict proportion to the numbers in each stratum in the population.
Quota: non-random sampling from groups of the population.

44
Q

Scatter diagram

A

X

45
Q

High standard deviation

A

data are more spread out

46
Q

Low standard deviation

A

clustered around the mean

47
Q

Extrapolation

A

Estimating an unknown value by extending the trend beyond the range of the known values

48
Q

Dangers of extrapolation

A
  • It can be unreliable, since the trend might not continue (especially when there are disparities in the existing data sets).
  • Extrapolation doesn’t account for qualitative factors that can trigger changes in future values within the same observation, and it hardly accounts for causal factors.
49
Q

How to know if a PMCC value is wrong

A

If it is greater than 1 or less than −1
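
A PMCC always lies between −1 and 1, so any value outside that range signals a calculation error. A minimal Python sketch, with `pmcc` and `is_valid_pmcc` as hypothetical helper names and made-up data:

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def is_valid_pmcc(r):
    """A PMCC outside the range -1 to 1 must be wrong."""
    return -1 <= r <= 1

r_perfect = pmcc([1, 2, 3, 4], [2, 4, 6, 8])  # perfectly linear, made-up data
print(r_perfect, is_valid_pmcc(r_perfect))
print(is_valid_pmcc(1.2))  # False: this value cannot be a PMCC
```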

50
Q

State two variables from the large data set that are not suitable to be modelled by a normal distribution.

A
  • daily wind speed (Beaufort) since it is qualitative data
  • rainfall (since not symmetric)
51
Q

Comment on the suitability of Sara’s sampling method of this study

A

Too few days were measured, so there is too little data

52
Q

Suggest how Sara could make better use of the large data set for her study

A

Use more data from more of the UK locations and more of the months

53
Q

From your knowledge of the large data set, explain why this process may not generate a large enough sample size

A

In the large data set, some days might have gaps because the data was not recorded

54
Q

Big IQR/range or spread

A

Larger standard deviation

55
Q

Two values are added. The median increases but the mean (22.5) stays the same. Suggest values

A

Both values must be greater than the median, and the two values must add to 45
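
A quick Python check with made-up numbers. The original data set is not given on the card, so the four values below are an assumption chosen to have a mean of 22.5.

```python
import statistics

original = [10, 15, 20, 45]  # hypothetical data: mean 22.5, median 17.5
added = [21, 24]             # both above the median, and 21 + 24 = 45

combined = original + added

print(statistics.mean(original), statistics.median(original))  # 22.5 17.5
print(statistics.mean(combined), statistics.median(combined))  # 22.5 20.5
```

Adding two values that sum to 45 keeps the mean at 22.5 because 45 / 2 = 22.5, so the extra values sit exactly at the mean on average.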

56
Q

Additional values are added. Explain why the standard deviation will be lower

A

Both values must be less than 1 standard deviation from the mean
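
A short Python sketch of the effect, using invented numbers and the population standard deviation (`statistics.pstdev`):

```python
import statistics

original = [10, 15, 20, 45]  # hypothetical data set
mean = statistics.mean(original)
sd_before = statistics.pstdev(original)

added = [20, 25]  # both lie less than 1 standard deviation from the mean
sd_after = statistics.pstdev(original + added)

print(round(sd_before, 2), "->", round(sd_after, 2))
```

Values closer to the mean than one standard deviation have smaller-than-average squared deviations, so they pull the standard deviation down.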

57
Q

Non random sampling methods

A

Opportunity or quota

58
Q

Why might a stratified random sampling not be used?

A

It is not possible to have a sampling frame

59
Q

Qualitative variables

A

Wind speed (when recorded on the Beaufort scale)

60
Q

Smaller mean but larger standard deviation

A