Vocab Flashcards

Chapter 1 - Collection of data.

1
Q

Population

A

Everyone/everything involved in an investigation.

2
Q

Census

A

An investigation with data taken from every member of a population.

3
Q

Sample

A

An investigation with data taken from a selected part of the population.

4
Q

Bias

A

Anything that distorts the data.

5
Q

Strata

A

Subgroups/subcategories within a population (singular: stratum).

6
Q

Sampling frame

A

A list of all the items/people forming a population.

7
Q

Sampling unit

A

One item from a sampling frame.

8
Q

Observation

A

You record something happening.

9
Q

Experiment

A

You record data from something you make happen.

10
Q

Qualitative data

A

Describes certain qualities.

11
Q

Quantitative data

A

Describes quantities (numerical values). Can be discrete or continuous.

12
Q

Continuous data.

A

Data we can measure.

13
Q

Discrete data.

A

Data we can count.

14
Q

Primary data.

A

Collected by the user.

15
Q

Secondary data.

A

You obtain the data from somebody else.

16
Q

Questionnaire.

A

A set of questions, completed by respondents, used to obtain data. Can be anonymous.

17
Q

Interview/Survey.

A

Data collection methods. Ask people their opinions, can be anonymous.

18
Q

Pilot survey.

A

Testing a questionnaire on a small group of people first.

-identifies likely responses
-checks the response rate
-checks whether questions are understood
-checks how long it will take

-reveals unexpected outcomes (refine hypothesis/change something)
-problems are easier and less costly to fix before the full study
-checks that methods of distribution/collection work
-estimates time/costs of the full study

19
Q

Open questions.

A

No suggested answers. Differently worded answers can make data analysis difficult.

20
Q

Closed questions.

A

Suggested answers to choose from. With opinion scales, people tend to answer in the middle as they do not wish to seem extreme.

21
Q

Capture recapture

A

A population estimate.
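
The card does not give the calculation, so the following is a minimal sketch of the standard capture-recapture (Petersen) estimate: population ≈ (number marked in first capture × size of second capture) ÷ (number of marked individuals in second capture). The function name and the example figures (40, 50, 8) are made up for illustration.

```python
def petersen_estimate(marked_first: int, second_sample_size: int, marked_in_second: int) -> float:
    """Estimate population size from a capture-recapture study.

    marked_first: individuals caught, marked and released in the first capture
    second_sample_size: individuals caught in the second capture
    marked_in_second: how many of the second capture were already marked
    """
    if marked_in_second == 0:
        # With no recaptures the ratio is undefined, so no estimate is possible.
        raise ValueError("no marked individuals recaptured; estimate undefined")
    return marked_first * second_sample_size / marked_in_second


# Hypothetical figures: 40 marked, 50 caught later, 8 of those were marked.
estimate = petersen_estimate(40, 50, 8)
print(estimate)
```

The estimate relies on the usual assumptions: marks are not lost, the population does not change between captures, and marked individuals mix back in fully.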

22
Q

Judgement sampling.

A

Use judgement to select a sample representative of the population.

23
Q

Opportunity sampling.

A

Use available people/objects at the time.

24
Q

Systematic sampling.

A

Choose a starting point from your sampling frame at random, then choose items at regular intervals. (e.g. with an interval of 32, use a random number generator to pick a starting point among the first 32 items, then select every 32nd item after it.)
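
A rough sketch of systematic sampling in Python, assuming a numbered sampling frame of 320 items and a sample of 10, which gives the interval of 32 used on this card. The function name, frame size and seed are illustrative assumptions.

```python
import random


def systematic_sample(frame, sample_size, rng=None):
    """Pick a random start within the first interval, then take every k-th item."""
    if rng is None:
        rng = random.Random()
    interval = len(frame) // sample_size  # k = population size / sample size
    if interval < 1:
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    start = rng.randrange(interval)  # random index among the first k items
    return frame[start::interval][:sample_size]


frame = list(range(1, 321))  # items numbered 1 to 320 (hypothetical frame)
sample = systematic_sample(frame, 10, random.Random(1))
print(sample)
```

Every selected item is exactly one interval after the previous one, which is why a repeating pattern in the frame can bias the sample.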

25
Q

Random sampling.

A

Everyone in the population has an equal chance of being selected (unbiased).
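
As an illustration only, a simple random sample can be drawn with Python's standard `random` module. The frame of 100 student labels and the seed are hypothetical.

```python
import random


def simple_random_sample(frame, sample_size, seed=None):
    """Draw sample_size distinct items; each item is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(frame, sample_size)  # sampling without replacement


frame = [f"student_{i}" for i in range(1, 101)]  # hypothetical sampling frame
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```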

26
Q

Quota sampling.

A

Group by characteristics, and interview a number from each group

27
Q

Cluster sampling.

A

Data naturally splits. List of clusters = sampling frame. Randomly select clusters to form sample.

28
Q

Stratified sampling.

A

The number of people asked from each stratum is proportional to the size of that stratum in the population. (e.g. 60/1000 × 250 = 15 Year 7s in the sample).
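
A small worked sketch of the proportional calculation. The card does not say which figure is which, so this assumes one reading: a school of 1000 with 250 Year 7s and a total sample of 60. The other year-group sizes are invented to complete the example.

```python
def stratum_allocation(stratum_size: int, population_size: int, sample_size: int) -> int:
    """Number to sample from one stratum: (stratum / population) x sample size."""
    return round(stratum_size * sample_size / population_size)


# Hypothetical school of 1000 students; only the Year 7 result (15) comes from the card.
strata = {"Year 7": 250, "Year 8": 200, "Year 9": 150, "Year 10": 200, "Year 11": 200}
population = sum(strata.values())
allocation = {name: stratum_allocation(size, population, 60) for name, size in strata.items()}
print(allocation)
```

With less convenient numbers the rounded allocations may not add up to the intended sample size exactly, so they should be checked and adjusted by hand.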

29
Q

Random response method.

A

For sensitive questions which people are likely to answer dishonestly (e.g. flip a coin: if heads, tick yes; if tails, answer honestly.)
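
A sketch of how the true proportion could be recovered under the coin scheme on this card, assuming a fair coin. About half of respondents tick yes because of heads, so P(yes) ≈ 0.5 + 0.5 × p, which rearranges to p ≈ 2 × P(yes) − 1. The function name and survey figures are hypothetical.

```python
def estimate_true_yes(yes_count: int, total: int) -> float:
    """Estimate the proportion who would honestly answer yes.

    Assumes a fair coin: heads -> tick yes, tails -> answer honestly.
    """
    yes_rate = yes_count / total
    estimate = 2 * yes_rate - 1
    # Sampling variation can push the raw estimate outside 0..1, so clamp it.
    return min(1.0, max(0.0, estimate))


# Hypothetical survey: 130 of 200 respondents ticked yes.
p = estimate_true_yes(130, 200)
print(round(p, 2))
```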

30
Q

Primary data advantages

A

gather data that directly relates to hypothesis

you know reliability

31
Q

primary data disadvantages

A

expensive
time consuming
difficult/impossible

32
Q

secondary data advantages

A

easier to get hold of
can gather data quickly and cheaply
large data sets

33
Q

secondary data disadvantages

A

wrong format/rounded

difficult to find data that matches your hypothesis exactly
(out of date, no relevant data available)

don’t know accuracy, may be biased, unreliable

34
Q

census pros

A

representative of entire pop.
unbiased

35
Q

census cons

A

hard/impossible for big pops.

expensive

impractical

might be tricky to define entire pop/access all members

not an option when items being used up/damaged by investigation

36
Q

sample pros

A

quicker
cheaper
more practical than a census

37
Q

sample cons

A

less accurate
not fully representative
biased
variability between samples

38
Q

random sampling pros

A

unbiased
(should be) representative

39
Q

random sampling cons

A

not always practical/convenient - if the pop. is spread over a large area, travel is needed

impossible to list entire pop. or access everyone

40
Q

stratified sampling pros

A

likely gives a representative sample if you have easy to define categories (e.g. gender)

can compare results from different groups

41
Q

stratified sample cons

A

not useful when no obvious categories/hard to define

can be expensive

42
Q

systematic sampling pros

A

unbiased sample
can be done by machine

43
Q

systematic sample cons

A

nth item might coincide with a pattern (e.g. fault) so biased

44
Q

cluster sampling pros

A

convenient (saves travel time when pop. spread over large area)

45
Q

cluster sampling cons

A

biased if similar clusters sampled, e.g. with similar incomes per region.

46
Q

quota sampling pros

A

quick
representation of all diff groups (genders etc)
can be done with no sample frame

member easily replaced by one of the same characteristics

47
Q

quota sampling cons

A

biased- interviewer bias
refusal to take part (might have similar views)
-not all may have an equal chance of being selected

48
Q

opportunity sampling pros

A

convenient

49
Q

opportunity sample cons

A

-not representative of pop.
-very biased.
-selecting at a particular time and place so not all students have an equal chance of being selected.

50
Q

judgement sampling pros

A

quick
sometimes may be the only suitable method to use

51
Q

judgement sampling cons

A

researcher bias

researcher unreliable-though should have good knowledge of pop.

not random -very biased

52
Q

categorical scale

A

gives names or numbers to classes of qualitative data so it can be more easily processed. (numbers don’t have meaning).

53
Q

ordinal scale

A

(rank scale)

gives numbers to the classes of data which can be ordered in a meaningful way.

54
Q

multivariate data

A

made up of two or more variables

55
Q

bivariate data

A

data made up of two variables (numerical)

56
Q

questionnaire pros

A

quick and cheap

well written ones shouldn’t be biased

respondents aren’t under pressure, so their answers likely truthful

can distribute to large numbers of people

57
Q

questionnaire cons

A

distribution can lead to bias

non-responses
(particularly on sensitive Qs)
(discard but might remove certain parts of pop.)

questions might not be understood by respondent

58
Q

methods to distribute questionnaires (pros and cons)

A

hand it out - target pop gets, but time consuming

put it online -data recorded and collected easily, but ppl without internet access excluded

post/email - wide reaching, not sure who is responding

ask ppl to collect it - easy, but people with strong views are more likely to take one.

59
Q

interview pros

A

ask more complex questions

can explain Qs if someone doesn’t understand/ask follow up questions

higher response rate

you know the right person answered the questions

60
Q

interview cons

A

time consuming - one person at a time

expensive - employ interviewers/travel if sample is geographically spread out

more likely to lie if questions are sensitive, they may be embarrassed

answers could be recorded in a biased way (accidental if untrained, deliberate if strong views)

61
Q

statistical enquiry cycle

A
  1. planning (hypothesis, what data to collect and how to use it)
  2. collecting data (primary/secondary, constraints)
  3. processing and presenting data (diagrams/measures, technology)
  4. interpreting results (plan analysis, conclusions, predictions)
  5. communicating results clearly and evaluating methods (aware of target audience, clear visual representation of results)
62
Q

collecting data

A

primary data by experiment - reliable if the data is recorded accurately and fairly

secondary data from a website - can be more reliable in some cases, e.g. for sensitive topics (income, money spent, weight, age)

63
Q

processing and presenting

A

Distribution?
-averages
-measures of spread
-box plots
-(pie charts)
-(histograms)
-(bar graphs)

Correlation
-Scatter graph
-line of best fit
-SRCC
-PMCC

Over time
-time series graph

64
Q

Interpreting data

A

-compare averages
or
-find correlation

do the results support or contradict the hypothesis?

-do I need to repeat to find more results? (c+e)

65
Q

Closed vs open questions

A

Closed questions have a fixed number of possible answers whereas open questions can be answered in any way.

66
Q

Questionnaire questions, think: (SABCURL)

A

-Is it understandable and clear?
-Is it relevant?
-Is it leading?
-Is it biased?
-Is it ambiguous?
-Is it sensitive?

67
Q

How can we reduce the problem of non-responses?

A

-Follow up people who did not respond
-Provide an incentive for people to answer (prize)
-Use clear questions that are easy to answer

68
Q

Remember to:

A

Answer the question in a statement
Look at how many parts to q and how many marks

69
Q

How to use technology

A

Can use technology to…

-order data (e.g. by age)
-identify missing data
-remove irrelevant columns/data
-remove extraneous symbols
-remove outliers

-automate the calculation of summary statistics (using a computer) e.g. mean point, line of best fit.
-set up a computer to visually represent data

70
Q

Advantages of using technology

A

-can reduce human error
-uses all data so unbiased
-more visually appealing
-saves time

71
Q

constraints when planning an investigation:

A

time - under pressure?

costs - budget? minimise spending? longer investigation = more expensive, costs of travel and equipment

ethical issues - no harm/ distress

confidentiality- sensitive information e.g income? could be hard to get accurate data- ppl may lie or refuse to answer.

convenience - hyp could be difficult/ impossible to test, think abt most convenient way to access data you need

72
Q

observation

A

involves counting or measuring

73
Q

reference sources

A

secondary sources of information:

-acknowledge its source
-consider reliability(biased?)
-out of date? wrong format? data incomplete/missing?

74
Q

explanatory variable

A

the variable you are in control of/ the variable that has an effect on the other variable

75
Q

response variable

A

the variable you measure/ changes as a result of changing the explanatory variable.

76
Q

when considering a lab, field, or natural experiment, think:

A

how far can I control the explanatory variable?

77
Q

How can we clean raw data?

A

-Remove outliers
-Put data in the same format
-Remove extraneous symbols
-Identify missing values
-Remove irrelevant columns

78
Q

Why would we repeat a simulation/experiment a number of times?

A

-Find the mean average
-Compare results/see patterns
-Spot anomalous results
-Results will vary

79
Q

Steps for a simulation

A

-Choose a suitable method for getting random numbers
-Assign numbers to the data
-Generate random numbers
-Match the random numbers
-Count how many trials (e.g. dice rolls) it took
-repeat a number of times and find the mean average
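
The steps above can be sketched in Python. This example is an assumption chosen for illustration: it simulates how many rolls of a fair die it takes to get a six, repeats that many times, and finds the mean. The function names, seed and number of trials are hypothetical.

```python
import random


def rolls_until_six(rng: random.Random) -> int:
    """Generate random numbers 1-6 and count how many rolls it takes to match a 6."""
    rolls = 0
    while True:
        rolls += 1
        if rng.randint(1, 6) == 6:  # random number matches the outcome of interest
            return rolls


def mean_rolls(trials: int, seed: int = 0) -> float:
    """Repeat the simulation `trials` times and return the mean number of rolls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = [rolls_until_six(rng) for _ in range(trials)]
    return sum(counts) / trials


average = mean_rolls(10_000)
print(average)
```

Individual runs vary a lot, which is why the simulation is repeated and averaged; the mean should settle near the theoretical value of 6 rolls.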

80
Q

Frequency polygon

A

Use midpoints

81
Q

Cumulative frequency chart

A

Use endpoints/the highest value.

82
Q

Why would you expect a smaller sample to have a greater standard deviation?

A

More variation between samples.

83
Q

Why may it be appropriate to remove outliers?

A

-May be an error in data
-Doesn’t fit trend

84
Q

What should you look for in tables?

A

Patterns in the data e.g. is distribution symmetric?

85
Q

why might the mean be appropriate?

A

takes into account all the data
can be used to calculate standard deviation

86
Q

why might the mean not be appropriate?

A

may be significantly affected by extreme values or outliers

87
Q

why might the median be appropriate?

A

-useful when data is skewed or contains outliers as not distorted by extreme values
-easy to find in ordered data
-can be used alongside range and IQR

88
Q

why might the median not be appropriate?

A

isn’t always a data value
not always a good representation of the data

89
Q

why might the mode be appropriate?

A

always a data value
can be used with non-numerical data
easy to find in tallied data

90
Q

why might the mode not be appropriate?

A

-doesn’t always exist
-may be more than one
-may be a misleading value far from the mean
-may not be a good representation of the data.

91
Q

What does PMCC tell you?

A

It measures how close the points on a scatter diagram are to a straight line (how linear the correlation is)

92
Q

What does SRCC tell you?

A

It measures correlation between ranks. (this can be strong even if the data values themselves have a non-linear relationship so SRCC can detect both a linear and non-linear association).

93
Q

How will SRCC and PMCC compare if there’s a non-linear association between two variables?

A

Both will be positive or negative but SRCC will be stronger (closer to 1 or -1).
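
A small sketch comparing the two coefficients on made-up data where y always increases with x but not in a straight line (y = x³). It assumes no tied values, so SRCC can be computed as the PMCC of the ranks. The helper names are illustrative.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient: how close points are to a straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes there are no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def srcc(xs, ys):
    """Spearman's rank correlation: the PMCC of the ranks (valid when no ties)."""
    return pmcc(ranks(xs), ranks(ys))


xs = list(range(1, 11))
ys = [x ** 3 for x in xs]  # increasing but curved relationship
print(round(pmcc(xs, ys), 3), round(srcc(xs, ys), 3))
```

Both values are positive here, and the rank-based value is the stronger of the two because the ranks agree perfectly even though the points do not lie on a line.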

94
Q

If the mean is low then…

A

more than 50% of data values must be above the mean.

95
Q

If the mean in high then…

A

More than 50% of data values must be below the mean.

96
Q

Why should a control group be used?

A

Allows for comparisons (between control and test group).

97
Q

how could matched pairs be used? (2)

A

Will aim to pair people based on similar characteristics (e.g. age, gender) and place one in each group.

98
Q

What can you do when given a pie chart?(or comparative pie charts)

A

Measure the radius! With a ruler!

99
Q

index numbers

A

talk about rate

100
Q

for probability tree diagrams…

A

multiply the probabilities along the branches to find the probability at the end of each path.
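
As an illustration, here is a hypothetical two-stage tree: a counter is drawn twice, with replacement, from a bag where P(red) = 3/5 and P(blue) = 2/5. The bag and its probabilities are invented for the example.

```python
from fractions import Fraction
from itertools import product

# Probabilities on each branch (the same at both stages because of replacement).
branch = {"red": Fraction(3, 5), "blue": Fraction(2, 5)}

# Multiply along each path to get the probability at the end of that path.
paths = {
    (first, second): branch[first] * branch[second]
    for first, second in product(branch, repeat=2)
}

for outcome, probability in paths.items():
    print(outcome, probability)
```

A useful check is that the end-of-path probabilities add up to 1.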

101
Q

for comparing regression lines…

A

-talk about gradient.
-plug the values given into the equation or imagine x as 0.
-interpret each correlation.

102
Q

Cumulative frequency step polygons

A

along and then up
the height of each step is the same as the frequency for its corresponding value e.g. 5 boxes (vertical) have 48 matches (horizontal)

103
Q

why might the mean increase?

A

If you add a value greater than the mean, or take away a value less than the mean, the mean increases.
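
A quick check of this rule with made-up numbers, using Python's `statistics.mean`:

```python
from statistics import mean

data = [4, 6, 8, 10]          # mean is 7
added_above = data + [12]     # add a value greater than the mean
removed_below = [6, 8, 10]    # take away 4, a value less than the mean

print(mean(data), mean(added_above), mean(removed_below))
```

In both cases the mean rises from 7 to 8.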

104
Q

Why is combining results (e.g. into one grouped frequency table) an advantage?

A

Only need to calculate one mean

105
Q

Why is combining results (e.g. into one grouped frequency table) a disadvantage?

A

Can’t compare classes

106
Q

What do we do for a systematic sample?

A

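number every item in the sampling frame
divide the population size by the sample size to get the interval
choose a random starting point within the first interval
go in intervals from there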

107
Q

What do we do for a systematic sample?

A

number
divide
choose
go

108
Q

Are population or sample means more consistent? 😡

A

Sample means are more consistent, so the standard deviation of the population is bigger than the standard deviation of the sample means.