Data processing and analysis Flashcards

1
Q

What is coding?

A

Coding involves allocating a number to each of the possible responses provided to a closed question, or allocating a code to a response to an open question. Numbers for closed question responses can be written into a questionnaire.

A code is quicker to enter into a computer system than a text response, and the frequency with which each response is selected can then be calculated.
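
As a minimal sketch of coding a closed question, the snippet below maps text responses to numbers and counts how often each code was selected. The code frame and the sample responses are made up for illustration.

```python
from collections import Counter

# Hypothetical code frame for a closed question (labels and numbers are illustrative).
CODE_FRAME = {"Yes": 1, "No": 2, "Don't know": 3}

def code_responses(responses):
    """Replace each text response with its numeric code."""
    return [CODE_FRAME[r] for r in responses]

responses = ["Yes", "No", "Yes", "Don't know", "Yes"]
codes = code_responses(responses)   # [1, 2, 1, 3, 1]
frequencies = Counter(codes)        # how often each code was selected
```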

2
Q

Before data can be entered onto a computer for data processing it must first be …………?

A

Coded: a number is allocated to each of the possible responses provided.

3
Q

What are the typical codes used in coding of responses to show that the response is missing?

A
  • . (a full stop)
  • # (a hash)
4
Q

Why is it necessary to have a code for missing numerical values when coding?

A

This will prevent confusion with actual responses.

Sometimes 999 is used.
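
A small sketch of why the missing-value code matters: if 999 is treated as a real answer it distorts the statistics, so it has to be filtered out first. The ages below are invented, and 999 is assumed to lie outside the range of genuine answers.

```python
from statistics import mean

MISSING = 999  # assumed missing-value code; must lie outside the range of real answers

ages = [34, 51, MISSING, 27, 48]

valid_ages = [a for a in ages if a != MISSING]
mean_with_code = mean(ages)     # distorted: 999 is treated as a real age
mean_valid = mean(valid_ages)   # computed from genuine responses only
```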

5
Q

How can coded data be analysed?

A

Using software packages such as SPSS or SAS. These can produce a wide range of summary statistics, such as the mean or standard deviation, identify trends, irregular movements and outliers, and generate tables or graphs. The drawback is that analysts need to be trained and skilled.
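
The card names SPSS and SAS; purely as a stand-in, the sketch below produces a few of the same summary statistics with Python's standard library. The data values are made up.

```python
from statistics import mean, median, stdev

# Illustrative coded responses to a numeric question (made-up values).
hours_per_week = [2, 4, 4, 5, 7, 8]

summary = {
    "n": len(hours_per_week),
    "mean": mean(hours_per_week),
    "median": median(hours_per_week),
    "stdev": stdev(hours_per_week),  # sample standard deviation (n - 1 denominator)
}
```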

6
Q

When can coding be carried out?

A
  • before the survey
  • during an interview
  • after the survey
7
Q

What is data capture?

A

The process whereby data collected on questionnaires or forms is transferred to an electronic file on a computer.

  • batch keying = manual keying of data
  • scanning = using Intelligent Character Recognition
  • direct entry by an interviewer
8
Q
A
  • automatic correction of data that does not make sense.
  • validation gates can check the feasibility of data and highlight anything that needs to be checked by data experts. The respondent can be contacted to verify the value.
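
A validation gate can be sketched as a range check per field. The field names and feasible ranges below are hypothetical; a real system would take them from the survey specification.

```python
# Hypothetical validation rules: field name -> (lowest, highest) feasible value.
RULES = {"age": (0, 120), "hours_worked": (0, 168)}

def validate(record):
    """Return the names of fields whose values are absent or outside the feasible range."""
    flagged = []
    for field, (low, high) in RULES.items():
        value = record.get(field)
        if value is None or not (low <= value <= high):
            flagged.append(field)
    return flagged

record = {"id": 17, "age": 212, "hours_worked": 40}
to_check = validate(record)   # fields to refer to a data expert or the respondent
```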
9
Q

What is an outlier?

A

A freak value that is unusual compared to other responses.

  • representative outliers are genuine values
  • non-representative outliers are unique or incorrect data values which should be looked at and treated by editing and imputation systems.
10
Q

How can you tell if a value is an outlier?

A
  • create a scatter plot and see which values fall away from the bulk of values.
  • calculate the distance from the mean - measure the relative distance between a response and the average response. Those with a large distance are outliers.
  • sort the responses into ascending order and trim off the top x % and bottom y %.
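
The last two approaches above can be sketched as follows. The threshold of 2 standard deviations and the 10% trimming proportions are assumptions chosen for the example, as is the data.

```python
from statistics import mean, stdev

def far_from_mean(values, threshold=2.0):
    """Values more than `threshold` standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) / s > threshold]

def trim(values, bottom_pct, top_pct):
    """Sort ascending, then drop the bottom and top percentages of responses."""
    ordered = sorted(values)
    n = len(ordered)
    low = int(n * bottom_pct / 100)
    high = n - int(n * top_pct / 100)
    return ordered[low:high]

incomes = [21, 23, 24, 25, 25, 26, 27, 28, 30, 250]
suspects = far_from_mean(incomes)
trimmed = trim(incomes, bottom_pct=10, top_pct=10)
```

Note that trimming removes a fixed share of responses whether or not they are genuinely unusual, so it is a blunter tool than the distance measure.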
11
Q

Should outliers be removed from the data?

A

Outliers should be re-checked in case they result from a recording error, scanning problem or keying error. Analysis can be carried out with and without the questionable values to see what impact they have on the results.

The more extreme values give information on how variable the data are, which is very important in some studies.
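
One way to look at the impact is to compute the same statistics with and without the questionable values. This is only a sketch on invented data, with 95 assumed to be the value under question.

```python
from statistics import mean, stdev

values = [12, 14, 15, 15, 16, 18, 95]   # made-up data; 95 is the questionable value
questionable = {95}

kept = [v for v in values if v not in questionable]

comparison = {
    "mean_with": mean(values),
    "mean_without": mean(kept),
    "stdev_with": stdev(values),
    "stdev_without": stdev(kept),
}
```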

12
Q

What is the name for a single observation that is inconsistent with the rest of the data for the variable being observed?

A

An outlier or freak value

13
Q

What is ‘querying’ a database?

A

Querying a database refers to retrieving particular data, like filtering out what you need.
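
As an illustration, the sketch below builds a tiny in-memory SQLite database and queries it for one subset of respondents. The table name, column names and rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (id INTEGER, age INTEGER, answer INTEGER)")
conn.executemany(
    "INSERT INTO responses VALUES (?, ?, ?)",
    [(1, 19, 1), (2, 45, 2), (3, 67, 1), (4, 23, 1)],
)

# Query: retrieve only respondents under 30 who gave answer code 1.
rows = conn.execute(
    "SELECT id, age FROM responses WHERE age < 30 AND answer = 1 ORDER BY id"
).fetchall()
conn.close()
```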

14
Q

What is weighting?

A

Weighting adjusts the data so that it more accurately reflects the population. For example, if you think there were more old respondents than young, you can increase the weight of the answers you got from the young people. The drawback is that you could just be scaling up errors if the results you do have are not accurate.
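
A weighted average shows the effect. In this sketch the responses and the weights are invented: each young respondent's answer is assumed to count three times as much as an old respondent's.

```python
# Each tuple is (age_group, score); made-up responses with few young respondents.
responses = [("young", 8), ("old", 4), ("old", 5), ("old", 3)]

# Assumed weights: young answers count three times as much as old answers.
weights = {"young": 3.0, "old": 1.0}

def weighted_mean(responses, weights):
    """Average of the scores, with each score multiplied by its group's weight."""
    total = sum(weights[group] * score for group, score in responses)
    weight_sum = sum(weights[group] for group, _ in responses)
    return total / weight_sum

unweighted = sum(score for _, score in responses) / len(responses)
weighted = weighted_mean(responses, weights)
```

If the single young response of 8 was itself an error, the weighting multiplies that error by three, which is the drawback described above.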

15
Q

What is the mean?

A

An average. It is calculated as the sum of the values divided by the number of cases.
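
Written as a formula, with the symbols assumed here to be x_i for the individual values and n for the number of cases, followed by a small worked example:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\text{e.g.}\qquad
\bar{x} = \frac{2 + 4 + 9}{3} = 5
```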

16
Q

What 2 main errors can come about when coding?

A
  • Coding decision errors
  • Accidental entry errors

17
Q

What is a record?

A

The data collected from a unit of the sample population. This is often given an identifier number.

18
Q

What order should the codes be allocated in?

A

In the order that the questions and responses are given in the survey.

19
Q

What should you have codes for?

A
  • The response options
  • For missing values
  • For refusals
  • For ‘I don’t know’
  • For ‘other’
  • For when that question is not applicable to the respondent
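
A code book entry covering all of these might look like the sketch below. The question, the labels and every code number are illustrative; the only requirement is that the special codes cannot be confused with real answers.

```python
# Hypothetical code book entry for one question; all numbers are illustrative.
TRANSPORT_CODES = {
    1: "Car",
    2: "Bus",
    3: "Bicycle",
    96: "Other",
    97: "Don't know",
    98: "Refused",
    99: "Missing",
    -1: "Not applicable",
}

# Codes that do not represent an actual answer to the question.
SPECIAL_CODES = {97, 98, 99, -1}

def is_substantive(code):
    """True if the code is a recognised, actual answer to the question."""
    return code in TRANSPORT_CODES and code not in SPECIAL_CODES
```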
20
Q

How should you code number values such as someone’s age or how many times they do something?

A

Where possible it is better to use real-world numbers in the coding, so if someone is 18, the code can be 18.

21
Q

How can you create your code book?

A
  • Use piloting to help
  • Code some responses and add codes based on this if necessary
  • Have code categories but remember that they cannot overlap
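
The no-overlap rule can be checked mechanically. This sketch assumes categories are numeric bands stored as (code, lowest, highest) with both ends inclusive; the age bands are made up.

```python
# Hypothetical age bands as (code, lowest age, highest age), both ends inclusive.
AGE_BANDS = [(1, 18, 24), (2, 25, 34), (3, 35, 49), (4, 50, 120)]

def overlapping_bands(bands):
    """Return pairs of neighbouring bands (sorted by lower bound) that overlap.

    An empty list means that no two bands overlap at all.
    """
    ordered = sorted(bands, key=lambda b: b[1])
    clashes = []
    for (code_a, _, high_a), (code_b, low_b, _) in zip(ordered, ordered[1:]):
        if low_b <= high_a:
            clashes.append((code_a, code_b))
    return clashes
```

For example, bands of 18 to 25 and 25 to 34 would clash, because a 25-year-old could be given either code.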
22
Q

How are open questions in face-to-face interviews coded?

A

Often answers are typed or written down and are coded at a later stage when there is more time for analysis.

23
Q

What is data cleaning?

A

This takes place after data has been coded and entered onto a computer. It checks that the data is complete and consistent and that only recognised codes are used. If errors are found it may be possible to return to a hard copy of the form to check. This is a drawback of entering data directly onto a computer, as there is no hard copy to go back to.
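
A basic cleaning pass can be sketched as a check that every question has an entry and that each entry is a recognised code. The question names and valid code sets below are assumptions for the example.

```python
# Assumed set of recognised codes per question (illustrative).
VALID_CODES = {"q1": {1, 2, 3, 99}, "q2": {1, 2, 99}}

def clean_check(records):
    """List (record id, question, problem) for every incomplete or unrecognised entry."""
    problems = []
    for rec in records:
        for question, valid in VALID_CODES.items():
            if question not in rec:
                problems.append((rec["id"], question, "missing entry"))
            elif rec[question] not in valid:
                problems.append((rec["id"], question, "unrecognised code"))
    return problems

records = [
    {"id": 1, "q1": 2, "q2": 1},
    {"id": 2, "q1": 7, "q2": 2},   # 7 is not a recognised code for q1
    {"id": 3, "q1": 1},            # q2 was never entered
]
problems = clean_check(records)
```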

24
Q

In data analysis, what is adjustment?

A

If you know that a certain sub-set has been under- or over-represented in your survey, you may adjust your data using weighting.
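
One common way to derive such weights, shown here only as a sketch with invented shares, is to divide each group's known share of the population by its share of the sample.

```python
# Assumed figures: known population shares and the shares achieved in the sample.
population_share = {"under_30": 0.40, "30_plus": 0.60}
sample_share = {"under_30": 0.20, "30_plus": 0.80}

# Weight = population share / sample share, so under-represented groups get a weight above 1.
weights = {g: population_share[g] / sample_share[g] for g in population_share}
```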

25
Q

How could question non-response be dealt with?

A
  • The average value for that question could be entered.
  • Imputation can be used. This is a more complex approach, as the answer is based on an informed decision about how the respondent would probably have answered, given how a similar unit responded.

This needs to be approached carefully: if there is a lot of imputation and the model does not reflect reality, it will bias the estimates.
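
Both options can be sketched on a few invented records. The donor function below is a deliberately simple assumption: it treats a unit as "similar" if it shares one matching field, and takes the first such unit as the donor.

```python
from statistics import mean

# Made-up records; None marks question non-response on "income".
records = [
    {"id": 1, "region": "north", "income": 30},
    {"id": 2, "region": "north", "income": None},
    {"id": 3, "region": "south", "income": 50},
    {"id": 4, "region": "south", "income": 40},
]

def impute_mean(records, field):
    """Fill gaps with the average of the observed values."""
    observed = [r[field] for r in records if r[field] is not None]
    fill = mean(observed)
    return [dict(r, **{field: fill if r[field] is None else r[field]}) for r in records]

def impute_from_donor(records, field, match_on):
    """Fill gaps with the value from the first similar unit (same `match_on` value)."""
    result = []
    for r in records:
        if r[field] is None:
            donors = [d for d in records
                      if d[field] is not None and d[match_on] == r[match_on]]
            r = dict(r, **{field: donors[0][field]}) if donors else r
        result.append(r)
    return result

by_mean = impute_mean(records, "income")
by_donor = impute_from_donor(records, "income", "region")
```

Both functions return new records and leave the originals unchanged, so the imputed values can always be told apart from what respondents actually reported.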