Data processing and analysis Flashcards by Jenny Neale

What is coding?

Coding involves allocating a number to each of the possible responses provided to a closed question, or allocating a code to a response to an open question. Numbers for closed question responses can be written into a questionnaire.

A code is quicker to enter onto computer systems than text responses. The frequency of selection can also be calculated.
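
As a rough illustration of coding a closed question and counting how often each option was selected, here is a minimal Python sketch. The code frame (`CODE_FRAME`), the response options and the numbers chosen are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical code frame for one closed question (illustrative values only).
CODE_FRAME = {"Yes": 1, "No": 2, "Don't know": 8}


def code_responses(responses, code_frame):
    """Replace each text response with its numeric code."""
    return [code_frame[r] for r in responses]


responses = ["Yes", "No", "Yes", "Don't know", "Yes"]
codes = code_responses(responses, CODE_FRAME)

# Frequency of selection for each code.
frequencies = Counter(codes)
print(codes)
print(frequencies)
```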

Before data can be entered onto a computer for data processing it must first be …………?

Coded - allocate a number to each of the possible responses provided.

What are the typical codes used in coding of responses to show that the response is missing?

. 
#

Why is it necessary to have a code for missing numerical values when coding?

This will prevent confusion with actual responses.

Sometimes 999 is used.
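
The sketch below shows why the missing-value code has to be recognised and excluded before analysis. It assumes 999 is the missing code and uses invented ages; the helper name `valid_values` is hypothetical.

```python
from statistics import mean

MISSING = 999  # assumed missing-value code, as mentioned in the card above

# Invented ages; two respondents did not answer.
ages = [34, 27, 999, 45, 999, 30]


def valid_values(values, missing_code=MISSING):
    """Drop entries that carry the missing-value code."""
    return [v for v in values if v != missing_code]


naive_mean = mean(ages)                # treats 999 as a real age
clean_mean = mean(valid_values(ages))  # uses only genuine responses
print(naive_mean, clean_mean)
```

Because 999 cannot be a real age, it is easy to filter out; a code that could also be a genuine response would not be.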

How can coded data be analysed?

Using software packages such as SPSS or SAS. These can produce a wide range of summary statistics, such as the mean or standard deviation, and can show trends, irregular movements and outliers in tables or graphs. The drawback is that analysts need to be trained and skilled.

When can coding be carried out?

before the survey
during an interview
after the survey

What is data capture?

The process whereby data collected on questionnaires or forms is transferred to an electronic file and put onto the computer.

batch keying = manual keying of data
scanning = using Intelligent Character Recognition
direct entry by an interviewer

What are validation and automatic editing rules in data capture?

Automatic editing rules: automatic correction of data that does not make sense.
Validation gates: checks on the feasibility of data that highlight anything that needs to be checked by data experts. The respondent can be contacted to verify the value.
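
A validation gate can be sketched as a simple range check that flags values for review rather than changing them. The field names and limits in `RULES` below are assumptions for illustration, not rules from any particular survey.

```python
# Hypothetical feasibility limits: field name -> (lowest, highest) allowed value.
RULES = {"age": (0, 120), "hours_worked_per_week": (0, 168)}


def validate_record(record, rules):
    """Return the names of fields that are absent or outside the allowed range."""
    flagged = []
    for field, (low, high) in rules.items():
        value = record.get(field)
        if value is None or not (low <= value <= high):
            flagged.append(field)
    return flagged


record = {"id": 17, "age": 230, "hours_worked_per_week": 40}
flagged = validate_record(record, RULES)
print(flagged)  # fields a data expert should look at
```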

What is an outlier?

A freak value that is unusual compared to other responses.

representative outliers are genuine values
non-representative outliers are unique or incorrect data values which should be looked at and treated by editing and imputation systems.

How can you tell if a value is an outlier?

create a scatter plot and see which values fall away from the bulk of values.
calculate the distance from the mean - measure the relative distance between a response and the average response. Those with a large distance are outliers.
sort the responses into ascending order and trim off the top x % and bottom y %.
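
The second and third approaches above can be sketched in a few lines of Python. The data are invented, the threshold of 2 standard deviations is an arbitrary assumption, and the function names are hypothetical.

```python
from statistics import mean, stdev


def distance_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds `threshold` standard deviations."""
    m = mean(values)
    s = stdev(values)
    return [v for v in values if abs(v - m) / s > threshold]


def trim(values, bottom_pct, top_pct):
    """Sort ascending, then drop the bottom and top percentages of responses."""
    ordered = sorted(values)
    n = len(ordered)
    low = int(n * bottom_pct / 100)
    high = n - int(n * top_pct / 100)
    return ordered[low:high]


incomes = [21, 23, 22, 25, 24, 26, 22, 23, 25, 240]  # invented values
outliers = distance_outliers(incomes)
trimmed = trim(incomes, bottom_pct=10, top_pct=10)
print(outliers)
print(trimmed)
```

Note that an extreme value also inflates the standard deviation itself, so in small samples a strict threshold may fail to flag it.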

Should outliers be removed from the data?

Outliers should be re-checked in case they result from a recording error, scanning problem or keying error. Analysis can be carried out with and without the questionable values to see what impact they have on the results.

The more extreme values give information on how variable the data are, which is very important in some studies.
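
Running the analysis with and without a questionable value might look like the sketch below. The values, and the decision to treat 95 as questionable, are assumptions for illustration.

```python
from statistics import mean

values = [12, 15, 11, 14, 13, 95]  # invented responses
questionable = {95}                # value flagged for re-checking (assumed)

mean_with = mean(values)
mean_without = mean([v for v in values if v not in questionable])

# The gap shows how strongly the single questionable value drives the result.
impact = mean_with - mean_without
print(mean_with, mean_without, impact)
```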

What is the name for a single observation that is inconsistent with the rest of the data for the variable being observed?

An outlier or freak value

What is ‘querying’ a database?

Querying a database refers to retrieving particular data, like filtering out what you need.
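
As one possible illustration, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name `responses`, its columns and its rows are invented for the example.

```python
import sqlite3

# In-memory database holding a few invented survey records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (respondent_id INTEGER, age INTEGER, q1 INTEGER)")
conn.executemany(
    "INSERT INTO responses VALUES (?, ?, ?)",
    [(1, 18, 1), (2, 45, 2), (3, 67, 1), (4, 30, 1)],
)

# Query: retrieve only respondents aged 30 or over who gave code 1 to q1.
rows = conn.execute(
    "SELECT respondent_id, age FROM responses "
    "WHERE q1 = ? AND age >= ? ORDER BY respondent_id",
    (1, 30),
).fetchall()
conn.close()
print(rows)
```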

What is weighting?

You can weight the data so that it more accurately reflects the population. For example, if older respondents are over-represented compared with younger ones, you can increase the weight of the answers you got from the young people. The drawback is that you could simply be scaling up errors if the results you do have are not accurate.
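
A minimal sketch of this idea follows. It assumes the population is split evenly between young and old while the sample is not; the shares, the answers and the function names are all invented for illustration.

```python
from collections import Counter


def group_weights(groups, population_share):
    """Weight each respondent so that group totals match the population shares."""
    n = len(groups)
    counts = Counter(groups)
    return [population_share[g] * n / counts[g] for g in groups]


def weighted_mean(values, weights):
    """Mean of `values` where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Invented sample: 2 young and 8 old respondents; 1 = "yes", 0 = "no".
groups = ["young", "young"] + ["old"] * 8
answers = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
population_share = {"young": 0.5, "old": 0.5}  # assumed population split

weights = group_weights(groups, population_share)
unweighted = sum(answers) / len(answers)
weighted = weighted_mean(answers, weights)
print(unweighted, weighted)
```

Each young respondent here stands in for 2.5 people, so any error in those two answers is scaled up accordingly.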

What is the mean?

An average. It is calculated as the sum of the values divided by the number of cases.

What 2 main errors can come about when coding?

Coding decision errors
Accidental entry errors

What is a record?

The data collected from a unit of the sample population. This is often given an identifier number.

What order should the codes be allocated in?

In the order that the questions and responses are given in the survey.

What should you have codes for?

The response options
For missing values
For refusals
For ‘I don’t know’
For ‘other’
For when that question is not applicable to the respondent

How should you code number values such as someone’s age or how many times they do something?

Where possible it is better to use real-world numbers in the coding, so if someone is 18, the code can be 18.

How can you create your code book?

Use piloting to help
Code some responses and add codes based on this if necessary
Have code categories but remember that they cannot overlap
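
One way to check that code categories do not overlap is sketched below for numeric bands such as age groups. The bands and the function name `find_overlaps` are hypothetical.

```python
def find_overlaps(bands):
    """Return pairs of neighbouring bands whose inclusive ranges overlap.

    `bands` is a list of (label, low, high) tuples.
    """
    ordered = sorted(bands, key=lambda b: b[1])
    overlaps = []
    for (label_a, _, high_a), (label_b, low_b, _) in zip(ordered, ordered[1:]):
        if low_b <= high_a:
            overlaps.append((label_a, label_b))
    return overlaps


# A 25-year-old would fit two categories in the first code book.
bad_bands = [("18-25", 18, 25), ("25-35", 25, 35), ("36-50", 36, 50)]
good_bands = [("18-25", 18, 25), ("26-35", 26, 35), ("36-50", 36, 50)]
print(find_overlaps(bad_bands))
print(find_overlaps(good_bands))
```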

How are open questions in face-to-face interviews coded?

Answers are often typed or written down and coded at a later stage, when there is more time for analysis.

What is data cleaning?

This takes place after data has been coded and entered onto a computer. It checks that the data is complete and consistent and that only recognised codes are used. If errors are found, it may be possible to return to a hard copy of the form to check. A drawback of initial computer entry is that there is no hard copy to go back to.
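
Two of these checks, completeness and recognised codes, can be sketched as follows. The questions, the sets of valid codes and the records are assumptions made up for the example.

```python
# Hypothetical code book: question -> set of recognised codes.
VALID_CODES = {"q1": {1, 2, 8, 9}, "q2": {1, 2, 3, 4, 5, 9}}


def clean_check(records, valid_codes):
    """List (record id, question, problem) for missing entries and unrecognised codes."""
    problems = []
    for rec in records:
        for question, allowed in valid_codes.items():
            if question not in rec:
                problems.append((rec["id"], question, "missing entry"))
            elif rec[question] not in allowed:
                problems.append((rec["id"], question, "unrecognised code"))
    return problems


records = [
    {"id": 1, "q1": 1, "q2": 3},
    {"id": 2, "q1": 7, "q2": 5},  # 7 is not a recognised code for q1
    {"id": 3, "q1": 2},           # q2 was never entered
]
problems = clean_check(records, VALID_CODES)
print(problems)
```

The record identifier in each reported problem is what lets someone go back to the original form, where one exists.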

In data analysis, what is adjustment?

If you know that a certain sub-set has been under- or over-represented in your survey, you may adjust your data using weighting.

How could question non-response be dealt with?

The average value for that question could be entered.
Imputation can be used. This is a more complex approach, as the answer is an informed estimate of how the respondent would probably have answered, based on how a similar unit responded. It needs to be approached carefully: if there is a lot of imputation and the model does not reflect real-world conditions, it will bias the estimates.
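
Both options can be sketched in Python. This is a simplified illustration: "similar unit" is reduced to sharing one characteristic (here a made-up `region` field), and the records and function names are hypothetical. Real imputation systems use more careful models.

```python
from statistics import mean


def mean_impute(values):
    """Replace missing answers (None) with the average of the observed answers."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def donor_impute(records, target, match_on):
    """Fill a missing `target` from the first record sharing the same `match_on` value."""
    result = []
    for rec in records:
        if rec[target] is None:
            donor = next(
                (d for d in records
                 if d[target] is not None and d[match_on] == rec[match_on]),
                None,
            )
            # Leave the value missing if no similar unit is available.
            rec = {**rec, target: donor[target] if donor else None}
        result.append(rec)
    return result


print(mean_impute([10, None, 20]))

records = [
    {"region": "N", "income": 30},
    {"region": "S", "income": 50},
    {"region": "S", "income": None},
]
imputed = donor_impute(records, target="income", match_on="region")
print(imputed[2])
```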

Data processing and analysis Flashcards

(25 cards)