Data processing and analysis Flashcards
What is coding?
Coding involves allocating a number to each of the possible responses provided to a closed question, or allocating a code to a response to an open question. Numbers for closed question responses can be written into a questionnaire.
A code is quicker to enter onto computer systems than text responses. The frequency of selection can also be calculated.
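A minimal sketch of coding a closed question and counting how often each code was selected. The code frame and the responses are hypothetical, made up for illustration.

```python
from collections import Counter

# Hypothetical code frame for a closed question (not from the flashcards).
CODE_FRAME = {"Yes": 1, "No": 2, "Don't know": 3}

def code_responses(responses, code_frame):
    """Replace each text response with its numeric code."""
    return [code_frame[r] for r in responses]

# Made-up text responses as they might appear on returned questionnaires.
responses = ["Yes", "No", "Yes", "Don't know", "Yes"]
coded = code_responses(responses, CODE_FRAME)

# Frequency of selection for each code.
frequencies = Counter(coded)

print(coded)           # [1, 2, 1, 3, 1]
print(frequencies[1])  # 3 respondents chose "Yes"
```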
Before data can be entered onto a computer for data processing it must first be …………?
Coded - allocate a number to each of the possible responses provided.
What are the typical codes used in coding of responses to show that the response is missing?
A full stop (.) or a hash symbol (#).
Why is it necessary to have a code for missing numerical values when coding?
This will prevent confusion with actual responses.
Sometimes 999 is used.
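A small sketch of why the missing-value code has to be recognised and excluded before analysis. The ages are invented, and 999 is simply the code mentioned above.

```python
from statistics import mean

MISSING_CODE = 999  # a value that cannot be a genuine age

# Made-up coded ages; two respondents did not answer.
ages = [34, 41, 999, 29, 999, 52]

# Drop the missing-value code before calculating anything.
valid_ages = [a for a in ages if a != MISSING_CODE]

naive_mean = mean(ages)          # wrongly treats 999 as a real age
correct_mean = mean(valid_ages)  # uses genuine responses only

print(naive_mean)    # 359
print(correct_mean)  # 39
```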
How can coded data be analysed?
Using software packages such as SPSS or SAS. A wide range of outputs can be produced, such as the mean or standard deviation, trends, irregular movements, outliers, tables and graphs. The drawback is that analysts need to be trained and skilled.
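The flashcard names SPSS and SAS. As a rough stand-in, the same kind of summary statistics can be sketched with Python's standard library. The scores below are invented.

```python
from statistics import mean, stdev

# Made-up coded responses, e.g. scores on a numeric scale.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

average = mean(scores)  # sum of the values divided by the number of cases
spread = stdev(scores)  # sample standard deviation

print(average)           # 5
print(round(spread, 2))  # 2.14
```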
When can coding be carried out?
- before the survey
- during an interview
- after the survey
What is data capture?
The process whereby data collected on questionnaires or forms are transferred to an electronic file and put onto the computer.
- batch keying = manual keying of data
- scanning = using Intelligent Character Recognition
- direct entry by an interviewer
- automatic correction of data that does not make sense.
- validation gates can check the feasibility of data and highlight anything that needs to be checked by data experts. The respondent can be contacted to verify the response.
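A validation gate can be sketched as a set of feasible ranges that each record is checked against. The field names, ranges and records here are hypothetical, chosen only to illustrate the idea.

```python
def validate_record(record, rules):
    """Return the names of fields whose values are missing or outside the feasible range."""
    flagged = []
    for field, (low, high) in rules.items():
        value = record.get(field)
        if value is None or not (low <= value <= high):
            flagged.append(field)
    return flagged

# Hypothetical feasibility rules: (lowest, highest) acceptable value.
RULES = {"age": (0, 120), "hours_worked_per_week": (0, 168)}

# Made-up captured records.
records = [
    {"id": 1, "age": 34, "hours_worked_per_week": 40},
    {"id": 2, "age": 340, "hours_worked_per_week": 38},   # likely keying error
    {"id": 3, "age": 51, "hours_worked_per_week": 200},   # more hours than a week has
]

# Keep only the records that need to be checked by a data expert.
checks = {r["id"]: validate_record(r, RULES) for r in records}
to_query = {rid: fields for rid, fields in checks.items() if fields}

print(to_query)  # {2: ['age'], 3: ['hours_worked_per_week']}
```

The gate only highlights suspect values. Whether to correct them or contact the respondent is still a decision for the data experts.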
What is an outlier?
A freak value that is unusual compared to other responses.
- representative outliers are genuine values
- non-representative outliers are unique or incorrect data values which should be looked at and treated by editing and imputation systems.
How can you tell if a value is an outlier?
- create a scatter plot and see which values fall away from the bulk of values.
- calculate the distance from the mean - measure the relative distance between a response and the average response. Those with a large distance are outliers.
- sort the responses into ascending order and trim off the top x % and bottom y %.
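The last two methods can be sketched as follows. The cut-off of 2 standard deviations and the 10% trimming proportions are assumptions made for this example, not fixed rules, and the values are invented.

```python
from statistics import mean, stdev

def flag_by_distance(values, threshold=2.0):
    """Flag values lying more than `threshold` standard deviations from the mean."""
    m = mean(values)
    s = stdev(values)
    return [v for v in values if abs(v - m) / s > threshold]

def trim(values, bottom_pct, top_pct):
    """Sort ascending, then drop the bottom and top percentages of cases."""
    ordered = sorted(values)
    n = len(ordered)
    drop_low = int(n * bottom_pct / 100)
    drop_high = int(n * top_pct / 100)
    return ordered[drop_low:n - drop_high]

# Made-up responses with one value far from the rest.
values = [12, 14, 15, 13, 16, 14, 15, 13, 14, 95]

print(flag_by_distance(values))  # [95]
print(trim(values, 10, 10))      # [13, 13, 14, 14, 14, 15, 15, 16]
```

Note that trimming removes the lowest value (12) as well as 95, even though 12 is not unusual. Trimming cuts a fixed share of cases whether or not they are genuine outliers.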
Should outliers be removed from the data?
Outliers should be re-checked in case they result from a recording error, scanning problem or keying error. Analysis can be carried out with and without the questionable values to see what impact they have on the results.
The more extreme values give information on how variable the data are, which in some studies, is very important.
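Running the analysis with and without a questionable value might look like this. The data are invented, and treating 95 as the suspect value is an assumption for the example.

```python
from statistics import mean, median

# Made-up responses; 95 is the value under question.
values = [12, 14, 15, 13, 16, 14, 15, 13, 14, 95]
suspect = {95}

without_suspect = [v for v in values if v not in suspect]

mean_with = mean(values)
mean_without = mean(without_suspect)
median_with = median(values)
median_without = median(without_suspect)

print(mean_with, mean_without)      # 22.1 14
print(median_with, median_without)  # 14.0 14
```

Here the mean moves from 22.1 to 14 when the value is excluded, while the median stays at 14, which shows how much one extreme value can drive the result.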
What is the name for a single observation that is inconsistent with the rest of the data for the variable being observed?
An outlier or freak value
What is ‘querying’ a database?
Querying a database refers to retrieving particular data, like filtering out what you need.
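A minimal sketch of a query using Python's built-in sqlite3 module and an in-memory database. The table name, columns and rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical table of coded survey responses.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE responses (id INTEGER PRIMARY KEY, age INTEGER, answer_code INTEGER)"
)
conn.executemany(
    "INSERT INTO responses (age, answer_code) VALUES (?, ?)",
    [(23, 1), (67, 2), (45, 1), (71, 1)],
)

# The query: retrieve only respondents aged 40 or over who gave answer code 1.
rows = conn.execute(
    "SELECT id, age FROM responses WHERE answer_code = ? AND age >= ? ORDER BY id",
    (1, 40),
).fetchall()
conn.close()

print(rows)  # [(3, 45), (4, 71)]
```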
What is weighting?
You can weight the data to attempt to reflect the population more accurately. For example, if you think there were more old respondents than young ones, you can increase the weight of the answers you got from the young people. The drawback is that you could just be scaling up errors if the results you do have are not accurate.
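One simple way to weight, sketched under the assumption that the population shares of each age group are known: give each group a weight equal to its population share divided by its sample share. All figures are invented.

```python
# Assumed population shares and made-up sample results.
population_share = {"young": 0.5, "old": 0.5}
sample_counts = {"young": 20, "old": 80}  # old respondents are over-represented
yes_counts = {"young": 12, "old": 24}     # respondents answering "Yes"

n = sum(sample_counts.values())

# Weight = population share / sample share, so under-represented groups count for more.
weights = {g: population_share[g] / (sample_counts[g] / n) for g in sample_counts}

unweighted = sum(yes_counts.values()) / n
weighted = (
    sum(weights[g] * yes_counts[g] for g in yes_counts)
    / sum(weights[g] * sample_counts[g] for g in sample_counts)
)

print(weights)               # young answers count 2.5 times, old answers 0.625 times
print(round(unweighted, 2))  # 0.36
print(round(weighted, 2))    # 0.45
```

Each young respondent now stands in for 2.5 people, so if those 20 answers are unrepresentative, the error is scaled up with them.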
What is the mean?
An average. It is calculated as the sum of the values divided by the number of cases.
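As a formula, writing the n values as x_1 to x_n (notation assumed here, not taken from the flashcard):

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```

For example, the values 2, 4, 6 and 8 sum to 20, so the mean is 20 / 4 = 5.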