Lecture 4 Flashcards
Statistical data editing
Observed data generally contains errors and missing values. Thus the data must undergo preliminary preparation before it can be analysed
Statistical data editing is the process of checking observed data and, when necessary, correcting them
Essential tasks
Error localisation: determine which values are erroneous
Correction: correct missing and erroneous data in the best possible way
Consistency: adjust values so that all edit rules become satisfied (see the sketch after this list)
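A minimal Python sketch of error localisation with edit rules; the record fields and rules below are hypothetical illustrations, not from the lecture:

```python
# Error localisation: flag the values in each record that violate an
# edit rule. Fields and rules are hypothetical examples.

records = [
    {"age": 34, "occupation": "lawyer", "years_of_schooling": 19},
    {"age": 30, "occupation": "lawyer", "years_of_schooling": 0},     # inconsistent
    {"age": 45, "occupation": "teacher", "years_of_schooling": None}, # missing
]

# Each edit rule returns True when a record satisfies it.
edit_rules = {
    "schooling_present": lambda r: r["years_of_schooling"] is not None,
    "lawyer_schooled": lambda r: r["occupation"] != "lawyer"
    or (r["years_of_schooling"] or 0) >= 12,
}

for i, record in enumerate(records):
    failed = [name for name, rule in edit_rules.items() if not rule(record)]
    if failed:  # these values are candidates for correction
        print(f"record {i} violates: {failed}")
```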
Statistical data editing - why does the data need to be edited?
Interview error: interviewers may not be giving the respondents the correct instructions
Omissions: respondents often fail to answer a single question or a section of the questionnaire, either deliberately or inadvertently
Ambiguity: a response might not be legible or it might be unclear
Inconsistencies: sometimes two responses can be logically inconsistent, e.g. a lawyer may tick a box saying they did not attend school
Lack of cooperation: in a long questionnaire with hundreds of attitude questions, a respondent might rebel and tick the same response for a long list of questions
Ineligible respondent: an inappropriate respondent may be included in the sample (e.g. underage respondents)
Interview error
Interviewers may not be giving the respondents the correct instructions
Omissions
Respondents often fail to answer a single question or a section of the questionnaire, either deliberately or inadvertently
Data techniques to prepare data for model estimation
Data coding
Data matching
Data imputation
Data adjusting
Data coding
Data coding is specifying how the information should be categorised to facilitate the analysis, i.e. transforming the data into a form suitable for analysis
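A minimal Python sketch of data coding, mapping raw survey answers onto numeric categories; the codebook is a hypothetical example:

```python
# Data coding: translate verbal survey responses into numeric codes
# using a predefined (hypothetical) codebook.

responses = ["agree", "disagree", "agree", "neutral"]
codebook = {"disagree": 1, "neutral": 2, "agree": 3}

coded = [codebook[r] for r in responses]
print(coded)  # [3, 1, 3, 2]
```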
Data matching
Data matching is the task of identifying, matching and merging records that correspond to the same entities from several databases or even within one database
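A minimal Python sketch of data matching; joining on a normalised name is a deliberately simplistic stand-in for real record-linkage techniques, and all values are invented:

```python
# Data matching: link records from two sources that refer to the same
# entity. Here the match key is a normalised name (invented data).

customers_a = [{"name": "Anna Smith ", "city": "Berlin"}]
customers_b = [{"name": "anna smith", "revenue": 1200}]

def key(record):
    return record["name"].strip().lower()  # normalise before comparing

index_b = {key(r): r for r in customers_b}
merged = [{**a, **index_b[key(a)]} for a in customers_a if key(a) in index_b]
print(merged)  # [{'name': 'anna smith', 'city': 'Berlin', 'revenue': 1200}]
```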
Data imputation
Data imputation is the process of estimating missing data and filling those values into the data set
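A minimal Python sketch of mean imputation, one simple strategy among many; the values are invented:

```python
# Data imputation: replace missing values (None) with the mean of the
# observed values.

values = [4.0, None, 6.0, None, 5.0]
observed = [v for v in values if v is not None]
mean = sum(observed) / len(observed)  # 5.0

imputed = [v if v is not None else mean for v in values]
print(imputed)  # [4.0, 5.0, 6.0, 5.0, 5.0]
```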
Data adjusting
Data adjusting is a process to enhance the quality of the data for the data analysis (e.g. weighting, variable specification, scale transformation)
Common procedures for statistically adjusting data
Weighting = each observation is assigned a number according to some pre-specified rule, e.g. weighting is used to make the sample data more representative
Variable specification = existing data are modified to create new variables, or a large number of variables are reduced to fewer variables, e.g. six categories are summarised into four categories
Scale transformation = adjust the scale to ensure compatibility with other scales, e.g. some respondents may consistently use the lower end of a rating scale and some the upper end (see the sketch after this list)
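A minimal Python sketch of two of the adjustments above, weighting and a z-score scale transformation; all numbers are hypothetical:

```python
# Weighting: a group makes up 20% of the sample but 40% of the
# population, so each of its observations gets weight 0.40 / 0.20 = 2.0.
sample_share, population_share = 0.20, 0.40
weight = population_share / sample_share

# Scale transformation: z-standardise ratings so respondents who favour
# different ends of the scale become comparable.
ratings = [2, 3, 2, 4, 3]
mean = sum(ratings) / len(ratings)
std = (sum((r - mean) ** 2 for r in ratings) / len(ratings)) ** 0.5
z_scores = [round((r - mean) / std, 2) for r in ratings]

print(weight)    # 2.0
print(z_scores)  # [-1.07, 0.27, -1.07, 1.6, 0.27]
```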
Two main ways we can use data
Two main ways we use data:
Language reflects: text reflects intentions, relationships, context and more
E.g. people tweet about nearby events, brand positioning
Language affects: text affects perceptions, firm outcomes and more
E.g. online chatter increases stock value; narrative reviews are more persuasive than non-narrative reviews
Mode
Mode is the value in a measurement series (category) with maximum frequency (multiple mode values are possible)
Mode has low data requirements (nominal scaling)
Limits of the mode: interpretation is ambiguous if multiple mode values exist + cannot be used for analysis with advanced statistical methods
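A minimal Python sketch of the mode via the standard library; multimode returns every value with maximum frequency, illustrating the ambiguity when several modes exist:

```python
# Mode: statistics.multimode returns all values with maximum frequency,
# which makes the multiple-mode ambiguity explicit.

from statistics import multimode

print(multimode(["red", "blue", "red", "green"]))          # ['red']
print(multimode(["red", "blue", "red", "blue", "green"]))  # ['red', 'blue']
```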
Median
Median is the value that lies in the middle of a frequency distribution (same number of instances above and below the median)
Low data requirements (ordinal scaling) + low sensitivity to outliers
Limits of the median: cannot be used for analysis with advanced statistical methods
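A minimal Python sketch of the median, showing its low sensitivity to an outlier:

```python
# Median: the middle value of the sorted series; a single outlier (99)
# leaves it unchanged.

from statistics import median

print(median([1, 2, 3, 4, 5]))   # 3
print(median([1, 2, 3, 4, 99]))  # 3, the outlier has no effect
```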
Mean
Mean is the most popular location parameter + the basis for many advanced statistical analyses (t-test, analysis of variance)
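A minimal Python sketch of the mean; unlike the median above, a single outlier shifts it noticeably:

```python
# Mean: the arithmetic average; a single outlier (99) pulls it sharply.

from statistics import mean

print(mean([1, 2, 3, 4, 5]))   # 3
print(mean([1, 2, 3, 4, 99]))  # 21.8
```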