lecture 4 summary Flashcards
statistical data editing
is the process of checking observed data and correcting them if necessary
error localization
determines which values are erroneous
we recognize the following types of errors
Interviewer error: interviewers may not give respondents the correct instructions
Omission: respondents often fail to answer a single question or a section of the questionnaire, either deliberately or inadvertently
Ambiguity: a response might not be legible, or it might be unclear
Inconsistencies: sometimes two responses can be logically inconsistent
Lack of cooperation: in a long questionnaire with hundreds of attitude questions, a respondent might rebel and check the same response in a long list of questions
Ineligible respondent: an inappropriate respondent may be included in the sample (e.g. underage respondents)
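
Several of these error types can be detected automatically with simple edit rules. The following is a minimal sketch, not a standard procedure: the record layout, the field names, and the thresholds `MIN_AGE` and `MIN_MARRIAGE_AGE` are all assumptions made for illustration.

```python
# Hypothetical survey records; field names and values are illustrative only.
records = [
    {"id": 1, "age": 34, "marital_status": "married", "answers": [3, 4, 2, 5]},
    {"id": 2, "age": 12, "marital_status": "married", "answers": [3, 3, 3, 3]},
    {"id": 3, "age": 51, "marital_status": None, "answers": [1, 5, 2, 4]},
]

MIN_AGE = 18           # assumed eligibility threshold
MIN_MARRIAGE_AGE = 16  # assumed threshold for the consistency rule


def find_errors(record):
    """Return the list of error types detected in one record."""
    errors = []
    # Omission: at least one field was left unanswered.
    if any(value is None for value in record.values()):
        errors.append("omission")
    age = record["age"]
    # Ineligible respondent: below the assumed minimum age.
    if age is not None and age < MIN_AGE:
        errors.append("ineligible respondent")
    # Inconsistency: two answers that cannot both be true under the assumed rule.
    if age is not None and age < MIN_MARRIAGE_AGE and record["marital_status"] == "married":
        errors.append("inconsistency")
    # Lack of cooperation: the same response checked for every question.
    if len(set(record["answers"])) == 1:
        errors.append("lack of cooperation")
    return errors


flagged = {record["id"]: find_errors(record) for record in records}
```

Interviewer error and ambiguity usually need human review, so this sketch does not try to detect them.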
Data coding
is specifying how the information should be categorized to facilitate the analysis. The main purpose is to transform the data into a form suitable for the analysis
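
As an illustration, coding can be sketched as a lookup from raw answers to numeric categories. The codebook and the missing-value code `9` below are hypothetical choices, not taken from the lecture.

```python
# Hypothetical codebook for a five-point agreement question.
CODEBOOK = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}
MISSING_CODE = 9  # assumed code for blank or unrecognized answers


def code_response(raw):
    """Map a raw text answer to its numeric code."""
    key = raw.strip().lower() if raw else ""
    return CODEBOOK.get(key, MISSING_CODE)


raw_responses = ["Agree", " neutral ", "", "Strongly Disagree"]
coded = [code_response(r) for r in raw_responses]
```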
Data matching
is the task of identifying, matching and merging records that correspond to the same entities from several databases or even within one database
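
A minimal sketch of exact matching on a normalized key follows. The two small databases and the choice of name plus birth date as the matching key are assumptions for illustration; real matching often needs fuzzy comparison as well.

```python
def match_key(record):
    """Build a normalized key: lower-cased name with collapsed spaces, plus birth date."""
    name = " ".join(record["name"].lower().split())
    return (name, record["birth_date"])


# Two hypothetical databases that partly describe the same person.
db_a = [{"name": "Anna  Smith", "birth_date": "1990-04-01", "email": "anna@example.com"}]
db_b = [
    {"name": "anna smith", "birth_date": "1990-04-01", "city": "Utrecht"},
    {"name": "Bob Jones", "birth_date": "1985-12-24", "city": "Leiden"},
]

# Merge records that share a key; fields from later records overwrite earlier ones.
merged = {}
for record in db_a + db_b:
    merged.setdefault(match_key(record), {}).update(record)

entities = list(merged.values())
```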
Data imputation
is the process of estimating missing data and filling these values into the dataset
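
One simple way to estimate a missing value is the mean of the observed values. Mean imputation is only one of several possible methods and is an assumption here; the income figures are made up.

```python
from statistics import mean

# Hypothetical income variable; None marks a missing value.
incomes = [2500, None, 3100, 2800, None]

# Estimate the missing values from the observed ones.
observed = [value for value in incomes if value is not None]
fill_value = mean(observed)

# Fill the estimate into every gap, leaving observed values unchanged.
imputed = [fill_value if value is None else value for value in incomes]
```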
Data adjusting
refers to the process of enhancing the quality of the data for the data analysis
Weighting
is the procedure by which each observation in the database is assigned a number according to some pre-specified rule
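
As a sketch, the pre-specified rule could be "population share divided by sample share" for the respondent's group, so that under-represented groups count more. This rule and the shares below are assumptions chosen for illustration.

```python
# Assumed known population shares per group.
population_share = {"female": 0.5, "male": 0.5}

# Hypothetical sample in which women are over-represented.
sample = ["female", "female", "female", "male"]

# Share of each group in the sample.
sample_share = {
    group: sample.count(group) / len(sample) for group in population_share
}

# Rule: weight = population share / sample share of the respondent's group.
weights = [population_share[group] / sample_share[group] for group in sample]
```

With this rule the weights sum to the sample size, so the weighted sample keeps its original total while matching the assumed population shares.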
Variable re-specification
is the procedure in which the existing data are modified to create new variables, or in which a large number of variables are reduced into fewer variables
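
Both cases can be sketched in a few lines: creating a new variable from an existing one (an age group from age) and reducing several variables into one (an average of three items). The variable names and cut-offs are hypothetical.

```python
# Hypothetical respondents with an age and three satisfaction items (q1..q3).
respondents = [
    {"age": 23, "q1": 4, "q2": 5, "q3": 3},
    {"age": 67, "q1": 2, "q2": 1, "q3": 3},
]


def respecify(row):
    """Derive an age group and a single satisfaction score from the raw variables."""
    if row["age"] < 35:
        age_group = "young"
    elif row["age"] < 65:
        age_group = "middle"
    else:
        age_group = "senior"
    # Reduce three item variables into one averaged score.
    satisfaction = (row["q1"] + row["q2"] + row["q3"]) / 3
    return {"age_group": age_group, "satisfaction": satisfaction}


respecified = [respecify(row) for row in respondents]
```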
scale transformation
is the procedure of adjusting a scale to ensure comparability with other scales
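
For example, answers on a 1-5 scale and on a 1-10 scale can be made comparable by mapping both to the range 0-1. Linear (min-max) rescaling is one possible transformation, assumed here for illustration; standardizing to z-scores is another common option.

```python
def rescale(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map a value from [old_min, old_max] to [new_min, new_max]."""
    return new_min + (value - old_min) * (new_max - new_min) / (old_max - old_min)


score_on_5_point = rescale(4, 1, 5)    # answer 4 on a 1-5 scale
score_on_10_point = rescale(7, 1, 10)  # answer 7 on a 1-10 scale
```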
The mode
is the value in a measurement series (category) with maximum frequency (multiple mode values are possible)
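
A short sketch of finding the most frequent value with Python's standard library (Python 3.8 or later); the data are made up. `statistics.multimode` returns every value that shares the maximum frequency.

```python
from statistics import mode, multimode

single = [1, 2, 2, 3, 2, 4]
tied = [2, 3, 3, 5, 5, 1]

most_frequent = mode(single)         # one value has the highest frequency
all_most_frequent = multimode(tied)  # two values share the highest frequency
```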
Median
is the value that lies in the middle of a frequency distribution (same number of instances above and below the median)
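
A short sketch using Python's standard library; the data are made up. With an even number of observations, `statistics.median` returns the average of the two middle values.

```python
from statistics import median

odd_count = [7, 1, 5, 3, 9]
even_count = [1, 3, 5, 7]

middle_odd = median(odd_count)    # sorted: 1, 3, 5, 7, 9
middle_even = median(even_count)  # average of the two middle values, 3 and 5
```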
discrete distributions
such as binomial distributions, Poisson distributions, and multinomial distributions
Continuous distributions
such as normal distributions, log-normal distributions, t-distributions and F-distributions
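
The difference between discrete and continuous distributions can be sketched as follows: a discrete distribution assigns probabilities to individual counts, while for a continuous one probabilities are obtained over ranges of values. The parameter values are arbitrary examples, and the helper names `binomial_pmf` and `poisson_pmf` are introduced here for illustration.

```python
from math import comb, exp, factorial
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return exp(-lam) * lam**k / factorial(k)


# Discrete: probability of a single count.
p_two_heads = binomial_pmf(2, 4, 0.5)
p_no_events = poisson_pmf(0, 2.0)

# Continuous: probability of falling below a value (standard normal).
p_below_mean = NormalDist(mu=0, sigma=1).cdf(0)
```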
What a positive correlation and a negative correlation reflect
a positive correlation reflects a tendency for a high value in one variable to be associated with a high value in a second variable.
A negative correlation reflects an association between a high value in one variable and a low value in a second variable
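
Both directions can be illustrated with the Pearson correlation coefficient, sketched below. The choice of Pearson's coefficient and the study-hours data are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)


# Hypothetical data: more study hours, higher score, fewer mistakes.
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 78]
mistakes = [9, 7, 6, 4, 1]

r_positive = pearson(hours, score)     # high hours go with high scores
r_negative = pearson(hours, mistakes)  # high hours go with few mistakes
```

The coefficient lies between -1 and 1; its sign gives the direction of the association.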