Everything (Semester 2) Flashcards
Data Preparation/Pre-processing
the act of cleaning and consolidating raw data prior to using it for analysis
Primary purpose of data preparation
to ensure that the raw data is ready for processing and analysis
Initial step of questionnaire checking
check all questionnaires for completeness and interviewing quality.
Editing
review of the questionnaires with the objective of increasing accuracy and precision.
Ways to edit
Returning to the field, assigning missing values, discarding unsatisfactory respondents
Coding
assigning a code, usually a number, to each possible response to each question
Fixed field codes
codes in which the number of records for each respondent is the same and the same data appear in the same column(s) for all respondents; they are highly desirable.
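A minimal Python sketch of reading fixed-field records, assuming a hypothetical layout in which each respondent occupies one record of equal width: columns 1-3 hold the respondent ID, column 4 a gender code and columns 5-6 the age. Because every respondent's data sit in the same columns, each variable can be sliced out by position.

```python
# Hypothetical fixed-field layout: (variable name, start column, end column), 1-based and inclusive
LAYOUT = [("respondent_id", 1, 3), ("gender", 4, 4), ("age", 5, 6)]

def parse_record(line: str) -> dict[str, str]:
    """Slice one fixed-field record into variables using the layout shared by all respondents."""
    return {name: line[start - 1:end] for name, start, end in LAYOUT}

records = ["001134", "002225", "003141"]
parsed = [parse_record(r) for r in records]
print(parsed[1])  # {'respondent_id': '002', 'gender': '2', 'age': '25'}
```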
Category codes should be
mutually exclusive and collectively exhaustive.
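A small sketch of coding, assuming a hypothetical question ("How often do you shop online?") with four answer categories. Each label maps to exactly one distinct number (mutually exclusive), and any answer without a code raises an error, which exposes a scheme that is not collectively exhaustive.

```python
# Hypothetical coding scheme for "How often do you shop online?"
CODES = {"never": 1, "rarely": 2, "sometimes": 3, "often": 4}

# Mutually exclusive: no two categories share a code
assert len(set(CODES.values())) == len(CODES)

def code_response(response: str) -> int:
    """Return the numeric code for a response; fail loudly if no category covers it."""
    key = response.strip().lower()
    if key not in CODES:
        raise ValueError(f"No code for {response!r}: categories are not collectively exhaustive")
    return CODES[key]

responses = ["Often", "never", "Sometimes"]
print([code_response(r) for r in responses])  # [4, 1, 3]
```

Feeding it an answer such as "daily" raises a `ValueError`, a signal that an extra category (or an "other" code) is needed.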
Codebook
contains coding instructions and the necessary information about variables in the data set.
What does a codebook contain?
-column number
-record number
-variable number
-variable name
-question number
-instructions for coding
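A codebook can be held as a simple table of records with the fields listed above. The sketch below assumes hypothetical variables and codes purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class CodebookEntry:
    """One codebook row with the fields listed above."""
    column_number: str      # e.g. "5-6" for a two-column field
    record_number: int
    variable_number: int
    variable_name: str
    question_number: str
    coding_instructions: str

# Hypothetical entries for a one-record-per-respondent data set
codebook = [
    CodebookEntry("1-3", 1, 1, "respondent_id", "-", "Three-digit ID, zero-padded"),
    CodebookEntry("4", 1, 2, "gender", "Q1", "1 = male, 2 = female, 9 = missing"),
    CodebookEntry("5-6", 1, 3, "age", "Q2", "Age in years; 99 = missing"),
]

for entry in codebook:
    print(f"{entry.variable_number:>2}  col {entry.column_number:<4} "
          f"{entry.variable_name:<14} {entry.coding_instructions}")
```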
Transcribing
transferring the coded data from the questionnaires directly into computers by keypunching or other means.
Consistency checks
identify data that are out of range, logically inconsistent or have extreme values
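A minimal sketch of range and logic checks, assuming hypothetical rules: age must be 18-99, weekly hours online cannot exceed the 168 hours in a week, and a respondent coded as "never shops online" (code 1) should not report any online spend. Detecting extreme values would additionally need the distribution of each variable and is not covered here.

```python
# Hypothetical survey rows
respondents = [
    {"id": 1, "age": 34, "weekly_hours_online": 10, "shop_freq": 3, "online_spend": 120},
    {"id": 2, "age": 150, "weekly_hours_online": 5, "shop_freq": 2, "online_spend": 40},
    {"id": 3, "age": 27, "weekly_hours_online": 200, "shop_freq": 1, "online_spend": 75},
]

def consistency_problems(r: dict) -> list[str]:
    """Return a list of rule violations for one respondent (empty if consistent)."""
    problems = []
    if not 18 <= r["age"] <= 99:
        problems.append("age out of range")
    if r["weekly_hours_online"] > 168:
        problems.append("weekly_hours_online out of range")
    if r["shop_freq"] == 1 and r["online_spend"] > 0:
        problems.append("logically inconsistent: never shops online but reports spend")
    return problems

for r in respondents:
    print(r["id"], consistency_problems(r))
```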
Substitute a neutral value
A neutral value, typically the mean response to the variable, is substituted for the missing responses.
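A minimal sketch of substituting the mean, assuming ratings on a 1-7 scale with `None` marking a missing response.

```python
from statistics import mean

ratings = [5, None, 7, 4, None, 6]  # None marks a missing response

observed = [x for x in ratings if x is not None]
neutral = mean(observed)  # mean of the non-missing responses: 5.5
filled = [neutral if x is None else x for x in ratings]
print(filled)  # [5, 5.5, 7, 4, 5.5, 6]
```

The mean of the filled-in variable stays at 5.5, so substituting it leaves that summary measure unchanged.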
Substitute an Imputed Response
The respondent's pattern of responses to other questions is used to impute or calculate a suitable response to the missing questions.
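One simple way to impute from the pattern of other answers is nearest-neighbour imputation: copy the missing answer from the respondent whose other answers are most similar. This is a swapped-in illustration, not necessarily the method the course intends; the respondents and answers are hypothetical.

```python
# Answers to Q1-Q3 on a 1-7 scale; None marks a missing answer
data = {
    "A": [6, 5, 7],
    "B": [2, 3, 1],
    "C": [6, 6, None],  # Q3 missing
}

def distance(a: list, b: list) -> int:
    """Sum of absolute differences over questions both respondents answered."""
    return sum(abs(x - y) for x, y in zip(a, b) if x is not None and y is not None)

def impute(target: str, question: int) -> int:
    """Copy the answer of the most similar respondent who did answer `question`."""
    donors = [k for k, v in data.items() if k != target and v[question] is not None]
    best = min(donors, key=lambda k: distance(data[target], data[k]))
    return data[best][question]

print(impute("C", 2))  # 7: respondent A is closest to C on Q1 and Q2
```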
Casewise deletion
cases, or respondents, with any missing responses are discarded from the analysis
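A minimal sketch of casewise deletion on a small hypothetical data set, with `None` marking a missing response.

```python
# Each row is one respondent; None marks a missing response
cases = [
    {"q1": 4, "q2": 5, "q3": 3},
    {"q1": 2, "q2": None, "q3": 4},
    {"q1": 5, "q2": 4, "q3": None},
    {"q1": 3, "q2": 3, "q3": 2},
]

# Casewise deletion: keep only respondents with no missing answers at all
complete_cases = [row for row in cases if all(v is not None for v in row.values())]
print(len(complete_cases))  # 2: half of the respondents are dropped
```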
Pairwise deletion
instead of discarding all cases with any missing values, the researcher uses only the cases or respondents with complete responses for each calculation.
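A sketch of pairwise deletion, assuming the calculation is a correlation between each pair of questions on a small hypothetical data set. Each correlation uses every respondent who answered both questions in that pair, so different calculations can rest on different numbers of cases.

```python
from itertools import combinations
from math import sqrt

cases = [
    {"q1": 4, "q2": 5, "q3": 3},
    {"q1": 2, "q2": None, "q3": 4},
    {"q1": 5, "q2": 4, "q3": None},
    {"q1": 3, "q2": 3, "q3": 2},
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

results = {}
for a, b in combinations(["q1", "q2", "q3"], 2):
    # Pairwise deletion: use every respondent who answered BOTH questions in this pair
    pairs = [(row[a], row[b]) for row in cases if row[a] is not None and row[b] is not None]
    xs, ys = zip(*pairs)
    results[(a, b)] = (len(pairs), pearson(xs, ys))
    print(a, b, "n =", len(pairs), "r =", round(results[(a, b)][1], 2))
```

Here the three correlations use 3, 3 and 2 respondents, whereas casewise deletion would leave only 2 respondents for every calculation.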
Categorical/qualitative data
Nominal or ordinal data
Discrete or continuous data
interval-scaled or ratio-scaled data
Measuring
assigning numbers or other symbols to characteristics of objects according to certain pre-specified rules.
Variable
A characteristic of an object that can be measured
Scaling
A criterion against which a characteristic is measured
Levels of measurement, also called scales of measurement
how precisely variables are recorded.
Nominal data
the data can only be categorised
Ordinal
the data can be categorised and ranked
Interval
- the data can be categorised, ranked, and evenly spaced.
- contains all the information of an ordinal scale, but it also allows you to compare the differences between objects.
Ratio
- the data can be categorised, ranked, evenly spaced, and has a natural zero.
- possesses all the properties of the nominal, ordinal, and interval scales and, in addition, an absolute zero point.
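Temperature illustrates the difference between interval and ratio scales: Celsius has an arbitrary zero (interval), while Kelvin has an absolute zero (ratio). The sketch below shows that differences are meaningful on the interval scale but ratios are only meaningful on the ratio scale.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert an interval-scale Celsius reading to the ratio-scale Kelvin."""
    return c + 273.15

warm_c, cool_c = 20.0, 10.0
print(warm_c - cool_c)   # 10.0: differences are meaningful on an interval scale
print(warm_c / cool_c)   # 2.0, yet 20 °C is NOT twice as hot as 10 °C
print(celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c))  # about 1.035: the meaningful ratio
```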
Descriptive Statistics
- branch of statistics used to summarise and describe the characteristics of a dataset.
- Descriptive statistics involves calculating summary measures, such as the mean, median, mode, and range.
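The summary measures named above can be computed with Python's standard `statistics` module; the scores are hypothetical.

```python
from statistics import mean, median, mode

scores = [62, 75, 75, 81, 90, 58, 75, 88]  # hypothetical test scores

summary = {
    "mean": mean(scores),                # 604 / 8 = 75.5
    "median": median(scores),            # middle of the sorted values: 75.0
    "mode": mode(scores),                # most frequent value: 75
    "range": max(scores) - min(scores),  # 90 - 58 = 32
}
print(summary)
```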
Inferential Statistics
- branch of statistics used to make inferences or predictions about a population based on a sample of data.
- Inferential statistics involves using statistical tests, such as hypothesis tests and regression analysis.
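As one example of a hypothesis test, a one-sample t statistic checks whether a sample is consistent with a claimed population mean. This is a sketch with hypothetical data and a hypothetical null value `mu0`; the resulting statistic would then be compared with a critical value from a t-table with n - 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical satisfaction scores from a sample of 10 customers
sample = [7, 8, 6, 9, 7, 8, 7, 6, 8, 9]
mu0 = 7.0  # population mean claimed under the null hypothesis

n = len(sample)
standard_error = stdev(sample) / sqrt(n)  # stdev uses the sample (n - 1) formula
t_stat = (mean(sample) - mu0) / standard_error
print(round(t_stat, 3), "with", n - 1, "degrees of freedom")
```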
Frequency Distribution
- displays the frequency of various outcomes in a sample.
- a summarised grouping of data divided into mutually exclusive classes and the number of occurrences in each class. It's a way of showing unorganised data in a summarised form.
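A minimal sketch of building a frequency distribution from hypothetical ages, using mutually exclusive classes of width 10 and `collections.Counter` to count occurrences in each class.

```python
from collections import Counter

ages = [23, 35, 41, 29, 52, 38, 27, 45, 33, 61, 36, 24]  # hypothetical raw data

# Mutually exclusive classes [lower, upper): each age falls into exactly one class
classes = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]

def class_label(age: int) -> str:
    for lower, upper in classes:
        if lower <= age < upper:
            return f"{lower}-{upper - 1}"
    raise ValueError(f"{age} falls outside every class")

freq = Counter(class_label(a) for a in ages)
for lower, upper in classes:
    label = f"{lower}-{upper - 1}"
    print(f"{label:>6}: {freq[label]}")
```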
Bar charts
Nominal or ordinal variables
Pie Charts
Nominal or ordinal variables
Histogram
Interval or ratio variables
Central Tendency
concentration of the values in the central part of the distribution.
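The three usual measures of central tendency can disagree when a distribution is skewed. A sketch with hypothetical monthly incomes (in thousands), where one extreme value pulls the mean upwards while the median and mode stay in the central part of the data.

```python
from statistics import mean, median, mode

incomes = [2.1, 2.4, 2.4, 2.8, 3.0, 3.3, 25.0]  # the last value is an extreme outlier

print(round(mean(incomes), 2))  # about 5.86: pulled up by the outlier
print(median(incomes))          # 2.8: middle value, unaffected by the outlier
print(mode(incomes))            # 2.4: most frequent value
```

For skewed interval or ratio data the median is therefore often the more representative measure, and the mode is the only one of the three that can be used with nominal data.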