Lecture 3 Flashcards
Data coding
Specifying how the information should be categorized to
facilitate the analysis. The main purpose is to transform the data into a
form suitable for the analysis
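A minimal sketch of data coding, assuming a hypothetical three-point satisfaction question. The codebook, the function name `code_responses`, and the sample answers are illustrative assumptions, not part of the lecture:

```python
# Hypothetical codebook: text answers are mapped to numeric codes.
CODEBOOK = {"dissatisfied": 1, "neutral": 2, "satisfied": 3}

def code_responses(responses, codebook=CODEBOOK, missing_code=None):
    """Map raw text answers to numeric codes; unknown answers get missing_code."""
    return [codebook.get(r.strip().lower(), missing_code) for r in responses]

raw = ["Satisfied", "neutral ", "DISSATISFIED", "no idea"]
coded = code_responses(raw)
print(coded)  # [3, 2, 1, None]
```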
Data matching
Task of identifying, matching and merging records that
correspond to the same entities from several databases or even within
one database
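A minimal sketch of data matching, assuming two hypothetical sources (`crm` and `orders`) that share an e-mail field. Matching on a normalized e-mail address is an illustrative choice; real matching often needs fuzzier rules:

```python
def normalize(email):
    """Normalize the matching key so trivially different spellings still match."""
    return email.strip().lower()

def match_records(crm, orders):
    """Merge records from both sources that share the same normalized e-mail."""
    by_email = {normalize(r["email"]): r for r in crm}
    merged = []
    for o in orders:
        key = normalize(o["email"])
        if key in by_email:
            # Combine the fields of both records into one merged record.
            merged.append({**by_email[key], **o, "email": key})
    return merged

# Hypothetical example data.
crm = [{"email": "Ann@Example.com", "name": "Ann"},
       {"email": "bob@example.com", "name": "Bob"}]
orders = [{"email": " ann@example.com", "order_total": 40},
          {"email": "carl@example.com", "order_total": 15}]

merged = match_records(crm, orders)
print(merged)
```

Only Ann appears in both sources, so only her record is merged.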
Data imputation
Process of estimating missing data and filling these
values into the data set
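A minimal sketch of one simple imputation strategy, mean imputation. The choice of the mean as the estimate and the name `impute_mean` are assumptions for illustration; other estimates (median, model-based) are equally possible:

```python
from statistics import mean

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

ages = [20, None, 30, 40, None]   # hypothetical data with two missing values
imputed = impute_mean(ages)
print(imputed)  # [20, 30, 30, 40, 30]
```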
Data adjusting
Process to enhance the quality of the data for the data
analysis (e.g., weighting, variable respecification, scale transformation)
Weighting
Procedure by which each observation (e.g. a consumer response) in the database is
assigned a number (weight) according to some pre-specified rule
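A minimal sketch of one possible weighting rule: each observation gets the ratio of its group's population share to its sample share. The rule, the group labels, and the shares are assumptions chosen for illustration:

```python
def weights_by_group(sample_groups, population_share):
    """Weight = population share of the group / sample share of the group."""
    n = len(sample_groups)
    sample_share = {g: sample_groups.count(g) / n for g in set(sample_groups)}
    return [population_share[g] / sample_share[g] for g in sample_groups]

# Hypothetical sample: 75% female, 25% male; assumed population: 50% / 50%.
groups = ["f", "f", "f", "m"]
scores = [4, 4, 4, 2]

weights = weights_by_group(groups, {"f": 0.5, "m": 0.5})
unweighted_mean = sum(scores) / len(scores)
weighted_mean = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
print(unweighted_mean, round(weighted_mean, 2))
```

The overrepresented group is weighted down, so the weighted mean moves toward the answer of the underrepresented group.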
Variable respecification
Procedure in which the existing data are modified to create new variables, or in which a large number of
variables are reduced into fewer variables
For example, six categories are collapsed into four categories
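A minimal sketch of the six-to-four example, assuming a hypothetical variable coded 1 to 6. Which categories are merged is an arbitrary choice made here for illustration:

```python
# Hypothetical mapping: categories 1+2 and 5+6 are merged, 3 and 4 are kept.
COLLAPSE = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 4}

def respecify(values, mapping=COLLAPSE):
    """Create a new variable by mapping each old category to a new one."""
    return [mapping[v] for v in values]

original = [1, 2, 3, 4, 5, 6, 6]
collapsed = respecify(original)
print(collapsed)  # [1, 1, 2, 3, 4, 4, 4]
```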
Scale transformation
Procedure to adjust the scale to ensure comparability with other scales, e.g. converting grades between the grading systems of different countries
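A minimal sketch of a scale transformation, assuming a simple linear rescaling between two ranges. The function `rescale` and the example ranges are hypothetical; official grade conversions between countries usually follow their own rules:

```python
def rescale(x, old_min, old_max, new_min=0.0, new_max=100.0):
    """Linearly map x from [old_min, old_max] to [new_min, new_max]."""
    return new_min + (x - old_min) * (new_max - new_min) / (old_max - old_min)

# A score of 15 on a hypothetical 0-20 scale, expressed on a 0-100 scale.
print(rescale(15, 0, 20))            # 75.0
# A score of 7 on a hypothetical 1-10 scale, expressed on a 0-100 scale.
print(round(rescale(7, 1, 10), 1))   # 66.7
```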
How to identify causal relationships
Evidence for a strong association (e.g. correlation) between two variables.
Evidence that no rival explanation (e.g. another correlated variable) exists for the observed association of the variables.
A change in the cause variable precedes the change in the result variable (e.g. through a time lag).
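A minimal sketch of the first criterion only, measuring association with the Pearson correlation coefficient. The data are made up, and a high correlation alone does not establish causality; the other two criteria still have to be checked:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical data: advertising spend and sales in five periods.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]
r = pearson(ad_spend, sales)
print(round(r, 3))  # 1.0
```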
Experimental group
Test subjects who are exposed to the experimental stimulus, e.g. a new
advertisement
Control group
Test subjects who are not exposed to the experimental stimulus
Randomizing
Random assignment of test subjects to experimental / control groups
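A minimal sketch of randomizing, assuming ten hypothetical test subjects identified by number. The function name `randomize` and the even split are illustrative choices:

```python
import random

def randomize(subjects, seed=None):
    """Randomly split subjects into an experimental and a control group."""
    rng = random.Random(seed)      # a fixed seed makes the split reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

experimental, control = randomize(range(1, 11), seed=42)
print(experimental, control)
```

Every subject ends up in exactly one group, and chance alone decides which one.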
Matching
Test subjects in experimental and control groups share specific criteria (e.g.
gender, age)
Stimulus
Variation of a variable that should trigger a behavioral reaction in people (e.g.
response to price changes)
Entity extraction
Which words do people write about?
Topic Modelling
What topics do people write about?
Examples: topics in movie reviews, motivations to host on Airbnb
Sentiment analysis
How positive/negative is the text?
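A minimal sketch of a lexicon-based sentiment score, assuming two tiny hypothetical word lists. Real sentiment analysis uses much larger lexicons or trained models; this only illustrates the idea of placing a text on a positive/negative scale:

```python
# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "boring", "hate"}

def sentiment_score(text):
    """Return a score in [-1, 1]: -1 is fully negative, +1 is fully positive."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

print(sentiment_score("I love it!"))  # 1.0
print(round(sentiment_score("Great cast, but the plot was boring and bad."), 2))  # -0.33
```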
Relationships between entities (words)
How do words relate to each other? (e.g. What side effects are mentioned with the drug?)
Writing style
What is the writing style of the text? (e.g. identifying personality traits of social media users)
Mode
Low data requirements
Intuitive understanding
Ambiguous if multiple mode values exist
Cannot be used with advanced statistical methods
Median
Low data requirements
Low sensitivity to outliers
Cannot be used with advanced statistical methods
Mean
Most popular location parameter
Basis for many advanced statistical analyses
Sensitive to outliers
High scale requirements (interval scaling)
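A minimal sketch comparing the three location parameters with Python's `statistics` module. The income figures are made up; the point is the different reaction to an outlier and the ambiguity of the mode:

```python
from statistics import mean, median, multimode

incomes = [20, 22, 22, 25, 30]          # hypothetical data
with_outlier = incomes + [500]          # same data plus one extreme value

print(mean(incomes), median(incomes), multimode(incomes))
# 23.8 22 [22]

# The mean reacts strongly to the outlier, the median barely moves.
print(round(mean(with_outlier), 1), median(with_outlier))
# 103.2 23.5

# With several equally frequent values the mode is ambiguous.
print(multimode([1, 1, 2, 2]))  # [1, 2]
```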
What is causality?
Causality means that a change in variable X causes a change in variable Y. We need to consider control variables to claim causality
How to identify causal relationships?
Evidence for strong association between two variables
A change in the cause variable precedes the change in the result variable
Evidence that no rival explanation exists for the observed association of the variables