lecture 4 Flashcards
overview of brand perception data
several market research companies track the perception of firms and brands
This includes variables such as attitudes towards the firm, customer satisfaction, reputation, quality perceptions, etc.
Indicates how strong a brand is in the hearts and minds of consumers
Some potential limitations (brand perception data)
No actual purchase behaviour (e.g. sales)
Response bias
Sampling bias
Overview of stock return data
Refers to the current price at which a share of stock is trading on the market
A company's stock price reflects investors' perception of its ability to earn and grow its profits in the future
Issues within and outside of a company may cause a stock price to move in either direction
Stock price data is available for free (e.g. yahoofinance.com)
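A minimal sketch of how stock prices can be turned into stock returns, assuming the simple (period-over-period) return definition. The prices and the function name simple_returns are made up for illustration and do not come from any real data source.

```python
def simple_returns(prices):
    """Simple return per period: price_t / price_(t-1) - 1."""
    return [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]

# Hypothetical closing prices for three consecutive trading days.
closing_prices = [100.0, 102.0, 99.96]

returns = simple_returns(closing_prices)
print([round(r, 4) for r in returns])
```

A series of n prices yields n - 1 returns, because the first price has no predecessor to compare against.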
Some potential limitations (stock price data)
Sampling bias (i.e. only available for traded companies)
Only focus on investors
Text data
Huge amounts of text data: online reviews, social media posts, texts, customer service calls, open-ended survey questions, firm annual reports, advertisements, newspaper articles, movie scripts, song lyrics, etc.
Text data: textual data goes beyond social media
firm to firm
consumer to consumer
society to society
Cleaning big data
Most time-consuming and least enjoyable data science task, survey says
What data scientists spend the most time doing
Building training sets 3%
Cleaning and organizing data 60%
Collecting data sets 19%
Mining data for patterns 9%
Refining algorithms 4%
other 5%
Statistical data editing
Observed data generally contain errors and missing values, so they must undergo preliminary preparation before they can be analyzed
Process of checking observed data, and, when necessary, correcting them
Essential tasks (statistical data editing)
Error localization: determine which values are erroneous
Correction: correct missing and erroneous data in the best possible way
Consistency: adjust values such that all edits become satisfied
Interviewer error
interviewers may not be giving the respondents the correct instructions
Omissions
respondents often fail to answer a single question or a section of the questionnaire, either deliberately or inadvertently
Ambiguity
a response might not be legible or it might be unclear
Inconsistencies
Sometimes two responses can be logically inconsistent. For example, a respondent who is a lawyer may have checked a box indicating that he or she did not complete high school
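A minimal sketch of how such a logical inconsistency could be located automatically. The field names, the sample records and the edit rule (the set REQUIRES_HIGH_SCHOOL) are assumptions made for illustration only.

```python
# Hypothetical survey records; field names are illustrative assumptions.
respondents = [
    {"id": 1, "occupation": "lawyer", "completed_high_school": False},
    {"id": 2, "occupation": "lawyer", "completed_high_school": True},
    {"id": 3, "occupation": "cashier", "completed_high_school": False},
]

# Assumed edit rule: these occupations imply a completed high school education.
REQUIRES_HIGH_SCHOOL = {"lawyer", "physician", "teacher"}

def find_inconsistent(records):
    """Return the ids of records that violate the edit rule."""
    return [
        r["id"]
        for r in records
        if r["occupation"] in REQUIRES_HIGH_SCHOOL
        and not r["completed_high_school"]
    ]

inconsistent_ids = find_inconsistent(respondents)
print(inconsistent_ids)
```

The check only localizes the error; deciding which of the two answers to correct is a separate step.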
Lack of cooperation
In a long questionnaire with hundreds of attitude questions, a respondent might rebel and check the same response in a long list of questions
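One possible way to flag this pattern (often called straight-lining) is to check whether a long block of answers contains only one distinct value. The threshold min_items and the sample answers are hypothetical.

```python
def is_straight_lining(answers, min_items=10):
    """True if a sufficiently long block of answers uses a single response."""
    return len(answers) >= min_items and len(set(answers)) == 1

# Hypothetical answers to twelve attitude questions on a 1-5 scale.
responses = {
    "r1": [3] * 12,
    "r2": [1, 4, 2, 5, 3, 3, 2, 4, 1, 5, 2, 3],
}

flagged = [rid for rid, answers in responses.items() if is_straight_lining(answers)]
print(flagged)
```

A flagged respondent is only a candidate for removal; identical answers can occasionally be genuine.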
Ineligible respondent
An inappropriate respondent may be included in the sample (e.g. underage respondents)
data coding
specifying how the information should be categorized to facilitate the analysis. The main purpose is to transform the data into a form suitable for the analysis
Data matching
task of identifying, matching and merging records that correspond to the same entities from several databases or even within one database
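A rough sketch of data matching on a normalized name key. The two record lists, their field names and the normalization rule are assumptions; real matching tasks usually need fuzzier comparison than this.

```python
def normalize(name):
    """Lowercase, drop periods and commas, collapse repeated whitespace."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())

# Two hypothetical databases describing partly the same firms.
crm = [
    {"name": "ACME Corp.", "segment": "B2B"},
    {"name": "Globex, Inc.", "segment": "B2C"},
]
billing = [
    {"name": "acme  corp", "revenue": 1200},
    {"name": "Initech", "revenue": 300},
]

# Merge records that share the same normalized key into one entity.
merged = {}
for source in (crm, billing):
    for record in source:
        key = normalize(record["name"])
        entity = merged.setdefault(key, {})
        entity.update({k: v for k, v in record.items() if k != "name"})

print(merged)
```

Here "ACME Corp." and "acme  corp" end up as one entity carrying both the segment and the revenue field.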
Data imputation
Process of estimating missing data and filling these values into the data set
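A minimal sketch of one simple imputation strategy, mean imputation, assuming missing values are stored as None. It is only one of several possible strategies, and the ratings are made up.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical satisfaction ratings with two missing answers.
satisfaction = [4, None, 5, 3, None]

imputed = impute_mean(satisfaction)
print(imputed)
```

Mean imputation keeps the sample mean unchanged but shrinks the variance, which is worth keeping in mind before running further analyses.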
Data adjusting
Process to enhance the quality of the data for the data analysis (e.g. weighting, variable respecification, scale transformation)
Common procedures for statistically adjusting data
Weighting: Procedure by which each observation (e.g. consumer responses) in the database is assigned a number according to some pre-specified rule
For example, Weighting is used to make the sample data more representative
Variable respecification: procedure in which the existing data are modified to create new variables or in which a large number of variables are reduced to fewer variables
For example, six categories are summarized into four categories
Scale transformation: procedure to adjust the scale to ensure comparability with other scales
For example, some respondents (e.g. from different cultures) may consistently use the lower end of the rating scale, whereas others may consistently use the upper end. These differences can be corrected for
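The three adjustment procedures can be sketched as follows. Everything here is an illustrative assumption: the weighting rule (population share divided by sample share), the six-to-four category mapping, and within-respondent standardization as the scale transformation.

```python
from collections import Counter
from statistics import mean, pstdev

# Weighting (assumed rule: population share / sample share).
sample_gender = ["f"] * 8 + ["m"] * 2        # hypothetical sample, 80% women
population_share = {"f": 0.5, "m": 0.5}      # assumed known population shares
counts = Counter(sample_gender)
n = len(sample_gender)
weights = {g: population_share[g] / (counts[g] / n) for g in counts}

# Variable respecification: six categories summarized into four (assumed mapping).
recode = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 4}
recoded = [recode[c] for c in [1, 2, 3, 4, 5, 6]]

# Scale transformation: standardize each respondent's ratings.
# Note: this fails for a respondent whose ratings are all identical (spread 0).
def standardize(ratings):
    m, s = mean(ratings), pstdev(ratings)
    return [(r - m) / s for r in ratings]

low_user = standardize([1, 2, 3])     # consistently uses the lower end
high_user = standardize([5, 6, 7])    # consistently uses the upper end

print(weights, recoded, low_user, high_user)
```

After standardization the low-end and high-end respondents have identical scores, because only their relative ratings remain.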
two main ways we can use text data
Language REFLECTS
Text reflects intentions, actions, relationships, context and more
E.g. people tweet about near vs. far events with different degrees of concreteness
Brand positioning maps
Customer service that uses “I” vs “we” can have greater impact on customer satisfaction
Language AFFECTS
Text affects perceptions, firm outcomes and more
E.g. online chatter increases stock value
Narrative reviews are more persuasive than non-narrative reviews
Frames impact implicit attitudes about consumption practice
types of scales and informative statistics location parameter (dispersion parameter)
nominal (mode)
Ordinal (median, mode)
Interval + ratio (mean, median, mode) + (variance and SD)
Mode
value in a measurement series (category) with maximum frequency
Median
value that lies in the middle of a frequency distribution
mode meaning and limits
low data requirements (nominal scaling) + intuitive understanding
Limits: ambiguous in interpretation if multiple mode values exist
Cannot be used for analysis with advanced statistical methods
Median meaning and limits
low data requirements (ordinal scale) + low sensitivity to outliers
Limits: cannot be used for analysis with advanced statistical methods
mean (meaning and limits)
most popular location parameter
basis for many advanced statistical analyses (t-test, variance analysis etc)
limits: sensitive to outliers
high scale requirements (interval scaling)
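A short sketch comparing the three location parameters with Python's statistics module. The ratings and the outlier value are made up for illustration.

```python
from statistics import mean, median, mode, multimode

ratings = [2, 3, 3, 4, 5]          # hypothetical ratings
with_outlier = ratings + [50]      # same data plus one extreme value

print(mean(ratings), median(ratings), mode(ratings))
print(mean(with_outlier), median(with_outlier), mode(with_outlier))

# Ambiguity of the mode: two values share the maximum frequency.
tied_modes = multimode([1, 1, 2, 2])
print(tied_modes)
```

The single outlier moves the mean from 3.4 to roughly 11.2, while the median only shifts from 3 to 3.5 and the mode stays at 3.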
discrete distribution
Name: binomial distribution, Poisson distribution, multinomial distribution
E.g. Customer retention rate, frequency of purchase, brand selection probability
Continuous distribution
name: normal distribution, log-normal distribution, chi-squared (χ²) distribution, t-distribution
e.g. image ratings, scales, special distribution in inferential statistics
Empirical distribution
Many features exhibit normal distribution in reality (e.g. body size)
Distribution model for statistical parameters
statistical parameters such as mean and variance exhibit normal distribution upon multiple sampling
mathematical base distribution
Distributions in inferential statistics are derived from normal distribution
Distribution in error theory
random errors in repeated measurements exhibit normal distribution
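The claim that sample means become approximately normally distributed under repeated sampling can be explored with a small simulation. This is a sketch under assumptions: the population is exponential with mean 1 and standard deviation 1, and the seed, sample size and number of samples are arbitrary choices.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)   # fixed seed so the run is repeatable
SAMPLE_SIZE = 50
NUM_SAMPLES = 2000

# Draw many samples from a clearly non-normal population and keep each mean.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

grand_mean = mean(sample_means)
spread = stdev(sample_means)
expected_spread = 1.0 / SAMPLE_SIZE ** 0.5   # sigma / sqrt(n)

print(round(grand_mean, 3), round(spread, 3), round(expected_spread, 3))
```

The sample means cluster around the population mean with a spread close to sigma / sqrt(n), even though the underlying population is skewed.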
explanatory power and limits of correlation analysis
measurement of the linear association strength between two metrically scaled variables
Direction of the correlation is visible
Values are comparable across different variables because the coefficient is restricted to the interval [-1, +1]
No dependence on the sample size
Strength of the correlation in the sense of the explained variance can be identified (r^2 = explained variance)
prerequisite for statistical verification of (linear) causal relationships
Limits: only linear correlations can be depicted
no sufficient evidence for the presence of a causal relationship
strength of the correlation in the sense of a leverage effect cannot be identified
Spurious association possible if background variables are not controlled for (-> partial correlation coefficient as workaround)
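A minimal sketch of the Pearson correlation coefficient and of the first-order partial correlation used to control for one background variable z. The data and the coefficient values passed to partial_r are made up; the function names are illustrative.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after controlling for z (first-order)."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Perfect linear association: r = 1, so r^2 = 1 (all variance explained).
r_linear = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])

# Perfect but non-linear (quadratic) association: r = 0.
r_quadratic = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

# Hypothetical coefficients where z fully accounts for the x-y association.
r_controlled = partial_r(r_xy=0.64, r_xz=0.8, r_yz=0.8)

print(r_linear, r_linear ** 2, r_quadratic, round(r_controlled, 6))
```

The quadratic case illustrates the first limit: a strong relationship can exist while r is zero, because r only captures linear association.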
How to identify causal relationships
1) evidence for a strong association (e.g. correlation) between two variables
2) Changing of the cause variable precedes changing of the result variable (e.g. through a time lag)
3) Evidence that no rival explanation (other correlated parameter) exists for the observed association between the variables
Experiments establish (the best) conditions that make it possible to determine causal relationships
Features of an experiment
1) formulate a causal relationship
2) evaluation of the directional influence of one or more independent variables on one or more dependent variables
(definition of the independent variables to be manipulated, definition of the dependent variables to be measured, definition of the variation steps (manipulation) of the independent variables)
3) controlling of all disturbing influences (control variables) to exclude distortion of the results
(selection of the test subjects and assignment to the groups, controlling of the selection bias, minimizing the influence of other external variables)
experimental group
test subjects who are exposed to the experimental stimulus, e.g. a new advertisement
Control group
test subjects who are not exposed to the experimental stimulus
Randomizing
random assignment of test subjects to experimental/ control groups
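Random assignment can be sketched as a shuffle followed by an even split. The subject ids, the seed and the equal group sizes are assumptions made for illustration.

```python
import random

def randomize(subjects, seed=None):
    """Shuffle the subjects and split them into two equally sized groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)      # copy so the input list stays untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical pool of twenty test subjects.
subjects = [f"S{i:02d}" for i in range(1, 21)]

experimental, control = randomize(subjects, seed=7)
print(experimental)
print(control)
```

Because assignment depends only on chance, systematic differences between the groups (selection bias) become unlikely as the groups grow.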
Matching
test subjects in experimental and control groups share specific criteria (e.g. gender, age)
Stimulus
variation of a variable that should trigger a behavioral reaction in people (e.g. response to price changes)
Lab experiment
performance of the experiment in an artificial laboratory environment
Test subjects are aware that they are participating in a test
Advantages: higher internal validity because stimuli can be more effectively manipulated and external factors better controlled, lower costs
Disadvantages: test subjects do not react as in a natural environment, making generalizations and predictions of the effect difficult
Lower external validity
Field experiment
Performance of the experiment in a natural environment
Test subjects are not aware that they are part of an experiment
e.g. introduction of a new sales promotion plan to retailers
Advantages: higher external validity because test subjects are acting under real conditions
Easier to predict and generalize the effect
Disadvantages:
cost intensive
Activities visible to competitors
less manipulation freedom (e.g. limits to changes in the price)
More difficult to control extraneous factors