Lecture 7 Flashcards
Regression basics video Equation
Y = a(intercept) + b(slope) * X
Intercept: a
Slope: b
The independent variable: X
The dependent variable: Y
R^2 =
How much of the variation in the dependent variable is explained by the model?
R^2 = 0.9033 = the model explains 90.33% of the variation in the number of comments
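A minimal sketch of how a, b and R^2 can be computed for Y = a + b * X with ordinary least squares. The data (day numbers and comment counts) and the function name `fit_simple_ols` are made up for illustration and do not come from the lecture.

```python
def fit_simple_ols(x, y):
    """Fit y = a + b * x by ordinary least squares; return (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: covariation of x and y divided by the variation in x.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    # R^2 = 1 - (unexplained variation / total variation).
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot


# Hypothetical data: day number and number of comments on that day.
days = [1, 2, 3, 4, 5, 6]
comments = [12, 15, 19, 20, 26, 28]

a, b, r2 = fit_simple_ols(days, comments)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}, R^2 = {r2:.4f}")
```

Here b is read as the change in the number of comments per extra day, and R^2 as the share of the variation in comments that the fitted line explains.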
Concepts:
1. Slope
A one-unit change in X leads to a change of β units in Y
Ex. With every new day (or whatever the time unit is), the number of comments changes by β
Concepts:
2. Intercept
Average value of Y when X=0
Intercept interpretation not always possible (if the value 0 does not make sense for X)
Dependent variable: Y. Independent variable: X. Interpretation of β (slope) =
A one-unit change in X leads to a change of β units in Y
Dependent variable: Y. Independent variable: log(X). Interpretation of β (slope) =
A 1% change in X leads to a change of approximately β/100 units in Y
Dependent variable: log(Y). Independent variable: X. Interpretation of β (slope) =
A one-unit change in X leads to a change of approximately (100 × β)% in Y
Dependent variable: log(Y). Independent variable: log(X). Interpretation of β (slope) =
A 1% change in X leads to a change of approximately β% in Y (β is an elasticity)
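The log interpretations can be checked numerically. The sketch below assumes made-up values for the slope, intercept and starting value of X (none of them come from the lecture) and compares the exact change in Y with the rule of thumb for each model. With the raw coefficient, the change is about β/100 units in the linear-log model, about 100 × β percent in the log-linear model, and about β percent in the log-log model. These are approximations that work well when the changes are small.

```python
import math

beta = 0.05   # hypothetical slope, chosen only for illustration
alpha = 2.0   # hypothetical intercept
x = 10.0      # hypothetical starting value of X

# Linear-log model: Y = alpha + beta * ln(X). Raise X by 1%.
y0 = alpha + beta * math.log(x)
y1 = alpha + beta * math.log(x * 1.01)
lin_log_change = y1 - y0              # close to beta / 100 units

# Log-linear model: ln(Y) = alpha + beta * X. Raise X by one unit.
y0 = math.exp(alpha + beta * x)
y1 = math.exp(alpha + beta * (x + 1))
log_lin_pct = 100 * (y1 - y0) / y0    # close to 100 * beta percent

# Log-log model: ln(Y) = alpha + beta * ln(X). Raise X by 1%.
y0 = math.exp(alpha + beta * math.log(x))
y1 = math.exp(alpha + beta * math.log(x * 1.01))
log_log_pct = 100 * (y1 - y0) / y0    # close to beta percent

print(f"linear-log: {lin_log_change:.6f} units (rule of thumb {beta / 100:.6f})")
print(f"log-linear: {log_lin_pct:.3f}% (rule of thumb {100 * beta:.3f}%)")
print(f"log-log:    {log_log_pct:.4f}% (rule of thumb {beta:.4f}%)")
```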
What to look at in a linear regression model? 2X
- Is the coefficient for the number of website visits statistically significant? A coefficient is statistically significant when its p-value < 0.05 (5% is a standard significance level: we are willing to accept a 5% probability of concluding that there is an effect when in fact there is none)
- Is there an effect? If yes, what is the direction (+/-) and magnitude of the effect on purchase value?
If no, is there some potential explanation?
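The two checks above can be sketched in code for a simple regression of purchase value on website visits. The data and the function name `slope_significance` are hypothetical. The p-value uses a normal approximation to the t distribution, which is an assumption made to keep the example dependency-free. Statistical software uses the exact t distribution, which gives somewhat larger p-values in small samples.

```python
import math
from statistics import NormalDist


def slope_significance(x, y):
    """Return (slope, standard error, t statistic, approximate two-sided p-value)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / s_xx
    a = mean_y - b * mean_x
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(ss_res / (n - 2) / s_xx)
    t = b / se_b
    # ASSUMPTION: normal approximation instead of the exact t distribution.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return b, se_b, t, p


# Hypothetical data: website visits and purchase value for 12 customers.
visits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
purchase_value = [22, 19, 27, 30, 28, 35, 33, 41, 38, 45, 47, 44]

b, se_b, t, p = slope_significance(visits, purchase_value)
significant = p < 0.05
direction = "+" if b > 0 else "-"
print(f"slope = {b:.2f} (direction {direction}), t = {t:.2f}, significant: {significant}")
```

If the coefficient is significant, its sign gives the direction and its size the magnitude: here, the change in purchase value per additional website visit.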
Regression model workflow (5 steps)
1. Model specification based on theory and logic
- Which variables to include?
- Possible interactions?
2. Estimate parameters using software
3. Interpret coefficients (significant coefficients only)
- Direction and magnitude
4. Evaluate model
- Overall model significance
- Model fit
5. Use model for prediction
Model specification: linear regression
Which variables to include?
5X
- Marketing variables
- Customer characteristics
- Product characteristics
- Competitor activity
- Seasonality
What leads to biased results?
Omitting relevant variables or incorrectly specifying the relationship between variables
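A small illustration of omitted-variable bias, under made-up assumptions: Y is generated without noise as 1 + 2 * X1 + 3 * X2, and X2 is correlated with X1. Regressing Y on X1 alone gives a slope well above the true value of 2, because X1 picks up part of the effect of the omitted X2. All numbers and names are hypothetical.

```python
def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    return s_xy / s_xx


# Hypothetical data: x2 moves together with x1 (positive correlation).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
true_b1, true_b2 = 2.0, 3.0
y = [1.0 + true_b1 * a + true_b2 * b for a, b in zip(x1, x2)]

# Model that wrongly omits x2: the slope on x1 absorbs part of x2's effect.
biased_b1 = ols_slope(x1, y)

# The bias equals true_b2 times the slope of x2 regressed on x1.
expected_bias = true_b2 * ols_slope(x1, x2)

print(f"true effect of x1: {true_b1}, estimate when x2 is omitted: {biased_b1:.3f}")
```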
ProductSales_i =
β_0 + β_1 VolumeOwned_i + β_2 VolumeEarned_i
Natural logarithm (ln) transformations: ln(ProductSales_i)
ln(ProductSales_i) = β_0 + β_1 ln(VolumeOwned_i) + β_2 ln(VolumeEarned_i)
Earned volume =
Number of comments from consumers on social media
Owned volume =
Number of posts made by the brand on its social media
How to obtain coefficients that are interpreted as elasticities?
Apply natural logarithm (ln) transformations to both the dependent variable and the independent variables (log-log model)
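Why the coefficients of the log-log model are elasticities can be seen from a short derivation. Here Y is shorthand for ProductSales_i and X for one of the volume variables. The shorthand is introduced only for this sketch.

```latex
\ln Y = \beta_0 + \beta_1 \ln X
\quad\Longrightarrow\quad
\beta_1 = \frac{\partial \ln Y}{\partial \ln X}
        = \frac{\partial Y / Y}{\partial X / X}
```

The last ratio is the relative change in Y divided by the relative change in X, which is the definition of an elasticity. So β_1 gives the approximate percentage change in Y for a 1% change in X.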
Interaction effects in regression models
Does the effect of earned social media volume on product sales depend on the volume of owned social media?
In other words: does the volume of owned social media influence the effect of earned social media volume on product sales?
Write the equation
Y_i: product sales for product i
X_(1,i): earned social media volume for product i
X_(2,i): owned social media volume for product i
Y_i = β_0 + β_1 X_(1,i) + β_2 X_(2,i) + β_3 X_(1,i) × X_(2,i)
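In this model the effect of earned volume on sales is β_1 + β_3 × X_(2,i), so it depends on the level of owned volume whenever β_3 is not zero. The sketch below fits the interaction model by least squares on made-up, noise-free data generated from known coefficients, then evaluates that effect at two levels of owned volume. The data, the helper names and the use of the normal equations are illustrative assumptions. In practice, statistical software would be used for estimation.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs stay untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data, generated without noise from known coefficients.
earned = [1, 2, 3, 4, 5, 6, 7, 8]
owned = [2, 5, 1, 4, 3, 6, 2, 5]
true_beta = [10.0, 2.0, 1.5, 0.5]
sales = [true_beta[0] + true_beta[1] * e + true_beta[2] * o + true_beta[3] * e * o
         for e, o in zip(earned, owned)]

# Design matrix columns: constant, earned, owned, earned x owned.
rows = [[1.0, e, o, e * o] for e, o in zip(earned, owned)]
b0, b1, b2, b3 = fit_ols(rows, sales)


def effect_of_earned(owned_level):
    """Change in sales per extra unit of earned volume at a given owned level."""
    return b1 + b3 * owned_level


print(f"effect of earned volume at owned = 1: {effect_of_earned(1):.2f}")
print(f"effect of earned volume at owned = 5: {effect_of_earned(5):.2f}")
```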
Model specification: usual steps
Why not put everything we can think of in a model?
We need to make choices about which variables to include, because a large number of parameters requires a large sample size to produce reliable estimates of the coefficients and standard errors (rule of thumb: at least 10 observations for each parameter that needs to be estimated).
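The rule of thumb can be turned into a quick check. A minimal sketch, assuming that the intercept counts as a parameter. The function name `min_observations` is hypothetical.

```python
def min_observations(n_parameters, per_parameter=10):
    """Smallest sample size suggested by the '10 observations per parameter' rule."""
    return n_parameters * per_parameter


# The interaction model has 4 parameters: beta_0, beta_1, beta_2 and beta_3.
needed = min_observations(4)
print(f"observations needed: {needed}")
```

Each extra variable or interaction term adds one parameter, and so adds 10 observations to the minimum under this rule.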