Market Research Flashcards

1
Q

What are the 3 ways in which data is prepared? Define each.

A
  1. Data Entry: Convert data to electronic form
  2. Data coding: Group and assign numeric codes to responses
  3. Data Cleaning: Check for errors & inconsistencies
2
Q

What is an example of data coding?

A

E.g., female = 1, male = 2
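A minimal Python sketch of this coding step. The raw responses and the helper name `code_responses` are hypothetical; only the female = 1, male = 2 mapping comes from the card.

```python
# Hypothetical codebook mirroring the card: female = 1, male = 2
GENDER_CODES = {"female": 1, "male": 2}

def code_responses(responses, codebook):
    """Map text responses to numeric codes; answers not in the codebook become None."""
    return [codebook.get(r.strip().lower()) for r in responses]

raw = ["Female", "male", " FEMALE ", "prefer not to say"]  # made-up survey answers
coded = code_responses(raw, GENDER_CODES)
print(coded)
```

Responses that are not in the codebook are left as None so they can be reviewed during data cleaning.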

3
Q

What are the types of errors & inconsistencies that can be found during data cleaning?

A

Skip patterns: a respondent answers a question they should have skipped, or skips one they should have answered
Incompleteness
Impossible values: e.g., age = 999
“Straight lining”: occurs when survey respondents give identical (or nearly identical) answers to items in a battery of questions using the same response scale
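A rough Python sketch of how three of these checks might be automated (skip-pattern checks are left out because they depend on the questionnaire's routing). The field names, the age range, and the sample respondents are all assumptions for illustration.

```python
def find_problems(row, battery_keys, age_range=(0, 120)):
    """Return a list of cleaning flags for one respondent (a dict of answers)."""
    flags = []
    # Incompleteness: any unanswered item
    if any(v is None for v in row.values()):
        flags.append("incomplete")
    # Impossible values: age outside an assumed plausible range
    age = row.get("age")
    if age is not None and not (age_range[0] <= age <= age_range[1]):
        flags.append("impossible value")
    # Straight-lining: identical answers across a fully answered battery
    answers = [row.get(k) for k in battery_keys]
    if None not in answers and len(set(answers)) == 1:
        flags.append("straight-lining")
    return flags

respondents = [  # made-up data
    {"age": 34, "q1": 4, "q2": 2, "q3": 5},
    {"age": 999, "q1": 3, "q2": 3, "q3": 3},
    {"age": 27, "q1": None, "q2": 1, "q3": 2},
]
flags = [find_problems(r, ["q1", "q2", "q3"]) for r in respondents]
print(flags)
```

This version only catches exactly identical answers; "nearly identical" straight-lining would need a looser rule, such as a minimum spread across the battery.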

4
Q

What are descriptive statistics?

A

Statistics that summarize data
Measures of central tendency: mean, median, mode
Measures of dispersion: standard deviation, variance, range

5
Q

What type of data can be measured using mean, median and mode?

A

Mean: Interval/Ratio
Median: all except nominal
Mode: any

6
Q

How are the mean & SD calculated?

A

Mean = sum(X_i) / n
SD = sqrt[ sum((X_i - mean)^2) / n ]

7
Q

How is variance calculated?

A

Var = sum((X_i - mean)^2) / n

8
Q

How do the sample SD and the population SD differ?

A

The sample SD divides by n - 1, whereas the population SD divides by n
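The mean, variance, and SD formulas from the preceding cards can be sketched in plain Python. The data set is made up, and the `sample` flag is an assumed convention for switching between the n and n - 1 denominators.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs, sample=False):
    """Population variance divides by n; sample variance divides by n - 1."""
    m = mean(xs)
    squared_deviations = sum((x - m) ** 2 for x in xs)
    return squared_deviations / (len(xs) - 1 if sample else len(xs))

def sd(xs, sample=False):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(variance(xs, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values
print(mean(data), variance(data), sd(data))  # population versions
print(sd(data, sample=True))                 # sample version, slightly larger
```

Python's standard `statistics` module offers the same split: `pstdev` and `pvariance` for the population versions, `stdev` and `variance` for the sample versions.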

9
Q

What is a one-way frequency table?

A

A table that shows the number of respondents choosing each answer to a survey question

10
Q

Application Rule for Frequency Tables

A

Always applicable, but not always effective if the variable contains too many values

11
Q

Which of mean, median, and mode can be used with each measurement scale?

A

Nominal: Mode
Ordinal: Mode & Median
Interval/Ratio: Mode, Median, Mean

12
Q

Covariance

A

How much two random variables change together

13
Q

Pearson Correlation

A

A scaled version of covariance (p here is the correlation coefficient, not the p-value):
when p = 0, no linear relationship
|p| < 0.3 weak
|p| > 0.49 strong

14
Q

2 caveats on correlation

A
  1. When p = 0 there is no linear correlation, but there may still be a nonlinear relationship
  2. It measures how closely the data are scattered around a straight line and has nothing to do with the slope
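Both caveats can be checked numerically. The sketch below computes the Pearson correlation by hand on made-up data: a perfect parabola gives a correlation of zero, and a shallow line and a steep line both give a correlation of about 1.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the SDs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

xs = [-2, -1, 0, 1, 2]                            # made-up data
r_parabola = pearson(xs, [x ** 2 for x in xs])    # strong but nonlinear relationship
r_shallow = pearson(xs, [0.1 * x for x in xs])    # slope 0.1
r_steep = pearson(xs, [10 * x for x in xs])       # slope 10
print(r_parabola, r_shallow, r_steep)
```

The parabola is perfectly predictable from X yet has zero linear correlation, while the two straight lines share the same correlation despite a hundredfold difference in slope.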
15
Q

How do you interpret the crosstabulation?

A

Lecture 12 Slide 2 - photo

16
Q

What questions does the chi-square analysis answer?

A

Are the differences between the percentages in a crosstab table real, reflecting a pattern in the overall population, or did they happen by chance?

17
Q

What is a hypothesis?

A

An assumption that a researcher makes about some characteristic of the population

18
Q

Null hypothesis?

A

The status quo, no effect, no relationship, no difference.

19
Q

Alternative hypothesis

A

There is an effect, there is a difference, relationship

20
Q

What is the hypothesis testing framework?

A
  1. Hypothesize something about the population H0
  2. Measure the chance of observing the sample if H0 is true
  3. If the chance is high, fail to reject H0; if it is low, reject H0 and conclude H1
21
Q

Test Statistic

A

A standardized value calculated from the sample data during a hypothesis test, under the assumption that the null hypothesis is true.
E.g., z-score, t-statistic, F-statistic, χ²

22
Q

P-Value

A

It measures how likely we are to observe sample data at least as extreme as ours if the null hypothesis is true.
If p is small, the null must go

23
Q

Significance Level

A

The threshold we compare the p-value against, usually 0.05

24
Q

We reject the null when ____ is less than the ______

A

P-value

Significance level

25
5 steps in hypothesis testing
  1. State the hypothesis
  2. Choose the appropriate test based on the problem
  3. Develop a decision rule
  4. Calculate the value of the test statistic/p-value
  5. State the conclusion
26
Decision Rule
A standard for deciding whether to reject or fail to reject the null hypothesis | Compare the p-value to the significance level
27
In the SPSS output, where is the test statistic? Where is the p-value?
Test statistic: the Value column of the Pearson Chi-Square row | P-value: the Asymptotic Significance column of the Pearson Chi-Square row
28
How do you state the conclusion?
With 95% confidence we can/cannot reject the null hypothesis that there is no relationship between X and Y
29
Explain the types of errors?
Type 1: False Positive (rejecting a true null hypothesis) | Type 2: False Negative (failing to reject a false null hypothesis)
30
Type I or II? 1. The person is innocent but you conclude that the person is guilty 2. The person is guilty but you conclude that the person is innocent
1. Type 1: False Positive | 2. Type 2: False Negative
31
When do you use the Chi-Square test?
When you want to examine the relationship between two nominal/ordinal variables, i.e., compare the proportions (nominal/ordinal) of different groups
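As an illustration, the Pearson chi-square statistic for a crosstab can be computed by hand. The 2x2 table below (seen the ad or not, by would buy or not) is invented, and the function name is hypothetical; this is a sketch of the standard formula, not the SPSS procedure itself.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for a crosstab."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables were unrelated
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

table = [[30, 20],   # made-up counts: saw ad -> would buy / would not
         [20, 30]]   # did not see ad -> would buy / would not
stat, df = chi_square(table)
CRITICAL_05_DF1 = 3.841  # chi-square table value for alpha = 0.05, df = 1
print(stat, df, "reject H0" if stat > CRITICAL_05_DF1 else "fail to reject H0")
```

With these counts the statistic exceeds the table value, so the null hypothesis of no relationship would be rejected at the 0.05 level.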
32
What is this problem type? | Have people's perceptions of PCs changed after seeing the ads?
Problem Type: Compare the mean of an (interval/ratio) variable to a number | One-Sample T-Test
33
Do purchase intents of a PC vary between people who have and have not seen the ads?
Problem Type: Compare the mean of an (interval/ratio) variable of different groups (2 groups) | Independent Samples T-Test
34
Do purchase intents of a PC vary among people who have PC only, who have Mac only, and who have both?
Problem Type: Compare the mean of an (interval/ratio) variable of different groups (more than 2 groups) | One-Way ANOVA
35
Do people rate the importance of quality and that of reliability differently?
Problem Type: Compare the means of two (interval/ratio) variables | Paired Samples T-Test
36
Explain the 4 types of Mean Comparisons tests
- Compare the mean of an (interval/ratio) variable to a number: One-Sample T-Test
- Compare the mean of an (interval/ratio) variable of different groups (2 groups): Independent Samples T-Test
- Compare the mean of an (interval/ratio) variable of different groups (more than 2 groups): One-Way ANOVA
- Compare the means of two (interval/ratio) variables: Paired Samples T-Test
37
Hypothesis of a One-Way ANOVA
Null: The mean of the variable is the same across all categories of a categorical variable with 3+ categories, i.e., mean1 = mean2 = mean3 | Alternative: At least one group mean differs from the others
38
How do we run a hypothesis test on the following? | Do purchase intents of a PC vary between people who have and have not seen the ads?
  1. State the hypothesis: Null: there is no difference between the two means
  2. Test: Independent Samples T-Test
  3. Decision rule: significance level of 0.05
  4. P-value: 0.199. First check whether the *variances* of the two groups differ; based on that, choose which p-value to look at when comparing the means
  5. Conclusion: with 95% confidence we fail to reject the null hypothesis XYZ
39
When do we use Z tests?
When the random sample is large (n >= 30); applies to tests of proportions and of one or two means (metric data)
40
____________________ are those in which measurement of the variable of interest in one sample has no effect on measurement of the variable in the other sample
Independent Samples
41
The number of ________________ is the number of observations in a statistical problem that are not restricted or are free to vary
Degrees of Freedom
42
The _________________________ enables the research analyst to determine whether an observed pattern of frequencies corresponds to, or fits, an “expected” pattern
Chi-Square Test
43
_______________ are those in which measurement of the variable of interest in one sample may influence measurement of the variable in another sample
Related Samples
44
"Because the calculated χ2 value (7.6) is higher than the table value (5.99), we ___ the null hypothesis"
Reject
45
For hypotheses about one mean, with small samples (n<30), the __________ with n − 1 degrees of freedom is the appropriate test for making statistical inferences
T-Test
46
How do you calculate the T-Statistic?
t = (Sample mean - Population mean under H0) / Estimated Standard Error
47
How do you calculate the Estimated Standard Error?
SD = sqrt[ sum((X_i - Xbar)^2) / (n - 1) ], i.e., the square root of the sample variance | SE = SD / sqrt(n)
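Putting the last two cards together, a one-sample t statistic might be computed as in the sketch below. The data and the hypothesized mean `mu0` are made up, and the function name is hypothetical.

```python
import math

def one_sample_t(xs, mu0):
    """t statistic and degrees of freedom for H0: population mean = mu0."""
    n = len(xs)
    xbar = sum(xs) / n
    # Sample SD uses n - 1 in the denominator
    sd = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    se = sd / math.sqrt(n)  # estimated standard error of the mean
    return (xbar - mu0) / se, n - 1

t_stat, df = one_sample_t([1, 2, 3, 4, 5], mu0=2)  # made-up ratings
print(t_stat, df)
```

The resulting statistic is then compared with the t distribution with n - 1 degrees of freedom, or the p-value is read from the software output.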
48
Marketing researchers often need to determine whether there is any association between two or more variables in a sample. The _______________ test for two independent samples is the appropriate test in such situations
Chi-square test for two independent samples
49
Although the _____________________ is generally used for large samples, nearly all statistical packages use the t test for all sample sizes
Z-test
50
In many situations, researchers are concerned with phenomena that are expressed in terms of percentages - also known as ____________________
Proportions
51
A hypothesis test of proportions is a test to determine whether the difference between proportions is greater than would be expected because of _________________________
Sampling Error
52
When the goal is to test the differences among the means of two or more independent samples, analysis of variance __________________________ is an appropriate statistical tool
ANOVA
53
What does ANOVA mean?
Analysis of Variance
54
Match each type of difference to its definition:
  1. Mathematical differences
  2. Statistically significant differences
  3. Managerially important differences
  a. A difference large enough to be unlikely to have occurred because of chance or sampling error
  b. A statistically significant difference large enough to be important to management
  c. The numbers are not exactly the same
1. c | 2. a | 3. b
55
What is bivariate analysis?
Analysis of the degree of association between two variables
56
Criterion and predictor variables
Criterion: Dependent variable (Y), explained by the X variable | Predictor: Independent variable (X), affects the value of the Y variable
57
______ AKA _______ is used to analyze the relationship between two variables when one is considered the dependent variable and the other the independent variable
Bivariate regression & Simple regression
58
How do you determine if using a linear regression model is appropriate?
Draw a scatterplot of X and Y and check whether the relationship looks roughly linear
59
What is the Least-Squares Estimation Procedure?
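It fits the line Y = a + bX + e (also written Y = B0 + B1X + e) by choosing the intercept and slope that minimize the sum of squared errors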
60
What is the R^2? What does it measure? What is its range?
A measure of the strength of the linear relationship between X & Y | It measures the percentage of the total variation in Y explained by the variation in X | It ranges from 0 to 1, where 1 is the strongest
61
What is the formula for R2?
R^2 = (Total variation - Unexplained variation) / Total variation = Explained variation / Total variation
62
SST & SSR?
Total Sum of Squares: Total variation | Sum of Squares due to Regression: Explained Variation
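To make SST, SSR, and R^2 concrete, here is a small least-squares fit done by hand on made-up data. The slope and intercept formulas are the standard closed-form OLS estimates; the variable names are illustrative only.

```python
def fit_line(xs, ys):
    """Closed-form least-squares estimates of intercept a and slope b."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

xs = [1, 2, 3, 4, 5]   # made-up predictor values
ys = [2, 4, 5, 4, 5]   # made-up outcomes
a, b = fit_line(xs, ys)

ybar = sum(ys) / len(ys)
fitted = [a + b * x for x in xs]
sst = sum((y - ybar) ** 2 for y in ys)               # total variation
ssr = sum((f - ybar) ** 2 for f in fitted)           # explained variation
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained variation
r2 = ssr / sst
print(a, b, sst, ssr, sse, r2)
```

Total variation splits into explained plus unexplained variation (SST = SSR + SSE), so R^2 can equally be written as 1 - SSE / SST.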
63
What is Beta? What is the hypothesis?
The Regression Coefficient | H0: B = 0 | Ha: B ≠ 0
64
What is the range of the Pearson Correlation? What is weak, moderate and strong
-1 <= p(X,Y) <= 1 | Weak: |p| < 0.3 | Moderate: 0.3 <= |p| <= 0.49 | Strong: |p| > 0.49
65
What common issues arise in correlational interpretations?
Outliers | The effect size (r) may be too small to be useful | Nonlinear relationships | High correlations are often tautological
66
What does it mean when the Pearson correlation is 0.762 & P-Value is less than 0.05?
A strong positive linear relationship between X & Y | P-value: the correlation is statistically different from zero
67
True or false? 1. A Pearson r = .93 indicates a larger/steeper slope than a Pearson r = .4. 2. If a Pearson r is statistically significant, it means that a linear approach is the best one. 3. A Pearson r with a p-value of < .001 indicates a weak correlation.
All three are false
68
What is the simple linear regression model?
y = a + Bx + e | a = intercept | B = slope | e = random error
69
Draw the OLS model
Ordinary Least Squares regression | a hat: the intercept, the value of Y when X is zero | b hat: the slope, the estimated change in the average value of Y as a result of a one-unit change in X | e: the error, the difference between the observed points and the regression line
70
Explain the goodness of fit in terms of regression?
R-squared indicates how well the data fit the regression line; adding more variables to the model never lowers R-squared, so more variables always appear to improve the fit
71
What is the caveat with regression?
We lose confidence in the predictions when they fall outside the current range of X
72
The three classifications of the data problems are:
Type A: Long-term data; maximize profit for an existing product | Type B: Short-term data; increase visibility of a just-launched product | Type C: No data; predict how a new product will perform
73
Which test to choose?
Within Group:
- Does the mean differ from a benchmark? One-Sample T-Test
- Do the mean of X and the mean of Y differ? Paired Samples T-Test
Between Groups:
- Do frequencies differ between groups? Chi-Square Test
- Does the mean of X differ between 2 groups? Independent Samples T-Test
- Does the mean of X differ between 3+ groups? ANOVA
74
What is multicollinearity? How do you discover it? Why is it bad?
When your independent variables are highly correlated with each other | Discover it by looking at the correlation matrix of the independent variables | Bad because we cannot distinguish the individual effects of the independent variables on the dependent variable
75
How do you solve multicollinearity?
Get more data | Don't include all of the independent variables: drop the correlated ones | Or combine them to create a new variable (factor analysis)
76
Dummy Variables
0 or 1 to let us know if there is or isn't the presence of a categorical value
77
When should we use dummy variables?
When a categorical variable is used as a predictor in regression analysis, it should be recoded into dummy variables
78
How many dummy variables should we include?
K-1 for k categories
79
The value of the categorical variable that is not represented explicitly by a dummy variable is called the ________. An example of this in terms of gender would be ?
Reference group | Gender: X1 = 1 if female, 0 otherwise & X2 = 1 if male, 0 otherwise. The reference group would be non-binary
80
```
D1 = 1 if female, 0 if not
D2 = 1 if male, 0 if not
y = annual spending on clothes ($)
x = age
y = 200 + 20x - 50D2
```
Interpret the values
The average annual spending on clothes for women at age 0 is $200
If age increases by one year, average spending on clothes increases by $20
Men on average spend $50 less than women of the same age
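The interpretation can be checked by plugging values into the card's equation. In this sketch the function name and the example age are assumptions; the coefficients are the ones given on the card, with women as the reference group.

```python
def predicted_spending(age, is_male):
    """Card's model: y = 200 + 20 * age - 50 * D2, where D2 = 1 for men."""
    d2 = 1 if is_male else 0
    return 200 + 20 * age - 50 * d2

woman_30 = predicted_spending(30, is_male=False)
man_30 = predicted_spending(30, is_male=True)
print(woman_30, man_30, woman_30 - man_30)
```

At any given age the gap between the two predictions equals the dummy coefficient, and each extra year of age adds the same $20 for both groups.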
81
How does the reference group relate to the regression model?
The slope keeps the same interpretation | Alpha (intercept): for [the reference group], the average [Y variable] is [alpha value] when the other X variables are zero | Beta coefficient of a dummy: compared to [the reference group], the average [Y variable] for [the dummy variable's group] increases/decreases by [beta coefficient value]
82
What does the ADJ R2 value do?
It adjusts R^2 for the number of variables in the model, so adding variables that explain little does not raise it
83
What is the rule of thumb for the Beta coefficient?
When the coefficient is more than 2 times its standard error (in absolute value), it is likely statistically significant
84
What is VIF?
Variance Inflation Factor: a measure of multicollinearity | Keep it below 10 | High values are caused by including too many correlated variables
85
MC = 1.23/bottle | Retail price = wholesale price plus a 10% markup | Wholesale price points: 1.80, 2.00, or 2.20 | Regression: sales = 789.150 - 250.813 * RetailPriceOfBrand | How do you choose the price point?
  1. Calculate the retail price: Retail price = Wholesale price * (1 + 10%)
  2. Put the retail price into the regression model to predict sales
  3. Calculate profit: Profit = (Retail price - MC) * Sales
The price point with the highest profit is your choice
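The three steps can be run directly on the card's numbers. This sketch follows the card's profit formula as written (retail price minus MC); the variable names are illustrative.

```python
MC = 1.23                      # marginal cost per bottle (from the card)
MARKUP = 0.10                  # retail markup over the wholesale price
WHOLESALE_POINTS = [1.80, 2.00, 2.20]

def predicted_sales(retail_price):
    """Regression from the card: sales = 789.150 - 250.813 * retail price."""
    return 789.150 - 250.813 * retail_price

profits = {}
for wholesale in WHOLESALE_POINTS:
    retail = wholesale * (1 + MARKUP)      # step 1: retail price
    sales = predicted_sales(retail)        # step 2: predicted sales
    profits[wholesale] = (retail - MC) * sales  # step 3: profit per the card

best_wholesale = max(profits, key=profits.get)
for wholesale, profit in profits.items():
    print(f"wholesale {wholesale:.2f}: profit {profit:.2f}")
print("best wholesale price:", best_wholesale)
```

Under these numbers the middle price point gives the highest profit: the higher margin at 2.20 does not make up for the lost sales.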
86
When can you not use linear regression?
Prediction would not be exactly 0 or 1 but some continuous number Predictions could be outside the range of [0,1]
87
In binary regression what are the outcomes of the dependent and independent variables
Dependent: Outcome is binary Independent: What do you think can predict the outcome
88
What is the logistic regression model? What are the constraints?
ln(p / (1 - p)) = a + B1X1 + ... + BkXk | p = exp(a + B1X1 + ... + BkXk) / (1 + exp(a + B1X1 + ... + BkXk)) | 0
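A short sketch of the two forms of the logistic model, showing that the predicted probability always stays strictly between 0 and 1 and that taking the log-odds recovers the linear part. The intercept, coefficients, and inputs are hypothetical.

```python
import math

def logistic_probability(intercept, coefs, xs):
    """p = exp(z) / (1 + exp(z)), where z = a + B1*X1 + ... + Bk*Xk."""
    z = intercept + sum(b * x for b, x in zip(coefs, xs))
    return math.exp(z) / (1 + math.exp(z))

def log_odds(p):
    """ln(p / (1 - p)), the left-hand side of the logistic model."""
    return math.log(p / (1 - p))

# Hypothetical fitted model: a = -1.0, B1 = 0.8, B2 = 0.5
p = logistic_probability(-1.0, [0.8, 0.5], [2.0, 1.0])
print(p, log_odds(p))
```

Whatever the value of the linear part, the transformation squeezes it into a valid probability, which is what a straight-line regression cannot guarantee for a binary outcome.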