Market Research Flashcards

1
Q

Name and define the 3 ways in which data is prepared

A
  1. Data Entry: Convert data to electronic form
  2. Data coding: Group and assign numeric codes to responses
  3. Data Cleaning: Check for errors & inconsistencies
2
Q

What is an example of data coding?

A

E.g., female = 1, male = 2
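A minimal sketch of that coding step, assuming raw answers arrive as free text. `CODEBOOK` and `code_responses` are hypothetical names used only for illustration.

```python
# Hypothetical codebook; the female/male codes are the ones on the card.
CODEBOOK = {"female": 1, "male": 2}

def code_responses(responses, codebook=CODEBOOK, missing=None):
    """Map raw text answers to numeric codes; unknown answers become `missing`."""
    return [codebook.get(r.strip().lower(), missing) for r in responses]

raw = ["Female", "male", " FEMALE ", "prefer not to say"]
coded = code_responses(raw)
print(coded)  # [1, 2, 1, None]
```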

3
Q

What are the types of errors & inconsistencies that can be found during data cleaning?

A

Skip patterns: the respondent answers a question they should have skipped, or skips one they should have answered
Incompleteness
Impossible values: e.g., age = 999
“Straight lining”: occurs when survey respondents give identical (or nearly identical) answers to items in a battery of questions using the same response scale
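Three of these checks can be sketched in a few lines. This is only an illustration: the field names, the plausible age range and the helper `find_problems` are assumptions, not part of the course material.

```python
def find_problems(row, scale_items, age_range=(0, 120)):
    """Return a list of cleaning flags for one respondent (a dict of answers)."""
    flags = []
    # Incompleteness: any unanswered field.
    if any(v is None for v in row.values()):
        flags.append("incomplete")
    # Impossible value: age outside a plausible range (the range is an assumption).
    age = row.get("age")
    if age is not None and not (age_range[0] <= age <= age_range[1]):
        flags.append("impossible_age")
    # Straight-lining: identical answers across the whole battery of scale items.
    answers = [row.get(k) for k in scale_items]
    if None not in answers and len(set(answers)) == 1:
        flags.append("straight_lining")
    return flags

battery = ["q1", "q2", "q3", "q4"]
respondent = {"age": 999, "q1": 3, "q2": 3, "q3": 3, "q4": 3}
print(find_problems(respondent, battery))  # ['impossible_age', 'straight_lining']
```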

4
Q

What are descriptive statistics?

A

Statistics that summarize data
Measures of central tendency: mean, median, mode
Measures of dispersion: standard deviation, variance and range
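As a quick illustration, Python's standard `statistics` module computes each of these measures. The ratings below are made-up data.

```python
import statistics

ratings = [2, 3, 3, 4, 5, 5, 5, 7]  # made-up sample of survey ratings

# Measures of central tendency
mean = statistics.mean(ratings)        # 4.25
median = statistics.median(ratings)    # 4.5
mode = statistics.mode(ratings)        # 5

# Measures of dispersion (population forms, dividing by n)
value_range = max(ratings) - min(ratings)   # 5
variance = statistics.pvariance(ratings)
std_dev = statistics.pstdev(ratings)
```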

5
Q

What type of data can be measured using mean, median and mode?

A

Mean: Interval/Ratio
Median: all except nominal
Mode: any

6
Q

How are the mean & SD calculated?

A

Mean = sum of X(i) / n
SD = sqrt[ sum (X(i) - mean)^2 / n ]

7
Q

How is variance calculated?

A

Var = sum (X(i)-mean)^2/(n)

8
Q

How do the sample SD and the population SD differ?

A

The sample SD divides by n-1, whereas the population SD divides by n
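The difference is easiest to see side by side. A small sketch, where `std_dev` is a hypothetical helper and the data are made up:

```python
import math

def std_dev(values, sample=True):
    """Standard deviation; divides by n - 1 for a sample, by n for a population."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_devs / (n - 1 if sample else n))

data = [2, 4, 4, 4, 5, 5, 7, 9]               # made-up data, mean = 5
population_sd = std_dev(data, sample=False)   # sqrt(32 / 8) = 2.0
sample_sd = std_dev(data, sample=True)        # sqrt(32 / 7), about 2.138
```

The sample SD is always the larger of the two, and the gap shrinks as n grows.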

9
Q

What is a one way frequency table?

A

Table that shows the number of respondents choosing each answer to a survey question
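A one-way frequency table can be sketched with `collections.Counter`. The answers below are made up.

```python
from collections import Counter

answers = ["Yes", "No", "Yes", "Unsure", "Yes", "No"]  # made-up responses

table = Counter(answers)          # count of respondents per answer
total = sum(table.values())

# Print answer, count and share of respondents, most frequent first.
for answer, count in table.most_common():
    print(f"{answer:<8}{count:>3}{count / total:>8.1%}")
```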

10
Q

Application Rule for Frequency Tables

A

Always applicable, but not always effective if the variable contains too many values

11
Q

Which of the mean, median and mode can be used with each measurement scale?

A

Nominal: Mode
Ordinal: Mode & Median
Interval/Ratio: Mode, Median, Mean

12
Q

Covariance

A

How much two random variables change together

13
Q

Pearson Correlation

A

A scaled version of covariance:
when ρ = 0 there is no linear relationship,
|ρ| < 0.3 weak
|ρ| > 0.49 strong
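To make "scaled version of covariance" concrete, here is a small sketch with made-up data. The helper names are illustrative only.

```python
import math

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def pearson(xs, ys):
    """Covariance divided by both standard deviations, so it lies in [-1, 1]."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

ad_spend = [1, 2, 3, 4, 5]   # made-up data
sales = [2, 4, 5, 4, 5]

cov = covariance(ad_spend, sales)   # 1.5, in units of x times units of y
r = pearson(ad_spend, sales)        # about 0.77, unit-free
```

By the thresholds on this card, a value of about 0.77 counts as a strong correlation.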

14
Q

2 caveats on correlation

A
  1. When ρ = 0 there is no linear correlation, but there may still be a non-linear relationship
  2. It measures how closely the data are scattered around a straight line and has nothing to do with the slope
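Both caveats can be checked numerically. In this sketch (made-up data, hypothetical helper `pearson`), a perfect quadratic relationship gives a correlation of zero, while a steep and a shallow line both give a correlation of one.

```python
import math

def pearson(xs, ys):
    """Pearson correlation computed from sums of squared deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
curved = [x ** 2 for x in xs]     # perfect but non-linear relationship
steep = [10 * x for x in xs]      # straight line with slope 10
shallow = [0.1 * x for x in xs]   # straight line with slope 0.1

r_curved = pearson(xs, curved)    # zero: no linear correlation
r_steep = pearson(xs, steep)      # one
r_shallow = pearson(xs, shallow)  # also one (up to rounding), despite the tiny slope
```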
15
Q

How do you interpret the crosstabulation?

A

Lecture 12 Slide 2 - photo

16
Q

What questions does the chi-square analysis answer?

A

Are the differences in percentages found in a crosstab table real, reflecting a pattern in the overall population, or did they happen by chance?

17
Q

What is a hypothesis?

A

An assumption that a researcher makes about some characteristic of the population

18
Q

Null hypothesis?

A

The status quo, no effect, no relationship, no difference.

19
Q

Alternative hypothesis

A

There is an effect, a difference, or a relationship

20
Q

What is the hypothesis testing framework?

A
  1. Hypothesize something about the population (H0)
  2. Measure the chance of observing the sample if H0 is true
  3. If the chance is high, fail to reject H0; if it is low, reject H0 and conclude H1
21
Q

Test Statistic

A

A standardized value calculated from the sample data during a hypothesis test, conditional on the null hypothesis.
E.g., z-statistic, t-statistic, F-statistic, χ2

22
Q

P-Value

A

It measures how likely we would be to observe sample data like ours (or more extreme) if the null hypothesis were true.
If p is small, the null must go

23
Q

Significance Level

A

The threshold we compare the p-value against. Usually 0.05

24
Q

We reject the null when ____ is less than the ______

A

P-value

Significance level

25
Q

5 steps in hypothesis testing

A
  1. State the hypothesis
  2. Choose the appropriate test based on the problem
  3. Develop a decision rule
  4. Calculate the value of the test statistic/p-value
  5. State the conclusion
26
Q

Decision Rule

A

A standard for deciding whether to reject or fail to reject the null hypothesis.

Reject the null when the p-value is less than the significance level

27
Q

When you look at SPSS where is the test statistic? Where is the P-value?

A

Pearson Chi-Square & Value = Test statistic

Pearson Chi-Square & Asymptotic Significance = P-Value

28
Q

How do you state the conclusion?

A

With 95% confidence we can/cannot reject the null hypothesis that there is no relationship between X and Y

29
Q

Explain the types of errors?

A

Type 1: False Positive (rejecting a null hypothesis that is true)

Type 2: False Negative (failing to reject a null hypothesis that is false)

30
Q

Type I or II?
The person is innocent but you conclude that the person is guilty
The person is guilty but you conclude that the person is innocent

A

Type 1: False Positive

Type 2: False Negative

31
Q

When do you use the Chi-Square test?

A

Use the chi-square test when you want to examine the relationship between two nominal/ordinal variables
• Compare the proportions (nominal/ordinal) of different groups
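A minimal sketch of the chi-square calculation on a crosstab, using made-up counts. `chi_square_statistic` is a hypothetical helper, and 5.99 is the usual table value for 2 degrees of freedom at the 0.05 level.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (count - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Made-up crosstab: rows = saw the ad (yes / no), columns = preferred brand (A / B / C).
observed = [[30, 20, 10],
            [20, 20, 20]]
stat, dof = chi_square_statistic(observed)   # about 5.33 with 2 degrees of freedom

CRITICAL_05_DF2 = 5.99   # table value for alpha = 0.05, df = 2
reject_null = stat > CRITICAL_05_DF2
```

Here the statistic falls below the table value, so the null hypothesis of no relationship is not rejected.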

32
Q

What is this problem type?

Have people’s perceptions of PCs changed after seeing the ads?

A

Problem Type: Compare the mean of an (interval/ratio) variable to a number
One-Sample T-Test

33
Q

Do purchase intents of a PC vary between people who have and have not seen the ads?

A

Problem Type: Compare the mean of an (interval/ratio) variable of different groups (2 groups)
Independent Sample T-Test

34
Q

Do purchase intents of a PC vary among people who have PC only, who have Mac only, and who have both?

A

Problem Type: Compare the mean of an (interval/ratio) variable of different groups (more than 2 groups)
One-Way Anova

35
Q

Do people rate the importance of quality and that of reliability differently?

A

Problem Type: Compare the means of two (interval/ratio) variables
Paired Samples T-Test

36
Q

Explain the 4 types of Mean Comparisons tests

A

Compare the mean of an (interval/ratio) variable to a number
-One-Sample T-Test

Compare the mean of an (interval/ratio) variable of different groups (2 groups)
-Independent Samples T-Test

Compare the mean of an (interval/ratio) variable of different groups (more than 2 groups)
-One-Way ANOVA

Compare the means of two (interval/ratio) variables
-Paired Samples T-Test

37
Q

Hypothesis of a One-Way ANOVA

A

Null: the mean of the (interval/ratio) variable is the same across all groups of a categorical variable with 3+ categories, i.e., mean1 = mean2 = mean3
Alternative: at least one group mean differs from the others

38
Q

How do we run a hypothesis test on the following?

Do purchase intents of a PC vary between people who have and have not seen the ads?

A

1 State Hypothesis: Null: There is no difference between the two means
2 Test: Independent Samples T-Test
3 Decision Rule: reject the null if the p-value is less than 0.05
4 P-Value: 0.199 -> First check whether the variances of the two groups differ; based on that, choose which p-value to look at when comparing the means
5 Conclusion: since 0.199 is greater than 0.05, with 95% confidence we fail to reject the null hypothesis XYZ
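The decision rule and conclusion can be sketched as a tiny helper. `decide` is a hypothetical name, and 0.199 is the p-value quoted on this card.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

decision = decide(0.199)   # 0.199 is not below 0.05
print(decision)            # fail to reject H0
```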

39
Q

When do we use Z tests?

A

When the random sample is large (n >= 30); applies to proportions and to metric data with one or two means

40
Q

____________________ are those in which measurement of the variable of interest in one sample has no effect on measurement of the variable in the other sample

A

Independent samples

41
Q

The number of ________________ is the number of observations in a statistical problem that are not restricted or are free to vary

A

Degrees of Freedom

42
Q

The _________________________ enables the research analyst to determine whether an observed pattern of frequencies corresponds to, or fits, an “expected” pattern

A

Chi-Square

43
Q

_______________ are those in which measurement of the variable of interest in one sample may influence measurement of the variable in another sample

A

Related Sample

44
Q

“Because the calculated χ2 value (7.6) is higher than the table value (5.99), we ___ the null hypothesis”

A

Reject (we can reject the null)

45
Q

For hypotheses about one mean, with small samples (n<30), the __________ with n − 1 degrees of freedom is the appropriate test for making statistical inferences

A

T-Test

46
Q

How do you calculate the t-statistic?

A

t = (sample mean - population mean under H0) / estimated standard error

47
Q

How do you calculate the Estimated Standard Error?

A

SD = sqrt[ sum (X(i) - Xbar)^2 / (n - 1) ], i.e., the square root of the sample variance
SE = SD / sqrt(n)
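Putting the standard error and the test statistic together, a one-sample t-statistic can be sketched as follows. The data and the benchmark of 4 are made up, and `one_sample_t` is a hypothetical helper.

```python
import math

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean = mu0."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))  # sample SD
    se = sd / math.sqrt(n)                                          # estimated standard error
    return (mean - mu0) / se, n - 1

ratings = [2, 4, 4, 4, 5, 5, 7, 9]          # made-up ratings, mean = 5
t_stat, dof = one_sample_t(ratings, mu0=4)  # t is about 1.32 with 7 degrees of freedom
```

The resulting t would then be compared with the t-table value for n - 1 degrees of freedom, or converted to a p-value by a statistics package.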

48
Q

Marketing researchers often need to determine whether there is any association between two or more variables in a sample. The _______________ test for two independent samples is the appropriate test in such situations

A

Chi-square

49
Q

Although the _____________________ is generally used for large samples, nearly all statistical packages use the t test for all sample sizes

A

Z-test

50
Q

In many situations, researchers are concerned with phenomena that are expressed in terms of percentages - also known as ____________________

A

Proportions

51
Q

A hypothesis test of proportions is a test to determine whether the difference between proportions is greater than would be expected because of _________________________

A

Sampling Error

52
Q

When the goal is to test the differences among the means of two or more independent samples, analysis of variance __________________________ is an appropriate statistical tool

A

ANOVA

53
Q

What does ANOVA mean?

A

Analysis of Variance

54
Q
  1. Mathematical differences
  2. Statistically significant difference
  3. Managerially important differences

a. if a difference is large enough to be unlikely to have occurred because of chance or sampling error
b. statistically significant difference large enough to be important to management
c. if numbers are not exactly the same

A
  1. C.
  2. A
  3. B
55
Q

What is bivariate analysis?

A

The analysis of the degree of association between two variables

56
Q

Criterion and predictor variables

A

Criterion: Dependent variable (Y) - explained by the X variable
Predictor: Independent variable (X) - affects the value of the Y variable

57
Q

______ AKA _______ is used to analyze the relationship between
two variables when one is considered the dependent variable and the
other the independent variable

A

Bivariate regression & Simple regression

58
Q

How do you determine if using a linear regression model is appropriate?

A

Scatterplot

59
Q

What is the Least-Squares Estimation Procedure?

A

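Fit the line Y = a + bX + e (also written Y = B0 + B1X + e) by choosing a and b so that the sum of the squared errors e is as small as possible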
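For a single X variable the least-squares estimates have a simple closed form: the slope is the sum of cross-deviations divided by the sum of squared X deviations, and the intercept follows from the means. A sketch with made-up data, where `fit_line` is a hypothetical helper:

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx        # slope
    a = my - b * mx      # intercept: the line passes through (mean x, mean y)
    return a, b

xs = [1, 2, 3, 4, 5]     # made-up data
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)  # a is about 2.2, b is 0.6
```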

60
Q

What is R^2? What does it measure? What is its range?

A

A measure of the strength of the linear relationship between X & Y

It is the percentage of the total variation in Y explained by the variation in X

Ranges from 0 to 1, where 1 is the strongest

61
Q

What is the formula for R2?

A

(Total variation - Unexplained variation) / Total variation

62
Q

SST & SSR?

A

Total Sum of Squares: Total variation

Sum of Squares due to Regression: Explained Variation
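A sketch of how these sums of squares combine into R^2, using made-up data and its least-squares line. The name SSE for the unexplained part is an added label, not from the card.

```python
xs = [1, 2, 3, 4, 5]   # made-up data
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6        # least-squares intercept and slope for this data

mean_y = sum(ys) / len(ys)
predicted = [a + b * x for x in xs]

sst = sum((y - mean_y) ** 2 for y in ys)                 # total variation
ssr = sum((p - mean_y) ** 2 for p in predicted)          # explained variation
sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))   # unexplained variation

r_squared = (sst - sse) / sst   # same as ssr / sst, about 0.6 here
```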

63
Q

What is Beta? What is the hypothesis?

A

The Regression Coefficient

H0: B = 0; Ha: B ≠ 0

64
Q

What is the range of the Pearson Correlation? What is weak, moderate and strong

A

-1 <= ρ(X,Y) <= 1
Weak: |ρ| less than 0.3
Moderate: |ρ| greater than or equal to 0.3 and less than or equal to 0.49
Strong: |ρ| greater than 0.49

65
Q

What common issues arise in correlational interpretations?

A

Outliers
Effect size (r) may be too small to be useful
Non-linear relationships
High correlations are often tautological

66
Q

What does it mean when the Pearson correlation is 0.762 & P-Value is less than 0.05?

A

Positive strong linear relationship between the way X & Y move
P-Value: The correlation is statistically significantly different from zero

67
Q
True or false?
  1. Pearson r = .93 indicates a larger/steeper slope than a Pearson r = .4.
  2. If a Pearson r is statistically significant, it means that a linear approach is the best one.
  3. A Pearson r with a p-value of < .001 indicates a weak correlation.
A

All three are false

68
Q

What is the simple linear regression model?

A

y= a + Bx + e
a= intercept
B=slope
e= Random error

69
Q

Draw the OLS model

A

Ordinary Least Squares Regression
a hat: the intercept, the value of Y when X is zero
b hat: the slope, the estimated change in the average value of Y as a result of a one-unit change in X
e: the error, the vertical distance between each point and the regression line; OLS picks the line that minimizes the sum of these squared distances

70
Q

Explain the goodness of fit in terms of regression?

A

R-Squared indicates how well the data fit the regression line: the closer the points lie to the line, the better the fit

71
Q

What is the caveat with regression?

A

We lose confidence in the predictions when X falls outside the range of X values used to fit the model

72
Q

The three classifications of the data problems are:

A

Type A: Long term data
Maximize profit for an existing product

Type B: Short term data
Increase visibility of just launched product

Type C: No data
Predict how a new product will perform

73
Q

Which test to choose?

A

Within Group:

  • Does mean differ from benchmark? One Sample T-Test
  • Does mean of x and mean of y differ? Paired Sample T-Test

Between Groups:

  • Do frequencies differ between groups? Chi-Square Test
  • Does mean of X differ between 2 groups? Independent Sample T-Test
  • Does mean of X differ between 3+ groups? ANOVA
74
Q

What is multicollinearity? How do you discover it? Why is it bad?

A

When your independent variables are highly correlated with each other
- Look at the correlation matrix of the independent variables

Bad b/c we cannot distinguish between the individual effects of the independent variables on the dependent variable

75
Q

How do you solve multicollinearity?

A

Get more data
Don’t include all of the independent variables
Drop the correlated variables
Or combine them to create a new variable, Factor Analysis

76
Q

Dummy Variables

A

0 or 1 to let us know if there is or isn’t the presence of a categorical value

77
Q

When should we use dummy variables?

A

A categorical variable should be recoded into a dummy variable in regression analysis

78
Q

How many dummy variables should we include?

A

K-1 for k categories
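A sketch of k-1 dummy coding in plain Python. `dummy_code` is a hypothetical helper, and the ownership categories are made up for illustration; the category left out acts as the reference group.

```python
def dummy_code(values, reference):
    """Return one 0/1 column per category, skipping the reference category."""
    categories = sorted(set(values) - {reference})
    return {f"is_{c}": [1 if v == c else 0 for v in values] for c in categories}

owners = ["pc", "mac", "both", "pc", "mac"]   # k = 3 categories
dummies = dummy_code(owners, reference="pc")  # k - 1 = 2 dummy columns
print(dummies)
# {'is_both': [0, 0, 1, 0, 0], 'is_mac': [0, 1, 0, 0, 1]}
```

A row with zeros in every dummy column belongs to the reference group.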

79
Q

The value of the categorical variable that is not represented explicitly by a dummy variable is called the ________.

An example of this in terms of gender would be ?

A

Reference group

Gender: X1 = 1 if woman, 0 otherwise; X2 = 1 if man, 0 otherwise. The reference group would be non-binary

80
Q
D1= 1 if female 0 if not
D2= 1 if male 0 if not 
y= annual spending on clothes ($)
x= age
y= 200 + 20x - 50D2
Interpret the values
A

The average annual spending on clothes for women at the age of 0 is $200
If the age increases by one year, average spending on clothes increased by $20
Men on average spend $50 less per year than women of the same age
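Plugging values into the card's equation makes the interpretation concrete. `predicted_spending` is a hypothetical helper name.

```python
def predicted_spending(age, is_male):
    """The card's fitted model: y = 200 + 20*age - 50*D2, where D2 = 1 for men."""
    d2 = 1 if is_male else 0
    return 200 + 20 * age - 50 * d2

woman_30 = predicted_spending(30, is_male=False)   # 800
man_30 = predicted_spending(30, is_male=True)      # 750
one_year_older = predicted_spending(31, is_male=False) - woman_30   # 20
```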

81
Q

How does the reference group relate to the regression model?

A

Slope is the same interpretation
Alpha (intercept): for the [reference group], the average [Y variable] is [alpha value] when the other X variables are zero
Beta coefficient (dummy): compared to the [reference group], the average [Y variable] for the [dummy group] increases/decreases by [beta coefficient value]

82
Q

What does the ADJ R2 value do?

A

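It adjusts R2 for the number of independent variables in the model, so adding a variable only raises it if the variable improves the fit enough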

83
Q

What is the rule of thumb for the Beta coefficient?

A

When the coefficient is more than about 2 times its standard error, it is statistically significant (roughly at the 0.05 level)

84
Q

What is VIF?

A

Variance Inflation Factor, a measure of multicollinearity.
Keep it below 10; a high value is caused by including too many highly correlated variables
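The card does not give the formula; the sketch below assumes the standard definition VIF = 1 / (1 - R^2), where R^2 comes from regressing one independent variable on all the others. `vif_from_r_squared` is a hypothetical helper.

```python
def vif_from_r_squared(r_squared):
    """VIF for one predictor, given the R^2 from regressing it on the other predictors."""
    return 1 / (1 - r_squared)

# With only two predictors, that R^2 is simply their squared correlation.
vif_moderate = vif_from_r_squared(0.5 ** 2)    # about 1.33
vif_severe = vif_from_r_squared(0.95 ** 2)     # about 10.26, above the rule of thumb of 10
```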

85
Q

MC = 1.23/bottle
10% markup for retail price
Wholesale price points 1.80, 2.00 or 2.20
Regression: sales = 789.150 - 250.813 * RetailPriceOfBrand
How to calculate?

A
  1. Calculate Retail Price= Wholesale * (1 + 10%)
  2. Put price into regression model
  3. Calculate profit: Profit = (Retail price-MC) * Sales
    Highest profit is your choice
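The three steps can be sketched directly from the numbers on the card. The profit formula is the one stated on the card; the variable and function names are illustrative.

```python
MARGINAL_COST = 1.23                     # per bottle
MARKUP = 0.10                            # 10% markup for the retail price
WHOLESALE_OPTIONS = [1.80, 2.00, 2.20]

def expected_profit(wholesale):
    """Steps 1 to 3: retail price, predicted sales, then profit."""
    retail = wholesale * (1 + MARKUP)
    sales = 789.150 - 250.813 * retail          # regression from the card
    return (retail - MARGINAL_COST) * sales     # profit formula as stated on the card

profits = {w: expected_profit(w) for w in WHOLESALE_OPTIONS}
best_wholesale = max(profits, key=profits.get)  # price point with the highest profit

for wholesale, profit in profits.items():
    print(f"wholesale {wholesale:.2f} -> profit {profit:.2f}")
```

With these inputs the middle price point, 2.00, gives the highest predicted profit.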
86
Q

When can you not use linear regression?

A

When the dependent variable is binary (0 or 1):

Predictions would not be exactly 0 or 1 but some continuous number

Predictions could fall outside the range [0,1]

87
Q

In binary regression what are the outcomes of the dependent and independent variables

A

Dependent: Outcome is binary
Independent: What do you think can predict the outcome

88
Q

What is the logistic regression model? What are the constraints?

A

ln(p / (1 - p)) = a + B1X1 + … + BkXk
p = exp(a + B1X1 + … + BkXk) / (1 + exp(a + B1X1 + … + BkXk))
0