Chapter 15 & 16 - Quantitative Data Analysis I & II & III Flashcards
“Think of an evaluation study involving two competing curricula [A & B], where the objective is to maximize student motivation. Suppose that you can conduct the study using random sampling and assignment of students selected from a particular school system, and that a good method of measuring student motivation is available to you” (Jaeger, 1990, p. 193).
a. State an appropriate null hypothesis
There would be no difference between the two curricula (A & B) in terms of average level of student motivation
“Think of an evaluation study involving two competing curricula [A & B], where the objective is to maximize student motivation. Suppose that you can conduct the study using random sampling and assignment of students selected from a particular school system, and that a good method of measuring student motivation is available to you” (Jaeger, 1990, p. 193).
b. State an appropriate alternative hypothesis
There would be a difference between the two curricula (A & B) in terms of average level of student motivation
“Think of an evaluation study involving two competing curricula [A & B], where the objective is to maximize student motivation. Suppose that you can conduct the study using random sampling and assignment of students selected from a particular school system, and that a good method of measuring student motivation is available to you” (Jaeger, 1990, p. 193).
c. Describe a Type I error in the context of this study
Doug Answer: Concluding that one curriculum motivates students better (i.e., rejecting the null hypothesis of no difference) when, in reality, there is no difference in student motivation between curricula A and B.
Blake Answer: A Type I error would be committed if the null hypothesis of no difference in average motivation were to be rejected, even though both curricula were to produce the same average motivation (in the school-system population).
“Think of an evaluation study involving two competing curricula [A & B], where the objective is to maximize student motivation. Suppose that you can conduct the study using random sampling and assignment of students selected from a particular school system, and that a good method of measuring student motivation is available to you” (Jaeger, 1990, p. 193).
d. Describe a Type II error in the context of this study
Doug Answer: Failing to conclude that one curriculum motivates students better (i.e., retaining the null hypothesis of no difference) when, in reality, one curriculum does motivate students better.
Blake Answer: A Type II error would be committed if the null hypothesis of no difference in average motivation were to be retained (i.e., we fail to reject H0), even though one curriculum were to produce a higher average level of motivation than the other (in the school-system population).
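Since the notes contain no code, here is an illustrative sketch in Python of what a Type I error rate means for the curriculum study. All numbers (motivation mean 50, SD 10, group size 30) are hypothetical, and a simple z test with known SD is used for clarity; the point is that when the null hypothesis is true, a test at alpha = .05 falsely rejects it about 5% of the time.

```python
# Hypothetical simulation (not from the flashcards): both "curricula" draw
# motivation scores from the SAME population, so H0 is true and every
# rejection below is a Type I error.
import math
import random

random.seed(42)

def two_sample_z(a, b, sigma):
    """z statistic for a difference in means, with sigma assumed known."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    return (sum(a) / len(a) - sum(b) / len(b)) / se

trials, rejections = 2000, 0
for _ in range(trials):
    a = [random.gauss(50, 10) for _ in range(30)]  # curriculum A (hypothetical)
    b = [random.gauss(50, 10) for _ in range(30)]  # curriculum B (hypothetical)
    if abs(two_sample_z(a, b, sigma=10)) > 1.96:   # two-tailed test, alpha = .05
        rejections += 1

type1_rate = rejections / trials
print(round(type1_rate, 3))  # lands near .05, by design
```

A Type II error rate could be simulated the same way by giving the two groups different population means and counting failures to reject.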
- Kay interviews a sample of females and males. She wants to compare the average amount of beer consumed per week by females with the average amount consumed by males. What t-test should Kay use?
a) related
b) dependent
c) within-groups
d) independent
e) paired-samples
d) independent
The t-test that Kay should use is the independent two-sample t-test. This is because she wants to compare the mean of two independent groups (females and males) that come from two different populations.
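As a minimal sketch of the independent-samples comparison, the Welch t statistic can be computed directly from its definition. The beers-per-week numbers below are invented for illustration, not taken from the card.

```python
# Independent two-sample (Welch) t statistic, computed from its definition.
# Data are hypothetical beers-per-week values for two independent groups.
import math
from statistics import mean, variance

females = [4, 6, 5, 3, 7, 5, 4]
males = [8, 9, 7, 10, 6, 9, 8]

def welch_t(x, y):
    # Standard error uses each group's own sample variance (Welch's form)
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se

t = welch_t(females, males)
print(round(t, 2))  # -4.57: females' mean is well below males' mean
```

In practice the p-value would come from the t distribution (e.g., `scipy.stats.ttest_ind`); the sketch stops at the statistic to stay dependency-free.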
- Barry wants to examine differences in customer satisfaction, which is measured using an interval (metric) scale, based on customers' frequency of patronage, which provides categorical data indicating three levels of patronage: occasional, frequent, & very frequent. What type of statistical analysis would be most appropriate for Barry to use?
a) Chi-square test
b) One-way ANOVA
c) 2 x 3 factorial ANOVA
d) MANOVA
e) Multivariate
b) One-way ANOVA
The type of statistical analysis that would be most appropriate for Barry to use is the one-way ANOVA. This is because he wants to compare the means of a normally distributed interval dependent variable (customer satisfaction) across three levels of a categorical independent variable (frequency of patronage).
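The one-way ANOVA F statistic can be built from its two sums of squares. The satisfaction ratings per patronage group below are invented for illustration; the structure (one metric DV, one three-level categorical IV) matches Barry's design.

```python
# One-way ANOVA F statistic from first principles.
# Hypothetical satisfaction scores for three patronage groups.
from statistics import mean

groups = {
    "occasional": [3, 4, 3, 2, 4],
    "frequent": [4, 5, 4, 4, 5],
    "very frequent": [5, 5, 6, 6, 5],
}

samples = list(groups.values())
grand = mean([x for g in samples for x in g])
k = len(samples)                       # number of groups
n = sum(len(g) for g in samples)      # total sample size

# Between-groups SS: how far group means sit from the grand mean
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in samples)
# Within-groups SS: spread of scores around their own group mean
ss_within = sum((x - mean(g)) ** 2 for g in samples for x in g)

f = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(f, 2))  # 14.0 for this made-up data
```

A large F (relative to the F distribution with k-1 and n-k degrees of freedom) leads to rejecting the null hypothesis of equal group means.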
- What is the difference between MANOVA and ANOVA?
a) MANOVA examines group differences across multiple metric dependent variables at the same time, whereas ANOVA examines group differences for only a single metric dependent variable.
b) MANOVA has several independent variables while ANOVA only has one.
c) MANOVA examines group differences across multiple nonmetric dependent variables at the same time, whereas ANOVA uses multiple metric dependent variables.
d) MANOVA indicates where differences are, whereas ANOVA can only indicate that differences in group means exist.
a) MANOVA examines group differences across multiple metric dependent variables at the same time, whereas ANOVA examines group differences for only a single metric dependent variable.
The other options are incorrect because they either confuse the number of independent variables or the type of dependent variables used in MANOVA and ANOVA.
- What measures the degree of covariation between two variables?
a) alpha
b) multicollinearity
c) correlation coefficient
d) statistical significance
c) correlation coefficient
The measure that indicates the degree of covariation between two variables is the correlation coefficient.
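The coefficient can be sketched directly from its definition (covariation scaled by the variables' spreads). The x/y data below are hypothetical.

```python
# Pearson correlation coefficient from its definition, on hypothetical data.
import math
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = mean(x), mean(y)
# Numerator: the covariation of x and y around their means
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
# Denominator: scales r to lie between -1 and +1
r = cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
print(round(r, 3))  # 0.775: a strong positive correlation
```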
- Which statistic represents the amount of variation explained or accounted for in one variable by one or more other variables, and is the square of the correlation (or multiple correlation) coefficient?
a) Pearson correlation
b) Coefficient of determination
c) Likert correlation
d) Spearman’s rho
e) χ²
b) Coefficient of determination
The coefficient of determination represents the amount of variation in one variable explained or accounted for by one or more other variables; it is the square of the correlation (or multiple correlation) coefficient.
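A one-line worked example, using a hypothetical correlation of .60:

```python
# Squaring a (hypothetical) correlation gives the coefficient of determination.
r = 0.60
r_squared = r ** 2
print(r_squared)  # 0.36 -> 36% of the variance in one variable is explained
```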
- Sam has measured brand loyalty, price sensitivity, and disposable income to predict purchase intentions. All variables were measured on a 5-point Likert-type scale. Which analysis should she use?
a) Independent samples t-test
b) Dependent samples t-test
c) ANOVA
d) Wilcoxon’s test
e) Multiple regression
e) Multiple regression
The analysis that Sam should use is multiple regression. This is because she wants to predict a continuous dependent variable (purchase intentions) using three continuous independent variables (brand loyalty, price sensitivity, and disposable income).
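A hedged sketch of Sam's analysis as ordinary least squares: three predictors, one outcome. All Likert-style ratings below are invented, and the variable layout is an assumption for illustration only.

```python
# Multiple regression (OLS) on hypothetical 5-point ratings.
import numpy as np

# Columns: brand loyalty, price sensitivity, disposable income (IVs)
X = np.array([
    [5, 2, 4],
    [4, 3, 3],
    [2, 4, 2],
    [1, 5, 1],
    [3, 3, 3],
    [5, 1, 5],
], dtype=float)
y = np.array([5, 4, 2, 2, 3, 4], dtype=float)  # purchase intentions (DV)

# Prepend an intercept column and solve the least-squares problem
X1 = np.column_stack([np.ones(len(X)), X])
coefs, *_ = np.linalg.lstsq(X1, y, rcond=None)

# R-squared: share of DV variance accounted for by the three predictors
y_hat = X1 @ coefs
r_squared = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(float(r_squared), 2))
```

The `coefs` vector holds the intercept followed by one regression weight per predictor, which is what a package such as statsmodels or SPSS would report (with standard errors and p-values added).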
Interpret this table.
Table 16.A shows descriptive statistics – means, standard deviations, minimums, and maximums – for sales, number of salespersons, population, per capita income, and advertising.
The sales variable, reported in 1000s of dollars, shows the average (mean) sales value to be $75,100 (SD=$8,600), ranging from a low of $45,200 to a high of $97,300. These appear to be sales values across the 50 different locations.
The average number of salespersons, presumably per location, is 25 (SD=6 salespersons), with the smallest location having 5 salespersons and the largest location having 50.
Population values for the cities in which the company operates range from 278 (2.78 x 100) to 712 (7.12 x 100) with a mean of 510 (SD=80 people). That seems odd. The population values seem small. We should check to make sure there isn’t an error in the table (e.g., missing one or more zeros in the units used to measure population). I can’t do that in this case, so I’ll make the best of the available information for now and flag the possible error.
It looks like the per capita incomes in the different cities range from $10,100 to $75,900 with a mean of $20,300 (SD=$20,100).
I suppose the advertising variable refers to the amount spent on advertising in the company’s different locations, though we are not given good descriptions of that or any of the variables. The advertising variable ranges from $6,100 to $15,700 with a mean of $10,300 (SD=$5,000). The table doesn’t specify the time period for this or other variables.
The various dollar amounts are probably annual, but I would double-check while I was looking into the population size puzzle.
Interpret this table.
Table 16.B is a correlation matrix. The value at the intersection of each row and column is a correlation coefficient between the variables identified in that row and that column. The notes on the table tell us about statistical significance. If we adopt alpha of .05, we see in the note that any correlation coefficient with an absolute value of .15 or greater is statistically significant.
The number of salespersons, population size, income level, and advertising expenditures were all strong predictors of sales. That is, each of the (predictor) variables demonstrated a significant and sizable correlation with sales (our likely DV or criterion). All of the correlations in the first column were above the general guideline for what constitutes a large effect size (r=.50). For example, the correlation between number of salespersons and sales was r=.76!
It is puzzling that the correlation (r=.06) between population and number of salespersons was small and non-significant. How does the company decide how many salespersons to employ in a given location? It seems odd that knowing the size of a community tells us nothing about how many salespersons the company employs in that location. The company does seem to use more salespersons in communities with higher per capita income (r=.21). It also spends more advertising dollars in higher-income communities (r=.23). Those latter correlations are significant, small-to-medium sized effects.
It is less surprising to see no significant correlation (and a small effect of r=.11) between population and per capita income since larger and smaller communities can include different levels of wealth.
Advertising expenditures were also positively correlated with the number of salespersons (r=.16, significant but toward the small end of the effect size continuum) and population (r=.36, a significant effect that exceeds the medium effect-size benchmark of .30). The company tends to spend more advertising dollars in larger communities and in locations where they employ more salespersons.
NOTE: I picked an alpha level and interpreted the results according to my choice (.05) and effect-size interpretation guidelines. If you chose a more stringent alpha (.001), that’s fine as long as you did not include any interpretations that suggest some correlations are “more significant” than others – that’s misleading and misguided.
Interpret this table.
Table 16.D reports the results of a multiple regression analysis. The R-squared=.44 (a coefficient of determination) means that 44% of the variance in sales, in this sample, was explained by the set of independent variables used in the model. Sales was not identified specifically as the DV, but the other variables are IVs in the model. Note that the adjusted R-squared value is lower, which relates to the likely performance of this model in other samples.
The F test for the overall regression model (F=5.278) had a “sig.” (p-value) of .000, i.e., p < .001. That is less than .05 (my a priori alpha), so the overall regression model is statistically significant (i.e., we reject the null hypothesis).
Considering the set of predictor variables identified in the bottom part of Table 16.D, we see that in the context of this particular model neither population nor per capita income contributed significantly to the prediction of sales. That is, their Betas (regression weights) were small and their p-values (identified as “Sig. t” in the table) were above alpha of .05. Each of the other variables added significantly to the multiple regression equation’s prediction of sales. Advertisement was weighted most highly (.47), followed by number of salespersons (.34), and training of salespersons (.28).
Remember that we need to interpret regression coefficients with caution because changes in the model can impact relative importance estimates among the IVs.
Interpret this table.
Table 16.C reports the results of a one-way analysis of variance (ANOVA) for the dependent variable sales by level of education (the independent variable in that analysis).
We are not given much information about the level-of-education variable. Does it refer to salespersons' education levels or community education levels? I am guessing the former, but this exercise highlights the need for clarity in reporting methods and results. (My guess is based partly on Table 16.D, which includes a training-of-salespersons variable that could be the same as this level-of-education variable, but I'm not certain about that.)
Based on the fact that they ran an ANOVA, we can presume that there are at least three categorical (ordinal) groups representing different levels of education. The F-test value was 3.6 with a reported “significance of F” (or p-value) of .01, which is statistically significant assuming an a priori alpha level of .05. That means that groups with different levels of education do not have equivalent sales (i.e., we reject the null hypothesis that there is no difference among the groups).
We would need to conduct follow-up or post-hoc statistical tests to determine where exactly differences lie, but there seems to be something about education worth examining further.
Make recommendations based on your interpretation of the results.
My recommendations are tentative given the need to get more information about, and check on, some variables (e.g., whether the population scale is in 100s), as noted in responses to the previous parts of this exercise.
Nevertheless, it does not seem that population and per capita income are fundamental considerations when choosing a city in which to do business, at least not compared to having an adequate number of trained salespeople with sufficient advertising support. The population variability is quite narrow which could attenuate (reduce) correlations. Still, targeting similar-sized cities, which seems to be happening, could be revisited to see if this is overly restrictive and decreases overall sales.
I would also recommend greater clarity in reporting and table construction to eliminate confusion and the associated guess work that was needed in this exercise.
Data coding
– Assigning numbers to responses
* Mutually exclusive & collectively exhaustive
– A code book to keep things organized
Data entry
– Direct entry by respondents (e.g., electronic
questionnaires)
– Hand entry (keyboarding)
Frequency distribution
Displays the number of responses associated with
each value of the variable
Various ways of displaying the data
– E.g., histograms, bar charts, pie charts
Frequency distributions come in many shapes and
sizes
– We will focus on the normal distribution, where data
is distributed symmetrically around the mean
* Characterized by the bell-shaped curve
Measures of Central Tendency
- Mean
- Median
- Mode
Mean
– The average score (i.e., add up all the scores and
divide by the total number of scores)
– Most commonly used central tendency statistic
Median
– The middle score when scores are ranked in order of
magnitude
– Can be informative because it is relatively unaffected
by extreme scores
Mode
– Score that occurs most frequently in the dataset
– The mode can often take on several values
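The three measures above can be computed with Python's standard `statistics` module; the scores below are a small hypothetical dataset.

```python
# Mean, median, and mode of a hypothetical set of scores.
from statistics import mean, median, mode

scores = [2, 3, 3, 4, 5, 5, 5, 6, 7]

print(mean(scores))    # sum of scores / number of scores (about 4.44)
print(median(scores))  # middle score when ranked: 5
print(mode(scores))    # most frequent score: 5
```

Note that `mode` raises an error on some ties in older Python versions; `statistics.multimode` returns all modes when the mode takes on several values.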
Measures of Dispersion
- Range
- Variance
- Standard deviation
Range
(largest score) - (smallest score)
– Minimum and maximum score
Variance
average squared variability (spread) of the data
– Average squared error (deviation) between the mean and our observations
– Not easily interpreted, because it is measured in squared units
Standard deviation
square root of the variance
– Average variability (spread) of a set of data measured
in the same unit of measurement as the original data
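The three dispersion measures follow directly from their definitions; the data below are hypothetical and chosen so the values come out round.

```python
# Range, population variance, and standard deviation of hypothetical data.
from statistics import pvariance, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]

rng = max(data) - min(data)  # largest score minus smallest score
var = pvariance(data)        # average squared deviation from the mean
sd = pstdev(data)            # square root of the variance, in original units

print(rng, var, sd)  # 7 4 2.0
```

`pvariance`/`pstdev` divide by N (the "average" spread described above); `variance`/`stdev` divide by N-1 for sample estimates.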
68-95-99.7 rule
In a normal distribution, about 68% of the values lie within 1
standard deviation (SD) of the mean, about 95% within 2
SD, and 99.7% within 3 SD
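The rule can be checked by simulation. The mean of 100 and SD of 15 below are hypothetical (typical of many standardized scales).

```python
# Checking the 68-95-99.7 rule on simulated normal data.
import random

random.seed(0)
mu, sd = 100, 15
values = [random.gauss(mu, sd) for _ in range(100_000)]

shares = {}
for k in (1, 2, 3):
    inside = sum(mu - k * sd <= v <= mu + k * sd for v in values)
    shares[k] = inside / len(values)
    print(f"within {k} SD: {shares[k]:.3f}")  # about .683, .954, .997
```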
What is a correlation?
It is a way of measuring the extent to which two
variables are related
Describes the strength and direction of the
relationship between two variables
Three questions we should ask about a correlation
coefficient
- What is the strength of the relationship?
- What is the direction of the relationship?
- Is the relationship statistically significant (e.g., at p < .05)?
(more on this point later)
The correlation coefficient is an effect size
– ±.1 = small effect / ±.3 = medium effect / ±.5 = large effect
The correlation coefficient varies between
-1 and +1
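The effect-size guidelines above can be expressed as a small helper. The function name and the "negligible" label for values below .1 are my own additions for illustration.

```python
# Classifying a correlation by the small/medium/large guidelines above.
def effect_size_label(r):
    assert -1.0 <= r <= 1.0, "a correlation must lie between -1 and +1"
    a = abs(r)  # direction (sign) does not affect effect size
    if a >= 0.5:
        return "large"
    if a >= 0.3:
        return "medium"
    if a >= 0.1:
        return "small"
    return "negligible"

print(effect_size_label(0.76))   # large
print(effect_size_label(-0.21))  # small
```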
Direction of the Correlation
Positive correlation (+)
- The correlation is said to be positive if the values of two variables change in the same direction
- As A is increasing, B is increasing
- As A is decreasing, B is decreasing
- Example: Height and weight
Direction of the Correlation
Negative correlation (-)
- The correlation is said to be negative when the values of two variables change in the opposite direction
- As A is increasing, B is decreasing
- As A is decreasing, B is increasing
- Example: Hours of Netflix watched and academic grades
Perfect positive correlation
r = +1.0
Strong positive correlation
r = +0.8
Moderate positive correlation
r = +0.4
Perfect negative correlation
r = -1.0
Strong negative correlation
r = -0.8
Weak negative correlation
r = -0.2
No correlation
r = 0.0