Chapter 15 & 16 - Quantitative Data Analysis I & II & III Flashcards

1
Q

“Think of an evaluation study involving two competing curricula [A & B], where the objective is to maximize student motivation. Suppose that you can conduct the study using random sampling and assignment of students selected from a particular school system, and that a good method of measuring student motivation is available to you” (Jaeger, 1990, p. 193).

a. State an appropriate null hypothesis

A

There would be no difference between the two curricula (A & B) in terms of average level of student motivation

2
Q

“Think of an evaluation study involving two competing curricula [A & B], where the objective is to maximize student motivation. Suppose that you can conduct the study using random sampling and assignment of students selected from a particular school system, and that a good method of measuring student motivation is available to you” (Jaeger, 1990, p. 193).

b. State an appropriate alternative hypothesis

A

There would be a difference between the two curricula (A & B) in terms of average level of student motivation

3
Q

“Think of an evaluation study involving two competing curricula [A & B], where the objective is to maximize student motivation. Suppose that you can conduct the study using random sampling and assignment of students selected from a particular school system, and that a good method of measuring student motivation is available to you” (Jaeger, 1990, p. 193).

c. Describe a Type I error in the context of this study

A

Doug Answer: The null hypothesis (There is no difference in student motivation between curricula A and B) is false. One curriculum motivates the students better.

Blake Answer: A Type I error would be committed if the null hypothesis of no difference in average motivation were to be rejected, even though both curricula were to produce the same average motivation (in the school-system population).

4
Q

“Think of an evaluation study involving two competing curricula [A & B], where the objective is to maximize student motivation. Suppose that you can conduct the study using random sampling and assignment of students selected from a particular school system, and that a good method of measuring student motivation is available to you” (Jaeger, 1990, p. 193).

d. Describe a Type II error in the context of this study

A

Doug Answer: The null hypothesis (There is no difference in student motivation between curricula A and B) is true. One curriculum does not motivate the students better.

Blake Answer: A Type II error would be committed if the null hypothesis of no difference in average motivation were to be retained (i.e., we fail to reject H0), even though one curriculum were to produce a higher average level of motivation than the other (in the school-system population).

5
Q
  Kay interviews a sample of females and
    males. She wants to compare the average
    amount of beer consumed per week by
    females with the average amount consumed
    by males. What t-test should Kay use?
    a) related
    b) dependent
    c) within-groups
    d) independent
    e) paired-samples
A

d) independent

The t-test that Kay should use is the independent two-sample t-test. This is because she wants to compare the mean of two independent groups (females and males) that come from two different populations.
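As a sketch with made-up numbers (not from the source), the independent two-sample t statistic can be computed by hand; this version uses the Welch (unequal-variances) form:

```python
import statistics


def independent_t(sample_a, sample_b):
    """Welch's independent two-sample t statistic (does not assume equal variances)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)  # sample variances
    se = (var_a / len(sample_a) + var_b / len(sample_b)) ** 0.5  # standard error of the difference
    return (mean_a - mean_b) / se


# Hypothetical weekly beer consumption for two independent groups
females = [2, 3, 4]
males = [6, 7, 8]
t = independent_t(females, males)  # the group means differ by about 4.9 standard errors
```

The statistic would then be compared to a t distribution (or a critical value) to obtain a p-value.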

6
Q
  Barry wants to examine differences in customer satisfaction,
    which is measured using an interval (metric) scale, based on
    customers’ frequency of patronage, which provides categorical
    data indicating three levels of patronage: occasional, frequent,
    & very frequent. What type of statistical analysis would be
    most appropriate for Barry to use?
    a) Chi-square test
    b) One-way ANOVA
    c) 2 x 3 factorial ANOVA
    d) MANOVA
    e) Multivariate
A

b) One-way ANOVA

The type of statistical analysis that would be most appropriate for Barry to use is the one-way ANOVA. This is because he wants to compare the means of a normally distributed interval dependent variable (customer satisfaction) across three levels of a categorical independent variable (frequency of patronage).
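A minimal sketch of the one-way ANOVA F statistic, using invented satisfaction scores for the three patronage levels (the group data are assumptions, not from the source):

```python
import statistics


def one_way_f(groups):
    """F statistic for a one-way ANOVA: between-group vs. within-group variability."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical satisfaction scores by patronage level
occasional = [1, 2, 3]
frequent = [2, 3, 4]
very_frequent = [3, 4, 5]
f_stat = one_way_f([occasional, frequent, very_frequent])
```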

7
Q
  What is the difference between MANOVA and
    ANOVA?
    a) MANOVA examines group differences across multiple metric dependent variables at the same time, whereas ANOVA examines group differences for only a single metric dependent variable.
    b) MANOVA has several independent variables while ANOVA only has one.
    c) MANOVA examines group differences across multiple nonmetric dependent variables at the same time, whereas ANOVA uses multiple metric dependent variables.
    d) MANOVA indicates where differences are, whereas ANOVA can only indicate that differences in group means exist.
A

a) MANOVA examines group differences across multiple metric dependent variables at the same time, whereas ANOVA examines group differences for only a single metric dependent variable.

The other options are incorrect because they either confuse the number of independent variables or the type of dependent variables used in MANOVA and ANOVA.

8
Q
  What measures the degree of covariation
    between two variables?
    a) alpha
    b) multicollinearity
    c) correlation coefficient
    d) statistical significance
A

c) correlation coefficient

The measure that indicates the degree of covariation between two variables is the correlation coefficient.
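A short illustration of the Pearson correlation coefficient computed from its definition (the data are invented to show a perfect positive and a perfect negative relationship):

```python
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient: covariation scaled to the range -1 to +1."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # perfect positive relationship
r_neg = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # perfect negative relationship
```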

9
Q
  Which statistic represents the amount of
    variation explained or accounted for in one
    variable by one or more other variables, and it is
    the square of the correlation (or multiple
    correlation) coefficient?
    a) Pearson correlation
    b) Coefficient of determination
    c) Likert correlation
    d) Spearman’s rho
    e) χ²
A

b) Coefficient of determination

The statistic that represents the amount of variation explained or accounted for in one variable by one or more other variables, and it is the square of the correlation (or multiple correlation) coefficient, is the coefficient of determination.

10
Q
  Sam has measured brand loyalty, price
    sensitivity, and disposable income to predict
    purchase intentions. All variables were
    measured on a 5-point Likert-type scale.
    Which analysis should she use?
    a) Independent samples t-test
    b) Dependent samples t-test
    c) ANOVA
    d) Wilcoxon’s test
    e) Multiple regression
A

e) Multiple regression

The analysis that Sam should use is multiple regression. This is because she wants to predict a continuous dependent variable (purchase intentions) using three continuous independent variables (brand loyalty, price sensitivity, and disposable income).
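A minimal sketch of multiple regression via the normal equations, using invented noise-free data (generated from y = 1 + 2·x1 + 3·x2, so OLS recovers the coefficients exactly); the variable names are illustrative only:

```python
def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    m = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * c for a, c in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def multiple_regression(rows, y):
    """OLS coefficients [intercept, b1, b2, ...] from the normal equations X'X b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(len(X))) for b in range(k)] for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(len(X))) for a in range(k)]
    return solve(XtX, Xty)


# Illustrative two-predictor data with an exact linear relationship
predictors = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
outcome = [1, 3, 4, 6, 8]
coefs = multiple_regression(predictors, outcome)  # approximately [1.0, 2.0, 3.0]
```

In practice a statistics package would be used, but the normal equations show what the package is estimating.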

11
Q

Interpret this table.

A

Table 16.A shows descriptive statistics – means, standard deviations, minimums, and maximums – for sales, number of salespersons, population, per capita income, and advertising.

The sales variable, reported in 1000s of dollars, shows the average (mean) sales value to be $75,100 (SD=$8,600), ranging from a low of $45,200 to a high of $97,300. These appear to be sales values across the 50 different locations.

The average number of salespersons, presumably per location, is 25 (SD=6 salespersons), with the smallest location having 5 salespersons and the largest location having 50.

Population values for the cities in which the company operates range from 278 (2.78 x 100) to 712 (7.12 x 100) with a mean of 510 (SD=80 people). That seems odd. The population values seem small. We should check to make sure there isn’t an error in the table (e.g., missing one or more zeros in the units used to measure population). I can’t do that in this case, so I’ll make the best of the available information for now and flag the possible error.

It looks like the per capita incomes in the different cities range from $10,100 to $75,900 with a mean of $20,300 (SD=$20,100).

I suppose the advertising variable refers to the amount spent on advertising in the company’s different locations, though we are not given good descriptions of that or any of the variables. The advertising variable ranges from $6,100 to $15,700 with a mean of $10,300 (SD=$5,000). The table doesn’t specify the time period for this or other variables.

The various dollar amounts are probably annual, but I would double-check while I was looking into the population size puzzle.

12
Q

Interpret this table.

A

Table 16.B is a correlation matrix. The value at the intersection of each row and column is a correlation coefficient between the variables identified in that row and that column. The notes on the table tell us about statistical significance. If we adopt alpha of .05, we see in the note that any correlation coefficient with an absolute value of .15 or greater is statistically significant.

The number of salespersons, population size, income level, and advertising expenditures were all strong predictors of sales. That is, each of the (predictor) variables demonstrated a significant and sizable correlation with sales (our likely DV or criterion). All of the correlations in the first column were above the general guideline for what constitutes a large effect size (r=.50). For example, the correlation between number of salespersons and sales was r=.76!

It is puzzling that the correlation (r=.06) between population and number of salespersons was small and non-significant. How does the company decide how many salespersons to employ in a given location? It seems odd that knowing the size of a community tells us nothing about how many salespersons the company employs in that location. The company does seem to use more salespersons in communities with higher per capita income (r=.21). It also spends more advertising dollars in higher-income communities (r=.23). Those latter correlations are significant, small-to-medium sized effects.

It is less surprising to see no significant correlation (and a small effect of r=.11) between population and per capita income since larger and smaller communities can include different levels of wealth.

Advertising expenditures were also positively correlated with the number of salespersons (r=.16, significant but toward the small end of the effect size continuum) and population (r=.36, a significant effect that exceeds the medium effect-size benchmark of .30). The company tends to spend more advertising dollars in larger communities and in locations where they employ more salespersons.

NOTE: I picked an alpha level and interpreted the results according to my choice (.05) and effect-size interpretation guidelines. If you chose a more stringent alpha (.001), that’s fine as long as you did not include any interpretations that suggest some correlations are “more significant” than others – that’s misleading and misguided.

13
Q

Interpret this table.

A

Table 16.D reports the results of a multiple regression analysis. The R-squared=.44 (a coefficient of determination) means that 44% of the variance in sales, in this sample, was explained by the set of independent variables used in the model. Sales was not identified explicitly as the DV, but the other variables serve as IVs in the model. Note that the adjusted R-squared value is lower, which relates to the possible performance of this model in other samples.

The F test for the overall regression model (=5.278) had a “sig.” (p-value) of .000. That is less than .05 (my a priori alpha), so the overall regression model is statistically significant (i.e., reject the null hypothesis).

Considering the set of predictor variables identified in the bottom part of Table 16.D, we see that in the context of this particular model neither population nor per capita income contributed significantly to the prediction of sales. That is, their Betas (regression weights) were small and their p-values (identified as “Sig. t” in the table) were above alpha of .05. Each of the other variables added significantly to the multiple regression equation’s prediction of sales. Advertisement was weighted most highly (.47), followed by number of salespersons (.34), and training of salespersons (.28).

Remember that we need to interpret regression coefficients with caution because changes in the model can impact relative importance estimates among the IVs.

13
Q

Interpret this table.

A

Table 16.C reports the results of a one-way analysis of variance (ANOVA) for the dependent variable sales by level of education (the independent variable in that analysis).

We are not given much information about the level-of-education variable. Does it refer to salespersons' education levels or community education levels? I am guessing the former, but this exercise highlights the need for clarity in the reporting of methods and results. (My guess is based partly on Table 16.D, which includes a training-of-salespersons variable that could be the same as this level-of-education variable, but I'm not certain about that.)

Based on the fact that they ran an ANOVA, we can presume that there are at least three categorical (ordinal) groups representing different levels of education. The F-test value was 3.6 with a reported “significance of F” (or p-value) of .01, which is statistically significant assuming an a priori alpha level of .05. That means that groups with different levels of education do not have equivalent sales (i.e., we reject the null hypothesis that there is no difference among the groups).

We would need to conduct follow-up or post-hoc statistical tests to determine where exactly differences lie, but there seems to be something about education worth examining further.

14
Q

Make recommendations based on your interpretation of the results.

A

My recommendations are tentative given the need to get more information about, and check on, some variables (e.g., the population scale in 100s?), as noted in responses to the previous parts of this exercise.

Nevertheless, it does not seem that population and per capita income are fundamental considerations when choosing a city in which to do business, at least not compared to having an adequate number of trained salespeople with sufficient advertising support. The population variability is quite narrow which could attenuate (reduce) correlations. Still, targeting similar-sized cities, which seems to be happening, could be revisited to see if this is overly restrictive and decreases overall sales.

I would also recommend greater clarity in reporting and table construction to eliminate confusion and the associated guess work that was needed in this exercise.

15
Q

Data coding

A

– Assigning numbers to responses
* Mutually exclusive & collectively exhaustive
– A code book to keep things organized

16
Q

Data entry

A

– Direct entry by respondents (e.g., electronic
questionnaires)
– Hand entry (keyboarding)

17
Q

Frequency distribution

A

Displays the number of responses associated with
each value of the variable

18
Q

Various ways of displaying the data

A

– E.g., histograms, bar charts, pie charts

19
Q

Frequency distributions come in many shapes and
sizes

A

– We will focus on the normal distribution, where data
is distributed symmetrically around the mean
* Characterized by the bell-shaped curve

20
Q

Measures of Central Tendency

A
  • Mean
  • Median
  • Mode
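The three measures can be computed directly with Python's statistics module; the small data set here is invented to show how an extreme score pulls the mean but not the median:

```python
import statistics

# Illustrative scores with one extreme value (10)
scores = [1, 2, 2, 3, 10]
mean = statistics.mean(scores)      # 3.6 -- pulled upward by the extreme score
median = statistics.median(scores)  # 2   -- relatively unaffected by the extreme score
mode = statistics.mode(scores)      # 2   -- the most frequent value
```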
21
Q

Mean

A

– The average score (i.e., add up all the scores and
divide by the total number of scores)
– Most commonly used central tendency statistic

22
Q

Median

A

– The middle score when scores are ranked in order of
magnitude
– Can be informative because it is relatively unaffected
by extreme scores

23
Q

Mode

A

– Score that occurs most frequently in the dataset
– The mode can often take on several values

24
Q

Measures of Dispersion

A
  • Range
  • Variance
  • Standard deviation
25
Q

Range

A

(largest score) - (smallest score)
– Minimum and maximum score

26
Q

Variance

A

average variability (spread) of the data
– Average error between the mean and our observations
– Not easily interpretable, because it is squared

27
Q

Standard deviation

A

square root of the variance
– Average variability (spread) of a set of data measured
in the same unit of measurement as the original data
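The three dispersion measures above, and the square-root relationship between variance and standard deviation, can be checked on a small invented data set (population formulas shown here):

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative scores
data_range = max(data) - min(data)     # 7
variance = statistics.pvariance(data)  # 4.0 -- in squared units, so hard to interpret
std_dev = statistics.pstdev(data)      # 2.0 -- back in the original unit of measurement
assert math.isclose(std_dev, math.sqrt(variance))
```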

28
Q

68-95-99.7 rule

A

In a normal distribution, about 68% of the values lie within 1
standard deviation (SD) of the mean, about 95% within 2
SD, and 99.7% within 3 SD
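The rule can be recovered from the standard normal distribution using the error function in Python's math module:

```python
import math


def prob_within(k):
    """P(|Z| < k) for a standard normal variable Z, via the error function."""
    return math.erf(k / math.sqrt(2))


within_1sd = prob_within(1)  # about 0.683
within_2sd = prob_within(2)  # about 0.954
within_3sd = prob_within(3)  # about 0.997
```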

29
Q

What is a correlation?

A

It is a way of measuring the extent to which two
variables are related

Describes the strength and direction of the
relationship between two variables

30
Q

Three questions we should ask about a correlation
coefficient

A
  • What is the strength of the relationship?
  • What is the direction of the relationship?
  • Is the relationship statistically significant (e.g., at p < .05)?
    (more on this point later)
31
Q

The correlation coefficient is an effect size

A

– ±.1 = small effect / ±.3 = medium effect / ±.5 = large effect

32
Q

The correlation coefficient varies between

A

-1 and +1

33
Q

Direction of the Correlation

Positive correlation (+)

A
  • The correlation is said to
    be positive if the values of
    two variables change in the
    same direction
  • As A is increasing, B is
    increasing
  • As A is decreasing, B is
    decreasing
  • Example: Height and weight
34
Q

Direction of the Correlation

Negative correlation (-)

A
  • The correlation is said to
    be negative when the
    values of two variables
    change in the opposite
    direction
  • As A is increasing, B is
    decreasing
  • As A is decreasing, B is
    increasing
  • Example: Hours of Netflix
    watched and academic
    grades
35
Q

Perfect positive correlation

A

r = +1.0

36
Q

Strong positive correlation

A

r = + 0.8

37
Q

Moderate positive correlation

A

r = + 0.4

38
Q

Perfect negative correlation

A

r = -1.0

39
Q

Strong negative correlation

A

r = -.80

40
Q

Weak negative correlation

A

r = - 0.2

41
Q

No correlation

A

r = 0.0

42
Q

Determine the strength of the relationship

A

– Size of r can range from -1 to +1
– ±.1 = small effect; ±.3 = medium effect; ±.5 = large effect

43
Q

Determine the direction of the relationship

A

– Is there a negative sign in front of the r value? If so, then it is a
negative relationship.

44
Q

Assess the significance level

A

– It is statistically significant if its p-value is less than α (e.g., .05)

45
Q

Coefficient of Determination

A

A convenient way of interpreting the value of the
correlation coefficient is to use the square of the
coefficient of correlation, which is called the
Coefficient of Determination
– Coefficient of Determination = r2

46
Q

Suppose: r = 0.9, r2 = 0.81

A

This would mean that 81% of the variation in the
dependent variable has been explained by the
independent variable

47
Q

The maximum value of r2 is

A

1

Because it is possible to explain all of the variation in
Y, but it is not possible to explain more than all of it

48
Q

Cronbach’s α (also known as Coefficient α)

A
  • Popular measure
    – Don’t confuse this alpha with Type I error rate
  • A function of the average correlation among scale
    items and the number of items on the scale
  • Rule of thumb: Use α >.70 as “acceptable”
  • Text includes discussion of item-level diagnostics
    (see p. 272) [Note: use of , instead of .]
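A minimal sketch of the coefficient alpha formula, alpha = (k/(k-1)) * (1 - sum of item variances / variance of total scores); the item data are invented (two perfectly redundant items, for which alpha reaches its ceiling of 1.0):

```python
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a scale.

    item_scores: one inner list per item, each holding one score per respondent.
    """
    k = len(item_scores)
    item_var_sum = sum(statistics.pvariance(item) for item in item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]  # total score per respondent
    return (k / (k - 1)) * (1 - item_var_sum / statistics.pvariance(totals))


item1 = [1, 2, 3, 4]
item2 = [1, 2, 3, 4]  # identical to item1, so the items are perfectly correlated
alpha = cronbach_alpha([item1, item2])  # 1.0

noisier = cronbach_alpha([[1, 2, 3, 4], [2, 2, 3, 5]])  # below 1.0 when items diverge
```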
49
Q

Inferential statistics

A

Used to go beyond the available (sample) data
to make inferences about characteristics of a
population

50
Q

Null Hypothesis Significance Testing

A
  • The basic process:
    – A result is summarized with a sample statistic
    – The amount of sampling error associated with that statistic is
    estimated
    – The difference between the statistic and the corresponding
    population parameter in the null hypothesis (e.g., 0) is evaluated
    against estimated sampling error
    – A software package will convert the observed (sample) value of a
    test statistic to a probability (p-value; probability of the data given
    that the null hypothesis is true)
  • OR a Table is used to find a critical value that defines the null hypothesis
    rejection region
    – The observed p-value is compared to alpha (α), the a priori level of
    statistical significance. If p < α, the null hypothesis is rejected
  • OR a test statistic is compared to the critical value to determine if the null
    hypothesis should be rejected or retained
51
Q

The Logic of NHST

A
  • “A ritualized exercise of devil’s advocacy” (Abelson, 1995, p. 9).
  • “In hypothesis testing, belief in the validity of the null
    hypothesis continues, unless evidence collected from a sample
    is sufficient to make continued belief appear unreasonable”
    (Jaeger, 1990, p. 164).
    – If “unreasonable,” reject the null hypothesis in
    favour of an alternate hypothesis.
  • SAY: “Reject” or “fail to reject” the null hypothesis.
  • BEWARE of language like “accept” the null.
  • We do not “prove” the alternate hypothesis, if we have
    evidence to reject the null hypothesis we have evidence
    that “supports” the alternate hypothesis
52
Q

What is NHST?

A

Null Hypothesis Significance Testing

53
Q

Steps in NHST

A
  • State the null (H0) and alternative (HA) hypotheses
  • Select the appropriate statistical test based on whether
    data are parametric or nonparametric
    – Interval or ratio data: Use parametric procedures
  • E.g., t-test, ANOVA, Pearson’s correlation, regression, etc.
    – Ordinal or nominal data: Use nonparametric procedures
  • E.g., chi-square test, Spearman rank order coefficient, etc.
  • Decide on the desired level of significance (e.g., α = .05)
  • Collect the data and compute the appropriate test
    statistic to see if level of significance is met (p < α)
  • Reject or do not reject the null hypothesis
  • Evaluate the meaningfulness of the findings
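The steps above can be sketched with a one-sample z-test (chosen because the normal CDF is available in the standard library; a t-test would be used when sigma is unknown). All numbers are hypothetical:

```python
import math


def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-tailed p-value for a one-sample z-test (population sigma assumed known)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # standard normal CDF at |z|
    return 2 * (1 - phi)


# H0: mu = 100. Hypothetical sample of n = 25 with mean 104, known sigma = 10
alpha = 0.05                           # a priori significance level
p = z_test_p_value(104, 100, 10, 25)   # z = 2.0, p is about .046
reject_null = p < alpha                # True: reject H0 at the .05 level
```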
54
Q

What does it mean to say that a result is statistically
significant at the .05-level?

A

– A conditional probability:
* p (Data | Null Hypothesis is True)
* The probability of the data, given the null hypothesis is true, is
< .05.
* Assuming that the null hypothesis is true, and the study is
repeated many times by drawing random samples from the same
population, less than 5% of those results will be even more
inconsistent with the null hypothesis.

55
Q

Regression (Ordinary Least Squares)

Multiple Regression

A

– Uses more than one predictor
– The slopes (b or betas) for
each predictor are affected by
the other variables in the
equation
* Interpret with caution, in
context
– Hierarchical regression and
change in R-squared
* Useful for assessing
incremental explanation of
variance in Y by a second (set
of) predictor(s)
– Use of regression to test
moderation and mediation is
beyond our current scope

56
Q

3 things to check in OLS regression

A

1) What is the proportion of variance
explained by the model? (R2)
2) Is the overall model significant? (F-test)
3) Which regression parameters/coefficients
are significant? (b or beta values)

– Note: Recognize that regression capitalizes on
chance by minimizing errors of prediction for
the sample data → performance with new data?
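The three checks can be illustrated for the one-predictor case, where the quantities have simple closed forms (the data below are invented, imperfectly linear values):

```python
import statistics


def ols_checks(x, y):
    """Simple (one-predictor) OLS: returns (slope, R-squared, F statistic)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_total = sum((b - my) ** 2 for b in y)
    ss_error = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r_squared = 1 - ss_error / ss_total                     # check 1: variance explained
    df_model, df_error = 1, len(x) - 2
    f_stat = (r_squared / df_model) / ((1 - r_squared) / df_error)  # check 2: overall model
    return slope, r_squared, f_stat                         # check 3: the coefficient itself


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, r2, f = ols_checks(x, y)  # slope = 0.6, R-squared = 0.6, F = 4.5
```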

57
Q

Screen and clean data

A

– “Dirty” data can harm conclusion validity and
credibility
* But do not change data to favour hypotheses

58
Q

Data coding

A

In quantitative research data coding involves assigning a number
to the participants’ responses so they can be entered into a
database.

59
Q

Data editing

A

Data editing deals with detecting and correcting illogical,
inconsistent, or illegal data and omissions in the information
returned by the participants of the study.

60
Q

Outlier

A

An observation that is substantially different from the other
observations.

61
Q

Data transformation

A

The process of changing the original numerical representation of
a quantitative value to another value.

62
Q

Descriptive statistics

A

Statistics such as frequencies, the mean, and the standard
deviation, which provide descriptive information about a set of
data.

63
Q

Frequencies

A

The number of times various subcategories of a phenomenon
occur, from which the percentage and cumulative percentage of
any occurrence can be calculated.

64
Q

Measure of central tendency

A

Descriptive statistics of a data set such as the mean, median, or
mode.

65
Q

Measure of dispersion

A

The variability in a set of observations, represented by the range,
variance, standard deviation, and the interquartile range.

66
Q

Mean

A

The average of a set of figures.

67
Q

Median

A

The central item in a group of observations arranged in an
ascending or descending order.

68
Q

Mode

A

The most frequently occurring number in a data set.

69
Q

Range

A

The spread in a set of numbers indicated by the difference in the
two extreme values in the observations.

70
Q

Variance

A

Indicates the dispersion of a variable in the data set, and is
obtained by subtracting the mean from each of the observations,
squaring the results, summing them, and dividing the total by the
number of observations.

71
Q

Standard deviation

A

A measure of dispersion for parametric data; the square root of
the variance.

72
Q

Nonparametric test

A

A hypothesis test that does not require certain assumptions about
the population’s distribution, such as that the population follows
a normal distribution.

73
Q

Correlation matrix

A

A correlation matrix is used to examine relationships between
interval and/or ratio variables.

74
Q

Chi‐square test

A

A nonparametric test that establishes the independence or
otherwise between two nominal variables.
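A minimal sketch of the chi-square statistic for a contingency table, using an invented 2x2 table of counts (the labels are hypothetical):

```python
def chi_square(table):
    """Pearson chi-square statistic for a two-way contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical 2x2 table (e.g., gender by purchase / no purchase)
observed = [[10, 20],
            [20, 10]]
stat = chi_square(observed)  # compare to the chi-square critical value for df = 1
```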

75
Q

Criterion-related validity

A

That which is established when the measure differentiates
individuals on a criterion that it is expected to predict.

76
Q

Factorial validity

A

That which indicates, through the use of factor analytic
techniques, whether a test is a pure measure of some specific
factor or dimension.

77
Q

Convergent validity

A

That which is established when the scores obtained by two
different instruments measuring the same concept, or by
measuring the concept by two different methods, are highly
correlated.

78
Q

Discriminant validity

A

That which is established when two variables are theorized to be
uncorrelated, and the scores obtained by measuring them are
indeed empirically found to be so.

79
Q

Inferential statistics

A

Statistics that help to establish relationships among variables and
draw conclusions therefrom.

80
Q

Type I error (α)

A

The probability of rejecting the null hypothesis when it is actually
true.

81
Q

Type II error (β)

A

The probability of failing to reject the null hypothesis given that
the alternative hypothesis is actually true.

82
Q

Statistical power (1 – β)

A

The probability of correctly rejecting the null hypothesis (i.e., rejecting it when the alternative hypothesis is true).

83
Q

One sample t‐test

A

A test that is used to test the hypothesis that the mean of the
population from which a sample is drawn is equal to a
comparison standard.

84
Q

Paired samples t‐test

A

Test that examines the differences in the same group before and
after a treatment.

85
Q

Wilcoxon signed‐rank test

A

A nonparametric test used to examine differences between two
related samples or repeated measurements on a single sample. It
is used as an alternative to a paired samples t‐test when the
population cannot be assumed to be normally distributed.

86
Q

McNemar’s test

A

A nonparametric method used on nominal data. It assesses the
significance of the difference between two dependent samples
when the variable of interest is dichotomous.

87
Q

Independent samples t‐test

A

Test that is done to see if there are significant differences in the
means for two groups in the variable of interest.

88
Q

Nominal scale

A

A scale that categorizes individuals or objects into mutually
exclusive and collectively exhaustive groups, and offers basic,
categorical information on the variable of interest.

89
Q

Interval scale

A

A multipoint scale that taps the differences, the order, and the
equality of the magnitude of the differences in the responses.

90
Q

Ratio scale

A

A scale that has an absolute zero origin, and hence indicates not
only the magnitude, but also the proportion, of the differences.

91
Q

ANOVA

A

Stands for analysis of variance, which tests for significant mean
differences in variables among multiple groups.

92
Q

Regression analysis

A

Used in a situation where one or more metric independent
variable(s) is (are) hypothesized to affect a metric dependent
variable.

93
Q

Multiple regression analysis

A

A statistical technique to predict the variance in the dependent
variable by regressing the independent variables against it.

94
Q

Standardized regression coefficients (or beta coefficients)

A

The estimates resulting from a multiple regression analysis
performed on variables that have been standardized (a process
whereby the variables are transformed into variables with a mean
of 0 and a standard deviation of 1).

95
Q

Dummy variable

A

A variable that has two or more distinct levels, which are coded 0
or 1.
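One common convention codes a k-level category as k-1 dummy variables, with one level omitted as the reference; the three patronage levels here echo Barry's example but the coding is illustrative:

```python
# Hypothetical 3-level category coded as k-1 = 2 dummy variables,
# with "occasional" as the omitted reference level
def dummy_code(level):
    return {
        "occasional":    (0, 0),  # reference category
        "frequent":      (1, 0),
        "very frequent": (0, 1),
    }[level]


rows = ["frequent", "occasional", "very frequent"]
coded = [dummy_code(r) for r in rows]  # [(1, 0), (0, 0), (0, 1)]
```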

96
Q

Multicollinearity

A

A statistical phenomenon in which two or more independent
variables in a multiple regression model are highly correlated.

97
Q

Discriminant analysis

A

A statistical technique that helps to identify the independent
variables that discriminate a nominally scaled dependent variable
of interest.

98
Q

Logistic regression

A

A specific form of regression analysis in which the dependent
variable is a nonmetric, dichotomous variable.

99
Q

Conjoint analysis

A

A multivariate statistical technique used to determine the relative
importance respondents attach to attributes and the utilities they
attach to specific levels of attributes.

100
Q

Two-way ANOVA

A

A statistical technique that can be used to examine the effect of
two nonmetric independent variables on a single metric
dependent variable.

101
Q

MANOVA

A

A statistical technique that is similar to ANOVA, with the
difference that ANOVA tests the mean differences of more than
two groups on one dependent variable, whereas MANOVA tests
mean differences among groups across several dependent
variables simultaneously, by using sums of squares and cross‐
product matrices.

102
Q

Canonical correlation

A

A statistical technique that examines the relationship between
two or more dependent variables and several independent
variables.

103
Q

Parametric test

A

A hypothesis test that assumes that your data follow a specific
distribution.

104
Q

Operations research

A

A quantitative approach taken to analyze and solve problems of
complexity.

105
Q

Data mining

A

Helps to trace patterns and relationships in the data stored in the
data warehouse.