L9 - Regression analysis Flashcards

Question 1

Q

Cross-tabulation: Chi-square test definition (Malhotra, 2013)

Answer

A

A statistical technique that describes two or more variables simultaneously and results in a table that reflects the joint distribution of two or more variables that have a limited number of categories or distinct values.

Question 2

Q

When to use cross-tabulation

Answer

A

To test the difference or association between variables.
To compare behaviour and intentions across categories of predictor variables such as income, sex and marital status.

Question 3

Q

Role of Cross-tabulation (Malhotra, 2013)

Answer

A

(1) The analysis is simple to conduct and appeals to less sophisticated researchers.
(2) Results can be easily interpreted and understood.
(3) Clear interpretation provides a stronger link between research results and managerial action.
(4) A series of cross-tabulations may give greater insight into a complex phenomenon than a single multivariate analysis.
(5) May alleviate the problem of sparse cells in discrete multivariate analysis.

Question 4

Q

Four possibilities of cross-tabulation for three or more variables (Malhotra, 2013)

Answer

A

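(1) It refines the association observed between the two original variables.
(2) It indicates no association between the two original variables, although an association was initially observed (the initial relationship was spurious).
(3) It reveals some association between the two original variables, although no association was initially observed (the association was suppressed).
(4) It indicates no change in the initial association.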

Question 5

Q

Process in cross-tabulation (Malhotra, 2013)

Answer

A

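1) Test the null hypothesis (H0) that there is no association between the variables, using the chi-square statistic.
2) If H0 is rejected, determine the strength of association using the phi coefficient, contingency coefficient, etc.
3) Interpret the pattern of the relationship by computing the percentages in the direction of the independent variable.
4) Conclude.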

Question 6

Q

Cons of Cross-tabulation (Malhotra, 2013)

Answer

A

1) Can produce an endless variety of cross-tabulation tables.

2) Becomes complex and inefficient when there are several variables, and it only examines the association between variables, not causation.

Question 7

Q

Expected count (expected frequency) calculation

Answer

A

fe = (nr * nc) / n, where nr is the row total, nc is the column total, and n is the total sample size.
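
A minimal sketch of the expected-count formula applied to every cell of a table. The 2x2 counts and the function name `expected_counts` are hypothetical, chosen only for illustration.

```python
# Hypothetical 2x2 table of observed counts (rows and columns are two categorical variables).
observed = [
    [30, 20],
    [10, 40],
]

def expected_counts(table):
    """Return the expected count fe = (nr * nc) / n for every cell."""
    row_totals = [sum(row) for row in table]        # nr for each row
    col_totals = [sum(col) for col in zip(*table)]  # nc for each column
    n = sum(row_totals)                             # total sample size
    return [[nr * nc / n for nc in col_totals] for nr in row_totals]

expected = expected_counts(observed)
print(expected)  # [[20.0, 30.0], [20.0, 30.0]]
```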

Question 8

Q

Chi-square calculation

Answer

A

χ^2 = Σ [ (observed frequency - expected frequency)^2 / expected frequency ], summed over all cells of the table.
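
A short sketch of the chi-square statistic computed from a table of observed counts. The counts are hypothetical. The degrees of freedom use the standard (rows - 1) * (columns - 1) rule, which is not stated on this card.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected frequency of the cell
            stat += (fo - fe) ** 2 / fe             # this cell's contribution
    # Standard rule for a cross-tabulation (an assumption added here, not from the card).
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical observed counts.
observed = [[30, 20], [10, 40]]
stat, df = chi_square(observed)
print(round(stat, 3), df)
```

The statistic is then compared with the critical chi-square value for `df` degrees of freedom at the chosen significance level.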

Question 9

Q

Chi-square analysis definition (slide)

Answer

A

Assesses how closely the observed frequencies fit the pattern of the expected frequencies; this is referred to as a “goodness-of-fit” test (a poor fit leads to rejecting H0).
Used to analyze relationships between nominal-nominal and nominal-ordinal scaled variables.

Question 10

Q

Chi-square distribution definition

Answer

A

A skewed distribution whose shape depends solely on the number of degrees of freedom (df). As the number of df increases, the chi-square distribution becomes more symmetrical.

Question 11

Q

Measures for the strength of association

Answer

A

Phi coefficient (Ф), contingency coefficient, Cramer’s V, lambda coefficient, and other statistics (tau b, tau c, gamma).
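
Two of these measures can be sketched directly from the chi-square statistic. The formulas below are the standard textbook ones and are not given on this card; the function names and input values are hypothetical.

```python
import math

def contingency_coefficient(chi2, n):
    """Contingency coefficient C = sqrt(chi2 / (chi2 + n)); usable for tables of any size."""
    return math.sqrt(chi2 / (chi2 + n))

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

# Hypothetical inputs: chi-square of 50/3 from a 2x2 table with n = 100.
chi2, n = 50 / 3, 100
print(round(contingency_coefficient(chi2, n), 3))
print(round(cramers_v(chi2, n, 2, 2), 3))
```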

Question 12

Q

Phi coefficient definition

Answer

A

A measure of the strength of association in the special case of a table with two rows and two columns (a 2 x 2 table).

Question 13

Q

Phi coefficient (Ф) calculation

Answer

A

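Ф = √ ( χ^2 / n )
+ Ф = 0: no association
+ Ф = 1: perfect positive association
+ Ф = -1: perfect negative association (the square-root formula gives only the magnitude; the sign comes from the direction of the association in the table)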
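
A sketch of phi for a 2 x 2 table with cells a, b (first row) and c, d (second row). The signed version uses the standard cell-count formula, which is an addition here and not from the card; its absolute value matches √(χ^2 / n). All counts are hypothetical.

```python
import math

def phi_magnitude(chi2, n):
    """Phi from the chi-square statistic: always non-negative."""
    return math.sqrt(chi2 / n)

def phi_signed(a, b, c, d):
    """Signed phi from the four cell counts of a 2x2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical table [[30, 20], [10, 40]] with chi-square = 50/3 and n = 100.
print(round(phi_magnitude(50 / 3, 100), 3))
print(round(phi_signed(30, 20, 10, 40), 3))
# Swapping the columns reverses the direction of the association.
print(round(phi_signed(20, 30, 40, 10), 3))
```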

Question 14

Q

Relationships between variables can be described in several ways:

Answer

A

Presence, direction, strength of association, and type of relationship (linear or curvilinear).

Question 15

Q

Covariation definition

Answer

A

The amount of change in one variable that is consistently related to the change in another variable of interest. Put simply, it is the degree of association between two variables.

Question 16

Q

Scatter diagram definition

Answer

A

A graphic plot of the relative position of two variables using a horizontal and a vertical axis to represent the values of the respective variables.

Question 17

Q

Pearson correlation coefficient (Product Moment correlation)

Answer

A

A statistical measure of the strength of a linear relationship between two metric variables.
r varies between -1.00 and 1.00.
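
A minimal sketch of how r can be computed from paired observations, using the usual covariance-over-standard-deviations form. The data and the function name `pearson_r` are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (numerator).
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations (denominator terms).
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical metric data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))
```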

Question 18

Q

Assumption of Pearson Correlation Coefficient:

Answer

A

The two variables are measured on interval or ratio scales.
The relationship between the variables of interest is linear.
The variables being analyzed come from a normally distributed population.

Question 19

Q

When the correlation coefficient is weak, there are two possibilities:

Answer

A

(1) There is no consistent, systematic relationship between the two variables.
(2) The association exists, but it is not linear, and other types of relationships must be investigated further.

Question 20

Q

Coefficient of determination (R^2) definition

Answer

A

Measures the proportion of variation in the dependent variable explained by the independent variable.
The larger the R^2, the stronger the linear relationship.

Question 21

Q

R^2 calculation

Answer

A

R^2 = SSR / TSS (explained variation / total variation)

Question 22

Q

Adjusted R^2

Answer

A

R^2 adjusted for the number of independent variables and the sample size, to account for diminishing returns from adding more independent variables.
It indicates how well the model generalizes.
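
A sketch of the usual adjusted R^2 formula, 1 - (1 - R^2)(n - 1) / (n - k - 1), where n is the sample size and k is the number of independent variables. The formula is the standard one and is not stated on this card; the input values are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k independent variables fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical model: R^2 = 0.6 with 1 independent variable and 5 observations.
print(round(adjusted_r2(0.6, 5, 1), 3))
# Adding a second variable without improving R^2 lowers the adjusted value.
print(round(adjusted_r2(0.6, 5, 2), 3))
```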

Question 23

Q

Role of the Regression analysis

Answer

A

Predict the values of the dependent variable.
Determine the structure or form of the relationship.
Indicate the relative importance of the independent variables.

Question 24

Q

Bivariate / Multivariate regression analysis definition

Answer

A

Analyzes the linear relationship between two / multiple variables by estimating coefficients for an equation for a straight line.

Question 25

Q

Least squares procedure definition (Hair, 2017)

Answer

A

A regression approach that determines the best-fitting line by minimizing the vertical distances of all the points from the line.

Question 26

Q

Unexplained variance definition

Answer

A

The amount of variation in the dependent variable that cannot be accounted for by the combination of independent variables.

Question 27

Q

Ordinary least squares (OLS) definition (Hair, 2017)

Answer

A

A statistical procedure that results in equation parameters that produce predictions with the lowest sum of squared errors (SSE).
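
A minimal sketch of OLS for the bivariate case, assuming a single independent variable and the closed-form estimates for the intercept a and slope b. The data and function names are hypothetical.

```python
def ols_fit(x, y):
    """Return (a, b) for the least-squares line y_hat = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

def sum_squared_errors(x, y, a, b):
    """SSE: sum of squared vertical distances between the points and the line."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = ols_fit(x, y)
best = sum_squared_errors(x, y, a, b)
# Any other line, for example a slightly steeper one, gives a larger SSE.
other = sum_squared_errors(x, y, a, b + 0.1)
print(round(a, 2), round(b, 2), round(best, 2), round(other, 2))
```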

Question 28

Q

Total sum of squares calculation

Answer

A

TSS (total sum of squares) = SSR (regression sum of squares) + SSE (error sum of squares)
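
A short numerical check of this decomposition, assuming a bivariate least-squares line with an intercept. The data are hypothetical; the same quantities also give R^2 = SSR / TSS.

```python
# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Fit the least-squares line y_hat = a + b * x.
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
a = mean_y - b * mean_x
y_hat = [a + b * xi for xi in x]

tss = sum((yi - mean_y) ** 2 for yi in y)              # total variation
ssr = sum((fi - mean_y) ** 2 for fi in y_hat)          # explained variation
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
r2 = ssr / tss

print(round(tss, 2), round(ssr, 2), round(sse, 2), round(r2, 2))
```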

Question 29

Q

Regression coefficients (b)

Answer

A

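An indicator of the importance of an independent variable in predicting the dependent variable.
Independent variables with large coefficients are good predictors.
To compare variables measured in different units, use standardized (beta) coefficients, which are rescaled to a mean of 0 and a standard deviation of 1.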

Question 30

Q

Multiple regression analysis assumptions

Answer

A

(1) Linear relationship
(2) Homoskedasticity
(3) Normal distribution (normal curve)

Question 31

Q

Homoskedasticity

Answer

A

The pattern of covariation around the regression line is constant, whether the values are small, medium, or large.

Question 32

Q

Heteroskedasticity

Answer

A

The pattern of covariation around the regression line is not constant; it varies in some way as the values change from small to medium to large.

Question 33

Q

Tolerance indicator definition (slide)

Answer

A

The amount of variability of a selected independent variable NOT explained by other independent variables.

Tolerance < 0.4 indicates high multicollinearity.
VIF (Variance inflation factor) = 1 / tolerance.
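
A sketch of tolerance and VIF for the simplest case of two independent variables. It assumes tolerance = 1 - R^2, where R^2 comes from regressing one independent variable on the other; with only two predictors that R^2 equals their squared correlation. The data and function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def tolerance_two_predictors(x1, x2):
    """Share of x1's variability NOT explained by x2 (two-predictor case only)."""
    return 1 - pearson_r(x1, x2) ** 2

def vif(tolerance):
    """Variance inflation factor = 1 / tolerance."""
    return 1 / tolerance

# Hypothetical independent variables that are strongly correlated (r = 0.8).
x1 = [1, 2, 3, 4, 5]
x2 = [1, 3, 2, 5, 4]
tol = tolerance_two_predictors(x1, x2)
print(round(tol, 2), round(vif(tol), 2), tol < 0.4)
```

With more than two independent variables, each tolerance would instead come from a multiple regression of that variable on all the others.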

Question 34

Q

Multicollinearity definition

Answer

A

A situation in which several independent variables are highly correlated with each other.
This makes it difficult to identify the separate impact of each independent variable.

Question 35

Q

Error terms assumption

Answer

A

1) The error terms are normally distributed.
2) The mean of all error terms is 0.
3) The variance of the error terms is constant.
4) The error terms are independent (i.e. errors are uncorrelated; this is most relevant for time series data).

Question 36

Q

Assessment of the assumptions about independent variables

Answer

A

Multicollinearity and Error terms

Question 37

Q

Spearman rank order correlation coefficient

Answer

A

A statistical measure of the linear association between two variables where both have been measured using ordinal (rank order) scales.
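
A minimal sketch of the coefficient for data without tied ranks, using the standard shortcut formula 1 - 6 Σd^2 / (n (n^2 - 1)), where d is the difference between the two ranks of each observation. The formula is not on the card, and the data and function names are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman rank order correlation for two equal-length lists without ties."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical rankings of five brands by two respondents.
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
rho = spearman_rho(x, y)
print(rho)
```

When ties are present, the shortcut no longer holds exactly; a common alternative is to assign average ranks and compute the Pearson correlation on the ranks.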