L9 - Regression analysis Flashcards

1
Q

Cross-tabulation: Chi-square test definition (Malhotra, 2013)

A

A statistical technique that describes two or more variables simultaneously, producing a table that reflects the joint distribution of two or more variables that have a limited number of categories or distinct values.

2
Q

When to use cross-tabulation

A
  • to test the difference/association between variables
  • to compare the behaviour and intentions for different categories of predictor variables such as income, sex and marital status.
3
Q

Role of Cross-tabulation (Malhotra, 2013)

A

(1) Simple to conduct and appealing to less sophisticated researchers.
(2) Results can be easily interpreted and understood.
(3) Clear interpretation provides a stronger link between research results and managerial action.
(4) Offers greater insight into a complex phenomenon than a single multivariate analysis.
(5) Alleviates the problem of sparse cells in discrete multivariate analysis.

4
Q

Four possibilities of cross-tabulation for three or more variables (Malhotra, 2013)

A
  • Refined association between two original variables.
  • No association between two original variables despite initial observation.
  • Some association between two original variables despite initial observation.
  • No change in the initial association.
5
Q

Process in cross-tabulation (Malhotra, 2013)

A

1) Test Ho (the null hypothesis of no association) using the chi-square statistic.
2) If Ho is rejected, determine the strength of association using the phi coefficient, contingency coefficient, etc.
3) Interpret the pattern of the relationship by computing percentages in the direction of the independent variable.
4) Conclude.

6
Q

Cons of Cross-tabulation (Malhotra, 2013)

A

1) Can produce an endless variety of cross-tabulation tables.

2) Complex and inefficient: it examines only the association between variables, not causation.

7
Q

Expected count (expected frequency) calculation

A

fe = (nr × nc) / n, where nr = row total, nc = column total, and n = sample size.

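A minimal sketch of the expected-count formula in Python; the contingency-table values below are hypothetical:

```python
# Sketch of fe = (nr * nc) / n for a hypothetical 2x3 contingency table.
observed = [[20, 30, 10],
            [30, 15, 45]]

n = sum(sum(row) for row in observed)              # grand total: 150
row_totals = [sum(row) for row in observed]        # nr per row: [60, 90]
col_totals = [sum(col) for col in zip(*observed)]  # nc per column: [50, 45, 55]

# expected count for every cell
expected = [[nr * nc / n for nc in col_totals] for nr in row_totals]
```

The expected counts preserve the row and column totals, so they sum to the same grand total as the observed counts.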
8
Q

Chi-square calculation

A

X^2 = Σ (observed frequency - expected frequency)^2 / expected frequency

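The same formula can be checked numerically; the 2 × 2 table here is hypothetical:

```python
# Chi-square statistic for a hypothetical 2x2 table.
observed = [[30, 20],
            [20, 30]]
n = sum(sum(row) for row in observed)              # 100
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
expected = [[nr * nc / n for nc in col_totals] for nr in row_totals]  # all 25.0

# X^2 = sum of (observed - expected)^2 / expected over every cell
chi_sq = sum((fo - fe) ** 2 / fe
             for o_row, e_row in zip(observed, expected)
             for fo, fe in zip(o_row, e_row))      # 4.0
```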
9
Q

Chi-square analysis definition (slide)

A
  • assesses how closely the observed frequencies fit the pattern of the expected frequencies; referred to as a “goodness-of-fit” test (poor fit → reject Ho).
  • analyzes nominal–nominal and nominal–ordinal scaled variables.
10
Q

Chi-square distribution definition

A

A skewed distribution whose shape depends solely on the number of df. As the number of df increases, the chi-square distribution becomes more symmetrical.

11
Q

Measures for the strength of association

A

Phi coefficient (Ф), contingency coefficient, Cramer’s V, lambda coefficient, other statistics (tau-b, tau-c, gamma)

12
Q

Phi coefficient definition

A

Used to measure the strength of association in the special case of a table with two rows and two columns (a 2 × 2 table).

13
Q

Phi coefficient (Ф) calculation

A

Ф = √ ( X^2 / n )
+ Ф = 0: no association
+ |Ф| = 1: perfect association
(Note: as computed from √(X^2/n), Ф ranges from 0 to 1; whether a perfect association is positive (Ф = +1) or negative (Ф = -1) is determined by the direction of the relationship in the 2 × 2 table.)

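Continuing the hypothetical 2 × 2 result above in Python (the chi-square value and sample size are assumed):

```python
import math

# Phi for a hypothetical 2x2 result: chi-square = 4.0 on a sample of n = 100.
chi_sq, n = 4.0, 100
phi = math.sqrt(chi_sq / n)   # 0.2, a weak association
```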
14
Q

Relationships between variables can be described in several ways:

A

Presence, direction, strength of association, and type of relationship (linear or curvilinear).

15
Q

Covariation definition

A

The amount of change in one variable that is consistently related to the change in another variable of interest. Or simply, the degree of association between two variables.

16
Q

Scatter diagram definition

A

A graphic plot of the relative position of two variables using a horizontal and a vertical axis to represent the values of the respective variables.

17
Q

Pearson correlation coefficient (Product Moment correlation)

A
  • A statistical measure of the strength of a linear relationship between two metric variables.
  • r varies between -1.00 and 1.00.
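A minimal sketch of computing r, using hypothetical paired metric data:

```python
import math

# Pearson r for hypothetical paired metric data (e.g. ad spend vs. sales).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
r = cov / (sd_x * sd_y)       # about 0.775: a fairly strong positive relationship
```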
18
Q

Assumption of Pearson Correlation Coefficient:

A
  • Both variables are measured using interval/ratio-scaled measures.
  • The relationship between the variables of interest is linear.
  • The variables being analyzed come from normally distributed populations.
19
Q

When the correlation coefficient is weak, there are two possibilities:

A

(1) there is not a consistent, systematic relationship between the two variables
(2) the association exists, but it is not linear, and other types of relationships must be investigated further.

20
Q

Coefficient of determination (R^2) definition

A
  • Measures the proportion of variation in the dependent variable explained by the independent variable(s).
  • The larger R^2, the stronger the linear relationship.
21
Q

R^2 calculation

A

R^2 = SSR / TSS (explained variation / total variation)

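The ratio can be verified numerically; the data and the fitted line y_hat = 2.2 + 0.6x below are hypothetical:

```python
# R^2 = SSR / TSS for a hypothetical fitted line y_hat = 2.2 + 0.6 * x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]

mean_y = sum(y) / len(y)
tss = sum((yi - mean_y) ** 2 for yi in y)              # total variation: 6.0
ssr = sum((yh - mean_y) ** 2 for yh in y_hat)          # explained variation: 3.6
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation: 2.4

r_sq = ssr / tss   # 0.6; note that TSS = SSR + SSE
```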
22
Q

Adjusted R^2

A
  • R^2 adjusted for the number of independent variables and the sample size, to account for diminishing returns as predictors are added.
  • It indicates how well the model generalizes.
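Using the standard adjustment formula (not stated on the card) with hypothetical values for R^2, n, and k:

```python
# Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), for a hypothetical model
# with R^2 = 0.60, n = 50 observations, and k = 3 independent variables.
r_sq, n, k = 0.60, 50, 3
adj_r_sq = 1 - (1 - r_sq) * (n - 1) / (n - k - 1)   # about 0.574, below R^2
```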
23
Q

Role of the Regression analysis

A
  • Predict the values of the dependent variables.
  • Determine the structure or form of the relationship
  • Indicate relative importance of independent variables
24
Q

Bivariate / Multivariate regression analysis definition

A

Analyzes the linear relationship between two / multiple variables by estimating coefficients for an equation for a straight line.

25
Q

Least squares procedure definition (Hair, 2017)

A

A regression approach that determines the best-fitting line by minimizing the vertical distances of all the points from the line.

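The procedure can be sketched for the bivariate case; the data are hypothetical, and the closed-form slope/intercept formulas below are the standard least-squares solution:

```python
# Least-squares slope and intercept for hypothetical data; this line
# minimizes the sum of squared vertical distances of the points from it.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
b1 = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
      / sum((a - mean_x) ** 2 for a in x))          # slope: 0.6
b0 = mean_y - b1 * mean_x                           # intercept: 2.2
```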
26
Q

Unexplained variance definition

A

The amount of variation in the dependent variable that cannot be accounted for by the combination of independent variables.

27
Q

Ordinary least squares (OLS) definition (Hair, 2017)

A

A statistical procedure that results in equation parameters that produce predictions with the lowest sum of squared errors (SSE).

28
Q

Total sum of squares calculation

A

TSS (total sum of squares) = SSR (sum of squares due to regression) + SSE (sum of squares due to error)

29
Q

Regression coefficients (b)

A
  • An indicator of the importance of an independent variable in predicting the dependent variable.
  • When the variables are standardized (mean of 0, standard deviation of 1), larger coefficients indicate better predictors.
30
Q

Multiple regression analysis assumptions

A

(1) Linear relationship
(2) Homoskedasticity
(3) Normal curve

31
Q

Homoskedasticity

A

The constant pattern of covariation around the regression line.

32
Q

Heteroskedasticity

A

An inconstant pattern of covariation around the regression line: the variation changes in some way as the values move from small to medium to large.

33
Q

Tolerance indicator definition (slide)

A

The amount of variability of a selected independent variable NOT explained by other independent variables.

  • Tolerance < 0.4 indicates high multicollinearity.
  • VIF (Variance inflation factor) = 1 / tolerance.
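A small numeric sketch, assuming a hypothetical predictor whose regression on the other independent variables yields R^2 = 0.70:

```python
# Tolerance and VIF for a hypothetical predictor: regressing it on the
# other independent variables is assumed to give R^2 = 0.70.
r_sq_j = 0.70
tolerance = 1 - r_sq_j    # 0.30 < 0.4 -> suggests high multicollinearity
vif = 1 / tolerance       # about 3.33
```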
34
Q

Multicollinearity definition

A
  • A situation in which several independent variables are highly correlated with each other.
  • Result in difficulty in identifying the impact of each independent variable.
35
Q

Error terms assumption

A

1) The error term is normally distributed.
2) The mean of all error terms is 0.
3) Variance of the error terms is constant.
4) Independent error terms (i.e. the errors are uncorrelated; particularly relevant for time-series data)

36
Q

Assessment of the assumptions about independent variables

A

Multicollinearity and Error terms

37
Q

Spearman rank order correlation coefficient

A

A statistical measure of the association between two variables where both have been measured using ordinal (rank order) scales.
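A minimal sketch using the rank-difference formula rho = 1 - 6Σd²/(n(n² - 1)), which assumes no tied ranks; the scores are hypothetical:

```python
# Spearman rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), assuming no tied ranks.
x = [10, 20, 30, 40, 50]   # hypothetical ordinal scores
y = [2, 1, 4, 3, 5]

def to_ranks(values):
    # rank 1 = smallest value; no tie handling for simplicity
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

rx, ry = to_ranks(x), to_ranks(y)
n = len(x)
d_sq = sum((a - b) ** 2 for a, b in zip(rx, ry))    # 4
rho = 1 - 6 * d_sq / (n * (n ** 2 - 1))             # 0.8
```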