Correlation and Regression Analysis Flashcards

1
Q

What does correlation measure in statistics?

A

Correlation measures the strength and direction of the relationship between two variables.

2
Q

Explain the difference between positive and negative correlation.

A

Positive correlation means that as one variable increases, the other also tends to increase, while negative correlation means that as one variable increases, the other tends to decrease.

3
Q

How is the strength of correlation determined?

A

The strength of correlation is determined by the absolute value of the correlation coefficient, with values closer to 1 indicating stronger correlation.

4
Q

What does a correlation coefficient of 0 indicate?

A

A correlation coefficient of 0 indicates no linear relationship between the variables.

5
Q

What is the range of values for the Pearson correlation coefficient?

A

The range of values for the Pearson correlation coefficient is -1 to 1.
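
A minimal sketch, assuming SciPy is available; the x and y arrays below are made-up illustrative values:

```python
# Compute the Pearson correlation coefficient with SciPy; r always
# falls in the interval [-1, 1].
import numpy as np
from scipy.stats import pearsonr

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r, p_value = pearsonr(x, y)
print(f"Pearson r = {r:.3f}, p-value = {p_value:.4f}")
```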

6
Q

When should you use Spearman’s rank correlation coefficient instead of Pearson’s?

A

Spearman’s rank correlation coefficient is used when the relationship between the variables is monotonic but not linear, or when the data are ordinal.
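
A minimal sketch of the distinction, assuming SciPy; the cubic relationship below is a made-up example of a monotonic but non-linear association:

```python
# Spearman's rho is rank-based, so a perfectly monotonic relationship
# scores 1.0 even when the Pearson (linear) coefficient does not.
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.arange(1, 11, dtype=float)
y = x ** 3  # monotonic but strongly non-linear

rho, _ = spearmanr(x, y)
r, _ = pearsonr(x, y)
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```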

7
Q

Describe the process of linear regression analysis.

A

Linear regression analysis involves fitting a straight line to the data points to model the relationship between a dependent variable and one or more independent variables.
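
A minimal sketch of a simple linear regression fit, assuming statsmodels; the synthetic data and the true intercept and slope (2.0 and 1.5) are illustrative:

```python
# Fit y = b0 + b1*x by ordinary least squares and inspect the estimates.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=50)

X = sm.add_constant(x)          # adds the intercept column
model = sm.OLS(y, X).fit()
print(model.params)             # estimated [intercept, slope]
```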

8
Q

What is the difference between simple linear regression and multiple linear regression?

A

Simple linear regression involves one independent variable, while multiple linear regression involves two or more independent variables.

9
Q

How do you interpret the slope coefficient in regression analysis?

A

The slope coefficient represents the expected change in the dependent variable for a one-unit increase in the independent variable, holding any other predictors constant.

10
Q

What does the intercept term represent in a regression equation?

A

The intercept term represents the value of the dependent variable when all independent variables are set to zero.

11
Q

What assumptions must be met for regression analysis?

A

Assumptions include linearity, independence of errors, homoscedasticity, and normality of errors.

12
Q

What is the purpose of residual analysis in regression?

A

Residual analysis involves examining the differences between observed and predicted values to assess the model’s performance.
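
A minimal sketch of a residuals-versus-fitted plot, assuming statsmodels and matplotlib; all data here are synthetic:

```python
# For an adequate model the residuals should scatter randomly around zero,
# with no curvature or funnel shape.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=100)

fit = sm.OLS(y, sm.add_constant(x)).fit()
plt.scatter(fit.fittedvalues, fit.resid)   # residuals = observed - predicted
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```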

13
Q

What are influential points in regression analysis?

A

Influential points are data points that have a large impact on the regression coefficients and may significantly alter the results.

14
Q

How do you assess the goodness of fit in regression analysis?

A

Goodness of fit is assessed using measures like R-squared, which indicates the proportion of variance in the dependent variable explained by the independent variables.
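
A minimal sketch relating the reported R-squared to its definition, 1 - SS_residual / SS_total, assuming statsmodels and synthetic data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 3.0 * x + rng.normal(size=200)

fit = sm.OLS(y, sm.add_constant(x)).fit()
ss_res = np.sum(fit.resid ** 2)              # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)         # total variation
print(fit.rsquared, 1 - ss_res / ss_tot)     # the two values agree
```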

15
Q

What is multicollinearity, and why is it problematic in regression?

A

Multicollinearity occurs when independent variables in a regression model are highly correlated, leading to unreliable estimates of regression coefficients.
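
A minimal sketch of one common diagnostic, the variance inflation factor (VIF), assuming statsmodels; x2 is deliberately constructed as a near-copy of x1:

```python
# VIFs well above roughly 5-10 are a conventional warning sign of multicollinearity.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)   # almost identical to x1
X = sm.add_constant(np.column_stack([x1, x2]))

for i, name in enumerate(["x1", "x2"], start=1):
    print(name, variance_inflation_factor(X, i))
```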

16
Q

Explain the concept of homoscedasticity in regression analysis.

A

Homoscedasticity means that the variance of the errors is constant across all levels of the independent variables.

17
Q

What is autocorrelation, and how does it affect regression analysis?

A

Autocorrelation occurs when errors in a regression model are correlated with each other, violating the assumption of independence of errors.

18
Q

What is heteroscedasticity, and how does it differ from homoscedasticity?

A

Heteroscedasticity means that the variance of the errors is not constant across all levels of the independent variables, whereas homoscedasticity means that it is constant.

19
Q

When should you use logistic regression instead of linear regression?

A

Logistic regression is used when the dependent variable is binary; extensions such as multinomial or ordinal logistic regression handle categorical outcomes with more than two levels.
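
A minimal sketch of fitting a logistic regression to a synthetic binary outcome, assuming scikit-learn; the feature matrix and class rule are made up:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
print(clf.coef_, clf.intercept_)     # coefficients on the log-odds scale
print(clf.predict_proba(X[:3]))      # predicted class probabilities
```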

20
Q

What is the purpose of diagnostic plots in regression analysis?

A

Diagnostic plots help identify potential problems with the regression model, such as non-linearity or heteroscedasticity.

21
Q

How do you detect outliers in regression analysis?

A

Outliers can be detected by examining residual plots or leverage statistics.

22
Q

What are the assumptions of logistic regression?

A

Assumptions include linearity of the logit, independence of observations, absence of multicollinearity, and absence of influential points.

23
Q

How do you interpret odds ratios in logistic regression?

A

Odds ratios represent the multiplicative change in the odds of the outcome for a one-unit increase in the independent variable, holding other variables constant; values above 1 indicate higher odds and values below 1 indicate lower odds.
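
A minimal sketch, assuming statsmodels and synthetic data: the fitted coefficients are log-odds, so exponentiating them yields odds ratios.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=300)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))    # true log-odds: 0.5 + 1.2*x
y = rng.binomial(1, p)

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
print(np.exp(fit.params))   # odds ratios per one-unit increase
```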

24
Q

What is the Akaike Information Criterion (AIC), and how is it used in regression analysis?

A

AIC is a measure of the relative quality of statistical models that balances goodness of fit against model complexity; when comparing models fitted to the same data, lower values indicate the preferred model.
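
A minimal sketch comparing two candidate models by AIC, assuming statsmodels; x2 is an irrelevant predictor added only for the comparison:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)                    # unrelated to y
y = 2.0 * x1 + rng.normal(size=200)

m1 = sm.OLS(y, sm.add_constant(x1)).fit()
m2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(m1.aic, m2.aic)   # the simpler model typically has the lower AIC here
```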

25
Q

What is stepwise regression, and when is it appropriate?

A

Stepwise regression is a method that adds or removes independent variables one at a time based on statistical criteria to select predictors for a regression model; it can be useful for exploratory model building with many candidate variables, but it is prone to overfitting and its results should be validated.

26
Q

How does regularization help improve regression models?

A

Regularization helps prevent overfitting by penalizing large coefficients in regression models.
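
A minimal sketch contrasting unpenalized least squares with ridge (L2) and lasso (L1) penalties, assuming scikit-learn; only two of the ten made-up predictors carry real signal:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 10))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=100)

# alpha sets the penalty strength; larger values shrink coefficients more
for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 2))
```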

27
Q

What is the purpose of cross-validation in regression analysis?

A

Cross-validation is used to evaluate the performance of a regression model on unseen data.
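
A minimal sketch of 5-fold cross-validation, assuming scikit-learn and synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)
X = rng.normal(size=(150, 3))
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=150)

# Each fold is held out in turn, so the scores reflect unseen data
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores, scores.mean())
```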

28
Q

What are the limitations of regression analysis?

A

Limitations include the assumption of linearity, sensitivity to outliers, and potential violations of regression assumptions.

29
Q

How can you assess the linearity assumption in regression?

A

The linearity assumption can be assessed using scatter plots of the variables or plots of residuals against fitted values.

30
Q

Explain the difference between correlation and causation.

A

Correlation indicates a relationship between variables, while causation implies that one variable directly influences the other.

31
Q

What is a dummy variable, and how is it used in regression analysis?

A

A dummy variable is a binary variable used to represent categories in regression analysis.
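
A minimal sketch of dummy coding with pandas; the "region" categories are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"region": ["north", "south", "west", "south", "north"]})

# drop_first=True keeps one category as the baseline, avoiding the
# perfect collinearity of the "dummy variable trap"
dummies = pd.get_dummies(df["region"], prefix="region", drop_first=True)
print(dummies)
```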

32
Q

How do you handle missing data in regression analysis?

A

Missing data can be handled using methods like imputation or excluding incomplete cases.

33
Q

What is the difference between a parametric and a non-parametric regression model?

A

Parametric regression models assume a specific form for the relationship between variables, while non-parametric models do not.

34
Q

How do you interpret R-squared in regression analysis?

A

R-squared represents the proportion of variance in the dependent variable explained by the independent variables.

35
Q

What is the difference between prediction and inference in regression analysis?

A

Prediction involves estimating future outcomes based on the model, while inference involves understanding the relationship between variables.

36
Q

What is the Durbin-Watson statistic, and what does it indicate?

A

The Durbin-Watson statistic tests for first-order autocorrelation in the residuals; values close to 2 indicate no autocorrelation, while values toward 0 or 4 suggest positive or negative autocorrelation, respectively.
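
A minimal sketch of computing the statistic from OLS residuals, assuming statsmodels and synthetic, independently generated errors:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(9)
x = rng.normal(size=200)
y = 1.5 * x + rng.normal(size=200)

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(fit.resid))   # roughly 2 when the errors are independent
```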

37
Q

How do you handle multicollinearity in regression analysis?

A

Multicollinearity can be handled by removing one of the correlated variables or by using techniques like principal component analysis.

38
Q

What is the purpose of interaction terms in regression?

A

Interaction terms allow for the examination of how the relationship between two variables changes depending on the value of a third variable.
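
A minimal sketch of adding an interaction term via the statsmodels formula API; the variable names and the true coefficient on x1*x2 are made up:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(10)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["y"] = 1.0 + 2.0 * df.x1 + 0.5 * df.x2 + 1.5 * df.x1 * df.x2 + rng.normal(size=200)

# "x1 * x2" expands to both main effects plus the x1:x2 product term
fit = smf.ols("y ~ x1 * x2", data=df).fit()
print(fit.params)
```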

39
Q

Why is it important to check for normality in regression residuals?

A

Normally distributed residuals underpin the validity of the t-tests, F-tests, and confidence intervals for the regression coefficients, particularly in small samples.

40
Q

What are the assumptions of robust regression?

A

Robust regression retains the core assumptions of linearity and independence of errors, but it is designed to be less sensitive to outliers and to violations of the normality assumption.

41
Q

How do you assess the model’s predictive performance in regression analysis?

A

The model’s predictive performance can be assessed using metrics like mean squared error or cross-validated R-squared.
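
A minimal sketch of out-of-sample evaluation with a train/test split and mean squared error, assuming scikit-learn; the split ratio and data are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(11)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.0, -1.0]) + rng.normal(size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
print(mean_squared_error(y_te, model.predict(X_te)))   # test-set MSE
```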

42
Q

What is the role of feature selection in regression modeling?

A

Feature selection involves choosing the most relevant independent variables for inclusion in the regression model.