CHAPTER 2 Correlation: What Is It and What Is It Good For? Flashcards

1
Q

What do correlations tell us?

A

The extent to which two features of the world tend to occur together.

2
Q

What is required to measure correlations?

A

Data with variation in both features of the world.

3
Q

What are the potential uses of correlations?

A
  • Description
  • Forecasting
  • Causal inference
4
Q

What does correlation not imply?

A

Causation.

5
Q

What is the definition of a correlation?

A

The extent to which two features of the world tend to occur together.

6
Q

What are the three types of correlation?

A
  • Positively correlated
  • Uncorrelated
  • Negatively correlated
7
Q

What is an example of a binary variable?

A

Whether it is after noon or before noon.

8
Q

What is the resource curse?

A

The idea that countries with an abundance of natural resources are often less economically developed and less democratic.

9
Q

How is a country classified as a major oil producer?

A

If it exports more than forty thousand barrels per day per million people.

10
Q

What does Table 2.1 illustrate?

A

The correlation between oil production and type of government.

11
Q

What does a positive correlation between two features indicate?

A

That they tend to occur together.

12
Q

What is a scatter plot?

A

A simple graph that shows the relationship between two variables.

13
Q

What does the slope of the line of best fit indicate?

A

The direction and size of the relationship between two continuous variables: how much one changes, on average, as the other increases by one unit.

14
Q

What does a negative slope indicate?

A

A negative correlation.

15
Q

What must you have to establish whether a correlation exists?

A

Variation in both variables.

16
Q

Which statements describe a correlation?

A
  • Cities with more crime tend to hire more police officers
  • Older people vote more than younger people
17
Q

What is the key issue with statement 4 regarding politicians facing a scandal?

A

It does not compare the rate of reelection for those facing scandal to those not facing scandal.

18
Q

What data is needed to assess the correlation between scandal and reelection?

A

Comparison of scandal-plagued members to scandal-free members.

19
Q

What does Table 2.2 show?

A

That there is a slight negative correlation between facing a scandal and winning reelection.

20
Q

Fill in the blank: Correlation is the primary tool through which ______ describe the world.

A

[quantitative analysts]

21
Q

True or False: Correlation can be used for causal inference.

A

True.

22
Q

What does the steepness of the slope in a correlation indicate?

A

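How much one variable changes, on average, for a one-unit change in the other. A steeper slope means a larger change, not necessarily a stronger correlation; how tightly the data cluster around the line is measured by the correlation coefficient.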

23
Q

What is the main issue with statement 4 regarding scandal and reelection?

A

It does not provide enough information to assess a correlation between scandal and reelection.

24
Q

What is necessary to determine if there is a correlation between scandal and winning reelection?

A

Compare the share of politicians facing a scandal who win reelection to the share of scandal-free politicians who win.
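
The comparison can be sketched in a few lines of Python. The counts below are hypothetical and invented purely for illustration; they are not the figures from Table 2.2, and the function name is an assumption of this sketch.

```python
# Hypothetical counts, invented for illustration only (not the Table 2.2 data).
counts = {
    ("scandal", "won"): 60,
    ("scandal", "lost"): 40,
    ("no scandal", "won"): 720,
    ("no scandal", "lost"): 180,
}

def reelection_rate(group):
    """Share of politicians in the given group who won reelection."""
    won = counts[(group, "won")]
    lost = counts[(group, "lost")]
    return won / (won + lost)

scandal_rate = reelection_rate("scandal")
clean_rate = reelection_rate("no scandal")

# A correlation exists only if the two shares differ.
# A lower share among scandal-plagued politicians means a negative correlation.
print(scandal_rate, clean_rate, scandal_rate - clean_rate)
```

Knowing only the scandal row (for example, that most scandal-plagued politicians win) would not be enough; both rows are needed to make the comparison.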

25
Q

What does statement 2 imply about cities with more crime?

A

Cities with more crime have, on average, larger police forces than cities with less crime.

26
Q

What does statement 5 indicate about voting behavior?

A

Older people tend to vote at higher rates than younger people.

27
Q

What are the three uses of correlation mentioned in the text?

A
  • Description
  • Forecasting
  • Causal inference
28
Q

What is the most straightforward use of correlations?

A

Describing the relationships between features of the world.

29
Q

In the context of voting, what does a slope of 0.006 indicate?

A

For every additional year of age, the chances of turning out to vote increase by 0.6 percentage points.
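
As a small worked illustration of that arithmetic, the sketch below converts the slope into percentage points for a given age gap. The function and constant names are assumptions of this sketch, and it takes the linear relationship at face value.

```python
# Slope from the card: change in the probability of voting per extra year of age.
SLOPE_PER_YEAR = 0.006

def turnout_gap_in_points(age_gap_years, slope=SLOPE_PER_YEAR):
    """Predicted turnout difference, in percentage points, for a given age gap."""
    return slope * age_gap_years * 100

# One year of age versus ten years of age, rounded for display.
print(round(turnout_gap_in_points(1), 2))
print(round(turnout_gap_in_points(10), 2))
```

Under this line, two people ten years apart in age differ in predicted turnout by about six percentage points.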

30
Q

What does a descriptive analysis of age and voting turnout reveal?

A

Younger people were less likely than older people to vote in the 2014 election.

31
Q

What does forecasting involve?

A

Using information from some sample population to make predictions about a different population.

32
Q

Why is accurate forecasting of voter turnout rates important for an electoral campaign?

A

It improves the efficiency of targeting supporters for mobilization efforts.

33
Q

What must be considered when using a correlation for forecasting?

A

Whether the relationship reflects a broader phenomenon and whether the sample is representative of the population you want to predict.

34
Q

What is the risk of extrapolating predictions beyond the range of available data?

A

Predictions may not be accurate for populations not represented in the data.

35
Q

What is a potential issue when using correlations for prediction?

A

The act of using correlations for prediction can change the relationships observed in the past.

36
Q

What ethical consideration is raised regarding the use of online reviews in predicting health code violations?

A

Online reviewers may be biased, leading to disproportionate targeting of certain types of restaurants.

37
Q

What does correlation not imply when discussing causal relationships?

A

Correlation does not imply causation.

38
Q

What is a potential explanation for the correlation between advanced math classes and college completion?

A

Students who take advanced math may be more academically motivated, which is correlated with college completion.

39
Q

What must analysts be cautious about when inferring causality from correlations?

A

They must consider that other factors may be influencing the observed correlation.

40
Q

What is the correlation between taking calculus and graduating college?

A

There is a positive correlation between taking calculus and graduating college, but it may not imply causation.

41
Q

What could be an alternative explanation for the correlation between calculus and college completion?

A

Motivated students are more likely to take calculus and also more likely to graduate college.

42
Q

What is the implication if requiring a student to take calculus helps with college completion?

A

It suggests that calculus provides better preparation for college.

43
Q

What could be a negative consequence of requiring students to take calculus?

A

It might impose real costs in terms of self-esteem, motivation, or time without benefits.

44
Q

What mistake did researchers make in their study of high school math courses?

A

They mistook correlation for causation in recommending intensive math courses to increase college graduation chances.

45
Q

True or False: It is generally correct to infer causality from correlations.

A

False. Correlation does not imply causation.

46
Q

What are the three common statistics used to measure correlation?

A
  • Covariance
  • Correlation coefficient
  • Slope of the regression line
47
Q

What does the mean represent in statistics?

A

The mean is the average value of a variable’s distribution.

48
Q

How is variance calculated?

A

Variance is calculated as the average of the squared deviations from the mean.

49
Q

What does a high variance indicate about a variable?

A

It indicates that the individual values of the variable are spread out from the mean.

50
Q

What is standard deviation?

A

Standard deviation is the square root of the variance, measuring how spread out a variable’s distribution is.
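
A minimal Python sketch of the mean, variance, and standard deviation as defined on these cards. It uses the population form (dividing by the number of observations), and the data are hypothetical values chosen for illustration.

```python
import math

def mean(values):
    """Average value of the variable."""
    return sum(values) / len(values)

def variance(values):
    """Average of the squared deviations from the mean (population form)."""
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)

def std_dev(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))

# Hypothetical data, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(values), variance(values), std_dev(values))  # 5.0 4.0 2.0
```

The squared deviations here sum to 32 across 8 observations, so the variance is 4 and the standard deviation is 2.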

51
Q

What does covariance measure?

A

Covariance measures how two variables change together.

52
Q

What does a positive covariance indicate?

A

It indicates a positive correlation between the two variables.

53
Q

What is the correlation coefficient?

A

The correlation coefficient is the covariance divided by the product of the standard deviations of the two variables.
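
The definition above can be sketched in Python as follows. The data are hypothetical, and the population form of the covariance (dividing by the number of observations) is an assumption of this sketch.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Average product of each variable's deviations from its own mean."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))  # cov(X, X) is the variance of X
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(covariance(xs, ys))   # about 1.2
print(correlation(xs, ys))  # about 0.77
```

Dividing by the standard deviations removes the units, which is why the result always falls between -1 and 1.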

54
Q

What is the range of values for the correlation coefficient?

A

The correlation coefficient ranges from -1 to 1.

55
Q

What does a correlation coefficient of 1 indicate?

A

It indicates a perfect positive correlation.

56
Q

What does r-squared represent?

A

R-squared represents the proportion of variation in one variable explained by another.

57
Q

What is a limitation of the correlation coefficient?

A

It does not indicate the size or substantive importance of the relationship between variables.
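
A short Python illustration of this limitation, using hypothetical data: two relationships with the same correlation coefficient but slopes that differ by a factor of one hundred.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

def slope(xs, ys):
    return covariance(xs, ys) / covariance(xs, xs)

# Hypothetical data: both outcomes are exact linear functions of xs.
xs = [1, 2, 3, 4, 5]
tiny_effect = [0.1 * x for x in xs]
huge_effect = [10 * x for x in xs]

# Both correlations are (up to rounding) 1, yet the slopes are 0.1 and 10.
print(correlation(xs, tiny_effect), slope(xs, tiny_effect))
print(correlation(xs, huge_effect), slope(xs, huge_effect))
```

The correlation coefficient reports how tightly the points follow a line, while the slope reports how much the outcome moves per unit of the other variable.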

58
Q

What does the slope of the regression line indicate?

A

The slope indicates the expected change in the dependent variable for a one-unit change in the independent variable.

59
Q

Fill in the blank: The mean is denoted by _______.

A

μ

60
Q

Fill in the blank: The variance is denoted by _______.

A

σ²

61
Q

Fill in the blank: The standard deviation is denoted by _______.

A

σ

62
Q

What is the line of best fit?

A

A line that minimizes how far data points are from the line on average, according to some measure of distance from data to the line.

63
Q

What does the ordinary least squares (OLS) regression line do?

A

Minimizes the sum of squared errors.

64
Q

How is the slope of the regression line calculated?

A

By dividing the covariance of X and Y by the variance of X.

65
Q

What does the slope of the regression line indicate?

A

How much Y changes, on average, as X increases by one unit.
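
A minimal Python sketch of the OLS slope and intercept, computed as the covariance of X and Y divided by the variance of X. The data are hypothetical and the function names are assumptions of this sketch.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def ols_line(xs, ys):
    """Return (intercept, slope) of the ordinary least squares line."""
    slope = covariance(xs, ys) / covariance(xs, xs)  # cov(X, Y) / var(X)
    intercept = mean(ys) - slope * mean(xs)          # line passes through the means
    return intercept, slope

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
intercept, slope = ols_line(xs, ys)
print(round(intercept, 3), round(slope, 3))  # 2.2 0.6
```

Here Y rises by about 0.6, on average, for each one-unit increase in X.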

66
Q

True or False: The slope of the regression line can be called the regression coefficient.

A

True.

67
Q

What is the difference between population and sample statistics?

A

Population statistics correspond to the whole population, while sample statistics correspond to a subset of that population.

68
Q

What does it mean if two variables are positively correlated?

A

Higher (lower) values of one variable tend to occur with higher (lower) values of another variable.

69
Q

What does it mean if two variables are negatively correlated?

A

Higher (lower) values of one variable tend to occur with lower (higher) values of another variable.

70
Q

What does it mean if two variables are uncorrelated?

A

Higher values of one variable do not tend to occur with either higher or lower values of the other variable.

71
Q

Fill in the blank: The average of the square of the deviations from the mean is called _______.

A

Variance (σ²)

72
Q

What is the standard deviation?

A

The square root of the variance.

73
Q

What is covariance?

A

A measure of the correlation between two variables, calculated as the average of the products of the two variables' deviations from their means.

74
Q

What does the correlation coefficient (r) represent?

A

A measure of the correlation between two variables, taking a value between -1 and 1.

75
Q

What does R² represent?

A

The square of the correlation coefficient, interpreted as the proportion of variation in one variable explained by the other.
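
The two readings of R² can be checked against each other in Python. This sketch uses hypothetical data and computes R² once as the squared correlation coefficient and once as one minus the ratio of unexplained to total variation around an OLS line.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Reading 1: the square of the correlation coefficient.
r_squared_from_r = covariance(xs, ys) ** 2 / (covariance(xs, xs) * covariance(ys, ys))

# Reading 2: the share of the variation in Y accounted for by the OLS line.
slope = covariance(xs, ys) / covariance(xs, xs)
intercept = mean(ys) - slope * mean(xs)
unexplained = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
total = sum((y - mean(ys)) ** 2 for y in ys)
r_squared_from_fit = 1 - unexplained / total

print(round(r_squared_from_r, 3), round(r_squared_from_fit, 3))  # 0.6 0.6
```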

76
Q

What is the sum of squared errors?

A

The sum of the squared distances from each data point to a given line of best fit.
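
A small Python sketch of the sum of squared errors, using hypothetical data and measuring distance vertically (the usual OLS convention, assumed here). It compares the OLS line for these data with a nearby alternative line.

```python
def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances from each point to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# For these data the OLS line has intercept 2.2 and slope 0.6.
sse_ols = sum_squared_errors(xs, ys, intercept=2.2, slope=0.6)
# A nearby alternative line, for comparison.
sse_other = sum_squared_errors(xs, ys, intercept=2.2, slope=0.5)

print(round(sse_ols, 3), round(sse_other, 3))  # 2.4 2.95
```

Any other line drawn through these points gives a larger sum than the OLS line, which is what it means for OLS to minimize the sum of squared errors.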

77
Q

What is the significance of linear relationships in data analysis?

A

They are often interesting and important, but not all relationships are linear.

78
Q

What is an example of a useful way to describe a non-linear relationship?

A

Drawing two lines of best fit for different ranges of a variable.

79
Q

What happens to the relationship between two variables if you zoom in on a graph?

A

Non-linear relationships may appear approximately linear.

80
Q

What should you be cautious about when extrapolating from data?

A

Predictions become less accurate as you move farther from the observed range of data.

81
Q

What is the slope of a line?

A

How much the line changes on the vertical axis as you move one unit along the horizontal axis.

82
Q

Fill in the blank: The distance between an observation’s value for some variable and the mean of that variable is called _______.

A

Deviation from the mean

83
Q

What is the average value of a variable called?

A

The mean.

84
Q

True or False: Correlations can be used for description, forecasting, and causal inference.

A

True.

85
Q

What is the relationship between correlation and causation?

A

Correlation need not imply causation.

86
Q

What is the title of the article by Emily Badger regarding online reviews?

A

How Yelp Might Clean Up the Restaurant Industry

Published in The Atlantic, July/August 2013

87
Q

Who are the authors of the article discussing algorithms and public enforcement?

A

Kristen M. Altenburger and Daniel E. Ho

The article is titled ‘When Algorithms Import Private Bias into Public Enforcement: The Promise and Limitations of Statistical Debiasing Solutions’

88
Q

What is the focus of the study by Jerry Trusty and Spencer G. Niles?

A

The relationship between high-school math courses and completion of the Bachelor’s degree

Published in Professional School Counseling, 2003

89
Q

Which authors explored the use of forecasting in policy problems?

A

Jon Kleinberg, Jens Ludwig, Sendhil Mullainathan, and Ziad Obermeyer

Their work is titled ‘Prediction Policy Problems’ and published in American Economic Review, 2015

90
Q

Fill in the blank: The study by Jerry Trusty and Spencer G. Niles focuses on high-school math courses and __________.

A

completion of the Bachelor’s degree

91
Q

True or False: Emily Badger’s article discusses the impact of Yelp on the restaurant industry.

A

True.

92
Q

What is the publication year of the article by Kristen M. Altenburger and Daniel E. Ho?

A

2018

The article appears in the Journal of Institutional and Theoretical Economics

93
Q

Fill in the blank: Jon Kleinberg and others published their work on forecasting in __________.

A

American Economic Review

94
Q

What is the key topic discussed in the article ‘Prediction Policy Problems’?

A

The growing use of forecasting and prediction in addressing important policy problems