CHAPTER 2 Correlation: What Is It and What Is It Good For? Flashcards
What do correlations tell us?
The extent to which two features of the world tend to occur together.
What is required to measure correlations?
Data with variation in both features of the world.
What are the potential uses of correlations?
- Description
- Forecasting
- Causal inference
What does correlation not imply?
Causation.
What is the definition of a correlation?
The extent to which two features of the world tend to occur together.
What are the three types of correlation?
- Positively correlated
- Uncorrelated
- Negatively correlated
What is an example of a binary variable?
Whether it is after noon or before noon.
What is the resource curse?
The idea that countries with an abundance of natural resources are often less economically developed and less democratic.
How is a country classified as a major oil producer?
If it exports more than forty thousand barrels per day per million people.
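The threshold rule above turns a continuous measure into a binary variable (major producer or not). Below is a minimal Python sketch. The country labels and figures are hypothetical and are not the data behind Table 2.1:

```python
# Threshold from the card above: more than 40,000 barrels per day per million people.
THRESHOLD = 40_000

def is_major_oil_producer(barrels_per_day: float, population_millions: float) -> bool:
    """Return True if exports per million people exceed the threshold."""
    return barrels_per_day / population_millions > THRESHOLD

# Hypothetical countries: (barrels exported per day, population in millions).
countries = {
    "country_a": (2_000_000, 30.0),  # about 66,667 per million people
    "country_b": (500_000, 50.0),    # 10,000 per million people
    "country_c": (100_000, 2.0),     # 50,000 per million people
}

classification = {
    name: is_major_oil_producer(bpd, pop) for name, (bpd, pop) in countries.items()
}
print(classification)  # {'country_a': True, 'country_b': False, 'country_c': True}
```

A country sitting exactly at the threshold is classified as not a major producer, because the rule says "more than."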
What does Table 2.1 illustrate?
The correlation between oil production and type of government.
What does a positive correlation between two features indicate?
That they tend to occur together.
What is a scatter plot?
A graph that plots each observation as a point, with one variable on the horizontal axis and the other on the vertical axis, to show the relationship between them.
What does the slope of the line of best fit indicate?
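The direction and size of the relationship between two continuous variables: how much the variable on the vertical axis changes, on average, as the variable on the horizontal axis increases by one unit.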
What does a negative slope indicate?
A negative correlation.
What must you have to establish whether a correlation exists?
Variation in both variables.
Which statements describe a correlation?
- Cities with more crime tend to hire more police officers
- Older people vote more than younger people
What is the key issue with statement 4 regarding politicians facing a scandal?
It does not compare the rate of reelection for those facing scandal to those not facing scandal.
What data is needed to assess the correlation between scandal and reelection?
Comparison of scandal-plagued members to scandal-free members.
What does Table 2.2 show?
That there is a slight negative correlation between facing a scandal and winning reelection.
Fill in the blank: Correlation is the primary tool through which ______ describe the world.
Quantitative analysts
True or False: Correlation can be used for causal inference.
True.
What does the steepness of the slope in a correlation indicate?
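The size of the relationship: how much one variable changes, on average, as the other increases by one unit. It does not show how tightly the data cluster around the line.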
What is the main issue with statement 4 regarding scandal and reelection?
It does not provide enough information to assess a correlation between scandal and reelection.
What is necessary to determine if there is a correlation between scandal and winning reelection?
Compare the share of politicians facing a scandal who win reelection to the share of scandal-free politicians who win.
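The comparison takes only a few lines of Python. The counts below are hypothetical and are not the figures in Table 2.2. If scandal-plagued politicians win reelection at a lower rate than scandal-free ones, the difference is negative, which indicates a negative correlation between facing a scandal and winning reelection:

```python
def reelection_rate(won: int, total: int) -> float:
    """Share of politicians in a group who won reelection."""
    return won / total

# Hypothetical counts, for illustration only.
scandal_rate = reelection_rate(won=30, total=40)        # 0.75
no_scandal_rate = reelection_rate(won=850, total=1000)  # 0.85

difference = scandal_rate - no_scandal_rate
print(f"Scandal: {scandal_rate:.0%}, no scandal: {no_scandal_rate:.0%}, "
      f"difference: {difference:+.0%}")
# Scandal: 75%, no scandal: 85%, difference: -10%
```

The 75% figure for the scandal group says nothing about correlation on its own. The relationship only shows up when that rate is compared with the rate for the scandal-free group.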
What does statement 2 imply about cities with more crime?
Cities with more crime have, on average, larger police forces than cities with less crime.
What does statement 5 indicate about voting behavior?
Older people tend to vote at higher rates than younger people.
What are the three uses of correlation mentioned in the text?
- Description
- Forecasting
- Causal inference
What is the most straightforward use of correlations?
Describing the relationships between features of the world.
In the context of voting, what does a slope of 0.006 indicate?
Each additional year of age is associated, on average, with a 0.6 percentage point higher probability of turning out to vote.
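As a worked illustration, the slope can be multiplied by a difference in age. The two ages below are arbitrary. The calculation also assumes the linear relationship holds across that entire range:

```python
SLOPE = 0.006  # change in the probability of voting per additional year of age

def predicted_turnout_gap(age_older: float, age_younger: float) -> float:
    """Difference in predicted turnout probability implied by the slope."""
    return SLOPE * (age_older - age_younger)

gap = predicted_turnout_gap(65, 25)
print(f"{gap * 100:.1f} percentage points")  # 24.0 percentage points
```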
What does a descriptive analysis of age and voting turnout reveal?
Younger people were less likely than older people to vote in the 2014 election.
What does forecasting involve?
Using information from some sample population to make predictions about a different population.
Why is accurate forecasting of voter turnout rates important for an electoral campaign?
It improves the efficiency of targeting supporters for mobilization efforts.
What must be considered when using a correlation for forecasting?
Whether the relationship reflects a broader phenomenon, and whether the sample is representative of the population you want to make predictions about.
What is the risk of extrapolating predictions beyond the range of available data?
Predictions may not be accurate for populations not represented in the data.
What is a potential issue when using correlations for prediction?
The act of using correlations for prediction can change the relationships observed in the past.
What ethical consideration is raised regarding the use of online reviews in predicting health code violations?
Online reviewers may be biased, leading to disproportionate targeting of certain types of restaurants.
What does correlation not imply when discussing causal relationships?
Correlation does not imply causation.
What is a potential explanation for the correlation between advanced math classes and college completion?
Students who take advanced math may be more academically motivated, which is correlated with college completion.
What must analysts be cautious about when inferring causality from correlations?
They must consider that other factors may be influencing the observed correlation.
What is the correlation between taking calculus and graduating college?
Students who take calculus are more likely to graduate from college, but this correlation does not by itself show that taking calculus causes graduation.
What could be an alternative explanation for the correlation between calculus and college completion?
Motivated students are more likely to take calculus and also more likely to graduate college.
If requiring a student to take calculus did help with college completion, what would that suggest?
It suggests that calculus provides better preparation for college.
What could be a negative consequence of requiring students to take calculus?
If calculus does not actually cause college completion, the requirement could impose real costs (in self-esteem, motivation, or time) without any benefit.
What mistake did researchers make in their study of high school math courses?
They mistook correlation for causation in recommending intensive math courses to increase college graduation chances.
True or False: It is generally correct to infer causality from correlations.
False.
What are the three common statistics used to measure correlation?
- Covariance
- Correlation coefficient
- Slope of the regression line
What does the mean represent in statistics?
The mean is the average value of a variable’s distribution.
How is variance calculated?
Variance is calculated as the average of the squared deviations from the mean.
What does a high variance indicate about a variable?
It indicates that the individual values of the variable are spread out from the mean.
What is standard deviation?
Standard deviation is the square root of the variance, measuring how spread out a variable’s distribution is.
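These definitions can be checked numerically. Below is a minimal Python sketch that follows the cards by dividing by the number of observations, which gives the population formulas. The data values are arbitrary:

```python
import math

def mean(values: list[float]) -> float:
    """Average value of the variable."""
    return sum(values) / len(values)

def variance(values: list[float]) -> float:
    """Average of the squared deviations from the mean."""
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)

def standard_deviation(values: list[float]) -> float:
    """Square root of the variance."""
    return math.sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data), variance(data), standard_deviation(data))  # 5.0 4.0 2.0
```

Here the deviations from the mean are -3, -1, -1, -1, 0, 0, 2, 4. Their squares sum to 32, and dividing by 8 observations gives a variance of 4.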
What does covariance measure?
Covariance measures how two variables change together.
What does a positive covariance indicate?
It indicates a positive correlation between the two variables.
What is the correlation coefficient?
The correlation coefficient is the covariance divided by the product of the standard deviations of the two variables.
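Below is a minimal Python sketch of covariance and the correlation coefficient, written directly from the definitions on these cards. It uses population formulas that divide by n, and the data are arbitrary:

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Average of the product of each pair's deviations from the means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))  # a variable's covariance with itself is its variance
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(covariance(x, y), 3), round(correlation(x, y), 3))  # 1.2 0.775
```

The covariance is positive, so the correlation coefficient is positive too. Dividing by the standard deviations rescales the covariance so that the result always falls between -1 and 1.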
What is the range of values for the correlation coefficient?
The correlation coefficient ranges from -1 to 1.
What does a correlation coefficient of 1 indicate?
It indicates a perfect positive correlation.
What does r-squared represent?
R-squared represents the proportion of variation in one variable explained by another.
What is a limitation of the correlation coefficient?
It does not indicate the size or substantive importance of the relationship between variables.
What does the slope of the regression line indicate?
The slope indicates the expected change in the dependent variable for a one-unit change in the independent variable.
Fill in the blank: The mean is denoted by _______.
μ
Fill in the blank: The variance is denoted by _______.
σ²
Fill in the blank: The standard deviation is denoted by _______.
σ
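Here the three symbols are written out for a variable X observed n times, with values x_1, ..., x_n. The index notation is an assumption and is not taken from the cards:

```latex
\begin{aligned}
\mu_X &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\sigma_X^2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu_X)^2 \\
\sigma_X &= \sqrt{\sigma_X^2}
\end{aligned}
```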
What is the line of best fit?
A line that minimizes how far data points are from the line on average, according to some measure of distance from data to the line.
What does the ordinary least squares (OLS) regression line do?
Minimizes the sum of squared errors.
How is the slope of the regression line calculated?
By dividing the covariance of the two variables by the variance of the independent variable (X).
What does the slope of the regression line indicate?
How much Y changes, on average, as X increases by one unit.
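Below is a minimal Python sketch of the OLS line for arbitrary data. The slope is computed as the covariance divided by the variance of X. The intercept formula is included only so the line can be drawn; it is not on the cards. The sketch then checks that nudging the slope raises the sum of squared errors:

```python
def mean(values):
    return sum(values) / len(values)

def ols_line(xs, ys):
    """Return (intercept, slope) of the OLS regression line of y on x."""
    mx, my = mean(xs), mean(ys)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    var_x = sum((x - mx) ** 2 for x in xs) / len(xs)
    slope = cov_xy / var_x        # slope = covariance / variance of X
    intercept = my - slope * mx   # the OLS line passes through (mean of x, mean of y)
    return intercept, slope

def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances from each point to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = ols_line(x, y)
print(round(a, 3), round(b, 3))  # 2.2 0.6

best = sum_squared_errors(x, y, a, b)
# Moving the slope in either direction increases the sum of squared errors.
print(best < sum_squared_errors(x, y, a, b + 0.1),
      best < sum_squared_errors(x, y, a, b - 0.1))  # True True
```

With this slope of 0.6, each one-unit increase in x corresponds, on average, to a 0.6-unit increase in y.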
True or False: The slope of the regression line can be called the regression coefficient.
True
What is the difference between population and sample statistics?
Population statistics correspond to the whole population, while sample statistics correspond to a subset of that population.
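One common place this distinction shows up is the variance formula. The usual sample estimator divides by n - 1 instead of n, because the mean is itself estimated from the same data. Python's standard `statistics` module provides both versions, as this minimal sketch with arbitrary data shows:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: summed squared deviations divided by n (32 / 8).
population_var = statistics.pvariance(data)
# Sample variance: summed squared deviations divided by n - 1 (32 / 7).
sample_var = statistics.variance(data)

print(population_var, sample_var)
```

The sample version is always slightly larger. The gap shrinks as the number of observations grows.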
What does it mean if two variables are positively correlated?
Higher (lower) values of one variable tend to occur with higher (lower) values of another variable.
What does it mean if two variables are negatively correlated?
Higher (lower) values of one variable tend to occur with lower (higher) values of another variable.
What does it mean if two variables are uncorrelated?
Higher or lower values of one variable are not systematically associated with higher or lower values of the other; the correlation is zero.
Fill in the blank: The average of the square of the deviations from the mean is called _______.
Variance (σ²)
What is the standard deviation?
The square root of the variance.
What is covariance?
A measure of the correlation between two variables, calculated as the average of the product of the deviations from the mean.
What does the correlation coefficient (r) represent?
A measure of the correlation between two variables, taking a value between -1 and 1.
What does R² represent?
The square of the correlation coefficient, interpreted as the proportion of variation in one variable explained by the other.
What is the sum of squared errors?
The sum of the square of the distance from each data point to a given line of best fit.
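Below, the glossary quantities are collected as formulas. They use the same population convention as the variance card (dividing by n). A line is written as y = a + bx, with intercept a and slope b; these symbols are assumed here. The errors are measured as vertical distances from each point to the line:

```latex
\begin{aligned}
\operatorname{cov}(X, Y) &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu_X)(y_i - \mu_Y) \\
r &= \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}, \qquad R^2 = r^2 \\
\text{SSE}(a, b) &= \sum_{i=1}^{n} \bigl(y_i - (a + b x_i)\bigr)^2 \\
b_{\text{OLS}} &= \frac{\operatorname{cov}(X, Y)}{\sigma_X^2}
\end{aligned}
```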
What is the significance of linear relationships in data analysis?
They are often interesting and important, but not all relationships are linear.
What is one way to describe a non-linear relationship using lines of best fit?
Draw two lines of best fit, one for each of two different ranges of a variable.
What can happen to a non-linear relationship if you zoom in on a small part of the graph?
Non-linear relationships may appear approximately linear.
What should you be cautious about when extrapolating beyond the data?
Predictions become less accurate as you move farther from the observed range of data.
What is the slope of a line?
How much the line changes on the vertical axis as you move one unit along the horizontal axis.
Fill in the blank: The distance between an observation’s value for some variable and the mean of that variable is called _______.
Deviation from the mean
What is the average value of a variable called?
Mean (μ)
True or False: Correlations can be used for description, forecasting, and causal inference.
True
What is the relationship between correlation and causation?
Correlation need not imply causation.
What is the title of the article by Emily Badger regarding online reviews?
‘How Yelp Might Clean Up the Restaurant Industry’
Published in The Atlantic, July/August 2013
Who are the authors of the article discussing algorithms and public enforcement?
Kristen M. Altenburger and Daniel E. Ho
The article is titled ‘When Algorithms Import Private Bias into Public Enforcement: The Promise and Limitations of Statistical Debiasing Solutions’
What is the focus of the study by Jerry Trusty and Spencer G. Niles?
The relationship between high-school math courses and completion of the Bachelor’s degree
Published in Professional School Counseling, 2003
Which authors explored the use of forecasting in policy problems?
Jon Kleinberg, Jens Ludwig, Sendhil Mullainathan, and Ziad Obermeyer
Their work is titled ‘Prediction Policy Problems’ and published in American Economic Review, 2015
Fill in the blank: The study by Jerry Trusty and Spencer G. Niles focuses on high-school math courses and __________.
completion of the Bachelor’s degree
True or False: Emily Badger’s article discusses the impact of Yelp on the restaurant industry.
True
What is the publication year of the article by Kristen M. Altenburger and Daniel E. Ho?
2018
The article appears in the Journal of Institutional and Theoretical Economics
Fill in the blank: Jon Kleinberg and others published their work on forecasting in __________.
American Economic Review
What is the key topic discussed in the article ‘Prediction Policy Problems’?
The growing use of forecasting and prediction in addressing important policy problems