Chapter 4 Flashcards
Bivariate Data
Bivariate data consists of paired values of two variables, collected so that any relationship between them can be examined.
It is best represented using a scatter diagram.
Independent and Dependent Variables
Independent variable (explanatory variable): The variable that is controlled or chosen (plotted on the x-axis).
Dependent variable (response variable): The variable that is measured (plotted on the y-axis).
Scatter Diagrams
Each cross (point) represents a data pair.
Helps identify correlation trends between two variables.
Types of Correlation
Strong Positive Correlation: As x increases, y increases, and the points lie close to a straight line.
Weak Positive Correlation: y tends to increase as x increases, but the points are more widely scattered.
No Correlation: No clear pattern between x and y.
Weak Negative Correlation: y tends to decrease as x increases, but the points are widely scattered.
Strong Negative Correlation: As x increases, y decreases, and the points lie close to a straight line.
Correlation vs Causation
Correlation does not imply causation.
Even if two variables are correlated, one does not necessarily cause the other to change.
Always consider the context before assuming causation.
Interpreting Scatter Diagrams
When analysing scatter diagrams:
Describe the type of correlation (strong/weak, positive/negative, or none).
Identify outliers (if any).
Consider real-world context (whether correlation suggests causation).
Line of Best Fit (Least Squares Regression Line)
A line of best fit is a straight line that best represents the data.
The least squares regression line is calculated to minimise the sum of the squared vertical distances (residuals) between the data points and the line.
Equation of the regression line:
y=a+bx
where:
a is the y-intercept.
b is the gradient (change in y per unit of x).
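As a rough illustration of how a and b can be found, the sketch below uses the standard least squares formulas b = Sxy / Sxx and a = (mean of y) - b × (mean of x). The function name fit_line and the data values are made up for this example.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sxy: sum of products of deviations from the means
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Sxx: sum of squared deviations of x from its mean
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    b = s_xy / s_xx          # gradient
    a = y_mean - b * x_mean  # y-intercept
    return a, b


# Hypothetical data pairs, invented for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(f"y = {a:.1f} + {b:.1f}x")
```

For this invented data set, Sxy = 6 and Sxx = 10, so b = 0.6 and a = 4 - 0.6 × 3 = 2.2.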
Interpretation of Gradient (b) in Regression
If b > 0, y increases as x increases (positive correlation).
If b < 0, y decreases as x increases (negative correlation).
Example:
If b = 2.3, then for every 1 unit increase in x, y increases by 2.3.
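To see why the gradient is the change in y per unit of x, compare the values given by the line y = a + bx at x and at x + 1. This is a short worked derivation, using b = 2.3 from the example above:

```latex
\begin{align*}
\bigl(a + b(x + 1)\bigr) - (a + bx) &= b \\
\text{so with } b = 2.3:\quad \bigl(a + 2.3(x + 1)\bigr) - (a + 2.3x) &= 2.3
\end{align*}
```

The intercept a cancels, so the increase is the same wherever on the line the 1 unit step is taken.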
Using the Regression Equation
Used to predict values of y for given x-values.
Predictions are only reliable within the range of the given data (interpolation).
Extrapolation (predicting outside the data range) is unreliable.
Interpolation vs Extrapolation
Interpolation: Predicting values within the given data range (more reliable).
Extrapolation: Predicting values outside the given data range (less reliable).
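The difference can be checked mechanically by comparing the x-value with the smallest and largest x in the data. The sketch below is one possible way to do this; the function name predict, the line y = 2.2 + 0.6x and the data range 1 to 5 are all assumptions made for illustration.

```python
def predict(a, b, x, x_min, x_max):
    """Return (predicted y, kind) using the line y = a + b*x.

    kind is 'interpolation' if x lies within [x_min, x_max],
    otherwise 'extrapolation'.
    """
    y = a + b * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind


# Hypothetical regression line fitted to data with x between 1 and 5
a, b = 2.2, 0.6
for x in (3, 10):
    y, kind = predict(a, b, x, x_min=1, x_max=5)
    print(f"x = {x}: predicted y = {y:.1f} ({kind})")
```

Here x = 3 lies inside the data range, so the prediction is an interpolation, while x = 10 lies outside it and the prediction should be treated with caution.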
Exam Tips for Regression
Ensure correct interpretation of gradient.
Only use interpolation for predictions.
Do not assume causation from correlation.
Check whether predictions are reasonable based on real-world context.
Real-World Examples of Correlation
Education vs Income: Positive correlation (more years of education tend to go with higher income).
Temperature vs Rainfall: Weak negative correlation.
House Price vs Internet Speed: Likely no correlation.
Outliers in Regression
Outliers can distort correlation strength and regression line accuracy.
If a point lies far from the line compared with the rest of the data, check for data errors before deciding to keep or remove it.
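The effect of a single outlier can be seen by fitting the line with and without it. The sketch below uses the standard least squares formulas; the function name fit_line and all data values are invented for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_mean - b * x_mean
    return a, b


# Hypothetical data lying exactly on y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
a1, b1 = fit_line(xs, ys)

# Same data with one made-up outlier (6, 0) added
a2, b2 = fit_line(xs + [6], ys + [0])

print(f"without outlier: y = {a1:.2f} + {b1:.2f}x")
print(f"with outlier:    y = {a2:.2f} + {b2:.2f}x")
```

With these made-up numbers the gradient falls from 2 to about 0.29 once the single point (6, 0) is added, even though the other five points lie exactly on y = 2x.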
Summary of Key Points
Bivariate data involves paired values of two variables.
Correlation describes the relationship between variables.
Correlation does not imply causation.
Regression lines predict values of y for given x.
Only use the regression equation within the range of given data.