IA2 - Exam Flashcards
Bivariate Data
define explanatory variable
- also known as independent variable
- used to explain or predict value of response variable
Bivariate Data
define response variable
- also called dependent variable
- changes in response to the explanatory variable
Bivariate Data
P(event) = ?
P(event) = number of successful outcomes / total number of outcomes
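A minimal Python sketch of the formula above. It assumes every outcome is equally likely, which is what the formula requires. The function name `probability` is hypothetical, and `Fraction` is used so the answer stays an exact fraction.

```python
from fractions import Fraction

def probability(successful: int, total: int) -> Fraction:
    """P(event) = number of successful outcomes / total number of outcomes."""
    if total <= 0:
        raise ValueError("total number of outcomes must be positive")
    if not 0 <= successful <= total:
        raise ValueError("successful outcomes must be between 0 and total")
    return Fraction(successful, total)

# Rolling an even number on a fair six-sided die: 3 successful outcomes out of 6
p = probability(3, 6)
print(p, float(p))  # 1/2 0.5
```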
Bivariate Data
how can we tell if there is an association based on percentages?
- compare the percentages across the categories of the explanatory variable
- if the percentages are very different, there IS an association
- if they are similar, there is NO association
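The comparison above can be sketched in Python. The counts are hypothetical, and so is the helper name `row_percentages`. Each category of the explanatory variable is turned into percentages so that groups of different sizes can be compared fairly. The card gives no fixed cut-off for "very different", so the final decision is still a judgement.

```python
from __future__ import annotations

def row_percentages(table: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Convert counts to percentages within each category of the explanatory variable."""
    result = {}
    for category, counts in table.items():
        total = sum(counts.values())
        result[category] = {k: 100 * v / total for k, v in counts.items()}
    return result

# Hypothetical counts: explanatory variable = age group, response = plays sport (yes/no)
counts = {
    "under 30": {"yes": 45, "no": 15},
    "30 and over": {"yes": 20, "no": 40},
}
pct = row_percentages(counts)
for group, p in pct.items():
    print(group, {k: round(v, 1) for k, v in p.items()})
# under 30 {'yes': 75.0, 'no': 25.0}
# 30 and over {'yes': 33.3, 'no': 66.7}
```

Here 75% against 33.3% is a large difference, so these hypothetical counts would suggest an association.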
Bivariate Data
What are the 6 features of a scatterplot?
- Explanatory variable: x - axis
- response variable: y - axis
- title and axis labels (with units)
- Arrows
- use ‘lightning bolt’ to show not starting at 0
- use an appropriate scale
Bivariate Data
what are the 2 types of Form (type) used to describe patterns/associations?
- linear
- non-linear
Bivariate Data
what are the 2 types of direction used to describe patterns/associations?
- positive
- negative
Bivariate Data
what are the 5 types of strength used to describe patterns/associations?
- no correlation
- weak
- moderate
- strong
- perfect
Bivariate Data
define pearson’s correlation coefficient
- does not tell us whether there is an association
- instead assumes there is a linear association
- gives a measure of its strength and direction (a value between -1 and +1)
Bivariate Data
how can you tell direction and strength from correlation coefficient?
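direction = sign (positive or negative)
strength = magnitude (size of the number, ignoring the sign): the closer to 1 or -1, the stronger; the closer to 0, the weaker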
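A minimal Python sketch of Pearson's r for paired numerical data, using the usual deviations-from-the-mean formula. The sign of r gives the direction and its magnitude gives the strength. The data and the function name `pearson_r` are hypothetical. No strength cut-offs are coded because the cards do not give any.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired numerical data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable has no variation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied (explanatory) vs test score (response)
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
r = pearson_r(hours, score)

direction = "positive" if r > 0 else "negative" if r < 0 else "none"  # sign gives direction
strength = abs(r)  # magnitude gives strength: closer to 1 means stronger
print(f"r = {r:.3f}, direction: {direction}, strength (|r|): {strength:.3f}")
# r = 0.993, direction: positive, strength (|r|): 0.993
```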
Bivariate Data
define coefficient of determination (r squared)
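r^2 tells us the proportion of the variation in the response variable that is explained by the variation in the explanatory variable
- ie. if r^2 = 0.82, then 82% of the variation in the response variable is explained by the variation in the explanatory variable. The other 18% is due to other factors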
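A small Python sketch of the interpretation above. It squares a hypothetical r and reports the explained and unexplained percentages. The value 0.9055 is picked only because it gives r^2 of about 0.82. Because squaring removes the sign, r = -0.9055 would give the same result. The function name is hypothetical.

```python
def coefficient_of_determination(r):
    """r squared: proportion of variation in the response variable explained by the explanatory variable."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and +1")
    return r ** 2

r = 0.9055  # hypothetical value of Pearson's r
r2 = coefficient_of_determination(r)
print(f"{r2:.0%} of the variation in the response variable is explained by the explanatory variable")
print(f"{1 - r2:.0%} is due to other factors")
# 82% of the variation in the response variable is explained by the explanatory variable
# 18% is due to other factors
```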
Bivariate Data
define least squares regression line
the line of best fit that minimises the sum of the squared residuals
- a residual tells us how far (vertically) each data point is from the line of best fit
Bivariate Data
how do you know if your residual is + or -?
- data points above the line of best fit have a positive residual
- data points below the line of best fit have a negative residual
- sum of residuals = 0 in a least squares line of best fit
Bivariate Data
what are the assumptions of using a LSRL?
- numerical data
- linear association
- No clear outliers
Bivariate Data
what is the equation of LSRL?
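- y = a + bx (y = response variable, x = explanatory variable)
- gradient: b = r × (s_y / s_x)
- y-intercept: a = ȳ - b × x̄
- where r is Pearson's correlation coefficient, x̄ and ȳ are the means of x and y, and s_x and s_y are their standard deviations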
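A minimal Python sketch of the least squares regression line y = a + bx. It uses the standard formulas b = r × (s_y / s_x) and a = ȳ - b × x̄ with sample standard deviations from Python's `statistics` module. The data and the function name `lsrl` are hypothetical. A CAS or graphics calculator should give the same a and b for the same data.

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b) for the least squares regression line y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    # Pearson's r, written using the sample standard deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sxy / ((len(xs) - 1) * s_x * s_y)
    b = r * s_y / s_x      # gradient
    a = y_bar - b * x_bar  # y-intercept
    return a, b

# Hypothetical data: hours studied (explanatory) vs test score (response)
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
a, b = lsrl(hours, score)
print(f"score = {a:.2f} + {b:.2f} x hours")  # score = 46.90 + 4.50 x hours
```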
Bivariate Data
how do you find LSRL using calculator?
refer to photo
Bivariate Data
what is the formula for calculating residual values?
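- residual = actual value - predicted value (the predicted value comes from the LSRL)
- a residual plot graphs the residual for each data point against the explanatory variable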
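A short Python sketch of residual = actual - predicted. The data are hypothetical, and their LSRL works out to y = 46.9 + 4.5x (b = 45/10, a = 60.4 - 4.5 × 3). The sketch also shows the rules from the earlier card: points above the line have a positive residual, points below have a negative one, and the residuals of a least squares line add to zero apart from rounding error. The function name `residuals` is hypothetical.

```python
def residuals(xs, ys, a, b):
    """residual = actual y value - predicted y value, where predicted = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
a, b = 46.9, 4.5  # LSRL for this hypothetical data: y = 46.9 + 4.5x
res = residuals(hours, score, a, b)

for x, e in zip(hours, res):
    side = "above" if e > 0 else "below" if e < 0 else "on"
    print(x, round(e, 2), side)
print("sum of residuals is approximately 0:", abs(sum(res)) < 1e-9)
```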
Bivariate Data
how do you know if residual plots are linear or non-linear?
- points randomly scattered above and below the zero line (residual = 0) with no clear pattern = linear
- if there is some sort of pattern (e.g. a curve) = non-linear
Bivariate Data
Recall and explain three reasons why causation may not be present?
- common response: when 2 variables are associated because they are both strongly associated with a common third variable
- confounding variables: when there are at least two possible causal explanations for the observed association, but we have no way of knowing their separate effects. The effects of the two possible explanatory variables are said to be confounded because there is no way of knowing which is the actual cause of the association
- coincidence: when it is impossible to identify any feasible confounding variable to explain a particular association
- ie. it happens by chance
how do we describe trends in time series plots?
Ignores short-term fluctuations but reflects the overall trend of the plot
- positive (upward)
- negative (downward)
- constant
can have multiple trends in the one plot
features of cycles
- repeated patterns of peaks and troughs
- each cycle usually lasts longer than a year
describe seasonal fluctuations
- seasonal factors (time of day, day of week, month of year, quarter of year, season of year (winter etc.))
- quarter = Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec
- peaks and troughs consistently occur after the same time interval; e.g. ice-cream sales peak in warmer months and drop away in cooler months
describe outliers
- one-off, unanticipated events; can be difficult to recognise, especially if the data is irregular or seasonal
- including an outlier may be detrimental to forecasting (predicting)
- possible outliers should be investigated before being ‘eliminated’ from data