Test 3: Making sense of data and regressions Flashcards
Making Sense of your Data:
Descriptive and inferential statistics are available across the four types of data (nominal, ordinal, interval, ratio).
Both descriptive and inferential analyses need to be reported in a lab report or scientific study.
Descriptives:
o Measures of central tendency and variability.
o Help the reader get a feel for the data, identify any important characteristics, and identify any errors.
What you can report depends on the type of data you have!
o Descriptives
Categorical = frequency table
Interval (continuous) = Mean, SD, correlations
o Inferential Statistics: otherwise called “planned analyses”. Includes t-tests, ANOVA, and regression. Inference refers to hypothesis testing rather than describing the data (descriptives).
Frequency Distributions:
Primary method for describing categorical data
(A) Table, e.g.: 75% Pakeha, 10% Maori, 6% Pacific Nations, 4% Asian, 5% other
(B) Bar graphs (common) and pie charts (not common) in SPSS.
Note: Jamovi will only produce a histogram or a table; more complicated graphs need to be made in Excel or PowerPoint.
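As a rough illustration only (the course itself uses Jamovi/SPSS rather than code), here is a minimal Python sketch, using pandas and entirely hypothetical category labels, of how a frequency table for categorical data can be produced:

```python
# Minimal sketch: frequency table for a categorical variable.
# The labels below are hypothetical, not the percentages quoted in the notes.
import pandas as pd

ethnicity = pd.Series(
    ["Pakeha", "Pakeha", "Maori", "Pacific Nations",
     "Pakeha", "Asian", "Other", "Maori", "Pakeha"]
)

counts = ethnicity.value_counts()                            # raw frequencies
percentages = ethnicity.value_counts(normalize=True) * 100   # % of sample

freq_table = pd.DataFrame({"n": counts, "%": percentages.round(1)})
print(freq_table)

# A bar graph of the same frequencies (pie charts are less common):
# counts.plot(kind="bar")
```

The normalize=True option gives proportions, which is what the percentage column of a frequency table reports.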
Descriptive Statistics with Interval Data:
> (4) types of central tendency
APA report __ and ___
Include median or mode when…
Central Tendency (does the data cluster in the middle?)
- Mean: the average score, calculated as the sum of all scores divided by the number of individuals.
- Median: the score that divides the group in half (1/2 fall above and 1/2 fall below this value).
- Mode: the most common score.
- Max and Min: the highest and lowest scores in the data set.
> APA: only need to report the mean and standard deviation.
Can report the median or mode if the data are skewed.
Skewed Data: the assumption of normal distribution is not met:
- The mean is not central in the data set; the distribution is skewed to the left or right.
- A long tail to the right (scores piled at the low end) = positive (right) skew.
- A long tail to the left (scores piled at the high end) = negative (left) skew.
Extreme skew example (positive skew: the bulk of scores sits at the very bottom of the scale):
o Mode = 0
o Median = 6
o Mean = 10
*These values are not close together, which indicates skew; the mean is pulled toward the long tail.
Approximately normal distribution example:
o Mean = 2.42
o Median = 2.40
o Mode = 2.40
*Values close together indicate a roughly normal distribution. (See the sketch below.)
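To make the comparison concrete, here is a minimal Python sketch (with hypothetical scores, not the numbers above) showing how the mean, median and mode drift apart when data are skewed but sit close together when the distribution is roughly normal:

```python
# Minimal sketch: mean, median and mode for a skewed versus a
# roughly symmetric set of scores (hypothetical numbers).
import pandas as pd

skewed = pd.Series([0, 0, 0, 0, 1, 2, 6, 9, 14, 28, 40])    # scores pile up at 0
normalish = pd.Series([2.1, 2.3, 2.4, 2.4, 2.4, 2.5, 2.6])

for name, scores in [("skewed", skewed), ("normal-ish", normalish)]:
    print(name,
          "mean =", round(scores.mean(), 2),
          "median =", scores.median(),
          "mode =", scores.mode().iloc[0])

# When mean, median and mode are far apart the data are skewed;
# when they sit close together the distribution is roughly normal.
```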
Analysis of Variance (ANOVA):
IV must be…
DV must be….
Analysis of Variance (ANOVA):
- The IVs of an ANOVA must be categorical (e.g. a dichotomous grouping variable).
- The DVs must be continuous (i.e. interval or ratio).
For example: IV (treatment = 1, control = 0), DV (memory for stimuli).
Can ANOVAs be Performed with Non-Experimental Data?
Yes, in contrast to the widespread assumption that non-experimental data can ONLY be analysed with correlations or regressions.
If you have categorical IV(s) (e.g. dichotomous, coded 0 and 1), you can run an ANOVA on non-experimental data as long as the DV is continuous. For example, survey data are non-experimental and can provide interval (continuous) data.
The reverse is also true: you can conduct an association analysis (correlation/regression) on experimental data.
*There is nothing mathematically stopping you from using either analysis, but there is a tradition of experimental data being analysed with mean differences!
i.e. they fall under the same general linear model. (See the ANOVA sketch below.)
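A minimal sketch of the example above (treatment vs. control predicting memory), using scipy and made-up scores; the same code works whether the grouping variable comes from an experiment or from non-experimental survey data:

```python
# Minimal sketch: one-way ANOVA with a dichotomous IV
# (treatment = 1, control = 0) and a continuous DV
# (hypothetical memory scores).
from scipy import stats

control = [12, 15, 11, 14, 13, 16, 12]    # IV coded 0
treatment = [17, 19, 16, 21, 18, 20, 17]  # IV coded 1

f_stat, p_value = stats.f_oneway(control, treatment)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# With exactly two groups the ANOVA is equivalent to an
# independent-samples t-test (F equals t squared):
t_stat, t_p = stats.ttest_ind(control, treatment)
print(f"t = {t_stat:.2f}, t^2 = {t_stat**2:.2f}, p = {t_p:.4f}")
```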
Correlations measure…
must be both ___ variables or __ and __
assumptions of…
- A correlation is a measure of the degree to which two variables covary.
- Both variables must be continuous data (i.e. interval or ratio),
i.e. they meet the assumption that the data are normally distributed.
- The correlation coefficient (Pearson's r) varies from:
o -1.00 (perfect negative correlation)
o 0.00 (no correlation)
o +1.00 (perfect positive correlation)
- The correlation coefficient is very informative about the extent to which X and Y covary.
Interpreting Results of Correlation (3):
We look at three things:
- Direction:
Positive: increases in X are associated with increases in Y.
Negative: increases in X are associated with decreases in Y.
- Strength:
What is the strength of the relationship between X and Y?
Stronger relationships are closer to +1.00 or -1.00!
- Significance:
Is the correlation significant?
Is the p-value less than .05? (See the sketch below.)
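A minimal Python sketch (scipy, hypothetical X and Y scores) showing how direction, strength and significance are read off a Pearson correlation:

```python
# Minimal sketch: direction, strength and significance of a
# Pearson correlation (hypothetical X and Y scores).
from scipy import stats

x = [2, 4, 5, 7, 8, 10, 11, 13]
y = [1, 3, 4, 4, 6, 7, 9, 10]

r, p = stats.pearsonr(x, y)

direction = "positive" if r > 0 else "negative" if r < 0 else "null"
strength = abs(r)            # closer to 1.00 = stronger relationship
significant = p < .05

print(f"r = {r:.3f} ({direction}), strength = {strength:.3f}, "
      f"p = {p:.3f}, significant = {significant}")
```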
Note: Correlations and regressions are interrelated analyses with similar mathematical equations. However, they have distinct differences that are important to grasp!
Correlation Matrix is a….
A table that summarises a series of correlations between several variables. Each cell in the table shows the correlation between two variables.
Note: 2 variables = 1 correlation coefficient.
Note: 4 variables = each combination of 2 variables produces a correlation coefficient (6 coefficients in total).
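As an illustration, a minimal pandas sketch (hypothetical variables and scores) that produces a correlation matrix; with four variables there are six unique pairwise coefficients above (or below) the diagonal of 1.00s:

```python
# Minimal sketch: a correlation matrix for four hypothetical variables.
# Each cell holds the Pearson r for one pair of variables.
import pandas as pd

df = pd.DataFrame({
    "rumination":  [3, 5, 2, 6, 4, 7, 1, 5],
    "happiness":   [6, 4, 7, 3, 5, 2, 8, 4],
    "sleep_hours": [7, 6, 8, 5, 7, 4, 9, 6],
    "stress":      [4, 6, 3, 7, 5, 8, 2, 6],
})

print(df.corr().round(2))   # default method is Pearson r
```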
Components of a Scatterplot:
o Regression line: reflects the slope of the association between X and Y (e.g. .538, .001, or -.53, depicted visually).
o Regression lines:
Are an estimate.
Are a straight line.
The slope of the regression line indicates whether the association between X and Y is positive, negative, or null.
o Each point (or diamond) reflects one subject's set of scores/observations.
Darker shaded diamonds reflect multiple participants with the same pattern of scores.
Lighter diamonds reflect a single participant's set of scores.
o The grey area around the regression line reflects the amount of variability between the actual scores and the estimate. It is smaller in the middle of the trend line, where the data are clustered and we can be more confident in our estimate.
o Null correlations: visually reflected as a flat regression line.
Generally, we do NOT predict null correlations; they sneak up on us when we predict a positive correlation.
They may be disappointing, but they are still useful. They remind us of a hard fact about life: we do not always get what we expect.
o Labelling the axes:
For correlations, the labelling of the axes is arbitrary; it does not matter which variable is placed on each axis.
In contrast, for regression the predictor variable(s) (IV) have to go on the x-axis and the outcome variable (DV) goes on the y-axis.
Important Fact:
> A zero-order correlation and a basic linear regression will produce the same __ and Pearson's r will equal….
A zero-order correlation and a simple linear regression analysis will produce the same p-value, and r equals the (standardised) beta weight.
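A minimal sketch (scipy, hypothetical data) that checks this fact: the correlation and the simple regression return the same p-value, and the slope of the regression on z-scored variables (the standardised beta) equals Pearson's r:

```python
# Minimal sketch: zero-order correlation vs. simple linear regression
# on the same hypothetical data.
import numpy as np
from scipy import stats

x = np.array([2.0, 4, 5, 7, 8, 10, 11, 13])
y = np.array([9.0, 8, 8, 6, 5, 5, 3, 2])

r, r_p = stats.pearsonr(x, y)

zx = (x - x.mean()) / x.std(ddof=1)   # z-score both variables
zy = (y - y.mean()) / y.std(ddof=1)
reg = stats.linregress(zx, zy)        # slope is the standardised beta

print(f"r = {r:.3f}, p = {r_p:.4f}")
print(f"beta = {reg.slope:.3f}, p = {reg.pvalue:.4f}")  # matches r and its p-value
```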
Residuals:
o The error (residual) is the difference between the predicted y value and the actual y value.
o It is the vertical distance along the y-axis between the actual data point and the predicted data point.
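A minimal sketch (numpy/scipy, hypothetical data) computing residuals as the vertical distance between each actual y value and the value predicted by the regression line:

```python
# Minimal sketch: residuals = actual y minus predicted y
# (hypothetical data).
import numpy as np
from scipy import stats

x = np.array([1.0, 2, 3, 4, 5, 6])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.3, 11.9])

reg = stats.linregress(x, y)
predicted_y = reg.intercept + reg.slope * x
residuals = y - predicted_y            # vertical distance from the line

print(np.round(residuals, 2))
print("Residuals sum to ~0:", round(residuals.sum(), 6))
```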
A zero-order (raw) correlation is:
- A basic correlation that measures the association between exactly two variables and is non-directional (i.e. double-headed arrows).
- There is no “to and from”.
A multiple regression is:
- A regression with three or more variables.
- Multiple regression is sometimes called multiple correlation.
- There can be multiple predictors (IVs) but only ONE outcome (DV).
- Regressions tell us:
o how well the group of IVs predicts the DV,
o AND how well each IV independently predicts the DV. (See the sketch below.)
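A minimal sketch of a multiple regression with two predictors and one outcome, using statsmodels (assumed to be installed) and hypothetical variable names and scores; the model's R² describes the predictors as a group, while the individual coefficients and p-values describe each predictor on its own:

```python
# Minimal sketch: multiple regression with two IVs and one DV
# (hypothetical data and variable names).
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "rumination": [3, 5, 2, 6, 4, 7, 1, 5, 6, 2],
    "stress":     [4, 6, 3, 7, 5, 8, 2, 6, 7, 3],
    "happiness":  [6, 4, 7, 3, 5, 2, 8, 4, 3, 7],
})

X = sm.add_constant(df[["rumination", "stress"]])   # predictors + intercept
model = sm.OLS(df["happiness"], X).fit()

print(model.rsquared)   # how well the group of IVs predicts the DV
print(model.params)     # each IV's own (unstandardised B) contribution
print(model.pvalues)    # significance of each predictor
```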
You report ___ for a correlation but ___ for regressions:
• Correlations are reported as: r(218) = -.533, p < .001
- Pearson's r, the p-value, and sometimes the degrees of freedom.
• A linear regression is reported as the beta weight, R², and p-value.
• B, the constant, and SD are kept separate and are used for graphing.
Write-up example: “The variable of rumination predicted subjective happiness in a linear regression analysis, and it was found to be a statistically significant negative predictor, beta = -.53, R2 = .28, p = .001. As expected, rumination was negatively and significantly predictive of reports of subjective happiness.”