Test 3: Making sense of data and regressions Flashcards
Making Sense of your Data:
Descriptive and inferential statistics are available across the four types of data (nominal, ordinal, interval, ratio).
Both descriptive and inferential analyses need to be reported in a lab report or scientific study.
Descriptives:
o Measures of central tendency and variability.
o Help the reader get a feel for the data, identify any important characteristics, and identify any errors.
What you can report depends on the type of data you have!
o Descriptives
Categorical = frequency table
Interval (continuous) = Mean, SD, correlations
o Inferential Statistics: otherwise called “planned analyses”. Includes t-tests, ANOVA, and regression. Inference refers to hypothesis testing rather than describing the data (descriptives).
Frequency Distributions:
Primary method for describing categorical data
(A) Table, e.g.: 75% Pakeha, 10% Maori, 6% Pacific Nations, 4% Asian, 5% other
(B) Bar graphs (common) and pie charts (not common) in SPSS.
Note: Jamovi will only produce a histogram or a table; more complicated graphs need to be made in Excel or PowerPoint.
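As a rough illustration only (the course itself uses Jamovi/SPSS rather than code), here is a minimal Python sketch, using pandas and entirely hypothetical category labels, of how a frequency table for categorical data can be produced:

```python
# Minimal sketch: frequency table for a categorical variable.
# The labels below are hypothetical, not the percentages quoted in the notes.
import pandas as pd

ethnicity = pd.Series(
    ["Pakeha", "Pakeha", "Maori", "Pacific Nations",
     "Pakeha", "Asian", "Other", "Maori", "Pakeha"]
)

counts = ethnicity.value_counts()                            # raw frequencies
percentages = ethnicity.value_counts(normalize=True) * 100   # % of sample

freq_table = pd.DataFrame({"n": counts, "%": percentages.round(1)})
print(freq_table)

# A bar graph of the same frequencies (pie charts are less common):
# counts.plot(kind="bar")
```

The normalize=True option gives proportions, which is what the percentage column of a frequency table reports.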
Descriptive Statistics with Interval Data:
> (4) types of central tendency
APA report __ and ___
Include median or mode when…
Central Tendency (does the data cluster in the middle?)
- Mean: the average score, calculated as the sum of all scores divided by the number of individuals.
- Median: the score that divides the group in half (1/2 fall above and 1/2 fall below this value).
- Mode: the most common score.
- Max and Min: the highest and lowest scores in the data set.
> APA: only need to report the mean and standard deviation.
Can report the median or mode if the data are skewed.
Skewed Data: the assumption of normal distribution is not met:
- The mean is not central in the data set; the distribution is skewed to the left or right.
- A long tail to the right (scores piled at the low end) = positive (right) skew.
- A long tail to the left (scores piled at the high end) = negative (left) skew.
Extreme skew example (positive skew: the bulk of scores sits at the very bottom of the scale):
o Mode = 0
o Median = 6
o Mean = 10
*These values are not close together, which indicates skew; the mean is pulled toward the long tail.
Approximately normal distribution example:
o Mean = 2.42
o Median = 2.40
o Mode = 2.40
*Values close together indicate a roughly normal distribution. (See the sketch below.)
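To make the comparison concrete, here is a minimal Python sketch (with hypothetical scores, not the numbers above) showing how the mean, median and mode drift apart when data are skewed but sit close together when the distribution is roughly normal:

```python
# Minimal sketch: mean, median and mode for a skewed versus a
# roughly symmetric set of scores (hypothetical numbers).
import pandas as pd

skewed = pd.Series([0, 0, 0, 0, 1, 2, 6, 9, 14, 28, 40])    # scores pile up at 0
normalish = pd.Series([2.1, 2.3, 2.4, 2.4, 2.4, 2.5, 2.6])

for name, scores in [("skewed", skewed), ("normal-ish", normalish)]:
    print(name,
          "mean =", round(scores.mean(), 2),
          "median =", scores.median(),
          "mode =", scores.mode().iloc[0])

# When mean, median and mode are far apart the data are skewed;
# when they sit close together the distribution is roughly normal.
```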
Analysis of Variance (ANOVA):
IV must be…
DV must be….
Analysis of Variance (ANOVA):
- The IVs of an ANOVA must be categorical (e.g. a dichotomous grouping variable).
- The DVs must be continuous (i.e. interval or ratio).
For example: IV (treatment = 1, control = 0), DV (memory for stimuli).
Can ANOVAs be Performed with Non-Experimental Data?
Yes, in contrast to the widespread assumption that non-experimental data can ONLY be analysed with correlations or regressions.
If you have categorical IV(s) (e.g. dichotomous, coded 0 and 1), you can run an ANOVA on non-experimental data as long as the DV is continuous. For example, survey data are non-experimental and can provide interval (continuous) data.
The reverse is also true: you can conduct an association analysis (correlation/regression) on experimental data.
*There is nothing mathematically stopping you from using either analysis, but there is a tradition of experimental data being analysed with mean differences!
i.e. they fall under the same general linear model. (See the ANOVA sketch below.)
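A minimal sketch of the example above (treatment vs. control predicting memory), using scipy and made-up scores; the same code works whether the grouping variable comes from an experiment or from non-experimental survey data:

```python
# Minimal sketch: one-way ANOVA with a dichotomous IV
# (treatment = 1, control = 0) and a continuous DV
# (hypothetical memory scores).
from scipy import stats

control = [12, 15, 11, 14, 13, 16, 12]    # IV coded 0
treatment = [17, 19, 16, 21, 18, 20, 17]  # IV coded 1

f_stat, p_value = stats.f_oneway(control, treatment)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# With exactly two groups the ANOVA is equivalent to an
# independent-samples t-test (F equals t squared):
t_stat, t_p = stats.ttest_ind(control, treatment)
print(f"t = {t_stat:.2f}, t^2 = {t_stat**2:.2f}, p = {t_p:.4f}")
```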
Correlations measure…
must be both ___ variables or __ and __
assumptions of…
- A correlation is a measure of the degree to which two variables covary.
- Both variables must be continuous data (i.e. interval or ratio),
i.e. they meet the assumption that the data are normally distributed.
- The correlation coefficient (Pearson's r) varies from:
o -1.00 (perfect negative correlation)
o 0.00 (no correlation)
o +1.00 (perfect positive correlation)
- The correlation coefficient is very informative about the extent to which X and Y covary.
Interpreting Results of Correlation (3):
We look at three things:
- Direction:
Positive: increases in X are associated with increases in Y.
Negative: increases in X are associated with decreases in Y.
- Strength:
What is the strength of the relationship between X and Y?
Stronger relationships are closer to +1.00 or -1.00!
- Significance:
Is the correlation significant?
Is the p-value less than .05? (See the sketch below.)
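A minimal Python sketch (scipy, hypothetical X and Y scores) showing how direction, strength and significance are read off a Pearson correlation:

```python
# Minimal sketch: direction, strength and significance of a
# Pearson correlation (hypothetical X and Y scores).
from scipy import stats

x = [2, 4, 5, 7, 8, 10, 11, 13]
y = [1, 3, 4, 4, 6, 7, 9, 10]

r, p = stats.pearsonr(x, y)

direction = "positive" if r > 0 else "negative" if r < 0 else "null"
strength = abs(r)            # closer to 1.00 = stronger relationship
significant = p < .05

print(f"r = {r:.3f} ({direction}), strength = {strength:.3f}, "
      f"p = {p:.3f}, significant = {significant}")
```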
Note: Correlations and regressions are interrelated analyses with similar mathematical equations. However, they have distinct differences that are important to grasp!
Correlation Matrix is a….
A table that summarises a series of correlations between several variables. Each cell in the table shows the correlation between two variables.
Note: 2 variables = 1 correlation coefficient.
Note: 4 variables = each combination of 2 variables produces a correlation coefficient (6 coefficients in total).
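As an illustration, a minimal pandas sketch (hypothetical variables and scores) that produces a correlation matrix; with four variables there are six unique pairwise coefficients above (or below) the diagonal of 1.00s:

```python
# Minimal sketch: a correlation matrix for four hypothetical variables.
# Each cell holds the Pearson r for one pair of variables.
import pandas as pd

df = pd.DataFrame({
    "rumination":  [3, 5, 2, 6, 4, 7, 1, 5],
    "happiness":   [6, 4, 7, 3, 5, 2, 8, 4],
    "sleep_hours": [7, 6, 8, 5, 7, 4, 9, 6],
    "stress":      [4, 6, 3, 7, 5, 8, 2, 6],
})

print(df.corr().round(2))   # default method is Pearson r
```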
Components of a Scatterplot:
o Regression line: reflects the slope of the association between X and Y (e.g. .538, .001, or -.53, depicted visually).
o Regression lines:
Are an estimate.
Are a straight line.
The slope of the regression line indicates whether the association between X and Y is positive, negative, or null.
o Each point (or diamond) reflects one subject's set of scores/observations.
Darker shaded diamonds reflect multiple participants with the same pattern of scores.
Lighter diamonds reflect a single participant's set of scores.
o The grey area around the regression line reflects the amount of variability between the actual scores and the estimate. It is smaller in the middle of the trend line, where the data are clustered and we can be more confident in our estimate.
o Null correlations: visually reflected as a flat regression line.
Generally, we do NOT predict null correlations; they sneak up on us when we predict a positive correlation.
They may be disappointing, but they are still useful. They remind us of a hard fact about life: we do not always get what we expect.
o Labelling the axes:
For correlations, the labelling of the axes is arbitrary; it does not matter which variable is placed on each axis.
In contrast, for regression the predictor variable(s) (IV) have to go on the x-axis and the outcome variable (DV) goes on the y-axis.
Important Fact:
> A zero-order correlation and a basic linear regression will produce the same __ and Pearson's r will equal….
A zero-order correlation and a simple linear regression analysis will produce the same p-value, and r equals the (standardised) beta weight.
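A minimal sketch (scipy, hypothetical data) that checks this fact: the correlation and the simple regression return the same p-value, and the slope of the regression on z-scored variables (the standardised beta) equals Pearson's r:

```python
# Minimal sketch: zero-order correlation vs. simple linear regression
# on the same hypothetical data.
import numpy as np
from scipy import stats

x = np.array([2.0, 4, 5, 7, 8, 10, 11, 13])
y = np.array([9.0, 8, 8, 6, 5, 5, 3, 2])

r, r_p = stats.pearsonr(x, y)

zx = (x - x.mean()) / x.std(ddof=1)   # z-score both variables
zy = (y - y.mean()) / y.std(ddof=1)
reg = stats.linregress(zx, zy)        # slope is the standardised beta

print(f"r = {r:.3f}, p = {r_p:.4f}")
print(f"beta = {reg.slope:.3f}, p = {reg.pvalue:.4f}")  # matches r and its p-value
```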
Residuals:
o The error (residual) is the difference between the predicted y value and the actual y value.
o It is the vertical distance along the y-axis between the actual data point and the predicted data point.
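A minimal sketch (numpy/scipy, hypothetical data) computing residuals as the vertical distance between each actual y value and the value predicted by the regression line:

```python
# Minimal sketch: residuals = actual y minus predicted y
# (hypothetical data).
import numpy as np
from scipy import stats

x = np.array([1.0, 2, 3, 4, 5, 6])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.3, 11.9])

reg = stats.linregress(x, y)
predicted_y = reg.intercept + reg.slope * x
residuals = y - predicted_y            # vertical distance from the line

print(np.round(residuals, 2))
print("Residuals sum to ~0:", round(residuals.sum(), 6))
```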
A zero-order (raw) correlation is:
- A basic correlation that measures the association between exactly two variables and is non-directional (i.e. double-headed arrows).
- There is no “to and from”.
A multiple regression is:
- A regression with three or more variables.
- Multiple regression is sometimes called multiple correlation.
- There can be multiple predictors (IVs) but only ONE outcome (DV).
- Regressions tell us:
o how well the group of IVs predicts the DV,
o AND how well each IV independently predicts the DV. (See the sketch below.)
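A minimal sketch of a multiple regression with two predictors and one outcome, using statsmodels (assumed to be installed) and hypothetical variable names and scores; the model's R² describes the predictors as a group, while the individual coefficients and p-values describe each predictor on its own:

```python
# Minimal sketch: multiple regression with two IVs and one DV
# (hypothetical data and variable names).
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "rumination": [3, 5, 2, 6, 4, 7, 1, 5, 6, 2],
    "stress":     [4, 6, 3, 7, 5, 8, 2, 6, 7, 3],
    "happiness":  [6, 4, 7, 3, 5, 2, 8, 4, 3, 7],
})

X = sm.add_constant(df[["rumination", "stress"]])   # predictors + intercept
model = sm.OLS(df["happiness"], X).fit()

print(model.rsquared)   # how well the group of IVs predicts the DV
print(model.params)     # each IV's own (unstandardised B) contribution
print(model.pvalues)    # significance of each predictor
```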
You report ___ for a correlation but ___ for regressions:
• Correlations are reported as: r(218) = -.533, p < .001
- Pearson's r, the p-value, and sometimes the degrees of freedom.
• A linear regression is reported as the beta weight, R², and p-value.
• B, the constant, and SD are kept separate and are used for graphing.
Write-up example: “The variable of rumination predicted subjective happiness in a linear regression analysis, and it was found to be a statistically significant negative predictor, beta = -.53, R2 = .28, p = .001. As expected, rumination was negatively and significantly predictive of reports of subjective happiness.”