PSYU2248 Design & Statistics Flashcards

1
Q

Define mean, median and mode.

A

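The mean is the arithmetic average (the sum of the scores divided by the number of scores), the median is the middle score when the scores are put in order, and the mode is the most frequently occurring score.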
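
As a quick illustration, the three measures can be computed with Python's standard `statistics` module. The scores below are made up for the example and are not course data.

```python
from statistics import mean, median, mode

# Hypothetical scores, chosen only to illustrate the three measures
scores = [2, 3, 3, 5, 7]

print(mean(scores))    # arithmetic average: 20 / 5 = 4
print(median(scores))  # middle value of the ordered scores: 3
print(mode(scores))    # most frequent value: 3
```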

2
Q

What is the difference between an experimental design and a non-experimental design?

A

In an experimental design, the researcher manipulates an independent variable (for example, by comparing a treatment group with a control group) in order to determine its effect on the dependent variable.

A non-experimental design involves no manipulation or control of any variables, so it cannot establish a cause-effect relationship. Typical methods are correlational studies and case studies.

3
Q

What is the difference between correlation and regression?

A

Correlation and regression are inter-related analyses; regression is an extension of correlation.

Regression: predict Y from X
Correlation: the relationship between X and Y

Where X is the IV (predictor)
Where Y is the DV (outcome)

The most commonly used techniques for investigating the relationship between two quantitative variables are correlation and linear regression.

Correlation quantifies the strength of the linear relationship between a pair of variables

Regression expresses the relationship in the form of an equation.
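
To make the distinction concrete, the sketch below computes both quantities for the same small data set. The x and y values are invented for illustration. The correlation r summarises the strength of the linear relationship in one number, while the regression gives an equation (intercept and slope) that can be used for prediction.

```python
from math import sqrt

# Made-up data for illustration only
x = [1, 2, 3, 4, 5]  # IV (predictor)
y = [2, 4, 5, 4, 5]  # DV (outcome)

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sums of cross-products and squares around the means
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

# Correlation: strength and direction of the linear relationship
r = s_xy / sqrt(s_xx * s_yy)

# Regression: the same relationship expressed as an equation
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

print(f"r = {r:.3f}")
print(f"Y-hat = {intercept:.1f} + {slope:.1f} * X")
```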

4
Q

Why do research?

A

Psychological scientists want to
Describe human behaviour (what is going on?)
Predict human behaviour (use information about one thing to make a decent guess at something else)
Explain human behaviour (understand why things are related)
Control human behaviour (make change)

5
Q

What does the regression line predict?

A

A regression line predicts the score of Y for any given value of X.

6
Q

What is the Regression Model?

A

A regression model is able to show whether changes observed in the dependent variable are associated with changes in one or more of the explanatory variables. It does this by essentially fitting a best-fit line and seeing how the data is dispersed around this line.

7
Q

What is the total sum of squares (TSS)?

A

Total Sum of Squares (TSS): the sum of the squared differences between each individual value in a sample and the mean of that sample.

The TSS represents the total degree of variability in the scores (in regression, the total variability in the DV).

Regression then tries to explain that variability: why the scores differ from one another.

8
Q

Regression line - what does it mean if the actual scores are closer to the predicted scores?

A

The model predicts Y better: the closer the actual scores are to the predicted scores, the less variability there is around the line.

9
Q

Regression line - what does it mean if the actual scores are further away from the predicted scores?

A

The model predicts Y worse: the further the actual scores are from the predicted scores, the more variability there is around the line.

10
Q

What is a residual?

A

The difference between the actual Y and the predicted Y (actual minus predicted) for any given value of X.
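
A minimal sketch of the calculation, using invented data and the least-squares line that fits those data (the values and the names INTERCEPT and SLOPE are assumptions for illustration):

```python
# Made-up data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Least-squares line for these particular data: Y-hat = 2.2 + 0.6 * X
INTERCEPT, SLOPE = 2.2, 0.6

predicted = [INTERCEPT + SLOPE * xi for xi in x]

# Residual = actual score minus predicted score
residuals = [yi - pi for yi, pi in zip(y, predicted)]

print([round(e, 2) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

Points above the line have positive residuals and points below it have negative residuals. For a least-squares line the residuals sum to zero (up to rounding).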

11
Q

What is the method of least squares?

A

The method of least squares is a parameter estimation method used in regression analysis. It chooses the intercept and slope that minimize the sum of the squared residuals. Its most important application is data fitting.
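
One way to see what "least" means here is to compare the residual sum of squares for the fitted line with that of slightly different lines. The sketch below does this with invented data; the helper name `rss` is just a label chosen for the example.

```python
# Made-up data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def rss(intercept, slope):
    """Sum of squared residuals for the line Y-hat = intercept + slope * X."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Closed-form least-squares estimates for one predictor
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

print(round(rss(intercept, slope), 2))        # 2.4  (fitted line)
print(round(rss(intercept, slope + 0.1), 2))  # 2.95 (steeper line)
print(round(rss(intercept + 0.5, slope), 2))  # 3.65 (line shifted upwards)
```

Any other straight line through these data gives a larger sum of squared residuals than the fitted one.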

12
Q

The line of best fit is also known as….

A

A regression line

13
Q

What is the regression model (equation)?

A

Ŷ = α + βX
Ŷ is the predicted Y (DV), α is the intercept (the estimated point where the line crosses the y axis) and β is the slope (the steepness or flatness of the line).

14
Q

Describe what each of the following sums of squares measures

1. Total Sum of Squares
2. Residual Sum of Squares
3. Regression or Model Sum of Squares

A

1. Total sum of squares: the total variation of the data points around the mean of the DV
2. Residual sum of squares: the variation of the data points around the regression line (the unexplained variation)
3. Regression sum of squares: the variation explained by the model, equal to the difference between the TSS and the RSS
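
The three quantities can be checked against each other numerically. The sketch below uses invented data and fits the line with the usual closed-form least-squares formulas; the variable names are chosen for the example.

```python
# Made-up data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * xi for xi in x]

# 1. Total: actual scores around the mean
tss = sum((yi - mean_y) ** 2 for yi in y)
# 2. Residual: actual scores around the regression line
rss = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
# 3. Regression (model): predicted scores around the mean
ss_reg = sum((pi - mean_y) ** 2 for pi in predicted)

print(round(tss, 2), round(rss, 2), round(ss_reg, 2))  # 6.0 2.4 3.6
```

Here 6.0 = 2.4 + 3.6, which is the TSS = RSS + regression SS relationship described in the answer.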

15
Q

Define p-value

A

The p-value is defined as the probability, under the assumption of no effect or no difference (the null hypothesis), of obtaining a result equal to or more extreme than what was actually observed.

The p stands for probability. A small p-value means the observed result would be unlikely if the null hypothesis were true (if only chance were operating). It is not the probability that the null hypothesis is true.
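
The definition can be worked through on a toy case. Assume, purely for illustration, a coin flipped 10 times that lands heads 9 times, with the null hypothesis that the coin is fair. The one-sided p-value is the probability of 9 or more heads under that null.

```python
from math import comb

n = 10         # number of flips (hypothetical example)
observed = 9   # observed number of heads

# Under the null hypothesis (fair coin) every sequence of flips is equally
# likely, so P(k heads) = C(n, k) / 2**n.
# "Equal to or more extreme" here means 9 or 10 heads.
p_one_sided = sum(comb(n, k) for k in range(observed, n + 1)) / 2 ** n

print(round(p_one_sided, 4))  # 0.0107
```

A result this extreme would occur about 1% of the time if the coin were fair, which is why it would count as evidence against the null hypothesis at the conventional .05 level.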

16
Q

Define F ratio

A

The F-ratio is the ratio of the between-group variance to the within-group variance. It is compared with a critical F value to decide whether to reject the null hypothesis that there are no differences between the groups.
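
A hand calculation for a one-way design shows where the two variances come from. The three groups below are invented for illustration.

```python
# Three hypothetical groups of scores
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]

all_scores = [s for g in groups for s in g]
grand_mean = sum(all_scores) / len(all_scores)
group_means = [sum(g) / len(g) for g in groups]

# Between-group variance: how far the group means are from the grand mean
ss_between = sum(len(g) * (m - grand_mean) ** 2
                 for g, m in zip(groups, group_means))
df_between = len(groups) - 1
ms_between = ss_between / df_between

# Within-group variance: how far scores are from their own group mean
ss_within = sum((s - m) ** 2
                for g, m in zip(groups, group_means) for s in g)
df_within = len(all_scores) - len(groups)
ms_within = ss_within / df_within

f_ratio = ms_between / ms_within
print(f_ratio)  # 21.0
```

The larger the F-ratio, the more the group means differ relative to the spread of scores inside each group.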

17
Q

What is the correlation coefficient?

A

A numerical measure of a correlation (a statistical relationship between two variables).

It is a number between -1 and +1 that represents the strength and direction of the linear relationship between two variables or sets of data.

18
Q

When dealing with the relationship between two variables, we are concerned with……

A

Correlation

19
Q

What is the difference between correlation and correlation coefficient?

A

The correlation is the relationship between two variables.

The correlation coefficient is the measure of the degree or strength of this relationship.

20
Q

What is the Pearson correlation coefficient?

A

The Pearson correlation coefficient (r) is the most common way of measuring a linear correlation.

It is a number between –1 and 1 that measures the strength and direction of the relationship between two variables.

21
Q

What diagram is best for examining the relationship between two variables?

A

Scatterplot

22
Q

The independent variable sits on which part of the axis?

A

On the x axis (horizontal)

23
Q

The dependent variable sits on which part of the axis

A

On the y axis (vertical)

24
Q

What is another name for a regression line?

A

Line of best fit.

25
Q

What is a regression line / line of best fit?

A

A regression line is an estimate of the line that describes the true, but unknown, linear relationship between the two variables.

26
Q

What is the regression line of y on x?

A

A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of y when x = 0).

27
Q

What does a correlation coefficient of 0 mean?

A

There is no linear relationship between the two variables.

28
Q

What does a correlation coefficient close to 1 mean?

A

There is a strong positive linear relationship between the variables.

29
Q

What does a correlation coefficient close to -1 mean?

A

There is a strong negative linear relationship between the variables.

30
Q

What is a linear relationship?

A

A relationship that is best described by a straight line: the line of best fit (the regression line) is straight.

31
Q

If the best fit line does not show a linear relationship (straight line) it is called a ____________ relationship?

A

Curvilinear relationship.

32
Q

What kind of research questions / research hypotheses lead us to regression (vs correlation)?

A

Research questions that ask for a prediction.

prediction = regression

Regression also fits when the question centres on one particular variable in the data set: that variable is the dependent variable, and we are trying to understand the variability in it. A single outcome (DV) with many IVs is a regression-type question.

If we have two variables and we are looking at the relationship between them, this could be a correlation-type question.

relationship = correlation

33
Q

What is the formula for

A

Ŷ = a + βX

34
Q

In the regression equation Ŷ = a + βX, what is a (where it comes from and what it represents)?

A

The predicted DV score when x = 0, i.e. the y-intercept (where the regression line crosses the y axis).

35
Q

In the regression equation Ŷ = a + βX, what is β (where it comes from and what it represents)?

A

Beta is the slope of the regression line.
It is how much we would predict the DV to change per unit increase in the IV.
Its sign shows whether the relationship is positive or negative.

36
Q

How do we test statistical significance (F vs t)?

A

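The F test is about the whole regression model: whether the model, with one IV (simple regression) or several IVs (multiple regression), significantly predicts the DV.

The t test is about a single IV (a single predictor): whether that predictor's slope differs significantly from zero.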

37
Q

What is R-squared?

A

The proportion of the variation in the DV that is predictable from the IV

The EXPLAINED variance, how much variance in the DV we are explaining in our regression model.
Measure of effect size, how much variance in the DV is explained
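
As a sketch of where the number comes from, R-squared can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the data are assumptions made for this example.

```python
def r_squared(x, y):
    """R-squared for a simple least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    # Unexplained variation around the regression line
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation around the mean of the DV
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - rss / tss

# Made-up data for illustration only
r2 = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"{r2:.0%} of the variance in the DV is explained")  # 60% ...
```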

38
Q

R-squared in linear regression. Suppose you are given a result of 0.327. This is the EXPLAINED variance: the proportion of the variation in the DV that is predicted from the IV.

How could we express this in a sentence if R-squared is 0.327, the DV is PWB (psychological well-being) and the IV is internet addiction?

A

About 33% (32.7%) of the variation in PWB is explained by internet addiction.

That is how much variance in your DV is explained by your IV(s). This is a large effect.

39
Q

In a linear regression analysis, if you have a B (beta) of -0.613, what does this mean?

How could we express this in a sentence if the DV is PWB (psychological well-being) and the IV is internet addiction?

A

B is the slope of the regression line. A negative value means a downward slope: a negative effect.

As internet addiction increases, PWB decreases.

For every one-point increase in the IV (internet addiction), the predicted DV (PWB) decreases by 0.613 units.
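
The "per one-point increase" reading can be checked with a small sketch. The slope is the value from this card; the intercept of 5.0 is hypothetical, because the card does not report one, and the function name is invented for the example.

```python
SLOPE = -0.613    # B from the card
INTERCEPT = 5.0   # hypothetical value: the card does not give the intercept

def predict_pwb(internet_addiction):
    """Predicted PWB from the line Y-hat = INTERCEPT + SLOPE * X."""
    return INTERCEPT + SLOPE * internet_addiction

# Predicted change in PWB when internet addiction goes up by one point
change = predict_pwb(3) - predict_pwb(2)
print(round(change, 3))  # -0.613
```

The change is the same whichever pair of adjacent scores is compared and whatever the intercept is, because the slope alone determines the change per unit of X.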

40
Q

In the linear regression analysis, what do the t value and significance (p) value tell us?

A

The p-value tells us whether we have a statistically significant result.
The p-value corresponds to the t statistic and its degrees of freedom.
Here, the effect of internet addiction is statistically significant: internet addiction significantly predicts PWB.

41
Q

In the linear regression analysis, what do the F value and significance (p) value tell us?

A

Whether the regression model as a whole is significant.

In simple regression (one IV) this tells us the same thing as the t value, because F equals t squared and the two p-values are identical. Here, the effect of internet addiction is statistically significant: internet addiction significantly predicts PWB.
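
With a single predictor, the F value for the model is the square of the t value for the slope. A minimal numerical check, using invented data (not the PWB data from the card):

```python
from math import sqrt

# Made-up data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / s_xx
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * xi for xi in x]

rss = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
ss_reg = sum((pi - mean_y) ** 2 for pi in predicted)

# F test for the whole model: 1 and n - 2 degrees of freedom
ms_residual = rss / (n - 2)
f_value = (ss_reg / 1) / ms_residual

# t test for the single slope: n - 2 degrees of freedom
se_slope = sqrt(ms_residual / s_xx)
t_value = slope / se_slope

print(round(f_value, 3), round(t_value ** 2, 3))  # 4.5 4.5
```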

42
Q

What are the 5 assumptions of residuals?

A

1 Independence of observations (residuals)
2 Normal distribution of residuals
3 Homoscedasticity (AKA constant variance AKA homogeneity of variance)
4 Linearity
5 No collinearity (only applies to multiple regression)

43
Q

Why do we do normality on residuals?

A

The normality assumption in regression is that the residuals are normally distributed.

Caveat: the assumption applies to the residuals rather than to the individual variables. Several variables can feed into a model, but the residuals (how far the data points fall from the regression line) are what the regression model is actually built around, so they are the more important part to check.

44
Q

Reading the data output from the variables table, please identify the:
DV
IV
and
Interpret the pred variable scores
and
Resid variable scores

A

DV - support for redistribution
IV - political preference
These are the variables used to run the regression.

pred variable - the DV value you expect to get for a given value of X.
Example - based on our regression line, if someone had a political preference score of 5, we would predict their support for redistribution to be 3.75 (the predicted value).

  • resid variable - the difference between the actual and the predicted score. Example - person number 7 has a political preference of 4, so we would predict their support for redistribution to be 4.05; their actual score is 4.25, so their residual is 0.20.

Take-home message:
When predicting the DV using scores on the IV, you will never predict anything perfectly. How well we predict scores on the DV from scores on the IV is what the regression model tells us, and it is also what r2 tells us: how much variance in the DV we are actually predicting / explaining purely from scores on the IV. How big the prediction error is for any individual person is what the residual tells us: the discrepancy between their predicted score on Y and their actual score on Y.

45
Q

Define dichotomous variable.

A

Dichotomous variables are nominal variables which have only two categories or levels. For example, if we were looking at gender, we would most probably categorize somebody as either “male” or “female”. This is an example of a dichotomous variable (and also a nominal variable).

46
Q

What is a point-biserial correlation? And what is it typically conducted between?

A

It is typically conducted between a numerical (continuous) variable and a dichotomous variable.

The Point-Biserial Correlation is a special case of the Pearson Correlation and is used when you want to measure the relationship between a continuous variable and a dichotomous variable, or one that has two values (e.g. male/female, yes/no, true/false).
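
Because it is a special case of Pearson's r, the point-biserial correlation can be obtained by coding the dichotomous variable as 0 and 1 and applying the ordinary Pearson formula. The data and the function name below are invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

# Dichotomous variable coded 0/1 (e.g. no/yes), hypothetical data
group = [0, 0, 0, 1, 1, 1]
# Continuous variable, hypothetical scores
score = [1, 2, 3, 4, 5, 6]

r_pb = pearson_r(group, score)
print(round(r_pb, 3))  # 0.878
```

A positive value here means the group coded 1 tends to have higher scores than the group coded 0.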

47
Q

What is the difference between a paired t-test and an independent (unpaired) t-test?

A

A paired t-test compares the means of the same group or item under two separate scenarios, e.g. ratings of the same restaurant before and after COVID.

An independent (unpaired) t-test compares the means of two independent or unrelated groups, e.g. food quality ratings of two different restaurants. In an unpaired t-test, the variance of the two groups is assumed to be equal.

In a paired t-test, the variance is not assumed to be equal.
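
The two tests use the same kind of data differently. The sketch below computes both t statistics by hand for the same invented ratings: the paired version works on the difference scores, while the independent version (equal variances assumed) pools the two group variances.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical ratings, for illustration only
before = [4, 5, 6, 7]
after = [5, 7, 7, 9]

# Paired t-test: treat the scores as pairs from the same restaurants
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
t_paired = mean(diffs) / (stdev(diffs) / sqrt(n))
df_paired = n - 1

# Independent t-test: treat the scores as two unrelated groups
n1, n2 = len(before), len(after)
pooled_var = (((n1 - 1) * variance(before) + (n2 - 1) * variance(after))
              / (n1 + n2 - 2))
t_independent = (mean(after) - mean(before)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
df_independent = n1 + n2 - 2

print(round(t_paired, 3), df_paired)            # 5.196 3
print(round(t_independent, 3), df_independent)  # 1.441 6
```

With these numbers the paired statistic is much larger, because pairing removes the variation between restaurants and leaves only the before/after change.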

48
Q

When would you use a paired t-test?

A

When we are interested in the difference between two variables for the same subject.

49
Q

What are degrees of freedom, how is it calculated and what test does it commonly show up in?

A

Degrees of freedom are the maximum number of logically independent values that are free to vary in a data sample. They are calculated as the sample size minus the number of restrictions (for example, n - 1 for a one-sample t-test). They commonly appear in t-tests, reported alongside the t value.

50
Q

What is an ANOVA, and when is it used?

A

ANOVA, which stands for Analysis of Variance, is a statistical test used to analyse the difference between the means of more than two groups.

A one-way ANOVA uses one independent variable, while a two-way ANOVA uses two independent variables.

51
Q

What is the difference between a t-test and an ANOVA test? When to use them?

A

The t-test is a method that determines whether two populations are statistically different from each other, whereas ANOVA determines whether three or more populations are statistically different from each other.

52
Q

Can you use a t-test over an ANOVA?

A

One of the primary benefits of the ANOVA test is its ability to compare means across three or more groups simultaneously. Instead of conducting multiple t-tests for each pair of groups, ANOVA allows researchers to analyse the variations between all groups in one comprehensive test.

53
Q

Define the correlation coefficient

It can be positive negative or zero, what does each of those mean?

If there is a positive correlation coefficient, what does this mean and what is the disclaimer?

What does the size of the correlation coefficient say about the two variables?

A

The correlation coefficient quantifies the relationship between two variables.

It can be positive (as one variable increases, so does the other), negative (as one variable increases, the other decreases) or zero (no linear relationship).

Disclaimer: just because two things are correlated does not mean that one caused the other.

The size of the correlation coefficient tells us the strength of the relationship between the two variables.

54
Q

What type of question would lead us down the path of looking at relationships between variables?

A

Questions about correlation: whether there is an association between the two variables.

54
Q

What kind of questions or research hypothesis would lead us to do a regression vs a correlation type analysis?

A

If the research question asks for a prediction, this leads to a regression analysis. If we are trying to understand variability in the dependent variable, and we are using information on one or more other variables to predict or explain it, then we might have a regression-type analysis. A single outcome variable with a number of IVs could be a regression-type question.

If the research question asks about a relationship, this leads to a correlational analysis. If we are looking at two variables and looking for an association or a relationship between them, that could be a correlation-type question.

55
Q

What is the regression formula and what does each character represent?

A

Ŷ = a + βx
* Ŷ is the predicted score on the DV
* a is the intercept: the predicted DV score when x = 0
* β is the slope (gradient) of the regression line
o how much we would expect the DV to change per unit increase in the IV
o positive or negative: if the dependent variable increases as the independent variable increases, the relationship is positive; if the dependent variable decreases as the independent variable increases, the relationship is negative
* x is the IV or explanatory variable

56
Q
A