Statistics definitions Flashcards

1
Q

Cronbach α (Alpha)

A

Cronbach’s alpha is a measure of internal consistency, that is, how closely related a set of items are as a group. It is considered to be a measure of scale reliability.

Cronbach's α can be used to assess scale reliability, with α > .70 indicating satisfactory reliability. Sometimes, to improve Cronbach's α, items are removed from a subscale.
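
A minimal Python sketch (an illustration, not part of the original card) of the usual Cronbach's α formula, using hypothetical item scores; the helper name cronbach_alpha is made up here.

import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an array of shape (respondents, items)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                           # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 5 respondents answering a 3-item subscale
scores = [[4, 5, 4], [3, 3, 4], [5, 5, 5], [2, 3, 2], [4, 4, 5]]
print(cronbach_alpha(scores))  # > .70 would indicate satisfactory reliability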

2
Q

Null hypothesis: H0

A

In scientific research, the null hypothesis is the claim that no relationship exists between two sets of data or variables being analysed. The null hypothesis is that any experimentally observed difference is due to chance alone, and an underlying causative relationship does not exist, hence the term “null”.

3
Q

Hypothesis: H1

A

A hypothesis is a proposed explanation for a phenomenon. For a hypothesis to be a scientific hypothesis, the scientific method requires that one can test it. Scientists generally base scientific hypotheses on previous observations that cannot satisfactorily be explained with the available scientific theories.

4
Q

Z-score

A

Z-score is a statistical measurement that describes a value’s relationship to the mean of a group of values. Z-score is measured in terms of standard deviations from the mean. If a Z-score is 0, it indicates that the data point’s score is identical to the mean score.
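
A minimal Python sketch (illustrative, with made-up scores) of z = (score − mean) / SD:

import numpy as np

scores = np.array([50, 60, 70, 80, 90], dtype=float)
z = (scores - scores.mean()) / scores.std(ddof=1)  # deviations from the mean in SD units
print(z)  # a z-score of 0 means the score equals the mean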

5
Q

Cohen's d

A

Cohen's d is a standardized effect size for measuring the difference between two group means. Frequently, you'll use it when you're comparing a treatment group to a control group. It can be a suitable effect size to include with t-test and ANOVA results. The field of psychology frequently uses Cohen's d (a computation sketch follows the effect-size guidelines below).

Effect sizes

  • 0.2 (small)
  • 0.5 (medium)
  • 0.8 (large)
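
A minimal Python sketch (illustrative only, with made-up data and a made-up helper name cohens_d) of the pooled-SD formula for Cohen's d:

import numpy as np

def cohens_d(group1, group2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    g1, g2 = np.asarray(group1, float), np.asarray(group2, float)
    n1, n2 = len(g1), len(g2)
    pooled_var = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)
    return (g1.mean() - g2.mean()) / np.sqrt(pooled_var)

# Hypothetical treatment vs control scores
treatment = [23, 25, 28, 30, 27]
control = [20, 22, 24, 25, 21]
print(cohens_d(treatment, control))  # ~0.2 small, ~0.5 medium, ~0.8 large
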
6
Q

Confidence Interval

A

A confidence interval is a range of values we are fairly sure the true value lies in. It is your estimate plus or minus the variation in that estimate: the range of values you expect your estimate to fall between if you redo your test, at a certain level of confidence. Confidence, in statistics, is another way to describe probability. Overall, a 95% confidence interval is the range in which you would expect to find the true value in 95 out of 100 repeated samples.
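
A minimal Python sketch (made-up sample data) of a 95% confidence interval for a mean, computed as mean ± t × SE:

import numpy as np
from scipy import stats

sample = np.array([5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2])
mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))    # standard error of the mean
t_crit = stats.t.ppf(0.975, df=len(sample) - 1)   # two-tailed 95% critical value
print(mean - t_crit * se, mean + t_crit * se)     # range expected to contain the true mean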

7
Q

Bivariate

A

involving or depending on two variates.

8
Q

Pearson correlation coefficient ( r )

A

Pearson's correlation coefficient (r) measures the strength of the relationship between two variables, as well as the direction of that relationship on a scatterplot. The value of r is always between −1 and +1.

.50 high correlation/large effect
.30 moderate correlation/moderate effect
.10 low correlation/low effect

It can also be used to quantify the difference in means between two groups.
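
A minimal Python sketch (made-up data) computing r with scipy:

from scipy import stats

hours_studied = [1, 2, 3, 4, 5, 6]
exam_score = [52, 55, 61, 70, 72, 80]
r, p = stats.pearsonr(hours_studied, exam_score)  # r is bounded between -1 and +1
print(r, p)  # r around .50 or above would be a large effect by the guidelines above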

9
Q

Correlation Coefficients

A
  • Bounded between -1 and 1.
    A correlation coefficient is a numerical measure of some type of correlation, meaning a statistical relationship between two variables. The variables may be two columns of a given data set of observations, often called a sample, or two components of a multivariate random variable with a known distribution. There are multiple correlation coefficients, which include Covariance, Pearson's, Spearman's, and the Polychoric Correlation Coefficient.
  • It’s worth bearing in mind that r is not measured on a linear scale, so an effect with r = 0.6 isn’t twice as big as one with r = 0.3
10
Q

Heteroscedasticity

A

Heteroskedasticity (or heteroscedasticity) happens when the standard deviations of a predicted variable, monitored over different values of an independent variable or as related to prior time periods, are non-constant.

11
Q

Normal distribution

A

A function that represents the distribution of many variables as a symmetrical, bell-shaped graph.

12
Q

Regression to the mean:

A

In statistics, regression toward the mean (also called reversion to the mean, and reversion to mediocrity) is a concept that refers to the fact that if one sample of a random variable is extreme, the next sampling of the same random variable is likely to be closer to its mean.

13
Q

MCAR – Missing completely at random

A

We can test whether data are missing completely at random with a statistical test, Little's missing completely at random (MCAR) test. If the p value from the test is < .05, the data are not MCAR, and listwise deletion of these cases may lead to biased/inaccurate conclusions. Imputation is justified when Little's MCAR p value is < .05.

14
Q

Categorical variable

A

In statistics, a categorical variable (also called qualitative variable) is a variable that can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property.
Examples:
- Gender
- Blood type
- Language you speak
- Country you were born in, etc.
My words: a categorical variable is limited because it has a fixed set of possible values.

15
Q

Variance:

A

The variance is the average squared deviation of scores from the mean. It is the sum of squares divided by the number of scores. It tells us how widely scores are dispersed around the mean.
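
A minimal Python sketch (made-up scores) of the sum-of-squares-over-N form:

import numpy as np

scores = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)
deviations = scores - scores.mean()
variance = (deviations ** 2).sum() / len(scores)  # sum of squares / number of scores
print(variance, scores.var())                     # matches numpy's population variance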

16
Q

Covariance:

A

Covariance is a measure of the relationship between two random variables and the extent to which they change together. In other words, it describes how changes in one variable are associated with changes in the other variable.
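
A minimal Python sketch (made-up data) of the sample covariance:

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 6], dtype=float)
cov = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)  # sample covariance
print(cov, np.cov(x, y)[0, 1])  # same value from numpy's covariance matrix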

17
Q

b0: (B zero)

A

b0:
- b0 is the Y intercept:
My words: the point where the regression line crosses the Y axis.

18
Q

b1:

A

b1:
- The slope of the line: the predicted change in Y for each one-unit change in X.

19
Q

Type I error:

A
  • Supporting the alternate hypothesis when the null hypothesis is
    true.
  • Incorrectly rejecting the H0 (Null hypothesis)

My words: rejecting the H0 when it is true.

20
Q

Type II error:

A
  • Not supporting the alternate hypothesis when the alternate
    hypothesis is true.
  • Failing to reject the H0 (Null hypothesis)

My words: not rejecting the H0 when it is actually false.

21
Q

Simple regression

A
  • Similar to looking at relationship between 2 variables.
  • Looking at the predictive power of 1 variable
  • Predicting the number of Instagram likes based on the number of followers = regression
  • Predicting weight based off height.
22
Q

Predictor variable and Predicted variable

Mnemonic Narrative technique - X = crossing arms. Y = asking Y do I

A

Predictor variable = Independent Variable = X
Predicted variable = Dependent Variable = Y

How I remember (side note: This is a mnemonic called ‘Narrative technique’)
X = Crossing your arms, as you are independent and can predict what others will do.
Y = Asking why (Y) do I depend on others; everyone seems to predict what I'm going to do.

23
Q

Linear equation of Y’ = bX + c

Mnemonic Narrative = If we know Y (Why) the X is growing we may know how the Y may grow

A
  • Where Y’ is the predicted variable, X is the predictor variable, b is the slope of the line and c is the intercept (i.e., value of Y’ when X = 0).
    This formula can assist in predicting the Y variable from the X variable's unstandardized coefficient.
    Overall, this equation uses the X variable's unstandardized coefficient (b, the slope) and the constant (c, the intercept); a worked sketch follows below.
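
A minimal Python sketch (made-up height/weight data) of fitting and using Y' = bX + c:

from scipy import stats

height = [150, 160, 165, 170, 180, 185]   # X, the predictor variable
weight = [52, 60, 63, 68, 75, 82]         # Y, the predicted variable
result = stats.linregress(height, weight)
b, c = result.slope, result.intercept      # b = slope, c = intercept (Y' when X = 0)
print(b, c, b * 175 + c)                   # Y' predicted for X = 175
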
24
Q

standardized scores

A

A score is “standardized” when it is expressed as a deviation from its mean, divided by the SD. As such, a score of z = +1 means that it is 1 SD higher than the mean

Standardized scores also help keep the body of scientific research comparable. For example, a study measuring weight and height in metric units and another study using inches and pounds can both report their findings as standardized scores, rather than as imperial and metric findings.
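
A minimal Python sketch (made-up heights) showing that standardizing removes the units:

import numpy as np

heights_cm = np.array([160.0, 170.0, 180.0, 190.0])
heights_in = heights_cm / 2.54                 # the same people measured in inches

def standardize(x):
    return (x - x.mean()) / x.std(ddof=1)      # deviation from the mean, in SDs

print(standardize(heights_cm))
print(standardize(heights_in))                 # identical z-scores despite different units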

25
Q

Multiple regression

A

Multiple regression is a statistical technique that can be used to analyse the relationship between a single dependent variable and several independent variables. The objective of multiple regression analysis is to use the independent variables whose values are known to predict the value of the single dependent value.

To have multiple regression you need at least 2 independent variables (predictors).

26
Q

Multiple Regression model - Y' = b1X1 + b2X2 + … + biXi + a

A

Y’ Y predicted by the weighted linear combination of Xsbi( partial regression coefficient of Xi ) predicted average unit change in Y for each unit change in Xi when other Xs are held constant a Y predicted when all Xs are zero.
- This formula is like the linear equation, the only difference is that this equation has other multiple independent variables attached to it. Given this you can predict the dependant variable based on either of the independent variables.
- You can also standardise this equation by implementing the Z scores
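
A minimal Python sketch (made-up data, using the statsmodels library) of a multiple regression with two predictors:

import numpy as np
import statsmodels.api as sm

# Hypothetical data: predict exam score (Y) from hours studied (X1) and hours slept (X2)
X = np.array([[2, 6], [3, 7], [5, 5], [6, 8], [8, 7], [9, 6]], dtype=float)
y = np.array([55, 60, 62, 75, 80, 83], dtype=float)

X = sm.add_constant(X)                  # adds the constant 'a'
model = sm.OLS(y, X).fit()
print(model.params)                     # a, b1, b2 (partial regression coefficients)
print(model.rsquared, model.f_pvalue)   # R² and the p value of the overall F test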

27
Q

Test of significance: F (p, N-p-1) =

A
  • With larger effects and larger samples, you are more likely to get a significant F.
  • Fewer predictors = more power
28
Q

Multicollinearity.

Mnemonic: multiple independent variables can predict me

A
  • A problem as highly correlated independent variables violates the assumption for multiple regression.
  • In statistics, multicollinearity (also collinearity) is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy.

Multicollinearity (i.e., redundancy) refers to a very high degree of intercorrelation among predictors (Xs).
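
A minimal Python sketch (made-up, deliberately redundant predictors; uses the statsmodels variance_inflation_factor helper) for checking multicollinearity:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Two hypothetical predictors that are nearly redundant (highly intercorrelated)
x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
x2 = x1 * 2 + np.random.default_rng(0).normal(0, 0.1, size=8)
X = sm.add_constant(np.column_stack([x1, x2]))

# A VIF well above ~10 is a common rule of thumb for problematic multicollinearity
for i in range(1, X.shape[1]):
    print(variance_inflation_factor(X, i))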

29
Q

Mahalanobis Distance

Mnemonic: Outlier detector

A

Mahalanobis distance is one of the most used indicators in statistical methods designed to detect outlying residual scores
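
A minimal Python sketch (made-up bivariate data) computing each case's Mahalanobis distance from the centroid:

import numpy as np
from scipy.spatial.distance import mahalanobis

# Hypothetical data: rows are cases, columns are two variables; the last case is unusual
data = np.array([[2, 4], [3, 5], [4, 6], [5, 7], [6, 8], [10, 2]], dtype=float)
centre = data.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(data, rowvar=False))

for row in data:
    print(mahalanobis(row, centre, inv_cov))  # large distances flag potential outliers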

30
Q

R2 and r: what are they?

A

r: the strength of the correlation between X and Y.
R2: with multiple variables, it reflects the overall overlap (the variance in Y explained) in a multiple regression.

31
Q

p value

A

The P value is defined as the probability under the assumption of no effect or no difference (null hypothesis), of obtaining a result equal to or more extreme than what was actually observed. The P stands for probability and measures how likely it is that any observed difference between groups is due to chance.

If the p value is less than 0.05, the finding is significant.

If the p value is greater than 0.05, the finding is non-significant.

In a multiple regression analysis in SPSS, Sig. F Change = p value.

The p value is used to test whether the finding is statistically significant, i.e., whether to reject the null hypothesis.
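
A minimal Python sketch (made-up groups) obtaining a p value from an independent-samples t-test:

from scipy import stats

group_a = [23, 25, 28, 30, 27, 26]
group_b = [20, 22, 24, 25, 21, 23]
t, p = stats.ttest_ind(group_a, group_b)
print(t, p)  # p < .05 -> significant; p > .05 -> non-significant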

32
Q

What does adjusted R2 in SPSS mean?

A

This is a conservative version of R2, adjusted for the number of predictors, that is typically used when the sample size is small.
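
A minimal Python sketch (made-up numbers; the helper name adjusted_r2 is mine) of the usual adjustment formula:

def adjusted_r2(r2, n, p):
    """Adjusted R2: penalises R2 for the number of predictors p relative to sample size n."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adjusted_r2(r2=0.40, n=30, p=3))  # lower than the raw R2 of .40, especially for small n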

33
Q

R2 and r effect sizes

A

.01 = Small effect (1% of variance)

.09 = Medium effect (9% of variance)

.25 = Large effect (25% of variance)

34
Q

Light bulb moment

A

In statistics, different tests measure different things. For example, the r value measures the strength of the correlation between 2 variables, hence a test regarding the variables. The p value tests whether the finding is significant or not (whether to reject the null hypothesis). So, let's say p = 0.1, we can be confident that…

35
Q

Listwise deletion

A

Only use data that is complete on all variables.
Reduces sample size.
In listwise deletion, a case is dropped from an analysis because it has a missing value on at least one of the specified variables. The analysis is only run on cases which have a complete set of data.
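
A minimal Python sketch (made-up questionnaire data, using pandas) of listwise deletion:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "anxiety": [3, np.nan, 5, 2],
    "stress":  [4, 6, np.nan, 3],
    "sleep":   [7, 5, 6, 8],
})
complete_cases = df.dropna()             # listwise deletion: keep only fully complete rows
print(len(df), len(complete_cases))      # sample size is reduced from 4 to 2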

36
Q

Casewise deletion

A

It involves removing all questionnaire answers from a respondent because responses to one or more questions are missing from that respondent
Reduces sample size

37
Q

Alpha = 0.05

A
  • The alpha level gives the probability of making a Type I error.
  • If you reduce the alpha to 0.01, there is more of a chance you won't reject the H0, which could cause a Type II error.
  • The significance level or alpha level is the probability of making the wrong decision when the null hypothesis is true. Alpha levels (sometimes just called "significance levels") are used in hypothesis tests. Usually, these tests are run with an alpha level of .05 (5%), but other levels commonly used are .01 and .10.
  • The alpha level (not to be confused with Cronbach's α, which measures reliability) is the criterion used to decide whether to reject the H0 (null hypothesis).
  • You would need to set the criterion for alpha, say at 0.05, which says that we want there to be less than a 5% chance of rejecting the H0 when it is actually true.
  • An alpha level of 0.05 (two-tailed) corresponds to a critical Z-score of ±1.96.
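
A minimal Python sketch (using scipy) of the alpha-to-critical-Z relationship:

from scipy import stats

alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)   # two-tailed critical Z for alpha = .05
print(z_crit)                            # about 1.96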