Week 1: The Descriptive Stats of Outcomes - How is the Data Distributed and How can we Assess the Distribution Flashcards by Gitanjali Sharma

What distribution is needed for parametric tests?

A normal distribution

The normal distribution curve is also referred to as the

bell curve

A normal distribution is symmetrical, meaning

the distribution curve can be divided in the middle to produce two equal halves

The bell curve can be described using two parameters called (2)

Mean (central tendency)
Standard deviation (dispersion)

μ is

mean

σ is

standard deviation

Diagram shows:

e.g., if we move 1σ to the right of the mean, the region between μ and μ + 1σ contains 34.1% of the values
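
The 34.1% figure on this card can be checked numerically. The sketch below is only an illustration, assuming a standard normal distribution (μ = 0, σ = 1); the helper names `normal_cdf` and `area_between` are made up for this example.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability of a standard normal variable up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def area_between(z_low: float, z_high: float) -> float:
    """Proportion of values falling between two z-scores."""
    return normal_cdf(z_high) - normal_cdf(z_low)


# From the mean to one standard deviation to the right.
one_sigma_right = area_between(0.0, 1.0)
# Within one standard deviation on either side of the mean.
within_one_sigma = area_between(-1.0, 1.0)

print(round(one_sigma_right * 100, 1))   # 34.1
print(round(within_one_sigma * 100, 1))  # 68.3
```

Because the curve is symmetrical, the area within one standard deviation on both sides is simply twice the one-sided area.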

Many statistical tests (parametric) cannot be used if the data are not

normally distributed

The mean is the sum of

scores divided by the number of scores

Mean is a good measure of

central tendency for roughly symmetric distributions

The mean can be a misleading measure of central tendency in skewed distributions as

it can be greatly influenced by scores in the tail, e.g., extreme values

Aside from the mean, what are the 2 other measures of central tendency? - (2)

Median
Mode

The median is (2)

the middle score when scores are ordered.

the middle of a distribution: half the scores are above the median and half are below the median.

The median is relatively unaffected by … and can be used with… (2)

extreme scores or skewed distributions
ordinal, interval and ratio data

The mode is the most

frequently occurring score in a distribution, a score that actually occurred

The mode is the only measure of central tendency that can be used with

nominal data

The mode is greatly subject to

sample fluctuations and is therefore not recommended to be used as the only measure of central tendency

Many distributions have more than one

mode

The mean, median and mode are identical in

symmetric (single-peaked) distributions

For a positively skewed distribution, the

mean is greater than the median, which is greater than the mode

For a negatively skewed distribution

usually the mode is greater than the median, which is greater than the mean
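
The ordering of mean, median and mode under skew can be seen on a small made-up data set. The scores below are hypothetical, chosen only so that a long tail sits on one side; this is a sketch using Python's standard `statistics` module, not data from the course.

```python
import statistics

# Hypothetical scores with a long right tail (positive skew).
positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror image of the same scores: long left tail (negative skew).
negative_skew = [16 - score for score in positive_skew]

for label, scores in [("positive", positive_skew), ("negative", negative_skew)]:
    mean = statistics.mean(scores)      # sum of scores / number of scores
    median = statistics.median(scores)  # middle score once ordered
    mode = statistics.mode(scores)      # most frequently occurring score
    print(label, mean, median, mode)

pos_mean = statistics.mean(positive_skew)
pos_median = statistics.median(positive_skew)
pos_mode = statistics.mode(positive_skew)
neg_mean = statistics.mean(negative_skew)
neg_median = statistics.median(negative_skew)
neg_mode = statistics.mode(negative_skew)
```

In the positively skewed list the extreme scores (9 and 15) pull the mean above the median and mode; in the mirrored list the low extreme scores pull the mean below them.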

Kurtosis in Greek means

bulge or bend

What is central tendency?

the tendency for the values of a random variable to cluster round its mean, mode, or median.

Diagram of normal kurtosis, positive excess kurtosis (leptokurtic) and negative excess kurtosis (platykurtic)

What does lepto mean?

prefix meaning thin

What does platy mean?

a prefix meaning flat or wide (think Plateau)

Tests of normality (2)

Kolmogorov-Smirnov test
Shapiro-Wilk test

Tests of normality are dependent on

sample size

If you have a massive sample size, you will find these normality tests often come out as ... even when your data visually look ... - (2)

significant
normally distributed

If you have a small sample size, the normality tests may come out non-significant, even when the data are not normally distributed, due to

lack of power in the test to detect a significant effect

There is no hard and fast rule for

determining whether data is normally distributed or not

Plot your data because this helps inform on what decisions you want to make with respect to

normality

Even if the normality test is significant, if the data look visually normally distributed then still do

parametric tests

A frequency distribution or a histogram is a plot of how many times

each score occurs

2 main ways a distribution can deviate from the normal - (2)

1. Lack of symmetry (called skew)
2. Pointiness (called kurtosis)

In a normal distribution the values of skew and kurtosis are 0 meaning...

tails of the distribution are as they should be

Is age nominal or continuous?

Continuous

Is gender continuous or nominal?

Nominal

Is height continuous or nominal?

Continuous

Which of the following best describes a confounding variable?
A. A variable that affects the outcome being measured as well as, or instead of, the independent variable
B. A variable that is manipulated by the experimenter
C. A variable that has been measured using an unreliable scale
D. A variable that is made up only of categories

If a test is valid, what does it mean?
A. The test measures what it claims to measure.
B. The test will give consistent results. (Reliability)
C. The test has internal consistency (a measure of the correlations between different items on the same test, to see whether they measure the same construct)
D. The test measures a useful construct or variable = a test can measure something useful but still not be valid

A variable that measures the effect that manipulating another variable has is known as:
A. DV
B. A confounding variable
C. Predictor variable
D. IV

The discrepancy between the numbers used to represent something that we are trying to measure and the actual value of what we are measuring is called:
A. Measurement error
B. Reliability
C. The 'fit' of the model
D. Variance

A frequency distribution in which low scores are most frequent (i.e. bars on the graph are highest on the left hand side) is said to be:
A. Positively skewed
B. Leptokurtic = distribution with positive kurtosis
C. Platykurtic = negative kurtosis
D. Negatively skewed = frequent scores

Which of the following is designed to compensate for practice effects?
A. Counterbalancing
B. Repeated measures design = practice effects are an issue in repeated measures
C. Giving participants a break between tasks = this compensates for boredom effects
D. A control condition = provides a reference point

Variation due to variables that have not been measured is
A. Unsystematic variation
B. Homogeneous variance = the assumption that the variance in each population is equal
C. Systematic variation = due to the experimental manipulation
D. Residual variance = indicates how well the constructed regression line fits the actual data

The purpose of a control condition is to
A. Allow inferences about cause
B. Control for participants' characteristics = randomisation
C. Show up the relationship between predictor variables
D. Rule out the tertium quid

A. Allow inferences about cause

If the scores on a test have a mean of 26 and a standard deviation of 4, what is the z-score for a score of 18?
A. -2
B. 11
C. 2
D. -1.41

A. z = (18 - 26) / 4 = -8 / 4 = -2
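
The working on this card is the usual z-score calculation, z = (score - mean) / standard deviation. A minimal sketch follows; the function name `z_score` is chosen here for illustration.

```python
def z_score(score: float, mean: float, sd: float) -> float:
    """Number of standard deviations a score lies from the mean."""
    return (score - mean) / sd


# Values from the card: mean 26, standard deviation 4, score 18.
z = z_score(18, 26, 4)
print(z)  # -2.0
```

A negative z-score means the score lies below the mean, here by two standard deviations.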

The standard deviation is the square root of the
A. Variance
B. Coefficient of determination = r squared
C. Sum of squares = sum of squared deviances
D. Range = largest - smallest

Complete the following sentence: A large standard deviation (relative to the value of the mean itself)
A. Indicates that the data points are distant from the mean (i.e., a poor fit of the data)
B. Indicates that the data points are close to the mean
C. Indicates that the mean is a good fit of the data
D. Indicates that you should analyse the data with a parametric test

The probability is p = 0.80 that a patient with a certain disease will be successfully treated with a new medical treatment. Suppose that the treatment is used on 40 patients. What is the "expected value" of the number of patients who are successfully treated?
A. 32
B. 20
C. 8
D. 40

A = 80% of 40 is 32 (0.80 * 40)
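
The same arithmetic can be written as expected value = number of patients × probability of success. This is a small sketch; the function name `expected_successes` is an assumption made for this example.

```python
def expected_successes(n_patients: int, p_success: float) -> float:
    """Expected number of successes when each patient succeeds with probability p."""
    return n_patients * p_success


# Values from the card: 40 patients, p = 0.80.
expected = expected_successes(40, 0.80)
print(round(expected))  # 32
```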

Imagine a test for a certain disease. Suppose the probability of a positive test result is .95 if someone has the disease, but the probability is only .08 that someone has the disease if his or her test result was positive. A patient receives a positive test, and the doctor tells him that he is very likely to have the disease. The doctor's response is:
A. Confusion of the inverse
B. Law of small numbers
C. Gambler's fallacy
D. Correct, because the test is 95% accurate when someone has the disease = incorrect, as the doctor based the assumption on an incorrect inverse probability
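
To see how P(positive | disease) = .95 and P(disease | positive) = .08 can both be true, Bayes' rule can be applied. The card does not give the prevalence or the false-positive rate, so the values 0.01 and 0.11 below are assumptions picked only because they reproduce roughly .08; treat this as an illustrative sketch.

```python
def prob_disease_given_positive(sensitivity: float,
                                prevalence: float,
                                false_positive_rate: float) -> float:
    """Bayes' rule: P(disease | positive test)."""
    true_positives = sensitivity * prevalence
    false_positives = false_positive_rate * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


sensitivity = 0.95          # from the card: P(positive | disease)
prevalence = 0.01           # ASSUMED: 1% of people have the disease
false_positive_rate = 0.11  # ASSUMED: P(positive | no disease)

posterior = prob_disease_given_positive(sensitivity, prevalence, false_positive_rate)
print(round(posterior, 2))  # 0.08
```

Under these assumed values most positive results come from the large group of people without the disease, which is why the two conditional probabilities differ so much.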

Which of these variables would be considered not to have met the assumptions of parametric tests based on the normal distribution? (Hint: many statistical tests rely on data measured at the interval level)
A. Gender
B. Reaction time
C. Temperature
D. Heart rate

The test statistics we use to assess a linear model are usually _______ based on the normal distribution (Hint: these tests are used when all of the assumptions of a normal distribution have been met)
A. Parametric
B. Non-parametric
C. Robust
D. Not

Which of the following is not an assumption of the general linear model?
A. Dependence
B. Additivity
C. Linearity
D. Normally distributed residuals

A = independence, not dependence, is an assumption of parametric tests

Looking at the table below, which of the following statements is the most accurate? (Hint: the further the values of skewness and kurtosis are from zero, the more likely it is that the data are not normally distributed)
A. For the number of hours spent practising, there is not an issue of kurtosis
B. For level of musical skill, the data are heavily negatively skewed
C. For the number of hours spent practising, there is an issue of kurtosis
D. For the number of hours spent practising, the data are fairly positively skewed

A. Correct
B. Incorrect, as the value of skewness is -0.079, which suggests that the data are only very slightly negatively skewed because the value is close to zero
C. Incorrect, as the value of kurtosis is 0.098, which is fairly close to zero, suggesting that kurtosis was not a problem for these data
D. Incorrect, as the value of skewness for the number of hours spent practising is -0.322, suggesting that the data are only slightly negatively skewed

Diagram of skewness

In SPSS output, if the value of skewness is between -1 and 1 then

all good

In SPSS output, if the value of skewness is below -1 or above 1 then

data is skewed

In SPSS output, if the value of skewness is below -1 then

negatively skewed

In SPSS output, if the value of skewness is above 1 then

positively skewed

Diagram of leptokurtic, platykurtic and mesokurtic (normal)

What does kurtosis tell you?

how much our data lies around the ends/tails of our histogram which helps us to identify when outliers may be present in the data.

A distribution with positive kurtosis, where much of the data is in the tails, will be

pointy or leptokurtic

A distribution with negative kurtosis, where the data lies more in the middle, will be more

flat (gently sloped) or platykurtic

Kurtosis is the sharpness of the

peak of a frequency-distribution curve

If our Kurtosis value is 0, then the result is a

normal distribution

If the kurtosis value in SPSS is between -2 and 2 then

all good! = normal distribution

If the kurtosis value in SPSS is less than -2 then

platykurtic

If the kurtosis value in SPSS is greater than 2 then

leptokurtic

Are we good for skewness and kurtosis in this SPSS output?

Good because the skewness is between -1 and 1 and the kurtosis values are between -2 and 2.

Are we good for skewness and kurtosis in this SPSS output?

Bad because although the skewness is between -1 and 1, we have a problem with kurtosis: its value of 2.68 is larger than 2
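
The rules of thumb on these cards (skewness between -1 and 1, kurtosis between -2 and 2) can be written as a small check. The sketch below uses simple moment-based formulas for skewness and excess kurtosis; SPSS reports bias-corrected versions, so its values will differ somewhat, especially in small samples. All function names are chosen for this example.

```python
def central_moment(values, order):
    """Average of (x - mean) ** order over all values."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)


def skewness(values):
    """Moment-based skewness: 0 for a perfectly symmetric sample."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


def describe_skew(value):
    """Apply the -1 to 1 rule of thumb from the cards."""
    if value < -1:
        return "negatively skewed"
    if value > 1:
        return "positively skewed"
    return "acceptable"


def describe_kurtosis(value):
    """Apply the -2 to 2 rule of thumb from the cards."""
    if value < -2:
        return "platykurtic"
    if value > 2:
        return "leptokurtic"
    return "acceptable"


# Values quoted on the cards.
print(describe_skew(-0.322))    # acceptable
print(describe_kurtosis(2.68))  # leptokurtic
```

The thresholds are conventions rather than strict tests, so they are best read alongside a histogram of the data.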

Correlational research doesn't allow us to rule out the presence of a

third variable = confounding variable, e.g., we find that drownings and ice cream sales are correlated and conclude that ice cream sales cause drowning. Are we correct? Maybe not: both may be due to the weather

The tertium quid is a variable that you may not have considered that could be

influencing your results, e.g., the weather in the ice cream and drowning example

How can the tertium quid be ruled out? - (2)

Use of RCTs
Randomized Controlled Trials allow confounding variables to be evened out between the groups

Correlation does not mean

causation

To infer causation,

we need to actively manipulate the variable we are interested in, and control against a group (condition) where this variable was not manipulated.

Correlation does not mean causation because, according to Andy,

“causality between two variables cannot be assumed because there may be other measured or unmeasured variables affecting the results”

Aside from checking the skewness and kurtosis of the data, also check whether it has

linearity or less commonly additivity

Additivity refers to the combined

effect of many predictors

What does this diagram show in terms of additivity/linearity? - (5)

There is a linear effect when the data increase at a steady rate, like the graph on the left.
Your cost increases steadily as the number of chocolate bars increases.
The graph on the right shows a non-linear effect: there is no steady increase but rather a sharp change in the data.
So you might feel OK if you eat a few chocolate bars, but after that the risk of a stomach ache increases quite rapidly the more chocolates you eat.
This effect is important to check, or your statistical analysis will be wrong even if your other assumptions are met, because a lot of statistical tests are based on linear models.

Discrepancy between the measurement and the actual value in the population is ... and not ...

measurement error and NOT variance

Measurement error can happen across all psychological experiments from.. to ..

recording instrument failure to human error

What are the 2 types of measurement errors? - (2)

1. Systematic
2. Random

What is systematic measurement error?

predictable, typically constant or proportional to the true value, and always affecting the results of an experiment in a predictable direction

Example of systematic measurement error

for example, if I know I am 5ft2 but am told I am 6ft when I go to get measured, this is a systematic error and is pretty identifiable. These usually happen when there is a problem with your experiment.

What is random measurement error?

measurable values being inconsistent when repeated measures of a constant attribute or quantity are taken.

Example of random measurement error

for example, my height is 5ft2 when I measure it in the morning but 5ft when I measure myself in the evening. Because the measurements were taken at different times, there is some variability (for those of you who believe you shrink throughout the day).

What is variance?

Average squared deviation of each number from the mean.

Variability is an inherent part of

things being measured and of the measurement process

Diagram of variance formula
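
The definition on the previous card (average squared deviation from the mean) can be written out directly. The scores below are made up for illustration. Note that dividing by N gives the population variance, while software such as SPSS typically reports the sample variance, which divides by N - 1.

```python
import math
import statistics

# Hypothetical scores, used only to illustrate the calculation.
scores = [2, 4, 4, 4, 5, 5, 7, 9]


def population_variance(values):
    """Average squared deviation of each value from the mean (divide by N)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


variance = population_variance(scores)
standard_deviation = math.sqrt(variance)  # SD is the square root of the variance
sample_variance = statistics.variance(scores)  # divides by N - 1 instead

print(variance)            # 4.0
print(standard_deviation)  # 2.0
```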

The central limit theorem - (2)

states that the sampling distribution of the mean approaches a normal distribution as the sample size increases. This holds especially well for sample sizes over 30.
Therefore, as the sample size increases, the sample mean and standard deviation will be closer in value to the population mean μ and standard deviation σ.
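
A small simulation can make this concrete. The sketch below assumes a skewed (exponential) population with mean 1, draws many samples of size 5 and size 30, and compares how tightly the sample means cluster around the population mean. The seed, sample sizes and number of samples are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable


def sample_means(sample_size, n_samples=2000):
    """Means of repeated samples drawn from an exponential population (mean 1)."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


means_5 = sample_means(5)
means_30 = sample_means(30)

# Spread of the sampling distribution of the mean (the standard error).
spread_5 = statistics.stdev(means_5)
spread_30 = statistics.stdev(means_30)

print("average of sample means (n=30):", statistics.fmean(means_30))
print("spread with n=5:", spread_5)
print("spread with n=30:", spread_30)
```

With larger samples the sample means sit closer to the population mean, and a histogram of `means_30` should look much more bell-shaped than the skewed population the scores were drawn from.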

What does a histogram show? - (2)

Frequency of scores
The distribution of the data, including skewness and kurtosis

What is a boxplot used for? - (2)

To identify outliers
Shows the median rather than the mean (good for non-normally distributed data)

What are line graphs?

simply bar charts with lines instead of bars

Bar charts are a good way to display

means (and standard errors)

What does a scatterplot illustrate? - (2)

A relationship between two variables, e.g. correlation or regression
Only use regression lines for regressions!

What are matrix scatterplots? - (2)

A particular kind of scatterplot that can be used instead of the 3-D scatterplot
Clearer to read

Using the data provided, how would you summarise skew?
A. The data has an issue with positive skew
B. The data has an issue with negative skew
C. The data is normally distributed

What is the median number of bullets shot at a partner by females?

67.00

What descriptive statistic does the red arrow represent?
A. Interquartile range
B. Median
C. Mean
D. Range

What is the mean for males and the SD for females? - (2)

Males M = 27.29
Females SD = 12.20

What are the respective standard errors of the mean for females and males?

3.26 & 3.42

(103 cards)