All topics Flashcards
What makes a good theory?
- Falsifiability
- Parsimony (elegance of theory – simplest explanation is best)
- coherence
- correspondence with reality (more likely to have a high pay-off)
Reliability of Measures
- Test-retest – administering a test twice
- inter-rater reliability – extent to which 2 raters (judges) obtain the same result using the same measure
- Split-half reliability – a test is split in 2 and the scores from each half are compared with each other
Validity of measures:
- Face validity – the extent to which an assessment measures the variable/construct it purports to measure
- Content validity – the extent to which a measure covers all aspects of the construct it is meant to assess
- Construct validity – 2 types:
– convergent – when 2 tests that purport to measure the same thing are highly related
– divergent (discriminant) – tests that measure different but related constructs should not be highly correlated (eg. IQ for spatial v reading)
Research method:
- Experimental
- quasi-experimental – (manipulation of IV but cannot randomly assign participants) eg. male v female, smoker v non-smoker. Cannot draw cause-and-effect conclusions
- Correlational
What are the different kinds of research design?
- Between subjects – different participants assigned to each condition
- within subjects or repeated measures design – each participant exposed to both conditions
- matched pairs – different participants assigned to each group but matched on particular characteristics
What is difference between descriptive and inferential statistics?
- Descriptive statistics – summarise data eg. mean, median, mode, variance, SD,
- Inferential statistics – help us test hypotheses. Allow us to make generalisations about populations of interest based on samples eg. correlation, regression, ANOVA
Define:
- Mean
- median
- Mode
- Reliability
- Validity
Mean – average
Median – the middle score in a distribution
Mode – score that occurs the most often
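Python's standard-library `statistics` module can illustrate all three on a small made-up set of scores:

```python
import statistics

scores = [2, 3, 3, 5, 7]
mean_val = statistics.mean(scores)      # (2+3+3+5+7)/5 = 4
median_val = statistics.median(scores)  # middle of the sorted scores = 3
mode_val = statistics.mode(scores)      # most frequent score = 3
print(mean_val, median_val, mode_val)
```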
Reliability v validity
Reliability – consistency of a measure
Validity – accuracy of a measure (measures what it purports to measure)
How do you find the median with an even number of scores?
- add the two middle scores and divide by 2 – ie. average them
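`statistics.median` does exactly this averaging when the number of scores is even – a quick sketch with made-up scores:

```python
import statistics

scores = [3, 5, 7, 9]            # even number of scores
med = statistics.median(scores)  # averages the two middle scores: (5 + 7) / 2
print(med)  # 6.0
```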
Describe different scales of measurement
- Nominal – consists of categories with no underlying scale or order. Eg. religious affiliation – Christian, Buddhist, Hindu, Muslim etc.
- Ordinal – Consists of categories that are ORDERED, but don’t know what the distance is between ranks (ie. The distance between scale values is unknown). Eg. police ranks.
- Interval – Meaningful distances between points on the scale eg. temperature. Interval scales lack a true zero point (zero does not mean the absence of the quantity – you can still feel temperature at zero degrees)
- Ratio – All the characteristics of an interval scale plus a true zero point – weight and length are examples
Discrete v continuous variable
Discrete – takes on whole numbers only
Continuous – can take any fractional or non-whole value
Shape of Distribution
- normal – bell shaped
- positively skewed – tail pointing to the right
- negatively skewed – tail pointing to the left
Research ethics (1q)
- informed consent
- voluntary participation
- passive deception (don’t tell whole truth but don’t tell lies)
- active deception (deliberately mislead the participant with information)
- withdrawal anytime
Central Tendency (3 q’s)
the tendency for the values of a random variable to cluster around its mean, median, or mode
mean/median/mode – which would be the best to use?
- mean is affected by outliers and can be skewed
- median – less affected by outliers and skewed data
- mode (most frequent) – normally used for categorical data – problematic when 2 categories share the highest frequency
- not a good measure when the most common value is far away from the rest of the data in the set
- when data is skewed – the median is the best representative of the central location of the data
Type of variable and best measure of central tendency?
Nominal - Mode
Ordinal - Median
Interval/Ratio (not skewed) - Mean
Interval/Ratio (skewed)- Median
Population v sample
Population – all the individuals of interest
- population values are called parameters
Sample – the individuals selected from the population used in study
- sample values are called statistics
Sampling error
the discrepancy between population parameter and sample statistic
What is the relationship between sample statistics and population parameters?
A sample is a part or portion of a population
- parameter is a measure of describing whole population
- statistic is a measure of a sample/portion of a target population
What is standard deviation?
a measure of variability – how spread out are the scores?
Variability 3 (qs) What does SS denote?
- sum of squares = sum of squared deviation from the mean
Variability?
- how much scores vary from each other and from the mean
Variance
- the average of the squared differences from the mean
Standard deviation?
- numerical depiction of variability
- under a normal distribution 68% of scores fall within ±1 SD of the mean (95.44% within 2 SD, 99.73% within 3 SD)
Define and describe the relationship between variance and SD?
- As variance increases so does standard deviation
- low variability in data set = low standard deviation
What are the degrees of freedom?
In a sample, N-1 scores are free to vary. For example, if we have a sample of 3 scores and we know the first 2 scores and the mean, we know what the 3rd score must be. So 2 scores are free to vary but the third is not, thus N-1.
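The "last score is fixed" idea can be shown in a couple of lines (made-up numbers):

```python
# With a known mean, N-1 scores are free to vary; the last is determined.
mean = 10
n = 3
known = [8, 12]                # the first N-1 scores (free to vary)
third = mean * n - sum(known)  # the last score is fixed by the mean
print(third)  # 10
```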
Why do we adjust degrees of freedom in a sample?
We use n-1 because of sampling error – the sample may not be representative of the population
What is a z-score?
- a standardised score (transformation of distribution of raw scores into z-score distribution)
- Z-score will always have mean of 0 and SD of 1
- Z-score is expressed in standard deviation units
What do you need to know to calculate z-score?
X – individual score
M – sample mean
SD – sample standard deviation
If you convert all the raw scores to z-scores what do you get?
- mean = 0
- SD = 1
- distribution is same shape as before ALWAYS (ie. Still normal/skewed etc.)
Benefit? – allows you to compare scores from different distributions
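A minimal sketch of the z-transformation on made-up scores (using the population SD here for simplicity):

```python
import statistics

scores = [50, 60, 70, 80, 90]
m = statistics.mean(scores)     # 70
sd = statistics.pstdev(scores)  # population SD, used here for simplicity
z = [(x - m) / sd for x in scores]
# A z-score distribution always has mean 0 and SD 1
print(round(statistics.mean(z), 10), round(statistics.pstdev(z), 10))  # 0.0 1.0
```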
What does a z-score of +1 mean? What does z-score of -2 mean?
- The score is one SD above mean
- the score is 2 SD below the mean
Why do we hypothesis test?
- To get around heuristics (mental shortcuts – availability/representative) and human biases (hindsight/cognitive)
What is a theory?
- a ‘model’ that describes how certain phenomenon work
What is a hypothesis?
- A statement derived from a theory or theories about the relationship between variables or differences between groups
What is the null hypothesis and alternative hypothesis?
null - states there is no effect
alternative - states there is a difference
Error Types
Type I – Reject the null hypothesis when it is TRUE (false positive) (alpha - 5% chance)
Type II – Accept the null hypothesis when it is FALSE (false negative) (beta – 20% chance)
Type III – (only applicable to a directional hypothesis; H1) – predicting the inverse of a true relationship
What does p < .05 mean?
In NHST p < .05 means that there is less than a 5% chance of obtaining the results (or more extreme) if the null hypothesis were true
What factors affect the p-value?
- Size of mean differences – larger differences increase the probability of rejecting the null
- Variability of scores – greater variability decreases the probability of rejecting the null
- Sample size – a larger sample increases the probability of rejecting the null
What is correlation?
- when 2 variables are related to each other a correlation exists
- measures the relationship between 2 variables
- correlation allows prediction, NOT causation
What is the correlation coefficient? (r)
- numerical index of strength and direction of relationship
- expressed as a number between -1 and +1
- direction can be positive or negative (as one goes up the other goes up OR as one goes up the other goes down)
- magnitudes closer to 1 indicate a stronger relationship
What does positive/negative/no correlation look like on scatter plot?
- positive – slopes up from left to right
- negative – slopes down from left to right
- no correlation – no pattern (ill defined scatter)
Perfect correlation
- perfect linear relationship – every change in x is accompanied by a corresponding change in y variable
What is small/medium/large correlation? (coefficient (r))
- small – 0.1 to 0.3
- medium – 0.3 to 0.5
- large – 0.5 to 1.0
What is the coefficient of determination?
- is the correlation coefficient squared
- the percentage variation in one variable that can be predicted based on the other variable
- as the magnitude of the correlation increases, our ability to predict one variable based on knowledge of the other variable increases
Calculate the coefficient of determination from r = .70 and what does the result mean?
r² = .70 × .70 = .49
Means that variable X can account for 49% of the variation in variable Y
The higher the correlation coefficient the higher the coefficient of determination will be
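A quick sketch with scipy's `pearsonr` on made-up scores – squaring r gives the coefficient of determination:

```python
from scipy.stats import pearsonr

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r, p = pearsonr(x, y)
r_squared = r ** 2  # coefficient of determination
print(round(r, 2), round(r_squared, 2))  # 0.8 0.64 -> 64% of variance in y predictable from x
```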
What is the 3rd variable problem?
- As correlation is a prediction not a causation, the observed relationship may be accounted for by some other third variable eg. size of foot might be strongly correlated to IQ in children, but the 3rd variable – age may account for the relationship
What are the assumptions for correlations?
- Independence – each participant should participate only once in the research and should not influence the participation of others
- Normality – each variable should be normally distributed – ie. Data form a symmetrical bell-shaped curve about the mean. To assess normality we can look at Skewness and kurtosis
- Linearity – should be a linear (straight line) relationship between the variables. If the relationship is not linear it will not be adequately captured and summarised by Pearson’s r.
- Homoscedasticity – the error variance is assumed to be the same at all points along the linear relationship. That is the variability in one variable should be similar across all values of the other variable.
What is skewness?
- is a measure of symmetry of distribution
- when the skewness statistic is 0 the distribution is perfectly symmetrical
What is kurtosis?
- how peaked or flat is the distribution
- a kurtosis statistic of 0 (plus a skewness statistic of 0) indicates the distribution is normally distributed.
How to calculate normality?
- Use the Shapiro-Wilk test if N is lower than 50 – if p > .01 the null hypothesis is not rejected, indicating the distribution is normal (use the Kolmogorov-Smirnov test for N = 50+)
- or divide the skewness/kurtosis statistic by its standard error – if the result falls within ±3.29 the distribution is normal; if it falls outside this range it is not
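A sketch of these checks with scipy on a small made-up sample (N < 50, so Shapiro-Wilk applies):

```python
from scipy import stats

scores = [4.9, 5.1, 5.0, 4.8, 5.2, 5.0, 4.7, 5.3, 5.1, 4.9]  # N < 50
W, p = stats.shapiro(scores)      # Shapiro-Wilk test for small samples
skewness = stats.skew(scores)     # 0 for a perfectly symmetrical distribution
kurt = stats.kurtosis(scores)     # excess kurtosis; 0 for a normal curve
print(p > .01)  # True here: no evidence against normality
```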
What is the difference between bivariate correlation and partial correlation?
Bivariate – used to measure the linear association between 2 continuous variables
Partial – used to measure the linear association between 2 continuous variables after controlling for a third (and fourth, fifth etc.) continuous variable
What is the t-statistic?
t = the actual difference between the sample mean (from the data) and the population mean (hypothesised from H0) / the estimated standard error (the estimated standard distance between sample mean and population mean)
How to calculate probability in one sample t-test?
In a one-sample t-test the sample mean is compared with a predetermined value (the test value – the population mean under H0)
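A minimal one-sample t-test sketch with scipy on made-up scores, assuming a test value of 50:

```python
from scipy import stats

sample = [51, 49, 52, 53, 50, 48, 54, 52]
test_value = 50  # the predetermined value (population mean under H0)
t_obt, p = stats.ttest_1samp(sample, popmean=test_value)
print(t_obt > 0)  # True: the sample mean (51.125) is above the test value
```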
What is the definition of degrees of freedom?
- the number of scores that can vary given a constant mean
Eg. If you only have a set amount of money to pay 50 people in your company you could allow 49 of them to set their own salary but the unlucky last would have to get a very small salary or even pay to work in your company to make up for 49 that gave themselves high pay. - thus only 49 (N-1) could vary but one is fixed by total amount
What is the t-distribution table showing?
- The numbers are the values of t that separate the tail from the main body of the distribution.
How to find if result statistically significant from t-distribution table?
T-critical value is found in table, t-obtained value is calculated from results – if t-obtained is greater than value found in table – results are statistically significant.
two types of 2 sample t-test:
- Independent samples – aka independent groups, between groups, between subjects
- Paired Samples t-test – aka repeated measures, within samples, matched samples, dependent samples
Relationship between t-statistic, sample variance and statistical significance:
- When the variance increases, so does the standard error. Since the standard error occurs in the denominator of the t statistic, when the standard error increases, the value of the t decreases.
- when the t-statistic decreases there is less probability of getting a t-obtained greater than t-critical, so the p-value increases. So when variance increases the p-value increases.
independent sample t-test
- uses a between-groups design – compares sample mean 1 with sample mean 2
- t = the actual or observed difference divided by the estimated standard error
- when n for sample 1 is not equal to n for sample 2, the variance must be pooled!
- to pool the variance: (sum of squared deviations from the mean in sample 1 + SS for sample 2) / (degrees of freedom for sample 1 + degrees of freedom for sample 2)
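The pooled-variance formula in a few lines, with hypothetical sums of squares and degrees of freedom:

```python
def pooled_variance(ss1, ss2, df1, df2):
    """Pooled variance = (SS1 + SS2) / (df1 + df2)."""
    return (ss1 + ss2) / (df1 + df2)

# hypothetical values for two samples
sp2 = pooled_variance(ss1=50, ss2=30, df1=9, df2=7)
print(sp2)  # (50 + 30) / (9 + 7) = 5.0
```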
Independent Samples t-test assumptions:
- Scale of measurement – DV should be interval or ratio data
- Independence – Each participant should participate only once in the research, and should not influence the participation of others
- Normality – each group of scores should be approximately normally distributed
- Homogeneity of variance – There should be an approximately equal amount of variability in each set of scores
How does adjusting the degrees of freedom make it harder to detect a significant result?
- looking at the t-distribution table – the smaller the df = larger t-critical value. This means you would need a larger t-obtained value to get a statistically significant result. So harder to get a statistically significant result with smaller df.
What is Cohen’s D?
- A measure of effect size
- Measures the extent to which the 2 sample distributions overlap – measured in standard deviations. If Cohen’s d were zero there would be complete overlap between the 2 populations
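A sketch of Cohen's d for two equal-sized made-up groups, assuming the pooled SD is taken as the square root of the average of the two sample variances (valid when n is equal):

```python
import statistics

group1 = [10, 12, 11, 13, 14]
group2 = [15, 17, 16, 18, 19]
m1, m2 = statistics.mean(group1), statistics.mean(group2)
# pooled SD: square root of the average of the two sample variances (equal n)
sp = ((statistics.variance(group1) + statistics.variance(group2)) / 2) ** 0.5
d = (m2 - m1) / sp  # effect size in standard deviation units
print(round(d, 2))  # 3.16
```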
How to measure homogeneity?
- Using Levene’s test.
- For this course, if Levene’s test gives p < .001 the assumption of homogeneity has been violated – the result is significant, meaning there is a difference in variance between the groups.
- If p > .001 the result is not significant and there is no difference in variance between the groups.
PAIRED SAMPLES t TEST
- Measure participants before and after treatment
- Compare the performance (DV) of males and females (IV) that are matched on particular criteria
How to report results APA style:
participants had a higher recall score in the images condition (M = 26.00, SD = 4.71) than the no images condition (M = 18.00, SD = 4.22), t(9) = 3.24, p = .010 (two-tailed).
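The statistics in that APA sentence come from a paired-samples t-test; a sketch with scipy on hypothetical recall scores for the same participants in two conditions:

```python
from scipy import stats

# hypothetical recall scores, same participants in both conditions
images    = [28, 24, 30, 22, 26]
no_images = [20, 18, 22, 15, 19]
t_obt, p = stats.ttest_rel(images, no_images)
print(t_obt > 0, p < .05)  # higher recall in the images condition
```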
Notion of pooled variance
- when the sample size of the first sample differs from that of the second (n1 is not equal to n2), the variance is pooled to eliminate the disparity
Difference between one sample and independent t-test
One-sample t-test – the actual difference between the sample mean and the population mean versus the difference expected by chance (error).
Independent t-test – the actual difference between 2 sample means versus the difference expected by error.
Why is it called a one-way ANOVA?
- because there is one independent variable which might have 2 or more groups or levels.
Eg. IV – temperature had 3 groups/levels at 50 degrees, 70 degrees or 90 degrees
Why use ANOVA?
- to compare more than two treatments
- Anova = Analysis of variance – all about variance
What is sample variance?
Sample variance is the sum of squares divided by N-1. To get the sum of squares: subtract the mean from each score, square each deviation, and sum them.
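The same calculation step by step on made-up scores, checked against the standard library:

```python
import statistics

scores = [2, 4, 6, 8]
m = statistics.mean(scores)             # 5
ss = sum((x - m) ** 2 for x in scores)  # sum of squared deviations = 20
sample_var = ss / (len(scores) - 1)     # SS / (N - 1)
print(ss, round(sample_var, 3))
```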
What is the key to understanding ANOVA?
- understanding between group variability versus within group variability
What does the alternative hypothesis state in ANOVA?
- that one or more pairs of treatment means will be different from each other
What is the total variability made up of in ANOVA?
- Between-treatment variance, which is due to:
- treatment effects
- chance
- Within-treatment variance, which is due to:
- chance/individual differences
What is the F value and how do you calculate it?
F is a ratio of variability: F = variance between treatments / variance within treatments
If the null hypothesis is true, the treatment effect is 0, so F = (0 + error (individual differences + other)) / error (individual differences + other) = 1
So if the null hypothesis is true, F will be 1 or close to 1; if there is a treatment effect, F will be greater than 1.
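A sketch of a one-way ANOVA with scipy, using three hypothetical temperature groups where the between-group differences are large relative to within-group variability:

```python
from scipy import stats

# three temperature conditions (hypothetical scores)
g50 = [50, 52, 48, 51]
g70 = [70, 68, 72, 69]
g90 = [90, 88, 91, 89]
F, p = stats.f_oneway(g50, g70, g90)
print(F > 1, p < .05)  # large between-treatment variance -> F well above 1
```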
How to calculate total number of participants and number of groups from df?
- Total – go to total df then +1 = total number of participants
- look at df for between groups and +1 = total number of groups
What are the assumptions for One-way between groups ANOVA?
- Scale of measurement – DV should be interval or ratio data
- Independence – each participant only participates once
- Normality – each group of scores should be approximately normally distributed
- Homogeneity of Variance – there should be approximately equal amount of variability in each set of scores
Why do we have to use a post-hoc test with ANOVA and what is most commonly used in psychology?
- ANOVA can tell us there is a difference but doesn’t tell us where the difference lies.
- Tukey post-hoc test is most often used
When would we use a 2-way ANOVA?
When we have more than one independent variable in the analysis
Example of 2 way ANOVA
Example used is whether the effects of puppet-type are the same for binge eaters and non-binge eaters. – 3 different kinds of puppet – cookie monster, counts and warrens
- each of these are either – binge eaters or non-binge eaters
= 3 x 2 factorial ANOVA (6 factorial combinations)
- DV is how many cookies eaten
Can you use a 1-way ANOVA when you have 2 independent variables?
No, must use 2-way ANOVA when have 2 factors
When does a significant interaction effect occur in 2 way ANOVA?
- An interaction occurs when the effect of one factor on the dependent variable is not the same at all levels of the other. You can tell this by looking at the graph: if the lines are not parallel there is an interaction effect
Assumptions of 2way ANOVA
- Independence
- Population distributions are normal
- Homogeneity of variance (s2)
What is partial ETA?
A measure of effect size
What is effect size?
- A measure of the magnitude of a treatment effect
- independent of sample size
- can be standardised – measured in standard deviations. (how many standard deviations are the means apart)
- referred to as d or cohen’s d
How is effect size measured in a correlation?
= r (or Pearson’s r) – effect size is the strength of the association between 2 variables
- ranges between -1.0 and +1.0
- Small: r between -.10 and +.10
- Medium: between .10 and .40
- Large: r > .40
What is r ²?
- is the proportion of variance explained – the proportion of variance in one variable that can be explained by the other variable. r² is like an apple with a bite taken out of it: a small bite accounts for a small percentage of the variability in the apple, while many bites that leave only a core account for a much larger percentage of the variability explained.
what are the conventions of r²?
0.01 < r² < 0.09 - small
0.09 < r² < 0.25 - medium
r² > 0.25 - large
Large effect is 25% of the variability in the dependent variable can be explained by the independent variable. What’s the chance of a single independent variable accounting for 100% of the variability in the DV? – not high as normally there are multiple factors accounting for the variability.
What is Cohen’s f?
It is commonly used in power analysis and is the ratio of variance explained to variance unexplained.
- similar to η² and R²
- used in power analysis to answer: what sample size will you need to detect a statistically significant effect?
What is power?
- the probability of correctly detecting a statistically significant effect if one exists.
How do you calculate power?
Power is determined jointly by effect size (ES), sample size (N), and the alpha level
What are the error types?
Type I – Reject the null hypothesis when the null hypothesis is true (False positive)
Type II – Accept the null hypothesis when it is false (False negative)
Type III – predicting the inverse of a TRUE relationship
What is the type I error rate set at ?
It is set at the alpha level of p < .05 – meaning the probability of a type I error is 5%. (so the type I error rate is also alpha)
So what is power?
1 − beta, where beta is the type II error rate (accepting the null hypothesis when it is false)
Cohen has set power at .80 – an 80% power of detecting an effect if it exists.
What is the relationship between Type I error, Type II error and statistical power?
When we increase α, we decrease β and increase our statistical power. This is because increasing alpha moves the significance threshold closer to the centre of the distribution, enlarging the rejection region; this shrinks the area of beta (under the normal distribution) and increases statistical power.
- When beta decreases, power increases.
- When alpha increases, beta gets smaller and power gets larger
What is power determined by?
- sample size
- effect size
- P-level
What will result in an increase in power? *
- large effect size (large differences between means and small SD)
- Large sample size (as standard error decreases and sample mean closer to pop mean)
- High alpha level (.05 or .10) (the larger the alpha the bigger the rejection region, the more chance to reject the null hypothesis, high alpha – high power)
- One-tail test
- Within subjects design (as decreases variability – any variability in the DV is not due to individual differences)
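Two of the factors above (effect size and sample size) can be seen in a small simulation – a sketch that estimates power for an independent-samples t-test by repeatedly drawing made-up samples:

```python
import random
from scipy import stats

random.seed(1)

def sim_power(n, effect, alpha=0.05, reps=1000):
    """Estimate power of an independent-samples t-test by simulation."""
    hits = 0
    for _ in range(reps):
        a = [random.gauss(0, 1) for _ in range(n)]       # control group
        b = [random.gauss(effect, 1) for _ in range(n)]  # treatment group
        _, p = stats.ttest_ind(a, b)
        if p < alpha:
            hits += 1
    return hits / reps

# A larger sample size raises power for the same effect size (d = 0.5)
power_small_n = sim_power(n=20, effect=0.5)
power_large_n = sim_power(n=64, effect=0.5)
print(power_small_n < power_large_n)  # True
```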
What decreases power?*
- Small Effect size (small associations or small differences between means and large SD)
- Small N (less likely to be representative of the population)
- Low alpha level (.01 or .001) (the smaller the alpha, the smaller the rejection region – few scores will fall in it = less power)
- two-tail test
- Between subject design (more variability due to individual differences – error is obscuring the treatment effect making it harder to detect)
What is the relationship between alpha and z-score?
If we increase alpha (from .05 to .10) we reduce the critical z-score (the cutoff moves closer to the mean)
What happens to power with directional v non-directional hypothesis?
With a non-directional (two-tailed) hypothesis the critical rejection region is split between both tails. This means there is .025 probability in each tail (a smaller alpha level per tail); with a smaller alpha per tail you get a larger critical z-score, and with a larger critical z-score you get less power. A directional (one-tailed) test puts the whole .05 in one tail, so it has more power.
If you are completing a research topic and know that a particular effect size is small what can you do to increase power?
- One-tailed hypothesis
- Increase sample size to large
- Have a within subjects design (as limits variability or error between groups)
CHI SQUARE – Non-parametric tests for nominal data
What is meant by non-parametric?
- Distribution-free tests
- Are not normally distributed – may be positively/negatively skewed
- may display kurtosis:
- Mesokurtic (normal distribution)
- Leptokurtic (long-tailed, like a kangaroo)
- Platykurtic (flatter, like a platypus)
Why are parametric tests preferred?
- in general, for the same number of observations, they are more likely to lead to the rejection of a false null hypothesis
- they have more statistical power
What is nominal data used for?
- Labels used for CATEGORIES of data
- No meaningful underlying scale
- eg. religious affiliation
Assumptions for Chi-square (χ²)?
- no assumption of homogeneity of variance
- no assumption about the population distribution
What is chi-square dealing with?
- Frequencies
- Does not have the typical variance, SD, M etc. as it deals with frequencies, not numerical scores
What is the chi-square test used for?
- Goodness of fit (comparing frequencies of one nominal variable to theoretical expectations)
- Independence (comparing frequencies of one nominal variable for different values of second nominal variable)
Describe what is meant by goodness of fit?
- How do the frequencies that we have observed fit the frequencies that we expected?
- If fit is good then χ2 will be small. We want there to be a big difference between observed and expected frequencies in order to reject the null hypothesis.
- We want a bad fit to reject the null hypothesis
What are the 2 ways goodness of fit can be evaluated?
- No preference – no variation from one category of the nominal variable to the next: the same frequency value is expected in every category
- No difference from a comparison population – the researcher examines the literature to see what frequencies to expect in each category of the single nominal variable, as specified by the null hypothesis
What is meant by the term observed frequency?
The actual data obtained. Observed frequencies are always whole numbers, as we are dealing with individuals and not parts of individuals
What is meant by the term expected frequency?
- Calculated based on proportions – can be fractions
- those based on the null hypothesis
Chi-square formula?
- χ² = Σ (O − E)² / E – chi-square is about comparing observed frequencies (O) with expected frequencies (E)
- under the null hypothesis expected frequencies are the same across all categories
- a bad fit means probably statistically significant – a good fit probably not
How to tell if chi-square is significant?
If chi-square obtained (24.30) is greater than chi-square CV (9.49) then it is significant!
(same as in the t-test: if t-obtained is greater than t-critical, reject the null and state it is significant)
- the result above is stating there is a bad fit between observed frequencies and expected frequencies
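A goodness-of-fit sketch with scipy, using made-up frequencies for 5 categories and equal expected frequencies under the null:

```python
from scipy.stats import chisquare

observed = [30, 10, 10, 10, 40]  # observed frequencies in 5 categories
expected = [20, 20, 20, 20, 20]  # equal frequencies under the null
chi2_obt, p = chisquare(observed, f_exp=expected)
print(round(chi2_obt, 2), p < .05)  # 40.0 True -> bad fit, reject the null
```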
How would you report results APA style?
χ2(4, N = 100) = 24.30, p < .05, φ = .49
Tell me more about the standardised residual and its contribution to significance?
- there is a standardised residual for each category – looking at this can tell us which category contributed to the chi square being significant
- If the absolute value for R > 2 then that category contributes to overall significance
- in example above category C and HD are > 2, meaning these categories have contributed to overall significance
Chi-square test for contingencies (independence)
- same formula as goodness of fit but have a Second nominal variable to consider
- eg – gender and grades – there would be 2 equivalent variations on null hypothesis
– there is no relationship between grade and gender
– there is no grade difference for gender
How to calculate degrees of freedom for chi-square?
df = (R-1)(C-1)
R = rows (how many categories for gender = 2)
C = columns (how many categories for grades = 5)
df = (2-1)(5-1) = 4
What to use for effect size for chi-square?
- for a 2x2 table use the phi coefficient (2x2 means each of the two nominal variables has 2 categories)
- for larger tables such as 2x3 or 3x3 use Cramér’s V (a modification of phi)
- for Cramér’s V, the df used is the SMALLER of (rows - 1) and (columns - 1)
What are the conventions for reporting phi?
- small (0.10), medium (0.30), large (0.50)
How to report the results APA style?
χ2(4, N = 100) = 13.31, p = .01 (two-tailed), φ = .37
Chi square v Phi?
- Chi-square tells you if the result is significant or not
- Phi and Cramér’s V tell you the size of the relationship or the size of the difference
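A sketch tying the two together: run a test of independence on a hypothetical 2x2 table with scipy, then compute phi from the chi-square value (Yates' correction is disabled so the hand formula matches):

```python
from scipy.stats import chi2_contingency

# hypothetical 2x2 contingency table of frequencies
table = [[30, 20],
         [10, 40]]
chi2, p, df, expected = chi2_contingency(table, correction=False)
n = sum(sum(row) for row in table)  # total frequency = 100
phi = (chi2 / n) ** 0.5             # phi coefficient for a 2x2 table
print(df, round(phi, 2))            # df = 1; phi ~ .41 (a large effect)
```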