Metod och Analys II (Method and Analysis II) Flashcards
Descriptive statistics – three important parts
- Frequency distribution
- Central tendency
- Variability
Descriptive statistics – three purposes
- Determining how many people got each score
- Providing information on the standing of a score relative to all other scores
- Graphically summarizing the set of scores
What are four purposes of a frequency distribution?
- It is a record of the number of people with each score (or in each category) of the variable
- It allows examination of the full distribution at a “glance”
- Ideally, this will allow the reader to get a basic understanding of the data without being overwhelmed by all the raw scores
- It provides a visual assessment of central tendency and variability
What are two important factors when it comes to a frequency distribution?
- There should be a listing of each possible score and its frequency of occurrence
- As a check, the sum of the frequencies should be equal to n (the sample size)
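Both rules can be checked in a few lines of Python. This is a minimal sketch: the 1-5 rating scale and the ten scores are hypothetical, and `frequency_table` is an illustrative helper, not an SPSS function. Listing every possible score (including scores nobody got) follows the first rule; the final assertion is the "sum equals n" check.

```python
from collections import Counter

def frequency_table(scores, possible_scores):
    """List every possible score with its frequency (0 if nobody got it)."""
    counts = Counter(scores)  # Counter returns 0 for scores that never occur
    return [(score, counts[score]) for score in possible_scores]

# Hypothetical data: ten answers on a 1-5 rating scale
scores = [3, 5, 4, 3, 5, 5, 2, 4, 3, 5]
table = frequency_table(scores, range(1, 6))
for score, freq in table:
    print(f"{score}: {freq}")

# Check: the frequencies should sum to n (the sample size)
n = len(scores)
assert sum(freq for _, freq in table) == n
```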
Describe the characteristics of frequency distribution shapes.
- Modality: the number of humps (peaks) in a distribution
- Skewness: a measure of whether the distribution is symmetrical or not
- Kurtosis: characterizes the relative peakedness/flatness of a distribution compared to the normal distribution
What is Normal Distribution?
- Can be described as the bell-shaped curve
- The majority of scores lie around the center of the distribution
- Symmetrical curve
What is left (negative) skewed distribution?
- Frequent scores are clustered at the higher end and the tail points towards the lower (more negative) scores
- Asymmetrical curve
What is right (positive) skewed distribution?
- Frequent scores are clustered at the lower end and the tail points towards the higher (more positive) scores
- Asymmetrical curve
What is Leptokurtic distribution?
- The curve is symmetrical, similar to a normal distribution
- But the center peak is much higher
- That is, most scores cluster near the mean
What is Platykurtic distribution?
- The curve is symmetrical, similar to a normal distribution
- But most of the values occur with roughly the same frequency
- As a result, the curve is very flat, or plateau-like
What happens when data are not normally distributed?
- Data that are positively skewed (many scores are low) may cause the mean score to be artificially inflated
Resulting in the mean being pushed to a higher score
- Data that are negatively skewed (many scores are high) might lead to an artificially deflated mean
Resulting in the mean being pushed to a lower score
- Leptokurtic distributions (high peak) may offer little variation in the data
Resulting in a risk of not detecting an effect
- Platykurtic distributions (low peak) may offer too much variation in the data
Resulting in a risk of getting results that are too high
- If normality has been compromised, we may have less confidence in the outcome of parametric tests
How do you assess normal distribution in SPSS?
- Check the histograms
- If skewness and kurtosis values are within the ±1 range, we can assume that the distribution is normal – strict criterion
- If skewness and kurtosis values are within the ±2 range, we can assume that the distribution is normal – reasonable criterion
- When testing normality statistically, we use the Kolmogorov-Smirnov test if N is larger than 50
and the Shapiro-Wilk test if N is smaller than 50
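The ±1 / ±2 rule of thumb can be sketched in Python. This assumes the bias-adjusted formulas often labelled G1 (skewness) and G2 (excess kurtosis), which we believe match what SPSS reports; other software may give slightly different values. `looks_normal` and its `limit` parameter are hypothetical helpers. Formal tests such as Shapiro-Wilk are not in Python's standard library (SciPy provides `scipy.stats.shapiro`), so they are left out here.

```python
import statistics

def skewness(xs):
    """Bias-adjusted sample skewness (often labelled G1); needs n >= 3."""
    n = len(xs)
    m = statistics.mean(xs)
    s = statistics.stdev(xs)  # sample SD, n - 1 in the denominator
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)

def kurtosis(xs):
    """Bias-adjusted excess kurtosis (often labelled G2); 0 for a normal shape, needs n >= 4."""
    n = len(xs)
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * sum(((x - m) / s) ** 4 for x in xs) - correction

def looks_normal(xs, limit=1.0):
    """Rule of thumb: both values within +/- limit (1 = strict, 2 = reasonable)."""
    return abs(skewness(xs)) <= limit and abs(kurtosis(xs)) <= limit

# Hypothetical, perfectly symmetric scores
scores = [1, 2, 3, 4, 5]
print(skewness(scores), kurtosis(scores))
print(looks_normal(scores, limit=1.0), looks_normal(scores, limit=2.0))
```

For these scores the skewness is 0 and the kurtosis is -1.2 (a flat, platykurtic shape), so they fail the strict ±1 criterion but pass the reasonable ±2 criterion.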
What can we do if the distribution is not normal?
- Check for outliers
- Transform data
- See textbook pp. 57-61
What is Central Tendency?
- The goal of central tendency is to describe the average score on a variable for a distribution (e.g., a sample or population)
- Ideally, this is a single value that estimates the middle or typical score in the distribution
What are the three common measures of central tendency?
- Mean – probably the measure most frequently thought of as the average
- Median – the middle score when all scores are ranked in order; its position depends only on the total number of scores in the distribution
- Mode – the most frequently occurring score in the distribution
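Python's standard `statistics` module has a function for each of the three measures. A minimal sketch with hypothetical scores, deliberately left unsorted to show that `median` does the ranking for you:

```python
import statistics

# Hypothetical scores (unsorted on purpose)
scores = [4, 2, 5, 3, 3]

mean = statistics.mean(scores)      # sum of scores divided by n
median = statistics.median(scores)  # middle score once the scores are ranked
mode = statistics.mode(scores)      # most frequently occurring score

print(mean, median, mode)
```

Here the mean is 3.4 while the median and mode are both 3.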
In which three ways are distribution shapes and measures of central tendency related?
- In a perfectly normal distribution, the mean, median and mode are the same value
Mean = median = mode
- In a positively (right) skewed distribution, the mean is larger than the median, and the median is larger than the mode
Mean > median > mode
- In a negatively (left) skewed distribution, the mean is smaller than the median, and the median is smaller than the mode
Mean < median < mode
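The skewed-distribution patterns can be checked on two small hypothetical samples. This is a sketch, not a proof: the ordering is the typical pattern and holds for these particular samples, where a single extreme score in the tail drags the mean furthest.

```python
import statistics

def central_tendency(xs):
    """Return (mean, median, mode) for a list of scores."""
    return statistics.mean(xs), statistics.median(xs), statistics.mode(xs)

right_skewed = [1, 2, 2, 2, 3, 3, 4, 10]  # long tail towards high scores
left_skewed = [1, 7, 8, 8, 9, 9, 9, 10]   # long tail towards low scores

print(central_tendency(right_skewed))  # mean > median > mode
print(central_tendency(left_skewed))   # mean < median < mode
```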
What is variability?
- Variability refers to how spread out the scores in the distribution are
- The mean is good for representing the typical score of a distribution, but the mean alone does not completely describe the distribution
- For example – two different distributions could both have a sample size of n = 1000 and a mean of M = 100
But this still tells us nothing about how the scores are spread out
What are three ways of measuring variability?
- Range
- Interquartile range
- Standard deviation (most frequently used)
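The point that the mean alone says nothing about spread can be illustrated with two hypothetical distributions that share the same mean (100) but differ in all three measures of variability. A minimal sketch: `variability` is a hypothetical helper, and quartiles come from `statistics.quantiles`, whose default method may give slightly different values from SPSS.

```python
import statistics

def variability(xs):
    """Range, interquartile range and sample standard deviation of xs."""
    q1, _, q3 = statistics.quantiles(xs, n=4)  # the three quartile cut points
    return {
        "range": max(xs) - min(xs),
        "iqr": q3 - q1,
        "sd": statistics.stdev(xs),  # sample standard deviation (n - 1)
    }

narrow = [98, 99, 100, 100, 101, 102]  # hypothetical, mean = 100
wide = [70, 85, 100, 100, 115, 130]    # hypothetical, mean = 100

print(statistics.mean(narrow), variability(narrow))
print(statistics.mean(wide), variability(wide))
```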
What is Standard Deviation?
- A deviation score is merely the difference between an individual score (Xi) and the mean of the distribution (e.g., M)
- Deviation score = (Xi-M)
- We can think of the standard deviation as an average deviation score (more precisely, the square root of the average squared deviation score)
- For example – we would expect smaller deviation scores, on average, in a distribution that has less variability (spread in the scores)
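Building the standard deviation from deviation scores makes the "average deviation" idea concrete. A minimal sketch with hypothetical data: because raw deviations always sum to 0, they are squared first, and the sum is divided by n - 1 (the sample standard deviation), which should match `statistics.stdev`.

```python
import math
import statistics

def standard_deviation(xs):
    """Sample standard deviation built from deviation scores (Xi - M)."""
    m = statistics.mean(xs)
    deviations = [x - m for x in xs]      # deviation scores; these sum to 0
    ss = sum(d ** 2 for d in deviations)  # sum of squared deviations
    return math.sqrt(ss / (len(xs) - 1))  # n - 1 for a sample

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, M = 5
print(standard_deviation(data), statistics.stdev(data))
```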
What proportion of scores falls within each standard deviation of the mean in a normal distribution?
- About 68% fall within 1 standard deviation of the mean
- About 95% fall within 2 standard deviations of the mean
- About 99.7% fall within 3 standard deviations of the mean
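These percentages come from the standard normal curve and can be reproduced with `statistics.NormalDist`. A minimal sketch; `proportion_within` is a hypothetical helper name.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal distribution

def proportion_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.1%}")
```

The printed values are roughly 68%, 95% and 99.7%.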
How can inferential statistics be described?
In contrast to descriptive statistics, we no longer only try to describe our sample; we now use the sample statistics to make inferences about the population
What is the difference between a direct and an indirect approach to hypothesis testing?
Direct approach
* Conduct the study in the entire population
* Determine if the hypothesis is supported
* Is typically not feasible or even possible
Indirect approach
* Obtain a sample from the population
* Compute statistics in the sample (e.g. mean)
* Infer relations in population from the sample
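The indirect approach can be simulated. This is a sketch under stated assumptions: the "population" is 100,000 hypothetical normally distributed scores (mean 100, SD 15), something we could never measure in full in a real study. We draw a random sample, compute the sample mean, and see that it lands close to the population mean we are trying to infer.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the example is reproducible

# Hypothetical population: 100,000 scores with mean 100 and SD 15
population = [rng.gauss(100, 15) for _ in range(100_000)]

# Indirect approach: obtain a sample and compute a statistic in it
sample = rng.sample(population, 200)
sample_mean = statistics.mean(sample)

# Only possible in a simulation: compare with the true population value
population_mean = statistics.mean(population)
print(round(sample_mean, 2), round(population_mean, 2))
```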
What are the two types of hypotheses?
- Scientific hypothesis
- Statistical hypothesis, which comes in two forms:
* Null hypothesis
* Alternative hypothesis
What is the Scientific hypothesis?
This is what the researcher expects to find
E.g.
* A new type of therapy will be more effective at reducing depressive symptoms than the old type
* Depression is related to low life satisfaction
Describe the two different types of Statistical hypotheses
Null hypothesis
* Symbolized: H0
* The hypothesis of “no” effect
Ex. If testing for mean differences between two groups, H0 would specify that no difference is present (no effect)
Alternative hypothesis
* Symbolized: H1
* Hypothesis of “effect”
Ex. If testing for mean differences between two groups, H1 specifies every outcome other than a difference of 0
What is the p value?
- We assume that the null hypothesis is true (e.g. no effect)
- We fit a statistical model to our data that represents the alternative hypothesis
- We calculate the probability of getting data at least as extreme as ours if the null hypothesis were true
- If this probability is very small (p < 0.05), we conclude that the model fits the data well – we reject the null hypothesis and gain confidence in the alternative hypothesis
- In other words, if the null hypothesis were true, results at least as extreme as ours would occur less than 5% of the time
- As it is the null hypothesis that is tested, the process is often referred to as “null hypothesis testing”
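The p < 0.05 decision can be sketched for the simple case of a z statistic. This assumes a z-test for illustration (t-tests use the t distribution instead, which Python's standard library does not provide); `two_tailed_p` and `decide` are hypothetical helper names.

```python
from statistics import NormalDist

def two_tailed_p(z):
    """Probability of a z at least this extreme (in either direction) if H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p, alpha=0.05):
    """Reject H0 when p is below the significance level."""
    return "reject H0" if p < alpha else "retain H0"

for z in (1.0, 1.96, 2.5):
    p = two_tailed_p(z)
    print(f"z = {z}: p = {p:.4f} -> {decide(p)}")
```

A z of 1.96 gives p close to 0.05, the conventional cut-off.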
In reality H0 is either true or not true. Thus, there are four possible outcomes of statistical inference – which are these?
- The null hypothesis is correctly retained
The null hypothesis is true and not rejected
- Type I error (α) – false alarm
The null hypothesis is true but rejected
- Type II error (β) – missing the effect
The null hypothesis is false but not rejected
- Correct rejection of the null hypothesis (1 – β, i.e., power)
The null hypothesis is false and rejected
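A small simulation shows what α means: when H0 really is true, a test at α = .05 still rejects it (a Type I error) in roughly 5% of studies. This is a sketch under stated assumptions: each hypothetical study draws n = 30 scores from a population whose mean really is 0 with a known SD of 1, and uses a two-tailed z-test; `type_i_error_rate` is a hypothetical helper.

```python
import random
from statistics import NormalDist, mean

def type_i_error_rate(n_studies=4000, n=30, alpha=0.05, seed=1):
    """Simulate studies where H0 is true and count how often H0 is (wrongly) rejected."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    rejections = 0
    for _ in range(n_studies):
        # H0 is true: the population mean really is 0 (SD known to be 1)
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = mean(sample) / (1 / n ** 0.5)  # z statistic for the sample mean
        if abs(z) > critical_z:
            rejections += 1  # a false alarm
    return rejections / n_studies

rate = type_i_error_rate()
print(f"Type I error rate: {rate:.3f}")
```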
What are one-tailed vs. two-tailed tests?
- One-tailed hypotheses make a specific directional prediction
(e.g. patients' depression scores will decrease after undergoing therapy)
- Two-tailed hypotheses make non-directional predictions
(e.g. there will be a difference between males and females in their life satisfaction level)
- In this course, we will only be using two-tailed hypotheses and two-tailed tests
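The practical difference shows up in the critical value. A two-tailed test splits α between both tails, so the result must be more extreme to count as significant. A minimal sketch using the standard normal distribution as an illustrative assumption (t-tests have larger critical values that depend on the degrees of freedom):

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal distribution

one_tailed_critical = z.inv_cdf(1 - alpha)      # whole 5% in one tail
two_tailed_critical = z.inv_cdf(1 - alpha / 2)  # 2.5% in each tail

print(round(one_tailed_critical, 3), round(two_tailed_critical, 3))
```

This gives about 1.645 for the one-tailed test and about 1.960 for the two-tailed test.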
What is effect size – and what are the three types called?
Effect size is the actual magnitude of the difference between groups or of the association between variables
- Cohen's d (can exceed 1)
Used when a mean difference is tested
* Small Cohen's d is < 0.25
* Medium Cohen's d is 0.25 - 0.4
* Large Cohen's d is 0.4 or higher
- Pearson's r (ranges from -1 to 1; its absolute value, from 0 to 1, indicates the size of the effect)
Used when a correlation is tested
- Eta-squared (η²)
Used when variance is tested (the proportion of variance explained)
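All three effect sizes can be computed with the standard library. This is a minimal sketch with hypothetical helper names and data. Cohen's d here uses the pooled standard deviation, one common variant. Eta-squared is the between-groups sum of squares divided by the total sum of squares, as in a one-way ANOVA.

```python
import math
import statistics

def cohens_d(group1, group2):
    """Mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

def pearson_r(x, y):
    """Pearson correlation: covariation divided by the product of the spreads."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def eta_squared(*groups):
    """Proportion of total variance explained by group membership."""
    scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(scores)
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total

print(cohens_d([2, 4, 6], [1, 3, 5]))          # hypothetical groups
print(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]))   # hypothetical paired scores
print(eta_squared([1, 2, 3], [4, 5, 6]))       # hypothetical groups
```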
What are the two types of tests in inferential statistics?
- Parametric tests
Used when the outcome variable is continuous
(and will be the focus in this course)
- Non-parametric tests
Used when the outcome variable is not continuous
What are the four basic assumptions that apply to most parametric tests?
- Dependent variable is normally distributed
- Homogeneity of variance
- Outcome variable is continuous (interval or ratio)
- Independence of observations
How can we know if the dependent variable is normally distributed?
- Check skewness and kurtosis values
- Check histogram
- Check normality by conducting a statistical test (the Kolmogorov-Smirnov or the Shapiro-Wilk test)
If the resulting p-value is below 0.05, we have significant evidence that the distribution is not normal, so we are “hoping” for a p-value of 0.05 or above
What is homogeneity of variance?
- The variance of a variable should be stable throughout different levels of another variable
- Assume that you developed a delinquency prevention program. You randomly assigned youth to either the prevention program or a control group, and then compared the two groups on their engagement in delinquent behaviors (independent samples t-test)
- The variance of delinquent behaviors in the prevention and control groups should be roughly the same (checked with Levene's test)
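Levene's test boils down to a one-way ANOVA F computed on the absolute deviations of each score from its own group mean. This is a minimal sketch of the classic mean-centred statistic W only; `levene_w` and the data are hypothetical. Turning W into a p-value needs the F distribution, which is not in Python's standard library (SciPy offers `scipy.stats.levene`, with `center='mean'` for this classic version).

```python
import statistics

def levene_w(*groups):
    """Levene's W: one-way ANOVA F on absolute deviations from each group mean."""
    # Absolute deviation of every score from its own group mean
    devs = [[abs(x - statistics.mean(g)) for x in g] for g in groups]
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = statistics.mean([d for g in devs for d in g])
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in devs)
    ss_within = sum((d - statistics.mean(g)) ** 2 for g in devs for d in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical scores: same spread (W = 0) versus very different spread (W > 0)
print(levene_w([1, 2, 3], [11, 12, 13]))
print(levene_w([1, 2, 3], [0, 5, 10]))
```

A large W (small p) suggests that the homogeneity of variance assumption is violated.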
What does independence of observations mean?
Data obtained from different participants (Xi) are independent of each other
What does it mean that the outcome variable is continuous?
- The outcome variable in parametric tests should be continuous
- In other words, the outcome variable should be measured on interval or ratio scale
What are t-tests?
T-tests are used to compare the means of two groups on a given variable
Two different types of t-test:
* Related t-test
* Independent t-test
What is a related t-test?
- Examines the difference in mean dependent variable scores across two within-group conditions (independent variable), measured in a single group
- Each (and every) participant in group 1 can be paired with a participant in group 2
- Participants can be matched/paired with themselves
- Matching is on a consistent basis (same rule for matching is applied to each pair)
- The purpose of using paired samples design is to reduce data variability (to reduce error in the outcome variable)
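Because every participant is paired, the related t-test works on one difference score per pair: t is the mean difference divided by its standard error, with n - 1 degrees of freedom (n = number of pairs). A minimal sketch; `paired_t` and the before/after scores are hypothetical.

```python
import math
import statistics

def paired_t(before, after):
    """Related (paired) t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(after, before)]  # one difference score per pair
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)     # standard error of the mean difference
    return mean_diff / se, n - 1

# Hypothetical scores for the same five participants in two conditions
before = [10, 12, 9, 14, 11]
after = [12, 15, 10, 17, 12]
t, df = paired_t(before, after)
print(f"t({df}) = {t:.2f}")
```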
How do we interpret the obtained t-value?
- If the obtained t-value (in absolute value) is smaller than the critical t-value, retain the null hypothesis
- Simply put, the results tell us that there is no apparent reason to believe that the groups (for example, husbands and wives) differ in their ratings
- If the obtained t-value (in absolute value) is larger than the critical t-value, reject the null hypothesis
- Simply put, the results tell us that the two groups differed significantly
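The decision rule compares the size of the obtained t with a critical value looked up for the test's degrees of freedom. A minimal sketch: `t_decision` is a hypothetical helper. The critical value 2.262 is taken from a standard t table for a two-tailed test with α = .05 and df = 9, and would differ for other degrees of freedom.

```python
def t_decision(t_obtained, t_critical):
    """Two-tailed decision: compare |t| with the critical value from a t table."""
    if abs(t_obtained) > t_critical:
        return "reject H0: the groups differ significantly"
    return "retain H0: no apparent reason to believe the groups differ"

# Critical t for a two-tailed test, alpha = .05, df = 9 (standard t table)
T_CRITICAL_DF9 = 2.262

print(t_decision(3.1, T_CRITICAL_DF9))
print(t_decision(-1.5, T_CRITICAL_DF9))
```

Taking the absolute value means a large negative t counts as significant too, as it should in a two-tailed test.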