SPSS Flashcards
What is a continuous variable?
Arising from measurements (e.g. height)
What is a discrete variable?
Arising from counting (e.g. number of books on a bookshelf)
What is a nominal/categorical variable?
Categories with no natural order (e.g. eye colour)
What is an ordinal variable?
Categories with a natural order (e.g. mild/moderate/severe)
What is simple random sampling?
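Probability sampling: Each member of the population has an equal chance of being selected.
What is systematic sampling?
Probability sampling: Every nth subject from a population list is chosen, usually after a random start
What is stratified random sampling?
Probability sampling: The population is split into groups (strata) of similar individuals, and a random sample is drawn from each stratum
What is disproportionate sampling?
Probability sampling: A form of stratified sampling in which strata are not sampled in proportion to their size; used when strata in the population are of substantially unequal size (e.g. to obtain enough members of a small stratum)
What is cluster sampling?
Probability sampling: The population is divided into clusters (e.g. schools), clusters are randomly selected, and all or a random sample of members of the selected clusters are studied. When random sampling is repeated at successive levels (e.g. schools, then classes, then pupils) this is multistage sampling.
What is convenience sampling?
Non-probability sampling: Subjects are selected based on availability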
What is quota sampling?
Researcher guides sampling process until participant quota is met (e.g. volunteers called for until equal quota of males/females is met)
What is purposive sampling?
Subjects are hand picked based on certain criteria
What is snowball sampling?
Used when desired characteristics are rare. Initial subjects refer others with similar characteristics
What happens to your accuracy if you quadruple your sample size?
Precision doubles: the standard error (SD/√n) halves, since √4 = 2
Name 7 types of experimental designs
- Randomised controlled trial (RCT)
- Blind study
- Cross-over design (each subject acts as their own control, but the order of treatments is randomised)
- Factorial design (several factors compared at once)
- Outcome variables
- Quasi-experimental design (often used when the independent variable is an innate characteristic of the participants, so it cannot be randomly assigned)
- Single-subject study
Name 9 types of observational designs
Retrospective studies, prospective studies, surveys and polls, observation, longitudinal cohort studies, case-control studies, cross-sectional studies, case reports, questionnaires.
What is dichotomous survey questioning?
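Two possible answers (e.g. yes/no or agree/disagree)
What is a Likert scale in surveys?
An ordered set of response categories, typically 5 (sometimes 3 or 7), e.g. strongly disagree to strongly agree
What is a visual analogue scale in surveys?
Respondents mark a point on a continuous line between two anchors (e.g. "no pain" to "worst pain imaginable"), so results are measured along a continuum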
Why can histograms be subjective?
Their shape depends on the number (and width) of bins chosen
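A minimal sketch of how bin choice changes the picture: the same made-up values are counted into 2, 4 and 8 equal-width bins. `bin_counts` is a hypothetical helper written for illustration; it mimics how a histogram groups values and does not reproduce SPSS's own binning rules.

```python
def bin_counts(values, k):
    """Count values into k equal-width bins spanning min..max (the max falls in the last bin)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    counts = [0] * k
    for x in values:
        # Clamp so the maximum value lands in the final bin rather than past it
        i = min(int((x - lo) / width), k - 1)
        counts[i] += 1
    return counts

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # made-up example values
for k in (2, 4, 8):
    print(k, bin_counts(data, k))
```

With 2 bins the data look like one lump and a small tail; with 8 bins the gap before the value 9 becomes visible.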
What is the interquartile range?
The range of the middle 50% of the data: IQR = Q3 - Q1 (75th percentile minus 25th percentile). It comes from the five-number summary: minimum, Q1 (0.25), median (Q2, 0.50), Q3 (0.75), maximum
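A short sketch of the five-number summary and IQR using Python's standard `statistics` module on made-up data. Quartile definitions differ slightly between software packages, so results may not match SPSS exactly for every data set.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # made-up example values

# statistics.quantiles with n=4 returns the three cut points [Q1, median, Q3];
# the default method="exclusive" interpolates at position (n + 1) * p
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print("Five-number summary:", min(data), q1, median, q3, max(data))
print("IQR:", iqr)
```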
Which chart/graph best displays the interquartile range?
Boxplots
What type of data best suits bar chart?
Categorical variables
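What is the standard deviation?
A measure of how far, on average, values deviate from the mean
What is the formula for degrees of freedom?
n-1 (for a single sample, e.g. when calculating the sample SD or running a one-sample t-test)
What is the standard error?
The standard deviation of the sampling distribution of the mean, i.e. how far the sample mean is likely to be from the population mean; SE = SD/√n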
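A sketch tying the last three cards together on a made-up sample: the sample SD divides by the degrees of freedom (n - 1), the SE divides the SD by √n, and quadrupling n halves the SE.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
n = len(data)
mean = sum(data) / n

# Sample SD: squared deviations from the mean divided by the degrees of freedom (n - 1)
df = n - 1
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / df)
assert math.isclose(sd, statistics.stdev(data))  # stdev also uses n - 1

# Standard error of the mean: SD / sqrt(n)
se = sd / math.sqrt(n)
print(f"mean={mean}, sd={sd:.3f}, se={se:.3f}")

# Quadrupling n halves the SE (for the same SD), i.e. precision doubles
se_4n = sd / math.sqrt(4 * n)
print(f"se with 4n = {se_4n:.3f}")
```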
What is a parameter?
A numerical characteristic of a population
What is a statistic?
A numerical characteristic of a sample (e.g. mean, SD)
What proportions of values lie within 1, 2 and 3 SD of the mean in a normal distribution (the 68-95-99.7 rule)?
About 68% of values within 1 SD of the mean
About 95% of values within 2 SD of the mean (more precisely, 1.96 SD)
About 99.7% of values within 3 SD of the mean
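The 68-95-99.7 figures can be checked against the standard normal distribution with `statistics.NormalDist` from Python's standard library; `within` is a small illustrative helper, not part of any library.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

def within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {within(k):.4f}")
```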
What is the central limit theorem?
The Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size gets bigger, no matter what the shape of the population distribution (provided the population has a finite variance)
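A small simulation sketch of the Central Limit Theorem, assuming a strongly right-skewed exponential population with mean 1: the means of many samples of size 50 cluster around 1 with an SD close to 1/√50. Sample size, repetitions and seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(1)
n, reps = 50, 2000

# Population: exponential with mean 1 (strongly right-skewed)
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print("mean of sample means:", round(statistics.fmean(means), 3))  # close to 1
print("SD of sample means:", round(statistics.stdev(means), 3))    # close to 1/sqrt(50), about 0.141
```

A histogram of `means` would look roughly bell-shaped even though the individual values are heavily skewed.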
What is a Type 1 Error?
Null hypothesis is rejected when it is actually true (False positive)
What is the relationship between Type 1 Error and the P-Value?
When the null hypothesis is true, the probability of making a Type I error equals the significance level (α) we choose as the p-value threshold; e.g. α = 0.05 gives a 5% chance of a false positive
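A simulation sketch of this relationship. To stay within the standard library it uses a one-sample z-test with a known SD of 1 rather than a t-test (a simplifying assumption); because the data are generated with the null hypothesis true, every rejection is a Type I error, and the rejection rate comes out close to α.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(2)
alpha, n, reps = 0.05, 20, 10_000
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a two-sided test

rejections = 0
for _ in range(reps):
    # H0 is true: data really come from a population with mean 0, SD 1
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = fmean(sample) / (1 / math.sqrt(n))
    if abs(z) > z_crit:
        rejections += 1  # a Type I error: rejecting a true H0

print("Type I error rate:", rejections / reps)  # close to alpha = 0.05
```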
What is a Type II Error?
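Failing to reject the null hypothesis when it is actually false (false negative)
What is Power?
The probability of correctly rejecting the null hypothesis when there is indeed an effect (power = 1 - β, where β is the probability of a Type II error)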
How can power be improved?
Increasing the effect size, decreasing variability, increasing the sample size, or raising the significance level α (e.g. 0.10 instead of 0.05), although the last option increases the Type I error rate
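A sketch of how each factor moves power, using the normal approximation for a two-sided, two-sample comparison of means (an approximation to the t-test, ignoring the tiny chance of rejecting in the wrong direction). `approx_power` and its parameters are hypothetical names for illustration.

```python
import math
from statistics import NormalDist

def approx_power(diff, sd, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test of means.

    diff: true difference in means; sd: common SD; n_per_group: size of each group.
    """
    z = NormalDist()
    effect_size = diff / sd                      # standardised effect (Cohen's d)
    se_in_sd_units = math.sqrt(2 / n_per_group)  # SE of the difference, in SD units
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size / se_in_sd_units - z_crit)

base = approx_power(diff=5, sd=10, n_per_group=64)
print(round(base, 3))
print(approx_power(diff=8, sd=10, n_per_group=64) > base)              # larger effect size
print(approx_power(diff=5, sd=7, n_per_group=64) > base)               # less variability
print(approx_power(diff=5, sd=10, n_per_group=100) > base)             # larger sample
print(approx_power(diff=5, sd=10, n_per_group=64, alpha=0.10) > base)  # larger alpha
```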
What is a Parametric test?
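A test that assumes the data come from a distribution of known form (usually normal) and tests a parameter of that population, such as the mean
What is a Non-Parametric Test?
A test that makes fewer assumptions about the shape of the population distribution, often by comparing the "ranks" of values instead of the values themselves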
What are the three Parametric test assumptions?
- Normality: Data are normally distributed
- Homogeneity of variances: Data from multiple groups have the same variance
- Independence: Data are independent
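What does a p-Value of <0.05 for Levene’s test tell you?
That the variances are significantly different (homogeneity of variance is violated), so a test that assumes equal variances is not appropriate; use a version that does not, such as Welch's t-test (the "Equal variances not assumed" row in SPSS) or Welch's ANOVA.
What does a p-Value of >0.05 for Levene’s test tell you?
That there is no significant evidence that the variances differ, so the assumption of homogeneity of variance can reasonably be treated as met.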
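A sketch of what Levene's test computes, assuming the classic mean-centred version: a one-way ANOVA F statistic on the absolute deviations of each value from its group mean. Some software also offers a median-centred variant (Brown-Forsythe). `levene_w` is a hypothetical helper; it returns only the statistic, since the p-value needs the F distribution with (k - 1, N - k) degrees of freedom, which Python's standard library does not provide.

```python
from statistics import fmean

def levene_w(*groups):
    """Levene's W (mean-centred): a one-way ANOVA F on absolute deviations from each group mean."""
    devs = [[abs(x - fmean(g)) for x in g] for g in groups]
    k = len(devs)
    N = sum(len(d) for d in devs)
    grand = fmean([v for d in devs for v in d])
    between = sum(len(d) * (fmean(d) - grand) ** 2 for d in devs)
    within = sum((v - fmean(d)) ** 2 for d in devs for v in d)
    return ((N - k) / (k - 1)) * between / within

same_spread = levene_w([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])  # different means, same spread
diff_spread = levene_w([4, 5, 5, 5, 6], [0, 2, 5, 8, 10])       # same mean, different spread
print(same_spread, round(diff_spread, 3))
```

Note that a difference in means alone gives W = 0: the test looks only at spread.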
What is the purpose of an independent-samples t-test?
To compare the means of two independent groups on the same continuous dependent variable.
What is the NULL hypothesis for a t test?
That the difference between the two means is zero
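A sketch of the two t statistics SPSS reports for independent samples, computed by hand on made-up data: Student's t (pooled variance, equal variances assumed) and Welch's t (no equal-variance assumption, Welch-Satterthwaite df). Each t measures how far the observed difference in means is from zero in standard-error units. `t_statistics` is a hypothetical helper; p-values are not computed because the standard library lacks the t distribution.

```python
import math
from statistics import fmean, variance

def t_statistics(a, b):
    """Return (pooled t, pooled df, Welch t, Welch df) for two independent samples."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)  # sample variances (n - 1 denominator)
    diff = fmean(a) - fmean(b)

    # Student's t: assumes equal variances and pools them
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    df_pooled = n1 + n2 - 2

    # Welch's t: no equal-variance assumption; Welch-Satterthwaite df
    se1, se2 = v1 / n1, v2 / n2
    t_welch = diff / math.sqrt(se1 + se2)
    df_welch = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t_pooled, df_pooled, t_welch, df_welch

print(t_statistics([5, 6, 7, 8, 9], [1, 2, 3, 4, 5]))
```

With equal group sizes and equal variances, as here, both versions agree; they diverge when the variances differ.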
What type of data do you need to run a t-test?
One independent categorical variable and one continuous, dependent variable
What is the non-parametric equivalent of the independent t-test?
Mann-Whitney U Test
What does ANOVA do?
Tests whether there are differences between the means of two or more groups
What type of data does ANOVA require?
One categorical independent variable and one continuous dependent variable
What is the Null hypothesis for ANOVA?
That there is no difference in the means of the groups
What is the F value in ANOVA?
F = variability between group means / variability within groups (mean square between / mean square within), i.e. is the variability between group means larger than the variability of the observations within the groups?
What does a large F value signify?
That the variance between the groups is larger than the variance within the groups. A large F value means that your data do not support the null hypothesis well (evidence that at least one group mean differs)
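A sketch of the one-way ANOVA F ratio computed by hand on made-up groups, following the between/within definition above. `one_way_anova_f` is a hypothetical helper; it also returns the degrees of freedom, which shows why N = total df + 1. The p-value (from the F distribution) is not computed here.

```python
from statistics import fmean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within, df_total) for a one-way ANOVA."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand = fmean([x for g in groups for x in g])
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, N - k
    f = (ss_between / df_between) / (ss_within / df_within)  # MS between / MS within
    return f, df_between, df_within, N - 1

f, df_b, df_w, df_total = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f, df_b, df_w, df_total)
```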
What test do you use to determine more specific difference between groups?
Tukey's post-hoc test (Tukey HSD)
What is the non-parametric equivalent of ANOVA?
Kruskal-Wallis test
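What is a residual?
The difference between an observed response and the value predicted for the response by our model
How do you calculate a residual value?
Residual = observed value - predicted value (y - ŷ). When the model's prediction is simply a mean (e.g. the group mean in ANOVA), this is the observed value minus that mean
What are the residual degrees of freedom?
n-2 in simple linear regression (n observations minus the two estimated parameters: intercept and slope)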
What is another name for a residual?
Prediction error
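What does a high correlation coefficient signify?
A likely strong linear association (positive or negative, i.e. close to +1 or -1); on its own it does not imply causation
What is the R value?
The Pearson correlation coefficient (r), ranging from -1 to +1. In SPSS regression output, R is the multiple correlation coefficient, which in simple linear regression equals the absolute value of r
What is the R-squared value?
A statistical measure of how close the data are to the fitted regression line: the proportion of variance in the dependent variable explained by the model. E.g. if R² = 0.97, 97% of the variance in the dependent variable is accounted for by the independent variable(s).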
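A sketch computing Pearson's r by hand on made-up data and squaring it; in simple linear regression (one predictor) R² equals r². `pearson_r` is a hypothetical helper written for illustration.

```python
import math
from statistics import fmean

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4), round(r ** 2, 4))  # r, and R-squared for a one-predictor regression
```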
What is the Null hypothesis of regression?
That the underlying (population) slope equals zero
What does the P-Value signify in regression?
The probability of observing an association at least as strong as the one in the sample if there were truly no association (slope = 0) in the population
What are the assumptions for Linear regression?
Independent observations
Linear association
Normally distributed residuals
Equal variance of residuals across values of the predictor (homoscedasticity)
What does an ANOVA regression p-value <0.05 tell us?
There is evidence against the null hypothesis of zero slope, i.e. evidence of a linear association
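How do you calculate the sample size for an ANOVA from SPSS output?
Total df + 1 (since total df = N - 1)
Can you use ANOVA for linear association?
Yes: the ANOVA ideas extend from comparing means to testing for linear association. The ANOVA F-test in regression output tests whether the slope is zero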
Which test would you immediately think of if you saw the terms “relationship between” in the question?
Correlation
Which test would you immediately think of if you were asked to compare the means of two groups, or of two variables measured on one group?
t-test (independent-samples or paired)
Which test would you immediately think of if you were asked to compare the means of more than two groups or multiple variables?
ANOVA
When would you use a Welch’s ANOVA?
When you have normally distributed data that violates the assumption of homogeneity of variance
What is the nonparametric equivalent of Pearson’s Correlation?
Spearman's rank correlation (Kendall's tau is an alternative)
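A sketch showing that Spearman's correlation is Pearson's correlation applied to ranks (tied values get the average of their ranks). For a monotonic but non-linear made-up relationship, Spearman's rho is 1 while Pearson's r is noticeably lower. All helper names are hypothetical.

```python
import math
from statistics import fmean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based ranks
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]  # monotonic but far from linear
print(round(pearson_r(x, y), 3), spearman_rho(x, y))
```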
What is the nonparametric equivalent of the dependent t-test?
Wilcoxon signed-rank test
Which test would you use for categorical outcome?
Chi-squared test
What test would you use to compare multiple dependent variables at once across two or more groups?
MANOVA
What are some limitations of Pearson Coefficient?
Sensitive to outliers
Only measures linear association (misleading if the plot is curved)
A limited range of scores limits generalisation
Correlation does not imply causation
What are confounding variables?
“lurking” variables which may be influencing the two variables of interest
What type of test would you perform when you have a scale (response) and Nominal (predictor) variable?
ANOVA, independent-samples t-test, Mann-Whitney U test, Kruskal-Wallis test
What kind of test would you perform when you have a Nominal (response) and Nominal (predictor) variable?
Chi-squared test
What kind of test would you perform when you have a scale (response) and scale (predictor) variable?
Regression (ANOVA F-test or Coefficient t-test), Pearson correlation t-test, Spearman correlation
What kind of test would you perform if you had a scale (response) variable and no predictor variable?
One Sample t-test or Paired t-test
What is the Null hypothesis for a chi-squared test?
That the distribution of the outcome is the same across the groups (i.e. the two categorical variables are independent)
What is the purpose of a chi-squared test?
To compare the observed counts with the counts we would expect if the null hypothesis were true
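A sketch of the Pearson chi-squared statistic for a made-up 2x2 table, without Yates' continuity correction. Expected counts come from row total x column total / N. For df = 1 only, the p-value can be obtained with the standard library because a chi-squared variable with 1 df is a squared standard normal; larger tables need a chi-squared distribution function, which is not shown. `chi_square` is a hypothetical helper.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and df for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected if H0 is true
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

observed = [[10, 20],
            [20, 10]]
chi2, df = chi_square(observed)
# For df = 1 only: chi-square is a squared standard normal, so p = P(|Z| > sqrt(chi2))
p = math.erfc(math.sqrt(chi2 / 2))
print(round(chi2, 3), df, round(p, 4))
```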
What is inter-rater reliability?
The degree to which ratings given by different observers agree
What is intra-rater reliability?
The degree to which ratings given by the same observer on different occasions agree
How do you measure intra/inter-rater reliability, taking chance into account?
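Cohen's kappa (κ)
What does the kappa value tell you?
The level of agreement between ratings after allowing for agreement expected by chance: κ = (p_o - p_e)/(1 - p_e), where p_o is the observed proportion of agreement and p_e the proportion expected by chance. κ = 1 is perfect agreement; κ = 0 is agreement no better than chance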
What is an acceptable k score?
0.4 and above
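A sketch of Cohen's kappa for two raters, computed from a made-up agreement table using the formula above. `cohens_kappa` is a hypothetical helper.

```python
def cohens_kappa(table):
    """Cohen's kappa from a square agreement table (rows: rater A, columns: rater B)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Agreement expected by chance if the raters rated independently
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Made-up ratings of 50 cases into two categories
ratings = [[20, 5],
           [10, 15]]
print(round(cohens_kappa(ratings), 3))
```

Here the raters agree on 70% of cases, but 50% agreement would be expected by chance, so κ is 0.4, right at the threshold above.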
How do you measure if a scale is internally consistent?
Cronbach’s alpha (α)
What does a value of zero represent for Cronbach’s alpha?
Internal consistency reliability is very low and consistency cannot be assumed
What is the acceptable score for Cronbach’s alpha?
0.8 or above (0.7 is often cited as the minimum acceptable)
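A sketch of Cronbach's alpha on made-up questionnaire data, using α = k/(k - 1) x (1 - sum of item variances / variance of total scores), where k is the number of items. `cronbach_alpha` is a hypothetical helper.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha; scores is a list of respondents, each a list of item scores."""
    k = len(scores[0])           # number of items
    items = list(zip(*scores))   # one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(resp) for resp in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Made-up responses: 4 respondents x 3 items
scores = [[3, 4, 3],
          [4, 5, 4],
          [2, 3, 3],
          [5, 5, 4]]
print(round(cronbach_alpha(scores), 3))
```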
What is sampling error?
Error in a statistical analysis arising from the unrepresentativeness of the sample taken.
What are three limitations of convenience sampling
- Possible bias
- Poor generalisability
- Potential for sampling error
Why is it crucial to discuss attrition?
Attrition of the original sample represents a potential threat of bias if those who drop out of the study are systematically different from those who remain in the study.
What is the b value in regression?
The b value (unstandardised coefficient) is the gradient of the regression line. In the SPSS Coefficients table, the b for the predictor (on the line below the constant) tells you that if the predictor increases by one unit, the predicted outcome changes by b units
What do you need to remember when describing a relationship from a scatterplot?
Direction (positive/negative), form/shape (linear or not) and strength of the association
Would you go ahead with further statistical testing if a scatterplot showed moderate relationship, but had a number of points that deviated from the line?
Yes, but it may be necessary to use a non-parametric test (e.g. Spearman's correlation), because points that deviate from the line can distort the Pearson correlation and make the association appear stronger or weaker than it really is.
Can you imply cause from an observational study?
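Generally no: observational studies can show association, but because of possible confounding variables they cannot establish causation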
What is Sampling variability?
Sampling variability refers to the process whereby statistics, such as the sample mean, would give different results if the random sampling process was repeated. We thus need to account for sampling variability when making any conclusions from our data.
What does the Central Limit Theorem say about statistical inference?
The Central Limit Theorem says that the distribution of the sample mean is approximately Normal for sufficiently large samples. Since t tests and ANOVA are based on assuming the sample means have Normal distributions, this means that we can use these methods even if the data seem slightly skewed, particularly if the sample sizes are large.
Why might we prefer parametric over non parametric test?
If assumptions are satisfied then parametric tests are more powerful than their non-parametric counterparts (although the difference can be minor). Parametric tests also provide direct estimates for effects, including confidence intervals. Nonparametric tests come with their own assumptions too.
What is the least squares line?
A way of fitting the data with the line that minimises the sum of the squared residuals.
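A sketch of a least squares fit on made-up data, tying together several earlier cards: the slope b, the intercept, residuals (observed minus predicted), R², and the residual degrees of freedom (n - 2). `least_squares` is a hypothetical helper; the closed-form slope Sxy/Sxx is the line that minimises the sum of squared residuals.

```python
from statistics import fmean

def least_squares(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, residuals, r_squared)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx        # slope: change in predicted y per one-unit increase in x
    a = my - b * mx      # intercept: the line passes through (mean x, mean y)
    fitted = [a + b * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]  # observed minus predicted
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, residuals, 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b, residuals, r2 = least_squares(x, y)
residual_df = len(x) - 2  # two estimated parameters: intercept and slope
print(round(a, 3), round(b, 3), [round(e, 3) for e in residuals], round(r2, 3), residual_df)
```

The residuals from a least squares line with an intercept always sum to zero.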