Research methods Flashcards
What is a fact in a scientific context
generally accepted reality based on objective inferences, verifiable by evidence that is published and accepted via peer review and replicated over time. Can still be open to scientific enquiry
Define the steps of the scientific method
1 - develop and define the problem (observation)
2 - formulate hypothesis
3 - gather data
4 - analyse and interpret results (test)
What are the two types of hypotheses
Null hypothesis - no relationship between variables
Alternative hypothesis - there is a relationship between variables
What are the 2 kinds of statistics
Descriptive statistics and inferential statistics
What is a hypothesis
A statement about some real-world phenomenon that can be tested through observations.
what is an independent variable
the factor you will be testing: the thing you will change (e.g. temperature, concentration)
What is the dependent variable
the effect you think will change and the thing you measure; it is affected by the independent variable
What must the hypothesis include
Must state the question you are asking as well as include both dependent and independent variables.
How many relationships should be included in hypothesis
1
What do you need to check about hypothesis
make sure it's testable
What does APPEAR stand for
Acquire
Process
Plot
Examine
Analyse
Report
Describe Acquire (APPEAR)
Acquire data through sampling.
Need to decide how you do it, think about assumptions, ethical considerations, numbers, resources available.
What are some assumptions you have to make when sampling a population
that the population is normally distributed and that you are sampling randomly
What is a T-test
used to test hypotheses about means when the population variance is unknown (the usual case)
What sample size do you want
as large as possible; larger samples give more precise estimates
What is a single sample T-test
when we only have 1 group whose mean we want to test against a hypothesised value
What is an independent samples T-test
We have 2 means and 2 groups that have no relation between each other
What are the 3 varieties of T-Tests
Single sample, independent samples and dependent samples
What is a pooled variance
A weighted average of the two sample variances, with each variance weighted by its degrees of freedom (sample size minus 1)
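As a minimal sketch in R, the weighted average of two sample variances described above can be computed by hand. The two groups are hypothetical values chosen only for illustration, and the weights are assumed to be the degrees of freedom (n - 1), as in the usual pooled-variance formula.

```r
# Hypothetical measurements for two independent groups
group_small <- c(2, 4, 6)
group_large <- c(1, 2, 3, 4, 5)

n_small <- length(group_small)
n_large <- length(group_large)

# Weight each sample variance by its degrees of freedom (n - 1)
pooled_var <- ((n_small - 1) * var(group_small) +
               (n_large - 1) * var(group_large)) /
  (n_small + n_large - 2)

pooled_var  # 3: lies between the two sample variances (4 and 2.5)
```

The larger group pulls the result towards its own variance because it contributes more degrees of freedom.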
What is a dependent T-test
compares 2 means that are related to each other (e.g. the same subjects measured twice)
when do we use t tests
when the population variance is unknown, particularly when the sample size is small (a common rule of thumb is under 30)
What is an ANOVA test
(AN)alysis Of (VA)riance
A statistical test for comparing means between more than 2 groups
What does the F-test test
the hypothesis that two variances are equal. The F ratio (one variance divided by the other) will be close to 1 if the variances are equal
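A short sketch of this in R, assuming two hypothetical samples: the ratio of the sample variances is computed by hand and compared with base R's `var.test()`, which runs the F-test of equal variances.

```r
# Hypothetical samples (not from any real dataset)
sample_x <- c(2, 4, 6)
sample_y <- c(1, 2, 3, 4, 5)

# F ratio: one sample variance divided by the other
f_ratio <- var(sample_x) / var(sample_y)

# var.test() performs the F-test that the two variances are equal
f_result <- var.test(sample_x, sample_y)

f_ratio             # 1.6
f_result$statistic  # the same ratio, reported as F
f_result$p.value    # p > 0.05 would give no evidence that the variances differ
```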
what function do you use in R to import your dataset
read.csv()
e.g. Dataset1 <- read.csv("dataset1.csv", header = TRUE)
what is the Shapiro-Wilk test
a normality test; the null hypothesis is that the data are normally distributed
if p>0.05 we fail to reject the null hypothesis, so the data are consistent with a normal distribution
What is the Bartlett test
Tests if there is a difference in variance between multiple sets of data
if p>0.05 there is no evidence of a difference in variance
what does it mean if the Shapiro test and Bartlett test give p>0.05
data is normal with homogeneous variance and therefore you can proceed with a t test to analyse the data
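A minimal sketch of running both checks in R, assuming a hypothetical data frame `growth` with a numeric column `height` and a grouping column `treatment` (these names and values are made up for illustration). `shapiro.test()` and `bartlett.test()` are both base R functions.

```r
# Hypothetical data: one numeric response measured in two groups
growth <- data.frame(
  height = c(5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 6.3, 6.1, 6.8, 7.0, 6.5, 6.6),
  treatment = factor(rep(c("control", "fertiliser"), each = 6))
)

# Normality check within each group (null: data are normally distributed)
normality <- lapply(split(growth$height, growth$treatment), shapiro.test)

# Equal-variance check across groups (null: variances are equal)
variance_check <- bartlett.test(height ~ treatment, data = growth)

# Compare each p-value with 0.05 before choosing a parametric test
normality$control$p.value
normality$fertiliser$p.value
variance_check$p.value
```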
what is a 1 sample t test
tests whether the mean of a single sample is significantly different from a known or hypothesized mean
what is a 2 sample t test (aka independent sample test)
used when comparing the means of two independent groups or populations and assesses whether the difference in means between the two groups is statistically significant
what is a paired sample t test
used when comparing the means of two related groups or when each data point is paired. Determines whether there is a significant difference in the means
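The three varieties can be sketched with base R's `t.test()`. All data below are hypothetical, the hypothesised mean of 70 is an arbitrary choice, and `var.equal = TRUE` assumes the equal-variance check has already passed.

```r
# Hypothetical data for each variety of t test
scores  <- c(68, 72, 75, 71, 69, 74)   # one group
group_a <- c(5.1, 4.8, 5.6, 5.3, 4.9)  # two unrelated groups
group_b <- c(6.2, 5.9, 6.4, 6.0, 6.6)
before  <- c(10, 12, 14, 16)           # same subjects measured twice
after   <- c(11, 14, 15, 18)

# 1 sample: compare the sample mean with a hypothesised mean
one_sample <- t.test(scores, mu = 70)

# 2 sample (independent): compare the means of two unrelated groups
two_sample <- t.test(group_a, group_b, var.equal = TRUE)

# Paired (dependent): compare the means of two related measurements
paired <- t.test(before, after, paired = TRUE)

one_sample$p.value
two_sample$p.value
paired$p.value
```

Each result also stores the t statistic in `$statistic` and the degrees of freedom in `$parameter`.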
what is a Post hoc test
used when ANOVA indicates significant differences and can identify which group differs from others
What must you do before an ANOVA test
Must make sure data is normally distributed using the Shapiro-Wilk test
Name an example of a Post hoc
Tukey’s HSD (Honestly Significant Difference test)
How do you interpret the results of an ANOVA test
if the p value (Pr(>F)) is less than 0.05 you can reject the null hypothesis. This indicates that at least one group is significantly different from the others. You can now use a Post hoc test
What is the ANOVA function in R
ExampleA <- aov(dependent_variable ~ independent_variable, data = ANOVA1)
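A fuller sketch of the same call, followed by the p value and a Tukey post hoc test. The names `ANOVA1` and `ExampleA` follow the card above; the columns `yield` and `fertiliser` and all values are hypothetical.

```r
# Hypothetical dataset: one numeric response, one factor with 3 groups
ANOVA1 <- data.frame(
  yield = c(20, 22, 21, 25, 27, 26, 30, 32, 31),
  fertiliser = factor(rep(c("A", "B", "C"), each = 3))
)

# One-way ANOVA: dependent variable ~ independent variable
ExampleA <- aov(yield ~ fertiliser, data = ANOVA1)

# The ANOVA table holds the F value and the p value, Pr(>F)
anova_table <- summary(ExampleA)[[1]]
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

# If p < 0.05, a post hoc test shows which pairs of groups differ
post_hoc <- TukeyHSD(ExampleA)

f_value   # 75 for these made-up values
p_value
post_hoc
```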
What is a 2 way ANOVA test
used to investigate the effects of two categorical independent variables on a continuous dependent variable
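As a rough sketch in R, a 2 way ANOVA can be fitted with `aov()` by putting both factors in the formula. The data frame `crop` and its columns are hypothetical, and the `*` in the formula is an assumption that the interaction between the two factors is also of interest.

```r
# Hypothetical balanced design: 2 fertilisers x 2 watering levels, 2 replicates
crop <- data.frame(
  yield = c(20, 22, 25, 27, 30, 32, 35, 39),
  fertiliser = factor(rep(c("A", "B"), each = 4)),
  watering = factor(rep(rep(c("low", "high"), each = 2), times = 2))
)

# fertiliser * watering fits both main effects and their interaction
two_way <- aov(yield ~ fertiliser * watering, data = crop)

# One row per effect, plus the residuals
two_way_table <- summary(two_way)[[1]]
two_way_table
```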
what is the Wilcoxon test
non-parametric rank test for statistical hypothesis testing used either to test the location of a population based on a sample of data, or to compare the locations of two populations using two matched samples
what needs to be checked before a two way ANOVA
Check groups are independent
Normality (Shapiro test)
Homogeneity of variance (Bartlett test)
what are regression models
models that describe the relationship between one dependent variable and one or more explanatory variables. They are used mainly for prediction and estimation
How do you set up relationships in regression models
Use equations - a numerical dependent variable and 1 or more numerical or categorical independent (explanatory) variables
What are the steps of regression modeling
Hypothesize relationship between variables
Specify probability distribution of random error term
Evaluate the fitted model
Use the model for prediction and estimation
What is model specification based on
Theory - theory of field, mathematical theory, previous research and common sense
what are the two main types of linear regression and when are they used
simple linear regression when you have only one independent variable
multiple linear regression which uses two or more independent variables
How do you interpret the results of regression
by using the summary function, summary(model), which provides detailed output including coefficients, R-squared, p-values and more
What are coefficients (linear regression)
The intercept and the slope estimated for each independent variable
What is R-squared (linear regression)
this value measures the model's goodness of fit and represents the proportion of the variance in the dependent variable that is explained by the independent variable(s)
what is the p value (linear regression)
a low p value (<0.05) for the independent variable(s) suggests a significant relationship between variables
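A minimal sketch of a simple linear regression in R, showing where the coefficients, R-squared and p value appear in the summary output. The data frame `study` and its columns `hours` and `score` are hypothetical.

```r
# Hypothetical data: one independent and one dependent variable
study <- data.frame(
  hours = c(1, 2, 3, 4, 5, 6),
  score = c(52, 55, 61, 64, 70, 71)
)

# Simple linear regression: dependent ~ independent
model <- lm(score ~ hours, data = study)
results <- summary(model)

# Intercept and slope, with standard errors and p values
results$coefficients

# Proportion of the variance in score explained by hours
results$r.squared

# p value for the slope of hours
slope_p <- results$coefficients["hours", "Pr(>|t|)"]
slope_p
```

For multiple linear regression the formula would list more variables, for example `score ~ hours + sleep`, where `sleep` is another hypothetical column.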
why are boxplots important
they are essential for visualising and summarising data. They provide a quick way to assess distribution of data and identify outliers
what is a boxplot
a graphical representation of the distribution of a dataset.
It displays a 5 number summary of a set of data
What are the 5 main points of a boxplot
the minimum, first quartile, median, third quartile and maximum
Describe the ‘anatomy’ of a boxplot
The box itself is the interquartile range and spans from Q1 to Q3 with the median (Q2) inside
The lines (whiskers) on either side extend to the smallest and largest values that lie within 1.5x the IQR of the box.
Anything outside this range is an outlier
Why use boxplots
makes it easy to identify skewness, central tendency and spread in data
good for visualizing non normal distribution
helps spot outliers
easy to compare
How do you create a boxplot
Order the data
Calculate the quartiles
Determine the IQR (Q3-Q1)
Find upper and lower limits
Identify outliers
Plot boxplot
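The steps above can be sketched in R as follows. The values are hypothetical, and `quantile()` is used with its default method, so other quartile conventions may give slightly different limits.

```r
# Hypothetical data containing one unusually large value
values <- c(2, 4, 5, 7, 8, 9, 11, 12, 30)

# 1. Order the data
ordered <- sort(values)

# 2. Calculate the quartiles
q <- quantile(ordered, probs = c(0.25, 0.5, 0.75))

# 3. Determine the IQR (Q3 - Q1)
iqr <- IQR(ordered)

# 4. Find the upper and lower limits (1.5 x IQR beyond the box)
lower_limit <- q[["25%"]] - 1.5 * iqr
upper_limit <- q[["75%"]] + 1.5 * iqr

# 5. Identify outliers
outliers <- ordered[ordered < lower_limit | ordered > upper_limit]
outliers  # 30

# 6. Plot the boxplot
boxplot(values, horizontal = TRUE)
```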
Give some examples of non parametric tests
The Wilcoxon test and the Mann-Whitney U test
How do you interpret a boxplot
The median represents the centre of the data distribution whilst the box length represents the spread of the middle 50% of the data. The whiskers show the range of most of the data and outliers can be identified as individual points outside the whiskers
what extra information does a parametric test use
they operate under the assumption that the data is normally distributed
Give some examples of parametric tests
t tests, anova
what are nonparametric tests
A class of statistical procedures that do not rely on assumptions about the shape or form of the probability distribution from which the data were drawn
What are the advantages of nonparametric tests
You can use these tests with any numeric variables with any distribution
What are the advantages of parametric tests
they use more information from available data which allows for more confidence of ruling out chance and finding real differences
why do we not always use parametric tests
their assumptions are not always met: data has to be normally distributed, interval or ratio level, and variance must be similar
what is the chi squared test
A test of independence or association between categorical variables, suited to larger sample sizes where it gives reliable results
TRUE OR FALSE: if all conditions are met you should use parametric tests over non parametric tests
TRUE
What is Fisher’s exact test
useful in smaller sample sizes or when dealing with 2x2 contingency tables where expected cell counts are low
It computes the exact probability of obtaining the observed distribution
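A small sketch in R comparing the chi squared test and Fisher's exact test on one 2x2 table. The counts and labels are hypothetical and chosen so that some expected counts fall below 5, which is the situation where Fisher's exact test is usually preferred.

```r
# Hypothetical 2x2 contingency table of counts
counts <- matrix(
  c(8, 2,
    1, 9),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    treatment = c("drug", "placebo"),
    outcome = c("improved", "not improved")
  )
)

# Chi squared test of independence. R warns here that the approximation
# may be incorrect; suppressWarnings() only keeps the output quiet.
chi_result <- suppressWarnings(chisq.test(counts))
chi_result$expected  # expected counts of 4.5 are below 5

# Fisher's exact test computes the exact probability instead
fisher_result <- fisher.test(counts)
fisher_result$p.value
```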
What is the Mann Whitney U test
Similar to Wilcoxon test but for independent samples
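In R both tests are run with `wilcox.test()`: `paired = TRUE` gives the Wilcoxon signed-rank test for matched samples, and the default gives the Mann-Whitney U (rank-sum) test for independent samples. The sketch below uses hypothetical data with no tied values, so exact p values are computed.

```r
# Hypothetical matched samples (same subjects measured twice)
before <- c(12, 15, 11, 18, 14, 20)
after  <- c(10, 11, 10.5, 12, 17, 13)

# Wilcoxon signed-rank test for two matched samples
signed_rank <- wilcox.test(before, after, paired = TRUE)
signed_rank$statistic  # V = 18, the sum of ranks of the positive differences

# Hypothetical independent samples
group_a <- c(1.1, 2.3, 3.2, 4.8)
group_b <- c(5.5, 6.1, 7.4, 8.9)

# Mann-Whitney U test for two independent samples
rank_sum <- wilcox.test(group_a, group_b)
rank_sum$statistic  # W = 0: no value in group_a exceeds any value in group_b
rank_sum$p.value
```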