quantitative methods Flashcards
describe quantitative?
- numerical data
- measured in numbers
- data/hypotheses
what is the end goal of quantitative methods?
record data where methods are repeatable and findings quantifiable
example methods?
- surveys/questionnaires
- biomarkers/imaging
- randomised controlled trials
- lab experiments
- systematic review & meta-analysis
advantages of quantitative methods?
- more control/limited variables
- representative samples
- anonymised
- precise for statistical comparison
- answer whether theories are true or false
limitations of methods?
- little understanding of individual experience
- less contextual understanding
what is induction?
use raw data to generate a hypothesis or theory
what is deduction?
making predictions/hypothesis from a theory
we gave vaccine to 500 volunteers only 10 got covid - strong or weak evidence?
2% of people got covid - lower than the 5% infection rate, though not massively effective compared to other vaccines - but the sample size is good, so the evidence is fairly strong
we gave vaccine to 10 volunteers and none got covid - strong or weak evidence?
infection rate is 0% but the number of volunteers is very low, so it needs to be tested further as efficacy cannot be validated (weak evidence)
vaccine efficacy?
how effective a vaccine is, i.e. how well it protects people against infection
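A rough sketch of how the 500-volunteer example could be turned into an efficacy figure. One common definition is the relative reduction in infection risk compared with an unvaccinated group. Treating the 5% infection rate as the unvaccinated comparison rate is an assumption here, and `vaccine_efficacy` is a hypothetical helper name.

```python
def vaccine_efficacy(rate_vaccinated, rate_unvaccinated):
    """Relative risk reduction: 1 - (risk in vaccinated / risk in unvaccinated)."""
    return 1 - rate_vaccinated / rate_unvaccinated

rate_vaccinated = 10 / 500   # 2% of the vaccinated volunteers got covid
rate_unvaccinated = 0.05     # assumed comparison infection rate (5%)

efficacy = vaccine_efficacy(rate_vaccinated, rate_unvaccinated)
print(f"efficacy = {efficacy:.0%}")
```

Under these assumptions the efficacy comes out at about 60%.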
solving vaccine problem?
- intuition
- systematic approach
population?
- population of the entire world
- complete set of objects
sample?
- participants for the vaccine testing
- subset of given population
sample design?
- design sample with age group, how many, gender etc.
what to do once you have your sample?
test on the sample by giving them the vaccine to produce the vaccine efficacy (effectiveness) result
what do you do once you have a vaccine result for sample population?
go back to the entire population and make an inference about what would happen if it was applied to the entire world
what makes a good sample?
- careful consideration of sub-categories so sample reliably represents population
- sub-categories shouldn’t be modified once determined
what does the term scientific cherry picking reference?
- making selective choices amongst competing evidence
- dismissing findings that do not support the chosen result
what are variables?
- set of related events that can take on more than one value
something that can be changed (characteristic/value)
independent and dependent
what is statistical inference?
working out how well a property of one variable can be inferred by that of another variable
what is an independent variable?
- predictor
- what the dependent variable depends on
represents value being changed/manipulated
controlled to determine relationship on an observed outcome
what is a dependent variable?
- outcomes
- something that depends on something
- observed result of the IV being manipulated
- e.g. person gets covid or not
what are the levels of independent variables?
- vaccine study Ps has 2 levels (vaccinated or not)
- undergrads have 3 levels (year 1,2,3)
- each participant belongs to only one level of an IV, but a study can have multiple IVs
what are control variables?
- kept constant to prevent them influencing the effect of the IV on the DV
what are the 4 types of data?
- nominal
- ordinal
- interval
- ratio
define interval data
- can be ordered and measured
- cannot compute a ratio between 2 values
e.g. exam mark, date, year
define ratio data
- interval data, but you can also take the ratio between 2 values
- distance, height, income
name descriptive statistics you have learnt?
- histograms
- central tendencies (mode, median, mean)
- spread (quantile and quartile and percentile, variance and SD, Z-score)
- shape (skewness, kurtosis)
- outliers (detection methods)
- box plots
what is frequency?
how often a value appears in data (a bin)
what is a histogram?
- visualises how data is distributed
- e.g. a group of coin stacks, one stack per bin, is a histogram
describe the mode
- find value of the highest stack/bin
- can be multiple
- all types of variables but usually nominal/ordinal variables e.g., satisfaction score
describe the Median
- centre of the stacks/bins
- middle value, so it splits the data into 2 groups with the same number
median cannot be used with nominal variables, only ordered variables
describe the mean
- finding the centre by finding mass
- all points on the left and right balance out
- average
- add all together and divide by how many
- interval and ratio variables
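A minimal sketch of the three central tendencies using Python's standard `statistics` module. The `scores` list is made-up data for illustration only.

```python
import statistics

scores = [2, 3, 3, 5, 7]   # made-up data for illustration

mode_value = statistics.mode(scores)      # value of the highest stack/bin
median_value = statistics.median(scores)  # middle value once sorted
mean_value = statistics.mean(scores)      # add all together, divide by how many

# There can be multiple modes: multimode returns every value tied for the top
modes = statistics.multimode([1, 1, 2, 2, 3])

print(mode_value, median_value, mean_value, modes)
```

Here the mode and median are both 3 while the mean is 4, because the larger values pull the balance point to the right.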
what is a spread?
- a distribution can have the same mean and median but a different spread
work out the spread?
- divide coins into sections with same number of data
- 20 sections of 10 coins
- reports where sections and cut-off points are in the spread
what are quantiles?
- cut-off points dividing sections
e.g. 200 coins in sections of 10 = 20 sections, and the quantiles (19 cut-off points) show where the boundaries are
what are quartiles?
- when there’s 4 sections in total
- report 3 numbers (one divides group 1-2, 2-3 and 3-4)
- median = 2nd quartile
what are percentiles?
- when there’s 100 sections in total
- median is 50th percentile
- 99 numbers to report (boundaries)
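The cut-off points can be sketched with `statistics.quantiles` from the standard library, which returns the boundaries between `n` equal-sized sections. The `coins` list is made-up data, and the function's default interpolation method is only one of several conventions for placing the boundaries.

```python
import statistics

coins = list(range(1, 201))   # 200 made-up values

quartiles = statistics.quantiles(coins, n=4)       # 4 sections -> 3 boundaries
twenty_sections = statistics.quantiles(coins, n=20)  # 20 sections -> 19 boundaries
percentiles = statistics.quantiles(coins, n=100)   # 100 sections -> 99 boundaries

median = statistics.median(coins)
print(len(quartiles), len(twenty_sections), len(percentiles))
print(quartiles[1] == median, percentiles[49] == median)
```

The second quartile and the 50th percentile both land on the median.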
will it be harder or easier to spin if data is more spread?
harder to spin around the mean (more spread = larger variance)
what is variance in spread?
- the 2nd moment of data
- how difficult to rotate data around centre
what is standard deviation?
- square root of variance
- standard distance from the mean
what does the mean and SD provide info on?
where the centre is and how spread data points are around it
what is a Z-score?
- given the SD, distance can be described as a ratio with respect to SD
- difference divided by the standard deviation
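A small sketch of variance, standard deviation and z-score. It uses the population versions (`pvariance`, `pstdev`), which divide by the number of data points; whether the course uses these or the sample versions (dividing by n - 1) is an assumption. The data and the `z_score` helper are illustrative.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data

mean = statistics.mean(data)
variance = statistics.pvariance(data)  # mean squared distance from the mean
sd = statistics.pstdev(data)           # square root of the variance

def z_score(value, mean, sd):
    """Distance from the mean, expressed in units of SD."""
    return (value - mean) / sd

z = z_score(9, mean, sd)
print(mean, variance, sd, z)
```

With this data the mean is 5, the variance 4 and the SD 2, so the value 9 sits 2 SDs above the mean.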
what does the shape of data statistics help to do?
extracts number describing more detailed info about the actual distribution
what is the skewness?
- measures degree of asymmetry
- corresponds to 3rd moment (distance from the mean to the power of 3 for each data point, summed and divided by no. of data points)
- dimensionless = 3rd moment further divided by SD to power of 3
what does a negative/positive skewness mean?
- positive - longer tail on the right (bulk of the data sits on the left)
- negative - longer tail on the left (bulk of the data sits on the right)
zero skewness?
- data is symmetric
what is kurtosis?
- how sharp data is
- 4th moment
- high kurtosis = sharp
- low kurtosis = more spread
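A sketch of skewness and kurtosis as the 3rd and 4th moments made dimensionless by dividing by the SD to the matching power. The helper names and the data are illustrative, and the kurtosis here is the plain version, not the "excess" version that some software reports after subtracting 3.

```python
def central_moment(data, order):
    """Average of (distance from the mean) ** order."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** order for x in data) / len(data)

def skewness(data):
    sd = central_moment(data, 2) ** 0.5
    return central_moment(data, 3) / sd ** 3

def kurtosis(data):
    sd = central_moment(data, 2) ** 0.5
    return central_moment(data, 4) / sd ** 4

symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 2, 2, 3, 10]        # long tail on the right
left_tail = [-10, -3, -2, -2, -1, -1]   # long tail on the left

print(skewness(symmetric), skewness(right_tail), skewness(left_tail))
print(kurtosis(symmetric), kurtosis(right_tail))
```

The symmetric data gives zero skewness, the long right tail gives a positive value and the long left tail a negative one.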
what are outliers?
- extreme values relative to bulk of values in data set
- distort data especially in smaller samples
why do outliers happen?
- inaccuracies in data processing
- problems with methodology
- actual extreme value from unusual P
how do you detect an outlier?
- based on z-score (if z-score is more than 3 or less than -3 i.e. distance from mean is more than 3x SD)
- IQR (outlier value greater than 1.5 IQR above 3rd quartile or smaller than 1.5 IQR below 1st quartile)
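Both detection rules can be sketched in a few lines. The data is made up, the population SD is used for the z-scores, and the quartile positions follow the default method of `statistics.quantiles`; other software may place the quartiles slightly differently.

```python
import statistics

data = [10, 11, 12, 13, 14] * 4 + [100]   # made-up data with one extreme value

# z-score rule: more than 3 SDs away from the mean
mean = statistics.mean(data)
sd = statistics.pstdev(data)
z_outliers = [x for x in data if abs((x - mean) / sd) > 3]

# IQR rule: more than 1.5 IQR above the 3rd or below the 1st quartile
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
iqr_outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(z_outliers, iqr_outliers)
```

In this example both rules flag only the value 100.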
what are box plots?
- summarising quartile-based statistics of a data set
- includes location of quartiles, range of data excluding outliers, and outliers detected by quartiles (IQR)
what is the probability of a coin toss?
- assumed 0.5 (50%)
- you can check this
how do you calculate probability?
- probability of getting k heads in n tosses when probability of getting heads for each toss is q (0.5)
- Bi(k|n,q)
- count all possible combinations of coins where you get k heads out of n tosses
e.g. toss a coin n=10 times and write the sequence of 10: HTHTTTHTTT - k=3 heads
what does probability (A|B) refer to?
- probability of obtaining A on the condition of B
- it’s a function
what is pascals triangle?
- for each node, all routes to that node from top have same number of Hs and Ts
- write number on each node as you go down
- probability of 3H and 1T - add all possible routes (bottom layer)
- 4/16 = 0.25
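A minimal sketch of Bi(k|n,q): the number of routes to a node in Pascal's triangle is the binomial coefficient, available as `math.comb`. The function name `binomial_probability` is a hypothetical choice for illustration.

```python
from math import comb

def binomial_probability(k, n, q):
    """Bi(k | n, q): probability of exactly k heads in n tosses."""
    return comb(n, k) * q**k * (1 - q) ** (n - k)

# Bottom layer of Pascal's triangle for 4 tosses: routes to 0, 1, 2, 3, 4 heads
routes = [comb(4, k) for k in range(5)]

p_3_heads_of_4 = binomial_probability(3, 4, 0.5)    # 4 routes out of 16
p_3_heads_of_10 = binomial_probability(3, 10, 0.5)  # 120 routes out of 1024

print(routes, p_3_heads_of_4, p_3_heads_of_10)
```

The 4-toss row is 1, 4, 6, 4, 1, which sums to 16 and reproduces the 4/16 = 0.25 result.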
what is a binomial distribution?
- when there are always 2 choices (heads or tails) it is binomial
- probabilities in each node = probability distribution
what is cumulative probability?
- when the number of coin tosses is high, it doesn’t make sense to use the probability of getting an exact number of heads
- use probability that value falls in a certain range
- toss 100 times what’s probability of less than 40 heads
how do you work out cumulative probability?
- add all the probabilities in the range you’re interested in
- e.g. out of 10 tosses, the probability of 0-3 heads
- find the probabilities for 0, 1, 2 and 3 heads in the binomial distribution and add them together
what is a 2 tailed cumulative probability?
- probability of both ends to check probability that data has deviated from mean (centre)
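A sketch of one-tailed and two-tailed cumulative probabilities for a fair coin, adding up Bi(k|n,q) over a range of k. The helper names are hypothetical, and taking 7-10 heads as the matching upper tail for 0-3 heads is an illustrative choice.

```python
from math import comb

def binomial_probability(k, n, q):
    return comb(n, k) * q**k * (1 - q) ** (n - k)

def cumulative_probability(k_values, n, q):
    """Add the probabilities for every k in the range of interest."""
    return sum(binomial_probability(k, n, q) for k in k_values)

lower_tail = cumulative_probability(range(0, 4), 10, 0.5)    # 0-3 heads
upper_tail = cumulative_probability(range(7, 11), 10, 0.5)   # 7-10 heads
two_tailed = lower_tail + upper_tail                         # both ends

# Toss 100 times: probability of fewer than 40 heads
p_under_40 = cumulative_probability(range(0, 40), 100, 0.5)

print(lower_tail, two_tailed, p_under_40)
```

For 10 tosses the 0-3 range adds up to 176/1024, about 0.17, and the two-tailed value is double that because a fair coin is symmetric.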
what is a discrete distribution?
- coin tossing is a discrete event
- counted how many times something happened
- binomial is discrete
- something that can be counted in whole numbers
what is a continuous distribution?
- something that cannot be counted in whole numbers (can take any value in a range)
- measuring continuous variables
- height and weight
how to work out probability of continuous distribution?
- probability of variable being specific number is zero
- area under distribution in range indicates probability
- Y-axis is probability density
what is a normal distribution?
- most important
- described by mean and SD
- Gaussian
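A sketch of how area under a normal distribution gives a probability, using `statistics.NormalDist` from the standard library. The mean of 170 and SD of 10 for height are made-up numbers for illustration.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)   # assumed mean and SD, in cm

# Area under the curve between 160 and 180 (within 1 SD of the mean)
p_within_1_sd = heights.cdf(180) - heights.cdf(160)

# Area under the curve below 150 (more than 2 SDs below the mean)
p_below_150 = heights.cdf(150)

# Height of the curve at the mean: a probability density, not a probability
density_at_mean = heights.pdf(170)

print(round(p_within_1_sd, 3), round(p_below_150, 3), round(density_at_mean, 3))
```

About 68% of the area lies within 1 SD of the mean, whatever mean and SD are chosen.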
what is a statistical test?
- systematically testing whether a given scientific claim is valid or not
- not a 100% answer so base it on probability
work out probability of a binomial distribution
Bi(k<10|n,q)
if probability is low do we reject or accept hypothesis
reject
if probability isn’t low do we reject hypothesis
we cannot reject it
what are the probabilities used to reject hypotheses called?
p-values
what is a P-value
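- the probability of getting a result at least as extreme as the one observed if the null hypothesis is true (not the probability your hypothesis is right)
- if p-value is < 0.05 - reject the null hypothesis
- if p-value is > 0.05 - cannot reject the null hypothesis
- threshold for p-value = alpha level (0.05)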
what is an alpha-level
- determined before analysis
- set threshold - usually 0.05 (5%)
- above the threshold we say the result could plausibly have happened by chance
- below it we say the result is unlikely to have happened by chance
what is a null hypothesis?
- hypothesis against research question
- no difference in result, and any differences observed are just error
what is the opposite to the null hypothesis?
- research/alternative hypothesis
- there is a difference in result
what is hypothesis testing?
- test how probable the observed data would be if the null hypothesis were true
- you can never prove that something is true, so you try to show that the null hypothesis is false
what is a type 1 error?
- false-positive
- reject null hypothesis when true
- vaccine not effective but you conclude it is effective
- try to minimise
what is a type 2 error?
- false-negative
- don’t reject null hypothesis when it is false
- vaccine is effective but you conclude it is not
- try to minimise
the binomial test
- simplest statistical test
- tests statistical significance of deviations from a theoretically expected probability of a binary event
how do you run a binomial test?
- describe null with expected proportion
- report observed proportion
- report p-value - probability of a result at least as extreme as the observed one if the null is true
- or report confidence interval
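A sketch of a one-sided binomial test applied to the two vaccine examples from earlier in these cards. Using the 5% infection rate as the expected proportion under the null, testing only the lower tail, and the function names are all assumptions made for illustration.

```python
from math import comb

def binomial_probability(k, n, q):
    return comb(n, k) * q**k * (1 - q) ** (n - k)

def one_sided_binomial_p(k_observed, n, q_expected):
    """P(k <= k_observed) if the null proportion q_expected is true."""
    return sum(binomial_probability(k, n, q_expected)
               for k in range(k_observed + 1))

alpha = 0.05        # set before the analysis
q_null = 0.05       # assumed expected proportion under the null

p_large = one_sided_binomial_p(10, 500, q_null)  # 10 of 500 got covid
p_small = one_sided_binomial_p(0, 10, q_null)    # 0 of 10 got covid

reject_large = p_large < alpha
reject_small = p_small < alpha
print(p_large, reject_large)
print(p_small, reject_small)
```

With 500 volunteers the p-value falls well below alpha, so the null is rejected. With 10 volunteers, seeing no cases is what you would expect more than half the time even if the vaccine did nothing, so the null cannot be rejected.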
what test do we use when there are proportions with more than 2 levels?
chi-square goodness-of-fit test
what test do we use when we are comparing proportions across 2 or more groups?
chi-square test of association
what test do we use when we are comparing a measure with a fixed value?
one-sample t-test
what test do we use when we are comparing a measure across 2 groups?
- independent = two-samples t-test
- paired = paired t-test
what test do we use when we are comparing a measure across more than 2 groups?
ANOVA
what does a chi-square test test?
- test of difference among categorical variables (nominal/ordinal)
goodness-of-fit
- how proportions in data fit to fixed proportions
test of association
- how proportions of 2 data sets are associated
what is Benford’s law (chi-square goodness-of-fit)?
- first digit law
- look at the first digit of each number
- count how many times each first digit occurs (1, 2, 3 etc.)
- 1 should occur most (about 30%)
- 2 should be around 17.6%
how do you report chi-square goodness of fit test?
- X-squared = chi-squared value - if big there’s a big difference
- d.f - degree of freedom: number of levels minus 1
- p-value
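A sketch of the chi-squared value and degrees of freedom for a goodness-of-fit test against Benford's law, where the expected proportion for first digit d is log10(1 + 1/d). The observed counts are made up, the function names are hypothetical, and the p-value step is left out because it needs the chi-squared distribution, which is not in Python's standard library.

```python
from math import log10

def benford_expected(digit):
    """Expected proportion of numbers whose first digit is `digit`."""
    return log10(1 + 1 / digit)

def chi_square_gof(observed, expected_proportions):
    total = sum(observed)
    chi2 = sum((obs - total * p) ** 2 / (total * p)
               for obs, p in zip(observed, expected_proportions))
    df = len(observed) - 1   # number of levels minus 1
    return chi2, df

expected = [benford_expected(d) for d in range(1, 10)]
observed = [30, 18, 12, 10, 8, 7, 6, 5, 4]   # made-up first-digit counts

chi2, df = chi_square_gof(observed, expected)
print([round(p, 3) for p in expected])
print(round(chi2, 3), df)
```

The expected proportions start at about 30.1% for digit 1 and 17.6% for digit 2, and the 9 levels give 8 degrees of freedom.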
what does chi-square test of association test
- checks association between 2 nominal/ordinal variables
- can be summarised as a contingency table
how do you report chi-square test of association?
- build contingency table
- Xsquared, df and P-value
- no. of data points (add them)
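A sketch of the test of association on a contingency table. Expected counts come from row total x column total / number of data points, and the degrees of freedom are (rows - 1) x (columns - 1). The table is made up, the function name is hypothetical, and no continuity correction is applied, which some software does by default for 2-by-2 tables.

```python
def chi_square_association(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)   # no. of data points (add them)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, n

# Made-up 2-by-2 contingency table (rows = groups, columns = outcomes)
table = [[10, 20],
         [20, 10]]

chi2, df, n = chi_square_association(table)
print(round(chi2, 3), df, n)
```

For this table every expected count is 15, giving a chi-squared value of about 6.67 with 1 degree of freedom and 60 data points.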
McNemar test - paired samples
- data points paired across 2 groups
- only available for a 2-by-2 contingency table
t-tests (Student’s)
- difference in group of measures (interval or ratio)
- compares means of populations
- 3 types and for each you decide whether to do a one tailed or 2 tailed
one sample t-test
- compares mean of one sample group against fixed value
- H0 = population underlying the sample has a mean equal to the fixed value (no significant difference)
- a significant difference means the sample mean deviates a long way from the fixed value
independent samples t-test
- compare observed difference between means of two independent samples
- null = populations underlying 2 samples have equal means
paired samples t-test
- compares mean difference of one group measured on 2 occasions
- null = population mean did not change
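A sketch of the t-statistic for the one-sample and paired cases. The statistic is the difference between the sample mean and the fixed value, divided by the standard error (sample SD over the square root of n). A paired test is treated here as a one-sample test on the differences against 0. Data and function names are made up for illustration, and the p-value lookup is left out.

```python
import statistics
from math import sqrt

def one_sample_t(sample, fixed_value):
    n = len(sample)
    standard_error = statistics.stdev(sample) / sqrt(n)
    t = (statistics.mean(sample) - fixed_value) / standard_error
    return t, n - 1   # t-statistic and degrees of freedom

def paired_t(first_occasion, second_occasion):
    differences = [b - a for a, b in zip(first_occasion, second_occasion)]
    return one_sample_t(differences, 0)

t_one, df_one = one_sample_t([5, 6, 7, 8, 9], 5)
t_paired, df_paired = paired_t([10, 12, 9, 11], [12, 13, 11, 14])
print(round(t_one, 3), df_one)
print(round(t_paired, 3), df_paired)
```

The further the t-statistic is from 0, the further the observed mean deviates from what the null predicts.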
Student’s t-test - checking for normality
- parametric tests
- test of normality (Shapiro-Wilk test)
- a violation of normality is indicated by low p-value
assumptions for independent samples t-test
- Levene’s test of equal variance
- significance of the difference in variances is reported as a p-value
- if variances are not equal - Welch’s t-test
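A sketch of Welch's t-statistic for two independent samples with unequal variances, together with the Welch-Satterthwaite approximation for the degrees of freedom. The data and the function name are made up for illustration, and the p-value lookup is left out.

```python
import statistics
from math import sqrt

def welch_t(sample_a, sample_b):
    n_a, n_b = len(sample_a), len(sample_b)
    # Each sample's variance divided by its own size (not pooled)
    var_a = statistics.variance(sample_a) / n_a
    var_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a**2 / (n_a - 1) + var_b**2 / (n_b - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 3))
```

Unlike the standard independent-samples test, the degrees of freedom here are usually not a whole number.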
how to report t-tests
- based on t-statistic
- statistical value
- d.f
- p-value
- usually reported together with descriptive statistics
what is a correlation?
- when 2 datasets are related and you see a relationship
- first statistics invented for analysing co-relationships
what is the r-value if there is so much variability in your data that no pattern shows?
0
no relationship
what is the equation to work out the correlation coefficient
r = Sxy / (Sx × Sy)
what does Sxy stand for?
- covariance
- how much x and y change together
what does Sx × Sy stand for?
- how much x and y change individually (product of their standard deviations)
what does the r-value tell you?
- direction of your correlation (r>0 it is positive)
- strength of correlation (r close to 1 or -1 it is strong)
what happens if you square your r number?
- gives how much of the variability in one variable is explained by the other
- e.g., predicting weight from height: r value 0.7, squared = 0.49 - about half of the variability of weight is explained by height
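A sketch of the correlation coefficient as covariance divided by the product of the two standard deviations, followed by r squared. The `height` and `weight` numbers are made up and are not real measurements; the function name is a hypothetical choice.

```python
from math import sqrt

def correlation(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sxy: how much x and y change together (covariance)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # Sx and Sy: how much x and y change individually (standard deviations)
    s_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)

height = [1, 2, 3, 4, 5]   # made-up values
weight = [2, 4, 5, 4, 5]   # made-up values

r = correlation(height, weight)
r_squared = r ** 2
print(round(r, 3), round(r_squared, 3))
```

Here r is about 0.77, so r squared is 0.6: about 60% of the variability in one variable is explained by the other.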
what is regression?
- slope (how steep is the relationship) - no slope = no relationship
- intercept (when one variable is 0, what is the other)
slope?
- how quickly the line changes
intercept
- where is the line when there is nothing on the x axis
what is the equation y = mx + c for?
y = y-axis variable (weight)
m = slope
x = x-axis variable (height)
c = intercept
what is y if x = 0 in regression?
y = intercept
what happens when x increases by 1 in regression?
y increases by the slope
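A sketch of fitting y = mx + c and checking the two facts above. The slope and intercept are estimated by ordinary least squares, which is an assumption since the cards do not say how the line is fitted. The data and function names are made up for illustration.

```python
def fit_line(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx                    # m: how quickly the line changes
    intercept = mean_y - slope * mean_x    # c: value of y when x is 0
    return slope, intercept

def predict(x_value, slope, intercept):
    return slope * x_value + intercept

m, c = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])   # made-up data

print(round(m, 3), round(c, 3))
print(predict(0, m, c) == c)                       # y at x = 0 is the intercept
print(round(predict(3, m, c) - predict(2, m, c), 3))  # x + 1 raises y by the slope
```

For this data the slope is 0.6 and the intercept 2.2.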