quantitative methods Flashcards

1
Q

describe quantitative?

A
  • numerical data
  • measured in numbers
  • data/hypotheses
2
Q

what is the end goal of quantitative methods?

A

record data where methods are repeatable and findings quantifiable

3
Q

example methods?

A
  1. surveys/questionnaires
  2. biomarkers/imaging
  3. randomised controlled trials
  4. lab experiments
  5. systematic review & meta-analysis
4
Q

advantages of quantitative methods?

A
  • more control/limited variables
  • representative samples
  • anonymised
  • precise for statistical comparison
  • answer whether theories true or false
5
Q

limitations of quantitative methods?

A
  1. little understanding of individual experience
  2. less contextual understanding
6
Q

what is induction?

A

use raw data to generate a hypothesis or theory

7
Q

what is deduction?

A

making predictions/hypotheses from a theory

8
Q

we gave vaccine to 500 volunteers only 10 got covid - strong or weak evidence?

A

2% of volunteers got covid (10 out of 500), which is low. Compared with an infection rate of 5%, the vaccine isn’t massively effective next to other vaccines, but the sample is a good size

9
Q

we gave vaccine to 10 volunteers and none got covid - strong or weak evidence?

A

infection rate is 0% but the number of volunteers is very low, so it needs to be tested further as efficacy cannot be validated

10
Q

vaccine efficacy?

A

how effective a vaccine is, i.e. how well it protects people against infection
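
Worked example for the two vaccine cards above. This is a minimal sketch, assuming the 5% infection rate quoted earlier applies to comparable unvaccinated people; the function name `efficacy` is hypothetical and efficacy is taken as 1 minus the ratio of the two infection rates.

```python
def efficacy(cases: int, volunteers: int, background_rate: float) -> float:
    """Efficacy = 1 - (infection rate in vaccinated / background rate)."""
    attack_rate = cases / volunteers
    return 1 - attack_rate / background_rate

# 500 volunteers, 10 cases, assumed 5% background infection rate
large_trial = efficacy(10, 500, 0.05)
# 10 volunteers, 0 cases: looks perfect, but the sample is tiny
small_trial = efficacy(0, 10, 0.05)
# chance that 10 unvaccinated people all avoid infection anyway
chance_no_cases = (1 - 0.05) ** 10

print(round(large_trial, 2), small_trial, round(chance_no_cases, 2))
```

The last number is the reason the 10-volunteer result is weak evidence: under the assumed 5% rate, zero cases among 10 people would happen more often than not even with no vaccine at all.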

11
Q

solving vaccine problem?

A
  • intuition
  • systematic approach
12
Q

population?

A
  • population of the entire world
  • complete set of objects
13
Q

sample?

A
  • participants for the vaccine testing
  • subset of given population
14
Q

sample design?

A
  • design the sample: age group, how many, gender, etc.
15
Q

what to do once you have your sample?

A

test on the sample by giving them the vaccine to produce the vaccine efficacy (effectiveness) result

16
Q

what do you do once you have a vaccine result for sample population?

A

go back to the entire population and make an inference about what would happen if it was applied to the entire world

17
Q

what makes a good sample?

A
  • careful consideration of sub-categories so sample reliably represents population
  • sub-categories shouldn’t be modified once determined
18
Q

what does the term scientific cherry picking refer to?

A
  • making selective choices amongst competing evidence
  • dismissing findings that do not support the chosen results
19
Q

what are variables?

A
  • set of related events that can take on more than one value
  • something that can be changed (characteristic/value)
  • two kinds: independent and dependent
20
Q

what is statistical inference?

A

working out how well a property of one variable can be inferred from that of another variable

21
Q

what is an independent variable?

A
  • predictor
  • what the dependent variable depends on
  • represents the value being changed/manipulated
  • controlled to determine its relationship with an observed outcome
22
Q

what is a dependent variable?

A
  • outcomes
  • something that depends on something
  • observed result of the IV being manipulated
  • e.g. person gets covid or not
23
Q

what are the levels of independent variables?

A
  • in the vaccine study, the IV has 2 levels (Ps vaccinated or not)
  • for undergrads, year of study has 3 levels (year 1, 2, 3)
  • each P belongs to only one level of an IV, but a study can have multiple IVs
24
Q

what are control variables?

A
  • kept constant to prevent them influencing the effect of the IV on the DV
25
Q

what are the 4 types of data?

A
  1. nominal
  2. ordinal
  3. interval
  4. ratio
26
Q

define interval data

A
  • can be ordered and measured
  • cannot compute a ratio between 2 values
  • e.g. exam mark, date, year
27
Q

define ratio data

A
  • interval data, but you can also take the ratio between 2 values
  • e.g. distance, height, income
28
Q

name descriptive statistics you have learnt?

A
  • histograms
  • central tendencies (mode, median, mean)
  • spread (quantile and quartile and percentile, variance and SD, Z-score)
  • shape (skewness, kurtosis)
  • outliers (detection methods)
  • box plots
29
Q

what is frequency?

A

how often a value appears in data (a bin)

30
Q

what is a histogram?

A
  • visualises how data is distributed
  • e.g. a group of coin stacks, one stack per value, is a histogram
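
Worked example for frequency and histograms. A minimal sketch with made-up satisfaction scores: each value gets a "stack" whose height is its frequency.

```python
from collections import Counter

# hypothetical satisfaction scores (1-5)
scores = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1]

frequency = Counter(scores)   # value -> how often it appears
for value in sorted(frequency):
    # one '#' per data point, like one coin in the stack
    print(value, "#" * frequency[value])
```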
31
Q

describe the mode

A
  • find value of the highest stack/bin
  • can be multiple
  • all types of variables but usually nominal/ordinal variables e.g., satisfaction score
32
Q

describe the Median

A
  • centre of the stacks/bins
  • middle value, so it splits the data into 2 groups with the same number of values
  • median cannot be used for nominal variables, only for ordered variables
33
Q

describe the mean

A
  • finding the centre by finding the centre of mass
  • all points on the left and right balance out
  • average
  • add all together and divide by how many
  • interval and ratio variables
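
Worked example for mode, median and mean. A minimal sketch using Python's standard `statistics` module on made-up scores.

```python
import statistics

scores = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1]   # hypothetical data

mode = statistics.mode(scores)         # value of the highest stack
modes = statistics.multimode(scores)   # a list, because there can be several
median = statistics.median(scores)     # middle value of the sorted data
mean = statistics.mean(scores)         # add all together, divide by how many

print(mode, modes, median, mean)
```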
34
Q

what is a spread?

A
  • how far the data points are scattered around the centre
  • distributions can have the same mean and median but a different spread
35
Q

work out the spread?

A
  • divide the coins into sections with the same number of data points
  • e.g. 20 sections of 10 coins
  • report where the cut-off points between sections are in the spread
36
Q

what are quantiles?

A
  • cut-off points dividing sections
  • e.g. 200 coins in sections of 10 = 20 sections, so 19 cut-off points show where the boundaries are
37
Q

what are quartiles?

A
  • when there’s 4 sections in total
  • report 3 numbers (one divides group 1-2, 2-3 and 3-4)
  • median = 2nd quartile
38
Q

what are percentiles?

A
  • when there’s 100 sections in total
  • median is 50th percentile
  • 99 numbers to report (boundaries)
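
Worked example for quantiles, quartiles and percentiles. A minimal sketch using `statistics.quantiles` on made-up data; splitting into n sections always gives n - 1 cut-off points. The exact cut-off values depend on the interpolation method (the module's default is used here).

```python
import statistics

data = list(range(1, 201))   # hypothetical: 200 coins valued 1..200

quartiles = statistics.quantiles(data, n=4)       # 4 sections
percentiles = statistics.quantiles(data, n=100)   # 100 sections
twenty = statistics.quantiles(data, n=20)         # 20 sections

print(len(quartiles), len(percentiles), len(twenty))
# the 2nd quartile and the 50th percentile are both the median
print(quartiles[1], percentiles[49], statistics.median(data))
```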
39
Q

will it be harder or easier to spin if data is more spread?

A

harder: the more spread out the data are around the mean, the harder they are to spin (larger variance)

40
Q

what is variance in spread?

A
  • the 2nd moment of data
  • how difficult to rotate data around centre
41
Q

what is standard deviation?

A
  • square root of variance
  • standard distance from the mean
42
Q

what does the mean and SD provide info on?

A

where the centre is and how spread data points are around it

43
Q

what is a Z-score?

A
  • given the SD, distance from the mean can be described as a ratio with respect to the SD
  • difference from the mean divided by the standard deviation
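
Worked example for variance, SD and z-scores. A minimal sketch on made-up heights, using the population versions (divide by n); `statistics.variance` and `statistics.stdev` are the sample versions that divide by n - 1.

```python
import statistics

heights = [160, 165, 170, 175, 180]   # hypothetical heights in cm

mean = statistics.mean(heights)
variance = statistics.pvariance(heights)   # mean squared distance from the mean
sd = statistics.pstdev(heights)            # square root of the variance
# z-score: distance from the mean in units of SD
z_scores = [(h - mean) / sd for h in heights]

print(mean, variance, round(sd, 2), [round(z, 2) for z in z_scores])
```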
44
Q

what does the shape of data statistics help to do?

A

extracts numbers describing more detailed info about the actual distribution

45
Q

what is the skewness?

A
  • measures degree of asymmetry
  • corresponds to the 3rd moment (distance of each data point from the mean to the power of 3, added up and divided by the no. of data points)
  • dimensionless = 3rd moment further divided by SD to the power of 3
46
Q

what does a negative/positive skewness mean?

A
  • positive skewness: long tail to the right (bulk of data on the left)
  • negative skewness: long tail to the left (bulk of data on the right)
47
Q

zero skewness?

A
  • data is symmetric
48
Q

what is kurtosis?

A
  • how sharp data is
  • 4th moment
  • high kurtosis = sharp
  • low kurtosis = more spread
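
Worked example for skewness and kurtosis as moments. A minimal sketch on made-up data; the helper names are hypothetical. Dividing the 4th moment by SD to the power of 4 is an assumption here, made by analogy with the dimensionless skewness.

```python
def moment(data, k):
    """k-th moment around the mean: average of (x - mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    # 3rd moment divided by SD cubed (SD = square root of the 2nd moment)
    return moment(data, 3) / moment(data, 2) ** 1.5

def kurtosis(data):
    # 4th moment divided by SD to the power of 4 (assumed normalisation)
    return moment(data, 4) / moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 2, 2, 3, 10]   # long tail towards large values

print(skewness(symmetric), skewness(right_tail), kurtosis(symmetric))
```

Symmetric data give a skewness of zero, and a long tail towards large values gives a positive number.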
49
Q

what are outliers?

A
  • extreme values relative to the bulk of values in a data set
  • distort data especially in smaller samples
50
Q

why do outliers happen?

A
  • inaccuracies in data processing
  • problems with methodology
  • actual extreme value from unusual P
51
Q

how do you detect an outlier?

A
  • based on z-score (if z-score is more than 3 or less than -3, i.e. distance from mean is more than 3x SD)
  • IQR (outlier is a value more than 1.5 IQR above the 3rd quartile or more than 1.5 IQR below the 1st quartile)
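
Worked example for the two detection rules. A minimal sketch on made-up data, assuming the population SD for the z-score and the default quartile method of `statistics.quantiles`; other software may place the quartiles slightly differently.

```python
import statistics

data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 40]   # hypothetical

# z-score rule: more than 3 SDs away from the mean
mean, sd = statistics.mean(data), statistics.pstdev(data)
z_outliers = [x for x in data if abs(x - mean) / sd > 3]

# IQR rule: more than 1.5 IQR below the 1st or above the 3rd quartile
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
iqr_outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(z_outliers, iqr_outliers)
```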
52
Q

what are box plots?

A
  • summarise quartile-based statistics of a data set
  • include location of quartiles, range of data excluding outliers, and outliers detected by the quartile (IQR) method
53
Q

what is the probability of a coin toss?

A
  • assumed 0.5 (50%)
  • you can check this
54
Q

how do you calculate probability?

A
  • probability of getting k heads in n tosses when the probability of getting heads for each toss is q (0.5)
  • Bi(k|n,q)
  • count all possible combinations of coins where you get k heads out of n tosses
  • e.g. toss a coin n=10 times and write down the sequence of 10: HTHTTTHTTT
  • this sequence has k=3 heads
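
Worked example for Bi(k|n,q). A minimal sketch using the standard binomial formula; the function name `binomial_pmf` is hypothetical.

```python
from math import comb

def binomial_pmf(k: int, n: int, q: float) -> float:
    """Bi(k|n,q): probability of exactly k heads in n tosses."""
    # comb(n, k) counts the sequences that contain exactly k heads
    return comb(n, k) * q**k * (1 - q) ** (n - k)

sequence = "HTHTTTHTTT"
k, n = sequence.count("H"), len(sequence)   # k=3, n=10
p = binomial_pmf(k, n, 0.5)                 # 120 sequences out of 1024

print(k, n, p)
```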
55
Q

what does probability (A|B) refer to?

A
  • probability of obtaining A on the condition of B
  • it's a function
56
Q

what is Pascal's triangle?

A
  • for each node, all routes to that node from the top have the same number of Hs and Ts
  • write the number of routes on each node as you go down
  • probability of 3H and 1T - count the routes to that node and divide by all possible routes (add up the bottom layer)
  • 4/16 = 0.25
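
Worked example for Pascal's triangle. A minimal sketch that builds one row by adding neighbouring nodes; the function name `pascal_row` is hypothetical and a fair coin is assumed, so every route is equally likely.

```python
def pascal_row(n: int) -> list[int]:
    """Row n of Pascal's triangle: number of routes to each node."""
    row = [1]
    for _ in range(n):
        # each node is the sum of the two nodes above it
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return row

row = pascal_row(4)           # layer reached after 4 tosses
p_3h_1t = row[3] / sum(row)   # routes with 3 heads / all routes

print(row, p_3h_1t)
```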
57
Q

what is a binomial distribution?

A
  • when there are always 2 possible outcomes (heads or tails) it is binomial
  • probabilities in each node = probability distribution
58
Q

what is cumulative probability?

A
  • when the number of coin tosses is high, it doesn’t make sense to use the probability of getting an exact number of heads
  • use probability that value falls in a certain range
  • toss 100 times what’s probability of less than 40 heads
59
Q

how do you work out cumulative probability?

A
  • add all the probabilities in the range you’re interested in
  • e.g. out of 10 tosses, the probability of 0-3 heads
  • find the 0-3 range in the binomial distribution and add the probabilities together
60
Q

what is a 2 tailed cumulative probability?

A
  • adds up the probability at both ends (tails) to check how likely it is that the data have deviated from the mean (centre) in either direction
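
Worked example for cumulative and 2 tailed probabilities. A minimal sketch for a fair coin; the helper names are hypothetical, and the two-tailed range (0-3 and 7-10 heads) is chosen as the mirror image around the centre of 5.

```python
from math import comb

def binomial_pmf(k, n, q=0.5):
    return comb(n, k) * q**k * (1 - q) ** (n - k)

def cumulative(lo, hi, n, q=0.5):
    """Probability that the number of heads falls in lo..hi (inclusive)."""
    return sum(binomial_pmf(k, n, q) for k in range(lo, hi + 1))

p_0_to_3 = cumulative(0, 3, 10)       # 0-3 heads out of 10 tosses
p_under_40 = cumulative(0, 39, 100)   # fewer than 40 heads out of 100
# two-tailed: as far from the centre (5) as 0-3, in either direction
p_two_tailed = cumulative(0, 3, 10) + cumulative(7, 10, 10)

print(p_0_to_3, round(p_under_40, 4), p_two_tailed)
```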
61
Q

what is a discrete distribution?

A
  • coin tossing is a discrete event
  • counted how many times something happened
  • binomial is discrete
  • something that can only take separate, countable values (e.g. whole numbers)
62
Q

what is a continuous distribution?

A
  • something that can take any value in a range, not just countable steps
  • measuring continuous variables
  • height and weight
63
Q

how to work out probability of continuous distribution?

A
  • probability of variable being specific number is zero
  • area under distribution in range indicates probability
  • Y-axis is probability density
64
Q

what is a normal distribution?

A
  • most important
  • described by mean and SD
  • Gaussian
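
Worked example for probability as area under a normal curve. A minimal sketch using `statistics.NormalDist`; the mean of 170 cm and SD of 10 cm are made-up values.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)   # described by mean and SD

# probability = area under the curve between two values
p_160_to_180 = heights.cdf(180) - heights.cdf(160)   # within 1 SD of the mean
# the height of the curve is a probability density, not a probability
density_at_mean = heights.pdf(170)

print(round(p_160_to_180, 4), round(density_at_mean, 4))
```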
65
Q

what is a statistical test?

A
  • systematically testing whether a given scientific claim is valid or not
  • not a 100% answer so base it on probability
66
Q

work out probability of a binomial distribution

A

bi(k<10|n,q)

67
Q

if probability is low do we reject or accept hypothesis

A

reject

68
Q

if probability isn’t low do we reject hypothesis

A

we cannot reject it

69
Q

what are the probabilities used to reject hypotheses called?

A

p-values

70
Q

what is a P-value

A
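  • the probability of getting data at least as extreme as yours if the null hypothesis is true (not the probability that your hypothesis is right)
  • if p-value is < 0.05 - reject the null hypothesis
  • if p-value is > 0.05 - cannot reject the null hypothesis
  • threshold for p-value = alpha level (0.05)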
71
Q

what is an alpha-level

A
  • determined before analysis
  • set threshold - usually 0.05 (5%)
  • p-value above the threshold: we say the result could have happened by chance
  • p-value below the threshold: we say it is unlikely to have happened by chance
72
Q

what is a null hypothesis?

A
  • hypothesis against research question
  • there is no difference in result; any differences observed are just error
73
Q

what is the opposite to the null hypothesis?

A
  • research/alternative hypothesis
  • there is a difference in result
74
Q

what is hypothesis testing?

A
  • test how probable the observed data are if the null hypothesis is true
  • you can never prove that something is true, so you try to show that the null hypothesis is false
75
Q

what is a type 1 error?

A
  • false-positive
  • reject null hypothesis when true
  • vaccine not effective but you conclude it is effective
  • try to minimise
76
Q

what is a type 2 error?

A
  • false-negative
  • don’t reject the null hypothesis when it is false
  • vaccine is effective but you conclude it is not
  • try to minimise
77
Q

the binomial test

A
  • simplest statistical test
  • tests statistical significance of deviations from a theoretically expected probability of a binary event
78
Q

how do you run a binomial test?

A
  • describe null with expected proportion
  • report observed proportion
  • report p-value - probability of data this extreme if the null is true
  • or report confidence interval
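
Worked example for the steps above. A minimal sketch of an exact two-sided binomial test with made-up data; the function name `binomial_test` is hypothetical. It sums the probability of every outcome that is no more likely than the observed one, which is one common way to define the two-sided p-value.

```python
from math import comb

def binomial_test(k: int, n: int, q: float) -> float:
    """Two-sided exact p-value for k successes in n trials under the null q."""
    pmf = [comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(n + 1)]
    # small tolerance so that equally likely outcomes are always included
    return sum(p for p in pmf if p <= pmf[k] * (1 + 1e-9))

# hypothetical data: 16 heads in 20 tosses, null hypothesis q = 0.5
expected_proportion = 0.5
observed_proportion = 16 / 20
p_value = binomial_test(16, 20, expected_proportion)
reject_null = p_value < 0.05

print(observed_proportion, round(p_value, 4), reject_null)
```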
79
Q

what test do we use when there are proportions with more than 2 levels?

A

chi-square goodness-of-fit test

80
Q

what test do we use when we are comparing proportions across 2 or more groups?

A

chi-square test of association

81
Q

what test do we use when we are comparing a measure with a fixed value?

A

one-sample t-test

82
Q

what test do we use when we are comparing a measure across 2 groups?

A
  • independent = two-samples t-test
  • paired = paired t-test
83
Q

what test do we use when we are comparing a measure across more than 2 groups?

A

ANOVA

84
Q

what does a chi-square test test?

A
  • test of difference among categorical variables (nominal/ordinal)
85
Q

goodness-of-fit

A
  • how proportions in data fit to fixed proportions
86
Q

test of association

A
  • how proportions of 2 data sets are associated
87
Q

what is Benford’s law (chi-square goodness of fit)?

A
  • first digit law
  • count how many times each digit (1, 2, 3, etc.) occurs as the first digit
  • 1 should occur most (30%)
  • 2 should be around 17.6%
88
Q

how do you report a chi-square goodness-of-fit test?

A
  • X-squared = chi-squared value - if it is big there’s a big difference between observed and expected proportions
  • d.f - degree of freedom: number of levels minus 1
  • p-value
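
Worked example for a goodness-of-fit test against Benford's law. A minimal sketch with made-up first-digit counts; the function names are hypothetical. It computes the chi-squared value and d.f. only, because the p-value needs the chi-square distribution from a statistics package.

```python
from math import log10

def benford_expected(digit: int) -> float:
    """Expected proportion of numbers whose first digit is `digit`."""
    return log10(1 + 1 / digit)

def chi_square_gof(observed, expected_props):
    total = sum(observed)
    expected = [p * total for p in expected_props]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1   # number of levels minus 1
    return chi2, df

# hypothetical counts of first digits 1..9 in 1000 numbers
counts = [301, 176, 125, 97, 79, 67, 58, 51, 46]
props = [benford_expected(d) for d in range(1, 10)]
chi2, df = chi_square_gof(counts, props)

print([round(p, 3) for p in props[:2]], round(chi2, 4), df)
```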
89
Q

what does chi-square test of association test

A
  • checks association between 2 nominal/ordinal variables
  • can be summarised as a contingency table
90
Q

how do you report chi-square test of association?

A
  • build contingency table
  • Xsquared, df and P-value
    • no. of data points (add them)
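
Worked example for a test of association. A minimal sketch on a made-up 2-by-2 contingency table; the function name is hypothetical, d.f. is taken as (rows - 1) x (columns - 1), and no continuity correction is applied. The `erfc` shortcut for the p-value only holds when d.f. = 1.

```python
from math import erfc, sqrt

def chi_square_association(table):
    """Chi-squared value, d.f. and N for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)   # no. of data points
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # count expected in this cell if the 2 variables were unrelated
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, n

# hypothetical table: rows = vaccinated / not, columns = covid / no covid
table = [[10, 490], [25, 475]]
chi2, df, n = chi_square_association(table)
p_value = erfc(sqrt(chi2 / 2))   # valid only when df == 1

print(round(chi2, 3), df, n, round(p_value, 4))
```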
91
Q

McNemar test - paired samples

A
  • data points paired across 2 groups
  • only available for a 2-by-2 contingency table
92
Q

t-tests (Student’s)

A
  • test differences between groups on a measure (interval or ratio)
  • compare means of populations
  • 3 types, and for each you decide whether to do a one-tailed or two-tailed test
93
Q

one sample t-test

A
  • compares mean of one sample group against fixed value
  • H0 = the population underlying the sample has a mean equal to the fixed value (no significant difference)
  • a significant difference is when the sample mean deviates a lot from the fixed value
94
Q

independent samples t-test

A
  • compare observed difference between means of two independent samples
  • null = populations underlying 2 samples have equal means
95
Q

paired samples t-test

A
  • compares the mean difference of one group measured on 2 occasions
  • null = population mean did not change
96
Q

Student’s t-test - checking for normality

A
  • t-tests are parametric tests, so the data should be normally distributed
  • test of normality (Shapiro-Wilk test)
  • a violation of normality is indicated by a low p-value
97
Q

assumptions for independent samples t-test

A
  • Levene’s test of equal variance
  • significance of the difference in variances is reported as a p-value
  • if variances are not equal - Welch’s t-test
98
Q

how to report t-tests

A
  • based on t-statistic
  • statistical value
  • d.f
  • p-value
  • usually reported together with descriptive statistics
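
Worked example for the t-statistic and d.f. A minimal sketch on made-up before/after measurements; the function names are hypothetical. The p-value is left out because it needs the t-distribution from a statistics package.

```python
import statistics
from math import sqrt

def one_sample_t(sample, fixed_value):
    """t-statistic and degrees of freedom against a fixed value."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)   # sample SD (divides by n - 1)
    t = (mean - fixed_value) / (sd / sqrt(n))
    return t, n - 1

def paired_t(before, after):
    """Paired test = one-sample test on the differences against 0."""
    differences = [a - b for a, b in zip(after, before)]
    return one_sample_t(differences, 0)

before = [70, 72, 68, 75, 71]   # hypothetical measurements
after = [73, 74, 70, 79, 72]
t, df = paired_t(before, after)

print(round(t, 3), df)
```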
99
Q

what is a correlation?

A
  • when 2 datasets are related and you see a relationship
  • first statistics invented for analysing co-relationships
100
Q

what is the correlation coefficient if there is so much variability in your data that no pattern shows?

A

0
no relationship

101
Q

what is the equation to work out the correlation coefficient

A

r = Sxy / (Sx × Sy)

102
Q

what does Sxy stand for?

A
  • covariance
  • how much x and y change together
103
Q

what does Sx.Sy stand for?

A
  • how much x and y change individually (SD of x multiplied by SD of y)
104
Q

what does the r-value tell you?

A
  • direction of your correlation (r>0 it is positive)
  • strength of correlation (r close to 1 or -1 it is strong)
105
Q

what happens if you square your r number

A
  • r squared = how much of the variability in the data is explained
  • e.g., predicting weight from height: r = 0.7, r squared = 0.49 - about half of the variability of weight is explained by height
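
Worked example for the correlation coefficient. A minimal sketch of r as the covariance divided by the product of the two SDs, on made-up heights and weights; the function name `correlation` is hypothetical and sample (n - 1) versions are used throughout.

```python
import statistics

def correlation(x, y):
    """r = covariance of x and y / (SD of x * SD of y)."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    n = len(x)
    # covariance: how much x and y change together
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    return sxy / (statistics.stdev(x) * statistics.stdev(y))

heights = [150, 160, 170, 180, 190]   # hypothetical data
weights = [52, 60, 63, 75, 80]
r = correlation(heights, weights)
r_squared = r ** 2   # share of variability explained

print(round(r, 3), round(r_squared, 3))
```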
106
Q

what is regression?

A
  • slope (how steep the relationship is) - no slope = no relationship
  • intercept (when one co-variable is 0, what is the other?)
107
Q

slope?

A
  • how quickly the line changes
108
Q

intercept

A
  • where the line is when x = 0 (where it crosses the y axis)
109
Q

what is the equation y = mx + c for?

A

y = value on the y axis (weight)
m = slope
x = value on the x axis (height)
c = intercept

110
Q

what is y if x = 0? in regression

A

y = intercept

111
Q

what happens when x increases by 1 - in regression

A

y increases by the slope
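
Worked example for the regression cards. A minimal sketch that fits y = mx + c by ordinary least squares on made-up heights and weights; the cards do not name the fitting method, so least squares is an assumption, and `fit_line` and `predict` are hypothetical names.

```python
def fit_line(x, y):
    """Least-squares slope m and intercept c for y = m*x + c."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    m = sxy / sxx
    c = mean_y - m * mean_x   # the line passes through the two means
    return m, c

heights = [150, 160, 170, 180, 190]   # hypothetical data
weights = [52, 60, 63, 75, 80]
m, c = fit_line(heights, weights)

def predict(x):
    return m * x + c

print(round(m, 2), round(c, 1))
print(predict(0) == c, round(predict(171) - predict(170), 2))
```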