Statistics Flashcards
Accuracy
- how close the measurement is to the true value
- compared to a gold standard
alpha error
probability of a type I error
beta error
probability of a type II error
bias
outcome differs from the correct answer
in a systematic non-random way
biased studies
- subjects in group 1 differ from subjects in group 2
- in a meaningful way that will affect the conclusions
cohort
a group with common characteristics
confidence interval
- The 95% confidence interval is a range of values computed so that, across repeated samples, about 95% of such intervals contain the population mean. With a large sample you know that mean with much more precision than with a small sample, so the confidence interval is quite narrow when computed from a large sample.
- the true parameter (such as the mean) is expected to fall within this range
- Most commonly, the 95% confidence level is used
- Factors affecting the width of the confidence interval include
- the size of the sample,
- the confidence level,
- the variability in the sample.
- A larger sample size normally will lead to a better estimate of the population parameter.
- is a range of likely values for the population parameter based on: the point estimate, e.g., the sample mean
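The effect of sample size on the width of the interval can be sketched in a few lines. This is a minimal example, not a definitive method: the function name `mean_confidence_interval` and the scores are made up for illustration, and it uses the normal approximation (z about 1.96 for 95%). For small samples a t critical value would normally be used instead.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) for the mean using a normal approximation."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample SD (n - 1 denominator)
    # Critical z value: about 1.96 for 95%, about 2.58 for 99%
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Hypothetical scores; the large sample repeats the same values 20 times,
# so the spread is similar but there are far more observations.
small = [0.8, 1.0, 0.9, 1.2, 0.7, 1.1]
large = small * 20

lo_s, hi_s = mean_confidence_interval(small)
lo_l, hi_l = mean_confidence_interval(large)
lo_99, hi_99 = mean_confidence_interval(small, confidence=0.99)

print(f"n={len(small)}, 95%: {lo_s:.3f} to {hi_s:.3f}")
print(f"n={len(large)}, 95%: {lo_l:.3f} to {hi_l:.3f}")
print(f"n={len(small)}, 99%: {lo_99:.3f} to {hi_99:.3f}")
```

The larger sample gives the narrower interval, and raising the confidence level from 95% to 99% widens it.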
confounding variable or factor
- when 2 variables are both related to a 3rd variable
- one might or might not know that the factor is related to the 2 principal variables
Dependent variable
- the outcome or effect
- for example visual acuity
independent variable
null hypothesis
there is no significant difference between specified populations, any observed difference being due to sampling or experimental error.
normal
a Gaussian or bell-shaped distribution curve
power
probability of detecting a true difference when one exists (1 - beta)
regression
how much the dependent variable Y changes in response to changes in the independent variable X
type I error
- rejecting the null hypothesis when in fact it is true
- also called alpha error
- we reject the null hypothesis if p<0.05
- so we conclude the groups are different
- but in reality, they are not
type II error
- also called beta error
- failing to reject (loosely, "accepting") the null hypothesis when in fact it is FALSE
what are the 2 ways of misinterpreting a p value?
and what to do to avoid misinterpretation?
- a p value >0.05
- thought to be non-significant
- and read as “no effect” or “no difference”
- THE CORRECT interpretation should be:
- there is no strong evidence that the intervention has an effect
- no strong evidence that there is a difference
- TO AVOID MISINTERPRETATION:
- ALWAYS CHECK THE P VALUE WITH THE CI
what to look for in the Confidence interval?
- you want a NARROW RANGE
- the wider the range, the worse
- the RESULT falls within that range with X% confidence
give examples of categorical variables
- male/female
- republican/democrat
- pass/fail
- yes/no
- etc.
what test to use to measure the association between categorical variables?
Chi square
what test to use to measure the correlation between
continuous variables?
correlation
regression
which is the most common measure of correlation?
how do you interpret?
- the r value (Pearson correlation coefficient)
- r ranges from -1 to +1
- 0 means no linear correlation
- the farther from 0, the stronger the correlation (the sign gives the direction)
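A small worked example may help with reading r. This sketch computes the Pearson coefficient by hand from its definition (covariance divided by the product of the spreads). The function name `pearson_r` and the data are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # y rises with x in a straight line
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # y falls as x rises
r_none = pearson_r(x, [3, 1, 4, 1, 3])   # no linear trend

print(round(r_up, 3), round(r_down, 3), round(r_none, 3))
```

The first pair gives r close to +1, the second close to -1, and the third close to 0.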
what test to use to measure association
between categorical variables
when to use each?
- chi-square and Fisher's exact test
- if small numbers (small n) use FISHER
- the chi-square test is an approximation of Fisher's exact test, so use chi-square when you have large numbers
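To make the "small numbers" rule concrete, here is a sketch of both tests for a 2x2 table, written with the standard library only. The function names and the table are made up for illustration. The chi-square statistic has no continuity correction, and the "expected count below 5" cut-off is a common rule of thumb rather than something stated in these cards.

```python
from math import comb

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and expected counts for [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [
        (a + b) * (a + c) / n, (a + b) * (b + d) / n,
        (c + d) * (a + c) / n, (c + d) * (b + d) / n,
    ]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p value: add up the probabilities of every table
    with the same margins that is no more likely than the observed table."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(k):  # hypergeometric probability of k in the top-left cell
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Small tolerance so that ties with the observed table are included
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# Hypothetical small study: 8 subjects in total
stat, expected = chi_square_2x2(3, 1, 1, 3)
p_fisher = fisher_exact_2x2(3, 1, 1, 3)
use_fisher = min(expected) < 5  # rule of thumb, see note above

print(stat, expected, round(p_fisher, 4), use_fisher)
```

Here every expected count is 2, so the table is too small for the chi-square approximation and Fisher's exact test is the safer choice.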
when to use a t-test?
- use a t-test to determine whether the means of two sets of data are significantly different from each other
what are the 3 types of t-test?
- one-sample t-test
- compares the mean of the sample to a reference mean of a known population
- independent samples t-test
- compares 2 means from 2 independent samples
- paired samples t-test
- compares 2 means that are from repeated measurements of same participants
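The three t-tests differ mainly in what goes into the standard error. The sketch below computes only the t statistic and the degrees of freedom for each one (no p value). The function names and measurements are hypothetical, and the independent version assumes equal variances (pooled, Student's form).

```python
import math
import statistics

def one_sample_t(sample, reference_mean):
    """t statistic and df for a sample mean against a reference mean."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - reference_mean) / se, n - 1

def paired_t(before, after):
    """Paired test = one-sample test on the within-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0.0)

def independent_t(group1, group2):
    """Student's t with pooled variance (assumes equal variances)."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(group1) - statistics.mean(group2)) / se, n1 + n2 - 2

# Hypothetical measurements on the same 5 participants
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 15]

t_one, df_one = one_sample_t(before, 10)
t_paired, df_paired = paired_t(before, after)
t_indep, df_indep = independent_t(after, before)

print(f"one-sample:  t={t_one:.2f}, df={df_one}")
print(f"paired:      t={t_paired:.2f}, df={df_paired}")
print(f"independent: t={t_indep:.2f}, df={df_indep}")
```

With these numbers the paired t is much larger than the independent t, because pairing removes the between-participant variability. This is why repeated measurements on the same participants should not be analysed as if they were independent samples.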
what to do if you want to compare the means or the differences of more than 2 groups?
what if you do multiple t-tests? what error do you incur?
- use ANOVA
- doing multiple t-test comparisons inflates the chance of a type I error
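The size of the problem can be estimated with simple arithmetic. This sketch assumes the pairwise comparisons are independent, which is only an approximation. Under that assumption the chance of at least one false positive across k tests at level alpha is 1 - (1 - alpha)^k. The function name is illustrative.

```python
from math import comb

def familywise_error_rate(n_groups, alpha=0.05):
    """Number of pairwise comparisons and the chance of at least one
    type I error, assuming independent comparisons (an approximation)."""
    n_comparisons = comb(n_groups, 2)
    return n_comparisons, 1 - (1 - alpha) ** n_comparisons

for groups in (2, 3, 4, 5):
    k, fwer = familywise_error_rate(groups)
    print(f"{groups} groups: {k} t-tests, chance of a false positive = {fwer:.3f}")
```

With 3 groups there are already 3 pairwise t-tests and the chance of at least one false positive is about 14%, not 5%.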
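what number is the enemy of the confidence interval?
1 (for ratio measures such as an odds ratio or relative risk, a confidence interval that includes 1 means no significant difference)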
if you cannot mask the doctor or the patient, what can you MASK at least?
the person who assesses the final outcome
how to calculate the degrees of freedom?
degrees of freedom = n-1
Degrees of freedom can be described as the number of scores that are free to vary. For example, suppose you tossed three dice. The total score adds up to 12. If you rolled a 3 on the first die and a 5 on the second, then you know that the third die must be a 4 (otherwise, the total would not add up to 12). In this example, 2 dice are free to vary while the third is not. Therefore, there are 2 degrees of freedom.
In many situations, the degrees of freedom are equal to the number of observations minus one. Thus, if the sample size were 20, there would be 20 observations; and the degrees of freedom would be 20 minus 1 or 19.
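The dice example and the n - 1 rule can be checked with a few lines of arithmetic. This is only an illustration, and the sample values are made up. Once the mean is fixed, the deviations from it must sum to zero, so the last deviation is determined by the others, which is why the sample variance divides by n - 1.

```python
import statistics

# Dice example: with the total fixed at 12, only two of the three dice are free.
total = 12
free_dice = [3, 5]
third_die = total - sum(free_dice)  # forced by the other two

# Same idea for a sample: the deviations from the mean sum to zero,
# so the last deviation is fixed once the first n - 1 are known.
sample = [4, 7, 9, 10, 15]
mean = statistics.mean(sample)
deviations = [x - mean for x in sample]
last_deviation = -sum(deviations[:-1])

# The sample variance therefore divides by n - 1 rather than n.
manual_variance = sum(d ** 2 for d in deviations) / (len(sample) - 1)

print(third_die, last_deviation, manual_variance, statistics.variance(sample))
```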
what is the difference between a continuous variable and a discrete one?
- continuous has unlimited possibilities
- weight, for example: 120; 120.1; 120.12 etc.
- discrete is limited to separate, countable values
- children in the family: 1, 2, 3, 4
what is the goal of research?
THE GOAL IS TO REJECT THE NULL HYPOTHESIS
that's it! (although what matters is the level of confidence with which you reject it)
For example:
null hypothesis: the vessel density is similar in RCD and CRD patients
what characteristics are important in a hypothesis?
- it's a statement - not a question
- based on the research question
- is it testable?
- is it informative?
- is it relevant?
- TRY to make it simple (a 1:1) hypothesis
- the more variables the more difficult to reject
what is a type I error?
when you reject the null hypothesis when in fact it is true
for example:
HO = there is no difference in vessel density between RCD and CRD
my result rejects HO and says THERE is a difference (when there is not)
what is a type II error?
when you DON'T reject the null hypothesis when in fact it is not true
for example:
HO = there is no difference in vessel density between RCD and CRD patients
Result: there is no difference
the real fact is that there is a difference.
what is the difference between type 1 and type 2 error?
type I rejects HO (when HO is actually true)
type II fails to reject ("accepts") HO (when HO is actually false)
mnemonic: type II is complacent and accepts it
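Both error types can be seen in a small simulation. This is a sketch under stated assumptions, not a real analysis. It uses a two-sided z-test with a known SD of 1 in both groups (a simplification of the t-test), a fixed random seed, and a hypothetical function name `rejection_rate`. When the true difference is 0, every rejection is a type I error. When the true difference is not 0, every failure to reject is a type II error.

```python
import math
import random
import statistics

def rejection_rate(true_diff, n=30, trials=2000, alpha=0.05, seed=0):
    """Fraction of simulated studies that reject the null hypothesis."""
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    se = math.sqrt(2 / n)  # standard error of the difference; both SDs are 1
    rejections = 0
    for _ in range(trials):
        g1 = [rng.gauss(0, 1) for _ in range(n)]
        g2 = [rng.gauss(true_diff, 1) for _ in range(n)]
        z = (statistics.fmean(g2) - statistics.fmean(g1)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

type_1_rate = rejection_rate(true_diff=0.0)  # null is true: rejections are type I errors
power = rejection_rate(true_diff=0.5)        # null is false: rejections are correct
type_2_rate = 1 - power                      # missed differences are type II errors

print(f"type I error rate ~ {type_1_rate:.3f} (alpha = 0.05)")
print(f"power ~ {power:.3f}, type II error rate (beta) ~ {type_2_rate:.3f}")
```

The type I rate stays near alpha whatever the sample size, while the type II rate falls as n or the true difference grows.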
when to use CI of 95% vs a higher CI?
use 95% for all
except in something super important like new medicines, then use a 99% CI
what is parametric and non-parametric data?
- parametric means that the data FOLLOW PARAMETERS
- they follow a predictable distribution
- in other words, they are normally distributed
- non-parametric data do not follow a normal distribution
- the values look irregular in a dispersion graph
how to describe the following data:
- normally distributed data -
- non-parametric -
- categorical -
- over time -
- two related variables -
- two related non-parametric variables -
- two related categorical variables -
- normally distributed data - mean, SD, min, max
- non-parametric - median, histogram
- categorical - frequencies
- over time - line plot / time series
- two related variables - Pearson correlation
- two related non-parametric variables - Spearman’s correlation
- two related categorical variables - crosstabulations
how to describe categorical data?
- frequencies (crosstabulations for the relation between two categorical variables)
how to describe the relation between non-parametric variables
- Spearman’s correlation
how to describe non-parametric data?
- median, histogram
what test to do in the following scenarios:
comparing more than 2 groups in non-parametric data
- comparing a single sample to a normative, previously published sample - one-sample t-test
- comparing 2 paired groups (normal, not normal and categorical)
  - normal - paired t-test
  - not normal - Wilcoxon signed-rank
  - categorical - McNemar
- comparing 2 independent samples
  - normal - independent t-test
  - not normal - Mann-Whitney
  - categorical - chi-square
- comparing more than 2 groups - repeated measures
  - normal - repeated-measures ANOVA
  - not normal - Friedman ANOVA
  - categorical - Cochran’s Q
- comparing more than 2 groups - independent
  - normal - one-way ANOVA
  - not normal - Kruskal-Wallis
  - categorical - chi-square
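The scenarios above amount to a lookup by study design and data type, which can be sketched as a small table. The key labels and the function name `choose_test` are invented for this example, and the entries simply restate the card.

```python
# Lookup table restating the card above.
# Keys are (design, data type); the labels are illustrative, not standard terms.
TEST_TABLE = {
    ("single sample vs reference", "normal"): "one-sample t-test",
    ("2 groups, paired", "normal"): "paired t-test",
    ("2 groups, paired", "not normal"): "Wilcoxon signed-rank",
    ("2 groups, paired", "categorical"): "McNemar",
    ("2 groups, independent", "normal"): "independent t-test",
    ("2 groups, independent", "not normal"): "Mann-Whitney",
    ("2 groups, independent", "categorical"): "chi-square",
    ("more than 2 groups, repeated measures", "normal"): "repeated-measures ANOVA",
    ("more than 2 groups, repeated measures", "not normal"): "Friedman",
    ("more than 2 groups, repeated measures", "categorical"): "Cochran's Q",
    ("more than 2 groups, independent", "normal"): "one-way ANOVA",
    ("more than 2 groups, independent", "not normal"): "Kruskal-Wallis",
    ("more than 2 groups, independent", "categorical"): "chi-square",
}

def choose_test(design, data_type):
    """Return the test listed on the card for a design and data type."""
    try:
        return TEST_TABLE[(design, data_type)]
    except KeyError:
        raise ValueError(f"no entry for {design!r} with {data_type!r} data") from None

print(choose_test("more than 2 groups, independent", "not normal"))
print(choose_test("2 groups, paired", "categorical"))
```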
what test to do in the following scenarios:
comparing 2 paired groups with non-parametric data
Wilcoxon
what test to do in the following scenarios:
comparing categorical variables
chi-square
what is the normal distribution curve?
It is the most important and most widely used distribution in statistics. It is sometimes called the “bell curve,” although the tonal qualities of such a bell would be less than pleasing. It is also called the “Gaussian curve” after the mathematician Karl Friedrich Gauss.
in the normal Gaussian curve or bell curve, where do the majority of values fall?
- the majority fall within +/- 2 SD of the mean
- the center is the mean
- 95% of values lie within +/- 1.96 SD of the mean
in a normal distribution curve (bell curve, Gaussian curve), what percentage of values falls within 1 SD, 2 SD and 3 SD of the mean?
- 1 SD = 68%
- 2 SD = 95%
- 3 SD = 99.7%
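These percentages can be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library. The helper name `coverage` is made up for illustration.

```python
from statistics import NormalDist

def coverage(k):
    """Fraction of a normal distribution lying within +/- k SD of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(f"within +/- {k} SD: {coverage(k) * 100:.1f}%")
```

The exact 95% cut-off is 1.96 SD. "2 SD = 95%" is a rounding (2 SD covers about 95.4%), and 3 SD covers about 99.7%.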