Statistics Flashcards
Levels of measurement
Nominal
Categories with no natural order. E.g.: name, student number, blood group
Ordinal
Ordered, but the difference between values has no numerical meaning and is not constant: you can say which is better, but not by how much. E.g.: stages of cancer
Interval
Ordered, and differences are meaningful. No absolute zero (the zero point depends on the unit). E.g.: temperature in °C or °F
Ratio
Ordered, and differences are meaningful. Absolute zero, so ratios are meaningful too. E.g.: weight, height
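A small sketch of why the absolute zero matters, using made-up numbers: on an interval scale a ratio such as "twice as warm" changes with the unit, while on a ratio scale it does not. The helper names and the approximate kg-to-lb factor are assumptions for illustration.

```python
# Interval scale: the zero of the Celsius scale is arbitrary,
# so a ratio of two temperatures changes when the unit changes.
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio scale: weight has an absolute zero,
# so the ratio of two weights is the same in every unit.
def kg_to_lb(kg):
    return kg * 2.20462  # approximate conversion factor

ratio_kg = 80 / 40
ratio_lb = kg_to_lb(80) / kg_to_lb(40)

print("temperature ratio:", ratio_celsius, "in C vs", ratio_fahrenheit, "in F")
print("weight ratio:", ratio_kg, "in kg vs", ratio_lb, "in lb")
```

The temperature ratio differs between the two units, so "20 °C is twice as warm as 10 °C" is not a meaningful statement; "80 kg is twice as heavy as 40 kg" is.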
Mean
Sum of the individual values divided by the number of values
Outliers strongly affect the mean
Describes the distribution in regard to the location of the center
Median
Order the values and take the middle value; with an even number of values, take the average of the 2 middle values
Outliers have little effect on the median of a given set of data.
50th percentile, 2nd quartile: median
Describes the distribution in regard to the location of the center
Mode
Value that occurs most frequently
Outliers have little effect on the mode of a given set of data.
Describes the distribution in regard to the location of the center
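To see the three measures of center side by side, here is a minimal sketch using Python's standard `statistics` module. The data are invented; the only point is to compare the results with and without one extreme value.

```python
import statistics

values = [2, 3, 3, 4, 5]          # made-up data
with_outlier = values + [100]     # same data plus one extreme value

for label, data in (("without outlier", values), ("with outlier", with_outlier)):
    print(label,
          "| mean:", statistics.mean(data),
          "| median:", statistics.median(data),
          "| mode:", statistics.mode(data))
```

The mean jumps from 3.4 to 19.5 once the outlier is added, while the median only moves from 3 to 3.5 and the mode stays at 3.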
Variance
s^2
sum of (individual value - mean)^2 / (number of values - 1)
very sensitive to outliers
Describes the distribution in regard to the spread
Standard Deviation
SD or s
square root of variance
sensitive to outliers
Describes the distribution in regard to the spread
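The variance and standard deviation formulas above can be worked through step by step. This is a sketch on invented data; the hand computation is checked against `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data
n = len(values)
mean = sum(values) / n

# Sum of squared deviations from the mean, divided by (n - 1)
squared_deviations = [(x - mean) ** 2 for x in values]
variance = sum(squared_deviations) / (n - 1)

# Standard deviation is the square root of the variance
sd = math.sqrt(variance)

print("variance by hand:", variance, "library:", statistics.variance(values))
print("SD by hand:", sd, "library:", statistics.stdev(values))
```

Because the deviations are squared, a single far-away value contributes disproportionately to the sum, which is why the variance is so sensitive to outliers.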
Interquartile Range
IQR
Distance between the 25th percentile (1st quartile) and the 75th percentile (3rd quartile)
Percentile: the percentage of the data that is smaller than or equal to a value
50th percentile, 2nd quartile: median
Describes the distribution in regard to the spread
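A sketch of quartiles, IQR, and the percentile definition on invented data. Note the assumption: `statistics.quantiles` uses one particular interpolation convention (its default "exclusive" method), and other tools may return slightly different quartiles for the same data. The helper `percent_at_or_below` is a hypothetical name introduced here.

```python
import statistics

values = [1, 3, 4, 6, 7, 8, 10, 12, 15, 20, 22]   # made-up data

# n=4 cuts the data into quarters and returns the three cut points
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

def percent_at_or_below(data, x):
    """Percentage of observations that are smaller than or equal to x."""
    return 100 * sum(1 for v in data if v <= x) / len(data)

print("Q1:", q1, "Q2 (median):", q2, "Q3:", q3, "IQR:", iqr)
print("percent of data at or below 8:", percent_at_or_below(values, 8))
```

The second quartile matches `statistics.median(values)`, which is the "50th percentile = 2nd quartile = median" identity from the card.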
Normal distribution
mean=median=mode
About 68% of values are between “mean-1SD” and “mean+1SD”
About 95% of values are between “mean-2SD” and “mean+2SD”
This gives the 2.5th and 97.5th percentiles: take the mean and standard deviation, then compute “mean-2SD” and “mean+2SD”.
Symmetric: the median is the same distance from the 1st and the 3rd quartile
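The 68% and 95% rules can be checked numerically with `statistics.NormalDist` from the standard library. The mean of 100 and SD of 15 are arbitrary values chosen for illustration.

```python
from statistics import NormalDist

mean, sd = 100, 15                     # hypothetical mean and SD
dist = NormalDist(mu=mean, sigma=sd)

# Share of values within 1 SD and within 2 SD of the mean
within_1sd = dist.cdf(mean + sd) - dist.cdf(mean - sd)
within_2sd = dist.cdf(mean + 2 * sd) - dist.cdf(mean - 2 * sd)

# 2.5th and 97.5th percentiles: rule of thumb vs. exact inverse CDF
rule_low, rule_high = mean - 2 * sd, mean + 2 * sd
exact_low, exact_high = dist.inv_cdf(0.025), dist.inv_cdf(0.975)

print("within 1 SD:", round(within_1sd, 4))
print("within 2 SD:", round(within_2sd, 4))
print("rule of thumb:", rule_low, rule_high)
print("exact:", round(exact_low, 1), round(exact_high, 1))
```

The exact percentiles sit at about 1.96 SD from the mean, so "2 SD" is a convenient rounding rather than an exact figure.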
Box plot
Bottom of box: 1st quartile, 25%
Top of box: 3rd quartile, 75%
Line in box: median
Vertical lines (whiskers) above and below the box: maximum and minimum
Skewed
Left or right skewed means that the longer tail of the distribution points to the left or to the right
In a box plot, the vertical line under the box is longer if the data are left skewed: the tail stretches toward small values
The opposite holds for right skewed data: the line above the box is longer
Skewness and symmetry describe the shape of the distribution
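The link between skew and the box plot can be sketched by comparing the lengths of the two vertical lines, taken here (as on the box plot card) to run to the minimum and maximum. The data and the helper name `whisker_lengths` are invented, and the quartiles follow the default convention of `statistics.quantiles`.

```python
import statistics

def whisker_lengths(data):
    """Return (length of line below box, length of line above box)."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return q1 - min(data), max(data) - q3

right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 15]   # made-up data, long tail to the right
left_skewed = [16 - x for x in right_skewed]         # mirror image, long tail to the left

print("right skewed (below, above):", whisker_lengths(right_skewed))
print("left skewed  (below, above):", whisker_lengths(left_skewed))
```

For the right-skewed sample the line above the box is the longer one; mirroring the data swaps the two lengths.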
Correlation coefficient
r
increasing or decreasing trend: how strong is the association
Measures the degree of the linear association between 2 numerical values
-1: decreasing linear relationship, +1: increasing linear relationship, 0: no linear relationship
Values in between describe the strength of the relationship: there is no exact threshold
The slope does not matter! The coefficient measures how consistently the points increase or decrease together, not how steeply
Look at the plot first: there could be a relationship, but not a linear one
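A minimal sketch of the correlation coefficient computed by hand (Pearson's r), on invented data. It illustrates two points from the card: the slope does not change r, and a clear but non-linear relationship can still give r = 0.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(spread_x * spread_y)

xs = [-3, -2, -1, 0, 1, 2, 3]              # made-up data
steep = [10 * x for x in xs]               # perfectly linear, steep slope
shallow = [0.1 * x for x in xs]            # perfectly linear, shallow slope
decreasing = [-2 * x + 5 for x in xs]      # perfectly linear, decreasing
parabola = [x ** 2 for x in xs]            # strong relationship, but not linear

print("steep:", pearson_r(xs, steep))
print("shallow:", pearson_r(xs, shallow))
print("decreasing:", pearson_r(xs, decreasing))
print("parabola:", pearson_r(xs, parabola))
```

The steep and shallow lines both give r close to +1, the decreasing line gives r close to -1, and the parabola gives r = 0 even though y is fully determined by x. That is why the plot comes first.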
Inferential Statistics
Hypothesis Testing
Randomly select samples.
The larger the sample, the smaller the role of coincidence/chance
Does … lead to a change
Study Design:
Before/after, with/without, etc. -> paired t-test: 2 measurements of the same patient
H0: assumption of no difference
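The p-value is an area under the curve: the tail area of the test statistic's distribution under H0, beyond the observed value
Significance level: 0.05
If p <= 0.05 -> reject H0: there is a statistically significant difference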
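The steps of a paired t-test can be sketched with the standard library only. Everything here is illustrative: the before/after measurements are invented, the function names are hypothetical, and the p-value is obtained by numerically integrating the t density (Simpson's rule) to make the "area under the curve" idea explicit. In practice a statistics package would do this for you.

```python
import math
import statistics

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=10_000):
    """Area under the t curve beyond +|t_stat| and -|t_stat| (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3            # Simpson's rule on [0, |t|]
    return max(0.0, 1 - 2 * area_0_to_t)   # what is left in both tails

def paired_t_test(before, after):
    """Return (t statistic, degrees of freedom, two-sided p-value)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    t_stat = statistics.mean(diffs) / standard_error
    return t_stat, n - 1, two_sided_p(t_stat, n - 1)

# Hypothetical measurements on the same 8 patients
before = [140, 152, 138, 145, 160, 155, 148, 150]
after = [135, 150, 136, 140, 152, 151, 147, 144]

t_stat, df, p = paired_t_test(before, after)
print("t =", round(t_stat, 3), "df =", df, "p =", round(p, 4))
print("reject H0" if p <= 0.05 else "do not reject H0")
```

The test works on the per-patient differences, so H0 ("no difference") becomes "the mean of the differences is 0". For these made-up numbers p falls below 0.05, so H0 would be rejected.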
P-value
Probability, assuming H0 is true, of obtaining a mean difference at least as extreme as the one observed