Descriptive statistics Flashcards
measures: mean, trimmed mean, median, quantile, IQR, mode, SD,
SEM, CV (with formulas)
Mean, median, and mode are measures of central tendency
mean: sensitive to outliers
trimmed mean: removes 10% of the values at each end, mean(v1, trim=0.1); less sensitive to outliers
median: not sensitive to outliers, a robust measure (exactly the value in the middle of the sorted data)
Quartile: division of the data into quarters (the upper and lower quartiles form the box in a boxplot)
Mode: which value occurs most often? (no dedicated function in R); the peak of the distribution lies there
SD: square root of the variance
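A minimal R sketch of the measures above. The vector v1 is made up for illustration and contains one outlier. Because R has no dedicated function for the mode, the helper stat_mode is a hypothetical name built on table; with ties it returns only the first most frequent value.

```r
# Illustrative data with one obvious outlier (100)
v1 <- c(2, 3, 3, 4, 5, 5, 5, 6, 7, 100)

mean(v1)               # pulled upwards by the outlier
mean(v1, trim = 0.1)   # drops 10% of the values at each end first
median(v1)             # middle of the sorted data, robust
quantile(v1)           # minimum, lower quartile, median, upper quartile, maximum
sd(v1)                 # square root of var(v1)

# Hypothetical helper: most frequent value, found via a frequency table
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}
stat_mode(v1)
```

Here the mean is 14, while the trimmed mean (4.75) and the median (5) stay close to the bulk of the data.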
normally distributed data:
* about 2/3 (68%) of the data lie within 1 SD of the mean
* about 95% of the data lie within 2 SD of the mean
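These two rules of thumb can be checked against the normal c.d.f. in R. The helper name within_sd is only illustrative.

```r
# Share of a normal distribution lying within k SD of the mean
within_sd <- function(k) pnorm(k) - pnorm(-k)

within_sd(1)   # about 0.68, roughly 2/3
within_sd(2)   # about 0.95
```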
coefficient of variation (CV): CV% = 100*sd(x)/mean(x)
–> used to compare the variability of data with different magnitudes
• standard error of the mean
- SEM = sd(x)/sqrt(N)
- how close are we to the true population mean
- more measurements –> closer
IQR: interquartile range (upper quartile minus lower quartile)
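A short sketch of CV, SEM, and IQR in R. The measurements in x are made up, and the two simulated samples only illustrate that the SEM shrinks as the number of measurements grows.

```r
# Made-up measurements
x <- c(4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 4.9)

cv_percent <- 100 * sd(x) / mean(x)   # coefficient of variation in percent
sem <- sd(x) / sqrt(length(x))        # standard error of the mean
iqr <- IQR(x)                         # upper quartile minus lower quartile

# More measurements give a smaller SEM (simulated data, same population)
set.seed(1)
small <- rnorm(10, mean = 5, sd = 1)
large <- rnorm(1000, mean = 5, sd = 1)
sem_small <- sd(small) / sqrt(length(small))
sem_large <- sd(large) / sqrt(length(large))
c(sem_small = sem_small, sem_large = sem_large)
```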
How to report results in general
- problem
- test name
- sample size
- test statistic
- P value
- confidence interval
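As one possible illustration of this checklist, most R test functions return an object from which the items can be read off. The sketch below assumes a one-sample t-test on made-up data; the test and the reference value mu = 5 are chosen only for the example.

```r
# Made-up sample; the problem: does the mean differ from 5?
x <- c(5.1, 4.8, 5.6, 5.3, 4.9, 5.7, 5.2, 5.5)
res <- t.test(x, mu = 5)

res$method      # test name
length(x)       # sample size
res$statistic   # test statistic (t)
res$parameter   # degrees of freedom
res$p.value     # P value
res$conf.int    # 95% confidence interval for the mean
```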
Contingency tables
• tables with count items
• each count is a number of cases of a certain level or sharing a given combination of levels
• normally used on factors/categorical data
• can also be used on continuous data with cut; use a "good" break –> hint: use quantiles for cutting
frequency: number of times a category is counted; the frequencies sum to the sample size: sum_c n_c = n
relative frequency: sample proportion for each possible category (the proportions sum to 1: sum_c p_c = 1); p_category1 = n1/n
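A sketch of cutting continuous data at its quantiles and tabulating the result. The data are simulated, and include.lowest = TRUE is needed so that the minimum falls into the first interval.

```r
set.seed(42)
x <- rnorm(200, mean = 50, sd = 10)    # simulated continuous data

breaks <- quantile(x)                  # 0%, 25%, 50%, 75%, 100%
groups <- cut(x, breaks = breaks, include.lowest = TRUE)

freq <- table(groups)                  # frequency n_c per category
rel_freq <- freq / length(x)           # relative frequency p_c = n_c / n

sum(freq)       # equals n
sum(rel_freq)   # equals 1
```

Cutting at quantiles gives categories with roughly equal counts, which avoids nearly empty cells.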
Tabulating Categorical Data, ftable
- table function for tabulating variables (most readable for one or two variables)
- ftable function for a flat, readable display of tables with more than two variables
- dim for exploring dimensions of a table
- sum to count the number of all items
1D, 2D, 3D table
access to multidimensional tables works as with matrices and data frames, using square brackets and n - 1 commas, where n is the number of dimensions.
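The functions above can be tried on data sets that ship with R. This sketch assumes the built-in mtcars data frame and the built-in 3D table HairEyeColor (hair colour x eye colour x sex).

```r
# 1D and 2D tables from the built-in mtcars data
t1 <- table(mtcars$cyl)
t2 <- table(cyl = mtcars$cyl, gear = mtcars$gear)
dim(t2)   # number of levels per dimension
sum(t2)   # number of all items (cars)

# 3D table shipped with R
t3 <- HairEyeColor
dim(t3)
ftable(t3)   # flat display of the 3D table

# Access with square brackets: n = 3 dimensions, so n - 1 = 2 commas
t3["Black", "Brown", "Male"]   # a single cell
t3[, , "Female"]               # a 2D slice
t2["4", ]                      # 2D table: one comma, here one row
```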
Independence table
• expected number of observations if there were no dependencies between the variables
• Expected= (Rowtotal * Columntotal )/ Total
Pearson residuals
–> a normalized measure of the distance between observed and expected counts: (Observed - Expected) / sqrt(Expected)
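A sketch of the independence table and the Pearson residuals computed by hand and compared with what chisq.test returns. It assumes the built-in HairEyeColor table, collapsed to hair x eye colour.

```r
# Observed 2D table: hair x eye colour, summed over sex
observed <- margin.table(HairEyeColor, c(1, 2))

# Expected = (row total * column total) / total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson <- (observed - expected) / sqrt(expected)

# chisq.test returns the same two tables
test <- chisq.test(observed)
all.equal(as.vector(expected), as.vector(test$expected))
all.equal(as.vector(pearson), as.vector(test$residuals))
```

Positive residuals mark cells with more observations than expected under independence, negative residuals mark cells with fewer.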
prop.table
Express Table Entries As Fraction Of Marginal Table
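A short sketch of prop.table on a 2D table built from the built-in mtcars data. The second argument selects the margin: omitted for the grand total, 1 for rows, 2 for columns.

```r
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)

p_all <- prop.table(tab)      # fractions of the grand total
p_row <- prop.table(tab, 1)   # each row sums to 1
p_col <- prop.table(tab, 2)   # each column sums to 1

rowSums(p_row)
colSums(p_col)
```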
Graphics 1D
pie, barplot, dotchart
Graphics 2D
assocplot, mosaicplot, fourfoldplot
• exploring the relationship between two variables
• mosaicplot –> absolute numbers visualized
• assocplot –> residuals shown (shows whether there are more or fewer observations than expected, and how large the deviation is)
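The 1D and 2D plot functions listed above can be tried on the built-in tables HairEyeColor and Titanic. This is only a sketch; the choice of tables and margins is an assumption for the example. Note that fourfoldplot expects a 2 x 2 table (optionally with a third dimension for strata).

```r
hair <- margin.table(HairEyeColor, 1)             # 1D: hair colour counts
hair_eye <- margin.table(HairEyeColor, c(1, 2))   # 2D: hair x eye colour

# Graphics 1D
pie(hair)
barplot(hair)
dotchart(as.numeric(hair), labels = names(hair))

# Graphics 2D
mosaicplot(hair_eye, main = "Hair x eye colour")  # tile area reflects counts
assocplot(hair_eye)                               # deviations from independence

# 2 x 2 table for fourfoldplot: sex x survival on the Titanic
sex_survived <- margin.table(Titanic, c(2, 4))
fourfoldplot(sex_survived)
```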
Probability functions in R (prefix + distribution name)
- r: random number generator
- p: probability function (cumulative distribution function, c.d.f.)
- d: density function (point probability for discrete distributions)
- q: quantile function (inverse c.d.f.)
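A sketch of the four prefixes, using the normal distribution (norm) and, for a point probability, the binomial distribution (binom). The parameter values are arbitrary.

```r
set.seed(123)
rnorm(5, mean = 0, sd = 1)         # r: five random draws

pnorm(1.96)                        # p: P(X <= 1.96), about 0.975
dnorm(0)                           # d: height of the density at 0
qnorm(0.975)                       # q: value below which 97.5% lie, about 1.96

dbinom(3, size = 10, prob = 0.5)   # d for discrete data: P(X = 3)

# q is the inverse of p
qnorm(pnorm(1.5))
```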
Poisson Distribution
- the binomial distribution has an upper limit
- if we throw 50 times, the maximum achievable value is 50
- counts without a theoretical upper limit (in space or time) often follow a Poisson distribution
- lower limit is zero, but no upper limit
- parameter: the rate of occurrence within a certain time or space
- examples: counts of cells in a grid, number of doctor visits per patient ..
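The difference in upper limits can be seen directly in R. In this sketch the rate lambda = 3 and the success probability 0.5 are assumed values chosen for illustration.

```r
# Binomial: 50 throws, so more than 50 successes is impossible
dbinom(51, size = 50, prob = 0.5)        # exactly 0
sum(dbinom(0:50, size = 50, prob = 0.5)) # all probability lies in 0..50

# Poisson: no upper limit, large counts are unlikely but possible
lambda <- 3                              # assumed rate per unit of time or space
dpois(0:10, lambda)                      # point probabilities for 0..10 events
dpois(60, lambda) > 0                    # tiny, but not zero

# Simulated counts: never negative, mean close to lambda
set.seed(7)
counts <- rpois(1000, lambda)
min(counts)
mean(counts)
```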
Chisq Distribution
distribution of the chi^2 statistic for tables without dependencies between the variables
df = (number of levels of var1 - 1) * (number of levels of var2 - 1)
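A sketch of the degrees-of-freedom formula and its use with the chi^2 distribution functions, assuming the built-in HairEyeColor table collapsed to hair x eye colour (4 x 4 levels).

```r
tab <- margin.table(HairEyeColor, c(1, 2))   # 4 hair levels x 4 eye levels

# df = (levels of var1 - 1) * (levels of var2 - 1)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)
df

test <- chisq.test(tab)
test$parameter                 # df as reported by the test

qchisq(0.95, df)               # critical value at the 5% level
p_manual <- pchisq(test$statistic, df, lower.tail = FALSE)
p_manual                       # same as test$p.value
```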
Distributions: numerical vs categorical data
• Numerical data
- Uniform
- Normal
- T
- (Wilcox)
• Categorical data
- Bernoulli - 1 trial, upper limit
- Binomial - any number of trials, upper limit
- Poisson - no upper limit
- Chisq - two variables