Descriptive statistics Flashcards

Question 1

Q

Measures: mean, trimmed mean, median, quantiles, IQR, mode, SD, SEM, CV (formulas)

Answer

A

Mean, median and mode are measures of central tendency.
Mean: sensitive to outliers

Trimmed mean: remove 10% of the sorted values at each end before averaging, e.g. mean(v1, trim = 0.1); less sensitive to outliers

Median: exactly the value in the middle of the sorted data; not sensitive to outliers, i.e. a robust measure

Quartiles: split the data into quarters (in a boxplot, the lower and upper quartiles form the box); quantile(x) returns them

Mode: which value occurs most often? (R has no dedicated function for it); it is where the peak of the distribution lies

SD: square root of the variance
for normally distributed data:
* about 2/3 (68%) of the data lie within 1 SD of the mean
* about 95% of the data lie within 2 SD of the mean

coefficient of variation (CV): CV% = 100*sd(x)/mean(x)
–> used to compare the variability of variables with different magnitudes

• standard error of the mean (SEM)

SEM = sd(x)/sqrt(N)
how close the sample mean is likely to be to the true population mean
more measurements –> smaller SEM, i.e. closer to the population mean

IQR: interquartile range, upper quartile minus lower quartile (75% minus 25% quantile), IQR(x)
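A minimal R sketch of these measures. The vector x is made-up example data with one outlier (100), chosen so that the difference between mean, trimmed mean and median is visible; the mode lookup via table() is one possible workaround, not a base R function.

```r
# Hypothetical example data: nine ordinary values and one outlier (100)
x <- c(2, 3, 3, 4, 5, 5, 5, 6, 7, 100)

m      <- mean(x)              # 14, pulled up by the outlier
m_trim <- mean(x, trim = 0.1)  # 4.75, drops 1 value (10%) at each end
med    <- median(x)            # 5, robust to the outlier
qs     <- quantile(x)          # 0%, 25%, 50%, 75% and 100% quantiles
iqr    <- IQR(x)               # 2.5 = 75% quantile - 25% quantile
s      <- sd(x)                # square root of var(x)

# Mode: base R has no statistical mode function (mode() returns the
# storage mode, e.g. "numeric"), so take the most frequent table entry
counts   <- table(x)
mode_val <- as.numeric(names(counts)[which.max(counts)])  # 5

cv  <- 100 * s / mean(x)       # coefficient of variation in %
sem <- s / sqrt(length(x))     # standard error of the mean
```

The outlier moves the mean to 14 while the trimmed mean and the median stay near the bulk of the data.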

Question 2

Q

How to report results in general

Answer

A

problem
test name
sample size
test statistic
P value
confidence interval
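In R, most items of this checklist can be read from the result object of a classical test (class "htest"): method, statistic, parameter (df), p.value and conf.int. A sketch with made-up measurements a and b; the report format string is only one possible layout.

```r
# Hypothetical measurements for two groups (made-up values)
a <- c(5.1, 4.8, 5.6, 5.0, 5.3, 4.9)
b <- c(5.9, 6.1, 5.4, 6.3, 5.8, 6.0)

res <- t.test(a, b)  # Welch two-sample t-test, R's default for two vectors

# test name (method), test statistic (statistic, parameter = df),
# P value (p.value) and confidence interval (conf.int)
report <- sprintf(
  "%s: n = %d vs %d, t = %.2f, df = %.1f, p = %.3g, 95%% CI [%.2f, %.2f]",
  res$method, length(a), length(b),
  res$statistic, res$parameter, res$p.value,
  res$conf.int[1], res$conf.int[2]
)
cat(report, "\n")
```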

Question 3

Q

Contingency tables

Answer

A

• tables with counts of items
• each count is the number of cases that have a certain level or share a given combination of levels
• normally used on factors/categorical data
• can also be used on continuous data after binning with cut(); choose sensible breaks –> hint: use quantiles for cutting
frequency: number of times a category is counted; the frequencies add up to the sample size, $\sum_c n_c = n$
relative frequency: sample proportion of each possible category, e.g. $p_1 = n_1 / n$; the proportions add up to 1, $\sum_c p_c = 1$
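A sketch of cutting continuous data at its quantiles and tabulating frequencies and relative frequencies. It uses mpg from R's built-in mtcars data set as an example variable.

```r
# Continuous data from the built-in mtcars data set (miles per gallon)
mpg <- mtcars$mpg

# Cut into four classes at the quartiles; include.lowest = TRUE keeps
# the minimum value, which would otherwise fall outside the first interval
breaks  <- quantile(mpg, probs = seq(0, 1, 0.25))
classes <- cut(mpg, breaks = breaks, include.lowest = TRUE)

freq <- table(classes)    # frequencies n_c
rel  <- freq / sum(freq)  # relative frequencies p_c = n_c / n
freq
round(rel, 2)
```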

Question 4

Q

Tabulating Categorical Data, ftable

Answer

A

table function for tabulating one or two variables
ftable function for tabulating more than two variables
dim for exploring dimensions of a table
sum to count the number of all items
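A short sketch of these four functions on R's built-in mtcars data (cylinders, gears and transmission type am, where 0 = automatic and 1 = manual).

```r
t1 <- table(mtcars$cyl)                   # one variable
t2 <- with(mtcars, table(cyl, gear))      # two variables
t3 <- with(mtcars, table(cyl, gear, am))  # three variables

ftable(t3)  # flat, readable layout of the 3-way table
dim(t2)     # 3 3 (cyl levels x gear levels)
dim(t3)     # 3 3 2
sum(t2)     # 32, the total number of cars
```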

Question 5

Q

1D, 2D, 3D table

Answer

A

Multidimensional tables are accessed like matrices and data frames: square brackets with n - 1 commas, where n is the number of dimensions (1D: t[i], 2D: t[i, j], 3D: t[i, j, k]).
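An indexing sketch on tables built from R's built-in mtcars data; levels are selected by name (the table's dimnames).

```r
t1 <- table(mtcars$cyl)                   # 1D: no comma
t2 <- with(mtcars, table(cyl, gear))      # 2D: one comma
t3 <- with(mtcars, table(cyl, gear, am))  # 3D: two commas

t1["8"]            # 14 cars with 8 cylinders
t2["4", "3"]       # 1 car with 4 cylinders and 3 gears
t2["8", ]          # whole row for 8 cylinders: 12 0 2
t3["4", "3", "0"]  # 4 cylinders, 3 gears, automatic: 1
t3[, , "1"]        # 2D slice: manual cars only
```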

Question 6

Q

Independence table

Answer

A

• expected number of observations in each cell if there were no
dependency between the variables
• Expected = (Row total * Column total) / Total
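The formula can be applied to all cells at once with outer(). A sketch on a 2x2 table from R's built-in mtcars data; chisq.test() stores the same table in its $expected component.

```r
tab <- with(mtcars, table(am, vs))  # 2x2 table from the built-in mtcars data

# Expected = (row total * column total) / total, for every cell
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
expected

# chisq.test() computes the same expected table internally
ct <- chisq.test(tab)
ct$expected
```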

Question 7

Q

Pearson residuals

Answer

A

–> A standardized measure of the distance between observed and expected counts, per cell: (observed - expected) / sqrt(expected).
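A sketch computing Pearson residuals by hand and comparing them with the $residuals component of chisq.test(), on a table from R's built-in mtcars data. The warning about small expected counts is suppressed only to keep the demo quiet.

```r
tab <- with(mtcars, table(cyl, gear))  # 3x3 table, built-in mtcars data
E   <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson residual per cell: (observed - expected) / sqrt(expected)
res_manual <- (tab - E) / sqrt(E)

# chisq.test() reports the same values (it warns here about small expected counts)
ct <- suppressWarnings(chisq.test(tab))
round(ct$residuals, 2)

# The squared residuals add up to the chi^2 statistic
sum(ct$residuals^2)
```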

Question 8

Q

prop.table

Answer

A

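Expresses table entries as fractions of a marginal table: of the grand total by default, of the row totals with margin = 1, of the column totals with margin = 2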
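A sketch of the three variants on a table from R's built-in mtcars data.

```r
tab <- with(mtcars, table(cyl, gear))  # built-in mtcars data

prop.table(tab)              # fractions of the grand total
prop.table(tab, margin = 1)  # fractions of each row total (rows sum to 1)
prop.table(tab, margin = 2)  # fractions of each column total (columns sum to 1)
```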

Question 9

Q

Graphics 1D

Answer

A

pie, barplot, dotchart
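A sketch of the three plot types for one categorical variable, using cylinder counts from R's built-in mtcars data. For dotchart() the table is passed as a plain numeric vector with explicit labels.

```r
counts <- table(mtcars$cyl)  # one categorical variable, built-in mtcars data

# barplot() invisibly returns the bar midpoints
mids <- barplot(counts, xlab = "cylinders", ylab = "number of cars")
pie(counts)
dotchart(as.numeric(counts), labels = names(counts), xlab = "number of cars")
```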

Question 10

Q

Graphics 2D

Answer

A

assocplot, mosaicplot, fourfoldplot
• explore the relationship between two variables
• mosaicplot –> absolute numbers visualized (tile areas proportional to the counts)
• assocplot –> residuals shown (whether a cell has more or fewer observations than expected, and by how much)
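A sketch of the three 2D plots on tables from R's built-in mtcars data; fourfoldplot() needs a 2x2 (or 2x2xk) table, so a second table is used for it.

```r
tab  <- with(mtcars, table(cyl, gear))  # 3x3 table, built-in mtcars data
tab2 <- with(mtcars, table(am, vs))     # 2x2 table

mosaicplot(tab, main = "cyl vs gear")   # tile areas proportional to counts
assocplot(tab)                          # bars above the line: more than expected, below: fewer
fourfoldplot(tab2)                      # 2x2 tables only
```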

Question 11

Q

generate probability functions

Answer

A

r: random number generator
p: probability function (cumulative distribution function, c.d.f.)
d: density function (point probability for discrete distributions)
q: quantile function (inverse c.d.f)
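The prefixes combined with a distribution name, sketched for the normal and binomial distributions. The last two lines also check the 68%/95% rule for normal data.

```r
set.seed(1)                  # make the random draws reproducible

rnorm(5, mean = 0, sd = 1)   # r: five random values from N(0, 1)
dnorm(0)                     # d: density at x = 0 (about 0.399)
pnorm(1.96)                  # p: P(X <= 1.96), about 0.975
qnorm(0.975)                 # q: inverse of p, about 1.96

dbinom(3, size = 10, prob = 0.5)  # d for a discrete distribution: P(X = 3)

# Share of normal data within 1 and 2 SD of the mean
pnorm(1) - pnorm(-1)         # about 0.683
pnorm(2) - pnorm(-2)         # about 0.954
```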

Question 13

Q

Poisson Distribution

Answer

A

the binomial distribution has an upper limit:
if we throw 50 times, the maximum achievable value is 50
counts without a theoretical upper limit (in space or time) often follow a Poisson distribution
lower limit is zero, but no upper limit
its parameter (lambda) is the rate of occurrence within a certain time or space
examples: counting cells in a grid, number of doctor visits per patient, ...
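A sketch contrasting the open upper end of the Poisson distribution with the fixed limit of the binomial. The rate lambda = 3 is a hypothetical value; the simulated counts also illustrate that mean and variance of a Poisson distribution both equal lambda.

```r
lambda <- 3  # hypothetical rate, e.g. 3 cells per grid square on average

dpois(0:5, lambda)                       # P(X = 0), ..., P(X = 5)
ppois(5, lambda)                         # P(X <= 5)
ppois(10, lambda, lower.tail = FALSE)    # P(X > 10): small, but never 0 (no upper limit)

# The binomial distribution stops at its number of trials
dbinom(51, size = 50, prob = 0.5)        # 0: more than 50 successes in 50 throws is impossible

set.seed(42)
counts <- rpois(1000, lambda)            # simulated counts
c(mean = mean(counts), var = var(counts))  # both close to lambda
```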

Question 14

Q

Chisq Distribution

Answer

A

distribution of the chi^2 statistic for tables whose variables are independent (no dependency between them)
df = (number of levels var1 - 1) * (number of levels var2 - 1)
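A sketch of the df formula and of looking up the chi^2 distribution in R instead of printed tables. df_chisq is a hypothetical helper written only for this example.

```r
# Hypothetical helper: df for an r x c contingency table, (r - 1) * (c - 1)
df_chisq <- function(n_levels1, n_levels2) (n_levels1 - 1) * (n_levels2 - 1)

df_chisq(2, 2)        # 1 for a 2x2 table
df_chisq(3, 2)        # 2 for a 3x2 table

qchisq(0.95, df = 1)  # critical value at alpha = 0.05, about 3.84
pchisq(3.84, df = 1, lower.tail = FALSE)  # upper-tail probability, about 0.05
```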

Question 15

Q

Distributions: numerical vs categorical data

Answer

A

• Numerical data
- Uniform
- Normal
- T
- (Wilcoxon)
• Categorical data
- Bernoulli - 1 trial, upper limit
- Binomial - n trials, upper limit n
- Poisson - no upper limit
- Chisq - two variables

Question 16

Q

chisq.test

Answer

A

• same p-value as prop.test for 2x2 tables
• but no CI computed
• can be used for more than two levels (e.g. 3x2 tables, where one of the two variables has 3 levels)
• the computed chi^2 is compared with the chi^2 distribution (formerly: tabulated values) –> p.value
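A sketch checking the first three points on hypothetical counts: for a 2x2 table both tests apply the Yates continuity correction by default, so the p-values agree, and only prop.test() returns a confidence interval.

```r
# Hypothetical 2x2 table: rows = groups, columns = outcome (successes first)
tab <- matrix(c(15, 35,
                25, 25), nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"), outcome = c("yes", "no")))

ct <- chisq.test(tab)  # Yates continuity correction by default for 2x2
pt <- prop.test(tab)   # first column taken as successes
c(chisq = ct$p.value, prop = pt$p.value)  # identical p-values
pt$conf.int            # only prop.test gives a CI

# Hypothetical 3x2 table: df = (3 - 1) * (2 - 1) = 2
tab32 <- matrix(c(20, 30,
                  25, 25,
                  30, 20), nrow = 3, byrow = TRUE)
ct32 <- chisq.test(tab32)
ct32$parameter         # df = 2
```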

Question 17

Q

prop.test

Answer

A

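tests whether the proportions (of successes) are equal in two or more groups; for two groups it also gives a confidence interval for the difference in proportions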
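A sketch of the counts-and-totals form of prop.test() with hypothetical data (15 of 50 vs 25 of 50 successes).

```r
# Hypothetical data: 15 of 50 respond in group A, 25 of 50 in group B
res <- prop.test(x = c(15, 25), n = c(50, 50))

res$estimate  # sample proportions: 0.3 and 0.5
res$p.value   # H0: both groups have the same proportion
res$conf.int  # 95% CI for the difference in proportions (A - B)
```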

Question 18

Q

fisher.test

Answer

A

exact test based on all tables with the same margins (permutations) –> slower than prop.test

* should be used instead if an expected value in a 2x2 table is small (rule of thumb: below 5)
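A sketch on a hypothetical small 2x2 table where one expected count is 4. Note that the odds ratio reported by fisher.test() is a conditional maximum likelihood estimate, not the simple sample odds ratio.

```r
# Hypothetical small 2x2 table: expected count for cell [1, 1] is 4
tab <- matrix(c(1, 9,
                7, 3), nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"), outcome = c("yes", "no")))

ft <- fisher.test(tab)
ft$p.value
ft$estimate  # conditional maximum likelihood estimate of the odds ratio
ft$conf.int  # CI for the odds ratio

# chisq.test(tab) would warn that the chi^2 approximation may be incorrect
```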

Question 19

Q

Odds/Odds ratios

Answer

A

Odds:
• event did occur / event did not occur
• ranges from 0 to Inf
• probability of 0.5 == odds of 1.0
• probability of 1/3 (0.33) == odds of 0.5
• probability of 0.75 == odds of 3
Formula: odds= probability/(1-probability)

Odds ratios
• again computed from a 2x2 contingency table
• odds1 / odds2 = odds ratio
• 0.19 (AZT) / 0.39 (Placebo) = 0.49
OR= O1/O2
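A sketch of both formulas. The 2x2 counts are hypothetical (the raw counts behind the AZT example are not given here).

```r
# Odds from a probability: odds = p / (1 - p)
odds <- function(p) p / (1 - p)

odds(0.5)   # 1
odds(1/3)   # 0.5
odds(0.75)  # 3

# Odds ratio from a hypothetical 2x2 table (rows = groups, columns = event yes/no)
tab <- matrix(c(10, 40,
                20, 30), nrow = 2, byrow = TRUE,
              dimnames = list(group = c("treated", "control"), event = c("yes", "no")))

odds1 <- tab["treated", "yes"] / tab["treated", "no"]  # 0.25
odds2 <- tab["control", "yes"] / tab["control", "no"]  # about 0.667
or    <- odds1 / odds2                                 # 0.375
```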

Question 20

Q

Effect sizes

Answer

A

Cohen's w
Cohen's h
Odds Ratio
Relative Risk
Number needed to treat (NNT)
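A sketch computing the proportion-based effect sizes for two hypothetical event proportions, using the standard definitions: Cohen's h as the difference of arcsine-transformed proportions, RR = p1/p2, and NNT = 1/|p1 - p2|.

```r
# Hypothetical event proportions in two groups
p1 <- 0.2  # treated
p2 <- 0.4  # control

h   <- 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))  # Cohen's h
rr  <- p1 / p2                                  # relative risk: 0.5
or  <- (p1 / (1 - p1)) / (p2 / (1 - p2))        # odds ratio: 0.375
nnt <- 1 / abs(p1 - p2)                         # number needed to treat: 5
```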

Question 21

Q

Cohen's w

Answer

A

Cohen's w is the square root of the chi^2 value computed from proportions instead of counts:

$$ w = \sqrt{\sum_{i=1}^{n} \frac{(p_{o,i} - p_{e,i})^2}{p_{e,i}}} $$

It is also useful for larger contingency tables.
• $p_{o,i}$: observed proportion in cell i
• $p_{e,i}$: expected proportion in cell i
• n: number of cells
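A sketch of the formula on a hypothetical 2x2 table. Because proportions are counts divided by the total N, the result equals sqrt(chi^2 / N) for the count-based chi^2 without continuity correction, which the last lines check.

```r
# Hypothetical 2x2 table of counts
tab <- matrix(c(15, 35,
                25, 25), nrow = 2, byrow = TRUE)
N <- sum(tab)

p_o <- tab / N                                  # observed proportions per cell
p_e <- outer(rowSums(tab), colSums(tab)) / N^2  # expected proportions under independence

w <- sqrt(sum((p_o - p_e)^2 / p_e))             # Cohen's w

# Same value via the count-based chi^2 (no continuity correction)
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
sqrt(chi2 / N)
```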

Question 22

Q

α and Type I/II errors

Answer

A

α is a decision threshold, the significance level
mainly used in science: α = 0.05
but this is completely arbitrary (!)
lowering α –> fewer false positives, more false negatives
increasing α –> fewer false negatives but more false positives
rejecting H0 although H0 is true –> type I error (false positive)
not rejecting H0 although H0 is false –> type II error (false negative)
α sets the probability of getting a type I error
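A simulation sketch of the last point: when H0 is true, the share of p-values below α is close to α. The 2000 simulated experiments with two groups drawn from N(0, 1) are hypothetical.

```r
set.seed(123)

# 2000 hypothetical experiments in which H0 is true: both groups come from N(0, 1)
p_values <- replicate(2000, t.test(rnorm(10), rnorm(10), var.equal = TRUE)$p.value)

alpha <- 0.05
mean(p_values < alpha)  # share of false positives (type I errors), close to alpha

# A stricter alpha gives fewer false positives
mean(p_values < 0.01)
```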