R Year 2 Flashcards
t-test
t.test(vector, mu = number, alternative = "g")
mu
The hypothesised population mean that the sample mean is tested against
alternative =
"l" for "less"
"g" for "greater"
"t" for "two.sided" (the default)
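A minimal one-sample sketch. It assumes the built-in mtcars dataset as a stand-in for your own vector and uses 20 mpg as an arbitrary hypothesised mean. R partially matches the `alternative` argument, so "g" and "greater" run the same test.

```r
# One-sample t-test: is the mean mpg greater than 20?
# mtcars is a built-in dataset; mu = 20 is an arbitrary example value
res <- t.test(mtcars$mpg, mu = 20, alternative = "greater")
print(res)

# Abbreviated form: "g" is partially matched to "greater"
res_short <- t.test(mtcars$mpg, mu = 20, alternative = "g")
res_short$alternative
```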
Can a t-test be used?
Do a variance test
Variance test
var.test(height$female, height$male) runs an F test comparing the variances of two groups, to check whether a t-test that assumes equal variances can be used
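A sketch of the variance check feeding into the t-test. The `height` data is not available here, so this assumes mtcars mpg split by transmission type (`am`) as the two groups. The 0.05 cut-off is the usual convention, not something `var.test()` decides for you.

```r
# Two groups from the built-in mtcars data (am = 0 automatic, am = 1 manual),
# standing in for height$female and height$male
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# F test: H0 is that the two variances are equal
ft <- var.test(auto, manual)
print(ft)

# Equal variances not rejected -> pooled t-test (var.equal = TRUE);
# otherwise Welch's t-test, which is R's default
equal_var <- ft$p.value > 0.05
tt <- t.test(auto, manual, var.equal = equal_var)
print(tt)
```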
Is the data normally distributed?
Shapiro-Wilk test
Shapiro-Wilk test
shapiro.test(x) compares the data to a normal distribution (null hypothesis: the data are normal)
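A short sketch, assuming the built-in mtcars mpg column as example data. Reading p > 0.05 as "no evidence against normality" is the usual convention.

```r
# Shapiro-Wilk test; H0: the sample comes from a normal distribution
sw <- shapiro.test(mtcars$mpg)
print(sw)

# Conventional interpretation at the 5% level
if (sw$p.value > 0.05) {
  message("No evidence against normality")
} else {
  message("Data look non-normal")
}
```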
Equation for the chi-squared statistic
sum(((chi$observed-chi$expected)^2)/chi$expected)
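A worked example of the formula. The counts are made up for illustration, with equal expected counts across four categories. The hand calculation gives (49 + 9 + 25 + 225) / 25 = 12.32, which is then checked against `chisq.test()`.

```r
# Made-up counts for four categories, with equal expected counts
chi <- data.frame(
  observed = c(18, 22, 20, 40),
  expected = c(25, 25, 25, 25)
)

# Chi-squared statistic: sum of (O - E)^2 / E
stat <- sum(((chi$observed - chi$expected)^2) / chi$expected)
stat  # 12.32

# p-value from the chi-squared distribution with (categories - 1) df
p <- pchisq(stat, df = nrow(chi) - 1, lower.tail = FALSE)
p

# Cross-check with R's goodness-of-fit test
# (equal expected proportions are chisq.test's default)
chisq.test(chi$observed)
```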
multiple linear regression
model1 = lm(dv ~ iv1 + iv2 + iv3 + …, data = dataset)
Models the dependent variable (dv) as a linear function of all the independent variables (ivs) in the dataset.
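A concrete sketch, assuming the built-in mtcars data with mpg as the dv and weight, horsepower and quarter-mile time as the ivs. The variables are chosen only for illustration.

```r
# mpg (dv) modelled from wt, hp and qsec (ivs)
model1 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Coefficients, standard errors, t-values and p-values for each iv
summary(model1)

# Just the fitted coefficients: intercept plus one slope per iv
coef(model1)
```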
Remove rows containing missing (NA) values
na.omit(data)
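A small made-up data frame showing that `na.omit()` drops rows containing NA in any column while leaving genuine zeros alone.

```r
# Made-up data with missing values (NA) and a real zero
df <- data.frame(
  x = c(1, NA, 3, 0),
  y = c("a", "b", NA, "d")
)

# Any row with an NA in any column is dropped
clean <- na.omit(df)
print(clean)  # rows 1 and 4 remain; the 0 in row 4 is kept
```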
ANOVA
anova(model1), where model1 is a fitted model (e.g. from lm()), not the raw data
A number within the range means there is no significant difference
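A sketch assuming an example regression on the built-in mtcars data. It shows the per-term ANOVA table and a comparison of two nested models; the model choices are illustrative only.

```r
# anova() works on fitted models, so fit one first
model1 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Sequential ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F) per term
tab <- anova(model1)
print(tab)

# Nested comparison: do hp and qsec improve on wt alone?
model0 <- lm(mpg ~ wt, data = mtcars)
anova(model0, model1)
```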
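AIC (backward stepwise selection)
backward = step(model1, direction = "backward")
Drops terms one at a time while this lowers the AIC; lower AIC is better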
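A sketch of backward elimination, assuming a full model on the built-in mtcars data with four illustrative ivs. `trace = 0` only suppresses the step-by-step printout.

```r
# Full model with several candidate ivs
model1 <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Backward elimination by AIC
backward <- step(model1, direction = "backward", trace = 0)
formula(backward)

# AIC of the starting and the selected model (lower is better)
AIC(model1)
AIC(backward)
```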
Checking by going forward
forward = step(naive, scope = dv ~ iv1+iv2+iv3…, direction = "forward")
where naive is the simplest (intercept-only) model: naive = lm(dv ~ 1, data)
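A matching forward-selection sketch. It assumes the same mtcars variables as the backward example and starts from the intercept-only model.

```r
# Intercept-only ("naive") model: mpg predicted by its mean alone
naive <- lm(mpg ~ 1, data = mtcars)

# Forward selection: add terms from the scope while doing so lowers the AIC
forward <- step(naive,
                scope = mpg ~ wt + hp + qsec + drat,
                direction = "forward",
                trace = 0)
formula(forward)
```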
Give all the data the same scale
scale(data) (centres each column to mean 0 and rescales it to standard deviation 1)
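A quick check of what `scale()` does, assuming three mtcars columns measured in different units as example data.

```r
# Standardise columns with very different units
z <- scale(mtcars[, c("mpg", "hp", "wt")])

# Each column now has mean ~0 and standard deviation 1
round(colMeans(z), 10)
apply(z, 2, sd)

# The centring and scaling values used are stored as attributes
attr(z, "scaled:center")
attr(z, "scaled:scale")
```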
Correlation PCA
PCA = princomp(data, cor=TRUE)
Covariance PCA
PCA = princomp(data)
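A sketch contrasting the two forms, assuming the built-in USArrests data. Its Assault column has a much larger variance than the others. In covariance PCA that column dominates the first component. Correlation PCA is equivalent to working on standardised variables, so every variable starts on an equal footing.

```r
pca_cor <- princomp(USArrests, cor = TRUE)  # correlation PCA
pca_cov <- princomp(USArrests)              # covariance PCA

# First-component loadings: Assault dominates the covariance version
loadings(pca_cov)[, 1]
loadings(pca_cor)[, 1]

# With cor = TRUE the component variances sum to the number of variables
sum(pca_cor$sdev^2)
```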
Visualising PCA
screeplot(PCA)
Look for the elbow: keep the components before the curve levels off
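A sketch assuming a correlation PCA of USArrests. The line-style scree plot can make the elbow easier to see. The proportions of variance are the same numbers the plot shows.

```r
pca <- princomp(USArrests, cor = TRUE)

# Scree plot drawn as lines rather than bars
screeplot(pca, type = "lines")

# Proportion and cumulative proportion of variance per component
prop <- pca$sdev^2 / sum(pca$sdev^2)
round(prop, 3)
round(cumsum(prop), 3)
```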
Way to see the first two PCs
biplot(PCA)
Contribution of each original variable to each component
loadings(PCA)
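A sketch of both calls, again assuming a correlation PCA of USArrests. In the biplot, points are observations (states) and arrows are the original variables.

```r
pca <- princomp(USArrests, cor = TRUE)

# Biplot of components 1 and 2
biplot(pca, choices = 1:2)

# Loadings: rows are original variables, columns are components
# (printing hides values below 0.1 in absolute size by default)
loadings(pca)

# Scores on the first two PCs (biplot rescales these for display)
head(pca$scores[, 1:2])
```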
Triangular matrix of distances
matrix1 = dist(data)
Make clusters using matrix
clusters = hclust(matrix1)
can add in method = "single", "complete", "average", etc.; the default linkage is "complete" (Euclidean is the default distance in dist(), not in hclust())
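A sketch separating the two defaults. It assumes the USArrests data, standardised first so no single variable dominates the distances. The scaling step is a choice made for this example.

```r
# Distance matrix on standardised data; Euclidean by default
matrix1 <- dist(scale(USArrests))
attr(matrix1, "method")

# hclust's default linkage is "complete"; others are set with method =
clusters_complete <- hclust(matrix1)
clusters_single   <- hclust(matrix1, method = "single")
clusters_average  <- hclust(matrix1, method = "average")
clusters_complete$method
```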
Create dendrogram
plot(clusters)
Divide dendrogram into 4 clusters
rect.hclust(clusters, k = 4, border = "red")
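A sketch assuming the standardised USArrests distances. `rect.hclust()` needs an existing dendrogram plot to draw on. `cutree()` gives the same 4-way split as a vector of cluster numbers, one per observation.

```r
matrix1  <- dist(scale(USArrests))
clusters <- hclust(matrix1)

# Draw the dendrogram, then box off 4 clusters
plot(clusters)
rect.hclust(clusters, k = 4, border = "red")

# Cluster membership for each state
groups <- cutree(clusters, k = 4)
table(groups)
```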
Make triangular distance matrix of dendrogram
coph = cophenetic(clusters)
Find cophenetic correlation
cor(matrix1, coph)
Closer to 1 is better: the dendrogram preserves the original distances more faithfully
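A sketch using the cophenetic correlation to compare linkage methods. It assumes the standardised USArrests distances. Which linkage wins depends on the data, so no particular result is implied.

```r
matrix1 <- dist(scale(USArrests))

# Cophenetic correlation for each linkage method
linkages <- c("single", "complete", "average")
coph_cor <- sapply(linkages, function(m) {
  clusters <- hclust(matrix1, method = m)
  coph <- cophenetic(clusters)
  cor(matrix1, coph)
})
round(coph_cor, 3)

# Linkage whose dendrogram best preserves the original distances here
names(which.max(coph_cor))
```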