R Flashcards
mode()
Identifies the type of the object in the brackets,
i.e. whether it is "character" or "numeric".
What is a vector?
An ordered sequence of values of the same type, e.g. numbers or character strings.
c()
Combines its arguments into a vector.
vectorname[n]
Gives the nth value in the named vector (positions start at 1).
vectorname[n1 : n2]
Gives the n1th to n2th values in the vector, inclusive.
vectorname[-n]
Gives the whole of the named vector, without the nth value.
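A quick check of mode() and the indexing cards above, using a small hypothetical vector x (any numeric vector would do):

```r
# A hypothetical numeric vector for illustration
x <- c(10, 20, 30, 40, 50)

mode(x)             # "numeric"
mode(c("a", "b"))   # "character"

x[2]     # 20 (R counts positions from 1)
x[2:4]   # 20 30 40
x[-1]    # 20 30 40 50 (everything except the 1st value)
```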
sum()
Gives the sum of a vector.
mean()
Gives the mean of a vector.
max()
Gives the largest value of a vector.
median()
Gives the median of a vector.
var()
Gives the (sample) variance of a vector.
sd()
Gives the standard deviation of a vector.
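The summary functions above applied to one small made-up vector, so the results can be checked by hand. Note that var() and sd() divide by n - 1 (sample variance), not by n:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # hypothetical data

sum(x)     # 40
mean(x)    # 5
max(x)     # 9
median(x)  # 4.5 (average of the two middle values, 4 and 5)
var(x)     # 4.571429 (sum of squared deviations 32, divided by n - 1 = 7)
sd(x)      # 2.13809 (square root of the variance)
```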
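name = function(x) actualfunction(x)
Defines name as a new function that applies actualfunction to its argument x.
sqrt()
Gives the square root of a value (element by element for a vector).
mad()
Gives the median absolute deviation, a robust measure of spread. By default it is multiplied by 1.4826 so that, for normally distributed data, it is comparable to the standard deviation.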
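As a sketch of defining your own function: the name se and its purpose (standard error of the mean) are only an illustration, built from sd(), sqrt() and length():

```r
# Hypothetical helper: standard error of the mean = sd / sqrt(n)
se = function(x) sd(x) / sqrt(length(x))

se(c(2, 4, 4, 4, 5, 5, 7, 9))   # about 0.756
```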
shapiro.test()
Applies the Shapiro-Wilk test, comparing the data with a normal distribution.
The null hypothesis is that the data are normally distributed; a p value lower than 0.05 lets us reject it.
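A minimal sketch using simulated data (the samples are random, so exact p values depend on the seed). The skewed sample will usually give a small p value, while the normal sample usually will not:

```r
set.seed(1)                    # makes the random samples reproducible
normal_data <- rnorm(50)       # drawn from a normal distribution
skewed_data <- rexp(50)        # drawn from a strongly skewed distribution

shapiro.test(normal_data)
sw <- shapiro.test(skewed_data)
sw$p.value                     # extract just the p value

if (sw$p.value < 0.05) print("Reject normality") else print("No evidence against normality")
```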
IQR()
Gives the interquartile range of the data
summary()
Gives the minimum, maximum, median and mean as well as the 1st and 3rd quartiles.
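IQR(), mad() and summary() on the numbers 1 to 10, where the quartiles (R's default method) are 3.25 and 7.75:

```r
x <- 1:10
IQR(x)      # 4.5 (7.75 - 3.25)
mad(x)      # 3.7065 (median absolute deviation 2.5, times 1.4826)
summary(x)  # Min. 1, 1st Qu. 3.25, Median 5.5, Mean 5.5, 3rd Qu. 7.75, Max. 10
```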
barplot()
Gives a bar graph of a vector.
table()
Gives a frequency table of a vector.
length()
Gives the length of a vector.
labels = c(list of the names for the bars)
barplot((data in graph), names.arg = labels, xlab = "name of x axis", ylab = "name of y axis")
Labels the bars in the graph with the items in the labels vector and titles the axes.
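Putting table(), length() and barplot() together on hypothetical colour data. table() sorts the categories alphabetically, so the labels vector has to follow that same order:

```r
# Hypothetical data: one colour recorded per observation
colours <- c("red", "blue", "red", "green", "red")

counts <- table(colours)   # blue 1, green 1, red 3 (sorted alphabetically)
length(colours)            # 5 observations

labels <- c("Blue", "Green", "Red")   # one name per bar, same order as counts
barplot(counts, names.arg = labels, xlab = "Colour", ylab = "Frequency")
```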
hist()
Gives a histogram.
hist(GC, breaks = 50)
Suggests how many bars the graph should have (R treats a single number as a suggestion and may choose a nearby "pretty" number).
hist(GC, breaks = 50, col = "green", xlab = "GC content", ylab = "Absolute frequency", main = "main title", cex.main = 2)
Gives a histogram with the given number of bars and labels.
cex.main is the size of the title text (2 = twice the default size).
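A sketch with simulated GC values (GC here is made up with runif(), not real data). Calling hist() with plot = FALSE returns the bins without drawing, which shows how many bars R actually used for breaks = 50:

```r
set.seed(1)
GC <- runif(500, min = 0.35, max = 0.55)   # hypothetical GC-content values

hist(GC, breaks = 50, col = "green", xlab = "GC content",
     ylab = "Absolute frequency", main = "Main title", cex.main = 2)

h <- hist(GC, breaks = 50, plot = FALSE)
length(h$counts)   # number of bars R chose
```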
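dataset = read.table("filename.txt", header = TRUE)
Reads the data in filename.txt and saves it in dataset.
header = TRUE means the first line of the file holds the column names; it can be shortened to h = T (use h = F if the file has no header line).
attach(dataset)
Attaches the dataset so its columns can be used by name (e.g. GC instead of dataset$GC); detach(dataset) undoes this.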
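To keep the example self-contained, this sketch first writes a tiny hypothetical file with columns GC and reptime to a temporary location, then reads and attaches it:

```r
# Write a small hypothetical data file
tmp <- tempfile(fileext = ".txt")
writeLines(c("GC reptime",
             "0.41 2.3",
             "0.38 2.9",
             "0.45 1.8"), tmp)

dataset  <- read.table(tmp, header = TRUE)
dataset2 <- read.table(tmp, h = TRUE)   # same result: h is short for header

attach(dataset)
mean(GC)          # columns can now be used by name
detach(dataset)   # undo attach() when finished
```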
stem()
Gives a stem-and-leaf plot of the data.
plot(a, b)
Gives a scatter graph of b (y axis) against a (x axis).
plot(GC, reptime, main = "Title", xlab = "GC content", ylab = "Replication time", pch = 20, col = "red")
Gives a scatter graph where pch controls the shape of the points; 20 is a small filled circle.
data1 <- read.csv("plant_data.csv", header = TRUE)
Reads data from a CSV (comma-separated values) file.
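boxplot(data1$height ~ data1$temp)
Gives a box and whisker plot of height for each temperature.
boxplot(data1$height ~ data1$temp, xlab = expression("Temperature, "^o * "C"), ylab = "Height, cm", col = "lightseagreen", notch = T, las = 1)
The expression() adds a superscript o (degree sign) to the x-axis label. The notches mark an approximate confidence interval around each median: if the notches of two boxes do not overlap, this is strong evidence (roughly 95% confidence) that their medians differ. las = 1 makes the tick labels horizontal, so the numbers on the y axis read upright.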
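A sketch with a hypothetical data1 built inline. Passing data = data1 is equivalent to writing data1$height ~ data1$temp. With small groups R may warn that some notches go outside the hinges; that is only a warning:

```r
# Hypothetical plant data: height (cm) at three growing temperatures
data1 <- data.frame(
  temp   = rep(c(15, 20, 25), each = 6),
  height = c(10.2, 11.0,  9.8, 10.5, 11.3, 10.1,
             12.4, 13.1, 12.8, 11.9, 13.5, 12.6,
             14.9, 15.2, 14.1, 15.8, 14.6, 15.4)
)

boxplot(height ~ temp, data = data1,
        xlab = expression("Temperature, "^o * "C"), ylab = "Height, cm",
        col = "lightseagreen", notch = TRUE, las = 1)

b <- boxplot(height ~ temp, data = data1, plot = FALSE)   # statistics only
b$stats   # one column per temperature: whiskers, hinges and median
```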
binom.test(no. successes, no. attempts)
Runs an exact binomial test: gives the estimated probability of success, its 95% confidence interval and a p value for the null hypothesis (by default p = 0.5).
binom.test(no. successes, no. attempts, p of H0)
Refines the test by setting the expected probability of success under the null hypothesis.
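An illustration with made-up counts (7 successes in 10 attempts). Under p = 0.5 the two-sided p value is P(X ≤ 3) + P(X ≥ 7) = 352/1024:

```r
# 7 successes out of 10 attempts; null hypothesis p = 0.5 (the default)
res <- binom.test(7, 10)
res$p.value    # 0.34375
res$conf.int   # 95% confidence interval for the probability of success

# Same data, but the null hypothesis is now p = 0.3
binom.test(7, 10, p = 0.3)
```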
#
Starts a comment; R ignores everything after # on that line.
sample(c("heads", "tails"), 1)
Randomly picks one of the values in the vector.
sample(c("heads", "tails"), 10, replace = TRUE)
Samples with replacement, so the "coin" can be flipped many times without using up the values in the vector.
sum(flips == "heads")
Counts the values within flips that match the given string (each TRUE counts as 1).
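head_count = function(k){
  flips = sample(c("heads", "tails"), k, replace = TRUE)
  sum(flips == "heads")
}
Defines a function head_count that takes a value k, flips the coin k times and returns the number of heads.
{}
Groups several lines of code into one block, e.g. a function body; the block returns the value of its last line.
heads = replicate(100, head_count(10))
Runs the function that many times (here 100) and collects the results in a vector.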
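The coin-flipping cards combined into one runnable sketch. The seed is arbitrary and only makes the random flips repeatable:

```r
set.seed(42)   # reproducible random flips

head_count = function(k) {
  flips = sample(c("heads", "tails"), k, replace = TRUE)
  sum(flips == "heads")   # TRUE counts as 1, FALSE as 0
}

heads = replicate(100, head_count(10))  # 100 experiments of 10 flips each
table(heads)   # how often each number of heads occurred
mean(heads)    # should be close to 5
```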
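chisq.test(c(55, 45))
Runs a chi squared test when one result is achieved 55 times and the other 45 times (by default each outcome is assumed equally likely).
chisq.test(c(120, 480), p = c(1/6, 5/6))
Chi squared test where we provide the expected probability of each outcome. As many outcomes can be listed as you like, as long as the probabilities sum to 1.
chi = chisq.test(cdata)
Saves the chi squared test of cdata as its own variable.
chi$expected
Gives the expected distribution of the data.
chi$observed
Gives the actual distribution of the data.
sum(((chi$observed - chi$expected)^2)/chi$expected)
Calculates the chi squared statistic by hand: the sum of (observed - expected)^2 / expected. (For a 2x2 table, chisq.test applies Yates' continuity correction by default, so its statistic will differ from this.)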
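Checking the hand calculation against chisq.test() for the 55/45 example: the expected counts are 50 and 50, so the statistic is 25/50 + 25/50 = 1. For the second example the expected counts are 100 and 500, giving 4 + 0.8 = 4.8:

```r
chi <- chisq.test(c(55, 45))   # equal probabilities by default
chi$expected                   # 50 50
chi$statistic                  # X-squared = 1
sum((chi$observed - chi$expected)^2 / chi$expected)   # 1, the same value by hand

chisq.test(c(120, 480), p = c(1/6, 5/6))   # X-squared = 4.8
```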
variable = scan()
Fills a variable from the console: type each value followed by Return, then press Return on an empty line to finish.
t.test(iq, mu = 100, alternative = "g")
Runs a one-sample t-test, where mu is the population mean under the null hypothesis. alternative = "g" (greater) tests whether the sample mean is greater than mu; "l" (less) tests whether it is lower; the default is two-sided.
var.test(height$female, height$male)
Runs an F test comparing the variances of two groups, e.g. to check whether a t-test assuming equal variances can be used.
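A sketch of both tests on hypothetical data (the IQ scores and heights are invented). The F statistic of var.test() is simply the ratio of the two sample variances; a large p value gives no evidence that the variances differ:

```r
# Hypothetical IQ scores
iq <- c(105, 110, 98, 112, 104, 99, 115, 108)

# One-sided test: is the mean IQ greater than 100?
tt <- t.test(iq, mu = 100, alternative = "greater")
tt$estimate   # sample mean, 106.375
tt$p.value

# Hypothetical heights (cm) of two groups
height <- data.frame(female = c(162, 158, 170, 165, 160, 167),
                     male   = c(175, 181, 169, 178, 184, 172))
vt <- var.test(height$female, height$male)
vt$statistic   # F = var(female) / var(male)
vt$p.value

# Two-sample t-test assuming equal variances
t.test(height$female, height$male, var.equal = TRUE)
```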
datafile = "http://personality-project.org/r/datasets/R.appendix1.text"
data.ex1 = read.table(datafile, header = TRUE)
Reads in data from an online source and saves it to data.ex1.
aov(Alertness ~ Dosage)
Runs ANOVA on the data. The variable before the ~ is the dependent variable, and the one after is the independent variable (a factor giving the groups).
anova1 = aov(Alertness ~ Dosage)
summary(anova1)
Creates a summary table of the ANOVA.
The p value is given in the Pr(>F) column; if it is below 0.05 we can reject the null hypothesis that there is no difference between the group means.
TukeyHSD(anova1)
Runs Tukey's honest significant difference test on the ANOVA result. Shows adjusted p values for the comparison of each pair of groups.
plot(TukeyHSD(anova1))
Plots the difference in means, with its confidence interval, for each pair of groups compared.
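The ANOVA workflow on hypothetical alertness scores for three dosage groups. Dosage must be a factor so that aov() treats it as groups rather than as a number:

```r
# Hypothetical alertness scores for three dosage groups
Dosage    <- factor(rep(c("a", "b", "c"), each = 4))
Alertness <- c(30, 38, 35, 41,
               27, 26, 29, 36,
               21, 25, 22, 20)

anova1 <- aov(Alertness ~ Dosage)
summary(anova1)                            # ANOVA table with Pr(>F)
p <- summary(anova1)[[1]][["Pr(>F)"]][1]   # the p value on its own

tukey <- TukeyHSD(anova1)
tukey                                      # b-a, c-a and c-b comparisons
plot(tukey)
```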
cor.test(Relaxed, Hyperventilated)
Runs a Pearson's correlation test; gives the correlation coefficient (from -1 to +1) and a p value.
cor.test(xaxis, yaxis, method = "spearman")
Runs the correlation test as Spearman's rho (a rank-based correlation).
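A small comparison of the two methods on made-up data where y always increases with x but not in a straight line: Spearman's rho is exactly 1 because the ranks agree perfectly, while Pearson's r is just below 1:

```r
x <- 1:10
y <- x^2            # increases with x, but not along a straight line

pearson  <- cor.test(x, y)                        # linear association
spearman <- cor.test(x, y, method = "spearman")   # rank-based association

pearson$estimate    # close to, but below, 1
spearman$estimate   # 1
```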
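cricketmodel = lm(freq ~ temp)
Fits a linear regression of freq on temp, i.e. freq = intercept + slope * temp.
(cor(freq, temp))^2
Gives the multiple R-squared value; with a single predictor this equals the squared correlation coefficient.
abline(cricketmodel)
Adds the fitted regression line to an existing scatter plot.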
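A sketch with invented cricket data, confirming that the R-squared reported by summary() matches the squared correlation and drawing the fitted line on the scatter plot:

```r
# Hypothetical cricket chirp frequency at different temperatures
temp <- c(20, 22, 24, 26, 28, 30)
freq <- c(14.0, 15.5, 16.2, 17.8, 18.1, 19.9)

cricketmodel <- lm(freq ~ temp)
coef(cricketmodel)                 # intercept and slope

summary(cricketmodel)$r.squared    # multiple R-squared
cor(freq, temp)^2                  # the same number for a single predictor

plot(temp, freq, pch = 20, col = "red",
     xlab = "Temperature", ylab = "Chirp frequency")
abline(cricketmodel)               # adds the fitted line to the plot
```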
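count2 = na.omit(count)
Removes every row of count that contains a missing value (NA) and saves the result as count2. (To count the missing values, use sum(is.na(count)).)
plot(log(count2$Area), log(count2$Population))
Plots the natural logs of the data.
plot(log(count2$Area), log(count2$Population), xlab = expression("ln(Area, km"^2 * ")"), ylab = "ln(Population)", col = "red", las = 1)
Plots a scatter graph in red with labelled axes; the expression() makes the 2 a superscript.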
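The difference between counting and removing missing values, on a hypothetical count data frame, followed by the log-log plot:

```r
# Hypothetical data with some missing values
count <- data.frame(Area       = c(100, NA, 250, 40, 900),
                    Population = c(5000, 1200, NA, 800, 20000))

sum(is.na(count))          # 2 missing values in total
count2 <- na.omit(count)   # drops rows 2 and 3
nrow(count2)               # 3

# log() is the natural logarithm
plot(log(count2$Area), log(count2$Population),
     xlab = expression("ln(Area, km"^2 * ")"), ylab = "ln(Population)",
     col = "red", las = 1)
```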
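kruskal.test(list(leach, stimpson))
Runs the Kruskal-Wallis rank sum test on the data.
A p value of less than 0.05 means we can reject the null hypothesis that all samples are drawn from the same population.
More samples can be added to the list.
kruskal.test(allpay, bank)
Runs the Kruskal-Wallis test with allpay as the data and bank as the grouping variable; kruskal.test(allpay ~ bank) does the same.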
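The three ways of calling kruskal.test() give the same statistic. The leach and stimpson values here are invented, and bank is built to label which sample each value came from:

```r
# Hypothetical samples from two groups
leach    <- c(2.9, 3.0, 2.5, 2.6, 3.2)
stimpson <- c(3.8, 2.7, 4.0, 2.4)

kw1 <- kruskal.test(list(leach, stimpson))   # list form

# The same data as one vector plus a grouping vector
allpay <- c(leach, stimpson)
bank   <- factor(rep(c("leach", "stimpson"),
                     times = c(length(leach), length(stimpson))))
kw2 <- kruskal.test(allpay, bank)            # x, g form
kw3 <- kruskal.test(allpay ~ bank)           # formula form
kw1$statistic
```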
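library(dunn.test)
Loads the dunn.test package, which provides the dunn.test() function (install it once with install.packages("dunn.test")).
dunn.test(allpay, bank, kw = TRUE, method = "bonferroni")
Runs Dunn's test, comparing allpay between each pair of groups in bank and adjusting the p values with the Bonferroni method.
kw = TRUE means that the Kruskal-Wallis test is reported as well.
help(p.adjust)
Calls up the R help page for that function (?p.adjust does the same).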
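p.adjust() itself, applied to three hypothetical p values. Bonferroni multiplies each p value by the number of tests; Holm is a less conservative step-down alternative:

```r
pvals <- c(0.01, 0.02, 0.04)   # hypothetical unadjusted p values

p.adjust(pvals, method = "bonferroni")   # 0.03 0.06 0.12 (each times 3, capped at 1)
p.adjust(pvals, method = "holm")         # 0.03 0.04 0.04
```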
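unstacked.reptime = unstack(dataset[, c(4, 1)])
Unstacks the data: the values in column 4 are split into one column (or list element) per group given in column 1, and saved in a new variable.
wilcox.test(first, second, paired = TRUE, exact = FALSE)
Runs a paired Wilcoxon signed rank test.
exact = FALSE uses a normal approximation instead of the exact p value. This is needed when there are ties or zero differences (the exact p value cannot then be calculated), and R does it by default with 50 or more values.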
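A sketch of unstack() on a hypothetical two-column dataset (group in column 1, values in column 2). As on the card, the values column is put first and the group column second; with equal group sizes the result is a data frame with one column per group:

```r
# Hypothetical long-format data
dataset <- data.frame(group   = c("early", "early", "late", "late"),
                      reptime = c(1.2, 1.5, 3.1, 3.4))

# Values first, groups second, as in dataset[, c(4, 1)]
unstacked.reptime <- unstack(dataset[, c(2, 1)])
unstacked.reptime   # columns early (1.2, 1.5) and late (3.1, 3.4)
```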
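A paired Wilcoxon test on invented before/after measurements. Every subject increased, so no pair has first > second and the statistic V (the sum of ranks of positive differences) is 0:

```r
# Hypothetical paired measurements on the same 8 subjects
first  <- c(10, 12, 15, 11, 14, 13, 16, 18)
second <- first + c(1, 3, 2, 4, 6, 5, 8, 7)   # every subject increased

wt <- wilcox.test(first, second, paired = TRUE, exact = FALSE)
wt$statistic   # V = 0
wt$p.value     # normal-approximation p value
```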
all = c(first, second)
Combines two vectors into one longer vector.
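friedman.test(all.leaks, allsuits, pilots)
Runs the Friedman rank sum test. Here it tests whether the leakage of at least one suit differs from the others, with every pilot testing every suit.
friedman.test(all.leaks ~ allsuits | pilots)
Runs the same Friedman test in formula form, where:
allsuits is the group (the suits being compared)
pilots is the block (each pilot is measured once with every suit)
pairwise.wilcox.test(all.leaks, allsuits, p.adjust.method = "bonferroni",
  paired = TRUE, exact = TRUE)
Test to do after a significant Friedman result: paired Wilcoxon tests between each pair of suits, with Bonferroni-adjusted p values. The values for each suit must be in the same pilot order.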
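A worked sketch with hypothetical leakage data for 3 suits and 4 pilots. Every pilot ranks the suits C < A < B, so the rank sums are 8, 12 and 4, and the Friedman statistic is 12 × (0² + 4² + 4²) / (4 × 3 × 4) = 8. exact = FALSE is used in the post hoc step here because the hypothetical differences contain ties:

```r
# Hypothetical leakage for 3 suits, each tested by the same 4 pilots
all.leaks <- c(4.2, 3.9, 5.1, 4.8,   # suit A, pilots 1-4
               5.0, 4.6, 5.9, 5.3,   # suit B, pilots 1-4
               3.1, 3.3, 4.0, 3.7)   # suit C, pilots 1-4
allsuits <- factor(rep(c("A", "B", "C"), each = 4))
pilots   <- factor(rep(1:4, times = 3))

f1 <- friedman.test(all.leaks, allsuits, pilots)     # y, groups, blocks
f2 <- friedman.test(all.leaks ~ allsuits | pilots)   # same test, formula form
f1$statistic   # 8

# Post hoc pairwise comparisons; values are paired by position (pilot order)
pairwise.wilcox.test(all.leaks, allsuits, p.adjust.method = "bonferroni",
                     paired = TRUE, exact = FALSE)
```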