R-Studio Flashcards
what can R be used for and what is it very powerful at doing?
R is very powerful at analysing biological data efficiently; it can be used as a calculator, for statistical modelling, for making graphs, for programming and much more
what type of language is R?
R is an object-oriented programming language, i.e. one creates objects and gives them names, and these names can then be used in statistical tests or to make graphs/tables
we use a user-friendly program for working with R: R-Studio (an interface that runs R for us, not a separate version of R)
what do we do before we start processing data in R?
we must prepare our data in Excel before we export it to R
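what are the four panes (‘quadrants’) of R-Studio, in its default layout?
script editor (top left) - where you write and save your code
console (bottom left) - where the code runs and the output appears; commands can also be typed here directly
environment (top right) - the variables, objects & formulas you have created
plots/files/help (bottom right) - the graphical window, where the graphs show up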
how do you assign a value to a letter or word in order to transform it into an object?
[name of object] <- [value to assign to object]
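A minimal sketch of assignment; the object names (x, beetle_count, total) and their values are made up purely for illustration.

```r
# Assign values to objects with the assignment arrow
x <- 5
beetle_count <- 12
# Objects can then be used in calculations
total <- x + beetle_count
total  # typing an object's name prints its value
```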
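what are the arithmetic operators in R?
+ = addition
- = subtraction
* = multiplication
/ = division
^ = raised to the power
sqrt() = square root (strictly a function rather than an operator, so the value goes inside the brackets)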
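A short sketch of R used as a calculator with the operators above; the numbers are arbitrary examples.

```r
a <- 7 + 3     # addition
b <- 7 - 3     # subtraction
m <- 7 * 3     # multiplication
d <- 7 / 2     # division
p <- 2^3       # raised to the power
r <- sqrt(16)  # square root (a function, so the value goes in brackets)
```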
how can we make R-Studio remember something?
through creating objects using the “<-” method (the assignment arrow)
why do we need to keep an eye on capitalisation?
because R is case-sensitive
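A quick way to see case-sensitivity, assuming a made-up object called beetle: R treats beetle and Beetle as two different names.

```r
beetle <- 10
exists("beetle")   # TRUE
exists("Beetle")   # FALSE: the capital B makes it a different name
```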
how do we import our data from Excel into R-Studio?
save the Excel sheet as a tab-delimited text file (.txt), then create a data frame using the following code:
beetle <- read.table("location of .txt file with desired data", header = T, stringsAsFactors = T)
state what the following line of code does and dissect and explain each component of it:
beetle <- read.table("C:/teaching/stats/lecture3/beetle_behaviour.txt", header = T, stringsAsFactors = T)
this line of code reads all of your data from the text file (saved from your Excel sheet) and stores it in R as a data frame called beetle
beetle <- = assigns the result to an object called beetle, i.e. the name of the data frame
read.table(…) = function that reads in the file at the given location as a data frame
header = T (short for header = TRUE): the first row contains the variable names
stringsAsFactors = T converts text columns in the data table to categorical variables (‘factors’)
what do you always add at the end of your file name?
the file extension, e.g. file_name.txt (or file_name.csv when reading the data with read.csv)
what is the purpose of the lines “header = T” & “stringsAsFactors = T”?
they tell read.table how to read the file, with T meaning ‘TRUE’: “header = T” confirms that the top row contains the names of the variables rather than data, and “stringsAsFactors = T” converts any column containing text into a factor (a categorical variable) instead of leaving it as plain text
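A minimal sketch of what the two arguments do. It assumes made-up beetle data typed in directly through read.table's text = argument instead of a file path, so that it runs without a file.

```r
# Hypothetical beetle data; 'text =' replaces the file location
beetle_text <- "species activity
A 12
B 7
A 15"
beetle <- read.table(text = beetle_text, header = T, stringsAsFactors = T)
str(beetle)   # species is a factor with 2 levels, activity is numeric
# Without header = T the first row is treated as data and columns are named V1, V2
no_header <- read.table(text = beetle_text)
names(no_header)
```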
what must you do once you have typed out your “read in data frame” line?
run the line by pressing Enter, then on the next line attach the data frame to the workspace, i.e. make its columns accessible by their variable names: “attach(name of data frame)”, which in this case would be attach(beetle)
what does the hash symbol (#) mean in R-Studio?
everything written after a # on a line is a comment and has no effect on the actual code
make the data accessible in R:
attach(name of data frame)
once you have made your data accessible to R through entering attach(name of data-frame) what must you enter and what will that give you?
once the data-frame is accessible, you type names(beetle) which will give you an output containing all of the variable names
once you have entered names(beetle) to get the names of all your variables, what must you now do and why?
immediately after names(beetle) you type head(beetle), which shows the column headings and the first rows (six by default) of your data frame, so you can check that the data were read in correctly
first three steps after reading in your data file (saved from Excel) and why each step is important:
[beetle is the name of data-frame]
attach(beetle): this allows for your data to become accessible to the code
names(beetle): this gives the names of the variables in our data-frame “beetle”
head(beetle): this only shows the first rows of our data frame
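A sketch of the three steps, assuming a small made-up beetle data frame built in code rather than read from a file.

```r
# Hypothetical data frame with 8 rows
beetle <- data.frame(
  species  = factor(rep(c("A", "B"), times = 4)),
  activity = c(12, 7, 15, 9, 11, 6, 14, 8)
)
attach(beetle)   # columns can now be used by name
names(beetle)    # "species" "activity"
head(beetle)     # first 6 of the 8 rows
mean(activity)   # works because beetle is attached
detach(beetle)   # undoes attach() when you are finished
```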
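how do you calculate a probability from a normal distribution in R?
using the function pnorm, which gives the probability of getting a value less than or equal to a given value: pnorm(value, mean, sd)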
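A few example calls to pnorm; the mean of 100 and sd of 15 are hypothetical values chosen for illustration.

```r
pnorm(0)                          # standard normal: P(Z <= 0) = 0.5
pnorm(1.96)                       # about 0.975
pnorm(120, mean = 100, sd = 15)   # P(X <= 120), about 0.909
# Upper tail, P(X > 120)
pnorm(120, mean = 100, sd = 15, lower.tail = FALSE)
```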
what does R not like when it comes to excel sheets of data?
1) it does not like empty cells, therefore you must fill them with NA
2) it does not like headings with spaces, therefore if your header has two words, separate them with an underscore
get excel sheet ready for R:
(1) make sure that all measures/observations of one variable are in one column
(2) remove all spaces within your table and replace those by ‘_’
(3) try and give short and distinct variable names (as we may need to type them later)
(4) replace all empty cells, i.e. missing values, by ‘NA’
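A sketch of why these rules help, assuming a made-up sheet (typed in through read.table's text = argument) with underscores in the headers and one missing value written as NA. R reads NA as a missing value; note that mean() then returns NA unless na.rm = T is added.

```r
sheet <- "body_length wing_length
12.1 5.2
11.4 NA
13.0 5.8"
beetles <- read.table(text = sheet, header = T)
is.na(beetles$wing_length)            # FALSE TRUE FALSE
mean(beetles$wing_length)             # NA, because one value is missing
mean(beetles$wing_length, na.rm = T)  # mean of the non-missing values
```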
how can you test for a normal distribution in R using the Shapiro-Wilk test?
shapiro.test(variable name)
this gives you a W value and a p-value, where W is the test statistic and p is the probability value
if your p-value is larger than 0.05, your data do not differ significantly from a normal distribution, so you can treat them as normally distributed
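A minimal sketch using simulated data; rnorm() generates normally distributed values, and set.seed() only makes the random numbers reproducible. The sample size, mean and sd are arbitrary.

```r
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)   # simulated measurements
res <- shapiro.test(x)
res            # prints W and the p-value
res$statistic  # W
res$p.value    # compare with 0.05
```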
how can you get your critical value that is needed for comparison following the calculation of your chi-squared value in R-Studio?
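qchisq(0.95, degrees of freedom) #0.95 = 1 minus the 0.05 significance level
output: the critical value; if your chi-squared value is larger than it, the result is significant at p < 0.05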
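A short sketch of the comparison; the chi-squared value of 5.2 is hypothetical.

```r
qchisq(0.95, 1)   # about 3.84 (1 degree of freedom)
qchisq(0.95, 2)   # about 5.99 (2 degrees of freedom)
chi_sq <- 5.2     # hypothetical result from a 2 x 2 table
chi_sq > qchisq(0.95, 1)   # TRUE: larger than the critical value, so significant
```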
how do you calculate chi-squared in R-Studio?
[#1. we add the data column-wise into a matrix, called ‘embryo’]
embryo <- matrix(c(38,14,11,51), nrow=2)
check the matrix by typing its name: embryo
[#2. our chi-squared test; correct = F switches off Yates' continuity correction]
chisq.test(embryo, correct = F)
answer - including: X-squared value, p-value, degrees of freedom (df) etc
nrow = specifies the number of rows our matrix should have
c = combine (joins the values together into one vector)
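A sketch using the embryo counts from the card above, also showing the expected counts and the same statistic worked out by hand as the sum of (observed - expected)^2 / expected.

```r
embryo <- matrix(c(38, 14, 11, 51), nrow = 2)   # filled column by column
embryo
res <- chisq.test(embryo, correct = F)
res            # X-squared, df and p-value
res$expected   # counts expected if the two variables were not associated
sum((embryo - res$expected)^2 / res$expected)   # same value as X-squared
```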
to understand and check the structure of your data-frame:
str(name of data-frame)
how can you create a subset in R?
say we are only interested in the first and second column of our data frame, we write:
twocols<-genome[,1:2]
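[name of new object] <- [data-frame name][, first column:last column] (rows go before the comma, columns after it; leaving one side blank means ‘all’)
if we are only interested in the first three rows:
threerows <- genome[1:3,]
threerows #typing the name of the object shows it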
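A sketch of both subsets, assuming a small made-up genome data frame (the column names and values are hypothetical).

```r
genome <- data.frame(
  gene   = c("g1", "g2", "g3", "g4", "g5"),
  length = c(1200, 850, 3000, 410, 990),
  gc     = c(0.41, 0.38, 0.52, 0.47, 0.44)
)
twocols <- genome[, 1:2]    # all rows, columns 1 to 2
threerows <- genome[1:3, ]  # rows 1 to 3, all columns
twocols
threerows
```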
mean in R:
mean(variable name)
median in R:
median(variable name)
calculating variance in R:
var(variable name) #variance of a single variable
tapply(variable name 1, variable name 2, var) #variance of variable 1 for each level of the factor variable 2
standard deviation in R:
sd(variable name)
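A worked sketch of the descriptive statistics above, assuming eight made-up measurements split into two hypothetical groups.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
group <- factor(rep(c("a", "b"), each = 4))
mean(x)                 # 5
median(x)               # 4.5
var(x)                  # 32/7, about 4.57
sd(x)                   # square root of the variance, about 2.14
tapply(x, group, var)   # variance per group: a = 1, b = 11/3 (about 3.67)
```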
how can you use the Shapiro-Wilk test to see if your data are normally distributed (so a parametric test can be used) or not normally distributed (so a non-parametric test is needed)?
input: shapiro.test(variable)
this will give you a p-value; if your p-value is MORE than 0.05, there is no significant departure from normality, so you do not reject the null hypothesis that the data are normally distributed and can treat them as normal
calculating standard error in R:
SE<-function(x)sqrt(var(x)/length(x))
where x stands for whatever values you plug in
then we use our new object “SE” as a function: SE(variable name)
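A sketch of the SE function in use, with the same kind of made-up measurements; sd(x)/sqrt(length(x)) is an equivalent way of writing the standard error.

```r
SE <- function(x) sqrt(var(x) / length(x))  # standard error of the mean
x <- c(2, 4, 4, 4, 5, 5, 7, 9)              # hypothetical measurements
SE(x)                                       # about 0.756
sd(x) / sqrt(length(x))                     # equivalent formula, same answer
```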
what should you do if the following error message shows up: “Error in plot.new() : figure margins too large”?
move your mouse over the border of the ‘Plots’ pane, i.e. the graphical window, until your cursor turns into a resize symbol (arrows); if you now hold down the mouse button and drag, you can make the window larger; then re-run the plot command and the graph should show
how do we change the names of the axes of a graph in R-Studio?
after your plot(y variable~x variable) you add a comma before closing the brackets and write:
las = 1, ylab = "name of new y axis", xlab = "name of new x axis")
[las = 1] means:
graphical parameter for the style of axis labels; las = 1 means the values along the axes (the tick labels) are always written horizontally
how do we change the names of our graph axes?
[plot(y variable~x variable, las = 1, ylab = "new y name", xlab = "new x name")]
if we have a factor with two levels on our x-axis (e.g. two box-and-whisker plots), how can we rename them?
to rename these parts of the graph, add (after your xlab = "new name" argument for a new axis label) names = c("first new name", "second new name")
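A sketch of a box-and-whisker plot with renamed groups, assuming made-up activity data for a hypothetical two-level factor; boxplot() also returns the box statistics, one column per group.

```r
activity  <- c(12, 15, 11, 14, 7, 9, 6, 8)
treatment <- factor(rep(c("ctrl", "heat"), each = 4))
b <- boxplot(activity ~ treatment, las = 1,
             ylab = "Activity", xlab = "Treatment",
             names = c("Control", "Heated"))   # new names for the two boxes
b$stats   # 5 summary values (rows) for each of the 2 groups (columns)
```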
if the data table is very small, how can you enter the data into R manually?
temperature <- c(data-point-1, data-point-2, etc)
activity <- c(data-point-1, data-point-2, etc)
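A sketch with made-up values typed in by hand; combining the vectors with data.frame() is optional but keeps them together.

```r
temperature <- c(10, 15, 20, 25, 30)
activity    <- c(2, 3, 5, 4, 6)
activity_data <- data.frame(temperature, activity)
activity_data
```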
how can you change the range of the y-axis in R?
ylim=c(0,6),
in the above instance, this would make your data on the y axis now range from 0 → 6
how can you change the size of your axis labels and values in R, where the default value is 1:
changing size of axis labels - cex.lab = 1.5, (1.5 times the default size, i.e. 50% bigger)
changing size of axis values - cex.axis = 1.5, (1.5 times the default size, i.e. 50% bigger)
how can you change the appearance of your data-point symbols?
filled-in circle: …, pch = 19)
coloured circle: …, col = "red" [if you want red]
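A sketch combining the graph options from these cards in one scatter plot, assuming the same kind of made-up temperature and activity values.

```r
temperature <- c(10, 15, 20, 25, 30)
activity    <- c(2, 3, 5, 4, 6)
plot(activity ~ temperature,
     las = 1,                          # horizontal axis values
     ylab = "Activity", xlab = "Temperature",
     ylim = c(0, 6),                   # y-axis from 0 to 6
     cex.lab = 1.5, cex.axis = 1.5,    # 1.5 times the default size
     pch = 19, col = "red")            # filled red circles
```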
how must you separate the different arguments (e.g. las, ylab, pch) when writing out a command such as plot() in R?
you must separate each one using a comma
if you are done and happy with your graph, how can you export it to another programme?
(1) go to the tab “plots”
(2) export
(3) copy to clipboard
[you can also save your plot]
how can we clean up after ourselves by removing all the objects we have created?
we can do this through pressing the little broomstick in the top right corner of the environment pane
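Roughly the code equivalent of the broom button is rm(list = ls()), which removes every object listed in the workspace; the object names below are hypothetical.

```r
tmp_object <- 1:10
tmp_model_name <- "m1"
rm(list = ls())   # removes all objects in the workspace
ls()              # nothing left
```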
full and complete set of commands needed to run a linear regression in R:
(1) data<-read.csv("excel_sheet1.csv", header = T, stringsAsFactors = T) #Excel sheet saved as a .csv file
(2) attach(data)
(3) names(data)
(4) m1<-lm(y variable~x variable)
#“m1” is simply your model name
(5) summary.lm(m1)
(6) summary.aov(m1)
(7) plot(m1)
complete linear regression R command:
(1) read.csv
(2) attach(data frame name)
(3) names(data frame name)
(4) m1<-lm(growth~tannin) #m1 is the name of your regression
(5) summary.lm(m1)
(6) summary.aov(m1)
(7) plot(m1)
(8) plot(y-variable~x-variable, pch=19, las=1)
(9) abline(lm(y-variable~x-variable)) #adds the regression line to the plot
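A sketch of the whole regression workflow, assuming made-up growth and tannin values held in a data frame called reg (in the steps above they would come from read.csv). It uses data = reg inside lm() instead of attach(); both approaches work.

```r
reg <- data.frame(
  tannin = 0:8,
  growth = c(12, 11, 9, 10, 7, 6, 4, 3, 2)
)
m1 <- lm(growth ~ tannin, data = reg)   # m1 is the model name
summary.lm(m1)    # intercept, slope, R-squared and p-values
summary.aov(m1)   # ANOVA table for the regression
par(mfrow = c(2, 2))   # show the four diagnostic plots together
plot(m1)
par(mfrow = c(1, 1))
plot(growth ~ tannin, data = reg, pch = 19, las = 1)
abline(m1)        # fitted regression line
```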
what single command gives you the diagnostic graphs in linear regression that allow you to check the assumptions of a parametric model?
plot(m1) #run after fitting the model, e.g. m1 <- lm(y variable~x variable)
[plot(m1) produces four diagnostic plots; the two to check are residuals vs fitted, which must look like a “sky at night” (random scatter with no pattern), and the normal Q-Q plot, which must have the data-symbols on or very close to the line]
how can you command R to give Pearson's correlation?
after attaching your data frame and checking the variable names with names():
cor.test(variable_1, variable_2, method = "pearson")
note: doesn’t matter what way around your variables are - answer will be the same either way
what do we receive from a Pearson's cor.test command and how do you interpret it?
you will get a test statistic (t), the degrees of freedom, a p-value and, at the bottom of the output underneath “cor”, our correlation coefficient
(1) if the p-value is smaller than 0.05, we can conclude that the two variables are significantly correlated
(2) if the cor value is positive there is a positive correlation; if it is negative there is a negative correlation
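A sketch with made-up x and y values that rise together; it also shows that swapping the order of the variables gives the same correlation coefficient.

```r
x <- 1:10
y <- c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.2)
res <- cor.test(x, y, method = "pearson")
res              # t, df, p-value, confidence interval and cor
res$estimate     # correlation coefficient (positive here)
res$p.value      # below 0.05: significant correlation
cor.test(y, x, method = "pearson")$estimate   # same value, order swapped
```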
how can you use R to calculate Spearman's rank correlation?
we once again use:
cor.test(variable one, variable two, method = "spearman")
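A sketch contrasting the two methods on made-up data where y always increases with x but not in a straight line: Spearman's rank correlation (rho) is exactly 1, while Pearson's r is below 1.

```r
x <- 1:8
y <- x^2
cor.test(x, y, method = "spearman")$estimate   # rho = 1 (perfectly monotonic)
cor.test(x, y, method = "pearson")$estimate    # below 1 (curved relationship)
```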
χ²-tests enable us to judge whether:
the observed frequencies differ from the frequencies expected if the two variables were not associated
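The statistic behind this judgement compares observed (O) and expected (E) counts. For a table with r rows and c columns, the expected count in each cell comes from its row total R_i, column total C_j and grand total N (this is the version without Yates' continuity correction, i.e. correct = F):

```latex
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{R_i \, C_j}{N},
\qquad \mathrm{df} = (r - 1)(c - 1)
```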
what is the first thing you must do before conducting a chi-squared test in R?
you must first create a matrix for your data via the command:
matrix_name <- matrix(c(20,40,30,30,42,18), nrow = 2)
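A sketch showing how matrix() fills the values column by column unless byrow = TRUE is given, and the resulting degrees of freedom for this 2 x 3 table; matrix_name is a placeholder object name.

```r
matrix_name <- matrix(c(20, 40, 30, 30, 42, 18), nrow = 2)
matrix_name
#      [,1] [,2] [,3]
# [1,]   20   30   42
# [2,]   40   30   18
matrix(c(20, 40, 30, 30, 42, 18), nrow = 2, byrow = TRUE)  # fills row by row instead
chisq.test(matrix_name)$parameter   # df = (2 - 1) * (3 - 1) = 2
```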